
# Chatterbox TTS Application

A text-to-speech application that generates speech with the Chatterbox TTS model through several interfaces. It supports single-utterance generation, multi-speaker dialogs, and long-form audiobook generation.

## Features

- **Multiple Interfaces**: Web UI, FastAPI backend, Gradio interface, and CLI tools
- **Single Utterance Generation**: Generate speech from text using a selected speaker
- **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
- **Audiobook Generation**: Convert long-form text into narrated audiobooks
- **Speaker Management**: Add/remove speakers with custom audio samples
- **Memory Optimization**: Automatic model cleanup after generation
- **Output Organization**: Files saved in organized directories with ZIP packaging

## Getting Started

### Quick Setup

1. Clone the repository and install dependencies:
   ```bash
   git clone https://github.com/your-username/chatterbox-ui.git
   cd chatterbox-ui
   pip install -r requirements.txt
   npm install
   ```

2. Run automated setup:
   ```bash
   python setup.py
   ```

3. Prepare speaker samples:
   - Add audio samples (WAV format) to `speaker_data/speaker_samples/`
   - Configure speakers in `speaker_data/speakers.yaml`
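
Before starting the servers, it can help to confirm that every sample in `speaker_data/speaker_samples/` is a readable WAV file. The sketch below is not part of the repository: `check_samples` is a hypothetical helper that uses only the Python standard library, and it recognizes only WAV files that the `wave` module can open.

```python
import wave
from pathlib import Path


def check_samples(sample_dir="speaker_data/speaker_samples"):
    """Map each .wav filename to its duration in seconds, or None if unreadable."""
    directory = Path(sample_dir)
    if not directory.is_dir():
        return {}  # nothing to check if the folder does not exist yet
    results = {}
    for path in sorted(directory.glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                results[path.name] = wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError):
            # Not a WAV file the standard library can parse
            results[path.name] = None
    return results
```

Calling `check_samples()` from the repository root returns a dictionary; any entry whose value is `None` points at a file worth re-exporting.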

### Running the Application

**Full-Stack Web Application:**
```bash
# Start both backend and frontend servers
python start_servers.py
```

**Individual Components:**
```bash
# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend only
cd frontend && python start_dev_server.py

# Gradio interface
python gradio_app.py
```

## Usage

### Web Interface
Access the modern web UI at `http://localhost:8001` for interactive dialog creation with drag-and-drop editing.

### CLI Tools

**Single utterance generation:**
```bash
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"
```

**Dialog generation:**
```bash
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output
```

**Audiobook generation:**
```bash
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
```

### Gradio Interface
- **Single Utterance Tab**: Select speaker, enter text, adjust parameters, generate
- **Dialog Generation Tab**: Configure speakers and create multi-speaker conversations
- Dialog format:
  ```
  Speaker1: "Hello, how are you?"
  Speaker2: "I'm doing well!"
  Silence: 0.5
  Speaker1: "What are your plans for today?"
  ```
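
The dialog format above is simple enough to parse line by line. The following is a minimal sketch, not the parser used by the application: `parse_dialog` is a hypothetical function, and it assumes each non-blank line is either `Name: "text"` or `Silence: <seconds>`, with the surrounding double quotes optional.

```python
import re

# Matches `Name: rest-of-line`; the name itself may not contain a colon.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_dialog(script):
    """Turn a dialog script into a list of speech and silence items."""
    items = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        match = LINE_RE.match(raw)
        if match is None:
            raise ValueError(f"Unrecognized dialog line: {raw!r}")
        name, value = match.groups()
        if name.lower() == "silence":
            items.append({"type": "silence", "duration": float(value)})
        else:
            # Strip one pair of surrounding double quotes, if present
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            items.append({"type": "speech", "speaker": name, "text": value})
    return items
```

Keeping silence entries in the same ordered list as the spoken lines means a later stage can walk the list once and either synthesize audio or insert a gap.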

## Architecture Overview

### Application Structure
- **Frontend**: Modern vanilla JavaScript web UI (`frontend/`)
- **Backend**: FastAPI REST API (`backend/`)
- **CLI Tools**: Command-line utilities (`cbx-*.py`)
- **Gradio Interface**: Alternative web UI (`gradio_app.py`)

### New Files and Features
- **`cbx-audiobook.py`**: Generate long-form audiobooks from text files
- **`import_helper.py`**: Utility for managing imports and dependencies
- **Backend Services**: Enhanced dialog processing, speaker management, and TTS services
- **Web Frontend**: Interactive dialog editor with drag-and-drop functionality

### File Organization
- `single_output/` - Single utterance generations
- `dialog_output/` - Multi-speaker dialog files
- `tts_outputs/` - Raw TTS generation files
- `speaker_data/` - Speaker configurations and audio samples
- Generated files are also packaged as ZIP archives for download

### API Endpoints
- `/api/speakers/` - Speaker CRUD operations
- `/api/dialog/generate/` - Full dialog generation
- `/api/dialog/generate_line/` - Single line generation
- `/generated_audio/` - Static audio file serving
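
As an illustration of how a script might talk to these endpoints, here is a small standard-library sketch. It assumes the backend is running on port 8000 as in the `uvicorn` command, and that a `GET` on `/api/speakers/` returns a JSON body; the request and response schemas are not documented here, so check `http://localhost:8000/docs` for the real ones. The `endpoint` and `list_speakers` names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

API_BASE = "http://localhost:8000"  # backend address from the uvicorn command


def endpoint(path, base=API_BASE):
    """Build a full URL for one of the API paths listed above."""
    return urljoin(base + "/", path.lstrip("/"))


def list_speakers(base=API_BASE, timeout=10):
    """GET /api/speakers/ and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint("/api/speakers/", base), timeout=timeout) as response:
        return json.load(response)
```

With the backend running, `list_speakers()` would return whatever structure the API exposes for configured speakers.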

## Configuration

### Environment Setup
Key configuration files:
- `.env` - Global settings
- `backend/.env` - Backend-specific settings
- `frontend/.env` - Frontend-specific settings
- `speaker_data/speakers.yaml` - Speaker configuration

### Development Commands
```bash
# Run tests
python backend/run_api_test.py
npm test

# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```

## Memory Management

The application automatically:
- Cleans up the TTS model after each generation
- Manages GPU memory (CUDA/MPS devices)
- Optimizes memory usage for long-form content
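
A cleanup step of this kind typically drops the model reference, runs the garbage collector, and asks the GPU backend to release cached memory. The sketch below illustrates that general pattern under stated assumptions; it is not the application's own code. `release_model` and the `holder` dictionary are hypothetical, and PyTorch is imported lazily so the function still runs where it is not installed.

```python
import gc


def release_model(holder, key="model"):
    """Drop holder[key], collect garbage, and free cached GPU memory if possible.

    Returns the name of the backend whose cache was cleared, or None.
    """
    holder.pop(key, None)  # remove the reference this holder controls
    gc.collect()
    try:
        import torch
    except ImportError:
        return None  # no PyTorch, so there is no GPU cache to clear
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()
        return "mps"
    return None
```

Memory is only reclaimed if no other variable still refers to the model, so any code that adopts this pattern should avoid keeping extra references around.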

## Troubleshooting

- **"Skipping unknown speaker"**: Configure speaker in `speaker_data/speakers.yaml`
- **"Sample file not found"**: Verify audio files exist in `speaker_data/speaker_samples/`
- **Memory issues**: Use model reinitialization options for long content
- **CORS errors**: Check frontend/backend port configuration (frontend origin is auto-included when using `start_servers.py`)
- **Import errors**: Run `python import_helper.py` to check dependencies