# Chatterbox TTS Gradio App

This Gradio application provides a user interface for text-to-speech generation using the Chatterbox TTS model. It supports both single utterance generation and multi-speaker dialog generation with configurable silence gaps.

## Features

- **Single Utterance Generation**: Generate speech from text using a selected speaker
- **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
- **Speaker Management**: Add/remove speakers with custom audio samples
- **Memory Optimization**: Automatic model cleanup after generation
- **Output Organization**: Files saved in `single_output/` and `dialog_output/` directories

## Getting Started

1. Clone the repository:
   ```bash
   git clone https://github.com/your-username/chatterbox-test.git
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Prepare speaker samples:
   - Create a `speaker_samples/` directory
   - Add audio samples (WAV format) for each speaker
   - Update `speakers.yaml` with speaker names and file paths

4. Run the app:
   ```bash
   python gradio_app.py
   ```

## Usage

### Single Utterance Tab
- Select a speaker from the dropdown
- Enter text to synthesize
- Adjust generation parameters as needed
- Click "Generate Speech"

### Dialog Generation Tab
1. Add speakers using the speaker configuration section
2. Enter the dialog one line per turn, using `Silence:` lines to insert gaps between turns:
   ```
   Speaker1: "Hello, how are you?"
   Speaker2: "I'm doing well!"
   Silence: 0.5
   Speaker1: "What are your plans for today?"
   ```
3. Set the output base name
4. Click "Generate Dialog"
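
The dialog format above can be read line by line. The following is a minimal sketch of such a parser, not the app's actual implementation: the names `Utterance`, `Silence`, and `parse_dialog` are hypothetical, and treating the `Silence:` value as seconds is an assumption.

```python
import re
from typing import NamedTuple


class Utterance(NamedTuple):
    speaker: str
    text: str


class Silence(NamedTuple):
    seconds: float  # assumption: the README does not state the unit


# One script line has the shape "<label>: <rest>"
_LINE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_dialog(script: str) -> list:
    """Turn a dialog script into a list of Utterance and Silence items."""
    items = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        match = _LINE.match(raw)
        if match is None:
            raise ValueError(f"Unrecognized dialog line: {raw!r}")
        label, rest = match.groups()
        if label.lower() == "silence":
            items.append(Silence(float(rest)))
        else:
            # Strip one pair of surrounding double quotes, if present
            if len(rest) >= 2 and rest[0] == rest[-1] == '"':
                rest = rest[1:-1]
            items.append(Utterance(label, rest))
    return items
```

Each `Utterance` can then be matched against the configured speakers, and each `Silence` turned into a gap of the given length.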

## File Organization

- Generated single utterances are saved to `single_output/`
- Dialog generation files are saved to `dialog_output/`
- Concatenated dialog files have the `_concatenated.wav` suffix
- All files are zipped together for download
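
As an illustration of the zipping step, here is a minimal sketch that uses Python's standard `zipfile` module. The function name `zip_outputs` and the flat archive layout are assumptions, not the app's actual code.

```python
import zipfile
from pathlib import Path


def zip_outputs(wav_paths, archive_path):
    """Write the given files into one ZIP archive and return its path."""
    archive_path = Path(archive_path)
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for wav in map(Path, wav_paths):
            # Store only the file name so the archive has a flat layout
            archive.write(wav, arcname=wav.name)
    return archive_path
```

For example, `zip_outputs(Path("dialog_output").glob("*.wav"), "dialog_output/dialog.zip")` would bundle every WAV file in `dialog_output/` into a single archive.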

## Memory Management

The app automatically:
- Cleans up the TTS model after each generation
- Frees GPU memory (for CUDA/MPS devices)
- Deletes intermediate tensors to minimize memory footprint
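
The cleanup steps above roughly correspond to the pattern below. This is a sketch under stated assumptions rather than the app's own code: `release_accelerator_memory` is a hypothetical helper, and the caller is expected to have dropped its references to the model first (for example with `model = None`).

```python
import gc

try:
    import torch  # optional here so the sketch also runs without PyTorch
except ImportError:
    torch = None


def release_accelerator_memory() -> int:
    """Collect garbage, then ask PyTorch to release cached GPU memory.

    Returns the number of unreachable objects found by the garbage collector.
    """
    collected = gc.collect()
    if torch is not None:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
        mps = getattr(torch.backends, "mps", None)
        # torch.mps is only present in newer PyTorch releases, hence the guard
        if mps is not None and mps.is_available() and hasattr(torch, "mps"):
            torch.mps.empty_cache()
    return collected
```

Emptying the cache only helps once nothing references the model or its tensors any more, which is why the references are dropped before the call.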

## Troubleshooting

- **"Skipping unknown speaker"**: Add the speaker first using the speaker configuration section
- **"Sample file not found"**: Verify the audio file exists in `speaker_samples/`
- **Memory issues**: Try enabling "Re-initialize model each line" for long dialogs
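
To track down a "Sample file not found" error, a small check like the one below can list speakers whose sample is missing. It is a sketch only: the `find_missing_samples` helper and the name-to-file mapping are hypothetical, and it does not read `speakers.yaml`, whose schema is not described here.

```python
from pathlib import Path


def find_missing_samples(speakers, samples_dir="speaker_samples"):
    """Return the names of speakers whose sample file is not in samples_dir.

    `speakers` maps a speaker name to a sample file name (assumed shape).
    """
    base = Path(samples_dir)
    return sorted(
        name
        for name, filename in speakers.items()
        if not (base / filename).is_file()
    )
```

Calling `find_missing_samples({"Speaker1": "speaker1.wav"})` from the project root returns an empty list when the file exists in `speaker_samples/`.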