# Chatterbox TTS Gradio App

This Gradio application provides a user interface for text-to-speech generation using the Chatterbox TTS model. It supports both single utterance generation and multi-speaker dialog generation with configurable silence gaps.

## Features

- **Single Utterance Generation**: Generate speech from text using a selected speaker
- **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
- **Speaker Management**: Add/remove speakers with custom audio samples
- **Memory Optimization**: Automatic model cleanup after generation
- **Output Organization**: Files saved in `single_output/` and `dialog_output/` directories

## Getting Started

1. Clone the repository:
   ```bash
   git clone https://github.com/your-username/chatterbox-test.git
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Prepare speaker samples:
   - Create a `speaker_samples/` directory
   - Add audio samples (WAV format) for each speaker
   - Update `speakers.yaml` with speaker names and file paths

4. Run the app:
   ```bash
   python gradio_app.py
   ```

|
## Usage

### Single Utterance Tab
- Select a speaker from the dropdown
- Enter text to synthesize
- Adjust generation parameters as needed
- Click "Generate Speech"

|
### Dialog Generation Tab
1. Add speakers using the speaker configuration section
2. Enter dialog in the format:
   ```
   Speaker1: "Hello, how are you?"
   Speaker2: "I'm doing well!"
   Silence: 0.5
   Speaker1: "What are your plans for today?"
   ```
3. Set output base name
4. Click "Generate Dialog"

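The dialog format above can be read one line at a time. The following is a minimal sketch of such a parser, not the app's actual code: `DialogItem` and `parse_dialog` are hypothetical names, and the handling of quotes, malformed lines, and unknown speakers is an assumption based on the example and on the "Skipping unknown speaker" message.

```python
import re
from typing import NamedTuple, Optional


class DialogItem(NamedTuple):
    """One parsed line: either a spoken line or a silence gap (hypothetical type)."""
    kind: str                 # "speech" or "silence"
    speaker: Optional[str]    # speaker name for speech items
    text: Optional[str]       # text to synthesize for speech items
    seconds: Optional[float]  # gap length for silence items


# "Name: value" where the name is everything before the first colon.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_dialog(script, known_speakers):
    """Turn a dialog script into a list of DialogItem entries."""
    items = []
    for raw_line in script.splitlines():
        match = LINE_RE.match(raw_line)
        if match is None:
            continue  # blank or malformed line (assumed to be ignored)
        name, value = match.groups()
        if name.lower() == "silence":
            try:
                seconds = float(value)
            except ValueError:
                continue  # assumed: a non-numeric gap is ignored
            items.append(DialogItem("silence", None, None, seconds))
        elif name in known_speakers:
            # Drop one pair of surrounding double quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            items.append(DialogItem("speech", name, value, None))
        else:
            print(f"Skipping unknown speaker: {name}")
    return items
```

Calling `parse_dialog(script, {"Speaker1", "Speaker2"})` on the example above would yield three speech items and one silence item, in order.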
## File Organization

- Generated single utterances are saved to `single_output/`
- Dialog generation files are saved to `dialog_output/`
- Concatenated dialog files have a `_concatenated.wav` suffix
- All files are zipped together for download

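As an illustration of the bundling step, here is a hedged sketch that zips every WAV file sharing an output base name. It assumes the per-line files and the `_concatenated.wav` file all start with that base name, which this README does not specify; `bundle_dialog_outputs` is a hypothetical helper, not a function from the app.

```python
import zipfile
from pathlib import Path


def bundle_dialog_outputs(output_dir, base_name):
    """Zip all WAV files in output_dir whose names start with base_name."""
    output_dir = Path(output_dir)
    zip_path = output_dir / f"{base_name}.zip"
    # Assumed naming: per-line files and the concatenated file share the prefix.
    wav_files = sorted(output_dir.glob(f"{base_name}*.wav"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for wav in wav_files:
            # Store only the file name so the archive has no nested folders.
            archive.write(wav, arcname=wav.name)
    return zip_path
```

For example, `bundle_dialog_outputs("dialog_output", "demo")` would write `dialog_output/demo.zip` under these assumptions.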
## Memory Management

The app automatically:
- Cleans up the TTS model after each generation
- Frees GPU memory (for CUDA/MPS devices)
- Deletes intermediate tensors to minimize memory footprint

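A cleanup routine of this kind might look like the sketch below. It is an assumption about the general pattern, not the app's implementation: `release_memory` is a hypothetical name, and PyTorch is imported optionally so the snippet also runs where it is not installed.

```python
import gc

try:
    import torch  # optional here so the sketch runs without PyTorch installed
except ImportError:
    torch = None


def release_memory():
    """Collect garbage, then clear the accelerator cache if one is available.

    Returns "cuda", "mps", or "none" to indicate which cache was cleared.
    """
    gc.collect()  # reclaim Python objects that are no longer referenced
    if torch is None:
        return "none"
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()
        return "mps"
    return "none"


model = object()  # stand-in for a loaded TTS model
model = None      # drop the reference first so the memory can be reclaimed
cleared = release_memory()
```

Dropping every reference to the model before calling the function matters: the cache can only release memory that no live tensor still occupies.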
## Troubleshooting

- **"Skipping unknown speaker"**: Add the speaker first using the speaker configuration
- **"Sample file not found"**: Verify the audio file exists in `speaker_samples/`
- **Memory issues**: Try enabling "Re-initialize model each line" for long dialogs
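
To check for missing sample files before launching the app, a small script along these lines may help. It assumes a mapping of speaker name to a file name inside `speaker_samples/`; the real layout of `speakers.yaml` is not documented here, so `find_missing_samples` and its input shape are hypothetical.

```python
from pathlib import Path


def find_missing_samples(speakers, samples_dir="speaker_samples"):
    """Return {speaker: path} for each configured sample that is not on disk."""
    base = Path(samples_dir)
    missing = {}
    for name, filename in speakers.items():
        path = base / filename
        if not path.is_file():
            missing[name] = str(path)
    return missing


# Hypothetical speaker mapping; replace with the entries from your own config.
example_speakers = {"Alice": "alice.wav", "Bob": "bob.wav"}
for speaker, path in find_missing_samples(example_speakers).items():
    print(f"Sample file not found for {speaker}: {path}")
```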