# Chatterbox TTS Gradio App
This Gradio application provides a user interface for text-to-speech generation using the Chatterbox TTS model. It supports both single utterance generation and multi-speaker dialog generation with configurable silence gaps.
## Features
- Single Utterance Generation: Generate speech from text using a selected speaker
- Dialog Generation: Create multi-speaker conversations with configurable silence gaps
- Speaker Management: Add/remove speakers with custom audio samples
- Memory Optimization: Automatic model cleanup after generation
- Output Organization: Files saved in `single_output/` and `dialog_output/` directories
## Getting Started

1. Clone the repository:
   `git clone https://github.com/your-username/chatterbox-test.git`
2. Install dependencies:
   `pip install -r requirements.txt`
3. Prepare speaker samples:
   - Create a `speaker_samples/` directory
   - Add audio samples (WAV format) for each speaker
   - Update `speakers.yaml` with speaker names and file paths
4. Run the app:
   `python gradio_app.py`
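Before launching the app, it can help to confirm that every sample listed in `speakers.yaml` actually exists. The schema of that file is not documented here, so this is a minimal sketch assuming it is a flat mapping of speaker name to WAV path (relative to the repository root), i.e. the dict that PyYAML's `yaml.safe_load` would return for such a file. `find_missing_samples` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def find_missing_samples(speakers: dict[str, str]) -> list[str]:
    """Return the names of speakers whose sample is not an existing .wav file.

    Hypothetical helper: assumes `speakers` maps speaker name -> sample path,
    with relative paths resolved against the current working directory.
    """
    missing = []
    for name, sample_path in speakers.items():
        path = Path(sample_path)
        # The setup steps expect WAV samples, so other extensions are flagged too.
        if path.suffix.lower() != ".wav" or not path.is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative mapping only; if speakers.yaml is a flat mapping, it could be
    # loaded with yaml.safe_load(open("speakers.yaml")) instead.
    config = {"Speaker1": "speaker_samples/speaker1.wav"}
    for name in find_missing_samples(config):
        print(f"Sample file not found for speaker {name!r}")
```

Running a check like this up front surfaces the "Sample file not found" case before any model is loaded.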
## Usage

### Single Utterance Tab
- Select a speaker from the dropdown
- Enter text to synthesize
- Adjust generation parameters as needed
- Click "Generate Speech"
### Dialog Generation Tab
- Add speakers using the speaker configuration section
- Enter dialog in the format:

      Speaker1: "Hello, how are you?"
      Speaker2: "I'm doing well!"
      Silence: 0.5
      Speaker1: "What are your plans for today?"

- Set the output base name
- Click "Generate Dialog"
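How `gradio_app.py` parses the dialog text is not shown here. As a minimal sketch, assuming one `Name: "text"` or `Silence: <seconds>` entry per line (the silence unit is an assumption), the script can be split into speech and silence segments like this. `parse_dialog` and its tuple shapes are hypothetical.

```python
def parse_dialog(script: str) -> list[tuple]:
    """Split a dialog script into ("speech", speaker, text) and ("silence", seconds) items.

    Assumes one entry per line: `Name: "text"` or `Silence: <seconds>`.
    Blank lines are ignored; malformed lines raise ValueError.
    """
    items = []
    for lineno, raw in enumerate(script.splitlines(), start=1):
        if not raw.strip():
            continue
        # Split on the first colon only, so the spoken text may contain colons.
        name, sep, body = raw.partition(":")
        name, body = name.strip(), body.strip()
        if not sep or not name or not body:
            raise ValueError(f"line {lineno}: expected 'Name: text', got {raw!r}")
        if name.lower() == "silence":
            seconds = float(body)  # assumed unit: seconds
            if seconds < 0:
                raise ValueError(f"line {lineno}: silence must be non-negative")
            items.append(("silence", seconds))
        else:
            # Drop one pair of surrounding double quotes, if present.
            if len(body) >= 2 and body[0] == body[-1] == '"':
                body = body[1:-1]
            items.append(("speech", name, body))
    return items
```

Each `("speech", ...)` item would then be synthesized with that speaker's sample, and each `("silence", ...)` item would become a gap of that length in the concatenated file.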
## File Organization

- Generated single utterances are saved to `single_output/`
- Dialog generation files are saved to `dialog_output/`
- Concatenated dialog files have the `_concatenated.wav` suffix
- All files are zipped together for download
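The app's own concatenation code is not shown. A minimal standard-library sketch of the idea, assuming every generated line is already a WAV file and all clips share channel count, sample width and sample rate: it joins the clips with zero-valued silence, writes `<base>_concatenated.wav` into the output directory, and zips every file for download. `concatenate_dialog`, its parameters and the `<base>.zip` archive name are hypothetical.

```python
import wave
import zipfile
from pathlib import Path


def concatenate_dialog(clips: list[Path], gaps: list[float], out_dir: Path, base_name: str) -> Path:
    """Join WAV clips with silent gaps, then zip the clips and the result.

    gaps[i] is the silence in seconds inserted after clips[i] (use 0.0 for none).
    All clips must share channel count, sample width and sample rate.
    Returns the path of the zip archive.
    """
    if not clips or len(gaps) != len(clips):
        raise ValueError("need at least one clip and exactly one gap per clip")
    out_dir.mkdir(parents=True, exist_ok=True)

    fmt = None  # (channels, sample width in bytes, frame rate) of the first clip
    frames = bytearray()
    for clip, gap in zip(clips, gaps):
        with wave.open(str(clip), "rb") as src:
            clip_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if fmt is None:
                fmt = clip_fmt
            elif clip_fmt != fmt:
                raise ValueError(f"{clip} does not match the first clip's format")
            frames += src.readframes(src.getnframes())
        channels, width, rate = fmt
        # Zero bytes are silence for signed PCM (16-bit and wider).
        frames += bytes(round(gap * rate) * channels * width)

    combined = out_dir / f"{base_name}_concatenated.wav"
    with wave.open(str(combined), "wb") as dst:
        dst.setnchannels(fmt[0])
        dst.setsampwidth(fmt[1])
        dst.setframerate(fmt[2])
        dst.writeframes(bytes(frames))

    archive = out_dir / f"{base_name}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in [*clips, combined]:
            zf.write(str(path), arcname=Path(path).name)
    return archive
```

Rejecting mismatched formats up front avoids silently producing a file whose header disagrees with part of its audio data.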
## Memory Management
The app automatically:
- Cleans up the TTS model after each generation
- Frees GPU memory (for CUDA/MPS devices)
- Deletes intermediate tensors to minimize memory footprint
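A minimal sketch of this kind of cleanup, assuming PyTorch may or may not be installed. `torch.cuda.empty_cache()` and `torch.mps.empty_cache()` are real PyTorch calls (the MPS one needs PyTorch 2.0 or later); `release_gpu_memory` is a hypothetical helper, not the app's actual function. Emptying the cache only returns memory that no live tensor still references, which is why references are dropped and garbage is collected first.

```python
import gc


def release_gpu_memory() -> list[str]:
    """Collect garbage, then return cached accelerator memory to the driver.

    Call this after deleting your own references to the model
    (e.g. `del model`). Returns the backends whose cache was emptied.
    """
    gc.collect()  # break reference cycles so unreferenced tensors are freed
    cleared = []
    try:
        import torch
    except ImportError:
        return cleared  # CPU-only environment without PyTorch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        cleared.append("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()
        cleared.append("mps")
    return cleared
```

The helper deliberately takes no model argument: deleting a parameter inside a function would not drop the caller's reference, so `del model` has to happen at the call site, followed by `release_gpu_memory()`.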
## Troubleshooting

- "Skipping unknown speaker": Add the speaker first using the speaker configuration
- "Sample file not found": Verify the audio file exists in `speaker_samples/`
- Memory issues: Try enabling "Re-initialize model each line" for long dialogs
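The "Skipping unknown speaker" message suggests that dialog lines naming an unconfigured speaker are dropped rather than aborting the whole run. A minimal sketch of that behaviour, assuming speech lines arrive as `(speaker, text)` pairs (silence entries handled elsewhere) and configured speakers as a name-to-sample mapping. `filter_known_speakers` is hypothetical and uses the standard `logging` module for the warning.

```python
import logging

logger = logging.getLogger(__name__)


def filter_known_speakers(
    lines: list[tuple[str, str]], speakers: dict[str, str]
) -> list[tuple[str, str]]:
    """Keep (speaker, text) pairs whose speaker is configured; warn about the rest."""
    kept = []
    for speaker, text in lines:
        if speaker in speakers:
            kept.append((speaker, text))
        else:
            # Warn and continue instead of failing the whole dialog.
            logger.warning("Skipping unknown speaker: %s", speaker)
    return kept
```

If a line is skipped unexpectedly, compare the name in the dialog text with the configured speaker names; a lookup like this is exact, so differences in case or spacing would also cause a skip.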