Chatterbox TTS Gradio App
This Gradio application provides a user interface for text-to-speech generation using the Chatterbox TTS model. It supports both single utterance generation and multi-speaker dialog generation with configurable silence gaps.
Features
- Single Utterance Generation: Generate speech from text using a selected speaker
- Dialog Generation: Create multi-speaker conversations with configurable silence gaps
- Speaker Management: Add/remove speakers with custom audio samples
- Memory Optimization: Automatic model cleanup after generation
- Output Organization: Files saved in the single_output/ and dialog_output/ directories
Getting Started
- Clone the repository:
  git clone https://github.com/your-username/chatterbox-test.git
- Install dependencies:
  pip install -r requirements.txt
- Prepare speaker samples:
  - Create a speaker_samples/ directory
  - Add audio samples (WAV format) for each speaker
  - Update speakers.yaml with speaker names and file paths
- Run the app:
  python gradio_app.py
Usage
Single Utterance Tab
- Select a speaker from the dropdown
- Enter text to synthesize
- Adjust generation parameters as needed
- Click "Generate Speech"
Dialog Generation Tab
- Add speakers using the speaker configuration section
- Enter dialog in the format:
  Speaker1: "Hello, how are you?"
  Speaker2: "I'm doing well!"
  Silence: 0.5
  Speaker1: "What are your plans for today?"
- Set output base name
- Click "Generate Dialog"
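The dialog format above can be read line by line. The following is a minimal sketch of such a parser, not the app's actual code: the function name `parse_dialog`, the tuple layout, and the choice to silently ignore lines that match neither form are assumptions made for illustration.

```python
import re
from typing import List, Set, Tuple, Union

# One parsed entry: ("speech", speaker, text) or ("silence", "", seconds).
Entry = Tuple[str, str, Union[str, float]]

# `Silence: 0.5` inserts a gap; `Name: "text"` is a spoken line.
SILENCE_RE = re.compile(r'^Silence\s*:\s*([0-9]*\.?[0-9]+)$', re.IGNORECASE)
SPEECH_RE = re.compile(r'^([^:]+?)\s*:\s*"(.*)"$')


def parse_dialog(script: str, known_speakers: Set[str]) -> List[Entry]:
    """Turn a dialog script into an ordered list of speech and silence entries."""
    entries: List[Entry] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        silence = SILENCE_RE.match(line)
        if silence:
            entries.append(("silence", "", float(silence.group(1))))
            continue
        speech = SPEECH_RE.match(line)
        if not speech:
            continue  # assumption: lines in neither form are ignored
        speaker, text = speech.group(1), speech.group(2)
        if speaker not in known_speakers:
            # Mirrors the "Skipping unknown speaker" message from Troubleshooting.
            print(f"Skipping unknown speaker: {speaker}")
            continue
        entries.append(("speech", speaker, text))
    return entries
```

Checking silence lines before speech lines keeps a speaker literally named "Silence" from being treated as dialog, which seems to match the intent of the format.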
File Organization
- Generated single utterances are saved to single_output/
- Dialog generation files are saved to dialog_output/
- Concatenated dialog files have the _concatenated.wav suffix
- All files are zipped together for download
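As a rough illustration of how per-line clips, silence gaps, the `_concatenated.wav` suffix, and the zip archive could fit together, here is a sketch using only the standard library. It is not the app's implementation: the function name `concatenate_and_zip`, the zip file name, and the assumption that every clip is 16-bit PCM with the same sample rate and channel count are all hypothetical.

```python
import wave
import zipfile
from pathlib import Path
from typing import List, Union

# A segment is either a path to a WAV clip or a silence duration in seconds.
Segment = Union[Path, float]


def concatenate_and_zip(segments: List[Segment], base_name: str,
                        out_dir: Path = Path("dialog_output")) -> Path:
    """Join clips and silence gaps into one WAV, then zip all files together."""
    out_dir.mkdir(parents=True, exist_ok=True)
    wav_paths = [s for s in segments if isinstance(s, Path)]
    if not wav_paths:
        raise ValueError("need at least one WAV segment")

    # Take the audio format from the first clip (assumes all clips share it).
    with wave.open(str(wav_paths[0]), "rb") as first:
        nchannels, sampwidth, framerate = first.getparams()[:3]

    concat_path = out_dir / f"{base_name}_concatenated.wav"
    with wave.open(str(concat_path), "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        for seg in segments:
            if isinstance(seg, Path):
                with wave.open(str(seg), "rb") as clip:
                    out.writeframes(clip.readframes(clip.getnframes()))
            else:
                # Zero bytes are silence for signed 16-bit PCM (assumed format).
                n_frames = int(round(seg * framerate))
                out.writeframes(b"\x00" * (n_frames * nchannels * sampwidth))

    # Bundle the individual clips and the concatenated file for download.
    zip_path = out_dir / f"{base_name}.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path in dict.fromkeys(wav_paths + [concat_path]):
            zf.write(path, arcname=path.name)
    return zip_path
```

A call such as `concatenate_and_zip([Path("line1.wav"), 0.5, Path("line2.wav")], "demo")` would place a half-second gap between the two clips.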
Memory Management
The app automatically:
- Cleans up the TTS model after each generation
- Frees GPU memory (for CUDA/MPS devices)
- Deletes intermediate tensors to minimize memory footprint
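The cleanup steps listed above roughly correspond to the following pattern in PyTorch. This is a sketch under assumptions rather than the app's code: the `release_model` helper and the idea of keeping the model in a dictionary are hypothetical, and `torch` is imported optionally so the snippet also runs where PyTorch is not installed.

```python
import gc

try:
    import torch  # optional here so the sketch still runs without PyTorch
except ImportError:
    torch = None


def release_model(holder: dict, key: str = "model") -> None:
    """Drop the model reference kept in `holder` and free cached device memory."""
    holder.pop(key, None)  # remove the reference this code controls
    gc.collect()           # let Python reclaim the model and leftover tensors
    if torch is None:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()   # same idea for Apple Silicon (MPS)
```

Emptying the cache only helps once no Python references to the model or its tensors remain, which is why the reference is dropped and `gc.collect()` runs first.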
Troubleshooting
- "Skipping unknown speaker": Add the speaker first using the speaker configuration
- "Sample file not found": Verify the audio file exists in speaker_samples/
- Memory issues: Try enabling "Re-initialize model each line" for long dialogs
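For the "Sample file not found" case, a quick pre-flight check can list which speakers point at missing files. The sketch below assumes a plain mapping from speaker name to sample file name; the real layout of speakers.yaml is not shown in this README, so both that mapping and the `find_missing_samples` helper are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def find_missing_samples(speakers: Dict[str, str],
                         samples_dir: Path = Path("speaker_samples")) -> List[str]:
    """Return the names of speakers whose sample file does not exist."""
    missing = []
    for name, filename in speakers.items():
        if not (samples_dir / filename).is_file():
            missing.append(name)
    return missing
```

Running this before starting a long dialog generation surfaces every missing sample at once instead of one error per line.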