Chatterbox TTS Application
A comprehensive text-to-speech application with multiple interfaces for generating speech from text using the Chatterbox TTS model. Supports single utterance generation, multi-speaker dialogs, and long-form audiobook generation.
Features
- Multiple Interfaces: Web UI, FastAPI backend, Gradio interface, and CLI tools
- Single Utterance Generation: Generate speech from text using a selected speaker
- Dialog Generation: Create multi-speaker conversations with configurable silence gaps
- Audiobook Generation: Convert long-form text into narrated audiobooks
- Speaker Management: Add/remove speakers with custom audio samples
- Paste Script (JSONL) Import: Paste a dialog script as JSONL directly into the editor via a modal
- Memory Optimization: Automatic model cleanup after generation
- Output Organization: Files saved in organized directories with ZIP packaging
Getting Started
Quick Setup
- Clone the repository and install dependencies:
  git clone https://github.com/your-username/chatterbox-ui.git
  cd chatterbox-ui
  pip install -r requirements.txt
  npm install
- Run automated setup:
  python setup.py
- Prepare speaker samples:
  - Add audio samples (WAV format) to speaker_data/speaker_samples/
  - Configure speakers in speaker_data/speakers.yaml
Windows Quick Start
On Windows, a PowerShell setup script is provided to automate environment setup and startup.
# From the repository root in PowerShell
./setup-windows.ps1
# First time only, if scripts are blocked:
# Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
What it does:
- Creates/uses .venv
- Upgrades pip and installs deps from backend/requirements.txt and root requirements.txt
- Creates a default .env with sensible ports if missing
- Starts both servers via start_servers.py
Running the Application
Full-Stack Web Application:
# Start both backend and frontend servers
python start_servers.py
On Windows, you can also use the one-liner PowerShell script:
./setup-windows.ps1
Individual Components:
# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
# Frontend only
cd frontend && python start_dev_server.py
# Gradio interface
python gradio_app.py
Usage
Web Interface
Access the modern web UI at http://localhost:8001 for interactive dialog creation.
Paste Script (JSONL) in Dialog Editor
Quickly load a dialog by pasting JSONL (one JSON object per line):
- Click Paste Script in the Dialog Editor.
- Paste JSONL content, for example:
{"type":"speech","speaker_id":"dummy_speaker","text":"Hello there!"}
{"type":"silence","duration":0.5}
{"type":"speech","speaker_id":"dummy_speaker","text":"This is the second line."}
- Click Load and confirm replacement if prompted.
Notes:
- Input is validated per line; errors report line numbers.
- The dialog is saved to localStorage, so it persists across refreshes.
- Unknown speaker_ids will still load; add speakers later if needed.
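The per-line validation described in the notes can be sketched in Python roughly as follows. This is an illustration only, not the editor's actual code: the function name parse_dialog_jsonl and the exact rules (required fields, non-negative duration, skipping blank lines) are assumptions based on the example above.

```python
import json


def parse_dialog_jsonl(text):
    """Parse pasted JSONL into (items, errors); errors carry 1-based line numbers.

    Hypothetical helper: the field checks mirror the README example,
    not a confirmed schema.
    """
    items, errors = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        kind = obj.get("type")
        if kind == "speech":
            # Unknown speaker_ids are accepted here, matching the note above.
            if not isinstance(obj.get("speaker_id"), str) or not isinstance(obj.get("text"), str):
                errors.append(f"line {lineno}: speech needs string 'speaker_id' and 'text'")
                continue
        elif kind == "silence":
            duration = obj.get("duration")
            is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
            if not is_number or duration < 0:
                errors.append(f"line {lineno}: silence needs a non-negative 'duration'")
                continue
        else:
            errors.append(f"line {lineno}: unknown type {kind!r}")
            continue
        items.append(obj)
    return items, errors
```

Collecting errors instead of raising on the first bad line lets every problem be reported with its line number in one pass.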
CLI Tools
Single utterance generation:
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"
Dialog generation:
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output
Audiobook generation:
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
Gradio Interface
- Single Utterance Tab: Select speaker, enter text, adjust parameters, generate
- Dialog Generation Tab: Configure speakers and create multi-speaker conversations
- Dialog format:
Speaker1: "Hello, how are you?"
Speaker2: "I'm doing well!"
Silence: 0.5
Speaker1: "What are your plans for today?"
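For readers who want to preprocess scripts in this format, here is a minimal parsing sketch. It is not the parser used by gradio_app.py: the function name parse_dialog, the case-insensitive handling of "Silence", and the output dictionaries (borrowed from the JSONL example earlier in this README) are all assumptions.

```python
import re

# "Name: value" with optional surrounding whitespace; the name stops at the first colon.
LINE_RE = re.compile(r'^\s*([^:]+?)\s*:\s*(.+?)\s*$')


def parse_dialog(text):
    """Turn 'Speaker: "text"' and 'Silence: seconds' lines into a list of dicts."""
    items = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # assumption: blank lines are ignored
        match = LINE_RE.match(raw)
        if match is None:
            raise ValueError(f"line {lineno}: expected 'Name: value'")
        name, value = match.group(1), match.group(2)
        if name.lower() == "silence":
            # float() raises ValueError if the duration is not numeric.
            items.append({"type": "silence", "duration": float(value)})
        else:
            items.append({"type": "speech", "speaker_id": name, "text": value.strip('"')})
    return items
```

Usage: parse_dialog(open("dialog.txt").read()) returns one dictionary per non-blank line, in order (dialog.txt is a placeholder file name).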
Architecture Overview
Application Structure
- Frontend: Modern vanilla JavaScript web UI (frontend/)
- Backend: FastAPI REST API (backend/)
- CLI Tools: Command-line utilities (cbx-*.py)
- Gradio Interface: Alternative web UI (gradio_app.py)
New Files and Features
- cbx-audiobook.py: Generate long-form audiobooks from text files
- import_helper.py: Utility for managing imports and dependencies
- Backend Services: Enhanced dialog processing, speaker management, and TTS services
- Web Frontend: Interactive dialog editor with drag-and-drop functionality
File Organization
- single_output/ - Single utterance generations
- dialog_output/ - Multi-speaker dialog files
- tts_outputs/ - Raw TTS generation files
- speaker_data/ - Speaker configurations and audio samples
- Generated files packaged in ZIP archives for download
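As a rough illustration of the ZIP packaging step, the sketch below bundles the WAV files under an output directory using only the standard library. The helper name zip_outputs and the choice to include only *.wav files are assumptions; the application's own packaging code may differ.

```python
import zipfile
from pathlib import Path


def zip_outputs(output_dir, zip_path):
    """Write every .wav under output_dir into zip_path; return the archived names.

    Hypothetical helper, not part of this repository.
    """
    output_dir = Path(output_dir)
    wavs = sorted(output_dir.rglob("*.wav"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for wav in wavs:
            # Store paths relative to output_dir so the archive has no absolute paths.
            archive.write(wav, arcname=wav.relative_to(output_dir).as_posix())
    return [wav.relative_to(output_dir).as_posix() for wav in wavs]
```

For example, zip_outputs("dialog_output", "dialog_output.zip") would archive the dialog files listed above.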
API Endpoints
- /api/speakers/ - Speaker CRUD operations
- /api/dialog/generate/ - Full dialog generation
- /api/dialog/generate_line/ - Single line generation
- /generated_audio/ - Static audio file serving
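A small client sketch for the speakers endpoint, using only the standard library. The base URL comes from the access points listed in this README; the assumption that a GET on /api/speakers/ returns a JSON body is not confirmed here, so check API_REFERENCE.md or the /docs page for the real request and response shapes. The helper names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

API_BASE = "http://localhost:8000"  # backend address used elsewhere in this README


def endpoint(path, base=API_BASE):
    """Join the base URL and an endpoint path without doubling slashes."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def list_speakers(base=API_BASE, timeout=10):
    """GET /api/speakers/ and decode the body as JSON.

    Assumption: the list operation is a plain GET that returns JSON.
    """
    with urlopen(endpoint("/api/speakers/", base), timeout=timeout) as response:
        return json.load(response)
```

list_speakers() needs the backend to be running; endpoint() can be used on its own to build URLs for the other routes.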
Configuration
Environment Setup
Key configuration files:
- .env - Global settings
- backend/.env - Backend-specific settings
- frontend/.env - Frontend-specific settings
- speaker_data/speakers.yaml - Speaker configuration
Development Commands
# Run tests
python backend/run_api_test.py
npm test
# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs
Memory Management
The application automatically:
- Cleans up the TTS model after each generation
- Manages GPU memory (CUDA/MPS devices)
- Optimizes memory usage for long-form content
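The cleanup pattern behind these points usually looks something like the sketch below. It is a generic PyTorch idiom under stated assumptions, not the application's actual cleanup code: the function name release_accelerator_memory is hypothetical, and torch is imported optionally so the snippet also runs on machines without PyTorch.

```python
import gc

try:
    import torch  # optional: the sketch falls back to CPU-only behavior without it
except ImportError:
    torch = None


def release_accelerator_memory():
    """Collect garbage, then release cached GPU memory; return the backend handled."""
    gc.collect()  # frees Python objects (e.g. a dropped model) before clearing caches
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda"
    mps_backend = getattr(torch.backends, "mps", None)
    mps_module = getattr(torch, "mps", None)
    if mps_backend is not None and mps_backend.is_available() and hasattr(mps_module, "empty_cache"):
        mps_module.empty_cache()
        return "mps"
    return "cpu"
```

The caller has to drop its own references to the model first (for example, model = None); emptying the cache cannot free tensors that are still referenced.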
Troubleshooting
- "Skipping unknown speaker": Configure speaker in speaker_data/speakers.yaml
- "Sample file not found": Verify audio files exist in speaker_data/speaker_samples/
- Memory issues: Use model reinitialization options for long content
- CORS errors: Check frontend/backend port configuration (frontend origin is auto-included when using start_servers.py)
- Import errors: Run python import_helper.py to check dependencies
Windows-specific
- If PowerShell blocks script execution, run once:
  Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
- If Windows Firewall prompts the first time you run servers, allow access on your private network.