Chatterbox TTS Application
A comprehensive text-to-speech application with multiple interfaces for generating speech from text using the Chatterbox TTS model. Supports single utterance generation, multi-speaker dialogs, and long-form audiobook generation.
Features
- Multiple Interfaces: Web UI, FastAPI backend, Gradio interface, and CLI tools
- Single Utterance Generation: Generate speech from text using a selected speaker
- Dialog Generation: Create multi-speaker conversations with configurable silence gaps
- Audiobook Generation: Convert long-form text into narrated audiobooks
- Speaker Management: Add/remove speakers with custom audio samples
- Memory Optimization: Automatic model cleanup after generation
- Output Organization: Files saved in organized directories with ZIP packaging
Getting Started
Quick Setup
- Clone the repository and install dependencies:
  git clone https://github.com/your-username/chatterbox-ui.git
  cd chatterbox-ui
  pip install -r requirements.txt
  npm install
- Run automated setup:
  python setup.py
- Prepare speaker samples:
  - Add audio samples (WAV format) to speaker_data/speaker_samples/
  - Configure speakers in speaker_data/speakers.yaml
Running the Application
Full-Stack Web Application:
# Start both backend and frontend servers
python start_servers.py
Individual Components:
# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
# Frontend only
cd frontend && python start_dev_server.py
# Gradio interface
python gradio_app.py
Usage
Web Interface
Access the modern web UI at http://localhost:8001 for interactive dialog creation with drag-and-drop editing.
CLI Tools
Single utterance generation:
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"
Dialog generation:
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output
Audiobook generation:
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
Gradio Interface
- Single Utterance Tab: Select speaker, enter text, adjust parameters, generate
- Dialog Generation Tab: Configure speakers and create multi-speaker conversations
- Dialog format:
  Speaker1: "Hello, how are you?"
  Speaker2: "I'm doing well!"
  Silence: 0.5
  Speaker1: "What are your plans for today?"
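The dialog format above can be read line by line. The following is a minimal sketch of such a parser, not the application's own implementation: the function name parse_dialog and the tuple layout are assumptions made for illustration, and the only rules it relies on are the ones visible in the example (a "Name: text" line per utterance and a "Silence: seconds" line per gap).

```python
import re
from typing import List, Tuple, Union

# Hypothetical result shape: ("speech", speaker, text) or ("silence", seconds).
Entry = Union[Tuple[str, str, str], Tuple[str, float]]

# "Label: value", split at the first colon on the line.
_LINE = re.compile(r'^\s*([^:]+?)\s*:\s*(.+?)\s*$')


def parse_dialog(script: str) -> List[Entry]:
    """Turn a dialog script into an ordered list of speech and silence entries."""
    entries: List[Entry] = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # ignore blank lines between entries
        match = _LINE.match(raw)
        if match is None:
            raise ValueError(f"Unrecognized dialog line: {raw!r}")
        label, value = match.groups()
        if label.lower() == "silence":
            # Silence entries carry a gap length in seconds.
            entries.append(("silence", float(value)))
        else:
            # Everything else is treated as a speaker name plus quoted text.
            entries.append(("speech", label, value.strip('"')))
    return entries
```

Calling parse_dialog on the four example lines yields three speech entries and one silence entry, in script order, which a generator could then walk to synthesize each line and insert the gaps.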
Architecture Overview
Application Structure
- Frontend: Modern vanilla JavaScript web UI (frontend/)
- Backend: FastAPI REST API (backend/)
- CLI Tools: Command-line utilities (cbx-*.py)
- Gradio Interface: Alternative web UI (gradio_app.py)
New Files and Features
- cbx-audiobook.py: Generate long-form audiobooks from text files
- import_helper.py: Utility for managing imports and dependencies
- Backend Services: Enhanced dialog processing, speaker management, and TTS services
- Web Frontend: Interactive dialog editor with drag-and-drop functionality
File Organization
- single_output/ - Single utterance generations
- dialog_output/ - Multi-speaker dialog files
- tts_outputs/ - Raw TTS generation files
- speaker_data/ - Speaker configurations and audio samples
- Generated files packaged in ZIP archives for download
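The ZIP packaging step can be illustrated with the standard library. This is a sketch under assumptions rather than the project's actual packaging code: the helper name package_outputs is hypothetical, and it assumes the generated audio files are WAV files somewhere under a single output directory.

```python
import zipfile
from pathlib import Path
from typing import List


def package_outputs(output_dir: str, archive_path: str) -> List[str]:
    """Zip every .wav file under output_dir and return the archived names."""
    root = Path(output_dir)
    added: List[str] = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sort for a stable, reproducible archive listing.
        for path in sorted(root.rglob("*.wav")):
            # Store paths relative to the output directory, not absolute paths.
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            added.append(arcname)
    return added
```

For example, package_outputs("dialog_output", "dialog_output.zip") would collect the WAV files of one dialog run into a single download, leaving non-audio files out of the archive.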
API Endpoints
- /api/speakers/ - Speaker CRUD operations
- /api/dialog/generate/ - Full dialog generation
- /api/dialog/generate_line/ - Single line generation
- /generated_audio/ - Static audio file serving
Configuration
Environment Setup
Key configuration files:
- .env - Global settings
- backend/.env - Backend-specific settings
- frontend/.env - Frontend-specific settings
- speaker_data/speakers.yaml - Speaker configuration
Development Commands
# Run tests
python backend/run_api_test.py
npm test
# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs
Memory Management
The application automatically:
- Cleans up the TTS model after each generation
- Manages GPU memory (CUDA/MPS devices)
- Optimizes memory usage for long-form content
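The cleanup behavior listed above typically comes down to dropping references to the model and asking the accelerator to release its cached memory. The following is a sketch of that general pattern and not the application's own code: the function name is hypothetical, and PyTorch is treated as optional so the snippet also runs on machines without it.

```python
import gc

try:
    import torch  # optional here; the sketch falls back to CPU-only cleanup
except ImportError:
    torch = None


def release_accelerator_memory() -> str:
    """Run garbage collection, empty the device cache, and name the device handled."""
    gc.collect()  # reclaim Python objects that no longer have references
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()  # same idea on Apple Silicon
        return "mps"
    return "cpu"
```

A caller would first drop its own reference to the model (for example by setting the variable to None) and then call release_accelerator_memory(). Emptying the cache frees only memory that no live tensor still uses.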
Troubleshooting
- "Skipping unknown speaker": Configure speaker in speaker_data/speakers.yaml
- "Sample file not found": Verify audio files exist in speaker_data/speaker_samples/
- Memory issues: Use model reinitialization options for long content
- CORS errors: Check frontend/backend port configuration
- Import errors: Run python import_helper.py to check dependencies