# Chatterbox TTS Application
A text-to-speech application that generates speech from text with the Chatterbox TTS model and exposes it through several interfaces. It supports single-utterance generation, multi-speaker dialogs, and long-form audiobook generation.
## Features
- **Multiple Interfaces**: Web UI, FastAPI backend, Gradio interface, and CLI tools
- **Single Utterance Generation**: Generate speech from text using a selected speaker
- **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
- **Audiobook Generation**: Convert long-form text into narrated audiobooks
- **Speaker Management**: Add/remove speakers with custom audio samples
- **Memory Optimization**: Automatic model cleanup after generation
- **Output Organization**: Files saved in organized directories with ZIP packaging
## Getting Started
### Quick Setup
1. Clone the repository and install dependencies:
```bash
git clone https://github.com/your-username/chatterbox-ui.git
cd chatterbox-ui
pip install -r requirements.txt
npm install
```
2. Run automated setup:
```bash
python setup.py
```
3. Prepare speaker samples:
- Add audio samples (WAV format) to `speaker_data/speaker_samples/`
- Configure speakers in `speaker_data/speakers.yaml`
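Before starting the servers, it can help to confirm that every sample in `speaker_data/speaker_samples/` is a readable WAV file. The following is a minimal sketch using only Python's standard `wave` module; `check_samples` is a hypothetical helper and not part of this repository. The `wave` module reads PCM WAV data, so a file it rejects may still be valid audio in another encoding.

```python
import wave
from pathlib import Path


def check_samples(sample_dir):
    """Return (readable, unreadable) for the *.wav files in sample_dir.

    readable holds (file name, duration in seconds) pairs;
    unreadable holds the names of files the wave module could not open.
    """
    readable, unreadable = [], []
    for path in sorted(Path(sample_dir).glob("*.wav")):
        try:
            # wave.open raises wave.Error on data it cannot parse as WAV
            # and EOFError on empty or truncated files.
            with wave.open(str(path), "rb") as wav:
                duration = wav.getnframes() / wav.getframerate()
            readable.append((path.name, round(duration, 2)))
        except (wave.Error, EOFError):
            unreadable.append(path.name)
    return readable, unreadable
```

Calling `check_samples("speaker_data/speaker_samples")` from the repository root returns both lists, so any file in the second list can be re-exported before it is referenced in `speaker_data/speakers.yaml`.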
### Running the Application
**Full-Stack Web Application:**
```bash
# Start both backend and frontend servers
python start_servers.py
```
**Individual Components:**
```bash
# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
# Frontend only
cd frontend && python start_dev_server.py
# Gradio interface
python gradio_app.py
```
## Usage
### Web Interface
Open the web UI at `http://localhost:8001` to build dialogs interactively with drag-and-drop editing.
### CLI Tools
**Single utterance generation:**
```bash
python cbx-generate.py --sample speaker_data/speaker_samples/voice.wav --output output.wav --text "Hello world"
```
**Dialog generation:**
```bash
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output
```
**Audiobook generation:**
```bash
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
```
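Long-form input is usually split into smaller pieces before synthesis. The sketch below shows one common approach: grouping sentences into chunks under a character limit. It is an illustration only; `chunk_text` and the `max_chars` default are assumptions, and `cbx-audiobook.py` may split text differently.

```python
import re


def chunk_text(text, max_chars=300):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace. This simple heuristic
    # will also split after abbreviations such as "Dr.".
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio concatenated in order.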
### Gradio Interface
- **Single Utterance Tab**: Select speaker, enter text, adjust parameters, generate
- **Dialog Generation Tab**: Configure speakers and create multi-speaker conversations
- Dialog format:
```
Speaker1: "Hello, how are you?"
Speaker2: "I'm doing well!"
Silence: 0.5
Speaker1: "What are your plans for today?"
```
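To make the dialog format concrete, here is a minimal parser sketch that turns a script like the one above into a list of speech and silence items. The names `Speech`, `Silence`, and `parse_dialog` are hypothetical, and the rules used here (optional double quotes, a case-insensitive `Silence` keyword) are assumptions; the application's own parser may behave differently.

```python
import re
from dataclasses import dataclass

# Matches `Name: body`; everything before the first colon is the name.
LINE_RE = re.compile(r"^(?P<name>[^:]+):\s*(?P<body>.*)$")


@dataclass
class Speech:
    speaker: str
    text: str


@dataclass
class Silence:
    seconds: float


def parse_dialog(script):
    """Parse a dialog script into a list of Speech and Silence items."""
    items = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"Cannot parse line: {line!r}")
        name = match.group("name").strip()
        body = match.group("body").strip()
        if name.lower() == "silence":
            items.append(Silence(float(body)))
        else:
            # Drop one pair of surrounding double quotes, if present.
            if len(body) >= 2 and body[0] == body[-1] == '"':
                body = body[1:-1]
            items.append(Speech(name, body))
    return items
```

Keeping silence as its own item type, rather than a field on each line, mirrors the script format, where a gap is written as a separate `Silence:` line between utterances.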
## Architecture Overview
### Application Structure
- **Frontend**: Modern vanilla JavaScript web UI (`frontend/`)
- **Backend**: FastAPI REST API (`backend/`)
- **CLI Tools**: Command-line utilities (`cbx-*.py`)
- **Gradio Interface**: Alternative web UI (`gradio_app.py`)
### New Files and Features
- **`cbx-audiobook.py`**: Generate long-form audiobooks from text files
- **`import_helper.py`**: Utility for managing imports and dependencies
- **Backend Services**: Enhanced dialog processing, speaker management, and TTS services
- **Web Frontend**: Interactive dialog editor with drag-and-drop functionality
### File Organization
- `single_output/` - Single utterance generations
- `dialog_output/` - Multi-speaker dialog files
- `tts_outputs/` - Raw TTS generation files
- `speaker_data/` - Speaker configurations and audio samples
- Generated files are packaged into ZIP archives for download
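The ZIP packaging step can be reproduced by hand with Python's standard `zipfile` module, for example to bundle an output directory outside the UI. This is a sketch under assumptions: `zip_outputs` is a hypothetical helper, and the `*.wav` pattern and archive layout are illustrative rather than what the application itself produces.

```python
import zipfile
from pathlib import Path


def zip_outputs(output_dir, archive_path, pattern="*.wav"):
    """Write every file under output_dir matching pattern into one ZIP.

    Returns the archive member names in the order they were written.
    """
    output_dir = Path(output_dir)
    files = sorted(p for p in output_dir.rglob(pattern) if p.is_file())
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store paths relative to output_dir so the archive carries
            # no machine-specific directory prefixes.
            archive.write(path, arcname=path.relative_to(output_dir).as_posix())
    return [p.relative_to(output_dir).as_posix() for p in files]
```

For example, `zip_outputs("dialog_output", "dialog_output.zip")` would collect every WAV file under `dialog_output/` into a single archive.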
### API Endpoints
- `/api/speakers/` - Speaker CRUD operations
- `/api/dialog/generate/` - Full dialog generation
- `/api/dialog/generate_line/` - Single line generation
- `/generated_audio/` - Static audio file serving
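As a quick way to exercise the API from a script, the sketch below builds endpoint URLs and requests the speaker list using only the standard library. It assumes the backend is running on `http://localhost:8000` and that `/api/speakers/` answers a `GET` request with JSON; the response schema is not documented here, so check `http://localhost:8000/docs` for the actual contract. `endpoint` and `list_speakers` are hypothetical helper names.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

API_BASE = "http://localhost:8000"  # backend address used in this README


def endpoint(path, base=API_BASE):
    """Join the API base URL and an endpoint path into one URL."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def list_speakers(base=API_BASE, timeout=10):
    """GET /api/speakers/ and decode the JSON response body.

    Assumption: the endpoint responds to GET with a JSON document.
    """
    with urlopen(endpoint("/api/speakers/", base), timeout=timeout) as response:
        return json.load(response)
```

With the backend running, `print(list_speakers())` shows whatever the speakers endpoint returns; a connection error at this point usually means the backend is not listening on port 8000.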
## Configuration
### Environment Setup
Key configuration files:
- `.env` - Global settings
- `backend/.env` - Backend-specific settings
- `frontend/.env` - Frontend-specific settings
- `speaker_data/speakers.yaml` - Speaker configuration
### Development Commands
```bash
# Run tests
python backend/run_api_test.py
npm test
# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```
## Memory Management
The application automatically:
- Cleans up the TTS model after each generation
- Manages GPU memory (CUDA/MPS devices)
- Optimizes memory usage for long-form content
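For readers who want to see what this kind of cleanup typically involves, here is a minimal sketch of releasing cached accelerator memory in PyTorch. It is not the application's actual cleanup code; `release_accelerator_memory` is a hypothetical helper, and the `torch` import is optional only so the sketch also runs on machines without PyTorch.

```python
import gc

try:
    import torch  # optional here so the sketch also runs without PyTorch
except ImportError:
    torch = None


def release_accelerator_memory():
    """Collect garbage, then clear PyTorch's cached device memory.

    Returns the name of the device type whose cache was cleared,
    or "cpu" when no accelerator cache was touched.
    """
    gc.collect()  # free unreferenced Python objects, including old models
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()
        return "mps"
    return "cpu"
```

The caller has to drop its own references to the model first (for example with `del model`); emptying the cache only returns memory that no live tensor is still using.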
## Troubleshooting
- **"Skipping unknown speaker"**: Configure speaker in `speaker_data/speakers.yaml`
- **"Sample file not found"**: Verify audio files exist in `speaker_data/speaker_samples/`
- **Memory issues**: Use model reinitialization options for long content
- **CORS errors**: Check that the frontend (`http://localhost:8001`) and backend (`http://localhost:8000`) addresses match your port configuration
- **Import errors**: Run `python import_helper.py` to check dependencies