# Chatterbox TTS Application

A text-to-speech application that generates speech with the Chatterbox TTS model through several interfaces. It supports single-utterance generation, multi-speaker dialogs, and long-form audiobook generation.

## Features

- **Multiple Interfaces**: Web UI, FastAPI backend, Gradio interface, and CLI tools
- **Single Utterance Generation**: Generate speech from text using a selected speaker
- **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
- **Audiobook Generation**: Convert long-form text into narrated audiobooks
- **Speaker Management**: Add/remove speakers with custom audio samples
- **Memory Optimization**: Automatic model cleanup after generation
- **Output Organization**: Files saved in organized directories with ZIP packaging

## Getting Started

### Quick Setup

1. Clone the repository and install dependencies:

```bash
git clone https://github.com/your-username/chatterbox-ui.git
cd chatterbox-ui
pip install -r requirements.txt
npm install
```

2. Run automated setup:

```bash
python setup.py
```

3. Prepare speaker samples:
   - Add audio samples (WAV format) to `speaker_data/speaker_samples/`
   - Configure speakers in `speaker_data/speakers.yaml`

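
Before configuring speakers, it can help to confirm that each sample is a readable WAV file. The following is a minimal sketch, not part of the repository: `check_speaker_samples` is a hypothetical helper, and it relies on Python's standard `wave` module, which reads PCM WAV files only (other encodings are reported as unreadable).

```python
import wave
from pathlib import Path


def check_speaker_samples(sample_dir):
    """Return (file name, readable, detail) for every .wav file in sample_dir.

    Hypothetical helper for illustration; the application itself may
    validate samples differently.
    """
    results = []
    for path in sorted(Path(sample_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
                seconds = wav.getnframes() / rate
            results.append((path.name, True, f"{seconds:.1f}s at {rate} Hz"))
        except (wave.Error, EOFError) as exc:
            # The stdlib reader rejects truncated files and non-PCM encodings.
            results.append((path.name, False, str(exc)))
    return results


if __name__ == "__main__":
    # Directory name taken from the setup steps above.
    for name, readable, detail in check_speaker_samples("speaker_data/speaker_samples"):
        print(f"{'OK ' if readable else 'BAD'} {name}: {detail}")
```
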
### Running the Application

**Full-Stack Web Application:**

```bash
# Start both backend and frontend servers
python start_servers.py
```

**Individual Components:**

```bash
# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend only
cd frontend && python start_dev_server.py

# Gradio interface
python gradio_app.py
```

## Usage

### Web Interface

Access the modern web UI at `http://localhost:8001` for interactive dialog creation with drag-and-drop editing.

### CLI Tools

**Single utterance generation:**

```bash
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"
```

**Dialog generation:**

```bash
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output
```

**Audiobook generation:**

```bash
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
```

### Gradio Interface

- **Single Utterance Tab**: Select a speaker, enter text, adjust parameters, and generate
- **Dialog Generation Tab**: Configure speakers and create multi-speaker conversations
  - Dialog format:

```
Speaker1: "Hello, how are you?"
Speaker2: "I'm doing well!"
Silence: 0.5
Speaker1: "What are your plans for today?"
```

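
To make the dialog format concrete, here is a minimal parsing sketch. It is an illustration rather than the application's own parser: the `Speech` and `Silence` classes and `parse_dialog` are hypothetical names, and it assumes that `Silence` values are durations in seconds and that the `Silence` label is case-sensitive.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Speech:
    speaker: str
    text: str


@dataclass
class Silence:
    duration: float  # assumed to be seconds


def parse_dialog(script: str) -> List[Union[Speech, Silence]]:
    """Turn 'Label: value' lines into Speech and Silence items, in order."""
    items: List[Union[Speech, Silence]] = []
    for line_no, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no meaning in this sketch
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'Label: value'")
        label, value = label.strip(), value.strip()
        if label == "Silence":
            items.append(Silence(float(value)))
        else:
            # Drop one pair of surrounding double quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            items.append(Speech(label, value))
    return items


if __name__ == "__main__":
    example = 'Speaker1: "Hello, how are you?"\nSilence: 0.5\nSpeaker2: "I\'m doing well!"'
    for item in parse_dialog(example):
        print(item)
```

Keeping silence as its own item type, instead of attaching it to a neighboring line, preserves the order of the script, so a generator can walk the list and either synthesize speech or insert a gap.
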
## Architecture Overview

### Application Structure

- **Frontend**: Modern vanilla JavaScript web UI (`frontend/`)
- **Backend**: FastAPI REST API (`backend/`)
- **CLI Tools**: Command-line utilities (`cbx-*.py`)
- **Gradio Interface**: Alternative web UI (`gradio_app.py`)

### New Files and Features

- **`cbx-audiobook.py`**: Generate long-form audiobooks from text files
- **`import_helper.py`**: Utility for managing imports and dependencies
- **Backend Services**: Enhanced dialog processing, speaker management, and TTS services
- **Web Frontend**: Interactive dialog editor with drag-and-drop functionality

### File Organization

- `single_output/` - Single utterance generations
- `dialog_output/` - Multi-speaker dialog files
- `tts_outputs/` - Raw TTS generation files
- `speaker_data/` - Speaker configurations and audio samples
- Generated files packaged in ZIP archives for download

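
The ZIP packaging step can be pictured with the standard `zipfile` module. This is a sketch under assumptions, not the application's actual packaging code: `zip_directory` is a hypothetical helper, and the archive layout (paths stored relative to the source directory) is a choice made here for illustration.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir, zip_path):
    """Pack every file under source_dir into zip_path and return the archive path."""
    source = Path(source_dir)
    target = Path(zip_path)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in source_dir.
            if path.is_file() and path.resolve() != target.resolve():
                archive.write(path, arcname=path.relative_to(source).as_posix())
    return target


if __name__ == "__main__":
    # Directory name taken from the list above; the archive name is arbitrary.
    print(zip_directory("dialog_output", "dialog_output.zip"))
```
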
### API Endpoints

- `/api/speakers/` - Speaker CRUD operations
- `/api/dialog/generate/` - Full dialog generation
- `/api/dialog/generate_line/` - Single line generation
- `/generated_audio/` - Static audio file serving

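
As a rough illustration of calling the backend from Python, the sketch below builds endpoint URLs and fetches the speaker list using only the standard library. Several things here are assumptions: that a `GET` on `/api/speakers/` returns the speakers as JSON, and that the backend runs at `http://localhost:8000`. The helper names `endpoint_url` and `list_speakers` are hypothetical. Request bodies for the dialog endpoints are not shown because their schema is not documented here; the interactive docs at `/docs` are the place to check it.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

API_BASE = "http://localhost:8000"  # default backend address used in this README


def endpoint_url(path, base=API_BASE):
    """Join the base URL and an endpoint path without doubling slashes."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def list_speakers(base=API_BASE, timeout=10):
    """Fetch /api/speakers/ and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint_url("/api/speakers/", base), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the backend to be running.
    print(list_speakers())
```
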
## Configuration

### Environment Setup

Key configuration files:

- `.env` - Global settings
- `backend/.env` - Backend-specific settings
- `frontend/.env` - Frontend-specific settings
- `speaker_data/speakers.yaml` - Speaker configuration

### Development Commands

```bash
# Run tests
python backend/run_api_test.py
npm test

# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```

## Memory Management

The application automatically:

- Cleans up the TTS model after each generation
- Manages GPU memory (CUDA/MPS devices)
- Optimizes memory usage for long-form content

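
For readers curious what such cleanup typically looks like, here is a minimal sketch, assuming the model runs on PyTorch. It is not the application's actual cleanup code: `release_accelerator_memory` is a hypothetical helper, and the `torch` import is optional only so that the sketch also runs where PyTorch is not installed.

```python
import gc

try:
    import torch  # optional here so the sketch also runs without PyTorch
except ImportError:
    torch = None


def release_accelerator_memory():
    """Collect garbage and release cached GPU memory.

    Returns the list of backends whose caches were cleared.
    """
    gc.collect()  # free unreferenced Python objects first, tensors included
    cleared = []
    if torch is None:
        return cleared
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        cleared.append("cuda")
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()
        cleared.append("mps")
    return cleared


if __name__ == "__main__":
    # Typical use: drop the last reference to the model, then release caches, e.g.
    #   del model
    print(release_accelerator_memory())
```

Emptying the cache only returns memory that is no longer referenced, so the model object must be deleted (or go out of scope) before the call has any effect.
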
## Troubleshooting

- **"Skipping unknown speaker"**: Configure the speaker in `speaker_data/speakers.yaml`
- **"Sample file not found"**: Verify that the audio files exist in `speaker_data/speaker_samples/`
- **Memory issues**: Use the model reinitialization options for long content
- **CORS errors**: Check the frontend/backend port configuration (the frontend origin is auto-included when using `start_servers.py`)
- **Import errors**: Run `python import_helper.py` to check dependencies