# Chatterbox TTS Application

A comprehensive text-to-speech application with multiple interfaces for generating speech from text using the Chatterbox TTS model. Supports single utterance generation, multi-speaker dialogs, and long-form audiobook generation.

## Features

- **Multiple Interfaces**: Web UI, FastAPI backend, Gradio interface, and CLI tools
- **Single Utterance Generation**: Generate speech from text using a selected speaker
- **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
- **Audiobook Generation**: Convert long-form text into narrated audiobooks
- **Speaker Management**: Add/remove speakers with custom audio samples
- **Memory Optimization**: Automatic model cleanup after generation
- **Output Organization**: Files saved in organized directories with ZIP packaging

## Getting Started

### Quick Setup

1. Clone the repository and install dependencies:
   ```bash
   git clone https://github.com/your-username/chatterbox-ui.git
   cd chatterbox-ui
   pip install -r requirements.txt
   npm install
   ```

2. Run automated setup:
   ```bash
   python setup.py
   ```

3. Prepare speaker samples:
   - Add audio samples (WAV format) to `speaker_data/speaker_samples/`
   - Configure speakers in `speaker_data/speakers.yaml`

### Running the Application

**Full-Stack Web Application:**
```bash
# Start both backend and frontend servers
python start_servers.py
```

**Individual Components:**
```bash
# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend only
cd frontend && python start_dev_server.py

# Gradio interface
python gradio_app.py
```

## Usage

### Web Interface

Access the modern web UI at `http://localhost:8001` for interactive dialog creation with drag-and-drop editing.

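If the page does not load, it can help to confirm that both servers are actually listening before debugging further. The following is a minimal sketch, not part of the project: it assumes the default ports mentioned in this README (8001 for the web UI, 8000 for the API) and uses only the Python standard library. The helper name `is_reachable` and the `ENDPOINTS` mapping are illustrative.

```python
import urllib.error
import urllib.request

# Ports taken from this README; adjust them if your .env files override the defaults.
ENDPOINTS = {
    "Web UI": "http://localhost:8001",
    "API docs": "http://localhost:8000/docs",
}


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        status = "up" if is_reachable(url) else "not reachable"
        print(f"{name:10s} {url} -> {status}")
```

A "not reachable" result for one of the two entries usually points at the corresponding server not having started, which is worth ruling out before looking into CORS or configuration problems.
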
### CLI Tools

**Single utterance generation:**
```bash
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"
```

**Dialog generation:**
```bash
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output
```

**Audiobook generation:**
```bash
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
```

### Gradio Interface

- **Single Utterance Tab**: Select speaker, enter text, adjust parameters, generate
- **Dialog Generation Tab**: Configure speakers and create multi-speaker conversations
  - Dialog format:
    ```
    Speaker1: "Hello, how are you?"
    Speaker2: "I'm doing well!"
    Silence: 0.5
    Speaker1: "What are your plans for today?"
    ```

## Architecture Overview

### Application Structure

- **Frontend**: Modern vanilla JavaScript web UI (`frontend/`)
- **Backend**: FastAPI REST API (`backend/`)
- **CLI Tools**: Command-line utilities (`cbx-*.py`)
- **Gradio Interface**: Alternative web UI (`gradio_app.py`)

### New Files and Features

- **`cbx-audiobook.py`**: Generate long-form audiobooks from text files
- **`import_helper.py`**: Utility for managing imports and dependencies
- **Backend Services**: Enhanced dialog processing, speaker management, and TTS services
- **Web Frontend**: Interactive dialog editor with drag-and-drop functionality

### File Organization

- `single_output/` - Single utterance generations
- `dialog_output/` - Multi-speaker dialog files
- `tts_outputs/` - Raw TTS generation files
- `speaker_data/` - Speaker configurations and audio samples
- Generated files packaged in ZIP archives for download

### API Endpoints

- `/api/speakers/` - Speaker CRUD operations
- `/api/dialog/generate/` - Full dialog generation
- `/api/dialog/generate_line/` - Single line generation
- `/generated_audio/` - Static audio file serving

## Configuration

### Environment Setup

Key configuration files:
- `.env` - Global settings
- `backend/.env` - Backend-specific settings
- `frontend/.env` - Frontend-specific settings
- `speaker_data/speakers.yaml` - Speaker configuration

### Development Commands

```bash
# Run tests
python backend/run_api_test.py
npm test

# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```

## Memory Management

The application automatically:
- Cleans up the TTS model after each generation
- Manages GPU memory (CUDA/MPS devices)
- Optimizes memory usage for long-form content

## Troubleshooting

- **"Skipping unknown speaker"**: Configure speaker in `speaker_data/speakers.yaml`
- **"Sample file not found"**: Verify audio files exist in `speaker_data/speaker_samples/`
- **Memory issues**: Use model reinitialization options for long content
- **CORS errors**: Check frontend/backend port configuration
- **Import errors**: Run `python import_helper.py` to check dependencies
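
For the "Sample file not found" case, a quick script can tell you which sample files are missing from `speaker_data/speaker_samples/`. This is a minimal sketch rather than project code: the function name `find_missing_samples` is hypothetical, and because the layout of `speakers.yaml` is not described here, the script does not parse it and instead expects you to list the filenames your speakers refer to.

```python
from pathlib import Path


def find_missing_samples(sample_dir, expected_files):
    """Return the expected sample filenames that are not present in sample_dir."""
    base = Path(sample_dir)
    return [name for name in expected_files if not (base / name).is_file()]


if __name__ == "__main__":
    # Hypothetical filenames; replace with the samples your speakers.yaml refers to.
    expected = ["voice.wav"]
    missing = find_missing_samples("speaker_data/speaker_samples", expected)
    if missing:
        print("Missing sample files:", ", ".join(missing))
    else:
        print("All listed sample files were found.")
```

Run it from the repository root so the relative path resolves to the same directory the application reads from.
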