Chatterbox TTS Application

A text-to-speech application built on the Chatterbox TTS model, with web, API, Gradio, and command-line interfaces. It supports single-utterance generation, multi-speaker dialogs, and long-form audiobook generation.

Features

  • Multiple Interfaces: Web UI, FastAPI backend, Gradio interface, and CLI tools
  • Single Utterance Generation: Generate speech from text using a selected speaker
  • Dialog Generation: Create multi-speaker conversations with configurable silence gaps
  • Audiobook Generation: Convert long-form text into narrated audiobooks
  • Speaker Management: Add/remove speakers with custom audio samples
  • Paste Script (JSONL) Import: Paste a dialog script as JSONL directly into the editor via a modal
  • Memory Optimization: Automatic model cleanup after generation
  • Output Organization: Files saved in organized directories with ZIP packaging

Getting Started

Quick Setup

  1. Clone the repository and install dependencies:

    git clone https://github.com/your-username/chatterbox-ui.git
    cd chatterbox-ui
    pip install -r requirements.txt
    npm install
    
  2. Run automated setup:

    python setup.py
    
  3. Prepare speaker samples:

    • Add audio samples (WAV format) to speaker_data/speaker_samples/
    • Configure speakers in speaker_data/speakers.yaml
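
Before configuring speakers, it can help to confirm that each sample is a readable WAV file. The following is a minimal sketch using only the Python standard library; `describe_samples` is a hypothetical helper, not part of this repository, and the stdlib `wave` reader only understands uncompressed PCM WAV, so a file it rejects may still be usable by the TTS model.

```python
import wave
from pathlib import Path


def describe_samples(sample_dir):
    """Map each *.wav file name to (channels, sample_rate_hz, duration_s).

    Files the stdlib reader cannot parse map to an "unreadable: ..." string.
    Hypothetical helper for illustration; not part of the repository.
    """
    results = {}
    for path in sorted(Path(sample_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
                frames = wav.getnframes()
                duration = frames / rate if rate else 0.0
                results[path.name] = (wav.getnchannels(), rate, duration)
        except (wave.Error, EOFError) as exc:
            # Compressed or malformed files end up here.
            results[path.name] = f"unreadable: {exc}"
    return results
```

Calling `describe_samples("speaker_data/speaker_samples")` from the repository root returns one entry per sample, which makes missing or malformed files easy to spot.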

Windows Quick Start

On Windows, a PowerShell setup script is provided to automate environment setup and startup.

# From the repository root in PowerShell
./setup-windows.ps1

# First time only, if scripts are blocked:
# Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

What it does:

  • Creates/uses .venv
  • Upgrades pip and installs deps from backend/requirements.txt and root requirements.txt
  • Creates a default .env with sensible ports if missing
  • Starts both servers via start_servers.py

Running the Application

Full-Stack Web Application:

# Start both backend and frontend servers
python start_servers.py

On Windows, you can also use the one-liner PowerShell script:

./setup-windows.ps1

Individual Components:

# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend only 
cd frontend && python start_dev_server.py

# Gradio interface
python gradio_app.py

Usage

Web Interface

Open the web UI at http://localhost:8001 to create dialogs interactively.

Paste Script (JSONL) in Dialog Editor

Quickly load a dialog by pasting JSONL (one JSON object per line):

  1. Click Paste Script in the Dialog Editor.
  2. Paste JSONL content, for example:
{"type":"speech","speaker_id":"dummy_speaker","text":"Hello there!"}
{"type":"silence","duration":0.5}
{"type":"speech","speaker_id":"dummy_speaker","text":"This is the second line."}
  3. Click Load and confirm replacement if prompted.

Notes:

  • Input is validated per line; errors report line numbers.
  • The dialog is saved to localStorage, so it persists across refreshes.
  • Unknown speaker_ids will still load; add speakers later if needed.
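
The editor validates pasted scripts in the browser. To check a script offline before pasting, per-line validation of this kind can be sketched in Python. The field names come from the example above; `validate_dialog_jsonl` and its exact rules are assumptions for illustration and may differ from the editor's own checks.

```python
import json


def validate_dialog_jsonl(text):
    """Parse a JSONL dialog script; return (items, errors).

    Errors carry 1-based line numbers. Hypothetical helper; the rules
    are inferred from the README example, not from the editor's code.
    """
    items, errors = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            errors.append(f"line {lineno}: expected a JSON object")
        elif obj.get("type") == "speech":
            if isinstance(obj.get("speaker_id"), str) and isinstance(obj.get("text"), str):
                items.append(obj)
            else:
                errors.append(f"line {lineno}: speech needs string speaker_id and text")
        elif obj.get("type") == "silence":
            duration = obj.get("duration")
            is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
            if is_number and duration >= 0:
                items.append(obj)
            else:
                errors.append(f"line {lineno}: silence needs a non-negative duration")
        else:
            errors.append(f"line {lineno}: unknown type {obj.get('type')!r}")
    return items, errors
```

Like the editor, this sketch does not check that a speaker_id is known; it only checks the shape of each line.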

CLI Tools

Single utterance generation:

python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"

Dialog generation:

python cbx-dialog-generate.py --dialog dialog.md --output dialog_output

Audiobook generation:

python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
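
To generate several utterances with the same voice, the single-utterance command can be driven from a short script. This is a sketch that only uses the `--sample`, `--output`, and `--text` flags shown above; the helper names and the `line_001.wav` naming scheme are assumptions.

```python
import subprocess
import sys


def build_generate_command(sample, output, text):
    """Build the argv list for one cbx-generate.py call (flags as shown in this README)."""
    return [
        sys.executable, "cbx-generate.py",
        "--sample", sample,
        "--output", output,
        "--text", text,
    ]


def generate_all(sample, lines, run=subprocess.run):
    """Run one generation per text line; returns the output file names.

    `run` is injectable so the loop can be exercised without the model.
    Output names (line_001.wav, ...) are a hypothetical convention.
    """
    outputs = []
    for index, text in enumerate(lines, start=1):
        output = f"line_{index:03d}.wav"
        run(build_generate_command(sample, output, text), check=True)
        outputs.append(output)
    return outputs
```

For example, `generate_all("speaker_samples/voice.wav", ["Hello world", "Goodbye"])` would invoke the CLI twice from the repository root and stop at the first failing call.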

Gradio Interface

  • Single Utterance Tab: Select speaker, enter text, adjust parameters, generate
  • Dialog Generation Tab: Configure speakers and create multi-speaker conversations
  • Dialog format:
    Speaker1: "Hello, how are you?"
    Speaker2: "I'm doing well!"
    Silence: 0.5
    Speaker1: "What are your plans for today?"
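
The dialog format above maps naturally onto the JSONL items used by the web editor. The following sketch converts one into the other. It assumes the speaker name before the colon can be used directly as `speaker_id`, which may not hold for every configuration, and the Gradio app's own parser may accept more or fewer variants.

```python
import json
import re

# "Name: value" with optional surrounding whitespace.
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def dialog_text_to_items(text):
    """Convert 'Speaker: "text"' / 'Silence: 0.5' lines to dialog items.

    Hypothetical helper; using the speaker name as speaker_id is an assumption.
    """
    items = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        match = LINE_PATTERN.match(raw)
        if match is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        name, value = match.groups()
        if name.lower() == "silence":
            items.append({"type": "silence", "duration": float(value)})
        else:
            items.append({"type": "speech", "speaker_id": name, "text": value.strip('"')})
    return items


def items_to_jsonl(items):
    """Serialize items one JSON object per line, ready for Paste Script."""
    return "\n".join(json.dumps(item) for item in items)
```

Passing the four-line example through `dialog_text_to_items` and then `items_to_jsonl` yields a script that can be pasted into the Dialog Editor.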
    

Architecture Overview

Application Structure

  • Frontend: Modern vanilla JavaScript web UI (frontend/)
  • Backend: FastAPI REST API (backend/)
  • CLI Tools: Command-line utilities (cbx-*.py)
  • Gradio Interface: Alternative web UI (gradio_app.py)

New Files and Features

  • cbx-audiobook.py: Generate long-form audiobooks from text files
  • import_helper.py: Utility for managing imports and dependencies
  • Backend Services: Enhanced dialog processing, speaker management, and TTS services
  • Web Frontend: Interactive dialog editor with drag-and-drop functionality

File Organization

  • single_output/ - Single utterance generations
  • dialog_output/ - Multi-speaker dialog files
  • tts_outputs/ - Raw TTS generation files
  • speaker_data/ - Speaker configurations and audio samples
  • Generated files packaged in ZIP archives for download
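
As an illustration of the ZIP packaging step, an output directory can be bundled with the standard library as follows. This is a sketch, not the application's actual packaging code; `package_outputs` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def package_outputs(output_dir, archive_path):
    """Zip every file under output_dir; return the archived relative paths.

    Hypothetical helper. The file list is collected before the archive is
    created so the archive never includes itself.
    """
    output_dir = Path(output_dir)
    archive_path = Path(archive_path)
    files = sorted(
        p for p in output_dir.rglob("*")
        if p.is_file() and p.resolve() != archive_path.resolve()
    )
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=path.relative_to(output_dir).as_posix())
    return [p.relative_to(output_dir).as_posix() for p in files]
```

For example, `package_outputs("dialog_output", "dialog_output/dialog.zip")` would collect the generated audio files into a single downloadable archive.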

API Endpoints

  • /api/speakers/ - Speaker CRUD operations
  • /api/dialog/generate/ - Full dialog generation
  • /api/dialog/generate_line/ - Single line generation
  • /generated_audio/ - Static audio file serving
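
A minimal client for these endpoints might look like the sketch below. The paths and the default port come from this README, but the HTTP methods and the request body are assumptions: the body simply mirrors a dialog item from the JSONL example, and the real schema should be taken from the API docs at http://localhost:8000/docs.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # default backend address from this README


def list_speakers_request(base=API_BASE):
    """Build a GET request for the speaker list (method is an assumption)."""
    return urllib.request.Request(f"{base}/api/speakers/", method="GET")


def generate_line_request(item, base=API_BASE):
    """Build a POST request for one dialog line.

    The JSON body shape is an assumption modeled on the JSONL dialog items.
    """
    body = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/dialog/generate_line/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(list_speakers_request())` would return whatever JSON the speakers endpoint produces; building the request separately from sending it keeps the payload easy to inspect.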

Configuration

Environment Setup

Key configuration files:

  • .env - Global settings
  • backend/.env - Backend-specific settings
  • frontend/.env - Frontend-specific settings
  • speaker_data/speakers.yaml - Speaker configuration

Development Commands

# Run tests
python backend/run_api_test.py
npm test

# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs

Memory Management

The application automatically:

  • Cleans up the TTS model after each generation
  • Manages GPU memory (CUDA/MPS devices)
  • Optimizes memory usage for long-form content

Troubleshooting

  • "Skipping unknown speaker": Configure speaker in speaker_data/speakers.yaml
  • "Sample file not found": Verify audio files exist in speaker_data/speaker_samples/
  • Memory issues: Use model reinitialization options for long content
  • CORS errors: Check frontend/backend port configuration (frontend origin is auto-included when using start_servers.py)
  • Import errors: Run python import_helper.py to check dependencies

Windows-specific

  • If PowerShell blocks script execution, run once:
    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
    
  • If Windows Firewall prompts the first time you run servers, allow access on your private network.