Steve White 948712bb3f current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
.note Working layout. 2025-06-05 17:38:12 -05:00
.opencode Update README and add new features 2025-06-24 15:37:02 -05:00
backend current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
frontend current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
speaker_data current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
.env.example updated with startup script 2025-06-17 16:26:55 -05:00
.gitignore updated with startup script 2025-06-17 16:26:55 -05:00
AGENTS.md Update README and add new features 2025-06-24 15:37:02 -05:00
API_REFERENCE.md current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
CLAUDE.md Update README and add new features 2025-06-24 15:37:02 -05:00
ENVIRONMENT_SETUP.md current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
OpenCode.md Update README and add new features 2025-06-24 15:37:02 -05:00
README-dialog-generator.md Gradio app added, cbx-dialog-generate.py added 2025-06-04 08:30:07 -05:00
README.md current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
babel.config.cjs Working layout. 2025-06-05 17:38:12 -05:00
cbx-audiobook.py Update README and add new features 2025-06-24 15:37:02 -05:00
cbx-dialog-generate.py Update README and add new features 2025-06-24 15:37:02 -05:00
cbx-generate.py Update README and add new features 2025-06-24 15:37:02 -05:00
chatterbox-test.py Gradio app added, cbx-dialog-generate.py added 2025-06-04 08:30:07 -05:00
chatterbox_tts.py.bak Gradio app added, cbx-dialog-generate.py added 2025-06-04 08:30:07 -05:00
gradio_app.py Major update: Enhanced memory management, configurable silence gaps, and file organization 2025-06-04 12:37:52 -05:00
import_helper.py Update README and add new features 2025-06-24 15:37:02 -05:00
package-lock.json Working layout. 2025-06-05 17:38:12 -05:00
package.json Working layout. 2025-06-05 17:38:12 -05:00
predator.txt Update README and add new features 2025-06-24 15:37:02 -05:00
requirements.txt Update README and add new features 2025-06-24 15:37:02 -05:00
sample-audiobook.txt Update README and add new features 2025-06-24 15:37:02 -05:00
sample-dialog.md Gradio app added, cbx-dialog-generate.py added 2025-06-04 08:30:07 -05:00
setup.py updated with startup script 2025-06-17 16:26:55 -05:00
speakers.yaml Major update: Enhanced memory management, configurable silence gaps, and file organization 2025-06-04 12:37:52 -05:00
start_servers.py current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
storage_service.py Updated note directory- gradio interface working. 2025-06-05 09:20:19 -05:00
test.py Patched up to work on m3 laptop. Need to fix the location specific shit. 2025-06-07 16:06:38 -05:00
test1-wav Gradio app added, cbx-dialog-generate.py added 2025-06-04 08:30:07 -05:00

README.md

Chatterbox TTS Application

A text-to-speech application built on the Chatterbox TTS model, with web, REST API, Gradio, and command-line interfaces. It supports single-utterance generation, multi-speaker dialogs, and long-form audiobook generation.

Features

  • Multiple Interfaces: Web UI, FastAPI backend, Gradio interface, and CLI tools
  • Single Utterance Generation: Generate speech from text using a selected speaker
  • Dialog Generation: Create multi-speaker conversations with configurable silence gaps
  • Audiobook Generation: Convert long-form text into narrated audiobooks
  • Speaker Management: Add/remove speakers with custom audio samples
  • Memory Optimization: Automatic model cleanup after generation
  • Output Organization: Files saved in organized directories with ZIP packaging

Getting Started

Quick Setup

  1. Clone the repository and install dependencies:

    git clone https://github.com/your-username/chatterbox-ui.git
    cd chatterbox-ui
    pip install -r requirements.txt
    npm install
    
  2. Run automated setup:

    python setup.py
    
  3. Prepare speaker samples:

    • Add audio samples (WAV format) to speaker_data/speaker_samples/
    • Configure speakers in speaker_data/speakers.yaml
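Before configuring speakers, it can help to confirm that each sample is a readable WAV file. The following is a minimal sketch using only Python's standard `wave` module. The function names `describe_wav` and `check_samples` are hypothetical and not part of this repository. The standard `wave` module reads only PCM-encoded WAV files, so a file flagged here is not necessarily unusable by the TTS model.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic properties of a WAV file, or raise if it is unreadable."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate if rate else 0.0,
        }


def check_samples(sample_dir="speaker_data/speaker_samples"):
    """Map each *.wav file name in sample_dir to its properties or an error string."""
    report = {}
    for path in sorted(Path(sample_dir).glob("*.wav")):
        try:
            report[path.name] = describe_wav(path)
        except (wave.Error, EOFError) as exc:
            report[path.name] = f"unreadable: {exc}"
    return report


if __name__ == "__main__":
    for name, info in check_samples().items():
        print(name, info)
```

Running the script from the repository root prints one line per sample, which makes it easy to spot a truncated or mislabeled file before it causes a "Sample file not found" or decoding error later.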

Running the Application

Full-Stack Web Application:

# Start both backend and frontend servers
python start_servers.py

Individual Components:

# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend only 
cd frontend && python start_dev_server.py

# Gradio interface
python gradio_app.py

Usage

Web Interface

Open the web UI at http://localhost:8001 to create dialogs interactively with drag-and-drop editing.

CLI Tools

Single utterance generation:

python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"
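To run the single-utterance command above over several lines of text, a small wrapper script is one option. This sketch uses only the `--sample`, `--output`, and `--text` flags shown above. The helper names, the `line_NN.wav` naming scheme, and the `dry_run` switch are assumptions made for illustration.

```python
import subprocess
import sys


def build_generate_command(sample, output, text, script="cbx-generate.py"):
    """Build the argv list for one cbx-generate.py call."""
    return [sys.executable, script, "--sample", sample, "--output", output, "--text", text]


def generate_all(sample, lines, prefix="line", dry_run=True):
    """Build (and optionally run) one command per text line; return the commands."""
    commands = []
    for index, text in enumerate(lines, start=1):
        cmd = build_generate_command(sample, f"{prefix}_{index:02d}.wav", text)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failing call
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in generate_all("speaker_samples/voice.wav", ["Hello world", "Goodbye"]):
        print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces. Each call starts a new process, so the model is presumably reloaded for every line.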

Dialog generation:

python cbx-dialog-generate.py --dialog dialog.md --output dialog_output

Audiobook generation:

python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name

Gradio Interface

  • Single Utterance Tab: Select speaker, enter text, adjust parameters, generate
  • Dialog Generation Tab: Configure speakers and create multi-speaker conversations
  • Dialog format:
    Speaker1: "Hello, how are you?"
    Speaker2: "I'm doing well!"
    Silence: 0.5
    Speaker1: "What are your plans for today?"
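
The dialog format above can be read as a sequence of speech lines and silence gaps. The following is a rough sketch of how such a script might be parsed, not the parser used by gradio_app.py. The `Speech` and `Silence` classes and the `parse_dialog` function are hypothetical names, and silently skipping unrecognized lines is an assumption.

```python
import re
from dataclasses import dataclass

# "Name: value" with optional surrounding whitespace; splits at the first colon.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


@dataclass
class Speech:
    speaker: str
    text: str


@dataclass
class Silence:
    seconds: float


def parse_dialog(script):
    """Turn a dialog script into an ordered list of Speech and Silence items."""
    items = []
    for raw in script.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # blank or unrecognized line
        name, value = match.groups()
        if name.lower() == "silence":
            items.append(Silence(float(value)))
        else:
            items.append(Speech(name, value.strip('"')))
    return items


if __name__ == "__main__":
    example = 'Speaker1: "Hello, how are you?"\nSilence: 0.5\nSpeaker2: "I\'m doing well!"'
    for item in parse_dialog(example):
        print(item)
```

Keeping silences as separate items preserves their position between utterances, so a later step can insert the gap at the right point when the audio segments are joined.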
    

Architecture Overview

Application Structure

  • Frontend: Modern vanilla JavaScript web UI (frontend/)
  • Backend: FastAPI REST API (backend/)
  • CLI Tools: Command-line utilities (cbx-*.py)
  • Gradio Interface: Alternative web UI (gradio_app.py)

New Files and Features

  • cbx-audiobook.py: Generate long-form audiobooks from text files
  • import_helper.py: Utility for managing imports and dependencies
  • Backend Services: Enhanced dialog processing, speaker management, and TTS services
  • Web Frontend: Interactive dialog editor with drag-and-drop functionality

File Organization

  • single_output/ - Single utterance generations
  • dialog_output/ - Multi-speaker dialog files
  • tts_outputs/ - Raw TTS generation files
  • speaker_data/ - Speaker configurations and audio samples
  • Generated files are packaged in ZIP archives for download
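
As an illustration of the ZIP packaging step, the sketch below bundles the audio files from an output directory using Python's standard `zipfile` module. It is not the application's own packaging code. The function name `package_outputs`, the `*.wav` pattern, and the archive name are assumptions.

```python
import zipfile
from pathlib import Path


def package_outputs(source_dir, archive_path, pattern="*.wav"):
    """Write every file under source_dir matching pattern into a ZIP archive.

    Returns the archive member names, relative to source_dir.
    """
    source = Path(source_dir)
    files = sorted(p for p in source.rglob(pattern) if p.is_file())
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=path.relative_to(source).as_posix())
    return [p.relative_to(source).as_posix() for p in files]


if __name__ == "__main__":
    if Path("dialog_output").is_dir():
        print(package_outputs("dialog_output", "dialog_output.zip"))
```

Storing paths relative to the source directory keeps the archive free of machine-specific absolute paths.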

API Endpoints

  • /api/speakers/ - Speaker CRUD operations
  • /api/dialog/generate/ - Full dialog generation
  • /api/dialog/generate_line/ - Single line generation
  • /generated_audio/ - Static audio file serving
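
A quick way to check that the backend is reachable is to request the speaker list. The sketch below uses only the standard library. It assumes that a GET on /api/speakers/ returns a JSON body, which this README does not state. The request and response schemas of the generation endpoints are not documented here either, so they are left out. See http://localhost:8000/docs for the actual schema.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

API_BASE = "http://localhost:8000"  # backend address used elsewhere in this README


def endpoint(path, base=API_BASE):
    """Join the API base URL and an endpoint path into one URL."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def list_speakers(base=API_BASE, timeout=10):
    """GET /api/speakers/ and decode the body as JSON (assumed response format)."""
    request = Request(endpoint("/api/speakers/", base), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(endpoint("/api/speakers/"))
    # Requires a running backend:
    # print(list_speakers())
```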

Configuration

Environment Setup

Key configuration files:

  • .env - Global settings
  • backend/.env - Backend-specific settings
  • frontend/.env - Frontend-specific settings
  • speaker_data/speakers.yaml - Speaker configuration
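
When a setting does not seem to take effect, it can be useful to see which keys each .env file defines. The sketch below is a deliberately simple KEY=VALUE reader, not the loader the application uses. It ignores `export` prefixes and variable interpolation, and it does not assume any precedence between the three files. The function names are hypothetical.

```python
from pathlib import Path

ENV_FILES = [".env", "backend/.env", "frontend/.env"]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def summarize(root="."):
    """Map each known .env file to its sorted key names, or None if it is missing."""
    summary = {}
    for name in ENV_FILES:
        path = Path(root) / name
        if path.is_file():
            summary[name] = sorted(parse_env(path.read_text(encoding="utf-8")))
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    for name, keys in summarize().items():
        print(name, keys)
```

Only key names are printed, so the output can be shared in a bug report without exposing values.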

Development Commands

# Run tests
python backend/run_api_test.py
npm test

# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs

Memory Management

The application automatically:

  • Cleans up the TTS model after each generation
  • Manages GPU memory (CUDA/MPS devices)
  • Optimizes memory usage for long-form content
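
The cleanup described above typically follows a common PyTorch pattern: drop the last reference to the model, run the garbage collector, then release cached device memory. The sketch below illustrates that pattern under the assumption that the model is held in a dictionary. It is not the application's actual cleanup code, and `release_model` is a hypothetical name. The `torch` import is optional so the sketch also runs where PyTorch is not installed.

```python
import gc

try:
    import torch  # optional; without it the sketch falls back to plain garbage collection
except ImportError:
    torch = None


def release_model(holder, key="model"):
    """Drop holder[key], collect garbage, and free cached device memory.

    Returns the name of the device type whose cache was considered.
    """
    holder.pop(key, None)  # remove the reference so the model can be collected
    gc.collect()
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()
        return "mps"
    return "cpu"


if __name__ == "__main__":
    state = {"model": object()}  # stand-in for a loaded TTS model
    print(release_model(state), state)
```

Emptying the cache only returns memory that is no longer referenced, so any other variable still pointing at the model keeps it alive.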

Troubleshooting

  • "Skipping unknown speaker": Configure speaker in speaker_data/speakers.yaml
  • "Sample file not found": Verify audio files exist in speaker_data/speaker_samples/
  • Memory issues: Use model reinitialization options for long content
  • CORS errors: Check frontend/backend port configuration (frontend origin is auto-included when using start_servers.py)
  • Import errors: Run python import_helper.py to check dependencies