
Steve White 34e1b144d9 Working higgs-tts version.		2025-08-09 21:56:48 -05:00
.note	Working layout.	2025-06-05 17:38:12 -05:00
.opencode	Update README and add new features	2025-06-24 15:37:02 -05:00
backend	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
frontend	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
higgs-audio@f04f5df76a	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
speaker_data	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
.env.example	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
.gitignore	updated with startup script	2025-06-17 16:26:55 -05:00
AGENTS.md	Update README and add new features	2025-06-24 15:37:02 -05:00
API_REFERENCE.md	Add API reference	2025-06-06 23:18:47 -05:00
CLAUDE.md	Update README and add new features	2025-06-24 15:37:02 -05:00
ENVIRONMENT_SETUP.md	updated with startup script	2025-06-17 16:26:55 -05:00
OpenCode.md	Update README and add new features	2025-06-24 15:37:02 -05:00
README-dialog-generator.md	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00
README.md	Update README and add new features	2025-06-24 15:37:02 -05:00
babel.config.cjs	Working layout.	2025-06-05 17:38:12 -05:00
cbx-audiobook.py	Update README and add new features	2025-06-24 15:37:02 -05:00
cbx-dialog-generate.py	Update README and add new features	2025-06-24 15:37:02 -05:00
cbx-generate.py	Update README and add new features	2025-06-24 15:37:02 -05:00
chatterbox-test.py	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00
chatterbox_tts.py.bak	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00
gradio_app.py	Major update: Enhanced memory management, configurable silence gaps, and file organization	2025-06-04 12:37:52 -05:00
higgs_plan.md	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
import_helper.py	Update README and add new features	2025-06-24 15:37:02 -05:00
package-lock.json	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
package.json	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
predator.txt	Update README and add new features	2025-06-24 15:37:02 -05:00
requirements.txt	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
sample-audiobook.txt	Update README and add new features	2025-06-24 15:37:02 -05:00
sample-dialog.md	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00
setup.py	updated with startup script	2025-06-17 16:26:55 -05:00
speakers.yaml	Major update: Enhanced memory management, configurable silence gaps, and file organization	2025-06-04 12:37:52 -05:00
start_servers.py	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
start_servers_safe.py	Working higgs-tts version.	2025-08-09 21:56:48 -05:00
storage_service.py	Updated note directory- gradio interface working.	2025-06-05 09:20:19 -05:00
test.py	Patched up to work on m3 laptop. Need to fix the location specific shit.	2025-06-07 16:06:38 -05:00
test1-wav	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00

README.md

Chatterbox TTS Application

A comprehensive text-to-speech application with multiple interfaces for generating speech from text using the Chatterbox TTS model. Supports single utterance generation, multi-speaker dialogs, and long-form audiobook generation.

Features

Multiple Interfaces: Web UI, FastAPI backend, Gradio interface, and CLI tools
Single Utterance Generation: Generate speech from text using a selected speaker
Dialog Generation: Create multi-speaker conversations with configurable silence gaps
Audiobook Generation: Convert long-form text into narrated audiobooks
Speaker Management: Add/remove speakers with custom audio samples
Memory Optimization: Automatic model cleanup after generation
Output Organization: Files saved in organized directories with ZIP packaging

Getting Started

Quick Setup

Clone the repository and install dependencies:

```
git clone https://github.com/your-username/chatterbox-ui.git
cd chatterbox-ui
pip install -r requirements.txt
npm install
```

Run automated setup:
```
python setup.py
```
Prepare speaker samples:
- Add audio samples (WAV format) to speaker_data/speaker_samples/
- Configure speakers in speaker_data/speakers.yaml
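Before configuring speakers, it can help to confirm that each sample is a readable WAV file. The sketch below uses only Python's standard `wave` module. `describe_samples` is a hypothetical helper and is not part of this repository. The `wave` module only understands PCM WAV, so a file flagged as unreadable (for example, a 32-bit float WAV) is not necessarily rejected by the TTS model itself.

```python
import wave
from pathlib import Path

def describe_samples(sample_dir):
    """Return {filename: (sample_rate, channels, duration_s)} for each .wav file.

    Hypothetical helper: files the stdlib `wave` module cannot parse
    (non-PCM or corrupt) are reported as None.
    """
    report = {}
    for path in sorted(Path(sample_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wf:
                rate = wf.getframerate()
                duration = wf.getnframes() / rate
                report[path.name] = (rate, wf.getnchannels(), round(duration, 2))
        except (wave.Error, EOFError):
            report[path.name] = None
    return report

if __name__ == "__main__":
    # Default sample folder named in this README.
    for name, info in describe_samples("speaker_data/speaker_samples").items():
        print(name, info if info else "unreadable (not PCM WAV?)")
```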

Running the Application

Full-Stack Web Application:

```
# Start both backend and frontend servers
python start_servers.py
```

Individual Components:

```
# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend only
cd frontend && python start_dev_server.py

# Gradio interface
python gradio_app.py
```

Usage

Web Interface

Access the modern web UI at http://localhost:8001 for interactive dialog creation with drag-and-drop editing.

CLI Tools

Single utterance generation:

```
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"
```

Dialog generation:

```
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output
```
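A dialog's final audio is assembled by joining per-line clips with silence between them. This README does not show how `cbx-dialog-generate.py` does this internally. The following is a minimal standard-library sketch, assuming every clip is a PCM WAV with the same channel count, sample width, and sample rate. `concat_with_silence` and its `gap_seconds` parameter are hypothetical names.

```python
import wave

def concat_with_silence(segment_paths, out_path, gap_seconds=0.5):
    """Join PCM WAV files into out_path with gap_seconds of silence between clips.

    Hypothetical helper: all clips must share channel count, sample width
    and sample rate, otherwise ValueError is raised.
    """
    if not segment_paths:
        raise ValueError("need at least one clip")
    params = None
    with wave.open(str(out_path), "wb") as out:
        for i, path in enumerate(segment_paths):
            with wave.open(str(path), "rb") as seg:
                fmt = (seg.getnchannels(), seg.getsampwidth(), seg.getframerate())
                if params is None:
                    # The first clip defines the output format.
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{path}: format {fmt} differs from {params}")
                if i > 0 and gap_seconds > 0:
                    channels, width, rate = params
                    # 8-bit WAV is unsigned (silence = 0x80); wider PCM is signed (silence = 0).
                    silent = b"\x80" if width == 1 else b"\x00"
                    n_frames = int(round(gap_seconds * rate))
                    out.writeframes(silent * (n_frames * channels * width))
                out.writeframes(seg.readframes(seg.getnframes()))
```

A fixed `gap_seconds` mirrors the "configurable silence gaps" feature. A per-item pause such as the `Silence: 0.5` line in the dialog format would need the caller to pass the gap for each boundary instead.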

Audiobook generation:

```
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
```
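Long-form text generally has to be fed to a TTS model in pieces. The README does not say how `cbx-audiobook.py` splits its input, so treat this as an illustrative sketch only. It breaks text at sentence-ending punctuation and packs sentences into chunks of at most `max_chars` characters; the default limit of 300 is an arbitrary assumption. `chunk_text` is a hypothetical helper.

```python
import re

def chunk_text(text, max_chars=300):
    """Split text into sentence-aligned chunks of at most max_chars characters.

    Hypothetical helper. A single sentence longer than max_chars is kept whole
    rather than being cut mid-word.
    """
    # Collapse whitespace, then split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", " ".join(text.split()))
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The punctuation-based split is deliberately naive. Abbreviations such as "Dr." will also end a sentence, which is usually harmless for chunking but can introduce an unnatural pause.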

Gradio Interface

Single Utterance Tab: Select speaker, enter text, adjust parameters, generate
Dialog Generation Tab: Configure speakers and create multi-speaker conversations

Dialog format:

```
Speaker1: "Hello, how are you?"
Speaker2: "I'm doing well!"
Silence: 0.5
Speaker1: "What are your plans for today?"
```
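The format above is simple enough to parse line by line. The sketch below is not the parser used by `cbx-dialog-generate.py` or the Gradio app, which this README does not show. It turns `Name: "text"` lines into utterances and `Silence: seconds` lines into pauses. `Utterance`, `Silence`, and `parse_dialog` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Utterance:
    speaker: str
    text: str

@dataclass
class Silence:
    seconds: float

# "Name: rest"; the split happens at the first colon, so the text may contain colons.
_ITEM_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")

def parse_dialog(source: str) -> List[Union[Utterance, Silence]]:
    """Parse 'Speaker: "text"' and 'Silence: seconds' lines into a list of items."""
    items: List[Union[Utterance, Silence]] = []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        if not raw.strip():
            continue  # skip blank lines
        match = _ITEM_RE.match(raw)
        if match is None:
            raise ValueError(f"line {lineno}: expected 'Name: text', got {raw!r}")
        name, body = match.groups()
        if name.lower() == "silence":
            items.append(Silence(float(body)))
        else:
            items.append(Utterance(name, body.strip('"')))  # drop surrounding quotes
    return items
```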

Architecture Overview

Application Structure

Frontend: Modern vanilla JavaScript web UI (frontend/)
Backend: FastAPI REST API (backend/)
CLI Tools: Command-line utilities (cbx-*.py)
Gradio Interface: Alternative web UI (gradio_app.py)

New Files and Features

cbx-audiobook.py: Generate long-form audiobooks from text files
import_helper.py: Utility for managing imports and dependencies
Backend Services: Enhanced dialog processing, speaker management, and TTS services
Web Frontend: Interactive dialog editor with drag-and-drop functionality

File Organization

single_output/ - Single utterance generations
dialog_output/ - Multi-speaker dialog files
tts_outputs/ - Raw TTS generation files
speaker_data/ - Speaker configurations and audio samples
Generated files are packaged in ZIP archives for download
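Packaging an output folder for download needs nothing beyond the standard `zipfile` module. The sketch below assumes outputs are `.wav` files somewhere under a folder such as `dialog_output/`. `zip_outputs` is a hypothetical helper, not the application's own packaging code.

```python
import zipfile
from pathlib import Path

def zip_outputs(src_dir, zip_path, pattern="*.wav"):
    """Package files matching `pattern` under src_dir into zip_path.

    Hypothetical helper: archive paths are relative to src_dir.
    Returns the list of archived names.
    """
    src = Path(src_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(src.rglob(pattern)):
            if f.is_file():
                arcname = f.relative_to(src).as_posix()
                zf.write(f, arcname)
                names.append(arcname)
    return names
```

Deflate gains little on raw PCM audio, so `zipfile.ZIP_STORED` is a reasonable choice when packaging speed matters more than archive size.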

API Endpoints

/api/speakers/ - Speaker CRUD operations
/api/dialog/generate/ - Full dialog generation
/api/dialog/generate_line/ - Single line generation
/generated_audio/ - Static audio file serving
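These endpoints can be called from Python using only the standard library. This sketch assumes the backend runs on its default port 8000 and that a plain `GET /api/speakers/` returns JSON. The request bodies for the dialog endpoints are not documented in this README, so check the generated schema at http://localhost:8000/docs before calling them. `endpoint` and `list_speakers` are hypothetical helpers.

```python
import json
import urllib.request
from urllib.parse import urljoin

API_BASE = "http://localhost:8000"  # default backend address from this README

def endpoint(path, base=API_BASE):
    """Build an absolute URL for an API path such as '/api/speakers/'."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))

def list_speakers(base=API_BASE, timeout=10):
    """GET /api/speakers/ and decode the JSON body.

    ASSUMPTION: the endpoint answers a plain GET with JSON; the exact
    response schema is not documented here.
    """
    with urllib.request.urlopen(endpoint("/api/speakers/", base), timeout=timeout) as resp:
        return json.load(resp)
```

With the backend running, `list_speakers()` returns the decoded response. `endpoint("/generated_audio/<file>")` builds a URL for a served audio file.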

Configuration

Environment Setup

Key configuration files:

.env - Global settings
backend/.env - Backend-specific settings
frontend/.env - Frontend-specific settings
speaker_data/speakers.yaml - Speaker configuration

Development Commands

```
# Run tests
python backend/run_api_test.py
npm test

# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```
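When the UI cannot reach the API, first confirm that both servers are actually listening. This sketch tries a TCP connection to each of the default ports listed above. `is_listening` and `check_services` are hypothetical helpers, and the port map assumes the defaults have not been changed in the `.env` files.

```python
import socket

# Default ports from the access points listed above.
SERVICES = {"backend API": 8000, "web UI": 8001}

def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in services.items()}
```

`print(check_services())` shows at a glance which side is down before you start debugging CORS or routing.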

Memory Management

The application automatically:

Cleans up the TTS model after each generation
Manages GPU memory (CUDA/MPS devices)
Optimizes memory usage for long-form content
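The README does not show the cleanup code. A minimal sketch of the usual PyTorch pattern follows, assuming the model is held in a dict and that PyTorch is the framework in use. `release_model` is hypothetical. Note that `torch.cuda.empty_cache()` only releases cached blocks that are no longer referenced, which is why the reference is dropped and `gc.collect()` runs first.

```python
import gc

try:
    import torch  # optional: only used to clear GPU caches
except ImportError:
    torch = None

def release_model(registry, key):
    """Remove registry[key], collect garbage and empty CUDA/MPS caches.

    Hypothetical helper. Memory is only reclaimed if no other references
    to the model remain. Returns True if a model was removed.
    """
    if registry.pop(key, None) is None:
        return False
    # The popped model is discarded here; collect cycles so its tensors are freed.
    gc.collect()
    if torch is not None:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        mps_backend = getattr(torch.backends, "mps", None)
        if mps_backend is not None and mps_backend.is_available() and hasattr(torch, "mps"):
            torch.mps.empty_cache()
    return True
```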

Troubleshooting

"Skipping unknown speaker": Configure speaker in speaker_data/speakers.yaml
"Sample file not found": Verify audio files exist in speaker_data/speaker_samples/
Memory issues: Use model reinitialization options for long content
CORS errors: Check frontend/backend port configuration
Import errors: Run python import_helper.py to check dependencies
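The first two errors above can be caught before generation by checking the speaker config against the files on disk and against the names used in a dialog. This sketch assumes the config has already been loaded into a flat `{name: sample_filename}` dict. The real layout of `speakers.yaml` may differ, so adapt the loading step. `find_missing_samples` and `unknown_speakers` are hypothetical helpers.

```python
from pathlib import Path

def find_missing_samples(speakers, sample_dir="speaker_data/speaker_samples"):
    """Return {speaker: path} for configured samples that are not on disk.

    `speakers` is assumed to be a flat {name: sample_filename} dict.
    """
    base = Path(sample_dir)
    missing = {}
    for name, sample in speakers.items():
        path = Path(sample)
        if not path.is_absolute():
            path = base / path  # relative names resolve against the samples folder
        if not path.is_file():
            missing[name] = str(path)
    return missing

def unknown_speakers(dialog_speakers, speakers):
    """Names used in a dialog that have no entry in the speaker config."""
    return sorted(set(dialog_speakers) - set(speakers))
```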