Steve White 733c9d1b5f Merge pull request 'feat/frontend-phase1' (#1 ) from feat/frontend-phase1 into main Reviewed-on: #1		2025-08-14 15:44:24 +00:00
.note	added back end concurrency and front end paste feature.	2025-08-14 10:33:44 -05:00
.opencode	Update README and add new features	2025-06-24 15:37:02 -05:00
backend	added back end concurrency and front end paste feature.	2025-08-14 10:33:44 -05:00
frontend	added back end concurrency and front end paste feature.	2025-08-14 10:33:44 -05:00
speaker_data	fixed some UI problems and added a clear dialog button.	2025-08-13 18:10:02 -05:00
.env.example	updated with startup script	2025-06-17 16:26:55 -05:00
.gitignore	fixed some UI problems and added a clear dialog button.	2025-08-13 18:10:02 -05:00
AGENTS.md	Update README and add new features	2025-06-24 15:37:02 -05:00
API_REFERENCE.md	current workign version using chatterbox.	2025-08-12 11:31:00 -05:00
CLAUDE.md	Update README and add new features	2025-06-24 15:37:02 -05:00
ENVIRONMENT_SETUP.md	current workign version using chatterbox.	2025-08-12 11:31:00 -05:00
OpenCode.md	Update README and add new features	2025-06-24 15:37:02 -05:00
README-dialog-generator.md	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00
README.md	docs: update README with Windows setup and Paste Script instructions	2025-08-14 10:42:40 -05:00
babel.config.cjs	Working layout.	2025-06-05 17:38:12 -05:00
cbx-audiobook.py	Update README and add new features	2025-06-24 15:37:02 -05:00
cbx-dialog-generate.py	Update README and add new features	2025-06-24 15:37:02 -05:00
cbx-generate.py	Update README and add new features	2025-06-24 15:37:02 -05:00
chatterbox-test.py	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00
chatterbox_tts.py.bak	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00
forge.yaml	fixed some UI problems and added a clear dialog button.	2025-08-13 18:10:02 -05:00
gradio_app.py	Major update: Enhanced memory management, configurable silence gaps, and file organization	2025-06-04 12:37:52 -05:00
import_helper.py	Update README and add new features	2025-06-24 15:37:02 -05:00
jest.config.cjs	feat(frontend): Phase 1 – normalize speakers endpoints, fix API docs and JSON parsing, consolidate state in app.js, tweak CSS border color, align jest/babel-jest + add jest.config.cjs, add dev scripts, sanitize repo URL	2025-08-12 12:16:23 -05:00
package-lock.json	Working layout.	2025-06-05 17:38:12 -05:00
package.json	feat(frontend): Phase 1 – normalize speakers endpoints, fix API docs and JSON parsing, consolidate state in app.js, tweak CSS border color, align jest/babel-jest + add jest.config.cjs, add dev scripts, sanitize repo URL	2025-08-12 12:16:23 -05:00
predator.txt	Update README and add new features	2025-06-24 15:37:02 -05:00
requirements.txt	Update README and add new features	2025-06-24 15:37:02 -05:00
sample-audiobook.txt	Update README and add new features	2025-06-24 15:37:02 -05:00
sample-dialog.md	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00
setup-windows.ps1	Added windows setup script	2025-08-14 10:35:30 -05:00
setup.py	updated with startup script	2025-06-17 16:26:55 -05:00
speakers.yaml	Major update: Enhanced memory management, configurable silence gaps, and file organization	2025-06-04 12:37:52 -05:00
start_servers.py	current workign version using chatterbox.	2025-08-12 11:31:00 -05:00
storage_service.py	Updated note directory- gradio interface working.	2025-06-05 09:20:19 -05:00
test.py	Patched up to work on m3 laptop. Need to fix the location specific shit.	2025-06-07 16:06:38 -05:00
test1-wav	Gradio app added, cbx-dialog-generate.py added	2025-06-04 08:30:07 -05:00

README.md

Chatterbox TTS Application

A comprehensive text-to-speech application with multiple interfaces for generating speech from text using the Chatterbox TTS model. Supports single utterance generation, multi-speaker dialogs, and long-form audiobook generation.

Features

- Multiple Interfaces: Web UI, FastAPI backend, Gradio interface, and CLI tools
- Single Utterance Generation: Generate speech from text using a selected speaker
- Dialog Generation: Create multi-speaker conversations with configurable silence gaps
- Audiobook Generation: Convert long-form text into narrated audiobooks
- Speaker Management: Add/remove speakers with custom audio samples
- Paste Script (JSONL) Import: Paste a dialog script as JSONL directly into the editor via a modal
- Memory Optimization: Automatic model cleanup after generation
- Output Organization: Files saved in organized directories with ZIP packaging

Getting Started

Quick Setup

Clone the repository and install dependencies:

git clone https://github.com/your-username/chatterbox-ui.git
cd chatterbox-ui
pip install -r requirements.txt
npm install

Run automated setup:
```
python setup.py
```
Prepare speaker samples:
- Add audio samples (WAV format) to speaker_data/speaker_samples/
- Configure speakers in speaker_data/speakers.yaml

Windows Quick Start

On Windows, a PowerShell setup script is provided to automate environment setup and startup.

# From the repository root in PowerShell
./setup-windows.ps1

# First time only, if scripts are blocked:
# Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

What it does:

- Creates/uses .venv
- Upgrades pip and installs deps from backend/requirements.txt and root requirements.txt
- Creates a default .env with sensible ports if missing
- Starts both servers via start_servers.py

Running the Application

Full-Stack Web Application:

# Start both backend and frontend servers
python start_servers.py

On Windows, you can also run the PowerShell setup script, which starts both servers once setup is complete:

./setup-windows.ps1

Individual Components:

# Backend only (FastAPI)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend only 
cd frontend && python start_dev_server.py

# Gradio interface
python gradio_app.py

Usage

Web Interface

Access the modern web UI at http://localhost:8001 for interactive dialog creation.

Paste Script (JSONL) in Dialog Editor

Quickly load a dialog by pasting JSONL (one JSON object per line):

Click Paste Script in the Dialog Editor.
Paste JSONL content, for example:

{"type":"speech","speaker_id":"dummy_speaker","text":"Hello there!"}
{"type":"silence","duration":0.5}
{"type":"speech","speaker_id":"dummy_speaker","text":"This is the second line."}

Click Load and confirm replacement if prompted.

Notes:

- Input is validated per line; errors report line numbers.
- The dialog is saved to localStorage, so it persists across refreshes.
- Unknown speaker_ids will still load; add speakers later if needed.
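
The per-line validation described in the notes can be illustrated with a short Python sketch. The editor itself runs in the browser, so this is not the application's code. The field names come from the example above; the rule that speech items need speaker_id and text and that silence items need duration is an assumption, as is the function name parse_dialog_jsonl.

```python
import json

# Item types and fields are taken from the JSONL example above.
# Which fields are mandatory is an assumption made for this sketch.
REQUIRED_FIELDS = {
    "speech": ("speaker_id", "text"),
    "silence": ("duration",),
}


def parse_dialog_jsonl(script: str):
    """Parse a pasted JSONL script into (items, errors).

    Every error message starts with the 1-based number of the offending line.
    Blank lines are skipped. Speaker IDs are not checked against any registry,
    so unknown speakers still load.
    """
    items, errors = [], []
    for lineno, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        kind = obj.get("type")
        required = REQUIRED_FIELDS.get(kind) if isinstance(kind, str) else None
        if required is None:
            errors.append(f"line {lineno}: unknown type {kind!r}")
            continue
        missing = [name for name in required if name not in obj]
        if missing:
            errors.append(f"line {lineno}: missing {', '.join(missing)}")
            continue
        items.append(obj)
    return items, errors
```

Returning the errors as a list, rather than raising on the first bad line, lets a caller report every problem in a pasted script at once.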

CLI Tools

Single utterance generation:

python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"

Dialog generation:

python cbx-dialog-generate.py --dialog dialog.md --output dialog_output

Audiobook generation:

python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name

Gradio Interface

- Single Utterance Tab: Select speaker, enter text, adjust parameters, generate
- Dialog Generation Tab: Configure speakers and create multi-speaker conversations

Dialog format:

Speaker1: "Hello, how are you?"
Speaker2: "I'm doing well!"
Silence: 0.5
Speaker1: "What are your plans for today?"
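
To make the dialog format above concrete, here is a minimal Python sketch that turns it into a list of speech and silence items. It is not the parser used by gradio_app.py. The exact grammar (quoted text after the speaker name, a number of seconds after Silence:) is inferred from the sample, and the output item shape is borrowed from the JSONL example earlier in this README as an assumption.

```python
import re

# Patterns inferred from the sample dialog; the grammar the app really
# accepts may be looser or stricter than this.
SILENCE_RE = re.compile(r'^Silence:\s*(?P<seconds>\d+(?:\.\d+)?)\s*$')
SPEECH_RE = re.compile(r'^(?P<speaker>[^:]+):\s*"(?P<text>.*)"\s*$')


def parse_dialog(text: str):
    """Convert dialog text into a list of speech/silence dictionaries."""
    items = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        # Check for silence first so "Silence" is never treated as a speaker.
        silence = SILENCE_RE.match(line)
        if silence:
            items.append({"type": "silence", "duration": float(silence["seconds"])})
            continue
        speech = SPEECH_RE.match(line)
        if speech:
            items.append({
                "type": "speech",
                "speaker_id": speech["speaker"].strip(),
                "text": speech["text"],
            })
            continue
        raise ValueError(f"line {lineno}: unrecognized dialog line: {line!r}")
    return items
```

Passing the four sample lines to parse_dialog yields three speech items and one silence item, in their original order.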

Architecture Overview

Application Structure

- Frontend: Modern vanilla JavaScript web UI (frontend/)
- Backend: FastAPI REST API (backend/)
- CLI Tools: Command-line utilities (cbx-*.py)
- Gradio Interface: Alternative web UI (gradio_app.py)

New Files and Features

- cbx-audiobook.py: Generate long-form audiobooks from text files
- import_helper.py: Utility for managing imports and dependencies
- Backend Services: Enhanced dialog processing, speaker management, and TTS services
- Web Frontend: Interactive dialog editor with drag-and-drop functionality

File Organization

- single_output/ - Single utterance generations
- dialog_output/ - Multi-speaker dialog files
- tts_outputs/ - Raw TTS generation files
- speaker_data/ - Speaker configurations and audio samples
- Generated files are packaged in ZIP archives for download
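
As an illustration of the ZIP packaging step, the sketch below archives every file under an output directory using Python's standard zipfile module. The function name zip_directory and the archive layout (paths stored relative to the source directory) are assumptions for this example, not a description of how the application builds its archives.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir: str, archive_path: str) -> list[str]:
    """Write every file under source_dir into a ZIP archive.

    Returns the archive member names, which are paths relative to source_dir.
    """
    source = Path(source_dir)
    archive = Path(archive_path).resolve()
    written = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself in case it is
            # being created inside the directory that is being packaged.
            if path.is_file() and path.resolve() != archive:
                arcname = path.relative_to(source).as_posix()
                zf.write(path, arcname)
                written.append(arcname)
    return written
```

For example, zip_directory("dialog_output", "dialog_output.zip") would bundle a finished dialog run into a single downloadable file.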

API Endpoints

- /api/speakers/ - Speaker CRUD operations
- /api/dialog/generate/ - Full dialog generation
- /api/dialog/generate_line/ - Single line generation
- /generated_audio/ - Static audio file serving
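
The endpoints above can be called from any HTTP client. The following sketch uses only the Python standard library and assumes the backend is listening on http://localhost:8000, the port used elsewhere in this README. The helper names build_request and fetch_json are hypothetical, and the request bodies expected by the dialog endpoints are not specified here; check the interactive API docs at /docs for the actual schemas.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumed backend address (see Development Commands)


def build_request(path: str, payload=None) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request (with payload)."""
    url = API_BASE.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request: urllib.request.Request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the backend running, fetch_json(build_request("/api/speakers/")) should return the speaker listing, assuming that endpoint answers GET requests with JSON.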

Configuration

Environment Setup

Key configuration files:

- .env - Global settings
- backend/.env - Backend-specific settings
- frontend/.env - Frontend-specific settings
- speaker_data/speakers.yaml - Speaker configuration

Development Commands

# Run tests
python backend/run_api_test.py
npm test

# Backend development
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Access points
# Web UI: http://localhost:8001
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs

Memory Management

The application automatically:

- Cleans up the TTS model after each generation
- Manages GPU memory (CUDA/MPS devices)
- Optimizes memory usage for long-form content
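
A cleanup step of this kind is commonly written in PyTorch roughly as follows. This is a generic sketch, not the application's actual cleanup code: the function name release_accelerator_memory is hypothetical, and the caller is assumed to have already dropped its own references to the model (for example with model = None) before calling it.

```python
import gc


def release_accelerator_memory() -> str:
    """Collect garbage, then ask PyTorch to release cached device memory.

    Returns the backend whose cache was flushed: "cuda", "mps", or "cpu"
    (the last also covers the case where PyTorch is not installed).
    """
    # Free Python objects first so their tensors can leave the allocator cache.
    gc.collect()
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda"
    # torch.mps.empty_cache() exists in PyTorch 2.0 and later; guard for older builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available() and hasattr(torch, "mps"):
        torch.mps.empty_cache()
        return "mps"
    return "cpu"
```

Emptying the cache only returns memory that is no longer referenced, which is why the model reference has to be released before the call.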

Troubleshooting

"Skipping unknown speaker": Configure speaker in speaker_data/speakers.yaml
"Sample file not found": Verify audio files exist in speaker_data/speaker_samples/
Memory issues: Use model reinitialization options for long content
CORS errors: Check frontend/backend port configuration (frontend origin is auto-included when using start_servers.py)
Import errors: Run python import_helper.py to check dependencies

Windows-specific

If PowerShell blocks script execution, run once:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

If Windows Firewall prompts the first time you run servers, allow access on your private network.