CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Common Development Commands

Backend (FastAPI)

# Install backend dependencies (run from project root)
pip install -r backend/requirements.txt

# Run backend development server (run from project root)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Run backend tests
python backend/run_api_test.py

# Backend API accessible at http://127.0.0.1:8000
# API docs at http://127.0.0.1:8000/docs

Frontend Development

# Install frontend dependencies
npm install

# Run frontend tests
npm test

# Start frontend dev server separately
cd frontend && python start_dev_server.py

Integrated Development Environment

# Start both backend and frontend servers concurrently
python start_servers.py

# Alternatively, run the backend startup script from the backend directory
cd backend && python start_server.py

Command-Line TTS Generation

# Generate single utterance with CLI
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"

# Generate dialog from script
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output

# Generate audiobook from text file
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name

Alternative Interfaces

# Run Gradio interface (standalone TTS app)
python gradio_app.py

Architecture Overview

This is a full-stack TTS (Text-to-Speech) application with three interfaces:

Modern web frontend (vanilla JS) - Interactive dialog editor at frontend/
FastAPI backend - REST API at backend/
Gradio interface - Alternative UI in gradio_app.py

Frontend-Backend Communication

Frontend: Vanilla JS (ES6 modules) served on port 8001
Backend: FastAPI serving on port 8000
API Base: http://localhost:8000/api
CORS: Configured for frontend communication
File Serving: Generated audio served via /generated_audio/ endpoint

Key API Endpoints

/api/speakers/ - Speaker CRUD operations
/api/dialog/generate/ - Full dialog generation
/api/dialog/generate_line/ - Single line generation
/generated_audio/ - Static audio file serving
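
As a rough illustration of how these paths combine with the backend address, the Python sketch below builds absolute URLs for the endpoints listed above. The helper names (endpoint_url, audio_url) and the sample file name are hypothetical, and request/response payloads are not described in this file, so none are shown.

```python
from urllib.parse import quote

# Backend address and endpoint paths as listed in this file.
BACKEND_ORIGIN = "http://localhost:8000"

ENDPOINTS = {
    "speakers": "/api/speakers/",
    "dialog_generate": "/api/dialog/generate/",
    "dialog_generate_line": "/api/dialog/generate_line/",
    "generated_audio": "/generated_audio/",
}


def endpoint_url(name: str, origin: str = BACKEND_ORIGIN) -> str:
    """Return the absolute URL for a named endpoint (hypothetical helper)."""
    return origin.rstrip("/") + ENDPOINTS[name]


def audio_url(filename: str, origin: str = BACKEND_ORIGIN) -> str:
    """Return the URL under which a generated audio file would be served."""
    # Percent-encode the file name so spaces and similar characters are safe.
    return endpoint_url("generated_audio", origin) + quote(filename)


if __name__ == "__main__":
    print(endpoint_url("dialog_generate"))
    print(audio_url("example line 1.wav"))  # hypothetical file name
```

Keeping the paths in one table means a change to the backend port or host only touches BACKEND_ORIGIN.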

Backend Service Architecture

Located in backend/app/services/:

TTSService: Chatterbox TTS model lifecycle management
SpeakerManagementService: Speaker data and sample management
DialogProcessorService: Dialog script to audio processing
AudioManipulationService: Audio concatenation and ZIP creation

Frontend Architecture

Modular design: api.js (API layer) + app.js (app logic)
No framework: Modern vanilla JavaScript with ES6+ features
Interactive editor: Table-based dialog creation with drag-drop reordering

Data Flow

User creates dialog in frontend table editor
Frontend sends dialog items to /api/dialog/generate/
Backend processes speech/silence items via services
TTS generates audio and the segments are concatenated
A ZIP archive is created with all outputs
Frontend receives URLs for playback/download
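
The flow above can be sketched end to end with the Python standard library. This is not the project's implementation: the item shape ({"type": "speech" | "silence", ...}), the fake_tts stand-in (which emits silence instead of calling the Chatterbox model), the sample rate, and the file names inside the archive are all assumptions made for illustration.

```python
import io
import wave
import zipfile

SAMPLE_RATE = 24000  # assumed sample rate, not taken from the project


def silence(seconds: float) -> bytes:
    """Return raw 16-bit mono PCM silence of the given duration."""
    return b"\x00\x00" * round(SAMPLE_RATE * seconds)


def fake_tts(text: str) -> bytes:
    """Stand-in for the TTS model: 0.1 s of silence per word."""
    return silence(0.1 * len(text.split()))


def render_item(item: dict) -> bytes:
    """Turn one dialog item into PCM (hypothetical item shape)."""
    if item["type"] == "speech":
        return fake_tts(item["text"])
    if item["type"] == "silence":
        return silence(item["duration"])
    raise ValueError(f"unknown item type: {item['type']!r}")


def to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def generate_dialog(items: list[dict]) -> bytes:
    """Render each item, concatenate the segments, and zip all outputs."""
    segments = [render_item(item) for item in items]
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        for index, pcm in enumerate(segments, start=1):
            zf.writestr(f"segment_{index:03d}.wav", to_wav(pcm))
        zf.writestr("dialog.wav", to_wav(b"".join(segments)))
    return archive.getvalue()
```

Concatenating raw PCM before wrapping it in a WAV header only works when every segment shares the same sample rate, sample width, and channel count, which the sketch guarantees by construction.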

Speaker Configuration

Location: speaker_data/speakers.yaml and speaker_data/speaker_samples/
Format: YAML config referencing WAV audio samples
Management: Both API endpoints and file-based configuration

Output Organization

dialog_output/ - Generated dialog files
single_output/ - Single utterance outputs
tts_outputs/ - Raw TTS generation files
Generated ZIPs contain organized file structure

Development Setup Notes

Python virtual environment expected at project root (.venv)
Backend commands run from project root, not backend/ directory (the exception is start_server.py, which is run from backend/)
Frontend served separately (typically port 8001)
Speaker samples must be WAV format in speaker_data/speaker_samples/
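
A quick way to check the WAV requirement before starting the servers is sketched below. The function names are hypothetical, and the check relies on Python's standard wave module, which mainly handles uncompressed PCM files; a WAV in another encoding may be reported as invalid here even if the TTS model could read it.

```python
import wave
from pathlib import Path


def is_wav_file(path: Path) -> bool:
    """Return True if the standard library can open the file as a WAV."""
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getnframes() >= 0
    except (wave.Error, EOFError, OSError):
        return False


def invalid_samples(samples_dir: Path) -> list[Path]:
    """List files in the samples directory that are not readable WAV files."""
    return sorted(
        p for p in samples_dir.iterdir() if p.is_file() and not is_wav_file(p)
    )


if __name__ == "__main__":
    samples = Path("speaker_data/speaker_samples")
    if samples.is_dir():
        for bad in invalid_samples(samples):
            print(f"not a readable WAV: {bad}")
```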

Environment Configuration

Quick Setup

# Run automated setup (creates .env files)
python setup.py

# Install dependencies
pip install -r backend/requirements.txt
npm install

Manual Environment Variables

Key environment variables that can be configured in .env files:

PROJECT_ROOT: Base project directory
BACKEND_PORT/FRONTEND_PORT: Server ports (default: 8000/8001)
DEVICE: TTS model device (auto, cpu, cuda, mps)
CORS_ORIGINS: Allowed frontend origins for CORS
SPEAKER_SAMPLES_DIR: Directory for speaker audio files
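
The sketch below shows one way these variables could be read in Python. The port defaults come from this file; everything else is an assumption: the Settings class and load_settings function are hypothetical, DEVICE is assumed to default to auto, and CORS_ORIGINS is assumed to be a comma-separated list that defaults to the local frontend origin. The project's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_DEVICES = ("auto", "cpu", "cuda", "mps")


@dataclass(frozen=True)
class Settings:
    backend_port: int
    frontend_port: int
    device: str
    cors_origins: list[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying defaults."""
    # Assumption: "auto" is the default device.
    device = env.get("DEVICE", "auto").lower()
    if device not in VALID_DEVICES:
        raise ValueError(f"DEVICE must be one of {VALID_DEVICES}, got {device!r}")
    frontend_port = int(env.get("FRONTEND_PORT", "8001"))
    # Assumption: comma-separated origins, defaulting to the local frontend.
    raw_origins = env.get("CORS_ORIGINS", f"http://localhost:{frontend_port}")
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]
    return Settings(
        backend_port=int(env.get("BACKEND_PORT", "8000")),
        frontend_port=frontend_port,
        device=device,
        cors_origins=origins,
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dict instead of the real process environment.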

Configuration Files Structure

.env: Global configuration
backend/.env: Backend-specific settings
frontend/.env: Frontend-specific settings
speaker_data/speakers.yaml: Speaker configuration

CLI Tools Overview

cbx-generate.py: Single utterance generation
cbx-dialog-generate.py: Multi-speaker dialog generation
cbx-audiobook.py: Long-form audiobook generation
start_servers.py: Integrated development server launcher
