CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Common Development Commands
Backend (FastAPI)
# Install backend dependencies (run from project root)
pip install -r backend/requirements.txt
# Run backend development server (run from project root)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
# Run backend tests
python backend/run_api_test.py
# Backend API accessible at http://127.0.0.1:8000
# API docs at http://127.0.0.1:8000/docs
Frontend Development
# Install frontend dependencies
npm install
# Run frontend tests
npm test
# Start frontend dev server separately
cd frontend && python start_dev_server.py
Integrated Development Environment
# Start both backend and frontend servers concurrently
python start_servers.py
# Alternatively, run the backend startup script from the backend directory
cd backend && python start_server.py
Command-Line TTS Generation
# Generate single utterance with CLI
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"
# Generate dialog from script
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output
# Generate audiobook from text file
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
Alternative Interfaces
# Run Gradio interface (standalone TTS app)
python gradio_app.py
Architecture Overview
This is a full-stack TTS (Text-to-Speech) application with three interfaces:
- Modern web frontend (vanilla JS) - Interactive dialog editor at frontend/
- FastAPI backend - REST API at backend/
- Gradio interface - Alternative UI in gradio_app.py
Frontend-Backend Communication
- Frontend: Vanilla JS (ES6 modules) serving on port 8001
- Backend: FastAPI serving on port 8000
- API Base: http://localhost:8000/api
- CORS: Configured for frontend communication
- File Serving: Generated audio served via the /generated_audio/ endpoint
Key API Endpoints
- /api/speakers/ - Speaker CRUD operations
- /api/dialog/generate/ - Full dialog generation
- /api/dialog/generate_line/ - Single line generation
- /generated_audio/ - Static audio file serving
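The endpoint paths above can be assembled into full URLs against the documented API base. The following is a minimal sketch using only the standard library; the helper names `api_url` and `audio_url` and the example filename are illustrative assumptions, not part of the repository.

```python
from urllib.parse import urljoin

# Base URLs taken from the "Frontend-Backend Communication" section.
# Trailing slashes matter: urljoin replaces the last path segment otherwise.
BACKEND_ROOT = "http://localhost:8000/"
API_BASE = urljoin(BACKEND_ROOT, "api/")


def api_url(path: str) -> str:
    """Join a relative endpoint path (e.g. 'speakers/') onto the API base."""
    return urljoin(API_BASE, path.lstrip("/"))


def audio_url(filename: str) -> str:
    """Build the static URL of a generated audio file (filename is hypothetical)."""
    return urljoin(BACKEND_ROOT, "generated_audio/" + filename)


# Endpoint paths as listed in this document.
ENDPOINTS = {
    "speakers": api_url("speakers/"),
    "dialog_generate": api_url("dialog/generate/"),
    "dialog_generate_line": api_url("dialog/generate_line/"),
}
```

If BACKEND_PORT is changed from its default, BACKEND_ROOT would need to change with it.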
Backend Service Architecture
Located in backend/app/services/:
- TTSService: Chatterbox TTS model lifecycle management
- SpeakerManagementService: Speaker data and sample management
- DialogProcessorService: Dialog script to audio processing
- AudioManipulationService: Audio concatenation and ZIP creation
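To illustrate the kind of work described for AudioManipulationService (concatenation and ZIP creation), here is a rough standard-library sketch. It is not the service's actual code: the function names, the 24 kHz mono 16-bit PCM format, and the in-memory approach are all assumptions made for the example.

```python
import io
import wave
import zipfile

FRAMERATE = 24000  # assumed sample rate; the real model output may differ


def make_silence(seconds: float, framerate: int = FRAMERATE) -> bytes:
    """Return raw 16-bit mono PCM frames containing only silence."""
    return b"\x00\x00" * int(seconds * framerate)


def write_wav(frames: bytes, framerate: int = FRAMERATE) -> bytes:
    """Wrap raw 16-bit mono PCM frames in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(framerate)
        w.writeframes(frames)
    return buf.getvalue()


def concatenate(segments: list[bytes], framerate: int = FRAMERATE) -> bytes:
    """Join raw PCM segments back to back and return one WAV file."""
    return write_wav(b"".join(segments), framerate)


def zip_outputs(files: dict[str, bytes]) -> bytes:
    """Pack named file contents into a ZIP archive held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

Joining raw frames only works when every segment shares the same sample rate, sample width, and channel count, so a real implementation would need to check or resample first.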
Frontend Architecture
- Modular design: api.js (API layer) + app.js (app logic)
- No framework: Modern vanilla JavaScript with ES6+ features
- Interactive editor: Table-based dialog creation with drag-drop reordering
Data Flow
- User creates dialog in frontend table editor
- Frontend sends dialog items to /api/dialog/generate/
- Backend processes speech/silence items via services
- TTS generates audio; segments are concatenated
- ZIP archive created with all outputs
- Frontend receives URLs for playback/download
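The request step of this flow might look like the sketch below, which builds (but does not send) a POST to /api/dialog/generate/ containing speech and silence items. The JSON field names (`dialog_items`, `output_base_name`, `type`, `speaker_id`, `text`, `duration`) are hypothetical placeholders; the real schema is defined by the backend and should be checked at http://127.0.0.1:8000/docs.

```python
import json
from urllib.request import Request

API_BASE = "http://localhost:8000/api"


def speech(speaker_id: str, text: str) -> dict:
    """A speech item (field names are assumptions, not the confirmed schema)."""
    return {"type": "speech", "speaker_id": speaker_id, "text": text}


def silence(duration: float) -> dict:
    """A silence item lasting `duration` seconds (field names are assumptions)."""
    return {"type": "silence", "duration": duration}


def build_generate_request(items: list[dict], output_base_name: str) -> Request:
    """Prepare a JSON POST request for full dialog generation without sending it."""
    body = json.dumps(
        {"dialog_items": items, "output_base_name": output_base_name}
    ).encode("utf-8")
    return Request(
        f"{API_BASE}/dialog/generate/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it, assuming the backend is running on port 8000.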
Speaker Configuration
- Location: speaker_data/speakers.yaml and speaker_data/speaker_samples/
- Format: YAML config referencing WAV audio samples
- Management: Both API endpoints and file-based configuration
Output Organization
- dialog_output/ - Generated dialog files
- single_output/ - Single utterance outputs
- tts_outputs/ - Raw TTS generation files
- Generated ZIPs contain organized file structure
Development Setup Notes
- Python virtual environment expected at project root (.venv)
- Backend commands run from project root, not backend/ directory
- Frontend served separately (typically port 8001)
- Speaker samples must be WAV format in speaker_data/speaker_samples/
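Since speaker samples must be WAV files, a quick pre-flight check can catch misnamed or corrupt samples before they reach the TTS model. This is a sketch using the standard-library `wave` module; the helper names are hypothetical, and `wave` only understands PCM-encoded WAV, so a float-encoded sample would be reported as invalid here even if the backend could load it.

```python
import wave
from pathlib import Path


def is_valid_wav(path) -> bool:
    """Return True if `path` opens as a PCM WAV file with at least one frame."""
    try:
        with wave.open(str(path), "rb") as w:
            return w.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # Not a RIFF/WAVE file, truncated, unsupported encoding, or unreadable.
        return False


def find_invalid_samples(directory) -> list[str]:
    """List the names of *.wav files in `directory` that fail the check."""
    return sorted(
        p.name for p in Path(directory).glob("*.wav") if not is_valid_wav(p)
    )
```

For example, `find_invalid_samples("speaker_data/speaker_samples")` would return the names of any samples worth re-exporting.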
Environment Configuration
Quick Setup
# Run automated setup (creates .env files)
python setup.py
# Install dependencies
pip install -r backend/requirements.txt
npm install
Manual Environment Variables
Key environment variables that can be configured in .env files:
- PROJECT_ROOT: Base project directory
- BACKEND_PORT/FRONTEND_PORT: Server ports (default: 8000/8001)
- DEVICE: TTS model device (auto, cpu, cuda, mps)
- CORS_ORIGINS: Allowed frontend origins for CORS
- SPEAKER_SAMPLES_DIR: Directory for speaker audio files
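As an illustration of how these variables could be consumed, the sketch below reads them from the environment with fallbacks. The port defaults come from this document; the `auto` default for DEVICE, the comma-separated format for CORS_ORIGINS, its default value, and the `Settings` / `load_settings` names are assumptions for the example, not the backend's actual config module.

```python
import os
from dataclasses import dataclass

VALID_DEVICES = {"auto", "cpu", "cuda", "mps"}


@dataclass(frozen=True)
class Settings:
    backend_port: int
    frontend_port: int
    device: str
    cors_origins: list[str]


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ) with fallbacks."""
    env = os.environ if env is None else env

    device = env.get("DEVICE", "auto").lower()  # assumed default: auto
    if device not in VALID_DEVICES:
        raise ValueError(f"DEVICE must be one of {sorted(VALID_DEVICES)}")

    # Assumed format: comma-separated list of origins.
    raw_origins = env.get("CORS_ORIGINS", "http://localhost:8001")
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]

    return Settings(
        backend_port=int(env.get("BACKEND_PORT", "8000")),
        frontend_port=int(env.get("FRONTEND_PORT", "8001")),
        device=device,
        cors_origins=origins,
    )
```

Accepting a plain mapping instead of reading `os.environ` directly keeps the function easy to exercise in tests without touching the real environment.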
Configuration Files Structure
- .env: Global configuration
- backend/.env: Backend-specific settings
- frontend/.env: Frontend-specific settings
- speaker_data/speakers.yaml: Speaker configuration
CLI Tools Overview
- cbx-generate.py: Single utterance generation
- cbx-dialog-generate.py: Multi-speaker dialog generation
- cbx-audiobook.py: Long-form audiobook generation
- start_servers.py: Integrated development server launcher