# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Common Development Commands

### Backend (FastAPI)

```bash
# Install backend dependencies (run from project root)
pip install -r backend/requirements.txt

# Run backend development server (run from project root)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Run backend tests
python backend/run_api_test.py

# Backend API accessible at http://127.0.0.1:8000
# API docs at http://127.0.0.1:8000/docs
```

### Frontend Development

```bash
# Install frontend dependencies
npm install

# Run frontend tests
npm test

# Start frontend dev server separately
cd frontend && python start_dev_server.py
```

### Integrated Development Environment

```bash
# Start both backend and frontend servers concurrently
python start_servers.py

# Or alternatively, run backend startup script from backend directory
cd backend && python start_server.py
```

### Command-Line TTS Generation

```bash
# Generate single utterance with CLI
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"

# Generate dialog from script
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output

# Generate audiobook from text file
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
```

### Alternative Interfaces

```bash
# Run Gradio interface (standalone TTS app)
python gradio_app.py
```

## Architecture Overview

This is a full-stack TTS (Text-to-Speech) application with three interfaces:

1. **Modern web frontend** (vanilla JS) - Interactive dialog editor at `frontend/`
2. **FastAPI backend** - REST API at `backend/`
3. **Gradio interface** - Alternative UI in `gradio_app.py`

### Frontend-Backend Communication

- **Frontend**: Vanilla JS (ES6 modules) serving on port 8001
- **Backend**: FastAPI serving on port 8000
- **API Base**: `http://localhost:8000/api`
- **CORS**: Configured for frontend communication
- **File Serving**: Generated audio served via `/generated_audio/` endpoint

### Key API Endpoints

- `/api/speakers/` - Speaker CRUD operations
- `/api/dialog/generate/` - Full dialog generation
- `/api/dialog/generate_line/` - Single line generation
- `/generated_audio/` - Static audio file serving

### Backend Service Architecture

Located in `backend/app/services/`:

- **TTSService**: Chatterbox TTS model lifecycle management
- **SpeakerManagementService**: Speaker data and sample management
- **DialogProcessorService**: Dialog script to audio processing
- **AudioManipulationService**: Audio concatenation and ZIP creation

### Frontend Architecture

- **Modular design**: `api.js` (API layer) + `app.js` (app logic)
- **No framework**: Modern vanilla JavaScript with ES6+ features
- **Interactive editor**: Table-based dialog creation with drag-drop reordering

### Data Flow

1. User creates dialog in frontend table editor
2. Frontend sends dialog items to `/api/dialog/generate/`
3. Backend processes speech/silence items via services
4. TTS generates audio, segments concatenated
5. ZIP archive created with all outputs
6. Frontend receives URLs for playback/download

### Speaker Configuration

- **Location**: `speaker_data/speakers.yaml` and `speaker_data/speaker_samples/`
- **Format**: YAML config referencing WAV audio samples
- **Management**: Both API endpoints and file-based configuration

### Output Organization

- `dialog_output/` - Generated dialog files
- `single_output/` - Single utterance outputs
- `tts_outputs/` - Raw TTS generation files
- Generated ZIPs contain organized file structure

## Development Setup Notes

- Python virtual environment expected at project root (`.venv`)
- Backend commands run from project root, not `backend/` directory
- Frontend served separately (typically port 8001)
- Speaker samples must be WAV format in `speaker_data/speaker_samples/`

## Environment Configuration

### Quick Setup

```bash
# Run automated setup (creates .env files)
python setup.py

# Install dependencies
pip install -r backend/requirements.txt
npm install
```

### Manual Environment Variables

Key environment variables that can be configured in `.env` files:

- `PROJECT_ROOT`: Base project directory
- `BACKEND_PORT`/`FRONTEND_PORT`: Server ports (default: 8000/8001)
- `DEVICE`: TTS model device (`auto`, `cpu`, `cuda`, `mps`)
- `CORS_ORIGINS`: Allowed frontend origins for CORS
- `SPEAKER_SAMPLES_DIR`: Directory for speaker audio files

### Configuration Files Structure

- `.env`: Global configuration
- `backend/.env`: Backend-specific settings
- `frontend/.env`: Frontend-specific settings
- `speaker_data/speakers.yaml`: Speaker configuration

## CLI Tools Overview

- `cbx-generate.py`: Single utterance generation
- `cbx-dialog-generate.py`: Multi-speaker dialog generation
- `cbx-audiobook.py`: Long-form audiobook generation
- `start_servers.py`: Integrated development server launcher
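
As a rough illustration of the variables listed under Manual Environment Variables, the sketch below reads them from the process environment and applies the documented port defaults (8000/8001). The `Settings` class and `load_settings` function are hypothetical names, not part of this repository. The `auto` fallback for `DEVICE`, the comma-separated format for `CORS_ORIGINS`, and the default origin are assumptions. The sketch does not parse `.env` files itself.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Device values listed in this document.
VALID_DEVICES = ("auto", "cpu", "cuda", "mps")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    backend_port: int
    frontend_port: int
    device: str
    cors_origins: List[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Documented defaults: 8000 for the backend, 8001 for the frontend.
    backend_port = int(env.get("BACKEND_PORT", "8000"))
    frontend_port = int(env.get("FRONTEND_PORT", "8001"))

    # Assumption: fall back to "auto" when DEVICE is unset.
    device = env.get("DEVICE", "auto")
    if device not in VALID_DEVICES:
        raise ValueError(f"DEVICE must be one of {VALID_DEVICES}, got {device!r}")

    # Assumption: CORS_ORIGINS is comma-separated; default to the local frontend.
    raw_origins = env.get("CORS_ORIGINS", f"http://localhost:{frontend_port}")
    cors_origins = [o.strip() for o in raw_origins.split(",") if o.strip()]

    return Settings(backend_port, frontend_port, device, cors_origins)
```

Passing an explicit mapping, for example `load_settings({"DEVICE": "cpu"})`, makes it possible to check a configuration without touching the real environment.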
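
Step 2 of the Data Flow section (the frontend sending dialog items to `/api/dialog/generate/`) can be sketched from a script as well. This is a minimal illustration, not the backend's actual schema. The JSON field names (`dialog_items`, `type`, `speaker_id`, `text`, `duration`) and the helper functions are hypothetical, so check the API docs at `http://127.0.0.1:8000/docs` for the real request model. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# API base as given under Frontend-Backend Communication.
API_BASE = "http://localhost:8000/api"


def speech_item(speaker_id: str, text: str) -> dict:
    """Hypothetical shape for a spoken line."""
    return {"type": "speech", "speaker_id": speaker_id, "text": text}


def silence_item(duration: float) -> dict:
    """Hypothetical shape for a pause, in seconds."""
    return {"type": "silence", "duration": duration}


def build_generate_request(items: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for full dialog generation."""
    # Assumption: the endpoint accepts a JSON body with a "dialog_items" list.
    body = json.dumps({"dialog_items": items}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/dialog/generate/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request([
    speech_item("speaker_name", "Hello world"),
    silence_item(0.5),
])
```

With the backend running, `urllib.request.urlopen(request)` would submit the dialog. The response format is not described in this file, so inspect it before relying on any particular field.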