# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Common Development Commands

### Backend (FastAPI)

```bash
# Install backend dependencies (run from project root)
pip install -r backend/requirements.txt

# Run backend development server (run from project root)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Run backend tests
python backend/run_api_test.py

# Backend API accessible at http://127.0.0.1:8000
# API docs at http://127.0.0.1:8000/docs
```

### Frontend Development

```bash
# Install frontend dependencies
npm install

# Run frontend tests
npm test

# Start frontend dev server separately
cd frontend && python start_dev_server.py
```

### Integrated Development Environment

```bash
# Start both backend and frontend servers concurrently
python start_servers.py

# Alternatively, run the backend startup script from the backend directory
cd backend && python start_server.py
```

### Command-Line TTS Generation

```bash
# Generate a single utterance with the CLI
python cbx-generate.py --sample speaker_samples/voice.wav --output output.wav --text "Hello world"

# Generate a dialog from a script
python cbx-dialog-generate.py --dialog dialog.md --output dialog_output

# Generate an audiobook from a text file
python cbx-audiobook.py --input book.txt --output audiobook --speaker speaker_name
```

### Alternative Interfaces

```bash
# Run Gradio interface (standalone TTS app)
python gradio_app.py
```

## Architecture Overview

This is a full-stack TTS (Text-to-Speech) application with three interfaces:

1. **Modern web frontend** (vanilla JS) - Interactive dialog editor at `frontend/`
2. **FastAPI backend** - REST API at `backend/`
3. **Gradio interface** - Alternative UI in `gradio_app.py`

### Frontend-Backend Communication

- **Frontend**: Vanilla JS (ES6 modules) served on port 8001
- **Backend**: FastAPI serving on port 8000
- **API Base**: `http://localhost:8000/api`
- **CORS**: Configured to allow requests from the frontend origin
- **File Serving**: Generated audio served via the `/generated_audio/` endpoint

### Key API Endpoints

- `/api/speakers/` - Speaker CRUD operations
- `/api/dialog/generate/` - Full dialog generation
- `/api/dialog/generate_line/` - Single line generation
- `/generated_audio/` - Static audio file serving

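
As a rough illustration of how a client might address these endpoints, the sketch below prepares requests with Python's standard library. The helper names `endpoint_url` and `build_json_request` are hypothetical and not part of this repository, and the request payload schema is defined by the backend, not by this example.

```python
import json
import urllib.request

# Base URL as documented above; adjust if the backend port is changed.
BACKEND_BASE = "http://localhost:8000"


def endpoint_url(path: str, base: str = BACKEND_BASE) -> str:
    """Join the backend base URL and an endpoint path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def build_json_request(path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for an endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url(path),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it to a running backend. The interactive docs at `http://127.0.0.1:8000/docs` are the place to check the actual request and response shapes.
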
### Backend Service Architecture

Located in `backend/app/services/`:

- **TTSService**: Chatterbox TTS model lifecycle management
- **SpeakerManagementService**: Speaker data and sample management
- **DialogProcessorService**: Dialog script to audio processing
- **AudioManipulationService**: Audio concatenation and ZIP creation

### Frontend Architecture

- **Modular design**: `api.js` (API layer) + `app.js` (app logic)
- **No framework**: Modern vanilla JavaScript with ES6+ features
- **Interactive editor**: Table-based dialog creation with drag-drop reordering

### Data Flow

1. User creates a dialog in the frontend table editor
2. Frontend sends the dialog items to `/api/dialog/generate/`
3. Backend processes speech/silence items via services
4. TTS generates audio and the segments are concatenated
5. A ZIP archive is created with all outputs
6. Frontend receives URLs for playback/download

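
Steps 3 to 5 can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the backend's implementation: the item fields (`type`, `text`, `duration`), the file names inside the archive, the sample rate, and the `fake_tts` stand-in (which returns silence instead of calling the Chatterbox model) are all hypothetical.

```python
import io
import wave
import zipfile

SAMPLE_RATE = 16000  # assumed sample rate, for this sketch only


def silence(seconds: float) -> bytes:
    """Return 16-bit mono PCM silence of the given duration."""
    return b"\x00\x00" * int(round(SAMPLE_RATE * seconds))


def fake_tts(text: str) -> bytes:
    """Hypothetical stand-in for the real TTS call: 0.1 s of silence per word."""
    return silence(0.1 * len(text.split()))


def to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM frames in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)
    return buf.getvalue()


def generate_dialog(items: list[dict]) -> bytes:
    """Render each item, concatenate the segments, and zip all outputs."""
    segments = []
    for item in items:
        if item["type"] == "speech":
            segments.append(fake_tts(item["text"]))
        elif item["type"] == "silence":
            segments.append(silence(item["duration"]))
        else:
            raise ValueError(f"unknown item type: {item['type']!r}")

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        # One file per segment, then the concatenated dialog.
        for index, pcm in enumerate(segments, start=1):
            zf.writestr(f"segment_{index:03d}.wav", to_wav(pcm))
        zf.writestr("dialog_full.wav", to_wav(b"".join(segments)))
    return archive.getvalue()
```

Concatenating raw PCM before wrapping it in a WAV header only works because every segment here shares one sample rate and format. Real TTS output may need resampling or format checks first.
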
### Speaker Configuration

- **Location**: `speaker_data/speakers.yaml` and `speaker_data/speaker_samples/`
- **Format**: YAML config referencing WAV audio samples
- **Management**: Both API endpoints and file-based configuration

### Output Organization

- `dialog_output/` - Generated dialog files
- `single_output/` - Single utterance outputs
- `tts_outputs/` - Raw TTS generation files
- Generated ZIPs contain an organized file structure

## Development Setup Notes

- Python virtual environment expected at project root (`.venv`)
- Backend commands run from project root, not the `backend/` directory (the exception is `start_server.py`, which is run from `backend/`)
- Frontend served separately (typically port 8001)
- Speaker samples must be in WAV format and located in `speaker_data/speaker_samples/`

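
A quick way to check the WAV requirement before adding samples is sketched below. The function names are hypothetical and not part of this repository. Note that Python's standard `wave` module only reads uncompressed PCM files, so a WAV in another encoding (for example floating point) would be reported as invalid by this check even if the TTS stack could load it.

```python
import wave
from pathlib import Path


def is_valid_wav(path: Path) -> bool:
    """Return True if the file parses as a PCM WAV with at least one frame."""
    try:
        with wave.open(str(path), "rb") as wf:
            return wf.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        return False


def invalid_samples(samples_dir: Path) -> list[Path]:
    """List files in the directory that are not readable WAV audio."""
    return sorted(
        p for p in samples_dir.iterdir() if p.is_file() and not is_valid_wav(p)
    )
```

For example, `invalid_samples(Path("speaker_data/speaker_samples"))` returns the files that fail the check, or an empty list if every sample is readable.
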
## Environment Configuration

### Quick Setup
```bash
# Run automated setup (creates .env files)
python setup.py

# Install dependencies
pip install -r backend/requirements.txt
npm install
```

### Manual Environment Variables
Key environment variables that can be configured in `.env` files:

- `PROJECT_ROOT`: Base project directory
- `BACKEND_PORT`/`FRONTEND_PORT`: Server ports (default: 8000/8001)
- `DEVICE`: TTS model device (`auto`, `cpu`, `cuda`, `mps`)
- `CORS_ORIGINS`: Allowed frontend origins for CORS
- `SPEAKER_SAMPLES_DIR`: Directory for speaker audio files

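
The sketch below shows one way such variables could be read and validated once they are present in the process environment. It is not the project's configuration code: the `Settings` class and `load_settings` function are hypothetical, the `auto` default for `DEVICE` is an assumption, and so is the comma-separated format for `CORS_ORIGINS`. Loading the `.env` files themselves is left to whatever tooling the project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ALLOWED_DEVICES = ("auto", "cpu", "cuda", "mps")


@dataclass(frozen=True)
class Settings:
    backend_port: int
    frontend_port: int
    device: str
    cors_origins: list[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Assumption: DEVICE falls back to "auto" when unset.
    device = env.get("DEVICE", "auto").lower()
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"DEVICE must be one of {ALLOWED_DEVICES}, got {device!r}")

    # Assumption: CORS_ORIGINS is a comma-separated list of origins.
    raw_origins = env.get("CORS_ORIGINS", "")
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]

    return Settings(
        backend_port=int(env.get("BACKEND_PORT", "8000")),
        frontend_port=int(env.get("FRONTEND_PORT", "8001")),
        device=device,
        cors_origins=origins,
    )
```

Accepting an explicit mapping keeps the function easy to test without touching the real environment; calling `load_settings()` with no argument reads `os.environ`.
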
### Configuration Files Structure
- `.env`: Global configuration
- `backend/.env`: Backend-specific settings
- `frontend/.env`: Frontend-specific settings
- `speaker_data/speakers.yaml`: Speaker configuration

## CLI Tools Overview

- `cbx-generate.py`: Single utterance generation
- `cbx-dialog-generate.py`: Multi-speaker dialog generation
- `cbx-audiobook.py`: Long-form audiobook generation
- `start_servers.py`: Integrated development server launcher