CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Common Development Commands

Backend (FastAPI)

# Install backend dependencies (run from project root)
pip install -r backend/requirements.txt

# Run backend development server (run from project root)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Run backend tests
python backend/run_api_test.py

# Backend API accessible at http://127.0.0.1:8000
# API docs at http://127.0.0.1:8000/docs

Frontend Testing

# Run frontend tests
npm test

Alternative Interfaces

# Run Gradio interface (standalone TTS app)
python gradio_app.py

Architecture Overview

This is a full-stack TTS (text-to-speech) application with three components:

  1. Modern web frontend (vanilla JS) - Interactive dialog editor at frontend/
  2. FastAPI backend - REST API at backend/
  3. Gradio interface - Alternative UI in gradio_app.py

Frontend-Backend Communication

  • Frontend: Vanilla JS (ES6 modules) served on port 8001
  • Backend: FastAPI serving on port 8000
  • API Base: http://localhost:8000/api
  • CORS: Configured for frontend communication
  • File Serving: Generated audio served via /generated_audio/ endpoint

Key API Endpoints

  • /api/speakers/ - Speaker CRUD operations
  • /api/dialog/generate/ - Full dialog generation
  • /api/dialog/generate_line/ - Single line generation
  • /generated_audio/ - Static audio file serving
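When scripting against the backend, it can help to build these URLs from one base value rather than hard-coding each one. A minimal sketch, assuming the default local base URL listed above; the helper names endpoint and audio_url are hypothetical and do not come from this repository:

```python
from urllib.parse import urljoin

# Values taken from the notes above; adjust if the backend runs elsewhere.
SERVER_ROOT = "http://localhost:8000/"
API_BASE = urljoin(SERVER_ROOT, "api/")


def endpoint(path: str) -> str:
    """Join a relative path such as 'speakers/' onto the API base (hypothetical helper)."""
    return urljoin(API_BASE, path.lstrip("/"))


def audio_url(filename: str) -> str:
    """Build the URL of a generated audio file served under /generated_audio/."""
    return urljoin(SERVER_ROOT, "generated_audio/" + filename)


SPEAKERS_URL = endpoint("speakers/")
GENERATE_DIALOG_URL = endpoint("dialog/generate/")
GENERATE_LINE_URL = endpoint("dialog/generate_line/")
```

The trailing slashes matter: urljoin replaces the last path segment when the base does not end in a slash.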

Backend Service Architecture

Located in backend/app/services/:

  • TTSService: Chatterbox TTS model lifecycle management
  • SpeakerManagementService: Speaker data and sample management
  • DialogProcessorService: Dialog script to audio processing
  • AudioManipulationService: Audio concatenation and ZIP creation
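To illustrate what audio concatenation involves, here is a minimal sketch using only Python's standard wave module. It is not the AudioManipulationService implementation, which may use a different library and handle resampling. The function name concatenate_wavs is hypothetical, and the sketch assumes every segment shares the same channel count, sample width, and sample rate:

```python
import wave


def concatenate_wavs(sources, destination):
    """Concatenate WAV inputs (paths or binary file objects) into `destination`.

    Hypothetical helper: all inputs must share channels, sample width, and rate.
    """
    fmt = None
    chunks = []
    for src in sources:
        with wave.open(src, "rb") as reader:
            current = (
                reader.getnchannels(),
                reader.getsampwidth(),
                reader.getframerate(),
            )
            if fmt is None:
                fmt = current
            elif current != fmt:
                raise ValueError("segments must share channels, sample width, and rate")
            # Read the raw PCM frames of this segment.
            chunks.append(reader.readframes(reader.getnframes()))
    if fmt is None:
        raise ValueError("no segments given")

    with wave.open(destination, "wb") as writer:
        writer.setnchannels(fmt[0])
        writer.setsampwidth(fmt[1])
        writer.setframerate(fmt[2])
        writer.writeframes(b"".join(chunks))
```

Rejecting mismatched formats up front avoids silently producing audio that plays at the wrong speed or pitch.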

Frontend Architecture

  • Modular design: api.js (API layer) + app.js (app logic)
  • No framework: Modern vanilla JavaScript with ES6+ features
  • Interactive editor: Table-based dialog creation with drag-drop reordering

Data Flow

  1. User creates dialog in frontend table editor
  2. Frontend sends dialog items to /api/dialog/generate/
  3. Backend processes speech/silence items via services
  4. TTS generates audio for each speech item, and the segments are concatenated
  5. ZIP archive created with all outputs
  6. Frontend receives URLs for playback/download
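Step 2 of the flow above can be sketched as building a JSON request for /api/dialog/generate/. The field names below (type, speaker_id, text, duration, dialog_items, output_base_name) are assumptions for illustration only; check the request models in backend/app/ or the API docs at /docs for the real schema. The request is constructed but not sent:

```python
import json
from urllib.request import Request

GENERATE_URL = "http://localhost:8000/api/dialog/generate/"


def speech(speaker_id: str, text: str) -> dict:
    """A speech item (hypothetical field names)."""
    return {"type": "speech", "speaker_id": speaker_id, "text": text}


def silence(duration: float) -> dict:
    """A silence item, duration in seconds (hypothetical field names)."""
    return {"type": "silence", "duration": duration}


def build_generate_request(items: list, output_base_name: str) -> Request:
    """Build, but do not send, a POST request carrying the dialog items."""
    body = json.dumps(
        {"dialog_items": items, "output_base_name": output_base_name}
    ).encode("utf-8")
    return Request(
        GENERATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    [speech("speaker_a", "Hello there."), silence(0.5), speech("speaker_b", "Hi!")],
    output_base_name="demo_dialog",
)
```

Passing the request to urllib.request.urlopen would send it once the backend is running on port 8000.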

Speaker Configuration

  • Location: speaker_data/speakers.yaml and speaker_data/speaker_samples/
  • Format: YAML config referencing WAV audio samples
  • Management: Both API endpoints and file-based configuration
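Because speaker samples are expected to be WAV files, a quick pre-flight check can catch bad samples before they reach the TTS model. A minimal sketch using the standard wave module, which reads PCM WAV only, so it may be stricter than what the backend actually accepts. The function names are hypothetical:

```python
import wave
from pathlib import Path


def is_valid_wav(path) -> bool:
    """Return True if `path` opens as a PCM WAV file with at least one frame."""
    try:
        with wave.open(str(path), "rb") as reader:
            return reader.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # Not a RIFF/WAVE file, truncated, unreadable, or an unsupported encoding.
        return False


def find_invalid_samples(sample_dir) -> list:
    """List file names in `sample_dir` that fail the WAV check (hypothetical helper)."""
    return sorted(
        p.name
        for p in Path(sample_dir).iterdir()
        if p.is_file() and not is_valid_wav(p)
    )
```

For example, find_invalid_samples("speaker_data/speaker_samples") returns the names of files that need to be re-exported as WAV.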

Output Organization

  • dialog_output/ - Generated dialog files
  • single_output/ - Single utterance outputs
  • tts_outputs/ - Raw TTS generation files
  • Generated ZIPs contain an organized file structure
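As an illustration of how an archive can preserve a directory layout, here is a minimal sketch using the standard zipfile module. It is not the backend's implementation; the function name zip_outputs and the choice to store paths relative to a root directory are assumptions:

```python
import zipfile
from pathlib import Path


def zip_outputs(files, archive_path, root) -> Path:
    """Write `files` into a ZIP, storing each under its path relative to `root`.

    Hypothetical helper; raises ValueError if a file is not located under `root`.
    """
    root = Path(root)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            file = Path(file)
            # Relative arcnames keep the folder structure inside the archive.
            archive.write(file, arcname=file.relative_to(root).as_posix())
    return Path(archive_path)
```

Storing relative paths means the archive unpacks into the same folder structure regardless of where the project lives on disk.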

Development Setup Notes

  • Python virtual environment expected at project root (.venv)
  • Backend commands run from project root, not backend/ directory
  • Frontend served separately (typically port 8001)
  • Speaker samples must be in WAV format and stored in speaker_data/speaker_samples/