Chatterbox TTS API Reference
Overview
The Chatterbox TTS API is a FastAPI-based backend service that provides text-to-speech capabilities with speaker management and dialog generation features. The API supports creating custom speakers from audio samples and generating complex dialogs with multiple speakers, silences, and fine-tuned TTS parameters.
Base URL: http://127.0.0.1:8000
API Version: 0.1.0
Framework: FastAPI with automatic OpenAPI documentation
Quick Start
- Interactive API Documentation: http://127.0.0.1:8000/docs (Swagger UI)
- Alternative Documentation: http://127.0.0.1:8000/redoc (ReDoc)
- OpenAPI Schema: http://127.0.0.1:8000/openapi.json
Authentication
Currently, the API does not require authentication. CORS is configured to allow requests from localhost:8001 and 127.0.0.1:8001.
Endpoints
🏠 Root Endpoint
GET /
Welcome message and API status check.
Response:
{
"message": "Welcome to the Chatterbox TTS API!"
}
👥 Speaker Management
GET /api/speakers/
Retrieve all available speakers.
Response Model: List[Speaker]
Example Response:
[
{
"id": "speaker_001",
"name": "John Doe",
"sample_path": "/path/to/speaker_samples/john_doe.wav"
},
{
"id": "speaker_002",
"name": "Jane Smith",
"sample_path": "/path/to/speaker_samples/jane_smith.wav"
}
]
Status Codes:
- 200: Success
POST /api/speakers/
Create a new speaker from an audio sample.
Request Type: multipart/form-data
Parameters:
- name (form field, required): Speaker name
- audio_file (file upload, required): Audio sample file (WAV, MP3, etc.)
Response Model: SpeakerResponse
Example Response:
{
"id": "speaker_003",
"name": "Alex Johnson",
"message": "Speaker added successfully."
}
Status Codes:
- 201: Speaker created successfully
- 400: Invalid file type or missing file
- 500: Server error during speaker creation
Example cURL:
curl -X POST "http://127.0.0.1:8000/api/speakers/" \
-F "name=Alex Johnson" \
-F "audio_file=@/path/to/sample.wav"
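For Python clients, the same upload can be sketched with only the standard library. The helper name `build_speaker_upload` and the `audio/wav` part content type are assumptions made for illustration, not part of the API. With the `requests` library, passing `data={"name": ...}` and `files={"audio_file": ...}` to `requests.post` produces an equivalent multipart request.

```python
import uuid
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_speaker_upload(name, filename, audio_bytes):
    """Build a multipart/form-data POST for /api/speakers/ (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    # First part: the "name" form field. Second part: the "audio_file" upload.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="name"\r\n\r\n'
        f"{name}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"  # assumed; adjust for other audio formats
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/speakers/",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (requires a running server and a real audio file):
#   with open("/path/to/sample.wav", "rb") as f:
#       request = build_speaker_upload("Alex Johnson", "sample.wav", f.read())
#   with urllib.request.urlopen(request) as response:
#       print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.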
GET /api/speakers/{speaker_id}
Get details for a specific speaker.
Path Parameters:
- speaker_id (string, required): Unique speaker identifier
Response Model: Speaker
Example Response:
{
"id": "speaker_001",
"name": "John Doe",
"sample_path": "/path/to/speaker_samples/john_doe.wav"
}
Status Codes:
- 200: Success
- 404: Speaker not found
DELETE /api/speakers/{speaker_id}
Delete a speaker by ID.
Path Parameters:
- speaker_id (string, required): Unique speaker identifier
Example Response:
{
"message": "Speaker deleted successfully"
}
Status Codes:
- 200: Speaker deleted successfully
- 404: Speaker not found
🎭 Dialog Generation
POST /api/dialog/generate_line
Generate audio for a single dialog line (speech or silence).
Request Body: Raw JSON object representing either a SpeechItem or SilenceItem
Speech Item Example:
{
"type": "speech",
"speaker_id": "speaker_001",
"text": "Hello, this is a test message.",
"exaggeration": 0.7,
"cfg_weight": 0.6,
"temperature": 0.8,
"use_existing_audio": false,
"audio_url": null
}
Silence Item Example:
{
"type": "silence",
"duration": 2.0,
"use_existing_audio": false,
"audio_url": null
}
Response:
{
"audio_url": "/generated_audio/line_abc123def456.wav",
"type": "speech",
"text": "Hello, this is a test message."
}
Status Codes:
- 200: Audio generated successfully
- 400: Invalid request format or unknown dialog item type
- 404: Speaker not found
- 500: Server error during generation
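A single-line request can be sketched in Python with the standard library as follows. The function names `build_line_request` and `generate_line` are hypothetical, and the client-side type check only mirrors the documented 400 case; it does not replace server-side validation.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_line_request(item):
    """Wrap a SpeechItem or SilenceItem dict in a JSON POST request."""
    if item.get("type") not in ("speech", "silence"):
        raise ValueError(f"Unknown dialog item type: {item.get('type')!r}")
    return urllib.request.Request(
        f"{BASE_URL}/api/dialog/generate_line",
        data=json.dumps(item).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_line(item):
    """Send the request and return the decoded JSON body.

    Needs a running server; urlopen raises urllib.error.HTTPError
    for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(build_line_request(item)) as response:
        return json.load(response)
```

Calling `generate_line({"type": "silence", "duration": 2.0})` against a running server should return an object containing `audio_url`, as in the response example above.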
POST /api/dialog/generate
Generate a complete dialog from multiple speech and silence items.
Request Model: DialogRequest
Request Body:
{
"dialog_items": [
{
"type": "speech",
"speaker_id": "speaker_001",
"text": "Welcome to our podcast!",
"exaggeration": 0.5,
"cfg_weight": 0.5,
"temperature": 0.8
},
{
"type": "silence",
"duration": 1.0
},
{
"type": "speech",
"speaker_id": "speaker_002",
"text": "Thank you for having me!",
"exaggeration": 0.6,
"cfg_weight": 0.7,
"temperature": 0.9
}
],
"output_base_name": "podcast_episode_01"
}
Response Model: DialogResponse
Example Response:
{
"log": "Processing dialog with 3 items...\nGenerating speech for item 1...\nGenerating silence for item 2...\nGenerating speech for item 3...\nConcatenating audio segments...\nZIP archive created at: /path/to/output.zip",
"concatenated_audio_url": "/generated_audio/podcast_episode_01_concatenated.wav",
"zip_archive_url": "/generated_audio/podcast_episode_01_archive.zip",
"temp_dir_path": "/path/to/temp/directory",
"error_message": null
}
Status Codes:
- 200: Dialog generated successfully
- 400: Invalid request format or validation errors
- 404: Speaker or file not found
- 500: Server error during generation
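When a script is available as a list of speaker/text pairs, the request body can be assembled programmatically. The sketch below is one possible approach: `build_dialog_request` is a hypothetical helper, and inserting a fixed pause between consecutive lines is an illustrative choice, not something the API requires.

```python
def build_dialog_request(lines, output_base_name, pause_seconds=1.0):
    """Build a DialogRequest body from (speaker_id, text) tuples.

    A silence item of `pause_seconds` is inserted between consecutive
    speech items; pass 0 to skip the pauses.
    """
    items = []
    for index, (speaker_id, text) in enumerate(lines):
        if index > 0 and pause_seconds > 0:
            items.append({"type": "silence", "duration": pause_seconds})
        items.append({
            "type": "speech",
            "speaker_id": speaker_id,
            "text": text,
            # Documented default values, stated explicitly
            "exaggeration": 0.5,
            "cfg_weight": 0.5,
            "temperature": 0.8,
        })
    return {"dialog_items": items, "output_base_name": output_base_name}
```

The returned dictionary can be serialized with `json.dumps` and posted to `/api/dialog/generate`.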
📁 Static File Serving
GET /generated_audio/{filename}
Serve generated audio files and ZIP archives.
Path Parameters:
- filename (string, required): Name of the generated file
Response: Binary audio file or ZIP archive
Example URLs:
http://127.0.0.1:8000/generated_audio/dialog_concatenated.wav
http://127.0.0.1:8000/generated_audio/dialog_archive.zip
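The URLs returned by the dialog endpoints are relative paths such as `/generated_audio/...`, so a client has to resolve them against the base URL before downloading. A minimal sketch, with hypothetical helper names:

```python
import shutil
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://127.0.0.1:8000"


def absolute_audio_url(relative_url):
    """Resolve a relative URL from an API response against the API base URL."""
    return urljoin(BASE_URL, relative_url)


def download_generated_file(relative_url, destination):
    """Stream a generated file to disk (needs a running server)."""
    with urllib.request.urlopen(absolute_audio_url(relative_url)) as response:
        with open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
```

For example, `absolute_audio_url("/generated_audio/dialog_archive.zip")` yields the second example URL listed above.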
📋 Data Models
Speaker Models
Speaker
{
"id": "string",
"name": "string",
"sample_path": "string|null"
}
SpeakerResponse
{
"id": "string",
"name": "string",
"message": "string|null"
}
Dialog Models
SpeechItem
{
"type": "speech",
"speaker_id": "string",
"text": "string",
"exaggeration": 0.5, // 0.0-2.0, controls expressiveness
"cfg_weight": 0.5, // 0.0-2.0, alignment with speaker characteristics
"temperature": 0.8, // 0.0-2.0, randomness in generation
"use_existing_audio": false,
"audio_url": "string|null"
}
SilenceItem
{
"type": "silence",
"duration": 1.0, // seconds, must be > 0
"use_existing_audio": false,
"audio_url": "string|null"
}
DialogRequest
{
"dialog_items": [
// Array of SpeechItem and/or SilenceItem objects
],
"output_base_name": "string" // Base name for output files
}
DialogResponse
{
"log": "string", // Processing log
"concatenated_audio_url": "string|null", // URL to final audio
"zip_archive_url": "string|null", // URL to ZIP archive
"temp_dir_path": "string|null", // Server temp directory
"error_message": "string|null" // Error details if failed
}
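On the client side, the request models can be mirrored with dataclasses so that defaults and basic constraints live in one place. This is only a sketch of a client-side mirror; the server's actual model definitions are not shown in this document and may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SpeechItem:
    speaker_id: str
    text: str
    exaggeration: float = 0.5
    cfg_weight: float = 0.5
    temperature: float = 0.8
    use_existing_audio: bool = False
    audio_url: Optional[str] = None
    type: str = "speech"


@dataclass
class SilenceItem:
    duration: float
    use_existing_audio: bool = False
    audio_url: Optional[str] = None
    type: str = "silence"

    def __post_init__(self):
        # Mirrors the documented constraint: duration must be > 0
        if self.duration <= 0:
            raise ValueError("duration must be > 0")


def to_payload(item):
    """Convert a dataclass instance into a JSON-serializable dict."""
    return asdict(item)
```

`to_payload(SpeechItem("speaker_001", "Hello"))` produces a dictionary with the same keys as the SpeechItem model above.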
🎛️ TTS Parameters
Exaggeration (exaggeration)
- Range: 0.0 - 2.0
- Default: 0.5
- Description: Controls the expressiveness of speech. Higher values produce more exaggerated, emotional speech.
CFG Weight (cfg_weight)
- Range: 0.0 - 2.0
- Default: 0.5
- Description: Classifier-Free Guidance weight. Higher values make speech more aligned with the prompt text and speaker characteristics.
Temperature (temperature)
- Range: 0.0 - 2.0
- Default: 0.8
- Description: Controls randomness in generation. Lower values produce more deterministic speech, higher values add more variation.
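A client can apply the documented defaults and keep values inside the documented ranges before sending a request. The sketch below clamps out-of-range values; how the server itself treats out-of-range input is not specified here, so this is purely a client-side safeguard, and `normalize_tts_params` is a hypothetical name.

```python
# Ranges and defaults as documented above
TTS_PARAM_RANGES = {
    "exaggeration": (0.0, 2.0),
    "cfg_weight": (0.0, 2.0),
    "temperature": (0.0, 2.0),
}
TTS_PARAM_DEFAULTS = {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8}


def normalize_tts_params(**overrides):
    """Return all three TTS parameters, clamping overrides to their ranges."""
    params = dict(TTS_PARAM_DEFAULTS)
    for name, value in overrides.items():
        if name not in TTS_PARAM_RANGES:
            raise KeyError(f"Unknown TTS parameter: {name}")
        low, high = TTS_PARAM_RANGES[name]
        params[name] = min(max(float(value), low), high)
    return params
```

The result can be merged into a speech item with `{**item, **normalize_tts_params(exaggeration=0.7)}`.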
🔧 Configuration
Environment Variables
The API uses the following directory structure (configurable in app/config.py):
- Speaker Samples: {PROJECT_ROOT}/speaker_data/speaker_samples/
- Generated Audio: {PROJECT_ROOT}/backend/tts_generated_dialogs/
- Temporary Files: {PROJECT_ROOT}/tts_temp_outputs/
CORS Settings
- Allowed Origins: http://localhost:8001, http://127.0.0.1:8001
- Allowed Methods: All
- Allowed Headers: All
- Credentials: Enabled
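An explicit origin list is matched against the browser's Origin header as a whole string, so scheme, host, and port must all agree. The following sketch only illustrates that rule on the client side under this exact-match assumption; it is not the server's implementation.

```python
# Origins listed in the CORS settings above
ALLOWED_ORIGINS = {"http://localhost:8001", "http://127.0.0.1:8001"}


def is_origin_allowed(origin):
    """Exact string comparison: scheme, host, and port must all match."""
    return origin in ALLOWED_ORIGINS
```

Under this rule a frontend served from `http://localhost:3000` would be rejected by the browser until its origin is added to the list.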
🚀 Usage Examples
Python Client Example
import requests

# Base URL of the API server
BASE_URL = "http://127.0.0.1:8000"

# Get all speakers
speakers = requests.get(f"{BASE_URL}/api/speakers/").json()
print("Available speakers:", speakers)
if not speakers:
    # The dialog below needs at least one speaker
    raise SystemExit("No speakers found; create one via POST /api/speakers/ first.")

# Generate a simple dialog
dialog_request = {
    "dialog_items": [
        {
            "type": "speech",
            "speaker_id": speakers[0]["id"],
            "text": "Hello world!",
            "exaggeration": 0.7,
            "cfg_weight": 0.6,
            "temperature": 0.9
        },
        {
            "type": "silence",
            "duration": 1.0
        }
    ],
    "output_base_name": "test_dialog"
}

response = requests.post(
    f"{BASE_URL}/api/dialog/generate",
    json=dialog_request
)

if response.status_code == 200:
    result = response.json()
    print("Dialog generated!")
    print("Audio URL:", result["concatenated_audio_url"])
    print("ZIP URL:", result["zip_archive_url"])
else:
    print("Error:", response.text)
JavaScript/Frontend Example
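// Base URL of the API server
const BASE_URL = 'http://127.0.0.1:8000';

// Generate dialog
const dialogRequest = {
  dialog_items: [
    {
      type: "speech",
      speaker_id: "speaker_001",
      text: "Welcome to our show!",
      exaggeration: 0.6,
      cfg_weight: 0.5,
      temperature: 0.8
    }
  ],
  output_base_name: "intro"
};

fetch(`${BASE_URL}/api/dialog/generate`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(dialogRequest)
})
  .then(response => {
    // fetch resolves on HTTP errors too, so check the status explicitly
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  })
  .then(data => {
    console.log('Dialog generated:', data);
    // The returned URL is relative to the API server, so resolve it against BASE_URL
    const audio = new Audio(new URL(data.concatenated_audio_url, BASE_URL).href);
    audio.play();
  })
  .catch(error => console.error('Dialog generation failed:', error));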
⚠️ Error Handling
Common Error Responses
400 Bad Request
{
"detail": "Invalid value or configuration: Text cannot be empty"
}
404 Not Found
{
"detail": "Speaker sample for ID 'invalid_speaker' not found."
}
500 Internal Server Error
{
"detail": "Runtime error during dialog generation: CUDA out of memory"
}
Error Categories
- Validation Errors: Invalid input format, missing required fields
- Resource Errors: Speaker not found, file not accessible
- Processing Errors: TTS model failures, audio processing issues
- System Errors: Memory issues, disk space, model loading failures
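Since error responses carry a `detail` field, a client can turn them into readable messages with a small helper. The mapping from status code to category below is an assumption based on the examples above (400 for validation, 404 for resources, 500 for processing and system failures), and `describe_error` is a hypothetical name.

```python
import json

# Assumed mapping, derived from the example responses above
STATUS_CATEGORIES = {
    400: "validation",
    404: "resource",
    500: "processing or system",
}


def describe_error(status_code, body):
    """Format an error response, falling back to the raw body if it is not JSON."""
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        # Body is not JSON, or is JSON but not an object
        detail = body
    category = STATUS_CATEGORIES.get(status_code, "unknown")
    return f"[{status_code} {category}] {detail}"
```

With the `requests` client shown earlier, this would be called as `describe_error(response.status_code, response.text)`.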
🔍 Development & Debugging
Running the Server
# From project root
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
API Documentation
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc
Logging
The API provides detailed logging in the DialogResponse.log field for dialog generation operations.
File Management
- Generated files are stored in backend/tts_generated_dialogs/
- Temporary processing files are kept for inspection (not auto-deleted)
- ZIP archives contain individual audio segments plus concatenated result
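To check what a downloaded ZIP archive contains, its entries can be listed without extracting them. This sketch assumes the audio segments are WAV files, as in the example URLs in this document; the exact file names inside the archive are not specified here.

```python
import io
import zipfile
from typing import List


def list_archive_audio(zip_bytes: bytes) -> List[str]:
    """Return the sorted names of .wav entries in an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".wav")
        )
```

The bytes can come from any HTTP client, for example the body of a GET request to the `zip_archive_url` returned by the dialog endpoint.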
📝 Notes
- The API automatically loads and unloads TTS models to manage memory usage
- Speaker audio samples should be clear, single-speaker recordings for best results
- Large dialogs may take significant time to process depending on hardware
- Generated files are served statically and persist until manually cleaned up
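Because generated files persist until they are removed by hand, a periodic cleanup may be useful. The following is a minimal sketch rather than part of the API: `remove_old_files` is a hypothetical helper, and the age threshold is an arbitrary choice. It defaults to a dry run so that nothing is deleted by accident.

```python
import time
from pathlib import Path


def remove_old_files(directory, max_age_days, dry_run=True):
    """Delete regular files older than `max_age_days`; return their names.

    With dry_run=True (the default) files are only reported, not deleted.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path.name)
    return removed
```

For example, `remove_old_files("backend/tts_generated_dialogs", 7)` lists files older than a week in the generated audio directory without deleting them.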
Generated on: 2025-06-06
API Version: 0.1.0