feat/frontend-phase1 #1
@ -0,0 +1,518 @@

# Chatterbox TTS API Reference

## Overview

The Chatterbox TTS API is a FastAPI backend that provides text-to-speech with speaker management and dialog generation. Clients can create custom speakers from audio samples and generate multi-speaker dialogs that combine speech, silences, and per-line TTS parameters.

**Base URL**: `http://127.0.0.1:8000`

**API Version**: 0.1.0

**Framework**: FastAPI with automatic OpenAPI documentation

## Quick Start

- **Interactive API Documentation**: `http://127.0.0.1:8000/docs` (Swagger UI)
- **Alternative Documentation**: `http://127.0.0.1:8000/redoc` (ReDoc)
- **OpenAPI Schema**: `http://127.0.0.1:8000/openapi.json`

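FastAPI publishes the complete schema at `/openapi.json`, so a client can list the available routes without opening the docs pages. A minimal sketch using only the Python standard library, assuming the server is running at the base URL above; `fetch_openapi_schema` and `list_routes` are illustrative helper names, not part of the API:

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local server

# Operation keys allowed in an OpenAPI path item; other keys (e.g. "parameters") are skipped.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_openapi_schema(base_url: str = BASE_URL) -> dict:
    """Download the OpenAPI document that FastAPI serves at /openapi.json."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_routes(schema: dict) -> list:
    """Return sorted 'METHOD /path' strings for every operation in the schema."""
    routes = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)
```

Against a running server, `list_routes(fetch_openapi_schema())` returns one `METHOD /path` string per operation, for example `POST /api/dialog/generate`.
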

## Authentication

The API currently requires no authentication. CORS allows browser requests from `http://localhost:8001` and `http://127.0.0.1:8001`.

---

## Endpoints

### 🏠 Root Endpoint

#### `GET /`

Returns a welcome message; use it as a quick check that the API is running.

**Response:**

```json
{
  "message": "Welcome to the Chatterbox TTS API!"
}
```

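Because `GET /` takes no parameters, it works as a lightweight liveness probe. A sketch assuming a local server and the response shape shown above; `check_api` is a hypothetical helper that returns `None` instead of raising when the server cannot be reached:

```python
import json
from typing import Optional
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local server


def check_api(base_url: str = BASE_URL, timeout: float = 5.0) -> Optional[str]:
    """Return the welcome message from GET /, or None if the API is unreachable."""
    try:
        with urlopen(f"{base_url}/", timeout=timeout) as resp:
            return json.load(resp).get("message")
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None
```

A script could call `check_api()` before submitting long dialog jobs and stop early if it returns `None`.
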

---

## 👥 Speaker Management

### `GET /api/speakers/`

Retrieve all available speakers.

**Response Model:** `List[Speaker]`

**Example Response:**

```json
[
  {
    "id": "speaker_001",
    "name": "John Doe",
    "sample_path": "/path/to/speaker_samples/john_doe.wav"
  },
  {
    "id": "speaker_002",
    "name": "Jane Smith",
    "sample_path": "/path/to/speaker_samples/jane_smith.wav"
  }
]
```

**Status Codes:**

- `200`: Success

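Dialog requests reference speakers by `id`, while people usually think in names. A hedged sketch (standard library only, local server assumed) that fetches the list and builds a name-to-id lookup; `list_speakers` and `speaker_ids_by_name` are illustrative names, and since this reference does not say whether names are unique, later entries silently overwrite earlier ones with the same name:

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local server


def list_speakers(base_url: str = BASE_URL) -> list:
    """GET /api/speakers/ and return the decoded list of Speaker objects."""
    with urlopen(f"{base_url}/api/speakers/", timeout=10) as resp:
        return json.load(resp)


def speaker_ids_by_name(speakers: list) -> dict:
    """Map each speaker's display name to its id (later duplicates win)."""
    return {s["name"]: s["id"] for s in speakers}
```

With the example response above, the lookup maps `"John Doe"` to `"speaker_001"` and `"Jane Smith"` to `"speaker_002"`.
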

---

### `POST /api/speakers/`

Create a new speaker from an audio sample.

**Request Type:** `multipart/form-data`

**Parameters:**

- `name` (form field, required): Speaker name
- `audio_file` (file upload, required): Audio sample file (WAV, MP3, etc.)

**Response Model:** `SpeakerResponse`

**Example Response:**

```json
{
  "id": "speaker_003",
  "name": "Alex Johnson",
  "message": "Speaker added successfully."
}
```

**Status Codes:**

- `201`: Speaker created successfully
- `400`: Invalid file type or missing file
- `500`: Server error during speaker creation

**Example cURL:**

```bash
curl -X POST "http://127.0.0.1:8000/api/speakers/" \
  -F "name=Alex Johnson" \
  -F "audio_file=@/path/to/sample.wav"
```

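The cURL call above can also be made from Python without third-party packages by encoding the multipart body by hand. This is a sketch under assumptions: `encode_multipart` and `create_speaker` are hypothetical helpers, the part's MIME type is guessed from the file extension, and filenames containing double quotes are not escaped:

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local server


def encode_multipart(fields: dict, files: dict, boundary: str = "") -> tuple:
    """Encode text fields and {name: (filename, bytes)} file parts as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", value.encode("utf-8")]
    for name, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
                  f"Content-Type: {ctype}".encode(), b"", data]
    parts += [f"--{boundary}--".encode(), b""]  # closing boundary plus trailing CRLF
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def create_speaker(name: str, audio_path: str, base_url: str = BASE_URL) -> dict:
    """POST /api/speakers/ with the same two form fields as the cURL example."""
    path = Path(audio_path)
    body, content_type = encode_multipart(
        {"name": name}, {"audio_file": (path.name, path.read_bytes())})
    req = Request(f"{base_url}/api/speakers/", data=body,
                  headers={"Content-Type": content_type}, method="POST")
    # urlopen raises urllib.error.HTTPError for 400/500 responses.
    with urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

`create_speaker("Alex Johnson", "/path/to/sample.wav")` would return the `SpeakerResponse` dictionary. If the `requests` package is available, `requests.post(url, data={"name": ...}, files={"audio_file": open(path, "rb")})` builds the same body with less code.
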

---

### `GET /api/speakers/{speaker_id}`

Get details for a specific speaker.

**Path Parameters:**

- `speaker_id` (string, required): Unique speaker identifier

**Response Model:** `Speaker`

**Example Response:**

```json
{
  "id": "speaker_001",
  "name": "John Doe",
  "sample_path": "/path/to/speaker_samples/john_doe.wav"
}
```

**Status Codes:**

- `200`: Success
- `404`: Speaker not found

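Since a missing speaker is reported as `404`, a client may prefer to treat that case as "no result" rather than as an exception. A sketch assuming a local server; `speaker_url` and `get_speaker` are hypothetical helpers, and the ID is percent-encoded so it always occupies a single path segment:

```python
import json
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local server


def speaker_url(speaker_id: str, base_url: str = BASE_URL) -> str:
    """Build /api/speakers/{speaker_id}, encoding '/' and spaces in the id."""
    return f"{base_url}/api/speakers/{quote(speaker_id, safe='')}"


def get_speaker(speaker_id: str, base_url: str = BASE_URL) -> Optional[dict]:
    """Return the Speaker object, or None when the API answers 404."""
    try:
        with urlopen(speaker_url(speaker_id, base_url), timeout=10) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors (e.g. 500) still propagate
```

UUID-style IDs pass through `quote` unchanged, so the encoding only matters for unusual IDs.
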

---

### `DELETE /api/speakers/{speaker_id}`

Delete a speaker by ID.

**Path Parameters:**

- `speaker_id` (string, required): Unique speaker identifier

**Example Response:**

```json
{
  "message": "Speaker deleted successfully"
}
```

**Status Codes:**

- `200`: Speaker deleted successfully
- `404`: Speaker not found

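A matching sketch for deletion, again standard library only and assuming a local server. `build_delete_request` and `delete_speaker` are illustrative names; separating request construction from sending makes it possible to inspect the request without touching the server:

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local server


def build_delete_request(speaker_id: str, base_url: str = BASE_URL) -> Request:
    """Prepare DELETE /api/speakers/{speaker_id} without sending it."""
    return Request(f"{base_url}/api/speakers/{quote(speaker_id, safe='')}", method="DELETE")


def delete_speaker(speaker_id: str, base_url: str = BASE_URL) -> bool:
    """Return True if the speaker was deleted, False if the API reports 404."""
    try:
        with urlopen(build_delete_request(speaker_id, base_url), timeout=10) as resp:
            json.load(resp)  # e.g. {"message": "Speaker deleted successfully"}
            return True
    except HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Returning `False` on `404` makes repeated deletes of the same ID harmless for the caller.
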

---

## 🎭 Dialog Generation

### `POST /api/dialog/generate_line`

Generate audio for a single dialog line (speech or silence).

**Request Body:** A single JSON object, either a `SpeechItem` or a `SilenceItem`, distinguished by its `type` field.

#### Speech Item Example:

```json
{
  "type": "speech",
  "speaker_id": "speaker_001",
  "text": "Hello, this is a test message.",
  "exaggeration": 0.7,
  "cfg_weight": 0.6,
  "temperature": 0.8,
  "use_existing_audio": false,
  "audio_url": null
}
```

#### Silence Item Example:

```json
{
  "type": "silence",
  "duration": 2.0,
  "use_existing_audio": false,
  "audio_url": null
}
```

**Response:**

```json
{
  "audio_url": "/generated_audio/line_abc123def456.wav",
  "type": "speech",
  "text": "Hello, this is a test message."
}
```

**Status Codes:**

- `200`: Audio generated successfully
- `400`: Invalid request format or unknown dialog item type
- `404`: Speaker not found
- `500`: Server error during generation

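A sketch for calling this endpoint from Python (standard library only, local server assumed). The helper names are hypothetical. The returned `audio_url` is a server-relative path, so the sketch resolves it against the base URL with `urllib.parse.urljoin`; the generous default timeout is an assumption, since generation time depends on hardware:

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local server


def build_json_request(path: str, payload: dict, base_url: str = BASE_URL) -> Request:
    """Prepare a JSON POST request for an API path such as /api/dialog/generate_line."""
    return Request(urljoin(base_url, path), data=json.dumps(payload).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


def absolute_url(path: str, base_url: str = BASE_URL) -> str:
    """Resolve a server-relative path like /generated_audio/x.wav to a full URL."""
    return urljoin(base_url, path)


def generate_line(item: dict, base_url: str = BASE_URL, timeout: float = 300.0) -> str:
    """Render one SpeechItem/SilenceItem and return an absolute URL to its audio."""
    req = build_json_request("/api/dialog/generate_line", item, base_url)
    with urlopen(req, timeout=timeout) as resp:  # HTTPError on 400/404/500
        result = json.load(resp)
    return absolute_url(result["audio_url"], base_url)
```

For the speech example above, `generate_line(...)` would return something like `http://127.0.0.1:8000/generated_audio/line_abc123def456.wav`.
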

---

### `POST /api/dialog/generate`

Generate a complete dialog from multiple speech and silence items.

**Request Model:** `DialogRequest`

**Request Body:**

```json
{
  "dialog_items": [
    {
      "type": "speech",
      "speaker_id": "speaker_001",
      "text": "Welcome to our podcast!",
      "exaggeration": 0.5,
      "cfg_weight": 0.5,
      "temperature": 0.8
    },
    {
      "type": "silence",
      "duration": 1.0
    },
    {
      "type": "speech",
      "speaker_id": "speaker_002",
      "text": "Thank you for having me!",
      "exaggeration": 0.6,
      "cfg_weight": 0.7,
      "temperature": 0.9
    }
  ],
  "output_base_name": "podcast_episode_01"
}
```

**Response Model:** `DialogResponse`

**Example Response:**

```json
{
  "log": "Processing dialog with 3 items...\nGenerating speech for item 1...\nGenerating silence for item 2...\nGenerating speech for item 3...\nConcatenating audio segments...\nZIP archive created at: /path/to/output.zip",
  "concatenated_audio_url": "/generated_audio/podcast_episode_01_concatenated.wav",
  "zip_archive_url": "/generated_audio/podcast_episode_01_archive.zip",
  "temp_dir_path": "/path/to/temp/directory",
  "error_message": null
}
```

**Status Codes:**

- `200`: Dialog generated successfully
- `400`: Invalid request format or validation errors
- `404`: Speaker or file not found
- `500`: Server error during generation

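The request body above alternates speech lines with short silences. A hypothetical helper that builds that shape from a list of `(speaker_id, text)` pairs; the default parameter values mirror the documented defaults, and applying one shared setting to every line is a simplification, since the API accepts per-item values:

```python
def build_dialog(lines: list, output_base_name: str, pause: float = 1.0,
                 exaggeration: float = 0.5, cfg_weight: float = 0.5,
                 temperature: float = 0.8) -> dict:
    """Turn (speaker_id, text) pairs into a DialogRequest body.

    A silence item of `pause` seconds is placed between consecutive speech lines;
    pass pause=0 to omit them (a SilenceItem's duration must be > 0).
    """
    items = []
    for i, (speaker_id, text) in enumerate(lines):
        if i > 0 and pause > 0:
            items.append({"type": "silence", "duration": pause})
        items.append({"type": "speech", "speaker_id": speaker_id, "text": text,
                      "exaggeration": exaggeration, "cfg_weight": cfg_weight,
                      "temperature": temperature})
    return {"dialog_items": items, "output_base_name": output_base_name}
```

For two lines, this yields a speech, silence, speech sequence like the example request, with uniform TTS settings.
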

---

## 📁 Static File Serving

### `GET /generated_audio/{filename}`

Serve generated audio files and ZIP archives.

**Path Parameters:**

- `filename` (string, required): Name of the generated file

**Response:** Binary audio file or ZIP archive

The `audio_url`, `concatenated_audio_url`, and `zip_archive_url` values returned by the dialog endpoints are server-relative paths under this route; prefix them with the base URL to fetch the files.

**Example URLs:**

- `http://127.0.0.1:8000/generated_audio/dialog_concatenated.wav`
- `http://127.0.0.1:8000/generated_audio/dialog_archive.zip`

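Generated files can be large, so a client may want to stream them to disk rather than read them fully into memory. A sketch (standard library only, local server assumed) with hypothetical helpers `local_name` and `download`; it accepts either a server-relative path such as `/generated_audio/dialog_archive.zip` or a full URL:

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local server


def local_name(url: str) -> str:
    """Return the file name at the end of a /generated_audio/... path or URL."""
    return PurePosixPath(urlparse(url).path).name


def download(url_or_path: str, dest_dir: str = ".", base_url: str = BASE_URL) -> Path:
    """Stream a generated WAV or ZIP into dest_dir in chunks and return its local path."""
    target = Path(dest_dir) / local_name(url_or_path)
    with urlopen(urljoin(base_url, url_or_path), timeout=60) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

`shutil.copyfileobj` copies in fixed-size chunks, which keeps memory use flat even for long dialogs.
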

---

## 📋 Data Models

### Speaker Models

#### `Speaker`

```json
{
  "id": "string",
  "name": "string",
  "sample_path": "string|null"
}
```

#### `SpeakerResponse`

```json
{
  "id": "string",
  "name": "string",
  "message": "string|null"
}
```

### Dialog Models

#### `SpeechItem`

```jsonc
{
  "type": "speech",
  "speaker_id": "string",
  "text": "string",
  "exaggeration": 0.5,   // 0.0-2.0, controls expressiveness
  "cfg_weight": 0.5,     // 0.0-2.0, alignment with speaker characteristics
  "temperature": 0.8,    // 0.0-2.0, randomness in generation
  "use_existing_audio": false,
  "audio_url": "string|null"
}
```

#### `SilenceItem`

```jsonc
{
  "type": "silence",
  "duration": 1.0,   // seconds, must be > 0
  "use_existing_audio": false,
  "audio_url": "string|null"
}
```

#### `DialogRequest`

```jsonc
{
  "dialog_items": [
    // Array of SpeechItem and/or SilenceItem objects
  ],
  "output_base_name": "string"   // Base name for output files
}
```

#### `DialogResponse`

```jsonc
{
  "log": "string",                           // Processing log
  "concatenated_audio_url": "string|null",   // URL to final audio
  "zip_archive_url": "string|null",          // URL to ZIP archive
  "temp_dir_path": "string|null",            // Server temp directory
  "error_message": "string|null"             // Error details if failed
}
```

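For client code, the dialog models can be mirrored as Python dataclasses that catch obviously invalid values before a request is sent. This is a sketch, not the server's own model definitions: field names and defaults come from the schemas above, while the range and non-empty-text checks are client-side assumptions based on the documented ranges and the `Text cannot be empty` error message:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SpeechItem:
    """Client-side mirror of SpeechItem (field names and defaults from the schema)."""
    speaker_id: str
    text: str
    exaggeration: float = 0.5
    cfg_weight: float = 0.5
    temperature: float = 0.8
    use_existing_audio: bool = False
    audio_url: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.text.strip():
            raise ValueError("text cannot be empty")
        for name in ("exaggeration", "cfg_weight", "temperature"):
            value = getattr(self, name)
            if not 0.0 <= value <= 2.0:  # documented range for all three parameters
                raise ValueError(f"{name} must be within 0.0-2.0, got {value}")

    def to_json(self) -> dict:
        return {"type": "speech", **asdict(self)}


@dataclass
class SilenceItem:
    """Client-side mirror of SilenceItem."""
    duration: float
    use_existing_audio: bool = False
    audio_url: Optional[str] = None

    def __post_init__(self) -> None:
        if self.duration <= 0:
            raise ValueError("duration must be > 0 seconds")

    def to_json(self) -> dict:
        return {"type": "silence", **asdict(self)}
```

A list such as `[SpeechItem("speaker_001", "Hello!").to_json(), SilenceItem(1.0).to_json()]` can be used directly as `dialog_items`.
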

---

## 🎛️ TTS Parameters

### Exaggeration (`exaggeration`)

- **Range**: 0.0 - 2.0
- **Default**: 0.5
- **Description**: Controls the expressiveness of speech. Higher values produce more exaggerated, emotional speech.

### CFG Weight (`cfg_weight`)

- **Range**: 0.0 - 2.0
- **Default**: 0.5
- **Description**: Classifier-free guidance weight. Higher values make speech adhere more closely to the prompt text and speaker characteristics.

### Temperature (`temperature`)

- **Range**: 0.0 - 2.0
- **Default**: 0.8
- **Description**: Controls randomness in generation. Lower values produce more deterministic speech; higher values add more variation.

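Because the effect of these parameters is easiest to judge by ear, one approach is to render the same line under a small grid of settings via `POST /api/dialog/generate_line`. A hypothetical sketch that only builds the request bodies (sending them is left to whichever client is in use) and rejects values outside the documented 0.0 - 2.0 range:

```python
from itertools import product


def parameter_grid(speaker_id: str, text: str, exaggerations=(0.5,), cfg_weights=(0.5,),
                   temperatures=(0.8,)) -> list:
    """Build one generate_line payload per combination of TTS settings.

    Defaults match the documented defaults, so unspecified axes stay fixed.
    """
    payloads = []
    for ex, cfg, temp in product(exaggerations, cfg_weights, temperatures):
        for name, value in (("exaggeration", ex), ("cfg_weight", cfg), ("temperature", temp)):
            if not 0.0 <= value <= 2.0:
                raise ValueError(f"{name}={value} is outside 0.0-2.0")
        payloads.append({"type": "speech", "speaker_id": speaker_id, "text": text,
                         "exaggeration": ex, "cfg_weight": cfg, "temperature": temp})
    return payloads
```

Varying one axis at a time (for example only `temperatures`) keeps the number of renders small and makes differences easier to attribute.
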

---

## 🔧 Configuration

### Directory Layout

The API uses the following directories, configurable in `app/config.py`:

- **Speaker Samples**: `{PROJECT_ROOT}/speaker_data/speaker_samples/`
- **Generated Audio**: `{PROJECT_ROOT}/backend/tts_generated_dialogs/`
- **Temporary Files**: `{PROJECT_ROOT}/tts_temp_outputs/`

### CORS Settings

- Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001`
- Allowed Methods: All
- Allowed Headers: All
- Credentials: Enabled


---

## 🚀 Usage Examples

### Python Client Example

```python
import requests

# Base URL of the running API server
BASE_URL = "http://127.0.0.1:8000"

# Get all speakers (fail fast on a non-2xx response)
resp = requests.get(f"{BASE_URL}/api/speakers/")
resp.raise_for_status()
speakers = resp.json()
print("Available speakers:", speakers)
if not speakers:
    raise SystemExit("No speakers available; create one via POST /api/speakers/ first.")

# Generate a simple dialog
dialog_request = {
    "dialog_items": [
        {
            "type": "speech",
            "speaker_id": speakers[0]["id"],
            "text": "Hello world!",
            "exaggeration": 0.7,
            "cfg_weight": 0.6,
            "temperature": 0.9
        },
        {
            "type": "silence",
            "duration": 1.0
        }
    ],
    "output_base_name": "test_dialog"
}

response = requests.post(
    f"{BASE_URL}/api/dialog/generate",
    json=dialog_request
)

if response.status_code == 200:
    result = response.json()
    if result.get("error_message"):
        print("Generation reported an error:", result["error_message"])
    print("Dialog generated!")
    # Both URLs are server-relative ("/generated_audio/..."); prefix BASE_URL to download them
    print("Audio URL:", result["concatenated_audio_url"])
    print("ZIP URL:", result["zip_archive_url"])
else:
    print("Error:", response.text)
```

### JavaScript/Frontend Example

```javascript
// API origin; the CORS settings expect the frontend itself on port 8001
const API_BASE = 'http://127.0.0.1:8000';

// Generate dialog
const dialogRequest = {
  dialog_items: [
    {
      type: "speech",
      speaker_id: "speaker_001",
      text: "Welcome to our show!",
      exaggeration: 0.6,
      cfg_weight: 0.5,
      temperature: 0.8
    }
  ],
  output_base_name: "intro"
};

fetch(`${API_BASE}/api/dialog/generate`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(dialogRequest)
})
  .then(response => {
    if (!response.ok) {
      throw new Error(`Request failed with HTTP ${response.status}`);
    }
    return response.json();
  })
  .then(data => {
    console.log('Dialog generated:', data);
    // concatenated_audio_url is server-relative ("/generated_audio/..."), so resolve it
    // against the API origin; a bare relative URL would hit the frontend origin instead.
    if (data.concatenated_audio_url) {
      const audio = new Audio(`${API_BASE}${data.concatenated_audio_url}`);
      audio.play().catch(err => console.warn('Playback was blocked:', err));
    }
  })
  .catch(err => console.error('Dialog generation failed:', err));
```


---

## ⚠️ Error Handling

### Common Error Responses

#### 400 Bad Request

```json
{
  "detail": "Invalid value or configuration: Text cannot be empty"
}
```

#### 404 Not Found

```json
{
  "detail": "Speaker sample for ID 'invalid_speaker' not found."
}
```

#### 500 Internal Server Error

```json
{
  "detail": "Runtime error during dialog generation: CUDA out of memory"
}
```

### Error Categories

- **Validation Errors**: Invalid input format, missing required fields
- **Resource Errors**: Speaker not found, file not accessible
- **Processing Errors**: TTS model failures, audio processing issues
- **System Errors**: Memory issues, disk space, model loading failures

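Error bodies carry their message in `detail`. FastAPI, by default, answers request-body validation failures with status `422` and a `detail` list of objects with `loc` and `msg` fields, unless the application installs its own handler; this reference does not say which applies here. A hedged sketch of a helper (the name `error_detail` is illustrative) that handles both shapes and falls back to the raw body:

```python
import json


def error_detail(body: bytes) -> str:
    """Extract a readable message from an error response body."""
    try:
        payload = json.loads(body)
    except ValueError:  # not JSON (also covers invalid UTF-8)
        payload = None
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if detail is None:
        return body.decode("utf-8", errors="replace")
    if isinstance(detail, list):
        # FastAPI validation shape: [{"loc": [...], "msg": "..."}, ...]
        return "; ".join(
            ".".join(str(part) for part in err.get("loc", [])) + ": " + err.get("msg", "")
            for err in detail
        )
    return str(detail)  # the {"detail": "..."} shape shown above
```

With `urllib`, a caller could write `except HTTPError as err: print(err.code, error_detail(err.read()))`.
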

---

## 🔍 Development & Debugging

### Running the Server

```bash
# From project root
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
```

`--host 0.0.0.0` listens on all network interfaces; because the API has no authentication, use `--host 127.0.0.1` to keep it reachable only from the local machine.

### API Documentation

- **Swagger UI**: `http://127.0.0.1:8000/docs`
- **ReDoc**: `http://127.0.0.1:8000/redoc`

### Logging

For dialog generation, the `log` field of `DialogResponse` contains a step-by-step processing log.

### File Management

- Generated files are stored in `backend/tts_generated_dialogs/`
- Temporary processing files are kept for inspection (not auto-deleted)
- ZIP archives contain the individual audio segments plus the concatenated result

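Since generated and temporary files are never deleted automatically, long-running setups may want a periodic cleanup. A minimal sketch, assuming the default directory layout from the Configuration section; `remove_old_outputs` is a hypothetical helper that only lists candidates unless `dry_run=False` is passed, and it ignores subdirectories:

```python
import time
from pathlib import Path


def remove_old_outputs(directory: str, max_age_days: float, dry_run: bool = True) -> list:
    """Return (and, unless dry_run, delete) regular files older than max_age_days.

    Age is judged by modification time; only files directly inside `directory` count.
    """
    cutoff = time.time() - max_age_days * 86400
    stale = [p for p in Path(directory).iterdir()
             if p.is_file() and p.stat().st_mtime < cutoff]
    if not dry_run:
        for p in stale:
            p.unlink()
    return sorted(stale)
```

For example, `remove_old_outputs("backend/tts_generated_dialogs", max_age_days=7, dry_run=False)` run from the project root. Any `/generated_audio/...` URL that still points at a deleted file will stop resolving.
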

---

## 📝 Notes

- The API automatically loads and unloads TTS models to manage memory usage
- Speaker audio samples should be clear, single-speaker recordings for best results
- Large dialogs may take significant time to process depending on hardware
- Generated files are served statically and persist until manually cleaned up

---

*Generated on: 2025-06-06*

*API Version: 0.1.0*

@ -16,3 +16,6 @@ fb84ce1c-f32d-4df9-9673-2c64e9603133:
```yaml
a6387c23-4ca4-42b5-8aaf-5699dbabbdf0:
  name: Mike
  sample_path: speaker_samples/a6387c23-4ca4-42b5-8aaf-5699dbabbdf0.wav
6cf4d171-667d-4bc8-adbb-6d9b7c620cb8:
  name: Minnie
  sample_path: speaker_samples/6cf4d171-667d-4bc8-adbb-6d9b7c620cb8.wav
```