feat/frontend-phase1 #1

Merged
stwhite merged 34 commits from feat/frontend-phase1 into main 2025-08-14 15:44:25 +00:00
2 changed files with 521 additions and 0 deletions
Showing only changes of commit c91a9598b1 - Show all commits

518
API_REFERENCE.md Normal file

@@ -0,0 +1,518 @@
# Chatterbox TTS API Reference
## Overview
The Chatterbox TTS API is a FastAPI-based backend service that provides text-to-speech with speaker management and dialog generation. It supports creating custom speakers from audio samples and generating dialogs that combine multiple speakers, silences, and per-line TTS parameters.
**Base URL**: `http://127.0.0.1:8000`
**API Version**: 0.1.0
**Framework**: FastAPI with automatic OpenAPI documentation
## Quick Start
- **Interactive API Documentation**: `http://127.0.0.1:8000/docs` (Swagger UI)
- **Alternative Documentation**: `http://127.0.0.1:8000/redoc` (ReDoc)
- **OpenAPI Schema**: `http://127.0.0.1:8000/openapi.json`
## Authentication
The API currently requires no authentication. CORS is configured to allow requests from `http://localhost:8001` and `http://127.0.0.1:8001`.
---
## Endpoints
### 🏠 Root Endpoint
#### `GET /`
Welcome message and API status check.
**Response:**
```json
{
  "message": "Welcome to the Chatterbox TTS API!"
}
```
---
## 👥 Speaker Management
### `GET /api/speakers/`
Retrieve all available speakers.
**Response Model:** `List[Speaker]`
**Example Response:**
```json
[
  {
    "id": "speaker_001",
    "name": "John Doe",
    "sample_path": "/path/to/speaker_samples/john_doe.wav"
  },
  {
    "id": "speaker_002",
    "name": "Jane Smith",
    "sample_path": "/path/to/speaker_samples/jane_smith.wav"
  }
]
```
**Status Codes:**
- `200`: Success
---
### `POST /api/speakers/`
Create a new speaker from an audio sample.
**Request Type:** `multipart/form-data`
**Parameters:**
- `name` (form field, required): Speaker name
- `audio_file` (file upload, required): Audio sample file (WAV, MP3, etc.)
**Response Model:** `SpeakerResponse`
**Example Response:**
```json
{
  "id": "speaker_003",
  "name": "Alex Johnson",
  "message": "Speaker added successfully."
}
```
**Status Codes:**
- `201`: Speaker created successfully
- `400`: Invalid file type or missing file
- `500`: Server error during speaker creation
**Example cURL:**
```bash
curl -X POST "http://127.0.0.1:8000/api/speakers/" \
  -F "name=Alex Johnson" \
  -F "audio_file=@/path/to/sample.wav"
```
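
The same upload can be made from Python. The sketch below uses only the standard library and encodes the two form fields by hand. The helper names `build_speaker_upload` and `create_speaker` are illustrative and not part of the API, and the `audio/wav` content type is an assumption about the sample file.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:8000"


def build_speaker_upload(name, filename, audio_bytes, content_type="audio/wav"):
    """Encode the `name` and `audio_file` form fields as multipart/form-data."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="name"\r\n\r\n'
        f"{name}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return head + audio_bytes + tail, headers


def create_speaker(name, path):
    """POST the sample to /api/speakers/ and return the parsed response body."""
    with open(path, "rb") as f:
        body, headers = build_speaker_upload(name, os.path.basename(path), f.read())
    request = urllib.request.Request(
        f"{BASE_URL}/api/speakers/", data=body, headers=headers, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `create_speaker("Alex Johnson", "/path/to/sample.wav")` against a running server should return an object shaped like the `SpeakerResponse` example above.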
---
### `GET /api/speakers/{speaker_id}`
Get details for a specific speaker.
**Path Parameters:**
- `speaker_id` (string, required): Unique speaker identifier
**Response Model:** `Speaker`
**Example Response:**
```json
{
  "id": "speaker_001",
  "name": "John Doe",
  "sample_path": "/path/to/speaker_samples/john_doe.wav"
}
```
**Status Codes:**
- `200`: Success
- `404`: Speaker not found
---
### `DELETE /api/speakers/{speaker_id}`
Delete a speaker by ID.
**Path Parameters:**
- `speaker_id` (string, required): Unique speaker identifier
**Example Response:**
```json
{
  "message": "Speaker deleted successfully"
}
```
**Status Codes:**
- `200`: Speaker deleted successfully
- `404`: Speaker not found
---
## 🎭 Dialog Generation
### `POST /api/dialog/generate_line`
Generate audio for a single dialog line (speech or silence).
**Request Body:** Raw JSON object representing either a `SpeechItem` or `SilenceItem`
#### Speech Item Example:
```json
{
  "type": "speech",
  "speaker_id": "speaker_001",
  "text": "Hello, this is a test message.",
  "exaggeration": 0.7,
  "cfg_weight": 0.6,
  "temperature": 0.8,
  "use_existing_audio": false,
  "audio_url": null
}
```
#### Silence Item Example:
```json
{
  "type": "silence",
  "duration": 2.0,
  "use_existing_audio": false,
  "audio_url": null
}
```
**Response:**
```json
{
  "audio_url": "/generated_audio/line_abc123def456.wav",
  "type": "speech",
  "text": "Hello, this is a test message."
}
```
**Status Codes:**
- `200`: Audio generated successfully
- `400`: Invalid request format or unknown dialog item type
- `404`: Speaker not found
- `500`: Server error during generation
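
Because the endpoint accepts a raw JSON object, the `type` field is what distinguishes the two item shapes. The following is a minimal client-side sketch for building either payload. `speech_item` and `silence_item` are hypothetical helpers, not part of the API, and the local checks only mirror the constraints stated in this reference (non-empty text, duration greater than zero).

```python
import json


def speech_item(speaker_id, text, exaggeration=0.5, cfg_weight=0.5, temperature=0.8):
    """Build a SpeechItem payload; rejects empty text before it reaches the server."""
    if not text.strip():
        raise ValueError("text cannot be empty")
    return {
        "type": "speech",
        "speaker_id": speaker_id,
        "text": text,
        "exaggeration": exaggeration,
        "cfg_weight": cfg_weight,
        "temperature": temperature,
    }


def silence_item(duration):
    """Build a SilenceItem payload; duration is in seconds and must be > 0."""
    if duration <= 0:
        raise ValueError("duration must be > 0")
    return {"type": "silence", "duration": duration}


# Either dictionary can be sent as the JSON body of POST /api/dialog/generate_line
line = speech_item("speaker_001", "Hello, this is a test message.", exaggeration=0.7)
print(json.dumps(line, indent=2))
```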
---
### `POST /api/dialog/generate`
Generate a complete dialog from multiple speech and silence items.
**Request Model:** `DialogRequest`
**Request Body:**
```json
{
  "dialog_items": [
    {
      "type": "speech",
      "speaker_id": "speaker_001",
      "text": "Welcome to our podcast!",
      "exaggeration": 0.5,
      "cfg_weight": 0.5,
      "temperature": 0.8
    },
    {
      "type": "silence",
      "duration": 1.0
    },
    {
      "type": "speech",
      "speaker_id": "speaker_002",
      "text": "Thank you for having me!",
      "exaggeration": 0.6,
      "cfg_weight": 0.7,
      "temperature": 0.9
    }
  ],
  "output_base_name": "podcast_episode_01"
}
```
**Response Model:** `DialogResponse`
**Example Response:**
```json
{
  "log": "Processing dialog with 3 items...\nGenerating speech for item 1...\nGenerating silence for item 2...\nGenerating speech for item 3...\nConcatenating audio segments...\nZIP archive created at: /path/to/output.zip",
  "concatenated_audio_url": "/generated_audio/podcast_episode_01_concatenated.wav",
  "zip_archive_url": "/generated_audio/podcast_episode_01_archive.zip",
  "temp_dir_path": "/path/to/temp/directory",
  "error_message": null
}
```
**Status Codes:**
- `200`: Dialog generated successfully
- `400`: Invalid request format or validation errors
- `404`: Speaker or file not found
- `500`: Server error during generation
---
## 📁 Static File Serving
### `GET /generated_audio/{filename}`
Serve generated audio files and ZIP archives.
**Path Parameters:**
- `filename` (string, required): Name of the generated file
**Response:** Binary audio file or ZIP archive
**Example URLs:**
- `http://127.0.0.1:8000/generated_audio/dialog_concatenated.wav`
- `http://127.0.0.1:8000/generated_audio/dialog_archive.zip`
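
The example responses in this reference return paths such as `/generated_audio/...` rather than full URLs, so a client has to resolve them against the base URL before downloading. A small sketch of that step follows. The function names are illustrative, and it assumes the server is reachable at the base URL given at the top of this document.

```python
import shutil
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://127.0.0.1:8000"


def absolute_audio_url(audio_url, base_url=BASE_URL):
    """Resolve a path like /generated_audio/x.wav against the API base URL.

    An already absolute URL is returned unchanged.
    """
    return urljoin(base_url, audio_url)


def download(audio_url, destination):
    """Stream a generated file to a local path."""
    with urllib.request.urlopen(absolute_audio_url(audio_url)) as response:
        with open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
```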
---
## 📋 Data Models
### Speaker Models
#### `Speaker`
```json
{
  "id": "string",
  "name": "string",
  "sample_path": "string|null"
}
```
#### `SpeakerResponse`
```json
{
  "id": "string",
  "name": "string",
  "message": "string|null"
}
```
### Dialog Models
#### `SpeechItem`
```json
{
  "type": "speech",
  "speaker_id": "string",
  "text": "string",
  "exaggeration": 0.5,         // 0.0-2.0, controls expressiveness
  "cfg_weight": 0.5,           // 0.0-2.0, alignment with speaker characteristics
  "temperature": 0.8,          // 0.0-2.0, randomness in generation
  "use_existing_audio": false,
  "audio_url": "string|null"
}
```
#### `SilenceItem`
```json
{
  "type": "silence",
  "duration": 1.0,             // seconds, must be > 0
  "use_existing_audio": false,
  "audio_url": "string|null"
}
```
#### `DialogRequest`
```json
{
  "dialog_items": [
    // Array of SpeechItem and/or SilenceItem objects
  ],
  "output_base_name": "string" // Base name for output files
}
```
#### `DialogResponse`
```json
{
  "log": "string",                         // Processing log
  "concatenated_audio_url": "string|null", // URL to final audio
  "zip_archive_url": "string|null",        // URL to ZIP archive
  "temp_dir_path": "string|null",          // Server temp directory
  "error_message": "string|null"           // Error details if failed
}
```
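
On the client side, the `DialogResponse` shape can be mirrored with a small dataclass so that nullable fields are handled in one place. This is an illustrative sketch, not the server's own model. The `ok` property is an assumption about how a client might decide that generation succeeded.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DialogResponse:
    log: str
    concatenated_audio_url: Optional[str] = None
    zip_archive_url: Optional[str] = None
    temp_dir_path: Optional[str] = None
    error_message: Optional[str] = None

    @classmethod
    def from_dict(cls, data):
        """Build from a decoded JSON body, ignoring any unknown keys."""
        names = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in data.items() if key in names})

    @property
    def ok(self):
        """True when audio was produced and no error was reported."""
        return self.error_message is None and self.concatenated_audio_url is not None
```

With `requests`, `DialogResponse.from_dict(response.json())` would give attribute access to the fields listed above.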
---
## 🎛️ TTS Parameters
### Exaggeration (`exaggeration`)
- **Range**: 0.0 - 2.0
- **Default**: 0.5
- **Description**: Controls the expressiveness of speech. Higher values produce more exaggerated, emotional speech.
### CFG Weight (`cfg_weight`)
- **Range**: 0.0 - 2.0
- **Default**: 0.5
- **Description**: Classifier-Free Guidance weight. Higher values make speech more aligned with the prompt text and speaker characteristics.
### Temperature (`temperature`)
- **Range**: 0.0 - 2.0
- **Default**: 0.8
- **Description**: Controls randomness in generation. Lower values produce more deterministic speech, higher values add more variation.
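
The ranges and defaults above can be checked on the client before a request is sent. The sketch below is a hypothetical helper. This reference does not say whether the server rejects or clamps out-of-range values, so the helper simply raises.

```python
TTS_DEFAULTS = {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8}
TTS_RANGE = (0.0, 2.0)  # documented range shared by all three parameters


def tts_params(**overrides):
    """Merge overrides into the defaults, rejecting unknown names and out-of-range values."""
    low, high = TTS_RANGE
    params = dict(TTS_DEFAULTS)
    for name, value in overrides.items():
        if name not in TTS_DEFAULTS:
            raise KeyError(f"unknown TTS parameter: {name}")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        params[name] = float(value)
    return params
```

The returned dictionary can be merged into a `SpeechItem` payload, for example `{"type": "speech", "speaker_id": "speaker_001", "text": "Hi", **tts_params(temperature=0.9)}`.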
---
## 🔧 Configuration
### Environment Variables
The API reads and writes the following directories (configurable in `app/config.py`):
- **Speaker Samples**: `{PROJECT_ROOT}/speaker_data/speaker_samples/`
- **Generated Audio**: `{PROJECT_ROOT}/backend/tts_generated_dialogs/`
- **Temporary Files**: `{PROJECT_ROOT}/tts_temp_outputs/`
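
A hypothetical sketch of how these three paths could be derived from a single project root. The function name `build_paths` and the dictionary keys are illustrative and are not taken from `app/config.py`.

```python
from pathlib import Path


def build_paths(project_root):
    """Return the documented directories relative to the given project root."""
    root = Path(project_root)
    return {
        "speaker_samples": root / "speaker_data" / "speaker_samples",
        "generated_audio": root / "backend" / "tts_generated_dialogs",
        "temp_outputs": root / "tts_temp_outputs",
    }


def ensure_directories(paths):
    """Create any missing directories so later writes do not fail."""
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
```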
### CORS Settings
- Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001`
- Allowed Methods: All
- Allowed Headers: All
- Credentials: Enabled
---
## 🚀 Usage Examples
### Python Client Example
```python
import requests

# Base URL
BASE_URL = "http://127.0.0.1:8000"

# Get all speakers
speakers = requests.get(f"{BASE_URL}/api/speakers/").json()
print("Available speakers:", speakers)
if not speakers:
    raise SystemExit("No speakers available; create one with POST /api/speakers/ first.")

# Generate a simple dialog using the first available speaker
dialog_request = {
    "dialog_items": [
        {
            "type": "speech",
            "speaker_id": speakers[0]["id"],
            "text": "Hello world!",
            "exaggeration": 0.7,
            "cfg_weight": 0.6,
            "temperature": 0.9
        },
        {
            "type": "silence",
            "duration": 1.0
        }
    ],
    "output_base_name": "test_dialog"
}

response = requests.post(
    f"{BASE_URL}/api/dialog/generate",
    json=dialog_request
)

if response.status_code == 200:
    result = response.json()
    print("Dialog generated!")
    print("Audio URL:", result["concatenated_audio_url"])
    print("ZIP URL:", result["zip_archive_url"])
else:
    print("Error:", response.text)
```
### JavaScript/Frontend Example
```javascript
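const BASE_URL = 'http://127.0.0.1:8000';

// Generate dialog
const dialogRequest = {
  dialog_items: [
    {
      type: "speech",
      speaker_id: "speaker_001",
      text: "Welcome to our show!",
      exaggeration: 0.6,
      cfg_weight: 0.5,
      temperature: 0.8
    }
  ],
  output_base_name: "intro"
};

fetch(`${BASE_URL}/api/dialog/generate`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(dialogRequest)
})
  .then(response => {
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  })
  .then(data => {
    console.log('Dialog generated:', data);
    // Play the audio. The example responses use paths relative to the API,
    // so resolve against BASE_URL rather than the page's own origin.
    const audio = new Audio(new URL(data.concatenated_audio_url, BASE_URL).href);
    audio.play();
  })
  .catch(error => console.error('Dialog generation failed:', error));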
```
---
## ⚠️ Error Handling
### Common Error Responses
#### 400 Bad Request
```json
{
  "detail": "Invalid value or configuration: Text cannot be empty"
}
```
#### 404 Not Found
```json
{
  "detail": "Speaker sample for ID 'invalid_speaker' not found."
}
```
#### 500 Internal Server Error
```json
{
  "detail": "Runtime error during dialog generation: CUDA out of memory"
}
```
### Error Categories
- **Validation Errors**: Invalid input format, missing required fields
- **Resource Errors**: Speaker not found, file not accessible
- **Processing Errors**: TTS model failures, audio processing issues
- **System Errors**: Memory issues, disk space, model loading failures
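
A client can use the status code and the `detail` field to decide how to react. The mapping below is a rough sketch based on the status codes listed in this reference, and the category labels are illustrative. The `422` entry is an addition: it is FastAPI's default status for request-validation failures and is not listed in this document.

```python
import json

ERROR_CATEGORIES = {
    400: "validation",
    404: "resource",
    422: "validation",  # FastAPI default for request validation; not listed in this reference
    500: "processing or system",
}


def describe_error(status_code, body):
    """Return (category, detail) for an error response body given as text."""
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        detail = body  # not JSON, or JSON that is not an object
    return ERROR_CATEGORIES.get(status_code, "unknown"), detail
```

With `requests`, this could be called as `describe_error(response.status_code, response.text)` in the `else` branch of a status check.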
---
## 🔍 Development & Debugging
### Running the Server
```bash
# From project root
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
```
### API Documentation
- **Swagger UI**: `http://127.0.0.1:8000/docs`
- **ReDoc**: `http://127.0.0.1:8000/redoc`
### Logging
For dialog generation, the API returns a step-by-step processing log in the `log` field of `DialogResponse`.
### File Management
- Generated files are stored in `backend/tts_generated_dialogs/`
- Temporary processing files are kept for inspection (not auto-deleted)
- ZIP archives contain individual audio segments plus concatenated result
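
To check what a downloaded archive contains, the standard `zipfile` module is enough. The sketch below lists the `.wav` members of an archive held in memory. The naming of the members is not specified in this reference, so the helper makes no assumption about it.

```python
import io
import zipfile


def list_archive(zip_bytes):
    """Return the names of the .wav members in a downloaded archive, sorted."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".wav")
        )
```

The bytes can come from any HTTP client, for example `requests.get(zip_url).content`.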
---
## 📝 Notes
- The API automatically loads and unloads TTS models to manage memory usage
- Speaker audio samples should be clear, single-speaker recordings for best results
- Large dialogs may take significant time to process depending on hardware
- Generated files are served statically and persist until manually cleaned up
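
Since generated files are not removed automatically, a periodic cleanup may be useful. The following is a hypothetical maintenance sketch, not something shipped with the API. It defaults to a dry run, and the directory to clean is whatever the deployment uses for generated audio.

```python
import time
from pathlib import Path


def remove_older_than(directory, max_age_days, dry_run=True):
    """Delete regular files older than max_age_days; return their names, sorted.

    With dry_run=True nothing is deleted and the names are only reported.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Running it first with the default `dry_run=True` shows which files would be deleted before anything is removed.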
---
*Generated on: 2025-06-06*
*API Version: 0.1.0*


@@ -16,3 +16,6 @@ fb84ce1c-f32d-4df9-9673-2c64e9603133:
a6387c23-4ca4-42b5-8aaf-5699dbabbdf0:
  name: Mike
  sample_path: speaker_samples/a6387c23-4ca4-42b5-8aaf-5699dbabbdf0.wav
6cf4d171-667d-4bc8-adbb-6d9b7c620cb8:
  name: Minnie
  sample_path: speaker_samples/6cf4d171-667d-4bc8-adbb-6d9b7c620cb8.wav