Compare commits
13 Commits
9e4fb35800...c91a9598b1

| Author | SHA1 | Date |
|---|---|---|
| | c91a9598b1 | |
| | b37aa56fa6 | |
| | f9e952286d | |
| | 26f1d98b46 | |
| | e11a4a091c | |
| | 252f885b5a | |
| | d8eb2492d7 | |
| | d3ff6e5241 | |
| | 4a7c1ea6a1 | |
| | 0261b86ad2 | |
| | 6ccdd18463 | |
| | 1575bf4292 | |
| | f2f907452b | |

@@ -0,0 +1,518 @@
# Chatterbox TTS API Reference

## Overview

The Chatterbox TTS API is a FastAPI-based backend service that provides text-to-speech capabilities with speaker management and dialog generation features. The API supports creating custom speakers from audio samples and generating complex dialogs with multiple speakers, silences, and fine-tuned TTS parameters.

**Base URL**: `http://127.0.0.1:8000`
**API Version**: 0.1.0
**Framework**: FastAPI with automatic OpenAPI documentation

## Quick Start

- **Interactive API Documentation**: `http://127.0.0.1:8000/docs` (Swagger UI)
- **Alternative Documentation**: `http://127.0.0.1:8000/redoc` (ReDoc)
- **OpenAPI Schema**: `http://127.0.0.1:8000/openapi.json`

## Authentication

Currently, the API does not require authentication. CORS is configured to allow requests from `localhost:8001` and `127.0.0.1:8001`.

---

## Endpoints

### 🏠 Root Endpoint

#### `GET /`
Welcome message and API status check.

**Response:**
```json
{
  "message": "Welcome to the Chatterbox TTS API!"
}
```

---

## 👥 Speaker Management

### `GET /api/speakers/`
Retrieve all available speakers.

**Response Model:** `List[Speaker]`

**Example Response:**
```json
[
  {
    "id": "speaker_001",
    "name": "John Doe",
    "sample_path": "/path/to/speaker_samples/john_doe.wav"
  },
  {
    "id": "speaker_002",
    "name": "Jane Smith",
    "sample_path": "/path/to/speaker_samples/jane_smith.wav"
  }
]
```

**Status Codes:**
- `200`: Success

---

### `POST /api/speakers/`
Create a new speaker from an audio sample.

**Request Type:** `multipart/form-data`

**Parameters:**
- `name` (form field, required): Speaker name
- `audio_file` (file upload, required): Audio sample file (WAV, MP3, etc.)

**Response Model:** `SpeakerResponse`

**Example Response:**
```json
{
  "id": "speaker_003",
  "name": "Alex Johnson",
  "message": "Speaker added successfully."
}
```

**Status Codes:**
- `201`: Speaker created successfully
- `400`: Invalid file type or missing file
- `500`: Server error during speaker creation

**Example cURL:**
```bash
curl -X POST "http://127.0.0.1:8000/api/speakers/" \
  -F "name=Alex Johnson" \
  -F "audio_file=@/path/to/sample.wav"
```

---

### `GET /api/speakers/{speaker_id}`
Get details for a specific speaker.

**Path Parameters:**
- `speaker_id` (string, required): Unique speaker identifier

**Response Model:** `Speaker`

**Example Response:**
```json
{
  "id": "speaker_001",
  "name": "John Doe",
  "sample_path": "/path/to/speaker_samples/john_doe.wav"
}
```

**Status Codes:**
- `200`: Success
- `404`: Speaker not found

---

### `DELETE /api/speakers/{speaker_id}`
Delete a speaker by ID.

**Path Parameters:**
- `speaker_id` (string, required): Unique speaker identifier

**Example Response:**
```json
{
  "message": "Speaker deleted successfully"
}
```

**Status Codes:**
- `200`: Speaker deleted successfully
- `404`: Speaker not found

---

## 🎭 Dialog Generation

### `POST /api/dialog/generate_line`
Generate audio for a single dialog line (speech or silence).

**Request Body:** Raw JSON object representing either a `SpeechItem` or `SilenceItem`

#### Speech Item Example:
```json
{
  "type": "speech",
  "speaker_id": "speaker_001",
  "text": "Hello, this is a test message.",
  "exaggeration": 0.7,
  "cfg_weight": 0.6,
  "temperature": 0.8,
  "use_existing_audio": false,
  "audio_url": null
}
```

#### Silence Item Example:
```json
{
  "type": "silence",
  "duration": 2.0,
  "use_existing_audio": false,
  "audio_url": null
}
```

**Response:**
```json
{
  "audio_url": "/generated_audio/line_abc123def456.wav"
}
```

**Status Codes:**
- `200`: Audio generated successfully
- `400`: Invalid request format or unknown dialog item type
- `404`: Speaker not found
- `500`: Server error during generation

---

### `POST /api/dialog/generate`
Generate a complete dialog from multiple speech and silence items.

**Request Model:** `DialogRequest`

**Request Body:**
```json
{
  "dialog_items": [
    {
      "type": "speech",
      "speaker_id": "speaker_001",
      "text": "Welcome to our podcast!",
      "exaggeration": 0.5,
      "cfg_weight": 0.5,
      "temperature": 0.8
    },
    {
      "type": "silence",
      "duration": 1.0
    },
    {
      "type": "speech",
      "speaker_id": "speaker_002",
      "text": "Thank you for having me!",
      "exaggeration": 0.6,
      "cfg_weight": 0.7,
      "temperature": 0.9
    }
  ],
  "output_base_name": "podcast_episode_01"
}
```

**Response Model:** `DialogResponse`

**Example Response:**
```json
{
  "log": "Processing dialog with 3 items...\nGenerating speech for item 1...\nGenerating silence for item 2...\nGenerating speech for item 3...\nConcatenating audio segments...\nZIP archive created at: /path/to/output.zip",
  "concatenated_audio_url": "/generated_audio/podcast_episode_01_concatenated.wav",
  "zip_archive_url": "/generated_audio/podcast_episode_01_archive.zip",
  "temp_dir_path": "/path/to/temp/directory",
  "error_message": null
}
```

**Status Codes:**
- `200`: Dialog generated successfully
- `400`: Invalid request format or validation errors
- `404`: Speaker or file not found
- `500`: Server error during generation

---

## 📁 Static File Serving

### `GET /generated_audio/{filename}`
Serve generated audio files and ZIP archives.

**Path Parameters:**
- `filename` (string, required): Name of the generated file

**Response:** Binary audio file or ZIP archive

**Example URLs:**
- `http://127.0.0.1:8000/generated_audio/dialog_concatenated.wav`
- `http://127.0.0.1:8000/generated_audio/dialog_archive.zip`
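
The `audio_url`, `concatenated_audio_url`, and `zip_archive_url` values shown in the example responses are server-relative paths, so a client has to combine them with the base URL before fetching. A minimal Python sketch of that step, assuming the default base URL from this reference; `absolute_url` is a hypothetical helper name and not part of the API:

```python
from urllib.parse import urljoin

# Assumption: the default base URL from this reference; adjust for your deployment.
BASE_URL = "http://127.0.0.1:8000"


def absolute_url(relative_path: str, base_url: str = BASE_URL) -> str:
    """Resolve a server-relative path (e.g. /generated_audio/x.wav) against the API base URL."""
    return urljoin(base_url, relative_path)


if __name__ == "__main__":
    # Example with one of the file names listed above.
    print(absolute_url("/generated_audio/dialog_concatenated.wav"))
```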

---

## 📋 Data Models

### Speaker Models

#### `Speaker`
```json
{
  "id": "string",
  "name": "string",
  "sample_path": "string|null"
}
```

#### `SpeakerResponse`
```json
{
  "id": "string",
  "name": "string",
  "message": "string|null"
}
```

### Dialog Models

#### `SpeechItem`
```json
{
  "type": "speech",
  "speaker_id": "string",
  "text": "string",
  "exaggeration": 0.5,      // 0.0-2.0, controls expressiveness
  "cfg_weight": 0.5,        // 0.0-2.0, alignment with speaker characteristics
  "temperature": 0.8,       // 0.0-2.0, randomness in generation
  "use_existing_audio": false,
  "audio_url": "string|null"
}
```

#### `SilenceItem`
```json
{
  "type": "silence",
  "duration": 1.0,          // seconds, must be > 0
  "use_existing_audio": false,
  "audio_url": "string|null"
}
```

#### `DialogRequest`
```json
{
  "dialog_items": [
    // Array of SpeechItem and/or SilenceItem objects
  ],
  "output_base_name": "string"  // Base name for output files
}
```

#### `DialogResponse`
```json
{
  "log": "string",                        // Processing log
  "concatenated_audio_url": "string|null", // URL to final audio
  "zip_archive_url": "string|null",       // URL to ZIP archive
  "temp_dir_path": "string|null",         // Server temp directory
  "error_message": "string|null"          // Error details if failed
}
```
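
As an illustration of how these models compose into a request body, the following Python sketch builds `SpeechItem`, `SilenceItem`, and `DialogRequest` payloads as plain dictionaries. The helper names (`speech_item`, `silence_item`, `dialog_request`) are hypothetical, and the client-side checks only mirror the constraints stated above (non-empty text, `duration` greater than 0); the server remains the authority on validation.

```python
import json
from typing import Any, Dict, List, Optional


def speech_item(speaker_id: str, text: str, exaggeration: float = 0.5,
                cfg_weight: float = 0.5, temperature: float = 0.8,
                audio_url: Optional[str] = None) -> Dict[str, Any]:
    """Build a SpeechItem dict; passing audio_url requests reuse of existing audio."""
    if not text.strip():
        raise ValueError("text cannot be empty")
    return {
        "type": "speech",
        "speaker_id": speaker_id,
        "text": text,
        "exaggeration": exaggeration,
        "cfg_weight": cfg_weight,
        "temperature": temperature,
        "use_existing_audio": audio_url is not None,
        "audio_url": audio_url,
    }


def silence_item(duration: float) -> Dict[str, Any]:
    """Build a SilenceItem dict; duration is in seconds and must be > 0."""
    if duration <= 0:
        raise ValueError("duration must be > 0")
    return {"type": "silence", "duration": duration,
            "use_existing_audio": False, "audio_url": None}


def dialog_request(items: List[Dict[str, Any]], output_base_name: str) -> Dict[str, Any]:
    """Wrap a list of items into a DialogRequest dict."""
    return {"dialog_items": items, "output_base_name": output_base_name}


if __name__ == "__main__":
    request = dialog_request(
        [speech_item("speaker_001", "Welcome to our podcast!"), silence_item(1.0)],
        "podcast_episode_01",
    )
    print(json.dumps(request, indent=2))
```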

---

## 🎛️ TTS Parameters

### Exaggeration (`exaggeration`)
- **Range**: 0.0 - 2.0
- **Default**: 0.5
- **Description**: Controls the expressiveness of speech. Higher values produce more exaggerated, emotional speech.

### CFG Weight (`cfg_weight`)
- **Range**: 0.0 - 2.0
- **Default**: 0.5
- **Description**: Classifier-Free Guidance weight. Higher values make speech more aligned with the prompt text and speaker characteristics.

### Temperature (`temperature`)
- **Range**: 0.0 - 2.0
- **Default**: 0.8
- **Description**: Controls randomness in generation. Lower values produce more deterministic speech, higher values add more variation.
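
A client may want to apply the documented defaults and check the documented ranges before sending a request. The sketch below does that in Python; `resolve_tts_params` is a hypothetical helper, the ranges are assumed to be inclusive, and nothing here states whether the server enforces them.

```python
from typing import Dict

# Ranges and defaults as documented in this section (range assumed inclusive).
PARAM_RANGE = (0.0, 2.0)
PARAM_DEFAULTS: Dict[str, float] = {
    "exaggeration": 0.5,
    "cfg_weight": 0.5,
    "temperature": 0.8,
}


def resolve_tts_params(**overrides: float) -> Dict[str, float]:
    """Merge overrides into the documented defaults and reject out-of-range values."""
    unknown = set(overrides) - set(PARAM_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown TTS parameter(s): {sorted(unknown)}")
    params = {**PARAM_DEFAULTS, **overrides}
    low, high = PARAM_RANGE
    for name, value in params.items():
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the documented range {low}-{high}")
    return params


if __name__ == "__main__":
    print(resolve_tts_params(temperature=0.9))
```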

---

## 🔧 Configuration

### Directory Structure
The API uses the following directories, defined as constants in `app/config.py`:

- **Speaker Samples**: `{PROJECT_ROOT}/speaker_data/speaker_samples/`
- **Generated Audio**: `{PROJECT_ROOT}/backend/tts_generated_dialogs/`
- **Temporary Files**: `{PROJECT_ROOT}/tts_temp_outputs/`

### CORS Settings
- Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001`
- Allowed Methods: All
- Allowed Headers: All
- Credentials: Enabled

---

## 🚀 Usage Examples

### Python Client Example

```python
import requests

# Base URL
BASE_URL = "http://127.0.0.1:8000"

# Get all speakers
speakers = requests.get(f"{BASE_URL}/api/speakers/").json()
print("Available speakers:", speakers)
if not speakers:
    # speakers[0] below would fail on an empty list
    raise SystemExit("No speakers available; create one via POST /api/speakers/ first.")

# Generate a simple dialog
dialog_request = {
    "dialog_items": [
        {
            "type": "speech",
            "speaker_id": speakers[0]["id"],
            "text": "Hello world!",
            "exaggeration": 0.7,
            "cfg_weight": 0.6,
            "temperature": 0.9
        },
        {
            "type": "silence",
            "duration": 1.0
        }
    ],
    "output_base_name": "test_dialog"
}

response = requests.post(
    f"{BASE_URL}/api/dialog/generate",
    json=dialog_request
)

if response.status_code == 200:
    result = response.json()
    print("Dialog generated!")
    print("Audio URL:", result["concatenated_audio_url"])
    print("ZIP URL:", result["zip_archive_url"])
else:
    print("Error:", response.text)
```

### JavaScript/Frontend Example

```javascript
// Backend origin. The returned audio URLs are server-relative, so they must be
// resolved against the backend rather than the page's own origin.
const API_BASE = 'http://127.0.0.1:8000';

// Generate dialog
const dialogRequest = {
  dialog_items: [
    {
      type: "speech",
      speaker_id: "speaker_001",
      text: "Welcome to our show!",
      exaggeration: 0.6,
      cfg_weight: 0.5,
      temperature: 0.8
    }
  ],
  output_base_name: "intro"
};

fetch(`${API_BASE}/api/dialog/generate`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(dialogRequest)
})
.then(response => response.json())
.then(data => {
  console.log('Dialog generated:', data);
  // Play the audio
  const audio = new Audio(`${API_BASE}${data.concatenated_audio_url}`);
  audio.play();
});
```

---

## ⚠️ Error Handling

### Common Error Responses

#### 400 Bad Request
```json
{
  "detail": "Invalid value or configuration: Text cannot be empty"
}
```

#### 404 Not Found
```json
{
  "detail": "Speaker sample for ID 'invalid_speaker' not found."
}
```

#### 500 Internal Server Error
```json
{
  "detail": "Runtime error during dialog generation: CUDA out of memory"
}
```

### Error Categories
- **Validation Errors**: Invalid input format, missing required fields
- **Resource Errors**: Speaker not found, file not accessible
- **Processing Errors**: TTS model failures, audio processing issues
- **System Errors**: Memory issues, disk space, model loading failures
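
On the client side, the `detail` field and the status code are usually enough to produce a readable error message. The Python sketch below shows one way to do this. The mapping from status codes to the categories above is an assumption based on the examples in this section (400 for validation, 404 for resource, 5xx for processing or system errors), and `describe_error` is a hypothetical helper name.

```python
import json


def describe_error(status_code: int, body: str) -> str:
    """Return '<category> error (<status>): <detail>' for an error response body."""
    # Assumed mapping, inferred from the example responses in this section.
    if status_code == 400:
        category = "validation"
    elif status_code == 404:
        category = "resource"
    elif status_code >= 500:
        category = "processing/system"
    else:
        category = "unexpected"
    try:
        # FastAPI-style error bodies carry the message under "detail".
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: fall back to the raw body.
        detail = body
    return f"{category} error ({status_code}): {detail}"


if __name__ == "__main__":
    print(describe_error(400, '{"detail": "Invalid value or configuration: Text cannot be empty"}'))
```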

---

## 🔍 Development & Debugging

### Running the Server
```bash
# From project root
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
```

### API Documentation
- **Swagger UI**: `http://127.0.0.1:8000/docs`
- **ReDoc**: `http://127.0.0.1:8000/redoc`

### Logging
The API provides detailed logging in the `DialogResponse.log` field for dialog generation operations.

### File Management
- Generated files are stored in `backend/tts_generated_dialogs/`
- Temporary processing files are kept for inspection (not auto-deleted)
- ZIP archives contain individual audio segments plus concatenated result

---

## 📝 Notes

- The API automatically loads and unloads TTS models to manage memory usage
- Speaker audio samples should be clear, single-speaker recordings for best results
- Large dialogs may take significant time to process depending on hardware
- Generated files are served statically and persist until manually cleaned up

---

*Generated on: 2025-06-06*
*API Version: 0.1.0*

@@ -0,0 +1,102 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Common Development Commands

### Backend (FastAPI)

```bash
# Install backend dependencies (run from project root)
pip install -r backend/requirements.txt

# Run backend development server (run from project root)
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

# Run backend tests
python backend/run_api_test.py

# Backend API accessible at http://127.0.0.1:8000
# API docs at http://127.0.0.1:8000/docs
```

### Frontend Testing

```bash
# Run frontend tests
npm test
```

### Alternative Interfaces

```bash
# Run Gradio interface (standalone TTS app)
python gradio_app.py
```

## Architecture Overview

This is a full-stack TTS (Text-to-Speech) application with three interfaces:

1. **Modern web frontend** (vanilla JS) - Interactive dialog editor at `frontend/`
2. **FastAPI backend** - REST API at `backend/`
3. **Gradio interface** - Alternative UI in `gradio_app.py`

### Frontend-Backend Communication

- **Frontend**: Vanilla JS (ES6 modules) serving on port 8001
- **Backend**: FastAPI serving on port 8000
- **API Base**: `http://localhost:8000/api`
- **CORS**: Configured for frontend communication
- **File Serving**: Generated audio served via `/generated_audio/` endpoint

### Key API Endpoints

- `/api/speakers/` - Speaker CRUD operations
- `/api/dialog/generate` - Full dialog generation
- `/api/dialog/generate_line` - Single line generation
- `/generated_audio/` - Static audio file serving

### Backend Service Architecture

Located in `backend/app/services/`:

- **TTSService**: Chatterbox TTS model lifecycle management
- **SpeakerManagementService**: Speaker data and sample management
- **DialogProcessorService**: Dialog script to audio processing
- **AudioManipulationService**: Audio concatenation and ZIP creation

### Frontend Architecture

- **Modular design**: `api.js` (API layer) + `app.js` (app logic)
- **No framework**: Modern vanilla JavaScript with ES6+ features
- **Interactive editor**: Table-based dialog creation with drag-drop reordering

### Data Flow

1. User creates dialog in frontend table editor
2. Frontend sends dialog items to `/api/dialog/generate`
3. Backend processes speech/silence items via services
4. TTS generates audio, segments concatenated
5. ZIP archive created with all outputs
6. Frontend receives URLs for playback/download
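
The client side of this flow (steps 2 and 6) can be sketched in a few lines of Python. This is an illustration only, not the frontend's actual code: `generate_dialog` and the injected `post_json` callable are hypothetical names, and the request and response field names (`dialog_items`, `output_base_name`, `concatenated_audio_url`, `zip_archive_url`, `error_message`) are taken from the API reference added in this changeset.

```python
from typing import Any, Callable, Dict, List, Tuple
from urllib.parse import urljoin

# Hypothetical transport signature: (url, json_payload) -> decoded JSON response.
PostJson = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def generate_dialog(post_json: PostJson, dialog_items: List[Dict[str, Any]],
                    output_base_name: str,
                    base_url: str = "http://localhost:8000") -> Tuple[str, str]:
    """Send dialog items to the backend and return (audio_url, zip_url) as absolute URLs."""
    payload = {"dialog_items": dialog_items, "output_base_name": output_base_name}
    result = post_json(urljoin(base_url, "/api/dialog/generate"), payload)
    if result.get("error_message"):
        raise RuntimeError(result["error_message"])
    # The backend returns server-relative paths; resolve them for playback/download.
    return (urljoin(base_url, result["concatenated_audio_url"]),
            urljoin(base_url, result["zip_archive_url"]))
```

Injecting the transport keeps the sketch independent of any particular HTTP library; in practice `post_json` could wrap whatever client the caller already uses.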

### Speaker Configuration

- **Location**: `speaker_data/speakers.yaml` and `speaker_data/speaker_samples/`
- **Format**: YAML config referencing WAV audio samples
- **Management**: Both API endpoints and file-based configuration

### Output Organization

- `dialog_output/` - Generated dialog files
- `single_output/` - Single utterance outputs
- `tts_outputs/` - Raw TTS generation files
- Generated ZIPs contain organized file structure

## Development Setup Notes

- Python virtual environment expected at project root (`.venv`)
- Backend commands run from project root, not `backend/` directory
- Frontend served separately (typically port 8001)
- Speaker samples must be WAV format in `speaker_data/speaker_samples/`

@@ -17,3 +17,5 @@ TTS_TEMP_OUTPUT_DIR = PROJECT_ROOT / "tts_temp_outputs"
 # These are stored within the 'backend' directory to be easily servable.
 DIALOG_OUTPUT_PARENT_DIR = PROJECT_ROOT / "backend"
 DIALOG_GENERATED_DIR = DIALOG_OUTPUT_PARENT_DIR / "tts_generated_dialogs"
+# Alias for clarity and backward compatibility
+DIALOG_OUTPUT_DIR = DIALOG_GENERATED_DIR

@@ -11,10 +11,14 @@ class SpeechItem(DialogItemBase):
     exaggeration: Optional[float] = Field(0.5, description="Controls the expressiveness of the speech. Higher values lead to more exaggerated speech. Default from Gradio.")
     cfg_weight: Optional[float] = Field(0.5, description="Classifier-Free Guidance weight. Higher values make the speech more aligned with the prompt text and speaker characteristics. Default from Gradio.")
     temperature: Optional[float] = Field(0.8, description="Controls randomness in generation. Lower values make speech more deterministic, higher values more varied. Default from Gradio.")
+    use_existing_audio: Optional[bool] = Field(False, description="If true and audio_url is provided, use the existing audio file instead of generating new audio for this line.")
+    audio_url: Optional[str] = Field(None, description="Path or URL to pre-generated audio for this line (used if use_existing_audio is true).")

 class SilenceItem(DialogItemBase):
     type: Literal['silence'] = 'silence'
     duration: float = Field(..., gt=0, description="Duration of the silence in seconds.")
+    use_existing_audio: Optional[bool] = Field(False, description="If true and audio_url is provided, use the existing audio file for silence instead of generating a new silent segment.")
+    audio_url: Optional[str] = Field(None, description="Path or URL to pre-generated audio for this silence (used if use_existing_audio is true).")

 class DialogRequest(BaseModel):
     dialog_items: List[Union[SpeechItem, SilenceItem]] = Field(..., description="A list of speech and silence items.")

@@ -1,6 +1,7 @@
 from fastapi import APIRouter, Depends, HTTPException, BackgroundTasks
 from pathlib import Path
 import shutil
+import os

 from app.models.dialog_models import DialogRequest, DialogResponse
 from app.services.tts_service import TTSService

@@ -32,6 +33,71 @@ def get_audio_manipulation_service():
     return AudioManipulationService()

 # --- Helper function to manage TTS model loading/unloading ---
+
+from app.models.dialog_models import SpeechItem, SilenceItem
+from app.services.tts_service import TTSService
+from app.services.audio_manipulation_service import AudioManipulationService
+from app.services.speaker_service import SpeakerManagementService
+from fastapi import Body
+import uuid
+from pathlib import Path
+
+@router.post("/generate_line")
+async def generate_line(
+    item: dict = Body(...),
+    tts_service: TTSService = Depends(get_tts_service),
+    audio_manipulator: AudioManipulationService = Depends(get_audio_manipulation_service),
+    speaker_service: SpeakerManagementService = Depends(get_speaker_management_service)
+):
+    """
+    Generate audio for a single dialog line (speech or silence).
+    Returns the URL of the generated audio file, or error details on failure.
+    """
+    try:
+        if item.get("type") == "speech":
+            speech = SpeechItem(**item)
+            filename_base = f"line_{uuid.uuid4().hex}"
+            out_dir = Path(config.DIALOG_GENERATED_DIR)
+            # Get speaker sample path
+            speaker_info = speaker_service.get_speaker_by_id(speech.speaker_id)
+            if not speaker_info or not getattr(speaker_info, 'sample_path', None):
+                raise HTTPException(status_code=404, detail=f"Speaker sample for ID '{speech.speaker_id}' not found.")
+            speaker_sample_path = speaker_info.sample_path
+            # Ensure absolute path
+            if not os.path.isabs(speaker_sample_path):
+                speaker_sample_path = str((Path(config.SPEAKER_SAMPLES_DIR) / Path(speaker_sample_path).name).resolve())
+            # Generate speech (async)
+            out_path = await tts_service.generate_speech(
+                text=speech.text,
+                speaker_sample_path=speaker_sample_path,
+                output_filename_base=filename_base,
+                speaker_id=speech.speaker_id,
+                output_dir=out_dir,
+                exaggeration=speech.exaggeration,
+                cfg_weight=speech.cfg_weight,
+                temperature=speech.temperature
+            )
+            audio_url = f"/generated_audio/{out_path.name}"
+        elif item.get("type") == "silence":
+            silence = SilenceItem(**item)
+            filename = f"silence_{uuid.uuid4().hex}.wav"
+            out_path = Path(config.DIALOG_GENERATED_DIR) / filename
+            # Generate silence tensor and save as WAV
+            silence_tensor = audio_manipulator._create_silence(silence.duration)
+            import torchaudio
+            torchaudio.save(str(out_path), silence_tensor, audio_manipulator.sample_rate)
+            audio_url = f"/generated_audio/{filename}"
+        else:
+            raise HTTPException(status_code=400, detail="Unknown dialog item type.")
+        return {"audio_url": audio_url}
+    except HTTPException:
+        # Re-raise the 400/404 responses raised above so the generic handler
+        # below does not wrap them in a 500
+        raise
+    except Exception as e:
+        import traceback
+        tb = traceback.format_exc()
+        raise HTTPException(status_code=500, detail=f"Exception: {str(e)}\nTraceback:\n{tb}")
+
 async def manage_tts_model_lifecycle(tts_service: TTSService, task_function, *args, **kwargs):
     """Loads TTS model, executes task, then unloads model."""
     try:

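
The silence branch of `generate_line` builds a silent tensor with `audio_manipulator._create_silence` and writes it with `torchaudio.save`. For readers who want to see what such a file amounts to without PyTorch installed, here is a rough stand-in using only the standard-library `wave` module. It is not the service's implementation: `write_silence_wav`, the 16-bit mono format, and the 24000 Hz default sample rate are all assumptions made for the illustration.

```python
import wave
from pathlib import Path


def write_silence_wav(path: Path, duration: float, sample_rate: int = 24000) -> int:
    """Write `duration` seconds of 16-bit mono silence to `path`; return the frame count."""
    if duration <= 0:
        raise ValueError("duration must be > 0")
    n_frames = int(round(duration * sample_rate))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # all-zero samples are silence
    return n_frames
```

The duration check mirrors the `gt=0` constraint on `SilenceItem.duration`.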
@ -86,11 +86,58 @@ class DialogProcessorService:
|
||||||
dialog_temp_dir.mkdir(parents=True, exist_ok=True)
|
dialog_temp_dir.mkdir(parents=True, exist_ok=True)
|
||||||
processing_log.append(f"Created temporary directory for segments: {dialog_temp_dir}")
|
processing_log.append(f"Created temporary directory for segments: {dialog_temp_dir}")
|
||||||
|
|
||||||
|
import shutil
|
||||||
segment_idx = 0
|
segment_idx = 0
|
||||||
for i, item in enumerate(dialog_items):
|
for i, item in enumerate(dialog_items):
|
||||||
item_type = item.get("type")
|
item_type = item.get("type")
|
||||||
processing_log.append(f"Processing item {i+1}: type='{item_type}'")
|
processing_log.append(f"Processing item {i+1}: type='{item_type}'")
|
||||||
|
|
||||||
|
# --- Universal: Handle reuse of existing audio for both speech and silence ---
|
||||||
|
use_existing_audio = item.get("use_existing_audio", False)
|
||||||
|
audio_url = item.get("audio_url")
|
||||||
|
if use_existing_audio and audio_url:
|
||||||
|
# Determine source path (handle both absolute and relative)
|
||||||
|
# Map web URL to actual file location in tts_generated_dialogs
|
||||||
|
if audio_url.startswith("/generated_audio/"):
|
||||||
|
src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url[len("/generated_audio/"):]
|
||||||
|
else:
|
||||||
|
src_audio_path = Path(audio_url)
|
||||||
|
if not src_audio_path.is_absolute():
|
||||||
|
# Assume relative to the generated audio root dir
|
||||||
|
src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url.lstrip("/\\")
|
||||||
|
# Now src_audio_path should point to the real file in tts_generated_dialogs
|
||||||
|
if src_audio_path.is_file():
|
||||||
|
segment_filename = f"{output_base_name}_seg{segment_idx}_reused.wav"
|
||||||
|
dest_path = (self.temp_audio_dir / output_base_name / segment_filename)
|
||||||
|
try:
|
||||||
|
if not src_audio_path.exists():
|
||||||
|
processing_log.append(f"[REUSE] Source audio file does not exist: {src_audio_path}")
|
||||||
|
else:
|
||||||
|
processing_log.append(f"[REUSE] Source audio file exists: {src_audio_path}, size={src_audio_path.stat().st_size} bytes")
|
||||||
|
shutil.copyfile(src_audio_path, dest_path)
|
||||||
|
if not dest_path.exists():
|
||||||
|
processing_log.append(f"[REUSE] Destination audio file was not created: {dest_path}")
|
||||||
|
else:
|
||||||
|
processing_log.append(f"[REUSE] Destination audio file created: {dest_path}, size={dest_path.stat().st_size} bytes")
|
||||||
|
# Only include 'type' and 'path' so the concatenator always includes this segment
|
||||||
|
segment_results.append({
|
||||||
|
"type": item_type,
|
||||||
|
"path": str(dest_path)
|
||||||
|
})
|
||||||
|
processing_log.append(f"Reused existing audio for item {i+1}: copied from {src_audio_path} to {dest_path}")
|
||||||
|
except Exception as e:
|
||||||
|
error_message = f"Failed to copy reused audio for item {i+1}: {e}"
|
||||||
|
processing_log.append(error_message)
|
||||||
|
segment_results.append({"type": "error", "message": error_message})
|
||||||
|
segment_idx += 1
|
||||||
|
continue
|
||||||
|
else:
|
||||||
|
error_message = f"Audio file for reuse not found at {src_audio_path} for item {i+1}."
|
||||||
|
processing_log.append(error_message)
|
||||||
|
segment_results.append({"type": "error", "message": error_message})
|
||||||
|
segment_idx += 1
|
||||||
|
continue
|
||||||
|
|
||||||
if item_type == "speech":
|
if item_type == "speech":
|
||||||
speaker_id = item.get("speaker_id")
|
speaker_id = item.get("speaker_id")
|
||||||
text = item.get("text")
|
text = item.get("text")
@@ -161,6 +208,11 @@ class DialogProcessorService:
                 processing_log.append(f"Unknown item type '{item_type}' at item {i+1}. Skipping.")
                 segment_results.append({"type": "error", "message": f"Unknown item type: {item_type}"})
 
+        # Log the full segment_results list for debugging
+        processing_log.append("[DEBUG] Final segment_results list:")
+        for idx, seg in enumerate(segment_results):
+            processing_log.append(f" [{idx}] {seg}")
+
         return {
             "log": "\n".join(processing_log),
             "segment_files": segment_results,
|
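The reuse branch earlier in this file joins `audio_url` onto `config.DIALOG_OUTPUT_DIR` only when `Path(audio_url).is_absolute()` is false. On POSIX systems a URL-style value such as `/generated_audio/line.wav` already counts as absolute, so the join is skipped and the later `is_file()` check fails unless a file exists at that literal filesystem path. The sketch below shows one possible way to cover both cases. `resolve_reused_audio` and `copy_for_reuse` are hypothetical helpers that are not part of the service, and how URL prefixes map onto the output directory is an assumption.

```python
import shutil
from pathlib import Path


def resolve_reused_audio(audio_url: str, output_root: Path) -> Path:
    """Map an audio URL or path onto a file path (hypothetical helper)."""
    candidate = Path(audio_url)
    # Keep a genuine absolute filesystem path only if the file is really there.
    if candidate.is_absolute() and candidate.is_file():
        return candidate
    # Otherwise treat the value as relative to the output root. Leading
    # separators are stripped first, because joining a root with "/x.wav"
    # would discard the root entirely.
    return output_root / audio_url.lstrip("/\\")


def copy_for_reuse(src: Path, dest: Path, item_type: str) -> dict:
    """Copy src to dest and return a segment entry shaped like the service's."""
    if not src.is_file():
        return {"type": "error", "message": f"Audio file for reuse not found at {src}"}
    # Assumption: the per-dialog temp directory may not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return {"type": item_type, "path": str(dest)}
```

Checking `is_file()` once before copying also makes the separate `exists()` logging inside the `try` block unnecessary, since a missing source is reported through the error entry.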
@@ -16,7 +16,7 @@ DIALOG_PAYLOAD = {
     "dialog_items": [
         {
             "type": "speech",
-            "speaker_id": "dummy_speaker", # Ensure this speaker exists in your speakers.yaml and has a sample .wav
+            "speaker_id": "90fcd672-ba84-441a-ac6c-0449a59653bd", # Correct UUID for dummy_speaker
             "text": "This is a test from the Python script. One, two, three.",
             "exaggeration": 1.5,
             "cfg_weight": 4.0,
@@ -28,7 +28,7 @@ DIALOG_PAYLOAD = {
         },
         {
             "type": "speech",
-            "speaker_id": "dummy_speaker",
+            "speaker_id": "90fcd672-ba84-441a-ac6c-0449a59653bd",
             "text": "Testing complete. All systems nominal."
         },
         {
@@ -104,5 +104,49 @@ def run_test():
         print(f"An unexpected error occurred: {e}")
         print("Test FAILED (Unexpected error)")
 
+def test_generate_line_speech():
+    url = f"{API_BASE_URL}/generate_line"
+    payload = {
+        "type": "speech",
+        "speaker_id": "90fcd672-ba84-441a-ac6c-0449a59653bd", # Correct UUID for dummy_speaker
+        "text": "This is a per-line TTS test.",
+        "exaggeration": 1.0,
+        "cfg_weight": 2.0,
+        "temperature": 0.8
+    }
+    print(f"\nTesting /generate_line with speech item: {payload}")
+    response = requests.post(url, json=payload)
+    print(f"Status: {response.status_code}")
+    try:
+        data = response.json()
+        print(f"Response: {json.dumps(data, indent=2)}")
+        if response.status_code == 200 and "audio_url" in data:
+            print("Speech line test PASSED.")
+        else:
+            print("Speech line test FAILED.")
+    except Exception as e:
+        print(f"Speech line test FAILED: {e}")
+
+def test_generate_line_silence():
+    url = f"{API_BASE_URL}/generate_line"
+    payload = {
+        "type": "silence",
+        "duration": 1.25
+    }
+    print(f"\nTesting /generate_line with silence item: {payload}")
+    response = requests.post(url, json=payload)
+    print(f"Status: {response.status_code}")
+    try:
+        data = response.json()
+        print(f"Response: {json.dumps(data, indent=2)}")
+        if response.status_code == 200 and "audio_url" in data:
+            print("Silence line test PASSED.")
+        else:
+            print("Silence line test FAILED.")
+    except Exception as e:
+        print(f"Silence line test FAILED: {e}")
+
 if __name__ == "__main__":
     run_test()
+    test_generate_line_speech()
+    test_generate_line_silence()
|
|
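Both new test functions apply the same pass condition: HTTP 200 and an `audio_url` key in the JSON body. As an illustration only, that check could live in one pure function so it can be exercised without a running server. `line_test_passed` is a hypothetical name, and the response shape is assumed from the checks in the two functions.

```python
import json


def line_test_passed(status_code: int, body: str) -> bool:
    """Return True when a /generate_line response looks successful."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False  # a non-JSON body counts as a failure
    # Require a JSON object, not a list or scalar, before looking for the key.
    return status_code == 200 and isinstance(data, dict) and "audio_url" in data
```

Each test could then call `line_test_passed(response.status_code, response.text)` and print PASSED or FAILED from the result.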
@@ -1,11 +1,57 @@
-/* Modern, clean, and accessible UI styles for Chatterbox TTS */
+/* CSS Custom Properties - Color Palette */
+:root {
+    /* Primary Colors */
+    --primary-blue: #163b65;
+    --primary-blue-dark: #357ab8;
+    --primary-blue-darker: #205081;
+
+    /* Background Colors */
+    --bg-body: #f7f9fa;
+    --bg-white: #fff;
+    --bg-light: #f3f5f7;
+    --bg-lighter: #f9fafb;
+    --bg-blue-light: #f3f7fa;
+    --bg-blue-lighter: #eaf1fa;
+    --bg-gray-light: #f7f7f7;
+
+    /* Text Colors */
+    --text-primary: #222;
+    --text-secondary: #2b2b2b;
+    --text-tertiary: #333;
+    --text-white: #fff;
+    --text-blue: #153f6f;
+    --text-blue-dark: #357ab8;
+    --text-blue-darker: #205081;
+
+    /* Border Colors */
+    --border-light: #1b0404;
+    --border-medium: #cfd8dc;
+    --border-blue: #b5c6df;
+    --border-gray: #e3e3e3;
+
+    /* Status Colors */
+    --error-bg: #e74c3c;
+    --error-bg-dark: #c0392b;
+    --warning-bg: #f9e79f;
+    --warning-text: #b7950b;
+    --warning-border: #f7ca18;
+
+    /* Header/Footer */
+    --header-bg: #222e3a;
+
+    /* Shadows */
+    --shadow-light: rgba(44,62,80,0.04);
+    --shadow-medium: rgba(44,62,80,0.06);
+    --shadow-strong: rgba(44,62,80,0.07);
+}
+
 body {
     font-family: 'Segoe UI', 'Roboto', 'Arial', sans-serif;
     line-height: 1.7;
     margin: 0;
     padding: 0;
-    background-color: #f7f9fa;
-    color: #222;
+    background-color: var(--bg-body);
+    color: var(--text-primary);
 }
 
 .container {
@@ -15,11 +61,11 @@ body {
 }
 
 header {
-    background: #222e3a;
-    color: #fff;
+    background: var(--header-bg);
+    color: var(--text-white);
     padding: 1.5rem 0 1rem 0;
     text-align: center;
-    border-bottom: 3px solid #4a90e2;
+    border-bottom: 3px solid var(--primary-blue);
 }
 
 h1 {
@@ -49,7 +95,6 @@ main {
     padding: 0;
 }
 
-
 #results-display.panel {
     flex: 1 1 100%;
     min-width: 0;
@@ -60,52 +105,115 @@ main {
 #dialog-items-table {
     width: 100%;
     border-collapse: collapse;
-    background: #fff;
+    background: var(--bg-white);
     border-radius: 8px;
     overflow: hidden;
     font-size: 1rem;
     margin-bottom: 0;
+    table-layout: fixed;
 }
 
 #dialog-items-table th, #dialog-items-table td {
-    padding: 10px 12px;
-    border-bottom: 1px solid #e3e3e3;
+    padding: 7px 10px;
+    border: 1px solid var(--border-light);
     text-align: left;
+    vertical-align: middle;
+    font-weight: 300;
+    font-size: 0.97rem;
+    color: var(--text-secondary);
+    overflow: hidden;
+    text-overflow: ellipsis;
+    white-space: nowrap;
 }
 
+/* Widen the Text/Duration column */
+#dialog-items-table th:nth-child(3), #dialog-items-table td:nth-child(3) {
+    min-width: 320px;
+    width: 50%;
+    font-weight: 300;
+    font-size: 1rem;
+}
+
+/* Make the Speaker (2nd) column narrower */
+#dialog-items-table th:nth-child(2), #dialog-items-table td:nth-child(2) {
+    width: 60px;
+    min-width: 60px;
+    max-width: 60px;
+    text-align: center;
+}
+
+/* Make the Actions (4th) column narrower */
+#dialog-items-table th:nth-child(4), #dialog-items-table td:nth-child(4) {
+    width: 110px;
+    min-width: 90px;
+    max-width: 130px;
+    text-align: left;
+    padding-left: 0;
+    padding-right: 0;
+}
+
+#dialog-items-table th:first-child, #dialog-items-table td.type-icon-cell {
+    width: 44px;
+    min-width: 36px;
+    max-width: 48px;
+    text-align: center;
+    padding-left: 0;
+    padding-right: 0;
+}
+
+.type-icon-cell {
+    text-align: center;
+    vertical-align: middle;
+}
+
+.dialog-type-icon {
+    font-size: 1.4em;
+    display: inline-block;
+    line-height: 1;
+    vertical-align: middle;
+}
+
 #dialog-items-table th {
-    background: #f3f7fa;
-    color: #4a90e2;
+    background: var(--bg-light);
+    color: var(--primary-blue);
     font-weight: 600;
     font-size: 1.05rem;
 }
 
 #dialog-items-table tr:last-child td {
     border-bottom: none;
 }
 
 #dialog-items-table td.actions {
-    text-align: center;
-    min-width: 90px;
+    text-align: left;
+    min-width: 110px;
+    white-space: nowrap;
 }
 
 /* Collapsible log details */
 details#generation-log-details {
     margin-bottom: 0;
     border-radius: 4px;
-    background: #f3f5f7;
-    box-shadow: 0 1px 3px rgba(44,62,80,0.04);
+    background: var(--bg-light);
+    box-shadow: 0 1px 3px var(--shadow-light);
     padding: 0 0 0 0;
     transition: box-shadow 0.15s;
 }
 
 details#generation-log-details[open] {
-    box-shadow: 0 2px 8px rgba(44,62,80,0.07);
-    background: #f9fafb;
+    box-shadow: 0 2px 8px var(--shadow-strong);
+    background: var(--bg-lighter);
 }
 
 details#generation-log-details summary {
     font-size: 1rem;
-    color: #357ab8;
+    color: var(--text-blue);
     padding: 10px 0 6px 0;
     outline: none;
 }
 
 details#generation-log-details summary:focus {
-    outline: 2px solid #4a90e2;
+    outline: 2px solid var(--primary-blue);
     border-radius: 3px;
 }
 
@@ -132,9 +240,9 @@ details#generation-log-details summary:focus {
 }
 
 .card {
-    background: #fff;
+    background: var(--bg-white);
     border-radius: 8px;
-    box-shadow: 0 2px 8px rgba(44,62,80,0.07);
+    box-shadow: 0 2px 8px var(--shadow-medium);
     padding: 18px 20px;
     margin-bottom: 18px;
 }
@@ -154,39 +262,41 @@ h2 {
     font-size: 1.5rem;
     margin-top: 0;
     margin-bottom: 16px;
-    color: #4a90e2;
+    color: var(--primary-blue);
     letter-spacing: 0.5px;
 }
 
 h3 {
     font-size: 1.1rem;
     margin-bottom: 10px;
-    color: #333;
+    color: var(--text-tertiary);
 }
 
 .x-remove-btn {
-    background: #e74c3c;
-    color: #fff;
+    background: var(--error-bg);
+    color: var(--text-white);
     border: none;
     border-radius: 50%;
-    width: 28px;
-    height: 28px;
-    font-size: 1.2rem;
+    width: 32px;
+    height: 32px;
+    font-size: 1.25rem;
     line-height: 1;
     display: inline-flex;
     align-items: center;
     justify-content: center;
     cursor: pointer;
     transition: background 0.15s;
-    margin: 0 2px;
-    box-shadow: 0 1px 2px rgba(44,62,80,0.06);
+    margin: 0 3px;
+    box-shadow: 0 1px 2px var(--shadow-light);
     outline: none;
     padding: 0;
+    vertical-align: middle;
 }
 
 .x-remove-btn:hover, .x-remove-btn:focus {
-    background: #c0392b;
-    color: #fff;
-    outline: 2px solid #e74c3c;
+    background: var(--error-bg-dark);
+    color: var(--text-white);
+    outline: 2px solid var(--error-bg);
 }
 
 .form-row {
@@ -202,9 +312,9 @@ label {
     margin-bottom: 0;
 }
 
-input[type='text'], input[type='file'] {
+input[type='text'], input[type='file'], textarea {
     padding: 8px 10px;
-    border: 1px solid #cfd8dc;
+    border: 1px solid var(--border-medium);
     border-radius: 4px;
     font-size: 1rem;
     width: 100%;
@@ -212,14 +322,21 @@ input[type='text'], input[type='file'] {
 }
 
 input[type='file'] {
-    background: #f7f7f7;
+    background: var(--bg-gray-light);
     font-size: 0.97rem;
 }
 
+.dialog-edit-textarea {
+    min-height: 60px;
+    resize: vertical;
+    font-family: inherit;
+    line-height: 1.4;
+}
+
 button {
     padding: 9px 18px;
-    background: #4a90e2;
-    color: #fff;
+    background: var(--primary-blue);
+    color: var(--text-white);
     border: none;
     border-radius: 5px;
     cursor: pointer;
@@ -229,8 +346,42 @@ button {
     margin-right: 10px;
 }
 
+.generate-line-btn, .play-line-btn {
+    background: var(--bg-blue-light);
+    color: var(--text-blue);
+    border: 1.5px solid var(--border-blue);
+    border-radius: 50%;
+    width: 32px;
+    height: 32px;
+    font-size: 1.25rem;
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    margin: 0 3px;
+    padding: 0;
+    box-shadow: 0 1px 2px var(--shadow-light);
+    vertical-align: middle;
+}
+
+.generate-line-btn:disabled, .play-line-btn:disabled {
+    opacity: 0.45;
+    cursor: not-allowed;
+}
+
+.generate-line-btn.loading {
+    background: var(--warning-bg);
+    color: var(--warning-text);
+    border-color: var(--warning-border);
+}
+
+.generate-line-btn:hover, .play-line-btn:hover {
+    background: var(--bg-blue-lighter);
+    color: var(--text-blue-darker);
+    border-color: var(--text-blue);
+}
+
 button:hover, button:focus {
-    background: #357ab8;
+    background: var(--primary-blue-dark);
     outline: none;
 }
 
@@ -246,7 +397,7 @@ button:hover, button:focus {
 
 #speaker-list li {
     padding: 7px 0;
-    border-bottom: 1px solid #e3e3e3;
+    border-bottom: 1px solid var(--border-gray);
     display: flex;
     justify-content: space-between;
     align-items: center;
@@ -257,7 +408,7 @@ button:hover, button:focus {
 }
 
 pre {
-    background: #f3f5f7;
+    background: var(--bg-light);
     padding: 12px;
     border-radius: 4px;
     font-size: 0.98rem;
@@ -275,8 +426,8 @@ audio {
 #zip-archive-link {
     display: inline-block;
     margin-right: 10px;
-    color: #fff;
-    background: #4a90e2;
+    color: var(--text-white);
+    background: var(--primary-blue);
     padding: 7px 16px;
     border-radius: 4px;
     text-decoration: none;
@@ -285,17 +436,17 @@ audio {
 }
 
 #zip-archive-link:hover, #zip-archive-link:focus {
-    background: #357ab8;
+    background: var(--primary-blue-dark);
 }
 
 footer {
     text-align: center;
     padding: 20px 0;
-    background: #222e3a;
-    color: #fff;
+    background: var(--header-bg);
+    color: var(--text-white);
     margin-top: 40px;
     font-size: 1rem;
-    border-top: 3px solid #4a90e2;
+    border-top: 3px solid var(--primary-blue);
 }
 
 @media (max-width: 900px) {
@@ -328,3 +479,194 @@ footer {
         width: 100%;
     }
 }
+
+.move-up-btn, .move-down-btn {
+    background: var(--bg-blue-light);
+    color: var(--text-blue);
+    border: 1.5px solid var(--border-blue);
+    border-radius: 50%;
+    width: 32px;
+    height: 32px;
+    font-size: 1.25rem;
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    margin: 0 3px;
+    padding: 0;
+    box-shadow: 0 1px 2px var(--shadow-light);
+    vertical-align: middle;
+    cursor: pointer;
+    transition: background 0.15s, color 0.15s, border-color 0.15s;
+}
+
+.move-up-btn:disabled, .move-down-btn:disabled {
+    opacity: 0.45;
+    cursor: not-allowed;
+}
+
+.move-up-btn:hover:not(:disabled), .move-down-btn:hover:not(:disabled) {
+    background: var(--bg-blue-lighter);
+    color: var(--text-blue-darker);
+    border-color: var(--text-blue);
+}
+
+.move-up-btn:focus, .move-down-btn:focus {
+    outline: 2px solid var(--primary-blue);
+    outline-offset: 2px;
+}
+
+/* TTS Settings Modal */
+.modal {
+    position: fixed;
+    top: 0;
+    left: 0;
+    width: 100%;
+    height: 100%;
+    background-color: rgba(0, 0, 0, 0.5);
+    z-index: 1000;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+}
+
+.modal-content {
+    background: var(--bg-white);
+    border-radius: 8px;
+    box-shadow: 0 2px 8px var(--shadow-strong);
+    max-width: 500px;
+    width: 90%;
+    max-height: 80vh;
+    overflow-y: auto;
+}
+
+.modal-header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    padding: 20px 24px 16px;
+    border-bottom: 1px solid var(--border-light);
+}
+
+.modal-header h3 {
+    margin: 0;
+    color: var(--text-primary);
+    font-size: 1.25rem;
+}
+
+.modal-close {
+    background: none;
+    border: none;
+    font-size: 24px;
+    cursor: pointer;
+    color: var(--text-secondary);
+    padding: 4px;
+    border-radius: 4px;
+    transition: background-color 0.2s;
+}
+
+.modal-close:hover {
+    background-color: var(--bg-light);
+}
+
+.modal-body {
+    padding: 20px 24px;
+}
+
+.settings-group {
+    margin-bottom: 20px;
+}
+
+.settings-group label {
+    display: block;
+    margin-bottom: 8px;
+    font-weight: 500;
+    color: var(--text-primary);
+}
+
+.settings-group input[type="range"] {
+    width: 100%;
+    margin-bottom: 4px;
+}
+
+.settings-group span {
+    display: inline-block;
+    min-width: 40px;
+    font-weight: 500;
+    color: var(--primary-blue);
+    margin-left: 8px;
+}
+
+.settings-group small {
+    display: block;
+    color: var(--text-secondary);
+    font-size: 0.875rem;
+    margin-top: 4px;
+    line-height: 1.3;
+}
+
+.modal-footer {
+    display: flex;
+    gap: 12px;
+    justify-content: flex-end;
+    padding: 16px 24px 20px;
+    border-top: 1px solid var(--border-light);
+}
+
+.btn-primary, .btn-secondary {
+    padding: 8px 16px;
+    border-radius: 4px;
+    border: none;
+    cursor: pointer;
+    font-size: 0.875rem;
+    font-weight: 500;
+    transition: all 0.2s;
+}
+
+.btn-primary {
+    background-color: var(--primary-blue);
+    color: var(--text-white);
+}
+
+.btn-primary:hover {
+    background-color: var(--primary-blue-dark);
+}
+
+.btn-secondary {
+    background-color: var(--bg-light);
+    color: var(--text-secondary);
+    border: 1px solid var(--border-medium);
+}
+
+.btn-secondary:hover {
+    background-color: var(--border-light);
+}
+
+/* Settings button styling */
+.settings-line-btn {
+    width: 32px;
+    height: 32px;
+    border-radius: 50%;
+    border: none;
+    background-color: var(--bg-light);
+    color: var(--text-secondary);
+    cursor: pointer;
+    font-size: 14px;
+    margin: 0 2px;
+    transition: all 0.2s;
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    vertical-align: middle;
+}
+
+.settings-line-btn:hover {
+    background-color: var(--primary-blue);
+    color: var(--text-white);
+    transform: scale(1.05);
+}
+
+.settings-line-btn:disabled {
+    opacity: 0.5;
+    cursor: not-allowed;
+    transform: none;
+}
|
@@ -38,12 +38,17 @@
                 <div class="dialog-controls form-row">
                     <button id="add-speech-line-btn">Add Speech Line</button>
                     <button id="add-silence-line-btn">Add Silence Line</button>
+                    <button id="generate-dialog-btn">Generate Dialog</button>
                 </div>
                 <div class="dialog-controls form-row">
                     <label for="output-base-name">Output Base Name:</label>
                     <input type="text" id="output-base-name" name="output-base-name" value="dialog_output" required>
                 </div>
-                <button id="generate-dialog-btn">Generate Dialog</button>
+                <div class="dialog-controls form-row">
+                    <button id="save-script-btn">Save Script</button>
+                    <input type="file" id="load-script-input" accept=".jsonl" style="display: none;">
+                    <button id="load-script-btn">Load Script</button>
+                </div>
             </section>
         </div>
         <!-- Results below -->
@@ -96,6 +101,40 @@
         </div>
     </footer>
 
+    <!-- TTS Settings Modal -->
+    <div id="tts-settings-modal" class="modal" style="display: none;">
+        <div class="modal-content">
+            <div class="modal-header">
+                <h3>TTS Settings</h3>
+                <button class="modal-close" id="tts-modal-close">×</button>
+            </div>
+            <div class="modal-body">
+                <div class="settings-group">
+                    <label for="tts-exaggeration">Exaggeration:</label>
+                    <input type="range" id="tts-exaggeration" min="0" max="2" step="0.1" value="0.5">
+                    <span id="tts-exaggeration-value">0.5</span>
+                    <small>Controls expressiveness. Higher values = more exaggerated speech.</small>
+                </div>
+                <div class="settings-group">
+                    <label for="tts-cfg-weight">CFG Weight:</label>
+                    <input type="range" id="tts-cfg-weight" min="0" max="2" step="0.1" value="0.5">
+                    <span id="tts-cfg-weight-value">0.5</span>
+                    <small>Alignment with prompt. Higher values = more aligned with speaker characteristics.</small>
+                </div>
+                <div class="settings-group">
+                    <label for="tts-temperature">Temperature:</label>
+                    <input type="range" id="tts-temperature" min="0" max="2" step="0.1" value="0.8">
+                    <span id="tts-temperature-value">0.8</span>
+                    <small>Randomness. Lower values = more deterministic, higher = more varied.</small>
+                </div>
+            </div>
+            <div class="modal-footer">
+                <button id="tts-settings-save" class="btn-primary">Save Settings</button>
+                <button id="tts-settings-cancel" class="btn-secondary">Cancel</button>
+            </div>
+        </div>
+    </div>
+
     <script src="js/api.js" type="module"></script>
     <script src="js/app.js" type="module" defer></script>
 </body>
|
@@ -100,6 +100,27 @@ export async function deleteSpeaker(speakerId) {
 
 // ... (keep API_BASE_URL, getSpeakers, addSpeaker, deleteSpeaker)
 
+/**
+ * Generates audio for a single dialog line (speech or silence).
+ * @param {Object} line - The dialog line object (type: 'speech' or 'silence').
+ * @returns {Promise<Object>} Resolves with { audio_url } on success.
+ * @throws {Error} If the network response is not ok.
+ */
+export async function generateLine(line) {
+    const response = await fetch(`${API_BASE_URL}/dialog/generate_line/`, {
+        method: 'POST',
+        headers: {
+            'Content-Type': 'application/json',
+        },
+        body: JSON.stringify(line),
+    });
+    if (!response.ok) {
+        const errorData = await response.json().catch(() => ({ message: response.statusText }));
+        throw new Error(`Failed to generate line audio: ${errorData.detail || errorData.message || response.statusText}`);
+    }
+    return response.json();
+}
+
 /**
  * Generates a dialog by sending a payload to the backend.
  * @param {Object} dialogPayload - The payload for dialog generation.
|
@@ -7,10 +7,10 @@ const API_BASE_URL = 'http://localhost:8000'; // Assuming backend runs here
 // then this should be http://localhost:8000. The backend will return paths like /generated_audio/...
 const API_BASE_URL_FOR_FILES = 'http://localhost:8000';
 
-document.addEventListener('DOMContentLoaded', () => {
+document.addEventListener('DOMContentLoaded', async () => {
     console.log('DOM fully loaded and parsed');
     initializeSpeakerManagement();
-    initializeDialogEditor(); // Placeholder for now
+    await initializeDialogEditor(); // Now properly awaiting the async function
     initializeResultsDisplay(); // Placeholder for now
 });
 
@@ -109,12 +109,34 @@ async function handleDeleteSpeaker(speakerId) {
 let dialogItems = []; // Holds the sequence of speech/silence items
 let availableSpeakersCache = []; // To populate speaker dropdown
 
-function initializeDialogEditor() {
+// Utility: ensure each dialog item has audioUrl, isGenerating, error
+function normalizeDialogItem(item) {
+    const normalized = {
+        ...item,
+        audioUrl: item.audioUrl || null,
+        isGenerating: item.isGenerating || false,
+        error: item.error || null
+    };
+
+    // Add TTS settings for speech items with defaults
+    if (item.type === 'speech') {
+        normalized.exaggeration = item.exaggeration ?? 0.5;
+        normalized.cfg_weight = item.cfg_weight ?? 0.5;
+        normalized.temperature = item.temperature ?? 0.8;
+    }
+
+    return normalized;
+}
+
+async function initializeDialogEditor() {
     const dialogItemsContainer = document.getElementById('dialog-items-container');
     const addSpeechLineBtn = document.getElementById('add-speech-line-btn');
     const addSilenceLineBtn = document.getElementById('add-silence-line-btn');
     const outputBaseNameInput = document.getElementById('output-base-name');
     const generateDialogBtn = document.getElementById('generate-dialog-btn');
+    const saveScriptBtn = document.getElementById('save-script-btn');
+    const loadScriptBtn = document.getElementById('load-script-btn');
+    const loadScriptInput = document.getElementById('load-script-input');
 
     // Results Display Elements
     const generationLogPre = document.getElementById('generation-log-content'); // Corrected ID
|
||||||
|
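Review note: `normalizeDialogItem` in the hunk above fills the TTS defaults with `??`, while `showTTSSettingsModal` later in this diff uses `||` for the same three fields. The two operators differ when a value is `0`. A minimal sketch of the difference follows; `TTS_DEFAULTS` and `withTtsDefaults` are hypothetical names used only for illustration and do not appear in the diff.

```javascript
// Hypothetical helper mirroring the defaulting done in normalizeDialogItem.
// The default values are the ones shown in the diff (0.5, 0.5, 0.8).
const TTS_DEFAULTS = { exaggeration: 0.5, cfg_weight: 0.5, temperature: 0.8 };

function withTtsDefaults(item) {
    const out = { ...item };
    for (const [key, fallback] of Object.entries(TTS_DEFAULTS)) {
        // `??` falls back only on null/undefined, so an explicit 0 is kept.
        out[key] = item[key] ?? fallback;
    }
    return out;
}

// An item whose exaggeration was deliberately set to 0.
const viaNullish = withTtsDefaults({ type: 'speech', exaggeration: 0 });

// The same value passed through `||`, which treats 0 as missing.
const viaOr = { exaggeration: 0 }.exaggeration || 0.5;

console.log(viaNullish.exaggeration, viaOr);
```

If a slider in the modal can reach `0`, using `??` in both places would keep the displayed value consistent with the stored one.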
@ -127,23 +149,49 @@ function initializeDialogEditor() {
|
||||||
let dialogItems = [];
|
let dialogItems = [];
|
||||||
let availableSpeakersCache = []; // Cache for speaker names and IDs
|
let availableSpeakersCache = []; // Cache for speaker names and IDs
|
||||||
|
|
||||||
|
// Load speakers at startup
|
||||||
|
try {
|
||||||
|
availableSpeakersCache = await getSpeakers();
|
||||||
|
console.log(`Loaded ${availableSpeakersCache.length} speakers for dialog editor`);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Error loading speakers at startup:', error);
|
||||||
|
// Continue without speakers - they'll be loaded when needed
|
||||||
|
}
|
||||||
|
|
||||||
// Function to render the current dialogItems array to the DOM as table rows
|
// Function to render the current dialogItems array to the DOM as table rows
|
||||||
function renderDialogItems() {
|
function renderDialogItems() {
|
||||||
if (!dialogItemsContainer) return;
|
if (!dialogItemsContainer) return;
|
||||||
dialogItemsContainer.innerHTML = '';
|
dialogItemsContainer.innerHTML = '';
|
||||||
|
dialogItems = dialogItems.map(normalizeDialogItem); // Ensure all fields present
|
||||||
dialogItems.forEach((item, index) => {
|
dialogItems.forEach((item, index) => {
|
||||||
const tr = document.createElement('tr');
|
const tr = document.createElement('tr');
|
||||||
|
|
||||||
// Type column
|
// Type column
|
||||||
const typeTd = document.createElement('td');
|
const typeTd = document.createElement('td');
|
||||||
typeTd.textContent = item.type === 'speech' ? 'Speech' : 'Silence';
|
typeTd.classList.add('type-icon-cell');
|
||||||
|
if (item.type === 'speech') {
|
||||||
|
typeTd.innerHTML = '<span class="dialog-type-icon" title="Speech" aria-label="Speech">🗣️</span>';
|
||||||
|
} else {
|
||||||
|
typeTd.innerHTML = '<span class="dialog-type-icon" title="Silence" aria-label="Silence">🤫</span>';
|
||||||
|
}
|
||||||
tr.appendChild(typeTd);
|
tr.appendChild(typeTd);
|
||||||
|
|
||||||
// Speaker column
|
// Speaker column
|
||||||
const speakerTd = document.createElement('td');
|
const speakerTd = document.createElement('td');
|
||||||
if (item.type === 'speech') {
|
if (item.type === 'speech') {
|
||||||
const speaker = availableSpeakersCache.find(s => s.id === item.speaker_id);
|
const speakerSelect = document.createElement('select');
|
||||||
speakerTd.textContent = speaker ? speaker.name : 'Unknown Speaker';
|
speakerSelect.className = 'dialog-speaker-select';
|
||||||
|
availableSpeakersCache.forEach(speaker => {
|
||||||
|
const option = document.createElement('option');
|
||||||
|
option.value = speaker.id;
|
||||||
|
option.textContent = speaker.name;
|
||||||
|
if (speaker.id === item.speaker_id) option.selected = true;
|
||||||
|
speakerSelect.appendChild(option);
|
||||||
|
});
|
||||||
|
speakerSelect.onchange = (e) => {
|
||||||
|
dialogItems[index].speaker_id = e.target.value;
|
||||||
|
};
|
||||||
|
speakerTd.appendChild(speakerSelect);
|
||||||
} else {
|
} else {
|
||||||
speakerTd.textContent = '—';
|
speakerTd.textContent = '—';
|
||||||
}
|
}
|
||||||
|
@ -151,17 +199,94 @@ function initializeDialogEditor() {
|
||||||
|
|
||||||
// Text/Duration column
|
// Text/Duration column
|
||||||
const textTd = document.createElement('td');
|
const textTd = document.createElement('td');
|
||||||
|
textTd.className = 'dialog-editable-cell';
|
||||||
if (item.type === 'speech') {
|
if (item.type === 'speech') {
|
||||||
let txt = item.text.length > 60 ? item.text.substring(0, 57) + '…' : item.text;
|
let txt = item.text.length > 60 ? item.text.substring(0, 57) + '…' : item.text;
|
||||||
textTd.textContent = `"${txt}"`;
|
textTd.textContent = `"${txt}"`;
|
||||||
|
textTd.title = item.text;
|
||||||
} else {
|
} else {
|
||||||
textTd.textContent = `${item.duration}s`;
|
textTd.textContent = `${item.duration}s`;
|
||||||
}
|
}
|
||||||
|
// Double-click to edit
|
||||||
|
textTd.ondblclick = () => {
|
||||||
|
textTd.innerHTML = '';
|
||||||
|
let input;
|
||||||
|
if (item.type === 'speech') {
|
||||||
|
// Use textarea for speech text to enable multi-line editing
|
||||||
|
input = document.createElement('textarea');
|
||||||
|
input.className = 'dialog-edit-textarea';
|
||||||
|
input.value = item.text;
|
||||||
|
input.rows = Math.max(2, Math.ceil(item.text.length / 50)); // Auto-size based on content
|
||||||
|
} else {
|
||||||
|
// Use number input for duration
|
||||||
|
input = document.createElement('input');
|
||||||
|
input.className = 'dialog-edit-input';
|
||||||
|
input.type = 'number';
|
||||||
|
input.value = item.duration;
|
||||||
|
input.min = 0.1;
|
||||||
|
input.step = 0.1;
|
||||||
|
}
|
||||||
|
input.onblur = saveEdit;
|
||||||
|
input.onkeydown = (e) => {
|
||||||
|
if (e.key === 'Enter' && item.type === 'silence') {
|
||||||
|
// Only auto-save on Enter for duration inputs
|
||||||
|
input.blur();
|
||||||
|
} else if (e.key === 'Escape') {
|
||||||
|
renderDialogItems();
|
||||||
|
} else if (e.key === 'Enter' && e.ctrlKey && item.type === 'speech') {
|
||||||
|
// Ctrl+Enter to save for speech text
|
||||||
|
input.blur();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
textTd.appendChild(input);
|
||||||
|
input.focus();
|
||||||
|
function saveEdit() {
|
||||||
|
if (item.type === 'speech') {
|
||||||
|
dialogItems[index].text = input.value;
|
||||||
|
dialogItems[index].audioUrl = null; // Invalidate audio if text edited
|
||||||
|
} else {
|
||||||
|
let val = parseFloat(input.value);
|
||||||
|
if (!isNaN(val) && val > 0) dialogItems[index].duration = val;
|
||||||
|
dialogItems[index].audioUrl = null;
|
||||||
|
}
|
||||||
|
renderDialogItems();
|
||||||
|
}
|
||||||
|
};
|
||||||
tr.appendChild(textTd);
|
tr.appendChild(textTd);
|
||||||
|
|
||||||
// Actions column
|
// Actions column
|
||||||
const actionsTd = document.createElement('td');
|
const actionsTd = document.createElement('td');
|
||||||
actionsTd.classList.add('actions');
|
actionsTd.classList.add('actions');
|
||||||
|
|
||||||
|
// Up button
|
||||||
|
const upBtn = document.createElement('button');
|
||||||
|
upBtn.innerHTML = '↑';
|
||||||
|
upBtn.title = 'Move up';
|
||||||
|
upBtn.className = 'move-up-btn';
|
||||||
|
upBtn.disabled = index === 0;
|
||||||
|
upBtn.onclick = () => {
|
||||||
|
if (index > 0) {
|
||||||
|
[dialogItems[index - 1], dialogItems[index]] = [dialogItems[index], dialogItems[index - 1]];
|
||||||
|
renderDialogItems();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
actionsTd.appendChild(upBtn);
|
||||||
|
|
||||||
|
// Down button
|
||||||
|
const downBtn = document.createElement('button');
|
||||||
|
downBtn.innerHTML = '↓';
|
||||||
|
downBtn.title = 'Move down';
|
||||||
|
downBtn.className = 'move-down-btn';
|
||||||
|
downBtn.disabled = index === dialogItems.length - 1;
|
||||||
|
downBtn.onclick = () => {
|
||||||
|
if (index < dialogItems.length - 1) {
|
||||||
|
[dialogItems[index], dialogItems[index + 1]] = [dialogItems[index + 1], dialogItems[index]];
|
||||||
|
renderDialogItems();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
actionsTd.appendChild(downBtn);
|
||||||
|
|
||||||
|
// Remove button
|
||||||
const removeBtn = document.createElement('button');
|
const removeBtn = document.createElement('button');
|
||||||
removeBtn.innerHTML = '×'; // Unicode multiplication sign (X)
|
removeBtn.innerHTML = '×'; // Unicode multiplication sign (X)
|
||||||
removeBtn.classList.add('remove-dialog-item-btn', 'x-remove-btn');
|
removeBtn.classList.add('remove-dialog-item-btn', 'x-remove-btn');
|
||||||
|
@ -172,8 +297,71 @@ function initializeDialogEditor() {
|
||||||
renderDialogItems();
|
renderDialogItems();
|
||||||
};
|
};
|
||||||
actionsTd.appendChild(removeBtn);
|
actionsTd.appendChild(removeBtn);
|
||||||
tr.appendChild(actionsTd);
|
|
||||||
|
|
||||||
|
// --- NEW: Per-line Generate button ---
|
||||||
|
const generateBtn = document.createElement('button');
|
||||||
|
generateBtn.innerHTML = '⚡';
|
||||||
|
generateBtn.title = 'Generate audio for this line';
|
||||||
|
generateBtn.className = 'generate-line-btn';
|
||||||
|
generateBtn.disabled = item.isGenerating;
|
||||||
|
if (item.isGenerating) generateBtn.classList.add('loading');
|
||||||
|
generateBtn.onclick = async () => {
|
||||||
|
dialogItems[index].isGenerating = true;
|
||||||
|
dialogItems[index].error = null;
|
||||||
|
renderDialogItems();
|
||||||
|
try {
|
||||||
|
const { generateLine } = await import('./api.js');
|
||||||
|
const payload = { ...item };
|
||||||
|
// Remove fields not needed by backend
|
||||||
|
delete payload.audioUrl; delete payload.isGenerating; delete payload.error;
|
||||||
|
const result = await generateLine(payload);
|
||||||
|
dialogItems[index].audioUrl = result.audio_url;
|
||||||
|
} catch (err) {
|
||||||
|
dialogItems[index].error = err.message || 'Failed to generate audio.';
|
||||||
|
alert(dialogItems[index].error);
|
||||||
|
} finally {
|
||||||
|
dialogItems[index].isGenerating = false;
|
||||||
|
renderDialogItems();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
actionsTd.appendChild(generateBtn);
|
||||||
|
|
||||||
|
// --- NEW: Per-line Play button ---
|
||||||
|
const playBtn = document.createElement('button');
|
||||||
|
playBtn.innerHTML = '⏵';
|
||||||
|
playBtn.title = item.audioUrl ? 'Play generated audio' : 'No audio generated yet';
|
||||||
|
playBtn.className = 'play-line-btn';
|
||||||
|
playBtn.disabled = !item.audioUrl;
|
||||||
|
playBtn.onclick = () => {
|
||||||
|
if (!item.audioUrl) return;
|
||||||
|
let audioUrl = item.audioUrl.startsWith('http') ? item.audioUrl : `${API_BASE_URL_FOR_FILES}${item.audioUrl}`;
|
||||||
|
// Use a shared audio element or create one per play
|
||||||
|
let audio = new window.Audio(audioUrl);
|
||||||
|
audio.play();
|
||||||
|
};
|
||||||
|
actionsTd.appendChild(playBtn);
|
||||||
|
|
||||||
|
// --- NEW: Settings button for speech items ---
|
||||||
|
if (item.type === 'speech') {
|
||||||
|
const settingsBtn = document.createElement('button');
|
||||||
|
settingsBtn.innerHTML = '⚙️';
|
||||||
|
settingsBtn.title = 'TTS Settings';
|
||||||
|
settingsBtn.className = 'settings-line-btn';
|
||||||
|
settingsBtn.onclick = () => {
|
||||||
|
showTTSSettingsModal(item, index);
|
||||||
|
};
|
||||||
|
actionsTd.appendChild(settingsBtn);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Show error if present
|
||||||
|
if (item.error) {
|
||||||
|
const errorSpan = document.createElement('span');
|
||||||
|
errorSpan.className = 'line-error-msg';
|
||||||
|
errorSpan.textContent = item.error;
|
||||||
|
actionsTd.appendChild(errorSpan);
|
||||||
|
}
|
||||||
|
|
||||||
|
tr.appendChild(actionsTd);
|
||||||
dialogItemsContainer.appendChild(tr);
|
dialogItemsContainer.appendChild(tr);
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
@ -231,7 +419,7 @@ function initializeDialogEditor() {
|
||||||
alert('Please select a speaker and enter text.');
|
alert('Please select a speaker and enter text.');
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
dialogItems.push({ type: 'speech', speaker_id: speakerId, text: text });
|
dialogItems.push(normalizeDialogItem({ type: 'speech', speaker_id: speakerId, text: text }));
|
||||||
renderDialogItems();
|
renderDialogItems();
|
||||||
clearTempInputArea();
|
clearTempInputArea();
|
||||||
};
|
};
|
||||||
|
@ -273,7 +461,7 @@ function initializeDialogEditor() {
|
||||||
alert('Invalid duration. Please enter a positive number.');
|
alert('Invalid duration. Please enter a positive number.');
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
dialogItems.push({ type: 'silence', duration: duration });
|
dialogItems.push(normalizeDialogItem({ type: 'silence', duration: duration }));
|
||||||
renderDialogItems();
|
renderDialogItems();
|
||||||
clearTempInputArea();
|
clearTempInputArea();
|
||||||
};
|
};
|
||||||
|
@ -301,40 +489,25 @@ function initializeDialogEditor() {
|
||||||
}
|
}
|
||||||
if (dialogItems.length === 0) {
|
if (dialogItems.length === 0) {
|
||||||
alert('Please add at least one speech or silence line to the dialog.');
|
alert('Please add at least one speech or silence line to the dialog.');
|
||||||
return;
|
return; // Prevent further execution if no dialog items
|
||||||
}
|
}
|
||||||
|
|
||||||
// Clear previous results and show loading/status
|
// Smart dialog-wide generation: use pre-generated audio where present
|
||||||
if (generationLogPre) generationLogPre.textContent = 'Generating dialog...';
|
const dialogItemsToGenerate = dialogItems.map(item => {
|
||||||
if (audioPlayer) {
|
// Only send minimal fields for items that need generation
|
||||||
audioPlayer.style.display = 'none';
|
if (item.audioUrl) {
|
||||||
audioPlayer.src = ''; // Clear previous audio source
|
return { ...item, use_existing_audio: true, audio_url: item.audioUrl };
|
||||||
}
|
} else {
|
||||||
if (downloadZipLink) {
|
// Remove frontend-only fields
|
||||||
downloadZipLink.style.display = 'none';
|
const payload = { ...item };
|
||||||
downloadZipLink.href = '#';
|
delete payload.audioUrl; delete payload.isGenerating; delete payload.error;
|
||||||
downloadZipLink.textContent = '';
|
return payload;
|
||||||
}
|
}
|
||||||
if (zipArchivePlaceholder) zipArchivePlaceholder.style.display = 'block'; // Show placeholder
|
});
|
||||||
if (resultsDisplaySection) resultsDisplaySection.style.display = 'block'; // Make sure it's visible
|
|
||||||
|
|
||||||
const payload = {
|
const payload = {
|
||||||
output_base_name: outputBaseName,
|
output_base_name: outputBaseName,
|
||||||
dialog_items: dialogItems.map(item => {
|
dialog_items: dialogItemsToGenerate
|
||||||
// For now, we are not collecting TTS params in the UI for speech items.
|
|
||||||
// The backend will use defaults. If we add UI for these later, they'd be included here.
|
|
||||||
if (item.type === 'speech') {
|
|
||||||
return {
|
|
||||||
type: item.type,
|
|
||||||
speaker_id: item.speaker_id,
|
|
||||||
text: item.text,
|
|
||||||
// exaggeration: item.exaggeration, // Example for future UI enhancement
|
|
||||||
// cfg_weight: item.cfg_weight,
|
|
||||||
// temperature: item.temperature
|
|
||||||
};
|
|
||||||
}
|
|
||||||
return item; // for silence items
|
|
||||||
})
|
|
||||||
};
|
};
|
||||||
|
|
||||||
try {
|
try {
|
||||||
|
@ -345,7 +518,10 @@ function initializeDialogEditor() {
|
||||||
if (generationLogPre) generationLogPre.textContent = result.log || 'No log output.';
|
if (generationLogPre) generationLogPre.textContent = result.log || 'No log output.';
|
||||||
|
|
||||||
if (result.concatenated_audio_url && audioPlayer) { // Check audioPlayer, not audioSource
|
if (result.concatenated_audio_url && audioPlayer) { // Check audioPlayer, not audioSource
|
||||||
audioPlayer.src = result.concatenated_audio_url.startsWith('http') ? result.concatenated_audio_url : `${API_BASE_URL_FOR_FILES}${result.concatenated_audio_url}`;
|
// Cache-busting: append timestamp to force reload
|
||||||
|
let audioUrl = result.concatenated_audio_url.startsWith('http') ? result.concatenated_audio_url : `${API_BASE_URL_FOR_FILES}${result.concatenated_audio_url}`;
|
||||||
|
audioUrl += (audioUrl.includes('?') ? '&' : '?') + 't=' + Date.now();
|
||||||
|
audioPlayer.src = audioUrl;
|
||||||
audioPlayer.load(); // Call load() after setting new source
|
audioPlayer.load(); // Call load() after setting new source
|
||||||
audioPlayer.style.display = 'block';
|
audioPlayer.style.display = 'block';
|
||||||
} else {
|
} else {
|
||||||
|
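Review note: the hunk above resolves the concatenated audio URL and appends a `t=<timestamp>` query parameter so the player reloads a regenerated file. The same logic can be sketched as a small pure function, which makes it easier to test in isolation. `withCacheBuster` is a hypothetical name, and the base URL and file path in the usage line are placeholders, not values taken from the project.

```javascript
// Hypothetical helper: resolve a possibly relative audio URL against a base
// and append a cache-busting timestamp, as the hunk above does inline.
function withCacheBuster(url, base, now = Date.now()) {
    // Absolute URLs are used as-is; relative ones are prefixed with the base.
    const absolute = url.startsWith('http') ? url : `${base}${url}`;
    // Use '&' when a query string is already present, '?' otherwise.
    const separator = absolute.includes('?') ? '&' : '?';
    return `${absolute}${separator}t=${now}`;
}

// Placeholder inputs for illustration only.
const busted = withCacheBuster('/example/out.wav', 'http://127.0.0.1:8000', 123);
console.log(busted);
```

Passing the timestamp as a parameter keeps the function deterministic in tests while defaulting to `Date.now()` in normal use.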
@ -372,8 +548,274 @@ function initializeDialogEditor() {
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// --- Save/Load Script Functionality ---
|
||||||
|
function saveDialogScript() {
|
||||||
|
if (dialogItems.length === 0) {
|
||||||
|
alert('No dialog items to save. Please add some speech or silence lines first.');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Filter out UI-specific fields and create clean data for export
|
||||||
|
const exportData = dialogItems.map(item => {
|
||||||
|
const cleanItem = {
|
||||||
|
type: item.type
|
||||||
|
};
|
||||||
|
|
||||||
|
if (item.type === 'speech') {
|
||||||
|
cleanItem.speaker_id = item.speaker_id;
|
||||||
|
cleanItem.text = item.text;
|
||||||
|
// Include TTS parameters if they exist (will use defaults if not present)
|
||||||
|
if (item.exaggeration !== undefined) cleanItem.exaggeration = item.exaggeration;
|
||||||
|
if (item.cfg_weight !== undefined) cleanItem.cfg_weight = item.cfg_weight;
|
||||||
|
if (item.temperature !== undefined) cleanItem.temperature = item.temperature;
|
||||||
|
} else if (item.type === 'silence') {
|
||||||
|
cleanItem.duration = item.duration;
|
||||||
|
}
|
||||||
|
|
||||||
|
return cleanItem;
|
||||||
|
});
|
||||||
|
|
||||||
|
// Convert to JSONL format (one JSON object per line)
|
||||||
|
const jsonlContent = exportData.map(item => JSON.stringify(item)).join('\n');
|
||||||
|
|
||||||
|
// Create and download file
|
||||||
|
const blob = new Blob([jsonlContent], { type: 'application/jsonl' });
|
||||||
|
const url = URL.createObjectURL(blob);
|
||||||
|
const link = document.createElement('a');
|
||||||
|
|
||||||
|
// Generate filename with timestamp
|
||||||
|
const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
|
||||||
|
const filename = `dialog_script_${timestamp}.jsonl`;
|
||||||
|
|
||||||
|
link.href = url;
|
||||||
|
link.download = filename;
|
||||||
|
link.style.display = 'none';
|
||||||
|
document.body.appendChild(link);
|
||||||
|
link.click();
|
||||||
|
document.body.removeChild(link);
|
||||||
|
URL.revokeObjectURL(url);
|
||||||
|
|
||||||
|
console.log(`Dialog script saved as ${filename}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
function loadDialogScript(file) {
|
||||||
|
if (!file) {
|
||||||
|
alert('Please select a file to load.');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const reader = new FileReader();
|
||||||
|
reader.onload = async function(e) {
|
||||||
|
try {
|
||||||
|
const content = e.target.result;
|
||||||
|
const lines = content.trim().split('\n');
|
||||||
|
const loadedItems = [];
|
||||||
|
|
||||||
|
for (let i = 0; i < lines.length; i++) {
|
||||||
|
const line = lines[i].trim();
|
||||||
|
if (!line) continue; // Skip empty lines
|
||||||
|
|
||||||
|
try {
|
||||||
|
const item = JSON.parse(line);
|
||||||
|
const validatedItem = validateDialogItem(item, i + 1);
|
||||||
|
if (validatedItem) {
|
||||||
|
loadedItems.push(normalizeDialogItem(validatedItem));
|
||||||
|
}
|
||||||
|
} catch (parseError) {
|
||||||
|
console.error(`Error parsing line ${i + 1}:`, parseError);
|
||||||
|
alert(`Error parsing line ${i + 1}: ${parseError.message}`);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (loadedItems.length === 0) {
|
||||||
|
alert('No valid dialog items found in the file.');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Confirm replacement if existing items
|
||||||
|
if (dialogItems.length > 0) {
|
||||||
|
const confirmed = confirm(
|
||||||
|
`This will replace your current dialog (${dialogItems.length} items) with the loaded script (${loadedItems.length} items). Continue?`
|
||||||
|
);
|
||||||
|
if (!confirmed) return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Ensure speakers are loaded before rendering
|
||||||
|
if (availableSpeakersCache.length === 0) {
|
||||||
|
try {
|
||||||
|
availableSpeakersCache = await getSpeakers();
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Error fetching speakers:', error);
|
||||||
|
alert('Could not load speakers. Dialog loaded but speaker names may not display correctly.');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Replace current dialog
|
||||||
|
dialogItems.splice(0, dialogItems.length, ...loadedItems);
|
||||||
|
renderDialogItems();
|
||||||
|
|
||||||
|
console.log(`Loaded ${loadedItems.length} dialog items from script`);
|
||||||
|
alert(`Successfully loaded ${loadedItems.length} dialog items.`);
|
||||||
|
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Error loading dialog script:', error);
|
||||||
|
alert(`Error loading dialog script: ${error.message}`);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
reader.onerror = function() {
|
||||||
|
alert('Error reading file. Please try again.');
|
||||||
|
};
|
||||||
|
|
||||||
|
reader.readAsText(file);
|
||||||
|
}
|
||||||
|
|
||||||
|
function validateDialogItem(item, lineNumber) {
|
||||||
|
if (!item || typeof item !== 'object') {
|
||||||
|
throw new Error(`Line ${lineNumber}: Invalid item format`);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!item.type || !['speech', 'silence'].includes(item.type)) {
|
||||||
|
throw new Error(`Line ${lineNumber}: Invalid or missing type. Must be 'speech' or 'silence'`);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (item.type === 'speech') {
|
||||||
|
if (!item.speaker_id || typeof item.speaker_id !== 'string') {
|
||||||
|
throw new Error(`Line ${lineNumber}: Speech items must have a valid speaker_id`);
|
||||||
|
}
|
||||||
|
if (!item.text || typeof item.text !== 'string') {
|
||||||
|
throw new Error(`Line ${lineNumber}: Speech items must have text`);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check if speaker exists in available speakers
|
||||||
|
const speakerExists = availableSpeakersCache.some(speaker => speaker.id === item.speaker_id);
|
||||||
|
if (availableSpeakersCache.length > 0 && !speakerExists) {
|
||||||
|
console.warn(`Line ${lineNumber}: Speaker '${item.speaker_id}' not found in available speakers`);
|
||||||
|
// Don't throw error, just warn - speaker might be added later
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
type: 'speech',
|
||||||
|
speaker_id: item.speaker_id,
|
||||||
|
text: item.text
|
||||||
|
};
|
||||||
|
} else if (item.type === 'silence') {
|
||||||
|
if (typeof item.duration !== 'number' || item.duration <= 0) {
|
||||||
|
throw new Error(`Line ${lineNumber}: Silence items must have a positive duration number`);
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
type: 'silence',
|
||||||
|
duration: item.duration
|
||||||
|
};
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Event handlers for save/load
|
||||||
|
if (saveScriptBtn) {
|
||||||
|
saveScriptBtn.addEventListener('click', saveDialogScript);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (loadScriptBtn && loadScriptInput) {
|
||||||
|
loadScriptBtn.addEventListener('click', () => {
|
||||||
|
loadScriptInput.click();
|
||||||
|
});
|
||||||
|
|
||||||
|
loadScriptInput.addEventListener('change', (e) => {
|
||||||
|
const file = e.target.files[0];
|
||||||
|
if (file) {
|
||||||
|
loadDialogScript(file);
|
||||||
|
// Reset input so same file can be loaded again
|
||||||
|
e.target.value = '';
|
||||||
|
}
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
console.log('Dialog Editor Initialized');
|
console.log('Dialog Editor Initialized');
|
||||||
renderDialogItems(); // Initial render (empty)
|
renderDialogItems(); // Initial render (empty)
|
||||||
|
|
||||||
|
// Add a function to show TTS settings modal
|
||||||
|
function showTTSSettingsModal(item, index) {
|
||||||
|
const modal = document.getElementById('tts-settings-modal');
|
||||||
|
const exaggerationSlider = document.getElementById('tts-exaggeration');
|
||||||
|
const exaggerationValue = document.getElementById('tts-exaggeration-value');
|
||||||
|
const cfgWeightSlider = document.getElementById('tts-cfg-weight');
|
||||||
|
const cfgWeightValue = document.getElementById('tts-cfg-weight-value');
|
||||||
|
const temperatureSlider = document.getElementById('tts-temperature');
|
||||||
|
const temperatureValue = document.getElementById('tts-temperature-value');
|
||||||
|
const saveBtn = document.getElementById('tts-settings-save');
|
||||||
|
const cancelBtn = document.getElementById('tts-settings-cancel');
|
||||||
|
const closeBtn = document.getElementById('tts-modal-close');
|
||||||
|
|
||||||
|
// Set current values
|
||||||
|
exaggerationSlider.value = item.exaggeration || 0.5;
|
||||||
|
exaggerationValue.textContent = exaggerationSlider.value;
|
||||||
|
cfgWeightSlider.value = item.cfg_weight || 0.5;
|
||||||
|
cfgWeightValue.textContent = cfgWeightSlider.value;
|
||||||
|
temperatureSlider.value = item.temperature || 0.8;
|
||||||
|
temperatureValue.textContent = temperatureSlider.value;
|
||||||
|
|
||||||
|
// Update value displays when sliders change
|
||||||
|
const updateValueDisplay = (slider, display) => {
|
||||||
|
display.textContent = slider.value;
|
||||||
|
};
|
||||||
|
|
||||||
|
exaggerationSlider.oninput = () => updateValueDisplay(exaggerationSlider, exaggerationValue);
|
||||||
|
cfgWeightSlider.oninput = () => updateValueDisplay(cfgWeightSlider, cfgWeightValue);
|
||||||
|
temperatureSlider.oninput = () => updateValueDisplay(temperatureSlider, temperatureValue);
|
||||||
|
|
||||||
|
// Show modal
|
||||||
|
modal.style.display = 'flex';
|
||||||
|
|
||||||
|
// Save settings
|
||||||
|
const saveSettings = () => {
|
||||||
|
dialogItems[index].exaggeration = parseFloat(exaggerationSlider.value);
|
||||||
|
dialogItems[index].cfg_weight = parseFloat(cfgWeightSlider.value);
|
||||||
|
dialogItems[index].temperature = parseFloat(temperatureSlider.value);
|
||||||
|
|
||||||
|
// Clear any existing audio since settings changed
|
||||||
|
dialogItems[index].audioUrl = null;
|
||||||
|
|
||||||
|
closeModal();
|
||||||
|
renderDialogItems(); // Re-render to reflect changes
|
||||||
|
console.log('TTS settings updated for item:', dialogItems[index]);
|
||||||
|
};
|
||||||
|
|
||||||
|
// Close modal
|
||||||
|
const closeModal = () => {
|
||||||
|
modal.style.display = 'none';
|
||||||
|
// Clean up event listeners
|
||||||
|
exaggerationSlider.oninput = null;
|
||||||
|
cfgWeightSlider.oninput = null;
|
||||||
|
temperatureSlider.oninput = null;
|
||||||
|
saveBtn.onclick = null;
|
||||||
|
cancelBtn.onclick = null;
|
||||||
|
closeBtn.onclick = null;
|
||||||
|
modal.onclick = null;
|
||||||
|
};
|
||||||
|
|
||||||
|
// Event listeners
|
||||||
|
saveBtn.onclick = saveSettings;
|
||||||
|
cancelBtn.onclick = closeModal;
|
||||||
|
closeBtn.onclick = closeModal;
|
||||||
|
|
||||||
|
// Close modal when clicking outside
|
||||||
|
modal.onclick = (e) => {
|
||||||
|
if (e.target === modal) {
|
||||||
|
closeModal();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
// Close modal on Escape key
|
||||||
|
const handleEscape = (e) => {
|
||||||
|
if (e.key === 'Escape') {
|
||||||
|
closeModal();
|
||||||
|
document.removeEventListener('keydown', handleEscape);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
document.addEventListener('keydown', handleEscape);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// --- Results Display --- //
|
// --- Results Display --- //
|
||||||
|
|
|
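Review note: `saveDialogScript` above writes `exaggeration`, `cfg_weight`, and `temperature` into each JSONL line, but `validateDialogItem` as shown returns only `type`, `speaker_id`, and `text` for speech items, so those settings appear to be reset to defaults by `normalizeDialogItem` on load. A minimal sketch of a round trip that keeps them follows. The function names `toJsonl`, `parseItem`, and `fromJsonl` and the sample `speaker_id` are hypothetical, and the speaker-cache check from the diff is omitted for brevity.

```javascript
// Optional per-line TTS settings that the save path already exports.
const TTS_KEYS = ['exaggeration', 'cfg_weight', 'temperature'];

// One JSON object per line, as in saveDialogScript.
function toJsonl(items) {
    return items.map(item => JSON.stringify(item)).join('\n');
}

// Hypothetical validator: same checks as the diff, but numeric TTS
// settings are copied through instead of being dropped.
function parseItem(raw, lineNumber) {
    if (!raw || typeof raw !== 'object') {
        throw new Error(`Line ${lineNumber}: Invalid item format`);
    }
    if (raw.type === 'silence') {
        if (typeof raw.duration !== 'number' || raw.duration <= 0) {
            throw new Error(`Line ${lineNumber}: Silence items must have a positive duration number`);
        }
        return { type: 'silence', duration: raw.duration };
    }
    if (raw.type === 'speech') {
        if (!raw.speaker_id || typeof raw.speaker_id !== 'string') {
            throw new Error(`Line ${lineNumber}: Speech items must have a valid speaker_id`);
        }
        if (!raw.text || typeof raw.text !== 'string') {
            throw new Error(`Line ${lineNumber}: Speech items must have text`);
        }
        const item = { type: 'speech', speaker_id: raw.speaker_id, text: raw.text };
        for (const key of TTS_KEYS) {
            if (typeof raw[key] === 'number') item[key] = raw[key];
        }
        return item;
    }
    throw new Error(`Line ${lineNumber}: Invalid or missing type. Must be 'speech' or 'silence'`);
}

// Parse JSONL text, skipping blank lines and reporting 1-based line numbers.
function fromJsonl(content) {
    const items = [];
    content.split('\n').forEach((line, i) => {
        const trimmed = line.trim();
        if (!trimmed) return;
        items.push(parseItem(JSON.parse(trimmed), i + 1));
    });
    return items;
}

// Sample script with a placeholder speaker id.
const script = [
    { type: 'speech', speaker_id: 'speaker-1', text: 'Hello', temperature: 0.3 },
    { type: 'silence', duration: 0.5 },
];
const roundTripped = fromJsonl(toJsonl(script));
console.log(roundTripped);
```

Settings absent from a line are left unset here, so the existing `normalizeDialogItem` defaults would still apply to them.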
@ -10,3 +10,12 @@
|
||||||
fb84ce1c-f32d-4df9-9673-2c64e9603133:
|
fb84ce1c-f32d-4df9-9673-2c64e9603133:
|
||||||
name: Debbie
|
name: Debbie
|
||||||
sample_path: speaker_samples/fb84ce1c-f32d-4df9-9673-2c64e9603133.wav
|
sample_path: speaker_samples/fb84ce1c-f32d-4df9-9673-2c64e9603133.wav
|
||||||
|
90fcd672-ba84-441a-ac6c-0449a59653bd:
|
||||||
|
name: dummy_speaker
|
||||||
|
sample_path: speaker_samples/90fcd672-ba84-441a-ac6c-0449a59653bd.wav
|
||||||
|
a6387c23-4ca4-42b5-8aaf-5699dbabbdf0:
|
||||||
|
name: Mike
|
||||||
|
sample_path: speaker_samples/a6387c23-4ca4-42b5-8aaf-5699dbabbdf0.wav
|
||||||
|
6cf4d171-667d-4bc8-adbb-6d9b7c620cb8:
|
||||||
|
name: Minnie
|
||||||
|
sample_path: speaker_samples/6cf4d171-667d-4bc8-adbb-6d9b7c620cb8.wav
|
||||||
|
|