Add API reference

Added persistence of TTS settings in dialog save/restore.
Added settings to allow control of exaggeration, cfg_weight, and temperature on each line.
2025-06-06 23:18:47 -05:00 · 2025-06-06 11:58:48 -05:00 · 2025-06-06 11:53:43 -05:00 · 2025-06-06 11:35:39 -05:00 · 2025-06-06 11:33:54 -05:00 · 2025-06-06 10:36:06 -05:00
12 changed files with 1728 additions and 90 deletions
--- a/API_REFERENCE.md
+++ b/API_REFERENCE.md
@@ -0,0 +1,518 @@
 # Chatterbox TTS API Reference
 ## Overview
 The Chatterbox TTS API is a FastAPI-based backend service that provides text-to-speech capabilities with speaker management and dialog generation features. The API supports creating custom speakers from audio samples and generating complex dialogs with multiple speakers, silences, and fine-tuned TTS parameters.
 **Base URL**: `http://127.0.0.1:8000`  
 **API Version**: 0.1.0  
 **Framework**: FastAPI with automatic OpenAPI documentation
 ## Quick Start
 - **Interactive API Documentation**: `http://127.0.0.1:8000/docs` (Swagger UI)
 - **Alternative Documentation**: `http://127.0.0.1:8000/redoc` (ReDoc)
 - **OpenAPI Schema**: `http://127.0.0.1:8000/openapi.json`
 ## Authentication
 Currently, the API does not require authentication. CORS is configured to allow requests from `localhost:8001` and `127.0.0.1:8001`.
 ---
 ## Endpoints
 ### 🏠 Root Endpoint
 #### `GET /`
 Welcome message and API status check.
 **Response:**
 ```json
 {
  "message": "Welcome to the Chatterbox TTS API!"
 }
 ```
 ---
 ## 👥 Speaker Management
 ### `GET /api/speakers/`
 Retrieve all available speakers.
 **Response Model:** `List[Speaker]`
 **Example Response:**
 ```json
 [
  {
    "id": "speaker_001",
    "name": "John Doe",
    "sample_path": "/path/to/speaker_samples/john_doe.wav"
  },
  {
    "id": "speaker_002", 
    "name": "Jane Smith",
    "sample_path": "/path/to/speaker_samples/jane_smith.wav"
  }
 ]
 ```
 **Status Codes:**
 - `200`: Success
 ---
 ### `POST /api/speakers/`
 Create a new speaker from an audio sample.
 **Request Type:** `multipart/form-data`
 **Parameters:**
 - `name` (form field, required): Speaker name
 - `audio_file` (file upload, required): Audio sample file (WAV, MP3, etc.)
 **Response Model:** `SpeakerResponse`
 **Example Response:**
 ```json
 {
  "id": "speaker_003",
  "name": "Alex Johnson",
  "message": "Speaker added successfully."
 }
 ```
 **Status Codes:**
 - `201`: Speaker created successfully
 - `400`: Invalid file type or missing file
 - `500`: Server error during speaker creation
 **Example cURL:**
 ```bash
 curl -X POST "http://127.0.0.1:8000/api/speakers/" \
  -F "name=Alex Johnson" \
  -F "audio_file=@/path/to/sample.wav"
 ```
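If a script cannot shell out to `curl` and does not have `requests`, it can encode the same upload with the Python standard library. This is a minimal sketch, not a client that ships with the project. `build_speaker_form` and `create_speaker` are hypothetical names. Only the form field names `name` and `audio_file` and the `/api/speakers/` path come from this reference.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_speaker_form(name: str, filename: str, audio: bytes) -> tuple[bytes, str]:
    """Encode the `name` field and the `audio_file` upload as multipart/form-data."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    lines = [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="name"',
        b"",
        name.encode("utf-8"),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"'.encode(),
        f"Content-Type: {mime}".encode(),
        b"",
        audio,
        f"--{boundary}--".encode(),
        b"",  # trailing CRLF after the closing boundary
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def create_speaker(name: str, sample: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """POST a sample file to /api/speakers/ and return the SpeakerResponse JSON."""
    path = Path(sample)
    body, content_type = build_speaker_form(name, path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/api/speakers/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `create_speaker("Alex Johnson", "/path/to/sample.wav")` should be equivalent to the cURL command above. HTTP errors such as `400` surface as `urllib.error.HTTPError`.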
 ---
 ### `GET /api/speakers/{speaker_id}`
 Get details for a specific speaker.
 **Path Parameters:**
 - `speaker_id` (string, required): Unique speaker identifier
 **Response Model:** `Speaker`
 **Example Response:**
 ```json
 {
  "id": "speaker_001",
  "name": "John Doe", 
  "sample_path": "/path/to/speaker_samples/john_doe.wav"
 }
 ```
 **Status Codes:**
 - `200`: Success
 - `404`: Speaker not found
 ---
 ### `DELETE /api/speakers/{speaker_id}`
 Delete a speaker by ID.
 **Path Parameters:**
 - `speaker_id` (string, required): Unique speaker identifier
 **Example Response:**
 ```json
 {
  "message": "Speaker deleted successfully"
 }
 ```
 **Status Codes:**
 - `200`: Speaker deleted successfully
 - `404`: Speaker not found
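Both per-speaker endpoints share the same path. A small client can therefore build the URL once and treat `404` as "no such speaker" instead of as a failure. This is a sketch under those assumptions. `speaker_url`, `get_speaker`, and `delete_speaker` are hypothetical helpers. Percent-encoding the ID is a defensive choice, since the IDs shown here are plain strings or UUIDs.

```python
import json
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:8000"


def speaker_url(speaker_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET/DELETE /api/speakers/{speaker_id}, with the ID percent-encoded."""
    return f"{base_url.rstrip('/')}/api/speakers/{quote(speaker_id, safe='')}"


def get_speaker(speaker_id: str, base_url: str = BASE_URL) -> Optional[dict]:
    """Return the Speaker JSON, or None if the API answers 404."""
    try:
        with urllib.request.urlopen(speaker_url(speaker_id, base_url)) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def delete_speaker(speaker_id: str, base_url: str = BASE_URL) -> bool:
    """Delete a speaker; True on 200, False if it did not exist (404)."""
    request = urllib.request.Request(speaker_url(speaker_id, base_url), method="DELETE")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```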
 ---
 ## 🎭 Dialog Generation
 ### `POST /api/dialog/generate_line`
 Generate audio for a single dialog line (speech or silence).
 **Request Body:** Raw JSON object representing either a `SpeechItem` or `SilenceItem`
 #### Speech Item Example:
 ```json
 {
  "type": "speech",
  "speaker_id": "speaker_001",
  "text": "Hello, this is a test message.",
  "exaggeration": 0.7,
  "cfg_weight": 0.6,
  "temperature": 0.8,
  "use_existing_audio": false,
  "audio_url": null
 }
 ```
 #### Silence Item Example:
 ```json
 {
  "type": "silence",
  "duration": 2.0,
  "use_existing_audio": false,
  "audio_url": null
 }
 ```
 **Response:** only the URL of the generated file. Speech lines are written as `line_<hex>` files and silences as `silence_<hex>.wav`.
 ```json
 {
  "audio_url": "/generated_audio/line_abc123def456.wav"
 }
 ```
 **Status Codes:**
 - `200`: Audio generated successfully
 - `400`: Invalid request format or unknown dialog item type
 - `404`: Speaker not found
 - `500`: Server error during generation
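This endpoint accepts a raw JSON object rather than a typed request model. Building payloads on the client can therefore catch mistakes before a round trip. The sketch uses the same defaults as `SpeechItem` (`exaggeration` 0.5, `cfg_weight` 0.5, `temperature` 0.8) and the same `duration > 0` rule as `SilenceItem`. `speech_line`, `silence_line`, and `generate_line` are hypothetical client helpers.

```python
import json
import urllib.request
from typing import Any

# Defaults mirror the Field(...) defaults declared on SpeechItem
SPEECH_DEFAULTS = {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8}


def speech_line(speaker_id: str, text: str, **params: float) -> dict[str, Any]:
    """Build a SpeechItem payload; unknown TTS parameter names are rejected."""
    unknown = set(params) - set(SPEECH_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown TTS parameters: {sorted(unknown)}")
    return {"type": "speech", "speaker_id": speaker_id, "text": text,
            **SPEECH_DEFAULTS, **params}


def silence_line(duration: float) -> dict[str, Any]:
    """Build a SilenceItem payload; the model requires duration > 0."""
    if duration <= 0:
        raise ValueError("duration must be greater than 0 seconds")
    return {"type": "silence", "duration": duration}


def generate_line(item: dict[str, Any], base_url: str = "http://127.0.0.1:8000") -> str:
    """POST one item and return the server-relative audio_url from the response."""
    request = urllib.request.Request(
        f"{base_url}/api/dialog/generate_line",
        data=json.dumps(item).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["audio_url"]
```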
 ---
 ### `POST /api/dialog/generate`
 Generate a complete dialog from multiple speech and silence items.
 **Request Model:** `DialogRequest`
 **Request Body:**
 ```json
 {
  "dialog_items": [
    {
      "type": "speech",
      "speaker_id": "speaker_001", 
      "text": "Welcome to our podcast!",
      "exaggeration": 0.5,
      "cfg_weight": 0.5,
      "temperature": 0.8
    },
    {
      "type": "silence",
      "duration": 1.0
    },
    {
      "type": "speech",
      "speaker_id": "speaker_002",
      "text": "Thank you for having me!",
      "exaggeration": 0.6,
      "cfg_weight": 0.7,
      "temperature": 0.9
    }
  ],
  "output_base_name": "podcast_episode_01"
 }
 ```
 **Response Model:** `DialogResponse`
 **Example Response:**
 ```json
 {
  "log": "Processing dialog with 3 items...\nGenerating speech for item 1...\nGenerating silence for item 2...\nGenerating speech for item 3...\nConcatenating audio segments...\nZIP archive created at: /path/to/output.zip",
  "concatenated_audio_url": "/generated_audio/podcast_episode_01_concatenated.wav",
  "zip_archive_url": "/generated_audio/podcast_episode_01_archive.zip", 
  "temp_dir_path": "/path/to/temp/directory",
  "error_message": null
 }
 ```
 **Status Codes:**
 - `200`: Dialog generated successfully
 - `400`: Invalid value or configuration in the request
 - `422`: Request body does not match `DialogRequest` (FastAPI's default validation response)
 - `404`: Speaker or file not found
 - `500`: Server error during generation
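A dialog request is a list of the same items plus `output_base_name`. The sketch below converts a compact script into a `DialogRequest` body. In that script, a `(speaker_id, text)` tuple is a spoken line and a number is a pause in seconds. The script format and the helper names are assumptions made for illustration. TTS parameters are omitted so the model defaults apply. The sketch also treats a non-null `error_message` in the response as a failure.

```python
import json
import urllib.request
from typing import Any, Union

# A spoken line as (speaker_id, text), or a pause length in seconds
ScriptEntry = Union[tuple[str, str], float]


def build_dialog_request(script: list[ScriptEntry], output_base_name: str) -> dict[str, Any]:
    """Convert (speaker_id, text) tuples and pause lengths into dialog_items."""
    items: list[dict[str, Any]] = []
    for entry in script:
        if isinstance(entry, tuple):
            speaker_id, text = entry
            items.append({"type": "speech", "speaker_id": speaker_id, "text": text})
        else:
            if entry <= 0:
                raise ValueError("silence duration must be greater than 0 seconds")
            items.append({"type": "silence", "duration": float(entry)})
    return {"dialog_items": items, "output_base_name": output_base_name}


def generate_dialog(body: dict[str, Any], base_url: str = "http://127.0.0.1:8000") -> dict[str, Any]:
    """POST /api/dialog/generate and return the DialogResponse JSON."""
    request = urllib.request.Request(
        f"{base_url}/api/dialog/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    if result.get("error_message"):
        raise RuntimeError(result["error_message"])
    return result
```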
 ---
 ## 📁 Static File Serving
 ### `GET /generated_audio/{filename}`
 Serve generated audio files and ZIP archives.
 **Path Parameters:**
 - `filename` (string, required): Name of the generated file
 **Response:** Binary audio file or ZIP archive
 **Example URLs:**
 - `http://127.0.0.1:8000/generated_audio/dialog_concatenated.wav`
 - `http://127.0.0.1:8000/generated_audio/dialog_archive.zip`
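The generation endpoints return `audio_url`, `concatenated_audio_url`, and `zip_archive_url` as paths on the API server. A client on a different origin, such as the frontend on port 8001, must resolve them against the API base URL before fetching. The sketch below does this with `urllib.parse.urljoin`. `absolute_url` and `download` are hypothetical helpers.

```python
import urllib.request
from pathlib import Path
from typing import Optional
from urllib.parse import urljoin

API_BASE = "http://127.0.0.1:8000"


def absolute_url(path: Optional[str], base_url: str = API_BASE) -> str:
    """Resolve a /generated_audio/... path against the API server."""
    if not path:
        raise ValueError("no file URL in response (generation may have failed)")
    # urljoin leaves already-absolute URLs unchanged
    return urljoin(base_url, path)


def download(path: str, dest_dir: Path, base_url: str = API_BASE) -> Path:
    """Fetch a generated file and save it under its server-side file name."""
    dest = dest_dir / Path(path).name
    urllib.request.urlretrieve(absolute_url(path, base_url), str(dest))
    return dest
```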
 ---
 ## 📋 Data Models
 ### Speaker Models
 #### `Speaker`
 ```json
 {
  "id": "string",
  "name": "string", 
  "sample_path": "string|null"
 }
 ```
 #### `SpeakerResponse`
 ```json
 {
  "id": "string",
  "name": "string",
  "message": "string|null"
 }
 ```
 ### Dialog Models
 #### `SpeechItem`
 ```jsonc
 {
  "type": "speech",
  "speaker_id": "string",
  "text": "string",
  "exaggeration": 0.5,        // 0.0-2.0, controls expressiveness
  "cfg_weight": 0.5,          // 0.0-2.0, alignment with speaker characteristics  
  "temperature": 0.8,         // 0.0-2.0, randomness in generation
  "use_existing_audio": false,
  "audio_url": "string|null"
 }
 ```
 #### `SilenceItem`
 ```jsonc
 {
  "type": "silence",
  "duration": 1.0,            // seconds, must be > 0
  "use_existing_audio": false,
  "audio_url": "string|null"
 }
 ```
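When `use_existing_audio` is true and `audio_url` is set, `DialogProcessorService` does not synthesize the item again. It copies the referenced file into the dialog's temporary directory instead. If the file cannot be found, the item is logged as an error segment. The function below restates the path mapping the service applies. `resolve_reused_audio` is a hypothetical name, and its branches follow the service code in this commit. Like the service, it does not reject `..` segments in `audio_url`.

```python
from pathlib import Path

GENERATED_PREFIX = "/generated_audio/"


def resolve_reused_audio(audio_url: str, generated_dir: Path) -> Path:
    """Map an audio_url from an earlier response to a file on the server."""
    if audio_url.startswith(GENERATED_PREFIX):
        # Web URL served by the static mount -> file in the generated-audio dir
        return generated_dir / audio_url[len(GENERATED_PREFIX):]
    path = Path(audio_url)
    if not path.is_absolute():
        # Relative paths are taken relative to the generated-audio dir
        path = generated_dir / audio_url.lstrip("/\\")
    # Absolute filesystem paths are used unchanged
    return path
```

A client can therefore resend an earlier `/generate_line` result to keep a take it already approved. For example: `{"type": "speech", "speaker_id": "speaker_001", "text": "Hello, this is a test message.", "use_existing_audio": true, "audio_url": "/generated_audio/line_abc123def456.wav"}`.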
 #### `DialogRequest`
 ```jsonc
 {
  "dialog_items": [
    // Array of SpeechItem and/or SilenceItem objects
  ],
  "output_base_name": "string"  // Base name for output files
 }
 ```
 #### `DialogResponse`
 ```jsonc
 {
  "log": "string",                        // Processing log
  "concatenated_audio_url": "string|null", // URL to final audio
  "zip_archive_url": "string|null",       // URL to ZIP archive
  "temp_dir_path": "string|null",         // Server temp directory
  "error_message": "string|null"          // Error details if failed
 }
 ```
 ---
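 ## 🎛️ TTS Parameters
 All three parameters are optional on each speech item. The ranges below are recommended values. The `SpeechItem` model applies the defaults but does not validate the bounds.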
 ### Exaggeration (`exaggeration`)
 - **Range**: 0.0 - 2.0
 - **Default**: 0.5
 - **Description**: Controls the expressiveness of speech. Higher values produce more exaggerated, emotional speech.
 ### CFG Weight (`cfg_weight`)
 - **Range**: 0.0 - 2.0  
 - **Default**: 0.5
 - **Description**: Classifier-Free Guidance weight. Higher values make speech more aligned with the prompt text and speaker characteristics.
 ### Temperature (`temperature`)
 - **Range**: 0.0 - 2.0
 - **Default**: 0.8
 - **Description**: Controls randomness in generation. Lower values produce more deterministic speech, higher values add more variation.
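The request model accepts out-of-range values; the bundled `run_api_test.py`, for example, sends `cfg_weight: 4.0`. A client that wants to stay inside the ranges above therefore has to check them itself. The sketch below shows one way. `TTSParams` and its methods are hypothetical, with defaults copied from this section.

```python
from dataclasses import asdict, dataclass

RANGE = (0.0, 2.0)  # documented range shared by all three parameters


@dataclass(frozen=True)
class TTSParams:
    exaggeration: float = 0.5
    cfg_weight: float = 0.5
    temperature: float = 0.8

    def out_of_range(self) -> list[str]:
        """Names of parameters outside the documented 0.0-2.0 range."""
        low, high = RANGE
        return [name for name, value in asdict(self).items() if not low <= value <= high]

    def as_payload(self) -> dict[str, float]:
        """Fields to merge into a SpeechItem payload."""
        return asdict(self)
```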
 ---
 ## 🔧 Configuration
 ### Directory Layout
 The API uses the following directories (configurable in `backend/app/config.py`):
 - **Speaker Samples**: `{PROJECT_ROOT}/speaker_data/speaker_samples/`
 - **Generated Audio**: `{PROJECT_ROOT}/backend/tts_generated_dialogs/`
 - **Temporary Files**: `{PROJECT_ROOT}/tts_temp_outputs/`
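All three locations hang off a single project root. The sketch below restates that layout and creates any missing directories, which can help when preparing a fresh checkout. `project_dirs` and `ensure_dirs` are hypothetical helpers. The authoritative values remain those in `backend/app/config.py`.

```python
from pathlib import Path


def project_dirs(project_root: Path) -> dict[str, Path]:
    """Directory layout used by the backend, relative to the project root."""
    return {
        "speaker_samples": project_root / "speaker_data" / "speaker_samples",
        "generated_audio": project_root / "backend" / "tts_generated_dialogs",
        "temp_outputs": project_root / "tts_temp_outputs",
    }


def ensure_dirs(project_root: Path) -> dict[str, Path]:
    """Create any missing directories and return the mapping."""
    dirs = project_dirs(project_root)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```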
 ### CORS Settings
 - Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001`
 - Allowed Methods: All
 - Allowed Headers: All
 - Credentials: Enabled
 ---
 ## 🚀 Usage Examples
 ### Python Client Example
 ```python
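 import requests
 # Base URL
 BASE_URL = "http://127.0.0.1:8000"
 # Get all speakers (fail fast on HTTP errors)
 resp = requests.get(f"{BASE_URL}/api/speakers/")
 resp.raise_for_status()
 speakers = resp.json()
 print("Available speakers:", speakers)
 if not speakers:
    raise SystemExit("No speakers found; create one with POST /api/speakers/ first.")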
 # Generate a simple dialog
 dialog_request = {
    "dialog_items": [
        {
            "type": "speech",
            "speaker_id": speakers[0]["id"],
            "text": "Hello world!",
            "exaggeration": 0.7,
            "cfg_weight": 0.6,
            "temperature": 0.9
        },
        {
            "type": "silence", 
            "duration": 1.0
        }
    ],
    "output_base_name": "test_dialog"
 }
 response = requests.post(
    f"{BASE_URL}/api/dialog/generate",
    json=dialog_request
 )
 if response.status_code == 200:
    result = response.json()
    print("Dialog generated!")
    print("Audio URL:", result["concatenated_audio_url"])
    print("ZIP URL:", result["zip_archive_url"])
 else:
    print("Error:", response.text)
 ```
 ### JavaScript/Frontend Example
 ```javascript
 // Generate dialog
 const dialogRequest = {
  dialog_items: [
    {
      type: "speech",
      speaker_id: "speaker_001",
      text: "Welcome to our show!",
      exaggeration: 0.6,
      cfg_weight: 0.5,
      temperature: 0.8
    }
  ],
  output_base_name: "intro"
 };
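 const API_BASE = 'http://127.0.0.1:8000';
 fetch(`${API_BASE}/api/dialog/generate`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(dialogRequest)
 })
 .then(response => {
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }
  return response.json();
 })
 .then(data => {
  console.log('Dialog generated:', data);
  // concatenated_audio_url is relative to the API server, not the page origin (port 8001)
  const audio = new Audio(API_BASE + data.concatenated_audio_url);
  audio.play();
 })
 .catch(err => console.error('Dialog generation failed:', err));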
 ```
 ---
 ## ⚠️ Error Handling
 ### Common Error Responses
 #### 400 Bad Request
 ```json
 {
  "detail": "Invalid value or configuration: Text cannot be empty"
 }
 ```
 #### 404 Not Found
 ```json
 {
  "detail": "Speaker sample for ID 'invalid_speaker' not found."
 }
 ```
 #### 500 Internal Server Error
 ```json
 {
  "detail": "Runtime error during dialog generation: CUDA out of memory"
 }
 ```
 ### Error Categories
 - **Validation Errors**: Invalid input format, missing required fields
 - **Resource Errors**: Speaker not found, file not accessible
 - **Processing Errors**: TTS model failures, audio processing issues
 - **System Errors**: Memory issues, disk space, model loading failures
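Every error body above has a `detail` field. Errors produced by FastAPI's own request validation (status `422`) are different: they put a list of objects in `detail` instead of a string. A client can normalize both shapes, as sketched below. `describe_error` and `error_category` are hypothetical helpers. The status-to-category mapping only approximates the list above, since processing and system errors both arrive as `500`.

```python
import json
from typing import Any


def describe_error(body: str) -> str:
    """Extract a readable message from an error response body."""
    try:
        detail: Any = json.loads(body).get("detail", body)
    except (json.JSONDecodeError, AttributeError):
        return body  # not a JSON object, e.g. a plain-text proxy error
    if isinstance(detail, list):
        # FastAPI validation errors: [{"loc": [...], "msg": "...", ...}, ...]
        parts = []
        for err in detail:
            loc = ".".join(str(p) for p in err.get("loc", []))
            parts.append(f"{loc}: {err.get('msg', '')}")
        return "; ".join(parts)
    return str(detail)


def error_category(status: int) -> str:
    """Rough mapping of HTTP status codes to the categories listed above."""
    if status in (400, 422):
        return "validation"
    if status == 404:
        return "resource"
    if status >= 500:
        return "processing/system"
    return "other"
```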
 ---
 ## 🔍 Development & Debugging
 ### Running the Server
 ```bash
 # From project root
 uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
 ```
 ### API Documentation
 - **Swagger UI**: `http://127.0.0.1:8000/docs`
 - **ReDoc**: `http://127.0.0.1:8000/redoc`
 ### Logging
 The API provides detailed logging in the `DialogResponse.log` field for dialog generation operations.
 ### File Management
 - Generated files are stored in `backend/tts_generated_dialogs/`
 - Temporary processing files are kept for inspection (not auto-deleted)
 - ZIP archives contain individual audio segments plus concatenated result
 ---
 ## 📝 Notes
 - The API automatically loads and unloads TTS models to manage memory usage
 - Speaker audio samples should be clear, single-speaker recordings for best results
 - Large dialogs may take significant time to process depending on hardware
 - Generated files are served statically and persist until manually cleaned up
 ---
 *Generated on: 2025-06-06*  
 *API Version: 0.1.0*
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,102 @@
 # CLAUDE.md
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 ## Common Development Commands
 ### Backend (FastAPI)
 ```bash
 # Install backend dependencies (run from project root)
 pip install -r backend/requirements.txt
 # Run backend development server (run from project root)
 uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
 # Run backend tests
 # Run backend API tests (the server above must already be running)
 python backend/run_api_test.py
 # Backend API accessible at http://127.0.0.1:8000
 # API docs at http://127.0.0.1:8000/docs
 ```
 ### Frontend Testing
 ```bash
 # Run frontend tests
 npm test
 ```
 ### Alternative Interfaces
 ```bash
 # Run Gradio interface (standalone TTS app)
 python gradio_app.py
 ```
 ## Architecture Overview
 This is a full-stack TTS (Text-to-Speech) application with three interfaces:
 1. **Modern web frontend** (vanilla JS) - Interactive dialog editor at `frontend/`
 2. **FastAPI backend** - REST API at `backend/`  
 3. **Gradio interface** - Alternative UI in `gradio_app.py`
 ### Frontend-Backend Communication
 - **Frontend**: Vanilla JS (ES6 modules) serving on port 8001
 - **Backend**: FastAPI serving on port 8000
 - **API Base**: `http://localhost:8000/api`
 - **CORS**: Configured for frontend communication
 - **File Serving**: Generated audio served via `/generated_audio/` endpoint
 ### Key API Endpoints
 - `/api/speakers/` - Speaker CRUD operations
 - `/api/dialog/generate` - Full dialog generation
 - `/api/dialog/generate_line` - Single line generation
 - `/generated_audio/` - Static audio file serving
 ### Backend Service Architecture
 Located in `backend/app/services/`:
 - **TTSService**: Chatterbox TTS model lifecycle management
 - **SpeakerManagementService**: Speaker data and sample management
 - **DialogProcessorService**: Dialog script to audio processing
 - **AudioManipulationService**: Audio concatenation and ZIP creation
 ### Frontend Architecture
 - **Modular design**: `api.js` (API layer) + `app.js` (app logic)
 - **No framework**: Modern vanilla JavaScript with ES6+ features
 - **Interactive editor**: Table-based dialog creation with drag-drop reordering
 ### Data Flow
 1. User creates dialog in frontend table editor
 2. Frontend sends dialog items to `/api/dialog/generate`
 3. Backend processes speech/silence items via services
 4. TTS generates audio, segments concatenated
 5. ZIP archive created with all outputs
 6. Frontend receives URLs for playback/download
 ### Speaker Configuration
 - **Location**: `speaker_data/speakers.yaml` and `speaker_data/speaker_samples/`
 - **Format**: YAML config referencing WAV audio samples
 - **Management**: Both API endpoints and file-based configuration
 ### Output Organization
 - `dialog_output/` - Generated dialog files
 - `single_output/` - Single utterance outputs  
 - `tts_outputs/` - Raw TTS generation files
 - Generated ZIPs contain organized file structure
 ## Development Setup Notes
 - Python virtual environment expected at project root (`.venv`)
 - Backend commands run from project root, not `backend/` directory
 - Frontend served separately (typically port 8001)
 - Speaker samples must be WAV format in `speaker_data/speaker_samples/`
--- a/backend/app/config.py
+++ b/backend/app/config.py
@@ -17,3 +17,5 @@ TTS_TEMP_OUTPUT_DIR = PROJECT_ROOT / "tts_temp_outputs"
 # These are stored within the 'backend' directory to be easily servable.
 DIALOG_OUTPUT_PARENT_DIR = PROJECT_ROOT / "backend"
 DIALOG_GENERATED_DIR = DIALOG_OUTPUT_PARENT_DIR / "tts_generated_dialogs"
 # Alias for clarity and backward compatibility
 DIALOG_OUTPUT_DIR = DIALOG_GENERATED_DIR
--- a/backend/app/models/dialog_models.py
+++ b/backend/app/models/dialog_models.py
@@ -11,10 +11,14 @@ class SpeechItem(DialogItemBase):
    exaggeration: Optional[float] = Field(0.5, description="Controls the expressiveness of the speech. Higher values lead to more exaggerated speech. Default from Gradio.")
    cfg_weight: Optional[float] = Field(0.5, description="Classifier-Free Guidance weight. Higher values make the speech more aligned with the prompt text and speaker characteristics. Default from Gradio.")
    temperature: Optional[float] = Field(0.8, description="Controls randomness in generation. Lower values make speech more deterministic, higher values more varied. Default from Gradio.")
    use_existing_audio: Optional[bool] = Field(False, description="If true and audio_url is provided, use the existing audio file instead of generating new audio for this line.")
    audio_url: Optional[str] = Field(None, description="Path or URL to pre-generated audio for this line (used if use_existing_audio is true).")
 class SilenceItem(DialogItemBase):
    type: Literal['silence'] = 'silence'
    duration: float = Field(..., gt=0, description="Duration of the silence in seconds.")
    use_existing_audio: Optional[bool] = Field(False, description="If true and audio_url is provided, use the existing audio file for silence instead of generating a new silent segment.")
    audio_url: Optional[str] = Field(None, description="Path or URL to pre-generated audio for this silence (used if use_existing_audio is true).")
 class DialogRequest(BaseModel):
    dialog_items: List[Union[SpeechItem, SilenceItem]] = Field(..., description="A list of speech and silence items.")
--- a/backend/app/routers/dialog.py
+++ b/backend/app/routers/dialog.py
@@ -1,6 +1,7 @@
 from fastapi import APIRouter, Depends, HTTPException, BackgroundTasks
 from pathlib import Path
 import shutil
 import os
 from app.models.dialog_models import DialogRequest, DialogResponse
 from app.services.tts_service import TTSService
@@ -32,6 +33,68 @@ def get_audio_manipulation_service():
    return AudioManipulationService()
 # --- Helper function to manage TTS model loading/unloading --- 
 from app.models.dialog_models import SpeechItem, SilenceItem
 from app.services.tts_service import TTSService
 from app.services.audio_manipulation_service import AudioManipulationService
 from app.services.speaker_service import SpeakerManagementService
 from fastapi import Body
 import uuid
 from pathlib import Path
@router.post("/generate_line")
 async def generate_line(
    item: dict = Body(...),
    tts_service: TTSService = Depends(get_tts_service),
    audio_manipulator: AudioManipulationService = Depends(get_audio_manipulation_service),
    speaker_service: SpeakerManagementService = Depends(get_speaker_management_service)
 ):
    """
    Generate audio for a single dialog line (speech or silence).
    Returns the URL of the generated audio file, or error details on failure.
    """
    try:
        if item.get("type") == "speech":
            speech = SpeechItem(**item)
            filename_base = f"line_{uuid.uuid4().hex}"
            out_dir = Path(config.DIALOG_GENERATED_DIR)
            # Get speaker sample path
            speaker_info = speaker_service.get_speaker_by_id(speech.speaker_id)
            if not speaker_info or not getattr(speaker_info, 'sample_path', None):
                raise HTTPException(status_code=404, detail=f"Speaker sample for ID '{speech.speaker_id}' not found.")
            speaker_sample_path = speaker_info.sample_path
            # Ensure absolute path
            if not os.path.isabs(speaker_sample_path):
                speaker_sample_path = str((Path(config.SPEAKER_SAMPLES_DIR) / Path(speaker_sample_path).name).resolve())
            # Generate speech (async)
            out_path = await tts_service.generate_speech(
                text=speech.text,
                speaker_sample_path=speaker_sample_path,
                output_filename_base=filename_base,
                speaker_id=speech.speaker_id,
                output_dir=out_dir,
                exaggeration=speech.exaggeration,
                cfg_weight=speech.cfg_weight,
                temperature=speech.temperature
            )
            audio_url = f"/generated_audio/{out_path.name}"
        elif item.get("type") == "silence":
            silence = SilenceItem(**item)
            filename = f"silence_{uuid.uuid4().hex}.wav"
            out_path = Path(config.DIALOG_GENERATED_DIR) / filename
            # Generate silence tensor and save as WAV
            silence_tensor = audio_manipulator._create_silence(silence.duration)
            import torchaudio
            torchaudio.save(str(out_path), silence_tensor, audio_manipulator.sample_rate)
            audio_url = f"/generated_audio/{filename}"
        else:
            raise HTTPException(status_code=400, detail="Unknown dialog item type.")
        return {"audio_url": audio_url}
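    except HTTPException:
        # Propagate the intentional 400/404 responses instead of turning them into 500s
        raise
    except ValueError as e:
        # Pydantic's ValidationError subclasses ValueError, so malformed items map to 400
        raise HTTPException(status_code=400, detail=f"Invalid value or configuration: {e}")
    except Exception as e:
        import traceback
        tb = traceback.format_exc()
        raise HTTPException(status_code=500, detail=f"Exception: {str(e)}\nTraceback:\n{tb}")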
 async def manage_tts_model_lifecycle(tts_service: TTSService, task_function, *args, **kwargs):
    """Loads TTS model, executes task, then unloads model."""
    try:
--- a/backend/app/services/dialog_processor_service.py
+++ b/backend/app/services/dialog_processor_service.py
@@ -86,11 +86,58 @@ class DialogProcessorService:
        dialog_temp_dir.mkdir(parents=True, exist_ok=True)
        processing_log.append(f"Created temporary directory for segments: {dialog_temp_dir}")
        import shutil
        segment_idx = 0
        for i, item in enumerate(dialog_items):
            item_type = item.get("type")
            processing_log.append(f"Processing item {i+1}: type='{item_type}'")
            # --- Universal: Handle reuse of existing audio for both speech and silence ---
            use_existing_audio = item.get("use_existing_audio", False)
            audio_url = item.get("audio_url")
            if use_existing_audio and audio_url:
                # Determine source path (handle both absolute and relative)
                # Map web URL to actual file location in tts_generated_dialogs
                if audio_url.startswith("/generated_audio/"):
                    src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url[len("/generated_audio/"):]
                else:
                    src_audio_path = Path(audio_url)
                    if not src_audio_path.is_absolute():
                        # Assume relative to the generated audio root dir
                        src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url.lstrip("/\\")
                # Now src_audio_path should point to the real file in tts_generated_dialogs
                if src_audio_path.is_file():
                    segment_filename = f"{output_base_name}_seg{segment_idx}_reused.wav"
                    dest_path = (self.temp_audio_dir / output_base_name / segment_filename)
                    try:
                        if not src_audio_path.exists():
                            processing_log.append(f"[REUSE] Source audio file does not exist: {src_audio_path}")
                        else:
                            processing_log.append(f"[REUSE] Source audio file exists: {src_audio_path}, size={src_audio_path.stat().st_size} bytes")
                        shutil.copyfile(src_audio_path, dest_path)
                        if not dest_path.exists():
                            processing_log.append(f"[REUSE] Destination audio file was not created: {dest_path}")
                        else:
                            processing_log.append(f"[REUSE] Destination audio file created: {dest_path}, size={dest_path.stat().st_size} bytes")
                        # Only include 'type' and 'path' so the concatenator always includes this segment
                        segment_results.append({
                            "type": item_type,
                            "path": str(dest_path)
                        })
                        processing_log.append(f"Reused existing audio for item {i+1}: copied from {src_audio_path} to {dest_path}")
                    except Exception as e:
                        error_message = f"Failed to copy reused audio for item {i+1}: {e}"
                        processing_log.append(error_message)
                        segment_results.append({"type": "error", "message": error_message})
                    segment_idx += 1
                    continue
                else:
                    error_message = f"Audio file for reuse not found at {src_audio_path} for item {i+1}."
                    processing_log.append(error_message)
                    segment_results.append({"type": "error", "message": error_message})
                    segment_idx += 1
                    continue
            if item_type == "speech":
                speaker_id = item.get("speaker_id")
                text = item.get("text")
@@ -161,6 +208,11 @@ class DialogProcessorService:
                processing_log.append(f"Unknown item type '{item_type}' at item {i+1}. Skipping.")
                segment_results.append({"type": "error", "message": f"Unknown item type: {item_type}"})
        # Log the full segment_results list for debugging
        processing_log.append("[DEBUG] Final segment_results list:")
        for idx, seg in enumerate(segment_results):
            processing_log.append(f"  [{idx}] {seg}")
        return {
            "log": "\n".join(processing_log),
            "segment_files": segment_results,
--- a/backend/run_api_test.py
+++ b/backend/run_api_test.py
@@ -16,7 +16,7 @@ DIALOG_PAYLOAD = {
    "dialog_items": [
        {
            "type": "speech",
-            "speaker_id": "dummy_speaker", # Ensure this speaker exists in your speakers.yaml and has a sample .wav
+            "speaker_id": "90fcd672-ba84-441a-ac6c-0449a59653bd", # Correct UUID for dummy_speaker
            "text": "This is a test from the Python script. One, two, three.",
            "exaggeration": 1.5,
            "cfg_weight": 4.0,
@@ -28,7 +28,7 @@ DIALOG_PAYLOAD = {
        },
        {
            "type": "speech",
-            "speaker_id": "dummy_speaker",
+            "speaker_id": "90fcd672-ba84-441a-ac6c-0449a59653bd",
            "text": "Testing complete. All systems nominal."
        },
        {
@@ -104,5 +104,49 @@ def run_test():
        print(f"An unexpected error occurred: {e}")
        print("Test FAILED (Unexpected error)")
 def test_generate_line_speech():
    url = f"{API_BASE_URL}/generate_line"
    payload = {
        "type": "speech",
        "speaker_id": "90fcd672-ba84-441a-ac6c-0449a59653bd",  # Correct UUID for dummy_speaker
        "text": "This is a per-line TTS test.",
        "exaggeration": 1.0,
        "cfg_weight": 2.0,
        "temperature": 0.8
    }
    print(f"\nTesting /generate_line with speech item: {payload}")
    response = requests.post(url, json=payload)
    print(f"Status: {response.status_code}")
    try:
        data = response.json()
        print(f"Response: {json.dumps(data, indent=2)}")
        if response.status_code == 200 and "audio_url" in data:
            print("Speech line test PASSED.")
        else:
            print("Speech line test FAILED.")
    except Exception as e:
        print(f"Speech line test FAILED: {e}")
 def test_generate_line_silence():
    url = f"{API_BASE_URL}/generate_line"
    payload = {
        "type": "silence",
        "duration": 1.25
    }
    print(f"\nTesting /generate_line with silence item: {payload}")
    response = requests.post(url, json=payload)
    print(f"Status: {response.status_code}")
    try:
        data = response.json()
        print(f"Response: {json.dumps(data, indent=2)}")
        if response.status_code == 200 and "audio_url" in data:
            print("Silence line test PASSED.")
        else:
            print("Silence line test FAILED.")
    except Exception as e:
        print(f"Silence line test FAILED: {e}")
 if __name__ == "__main__":
    run_test()
    test_generate_line_speech()
    test_generate_line_silence()
--- a/frontend/css/style.css
+++ b/frontend/css/style.css
@@ -1,11 +1,57 @@
-/* Modern, clean, and accessible UI styles for Chatterbox TTS */
+/* CSS Custom Properties - Color Palette */
 :root {
    /* Primary Colors */
    --primary-blue: #163b65;
    --primary-blue-dark: #357ab8;
    --primary-blue-darker: #205081;
    /* Background Colors */
    --bg-body: #f7f9fa;
    --bg-white: #fff;
    --bg-light: #f3f5f7;
    --bg-lighter: #f9fafb;
    --bg-blue-light: #f3f7fa;
    --bg-blue-lighter: #eaf1fa;
    --bg-gray-light: #f7f7f7;
    /* Text Colors */
    --text-primary: #222;
    --text-secondary: #2b2b2b;
    --text-tertiary: #333;
    --text-white: #fff;
    --text-blue: #153f6f;
    --text-blue-dark: #357ab8;
    --text-blue-darker: #205081;
    /* Border Colors */
    --border-light: #1b0404;
    --border-medium: #cfd8dc;
    --border-blue: #b5c6df;
    --border-gray: #e3e3e3;
    /* Status Colors */
    --error-bg: #e74c3c;
    --error-bg-dark: #c0392b;
    --warning-bg: #f9e79f;
    --warning-text: #b7950b;
    --warning-border: #f7ca18;
    /* Header/Footer */
    --header-bg: #222e3a;
    /* Shadows */
    --shadow-light: rgba(44,62,80,0.04);
    --shadow-medium: rgba(44,62,80,0.06);
    --shadow-strong: rgba(44,62,80,0.07);
 }
 body {
    font-family: 'Segoe UI', 'Roboto', 'Arial', sans-serif;
    line-height: 1.7;
    margin: 0;
    padding: 0;
-    background-color: #f7f9fa;
+    background-color: var(--bg-body);
-    color: #222;
+    color: var(--text-primary);
 }
 .container {
@@ -15,11 +61,11 @@ body {
 }
 header {
-    background: #222e3a;
+    background: var(--header-bg);
-    color: #fff;
+    color: var(--text-white);
    padding: 1.5rem 0 1rem 0;
    text-align: center;
-    border-bottom: 3px solid #4a90e2;
+    border-bottom: 3px solid var(--primary-blue);
 }
 h1 {
@@ -49,7 +95,6 @@ main {
    padding: 0;
 }
 #results-display.panel {
    flex: 1 1 100%;
    min-width: 0;
@@ -60,52 +105,115 @@ main {
 #dialog-items-table {
    width: 100%;
    border-collapse: collapse;
-    background: #fff;
+    background: var(--bg-white);
    border-radius: 8px;
    overflow: hidden;
    font-size: 1rem;
    margin-bottom: 0;
    table-layout: fixed;
 }
 #dialog-items-table th, #dialog-items-table td {
-    padding: 10px 12px;
+    padding: 7px 10px;
-    border-bottom: 1px solid #e3e3e3;
+    border: 1px solid var(--border-light);
    text-align: left;
    vertical-align: middle;
    font-weight: 300;
    font-size: 0.97rem;
    color: var(--text-secondary);
    overflow: hidden;
    text-overflow: ellipsis;
    white-space: nowrap;
 }
 /* Widen the Text/Duration column */
 #dialog-items-table th:nth-child(3), #dialog-items-table td:nth-child(3) {
    min-width: 320px;
    width: 50%;
    font-weight: 300;
    font-size: 1rem;
 }
 /* Make the Speaker (2nd) column narrower */
 #dialog-items-table th:nth-child(2), #dialog-items-table td:nth-child(2) {
    width: 60px;
    min-width: 60px;
    max-width: 60px;
    text-align: center;
 }
 /* Make the Actions (4th) column narrower */
 #dialog-items-table th:nth-child(4), #dialog-items-table td:nth-child(4) {
    width: 110px;
    min-width: 90px;
    max-width: 130px;
    text-align: left;
    padding-left: 0;
    padding-right: 0;
 }
 #dialog-items-table th:first-child, #dialog-items-table td.type-icon-cell {
    width: 44px;
    min-width: 36px;
    max-width: 48px;
    text-align: center;
    padding-left: 0;
    padding-right: 0;
 }
 .type-icon-cell {
    text-align: center;
    vertical-align: middle;
 }
 .dialog-type-icon {
    font-size: 1.4em;
    display: inline-block;
    line-height: 1;
    vertical-align: middle;
 }
 #dialog-items-table th {
-    background: #f3f7fa;
+    background: var(--bg-light);
-    color: #4a90e2;
+    color: var(--primary-blue);
    font-weight: 600;
    font-size: 1.05rem;
 }
 #dialog-items-table tr:last-child td {
    border-bottom: none;
 }
 #dialog-items-table td.actions {
-    text-align: center;
+    text-align: left;
-    min-width: 90px;
+    min-width: 110px;
    white-space: nowrap;
 }
 /* Collapsible log details */
 details#generation-log-details {
    margin-bottom: 0;
    border-radius: 4px;
-    background: #f3f5f7;
+    background: var(--bg-light);
-    box-shadow: 0 1px 3px rgba(44,62,80,0.04);
+    box-shadow: 0 1px 3px var(--shadow-light);
    padding: 0 0 0 0;
    transition: box-shadow 0.15s;
 }
 details#generation-log-details[open] {
-    box-shadow: 0 2px 8px rgba(44,62,80,0.07);
+    box-shadow: 0 2px 8px var(--shadow-strong);
-    background: #f9fafb;
+    background: var(--bg-lighter);
 }
 details#generation-log-details summary {
    font-size: 1rem;
-    color: #357ab8;
+    color: var(--text-blue);
    padding: 10px 0 6px 0;
    outline: none;
 }
 details#generation-log-details summary:focus {
-    outline: 2px solid #4a90e2;
+    outline: 2px solid var(--primary-blue);
    border-radius: 3px;
 }
@@ -132,9 +240,9 @@ details#generation-log-details summary:focus {
 }
 .card {
-    background: #fff;
+    background: var(--bg-white);
    border-radius: 8px;
-    box-shadow: 0 2px 8px rgba(44,62,80,0.07);
+    box-shadow: 0 2px 8px var(--shadow-medium);
    padding: 18px 20px;
    margin-bottom: 18px;
 }
@@ -154,39 +262,41 @@ h2 {
    font-size: 1.5rem;
    margin-top: 0;
    margin-bottom: 16px;
-    color: #4a90e2;
+    color: var(--primary-blue);
    letter-spacing: 0.5px;
 }
 h3 {
    font-size: 1.1rem;
    margin-bottom: 10px;
-    color: #333;
+    color: var(--text-tertiary);
 }
 .x-remove-btn {
-    background: #e74c3c;
+    background: var(--error-bg);
-    color: #fff;
+    color: var(--text-white);
    border: none;
    border-radius: 50%;
-    width: 28px;
+    width: 32px;
-    height: 28px;
+    height: 32px;
-    font-size: 1.2rem;
+    font-size: 1.25rem;
    line-height: 1;
    display: inline-flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
    transition: background 0.15s;
-    margin: 0 2px;
+    margin: 0 3px;
-    box-shadow: 0 1px 2px rgba(44,62,80,0.06);
+    box-shadow: 0 1px 2px var(--shadow-light);
    outline: none;
    padding: 0;
    vertical-align: middle;
 }
 .x-remove-btn:hover, .x-remove-btn:focus {
-    background: #c0392b;
+    background: var(--error-bg-dark);
-    color: #fff;
+    color: var(--text-white);
-    outline: 2px solid #e74c3c;
+    outline: 2px solid var(--error-bg);
 }
 .form-row {
@@ -202,9 +312,9 @@ label {
    margin-bottom: 0;
 }
-input[type='text'], input[type='file'] {
+input[type='text'], input[type='file'], textarea {
    padding: 8px 10px;
-    border: 1px solid #cfd8dc;
+    border: 1px solid var(--border-medium);
    border-radius: 4px;
    font-size: 1rem;
    width: 100%;
@@ -212,14 +322,21 @@ input[type='text'], input[type='file'] {
 }
 input[type='file'] {
-    background: #f7f7f7;
+    background: var(--bg-gray-light);
    font-size: 0.97rem;
 }
 .dialog-edit-textarea {
    min-height: 60px;
    resize: vertical;
    font-family: inherit;
    line-height: 1.4;
 }
 button {
    padding: 9px 18px;
-    background: #4a90e2;
+    background: var(--primary-blue);
-    color: #fff;
+    color: var(--text-white);
    border: none;
    border-radius: 5px;
    cursor: pointer;
@@ -229,8 +346,42 @@ button {
    margin-right: 10px;
 }
 .generate-line-btn, .play-line-btn {
    background: var(--bg-blue-light);
    color: var(--text-blue);
    border: 1.5px solid var(--border-blue);
    border-radius: 50%;
    width: 32px;
    height: 32px;
    font-size: 1.25rem;
    display: inline-flex;
    align-items: center;
    justify-content: center;
    margin: 0 3px;
    padding: 0;
    box-shadow: 0 1px 2px var(--shadow-light);
    vertical-align: middle;
 }
 .generate-line-btn:disabled, .play-line-btn:disabled {
    opacity: 0.45;
    cursor: not-allowed;
 }
 .generate-line-btn.loading {
    background: var(--warning-bg);
    color: var(--warning-text);
    border-color: var(--warning-border);
 }
 .generate-line-btn:hover, .play-line-btn:hover {
    background: var(--bg-blue-lighter);
    color: var(--text-blue-darker);
    border-color: var(--text-blue);
 }
 button:hover, button:focus {
-    background: #357ab8;
+    background: var(--primary-blue-dark);
    outline: none;
 }
@@ -246,7 +397,7 @@ button:hover, button:focus {
 #speaker-list li {
    padding: 7px 0;
-    border-bottom: 1px solid #e3e3e3;
+    border-bottom: 1px solid var(--border-gray);
    display: flex;
    justify-content: space-between;
    align-items: center;
@@ -257,7 +408,7 @@ button:hover, button:focus {
 }
 pre {
-    background: #f3f5f7;
+    background: var(--bg-light);
    padding: 12px;
    border-radius: 4px;
    font-size: 0.98rem;
@@ -275,8 +426,8 @@ audio {
 #zip-archive-link {
    display: inline-block;
    margin-right: 10px;
-    color: #fff;
+    color: var(--text-white);
-    background: #4a90e2;
+    background: var(--primary-blue);
    padding: 7px 16px;
    border-radius: 4px;
    text-decoration: none;
@@ -285,17 +436,17 @@ audio {
 }
 #zip-archive-link:hover, #zip-archive-link:focus {
-    background: #357ab8;
+    background: var(--primary-blue-dark);
 }
 footer {
    text-align: center;
    padding: 20px 0;
-    background: #222e3a;
+    background: var(--header-bg);
-    color: #fff;
+    color: var(--text-white);
    margin-top: 40px;
    font-size: 1rem;
-    border-top: 3px solid #4a90e2;
+    border-top: 3px solid var(--primary-blue);
 }
@media (max-width: 900px) {
@@ -328,3 +479,194 @@ footer {
        width: 100%;
    }
 }
 .move-up-btn, .move-down-btn {
    background: var(--bg-blue-light);
    color: var(--text-blue);
    border: 1.5px solid var(--border-blue);
    border-radius: 50%;
    width: 32px;
    height: 32px;
    font-size: 1.25rem;
    display: inline-flex;
    align-items: center;
    justify-content: center;
    margin: 0 3px;
    padding: 0;
    box-shadow: 0 1px 2px var(--shadow-light);
    vertical-align: middle;
    cursor: pointer;
    transition: background 0.15s, color 0.15s, border-color 0.15s;
 }
 .move-up-btn:disabled, .move-down-btn:disabled {
    opacity: 0.45;
    cursor: not-allowed;
 }
 .move-up-btn:hover:not(:disabled), .move-down-btn:hover:not(:disabled) {
    background: var(--bg-blue-lighter);
    color: var(--text-blue-darker);
    border-color: var(--text-blue);
 }
 .move-up-btn:focus, .move-down-btn:focus {
    outline: 2px solid var(--primary-blue);
    outline-offset: 2px;
 }
 /* TTS Settings Modal */
 .modal {
    position: fixed;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    background-color: rgba(0, 0, 0, 0.5);
    z-index: 1000;
    display: flex;
    align-items: center;
    justify-content: center;
 }
 .modal-content {
    background: var(--bg-white);
    border-radius: 8px;
    box-shadow: 0 2px 8px var(--shadow-strong);
    max-width: 500px;
    width: 90%;
    max-height: 80vh;
    overflow-y: auto;
 }
 .modal-header {
    display: flex;
    justify-content: space-between;
    align-items: center;
    padding: 20px 24px 16px;
    border-bottom: 1px solid var(--border-light);
 }
 .modal-header h3 {
    margin: 0;
    color: var(--text-primary);
    font-size: 1.25rem;
 }
 .modal-close {
    background: none;
    border: none;
    font-size: 24px;
    cursor: pointer;
    color: var(--text-secondary);
    padding: 4px;
    border-radius: 4px;
    transition: background-color 0.2s;
 }
 .modal-close:hover {
    background-color: var(--bg-light);
 }
 .modal-body {
    padding: 20px 24px;
 }
 .settings-group {
    margin-bottom: 20px;
 }
 .settings-group label {
    display: block;
    margin-bottom: 8px;
    font-weight: 500;
    color: var(--text-primary);
 }
 .settings-group input[type="range"] {
    width: 100%;
    margin-bottom: 4px;
 }
 .settings-group span {
    display: inline-block;
    min-width: 40px;
    font-weight: 500;
    color: var(--primary-blue);
    margin-left: 8px;
 }
 .settings-group small {
    display: block;
    color: var(--text-secondary);
    font-size: 0.875rem;
    margin-top: 4px;
    line-height: 1.3;
 }
 .modal-footer {
    display: flex;
    gap: 12px;
    justify-content: flex-end;
    padding: 16px 24px 20px;
    border-top: 1px solid var(--border-light);
 }
 .btn-primary, .btn-secondary {
    padding: 8px 16px;
    border-radius: 4px;
    border: none;
    cursor: pointer;
    font-size: 0.875rem;
    font-weight: 500;
    transition: all 0.2s;
 }
 .btn-primary {
    background-color: var(--primary-blue);
    color: var(--text-white);
 }
 .btn-primary:hover {
    background-color: var(--primary-blue-dark);
 }
 .btn-secondary {
    background-color: var(--bg-light);
    color: var(--text-secondary);
    border: 1px solid var(--border-medium);
 }
 .btn-secondary:hover {
    background-color: var(--border-light);
 }
 /* Settings button styling */
 .settings-line-btn {
    width: 32px;
    height: 32px;
    border-radius: 50%;
    border: none;
    background-color: var(--bg-light);
    color: var(--text-secondary);
    cursor: pointer;
    font-size: 14px;
    margin: 0 2px;
    transition: all 0.2s;
    display: inline-flex;
    align-items: center;
    justify-content: center;
    vertical-align: middle;
 }
 .settings-line-btn:hover {
    background-color: var(--primary-blue);
    color: var(--text-white);
    transform: scale(1.05);
 }
 .settings-line-btn:disabled {
    opacity: 0.5;
    cursor: not-allowed;
    transform: none;
 }
--- a/frontend/index.html
+++ b/frontend/index.html
@@ -38,12 +38,17 @@
                <div class="dialog-controls form-row">
                    <button id="add-speech-line-btn">Add Speech Line</button>
                    <button id="add-silence-line-btn">Add Silence Line</button>
                    <button id="generate-dialog-btn">Generate Dialog</button>
                </div>
                <div class="dialog-controls form-row">
                    <label for="output-base-name">Output Base Name:</label>
                    <input type="text" id="output-base-name" name="output-base-name" value="dialog_output" required>
                </div>
-                <button id="generate-dialog-btn">Generate Dialog</button>
+                <div class="dialog-controls form-row">
                    <button id="save-script-btn">Save Script</button>
                    <input type="file" id="load-script-input" accept=".jsonl" style="display: none;">
                    <button id="load-script-btn">Load Script</button>
                </div>
            </section>
        </div>
        <!-- Results below -->
@@ -96,6 +101,40 @@
        </div>
    </footer>
    <!-- TTS Settings Modal -->
    <div id="tts-settings-modal" class="modal" style="display: none;">
        <div class="modal-content">
            <div class="modal-header">
                <h3>TTS Settings</h3>
                <button class="modal-close" id="tts-modal-close">&times;</button>
            </div>
            <div class="modal-body">
                <div class="settings-group">
                    <label for="tts-exaggeration">Exaggeration:</label>
                    <input type="range" id="tts-exaggeration" min="0" max="2" step="0.1" value="0.5">
                    <span id="tts-exaggeration-value">0.5</span>
                    <small>Controls expressiveness. Higher values = more exaggerated speech.</small>
                </div>
                <div class="settings-group">
                    <label for="tts-cfg-weight">CFG Weight:</label>
                    <input type="range" id="tts-cfg-weight" min="0" max="2" step="0.1" value="0.5">
                    <span id="tts-cfg-weight-value">0.5</span>
                    <small>Alignment with prompt. Higher values = more aligned with speaker characteristics.</small>
                </div>
                <div class="settings-group">
                    <label for="tts-temperature">Temperature:</label>
                    <input type="range" id="tts-temperature" min="0" max="2" step="0.1" value="0.8">
                    <span id="tts-temperature-value">0.8</span>
                    <small>Randomness. Lower values = more deterministic, higher = more varied.</small>
                </div>
            </div>
            <div class="modal-footer">
                <button id="tts-settings-save" class="btn-primary">Save Settings</button>
                <button id="tts-settings-cancel" class="btn-secondary">Cancel</button>
            </div>
        </div>
    </div>
    <script src="js/api.js" type="module"></script>
    <script src="js/app.js" type="module" defer></script>
 </body>
--- a/frontend/js/api.js
+++ b/frontend/js/api.js
@@ -100,6 +100,27 @@ export async function deleteSpeaker(speakerId) {
 // ... (keep API_BASE_URL, getSpeakers, addSpeaker, deleteSpeaker)
 /**
 * Generates audio for a single dialog line (speech or silence).
 * @param {Object} line - The dialog line object (type: 'speech' or 'silence').
 * @returns {Promise<Object>} Resolves with { audio_url } on success.
 * @throws {Error} If the network response is not ok.
 */
 export async function generateLine(line) {
    const response = await fetch(`${API_BASE_URL}/dialog/generate_line/`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify(line),
    });
    if (!response.ok) {
        const errorData = await response.json().catch(() => ({ message: response.statusText }));
        throw new Error(`Failed to generate line audio: ${errorData.detail || errorData.message || response.statusText}`);
    }
    return response.json();
 }
 /**
 * Generates a dialog by sending a payload to the backend.
 * @param {Object} dialogPayload - The payload for dialog generation.
--- a/frontend/js/app.js
+++ b/frontend/js/app.js
@@ -7,10 +7,10 @@ const API_BASE_URL = 'http://localhost:8000'; // Assuming backend runs here
 // then this should be http://localhost:8000. The backend will return paths like /generated_audio/...
 const API_BASE_URL_FOR_FILES = 'http://localhost:8000'; 
-document.addEventListener('DOMContentLoaded', () => {
+document.addEventListener('DOMContentLoaded', async () => {
    console.log('DOM fully loaded and parsed');
    initializeSpeakerManagement();
-    initializeDialogEditor(); // Placeholder for now
+    await initializeDialogEditor(); // Now properly awaiting the async function
    initializeResultsDisplay(); // Placeholder for now
 });
@@ -109,12 +109,34 @@ async function handleDeleteSpeaker(speakerId) {
 let dialogItems = []; // Holds the sequence of speech/silence items
 let availableSpeakersCache = []; // To populate speaker dropdown
-function initializeDialogEditor() {
+// Utility: ensure each dialog item has audioUrl, isGenerating, error, and (for speech items) default TTS settings
 function normalizeDialogItem(item) {
    const normalized = {
        ...item,
        audioUrl: item.audioUrl || null,
        isGenerating: item.isGenerating || false,
        error: item.error || null
    };
    // Add TTS settings for speech items with defaults
    if (item.type === 'speech') {
        normalized.exaggeration = item.exaggeration ?? 0.5;
        normalized.cfg_weight = item.cfg_weight ?? 0.5;
        normalized.temperature = item.temperature ?? 0.8;
    }
    return normalized;
 }
 async function initializeDialogEditor() {
    const dialogItemsContainer = document.getElementById('dialog-items-container');
    const addSpeechLineBtn = document.getElementById('add-speech-line-btn');
    const addSilenceLineBtn = document.getElementById('add-silence-line-btn');
    const outputBaseNameInput = document.getElementById('output-base-name');
    const generateDialogBtn = document.getElementById('generate-dialog-btn');
    const saveScriptBtn = document.getElementById('save-script-btn');
    const loadScriptBtn = document.getElementById('load-script-btn');
    const loadScriptInput = document.getElementById('load-script-input');
    // Results Display Elements
    const generationLogPre = document.getElementById('generation-log-content'); // Corrected ID
@@ -127,23 +149,49 @@ function initializeDialogEditor() {
    let dialogItems = [];
    let availableSpeakersCache = []; // Cache for speaker names and IDs
    // Load speakers at startup
    try {
        availableSpeakersCache = await getSpeakers();
        console.log(`Loaded ${availableSpeakersCache.length} speakers for dialog editor`);
    } catch (error) {
        console.error('Error loading speakers at startup:', error);
        // Continue without speakers - they'll be loaded when needed
    }
    // Function to render the current dialogItems array to the DOM as table rows
    function renderDialogItems() {
        if (!dialogItemsContainer) return;
        dialogItemsContainer.innerHTML = '';
        dialogItems = dialogItems.map(normalizeDialogItem); // Ensure all fields present
        dialogItems.forEach((item, index) => {
            const tr = document.createElement('tr');
            // Type column
            const typeTd = document.createElement('td');
-            typeTd.textContent = item.type === 'speech' ? 'Speech' : 'Silence';
+            typeTd.classList.add('type-icon-cell');
            if (item.type === 'speech') {
                typeTd.innerHTML = '<span class="dialog-type-icon" title="Speech" aria-label="Speech">🗣️</span>';
            } else {
                typeTd.innerHTML = '<span class="dialog-type-icon" title="Silence" aria-label="Silence">🤫</span>';
            }
            tr.appendChild(typeTd);
            // Speaker column
            const speakerTd = document.createElement('td');
            if (item.type === 'speech') {
-                const speaker = availableSpeakersCache.find(s => s.id === item.speaker_id);
+                const speakerSelect = document.createElement('select');
-                speakerTd.textContent = speaker ? speaker.name : 'Unknown Speaker';
+                speakerSelect.className = 'dialog-speaker-select';
                availableSpeakersCache.forEach(speaker => {
                    const option = document.createElement('option');
                    option.value = speaker.id;
                    option.textContent = speaker.name;
                    if (speaker.id === item.speaker_id) option.selected = true;
                    speakerSelect.appendChild(option);
                });
                speakerSelect.onchange = (e) => {
                    dialogItems[index].speaker_id = e.target.value;
                };
                speakerTd.appendChild(speakerSelect);
            } else {
                speakerTd.textContent = '—';
            }
@@ -151,17 +199,94 @@ function initializeDialogEditor() {
            // Text/Duration column
            const textTd = document.createElement('td');
            textTd.className = 'dialog-editable-cell';
            if (item.type === 'speech') {
                let txt = item.text.length > 60 ? item.text.substring(0, 57) + '…' : item.text;
                textTd.textContent = `"${txt}"`;
                textTd.title = item.text;
            } else {
                textTd.textContent = `${item.duration}s`;
            }
            // Double-click to edit
            textTd.ondblclick = () => {
                textTd.innerHTML = '';
                let input;
                if (item.type === 'speech') {
                    // Use textarea for speech text to enable multi-line editing
                    input = document.createElement('textarea');
                    input.className = 'dialog-edit-textarea';
                    input.value = item.text;
                    input.rows = Math.max(2, Math.ceil(item.text.length / 50)); // Auto-size based on content
                } else {
                    // Use number input for duration
                    input = document.createElement('input');
                    input.className = 'dialog-edit-input';
                    input.type = 'number';
                    input.value = item.duration;
                    input.min = 0.1;
                    input.step = 0.1;
                }
                input.onblur = saveEdit;
                input.onkeydown = (e) => {
                    if (e.key === 'Enter' && item.type === 'silence') {
                        // Only auto-save on Enter for duration inputs
                        input.blur();
                    } else if (e.key === 'Escape') {
                        renderDialogItems();
                    } else if (e.key === 'Enter' && e.ctrlKey && item.type === 'speech') {
                        // Ctrl+Enter to save for speech text
                        input.blur();
                    }
                };
                textTd.appendChild(input);
                input.focus();
                function saveEdit() {
                    if (item.type === 'speech') {
                        dialogItems[index].text = input.value;
                        dialogItems[index].audioUrl = null; // Invalidate audio if text edited
                    } else {
                        let val = parseFloat(input.value);
                        if (!isNaN(val) && val > 0) dialogItems[index].duration = val;
                        dialogItems[index].audioUrl = null;
                    }
                    renderDialogItems();
                }
            };
            tr.appendChild(textTd);
            // Actions column
            const actionsTd = document.createElement('td');
            actionsTd.classList.add('actions');
            // Up button
            const upBtn = document.createElement('button');
            upBtn.innerHTML = '↑';
            upBtn.title = 'Move up';
            upBtn.className = 'move-up-btn';
            upBtn.disabled = index === 0;
            upBtn.onclick = () => {
                if (index > 0) {
                    [dialogItems[index - 1], dialogItems[index]] = [dialogItems[index], dialogItems[index - 1]];
                    renderDialogItems();
                }
            };
            actionsTd.appendChild(upBtn);
            // Down button
            const downBtn = document.createElement('button');
            downBtn.innerHTML = '↓';
            downBtn.title = 'Move down';
            downBtn.className = 'move-down-btn';
            downBtn.disabled = index === dialogItems.length - 1;
            downBtn.onclick = () => {
                if (index < dialogItems.length - 1) {
                    [dialogItems[index], dialogItems[index + 1]] = [dialogItems[index + 1], dialogItems[index]];
                    renderDialogItems();
                }
            };
            actionsTd.appendChild(downBtn);
            // Remove button
            const removeBtn = document.createElement('button');
            removeBtn.innerHTML = '&times;'; // Unicode multiplication sign (X)
            removeBtn.classList.add('remove-dialog-item-btn', 'x-remove-btn');
@@ -172,8 +297,71 @@ function initializeDialogEditor() {
                renderDialogItems();
            };
            actionsTd.appendChild(removeBtn);
            tr.appendChild(actionsTd);
            // --- NEW: Per-line Generate button ---
            const generateBtn = document.createElement('button');
            generateBtn.innerHTML = '⚡';
            generateBtn.title = 'Generate audio for this line';
            generateBtn.className = 'generate-line-btn';
            generateBtn.disabled = item.isGenerating;
            if (item.isGenerating) generateBtn.classList.add('loading');
            generateBtn.onclick = async () => {
                dialogItems[index].isGenerating = true;
                dialogItems[index].error = null;
                renderDialogItems();
                try {
                    const { generateLine } = await import('./api.js');
                    const payload = { ...item };
                    // Remove fields not needed by backend
                    delete payload.audioUrl; delete payload.isGenerating; delete payload.error;
                    const result = await generateLine(payload);
                    dialogItems[index].audioUrl = result.audio_url;
                } catch (err) {
                    dialogItems[index].error = err.message || 'Failed to generate audio.';
                    alert(dialogItems[index].error);
                } finally {
                    dialogItems[index].isGenerating = false;
                    renderDialogItems();
                }
            };
            actionsTd.appendChild(generateBtn);
            // --- NEW: Per-line Play button ---
            const playBtn = document.createElement('button');
            playBtn.innerHTML = '⏵';
            playBtn.title = item.audioUrl ? 'Play generated audio' : 'No audio generated yet';
            playBtn.className = 'play-line-btn';
            playBtn.disabled = !item.audioUrl;
            playBtn.onclick = () => {
                if (!item.audioUrl) return;
                let audioUrl = item.audioUrl.startsWith('http') ? item.audioUrl : `${API_BASE_URL_FOR_FILES}${item.audioUrl}`;
                // Use a shared audio element or create one per play
                let audio = new window.Audio(audioUrl);
                audio.play();
            };
            actionsTd.appendChild(playBtn);
            // --- NEW: Settings button for speech items ---
            if (item.type === 'speech') {
                const settingsBtn = document.createElement('button');
                settingsBtn.innerHTML = '⚙️';
                settingsBtn.title = 'TTS Settings';
                settingsBtn.className = 'settings-line-btn';
                settingsBtn.onclick = () => {
                    showTTSSettingsModal(item, index);
                };
                actionsTd.appendChild(settingsBtn);
            }
            // Show error if present
            if (item.error) {
                const errorSpan = document.createElement('span');
                errorSpan.className = 'line-error-msg';
                errorSpan.textContent = item.error;
                actionsTd.appendChild(errorSpan);
            }
            tr.appendChild(actionsTd);
            dialogItemsContainer.appendChild(tr);
        });
    }
@@ -231,7 +419,7 @@ function initializeDialogEditor() {
                    alert('Please select a speaker and enter text.');
                    return;
                }
-                dialogItems.push({ type: 'speech', speaker_id: speakerId, text: text });
+                dialogItems.push(normalizeDialogItem({ type: 'speech', speaker_id: speakerId, text: text }));
                renderDialogItems();
                clearTempInputArea();
            };
@@ -273,7 +461,7 @@ function initializeDialogEditor() {
                    alert('Invalid duration. Please enter a positive number.');
                    return;
                }
-                dialogItems.push({ type: 'silence', duration: duration });
+                dialogItems.push(normalizeDialogItem({ type: 'silence', duration: duration }));
                renderDialogItems();
                clearTempInputArea();
            };
@@ -301,40 +489,25 @@ function initializeDialogEditor() {
            }
            if (dialogItems.length === 0) {
                alert('Please add at least one speech or silence line to the dialog.');
-                return;
+                return; // Prevent further execution if no dialog items
            }
-            // Clear previous results and show loading/status
+            // Smart dialog-wide generation: use pre-generated audio where present
-            if (generationLogPre) generationLogPre.textContent = 'Generating dialog...';
+            const dialogItemsToGenerate = dialogItems.map(item => {
-            if (audioPlayer) {
+                // Reuse pre-generated audio where available; otherwise strip frontend-only fields
-                audioPlayer.style.display = 'none';
+                if (item.audioUrl) {
-                audioPlayer.src = ''; // Clear previous audio source
+                    return { ...item, use_existing_audio: true, audio_url: item.audioUrl };
-            }
+                } else {
-            if (downloadZipLink) {
+                    // Remove frontend-only fields
-                downloadZipLink.style.display = 'none';
+                    const payload = { ...item };
-                downloadZipLink.href = '#';
+                    delete payload.audioUrl; delete payload.isGenerating; delete payload.error;
-                downloadZipLink.textContent = '';
+                    return payload;
-            }
+                }
-            if (zipArchivePlaceholder) zipArchivePlaceholder.style.display = 'block'; // Show placeholder
+            });
            if (resultsDisplaySection) resultsDisplaySection.style.display = 'block'; // Make sure it's visible
            const payload = {
                output_base_name: outputBaseName,
-                dialog_items: dialogItems.map(item => {
+                dialog_items: dialogItemsToGenerate
-                    // For now, we are not collecting TTS params in the UI for speech items.
-                    // The backend will use defaults. If we add UI for these later, they'd be included here.
-                    if (item.type === 'speech') {
-                        return {
-                            type: item.type,
-                            speaker_id: item.speaker_id,
-                            text: item.text,
-                            // exaggeration: item.exaggeration, // Example for future UI enhancement
-                            // cfg_weight: item.cfg_weight,
-                            // temperature: item.temperature
-                        };
-                    }
-                    return item; // for silence items
-                })
            };
            try {
@@ -345,7 +518,10 @@ function initializeDialogEditor() {
                if (generationLogPre) generationLogPre.textContent = result.log || 'No log output.';
                if (result.concatenated_audio_url && audioPlayer) { // Check audioPlayer, not audioSource
-                    audioPlayer.src = result.concatenated_audio_url.startsWith('http') ? result.concatenated_audio_url : `${API_BASE_URL_FOR_FILES}${result.concatenated_audio_url}`;
+                    // Cache-busting: append timestamp to force reload
                    let audioUrl = result.concatenated_audio_url.startsWith('http') ? result.concatenated_audio_url : `${API_BASE_URL_FOR_FILES}${result.concatenated_audio_url}`;
                    audioUrl += (audioUrl.includes('?') ? '&' : '?') + 't=' + Date.now();
                    audioPlayer.src = audioUrl;
                    audioPlayer.load(); // Call load() after setting new source
                    audioPlayer.style.display = 'block';
                } else {
@@ -372,8 +548,274 @@ function initializeDialogEditor() {
        });
    }
    // --- Save/Load Script Functionality ---
    function saveDialogScript() {
        if (dialogItems.length === 0) {
            alert('No dialog items to save. Please add some speech or silence lines first.');
            return;
        }
        // Filter out UI-specific fields and create clean data for export
        const exportData = dialogItems.map(item => {
            const cleanItem = {
                type: item.type
            };
            if (item.type === 'speech') {
                cleanItem.speaker_id = item.speaker_id;
                cleanItem.text = item.text;
                // Include TTS parameters if they exist (will use defaults if not present)
                if (item.exaggeration !== undefined) cleanItem.exaggeration = item.exaggeration;
                if (item.cfg_weight !== undefined) cleanItem.cfg_weight = item.cfg_weight;
                if (item.temperature !== undefined) cleanItem.temperature = item.temperature;
            } else if (item.type === 'silence') {
                cleanItem.duration = item.duration;
            }
            return cleanItem;
        });
        // Convert to JSONL format (one JSON object per line)
        const jsonlContent = exportData.map(item => JSON.stringify(item)).join('\n');
        // Create and download file
        const blob = new Blob([jsonlContent], { type: 'application/jsonl' });
        const url = URL.createObjectURL(blob);
        const link = document.createElement('a');
        // Generate filename with timestamp
        const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
        const filename = `dialog_script_${timestamp}.jsonl`;
        link.href = url;
        link.download = filename;
        link.style.display = 'none';
        document.body.appendChild(link);
        link.click();
        document.body.removeChild(link);
        URL.revokeObjectURL(url);
        console.log(`Dialog script saved as ${filename}`);
    }
    function loadDialogScript(file) {
        if (!file) {
            alert('Please select a file to load.');
            return;
        }
        const reader = new FileReader();
        reader.onload = async function(e) {
            try {
                const content = e.target.result;
                const lines = content.trim().split('\n');
                const loadedItems = [];
                for (let i = 0; i < lines.length; i++) {
                    const line = lines[i].trim();
                    if (!line) continue; // Skip empty lines
                    try {
                        const item = JSON.parse(line);
                        const validatedItem = validateDialogItem(item, i + 1);
                        if (validatedItem) {
                            loadedItems.push(normalizeDialogItem(validatedItem));
                        }
                    } catch (parseError) {
                        console.error(`Error parsing line ${i + 1}:`, parseError);
                        alert(`Error parsing line ${i + 1}: ${parseError.message}`);
                        return;
                    }
                }
                if (loadedItems.length === 0) {
                    alert('No valid dialog items found in the file.');
                    return;
                }
                // Confirm replacement if existing items
                if (dialogItems.length > 0) {
                    const confirmed = confirm(
                        `This will replace your current dialog (${dialogItems.length} items) with the loaded script (${loadedItems.length} items). Continue?`
                    );
                    if (!confirmed) return;
                }
                // Ensure speakers are loaded before rendering
                if (availableSpeakersCache.length === 0) {
                    try {
                        availableSpeakersCache = await getSpeakers();
                    } catch (error) {
                        console.error('Error fetching speakers:', error);
                        alert('Could not load speakers. Dialog loaded but speaker names may not display correctly.');
                    }
                }
                // Replace current dialog
                dialogItems.splice(0, dialogItems.length, ...loadedItems);
                renderDialogItems();
                console.log(`Loaded ${loadedItems.length} dialog items from script`);
                alert(`Successfully loaded ${loadedItems.length} dialog items.`);
            } catch (error) {
                console.error('Error loading dialog script:', error);
                alert(`Error loading dialog script: ${error.message}`);
            }
        };
        reader.onerror = function() {
            alert('Error reading file. Please try again.');
        };
        reader.readAsText(file);
    }
    function validateDialogItem(item, lineNumber) {
        if (!item || typeof item !== 'object') {
            throw new Error(`Line ${lineNumber}: Invalid item format`);
        }
        if (!item.type || !['speech', 'silence'].includes(item.type)) {
            throw new Error(`Line ${lineNumber}: Invalid or missing type. Must be 'speech' or 'silence'`);
        }
        if (item.type === 'speech') {
            if (!item.speaker_id || typeof item.speaker_id !== 'string') {
                throw new Error(`Line ${lineNumber}: Speech items must have a valid speaker_id`);
            }
            if (!item.text || typeof item.text !== 'string') {
                throw new Error(`Line ${lineNumber}: Speech items must have text`);
            }
            // Check if speaker exists in available speakers
            const speakerExists = availableSpeakersCache.some(speaker => speaker.id === item.speaker_id);
            if (availableSpeakersCache.length > 0 && !speakerExists) {
                console.warn(`Line ${lineNumber}: Speaker '${item.speaker_id}' not found in available speakers`);
                // Don't throw error, just warn - speaker might be added later
            }
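            const speechItem = {
                type: 'speech',
                speaker_id: item.speaker_id,
                text: item.text
            };
            // Carry per-line TTS settings through so they survive a save/load round trip
            ['exaggeration', 'cfg_weight', 'temperature'].forEach(key => {
                if (typeof item[key] === 'number') speechItem[key] = item[key];
            });
            return speechItem;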
        } else if (item.type === 'silence') {
            if (typeof item.duration !== 'number' || item.duration <= 0) {
                throw new Error(`Line ${lineNumber}: Silence items must have a positive duration number`);
            }
            return {
                type: 'silence',
                duration: item.duration
            };
        }
    }
    // Event handlers for save/load
    if (saveScriptBtn) {
        saveScriptBtn.addEventListener('click', saveDialogScript);
    }
    if (loadScriptBtn && loadScriptInput) {
        loadScriptBtn.addEventListener('click', () => {
            loadScriptInput.click();
        });
        loadScriptInput.addEventListener('change', (e) => {
            const file = e.target.files[0];
            if (file) {
                loadDialogScript(file);
                // Reset input so same file can be loaded again
                e.target.value = '';
            }
        });
    }
    console.log('Dialog Editor Initialized');
    renderDialogItems(); // Initial render (empty)
    // Add a function to show TTS settings modal
    function showTTSSettingsModal(item, index) {
        const modal = document.getElementById('tts-settings-modal');
        const exaggerationSlider = document.getElementById('tts-exaggeration');
        const exaggerationValue = document.getElementById('tts-exaggeration-value');
        const cfgWeightSlider = document.getElementById('tts-cfg-weight');
        const cfgWeightValue = document.getElementById('tts-cfg-weight-value');
        const temperatureSlider = document.getElementById('tts-temperature');
        const temperatureValue = document.getElementById('tts-temperature-value');
        const saveBtn = document.getElementById('tts-settings-save');
        const cancelBtn = document.getElementById('tts-settings-cancel');
        const closeBtn = document.getElementById('tts-modal-close');
        // Set current values (?? keeps an explicit 0 instead of falling back to the default)
        exaggerationSlider.value = item.exaggeration ?? 0.5;
        exaggerationValue.textContent = exaggerationSlider.value;
        cfgWeightSlider.value = item.cfg_weight ?? 0.5;
        cfgWeightValue.textContent = cfgWeightSlider.value;
        temperatureSlider.value = item.temperature ?? 0.8;
        temperatureValue.textContent = temperatureSlider.value;
        // Update value displays when sliders change
        const updateValueDisplay = (slider, display) => {
            display.textContent = slider.value;
        };
        exaggerationSlider.oninput = () => updateValueDisplay(exaggerationSlider, exaggerationValue);
        cfgWeightSlider.oninput = () => updateValueDisplay(cfgWeightSlider, cfgWeightValue);
        temperatureSlider.oninput = () => updateValueDisplay(temperatureSlider, temperatureValue);
        // Show modal
        modal.style.display = 'flex';
        // Save settings
        const saveSettings = () => {
            dialogItems[index].exaggeration = parseFloat(exaggerationSlider.value);
            dialogItems[index].cfg_weight = parseFloat(cfgWeightSlider.value);
            dialogItems[index].temperature = parseFloat(temperatureSlider.value);
            // Clear any existing audio since settings changed
            dialogItems[index].audioUrl = null;
            closeModal();
            renderDialogItems(); // Re-render to reflect changes
            console.log('TTS settings updated for item:', dialogItems[index]);
        };
        // Close modal
        const closeModal = () => {
            modal.style.display = 'none';
            // Clean up event listeners
            exaggerationSlider.oninput = null;
            cfgWeightSlider.oninput = null;
            temperatureSlider.oninput = null;
            saveBtn.onclick = null;
            cancelBtn.onclick = null;
            closeBtn.onclick = null;
            modal.onclick = null;
        };
        // Event listeners
        saveBtn.onclick = saveSettings;
        cancelBtn.onclick = closeModal;
        closeBtn.onclick = closeModal;
        // Close modal when clicking outside
        modal.onclick = (e) => {
            if (e.target === modal) {
                closeModal();
            }
        };
        // Close modal on Escape key
        const handleEscape = (e) => {
            if (e.key === 'Escape') {
                closeModal();
                document.removeEventListener('keydown', handleEscape);
            }
        };
        document.addEventListener('keydown', handleEscape);
    }
 }
 // --- Results Display --- //
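The save/load hunk above writes one JSON object per line (JSONL) and reads it back line by line with validation. The following is a minimal, hypothetical sketch of that round trip as pure functions, assuming the field names used in the diff and the slider defaults from the TTS settings modal (`exaggeration` 0.5, `cfg_weight` 0.5, `temperature` 0.8). `serializeDialog`, `parseDialog`, and `withTTSDefaults` are illustrative names, not functions in this commit. The sketch keeps numeric TTS settings on load so they survive the round trip, and fills gaps with `??` so an explicit 0 is not replaced by a default.

```javascript
// Hypothetical sketch of the dialog-script JSONL round trip (not the committed code).
const TTS_KEYS = ['exaggeration', 'cfg_weight', 'temperature'];
// Defaults assumed from the TTS settings modal in the diff.
const TTS_DEFAULTS = { exaggeration: 0.5, cfg_weight: 0.5, temperature: 0.8 };

// Keep only exportable fields; UI-only state such as audioUrl is dropped.
function serializeDialog(items) {
    return items
        .map(item => {
            if (item.type === 'silence') {
                return { type: 'silence', duration: item.duration };
            }
            const out = { type: 'speech', speaker_id: item.speaker_id, text: item.text };
            for (const key of TTS_KEYS) {
                if (item[key] !== undefined) out[key] = item[key];
            }
            return out;
        })
        .map(obj => JSON.stringify(obj))
        .join('\n');
}

// Parse JSONL, validating each non-empty line and keeping numeric TTS settings.
function parseDialog(jsonl) {
    const items = [];
    jsonl.split('\n').forEach((raw, i) => {
        const line = raw.trim();
        if (!line) return; // skip blank lines
        const item = JSON.parse(line);
        if (!item || typeof item !== 'object') {
            throw new Error(`Line ${i + 1}: invalid item format`);
        }
        if (item.type === 'silence') {
            if (typeof item.duration !== 'number' || item.duration <= 0) {
                throw new Error(`Line ${i + 1}: silence needs a positive duration`);
            }
            items.push({ type: 'silence', duration: item.duration });
        } else if (item.type === 'speech') {
            if (typeof item.speaker_id !== 'string' || !item.speaker_id ||
                typeof item.text !== 'string' || !item.text) {
                throw new Error(`Line ${i + 1}: speech needs speaker_id and text`);
            }
            const speech = { type: 'speech', speaker_id: item.speaker_id, text: item.text };
            for (const key of TTS_KEYS) {
                if (typeof item[key] === 'number') speech[key] = item[key];
            }
            items.push(speech);
        } else {
            throw new Error(`Line ${i + 1}: type must be 'speech' or 'silence'`);
        }
    });
    return items;
}

// Fill missing settings with ?? so an explicit 0 is kept (|| would discard it).
function withTTSDefaults(item) {
    const resolved = { ...item };
    for (const key of TTS_KEYS) {
        resolved[key] = item[key] ?? TTS_DEFAULTS[key];
    }
    return resolved;
}

// Usage: audioUrl is dropped on save; exaggeration 0 survives the round trip.
const saved = serializeDialog([
    { type: 'speech', speaker_id: 'a', text: 'Hi', exaggeration: 0, audioUrl: 'blob:x' },
    { type: 'silence', duration: 1.5 },
]);
// exaggeration stays 0; cfg_weight and temperature get the modal defaults
console.log(withTTSDefaults(parseDialog(saved)[0]));
```

Keeping serialization and parsing free of DOM access makes them easy to test in isolation; the `FileReader`, `Blob`, and download-link handling in the diff can stay as a thin wrapper around them.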
--- a/speaker_data/speakers.yaml
+++ b/speaker_data/speakers.yaml
@@ -10,3 +10,12 @@
 fb84ce1c-f32d-4df9-9673-2c64e9603133:
  name: Debbie
  sample_path: speaker_samples/fb84ce1c-f32d-4df9-9673-2c64e9603133.wav
 90fcd672-ba84-441a-ac6c-0449a59653bd:
  name: dummy_speaker
  sample_path: speaker_samples/90fcd672-ba84-441a-ac6c-0449a59653bd.wav
 a6387c23-4ca4-42b5-8aaf-5699dbabbdf0:
  name: Mike
  sample_path: speaker_samples/a6387c23-4ca4-42b5-8aaf-5699dbabbdf0.wav
 6cf4d171-667d-4bc8-adbb-6d9b7c620cb8:
  name: Minnie
  sample_path: speaker_samples/6cf4d171-667d-4bc8-adbb-6d9b7c620cb8.wav
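The `speakers.yaml` hunk stores speakers as a mapping keyed by UUID, with each entry carrying `name` and a relative `sample_path`. Assuming the backend exposes each entry as a speaker object whose `id` is the mapping key, a minimal sketch of that conversion and of the name lookup the dialog editor needs might look like this. `speakersFromMapping` and `speakerName` are hypothetical helpers, and the input is assumed to be already parsed from YAML by whatever library the backend uses; the real service may resolve `sample_path` differently, while this sketch copies it unchanged.

```javascript
// Hypothetical sketch: convert the id-keyed speakers mapping (already parsed
// from YAML into a plain object) into a list of { id, name, sample_path }.
function speakersFromMapping(mapping) {
    return Object.entries(mapping).map(([id, entry]) => ({
        id,
        name: entry.name,
        sample_path: entry.sample_path,
    }));
}

// Look up a display name, falling back to the raw id when the speaker is
// unknown (the dialog editor only warns about unknown speaker_ids).
function speakerName(speakers, speakerId) {
    const match = speakers.find(s => s.id === speakerId);
    return match ? match.name : speakerId;
}

// Two entries copied from the YAML above.
const parsed = {
    'a6387c23-4ca4-42b5-8aaf-5699dbabbdf0': {
        name: 'Mike',
        sample_path: 'speaker_samples/a6387c23-4ca4-42b5-8aaf-5699dbabbdf0.wav',
    },
    '6cf4d171-667d-4bc8-adbb-6d9b7c620cb8': {
        name: 'Minnie',
        sample_path: 'speaker_samples/6cf4d171-667d-4bc8-adbb-6d9b7c620cb8.wav',
    },
};
const speakers = speakersFromMapping(parsed);
console.log(speakerName(speakers, '6cf4d171-667d-4bc8-adbb-6d9b7c620cb8')); // Minnie
```

Keying the file by UUID keeps speaker IDs stable when a speaker is renamed, which matters because saved dialog scripts reference speakers only by `speaker_id`.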
Author	SHA1	Message	Date
Steve White	c91a9598b1	Add API reference	2025-06-06 23:18:47 -05:00
Steve White	b37aa56fa6	Added persistence of TTS settings in dialog save/restore.	2025-06-06 11:58:48 -05:00
Steve White	f9e952286d	Added settings to allow control of exaggeration, cfg_weight, and temperature on each line.	2025-06-06 11:53:43 -05:00
Steve White	26f1d98b46	Fixed Play button to match other icons	2025-06-06 11:35:39 -05:00
Steve White	e11a4a091c	variablized colors in the .css, tweaked them. Re-arranged buttons.	2025-06-06 11:33:54 -05:00
Steve White	252f885b5a	Updated buttons, save/load	2025-06-06 10:36:06 -05:00
Steve White	d8eb2492d7	Add dialog script save/load functionality and CLAUDE.md - Implement save/load buttons in dialog editor interface - Add JSONL export/import for dialog scripts with validation - Include timestamp-based filenames for saved scripts - Add comprehensive error handling and user confirmations - Create CLAUDE.md with development guidance and architecture overview 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-06-06 10:05:58 -05:00
Steve White	d3ff6e5241	Now uses pre-generated files in concatenated file.	2025-06-06 09:49:04 -05:00
Steve White	4a7c1ea6a1	Added per-line generation and playback; currently regenerates when you hit 'Generate Audio'	2025-06-06 08:44:21 -05:00
Steve White	0261b86ad2	added single line generation to the backend	2025-06-06 08:26:15 -05:00
Steve White	6ccdd18463	Made rows re-orderable	2025-06-06 00:10:36 -05:00
Steve White	1575bf4292	Made speakers drop-down selectable.	2025-06-06 00:05:05 -05:00
Steve White	f2f907452b	Miscellaneous visual changes	2025-06-06 00:02:00 -05:00