Compare commits

...

6 Commits

36 changed files with 8425 additions and 0 deletions

.gitignore vendored

@@ -5,3 +5,9 @@ output*.wav
*.mp3
dialog_output/
*.zip
.DS_Store
__pycache__
projects/
# Node.js dependencies
node_modules/

.note/code_structure.md Normal file

@@ -0,0 +1,32 @@
# Code Structure
*(This document will describe the organization of the codebase as it evolves.)*
## Current (Gradio-based - to be migrated)
- `gradio_app.py`: Main application logic for the Gradio UI.
- `requirements.txt`: Python dependencies.
- `speaker_samples/`: Directory for speaker audio samples.
- `speakers.yaml`: Configuration for speakers.
- `single_output/`: Output directory for single utterance TTS.
- `dialog_output/`: Output directory for dialog TTS.
## Planned (FastAPI + Vanilla JS)
### Backend (FastAPI - Python)
- `main.py`: FastAPI application entry point, router setup.
- `api/`: Directory for API endpoint modules (e.g., `tts_routes.py`, `speaker_routes.py`).
- `core/`: Core logic (e.g., TTS processing, dialog assembly, file management).
- `models/`: Pydantic models for request/response validation.
- `services/`: Business logic services (e.g., `TTSService`, `DialogService`).
- `static/` (or served via CDN): For frontend files if not using a separate frontend server during development.
### Frontend (Vanilla JavaScript)
- `index.html`: Main HTML file.
- `css/`: Stylesheets.
- `style.css`
- `js/`: JavaScript files.
- `app.js`: Main application logic.
- `api.js`: Functions for interacting with the FastAPI backend.
- `uiComponents.js`: Reusable UI components (e.g., DialogLine, AudioPlayer).
- `state.js`: Frontend state management (if needed).
- `assets/`: Static assets like images or icons.

.note/current_focus.md Normal file

@@ -0,0 +1,23 @@
# Chatterbox TTS Migration: Backend Development (FastAPI)
**Primary Goal:** Implement the FastAPI backend for TTS dialog generation.
**Recent Accomplishments (Phase 1, Step 2 - Speaker Management):**
- Created Pydantic models for speaker data (`speaker_models.py`).
- Implemented `SpeakerManagementService` (`speaker_service.py`) for CRUD operations on speakers (metadata in `speakers.yaml`, samples in `speaker_samples/`).
- Created FastAPI router (`routers/speakers.py`) with endpoints: `GET /api/speakers`, `POST /api/speakers`, `GET /api/speakers/{id}`, `DELETE /api/speakers/{id}`.
- Integrated speaker router into the main FastAPI app (`main.py`).
- Successfully tested all speaker API endpoints using `curl`.
**Current Task (Phase 1, Step 3 - TTS Core):**
- **Develop `TTSService` in `backend/app/services/tts_service.py`.**
- Focus on `ChatterboxTTS` model loading, inference, and critical memory management.
- Define methods for speech generation using speaker samples.
- Manage TTS parameters (exaggeration, cfg_weight, temperature).
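
As a rough illustration of the intended shape (the actual `tts_service.py` is not part of this diff), a minimal sketch of `TTSService` follows. It assumes the `chatterbox.tts.ChatterboxTTS` API (`from_pretrained(device=...)`, `generate(text, audio_prompt_path=..., exaggeration=..., cfg_weight=..., temperature=...)`, and `model.sr`) as documented in the Chatterbox README, and mirrors the call signature that `DialogProcessorService` expects:

```python
# Illustrative sketch only -- not the actual implementation.
import gc
from pathlib import Path

import torch
import torchaudio
from chatterbox.tts import ChatterboxTTS  # assumed import path per the Chatterbox README


class TTSService:
    def __init__(self, device: str = "mps"):
        self.device = device
        self.model = None

    def load_model(self):
        """Load the ChatterboxTTS model once; subsequent calls are no-ops."""
        if self.model is None:
            self.model = ChatterboxTTS.from_pretrained(device=self.device)

    def unload_model(self):
        """Critical memory management: drop the model and free cached accelerator memory."""
        self.model = None
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        if hasattr(torch, "mps") and torch.backends.mps.is_available():
            torch.mps.empty_cache()

    async def generate_speech(
        self,
        text: str,
        speaker_id: str,  # kept for logging/metadata parity with DialogProcessorService
        speaker_sample_path: str,
        output_filename_base: str,
        output_dir: Path,
        exaggeration: float = 0.5,
        cfg_weight: float = 0.5,
        temperature: float = 0.8,
    ) -> Path:
        """Generate one WAV segment for `text`, conditioned on the speaker's sample."""
        self.load_model()
        wav = self.model.generate(
            text,
            audio_prompt_path=speaker_sample_path,
            exaggeration=exaggeration,
            cfg_weight=cfg_weight,
            temperature=temperature,
        )
        output_path = Path(output_dir) / f"{output_filename_base}.wav"
        torchaudio.save(str(output_path), wav, self.model.sr)
        return output_path
```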
**Next Immediate Steps:**
1. Finalize and test the initial implementation of `TTSService`.
2. Proceed to Phase 1, Step 4: Dialog Processing - Implement `DialogProcessorService` including text splitting logic.

.note/decision_log.md Normal file

@@ -0,0 +1,22 @@
# Decision Log
This log records key decisions made throughout the project, along with their rationale.
---
**Date:** 2025-06-05
**Decision ID:** 20250605-001
**Decision:** Adopt the `.note/` Memory Bank system for project documentation and context management.
**Rationale:** As per the user's global development standards (MEMORY[user_global]), to ensure persistent knowledge and effective collaboration, especially given potential agent memory resets.
**Impact:** Creation of standard `.note/` files (`project_overview.md`, `current_focus.md`, etc.). All significant project information, decisions, and progress will be logged here.
---
**Date:** 2025-06-05
**Decision ID:** 20250605-002
**Decision:** Created a detailed migration plan for moving from Gradio to FastAPI & Vanilla JS.
**Rationale:** Based on a thorough review of `gradio_app.py` and the user's request, a detailed, phased plan was necessary to guide development. This incorporates key findings about TTS model management, text processing, and output requirements.
**Impact:** The plan is stored in `.note/detailed_migration_plan.md`. `current_focus.md` has been updated to reflect this. Development will follow this plan upon user approval.
**Related Memory:** MEMORY[b82cdf38-f0b9-45cd-8097-5b1b47030a40] (System memory of the plan)
---

.note/detailed_migration_plan.md Normal file

@@ -0,0 +1,98 @@
# Chatterbox TTS: Gradio to FastAPI & Vanilla JS Migration Plan
This plan outlines the steps to re-implement the dialog generation features of the Chatterbox TTS application, moving from the current Gradio-based implementation to a FastAPI backend and a vanilla JavaScript frontend. It incorporates findings from `gradio_app.py` and aligns with the existing high-level strategy (MEMORY[c20c2cce-46d4-453f-9bc3-c18e05dbc66f]).
## 1. Backend (FastAPI) Development
### Objective
Create a robust API to handle TTS generation, speaker management, and file delivery.
### Key Modules/Components
* **API Endpoints:**
* `POST /api/dialog/generate`:
* **Input**: Structured list: `[{type: "speech", speaker_id: "str", text: "str"}, {type: "silence", duration: float}]`, `output_base_name: str`.
* **Output**: JSON with `log: str`, `concatenated_audio_url: str`, `zip_archive_url: str`.
* `GET /api/speakers`: Returns list of available speakers (`[{id: "str", name: "str", sample_path: "str"}]`).
* `POST /api/speakers`: Adds a new speaker. Input: `name: str`, `audio_sample_file: UploadFile`. Output: `{id: "str", name: "str", message: "str"}`.
* `DELETE /api/speakers/{speaker_id}`: Removes a speaker.
* **Core Logic & Services:**
* `TTSService`:
* Manages `ChatterboxTTS` model instance(s) (loading, inference, memory cleanup).
* Handles `ChatterboxTTS.generate()` calls, incorporating parameters like `exaggeration`, `cfg_weight`, `temperature` (decision needed on exposure vs. defaults).
* Implements rigorous memory management (inspired by `generate_audio` and `process_dialog`'s `reinit_each_line` concept).
* `DialogProcessorService`:
* Orchestrates dialog generation using `TTSService`.
* Implements `split_text_at_sentence_boundaries` logic for long text inputs.
* Manages generation of individual audio segments.
* `AudioManipulationService`:
* Concatenates audio segments using `torch` and `torchaudio`, inserting specified silences.
* Creates ZIP archives of all generated audio files using `zipfile`.
* `SpeakerManagementService`:
* Manages `speakers.yaml` (or alternative storage) for speaker metadata.
* Handles storage and retrieval of speaker audio samples (e.g., in `speaker_samples/`).
* **File Handling:**
* Strategy for storing and serving generated `.wav` and `.zip` files (e.g., FastAPI `StaticFiles`, temporary directories, or cloud storage).
### Implementation Steps (Phase 1)
1. **Project Setup:** Initialize FastAPI project, define dependencies (`fastapi`, `uvicorn`, `python-multipart`, `pyyaml`, `torch`, `torchaudio`, `chatterbox-tts`).
2. **Speaker Management:** Implement `SpeakerManagementService` and the `/api/speakers` endpoints.
3. **TTS Core:** Develop `TTSService`, focusing on model loading, inference, and critical memory management.
4. **Dialog Processing:** Implement `DialogProcessorService` including text splitting.
5. **Audio Utilities:** Create `AudioManipulationService` for concatenation and zipping.
6. **Main Endpoint:** Implement `POST /api/dialog/generate` orchestrating the services.
7. **Configuration:** Manage paths (`speakers.yaml`, sample storage, output directories) and TTS settings.
8. **Testing:** Thoroughly test all API endpoints using tools like Postman or `curl`.
## 2. Frontend (Vanilla JavaScript) Development
### Objective
Create an intuitive UI for dialog construction, speaker management, and interaction with the backend.
### Key Modules/Components
* **HTML (`index.html`):** Structure for dialog editor, speaker controls, results display.
* **CSS (`style.css`):** Styling for a clean and usable interface.
* **JavaScript (`app.js`, `api.js`, `ui.js`):**
* `api.js`: Functions for all backend API communications (`fetch`).
* `ui.js`: DOM manipulation for dynamic dialog lines, speaker lists, and results rendering.
* `app.js`: Main application logic, event handling, state management (for dialog lines, speaker data).
### Implementation Steps (Phase 2)
1. **Basic Layout:** Create `index.html` and `style.css`.
2. **API Client:** Develop `api.js` to interface with all backend endpoints.
3. **Speaker UI:**
* Fetch and display speakers using `ui.js` and `api.js`.
* Implement forms and logic for adding (with file upload) and removing speakers.
4. **Dialog Editor UI:**
* Dynamically add/remove/reorder dialog lines (speech/silence).
* Inputs for speaker selection (populated from API), text, and silence duration.
* Input for `output_base_name`.
5. **Interaction & Results:**
* "Generate Dialog" button to submit data via `api.js`.
* Display generation log, audio player for concatenated output, and download link for ZIP file.
## 3. Integration & Testing (Phase 3)
1. **Full System Connection:** Ensure seamless frontend-backend communication.
2. **End-to-End Testing:** Test various dialog scenarios, speaker configurations, and error conditions.
3. **Performance & Memory:** Profile backend memory usage during generation; refine `TTSService` memory strategies if needed.
4. **UX Refinement:** Iterate on UI/UX based on testing feedback.
## 4. Advanced Features & Deployment (Phase 4)
* (As per MEMORY[c20c2cce-46d4-453f-9bc3-c18e05dbc66f])
* **Real-time Updates:** Consider WebSockets for live progress during generation.
* **Deployment Strategy:** Plan for deploying the FastAPI application and serving the static frontend assets.
## Key Considerations from `gradio_app.py` Analysis
* **Memory Management for TTS Model:** This is critical. The `reinit_each_line` option and explicit cleanup in `generate_audio` highlight this. The FastAPI backend must handle this robustly.
* **Text Chunking:** The `split_text_at_sentence_boundaries` (max 300 chars) logic is essential and must be replicated.
* **Dialog Parsing:** The `Speaker: "Text"` and `Silence: duration` format should be the basis for the frontend data structure sent to the backend (see the parsing sketch after this list).
* **TTS Parameters:** Decide whether to expose advanced TTS parameters (`exaggeration`, `cfg_weight`, `temperature`) for dialog lines in the new API.
* **File Output:** The backend needs to replicate the generation of individual segment files, a concatenated file, and a ZIP archive.
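
For illustration only: a minimal sketch of parsing the legacy `Speaker: "Text"` / `Silence: duration` script format into the structured item list described above. The function name, the exact line patterns, and the direct use of the speaker label as `speaker_id` are assumptions (a real frontend would resolve display names to IDs via `GET /api/speakers`):

```python
import re
from typing import Any, Dict, List

# Minimal sketch; assumes one dialog item per line in the legacy Gradio script format.
SPEECH_RE = re.compile(r'^(?P<speaker>[^:]+):\s*"(?P<text>.*)"\s*$')
SILENCE_RE = re.compile(r'^Silence:\s*(?P<duration>\d+(\.\d+)?)\s*$', re.IGNORECASE)


def parse_dialog_script(script: str) -> List[Dict[str, Any]]:
    items: List[Dict[str, Any]] = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        silence = SILENCE_RE.match(line)
        if silence:
            items.append({"type": "silence", "duration": float(silence.group("duration"))})
            continue
        speech = SPEECH_RE.match(line)
        if speech:
            items.append({
                "type": "speech",
                "speaker_id": speech.group("speaker").strip(),
                "text": speech.group("text"),
            })
            continue
        raise ValueError(f"Unrecognized dialog line: {line!r}")
    return items


if __name__ == "__main__":
    example = 'Alice: "Hello there!"\nSilence: 0.5\nBob: "Hi Alice."'
    print(parse_dialog_script(example))
```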

.note/development_standards.md Normal file

@@ -0,0 +1,21 @@
# Development Standards
*(To be defined. This document will outline coding conventions, patterns, and best practices for the project.)*
## General Principles
- **Clarity and Readability:** Code should be easy to understand and maintain.
- **Modularity:** Design components with clear responsibilities and interfaces.
- **Testability:** Write code that is easily testable.
## Python (FastAPI Backend)
- Follow PEP 8 style guidelines.
- Use type hints.
- Structure API endpoints logically.
## JavaScript (Vanilla JS Frontend)
- Follow modern JavaScript best practices (ES6+).
- Organize code into modules.
- Prioritize performance and responsiveness.
## Commit Messages
- Follow conventional commit message format (e.g., `feat: add new TTS feature`, `fix: resolve audio playback bug`).

.note/interfaces.md Normal file

@@ -0,0 +1,88 @@
# Component Interfaces
*(This document will define the interfaces between different components of the system, especially between the frontend and backend.)*
## Backend API (FastAPI)
*(To be detailed. Examples below)*
### `/api/tts/generate_single` (POST)
- **Request Body:**
```json
{
"text": "string",
"speaker_id": "string",
"temperature": "float (optional)",
"length_penalty": "float (optional)"
}
```
- **Response Body (Success):**
```json
{
"audio_url": "string (URL to the generated audio file)",
"duration_ms": "integer"
}
```
- **Response Body (Error):**
```json
{
"detail": "string (error message)"
}
```
### `/api/tts/generate_dialog` (POST)
- **Request Body:**
```json
{
"dialog_lines": [
{
"type": "speech", // or "silence"
"speaker_id": "string (required if type is speech)",
"text": "string (required if type is speech)",
"duration_s": "float (required if type is silence)"
}
],
"output_base_name": "string (optional)"
}
```
- **Response Body (Success):**
```json
{
"dialog_audio_url": "string (URL to the concatenated dialog audio file)",
"individual_files_zip_url": "string (URL to zip of individual lines)",
"total_duration_ms": "integer"
}
```
### `/api/speakers` (GET)
- **Response Body (Success):**
```json
[
{
"id": "string",
"name": "string",
"sample_url": "string (optional)"
}
]
```
### `/api/speakers` (POST)
- **Request Body:** (Multipart form-data)
- `name`: "string"
- `audio_sample`: file (WAV)
- **Response Body (Success):**
```json
{
"id": "string",
"name": "string",
"message": "Speaker added successfully"
}
```
## Frontend Components (Vanilla JS)
*(To be detailed as frontend development progresses.)*
- **DialogLine Component:** Manages input for a single line of dialog (speaker, text).
- **AudioPlayer Component:** Handles playback of generated audio.
- **ProjectManager Component:** Manages overall project state, dialog lines, and interaction with the backend.

.note/project_overview.md Normal file

@@ -0,0 +1,42 @@
# Project Overview: Chatterbox TTS Application Migration
## 1. Current System
The project is currently a Gradio-based application named "Chatterbox TTS Gradio App".
Its primary function is to provide a user interface for text-to-speech (TTS) generation using the Chatterbox TTS model.
Key features of the current Gradio application include:
- Single utterance TTS generation.
- Multi-speaker dialog generation with configurable silence gaps.
- Speaker management (adding/removing speakers with custom audio samples).
- Automatic memory optimization (model cleanup after generation).
- Organized output file storage (`single_output/` and `dialog_output/`).
## 2. Project Goal: Migration to Modern Web Stack
The primary goal of this project is to re-implement the Chatterbox TTS application, specifically its dialog generation capabilities, by migrating from the current Gradio framework to a new architecture.
The new architecture will consist of:
- **Frontend**: Vanilla JavaScript
- **Backend**: FastAPI (Python)
This migration aims to address limitations of the Gradio framework, such as audio playback issues, limited UI control, and state management complexity, and to provide a more robust, performant, and professional user experience.
## 3. High-Level Plan & Existing Documentation
A comprehensive implementation plan for this migration already exists and should be consulted. This plan (Memory ID c20c2cce-46d4-453f-9bc3-c18e05dbc66f) outlines:
- A 4-phase implementation (Backend API, Frontend Development, Integration & Testing, Production Features).
- The complete technical architecture.
- A detailed component system (DialogLine, AudioPlayer, ProjectManager).
- Features like real-time status updates and drag-and-drop functionality.
- Migration strategies.
- Expected benefits (e.g., faster responsiveness, better audio reliability).
- An estimated timeline.
## 4. Scope of Current Work
The immediate next step, as requested by the user, is to:
1. Review the existing `gradio_app.py`.
2. Refine or detail the plan for re-implementing the dialog generation functionality with the new stack, leveraging the existing comprehensive plan.
This document will be updated as the project progresses to reflect new decisions, architectural changes, and milestones.

.note/session_log.md Normal file

@@ -0,0 +1,46 @@
# Session Log
---
**Session Start:** 2025-06-05 (Continued)
**Goal:** Progress Phase 1 of Chatterbox TTS backend migration: Initial Project Setup.
**Key Activities & Insights:**
- Created `backend/app/main.py` with a basic FastAPI application instance.
- Confirmed user has an existing `.venv` at the project root.
- Updated `backend/README.md` to reflect usage of the root `.venv` instead of a backend-specific one.
- Adjusted venv activation paths and command execution locations (project root).
- Installed backend dependencies from `backend/requirements.txt` into the root `.venv`.
- Successfully ran the basic FastAPI server using `uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000` from the project root.
- Verified the API is accessible.
- Confirmed all Memory Bank files are present. Reviewed `current_focus.md` and `session_log.md`.
**Next Steps:**
- Update `current_focus.md` and `session_log.md`.
- Proceed to Phase 1, Step 2: Speaker Management.
---
---
**Session Start:** 2025-06-05
**Goal:** Initiate migration of Chatterbox TTS dialog generator from Gradio to Vanilla JS + FastAPI.
**Key Activities & Insights:**
- User requested review of `gradio_app.py` and a plan for re-implementation.
- Checked for `.note/` Memory Bank directory (MEMORY[user_global]).
- Directory not found.
- Read `README.md` to gather project context.
- Created `.note/` directory and populated standard files:
- `project_overview.md` (with initial content based on README and user request).
- `current_focus.md` (outlining immediate tasks).
- `development_standards.md` (template).
- `decision_log.md` (logged decision to use Memory Bank).
- `code_structure.md` (initial thoughts on current and future structure).
- `session_log.md` (this entry).
- `interfaces.md` (template).
**Next Steps:**
- Confirm Memory Bank setup with the user.
- Proceed to review `gradio_app.py`.
---

babel.config.cjs Normal file

@@ -0,0 +1,13 @@
// babel.config.cjs
module.exports = {
presets: [
[
'@babel/preset-env',
{
targets: {
node: 'current', // Target the current version of Node.js
},
},
],
],
};

backend/README.md Normal file

@@ -0,0 +1,34 @@
# Chatterbox TTS Backend
This directory contains the FastAPI backend for the Chatterbox TTS application.
## Project Structure
- `app/`: Contains the main FastAPI application code.
- `__init__.py`: Makes `app` a Python package.
- `main.py`: FastAPI application instance and core API endpoints.
- `services/`: Business logic for TTS, dialog processing, etc.
- `models/`: Pydantic models for API request/response.
- `utils/`: Utility functions.
- `requirements.txt`: Project dependencies for the backend.
- `README.md`: This file.
## Setup & Running
It is assumed you have a Python virtual environment at the project root (e.g., `.venv`).
1. Navigate to the **project root** directory (e.g., `/Volumes/SAM2/CODE/chatterbox-test`).
2. Activate the existing Python virtual environment:
```bash
source .venv/bin/activate # On macOS/Linux
# .\.venv\Scripts\activate # On Windows
```
3. Install dependencies (ensure your terminal is in the **project root**):
```bash
pip install -r backend/requirements.txt
```
4. Run the development server (ensure your terminal is in the **project root**):
```bash
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
```
The API should then be accessible at `http://127.0.0.1:8000`.

backend/app/__init__.py Normal file

@@ -0,0 +1 @@

backend/app/config.py Normal file

@@ -0,0 +1,19 @@
from pathlib import Path
# Determine PROJECT_ROOT dynamically.
# If config.py is at /Volumes/SAM2/CODE/chatterbox-test/backend/app/config.py
# then PROJECT_ROOT (/Volumes/SAM2/CODE/chatterbox-test) is 2 levels up.
PROJECT_ROOT = Path(__file__).resolve().parents[2]
# Speaker data paths
SPEAKER_DATA_BASE_DIR = PROJECT_ROOT / "speaker_data"
SPEAKER_SAMPLES_DIR = SPEAKER_DATA_BASE_DIR / "speaker_samples"
SPEAKERS_YAML_FILE = SPEAKER_DATA_BASE_DIR / "speakers.yaml"
# TTS temporary output path (used by DialogProcessorService)
TTS_TEMP_OUTPUT_DIR = PROJECT_ROOT / "tts_temp_outputs"
# Final dialog output path (used by Dialog router and served by main app)
# These are stored within the 'backend' directory to be easily servable.
DIALOG_OUTPUT_PARENT_DIR = PROJECT_ROOT / "backend"
DIALOG_GENERATED_DIR = DIALOG_OUTPUT_PARENT_DIR / "tts_generated_dialogs"

backend/app/main.py Normal file

@@ -0,0 +1,43 @@
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
from pathlib import Path
from app.routers import speakers, dialog # Import the routers
from app import config
app = FastAPI(
title="Chatterbox TTS API",
description="API for generating TTS dialogs using Chatterbox TTS.",
version="0.1.0",
)
# CORS Middleware configuration
origins = [
"http://localhost:8001",
"http://127.0.0.1:8001",
# Add other origins if needed, e.g., your deployed frontend URL
]
app.add_middleware(
CORSMiddleware,
allow_origins=origins,
allow_credentials=True,
allow_methods=["*"], # Allows all methods
allow_headers=["*"], # Allows all headers
)
# Include routers
app.include_router(speakers.router, prefix="/api/speakers", tags=["Speakers"])
app.include_router(dialog.router, prefix="/api/dialog", tags=["Dialog Generation"])
@app.get("/")
async def read_root():
return {"message": "Welcome to the Chatterbox TTS API!"}
# Ensure the directory for serving generated audio exists
config.DIALOG_GENERATED_DIR.mkdir(parents=True, exist_ok=True)
# Mount StaticFiles to serve generated dialogs
app.mount("/generated_audio", StaticFiles(directory=config.DIALOG_GENERATED_DIR), name="generated_audio")
# Further endpoints for speakers, dialog generation, etc., will be added here.

backend/app/models/__init__.py Normal file

@@ -0,0 +1 @@

backend/app/models/dialog_models.py Normal file

@@ -0,0 +1,43 @@
from pydantic import BaseModel, Field, validator
from typing import List, Union, Literal, Optional
class DialogItemBase(BaseModel):
type: str
class SpeechItem(DialogItemBase):
type: Literal['speech'] = 'speech'
speaker_id: str = Field(..., description="ID of the speaker for this speech segment.")
text: str = Field(..., description="Text content to be synthesized.")
exaggeration: Optional[float] = Field(0.5, description="Controls the expressiveness of the speech. Higher values lead to more exaggerated speech. Default from Gradio.")
cfg_weight: Optional[float] = Field(0.5, description="Classifier-Free Guidance weight. Higher values make the speech more aligned with the prompt text and speaker characteristics. Default from Gradio.")
temperature: Optional[float] = Field(0.8, description="Controls randomness in generation. Lower values make speech more deterministic, higher values more varied. Default from Gradio.")
class SilenceItem(DialogItemBase):
type: Literal['silence'] = 'silence'
duration: float = Field(..., gt=0, description="Duration of the silence in seconds.")
class DialogRequest(BaseModel):
dialog_items: List[Union[SpeechItem, SilenceItem]] = Field(..., description="A list of speech and silence items.")
output_base_name: str = Field(..., description="Base name for the output files (e.g., 'my_dialog_v1'). Extensions will be added automatically.")
@validator('dialog_items', pre=True, each_item=True)
def check_item_type(cls, item):
if not isinstance(item, dict):
raise ValueError("Each dialog item must be a dictionary.")
item_type = item.get('type')
if item_type == 'speech':
# Pydantic will handle further validation based on SpeechItem model
return item
elif item_type == 'silence':
# Pydantic will handle further validation based on SilenceItem model
return item
raise ValueError(f"Unknown dialog item type: {item_type}. Must be 'speech' or 'silence'.")
class DialogResponse(BaseModel):
log: str = Field(description="Log of the dialog generation process.")
# For now, these URLs might be relative paths or placeholders.
# Actual serving strategy will determine the final URL format.
concatenated_audio_url: Optional[str] = Field(None, description="URL/path to the concatenated audio file.")
zip_archive_url: Optional[str] = Field(None, description="URL/path to the ZIP archive of all audio files.")
temp_dir_path: Optional[str] = Field(None, description="Path to the temporary directory holding generated files, for server-side reference.")
error_message: Optional[str] = Field(None, description="Error message if the process failed globally.")

backend/app/models/speaker_models.py Normal file

@@ -0,0 +1,20 @@
from pydantic import BaseModel
from typing import Optional
class SpeakerBase(BaseModel):
name: str
class SpeakerCreate(SpeakerBase):
# For receiving speaker name, file will be handled separately by FastAPI's UploadFile
pass
class Speaker(SpeakerBase):
id: str
sample_path: Optional[str] = None # Path to the speaker's audio sample
class Config:
from_attributes = True # Replaces orm_mode = True in Pydantic v2
class SpeakerResponse(SpeakerBase):
id: str
message: Optional[str] = None

backend/app/routers/__init__.py Normal file

@@ -0,0 +1 @@

backend/app/routers/dialog.py Normal file

@@ -0,0 +1,189 @@
from fastapi import APIRouter, Depends, HTTPException, BackgroundTasks
from pathlib import Path
import shutil
from app.models.dialog_models import DialogRequest, DialogResponse
from app.services.tts_service import TTSService
from app.services.speaker_service import SpeakerManagementService
from app.services.dialog_processor_service import DialogProcessorService
from app.services.audio_manipulation_service import AudioManipulationService
from app import config
router = APIRouter()
# --- Dependency Injection for Services ---
# These can be more sophisticated with a proper DI container or FastAPI's Depends system if services had complex init.
# For now, direct instantiation or simple Depends is fine.
def get_tts_service():
# Consider making device configurable
return TTSService(device="mps")
def get_speaker_management_service():
return SpeakerManagementService()
def get_dialog_processor_service(
tts_service: TTSService = Depends(get_tts_service),
speaker_service: SpeakerManagementService = Depends(get_speaker_management_service)
):
return DialogProcessorService(tts_service=tts_service, speaker_service=speaker_service)
def get_audio_manipulation_service():
return AudioManipulationService()
# --- Helper function to manage TTS model loading/unloading ---
async def manage_tts_model_lifecycle(tts_service: TTSService, task_function, *args, **kwargs):
"""Loads TTS model, executes task, then unloads model."""
try:
print("API: Loading TTS model...")
tts_service.load_model()
return await task_function(*args, **kwargs)
except Exception as e:
# Log or handle specific exceptions if needed before re-raising
print(f"API: Error during TTS model lifecycle or task execution: {e}")
raise
finally:
print("API: Unloading TTS model...")
tts_service.unload_model()
async def process_dialog_flow(
request: DialogRequest,
dialog_processor: DialogProcessorService,
audio_manipulator: AudioManipulationService,
background_tasks: BackgroundTasks
) -> DialogResponse:
"""Core logic for processing the dialog request."""
processing_log_entries = []
concatenated_audio_file_path = None
    zip_archive_path = None
final_temp_dir_path_str = None
try:
# 1. Process dialog to generate segments
# The DialogProcessorService creates its own temp dir for segments
dialog_processing_result = await dialog_processor.process_dialog(
dialog_items=[item.model_dump() for item in request.dialog_items],
output_base_name=request.output_base_name
)
processing_log_entries.append(dialog_processing_result['log'])
segment_details = dialog_processing_result['segment_files']
temp_segment_dir = Path(dialog_processing_result['temp_dir'])
final_temp_dir_path_str = str(temp_segment_dir)
# Filter out error segments for concatenation and zipping
valid_segment_paths_for_concat = [
Path(s['path']) for s in segment_details
if s['type'] == 'speech' and s.get('path') and Path(s['path']).exists()
]
# Create a list of dicts suitable for concatenation service (speech paths and silence durations)
items_for_concatenation = []
for s_detail in segment_details:
if s_detail['type'] == 'speech' and s_detail.get('path') and Path(s_detail['path']).exists():
items_for_concatenation.append({'type': 'speech', 'path': s_detail['path']})
elif s_detail['type'] == 'silence' and 'duration' in s_detail:
items_for_concatenation.append({'type': 'silence', 'duration': s_detail['duration']})
# Errors are already logged by DialogProcessor
if not any(item['type'] == 'speech' for item in items_for_concatenation):
message = "No valid speech segments were generated. Cannot create concatenated audio or ZIP."
processing_log_entries.append(message)
return DialogResponse(
log="\n".join(processing_log_entries),
temp_dir_path=final_temp_dir_path_str,
error_message=message
)
# 2. Concatenate audio segments
config.DIALOG_GENERATED_DIR.mkdir(parents=True, exist_ok=True)
concat_filename = f"{request.output_base_name}_concatenated.wav"
concatenated_audio_file_path = config.DIALOG_GENERATED_DIR / concat_filename
audio_manipulator.concatenate_audio_segments(
segment_results=items_for_concatenation,
output_concatenated_path=concatenated_audio_file_path
)
processing_log_entries.append(f"Concatenated audio saved to: {concatenated_audio_file_path}")
# 3. Create ZIP archive
zip_filename = f"{request.output_base_name}_dialog_output.zip"
zip_archive_path = config.DIALOG_GENERATED_DIR / zip_filename
# Collect all valid generated speech segment files for zipping
individual_segment_paths = [
Path(s['path']) for s in segment_details
if s['type'] == 'speech' and s.get('path') and Path(s['path']).exists()
]
        # concatenated_audio_file_path was produced by the concatenation step above
audio_manipulator.create_zip_archive(
segment_file_paths=individual_segment_paths,
concatenated_audio_path=concatenated_audio_file_path,
output_zip_path=zip_archive_path
)
processing_log_entries.append(f"ZIP archive created at: {zip_archive_path}")
# Schedule cleanup of the temporary segment directory
# background_tasks.add_task(shutil.rmtree, temp_segment_dir, ignore_errors=True)
# processing_log_entries.append(f"Scheduled cleanup for temporary segment directory: {temp_segment_dir}")
# For now, let's not auto-delete, so user can inspect. Cleanup can be a separate endpoint/job.
processing_log_entries.append(f"Temporary segment directory for inspection: {temp_segment_dir}")
return DialogResponse(
log="\n".join(processing_log_entries),
# URLs should be relative to a static serving path, e.g., /generated_audio/
# For now, just returning the name, assuming they are in DIALOG_OUTPUT_DIR
concatenated_audio_url=f"/generated_audio/{concat_filename}",
zip_archive_url=f"/generated_audio/{zip_filename}",
temp_dir_path=final_temp_dir_path_str
)
except FileNotFoundError as e:
error_msg = f"File not found during dialog generation: {e}"
processing_log_entries.append(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
except ValueError as e:
error_msg = f"Invalid value or configuration: {e}"
processing_log_entries.append(error_msg)
raise HTTPException(status_code=400, detail=error_msg)
except RuntimeError as e:
error_msg = f"Runtime error during dialog generation: {e}"
processing_log_entries.append(error_msg)
# This could be a 500 if it's an unexpected server error
raise HTTPException(status_code=500, detail=error_msg)
except Exception as e:
import traceback
error_msg = f"An unexpected error occurred: {e}\n{traceback.format_exc()}"
processing_log_entries.append(error_msg)
raise HTTPException(status_code=500, detail=error_msg)
finally:
# Ensure logs are captured even if an early exception occurs before full response construction
        if not concatenated_audio_file_path and not zip_archive_path and processing_log_entries:
print("Dialog generation failed. Log: \n" + "\n".join(processing_log_entries))
@router.post("/generate", response_model=DialogResponse)
async def generate_dialog_endpoint(
request: DialogRequest,
background_tasks: BackgroundTasks,
tts_service: TTSService = Depends(get_tts_service),
dialog_processor: DialogProcessorService = Depends(get_dialog_processor_service),
audio_manipulator: AudioManipulationService = Depends(get_audio_manipulation_service)
):
"""
Generates a dialog from a list of speech and silence items.
- Processes text into manageable chunks.
- Generates speech for each chunk using the specified speaker.
- Inserts silences as requested.
- Concatenates all audio segments into a single file.
- Creates a ZIP archive of all individual segments and the concatenated file.
"""
# Wrap the core processing logic with model loading/unloading
return await manage_tts_model_lifecycle(
tts_service,
process_dialog_flow,
request=request,
dialog_processor=dialog_processor,
audio_manipulator=audio_manipulator,
background_tasks=background_tasks
)
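
A hedged usage sketch for this endpoint, using the `requests` library (not among the listed backend dependencies, so it is an assumption that it is installed); the speaker ID and base URL are placeholders:

```python
import requests  # assumed to be installed; illustration only

payload = {
    "output_base_name": "demo_dialog",
    "dialog_items": [
        {"type": "speech", "speaker_id": "YOUR_SPEAKER_ID", "text": "Hello there!"},
        {"type": "silence", "duration": 0.5},
        {"type": "speech", "speaker_id": "YOUR_SPEAKER_ID",
         "text": "This is the second line.", "temperature": 0.7},
    ],
}

# Generation is synchronous in this endpoint, so long dialogs can take a while.
resp = requests.post("http://127.0.0.1:8000/api/dialog/generate", json=payload, timeout=600)
resp.raise_for_status()
result = resp.json()
print(result["log"])

if result.get("error_message"):
    print("Generation failed:", result["error_message"])
else:
    # URLs are served by the /generated_audio StaticFiles mount configured in main.py
    print("Concatenated audio:", "http://127.0.0.1:8000" + result["concatenated_audio_url"])
    print("ZIP archive:", "http://127.0.0.1:8000" + result["zip_archive_url"])
```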

backend/app/routers/speakers.py Normal file

@@ -0,0 +1,81 @@
from typing import List, Annotated
from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form
from app.models.speaker_models import Speaker, SpeakerResponse
from app.services.speaker_service import SpeakerManagementService
router = APIRouter(
tags=["Speakers"],
responses={404: {"description": "Not found"}},
)
# Dependency to get the speaker service instance
# This could be more sophisticated with a proper DI system later
def get_speaker_service():
return SpeakerManagementService()
@router.get("/", response_model=List[Speaker])
async def get_all_speakers(
service: Annotated[SpeakerManagementService, Depends(get_speaker_service)]
):
"""
Retrieve all available speakers.
"""
return service.get_speakers()
@router.post("/", response_model=SpeakerResponse, status_code=201)
async def create_new_speaker(
name: Annotated[str, Form()],
audio_file: Annotated[UploadFile, File()],
service: Annotated[SpeakerManagementService, Depends(get_speaker_service)]
):
"""
Add a new speaker.
Requires speaker name (form data) and an audio sample file (file upload).
"""
if not audio_file.filename:
raise HTTPException(status_code=400, detail="No audio file provided.")
if not audio_file.content_type or not audio_file.content_type.startswith("audio/"):
raise HTTPException(status_code=400, detail="Invalid audio file type. Please upload a valid audio file (e.g., WAV, MP3).")
try:
new_speaker = await service.add_speaker(name=name, audio_file=audio_file)
return SpeakerResponse(
id=new_speaker.id,
name=new_speaker.name,
message="Speaker added successfully."
)
except HTTPException as e:
# Re-raise HTTPExceptions from the service (e.g., file save error)
raise e
except Exception as e:
# Catch-all for other unexpected errors
raise HTTPException(status_code=500, detail=f"An unexpected error occurred: {str(e)}")
@router.get("/{speaker_id}", response_model=Speaker)
async def get_speaker_details(
speaker_id: str,
service: Annotated[SpeakerManagementService, Depends(get_speaker_service)]
):
"""
Get details for a specific speaker by ID.
"""
speaker = service.get_speaker_by_id(speaker_id)
if not speaker:
raise HTTPException(status_code=404, detail="Speaker not found")
return speaker
@router.delete("/{speaker_id}", response_model=dict)
async def remove_speaker(
speaker_id: str,
service: Annotated[SpeakerManagementService, Depends(get_speaker_service)]
):
"""
Delete a speaker by ID.
"""
deleted = service.delete_speaker(speaker_id)
if not deleted:
raise HTTPException(status_code=404, detail="Speaker not found or could not be deleted.")
return {"message": "Speaker deleted successfully"}

backend/app/services/__init__.py Normal file

@@ -0,0 +1 @@

backend/app/services/audio_manipulation_service.py Normal file

@@ -0,0 +1,241 @@
import torch
import torchaudio
from pathlib import Path
from typing import List, Dict, Union, Tuple
import zipfile
# Define a common sample rate, e.g., from the TTS model. This should ideally be configurable or dynamically obtained.
# For now, let's assume the TTS model (ChatterboxTTS) outputs at a known sample rate.
# The ChatterboxTTS model.sr is 24000.
DEFAULT_SAMPLE_RATE = 24000
class AudioManipulationService:
def __init__(self, default_sample_rate: int = DEFAULT_SAMPLE_RATE):
self.sample_rate = default_sample_rate
def _load_audio(self, file_path: Union[str, Path]) -> Tuple[torch.Tensor, int]:
"""Loads an audio file and returns the waveform and sample rate."""
try:
waveform, sr = torchaudio.load(file_path)
return waveform, sr
except Exception as e:
raise RuntimeError(f"Error loading audio file {file_path}: {e}")
def _create_silence(self, duration_seconds: float) -> torch.Tensor:
"""Creates a silent audio tensor of a given duration."""
num_frames = int(duration_seconds * self.sample_rate)
return torch.zeros((1, num_frames)) # Mono silence
def concatenate_audio_segments(
self,
segment_results: List[Dict],
output_concatenated_path: Path
) -> Path:
"""
Concatenates audio segments and silences into a single audio file.
Args:
segment_results: A list of dictionaries, where each dict represents an audio
segment or a silence. Expected format:
For speech: {'type': 'speech', 'path': 'path/to/audio.wav', ...}
For silence: {'type': 'silence', 'duration': 0.5, ...}
output_concatenated_path: The path to save the final concatenated audio file.
Returns:
The path to the concatenated audio file.
"""
all_waveforms: List[torch.Tensor] = []
current_sample_rate = self.sample_rate # Assume this initially, verify with first loaded audio
for i, segment_info in enumerate(segment_results):
segment_type = segment_info.get("type")
if segment_type == "speech":
audio_path_str = segment_info.get("path")
if not audio_path_str:
print(f"Warning: Speech segment {i} has no path. Skipping.")
continue
audio_path = Path(audio_path_str)
if not audio_path.exists():
print(f"Warning: Audio file {audio_path} for segment {i} not found. Skipping.")
continue
try:
waveform, sr = self._load_audio(audio_path)
# Ensure consistent sample rate. Resample if necessary.
# For simplicity, this example assumes all inputs will match self.sample_rate
# or the first loaded audio's sample rate. A more robust implementation
# would resample if sr != current_sample_rate.
if i == 0 and not all_waveforms: # First audio segment sets the reference SR if not default
current_sample_rate = sr
if sr != self.sample_rate:
print(f"Warning: First audio segment SR ({sr} Hz) differs from service default SR ({self.sample_rate} Hz). Using segment SR.")
if sr != current_sample_rate:
print(f"Warning: Sample rate mismatch for {audio_path} ({sr} Hz) vs expected ({current_sample_rate} Hz). Resampling...")
resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=current_sample_rate)
waveform = resampler(waveform)
# Ensure mono. If stereo, take the mean or first channel.
if waveform.shape[0] > 1:
waveform = torch.mean(waveform, dim=0, keepdim=True)
all_waveforms.append(waveform)
except Exception as e:
print(f"Error processing speech segment {audio_path}: {e}. Skipping.")
elif segment_type == "silence":
duration = segment_info.get("duration")
if duration is None or not isinstance(duration, (int, float)) or duration < 0:
print(f"Warning: Silence segment {i} has invalid duration. Skipping.")
continue
silence_waveform = self._create_silence(float(duration))
all_waveforms.append(silence_waveform)
elif segment_type == "error":
# Errors are already logged by DialogProcessorService, just skip here.
print(f"Skipping segment {i} due to previous error: {segment_info.get('message')}")
continue
else:
print(f"Warning: Unknown segment type '{segment_type}' at index {i}. Skipping.")
if not all_waveforms:
raise ValueError("No valid audio segments or silences found to concatenate.")
# Concatenate all waveforms
final_waveform = torch.cat(all_waveforms, dim=1)
# Ensure output directory exists
output_concatenated_path.parent.mkdir(parents=True, exist_ok=True)
# Save the concatenated audio
try:
torchaudio.save(str(output_concatenated_path), final_waveform, current_sample_rate)
print(f"Concatenated audio saved to: {output_concatenated_path}")
return output_concatenated_path
except Exception as e:
raise RuntimeError(f"Error saving concatenated audio to {output_concatenated_path}: {e}")
def create_zip_archive(
self,
segment_file_paths: List[Path],
concatenated_audio_path: Path,
output_zip_path: Path
) -> Path:
"""
Creates a ZIP archive containing individual audio segments and the concatenated audio file.
Args:
segment_file_paths: A list of paths to the individual audio segment files.
concatenated_audio_path: Path to the final concatenated audio file.
output_zip_path: The path to save the output ZIP archive.
Returns:
The path to the created ZIP archive.
"""
output_zip_path.parent.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(output_zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
# Add concatenated audio
if concatenated_audio_path.exists():
zf.write(concatenated_audio_path, arcname=concatenated_audio_path.name)
else:
print(f"Warning: Concatenated audio file {concatenated_audio_path} not found for zipping.")
# Add individual segments
segments_dir_name = "segments"
for file_path in segment_file_paths:
if file_path.exists() and file_path.is_file():
# Store segments in a subdirectory within the zip for organization
zf.write(file_path, arcname=Path(segments_dir_name) / file_path.name)
else:
print(f"Warning: Segment file {file_path} not found or is not a file. Skipping for zipping.")
print(f"ZIP archive created at: {output_zip_path}")
return output_zip_path
# Example Usage (Test Block)
if __name__ == "__main__":
import tempfile
import shutil
# Create a temporary directory for test files
test_temp_dir = Path(tempfile.mkdtemp(prefix="audio_manip_test_"))
print(f"Created temporary test directory: {test_temp_dir}")
# Instance of the service
audio_service = AudioManipulationService()
# --- Test Data Setup ---
# Create dummy audio files (e.g., short silences with different names)
dummy_sr = audio_service.sample_rate
segment1_path = test_temp_dir / "segment1_speech.wav"
segment2_path = test_temp_dir / "segment2_speech.wav"
torchaudio.save(str(segment1_path), audio_service._create_silence(1.0), dummy_sr)
# Create a dummy segment with a different sample rate to test resampling
dummy_sr_alt = 16000
temp_waveform_alt_sr = torch.rand((1, int(0.5 * dummy_sr_alt))) # 0.5s at 16kHz
torchaudio.save(str(segment2_path), temp_waveform_alt_sr, dummy_sr_alt)
segment_results_for_concat = [
{"type": "speech", "path": str(segment1_path), "speaker_id": "spk1", "text_chunk": "Test 1"},
{"type": "silence", "duration": 0.5},
{"type": "speech", "path": str(segment2_path), "speaker_id": "spk2", "text_chunk": "Test 2 (alt SR)"},
{"type": "error", "message": "Simulated error, should be skipped"},
{"type": "speech", "path": "non_existent_segment.wav"}, # Test non-existent file
{"type": "silence", "duration": -0.2} # Test invalid duration
]
concatenated_output_path = test_temp_dir / "final_concatenated_audio.wav"
zip_output_path = test_temp_dir / "audio_archive.zip"
all_segment_files_for_zip = [segment1_path, segment2_path]
try:
# Test concatenation
print("\n--- Testing Concatenation ---")
actual_concat_path = audio_service.concatenate_audio_segments(
segment_results_for_concat,
concatenated_output_path
)
print(f"Concatenation test successful. Output: {actual_concat_path}")
assert actual_concat_path.exists()
# Basic check: load concatenated and verify duration (approx)
concat_wav, concat_sr = audio_service._load_audio(actual_concat_path)
expected_duration = 1.0 + 0.5 + 0.5 # seg1 (1.0s) + silence (0.5s) + seg2 (0.5s) = 2.0s
actual_duration = concat_wav.shape[1] / concat_sr
print(f"Expected duration (approx): {expected_duration}s, Actual duration: {actual_duration:.2f}s")
assert abs(actual_duration - expected_duration) < 0.1 # Allow small deviation
# Test Zipping
print("\n--- Testing Zipping ---")
actual_zip_path = audio_service.create_zip_archive(
all_segment_files_for_zip,
actual_concat_path,
zip_output_path
)
print(f"Zipping test successful. Output: {actual_zip_path}")
assert actual_zip_path.exists()
# Verify zip contents (basic check)
segments_dir_name = "segments" # Define this for the assertion below
with zipfile.ZipFile(actual_zip_path, 'r') as zf_read:
zip_contents = zf_read.namelist()
print(f"ZIP contents: {zip_contents}")
assert Path(segments_dir_name) / segment1_path.name in [Path(p) for p in zip_contents]
assert Path(segments_dir_name) / segment2_path.name in [Path(p) for p in zip_contents]
assert concatenated_output_path.name in zip_contents
print("\nAll AudioManipulationService tests passed!")
except Exception as e:
import traceback
print(f"\nAn error occurred during AudioManipulationService tests:")
traceback.print_exc()
finally:
# Clean up temporary directory
# shutil.rmtree(test_temp_dir)
# print(f"Cleaned up temporary test directory: {test_temp_dir}")
print(f"Test files are in {test_temp_dir}. Please inspect and delete manually if needed.")

backend/app/services/dialog_processor_service.py Normal file

@@ -0,0 +1,265 @@
from pathlib import Path
from typing import List, Dict, Any, Union
import re
from .tts_service import TTSService
from .speaker_service import SpeakerManagementService
from app import config
# Potentially models for dialog structure if we define them
# from ..models.dialog_models import DialogItem # Example
class DialogProcessorService:
def __init__(self, tts_service: TTSService, speaker_service: SpeakerManagementService):
self.tts_service = tts_service
self.speaker_service = speaker_service
# Base directory for storing individual audio segments during processing
self.temp_audio_dir = config.TTS_TEMP_OUTPUT_DIR
self.temp_audio_dir.mkdir(parents=True, exist_ok=True)
def _split_text(self, text: str, max_length: int = 300) -> List[str]:
"""
Splits text into chunks suitable for TTS processing, attempting to respect sentence boundaries.
Similar to split_text_at_sentence_boundaries from the original Gradio app.
        `max_length` is approximate, as the splitter tries to finish sentences.
"""
# Basic sentence splitting using common delimiters. More sophisticated NLP could be used.
# This regex tries to split by '.', '!', '?', '...', followed by space or end of string.
# It also handles cases where these delimiters might be followed by quotes or parentheses.
sentences = re.split(r'(?<=[.!?\u2026])\s+|(?<=[.!?\u2026])(?=["\')\]\}\u201d\u2019])|(?<=[.!?\u2026])$', text.strip())
sentences = [s.strip() for s in sentences if s and s.strip()]
chunks = []
current_chunk = ""
for sentence in sentences:
if not sentence:
continue
if not current_chunk: # First sentence for this chunk
current_chunk = sentence
elif len(current_chunk) + len(sentence) + 1 <= max_length:
current_chunk += " " + sentence
else:
chunks.append(current_chunk)
current_chunk = sentence
if current_chunk: # Add the last chunk
chunks.append(current_chunk)
# Further split any chunks that are still too long (e.g., a single very long sentence)
final_chunks = []
for chunk in chunks:
if len(chunk) > max_length:
# Simple split by length if a sentence itself is too long
for i in range(0, len(chunk), max_length):
final_chunks.append(chunk[i:i+max_length])
else:
final_chunks.append(chunk)
return final_chunks
async def process_dialog(self, dialog_items: List[Dict[str, Any]], output_base_name: str) -> Dict[str, Any]:
"""
Processes a list of dialog items (speech or silence) to generate audio segments.
Args:
dialog_items: A list of dictionaries, where each item has:
- 'type': 'speech' or 'silence'
- For 'speech': 'speaker_id': str, 'text': str
- For 'silence': 'duration': float (in seconds)
output_base_name: The base name for the output files.
Returns:
A dictionary containing paths to generated segments and other processing info.
Example: {
"log": "Processing complete...",
"segment_files": [
{"type": "speech", "path": "/path/to/segment1.wav", "speaker_id": "X", "text_chunk": "..."},
{"type": "silence", "duration": 0.5},
{"type": "speech", "path": "/path/to/segment2.wav", "speaker_id": "Y", "text_chunk": "..."}
],
"temp_dir": str(self.temp_audio_dir / output_base_name)
}
"""
segment_results = []
processing_log = []
# Create a unique subdirectory for this dialog's temporary files
dialog_temp_dir = self.temp_audio_dir / output_base_name
dialog_temp_dir.mkdir(parents=True, exist_ok=True)
processing_log.append(f"Created temporary directory for segments: {dialog_temp_dir}")
segment_idx = 0
for i, item in enumerate(dialog_items):
item_type = item.get("type")
processing_log.append(f"Processing item {i+1}: type='{item_type}'")
if item_type == "speech":
speaker_id = item.get("speaker_id")
text = item.get("text")
if not speaker_id or not text:
processing_log.append(f"Skipping speech item {i+1} due to missing speaker_id or text.")
segment_results.append({"type": "error", "message": "Missing speaker_id or text"})
continue
# Validate speaker_id and get speaker_sample_path
speaker_info = self.speaker_service.get_speaker_by_id(speaker_id)
if not speaker_info:
processing_log.append(f"Speaker ID '{speaker_id}' not found. Skipping item {i+1}.")
segment_results.append({"type": "error", "message": f"Speaker ID '{speaker_id}' not found"})
continue
if not speaker_info.sample_path:
processing_log.append(f"Speaker ID '{speaker_id}' has no sample path defined. Skipping item {i+1}.")
segment_results.append({"type": "error", "message": f"Speaker ID '{speaker_id}' has no sample path defined"})
continue
# speaker_info.sample_path is relative to config.SPEAKER_DATA_BASE_DIR
abs_speaker_sample_path = config.SPEAKER_DATA_BASE_DIR / speaker_info.sample_path
if not abs_speaker_sample_path.is_file():
processing_log.append(f"Speaker sample file not found or is not a file at '{abs_speaker_sample_path}' for speaker ID '{speaker_id}'. Skipping item {i+1}.")
segment_results.append({"type": "error", "message": f"Speaker sample not a file or not found: {abs_speaker_sample_path}"})
continue
text_chunks = self._split_text(text)
processing_log.append(f"Split text for speaker '{speaker_id}' into {len(text_chunks)} chunk(s).")
for chunk_idx, text_chunk in enumerate(text_chunks):
segment_filename_base = f"{output_base_name}_seg{segment_idx}_spk{speaker_id}_chunk{chunk_idx}"
processing_log.append(f"Generating speech for chunk: '{text_chunk[:50]}...' using speaker '{speaker_id}'")
try:
segment_output_path = await self.tts_service.generate_speech(
text=text_chunk,
speaker_id=speaker_id, # For metadata, actual sample path is used by TTS
speaker_sample_path=str(abs_speaker_sample_path),
output_filename_base=segment_filename_base,
output_dir=dialog_temp_dir, # Save to the dialog's temp dir
exaggeration=item.get('exaggeration', 0.5), # Default from Gradio, Pydantic model should provide this
cfg_weight=item.get('cfg_weight', 0.5), # Default from Gradio, Pydantic model should provide this
temperature=item.get('temperature', 0.8) # Default from Gradio, Pydantic model should provide this
)
segment_results.append({
"type": "speech",
"path": str(segment_output_path),
"speaker_id": speaker_id,
"text_chunk": text_chunk
})
processing_log.append(f"Successfully generated segment: {segment_output_path}")
except Exception as e:
error_message = f"Error generating speech for chunk '{text_chunk[:50]}...': {repr(e)}"
processing_log.append(error_message)
segment_results.append({"type": "error", "message": error_message, "text_chunk": text_chunk})
segment_idx += 1
elif item_type == "silence":
duration = item.get("duration")
if duration is None or duration < 0:
processing_log.append(f"Skipping silence item {i+1} due to invalid duration.")
segment_results.append({"type": "error", "message": "Invalid duration for silence"})
continue
segment_results.append({"type": "silence", "duration": float(duration)})
processing_log.append(f"Added silence of {duration}s.")
else:
processing_log.append(f"Unknown item type '{item_type}' at item {i+1}. Skipping.")
segment_results.append({"type": "error", "message": f"Unknown item type: {item_type}"})
return {
"log": "\n".join(processing_log),
"segment_files": segment_results,
"temp_dir": str(dialog_temp_dir) # For cleanup or zipping later
}
if __name__ == "__main__":
import asyncio
import pprint
async def main_test():
# Initialize services
tts_service = TTSService(device="mps") # or your preferred device
speaker_service = SpeakerManagementService()
dialog_processor = DialogProcessorService(tts_service, speaker_service)
# Ensure dummy speaker sample exists (TTSService test block usually creates this)
# For robustness, we can call the TTSService test logic or ensure it's run prior.
# Here, we assume dummy_speaker_test.wav is available as per previous steps.
# If not, the 'test_speaker_for_dialog_proc' will fail file validation.
# First, ensure the dummy speaker file is created by TTSService's own test logic
# This is a bit of a hack for testing; ideally, test assets are managed independently.
try:
print("Ensuring dummy speaker sample is created by running TTSService's main_test logic...")
from .tts_service import main_test as tts_main_test
await tts_main_test() # This will create the dummy_speaker_test.wav
print("TTSService main_test completed, dummy sample should exist.")
except ImportError:
print("Could not import tts_service.main_test directly. Ensure dummy_speaker_test.wav exists.")
except Exception as e:
print(f"Error running tts_service.main_test for dummy sample creation: {e}")
print("Proceeding, but 'test_speaker_for_dialog_proc' might fail if sample is missing.")
sample_dialog_items = [
{
"type": "speech",
"speaker_id": "test_speaker_for_dialog_proc", # Defined in speakers.yaml
"text": "Hello world! This is the first speech segment."
},
{
"type": "silence",
"duration": 0.75
},
{
"type": "speech",
"speaker_id": "test_speaker_for_dialog_proc",
"text": "This is a much longer piece of text that should definitely be split into multiple, smaller chunks by the dialog processor. It contains several sentences. Let's see how it handles this. The maximum length is set to 300 characters, but it tries to respect sentence boundaries. This sentence itself is quite long and might even be split mid-sentence if it exceeds the hard limit after sentence splitting. We will observe the output carefully to ensure it works as expected, creating multiple audio files for this single text block if necessary."
},
{
"type": "speech",
"speaker_id": "non_existent_speaker_id",
"text": "This should fail because the speaker does not exist."
},
{
"type": "invalid_type",
"text": "This item has an invalid type."
},
{
"type": "speech",
"speaker_id": "test_speaker_for_dialog_proc",
"text": None # Test missing text
},
{
"type": "speech",
"speaker_id": None, # Test missing speaker_id
"text": "This is a test with a missing speaker ID."
},
{
"type": "silence",
"duration": -0.5 # Invalid duration
}
]
output_base_name = "dialog_processor_test_run"
try:
print(f"\nLoading TTS model for DialogProcessorService test...")
# TTSService's generate_speech will load the model if not already loaded.
# However, explicit load/unload is good practice for a test block.
tts_service.load_model()
print(f"\nProcessing dialog items with base name: {output_base_name}...")
results = await dialog_processor.process_dialog(sample_dialog_items, output_base_name)
print("\n--- Processing Log ---")
print(results.get("log"))
print("\n--- Segment Files / Results ---")
pprint.pprint(results.get("segment_files"))
print(f"\nTemporary directory used: {results.get('temp_dir')}")
print("\nPlease check the temporary directory for generated audio segments.")
except Exception as e:
import traceback
print(f"\nAn error occurred during the DialogProcessorService test:")
traceback.print_exc()
finally:
print("\nUnloading TTS model...")
tts_service.unload_model()
print("DialogProcessorService test finished.")
asyncio.run(main_test())

backend/app/services/speaker_service.py Normal file

@@ -0,0 +1,147 @@
import yaml
import uuid
import os
import io # Added for BytesIO
import torchaudio # Added for audio processing
from pathlib import Path
from typing import List, Dict, Optional, Any
from fastapi import UploadFile, HTTPException
from app.models.speaker_models import Speaker, SpeakerCreate
from app import config
class SpeakerManagementService:
def __init__(self):
self._ensure_data_files_exist()
self.speakers_data = self._load_speakers_data()
def _ensure_data_files_exist(self):
"""Ensures the speaker data directory and YAML file exist."""
config.SPEAKER_DATA_BASE_DIR.mkdir(parents=True, exist_ok=True)
config.SPEAKER_SAMPLES_DIR.mkdir(parents=True, exist_ok=True)
if not config.SPEAKERS_YAML_FILE.exists():
with open(config.SPEAKERS_YAML_FILE, 'w') as f:
yaml.dump({}, f) # Initialize with an empty dict, as per previous fixes
def _load_speakers_data(self) -> Dict[str, Any]: # Changed return type to Dict
"""Loads speaker data from the YAML file."""
try:
with open(config.SPEAKERS_YAML_FILE, 'r') as f:
data = yaml.safe_load(f)
return data if isinstance(data, dict) else {} # Ensure it's a dict
except FileNotFoundError:
return {}
except yaml.YAMLError:
            # Handle a corrupted YAML file: log the error and return an empty dict
print(f"Error: Corrupted speakers YAML file at {config.SPEAKERS_YAML_FILE}")
return {}
def _save_speakers_data(self):
"""Saves the current speaker data to the YAML file."""
with open(config.SPEAKERS_YAML_FILE, 'w') as f:
yaml.dump(self.speakers_data, f, sort_keys=False)
def get_speakers(self) -> List[Speaker]:
"""Returns a list of all speakers."""
# self.speakers_data is now a dict: {speaker_id: {name: ..., sample_path: ...}}
return [Speaker(id=spk_id, **spk_attrs) for spk_id, spk_attrs in self.speakers_data.items()]
def get_speaker_by_id(self, speaker_id: str) -> Optional[Speaker]:
"""Retrieves a speaker by their ID."""
if speaker_id in self.speakers_data:
speaker_attributes = self.speakers_data[speaker_id]
return Speaker(id=speaker_id, **speaker_attributes)
return None
async def add_speaker(self, name: str, audio_file: UploadFile) -> Speaker:
"""Adds a new speaker, converts sample to WAV, saves it, and updates YAML."""
speaker_id = str(uuid.uuid4())
# Define standardized sample filename and path (always WAV)
sample_filename = f"{speaker_id}.wav"
sample_path = config.SPEAKER_SAMPLES_DIR / sample_filename
try:
content = await audio_file.read()
# Use BytesIO to handle the in-memory audio data for torchaudio
audio_buffer = io.BytesIO(content)
# Load audio data using torchaudio, this handles various formats (MP3, WAV, etc.)
# waveform is a tensor, sample_rate is an int
waveform, sample_rate = torchaudio.load(audio_buffer)
# Save the audio data as WAV
# Ensure the SPEAKER_SAMPLES_DIR exists (though _ensure_data_files_exist should handle it)
config.SPEAKER_SAMPLES_DIR.mkdir(parents=True, exist_ok=True)
torchaudio.save(str(sample_path), waveform, sample_rate, format="wav")
except (RuntimeError, OSError) as e:
# torchaudio surfaces decode problems (unsupported format, corrupted file) as RuntimeError/OSError; it does not expose a TorchaudioException class
raise HTTPException(status_code=400, detail=f"Error processing audio file: {e}. Ensure it's a valid audio format (e.g., WAV, MP3).")
except Exception as e:
# General error handling for other issues (e.g., file system errors)
raise HTTPException(status_code=500, detail=f"Could not save audio file: {e}")
finally:
await audio_file.close()
# self.speakers_data is now a dict
self.speakers_data[speaker_id] = {
"name": name,
"sample_path": str(sample_path.relative_to(config.SPEAKER_DATA_BASE_DIR))
}
self._save_speakers_data()
# Construct Speaker model for return, including the ID
return Speaker(id=speaker_id, name=name, sample_path=str(sample_path.relative_to(config.SPEAKER_DATA_BASE_DIR)))
def delete_speaker(self, speaker_id: str) -> bool:
"""Deletes a speaker and their audio sample."""
# Speaker data is now a dictionary, keyed by speaker_id
speaker_to_delete = self.speakers_data.pop(speaker_id, None)
if speaker_to_delete:
self._save_speakers_data()
sample_path_str = speaker_to_delete.get("sample_path")
if sample_path_str:
# sample_path_str is relative to SPEAKER_DATA_BASE_DIR
full_sample_path = config.SPEAKER_DATA_BASE_DIR / sample_path_str
try:
if full_sample_path.is_file(): # Check if it's a file before removing
os.remove(full_sample_path)
except OSError as e:
# Log error if file deletion fails but proceed
print(f"Error deleting sample file {full_sample_path}: {e}")
return True
return False
# Example usage (for testing, not part of the service itself)
if __name__ == "__main__":
service = SpeakerManagementService()
print("Initial speakers:", service.get_speakers())
# This part would require a mock UploadFile to run directly
# print("\nAdding a new speaker (manual test setup needed for UploadFile)")
# class MockUploadFile:
# def __init__(self, filename, content):
# self.filename = filename
# self._content = content
# async def read(self): return self._content
# async def close(self): pass
# import asyncio
# async def test_add():
# mock_file = MockUploadFile("test.wav", b"dummy audio content")
# new_speaker = await service.add_speaker(name="Test Speaker", audio_file=mock_file)
# print("\nAdded speaker:", new_speaker)
# print("Speakers after add:", service.get_speakers())
# return new_speaker.id
# speaker_id_to_delete = asyncio.run(test_add())
# if speaker_id_to_delete:
# print(f"\nDeleting speaker {speaker_id_to_delete}")
# service.delete_speaker(speaker_id_to_delete)
# print("Speakers after delete:", service.get_speakers())

View File

@ -0,0 +1,155 @@
import torch
import torchaudio
from typing import Optional
from chatterbox.tts import ChatterboxTTS
from pathlib import Path
import gc # Garbage collector for memory management
# Define a directory for TTS model outputs, could be temporary or configurable
TTS_OUTPUT_DIR = Path("/Volumes/SAM2/CODE/chatterbox-test/tts_outputs") # Example path
class TTSService:
def __init__(self, device: str = "mps"): # Default to MPS for Macs, can be "cpu" or "cuda"
self.device = device
self.model = None
self._ensure_output_dir_exists()
def _ensure_output_dir_exists(self):
"""Ensures the TTS output directory exists."""
TTS_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
def load_model(self):
"""Loads the ChatterboxTTS model."""
if self.model is None:
print(f"Loading ChatterboxTTS model to device: {self.device}...")
try:
self.model = ChatterboxTTS.from_pretrained(device=self.device)
print("ChatterboxTTS model loaded successfully.")
except Exception as e:
print(f"Error loading ChatterboxTTS model: {e}")
# Potentially raise an exception or handle appropriately
raise
else:
print("ChatterboxTTS model already loaded.")
def unload_model(self):
"""Unloads the model and clears memory."""
if self.model is not None:
print("Unloading ChatterboxTTS model and clearing cache...")
del self.model
self.model = None
if self.device == "cuda":
torch.cuda.empty_cache()
elif self.device == "mps":
if hasattr(torch.mps, "empty_cache"): # Check if empty_cache is available for MPS
torch.mps.empty_cache()
gc.collect() # Explicitly run garbage collection
print("Model unloaded and memory cleared.")
async def generate_speech(
self,
text: str,
speaker_sample_path: str, # Absolute path to the speaker's audio sample
output_filename_base: str, # e.g., "dialog_line_1_spk_X_chunk_0"
speaker_id: Optional[str] = None, # Optional, mainly for logging if needed, filename base is primary
output_dir: Optional[Path] = None, # Optional, defaults to TTS_OUTPUT_DIR from this module
exaggeration: float = 0.5, # Default from Gradio
cfg_weight: float = 0.5, # Default from Gradio
temperature: float = 0.8, # Default from Gradio
) -> Path:
"""
Generates speech from text using the loaded TTS model and a speaker sample.
Saves the output to a .wav file.
"""
if self.model is None:
self.load_model()
if self.model is None: # Check again if loading failed
raise RuntimeError("TTS model is not loaded. Cannot generate speech.")
# Ensure speaker_sample_path is valid
speaker_sample_p = Path(speaker_sample_path)
if not speaker_sample_p.exists() or not speaker_sample_p.is_file():
raise FileNotFoundError(f"Speaker sample audio file not found: {speaker_sample_path}")
target_output_dir = output_dir if output_dir is not None else TTS_OUTPUT_DIR
target_output_dir.mkdir(parents=True, exist_ok=True)
# output_filename_base from DialogProcessorService is expected to be comprehensive (e.g., includes speaker_id, segment info)
output_file_path = target_output_dir / f"{output_filename_base}.wav"
print(f"Generating audio for text: \"{text[:50]}...\" with speaker sample: {speaker_sample_path}")
try:
with torch.no_grad(): # Important for inference
wav = self.model.generate(
text=text,
audio_prompt_path=str(speaker_sample_p), # Must be a string path
exaggeration=exaggeration,
cfg_weight=cfg_weight,
temperature=temperature,
)
torchaudio.save(str(output_file_path), wav, self.model.sr)
print(f"Audio saved to: {output_file_path}")
return output_file_path
except Exception as e:
print(f"Error during TTS generation or saving: {e}")
raise
finally:
# For now, we keep it loaded. Memory management might need refinement.
pass
# Example usage (for testing, not part of the service itself)
if __name__ == "__main__":
async def main_test():
tts_service = TTSService(device="mps")
try:
tts_service.load_model()
dummy_speaker_root = Path("/Volumes/SAM2/CODE/chatterbox-test/speaker_data/speaker_samples")
dummy_speaker_root.mkdir(parents=True, exist_ok=True)
dummy_sample_file = dummy_speaker_root / "dummy_speaker_test.wav"
import os # Added for os.remove
# Always try to remove an existing dummy file to ensure a fresh one is created
if dummy_sample_file.exists():
try:
os.remove(dummy_sample_file)
print(f"Removed existing dummy sample: {dummy_sample_file}")
except OSError as e:
print(f"Error removing existing dummy sample {dummy_sample_file}: {e}")
# Proceeding, but torchaudio.save might fail or overwrite
print(f"Creating new dummy speaker sample: {dummy_sample_file}")
# Create a minimal, silent WAV file for testing
sample_rate = 22050
duration = 1 # seconds
num_channels = 1
num_frames = sample_rate * duration
audio_data = torch.zeros((num_channels, num_frames))
try:
torchaudio.save(str(dummy_sample_file), audio_data, sample_rate)
print(f"Dummy sample created successfully: {dummy_sample_file}")
except Exception as save_e:
print(f"Could not create dummy sample: {save_e}")
# If creation fails, the subsequent generation test will likely also fail or be skipped.
if dummy_sample_file.exists():
output_path = await tts_service.generate_speech(
text="Hello, this is a test of the Text-to-Speech service.",
speaker_id="test_speaker",
speaker_sample_path=str(dummy_sample_file),
output_filename_base="test_generation"
)
print(f"Test generation output: {output_path}")
else:
print(f"Skipping generation test as dummy sample {dummy_sample_file} not found.")
except Exception as e:
import traceback
print(f"Error during TTS generation or saving:")
traceback.print_exc()
finally:
tts_service.unload_model()
import asyncio
asyncio.run(main_test())

7
backend/requirements.txt Normal file
View File

@ -0,0 +1,7 @@
fastapi
uvicorn[standard]
python-multipart
PyYAML
torch
torchaudio
chatterbox-tts

108
backend/run_api_test.py Normal file
View File

@ -0,0 +1,108 @@
import requests
import json
from pathlib import Path
import time
# Configuration
API_BASE_URL = "http://localhost:8000/api/dialog"
ENDPOINT_URL = f"{API_BASE_URL}/generate"
# Define project root relative to this test script (assuming it's in backend/)
PROJECT_ROOT = Path(__file__).resolve().parent
GENERATED_DIALOGS_DIR = PROJECT_ROOT / "tts_generated_dialogs"
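# Note: this script assumes the FastAPI backend is already running and reachable at
# localhost:8000 before it is executed (for example launched with uvicorn from the backend
# directory; the exact module path depends on how main.py exposes the app), and that the
# "dummy_speaker" referenced in the payload exists in speakers.yaml with a valid sample file.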
DIALOG_PAYLOAD = {
"output_base_name": "test_dialog_from_script",
"dialog_items": [
{
"type": "speech",
"speaker_id": "dummy_speaker", # Ensure this speaker exists in your speakers.yaml and has a sample .wav
"text": "This is a test from the Python script. One, two, three.",
"exaggeration": 1.5,
"cfg_weight": 4.0,
"temperature": 0.5
},
{
"type": "silence",
"duration": 0.5
},
{
"type": "speech",
"speaker_id": "dummy_speaker",
"text": "Testing complete. All systems nominal."
},
{
"type": "speech",
"speaker_id": "non_existent_speaker", # Test case for invalid speaker
"text": "This should produce an error for this segment."
},
{
"type": "silence",
"duration": 0.25 # Changed to valid duration
}
]
}
def run_test():
print(f"Sending POST request to: {ENDPOINT_URL}")
print("Payload:")
print(json.dumps(DIALOG_PAYLOAD, indent=2))
print("-" * 50)
try:
start_time = time.time()
response = requests.post(ENDPOINT_URL, json=DIALOG_PAYLOAD, timeout=120) # Increased timeout for TTS processing
end_time = time.time()
print(f"Response received in {end_time - start_time:.2f} seconds.")
print(f"Status Code: {response.status_code}")
print("-" * 50)
if response.content:
try:
response_data = response.json()
print("Response JSON:")
print(json.dumps(response_data, indent=2))
print("-" * 50)
if response.status_code == 200:
print("Test PASSED (HTTP 200 OK)")
concatenated_url = response_data.get("concatenated_audio_url")
zip_url = response_data.get("zip_archive_url")
temp_dir = response_data.get("temp_dir_path")
if concatenated_url:
print(f"Concatenated audio URL: http://localhost:8000{concatenated_url}")
if zip_url:
print(f"ZIP archive URL: http://localhost:8000{zip_url}")
if temp_dir:
print(f"Temporary segment directory: {temp_dir}")
print("\nTo verify, check the generated files in:")
print(f" Concatenated/ZIP: {GENERATED_DIALOGS_DIR}")
print(f" Individual segments (if not cleaned up): {temp_dir}")
else:
print(f"Test FAILED (HTTP {response.status_code})")
if response_data.get("detail"):
print(f"Error Detail: {response_data.get('detail')}")
except json.JSONDecodeError:
print("Response content is not valid JSON:")
print(response.text)
print("Test FAILED (Invalid JSON Response)")
else:
print("Response content is empty.")
print(f"Test FAILED (Empty Response, HTTP {response.status_code})")
except requests.exceptions.ConnectionError as e:
print(f"Connection Error: {e}")
print("Test FAILED (Could not connect to the server. Is it running?)")
except requests.exceptions.Timeout as e:
print(f"Request Timeout: {e}")
print("Test FAILED (The request timed out. TTS processing might be too slow or stuck.)")
except Exception as e:
print(f"An unexpected error occurred: {e}")
print("Test FAILED (Unexpected error)")
if __name__ == "__main__":
run_test()

330
frontend/css/style.css Normal file
View File

@ -0,0 +1,330 @@
/* Modern, clean, and accessible UI styles for Chatterbox TTS */
body {
font-family: 'Segoe UI', 'Roboto', 'Arial', sans-serif;
line-height: 1.7;
margin: 0;
padding: 0;
background-color: #f7f9fa;
color: #222;
}
.container {
max-width: 1100px;
margin: 0 auto;
padding: 0 18px;
}
header {
background: #222e3a;
color: #fff;
padding: 1.5rem 0 1rem 0;
text-align: center;
border-bottom: 3px solid #4a90e2;
}
h1 {
font-size: 2.4rem;
margin: 0;
letter-spacing: 1px;
}
main {
margin-top: 30px;
margin-bottom: 30px;
}
.panel-grid {
display: flex;
flex-wrap: wrap;
gap: 28px;
justify-content: space-between;
}
.panel {
flex: 1 1 320px;
min-width: 320px;
background: none;
box-shadow: none;
border: none;
padding: 0;
}
#results-display.panel {
flex: 1 1 100%;
min-width: 0;
margin-top: 32px;
}
/* Dialog Table Styles */
#dialog-items-table {
width: 100%;
border-collapse: collapse;
background: #fff;
border-radius: 8px;
overflow: hidden;
font-size: 1rem;
margin-bottom: 0;
}
#dialog-items-table th, #dialog-items-table td {
padding: 10px 12px;
border-bottom: 1px solid #e3e3e3;
text-align: left;
}
#dialog-items-table th {
background: #f3f7fa;
color: #4a90e2;
font-weight: 600;
font-size: 1.05rem;
}
#dialog-items-table tr:last-child td {
border-bottom: none;
}
#dialog-items-table td.actions {
text-align: center;
min-width: 90px;
}
/* Collapsible log details */
details#generation-log-details {
margin-bottom: 0;
border-radius: 4px;
background: #f3f5f7;
box-shadow: 0 1px 3px rgba(44,62,80,0.04);
padding: 0 0 0 0;
transition: box-shadow 0.15s;
}
details#generation-log-details[open] {
box-shadow: 0 2px 8px rgba(44,62,80,0.07);
background: #f9fafb;
}
details#generation-log-details summary {
font-size: 1rem;
color: #357ab8;
padding: 10px 0 6px 0;
outline: none;
}
details#generation-log-details summary:focus {
outline: 2px solid #4a90e2;
border-radius: 3px;
}
@media (max-width: 900px) {
.panel-grid {
display: block;
gap: 0;
}
.panel, .full-width-panel {
min-width: 0;
width: 100%;
flex: 1 1 100%;
}
#dialog-items-table th, #dialog-items-table td {
font-size: 0.97rem;
padding: 7px 8px;
}
#speaker-management.panel {
margin-bottom: 36px;
width: 100%;
max-width: 100%;
flex: 1 1 100%;
}
}
.card {
background: #fff;
border-radius: 8px;
box-shadow: 0 2px 8px rgba(44,62,80,0.07);
padding: 18px 20px;
margin-bottom: 18px;
}
section {
margin-bottom: 0;
border-radius: 0;
padding: 0;
background: none;
}
hr {
display: none;
}
h2 {
font-size: 1.5rem;
margin-top: 0;
margin-bottom: 16px;
color: #4a90e2;
letter-spacing: 0.5px;
}
h3 {
font-size: 1.1rem;
margin-bottom: 10px;
color: #333;
}
.x-remove-btn {
background: #e74c3c;
color: #fff;
border: none;
border-radius: 50%;
width: 28px;
height: 28px;
font-size: 1.2rem;
line-height: 1;
display: inline-flex;
align-items: center;
justify-content: center;
cursor: pointer;
transition: background 0.15s;
margin: 0 2px;
box-shadow: 0 1px 2px rgba(44,62,80,0.06);
outline: none;
padding: 0;
}
.x-remove-btn:hover, .x-remove-btn:focus {
background: #c0392b;
color: #fff;
outline: 2px solid #e74c3c;
}
.form-row {
display: flex;
align-items: center;
gap: 12px;
margin-bottom: 14px;
}
label {
min-width: 120px;
font-weight: 500;
margin-bottom: 0;
}
input[type='text'], input[type='file'] {
padding: 8px 10px;
border: 1px solid #cfd8dc;
border-radius: 4px;
font-size: 1rem;
width: 100%;
box-sizing: border-box;
}
input[type='file'] {
background: #f7f7f7;
font-size: 0.97rem;
}
button {
padding: 9px 18px;
background: #4a90e2;
color: #fff;
border: none;
border-radius: 5px;
cursor: pointer;
font-size: 1rem;
font-weight: 500;
transition: background 0.15s;
margin-right: 10px;
}
button:hover, button:focus {
background: #357ab8;
outline: none;
}
.dialog-controls {
margin-bottom: 10px;
}
#speaker-list {
list-style: none;
padding: 0;
margin: 0;
}
#speaker-list li {
padding: 7px 0;
border-bottom: 1px solid #e3e3e3;
display: flex;
justify-content: space-between;
align-items: center;
}
#speaker-list li:last-child {
border-bottom: none;
}
pre {
background: #f3f5f7;
padding: 12px;
border-radius: 4px;
font-size: 0.98rem;
white-space: pre-wrap;
word-wrap: break-word;
margin: 0;
}
audio {
width: 100%;
margin-top: 8px;
margin-bottom: 8px;
}
#zip-archive-link {
display: inline-block;
margin-right: 10px;
color: #fff;
background: #4a90e2;
padding: 7px 16px;
border-radius: 4px;
text-decoration: none;
font-weight: 500;
transition: background 0.15s;
}
#zip-archive-link:hover, #zip-archive-link:focus {
background: #357ab8;
}
footer {
text-align: center;
padding: 20px 0;
background: #222e3a;
color: #fff;
margin-top: 40px;
font-size: 1rem;
border-top: 3px solid #4a90e2;
}
@media (max-width: 900px) {
.panel-grid {
flex-direction: column;
gap: 22px;
}
.panel {
min-width: 0;
}
}
/* Simple side-by-side layout for speaker management */
.speaker-mgmt-row {
display: flex;
gap: 20px;
}
.speaker-mgmt-row .card {
flex: 1;
width: 50%;
}
/* Stack on mobile */
@media (max-width: 768px) {
.speaker-mgmt-row {
flex-direction: column;
}
.speaker-mgmt-row .card {
width: 100%;
}
}

102
frontend/index.html Normal file
View File

@ -0,0 +1,102 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Chatterbox TTS Frontend</title>
<link rel="stylesheet" href="css/style.css">
</head>
<body>
<header>
<div class="container">
<h1>Chatterbox TTS</h1>
</div>
</header>
<main class="container" role="main">
<div class="panel-grid">
<section id="dialog-editor" class="panel full-width-panel" aria-labelledby="dialog-editor-title">
<h2 id="dialog-editor-title">Dialog Editor</h2>
<div class="card">
<table id="dialog-items-table">
<thead>
<tr>
<th>Type</th>
<th>Speaker</th>
<th>Text / Duration</th>
<th>Actions</th>
</tr>
</thead>
<tbody id="dialog-items-container">
<!-- Dialog items will be rendered here by JavaScript as <tr> -->
</tbody>
</table>
</div>
<div id="temp-input-area" class="card">
<!-- Temporary inputs for speech/silence will go here -->
</div>
<div class="dialog-controls form-row">
<button id="add-speech-line-btn">Add Speech Line</button>
<button id="add-silence-line-btn">Add Silence Line</button>
</div>
<div class="dialog-controls form-row">
<label for="output-base-name">Output Base Name:</label>
<input type="text" id="output-base-name" name="output-base-name" value="dialog_output" required>
</div>
<button id="generate-dialog-btn">Generate Dialog</button>
</section>
</div>
<!-- Results below -->
<section id="results-display" class="panel" aria-labelledby="results-display-title">
<h2 id="results-display-title">Results</h2>
<div class="card">
<details id="generation-log-details">
<summary style="cursor:pointer;font-weight:500;">Show Generation Log</summary>
<pre id="generation-log-content" style="margin-top:12px;">(Generation log will appear here)</pre>
</details>
</div>
<div class="card">
<h3>Concatenated Audio:</h3>
<audio id="concatenated-audio-player" controls src=""></audio>
</div>
<div class="card">
<h3>Download Archive:</h3>
<a id="zip-archive-link" href="#" download style="display: none;">Download ZIP</a>
<p id="zip-archive-placeholder">(ZIP download link will appear here)</p>
</div>
</section>
<!-- Speaker management row below Results, side by side -->
<div class="speaker-mgmt-row">
<div id="speaker-list-container" class="card">
<h3>Available Speakers</h3>
<ul id="speaker-list">
<!-- Speakers will be populated here by JavaScript -->
</ul>
</div>
<div id="add-speaker-container" class="card">
<h3>Add New Speaker</h3>
<form id="add-speaker-form">
<div class="form-row">
<label for="speaker-name">Speaker Name:</label>
<input type="text" id="speaker-name" name="name" required>
</div>
<div class="form-row">
<label for="speaker-sample">Audio Sample (WAV or MP3):</label>
<input type="file" id="speaker-sample" name="audio_file" accept=".wav,.mp3" required>
</div>
<button type="submit">Add Speaker</button>
</form>
</div>
</div>
</main>
<footer>
<div class="container">
<p>&copy; 2024 Chatterbox TTS</p>
</div>
</footer>
<script src="js/api.js" type="module"></script>
<script src="js/app.js" type="module" defer></script>
</body>
</html>

131
frontend/js/api.js Normal file
View File

@ -0,0 +1,131 @@
// frontend/js/api.js
const API_BASE_URL = 'http://localhost:8000/api'; // Assuming backend runs on port 8000
/**
* Fetches the list of available speakers.
* @returns {Promise<Array<Object>>} A promise that resolves to an array of speaker objects.
* @throws {Error} If the network response is not ok.
*/
export async function getSpeakers() {
const response = await fetch(`${API_BASE_URL}/speakers/`);
if (!response.ok) {
const errorData = await response.json().catch(() => ({ message: response.statusText }));
throw new Error(`Failed to fetch speakers: ${errorData.detail || errorData.message || response.statusText}`);
}
return response.json();
}
/**
* Adds a new speaker.
* @param {FormData} formData - The form data containing speaker name and audio file.
* Example: formData.append('name', 'New Speaker');
* formData.append('audio_file', fileInput.files[0]);
* @returns {Promise<Object>} A promise that resolves to the new speaker object.
* @throws {Error} If the network response is not ok.
*/
export async function addSpeaker(formData) {
const response = await fetch(`${API_BASE_URL}/speakers/`, {
method: 'POST',
body: formData, // FormData sets Content-Type to multipart/form-data automatically
});
if (!response.ok) {
console.log('API_JS_ADD_SPEAKER: Entered !response.ok block. Status:', response.status, 'StatusText:', response.statusText);
let errorPayload = { detail: `Request failed with status ${response.status}` }; // Default payload
try {
console.log('API_JS_ADD_SPEAKER: Attempting to parse error response as JSON...');
errorPayload = await response.json();
console.log('API_JS_ADD_SPEAKER: Successfully parsed error JSON:', errorPayload);
} catch (e) {
console.warn('API_JS_ADD_SPEAKER: Failed to parse error response as JSON. Error:', e);
// Use statusText if JSON parsing fails
errorPayload = { detail: response.statusText || `Request failed with status ${response.status} and no JSON body.`, parseError: e.toString() };
}
console.error('--- BEGIN SERVER ERROR PAYLOAD (addSpeaker) ---');
console.error('Status:', response.status);
console.error('Status Text:', response.statusText);
console.error('Parsed Payload:', errorPayload);
console.error('--- END SERVER ERROR PAYLOAD (addSpeaker) ---');
let detailedMessage = "Unknown error";
if (errorPayload && errorPayload.detail) {
if (typeof errorPayload.detail === 'string') {
detailedMessage = errorPayload.detail;
} else {
// If detail is an array (FastAPI validation errors) or object, stringify it.
detailedMessage = JSON.stringify(errorPayload.detail);
}
} else if (errorPayload && errorPayload.message) {
detailedMessage = errorPayload.message;
} else if (response.statusText) {
detailedMessage = response.statusText;
} else {
detailedMessage = `HTTP error ${response.status}`;
}
console.log(`API_JS_ADD_SPEAKER: Constructed detailedMessage: "${detailedMessage}"`);
console.log(`API_JS_ADD_SPEAKER: Throwing error with message: "Failed to add speaker: ${detailedMessage}"`);
throw new Error(`Failed to add speaker: ${detailedMessage}`);
}
return response.json();
}
/**
* Deletes a speaker by their ID.
* @param {string} speakerId - The ID of the speaker to delete.
* @returns {Promise<Object>} A promise that resolves to the response data (e.g., success message).
* @throws {Error} If the network response is not ok.
*/
export async function deleteSpeaker(speakerId) {
const response = await fetch(`${API_BASE_URL}/speakers/${speakerId}/`, {
method: 'DELETE',
});
if (!response.ok) {
const errorData = await response.json().catch(() => ({ message: response.statusText }));
throw new Error(`Failed to delete speaker ${speakerId}: ${errorData.detail || errorData.message || response.statusText}`);
}
// Handle 204 No Content specifically, as .json() would fail
if (response.status === 204) {
return { message: `Speaker ${speakerId} deleted successfully.` };
}
return response.json();
}
/**
* Generates a dialog by sending a payload to the backend.
* @param {Object} dialogPayload - The payload for dialog generation.
* Example:
* {
* output_base_name: "my_dialog",
* dialog_items: [
* { type: "speech", speaker_id: "speaker1", text: "Hello world.", exaggeration: 1.0, cfg_weight: 2.0, temperature: 0.7 },
* { type: "silence", duration_ms: 500 },
* { type: "speech", speaker_id: "speaker2", text: "How are you?" }
* ]
* }
* @returns {Promise<Object>} A promise that resolves to the dialog generation response (log, file URLs).
* @throws {Error} If the network response is not ok.
*/
export async function generateDialog(dialogPayload) {
const response = await fetch(`${API_BASE_URL}/dialog/generate/`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(dialogPayload),
});
if (!response.ok) {
const errorData = await response.json().catch(() => ({ message: response.statusText }));
throw new Error(`Failed to generate dialog: ${errorData.detail || errorData.message || response.statusText}`);
}
return response.json();
}

390
frontend/js/app.js Normal file
View File

@ -0,0 +1,390 @@
import { getSpeakers, addSpeaker, deleteSpeaker, generateDialog } from './api.js';
const API_BASE_URL = 'http://localhost:8000'; // Assuming backend runs here
// This should match the base URL from which FastAPI serves static files
// If your main app is at http://localhost:8000, and static files are served from /generated_audio relative to that,
// then this should be http://localhost:8000. The backend will return paths like /generated_audio/...
const API_BASE_URL_FOR_FILES = 'http://localhost:8000';
document.addEventListener('DOMContentLoaded', () => {
console.log('DOM fully loaded and parsed');
initializeSpeakerManagement();
initializeDialogEditor(); // Placeholder for now
initializeResultsDisplay(); // Placeholder for now
});
// --- Speaker Management --- //
const speakerListUL = document.getElementById('speaker-list');
const addSpeakerForm = document.getElementById('add-speaker-form');
function initializeSpeakerManagement() {
loadSpeakers();
if (addSpeakerForm) {
addSpeakerForm.addEventListener('submit', async (event) => {
event.preventDefault();
const formData = new FormData(addSpeakerForm);
const speakerName = formData.get('name');
const audioFile = formData.get('audio_file');
if (!speakerName || !audioFile || audioFile.size === 0) {
alert('Please provide a speaker name and an audio file.');
return;
}
try {
const newSpeaker = await addSpeaker(formData);
alert(`Speaker added: ${newSpeaker.name} (ID: ${newSpeaker.id})`);
addSpeakerForm.reset();
loadSpeakers(); // Refresh speaker list
} catch (error) {
console.error('Failed to add speaker:', error);
alert('Error adding speaker: ' + error.message);
}
});
}
}
async function loadSpeakers() {
if (!speakerListUL) return;
try {
const speakers = await getSpeakers();
speakerListUL.innerHTML = ''; // Clear existing list
if (speakers.length === 0) {
const listItem = document.createElement('li');
listItem.textContent = 'No speakers available.';
speakerListUL.appendChild(listItem);
return;
}
speakers.forEach(speaker => {
const listItem = document.createElement('li');
// Create a container for the speaker name and delete button
const container = document.createElement('div');
container.style.display = 'flex';
container.style.justifyContent = 'space-between';
container.style.alignItems = 'center';
container.style.width = '100%';
// Add speaker name
const nameSpan = document.createElement('span');
nameSpan.textContent = speaker.name;
container.appendChild(nameSpan);
// Add delete button
const deleteBtn = document.createElement('button');
deleteBtn.textContent = 'Delete';
deleteBtn.classList.add('delete-speaker-btn');
deleteBtn.onclick = () => handleDeleteSpeaker(speaker.id);
container.appendChild(deleteBtn);
listItem.appendChild(container);
speakerListUL.appendChild(listItem);
});
} catch (error) {
console.error('Failed to load speakers:', error);
speakerListUL.innerHTML = '<li>Error loading speakers. See console for details.</li>';
alert('Error loading speakers: ' + error.message);
}
}
async function handleDeleteSpeaker(speakerId) {
if (!speakerId) {
alert('Cannot delete speaker: Speaker ID is missing.');
return;
}
if (!confirm(`Are you sure you want to delete speaker ${speakerId}?`)) return;
try {
await deleteSpeaker(speakerId);
alert(`Speaker ${speakerId} deleted successfully.`);
loadSpeakers(); // Refresh speaker list
} catch (error) {
console.error(`Failed to delete speaker ${speakerId}:`, error);
alert(`Error deleting speaker: ${error.message}`);
}
}
// --- Dialog Editor --- //
let dialogItems = []; // Holds the sequence of speech/silence items
let availableSpeakersCache = []; // To populate speaker dropdown
function initializeDialogEditor() {
const dialogItemsContainer = document.getElementById('dialog-items-container');
const addSpeechLineBtn = document.getElementById('add-speech-line-btn');
const addSilenceLineBtn = document.getElementById('add-silence-line-btn');
const outputBaseNameInput = document.getElementById('output-base-name');
const generateDialogBtn = document.getElementById('generate-dialog-btn');
// Results Display Elements
const generationLogPre = document.getElementById('generation-log-content'); // Corrected ID
const audioPlayer = document.getElementById('concatenated-audio-player'); // Corrected ID
// audioSource will be the audioPlayer itself, no separate element by default in the HTML
const downloadZipLink = document.getElementById('zip-archive-link'); // Corrected ID
const zipArchivePlaceholder = document.getElementById('zip-archive-placeholder');
const resultsDisplaySection = document.getElementById('results-display');
// Function to render the current dialogItems array to the DOM as table rows
function renderDialogItems() {
if (!dialogItemsContainer) return;
dialogItemsContainer.innerHTML = '';
dialogItems.forEach((item, index) => {
const tr = document.createElement('tr');
// Type column
const typeTd = document.createElement('td');
typeTd.textContent = item.type === 'speech' ? 'Speech' : 'Silence';
tr.appendChild(typeTd);
// Speaker column
const speakerTd = document.createElement('td');
if (item.type === 'speech') {
const speaker = availableSpeakersCache.find(s => s.id === item.speaker_id);
speakerTd.textContent = speaker ? speaker.name : 'Unknown Speaker';
} else {
speakerTd.textContent = '—';
}
tr.appendChild(speakerTd);
// Text/Duration column
const textTd = document.createElement('td');
if (item.type === 'speech') {
let txt = item.text.length > 60 ? item.text.substring(0, 57) + '…' : item.text;
textTd.textContent = `"${txt}"`;
} else {
textTd.textContent = `${item.duration}s`;
}
tr.appendChild(textTd);
// Actions column
const actionsTd = document.createElement('td');
actionsTd.classList.add('actions');
const removeBtn = document.createElement('button');
removeBtn.innerHTML = '&times;'; // Unicode multiplication sign (X)
removeBtn.classList.add('remove-dialog-item-btn', 'x-remove-btn');
removeBtn.setAttribute('aria-label', 'Remove dialog line');
removeBtn.title = 'Remove';
removeBtn.onclick = () => {
dialogItems.splice(index, 1);
renderDialogItems();
};
actionsTd.appendChild(removeBtn);
tr.appendChild(actionsTd);
dialogItemsContainer.appendChild(tr);
});
}
const tempInputArea = document.getElementById('temp-input-area');
function clearTempInputArea() {
if (tempInputArea) tempInputArea.innerHTML = '';
}
if (addSpeechLineBtn) {
addSpeechLineBtn.addEventListener('click', async () => {
clearTempInputArea(); // Clear any previous inputs
if (availableSpeakersCache.length === 0) {
try {
availableSpeakersCache = await getSpeakers();
} catch (error) {
alert('Could not load speakers. Please try again.');
console.error('Error fetching speakers for dialog:', error);
return;
}
}
if (availableSpeakersCache.length === 0) {
alert('No speakers available. Please add a speaker first.');
return;
}
const speakerSelectLabel = document.createElement('label');
speakerSelectLabel.textContent = 'Speaker: ';
speakerSelectLabel.htmlFor = 'temp-speaker-select';
const speakerSelect = document.createElement('select');
speakerSelect.id = 'temp-speaker-select';
availableSpeakersCache.forEach(speaker => {
const option = document.createElement('option');
option.value = speaker.id;
option.textContent = speaker.name;
speakerSelect.appendChild(option);
});
const textInputLabel = document.createElement('label');
textInputLabel.textContent = ' Text: ';
textInputLabel.htmlFor = 'temp-speech-text';
const textInput = document.createElement('textarea');
textInput.id = 'temp-speech-text';
textInput.rows = 2;
textInput.placeholder = 'Enter speech text';
const addButton = document.createElement('button');
addButton.textContent = 'Add Speech';
addButton.onclick = () => {
const speakerId = speakerSelect.value;
const text = textInput.value.trim();
if (!speakerId || !text) {
alert('Please select a speaker and enter text.');
return;
}
dialogItems.push({ type: 'speech', speaker_id: speakerId, text: text });
renderDialogItems();
clearTempInputArea();
};
const cancelButton = document.createElement('button');
cancelButton.textContent = 'Cancel';
cancelButton.onclick = clearTempInputArea;
if (tempInputArea) {
tempInputArea.appendChild(speakerSelectLabel);
tempInputArea.appendChild(speakerSelect);
tempInputArea.appendChild(textInputLabel);
tempInputArea.appendChild(textInput);
tempInputArea.appendChild(addButton);
tempInputArea.appendChild(cancelButton);
}
});
}
if (addSilenceLineBtn) {
addSilenceLineBtn.addEventListener('click', () => {
clearTempInputArea(); // Clear any previous inputs
const durationInputLabel = document.createElement('label');
durationInputLabel.textContent = 'Duration (s): ';
durationInputLabel.htmlFor = 'temp-silence-duration';
const durationInput = document.createElement('input');
durationInput.type = 'number';
durationInput.id = 'temp-silence-duration';
durationInput.step = '0.1';
durationInput.min = '0.1';
durationInput.placeholder = 'e.g., 0.5';
const addButton = document.createElement('button');
addButton.textContent = 'Add Silence';
addButton.onclick = () => {
const duration = parseFloat(durationInput.value);
if (isNaN(duration) || duration <= 0) {
alert('Invalid duration. Please enter a positive number.');
return;
}
dialogItems.push({ type: 'silence', duration: duration });
renderDialogItems();
clearTempInputArea();
};
const cancelButton = document.createElement('button');
cancelButton.textContent = 'Cancel';
cancelButton.onclick = clearTempInputArea;
if (tempInputArea) {
tempInputArea.appendChild(durationInputLabel);
tempInputArea.appendChild(durationInput);
tempInputArea.appendChild(addButton);
tempInputArea.appendChild(cancelButton);
}
});
}
if (generateDialogBtn && outputBaseNameInput) {
generateDialogBtn.addEventListener('click', async () => {
const outputBaseName = outputBaseNameInput.value.trim();
if (!outputBaseName) {
alert('Please enter an output base name.');
outputBaseNameInput.focus();
return;
}
if (dialogItems.length === 0) {
alert('Please add at least one speech or silence line to the dialog.');
return;
}
// Clear previous results and show loading/status
if (generationLogPre) generationLogPre.textContent = 'Generating dialog...';
if (audioPlayer) {
audioPlayer.style.display = 'none';
audioPlayer.src = ''; // Clear previous audio source
}
if (downloadZipLink) {
downloadZipLink.style.display = 'none';
downloadZipLink.href = '#';
downloadZipLink.textContent = '';
}
if (zipArchivePlaceholder) zipArchivePlaceholder.style.display = 'block'; // Show placeholder
if (resultsDisplaySection) resultsDisplaySection.style.display = 'block'; // Make sure it's visible
const payload = {
output_base_name: outputBaseName,
dialog_items: dialogItems.map(item => {
// For now, we are not collecting TTS params in the UI for speech items.
// The backend will use defaults. If we add UI for these later, they'd be included here.
if (item.type === 'speech') {
return {
type: item.type,
speaker_id: item.speaker_id,
text: item.text,
// exaggeration: item.exaggeration, // Example for future UI enhancement
// cfg_weight: item.cfg_weight,
// temperature: item.temperature
};
}
return item; // for silence items
})
};
try {
console.log('Generating dialog with payload:', JSON.stringify(payload, null, 2));
const result = await generateDialog(payload);
console.log('Dialog generation successful:', result);
if (generationLogPre) generationLogPre.textContent = result.log || 'No log output.';
if (result.concatenated_audio_url && audioPlayer) { // Check audioPlayer, not audioSource
audioPlayer.src = result.concatenated_audio_url.startsWith('http') ? result.concatenated_audio_url : `${API_BASE_URL_FOR_FILES}${result.concatenated_audio_url}`;
audioPlayer.load(); // Call load() after setting new source
audioPlayer.style.display = 'block';
} else {
if (audioPlayer) audioPlayer.style.display = 'none'; // Ensure it's hidden if no URL
if (generationLogPre) generationLogPre.textContent += '\nNo concatenated audio URL found.';
}
if (result.zip_archive_url && downloadZipLink) {
downloadZipLink.href = result.zip_archive_url.startsWith('http') ? result.zip_archive_url : `${API_BASE_URL_FOR_FILES}${result.zip_archive_url}`;
downloadZipLink.textContent = `Download ${outputBaseName}.zip`;
downloadZipLink.style.display = 'block';
if (zipArchivePlaceholder) zipArchivePlaceholder.style.display = 'none'; // Hide placeholder
} else {
if (downloadZipLink) downloadZipLink.style.display = 'none';
if (zipArchivePlaceholder) zipArchivePlaceholder.style.display = 'block'; // Show placeholder if no link
if (generationLogPre) generationLogPre.textContent += '\nNo ZIP archive URL found.';
}
} catch (error) {
console.error('Dialog generation failed:', error);
if (generationLogPre) generationLogPre.textContent = `Error generating dialog: ${error.message}`;
alert(`Error generating dialog: ${error.message}`);
}
});
}
console.log('Dialog Editor Initialized');
renderDialogItems(); // Initial render (empty)
}
// --- Results Display --- //
function initializeResultsDisplay() {
const generationLogContent = document.getElementById('generation-log-content');
const concatenatedAudioPlayer = document.getElementById('concatenated-audio-player');
const zipArchiveLink = document.getElementById('zip-archive-link');
const zipArchivePlaceholder = document.getElementById('zip-archive-placeholder');
// Functions to update these elements will be called by the generateDialog handler
// e.g., updateLog(message), setAudioSource(url), setZipLink(url)
console.log('Results Display Initialized');
}

196
frontend/tests/api.test.js Normal file
View File

@ -0,0 +1,196 @@
// frontend/tests/api.test.js
// Import the function to test (adjust path if your structure is different)
// We might need to configure Jest or use Babel for ES module syntax if this causes issues.
import { getSpeakers, addSpeaker, deleteSpeaker, generateDialog } from '../js/api.js';
// Mock the global fetch function
global.fetch = jest.fn();
const API_BASE_URL = 'http://localhost:8000/api'; // Centralize for all tests
describe('API Client - getSpeakers', () => {
beforeEach(() => {
// Clear all instances and calls to constructor and all methods:
fetch.mockClear();
});
it('should fetch speakers successfully', async () => {
const mockSpeakers = [{ id: '1', name: 'Speaker 1' }, { id: '2', name: 'Speaker 2' }];
fetch.mockResolvedValueOnce({
ok: true,
json: async () => mockSpeakers,
});
const speakers = await getSpeakers();
expect(fetch).toHaveBeenCalledTimes(1);
expect(fetch).toHaveBeenCalledWith(`${API_BASE_URL}/speakers/`);
expect(speakers).toEqual(mockSpeakers);
});
it('should throw an error if the network response is not ok', async () => {
fetch.mockResolvedValueOnce({
ok: false,
statusText: 'Not Found',
json: async () => ({ detail: 'Speakers not found' }) // Simulate FastAPI error response
});
await expect(getSpeakers()).rejects.toThrow('Failed to fetch speakers: Speakers not found');
expect(fetch).toHaveBeenCalledTimes(1);
});
it('should throw a generic error if parsing error response fails', async () => {
fetch.mockResolvedValueOnce({
ok: false,
statusText: 'Internal Server Error',
json: async () => { throw new Error('Failed to parse error JSON'); } // Simulate error during .json()
});
await expect(getSpeakers()).rejects.toThrow('Failed to fetch speakers: Internal Server Error');
expect(fetch).toHaveBeenCalledTimes(1);
});
it('should throw an error if fetch itself fails (network error)', async () => {
fetch.mockRejectedValueOnce(new TypeError('Network failed'));
await expect(getSpeakers()).rejects.toThrow('Network failed'); // This will be the original fetch error
expect(fetch).toHaveBeenCalledTimes(1);
});
});
describe('API Client - addSpeaker', () => {
beforeEach(() => {
fetch.mockClear();
});
it('should add a speaker successfully', async () => {
const mockFormData = new FormData(); // In a real scenario, this would have data
mockFormData.append('name', 'Test Speaker');
// mockFormData.append('audio_sample_file', new File([''], 'sample.wav')); // File creation in Node test needs more setup or a mock
const mockResponse = { id: '3', name: 'Test Speaker', message: 'Speaker added successfully' };
fetch.mockResolvedValueOnce({
ok: true,
json: async () => mockResponse,
});
const result = await addSpeaker(mockFormData);
expect(fetch).toHaveBeenCalledTimes(1);
expect(fetch).toHaveBeenCalledWith(`${API_BASE_URL}/speakers/`, {
method: 'POST',
body: mockFormData,
});
expect(result).toEqual(mockResponse);
});
it('should throw an error if adding a speaker fails', async () => {
const mockFormData = new FormData();
fetch.mockResolvedValueOnce({
ok: false,
statusText: 'Bad Request',
json: async () => ({ detail: 'Invalid speaker data' }),
});
await expect(addSpeaker(mockFormData)).rejects.toThrow('Failed to add speaker: Invalid speaker data');
expect(fetch).toHaveBeenCalledTimes(1);
});
});
describe('API Client - deleteSpeaker', () => {
beforeEach(() => {
fetch.mockClear();
});
it('should delete a speaker successfully with JSON response', async () => {
const speakerId = 'test-speaker-id-123';
const mockResponse = { message: `Speaker ${speakerId} deleted successfully` };
fetch.mockResolvedValueOnce({
ok: true,
status: 200, // Or any 2xx status that might return JSON
json: async () => mockResponse,
});
const result = await deleteSpeaker(speakerId);
expect(fetch).toHaveBeenCalledTimes(1);
expect(fetch).toHaveBeenCalledWith(`${API_BASE_URL}/speakers/${speakerId}/`, {
method: 'DELETE',
});
expect(result).toEqual(mockResponse);
});
it('should handle successful deletion with 204 No Content response', async () => {
const speakerId = 'test-speaker-id-204';
fetch.mockResolvedValueOnce({
ok: true,
status: 204,
statusText: 'No Content',
// .json() is not called by the function if status is 204
});
const result = await deleteSpeaker(speakerId);
expect(fetch).toHaveBeenCalledTimes(1);
expect(fetch).toHaveBeenCalledWith(`${API_BASE_URL}/speakers/${speakerId}/`, {
method: 'DELETE',
});
expect(result).toEqual({ message: `Speaker ${speakerId} deleted successfully.` });
});
it('should throw an error if deleting a speaker fails (e.g., speaker not found)', async () => {
const speakerId = 'non-existent-speaker-id';
fetch.mockResolvedValueOnce({
ok: false,
status: 404,
statusText: 'Not Found',
json: async () => ({ detail: 'Speaker not found' }),
});
await expect(deleteSpeaker(speakerId)).rejects.toThrow(`Failed to delete speaker ${speakerId}: Speaker not found`);
expect(fetch).toHaveBeenCalledTimes(1);
});
});
describe('API Client - generateDialog', () => {
beforeEach(() => {
fetch.mockClear();
});
it('should generate dialog successfully', async () => {
const mockPayload = {
output_base_name: "test_dialog",
dialog_items: [
{ type: "speech", speaker_id: "spk_1", text: "Hello.", exaggeration: 1.0, cfg_weight: 3.0, temperature: 0.5 },
{ type: "silence", duration_ms: 250 }
]
};
const mockResponse = {
log: "Dialog generated.",
concatenated_audio_url: "/audio/test_dialog_concatenated.wav",
zip_archive_url: "/audio/test_dialog.zip"
};
fetch.mockResolvedValueOnce({
ok: true,
json: async () => mockResponse,
});
const result = await generateDialog(mockPayload);
expect(fetch).toHaveBeenCalledTimes(1);
expect(fetch).toHaveBeenCalledWith(`${API_BASE_URL}/dialog/generate/`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(mockPayload),
});
expect(result).toEqual(mockResponse);
});
it('should throw an error if dialog generation fails', async () => {
const mockPayload = { output_base_name: "fail_dialog", dialog_items: [] }; // Example invalid payload
fetch.mockResolvedValueOnce({
ok: false,
statusText: 'Bad Request',
json: async () => ({ detail: 'Invalid dialog data' }),
});
await expect(generateDialog(mockPayload)).rejects.toThrow('Failed to generate dialog: Invalid dialog data');
expect(fetch).toHaveBeenCalledTimes(1);
});
});

5384
package-lock.json generated Normal file

File diff suppressed because it is too large Load Diff

23
package.json Normal file
View File

@ -0,0 +1,23 @@
{
"name": "chatterbox-test",
"version": "1.0.0",
"description": "This Gradio application provides a user interface for text-to-speech generation using the Chatterbox TTS model. It supports both single utterance generation and multi-speaker dialog generation with configurable silence gaps.",
"main": "index.js",
"type": "module",
"scripts": {
"test": "jest"
},
"repository": {
"type": "git",
"url": "https://oauth2:78f77aaebb8fa1cd3efbd5b738177c127f7d7d0b@gitea.r8z.us/stwhite/chatterbox-ui.git"
},
"keywords": [],
"author": "",
"license": "ISC",
"devDependencies": {
"@babel/core": "^7.27.4",
"@babel/preset-env": "^7.27.2",
"babel-jest": "^30.0.0-beta.3",
"jest": "^29.7.0"
}
}

View File

@ -0,0 +1,12 @@
831c1dbe-c379-4d9f-868b-9798adc3c05d:
name: Adam
sample_path: speaker_samples/831c1dbe-c379-4d9f-868b-9798adc3c05d.wav
608903c4-b157-46c5-a0ea-4b25eb4b83b6:
name: Denise
sample_path: speaker_samples/608903c4-b157-46c5-a0ea-4b25eb4b83b6.wav
3c93c9df-86dc-4d67-ab55-8104b9301190:
name: Maria
sample_path: speaker_samples/3c93c9df-86dc-4d67-ab55-8104b9301190.wav
fb84ce1c-f32d-4df9-9673-2c64e9603133:
name: Debbie
sample_path: speaker_samples/fb84ce1c-f32d-4df9-9673-2c64e9603133.wav

110
storage_service.py Normal file
View File

@ -0,0 +1,110 @@
"""
Project storage service for saving and loading Chatterbox TTS projects.
"""
import json
import os
import asyncio
from pathlib import Path
from typing import List, Optional
from datetime import datetime
from models import DialogProject, DialogLine
class ProjectStorage:
"""Handles saving and loading projects to/from JSON files."""
def __init__(self, storage_dir: str = "projects"):
self.storage_dir = Path(storage_dir)
self.storage_dir.mkdir(exist_ok=True)
async def save_project(self, project: DialogProject) -> bool:
"""Save a project to a JSON file."""
try:
project_file = self.storage_dir / f"{project.id}.json"
# Convert to dict and ensure timestamps are strings
project_data = project.dict()
project_data["last_modified"] = datetime.now().isoformat()
# Ensure created_at is set if not already
if not project_data.get("created_at"):
project_data["created_at"] = datetime.now().isoformat()
with open(project_file, 'w', encoding='utf-8') as f:
json.dump(project_data, f, indent=2, ensure_ascii=False)
return True
except Exception as e:
print(f"Error saving project {project.id}: {e}")
return False
async def load_project(self, project_id: str) -> Optional[DialogProject]:
"""Load a project from a JSON file."""
try:
project_file = self.storage_dir / f"{project_id}.json"
if not project_file.exists():
return None
with open(project_file, 'r', encoding='utf-8') as f:
project_data = json.load(f)
# Validate that audio files still exist
for line in project_data.get("lines", []):
if line.get("audio_url"):
audio_path = Path("dialog_output") / line["audio_url"].split("/")[-1]
if not audio_path.exists():
line["audio_url"] = None
line["status"] = "pending"
return DialogProject(**project_data)
except Exception as e:
print(f"Error loading project {project_id}: {e}")
return None
async def list_projects(self) -> List[dict]:
"""List all saved projects with metadata."""
projects = []
for project_file in self.storage_dir.glob("*.json"):
try:
with open(project_file, 'r', encoding='utf-8') as f:
project_data = json.load(f)
projects.append({
"id": project_data["id"],
"name": project_data["name"],
"created_at": project_data.get("created_at"),
"last_modified": project_data.get("last_modified"),
"line_count": len(project_data.get("lines", [])),
"has_audio": any(line.get("audio_url") for line in project_data.get("lines", []))
})
except Exception as e:
print(f"Error reading project file {project_file}: {e}")
continue
# Sort by last modified (most recent first)
projects.sort(key=lambda x: x.get("last_modified", ""), reverse=True)
return projects
async def delete_project(self, project_id: str) -> bool:
"""Delete a saved project."""
try:
project_file = self.storage_dir / f"{project_id}.json"
if project_file.exists():
project_file.unlink()
return True
return False
except Exception as e:
print(f"Error deleting project {project_id}: {e}")
return False
async def project_exists(self, project_id: str) -> bool:
"""Check if a project exists in storage."""
project_file = self.storage_dir / f"{project_id}.json"
return project_file.exists()
# Global storage instance
project_storage = ProjectStorage()
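# Example usage (for testing, not part of the service itself). This is an added sketch:
# it assumes DialogProject can be constructed from just `id`, `name`, and `lines`
# (matching the keys read back in load_project/list_projects); adjust if the Pydantic
# model requires additional fields.
if __name__ == "__main__":
    async def _demo():
        demo = DialogProject(id="demo-project", name="Demo project", lines=[])
        print("Saved:", await project_storage.save_project(demo))
        print("Projects on disk:", await project_storage.list_projects())
        print("Loaded:", await project_storage.load_project("demo-project"))
        print("Deleted:", await project_storage.delete_project("demo-project"))
    asyncio.run(_demo())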