Update docs in .note

2025-06-05 09:22:54 -05:00 · 2025-06-05 09:22:54 -05:00 · 9d1dc330ea
parent b781d8abcf
commit 9d1dc330ea
2 changed files with 68 additions and 67 deletions
--- a/.note/current_focus.md
+++ b/.note/current_focus.md
@@ -15,5 +15,6 @@
 - Awaiting your feedback on the detailed migration plan (see `.note/detailed_migration_plan.md`).

 **Next Steps (pending your approval of plan):**
+
 - Begin Phase 1: Backend API Development (FastAPI).
  - Task 1.1: Project Setup (FastAPI project structure, `requirements.txt`).
--- a/.note/detailed_migration_plan.md
+++ b/.note/detailed_migration_plan.md
@@ -2,93 +2,93 @@

 This plan outlines the steps to re-implement the dialog generation features of the Chatterbox TTS application, moving from the current Gradio-based implementation to a FastAPI backend and a vanilla JavaScript frontend. It incorporates findings from `gradio_app.py` and aligns with the existing high-level strategy (MEMORY[c20c2cce-46d4-453f-9bc3-c18e05dbc66f]).

-### 1. Backend (FastAPI) Development
+## 1. Backend (FastAPI) Development

 **Objective:** Create a robust API to handle TTS generation, speaker management, and file delivery.

 **Key Modules/Components:**

-* **API Endpoints:**
-  * `POST /api/dialog/generate`:
-    * **Input**: Structured list: `[{type: "speech", speaker_id: "str", text: "str"}, {type: "silence", duration: float}]`, `output_base_name: str`.
-    * **Output**: JSON with `log: str`, `concatenated_audio_url: str`, `zip_archive_url: str`.
-  * `GET /api/speakers`: Returns list of available speakers (`[{id: "str", name: "str", sample_path: "str"}]`).
-  * `POST /api/speakers`: Adds a new speaker. Input: `name: str`, `audio_sample_file: UploadFile`. Output: `{id: "str", name: "str", message: "str"}`.
-  * `DELETE /api/speakers/{speaker_id}`: Removes a speaker.
-* **Core Logic & Services:**
-  * `TTSService`:
-    * Manages `ChatterboxTTS` model instance(s) (loading, inference, memory cleanup).
-    * Handles `ChatterboxTTS.generate()` calls, incorporating parameters like `exaggeration`, `cfg_weight`, `temperature` (decision needed on exposure vs. defaults).
-    * Implements rigorous memory management (inspired by `generate_audio` and `process_dialog`'s `reinit_each_line` concept).
-  * `DialogProcessorService`:
-    * Orchestrates dialog generation using `TTSService`.
-    * Implements `split_text_at_sentence_boundaries` logic for long text inputs.
-    * Manages generation of individual audio segments.
-  * `AudioManipulationService`:
-    * Concatenates audio segments using `torch` and `torchaudio`, inserting specified silences.
-    * Creates ZIP archives of all generated audio files using `zipfile`.
-  * `SpeakerManagementService`:
-    * Manages `speakers.yaml` (or alternative storage) for speaker metadata.
-    * Handles storage and retrieval of speaker audio samples (e.g., in `speaker_samples/`).
-* **File Handling:**
-  * Strategy for storing and serving generated `.wav` and `.zip` files (e.g., FastAPI `StaticFiles`, temporary directories, or cloud storage).
+*   **API Endpoints:**
+    *   `POST /api/dialog/generate`:
+        *   **Input**: Structured list: `[{type: "speech", speaker_id: "str", text: "str"}, {type: "silence", duration: float}]`, `output_base_name: str`.
+        *   **Output**: JSON with `log: str`, `concatenated_audio_url: str`, `zip_archive_url: str`.
+    *   `GET /api/speakers`: Returns list of available speakers (`[{id: "str", name: "str", sample_path: "str"}]`).
+    *   `POST /api/speakers`: Adds a new speaker. Input: `name: str`, `audio_sample_file: UploadFile`. Output: `{id: "str", name: "str", message: "str"}`.
+    *   `DELETE /api/speakers/{speaker_id}`: Removes a speaker.
+*   **Core Logic & Services:**
+    *   `TTSService`:
+        *   Manages `ChatterboxTTS` model instance(s) (loading, inference, memory cleanup).
+        *   Handles `ChatterboxTTS.generate()` calls, incorporating parameters like `exaggeration`, `cfg_weight`, `temperature` (decision needed on exposure vs. defaults).
+        *   Implements rigorous memory management (inspired by `generate_audio` and `process_dialog`'s `reinit_each_line` concept).
+    *   `DialogProcessorService`:
+        *   Orchestrates dialog generation using `TTSService`.
+        *   Implements `split_text_at_sentence_boundaries` logic for long text inputs.
+        *   Manages generation of individual audio segments.
+    *   `AudioManipulationService`:
+        *   Concatenates audio segments using `torch` and `torchaudio`, inserting specified silences.
+        *   Creates ZIP archives of all generated audio files using `zipfile`.
+    *   `SpeakerManagementService`:
+        *   Manages `speakers.yaml` (or alternative storage) for speaker metadata.
+        *   Handles storage and retrieval of speaker audio samples (e.g., in `speaker_samples/`).
+*   **File Handling:**
+    *   Strategy for storing and serving generated `.wav` and `.zip` files (e.g., FastAPI `StaticFiles`, temporary directories, or cloud storage).

 **Implementation Steps (Phase 1):**

-1. **Project Setup:** Initialize FastAPI project, define dependencies (`fastapi`, `uvicorn`, `python-multipart`, `pyyaml`, `torch`, `torchaudio`, `chatterbox-tts`).
-2. **Speaker Management:** Implement `SpeakerManagementService` and the `/api/speakers` endpoints.
-3. **TTS Core:** Develop `TTSService`, focusing on model loading, inference, and critical memory management.
-4. **Dialog Processing:** Implement `DialogProcessorService` including text splitting.
-5. **Audio Utilities:** Create `AudioManipulationService` for concatenation and zipping.
-6. **Main Endpoint:** Implement `POST /api/dialog/generate` orchestrating the services.
-7. **Configuration:** Manage paths (`speakers.yaml`, sample storage, output directories) and TTS settings.
-8. **Testing:** Thoroughly test all API endpoints using tools like Postman or `curl`.
+1.  **Project Setup:** Initialize FastAPI project, define dependencies (`fastapi`, `uvicorn`, `python-multipart`, `pyyaml`, `torch`, `torchaudio`, `chatterbox-tts`).
+2.  **Speaker Management:** Implement `SpeakerManagementService` and the `/api/speakers` endpoints.
+3.  **TTS Core:** Develop `TTSService`, focusing on model loading, inference, and critical memory management.
+4.  **Dialog Processing:** Implement `DialogProcessorService` including text splitting.
+5.  **Audio Utilities:** Create `AudioManipulationService` for concatenation and zipping.
+6.  **Main Endpoint:** Implement `POST /api/dialog/generate` orchestrating the services.
+7.  **Configuration:** Manage paths (`speakers.yaml`, sample storage, output directories) and TTS settings.
+8.  **Testing:** Thoroughly test all API endpoints using tools like Postman or `curl`.

-### 2. Frontend (Vanilla JavaScript) Development
+## 2. Frontend (Vanilla JavaScript) Development

 **Objective:** Create an intuitive UI for dialog construction, speaker management, and interaction with the backend.

 **Key Modules/Components:**

-* **HTML (`index.html`):** Structure for dialog editor, speaker controls, results display.
-* **CSS (`style.css`):** Styling for a clean and usable interface.
-* **JavaScript (`app.js`, `api.js`, `ui.js`):
-  * `api.js`: Functions for all backend API communications (`fetch`).
-  * `ui.js`: DOM manipulation for dynamic dialog lines, speaker lists, and results rendering.
-  * `app.js`: Main application logic, event handling, state management (for dialog lines, speaker data).
+*   **HTML (`index.html`):** Structure for dialog editor, speaker controls, results display.
+*   **CSS (`style.css`):** Styling for a clean and usable interface.
+*   **JavaScript (`app.js`, `api.js`, `ui.js`):**
+    *   `api.js`: Functions for all backend API communications (`fetch`).
+    *   `ui.js`: DOM manipulation for dynamic dialog lines, speaker lists, and results rendering.
+    *   `app.js`: Main application logic, event handling, state management (for dialog lines, speaker data).

 **Implementation Steps (Phase 2):**

-1. **Basic Layout:** Create `index.html` and `style.css`.
-2. **API Client:** Develop `api.js` to interface with all backend endpoints.
-3. **Speaker UI:**
-  * Fetch and display speakers using `ui.js` and `api.js`.
-  * Implement forms and logic for adding (with file upload) and removing speakers.
-4. **Dialog Editor UI:**
-  * Dynamically add/remove/reorder dialog lines (speech/silence).
-  * Inputs for speaker selection (populated from API), text, and silence duration.
-  * Input for `output_base_name`.
-5. **Interaction & Results:**
-  * "Generate Dialog" button to submit data via `api.js`.
-  * Display generation log, audio player for concatenated output, and download link for ZIP file.
+1.  **Basic Layout:** Create `index.html` and `style.css`.
+2.  **API Client:** Develop `api.js` to interface with all backend endpoints.
+3.  **Speaker UI:**
+    *   Fetch and display speakers using `ui.js` and `api.js`.
+    *   Implement forms and logic for adding (with file upload) and removing speakers.
+4.  **Dialog Editor UI:**
+    *   Dynamically add/remove/reorder dialog lines (speech/silence).
+    *   Inputs for speaker selection (populated from API), text, and silence duration.
+    *   Input for `output_base_name`.
+5.  **Interaction & Results:**
+    *   "Generate Dialog" button to submit data via `api.js`.
+    *   Display generation log, audio player for concatenated output, and download link for ZIP file.

-### 3. Integration & Testing (Phase 3)
+## 3. Integration & Testing (Phase 3)

-1. **Full System Connection:** Ensure seamless frontend-backend communication.
-2. **End-to-End Testing:** Test various dialog scenarios, speaker configurations, and error conditions.
-3. **Performance & Memory:** Profile backend memory usage during generation; refine `TTSService` memory strategies if needed.
-4. **UX Refinement:** Iterate on UI/UX based on testing feedback.
+1.  **Full System Connection:** Ensure seamless frontend-backend communication.
+2.  **End-to-End Testing:** Test various dialog scenarios, speaker configurations, and error conditions.
+3.  **Performance & Memory:** Profile backend memory usage during generation; refine `TTSService` memory strategies if needed.
+4.  **UX Refinement:** Iterate on UI/UX based on testing feedback.

-### 4. Advanced Features & Deployment (Phase 4)
+## 4. Advanced Features & Deployment (Phase 4)

-* (As per MEMORY[c20c2cce-46d4-453f-9bc3-c18e05dbc66f])
-* **Real-time Updates:** Consider WebSockets for live progress during generation.
-* **Deployment Strategy:** Plan for deploying the FastAPI application and serving the static frontend assets.
+*   (As per MEMORY[c20c2cce-46d4-453f-9bc3-c18e05dbc66f])
+*   **Real-time Updates:** Consider WebSockets for live progress during generation.
+*   **Deployment Strategy:** Plan for deploying the FastAPI application and serving the static frontend assets.

-### Key Considerations from `gradio_app.py` Analysis:
+## Key Considerations from `gradio_app.py` Analysis

-* **Memory Management for TTS Model:** This is critical. The `reinit_each_line` option and explicit cleanup in `generate_audio` highlight this. The FastAPI backend must handle this robustly.
-* **Text Chunking:** The `split_text_at_sentence_boundaries` (max 300 chars) logic is essential and must be replicated.
-* **Dialog Parsing:** The `Speaker: "Text"` and `Silence: duration` format should be the basis for the frontend data structure sent to the backend.
-* **TTS Parameters:** Decide whether to expose advanced TTS parameters (`exaggeration`, `cfg_weight`, `temperature`) for dialog lines in the new API.
-* **File Output:** The backend needs to replicate the generation of individual segment files, a concatenated file, and a ZIP archive.
+*   **Memory Management for TTS Model:** This is critical. The `reinit_each_line` option and explicit cleanup in `generate_audio` highlight this. The FastAPI backend must handle this robustly.
+*   **Text Chunking:** The `split_text_at_sentence_boundaries` (max 300 chars) logic is essential and must be replicated.
+*   **Dialog Parsing:** The `Speaker: "Text"` and `Silence: duration` format should be the basis for the frontend data structure sent to the backend.
+*   **TTS Parameters:** Decide whether to expose advanced TTS parameters (`exaggeration`, `cfg_weight`, `temperature`) for dialog lines in the new API.
+*   **File Output:** The backend needs to replicate the generation of individual segment files, a concatenated file, and a ZIP archive.
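
Note on the "Text Chunking" consideration: the plan requires replicating `split_text_at_sentence_boundaries` with a 300-character limit, but this diff does not show the original implementation in `gradio_app.py`. The sketch below is one possible way to do it, not the existing code. It assumes that sentences end in `.`, `!`, or `?` followed by whitespace, that whole sentences are packed greedily into each chunk, and that a single sentence longer than the limit is split on the nearest word boundary. The `max_len` parameter name and the fallback behaviour are illustrative assumptions.

```python
import re


def split_text_at_sentence_boundaries(text: str, max_len: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_len characters.

    Hypothetical sketch: the real function in gradio_app.py may behave differently.
    """
    # Split after '.', '!' or '?' followed by whitespace, keeping the
    # punctuation attached to its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue

        # Assumed fallback: a single sentence longer than max_len is cut at
        # the last space that fits, or hard-cut if it contains no space.
        while len(sentence) > max_len:
            cut = sentence.rfind(" ", 0, max_len + 1)
            if cut <= 0:
                cut = max_len
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()

        # Append the sentence to the current chunk if it still fits;
        # otherwise close the current chunk and start a new one.
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= max_len:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

In a `DialogProcessorService`, each returned chunk would be passed to the TTS call as its own segment, so the limit bounds the length of any single generation request. Whether to hard-cut oversized sentences or reject them is a design choice the plan leaves open.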