Compare commits


No commits in common. "main" and "dev" have entirely different histories.
main ... dev

25 changed files with 230 additions and 1765 deletions

.gitignore

@@ -22,4 +22,3 @@ backend/tts_generated_dialogs/
 # Node.js dependencies
 node_modules/
-.aider*


@@ -1,188 +0,0 @@
# Chatterbox TTS Backend: Bounded Concurrency + File I/O Offload Plan
Date: 2025-08-14
Owner: Backend
Status: Proposed (ready to implement)
## Goals
- Increase GPU utilization and reduce wall-clock time for dialog generation.
- Keep model lifecycle stable (leveraging current `ModelManager`).
- Minimal-risk changes: no API shape changes to clients.
## Scope
- Implement bounded concurrency for per-line speech chunk generation within a single dialog request.
- Offload audio file writes to threads to overlap GPU compute and disk I/O.
- Add configuration knobs to tune concurrency.
## Current State (References)
- `backend/app/services/dialog_processor_service.py`
- `DialogProcessorService.process_dialog()` iterates items and awaits `tts_service.generate_speech(...)` sequentially (lines ~171-201).
- `backend/app/services/tts_service.py`
- `TTSService.generate_speech()` runs the TTS forward and calls `torchaudio.save(...)` on the event loop thread (blocking).
- `backend/app/services/model_manager.py`
- `ModelManager.using()` tracks active work; prevents idle eviction during requests.
- `backend/app/routers/dialog.py`
- `process_dialog_flow()` expects ordered `segment_files` and then concatenates; good to keep order stable.
## Design Overview
1) Bounded concurrency at dialog level
- Plan all output segments with a stable `segment_idx` (including speech chunks, silence, and reused audio).
- For speech chunks, schedule concurrent async tasks with a global semaphore set by config `TTS_MAX_CONCURRENCY` (start at 3-4).
- Await all tasks and collate results by `segment_idx` to preserve order.
2) File I/O offload
- Replace direct `torchaudio.save(...)` with `await asyncio.to_thread(torchaudio.save, ...)` in `TTSService.generate_speech()`.
- This lets the next GPU forward start while previous file writes happen on worker threads.
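The two techniques can be exercised together in a minimal, self-contained sketch. The names below (`generate_all`, `run_one`, the sleeps standing in for the GPU forward and `torchaudio.save`) are illustrative only:

```python
import asyncio
import time

TTS_MAX_CONCURRENCY = 3  # stand-in for the config knob

async def generate_all(chunks):
    sem = asyncio.Semaphore(TTS_MAX_CONCURRENCY)
    in_flight = 0
    peak = 0

    async def run_one(idx, text):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)                  # stands in for the GPU forward
            await asyncio.to_thread(time.sleep, 0.01)  # file write off the event loop
            in_flight -= 1
            return idx, f"segment_{idx}.wav"

    tasks = [asyncio.create_task(run_one(i, c)) for i, c in enumerate(chunks)]
    results = dict(await asyncio.gather(*tasks))
    # collate by segment_idx so output order matches planning order
    return [results[i] for i in range(len(chunks))], peak

paths, peak = asyncio.run(generate_all(["a", "b", "c", "d", "e"]))
```

Whatever order tasks complete in, collating by index restores the planned order, and the semaphore keeps at most `TTS_MAX_CONCURRENCY` generations in flight.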
## Configuration
Add to `backend/app/config.py`:
- `TTS_MAX_CONCURRENCY: int` (default: `int(os.getenv("TTS_MAX_CONCURRENCY", "3"))`).
- Optional (future): `TTS_ENABLE_AMP_ON_CUDA: bool = True` to allow mixed precision on CUDA only.
## Implementation Steps
### A. Dialog-level concurrency
- File: `backend/app/services/dialog_processor_service.py`
- Function: `DialogProcessorService.process_dialog()`
1. Planning pass to assign indices
- Iterate `dialog_items` and build a list of `planned_segments` entries:
- For silence or reuse: immediately append a final result with assigned `segment_idx` and continue.
- For speech: split into `text_chunks`; for each chunk create a planned entry: `{ segment_idx, type: 'speech', speaker_id, text_chunk, abs_speaker_sample_path, tts_params }`.
- Increment `segment_idx` for every planned segment (speech chunk or silence/reuse) to preserve final order.
2. Concurrency setup
- Create `sem = asyncio.Semaphore(config.TTS_MAX_CONCURRENCY)`.
- For each planned speech segment, create a task with an inner wrapper:
```python
async def run_one(planned):
async with sem:
try:
out_path = await self.tts_service.generate_speech(
text=planned.text_chunk,
speaker_sample_path=planned.abs_speaker_sample_path,
output_filename_base=planned.filename_base,
output_dir=dialog_temp_dir,
exaggeration=planned.exaggeration,
cfg_weight=planned.cfg_weight,
temperature=planned.temperature,
)
return planned.segment_idx, {"type": "speech", "path": str(out_path), "speaker_id": planned.speaker_id, "text_chunk": planned.text_chunk}
except Exception as e:
return planned.segment_idx, {"type": "error", "message": f"Error generating speech: {e}", "text_chunk": planned.text_chunk}
```
- Schedule with `asyncio.create_task(run_one(p))` and collect tasks.
3. Await and collate
- `results_map = {}`; for each completed task, set `results_map[idx] = payload`.
- Merge: start with all previously final (silence/reuse/error) entries placed by `segment_idx`, then fill speech results by `segment_idx` into a single `segment_results` list sorted ascending by index.
- Keep `processing_log` entries for each planned segment (queued, started, finished, errors).
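Collation then reduces to a dict merge keyed by `segment_idx`; a sketch with hypothetical names:

```python
def collate(final_results, speech_results):
    """Merge pre-final entries (silence/reuse/error) with generated speech
    payloads, both keyed by segment_idx, into one ordered list."""
    merged = {**final_results, **speech_results}
    return [merged[i] for i in sorted(merged)]

segment_files = collate(
    {1: {"type": "silence"}},
    {0: {"type": "speech", "path": "seg0.wav"}, 2: {"type": "speech", "path": "seg2.wav"}},
)
```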
4. Return value unchanged
- Return `{"log": ..., "segment_files": segment_results, "temp_dir": str(dialog_temp_dir)}`. This maintains router and concatenator behavior.
### B. Offload audio writes
- File: `backend/app/services/tts_service.py`
- Function: `TTSService.generate_speech()`
1. After obtaining `wav` tensor, replace:
```python
# torchaudio.save(str(output_file_path), wav, self.model.sr)
```
with:
```python
await asyncio.to_thread(torchaudio.save, str(output_file_path), wav, self.model.sr)
```
- Keep the rest of cleanup logic (delete `wav`, `gc.collect()`, cache emptying) unchanged.
2. Optional (CUDA-only AMP)
- If CUDA is used and `config.TTS_ENABLE_AMP_ON_CUDA` is True, wrap forward with AMP:
```python
with torch.cuda.amp.autocast(dtype=torch.float16):
wav = self.model.generate(...)
```
- Leave MPS/CPU code path as-is.
## Error Handling & Ordering
- Every planned segment owns a unique `segment_idx`.
- On failure, insert an error record at that index; downstream concatenation already skips missing/nonexistent paths.
- Preserve exact output order expected by `routers/dialog.py::process_dialog_flow()`.
## Performance Expectations
- GPU util should increase from ~50% to 75-90% depending on dialog size and line lengths.
- Wall-clock reduction is workload-dependent; target 1.5-2.5x on multi-line dialogs.
## Metrics & Instrumentation
- Add timestamped log entries per segment: planned→queued→started→saved.
- Log effective concurrency (max in-flight), and cumulative GPU time if available.
- Optionally add a simple timing summary at end of `process_dialog()`.
## Testing Plan
1. Unit-ish
- Small dialog (3 speech lines, 1 silence). Ensure ordering is stable and files exist.
- Introduce an invalid speaker to verify error propagation doesn't break the rest.
2. Integration
- POST `/api/dialog/generate` with 20-50 mixed-length lines and a couple silences.
- Validate: response OK, concatenated file exists, zip contains all generated speech segments, order preserved.
- Compare runtime vs. sequential baseline (before/after).
3. Stress/limits
- Long lines split into many chunks; verify no OOM with `TTS_MAX_CONCURRENCY`=3.
- Try `TTS_MAX_CONCURRENCY`=1 to simulate sequential; compare metrics.
## Rollout & Config Defaults
- Default `TTS_MAX_CONCURRENCY=3`.
- Expose via environment variable; no client changes needed.
- If instability observed, set `TTS_MAX_CONCURRENCY=1` to revert to sequential behavior quickly.
## Risks & Mitigations
- OOM under high concurrency → Mitigate with low default, easy rollback, and chunking already in place.
- Disk I/O saturation → Offload to threads; if disk is a bottleneck, decrease concurrency.
- Model thread safety → We call `model.generate` concurrently only up to the semaphore cap; if the underlying library is not thread-safe for forward passes, consider serializing forwards while still overlapping file I/O; early logs will reveal this.
## Follow-up (Out of Scope for this change)
- Dynamic batching queue inside `TTSService` for further GPU efficiency.
- CUDA AMP enablement and profiling.
- Per-speaker sub-queues if batching requires same-speaker inputs.
## Acceptance Criteria
- `TTS_MAX_CONCURRENCY` is configurable; default=3.
- File writes occur via `asyncio.to_thread`.
- Order of `segment_files` unchanged relative to sequential output.
- End-to-end works for both small and large dialogs; error cases logged.
- Observed GPU utilization and runtime improve on representative dialog.


@@ -1,138 +0,0 @@
# Frontend Review and Recommendations
Date: 2025-08-12T11:32:16-05:00
Scope: `frontend/` of `chatterbox-test` monorepo
---
## Summary
- Static vanilla JS frontend served by `frontend/start_dev_server.py` interacting with FastAPI backend under `/api`.
- Solid feature set (speaker management, dialog editor, per-line generation, full dialog generation, save/load) with robust error handling.
- Key issues: inconsistent API trailing slashes, Jest/babel-jest version/config mismatch, minor state duplication, alert/confirm UX, overly dark border color, token in `package.json` repo URL.
---
## Findings
- **Framework/structure**
- `frontend/` is static vanilla JS. Main files:
- `index.html`, `js/app.js`, `js/api.js`, `js/config.js`, `css/style.css`.
- Dev server: `frontend/start_dev_server.py` (CORS, env-based port/host).
- **API client vs backend routes (trailing slashes)**
- Frontend `frontend/js/api.js` currently uses:
- `getSpeakers()`: `${API_BASE_URL}/speakers/` (trailing).
- `addSpeaker()`: `${API_BASE_URL}/speakers/` (trailing).
- `deleteSpeaker()`: `${API_BASE_URL}/speakers/${speakerId}/` (trailing).
- `generateLine()`: `${API_BASE_URL}/dialog/generate_line`.
- `generateDialog()`: `${API_BASE_URL}/dialog/generate`.
- Backend routes:
- `backend/app/routers/speakers.py`: `GET/POST /` and `DELETE /{speaker_id}` (no trailing slash on delete when prefixed under `/api/speakers`).
- `backend/app/routers/dialog.py`: `/generate_line` and `/generate` (match frontend).
- Tests in `frontend/tests/api.test.js` expect no trailing slashes for `/speakers` and `/speakers/{id}`.
- Implication: Inconsistent trailing slashes can cause test failures and possible 404s for delete.
- **Payload schema inconsistencies**
- `generateDialog()` JSDoc shows `silence` as `{ duration_ms: 500 }` but backend expects `duration` (seconds). UI also uses `duration` seconds.
- **Form fields alignment**
- Speaker add uses `name` and `audio_file` which match backend (`Form` and `File`).
- **State management duplication in `frontend/js/app.js`**
- `dialogItems` and `availableSpeakersCache` defined at module scope and again inside `initializeDialogEditor()`, creating shadowing risk. Consolidate to a single source of truth.
- **UX considerations**
- Heavy use of `alert()`/`confirm()`. Prefer inline notifications/banners and per-row error chips (you already render `item.error`).
- Add global loading/disabled states for long actions (e.g., full dialog generation, speaker add/delete).
- **CSS theme issue**
- `--border-light` is `#1b0404` (dark red); semantically a light gray fits better and improves contrast harmony.
- **Testing/Jest/Babel config**
- Root `package.json` uses `jest@^29.7.0` with `babel-jest@^30.0.0-beta.3` (major mismatch). Align versions.
- No `jest.config.cjs` to configure `transform` via `babel-jest` for ESM modules.
- **Security**
- `package.json` `repository.url` embeds a token. Remove secrets from VCS immediately.
- **Dev scripts**
- Only `"test": "jest"` present. Add scripts to run the frontend dev server and test config explicitly.
- **Response handling consistency**
- `generateLine()` parses via `response.text()` then `JSON.parse()`. Others use `response.json()`. Standardize for consistency.
---
## Recommended Actions (Phase 1: Quick wins)
- **Normalize API paths in `frontend/js/api.js`**
- Use no trailing slashes:
- `GET/POST`: `${API_BASE_URL}/speakers`
- `DELETE`: `${API_BASE_URL}/speakers/${speakerId}`
- Keep dialog endpoints unchanged.
- **Fix JSDoc for `generateDialog()`**
- Use `silence: { duration: number }` (seconds), not `duration_ms`.
- **Refactor `frontend/js/app.js` state**
- Remove duplicate `dialogItems`/`availableSpeakersCache` declarations. Choose module-scope or function-scope, and pass references.
- **Improve UX**
- Replace `alert/confirm` with inline banners near `#results-display` and per-row error chips (extend existing `.line-error-msg`).
- Add disabled/loading states for global generate and speaker actions.
- **CSS tweak**
- Set `--border-light: #e5e7eb;` (or similar) to reflect a light border.
- **Harden tests/Jest config**
- Align versions: either Jest 29 + `babel-jest` 29, or upgrade both to 30 stable together.
- Add `jest.config.cjs` with `transform` using `babel-jest` and suitable `testEnvironment`.
- Ensure tests expect normalized API paths (recommended to change code to match tests).
- **Dev scripts**
- Add to root `package.json`:
- `"frontend:dev": "python3 frontend/start_dev_server.py"`
- `"test:frontend": "jest --config ./jest.config.cjs"`
- **Sanitize repository URL**
- Remove embedded token from `package.json`.
- **Standardize response parsing**
- Switch `generateLine()` to `response.json()` unless backend returns `text/plain`.
---
## Backend Endpoint Confirmation
- `speakers` router (`backend/app/routers/speakers.py`):
- List/Create: `GET /`, `POST /` (when mounted under `/api/speakers` → `/api/speakers/`).
- Delete: `DELETE /{speaker_id}` (→ `/api/speakers/{speaker_id}`), no trailing slash.
- `dialog` router (`backend/app/routers/dialog.py`):
- `POST /generate_line`, `POST /generate` (mounted under `/api/dialog`).
---
## Proposed Implementation Plan
- **Phase 1 (1-2 hours)**
- Normalize API paths in `api.js`.
- Fix JSDoc for `generateDialog`.
- Consolidate dialog state in `app.js`.
- Adjust `--border-light` to light gray.
- Add `jest.config.cjs`, align Jest/babel-jest versions.
- Add dev/test scripts.
- Remove token from `package.json`.
- **Phase 2 (2-4 hours)**
- Inline notifications and comprehensive loading/disabled states.
- **Phase 3 (optional)**
- ESLint + Prettier.
- Consider Vite migration (HMR, proxy to backend, improved DX).
---
## Notes
- Current local time captured for this review: 2025-08-12T11:32:16-05:00.
- Frontend config (`frontend/js/config.js`) supports env overrides for API base and dev server port.
- Tests (`frontend/tests/api.test.js`) currently assume endpoints without trailing slashes.


@@ -1,204 +0,0 @@
# Unload Model on Idle: Implementation Plan
## Goals
- Automatically unload large TTS model(s) when idle to reduce RAM/VRAM usage.
- Lazy-load on demand without breaking API semantics.
- Configurable timeout and safety controls.
## Requirements
- Config-driven idle timeout and poll interval.
- Thread-/async-safe across concurrent requests.
- No unload while an inference is in progress.
- Clear logs and metrics for load/unload events.
## Configuration
File: `backend/app/config.py`
- Add:
- `MODEL_IDLE_TIMEOUT_SECONDS: int = 900` (0 disables eviction)
- `MODEL_IDLE_CHECK_INTERVAL_SECONDS: int = 60`
- `MODEL_EVICTION_ENABLED: bool = True`
- Bind to env: `MODEL_IDLE_TIMEOUT_SECONDS`, `MODEL_IDLE_CHECK_INTERVAL_SECONDS`, `MODEL_EVICTION_ENABLED`.
## Design
### ModelManager (Singleton)
File: `backend/app/services/model_manager.py` (new)
- Responsibilities:
- Manage lifecycle (load/unload) of the TTS model/pipeline.
- Provide `get()` that returns a ready model (lazy-load if needed) and updates `last_used`.
- Track active request count to block eviction while > 0.
- Internals:
- `self._model` (or components), `self._last_used: float`, `self._active: int`.
- Locks: `asyncio.Lock` for load/unload; `asyncio.Lock` or `asyncio.Semaphore` for counters.
- Optional CUDA cleanup: `torch.cuda.empty_cache()` after unload.
- API:
- `async def get(self) -> Model`: ensures loaded; bumps `last_used`.
- `async def load(self)`: idempotent; guarded by lock.
- `async def unload(self)`: only when `self._active == 0`; clears refs and caches.
- `def touch(self)`: update `last_used`.
- Context helper: `async def using(self)`: async context manager incrementing/decrementing `active` safely.
### Idle Reaper Task
Registration: FastAPI startup (e.g., in `backend/app/main.py`)
- Background task loop every `MODEL_IDLE_CHECK_INTERVAL_SECONDS`:
- If eviction enabled and timeout > 0 and model is loaded and `active == 0` and `now - last_used >= timeout`, call `unload()`.
- Handle cancellation on shutdown.
### API Integration
- Replace direct model access in endpoints with:
```python
manager = ModelManager.instance()
async with manager.using():
model = await manager.get()
# perform inference
```
- Optionally call `manager.touch()` at request start for non-inference paths that still need the model resident.
## Pseudocode
```python
# services/model_manager.py
import time, asyncio
from typing import Optional
from .config import settings
class ModelManager:
_instance: Optional["ModelManager"] = None
def __init__(self):
self._model = None
self._last_used = time.time()
self._active = 0
self._lock = asyncio.Lock()
self._counter_lock = asyncio.Lock()
@classmethod
def instance(cls):
if not cls._instance:
cls._instance = cls()
return cls._instance
async def load(self):
async with self._lock:
if self._model is not None:
return
# ... load model/pipeline here ...
self._model = await load_pipeline()
self._last_used = time.time()
async def unload(self):
async with self._lock:
if self._model is None:
return
if self._active > 0:
return # safety: do not unload while in use
# ... free resources ...
self._model = None
try:
import torch
torch.cuda.empty_cache()
except Exception:
pass
async def get(self):
if self._model is None:
await self.load()
self._last_used = time.time()
return self._model
async def _inc(self):
async with self._counter_lock:
self._active += 1
async def _dec(self):
async with self._counter_lock:
self._active = max(0, self._active - 1)
self._last_used = time.time()
def last_used(self):
return self._last_used
def is_loaded(self):
return self._model is not None
def active(self):
return self._active
def using(self):
manager = self
class _Ctx:
async def __aenter__(self):
await manager._inc()
return manager
async def __aexit__(self, exc_type, exc, tb):
await manager._dec()
return _Ctx()
# main.py (startup)
import contextlib
import logging
logger = logging.getLogger("app.model_reaper")
@app.on_event("startup")
async def start_reaper():
async def reaper():
while True:
try:
await asyncio.sleep(settings.MODEL_IDLE_CHECK_INTERVAL_SECONDS)
if not settings.MODEL_EVICTION_ENABLED:
continue
timeout = settings.MODEL_IDLE_TIMEOUT_SECONDS
if timeout <= 0:
continue
m = ModelManager.instance()
if m.is_loaded() and m.active() == 0 and (time.time() - m.last_used()) >= timeout:
await m.unload()
except asyncio.CancelledError:
break
except Exception as e:
logger.exception("Idle reaper error: %s", e)
app.state._model_reaper_task = asyncio.create_task(reaper())
@app.on_event("shutdown")
async def stop_reaper():
task = getattr(app.state, "_model_reaper_task", None)
if task:
task.cancel()
with contextlib.suppress(Exception):
await task
```
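As a design note, the hand-rolled `_Ctx` helper in `using()` can be written more compactly with `contextlib.asynccontextmanager`. A minimal sketch of just the active-request bookkeeping, isolated from the rest of the manager:

```python
import asyncio
import contextlib

class UsageCounter:
    """Only the active-request counting from ModelManager, in isolation."""
    def __init__(self):
        self.active = 0
        self._lock = asyncio.Lock()

    @contextlib.asynccontextmanager
    async def using(self):
        async with self._lock:
            self.active += 1
        try:
            yield self
        finally:  # decremented even if the body raised
            async with self._lock:
                self.active = max(0, self.active - 1)

async def demo():
    c = UsageCounter()
    async with c.using():
        held = c.active  # 1 while an inference is in flight
    return held, c.active

held, after = asyncio.run(demo())
```

Call sites are unchanged: `async with manager.using(): ...` works the same either way.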
## Observability
- Logs: model load/unload, reaper decisions, active count.
- Metrics (optional): counters and gauges (load events, active, residency time).
## Safety & Edge Cases
- Avoid unload when `active > 0`.
- Guard multiple loads/unloads with lock.
- Multi-worker servers: each worker manages its own model.
- Cold-start latency: document expected additional latency for first request after idle unload.
## Testing
- Unit tests for `ModelManager`: load/unload idempotency, counter behavior.
- Simulated reaper triggering with short timeouts.
- Endpoint tests: concurrency (N simultaneous inferences), ensure no unload mid-flight.
## Rollout Plan
1. Introduce config + Manager (no reaper), switch endpoints to `using()`.
2. Enable reaper with long timeout in staging; observe logs/metrics.
3. Tune timeout; enable in production.
## Tasks Checklist
- [ ] Add config flags and defaults in `backend/app/config.py`.
- [ ] Create `backend/app/services/model_manager.py`.
- [ ] Register startup/shutdown reaper in app init (`backend/app/main.py`).
- [ ] Refactor endpoints to use `ModelManager.instance().using()` and `get()`.
- [ ] Add logs and optional metrics.
- [ ] Add unit/integration tests.
- [ ] Update README/ops docs.
## Alternatives Considered
- Gunicorn/uvicorn worker preloading with external idle supervisor: more complexity, less portability.
- OS-level cgroup memory pressure eviction: opaque and risky for correctness.
## Configuration Examples
```
MODEL_EVICTION_ENABLED=true
MODEL_IDLE_TIMEOUT_SECONDS=900
MODEL_IDLE_CHECK_INTERVAL_SECONDS=60
```


@@ -359,7 +359,7 @@ The API uses the following directory structure (configurable in `app/config.py`)
 - **Temporary Files**: `{PROJECT_ROOT}/tts_temp_outputs/`
 ### CORS Settings
-- Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001` (plus any `FRONTEND_HOST:FRONTEND_PORT` when using `start_servers.py`)
+- Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001`
 - Allowed Methods: All
 - Allowed Headers: All
 - Credentials: Enabled


@@ -58,7 +58,7 @@ The application uses environment variables for configuration. Three `.env` files
 - `VITE_DEV_SERVER_HOST`: Frontend development server host
 #### CORS Configuration
-- `CORS_ORIGINS`: Comma-separated list of allowed origins. When using `start_servers.py` with the default `FRONTEND_HOST=0.0.0.0` and no explicit `CORS_ORIGINS`, CORS will allow all origins (wildcard) to simplify development.
+- `CORS_ORIGINS`: Comma-separated list of allowed origins
 #### Device Configuration
 - `DEVICE`: Device for TTS model (auto, cpu, cuda, mps)
@@ -101,7 +101,7 @@ CORS_ORIGINS=http://localhost:3000
 ### Common Issues
 1. **Permission Errors**: Ensure the `PROJECT_ROOT` directory is writable
-2. **CORS Errors**: Check that your frontend URL is in `CORS_ORIGINS`. (When using `start_servers.py`, your specified `FRONTEND_HOST:FRONTEND_PORT` will be auto-included.)
+2. **CORS Errors**: Check that your frontend URL is in `CORS_ORIGINS`
 3. **Model Loading Errors**: Verify `DEVICE` setting matches your hardware
 4. **Path Errors**: Ensure all path variables point to existing, accessible directories


@@ -9,7 +9,6 @@ A comprehensive text-to-speech application with multiple interfaces for generati
 - **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
 - **Audiobook Generation**: Convert long-form text into narrated audiobooks
 - **Speaker Management**: Add/remove speakers with custom audio samples
-- **Paste Script (JSONL) Import**: Paste a dialog script as JSONL directly into the editor via a modal
 - **Memory Optimization**: Automatic model cleanup after generation
 - **Output Organization**: Files saved in organized directories with ZIP packaging
@@ -24,6 +23,7 @@ A comprehensive text-to-speech application with multiple interfaces for generati
 pip install -r requirements.txt
 npm install
 ```
 2. Run automated setup:
 ```bash
 python setup.py
@@ -33,24 +33,6 @@ A comprehensive text-to-speech application with multiple interfaces for generati
 - Add audio samples (WAV format) to `speaker_data/speaker_samples/`
 - Configure speakers in `speaker_data/speakers.yaml`
-### Windows Quick Start
-On Windows, a PowerShell setup script is provided to automate environment setup and startup.
-```powershell
-# From the repository root in PowerShell
-./setup-windows.ps1
-# First time only, if scripts are blocked:
-# Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
-```
-What it does:
-- Creates/uses `.venv`
-- Upgrades pip and installs deps from `backend/requirements.txt` and root `requirements.txt`
-- Creates a default `.env` with sensible ports if missing
-- Starts both servers via `start_servers.py`
 ### Running the Application
 **Full-Stack Web Application:**
@@ -59,12 +41,6 @@ What it does:
 python start_servers.py
 ```
-On Windows, you can also use the one-liner PowerShell script:
-```powershell
-./setup-windows.ps1
-```
 **Individual Components:**
 ```bash
 # Backend only (FastAPI)
@@ -80,26 +56,7 @@ python gradio_app.py
 ## Usage
 ### Web Interface
-Access the modern web UI at `http://localhost:8001` for interactive dialog creation.
+Access the modern web UI at `http://localhost:8001` for interactive dialog creation with drag-and-drop editing.
-#### Paste Script (JSONL) in Dialog Editor
-Quickly load a dialog by pasting JSONL (one JSON object per line):
-1. Click `Paste Script` in the Dialog Editor.
-2. Paste JSONL content, for example:
-```jsonl
-{"type":"speech","speaker_id":"dummy_speaker","text":"Hello there!"}
-{"type":"silence","duration":0.5}
-{"type":"speech","speaker_id":"dummy_speaker","text":"This is the second line."}
-```
-3. Click `Load` and confirm replacement if prompted.
-Notes:
-- Input is validated per line; errors report line numbers.
-- The dialog is saved to localStorage, so it persists across refreshes.
-- Unknown `speaker_id`s will still load; add speakers later if needed.
 ### CLI Tools
@@ -192,12 +149,5 @@ The application automatically:
 - **"Skipping unknown speaker"**: Configure speaker in `speaker_data/speakers.yaml`
 - **"Sample file not found"**: Verify audio files exist in `speaker_data/speaker_samples/`
 - **Memory issues**: Use model reinitialization options for long content
-- **CORS errors**: Check frontend/backend port configuration (frontend origin is auto-included when using `start_servers.py`)
+- **CORS errors**: Check frontend/backend port configuration
 - **Import errors**: Run `python import_helper.py` to check dependencies
-### Windows-specific
-- If PowerShell blocks script execution, run once:
-```powershell
-Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
-```
-- If Windows Firewall prompts the first time you run servers, allow access on your private network.


@@ -6,34 +6,20 @@ from dotenv import load_dotenv
 load_dotenv()
 # Project root - can be overridden by environment variable
-PROJECT_ROOT = Path(
-    os.getenv("PROJECT_ROOT", Path(__file__).parent.parent.parent)
-).resolve()
+PROJECT_ROOT = Path(os.getenv("PROJECT_ROOT", Path(__file__).parent.parent.parent)).resolve()
 # Directory paths
-SPEAKER_DATA_BASE_DIR = Path(
-    os.getenv("SPEAKER_DATA_BASE_DIR", str(PROJECT_ROOT / "speaker_data"))
-)
-SPEAKER_SAMPLES_DIR = Path(
-    os.getenv("SPEAKER_SAMPLES_DIR", str(SPEAKER_DATA_BASE_DIR / "speaker_samples"))
-)
-SPEAKERS_YAML_FILE = Path(
-    os.getenv("SPEAKERS_YAML_FILE", str(SPEAKER_DATA_BASE_DIR / "speakers.yaml"))
-)
+SPEAKER_DATA_BASE_DIR = Path(os.getenv("SPEAKER_DATA_BASE_DIR", str(PROJECT_ROOT / "speaker_data")))
+SPEAKER_SAMPLES_DIR = Path(os.getenv("SPEAKER_SAMPLES_DIR", str(SPEAKER_DATA_BASE_DIR / "speaker_samples")))
+SPEAKERS_YAML_FILE = Path(os.getenv("SPEAKERS_YAML_FILE", str(SPEAKER_DATA_BASE_DIR / "speakers.yaml")))
 # TTS temporary output path (used by DialogProcessorService)
-TTS_TEMP_OUTPUT_DIR = Path(
-    os.getenv("TTS_TEMP_OUTPUT_DIR", str(PROJECT_ROOT / "tts_temp_outputs"))
-)
+TTS_TEMP_OUTPUT_DIR = Path(os.getenv("TTS_TEMP_OUTPUT_DIR", str(PROJECT_ROOT / "tts_temp_outputs")))
 # Final dialog output path (used by Dialog router and served by main app)
 # These are stored within the 'backend' directory to be easily servable.
 DIALOG_OUTPUT_PARENT_DIR = PROJECT_ROOT / "backend"
-DIALOG_GENERATED_DIR = Path(
-    os.getenv(
-        "DIALOG_GENERATED_DIR", str(DIALOG_OUTPUT_PARENT_DIR / "tts_generated_dialogs")
-    )
-)
+DIALOG_GENERATED_DIR = Path(os.getenv("DIALOG_GENERATED_DIR", str(DIALOG_OUTPUT_PARENT_DIR / "tts_generated_dialogs")))
 # Alias for clarity and backward compatibility
 DIALOG_OUTPUT_DIR = DIALOG_GENERATED_DIR
@@ -43,41 +29,11 @@ HOST = os.getenv("HOST", "0.0.0.0")
 PORT = int(os.getenv("PORT", "8000"))
 RELOAD = os.getenv("RELOAD", "true").lower() == "true"
-# CORS configuration: determine allowed origins based on env & frontend binding
-_cors_env = os.getenv("CORS_ORIGINS", "")
-_frontend_host = os.getenv("FRONTEND_HOST")
-_frontend_port = os.getenv("FRONTEND_PORT")
-# If the dev server is bound to 0.0.0.0 (all interfaces), allow all origins
-if _frontend_host == "0.0.0.0":  # dev convenience when binding wildcard
-    CORS_ORIGINS = ["*"]
-elif _cors_env:
-    # parse comma-separated origins, strip whitespace
-    CORS_ORIGINS = [origin.strip() for origin in _cors_env.split(",") if origin.strip()]
-else:
-    # default to allow all origins in development
-    CORS_ORIGINS = ["*"]
-# Auto-include specific frontend origin when not using wildcard CORS
-if CORS_ORIGINS != ["*"] and _frontend_host and _frontend_port:
-    _frontend_origin = f"http://{_frontend_host.strip()}:{_frontend_port.strip()}"
-    if _frontend_origin not in CORS_ORIGINS:
-        CORS_ORIGINS.append(_frontend_origin)
+# CORS configuration
+CORS_ORIGINS = [origin.strip() for origin in os.getenv("CORS_ORIGINS", "http://localhost:8001,http://127.0.0.1:8001").split(",")]
 # Device configuration
 DEVICE = os.getenv("DEVICE", "auto")
-# Concurrency configuration
-# Max number of concurrent TTS generation tasks per dialog request
-TTS_MAX_CONCURRENCY = int(os.getenv("TTS_MAX_CONCURRENCY", "3"))
-# Model idle eviction configuration
-# Enable/disable idle-based model eviction
-MODEL_EVICTION_ENABLED = os.getenv("MODEL_EVICTION_ENABLED", "true").lower() == "true"
-# Unload model after this many seconds of inactivity (0 disables eviction)
-MODEL_IDLE_TIMEOUT_SECONDS = int(os.getenv("MODEL_IDLE_TIMEOUT_SECONDS", "900"))
-# How often the reaper checks for idleness
-MODEL_IDLE_CHECK_INTERVAL_SECONDS = int(os.getenv("MODEL_IDLE_CHECK_INTERVAL_SECONDS", "60"))
 # Ensure directories exist
 SPEAKER_SAMPLES_DIR.mkdir(parents=True, exist_ok=True)


@ -2,10 +2,6 @@ from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware from fastapi.middleware.cors import CORSMiddleware
from pathlib import Path from pathlib import Path
import asyncio
import contextlib
import logging
import time
from app.routers import speakers, dialog # Import the routers from app.routers import speakers, dialog # Import the routers
from app import config from app import config
@ -42,47 +38,3 @@ config.DIALOG_GENERATED_DIR.mkdir(parents=True, exist_ok=True)
app.mount("/generated_audio", StaticFiles(directory=config.DIALOG_GENERATED_DIR), name="generated_audio")
# Further endpoints for speakers, dialog generation, etc., will be added here.
# --- Background task: idle model reaper ---
logger = logging.getLogger("app.model_reaper")

@app.on_event("startup")
async def _start_model_reaper():
    from app.services.model_manager import ModelManager

    async def reaper():
        while True:
            try:
                await asyncio.sleep(config.MODEL_IDLE_CHECK_INTERVAL_SECONDS)
                if not getattr(config, "MODEL_EVICTION_ENABLED", True):
                    continue
                timeout = getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0)
                if timeout <= 0:
                    continue
                m = ModelManager.instance()
                if m.is_loaded() and m.active() == 0 and (time.time() - m.last_used()) >= timeout:
                    logger.info("Idle timeout reached (%.0fs). Unloading model...", timeout)
                    await m.unload()
            except asyncio.CancelledError:
                break
            except Exception:
                logger.exception("Model reaper encountered an error")

    # Log eviction configuration at startup
    logger.info(
        "Model Eviction -> enabled: %s | idle_timeout: %ss | check_interval: %ss",
        getattr(config, "MODEL_EVICTION_ENABLED", True),
        getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0),
        getattr(config, "MODEL_IDLE_CHECK_INTERVAL_SECONDS", 60),
    )
    app.state._model_reaper_task = asyncio.create_task(reaper())

@app.on_event("shutdown")
async def _stop_model_reaper():
    task = getattr(app.state, "_model_reaper_task", None)
    if task:
        task.cancel()
        with contextlib.suppress(Exception):
            await task
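The reaper's eviction condition (model loaded, no active work, idle past the timeout, `0` disables) can be isolated as a pure predicate for testing; `should_evict` below is an illustrative sketch, not a function from the codebase:

```python
import time
from typing import Optional

def should_evict(loaded: bool, active: int, last_used: float,
                 timeout: float, now: Optional[float] = None) -> bool:
    # Mirrors the reaper's check: a timeout of 0 (or less) disables
    # eviction, and in-flight work always blocks an unload.
    if timeout <= 0 or not loaded or active > 0:
        return False
    if now is None:
        now = time.time()
    return (now - last_used) >= timeout
```

Injecting `now` keeps the predicate deterministic under test while the reaper itself can call it with the real clock.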

View File

@ -9,8 +9,6 @@ from app.services.speaker_service import SpeakerManagementService
from app.services.dialog_processor_service import DialogProcessorService
from app.services.audio_manipulation_service import AudioManipulationService
from app import config
from typing import AsyncIterator
from app.services.model_manager import ModelManager
router = APIRouter()
@ -18,12 +16,9 @@ router = APIRouter()
# These can be more sophisticated with a proper DI container or FastAPI's Depends system if services had complex init.
# For now, direct instantiation or simple Depends is fine.
-async def get_tts_service() -> AsyncIterator[TTSService]:
-    """Dependency that holds a usage token for the duration of the request."""
-    manager = ModelManager.instance()
-    async with manager.using():
-        service = await manager.get_service()
-        yield service
+def get_tts_service():
+    # Consider making device configurable
+    return TTSService(device="mps")
def get_speaker_management_service():
    return SpeakerManagementService()
@ -37,7 +32,7 @@ def get_dialog_processor_service(
def get_audio_manipulation_service():
    return AudioManipulationService()
-# --- Helper imports ---
+# --- Helper function to manage TTS model lifecycle ---
from app.models.dialog_models import SpeechItem, SilenceItem
from app.services.tts_service import TTSService
@ -133,7 +128,19 @@ async def generate_line(
            detail=error_detail
        )
-# Removed per-request load/unload in favor of ModelManager idle eviction.
+async def manage_tts_model_lifecycle(tts_service: TTSService, task_function, *args, **kwargs):
+    """Loads TTS model, executes task, then unloads model."""
+    try:
+        print("API: Loading TTS model...")
+        tts_service.load_model()
+        return await task_function(*args, **kwargs)
+    except Exception as e:
+        # Log or handle specific exceptions if needed before re-raising
+        print(f"API: Error during TTS model lifecycle or task execution: {e}")
+        raise
+    finally:
+        print("API: Unloading TTS model...")
+        tts_service.unload_model()
async def process_dialog_flow(
    request: DialogRequest,
@ -267,10 +274,12 @@ async def generate_dialog_endpoint(
    - Concatenates all audio segments into a single file.
    - Creates a ZIP archive of all individual segments and the concatenated file.
    """
-    # Execute core processing; ModelManager dependency keeps the model marked "in use".
-    return await process_dialog_flow(
-        request=request,
-        dialog_processor=dialog_processor,
+    # Wrap the core processing logic with model loading/unloading
+    return await manage_tts_model_lifecycle(
+        tts_service,
+        process_dialog_flow,
+        request=request,
+        dialog_processor=dialog_processor,
        audio_manipulator=audio_manipulator,
-        background_tasks=background_tasks,
+        background_tasks=background_tasks
    )
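The dev-branch wrapper above reduces to a try/finally around the task: load, run, and always unload, even when the task raises. A standalone sketch of that ordering guarantee (names `with_model_lifecycle` and `demo` are invented for illustration):

```python
import asyncio

async def with_model_lifecycle(load, unload, task):
    # Load the model, run the task, and unload unconditionally,
    # mirroring the try/finally shape of manage_tts_model_lifecycle.
    load()
    try:
        return await task()
    finally:
        unload()

async def demo():
    events = []

    async def task():
        events.append("run")
        return "ok"

    result = await with_model_lifecycle(
        lambda: events.append("load"),
        lambda: events.append("unload"),
        task,
    )
    return result, events
```

The trade-off the main branch makes is visible here: per-request load/unload is simple and safe, but it pays the model-load cost on every dialog, which is what the idle-eviction design replaces.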


@ -1,8 +1,6 @@
from pathlib import Path
from typing import List, Dict, Any, Union
import re
import asyncio
from datetime import datetime
from .tts_service import TTSService
from .speaker_service import SpeakerManagementService
@ -94,72 +92,24 @@ class DialogProcessorService:
        import shutil
        segment_idx = 0
        tasks = []
        results_map: Dict[int, Dict[str, Any]] = {}
        sem = asyncio.Semaphore(getattr(config, "TTS_MAX_CONCURRENCY", 2))

        async def run_one(planned: Dict[str, Any]):
            async with sem:
                text_chunk = planned["text_chunk"]
                speaker_id = planned["speaker_id"]
                abs_speaker_sample_path = planned["abs_speaker_sample_path"]
                filename_base = planned["filename_base"]
                params = planned["params"]
                seg_idx = planned["segment_idx"]
                start_ts = datetime.now()
                start_line = (
                    f"[{start_ts.isoformat(timespec='seconds')}] [TTS-TASK] START seg_idx={seg_idx} "
                    f"speaker={speaker_id} chunk_len={len(text_chunk)} base={filename_base}"
                )
                try:
                    out_path = await self.tts_service.generate_speech(
                        text=text_chunk,
                        speaker_id=speaker_id,
                        speaker_sample_path=str(abs_speaker_sample_path),
                        output_filename_base=filename_base,
                        output_dir=dialog_temp_dir,
                        exaggeration=params.get('exaggeration', 0.5),
                        cfg_weight=params.get('cfg_weight', 0.5),
                        temperature=params.get('temperature', 0.8),
                    )
                    end_ts = datetime.now()
                    duration = (end_ts - start_ts).total_seconds()
                    end_line = (
                        f"[{end_ts.isoformat(timespec='seconds')}] [TTS-TASK] END seg_idx={seg_idx} "
                        f"dur={duration:.2f}s -> {out_path}"
                    )
                    return seg_idx, {
                        "type": "speech",
                        "path": str(out_path),
                        "speaker_id": speaker_id,
                        "text_chunk": text_chunk,
                    }, start_line + "\n" + f"Successfully generated segment: {out_path}" + "\n" + end_line
                except Exception as e:
                    end_ts = datetime.now()
                    err_line = (
                        f"[{end_ts.isoformat(timespec='seconds')}] [TTS-TASK] ERROR seg_idx={seg_idx} "
                        f"speaker={speaker_id} err={repr(e)}"
                    )
                    return seg_idx, {
                        "type": "error",
                        "message": f"Error generating speech for chunk '{text_chunk[:50]}...': {repr(e)}",
                        "text_chunk": text_chunk,
                    }, err_line
        for i, item in enumerate(dialog_items):
            item_type = item.get("type")
            processing_log.append(f"Processing item {i+1}: type='{item_type}'")
-            # --- Handle reuse of existing audio ---
+            # --- Universal: Handle reuse of existing audio for both speech and silence ---
            use_existing_audio = item.get("use_existing_audio", False)
            audio_url = item.get("audio_url")
            if use_existing_audio and audio_url:
                # Determine source path (handle both absolute and relative)
                # Map web URL to actual file location in tts_generated_dialogs
                if audio_url.startswith("/generated_audio/"):
                    src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url[len("/generated_audio/"):]
                else:
                    src_audio_path = Path(audio_url)
                    if not src_audio_path.is_absolute():
                        # Assume relative to the generated audio root dir
                        src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url.lstrip("/\\")
                # Now src_audio_path should point to the real file in tts_generated_dialogs
                if src_audio_path.is_file():
                    segment_filename = f"{output_base_name}_seg{segment_idx}_reused.wav"
                    dest_path = (self.temp_audio_dir / output_base_name / segment_filename)
@ -173,18 +123,22 @@ class DialogProcessorService:
                            processing_log.append(f"[REUSE] Destination audio file was not created: {dest_path}")
                        else:
                            processing_log.append(f"[REUSE] Destination audio file created: {dest_path}, size={dest_path.stat().st_size} bytes")
-                        results_map[segment_idx] = {"type": item_type, "path": str(dest_path)}
+                        # Only include 'type' and 'path' so the concatenator always includes this segment
+                        segment_results.append({
+                            "type": item_type,
+                            "path": str(dest_path)
+                        })
                        processing_log.append(f"Reused existing audio for item {i+1}: copied from {src_audio_path} to {dest_path}")
                    except Exception as e:
                        error_message = f"Failed to copy reused audio for item {i+1}: {e}"
                        processing_log.append(error_message)
-                        results_map[segment_idx] = {"type": "error", "message": error_message}
+                        segment_results.append({"type": "error", "message": error_message})
                    segment_idx += 1
                    continue
                else:
                    error_message = f"Audio file for reuse not found at {src_audio_path} for item {i+1}."
                    processing_log.append(error_message)
-                    results_map[segment_idx] = {"type": "error", "message": error_message}
+                    segment_results.append({"type": "error", "message": error_message})
                    segment_idx += 1
                    continue
@ -193,81 +147,70 @@ class DialogProcessorService:
                text = item.get("text")
                if not speaker_id or not text:
                    processing_log.append(f"Skipping speech item {i+1} due to missing speaker_id or text.")
-                    results_map[segment_idx] = {"type": "error", "message": "Missing speaker_id or text"}
-                    segment_idx += 1
+                    segment_results.append({"type": "error", "message": "Missing speaker_id or text"})
                    continue
+                # Validate speaker_id and get speaker_sample_path
                speaker_info = self.speaker_service.get_speaker_by_id(speaker_id)
                if not speaker_info:
                    processing_log.append(f"Speaker ID '{speaker_id}' not found. Skipping item {i+1}.")
-                    results_map[segment_idx] = {"type": "error", "message": f"Speaker ID '{speaker_id}' not found"}
-                    segment_idx += 1
+                    segment_results.append({"type": "error", "message": f"Speaker ID '{speaker_id}' not found"})
                    continue
                if not speaker_info.sample_path:
                    processing_log.append(f"Speaker ID '{speaker_id}' has no sample path defined. Skipping item {i+1}.")
-                    results_map[segment_idx] = {"type": "error", "message": f"Speaker ID '{speaker_id}' has no sample path defined"}
-                    segment_idx += 1
+                    segment_results.append({"type": "error", "message": f"Speaker ID '{speaker_id}' has no sample path defined"})
                    continue
+                # speaker_info.sample_path is relative to config.SPEAKER_DATA_BASE_DIR
                abs_speaker_sample_path = config.SPEAKER_DATA_BASE_DIR / speaker_info.sample_path
                if not abs_speaker_sample_path.is_file():
                    processing_log.append(f"Speaker sample file not found or is not a file at '{abs_speaker_sample_path}' for speaker ID '{speaker_id}'. Skipping item {i+1}.")
-                    results_map[segment_idx] = {"type": "error", "message": f"Speaker sample not a file or not found: {abs_speaker_sample_path}"}
-                    segment_idx += 1
+                    segment_results.append({"type": "error", "message": f"Speaker sample not a file or not found: {abs_speaker_sample_path}"})
                    continue
                text_chunks = self._split_text(text)
                processing_log.append(f"Split text for speaker '{speaker_id}' into {len(text_chunks)} chunk(s).")
                for chunk_idx, text_chunk in enumerate(text_chunks):
-                    filename_base = f"{output_base_name}_seg{segment_idx}_spk{speaker_id}_chunk{chunk_idx}"
-                    processing_log.append(f"Queueing TTS for chunk: '{text_chunk[:50]}...' using speaker '{speaker_id}'")
-                    planned = {
-                        "segment_idx": segment_idx,
-                        "speaker_id": speaker_id,
-                        "text_chunk": text_chunk,
-                        "abs_speaker_sample_path": abs_speaker_sample_path,
-                        "filename_base": filename_base,
-                        "params": {
-                            'exaggeration': item.get('exaggeration', 0.5),
-                            'cfg_weight': item.get('cfg_weight', 0.5),
-                            'temperature': item.get('temperature', 0.8),
-                        },
-                    }
-                    tasks.append(asyncio.create_task(run_one(planned)))
+                    segment_filename_base = f"{output_base_name}_seg{segment_idx}_spk{speaker_id}_chunk{chunk_idx}"
+                    processing_log.append(f"Generating speech for chunk: '{text_chunk[:50]}...' using speaker '{speaker_id}'")
+                    try:
+                        segment_output_path = await self.tts_service.generate_speech(
+                            text=text_chunk,
+                            speaker_id=speaker_id,  # For metadata, actual sample path is used by TTS
+                            speaker_sample_path=str(abs_speaker_sample_path),
+                            output_filename_base=segment_filename_base,
+                            output_dir=dialog_temp_dir,  # Save to the dialog's temp dir
+                            exaggeration=item.get('exaggeration', 0.5),  # Default from Gradio, Pydantic model should provide this
+                            cfg_weight=item.get('cfg_weight', 0.5),  # Default from Gradio, Pydantic model should provide this
+                            temperature=item.get('temperature', 0.8)  # Default from Gradio, Pydantic model should provide this
+                        )
+                        segment_results.append({
+                            "type": "speech",
+                            "path": str(segment_output_path),
+                            "speaker_id": speaker_id,
+                            "text_chunk": text_chunk
+                        })
+                        processing_log.append(f"Successfully generated segment: {segment_output_path}")
+                    except Exception as e:
+                        error_message = f"Error generating speech for chunk '{text_chunk[:50]}...': {repr(e)}"
+                        processing_log.append(error_message)
+                        segment_results.append({"type": "error", "message": error_message, "text_chunk": text_chunk})
                    segment_idx += 1
            elif item_type == "silence":
                duration = item.get("duration")
                if duration is None or duration < 0:
                    processing_log.append(f"Skipping silence item {i+1} due to invalid duration.")
-                    results_map[segment_idx] = {"type": "error", "message": "Invalid duration for silence"}
-                    segment_idx += 1
+                    segment_results.append({"type": "error", "message": "Invalid duration for silence"})
                    continue
-                results_map[segment_idx] = {"type": "silence", "duration": float(duration)}
+                segment_results.append({"type": "silence", "duration": float(duration)})
                processing_log.append(f"Added silence of {duration}s.")
                segment_idx += 1
            else:
                processing_log.append(f"Unknown item type '{item_type}' at item {i+1}. Skipping.")
-                results_map[segment_idx] = {"type": "error", "message": f"Unknown item type: {item_type}"}
+                segment_results.append({"type": "error", "message": f"Unknown item type: {item_type}"})
                segment_idx += 1
        # Await all TTS tasks and merge results
        if tasks:
            processing_log.append(
                f"Dispatching {len(tasks)} TTS task(s) with concurrency limit "
                f"{getattr(config, 'TTS_MAX_CONCURRENCY', 2)}"
            )
            completed = await asyncio.gather(*tasks, return_exceptions=False)
            for idx, payload, maybe_log in completed:
                results_map[idx] = payload
                if maybe_log:
                    processing_log.append(maybe_log)

        # Build ordered list
        for idx in sorted(results_map.keys()):
            segment_results.append(results_map[idx])
        # Log the full segment_results list for debugging
        processing_log.append("[DEBUG] Final segment_results list:")
@ -277,7 +220,7 @@ class DialogProcessorService:
        return {
            "log": "\n".join(processing_log),
            "segment_files": segment_results,
-            "temp_dir": str(dialog_temp_dir)
+            "temp_dir": str(dialog_temp_dir)  # For cleanup or zipping later
        }
if __name__ == "__main__":
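The concurrency pattern the main branch adds here — a shared `asyncio.Semaphore` bounding in-flight tasks, with results collated by `segment_idx` so output order never depends on completion order — can be reduced to a small sketch (all names below are illustrative, not from the repo):

```python
import asyncio

async def run_bounded(coro_fns_by_idx, limit: int = 3):
    # At most `limit` coroutines run at once; results are keyed by index,
    # so callers can rebuild a stable ordering afterwards.
    sem = asyncio.Semaphore(limit)

    async def run_one(idx, coro_fn):
        async with sem:
            return idx, await coro_fn()

    pairs = await asyncio.gather(
        *(run_one(i, fn) for i, fn in coro_fns_by_idx.items())
    )
    return dict(pairs)

async def demo():
    async def fake_tts(i):
        # Later segments finish first, to show order is still preserved.
        await asyncio.sleep(0.01 * (4 - i))
        return f"seg{i}.wav"

    results = await run_bounded(
        {i: (lambda i=i: fake_tts(i)) for i in range(4)}, limit=2
    )
    return [results[i] for i in sorted(results)]
```

The `lambda i=i` default-argument binding matters: without it every factory would capture the same loop variable.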


@ -1,170 +0,0 @@
import asyncio
import time
import logging
from typing import Optional
import gc
import os

_proc = None
try:
    import psutil  # type: ignore
    _proc = psutil.Process(os.getpid())
except Exception:
    psutil = None  # type: ignore

def _rss_mb() -> float:
    """Return current process RSS in MB, or -1.0 if unavailable."""
    global _proc
    try:
        if _proc is None and psutil is not None:
            _proc = psutil.Process(os.getpid())
        if _proc is not None:
            return _proc.memory_info().rss / (1024 * 1024)
    except Exception:
        return -1.0
    return -1.0

try:
    import torch  # Optional; used for cache cleanup metrics
except Exception:  # pragma: no cover - torch may not be present in some envs
    torch = None  # type: ignore

from app import config
from app.services.tts_service import TTSService

logger = logging.getLogger(__name__)

class ModelManager:
    _instance: Optional["ModelManager"] = None

    def __init__(self):
        self._service: Optional[TTSService] = None
        self._last_used: float = time.time()
        self._active: int = 0
        self._lock = asyncio.Lock()
        self._counter_lock = asyncio.Lock()

    @classmethod
    def instance(cls) -> "ModelManager":
        if not cls._instance:
            cls._instance = cls()
        return cls._instance

    async def _ensure_service(self) -> None:
        if self._service is None:
            # Use configured device, default is handled by TTSService itself
            device = getattr(config, "DEVICE", "auto")
            # TTSService presently expects explicit device like "mps"/"cpu"/"cuda"; map "auto" to "mps" on Mac otherwise cpu
            if device == "auto":
                try:
                    import torch
                    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
                        device = "mps"
                    elif torch.cuda.is_available():
                        device = "cuda"
                    else:
                        device = "cpu"
                except Exception:
                    device = "cpu"
            self._service = TTSService(device=device)

    async def load(self) -> None:
        async with self._lock:
            await self._ensure_service()
            if self._service and self._service.model is None:
                before_mb = _rss_mb()
                logger.info(
                    "Loading TTS model (device=%s)... (rss_before=%.1f MB)",
                    self._service.device,
                    before_mb,
                )
                self._service.load_model()
                after_mb = _rss_mb()
                if after_mb >= 0 and before_mb >= 0:
                    logger.info(
                        "TTS model loaded (rss_after=%.1f MB, delta=%.1f MB)",
                        after_mb,
                        after_mb - before_mb,
                    )
            self._last_used = time.time()

    async def unload(self) -> None:
        async with self._lock:
            if not self._service:
                return
            if self._active > 0:
                logger.debug("Skip unload: %d active operations", self._active)
                return
            if self._service.model is not None:
                before_mb = _rss_mb()
                logger.info(
                    "Unloading idle TTS model... (rss_before=%.1f MB, active=%d)",
                    before_mb,
                    self._active,
                )
                self._service.unload_model()
                # Drop the service instance as well to release any lingering refs
                self._service = None
                # Force GC and attempt allocator cache cleanup
                try:
                    gc.collect()
                finally:
                    if torch is not None:
                        try:
                            if hasattr(torch, "cuda") and torch.cuda.is_available():
                                torch.cuda.empty_cache()
                        except Exception:
                            logger.debug("cuda.empty_cache() failed", exc_info=True)
                        try:
                            # MPS empty_cache may exist depending on torch version
                            mps = getattr(torch, "mps", None)
                            if mps is not None and hasattr(mps, "empty_cache"):
                                mps.empty_cache()
                        except Exception:
                            logger.debug("mps.empty_cache() failed", exc_info=True)
                after_mb = _rss_mb()
                if after_mb >= 0 and before_mb >= 0:
                    logger.info(
                        "Idle unload complete (rss_after=%.1f MB, delta=%.1f MB)",
                        after_mb,
                        after_mb - before_mb,
                    )
            self._last_used = time.time()

    async def get_service(self) -> TTSService:
        if not self._service or self._service.model is None:
            await self.load()
        self._last_used = time.time()
        return self._service  # type: ignore[return-value]

    async def _inc(self) -> None:
        async with self._counter_lock:
            self._active += 1

    async def _dec(self) -> None:
        async with self._counter_lock:
            self._active = max(0, self._active - 1)
            self._last_used = time.time()

    def last_used(self) -> float:
        return self._last_used

    def is_loaded(self) -> bool:
        return bool(self._service and self._service.model is not None)

    def active(self) -> int:
        return self._active

    def using(self):
        manager = self

        class _Ctx:
            async def __aenter__(self):
                await manager._inc()
                return manager

            async def __aexit__(self, exc_type, exc, tb):
                await manager._dec()

        return _Ctx()
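`using()` is what lets the idle reaper distinguish busy from idle: each request increments an active counter on entry and decrements it (refreshing the last-used timestamp) on exit, even on error. A hedged, standalone sketch of the same pattern (the class name `UsageTracker` is invented for illustration):

```python
import asyncio
from contextlib import asynccontextmanager

class UsageTracker:
    def __init__(self):
        self.active = 0
        self._lock = asyncio.Lock()

    @asynccontextmanager
    async def using(self):
        # Increment on entry, decrement in finally so the count is
        # correct even if the wrapped request raises.
        async with self._lock:
            self.active += 1
        try:
            yield self
        finally:
            async with self._lock:
                self.active = max(0, self.active - 1)

async def demo():
    tracker = UsageTracker()
    async with tracker.using():
        inside = tracker.active
    return inside, tracker.active
```

`contextlib.asynccontextmanager` would also be a lighter alternative to the hand-rolled `_Ctx` class above, at the cost of creating a new generator per call.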


@ -1,14 +1,11 @@
import torch
import torchaudio
import asyncio
from typing import Optional
from chatterbox.tts import ChatterboxTTS
from pathlib import Path
import gc  # Garbage collector for memory management
import os
from contextlib import contextmanager
from datetime import datetime
import time

# Import configuration
try:
@ -117,51 +114,41 @@ class TTSService:
        # output_filename_base from DialogProcessorService is expected to be comprehensive (e.g., includes speaker_id, segment info)
        output_file_path = target_output_dir / f"{output_filename_base}.wav"
-        start_ts = datetime.now()
-        print(f"[{start_ts.isoformat(timespec='seconds')}] [TTS] START generate+save base={output_filename_base} len={len(text)} sample={speaker_sample_path}")
-        try:
-            def _gen_and_save() -> Path:
-                t0 = time.perf_counter()
-                wav = None
-                try:
-                    with torch.no_grad():  # Important for inference
-                        wav = self.model.generate(
-                            text=text,
-                            audio_prompt_path=str(speaker_sample_p),  # Must be a string path
-                            exaggeration=exaggeration,
-                            cfg_weight=cfg_weight,
-                            temperature=temperature,
-                        )
-                    # Save the audio synchronously in the same thread
-                    torchaudio.save(str(output_file_path), wav, self.model.sr)
-                    t1 = time.perf_counter()
-                    print(f"[TTS-THREAD] Saved {output_file_path.name} in {t1 - t0:.2f}s")
-                    return output_file_path
-                finally:
-                    # Cleanup in the same thread that created the tensor
-                    if wav is not None:
-                        del wav
-                    gc.collect()
-                    if self.device == "cuda":
-                        torch.cuda.empty_cache()
-                    elif self.device == "mps":
-                        if hasattr(torch.mps, "empty_cache"):
-                            torch.mps.empty_cache()
-            out_path = await asyncio.to_thread(_gen_and_save)
-            end_ts = datetime.now()
-            print(f"[{end_ts.isoformat(timespec='seconds')}] [TTS] END generate+save base={output_filename_base} dur={(end_ts - start_ts).total_seconds():.2f}s -> {out_path}")
-            # Optionally unload model after generation
-            if unload_after:
-                print("Unloading TTS model after generation...")
-                self.unload_model()
-            return out_path
-        except Exception as e:
-            print(f"Error during TTS generation or saving: {e}")
-            raise
+        print(f"Generating audio for text: \"{text[:50]}...\" with speaker sample: {speaker_sample_path}")
+        wav = None
+        try:
+            with torch.no_grad():  # Important for inference
+                wav = self.model.generate(
+                    text=text,
+                    audio_prompt_path=str(speaker_sample_p),  # Must be a string path
+                    exaggeration=exaggeration,
+                    cfg_weight=cfg_weight,
+                    temperature=temperature,
+                )

+            torchaudio.save(str(output_file_path), wav, self.model.sr)
+            print(f"Audio saved to: {output_file_path}")
+            return output_file_path
+        except Exception as e:
+            print(f"Error during TTS generation or saving: {e}")
+            raise
+        finally:
+            # Explicitly delete the wav tensor to free memory
+            if wav is not None:
+                del wav
+            # Force garbage collection and cache cleanup
+            gc.collect()
+            if self.device == "cuda":
+                torch.cuda.empty_cache()
+            elif self.device == "mps":
+                if hasattr(torch.mps, "empty_cache"):
+                    torch.mps.empty_cache()
+            # Unload the model if requested
+            if unload_after:
+                print("Unloading TTS model after generation...")
+                self.unload_model()
# Example usage (for testing, not part of the service itself)
if __name__ == "__main__":
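The key move in the main branch's `generate_speech` is pushing the blocking work — model inference and `torchaudio.save` — into `asyncio.to_thread`, so the event loop stays free to schedule other TTS chunks while one file is being written. A minimal sketch of the offload, with a plain `write_bytes` standing in for the real save call:

```python
import asyncio
from pathlib import Path

def save_blocking(path: Path, data: bytes) -> Path:
    # Stand-in for torchaudio.save(): a blocking disk write.
    path.write_bytes(data)
    return path

async def generate_and_save(out_dir: Path) -> Path:
    # Run the blocking save in a worker thread; the event loop can keep
    # running other coroutines until the write completes.
    out_path = out_dir / "segment0.wav"
    return await asyncio.to_thread(save_blocking, out_path, b"\x00" * 16)
```

Doing the tensor cleanup inside the same worker thread (as the main branch does in `_gen_and_save`) avoids freeing device memory from a thread that never owned the tensor.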


@ -14,14 +14,6 @@ if __name__ == "__main__":
    print(f"CORS Origins: {config.CORS_ORIGINS}")
    print(f"Project Root: {config.PROJECT_ROOT}")
    print(f"Device: {config.DEVICE}")
    # Idle eviction settings
    print(
        "Model Eviction -> enabled: {} | idle_timeout: {}s | check_interval: {}s".format(
            getattr(config, "MODEL_EVICTION_ENABLED", True),
            getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0),
            getattr(config, "MODEL_IDLE_CHECK_INTERVAL_SECONDS", 60),
        )
    )
    uvicorn.run(
        "app.main:app",


@ -1,2 +0,0 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/antinomyhq/forge/refs/heads/main/forge.schema.json
model: qwen/qwen3-coder


@ -24,7 +24,7 @@
  --text-blue-darker: #205081;

  /* Border Colors */
-  --border-light: #e5e7eb;
+  --border-light: #1b0404;
  --border-medium: #cfd8dc;
  --border-blue: #b5c6df;
  --border-gray: #e3e3e3;
@ -55,7 +55,7 @@ body {
}

.container {
-  max-width: 1280px;
+  max-width: 1100px;
  margin: 0 auto;
  padding: 0 18px;
}
@ -134,17 +134,6 @@ main {
  font-size: 1rem;
}
/* Allow wrapping for Text/Duration (3rd) column */
#dialog-items-table td:nth-child(3),
#dialog-items-table td.dialog-editable-cell {
  white-space: pre-wrap;      /* wrap text and preserve newlines */
  overflow: visible;          /* override global overflow hidden */
  text-overflow: clip;        /* no ellipsis */
  word-break: break-word;     /* wrap long words/URLs */
  color: var(--text-primary); /* darker text for readability */
  font-weight: 350;           /* slightly heavier than 300, lighter than 400 */
}
/* Make the Speaker (2nd) column narrower */
#dialog-items-table th:nth-child(2), #dialog-items-table td:nth-child(2) {
  width: 60px;
@ -153,11 +142,11 @@ main {
  text-align: center;
}

-/* Actions (4th) column sizing */
+/* Make the Actions (4th) column narrower */
#dialog-items-table th:nth-child(4), #dialog-items-table td:nth-child(4) {
-  width: 200px;
-  min-width: 180px;
-  max-width: 280px;
+  width: 110px;
+  min-width: 90px;
+  max-width: 130px;
  text-align: left;
  padding-left: 0;
  padding-right: 0;
@ -197,22 +186,8 @@ main {
#dialog-items-table td.actions {
  text-align: left;
-  min-width: 200px;
-  white-space: normal;   /* allow wrapping so we don't see ellipsis */
-  overflow: visible;     /* override table cell default from global rule */
-  text-overflow: clip;   /* no ellipsis */
-}
-
-/* Allow wrapping of action buttons on smaller screens */
-@media (max-width: 900px) {
-  #dialog-items-table th:nth-child(4), #dialog-items-table td:nth-child(4) {
-    width: auto;
-    min-width: 160px;
-    max-width: none;
-  }
-  #dialog-items-table td.actions {
-    white-space: normal;
-  }
-}
+  min-width: 110px;
+  white-space: nowrap;
+}
/* Collapsible log details */
@ -371,7 +346,7 @@ button {
  margin-right: 10px;
}

-.generate-line-btn, .play-line-btn, .stop-line-btn {
+.generate-line-btn, .play-line-btn {
  background: var(--bg-blue-light);
  color: var(--text-blue);
  border: 1.5px solid var(--border-blue);
@ -388,7 +363,7 @@ button {
  vertical-align: middle;
}

-.generate-line-btn:disabled, .play-line-btn:disabled, .stop-line-btn:disabled {
+.generate-line-btn:disabled, .play-line-btn:disabled {
  opacity: 0.45;
  cursor: not-allowed;
}
@ -399,7 +374,7 @@ button {
  border-color: var(--warning-border);
}

-.generate-line-btn:hover, .play-line-btn:hover, .stop-line-btn:hover {
+.generate-line-btn:hover, .play-line-btn:hover {
  background: var(--bg-blue-lighter);
  color: var(--text-blue-darker);
  border-color: var(--text-blue);
@ -474,72 +449,6 @@ footer {
  border-top: 3px solid var(--primary-blue);
}

/* Inline Notification */
.notice {
  max-width: 1280px;
  margin: 16px auto 0;
  padding: 12px 16px;
  border-radius: 6px;
  border: 1px solid var(--border-medium);
  background: var(--bg-white);
  color: var(--text-primary);
  display: flex;
  align-items: center;
  gap: 12px;
  box-shadow: 0 1px 2px var(--shadow-light);
}
.notice--info {
  border-color: var(--border-blue);
  background: var(--bg-blue-light);
}
.notice--success {
  border-color: #A7F3D0;
  background: #ECFDF5;
}
.notice--warning {
  border-color: var(--warning-border);
  background: var(--warning-bg);
}
.notice--error {
  border-color: var(--error-bg-dark);
  background: #FEE2E2;
}
.notice__content {
  flex: 1;
}
.notice__actions {
  display: flex;
  gap: 8px;
}
.notice__actions button {
  padding: 6px 12px;
  border-radius: 4px;
  border: 1px solid var(--border-medium);
  background: var(--bg-white);
  cursor: pointer;
}
.notice__actions .btn-primary {
  background: var(--primary-blue);
  color: var(--text-white);
  border: none;
}
.notice__close {
  background: none;
  border: none;
  font-size: 18px;
  cursor: pointer;
  color: var(--text-secondary);
}
@media (max-width: 900px) {
  .panel-grid {
    flex-direction: column;


@ -11,38 +11,8 @@
    <div class="container">
      <h1>Chatterbox TTS</h1>
    </div>
    <!-- Paste Script Modal -->
    <div id="paste-script-modal" class="modal" style="display: none;">
      <div class="modal-content">
        <div class="modal-header">
          <h3>Paste Dialog Script</h3>
          <button class="modal-close" id="paste-script-close">&times;</button>
        </div>
        <div class="modal-body">
          <p>Paste JSONL content (one JSON object per line). Example lines:</p>
          <pre style="white-space:pre-wrap; background:#f6f8fa; padding:8px; border-radius:4px;">
{"type":"speech","speaker_id":"alice","text":"Hello there!"}
{"type":"silence","duration":0.5}
{"type":"speech","speaker_id":"bob","text":"Hi!"}
          </pre>
          <textarea id="paste-script-text" rows="10" style="width:100%;" placeholder='Paste JSONL here'></textarea>
        </div>
        <div class="modal-footer">
          <button id="paste-script-load" class="btn-primary">Load</button>
          <button id="paste-script-cancel" class="btn-secondary">Cancel</button>
        </div>
      </div>
    </div>
</header> </header>
<!-- Global inline notification area -->
<div id="global-notice" class="notice" role="status" aria-live="polite" style="display:none;">
<div class="notice__content" id="global-notice-content"></div>
<div class="notice__actions" id="global-notice-actions"></div>
<button class="notice__close" id="global-notice-close" aria-label="Close notification">&times;</button>
</div>
<main class="container" role="main"> <main class="container" role="main">
<div class="panel-grid"> <div class="panel-grid">
<section id="dialog-editor" class="panel full-width-panel" aria-labelledby="dialog-editor-title"> <section id="dialog-editor" class="panel full-width-panel" aria-labelledby="dialog-editor-title">
@ -78,7 +48,6 @@
<button id="save-script-btn">Save Script</button> <button id="save-script-btn">Save Script</button>
<input type="file" id="load-script-input" accept=".jsonl" style="display: none;"> <input type="file" id="load-script-input" accept=".jsonl" style="display: none;">
<button id="load-script-btn">Load Script</button> <button id="load-script-btn">Load Script</button>
<button id="paste-script-btn">Paste Script</button>
</div> </div>
</section> </section>
</div> </div>
@ -132,8 +101,8 @@
</div> </div>
</footer> </footer>
<!-- TTS Settings Modal --> <!-- TTS Settings Modal -->
<div id="tts-settings-modal" class="modal" style="display: none;"> <div id="tts-settings-modal" class="modal" style="display: none;">
<div class="modal-content"> <div class="modal-content">
<div class="modal-header"> <div class="modal-header">
<h3>TTS Settings</h3> <h3>TTS Settings</h3>
View File
@@ -10,7 +10,7 @@ const API_BASE_URL = API_BASE_URL_WITH_PREFIX;
 * @throws {Error} If the network response is not ok.
 */
 export async function getSpeakers() {
-const response = await fetch(`${API_BASE_URL}/speakers`);
+const response = await fetch(`${API_BASE_URL}/speakers/`);
 if (!response.ok) {
 const errorData = await response.json().catch(() => ({ message: response.statusText }));
 throw new Error(`Failed to fetch speakers: ${errorData.detail || errorData.message || response.statusText}`);
@@ -26,12 +26,12 @@ export async function getSpeakers() {
 * Adds a new speaker.
 * @param {FormData} formData - The form data containing speaker name and audio file.
 * Example: formData.append('name', 'New Speaker');
-* formData.append('audio_file', fileInput.files[0]);
+* formData.append('audio_sample_file', fileInput.files[0]);
 * @returns {Promise<Object>} A promise that resolves to the new speaker object.
 * @throws {Error} If the network response is not ok.
 */
 export async function addSpeaker(formData) {
-const response = await fetch(`${API_BASE_URL}/speakers`, {
+const response = await fetch(`${API_BASE_URL}/speakers/`, {
 method: 'POST',
 body: formData, // FormData sets Content-Type to multipart/form-data automatically
 });
@@ -86,7 +86,7 @@ export async function addSpeaker(formData) {
 * @throws {Error} If the network response is not ok.
 */
 export async function deleteSpeaker(speakerId) {
-const response = await fetch(`${API_BASE_URL}/speakers/${speakerId}`, {
+const response = await fetch(`${API_BASE_URL}/speakers/${speakerId}/`, {
 method: 'DELETE',
 });
 if (!response.ok) {
@@ -124,8 +124,18 @@ export async function generateLine(line) {
 const errorData = await response.json().catch(() => ({ message: response.statusText }));
 throw new Error(`Failed to generate line audio: ${errorData.detail || errorData.message || response.statusText}`);
 }
-const data = await response.json();
-return data;
+const responseText = await response.text();
+console.log('Raw response text:', responseText);
+try {
+const jsonData = JSON.parse(responseText);
+console.log('Parsed JSON:', jsonData);
+return jsonData;
+} catch (parseError) {
+console.error('JSON parse error:', parseError);
+throw new Error(`Invalid JSON response: ${responseText}`);
+}
 }
 /**
@@ -136,7 +146,7 @@ export async function generateLine(line) {
 * output_base_name: "my_dialog",
 * dialog_items: [
 * { type: "speech", speaker_id: "speaker1", text: "Hello world.", exaggeration: 1.0, cfg_weight: 2.0, temperature: 0.7 },
-* { type: "silence", duration: 0.5 },
+* { type: "silence", duration_ms: 500 },
 * { type: "speech", speaker_id: "speaker2", text: "How are you?" }
 * ]
 * }
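The `generateLine` change above reads the response body as text before parsing, so a non-JSON payload (e.g. an HTML error page from a proxy) surfaces as a readable message instead of an opaque stream error. A minimal standalone sketch of that pattern; the `parseJsonResponse` helper name is hypothetical and not part of the diff:

```javascript
// Hypothetical helper mirroring the defensive parsing in generateLine:
// parse already-read body text, and include the raw payload in the error
// when it is not valid JSON.
function parseJsonResponse(responseText) {
  try {
    return JSON.parse(responseText);
  } catch (parseError) {
    throw new Error(`Invalid JSON response: ${responseText}`);
  }
}
```

Reading with `response.text()` first keeps the body available for the error message; calling `response.json()` directly would consume the stream and report only a generic parse failure.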
View File
@@ -1,69 +1,6 @@
 import { getSpeakers, addSpeaker, deleteSpeaker, generateDialog } from './api.js';
 import { API_BASE_URL, API_BASE_URL_FOR_FILES } from './config.js';
-// Shared per-line audio playback state to prevent overlapping playback
-let currentLineAudio = null;
-let currentLinePlayBtn = null;
-let currentLineStopBtn = null;
-// --- Global Inline Notification Helpers --- //
-const noticeEl = document.getElementById('global-notice');
-const noticeContentEl = document.getElementById('global-notice-content');
-const noticeActionsEl = document.getElementById('global-notice-actions');
-const noticeCloseBtn = document.getElementById('global-notice-close');
-function hideNotice() {
-if (!noticeEl) return;
-noticeEl.style.display = 'none';
-noticeEl.className = 'notice';
-if (noticeContentEl) noticeContentEl.textContent = '';
-if (noticeActionsEl) noticeActionsEl.innerHTML = '';
-}
-function showNotice(message, type = 'info', options = {}) {
-if (!noticeEl || !noticeContentEl || !noticeActionsEl) {
-console[type === 'error' ? 'error' : 'log']('[NOTICE]', message);
-return () => {};
-}
-const { timeout = null, actions = [] } = options;
-noticeEl.className = `notice notice--${type}`;
-noticeContentEl.textContent = message;
-noticeActionsEl.innerHTML = '';
-actions.forEach(({ text, primary = false, onClick }) => {
-const btn = document.createElement('button');
-btn.textContent = text;
-if (primary) btn.classList.add('btn-primary');
-btn.onclick = () => {
-try { onClick && onClick(); } finally { hideNotice(); }
-};
-noticeActionsEl.appendChild(btn);
-});
-if (noticeCloseBtn) noticeCloseBtn.onclick = hideNotice;
-noticeEl.style.display = 'flex';
-let timerId = null;
-if (timeout && Number.isFinite(timeout)) {
-timerId = window.setTimeout(hideNotice, timeout);
-}
-return () => {
-if (timerId) window.clearTimeout(timerId);
-hideNotice();
-};
-}
-function confirmAction(message) {
-return new Promise((resolve) => {
-showNotice(message, 'warning', {
-actions: [
-{ text: 'Cancel', primary: false, onClick: () => resolve(false) },
-{ text: 'Confirm', primary: true, onClick: () => resolve(true) },
-],
-});
-});
-}
 document.addEventListener('DOMContentLoaded', async () => {
 console.log('DOM fully loaded and parsed');
 initializeSpeakerManagement();
@@ -86,24 +23,18 @@ function initializeSpeakerManagement() {
 const audioFile = formData.get('audio_file');
 if (!speakerName || !audioFile || audioFile.size === 0) {
-showNotice('Please provide a speaker name and an audio file.', 'warning', { timeout: 4000 });
+alert('Please provide a speaker name and an audio file.');
 return;
 }
 try {
-const submitBtn = addSpeakerForm.querySelector('button[type="submit"]');
-const prevText = submitBtn ? submitBtn.textContent : null;
-if (submitBtn) { submitBtn.disabled = true; submitBtn.textContent = 'Adding…'; }
 const newSpeaker = await addSpeaker(formData);
-showNotice(`Speaker added: ${newSpeaker.name} (ID: ${newSpeaker.id})`, 'success', { timeout: 3000 });
+alert(`Speaker added: ${newSpeaker.name} (ID: ${newSpeaker.id})`);
 addSpeakerForm.reset();
 loadSpeakers(); // Refresh speaker list
 } catch (error) {
 console.error('Failed to add speaker:', error);
-showNotice('Error adding speaker: ' + error.message, 'error');
-} finally {
-const submitBtn = addSpeakerForm.querySelector('button[type="submit"]');
-if (submitBtn) { submitBtn.disabled = false; submitBtn.textContent = 'Add Speaker'; }
+alert('Error adding speaker: ' + error.message);
 }
 });
 }
@@ -148,24 +79,23 @@ async function loadSpeakers() {
 } catch (error) {
 console.error('Failed to load speakers:', error);
 speakerListUL.innerHTML = '<li>Error loading speakers. See console for details.</li>';
-showNotice('Error loading speakers: ' + error.message, 'error');
+alert('Error loading speakers: ' + error.message);
 }
 }
 async function handleDeleteSpeaker(speakerId) {
 if (!speakerId) {
-showNotice('Cannot delete speaker: Speaker ID is missing.', 'warning', { timeout: 4000 });
+alert('Cannot delete speaker: Speaker ID is missing.');
 return;
 }
-const ok = await confirmAction(`Are you sure you want to delete speaker ${speakerId}?`);
-if (!ok) return;
+if (!confirm(`Are you sure you want to delete speaker ${speakerId}?`)) return;
 try {
 await deleteSpeaker(speakerId);
-showNotice(`Speaker ${speakerId} deleted successfully.`, 'success', { timeout: 3000 });
+alert(`Speaker ${speakerId} deleted successfully.`);
 loadSpeakers(); // Refresh speaker list
 } catch (error) {
 console.error(`Failed to delete speaker ${speakerId}:`, error);
-showNotice(`Error deleting speaker: ${error.message}`, 'error');
+alert(`Error deleting speaker: ${error.message}`);
 }
 }
@@ -201,12 +131,6 @@ async function initializeDialogEditor() {
 const saveScriptBtn = document.getElementById('save-script-btn');
 const loadScriptBtn = document.getElementById('load-script-btn');
 const loadScriptInput = document.getElementById('load-script-input');
-const pasteScriptBtn = document.getElementById('paste-script-btn');
-const pasteModal = document.getElementById('paste-script-modal');
-const pasteText = document.getElementById('paste-script-text');
-const pasteLoadBtn = document.getElementById('paste-script-load');
-const pasteCancelBtn = document.getElementById('paste-script-cancel');
-const pasteCloseBtn = document.getElementById('paste-script-close');
 // Results Display Elements
 const generationLogPre = document.getElementById('generation-log-content'); // Corrected ID
@@ -216,6 +140,9 @@
 const zipArchivePlaceholder = document.getElementById('zip-archive-placeholder');
 const resultsDisplaySection = document.getElementById('results-display');
+let dialogItems = [];
+let availableSpeakersCache = []; // Cache for speaker names and IDs
 // Load speakers at startup
 try {
 availableSpeakersCache = await getSpeakers();
@@ -225,48 +152,6 @@
 // Continue without speakers - they'll be loaded when needed
 }
-// --- LocalStorage persistence helpers ---
-const LS_KEY = 'dialogEditor.items.v1';
-function saveDialogToLocalStorage() {
-try {
-const exportData = dialogItems.map(item => {
-const obj = { type: item.type };
-if (item.type === 'speech') {
-obj.speaker_id = item.speaker_id;
-obj.text = item.text;
-if (item.exaggeration !== undefined) obj.exaggeration = item.exaggeration;
-if (item.cfg_weight !== undefined) obj.cfg_weight = item.cfg_weight;
-if (item.temperature !== undefined) obj.temperature = item.temperature;
-if (item.audioUrl) obj.audioUrl = item.audioUrl; // keep existing audio reference if present
-} else if (item.type === 'silence') {
-obj.duration = item.duration;
-}
-return obj;
-});
-localStorage.setItem(LS_KEY, JSON.stringify({ items: exportData }));
-} catch (e) {
-console.warn('Failed to save dialog to localStorage:', e);
-}
-}
-function loadDialogFromLocalStorage() {
-try {
-const raw = localStorage.getItem(LS_KEY);
-if (!raw) return;
-const parsed = JSON.parse(raw);
-if (!parsed || !Array.isArray(parsed.items)) return;
-const loaded = parsed.items.map(normalizeDialogItem);
-dialogItems.splice(0, dialogItems.length, ...loaded);
-console.log(`Restored ${loaded.length} dialog items from localStorage`);
-} catch (e) {
-console.warn('Failed to load dialog from localStorage:', e);
-}
-}
-// Attempt to restore saved dialog before first render
-loadDialogFromLocalStorage();
 // Function to render the current dialogItems array to the DOM as table rows
 function renderDialogItems() {
 if (!dialogItemsContainer) return;
@@ -299,8 +184,6 @@
 });
 speakerSelect.onchange = (e) => {
 dialogItems[index].speaker_id = e.target.value;
-// Persist change
-saveDialogToLocalStorage();
 };
 speakerTd.appendChild(speakerSelect);
 } else {
@@ -312,7 +195,8 @@
 const textTd = document.createElement('td');
 textTd.className = 'dialog-editable-cell';
 if (item.type === 'speech') {
-textTd.textContent = `"${item.text}"`;
+let txt = item.text.length > 60 ? item.text.substring(0, 57) + '…' : item.text;
+textTd.textContent = `"${txt}"`;
 textTd.title = item.text;
 } else {
 textTd.textContent = `${item.duration}s`;
@@ -359,8 +243,6 @@
 if (!isNaN(val) && val > 0) dialogItems[index].duration = val;
 dialogItems[index].audioUrl = null;
 }
-// Persist changes before re-render
-saveDialogToLocalStorage();
 renderDialogItems();
 }
 };
@@ -379,7 +261,6 @@
 upBtn.onclick = () => {
 if (index > 0) {
 [dialogItems[index - 1], dialogItems[index]] = [dialogItems[index], dialogItems[index - 1]];
-saveDialogToLocalStorage();
 renderDialogItems();
 }
 };
@@ -394,7 +275,6 @@
 downBtn.onclick = () => {
 if (index < dialogItems.length - 1) {
 [dialogItems[index], dialogItems[index + 1]] = [dialogItems[index + 1], dialogItems[index]];
-saveDialogToLocalStorage();
 renderDialogItems();
 }
 };
@@ -408,7 +288,6 @@
 removeBtn.title = 'Remove';
 removeBtn.onclick = () => {
 dialogItems.splice(index, 1);
-saveDialogToLocalStorage();
 renderDialogItems();
 };
 actionsTd.appendChild(removeBtn);
@@ -435,8 +314,6 @@
 if (result && result.audio_url) {
 dialogItems[index].audioUrl = result.audio_url;
 console.log('Set audioUrl to:', result.audio_url);
-// Persist newly generated audio reference
-saveDialogToLocalStorage();
 } else {
 console.error('Invalid result structure:', result);
 throw new Error('Invalid response: missing audio_url');
@@ -444,7 +321,7 @@
 } catch (err) {
 console.error('Error in generateLine:', err);
 dialogItems[index].error = err.message || 'Failed to generate audio.';
-showNotice(dialogItems[index].error, 'error');
+alert(dialogItems[index].error);
 } finally {
 dialogItems[index].isGenerating = false;
 renderDialogItems();
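The text-cell hunk above inlines a 60-character truncation rule: strings over the limit are cut three characters short of it and suffixed with an ellipsis. Factored out as a pure helper for illustration (the `truncateText` name is hypothetical, present in neither branch):

```javascript
// Hypothetical helper equivalent to the dev-branch text-cell truncation:
// keep strings up to `max` characters; longer ones become (max - 3) chars + '…'.
function truncateText(text, max = 60) {
  return text.length > max ? text.substring(0, max - 3) + '…' : text;
}
```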
@@ -453,107 +330,19 @@
 actionsTd.appendChild(generateBtn);
 // --- NEW: Per-line Play button ---
-const playPauseBtn = document.createElement('button');
-playPauseBtn.innerHTML = '⏵';
-playPauseBtn.title = item.audioUrl ? 'Play' : 'No audio generated yet';
-playPauseBtn.className = 'play-line-btn';
-playPauseBtn.disabled = !item.audioUrl;
-const stopBtn = document.createElement('button');
-stopBtn.innerHTML = '⏹';
-stopBtn.title = 'Stop';
-stopBtn.className = 'stop-line-btn';
-stopBtn.disabled = !item.audioUrl;
-const setBtnStatesForPlaying = () => {
-try {
-playPauseBtn.innerHTML = '⏸';
-playPauseBtn.title = 'Pause';
-stopBtn.disabled = false;
-} catch (e) { /* detached */ }
-};
-const setBtnStatesForPausedOrStopped = () => {
-try {
-playPauseBtn.innerHTML = '⏵';
-playPauseBtn.title = 'Play';
-} catch (e) { /* detached */ }
-};
-const stopCurrent = () => {
-if (currentLineAudio) {
-try { currentLineAudio.pause(); currentLineAudio.currentTime = 0; } catch (e) { /* noop */ }
-}
-if (currentLinePlayBtn) {
-try { currentLinePlayBtn.innerHTML = '⏵'; currentLinePlayBtn.title = 'Play'; } catch (e) { /* detached */ }
-}
-if (currentLineStopBtn) {
-try { currentLineStopBtn.disabled = true; } catch (e) { /* detached */ }
-}
-currentLineAudio = null;
-currentLinePlayBtn = null;
-currentLineStopBtn = null;
-};
-playPauseBtn.onclick = () => {
+const playBtn = document.createElement('button');
+playBtn.innerHTML = '⏵';
+playBtn.title = item.audioUrl ? 'Play generated audio' : 'No audio generated yet';
+playBtn.className = 'play-line-btn';
+playBtn.disabled = !item.audioUrl;
+playBtn.onclick = () => {
 if (!item.audioUrl) return;
-const audioUrl = item.audioUrl.startsWith('http') ? item.audioUrl : `${API_BASE_URL_FOR_FILES}${item.audioUrl}`;
-// Use a shared audio element or create one per play
-// If controlling the same line
-if (currentLineAudio && currentLinePlayBtn === playPauseBtn) {
-if (currentLineAudio.paused) {
-// Resume
-currentLineAudio.play().then(() => setBtnStatesForPlaying()).catch(err => {
-console.error('Audio resume failed:', err);
-showNotice('Could not resume audio.', 'error', { timeout: 2000 });
-});
-} else {
-// Pause
-try { currentLineAudio.pause(); } catch (e) { /* noop */ }
-setBtnStatesForPausedOrStopped();
-}
-return;
-}
-// Switching to a different line: stop previous
-if (currentLineAudio) {
-stopCurrent();
-}
-// Start new audio
-const audio = new window.Audio(audioUrl);
-currentLineAudio = audio;
-currentLinePlayBtn = playPauseBtn;
-currentLineStopBtn = stopBtn;
-const clearState = () => {
-if (currentLineAudio === audio) {
-setBtnStatesForPausedOrStopped();
-try { stopBtn.disabled = true; } catch (e) { /* detached */ }
-currentLineAudio = null;
-currentLinePlayBtn = null;
-currentLineStopBtn = null;
-}
-};
-audio.addEventListener('ended', clearState, { once: true });
-audio.addEventListener('error', clearState, { once: true });
-audio.play().then(() => setBtnStatesForPlaying()).catch(err => {
-console.error('Audio play failed:', err);
-clearState();
-showNotice('Could not play audio.', 'error', { timeout: 2000 });
-});
+let audioUrl = item.audioUrl.startsWith('http') ? item.audioUrl : `${API_BASE_URL_FOR_FILES}${item.audioUrl}`;
+let audio = new window.Audio(audioUrl);
+audio.play();
 };
-stopBtn.onclick = () => {
-// Only acts if this line is the active one
-if (currentLineAudio && currentLinePlayBtn === playPauseBtn) {
-stopCurrent();
-}
-};
-actionsTd.appendChild(playPauseBtn);
-actionsTd.appendChild(stopBtn);
+actionsTd.appendChild(playBtn);
 // --- NEW: Settings button for speech items ---
 if (item.type === 'speech') {
@@ -594,13 +383,13 @@
 try {
 availableSpeakersCache = await getSpeakers();
 } catch (error) {
-showNotice('Could not load speakers. Please try again.', 'error');
+alert('Could not load speakers. Please try again.');
 console.error('Error fetching speakers for dialog:', error);
 return;
 }
 }
 if (availableSpeakersCache.length === 0) {
-showNotice('No speakers available. Please add a speaker first.', 'warning', { timeout: 4000 });
+alert('No speakers available. Please add a speaker first.');
 return;
 }
@@ -630,11 +419,10 @@
 const speakerId = speakerSelect.value;
 const text = textInput.value.trim();
 if (!speakerId || !text) {
-showNotice('Please select a speaker and enter text.', 'warning', { timeout: 4000 });
+alert('Please select a speaker and enter text.');
 return;
 }
 dialogItems.push(normalizeDialogItem({ type: 'speech', speaker_id: speakerId, text: text }));
-saveDialogToLocalStorage();
 renderDialogItems();
 clearTempInputArea();
 };
@@ -673,11 +461,10 @@
 addButton.onclick = () => {
 const duration = parseFloat(durationInput.value);
 if (isNaN(duration) || duration <= 0) {
-showNotice('Invalid duration. Please enter a positive number.', 'warning', { timeout: 4000 });
+alert('Invalid duration. Please enter a positive number.');
 return;
 }
 dialogItems.push(normalizeDialogItem({ type: 'silence', duration: duration }));
-saveDialogToLocalStorage();
 renderDialogItems();
 clearTempInputArea();
 };
@@ -699,18 +486,15 @@
 generateDialogBtn.addEventListener('click', async () => {
 const outputBaseName = outputBaseNameInput.value.trim();
 if (!outputBaseName) {
-showNotice('Please enter an output base name.', 'warning', { timeout: 4000 });
+alert('Please enter an output base name.');
 outputBaseNameInput.focus();
 return;
 }
 if (dialogItems.length === 0) {
-showNotice('Please add at least one speech or silence line to the dialog.', 'warning', { timeout: 4000 });
+alert('Please add at least one speech or silence line to the dialog.');
 return; // Prevent further execution if no dialog items
 }
-const prevText = generateDialogBtn.textContent;
-generateDialogBtn.disabled = true;
-generateDialogBtn.textContent = 'Generating…';
 // Smart dialog-wide generation: use pre-generated audio where present
 const dialogItemsToGenerate = dialogItems.map(item => {
 // Only send minimal fields for items that need generation
@@ -762,11 +546,7 @@
 } catch (error) {
 console.error('Dialog generation failed:', error);
 if (generationLogPre) generationLogPre.textContent = `Error generating dialog: ${error.message}`;
-showNotice(`Error generating dialog: ${error.message}`, 'error');
-}
-finally {
-generateDialogBtn.disabled = false;
-generateDialogBtn.textContent = prevText;
+alert(`Error generating dialog: ${error.message}`);
 }
 });
 }
@@ -774,7 +554,7 @@
 // --- Save/Load Script Functionality ---
 function saveDialogScript() {
 if (dialogItems.length === 0) {
-showNotice('No dialog items to save. Please add some speech or silence lines first.', 'warning', { timeout: 4000 });
+alert('No dialog items to save. Please add some speech or silence lines first.');
 return;
 }
@@ -819,12 +599,11 @@
 URL.revokeObjectURL(url);
 console.log(`Dialog script saved as $(unknown)`);
-showNotice(`Dialog script saved as $(unknown)`, 'success', { timeout: 3000 });
 }
 function loadDialogScript(file) {
 if (!file) {
-showNotice('Please select a file to load.', 'warning', { timeout: 4000 });
+alert('Please select a file to load.');
 return;
 }
@@ -847,19 +626,19 @@
 }
 } catch (parseError) {
 console.error(`Error parsing line ${i + 1}:`, parseError);
-showNotice(`Error parsing line ${i + 1}: ${parseError.message}`, 'error');
+alert(`Error parsing line ${i + 1}: ${parseError.message}`);
 return;
 }
 }
 if (loadedItems.length === 0) {
-showNotice('No valid dialog items found in the file.', 'warning', { timeout: 4000 });
+alert('No valid dialog items found in the file.');
 return;
 }
 // Confirm replacement if existing items
 if (dialogItems.length > 0) {
-const confirmed = await confirmAction(
+const confirmed = confirm(
 `This will replace your current dialog (${dialogItems.length} items) with the loaded script (${loadedItems.length} items). Continue?`
 );
 if (!confirmed) return;
@ -871,97 +650,30 @@ async function initializeDialogEditor() {
availableSpeakersCache = await getSpeakers(); availableSpeakersCache = await getSpeakers();
} catch (error) { } catch (error) {
console.error('Error fetching speakers:', error); console.error('Error fetching speakers:', error);
showNotice('Could not load speakers. Dialog loaded but speaker names may not display correctly.', 'warning', { timeout: 5000 }); alert('Could not load speakers. Dialog loaded but speaker names may not display correctly.');
} }
} }
// Replace current dialog // Replace current dialog
dialogItems.splice(0, dialogItems.length, ...loadedItems); dialogItems.splice(0, dialogItems.length, ...loadedItems);
// Persist loaded script
saveDialogToLocalStorage();
renderDialogItems(); renderDialogItems();
console.log(`Loaded ${loadedItems.length} dialog items from script`); console.log(`Loaded ${loadedItems.length} dialog items from script`);
showNotice(`Successfully loaded ${loadedItems.length} dialog items.`, 'success', { timeout: 3000 }); alert(`Successfully loaded ${loadedItems.length} dialog items.`);
} catch (error) { } catch (error) {
console.error('Error loading dialog script:', error); console.error('Error loading dialog script:', error);
showNotice(`Error loading dialog script: ${error.message}`, 'error'); alert(`Error loading dialog script: ${error.message}`);
} }
}; };
reader.onerror = function() { reader.onerror = function() {
showNotice('Error reading file. Please try again.', 'error'); alert('Error reading file. Please try again.');
}; };
reader.readAsText(file); reader.readAsText(file);
} }
-// Load dialog script from pasted JSONL text
-async function loadDialogScriptFromText(text) {
-  if (!text || !text.trim()) {
-    showNotice('Please paste JSONL content to load.', 'warning', { timeout: 4000 });
-    return false;
-  }
-  try {
-    const lines = text.trim().split('\n');
-    const loadedItems = [];
-    for (let i = 0; i < lines.length; i++) {
-      const line = lines[i].trim();
-      if (!line) continue; // Skip empty lines
-      try {
-        const item = JSON.parse(line);
-        const validatedItem = validateDialogItem(item, i + 1);
-        if (validatedItem) {
-          loadedItems.push(normalizeDialogItem(validatedItem));
-        }
-      } catch (parseError) {
-        console.error(`Error parsing line ${i + 1}:`, parseError);
-        showNotice(`Error parsing line ${i + 1}: ${parseError.message}`, 'error');
-        return false;
-      }
-    }
-    if (loadedItems.length === 0) {
-      showNotice('No valid dialog items found in the pasted content.', 'warning', { timeout: 4000 });
-      return false;
-    }
-    // Confirm replacement if existing items
-    if (dialogItems.length > 0) {
-      const confirmed = await confirmAction(
-        `This will replace your current dialog (${dialogItems.length} items) with the pasted script (${loadedItems.length} items). Continue?`
-      );
-      if (!confirmed) return false;
-    }
-    // Ensure speakers are loaded before rendering
-    if (availableSpeakersCache.length === 0) {
-      try {
-        availableSpeakersCache = await getSpeakers();
-      } catch (error) {
-        console.error('Error fetching speakers:', error);
-        showNotice('Could not load speakers. Dialog loaded but speaker names may not display correctly.', 'warning', { timeout: 5000 });
-      }
-    }
-    // Replace current dialog
-    dialogItems.splice(0, dialogItems.length, ...loadedItems);
-    // Persist loaded script
-    saveDialogToLocalStorage();
-    renderDialogItems();
-    console.log(`Loaded ${loadedItems.length} dialog items from pasted text`);
-    showNotice(`Successfully loaded ${loadedItems.length} dialog items.`, 'success', { timeout: 3000 });
-    return true;
-  } catch (error) {
-    console.error('Error loading dialog script from text:', error);
-    showNotice(`Error loading dialog script: ${error.message}`, 'error');
-    return false;
-  }
-}
function validateDialogItem(item, lineNumber) {
  if (!item || typeof item !== 'object') {
    throw new Error(`Line ${lineNumber}: Invalid item format`);
@@ -1017,75 +729,12 @@ async function initializeDialogEditor() {
  const file = e.target.files[0];
  if (file) {
    loadDialogScript(file);
-   // Reset input so same file can be loaded again
-   e.target.value = '';
  }
});
}
-// --- Paste Script (JSONL) Modal Handlers ---
-if (pasteScriptBtn && pasteModal && pasteText && pasteLoadBtn && pasteCancelBtn && pasteCloseBtn) {
-  let escHandler = null;
-  const closePasteModal = () => {
-    pasteModal.style.display = 'none';
-    pasteLoadBtn.onclick = null;
-    pasteCancelBtn.onclick = null;
-    pasteCloseBtn.onclick = null;
-    pasteModal.onclick = null;
-    if (escHandler) {
-      document.removeEventListener('keydown', escHandler);
-      escHandler = null;
-    }
-  };
-  const openPasteModal = () => {
-    pasteText.value = '';
-    pasteModal.style.display = 'flex';
-    escHandler = (e) => { if (e.key === 'Escape') closePasteModal(); };
-    document.addEventListener('keydown', escHandler);
-    pasteModal.onclick = (e) => { if (e.target === pasteModal) closePasteModal(); };
-    pasteCloseBtn.onclick = closePasteModal;
-    pasteCancelBtn.onclick = closePasteModal;
-    pasteLoadBtn.onclick = async () => {
-      const ok = await loadDialogScriptFromText(pasteText.value);
-      if (ok) closePasteModal();
-    };
-  };
-  pasteScriptBtn.addEventListener('click', openPasteModal);
-}
-// --- Clear Dialog Button ---
-let clearDialogBtn = document.getElementById('clear-dialog-btn');
-if (!clearDialogBtn) {
-  clearDialogBtn = document.createElement('button');
-  clearDialogBtn.id = 'clear-dialog-btn';
-  clearDialogBtn.textContent = 'Clear Dialog';
-  // Insert next to Save/Load if possible
-  const saveLoadContainer = saveScriptBtn ? saveScriptBtn.parentElement : null;
-  if (saveLoadContainer) {
-    saveLoadContainer.appendChild(clearDialogBtn);
-  } else {
-    // Fallback: append near the add buttons container
-    const addBtnsContainer = addSpeechLineBtn ? addSpeechLineBtn.parentElement : null;
-    if (addBtnsContainer) addBtnsContainer.appendChild(clearDialogBtn);
-  }
-}
-if (clearDialogBtn) {
-  clearDialogBtn.addEventListener('click', async () => {
-    if (dialogItems.length === 0) {
-      showNotice('Dialog is already empty.', 'info', { timeout: 2500 });
-      return;
-    }
-    const ok = await confirmAction(`This will remove ${dialogItems.length} dialog item(s). Continue?`);
-    if (!ok) return;
-    // Clear any transient input UI
-    if (typeof clearTempInputArea === 'function') clearTempInputArea();
-    // Clear state and persistence
-    dialogItems.splice(0, dialogItems.length);
-    try { localStorage.removeItem(LS_KEY); } catch (e) { /* ignore */ }
-    renderDialogItems();
-    showNotice('Dialog cleared.', 'success', { timeout: 2500 });
-  });
-}
  console.log('Dialog Editor Initialized');
  renderDialogItems(); // Initial render (empty)
@@ -1132,8 +781,6 @@ async function initializeDialogEditor() {
    dialogItems[index].audioUrl = null;
    closeModal();
-   // Persist settings change
-   saveDialogToLocalStorage();
    renderDialogItems(); // Re-render to reflect changes
    console.log('TTS settings updated for item:', dialogItems[index]);
  };

View File

@@ -13,15 +13,8 @@ const getEnvVar = (name, defaultValue) => {
};
// API Configuration
-// Default to the same hostname as the frontend, on port 8000 (override via VITE_API_BASE_URL*)
-const _defaultHost = (typeof window !== 'undefined' && window.location?.hostname) || 'localhost';
-const _defaultPort = getEnvVar('VITE_API_BASE_URL_PORT', '8000');
-const _defaultBase = `http://${_defaultHost}:${_defaultPort}`;
-export const API_BASE_URL = getEnvVar('VITE_API_BASE_URL', _defaultBase);
-export const API_BASE_URL_WITH_PREFIX = getEnvVar(
-  'VITE_API_BASE_URL_WITH_PREFIX',
-  `${_defaultBase}/api`
-);
+export const API_BASE_URL = getEnvVar('VITE_API_BASE_URL', 'http://localhost:8000');
+export const API_BASE_URL_WITH_PREFIX = getEnvVar('VITE_API_BASE_URL_WITH_PREFIX', 'http://localhost:8000/api');
// For file serving (same as API_BASE_URL since files are served from the same server)
export const API_BASE_URL_FOR_FILES = API_BASE_URL;

View File

@@ -1,9 +0,0 @@
// jest.config.cjs
module.exports = {
testEnvironment: 'node',
transform: {
'^.+\\.js$': 'babel-jest',
},
moduleFileExtensions: ['js', 'json'],
roots: ['<rootDir>/frontend/tests', '<rootDir>'],
};

View File

@@ -5,13 +5,11 @@
  "main": "index.js",
  "type": "module",
  "scripts": {
-   "test": "jest",
-   "test:frontend": "jest --config ./jest.config.cjs",
-   "frontend:dev": "python3 frontend/start_dev_server.py"
+   "test": "jest"
  },
  "repository": {
    "type": "git",
-   "url": "https://gitea.r8z.us/stwhite/chatterbox-ui.git"
+   "url": "https://oauth2:78f77aaebb8fa1cd3efbd5b738177c127f7d7d0b@gitea.r8z.us/stwhite/chatterbox-ui.git"
  },
  "keywords": [],
  "author": "",
@@ -19,7 +17,7 @@
  "devDependencies": {
    "@babel/core": "^7.27.4",
    "@babel/preset-env": "^7.27.2",
-   "babel-jest": "^29.7.0",
+   "babel-jest": "^30.0.0-beta.3",
    "jest": "^29.7.0"
  }
}

View File

@@ -1,123 +0,0 @@
#Requires -Version 5.1
<#!
Chatterbox TTS - Windows setup script
What it does:
- Creates a Python virtual environment in .venv (if missing)
- Upgrades pip
- Installs dependencies from backend/requirements.txt and requirements.txt
- Creates a default .env with sensible ports if not present
- Launches start_servers.py using the venv's Python
Usage:
- Right-click this file and "Run with PowerShell" OR from PowerShell:
./setup-windows.ps1
- Optional flags:
-NoInstall -> Skip installing dependencies (just start servers)
-NoStart -> Prepare env but do not start servers
Notes:
- You may need to allow script execution once:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
- Press Ctrl+C in the console to stop both servers.
!#>
param(
[switch]$NoInstall,
[switch]$NoStart
)
$ErrorActionPreference = 'Stop'
function Write-Info($msg) { Write-Host "[INFO] $msg" -ForegroundColor Cyan }
function Write-Ok($msg) { Write-Host "[ OK ] $msg" -ForegroundColor Green }
function Write-Warn($msg) { Write-Host "[WARN] $msg" -ForegroundColor Yellow }
function Write-Err($msg) { Write-Host "[FAIL] $msg" -ForegroundColor Red }
$root = Split-Path -Parent $MyInvocation.MyCommand.Path
Set-Location $root
$venvDir = Join-Path $root ".venv"
$venvPython = Join-Path $venvDir "Scripts/python.exe"
# 1) Ensure Python available
function Get-BasePython {
try {
$pyExe = (Get-Command py -ErrorAction SilentlyContinue)
if ($pyExe) { return 'py -3' }
} catch { }
try {
$pyExe = (Get-Command python -ErrorAction SilentlyContinue)
if ($pyExe) { return 'python' }
} catch { }
throw "Python not found. Please install Python 3.x and add it to PATH."
}
# 2) Create venv if missing
if (-not (Test-Path $venvPython)) {
Write-Info "Creating virtual environment in .venv"
$basePy = Get-BasePython
if ($basePy -eq 'py -3') {
& py -3 -m venv .venv
} else {
& python -m venv .venv
}
Write-Ok "Virtual environment created"
} else {
Write-Info "Using existing virtual environment: $venvDir"
}
if (-not (Test-Path $venvPython)) {
throw ".venv python not found at $venvPython"
}
# 3) Install dependencies
if (-not $NoInstall) {
Write-Info "Upgrading pip"
& $venvPython -m pip install --upgrade pip
# Backend requirements
$backendReq = Join-Path $root 'backend/requirements.txt'
if (Test-Path $backendReq) {
Write-Info "Installing backend requirements"
& $venvPython -m pip install -r $backendReq
} else {
Write-Warn "backend/requirements.txt not found"
}
# Root requirements (optional frontend / project libs)
$rootReq = Join-Path $root 'requirements.txt'
if (Test-Path $rootReq) {
Write-Info "Installing root requirements"
& $venvPython -m pip install -r $rootReq
} else {
Write-Warn "requirements.txt not found at repo root"
}
Write-Ok "Dependency installation complete"
}
# 4) Ensure .env exists with sensible defaults
$envPath = Join-Path $root '.env'
if (-not (Test-Path $envPath)) {
Write-Info "Creating default .env"
@(
'BACKEND_PORT=8000',
'BACKEND_HOST=127.0.0.1',
'FRONTEND_PORT=8001',
'FRONTEND_HOST=127.0.0.1'
) -join "`n" | Out-File -FilePath $envPath -Encoding utf8 -Force
Write-Ok ".env created"
} else {
Write-Info ".env already exists; leaving as-is"
}
# 5) Start servers
if ($NoStart) {
Write-Info "-NoStart specified; setup complete. You can start later with:"
Write-Host " `"$venvPython`" `"$root\start_servers.py`"" -ForegroundColor Gray
exit 0
}
Write-Info "Starting servers via start_servers.py"
& $venvPython "$root/start_servers.py"

View File

@@ -28,9 +28,3 @@ dd3552d9-f4e8-49ed-9892-f9e67afcf23c:
2cdd6d3d-c533-44bf-a5f6-cc83bd089d32:
  name: Grace
  sample_path: speaker_samples/2cdd6d3d-c533-44bf-a5f6-cc83bd089d32.wav
-3d3e85db-3d67-4488-94b2-ffc189fbb287:
-  name: RCB
-  sample_path: speaker_samples/3d3e85db-3d67-4488-94b2-ffc189fbb287.wav
-f754cf35-892c-49b6-822a-f2e37246623b:
-  name: Jim
-  sample_path: speaker_samples/f754cf35-892c-49b6-822a-f2e37246623b.wav

View File

@@ -14,109 +14,101 @@ from pathlib import Path
# Try to load environment variables, but don't fail if dotenv is not available
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    print("python-dotenv not installed, using system environment variables only")
# Configuration
-BACKEND_PORT = int(os.getenv("BACKEND_PORT", "8000"))
-BACKEND_HOST = os.getenv("BACKEND_HOST", "0.0.0.0")
-# Frontend host/port (for dev server binding)
-FRONTEND_PORT = int(os.getenv("FRONTEND_PORT", "8001"))
-FRONTEND_HOST = os.getenv("FRONTEND_HOST", "0.0.0.0")
-# Export frontend host/port so backend CORS config can pick them up automatically
-os.environ["FRONTEND_HOST"] = FRONTEND_HOST
-os.environ["FRONTEND_PORT"] = str(FRONTEND_PORT)
+BACKEND_PORT = int(os.getenv('BACKEND_PORT', '8000'))
+BACKEND_HOST = os.getenv('BACKEND_HOST', '0.0.0.0')
+FRONTEND_PORT = int(os.getenv('FRONTEND_PORT', '8001'))
+FRONTEND_HOST = os.getenv('FRONTEND_HOST', '127.0.0.1')
# Get project root directory
PROJECT_ROOT = Path(__file__).parent.absolute()
def run_backend():
    """Run the backend FastAPI server"""
    os.chdir(PROJECT_ROOT / "backend")
    cmd = [
-       sys.executable,
-       "-m",
-       "uvicorn",
-       "app.main:app",
-       "--reload",
-       f"--host={BACKEND_HOST}",
-       f"--port={BACKEND_PORT}",
+       sys.executable, "-m", "uvicorn",
+       "app.main:app",
+       "--reload",
+       f"--host={BACKEND_HOST}",
+       f"--port={BACKEND_PORT}"
    ]
    print(f"\n{'='*50}")
    print(f"Starting Backend Server at http://{BACKEND_HOST}:{BACKEND_PORT}")
    print(f"API docs available at http://{BACKEND_HOST}:{BACKEND_PORT}/docs")
    print(f"{'='*50}\n")
    return subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
-       bufsize=1,
+       bufsize=1
    )
def run_frontend():
    """Run the frontend development server"""
    frontend_dir = PROJECT_ROOT / "frontend"
    os.chdir(frontend_dir)
    cmd = [sys.executable, "start_dev_server.py"]
    env = os.environ.copy()
    env["VITE_DEV_SERVER_HOST"] = FRONTEND_HOST
    env["VITE_DEV_SERVER_PORT"] = str(FRONTEND_PORT)
    print(f"\n{'='*50}")
    print(f"Starting Frontend Server at http://{FRONTEND_HOST}:{FRONTEND_PORT}")
    print(f"{'='*50}\n")
    return subprocess.Popen(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
-       bufsize=1,
+       bufsize=1
    )
def print_process_output(process, prefix):
    """Print process output with a prefix"""
-   for line in iter(process.stdout.readline, ""):
+   for line in iter(process.stdout.readline, ''):
        if not line:
            break
-       print(f"{prefix} | {line}", end="")
+       print(f"{prefix} | {line}", end='')
def main():
    """Main function to start both servers"""
    print("\n🚀 Starting Chatterbox UI Development Environment")
    # Start the backend server
    backend_process = run_backend()
    # Give the backend a moment to start
    time.sleep(2)
    # Start the frontend server
    frontend_process = run_frontend()
    # Create threads to monitor and print output
    backend_monitor = threading.Thread(
-       target=print_process_output, args=(backend_process, "BACKEND"), daemon=True
+       target=print_process_output,
+       args=(backend_process, "BACKEND"),
+       daemon=True
    )
    frontend_monitor = threading.Thread(
-       target=print_process_output, args=(frontend_process, "FRONTEND"), daemon=True
+       target=print_process_output,
+       args=(frontend_process, "FRONTEND"),
+       daemon=True
    )
    backend_monitor.start()
    frontend_monitor.start()
    # Setup signal handling for graceful shutdown
    def signal_handler(sig, frame):
        print("\n\n🛑 Shutting down servers...")
@@ -125,16 +117,16 @@ def main():
        # Threads are daemon, so they'll exit when the main thread exits
        print("✅ Servers stopped successfully")
        sys.exit(0)
    signal.signal(signal.SIGINT, signal_handler)
    # Print access information
    print("\n📋 Access Information:")
    print(f"  • Frontend: http://{FRONTEND_HOST}:{FRONTEND_PORT}")
    print(f"  • Backend API: http://{BACKEND_HOST}:{BACKEND_PORT}/api")
    print(f"  • API Documentation: http://{BACKEND_HOST}:{BACKEND_PORT}/docs")
    print("\n⚠️ Press Ctrl+C to stop both servers\n")
    # Keep the main process running
    try:
        while True:
@@ -142,6 +134,5 @@ def main():
    except KeyboardInterrupt:
        signal_handler(None, None)
if __name__ == "__main__":
    main()
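The bounded-concurrency plan described at the top of this compare (a global semaphore capping in-flight TTS chunk generation, `asyncio.to_thread` for blocking file writes, results collated by `segment_idx`) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `generate_chunk` and `process_dialog` are hypothetical, `asyncio.sleep(0)` stands in for the GPU-bound TTS forward pass, and the no-op `to_thread` call stands in for `torchaudio.save(...)`.

```python
import asyncio

TTS_MAX_CONCURRENCY = 3  # assumed default; the plan suggests starting at 3-4

async def generate_chunk(segment_idx: int, text: str,
                         semaphore: asyncio.Semaphore) -> tuple[int, str]:
    """Generate one speech chunk under the global concurrency cap."""
    async with semaphore:
        await asyncio.sleep(0)  # placeholder for the real TTS forward pass
        out_path = f"segment_{segment_idx:04d}.wav"
        # Offload the blocking disk write to a thread so the event loop
        # (and GPU work on other tasks) is not stalled; in the real service
        # this would be: await asyncio.to_thread(torchaudio.save, out_path, wav, sr)
        await asyncio.to_thread(lambda: None)
        return segment_idx, out_path

async def process_dialog(lines: list[str]) -> list[str]:
    """Schedule all chunks concurrently, then collate by segment_idx."""
    semaphore = asyncio.Semaphore(TTS_MAX_CONCURRENCY)
    tasks = [generate_chunk(i, text, semaphore) for i, text in enumerate(lines)]
    results = await asyncio.gather(*tasks)
    # Sorting by segment_idx keeps segment_files ordered for concatenation.
    return [path for _, path in sorted(results)]
```

Because each result carries its `segment_idx`, the output order stays stable regardless of which chunks finish first, which matches the plan's requirement that `process_dialog_flow()` receive ordered `segment_files`.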