Compare commits

...

13 Commits
higgs ... main

Author SHA1 Message Date
Steve White 733c9d1b5f Merge pull request 'feat/frontend-phase1' (#1) from feat/frontend-phase1 into main
Reviewed-on: #1
2025-08-14 15:44:24 +00:00
Steve White 9c605cd3a0 docs: update README with Windows setup and Paste Script instructions 2025-08-14 10:42:40 -05:00
Steve White d3ac8bf4eb Added windows setup script 2025-08-14 10:35:30 -05:00
Steve White 75a2a37252 added back end concurrency and front end paste feature. 2025-08-14 10:33:44 -05:00
Steve White b28a9bcf58 fixed some UI issues. 2025-08-14 08:11:16 -05:00
Steve White 4f47d69aaa fixed some UI problems and added a clear dialog button. 2025-08-13 18:10:02 -05:00
Steve White f095bb14e5 Fixed buttons; play/pause, stop, settings 2025-08-13 00:43:43 -05:00
Steve White 93e0407eac frontend: add per-line play/pause/stop controls
- Toggle play/pause on same button, add stop button
- Maintain shared audio state to prevent overlap and update button states accordingly
2025-08-13 00:28:30 -05:00
Steve White c9593fe6cc frontend: prevent overlapping per-line playback; backend: print idle eviction settings on startup
- app.js: add shared Audio state, disable play button while playing, stop previous line when new one plays
- start_server.py: print eviction enabled/timeout/check interval
- app/main.py: log eviction settings during FastAPI startup
2025-08-12 17:37:32 -05:00
Steve White cbc164c7a3 backend: implement idle TTS model eviction
- Add MODEL_EVICTION_ENABLED, MODEL_IDLE_TIMEOUT_SECONDS, MODEL_IDLE_CHECK_INTERVAL_SECONDS in app/config.py
- Add ModelManager service to manage TTSService load/unload with usage tracking
- Add background idle reaper in app/main.py (startup/shutdown hooks)
- Refactor dialog router to use ModelManager dependency instead of per-request load/unload
2025-08-12 16:33:54 -05:00
Steve White 41f95cdee3 feat(frontend): inline notifications and loading states
- Add .notice styles and variants in frontend/css/style.css
- Add showNotice, hideNotice, confirmAction in frontend/js/app.js
- Replace all alert and confirm with inline notices
- Add loading states to Add Speaker and Generate Dialog
- Verified container IDs in index.html, grep clean, tests passing
2025-08-12 15:46:23 -05:00
Steve White b62eb0211f feat(frontend): Phase 1 – normalize speakers endpoints, fix API docs and JSON parsing, consolidate state in app.js, tweak CSS border color, align jest/babel-jest + add jest.config.cjs, add dev scripts, sanitize repo URL 2025-08-12 12:16:23 -05:00
Steve White 948712bb3f current workign version using chatterbox. 2025-08-12 11:31:00 -05:00
25 changed files with 1766 additions and 231 deletions

1
.gitignore vendored
View File

@ -22,3 +22,4 @@ backend/tts_generated_dialogs/
# Node.js dependencies
node_modules/
.aider*

188
.note/concurrency_plan.md Normal file
View File

@ -0,0 +1,188 @@
# Chatterbox TTS Backend: Bounded Concurrency + File I/O Offload Plan
Date: 2025-08-14
Owner: Backend
Status: Proposed (ready to implement)
## Goals
- Increase GPU utilization and reduce wall-clock time for dialog generation.
- Keep model lifecycle stable (leveraging current `ModelManager`).
- Minimal-risk changes: no API shape changes to clients.
## Scope
- Implement bounded concurrency for per-line speech chunk generation within a single dialog request.
- Offload audio file writes to threads to overlap GPU compute and disk I/O.
- Add configuration knobs to tune concurrency.
## Current State (References)
- `backend/app/services/dialog_processor_service.py`
- `DialogProcessorService.process_dialog()` iterates items and awaits `tts_service.generate_speech(...)` sequentially (lines ~171–201).
- `backend/app/services/tts_service.py`
- `TTSService.generate_speech()` runs the TTS forward and calls `torchaudio.save(...)` on the event loop thread (blocking).
- `backend/app/services/model_manager.py`
- `ModelManager.using()` tracks active work; prevents idle eviction during requests.
- `backend/app/routers/dialog.py`
- `process_dialog_flow()` expects ordered `segment_files` and then concatenates; good to keep order stable.
## Design Overview
1) Bounded concurrency at dialog level
- Plan all output segments with a stable `segment_idx` (including speech chunks, silence, and reused audio).
- For speech chunks, schedule concurrent async tasks with a global semaphore set by config `TTS_MAX_CONCURRENCY` (start at 3–4).
- Await all tasks and collate results by `segment_idx` to preserve order.
2) File I/O offload
- Replace direct `torchaudio.save(...)` with `await asyncio.to_thread(torchaudio.save, ...)` in `TTSService.generate_speech()`.
- This lets the next GPU forward start while previous file writes happen on worker threads.
## Configuration
Add to `backend/app/config.py`:
- `TTS_MAX_CONCURRENCY: int` (default: `int(os.getenv("TTS_MAX_CONCURRENCY", "3"))`).
- Optional (future): `TTS_ENABLE_AMP_ON_CUDA: bool = True` to allow mixed precision on CUDA only.
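A minimal sketch of how this knob could be read defensively. The helper name `read_max_concurrency` and the clamp to at least 1 are assumptions for illustration only; the plan itself uses a bare `int(os.getenv(...))`.

```python
import os


def read_max_concurrency(env=None, default=3):
    """Parse TTS_MAX_CONCURRENCY, falling back to `default` on bad input.

    Hypothetical helper: the clamp to >= 1 is an assumption, so that a
    value of 0 can never produce a semaphore that blocks every task.
    """
    env = os.environ if env is None else env
    raw = env.get("TTS_MAX_CONCURRENCY", str(default))
    try:
        value = int(raw)
    except ValueError:
        return default
    return max(1, value)
```

Passing a plain dict as `env` keeps the helper easy to test without touching the real process environment.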
## Implementation Steps
### A. Dialog-level concurrency
- File: `backend/app/services/dialog_processor_service.py`
- Function: `DialogProcessorService.process_dialog()`
1. Planning pass to assign indices
- Iterate `dialog_items` and build a list of `planned_segments` entries:
- For silence or reuse: immediately append a final result with assigned `segment_idx` and continue.
- For speech: split into `text_chunks`; for each chunk create a planned entry: `{ segment_idx, type: 'speech', speaker_id, text_chunk, abs_speaker_sample_path, tts_params }`.
- Increment `segment_idx` for every planned segment (speech chunk or silence/reuse) to preserve final order.
2. Concurrency setup
- Create `sem = asyncio.Semaphore(config.TTS_MAX_CONCURRENCY)`.
- For each planned speech segment, create a task with an inner wrapper:
```python
async def run_one(planned):
    # The semaphore caps how many generations are in flight at once.
    async with sem:
        try:
            out_path = await self.tts_service.generate_speech(
                text=planned.text_chunk,
                speaker_sample_path=planned.abs_speaker_sample_path,
                output_filename_base=planned.filename_base,
                output_dir=dialog_temp_dir,
                exaggeration=planned.exaggeration,
                cfg_weight=planned.cfg_weight,
                temperature=planned.temperature,
            )
            return planned.segment_idx, {
                "type": "speech",
                "path": str(out_path),
                "speaker_id": planned.speaker_id,
                "text_chunk": planned.text_chunk,
            }
        except Exception as e:
            # Errors are returned as records so one bad chunk does not abort the dialog.
            return planned.segment_idx, {
                "type": "error",
                "message": f"Error generating speech: {e}",
                "text_chunk": planned.text_chunk,
            }
```
- Schedule with `asyncio.create_task(run_one(p))` and collect tasks.
3. Await and collate
- `results_map = {}`; for each completed task, set `results_map[idx] = payload`.
- Merge: start with all previously final (silence/reuse/error) entries placed by `segment_idx`, then fill speech results by `segment_idx` into a single `segment_results` list sorted ascending by index.
- Keep `processing_log` entries for each planned segment (queued, started, finished, errors).
4. Return value unchanged
- Return `{"log": ..., "segment_files": segment_results, "temp_dir": str(dialog_temp_dir)}`. This maintains router and concatenator behavior.
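The plan, schedule, and collate steps in section A can be sketched end to end as follows. This is a self-contained illustration, not the service code: `fake_generate` stands in for `tts_service.generate_speech`, and the dict field names are assumptions chosen to mirror the planned entries described above.

```python
import asyncio


async def fake_generate(text):
    """Hypothetical stand-in for tts_service.generate_speech."""
    # Vary the delay so tasks finish out of submission order.
    await asyncio.sleep(0.01 * (5 - len(text) % 5))
    if "bad" in text:
        raise ValueError("unknown speaker")
    return f"/tmp/{text}.wav"


async def process(planned, max_concurrency=3):
    """planned: list of dicts with 'segment_idx', 'type', and payload fields."""
    sem = asyncio.Semaphore(max_concurrency)
    results = {}

    async def run_one(seg):
        async with sem:
            try:
                path = await fake_generate(seg["text"])
                return seg["segment_idx"], {"type": "speech", "path": path}
            except Exception as e:
                return seg["segment_idx"], {"type": "error", "message": str(e)}

    tasks = []
    for seg in planned:
        if seg["type"] == "speech":
            tasks.append(asyncio.create_task(run_one(seg)))
        else:
            # Silence/reuse segments are final at planning time.
            results[seg["segment_idx"]] = {"type": seg["type"], "duration": seg.get("duration")}

    for idx, payload in await asyncio.gather(*tasks):
        results[idx] = payload

    # Collate by segment_idx so output order matches the sequential version.
    return [results[i] for i in sorted(results)]


def demo():
    planned = [
        {"segment_idx": 0, "type": "speech", "text": "hello"},
        {"segment_idx": 1, "type": "silence", "duration": 0.5},
        {"segment_idx": 2, "type": "speech", "text": "bad"},
        {"segment_idx": 3, "type": "speech", "text": "bye"},
    ]
    return asyncio.run(process(planned))
```

Keying results by `segment_idx` means ordering never depends on which task happens to finish first, and a failed chunk simply occupies its slot as an error record.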
### B. Offload audio writes
- File: `backend/app/services/tts_service.py`
- Function: `TTSService.generate_speech()`
1. After obtaining `wav` tensor, replace:
```python
# torchaudio.save(str(output_file_path), wav, self.model.sr)
```
with:
```python
await asyncio.to_thread(torchaudio.save, str(output_file_path), wav, self.model.sr)
```
- Keep the rest of cleanup logic (delete `wav`, `gc.collect()`, cache emptying) unchanged.
2. Optional (CUDA-only AMP)
- If CUDA is used and `config.TTS_ENABLE_AMP_ON_CUDA` is True, wrap forward with AMP:
```python
with torch.cuda.amp.autocast(dtype=torch.float16):
    wav = self.model.generate(...)
```
- Leave MPS/CPU code path as-is.
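To illustrate the write offload from step B.1 without requiring `torch` or `torchaudio`, here is a small sketch that uses the standard-library `wave` module as a stand-in for `torchaudio.save`. The function names and the 24000 Hz default are assumptions for the example. Note that `asyncio.to_thread` requires Python 3.9 or newer.

```python
import asyncio
import os
import tempfile
import wave


def write_wav_blocking(path, pcm_bytes, sample_rate):
    """Blocking writer; stands in for torchaudio.save in this sketch."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm_bytes)


async def save_segment(path, pcm_bytes, sample_rate=24000):
    # Run the blocking write on a worker thread so the event loop stays free
    # to start the next generation task.
    await asyncio.to_thread(write_wav_blocking, path, pcm_bytes, sample_rate)
    return path


def demo():
    out_dir = tempfile.mkdtemp()
    path = os.path.join(out_dir, "seg0.wav")
    asyncio.run(save_segment(path, b"\x00\x00" * 2400))
    with wave.open(path, "rb") as f:
        return f.getnframes(), f.getframerate()
```

The same pattern applies to any blocking call: pass the callable and its arguments to `asyncio.to_thread` rather than calling it directly inside the coroutine.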
## Error Handling & Ordering
- Every planned segment owns a unique `segment_idx`.
- On failure, insert an error record at that index; downstream concatenation already skips missing or nonexistent paths.
- Preserve exact output order expected by `routers/dialog.py::process_dialog_flow()`.
## Performance Expectations
- GPU utilization should increase from ~50% to 75–90%, depending on dialog size and line lengths.
- Wall-clock reduction is workload-dependent; target 1.5–2.5x on multi-line dialogs.
## Metrics & Instrumentation
- Add timestamped log entries per segment: planned→queued→started→saved.
- Log effective concurrency (max in-flight), and cumulative GPU time if available.
- Optionally add a simple timing summary at end of `process_dialog()`.
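One possible way to log effective concurrency (max in-flight) is a small counter wrapped around each task body. This is a sketch under assumptions: `InFlightTracker` and `run_with_tracking` are hypothetical names, and the `asyncio.sleep` is a placeholder for the real `generate_speech` call.

```python
import asyncio


class InFlightTracker:
    """Counts concurrently running tasks and remembers the peak."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def __enter__(self):
        self.current += 1
        self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.current -= 1
        return False  # never swallow exceptions


async def run_with_tracking(n_tasks, max_concurrency):
    sem = asyncio.Semaphore(max_concurrency)
    tracker = InFlightTracker()

    async def worker():
        async with sem:
            with tracker:
                await asyncio.sleep(0.01)  # placeholder for generate_speech

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return tracker.peak
```

Because all workers run on one event loop thread, the counter needs no lock; that would change if the tracked section itself moved onto worker threads.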
## Testing Plan
1. Unit-ish
- Small dialog (3 speech lines, 1 silence). Ensure ordering is stable and files exist.
- Introduce an invalid speaker to verify error propagation doesn't break the rest.
2. Integration
- POST `/api/dialog/generate` with 20–50 mixed-length lines and a couple of silences.
- Validate: response OK, concatenated file exists, zip contains all generated speech segments, order preserved.
- Compare runtime vs. sequential baseline (before/after).
3. Stress/limits
- Long lines split into many chunks; verify no OOM with `TTS_MAX_CONCURRENCY`=3.
- Try `TTS_MAX_CONCURRENCY`=1 to simulate sequential; compare metrics.
## Rollout & Config Defaults
- Default `TTS_MAX_CONCURRENCY=3`.
- Expose via environment variable; no client changes needed.
- If instability observed, set `TTS_MAX_CONCURRENCY=1` to revert to sequential behavior quickly.
## Risks & Mitigations
- OOM under high concurrency → Mitigate with low default, easy rollback, and chunking already in place.
- Disk I/O saturation → Offload to threads; if disk is a bottleneck, decrease concurrency.
- Model thread safety → `model.generate` runs concurrently only up to the semaphore cap. If the underlying library is not thread-safe for forward passes, consider serializing the forwards while still overlapping them with file I/O; early logs should reveal any such problem.
## Follow-up (Out of Scope for this change)
- Dynamic batching queue inside `TTSService` for further GPU efficiency.
- CUDA AMP enablement and profiling.
- Per-speaker sub-queues if batching requires same-speaker inputs.
## Acceptance Criteria
- `TTS_MAX_CONCURRENCY` is configurable; default=3.
- File writes occur via `asyncio.to_thread`.
- Order of `segment_files` unchanged relative to sequential output.
- End-to-end works for both small and large dialogs; error cases logged.
- Observed GPU utilization and runtime improve on representative dialog.

138
.note/review-20250812.md Normal file
View File

@ -0,0 +1,138 @@
# Frontend Review and Recommendations
Date: 2025-08-12T11:32:16-05:00
Scope: `frontend/` of `chatterbox-test` monorepo
---
## Summary
- Static vanilla JS frontend served by `frontend/start_dev_server.py` interacting with FastAPI backend under `/api`.
- Solid feature set (speaker management, dialog editor, per-line generation, full dialog generation, save/load) with robust error handling.
- Key issues: inconsistent API trailing slashes, Jest/babel-jest version/config mismatch, minor state duplication, alert/confirm UX, overly dark border color, token in `package.json` repo URL.
---
## Findings
- **Framework/structure**
- `frontend/` is static vanilla JS. Main files:
- `index.html`, `js/app.js`, `js/api.js`, `js/config.js`, `css/style.css`.
- Dev server: `frontend/start_dev_server.py` (CORS, env-based port/host).
- **API client vs backend routes (trailing slashes)**
- Frontend `frontend/js/api.js` currently uses:
- `getSpeakers()`: `${API_BASE_URL}/speakers/` (trailing).
- `addSpeaker()`: `${API_BASE_URL}/speakers/` (trailing).
- `deleteSpeaker()`: `${API_BASE_URL}/speakers/${speakerId}/` (trailing).
- `generateLine()`: `${API_BASE_URL}/dialog/generate_line`.
- `generateDialog()`: `${API_BASE_URL}/dialog/generate`.
- Backend routes:
- `backend/app/routers/speakers.py`: `GET/POST /` and `DELETE /{speaker_id}` (no trailing slash on delete when prefixed under `/api/speakers`).
- `backend/app/routers/dialog.py`: `/generate_line` and `/generate` (match frontend).
- Tests in `frontend/tests/api.test.js` expect no trailing slashes for `/speakers` and `/speakers/{id}`.
- Implication: Inconsistent trailing slashes can cause test failures and possible 404s for delete.
- **Payload schema inconsistencies**
- `generateDialog()` JSDoc shows `silence` as `{ duration_ms: 500 }` but backend expects `duration` (seconds). UI also uses `duration` seconds.
- **Form fields alignment**
- Speaker add uses `name` and `audio_file` which match backend (`Form` and `File`).
- **State management duplication in `frontend/js/app.js`**
- `dialogItems` and `availableSpeakersCache` defined at module scope and again inside `initializeDialogEditor()`, creating shadowing risk. Consolidate to a single source of truth.
- **UX considerations**
- Heavy use of `alert()`/`confirm()`. Prefer inline notifications/banners and per-row error chips (you already render `item.error`).
- Add global loading/disabled states for long actions (e.g., full dialog generation, speaker add/delete).
- **CSS theme issue**
- `--border-light` is `#1b0404` (dark red); semantically a light gray fits better and improves contrast harmony.
- **Testing/Jest/Babel config**
- Root `package.json` uses `jest@^29.7.0` with `babel-jest@^30.0.0-beta.3` (major mismatch). Align versions.
- No `jest.config.cjs` to configure `transform` via `babel-jest` for ESM modules.
- **Security**
- `package.json` `repository.url` embeds a token. Remove secrets from VCS immediately.
- **Dev scripts**
- Only `"test": "jest"` present. Add scripts to run the frontend dev server and test config explicitly.
- **Response handling consistency**
- `generateLine()` parses via `response.text()` then `JSON.parse()`. Others use `response.json()`. Standardize for consistency.
---
## Recommended Actions (Phase 1: Quick wins)
- **Normalize API paths in `frontend/js/api.js`**
- Use no trailing slashes:
- `GET/POST`: `${API_BASE_URL}/speakers`
- `DELETE`: `${API_BASE_URL}/speakers/${speakerId}`
- Keep dialog endpoints unchanged.
- **Fix JSDoc for `generateDialog()`**
- Use `silence: { duration: number }` (seconds), not `duration_ms`.
- **Refactor `frontend/js/app.js` state**
- Remove duplicate `dialogItems`/`availableSpeakersCache` declarations. Choose module-scope or function-scope, and pass references.
- **Improve UX**
- Replace `alert/confirm` with inline banners near `#results-display` and per-row error chips (extend existing `.line-error-msg`).
- Add disabled/loading states for global generate and speaker actions.
- **CSS tweak**
- Set `--border-light: #e5e7eb;` (or similar) to reflect a light border.
- **Harden tests/Jest config**
- Align versions: either Jest 29 + `babel-jest` 29, or upgrade both to 30 stable together.
- Add `jest.config.cjs` with `transform` using `babel-jest` and suitable `testEnvironment`.
- Ensure tests expect normalized API paths (recommended to change code to match tests).
- **Dev scripts**
- Add to root `package.json`:
- `"frontend:dev": "python3 frontend/start_dev_server.py"`
- `"test:frontend": "jest --config ./jest.config.cjs"`
- **Sanitize repository URL**
- Remove embedded token from `package.json`.
- **Standardize response parsing**
- Switch `generateLine()` to `response.json()` unless backend returns `text/plain`.
---
## Backend Endpoint Confirmation
- `speakers` router (`backend/app/routers/speakers.py`):
- List/Create: `GET /`, `POST /` (when mounted under `/api/speakers` → `/api/speakers/`).
- Delete: `DELETE /{speaker_id}` (→ `/api/speakers/{speaker_id}`), no trailing slash.
- `dialog` router (`backend/app/routers/dialog.py`):
- `POST /generate_line`, `POST /generate` (mounted under `/api/dialog`).
---
## Proposed Implementation Plan
- **Phase 1 (1–2 hours)**
- Normalize API paths in `api.js`.
- Fix JSDoc for `generateDialog`.
- Consolidate dialog state in `app.js`.
- Adjust `--border-light` to light gray.
- Add `jest.config.cjs`, align Jest/babel-jest versions.
- Add dev/test scripts.
- Remove token from `package.json`.
- **Phase 2 (2–4 hours)**
- Inline notifications and comprehensive loading/disabled states.
- **Phase 3 (optional)**
- ESLint + Prettier.
- Consider Vite migration (HMR, proxy to backend, improved DX).
---
## Notes
- Current local time captured for this review: 2025-08-12T11:32:16-05:00.
- Frontend config (`frontend/js/config.js`) supports env overrides for API base and dev server port.
- Tests (`frontend/tests/api.test.js`) currently assume endpoints without trailing slashes.

204
.note/unload_model_plan.md Normal file
View File

@ -0,0 +1,204 @@
# Unload Model on Idle: Implementation Plan
## Goals
- Automatically unload large TTS model(s) when idle to reduce RAM/VRAM usage.
- Lazy-load on demand without breaking API semantics.
- Configurable timeout and safety controls.
## Requirements
- Config-driven idle timeout and poll interval.
- Thread-/async-safe across concurrent requests.
- No unload while an inference is in progress.
- Clear logs and metrics for load/unload events.
## Configuration
File: `backend/app/config.py`
- Add:
- `MODEL_IDLE_TIMEOUT_SECONDS: int = 900` (0 disables eviction)
- `MODEL_IDLE_CHECK_INTERVAL_SECONDS: int = 60`
- `MODEL_EVICTION_ENABLED: bool = True`
- Bind to env: `MODEL_IDLE_TIMEOUT_SECONDS`, `MODEL_IDLE_CHECK_INTERVAL_SECONDS`, `MODEL_EVICTION_ENABLED`.
## Design
### ModelManager (Singleton)
File: `backend/app/services/model_manager.py` (new)
- Responsibilities:
- Manage lifecycle (load/unload) of the TTS model/pipeline.
- Provide `get()` that returns a ready model (lazy-load if needed) and updates `last_used`.
- Track active request count to block eviction while > 0.
- Internals:
- `self._model` (or components), `self._last_used: float`, `self._active: int`.
- Locks: `asyncio.Lock` for load/unload; `asyncio.Lock` or `asyncio.Semaphore` for counters.
- Optional CUDA cleanup: `torch.cuda.empty_cache()` after unload.
- API:
- `async def get(self) -> Model`: ensures loaded; bumps `last_used`.
- `async def load(self)`: idempotent; guarded by lock.
- `async def unload(self)`: only when `self._active == 0`; clears refs and caches.
- `def touch(self)`: update `last_used`.
- Context helper: `async def using(self)`: async context manager incrementing/decrementing `active` safely.
### Idle Reaper Task
Registration: FastAPI startup (e.g., in `backend/app/main.py`)
- Background task loop every `MODEL_IDLE_CHECK_INTERVAL_SECONDS`:
- If eviction enabled and timeout > 0 and model is loaded and `active == 0` and `now - last_used >= timeout`, call `unload()`.
- Handle cancellation on shutdown.
### API Integration
- Replace direct model access in endpoints with:
```python
manager = ModelManager.instance()
async with manager.using():
    model = await manager.get()
    # perform inference
```
- Optionally call `manager.touch()` at request start for non-inference paths that still need the model resident.
## Pseudocode
```python
# services/model_manager.py
import asyncio
import time
from typing import Optional

# Settings are module-level constants in backend/app/config.py.
from app import config as settings


class ModelManager:
    _instance: Optional["ModelManager"] = None

    def __init__(self):
        self._model = None
        self._last_used = time.time()
        self._active = 0
        self._lock = asyncio.Lock()          # guards load/unload
        self._counter_lock = asyncio.Lock()  # guards the active counter

    @classmethod
    def instance(cls):
        if not cls._instance:
            cls._instance = cls()
        return cls._instance

    async def load(self):
        async with self._lock:
            if self._model is not None:
                return
            # ... load model/pipeline here ...
            # load_pipeline() is a placeholder for the real loader.
            self._model = await load_pipeline()
            self._last_used = time.time()

    async def unload(self):
        async with self._lock:
            if self._model is None:
                return
            if self._active > 0:
                return  # safety: do not unload while in use
            # ... free resources ...
            self._model = None
            try:
                import torch
                torch.cuda.empty_cache()
            except Exception:
                pass

    async def get(self):
        if self._model is None:
            await self.load()
        self._last_used = time.time()
        return self._model

    async def _inc(self):
        async with self._counter_lock:
            self._active += 1

    async def _dec(self):
        async with self._counter_lock:
            self._active = max(0, self._active - 1)
            self._last_used = time.time()

    def last_used(self):
        return self._last_used

    def is_loaded(self):
        return self._model is not None

    def active(self):
        return self._active

    def using(self):
        manager = self

        class _Ctx:
            async def __aenter__(self):
                await manager._inc()
                return manager

            async def __aexit__(self, exc_type, exc, tb):
                await manager._dec()

        return _Ctx()


# main.py (startup); `app` is the FastAPI instance defined there
import contextlib
import logging

logger = logging.getLogger(__name__)


@app.on_event("startup")
async def start_reaper():
    async def reaper():
        while True:
            try:
                await asyncio.sleep(settings.MODEL_IDLE_CHECK_INTERVAL_SECONDS)
                if not settings.MODEL_EVICTION_ENABLED:
                    continue
                timeout = settings.MODEL_IDLE_TIMEOUT_SECONDS
                if timeout <= 0:
                    continue
                m = ModelManager.instance()
                if m.is_loaded() and m.active() == 0 and (time.time() - m.last_used()) >= timeout:
                    await m.unload()
            except asyncio.CancelledError:
                break
            except Exception as e:
                logger.exception("Idle reaper error: %s", e)

    app.state._model_reaper_task = asyncio.create_task(reaper())


@app.on_event("shutdown")
async def stop_reaper():
    task = getattr(app.state, "_model_reaper_task", None)
    if task:
        task.cancel()
        with contextlib.suppress(Exception):
            await task
```
## Observability
- Logs: model load/unload, reaper decisions, active count.
- Metrics (optional): counters and gauges (load events, active, residency time).
## Safety & Edge Cases
- Avoid unload when `active > 0`.
- Guard multiple loads/unloads with lock.
- Multi-worker servers: each worker manages its own model.
- Cold-start latency: document expected additional latency for first request after idle unload.
## Testing
- Unit tests for `ModelManager`: load/unload idempotency, counter behavior.
- Simulated reaper triggering with short timeouts.
- Endpoint tests: concurrency (N simultaneous inferences), ensure no unload mid-flight.
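A simulated reaper test with a short timeout might look like the sketch below. `FakeManager` and `reap_once` are hypothetical stand-ins written for this example; they mimic only the parts of `ModelManager` that the reaper condition touches. The sketch uses `time.monotonic()` rather than `time.time()`, which is a deliberate substitution to keep the idle measurement immune to wall-clock adjustments.

```python
import asyncio
import time


class FakeManager:
    """Tiny stand-in for ModelManager (hypothetical); loads no real model."""

    def __init__(self):
        self._model = None
        self._active = 0
        self._last_used = time.monotonic()

    async def get(self):
        if self._model is None:
            self._model = object()  # pretend to load
        self._last_used = time.monotonic()
        return self._model

    async def unload(self):
        if self._active == 0:
            self._model = None

    def is_loaded(self):
        return self._model is not None


async def reap_once(manager, timeout):
    """One pass of the reaper condition described in the design."""
    idle = time.monotonic() - manager._last_used
    if manager.is_loaded() and manager._active == 0 and idle >= timeout:
        await manager.unload()


async def scenario():
    m = FakeManager()
    await m.get()
    m._active = 1                      # simulate an in-flight inference
    await asyncio.sleep(0.05)
    await reap_once(m, timeout=0.01)
    busy_still_loaded = m.is_loaded()  # must not unload while active
    m._active = 0
    await reap_once(m, timeout=0.01)
    return busy_still_loaded, m.is_loaded()
```

Factoring the check into a single `reap_once` pass avoids having to drive the infinite loop in tests; the real reaper could call the same function after each sleep.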
## Rollout Plan
1. Introduce config + Manager (no reaper), switch endpoints to `using()`.
2. Enable reaper with long timeout in staging; observe logs/metrics.
3. Tune timeout; enable in production.
## Tasks Checklist
- [ ] Add config flags and defaults in `backend/app/config.py`.
- [ ] Create `backend/app/services/model_manager.py`.
- [ ] Register startup/shutdown reaper in app init (`backend/app/main.py`).
- [ ] Refactor endpoints to use `ModelManager.instance().using()` and `get()`.
- [ ] Add logs and optional metrics.
- [ ] Add unit/integration tests.
- [ ] Update README/ops docs.
## Alternatives Considered
- Gunicorn/uvicorn worker preloading with external idle supervisor: more complexity, less portability.
- OS-level cgroup memory pressure eviction: opaque and risky for correctness.
## Configuration Examples
```
MODEL_EVICTION_ENABLED=true
MODEL_IDLE_TIMEOUT_SECONDS=900
MODEL_IDLE_CHECK_INTERVAL_SECONDS=60
```

View File

@ -359,7 +359,7 @@ The API uses the following directory structure (configurable in `app/config.py`)
- **Temporary Files**: `{PROJECT_ROOT}/tts_temp_outputs/`
### CORS Settings
- Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001`
- Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001` (plus any `FRONTEND_HOST:FRONTEND_PORT` when using `start_servers.py`)
- Allowed Methods: All
- Allowed Headers: All
- Credentials: Enabled

View File

@ -58,7 +58,7 @@ The application uses environment variables for configuration. Three `.env` files
- `VITE_DEV_SERVER_HOST`: Frontend development server host
#### CORS Configuration
- `CORS_ORIGINS`: Comma-separated list of allowed origins
- `CORS_ORIGINS`: Comma-separated list of allowed origins. When using `start_servers.py` with the default `FRONTEND_HOST=0.0.0.0` and no explicit `CORS_ORIGINS`, CORS will allow all origins (wildcard) to simplify development.
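The origin-selection rules can be restated as a pure function for easier reasoning and testing. This sketch mirrors the branching in the `backend/app/config.py` change in this same diff; the function name `resolve_cors_origins` and the dict-based `env` parameter are assumptions for illustration. As written in that code, `FRONTEND_HOST=0.0.0.0` appears to select the wildcard even when `CORS_ORIGINS` is set.

```python
def resolve_cors_origins(env):
    """Return the allowed-origins list for a mapping of environment values."""
    cors_env = env.get("CORS_ORIGINS", "")
    host = env.get("FRONTEND_HOST")
    port = env.get("FRONTEND_PORT")

    if host == "0.0.0.0":
        origins = ["*"]  # dev convenience when binding all interfaces
    elif cors_env:
        origins = [o.strip() for o in cors_env.split(",") if o.strip()]
    else:
        origins = ["*"]  # default: allow all origins in development

    # Auto-include the specific frontend origin when not using the wildcard.
    if origins != ["*"] and host and port:
        frontend = f"http://{host.strip()}:{port.strip()}"
        if frontend not in origins:
            origins.append(frontend)
    return origins
```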
#### Device Configuration
- `DEVICE`: Device for TTS model (auto, cpu, cuda, mps)
@ -101,7 +101,7 @@ CORS_ORIGINS=http://localhost:3000
### Common Issues
1. **Permission Errors**: Ensure the `PROJECT_ROOT` directory is writable
2. **CORS Errors**: Check that your frontend URL is in `CORS_ORIGINS`
2. **CORS Errors**: Check that your frontend URL is in `CORS_ORIGINS`. (When using `start_servers.py`, your specified `FRONTEND_HOST:FRONTEND_PORT` will be auto-included.)
3. **Model Loading Errors**: Verify `DEVICE` setting matches your hardware
4. **Path Errors**: Ensure all path variables point to existing, accessible directories

View File

@ -9,6 +9,7 @@ A comprehensive text-to-speech application with multiple interfaces for generati
- **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
- **Audiobook Generation**: Convert long-form text into narrated audiobooks
- **Speaker Management**: Add/remove speakers with custom audio samples
- **Paste Script (JSONL) Import**: Paste a dialog script as JSONL directly into the editor via a modal
- **Memory Optimization**: Automatic model cleanup after generation
- **Output Organization**: Files saved in organized directories with ZIP packaging
@ -23,7 +24,6 @@ A comprehensive text-to-speech application with multiple interfaces for generati
pip install -r requirements.txt
npm install
```
2. Run automated setup:
```bash
python setup.py
@ -33,6 +33,24 @@ A comprehensive text-to-speech application with multiple interfaces for generati
- Add audio samples (WAV format) to `speaker_data/speaker_samples/`
- Configure speakers in `speaker_data/speakers.yaml`
### Windows Quick Start
On Windows, a PowerShell setup script is provided to automate environment setup and startup.
```powershell
# From the repository root in PowerShell
./setup-windows.ps1
# First time only, if scripts are blocked:
# Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
What it does:
- Creates/uses `.venv`
- Upgrades pip and installs deps from `backend/requirements.txt` and root `requirements.txt`
- Creates a default `.env` with sensible ports if missing
- Starts both servers via `start_servers.py`
### Running the Application
**Full-Stack Web Application:**
@ -41,6 +59,12 @@ A comprehensive text-to-speech application with multiple interfaces for generati
python start_servers.py
```
On Windows, you can also use the one-liner PowerShell script:
```powershell
./setup-windows.ps1
```
**Individual Components:**
```bash
# Backend only (FastAPI)
@ -56,7 +80,26 @@ python gradio_app.py
## Usage
### Web Interface
Access the modern web UI at `http://localhost:8001` for interactive dialog creation with drag-and-drop editing.
Access the modern web UI at `http://localhost:8001` for interactive dialog creation.
#### Paste Script (JSONL) in Dialog Editor
Quickly load a dialog by pasting JSONL (one JSON object per line):
1. Click `Paste Script` in the Dialog Editor.
2. Paste JSONL content, for example:
```jsonl
{"type":"speech","speaker_id":"dummy_speaker","text":"Hello there!"}
{"type":"silence","duration":0.5}
{"type":"speech","speaker_id":"dummy_speaker","text":"This is the second line."}
```
3. Click `Load` and confirm replacement if prompted.
Notes:
- Input is validated per line; errors report line numbers.
- The dialog is saved to localStorage, so it persists across refreshes.
- Unknown `speaker_id`s will still load; add speakers later if needed.
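For scripts prepared outside the browser, per-line validation with line-numbered errors could be approximated as below. This is only a sketch: the actual check lives in the frontend, and the exact rules here (required fields, ignoring blank lines) are assumptions inferred from the example above, not a statement of what the editor enforces.

```python
import json


def parse_jsonl_script(text):
    """Return (items, errors); each error message carries a 1-based line number."""
    items, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append(f"line {lineno}: invalid JSON ({e.msg})")
            continue
        if not isinstance(obj, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        kind = obj.get("type")
        if kind == "speech":
            if not obj.get("speaker_id") or not obj.get("text"):
                errors.append(f"line {lineno}: speech needs speaker_id and text")
                continue
        elif kind == "silence":
            if not isinstance(obj.get("duration"), (int, float)):
                errors.append(f"line {lineno}: silence needs a numeric duration (seconds)")
                continue
        else:
            errors.append(f"line {lineno}: unknown type {kind!r}")
            continue
        items.append(obj)
    return items, errors
```

Collecting all errors instead of stopping at the first one lets a user fix a pasted script in a single pass.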
### CLI Tools
@ -149,5 +192,12 @@ The application automatically:
- **"Skipping unknown speaker"**: Configure speaker in `speaker_data/speakers.yaml`
- **"Sample file not found"**: Verify audio files exist in `speaker_data/speaker_samples/`
- **Memory issues**: Use model reinitialization options for long content
- **CORS errors**: Check frontend/backend port configuration
- **CORS errors**: Check frontend/backend port configuration (frontend origin is auto-included when using `start_servers.py`)
- **Import errors**: Run `python import_helper.py` to check dependencies
### Windows-specific
- If PowerShell blocks script execution, run once:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
- If Windows Firewall prompts the first time you run servers, allow access on your private network.

View File

@ -6,20 +6,34 @@ from dotenv import load_dotenv
load_dotenv()
# Project root - can be overridden by environment variable
PROJECT_ROOT = Path(os.getenv("PROJECT_ROOT", Path(__file__).parent.parent.parent)).resolve()
PROJECT_ROOT = Path(
os.getenv("PROJECT_ROOT", Path(__file__).parent.parent.parent)
).resolve()
# Directory paths
SPEAKER_DATA_BASE_DIR = Path(os.getenv("SPEAKER_DATA_BASE_DIR", str(PROJECT_ROOT / "speaker_data")))
SPEAKER_SAMPLES_DIR = Path(os.getenv("SPEAKER_SAMPLES_DIR", str(SPEAKER_DATA_BASE_DIR / "speaker_samples")))
SPEAKERS_YAML_FILE = Path(os.getenv("SPEAKERS_YAML_FILE", str(SPEAKER_DATA_BASE_DIR / "speakers.yaml")))
SPEAKER_DATA_BASE_DIR = Path(
os.getenv("SPEAKER_DATA_BASE_DIR", str(PROJECT_ROOT / "speaker_data"))
)
SPEAKER_SAMPLES_DIR = Path(
os.getenv("SPEAKER_SAMPLES_DIR", str(SPEAKER_DATA_BASE_DIR / "speaker_samples"))
)
SPEAKERS_YAML_FILE = Path(
os.getenv("SPEAKERS_YAML_FILE", str(SPEAKER_DATA_BASE_DIR / "speakers.yaml"))
)
# TTS temporary output path (used by DialogProcessorService)
TTS_TEMP_OUTPUT_DIR = Path(os.getenv("TTS_TEMP_OUTPUT_DIR", str(PROJECT_ROOT / "tts_temp_outputs")))
TTS_TEMP_OUTPUT_DIR = Path(
os.getenv("TTS_TEMP_OUTPUT_DIR", str(PROJECT_ROOT / "tts_temp_outputs"))
)
# Final dialog output path (used by Dialog router and served by main app)
# These are stored within the 'backend' directory to be easily servable.
DIALOG_OUTPUT_PARENT_DIR = PROJECT_ROOT / "backend"
DIALOG_GENERATED_DIR = Path(os.getenv("DIALOG_GENERATED_DIR", str(DIALOG_OUTPUT_PARENT_DIR / "tts_generated_dialogs")))
DIALOG_GENERATED_DIR = Path(
os.getenv(
"DIALOG_GENERATED_DIR", str(DIALOG_OUTPUT_PARENT_DIR / "tts_generated_dialogs")
)
)
# Alias for clarity and backward compatibility
DIALOG_OUTPUT_DIR = DIALOG_GENERATED_DIR
@ -29,11 +43,41 @@ HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8000"))
RELOAD = os.getenv("RELOAD", "true").lower() == "true"
# CORS configuration
CORS_ORIGINS = [origin.strip() for origin in os.getenv("CORS_ORIGINS", "http://localhost:8001,http://127.0.0.1:8001").split(",")]
# CORS configuration: determine allowed origins based on env & frontend binding
_cors_env = os.getenv("CORS_ORIGINS", "")
_frontend_host = os.getenv("FRONTEND_HOST")
_frontend_port = os.getenv("FRONTEND_PORT")
# If the dev server is bound to 0.0.0.0 (all interfaces), allow all origins
if _frontend_host == "0.0.0.0":  # dev convenience when binding wildcard
    CORS_ORIGINS = ["*"]
elif _cors_env:
    # parse comma-separated origins, strip whitespace
    CORS_ORIGINS = [origin.strip() for origin in _cors_env.split(",") if origin.strip()]
else:
    # default to allow all origins in development
    CORS_ORIGINS = ["*"]
# Auto-include specific frontend origin when not using wildcard CORS
if CORS_ORIGINS != ["*"] and _frontend_host and _frontend_port:
    _frontend_origin = f"http://{_frontend_host.strip()}:{_frontend_port.strip()}"
    if _frontend_origin not in CORS_ORIGINS:
        CORS_ORIGINS.append(_frontend_origin)
# Device configuration
DEVICE = os.getenv("DEVICE", "auto")
# Concurrency configuration
# Max number of concurrent TTS generation tasks per dialog request
TTS_MAX_CONCURRENCY = int(os.getenv("TTS_MAX_CONCURRENCY", "3"))
# Model idle eviction configuration
# Enable/disable idle-based model eviction
MODEL_EVICTION_ENABLED = os.getenv("MODEL_EVICTION_ENABLED", "true").lower() == "true"
# Unload model after this many seconds of inactivity (0 disables eviction)
MODEL_IDLE_TIMEOUT_SECONDS = int(os.getenv("MODEL_IDLE_TIMEOUT_SECONDS", "900"))
# How often the reaper checks for idleness
MODEL_IDLE_CHECK_INTERVAL_SECONDS = int(os.getenv("MODEL_IDLE_CHECK_INTERVAL_SECONDS", "60"))
# Ensure directories exist
SPEAKER_SAMPLES_DIR.mkdir(parents=True, exist_ok=True)

View File

@ -2,6 +2,10 @@ from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
from pathlib import Path
import asyncio
import contextlib
import logging
import time
from app.routers import speakers, dialog # Import the routers
from app import config
@ -38,3 +42,47 @@ config.DIALOG_GENERATED_DIR.mkdir(parents=True, exist_ok=True)
app.mount("/generated_audio", StaticFiles(directory=config.DIALOG_GENERATED_DIR), name="generated_audio")
# Further endpoints for speakers, dialog generation, etc., will be added here.
# --- Background task: idle model reaper ---
logger = logging.getLogger("app.model_reaper")
@app.on_event("startup")
async def _start_model_reaper():
from app.services.model_manager import ModelManager
async def reaper():
while True:
try:
await asyncio.sleep(config.MODEL_IDLE_CHECK_INTERVAL_SECONDS)
if not getattr(config, "MODEL_EVICTION_ENABLED", True):
continue
timeout = getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0)
if timeout <= 0:
continue
m = ModelManager.instance()
if m.is_loaded() and m.active() == 0 and (time.time() - m.last_used()) >= timeout:
logger.info("Idle timeout reached (%.0fs). Unloading model...", timeout)
await m.unload()
except asyncio.CancelledError:
break
except Exception:
logger.exception("Model reaper encountered an error")
# Log eviction configuration at startup
logger.info(
"Model Eviction -> enabled: %s | idle_timeout: %ss | check_interval: %ss",
getattr(config, "MODEL_EVICTION_ENABLED", True),
getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0),
getattr(config, "MODEL_IDLE_CHECK_INTERVAL_SECONDS", 60),
)
app.state._model_reaper_task = asyncio.create_task(reaper())
@app.on_event("shutdown")
async def _stop_model_reaper():
task = getattr(app.state, "_model_reaper_task", None)
if task:
task.cancel()
with contextlib.suppress(asyncio.CancelledError, Exception):
await task
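
Review note: the reaper pattern above (sleep, check idleness, unload, cancel on shutdown) can be tried without FastAPI. The sketch below is an illustration under stated assumptions. `FakeModel` is a hypothetical stand-in for `ModelManager`, and `time.monotonic()` is swapped in for the `time.time()` used in the hunk so that wall-clock adjustments cannot affect the idle calculation.

```python
import asyncio
import contextlib
import time


class FakeModel:
    """Hypothetical stand-in for ModelManager: load state, active count, last use."""

    def __init__(self) -> None:
        self.loaded = True
        self.active = 0
        self.last_used = time.monotonic()

    async def unload(self) -> None:
        self.loaded = False


async def reaper(model: FakeModel, idle_timeout: float, check_interval: float) -> None:
    """Periodically unload the model once it has been idle for idle_timeout seconds."""
    while True:
        await asyncio.sleep(check_interval)
        idle_for = time.monotonic() - model.last_used
        if model.loaded and model.active == 0 and idle_for >= idle_timeout:
            await model.unload()


async def demo_reaper() -> bool:
    model = FakeModel()
    task = asyncio.create_task(reaper(model, idle_timeout=0.05, check_interval=0.01))
    await asyncio.sleep(0.2)  # long enough for the idle timeout to elapse
    task.cancel()             # what the shutdown hook does
    # CancelledError derives from BaseException on Python 3.8+, so name it explicitly.
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return model.loaded
```

Here the reaper does not catch the cancellation itself, so the shutdown side has to suppress `asyncio.CancelledError` explicitly when awaiting the task.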


@ -9,6 +9,8 @@ from app.services.speaker_service import SpeakerManagementService
from app.services.dialog_processor_service import DialogProcessorService
from app.services.audio_manipulation_service import AudioManipulationService
from app import config
from typing import AsyncIterator
from app.services.model_manager import ModelManager
router = APIRouter()
@ -16,9 +18,12 @@ router = APIRouter()
# These can be more sophisticated with a proper DI container or FastAPI's Depends system if services had complex init.
# For now, direct instantiation or simple Depends is fine.
def get_tts_service():
# Consider making device configurable
return TTSService(device="mps")
async def get_tts_service() -> AsyncIterator[TTSService]:
"""Dependency that holds a usage token for the duration of the request."""
manager = ModelManager.instance()
async with manager.using():
service = await manager.get_service()
yield service
def get_speaker_management_service():
return SpeakerManagementService()
@ -32,7 +37,7 @@ def get_dialog_processor_service(
def get_audio_manipulation_service():
return AudioManipulationService()
# --- Helper function to manage TTS model loading/unloading ---
# --- Helper imports ---
from app.models.dialog_models import SpeechItem, SilenceItem
from app.services.tts_service import TTSService
@ -128,19 +133,7 @@ async def generate_line(
detail=error_detail
)
async def manage_tts_model_lifecycle(tts_service: TTSService, task_function, *args, **kwargs):
"""Loads TTS model, executes task, then unloads model."""
try:
print("API: Loading TTS model...")
tts_service.load_model()
return await task_function(*args, **kwargs)
except Exception as e:
# Log or handle specific exceptions if needed before re-raising
print(f"API: Error during TTS model lifecycle or task execution: {e}")
raise
finally:
print("API: Unloading TTS model...")
tts_service.unload_model()
# Removed per-request load/unload in favor of ModelManager idle eviction.
async def process_dialog_flow(
request: DialogRequest,
@ -274,12 +267,10 @@ async def generate_dialog_endpoint(
- Concatenates all audio segments into a single file.
- Creates a ZIP archive of all individual segments and the concatenated file.
"""
# Wrap the core processing logic with model loading/unloading
return await manage_tts_model_lifecycle(
tts_service,
process_dialog_flow,
request=request,
dialog_processor=dialog_processor,
# Execute core processing; ModelManager dependency keeps the model marked "in use".
return await process_dialog_flow(
request=request,
dialog_processor=dialog_processor,
audio_manipulator=audio_manipulator,
background_tasks=background_tasks
background_tasks=background_tasks,
)
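
Review note: the new `get_tts_service` dependency relies on yield-style teardown, where the usage token is held from the moment the dependency is resolved until the framework closes the generator. The sketch below imitates that lifecycle by driving an async generator by hand. `UsageCounter`, `get_service`, and `simulate_request` are hypothetical names for illustration, and the manual `__anext__`/`aclose` calls only approximate what a framework does around a request.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Tuple


class UsageCounter:
    """Counts in-flight users; an idle reaper would skip unloading while active > 0."""

    def __init__(self) -> None:
        self.active = 0

    @asynccontextmanager
    async def using(self) -> AsyncIterator[None]:
        self.active += 1
        try:
            yield
        finally:
            self.active -= 1


counter = UsageCounter()


async def get_service() -> AsyncIterator[str]:
    """Yield-style dependency: the token is held until the generator is closed."""
    async with counter.using():
        yield "tts-service"


async def simulate_request() -> Tuple[str, int, int]:
    gen = get_service()
    service = await gen.__anext__()  # dependency resolved before the handler runs
    during = counter.active          # handler body executes while the token is held
    await gen.aclose()               # teardown after the response is produced
    return service, during, counter.active
```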


@ -1,6 +1,8 @@
from pathlib import Path
from typing import List, Dict, Any, Union
import re
import asyncio
from datetime import datetime
from .tts_service import TTSService
from .speaker_service import SpeakerManagementService
@ -92,24 +94,72 @@ class DialogProcessorService:
import shutil
segment_idx = 0
tasks = []
results_map: Dict[int, Dict[str, Any]] = {}
sem = asyncio.Semaphore(getattr(config, "TTS_MAX_CONCURRENCY", 2))
async def run_one(planned: Dict[str, Any]):
async with sem:
text_chunk = planned["text_chunk"]
speaker_id = planned["speaker_id"]
abs_speaker_sample_path = planned["abs_speaker_sample_path"]
filename_base = planned["filename_base"]
params = planned["params"]
seg_idx = planned["segment_idx"]
start_ts = datetime.now()
start_line = (
f"[{start_ts.isoformat(timespec='seconds')}] [TTS-TASK] START seg_idx={seg_idx} "
f"speaker={speaker_id} chunk_len={len(text_chunk)} base={filename_base}"
)
try:
out_path = await self.tts_service.generate_speech(
text=text_chunk,
speaker_id=speaker_id,
speaker_sample_path=str(abs_speaker_sample_path),
output_filename_base=filename_base,
output_dir=dialog_temp_dir,
exaggeration=params.get('exaggeration', 0.5),
cfg_weight=params.get('cfg_weight', 0.5),
temperature=params.get('temperature', 0.8),
)
end_ts = datetime.now()
duration = (end_ts - start_ts).total_seconds()
end_line = (
f"[{end_ts.isoformat(timespec='seconds')}] [TTS-TASK] END seg_idx={seg_idx} "
f"dur={duration:.2f}s -> {out_path}"
)
return seg_idx, {
"type": "speech",
"path": str(out_path),
"speaker_id": speaker_id,
"text_chunk": text_chunk,
}, start_line + "\n" + f"Successfully generated segment: {out_path}" + "\n" + end_line
except Exception as e:
end_ts = datetime.now()
err_line = (
f"[{end_ts.isoformat(timespec='seconds')}] [TTS-TASK] ERROR seg_idx={seg_idx} "
f"speaker={speaker_id} err={repr(e)}"
)
return seg_idx, {
"type": "error",
"message": f"Error generating speech for chunk '{text_chunk[:50]}...': {repr(e)}",
"text_chunk": text_chunk,
}, err_line
for i, item in enumerate(dialog_items):
item_type = item.get("type")
processing_log.append(f"Processing item {i+1}: type='{item_type}'")
# --- Universal: Handle reuse of existing audio for both speech and silence ---
# --- Handle reuse of existing audio ---
use_existing_audio = item.get("use_existing_audio", False)
audio_url = item.get("audio_url")
if use_existing_audio and audio_url:
# Determine source path (handle both absolute and relative)
# Map web URL to actual file location in tts_generated_dialogs
if audio_url.startswith("/generated_audio/"):
src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url[len("/generated_audio/"):]
else:
src_audio_path = Path(audio_url)
if not src_audio_path.is_absolute():
# Assume relative to the generated audio root dir
src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url.lstrip("/\\")
# Now src_audio_path should point to the real file in tts_generated_dialogs
if src_audio_path.is_file():
segment_filename = f"{output_base_name}_seg{segment_idx}_reused.wav"
dest_path = (self.temp_audio_dir / output_base_name / segment_filename)
@ -123,22 +173,18 @@ class DialogProcessorService:
processing_log.append(f"[REUSE] Destination audio file was not created: {dest_path}")
else:
processing_log.append(f"[REUSE] Destination audio file created: {dest_path}, size={dest_path.stat().st_size} bytes")
# Only include 'type' and 'path' so the concatenator always includes this segment
segment_results.append({
"type": item_type,
"path": str(dest_path)
})
results_map[segment_idx] = {"type": item_type, "path": str(dest_path)}
processing_log.append(f"Reused existing audio for item {i+1}: copied from {src_audio_path} to {dest_path}")
except Exception as e:
error_message = f"Failed to copy reused audio for item {i+1}: {e}"
processing_log.append(error_message)
segment_results.append({"type": "error", "message": error_message})
results_map[segment_idx] = {"type": "error", "message": error_message}
segment_idx += 1
continue
else:
error_message = f"Audio file for reuse not found at {src_audio_path} for item {i+1}."
processing_log.append(error_message)
segment_results.append({"type": "error", "message": error_message})
results_map[segment_idx] = {"type": "error", "message": error_message}
segment_idx += 1
continue
@ -147,70 +193,81 @@ class DialogProcessorService:
text = item.get("text")
if not speaker_id or not text:
processing_log.append(f"Skipping speech item {i+1} due to missing speaker_id or text.")
segment_results.append({"type": "error", "message": "Missing speaker_id or text"})
results_map[segment_idx] = {"type": "error", "message": "Missing speaker_id or text"}
segment_idx += 1
continue
# Validate speaker_id and get speaker_sample_path
speaker_info = self.speaker_service.get_speaker_by_id(speaker_id)
if not speaker_info:
processing_log.append(f"Speaker ID '{speaker_id}' not found. Skipping item {i+1}.")
segment_results.append({"type": "error", "message": f"Speaker ID '{speaker_id}' not found"})
results_map[segment_idx] = {"type": "error", "message": f"Speaker ID '{speaker_id}' not found"}
segment_idx += 1
continue
if not speaker_info.sample_path:
processing_log.append(f"Speaker ID '{speaker_id}' has no sample path defined. Skipping item {i+1}.")
segment_results.append({"type": "error", "message": f"Speaker ID '{speaker_id}' has no sample path defined"})
results_map[segment_idx] = {"type": "error", "message": f"Speaker ID '{speaker_id}' has no sample path defined"}
segment_idx += 1
continue
# speaker_info.sample_path is relative to config.SPEAKER_DATA_BASE_DIR
abs_speaker_sample_path = config.SPEAKER_DATA_BASE_DIR / speaker_info.sample_path
if not abs_speaker_sample_path.is_file():
processing_log.append(f"Speaker sample file not found or is not a file at '{abs_speaker_sample_path}' for speaker ID '{speaker_id}'. Skipping item {i+1}.")
segment_results.append({"type": "error", "message": f"Speaker sample not a file or not found: {abs_speaker_sample_path}"})
results_map[segment_idx] = {"type": "error", "message": f"Speaker sample not a file or not found: {abs_speaker_sample_path}"}
segment_idx += 1
continue
text_chunks = self._split_text(text)
processing_log.append(f"Split text for speaker '{speaker_id}' into {len(text_chunks)} chunk(s).")
for chunk_idx, text_chunk in enumerate(text_chunks):
segment_filename_base = f"{output_base_name}_seg{segment_idx}_spk{speaker_id}_chunk{chunk_idx}"
processing_log.append(f"Generating speech for chunk: '{text_chunk[:50]}...' using speaker '{speaker_id}'")
try:
segment_output_path = await self.tts_service.generate_speech(
text=text_chunk,
speaker_id=speaker_id, # For metadata, actual sample path is used by TTS
speaker_sample_path=str(abs_speaker_sample_path),
output_filename_base=segment_filename_base,
output_dir=dialog_temp_dir, # Save to the dialog's temp dir
exaggeration=item.get('exaggeration', 0.5), # Default from Gradio, Pydantic model should provide this
cfg_weight=item.get('cfg_weight', 0.5), # Default from Gradio, Pydantic model should provide this
temperature=item.get('temperature', 0.8) # Default from Gradio, Pydantic model should provide this
)
segment_results.append({
"type": "speech",
"path": str(segment_output_path),
"speaker_id": speaker_id,
"text_chunk": text_chunk
})
processing_log.append(f"Successfully generated segment: {segment_output_path}")
except Exception as e:
error_message = f"Error generating speech for chunk '{text_chunk[:50]}...': {repr(e)}"
processing_log.append(error_message)
segment_results.append({"type": "error", "message": error_message, "text_chunk": text_chunk})
filename_base = f"{output_base_name}_seg{segment_idx}_spk{speaker_id}_chunk{chunk_idx}"
processing_log.append(f"Queueing TTS for chunk: '{text_chunk[:50]}...' using speaker '{speaker_id}'")
planned = {
"segment_idx": segment_idx,
"speaker_id": speaker_id,
"text_chunk": text_chunk,
"abs_speaker_sample_path": abs_speaker_sample_path,
"filename_base": filename_base,
"params": {
'exaggeration': item.get('exaggeration', 0.5),
'cfg_weight': item.get('cfg_weight', 0.5),
'temperature': item.get('temperature', 0.8),
},
}
tasks.append(asyncio.create_task(run_one(planned)))
segment_idx += 1
elif item_type == "silence":
duration = item.get("duration")
if duration is None or duration < 0:
processing_log.append(f"Skipping silence item {i+1} due to invalid duration.")
segment_results.append({"type": "error", "message": "Invalid duration for silence"})
results_map[segment_idx] = {"type": "error", "message": "Invalid duration for silence"}
segment_idx += 1
continue
segment_results.append({"type": "silence", "duration": float(duration)})
results_map[segment_idx] = {"type": "silence", "duration": float(duration)}
processing_log.append(f"Added silence of {duration}s.")
segment_idx += 1
else:
processing_log.append(f"Unknown item type '{item_type}' at item {i+1}. Skipping.")
segment_results.append({"type": "error", "message": f"Unknown item type: {item_type}"})
results_map[segment_idx] = {"type": "error", "message": f"Unknown item type: {item_type}"}
segment_idx += 1
# Await all TTS tasks and merge results
if tasks:
processing_log.append(
f"Dispatching {len(tasks)} TTS task(s) with concurrency limit "
f"{getattr(config, 'TTS_MAX_CONCURRENCY', 2)}"
)
completed = await asyncio.gather(*tasks, return_exceptions=False)
for idx, payload, maybe_log in completed:
results_map[idx] = payload
if maybe_log:
processing_log.append(maybe_log)
# Build ordered list
for idx in sorted(results_map.keys()):
segment_results.append(results_map[idx])
# Log the full segment_results list for debugging
processing_log.append("[DEBUG] Final segment_results list:")
@ -220,7 +277,7 @@ class DialogProcessorService:
return {
"log": "\n".join(processing_log),
"segment_files": segment_results,
"temp_dir": str(dialog_temp_dir) # For cleanup or zipping later
"temp_dir": str(dialog_temp_dir)
}
if __name__ == "__main__":
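
Review note: the core of this change is semaphore-bounded fan-out with results merged back by segment index. The sketch below isolates that idea. `run_bounded` and its fake workload are hypothetical, and the sleep only simulates generation time. `asyncio.gather` already returns results in submission order, so the index map matters mainly because, as in the hunk, non-TTS results (silence, errors, reused audio) are written into the same map before the tasks finish.

```python
import asyncio
from typing import Dict, List, Tuple


async def run_bounded(chunks: List[str], limit: int) -> Tuple[List[str], int]:
    """Process chunks with at most `limit` running at once; return ordered output and peak."""
    sem = asyncio.Semaphore(limit)
    running = 0
    peak = 0
    results: Dict[int, str] = {}

    async def run_one(idx: int, text: str) -> Tuple[int, str]:
        nonlocal running, peak
        async with sem:
            running += 1
            peak = max(peak, running)
            # Later chunks finish sooner, so completion order differs from input order.
            await asyncio.sleep(0.01 * (len(chunks) - idx))
            running -= 1
            return idx, text.upper()

    tasks = [asyncio.create_task(run_one(i, c)) for i, c in enumerate(chunks)]
    for idx, payload in await asyncio.gather(*tasks):
        results[idx] = payload

    # Rebuild the final list in segment order, as the hunk does with sorted(results_map).
    ordered = [results[i] for i in sorted(results)]
    return ordered, peak
```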


@ -0,0 +1,170 @@
import asyncio
import time
import logging
from typing import Optional
import gc
import os
_proc = None
try:
import psutil # type: ignore
_proc = psutil.Process(os.getpid())
except Exception:
psutil = None # type: ignore
def _rss_mb() -> float:
"""Return current process RSS in MB, or -1.0 if unavailable."""
global _proc
try:
if _proc is None and psutil is not None:
_proc = psutil.Process(os.getpid())
if _proc is not None:
return _proc.memory_info().rss / (1024 * 1024)
except Exception:
return -1.0
return -1.0
try:
import torch # Optional; used for cache cleanup metrics
except Exception: # pragma: no cover - torch may not be present in some envs
torch = None # type: ignore
from app import config
from app.services.tts_service import TTSService
logger = logging.getLogger(__name__)
class ModelManager:
_instance: Optional["ModelManager"] = None
def __init__(self):
self._service: Optional[TTSService] = None
self._last_used: float = time.time()
self._active: int = 0
self._lock = asyncio.Lock()
self._counter_lock = asyncio.Lock()
@classmethod
def instance(cls) -> "ModelManager":
if not cls._instance:
cls._instance = cls()
return cls._instance
async def _ensure_service(self) -> None:
if self._service is None:
# Use the configured device; fall back to "auto" when config does not define DEVICE
device = getattr(config, "DEVICE", "auto")
# TTSService expects an explicit device ("mps", "cuda", or "cpu"); resolve "auto" to the first available backend, checked in that order
if device == "auto":
try:
import torch
if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
device = "mps"
elif torch.cuda.is_available():
device = "cuda"
else:
device = "cpu"
except Exception:
device = "cpu"
self._service = TTSService(device=device)
async def load(self) -> None:
async with self._lock:
await self._ensure_service()
if self._service and self._service.model is None:
before_mb = _rss_mb()
logger.info(
"Loading TTS model (device=%s)... (rss_before=%.1f MB)",
self._service.device,
before_mb,
)
self._service.load_model()
after_mb = _rss_mb()
if after_mb >= 0 and before_mb >= 0:
logger.info(
"TTS model loaded (rss_after=%.1f MB, delta=%.1f MB)",
after_mb,
after_mb - before_mb,
)
self._last_used = time.time()
async def unload(self) -> None:
async with self._lock:
if not self._service:
return
if self._active > 0:
logger.debug("Skip unload: %d active operations", self._active)
return
if self._service.model is not None:
before_mb = _rss_mb()
logger.info(
"Unloading idle TTS model... (rss_before=%.1f MB, active=%d)",
before_mb,
self._active,
)
self._service.unload_model()
# Drop the service instance as well to release any lingering refs
self._service = None
# Force GC and attempt allocator cache cleanup
try:
gc.collect()
finally:
if torch is not None:
try:
if hasattr(torch, "cuda") and torch.cuda.is_available():
torch.cuda.empty_cache()
except Exception:
logger.debug("cuda.empty_cache() failed", exc_info=True)
try:
# MPS empty_cache may exist depending on torch version
mps = getattr(torch, "mps", None)
if mps is not None and hasattr(mps, "empty_cache"):
mps.empty_cache()
except Exception:
logger.debug("mps.empty_cache() failed", exc_info=True)
after_mb = _rss_mb()
if after_mb >= 0 and before_mb >= 0:
logger.info(
"Idle unload complete (rss_after=%.1f MB, delta=%.1f MB)",
after_mb,
after_mb - before_mb,
)
self._last_used = time.time()
async def get_service(self) -> TTSService:
if not self._service or self._service.model is None:
await self.load()
self._last_used = time.time()
return self._service # type: ignore[return-value]
async def _inc(self) -> None:
async with self._counter_lock:
self._active += 1
async def _dec(self) -> None:
async with self._counter_lock:
self._active = max(0, self._active - 1)
self._last_used = time.time()
def last_used(self) -> float:
return self._last_used
def is_loaded(self) -> bool:
return bool(self._service and self._service.model is not None)
def active(self) -> int:
return self._active
def using(self):
manager = self
class _Ctx:
async def __aenter__(self):
await manager._inc()
return manager
async def __aexit__(self, exc_type, exc, tb):
await manager._dec()
return _Ctx()
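
Review note: `load()` above serializes model loading behind an `asyncio.Lock`, so concurrent callers of `get_service()` should trigger a single load. The sketch below shows that check-lock-recheck shape in isolation. `LazyModel` is a hypothetical class, and the short sleep stands in for the slow `load_model()` call.

```python
import asyncio
from typing import Optional


class LazyModel:
    """Loads a resource on first use; concurrent callers share one load."""

    def __init__(self) -> None:
        self._model: Optional[str] = None
        self._lock = asyncio.Lock()
        self.load_count = 0

    async def get(self) -> str:
        if self._model is None:
            async with self._lock:
                # Re-check after acquiring the lock: another caller may have loaded it.
                if self._model is None:
                    await asyncio.sleep(0.01)  # stands in for a slow model load
                    self.load_count += 1
                    self._model = "model"
        assert self._model is not None
        return self._model


async def demo_lazy_load() -> int:
    lazy = LazyModel()
    await asyncio.gather(*(lazy.get() for _ in range(5)))
    return lazy.load_count
```

Five concurrent `get()` calls wait on the same lock, and only the first one performs the load.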


@ -1,11 +1,14 @@
import torch
import torchaudio
import asyncio
from typing import Optional
from chatterbox.tts import ChatterboxTTS
from pathlib import Path
import gc # Garbage collector for memory management
import os
from contextlib import contextmanager
from datetime import datetime
import time
# Import configuration
try:
@ -114,42 +117,52 @@ class TTSService:
# output_filename_base from DialogProcessorService is expected to be comprehensive (e.g., includes speaker_id, segment info)
output_file_path = target_output_dir / f"{output_filename_base}.wav"
print(f"Generating audio for text: \"{text[:50]}...\" with speaker sample: {speaker_sample_path}")
wav = None
start_ts = datetime.now()
print(f"[{start_ts.isoformat(timespec='seconds')}] [TTS] START generate+save base={output_filename_base} len={len(text)} sample={speaker_sample_path}")
try:
with torch.no_grad(): # Important for inference
wav = self.model.generate(
text=text,
audio_prompt_path=str(speaker_sample_p), # Must be a string path
exaggeration=exaggeration,
cfg_weight=cfg_weight,
temperature=temperature,
)
torchaudio.save(str(output_file_path), wav, self.model.sr)
print(f"Audio saved to: {output_file_path}")
return output_file_path
except Exception as e:
print(f"Error during TTS generation or saving: {e}")
raise
finally:
# Explicitly delete the wav tensor to free memory
if wav is not None:
del wav
# Force garbage collection and cache cleanup
gc.collect()
if self.device == "cuda":
torch.cuda.empty_cache()
elif self.device == "mps":
if hasattr(torch.mps, "empty_cache"):
torch.mps.empty_cache()
# Unload the model if requested
def _gen_and_save() -> Path:
t0 = time.perf_counter()
wav = None
try:
with torch.no_grad(): # Important for inference
wav = self.model.generate(
text=text,
audio_prompt_path=str(speaker_sample_p), # Must be a string path
exaggeration=exaggeration,
cfg_weight=cfg_weight,
temperature=temperature,
)
# Save the audio synchronously in the same thread
torchaudio.save(str(output_file_path), wav, self.model.sr)
t1 = time.perf_counter()
print(f"[TTS-THREAD] Saved {output_file_path.name} in {t1 - t0:.2f}s")
return output_file_path
finally:
# Cleanup in the same thread that created the tensor
if wav is not None:
del wav
gc.collect()
if self.device == "cuda":
torch.cuda.empty_cache()
elif self.device == "mps":
if hasattr(torch.mps, "empty_cache"):
torch.mps.empty_cache()
out_path = await asyncio.to_thread(_gen_and_save)
end_ts = datetime.now()
print(f"[{end_ts.isoformat(timespec='seconds')}] [TTS] END generate+save base={output_filename_base} dur={(end_ts - start_ts).total_seconds():.2f}s -> {out_path}")
# Optionally unload model after generation
if unload_after:
print("Unloading TTS model after generation...")
self.unload_model()
return out_path
except Exception as e:
print(f"Error during TTS generation or saving: {e}")
raise
# Example usage (for testing, not part of the service itself)
if __name__ == "__main__":
async def main_test():
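
Review note: moving `generate` and `save` into `_gen_and_save` and awaiting it through `asyncio.to_thread` keeps the event loop free while the blocking work runs. The sketch below shows the same offloading with a fake workload. `blocking_generate` and `generate_many` are hypothetical names, `time.sleep` stands in for model inference, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import threading
import time
from typing import List, Tuple


def blocking_generate(text: str) -> Tuple[str, str]:
    """Stand-in for model.generate + torchaudio.save; blocks the calling thread."""
    time.sleep(0.05)
    return text[::-1], threading.current_thread().name


async def generate_many(texts: List[str]) -> Tuple[List[str], bool]:
    loop_thread = threading.current_thread().name
    # Each call runs in a worker thread, so the event loop stays responsive.
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_generate, t) for t in texts)
    )
    outputs = [out for out, _ in results]
    off_loop = all(name != loop_thread for _, name in results)
    return outputs, off_loop
```

Whether real TTS inference benefits from running several of these threads at once depends on the model and device, which this sketch does not measure.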


@ -14,6 +14,14 @@ if __name__ == "__main__":
print(f"CORS Origins: {config.CORS_ORIGINS}")
print(f"Project Root: {config.PROJECT_ROOT}")
print(f"Device: {config.DEVICE}")
# Idle eviction settings
print(
"Model Eviction -> enabled: {} | idle_timeout: {}s | check_interval: {}s".format(
getattr(config, "MODEL_EVICTION_ENABLED", True),
getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0),
getattr(config, "MODEL_IDLE_CHECK_INTERVAL_SECONDS", 60),
)
)
uvicorn.run(
"app.main:app",

forge.yaml Normal file

@ -0,0 +1,2 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/antinomyhq/forge/refs/heads/main/forge.schema.json
model: qwen/qwen3-coder


@ -24,7 +24,7 @@
--text-blue-darker: #205081;
/* Border Colors */
--border-light: #1b0404;
--border-light: #e5e7eb;
--border-medium: #cfd8dc;
--border-blue: #b5c6df;
--border-gray: #e3e3e3;
@ -55,7 +55,7 @@ body {
}
.container {
max-width: 1100px;
max-width: 1280px;
margin: 0 auto;
padding: 0 18px;
}
@ -134,6 +134,17 @@ main {
font-size: 1rem;
}
/* Allow wrapping for Text/Duration (3rd) column */
#dialog-items-table td:nth-child(3),
#dialog-items-table td.dialog-editable-cell {
white-space: pre-wrap; /* wrap text and preserve newlines */
overflow: visible; /* override global overflow hidden */
text-overflow: clip; /* no ellipsis */
word-break: break-word;/* wrap long words/URLs */
color: var(--text-primary); /* darker text for readability */
font-weight: 350; /* slightly heavier than 300, lighter than 400 */
}
/* Make the Speaker (2nd) column narrower */
#dialog-items-table th:nth-child(2), #dialog-items-table td:nth-child(2) {
width: 60px;
@ -142,11 +153,11 @@ main {
text-align: center;
}
/* Make the Actions (4th) column narrower */
/* Actions (4th) column sizing */
#dialog-items-table th:nth-child(4), #dialog-items-table td:nth-child(4) {
width: 110px;
min-width: 90px;
max-width: 130px;
width: 200px;
min-width: 180px;
max-width: 280px;
text-align: left;
padding-left: 0;
padding-right: 0;
@ -186,8 +197,22 @@ main {
#dialog-items-table td.actions {
text-align: left;
min-width: 110px;
white-space: nowrap;
min-width: 200px;
white-space: normal; /* allow wrapping so we don't see ellipsis */
overflow: visible; /* override table cell default from global rule */
text-overflow: clip; /* no ellipsis */
}
/* Allow wrapping of action buttons on smaller screens */
@media (max-width: 900px) {
#dialog-items-table th:nth-child(4), #dialog-items-table td:nth-child(4) {
width: auto;
min-width: 160px;
max-width: none;
}
#dialog-items-table td.actions {
white-space: normal;
}
}
/* Collapsible log details */
@ -346,7 +371,7 @@ button {
margin-right: 10px;
}
.generate-line-btn, .play-line-btn {
.generate-line-btn, .play-line-btn, .stop-line-btn {
background: var(--bg-blue-light);
color: var(--text-blue);
border: 1.5px solid var(--border-blue);
@ -363,7 +388,7 @@ button {
vertical-align: middle;
}
.generate-line-btn:disabled, .play-line-btn:disabled {
.generate-line-btn:disabled, .play-line-btn:disabled, .stop-line-btn:disabled {
opacity: 0.45;
cursor: not-allowed;
}
@ -374,7 +399,7 @@ button {
border-color: var(--warning-border);
}
.generate-line-btn:hover, .play-line-btn:hover {
.generate-line-btn:hover, .play-line-btn:hover, .stop-line-btn:hover {
background: var(--bg-blue-lighter);
color: var(--text-blue-darker);
border-color: var(--text-blue);
@ -449,6 +474,72 @@ footer {
border-top: 3px solid var(--primary-blue);
}
/* Inline Notification */
.notice {
max-width: 1280px;
margin: 16px auto 0;
padding: 12px 16px;
border-radius: 6px;
border: 1px solid var(--border-medium);
background: var(--bg-white);
color: var(--text-primary);
display: flex;
align-items: center;
gap: 12px;
box-shadow: 0 1px 2px var(--shadow-light);
}
.notice--info {
border-color: var(--border-blue);
background: var(--bg-blue-light);
}
.notice--success {
border-color: #A7F3D0;
background: #ECFDF5;
}
.notice--warning {
border-color: var(--warning-border);
background: var(--warning-bg);
}
.notice--error {
border-color: var(--error-bg-dark);
background: #FEE2E2;
}
.notice__content {
flex: 1;
}
.notice__actions {
display: flex;
gap: 8px;
}
.notice__actions button {
padding: 6px 12px;
border-radius: 4px;
border: 1px solid var(--border-medium);
background: var(--bg-white);
cursor: pointer;
}
.notice__actions .btn-primary {
background: var(--primary-blue);
color: var(--text-white);
border: none;
}
.notice__close {
background: none;
border: none;
font-size: 18px;
cursor: pointer;
color: var(--text-secondary);
}
@media (max-width: 900px) {
.panel-grid {
flex-direction: column;


@ -11,8 +11,38 @@
<div class="container">
<h1>Chatterbox TTS</h1>
</div>
<!-- Paste Script Modal -->
<div id="paste-script-modal" class="modal" style="display: none;">
<div class="modal-content">
<div class="modal-header">
<h3>Paste Dialog Script</h3>
<button class="modal-close" id="paste-script-close">&times;</button>
</div>
<div class="modal-body">
<p>Paste JSONL content (one JSON object per line). Example lines:</p>
<pre style="white-space:pre-wrap; background:#f6f8fa; padding:8px; border-radius:4px;">
{"type":"speech","speaker_id":"alice","text":"Hello there!"}
{"type":"silence","duration":0.5}
{"type":"speech","speaker_id":"bob","text":"Hi!"}
</pre>
<textarea id="paste-script-text" rows="10" style="width:100%;" placeholder='Paste JSONL here'></textarea>
</div>
<div class="modal-footer">
<button id="paste-script-load" class="btn-primary">Load</button>
<button id="paste-script-cancel" class="btn-secondary">Cancel</button>
</div>
</div>
</div>
</header>
<!-- Global inline notification area -->
<div id="global-notice" class="notice" role="status" aria-live="polite" style="display:none;">
<div class="notice__content" id="global-notice-content"></div>
<div class="notice__actions" id="global-notice-actions"></div>
<button class="notice__close" id="global-notice-close" aria-label="Close notification">&times;</button>
</div>
<main class="container" role="main">
<div class="panel-grid">
<section id="dialog-editor" class="panel full-width-panel" aria-labelledby="dialog-editor-title">
@ -48,6 +78,7 @@
<button id="save-script-btn">Save Script</button>
<input type="file" id="load-script-input" accept=".jsonl" style="display: none;">
<button id="load-script-btn">Load Script</button>
<button id="paste-script-btn">Paste Script</button>
</div>
</section>
</div>
@ -101,8 +132,8 @@
</div>
</footer>
<!-- TTS Settings Modal -->
<div id="tts-settings-modal" class="modal" style="display: none;">
<!-- TTS Settings Modal -->
<div id="tts-settings-modal" class="modal" style="display: none;">
<div class="modal-content">
<div class="modal-header">
<h3>TTS Settings</h3>


@ -10,7 +10,7 @@ const API_BASE_URL = API_BASE_URL_WITH_PREFIX;
* @throws {Error} If the network response is not ok.
*/
export async function getSpeakers() {
const response = await fetch(`${API_BASE_URL}/speakers/`);
const response = await fetch(`${API_BASE_URL}/speakers`);
if (!response.ok) {
const errorData = await response.json().catch(() => ({ message: response.statusText }));
throw new Error(`Failed to fetch speakers: ${errorData.detail || errorData.message || response.statusText}`);
@ -26,12 +26,12 @@ export async function getSpeakers() {
* Adds a new speaker.
* @param {FormData} formData - The form data containing speaker name and audio file.
* Example: formData.append('name', 'New Speaker');
* formData.append('audio_sample_file', fileInput.files[0]);
* formData.append('audio_file', fileInput.files[0]);
* @returns {Promise<Object>} A promise that resolves to the new speaker object.
* @throws {Error} If the network response is not ok.
*/
export async function addSpeaker(formData) {
const response = await fetch(`${API_BASE_URL}/speakers/`, {
const response = await fetch(`${API_BASE_URL}/speakers`, {
method: 'POST',
body: formData, // FormData sets Content-Type to multipart/form-data automatically
});
@ -86,7 +86,7 @@ export async function addSpeaker(formData) {
* @throws {Error} If the network response is not ok.
*/
export async function deleteSpeaker(speakerId) {
const response = await fetch(`${API_BASE_URL}/speakers/${speakerId}/`, {
const response = await fetch(`${API_BASE_URL}/speakers/${speakerId}`, {
method: 'DELETE',
});
if (!response.ok) {
@ -124,18 +124,8 @@ export async function generateLine(line) {
const errorData = await response.json().catch(() => ({ message: response.statusText }));
throw new Error(`Failed to generate line audio: ${errorData.detail || errorData.message || response.statusText}`);
}
const responseText = await response.text();
console.log('Raw response text:', responseText);
try {
const jsonData = JSON.parse(responseText);
console.log('Parsed JSON:', jsonData);
return jsonData;
} catch (parseError) {
console.error('JSON parse error:', parseError);
throw new Error(`Invalid JSON response: ${responseText}`);
}
const data = await response.json();
return data;
}
/**
@ -146,7 +136,7 @@ export async function generateLine(line) {
* output_base_name: "my_dialog",
* dialog_items: [
* { type: "speech", speaker_id: "speaker1", text: "Hello world.", exaggeration: 1.0, cfg_weight: 2.0, temperature: 0.7 },
* { type: "silence", duration_ms: 500 },
* { type: "silence", duration: 0.5 },
* { type: "speech", speaker_id: "speaker2", text: "How are you?" }
* ]
* }


@ -1,6 +1,69 @@
import { getSpeakers, addSpeaker, deleteSpeaker, generateDialog } from './api.js';
import { API_BASE_URL, API_BASE_URL_FOR_FILES } from './config.js';
// Shared per-line audio playback state to prevent overlapping playback
let currentLineAudio = null;
let currentLinePlayBtn = null;
let currentLineStopBtn = null;
// --- Global Inline Notification Helpers --- //
const noticeEl = document.getElementById('global-notice');
const noticeContentEl = document.getElementById('global-notice-content');
const noticeActionsEl = document.getElementById('global-notice-actions');
const noticeCloseBtn = document.getElementById('global-notice-close');
function hideNotice() {
if (!noticeEl) return;
noticeEl.style.display = 'none';
noticeEl.className = 'notice';
if (noticeContentEl) noticeContentEl.textContent = '';
if (noticeActionsEl) noticeActionsEl.innerHTML = '';
}
function showNotice(message, type = 'info', options = {}) {
if (!noticeEl || !noticeContentEl || !noticeActionsEl) {
console[type === 'error' ? 'error' : 'log']('[NOTICE]', message);
return () => {};
}
const { timeout = null, actions = [] } = options;
noticeEl.className = `notice notice--${type}`;
noticeContentEl.textContent = message;
noticeActionsEl.innerHTML = '';
actions.forEach(({ text, primary = false, onClick }) => {
const btn = document.createElement('button');
btn.textContent = text;
if (primary) btn.classList.add('btn-primary');
btn.onclick = () => {
try { onClick && onClick(); } finally { hideNotice(); }
};
noticeActionsEl.appendChild(btn);
});
if (noticeCloseBtn) noticeCloseBtn.onclick = hideNotice;
noticeEl.style.display = 'flex';
let timerId = null;
if (timeout && Number.isFinite(timeout)) {
timerId = window.setTimeout(hideNotice, timeout);
}
return () => {
if (timerId) window.clearTimeout(timerId);
hideNotice();
};
}
function confirmAction(message) {
return new Promise((resolve) => {
showNotice(message, 'warning', {
actions: [
{ text: 'Cancel', primary: false, onClick: () => resolve(false) },
{ text: 'Confirm', primary: true, onClick: () => resolve(true) },
],
});
});
}
document.addEventListener('DOMContentLoaded', async () => {
console.log('DOM fully loaded and parsed');
initializeSpeakerManagement();
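
The `confirmAction` helper added in this hunk turns the inline notice into an awaitable yes/no prompt. The sketch below isolates that pattern so it can run without a DOM. `makeFakeNotifier`, `confirmWith`, and `clickAction` are hypothetical names introduced only for illustration; the action shape (`text`, `primary`, `onClick`) is taken from the diff.

```javascript
// Hypothetical stand-in for showNotice(): records each notice instead of touching the DOM.
function makeFakeNotifier() {
  const shown = [];
  const showNotice = (message, type, options = {}) => {
    shown.push({ message, type, actions: options.actions || [] });
    return () => {}; // showNotice() in the diff also returns a dismiss function
  };
  return { shown, showNotice };
}

// Same shape as confirmAction() in the diff, with the notifier injected as a parameter.
function confirmWith(showNotice, message) {
  return new Promise((resolve) => {
    showNotice(message, 'warning', {
      actions: [
        { text: 'Cancel', primary: false, onClick: () => resolve(false) },
        { text: 'Confirm', primary: true, onClick: () => resolve(true) },
      ],
    });
  });
}

// Simulate the user clicking the action labelled `label` on the most recent notice.
function clickAction(shown, label) {
  const notice = shown[shown.length - 1];
  notice.actions.find((a) => a.text === label).onClick();
}
```

The promise settles only when one of the two actions fires. As far as this hunk shows, the close button is wired to `hideNotice` alone, so a confirmation dismissed that way would appear to leave its promise pending.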
@ -23,18 +86,24 @@ function initializeSpeakerManagement() {
const audioFile = formData.get('audio_file');
if (!speakerName || !audioFile || audioFile.size === 0) {
alert('Please provide a speaker name and an audio file.');
showNotice('Please provide a speaker name and an audio file.', 'warning', { timeout: 4000 });
return;
}
try {
const submitBtn = addSpeakerForm.querySelector('button[type="submit"]');
const prevText = submitBtn ? submitBtn.textContent : null;
if (submitBtn) { submitBtn.disabled = true; submitBtn.textContent = 'Adding…'; }
const newSpeaker = await addSpeaker(formData);
alert(`Speaker added: ${newSpeaker.name} (ID: ${newSpeaker.id})`);
showNotice(`Speaker added: ${newSpeaker.name} (ID: ${newSpeaker.id})`, 'success', { timeout: 3000 });
addSpeakerForm.reset();
loadSpeakers(); // Refresh speaker list
} catch (error) {
console.error('Failed to add speaker:', error);
alert('Error adding speaker: ' + error.message);
showNotice('Error adding speaker: ' + error.message, 'error');
} finally {
const submitBtn = addSpeakerForm.querySelector('button[type="submit"]');
if (submitBtn) { submitBtn.disabled = false; submitBtn.textContent = 'Add Speaker'; }
}
});
}
@ -79,23 +148,24 @@ async function loadSpeakers() {
} catch (error) {
console.error('Failed to load speakers:', error);
speakerListUL.innerHTML = '<li>Error loading speakers. See console for details.</li>';
alert('Error loading speakers: ' + error.message);
showNotice('Error loading speakers: ' + error.message, 'error');
}
}
async function handleDeleteSpeaker(speakerId) {
if (!speakerId) {
alert('Cannot delete speaker: Speaker ID is missing.');
showNotice('Cannot delete speaker: Speaker ID is missing.', 'warning', { timeout: 4000 });
return;
}
if (!confirm(`Are you sure you want to delete speaker ${speakerId}?`)) return;
const ok = await confirmAction(`Are you sure you want to delete speaker ${speakerId}?`);
if (!ok) return;
try {
await deleteSpeaker(speakerId);
alert(`Speaker ${speakerId} deleted successfully.`);
showNotice(`Speaker ${speakerId} deleted successfully.`, 'success', { timeout: 3000 });
loadSpeakers(); // Refresh speaker list
} catch (error) {
console.error(`Failed to delete speaker ${speakerId}:`, error);
alert(`Error deleting speaker: ${error.message}`);
showNotice(`Error deleting speaker: ${error.message}`, 'error');
}
}
@ -131,6 +201,12 @@ async function initializeDialogEditor() {
const saveScriptBtn = document.getElementById('save-script-btn');
const loadScriptBtn = document.getElementById('load-script-btn');
const loadScriptInput = document.getElementById('load-script-input');
const pasteScriptBtn = document.getElementById('paste-script-btn');
const pasteModal = document.getElementById('paste-script-modal');
const pasteText = document.getElementById('paste-script-text');
const pasteLoadBtn = document.getElementById('paste-script-load');
const pasteCancelBtn = document.getElementById('paste-script-cancel');
const pasteCloseBtn = document.getElementById('paste-script-close');
// Results Display Elements
const generationLogPre = document.getElementById('generation-log-content'); // Corrected ID
@ -140,9 +216,6 @@ async function initializeDialogEditor() {
const zipArchivePlaceholder = document.getElementById('zip-archive-placeholder');
const resultsDisplaySection = document.getElementById('results-display');
let dialogItems = [];
let availableSpeakersCache = []; // Cache for speaker names and IDs
// Load speakers at startup
try {
availableSpeakersCache = await getSpeakers();
@ -152,6 +225,48 @@ async function initializeDialogEditor() {
// Continue without speakers - they'll be loaded when needed
}
// --- LocalStorage persistence helpers ---
const LS_KEY = 'dialogEditor.items.v1';
function saveDialogToLocalStorage() {
try {
const exportData = dialogItems.map(item => {
const obj = { type: item.type };
if (item.type === 'speech') {
obj.speaker_id = item.speaker_id;
obj.text = item.text;
if (item.exaggeration !== undefined) obj.exaggeration = item.exaggeration;
if (item.cfg_weight !== undefined) obj.cfg_weight = item.cfg_weight;
if (item.temperature !== undefined) obj.temperature = item.temperature;
if (item.audioUrl) obj.audioUrl = item.audioUrl; // keep existing audio reference if present
} else if (item.type === 'silence') {
obj.duration = item.duration;
}
return obj;
});
localStorage.setItem(LS_KEY, JSON.stringify({ items: exportData }));
} catch (e) {
console.warn('Failed to save dialog to localStorage:', e);
}
}
function loadDialogFromLocalStorage() {
try {
const raw = localStorage.getItem(LS_KEY);
if (!raw) return;
const parsed = JSON.parse(raw);
if (!parsed || !Array.isArray(parsed.items)) return;
const loaded = parsed.items.map(normalizeDialogItem);
dialogItems.splice(0, dialogItems.length, ...loaded);
console.log(`Restored ${loaded.length} dialog items from localStorage`);
} catch (e) {
console.warn('Failed to load dialog from localStorage:', e);
}
}
// Attempt to restore saved dialog before first render
loadDialogFromLocalStorage();
// Function to render the current dialogItems array to the DOM as table rows
function renderDialogItems() {
if (!dialogItemsContainer) return;
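
The persistence helpers in this hunk store only the durable fields of each item under one versioned key. A minimal sketch of that round trip follows, assuming an in-memory object in place of `window.localStorage`. `makeMemoryStorage`, `toStoredItem`, `saveItems`, and `loadItems` are hypothetical names; the key and field names come from the diff.

```javascript
// Hypothetical in-memory stand-in exposing the getItem/setItem/removeItem subset of localStorage.
function makeMemoryStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => { data.set(k, String(v)); },
    removeItem: (k) => { data.delete(k); },
  };
}

const LS_KEY = 'dialogEditor.items.v1'; // key name taken from the diff

// Copy only the fields that saveDialogToLocalStorage() persists.
function toStoredItem(item) {
  const out = { type: item.type };
  if (item.type === 'speech') {
    out.speaker_id = item.speaker_id;
    out.text = item.text;
    for (const key of ['exaggeration', 'cfg_weight', 'temperature']) {
      if (item[key] !== undefined) out[key] = item[key];
    }
    if (item.audioUrl) out.audioUrl = item.audioUrl;
  } else if (item.type === 'silence') {
    out.duration = item.duration;
  }
  return out;
}

function saveItems(storage, items) {
  storage.setItem(LS_KEY, JSON.stringify({ items: items.map(toStoredItem) }));
}

// Return the stored items, or an empty array when nothing usable is stored.
function loadItems(storage) {
  const raw = storage.getItem(LS_KEY);
  if (!raw) return [];
  try {
    const parsed = JSON.parse(raw);
    return parsed && Array.isArray(parsed.items) ? parsed.items : [];
  } catch (e) {
    return [];
  }
}
```

Transient UI state such as `isGenerating` or `error` is dropped on save, so a restored dialog starts from a clean state. The real loader in the diff additionally passes each item through `normalizeDialogItem`, which this sketch leaves out.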
@ -184,6 +299,8 @@ async function initializeDialogEditor() {
});
speakerSelect.onchange = (e) => {
dialogItems[index].speaker_id = e.target.value;
// Persist change
saveDialogToLocalStorage();
};
speakerTd.appendChild(speakerSelect);
} else {
@ -195,8 +312,7 @@ async function initializeDialogEditor() {
const textTd = document.createElement('td');
textTd.className = 'dialog-editable-cell';
if (item.type === 'speech') {
let txt = item.text.length > 60 ? item.text.substring(0, 57) + '…' : item.text;
textTd.textContent = `"${txt}"`;
textTd.textContent = `"${item.text}"`;
textTd.title = item.text;
} else {
textTd.textContent = `${item.duration}s`;
@ -243,6 +359,8 @@ async function initializeDialogEditor() {
if (!isNaN(val) && val > 0) dialogItems[index].duration = val;
dialogItems[index].audioUrl = null;
}
// Persist changes before re-render
saveDialogToLocalStorage();
renderDialogItems();
}
};
@ -261,6 +379,7 @@ async function initializeDialogEditor() {
upBtn.onclick = () => {
if (index > 0) {
[dialogItems[index - 1], dialogItems[index]] = [dialogItems[index], dialogItems[index - 1]];
saveDialogToLocalStorage();
renderDialogItems();
}
};
@ -275,6 +394,7 @@ async function initializeDialogEditor() {
downBtn.onclick = () => {
if (index < dialogItems.length - 1) {
[dialogItems[index], dialogItems[index + 1]] = [dialogItems[index + 1], dialogItems[index]];
saveDialogToLocalStorage();
renderDialogItems();
}
};
@ -288,6 +408,7 @@ async function initializeDialogEditor() {
removeBtn.title = 'Remove';
removeBtn.onclick = () => {
dialogItems.splice(index, 1);
saveDialogToLocalStorage();
renderDialogItems();
};
actionsTd.appendChild(removeBtn);
@ -314,6 +435,8 @@ async function initializeDialogEditor() {
if (result && result.audio_url) {
dialogItems[index].audioUrl = result.audio_url;
console.log('Set audioUrl to:', result.audio_url);
// Persist newly generated audio reference
saveDialogToLocalStorage();
} else {
console.error('Invalid result structure:', result);
throw new Error('Invalid response: missing audio_url');
@ -321,7 +444,7 @@ async function initializeDialogEditor() {
} catch (err) {
console.error('Error in generateLine:', err);
dialogItems[index].error = err.message || 'Failed to generate audio.';
alert(dialogItems[index].error);
showNotice(dialogItems[index].error, 'error');
} finally {
dialogItems[index].isGenerating = false;
renderDialogItems();
@ -330,19 +453,107 @@ async function initializeDialogEditor() {
actionsTd.appendChild(generateBtn);
// --- NEW: Per-line Play button ---
const playBtn = document.createElement('button');
playBtn.innerHTML = '⏵';
playBtn.title = item.audioUrl ? 'Play generated audio' : 'No audio generated yet';
playBtn.className = 'play-line-btn';
playBtn.disabled = !item.audioUrl;
playBtn.onclick = () => {
if (!item.audioUrl) return;
let audioUrl = item.audioUrl.startsWith('http') ? item.audioUrl : `${API_BASE_URL_FOR_FILES}${item.audioUrl}`;
// Use a shared audio element or create one per play
let audio = new window.Audio(audioUrl);
audio.play();
const playPauseBtn = document.createElement('button');
playPauseBtn.innerHTML = '⏵';
playPauseBtn.title = item.audioUrl ? 'Play' : 'No audio generated yet';
playPauseBtn.className = 'play-line-btn';
playPauseBtn.disabled = !item.audioUrl;
const stopBtn = document.createElement('button');
stopBtn.innerHTML = '⏹';
stopBtn.title = 'Stop';
stopBtn.className = 'stop-line-btn';
stopBtn.disabled = !item.audioUrl;
const setBtnStatesForPlaying = () => {
try {
playPauseBtn.innerHTML = '⏸';
playPauseBtn.title = 'Pause';
stopBtn.disabled = false;
} catch (e) { /* detached */ }
};
actionsTd.appendChild(playBtn);
const setBtnStatesForPausedOrStopped = () => {
try {
playPauseBtn.innerHTML = '⏵';
playPauseBtn.title = 'Play';
} catch (e) { /* detached */ }
};
const stopCurrent = () => {
if (currentLineAudio) {
try { currentLineAudio.pause(); currentLineAudio.currentTime = 0; } catch (e) { /* noop */ }
}
if (currentLinePlayBtn) {
try { currentLinePlayBtn.innerHTML = '⏵'; currentLinePlayBtn.title = 'Play'; } catch (e) { /* detached */ }
}
if (currentLineStopBtn) {
try { currentLineStopBtn.disabled = true; } catch (e) { /* detached */ }
}
currentLineAudio = null;
currentLinePlayBtn = null;
currentLineStopBtn = null;
};
playPauseBtn.onclick = () => {
if (!item.audioUrl) return;
const audioUrl = item.audioUrl.startsWith('http') ? item.audioUrl : `${API_BASE_URL_FOR_FILES}${item.audioUrl}`;
// If controlling the same line
if (currentLineAudio && currentLinePlayBtn === playPauseBtn) {
if (currentLineAudio.paused) {
// Resume
currentLineAudio.play().then(() => setBtnStatesForPlaying()).catch(err => {
console.error('Audio resume failed:', err);
showNotice('Could not resume audio.', 'error', { timeout: 2000 });
});
} else {
// Pause
try { currentLineAudio.pause(); } catch (e) { /* noop */ }
setBtnStatesForPausedOrStopped();
}
return;
}
// Switching to a different line: stop previous
if (currentLineAudio) {
stopCurrent();
}
// Start new audio
const audio = new window.Audio(audioUrl);
currentLineAudio = audio;
currentLinePlayBtn = playPauseBtn;
currentLineStopBtn = stopBtn;
const clearState = () => {
if (currentLineAudio === audio) {
setBtnStatesForPausedOrStopped();
try { stopBtn.disabled = true; } catch (e) { /* detached */ }
currentLineAudio = null;
currentLinePlayBtn = null;
currentLineStopBtn = null;
}
};
audio.addEventListener('ended', clearState, { once: true });
audio.addEventListener('error', clearState, { once: true });
audio.play().then(() => setBtnStatesForPlaying()).catch(err => {
console.error('Audio play failed:', err);
clearState();
showNotice('Could not play audio.', 'error', { timeout: 2000 });
});
};
stopBtn.onclick = () => {
// Only acts if this line is the active one
if (currentLineAudio && currentLinePlayBtn === playPauseBtn) {
stopCurrent();
}
};
actionsTd.appendChild(playPauseBtn);
actionsTd.appendChild(stopBtn);
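
The play/pause and stop handlers above rely on module-level state so that only one line plays at a time. The following sketch reduces that idea to a small controller, assuming a fake audio object with the subset of `HTMLAudioElement` members the diff uses. `createLinePlayer` and `makeFakeAudio` are hypothetical names. The real `play()` returns a promise, which the diff handles with `.then`/`.catch`; the fake here is synchronous to keep the example short.

```javascript
// Minimal controller: at most one line's audio is "current" at any time.
function createLinePlayer() {
  let current = null; // { id, audio } for the active line, or null

  // Stop and rewind whatever is active, then forget it.
  const stop = () => {
    if (current) {
      current.audio.pause();
      current.audio.currentTime = 0;
      current = null;
    }
  };

  // Same line: toggle between play and pause. Different line: stop the old one first.
  const toggle = (id, makeAudio) => {
    if (current && current.id === id) {
      if (current.audio.paused) current.audio.play();
      else current.audio.pause();
      return current.audio;
    }
    stop();
    current = { id, audio: makeAudio() };
    current.audio.play();
    return current.audio;
  };

  return { toggle, stop, activeId: () => (current ? current.id : null) };
}

// Hypothetical fake with only the members used above.
function makeFakeAudio() {
  return {
    paused: true,
    currentTime: 0,
    play() { this.paused = false; },
    pause() { this.paused = true; },
  };
}
```

Keeping the active audio in one place is what lets a click on a second line rewind and silence the first, which matches the `stopCurrent()` call in the diff before a new `Audio` is created.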
// --- NEW: Settings button for speech items ---
if (item.type === 'speech') {
@ -383,13 +594,13 @@ async function initializeDialogEditor() {
try {
availableSpeakersCache = await getSpeakers();
} catch (error) {
alert('Could not load speakers. Please try again.');
showNotice('Could not load speakers. Please try again.', 'error');
console.error('Error fetching speakers for dialog:', error);
return;
}
}
if (availableSpeakersCache.length === 0) {
alert('No speakers available. Please add a speaker first.');
showNotice('No speakers available. Please add a speaker first.', 'warning', { timeout: 4000 });
return;
}
@ -419,10 +630,11 @@ async function initializeDialogEditor() {
const speakerId = speakerSelect.value;
const text = textInput.value.trim();
if (!speakerId || !text) {
alert('Please select a speaker and enter text.');
showNotice('Please select a speaker and enter text.', 'warning', { timeout: 4000 });
return;
}
dialogItems.push(normalizeDialogItem({ type: 'speech', speaker_id: speakerId, text: text }));
saveDialogToLocalStorage();
renderDialogItems();
clearTempInputArea();
};
@ -461,10 +673,11 @@ async function initializeDialogEditor() {
addButton.onclick = () => {
const duration = parseFloat(durationInput.value);
if (isNaN(duration) || duration <= 0) {
alert('Invalid duration. Please enter a positive number.');
showNotice('Invalid duration. Please enter a positive number.', 'warning', { timeout: 4000 });
return;
}
dialogItems.push(normalizeDialogItem({ type: 'silence', duration: duration }));
saveDialogToLocalStorage();
renderDialogItems();
clearTempInputArea();
};
@ -486,15 +699,18 @@ async function initializeDialogEditor() {
generateDialogBtn.addEventListener('click', async () => {
const outputBaseName = outputBaseNameInput.value.trim();
if (!outputBaseName) {
alert('Please enter an output base name.');
showNotice('Please enter an output base name.', 'warning', { timeout: 4000 });
outputBaseNameInput.focus();
return;
}
if (dialogItems.length === 0) {
alert('Please add at least one speech or silence line to the dialog.');
showNotice('Please add at least one speech or silence line to the dialog.', 'warning', { timeout: 4000 });
return; // Prevent further execution if no dialog items
}
const prevText = generateDialogBtn.textContent;
generateDialogBtn.disabled = true;
generateDialogBtn.textContent = 'Generating…';
// Smart dialog-wide generation: use pre-generated audio where present
const dialogItemsToGenerate = dialogItems.map(item => {
// Only send minimal fields for items that need generation
@ -546,7 +762,11 @@ async function initializeDialogEditor() {
} catch (error) {
console.error('Dialog generation failed:', error);
if (generationLogPre) generationLogPre.textContent = `Error generating dialog: ${error.message}`;
alert(`Error generating dialog: ${error.message}`);
showNotice(`Error generating dialog: ${error.message}`, 'error');
}
finally {
generateDialogBtn.disabled = false;
generateDialogBtn.textContent = prevText;
}
});
}
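
Both the Add Speaker form and the Generate Dialog button now disable themselves and show a busy label while a request is in flight, then restore in `finally`. A sketch of that pattern as a reusable helper is shown below; `withBusyButton` is a hypothetical name and is not part of the diff, and the button is any object with `disabled` and `textContent` properties.

```javascript
// Run an async task while a button shows a busy label; always restore it afterwards.
async function withBusyButton(button, busyText, task) {
  const prevText = button.textContent; // remember the label so it can be restored
  button.disabled = true;
  button.textContent = busyText;
  try {
    return await task();
  } finally {
    // Runs on success and on failure, so the button never stays stuck.
    button.disabled = false;
    button.textContent = prevText;
  }
}
```

Capturing the previous label outside the `try` block keeps it in scope for `finally`, so the helper does not need to hard-code the button text it restores.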
@ -554,7 +774,7 @@ async function initializeDialogEditor() {
// --- Save/Load Script Functionality ---
function saveDialogScript() {
if (dialogItems.length === 0) {
alert('No dialog items to save. Please add some speech or silence lines first.');
showNotice('No dialog items to save. Please add some speech or silence lines first.', 'warning', { timeout: 4000 });
return;
}
@ -599,11 +819,12 @@ async function initializeDialogEditor() {
URL.revokeObjectURL(url);
console.log(`Dialog script saved as ${filename}`);
showNotice(`Dialog script saved as ${filename}`, 'success', { timeout: 3000 });
}
function loadDialogScript(file) {
if (!file) {
alert('Please select a file to load.');
showNotice('Please select a file to load.', 'warning', { timeout: 4000 });
return;
}
@ -626,19 +847,19 @@ async function initializeDialogEditor() {
}
} catch (parseError) {
console.error(`Error parsing line ${i + 1}:`, parseError);
alert(`Error parsing line ${i + 1}: ${parseError.message}`);
showNotice(`Error parsing line ${i + 1}: ${parseError.message}`, 'error');
return;
}
}
if (loadedItems.length === 0) {
alert('No valid dialog items found in the file.');
showNotice('No valid dialog items found in the file.', 'warning', { timeout: 4000 });
return;
}
// Confirm replacement if existing items
if (dialogItems.length > 0) {
const confirmed = confirm(
const confirmed = await confirmAction(
`This will replace your current dialog (${dialogItems.length} items) with the loaded script (${loadedItems.length} items). Continue?`
);
if (!confirmed) return;
@ -650,30 +871,97 @@ async function initializeDialogEditor() {
availableSpeakersCache = await getSpeakers();
} catch (error) {
console.error('Error fetching speakers:', error);
alert('Could not load speakers. Dialog loaded but speaker names may not display correctly.');
showNotice('Could not load speakers. Dialog loaded but speaker names may not display correctly.', 'warning', { timeout: 5000 });
}
}
// Replace current dialog
dialogItems.splice(0, dialogItems.length, ...loadedItems);
// Persist loaded script
saveDialogToLocalStorage();
renderDialogItems();
console.log(`Loaded ${loadedItems.length} dialog items from script`);
alert(`Successfully loaded ${loadedItems.length} dialog items.`);
showNotice(`Successfully loaded ${loadedItems.length} dialog items.`, 'success', { timeout: 3000 });
} catch (error) {
console.error('Error loading dialog script:', error);
alert(`Error loading dialog script: ${error.message}`);
showNotice(`Error loading dialog script: ${error.message}`, 'error');
}
};
reader.onerror = function() {
alert('Error reading file. Please try again.');
showNotice('Error reading file. Please try again.', 'error');
};
reader.readAsText(file);
}
// Load dialog script from pasted JSONL text
async function loadDialogScriptFromText(text) {
if (!text || !text.trim()) {
showNotice('Please paste JSONL content to load.', 'warning', { timeout: 4000 });
return false;
}
try {
const lines = text.trim().split('\n');
const loadedItems = [];
for (let i = 0; i < lines.length; i++) {
const line = lines[i].trim();
if (!line) continue; // Skip empty lines
try {
const item = JSON.parse(line);
const validatedItem = validateDialogItem(item, i + 1);
if (validatedItem) {
loadedItems.push(normalizeDialogItem(validatedItem));
}
} catch (parseError) {
console.error(`Error parsing line ${i + 1}:`, parseError);
showNotice(`Error parsing line ${i + 1}: ${parseError.message}`, 'error');
return false;
}
}
if (loadedItems.length === 0) {
showNotice('No valid dialog items found in the pasted content.', 'warning', { timeout: 4000 });
return false;
}
// Confirm replacement if existing items
if (dialogItems.length > 0) {
const confirmed = await confirmAction(
`This will replace your current dialog (${dialogItems.length} items) with the pasted script (${loadedItems.length} items). Continue?`
);
if (!confirmed) return false;
}
// Ensure speakers are loaded before rendering
if (availableSpeakersCache.length === 0) {
try {
availableSpeakersCache = await getSpeakers();
} catch (error) {
console.error('Error fetching speakers:', error);
showNotice('Could not load speakers. Dialog loaded but speaker names may not display correctly.', 'warning', { timeout: 5000 });
}
}
// Replace current dialog
dialogItems.splice(0, dialogItems.length, ...loadedItems);
// Persist loaded script
saveDialogToLocalStorage();
renderDialogItems();
console.log(`Loaded ${loadedItems.length} dialog items from pasted text`);
showNotice(`Successfully loaded ${loadedItems.length} dialog items.`, 'success', { timeout: 3000 });
return true;
} catch (error) {
console.error('Error loading dialog script from text:', error);
showNotice(`Error loading dialog script: ${error.message}`, 'error');
return false;
}
}
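
`loadDialogScriptFromText` treats the pasted text as JSONL: one JSON object per line, blank lines skipped, and the first bad line aborts the load with its line number. A minimal sketch of just the parsing step is shown below. `parseJsonl` is a hypothetical name, and the validation and normalization done by `validateDialogItem` and `normalizeDialogItem` in the diff are left out.

```javascript
// Parse JSONL text into objects. Blank lines are skipped, and a parse error
// is rethrown with the 1-based line number (counted after the outer trim()).
function parseJsonl(text) {
  const items = [];
  const lines = text.trim().split('\n');
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim(); // trim() also drops a trailing '\r' from CRLF input
    if (!line) continue;
    try {
      items.push(JSON.parse(line));
    } catch (err) {
      throw new Error(`Line ${i + 1}: ${err.message}`);
    }
  }
  return items;
}
```

Because each line is trimmed before parsing, text pasted from a Windows editor with CRLF line endings parses the same way as LF-only text.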
function validateDialogItem(item, lineNumber) {
if (!item || typeof item !== 'object') {
throw new Error(`Line ${lineNumber}: Invalid item format`);
@ -729,12 +1017,75 @@ async function initializeDialogEditor() {
const file = e.target.files[0];
if (file) {
loadDialogScript(file);
// Reset input so same file can be loaded again
e.target.value = '';
}
});
}
// --- Paste Script (JSONL) Modal Handlers ---
if (pasteScriptBtn && pasteModal && pasteText && pasteLoadBtn && pasteCancelBtn && pasteCloseBtn) {
let escHandler = null;
const closePasteModal = () => {
pasteModal.style.display = 'none';
pasteLoadBtn.onclick = null;
pasteCancelBtn.onclick = null;
pasteCloseBtn.onclick = null;
pasteModal.onclick = null;
if (escHandler) {
document.removeEventListener('keydown', escHandler);
escHandler = null;
}
};
const openPasteModal = () => {
pasteText.value = '';
pasteModal.style.display = 'flex';
escHandler = (e) => { if (e.key === 'Escape') closePasteModal(); };
document.addEventListener('keydown', escHandler);
pasteModal.onclick = (e) => { if (e.target === pasteModal) closePasteModal(); };
pasteCloseBtn.onclick = closePasteModal;
pasteCancelBtn.onclick = closePasteModal;
pasteLoadBtn.onclick = async () => {
const ok = await loadDialogScriptFromText(pasteText.value);
if (ok) closePasteModal();
};
};
pasteScriptBtn.addEventListener('click', openPasteModal);
}
// --- Clear Dialog Button ---
let clearDialogBtn = document.getElementById('clear-dialog-btn');
if (!clearDialogBtn) {
clearDialogBtn = document.createElement('button');
clearDialogBtn.id = 'clear-dialog-btn';
clearDialogBtn.textContent = 'Clear Dialog';
// Insert next to Save/Load if possible
const saveLoadContainer = saveScriptBtn ? saveScriptBtn.parentElement : null;
if (saveLoadContainer) {
saveLoadContainer.appendChild(clearDialogBtn);
} else {
// Fallback: append near the add buttons container
const addBtnsContainer = addSpeechLineBtn ? addSpeechLineBtn.parentElement : null;
if (addBtnsContainer) addBtnsContainer.appendChild(clearDialogBtn);
}
}
if (clearDialogBtn) {
clearDialogBtn.addEventListener('click', async () => {
if (dialogItems.length === 0) {
showNotice('Dialog is already empty.', 'info', { timeout: 2500 });
return;
}
const ok = await confirmAction(`This will remove ${dialogItems.length} dialog item(s). Continue?`);
if (!ok) return;
// Clear any transient input UI
if (typeof clearTempInputArea === 'function') clearTempInputArea();
// Clear state and persistence
dialogItems.splice(0, dialogItems.length);
try { localStorage.removeItem(LS_KEY); } catch (e) { /* ignore */ }
renderDialogItems();
showNotice('Dialog cleared.', 'success', { timeout: 2500 });
});
}
console.log('Dialog Editor Initialized');
renderDialogItems(); // Initial render (items restored from localStorage, or empty)
@ -781,6 +1132,8 @@ async function initializeDialogEditor() {
dialogItems[index].audioUrl = null;
closeModal();
// Persist settings change
saveDialogToLocalStorage();
renderDialogItems(); // Re-render to reflect changes
console.log('TTS settings updated for item:', dialogItems[index]);
};

@ -13,8 +13,15 @@ const getEnvVar = (name, defaultValue) => {
};
// API Configuration
export const API_BASE_URL = getEnvVar('VITE_API_BASE_URL', 'http://localhost:8000');
export const API_BASE_URL_WITH_PREFIX = getEnvVar('VITE_API_BASE_URL_WITH_PREFIX', 'http://localhost:8000/api');
// Default to the same hostname as the frontend, on port 8000 (override via VITE_API_BASE_URL*)
const _defaultHost = (typeof window !== 'undefined' && window.location?.hostname) || 'localhost';
const _defaultPort = getEnvVar('VITE_API_BASE_URL_PORT', '8000');
const _defaultBase = `http://${_defaultHost}:${_defaultPort}`;
export const API_BASE_URL = getEnvVar('VITE_API_BASE_URL', _defaultBase);
export const API_BASE_URL_WITH_PREFIX = getEnvVar(
'VITE_API_BASE_URL_WITH_PREFIX',
`${_defaultBase}/api`
);
// For file serving (same as API_BASE_URL since files are served from the same server)
export const API_BASE_URL_FOR_FILES = API_BASE_URL;
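
The new defaults derive the API base from the hostname the page was served from, so a browser on another machine reaches the backend on that same host rather than on its own `localhost`. The sketch below restates the derivation as a pure function. `resolveApiBase` is a hypothetical name, and it assumes `getEnvVar` behaves like a plain lookup with a fallback, which this hunk does not show.

```javascript
// Sketch of the default-URL derivation. `env` and `location` are passed in
// so the function can run outside a browser.
function resolveApiBase(env = {}, location = undefined) {
  const host = (location && location.hostname) || 'localhost';
  const port = env.VITE_API_BASE_URL_PORT || '8000';
  const defaultBase = `http://${host}:${port}`;
  return {
    base: env.VITE_API_BASE_URL || defaultBase,
    // Note: the prefixed default is built from defaultBase, not from `base`.
    withPrefix: env.VITE_API_BASE_URL_WITH_PREFIX || `${defaultBase}/api`,
  };
}
```

One consequence visible in the diff as well: overriding only `VITE_API_BASE_URL` leaves the prefixed URL on the derived default, so both variables likely need to be set together when pointing at a different backend.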

jest.config.cjs Normal file

@ -0,0 +1,9 @@
// jest.config.cjs
module.exports = {
testEnvironment: 'node',
transform: {
'^.+\\.js$': 'babel-jest',
},
moduleFileExtensions: ['js', 'json'],
roots: ['<rootDir>/frontend/tests', '<rootDir>'],
};

@ -5,11 +5,13 @@
"main": "index.js",
"type": "module",
"scripts": {
"test": "jest"
"test": "jest",
"test:frontend": "jest --config ./jest.config.cjs",
"frontend:dev": "python3 frontend/start_dev_server.py"
},
"repository": {
"type": "git",
"url": "https://oauth2:78f77aaebb8fa1cd3efbd5b738177c127f7d7d0b@gitea.r8z.us/stwhite/chatterbox-ui.git"
"url": "https://gitea.r8z.us/stwhite/chatterbox-ui.git"
},
"keywords": [],
"author": "",
@ -17,7 +19,7 @@
"devDependencies": {
"@babel/core": "^7.27.4",
"@babel/preset-env": "^7.27.2",
"babel-jest": "^30.0.0-beta.3",
"babel-jest": "^29.7.0",
"jest": "^29.7.0"
}
}

setup-windows.ps1 Normal file

@ -0,0 +1,123 @@
#Requires -Version 5.1
<#!
Chatterbox TTS - Windows setup script
What it does:
- Creates a Python virtual environment in .venv (if missing)
- Upgrades pip
- Installs dependencies from backend/requirements.txt and requirements.txt
- Creates a default .env with sensible ports if not present
- Launches start_servers.py using the venv's Python
Usage:
- Right-click this file and "Run with PowerShell" OR from PowerShell:
./setup-windows.ps1
- Optional flags:
-NoInstall -> Skip installing dependencies (just start servers)
-NoStart -> Prepare env but do not start servers
Notes:
- You may need to allow script execution once:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
- Press Ctrl+C in the console to stop both servers.
!#>
param(
[switch]$NoInstall,
[switch]$NoStart
)
$ErrorActionPreference = 'Stop'
function Write-Info($msg) { Write-Host "[INFO] $msg" -ForegroundColor Cyan }
function Write-Ok($msg) { Write-Host "[ OK ] $msg" -ForegroundColor Green }
function Write-Warn($msg) { Write-Host "[WARN] $msg" -ForegroundColor Yellow }
function Write-Err($msg) { Write-Host "[FAIL] $msg" -ForegroundColor Red }
$root = Split-Path -Parent $MyInvocation.MyCommand.Path
Set-Location $root
$venvDir = Join-Path $root ".venv"
$venvPython = Join-Path $venvDir "Scripts/python.exe"
# 1) Ensure Python available
function Get-BasePython {
try {
$pyExe = (Get-Command py -ErrorAction SilentlyContinue)
if ($pyExe) { return 'py -3' }
} catch { }
try {
$pyExe = (Get-Command python -ErrorAction SilentlyContinue)
if ($pyExe) { return 'python' }
} catch { }
throw "Python not found. Please install Python 3.x and add it to PATH."
}
# 2) Create venv if missing
if (-not (Test-Path $venvPython)) {
Write-Info "Creating virtual environment in .venv"
$basePy = Get-BasePython
if ($basePy -eq 'py -3') {
& py -3 -m venv .venv
} else {
& python -m venv .venv
}
Write-Ok "Virtual environment created"
} else {
Write-Info "Using existing virtual environment: $venvDir"
}
if (-not (Test-Path $venvPython)) {
throw ".venv python not found at $venvPython"
}
# 3) Install dependencies
if (-not $NoInstall) {
Write-Info "Upgrading pip"
& $venvPython -m pip install --upgrade pip
# Backend requirements
$backendReq = Join-Path $root 'backend/requirements.txt'
if (Test-Path $backendReq) {
Write-Info "Installing backend requirements"
& $venvPython -m pip install -r $backendReq
} else {
Write-Warn "backend/requirements.txt not found"
}
# Root requirements (optional frontend / project libs)
$rootReq = Join-Path $root 'requirements.txt'
if (Test-Path $rootReq) {
Write-Info "Installing root requirements"
& $venvPython -m pip install -r $rootReq
} else {
Write-Warn "requirements.txt not found at repo root"
}
Write-Ok "Dependency installation complete"
}
# 4) Ensure .env exists with sensible defaults
$envPath = Join-Path $root '.env'
if (-not (Test-Path $envPath)) {
Write-Info "Creating default .env"
@(
'BACKEND_PORT=8000',
'BACKEND_HOST=127.0.0.1',
'FRONTEND_PORT=8001',
'FRONTEND_HOST=127.0.0.1'
) -join "`n" | Out-File -FilePath $envPath -Encoding utf8 -Force
Write-Ok ".env created"
} else {
Write-Info ".env already exists; leaving as-is"
}
# 5) Start servers
if ($NoStart) {
Write-Info "-NoStart specified; setup complete. You can start later with:"
Write-Host " `"$venvPython`" `"$root\start_servers.py`"" -ForegroundColor Gray
exit 0
}
Write-Info "Starting servers via start_servers.py"
& $venvPython "$root/start_servers.py"

@ -28,3 +28,9 @@ dd3552d9-f4e8-49ed-9892-f9e67afcf23c:
2cdd6d3d-c533-44bf-a5f6-cc83bd089d32:
name: Grace
sample_path: speaker_samples/2cdd6d3d-c533-44bf-a5f6-cc83bd089d32.wav
3d3e85db-3d67-4488-94b2-ffc189fbb287:
name: RCB
sample_path: speaker_samples/3d3e85db-3d67-4488-94b2-ffc189fbb287.wav
f754cf35-892c-49b6-822a-f2e37246623b:
name: Jim
sample_path: speaker_samples/f754cf35-892c-49b6-822a-f2e37246623b.wav

@ -14,101 +14,109 @@ from pathlib import Path
# Try to load environment variables, but don't fail if dotenv is not available
try:
from dotenv import load_dotenv
load_dotenv()
except ImportError:
print("python-dotenv not installed, using system environment variables only")
# Configuration
BACKEND_PORT = int(os.getenv('BACKEND_PORT', '8000'))
BACKEND_HOST = os.getenv('BACKEND_HOST', '0.0.0.0')
FRONTEND_PORT = int(os.getenv('FRONTEND_PORT', '8001'))
FRONTEND_HOST = os.getenv('FRONTEND_HOST', '127.0.0.1')
BACKEND_PORT = int(os.getenv("BACKEND_PORT", "8000"))
BACKEND_HOST = os.getenv("BACKEND_HOST", "0.0.0.0")
# Frontend host/port (for dev server binding)
FRONTEND_PORT = int(os.getenv("FRONTEND_PORT", "8001"))
FRONTEND_HOST = os.getenv("FRONTEND_HOST", "0.0.0.0")
# Export frontend host/port so backend CORS config can pick them up automatically
os.environ["FRONTEND_HOST"] = FRONTEND_HOST
os.environ["FRONTEND_PORT"] = str(FRONTEND_PORT)
# Get project root directory
PROJECT_ROOT = Path(__file__).parent.absolute()
def run_backend():
"""Run the backend FastAPI server"""
os.chdir(PROJECT_ROOT / "backend")
cmd = [
sys.executable, "-m", "uvicorn",
"app.main:app",
"--reload",
f"--host={BACKEND_HOST}",
f"--port={BACKEND_PORT}"
sys.executable,
"-m",
"uvicorn",
"app.main:app",
"--reload",
f"--host={BACKEND_HOST}",
f"--port={BACKEND_PORT}",
]
print(f"\n{'='*50}")
print(f"Starting Backend Server at http://{BACKEND_HOST}:{BACKEND_PORT}")
print(f"API docs available at http://{BACKEND_HOST}:{BACKEND_PORT}/docs")
print(f"{'='*50}\n")
return subprocess.Popen(
cmd,
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True,
bufsize=1
bufsize=1,
)
def run_frontend():
"""Run the frontend development server"""
frontend_dir = PROJECT_ROOT / "frontend"
os.chdir(frontend_dir)
cmd = [sys.executable, "start_dev_server.py"]
env = os.environ.copy()
env["VITE_DEV_SERVER_HOST"] = FRONTEND_HOST
env["VITE_DEV_SERVER_PORT"] = str(FRONTEND_PORT)
print(f"\n{'='*50}")
print(f"Starting Frontend Server at http://{FRONTEND_HOST}:{FRONTEND_PORT}")
print(f"{'='*50}\n")
return subprocess.Popen(
cmd,
env=env,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True,
bufsize=1
bufsize=1,
)
def print_process_output(process, prefix):
"""Print process output with a prefix"""
for line in iter(process.stdout.readline, ''):
for line in iter(process.stdout.readline, ""):
if not line:
break
print(f"{prefix} | {line}", end='')
print(f"{prefix} | {line}", end="")
def main():
"""Main function to start both servers"""
print("\n🚀 Starting Chatterbox UI Development Environment")
# Start the backend server
backend_process = run_backend()
# Give the backend a moment to start
time.sleep(2)
# Start the frontend server
frontend_process = run_frontend()
# Create threads to monitor and print output
backend_monitor = threading.Thread(
target=print_process_output,
args=(backend_process, "BACKEND"),
daemon=True
target=print_process_output, args=(backend_process, "BACKEND"), daemon=True
)
frontend_monitor = threading.Thread(
target=print_process_output,
args=(frontend_process, "FRONTEND"),
daemon=True
target=print_process_output, args=(frontend_process, "FRONTEND"), daemon=True
)
backend_monitor.start()
frontend_monitor.start()
# Setup signal handling for graceful shutdown
def signal_handler(sig, frame):
print("\n\n🛑 Shutting down servers...")
@ -117,16 +125,16 @@ def main():
# Threads are daemon, so they'll exit when the main thread exits
print("✅ Servers stopped successfully")
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
# Print access information
print("\n📋 Access Information:")
print(f" • Frontend: http://{FRONTEND_HOST}:{FRONTEND_PORT}")
print(f" • Backend API: http://{BACKEND_HOST}:{BACKEND_PORT}/api")
print(f" • API Documentation: http://{BACKEND_HOST}:{BACKEND_PORT}/docs")
print("\n⚠️ Press Ctrl+C to stop both servers\n")
# Keep the main process running
try:
while True:
@ -134,5 +142,6 @@ def main():
except KeyboardInterrupt:
signal_handler(None, None)
if __name__ == "__main__":
main()