Compare commits

13 commits:

- 733c9d1b5f
- 9c605cd3a0
- d3ac8bf4eb
- 75a2a37252
- b28a9bcf58
- 4f47d69aaa
- f095bb14e5
- 93e0407eac
- c9593fe6cc
- cbc164c7a3
- 41f95cdee3
- b62eb0211f
- 948712bb3f
@@ -22,3 +22,4 @@ backend/tts_generated_dialogs/

# Node.js dependencies
node_modules/
.aider*
@@ -0,0 +1,188 @@

# Chatterbox TTS Backend: Bounded Concurrency + File I/O Offload Plan

Date: 2025-08-14
Owner: Backend
Status: Proposed (ready to implement)

## Goals

- Increase GPU utilization and reduce wall-clock time for dialog generation.
- Keep the model lifecycle stable (leveraging the current `ModelManager`).
- Minimal-risk changes: no API shape changes for clients.

## Scope

- Implement bounded concurrency for per-line speech chunk generation within a single dialog request.
- Offload audio file writes to threads to overlap GPU compute and disk I/O.
- Add configuration knobs to tune concurrency.

## Current State (References)

- `backend/app/services/dialog_processor_service.py`
  - `DialogProcessorService.process_dialog()` iterates items and awaits `tts_service.generate_speech(...)` sequentially (lines ~171–201).
- `backend/app/services/tts_service.py`
  - `TTSService.generate_speech()` runs the TTS forward pass and calls `torchaudio.save(...)` on the event loop thread (blocking).
- `backend/app/services/model_manager.py`
  - `ModelManager.using()` tracks active work and prevents idle eviction during requests.
- `backend/app/routers/dialog.py`
  - `process_dialog_flow()` expects ordered `segment_files` and then concatenates; keep this order stable.

## Design Overview

1) Bounded concurrency at the dialog level

- Plan all output segments with a stable `segment_idx` (including speech chunks, silence, and reused audio).
- For speech chunks, schedule concurrent async tasks with a global semaphore sized by the config setting `TTS_MAX_CONCURRENCY` (start at 3–4).
- Await all tasks and collate results by `segment_idx` to preserve order.

2) File I/O offload

- Replace the direct `torchaudio.save(...)` call with `await asyncio.to_thread(torchaudio.save, ...)` in `TTSService.generate_speech()`.
- This lets the next GPU forward pass start while previous file writes happen on worker threads.

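The two mechanisms combine into a small pattern. A minimal sketch (illustrative only: `generate_one` stands in for the real TTS forward pass plus the threaded `torchaudio.save` call, and `TTS_MAX_CONCURRENCY` is hard-coded here rather than read from config):

```python
import asyncio

TTS_MAX_CONCURRENCY = 3  # assumed default; the real value comes from config


async def generate_one(idx: int, text: str) -> tuple[int, str]:
    # Stand-in for the GPU forward pass...
    await asyncio.sleep(0.01)
    # ...and for the file write offloaded to a worker thread.
    path = await asyncio.to_thread(lambda: f"segment_{idx}.wav")
    return idx, path


async def run_dialog(lines: list[str]) -> list[str]:
    sem = asyncio.Semaphore(TTS_MAX_CONCURRENCY)

    async def bounded(idx: int, text: str) -> tuple[int, str]:
        async with sem:  # at most TTS_MAX_CONCURRENCY generations in flight
            return await generate_one(idx, text)

    tasks = [asyncio.create_task(bounded(i, t)) for i, t in enumerate(lines)]
    results = dict(await asyncio.gather(*tasks))
    # Collate by segment_idx so output order matches input order.
    return [results[i] for i in range(len(lines))]
```

The semaphore bounds memory pressure while the collation step keeps ordering deterministic regardless of task completion order.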
## Configuration

Add to `backend/app/config.py`:

- `TTS_MAX_CONCURRENCY: int` (default: `int(os.getenv("TTS_MAX_CONCURRENCY", "3"))`).
- Optional (future): `TTS_ENABLE_AMP_ON_CUDA: bool = True` to allow mixed precision on CUDA only.

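In code, the addition might look like this (a sketch; the `TTS_MAX_CONCURRENCY` line matches the `backend/app/config.py` change included in this compare, while the AMP flag is the optional future knob):

```python
import os

# Max number of concurrent TTS generation tasks per dialog request
TTS_MAX_CONCURRENCY = int(os.getenv("TTS_MAX_CONCURRENCY", "3"))

# Optional (future): allow mixed precision on CUDA only
TTS_ENABLE_AMP_ON_CUDA = os.getenv("TTS_ENABLE_AMP_ON_CUDA", "true").lower() == "true"
```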
## Implementation Steps

### A. Dialog-level concurrency

- File: `backend/app/services/dialog_processor_service.py`
- Function: `DialogProcessorService.process_dialog()`

1. Planning pass to assign indices

   - Iterate `dialog_items` and build a list of `planned_segments` entries:
     - For silence or reuse: immediately append a final result with the assigned `segment_idx` and continue.
     - For speech: split into `text_chunks`; for each chunk create a planned entry: `{ segment_idx, type: 'speech', speaker_id, text_chunk, abs_speaker_sample_path, tts_params }`.
   - Increment `segment_idx` for every planned segment (speech chunk or silence/reuse) to preserve final order.

2. Concurrency setup

   - Create `sem = asyncio.Semaphore(config.TTS_MAX_CONCURRENCY)`.
   - For each planned speech segment, create a task with an inner wrapper:

```python
async def run_one(planned):
    async with sem:
        try:
            out_path = await self.tts_service.generate_speech(
                text=planned.text_chunk,
                speaker_sample_path=planned.abs_speaker_sample_path,
                output_filename_base=planned.filename_base,
                output_dir=dialog_temp_dir,
                exaggeration=planned.exaggeration,
                cfg_weight=planned.cfg_weight,
                temperature=planned.temperature,
            )
            return planned.segment_idx, {
                "type": "speech",
                "path": str(out_path),
                "speaker_id": planned.speaker_id,
                "text_chunk": planned.text_chunk,
            }
        except Exception as e:
            return planned.segment_idx, {
                "type": "error",
                "message": f"Error generating speech: {e}",
                "text_chunk": planned.text_chunk,
            }
```

- Schedule with `asyncio.create_task(run_one(p))` and collect the tasks.

3. Await and collate

   - `results_map = {}`; for each completed task, set `results_map[idx] = payload`.
   - Merge: start with all previously final (silence/reuse/error) entries placed by `segment_idx`, then fill in speech results by `segment_idx`, producing a single `segment_results` list sorted ascending by index.
   - Keep `processing_log` entries for each planned segment (queued, started, finished, errors).

4. Return value unchanged

   - Return `{"log": ..., "segment_files": segment_results, "temp_dir": str(dialog_temp_dir)}`. This maintains router and concatenator behavior.

### B. Offload audio writes

- File: `backend/app/services/tts_service.py`
- Function: `TTSService.generate_speech()`

1. After obtaining the `wav` tensor, replace:

```python
torchaudio.save(str(output_file_path), wav, self.model.sr)
```

with:

```python
await asyncio.to_thread(torchaudio.save, str(output_file_path), wav, self.model.sr)
```

- Keep the rest of the cleanup logic (delete `wav`, `gc.collect()`, cache emptying) unchanged.

2. Optional (CUDA-only AMP)

- If CUDA is used and `config.TTS_ENABLE_AMP_ON_CUDA` is true, wrap the forward pass with AMP:

```python
with torch.cuda.amp.autocast(dtype=torch.float16):
    wav = self.model.generate(...)
```

- Leave the MPS/CPU code path as-is.

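The benefit of offloading the write shows up in a toy timing sketch (illustrative only; `fake_forward` and `fake_save` stand in for the model forward pass and `torchaudio.save`):

```python
import asyncio
import time


def fake_save() -> None:
    time.sleep(0.05)  # blocking disk write


async def fake_forward() -> None:
    await asyncio.sleep(0.05)  # "GPU" work


async def pipeline(n: int) -> float:
    start = time.perf_counter()
    writes = []
    for _ in range(n):
        await fake_forward()
        # Offload the blocking write; the next forward can start immediately.
        writes.append(asyncio.create_task(asyncio.to_thread(fake_save)))
    await asyncio.gather(*writes)
    return time.perf_counter() - start


elapsed = asyncio.run(pipeline(4))
# Overlapped, total time approaches n * forward_time plus one trailing write,
# rather than n * (forward_time + write_time).
```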
## Error Handling & Ordering

- Every planned segment owns a unique `segment_idx`.
- On failure, insert an error record at that index; downstream concatenation already skips missing/nonexistent paths.
- Preserve the exact output order expected by `routers/dialog.py::process_dialog_flow()`.

## Performance Expectations

- GPU utilization should increase from ~50% to 75–90%, depending on dialog size and line lengths.
- Wall-clock reduction is workload-dependent; target 1.5–2.5x on multi-line dialogs.

## Metrics & Instrumentation

- Add timestamped log entries per segment: planned → queued → started → saved.
- Log effective concurrency (max in-flight) and cumulative GPU time if available.
- Optionally add a simple timing summary at the end of `process_dialog()`.

## Testing Plan

1. Unit-ish

   - Small dialog (3 speech lines, 1 silence). Ensure ordering is stable and files exist.
   - Introduce an invalid speaker to verify error propagation doesn't break the rest.

2. Integration

   - POST `/api/dialog/generate` with 20–50 mixed-length lines and a couple of silences.
   - Validate: response OK, concatenated file exists, zip contains all generated speech segments, order preserved.
   - Compare runtime vs. the sequential baseline (before/after).

3. Stress/limits

   - Long lines split into many chunks; verify no OOM with `TTS_MAX_CONCURRENCY=3`.
   - Try `TTS_MAX_CONCURRENCY=1` to simulate sequential processing; compare metrics.

## Rollout & Config Defaults

- Default `TTS_MAX_CONCURRENCY=3`.
- Expose via environment variable; no client changes needed.
- If instability is observed, set `TTS_MAX_CONCURRENCY=1` to revert quickly to sequential behavior.

## Risks & Mitigations

- OOM under high concurrency → mitigate with a low default, easy rollback, and the chunking already in place.
- Disk I/O saturation → offload to threads; if disk is the bottleneck, decrease concurrency.
- Model thread safety → `model.generate` is called concurrently only up to the semaphore cap; if the underlying library is not thread-safe for forward passes, consider serializing forwards while still overlapping with file I/O. Early logs will reveal this.

## Follow-up (Out of Scope for This Change)

- Dynamic batching queue inside `TTSService` for further GPU efficiency.
- CUDA AMP enablement and profiling.
- Per-speaker sub-queues if batching requires same-speaker inputs.

## Acceptance Criteria

- `TTS_MAX_CONCURRENCY` is configurable; default = 3.
- File writes occur via `asyncio.to_thread`.
- Order of `segment_files` is unchanged relative to sequential output.
- End-to-end generation works for both small and large dialogs; error cases are logged.
- Observed GPU utilization and runtime improve on a representative dialog.

@@ -0,0 +1,138 @@

# Frontend Review and Recommendations

Date: 2025-08-12T11:32:16-05:00
Scope: `frontend/` of the `chatterbox-test` monorepo

---

## Summary

- Static vanilla JS frontend served by `frontend/start_dev_server.py`, interacting with the FastAPI backend under `/api`.
- Solid feature set (speaker management, dialog editor, per-line generation, full dialog generation, save/load) with robust error handling.
- Key issues: inconsistent API trailing slashes, a Jest/babel-jest version/config mismatch, minor state duplication, alert/confirm UX, an overly dark border color, and a token embedded in the `package.json` repository URL.

---

## Findings

- **Framework/structure**
  - `frontend/` is static vanilla JS. Main files:
    - `index.html`, `js/app.js`, `js/api.js`, `js/config.js`, `css/style.css`.
  - Dev server: `frontend/start_dev_server.py` (CORS, env-based port/host).

- **API client vs. backend routes (trailing slashes)**
  - The frontend (`frontend/js/api.js`) currently uses:
    - `getSpeakers()`: `${API_BASE_URL}/speakers/` (trailing).
    - `addSpeaker()`: `${API_BASE_URL}/speakers/` (trailing).
    - `deleteSpeaker()`: `${API_BASE_URL}/speakers/${speakerId}/` (trailing).
    - `generateLine()`: `${API_BASE_URL}/dialog/generate_line`.
    - `generateDialog()`: `${API_BASE_URL}/dialog/generate`.
  - Backend routes:
    - `backend/app/routers/speakers.py`: `GET/POST /` and `DELETE /{speaker_id}` (no trailing slash on delete when prefixed under `/api/speakers`).
    - `backend/app/routers/dialog.py`: `/generate_line` and `/generate` (match the frontend).
  - Tests in `frontend/tests/api.test.js` expect no trailing slashes for `/speakers` and `/speakers/{id}`.
  - Implication: inconsistent trailing slashes can cause test failures and possible 404s on delete.

- **Payload schema inconsistencies**
  - The `generateDialog()` JSDoc shows `silence` as `{ duration_ms: 500 }`, but the backend expects `duration` (seconds). The UI also uses `duration` in seconds.

- **Form field alignment**
  - Speaker add uses `name` and `audio_file`, which match the backend (`Form` and `File`).

- **State management duplication in `frontend/js/app.js`**
  - `dialogItems` and `availableSpeakersCache` are defined at module scope and again inside `initializeDialogEditor()`, creating a shadowing risk. Consolidate to a single source of truth.

- **UX considerations**
  - Heavy use of `alert()`/`confirm()`. Prefer inline notifications/banners and per-row error chips (`item.error` is already rendered).
  - Add global loading/disabled states for long actions (e.g., full dialog generation, speaker add/delete).

- **CSS theme issue**
  - `--border-light` is `#1b0404` (a dark red); semantically, a light gray fits better and improves contrast harmony.

- **Testing/Jest/Babel config**
  - The root `package.json` uses `jest@^29.7.0` with `babel-jest@^30.0.0-beta.3` (major-version mismatch). Align the versions.
  - There is no `jest.config.cjs` configuring `transform` via `babel-jest` for ESM modules.

- **Security**
  - The `package.json` `repository.url` embeds a token. Remove secrets from version control immediately.

- **Dev scripts**
  - Only `"test": "jest"` is present. Add scripts to run the frontend dev server and the test config explicitly.

- **Response handling consistency**
  - `generateLine()` parses via `response.text()` then `JSON.parse()`; the others use `response.json()`. Standardize for consistency.

---

## Recommended Actions (Phase 1: Quick Wins)

- **Normalize API paths in `frontend/js/api.js`**
  - Use no trailing slashes:
    - `GET/POST`: `${API_BASE_URL}/speakers`
    - `DELETE`: `${API_BASE_URL}/speakers/${speakerId}`
  - Keep the dialog endpoints unchanged.

- **Fix the JSDoc for `generateDialog()`**
  - Use `silence: { duration: number }` (seconds), not `duration_ms`.

- **Refactor `frontend/js/app.js` state**
  - Remove the duplicate `dialogItems`/`availableSpeakersCache` declarations. Choose module scope or function scope, and pass references.

- **Improve UX**
  - Replace `alert`/`confirm` with inline banners near `#results-display` and per-row error chips (extend the existing `.line-error-msg`).
  - Add disabled/loading states for global generation and speaker actions.

- **CSS tweak**
  - Set `--border-light: #e5e7eb;` (or similar) to reflect a light border.

- **Harden the tests/Jest config**
  - Align versions: either Jest 29 + `babel-jest` 29, or upgrade both to 30 stable together.
  - Add a `jest.config.cjs` with a `transform` entry using `babel-jest` and a suitable `testEnvironment`.
  - Ensure the tests expect normalized API paths (recommended: change the code to match the tests).

- **Dev scripts**
  - Add to the root `package.json`:
    - `"frontend:dev": "python3 frontend/start_dev_server.py"`
    - `"test:frontend": "jest --config ./jest.config.cjs"`

- **Sanitize the repository URL**
  - Remove the embedded token from `package.json`.

- **Standardize response parsing**
  - Switch `generateLine()` to `response.json()` unless the backend returns `text/plain`.

---

## Backend Endpoint Confirmation

- `speakers` router (`backend/app/routers/speakers.py`):
  - List/Create: `GET /`, `POST /` (mounted under `/api/speakers` → `/api/speakers/`).
  - Delete: `DELETE /{speaker_id}` (→ `/api/speakers/{speaker_id}`), no trailing slash.
- `dialog` router (`backend/app/routers/dialog.py`):
  - `POST /generate_line`, `POST /generate` (mounted under `/api/dialog`).

---

## Proposed Implementation Plan

- **Phase 1 (1–2 hours)**
  - Normalize API paths in `api.js`.
  - Fix the JSDoc for `generateDialog`.
  - Consolidate dialog state in `app.js`.
  - Adjust `--border-light` to a light gray.
  - Add `jest.config.cjs`; align Jest/babel-jest versions.
  - Add dev/test scripts.
  - Remove the token from `package.json`.

- **Phase 2 (2–4 hours)**
  - Inline notifications and comprehensive loading/disabled states.

- **Phase 3 (optional)**
  - ESLint + Prettier.
  - Consider a Vite migration (HMR, proxy to backend, improved DX).

---

## Notes

- Current local time captured for this review: 2025-08-12T11:32:16-05:00.
- Frontend config (`frontend/js/config.js`) supports env overrides for the API base and dev server port.
- Tests (`frontend/tests/api.test.js`) currently assume endpoints without trailing slashes.

@@ -0,0 +1,204 @@

# Unload Model on Idle: Implementation Plan

## Goals

- Automatically unload large TTS model(s) when idle to reduce RAM/VRAM usage.
- Lazy-load on demand without breaking API semantics.
- Configurable timeout and safety controls.

## Requirements

- Config-driven idle timeout and poll interval.
- Thread-/async-safe across concurrent requests.
- No unload while an inference is in progress.
- Clear logs and metrics for load/unload events.

## Configuration

File: `backend/app/config.py`

- Add:
  - `MODEL_IDLE_TIMEOUT_SECONDS: int = 900` (0 disables eviction)
  - `MODEL_IDLE_CHECK_INTERVAL_SECONDS: int = 60`
  - `MODEL_EVICTION_ENABLED: bool = True`
- Bind to env vars: `MODEL_IDLE_TIMEOUT_SECONDS`, `MODEL_IDLE_CHECK_INTERVAL_SECONDS`, `MODEL_EVICTION_ENABLED`.

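Bound to environment variables, the additions might look like this (a sketch; these lines match the `backend/app/config.py` changes included in this compare):

```python
import os

# Enable/disable idle-based model eviction
MODEL_EVICTION_ENABLED = os.getenv("MODEL_EVICTION_ENABLED", "true").lower() == "true"
# Unload the model after this many seconds of inactivity (0 disables eviction)
MODEL_IDLE_TIMEOUT_SECONDS = int(os.getenv("MODEL_IDLE_TIMEOUT_SECONDS", "900"))
# How often the reaper checks for idleness
MODEL_IDLE_CHECK_INTERVAL_SECONDS = int(os.getenv("MODEL_IDLE_CHECK_INTERVAL_SECONDS", "60"))
```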
## Design

### ModelManager (Singleton)

File: `backend/app/services/model_manager.py` (new)

- Responsibilities:
  - Manage the lifecycle (load/unload) of the TTS model/pipeline.
  - Provide `get()`, which returns a ready model (lazy-loading if needed) and updates `last_used`.
  - Track the active request count to block eviction while it is > 0.
- Internals:
  - `self._model` (or components), `self._last_used: float`, `self._active: int`.
  - Locks: an `asyncio.Lock` for load/unload; an `asyncio.Lock` or `asyncio.Semaphore` for counters.
  - Optional CUDA cleanup: `torch.cuda.empty_cache()` after unload.
- API:
  - `async def get(self) -> Model`: ensures the model is loaded; bumps `last_used`.
  - `async def load(self)`: idempotent; guarded by the lock.
  - `async def unload(self)`: only when `self._active == 0`; clears references and caches.
  - `def touch(self)`: updates `last_used`.
  - Context helper: `def using(self)`: returns an async context manager that increments/decrements `active` safely.

### Idle Reaper Task

Registration: FastAPI startup (e.g., in `backend/app/main.py`)

- A background task loops every `MODEL_IDLE_CHECK_INTERVAL_SECONDS`:
  - If eviction is enabled, the timeout is > 0, the model is loaded, `active == 0`, and `now - last_used >= timeout`, call `unload()`.
  - Handle cancellation on shutdown.

### API Integration

- Replace direct model access in endpoints with:

```python
manager = ModelManager.instance()
async with manager.using():
    model = await manager.get()
    # perform inference
```

- Optionally call `manager.touch()` at request start for non-inference paths that still need the model resident.

## Pseudocode
```python
# services/model_manager.py
import asyncio
import time
from typing import Optional

from .config import settings


class ModelManager:
    _instance: Optional["ModelManager"] = None

    def __init__(self):
        self._model = None
        self._last_used = time.time()
        self._active = 0
        self._lock = asyncio.Lock()
        self._counter_lock = asyncio.Lock()

    @classmethod
    def instance(cls):
        if not cls._instance:
            cls._instance = cls()
        return cls._instance

    async def load(self):
        async with self._lock:
            if self._model is not None:
                return
            # ... load model/pipeline here ...
            self._model = await load_pipeline()
            self._last_used = time.time()

    async def unload(self):
        async with self._lock:
            if self._model is None:
                return
            if self._active > 0:
                return  # safety: do not unload while in use
            # ... free resources ...
            self._model = None
            try:
                import torch
                torch.cuda.empty_cache()
            except Exception:
                pass

    async def get(self):
        if self._model is None:
            await self.load()
        self._last_used = time.time()
        return self._model

    async def _inc(self):
        async with self._counter_lock:
            self._active += 1

    async def _dec(self):
        async with self._counter_lock:
            self._active = max(0, self._active - 1)
            self._last_used = time.time()

    def last_used(self):
        return self._last_used

    def is_loaded(self):
        return self._model is not None

    def active(self):
        return self._active

    def using(self):
        manager = self

        class _Ctx:
            async def __aenter__(self):
                await manager._inc()
                return manager

            async def __aexit__(self, exc_type, exc, tb):
                await manager._dec()

        return _Ctx()


# main.py (startup)
import contextlib
import logging

logger = logging.getLogger(__name__)


@app.on_event("startup")
async def start_reaper():
    async def reaper():
        while True:
            try:
                await asyncio.sleep(settings.MODEL_IDLE_CHECK_INTERVAL_SECONDS)
                if not settings.MODEL_EVICTION_ENABLED:
                    continue
                timeout = settings.MODEL_IDLE_TIMEOUT_SECONDS
                if timeout <= 0:
                    continue
                m = ModelManager.instance()
                if m.is_loaded() and m.active() == 0 and (time.time() - m.last_used()) >= timeout:
                    await m.unload()
            except asyncio.CancelledError:
                break
            except Exception as e:
                logger.exception("Idle reaper error: %s", e)

    app.state._model_reaper_task = asyncio.create_task(reaper())


@app.on_event("shutdown")
async def stop_reaper():
    task = getattr(app.state, "_model_reaper_task", None)
    if task:
        task.cancel()
        with contextlib.suppress(Exception):
            await task
```

## Observability

- Logs: model load/unload, reaper decisions, active count.
- Metrics (optional): counters and gauges (load events, active count, residency time).

## Safety & Edge Cases

- Avoid unloading when `active > 0`.
- Guard multiple loads/unloads with the lock.
- Multi-worker servers: each worker manages its own model.
- Cold-start latency: document the expected additional latency for the first request after an idle unload.

## Testing

- Unit tests for `ModelManager`: load/unload idempotency, counter behavior.
- Simulated reaper triggering with short timeouts.
- Endpoint tests: concurrency (N simultaneous inferences); ensure no unload mid-flight.
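The counter behavior can be unit-tested without loading a model. A minimal sketch (assumes the `using()`/counter shape proposed above; `FakeManager` is a hypothetical stand-in, not the real class):

```python
import asyncio


class FakeManager:
    """Stand-in mirroring the proposed ModelManager counter semantics."""

    def __init__(self):
        self._active = 0
        self._lock = asyncio.Lock()

    async def _inc(self):
        async with self._lock:
            self._active += 1

    async def _dec(self):
        async with self._lock:
            self._active = max(0, self._active - 1)

    def using(self):
        mgr = self

        class _Ctx:
            async def __aenter__(self):
                await mgr._inc()
                return mgr

            async def __aexit__(self, exc_type, exc, tb):
                await mgr._dec()

        return _Ctx()


async def check_counts():
    m = FakeManager()

    async def work():
        async with m.using():
            assert m._active >= 1  # in use: eviction must stay blocked
            await asyncio.sleep(0.01)

    await asyncio.gather(*(work() for _ in range(5)))
    assert m._active == 0  # all requests done: eviction allowed again


asyncio.run(check_counts())
```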

## Rollout Plan

1. Introduce the config + manager (no reaper); switch endpoints to `using()`.
2. Enable the reaper with a long timeout in staging; observe logs/metrics.
3. Tune the timeout; enable in production.

## Tasks Checklist

- [ ] Add config flags and defaults in `backend/app/config.py`.
- [ ] Create `backend/app/services/model_manager.py`.
- [ ] Register the startup/shutdown reaper in app init (`backend/app/main.py`).
- [ ] Refactor endpoints to use `ModelManager.instance().using()` and `get()`.
- [ ] Add logs and optional metrics.
- [ ] Add unit/integration tests.
- [ ] Update README/ops docs.

## Alternatives Considered

- Gunicorn/uvicorn worker preloading with an external idle supervisor: more complexity, less portability.
- OS-level cgroup memory-pressure eviction: opaque and risky for correctness.

## Configuration Examples

```
MODEL_EVICTION_ENABLED=true
MODEL_IDLE_TIMEOUT_SECONDS=900
MODEL_IDLE_CHECK_INTERVAL_SECONDS=60
```

@@ -359,7 +359,7 @@ The API uses the following directory structure (configurable in `app/config.py`)

- **Temporary Files**: `{PROJECT_ROOT}/tts_temp_outputs/`

### CORS Settings

- Allowed Origins: `http://localhost:8001`, `http://127.0.0.1:8001` (plus any `FRONTEND_HOST:FRONTEND_PORT` when using `start_servers.py`)
- Allowed Methods: All
- Allowed Headers: All
- Credentials: Enabled

@@ -58,7 +58,7 @@ The application uses environment variables for configuration. Three `.env` files

- `VITE_DEV_SERVER_HOST`: Frontend development server host

#### CORS Configuration

- `CORS_ORIGINS`: Comma-separated list of allowed origins. When using `start_servers.py` with the default `FRONTEND_HOST=0.0.0.0` and no explicit `CORS_ORIGINS`, CORS will allow all origins (wildcard) to simplify development.

#### Device Configuration

- `DEVICE`: Device for the TTS model (auto, cpu, cuda, mps)

@@ -101,7 +101,7 @@ CORS_ORIGINS=http://localhost:3000

### Common Issues

1. **Permission Errors**: Ensure the `PROJECT_ROOT` directory is writable
2. **CORS Errors**: Check that your frontend URL is in `CORS_ORIGINS`. (When using `start_servers.py`, your specified `FRONTEND_HOST:FRONTEND_PORT` will be auto-included.)
3. **Model Loading Errors**: Verify the `DEVICE` setting matches your hardware
4. **Path Errors**: Ensure all path variables point to existing, accessible directories

README.md (56 changes)

@@ -9,6 +9,7 @@ A comprehensive text-to-speech application with multiple interfaces for generati

- **Dialog Generation**: Create multi-speaker conversations with configurable silence gaps
- **Audiobook Generation**: Convert long-form text into narrated audiobooks
- **Speaker Management**: Add/remove speakers with custom audio samples
- **Paste Script (JSONL) Import**: Paste a dialog script as JSONL directly into the editor via a modal
- **Memory Optimization**: Automatic model cleanup after generation
- **Output Organization**: Files saved in organized directories with ZIP packaging

@@ -23,7 +24,6 @@ A comprehensive text-to-speech application with multiple interfaces for generati

```bash
pip install -r requirements.txt
npm install
```

2. Run automated setup:

```bash
python setup.py
```

@@ -33,6 +33,24 @@ A comprehensive text-to-speech application with multiple interfaces for generati

- Add audio samples (WAV format) to `speaker_data/speaker_samples/`
- Configure speakers in `speaker_data/speakers.yaml`

### Windows Quick Start

On Windows, a PowerShell setup script is provided to automate environment setup and startup.

```powershell
# From the repository root in PowerShell
./setup-windows.ps1

# First time only, if scripts are blocked:
# Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

What it does:
- Creates/uses `.venv`
- Upgrades pip and installs dependencies from `backend/requirements.txt` and the root `requirements.txt`
- Creates a default `.env` with sensible ports if missing
- Starts both servers via `start_servers.py`

### Running the Application

**Full-Stack Web Application:**

@@ -41,6 +59,12 @@ A comprehensive text-to-speech application with multiple interfaces for generati

```bash
python start_servers.py
```

On Windows, you can also use the one-liner PowerShell script:

```powershell
./setup-windows.ps1
```

**Individual Components:**

```bash
# Backend only (FastAPI)
```

@@ -56,7 +80,26 @@ python gradio_app.py

## Usage

### Web Interface

Access the modern web UI at `http://localhost:8001` for interactive dialog creation.

#### Paste Script (JSONL) in Dialog Editor

Quickly load a dialog by pasting JSONL (one JSON object per line):

1. Click `Paste Script` in the Dialog Editor.
2. Paste JSONL content, for example:

```jsonl
{"type":"speech","speaker_id":"dummy_speaker","text":"Hello there!"}
{"type":"silence","duration":0.5}
{"type":"speech","speaker_id":"dummy_speaker","text":"This is the second line."}
```

3. Click `Load` and confirm replacement if prompted.

Notes:
- Input is validated per line; errors report line numbers.
- The dialog is saved to localStorage, so it persists across refreshes.
- Unknown `speaker_id`s will still load; add speakers later if needed.

### CLI Tools

@@ -149,5 +192,12 @@ The application automatically:

- **"Skipping unknown speaker"**: Configure the speaker in `speaker_data/speakers.yaml`
- **"Sample file not found"**: Verify audio files exist in `speaker_data/speaker_samples/`
- **Memory issues**: Use model reinitialization options for long content
- **CORS errors**: Check frontend/backend port configuration (the frontend origin is auto-included when using `start_servers.py`)
- **Import errors**: Run `python import_helper.py` to check dependencies

### Windows-specific

- If PowerShell blocks script execution, run once:
  ```powershell
  Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
  ```
- If Windows Firewall prompts the first time you run the servers, allow access on your private network.

@@ -6,20 +6,34 @@ from dotenv import load_dotenv

```python
load_dotenv()

# Project root - can be overridden by environment variable
PROJECT_ROOT = Path(
    os.getenv("PROJECT_ROOT", Path(__file__).parent.parent.parent)
).resolve()

# Directory paths
SPEAKER_DATA_BASE_DIR = Path(
    os.getenv("SPEAKER_DATA_BASE_DIR", str(PROJECT_ROOT / "speaker_data"))
)
SPEAKER_SAMPLES_DIR = Path(
    os.getenv("SPEAKER_SAMPLES_DIR", str(SPEAKER_DATA_BASE_DIR / "speaker_samples"))
)
SPEAKERS_YAML_FILE = Path(
    os.getenv("SPEAKERS_YAML_FILE", str(SPEAKER_DATA_BASE_DIR / "speakers.yaml"))
)

# TTS temporary output path (used by DialogProcessorService)
TTS_TEMP_OUTPUT_DIR = Path(
    os.getenv("TTS_TEMP_OUTPUT_DIR", str(PROJECT_ROOT / "tts_temp_outputs"))
)

# Final dialog output path (used by the Dialog router and served by the main app)
# These are stored within the 'backend' directory to be easily servable.
DIALOG_OUTPUT_PARENT_DIR = PROJECT_ROOT / "backend"
DIALOG_GENERATED_DIR = Path(
    os.getenv(
        "DIALOG_GENERATED_DIR", str(DIALOG_OUTPUT_PARENT_DIR / "tts_generated_dialogs")
    )
)

# Alias for clarity and backward compatibility
DIALOG_OUTPUT_DIR = DIALOG_GENERATED_DIR
```

@@ -29,11 +43,41 @@ HOST = os.getenv("HOST", "0.0.0.0")
 PORT = int(os.getenv("PORT", "8000"))
 RELOAD = os.getenv("RELOAD", "true").lower() == "true"

-# CORS configuration
-CORS_ORIGINS = [origin.strip() for origin in os.getenv("CORS_ORIGINS", "http://localhost:8001,http://127.0.0.1:8001").split(",")]
+# CORS configuration: determine allowed origins based on env & frontend binding
+_cors_env = os.getenv("CORS_ORIGINS", "")
+_frontend_host = os.getenv("FRONTEND_HOST")
+_frontend_port = os.getenv("FRONTEND_PORT")
+
+# If the dev server is bound to 0.0.0.0 (all interfaces), allow all origins
+if _frontend_host == "0.0.0.0":  # dev convenience when binding wildcard
+    CORS_ORIGINS = ["*"]
+elif _cors_env:
+    # parse comma-separated origins, strip whitespace
+    CORS_ORIGINS = [origin.strip() for origin in _cors_env.split(",") if origin.strip()]
+else:
+    # default to allow all origins in development
+    CORS_ORIGINS = ["*"]
+
+# Auto-include specific frontend origin when not using wildcard CORS
+if CORS_ORIGINS != ["*"] and _frontend_host and _frontend_port:
+    _frontend_origin = f"http://{_frontend_host.strip()}:{_frontend_port.strip()}"
+    if _frontend_origin not in CORS_ORIGINS:
+        CORS_ORIGINS.append(_frontend_origin)

 # Device configuration
 DEVICE = os.getenv("DEVICE", "auto")

+# Concurrency configuration
+# Max number of concurrent TTS generation tasks per dialog request
+TTS_MAX_CONCURRENCY = int(os.getenv("TTS_MAX_CONCURRENCY", "3"))
+
+# Model idle eviction configuration
+# Enable/disable idle-based model eviction
+MODEL_EVICTION_ENABLED = os.getenv("MODEL_EVICTION_ENABLED", "true").lower() == "true"
+# Unload model after this many seconds of inactivity (0 disables eviction)
+MODEL_IDLE_TIMEOUT_SECONDS = int(os.getenv("MODEL_IDLE_TIMEOUT_SECONDS", "900"))
+# How often the reaper checks for idleness
+MODEL_IDLE_CHECK_INTERVAL_SECONDS = int(os.getenv("MODEL_IDLE_CHECK_INTERVAL_SECONDS", "60"))

 # Ensure directories exist
 SPEAKER_SAMPLES_DIR.mkdir(parents=True, exist_ok=True)
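The origin-selection logic in the hunk above reduces to a small pure function. The sketch below is illustrative only: `parse_origins` is a name invented here, not part of the codebase, but the branching mirrors the config module's wildcard/CSV/default behavior.

```python
def parse_origins(cors_env, frontend_host=None):
    """Mirror the config logic: wildcard when the frontend binds 0.0.0.0,
    else parse a comma-separated origin list, else a permissive dev default."""
    if frontend_host == "0.0.0.0":
        return ["*"]
    if cors_env:
        # Drop empty fragments so trailing commas are harmless
        return [origin.strip() for origin in cors_env.split(",") if origin.strip()]
    return ["*"]  # development default

print(parse_origins("http://localhost:8001, http://127.0.0.1:8001"))
```

Treating the parse as a function makes the edge cases (trailing commas, stray whitespace, empty env var) easy to test in isolation.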
@@ -2,6 +2,10 @@ from fastapi import FastAPI
 from fastapi.staticfiles import StaticFiles
 from fastapi.middleware.cors import CORSMiddleware
 from pathlib import Path
+import asyncio
+import contextlib
+import logging
+import time
 from app.routers import speakers, dialog  # Import the routers
 from app import config
@@ -38,3 +42,47 @@ config.DIALOG_GENERATED_DIR.mkdir(parents=True, exist_ok=True)
 app.mount("/generated_audio", StaticFiles(directory=config.DIALOG_GENERATED_DIR), name="generated_audio")

 # Further endpoints for speakers, dialog generation, etc., will be added here.
+
+# --- Background task: idle model reaper ---
+logger = logging.getLogger("app.model_reaper")
+
+
+@app.on_event("startup")
+async def _start_model_reaper():
+    from app.services.model_manager import ModelManager
+
+    async def reaper():
+        while True:
+            try:
+                await asyncio.sleep(config.MODEL_IDLE_CHECK_INTERVAL_SECONDS)
+                if not getattr(config, "MODEL_EVICTION_ENABLED", True):
+                    continue
+                timeout = getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0)
+                if timeout <= 0:
+                    continue
+                m = ModelManager.instance()
+                if m.is_loaded() and m.active() == 0 and (time.time() - m.last_used()) >= timeout:
+                    logger.info("Idle timeout reached (%.0fs). Unloading model...", timeout)
+                    await m.unload()
+            except asyncio.CancelledError:
+                break
+            except Exception:
+                logger.exception("Model reaper encountered an error")
+
+    # Log eviction configuration at startup
+    logger.info(
+        "Model Eviction -> enabled: %s | idle_timeout: %ss | check_interval: %ss",
+        getattr(config, "MODEL_EVICTION_ENABLED", True),
+        getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0),
+        getattr(config, "MODEL_IDLE_CHECK_INTERVAL_SECONDS", 60),
+    )
+
+    app.state._model_reaper_task = asyncio.create_task(reaper())
+
+
+@app.on_event("shutdown")
+async def _stop_model_reaper():
+    task = getattr(app.state, "_model_reaper_task", None)
+    if task:
+        task.cancel()
+        with contextlib.suppress(Exception):
+            await task
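The reaper's unload condition (loaded, quiescent, and stale) can be isolated from the sleep loop for testing. This is a hedged sketch, not the app's code: `reaper_once` is a name introduced here, and the callables stand in for the real `ModelManager` accessors.

```python
import asyncio
import time

async def reaper_once(is_loaded, active, last_used, timeout, unload) -> bool:
    """One tick of the idle-reaper check: unload only when the model is
    loaded, no requests are in flight, and the idle timeout has elapsed."""
    if timeout <= 0:  # 0 disables eviction, matching the config convention
        return False
    if is_loaded() and active() == 0 and (time.time() - last_used()) >= timeout:
        await unload()
        return True
    return False

async def demo() -> bool:
    events = []

    async def unload():
        events.append("unloaded")

    fired = await reaper_once(
        is_loaded=lambda: True,
        active=lambda: 0,
        last_used=lambda: time.time() - 1000,  # idle for ~1000s
        timeout=900,
        unload=unload,
    )
    return fired and events == ["unloaded"]

print(asyncio.run(demo()))
```

Separating the predicate from the `while True` / `asyncio.sleep` loop is what makes the `active() == 0` guard (never evict mid-request) easy to verify.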
@@ -9,6 +9,8 @@ from app.services.speaker_service import SpeakerManagementService
 from app.services.dialog_processor_service import DialogProcessorService
 from app.services.audio_manipulation_service import AudioManipulationService
 from app import config
+from typing import AsyncIterator
+from app.services.model_manager import ModelManager

 router = APIRouter()
@@ -16,9 +18,12 @@ router = APIRouter()
 # These can be more sophisticated with a proper DI container or FastAPI's Depends system if services had complex init.
 # For now, direct instantiation or simple Depends is fine.

-def get_tts_service():
-    # Consider making device configurable
-    return TTSService(device="mps")
+async def get_tts_service() -> AsyncIterator[TTSService]:
+    """Dependency that holds a usage token for the duration of the request."""
+    manager = ModelManager.instance()
+    async with manager.using():
+        service = await manager.get_service()
+        yield service

 def get_speaker_management_service():
     return SpeakerManagementService()
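The yield-style dependency above works because FastAPI resumes the generator only after the response is sent, so the `using()` token spans the whole request. A minimal sketch of that mechanic, with `DummyManager` as an invented stand-in for `ModelManager`:

```python
import asyncio
import contextlib

class DummyManager:
    """Stand-in for ModelManager: counts in-flight requests."""
    def __init__(self):
        self.active = 0

    @contextlib.asynccontextmanager
    async def using(self):
        self.active += 1
        try:
            yield self
        finally:
            self.active -= 1

async def get_service(manager):
    # Async-generator dependency: the usage token is held until the
    # generator is resumed for cleanup after the request completes.
    async with manager.using():
        yield "service"

async def demo():
    m = DummyManager()
    agen = get_service(m)
    service = await agen.__anext__()   # dependency resolved; token held
    held = m.active                    # 1 while the "request" runs
    with contextlib.suppress(StopAsyncIteration):
        await agen.__anext__()         # request done; cleanup runs
    return service, held, m.active

print(asyncio.run(demo()))
```

Driving the generator by hand with `__anext__` is what a framework does internally; in the app, FastAPI's `Depends` handles both steps.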
@@ -32,7 +37,7 @@ def get_dialog_processor_service(
 def get_audio_manipulation_service():
     return AudioManipulationService()

-# --- Helper function to manage TTS model loading/unloading ---
+# --- Helper imports ---

 from app.models.dialog_models import SpeechItem, SilenceItem
 from app.services.tts_service import TTSService
@@ -128,19 +133,7 @@ async def generate_line(
             detail=error_detail
         )

-async def manage_tts_model_lifecycle(tts_service: TTSService, task_function, *args, **kwargs):
-    """Loads TTS model, executes task, then unloads model."""
-    try:
-        print("API: Loading TTS model...")
-        tts_service.load_model()
-        return await task_function(*args, **kwargs)
-    except Exception as e:
-        # Log or handle specific exceptions if needed before re-raising
-        print(f"API: Error during TTS model lifecycle or task execution: {e}")
-        raise
-    finally:
-        print("API: Unloading TTS model...")
-        tts_service.unload_model()
+# Removed per-request load/unload in favor of ModelManager idle eviction.

 async def process_dialog_flow(
     request: DialogRequest,
@@ -274,12 +267,10 @@ async def generate_dialog_endpoint(
     - Concatenates all audio segments into a single file.
    - Creates a ZIP archive of all individual segments and the concatenated file.
    """
-    # Wrap the core processing logic with model loading/unloading
-    return await manage_tts_model_lifecycle(
-        tts_service,
-        process_dialog_flow,
-        request=request,
-        dialog_processor=dialog_processor,
+    # Execute core processing; ModelManager dependency keeps the model marked "in use".
+    return await process_dialog_flow(
+        request=request,
+        dialog_processor=dialog_processor,
         audio_manipulator=audio_manipulator,
-        background_tasks=background_tasks
+        background_tasks=background_tasks,
    )
@@ -1,6 +1,8 @@
 from pathlib import Path
 from typing import List, Dict, Any, Union
 import re
+import asyncio
+from datetime import datetime

 from .tts_service import TTSService
 from .speaker_service import SpeakerManagementService
@@ -92,24 +94,72 @@ class DialogProcessorService:

         import shutil
         segment_idx = 0
+        tasks = []
+        results_map: Dict[int, Dict[str, Any]] = {}
+        sem = asyncio.Semaphore(getattr(config, "TTS_MAX_CONCURRENCY", 2))
+
+        async def run_one(planned: Dict[str, Any]):
+            async with sem:
+                text_chunk = planned["text_chunk"]
+                speaker_id = planned["speaker_id"]
+                abs_speaker_sample_path = planned["abs_speaker_sample_path"]
+                filename_base = planned["filename_base"]
+                params = planned["params"]
+                seg_idx = planned["segment_idx"]
+                start_ts = datetime.now()
+                start_line = (
+                    f"[{start_ts.isoformat(timespec='seconds')}] [TTS-TASK] START seg_idx={seg_idx} "
+                    f"speaker={speaker_id} chunk_len={len(text_chunk)} base={filename_base}"
+                )
+                try:
+                    out_path = await self.tts_service.generate_speech(
+                        text=text_chunk,
+                        speaker_id=speaker_id,
+                        speaker_sample_path=str(abs_speaker_sample_path),
+                        output_filename_base=filename_base,
+                        output_dir=dialog_temp_dir,
+                        exaggeration=params.get('exaggeration', 0.5),
+                        cfg_weight=params.get('cfg_weight', 0.5),
+                        temperature=params.get('temperature', 0.8),
+                    )
+                    end_ts = datetime.now()
+                    duration = (end_ts - start_ts).total_seconds()
+                    end_line = (
+                        f"[{end_ts.isoformat(timespec='seconds')}] [TTS-TASK] END seg_idx={seg_idx} "
+                        f"dur={duration:.2f}s -> {out_path}"
+                    )
+                    return seg_idx, {
+                        "type": "speech",
+                        "path": str(out_path),
+                        "speaker_id": speaker_id,
+                        "text_chunk": text_chunk,
+                    }, start_line + "\n" + f"Successfully generated segment: {out_path}" + "\n" + end_line
+                except Exception as e:
+                    end_ts = datetime.now()
+                    err_line = (
+                        f"[{end_ts.isoformat(timespec='seconds')}] [TTS-TASK] ERROR seg_idx={seg_idx} "
+                        f"speaker={speaker_id} err={repr(e)}"
+                    )
+                    return seg_idx, {
+                        "type": "error",
+                        "message": f"Error generating speech for chunk '{text_chunk[:50]}...': {repr(e)}",
+                        "text_chunk": text_chunk,
+                    }, err_line
+
         for i, item in enumerate(dialog_items):
             item_type = item.get("type")
             processing_log.append(f"Processing item {i+1}: type='{item_type}'")

-            # --- Universal: Handle reuse of existing audio for both speech and silence ---
+            # --- Handle reuse of existing audio ---
             use_existing_audio = item.get("use_existing_audio", False)
             audio_url = item.get("audio_url")
             if use_existing_audio and audio_url:
                 # Determine source path (handle both absolute and relative)
                 # Map web URL to actual file location in tts_generated_dialogs
                 if audio_url.startswith("/generated_audio/"):
                     src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url[len("/generated_audio/"):]
                 else:
                     src_audio_path = Path(audio_url)
                     if not src_audio_path.is_absolute():
                         # Assume relative to the generated audio root dir
                         src_audio_path = config.DIALOG_OUTPUT_DIR / audio_url.lstrip("/\\")
                 # Now src_audio_path should point to the real file in tts_generated_dialogs
                 if src_audio_path.is_file():
                     segment_filename = f"{output_base_name}_seg{segment_idx}_reused.wav"
                     dest_path = (self.temp_audio_dir / output_base_name / segment_filename)
@@ -123,22 +173,18 @@ class DialogProcessorService:
                         processing_log.append(f"[REUSE] Destination audio file was not created: {dest_path}")
                     else:
                         processing_log.append(f"[REUSE] Destination audio file created: {dest_path}, size={dest_path.stat().st_size} bytes")
-                    # Only include 'type' and 'path' so the concatenator always includes this segment
-                    segment_results.append({
-                        "type": item_type,
-                        "path": str(dest_path)
-                    })
+                    results_map[segment_idx] = {"type": item_type, "path": str(dest_path)}
                     processing_log.append(f"Reused existing audio for item {i+1}: copied from {src_audio_path} to {dest_path}")
                 except Exception as e:
                     error_message = f"Failed to copy reused audio for item {i+1}: {e}"
                     processing_log.append(error_message)
-                    segment_results.append({"type": "error", "message": error_message})
+                    results_map[segment_idx] = {"type": "error", "message": error_message}
+                segment_idx += 1
                 continue
             else:
                 error_message = f"Audio file for reuse not found at {src_audio_path} for item {i+1}."
                 processing_log.append(error_message)
-                segment_results.append({"type": "error", "message": error_message})
+                results_map[segment_idx] = {"type": "error", "message": error_message}
+                segment_idx += 1
                 continue
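The URL-to-filesystem mapping used by the reuse branch is easy to factor out and test. A hedged sketch follows; `resolve_reuse_path` is a name introduced here for illustration, and the root path in the usage line is hypothetical:

```python
from pathlib import Path

def resolve_reuse_path(audio_url: str, output_root: Path) -> Path:
    """Map a served /generated_audio/ URL, an absolute path, or a relative
    path to the actual file location under the generated-audio root."""
    prefix = "/generated_audio/"
    if audio_url.startswith(prefix):
        return output_root / audio_url[len(prefix):]
    p = Path(audio_url)
    if p.is_absolute():
        return p
    # Relative path: assume it lives under the generated-audio root
    return output_root / audio_url.lstrip("/\\")

root = Path("/srv/tts_generated_dialogs")  # hypothetical root for the demo
print(resolve_reuse_path("/generated_audio/demo/seg0.wav", root))
```

Keeping the mapping pure (no filesystem access) means the existence check (`is_file()`) stays a separate, mockable step.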
@@ -147,70 +193,81 @@ class DialogProcessorService:
             text = item.get("text")
             if not speaker_id or not text:
                 processing_log.append(f"Skipping speech item {i+1} due to missing speaker_id or text.")
-                segment_results.append({"type": "error", "message": "Missing speaker_id or text"})
+                results_map[segment_idx] = {"type": "error", "message": "Missing speaker_id or text"}
+                segment_idx += 1
                 continue

             # Validate speaker_id and get speaker_sample_path
             speaker_info = self.speaker_service.get_speaker_by_id(speaker_id)
             if not speaker_info:
                 processing_log.append(f"Speaker ID '{speaker_id}' not found. Skipping item {i+1}.")
-                segment_results.append({"type": "error", "message": f"Speaker ID '{speaker_id}' not found"})
+                results_map[segment_idx] = {"type": "error", "message": f"Speaker ID '{speaker_id}' not found"}
+                segment_idx += 1
                 continue
             if not speaker_info.sample_path:
                 processing_log.append(f"Speaker ID '{speaker_id}' has no sample path defined. Skipping item {i+1}.")
-                segment_results.append({"type": "error", "message": f"Speaker ID '{speaker_id}' has no sample path defined"})
+                results_map[segment_idx] = {"type": "error", "message": f"Speaker ID '{speaker_id}' has no sample path defined"}
+                segment_idx += 1
                 continue

             # speaker_info.sample_path is relative to config.SPEAKER_DATA_BASE_DIR
             abs_speaker_sample_path = config.SPEAKER_DATA_BASE_DIR / speaker_info.sample_path
             if not abs_speaker_sample_path.is_file():
                 processing_log.append(f"Speaker sample file not found or is not a file at '{abs_speaker_sample_path}' for speaker ID '{speaker_id}'. Skipping item {i+1}.")
-                segment_results.append({"type": "error", "message": f"Speaker sample not a file or not found: {abs_speaker_sample_path}"})
+                results_map[segment_idx] = {"type": "error", "message": f"Speaker sample not a file or not found: {abs_speaker_sample_path}"}
+                segment_idx += 1
                 continue

             text_chunks = self._split_text(text)
             processing_log.append(f"Split text for speaker '{speaker_id}' into {len(text_chunks)} chunk(s).")

             for chunk_idx, text_chunk in enumerate(text_chunks):
-                segment_filename_base = f"{output_base_name}_seg{segment_idx}_spk{speaker_id}_chunk{chunk_idx}"
-                processing_log.append(f"Generating speech for chunk: '{text_chunk[:50]}...' using speaker '{speaker_id}'")
-
-                try:
-                    segment_output_path = await self.tts_service.generate_speech(
-                        text=text_chunk,
-                        speaker_id=speaker_id,  # For metadata, actual sample path is used by TTS
-                        speaker_sample_path=str(abs_speaker_sample_path),
-                        output_filename_base=segment_filename_base,
-                        output_dir=dialog_temp_dir,  # Save to the dialog's temp dir
-                        exaggeration=item.get('exaggeration', 0.5),  # Default from Gradio, Pydantic model should provide this
-                        cfg_weight=item.get('cfg_weight', 0.5),  # Default from Gradio, Pydantic model should provide this
-                        temperature=item.get('temperature', 0.8)  # Default from Gradio, Pydantic model should provide this
-                    )
-                    segment_results.append({
-                        "type": "speech",
-                        "path": str(segment_output_path),
-                        "speaker_id": speaker_id,
-                        "text_chunk": text_chunk
-                    })
-                    processing_log.append(f"Successfully generated segment: {segment_output_path}")
-                except Exception as e:
-                    error_message = f"Error generating speech for chunk '{text_chunk[:50]}...': {repr(e)}"
-                    processing_log.append(error_message)
-                    segment_results.append({"type": "error", "message": error_message, "text_chunk": text_chunk})
+                filename_base = f"{output_base_name}_seg{segment_idx}_spk{speaker_id}_chunk{chunk_idx}"
+                processing_log.append(f"Queueing TTS for chunk: '{text_chunk[:50]}...' using speaker '{speaker_id}'")
+                planned = {
+                    "segment_idx": segment_idx,
+                    "speaker_id": speaker_id,
+                    "text_chunk": text_chunk,
+                    "abs_speaker_sample_path": abs_speaker_sample_path,
+                    "filename_base": filename_base,
+                    "params": {
+                        'exaggeration': item.get('exaggeration', 0.5),
+                        'cfg_weight': item.get('cfg_weight', 0.5),
+                        'temperature': item.get('temperature', 0.8),
+                    },
+                }
+                tasks.append(asyncio.create_task(run_one(planned)))
+                segment_idx += 1

         elif item_type == "silence":
             duration = item.get("duration")
             if duration is None or duration < 0:
                 processing_log.append(f"Skipping silence item {i+1} due to invalid duration.")
-                segment_results.append({"type": "error", "message": "Invalid duration for silence"})
+                results_map[segment_idx] = {"type": "error", "message": "Invalid duration for silence"}
+                segment_idx += 1
                 continue
-            segment_results.append({"type": "silence", "duration": float(duration)})
+            results_map[segment_idx] = {"type": "silence", "duration": float(duration)}
             processing_log.append(f"Added silence of {duration}s.")
+            segment_idx += 1

         else:
             processing_log.append(f"Unknown item type '{item_type}' at item {i+1}. Skipping.")
-            segment_results.append({"type": "error", "message": f"Unknown item type: {item_type}"})
+            results_map[segment_idx] = {"type": "error", "message": f"Unknown item type: {item_type}"}
+            segment_idx += 1

+        # Await all TTS tasks and merge results
+        if tasks:
+            processing_log.append(
+                f"Dispatching {len(tasks)} TTS task(s) with concurrency limit "
+                f"{getattr(config, 'TTS_MAX_CONCURRENCY', 2)}"
+            )
+            completed = await asyncio.gather(*tasks, return_exceptions=False)
+            for idx, payload, maybe_log in completed:
+                results_map[idx] = payload
+                if maybe_log:
+                    processing_log.append(maybe_log)
+
+        # Build ordered list
+        for idx in sorted(results_map.keys()):
+            segment_results.append(results_map[idx])

         # Log the full segment_results list for debugging
         processing_log.append("[DEBUG] Final segment_results list:")
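The plan-then-gather pattern in this hunk (stable `segment_idx`, semaphore-bounded tasks, ordered merge from a results map) can be distilled to a few lines. A minimal sketch with invented names (`run_all`, `demo`), not the service's actual API:

```python
import asyncio

async def run_all(jobs, limit=3):
    """Run jobs concurrently under a semaphore, then rebuild ordered results."""
    sem = asyncio.Semaphore(limit)
    results_map = {}

    async def run_one(idx, job):
        async with sem:  # at most `limit` jobs hold the semaphore at once
            return idx, await job()

    tasks = [asyncio.create_task(run_one(i, job)) for i, job in enumerate(jobs)]
    for idx, payload in await asyncio.gather(*tasks):
        results_map[idx] = payload
    # Completion order is arbitrary; sorting the indices restores plan order
    return [results_map[i] for i in sorted(results_map)]

async def demo():
    def make(n):
        async def job():
            await asyncio.sleep(0.01 * (5 - n))  # later items finish first
            return f"seg{n}"
        return job
    return await run_all([make(n) for n in range(5)], limit=2)

print(asyncio.run(demo()))
```

Because each task carries its planned index, out-of-order completion never disturbs the order the concatenator depends on.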
@@ -220,7 +277,7 @@ class DialogProcessorService:
         return {
             "log": "\n".join(processing_log),
             "segment_files": segment_results,
-            "temp_dir": str(dialog_temp_dir)  # For cleanup or zipping later
+            "temp_dir": str(dialog_temp_dir)
         }

 if __name__ == "__main__":
@@ -0,0 +1,170 @@
+import asyncio
+import time
+import logging
+from typing import Optional
+import gc
+import os
+
+_proc = None
+try:
+    import psutil  # type: ignore
+    _proc = psutil.Process(os.getpid())
+except Exception:
+    psutil = None  # type: ignore
+
+def _rss_mb() -> float:
+    """Return current process RSS in MB, or -1.0 if unavailable."""
+    global _proc
+    try:
+        if _proc is None and psutil is not None:
+            _proc = psutil.Process(os.getpid())
+        if _proc is not None:
+            return _proc.memory_info().rss / (1024 * 1024)
+    except Exception:
+        return -1.0
+    return -1.0
+
+try:
+    import torch  # Optional; used for cache cleanup metrics
+except Exception:  # pragma: no cover - torch may not be present in some envs
+    torch = None  # type: ignore
+
+from app import config
+from app.services.tts_service import TTSService
+
+logger = logging.getLogger(__name__)
+
+
+class ModelManager:
+    _instance: Optional["ModelManager"] = None
+
+    def __init__(self):
+        self._service: Optional[TTSService] = None
+        self._last_used: float = time.time()
+        self._active: int = 0
+        self._lock = asyncio.Lock()
+        self._counter_lock = asyncio.Lock()
+
+    @classmethod
+    def instance(cls) -> "ModelManager":
+        if not cls._instance:
+            cls._instance = cls()
+        return cls._instance
+
+    async def _ensure_service(self) -> None:
+        if self._service is None:
+            # Use configured device, default is handled by TTSService itself
+            device = getattr(config, "DEVICE", "auto")
+            # TTSService presently expects explicit device like "mps"/"cpu"/"cuda"; map "auto" to "mps" on Mac otherwise cpu
+            if device == "auto":
+                try:
+                    import torch
+                    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
+                        device = "mps"
+                    elif torch.cuda.is_available():
+                        device = "cuda"
+                    else:
+                        device = "cpu"
+                except Exception:
+                    device = "cpu"
+            self._service = TTSService(device=device)
+
+    async def load(self) -> None:
+        async with self._lock:
+            await self._ensure_service()
+            if self._service and self._service.model is None:
+                before_mb = _rss_mb()
+                logger.info(
+                    "Loading TTS model (device=%s)... (rss_before=%.1f MB)",
+                    self._service.device,
+                    before_mb,
+                )
+                self._service.load_model()
+                after_mb = _rss_mb()
+                if after_mb >= 0 and before_mb >= 0:
+                    logger.info(
+                        "TTS model loaded (rss_after=%.1f MB, delta=%.1f MB)",
+                        after_mb,
+                        after_mb - before_mb,
+                    )
+            self._last_used = time.time()
+
+    async def unload(self) -> None:
+        async with self._lock:
+            if not self._service:
+                return
+            if self._active > 0:
+                logger.debug("Skip unload: %d active operations", self._active)
+                return
+            if self._service.model is not None:
+                before_mb = _rss_mb()
+                logger.info(
+                    "Unloading idle TTS model... (rss_before=%.1f MB, active=%d)",
+                    before_mb,
+                    self._active,
+                )
+                self._service.unload_model()
+                # Drop the service instance as well to release any lingering refs
+                self._service = None
+                # Force GC and attempt allocator cache cleanup
+                try:
+                    gc.collect()
+                finally:
+                    if torch is not None:
+                        try:
+                            if hasattr(torch, "cuda") and torch.cuda.is_available():
+                                torch.cuda.empty_cache()
+                        except Exception:
+                            logger.debug("cuda.empty_cache() failed", exc_info=True)
+                        try:
+                            # MPS empty_cache may exist depending on torch version
+                            mps = getattr(torch, "mps", None)
+                            if mps is not None and hasattr(mps, "empty_cache"):
+                                mps.empty_cache()
+                        except Exception:
+                            logger.debug("mps.empty_cache() failed", exc_info=True)
+                after_mb = _rss_mb()
+                if after_mb >= 0 and before_mb >= 0:
+                    logger.info(
+                        "Idle unload complete (rss_after=%.1f MB, delta=%.1f MB)",
+                        after_mb,
+                        after_mb - before_mb,
+                    )
+            self._last_used = time.time()
+
+    async def get_service(self) -> TTSService:
+        if not self._service or self._service.model is None:
+            await self.load()
+        self._last_used = time.time()
+        return self._service  # type: ignore[return-value]
+
+    async def _inc(self) -> None:
+        async with self._counter_lock:
+            self._active += 1
+
+    async def _dec(self) -> None:
+        async with self._counter_lock:
+            self._active = max(0, self._active - 1)
+            self._last_used = time.time()
+
+    def last_used(self) -> float:
+        return self._last_used
+
+    def is_loaded(self) -> bool:
+        return bool(self._service and self._service.model is not None)
+
+    def active(self) -> int:
+        return self._active
+
+    def using(self):
+        manager = self
+
+        class _Ctx:
+            async def __aenter__(self):
+                await manager._inc()
+                return manager
+
+            async def __aexit__(self, exc_type, exc, tb):
+                await manager._dec()
+
+        return _Ctx()
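The manager's two core invariants are that it is a process-wide singleton and that `unload()` is refused while any request is active. A stripped-down sketch of just those invariants (`Manager` here is an invented toy, not the real `ModelManager`):

```python
import asyncio

class Manager:
    """Toy model of the manager's invariant: never unload while work is active."""
    _instance = None

    def __init__(self):
        self.loaded = False
        self.active = 0

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    async def unload(self):
        if self.active > 0:      # requests in flight: refuse
            return False
        self.loaded = False
        return True

async def demo():
    m = Manager.instance()
    assert m is Manager.instance()  # same object every time (singleton)
    m.loaded = True
    m.active = 1
    skipped = await m.unload()      # refused while a request holds a token
    m.active = 0
    done = await m.unload()         # allowed once quiescent
    return skipped, done, m.loaded

print(asyncio.run(demo()))
```

In the real class the counter is updated under `_counter_lock` and checked inside `_lock`, which closes the race between a finishing request and a concurrent reaper tick.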
@@ -1,11 +1,14 @@
 import torch
 import torchaudio
+import asyncio
 from typing import Optional
 from chatterbox.tts import ChatterboxTTS
 from pathlib import Path
 import gc  # Garbage collector for memory management
 import os
 from contextlib import contextmanager
+from datetime import datetime
+import time

 # Import configuration
 try:
@@ -114,42 +117,52 @@ class TTSService:
         # output_filename_base from DialogProcessorService is expected to be comprehensive (e.g., includes speaker_id, segment info)
         output_file_path = target_output_dir / f"{output_filename_base}.wav"

         print(f"Generating audio for text: \"{text[:50]}...\" with speaker sample: {speaker_sample_path}")
-        wav = None
+        start_ts = datetime.now()
+        print(f"[{start_ts.isoformat(timespec='seconds')}] [TTS] START generate+save base={output_filename_base} len={len(text)} sample={speaker_sample_path}")
         try:
-            with torch.no_grad():  # Important for inference
-                wav = self.model.generate(
-                    text=text,
-                    audio_prompt_path=str(speaker_sample_p),  # Must be a string path
-                    exaggeration=exaggeration,
-                    cfg_weight=cfg_weight,
-                    temperature=temperature,
-                )
-
-            torchaudio.save(str(output_file_path), wav, self.model.sr)
-            print(f"Audio saved to: {output_file_path}")
-            return output_file_path
-        except Exception as e:
-            print(f"Error during TTS generation or saving: {e}")
-            raise
-        finally:
-            # Explicitly delete the wav tensor to free memory
-            if wav is not None:
-                del wav
-
-            # Force garbage collection and cache cleanup
-            gc.collect()
-            if self.device == "cuda":
-                torch.cuda.empty_cache()
-            elif self.device == "mps":
-                if hasattr(torch.mps, "empty_cache"):
-                    torch.mps.empty_cache()
-
-            # Unload the model if requested
+            def _gen_and_save() -> Path:
+                t0 = time.perf_counter()
+                wav = None
+                try:
+                    with torch.no_grad():  # Important for inference
+                        wav = self.model.generate(
+                            text=text,
+                            audio_prompt_path=str(speaker_sample_p),  # Must be a string path
+                            exaggeration=exaggeration,
+                            cfg_weight=cfg_weight,
+                            temperature=temperature,
+                        )
+
+                    # Save the audio synchronously in the same thread
+                    torchaudio.save(str(output_file_path), wav, self.model.sr)
+                    t1 = time.perf_counter()
+                    print(f"[TTS-THREAD] Saved {output_file_path.name} in {t1 - t0:.2f}s")
+                    return output_file_path
+                finally:
+                    # Cleanup in the same thread that created the tensor
+                    if wav is not None:
+                        del wav
+                    gc.collect()
+                    if self.device == "cuda":
+                        torch.cuda.empty_cache()
+                    elif self.device == "mps":
+                        if hasattr(torch.mps, "empty_cache"):
+                            torch.mps.empty_cache()
+
+            out_path = await asyncio.to_thread(_gen_and_save)
+            end_ts = datetime.now()
+            print(f"[{end_ts.isoformat(timespec='seconds')}] [TTS] END generate+save base={output_filename_base} dur={(end_ts - start_ts).total_seconds():.2f}s -> {out_path}")
+
+            # Optionally unload model after generation
+            if unload_after:
+                print("Unloading TTS model after generation...")
+                self.unload_model()
+
+            return out_path
+        except Exception as e:
+            print(f"Error during TTS generation or saving: {e}")
+            raise

 # Example usage (for testing, not part of the service itself)
 if __name__ == "__main__":
     async def main_test():
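The key move in this hunk is wrapping the blocking generate+save work in a function and running it via `asyncio.to_thread`, so the event loop can overlap several calls. A self-contained sketch of that pattern; `blocking_save` and `generate_many` are illustrative stand-ins, not the service's API:

```python
import asyncio
import time

def blocking_save(name: str) -> str:
    """Stand-in for model.generate(...) + torchaudio.save(...): blocks its thread."""
    time.sleep(0.05)  # simulate compute + disk I/O
    return f"{name}.wav"

async def generate_many(names):
    # Each blocking call runs in the default thread pool; the event loop
    # stays free, and gather preserves the order of the inputs.
    return await asyncio.gather(*(asyncio.to_thread(blocking_save, n) for n in names))

paths = asyncio.run(generate_many(["seg0", "seg1", "seg2"]))
print(paths)
```

Doing tensor cleanup (`del wav`, `gc.collect()`, cache emptying) inside the threaded function, as the diff does, keeps deallocation in the thread that created the tensor rather than deferring it to the loop thread.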
@@ -14,6 +14,14 @@ if __name__ == "__main__":
     print(f"CORS Origins: {config.CORS_ORIGINS}")
     print(f"Project Root: {config.PROJECT_ROOT}")
     print(f"Device: {config.DEVICE}")
+    # Idle eviction settings
+    print(
+        "Model Eviction -> enabled: {} | idle_timeout: {}s | check_interval: {}s".format(
+            getattr(config, "MODEL_EVICTION_ENABLED", True),
+            getattr(config, "MODEL_IDLE_TIMEOUT_SECONDS", 0),
+            getattr(config, "MODEL_IDLE_CHECK_INTERVAL_SECONDS", 60),
+        )
+    )

     uvicorn.run(
         "app.main:app",
@@ -0,0 +1,2 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/antinomyhq/forge/refs/heads/main/forge.schema.json
+model: qwen/qwen3-coder
@@ -24,7 +24,7 @@
   --text-blue-darker: #205081;

   /* Border Colors */
-  --border-light: #1b0404;
+  --border-light: #e5e7eb;
   --border-medium: #cfd8dc;
   --border-blue: #b5c6df;
   --border-gray: #e3e3e3;
@@ -55,7 +55,7 @@ body {
 }

 .container {
-  max-width: 1100px;
+  max-width: 1280px;
   margin: 0 auto;
   padding: 0 18px;
 }
@@ -134,6 +134,17 @@ main {
   font-size: 1rem;
 }

+/* Allow wrapping for Text/Duration (3rd) column */
+#dialog-items-table td:nth-child(3),
+#dialog-items-table td.dialog-editable-cell {
+  white-space: pre-wrap;      /* wrap text and preserve newlines */
+  overflow: visible;          /* override global overflow hidden */
+  text-overflow: clip;        /* no ellipsis */
+  word-break: break-word;     /* wrap long words/URLs */
+  color: var(--text-primary); /* darker text for readability */
+  font-weight: 350;           /* slightly heavier than 300, lighter than 400 */
+}
+
 /* Make the Speaker (2nd) column narrower */
 #dialog-items-table th:nth-child(2), #dialog-items-table td:nth-child(2) {
   width: 60px;
@@ -142,11 +153,11 @@ main {
   text-align: center;
 }

-/* Make the Actions (4th) column narrower */
+/* Actions (4th) column sizing */
 #dialog-items-table th:nth-child(4), #dialog-items-table td:nth-child(4) {
-  width: 110px;
-  min-width: 90px;
-  max-width: 130px;
+  width: 200px;
+  min-width: 180px;
+  max-width: 280px;
   text-align: left;
   padding-left: 0;
   padding-right: 0;
@@ -186,8 +197,22 @@ main {

 #dialog-items-table td.actions {
   text-align: left;
-  min-width: 110px;
-  white-space: nowrap;
+  min-width: 200px;
+  white-space: normal;   /* allow wrapping so we don't see ellipsis */
+  overflow: visible;     /* override table cell default from global rule */
+  text-overflow: clip;   /* no ellipsis */
 }

+/* Allow wrapping of action buttons on smaller screens */
+@media (max-width: 900px) {
+  #dialog-items-table th:nth-child(4), #dialog-items-table td:nth-child(4) {
+    width: auto;
+    min-width: 160px;
+    max-width: none;
+  }
+  #dialog-items-table td.actions {
+    white-space: normal;
+  }
+}
+
 /* Collapsible log details */
@@ -346,7 +371,7 @@ button {
   margin-right: 10px;
 }

-.generate-line-btn, .play-line-btn {
+.generate-line-btn, .play-line-btn, .stop-line-btn {
   background: var(--bg-blue-light);
   color: var(--text-blue);
   border: 1.5px solid var(--border-blue);
@@ -363,7 +388,7 @@ button {
   vertical-align: middle;
 }

-.generate-line-btn:disabled, .play-line-btn:disabled {
+.generate-line-btn:disabled, .play-line-btn:disabled, .stop-line-btn:disabled {
   opacity: 0.45;
   cursor: not-allowed;
 }
@@ -374,7 +399,7 @@ button {
   border-color: var(--warning-border);
 }

-.generate-line-btn:hover, .play-line-btn:hover {
+.generate-line-btn:hover, .play-line-btn:hover, .stop-line-btn:hover {
   background: var(--bg-blue-lighter);
   color: var(--text-blue-darker);
   border-color: var(--text-blue);
@ -449,6 +474,72 @@ footer {
|
|||
border-top: 3px solid var(--primary-blue);
|
||||
}
|
||||
|
||||
/* Inline Notification */
|
||||
.notice {
|
||||
max-width: 1280px;
|
||||
margin: 16px auto 0;
|
||||
padding: 12px 16px;
|
||||
border-radius: 6px;
|
||||
border: 1px solid var(--border-medium);
|
||||
background: var(--bg-white);
|
||||
color: var(--text-primary);
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 12px;
|
||||
box-shadow: 0 1px 2px var(--shadow-light);
|
||||
}
|
||||
|
||||
.notice--info {
|
||||
border-color: var(--border-blue);
|
||||
background: var(--bg-blue-light);
|
||||
}
|
||||
|
||||
.notice--success {
|
||||
border-color: #A7F3D0;
|
||||
background: #ECFDF5;
|
||||
}
|
||||
|
||||
.notice--warning {
|
||||
border-color: var(--warning-border);
|
||||
background: var(--warning-bg);
|
||||
}
|
||||
|
||||
.notice--error {
|
||||
border-color: var(--error-bg-dark);
|
||||
background: #FEE2E2;
|
||||
}
|
||||
|
||||
.notice__content {
|
||||
flex: 1;
|
||||
}
|
||||
|
||||
.notice__actions {
|
||||
display: flex;
|
||||
gap: 8px;
|
||||
}
|
||||
|
||||
.notice__actions button {
|
||||
padding: 6px 12px;
|
||||
border-radius: 4px;
|
||||
border: 1px solid var(--border-medium);
|
||||
background: var(--bg-white);
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.notice__actions .btn-primary {
|
||||
background: var(--primary-blue);
|
||||
color: var(--text-white);
|
||||
border: none;
|
||||
}
|
||||
|
||||
.notice__close {
|
||||
background: none;
|
||||
border: none;
|
||||
font-size: 18px;
|
||||
cursor: pointer;
|
||||
color: var(--text-secondary);
|
||||
}
|
||||
|
||||
@media (max-width: 900px) {
|
||||
.panel-grid {
|
||||
flex-direction: column;
|
||||
|
|
|
@@ -11,8 +11,38 @@
     <div class="container">
       <h1>Chatterbox TTS</h1>
     </div>
+
+    <!-- Paste Script Modal -->
+    <div id="paste-script-modal" class="modal" style="display: none;">
+      <div class="modal-content">
+        <div class="modal-header">
+          <h3>Paste Dialog Script</h3>
+          <button class="modal-close" id="paste-script-close">×</button>
+        </div>
+        <div class="modal-body">
+          <p>Paste JSONL content (one JSON object per line). Example lines:</p>
+          <pre style="white-space:pre-wrap; background:#f6f8fa; padding:8px; border-radius:4px;">
+{"type":"speech","speaker_id":"alice","text":"Hello there!"}
+{"type":"silence","duration":0.5}
+{"type":"speech","speaker_id":"bob","text":"Hi!"}
+          </pre>
+          <textarea id="paste-script-text" rows="10" style="width:100%;" placeholder='Paste JSONL here'></textarea>
+        </div>
+        <div class="modal-footer">
+          <button id="paste-script-load" class="btn-primary">Load</button>
+          <button id="paste-script-cancel" class="btn-secondary">Cancel</button>
+        </div>
+      </div>
+    </div>
   </header>
 
+  <!-- Global inline notification area -->
+  <div id="global-notice" class="notice" role="status" aria-live="polite" style="display:none;">
+    <div class="notice__content" id="global-notice-content"></div>
+    <div class="notice__actions" id="global-notice-actions"></div>
+    <button class="notice__close" id="global-notice-close" aria-label="Close notification">×</button>
+  </div>
+
   <main class="container" role="main">
     <div class="panel-grid">
       <section id="dialog-editor" class="panel full-width-panel" aria-labelledby="dialog-editor-title">
@@ -48,6 +78,7 @@
           <button id="save-script-btn">Save Script</button>
           <input type="file" id="load-script-input" accept=".jsonl" style="display: none;">
           <button id="load-script-btn">Load Script</button>
+          <button id="paste-script-btn">Paste Script</button>
         </div>
       </section>
     </div>
@ -101,8 +132,8 @@
|
|||
</div>
|
||||
</footer>
|
||||
|
||||
<!-- TTS Settings Modal -->
|
||||
<div id="tts-settings-modal" class="modal" style="display: none;">
|
||||
<!-- TTS Settings Modal -->
|
||||
<div id="tts-settings-modal" class="modal" style="display: none;">
|
||||
<div class="modal-content">
|
||||
<div class="modal-header">
|
||||
<h3>TTS Settings</h3>
|
||||
|
|
|
@@ -10,7 +10,7 @@ const API_BASE_URL = API_BASE_URL_WITH_PREFIX;
  * @throws {Error} If the network response is not ok.
  */
 export async function getSpeakers() {
-  const response = await fetch(`${API_BASE_URL}/speakers/`);
+  const response = await fetch(`${API_BASE_URL}/speakers`);
   if (!response.ok) {
     const errorData = await response.json().catch(() => ({ message: response.statusText }));
     throw new Error(`Failed to fetch speakers: ${errorData.detail || errorData.message || response.statusText}`);
@@ -26,12 +26,12 @@ export async function getSpeakers() {
  * Adds a new speaker.
  * @param {FormData} formData - The form data containing speaker name and audio file.
  *   Example: formData.append('name', 'New Speaker');
- *            formData.append('audio_sample_file', fileInput.files[0]);
+ *            formData.append('audio_file', fileInput.files[0]);
  * @returns {Promise<Object>} A promise that resolves to the new speaker object.
  * @throws {Error} If the network response is not ok.
  */
 export async function addSpeaker(formData) {
-  const response = await fetch(`${API_BASE_URL}/speakers/`, {
+  const response = await fetch(`${API_BASE_URL}/speakers`, {
     method: 'POST',
     body: formData, // FormData sets Content-Type to multipart/form-data automatically
   });
@@ -86,7 +86,7 @@ export async function addSpeaker(formData) {
  * @throws {Error} If the network response is not ok.
  */
 export async function deleteSpeaker(speakerId) {
-  const response = await fetch(`${API_BASE_URL}/speakers/${speakerId}/`, {
+  const response = await fetch(`${API_BASE_URL}/speakers/${speakerId}`, {
     method: 'DELETE',
   });
   if (!response.ok) {
@@ -124,18 +124,8 @@ export async function generateLine(line) {
     const errorData = await response.json().catch(() => ({ message: response.statusText }));
     throw new Error(`Failed to generate line audio: ${errorData.detail || errorData.message || response.statusText}`);
   }
 
-  const responseText = await response.text();
-  console.log('Raw response text:', responseText);
-
-  try {
-    const jsonData = JSON.parse(responseText);
-    console.log('Parsed JSON:', jsonData);
-    return jsonData;
-  } catch (parseError) {
-    console.error('JSON parse error:', parseError);
-    throw new Error(`Invalid JSON response: ${responseText}`);
-  }
+  const data = await response.json();
+  return data;
 }
 
 /**
@@ -146,7 +136,7 @@ export async function generateLine(line) {
  *     output_base_name: "my_dialog",
  *     dialog_items: [
  *       { type: "speech", speaker_id: "speaker1", text: "Hello world.", exaggeration: 1.0, cfg_weight: 2.0, temperature: 0.7 },
- *       { type: "silence", duration_ms: 500 },
+ *       { type: "silence", duration: 0.5 },
  *       { type: "speech", speaker_id: "speaker2", text: "How are you?" }
  *     ]
  *   }
@ -1,6 +1,69 @@
|
|||
import { getSpeakers, addSpeaker, deleteSpeaker, generateDialog } from './api.js';
|
||||
import { API_BASE_URL, API_BASE_URL_FOR_FILES } from './config.js';
|
||||
|
||||
// Shared per-line audio playback state to prevent overlapping playback
|
||||
let currentLineAudio = null;
|
||||
let currentLinePlayBtn = null;
|
||||
let currentLineStopBtn = null;
|
||||
|
||||
// --- Global Inline Notification Helpers --- //
|
||||
const noticeEl = document.getElementById('global-notice');
|
||||
const noticeContentEl = document.getElementById('global-notice-content');
|
||||
const noticeActionsEl = document.getElementById('global-notice-actions');
|
||||
const noticeCloseBtn = document.getElementById('global-notice-close');
|
||||
|
||||
function hideNotice() {
|
||||
if (!noticeEl) return;
|
||||
noticeEl.style.display = 'none';
|
||||
noticeEl.className = 'notice';
|
||||
if (noticeContentEl) noticeContentEl.textContent = '';
|
||||
if (noticeActionsEl) noticeActionsEl.innerHTML = '';
|
||||
}
|
||||
|
||||
function showNotice(message, type = 'info', options = {}) {
|
||||
if (!noticeEl || !noticeContentEl || !noticeActionsEl) {
|
||||
console[type === 'error' ? 'error' : 'log']('[NOTICE]', message);
|
||||
return () => {};
|
||||
}
|
||||
const { timeout = null, actions = [] } = options;
|
||||
noticeEl.className = `notice notice--${type}`;
|
||||
noticeContentEl.textContent = message;
|
||||
noticeActionsEl.innerHTML = '';
|
||||
|
||||
actions.forEach(({ text, primary = false, onClick }) => {
|
||||
const btn = document.createElement('button');
|
||||
btn.textContent = text;
|
||||
if (primary) btn.classList.add('btn-primary');
|
||||
btn.onclick = () => {
|
||||
try { onClick && onClick(); } finally { hideNotice(); }
|
||||
};
|
||||
noticeActionsEl.appendChild(btn);
|
||||
});
|
||||
|
||||
if (noticeCloseBtn) noticeCloseBtn.onclick = hideNotice;
|
||||
noticeEl.style.display = 'flex';
|
||||
|
||||
let timerId = null;
|
||||
if (timeout && Number.isFinite(timeout)) {
|
||||
timerId = window.setTimeout(hideNotice, timeout);
|
||||
}
|
||||
return () => {
|
||||
if (timerId) window.clearTimeout(timerId);
|
||||
hideNotice();
|
||||
};
|
||||
}
|
||||
|
||||
function confirmAction(message) {
|
||||
return new Promise((resolve) => {
|
||||
showNotice(message, 'warning', {
|
||||
actions: [
|
||||
{ text: 'Cancel', primary: false, onClick: () => resolve(false) },
|
||||
{ text: 'Confirm', primary: true, onClick: () => resolve(true) },
|
||||
],
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
document.addEventListener('DOMContentLoaded', async () => {
|
||||
console.log('DOM fully loaded and parsed');
|
||||
initializeSpeakerManagement();
|
||||
|
@@ -23,18 +86,24 @@ function initializeSpeakerManagement() {
     const audioFile = formData.get('audio_file');
 
     if (!speakerName || !audioFile || audioFile.size === 0) {
-      alert('Please provide a speaker name and an audio file.');
+      showNotice('Please provide a speaker name and an audio file.', 'warning', { timeout: 4000 });
       return;
     }
 
     try {
+      const submitBtn = addSpeakerForm.querySelector('button[type="submit"]');
+      const prevText = submitBtn ? submitBtn.textContent : null;
+      if (submitBtn) { submitBtn.disabled = true; submitBtn.textContent = 'Adding…'; }
       const newSpeaker = await addSpeaker(formData);
-      alert(`Speaker added: ${newSpeaker.name} (ID: ${newSpeaker.id})`);
+      showNotice(`Speaker added: ${newSpeaker.name} (ID: ${newSpeaker.id})`, 'success', { timeout: 3000 });
       addSpeakerForm.reset();
       loadSpeakers(); // Refresh speaker list
     } catch (error) {
       console.error('Failed to add speaker:', error);
-      alert('Error adding speaker: ' + error.message);
+      showNotice('Error adding speaker: ' + error.message, 'error');
+    } finally {
+      const submitBtn = addSpeakerForm.querySelector('button[type="submit"]');
+      if (submitBtn) { submitBtn.disabled = false; submitBtn.textContent = 'Add Speaker'; }
     }
   });
 }
@@ -79,23 +148,24 @@ async function loadSpeakers() {
   } catch (error) {
     console.error('Failed to load speakers:', error);
     speakerListUL.innerHTML = '<li>Error loading speakers. See console for details.</li>';
-    alert('Error loading speakers: ' + error.message);
+    showNotice('Error loading speakers: ' + error.message, 'error');
   }
 }
 
 async function handleDeleteSpeaker(speakerId) {
   if (!speakerId) {
-    alert('Cannot delete speaker: Speaker ID is missing.');
+    showNotice('Cannot delete speaker: Speaker ID is missing.', 'warning', { timeout: 4000 });
     return;
   }
-  if (!confirm(`Are you sure you want to delete speaker ${speakerId}?`)) return;
+  const ok = await confirmAction(`Are you sure you want to delete speaker ${speakerId}?`);
+  if (!ok) return;
   try {
     await deleteSpeaker(speakerId);
-    alert(`Speaker ${speakerId} deleted successfully.`);
+    showNotice(`Speaker ${speakerId} deleted successfully.`, 'success', { timeout: 3000 });
    loadSpeakers(); // Refresh speaker list
   } catch (error) {
     console.error(`Failed to delete speaker ${speakerId}:`, error);
-    alert(`Error deleting speaker: ${error.message}`);
+    showNotice(`Error deleting speaker: ${error.message}`, 'error');
   }
 }
@@ -131,6 +201,12 @@ async function initializeDialogEditor() {
   const saveScriptBtn = document.getElementById('save-script-btn');
   const loadScriptBtn = document.getElementById('load-script-btn');
   const loadScriptInput = document.getElementById('load-script-input');
+  const pasteScriptBtn = document.getElementById('paste-script-btn');
+  const pasteModal = document.getElementById('paste-script-modal');
+  const pasteText = document.getElementById('paste-script-text');
+  const pasteLoadBtn = document.getElementById('paste-script-load');
+  const pasteCancelBtn = document.getElementById('paste-script-cancel');
+  const pasteCloseBtn = document.getElementById('paste-script-close');
 
   // Results Display Elements
   const generationLogPre = document.getElementById('generation-log-content'); // Corrected ID
@@ -140,9 +216,6 @@ async function initializeDialogEditor() {
   const zipArchivePlaceholder = document.getElementById('zip-archive-placeholder');
   const resultsDisplaySection = document.getElementById('results-display');
 
   let dialogItems = [];
   let availableSpeakersCache = []; // Cache for speaker names and IDs
 
   // Load speakers at startup
   try {
     availableSpeakersCache = await getSpeakers();
@@ -152,6 +225,48 @@ async function initializeDialogEditor() {
     // Continue without speakers - they'll be loaded when needed
   }
 
+  // --- LocalStorage persistence helpers ---
+  const LS_KEY = 'dialogEditor.items.v1';
+
+  function saveDialogToLocalStorage() {
+    try {
+      const exportData = dialogItems.map(item => {
+        const obj = { type: item.type };
+        if (item.type === 'speech') {
+          obj.speaker_id = item.speaker_id;
+          obj.text = item.text;
+          if (item.exaggeration !== undefined) obj.exaggeration = item.exaggeration;
+          if (item.cfg_weight !== undefined) obj.cfg_weight = item.cfg_weight;
+          if (item.temperature !== undefined) obj.temperature = item.temperature;
+          if (item.audioUrl) obj.audioUrl = item.audioUrl; // keep existing audio reference if present
+        } else if (item.type === 'silence') {
+          obj.duration = item.duration;
+        }
+        return obj;
+      });
+      localStorage.setItem(LS_KEY, JSON.stringify({ items: exportData }));
+    } catch (e) {
+      console.warn('Failed to save dialog to localStorage:', e);
+    }
+  }
+
+  function loadDialogFromLocalStorage() {
+    try {
+      const raw = localStorage.getItem(LS_KEY);
+      if (!raw) return;
+      const parsed = JSON.parse(raw);
+      if (!parsed || !Array.isArray(parsed.items)) return;
+      const loaded = parsed.items.map(normalizeDialogItem);
+      dialogItems.splice(0, dialogItems.length, ...loaded);
+      console.log(`Restored ${loaded.length} dialog items from localStorage`);
+    } catch (e) {
+      console.warn('Failed to load dialog from localStorage:', e);
+    }
+  }
+
+  // Attempt to restore saved dialog before first render
+  loadDialogFromLocalStorage();
+
   // Function to render the current dialogItems array to the DOM as table rows
   function renderDialogItems() {
     if (!dialogItemsContainer) return;
@@ -184,6 +299,8 @@ async function initializeDialogEditor() {
         });
         speakerSelect.onchange = (e) => {
           dialogItems[index].speaker_id = e.target.value;
+          // Persist change
+          saveDialogToLocalStorage();
         };
         speakerTd.appendChild(speakerSelect);
       } else {
@@ -195,8 +312,7 @@ async function initializeDialogEditor() {
       const textTd = document.createElement('td');
       textTd.className = 'dialog-editable-cell';
       if (item.type === 'speech') {
-        let txt = item.text.length > 60 ? item.text.substring(0, 57) + '…' : item.text;
-        textTd.textContent = `"${txt}"`;
+        textTd.textContent = `"${item.text}"`;
         textTd.title = item.text;
       } else {
         textTd.textContent = `${item.duration}s`;
@@ -243,6 +359,8 @@ async function initializeDialogEditor() {
           if (!isNaN(val) && val > 0) dialogItems[index].duration = val;
           dialogItems[index].audioUrl = null;
         }
+        // Persist changes before re-render
+        saveDialogToLocalStorage();
         renderDialogItems();
       }
     };
@@ -261,6 +379,7 @@ async function initializeDialogEditor() {
       upBtn.onclick = () => {
        if (index > 0) {
           [dialogItems[index - 1], dialogItems[index]] = [dialogItems[index], dialogItems[index - 1]];
+          saveDialogToLocalStorage();
           renderDialogItems();
         }
       };
@@ -275,6 +394,7 @@ async function initializeDialogEditor() {
       downBtn.onclick = () => {
         if (index < dialogItems.length - 1) {
           [dialogItems[index], dialogItems[index + 1]] = [dialogItems[index + 1], dialogItems[index]];
+          saveDialogToLocalStorage();
           renderDialogItems();
         }
       };
@@ -288,6 +408,7 @@ async function initializeDialogEditor() {
       removeBtn.title = 'Remove';
       removeBtn.onclick = () => {
         dialogItems.splice(index, 1);
+        saveDialogToLocalStorage();
         renderDialogItems();
       };
       actionsTd.appendChild(removeBtn);
@@ -314,6 +435,8 @@ async function initializeDialogEditor() {
           if (result && result.audio_url) {
             dialogItems[index].audioUrl = result.audio_url;
             console.log('Set audioUrl to:', result.audio_url);
+            // Persist newly generated audio reference
+            saveDialogToLocalStorage();
           } else {
             console.error('Invalid result structure:', result);
             throw new Error('Invalid response: missing audio_url');
@@ -321,7 +444,7 @@ async function initializeDialogEditor() {
         } catch (err) {
           console.error('Error in generateLine:', err);
           dialogItems[index].error = err.message || 'Failed to generate audio.';
-          alert(dialogItems[index].error);
+          showNotice(dialogItems[index].error, 'error');
         } finally {
           dialogItems[index].isGenerating = false;
           renderDialogItems();
@@ -330,19 +453,107 @@ async function initializeDialogEditor() {
       actionsTd.appendChild(generateBtn);
 
-      // --- NEW: Per-line Play button ---
-      const playBtn = document.createElement('button');
-      playBtn.innerHTML = '⏵';
-      playBtn.title = item.audioUrl ? 'Play generated audio' : 'No audio generated yet';
-      playBtn.className = 'play-line-btn';
-      playBtn.disabled = !item.audioUrl;
-      playBtn.onclick = () => {
-        if (!item.audioUrl) return;
-        let audioUrl = item.audioUrl.startsWith('http') ? item.audioUrl : `${API_BASE_URL_FOR_FILES}${item.audioUrl}`;
-        // Use a shared audio element or create one per play
-        let audio = new window.Audio(audioUrl);
-        audio.play();
-      };
-      actionsTd.appendChild(playBtn);
+      const playPauseBtn = document.createElement('button');
+      playPauseBtn.innerHTML = '⏵';
+      playPauseBtn.title = item.audioUrl ? 'Play' : 'No audio generated yet';
+      playPauseBtn.className = 'play-line-btn';
+      playPauseBtn.disabled = !item.audioUrl;
+
+      const stopBtn = document.createElement('button');
+      stopBtn.innerHTML = '⏹';
+      stopBtn.title = 'Stop';
+      stopBtn.className = 'stop-line-btn';
+      stopBtn.disabled = !item.audioUrl;
+
+      const setBtnStatesForPlaying = () => {
+        try {
+          playPauseBtn.innerHTML = '⏸';
+          playPauseBtn.title = 'Pause';
+          stopBtn.disabled = false;
+        } catch (e) { /* detached */ }
+      };
+      const setBtnStatesForPausedOrStopped = () => {
+        try {
+          playPauseBtn.innerHTML = '⏵';
+          playPauseBtn.title = 'Play';
+        } catch (e) { /* detached */ }
+      };
+
+      const stopCurrent = () => {
+        if (currentLineAudio) {
+          try { currentLineAudio.pause(); currentLineAudio.currentTime = 0; } catch (e) { /* noop */ }
+        }
+        if (currentLinePlayBtn) {
+          try { currentLinePlayBtn.innerHTML = '⏵'; currentLinePlayBtn.title = 'Play'; } catch (e) { /* detached */ }
+        }
+        if (currentLineStopBtn) {
+          try { currentLineStopBtn.disabled = true; } catch (e) { /* detached */ }
+        }
+        currentLineAudio = null;
+        currentLinePlayBtn = null;
+        currentLineStopBtn = null;
+      };
+
+      playPauseBtn.onclick = () => {
+        if (!item.audioUrl) return;
+        const audioUrl = item.audioUrl.startsWith('http') ? item.audioUrl : `${API_BASE_URL_FOR_FILES}${item.audioUrl}`;
+
+        // If controlling the same line
+        if (currentLineAudio && currentLinePlayBtn === playPauseBtn) {
+          if (currentLineAudio.paused) {
+            // Resume
+            currentLineAudio.play().then(() => setBtnStatesForPlaying()).catch(err => {
+              console.error('Audio resume failed:', err);
+              showNotice('Could not resume audio.', 'error', { timeout: 2000 });
+            });
+          } else {
+            // Pause
+            try { currentLineAudio.pause(); } catch (e) { /* noop */ }
+            setBtnStatesForPausedOrStopped();
+          }
+          return;
+        }
+
+        // Switching to a different line: stop previous
+        if (currentLineAudio) {
+          stopCurrent();
+        }
+
+        // Start new audio
+        const audio = new window.Audio(audioUrl);
+        currentLineAudio = audio;
+        currentLinePlayBtn = playPauseBtn;
+        currentLineStopBtn = stopBtn;
+
+        const clearState = () => {
+          if (currentLineAudio === audio) {
+            setBtnStatesForPausedOrStopped();
+            try { stopBtn.disabled = true; } catch (e) { /* detached */ }
+            currentLineAudio = null;
+            currentLinePlayBtn = null;
+            currentLineStopBtn = null;
+          }
+        };
+
+        audio.addEventListener('ended', clearState, { once: true });
+        audio.addEventListener('error', clearState, { once: true });
+
+        audio.play().then(() => setBtnStatesForPlaying()).catch(err => {
+          console.error('Audio play failed:', err);
+          clearState();
+          showNotice('Could not play audio.', 'error', { timeout: 2000 });
+        });
+      };
+
+      stopBtn.onclick = () => {
+        // Only acts if this line is the active one
+        if (currentLineAudio && currentLinePlayBtn === playPauseBtn) {
+          stopCurrent();
+        }
+      };
+
+      actionsTd.appendChild(playPauseBtn);
+      actionsTd.appendChild(stopBtn);
 
       // --- NEW: Settings button for speech items ---
       if (item.type === 'speech') {
@@ -383,13 +594,13 @@ async function initializeDialogEditor() {
     try {
       availableSpeakersCache = await getSpeakers();
     } catch (error) {
-      alert('Could not load speakers. Please try again.');
+      showNotice('Could not load speakers. Please try again.', 'error');
       console.error('Error fetching speakers for dialog:', error);
       return;
     }
   }
   if (availableSpeakersCache.length === 0) {
-    alert('No speakers available. Please add a speaker first.');
+    showNotice('No speakers available. Please add a speaker first.', 'warning', { timeout: 4000 });
     return;
   }
 
@@ -419,10 +630,11 @@ async function initializeDialogEditor() {
       const speakerId = speakerSelect.value;
       const text = textInput.value.trim();
       if (!speakerId || !text) {
-        alert('Please select a speaker and enter text.');
+        showNotice('Please select a speaker and enter text.', 'warning', { timeout: 4000 });
         return;
       }
       dialogItems.push(normalizeDialogItem({ type: 'speech', speaker_id: speakerId, text: text }));
+      saveDialogToLocalStorage();
       renderDialogItems();
       clearTempInputArea();
     };
@@ -461,10 +673,11 @@ async function initializeDialogEditor() {
     addButton.onclick = () => {
       const duration = parseFloat(durationInput.value);
       if (isNaN(duration) || duration <= 0) {
-        alert('Invalid duration. Please enter a positive number.');
+        showNotice('Invalid duration. Please enter a positive number.', 'warning', { timeout: 4000 });
         return;
       }
       dialogItems.push(normalizeDialogItem({ type: 'silence', duration: duration }));
+      saveDialogToLocalStorage();
       renderDialogItems();
       clearTempInputArea();
     };
@@ -486,15 +699,18 @@ async function initializeDialogEditor() {
     generateDialogBtn.addEventListener('click', async () => {
       const outputBaseName = outputBaseNameInput.value.trim();
       if (!outputBaseName) {
-        alert('Please enter an output base name.');
+        showNotice('Please enter an output base name.', 'warning', { timeout: 4000 });
         outputBaseNameInput.focus();
         return;
       }
       if (dialogItems.length === 0) {
-        alert('Please add at least one speech or silence line to the dialog.');
+        showNotice('Please add at least one speech or silence line to the dialog.', 'warning', { timeout: 4000 });
         return; // Prevent further execution if no dialog items
       }
 
+      const prevText = generateDialogBtn.textContent;
+      generateDialogBtn.disabled = true;
+      generateDialogBtn.textContent = 'Generating…';
       // Smart dialog-wide generation: use pre-generated audio where present
       const dialogItemsToGenerate = dialogItems.map(item => {
         // Only send minimal fields for items that need generation
@@ -546,7 +762,11 @@ async function initializeDialogEditor() {
       } catch (error) {
         console.error('Dialog generation failed:', error);
         if (generationLogPre) generationLogPre.textContent = `Error generating dialog: ${error.message}`;
-        alert(`Error generating dialog: ${error.message}`);
+        showNotice(`Error generating dialog: ${error.message}`, 'error');
       }
+      finally {
+        generateDialogBtn.disabled = false;
+        generateDialogBtn.textContent = prevText;
+      }
     });
   }
@@ -554,7 +774,7 @@ async function initializeDialogEditor() {
   // --- Save/Load Script Functionality ---
   function saveDialogScript() {
     if (dialogItems.length === 0) {
-      alert('No dialog items to save. Please add some speech or silence lines first.');
+      showNotice('No dialog items to save. Please add some speech or silence lines first.', 'warning', { timeout: 4000 });
       return;
     }
 
@@ -599,11 +819,12 @@ async function initializeDialogEditor() {
     URL.revokeObjectURL(url);
 
     console.log(`Dialog script saved as $(unknown)`);
+    showNotice(`Dialog script saved as $(unknown)`, 'success', { timeout: 3000 });
   }
 
   function loadDialogScript(file) {
     if (!file) {
-      alert('Please select a file to load.');
+      showNotice('Please select a file to load.', 'warning', { timeout: 4000 });
       return;
     }
 
@@ -626,19 +847,19 @@ async function initializeDialogEditor() {
           }
         } catch (parseError) {
           console.error(`Error parsing line ${i + 1}:`, parseError);
-          alert(`Error parsing line ${i + 1}: ${parseError.message}`);
+          showNotice(`Error parsing line ${i + 1}: ${parseError.message}`, 'error');
           return;
         }
       }
 
       if (loadedItems.length === 0) {
-        alert('No valid dialog items found in the file.');
+        showNotice('No valid dialog items found in the file.', 'warning', { timeout: 4000 });
         return;
       }
 
       // Confirm replacement if existing items
       if (dialogItems.length > 0) {
-        const confirmed = confirm(
+        const confirmed = await confirmAction(
           `This will replace your current dialog (${dialogItems.length} items) with the loaded script (${loadedItems.length} items). Continue?`
         );
         if (!confirmed) return;
@ -650,30 +871,97 @@ async function initializeDialogEditor() {
|
|||
availableSpeakersCache = await getSpeakers();
|
||||
} catch (error) {
|
||||
console.error('Error fetching speakers:', error);
|
||||
alert('Could not load speakers. Dialog loaded but speaker names may not display correctly.');
|
||||
showNotice('Could not load speakers. Dialog loaded but speaker names may not display correctly.', 'warning', { timeout: 5000 });
|
||||
}
|
||||
}
|
||||
|
||||
// Replace current dialog
|
||||
dialogItems.splice(0, dialogItems.length, ...loadedItems);
|
||||
// Persist loaded script
|
||||
saveDialogToLocalStorage();
|
||||
renderDialogItems();
|
||||
|
||||
console.log(`Loaded ${loadedItems.length} dialog items from script`);
|
||||
alert(`Successfully loaded ${loadedItems.length} dialog items.`);
|
||||
showNotice(`Successfully loaded ${loadedItems.length} dialog items.`, 'success', { timeout: 3000 });
|
||||
|
||||
} catch (error) {
|
||||
console.error('Error loading dialog script:', error);
|
||||
alert(`Error loading dialog script: ${error.message}`);
|
||||
showNotice(`Error loading dialog script: ${error.message}`, 'error');
|
||||
}
|
||||
};
|
||||
|
||||
reader.onerror = function() {
|
||||
            alert('Error reading file. Please try again.');
            showNotice('Error reading file. Please try again.', 'error');
        };

        reader.readAsText(file);
    }

    // Load dialog script from pasted JSONL text
    async function loadDialogScriptFromText(text) {
        if (!text || !text.trim()) {
            showNotice('Please paste JSONL content to load.', 'warning', { timeout: 4000 });
            return false;
        }
        try {
            const lines = text.trim().split('\n');
            const loadedItems = [];

            for (let i = 0; i < lines.length; i++) {
                const line = lines[i].trim();
                if (!line) continue; // Skip empty lines
                try {
                    const item = JSON.parse(line);
                    const validatedItem = validateDialogItem(item, i + 1);
                    if (validatedItem) {
                        loadedItems.push(normalizeDialogItem(validatedItem));
                    }
                } catch (parseError) {
                    console.error(`Error parsing line ${i + 1}:`, parseError);
                    showNotice(`Error parsing line ${i + 1}: ${parseError.message}`, 'error');
                    return false;
                }
            }

            if (loadedItems.length === 0) {
                showNotice('No valid dialog items found in the pasted content.', 'warning', { timeout: 4000 });
                return false;
            }

            // Confirm replacement if existing items
            if (dialogItems.length > 0) {
                const confirmed = await confirmAction(
                    `This will replace your current dialog (${dialogItems.length} items) with the pasted script (${loadedItems.length} items). Continue?`
                );
                if (!confirmed) return false;
            }

            // Ensure speakers are loaded before rendering
            if (availableSpeakersCache.length === 0) {
                try {
                    availableSpeakersCache = await getSpeakers();
                } catch (error) {
                    console.error('Error fetching speakers:', error);
                    showNotice('Could not load speakers. Dialog loaded but speaker names may not display correctly.', 'warning', { timeout: 5000 });
                }
            }

            // Replace current dialog
            dialogItems.splice(0, dialogItems.length, ...loadedItems);
            // Persist loaded script
            saveDialogToLocalStorage();
            renderDialogItems();

            console.log(`Loaded ${loadedItems.length} dialog items from pasted text`);
            showNotice(`Successfully loaded ${loadedItems.length} dialog items.`, 'success', { timeout: 3000 });
            return true;
        } catch (error) {
            console.error('Error loading dialog script from text:', error);
            showNotice(`Error loading dialog script: ${error.message}`, 'error');
            return false;
        }
    }

    function validateDialogItem(item, lineNumber) {
        if (!item || typeof item !== 'object') {
            throw new Error(`Line ${lineNumber}: Invalid item format`);
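In short, `loadDialogScriptFromText` treats each non-empty line of the pasted text as one standalone JSON object and aborts the whole load on the first parse failure. A minimal standalone sketch of that per-line contract follows; the field names (`type`, `speaker_id`, `text`, `duration`) are illustrative assumptions — the real schema is whatever `validateDialogItem` enforces:

```javascript
// Sketch of the JSONL-per-line parsing contract used by loadDialogScriptFromText.
// Field names are hypothetical; validateDialogItem defines the real schema.
const pasted = [
  '{"type": "speech", "speaker_id": "alice", "text": "Hello there."}',
  '',                                  // blank lines are skipped
  '{"type": "silence", "duration": 0.5}',
].join('\n');

function parseJsonl(text) {
  const items = [];
  const lines = text.trim().split('\n');
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (!line) continue;               // skip empty lines, as the loader does
    items.push(JSON.parse(line));      // a parse error aborts the whole load
  }
  return items;
}

console.log(parseJsonl(pasted).length); // 2
```

Because each line is parsed independently, a trailing comma or a multi-line pretty-printed object is invalid here — every record must fit on a single line.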
@ -729,12 +1017,75 @@ async function initializeDialogEditor() {
        const file = e.target.files[0];
        if (file) {
            loadDialogScript(file);
            // Reset input so same file can be loaded again
            e.target.value = '';
        }
    });
}

    // --- Paste Script (JSONL) Modal Handlers ---
    if (pasteScriptBtn && pasteModal && pasteText && pasteLoadBtn && pasteCancelBtn && pasteCloseBtn) {
        let escHandler = null;
        const closePasteModal = () => {
            pasteModal.style.display = 'none';
            pasteLoadBtn.onclick = null;
            pasteCancelBtn.onclick = null;
            pasteCloseBtn.onclick = null;
            pasteModal.onclick = null;
            if (escHandler) {
                document.removeEventListener('keydown', escHandler);
                escHandler = null;
            }
        };
        const openPasteModal = () => {
            pasteText.value = '';
            pasteModal.style.display = 'flex';
            escHandler = (e) => { if (e.key === 'Escape') closePasteModal(); };
            document.addEventListener('keydown', escHandler);
            pasteModal.onclick = (e) => { if (e.target === pasteModal) closePasteModal(); };
            pasteCloseBtn.onclick = closePasteModal;
            pasteCancelBtn.onclick = closePasteModal;
            pasteLoadBtn.onclick = async () => {
                const ok = await loadDialogScriptFromText(pasteText.value);
                if (ok) closePasteModal();
            };
        };
        pasteScriptBtn.addEventListener('click', openPasteModal);
    }

    // --- Clear Dialog Button ---
    let clearDialogBtn = document.getElementById('clear-dialog-btn');
    if (!clearDialogBtn) {
        clearDialogBtn = document.createElement('button');
        clearDialogBtn.id = 'clear-dialog-btn';
        clearDialogBtn.textContent = 'Clear Dialog';
        // Insert next to Save/Load if possible
        const saveLoadContainer = saveScriptBtn ? saveScriptBtn.parentElement : null;
        if (saveLoadContainer) {
            saveLoadContainer.appendChild(clearDialogBtn);
        } else {
            // Fallback: append near the add buttons container
            const addBtnsContainer = addSpeechLineBtn ? addSpeechLineBtn.parentElement : null;
            if (addBtnsContainer) addBtnsContainer.appendChild(clearDialogBtn);
        }
    }

    if (clearDialogBtn) {
        clearDialogBtn.addEventListener('click', async () => {
            if (dialogItems.length === 0) {
                showNotice('Dialog is already empty.', 'info', { timeout: 2500 });
                return;
            }
            const ok = await confirmAction(`This will remove ${dialogItems.length} dialog item(s). Continue?`);
            if (!ok) return;
            // Clear any transient input UI
            if (typeof clearTempInputArea === 'function') clearTempInputArea();
            // Clear state and persistence
            dialogItems.splice(0, dialogItems.length);
            try { localStorage.removeItem(LS_KEY); } catch (e) { /* ignore */ }
            renderDialogItems();
            showNotice('Dialog cleared.', 'success', { timeout: 2500 });
        });
    }

    console.log('Dialog Editor Initialized');
    renderDialogItems(); // Initial render (empty)
@ -781,6 +1132,8 @@ async function initializeDialogEditor() {
        dialogItems[index].audioUrl = null;

        closeModal();
        // Persist settings change
        saveDialogToLocalStorage();
        renderDialogItems(); // Re-render to reflect changes
        console.log('TTS settings updated for item:', dialogItems[index]);
    };
@ -13,8 +13,15 @@ const getEnvVar = (name, defaultValue) => {
};

// API Configuration
export const API_BASE_URL = getEnvVar('VITE_API_BASE_URL', 'http://localhost:8000');
export const API_BASE_URL_WITH_PREFIX = getEnvVar('VITE_API_BASE_URL_WITH_PREFIX', 'http://localhost:8000/api');
// Default to the same hostname as the frontend, on port 8000 (override via VITE_API_BASE_URL*)
const _defaultHost = (typeof window !== 'undefined' && window.location?.hostname) || 'localhost';
const _defaultPort = getEnvVar('VITE_API_BASE_URL_PORT', '8000');
const _defaultBase = `http://${_defaultHost}:${_defaultPort}`;
export const API_BASE_URL = getEnvVar('VITE_API_BASE_URL', _defaultBase);
export const API_BASE_URL_WITH_PREFIX = getEnvVar(
    'VITE_API_BASE_URL_WITH_PREFIX',
    `${_defaultBase}/api`
);

// For file serving (same as API_BASE_URL since files are served from the same server)
export const API_BASE_URL_FOR_FILES = API_BASE_URL;
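The change above replaces the hard-coded `localhost` defaults with a fallback chain: explicit env var, else the page's own hostname, else `localhost`, with the port overridable separately. That logic can be exercised outside a browser with a small sketch; `env` and `win` here are stand-ins for the real `getEnvVar` source and `window`, so this is an illustration of the fallback order, not the actual module:

```javascript
// Sketch of config.js's default-base-URL fallback chain, testable outside a browser.
// `env` stands in for the environment-variable source; `win` stands in for window.
function computeApiBase(env, win) {
  const getEnvVar = (name, fallback) => env[name] ?? fallback;
  const host = (win && win.location && win.location.hostname) || 'localhost';
  const port = getEnvVar('VITE_API_BASE_URL_PORT', '8000');
  const base = `http://${host}:${port}`;
  return {
    API_BASE_URL: getEnvVar('VITE_API_BASE_URL', base),
    API_BASE_URL_WITH_PREFIX: getEnvVar('VITE_API_BASE_URL_WITH_PREFIX', `${base}/api`),
  };
}

// No window (e.g. under Node/tests): falls back to localhost:8000
console.log(computeApiBase({}, undefined).API_BASE_URL); // http://localhost:8000
// Same hostname as the page, custom port via env
console.log(
  computeApiBase({ VITE_API_BASE_URL_PORT: '9000' },
                 { location: { hostname: '10.0.0.5' } }).API_BASE_URL_WITH_PREFIX
); // http://10.0.0.5:9000/api
```

The practical effect is that a frontend served from a LAN address talks to the backend on the same host by default, instead of always pointing at `localhost`.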
@ -0,0 +1,9 @@
// jest.config.cjs
module.exports = {
    testEnvironment: 'node',
    transform: {
        '^.+\\.js$': 'babel-jest',
    },
    moduleFileExtensions: ['js', 'json'],
    roots: ['<rootDir>/frontend/tests', '<rootDir>'],
};
@ -5,11 +5,13 @@
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "jest"
    "test": "jest",
    "test:frontend": "jest --config ./jest.config.cjs",
    "frontend:dev": "python3 frontend/start_dev_server.py"
  },
  "repository": {
    "type": "git",
    "url": "https://oauth2:78f77aaebb8fa1cd3efbd5b738177c127f7d7d0b@gitea.r8z.us/stwhite/chatterbox-ui.git"
    "url": "https://gitea.r8z.us/stwhite/chatterbox-ui.git"
  },
  "keywords": [],
  "author": "",

@ -17,7 +19,7 @@
  "devDependencies": {
    "@babel/core": "^7.27.4",
    "@babel/preset-env": "^7.27.2",
    "babel-jest": "^30.0.0-beta.3",
    "babel-jest": "^29.7.0",
    "jest": "^29.7.0"
  }
}
@ -0,0 +1,123 @@
#Requires -Version 5.1
<#!
Chatterbox TTS - Windows setup script

What it does:
- Creates a Python virtual environment in .venv (if missing)
- Upgrades pip
- Installs dependencies from backend/requirements.txt and requirements.txt
- Creates a default .env with sensible ports if not present
- Launches start_servers.py using the venv's Python

Usage:
- Right-click this file and "Run with PowerShell" OR from PowerShell:
    ./setup-windows.ps1
- Optional flags:
    -NoInstall  -> Skip installing dependencies (just start servers)
    -NoStart    -> Prepare env but do not start servers

Notes:
- You may need to allow script execution once:
    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
- Press Ctrl+C in the console to stop both servers.
!#>

param(
    [switch]$NoInstall,
    [switch]$NoStart
)

$ErrorActionPreference = 'Stop'

function Write-Info($msg) { Write-Host "[INFO] $msg" -ForegroundColor Cyan }
function Write-Ok($msg)   { Write-Host "[ OK ] $msg" -ForegroundColor Green }
function Write-Warn($msg) { Write-Host "[WARN] $msg" -ForegroundColor Yellow }
function Write-Err($msg)  { Write-Host "[FAIL] $msg" -ForegroundColor Red }

$root = Split-Path -Parent $MyInvocation.MyCommand.Path
Set-Location $root

$venvDir = Join-Path $root ".venv"
$venvPython = Join-Path $venvDir "Scripts/python.exe"

# 1) Ensure Python available
function Get-BasePython {
    try {
        $pyExe = (Get-Command py -ErrorAction SilentlyContinue)
        if ($pyExe) { return 'py -3' }
    } catch { }
    try {
        $pyExe = (Get-Command python -ErrorAction SilentlyContinue)
        if ($pyExe) { return 'python' }
    } catch { }
    throw "Python not found. Please install Python 3.x and add it to PATH."
}

# 2) Create venv if missing
if (-not (Test-Path $venvPython)) {
    Write-Info "Creating virtual environment in .venv"
    $basePy = Get-BasePython
    if ($basePy -eq 'py -3') {
        & py -3 -m venv .venv
    } else {
        & python -m venv .venv
    }
    Write-Ok "Virtual environment created"
} else {
    Write-Info "Using existing virtual environment: $venvDir"
}

if (-not (Test-Path $venvPython)) {
    throw ".venv python not found at $venvPython"
}

# 3) Install dependencies
if (-not $NoInstall) {
    Write-Info "Upgrading pip"
    & $venvPython -m pip install --upgrade pip

    # Backend requirements
    $backendReq = Join-Path $root 'backend/requirements.txt'
    if (Test-Path $backendReq) {
        Write-Info "Installing backend requirements"
        & $venvPython -m pip install -r $backendReq
    } else {
        Write-Warn "backend/requirements.txt not found"
    }

    # Root requirements (optional frontend / project libs)
    $rootReq = Join-Path $root 'requirements.txt'
    if (Test-Path $rootReq) {
        Write-Info "Installing root requirements"
        & $venvPython -m pip install -r $rootReq
    } else {
        Write-Warn "requirements.txt not found at repo root"
    }

    Write-Ok "Dependency installation complete"
}

# 4) Ensure .env exists with sensible defaults
$envPath = Join-Path $root '.env'
if (-not (Test-Path $envPath)) {
    Write-Info "Creating default .env"
    @(
        'BACKEND_PORT=8000',
        'BACKEND_HOST=127.0.0.1',
        'FRONTEND_PORT=8001',
        'FRONTEND_HOST=127.0.0.1'
    ) -join "`n" | Out-File -FilePath $envPath -Encoding utf8 -Force
    Write-Ok ".env created"
} else {
    Write-Info ".env already exists; leaving as-is"
}

# 5) Start servers
if ($NoStart) {
    Write-Info "-NoStart specified; setup complete. You can start later with:"
    Write-Host "  `"$venvPython`" `"$root\start_servers.py`"" -ForegroundColor Gray
    exit 0
}

Write-Info "Starting servers via start_servers.py"
& $venvPython "$root/start_servers.py"
@ -28,3 +28,9 @@ dd3552d9-f4e8-49ed-9892-f9e67afcf23c:
2cdd6d3d-c533-44bf-a5f6-cc83bd089d32:
  name: Grace
  sample_path: speaker_samples/2cdd6d3d-c533-44bf-a5f6-cc83bd089d32.wav
3d3e85db-3d67-4488-94b2-ffc189fbb287:
  name: RCB
  sample_path: speaker_samples/3d3e85db-3d67-4488-94b2-ffc189fbb287.wav
f754cf35-892c-49b6-822a-f2e37246623b:
  name: Jim
  sample_path: speaker_samples/f754cf35-892c-49b6-822a-f2e37246623b.wav
@ -14,101 +14,109 @@ from pathlib import Path
# Try to load environment variables, but don't fail if dotenv is not available
try:
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    print("python-dotenv not installed, using system environment variables only")

# Configuration
BACKEND_PORT = int(os.getenv('BACKEND_PORT', '8000'))
BACKEND_HOST = os.getenv('BACKEND_HOST', '0.0.0.0')
FRONTEND_PORT = int(os.getenv('FRONTEND_PORT', '8001'))
FRONTEND_HOST = os.getenv('FRONTEND_HOST', '127.0.0.1')
BACKEND_PORT = int(os.getenv("BACKEND_PORT", "8000"))
BACKEND_HOST = os.getenv("BACKEND_HOST", "0.0.0.0")
# Frontend host/port (for dev server binding)
FRONTEND_PORT = int(os.getenv("FRONTEND_PORT", "8001"))
FRONTEND_HOST = os.getenv("FRONTEND_HOST", "0.0.0.0")

# Export frontend host/port so backend CORS config can pick them up automatically
os.environ["FRONTEND_HOST"] = FRONTEND_HOST
os.environ["FRONTEND_PORT"] = str(FRONTEND_PORT)

# Get project root directory
PROJECT_ROOT = Path(__file__).parent.absolute()


def run_backend():
    """Run the backend FastAPI server"""
    os.chdir(PROJECT_ROOT / "backend")
    cmd = [
        sys.executable, "-m", "uvicorn",
        "app.main:app",
        "--reload",
        f"--host={BACKEND_HOST}",
        f"--port={BACKEND_PORT}"
        sys.executable,
        "-m",
        "uvicorn",
        "app.main:app",
        "--reload",
        f"--host={BACKEND_HOST}",
        f"--port={BACKEND_PORT}",
    ]

    print(f"\n{'='*50}")
    print(f"Starting Backend Server at http://{BACKEND_HOST}:{BACKEND_PORT}")
    print(f"API docs available at http://{BACKEND_HOST}:{BACKEND_PORT}/docs")
    print(f"{'='*50}\n")

    return subprocess.Popen(
        cmd,
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        bufsize=1
        bufsize=1,
    )


def run_frontend():
    """Run the frontend development server"""
    frontend_dir = PROJECT_ROOT / "frontend"
    os.chdir(frontend_dir)

    cmd = [sys.executable, "start_dev_server.py"]
    env = os.environ.copy()
    env["VITE_DEV_SERVER_HOST"] = FRONTEND_HOST
    env["VITE_DEV_SERVER_PORT"] = str(FRONTEND_PORT)

    print(f"\n{'='*50}")
    print(f"Starting Frontend Server at http://{FRONTEND_HOST}:{FRONTEND_PORT}")
    print(f"{'='*50}\n")

    return subprocess.Popen(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        bufsize=1
        bufsize=1,
    )


def print_process_output(process, prefix):
    """Print process output with a prefix"""
    for line in iter(process.stdout.readline, ''):
    for line in iter(process.stdout.readline, ""):
        if not line:
            break
        print(f"{prefix} | {line}", end='')
        print(f"{prefix} | {line}", end="")


def main():
    """Main function to start both servers"""
    print("\n🚀 Starting Chatterbox UI Development Environment")

    # Start the backend server
    backend_process = run_backend()

    # Give the backend a moment to start
    time.sleep(2)

    # Start the frontend server
    frontend_process = run_frontend()

    # Create threads to monitor and print output
    backend_monitor = threading.Thread(
        target=print_process_output,
        args=(backend_process, "BACKEND"),
        daemon=True
        target=print_process_output, args=(backend_process, "BACKEND"), daemon=True
    )
    frontend_monitor = threading.Thread(
        target=print_process_output,
        args=(frontend_process, "FRONTEND"),
        daemon=True
        target=print_process_output, args=(frontend_process, "FRONTEND"), daemon=True
    )

    backend_monitor.start()
    frontend_monitor.start()

    # Setup signal handling for graceful shutdown
    def signal_handler(sig, frame):
        print("\n\n🛑 Shutting down servers...")

@ -117,16 +125,16 @@ def main():
        # Threads are daemon, so they'll exit when the main thread exits
        print("✅ Servers stopped successfully")
        sys.exit(0)

    signal.signal(signal.SIGINT, signal_handler)

    # Print access information
    print("\n📋 Access Information:")
    print(f"  • Frontend: http://{FRONTEND_HOST}:{FRONTEND_PORT}")
    print(f"  • Backend API: http://{BACKEND_HOST}:{BACKEND_PORT}/api")
    print(f"  • API Documentation: http://{BACKEND_HOST}:{BACKEND_PORT}/docs")
    print("\n⚠️  Press Ctrl+C to stop both servers\n")

    # Keep the main process running
    try:
        while True:

@ -134,5 +142,6 @@ def main():
    except KeyboardInterrupt:
        signal_handler(None, None)


if __name__ == "__main__":
    main()