# Chatterbox Dialog Generator
This tool generates speech audio for each line of a dialog written in a Markdown file, using the Chatterbox TTS system. Speaker names are mapped to voice sample audio files through a YAML configuration file.
## Features
- Maps speaker names to audio samples via a YAML config file
- Parses Markdown dialog files whose lines follow the format `Name: "Text"`
- Generates sequentially numbered audio files (e.g., `001-output.wav`, `002-output.wav`)
- Automatically splits long dialog lines (>300 characters) at sentence boundaries
- Provides a summary of generated files
## Requirements
- Python 3.6+
- PyYAML
- torchaudio
- Chatterbox TTS library
## Usage
```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output
```
### Arguments
- `--config`: Path to the YAML config file mapping speaker names to audio samples
- `--dialog`: Path to the markdown dialog file
- `--output-base`: Base name for output files (e.g., "output" for "001-output.wav")
- `--reinit-each-line`: Re-initialize the model after each line to reduce memory usage (useful for long dialogs)
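For reference, the options above could be declared with Python's standard `argparse` module roughly as follows. This is a hypothetical sketch, not the actual contents of `cbx-dialog-generate.py`. In particular, marking `--config`, `--dialog`, and `--output-base` as required is an assumption based on the usage line. `argparse` exposes `--output-base` and `--reinit-each-line` as the attributes `output_base` and `reinit_each_line`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI; the real script may differ.
    parser = argparse.ArgumentParser(
        description="Generate dialog audio with Chatterbox TTS (sketch)."
    )
    parser.add_argument("--config", required=True,
                        help="YAML file mapping speaker names to audio samples")
    parser.add_argument("--dialog", required=True,
                        help="Markdown dialog file")
    parser.add_argument("--output-base", required=True,
                        help='Base name for output files, e.g. "output" -> 001-output.wav')
    # store_true: the flag is off unless passed on the command line.
    parser.add_argument("--reinit-each-line", action="store_true",
                        help="Re-initialize the model after each line to reduce memory usage")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--config", "speakers.yaml", "--dialog", "sample-dialog.md",
         "--output-base", "output"]
    )
    print(args.output_base, args.reinit_each_line)  # output False
```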
## Config File Format (YAML)
The config file maps speaker names (as they appear in the dialog) to audio sample files:
```yaml
Denise: denise.wav
Mark: mark.wav
Mary: mary.wav
```
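Loading this file with PyYAML's `yaml.safe_load` yields a plain dictionary such as `{"Denise": "denise.wav", ...}`. Because a speaker missing from that dictionary can only fail at generation time, it can help to check every dialog speaker up front. The helper below is a hypothetical sketch that works on the already-loaded dictionary; whether the actual script performs such a check is not stated here. Lookups are case-sensitive, so `denise` would not match `Denise`.

```python
def find_unknown_speakers(speaker_map: dict, dialog_speakers: list) -> list:
    """Return dialog speaker names that have no entry in the config mapping.

    Hypothetical helper: speaker_map is the dict a YAML loader returns for
    the config file; each unknown name is reported once, in first-seen order.
    """
    seen = set()
    unknown = []
    for name in dialog_speakers:
        if name not in speaker_map and name not in seen:
            unknown.append(name)
            seen.add(name)
    return unknown


speakers = {"Denise": "denise.wav", "Mark": "mark.wav", "Mary": "mary.wav"}
print(find_unknown_speakers(speakers, ["Denise", "Mark", "Bob", "Bob"]))  # ['Bob']
```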
## Dialog File Format (Markdown)
The dialog file should contain lines in the format:
```
Name: "Text"
```
For example:
```
Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
Mary: "Jesus, Mark, can you be any more of an asshole?"
```
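Lines in this format can be picked out with a regular expression. The sketch below is one possible approach, not necessarily how the script parses the file. It assumes the speaker name contains no colon, the text is wrapped in straight double quotes, and any non-matching line (headings, blank lines, stage directions) is simply skipped.

```python
import re

# Matches lines like: Denise: "What do you think is wrong with me?"
# Assumption: the name has no colon and the text uses straight double quotes.
LINE_RE = re.compile(r'^\s*([^:]+?)\s*:\s*"(.*)"\s*$')


def parse_dialog(markdown_text: str) -> list:
    """Return (speaker, text) pairs; lines that don't match are skipped."""
    entries = []
    for line in markdown_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries


sample = '''# Scene one
Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
'''
for speaker, text in parse_dialog(sample):
    print(f"{speaker} -> {text}")
```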
## Output
The script generates sequentially numbered WAV files:
- `001-output.wav`
- `002-output.wav`
- etc.

If a dialog line exceeds 300 characters, it is split at sentence boundaries, and each chunk is written to its own file, continuing the sequential numbering.
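The splitting and numbering rules can be sketched as follows. This is an illustrative approach under stated assumptions, not the script's actual code: sentence boundaries are approximated as whitespace following `.`, `!`, or `?` (so abbreviations like "Dr." would also split), sentences are packed greedily into chunks of at most 300 characters, and a single sentence longer than the limit is kept whole rather than cut mid-sentence.

```python
import re

MAX_CHARS = 300  # threshold stated in this README


def split_long_line(text: str, max_chars: int = MAX_CHARS) -> list:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    if len(text) <= max_chars:
        return [text]
    # Approximate sentence boundaries: whitespace after . ! or ?
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        # Start a new chunk only when adding this sentence would overflow.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def output_names(segments: list, base: str) -> list:
    """Number segments sequentially: 001-<base>.wav, 002-<base>.wav, ..."""
    return [f"{i:03d}-{base}.wav" for i in range(1, len(segments) + 1)]


lines = ["Short line.", "A" * 200 + ". " + "B" * 200 + "."]
segments = [chunk for line in lines for chunk in split_long_line(line)]
print(output_names(segments, "output"))
# ['001-output.wav', '002-output.wav', '003-output.wav']
```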
## Example
Given `speakers.yaml` and a `sample-dialog.md` with five dialog lines, where Denise's second line exceeds 300 characters, running:
```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output
```
For long dialogs where memory usage is a concern, you can use:
```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output --reinit-each-line
```
Either command would generate:
- `001-output.wav` - Denise's first line
- `002-output.wav` - Mark's first line
- `003-output.wav` - Mary's line
- `004-output.wav` - First part of Denise's long line
- `005-output.wav` - Second part of Denise's long line
- `006-output.wav` - Mark's second line
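The `--reinit-each-line` option trades speed for lower memory use by reloading the model between lines. The loop below is a hypothetical sketch of that pattern: `load_model` and `synthesize` are placeholder callables standing in for the real Chatterbox calls, which are not shown in this README. The previous model is released before the new one is loaded so that two copies never coexist; on a CUDA GPU one would typically also call `torch.cuda.empty_cache()` at that point.

```python
import gc
from typing import Callable, Iterable, List


def synthesize_all(
    segments: Iterable[str],
    load_model: Callable[[], object],
    synthesize: Callable[[object, str], bytes],
    reinit_each_line: bool = False,
) -> List[bytes]:
    """Hypothetical generation loop with an optional per-line model reload."""
    model = None
    outputs = []
    for segment in segments:
        if model is None or reinit_each_line:
            # Drop the old reference first so its memory can be reclaimed,
            # then load a fresh instance for this segment.
            model = None
            gc.collect()
            model = load_model()
        outputs.append(synthesize(model, segment))
    return outputs


loads = []


def fake_load():
    loads.append(1)
    return object()


def fake_synth(model, text):
    return text.encode()


synthesize_all(["a", "b", "c"], fake_load, fake_synth, reinit_each_line=True)
print(len(loads))  # 3
```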