# Chatterbox Dialog Generator

This tool generates audio files for dialog from a markdown file, using the Chatterbox TTS system. It maps speaker names to audio samples using a YAML configuration file.

## Features

- Maps speaker names to audio samples via a YAML config file
- Processes markdown dialog files with lines in the format: `Name: "Text"`
- Generates sequentially numbered audio files (e.g., `001-output.wav`, `002-output.wav`)
- Automatically splits long dialog lines (>300 characters) at sentence boundaries
- Provides a summary of generated files

## Requirements

- Python 3.6+
- PyYAML
- torchaudio
- Chatterbox TTS library

## Usage

```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output
```

### Arguments

- `--config`: Path to the YAML config file mapping speaker names to audio samples
- `--dialog`: Path to the markdown dialog file
- `--output-base`: Base name for output files (e.g., "output" for "001-output.wav")
- `--reinit-each-line`: Re-initialize the model after each line to reduce memory usage (useful for long dialogs)
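
For orientation, the four options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual source: the function name `build_parser` is hypothetical, and treating the three path-like options as required is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Generate per-line dialog audio with Chatterbox TTS."
    )
    # Assumption: these three options are mandatory.
    parser.add_argument("--config", required=True,
                        help="YAML file mapping speaker names to audio samples")
    parser.add_argument("--dialog", required=True,
                        help="Markdown dialog file")
    parser.add_argument("--output-base", required=True,
                        help='Base name for output files, e.g. "output"')
    # A plain on/off switch; it defaults to False when omitted.
    parser.add_argument("--reinit-each-line", action="store_true",
                        help="Re-initialize the model after each line")
    return parser


args = build_parser().parse_args(
    ["--config", "speakers.yaml", "--dialog", "sample-dialog.md",
     "--output-base", "output"]
)
print(args.config, args.dialog, args.output_base, args.reinit_each_line)
```

`argparse` converts the dashes in option names to underscores, so `--output-base` is read back as `args.output_base`.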

## Config File Format (YAML)

The config file maps speaker names (as they appear in the dialog) to audio sample files:

```yaml
Denise: denise.wav
Mark: mark.wav
Mary: mary.wav
```
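
A config like the one above parses to a flat dictionary. The sketch below shows one way to load and sanity-check it with PyYAML's `yaml.safe_load`. The function names `validate_speaker_map` and `load_speaker_map` are hypothetical, and the specific checks are assumptions rather than the script's documented behavior.

```python
def validate_speaker_map(data):
    """Return a {speaker: sample_path} dict, or raise ValueError."""
    if not isinstance(data, dict) or not data:
        raise ValueError(
            "config must be a non-empty mapping of speaker name to audio sample"
        )
    speakers = {}
    for name, sample in data.items():
        # Assumption: both keys and values are plain strings.
        if not isinstance(name, str) or not isinstance(sample, str):
            raise ValueError(f"invalid config entry: {name!r}: {sample!r}")
        speakers[name.strip()] = sample
    return speakers


def load_speaker_map(config_path):
    """Read the YAML file and validate its contents."""
    import yaml  # PyYAML; imported here so the validator has no third-party dependency

    with open(config_path, encoding="utf-8") as fh:
        return validate_speaker_map(yaml.safe_load(fh))


print(validate_speaker_map({"Denise": "denise.wav", "Mark": "mark.wav"}))
```

Using `safe_load` rather than `load` avoids constructing arbitrary Python objects from the file, which is all a simple name-to-path mapping needs.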

## Dialog File Format (Markdown)

The dialog file should contain lines in the format:

```
Name: "Text"
```

For example:

```
Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
Mary: "Jesus, Mark, can you be any more of an asshole?"
```
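
To make the `Name: "Text"` format concrete, here is a minimal parsing sketch. The regular expression and the name `parse_dialog` are assumptions for illustration. Skipping lines that do not match (headings, blank lines, notes) is also an assumption about how non-dialog lines are treated.

```python
import re

# A speaker name, a colon, then the text wrapped in straight double quotes.
DIALOG_LINE = re.compile(r'^\s*(?P<name>[^:"]+?)\s*:\s*"(?P<text>.*)"\s*$')


def parse_dialog(markdown_text):
    """Return a list of (speaker, text) tuples in file order."""
    turns = []
    for line in markdown_text.splitlines():
        match = DIALOG_LINE.match(line)
        if match:  # lines that do not look like dialog are ignored
            turns.append((match.group("name"), match.group("text")))
    return turns


sample = '# Scene 1\n\nDenise: "What do you think is wrong with me?"\nMark: "Hello: again."\n'
for speaker, text in parse_dialog(sample):
    print(speaker, "->", text)
```

The name pattern excludes colons, so only the first colon on a line separates speaker from text, and colons inside the quoted text are preserved.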

## Output

The script generates sequentially numbered WAV files:

- `001-output.wav`
- `002-output.wav`
- etc.
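
The naming pattern can be reproduced with a format string. This is a small sketch: `output_name` is a hypothetical helper, and the three-digit zero padding is inferred from the examples above (behavior beyond 999 files is not documented).

```python
def output_name(index, output_base):
    """Build a name such as 001-output.wav from a 1-based index."""
    # Assumption: indices are zero-padded to three digits.
    return f"{index:03d}-{output_base}.wav"


names = [output_name(i, "output") for i in range(1, 4)]
print(names)  # ['001-output.wav', '002-output.wav', '003-output.wav']
```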

If a dialog line exceeds 300 characters, it is split at sentence boundaries into multiple files, and the sequential numbering continues across the parts.
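
One plausible way to implement this splitting is to break the text after sentence-ending punctuation and greedily pack sentences into chunks of at most 300 characters. The sketch below makes several assumptions: the function name `split_long_line` is hypothetical, a sentence boundary is taken to be `.`, `!` or `?` followed by whitespace, and a single sentence longer than the limit is left whole. The script's actual rules may differ.

```python
import re

MAX_CHARS = 300  # threshold stated in this README


def split_long_line(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, breaking between sentences."""
    if len(text) <= max_chars:
        return [text]
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_long_line("One. Two. Three.", max_chars=10))  # ['One. Two.', 'Three.']
```

Each returned chunk would then be synthesized to its own numbered file, which is how one long line can yield several consecutive outputs.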

## Example

Given the sample dialog and config files, run:

```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output
```

For long dialogs where memory usage is a concern, you can use:

```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output --reinit-each-line
```
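
The idea behind `--reinit-each-line` can be illustrated with a loop that discards the model and builds a fresh one before each line. This is a sketch of the general pattern only, not the script's code: `synthesize_all` and the `load_model` factory are hypothetical, and the model is assumed to expose a `generate(text, audio_prompt_path=...)` method. How much memory is actually reclaimed depends on the runtime and device.

```python
import gc


def synthesize_all(turns, load_model, reinit_each_line=False):
    """Generate audio for each (text, sample_path) pair.

    load_model is a hypothetical zero-argument factory that returns a model
    object with a generate(text, audio_prompt_path=...) method.
    """
    model = None
    results = []
    for text, sample_path in turns:
        if model is None or reinit_each_line:
            # Drop the previous model first so its memory can be reclaimed
            # before the replacement is created.
            model = None
            gc.collect()
            model = load_model()
        results.append(model.generate(text, audio_prompt_path=sample_path))
    return results


class EchoModel:
    """Stand-in model used only to demonstrate the control flow."""

    def generate(self, text, audio_prompt_path=None):
        return f"{audio_prompt_path}: {text}"


print(synthesize_all([("Hello.", "denise.wav")], EchoModel, reinit_each_line=True))
```

The trade-off is speed: reloading the model for every line costs time on each iteration in exchange for a lower peak memory footprint.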

Either command would generate:
- `001-output.wav` - Denise's first line
- `002-output.wav` - Mark's first line
- `003-output.wav` - Mary's line
- `004-output.wav` - First part of Denise's long line
- `005-output.wav` - Second part of Denise's long line
- `006-output.wav` - Mark's second line