Chatterbox Dialog Generator
This tool generates audio files for dialog from a markdown file, using the Chatterbox TTS system. It maps speaker names to audio samples using a YAML configuration file.
Features
- Maps speaker names to audio samples via a YAML config file
- Processes markdown dialog files with lines in the format Name: "Text"
- Generates sequentially numbered audio files (e.g., 001-output.wav, 002-output.wav)
- Automatically splits long dialog lines (>300 characters) at sentence boundaries
- Provides a summary of generated files
Requirements
- Python 3.6+
- PyYAML
- torchaudio
- Chatterbox TTS library
Usage
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output
Arguments
- --config: Path to the YAML config file mapping speaker names to audio samples
- --dialog: Path to the markdown dialog file
- --output-base: Base name for output files (e.g., "output" for "001-output.wav")
- --reinit-each-line: Re-initialize the model after each line to reduce memory usage (useful for long dialogs)
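For readers who want to see how the options fit together, they could be declared with argparse roughly as follows. This is a minimal sketch and not the script's actual source: the helper name build_parser and the decision to mark the first three options as required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Generate dialog audio files from a markdown dialog file."
    )
    # Assumption: the config, dialog, and output-base options are mandatory.
    parser.add_argument("--config", required=True,
                        help="YAML file mapping speaker names to audio samples")
    parser.add_argument("--dialog", required=True,
                        help="markdown dialog file")
    parser.add_argument("--output-base", required=True,
                        help='base name for output files, e.g. "output"')
    # A boolean switch: absent means False, present means True.
    parser.add_argument("--reinit-each-line", action="store_true",
                        help="re-initialize the model after each line")
    return parser
```

argparse converts the dashes in option names to underscores, so the parsed values would be read as args.output_base and args.reinit_each_line.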
Config File Format (YAML)
The config file maps speaker names (as they appear in the dialog) to audio sample files:
Denise: denise.wav
Mark: mark.wav
Mary: mary.wav
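A config in this shape is a flat YAML mapping, so it can be read with PyYAML's yaml.safe_load. The sketch below is illustrative only: load_speakers is a hypothetical helper, and the check that the top level is a mapping is an assumption about sensible validation, not documented behavior of the script.

```python
import yaml  # PyYAML, listed under Requirements


def load_speakers(config_text):
    """Parse YAML text into a {speaker name: sample path} dict (hypothetical helper)."""
    data = yaml.safe_load(config_text)
    # Assumption: anything other than a top-level mapping is treated as an error.
    if not isinstance(data, dict):
        raise ValueError("config must map speaker names to audio sample files")
    # Normalize keys and values to strings so lookups by speaker name are predictable.
    return {str(name): str(path) for name, path in data.items()}


SAMPLE_CONFIG = """\
Denise: denise.wav
Mark: mark.wav
Mary: mary.wav
"""
speakers = load_speakers(SAMPLE_CONFIG)
```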
Dialog File Format (Markdown)
The dialog file should contain lines in the format:
Name: "Text"
For example:
Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
Mary: "Jesus, Mark, can you be any more of an asshole?"
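Lines in this format can be picked out with a regular expression. The following is a minimal sketch under stated assumptions: the speaker name is everything before the first colon, the text is wrapped in straight double quotes, and lines that do not match (blank lines, headings) are skipped. The names DIALOG_LINE and parse_dialog are hypothetical, and the actual script may parse differently.

```python
import re

# Assumption: Name, a colon, then the text in straight double quotes.
DIALOG_LINE = re.compile(r'^\s*([^:]+?)\s*:\s*"(.*)"\s*$')


def parse_dialog(text):
    """Return (speaker, text) pairs for lines matching Name: "Text"; skip the rest."""
    entries = []
    for line in text.splitlines():
        match = DIALOG_LINE.match(line)
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries
```

Because the name pattern excludes colons, a colon inside the quoted text does not confuse the split between speaker and text.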
Output
The script generates sequentially numbered WAV files:
- 001-output.wav
- 002-output.wav
- etc.
If a dialog line exceeds 300 characters, it will be split at sentence boundaries into multiple files, each maintaining the sequential numbering.
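One way to implement this kind of splitting is to pack whole sentences greedily into chunks of at most 300 characters and then number the chunks consecutively. The sketch below is an illustration, not the script's actual algorithm. It assumes a sentence ends with '.', '!' or '?' followed by whitespace, and it keeps a single over-long sentence whole rather than cutting it. The names split_long_line and output_names are hypothetical.

```python
import re

MAX_CHARS = 300  # limit stated in this README


def split_long_line(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    if len(text) <= max_chars:
        return [text]
    # Assumption: sentence boundaries are '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def output_names(chunk_count, output_base, start=1):
    """Return zero-padded names such as 001-output.wav for consecutive chunks."""
    return [f"{i:03d}-{output_base}.wav" for i in range(start, start + chunk_count)]
```

With this approach, a long line that splits into two chunks while the counter is at 4 would be written as 004-output.wav and 005-output.wav, and the next dialog line would continue at 006.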
Example
Given the sample dialog and config files, run:
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output
For long dialogs where memory usage is a concern, you can use:
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output --reinit-each-line
Either command would generate:
- 001-output.wav: Denise's first line
- 002-output.wav: Mark's first line
- 003-output.wav: Mary's line
- 004-output.wav: First part of Denise's long line
- 005-output.wav: Second part of Denise's long line
- 006-output.wav: Mark's second line