Chatterbox Dialog Generator

This tool generates audio files for dialog from a markdown file, using the Chatterbox TTS system. It maps speaker names to audio samples using a YAML configuration file.

Features

Maps speaker names to audio samples via a YAML config file
Processes markdown dialog files with lines in the format: Name: "Text"
Generates sequentially numbered audio files (e.g., 001-output.wav, 002-output.wav)
Automatically splits long dialog lines (>300 characters) at sentence boundaries
Provides a summary of generated files

Requirements

Python 3.6+
PyYAML
torchaudio
Chatterbox TTS library

Usage

python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output

Arguments

--config: Path to the YAML config file mapping speaker names to audio samples
--dialog: Path to the markdown dialog file
--output-base: Base name for output files (e.g., "output" for "001-output.wav")
--reinit-each-line: Re-initialize the model after each line to reduce memory usage (useful for long dialogs)
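
As a rough illustration, the four arguments above could be declared with argparse as follows. This is a sketch, not the script's actual parser: the function name build_parser is hypothetical, and treating the first three options as required is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Generate dialog audio files with Chatterbox TTS"
    )
    # Assumption: the three path/name options are mandatory.
    parser.add_argument("--config", required=True,
                        help="YAML file mapping speaker names to audio samples")
    parser.add_argument("--dialog", required=True,
                        help="Markdown dialog file")
    parser.add_argument("--output-base", required=True,
                        help='Base name for output files, e.g. "output"')
    # A boolean switch: present means True, absent means False.
    parser.add_argument("--reinit-each-line", action="store_true",
                        help="Re-initialize the model after each line")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--config", "speakers.yaml", "--dialog", "sample-dialog.md",
     "--output-base", "output"]
)
print(args.config, args.dialog, args.output_base, args.reinit_each_line)
```

argparse converts the dashes in option names to underscores, so the values are read as args.output_base and args.reinit_each_line.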

Config File Format (YAML)

The config file maps speaker names (as they appear in the dialog) to audio sample files:

Denise: denise.wav
Mark: mark.wav
Mary: mary.wav
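
A config like this can be read with PyYAML's safe_load and checked before any audio is generated. The sketch below is illustrative only: load_speaker_map and missing_samples are hypothetical helper names, and resolving sample paths relative to a base directory is an assumption about how the script locates the files.

```python
from pathlib import Path


def load_speaker_map(config_path):
    """Read the YAML config and return {speaker name: sample path}."""
    import yaml  # PyYAML; imported here so missing_samples works without it

    with open(config_path, encoding="utf-8") as f:
        data = yaml.safe_load(f)
    if not isinstance(data, dict):
        raise ValueError("config must map speaker names to audio files")
    return {str(name): str(sample) for name, sample in data.items()}


def missing_samples(speaker_map, base_dir="."):
    """Return the speakers whose sample file is not found under base_dir."""
    base = Path(base_dir)
    return [name for name, sample in speaker_map.items()
            if not (base / sample).is_file()]
```

Calling missing_samples(load_speaker_map("speakers.yaml")) before generation would surface a mistyped sample path early instead of partway through a long dialog.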

Dialog File Format (Markdown)

The dialog file should contain lines in the format:

Name: "Text"

For example:

Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
Mary: "Jesus, Mark, can you be any more of an asshole?"
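
One way to recognize lines in this format is a regular expression that captures the speaker name and the quoted text. This is a minimal sketch, not the script's actual parser; the names LINE_RE and parse_dialog are hypothetical, and silently skipping lines that do not match (headings, blank lines) is an assumption.

```python
import re

# Name, a colon, then text wrapped in straight double quotes.
LINE_RE = re.compile(r'^\s*(?P<name>[^:"]+?)\s*:\s*"(?P<text>.*)"\s*$')


def parse_dialog(lines):
    """Return (speaker, text) pairs for every line matching Name: "Text"."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # assumption: non-matching lines are ignored
            entries.append((match.group("name"), match.group("text")))
    return entries


sample = [
    'Denise: "What do you think is wrong with me?"',
    "",
    'Mark: "I think you\'re being overly emotional."',
]
for speaker, text in parse_dialog(sample):
    print(speaker, "->", text)
```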

Output

The script generates sequentially numbered WAV files:

001-output.wav
002-output.wav
etc.

If a dialog line exceeds 300 characters, it will be split at sentence boundaries into multiple files, each maintaining the sequential numbering.
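
The splitting and numbering behavior can be sketched as follows. This is an assumed approach, not the script's own code: it treats ".", "!" or "?" followed by whitespace as a sentence boundary, packs sentences greedily into chunks of at most 300 characters, and leaves a single sentence longer than the limit unsplit. The names split_at_sentences and numbered_names are hypothetical.

```python
import re

MAX_CHARS = 300  # threshold stated in this README


def split_at_sentences(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters where possible."""
    if len(text) <= limit:
        return [text]
    # Break after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > limit:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def numbered_names(chunk_count, output_base, start=1):
    """Build zero-padded file names such as 004-output.wav."""
    return [f"{i:03d}-{output_base}.wav"
            for i in range(start, start + chunk_count)]
```

For a long line that splits into two chunks after three earlier files, numbered_names(2, "output", start=4) yields 004-output.wav and 005-output.wav, so the sequence continues without gaps.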

Example

Given the sample dialog and config files, run:

python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output

For long dialogs where memory usage is a concern, you can use:

python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output --reinit-each-line

Either command generates:

001-output.wav - Denise's first line
002-output.wav - Mark's first line
003-output.wav - Mary's line
004-output.wav - First part of Denise's long line
005-output.wav - Second part of Denise's long line
006-output.wav - Mark's second line
