Chatterbox Dialog Generator

This tool generates audio files for dialog from a markdown file, using the Chatterbox TTS system. It maps speaker names to audio samples using a YAML configuration file.

Features

  • Maps speaker names to audio samples via a YAML config file
  • Processes markdown dialog files with lines in the format: Name: "Text"
  • Generates sequentially numbered audio files (e.g., 001-output.wav, 002-output.wav)
  • Automatically splits long dialog lines (>300 characters) at sentence boundaries
  • Provides a summary of generated files

Requirements

  • Python 3.6+
  • PyYAML
  • torchaudio
  • Chatterbox TTS library

Usage

python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output

Arguments

  • --config: Path to the YAML config file mapping speaker names to audio samples
  • --dialog: Path to the markdown dialog file
  • --output-base: Base name for output files (e.g., "output" for "001-output.wav")
  • --reinit-each-line: Optional flag. Re-initializes the model after each line to reduce memory usage (useful for long dialogs)
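
As an illustration of how these four arguments fit together, here is a minimal argparse sketch. It is not taken from cbx-dialog-generate.py. Marking the first three arguments as required and treating --reinit-each-line as a boolean switch are assumptions based on the usage line above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four arguments documented in this README."""
    parser = argparse.ArgumentParser(
        description="Generate dialog audio files with Chatterbox TTS"
    )
    # Assumption: the three path-like arguments are mandatory.
    parser.add_argument("--config", required=True,
                        help="YAML file mapping speaker names to audio samples")
    parser.add_argument("--dialog", required=True,
                        help="Markdown dialog file")
    parser.add_argument("--output-base", required=True,
                        help='Base name for output files, e.g. "output"')
    # Assumption: a plain on/off switch that takes no value.
    parser.add_argument("--reinit-each-line", action="store_true",
                        help="Re-initialize the model after each line")
    return parser


args = build_parser().parse_args([
    "--config", "speakers.yaml",
    "--dialog", "sample-dialog.md",
    "--output-base", "output",
])
# argparse converts dashes to underscores in attribute names.
print(args.config, args.dialog, args.output_base, args.reinit_each_line)
```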

Config File Format (YAML)

The config file maps speaker names (as they appear in the dialog) to audio sample files:

Denise: denise.wav
Mark: mark.wav
Mary: mary.wav
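
Before a long generation run, it can help to check that every speaker in the mapping points at an existing sample file. The helper below is a hypothetical sketch, not part of the script. The name resolve_speakers and the choice to resolve sample paths relative to a base_dir are assumptions.

```python
from pathlib import Path


def resolve_speakers(mapping, base_dir="."):
    """Return {speaker: Path}, raising if an entry is malformed or a sample is missing."""
    if not isinstance(mapping, dict):
        raise ValueError("config must map speaker names to sample paths")
    resolved = {}
    for name, sample in mapping.items():
        if not isinstance(name, str) or not isinstance(sample, str):
            raise ValueError(f"bad config entry: {name!r}: {sample!r}")
        # Assumption: sample paths are relative to base_dir.
        path = Path(base_dir) / sample
        if not path.is_file():
            raise FileNotFoundError(f"sample for {name!r} not found: {path}")
        resolved[name] = path
    return resolved
```

With PyYAML installed, the mapping could come from `yaml.safe_load(Path("speakers.yaml").read_text())`, which returns a plain dict for a config like the one above.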

Dialog File Format (Markdown)

The dialog file should contain lines in the format:

Name: "Text"

For example:

Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
Mary: "Jesus, Mark, can you be any more of an asshole?"
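
One way to read this format is with a regular expression that captures the speaker name and the quoted text. This is a sketch under assumptions and may differ from how the script parses: it skips lines that do not match (blank lines, headings) and only recognizes straight double quotes.

```python
import re

# Speaker name, a colon, then the text wrapped in straight double quotes.
LINE_RE = re.compile(r'^\s*([^:"]+?)\s*:\s*"(.*)"\s*$')


def parse_dialog(text):
    """Return a list of (speaker, text) tuples; non-matching lines are skipped."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries


sample = '''Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
'''
for speaker, line in parse_dialog(sample):
    print(speaker, "->", line)
```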

Output

The script generates sequentially numbered WAV files named after the --output-base value (here, "output"):

  • 001-output.wav
  • 002-output.wav
  • etc.

If a dialog line exceeds 300 characters, it is split at sentence boundaries into multiple files, and each part takes the next number in the sequence.
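
The splitting and numbering can be sketched as follows. The script's actual algorithm is not described here, so the greedy sentence packing, the punctuation-based boundary detection, and the helper names split_long_line and output_names are all assumptions. Only the 300-character threshold and the zero-padded file names come from this README.

```python
import re

MAX_CHARS = 300  # threshold stated in this README


def split_long_line(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    if len(text) <= max_chars:
        return [text]
    # Heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_chars is kept whole.
    return chunks


def output_names(lines, base):
    """Yield (filename, chunk) pairs, numbering every chunk in sequence."""
    index = 1
    for text in lines:
        for chunk in split_long_line(text):
            yield f"{index:03d}-{base}.wav", chunk
            index += 1
```

With this sketch, a long line that splits into two parts consumes two consecutive numbers, so later lines shift accordingly.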

Example

Given the sample dialog and config files, run:

python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output

For long dialogs where memory usage is a concern, you can use:

python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output --reinit-each-line

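Either command generates the following files (this listing assumes sample-dialog.md also contains a long line from Denise and a second line from Mark, beyond the three-line excerpt shown earlier):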

  • 001-output.wav - Denise's first line
  • 002-output.wav - Mark's first line
  • 003-output.wav - Mary's line
  • 004-output.wav - First part of Denise's long line
  • 005-output.wav - Second part of Denise's long line
  • 006-output.wav - Mark's second line