
# Chatterbox Dialog Generator

This tool reads a dialog from a Markdown file and generates audio files for it using the Chatterbox TTS system. A YAML configuration file maps each speaker name to an audio sample.

## Features

- Maps speaker names to audio samples via a YAML config file
- Processes markdown dialog files with lines in the format: `Name: "Text"`
- Generates sequentially numbered audio files (e.g., `001-output.wav`, `002-output.wav`)
- Automatically splits long dialog lines (>300 characters) at sentence boundaries
- Provides a summary of generated files

## Requirements

- Python 3.6+
- PyYAML
- torchaudio
- Chatterbox TTS library

## Usage

```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output
```

### Arguments

- `--config`: Path to the YAML config file mapping speaker names to audio samples
- `--dialog`: Path to the markdown dialog file
- `--output-base`: Base name for output files (e.g., "output" for "001-output.wav")
- `--reinit-each-line`: Re-initialize the model after each line to reduce memory usage (useful for long dialogs)

## Config File Format (YAML)

The config file maps speaker names (as they appear in the dialog) to audio sample files:

```yaml
Denise: denise.wav
Mark: mark.wav
Mary: mary.wav
```
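
Once loaded (for example with PyYAML's `yaml.safe_load`), a config like the one above is a plain dictionary from speaker name to sample path. The sketch below shows one way to check up front that every speaker in a dialog has an entry. The helper name `missing_speakers` is hypothetical, and this README does not say whether the script itself performs such a check.

```python
def missing_speakers(speaker_samples, dialog_speakers):
    """Return dialog speakers with no entry in the config, in first-seen order."""
    missing = []
    for name in dialog_speakers:
        if name not in speaker_samples and name not in missing:
            missing.append(name)
    return missing


# The dictionary yaml.safe_load would return for the config shown above.
speaker_samples = {"Denise": "denise.wav", "Mark": "mark.wav", "Mary": "mary.wav"}

# "Bob" is an illustrative speaker that the config does not cover.
print(missing_speakers(speaker_samples, ["Denise", "Mark", "Bob", "Bob"]))
```

Running a check like this before generation avoids discovering an unmapped speaker partway through a long dialog.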

## Dialog File Format (Markdown)

The dialog file should contain lines in the format:

```
Name: "Text"
```

For example:

```
Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
Mary: "Jesus, Mark, can you be any more of an asshole?"
```
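
The `Name: "Text"` format can be recognized with a single regular expression. The following is a minimal sketch, not the script's actual parser: the pattern, the `parse_dialog` name, and the choice to silently skip non-matching lines (headings, blank lines) are all assumptions.

```python
import re

# Assumed pattern: a name, a colon, then the text in straight double quotes.
LINE_RE = re.compile(r'^\s*([^:]+?)\s*:\s*"(.*)"\s*$')


def parse_dialog(markdown_text):
    """Return (speaker, text) pairs for every line matching Name: "Text"."""
    entries = []
    for raw in markdown_text.splitlines():
        match = LINE_RE.match(raw)
        if match:  # lines that do not match the format are skipped
            entries.append((match.group(1), match.group(2)))
    return entries


sample = '''# Scene 1

Denise: "What do you think is wrong with me?"
Mark: "I think you're being overly emotional."
'''

for speaker, text in parse_dialog(sample):
    print(f"{speaker}: {text}")
```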

## Output

The script generates sequentially numbered WAV files:

- `001-output.wav`
- `002-output.wav`
- etc.

A dialog line longer than 300 characters is split at sentence boundaries, and each part is written to its own file that takes the next number in the sequence.
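
The splitting and numbering described above can be sketched as follows. This is an illustration under stated assumptions rather than the script's implementation: a sentence boundary is taken to be whitespace following `.`, `!`, or `?`, sentences are packed greedily, and the function names are hypothetical.

```python
import re

MAX_CHARS = 300  # limit stated in this README


def split_at_sentences(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    if len(text) <= max_chars:
        return [text]
    # Assumed boundary: whitespace that follows '.', '!' or '?'.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_chars is kept whole.
    return chunks


def numbered_names(count, output_base, start=1):
    """Build names such as 004-output.wav for `count` consecutive files."""
    return [f"{i:03d}-{output_base}.wav" for i in range(start, start + count)]


long_line = " ".join(["This is a sentence."] * 20)  # 399 characters
parts = split_at_sentences(long_line)
for name, part in zip(numbered_names(len(parts), "output", start=4), parts):
    print(name, len(part))
```

Because every part consumes a number, the files that follow a split line shift accordingly.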

## Example

Given the sample dialog and config files, run:

```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output
```

For long dialogs where memory usage is a concern, you can use:

```bash
python cbx-dialog-generate.py --config speakers.yaml --dialog sample-dialog.md --output-base output --reinit-each-line
```

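Either command generates the same set of files. For a `sample-dialog.md` that continues beyond the three-line excerpt above with a long line from Denise and a second line from Mark, the output would be:
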
- `001-output.wav` - Denise's first line
- `002-output.wav` - Mark's first line
- `003-output.wav` - Mary's line
- `004-output.wav` - First part of Denise's long line
- `005-output.wav` - Second part of Denise's long line
- `006-output.wav` - Mark's second line