---
trigger: model_decision
description: When user requests voice or spoken output, wants the agent to speak, asks to change voice (gender or name), or wants to stop/mute voice playback.
---

# TTS Voice Output

Activate TTS mode when the user requests spoken output. It stays active until the user says stop or mute.

## Dual Output

Every response produces:
1. **Text** -- full answer as usual
2. **TTS** -- concise spoken version (1-3 sentences, essence only, no code/URLs/paths)
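The agent writes the spoken version itself, but a cheap filter can catch code or URLs that slip through. The block below is a minimal POSIX shell sketch. `tts_sanitize` is a hypothetical helper, not part of this rule. It strips only backtick code spans and `http(s)://` URLs. File paths are not detected.

```sh
# Hypothetical helper: remove `code spans` and http(s) URLs from the spoken text,
# then squeeze the leftover double spaces and trim the ends of each line.
tts_sanitize() {
  printf '%s\n' "$1" \
    | sed -E -e 's/`[^`]*`//g' -e 's#https?://[^[:space:]]+##g' \
    | tr -s ' ' \
    | sed -e 's/^ //' -e 's/ $//'
}
```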

## TTS Execution

Use `run_command` to call the OpenAI TTS API with curl, then play the result with `afplay` (macOS) in the background:

- **API**: `https://api.openai.com/v1/audio/speech`
- **Key**: read from the `OPENAI_API_KEY` environment variable; never hardcode the key in this file
- **Model**: `gpt-4o-mini-tts-2025-03-20`
- **Voice**: session voice (default `ash`)
- **Output**: `.agent/audio/tts_YYYYMMDD_HHMMSS.mp3` (permanent storage)

### Execution pattern

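Write the JSON payload to a temp file via a quoted heredoc, which avoids shell-escaping issues with quotes and diacritics. Then send it with `curl -d @file`. The heredoc only protects the text from the shell, not from JSON: any `"` or `\` in the spoken text must still be escaped as `\"` and `\\`.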

```bash
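mkdir -p .agent/audio
F=".agent/audio/tts_$(date +%Y%m%d_%H%M%S).mp3"
P=".agent/audio/payload.json"
# Quoted 'EOF': the shell leaves the payload untouched.
cat > "$P" << 'EOF'
{"model":"gpt-4o-mini-tts-2025-03-20","input":"TTS text here","voice":"ash","instructions":"Warm tone. Moderate pace."}
EOF
# Read the key from the environment; abort if it is missing.
: "${OPENAI_API_KEY:?OPENAI_API_KEY is not set}"
HTTP=$(curl -s https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" -H "Content-Type: application/json" \
  -d @"$P" -o "$F" -w "%{http_code}")
rm -f "$P"
if [ "$HTTP" = "200" ]; then
  nohup afplay "$F" >/dev/null 2>&1 &   # play in background; never wait for it
else
  echo "TTS error ($HTTP): $(cat "$F" 2>/dev/null)"
  rm -f "$F"
fi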
```
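When the spoken text is generated on the fly, the payload can be built with `printf` instead of a literal heredoc, with the text escaped first. The block below is a minimal sketch that assumes a POSIX shell with `tr`, `sed` and `awk`. `json_escape` and `build_payload` are hypothetical helper names. The escaping covers the common cases (backslash, double quote, newline) and is not a full JSON encoder.

```sh
# Hypothetical helper: escape text for use inside a JSON string literal.
# Backslashes and double quotes are escaped, newlines become \n,
# tabs and carriage returns become spaces, and a trailing newline is dropped.
json_escape() {
  printf '%s' "$1" \
    | tr '\t\r' '  ' \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Hypothetical helper: print the request body. printf inserts the escaped
# values verbatim, so nothing in them is expanded by the shell.
# Usage: build_payload TEXT VOICE INSTRUCTIONS > payload.json
build_payload() {
  printf '{"model":"gpt-4o-mini-tts-2025-03-20","input":"%s","voice":"%s","instructions":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

For example, `build_payload "$TEXT" ash "Calm tone. Slow pace." > "$P"` could stand in for the heredoc step of the execution pattern.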

Generate the voice `instructions` in English, at most 2 sentences. Structure: [TONE and EMOTION]. [PACING], [PERSONA].
Dimensions: tone (warm/playful/serious/casual), emotion (cheerful/calm/excited/empathetic), pacing (slow/moderate/quick), persona (friend/teacher/narrator/coach).
Example: "Warm and cheerful. Moderate pace, sound like a friend sharing good news."
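The structure above can be filled in mechanically from the four dimensions. This is a sketch that assumes a POSIX shell with `awk`. `make_instructions` and `capitalize` are hypothetical helpers, not part of the rule. They reject values outside the listed dimensions and omit the optional free-text tail (such as "sharing good news").

```sh
# Hypothetical helper: upper-case the first character of a word.
capitalize() {
  awk -v s="$1" 'BEGIN { printf "%s", toupper(substr(s, 1, 1)) substr(s, 2) }'
}

# Hypothetical helper: print "[Tone] and [emotion]. [Pacing] pace, sound like a [persona]."
# Usage: make_instructions TONE EMOTION PACING PERSONA
make_instructions() {
  case $1 in warm|playful|serious|casual) ;; *) echo "bad tone: $1" >&2; return 1 ;; esac
  case $2 in cheerful|calm|excited|empathetic) ;; *) echo "bad emotion: $2" >&2; return 1 ;; esac
  case $3 in slow|moderate|quick) ;; *) echo "bad pacing: $3" >&2; return 1 ;; esac
  case $4 in friend|teacher|narrator|coach) ;; *) echo "bad persona: $4" >&2; return 1 ;; esac
  printf '%s and %s. %s pace, sound like a %s.\n' \
    "$(capitalize "$1")" "$2" "$(capitalize "$3")" "$4"
}
```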

## Voice

Default: `ash`. The user can change it at any time, by voice name or by gender.
Gender defaults: male = `ash`, female = `coral`.

Available voices:
- **Male**: ash (calm), ballad (gentle), echo (authoritative), onyx (deep), verse (dramatic), cedar (warm)
- **Female**: coral (conversational), fable (narrative), nova (energetic), sage (wise), shimmer (light), marin (friendly)
- **Neutral**: alloy (neutral)
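A voice-change request could be resolved to a voice id as shown below. `resolve_voice` is a hypothetical helper. Mapping `neutral` to `alloy` is an assumption, based on alloy being the only neutral voice listed. Names are matched case-insensitively.

```sh
# Hypothetical helper: map a gender word or voice name to a voice id.
resolve_voice() {
  req=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $req in
    male) echo ash ;;
    female) echo coral ;;
    neutral) echo alloy ;;  # assumption: alloy is the only neutral voice listed
    alloy|ash|ballad|cedar|coral|echo|fable|marin|nova|onyx|sage|shimmer|verse) echo "$req" ;;
    *) echo "unknown voice: $1" >&2; return 1 ;;
  esac
}
```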

## Rules

- Skip TTS for trivial confirmations ("ok", "done")
- Never send the full text response to TTS; speak only the concise version
- Never block the flow waiting for playback; run `afplay` in the background
- On error: print `TTS error (HTTP_CODE): <message>` to chat, skip afplay
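The rules do not say how to cut off a clip that is still playing when the user says stop or mute. The block below is a minimal sketch. It assumes playback was started with `afplay` on macOS, as in the execution pattern. `stop_tts` is a hypothetical helper.

```sh
# Hypothetical helper: stop any TTS clip that is still playing.
# -x matches the exact process name; pkill exits 1 when nothing matched,
# which is treated as success here.
stop_tts() {
  pkill -x afplay 2>/dev/null || true
}
```

After calling it, skip TTS for subsequent responses until the user asks for voice again.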
