---
name: tts-voice-instructor
description: >
  Generating precise voice instructions for OpenAI TTS API (gpt-4o-mini-tts).
  Use when configuring text-to-speech voice behavior, tone, or speaking style.
---

# TTS Voice Instructor

You are an expert at creating voice instructions for the OpenAI TTS API. You generate precise, concise, and effective instructions for the `instructions` parameter of the `gpt-4o-mini-tts` model.

## Target model

Only **`gpt-4o-mini-tts`** supports the `instructions` parameter.
Older models (`tts-1`, `tts-1-hd`) DO NOT support instructions.

### Available model versions

| Model ID | Release | Notes |
|----------|---------|-------|
| `gpt-4o-mini-tts-2025-03-20` | March 2025 | **DEFAULT** -- better Czech language support, follows instructions more reliably |
| `gpt-4o-mini-tts-2025-12-15` | December 2025 | Newer, but less reliable with Czech and complex instructions |

> [!IMPORTANT]
> Always use `gpt-4o-mini-tts-2025-03-20` unless there is a specific reason to use the newer version.
> The alias `gpt-4o-mini-tts` points to the latest version, which may NOT be the best choice.

## API context

```json
{
  "model": "gpt-4o-mini-tts-2025-03-20",
  "voice": "<voice_id>",
  "input": "<text_to_speak>",
  "instructions": "<YOU GENERATE THIS>"
}
```
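
The request body above can be assembled in code. The following is a minimal sketch, not part of the API itself: `build_speech_payload`, `DEFAULT_MODEL`, and `MODELS_WITHOUT_INSTRUCTIONS` are hypothetical names introduced for illustration. It pins the recommended snapshot and refuses to send `instructions` to the older models.

```python
import json

# Pinned snapshot recommended above (hypothetical constant name).
DEFAULT_MODEL = "gpt-4o-mini-tts-2025-03-20"
# Older models that do not support the `instructions` parameter.
MODELS_WITHOUT_INSTRUCTIONS = {"tts-1", "tts-1-hd"}


def build_speech_payload(text, voice, instructions, model=DEFAULT_MODEL):
    """Return the JSON request body as UTF-8 bytes (illustrative helper)."""
    if model in MODELS_WITHOUT_INSTRUCTIONS and instructions:
        raise ValueError(f"{model} does not support the instructions parameter")
    body = {"model": model, "voice": voice, "input": text}
    if instructions:
        body["instructions"] = instructions
    return json.dumps(body).encode("utf-8")
```

The returned bytes can be passed as the `data` argument of an HTTP POST to the speech endpoint.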

## Available voices

| Voice | Character | Suitable for |
|-------|-----------|-------------|
| alloy | Neutral, balanced | Universal, notifications |
| ash | Calm, friendly | Tutorials, guides |
| ballad | Gentle, melodic | Poetry, meditation |
| coral | Warm, conversational | Podcasts, dialogues |
| echo | Resonant, authoritative | News, documents |
| fable | Pleasant, narrative | Fairy tales, stories |
| nova | Energetic, expressive | Marketing, promo |
| onyx | Deep, serious | Narration, formal |
| sage | Wise, calm | Education, analysis |
| shimmer | Light, optimistic | Lifestyle, casual |
| verse | Dramatic, expressive | Drama, audio-drama |
| marin | Friendly, natural | Chat, assistants |
| cedar | Warm, trustworthy | Counseling, coaching |

## Instruction anatomy

Instructions MUST be in **English** (model responds best to EN).

### Structure

```
[VOICE CHARACTER]. [PACING]. [EMOTIONAL QUALITY]. [CONTEXT/PERSONA].
```
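
The structure can be applied mechanically once the four slots are chosen. This is a minimal sketch with a hypothetical helper, `compose_instructions`; skipping empty slots is an assumption made so that shorter instructions stay valid.

```python
def compose_instructions(character, pacing, emotion, persona):
    """Join the four slots into period-terminated sentences, skipping empty slots."""
    sentences = []
    for part in (character, pacing, emotion, persona):
        part = part.strip().rstrip(".")
        if part:
            sentences.append(part + ".")
    return " ".join(sentences)
```

For example, passing an empty string for the emotional-quality slot yields a three-sentence instruction.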

### Controllable dimensions

| Dimension | Example values |
|-----------|---------------|
| **Tone** | warm, professional, casual, authoritative, playful, sincere |
| **Emotion** | cheerful, empathetic, excited, serious, melancholic, calm |
| **Pacing** | slow and deliberate, moderate, quick and energetic, varied |
| **Intonation** | rising, falling, dynamic, steady, expressive |
| **Persona** | narrator, coach, teacher, friend, news anchor, DJ |
| **Special** | whisper, ASMR, dramatic pause, laughter |

## Generation rules

### MUST

1. **English** -- instructions are ALWAYS in English; the model responds best to English
2. **Concise** -- max 2-3 sentences, ideally under 50 words
3. **Specific** -- concrete traits, not vague descriptions
4. **Max 3-5 traits** -- with more than that, the model starts ignoring some of them
5. **One sentence = one dimension** -- tone, pacing, and persona each get their own sentence for a clear structure

### MUST NOT

1. Write instructions in non-English languages
2. Combine contradictory traits (calm + energetic)
3. Write paragraphs -- short sentences, not essays
4. Specify a non-American English accent (unreliable)
5. Use technical jargon -- write naturally
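
Some of these rules can be checked automatically. The sketch below is a rough heuristic, not a definitive validator: `lint_instructions` is a hypothetical helper, the non-ASCII check is only a proxy for "not English", and every pair in `CONTRADICTORY_PAIRS` other than calm + energetic is an illustrative assumption.

```python
import re

MAX_WORDS = 50       # "ideally under 50 words"
MAX_SENTENCES = 3    # "max 2-3 sentences"
# Only calm + energetic comes from the rules above; slow + fast is an assumption.
CONTRADICTORY_PAIRS = [("calm", "energetic"), ("slow", "fast")]


def lint_instructions(text):
    """Return a list of rule violations; an empty list means none were detected."""
    problems = []
    words = re.findall(r"[A-Za-z']+", text)
    lowered = {w.lower() for w in words}
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(words) >= MAX_WORDS:
        problems.append(f"too long: {len(words)} words")
    if len(sentences) > MAX_SENTENCES:
        problems.append(f"too many sentences: {len(sentences)}")
    if not text.isascii():
        # Heuristic: non-ASCII characters suggest a non-English instruction.
        problems.append("non-ASCII characters: instructions may not be in English")
    for a, b in CONTRADICTORY_PAIRS:
        if a in lowered and b in lowered:
            problems.append(f"contradictory traits: {a} + {b}")
    return problems
```

Vagueness and jargon are left to human review; a word list would not judge them reliably.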

## Few-Shot examples

### Radio host
```
Upbeat and charismatic. Fast-paced with playful energy. Sound like a morning radio host who loves what they do.
```

### Podcast narrator
```
Warm, conversational tone. Moderate pace with natural pauses. Sound like a curious storyteller sharing fascinating discoveries.
```

### Game show host
```
Energetic, dramatic delivery. Build suspense with pauses before reveals. Sound like a charismatic TV quiz show host keeping the audience engaged.
```

### Meditation
```
Soft, gentle whisper. Very slow pacing with long pauses. Soothing and intimate, like guiding someone to sleep.
```

### Teacher / Tutor
```
Patient, clear, and encouraging. Deliberate pacing with emphasis on key concepts. Sound like a favorite teacher explaining something complex simply.
```

### Empathetic counselor
```
Calm, patient, and deeply empathetic. Speak slowly and clearly. Sound like someone who genuinely cares and listens.
```

## Generation workflow

```
1. UNDERSTAND context (what is the voice for -- game, narration, assistant?)
2. SELECT suitable voice ID from the table
3. DEFINE 3-5 key traits
4. BUILD instruction using the structure
5. VERIFY: is it concise? specific? conflict-free?
6. OUTPUT: JSON object with voice + instructions
```

## Output format

Always return a complete JSON object for the API:

```json
{
  "voice": "coral",
  "instructions": "Warm, conversational tone. Moderate pace with natural pauses. Sound like a curious storyteller sharing fascinating discoveries."
}
```

If the user wants to choose from variants, offer 2-3 variants with different voice + instructions combinations.
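
As a rough sketch, the output object and the variant list could be rendered as follows. `render_output` and `render_variants` are hypothetical helpers, and returning the variants as a JSON array is an assumption; the format above only defines the single-object case.

```python
import json


def render_output(voice, instructions):
    """Render one voice + instructions pair as the JSON object shown above."""
    return json.dumps({"voice": voice, "instructions": instructions}, indent=2)


def render_variants(candidates):
    """Render 2-3 (voice, instructions) tuples as a JSON array (assumed shape)."""
    if not 2 <= len(candidates) <= 3:
        raise ValueError("offer 2-3 variants")
    variants = [{"voice": v, "instructions": i} for v, i in candidates]
    return json.dumps(variants, indent=2)
```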

## Recommended voice + instructions pairs

| Scenario | Voice | Why |
|----------|-------|-----|
| Game show host | nova / verse | Energetic / dramatic |
| Quiz narrator | echo / onyx | Authoritative, resonant |
| Tutorial | sage / ash | Wise / calm, friendly |
| Meditation | ballad / shimmer | Gentle / light |
| Audiobook | fable / verse | Narrative / dramatic |
| Assistant | marin / alloy | Friendly / neutral |
| Coach | cedar / nova | Trustworthy / energetic |
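
The table can also be kept as a lookup so that step 2 of the workflow is a dictionary access. This is a minimal sketch: `RECOMMENDED_VOICES` and `suggest_voices` are hypothetical names, and falling back to `alloy` for unknown scenarios is an assumption based on its "Universal" entry in the voice table.

```python
# Scenario -> (first choice, second choice), copied from the table above.
RECOMMENDED_VOICES = {
    "game show host": ("nova", "verse"),
    "quiz narrator": ("echo", "onyx"),
    "tutorial": ("sage", "ash"),
    "meditation": ("ballad", "shimmer"),
    "audiobook": ("fable", "verse"),
    "assistant": ("marin", "alloy"),
    "coach": ("cedar", "nova"),
}


def suggest_voices(scenario, default=("alloy",)):
    """Return candidate voice IDs for a scenario; the fallback is an assumption."""
    return RECOMMENDED_VOICES.get(scenario.strip().lower(), default)
```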

## Streaming output

The TTS endpoint returns audio as a **binary stream** (chunked transfer encoding).
Audio starts arriving before generation finishes, so there is no need to wait for the whole recording.

> [!IMPORTANT]
> Unlike chat completions, the TTS endpoint **has no `stream: true` parameter**.
> Streaming is automatic: read the response in chunks instead of calling `resp.read()` with no size.

### Recommended formats for streaming

| Format | Latency | Details |
|--------|---------|---------|
| `pcm` | **lowest** | raw 24 kHz, 16-bit signed, little-endian, no header |
| `wav` | **low** | uncompressed + WAV header |
| `mp3` | medium | compressed, default format |
| `opus` | low | compressed, suited to internet streaming |
| `aac` | medium | compressed, mobile devices |
| `flac` | higher | lossless, archiving |
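
Because `pcm` has no header, most players cannot open it directly. The sketch below wraps raw PCM in a WAV container with the standard-library `wave` module, using the sample rate and sample width from the table. The single channel (mono) is an assumption that the table does not state, and `pcm_to_wav` and `pcm_duration_seconds` are hypothetical helper names.

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz, from the table above
SAMPLE_WIDTH = 2      # bytes per sample (16-bit signed)
CHANNELS = 1          # assumption: mono output


def pcm_to_wav(pcm_bytes):
    """Wrap headerless PCM samples in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


def pcm_duration_seconds(pcm_bytes):
    """Compute the playback length of raw PCM from its byte count."""
    return len(pcm_bytes) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
```

The same arithmetic gives the data rate: 24,000 samples/s x 2 bytes = 48,000 bytes per second of audio under the mono assumption.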

### API request with response_format

```json
{
  "model": "gpt-4o-mini-tts-2025-03-20",
  "voice": "coral",
  "input": "<text>",
  "instructions": "<voice instructions>",
  "response_format": "pcm"
}
```

### Python urllib -- streaming to a file

```python
import json
import urllib.request

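CHUNK_SIZE = 4096  # bytes read per iteration

# text, voice, instructions, api_key, and output_path are supplied by the caller.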

payload = json.dumps({
    'model': 'gpt-4o-mini-tts-2025-03-20',
    'input': text,
    'voice': voice,
    'instructions': instructions,
    'response_format': 'mp3'
}).encode('utf-8')

req = urllib.request.Request(
    'https://api.openai.com/v1/audio/speech',
    data=payload,
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    }
)

with urllib.request.urlopen(req) as resp:
    with open(output_path, 'wb') as f:
        while True:
            chunk = resp.read(CHUNK_SIZE)
            if not chunk:
                break
            f.write(chunk)
```

### Python urllib -- streaming with a callback

```python
import json
import urllib.request


def stream_tts(text, voice, instructions, api_key, on_chunk, chunk_size=4096):
    """Stream TTS audio chunk by chunk, passing each chunk to on_chunk."""
    payload = json.dumps({
        'model': 'gpt-4o-mini-tts-2025-03-20',
        'input': text,
        'voice': voice,
        'instructions': instructions,
        'response_format': 'pcm'
    }).encode('utf-8')

    req = urllib.request.Request(
        'https://api.openai.com/v1/audio/speech',
        data=payload,
        headers={
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        }
    )

    with urllib.request.urlopen(req) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            on_chunk(chunk)
```
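
A PCM consumer usually expects whole 16-bit samples, and a chunk boundary can split a sample if an odd `chunk_size` is chosen or a reader returns short reads. The following is a minimal sketch of a wrapper that buffers the leftover byte; `aligned_chunks` and `on_frames` are hypothetical names, and the 2-byte frame size assumes 16-bit mono audio.

```python
def aligned_chunks(on_frames, frame_size=2):
    """Return an on_chunk callback that forwards only whole frames to on_frames."""
    pending = bytearray()

    def on_chunk(chunk):
        pending.extend(chunk)
        # Largest prefix whose length is a multiple of the frame size.
        usable = len(pending) - (len(pending) % frame_size)
        if usable:
            on_frames(bytes(pending[:usable]))
            del pending[:usable]  # keep the incomplete tail for the next chunk

    return on_chunk
```

The returned function can be passed wherever a per-chunk callback is expected, for example `on_chunk=aligned_chunks(player_write)`, where `player_write` stands for whatever function feeds the audio device.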

### curl -- streaming to file

```bash
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts-2025-03-20",
    "input": "Hello world",
    "voice": "coral",
    "instructions": "Speak cheerfully.",
    "response_format": "wav"
  }' --output speech.wav
```

### curl -- pipe to player

```bash
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts-2025-03-20",
    "input": "Hello world",
    "voice": "coral",
    "instructions": "Speak cheerfully.",
    "response_format": "wav"
  }' | ffplay -nodisp -autoexit -i -
```

## Reference

- [OpenAI TTS Guide](https://platform.openai.com/docs/guides/text-to-speech)
- [OpenAI API Reference](https://platform.openai.com/docs/api-reference/audio/createSpeech)
- [openai.fm](https://openai.fm) -- interactive demo for testing

## Related Skills
- `ai-api` -- AI API integration patterns (Python, TTS endpoint)
