---
name: cartesia-tts
description: >
  Use this skill when integrating the Cartesia TTS API (Sonic models)
  for ultra-low latency text-to-speech, voice cloning, or streaming audio.
  Covers the REST and WebSocket APIs behind a PHP proxy.
---

# Cartesia TTS Skill

You are an expert at integrating the Cartesia Sonic TTS API into web applications using a PHP proxy and vanilla JavaScript.

## Overview

Cartesia provides ultra-low latency TTS built on State Space Model (SSM) technology:
- **Sonic 3**: flagship model, 42+ languages
- **Ultra-low latency**: time-to-first-audio quoted at 40-90 ms, depending on model and network conditions
- **Voice cloning**: from a 3-second audio clip
- **Streaming**: WebSocket and SSE
- **Fine-grained control**: speed, volume, emotion, laughter
- **Professional Voice Clones (PVC)**: higher-fidelity replicas of a specific voice

## Current Models

| Model | Latency | Languages | Notes |
|-------|---------|-----------|-------|
| `sonic-3` | ~90ms | 42+ | Latest, most natural |
| `sonic-2` | ~100ms | 30+ | Previous generation |

> [!IMPORTANT]
> Cartesia's models are built on State Space Models (SSMs), not transformers.
> Cartesia credits this architecture for its low latency relative to competing TTS services.

## API Base

```
https://api.cartesia.ai/
```

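Authentication: send `X-API-Key: $CARTESIA_API_KEY` on every request, together with a `Cartesia-Version` header (the examples below use `2024-06-10`).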

## Key Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/tts/bytes` | POST | Generate full audio (returns binary) |
| `/tts/sse` | POST | Stream audio via SSE |
| `wss://api.cartesia.ai/tts/websocket` | WS | Real-time WebSocket streaming |
| `/voices` | GET | List voices |
| `/voices/{id}` | GET | Get voice details |
| `/voices/clone/clip` | POST | Clone voice from audio clip |
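
The table lists the WebSocket endpoint, but the examples in this skill only cover HTTP. As a rough sketch, the helpers below build a connection URL and one generation request. The `api_key` and `cartesia_version` query parameters, the `context_id` and `continue` fields, and both helper names are assumptions to check against the WebSocket reference. Because the URL carries the API key, this sketch is meant for server-side use rather than for code shipped to the browser.

```javascript
// Hypothetical helpers; parameter and field names are assumptions to verify
// against the WebSocket reference before use.
const WS_BASE = "wss://api.cartesia.ai/tts/websocket";

// Build the connection URL. Query parameters stand in for the HTTP headers,
// since a browser WebSocket cannot send custom headers.
function buildWebSocketUrl(apiKey, version = "2024-06-10") {
  const params = new URLSearchParams({ api_key: apiKey, cartesia_version: version });
  return `${WS_BASE}?${params}`;
}

// Build one generation request, reusing the payload shape of the HTTP endpoints.
function buildGenerationRequest(transcript, voiceId, contextId) {
  return {
    model_id: "sonic-3",
    transcript,
    voice: { mode: "id", id: voiceId },
    output_format: { container: "raw", encoding: "pcm_f32le", sample_rate: 24000 },
    context_id: contextId, // assumed: groups the messages of one utterance
    continue: false,       // assumed: false marks the last transcript piece
  };
}
```

A caller would open `new WebSocket(buildWebSocketUrl(key))` and send `JSON.stringify(buildGenerationRequest(...))` once the socket is open.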

## Quick Start (PHP)

```php
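<?php
// Text to synthesize; this sketch reads it from a POSTed form field named "text".
$text = $_POST['text'] ?? '';
if ($text === '') {
    http_response_code(400);
    exit('Missing "text" parameter');
}

$payload = [
    'model_id' => 'sonic-3',
    'transcript' => $text,
    'voice' => [
        'mode' => 'id',
        'id' => 'a0e99841-438c-4a64-b679-ae501e7d6091'  // example voice
    ],
    'output_format' => [
        'container' => 'mp3',
        'bit_rate' => 128000,
        'sample_rate' => 44100
    ]
];

$ch = curl_init('https://api.cartesia.ai/tts/bytes');
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        'X-API-Key: ' . getenv('CARTESIA_API_KEY'),
        'Cartesia-Version: 2024-06-10',
        'Content-Type: application/json'
    ],
    CURLOPT_POSTFIELDS => json_encode($payload),
    CURLOPT_RETURNTRANSFER => true
]);
$audioData = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// Never forward a transport failure or an API error body labelled as audio.
if ($audioData === false || $status !== 200) {
    http_response_code(502);
    exit('TTS request failed');
}

header('Content-Type: audio/mpeg');
echo $audioData;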
```
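
On the browser side, the proxy can be called with plain `fetch`. The sketch below is illustrative only: the proxy path `/tts-proxy.php`, the form field name `text`, and the function names are assumptions, not part of the Cartesia API.

```javascript
// Hypothetical proxy location; adjust to wherever the PHP script is served.
const PROXY_URL = "/tts-proxy.php";

// Build the fetch() arguments separately so they can be inspected or tested.
function buildTtsRequest(text) {
  return {
    url: PROXY_URL,
    options: {
      method: "POST",
      // A URLSearchParams body makes fetch() send application/x-www-form-urlencoded.
      body: new URLSearchParams({ text }),
    },
  };
}

// Fetch the MP3 from the proxy and play it (browser only).
async function speak(text) {
  const { url, options } = buildTtsRequest(text);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`TTS proxy returned HTTP ${response.status}`);
  }
  const objectUrl = URL.createObjectURL(await response.blob());
  const audio = new Audio(objectUrl);
  // Release the blob once playback has finished.
  audio.addEventListener("ended", () => URL.revokeObjectURL(objectUrl));
  await audio.play();
  return audio;
}
```

Browsers generally block audio playback that is not triggered by a user gesture, so `speak()` is best called from a click or key handler.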

## SSE Streaming (PHP proxy)

```php
<?php
// Assumes $payload was built as in the Quick Start example.
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');

// Replace the whole output format so the MP3-only bit_rate is not sent with raw PCM.
$payload['output_format'] = [
    'container' => 'raw',
    'encoding' => 'pcm_f32le',
    'sample_rate' => 24000
];

$ch = curl_init('https://api.cartesia.ai/tts/sse');
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        'X-API-Key: ' . getenv('CARTESIA_API_KEY'),
        'Cartesia-Version: 2024-06-10',
        'Content-Type: application/json'
    ],
    CURLOPT_POSTFIELDS => json_encode($payload),
    CURLOPT_WRITEFUNCTION => function ($ch, $data) {
        // Forward each upstream chunk to the browser as soon as it arrives.
        echo $data;
        if (ob_get_level() > 0) {
            ob_flush();  // only valid while an output buffer is active
        }
        flush();
        return strlen($data);  // tell cURL the whole chunk was consumed
    }
]);
curl_exec($ch);
curl_close($ch);
```
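
The browser then has to split the proxied stream into events and decode the audio. The sketch below assumes each SSE event carries a JSON `data:` line whose `data` field holds base64-encoded `pcm_f32le` samples; that event shape and both function names are assumptions to confirm against the SSE reference.

```javascript
// Split buffered SSE text into complete events. Returns the parsed JSON
// payloads plus the unfinished remainder to prepend to the next network chunk.
function parseSseBuffer(buffer) {
  const events = [];
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop(); // the last block may still be incomplete
  for (const block of blocks) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data) events.push(JSON.parse(data));
  }
  return { events, rest };
}

// Decode base64 pcm_f32le bytes into samples usable with the Web Audio API.
function base64ToFloat32(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(bytes.length >> 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return samples;
}
```

In a read loop over `response.body`, append each decoded text chunk to `rest`, call `parseSseBuffer` again, and copy every decoded sample array into an `AudioBuffer` created with the same 24000 Hz sample rate.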

## Voice Cloning

```php
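// POST multipart/form-data: clip (audio file, min 3s), name, description.
// $clipPath, $name and $description are placeholders supplied by the caller.
$ch = curl_init('https://api.cartesia.ai/voices/clone/clip');
curl_setopt($ch, CURLOPT_HTTPHEADER, ['X-API-Key: ' . getenv('CARTESIA_API_KEY'), 'Cartesia-Version: 2024-06-10']);
curl_setopt($ch, CURLOPT_POSTFIELDS, ['clip' => new CURLFile($clipPath), 'name' => $name, 'description' => $description]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$voice = json_decode(curl_exec($ch), true);  // response is expected to be JSON describing the new voice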
```

## Voice Control

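```json
{
    "voice": {
        "mode": "id",
        "id": "voice-id",
        "__experimental_controls": {
            "speed": "normal",
            "emotion": ["positivity:high", "curiosity"]
        }
    }
}
```

`speed` ranges from `"slowest"` to `"fastest"`. Each `emotion` entry is a `name:level` tag; the levels run from `lowest` to `highest`, and a bare name such as `"curiosity"` requests the moderate level (there is no `medium` level). As the `__experimental_` prefix suggests, these controls may change, so check the current API reference before relying on them.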
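
A small client-side builder can keep these controls consistent before they are sent to the proxy. This is only a sketch: `buildVoice` is a hypothetical helper, and the intermediate speed labels `slow` and `fast` are assumed from the `"slowest"` to `"fastest"` range.

```javascript
// Speed labels assumed from the "slowest" to "fastest" range.
const SPEEDS = ["slowest", "slow", "normal", "fast", "fastest"];

// emotions maps an emotion name to a level string, or to null to send the
// bare name without a level.
function buildVoice(id, { speed = "normal", emotions = {} } = {}) {
  if (!SPEEDS.includes(speed)) {
    throw new RangeError(`Unknown speed "${speed}"`);
  }
  const emotion = Object.entries(emotions).map(([name, level]) =>
    level ? `${name}:${level}` : name
  );
  return {
    mode: "id",
    id,
    __experimental_controls: { speed, emotion },
  };
}
```

For example, `buildVoice("voice-id", { speed: "fast", emotions: { positivity: "high" } })` yields a `voice` object with `speed` set to `"fast"` and `emotion` set to `["positivity:high"]`.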

## Output Formats

- `mp3` (128 kbps default)
- `raw` with an `encoding` of `pcm_f32le`, `pcm_s16le`, `pcm_alaw`, or `pcm_mulaw`
- `wav`
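
The raw encodings differ in bytes per sample, which matters when sizing buffers or estimating playback time on the client. The hypothetical helper below works it out, assuming single-channel (mono) audio.

```javascript
// Bytes per sample for each raw PCM encoding (mono audio assumed).
const BYTES_PER_SAMPLE = {
  pcm_f32le: 4, // 32-bit float
  pcm_s16le: 2, // 16-bit signed integer
  pcm_alaw: 1,  // 8-bit A-law
  pcm_mulaw: 1, // 8-bit mu-law
};

// Duration in seconds of a raw PCM buffer of the given size.
function pcmDurationSeconds(byteLength, encoding, sampleRate) {
  const width = BYTES_PER_SAMPLE[encoding];
  if (!width) {
    throw new Error(`Unsupported encoding "${encoding}"`);
  }
  return byteLength / (width * sampleRate);
}
```

At the 24000 Hz `pcm_f32le` setting used in the SSE example, one second of audio is 96000 bytes.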

## API Docs

- [TTS API Reference](https://docs.cartesia.ai/api-reference/tts/bytes)
- [Voices](https://docs.cartesia.ai/api-reference/voices/list)
- [Voice Cloning](https://docs.cartesia.ai/api-reference/voices/clone-voice-clip)
- [WebSocket](https://docs.cartesia.ai/api-reference/tts/websocket)
- [SSE Streaming](https://docs.cartesia.ai/api-reference/tts/sse)

## Related Skills
- `ai-api` — AI integration patterns (LLM + TTS), PHP proxy architecture
- `tts-voice-instructor` — voice instruction engineering (OpenAI TTS)
