---
name: groq-api-dev
description: >
  Use this skill when building applications with Groq's ultra-fast LPU inference,
  using Llama, Mixtral, Gemma or Groq compound models via OpenAI-compatible API.
  Covers chat completions, streaming, tool calling, and speed optimization.
---

# Groq API Development Skill

You are an expert at integrating the Groq API for ultra-fast LLM inference using a PHP proxy and vanilla JavaScript.

## Overview

Groq provides ultra-low latency LLM inference through custom LPU hardware:
- **OpenAI-compatible API** — drop-in replacement for OpenAI endpoints
- **Ultra-fast inference** — claimed 10-100x faster token generation than GPU-based providers; the actual speedup depends on model, prompt length, and load
- **Chat Completion** — text generation with open-weight models
- **Streaming** — real-time SSE output
- **Tool Calling** — function calling (local, remote MCP, built-in)
- **Audio** — transcription and translation (Whisper), text-to-speech
- **Batch** — async batch processing

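For the batch feature listed above, input is typically uploaded as a JSONL file with one request per line. The sketch below only builds that file content. The line shape (`custom_id`, `method`, `url`, `body`) is an assumption that mirrors the OpenAI batch format, and `buildBatchJsonl` is a hypothetical helper; check both against Groq's batch documentation before relying on them.

```php
<?php
// Build one JSONL line per prompt for a batch input file.
// ASSUMPTION: the line shape mirrors the OpenAI batch format; verify with Groq's docs.
function buildBatchJsonl(array $prompts, string $model = 'llama-3.1-8b-instant'): string
{
    $out = '';
    foreach ($prompts as $id => $prompt) {
        $out .= json_encode([
            'custom_id' => 'request-' . $id, // lets you match results to inputs later
            'method' => 'POST',
            'url' => '/v1/chat/completions',
            'body' => [
                'model' => $model,
                'messages' => [['role' => 'user', 'content' => $prompt]],
            ],
        ], JSON_UNESCAPED_SLASHES) . "\n";
    }
    return $out;
}
```

Keying `custom_id` on your own identifiers matters because batch results are not guaranteed to come back in input order.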
## Current Groq Models

- `compound`: Groq's compound AI system, which can call built-in tools (web search, code execution)
- `compound-mini`: faster compound variant
- `llama-4-scout-17b-16e-instruct`: Meta Llama 4, 17B active params, multimodal
- `llama-4-maverick-17b-128e-instruct`: Meta Llama 4, larger MoE, advanced reasoning
- `llama-3.3-70b-versatile`: Meta Llama 3.3, general purpose (production recommended)
- `llama-3.1-8b-instant`: fast, lightweight
- `kimi-k2.5`: Moonshot AI, 1T params MoE (32B active), multimodal, agentic
- `kimi-k2-0905`: Moonshot AI, 256K context, coding, tool calling
- `gpt-oss-20b` / `gpt-oss-120b`: OpenAI open-weight models (listed by Groq as `openai/gpt-oss-20b` and `openai/gpt-oss-120b`)
- `gemma2-9b-it`: Google Gemma 2

> [!WARNING]
> Nvidia acquired Groq's LPU technology ($20B deal, late 2025).
> GroqCloud continues to operate independently for now. Monitor for changes.

> [!IMPORTANT]
> Groq API is OpenAI-compatible — same request/response format.
> Use it as a drop-in replacement by changing base URL and API key.
> Older models (`mixtral-8x7b`, `llama3-groq-*`) are deprecated — use `llama-3.3-70b-versatile`.

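Because only the base URL and API key differ, switching providers can be reduced to a lookup. The following is a minimal sketch: `chatEndpoint`, the provider keys, and the `OPENAI_API_KEY` variable name are illustrative assumptions, not part of either API.

```php
<?php
// Hypothetical helper: resolve the chat completions URL and headers for a provider.
// The request body stays the same; only base URL and key change.
function chatEndpoint(string $provider): array
{
    $providers = [
        'openai' => ['base' => 'https://api.openai.com/v1', 'keyEnv' => 'OPENAI_API_KEY'],
        'groq'   => ['base' => 'https://api.groq.com/openai/v1', 'keyEnv' => 'GROQ_API_KEY'],
    ];
    if (!isset($providers[$provider])) {
        throw new InvalidArgumentException("Unknown provider: $provider");
    }
    $cfg = $providers[$provider];
    return [
        'url' => $cfg['base'] . '/chat/completions',
        'headers' => [
            'Authorization: Bearer ' . getenv($cfg['keyEnv']),
            'Content-Type: application/json',
        ],
    ];
}
```

The returned `url` and `headers` can be passed straight to `curl_init` and `CURLOPT_HTTPHEADER`. Model IDs are provider-specific, so the `model` field still has to change along with the provider.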
## API Base

```
https://api.groq.com/openai/v1/
```

Authentication: `Authorization: Bearer $GROQ_API_KEY`

## Key Endpoints

| Endpoint | Purpose |
|----------|---------|
| `/openai/v1/chat/completions` | Chat completion |
| `/openai/v1/audio/transcriptions` | Speech-to-text |
| `/openai/v1/audio/translations` | Audio translation |
| `/openai/v1/audio/speech` | Text-to-speech (TTS) |
| `/openai/v1/models` | List models |

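Unlike chat completions, the transcription endpoint takes a multipart form upload rather than a JSON body. A minimal sketch follows, assuming `whisper-large-v3` as the model ID and a local audio file. `transcriptionFields` and `transcribe` are hypothetical helper names.

```php
<?php
// Build the multipart form fields for a transcription request.
// ASSUMPTION: `whisper-large-v3` is the model ID; confirm via /openai/v1/models.
function transcriptionFields(string $audioPath, string $model = 'whisper-large-v3'): array
{
    if (!is_file($audioPath)) {
        throw new InvalidArgumentException("Audio file not found: $audioPath");
    }
    return [
        'file' => new CURLFile($audioPath), // cURL streams the file as a form part
        'model' => $model,
        'response_format' => 'json',
    ];
}

// Send the audio file and return the transcribed text ('' if the field is missing).
function transcribe(string $audioPath): string
{
    $ch = curl_init('https://api.groq.com/openai/v1/audio/transcriptions');
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        // No Content-Type header here: passing an array to CURLOPT_POSTFIELDS
        // makes cURL send multipart/form-data with the correct boundary.
        CURLOPT_HTTPHEADER => ['Authorization: Bearer ' . getenv('GROQ_API_KEY')],
        CURLOPT_POSTFIELDS => transcriptionFields($audioPath),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $raw = curl_exec($ch);
    curl_close($ch);
    if ($raw === false) {
        throw new RuntimeException('Transcription request failed');
    }
    $data = json_decode($raw, true);
    return $data['text'] ?? '';
}
```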
## Quick Start (PHP)

```php
<?php
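// Example input; in a real proxy this comes from the validated request body.
$userMessage = 'Explain what an LPU is in one sentence.';

$payload = [
    'model' => 'llama-3.3-70b-versatile',
    'messages' => [
        ['role' => 'system', 'content' => 'You are helpful.'],
        ['role' => 'user', 'content' => $userMessage],
    ],
    'temperature' => 0.7,
];

$ch = curl_init('https://api.groq.com/openai/v1/chat/completions');
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        'Authorization: Bearer ' . getenv('GROQ_API_KEY'),
        'Content-Type: application/json',
    ],
    CURLOPT_POSTFIELDS => json_encode($payload),
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT => 30,
]);

$raw = curl_exec($ch);
if ($raw === false) {
    // Transport-level failure (DNS, TLS, timeout): there is no HTTP response to parse.
    $error = curl_error($ch);
    curl_close($ch);
    exit('cURL error: ' . $error);
}
$status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
curl_close($ch);

$response = json_decode($raw, true);
if ($status !== 200) {
    // API errors arrive as JSON with an `error.message` field.
    exit("Groq API error ($status): " . ($response['error']['message'] ?? $raw));
}
echo $response['choices'][0]['message']['content'] ?? '';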
```
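The same request can be streamed by setting `stream` to `true` and reading the SSE response as it arrives. The sketch below assumes PHP 8+ and the OpenAI-style chunk shape (`choices[0].delta.content`, terminated by `data: [DONE]`). `parseSseChunk` and `streamChat` are hypothetical helper names.

```php
<?php
// Extract text deltas from buffered SSE data. Returns [deltas, remainder], where
// remainder is an incomplete trailing line to prepend to the next network chunk.
function parseSseChunk(string $buffer): array
{
    $deltas = [];
    $lines = explode("\n", $buffer);
    $remainder = array_pop($lines); // the last piece may be a partial line
    foreach ($lines as $line) {
        $line = trim($line);
        if (!str_starts_with($line, 'data:')) {
            continue; // skip blank separators and comment lines
        }
        $data = trim(substr($line, 5));
        if ($data === '[DONE]') {
            break; // end-of-stream marker
        }
        $event = json_decode($data, true);
        $text = $event['choices'][0]['delta']['content'] ?? null;
        if (is_string($text) && $text !== '') {
            $deltas[] = $text;
        }
    }
    return [$deltas, $remainder];
}

// Stream a chat completion and pass each text delta to $onText.
function streamChat(array $payload, callable $onText): void
{
    $payload['stream'] = true;
    $buffer = '';
    $ch = curl_init('https://api.groq.com/openai/v1/chat/completions');
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_HTTPHEADER => [
            'Authorization: Bearer ' . getenv('GROQ_API_KEY'),
            'Content-Type: application/json',
        ],
        CURLOPT_POSTFIELDS => json_encode($payload),
        CURLOPT_WRITEFUNCTION => function ($ch, string $chunk) use (&$buffer, $onText): int {
            [$deltas, $buffer] = parseSseChunk($buffer . $chunk);
            foreach ($deltas as $text) {
                $onText($text);
            }
            return strlen($chunk); // tell cURL the whole chunk was consumed
        },
    ]);
    curl_exec($ch);
    curl_close($ch);
}

// Usage: streamChat($payload, function (string $text) { echo $text; flush(); });
```

Keeping the remainder between calls matters because network chunks do not align with SSE line boundaries, so a JSON event can be split across two callbacks.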

## Tool Calling

Three patterns:
1. **Local** — the model returns `tool_calls` with JSON arguments; your code runs the function and sends the result back in a `tool` message
2. **Remote (MCP)** — Model Context Protocol, tools run externally
3. **Built-in** — Groq-hosted tools (web search, code execution)

```php
// Local tool calling (same as OpenAI format)
$payload['tools'] = [[
    'type' => 'function',
    'function' => [
        'name' => 'get_weather',
        'description' => 'Get weather for a city',
        'parameters' => [
            'type' => 'object',
            'properties' => ['city' => ['type' => 'string']],
            'required' => ['city']
        ]
    ]
]];
```
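When the model decides to use a tool, the assistant message carries a `tool_calls` array instead of text. One way to complete the local pattern is sketched below, assuming the OpenAI-style response shape. `runToolCalls` and the handler map are hypothetical names for illustration.

```php
<?php
// Run each tool call from an assistant message and return the `tool` messages
// to append to the conversation. $handlers maps function names to PHP callables.
function runToolCalls(array $assistantMessage, array $handlers): array
{
    $results = [];
    foreach ($assistantMessage['tool_calls'] ?? [] as $call) {
        $name = $call['function']['name'] ?? '';
        // Arguments arrive as a JSON string, not a decoded object.
        $args = json_decode($call['function']['arguments'] ?? '{}', true);
        if (!isset($handlers[$name]) || !is_array($args)) {
            // Report the failure to the model instead of aborting the loop.
            $output = ['error' => "Cannot run tool: $name"];
        } else {
            $output = $handlers[$name]($args);
        }
        $results[] = [
            'role' => 'tool',
            'tool_call_id' => $call['id'], // ties the result to the originating call
            'content' => json_encode($output),
        ];
    }
    return $results;
}

// Usage (stub handler with made-up data):
// $handlers = ['get_weather' => fn(array $a) => ['city' => $a['city'], 'temp_c' => 21]];
// $message = $response['choices'][0]['message'];
// $payload['messages'] = array_merge($payload['messages'], [$message], runToolCalls($message, $handlers));
```

After appending the assistant message and the tool results, send the payload again so the model can write its final answer from the tool output.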

## When to Use Groq

- **Speed-critical** applications (real-time chat, live responses)
- **Cost-sensitive** — competitive pricing for open-weight models
- **Prototyping** — same API as OpenAI, easy to switch
- **Audio processing** — fast Whisper transcription

## API Docs

Groq does NOT have `llms.txt`. Key docs:

- [Chat Completions](https://console.groq.com/docs/chat)
- [Models](https://console.groq.com/docs/models)
- [Tool Use](https://console.groq.com/docs/tool-use)
- [Streaming](https://console.groq.com/docs/streaming)
- [Audio](https://console.groq.com/docs/speech-to-text)

## Related Skills
- `ai-api` — general AI integration patterns (streaming, agent loops, PHP proxy)
- `system-prompt-master` — prompt engineering
