🎙️ Text-to-Speech API Tutorial
Generate natural-sounding speech in 23 languages, with emotion tags (English Turbo model only) and zero-shot voice cloning.
Models
| Model | Languages | Speed | Credits/30s | Features |
|---|---|---|---|---|
| chatterbox-turbo | English | ~3x realtime | 15 | Emotion tags [laugh] [sigh] |
| chatterbox-multilingual | 23 languages | ~1x realtime | 20 | Hindi, Japanese, Korean, Arabic... |
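If the Speed column is read as a multiple of playback speed (an assumption; the table does not define it), a rough generation-time estimate can help when choosing a polling timeout. The function and dictionary names below are illustrative and not part of the API.

```python
# Speed factors from the Models table, read as "seconds of audio
# produced per second of compute" (an assumed interpretation).
SPEED_FACTOR = {
    "chatterbox-turbo": 3.0,
    "chatterbox-multilingual": 1.0,
}


def estimated_generation_seconds(model: str, audio_seconds: float) -> float:
    """Rough compute time for a clip, excluding queueing and network time."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / SPEED_FACTOR[model]


print(estimated_generation_seconds("chatterbox-turbo", 30))         # 10.0
print(estimated_generation_seconds("chatterbox-multilingual", 30))  # 30.0
```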
Quick Start (cURL)
curl -X POST https://api.pixelapi.dev/v1/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "text=Hello! [laugh] This is amazing." \
  -F "model=chatterbox-turbo" \
  -F "language=en"
# Poll for the result, replacing JOB_ID with the id returned by the request above:
curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.pixelapi.dev/v1/tts/status/JOB_ID"
Python Example
import time

import requests

HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Submit the text; the JSON response carries the job id used for polling.
resp = requests.post(
    "https://api.pixelapi.dev/v1/tts/generate",
    headers=HEADERS,
    data={"text": "Hello! [laugh] Amazing.", "model": "chatterbox-turbo", "language": "en"},
)
resp.raise_for_status()
job = resp.json()

# Poll every 3 seconds until the job reports "completed".
while True:
    url = f"https://api.pixelapi.dev/v1/tts/status/{job['id']}"
    s = requests.get(url, headers=HEADERS).json()
    if s["status"] == "completed":
        print(f"Audio: {s['output_url']}")
        break
    time.sleep(3)
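The polling loop above runs forever if a job never reaches "completed". A bounded variant might look like the sketch below. The helper name `wait_for_job`, its parameters, and the "failed" status value are assumptions for illustration; the tutorial only documents "completed".

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], dict],
                 interval: float = 3.0,
                 timeout: float = 120.0) -> dict:
    """Poll fetch_status() until the job completes or the timeout elapses.

    fetch_status is any zero-argument callable returning the decoded
    status JSON, so the loop can be exercised without network access.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status
        # Assumption: "failed" is a hypothetical terminal state that is
        # not documented in this tutorial.
        if state == "failed":
            raise RuntimeError(f"TTS job failed: {status}")
        # Stop if the next sleep would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job not completed after {timeout} s")
        time.sleep(interval)


def fetch_status_live(job_id: str, api_key: str) -> dict:
    """Fetch one status document from the status endpoint shown above."""
    import requests  # third-party; imported here so wait_for_job has no dependency

    resp = requests.get(
        f"https://api.pixelapi.dev/v1/tts/status/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

A live call would then be `wait_for_job(lambda: fetch_status_live(job["id"], "YOUR_API_KEY"))`, which returns the final status dictionary containing `output_url`.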
Voice Cloning
curl -X POST https://api.pixelapi.dev/v1/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "text=This is my cloned voice." \
  -F "model=chatterbox-multilingual" \
  -F "language=en" \
  -F "voice_ref=@reference_10s.wav"
💡 Use a clear 10-second audio clip with minimal background noise for best cloning results.
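The same upload can be sketched in Python. The form fields mirror the cURL call above; the helper names `wav_duration_seconds` and `submit_clone_job` are illustrative, and the duration check is only a convenience for following the 10-second tip, not something the API is known to require.

```python
import wave

GENERATE_URL = "https://api.pixelapi.dev/v1/tts/generate"


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def submit_clone_job(text: str, ref_path: str, api_key: str,
                     language: str = "en") -> dict:
    """Send text plus a reference clip as multipart form data."""
    import requests  # third-party; imported here so the WAV check works without it

    with open(ref_path, "rb") as ref:
        resp = requests.post(
            GENERATE_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data={
                "text": text,
                "model": "chatterbox-multilingual",
                "language": language,
            },
            files={"voice_ref": ref},  # same field name as -F "voice_ref=@..."
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()
```

Calling `wav_duration_seconds("reference_10s.wav")` before `submit_clone_job(...)` gives a quick check that the clip is close to the recommended length.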
Emotion Tags (Turbo only)
| Tag | Effect |
|---|---|
| [laugh] | Natural laughter |
| [chuckle] | Soft chuckle |
| [gasp] | Surprised gasp |
| [sigh] | Sigh |
| [cough] | Cough |
| [clear throat] | Throat clearing |
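Because the tags are Turbo only, it can be useful to check text on the client before submitting it. The sketch below flags bracketed tags that are not in the table and removes the known ones, for example before sending text to the multilingual model. The function names are illustrative, and exact, case-sensitive matching is an assumption; the tutorial does not say how the API treats unknown or differently cased tags.

```python
import re

# Tags listed in the table above.
SUPPORTED_TAGS = {"laugh", "chuckle", "gasp", "sigh", "cough", "clear throat"}

# Matches any [bracketed] span and captures its contents.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unsupported_tags(text: str) -> list:
    """Return bracketed tags in text that are not in the supported set."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]


def strip_emotion_tags(text: str) -> str:
    """Remove supported emotion tags, leaving other bracketed text intact."""
    def drop_known(match):
        return "" if match.group(1) in SUPPORTED_TAGS else match.group(0)

    cleaned = TAG_PATTERN.sub(drop_known, text)
    # Collapse the double spaces left behind by removed tags.
    return re.sub(r" {2,}", " ", cleaned).strip()


print(unsupported_tags("Hi [laugh] [yawn]"))                  # ['yawn']
print(strip_emotion_tags("Hello! [laugh] This is amazing."))  # Hello! This is amazing.
```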
23 Supported Languages
en, hi, ja, ko, zh, fr, de, es, it, pt, ru, ar, nl, pl, tr, sv, da, fi, el, he, ms, no, sw
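A client can validate the language code and choose a model before making a request. The sketch below encodes the list above and the Models table. Defaulting English to chatterbox-turbo is a heuristic chosen here because it is the faster and cheaper row, not a documented rule, and `pick_model` is a hypothetical helper name.

```python
# Language codes listed above.
SUPPORTED_LANGUAGES = {
    "en", "hi", "ja", "ko", "zh", "fr", "de", "es", "it", "pt", "ru", "ar",
    "nl", "pl", "tr", "sv", "da", "fi", "el", "he", "ms", "no", "sw",
}


def pick_model(language: str, wants_emotion_tags: bool = False) -> str:
    """Choose a model name for a language code, or raise ValueError."""
    lang = language.strip().lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if wants_emotion_tags and lang != "en":
        # Emotion tags are listed for the English-only Turbo model.
        raise ValueError("emotion tags are only available in English")
    # Heuristic: prefer Turbo for English, Multilingual for everything else.
    return "chatterbox-turbo" if lang == "en" else "chatterbox-multilingual"


print(len(SUPPORTED_LANGUAGES))  # 23
print(pick_model("en"))          # chatterbox-turbo
print(pick_model("hi"))          # chatterbox-multilingual
```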
Pricing
| Feature | Credits | Cost | vs ElevenLabs |
|---|---|---|---|
| Turbo (English) | 15/30s | $0.015 | 11x cheaper |
| Multilingual | 20/30s | $0.020 | 8.5x cheaper |
| Voice Clone | +5 | +$0.005 | — |
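The table implies 1 credit = $0.001 (15 credits cost $0.015), which makes a rough cost estimate easy to compute. Two points below are assumptions rather than documented behavior: that audio is billed in whole 30-second blocks rounded up, and that the +5 voice clone surcharge applies once per request. The function names are illustrative.

```python
import math

# Rates from the Pricing table.
CREDITS_PER_30S = {"chatterbox-turbo": 15, "chatterbox-multilingual": 20}
VOICE_CLONE_SURCHARGE = 5
CREDITS_PER_USD = 1000  # derived from 15 credits = $0.015


def estimate_credits(model: str, audio_seconds: float,
                     voice_clone: bool = False) -> int:
    """Estimate credits for one request under the stated assumptions."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    # Assumption: partial 30-second blocks are billed as full blocks.
    blocks = math.ceil(audio_seconds / 30)
    credits = blocks * CREDITS_PER_30S[model]
    # Assumption: the surcharge is charged once per request.
    if voice_clone:
        credits += VOICE_CLONE_SURCHARGE
    return credits


def credits_to_usd(credits: int) -> float:
    """Convert credits to US dollars at the rate implied by the table."""
    return credits / CREDITS_PER_USD


print(estimate_credits("chatterbox-turbo", 30))                           # 15
print(estimate_credits("chatterbox-multilingual", 30, voice_clone=True))  # 25
print(credits_to_usd(25))                                                 # 0.025
```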