🎙️ Text-to-Speech API Tutorial

Generate natural-sounding speech in 23 languages with zero-shot voice cloning. The English-only Turbo model also supports inline emotion tags such as [laugh] and [sigh].

Models

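| Model                   | Languages    | Speed        | Credits/30s | Features                            |
|-------------------------|--------------|--------------|-------------|-------------------------------------|
| chatterbox-turbo        | English      | ~3x realtime | 15          | Emotion tags [laugh] [sigh]         |
| chatterbox-multilingual | 23 languages | ~1x realtime | 20          | Hindi, Japanese, Korean, Arabic...  |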

Quick Start (cURL)

curl -X POST https://api.pixelapi.dev/v1/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "text=Hello! [laugh] This is amazing." \
  -F "model=chatterbox-turbo" \
  -F "language=en"

# Poll for the result (replace JOB_ID with the id returned by the request above):
curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.pixelapi.dev/v1/tts/status/JOB_ID"

Python Example

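import time

import requests

API = "https://api.pixelapi.dev/v1/tts"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Submit the job; the response JSON carries the job id used for polling.
resp = requests.post(f"{API}/generate", headers=HEADERS,
    data={"text": "Hello! [laugh] Amazing.", "model": "chatterbox-turbo", "language": "en"})
resp.raise_for_status()
job = resp.json()

# Poll every 3 seconds; give up after 5 minutes instead of looping forever.
deadline = time.monotonic() + 300
while time.monotonic() < deadline:
    r = requests.get(f"{API}/status/{job['id']}", headers=HEADERS)
    r.raise_for_status()
    s = r.json()
    if s["status"] == "completed":
        print(f"Audio: {s['output_url']}")
        break
    time.sleep(3)
else:
    raise TimeoutError(f"Job {job['id']} did not complete in time")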

Voice Cloning

curl -X POST https://api.pixelapi.dev/v1/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "text=This is my cloned voice." \
  -F "model=chatterbox-multilingual" \
  -F "language=en" \
  -F "voice_ref=@reference_10s.wav"

💡 Use a clear 10-second audio clip with minimal background noise for best cloning results.
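
The same cloning request can be sketched in Python. This is a minimal sketch, assuming the endpoint accepts the multipart fields shown in the cURL command above and returns JSON with an `id`, as in the Python example. The `clone_voice` function and its `post` parameter are hypothetical names introduced here for illustration; `post` exists only so the helper can be exercised without a network call.

```python
import os


def clone_voice(text, ref_path, api_key, language="en", post=None):
    """Submit a voice-cloning job and return the parsed JSON response.

    `post` is a hypothetical injection point for offline testing;
    by default the function uses requests.post.
    """
    if post is None:
        import requests  # imported lazily so the helper can be tested offline
        post = requests.post

    with open(ref_path, "rb") as ref:
        resp = post(
            "https://api.pixelapi.dev/v1/tts/generate",
            headers={"Authorization": f"Bearer {api_key}"},
            data={
                "text": text,
                "model": "chatterbox-multilingual",
                "language": language,
            },
            # Multipart file field, the counterpart of -F "voice_ref=@..." in cURL.
            files={"voice_ref": (os.path.basename(ref_path), ref, "audio/wav")},
        )
    resp.raise_for_status()
    return resp.json()
```

A call such as `clone_voice("This is my cloned voice.", "reference_10s.wav", "YOUR_API_KEY")` should return the job object, which can then be polled at the status endpoint.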

Emotion Tags (Turbo only)

| Tag            | Effect           |
|----------------|------------------|
| [laugh]        | Natural laughter |
| [chuckle]      | Soft chuckle     |
| [gasp]         | Surprised gasp   |
| [sigh]         | Sigh             |
| [cough]        | Cough            |
| [clear throat] | Throat clearing  |
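
Because the tags are plain bracketed text, it can help to check input before submitting it. The sketch below is a client-side check only: the tutorial does not say how the API treats unknown tags, or tags sent to the multilingual model, so the exact-match rule and the helper names `unsupported_tags` and `strip_tags` are assumptions made for illustration.

```python
import re

# Tags listed in the table above.
SUPPORTED_TAGS = {"laugh", "chuckle", "gasp", "sigh", "cough", "clear throat"}

# Matches any [bracketed] run that contains no nested brackets.
_TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def unsupported_tags(text):
    """Return bracketed tags in `text` that are not in the supported list."""
    return [tag for tag in _TAG_RE.findall(text) if tag not in SUPPORTED_TAGS]


def strip_tags(text):
    """Remove every bracketed tag, e.g. before using a model without tag support."""
    without_tags = _TAG_RE.sub("", text)
    # Collapse the double spaces left behind by removed tags.
    return re.sub(r"\s{2,}", " ", without_tags).strip()
```

For example, `unsupported_tags("Hello! [laugh] Wow [giggle]")` flags `giggle`, and `strip_tags("Hello! [laugh] This is amazing.")` yields the sentence with the tag removed.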

23 Supported Languages

en, hi, ja, ko, zh, fr, de, es, it, pt, ru, ar, nl, pl, tr, sv, da, fi, el, he, ms, no, sw
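
One way to use this list is to validate the language code and choose a model before sending a request. This is a rough sketch based only on the Models table: Turbo is English-only and is the model with emotion tags. The `pick_model` helper is a hypothetical name, and defaulting English text to Turbo is a design choice made here (it is the faster and cheaper option in the table), not documented API behavior.

```python
SUPPORTED_LANGUAGES = {
    "en", "hi", "ja", "ko", "zh", "fr", "de", "es", "it", "pt", "ru", "ar",
    "nl", "pl", "tr", "sv", "da", "fi", "el", "he", "ms", "no", "sw",
}


def pick_model(language, emotion_tags=False):
    """Choose a model name for a language code, or raise ValueError."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    if emotion_tags and language != "en":
        # Emotion tags are listed as Turbo only, and Turbo is English only.
        raise ValueError("Emotion tags require chatterbox-turbo, which is English only")
    # Assumption: prefer Turbo for English because it is faster and cheaper.
    return "chatterbox-turbo" if language == "en" else "chatterbox-multilingual"
```

The voice cloning example above sends English text to `chatterbox-multilingual`, so English is not restricted to Turbo; pass the model name explicitly when that is what you want.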

Pricing

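| Feature        | Credits | Cost    | vs ElevenLabs |
|----------------|---------|---------|---------------|
| Turbo (English)| 15/30s  | $0.015  | 11x cheaper   |
| Multilingual   | 20/30s  | $0.020  | 8.5x cheaper  |
| Voice Clone    | +5      | +$0.005 |               |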
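
The table implies a rate of $0.001 per credit (15 credits cost $0.015), which makes rough cost estimates easy. The sketch below rests on two assumptions that the tutorial does not confirm: audio is billed in whole 30-second blocks, rounded up, and the voice clone surcharge is applied once per request. The function names are hypothetical.

```python
import math

# Credits per 30 seconds of audio, from the pricing table.
CREDITS_PER_30S = {"chatterbox-turbo": 15, "chatterbox-multilingual": 20}
VOICE_CLONE_SURCHARGE = 5  # assumed to apply once per request


def estimate_credits(duration_s, model, voice_clone=False):
    """Estimate credits for `duration_s` seconds of generated audio."""
    # Assumption: billing rounds up to whole 30-second blocks, minimum one block.
    blocks = max(1, math.ceil(duration_s / 30))
    credits = blocks * CREDITS_PER_30S[model]
    if voice_clone:
        credits += VOICE_CLONE_SURCHARGE
    return credits


def credits_to_usd(credits):
    """Convert credits to dollars at the rate implied by the table."""
    return credits / 1000
```

Under these assumptions, 45 seconds of multilingual audio with voice cloning comes to two blocks of 20 credits plus the surcharge of 5, so 45 credits.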

Ready to try?

100 free credits on signup.
