The best voice cloning API for developers who need production-quality AI speech at scale. POST any text and get back natural-sounding audio — in 30+ languages — using either a text description prompt or a reference audio recording. Two modes, one endpoint. $0.05 per request for voice design, $0.10 per request for voice cloning. 500 free credits, no credit card required.
Most voice APIs force you to pick from a fixed library of preset voices. PixelAPI gives you two better options:
Describe the voice you want in plain English, for example (warm elderly man, slow pace) Hello. No reference audio needed: the API generates a unique synthetic voice on the fly from your description. Style cues are supported: cheerful, whispering, sad slow, formal, gentle.
Upload a WAV or MP3 reference recording (minimum 5 seconds, 16 kHz+, max 10 MB). The API replicates the speaker's timbre, accent, and rhythm. Control cloning strength via cfg_value (0.5–5.0) and quality via inference_timesteps (4–20).
Sign up, copy your key from the dashboard, and POST your text. The endpoint returns a generation id; poll until status=completed, then download your audio from output_url. Maximum 500 characters of text per request.
```shell
# Voice Design — describe the voice in text
curl -X POST https://api.pixelapi.dev/v1/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "text=Hello, welcome to our store. We have great deals today." \
  -F "language=en" \
  -F "voice_description=warm, professional male narrator"
# Response: {"id": "uuid", "status": "queued", "credits_used": 50.0, ...}

# Poll until completed
curl https://api.pixelapi.dev/v1/tts/status/UUID \
  -H "Authorization: Bearer YOUR_API_KEY"
# Response: {"status": "completed", "output_url": "https://..."}

# Voice Cloning — add a reference recording
curl -X POST https://api.pixelapi.dev/v1/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "text=This script will be read in my cloned voice." \
  -F "language=en" \
  -F "voice_ref=@my_voice_sample.wav"
```
```python
import requests, time

API_KEY = "YOUR_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Voice Design
resp = requests.post(
    "https://api.pixelapi.dev/v1/tts/generate",
    headers=headers,
    data={
        "text": "Hello, welcome to our store. We have great deals today.",
        "language": "en",
        "voice_description": "warm, professional male narrator",
    },
)
job = resp.json()

# Poll for result
while True:
    status = requests.get(
        f"https://api.pixelapi.dev/v1/tts/status/{job['id']}",
        headers=headers,
    ).json()
    if status["status"] == "completed":
        audio_url = status["output_url"]  # download from here
        break
    time.sleep(2)

# Voice Cloning — swap data= for files=
with open("my_voice_sample.wav", "rb") as ref:
    resp = requests.post(
        "https://api.pixelapi.dev/v1/tts/generate",
        headers=headers,
        data={"text": "Clone this script.", "language": "en"},
        files={"voice_ref": ref},
    )
```
```javascript
import FormData from 'form-data';
import fetch from 'node-fetch';

const API_KEY = process.env.PIXELAPI_KEY;

// Voice Design
const form = new FormData();
form.append('text', 'Hello, welcome to our store.');
form.append('language', 'en');
form.append('voice_description', 'warm, professional male narrator');

const res = await fetch('https://api.pixelapi.dev/v1/tts/generate', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${API_KEY}`, ...form.getHeaders() },
  body: form,
});
const job = await res.json();

// Poll for result
let status;
do {
  await new Promise(r => setTimeout(r, 2000));
  status = await fetch(`https://api.pixelapi.dev/v1/tts/status/${job.id}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` },
  }).then(r => r.json());
} while (status.status !== 'completed');
console.log(status.output_url); // download audio from this URL
```
```php
<?php
$apiKey = getenv("PIXELAPI_KEY");

// Voice Design
$ch = curl_init("https://api.pixelapi.dev/v1/tts/generate");
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => ["Authorization: Bearer $apiKey"],
    CURLOPT_POSTFIELDS => [
        "text" => "Hello, welcome to our store.",
        "language" => "en",
        "voice_description" => "warm, professional male narrator",
    ],
    CURLOPT_RETURNTRANSFER => true,
]);
$job = json_decode(curl_exec($ch), true);
curl_close($ch);

// Poll for result
do {
    sleep(2);
    $ch = curl_init("https://api.pixelapi.dev/v1/tts/status/{$job['id']}");
    curl_setopt_array($ch, [
        CURLOPT_HTTPHEADER => ["Authorization: Bearer $apiKey"],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $status = json_decode(curl_exec($ch), true);
    curl_close($ch);
} while ($status["status"] !== "completed");
echo $status["output_url"]; // download audio from this URL
```
```ruby
require 'net/http'
require 'json'

api_key = ENV["PIXELAPI_KEY"]
http = Net::HTTP.new("api.pixelapi.dev", 443)
http.use_ssl = true

# Voice Design
req = Net::HTTP::Post.new("/v1/tts/generate")
req["Authorization"] = "Bearer #{api_key}"
req.set_form([
  ["text", "Hello, welcome to our store."],
  ["language", "en"],
  ["voice_description", "warm, professional male narrator"],
], "multipart/form-data")
job = JSON.parse(http.request(req).body)

# Poll for result
loop do
  sleep 2
  status_req = Net::HTTP::Get.new("/v1/tts/status/#{job['id']}")
  status_req["Authorization"] = "Bearer #{api_key}"
  status = JSON.parse(http.request(status_req).body)
  if status["status"] == "completed"
    puts status["output_url"] # download audio from this URL
    break
  end
end
```
```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"mime/multipart"
	"net/http"
	"os"
	"time"
)

func main() {
	apiKey := os.Getenv("PIXELAPI_KEY")
	client := &http.Client{}

	// Voice Design
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)
	w.WriteField("text", "Hello, welcome to our store.")
	w.WriteField("language", "en")
	w.WriteField("voice_description", "warm, professional male narrator")
	w.Close()

	req, _ := http.NewRequest("POST", "https://api.pixelapi.dev/v1/tts/generate", body)
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", w.FormDataContentType())
	resp, _ := client.Do(req)
	var job map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&job)
	resp.Body.Close()

	// Poll for result
	id := job["id"].(string)
	for {
		time.Sleep(2 * time.Second)
		req, _ = http.NewRequest("GET", "https://api.pixelapi.dev/v1/tts/status/"+id, nil)
		req.Header.Set("Authorization", "Bearer "+apiKey)
		resp, _ = client.Do(req)
		var status map[string]interface{}
		json.NewDecoder(resp.Body).Decode(&status)
		resp.Body.Close()
		if status["status"] == "completed" {
			fmt.Println(status["output_url"]) // download audio from this URL
			break
		}
	}
}
```
PixelAPI's voice cloning API is priced at exactly half the cost of the nearest equivalent tier from major competitors. The table below compares PixelAPI's flat per-request billing against the per-minute and per-character models used by rivals.
| Provider | Free tier | Voice design / TTS | Voice cloning | Languages |
|---|---|---|---|---|
| PixelAPI | 500 credits, no card | $0.05/request | $0.10/request | 30+ |
| ElevenLabs | Limited free tier | ~$0.30/min (Scale) | Included in plan | 30+ |
| OpenAI TTS-1 | Pay-as-you-go | $15.00/1M chars | Not available | ~6 |
| Play.ht | Limited | see play.ht/pricing | see play.ht/pricing | 140+ |
| Murf.ai | Trial only | see murf.ai/pricing | Enterprise | 20+ |
Pricing verified from each rival's public pricing page, March 2026. ElevenLabs per-minute rate sourced from a competitive pricing audit. OpenAI TTS-1 rate from platform.openai.com/docs/pricing. PixelAPI's per-request price is set at exactly half the leading competitor's equivalent rate, per our pricing principle: any higher would not be competitive, any lower would signal low quality.
The output_url field in the completed status response is a signed, time-limited URL pointing directly to your generated audio file. Download it server-side or stream it to end users.
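A minimal, stdlib-only sketch of the server-side download step (the `download_audio` helper name is ours, not part of the API):

```python
import shutil
from urllib.request import urlopen

def download_audio(output_url, dest_path):
    # Signed URLs are time-limited, so fetch soon after status flips
    # to "completed" rather than storing the URL for later use.
    with urlopen(output_url) as resp, open(dest_path, "wb") as f:
        shutil.copyfileobj(resp, f)
    return dest_path
```

In a `requests`-based stack you could stream with `requests.get(output_url, stream=True)` instead; the point is the same, download promptly and serve the file from your own storage.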
Every status response includes credits_used, created_at, and completed_at timestamps — ready to pipe into your billing, logging, or analytics systems.
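For example, per-job cost and latency can be pulled straight off the status payload. A sketch, assuming created_at and completed_at arrive as ISO 8601 strings (adjust the parsing if your responses carry epoch seconds instead):

```python
from datetime import datetime

def job_metrics(status):
    # Field names credits_used / created_at / completed_at come from the
    # status response; the ISO 8601 timestamp format is an assumption.
    created = datetime.fromisoformat(status["created_at"])
    completed = datetime.fromisoformat(status["completed_at"])
    return {
        "credits_used": status["credits_used"],
        "latency_s": (completed - created).total_seconds(),
    }
```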
The same endpoint handles English, Mandarin, Hindi, Spanish, French, German, Japanese, Korean, Arabic, and 20+ more. Set language=auto to detect from the text, or specify a code for explicit control.
For voice cloning jobs, cfg_value (0.5–5.0) controls how closely the output locks to the reference speaker. Lower values add naturalness; higher values tighten the clone. inference_timesteps (4–20) trades generation speed for audio quality.
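A sketch of building a cloning request that range-checks both knobs before sending. The `clone_payload` helper is ours; the parameter names and ranges are the documented ones:

```python
def clone_payload(text, cfg_value=2.0, inference_timesteps=10):
    # Enforce the documented ranges client-side to fail fast
    # instead of burning a request on a 400.
    if not 0.5 <= cfg_value <= 5.0:
        raise ValueError("cfg_value must be in 0.5-5.0")
    if not 4 <= inference_timesteps <= 20:
        raise ValueError("inference_timesteps must be in 4-20")
    return {
        "text": text,
        "language": "en",
        "cfg_value": str(cfg_value),
        "inference_timesteps": str(inference_timesteps),
    }
```

Pass the returned dict as `data=` alongside `files={"voice_ref": ref}` in the `requests.post` call from the Quick Start.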
The Voice Cloning API powers these production workflows. Each link goes to an industry-specific setup guide:
- Narrate audiobooks, articles, and news summaries. Multi-chapter scripts split into 500-char segments.
- Product description voiceovers, promotional audio for Shopify and WooCommerce listings.
- Ad voiceovers, brand voice cloning, multilingual campaign audio at scale.
- Automated property listing audio tours. Clone the agent's voice for consistent branding.
- Short-form audio for Reels, TikTok, and Shorts. Voice design for rapid content production.
- Lookbook narration, product walk-through audio, multilingual retail content.
More use-case guides: all industries →
- Trigger voice generation from any Zap. No-code pipeline for content teams.
- Drag-and-drop TTS module in Make scenarios.
- Auto-generate audio for CMS collection items on publish.
- Product audio on upload via webhook. Accessible storefronts, zero manual work.
- Wix Automations hook to generate voiceovers for new blog posts.
- Server-side audio generation in Next.js API routes. Edge-compatible polling pattern.
Flat per-request pricing vs ElevenLabs' credit tiers. No monthly subscription required. Equivalent voice quality at a fraction of the cost for moderate volumes.
API-first vs Murf's studio-first approach. No UI lock-in — integrate directly into your pipeline. Voice cloning on the free trial, not enterprise-gated.
Simple per-request billing vs Play.ht's character-based plans. No seat limits. REST API identical to PixelAPI's other audio and image endpoints.
OpenAI offers six preset voices, no cloning. PixelAPI adds voice design from text description, voice cloning from reference audio, and 30+ language support in the same endpoint.
Default 60 requests/minute on the free tier, 600 requests/minute on paid tiers. Exceeding the limit returns HTTP 429 with a Retry-After header. Recommended: exponential backoff starting at 2 seconds, doubling on each retry up to 30 seconds maximum.
Additional status codes to handle:
- 402 Insufficient credits — top up your balance at /pricing or use trial credits.
- 400 Text cannot be empty — the text field is required and must not be blank.
- 400 Reference audio must be under 10 MB — compress or trim the reference file and retry.
- 503 Server busy — the request queue is temporarily full; retry after the Retry-After header delay.

```python
# Retry with exponential backoff (Python)
import requests, time

def generate_speech(text, headers, **kwargs):
    delay = 2
    for attempt in range(5):
        resp = requests.post(
            "https://api.pixelapi.dev/v1/tts/generate",
            headers=headers,
            data={"text": text, **kwargs},
        )
        if resp.status_code in (429, 503):
            # Honor Retry-After when present, else back off exponentially.
            time.sleep(int(resp.headers.get("Retry-After", delay)))
            delay = min(delay * 2, 30)
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Max retries exceeded")
```
- Step-by-step guide to uploading a reference recording, tuning cfg_value, and achieving a high-quality clone.
- Segment long manuscripts, batch-generate audio per chapter, and concatenate to a finished audiobook.
- Voice design tips for ad copy, explainer videos, and on-hold messages — no reference audio needed.
POST your text to https://api.pixelapi.dev/v1/tts/generate with your API key and optionally a voice_description prompt. The endpoint returns a generation id; poll GET /v1/tts/status/{id} until status=completed, then download your audio from output_url. See the Quick Start section above for code in six languages.
$0.05 per request for voice design (text prompt to speech) and $0.10 per request for voice cloning (reference audio upload). New accounts get 500 free credits — enough for 10 voice design jobs or 5 voice clone jobs — with no credit card required. Credits never expire.
Voice design uses a text description — voice_description=warm elderly man, gentle pace — to generate a synthetic voice on the fly. No reference audio needed. Voice cloning takes an actual WAV or MP3 recording (minimum 5 seconds, 16 kHz+, max 10 MB) and replicates the speaker's timbre, accent, and rhythm. Voice design costs $0.05/request; voice cloning costs $0.10/request.
30+ languages: English, Mandarin Chinese, Hindi, Spanish, French, German, Japanese, Korean, Russian, Arabic, Portuguese, Italian, Dutch, Polish, Turkish, Vietnamese, Thai, Indonesian, Malay, Bengali, Tamil, Telugu, Marathi, Ukrainian, Swedish, Norwegian, Danish, Finnish, Greek, Hebrew, and Swahili. Use language=auto for automatic detection from the input text.
500 characters per request — roughly 30 seconds of speech at a natural speaking pace (~150 words/minute). For longer content such as audiobooks or podcast scripts, split the text into 500-character segments and send sequential requests, then concatenate the audio files in your application.
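One way to do the split, breaking on whitespace so words stay intact (a sketch; sentence-aware splitting would sound even smoother at segment boundaries):

```python
def chunk_text(text, limit=500):
    # Greedy word-level packing: each chunk holds as many whole words
    # as fit in `limit` characters. A single word longer than `limit`
    # would overflow a chunk, but the 500-char cap makes that unlikely.
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Send each chunk as its own request, then concatenate the resulting audio files in order.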
The output_url field in the completed status response is a signed URL pointing to the generated audio file. Download it directly from your application once status equals completed.
Include the voice_ref field as a file upload (multipart/form-data) containing a WAV or MP3 recording sampled at 16 kHz or higher. The file must be at least 5 seconds long and under 10 MB. Longer, cleaner recordings — 30+ seconds in a quiet room — produce noticeably better clones. Background noise and music in the reference degrade clone quality.
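For WAV references, those limits can be checked locally with the standard library before spending a request (the `validate_reference` helper is ours; MP3 files would need a third-party parser):

```python
import os
import wave

def validate_reference(path, min_seconds=5, min_rate=16000,
                       max_bytes=10 * 1024 * 1024):
    # Checks mirror the documented limits: >= 5 s, >= 16 kHz, under 10 MB.
    if os.path.getsize(path) >= max_bytes:
        raise ValueError("reference must be under 10 MB")
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        seconds = w.getnframes() / rate
    if rate < min_rate:
        raise ValueError(f"sample rate {rate} Hz is below 16 kHz")
    if seconds < min_seconds:
        raise ValueError(f"recording is {seconds:.1f} s; need at least 5 s")
    return seconds
```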
Default 60 requests/minute on the free tier, 600 requests/minute on paid tiers. Exceeding the limit returns HTTP 429 with a Retry-After header. For high-volume batch workloads — audiobook generation, IVR prompt refreshes — contact [email protected] with your expected volume for a custom limit.
Yes. Every new account starts with 500 free credits — no credit card required. That covers 10 voice design requests or 5 voice clone requests. Credits never expire; unused credits roll over as long as the account remains active. You can test on real workloads before paying anything.
Yes. For voice design, embed style cues directly in the text field: (cheerful) Good morning! or (slow, whispering) This is a secret. For voice cloning, cfg_value (0.5–5.0, default 2.0) controls cloning strength: lower values add naturalness, higher values lock tighter to the reference speaker. inference_timesteps (4–20, default 10) trades generation speed for audio quality — use 20 for production audiobooks, 4 for previews.
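A tiny helper for building cue-prefixed text, since the cue counts toward the 500-character limit (the helper is ours; the "(cue) text" convention is from the examples above):

```python
def styled_text(cue, text, limit=500):
    # Prepend a parenthesized style cue, e.g. "(cheerful) Good morning!"
    combined = f"({cue}) {text}"
    if len(combined) > limit:
        raise ValueError("styled text exceeds the 500-character limit")
    return combined
```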