
Lensora Studio API

One photo in, a cinematic 3D-render video out. The pipeline detects candidate objects, swaps the background, builds a full 3D model, and renders an MP4 from any of three camera moves.
Figure: Front turntable frame, rendered from a single Rolleiflex photo. The 3D camera sits on a marble countertop; both lenses, the focusing knob, film advance, and controls strip are preserved from the source photo. The new “marble countertop in a sunlit kitchen” background was prompt-generated, not photoshopped.
Figure: Dolly close frame of the same 3D camera. Auto-orient keeps the hero face square to the camera, and the kitchen background falls into shallow defocus.

How it works

Lensora Studio is a two-call orchestrator that wraps four PixelAPI primitives behind a single guided flow:

  1. Detect — we run object detection on your photo and return up to 8 foreground proposals (bbox, label, category) so the user picks the subject.
  2. Cut + replace background — the chosen subject is segmented out; you can leave it transparent, drop in your own backplate URL, or describe a new scene in plain English.
  3. 3D generation — the cropped subject is reconstructed as a full 360° mesh with production-ready textures (real depth, not a flat slab).
  4. Cinematic render — the mesh is composited over your new background and rendered as a 24 fps MP4 from one of three camera moves: turntable, dolly, or cinematic.

You make two HTTP calls. The first returns object proposals in < 2 s. The second runs the full pipeline in the background and you poll for the result.

Pricing

| Endpoint | Credits | USD | Refund on failure |
|---|---|---|---|
| /v1/studio/init | 5 | $0.005 | n/a (charged on success only) |
| /v1/studio/transform | 75 | $0.075 | Yes, auto-refunded if any stage fails or times out |
| End-to-end | 80 | $0.08 | |

No subscription required. Pay-per-use. Median end-to-end runtime is ~3-4 minutes (object detection ≈ 1 s, the rest is parallel background-replace + 3D + render).
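
For budgeting, the table above works out to $0.001 per credit (5 credits = $0.005). The sketch below turns call counts into an estimated spend. The function name and the flat per-credit rate are assumptions inferred from the table, not part of the API.

```python
from decimal import Decimal

# Credit prices copied from the pricing table.
INIT_CREDITS = 5
TRANSFORM_CREDITS = 75
# Assumption: a flat rate of $0.001 per credit, inferred from 5 credits = $0.005.
USD_PER_CREDIT = Decimal("0.001")


def estimate_cost(init_calls: int, transform_calls: int) -> tuple:
    """Return (credits, usd) for the given number of successful calls.

    Failed transforms are refunded, so count only the ones you expect to succeed.
    """
    credits = init_calls * INIT_CREDITS + transform_calls * TRANSFORM_CREDITS
    return credits, credits * USD_PER_CREDIT


credits, usd = estimate_cost(init_calls=100, transform_calls=100)
print(f"{credits} credits, about ${usd}")
```

Decimal is used rather than float so that small per-call prices add up without rounding drift.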

Endpoints

1. POST /v1/studio/init — upload + detect

Upload a photo. Get back object proposals plus a session_id you'll use for the transform call.

Request (multipart/form-data)

| Field | Type | Required | Description |
|---|---|---|---|
| image | file | required | JPEG / PNG / WebP. Max 20 MB. |
| max_objects | int | optional | Cap the number of returned proposals. Range 2-8, default 6. |
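
A client-side pre-check can catch the obvious rejections before spending an upload. The following is a minimal sketch of the limits listed above. The helper name is hypothetical, the extension test is only a heuristic (the server presumably inspects the actual file contents), and treating "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "Max 20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_BYTES = 20 * 1024 * 1024
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def check_init_inputs(path: str, size_bytes: int, max_objects: int = 6) -> list:
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if Path(path).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("image must be JPEG, PNG, or WebP")
    if size_bytes > MAX_BYTES:
        problems.append("image exceeds 20 MB")
    if not 2 <= max_objects <= 8:
        problems.append("max_objects must be between 2 and 8")
    return problems


print(check_init_inputs("rolleiflex.jpg", 3_500_000))
```

In real use, `size_bytes` would come from `os.path.getsize(path)`.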

Example — cURL

curl -X POST https://api.pixelapi.dev/v1/studio/init \
  -H "Authorization: Bearer $PIXELAPI_KEY" \
  -F "image=@rolleiflex.jpg" \
  -F "max_objects=6"

Response

{
  "session_id": "2a91884c-1bb5-44e3-a6e0-7db0a5a56668",
  "source_url": "https://api.pixelapi.dev/uploads/3f27a1...jpg",
  "objects": [
    {
      "label": "vintage twin-lens reflex camera",
      "category": "product",
      "bbox": [0.18, 0.12, 0.79, 0.93]
    },
    {
      "label": "entire image (no crop)",
      "category": "full_frame",
      "bbox": [0.0, 0.0, 1.0, 1.0]
    }
  ],
  "suggested_actions": [],
  "credits_used": 5,
  "vlm_elapsed_ms": 842
}
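
The bbox values in the response look like fractions of the image size. To draw the proposals in a picker UI, they can be scaled to pixels as sketched below. The coordinate order [x_min, y_min, x_max, y_max] is an assumption, since the response format does not state it, and both helper names are hypothetical.

```python
def bbox_to_pixels(bbox, width, height):
    """Scale a normalized bbox to integer pixel coordinates.

    Assumption: bbox is [x_min, y_min, x_max, y_max] as fractions of the image.
    """
    x0, y0, x1, y1 = bbox
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def first_cropped_object(objects):
    """Index of the first proposal that is not the full-frame fallback, else 0."""
    for i, obj in enumerate(objects):
        if obj.get("category") != "full_frame":
            return i
    return 0


objects = [
    {"label": "vintage twin-lens reflex camera", "category": "product",
     "bbox": [0.18, 0.12, 0.79, 0.93]},
    {"label": "entire image (no crop)", "category": "full_frame",
     "bbox": [0.0, 0.0, 1.0, 1.0]},
]
idx = first_cropped_object(objects)
print(idx, bbox_to_pixels(objects[idx]["bbox"], 1000, 1000))
```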

2. POST /v1/studio/transform — run the pipeline

Pick the object, choose a background, choose a camera. Returns a job_id immediately; the job runs in the background and you poll for status.

Request (application/json)

| Field | Type | Required | Description |
|---|---|---|---|
| session_id | string | required | From /init. Sessions live 24h. |
| object_index | int | optional | Index into the objects array. Default 0. |
| background | object | required | See background modes below. |
| camera_preset | string | optional | turntable · dolly · cinematic. Default turntable. |

Background modes

| background.type | Other fields | What you get |
|---|---|---|
| "remove" | none | Transparent background. The MP4 renders the subject on a clean white plate. |
| "image" | url (public) | Your own backplate dropped in as the scene. |
| "prompt" | prompt (string) | We generate a subject-free scene from your description and use it as the backplate. |

Example — cURL

curl -X POST https://api.pixelapi.dev/v1/studio/transform \
  -H "Authorization: Bearer $PIXELAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "2a91884c-1bb5-44e3-a6e0-7db0a5a56668",
    "object_index": 0,
    "background": {
      "type": "prompt",
      "prompt": "on a marble countertop with soft natural light"
    },
    "camera_preset": "turntable"
  }'
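
The same request body can be assembled in Python with a little validation, so that a bad background.type or camera preset fails locally instead of returning a 400. This is a sketch only: both helper names are hypothetical, and the checks mirror the field tables above rather than any published client library.

```python
from typing import Optional

CAMERA_PRESETS = ("turntable", "dolly", "cinematic")


def make_background(kind: str, value: Optional[str] = None) -> dict:
    """Build the `background` object for one of the three documented modes."""
    if kind == "remove":
        return {"type": "remove"}
    if kind == "image":
        if not value or not value.startswith(("http://", "https://")):
            raise ValueError("image mode needs a public http(s) URL")
        return {"type": "image", "url": value}
    if kind == "prompt":
        if not value or not value.strip():
            raise ValueError("prompt mode needs a non-empty description")
        return {"type": "prompt", "prompt": value.strip()}
    raise ValueError(f"unknown background type: {kind!r}")


def build_transform_payload(session_id: str, background: dict,
                            object_index: int = 0,
                            camera_preset: str = "turntable") -> dict:
    """Assemble the JSON body for POST /v1/studio/transform."""
    if camera_preset not in CAMERA_PRESETS:
        raise ValueError(f"camera_preset must be one of {CAMERA_PRESETS}")
    if object_index < 0:
        raise ValueError("object_index must be 0 or greater")
    return {
        "session_id": session_id,
        "object_index": object_index,
        "background": background,
        "camera_preset": camera_preset,
    }


payload = build_transform_payload(
    "2a91884c-1bb5-44e3-a6e0-7db0a5a56668",
    make_background("prompt", "on a marble countertop with soft natural light"),
)
print(payload)
```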

Response

{
  "job_id": "e76d4dfc-8e0a-4e1b-a5a2-4f9c8a14e2b3",
  "status": "queued",
  "credits_used": 75,
  "estimated_seconds": 240,
  "poll_url": "/v1/studio/result/e76d4dfc-8e0a-4e1b-a5a2-4f9c8a14e2b3"
}
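
Note that poll_url in this response is a path, not a full URL. One way to resolve it against the API base is `urllib.parse.urljoin` from the standard library. The helper name below is hypothetical.

```python
from urllib.parse import urljoin

API = "https://api.pixelapi.dev"


def absolute_poll_url(job: dict, base: str = API) -> str:
    """Resolve the relative poll_url from the transform response against the API base."""
    return urljoin(base, job["poll_url"])


job = {
    "job_id": "e76d4dfc-8e0a-4e1b-a5a2-4f9c8a14e2b3",
    "poll_url": "/v1/studio/result/e76d4dfc-8e0a-4e1b-a5a2-4f9c8a14e2b3",
}
print(absolute_poll_url(job))
```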

3. GET /v1/studio/result/{job_id} — poll status

Poll every 3-5 seconds until status is completed or failed. Credits are refunded automatically on failure.
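
A polling loop is easier to trust with an overall deadline, so that a stuck job does not block the caller forever. The sketch below takes the HTTP call as a `fetch` function, which keeps it independent of any particular client. The 600-second default timeout is an assumption chosen to sit well above the stated 3-4 minute median, not a documented limit.

```python
import time
from typing import Callable


def poll_until_done(fetch: Callable[[], dict],
                    interval: float = 4.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> dict:
    """Call `fetch` until status is completed or failed, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        state = fetch()
        if state.get("status") in ("completed", "failed"):
            return state
        # Give up if the next wait would run past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job still running after {timeout:.0f} s")
        sleep(interval)


# Demo with canned responses instead of real HTTP calls.
canned = iter([{"status": "queued"},
               {"status": "processing", "progress": 60},
               {"status": "completed", "progress": 100}])
final = poll_until_done(lambda: next(canned), sleep=lambda s: None)
print(final["status"])
```

With the requests library, `fetch` could be something like `lambda: requests.get(url, headers=H, timeout=30).json()`.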

In-progress response

{
  "job_id": "e76d4dfc-8e0a-4e1b-a5a2-4f9c8a14e2b3",
  "status": "processing",
  "step": "generating-3d",
  "progress": 60,
  "scene_url": "https://api.pixelapi.dev/outputs/files/edits/.../scene.jpg"
}

The step field walks through these stages: starting → cropping → removing-bg → generating-bg → compositing → generating-3d → rendering-video → done.
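
For a progress display, the step names can be mapped to a position in the sequence. This sketch assumes the stages always arrive in the order listed. Whether some are skipped for certain background modes is not stated, so unknown or missing values are passed through unchanged.

```python
# Stage names from the step field, in the order given above.
STEPS = ["starting", "cropping", "removing-bg", "generating-bg",
         "compositing", "generating-3d", "rendering-video", "done"]


def describe_step(step: str) -> str:
    """Return a label such as 'generating-3d (6/8)'; unknown steps pass through."""
    if step in STEPS:
        return f"{step} ({STEPS.index(step) + 1}/{len(STEPS)})"
    return step


print(describe_step("generating-3d"))
```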

Completed response

{
  "job_id": "e76d4dfc-8e0a-4e1b-a5a2-4f9c8a14e2b3",
  "status": "completed",
  "step": "done",
  "progress": 100,
  "output": {
    "mp4_url": "https://api.pixelapi.dev/dl/3d/.../turntable.mp4",
    "glb_url": "https://api.pixelapi.dev/dl/3d/.../model.glb",
    "scene_url": "https://api.pixelapi.dev/outputs/files/edits/.../scene.jpg",
    "removed_bg_url": "https://api.pixelapi.dev/outputs/files/edits/.../cutout.png"
  }
}

Every job produces four downloadable artifacts:

| Field | What it is | Use case |
|---|---|---|
| mp4_url | The hero deliverable. 24 fps, 4 s, 768×768. | Drop into product pages, ads, social. |
| glb_url | The reconstructed 3D mesh with production-ready textures. | AR previews, Blender/Unity/Unreal/Three.js. |
| scene_url | The composited still: subject + new background. | Static product photo, thumbnails. |
| removed_bg_url | Subject cutout with transparent alpha. | Re-use in your own compositing. |
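
When saving all four artifacts, it helps to derive local file names from the URLs so that files from different jobs do not collide. The sketch below assumes the last path segment of each URL is a usable file name, as in the sample response. The helper name and the job-ID prefix scheme are illustrative.

```python
import posixpath
from urllib.parse import urlparse

ARTIFACT_FIELDS = ("mp4_url", "glb_url", "scene_url", "removed_bg_url")


def local_names(job_id: str, output: dict) -> dict:
    """Map each artifact field present in `output` to a local file name."""
    names = {}
    for field in ARTIFACT_FIELDS:
        url = output.get(field)
        if not url:
            continue  # tolerate a missing artifact rather than failing
        basename = posixpath.basename(urlparse(url).path)
        names[field] = f"{job_id}_{basename}"
    return names


# Hypothetical URLs shaped like the sample response.
output = {
    "mp4_url": "https://api.pixelapi.dev/dl/3d/demo/turntable.mp4",
    "glb_url": "https://api.pixelapi.dev/dl/3d/demo/model.glb",
}
print(local_names("job1", output))
```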

Failed response

{
  "job_id": "e76d4dfc-...",
  "status": "failed",
  "error": "3D generation timeout",
  "progress": 0
}

When status == "failed", the 75 transform credits are automatically refunded to your account before the response is returned. The 5 /init credits are not refunded, because the detection call they paid for had already succeeded.


Camera presets

Each preset is a 4-second, 24 fps render.

Auto-orient is on by default. Before rendering, the model is auto-rotated so its largest face points square to the camera at angle 0. That means dolly never goes edge-on, and turntable never opens on a thin sliver, regardless of how the source photo was framed.
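
To make the idea concrete, here is a toy illustration of "largest face toward the camera" using only the extents of an axis-aligned bounding box. It is not Lensora's actual algorithm, which is not documented here. The function name, the axis convention, and the restriction to 0 or 90 degrees of yaw are all assumptions made for the sketch.

```python
def yaw_for_largest_face(width: float, height: float, depth: float) -> int:
    """Pick the yaw (degrees about the vertical axis) that turns the larger
    vertical face of an axis-aligned bounding box toward the camera.

    Assumption: at yaw 0 the camera sees the width x height face, and at
    yaw 90 it sees the depth x height face.
    """
    front_area = width * height
    side_area = depth * height
    return 0 if front_area >= side_area else 90


# A box that was photographed side-on: deeper than it is wide.
print(yaw_for_largest_face(width=0.5, height=1.0, depth=2.0))
```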


Full Python example

import os, time, requests

API = "https://api.pixelapi.dev"
KEY = os.environ["PIXELAPI_KEY"]
H = {"Authorization": f"Bearer {KEY}"}

# 1. INIT
with open("rolleiflex.jpg", "rb") as f:
    init = requests.post(f"{API}/v1/studio/init",
        headers=H, files={"image": f}).json()

print("session:", init["session_id"])
print("objects:")
for i, o in enumerate(init["objects"]):
    print(f" [{i}] {o['label']} ({o['category']})")

# 2. TRANSFORM
job = requests.post(f"{API}/v1/studio/transform",
    headers={**H, "Content-Type": "application/json"},
    json={
        "session_id": init["session_id"],
        "object_index": 0,
        "background": {
            "type": "prompt",
            "prompt": "on a marble countertop with soft natural light",
        },
        "camera_preset": "cinematic",
    }).json()

print("job:", job["job_id"], "ETA:", job["estimated_seconds"], "s")

# 3. POLL
while True:
    s = requests.get(f"{API}/v1/studio/result/{job['job_id']}",
                     headers=H).json()
    print(f" {s['status']:10} step={s.get('step','-'):20} pct={s.get('progress',0)}%")
    if s["status"] == "completed":
        out = s["output"]
        print("MP4:", out["mp4_url"])
        print("GLB:", out["glb_url"])
        # Download the MP4
        with open("output.mp4", "wb") as f:
            f.write(requests.get(out["mp4_url"]).content)
        break
    if s["status"] == "failed":
        print("FAILED:", s.get("error"))
        break
    time.sleep(4)

Node.js example

import fs from "fs";
import FormData from "form-data";
import axios from "axios";

const API = "https://api.pixelapi.dev";
const H = { Authorization: `Bearer ${process.env.PIXELAPI_KEY}` };

// 1. INIT
const form = new FormData();
form.append("image", fs.createReadStream("rolleiflex.jpg"));
form.append("max_objects", "6");
const init = (await axios.post(`${API}/v1/studio/init`, form, {
  headers: { ...H, ...form.getHeaders() },
})).data;

// 2. TRANSFORM
const job = (await axios.post(`${API}/v1/studio/transform`, {
  session_id: init.session_id,
  object_index: 0,
  background: { type: "prompt", prompt: "on a marble countertop with soft natural light" },
  camera_preset: "cinematic",
}, { headers: { ...H, "Content-Type": "application/json" } })).data;

// 3. POLL
while (true) {
  const s = (await axios.get(`${API}/v1/studio/result/${job.job_id}`, { headers: H })).data;
  console.log(`${s.status} step=${s.step} pct=${s.progress}`);
  if (s.status === "completed") { console.log("MP4:", s.output.mp4_url); break; }
  if (s.status === "failed") { console.log("FAILED:", s.error); break; }
  await new Promise(r => setTimeout(r, 4000));
}

Error codes

| HTTP | Meaning | What to do |
|---|---|---|
| 400 | Invalid image, bad background.type, object_index out of range, or bbox crop too small. | Re-check inputs. |
| 401 | Missing / invalid API key. | Pass Authorization: Bearer <key>. |
| 402 | Insufficient credits (need 5 for init, 75 for transform). | Top up at pixelapi.dev/billing. |
| 404 | Session expired (24h TTL) or job not found. | Restart with a fresh /init call. |
| 503 | Object detection temporarily unavailable. | Retry after a few seconds; no credits are charged. |
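
In client code, the table above reduces to a small dispatch on the HTTP status. The sketch below maps each documented code to a coarse action label. The label strings and the function name are made up for illustration, and any undocumented status falls through to "raise".

```python
def next_action(http_status: int) -> str:
    """Map an HTTP status from the Studio endpoints to a coarse client action."""
    actions = {
        400: "fix_request",        # invalid image, background.type, object_index, or crop
        401: "fix_api_key",        # missing or invalid Authorization header
        402: "top_up_credits",     # fewer than 5 (init) or 75 (transform) credits
        404: "restart_from_init",  # session expired or job not found
        503: "retry_later",        # detection unavailable, no credits charged
    }
    if 200 <= http_status < 300:
        return "ok"
    return actions.get(http_status, "raise")


for code in (200, 404, 503, 500):
    print(code, next_action(code))
```

Only 503 is described as retryable, so a retry loop would wrap just that case and surface everything else to the caller.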

Tips for best results

  1. Single hero subject in the frame. Detection works on multi-object photos but the 3D step gives the strongest output when one subject dominates.
  2. Mid-distance shots beat extreme close-ups. A little of the original scene around the subject gives the 3D step depth cues to work with.
  3. Even lighting helps texture quality. Harsh shadows get baked into the GLB albedo.
  4. Pick the smallest bbox that fully contains the subject. The orchestrator pads the crop generously on its own.
  5. Background prompts work best when they describe a scene, not just a color. “on a marble countertop with soft natural light” beats “white”.
  6. Pick the camera that matches the use: turntable for catalogs, dolly for ads, cinematic for launches.

Known limits

