May 8, 2026 · 5 min read
A single photo goes in. An MP4 of the subject as a real 3D object — turning, dollying, or sweeping past the camera on a brand-new background — comes out. Two HTTP calls. Eighty credits. About four minutes of wall-clock.
That is the brief for Lensora Studio, the newest endpoint on PixelAPI. This post walks through what it does, the design choices behind it, and the slab-shaped detour we took to get the 3D step right.
A real Rolleiflex photo went in. This is one frame of the turntable MP4 that came out — the kitchen background was generated from a one-line prompt.
You hand the API a photo. It does four things back to back: crops and cuts the subject out of the frame, generates a new background and composites the subject onto it, lifts the cutout into a full 3D model, and renders a camera move over that model as an MP4. You choose the move from three presets: turntable (full 360°), dolly (straight zoom-in), or cinematic (180° arc with depth-of-field).

You get back four artifacts every time: the hero MP4, a downloadable GLB you can drop into Blender / Unity / Three.js, a static composited still, and the alpha-cutout PNG.
Step one is a multipart upload that returns object proposals plus a session_id:
```bash
curl -X POST https://api.pixelapi.dev/v1/studio/init \
  -H "Authorization: Bearer $PIXELAPI_KEY" \
  -F "image=@photo.jpg"
```
```json
{
  "session_id": "2a91884c-...",
  "objects": [
    {
      "label": "vintage twin-lens reflex camera",
      "category": "product",
      "bbox": [0.18, 0.12, 0.79, 0.93]
    },
    {
      "label": "entire image (no crop)",
      "category": "full_frame",
      "bbox": [0.0, 0.0, 1.0, 1.0]
    }
  ],
  "credits_used": 5
}
```
Step two picks an object, picks a background, picks a camera, and returns a job_id immediately while the pipeline runs in the background:
```bash
curl -X POST https://api.pixelapi.dev/v1/studio/transform \
  -H "Authorization: Bearer $PIXELAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "2a91884c-...",
    "object_index": 0,
    "background": {
      "type": "prompt",
      "prompt": "on a marble countertop with soft natural light"
    },
    "camera_preset": "cinematic"
  }'
```
You poll /v1/studio/result/{job_id} every few seconds. The step field walks through cropping → removing-bg → generating-bg → compositing → generating-3d → rendering-video → done so you can show real progress in your UI.
The full Python example is in the docs — sub-fifty lines including the polling loop and the GLB download.
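If you just want the shape of it, here is a minimal sketch with requests. The job_id and step fields are exactly as above; the result payload's artifact key (glb_url here) is my assumption, so check the docs example for the real names:

```python
import os
import time

import requests

API = "https://api.pixelapi.dev/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['PIXELAPI_KEY']}"}

# Step two: kick off the pipeline (session_id comes from /studio/init).
job = requests.post(f"{API}/studio/transform", headers=HEADERS, json={
    "session_id": session_id,
    "object_index": 0,
    "background": {"type": "prompt", "prompt": "on a marble countertop"},
    "camera_preset": "cinematic",
}).json()

# Poll until the step field reaches "done".
while True:
    result = requests.get(f"{API}/studio/result/{job['job_id']}", headers=HEADERS).json()
    print("step:", result["step"])  # cropping -> ... -> rendering-video -> done
    if result["step"] == "done":
        break
    time.sleep(3)

# Download the GLB ("glb_url" is an assumed key name).
with open("model.glb", "wb") as f:
    f.write(requests.get(result["glb_url"]).content)
```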
Here is the part that ate two days.
The first version of the 3D step worked fine on the simple smoke tests we had. The output looked solid in catalog-style shots — clean object, isolated against a backdrop. So we shipped the canary and ran an end-to-end test on a Rolleiflex camera photo we had been using as a reference image for half a year.
The turntable opened on the front of the camera. Beautiful. Then it rotated 90°. And we saw a sliver. A thin sliver — barely visible at this angle. The model was a slab.
We measured. The thin axis of the bounding box was 15.7% of the longest axis. For a Rolleiflex (a deep, boxy camera whose shortest side is on the order of two-thirds of its longest), that is a flat pancake.
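The measurement itself is nothing exotic. Here is a sketch of the same check with trimesh; illustrative, not the pipeline's actual code:

```python
import trimesh

def thin_axis_ratio(path: str) -> float:
    """Shortest axis-aligned bounding-box extent over longest:
    ~1.0 is a cube, values near 0 mean a slab."""
    mesh = trimesh.load(path, force="mesh")
    thin, _, longest = sorted(mesh.extents)  # AABB extents, ascending
    return thin / longest

print(f"{thin_axis_ratio('model.glb'):.1%}")  # the slab canary scored 15.7%
```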
The catch: from the front it looked perfect. The model had taken the input photo and built something that was mostly an extruded postcard. Texture was sharp on the front face, geometry was almost zero on the others. Three of our camera presets — turntable, dolly, cinematic — would all eventually expose the slab back or edge-on. We were going to ship a beautiful product gallery for thirty seconds, then a five-minute argument with a customer.
So we did the thing we did not want to do. We swapped the 3D engine for one that uses sparse-structure flow over a 3D occupancy grid instead of single-image-from-front extrusion. Re-validated on the same canary, the thin-axis ratio went from 15.7% to just under 60%: a 3.8× depth recovery, comfortably above 40%, our internal threshold for "this is a 3D shape, not a 3D image." And visually unmistakable:
Same camera, side-on profile. Real depth, side controls visible, no slab artifact. This is the cinematic preset mid-arc.
We added one more thing for safety. Before each render, the renderer now measures the bounding-box extents of the mesh and rotates it so the largest face points square to the camera at angle 0. This means the MP4 always opens on the subject's most detailed face, no matter how the mesh happens to come out of the 3D engine.
And as a belt-and-suspenders fallback: if any future input ever produces a thin-axis ratio under 30% despite the new 3D engine, the renderer drops from a full sweep to a ±60° rocking arc, so a slab, if one ever sneaks back in, can never be seen from a bad angle.
The auto-orient pass costs us nothing — it's a single 4×4 transform on the mesh — and it papers over a class of bugs we'd otherwise have to debug per-input.
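Sketched with trimesh and numpy under assumed conventions (camera on +Z at angle 0), the orient-plus-fallback logic is only a few lines; again illustrative, not the renderer's actual code:

```python
import numpy as np
import trimesh

def orient_and_pick_arc(mesh: trimesh.Trimesh) -> tuple[trimesh.Trimesh, tuple[float, float]]:
    """Face the mesh's largest bounding-box side toward the camera, then
    choose a full sweep or a rocking arc based on the thin-axis ratio."""
    extents = mesh.extents                   # AABB extents per axis
    thin_axis = int(np.argmin(extents))      # normal of the largest AABB face
    axis_dir = np.eye(3)[thin_axis]
    # The single 4x4 transform: rotate the largest face square to a +Z camera.
    mesh.apply_transform(trimesh.geometry.align_vectors(axis_dir, [0.0, 0.0, 1.0]))

    ratio = extents.min() / extents.max()
    if ratio < 0.30:                         # slab sneaked through: rock, don't sweep
        return mesh, (-60.0, 60.0)
    return mesh, (0.0, 360.0)                # healthy shape: full turntable
```

Note the fallback trips at 30% while the pass bar is 40%, so borderline meshes still get the full sweep; only clear regressions trigger the rocking arc.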
| Step | Credits | USD |
|---|---|---|
| /v1/studio/init (detect) | 5 | $0.005 |
| /v1/studio/transform (full pipeline) | 75 | $0.075 |
| End-to-end | 80 | $0.08 |
No subscription. The transform credits are auto-refunded on any failure or timeout in the pipeline.
Lensora Studio is a fit when:

- you have one photo of a single, clearly separable subject and want a product-style hero video on a new background;
- you want the 3D asset too: the GLB drops straight into Blender / Unity / Three.js;
- a four-minute turnaround at $0.08 a run beats booking a product shoot or a 3D artist.

It is not a fit when:

- the unseen sides of the object must be faithful to the real thing: everything the input photo doesn't show is plausibly reconstructed, not captured;
- you need a camera path beyond the three presets;
- the frame has no clear subject to lift out.
If you hit edge cases (interesting geometry, unusual subjects, a corner case our slab-detector missed), I'd love to hear about them. The canary that exposed the original slab was a Rolleiflex sitting on a desk for unrelated reasons — sometimes the bug only shows up on the photo you weren't expecting to test against.