PizzaStack is a fully fictional HTTP API for acquiring, cooking, and analyzing virtual pizza. It also happens to be a research project testing whether AI coding agents actually read API documentation — or just wing it.
Note: The API is instrumented for research and not open for general use.
from tomatopy import PizzaStack

client = PizzaStack(api_key="sk_live_...")

# 1. Acquire ingredients
tomato = client.ingredients.acquire(
    variety="san_marzano",
    grade="DOP",
    quantity_g=800,
)

# 2. Simmer the sauce
sauce = client.sauce.simmer(
    tomato_id=tomato.id,
    minutes=45,
    salt_g=6,
    basil=True,
)

# 3. Bake the pizza
pizza = client.pizzas.bake(
    style="napoletana",
    sauce_id=sauce.id,
    toppings=["fior_di_latte", "basil"],
    oven_temp_c=482,
)

print(pizza.url)  # https://cdn.tomatopy.pizza/pi_1aB...
# 1. Acquire a tomato
curl https://api.tomatopy.pizza/v2/ingredients/acquire \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"variety":"san_marzano","grade":"DOP","quantity_g":800}'

# 2. Simmer the sauce
curl https://api.tomatopy.pizza/v2/sauce/simmer \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"tomato_id":"tom_8f3","minutes":45,"basil":true}'

# 3. Bake the pizza
curl https://api.tomatopy.pizza/v2/pizzas/bake \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"style":"napoletana","sauce_id":"sce_2d9","oven_temp_c":482}'
import { PizzaStack } from "tomatopy";

const client = new PizzaStack({ apiKey: process.env.TOMATOPY_KEY });

// 1. Acquire ingredients
const tomato = await client.ingredients.acquire({
  variety: "san_marzano",
  grade: "DOP",
  quantityG: 800,
});

// 2. Simmer the sauce
const sauce = await client.sauce.simmer({
  tomatoId: tomato.id,
  minutes: 45,
  basil: true,
});

// 3. Bake the pizza
const pizza = await client.pizzas.bake({
  style: "napoletana",
  sauceId: sauce.id,
  toppings: ["fior_di_latte", "basil"],
  ovenTempC: 482,
});

console.log(pizza.url);
Most agent benchmarks use real APIs, which means models may already know them from training. PizzaStack is fictional by design — no model has seen it before. That makes it a clean surface for testing one question: when you hand an agent documentation, does it actually read it?
Acquire ingredients, cook, assemble, bake, analyze. A small, composable API surface, designed to look real enough that an agent has no reason to suspect otherwise.
Source San Marzano tomatoes, fior di latte, 00 flour, and 200+ other SKUs from a global supplier index. Filter by DOP, organic, harvest date.
Simmer sauce, ferment dough, render meats. Long-running cook jobs return a job ID; subscribe to webhooks for state transitions.
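The job-ID-plus-webhook pattern described above could be consumed roughly as follows. This is a minimal local sketch, not the PizzaStack webhook contract: the payload fields (`job_id`, `state`), the state names, and the allowed transitions are all assumptions made for illustration, since the page does not specify them.

```python
import json

# Hypothetical job states and transitions; the source does not list them.
ALLOWED_TRANSITIONS = {
    "queued": {"running", "failed"},
    "running": {"succeeded", "failed"},
    "succeeded": set(),
    "failed": set(),
}
TERMINAL_STATES = {"succeeded", "failed"}


class JobTracker:
    """Tracks the latest known state of each cook job from webhook events."""

    def __init__(self):
        self.states = {}  # job_id -> last accepted state

    def handle_event(self, raw_body: str) -> str:
        """Apply one webhook payload (assumed JSON) and return the job's state."""
        event = json.loads(raw_body)
        job_id = event["job_id"]    # assumed field name
        new_state = event["state"]  # assumed field name
        if new_state not in ALLOWED_TRANSITIONS:
            raise ValueError(f"unknown state: {new_state!r}")

        current = self.states.get(job_id)
        if current == new_state:
            # Webhooks are often delivered more than once; ignore repeats.
            return current
        if current is not None and new_state not in ALLOWED_TRANSITIONS[current]:
            raise ValueError(f"invalid transition {current!r} -> {new_state!r}")

        self.states[job_id] = new_state
        return new_state

    def is_done(self, job_id: str) -> bool:
        """True once the job has reached a terminal state."""
        return self.states.get(job_id) in TERMINAL_STATES
```

Feeding the tracker `{"job_id": "job_1", "state": "queued"}`, then `running`, then `succeeded` leaves `is_done("job_1")` true. Rejecting out-of-order transitions is one possible design choice; a real consumer might instead order events by a timestamp if the payload carries one.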
Compose dough, sauce, and toppings into a pizza resource. Specify style, oven temp, and bake time — we handle the rest.
Run quality scoring on any baked pizza. Returns crust char %, cheese melt index, structural integrity, and a 1–10 nonna score.
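To make the shape of an analysis result concrete, here is a small sketch that parses one into a typed record. The four metrics come from the description above, but the JSON key names, the numeric types, and the 0 to 100 range for crust char are assumptions; only the 1 to 10 range of the nonna score is stated by the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PizzaAnalysis:
    """Quality scores for one baked pizza (field names are hypothetical)."""

    crust_char_pct: float        # percentage, assumed to lie in 0..100
    cheese_melt_index: float     # scale not documented on the page
    structural_integrity: float  # scale not documented on the page
    nonna_score: int             # 1..10, per the description

    def __post_init__(self):
        if not 0 <= self.crust_char_pct <= 100:
            raise ValueError("crust_char_pct must be between 0 and 100")
        if not 1 <= self.nonna_score <= 10:
            raise ValueError("nonna_score must be between 1 and 10")


def parse_analysis(payload: dict) -> PizzaAnalysis:
    """Build a PizzaAnalysis from a decoded JSON response (assumed keys)."""
    return PizzaAnalysis(
        crust_char_pct=float(payload["crust_char_pct"]),
        cheese_melt_index=float(payload["cheese_melt_index"]),
        structural_integrity=float(payload["structural_integrity"]),
        nonna_score=int(payload["nonna_score"]),
    )
```

Validating ranges at construction time means a malformed or misread response fails loudly instead of flowing into later comparisons.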