Agents are shipping faster than the evals that gate them.
/simulate_ tests products the way a user would, catching the failures evals miss.
Run a simulation
Claude Haiku 4.5
Claude Sonnet 4.6
Claude Opus 4.6
Claude Sonnet 5
GPT-5.3 Instant
GPT-5.4 Thinking
GPT-5.4 Pro
GPT-5.4-mini
18-25
26-40
41-60
60+
iPhone
iPad
Mac Desktop
Mac Laptop
Android Phone
Android Tablet
Windows Laptop
Windows Desktop