Copula Lab

Data is the world modeled.

Expert-built benchmarks, high-quality data, and real-world environments for frontier models and vertical agents.

Benchmarks

Coming in three weeks.

View methodology →

VLM Benchmark

3 weeks

A high-difficulty evaluation suite that tests vision-language models on complex, real-world scenarios. It goes beyond standard academic benchmarks to surface where leading VLMs actually fail in professional contexts.


Office Workflow Benchmark

3 weeks

Systematic evaluation of model performance in professional office workflows: multi-step document understanding, cross-format reasoning, and tool use across realistic enterprise scenarios.


Why us

How we're different.

01

Model-native perspective

Both founders built benchmark and evaluation systems at frontier labs. We know the actual bottlenecks from the inside.

02

Global ecosystem reach

We have a proven track record of international open-source launches and revenue growth across domestic and overseas markets. We reach the labs that matter.

03

Scalable delivery

We build standardized data pipelines and delivery packages. Quality scales without a linear increase in headcount, because we designed for it from day one.

Team

Built by people who've been there.

Jiaren Cai (蔡佳人)

Technical Cofounder

Former open-source lead and post-training researcher at MiniMax. Built the vibe benchmark from 0→1. Drove the post-training evolution of the M2 series for coding. Led the international open-source launches of M2, M2.1, and M2.5.

Li Liang (梁丽)

Product Cofounder

Former Agent product lead at MiniMax, managing a team of 20+ people. Achieved 5× revenue growth and 8-figure GMV in Q1 2026. Built MiniMax's internal Agent Benchmark from scratch. Peking University; previously in the Tencent Product Management Program.

We're hiring — founding team

We're looking for a BD Lead with experience commercializing AI infrastructure, and an Expert Ecosystem Lead with reach into academic and professional specialist communities.

View roles →

Why copula

"In statistics, a copula joins separate marginal distributions into a true joint distribution (Sklar's theorem, 1959). Capability benchmarks and real-world deployment are two marginals. Most companies measure each in isolation. We model the dependence between them, including the tails, where models actually fail."

— COPULA LAB

Ready to close the execution gap?

We work directly with frontier model teams. Reach out to discuss data needs, benchmark access, or expert collaboration.

Talk to us