How does real science actually get done? Help us record it.
We are building open datasets of scientific work — from single computational
tasks to entire multi-month research projects — so that AI agents can learn to
understand, assist, and eventually carry out research. Three complementary efforts
are recruiting participants and collaborators right now.
Across UMN: Chemistry · Biology · Medicine · Aerospace · Civil & Environmental · Materials Science · Electrical & Computer Eng. · Robotics · & more
Three time horizons: tasks → sessions → projects · Open data, co-authorship for contributing labs
Now recruiting: graduate students, lab members, and PIs across departments. Refer an interested colleague or contribute your own lab's work.
One question, three scales of data
Each project captures scientific work at a different granularity. Together they span
the full arc of research — what a scientist does in an hour, and what unfolds over half a year.
They share a participant pool and recording infrastructure across departments.
If your lab works with simulators or computational tools, contribute well-specified tasks that become a public benchmark for AI agents on real scientific software.
What we collect: A task description, its expected outcome, and the metrics that decide success, across chemistry, biology, aerospace, civil, and materials science.
Step 1
Scope
A short interview to scope the tasks your lab cares about.
Step 2
Submit
Specify each task — description, expected outcome, evaluation metrics — as a YAML manifest.
Step 3
Benchmark
We run agents against your tasks and post results to a public leaderboard.
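As a rough illustration of what a submitted task might contain once its manifest is parsed, the sketch below checks for the three parts named above: description, expected outcome, and evaluation metrics. The field names, the `TaskManifest` class, the `validate_manifest` helper, and the example task are all hypothetical; the project's actual manifest schema is not specified here.

```python
from dataclasses import dataclass

# Hypothetical field names; the real manifest schema may differ.
REQUIRED_FIELDS = ("description", "expected_outcome", "metrics")


@dataclass
class TaskManifest:
    description: str        # what the agent is asked to do
    expected_outcome: str   # what a correct result looks like
    metrics: list           # how success is measured


def validate_manifest(raw: dict) -> TaskManifest:
    """Check that a parsed manifest carries all three required parts."""
    missing = [name for name in REQUIRED_FIELDS if not raw.get(name)]
    if missing:
        raise ValueError("manifest is missing: " + ", ".join(missing))
    if not isinstance(raw["metrics"], list):
        raise ValueError("metrics must be a list")
    return TaskManifest(
        description=raw["description"],
        expected_outcome=raw["expected_outcome"],
        metrics=list(raw["metrics"]),
    )


# Illustrative task only; every value below is made up for the example.
example = {
    "description": "Run a steady-state flow simulation on the provided mesh.",
    "expected_outcome": "The solver converges and writes a final time step.",
    "metrics": [{"name": "final_residual", "max": 1e-5}],
}

task = validate_manifest(example)
```

A manifest written in YAML would be loaded into a dictionary like `example` before a check of this kind runs; the interview in Step 1 is where a lab would settle which metrics belong in the list.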
Co-authorship: All participants are invited as co-authors.
Produces: SciHarbor, a benchmark evaluating AI agents on real simulators (OpenFOAM, GROMACS, AlphaFold, Ansys, SAP2000, OpenRocket…). 300+ tasks in Phase 1, MIT-licensed, with an open leaderboard.
Related work: Follows the Terminal-Bench style of agentic benchmark and live leaderboard.
Phase 2 extends to physical workflows once robotic arms are installed.
Contribute your own scientific work — a couple of hours of lab recording, a
set of computational tasks, or passive recording of a project you're already running.
We handle setup, scheduling, and equipment.
For graduate students & lab members
Collaborate
Bring your lab into one or more of the projects. PIs help define what "correct"
looks like in their domain and open the door for their students to participate.
For PIs & research groups across departments
Co-authorship and contribution opportunities are available for participating PIs and lab members across all three projects.
This initiative — and the team behind it — is featured in the
Minnesota NLP Lab project portfolio,
alongside our other work on expert workflows in science, law, education, and journalism.