🧠 Cognitive Science × AI

CogBench

Do Language Models Think Like Humans?

A comprehensive research initiative bridging cognitive science and artificial intelligence. We investigate whether large language models exhibit human-like cognitive abilities through empirical studies in memory, reasoning, and comprehension.

Core Research Team

Karin de Langis*

PhD Cand.

UMN

Khanh Chi Le

UG

UMN

Jong Inn Park

Engineer

UMN

Bin Hu

UG

UMN

Dongyeop Kang+

Prof.

UMN

* Project Lead, + Principal Investigator (PI)

Interdisciplinary Collaborators

Püren Öncel

Postdoc

U. Valencia

Andreas Schramm

Prof. Emeritus

Hamline

Andrew Elfenbein

Assoc. Prof.

UMN

Laura K. Allen

Assoc. Prof.

UMN

Mike Mensink

Researcher

UW-Stout

About CogBench

CogBench is a research initiative that bridges cognitive science and artificial intelligence. We conduct empirical studies comparing the cognitive abilities of large language models with human cognition across multiple domains including executive functioning, working memory, temporal reasoning, and narrative comprehension.

Our mission is to understand whether and how LLMs exhibit human-like thinking patterns, and to develop better frameworks for evaluating AI systems through the lens of established cognitive psychology theories.

Publications

Executive Function Paper

Strong Memory, Weak Control

Karin de Langis, Jong Inn Park, Bin Hu, Khanh Chi Le, Andreas Schramm, Michael C. Mensink, Andrew Elfenbein, Dongyeop Kang

EACL 2026 (Oral) Read →
Temporal Reasoning Paper

Temporal Reasoning in Narratives

Karin de Langis, Jong Inn Park, Andreas Schramm, Bin Hu, Khanh Chi Le, Dongyeop Kang

ACL 2025 Read →
Narrative Coherence Paper

Narrative Coherence in LLMs

Karin de Langis, Püren Öncel, Ryan Peters, Andrew Elfenbein, Laura Kristen Allen, Andreas Schramm, Dongyeop Kang

EACL 2026 (Oral) Read →
🤖

Human vs. AI Cognition

Andreas Schramm, Karin de Langis, Jong Inn Park, Anh Thu Tong, Michael Mensink, Bin Hu, Khanh Chi Le, Dongyeop Kang

AAAL 2025 To appear

Research Objectives & Key Findings

🎯 Core Thesis

Despite impressive performance on surface-level tasks, large language models exhibit fundamental cognitive limitations when compared to human cognition. Our research reveals a consistent pattern: LLMs demonstrate disparities between capacity (what they can do) and control (how they apply that capacity). They possess remarkable working memory but struggle with executive control and strategic reasoning. They have internal representations of coherence but fail to articulate it. They over-rely on prototypical patterns rather than constructing robust semantic representations.

💪

Executive Control: The Critical Gap

Finding: LLMs have larger working memory capacity than humans but perform worse on executive control tasks like the Wisconsin Card Sorting Test.

This asymmetry reveals deficits in attentional control and cognitive flexibility—the ability to inhibit automatic responses and adapt to shifting information.

de Langis et al. (EACL 2026)

Temporal Reasoning: Pattern Over Pragmatics

Finding: LLMs over-rely on prototypicality and produce inconsistent judgments about temporal meaning in narratives.

They struggle with causal reasoning derived from grammatical aspect and fail to construct semantic representations in a human-like manner.

de Langis et al. (ACL 2025)

📖

Narrative Coherence: Internal vs. External

Finding: LLMs' internal representations can detect incoherence, but their outputs fail to consistently separate coherent and incoherent narratives.

LLMs are more sensitive to world-knowledge violations than to character-trait violations, revealing a fundamental gap in semantic understanding.

de Langis et al. (EACL 2026)

🔬

Standardized Cognitive Evaluation

Objective: Develop reliable cognitive assessment frameworks adapted from human psychology for LLM evaluation.

We provide an evidence-based, rigorous methodology for assessing LLM cognition using classic tasks from cognitive and behavioral sciences.

See CogBench Framework

Research Highlights

We conduct empirical studies across multiple cognitive domains to understand how LLMs compare to human cognition. Explore our experimental settings below:

Demo Video

View this video for a demo of the data collection web interface. We will provide the login credentials once you fill out our interest form!

Link to Video

🤝 Interested in Joining?

We are always looking for passionate researchers and students interested in cognitive science and AI. If you'd like to collaborate with us, please reach out!

Contact: Karin de Langis (dento019 [AT] umn.edu) or Dongyeop Kang (dongyeop [AT] umn.edu)

Funding & Support

This research is made possible through the generous support of our funding partners and institutions:

Grammarly
Open Philanthropy
3M
University of Minnesota DDF

Additional support from the University of Minnesota's College of Liberal Arts and the Department of Computer Science & Engineering.

BibTeX Citations

Strong Memory, Weak Control

@inproceedings{delangis2026strongmemory,
  title={Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs},
  author={de Langis, Karin and Park, Jong Inn and Hu, Bin and Le, Khanh Chi and Schramm, Andreas and Mensink, Michael C and Elfenbein, Andrew and Kang, Dongyeop},
  booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026}
}

Temporal Reasoning in Narratives

@inproceedings{delangis2025temporal,
  title={How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs},
  author={de Langis, Karin and Park, Jong Inn and Schramm, Andreas and Hu, Bin and Le, Khanh Chi and Kang, Dongyeop},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2025}
}

Narrative Coherence Detection

@inproceedings{delangis2026narrative,
  title={Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives?},
  author={de Langis, Karin and {\"O}ncel, P{\"u}ren and Peters, Ryan and Elfenbein, Andrew and Allen, Laura Kristen and Schramm, Andreas and Kang, Dongyeop},
  booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026}
}

Human vs. AI Cognition

@inproceedings{schramm2025human,
  title={How ``human'' is AI? A comparison of temporal meanings in 8 LLMs and 3 populations},
  author={Schramm, Andreas and de Langis, Karin and Park, Jong Inn and Tong, Anh Thu and Mensink, Michael and Hu, Bin and Le, Khanh Chi and Kang, Dongyeop},
  booktitle={Proceedings of the American Association for Applied Linguistics (AAAL) Conference},
  year={2025}
}