🧠 Cognitive Science × AI

CogBench

Do Language Models Think Like Humans?

A comprehensive research initiative bridging cognitive science and artificial intelligence. We investigate whether large language models exhibit human-like cognitive abilities through empirical studies in memory, reasoning, and comprehension.

Core Research Team

Karin de Langis*

PhD Cand.

UMN

Khanh Chi Le

UG

UMN

Jong Inn Park

Engineer

UMN

Bin Hu

UG

UMN

Dongyeop Kang+

Prof.

UMN

* Project Lead, + Principal Investigator (PI)

Interdisciplinary Collaborators

Püren Öncel

Postdoc

U. Valencia

Andreas Schramm

Prof. Emeritus

Hamline

Andrew Elfenbein

Assoc. Prof.

UMN

Laura K. Allen

Assoc. Prof.

UMN

Mike Mensink

Researcher

UW-Stout

About CogBench

CogBench is a research initiative that bridges cognitive science and artificial intelligence. We conduct empirical studies comparing the cognitive abilities of large language models with human cognition across multiple domains including executive functioning, working memory, temporal reasoning, and narrative comprehension.

Our mission is to understand whether and how LLMs exhibit human-like thinking patterns, and to develop better frameworks for evaluating AI systems through the lens of established cognitive psychology theories.

Publications

Executive Function Paper

Strong Memory, Weak Control

Karin de Langis, Jong Inn Park, Bin Hu, Khanh Chi Le, Andreas Schramm, Michael C. Mensink, Andrew Elfenbein, Dongyeop Kang

EACL 2026 (Oral) Read →
Temporal Reasoning Paper

Temporal Reasoning in Narratives

Karin de Langis, Jong Inn Park, Andreas Schramm, Bin Hu, Khanh Chi Le, Dongyeop Kang

ACL 2025 Read →
Narrative Coherence Paper

Narrative Coherence in LLMs

Karin de Langis, Püren Öncel, Ryan Peters, Andrew Elfenbein, Laura Kristen Allen, Andreas Schramm, Dongyeop Kang

EACL 2026 (Oral) Read →
🤖

Human vs. AI Cognition

Andreas Schramm, Karin de Langis, Jong Inn Park, Anh Thu Tong, Michael Mensink, Bin Hu, Khanh Chi Le, Dongyeop Kang

AAAL 2025 To appear

Research Objectives & Key Findings

🎯 Core Thesis

Despite impressive performance on surface-level tasks, large language models exhibit fundamental cognitive limitations when compared to human cognition. Our research reveals a consistent pattern: LLMs demonstrate disparities between capacity (what they can do) and control (how they apply that capacity). They possess remarkable working memory but struggle with executive control and strategic reasoning. They have internal representations of coherence but fail to articulate it. They over-rely on prototypical patterns rather than constructing robust semantic representations.

💪

Executive Control: The Critical Gap

Finding: LLMs have larger working memory capacity than humans but perform worse on executive control tasks like the Wisconsin Card Sorting Test.

This asymmetry reveals deficits in attentional control and cognitive flexibility—the ability to inhibit automatic responses and adapt to shifting information.

de Langis et al. (EACL 2026)

Temporal Reasoning: Pattern Over Pragmatics

Finding: LLMs over-rely on prototypicality and produce inconsistent judgments about temporal meaning in narratives.

They struggle with causal reasoning derived from grammatical aspect and fail to construct semantic representations in a human-like manner.

de Langis et al. (ACL 2025)

📖

Narrative Coherence: Internal vs. External

Finding: LLMs' internal representations can detect incoherence, but their outputs fail to consistently separate coherent and incoherent narratives.

LLMs are more sensitive to world-knowledge violations than to character-trait violations, revealing a fundamental gap in semantic understanding.

de Langis et al. (EACL 2026)

🔬

Standardized Cognitive Evaluation

Objective: Develop reliable cognitive assessment frameworks adapted from human psychology for LLM evaluation.

We provide an evidence-based, rigorous methodology for assessing LLM cognition using classic tasks from cognitive and behavioral sciences.

See CogBench Framework

Research Highlights

We conduct empirical studies across multiple cognitive domains to understand how LLMs compare to human cognition. Explore our experimental settings below:

Demo Video

View this video for a demo of the data collection web interface. We will provide the login credentials once you fill out our interest form!

Link to Video

🤝 Interested in Joining?

We are always looking for passionate researchers and students interested in cognitive science and AI. If you'd like to collaborate with us, please reach out!

Contact: Karin de Langis (dento019 [AT] umn.edu) or Dongyeop Kang (dongyeop [AT] umn.edu)

Funding & Support

This research is made possible through the generous support of our funding partners and institutions:

Grammarly
Open Philanthropy
3M
University of Minnesota DDF

Additional support from the University of Minnesota's College of Liberal Arts and the Department of Computer Science & Engineering.

BibTeX Citations

Strong Memory, Weak Control

@inproceedings{delangis2026strongmemory,
  title={Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs},
  author={de Langis, Karin and Park, Jong Inn and Hu, Bin and Le, Khanh Chi and Schramm, Andreas and Mensink, Michael C and Elfenbein, Andrew and Kang, Dongyeop},
  booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026}
}

Temporal Reasoning in Narratives

@inproceedings{delangis2025temporal,
  title={How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs},
  author={de Langis, Karin and Park, Jong Inn and Schramm, Andreas and Hu, Bin and Le, Khanh Chi and Kang, Dongyeop},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2025}
}

Narrative Coherence Detection

@inproceedings{delangis2026narrative,
  title={Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives?},
  author={de Langis, Karin and {\"O}ncel, P{\"u}ren and Peters, Ryan and Elfenbein, Andrew and Allen, Laura Kristen and Schramm, Andreas and Kang, Dongyeop},
  booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026}
}

Human vs. AI Cognition

@inproceedings{schramm2025human,
  title={How ``human'' is AI? A comparison of temporal meanings in 8 LLMs and 3 populations},
  author={Schramm, Andreas and de Langis, Karin and Park, Jong Inn and Tong, Anh Thu and Mensink, Michael and Hu, Bin and Le, Khanh Chi and Kang, Dongyeop},
  booktitle={Proceedings of the American Association for Applied Linguistics (AAAL) Conference},
  year={2025}
}