ScholaWrite - Understanding Scholarly Scientific Writing with AI

Are you a graduate student working on an Overleaf draft for a journal or conference submission? Or are you an early-career researcher interested in the writing behaviors of scholars in scientific domains? Our study has the following two objectives:

  • We are interested in studying how scholars in scientific domains write a paper and capturing certain patterns of their writing behaviors.
  • We would like to understand how scholarly writers interact with an AI-powered (ChatGPT) paraphrasing tool and how that interaction influences their writing behavior.

1. Eligibility

Participants must be working on an Overleaf project intended for submission. Good candidates will be scholars who:

  • are enrolled in a graduate school program or have completed an M.S. or PhD;
  • are currently engaged in a research project and are about to start writing a paper;
  • are willing to write their paper entirely in Overleaf with our keystroke tracking Chrome extension installed;
  • are willing to participate in a study lasting for several months; and
  • are English-speaking (e.g., native English speakers or international students who are proficient in English writing).

If a participant is selected, their co-authors will also be asked to participate, subject to their agreement.

2. Study Overview

Participants must install the ‘ScholaWrite’ Chrome extension and will therefore be required to use Chrome when writing in Overleaf.

During this experiment, ScholaWrite collects anonymized keystroke data in Overleaf, i.e., the information produced when a person uses a keyboard: which keys are pressed, when they are pressed, and in what sequence (these sequences make up words, sentences, paragraphs, etc.). The associated timestamps and the ChatGPT-suggested paraphrases will also be stored anonymously.

Every two weeks, participants will be asked to complete a survey about how they feel about their writing progress, their interaction with AI paraphrasing tools, etc.

This work is conducted with the approval of our PI, Prof. Dongyeop Kang at Minnesota NLP.

3. Payment

Each participant will receive a $20 Amazon gift card. Only the participants themselves will receive this payment. When a group of authors is working on a single paper, each of them will receive this payment.

4. Location

This is a remote study. You just need to turn on our ScholaWrite system and work on your Overleaf project.

5. Timeframe

The associated IRB protocol has been approved. You will be invited to an onboarding session, where you will give your consent to participate.

IRB number: 00019901

6. Contact Info

Coordinator: Minhwa Lee

Supervisor: Prof. Dongyeop Kang

7. Onboarding Materials

If you are interested in participating, please fill out this eligibility inquiry form.


Note

We provide the following two forms so that you can participate in our research study without concerns about our data collection procedures, our anonymization process, or how we plan to conduct the study. Please refer to the ‘Consent Form’ for more details on the study procedures, and to the ‘Study Flyer’ for a short summary of our IRB-approved study.

  1. Consent Form: Link
  2. Official Study Flyer: Link

About Storing Data for Future Use

(1) Storage and Access:

The data collected during the course of the study will be stored on a password-protected server owned by our research laboratory. The data will be stored indefinitely for future research purposes. After each participant has finished the publication process for the project they told us they intend to publish, we will release the dataset for public use, except for records whose release would infringe copyright laws.

(2) Data:

The following data elements will be collected, stored, and shared publicly in the future: (1) anonymized raw keystroke data; (2) timestamps associated with the keystroke data; (3) paraphrasing requests; (4) the timestamps and keystrokes associated with those requests; and (5) whether each paraphrase was accepted or rejected.
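
To make these data elements more concrete, here is a minimal sketch of what a single stored record might look like. It is an illustration only: the class names, field names, and the helper function are hypothetical and do not describe the actual ScholaWrite storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeystrokeEvent:
    """One anonymized keystroke (hypothetical field names)."""
    key: str            # which key was pressed
    timestamp_ms: int   # when it was pressed, in milliseconds


@dataclass
class ParaphraseRequest:
    """One request to the paraphrasing tool (hypothetical field names)."""
    timestamp_ms: int                                   # when the request was made
    keystrokes: List[KeystrokeEvent] = field(default_factory=list)
    accepted: Optional[bool] = None                     # None until the writer decides


def typed_text(events: List[KeystrokeEvent]) -> str:
    """Rebuild typed text from single-character keys, ordered by time."""
    ordered = sorted(events, key=lambda e: e.timestamp_ms)
    # Skip non-printing keys such as "Shift", which are longer than one character.
    return "".join(e.key for e in ordered if len(e.key) == 1)


events = [
    KeystrokeEvent("i", 1200),
    KeystrokeEvent("H", 1000),
    KeystrokeEvent("Shift", 900),
]
request = ParaphraseRequest(timestamp_ms=2000, keystrokes=events, accepted=True)
print(typed_text(request.keystrokes))
```

Note that a record of this kind holds only keys, timestamps, and the accept/reject outcome; nothing in the sketch identifies the participant.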

(3) Release/Sharing:

The dataset, including all data elements, will be released with anonymized information about participants, and only after each participant has completed the publication process. Records associated with a publication that is not open-access will not be made public. The dataset will be licensed under Creative Commons Attribution-NonCommercial 4.0 International.