We designed and implemented a Chrome extension that collects keystroke trajectories in real time on the Overleaf platform without disturbing participants' writing process. You can browse the extension code through this link.
To install and run this extension, please do the following:
Start the backend by following the procedure in the Run ScholaWrite System section, and set up the URL in the Chrome extension by completing step 1 in the Setup the Extension section.
The extension only works on Chrome. Open your browser and go to "chrome://extensions/".
On the top right corner of the page, toggle on 'Developer Mode'.
Click on 'Load unpacked' on the top left corner, and you should see a pop-up of the file directory.
Select the folder named 'extension' inside the directory where you downloaded or cloned the repository. Then click the toggle on the right side to enable the extension.
At the top right corner, click on the puzzle icon followed by the pin icon next to the extension. The “S” logo will be shown!
Click on the 'S' icon to register your account. Register a username with your email address and a password of your choice.
After successful registration, you will see a welcome message. Toggle on the "Record writer actions" button. The Chrome extension setup is now complete. Enjoy your writing!
Note: Our Chrome extension works by listening to frontend elements on the page. Because the HTML elements in Overleaf have since been updated, the extension is no longer able to record writer actions.
To make the labeling process easy and smooth, we developed a novel web app to replay the collected keystroke data. The interface offers various modes for visualizing keystroke collections within each Overleaf project: by time, LaTeX file, and author. You can browse the code through this link.
Step-by-step Annotation Procedures:
On the login page, enter an existing username or create a new one. Then click the blue login button.
Choose the title of the project that you would like to annotate, and click the 'Switch' button.
Click the 'By Time' button on the interface and start reading keystrokes.
Click through the first few actions and try to identify which high-level label is occurring (Planning, Implementation, or Revision).
Once you have identified the current high-level label, try to identify where it ends.
Once you know the start and end indices for the current high-level label,
Below is one keystroke data entry from ScholaWrite. You can find the full dataset in the Hugging Face data card.
Figure a
Figure b
Figure c
To examine the generalizability of the models trained on SCHOLAWRITE, we test end-to-end writing tasks. Our dataset is organized as tuples of before text, after text, and the scholarly writing intention (Figure [a]).
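As a rough illustration of the tuple layout described above, the following Python sketch models one example. The class name `WritingStep`, its field names, and the sample strings are hypothetical and chosen for readability; they are not taken from the released dataset schema.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical record type: the field names are illustrative assumptions,
# not the column names of the released dataset.
@dataclass(frozen=True)
class WritingStep:
    before_text: str  # text before the writing action
    after_text: str   # text after the writing action
    intention: str    # scholarly writing intention label


def to_tuple(step: WritingStep) -> Tuple[str, str, str]:
    """Return the (before text, after text, intention) tuple for one step."""
    return (step.before_text, step.after_text, step.intention)


# Made-up example; "Revision" is used only as a placeholder label.
example = WritingStep(
    before_text="We propose a dataset.",
    after_text="We propose a dataset of keystrokes.",
    intention="Revision",
)
print(to_tuple(example))
```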
We use this dataset to train two models: one for classifying writing intentions, and another for implementing writing assistance for that intention. These models are shown in Figure [b].
The finetuned models were compared with GPT-4o and Llama3.1 8B on our novel iterative-writing setup shown in Figure [c].
The goal of this setup is to understand the aptitude of our finetuned writing assistants in a self-writing application. We started the iterative writing, for both our finetuned models and GPT-4o, from a seed text consisting of an abstract, title, and introduction paragraph from computer science research papers. We ran the iterative writing for 100 iterations.
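The iterative setup can be pictured as a simple loop. The sketch below is only an assumption about the control flow (predict an intention, then rewrite the text for that intention, then repeat); the function names `iterative_writing`, `toy_classifier`, and `toy_writer` are hypothetical stand-ins, and the toy functions do not call any real model.

```python
from typing import Callable, List, Tuple


def iterative_writing(
    seed_text: str,
    predict_intention: Callable[[str], str],
    apply_intention: Callable[[str, str], str],
    num_iterations: int = 100,
) -> List[Tuple[str, str]]:
    """Run the assumed classify-then-write loop and record every iteration."""
    history: List[Tuple[str, str]] = []
    text = seed_text
    for _ in range(num_iterations):
        intention = predict_intention(text)      # stand-in for the intention classifier
        text = apply_intention(text, intention)  # stand-in for the writing model
        history.append((intention, text))
    return history


# Toy stand-ins so the sketch runs without any model (purely illustrative).
def toy_classifier(text: str) -> str:
    return "Revision" if len(text.split()) % 2 == 0 else "Implementation"


def toy_writer(text: str, intention: str) -> str:
    return text + " word"


history = iterative_writing("seed text", toy_classifier, toy_writer, num_iterations=3)
for step, (intention, text) in enumerate(history, start=1):
    print(step, intention, text)
```

Keeping the full history, rather than only the final text, mirrors the way the demo lets a reader step through individual iterations.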
We evaluated the iterative writing results of the finetuned model and GPT-4o with human evaluation and automatic evaluation metrics.
The following quantitative analysis metrics were also used to measure the quality of writing outputs:
We visualized the self-iterative writing output of Llama-8B-SW and Llama-8B-Instruct on all four seeds. The demo is a slight modification of the web page used for human evaluation in the paper; it adds model names and a backward navigation button. Red strikethrough text represents deletions, while green text represents additions. You can access our demo page through this link.
Here is the tutorial:
The picture above shows the UI of the demo page. First, click the drop-down menu to select the seed document you would like to read. After selecting it, click the 'Switch' button on the right.
At the bottom of the page, from left to right, are the progress bar, backward button, play button, and forward button. The progress bar shows the relative position of the text you are currently reading. Clicking the backward button displays the text of the previous iteration. Clicking the play button shows the text of the next iteration every second. Clicking the forward button displays the text of the next iteration. You can also use the arrow keys on the keyboard to navigate to the previous or next iteration.
For more granular control, you can jump to a specific iteration by manually entering a number in the top right corner of the UI. The number you enter is the iteration number. When you are done typing, press Enter, and the page will display the writing from the specified iteration.
At the top of each model's output are the writing intention and the model's name.
When you have finished viewing the outputs, click the 'Go Back' button in the upper right corner to return to this project page.
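The red-strikethrough and green highlighting used in the demo can be approximated with a word-level diff between two consecutive iterations. The sketch below uses Python's standard `difflib`; it is an illustrative assumption and not the demo page's actual code, and the helper names `word_diff` and `render_html` are hypothetical.

```python
import difflib
import html
from typing import List, Tuple


def word_diff(before: str, after: str) -> List[Tuple[str, str]]:
    """Label each word as 'keep', 'delete', or 'add' between two texts."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    ops: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.extend(("keep", w) for w in a[i1:i2])
        else:
            # 'replace', 'delete', and 'insert' reduce to removed words
            # followed by added words (either slice may be empty).
            ops.extend(("delete", w) for w in a[i1:i2])
            ops.extend(("add", w) for w in b[j1:j2])
    return ops


def render_html(ops: List[Tuple[str, str]]) -> str:
    """Render deletions as red strikethrough and additions as green text."""
    parts = []
    for kind, word in ops:
        safe = html.escape(word)
        if kind == "delete":
            parts.append(f'<span style="color:red;text-decoration:line-through">{safe}</span>')
        elif kind == "add":
            parts.append(f'<span style="color:green">{safe}</span>')
        else:
            parts.append(safe)
    return " ".join(parts)


ops = word_diff("the cat sat", "the dog sat")
print(render_html(ops))
```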
ScholaWrite 2.0 is launching! Here's what we aim to do:
Collect 50-100 Overleaf projects from multiple academic fields and researchers with different experience levels.
Uncover patterns and gain insights into multi-author collaborative writing and human-AI collaborative writing.
Develop an assistant that can better understand human cognitive writing behavior and support cognitively-aware suggestions without disturbing their thinking process.
If you would like to contribute your writing activities to the new dataset or want to join us in developing ScholaWrite 2.0, don't hesitate to reach out! Please fill out this form and we will reach out to you shortly.
@misc{wang2025scholawritedatasetendtoendscholarly,
title={ScholaWrite: A Dataset of End-to-End Scholarly Writing Process},
author={Linghe Wang and Minhwa Lee and Ross Volkov and Luan Tuyen Chau and Dongyeop Kang},
year={2025},
eprint={2502.02904},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.02904},
}