Module 41: Process Data

In this digital ITEMS module, Dr. Susu Zhang, Dr. Qiwei He, and Sunbeom Kwon go over the structure, analysis methods, and applications of log files from computer-based assessments.

Module Overview

Process data, such as log files from digital assessments, provide detailed records of how examinees interact with assessment tasks. These data offer opportunities to study test-taking behavior, strategy use, and human-machine interaction in ways that final item scores alone cannot capture. This module introduces the structure and characteristics of process data in large-scale digital assessments and presents several approaches for transforming raw action sequences into numerical features that can be used in statistical and psychometric analyses. The module covers both expert-derived and data-driven feature extraction methods, including pattern-based indicators, n-grams, multidimensional scaling, and sequence autoencoders. A hands-on section demonstrates data wrangling and feature extraction in R using the PISA 2012 Climate Control item and the ProcData package. The final section presents case studies showing how process-derived features can be used to study test accommodations, improve measurement precision, and reduce and interpret differential item functioning. By the end of the module, learners should have a practical introduction to process data and a foundation for incorporating process-derived information into educational measurement research.

Susu Zhang, Ph.D.

Dr. Susu Zhang is an Associate Professor of Psychology and Statistics at the University of Illinois Urbana-Champaign. Her work develops methods to incorporate process data into psychometric and statistical models to improve educational measurement. She has worked with process data from PISA, PIAAC, and NAEP and has led projects funded by IES and AERA-NSF analyzing the NAEP process data. She is a co-author of the ProcData R package and has co-hosted multiple short courses on statistical learning for process data.

Qiwei He, Ph.D.

Dr. Qiwei He is a Provost’s Distinguished Associate Professor in the Data Science and Analytics Program and the Founder and Director of the AI Measurement and Data Science Lab at Georgetown University. Her research advances psychometric and data science methodologies for multimodal process data, integrating sequence mining, text mining, psychometric modeling, and machine learning in educational and psychological assessment. She is particularly engaged in national and international large-scale assessments, developing analytic frameworks that illuminate human behavior, learning processes, and measurement validity.

Sunbeom Kwon

Sunbeom Kwon is a Ph.D. candidate in Quantitative Psychology at the University of Illinois Urbana-Champaign, where he also earned an M.S. in Applied Statistics. His research focuses on psychometrics and data science, and his current work includes process data analysis, copula-based latent variable models, AI fairness evaluation, and AI-assisted psychometric models. He completed the Ida Lawrence Research Summer Internship at ETS in 2024. Previously, he earned a B.S. and an M.S. in Psychology from Sungkyunkwan University.

Introduction

Upon completion of this ITEMS module, learners should be able to:

  • Understand the structure and key characteristics of process data collected from large-scale digital assessments.
  • Gain familiarity with approaches for extracting features from action sequences.
  • Implement basic data wrangling and process feature extraction in R using assessment log data.
  • Interpret the structure, distribution, and substantive meaning of process-derived features.
  • Learn how process-derived features can address questions of test design, measurement reliability, and fairness.

Section 1: Introduction to Process Data

Upon completion of this section, learners should be able to:

  • Define process data and describe its structure.
  • Identify examples of process data from publicly available assessment datasets.
  • Develop intuition for the potential utility of process data in measurement and education.

Section 2: Extracting Process Data Features

Upon completion of this section, learners should be able to:

  • Understand the rationale behind extracting numerical features from unstructured log data.
  • Know the purpose and steps of extracting pattern-based summary indicators.
  • Explain different types of data-driven process feature extraction methods.

Section 3: Hands-on Coding Exercise: Data Wrangling and Feature Extraction in R

Upon completion of this section, learners should be able to:

  • Import and wrangle process data in R.
  • Implement expert-derived feature extraction with process data from the PISA 2012 Climate Control item.
  • Apply basic data-driven feature extraction using the ProcData R package.
  • Explore the structure, distribution, and, where applicable, interpretation of extracted features.

Section 4: Applications of Process Features in Measurement and Education

Upon completion of this section, learners should be able to:

  • Describe case studies in which process-derived features are used to answer measurement questions.
  • Know the key considerations for modeling and statistical analysis using process-derived features.