SDS Seminar Series – Lydia Lucchesi, University of Texas at Austin
Sep
12
2025
Description
The Fall 2025 SDS Seminar Series continues on September 12th from 2:00 p.m. to 3:00 p.m. with Dr. Lydia Lucchesi (Postdoctoral Fellow, Department of Statistics and Data Sciences, University of Texas at Austin). This event will be held in person in the Avaya Room (POB 2.302).
Title: Visual Documentation for Data Preprocessing in R and Python
Abstract: Data preprocessing is a crucial intermediate stage in many data analyses but is often overlooked in the documentation and dissemination of research. This talk introduces the smallsets R package for building Smallset Timelines, static and compact visualizations for communicating data preprocessing decisions. A Smallset Timeline is composed of small dataset snapshots that document the sequence of decisions in a preprocessing pipeline. The smallsets R package builds this figure from a user’s R or Python preprocessing script, which contains structured comments with snapshot instructions. This talk also presents findings from a focus group study that gathered feedback from prospective smallsets users on the package’s utility and usability. The feedback will inform future software development efforts for smallsets.
Other Events in This Series
Apr
3
2026
SDS Seminar Series – Leo Duan, University of Florida
TBA
2:00 pm – 3:00 pm • In Person
Speaker(s): Leo Duan
Apr
17
2026
SDS Seminar Series – Rina Foygel Barber, University of Chicago
TBA
2:00 pm – 3:00 pm • In Person
Speaker(s): Rina Foygel Barber