SDS Seminar Series – Lydia Lucchesi, University of Texas at Austin
Sep
12
2025

Description
The Fall 2025 SDS Seminar Series continues on September 12th from 2:00 p.m. to 3:00 p.m. with Dr. Lydia Lucchesi (Postdoctoral Fellow, Department of Statistics and Data Sciences, University of Texas at Austin). This event will be held in person in the Avaya Room (POB 2.302).
Title: Visual Documentation for Data Preprocessing in R and Python
Abstract: Data preprocessing is a crucial intermediate stage in many data analyses but is often overlooked in the documentation and dissemination of research. This talk introduces the smallsets R package for building Smallset Timelines, static and compact visualizations for communicating data preprocessing decisions. A Smallset Timeline is composed of small dataset snapshots documenting the sequence of decisions in a preprocessing pipeline. The smallsets R package builds this figure from a user’s R or Python preprocessing script that contains structured comments with snapshot instructions. This talk also presents findings from a focus group study that gathered feedback from prospective smallsets users on the package’s utility and usability. The feedback will be used to inform future software development efforts for smallsets.
Other Events in This Series
Sep
8
2023
SDS Seminar Series – Dr. Emily Roberts
A Causal Inference Approach for Surrogate Marker Evaluation with Mixed Models
2:00 pm – 3:00 pm • In Person
Speaker(s): Emily Roberts
Sep
15
2023
SDS Seminar Series – Dr. Dimitris Korobilis
Monitoring Multicountry Macroeconomic Risk
2:00 pm – 3:00 pm • Virtual
Speaker(s): Dimitris Korobilis
Sep
22
2023
SDS Seminar Series – Dr. Will Fithian
Estimating the False Discovery Rate of Model Selection
2:00 pm – 3:00 pm • In Person
Speaker(s): Will Fithian
Sep
29
2023
SDS Seminar Series – Dr. David Moriarty
A Data Science Journey in Business
2:00 pm – 3:00 pm • In Person
Speaker(s): David Moriarty
Oct
6
2023
SDS Seminar Series – Dr. Amanda Ellis
Navigating the Future of Statistics Education: Leveraging ChatGPT's Advantages and Overcoming Challenges
2:00 pm – 3:00 pm • Virtual
Speaker(s): Amanda Ellis
Oct
20
2023
SDS Seminar Series – Dr. Amy Zhang
Bisimulation and Reinforcement Learning
2:00 pm – 3:00 pm • Virtual
Speaker(s): Amy Zhang
Oct
27
2023
SDS Seminar Series – Dr. Marcelo Medeiros
Global Inflation Forecasting: Benefits from Machine Learning Methods
2:00 pm – 3:00 pm • Virtual
Speaker(s): Marcelo Medeiros
Nov
3
2023
SDS Seminar Series – Dr. Steve Yadlowsky
Choosing a Proxy Metric from Past Experiments
2:00 pm – 3:00 pm • Virtual
Speaker(s): Steve Yadlowsky
Nov
10
2023
SDS Seminar Series – Drew Herren
Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation
2:00 pm – 3:00 pm • In Person
Speaker(s): Drew Herren
Dec
1
2023
SDS Seminar Series – Dr. Dave Zhao
High-Dimensional Nonparametric Empirical Bayes Problems in Genomics
2:00 pm – 3:00 pm • In Person
Speaker(s): Dave Zhao