SDS Seminar Series – Lydia Lucchesi, University of Texas at Austin
Sep
12
2025

Description
The Fall 2025 SDS Seminar Series continues on September 12th from 2:00 p.m. to 3:00 p.m. with Dr. Lydia Lucchesi (Postdoctoral Fellow, Department of Statistics and Data Sciences, University of Texas at Austin). This event will be held in person in the Avaya Room (POB 2.302).
Title: Visual Documentation for Data Preprocessing in R and Python
Abstract: Data preprocessing is a crucial intermediate stage in many data analyses but is often overlooked in the documentation and dissemination of research. This talk introduces the smallsets R package for building Smallset Timelines, static and compact visualizations for communicating data preprocessing decisions. A Smallset Timeline is composed of small dataset snapshots documenting the sequence of decisions in a preprocessing pipeline. The smallsets R package builds this figure from a user’s R or Python preprocessing script that contains structured comments with snapshot instructions. This talk also presents findings from a focus group study that gathered feedback from prospective smallsets users on the package’s utility and usability. The feedback will be used to inform future software development efforts for smallsets.
Other Events in This Series
Oct
4
2024
SDS Seminar Series – Huiyan Sang, Texas A&M University
GS-BART: Graph Split Additive Decision Trees for Spatial and Network Data
2:00 pm – 3:00 pm • In Person
Speaker(s): Huiyan Sang
Oct
11
2024
SDS Seminar Series – Mingyuan Zhou, University of Texas at Austin
Building Faster, Better, and Safer Deep Generative Models via Score Identity Distillation
2:00 pm – 3:00 pm • In Person
Speaker(s): Mingyuan Zhou
Oct
18
2024
SDS Seminar Series – Sherry Zhang, University of Texas at Austin
Pivoting between Space and Time: Spatio-Temporal Analysis with Cubble
2:00 pm – 3:00 pm • In Person
Speaker(s): Sherry Zhang
Oct
25
2024
SDS Seminar Series – Matt Koslovsky, Colorado State University
Sparse Dirichlet-Multinomial Models
2:00 pm – 3:00 pm • In Person
Speaker(s): Matt Koslovsky
Nov
1
2024
SDS Seminar Series – Aaditya Ramdas, Carnegie Mellon University
A Game-Theoretic Theory of Statistical Evidence
2:00 pm – 3:00 pm • In Person
Speaker(s): Aaditya Ramdas
Nov
8
2024
SDS Seminar Series – Myungsoo Yoo, University of Texas at Austin
Dynamic Spatio-Temporal Model Integrating Physics for Fire Front Propagation
2:00 pm – 3:00 pm • In Person
Speaker(s): Myungsoo Yoo
Nov
15
2024
SDS Seminar Series – Rafael Irizarry, Harvard University
Twenty-Five Years of Data Science: Music, Genomics, and Public Health Surveillance
2:00 pm – 3:00 pm • In Person
Speaker(s): Rafael Irizarry
Mar
7
2025
SDS Seminar Series – Arun Kuchibhotla, Carnegie Mellon University
Adaptive Inference Techniques for Some Irregular Problems
2:00 pm – 3:00 pm • In Person
Speaker(s): Arun Kuchibhotla
Mar
28
2025
SDS Seminar Series – Po-Ling Loh, University of Cambridge
Differentially Private M-estimation via Noisy Optimization
2:00 pm – 3:00 pm • In Person
Speaker(s): Po-Ling Loh
Apr
18
2025
SDS Seminar Series – Richard Samworth, University of Cambridge
How Should We Do Linear Regression?
2:00 pm – 3:00 pm • In Person
Speaker(s): Richard Samworth