SDS Seminar Series – Lydia Lucchesi, University of Texas at Austin
Sep 12, 2025

Description
The Fall 2025 SDS Seminar Series continues on September 12th from 2:00 p.m. to 3:00 p.m. with Dr. Lydia Lucchesi (Postdoctoral Fellow, Department of Statistics and Data Sciences, University of Texas at Austin). This event is in-person in the Avaya Room (POB 2.302).
Title: Visual Documentation for Data Preprocessing in R and Python
Abstract: Data preprocessing is a crucial intermediate stage in many data analyses but is often overlooked in the documentation and dissemination of research. This talk introduces the smallsets R package for building Smallset Timelines, which are static, compact visualizations for communicating data preprocessing decisions. A Smallset Timeline is composed of small dataset snapshots that document the sequence of decisions in a preprocessing pipeline. The smallsets R package builds this figure from a user’s R or Python preprocessing script, which contains structured comments with snapshot instructions. This talk also presents findings from a focus group study that gathered feedback from prospective smallsets users on the package’s utility and usability. The feedback will inform future software development efforts for smallsets.
Other Events in This Series
Mar 1, 2024
SDS Seminar Series – Dr. Laura Hatfield
Predict, Correct, Select: A New General Identification Strategy for Controlled Pre-Post Designs
2:00 pm – 3:00 pm • Virtual
Speaker(s): Laura Hatfield
Mar 22, 2024
SDS Seminar Series – Dr. Sivaraman Balakrishnan
Statistical Inference for Optimal Transport
2:00 pm – 3:00 pm • In Person
Speaker(s): Sivaraman Balakrishnan
Mar 29, 2024
SDS Seminar Series – Dr. Purna Sarkar
Some New Results for Streaming Principal Component Analysis
2:00 pm – 3:00 pm • In Person
Speaker(s): Purna Sarkar
Apr 12, 2024
SDS Seminar Series – Dr. Daniela Witten
Data Thinning and Its Applications
2:00 pm – 3:00 pm • In Person
Speaker(s): Daniela Witten
Apr 19, 2024
SDS Seminar Series – Dr. William Rosenberger
Design and Inference for Enrichment Trials with a Continuous Biomarker
2:00 pm – 3:00 pm • In Person
Speaker(s): William Rosenberger
Apr 26, 2024
SDS Seminar Series – Dr. Bodhisattva Sen
Extending the Scope of Nonparametric Empirical Bayes
2:00 pm – 3:00 pm • In Person
Speaker(s): Bodhisattva Sen
Sep 6, 2024
SDS Seminar Series – Christine Peterson, University of Texas MD Anderson Cancer Center
New Methods for Microbiome Data Integration
2:00 pm – 3:00 pm • In Person
Speaker(s): Christine Peterson
Sep 13, 2024
SDS Seminar Series – Matthew Vanaman, University of Texas at Austin
Data Analysis from the Zoo to the Wild and Back
2:00 pm – 3:00 pm • In Person
Speaker(s): Matthew Vanaman
Sep 20, 2024
SDS Seminar Series – Saptarshi Roy, University of Texas at Austin
On the Computational Complexity of Private High-dimensional Model Selection
2:00 pm – 3:00 pm • In Person
Speaker(s): Saptarshi Roy
Sep 27, 2024
SDS Seminar Series – Abhra Sarkar, University of Texas at Austin
(Bayesian) Semiparametric Local Inference (and Other Stories)
2:00 pm – 3:00 pm • In Person
Speaker(s): Abhra Sarkar