SDS Seminar Series – Dr. Daniela Witten
Apr
12
2024

Description
The Spring 2024 SDS Seminar Series continues on April 12th from 2:00 p.m. to 3:00 p.m. with Dr. Daniela Witten (Biostatistics and Statistics, University of Washington). This event is in-person.
Title: Data Thinning and Its Applications
Abstract: We propose data thinning, a new approach for splitting an observation from a known distributional family with unknown parameter(s) into two or more independent parts that sum to yield the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general, and can be applied to a broad class of distributions within the natural exponential family, including the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. Furthermore, we generalize data thinning to enable splitting an observation into two or more parts that can be combined to yield the original observation using an operation other than addition; this enables the application of data thinning far beyond the natural exponential family. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the "usual" approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data, in a setting where sample splitting is not applicable. This is joint work with Anna Neufeld (Fred Hutch), Ameer Dharamshi (University of Washington), Lucy Gao (University of British Columbia), and Jacob Bien (University of Southern California).
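As an illustration of the idea in the abstract, here is a minimal sketch of the Poisson case, which relies on a standard fact: if X is Poisson with mean lam and X1 given X is Binomial(X, eps), then X1 and X2 = X - X1 are independent Poisson variables with means eps * lam and (1 - eps) * lam. The function name `poisson_thin`, the choice eps = 0.3, and the simulated data are assumptions made for this example and are not taken from the talk.

```python
import numpy as np


def poisson_thin(x, eps, rng):
    """Split Poisson counts x into two parts that sum to x.

    Hypothetical helper for illustration: each count is divided by a
    binomial draw, so x1 gets roughly a fraction eps of every count.
    """
    x = np.asarray(x)
    x1 = rng.binomial(x, eps)  # X1 | X ~ Binomial(X, eps)
    x2 = x - x1                # remainder, so x1 + x2 == x exactly
    return x1, x2


rng = np.random.default_rng(0)
lam = 5.0   # assumed true mean, used only to simulate data
eps = 0.3   # assumed thinning fraction
x = rng.poisson(lam, size=100_000)
x1, x2 = poisson_thin(x, eps, rng)

# The parts sum to the original counts; their sample means should be close
# to eps * lam and (1 - eps) * lam, and their correlation close to zero.
print(x1.mean(), x2.mean(), np.corrcoef(x1, x2)[0, 1])
```

In a cross-validation setting, one part could be used to fit a model and the other to evaluate it, which is the role the abstract gives to data thinning when sample splitting is not applicable.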
Location
Peter O’Donnell Jr. Building (POB) 2.302
Other Events in This Series
Sep
12
2025
SDS Seminar Series – Lydia Lucchesi, University of Texas at Austin
Visual Documentation for Data Preprocessing in R and Python
2:00 pm – 3:00 pm • In Person
Speaker(s): Lydia Lucchesi
Sep
19
2025
SDS Seminar Series – Tuan Pham, University of Texas at Austin
Time-uniform Bounds for Iterated Algorithms
2:00 pm – 3:00 pm • In Person
Speaker(s): Tuan Pham
Sep
26
2025
SDS Seminar Series – Ryan Giordano, University of California, Berkeley
TBA
2:00 pm – 3:00 pm • In Person
Speaker(s): Ryan Giordano
Oct
3
2025
SDS Seminar Series – Rafael Campello de Alcantara, University of Texas at Austin
Searching for Parallel Trends: A Decision Tree Algorithm for Discovering Conditional Diff-in-Diff Estimators
2:00 pm – 3:00 pm • In Person
Speaker(s): Rafael Campello de Alcantara
Oct
10
2025
SDS Seminar Series – Michele Guindani, University of California, Los Angeles
TBA
2:00 pm – 3:00 pm • In Person
Speaker(s): Michele Guindani
Oct
17
2025
SDS Seminar Series – Wenyi Wang, MD Anderson Cancer Center
TBA
2:00 pm – 3:00 pm • In Person
Speaker(s): Wenyi Wang
Oct
31
2025
SDS Seminar Series – Max Goplerud, University of Texas at Austin
TBA
2:00 pm – 3:00 pm • In Person
Speaker(s): Max Goplerud
Nov
7
2025
SDS Seminar Series – Jeffrey Miller, Harvard University
TBA
2:00 pm – 3:00 pm • In Person
Speaker(s): Jeffrey Miller