SDS Seminar Series – Dr. Daniela Witten
Apr
12
2024
Description
The Spring 2024 SDS Seminar Series continues on April 12th from 2:00 p.m. to 3:00 p.m. with Dr. Daniela Witten (Biostatistics and Statistics, University of Washington). This event is in person.
Title: Data Thinning and Its Applications
Abstract: We propose data thinning, a new approach for splitting an observation from a known distributional family with unknown parameter(s) into two or more independent parts that sum to yield the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general, and can be applied to a broad class of distributions within the natural exponential family, including the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. Furthermore, we generalize data thinning to enable splitting an observation into two or more parts that can be combined to yield the original observation using an operation other than addition; this enables the application of data thinning far beyond the natural exponential family. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the "usual" approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data, in a setting where sample splitting is not applicable. This is joint work with Anna Neufeld (Fred Hutch), Ameer Dharamshi (University of Washington), Lucy Gao (University of British Columbia), and Jacob Bien (University of Southern California).
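As a rough illustration of the splitting idea described in the abstract, the sketch below uses the Poisson case, where a standard result applies: if X ~ Poisson(λ) and X1 given X is Binomial(X, ε), then X1 and X2 = X − X1 are independent with X1 ~ Poisson(ελ) and X2 ~ Poisson((1 − ε)λ). This is not the speaker's implementation; the function names (`sample_poisson`, `thin_poisson`, `correlation`) and the values of `lam`, `eps`, and `n` are assumptions chosen for the example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def thin_poisson(x, eps, rng):
    """Split a count x into (x1, x2) with x1 | x ~ Binomial(x, eps), x2 = x - x1."""
    x1 = sum(rng.random() < eps for _ in range(x))
    return x1, x - x1


def correlation(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical settings for the demonstration.
rng = random.Random(0)
lam, eps, n = 5.0, 0.3, 20_000

xs = [sample_poisson(lam, rng) for _ in range(n)]
parts = [thin_poisson(x, eps, rng) for x in xs]
x1s = [p[0] for p in parts]
x2s = [p[1] for p in parts]

mean_x1 = sum(x1s) / n  # should be close to eps * lam
mean_x2 = sum(x2s) / n  # should be close to (1 - eps) * lam
corr = correlation(x1s, x2s)  # should be close to 0

print(f"mean(x1) = {mean_x1:.3f}, target {eps * lam:.3f}")
print(f"mean(x2) = {mean_x2:.3f}, target {(1 - eps) * lam:.3f}")
print(f"corr(x1, x2) = {corr:.4f}")
```

In a cross-validation setting, one part could serve as the training set and the other as the held-out set, even when there is only a single observation per unit. Near-zero sample correlation is only a sanity check consistent with independence, not a proof of it.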
Location
Peter O’Donnell Jr. Building (POB) 2.302
Other Events in This Series
Sep
8
2023
SDS Seminar Series – Dr. Emily Roberts
A Causal Inference Approach for Surrogate Marker Evaluation with Mixed Models
2:00 pm – 3:00 pm • In Person
Speaker(s): Emily Roberts
Sep
15
2023
SDS Seminar Series – Dr. Dimitris Korobilis
Monitoring Multicountry Macroeconomic Risk
2:00 pm – 3:00 pm • Virtual
Speaker(s): Dimitris Korobilis
Sep
22
2023
SDS Seminar Series – Dr. Will Fithian
Estimating the False Discovery Rate of Model Selection
2:00 pm – 3:00 pm • In Person
Speaker(s): Will Fithian
Sep
29
2023
SDS Seminar Series – Dr. David Moriarty
A Data Science Journey in Business
2:00 pm – 3:00 pm • In Person
Speaker(s): David Moriarty
Oct
6
2023
SDS Seminar Series – Dr. Amanda Ellis
Navigating the Future of Statistics Education: Leveraging ChatGPT's Advantages and Overcoming Challenges
2:00 pm – 3:00 pm • Virtual
Speaker(s): Amanda Ellis
Oct
20
2023
SDS Seminar Series – Dr. Amy Zhang
Bisimulation and Reinforcement Learning
2:00 pm – 3:00 pm • Virtual
Speaker(s): Amy Zhang
Oct
27
2023
SDS Seminar Series – Dr. Marcelo Medeiros
Global Inflation Forecasting: Benefits from Machine Learning Methods
2:00 pm – 3:00 pm • Virtual
Speaker(s): Marcelo Medeiros
Nov
3
2023
SDS Seminar Series – Dr. Steve Yadlowsky
Choosing a Proxy Metric from Past Experiments
2:00 pm – 3:00 pm • Virtual
Speaker(s): Steve Yadlowsky
Nov
10
2023
SDS Seminar Series – Drew Herren
Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation
2:00 pm – 3:00 pm • In Person
Speaker(s): Drew Herren
Dec
1
2023
SDS Seminar Series – Dr. Dave Zhao
High-Dimensional Nonparametric Empirical Bayes Problems in Genomics
2:00 pm – 3:00 pm • In Person
Speaker(s): Dave Zhao