SDS Seminar Series – Dr. Daniela Witten

Date: Apr 12, 2024
Time: 2:00 pm – 3:00 pm
In Person
Cost: Free
Data Thinning and Its Applications

Description

The Spring 2024 SDS Seminar Series continues on April 12th from 2:00 p.m. to 3:00 p.m. with Dr. Daniela Witten (Biostatistics and Statistics, University of Washington). This event will be held in person.

Title: Data Thinning and Its Applications

Abstract: We propose data thinning, a new approach for splitting an observation from a known distributional family with unknown parameter(s) into two or more independent parts that sum to yield the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general and can be applied to a broad class of distributions within the natural exponential family, including the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. Furthermore, we generalize data thinning to enable splitting an observation into two or more parts that can be combined to yield the original observation using an operation other than addition; this enables the application of data thinning far beyond the natural exponential family. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the "usual" approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data, in a setting where sample splitting is not applicable. This is joint work with Anna Neufeld (Fred Hutch), Ameer Dharamshi (University of Washington), Lucy Gao (University of British Columbia), and Jacob Bien (University of Southern California).
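To make the abstract's central idea concrete, here is a minimal sketch of two standard thinning constructions. It is not the authors' implementation, and the function names, sample sizes, and values of the thinning fraction eps are illustrative assumptions. For a Poisson count X with mean lam, drawing X1 | X ~ Binomial(X, eps) and setting X2 = X - X1 yields independent parts X1 ~ Poisson(eps * lam) and X2 ~ Poisson((1 - eps) * lam). For X ~ N(mu, sigma^2) with sigma assumed known, drawing X1 | X ~ N(eps * X, eps * (1 - eps) * sigma^2) yields independent N(eps * mu, eps * sigma^2) and N((1 - eps) * mu, (1 - eps) * sigma^2) parts. The simulation checks empirically that the parts sum to the original, have the expected means and variances, and are roughly uncorrelated.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Only used here to simulate data; adequate for small lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def poisson_thin(x, eps, rng):
    """Split a count x into (x1, x2) with x1 + x2 == x.

    x1 | x ~ Binomial(x, eps), drawn as a sum of x Bernoulli(eps) trials.
    If x ~ Poisson(lam), then x1 ~ Poisson(eps * lam) and
    x2 ~ Poisson((1 - eps) * lam), and the two parts are independent.
    """
    x1 = sum(rng.random() < eps for _ in range(x))
    return x1, x - x1


def gaussian_thin(x, eps, sigma, rng):
    """Split x ~ N(mu, sigma^2), with sigma assumed known, into two parts.

    x1 | x ~ N(eps * x, eps * (1 - eps) * sigma^2) and x2 = x - x1, so
    x1 ~ N(eps * mu, eps * sigma^2), x2 ~ N((1 - eps) * mu, (1 - eps) * sigma^2),
    and the two parts are independent.
    """
    x1 = rng.gauss(eps * x, math.sqrt(eps * (1 - eps)) * sigma)
    return x1, x - x1


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    # Sample covariance; cov(a, a) is the sample variance.
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


rng = random.Random(0)
n = 50_000

# Poisson example: lam = 10, eps = 0.3, so the parts should have means near 3 and 7.
counts = [sample_poisson(10.0, rng) for _ in range(n)]
p_parts = [poisson_thin(x, 0.3, rng) for x in counts]
p1 = [a for a, _ in p_parts]
p2 = [b for _, b in p_parts]

# Gaussian example: mu = 5, sigma = 2, eps = 0.5, so each part should be near N(2.5, 2).
obs = [rng.gauss(5.0, 2.0) for _ in range(n)]
g_parts = [gaussian_thin(x, 0.5, 2.0, rng) for x in obs]
g1 = [a for a, _ in g_parts]
g2 = [b for _, b in g_parts]

print("Poisson means:", round(mean(p1), 2), round(mean(p2), 2),
      "variance of part 1:", round(cov(p1, p1), 2),
      "correlation:", round(corr(p1, p2), 3))
print("Gaussian means:", round(mean(g1), 2), round(mean(g2), 2),
      "variance of part 1:", round(cov(g1, g1), 2),
      "correlation:", round(corr(g1, g2), 3))
```

The binomial step is what makes the Poisson case work: conditional on the total count, each unit is assigned to the first part independently with probability eps. The Gaussian construction, by contrast, needs sigma to be known. These two cases are only illustrations; the framework described in the abstract covers many more distributions and combining operations.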

Location

Peter O’Donnell Jr. Building (POB) 2.302


Other Events in This Series

Oct 11, 2024

SDS Seminar Series – Mingyuan Zhou, University of Texas at Austin

Building Faster, Better, and Safer Deep Generative Models via Score Identity Distillation

2:00 pm – 3:00 pm In Person

Speaker(s): Mingyuan Zhou

Oct 18, 2024

SDS Seminar Series – Sherry Zhang, University of Texas at Austin

Pivoting between Space and Time: Spatio-Temporal Analysis with Cubble

2:00 pm – 3:00 pm In Person

Speaker(s): Sherry Zhang

Oct 25, 2024

SDS Seminar Series – Matt Koslovsky, Colorado State University

Sparse Dirichlet-Multinomial Models

2:00 pm – 3:00 pm In Person

Speaker(s): Matt Koslovsky

Nov 1, 2024

SDS Seminar Series – Aaditya Ramdas, Carnegie Mellon University

A Game-Theoretic Theory of Statistical Evidence

2:00 pm – 3:00 pm In Person

Speaker(s): Aaditya Ramdas

Nov 8, 2024

SDS Seminar Series – Myungsoo Yoo, University of Texas at Austin

Dynamic Spatio-Temporal Model Integrating Physics for Fire Front Propagation

2:00 pm – 3:00 pm In Person

Speaker(s): Myungsoo Yoo

Nov 15, 2024

SDS Seminar Series – Rafael Irizarry, Harvard University

Twenty-Five Years of Data Science: Music, Genomics, and Public Health Surveillance

2:00 pm – 3:00 pm In Person

Speaker(s): Rafael Irizarry