SDS Seminar Series - Jonathan Huggins, Boston University
Oct
24
2025

Description
The Fall 2025 SDS Seminar Series continues on October 24th from 2:00 p.m. to 3:00 p.m. with Dr. Jonathan Huggins (Assistant Professor, Department of Mathematics & Statistics, Boston University). This event is in person in POB 6.304.
Title: Robust Model Selection for Discovery of Latent Mechanistic Processes
Abstract: When learning interpretable latent structures using model-based approaches, even small deviations from modeling assumptions can lead to inferential results that are not mechanistically meaningful. For example, many latent structures consist of K mechanistic processes (with K unknown). When the model is misspecified, likelihood-based model selection methods can substantially overestimate K as the sample size grows, while nonparametric methods can be overly conservative no matter how large the sample size. Hence, there is a need for model selection methods that combine the precision of likelihood-based approaches with the robustness of nonparametrics. To address this need in a principled manner, we first formalize the problem of robust model selection in latent variable models designed for mechanistic understanding as requiring an estimator for K to satisfy a robust model selection consistency property. The definition of robust model selection consistency motivates a particular family of model selection procedures, which rely on plug-in estimates of a component-wise discrepancy measure we call the accumulated cutoff discrepancy criterion (ACDC). We provide a method for constructing mechanistically meaningful component-wise discrepancies for a class of latent variable models that includes unsupervised and supervised variants of probabilistic matrix factorization (including factor analysis) and mixture models. We prove that ACDC provides robust model selection consistency for unsupervised matrix factorization and mixture models. Numerical results show that in practice our approach reliably identifies a physically meaningful number of latent processes in four illustrative applications, outperforming widely used model selection methods. An in-depth case study of cell type discovery using single-cell RNA sequencing data demonstrates that ACDC outperforms two widely used software packages designed specifically for single-cell data analysis.
Other Events in This Series
Sep
8
2023
SDS Seminar Series – Dr. Emily Roberts
A Causal Inference Approach for Surrogate Marker Evaluation with Mixed Models
2:00 pm – 3:00 pm • In Person
Speaker(s): Emily Roberts
Sep
15
2023
SDS Seminar Series – Dr. Dimitris Korobilis
Monitoring Multicountry Macroeconomic Risk
2:00 pm – 3:00 pm • Virtual
Speaker(s): Dimitris Korobilis
Sep
22
2023
SDS Seminar Series – Dr. Will Fithian
Estimating the False Discovery Rate of Model Selection
2:00 pm – 3:00 pm • In Person
Speaker(s): Will Fithian
Sep
29
2023
SDS Seminar Series – Dr. David Moriarty
A Data Science Journey in Business
2:00 pm – 3:00 pm • In Person
Speaker(s): David Moriarty
Oct
6
2023
SDS Seminar Series – Dr. Amanda Ellis
Navigating the Future of Statistics Education: Leveraging ChatGPT's Advantages and Overcoming Challenges
2:00 pm – 3:00 pm • Virtual
Speaker(s): Amanda Ellis
Oct
20
2023
SDS Seminar Series – Dr. Amy Zhang
Bisimulation and Reinforcement Learning
2:00 pm – 3:00 pm • Virtual
Speaker(s): Amy Zhang
Oct
27
2023
SDS Seminar Series – Dr. Marcelo Medeiros
Global Inflation Forecasting: Benefits from Machine Learning Methods
2:00 pm – 3:00 pm • Virtual
Speaker(s): Marcelo Medeiros
Nov
3
2023
SDS Seminar Series – Dr. Steve Yadlowsky
Choosing a Proxy Metric from Past Experiments
2:00 pm – 3:00 pm • Virtual
Speaker(s): Steve Yadlowsky
Nov
10
2023
SDS Seminar Series – Drew Herren
Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation
2:00 pm – 3:00 pm • In Person
Speaker(s): Drew Herren
Dec
1
2023
SDS Seminar Series – Dr. Dave Zhao
High-Dimensional Nonparametric Empirical Bayes Problems in Genomics
2:00 pm – 3:00 pm • In Person
Speaker(s): Dave Zhao