The series is envisioned as a vital contribution to the intellectual, cultural, and scholarly environment at The University of Texas at Austin for students, faculty, and the wider community. Each talk is free of charge and open to the public. For more information, contact Rachel Poole at rachel.poole[@]austin[dot]utexas[dot]edu.

Fall Seminar Series

 

September 15, 2017 – Moriba Jah
(Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin)
“Space Traffic Modeling: Challenges for Statistics and Data Science” 
CBA 4.328, 2:00 to 3:00 PM

October 6, 2017 – Michael Zhang
(Department of Statistics and Data Sciences, The University of Texas at Austin)
“Embarrassingly Parallel Inference for Gaussian Processes”
CBA 4.328, 3:30 to 4:30 PM

October 16, 2017 – Matt Taddy
(Department of Economics, The University of Texas at Austin)
“Counterfactual Prediction with Deep Instrumental Variables Networks”
BRB 1.118, 3:30 to 5:00 PM

October 20, 2017 – Novin Ghaffari
(Department of Statistics and Data Sciences, The University of Texas at Austin)
“Wasserstein Distances and Copulas”
CBA 4.328, 2:00 to 3:00 PM

October 27, 2017 – Bowei Yan
(Department of Statistics and Data Sciences, The University of Texas at Austin)
“Statistical Convergence Analysis of Gradient EM on General Gaussian Mixture Models”
CBA 4.328, 2:00 to 3:00 PM

November 3, 2017 – Antonio Linero
(Department of Statistics, University of Florida)
“Bayesian Regression Tree Ensembles that Adapt to Smoothness and Sparsity”
CBA 4.328, 2:00 to 3:00 PM

November 10, 2017 – Yuan Ji
(Department of Public Health Sciences, The University of Chicago & Program of Computational Genomics & Medicine, NorthShore University HealthSystem)
“TBA” 
CBA 4.328, 2:00 to 3:00 PM

November 17, 2017 – Panagiotis (Panos) Toulis
(Booth School of Business, University of Chicago)
“Randomization Tests for Network Interference Via Conditioning Mechanisms” 
CBA 4.328, 2:00 to 3:00 PM

December 1, 2017 – Avi Feller
(Goldman School of Public Policy, University of California, Berkeley)
“TBA” 
CBA 4.328, 2:00 to 3:00 PM

December 8, 2017 – Elizabeth Tipton
(Teachers College, Columbia University)
“TBA” 
CBA 4.328, 2:00 to 3:00 PM

Moriba Jah (Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin)

Copresenter: Emmanuel (Manu) Delande (Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin)

Title: Space Traffic Modeling: Challenges for Statistics and Data Science

Abstract: The United States Strategic Command (USSTRATCOM) has developed and maintains a database of approximately 23,000 objects, ranging in size from a cell phone to a bus. Of those, only about 1,500 are working satellites; everything else is space debris, rubbish as it were. The space domain began as an environment with only a handful of state actors and is now populated with objects owned or funded by over 60 countries. The man-made space object population is growing at an alarming rate, and not all objects are trackable, for a variety of reasons. All space objects are modeled as spheres. There is no scientific taxonomy for classifying or understanding the currently tracked man-made space object population. The data collected on this population are sparse, biased, noisy, corrupt, and incomplete. The world is invoking a global space traffic management system with norms of behavior that all space actors can adhere to. However, we lack sufficient knowledge about what is on orbit, where it came from, where it is going, and what it can do. The ASTRIA research program at UT Austin, led by Dr. Moriba Jah, aims to provide a credible solution to this wicked problem, with strong collaborations from the Computer Science and Statistics and Data Sciences departments. He and his postdoctoral fellow, Dr. Emmanuel Delande, will motivate the research, provide some meaningful examples of possible ways to address it, and seek your involvement.



Michael Zhang (Department of Statistics and Data Sciences, The University of Texas at Austin)

Title: Embarrassingly Parallel Inference for Gaussian Processes

Abstract: Training Gaussian process (GP)-based models typically involves an O(N^3) computational bottleneck. Popular methods for overcoming the matrix inversion problem include sparse approximations of the covariance matrix through inducing variables or dimensionality reduction via “local experts”. However, these types of models cannot account for both long- and short-range correlations in the GP functions, and are often hard to implement in a distributed setting. We present an embarrassingly parallel method that takes advantage of the computational ease of inverting block diagonal matrices, while maintaining much of the expressivity of a full covariance matrix. By using importance sampling to average over different realizations of low-rank approximations of the GP model, we ensure our algorithm is both asymptotically unbiased and embarrassingly parallel. We show comparable or improved performance over competing methods on a range of synthetic and real datasets.
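The block-diagonal trick at the heart of the parallel speedup can be illustrated in a few lines (a minimal numpy sketch, not the talk's full algorithm; the kernel, block sizes, and jitter value are illustrative choices):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def blockwise_inverse(K, block_sizes):
    """Invert a block-diagonal approximation of K one block at a time.

    Each block can go to a separate worker, so the cost drops from
    O(N^3) to a sum of O(b_i^3) terms."""
    inv = np.zeros_like(K)
    start = 0
    for b in block_sizes:
        sl = slice(start, start + b)
        inv[sl, sl] = np.linalg.inv(K[sl, sl])
        start += b
    return inv

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, size=(12, 1)), axis=0)
K = rbf_kernel(X, X) + 1e-2 * np.eye(12)   # jitter for conditioning

# Block-diagonal approximation: zero out cross-block covariances.
K_bd = np.zeros_like(K)
for sl in (slice(0, 4), slice(4, 8), slice(8, 12)):
    K_bd[sl, sl] = K[sl, sl]

inv_bd = blockwise_inverse(K, (4, 4, 4))
assert np.allclose(inv_bd @ K_bd, np.eye(12), atol=1e-8)
```

The approximation discards the cross-block correlations; the talk's method recovers much of that lost expressivity by importance-sampling over low-rank corrections.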

 


Matt Taddy (Department of Economics, The University of Texas at Austin)

Title: Counterfactual Prediction with Deep Instrumental Variables Networks

Abstract: We are in the middle of a remarkable rise in the use and capability of artificial intelligence. Much of this growth has been fueled by the success of deep learning architectures: models that map from observables to outputs via multiple layers of latent representations. These deep learning algorithms are effective tools for unstructured prediction, and they can be combined in AI systems to solve complex automated reasoning problems. This paper provides a recipe for combining ML algorithms to solve for causal effects in the presence of instrumental variables – sources of treatment randomization that are conditionally independent from the response. We show that a flexible IV specification resolves into two prediction tasks that can be solved with deep neural nets: a first-stage network for treatment prediction and a second-stage network whose loss function involves integration over the conditional treatment distribution. This Deep IV framework imposes some specific structure on the stochastic gradient descent routine used for training, but it is general enough that we can take advantage of off-the-shelf ML capabilities and avoid extensive algorithm customization. We outline how to obtain out-of-sample causal validation in order to avoid over-fitting. We also introduce schemes for both Bayesian and frequentist inference: the former via a novel adaptation of dropout training, and the latter via a data splitting routine.
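The two-stage structure being generalized is classical two-stage least squares; a self-contained linear sketch of that special case (illustrative only — Deep IV replaces both regressions with neural networks and the second-stage loss with an integral over the first-stage distribution; the simulated coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder
t = z + u + rng.normal(size=n)               # treatment, confounded by u
y = 2.0 * t + 3.0 * u + rng.normal(size=n)   # true causal effect of t is 2

def slope(x, target):
    """OLS slope of target on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, target, rcond=None)[0][1]

naive = slope(t, y)        # biased: t is correlated with u
t_hat = slope(z, t) * z    # stage 1: predict treatment from instrument
iv = slope(t_hat, y)       # stage 2: regress outcome on predicted treatment

assert abs(naive - 2.0) > 0.5   # naive OLS is badly biased
assert abs(iv - 2.0) < 0.2      # two-stage estimate recovers the effect
```

Because the instrument z affects y only through t, regressing on the stage-1 prediction strips out the confounded variation in the treatment.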

 


Novin Ghaffari (Department of Statistics and Data Sciences, The University of Texas at Austin)

Title: Wasserstein Distances and Copulas

Abstract: The Wasserstein distances refer to a class of metrics on probability distributions that arise from the Monge-Kantorovich optimal transportation problems. For distributions on R, the Wasserstein distance is well understood. An explicit representation for the optimal coupling between two distributions, in terms of their distribution functions, is known, and the distance between two measures can be readily derived or approximated. For distributions on R^n, the case is still largely open. While some abstract representations exist, there is no explicit formulation for a solution, as in the one-dimensional case. Here we consider copula representations of multidimensional distributions to characterize optimal couplings and to derive formulations for the L2-Wasserstein distance. The talk will begin with a background introduction to optimal transportation, Wasserstein distances, and copulas. The ultimate objective is then to introduce our copula representations for L2-Wasserstein distances.
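In the well-understood one-dimensional case, the optimal coupling simply matches quantiles, so the distance between two equal-size empirical distributions is computed from sorted samples (a minimal sketch of the R case the talk builds on; `wasserstein_1d` is an illustrative helper name):

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size empirical
    distributions on R: the optimal coupling pairs the i-th smallest
    point of x with the i-th smallest point of y (quantile matching)."""
    x, y = np.sort(x), np.sort(y)
    return (np.mean(np.abs(x - y) ** p)) ** (1.0 / p)

# Translating a distribution by c moves it exactly c in any W_p metric.
x = np.array([0.0, 1.0, 2.0, 3.0])
assert np.isclose(wasserstein_1d(x, x + 5.0, p=2), 5.0)
```

On R^n no such closed-form matching is available, which is the gap the copula representations in the talk aim to address.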

 


Bowei Yan (Department of Statistics and Data Sciences, The University of Texas at Austin)

Title: Statistical Convergence Analysis of Gradient EM on General Gaussian Mixture Models

Abstract: In this paper, we study convergence properties of the gradient Expectation-Maximization algorithm for Gaussian mixture models with a general number of clusters and arbitrary mixing coefficients. We derive a convergence rate that depends on the mixing coefficients, the minimum and maximum pairwise distances between the true centers, the dimensionality, and the number of components, and we obtain a near-optimal local contraction radius. While there have been some recent notable works deriving local convergence rates for EM in the symmetric two-component GMM with equal mixing weights, the more general case requires structurally different and non-trivial arguments. We use recent tools from learning theory and empirical processes to achieve our theoretical results.



Antonio Linero (Department of Statistics, University of Florida)

Title: Bayesian Regression Tree Ensembles that Adapt to Smoothness and Sparsity

Abstract: Ensembles of decision trees are a useful tool for obtaining flexible estimates of regression functions. Examples of these methods include gradient boosted decision trees, random forests, and Bayesian CART. Two potential shortcomings of tree ensembles are their lack of smoothness and vulnerability to the curse of dimensionality. We show that these issues can be overcome by instead considering sparsity-inducing soft decision trees in which the decisions are treated as probabilistic. We implement this in the context of the Bayesian additive regression trees framework, and illustrate its promising performance through testing on benchmark datasets. We provide strong theoretical support for our methodology by showing that the posterior distribution concentrates at the minimax rate (up to a logarithmic factor) for sparse functions and functions with additive structures in the high-dimensional regime where the dimensionality of the covariate space is allowed to grow near exponentially in the sample size. Our method also adapts to the unknown smoothness and sparsity levels, and can be implemented by making minimal modifications to existing BART algorithms.
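The key smoothing idea — replacing a hard split with a probabilistic one — can be shown with a depth-one tree (a toy sketch under illustrative parameter choices, not the paper's full BART machinery):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_tree_predict(x, split_val, bandwidth, left_leaf, right_leaf):
    """Depth-1 'soft' decision tree: instead of a hard step at
    split_val, the probability of routing to the right leaf varies
    smoothly with x, so the fitted function is smooth rather than
    piecewise constant."""
    p_right = sigmoid((x - split_val) / bandwidth)
    return (1 - p_right) * left_leaf + p_right * right_leaf

# At the split point the prediction is the average of the two leaves;
# far from it, the soft tree behaves like a hard one.
assert np.isclose(soft_tree_predict(0.0, 0.0, 1.0, -1.0, 1.0), 0.0)
assert soft_tree_predict(10.0, 0.0, 1.0, -1.0, 1.0) > 0.99
```

As the bandwidth shrinks, the soft split recovers the usual hard split; a prior over bandwidths is what lets the ensemble adapt to unknown smoothness.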

 


Yuan Ji (Department of Public Health Sciences, The University of Chicago & Program of Computational Genomics & Medicine, NorthShore University HealthSystem)

Title: TBA

Abstract: TBA


Panos Toulis (Booth School of Business, University of Chicago)

Title:  Randomization Tests for Network Interference Via Conditioning Mechanisms

Abstract: Many important causal questions address interactions between units, also known as network interference, such as interactions between individuals in households, students in schools, and firms in markets. Standard methods, however, often break down in this setting. In particular, randomization tests for statistical hypotheses on treatment effects are challenging because such hypotheses are typically not sharp in the presence of interference. One approach is to conduct randomization tests conditional on a subset of units and assignments, such that the null hypothesis is sharp. While promising, existing approaches require such conditioning to be based on an inflexible partition of the space of treatment assignments, which usually leads to loss of power. In this paper, we propose a general framework of conditioning mechanisms that supports more flexible conditioning, allowing us to leverage more structure from the problem and increase power. Our framework subsumes standard results in conditional randomization testing and also formalizes recent randomization tests in the presence of interference. We detail our framework for two-stage randomized designs, and illustrate with an analysis of a randomized evaluation of an intervention targeting student absenteeism in the School District of Philadelphia.
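The starting point — a Fisher randomization test of a sharp null without interference — fits in a few lines (an illustrative sketch with made-up data; the talk's contribution is how to condition so the null stays sharp when units interfere, which this toy does not address):

```python
import numpy as np

def randomization_test(y, z, n_draws=2000, seed=0):
    """Fisher randomization test of the sharp null of no treatment
    effect: under the null the outcomes y are fixed, so re-randomizing
    the assignment z generates the exact null distribution of the
    difference-in-means statistic."""
    rng = np.random.default_rng(seed)
    def stat(z_):
        return y[z_ == 1].mean() - y[z_ == 0].mean()
    observed = stat(z)
    draws = np.array([stat(rng.permutation(z)) for _ in range(n_draws)])
    # Two-sided p-value: share of draws at least as extreme as observed
    return np.mean(np.abs(draws) >= abs(observed))

rng0 = np.random.default_rng(42)
z = np.repeat([0, 1], 20)                 # 20 control, 20 treated
y = 3.0 * z + rng0.normal(size=40)        # a large, real treatment effect
assert randomization_test(y, z) < 0.05    # the effect is detected
```

With interference, "no effect of my treatment" is no longer sharp because a unit's outcome depends on others' assignments; the conditioning mechanisms in the talk restrict the set of re-randomizations so exact tests like this one remain valid.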

 


Avi Feller (Goldman School of Public Policy, University of California, Berkeley)

Title: TBA

Abstract: TBA


Elizabeth Tipton (Teachers College, Columbia University)

Title: TBA

Abstract: TBA