Daniels Lab (Michael Daniels, SDS and IB)
Daniels's research group works on several important problems in biomedical big data. His current R01 includes an aim on causal inference using Bayesian nonparametric methods, including in the presence of many confounders. He is currently collaborating with Sebastien Haneuse (Harvard) (R01 application pending) on drawing causal inferences from large medical databases to assess the impact of bariatric surgery on Type II diabetes mellitus. Methodologically, his group is developing methods to address both selection bias and confounding bias (the former is often ignored). He is also working with Krista Vandenborne (University of Florida) on assessing various MR measures as biomarkers for disease progression in boys with Duchenne Muscular Dystrophy (funded R01). This study collects large amounts of imaging data that need to be efficiently stored and processed, and statistical models need to be developed to determine how well these markers capture (and predict) disease progression. Finally, he is working with a research group in Umeå, Sweden, on another imaging study, funded by a Swedish research agency, that assesses factors related to cognitive aging using data collected in the Betula study. Large amounts of fMRI data are collected, and methodological issues include nonignorable dropout and drawing causal inferences about how lifestyle and other factors relate to cognitive decline.
Mueller Lab (Peter Mueller, SDS and Math)
Mueller's research group works on Bayesian nonparametric (BNP) inference, Markov chain Monte Carlo (MCMC) methods, decision problems, and related applications in Bayesian biostatistics and bioinformatics. BNP models underlie many of the clustering and feature allocation methods used for big data; conversely, big data allows meaningful inference for the infinite-dimensional random quantities that are modeled with BNP priors. Big data also gives rise to interesting research challenges in MCMC methods. Many bioinformatics data sets, including popular -omics data sets and TCGA data, are large enough to require big data methods.
Scott Lab (James Scott, SDS)
Scott's research group explores new methods for addressing the computational challenges that arise in high-dimensional statistical inference problems. One particular line of work has focused on large-scale multiple testing problems, where strong protection against false discoveries is necessary to avoid being overwhelmed by noise. This issue is central to the analysis of modern biomedical data sets, which often involve simultaneously testing a large number of related null hypotheses (about genes, neurons, SNPs, brain regions, etc.). Existing approaches for controlling the proportion of false discoveries typically fail to account for the natural biological structure of the problem: for example, whether two genes are physically adjacent on the chromosome, whether two neurons have similar tuning curves, and so forth. Much of the group's recent work has focused on developing models for multiple testing, together with computationally efficient methods for inference, that are capable of leveraging this known biological structure to improve overall power while maintaining the same control over the false-discovery rate as existing methods.
Walker Lab (Stephen Walker, SDS and Math)
Walker's research group works on Bayesian nonparametric methods, with application areas focusing on medical statistics. Through the machine learning community, Bayesian nonparametrics has become one of the key tools adopted for the analysis and study of big data. Big data also often requires simulation strategies, for example the well-known Markov chain Monte Carlo methods, in order to learn about the hidden patterns inside the data; the study and implementation of such techniques is another area of the group's research.
Williamson Lab (Sinead Williamson, SDS)
Williamson's research group focuses on the construction and implementation of novel Bayesian nonparametric models. Since they remove the need to pre-specify model dimensionality, nonparametric models are a good match for large datasets that may grow in an online fashion. A major focus of this research is scaling inference in Bayesian nonparametric models to large datasets, in particular by allowing them to make use of large-scale distributed architectures. Bayesian nonparametric models have been used extensively in the biological and medical sciences; current work includes investigating models for predicting dietary choices based on the dietary logs of millions of users.