Trainees will rotate through at least two labs as part of this program. Research in the labs spans many diseases, including cancer, dementia, kidney disease, stroke, osteoarthritis, cardiovascular diseases, infectious diseases (influenza, SARS, and HIV), and obesity.

Mueller Lab (Peter Mueller, SDS and Math)

Mueller's research group works on Bayesian nonparametric (BNP) inference, Markov chain Monte Carlo (MCMC) methods, decision problems, and related applications in Bayesian biostatistics and bioinformatics. BNP models underlie many of the clustering and feature allocation methods used for big data; conversely, big data allows meaningful inference for the infinite-dimensional random quantities modeled by BNP priors. Big data also gives rise to interesting research challenges in MCMC methods. Many bioinformatics data sets, including popular -omics data sets and TCGA data, are large enough to require big data methods.

Scott Lab (James Scott, SDS)

Scott's research group explores new methods for addressing the computational challenges that arise in high-dimensional statistical inference problems. One particular line of work has focused on large-scale multiple testing problems, where strong protection against false discoveries is necessary to avoid being overwhelmed by noise. This issue is central to the analysis of modern biomedical data sets, which often involve simultaneously testing a large number of related null hypotheses (about genes, neurons, SNPs, brain regions, etc.). Existing approaches for controlling the proportion of false discoveries typically fail to account for the natural biological structure of the problem: for example, whether two genes are physically adjacent on the chromosome, whether two neurons have similar tuning curves, and so forth. Much of the group's recent work has focused on developing models for multiple testing, together with computationally efficient methods for inference, that leverage this known biological structure to improve overall power while maintaining the same control over the false discovery rate as existing methods.

Walker Lab (Stephen Walker, SDS and Math)

Walker's research group works on Bayesian nonparametric methods, with application areas focusing on medical statistics. Through the machine learning community, Bayesian nonparametrics has become one of the key tools for the analysis and study of big data. Learning about the hidden patterns in big data often requires simulation strategies, such as the well-known Markov chain Monte Carlo methods; the study and implementation of such techniques is another area of the group's research.

Williamson Lab (Sinead Williamson, SDS)

Williamson's research group focuses on the construction and implementation of novel Bayesian nonparametric models. Since they remove the need to pre-specify model dimensionality, nonparametric models are a good match for large datasets that may grow in an online fashion. A major focus of this research is scaling inference in Bayesian nonparametric models to large datasets, in particular by allowing them to make use of large-scale distributed architectures. Bayesian nonparametric models have been used extensively in the biological and medical sciences; current work includes investigating models for predicting dietary choices based on the dietary logs of millions of users.