Trainees will rotate through at least two labs as part of this program. The labs encompass research involving many diseases including cancer, dementia, kidney disease, stroke, osteoarthritis, cardiovascular diseases, infectious diseases (influenza, SARS, and HIV), and obesity.









Bajaj Lab (Chandrajit Bajaj, CS)

Bajaj's research group focuses on high-throughput and big data analytics to elucidate, predict, and validate molecular-molecular structural interactions for use in molecular therapeutics. Molecular sequence and structural imaging (1D, 2D) data come from experimental apparatus such as gene arrays, x-ray diffraction, and electron microscopy. These datasets are inherently noisy, and they must be computationally analyzed to construct 3D structural models, using highly regularized solutions to complicated inverse problems. The innovation in their approach is to always work with compressed data, both using and producing novel goal-directed image compression methods.

Bozic Lab (Kevin Bozic, DMS)

Dr. Bozic's clinical interests are in the management of patients with arthritis of the hip and knee, with an emphasis on primary and revision hip and knee replacement surgery. His research interests are broadly in the fields of health policy and health care services research, and specifically in the areas of healthcare technology assessment, cost-effectiveness analysis, shared medical decision making, and the impact of healthcare reform on cost and quality. His research focuses on the implementation and evaluation of novel, value-based healthcare delivery and payment models. He is also overseeing the development of integrated practice units (IPUs) for musculoskeletal conditions such as osteoarthritis. Currently, Dr. Bozic's team is working on defining different value-based payment models to make the IPU model feasible and to incentivize appropriate treatment decisions by the care team. The team also plans to establish a system for collecting data for a musculoskeletal disease registry. Dr. Bozic has established a joint replacement registry in California and has extensive experience in using large databases to conduct outcomes and comparative effectiveness analysis.

Davis Lab (Jaimie Davis, Nutrition)

Jaimie Davis is an assistant professor of nutritional sciences who joined the UT Austin faculty in 2012. Over the past 10 years, Davis's research group has focused on designing and implementing obesity interventions for low-income minority children and adolescents. She has extensive expertise in nutrition, physical activity, and body composition assessment in pediatric populations. Davis's research also involves developing and testing school- and community-based gardening and cooking programs targeting obesity prevention and treatment for low-income minority populations. Her work directly addresses the effects of behaviors (dietary and physical activity) and changes in these behaviors on adiposity parameters, type 2 diabetes, and cardiovascular risk factors in minority youth. Dr. Davis also recently submitted an NIH grant to develop a centralized, real-time Monitoring and Reaction (MORE) system that combines behavioral data (i.e., diet, exercise, sleep), insulin dosing, glucose levels, and emotional status to allow targeted and integrated communications with parents/caregivers and healthcare professionals in real time, and to test the effects on the child's glucose control and on the quality of life (QoL) and emotional status of the child and parent.

Dhillon Lab (Inderjit Dhillon, CS and Math)

Dhillon's research group focuses on developing novel solutions for big data analytics problems that arise in various modern applications. These include models, algorithms, and software that scale to very large data sets for classical analysis tasks, such as regression, classification, and dimensionality reduction, as well as newer analysis tasks, such as multi-task learning, matrix completion, and high-dimensional covariance estimation. In particular, they apply these analysis tasks to social network analysis, recommender systems, bioinformatics, and neuroscience. In a joint project with biologists, his group has developed new network-based methods that predict novel gene-disease associations based on a network of known associations between genes, human diseases, and phenotypes of model species. In a joint project with neuroscientists, his group has developed scalable methods to analyze large-scale fMRI data by estimating a sparse inverse covariance matrix using new techniques from high-dimensional statistical inference. More generally, his research group has been developing software tools for methods such as non-negative matrix factorization, inverse covariance selection, high-dimensional clustering, and co-clustering; these tools are being used in various applications that involve big data analytics.

Ellington Lab
Professor of Biochemistry
The Ellington lab works on the directed evolution of molecules and organisms, attempting to mold new phenotypes that go beyond what is naturally available. In this regard, there are a variety of Big Data projects that involve using the results of NextGen sequencing analyses to better understand what paths evolution can take (or be made to take). For example, in the directed evolution of polymerases with novel functions, such as the ability to synthesize extremely long tracts of DNA, there is both a great deal of natural phylogenetic data and data from previous directed evolution experiments that can be used to better craft evolutionary paths, ultimately leading to molecules with improved diagnostic and synthetic capabilities. Such projects have an amusing 'meta' component: by improving the molecules that are used to gather biological Big Data in the first place (DNA polymerases involved in NextGen sequencing experiments), you will essentially be giving Big Data the means to accelerate its rate of acquisition, especially via newer-generation single-molecule sequencing platforms. In another project, we have collected enormous amounts of sequencing data on the directed evolution of organismal genomes with altered genetic codes, and are attempting to understand how an entire organism moves into a completely new chemical space and takes advantage of novel amino acids that expand its chemical capabilities. As with experiments that focus on smaller segments of DNA (such as an individual polymerase gene, as above), the problem is to sieve, cull, and refine mutation data so that deleterious, neutral, and positive features of emerging evolutionary landscapes can be discerned. This problem ratchets up in difficulty for an entire genome, requiring the insights of data scientists.

Georgiou Lab (George Georgiou, Chemical Engineering and BME)

Georgiou's research group is working on the analysis of the immune receptor repertoire by proteomic and NextGen sequencing technologies. They have developed the only available platform for sequencing multiple transcripts (2 or 3) from single B cells at very high throughput, a technology that has enabled the determination of the VH:VL paired repertoire. They also developed technologies for the determination of the identity and relative quantitation of the polyclonal antibody repertoire in serum or secretions. They are currently applying these and other high-information-content methodologies to the analysis and mining of human immune responses following infection or vaccination. This work is generating very large datasets of VH:VL antibody sequences (and to a lesser extent TCR sequences). They are developing a cloud-based computational resource for the analysis of antibody repertoires and for delineating the evolution of antigen-specific antibody lineages in longitudinal samples.

Hofmann Lab (Johann Hofmann, IB)

Hofmann's research group uses sophisticated bioinformatics and statistical approaches to investigate how social interactions affect neural and behavioral phenotypes of individuals. Research in humans has quantitatively demonstrated that phenotypic outcomes and behavioral attitudes can indeed be predicted based on an individual's social connections, yet how interactions with other individuals affect brain and behavior phenotypes is not well understood. The neuromolecular mechanisms that regulate social behavior also remain poorly understood, even though neurodevelopmental disorders (e.g., autism spectrum disorders) severely affect social interactions. Trainees in the Hofmann lab use a variety of vertebrate model systems to investigate the endocrine and molecular mechanisms underlying social behavior within an integrative framework.

Huk Lab (Alex Huk, NS)

Huk's research group focuses on making large-scale recordings of neural activity across the primate cortex. This is done using both direct measurement (electrophysiological recordings in nonhuman primates) and indirect techniques (functional magnetic resonance imaging in humans). The electrophysiological work is unique in that they have developed techniques to record from multiple neurons in multiple brain areas, all at the same time, in primates performing complex behavioral tasks. This allows direct tests of hypotheses about the flow of information from one brain area to another during various controlled forms of perception, cognition, and action. However, the scale of these recordings also requires the development and application of statistical and analytic tools to make sense of these high-dimensional data. Likewise, the brain imaging work in his lab employs cutting-edge multiplexing sequences capable of yielding data at considerably higher spatial and temporal resolutions than conventional imaging approaches. As with the electrophysiology, these data are rich but also require new techniques to handle their scale.

Kirkpatrick Lab (Mark Kirkpatrick, IB)

Kirkpatrick's research group attacks the fundamental question of what forces drive evolutionary change of the genome from two sides: they develop mathematical models that generate quantitative hypotheses, and analyze genomic data to test those hypotheses. These models are fit to the data using a range of statistical frameworks including likelihood, Bayesian methods, and approximate Bayesian computation (ABC). There are interesting computational challenges here, as they are constantly fitting ever more complex (and realistic) evolutionary models to large data sets (e.g., 10^3 individuals each genotyped at 10^5 markers). A second research theme is developing methods to estimate quantitative genetic variances and covariances in natural and domestic populations. These parameters are important because they determine how rapidly species can adapt (in nature) and how fast they can be economically improved by selective breeding (in agriculturally important animals and crop plants). Here they are developing likelihood and Bayesian-like methods that are sufficiently efficient to work with large data sets, for example multi-generation pedigrees with hundreds of thousands of individuals.
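
As a rough illustration of the approximate Bayesian computation (ABC) framework mentioned above, the following minimal Python sketch runs rejection ABC on a toy problem: estimating an allele frequency from a small sample. The function name, the uniform prior, the binomial sampling model, and the tolerance are all assumptions chosen for illustration; this is not the lab's own model or code.

```python
import random


def abc_rejection(observed_count, n_sampled, n_draws=20000, tolerance=0, seed=0):
    """Toy rejection ABC for an allele frequency (hypothetical example).

    Draw a candidate frequency from a uniform prior, simulate a sample of
    n_sampled chromosomes, and keep the candidate only if the simulated
    allele count is within `tolerance` of the observed count.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        freq = rng.random()  # candidate drawn from the Uniform(0, 1) prior
        # Simulate the data: each sampled chromosome carries the allele
        # with probability `freq`.
        simulated = sum(rng.random() < freq for _ in range(n_sampled))
        if abs(simulated - observed_count) <= tolerance:
            accepted.append(freq)
    return accepted


# Accepted draws approximate the posterior of the allele frequency.
samples = abc_rejection(observed_count=7, n_sampled=10)
posterior_mean = sum(samples) / len(samples)
```

With a zero tolerance and a discrete summary, the accepted draws are samples from the exact posterior; realistic evolutionary models need summary statistics and a nonzero tolerance, which is where the approximation enters.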

Markey Lab (Mia Markey, BME)

Markey's research group develops decision support systems for clinical decision-making and scientific discovery. Her lab leverages signal processing, machine learning, and statistical methods in designing algorithms for data-driven, health-focused research. In addition to collaborations with other engineering researchers and clinical experts, Markey's group has close partnerships with colleagues in the behavioral sciences. For example, Dr. Markey leads a collaborative, multi-institutional team working towards the vision of a decision support system that will enable breast cancer patients, in consultation with their healthcare providers, to choose a reconstruction strategy with maximal potential to optimize psychosocial adjustment.

Meyers Lab (Lauren Ancel Meyers, IB and SDS)

For over a decade, Dr. Meyers's research group has been working in the field of mathematical epidemiology, pioneering network-based mathematical modeling of infectious disease transmission in human and animal populations. Collaborating with field ecologists, epidemiologists, and public health agencies around the globe, she and her research group at UT have applied these methods to diverse data sets to gain a better understanding of the dynamics of infectious diseases (in particular, pandemic influenza, Ebola, SARS, and HIV) and to develop effective strategies for surveillance, mitigation and conservation. Over the last six years, Dr. Meyers has led several large interdisciplinary research teams in developing decision-support tools for optimizing infectious disease surveillance systems and control policies. These projects network graduate students from diverse fields with state and national public health practitioners, and provide critical graduate training in translating basic science into practical applications.

Moran Lab (Nancy Moran, IB)

Dr. Nancy Moran studies genome evolution, primarily in bacteria and insects. A focus is on the processes leading to genomic divergence, including mutation, horizontal gene transfer, genetic drift, and natural selection. Projects in the lab include de novo sequencing and assembly of bacterial and insect genomes, metagenomics of bacterial communities, transcriptomic studies, comparative genomic studies, and a wide range of experiments on symbiotic bacteria that coevolve with their hosts.

Mueller Lab (Peter Mueller, SDS and Math)

Mueller's research group works on Bayesian nonparametric inference (BNP), Markov chain Monte Carlo methods (MCMC), decision problems and related applications in Bayesian biostatistics and bioinformatics. BNP models are at the base of many clustering and feature allocation methods that are being used for big data, and, vice versa, big data allows meaningful inference for the infinite dimensional random quantities that are being modeled in BNP priors. Big data gives rise to interesting research challenges in MCMC methods. Many bioinformatics data sets, including popular -omics data sets and TCGA data, are large data sets that require big data methods.

Press Lab (William Press, CS and IB)

Press's research group works on large-data problems in biology using a variety of signal-processing and statistical algorithms. Examples include the development of a new preparation protocol and informatics pipeline that can achieve an order of magnitude higher accuracy from next-generation sequencing; studies of the comparative genomics of ultra-conserved noncoding DNA in mammals; comparative genomics of multiple Chiroptera species that may be reservoirs of Ebola hemorrhagic fever; study of possible new protocols for adaptive clinical trials based on Bayesian bandit problems; and informatics pipelines for shotgun proteomics and mass spectrometry.

Preston Lab (Alison Preston, NS)

Preston's research group uses a combination of behavioral and human brain imaging techniques to explore how we form new memories, how we remember past experiences, and how our memories for the past influence what we learn in the present. In particular, Dr. Preston's work has focused on characterizing the functional role of the human hippocampus and its interactions with prefrontal, parietal and sensory cortices during behaviors that rely on memory. Her lab has brought several new paradigms and techniques to bear on these questions, including high-resolution functional magnetic resonance imaging (fMRI), which affords more precise visualization of the detailed structure of the human brain, including hippocampal sub-fields. The lab has also combined these cutting-edge fMRI acquisition methods with machine learning techniques to decode when individuals retrieve specific memory content in service of decision making. In particular, students receive practical training on sophisticated fMRI data analysis techniques that utilize the high-performance computing resources provided by the Texas Advanced Computing Center.

Sacks Lab (Michael Sacks, BME)

Sacks's research group is internationally renowned for its work on cardiovascular biomechanics, with a focus on the quantification and simulation of the structure-mechanical properties of native and engineered cardiovascular soft tissues. His group also works on the mechanical behavior and function of heart valves, including the development of the first constitutive models for these tissues using a structural approach; on the biomechanics of engineered tissues; and on understanding the in-vitro and in-vivo remodeling processes from a functional biomechanical perspective. The research includes multi-scale studies of cell/tissue/organ mechanical interactions in heart valves and aims to determine the local stress environment for heart valve interstitial cells. Recent research has included developing novel constitutive models of right ventricular myocardium that allow for the individual contributions of the myocyte and connective tissue networks.

Scott Lab (James Scott, SDS)

Scott's research group explores new methods for addressing the computational challenges that arise in high-dimensional statistical inference problems. One particular line of work has focused on large-scale multiple testing problems, where strong protection against false discoveries is necessary to avoid being overwhelmed by noise. This issue is central to the analysis of modern biomedical data sets, which often involve simultaneously testing a large number of related null hypotheses (about genes, neurons, SNPs, brain regions, etc.). Existing approaches for controlling the proportion of false discoveries typically fail to account for the natural biological structure of the problem: for example, whether two genes are physically adjacent on the chromosome, whether two neurons have similar tuning curves, and so forth. Much of Scott's group's recent work has focused on developing models for multiple testing, together with computationally efficient methods for inference, that are capable of leveraging this known biological structure to improve overall power, while maintaining the same control over the false-discovery rate as existing methods.
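
For readers unfamiliar with false-discovery-rate control, the sketch below shows the classical Benjamini-Hochberg step-up procedure in Python. It is offered only as an example of the kind of existing, structure-agnostic approach the paragraph above contrasts with; it is not the Scott group's method, and the function name and example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding null hypothesis is rejected at FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten simultaneous tests.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_vals, alpha=0.05)
```

Note that the procedure treats the hypotheses as exchangeable: it uses only the ranked p-values and ignores any known relationship among the tests, such as adjacency of genes on a chromosome.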

Walker Lab (Stephen Walker, SDS and Math)

Walker's research group works on Bayesian nonparametric methods, with application areas focusing on medical statistics. Through the machine learning community, Bayesian nonparametrics has become one of the key tools adopted for the analysis and study of big data. Big data also often requires simulation strategies, for example the well-known Markov chain Monte Carlo methods, in order to learn about the hidden patterns inside the data; another area of the group's research is the study and implementation of such techniques.

Warach Lab (Steven Warach, DMS and Seton Healthcare Family)

Warach's research group focuses on clinical trials and large international registries related to the diagnosis, prevention, and treatment of stroke. The group conducts MRI biomarker and therapeutic studies of ischemic stroke, as well as clinical trials testing novel reversal agents for newer oral anti-coagulants in patients suffering anti-coagulant-induced hemorrhages. Warach is national co-PI of MR WITNESS, an NINDS-funded clinical trial of tPA using MRI markers to select patients. Through leadership roles in multi-center trials, Warach has access to large clinical research databases that would be accessible to the trainees. One example is the Virtual International Stroke Trials Archive (VISTA), a database of over 80,000 individual patient records from large clinical trials and patient registries. Warach also chairs and hosts at the University of Texas at Austin the Stroke Imaging Research Repository (STIR), a VISTA affiliate that includes source MRI data from clinical trials and registries.

Wilke Lab (Claus Wilke, IB)

Wilke's research group carries out computational research in protein biochemistry, molecular evolution, and systems biology. At present, the lab pursues two major research directions: first, the evolution of protein-coding genes, in particular as applied to virus evolution and viral host-range shifts, using existing and novel computational methods such as maximum-likelihood and Bayesian statistics, computational protein design, and all-atom molecular dynamics simulations; second, the development of statistical and mechanistic models of bacterial metabolism. The goal in this research is to predict how bacterial metabolism changes when bacteria are grown in different environments.

Williamson Lab (Sinead Williamson, SDS)

Williamson's research group focuses on the construction and implementation of novel Bayesian nonparametric models. Since they remove the need to pre-specify model dimensionality, nonparametric models are a good match for large datasets that may grow in an online fashion. A major focus of this research is scaling inference in Bayesian nonparametric models to large datasets, in particular by allowing them to make use of large-scale distributed architectures. Bayesian nonparametric models have been used extensively in the biological and medical sciences; current work includes investigating models for predicting dietary choices based on the dietary logs of millions of users.