Spring 2019 Colloquia: Graduate Portfolio in Applied Statistical Modeling






Lara Heersema April 10 GDC 1.406 "Optimizing iron oxide nanoparticles for biomedical applications by statistical experimental design"
Chia-Hui Liu May 13 11:30am-12pm GDC 7.402 "Tackling Residents’ Issues Using Austin 311 Service Requests"
Ashlee Frandell May 13 1-1:30pm GDC 7.402 "Online School Success: Key Predictors for Graduation"
Yizhen Wang May 13 1:30-2pm GDC 7.402 "Prediction of Mother’s Affect via Daily Activities"
Tianheng Feng May 13 2-2:30pm GDC 7.402 "Design Optimization of Bottom-Hole Assembly Using Genetic Algorithm to Reduce Drilling Vibration"
Dariya Sydykova May 13 2:30-3pm GDC 7.402 "Theory of measurement for site-specific evolutionary rates in amino-acid sequences"
Matt Lehrer May 14 1-1:30pm GDC 7.402 "Longitudinal Associations of Daily Stressor Frequency and Severity with Diurnal Cortisol Slopes and Cardiometabolic Conditions"
Shuning Lu May 14 1:30-2pm GDC 7.402 "A multilevel analysis of Internet’s effect on collective action in China"
Soovadeep Bakshi May 14 2-2:30pm GDC 7.402 "Fast Scheduling of Autonomous Mobile Robots under Task Space Constraints with Priorities"
Joseph O'Brien May 14 2:30-3pm GDC 7.402 "Beliefs in Affective Insight Predict Emotional Inertia and Self-Esteem Integration: A Random Cross-Lag Panel Design with Accompanying Simulation Data"

Lara Heersema

PhD student in Biomedical Engineering, advised by Dr. Claus Wilke

Title: "Optimizing iron oxide nanoparticles for biomedical applications by statistical experimental design"

Abstract: Iron oxide nanoparticles hold great potential for biomedical applications but are limited by a lack of understanding regarding efficient and scalable synthesis of monodisperse iron oxide nanoparticles. The important variables in the two-step high-temperature thermal decomposition of iron (III) oleate were identified and optimized using statistically designed experiments (DOE). Important variables in the thermal decomposition reaction of iron oxide nanoparticles were identified through statistical analysis of previously reported synthesis parameters and nanoparticle sizes. A definitive screening design (DSD) was used to evaluate six important synthesis parameters. The DSD model incorporated multiple factors at 2 or 3 levels with only 2k+1 experiments, where k is the number of factors. This allows for more efficient use of resources and time to build a better understanding of nanoparticle synthesis reactions compared to traditional one-at-a-time or fractional factorial studies. Forward regression was used with data generated according to the DSD to predict model coefficients. Equations were developed to predict nanoparticle hydrodynamic size based on synthesis parameters. These equations were compared with independent validation synthesis reactions. High-temperature thermal decomposition synthesis reaction time was the most influential synthesis parameter in dictating nanoparticle size. Other important parameters were thermal decomposition synthesis temperature, iron oleate generation atmosphere, and interactions between iron oleate generation atmosphere and drying conditions. These results agree well with overall trends for nanoparticle sizes reported in the literature synthesized from iron (III) oleate. The results of this study demonstrate the power of DOEs in identifying important parameters for nanoparticle synthesis in relatively few experiments and how studying these reaction parameters can be used to provide insight into nanoparticle formation.
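As a rough illustration of the run-count argument above, the following sketch compares the 2k+1 figure quoted in the abstract with the size of full factorial designs at 2 and 3 levels. The function names are hypothetical, and the comparison is only arithmetic; it does not construct an actual design matrix.

```python
def dsd_runs(k):
    """Run count quoted in the abstract for a definitive screening design."""
    return 2 * k + 1


def full_factorial_runs(k, levels):
    """Run count for a full factorial design with the given number of levels."""
    return levels ** k


k = 6  # six synthesis parameters, as in the abstract
print("DSD runs:", dsd_runs(k))
print("2-level full factorial runs:", full_factorial_runs(k, 2))
print("3-level full factorial runs:", full_factorial_runs(k, 3))
```

For six factors this gives 13 runs for the screening design, against 64 and 729 runs for the 2-level and 3-level full factorials.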


Chia-Hui Liu

Title: "Tackling Residents’ Issues Using Austin 311 Service Requests"

Abstract: Today, over 55% of people (4.2 billion) live in urban areas, and the urban population is expected to double by 2050. Although urbanization can increase a city’s prosperity, urban population growth can degrade quality of life through noise, pollution, traffic, and poor public security. In addition, a rapidly growing urban environment may further reduce residents’ quality of life by increasing stress and other urban hazards. The 311 service, which started as a non-emergency line, aims to align government priorities with residents’ concerns in order to improve the urban living environment. Numerous studies have demonstrated that 311 requests can help enhance the well-being of urban areas. However, most of these studies focus on a single perspective, such as noise or pollution. In this study, a more comprehensive analysis of Austin 311 service requests will be conducted using statistical methods, including regression and clustering. The Austin 311 service request data were collected from 2013 to 2018 via the Austin Open Data Portal. The ultimate goal of this study is to investigate Austin 311 service requests in order to better align government priorities with residents’ concerns.


Ashlee Frandell

Title: "Online School Success: Key Predictors for Graduation"

Abstract: Public schools are beginning to use data to identify students at risk of dropping out to better provide support. There are many possible predictors of high school graduation given available data. This paper sets out to determine the primary predictors of high school graduation from a sample of public online schools. Due to the format of the data, I modeled student probability of graduation using survival analysis. The data is censored, as some students take longer than four years to graduate or leave the schools before graduation due to the nature and flexibility of online schools. I expect to find that grades and school participation greatly impact the probability of graduating and that earlier grade level variables are more accurate predictors of graduation.
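The abstract does not say which survival model was used. As a hedged illustration of how censored records enter a survival estimate, the sketch below computes a Kaplan-Meier curve in plain Python. The function name and the toy enrollment records are hypothetical and are not taken from the study's data. Here the "event" is graduation, so the curve gives the estimated probability of not yet having graduated by each time point.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    durations: time each student was followed (e.g. years enrolled)
    observed:  True if graduation was observed, False if the record is censored
               (e.g. the student left or is still enrolled)
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Students still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Graduations observed exactly at time t
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): years enrolled and whether graduation was observed
years = [4, 4, 5, 3, 6]
graduated = [True, True, True, False, False]
for t, s in kaplan_meier(years, graduated):
    print(f"year {t}: estimated P(not yet graduated) = {s:.2f}")
```

Censored students still count in the at-risk set up to the time they leave observation, which is why they are kept instead of being dropped from the sample.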


Yizhen Wang

Title: "Prediction of Mother’s Affect via Daily Activities"

Abstract: The project examines day-to-day activities and interactions between mothers and their infants, as part of the Daily Activity Lab of the Institute for Mental Health Research. To examine these interactions, this paper uses three data sources: motion data captured using a Movie sensor, audio data captured using a Lena sensor, and EMA (ecological momentary assessment) data from daily questionnaires. Using these data, the project trained a model to analyze the factors that affect a mother’s mood. Data were collected from 23 mothers and their infants. To ensure data accuracy, I applied normalization and regular expressions and added a threshold. To identify factors that affect a mother’s mood, I extracted several features, such as the mother’s sleep quality, the baby’s total sleep hours, the baby’s total motion over the preceding several hours, and the baby’s crying over the preceding several hours. Correlation analysis of the relationships among predictors revealed some related variables; for example, the baby’s overnight total sleep hours are highly correlated with the mother’s overnight total sleep hours. Finally, because participant is a random factor and the extracted features are fixed factors, I used a mixed linear model to predict the mother’s mood.


Tianheng Feng

Title: "Design Optimization of Bottom-Hole Assembly Using Genetic Algorithm to Reduce Drilling Vibration"

Abstract: Drilling vibration is considered one of the major undesirable drill-string dynamics. This paper establishes a framework to optimize the design of the bottom-hole assembly (BHA) such that the BHA structure is resilient against vibration. First, a high-fidelity BHA model is established using the finite element method (FEM), which considers the buckling effects caused by the weight on bit (WOB) and the contact between stabilizers and wellbore. Then, vibration indices are derived using the FEM model to evaluate BHA vibration. According to a statistical study, the BHA vibration indices have high covariances with drilling vibrations and can therefore be used to quantify the BHA anti-vibration level. The stabilizer positions determine these indices by influencing the boundary conditions; therefore, designing stabilizer positions to reduce drilling vibration can be formulated as an optimization problem that minimizes the BHA vibration indices over the operational range. The cost function is non-convex within the feasible domain and cannot be expressed explicitly in terms of stabilizer positions. Therefore, a stochastic optimization method, the Genetic Algorithm (GA), is selected to solve the non-convex problem, and parallel computation is implemented to expedite the computation. For model verification, a finite element analysis of a BHA is conducted and compared against analytical solutions and existing literature, showing good agreement. A production BHA is optimized following the proposed method. According to Monte Carlo simulations, the GA can optimize the stabilizer positions with high accuracy and low computational cost. The strain energy and stabilizer side force of the redesigned BHA are significantly reduced compared with the original design, which results in much better BHA dynamic performance and less drilling vibration.
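To make the optimization step more concrete, here is a minimal genetic-algorithm sketch in plain Python. It is only an illustration under stated assumptions: `toy_cost` is a hypothetical non-convex stand-in, not the FEM-based vibration index from the abstract, and the operators (tournament selection, uniform crossover, Gaussian mutation, elitism), the parameter values, and all names are choices made for this example, not details reported by the author.

```python
import math
import random


def toy_cost(positions):
    # Hypothetical non-convex stand-in for a vibration index (NOT the FEM-based cost).
    return sum(math.sin(3.0 * p) + 0.1 * (p - 5.0) ** 2 for p in positions)


def genetic_minimize(cost, n_vars, bounds, pop_size=40, generations=60,
                     mutation_scale=0.3, seed=0):
    """Minimize `cost` over `n_vars` bounded variables with a simple GA."""
    rng = random.Random(seed)
    lo, hi = bounds
    population = [[rng.uniform(lo, hi) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    best = min(population, key=cost)

    def pick():
        # Tournament selection: the better of two random individuals
        a, b = rng.sample(population, 2)
        return a if cost(a) < cost(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            # Uniform crossover: each gene comes from either parent
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Gaussian mutation, clipped to the feasible range
            child = [min(hi, max(lo, g + rng.gauss(0.0, mutation_scale)))
                     for g in child]
            children.append(child)
        population = children
        best = min(population, key=cost)
    return best, cost(best)


# Three hypothetical "stabilizer positions" on an arbitrary 0-10 scale
best, best_cost = genetic_minimize(toy_cost, n_vars=3, bounds=(0.0, 10.0))
print("best positions:", [round(p, 3) for p in best])
print("best cost:", round(best_cost, 4))
```

Because each candidate's cost is evaluated independently, the evaluations within a generation are the natural place to apply the parallel computation the abstract mentions.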




Dariya Sydykova

Title: "Theory of measurement for site-specific evolutionary rates in amino-acid sequences"

Abstract: In the field of molecular evolution, we commonly estimate site-specific evolutionary rates from alignments of amino-acid sequences. For example, catalytic residues in enzymes and interface regions in protein complexes can be inferred from observed relative rates. While numerous approaches exist to estimate amino-acid rates, it is not entirely clear what physical quantities the inferred rates represent and how these rates relate to the underlying fitness landscape of the evolving proteins. Further, amino-acid rates can be estimated in the context of different amino-acid exchangeability matrices, such as JTT, LG, or WAG, and again it is not well understood how the choice of the matrix influences the physical interpretation of the inferred rates. Here, we develop a theory of measurement for site-specific evolutionary rates, by analytically solving the maximum-likelihood equations for rate inference performed on sequences evolved under a mutation-selection model. We demonstrate that for realistic analysis settings the measurement process will recover the true expected rates of the mutation-selection model if rates are measured relative to a naive exchangeability matrix, in which all exchangeabilities are equal to 1/19. We also show that rate measurements using other matrices are quantitatively close but in general not mathematically equivalent. Our results demonstrate that insights obtained from phylogenetic-tree inference do not necessarily apply to rate inference, and best practices for the former may be deleterious for the latter.



Matt Lehrer

Title: "Longitudinal Associations of Daily Stressor Frequency and Severity with Diurnal Cortisol Slopes and Cardiometabolic Conditions"

Abstract: The unique roles that stressor frequency and severity play in the contribution to cardiovascular and metabolic disease are poorly understood, as is the interaction of these stressor characteristics with potentially beneficial psychosocial attributes in the context of cardiometabolic health. The purpose of this study was to examine prospective associations of daily stressor frequency and severity with prevalence of cardiometabolic health conditions, along with potential mediating and moderating influences of diurnal cortisol slopes and psychosocial resilience resources. Participants (N = 1,333) from the Midlife in the United States (MIDUS) study completed questionnaires at MIDUS 2 (2004-2005) assessing cardiometabolic conditions (heart disease, stroke, hypertension, type 2 diabetes, hypercholesterolemia, and obesity) and resilience resources (optimism, self-esteem, social integration, purpose in life, positive reappraisal). Participants then completed a daily diary (2004-2009) in which they reported stressors and perceived stressor severity each day for eight consecutive days, and provided saliva samples for cortisol analysis on days 2-5. At MIDUS 3, participants completed another questionnaire assessing cardiometabolic conditions (2013-2015). Structural equation modeling estimated effects of stressor frequency and severity on the diurnal cortisol slope and MIDUS 3 cardiometabolic conditions, and the moderating effect of resilience resources on those associations. Daily stressor severity and flattened cortisol slopes were associated with greater prevalence of cardiometabolic conditions at MIDUS 3. Resilience resources did not moderate any associations of daily stressor frequency or severity with the cortisol slope or MIDUS 3 cardiometabolic conditions. Stressor severity, rather than frequency of stressor occurrence, may be uniquely associated with risk for cardiovascular and metabolic disease later in life.




Shuning Lu

Title: "A multilevel analysis of Internet’s effect on collective action in China"

Abstract: Integrating rational choice theory with the perspective of information ecology, this paper investigates how new media affect collective action in China. Using both survey data and government statistics, I specify multilevel models to estimate the effects of Internet usage and penetration on individuals’ propensity to participate in collective action. The findings show that the Internet affects collective action propensity through individual-level usage rather than aggregate-level penetration. Implications and future directions are discussed.




Soovadeep Bakshi

Title: "Fast Scheduling of Autonomous Mobile Robots under Task Space Constraints with Priorities"

Abstract: The use of Autonomous Mobile Robots (AMRs) for fast and efficient manufacturing has attracted the interest of academia and industry in recent times, especially due to significant improvements in computational efficiency. However, one of the biggest challenges in terms of controls is the optimal task assignment and scheduling of these AMRs in order to finish the assigned tasks as quickly as possible, taking into account the priority of the tasks. Since there are multiple AMRs and tasks have various priority levels, tasks have to be assigned to each robot with a good spread of task priorities. Once the tasks are assigned, the order in which the tasks are to be performed by each AMR has to be determined. Each task involves transporting tools/materials from one point (pick-up) to another point (drop-off) on the factory floor, and therefore the cost of performing a task can be defined in terms of the time/energy spent in completing the task. The additional challenge is the addition of a priority level to tasks, which can be considered as a constraint on the ordering of tasks for each AMR. The need for real-time algorithms to solve this problem renders exhaustive search algorithms inappropriate, since their focus is on the accuracy of the solution without considering time constraints. This paper explores two methods to solve the single AMR scheduling problem more efficiently. First, a model-based learning technique is used with reinforcement learning-based methods. Second, this research effort proposes a gradient-based real-time approach for the scheduling problem based on a mathematical formulation in the structure of a regularized quadratic program. These scheduling algorithms are shown to perform better than a simulated annealing-based pairwise exchange technique, which is a commonly used heuristic method, in terms of a defined cost metric. Therefore, the proposed algorithms allow for the generation of efficient real-time solutions to the scheduling problem for a single AMR in a prioritized task space.




Joseph O'Brien

Title: "Beliefs in Affective Insight Predict Emotional Inertia and Self-Esteem Integration: A Random Cross-Lag Panel Design with Accompanying Simulation Data"

Abstract: The development of self-esteem is most strongly influenced by experiences that are interpreted as more strongly self-relevant (Demarree et al., 2007). Similarly, a focus on self-relevance of emotional experiences may be likely to produce emotional inertia, the increased persistence of emotions over time (Koval et al., 2012). Both facets are predictors of depression. This research tested in adolescents a novel lay theory regarding whether salient affective cues (e.g., particularly intense emotional experiences) are to be regarded as revealing certain and relevant information about the self. Working from a balanced random subsample across 3 high schools from a multi-school longitudinal study (N = 198, 57% female, full sample to be tested later), we examined the ability of a measure of the lay theory obtained from a baseline assessment to moderate autoregressive and cross-lag pathways in a subsequent 10-day daily diary cross-lag design between daily sadness and daily self-esteem (Berry & Willoughby, 2017). Analysis employed a random-intercepts latent cross-lag design, which allows testing the effect of daily emotion that is subjectively more intense relative to the person mean. Results indicated that lay theory endorsement increased the effects of daily intense sadness on next-day self-concept (self-esteem influence), as well as sadness on next-day sadness (emotional inertia), with both pathways significant only where lay theory endorsement was high. To address concerns over biased parameter estimates in autoregressive designs (Hamaker & Grasman, 2015), this was supplemented by 10k iterated random-data simulations testing manifest-centered and latent random intercept cross-lag models, showing that the latent approach avoids downwardly-biased autoregression and cross-lag parameter estimates.