Seminar Series - Dr. Baharan Mirzasoleiman

Date: Friday, March 10, 2023
Time: 2:00 pm – 3:00 pm
Location: Virtual
Featured Speaker(s): Baharan Mirzasoleiman
Cost: Free
The Department of Statistics and Data Sciences at UT Austin presents its Spring 2023 Seminar Series with speaker Dr. Baharan Mirzasoleiman.


The Spring 2023 SDS Seminar Series continues on Friday, March 10th from 2:00 p.m. to 3:00 p.m. with Dr. Baharan Mirzasoleiman (Assistant Professor at the University of California, Los Angeles). This event is virtual.

Title: Coresets for Efficient and Robust Learning from Massive Datasets

Abstract: Large datasets have been crucial to the success of modern machine learning models. However, training on massive data has two major limitations. First, it depends on exceptionally large and expensive computational resources and incurs substantial costs due to its high energy consumption. Second, in many real-world applications such as medical diagnosis, self-driving cars, and fraud detection, big data often contains highly imbalanced classes, noisy labels, and malicious data points. In such cases, training on the entire dataset does not yield a high-quality model.

In this talk, I will argue that we can address these limitations by developing techniques that identify and extract the most informative subsets for learning from massive datasets. Training on such subsets not only reduces the substantial costs of learning from big data but also improves the accuracy of the resulting models and their robustness against noisy labels and data poisoning attacks. I will discuss how to develop effective and theoretically rigorous techniques that provide strong guarantees for the learned models' quality and robustness against noisy labels. I will discuss this problem in both supervised and unsupervised settings.



Please contact for the Zoom link.