Diversity Maximization over Large Data Sets
Event starts on this day
Oct
28
2022
Featured Speaker(s):
Sepideh Mahabadi
Description
The Fall 2022 SDS Seminar Series continues on Friday, October 28th from 2:00 p.m. to 3:00 p.m. with Dr. Sepideh Mahabadi (Senior Researcher in the Algorithms group at Microsoft Research). This event is in person, but a virtual option will be available as well.
Title: Diversity Maximization over Large Data Sets
Abstract:
In this talk, we consider efficient construction of "composable core-sets" for the task of diversity maximization. A core-set is a subset of the data set that is sufficient for approximating the solution on the whole data set. A composable core-set is a core-set with the composability property: given a collection of data sets, the union of the core-sets for all data sets in the collection should be a core-set for the union of the data sets. Using composable core-sets, one can obtain efficient solutions to a wide variety of massive data processing applications, including distributed computation (e.g. the Map-Reduce model), streaming algorithms, and similarity search.
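To make the composability property concrete, here is a minimal Python sketch. It assumes the "minimum pairwise distance" notion of diversity and uses the standard farthest-point greedy heuristic to summarize each partition; the function names (`greedy_farthest_points`, `composed_solution`) are hypothetical and chosen for illustration, and this is not necessarily the construction presented in the talk.

```python
import math
from itertools import combinations


def greedy_farthest_points(points, k):
    """Pick up to k points, each time taking the point farthest from those already chosen."""
    if not points:
        return []
    chosen = [points[0]]  # arbitrary starting point
    while len(chosen) < min(k, len(points)):
        # Score each candidate by its distance to the nearest chosen point.
        best = max(points, key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(best)
    return chosen


def min_pairwise_distance(points):
    """Diversity measure: the smallest distance between any two selected points."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))


def composed_solution(partitions, k):
    """Summarize each partition independently, then solve on the union of the summaries."""
    # Each core-set could be computed on a separate machine (assumption for illustration).
    core_sets = [greedy_farthest_points(part, k) for part in partitions]
    union = [p for core_set in core_sets for p in core_set]
    return greedy_farthest_points(union, k)


partitions = [
    [(0, 0), (1, 0), (0, 1)],
    [(10, 10), (11, 10), (10, 11)],
]
selection = composed_solution(partitions, k=2)
```

The final step only ever sees the union of the small per-partition summaries, never the full data, which is what makes this pattern a natural fit for distributed and streaming settings.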
The notion of diversity can be captured using several measures such as "minimum pairwise distance" and "sum of pairwise distances". In this talk, I will focus on the "determinant maximization" problem, which has recently gained a lot of interest for modeling diversity. We present algorithms that are simple to implement and achieve an almost optimal approximation guarantee. We further show their effectiveness on standard data sets.
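As a rough illustration of the determinant objective, the sketch below greedily selects k vectors so that the determinant of their Gram matrix (equivalently, the squared volume of the parallelepiped they span) is large. It is a simple greedy heuristic written in plain Python under assumed names (`greedy_max_volume` is hypothetical); it is not claimed to be the algorithm from the talk, and it carries no guarantee beyond what is stated in the comments.

```python
import math


def greedy_max_volume(vectors, k):
    """Greedily choose up to k vectors; return (chosen indices, Gram determinant of the chosen set)."""
    # Residuals hold each vector's component orthogonal to the vectors chosen so far.
    residuals = [list(v) for v in vectors]
    chosen, sq_volume = [], 1.0
    for _ in range(min(k, len(vectors))):
        sq_norms = [sum(x * x for x in r) for r in residuals]
        i = max(range(len(residuals)), key=sq_norms.__getitem__)
        if sq_norms[i] <= 1e-12:
            break  # every remaining vector lies in the span of the chosen ones
        chosen.append(i)
        # The Gram determinant is the product of the squared residual norms.
        sq_volume *= sq_norms[i]
        norm = math.sqrt(sq_norms[i])
        unit = [x / norm for x in residuals[i]]
        # Gram-Schmidt step: remove the new direction from every residual.
        for r in residuals:
            dot = sum(a * b for a, b in zip(r, unit))
            for j in range(len(r)):
                r[j] -= dot * unit[j]
    return chosen, sq_volume


vectors = [(1, 0, 0), (0.9, 0.1, 0), (0, 2, 0), (0, 0, 3)]
indices, gram_det = greedy_max_volume(vectors, k=2)
```

Near-parallel vectors such as the first two in this example contribute little extra volume once one of them is chosen, which is why a large determinant is used as a proxy for a diverse selection.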