GIST: Greedy Independent Set Thresholding for Max-Min Diversification with Submodular Utility

Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia DeSalvo

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

This work studies a novel subset selection problem called *max-min diversification with monotone submodular utility* (MDMS), which has a wide range of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal of MDMS is to maximize $f(S) = g(S) + \lambda \cdot \text{div}(S)$ subject to a cardinality constraint $|S| \le k$, where $g(S)$ is a monotone submodular function and $\text{div}(S) = \min_{u,v \in S : u \ne v} \text{dist}(u,v)$ is the *max-min diversity* objective. We propose the `GIST` algorithm, which gives a $\frac{1}{2}$-approximation guarantee for MDMS by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove that it is NP-hard to approximate within a factor of $0.5584$. Finally, we show in our empirical study that `GIST` outperforms state-of-the-art benchmarks for a single-shot data sampling task on ImageNet.