Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers

Ziyi Fang, Lingxiao Huang, Runkai Yang

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction. A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $\varepsilon$. Given an outlier count $m$, we construct a coreset of size $\tilde{O}(\varepsilon^{-2} \cdot \min \\{ \varepsilon^{-2}, d \\})$ when $n \geq 4m$, eliminating the $O(m)$ dependency present in prior work [Huang et al., 2022 & 2023]. For the special case of $d = 1$, we achieve an optimal coreset size of $\tilde{\Theta}(\varepsilon^{-1/2} + \frac{m}{n} \varepsilon^{-1})$, revealing a clear separation from the vanilla case studied in [Huang et al., 2023; Afshani and Chris, 2024]. Our results further extend to robust $(k,z)$-clustering in various metric spaces, eliminating the $m$-dependence under mild data assumptions. The key technical contribution is a novel non-component-wise error analysis, enabling substantial reduction of outlier influence, unlike prior methods that retain them. Empirically, our algorithms consistently outperform existing baselines in terms of size-accuracy tradeoffs and runtime, even when data assumptions are violated across a wide range of datasets.