Navigating the MIL Trade-Off: Flexible Pooling for Whole Slide Image Classification

Hossein Jafarinia, Danial Hamdi, Amirhossein Alamdar, Elahe Zahiri, Soroush Vafaie Tabar, Alireza Alipanah, Nahal Mirzaie, Saeed Razavi, Amir Najafi, Mohammad Hossein Rohban

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

Multiple Instance Learning (MIL) is a standard weakly supervised approach to Whole Slide Image (WSI) classification, where performance hinges on both the feature representation and the MIL pooling strategy. Recent research has focused predominantly on Transformer-based architectures adapted to WSIs. We argue, however, that this trend faces a fundamental limitation: data scarcity. In typical settings, Transformer models yield only marginal gains without access to large-scale datasets, resources that are virtually inaccessible to all but a few well-funded research labs. Motivated by this, we revisit simple, non-attention MIL with unsupervised slide features and analyze temperature-$\beta$-controlled log-sum-exp (LSE) pooling. For slides partitioned into $N$ patches, we show theoretically that LSE undergoes a smooth transition at a critical threshold $\beta_{\mathrm{crit}}=\mathcal{O}(\log N)$, interpolating between mean-like aggregation (stable, with better generalization but lower sensitivity) and max-like aggregation (more sensitive but with looser generalization bounds). Grounded in this analysis, we introduce Maxsoft, a novel MIL pooling function that offers flexible control over this trade-off, allowing adaptation to specific tasks and datasets. To further address real-world deployment challenges such as specimen heterogeneity, we propose PerPatch augmentation, a simple yet effective technique that enhances model robustness. Empirically, Maxsoft achieves state-of-the-art performance in low-data regimes across four major benchmarks (CAMELYON16, CAMELYON17, TCGA-Lung, and SICAP-MIL), often matching or surpassing large-scale foundation models. Combining Maxsoft with PerPatch augmentation yields further gains through increased robustness. Code is available at \href{https://github.com/jafarinia/maxsoft}{\texttt{https://github.com/jafarinia/maxsoft}}.
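For illustration, the following is a minimal sketch (not the authors' released code) of temperature-$\beta$-controlled LSE pooling over a vector of hypothetical patch scores. With the $1/N$ normalization inside the logarithm, the pooled value approaches the mean as $\beta \to 0$ and the max as $\beta \to \infty$.

```python
# Minimal sketch of temperature-beta-controlled LSE pooling (illustrative only,
# not the authors' implementation). With the 1/N normalization inside the log,
# lse_pool -> mean(scores) as beta -> 0 and -> max(scores) as beta -> infinity.
import numpy as np

def lse_pool(scores: np.ndarray, beta: float) -> float:
    """LSE_beta(s) = (1/beta) * log((1/N) * sum_i exp(beta * s_i))."""
    z = beta * scores
    m = z.max()  # shift by the max before exponentiating, for numerical stability
    return (m + np.log(np.exp(z - m).mean())) / beta

scores = np.array([0.1, 0.2, 0.9, 0.3])  # hypothetical patch scores
for beta in (1e-3, 1.0, 10.0, 1e3):
    print(f"beta={beta:g}: {lse_pool(scores, beta):.4f}")
# beta near 0  -> ~0.375 (mean-like aggregation)
# beta large   -> ~0.9   (max-like aggregation)
```

Note that in the stabilized form above the pooled value differs from the true max by at most $(\log N)/\beta$, which is consistent with the transition scale $\beta_{\mathrm{crit}}=\mathcal{O}(\log N)$ analyzed in the paper.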