STaRFormer: Semi-Supervised Task-Informed Representation Learning via Dynamic Attention-Based Regional Masking for Sequential Data

Maximilian Forstenhäusler, Daniel Külzer, Christos Anagnostopoulos, Shameem Puthiya Parambath, Natascha Weber

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

Understanding user intent is essential for situational and context-aware decision-making. Motivated by a real-world scenario, this work addresses intent predictions of smart device users in the vicinity of vehicles by modeling sequential spatiotemporal data. However, in real-world scenarios, environmental factors and sensor limitations can result in non-stationary and irregularly sampled data, posing significant challenges. To address these issues, we propose STaRFormer, a Transformer-based approach that can serve as a universal framework for sequential modeling. STaRFormer utilizes a new dynamic attention-based regional masking scheme combined with a novel semi-supervised contrastive learning paradigm to enhance task-specific latent representations. Comprehensive experiments on 56 datasets varying in types (including non-stationary and irregularly sampled), tasks, domains, sequence lengths, training samples, and applications demonstrate the efficacy of STaRFormer, achieving notable improvements over state-of-the-art approaches.