Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference

Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)

Bibtex Metadata Paper


Nicolas Chapados, Yoshua Bengio, Pascal Vincent, Joumana Ghosn, Charles Dugas, Ichiro Takeuchi, Linyan Meng


Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are .discrete, and the very peculiar shape of the noise distri(cid:173) bution, asymmetric with fat tails, with a large majority zeros and a few unreliable and very large values. We compare several machine learning methods for estimating insurance premia, and test them on a large data base of car insurance policies. We find that func(cid:173) tion approximation methods that do not optimize a squared loss, like Support Vector Machines regression, do not work well in this context. Compared methods include decision trees and generalized linear models. The best results are obtained with a mixture of experts, which better identifies the least and most risky contracts, and allows to reduce the median premium by charging more to the most risky customers.