A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization

Bibliographic Details
Main Authors: Cava, J.K. (Author), Dasarathy, G. (Author), Diaz, M. (Author), Kairouz, P. (Author), Sankar, L. (Author), Sypherd, T. (Author)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2022
Subjects:
Online Access: View Fulltext in Publisher
LEADER 02830nam a2200505Ia 4500
001 10.1109-TIT.2022.3169440
008 220510s2022 CNT 000 0 eng d
020 |a 0018-9448 (ISSN) 
245 1 0 |a A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization 
260 0 |b Institute of Electrical and Electronics Engineers Inc.  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1109/TIT.2022.3169440 
520 3 |a We introduce a tunable loss function called α-loss, parameterized by α ∈ (0,∞], which interpolates between the exponential loss (α = 1/2), the log-loss (α = 1), and the 0-1 loss (α = ∞), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between α-loss and Arimoto conditional entropy, verify the classification-calibration of α-loss in order to demonstrate asymptotic optimality via Rademacher complexity generalization techniques, and build upon a notion called strictly local quasi-convexity in order to quantitatively characterize the optimization landscape of α-loss. Practically, we perform class imbalance, robustness, and classification experiments on benchmark image datasets using convolutional neural networks. Our main practical conclusion is that certain tasks may benefit from tuning α-loss away from log-loss (α = 1), and to this end we provide simple heuristics for the practitioner. In particular, navigating the α hyperparameter can readily provide superior model robustness to label flips (α > 1) and sensitivity to imbalanced classes (α < 1). 
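The interpolation described in the abstract admits a short numerical check. As a hedged sketch (not part of the record itself), the Python below assumes the closed form ℓ_α(p) = (α/(α−1))(1 − p^((α−1)/α)) for α ≠ 1 and ℓ_1(p) = −log p, where p is the probability the model assigns to the true label; the name alpha_loss is illustrative.

    import numpy as np

    def alpha_loss(p, alpha):
        # Assumed closed form of α-loss on the true-class probability p:
        #   alpha = 1 :  -log(p)                                 (log-loss)
        #   alpha != 1:  (alpha/(alpha-1)) * (1 - p**((alpha-1)/alpha))
        p = np.asarray(p, dtype=float)
        if alpha == 1.0:
            return -np.log(p)
        return (alpha / (alpha - 1.0)) * (1.0 - p ** ((alpha - 1.0) / alpha))

    p = np.array([0.2, 0.5, 0.9])
    print(alpha_loss(p, 0.5))   # equals 1/p - 1: exponential-loss regime
    print(alpha_loss(p, 1.0))   # equals -log(p): log-loss
    print(alpha_loss(p, 1e6))   # approaches 1 - p: the 0-1 loss limit

Under this form, α > 1 flattens the penalty on low-probability (possibly mislabeled) examples, consistent with the robustness-to-label-flips claim, while α < 1 penalizes them more sharply, matching the sensitivity-to-imbalanced-classes claim.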
650 0 4 |a α-loss 
650 0 4 |a Arimoto conditional entropy 
650 0 4 |a Calibration 
650 0 4 |a Classification (of information) 
650 0 4 |a Classification algorithms 
650 0 4 |a Classification-calibration 
650 0 4 |a Conditional entropy 
650 0 4 |a Entropy 
650 0 4 |a generalization 
650 0 4 |a Logistics 
650 0 4 |a Neural networks 
650 0 4 |a Noise measurement 
650 0 4 |a Optimization 
650 0 4 |a Privacy 
650 0 4 |a Quasi convexity 
650 0 4 |a Robustness 
650 0 4 |a Strictly local quasi-convexity 
700 1 |a Cava, J.K.  |e author 
700 1 |a Dasarathy, G.  |e author 
700 1 |a Diaz, M.  |e author 
700 1 |a Kairouz, P.  |e author 
700 1 |a Sankar, L.  |e author 
700 1 |a Sypherd, T.  |e author 
773 |t IEEE Transactions on Information Theory