LEADER |
02830nam a2200505Ia 4500 |
001 |
10.1109-TIT.2022.3169440 |
008 |
220510s2022 CNT 000 0 eng d
022 |
|
|
|a 0018-9448 (ISSN)
|
245 |
1 |
2 |
|a A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization
|
260 |
|
0 |
|b Institute of Electrical and Electronics Engineers Inc.
|c 2022
|
856 |
|
|
|z View Fulltext in Publisher
|u https://doi.org/10.1109/TIT.2022.3169440
|
520 |
3 |
|
|a We introduce a tunable loss function called α-loss, parameterized by α ∈ (0,∞], which interpolates between the exponential loss (α = 1/2), the log-loss (α = 1), and the 0-1 loss (α = ∞), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between α-loss and Arimoto conditional entropy, verify the classification-calibration of α-loss in order to demonstrate asymptotic optimality via Rademacher complexity generalization techniques, and build upon a notion called strictly local quasi-convexity in order to quantitatively characterize the optimization landscape of α-loss. Practically, we perform class imbalance, robustness, and classification experiments on benchmark image datasets using convolutional neural networks. Our main practical conclusion is that certain tasks may benefit from tuning α-loss away from log-loss (α = 1), and to this end we provide simple heuristics for the practitioner. In particular, navigating the α hyperparameter can readily provide superior model robustness to label flips (α > 1) and sensitivity to imbalanced classes (α < 1).
|
650 |
0 |
4 |
|a α-loss
|
650 |
0 |
4 |
|a Arimoto conditional entropy
|
650 |
0 |
4 |
|a Calibration
|
650 |
0 |
4 |
|a Classification (of information)
|
650 |
0 |
4 |
|a Classification algorithm
|
650 |
0 |
4 |
|a Classification algorithms
|
650 |
0 |
4 |
|a classification-calibration
|
650 |
0 |
4 |
|a Classification-calibration
|
650 |
0 |
4 |
|a Conditional entropy
|
650 |
0 |
4 |
|a Entropy
|
650 |
0 |
4 |
|a Generalisation
|
650 |
0 |
4 |
|a generalization
|
650 |
0 |
4 |
|a Logistics
|
650 |
0 |
4 |
|a Neural networks
|
650 |
0 |
4 |
|a Noise measurement
|
650 |
0 |
4 |
|a Noise measurements
|
650 |
0 |
4 |
|a Optimisations
|
650 |
0 |
4 |
|a Optimization
|
650 |
0 |
4 |
|a Privacy
|
650 |
0 |
4 |
|a Quasi convexity
|
650 |
0 |
4 |
|a robustness
|
650 |
0 |
4 |
|a Robustness
|
650 |
0 |
4 |
|a strictly local quasi-convexity
|
650 |
0 |
4 |
|a Strictly local quasi-convexity
|
700 |
1 |
|
|a Cava, J.K.
|e author
|
700 |
1 |
|
|a Dasarathy, G.
|e author
|
700 |
1 |
|
|a Diaz, M.
|e author
|
700 |
1 |
|
|a Kairouz, P.
|e author
|
700 |
1 |
|
|a Sankar, L.
|e author
|
700 |
1 |
|
|a Sypherd, T.
|e author
|
773 |
|
|
|t IEEE Transactions on Information Theory
|