Quantifying Bias in a Face Verification System

Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. We explore several fairness definitions and metrics, attempting to quantify bias in Google’s FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias.

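As a rough illustration of the two kinds of analysis the abstract describes (per-group statistical error rates and the density of each group's embedding cluster), the Python sketch below computes false match and false non-match rates per demographic group at a fixed cosine-similarity threshold, plus a simple centroid-distance compactness measure. This is a minimal sketch rather than the paper's protocol; the array layout, threshold value, and function names are illustrative assumptions.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def per_group_error_rates(embeddings, identities, groups, threshold=0.5):
    # Per-group false match rate (FMR) and false non-match rate (FNMR)
    # over all within-group image pairs, at a fixed similarity threshold.
    #   embeddings: (N, D) numpy array of face embeddings (e.g., FaceNet-style)
    #   identities: length-N numpy array of subject IDs
    #   groups:     length-N numpy array of demographic group labels
    results = {}
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        genuine = impostor = false_non_match = false_match = 0
        for pos, i in enumerate(idx):
            for j in idx[pos + 1:]:
                sim = cosine_similarity(embeddings[i], embeddings[j])
                if identities[i] == identities[j]:
                    genuine += 1
                    false_non_match += sim < threshold
                else:
                    impostor += 1
                    false_match += sim >= threshold
        results[g] = {"FNMR": false_non_match / max(genuine, 1),
                      "FMR": false_match / max(impostor, 1)}
    return results

def per_group_compactness(embeddings, groups):
    # Mean distance of a group's embeddings to the group centroid.
    # A smaller value means a denser ("better-clustered") group, which the
    # paper links to weaker sensitivity to within-group feature differences.
    compactness = {}
    for g in np.unique(groups):
        pts = embeddings[groups == g]
        centroid = pts.mean(axis=0)
        compactness[g] = float(np.linalg.norm(pts - centroid, axis=1).mean())
    return compactness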

Bibliographic Details
Main Authors: Frisella, Megan (Author), Khorrami, Pooya (Author), Matterer, Jason (Author), Kratkiewicz, Kendra (Author), Torres-Carrasquillo, Pedro (Author)
Format: Article
Language: English
Published: Multidisciplinary Digital Publishing Institute, 2022-04-25T12:36:10Z.
Subjects:
Online Access: Get fulltext
LEADER 01702 am a22001693u 4500
001 142034
042 |a dc 
100 1 0 |a Frisella, Megan  |e author 
700 1 0 |a Khorrami, Pooya  |e author 
700 1 0 |a Matterer, Jason  |e author 
700 1 0 |a Kratkiewicz, Kendra  |e author 
700 1 0 |a Torres-Carrasquillo, Pedro  |e author 
245 0 0 |a Quantifying Bias in a Face Verification System 
260 |b Multidisciplinary Digital Publishing Institute,   |c 2022-04-25T12:36:10Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/142034 
520 |a Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. We explore several fairness definitions and metrics, attempting to quantify bias in Google’s FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias. 
655 7 |a Article