FAWOS: Fairness-Aware Oversampling Algorithm Based on Distributions of Sensitive Attributes

With the increased use of machine learning algorithms to make decisions which impact people's lives, it is of extreme importance to ensure that predictions do not prejudice subgroups of the population with respect to sensitive attributes such as race or gender. Discrimination occurs when the probability of a positive outcome changes across privileged and unprivileged groups defined by the sensitive attributes. It has been shown that this bias can originate from imbalanced data contexts, where one of the classes contains far fewer instances than the other classes. It is also important to identify the nature of the imbalanced data, including the characteristics of the minority classes' distribution. This paper presents FAWOS: a Fairness-Aware Oversampling algorithm which aims to attenuate unfair treatment by handling imbalance in the sensitive attributes. We categorize different types of datapoints according to their local neighbourhood with respect to the sensitive attributes, identifying which are more difficult for classifiers to learn. To balance the dataset, FAWOS oversamples the training data by creating new synthetic datapoints from the different types of datapoints identified. We test the impact of FAWOS on different learning classifiers and analyze which can better handle sensitive attribute imbalance. Empirically, we observe that the algorithm can effectively improve the fairness of the classifiers while not neglecting classification performance. Source code can be found at: https://github.com/teresalazar13/FAWOS

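To make the idea in the abstract concrete, below is a minimal, illustrative Python sketch of fairness-aware, SMOTE-style oversampling: it selects the unprivileged group with a positive outcome and interpolates between a point and one of its k nearest neighbours from the same subgroup. This is not the authors' implementation (see the linked GitHub repository for that); the function name fairness_aware_oversample, the single binary sensitive attribute, and the plain interpolation rule are simplifying assumptions made for illustration.

# Illustrative sketch only -- assumptions: one binary sensitive attribute s
# (0 = unprivileged), a binary outcome y (1 = positive), SMOTE-style interpolation.
import numpy as np

def fairness_aware_oversample(X, y, s, k=5, n_new=100, seed=None):
    """Create n_new synthetic datapoints for the unprivileged-positive subgroup
    (s == 0 and y == 1) by interpolating between a subgroup point and one of its
    k nearest neighbours inside the same subgroup."""
    rng = np.random.default_rng(seed)
    group = X[(s == 0) & (y == 1)]                        # under-represented subgroup
    if len(group) <= k:
        raise ValueError("Not enough datapoints in the target subgroup.")
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(group))
        dists = np.linalg.norm(group - group[i], axis=1)  # Euclidean distances
        neighbours = np.argsort(dists)[1:k + 1]           # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                                # interpolation factor in [0, 1)
        synthetic.append(group[i] + gap * (group[j] - group[i]))
    X_new = np.asarray(synthetic)
    X_out = np.vstack([X, X_new])
    y_out = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    s_out = np.concatenate([s, np.zeros(n_new, dtype=s.dtype)])
    return X_out, y_out, s_out

# Toy usage with random data.
rng = np.random.default_rng(0)
X = rng.random((300, 4))
s = rng.integers(0, 2, 300)      # sensitive attribute (0 = unprivileged)
y = rng.integers(0, 2, 300)      # binary outcome (1 = positive)
X_bal, y_bal, s_bal = fairness_aware_oversample(X, y, s, k=5, n_new=60, seed=1)
print(X_bal.shape, y_bal.shape, s_bal.shape)   # (360, 4) (360,) (360,)

Note that the paper's algorithm additionally categorizes datapoints by their local neighbourhood with respect to the sensitive attributes and uses those categories to guide the oversampling; the sketch above omits that step for brevity.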

Bibliographic Details
Main Authors: Teresa Salazar, Miriam Seoane Santos, Helder Araujo, Pedro Henriques Abreu
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
Subjects: Classification bias; fairness; imbalanced data; K-nearest neighborhood; oversampling
Online Access: https://ieeexplore.ieee.org/document/9442706/
DOAJ ID: doaj-9fb86190df9d412e890c49413c9a979f
Collection: DOAJ
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3084121
Citation: IEEE Access, vol. 9, pp. 81370-81379, 2021 (IEEE document 9442706)
Source Code: https://github.com/teresalazar13/FAWOS
Author Details:
Teresa Salazar (ORCID: https://orcid.org/0000-0003-2471-5783), Department of Informatics Engineering, Centre for Informatics and Systems, University of Coimbra, Coimbra, Portugal
Miriam Seoane Santos (ORCID: https://orcid.org/0000-0002-5912-963X), Department of Informatics Engineering, Centre for Informatics and Systems, University of Coimbra, Coimbra, Portugal
Helder Araujo (ORCID: https://orcid.org/0000-0002-9544-424X), Department of Electrical and Computer Engineering, University of Coimbra, Coimbra, Portugal
Pedro Henriques Abreu (ORCID: https://orcid.org/0000-0002-9278-8194), Department of Informatics Engineering, Centre for Informatics and Systems, University of Coimbra, Coimbra, Portugal