Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics

This work explores the identity attribute of synthetic face samples derived from Generative Adversarial Networks. The goal is to determine if individual samples are unique in terms of identity, firstly with respect to the seed dataset that trains the GAN model and secondly with respect to other synt...


Bibliographic Details
Main Authors:	Viktor Varkarakis, Shabab Bazrafkan, Gabriel Costache, Peter Corcoran
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Artificial intelligence; computer vision; face recognition; generative adversarial networks (GANs); StyleGAN; synthetic face
Online Access:	https://ieeexplore.ieee.org/document/9165737/

id	doaj-ac3051dd95a84be8b14e597a8132d634
record_format	Article
spelling	doaj-ac3051dd95a84be8b14e597a8132d634 | 2021-03-30T04:08:35Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 152532 | 152550 | 10.1109/ACCESS.2020.3016097 | 9165737 | Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics | Viktor Varkarakis | 0 | https://orcid.org/0000-0002-3877-2802 | Shabab Bazrafkan | 1 | Gabriel Costache | 2 | Peter Corcoran | 3 | https://orcid.org/0000-0003-1670-4793 | Department of Electronic Engineering, College of Science and Engineering, National University of Ireland Galway, Galway, Ireland | Department of Physics, Imec Vision Laboratory, University of Antwerp, Antwerp, Belgium | Xperi, Galway, Ireland | Department of Electronic Engineering, College of Science and Engineering, National University of Ireland Galway, Galway, Ireland | This work explores the identity attribute of synthetic face samples derived from Generative Adversarial Networks. The goal is to determine if individual samples are unique in terms of identity, firstly with respect to the seed dataset that trains the GAN model and secondly with respect to other synthetic face samples. Two approaches are introduced to enable the comparative analysis of large sets of synthetic face samples. The first of these uses ROC curves to determine identity uniqueness using a number of large publicly available datasets of real facial samples to provide reference ROCs as a baseline. The second approach uses a thresholding technique utilizing again large publicly available datasets as a reference. For this approach, new metrics are introduced, and a technique is provided to remove the most connected data samples within a large synthetic dataset. The remaining synthetic samples can be considered as unique as data samples gathered from different real individuals. Several StyleGAN models are used to create the synthetic datasets, and variations in key model parameters are explored. It is concluded that the resulting synthetic data samples exhibit excellent uniqueness when compared with the original training dataset, but significantly less uniqueness when comparisons are made within the synthetic dataset. Nevertheless, it is possible to remove the most highly connected synthetic data samples. Thus, in some cases, up to 92% of the data samples in a 20k synthetic dataset can be shown to exhibit similar uniqueness to data samples taken from real public datasets. | https://ieeexplore.ieee.org/document/9165737/ | Artificial intelligence | computer vision | face recognition | generative adversarial networks (GANs) | StyleGAN | synthetic face
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Viktor Varkarakis Shabab Bazrafkan Gabriel Costache Peter Corcoran
spellingShingle	Viktor Varkarakis Shabab Bazrafkan Gabriel Costache Peter Corcoran Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics IEEE Access Artificial intelligence computer vision face recognition generative adversarial networks (GANs) StyleGAN synthetic face
author_facet	Viktor Varkarakis Shabab Bazrafkan Gabriel Costache Peter Corcoran
author_sort	Viktor Varkarakis
title	Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics
title_short	Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics
title_full	Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics
title_fullStr	Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics
title_full_unstemmed	Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics
title_sort	validating seed data samples for synthetic identities – methodology and uniqueness metrics
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	This work explores the identity attribute of synthetic face samples derived from Generative Adversarial Networks. The goal is to determine if individual samples are unique in terms of identity, firstly with respect to the seed dataset that trains the GAN model and secondly with respect to other synthetic face samples. Two approaches are introduced to enable the comparative analysis of large sets of synthetic face samples. The first of these uses ROC curves to determine identity uniqueness using a number of large publicly available datasets of real facial samples to provide reference ROCs as a baseline. The second approach uses a thresholding technique utilizing again large publicly available datasets as a reference. For this approach, new metrics are introduced, and a technique is provided to remove the most connected data samples within a large synthetic dataset. The remaining synthetic samples can be considered as unique as data samples gathered from different real individuals. Several StyleGAN models are used to create the synthetic datasets, and variations in key model parameters are explored. It is concluded that the resulting synthetic data samples exhibit excellent uniqueness when compared with the original training dataset, but significantly less uniqueness when comparisons are made within the synthetic dataset. Nevertheless, it is possible to remove the most highly connected synthetic data samples. Thus, in some cases, up to 92% of the data samples in a 20k synthetic dataset can be shown to exhibit similar uniqueness to data samples taken from real public datasets.
topic	Artificial intelligence; computer vision; face recognition; generative adversarial networks (GANs); StyleGAN; synthetic face
url	https://ieeexplore.ieee.org/document/9165737/
work_keys_str_mv	AT viktorvarkarakis validatingseeddatasamplesforsyntheticidentitiesx2013methodologyanduniquenessmetrics AT shababbazrafkan validatingseeddatasamplesforsyntheticidentitiesx2013methodologyanduniquenessmetrics AT gabrielcostache validatingseeddatasamplesforsyntheticidentitiesx2013methodologyanduniquenessmetrics AT petercorcoran validatingseeddatasamplesforsyntheticidentitiesx2013methodologyanduniquenessmetrics
_version_	1724182333084401664
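
Illustrative sketch (not part of the catalog record, and not the authors' implementation): the description above mentions a thresholding technique and the removal of the most connected samples in a synthetic dataset. One way such a step could look is sketched below, assuming that each face is represented by an embedding vector from some face recognition model, that two samples count as "connected" when their cosine similarity reaches a chosen threshold, and that samples are removed greedily by connection count. The function names, the similarity measure, and the threshold value are all hypothetical choices made for this example.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_connections(
    embeddings: Sequence[Sequence[float]], threshold: float
) -> Dict[int, Set[int]]:
    """Link every pair of samples whose similarity reaches the threshold.

    Assumption: a pair at or above the threshold is treated as
    "the same identity" for the purpose of this sketch.
    """
    links: Dict[int, Set[int]] = {i: set() for i in range(len(embeddings))}
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                links[i].add(j)
                links[j].add(i)
    return links


def remove_most_connected(
    links: Dict[int, Set[int]]
) -> Tuple[List[int], List[int]]:
    """Greedily drop the sample with the most connections until none remain.

    Ties are broken in favour of the lowest sample index.
    Returns (kept sample indices, removed sample indices in removal order).
    """
    remaining = {i: set(nbrs) for i, nbrs in links.items()}  # work on a copy
    removed: List[int] = []
    while True:
        worst = max(
            remaining, key=lambda i: (len(remaining[i]), -i), default=None
        )
        if worst is None or not remaining[worst]:
            break  # no sample has any connection left
        for nbr in remaining.pop(worst):
            remaining[nbr].discard(worst)
        removed.append(worst)
    return sorted(remaining), removed


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real face embeddings would be much larger.
    toy = [[1.0, 0.0], [0.99, 0.1], [0.98, -0.1], [0.0, 1.0]]
    kept, removed = remove_most_connected(build_connections(toy, 0.99))
    print("kept:", kept, "removed:", removed)
```

In this toy input, sample 0 is close to both samples 1 and 2, so removing it alone leaves a set with no remaining connections. The pairwise loop is quadratic in the number of samples, so a dataset of the size mentioned in the description (20k samples) would likely need a vectorized or approximate nearest-neighbour approach instead.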

Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics

Similar Items

Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics