Generation of Synthetic Data for Handwritten Word Alteration Detection

Fraudsters often alter handwritten contents in a document in order to achieve illicit purposes. At times, this may result in financial and mental loss to an individual or an organization. Hence, ink analysis is necessary to identify such an alteration. Convolution Neural Network (CNN) can be used to...

Bibliographic Details
Main Authors: Prabhat Dansena, Soumen Bag, Rajarshi Pal
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Convolution neural network; document forensics; handwritten; ink analysis; synthetic data
Online Access: https://ieeexplore.ieee.org/document/9354159/
id doaj-cf8c40f532b149cbb63eb0f6e294ca03
record_format Article
spelling doaj-cf8c40f532b149cbb63eb0f6e294ca03
2021-03-30T15:21:17Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
volume 9, pages 38979-38990
10.1109/ACCESS.2021.3059342
9354159
Generation of Synthetic Data for Handwritten Word Alteration Detection
Prabhat Dansena (https://orcid.org/0000-0001-5982-1215), Department of Computer Science and Engineering, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India
Soumen Bag, Department of Computer Science and Engineering, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India
Rajarshi Pal, Institute for Development and Research in Banking Technology, Hyderabad, India
https://ieeexplore.ieee.org/document/9354159/
collection DOAJ
language English
format Article
sources DOAJ
author Prabhat Dansena
Soumen Bag
Rajarshi Pal
spellingShingle Prabhat Dansena
Soumen Bag
Rajarshi Pal
Generation of Synthetic Data for Handwritten Word Alteration Detection
IEEE Access
Convolution neural network
document forensics
handwritten
ink analysis
synthetic data
author_facet Prabhat Dansena
Soumen Bag
Rajarshi Pal
author_sort Prabhat Dansena
title Generation of Synthetic Data for Handwritten Word Alteration Detection
title_short Generation of Synthetic Data for Handwritten Word Alteration Detection
title_full Generation of Synthetic Data for Handwritten Word Alteration Detection
title_fullStr Generation of Synthetic Data for Handwritten Word Alteration Detection
title_full_unstemmed Generation of Synthetic Data for Handwritten Word Alteration Detection
title_sort generation of synthetic data for handwritten word alteration detection
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Fraudsters often alter handwritten content in a document for illicit purposes. At times, this may result in financial and mental loss to an individual or an organization. Hence, ink analysis is necessary to identify such an alteration. A Convolution Neural Network (CNN) can be used to identify such cases of alteration, as CNNs have been highly successful in computer vision across a variety of classification tasks. However, a CNN requires a large amount of labeled data for training. Hence, there is a need to generate a large data set for experiments on handwritten word alteration detection. Collecting, digitizing, and cropping a large number of altered and unaltered handwritten words is tedious and time-consuming. To overcome this issue, the paper presents an approach to synthetic word data generation for handwritten word alteration detection experiments. The scheme is designed so that the synthetically generated words are very similar to the original ones. To achieve this, a handwritten character data set is prepared using 10 blue and 10 black pens. These handwritten characters are used to create a synthetic word alteration data set. The approach uses a relatively small number of handwritten character images to create a very large word alteration data set. Further, deep learning models are trained on the synthetically generated data set for word alteration detection.
topic Convolution neural network
document forensics
handwritten
ink analysis
synthetic data
url https://ieeexplore.ieee.org/document/9354159/
work_keys_str_mv AT prabhatdansena generationofsyntheticdataforhandwrittenwordalterationdetection
AT soumenbag generationofsyntheticdataforhandwrittenwordalterationdetection
AT rajarshipal generationofsyntheticdataforhandwrittenwordalterationdetection
_version_ 1724179669482209280
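
Illustrative sketch (not part of the catalog record). The description field says that word images are synthesized from per-pen handwritten character images, and that a small character set yields a large word alteration data set. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' method: the glyph store, the pen identifiers such as `blue_0`, the functions `make_toy_glyphs`, `compose_word`, and `synth_sample`, and the rule that an altered word is a contiguous suffix written with a second pen are all hypothetical. Toy constant-intensity patches stand in for scanned character crops.

```python
import random


def make_toy_glyphs(pen_ids, alphabet, height=4, width=3, seed=0):
    """Build a hypothetical glyph store: glyphs[pen_id][char] -> image (list of rows).

    Each pen gets one random intensity as a stand-in for its ink; real data
    would be scanned character crops written with that pen.
    """
    rng = random.Random(seed)
    glyphs = {}
    for pen in pen_ids:
        ink = rng.randint(20, 120)
        glyphs[pen] = {ch: [[ink] * width for _ in range(height)] for ch in alphabet}
    return glyphs


def compose_word(word, pens_per_char, glyphs):
    """Concatenate character images left to right, one pen per character."""
    height = len(glyphs[pens_per_char[0]][word[0]])
    rows = [[] for _ in range(height)]
    for ch, pen in zip(word, pens_per_char):
        for row, glyph_row in zip(rows, glyphs[pen][ch]):
            row.extend(glyph_row)
    return rows


def synth_sample(word, glyphs, rng, altered):
    """Return (image, label, pens); label 1 = altered (two pens), 0 = unaltered.

    Assumption: an alteration is modelled as a contiguous suffix written with
    a different pen, so the word must have at least two characters.
    """
    pen_ids = sorted(glyphs)
    base = rng.choice(pen_ids)
    pens = [base] * len(word)
    if altered:
        other = rng.choice([p for p in pen_ids if p != base])
        start = rng.randint(1, len(word) - 1)
        for i in range(start, len(word)):
            pens[i] = other
    return compose_word(word, pens, glyphs), int(altered), pens


if __name__ == "__main__":
    glyphs = make_toy_glyphs([f"blue_{i}" for i in range(10)], "abc")
    rng = random.Random(1)
    image, label, pens = synth_sample("cab", glyphs, rng, altered=True)
    print(label, pens, len(image), len(image[0]))
```

Because every ordered pair of distinct pens and every split position gives a different altered sample for the same word, the number of synthetic words grows much faster than the number of character images collected, which is the effect the description refers to.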