Generation of Synthetic Data for Handwritten Word Alteration Detection
Fraudsters often alter handwritten content in a document for illicit purposes. This can cause financial and mental harm to an individual or an organization. Hence, ink analysis is necessary to identify such alterations. A convolutional neural network (CNN) can be used to...
Main Authors: | Prabhat Dansena, Soumen Bag, Rajarshi Pal |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
Subjects: | Convolution neural network; document forensics; handwritten; ink analysis; synthetic data |
Online Access: | https://ieeexplore.ieee.org/document/9354159/ |
id |
doaj-cf8c40f532b149cbb63eb0f6e294ca03 |
---|---|
record_format |
Article |
spelling |
doaj-cf8c40f532b149cbb63eb0f6e294ca03; 2021-03-30T15:21:17Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 38979; 38990; 10.1109/ACCESS.2021.3059342; 9354159; Generation of Synthetic Data for Handwritten Word Alteration Detection; Prabhat Dansena (https://orcid.org/0000-0001-5982-1215; Department of Computer Science and Engineering, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India); Soumen Bag (Department of Computer Science and Engineering, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India); Rajarshi Pal (Institute for Development and Research in Banking Technology, Hyderabad, India); https://ieeexplore.ieee.org/document/9354159/; Convolution neural network; document forensics; handwritten; ink analysis; synthetic data |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Prabhat Dansena, Soumen Bag, Rajarshi Pal |
title |
Generation of Synthetic Data for Handwritten Word Alteration Detection |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
Fraudsters often alter handwritten content in a document for illicit purposes. This can cause financial and mental harm to an individual or an organization. Hence, ink analysis is necessary to identify such alterations. A convolutional neural network (CNN) can be used to identify such cases of alteration, as CNNs have been highly successful in computer vision across a variety of classification tasks. However, a CNN requires a large amount of labeled data for training. Hence, there is a need to generate a large dataset for experiments on handwritten word alteration detection. Collecting, digitizing, and cropping a large number of altered and unaltered handwritten words is tedious and time-consuming. To overcome this issue, this paper presents an approach to synthetic word data generation for handwritten word alteration detection experiments. The scheme is designed so that the synthetically generated words are very similar to the original ones. To achieve this, a handwritten character dataset is prepared using 10 blue and 10 black pens. These handwritten characters are used to create a synthetic word alteration dataset. The presented approach uses a relatively small number of handwritten character images to create a very large word alteration dataset. Further, deep learning models are trained on the synthetically generated dataset for word alteration detection. |
topic |
Convolution neural network; document forensics; handwritten; ink analysis; synthetic data |
url |
https://ieeexplore.ieee.org/document/9354159/ |
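
The description in this record outlines the idea of building word images from a small bank of handwritten character images written with different pens. The sketch below illustrates that general idea only; it is not the authors' method, which this record does not specify. All names (`make_glyph_bank`, `compose_word`, `synth_sample`), the pen labels, the image sizes, and the use of random pixels in place of scanned character crops are assumptions made for illustration.

```python
import random


def make_glyph_bank(pens, chars, height=8, width=6, seed=0):
    """Hypothetical stand-in for scanned character crops.

    Returns {(pen, char): image}, where an image is a list of rows and each
    pixel is an (R, G, B) tuple. Real data would come from cropped scans.
    """
    rng = random.Random(seed)
    return {
        (pen, ch): [
            [tuple(rng.randrange(256) for _ in range(3)) for _ in range(width)]
            for _ in range(height)
        ]
        for pen in pens
        for ch in chars
    }


def compose_word(bank, word, pens_per_char):
    """Join one character crop per letter, left to right, into a word image."""
    glyphs = [bank[(pen, ch)] for ch, pen in zip(word, pens_per_char)]
    height = len(glyphs[0])
    return [[px for g in glyphs for px in g[row]] for row in range(height)]


def synth_sample(bank, word, pen_a, pen_b=None, altered_positions=()):
    """Return (image, label); label 1 means some characters use a second pen."""
    if altered_positions and pen_b is None:
        raise ValueError("pen_b is required when altered_positions is given")
    pens = [pen_a] * len(word)
    for i in altered_positions:
        pens[i] = pen_b  # simulate a character added or overwritten later
    label = int(any(p != pen_a for p in pens))
    return compose_word(bank, word, pens), label


if __name__ == "__main__":
    bank = make_glyph_bank(["blue_1", "black_1"], "abc")
    _, clean_label = synth_sample(bank, "cab", "blue_1")
    _, forged_label = synth_sample(bank, "cab", "blue_1", "black_1", [2])
    print(clean_label, forged_label)
```

Because every (word, pen pair, altered position) combination yields a distinct sample, a modest character bank can expand into a much larger labeled word set, which is the property the description emphasizes.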