Lost in Transcription : Evaluating Clustering and Few-Shot learning for transcription of historical ciphers

While Optical Character Recognition (OCR) techniques for printed documents have developed steadily, Handwritten Text Recognition (HTR) methods for transcribing handwritten manuscripts still lag behind in quality. With the main focu...

Bibliographic Details
Main Author:	Magnifico, Giacomo
Format:	Others
Language:	English
Published:	Uppsala universitet, Institutionen för lingvistik och filologi 2021
Subjects:	Image Recognition; Handwritten Text Recognition; HTR; Deep-learning; K-mean clustering; NN; Neural Network; Few-Shot; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-460248

id	ndltd-UPSALLA1-oai-DiVA.org-uu-460248
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-uu-460248 | 2021-12-05T05:47:01Z | Lost in Transcription : Evaluating Clustering and Few-Shot learning for transcription of historical ciphers | eng | Magnifico, Giacomo | Uppsala universitet, Institutionen för lingvistik och filologi | 2021 | Image Recognition; Handwritten Text Recognition; HTR; Deep-learning; K-mean clustering; NN; Neural Network; Few-Shot; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling) | Where there has been a steady development of Optical Character Recognition (OCR) techniques for printed documents, the instruments that provide good quality for hand-written manuscripts by Hand-written Text Recognition methods (HTR) and transcriptions are still some steps behind. With the main focus on historical ciphers (i.e. encrypted documents from the past with various types of symbol sets), this thesis examines the performance of two machine learning architectures developed within the DECRYPT project framework, a clustering based unsupervised algorithm and a semi-supervised few-shot deep-learning model. Both models are tested on seen and unseen scribes to evaluate the difference in performance and the shortcomings of the two architectures, with the secondary goal of determining the influences of the datasets on the performance. An in-depth analysis of the transcription results is performed with particular focus on the Alchemic and Zodiac symbol sets, with analysis of the model performance relative to character shape and size. The results show the promising performance of Few-Shot architectures when compared to Clustering algorithm, with a respective SER average of 0.336 (0.15 and 0.104 on seen data / 0.754 on unseen data) and 0.596 (0.638 and 0.350 on seen data / 0.8 on unseen data). | Student thesis | info:eu-repo/semantics/bachelorThesis | text | http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-460248 | application/pdf | info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Image Recognition; Handwritten Text Recognition; HTR; Deep-learning; K-mean clustering; NN; Neural Network; Few-Shot; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling)
description	While Optical Character Recognition (OCR) techniques for printed documents have developed steadily, Handwritten Text Recognition (HTR) methods for transcribing handwritten manuscripts still lag behind in quality. Focusing on historical ciphers (i.e. encrypted documents from the past with various types of symbol sets), this thesis examines the performance of two machine learning architectures developed within the DECRYPT project framework: a clustering-based unsupervised algorithm and a semi-supervised few-shot deep-learning model. Both models are tested on seen and unseen scribes to evaluate the difference in performance and the shortcomings of the two architectures, with the secondary goal of determining the influence of the datasets on performance. An in-depth analysis of the transcription results is performed, with particular focus on the Alchemic and Zodiac symbol sets and on model performance relative to character shape and size. The results show the promising performance of the few-shot architecture compared to the clustering algorithm, with an average Symbol Error Rate (SER) of 0.336 (0.15 and 0.104 on seen data / 0.754 on unseen data) and 0.596 (0.638 and 0.350 on seen data / 0.8 on unseen data), respectively.
author	Magnifico, Giacomo
title	Lost in Transcription : Evaluating Clustering and Few-Shot learning for transcription of historical ciphers
publisher	Uppsala universitet, Institutionen för lingvistik och filologi
publishDate	2021
url	http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-460248

Similar Items