Cleartext detection and language identification in ciphers

In historical cryptology, cleartext represents text written in a known language in a cipher (a handwritten manuscript intended to hide the content of a message). Cleartext can give us a historical interpretation and contextualisation of the manuscript and could help researchers in cryptanalysis, but...

Bibliographic Details
Main Author:	Gambardella, Maria-Elena
Format:	Others
Language:	English
Published:	Uppsala universitet, Institutionen för lingvistik och filologi 2021
Subjects:	historical cryptology; digital humanities; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446439

id	ndltd-UPSALLA1-oai-DiVA.org-uu-446439
record_format	oai_dc
collection	NDLTD
description	In historical cryptology, cleartext represents text written in a known language in a cipher (a handwritten manuscript intended to hide the content of a message). Cleartext can give us a historical interpretation and contextualisation of the manuscript and could help researchers in cryptanalysis, but to this day there is still no research on how to automatically detect cleartext and identify its language. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in transcribed historical ciphers and to what extent we are able to identify its language. We took a rule-based approach and ran 7 different models using historical language models on ciphertexts provided by the DECRYPT-Project. Our results show that using unigrams and bigrams at the word level combined with 3-grams, 4-grams and 5-grams at the character level is the best approach to cleartext detection.
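The abstract reports that combining word-level unigrams and bigrams with character-level 3-, 4- and 5-grams worked best for cleartext detection. The record does not say how the thesis scores or combines these features, so the Python below is only a minimal sketch under stated assumptions: the class `NgramProfile`, the functions `is_cleartext` and `identify_language`, the pooled hit-ratio score and the 0.5 threshold are all hypothetical, and the two toy sentences merely stand in for the historical language models the thesis actually used.

```python
def char_ngrams(word, n):
    """Character n-grams of one word, padded with spaces to mark word boundaries."""
    padded = f" {word} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_ngrams(words, n):
    """Word n-grams (as tuples) of a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


class NgramProfile:
    """Hypothetical language profile: the sets of word 1-2-grams and
    character 3-5-grams observed in a reference corpus."""

    def __init__(self, corpus, word_orders=(1, 2), char_orders=(3, 4, 5)):
        words = corpus.lower().split()
        self.word_orders = word_orders
        self.char_orders = char_orders
        self.word_grams = {n: set(word_ngrams(words, n)) for n in word_orders}
        self.char_grams = {
            n: {g for w in words for g in char_ngrams(w, n)} for n in char_orders
        }

    def score(self, segment):
        """Fraction of the segment's word and character n-grams seen in the corpus
        (an assumed scoring rule: all n-gram orders are pooled with equal weight)."""
        words = segment.lower().split()
        hits = total = 0
        for n in self.word_orders:
            grams = word_ngrams(words, n)
            hits += sum(g in self.word_grams[n] for g in grams)
            total += len(grams)
        for n in self.char_orders:
            grams = [g for w in words for g in char_ngrams(w, n)]
            hits += sum(g in self.char_grams[n] for g in grams)
            total += len(grams)
        return hits / total if total else 0.0


def is_cleartext(segment, profile, threshold=0.5):
    """Hypothetical rule: call a segment cleartext if enough of its n-grams are known."""
    return profile.score(segment) >= threshold


def identify_language(segment, profiles):
    """Return the name of the profile that scores the segment highest."""
    return max(profiles, key=lambda lang: profiles[lang].score(segment))


# Toy corpora (assumptions), standing in for historical language models.
PROFILES = {
    "latin": NgramProfile("in nomine patris et filii et spiritus sancti amen"),
    "italian": NgramProfile("nel nome del padre e del figlio e dello spirito santo amen"),
}

if __name__ == "__main__":
    for seg in ["12 47 83 09 55", "et filii"]:
        print(seg, is_cleartext(seg, PROFILES["latin"]), identify_language(seg, PROFILES))
```

Cipher segments made of numbers or symbols share almost no n-grams with a letter-based profile and score near 0, while cleartext in the profile's language scores near 1; choosing the highest-scoring profile gives a crude language guess. A practical system would more likely use smoothed n-gram probabilities trained on much larger historical corpora.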

Similar Items