Cleartext detection and language identification in ciphers

Bibliographic Details
Main Author: Gambardella, Maria-Elena
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för lingvistik och filologi, 2021
Subjects: historical cryptology; digital humanities; Language Technology (Computational Linguistics)
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446439
id ndltd-UPSALLA1-oai-DiVA.org-uu-446439
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-uu-446439 2021-06-20T05:33:02Z; Student thesis; info:eu-repo/semantics/bachelorThesis; text; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic historical cryptology
digital humanities
Language Technology (Computational Linguistics)
Språkteknologi (språkvetenskaplig databehandling)
description In historical cryptology, cleartext refers to text written in a known language within a cipher (a handwritten manuscript intended to hide the content of a message). Cleartext can provide a historical interpretation and contextualisation of the manuscript and could help researchers in cryptanalysis, but to date there has been no research on how to automatically detect cleartext and identify its language. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in transcribed historical ciphers, and to what extent we can identify the language of the cleartext. We took a rule-based approach and ran seven different models based on historical language models on ciphertexts provided by the DECRYPT-Project. Our results show that combining word-level unigrams and bigrams with character-level 3-grams, 4-grams and 5-grams is the best of the tested approaches for cleartext detection.
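The description above names the feature mix (word unigrams and bigrams plus character 3-, 4- and 5-grams) but not the decision rule. The following is only a minimal sketch of how such features could drive a rule-based cleartext/ciphertext decision; it is not the thesis's implementation. It assumes a simple "coverage" score (the share of a line's n-grams that also occur in a reference text for each language) and a hypothetical threshold of 0.5. The function names, the toy Latin and Italian snippets, and the threshold are all illustrative assumptions.

```python
from collections import Counter


def char_ngrams(word: str, n: int) -> list[str]:
    """Character n-grams of one word, padded with a space on each side."""
    padded = f" {word} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_ngrams(words: list[str], n: int) -> list[tuple[str, ...]]:
    """Contiguous word n-grams as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_features(line: str) -> list[tuple]:
    """Word 1- and 2-grams plus character 3-, 4- and 5-grams, tagged by level."""
    words = line.lower().split()
    feats: list[tuple] = []
    for n in (1, 2):
        feats.extend(("w", g) for g in word_ngrams(words, n))
    for n in (3, 4, 5):
        for w in words:
            feats.extend(("c", g) for g in char_ngrams(w, n))
    return feats


def build_model(corpus: str) -> Counter:
    """Count every feature in a reference text for one language (hypothetical corpus)."""
    model: Counter = Counter()
    for line in corpus.splitlines():
        model.update(extract_features(line))
    return model


def coverage(line: str, model: Counter) -> float:
    """Fraction of the line's features seen in the model; 0.0 for an empty line."""
    feats = extract_features(line)
    if not feats:
        return 0.0
    return sum(1 for f in feats if f in model) / len(feats)


def classify(line: str, models: dict[str, Counter],
             threshold: float = 0.5) -> tuple[str, float]:
    """Return (language, score) for the best-covering model, or 'cipher' below the threshold."""
    best_lang, best_score = "cipher", 0.0
    for lang, model in models.items():
        score = coverage(line, model)
        if score > best_score:  # ties keep the earlier language
            best_lang, best_score = lang, score
    if best_score < threshold:
        return "cipher", best_score
    return best_lang, best_score


# Toy reference texts; a real setup would use large historical corpora.
models = {
    "latin": build_model("gallia est omnis divisa in partes tres\nquarum unam incolunt belgae"),
    "italian": build_model("nel mezzo del cammin di nostra vita\nmi ritrovai per una selva oscura"),
}

print(classify("gallia est omnis", models))
print(classify("3 7 19 44 2 8 61", models))
```

Coverage is used here instead of smoothed probabilities only to keep the rule transparent; picking the best-covering language doubles as a crude language identifier, while a line no model covers well is treated as ciphertext.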
author Gambardella, Maria-Elena
title Cleartext detection and language identification in ciphers
publisher Uppsala universitet, Institutionen för lingvistik och filologi
publishDate 2021
url http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446439