Cleartext detection and language identification in ciphers
In historical cryptology, cleartext represents text written in a known language in a cipher (a hand-written manuscript aiming at hiding the content of a message). Cleartext can give us an historical interpretation and contextualisation of the manuscript and could help researchers in cryptanalysis, but...
Main Author: | Gambardella, Maria-Elena |
---|---|
Format: | Others |
Language: | English |
Published: | Uppsala universitet, Institutionen för lingvistik och filologi, 2021 |
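Subjects: | historical cryptology; digital humanities; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling) |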
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446439 |
id |
ndltd-UPSALLA1-oai-DiVA.org-uu-446439 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-uu-446439; 2021-06-20T05:33:02Z; Cleartext detection and language identification in ciphers; eng; Gambardella, Maria-Elena; Uppsala universitet, Institutionen för lingvistik och filologi; 2021; historical cryptology; digital humanities; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling); Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446439; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
historical cryptology digital humanities Language Technology (Computational Linguistics) Språkteknologi (språkvetenskaplig databehandling) |
description |
In historical cryptology, cleartext is text written in a known language within a cipher (a hand-written manuscript that aims to hide the content of a message). Cleartext can support the historical interpretation and contextualisation of the manuscript and could help researchers in cryptanalysis, but to date there is no research on how to automatically detect cleartext and identify its language. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in transcribed historical ciphers and to what extent we are able to identify its language. We took a rule-based approach and ran 7 different models using historical language models on ciphertexts provided by the DECRYPT-Project. Our results show that using unigrams and bigrams on the word level combined with 3-grams, 4-grams and 5-grams on the character level is the best approach to tackle cleartext detection. |
author |
Gambardella, Maria-Elena |
title |
Cleartext detection and language identification in ciphers |
publisher |
Uppsala universitet, Institutionen för lingvistik och filologi |
publishDate |
2021 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446439 |
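The abstract describes combining word-level unigrams and bigrams with character-level 3-, 4- and 5-grams to separate cleartext from ciphertext and to guess its language. The following is only a minimal sketch of that general idea, not the thesis's implementation: the function names, the scoring rule, the `threshold` value and the tiny reference texts are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(token, n):
    """Return the character n-grams of a token, padded with '_' boundary markers."""
    padded = f"_{token.lower()}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_model(reference_text, orders=(3, 4, 5)):
    """Collect word unigrams, word bigrams and character n-grams from a reference text."""
    words = reference_text.lower().split()
    ngrams = Counter()
    for word in words:
        for n in orders:
            ngrams.update(char_ngrams(word, n))
    return {
        "words": set(words),
        "bigrams": set(zip(words, words[1:])),
        "ngrams": ngrams,
        "orders": orders,
    }


def token_score(tokens, i, model):
    """Score token i against one language model (hypothetical scoring rule)."""
    token = tokens[i].lower()
    if token in model["words"]:
        score = 1.0  # word-level unigram match
    else:
        grams = [g for n in model["orders"] for g in char_ngrams(token, n)]
        known = sum(1 for g in grams if g in model["ngrams"])
        score = known / len(grams) if grams else 0.0
    # A word-level bigram match with the preceding token adds a bonus.
    if i > 0 and (tokens[i - 1].lower(), token) in model["bigrams"]:
        score += 0.5
    return score


def detect_cleartext(tokens, models, threshold=0.6):
    """Label each token with its best-matching language, or None for ciphertext."""
    labels = []
    for i in range(len(tokens)):
        best_lang, best_score = None, 0.0
        for lang, model in models.items():
            score = token_score(tokens, i, model)
            if score > best_score:
                best_lang, best_score = lang, score
        labels.append(best_lang if best_score >= threshold else None)
    return labels


# Toy reference texts (assumed; real use would need historical corpora).
models = {
    "latin": build_model("in nomine domini amen et cetera"),
    "english": build_model("the king sends his letter to the court"),
}
tokens = "12 45 in nomine domini 77 93".split()
labels = detect_cleartext(tokens, models)
```

Here `labels` holds one entry per transcribed token, so runs of consecutive non-`None` labels mark candidate cleartext passages. The threshold would need tuning on annotated transcriptions before any such heuristic could be trusted.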