Automated Culling of Data in a Relational Database for Archiving
Background. Archiving legacy information systems is challenging. When the information cannot be extracted in a structured way, the last resort is to save the database itself. Ideally, only the relevant information should be saved and the rest removed. Obj...
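Main Author: | Nilsson, Simon |
---|---|
Format: | Others |
Language: | English |
Published: | Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2019 |
Subjects: | digital preservation; Information Systems; legacy systems; database reverse engineering; Computer Systems; Datorsystem; Software Engineering; Programvaruteknik |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18261 |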
id |
ndltd-UPSALLA1-oai-DiVA.org-bth-18261 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
digital preservation; Information Systems; legacy systems; database reverse engineering; Computer Systems; Datorsystem; Software Engineering; Programvaruteknik |
description |
Background. Archiving legacy information systems is challenging. When the information cannot be extracted in a structured way, the last resort is to save the database itself. Ideally, only the relevant information should be saved and the rest removed. Objectives. The goal is to develop a method that assists the archivist in culling a database before archiving. The method should be described as rules defining how the tables can be identified. Methods. To get an overview of how the process works today and what archivists think can be improved, a number of interviews with experts in database archiving are conducted. The results from the interviews are then analysed, together with test databases, to define rules that can be used in the general case. The rules are then implemented in a prototype that is tested and evaluated to verify whether the method works. Results. The results indicate that the algorithm is both faster and able to exclude more irrelevant tables than a person using the manual method. An algorithm for finding candidate keys has also been improved to reduce the number of tests and the execution time in the worst case. Conclusions. The evaluation indicates that the method works as intended while resulting in less work for the archivist. More work should be done to improve the method further. |
author |
Nilsson, Simon |
title |
Automated Culling of Data in a Relational Database for Archiving |
publisher |
Blekinge Tekniska Högskola, Institutionen för programvaruteknik |
publishDate |
2019 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18261 |