Using Machine Learning to Categorize Documents in a Construction Project

Automation of document handling in the construction industries could save large amounts of time, effort and money and classifying a document is an important step in that automation. In the field of machine learning, lots of research have been done on perfecting the algorithms and techniques, but the...

Full description

Bibliographic Details
Main Author: Björkendal, Nicklas
Format: Others
Language:English
Published: Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) 2019
Subjects:
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-90091
Description
Summary:Automation of document handling in the construction industries could save large amounts of time, effort and money and classifying a document is an important step in that automation. In the field of machine learning, lots of research have been done on perfecting the algorithms and techniques, but there are many areas where those techniques could be used that has not yet been studied. In this study I looked at how effectively the machine learning algorithm multinomial Naïve-Bayes would be able to classify 1427 documents split up into 19 different categories from a construction project. The experiment achieved an accuracy of 92.7% and the paper discusses some of the ways that accuracy can be improved. However, data extraction proved to be a bottleneck and only 66% of the original documents could be used for testing the classifier.