Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)
The impact of apartheid on land registration is still evident within South Africa. The Deeds Registry currently faces a backlog of an estimated 900,000 title deeds awaiting registration. Providing formal ownership, through title, is seen as necessary for unlocking the 'dead capital' of unregistered pro...
Main Author: | Favish, Ashleigh |
---|---|
Other Authors: | Georg, Co-Pierre |
Format: | Dissertation |
Language: | English |
Published: | Faculty of Commerce, 2020 |
Subjects: | Financial Technology |
Online Access: | http://hdl.handle.net/11427/31389 |
id |
ndltd-netd.ac.za-oai-union.ndltd.org-uct-oai-localhost-11427-31389 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-netd.ac.za-oai-union.ndltd.org-uct-oai-localhost-11427-31389 2020-07-22T05:07:39Z Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR) Favish, Ashleigh Georg, Co-Pierre Financial Technology The impact of apartheid on land registration is still evident within South Africa. The Deeds Registry currently faces a backlog of an estimated 900,000 title deeds awaiting registration. Providing formal ownership, through title, is seen as necessary for unlocking the 'dead capital' of unregistered property, fostering access to capital markets and poverty alleviation. Within the current legislative framework, the Deeds Registry only accepts paper documents, which introduces inefficiencies. To increase the number of deeds processed per day, automation of manual data capture is tested using an OCR pipeline. To adapt to the linguistics used in title deeds, text analysis and parsing are done using regular expressions (Regex). Uploading the scanned title deeds onto IPFS is included in the pipeline as an additional security measure. Previous research has not applied these techniques to formal land registration or other South African government institutions. The preliminary results show that this pipeline has an overall accuracy of 89.6%. This figure compares the expected output with the output extracted using OCR. The results are significantly less accurate when classifying handwritten and stamped information. Thus, further measures are required to increase accuracy for these fields. The OCR accuracy was 98.3% for the fields extracted from typed text characters. This is within the accuracy range of manual data capture. A secondary quality check, which is currently done on manual data capture, would still be necessary to ensure accuracy of inputs. Overall, it appears that this application would be appropriate for incorporation into the Deeds Registry to streamline their processes while ensuring title deed validity.
2020-02-28T11:46:12Z 2020-02-28T11:46:12Z 2019 2020-02-28T11:09:38Z Masters Thesis Masters MPhil http://hdl.handle.net/11427/31389 eng application/pdf Faculty of Commerce African Institute of Financial Markets and Risk Management |
collection |
NDLTD |
language |
English |
format |
Dissertation |
sources |
NDLTD |
topic |
Financial Technology |
spellingShingle |
Financial Technology Favish, Ashleigh Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR) |
description |
The impact of apartheid on land registration is still evident within South Africa. The Deeds Registry currently faces a backlog of an estimated 900,000 title deeds awaiting registration. Providing formal ownership, through title, is seen as necessary for unlocking the 'dead capital' of unregistered property, fostering access to capital markets and poverty alleviation. Within the current legislative framework, the Deeds Registry only accepts paper documents, which introduces inefficiencies. To increase the number of deeds processed per day, automation of manual data capture is tested using an OCR pipeline. To adapt to the linguistics used in title deeds, text analysis and parsing are done using regular expressions (Regex). Uploading the scanned title deeds onto IPFS is included in the pipeline as an additional security measure. Previous research has not applied these techniques to formal land registration or other South African government institutions. The preliminary results show that this pipeline has an overall accuracy of 89.6%. This figure compares the expected output with the output extracted using OCR. The results are significantly less accurate when classifying handwritten and stamped information. Thus, further measures are required to increase accuracy for these fields. The OCR accuracy was 98.3% for the fields extracted from typed text characters. This is within the accuracy range of manual data capture. A secondary quality check, which is currently done on manual data capture, would still be necessary to ensure accuracy of inputs. Overall, it appears that this application would be appropriate for incorporation into the Deeds Registry to streamline their processes while ensuring title deed validity. |
author2 |
Georg, Co-Pierre |
author_facet |
Georg, Co-Pierre Favish, Ashleigh |
author |
Favish, Ashleigh |
author_sort |
Favish, Ashleigh |
title |
Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR) |
title_sort |
data capture automation in the south african deeds registry using optical character recognition (ocr) |
publisher |
Faculty of Commerce |
publishDate |
2020 |
url |
http://hdl.handle.net/11427/31389 |
work_keys_str_mv |
AT favishashleigh datacaptureautomationinthesouthafricandeedsregistryusingopticalcharacterrecognitionocr |
_version_ |
1719330720462143488 |
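
The description above says that fields are parsed from OCR output using Regex and that accuracy is a comparison of expected output to extracted output. A minimal sketch of that idea in Python follows. The field names, the patterns, and the sample text are all hypothetical illustrations and are not taken from the thesis; the record does not state which expressions or OCR engine the author used.

```python
import re

# Hypothetical patterns for two illustrative fields. The actual
# expressions used in the thesis are not given in this record.
FIELD_PATTERNS = {
    "title_number": re.compile(r"\bT\s?(\d{1,6}/\d{4})\b"),
    "erf_number": re.compile(r"\bERF\s+(\d+)\b", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return the first match for each field, or None if nothing matches."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


def field_accuracy(expected, extracted):
    """Fraction of expected fields whose extracted value matches exactly."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    correct = sum(
        1 for name, value in expected.items() if extracted.get(name) == value
    )
    return correct / len(expected)


# Made-up OCR output for illustration only.
sample_text = "DEED OF TRANSFER T 12345/2019 in respect of ERF 678 CAPE TOWN"
extracted = extract_fields(sample_text)
```

Under these assumptions, `field_accuracy(expected, extracted)` scores one document against hand-captured values; averaging the score over a set of deeds would give an overall figure comparable in kind, though not necessarily in method, to the percentages quoted in the description.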