Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)

The impact of apartheid on land registration is still evident in South Africa. The Deeds Registry currently faces a backlog of an estimated 900,000 title deeds awaiting registration. Providing formal ownership through title is seen as necessary for unlocking the 'dead capital' of unregistered pro...


Bibliographic Details
Main Author: Favish, Ashleigh
Other Authors: Georg, Co-Pierre
Format: Dissertation
Language: English
Published: Faculty of Commerce 2020
Subjects: Financial Technology
Online Access: http://hdl.handle.net/11427/31389
id ndltd-netd.ac.za-oai-union.ndltd.org-uct-oai-localhost-11427-31389
record_format oai_dc
spelling ndltd-netd.ac.za-oai-union.ndltd.org-uct-oai-localhost-11427-31389 2020-07-22T05:07:39Z
2020-02-28T11:46:12Z 2020-02-28T11:46:12Z 2019 2020-02-28T11:09:38Z Masters Thesis Masters MPhil http://hdl.handle.net/11427/31389 eng application/pdf Faculty of Commerce African Institute of Financial Markets and Risk Management
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Financial Technology
description The impact of apartheid on land registration is still evident in South Africa. The Deeds Registry currently faces a backlog of an estimated 900,000 title deeds awaiting registration. Providing formal ownership through title is seen as necessary for unlocking the 'dead capital' of unregistered property, fostering access to capital markets and alleviating poverty. Within the current legislative framework, the Deeds Registry accepts only paper documents, which introduces inefficiencies. To increase the number of deeds processed per day, automation of manual data capture is tested using an OCR pipeline. To adapt to the language used in title deeds, text analysis and parsing are done using regular expressions (regex). As an additional security measure, the pipeline uploads the scanned title deeds to IPFS. Previous research has not applied these techniques to formal land registration or to other South African government institutions. Preliminary results show that the pipeline has an overall accuracy of 89.6%, measured by comparing the expected output with the output extracted using OCR. The results are significantly less accurate for handwritten and stamped information, so further measures are required to increase accuracy for these fields. For fields extracted from typed text, the OCR accuracy was 98.3%, which is within the accuracy range of manual data capture. A secondary quality check, which is currently performed on manual data capture, would still be necessary to ensure the accuracy of inputs. Overall, it appears that this application would be appropriate for incorporation into the Deeds Registry to streamline its processes while ensuring title deed validity.
author2 Georg, Co-Pierre
author Favish, Ashleigh
title Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)
publisher Faculty of Commerce
publishDate 2020
url http://hdl.handle.net/11427/31389