Using data mining for digital ink recognition

Computational recognition of hand-drawn diagrams has come a long way but is still inadequate for general use. This research uses data mining techniques to improve the accuracy of recognition. We focus on text-shape division as a challenging example that benefits from this approach. Surprisingly,...

Bibliographic Details
Main Author: Blagojevic, Rachel Venita
Other Authors: Plimmer, Beryl
Published: ResearchSpace@Auckland 2011
Online Access: http://hdl.handle.net/2292/7526
id ndltd-AUCKLAND-oai-researchspace.auckland.ac.nz-2292-7526
record_format oai_dc
spelling ndltd-AUCKLAND-oai-researchspace.auckland.ac.nz-2292-7526
2012-03-21T22:50:17Z
Using data mining for digital ink recognition
Blagojevic, Rachel Venita
ResearchSpace@Auckland
Plimmer, Beryl
Grundy, John
Wang, Yong
2011-08-25T23:44:27Z
2011
Thesis
http://hdl.handle.net/2292/7526
PhD Thesis - University of Auckland
UoA2167907
Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.
https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
http://creativecommons.org/licenses/by-nc-sa/3.0/nz/
Copyright: The author
collection NDLTD
sources NDLTD
description Computational recognition of hand-drawn diagrams has come a long way but is still inadequate for general use. This research uses data mining techniques to improve the accuracy of recognition. We focus on text-shape division as a challenging example that benefits from this approach. Surprisingly, although text is a fundamental part of diagrams, it has been largely ignored. A review of the literature shows that feature-based recognisers are ideal candidates for solving these types of problems. Such recognisers require a good feature set and a suitable algorithm. For recognition to be successful, the features fed into the algorithms must provide good distinguishing characteristics between the classes of interest. While small feature sets have been reported, there is currently no extensive survey of the features employed for sketch recognition. Such a survey could act as a library for algorithms to draw on for a given sketch recognition problem. In addition, while various algorithms have been tried, there has been no extensive study of algorithms to determine the best fit for accurate text-shape dividers. To build our text-shape dividers, we have assembled a comprehensive library of ink features that can be used for sketch recognition problems and compiled a large repository of labelled sketch data. To collect this data, we built our own tool, DataManager, which includes support for collecting and labelling sketches as well as automatically generating datasets. Using this feature library and data repository, a systematic investigation and tuning of machine learning algorithms has identified the algorithms best suited to text-shape division. An extensive evaluation on diagrams from six different domains has shown that our resulting dividers, using LADTree and LogitBoost, are significantly more accurate than three existing dividers. To our knowledge, these algorithms have not been used for text-shape division before.
author2 Plimmer, Beryl
author Blagojevic, Rachel Venita
title Using data mining for digital ink recognition
publisher ResearchSpace@Auckland
publishDate 2011
url http://hdl.handle.net/2292/7526