Using data mining for digital ink recognition

Computational recognition of hand-drawn diagrams has come a long way but is still inadequate for general use. This research uses data mining techniques to improve the accuracy of recognition. We focus on text-shape division as a challenging example that benefits from this approach. Surprisingly,...

Bibliographic Details
Main Author:	Blagojevic, Rachel Venita
Other Authors:	Plimmer, Beryl
Published:	ResearchSpace@Auckland 2011
Online Access:	http://hdl.handle.net/2292/7526

id	ndltd-AUCKLAND-oai-researchspace.auckland.ac.nz-2292-7526
record_format	oai_dc
spelling	ndltd-AUCKLAND-oai-researchspace.auckland.ac.nz-2292-7526 | 2012-03-21T22:50:17Z | Using data mining for digital ink recognition | Blagojevic, Rachel Venita | ResearchSpace@Auckland | Plimmer, Beryl | Grundy, John | Wang, Yong | 2011-08-25T23:44:27Z | 2011-08-25T23:44:27Z | 2011 | Thesis | http://hdl.handle.net/2292/7526 | PhD Thesis - University of Auckland | UoA2167907 | Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. | https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm | http://creativecommons.org/licenses/by-nc-sa/3.0/nz/ | Copyright: The author
collection	NDLTD
sources	NDLTD
description	Computational recognition of hand-drawn diagrams has come a long way but is still inadequate for general use. This research uses data mining techniques to improve the accuracy of recognition. We focus on text-shape division as a challenging example that benefits from this approach. Surprisingly, although text is a fundamental part of diagrams, it has been largely ignored. A review of the literature shows that feature-based recognisers are ideal candidates for solving these types of problems. Such recognisers require a good feature set and a suitable algorithm. For recognition to be successful, the features fed into the algorithms must distinguish well between the classes of interest. While small feature sets have been reported, there is currently no extensive survey of existing features employed for sketch recognition. Such a survey could act as a library for algorithms to employ for a given problem in sketch recognition. In addition, while various algorithms have been tried, there has been no extensive study of algorithms to determine the best fit for accurate text-shape dividers. To build our text-shape dividers, we have assembled a comprehensive library of ink features that can be used for sketch recognition problems and compiled a large repository of labelled sketch data. To collect this data we built our own tool, DataManager, which includes support for collecting and labelling sketches as well as automatically generating datasets. Using this feature library and data repository, a systematic investigation and tuning of machine learning algorithms has identified the algorithms best suited to text-shape division. An extensive evaluation on diagrams from six different domains has shown that our resulting dividers, using LADTree and LogitBoost, are significantly more accurate than three existing dividers. To our knowledge, these algorithms have not been used for text-shape division before.
author2	Plimmer, Beryl
author_facet	Plimmer, Beryl | Blagojevic, Rachel Venita
author	Blagojevic, Rachel Venita
spellingShingle	Blagojevic, Rachel Venita Using data mining for digital ink recognition
author_sort	Blagojevic, Rachel Venita
title	Using data mining for digital ink recognition
title_short	Using data mining for digital ink recognition
title_full	Using data mining for digital ink recognition
title_fullStr	Using data mining for digital ink recognition
title_full_unstemmed	Using data mining for digital ink recognition
title_sort	using data mining for digital ink recognition
publisher	ResearchSpace@Auckland
publishDate	2011
url	http://hdl.handle.net/2292/7526
work_keys_str_mv	AT blagojevicrachelvenita usingdataminingfordigitalinkrecognition
_version_	1716391018052452352

Similar Items