Text Preprocessing in Programmable Logic

There is a tremendous amount of information being generated and stored every year, and its growth rate is exponential. From 2008 to 2009, the growth rate was estimated to be 62%. In 2010, the amount of generated information is expected to grow by 50% to 1.2 Zettabytes, and by 2020 this rate is expec...

Bibliographic Details
Main Author: Skiba, Michal
Language: en
Published: 2010
Subjects: Programmable Logic; Text Processing; Electrical and Computer Engineering
Online Access: http://hdl.handle.net/10012/5366
id ndltd-LACETR-oai-collectionscanada.gc.ca-OWTU.10012-5366
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-OWTU.10012-5366; 2013-10-04T04:10:12Z; Skiba, Michal; 2010-08-24T14:50:05Z; 2010-08-24T14:50:05Z; 2010-08-24T14:50:05Z; 2010-08-03; http://hdl.handle.net/10012/5366; en; Programmable Logic; Text Processing; Text Preprocessing in Programmable Logic; Thesis or Dissertation; Electrical and Computer Engineering; Master of Applied Science; Electrical and Computer Engineering
collection NDLTD
language en
sources NDLTD
topic Programmable Logic
Text Processing
Electrical and Computer Engineering
spellingShingle Programmable Logic
Text Processing
Electrical and Computer Engineering
Skiba, Michal
Text Preprocessing in Programmable Logic
description There is a tremendous amount of information being generated and stored every year, and its growth rate is exponential. From 2008 to 2009, the growth rate was estimated to be 62%. In 2010, the amount of generated information is expected to grow by 50% to 1.2 Zettabytes, and by 2020 this amount is expected to reach 35 Zettabytes. By preprocessing text in programmable logic, high data processing rates could be achieved with greater power efficiency than with an equivalent software solution, leading to a smaller carbon footprint. This thesis presents an overview of the fields of Information Retrieval and Natural Language Processing, and the design and implementation of four text preprocessing modules in programmable logic: UTF-8 decoding, stop-word filtering, and stemming with both Lovins' and Porter's techniques. These extensively pipelined circuits were implemented in a high-performance FPGA and found to sustain maximum operational frequencies of 704 MHz, data throughputs in excess of 5 Gbps, and efficiencies in the range of 4.332 – 6.765 mW/Gbps and 34.66 – 108.2 µW/MHz. These circuits can be incorporated into larger systems, such as document classifiers and information extraction engines.
author Skiba, Michal
author_facet Skiba, Michal
author_sort Skiba, Michal
title Text Preprocessing in Programmable Logic
title_short Text Preprocessing in Programmable Logic
title_full Text Preprocessing in Programmable Logic
title_fullStr Text Preprocessing in Programmable Logic
title_full_unstemmed Text Preprocessing in Programmable Logic
title_sort text preprocessing in programmable logic
publishDate 2010
url http://hdl.handle.net/10012/5366
work_keys_str_mv AT skibamichal textpreprocessinginprogrammablelogic
_version_ 1716600464076701696