Text Preprocessing in Programmable Logic
There is a tremendous amount of information being generated and stored every year, and its growth rate is exponential. From 2008 to 2009, the growth rate was estimated to be 62%. In 2010, the amount of generated information is expected to grow by 50% to 1.2 Zettabytes, and by 2020 this amount is expec...
Main Author: | Skiba, Michal |
---|---|
Language: | en |
Published: | 2010 |
Subjects: | Programmable Logic; Text Processing; Electrical and Computer Engineering |
Online Access: | http://hdl.handle.net/10012/5366 |
id | ndltd-LACETR-oai-collectionscanada.gc.ca-OWTU.10012-5366 |
---|---|
record_format | oai_dc |
spelling | ndltd-LACETR-oai-collectionscanada.gc.ca-OWTU.10012-5366; 2013-10-04T04:10:12Z; Skiba, Michal; 2010-08-24T14:50:05Z; 2010-08-24T14:50:05Z; 2010-08-24T14:50:05Z; 2010-08-03; http://hdl.handle.net/10012/5366; en; Programmable Logic; Text Processing; Text Preprocessing in Programmable Logic; Thesis or Dissertation; Electrical and Computer Engineering; Master of Applied Science; Electrical and Computer Engineering |
collection | NDLTD |
language | en |
sources | NDLTD |
topic | Programmable Logic; Text Processing; Electrical and Computer Engineering |
spellingShingle | Programmable Logic; Text Processing; Electrical and Computer Engineering; Skiba, Michal; Text Preprocessing in Programmable Logic |
description | There is a tremendous amount of information being generated and stored every year, and its growth rate is exponential. From 2008 to 2009, the growth rate was estimated to be 62%. In 2010, the amount of generated information is expected to grow by 50% to 1.2 Zettabytes, and by 2020 this amount is expected to grow to 35 Zettabytes. By preprocessing text in programmable logic, high data processing rates could be achieved with greater power efficiency than with an equivalent software solution, leading to a smaller carbon footprint. This thesis presents an overview of the fields of Information Retrieval and Natural Language Processing, and the design and implementation of four text preprocessing modules in programmable logic: UTF-8 decoding, stop-word filtering, and stemming with both Lovins’ and Porter’s techniques. These extensively pipelined circuits were implemented in a high-performance FPGA and found to sustain maximum operational frequencies of 704 MHz, data throughputs in excess of 5 Gbps, and efficiencies in the range of 4.332–6.765 mW/Gbps and 34.66–108.2 µW/MHz. These circuits can be incorporated into larger systems, such as document classifiers and information extraction engines. |
author | Skiba, Michal |
author_facet | Skiba, Michal |
author_sort | Skiba, Michal |
title | Text Preprocessing in Programmable Logic |
title_short | Text Preprocessing in Programmable Logic |
title_full | Text Preprocessing in Programmable Logic |
title_fullStr | Text Preprocessing in Programmable Logic |
title_full_unstemmed | Text Preprocessing in Programmable Logic |
title_sort | text preprocessing in programmable logic |
publishDate | 2010 |
url | http://hdl.handle.net/10012/5366 |
work_keys_str_mv | AT skibamichal textpreprocessinginprogrammablelogic |
_version_ | 1716600464076701696 |