Morphological segmentation : an unsupervised method and application to Keyword Spotting
Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. === 26 === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 41-44). === The contributions of this thesis are t...
Main Author: | Narasimhan, Karthik Rajagopal |
---|---|
Other Authors: | Regina Barzilay. |
Format: | Others |
Language: | English |
Published: |
Massachusetts Institute of Technology
2014
|
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/90139 |
id |
ndltd-MIT-oai-dspace.mit.edu-1721.1-90139 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-901392019-05-02T16:02:13Z Morphological segmentation : an unsupervised method and application to Keyword Spotting Unsupervised method and application to KWS Narasimhan, Karthik Rajagopal Regina Barzilay. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. 26 Cataloged from PDF version of thesis. Includes bibliographical references (pages 41-44). The contributions of this thesis are twofold. First, we present a new unsupervised algorithm for morphological segmentation that utilizes pseudo-semantic information, in addition to orthographic cues. We make use of the semantic signals from continuous word vectors, trained on huge corpora of raw text data. We formulate a log-linear model that is simple and can be used to perform fast, efficient inference on new words. We evaluate our model on a standard morphological segmentation dataset, and obtain large performance gains of up to 18.4% over an existing state-of-the-art system, Morfessor. Second, we explore the impact of morphological segmentation on the speech recognition task of Keyword Spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this thesis, we augment a KWS system with sub-word units derived by multiple segmentation algorithms including supervised and unsupervised morphological segmentations, along with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. 
By combining morphological and syllabic segmentations, we demonstrate substantial performance gains. by Karthik Rajagopal Narasimhan. S.M. in Computer Science and Engineering 2014-09-19T21:42:03Z 2014-09-19T21:42:03Z 2014 2014 Thesis http://hdl.handle.net/1721.1/90139 890151805 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 44 pages application/pdf Massachusetts Institute of Technology |
collection |
NDLTD |
language |
English |
format |
Others
|
sources |
NDLTD |
topic |
Electrical Engineering and Computer Science. |
spellingShingle |
Electrical Engineering and Computer Science. Narasimhan, Karthik Rajagopal Morphological segmentation : an unsupervised method and application to Keyword Spotting |
description |
Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. === 26 === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 41-44). === The contributions of this thesis are twofold. First, we present a new unsupervised algorithm for morphological segmentation that utilizes pseudo-semantic information, in addition to orthographic cues. We make use of the semantic signals from continuous word vectors, trained on huge corpora of raw text data. We formulate a log-linear model that is simple and can be used to perform fast, efficient inference on new words. We evaluate our model on a standard morphological segmentation dataset, and obtain large performance gains of up to 18.4% over an existing state-of-the-art system, Morfessor. Second, we explore the impact of morphological segmentation on the speech recognition task of Keyword Spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this thesis, we augment a KWS system with sub-word units derived by multiple segmentation algorithms including supervised and unsupervised morphological segmentations, along with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological and syllabic segmentations, we demonstrate substantial performance gains. === by Karthik Rajagopal Narasimhan. === S.M. in Computer Science and Engineering |
author2 |
Regina Barzilay. |
author_facet |
Regina Barzilay. Narasimhan, Karthik Rajagopal |
author |
Narasimhan, Karthik Rajagopal |
author_sort |
Narasimhan, Karthik Rajagopal |
title |
Morphological segmentation : an unsupervised method and application to Keyword Spotting |
title_short |
Morphological segmentation : an unsupervised method and application to Keyword Spotting |
title_full |
Morphological segmentation : an unsupervised method and application to Keyword Spotting |
title_fullStr |
Morphological segmentation : an unsupervised method and application to Keyword Spotting |
title_full_unstemmed |
Morphological segmentation : an unsupervised method and application to Keyword Spotting |
title_sort |
morphological segmentation : an unsupervised method and application to keyword spotting |
publisher |
Massachusetts Institute of Technology |
publishDate |
2014 |
url |
http://hdl.handle.net/1721.1/90139 |
work_keys_str_mv |
AT narasimhankarthikrajagopal morphologicalsegmentationanunsupervisedmethodandapplicationtokeywordspotting AT narasimhankarthikrajagopal unsupervisedmethodandapplicationtokws |
_version_ |
1719033600376045568 |