Morphological segmentation: an unsupervised method and application to Keyword Spotting

Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 41-44).


Bibliographic Details
Main Author: Narasimhan, Karthik Rajagopal
Other Authors: Regina Barzilay.
Format: Others
Language: English
Published: Massachusetts Institute of Technology, 2014
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/90139
Alternative Title: Unsupervised method and application to KWS
Physical Description: 44 pages (application/pdf)
Date Available: 2014-09-19
Identifier: 890151805
Collection: NDLTD
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See http://dspace.mit.edu/handle/1721.1/7582 for inquiries about permission.
Description: The contributions of this thesis are twofold. First, we present a new unsupervised algorithm for morphological segmentation that uses pseudo-semantic information in addition to orthographic cues. We draw the semantic signals from continuous word vectors trained on large corpora of raw text. We formulate a simple log-linear model that supports fast, efficient inference on new words. Evaluated on a standard morphological segmentation dataset, our model obtains performance gains of up to 18.4% over Morfessor, an existing state-of-the-art system.
Second, we explore the impact of morphological segmentation on Keyword Spotting (KWS), a speech recognition task. Despite its potential benefits, morphological information is not used by state-of-the-art KWS systems. In this thesis, we augment a KWS system with sub-word units derived from several segmentation algorithms: supervised and unsupervised morphological segmentation, as well as phonetic and syllabic segmentation. Our experiments show that morphemes improve the overall performance of KWS systems, although syllabic units rival morphological units in KWS performance. By combining morphological and syllabic segmentations, we obtain substantial performance gains.