Sentence boundary detection without speech recognition: A case of an underresourced language

Sentence boundary detection (SBD), also known as sentence segmentation, decides where a sentence begins and ends. Previous SBD methods use a linguistic approach, an acoustic approach, or a combination of both. Although the linguistic approach generally performed better than ac...

Bibliographic Details
Main Authors: Nursuriati Jamil, Muhammad Izzad Ramli, Noraini Seman
Format: Article
Language: English
Published: ESRGroups 2015-09-01
Series: Journal of Electrical Systems
Subjects: sentence boundary detection, spontaneous speech, prosody features, adaboost
Online Access: https://journal.esrgroups.org/jes/papers/11_3_6.pdf
id doaj-1bcb342debe8453b8549651e2d0893e7
record_format Article
spelling doaj-1bcb342debe8453b8549651e2d0893e7 2020-11-25T02:43:22Z eng ESRGroups Journal of Electrical Systems 1112-5209 2015-09-01 vol. 11, no. 3, pp. 308-318
collection DOAJ
language English
format Article
sources DOAJ
author Nursuriati Jamil
Muhammad Izzad Ramli
Noraini Seman
title Sentence boundary detection without speech recognition: A case of an underresourced language
publisher ESRGroups
series Journal of Electrical Systems
issn 1112-5209
publishDate 2015-09-01
description Sentence boundary detection (SBD), also known as sentence segmentation, decides where a sentence begins and ends. Previous SBD methods use a linguistic approach, an acoustic approach, or a combination of both. Although the linguistic approach generally performs better than the acoustic approach, it requires a speech recognition component. This is a constraint for under-resourced languages such as Malay. This paper describes SBD for spontaneous spoken Malay audio. Experiments are conducted on a forty-two-minute question-answer (Q/A) session of the Malaysian parliament comprising 12 adult male speakers and 4 female speakers. The speech data are first classified into speech and non-speech segments, and only the non-speech segments are further tested as candidates for sentence boundaries. Seven prosodic features, rate-of-speech and volume are then extracted from the boundary candidates for classification. Our proposed SBD method, using a supervised AdaBoost classifier, achieved a promising 100% accuracy rate with a 19.44% error rate. For future work, we intend to reduce the error rate by implementing end-point detection on the boundary candidates.
topic sentence boundary detection
spontaneous speech
prosody features
adaboost
url https://journal.esrgroups.org/jes/papers/11_3_6.pdf
_version_ 1724769757336436736
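
The description field of this record outlines a pipeline in which audio is first split into speech and non-speech segments, and only the non-speech segments are kept as sentence-boundary candidates. A minimal sketch of that first stage is given here, assuming a simple frame-energy threshold. The abstract does not say how the authors separate speech from non-speech, so the function names `frame_energies` and `find_pause_candidates`, the frame length, the energy threshold, and the minimum pause length are all hypothetical choices for illustration.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def find_pause_candidates(
    samples: List[float],
    frame_len: int = 160,       # assumed: 10 ms frames at 16 kHz
    threshold: float = 1e-3,    # assumed energy cutoff for "non-speech"
    min_frames: int = 3,        # assumed minimum pause length, in frames
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) runs of low-energy frames.

    end_frame is exclusive. Runs shorter than min_frames are ignored.
    """
    energies = frame_energies(samples, frame_len)
    candidates = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy < threshold:
            # Low-energy frame: open a run if none is open.
            if run_start is None:
                run_start = i
        else:
            # Loud frame: close any open run and keep it if long enough.
            if run_start is not None and i - run_start >= min_frames:
                candidates.append((run_start, i))
            run_start = None
    # Close a run that reaches the end of the signal.
    if run_start is not None and len(energies) - run_start >= min_frames:
        candidates.append((run_start, len(energies)))
    return candidates
```

Each returned frame range would then be the unit from which features are extracted for classification. The feature extraction and the AdaBoost stage named in the abstract are not sketched here, because the record does not describe them in enough detail.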