Sentence boundary detection without speech recognition: A case of an underresourced language
Sentence boundary detection (SBD), also known as sentence segmentation, decides where a sentence begins and ends. Previous SBD methods use a linguistic approach, an acoustic approach, or a combination of both. Even though the linguistic approach generally performed better than ac...
Main Authors: | Nursuriati Jamil, Muhammad Izzad Ramli, Noraini Seman |
---|---|
Format: | Article |
Language: | English |
Published: | ESRGroups 2015-09-01 |
Series: | Journal of Electrical Systems |
Subjects: | sentence boundary detection, spontaneous speech, prosody features, adaboost |
Online Access: | https://journal.esrgroups.org/jes/papers/11_3_6.pdf |
id |
doaj-1bcb342debe8453b8549651e2d0893e7 |
---|---|
record_format |
Article |
spelling |
doaj-1bcb342debe8453b8549651e2d0893e7; 2020-11-25T02:43:22Z; eng; ESRGroups; Journal of Electrical Systems; ISSN 1112-5209; 2015-09-01; vol. 11, no. 3, pp. 308-318; Sentence boundary detection without speech recognition: A case of an underresourced language; Nursuriati Jamil; Muhammad Izzad Ramli; Noraini Seman; https://journal.esrgroups.org/jes/papers/11_3_6.pdf; sentence boundary detection; spontaneous speech; prosody features; adaboost |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Nursuriati Jamil Muhammad Izzad Ramli Noraini Seman |
title |
Sentence boundary detection without speech recognition: A case of an underresourced language |
publisher |
ESRGroups |
series |
Journal of Electrical Systems |
issn |
1112-5209 1112-5209 |
publishDate |
2015-09-01 |
description |
Sentence boundary detection (SBD), also known as sentence segmentation, decides where a
sentence begins and ends. Previous SBD methods use a linguistic approach, an acoustic
approach, or a combination of both. Even though the linguistic approach generally performs
better than the acoustic approach, it requires a speech recognition component. This is a
constraint for under-resourced languages such as Malay. This paper describes SBD for
spontaneous spoken Malay audio. Experiments are conducted on a forty-two-minute
question-answer (Q/A) Malaysian parliamentary session comprising 12 adult male speakers and
4 female speakers. The speech data are first classified into speech/non-speech segments, and
only the non-speech segments are further tested as candidates for sentence boundaries. Seven
prosodic features, rate-of-speech and volume are then extracted from the boundary candidates
for classification. Our proposed SBD method using a supervised AdaBoost classifier achieved a
promising 100% accuracy rate with a 19.44% error rate. For future work, we intend to reduce
the error rate by implementing end-point detection on the boundary candidates. |
topic |
sentence boundary detection; spontaneous speech; prosody features; adaboost |
url |
https://journal.esrgroups.org/jes/papers/11_3_6.pdf |
_version_ |
1724769757336436736 |
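
The description field says that only non-speech segments are tested as sentence-boundary candidates. The sketch below illustrates that one step under loudly stated assumptions: this record does not say how the authors separate speech from non-speech, so a plain short-time energy threshold is swapped in here, and the names `frame_energies` and `pause_candidates` and the parameters `frame_len`, `threshold`, and `min_frames` are all hypothetical. It is not the authors' method, and it omits the prosodic feature extraction and the AdaBoost classification stage.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (trailing partial frame dropped)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pause_candidates(
    samples: Sequence[float],
    frame_len: int = 4,       # assumed frame size, in samples
    threshold: float = 0.01,  # assumed energy level below which a frame counts as non-speech
    min_frames: int = 2,      # assumed minimum pause length, in frames
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) runs of low-energy frames; end_frame is exclusive."""
    energies = frame_energies(samples, frame_len)
    candidates: List[Tuple[int, int]] = []
    run_start = None
    # A sentinel of infinite energy closes a run that reaches the end of the signal.
    for idx, energy in enumerate(energies + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = idx
        else:
            if run_start is not None and idx - run_start >= min_frames:
                candidates.append((run_start, idx))
            run_start = None
    return candidates


# Toy signal: 2 loud frames, 3 silent frames, 2 loud, 1 silent (too short), 1 loud.
toy = [0.5] * 8 + [0.0] * 12 + [0.5] * 8 + [0.0] * 4 + [0.5] * 4
print(pause_candidates(toy))
```

In a fuller pipeline, each returned frame range would be the place where features are measured and passed to a classifier. The thresholds above are placeholders and would need tuning on real recordings.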