Speech Recognition Software and Vidispine

To evaluate libraries for continuous speech recognition, a test based on TED-talk videos was created. The different speech recognition libraries PocketSphinx, Dragon NaturallySpeaking and Microsoft Speech API were part of the evaluation. From the words that the libraries recognized, Word Error Rate...


Bibliographic Details
Main Author:	Nilsson, Tobias
Format:	Others
Language:	English
Published:	Umeå universitet, Institutionen för datavetenskap 2013
Subjects:	Computer science Datavetenskap
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-71428

id	ndltd-UPSALLA1-oai-DiVA.org-umu-71428
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-umu-71428 | 2013-05-30T03:59:00Z | Student thesis | info:eu-repo/semantics/bachelorThesis | text | UMNAD ; 937 | application/pdf | info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Computer science Datavetenskap
description	To evaluate libraries for continuous speech recognition, a test based on TED-talk videos was created. Three speech recognition libraries were evaluated: PocketSphinx, Dragon NaturallySpeaking and Microsoft Speech API (SAPI). Word Error Rate (WER) was calculated from the words that each library recognized. Microsoft SAPI performed worst with a WER of 60.8%, PocketSphinx came second with 59.9%, and Dragon NaturallySpeaking performed best with 42.6%. All of these results were achieved with a Real Time Factor (RTF) below 1.0. PocketSphinx was chosen as the best candidate for the intended system because it is open-source, free and a better match for the system. Modifying the language model and dictionary to more closely resemble typical TED-talk content improved the WER for PocketSphinx to 39.5%, but at the cost of an RTF above 1.0, making it less useful for live video.
author	Nilsson, Tobias
title	Speech Recognition Software and Vidispine
publisher	Umeå universitet, Institutionen för datavetenskap
publishDate	2013
url	http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-71428

