Text and Speech Alignment Methods for Speech Translation Corpora Creation : Augmenting English LibriVox Recordings with Italian Textual Translations

The recent rise of end-to-end speech translation models requires a new generation of parallel corpora, composed of a large number of source-language speech utterances aligned with their target-language textual translations. We present a pipeline and a set of methods to collect hundreds of hour...

Bibliographic Details
Main Author: Della Corte, Giuseppe
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för lingvistik och filologi 2020
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413064