Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts
Authors:
Highlights:
• A state-of-the-art approach for multilingual punctuation prediction.
• Knowledge about punctuation from pre-trained transformer-based encoder models.
• Monolingual models tested both in human-edited and in automatic transcripts.
• Single multilingual model predicts punctuation in multiple languages.
• Integration within an existing multilingual video subtitling pipeline.
Keywords: Punctuation marks, Intelligent subtitles, Pre-trained embeddings, Speech transcripts, Sentence boundaries, Multilingual embeddings
Article history: Received 12 May 2020, Revised 6 August 2021, Accepted 6 August 2021, Available online 14 August 2021, Version of Record 18 August 2021.
Official paper URL: https://doi.org/10.1016/j.eswa.2021.115740