Editor: 颜大大i2 | 2019-07-15
1000 Open-Subtitle sentences.

Gender Dependence            Percentage of Sentences
None                         48.5%
Speaker Only                 3.1%
Listener Only                46.9%
Both Speaker and Listener    1.5%

Fortunately, speaker gender determination from speech has reached high accuracy even for relatively short speech segments [4]. Therefore, we can rely on having this information at runtime. However, training an SLT system would require gender-tagged parallel sentences to be able to generate gender-aware translations. This is particularly important in the pipelined approach to SLT commonly used in large-scale systems, in which a speech recognition component is followed by machine translation. A promising direction is training end-to-end speech-to-speech translation systems [5], which are trained on source language audio and produce target language audio (or text). In such a setting, the speaker's gender information can be easily extracted from the source language audio. However, the listener's gender information would still be required to produce gender-aware SLT.

One of the main challenges in training gender-aware SLT is finding a large gender-tagged parallel corpus that has both the speaker's and the listener's gender information. To address this challenge, we propose an approach to automatically label a parallel conversational corpus with gender information. Applying this approach to the Open Subtitle data set has produced the training data needed for this work. The proposed approach uses a part-of-speech tagger and a set of rules to automatically tag sentences with speaker and listener genders. The tagged sentences are used to adapt a baseline neural MT system trained using sequence-to-sequence training with attention. This baseline system is trained using both gender-dependent and gender-independent sentences, then adapted using the sentences with identified gender dependence.

The main contribution of this paper is twofold: enabling NMT systems to produce gender-aware translations, and providing a method to generate the data needed to achieve that.

The remainder of this paper is structured as follows. Section 2 reviews some of the work on speaker gender determination from speech. Section 3 describes the sentence labelling process for extracting speaker gender dependent and listener gender dependent utterances. Section 4 outlines the NMT training and testing used. Section 5 summarizes the experiments we have conducted, and Section 6 concludes the paper.
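To make the labelling idea from the introduction concrete (a part-of-speech tagger plus a set of rules), the following is a minimal sketch and not the rule set used in this paper, which is described in Section 3. It assumes, hypothetically, that a tagger for a gendered target language has already annotated each token with a coarse POS tag and optional person and gender features. The `Token` class, the `GENDERED_POS` set, and the two rules (a first-person gendered form marks the speaker, a second-person gendered form marks the listener) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """One tagged target-language token (hypothetical tagger output)."""
    text: str
    pos: str                      # coarse POS tag, e.g. "VERB", "ADJ", "PRON"
    person: Optional[int] = None  # 1, 2, 3, or None if not marked
    gender: Optional[str] = None  # "M", "F", or None if not marked


# Assumption: only these POS classes are trusted to carry gender agreement.
GENDERED_POS = {"VERB", "ADJ", "PRON"}


def label_sentence(tokens: List[Token]) -> Tuple[Optional[str], Optional[str]]:
    """Return (speaker_gender, listener_gender) using two illustrative rules."""
    speaker: Optional[str] = None
    listener: Optional[str] = None
    for tok in tokens:
        if tok.pos not in GENDERED_POS or tok.gender is None:
            continue
        if tok.person == 1 and speaker is None:
            speaker = tok.gender      # first-person gendered form: speaker
        elif tok.person == 2 and listener is None:
            listener = tok.gender     # second-person gendered form: listener
    return speaker, listener


def dependence_class(speaker: Optional[str], listener: Optional[str]) -> str:
    """Map a label pair onto the four gender-dependence categories."""
    if speaker and listener:
        return "Both Speaker and Listener"
    if speaker:
        return "Speaker Only"
    if listener:
        return "Listener Only"
    return "None"


if __name__ == "__main__":
    # Placeholder tokens; the surface forms are not real words.
    sentence = [
        Token("w1", "VERB", person=2, gender="F"),
        Token("w2", "NOUN", gender="M"),  # ignored: NOUN is not in GENDERED_POS
    ]
    spk, lst = label_sentence(sentence)
    print(spk, lst, dependence_class(spk, lst))
```

In this sketch, sentences for which both labels are empty would count as gender independent, while the others would form the gender-dependent subset that the introduction describes as being used for adaptation.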
2 Speaker Gender Identification

Humans can easily identify the gender of a speaker from a very short audio segment, owing to the natural differences between the female and male speech generation processes. Automatic gender identification helps improve the accuracy and robustness of many speech applications, such as automatic speech recognition, emotion recognition, content-based multimedia indexing, speaker diarization, speaker indexing, human-machine interaction, and voice synthesis. Gender classification from speech is considered a solved problem on clean and monolingual corpora such as the TIMIT [6] speech corpus or on distorted and multilingual corpora such as DARPA RATS [7]. However, differentiating the g........