Editor: 颜大大i2 | 2019-07-15
Spoken Language Translation (SLT) is becoming more widely used as a communication tool that helps people cross language barriers. One of the challenges of SLT is translating from a language without gender agreement into a language with gender agreement, such as from English into Arabic. In this paper, we introduce an approach that tackles this limitation by enabling a Neural Machine Translation (NMT) system to produce gender-aware translation. We show that an NMT system can model the speaker's and listener's gender information to produce gender-aware translation. We propose a method to generate the data used to adapt an NMT system to produce gender-aware translation. The proposed approach improves translation quality significantly, by 2 BLEU points.
Keywords: Speaker Gender, Gender Information, Gender-Aware Translation, Gender Agreement, Produce Gender, Neural Machine Translation System
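The abstract does not spell out how the speaker's and listener's gender reaches the model. One common way to condition an NMT system on such side information, assumed here and not necessarily the method of this paper, is to prepend tag tokens to the source sentence before training and decoding. The sketch below builds tagged training pairs from the French "I am certain" example discussed in the introduction; the tag tokens (`<spk_m>`, `<lst_f>`, ...) and the helper `tag_source` are hypothetical names chosen for illustration.

```python
from typing import Optional

# Hypothetical tag vocabulary (an assumption for illustration);
# a real system may use different tokens or a different mechanism entirely.
SPEAKER_TAGS = {"M": "<spk_m>", "F": "<spk_f>"}
LISTENER_TAGS = {"M": "<lst_m>", "F": "<lst_f>"}


def tag_source(sentence: str,
               speaker: Optional[str] = None,
               listener: Optional[str] = None) -> str:
    """Prepend speaker/listener gender tags to a source sentence.

    A gender of None means "unknown" and adds no tag, so untagged
    generic data can still be mixed with tagged adaptation data.
    An unsupported gender label raises KeyError.
    """
    tags = []
    if speaker is not None:
        tags.append(SPEAKER_TAGS[speaker])
    if listener is not None:
        tags.append(LISTENER_TAGS[listener])
    return " ".join(tags + [sentence])


# Toy adaptation pairs built from the French example in the text:
# the same English input maps to different French forms by speaker gender.
pairs = [
    (tag_source("I am certain .", speaker="M"), "Je suis certain ."),
    (tag_source("I am certain .", speaker="F"), "Je suis certaine ."),
]

for src, tgt in pairs:
    print(f"{src}\t{tgt}")
```

Without the tags, these two pairs would map an identical English input to conflicting French outputs; with them, the inputs differ, so a model can in principle learn which inflection each tag calls for. Tagging only the source side also leaves the NMT architecture itself unchanged.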
1 Introduction
Nearly half of the world's languages have a grammatical gender system. For native speakers of these languages, violations of gender agreement are associated with difficulty in comprehension. In one study [1], gender agreement violations resulted in a delay of 500 to 700 ms in response time while reading Spanish sentences. A similar study [2] reached analogous conclusions for spoken language comprehension. These findings suggest that gender agreement violations place an additional cognitive load on the listener.
In conversational settings, pronouns are frequently used to refer to the speaker or to address the listener(s). Pronominal gender agreement is especially challenging for machine translation (MT) when the source language does not have gender agreement while the target language does, which is the case for English-to-Arabic translation. The focus of this paper is to enable an SLT system to produce gender-aware translation for both parties participating in a conversation. For instance, consider an SLT session involving English and French participants. If an English participant says "I am certain", the appropriate French translation of the adjective "certain" depends on the speaker's gender, since French has a grammatical gender system: for a male speaker the correct translation is "Je suis certain", while "Je suis certaine" is the correct form for a female speaker. Similarly, in Arabic, "I am certain" should be translated as أنا متأكد (ʔna mtʔkd) or أنا متأكدة (ʔna mtʔkdt) for a male or female speaker, respectively.
The listener's gender affects the translation as well. Consider the translation of "You said it" into Arabic. For a male listener, it should be أنت قلته (ʔnt qlth), and for a female listener the correct translation becomes أنت قلتيه (ʔnt qltyh). Because the listener is also a speaker in a conversational setting, the term speakers' gender agreement here refers to both speaker-dependent and listener-dependent gender agreement, unless the distinction is necessary for clarity.
To assess the prevalence of speakers' gender agreement in SLT, we randomly selected 1,000 sentences from the English-Arabic OpenSubtitles data [3]. These sentences were manually analyzed for speaker-dependent or listener-dependent gender agreement. More than half of the sample contained at least one form of gender dependency; however, a smaller number of sentences had both speaker and listener dependency. Detailed findings are in Table 1. We also observed that listener dependency is much more frequent than speaker dependency.
Table 1. Gender dependence in