
Tuesday, January 1, 2019

Malayalam experience of Google Translate: Morpho-syntactic observations




आभ्यंतर (Aabhyantar), SCONLI-12 Special Issue, ISSN: 2348-7771

18. Malayalam experience of Google Translate: Morpho-syntactic observations
Prajisha Areecode
Abstract
Google Translate is one of the most widely used online machine translators. This paper evaluates the errors in Malayalam-to-English translation produced by this tool. Sample sentences are used to understand the translation problems and to assess the accuracy of the online machine translation offered by Google. It is observed that the morpho-syntactic peculiarities of Malayalam remain untranslated. An evaluation of this issue suggests that the development of NLP for Malayalam is not yet adequate to support machine translation. This study argues that the present morpho-syntactic errors should be addressed immediately to ensure the effectiveness of machine translation for Malayalam.
Keywords: Google Translate, NLP, machine translation, Malayalam, morpho-syntax
Introduction
Apart from Google's initiative, there have been few systematic efforts at machine translation (MT) for Malayalam. This context calls for testing how well Google Translate handles Malayalam. This paper presents an accuracy test, identifies the major issues involved in translating Malayalam with Google Translate, and suggests solutions to them.
Compared with the low success rate of earlier MT attempts, Google Translate is responsive and usable. Nevertheless, its mismatches in translation should be studied so that suggestions can be made for improvement. This study is designed with that aim.
This paper is organized into three sections. Section 1 gives a brief introduction to MT attempts in the context of Malayalam; Section 2 illustrates the translation of Malayalam by Google Translate, identifying and discussing problems with the support of examples; and Section 3 presents the concluding remarks.
Translation of Malayalam by Google Translator
In this section, translations of Malayalam sentences produced by Google Translate (GT) are presented alongside human translations (HT).
1.(a) ii panthinu nalla valippamuNDu
This ball+DAT pretty size+CONJ
This sound is pretty big. (GT)
1.(b) ii panthinu nalla valuppamuNDu
This ball+DAT pretty size+CONJ
This ball is pretty big. (HT)
This ball has a good size. (HT)
This ball has a good size. (GT)
Consider 1.(a) and 1.(b). These sentences contain the words valippam and valuppam respectively, which are in free variation. With valippam, as in 1.(a), Google mistranslates the sentence; with valuppam, as in 1.(b), GT gets it right. This indicates that the two forms are not listed as variants of the same lexical entry in the corpus.
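One way to picture the missing link between valippam and valuppam is a normalisation step that maps every free-variant spelling to one canonical entry before lexicon lookup. The sketch below assumes a hypothetical variant table and a one-word hypothetical lexicon; the names VARIANTS, LEXICON, canonical and lookup are illustrative only and say nothing about how GT's lexicon is actually organised.

```python
from typing import Dict, Optional

# Hypothetical variant table: each free-variant spelling points to one canonical entry.
VARIANTS: Dict[str, str] = {"valippam": "valuppam"}

# Hypothetical bilingual lexicon keyed only on canonical forms.
LEXICON: Dict[str, str] = {"valuppam": "size"}


def canonical(word: str) -> str:
    """Map a spelling variant to its canonical lexical entry (identity if none)."""
    return VARIANTS.get(word, word)


def lookup(word: str) -> Optional[str]:
    """Look a word up after normalising it, so both variants reach the same entry."""
    return LEXICON.get(canonical(word))
```

Without the VARIANTS step, lookup("valippam") would return None while lookup("valuppam") succeeds, which mirrors the contrast between 1.(a) and 1.(b).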
2) UrappayiTTum avare kaNDaal thallaam
Definitely they-Acc see-if beat-may
Definitely beat them when (you) meet. (HT)
Even if they can see them. (GT)
Here -aal is equivalent to the word 'if' in English, but in Malayalam it is homonymous with the instrumental suffix -aal. This may create confusion in MT. The verb thallu (beat) is lost in GT. The mistranslation of this sentence indicates incorrect performance of the morphological analyser. Finally, -aam expresses judgemental modality, which is also not rendered.
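The -aal ambiguity can be made concrete with a toy analyser: unless the category of the stem is known, a word ending in -aal has two readings, conditional 'if' and instrumental case. The sketch below is an illustration under stated assumptions, not GT's analyser; STEM_POS and analyse_aal are hypothetical names, the only stem entry is kaND- from kaNDaal in (2), and sandhi such as glide insertion is ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical stem lexicon (illustrative only); kaND- is the stem seen in kaNDaal 'see-if'.
STEM_POS: Dict[str, str] = {"kaND": "VERB"}


def analyse_aal(word: str) -> List[Tuple[str, str]]:
    """List the readings of a word ending in the homonymous suffix -aal.

    COND = conditional 'if' (after a verb stem); INST = instrumental case (after a noun).
    Sandhi such as glide insertion is ignored in this sketch.
    """
    if not word.endswith("aal"):
        return []
    stem = word[: -len("aal")]
    pos = STEM_POS.get(stem)
    if pos == "VERB":
        return [(stem, "COND")]
    if pos == "NOUN":
        return [(stem, "INST")]
    # Unknown stem: both readings survive, which is where an MT system can go wrong.
    return [(stem, "COND"), (stem, "INST")]
```

With kaND- recorded as a verb stem, kaNDaal resolves to the conditional reading; an unknown stem leaves both readings open, which is the kind of confusion suggested above.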
3) Nii kaLLi aaNu
You thief-FEM.SG be
You are (a) thief. (HT)
You're in the mosque. (GT)
Here 'thief-feminine-singular' is translated as 'in the mosque'. This may be due to errors in the lexical items. The translation not only mismatches the lexical item but also reframes the case structure: in the Malayalam sentence (3), kaLLi is in the nominative, whereas its translation comes with the locative 'in'. Such an unrelated translation suggests that the sentence is not understood by the translator.
4) manushyaR nanma uLLavaraNu
human-PL goodness having-PL-be
Humans are good beings. (HT)
Humans are virtuous. (HT)
Humans have good things. (GT)
In sentence 4 there is a mix-up of the copular verbs 'aaNu' (be) and 'uNDu' (have).
Of sentences 1-4, all are mistranslated except 1.(b). In 1.(a), the object is misrepresented: the Malayalam word for ball is translated as 'sound'. Likewise, kaLLi (thief-feminine-singular) is translated as 'in the mosque' in (3). Such unrelated lexical items appearing in the output are a major problem in MT. Another problem is sense identification, as with nanma in (4): in this context it means 'virtuous', but GT reduces it to 'good'. This suggests that Google could not grasp its semantic value.
2.1 Verbs
5) Njaan avaLe viLichathayirunnu
I she-Acc call-PAST.PCPL-be-CONJ.PAST
I had called her. (HT)
I called her. (GT)
6) Njaan avaLe viLichirunnu
I she-Acc call-PERF.PAST
I had called her. (HT)
I called her. (GT)
7) Njaan avaLe viLichu
I she-Acc call-PAST
I called her. (HT)
I called her. (GT)
(7 is a simple sentence with only a past-tense marker, and it is the only one of 5-9 that is translated correctly.)
8) Njaan avaLe viLichiTTuNDu
I she-Acc call-REMO.PERF-be-PRES
I have called her. (HT)
I called her. (GT)
In 6, the perfective aspect is not translated; GT renders it as a simple past instead.
9) Njaan avaLe viLichiTTuNDaayirunnu
I she-Acc call-REMO.PERF-be-PRES-CONJ.PAST
I had called her. (HT)
I called her. (GT)
GT renders all of 5-9 as 'I called her', which is correct only for 7; the tense and aspect meanings of the other Malayalam verbs are lost. Sentences 5-9 are typical examples of Malayalam verb inflection. Inflection is an important characteristic of Malayalam, but it is not handled by Google Translate. The Malayalam forms in 5-9 are distinct, whereas the translations are nearly identical, so the distinction is not carried over. Malayalam expresses the past in several ways, but GT uniformly renders it with the simple -ed form.
The following examples, 10-13, show the same problem with differences of sense in the future tense.
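To see why a uniform -ed rendering loses information, one can imagine a rule-based transfer step keyed on the whole suffix chain rather than on the bare past marker. The sketch below assumes hypothetical rules built only from the glosses of 5, 6 (perfect past, written PERF.PAST), 7 and 9 together with their human translations; TRANSFER, CALL_FORMS and render are illustrative names and do not describe GT's internals.

```python
from typing import Dict, Tuple

# English forms of 'call' (viLi-); a real system would need a full verb lexicon.
CALL_FORMS: Dict[str, str] = {"past": "called", "pp": "called"}

# Hypothetical transfer rules keyed on the suffix sequences glossed in 5, 6, 7 and 9.
TRANSFER: Dict[Tuple[str, ...], str] = {
    ("PAST",): "{subj} {past} {obj}",                                   # 7: viLichu
    ("PERF.PAST",): "{subj} had {pp} {obj}",                            # 6: viLichirunnu
    ("PAST.PCPL", "be", "CONJ.PAST"): "{subj} had {pp} {obj}",          # 5: viLichathayirunnu
    ("REMO.PERF", "be", "PRES", "CONJ.PAST"): "{subj} had {pp} {obj}",  # 9: viLichiTTuNDaayirunnu
}


def render(subj: str, obj: str, suffixes: Tuple[str, ...],
           forms: Dict[str, str] = CALL_FORMS) -> str:
    """Choose the English tense/aspect template for a Malayalam suffix sequence."""
    template = TRANSFER.get(suffixes)
    if template is None:
        raise KeyError(f"no transfer rule for {suffixes}")
    return template.format(subj=subj, obj=obj, **forms)
```

A flat rule that sends every past-marked form to the simple -ed template would reproduce GT's uniform 'I called her'; keying on the full suffix chain is what keeps the distinctions apart.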
10) Njan avaLe viLikkum
I she-Acc call-FUT
I will call her. (HT)
I'll call her. (GT)
11) Njan avaLe viLikkumaayirikkum
I she-Acc call-FUT-may
I may/might call her. (HT)
I'll call her. (GT)
Here, the verb stem viLikk- carries desiderative mood (desiderative mood is used to denote a situation where the speaker intends to say that a particular action which was not done should have been done; 2012:63). GT fails to capture these mood features of the verb.
12) Njan avaLe viLikkaam
I she-Acc call-PROM
I will call her. (HT)
I'll call her. (GT)
Some other examples:
The following sentences illustrate GT's failure to translate verb inflections in Malayalam. Even tense is not translated consistently.
13) Enikk viSakkum
I-DAT hungry-FUT
I will be hungry. (HT)
I'm hungry. (GT)
14) Enikk viSannu
I-DAT hungry-PAST
I got hungry. (HT)
I was hungry. (GT)
15) Enikk viSakkunnilla
I-DAT hungry-PRES-NEG
I didn't get hungry. (HT)
I'm not hungry. (GT)
16) Enikk viSannilla
I-DAT hungry-PAST-NEG
I was not hungry. (HT)
I'm not hungry. (GT)
17) Enikk viSakkunnuNDaayirunnilla
I-DAT hungry-PRES-be-CONJ-NEG
I was not feeling hungry. (HT)
I was not hungry. (GT)
18) Enikk viSanniTTuNDaayirunnilla
I-DAT hungry-PAST-REMO.PERF-be-PAST-NEG
I have not got hungry. (HT)
I was not hungry. (GT)
2.2 Habitual action (seelabhaavi) in Malayalam
'Seelabhaavi' is not tied to any one tense: it expresses continuous, habitual action, for instance daily processes such as sunrise and sunset.
19) Sooryan kiZhakkee udikkuu
Sun east-HAB rise-HAB
The sun rises only in the east. (HT)
The Sun rises in the east. (GT)
20) naayayuTe vaal vaLnjnjee irikkoo
dog-GEN tail bent-HAB be-HAB
The dog's tail will always be bent. (HT)
Eat the dog's tail. (GT)
Here the main verb irikk- is used in the sense of aak- ('be'; it usually means 'sit'), but GT gets the verb wrong and translates it as 'eat'. The suffix -ee is an emphasis marker that here signals habitual action, and -oo likewise denotes habitual action. GT ignores both indicators, so the habitual sense is completely lost in translation.
In 19 and 20, the translations fail to preserve the habitual sense.
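A simple pre-translation check could at least flag clauses that carry these habitual markers. The sketch below is a heuristic drawn only from examples 19 and 20 and is an assumption, not a tested rule of Malayalam grammar: it flags a clause whose final verb ends in -uu or -oo when an earlier word ends in the emphatic -ee. The names EMPHATIC, HABITUAL_VERB_ENDINGS and is_habitual are hypothetical.

```python
# Heuristic built only from examples 19 and 20 (an assumption, not a full grammar rule).
EMPHATIC = "ee"                       # emphasis marker: kiZhakkee, vaLnjnjee
HABITUAL_VERB_ENDINGS = ("uu", "oo")  # clause-final verb endings: udikkuu, irikkoo


def is_habitual(sentence: str) -> bool:
    """Flag a clause whose final verb ends in -uu/-oo and which contains an -ee word."""
    words = sentence.split()
    if len(words) < 2:
        return False
    *rest, verb = words
    return verb.endswith(HABITUAL_VERB_ENDINGS) and any(w.endswith(EMPHATIC) for w in rest)
```

When a clause is flagged, a transfer step could add an adverb such as 'only' or 'always', as the human translations of 19 and 20 do; a real system would need morphological analysis rather than string endings to avoid false positives.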
2.3 Auxiliary Verbs (anuprayoogam)
This category is a particular speciality of Malayalam. The use of auxiliaries (traditional Malayalam grammar distinguishes aspect-mood markers from auxiliaries) is not captured by Google Translate. For instance:
21) Njaan sathyam paRanju pooyi
I truth tell-PAST go-PAST
I happened to tell the truth. (HT)
I'm saying the truth. (GT)
Here GT gets the tense wrong. pooyi is an auxiliary/light verb denoting that the action was done involuntarily, but this sense is lost in GT.
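Capturing the light-verb function requires recognising the participle + light verb pattern before translation. The sketch below assumes a hypothetical one-entry table (only pooyi is attested in the text) and a crude check that the preceding word ends in -u, as the past participle paRanju does; this keeps a clause like avan pooyi ('he went'), where pooyi is the main verb, from being misread. LIGHT_VERBS and light_verb_reading are illustrative names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical light-verb table; only pooyi is attested in the text (example 21).
LIGHT_VERBS: Dict[str, str] = {"pooyi": "INVOLUNTARY"}


def light_verb_reading(words: List[str]) -> Optional[Tuple[str, str]]:
    """Return (main verb, function) when a clause ends in participle + light verb.

    Crude heuristic (an assumption): the preceding word must end in -u, as the
    past participle paRanju does; this keeps 'avan pooyi' ('he went') out.
    """
    if len(words) < 2:
        return None
    main, last = words[-2], words[-1]
    if last in LIGHT_VERBS and main.endswith("u"):
        return main, LIGHT_VERBS[last]
    return None
```

A transfer step could then render the INVOLUNTARY function as 'happened to' plus the main verb, matching the human translation of 21.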
2.4 Agglutinative Nature
Compare the pairs of sentences 22(a)/22(b) and 23(a)/23(b).
22) (a) Ninte veeTeviTeyaaNu
You-GEN home where-is
Here's your home. (GT)
22(b) Ninte veeTu EviTeyaaNu
You-GEN home where-is
Where is your home? (HT)
Where is your home? (GT)
23) (a) Raamanetthi
Raaman reach-PAST
Ram (GT)
23(b) Raaman etthi
Raaman reach-PAST
Raman reached. (GT)
Raaman reached. (HT)
GT correctly analyses and translates complex words only when they are given as separate morphemes. In the examples above, GT could not handle the agglutinated forms, but when the same words were separated, it identified the individual morphemes. This suggests that the low accuracy in this case is mainly due to unfamiliarity with the agglutinative nature of the language.
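The contrast between 22(a) and 22(b) suggests that a segmentation step before translation would help. The sketch below is a toy segmenter under stated assumptions: a hypothetical lexicon containing only the words of 22 and 23, and just two simplified sandhi rules (a word-final -u is dropped before a vowel, and a glide y is inserted between two vowels). Real Malayalam sandhi is much richer, and segment is an illustrative name, not part of any existing tool.

```python
from functools import lru_cache
from typing import Optional, Tuple

# Toy lexicon: only the words of examples 22 and 23, romanised as in the text.
LEXICON = ("Raaman", "etthi", "veeTu", "eviTe", "aaNu")
VOWELS = set("aeiou")


def segment(word: str) -> Optional[Tuple[str, ...]]:
    """Split an agglutinated word into lexicon entries, undoing two sandhi rules:
    1. a word-final -u is dropped before a vowel (veeTu + eviTe -> veeTeviTe);
    2. a glide y is inserted between two vowels (eviTe + aaNu -> eviTeyaaNu).
    Returns None if no segmentation exists."""

    @lru_cache(maxsize=None)
    def go(i: int) -> Optional[Tuple[str, ...]]:
        if i == len(word):
            return ()
        for lemma in LEXICON:
            surfaces = [lemma]
            if lemma.endswith("u"):
                surfaces.append(lemma[:-1])  # form with final -u elided
            for surface in surfaces:
                if not word.startswith(surface, i):
                    continue
                j = i + len(surface)
                elided = surface != lemma
                # Rule 1 applies only when the next morpheme starts with a vowel.
                if elided and (j >= len(word) or word[j] not in VOWELS):
                    continue
                # Rule 2: skip an inserted y between a vowel-final and a vowel-initial morpheme.
                if (surface[-1] in VOWELS and word.startswith("y", j)
                        and j + 1 < len(word) and word[j + 1] in VOWELS):
                    rest = go(j + 1)
                    if rest is not None:
                        return (lemma,) + rest
                rest = go(j)
                if rest is not None:
                    return (lemma,) + rest
        return None

    return go(0)
```

For example, segment("veeTeviTeyaaNu") yields ("veeTu", "eviTe", "aaNu"), the separated form that GT translated correctly in 22(b).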
2.5 The mistakes can be listed as follows:
1. Errors in documenting the corpus and lexicon.
2. Mistakes in translating copular verbs.
3. GT could not distinguish differences of tense, aspect and mood in verb morphology.
4. When auxiliary/light verbs are used, GT could not capture their functional meanings.
5. Problems in analysing agglutinated forms.
Conclusion
This paper has discussed various instances of mistranslation by Google Translate in the Malayalam context. The accuracy test shows instances of mismatch and incoherence in Google's translations: given Malayalam sentences as input, GT produces English output that often does not correspond to the source. Malayalam has a rich morphology, and this is a major cause of the poor translation quality. It also opens a new vista for MT research in Malayalam. The untranslatability mapped above is not so much a failure of the Google tool as a sign that NLP for Malayalam is not yet scientifically well developed. It may be concluded that the quality of translation depends directly on the scope and quality of NLP resources and parallel corpora. Malayalam linguistics should concentrate primarily on the morphological aspects of the language and their computational treatment in order to make MT a reality for Malayalam.
Bibliography
· Antony, P.J. 2012. Machine Translation Approaches and Survey for Indian Languages. Computational Linguistics and Chinese Language Processing, Vol. 18, No. 1, March 2013.
· Garie, V.Y., Kbarate, U.K. Survey of Machine Translation Systems in India. International Journal on Natural Language Computing (IJNLC), Vol. 2, No. 4, October 2013.
· http://GloablSecurity.org/intell/systems/mt.history.html
· Hutchins, W. John. Machine Translation: A Concise History. http://ourworld.compuserve.com/homepages/wjHutchins (latest version November 2005).
· Jomysose. Machine Translation with Special Reference to Malayalam Language. International Journal of Computer Science and Engineering Technology (IJCSET), Vol. 04, April 2014.
· Translation directory.com/articles/articles 190t.php
Abbreviations


Acc – Accusative
CONJ – Conjunctive
CONT – Continuous
DAT – Dative
DES – Desiderative
FEM – Feminine
FUT – Future tense
GEN – Genitive
GT – Google Translate output
HAB – Habitual
HT – Human translation
LOC – Locative
NEG – Negative
PAST – Past tense
PCPL – Participle
PERF – Perfective
PERM – Permissive
PL – Plural
POSS – Possibilitive
PRES – Present tense
PROM – Promissive
REMO – Remote
SG – Singular




