आभ्यंतर (Aabhyantar)
SCONLI-12
Special Issue, ISSN: 2348-7771
18. Malayalam experience of Google Translate: Morpho-syntactic observations
Prajisha Areecode
Abstract
Google Translate is one of the most widely used online machine translators. This paper evaluates the errors in Malayalam-to-English translation produced by this tool. Sample sentences are tested in order to understand the translation problems and to assess the accuracy of the online machine translation offered by Google. It is observed that the morpho-syntactic peculiarities of Malayalam remain untranslated. An evaluation of this issue suggests that the development of NLP for Malayalam is not yet adequate to support machine translation. The study suggests that the present morpho-syntactic errors should be addressed immediately to ensure the effectiveness of machine translation in the context of the Malayalam language.
Key words: Google Translate, NLP, machine translation, Malayalam, morpho-syntax
Introduction
Apart from the Google initiative, there have been few systematic efforts at machine translation (MT) for Malayalam. This context calls for a test of how well Google Translate handles Malayalam. This paper presents an accuracy test and identifies the major issues involved in translating Malayalam with Google Translate. It also suggests ways of addressing these issues.
Considering the low success rate of earlier MT attempts, Google Translate performs well in responsiveness and usability. But the mismatches in its translations should be studied so that suggestions for improvement can be made. This study is designed with that aim.
This paper is organized into three sections. Section 1 gives a brief introduction to MT attempts in the context of Malayalam. Section 2 illustrates the translation of Malayalam by Google Translate; problems are identified and discussed with the support of examples. Finally, concluding remarks are made in Section 3.
Translation of Malayalam by Google Translator
In this part, Malayalam sentences are presented together with their translations by Google Translate (GT) and their human translations (HT).
1)
1.(a) ii panthinu nalla valippamuNDu
This ball+DAT pretty size+CONJ
This sound is pretty big GT
1.(b) ii panthinu nalla valuppamuNDu
This ball+DAT pretty size+CONJ
This ball is pretty big HT
This ball has a good size HT
This ball has a good size GT
Consider 1.(a) and 1.(b). These sentences contain the words valippam and valuppam respectively, which are in free variation. 1.(a) has valippam, and Google mistranslates it. When the form is valuppam, as in 1.(b), GT gets it right. This indicates that the two forms are not listed as variants of the same lexical entry in the corpus.
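One way to picture the missing link between the two forms is a variant table that maps every spelling to a single canonical entry before lexical lookup. The sketch below is only an illustration under stated assumptions: the table, the function names and the one-word lexicon are hypothetical, and nothing here reflects how Google Translate is actually built.

```python
# Hypothetical variant table: each spelling variant points to one canonical entry.
VARIANTS = {"valippam": "valuppam"}

# Toy lexicon keyed by canonical form only; the gloss "size" is taken from example 1.
LEXICON = {"valuppam": "size"}


def canonical(form):
    """Return the canonical spelling of a form, or the form itself if unlisted."""
    return VARIANTS.get(form, form)


def lookup(form):
    """Look up the gloss of a form after normalising its spelling variant."""
    return LEXICON.get(canonical(form))


if __name__ == "__main__":
    for form in ("valippam", "valuppam"):
        print(form, "->", lookup(form))
```

With such a table, both free variants reach the same entry, so the choice between valippam and valuppam no longer changes the lookup result.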
2)
UrappayiTTum avare kaNDaal thallaam
Definitely they-Acc see-if beat-may
Definitely beat them when (you) meet HT
Even if they can see them. GT
Here aal is equivalent to the word ‘if’ in English, but in Malayalam it is homonymous with the instrumental suffix. This might create confusion in MT. The verb thallu (beat) is lost in GT. The mistranslation of this sentence indicates incorrect performance of the morphological analyser. The suffix aam marks judgemental modality.
3)
Nii kaLLi aaNu
You thief-FEM.SG be
You are (a) thief HT
You’re in the mosque GT
The ‘thief-feminine-singular’ form is translated here as ‘in the mosque’. This may be due to errors in the lexicon. The translation mismatches the lexical item and also reframes the case structure: in the Malayalam sentence (3) the noun is in the nominative, but the translation introduces a locative (‘in’). Such an unrelated translation suggests that the sentence is not understood by the translator.
4)
manushyaR nanma uLLavaraNu
human-Pl goodness having-Pl-be
Humans are good beings HT
Humans are virtuous HT
Humans have good things GT
In sentence 4 there is a mix-up between the copular verbs ‘aaNu’ (be) and ‘uND’ (have).
All of sentences 1-4 above fail in translation, except 1.(b). In the first one (1), the object ball is misrepresented: the Malayalam word for ball is translated as ‘sound’ in 1.(a). Likewise, kaLLi (thief-feminine-singular) is translated as ‘in the mosque’ in (3). Unrelated lexical items of this kind appearing in the translation are a major problem in MT. Another problem is sense identification, as in the case of nanma. It means ‘virtuous’ but is reduced to ‘good’. This suggests that Google could not identify its semantic value.
2.1 Verbs
5)
Njaan avaLe viLichathayirunnu
I she-Acc call-PAST.PCPL-be-CONJ.PAST
I had called her. HT
I called her GT
6)
Njaan avaLe viLichirunnu
I she-Acc call-perfect past
I had called her. HT
I called her GT
7)
Njaan avaLe viLichu
I she-Acc call-PAST
I called her HT
I called her GT
(7 is a simple sentence with only the past tense marker. Only example 7 is translated correctly.)
8)
Njaan avaLe viLichiTTuND
I she-Acc call-REMO.PERF-be-PRES
I had called HT
I called her GT
In 6, the perfective aspect is not translated; instead, the output is in the simple past.
9)
Njaan avaLe viLichiTTuNDaayirunnu
I she-Acc call-REMO.PERF-be-PRES-CONJ.PAST
I had called her HT
I called her GT
The translations of 5-9 do not retain the tense and aspect meanings of the corresponding Malayalam verbs. Sentences 5-9 are typical examples of Malayalam inflection. Inflection is an important characteristic of Malayalam, but it is not addressed by Google Translate. In 5-9 the translations are identical, while the Malayalam forms are distinct in use, and this distinction is not captured by the translation. Past tense is expressed in several different ways in Malayalam, but in translation it is uniformly rendered with the –ed form.
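The collapse described above can be made explicit with a small tabulation. The following is a minimal sketch, not part of the original study: it copies the verb forms, HT and GT strings of examples 5-9 (with final full stops dropped), groups the Malayalam forms by GT output, and lists the examples where GT and HT agree exactly. The function names are hypothetical.

```python
from collections import defaultdict

# (example number, Malayalam verb form, HT, GT), copied from examples 5-9.
EXAMPLES = [
    (5, "viLichathayirunnu", "I had called her", "I called her"),
    (6, "viLichirunnu", "I had called her", "I called her"),
    (7, "viLichu", "I called her", "I called her"),
    (8, "viLichiTTuND", "I had called", "I called her"),
    (9, "viLichiTTuNDaayirunnu", "I had called her", "I called her"),
]


def group_by_output(examples):
    """Map each GT output string to the list of source forms that produced it."""
    groups = defaultdict(list)
    for _number, form, _ht, gt in examples:
        groups[gt].append(form)
    return dict(groups)


def exact_matches(examples):
    """Return the numbers of the examples whose GT string equals the HT string."""
    return [number for number, _form, ht, gt in examples if ht == gt]


if __name__ == "__main__":
    for output, forms in group_by_output(EXAMPLES).items():
        print(f"{output!r} <- {len(forms)} distinct forms: {', '.join(forms)}")
    print("exact GT/HT matches:", exact_matches(EXAMPLES))
```

Exact string match is a crude criterion chosen only for simplicity; it is enough here to show five distinct source forms ending up in a single output group.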
Sentences 10-13 below show the same problem for differences in future-tense meaning.
10)
Njan avaLe viLikkum
I she-Acc call-FUT
I will call her HT
I’ll call her GT
11)
Njan avaLe viLikkumaayirikkum
I she-Acc call-FUT-may
I may/might call her. HT
I’ll call her GT
Here the verb stem vilik carries the desiderative mood (the desiderative mood is used to denote a situation where the speaker intends to say that a particular action which was not done should have been done; 2012:63). That is, GT fails to capture the mood features of the verb.
12)
Njan avaLe viLikkaam
I she-Acc call-PROM
I will call her HT
I’ll call her GT
Some other examples:
The following sentences illustrate the failure to translate verb inflections in Malayalam. Even tense is not translated consistently.
13)
Enikk viSakkum
I-DAT hungry-FUT
I will be hungry HT
I’m hungry GT
14)
Enikk viSannu
I-DAT hungry-PAST
I got hungry HT
I was hungry GT
15)
Enikk viSakkunnilla
I-DAT hungry-PRES-NEG
I didn’t get hungry HT
I’m not hungry GT
16)
Enikk viSannilla
I-DAT hungry-PAST-NEG
I was not hungry HT
I’m not hungry GT
17)
Enikk viSakkunnuNDaayirunnilla
I-DAT hungry-PRES-be-CONJ-NEG
I was not feeling hungry HT
I was not hungry GT
18)
Enikk viSanniTTuNDaayirunnilla
I-DAT hungry-PAST-REMO.PERF-be-PAST-NEG
I have not got hungry HT
I was not hungry GT
2.2 Habitual action (seelabhaavi) in Malayalam
‘Seelabhaavi’ spans all tenses. It expresses continuous and habitual action, for instance daily processes like sunrise and sunset.
19)
Sooryan kiZhakkee udikkuu
Sun east-HAB rise-HAB
The sun rises only in the east HT
The Sun rises in the east GT
20)
naayayuTe vaal vaLnjnjee irikkoo
dog-GEN tail bent-HAB be-HAB
The dog’s tail always will be bent HT
Eat the dog’s tail GT
Here the main verb irikk is used in the sense of aak (meaning ‘be’; usually irikk means ‘sit’). But GT gets the verb wrong and translates it as ‘eat’. ‘ee’ is an emphasis marker denoting habitual action; oo also denotes habitual action. These two indicators of habitual action are ignored by GT, so that sense is completely lost in translation.
In 19 and 20, the translation does not retain the habitual meaning.
2.3 Auxiliary Verbs (anuprayoogam)
This category is a distinctive feature of Malayalam. The use of auxiliaries (traditional Malayalam grammar distinguishes aspect-mood markers from auxiliaries) is not captured by Google Translate. For instance,
21)
Njaan sathyam paRanju pooyi
I truth tell-PAST go-PAST
I happened to tell the truth HT
I’m saying the truth GT
Here GT gets the tense wrong. pooyi is an auxiliary/light verb. It denotes that the action was done involuntarily, but this sense is lost in GT.
2.4 Agglutinative Nature
Compare the sentence pairs 22(a) and 22(b), and 23(a) and 23(b).
22)
(a) Ninte veeTeviTeyaaNu
You-GEN home where-is
Here’s your home GT
22(b). Ninte veeTu EviTeyaaNu
You-GEN home where-is
Where is your home HT
Where is your home GT
23)
(a) Raamanetthi
Raaman reach-PAST
Ram GT
23(b). Raaman etthi
Raaman reach-PAST
Raman reached GT
Raaman reached HT
GT correctly analyses and translates complex words only when they are given as separate morphemes. In the above examples, the translator cannot handle the agglutinative nature of Malayalam. But when the same examples are split, GT can identify the individual morphemes. This means that the low accuracy in this case is mainly due to unfamiliarity with the agglutinative nature of the language.
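Splitting fused forms before translation can be sketched as a lexicon-driven segmenter. The code below is a toy illustration under several assumptions: the lexicon holds only the words of examples 22 and 23, the question word is stored in the spelling it has inside the fused form (eviTeyaaNu), and a single sandhi rule is assumed, namely that a word-final u is dropped before a vowel-initial word. It is not a description of any existing Malayalam analyser.

```python
VOWELS = set("aeiouAEIOU")

# Toy lexicon (hypothetical): only the words from examples 22 and 23.
LEXICON = {"Ninte", "veeTu", "eviTeyaaNu", "Raaman", "etthi"}


def segment(word, lexicon=LEXICON):
    """Return one split of `word` into lexicon entries, or None if there is none."""
    if word == "":
        return []
    # Try longer prefixes first.
    for end in range(len(word), 0, -1):
        prefix, rest = word[:end], word[end:]
        candidates = [prefix]
        # Assumed sandhi rule: a final 'u' is dropped before a vowel-initial word,
        # so the surface prefix may stand for a lexicon entry ending in 'u'.
        if rest and rest[0] in VOWELS:
            candidates.append(prefix + "u")
        for candidate in candidates:
            if candidate in lexicon:
                tail = segment(rest, lexicon)
                if tail is not None:
                    return [candidate] + tail
    return None


if __name__ == "__main__":
    for fused in ("Raamanetthi", "veeTeviTeyaaNu"):
        print(fused, "->", segment(fused))
```

On these two inputs the segmenter recovers the separated forms of 23(b) and 22(b), which are the inputs GT translated correctly. A realistic system would need a full lexicon and many more sandhi rules.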
2.5
The mistakes observed can be listed as follows.
1. Errors in documenting the corpus and lexicon.
2. Mistakes in translating copular verbs.
3. GT could not distinguish differences of tense, aspect and mood in verb morphology.
4. When auxiliary verbs/light verbs are used, GT could not capture their various functional meanings.
5. Problems in analysing agglutinated forms.
Conclusion
The present paper discusses various instances of mistranslation by Google Translate in the context of Malayalam. From this accuracy test, it is observed that there are instances of mismatch and incoherence in Google's translations. We used Malayalam sentences as input, and the English output often does not correspond to them. The language used in this work exhibits a rich morphology, which leads to poor translation quality. This opens a new vista for MT in Malayalam. The untranslatability mapped above is not a failure of the Google tool; rather, NLP for Malayalam is not yet scientifically well developed. It may be concluded that the quality of translation depends directly on the scope and quality of NLP resources and parallel language corpora. Malayalam linguistics should concentrate primarily on the morphological aspects of the language and their computational treatment in order to make MT a reality for Malayalam.
Bibliography
· Antony, P.J. 2012. Machine Translation Approaches and Survey for Indian Languages. Computational Linguistics and Chinese Language Processing. Vol.18, No.1, March 2013.
· Garie,VY, Kbarate.U.K. Survey of Machine Translation systems in India. International Journal on Natural Language Computing (IJNLC). Vol.2, No.4, October 2013.
· http://GloablSecurity.org/intell/systems/mt.history.html
· Hutchins, W. John. Machine Translation: A Concise history. http://ourworld.compuserve.com/homepages/wjHutchins
· http://ourworld.compuserve.com Latest version November 2005
· Jomysose. Machine Translation with special reference to Malayalam language. International Journal of computer science and Engineering Technology (IJCSET). Vols. No.04, April 2014.
· Translation directory.com/articles/articles 190t.php
Abbreviations
Acc – Accusative
CONJ – Conjunctive
CONT – Continuous
DAT – Dative
DES – Desiderative
FEM – Feminine
FUT – Future tense
GEN – Genitive
HAB – Habitual
LOC – Locative
NEG – Negative
PAST – Past Tense
PCPL – Participle
PERF – Perfective
PERM – Permissive
PL – Plural
POSS – Possibilitive
PRES – Present tense
PROM – Promissive
REMO – Remote