आभ्यंतर (Aabhyantar) ISSN: 2348-7771
A forum for folk culture, language, world literature and contemporary thought
Issue 06 (January-March 2018)
..........................................................................................................................................
Contents
1. Modeling a Computational Grammar of Hindi
Dr. Dhanji Prasad: Asst. Prof., Language Technology, MGAHV, Wardha
Academic Coordinator: SCONLI-12
..........................................................................................................................................
Modeling a Computational Grammar of Hindi
Dr. Dhanji Prasad
Assistant Professor, Linguistics and Language Technology
MGAHV, Wardha
Summary
A grammar is a collection of the rules and linguistic units of a language; it
makes language learning easier and resolves problems of word and sentence
formation. These, however, are the classical uses of grammar, which is meant
for humans. Today the computer has emerged as a major component in all areas
of human activity, and research has changed our way of looking at grammar.
Efforts are under way to build a computational model of human linguistic
knowledge. These efforts are collectively called Natural Language Processing
(NLP). NLP has various uses and applied areas, which this paper categorizes
into four levels: Pre-NLP, Primary NLP, Central NLP and Advanced NLP.
Hindi is the official and the most widely spoken and understood language of
India. A very large amount of literary and scientific data is available in it
in the form of text and multimedia. Researchers are also trying to build a
computational model of Hindi grammar so that the potential of the computer can
be used in Hindi NLP applications. For this purpose we need a robust
rule-based grammar for Hindi, which may be called the Computational Grammar of
Hindi. This paper presents a sample model for preparing such a grammar.
Keywords: Computational grammar, Pre-NLP, Primary NLP, Central NLP,
Advanced NLP, Hindi language.
1.0 Introduction
A grammar is a systematic representation of the structure of a language. It
describes the phonological, morphological and syntactic rules of the language.
It is an authoritative source for learning the language and for describing how
its words and sentences are formed. The form of a grammar may change with
time, with the society that uses the language, or even with the goal of the
grammarian. Generally, the grammar of a natural language is prepared for
humans. Grammars can be divided into two broad categories according to the
user: 1. mother-tongue speakers and 2. second-language or foreign-language
learners. There are different traditions of grammar writing for these users.
The invention of the computer was the revolution of the 20th century. Now, in
the 21st century, computers are part of our day-to-day life. Today no one can
imagine life without a computer (or digital machines such as smartphones).
Language is the fundamental medium of our thinking and social behavior. Thus,
if we want to incorporate the computer fully into our lives, we have to
process natural languages on the computer. Many researchers and institutions
are trying to process natural languages by various techniques, but many
problems remain. This is why a comprehensive, computer-oriented grammar of a
language is necessary to complete this task. Researchers are developing
computational grammars and grammatical frameworks for many languages, such as
English. This paper proposes a format for modeling a Computational Grammar of
Hindi.
2.0 Related Works
Very little work is found on a computational grammar of Hindi. No
computational grammar has been made for Hindi to date, but some scholars have
composed grammars in this direction. An important work is ‘A Syntactic Grammar
of Hindi’ (2012) by Surajbhan Singh, in which the author defines the phrase
structures and sentence frames of Hindi very systematically. Looking at other
languages, we find some work on this topic for Sinhala: a paper named ‘A
Computational Grammar of Sinhala’ was published as a chapter in ‘Computational
Linguistics and Intelligent Text Processing’ (2012). ‘A Computational Grammar
for Deep Linguistic Processing of Portuguese’ (2014) is a project report
submitted to the Department of Informatics, University of Lisbon; it describes
parsing rules for Portuguese. dejuliaH (2012) discusses building a
computational grammar for Esperanto.
Returning to the Indian languages, ‘Gb-Based Computational Grammar for
Punjabi: A Machine Translation Perspective’ (1997) is a thesis submitted to
the School of Computer and Systems Sciences, Jawaharlal Nehru University, by
Paramjit Singh. Given such work on other languages, a Computational Grammar
for Hindi is a requirement of the present time.
3.0 Computational Grammar: A Need for NLP
Natural Language Processing (NLP) is a branch of computer science that tries
to establish the structural knowledge of human languages in the machine
(computer). It aims to utilize this knowledge in areas ranging from basic
applications, such as Machine Translation (MT), Optical Character Recognition
(OCR), Information Retrieval (IR), Transliteration and Computer-assisted
Language Teaching/Learning (CALT/CALL), to Artificial Intelligence. Both forms
of language use, spoken and written, are processed in NLP. The processing of
the spoken form is called ‘Speech Processing’ and the processing of the
written form is called ‘Text Processing’. Many tasks are performed in NLP, and
through these tasks different types of tools and software are developed.
According to the nature of these tools and software, NLP may be categorized
into four stages:
· Pre-NLP: This is a pre-stage of NLP, where tasks related to programming,
database management, font design, font conversion etc. are performed. This
work is not itself NLP, and no computational grammar is needed for it.
· Primary NLP: In NLP, systematic knowledge of a human language is established
in the machine. This knowledge must cover the various levels of language
analysis, such as phoneme, morpheme, word, phrase, clause, sentence and
semantics. The basic work of NLP starts with Morph Analysis or Morph
Generation, which relates to the morpheme and the word and relies on a lexicon
and morphemic rules of inflection. Thus, tasks such as building a
computational lexicon, punctuation mark recognition and processing,
transliteration, morph analysis, morph generation, building a spell checker,
and date, time and currency recognition may be categorized as primary NLP. A
word-level computational grammar is required for this purpose.
· Central NLP: This is the heart of NLP. The main objective of NLP is to
process the sentences of human languages (and their meaning) by machine. The
main tasks here are parts-of-speech tagging, phrase marking, building a
grammar checker, automatic sentence generation and parsing. A robust and
explicit computational grammar is required to perform these tasks. The tools
developed in primary NLP are also used in developing the software for these
areas.
· Advanced NLP: Central NLP is the main NLP. The tools developed in it are
used in applied areas such as Machine Translation (MT), Optical Character
Recognition (OCR), Information Retrieval (IR) and Artificial Intelligence. But
when a developer tries to perform these tasks and build such systems, he or
she faces many problems arising from the complexity of human language. These
problems are called NLP challenges. Structural and semantic ambiguity, Named
Entity Recognition (NER), Multi-word Expressions (MWE) and discourse
references are among the most common challenges. A computational grammar must
provide rules to resolve these problems to the extent possible.
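The way a primary-NLP tool feeds a central-NLP task can be illustrated with a minimal Python sketch. The lexicon, the tag labels (NNP, NN, PSP, VM, PUNC, UNK) and the function names are assumptions made for illustration only; they are not taken from the grammar described in this paper.

```python
# Hypothetical toy lexicon; tag names are illustrative labels only.
LEXICON = {
    "राम": "NNP",   # proper noun
    "ने": "PSP",    # postposition (ergative marker)
    "आम": "NN",     # common noun ("mango")
    "खाया": "VM",   # main verb ("ate")
}

def tokenize(text):
    """Primary-NLP step: split on whitespace and separate the danda (।)."""
    return text.replace("।", " । ").split()

def pos_tag(tokens):
    """Central-NLP step: tag each token by looking it up in the lexicon."""
    tagged = []
    for token in tokens:
        if token == "।":
            tagged.append((token, "PUNC"))
        else:
            # Words missing from the lexicon are marked as unknown.
            tagged.append((token, LEXICON.get(token, "UNK")))
    return tagged

print(pos_tag(tokenize("राम ने आम खाया।")))
```

A lookup of this kind cannot resolve ambiguity (for example, a word that is a noun in one context and an adjective in another), which is one reason the central stage needs explicit grammar rules rather than a lexicon alone.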
4.0 Why a Computational Grammar for Hindi
Hindi is the most spoken language of India and also has the status of official
language of India under the constitution. Most scholars regard it as the third
most spoken language of the world. Hindi is used as a vernacular language or
lingua franca of the nation. It has a rich literary tradition in all genres.
Today, many technological developments have also taken place in Hindi; that
is, Hindi has kept pace on the technology platform. A great deal of material
from all branches of knowledge is currently available online through various
websites, blogs etc. One of the most popular websites for material on Hindi
literature and language is ‘http://www.hindisamay.com/’, established and
developed by Mahatma Gandhi Antararaashtriya Hindi Vishvavidyalaya, Wardha. It
offers material on Hindi literature and language in about half a million
pages.
There are also many other websites, such as http://www.shabdkosh.com/, http://bharatdiscovery.org/india/मुखपृष्ठ, http://kavitakosh.org/kk/कविता_कोश_मुखपृष्ठ, http://www.hindikunj.com/, http://jkhealthworld.com/hindi/, http://www.onlymyhealth.com/hindi.html, http://www.thehealthsite.com/hindi/
and Hindi Wikipedia. There are hundreds of news channels and newspapers in
Hindi, and Hindi cinema is known to everyone.
Besides all the above, a lot of software and many tools have also been
developed for Hindi. This is why, if we talk about India from a linguistic
perspective, we must consider Hindi. It should also be noted that Hindi is a
successor of the classical language Sanskrit, which had the richest tradition
of grammarians. The ‘Ashtadhyaayi’ of Panini was called ‘one of the greatest
monuments of human intelligence’ by the famous American linguist L. Bloomfield
in his renowned book Language (1933). Thus we can ask: ‘Why is there no such
grammar for Hindi from today’s perspective?’ The answer is a ‘Computational
Grammar’, which must be written.
5.0 Design of a Computational Grammar of Hindi
The Computational Grammar of Hindi should be designed in such a way that the
rules described in it can be processed by machines. It is a grammar for
machines written against the background of linguistics. It may contain
different sections on the various levels and processes of the language. The
author of this article has attempted to compose a Computational Grammar of
Hindi. It is one initiative; others may compose computational grammars of
their own. Here, I would like to introduce some major aspects of the grammar.
It is divided into the following sections:
5.1 Character and Lexicon
This section first introduces Hindi script, fonts and Unicode. It then
describes ‘Spelling Checking, Transliteration, Punctuation Marks, Special
Characters, Logograms Processing’ and ‘Computational Lexicon and Word
Categories’.
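As a small illustration of character-level processing, the following Python sketch checks whether a character belongs to the Devanagari Unicode block (U+0900 to U+097F), lists the code points of a word, and normalizes strings before they are used as lexicon keys. The function names are hypothetical, and the use of NFC normalization is an assumption of this sketch rather than a choice stated in the grammar.

```python
import unicodedata

def is_devanagari(ch):
    """True if a single character lies in the Devanagari block U+0900-U+097F."""
    return "\u0900" <= ch <= "\u097F"

def normalize_key(word):
    """Normalize a word (NFC) so that canonically equivalent spellings,
    e.g. a precomposed nukta letter and its letter+nukta sequence,
    map to the same lexicon key."""
    return unicodedata.normalize("NFC", word)

def describe(word):
    """Return (code point, Unicode name) for each character of a word."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in word]

for code, name in describe("हिंदी"):
    print(code, name)
```

Listing the code points makes it visible that a written Hindi word is stored as a sequence of consonants, vowel signs and other marks, which is the level at which spelling checking and transliteration rules have to operate.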
5.2 Derivation/Word Formation
The derivational part of a language is the most important for understanding
the internal construction of words. This section covers Affixation (prefixes
and suffixes), Sandhi (for understanding morphophonemic changes) and
Compounding.
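Affixation rules of this kind can be sketched as a lookup that strips a known prefix or suffix and accepts the analysis only if the remainder is a known root. The affix lists, the root list and the function name below are hypothetical mini-examples, not the inventory of the grammar; Sandhi changes at the affix boundary are deliberately not handled.

```python
# Hypothetical mini-lists; a real grammar would hold many more entries.
PREFIXES = ["अ", "सु"]
SUFFIXES = ["ता"]
ROOTS = {"ज्ञान", "मानव", "पुत्र"}

def analyze(word):
    """Return all (prefix, root, suffix) splits whose root is in ROOTS.

    Simple concatenation only: no Sandhi (morphophonemic change) is applied.
    """
    analyses = []
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[: len(rest) - len(suffix)]
            if root in ROOTS:
                analyses.append((prefix, root, suffix))
    return analyses

print(analyze("अज्ञान"))   # prefix अ + root ज्ञान
print(analyze("मानवता"))   # root मानव + suffix ता
print(analyze("सुपुत्र"))   # prefix सु + root पुत्र
```

Checking the remainder against the root list keeps the rule from splitting words that merely begin or end with the same letters as an affix.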
5.3 Inflection and Morphological Analysis
Inflection is the change in words that makes them usable in meaningful
sentences. ‘Grammatical Categories’ play the key role in inflection, so they
are described in this section. After this, the major parts of speech (Noun,
Pronoun, Adjective and Verb) are discussed from the point of view of
inflection. The inflection of other parts of speech is also analyzed.
‘Morphological Analysis’ is discussed theoretically.
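A minimal sketch of how one inflectional paradigm could be written as machine-processable rules is given below in Python. It covers only masculine nouns ending in -ा (direct singular -ा, direct plural and oblique singular -े, oblique plural -ों); the table layout, the two-word lexicon and the function names are assumptions of this sketch, and other noun classes would need their own tables.

```python
# Suffix table for masculine nouns ending in -ा (one noun class only).
PARADIGM = {
    "ा": [("direct", "singular")],
    "े": [("direct", "plural"), ("oblique", "singular")],
    "ों": [("oblique", "plural")],
}
LEMMAS = {"कमरा", "बेटा"}  # hypothetical mini-lexicon ("room", "son")

def analyze_noun(form):
    """Morph analysis: return (lemma, case, number) readings of a word form."""
    results = []
    for suffix, features in PARADIGM.items():
        if form.endswith(suffix):
            lemma = form[: -len(suffix)] + "ा"
            if lemma in LEMMAS:
                results.extend((lemma, case, number) for case, number in features)
    return results

def generate(lemma, case, number):
    """Morph generation: build the form of a lemma for a given case and number."""
    for suffix, features in PARADIGM.items():
        if (case, number) in features:
            return lemma[:-1] + suffix  # replace the final -ा of the lemma
    raise ValueError("no rule for this case/number")

print(analyze_noun("कमरे"))
print(generate("बेटा", "oblique", "plural"))
```

The form ending in -े receives two readings, which shows why morph analysis alone leaves ambiguity to be resolved at the sentence level.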
5.4 Grammatical Recognition of Text
The central processing of linguistic material (text or speech) starts after
‘Parts of Speech Tagging’. In this process some general issues arise, such as
Named Entity Recognition (NER), Date, Time and Currency Entity Recognition
(DTCER), Multiword Expressions and Ambiguity. The identification and
disambiguation rules that address these issues are the subject matter of this
section.
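Identification rules for dates and currency amounts can be sketched with regular expressions, as in the Python example below. The two patterns (numeric day/month/year dates, and amounts written with ₹ or followed by रुपये), the entity labels and the sample sentence are all assumptions chosen for illustration; real text would need many more patterns, including month names and time expressions.

```python
import re

# Illustrative patterns only: dd/mm/yyyy-style dates and simple rupee amounts.
DATE_RE = re.compile(r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})")
MONEY_RE = re.compile(r"₹\s*(\d+(?:,\d+)*)|(\d+(?:,\d+)*)\s*रुपये")

def find_entities(text):
    """Return (label, matched text, value) for each date or currency amount."""
    entities = []
    for m in DATE_RE.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        entities.append(("DATE", m.group(0), (day, month, year)))
    for m in MONEY_RE.finditer(text):
        amount = m.group(1) or m.group(2)  # whichever alternative matched
        entities.append(("CURRENCY", m.group(0), int(amount.replace(",", ""))))
    return entities

# Invented sample sentence: "On 15/08/1947 the ticket price was 500 rupees."
print(find_entities("15/08/1947 को टिकट का मूल्य 500 रुपये था।"))
```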
5.5 Phrase and Sentence Analysis
The ‘Sentence’ is the primary unit of language. It is the largest grammatical
unit and the minimal communicative unit of a language. Sentences are made from
phrases. To analyze the construction of the sentences of a language, we must
first study ‘Phrase Structures and Frames’. Phrases are joined in a sentence
according to ‘Case and Other Functional Categories’ to make it meaningful and
communicative, so all functional categories are discussed in this section. All
of these are part of ‘Sentence Frames’, whose rules are to be analyzed.
Analyzing sentences in frames is technically called ‘Parsing’. Parsing is the
key task of NLP for sentence analysis and is therefore described in this
section. Finally, from the point of view of application, ‘Grammar Checking’
belongs here. It is important for the computational grammar to show how a
Grammar Checker for Hindi can be developed.
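The idea of grouping tagged words into phrases and then matching the phrase sequence against a sentence frame can be sketched in Python as follows. The tag names, the phrase labels (NP for a noun phrase with an optional postposition, VG for a verb group), the single frame NP NP VG and the function names are hypothetical; this is a toy chunker, not the parsing method of the grammar itself.

```python
def chunk(tagged):
    """Group (word, tag) pairs into phrases: a noun with an optional following
    postposition becomes one NP chunk; a main verb becomes a VG chunk."""
    chunks, i = [], 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag in ("NN", "NNP", "PRP"):
            words = [word]
            # Attach a directly following postposition to the noun phrase.
            if i + 1 < len(tagged) and tagged[i + 1][1] == "PSP":
                words.append(tagged[i + 1][0])
                i += 1
            chunks.append(("NP", words))
        elif tag == "VM":
            chunks.append(("VG", [word]))
        else:
            chunks.append((tag, [word]))
        i += 1
    return chunks

def matches_frame(chunks, frame=("NP", "NP", "VG")):
    """Check whether the sequence of phrase labels equals a sentence frame."""
    return tuple(label for label, _ in chunks) == frame

# Hand-tagged input for "राम ने आम खाया" ("Ram ate a mango").
tagged = [("राम", "NNP"), ("ने", "PSP"), ("आम", "NN"), ("खाया", "VM")]
phrases = chunk(tagged)
print(phrases)
print(matches_frame(phrases))
```

A grammar checker could, under the same assumptions, flag a sentence whose phrase sequence matches none of the frames licensed by the grammar.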
6. Concluding Remarks
Thus, a Computational Grammar is a basic requirement for research and
development work in NLP. If we want to make Hindi a digital language in the
real sense, we must model and prepare a robust Computational Grammar of Hindi.
This grammar will make Hindi amenable to rule-based processing. The grammar
must be structured on a linguistic foundation, as it contains the leveled
rules for the various systems of the language, such as the phonemic, morphemic
and syntactic systems. It should also take into account the various levels of
NLP for preparing tools and software. If we prepare an adequate grammar of
this kind, there will be no obstacle to processing Hindi on digital platforms.
7. References
· Costa, Francisco and Branco, António. 2014. A Computational Grammar for
Deep Linguistic Processing of Portuguese. University of Lisbon.
· Gelbukh, Alexander (ed.). 2012. Computational Linguistics and Intelligent
Text Processing. Springer.
· Ritchie, Graeme D. 1980. Computational Grammar: An Artificial Intelligence
Approach to Linguistic Description. Branch Line Press.
· Singh, Surajbhan. 2017. A Syntactic Grammar of Hindi. New Delhi: Prabhat
Prakashan.