Friday, February 9, 2018

Modeling a Computational Grammar of Hindi

   आभ्यंतर (Aabhyantar)        ISSN : 2348-7771
   A forum for folk culture, language, world literature and contemporary thought

   Issue-06 (January-March 2018)
..........................................................................................................................................
Contents
1.       Modeling a Computational Grammar of Hindi (pp. 11-15)
Dr. Dhanji Prasad : Asst. Prof., Language Technology, MGAHV, Wardha
Academic Coordinator : SCONLI-12
..........................................................................................................................................
Modeling a Computational Grammar of Hindi
Dr. Dhanji Prasad
Assistant Professor, Linguistics and Language Technology
MGAHV, Wardha
Summary
A grammar is a collection of the rules and linguistic units of a language; it makes language learning easier and solves problems of word and sentence formation. These, however, are the classical uses of grammar, which is meant for humans. In the current scenario the computer has emerged as a major component in all areas of human activity, and research has changed our way of looking at grammar. Efforts are under way to build a computational model of human language knowledge. These efforts are collectively called Natural Language Processing (NLP). NLP has various uses and applied areas. In this research paper these areas are categorized into four levels: Pre-NLP, Primary NLP, Central NLP and Advanced NLP.
Hindi is the official and the most widely spoken and understood language of India. A very large amount of literary and scientific data is available in it in the form of text and multimedia. Besides this, researchers are trying to build a computational model of Hindi grammar so that the potential of the computer can be used in Hindi NLP applications. For this purpose we need a robust rule-based grammar for Hindi, which may be called the Computational Grammar of Hindi. In this research paper I present a sample model for preparing this computational grammar.

Keywords : Computational grammar, Pre-NLP, Primary NLP, Central NLP, Advanced NLP, Hindi language.
1.0 Introduction
A grammar is a systematic representation of the structure of a language. It describes the phonological, morphological and syntactic rules of the language. It is an authoritative source for learning the language and for describing how its words and sentences are formed. The form of a grammar may change with time, with the society that speaks the language, or even with the goal of the grammarian. Generally, the grammar of a natural language is prepared for humans. Such grammars can be divided into two broad categories according to the user: (1) mother-tongue speakers and (2) second-language or foreign-language learners. There are different traditions of writing grammars for these users.
The invention of the computer was the revolution of the 20th century. Now, in the 21st century, computers are part of our day-to-day life. Today no one can imagine life without computers (or digital machines such as smartphones). Language is the fundamental medium of our thinking and social behavior. Thus, if we want to fully incorporate the computer into our lives, we have to process natural languages on the computer. Many researchers and institutions are trying to process natural languages by various techniques, but many problems remain. This is why no one can deny that a comprehensive computer-oriented grammar of a language is necessary to complete this task. Many researchers are developing computational grammars and grammatical frameworks for languages such as English. This research paper proposes a format for modeling a Computational Grammar of Hindi.
2.0 Related Works
Very little work is found on a Computational Grammar of Hindi. No computational grammar has been made for Hindi to date, but some scholars have composed grammars in this direction. An important work is ‘A Syntactic Grammar of Hindi’ (2012) by Surajbhan Singh, in which the author defines the phrase structures and sentence frames of Hindi very systematically. Turning to other languages, some work on this topic exists for Sinhala: a research paper named ‘A Computational Grammar of Sinhala’ was published as a chapter in ‘Computational Linguistics and Intelligent Text Processing’ (2012). ‘A Computational Grammar for Deep Linguistic Processing of Portuguese’ (2014) is a project report submitted to the Department of Informatics, University of Lisbon; it describes parsing rules for Portuguese. dejuliaH (2012) discusses building a computational grammar for Esperanto.
Returning to the Indian languages, ‘Gb-Based Computational Grammar for Punjabi: A Machine Translation Perspective’ (1997) is a thesis submitted to the School of Computer and Systems Sciences, Jawaharlal Nehru University, by Paramjit Singh. A Computational Grammar for Hindi is therefore a requirement of the present time.
3.0 Computational Grammar : A Need for NLP
Natural Language Processing (NLP) is a branch of computer science that tries to establish the structural knowledge of human languages in the machine (computer). It aims to use this knowledge in application areas ranging from Machine Translation (MT), Optical Character Recognition (OCR), Information Retrieval (IR), Transliteration and Computer-assisted Language Teaching/Learning (CALT/CALL) to Artificial Intelligence. Both forms of language use, spoken and written, are processed in NLP. The processing of the spoken form is called ‘Speech Processing’ and the processing of the written form is called ‘Text Processing’. Many tasks are performed in NLP, and through these tasks different types of tools or software are developed. According to the nature of these tools or software, NLP may be categorized into four stages:
·        Pre-NLP : This is a stage before NLP proper, where tasks related to Programming, Database Management, Font Designing, Font Conversion etc. are performed. Work on these is not itself NLP, and no computational grammar is needed for these tasks.
·        Primary-NLP : In NLP the systematic knowledge of a human language is established in the machine. This knowledge must relate to the various levels of language analysis, such as phoneme, morpheme, word, phrase, clause, sentence and semantics. The basic work of NLP starts with Morph Analysis or Morph Generation. This relates to the morpheme and the word, and is concerned with a lexicon and some morphemic rules of inflection. Thus, works such as ‘Building a Computational Lexicon, Punctuation Marks Recognition and Processing, Transliteration System, Morph Analysis, Morph Generation, Building a Spell Checker, Date, Time and Currency Recognition’ may be categorized as primary NLP. A word-level computational grammar is required for this purpose.
·        Central-NLP : This is the heart of NLP. The main objective of NLP is to have machines process the sentences (and their meaning) of human languages. The main tasks in this regard are ‘Parts of Speech Tagging, Phrase Marking, Building a Grammar Checker, Automatic Sentence Generation and Parsing’. A robust and explicit computational grammar is required to perform these tasks. The tools developed in primary NLP are also used in developing the software related to these areas.
·        Advanced-NLP : Central NLP is the main NLP. The tools developed in it are used in applied areas such as Machine Translation (MT), Optical Character Recognition (OCR), Information Retrieval (IR) and Artificial Intelligence. But when developers try to perform these tasks and to develop systems, they start to face many problems arising from the complexity of human language. These problems are called NLP challenges. Structural and semantic Ambiguity, Named Entity Recognition (NER), Multi-word Expressions (MWE) and Discourse References are among the most common challenges. A computational grammar must provide rules to resolve these problems to the extent possible.
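The four levels above can be summarized as a small lookup table. The following is only an illustrative sketch in Python: the names NLP_LEVELS and grammar_needed_for are hypothetical, and the task lists and grammar requirements are abbreviated from the descriptions above.

```python
# Illustrative only: the four levels and the kind of grammar each one needs,
# abbreviated from the list above. All names here are hypothetical.
NLP_LEVELS = {
    "Pre-NLP": {
        "tasks": ["Programming", "Database Management",
                  "Font Designing", "Font Conversion"],
        "grammar": None,  # no computational grammar is needed
    },
    "Primary NLP": {
        "tasks": ["Computational Lexicon", "Transliteration", "Morph Analysis",
                  "Morph Generation", "Spell Checker"],
        "grammar": "word-level grammar",
    },
    "Central NLP": {
        "tasks": ["Parts of Speech Tagging", "Phrase Marking",
                  "Grammar Checker", "Sentence Generation", "Parsing"],
        "grammar": "robust and explicit grammar",
    },
    "Advanced NLP": {
        "tasks": ["Machine Translation", "OCR",
                  "Information Retrieval", "Artificial Intelligence"],
        "grammar": "rules for resolving ambiguity, NER, MWE, discourse references",
    },
}


def grammar_needed_for(task):
    """Return (level, grammar requirement) for a task name, or None if unknown."""
    for level, info in NLP_LEVELS.items():
        if task in info["tasks"]:
            return level, info["grammar"]
    return None
```

For example, grammar_needed_for("Parsing") returns the Central NLP level together with its grammar requirement, while a Pre-NLP task such as "Font Conversion" returns None as its requirement.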
4.0 Why a Computational Grammar for Hindi
Hindi is the most spoken language of India, and the constitution has also given it the status of official language of India. Most scholars consider it the third most spoken language in the world. Hindi is used as the vernacular or lingua franca of the nation. It has a rich literary tradition in all forms. Today many technological developments have also taken place through Hindi; that is, Hindi has kept pace on the platform of technology. A great deal of material from all branches of knowledge is currently available online through various websites, blogs etc. One of the most popular websites for material on Hindi literature and language is ‘http://www.hindisamay.com/’, established and developed by Mahatma Gandhi Antararaashtriya Hindi Vishvavidyalaya, Wardha. Anyone can find material on Hindi literature and language there, in about half a million pages.
Besides all the above, many software applications and tools have also been developed for Hindi. This is why, if we talk about India from a linguistic perspective, we must consider Hindi. It should also be considered that Hindi is the successor of the most classical language, Sanskrit, which had the richest tradition of grammarians. Panini’s ‘Ashtadhyaayi’ was described as ‘one of the greatest monuments of human intelligence’ by the famous American linguist L. Bloomfield in his renowned book Language (1933). Thus we can ask, ‘Why is there no such grammar for Hindi from today’s perspective?’ The answer is a ‘Computational Grammar’, which must be written.
5.0 Design of a Computational Grammar of Hindi
The Computational Grammar of Hindi should be designed in such a way that the rules described in it can be processed by machines. It is a grammar for machines, written against the background of linguistics. It may contain different sections on the various levels and processes of the language. The author of this article has tried to compose a Computational Grammar of Hindi. It is an initiative; anyone may compose a computational grammar. Here I would like to introduce some major aspects of the grammar. It is divided into the following sections:
5.1 Character and Lexicon
In this section, Hindi Script, Font and Unicode are introduced first. After this, ‘Spelling Checking, Transliteration, Punctuation Marks, Special Characters, Logograms Processing’ and ‘Computational Lexicon and Word Categories’ are described.
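As an illustration of character-level processing, the sketch below identifies Devanagari characters by their Unicode block (U+0900 to U+097F) and separates punctuation marks such as the danda (।) from words. It is a minimal sketch, not the method of the grammar itself; the function names are hypothetical.

```python
import re

# Devanagari block in Unicode: U+0900 to U+097F.
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F

# A word is a run of Devanagari characters other than the danda (U+0964)
# and double danda (U+0965), or a run of ASCII letters and digits.
# Any other non-space character becomes a token of its own.
TOKEN_RE = re.compile(r"[\u0900-\u0963\u0966-\u097F]+|[A-Za-z0-9]+|\S")


def is_devanagari(ch):
    """True if the character lies in the Devanagari Unicode block."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


def script_profile(text):
    """Count Devanagari and other non-space characters in a string."""
    counts = {"devanagari": 0, "other": 0}
    for ch in text:
        if ch.isspace():
            continue
        counts["devanagari" if is_devanagari(ch) else "other"] += 1
    return counts


def tokenize(text):
    """Split text into words and single punctuation marks."""
    return TOKEN_RE.findall(text)
```

Note that vowel signs (matras) are separate code points, so a count of Devanagari characters is a count of code points, not of written syllables.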
5.2 Derivation/ Word Formation
The derivational part of a language is the most important for understanding the internal construction of words. This section contains Affixation (prefixes and suffixes), Sandhi (for understanding morphophonemic changes) and Compounding.
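Affixation can be sketched as a lexicon-driven segmentation of a word into prefix, stem and suffix. The example below is only an illustration: the affix lists and the lexicon are tiny hypothetical samples, and Sandhi changes at morpheme boundaries are not handled.

```python
# Hypothetical sample data; a real grammar would list many more entries.
PREFIXES = ["अ", "सु"]
SUFFIXES = ["ता"]
LEXICON = {"ज्ञान", "सुंदर", "पुत्र"}


def segment(word):
    """Return every (prefix, stem, suffix) split whose stem is in LEXICON.

    Only plain concatenation is considered; Sandhi is not handled.
    """
    analyses = []
    for pre in [""] + PREFIXES:
        if not word.startswith(pre):
            continue
        rest = word[len(pre):]
        for suf in [""] + SUFFIXES:
            if suf and not rest.endswith(suf):
                continue
            stem = rest[:len(rest) - len(suf)] if suf else rest
            if stem in LEXICON:
                analyses.append((pre, stem, suf))
    return analyses
```

The lexicon check matters: the word सुंदरता begins with the same two code points as the prefix सु, but the remainder after stripping them is not a listed stem, so only the split सुंदर + ता is returned.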
5.3 Inflection and Morphological Analysis
Inflection is the change in words that makes them usable in meaningful sentences. ‘Grammatical Categories’ play the key role in inflection, so they are described in this section. After this, the major parts of speech (Noun, Pronoun, Adjective and Verb) are discussed from the point of view of inflection. Some other parts of speech and their inflection are also analyzed. ‘Morphological Analysis’ is discussed as a theoretical subject.
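A rule of inflection can be written so that the same rule serves both Morph Generation and Morph Analysis. The sketch below covers one paradigm only, masculine nouns ending in -ā (ा) such as कमरा and बेटा; it is an assumption-laden illustration, since not every noun with this ending follows the paradigm, and the function names are hypothetical.

```python
AA = "\u093E"             # vowel sign aa (ा)
E = "\u0947"              # vowel sign e (े)
O_NASAL = "\u094B\u0902"  # vowel sign o + anusvara (ों)


def inflect_masc_aa(noun):
    """Generate number/case forms of a masculine noun ending in -aa.

    Assumption: the noun follows the regular paradigm (e.g. कमरा, बेटा).
    """
    if not noun.endswith(AA):
        raise ValueError("expected a noun ending in the vowel sign aa")
    stem = noun[:-1]
    return {
        ("singular", "direct"): noun,
        ("singular", "oblique"): stem + E,
        ("plural", "direct"): stem + E,
        ("plural", "oblique"): stem + O_NASAL,
    }


def analyse(form, lexicon):
    """Return all (lemma, number, case) readings of a form, by generation."""
    readings = []
    for lemma in lexicon:
        for (number, case), generated in inflect_masc_aa(lemma).items():
            if generated == form:
                readings.append((lemma, number, case))
    return readings
```

Analysing the form कमरे against a lexicon containing कमरा gives two readings (singular oblique and plural direct), which shows why morphological analysis alone cannot always decide the grammatical categories of a word.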
5.4 Grammatical Recognition of Text
The central processing of linguistic material (text or speech) starts after ‘Parts of Speech Tagging’. In this process some general issues arise, such as Named Entity Recognition (NER), Date, Time and Currency Entity Recognition (DTCER), Multiword Expressions and Ambiguity. The identification and disambiguation rules that serve this purpose are the subject matter of this section.
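Identification rules for dates, times and currency amounts can be expressed as patterns. The following is a minimal sketch using Python regular expressions; the patterns, the month spellings and the name find_entities are assumptions made for illustration and cover only a few common written forms.

```python
import re

# Sample month spellings; variant spellings are not covered.
MONTHS = ["जनवरी", "फरवरी", "मार्च", "अप्रैल", "मई", "जून",
          "जुलाई", "अगस्त", "सितंबर", "अक्टूबर", "नवंबर", "दिसंबर"]

# (label, pattern) rules. Lookarounds on digits keep a match from starting
# or ending in the middle of a longer number.
PATTERNS = [
    ("DATE", re.compile(r"(?<!\d)\d{1,2}[/-]\d{1,2}[/-]\d{4}(?!\d)")),
    ("DATE", re.compile(r"(?<!\d)\d{1,2} (?:" + "|".join(MONTHS) + r") \d{4}(?!\d)")),
    ("TIME", re.compile(r"(?<!\d)\d{1,2}:\d{2}(?!\d)")),
    ("CURRENCY", re.compile(r"₹ ?\d+(?:\.\d+)?|(?<!\d)\d+(?:\.\d+)? रुपये")),
]


def find_entities(text):
    """Return (label, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    return [(label, matched) for _, label, matched in sorted(found)]
```

Because \d in a Python string pattern matches any Unicode decimal digit, the same rules also accept amounts written with Devanagari digits.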
5.5 Phrase and Sentence Analysis
‘Sentence’ is the primary unit of language. It is the largest grammatical unit and the minimal communicative unit of a language. Sentences are made from phrases. To analyze the construction of the sentences of a language we must first study ‘Phrase Structures and Frames’. Phrases are joined in a sentence according to ‘Case and Other Functional Categories’ to make it meaningful and communicative. All functional categories are therefore part of the discussion in this section. All these are part of ‘Sentence Frames’, whose rules are to be analyzed. Analyzing sentences in frames is technically called ‘Parsing’. Parsing is the key task of NLP for sentence analysis, so it is described in this section. Finally, from the point of view of application, ‘Grammar Checking’ belongs here. It is important for the computational grammar to show how a Grammar Checker for Hindi can be developed.
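The relation between phrases and sentence frames can be sketched as follows. The example assumes that the input is already tagged for parts of speech; the tag names (N, ADJ, PSP for postposition, V), the chunk labels and the three frames are hypothetical samples, not the frame inventory of the grammar.

```python
# Hypothetical sample frames: sequences of chunk labels accepted as sentences.
SENTENCE_FRAMES = {
    ("NP", "VG"),
    ("NP", "PP", "VG"),
    ("NP", "NP", "VG"),
}


def chunk(tagged):
    """Group (word, tag) pairs into NP, PP and VG chunks, left to right.

    NP = ADJ* N, PP = NP followed by a postposition, VG = one or more V.
    """
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        word, tag = tagged[i]
        if tag in ("ADJ", "N"):
            j = i
            while j < n and tagged[j][1] == "ADJ":
                j += 1
            if j < n and tagged[j][1] == "N":
                j += 1
                label = "NP"
                if j < n and tagged[j][1] == "PSP":
                    j += 1
                    label = "PP"
                chunks.append((label, [w for w, _ in tagged[i:j]]))
                i = j
                continue
        if tag == "V":
            j = i
            while j < n and tagged[j][1] == "V":
                j += 1
            chunks.append(("VG", [w for w, _ in tagged[i:j]]))
            i = j
            continue
        chunks.append((tag, [word]))  # anything else stays unchunked
        i += 1
    return chunks


def fits_frame(tagged):
    """True if the chunk sequence matches one of the sample sentence frames."""
    return tuple(label for label, _ in chunk(tagged)) in SENTENCE_FRAMES
```

For the tagged sentence राम/N घर/N में/PSP है/V, the chunks are NP, PP and VG, which matches a frame; the same words in the order में घर राम है match none, which is the kind of decision a grammar checker builds on.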
6.0 Concluding Remarks
Computational Grammar is thus a basic requirement for research and work in NLP. If we want to make Hindi a digital language in the real sense, we must model and prepare a robust Computational Grammar of Hindi. This grammar will make Hindi suitable for rule-based processing. The grammar must be structured on a linguistic background, as it contains leveled rules for the various systems of the language, such as the phonemic, morphemic and syntactic systems. It should also consider the various levels of NLP in order to support the preparation of tools and software. If we prepare an adequate grammar of this kind, there will be no obstacle to processing Hindi on the digital platform.
7.0 References
·        Costa, Francisco and Branco, António. 2014. A Computational Grammar for Deep Linguistic Processing of Portuguese. University of Lisbon.
·        Gelbukh, Alexander (Editor). 2012. Computational Linguistics and Intelligent Text Processing. Springer.
·        Ritchie, Graeme D. 1980. Computational Grammar: An Artificial Intelligence Approach to Linguistic Description. Branch Line Press.
·        Singh, Surajbhan. 2017. A Syntactic Grammar of Hindi. New Delhi: Prabhat Prakashan.
