Keyword based Marathi Interface to the Database using Natural Language Processing

— Now a day‟s a computer plays a vital role in almost all the sectors/application. The available information is stored and retrieved in/from the database. Database Management system (such as SQL, Oracle) allows to handle the database by creation of the database, querying the database, updating database, and administration of databases. So there is need of skilled person (database experts) in order to deal with database management system, but this is not always true. The common people with lack of expertise are there to handle the database, who doesn‟t know about the syntax/format of queries to be fire on the database. Also they are more comfortable in their native languages like “Marathi”, “Hindi”, “Arabic”. It is essential to have the database interface in native language (e.g Marathi) so that the non-expert person can also interact with database for storing and extracting the data. In this paper we are proposing the Marathi language interface using Natural Language Processing (NLP) to handle the database. This interface can handle the queries in native language which may be syntactically and semantically incorrect and convert it into correct form.(e.g SQL query). processing , Section II contain the related work of natural language interface to the database system, Section III contain the problem definition and proposed system, Section IV contain the architecture and algorithm of proposed system, section V explain the working of proposed system with example, Section VI give conclusion with future direction.


I. INTRODUCTION
In this computer era, most of sectors like education, banking, government and medical become computerized. Information is stored in the database. Most all applications store and extract the data from database by using the Structured Query Language (SQL) language. The person who has knowledge of this SQL language can able to access the database. But people working in above mentioned sectors are might not be the expert in SQL language. So it"s important to use query in natural language to interact with the database. But queries in natural form not understand by the database. So need to focus on NLP concept to convert queries (in Natural language) into SQL. Using natural language interface person with no knowledge of SQL can also handle the database easily.
One of the most important applications of Natural Language Processing (NLP) is Natural Language Interface to the database (NLIDB). This interface allows the non-database expert to interact with the database in natural language.

Natural Language Processing:
Natural language processing is the branch of Artificial Intelligence (AI), which concern with computer can process the human like languages. Science goal of NLP is to understand how language produced and how it is understand. Also engineering goal of NLP is to reduce the man-machine interaction. The translation from English sentence to logical form involves slot filling, scope determination and parsing. One limitation is, while converting English sentence into logic, "pronouns" are not taken into account. After converting sentence into logical form, it is possible to answer the question due to precise semantics of logic.
Basically, Chat-80 do the augmentation of logical form of a query with extra control information, to make it efficient piece of Prolog program, which can be executed to answer the query.

NLKBIDB:
Axita Shah , et al [5] proposed new system of natural language and keyword based interface to agricultural database, which gives result for syntactically correct or incorrect queries. NLKBIDB is combined approach of NLIDB and KBIDB. NLIDB is more accurate than KBIDB for generation of SQL query based on natural language, but it can"t handle syntactically incorrect queries which are handled by KBIDB. System mainly consists of three agents i.e Natural Language Agent , Keyword Based Agent and Knowledge Base Agent along with SQL generator and SQL executor. Natural Language Agent perform lexical , syntactic and semantic analysis of query entered by user in natural form and generated tree is given to SQL generator. Keyword Based Agent do mapping of token generated from lexical analysis with knowledge. Knowledge Base Agent generated the knowledge base using representing the metadata in XML form. The SQL generator work according to rules of natural language or rules of keyword based interface.  [6] proposed new system work with the model natural language interface using database and designed to framework owl ontology. The author expresses the brief idea of relational database with ontology and use of word net dictionary. The paper gives the characteristics of frameworks.

Using Natural Language Processing in Order to
 Natural language and database interface is organized in such way that business model and conceptual model of database works with ontology.  The framework uses lexicons with the synonymy and antonym relations to integrate with word net. The framework maps the lexicons entered by the user with the domain lexicons.  This framework uses the Discourse Representation Structure (DRS), and Discourse representation Theory (DRT).
This structure uses template techniques to convert DRS to structured query language. To accomplish this, the format is determined by the SQL grammar and content is provided by DRS. With respect to above defined features the framework is divided into two modules. One module is language processing module and second module is database processing module. Language processing module is getting completed with different sub module as, semantic grammar, syntax parser, lexicon, semantic interpreter and most important is Ontology. Again this ontology is work with respect business model and database schema. Language processing module accepts input as natural language as input and it into tree representation and converts it into intermediate representation as DRS. Natural language query sentences are translated into DRS query sentences, and then translated into SQL query sentences.
Hindi Language GUI to DBMS: Mohit Dua, et al [7] implemented interface to database system in Hindi language. Architecture of the system mainly consists of three component i.e tokenizer, mapper and query generator. Tokenize separate the token from the user query which is in Hindi. Then mapper maps these tokens to the English tokens available in the lexicon along with type i.e command, function, condition_start, column_name, table_name etc.Query generator generate the SQL query according the mapping done. This system was support the various queries that include joining operator, logical operator, selection of multiple column etc.

Enhancing the Relevance of Information Retrieval by Querying the Database in Natural form:
Prof. Amisha Shingala, et al [8] has developed Student Interface System in Natural Language (English). The system can convert the user question in English form into SQL query. The English sentence is first converted into intermediate logic by semantic interpreter and then converted into SQL form to retrieve data from database.

III. PROBLEM DEFINITION
In most of the sectors today, there is use of application based software"s. These applications can store and retrieve information to and from database. For most of application relational database management system is used. A specialpurpose programming language designed for managing data held in a relational database management system (RDBMS) i.e SQL (structured query language). Anybody who interacts with database should be expert in SQL, so as to store and retrieve, update and delete the records from the database. That person, who knows the syntax of all queries in SQL, can handle the database properly. But everywhere it"s not possible. The person who operates the database is not always expert in SQL. Another important issue is, people are more comfortable in their local natural languages rather than database language i.e SQL. So it creates the problem while handling the database. And because of that they can"t use the application software properly.

Interface to database system:
One of the solutions to above mentioned problem is design and develops a software interface which can handle the queries in local natural language, which can be syntactically and semantically incorrect. By using this software interface, it is possible to convert the natural language query statements written in local language into required form i.e SQL, which can understand by the database. Because of this interface it is not necessary to have expert person to handle the database.
Two types of interface are allow the user to interact with the database i.e natural language interface to DB(NLIDB) and Keyword based interface to DB(KBIDB).

I) Natural language interface to DB(NLIDB):-
It consist of three main components that language component, intermediate representation language and database component.

a) Language Component:
Language component consist of,

Lexical Analyzer:
Query entered by user in natural form is given as input to the lexical analyzer. The Lexical analyzer is act as string tokenizer, which will generate tokens from the given query by considering space as delimiter excluding the stop words.

Syntax Analyzer:
Syntax analyzer, parse the given input according to the grammar and assign the correct parse tree to input query (sentence)

Semantic Analyzer:
Semantic analyzer, represent the meaning of linguistic input in formal structure like First Order Predicate Calculus (FOPC). This process is called as meaning representation. The meaning of the sentence is depends on the meaning of the parts of sentences i.e word. The meaning of a sentence is not based solely on the words that make it up, it is based on the ordering, grouping, and relations among the words in the sentence.

Discourse Analysis:
Processing of sequence of sentences is known as discourse and it requires discourse planning.

Pragmatics:
Study of how the context provides meaning is known as pragmatics. It"s main aim to modeling user intention.

b) intermediate representation language:
It"s an interface between the front end language component and database component. It is kind of knowledge base which is derived from the metadata of the database.

c) Database component: Database component uses this intermediate language to
convert the natural language query into SQL query.

II) Keyword based interface to DB (KBIDB):-
Keyword based interface to database involves two important steps i) Generation of symbol table (keyword table) in database which shows the equivalence between the word in natural language to word in English word. Basically it"s creation of lexicon, which stores the tokens in natural language and correspond English token along with types to indicate whether it is table name, column name, value…etc. ii) Preparation of SQL query from natural language query by mapping the keyword from natural query to keywords stored in the symbol table.

IV. PROPOSED WORK
In this paper we are focusing on Marathi language. Our proposed system can accepts the queries in Marathi language and convert that query into SQL format, which is executed against the database and finally result will be displayed to user. The system will handle SELECT, UPDATE and DELETE type of queries on the database.
The proposed work consist of four phases namely String tokenizer , Keyword Mapper, Query Generator and SQL Query Executor.

V. ARCHITECTURE OF PROPOSED WORK
The following figure shows the architecture of the proposed work.

Figure1: Architecture of Proposed System
Proposed system consist different module like string tokenizer, Keyword Mapper, Query Generator and SQL Query Executor.
User query in the natural form is given input to the string tokenizer. The String tokernizer split the sentence into different token. Words separated by the white space are known as tokens. These token are stored in one array. Token of sentence may indicate type of command, table name, column name, value, start_of_condition, condition, logical_operator which are the important parts of SQL query.
The token which was extracted from the sentence is then compared with lexicon dictionary which contain token (word) in Marathi language, it"s corresponding English word and type of the token indicating either that token is table_name, column_name, value, condition, start_of_condition, logical_operator or command of operation performed on the database.
The Keyword mapper compare the Marathi tokens with lexicon directory to extract the corresponding keyword of English language along with type. All this information given to the SQL query generator. SQL query generator is main module of the proposed system, which is responsible for actual query generation. Based on knowledge of SQL query and information given by the keyword mapper, SQL query is generated which is further executed by SQL query executor against the database to display the result to the end user.

VII. WORKING OF PROPOSED SYSTEM
Our proposed system can accept the queries in Marathi language and convert that query into SQL format. The system will handle SELECT, UPDATE and DELETE type of queries on the database. For testing purpose we have taken following assumption, i) Student Database is taken into consideration with following attributes, Name, Roll_No, Class, City, Mobile_No, Marks ii) It"s assume that we have the lexicon dictionary which contain the mapping of Marathi word to corresponding English token along with the types of token. The proposed system will work as follows,

a) Simple Select query for single column selection
First we consider a very simple query entered by user in Marathi language as follows,

Query Entered by User: sava-ivaVaqyaaMcaI naavao saaMga.
This query is given input to the string tokenizer ,which will separate the token from the query. In entered query total 4 token are there, generated token is stored in the array, which will given to Keyword mapper. The mapper search in the array for the command. Here in this example the word "saaMga" is the command which will map to "select" word in English. Also identified column_name1 is "Name" by mapping "naavao " to the English token and the Marathi word "ivaVaqyaaMcaI" will map to student.
The identified mapped words will given input to the query generator. Based on the knowledge base and output from string tokenizer mapper will generate the following SQL query, Corresponding SQL Query: "Select Name from Student"

b) Select query for multicolumn selection
We will consider the another example where user has entered query for selection of multiple column.
Total 6 token are identified in above entered query. In the given Marathi query mapper identify the different column names like Name, Roll_No and Class after mapping Marathi word to English token which will be stored in column_name1, column_name2 and column_name3, corresponds to Marathi word naavao , kmaaMk , vaga--.
Corresponding SQL Query: "Select Name, Roll_No, Class from Student" c) Select with where clause Let"s consider example with where clause Query Entered by User: sava-ivaVaqyaaMcaI naavao saaMga jyaaMcaa vaga--tRtIya vaYa-Aahooo In the query entered by the user total 9 token are there. The mapper identify the column name as "naavao" and store into variable column_name1 as name. Also it find lexicon with type "condition" as "jyaaMcaa" which will map to "where" clause. The mapper store column_names after condition into condition_column_name and remaining part into value array. It will also consider the logocal_operators.
The given query works similar like query with where clause. The word "sarasarI" is mapped to function AVG , "caaLIsa" will map to 40 and stored into condition_column_name and "jyaast" will mapped to ">" and stored into condition_array.
Corresponding SQL Query: "Select Name from Student where AVG(Marks)>40"

VIII. CONCLUSION
Keyword based Marathi interface to Database system (KBMIDB) is a system that accepts queries in Marathi language. User give Marathi query to the database interface which will converted into SQL query.
When user enters the query in Marathi, the sentence is parsed and all the tokens are stored in an array. Then mapping of these tokens will be done to find out table name, column name, conditions, commands and values. Then these tokens are converted into English word. From these words, SQL query is formulated. The system supports all type of selection, updating and deletion on database.