An XML Approach of Coding aMorphological Database for Arabic Language

We present an XML approach for the production of an Arabic morphological database for Arabic language that will be used in morphological analysis for modern standard Arabic (MSA). Optimizing the production, maintenance, and extension of morphological database is one of the crucial aspects impacting natural language processing (NLP). For Arabic language, producing a morphological database is not an easy task, because this it has some particularities such as the phenomena of agglutination and a lot of morphological ambiguity phenomenon. The method presented can be exploited by NLP applications such as syntactic analysis, semantic analysis, information retrieval, and orthographical correction.


Introduction
Currently, Arabic faces many challenges due to a lot of reasons such as the increase of the Arabic web sites, Arabic media, and Arabic companies around the world using the Arabic language.For these reasons, a lot of research in the domain has been developed to satisfy the increasing demand of the applications using Arabic.Arabic morphology is one of the essential needs in this domain, and lots of morphological analyzers are available now, some of them have a commercial purpose, and the others are available for research and evaluation as discussed by Attia [1].The development of a morphological analyzer requires the production of a morphological database, and many approaches has been developed.In morphological analysis, Buckwalter Arabic morphological analyzer and Xerox Arabic finite state morphology are two of the best known morphological analyzers for MSA, and they are available and well documented.Concerning the production of Arabic lexicon resource, the LMF approach is became as one of the most used for representing the lexicon resource.
The morphological analysis of Arabic is interested, as of other languages, in the structure of the word as discussed by El-Sadany and Hashish [2].But being given the wealth of the Arabic word's structure and the problem of agglutination, the operation becomes more complex than in the other languages.We also note that diacritics are particularities for our language, and they are also considered as another source of difficulty for morphological analysis.For all these reasons seen so far, Arabic is conceived as one of the languages that present a big problem in the morphological analysis and make this process very complicated.
In this paper, we will be presenting a morphological analyzer based on morphological automaton developed using a new approach for the production of a morphological database.To develop the morphological analyzer and product the morphological database, we used the particularities of Arabic that is concretized on multilevel: verbs and nouns are also characterized by a specific representation named the matrix "root scheme".This representation will help us construct a morphological automaton for the Arabic language.We note that we have used an innovative language (XMODEL) to represent the Arabic morphological knowledge.The use of this new language helps us to reduce the number of the entries in the lexicon.It also makes our system very flexible and one of the best existing morphological analyzers for the Arabic language.
The structure of the paper is as follows.First, in this introduction, we discuss some challenges of Arabic language Advances in Human-Computer Interaction and the importance of the production of morphological database and morphological analyzers in natural language processing.After that, we present some morphological analyzers for Arabic language related to our work in Section 2.Then, in Section 3, we explain our approach for the choice of the linguistic resource and compared it with the LMF approach.In Section 4, we present our Arabic Morphological Analyzer.In Section 5, we evaluate the proposed technique.In Section 6, we discuss the obtained results.Finally, in Section 7, we draw some conclusions and future works to be done.

Works in the Domain
Morphological processing involves two different tasks according to the operation type: generation and analysis.In generation, we produce correct forms using given morphemes, while in analysis, we try to identify morphemes for a given word.A lot of research has been done in the development of morphological analyzers for Arabic; some of them are available for research and evaluation, while the rest have a commercial purpose.Analyzer (2004).This analyzer is considered as one of the most referenced in the literature, well documented and available for evaluation.It is also used by linguistic data consortium (LDC) for POS tagging of Arabic texts, Penn Arabic Treebank, and the Prague Arabic Dependency Treebank as discussed by Atwell et al. [3].It takes the stem as the base form and root information is provided.This analyzer contains over 77800 stem entries which represent 45000 lexical items.However, the number of lexical items and stems makes the lexicon voluminous, and as a result, the process of analyzing an Arabic text becomes long.

Xerox Arabic Morphological Analysis and Generation.
Xerox Arabic morphological Analyzer is well known in the literature and available for evaluation and well documented.This analyzer is constructed using finite state technology (FST) as discussed by Beesley [4,5].It adopts the root and pattern approach.Besides this, it includes 4930 roots and 400 patterns, effectively generating 90000 stems.The advantages of this analyzer are, on the one hand, the ability of a large coverage.On the other hand, it is based on rules and also provides an English glossary for each word.But the system fails because of some problems such as the overgeneration in word derivation, production words that do not exist in the traditional Arabic dictionaries as discussed by Darwish [6], and we can consider the volume of the lexicon as another disadvantage of this analyzer which could affect the analysis process.

ElixirFM: An Arabic Morphological Analyzer by Otakar
Smrz.ElixirFM is an online Arabic morphological analyzer for modern written Arabic developed by Otakar Smrz available for evaluation and well documented.This morphological analyzer is written in Haskell, while the interfaces in Perl.ElixirFM is inspired by the methodology of functional morphology (Forsberg and Ranta [7]) and initially relied on the reprocessed Buckwalter lexicon as discussed by Buckwalter [8].It contains two main components: a multipurpose programming library and a linguistically morphological lexicon as discussed by Smrž [9].The advantage of this analyzer is that it gives to the user four different modes of operation (resolve, inflect, derive and lookup) for analyzing an Arabic word or text.But the system has limited coverage, because it analyzes only words in the modern written arabic.

Linguistic Resource
So as to develop a morphological analyzer of the Arabic language, representing the morphological knowledge becomes very crucial.Besides this, it is viewed as one of the central problems of the automatic processing of the Arabic morphology.
According to some works, in order to represent the morphological knowledge of the Arabic language, they have chosen to use the database concept as a basic support to store the morphological information as discussed elsewhere [5,10].To seek any information, they make use of requests consulting.Unfortunately, this method remains very limited to this type of challenges.Consequently, they do not give good results.
A second method of representation, which is widely used, is offered by the artificial intelligence.Accordingly, a morphological analyzer serves as an intelligent system able to infer the morphological nature of the analysed sentence from a certain knowledge-base which consists of data and morphological rules as discussed by Shaalan [11].However, the artificial intelligence language is criticized for its being general and sequential search of information.The choice of the Lisp or Prolog language as a support of representing the morphological knowledge may not probably be the right option.This is due to the fact that the interpreter is not well adapted to this kind of problems.
A third method which will be used in the future works for representing, designing, and implementing the lexical resource is the method using lexical markup framework (LMF).It was used in lots of languages (Indo-European), but for Arabic language, this method still in progress towards a standard for representing the Arabic linguistic resource.In the next paragraph, we will present our approach to represent the linguistic resource for Arabic, and we finish by giving a comparison between our approach and the LMF approach for representing the linguistic resource.
enables us to represent the whole components, properties, and morphological rules with a very optimal way.To clarify the last point, we note that our morphological database contains 960 lexicon items (morphological components) and 455 morphological rules to be applied to these morphological components which present a remarkable reduction in the number of entries in the lexicon compared to the existing systems (Xerox and Buckwalter).This representation helps us achieve the following goals: (i) a symbolic definition, declarative and, therefore, progressive of the Arabic morphology, (ii) a morphological database independent of processing that will be applied (see later), (iii) a considerable reduction of the number of morphological entries, (iv) the notion of scheme enables us to define the maximum morphological components by means of XMODEL language.
Our language makes it possible for us to represent the Arabic morphology as morphological classes and rules.Accordingly, our Arabic morphological database will be composed of three main parties: morphological classes, morphological properties, and morphological rules.Now, let us first introduce the XMODEL language which permits to represent the morphological knowledge of Arabic and consists of the following three main parties.

Morphological Component Class.
It enables us represent all morphological components of the Arabic language.It also permits to gather a set of morphological components having the same nature, the same morphological characteristics, and the same semantic actions.Relying on the notion of scheme /ealwazn/ ( ), this class allows us a better optimization, hence a considerable reduction of morphological entries.By doing so, we need not represent all the language items but only their schemes.We note that our lexicon <?xml version="1.0"encoding="ISO-8859-1"?> <package name="PropertyPackage"> <morphological_properties> <property name="Person" type="exclusive"> <descriptor name="Pr1"/> <descriptor name="Pr2"/> <descriptor name="Pr3"/> </property> <property name="Gender " type="additive"> <descriptor name="GFe"/> <descriptor name="GMa"/> </property> </morphological_properties> </package> contains 960 items (morphological components) which is a remarkable reduction in the number of the items compared to the other dictionaries (Figure 1).
There are two strategies to characterize the morphological components using the following properties.

Property of Components. A morphological class can use a list of morphological descriptors to define its components.
Generally speaking, each morphological component can have its own morphological descriptors.As for the "gender" property, some components of this class can be masculine, while the others can be feminine.This type of properties is named the property of components.In order to put them into practice, we have introduced the "uses" tag.This means that the different morphological descriptors defined by the property of components can be used by the different morphological components of the morphological class (Figure 3).

Property of Classes.
This one requires assigning a set of morphological components to the common morphological properties.For example, all components are masculine names.This type of property is known as property of classes.To realize this, we introduce the "is" tag (Figure 4).In the above example, all the schemes are singular components and masculine gender.It becomes evident to mention that the same class of the morphological components can use one combination of the tags "uses" and "is".

Property of Reference.
Another strong point of the XMODEL language is the introducing of the notion of property of reference which has an important role to benefit from the specificities of the Arabic morphology.As for the Arabic language, some morphological components might be conjugated forms of other components which we call original components.An example of this is the case of the following components "afcalu", "afcilu ", and "afculu".These components are all conjugated forms of the component "facala".We have specified this link of reference between the components using the "ref " tag (Figures 5 and 6).

Morphological Rules
Class.Firstly, it should be noted that we developed 455 morphological rules for the Arabic language.They help us combine some morphological components (morphemes) together to generate correct language words.They use the different morphological components classes as well as the morphological properties classes.The morphological rules classes allow us the possibility to add new morphological descriptors which do not belong to the union of morphological descriptors of components of rules.As a result, they are considered as a generator of language words.The implementation of the morphological rules class permits to put into practice all the possible concatenations between components (Figure 7).In the above example, this rule permits to generate the components which begin by the prefix "la" and the prefix "bi".
The structuring of our morphological database using XMODEL language allows us to generate the morphological automaton of the Arabic language.In the next section, we will be dealing with the notion of morphological automaton.

Comparison with LMF Approach.
Lexical markup framework (LMF, ISO-24613) is the ISO standard which provides a common standardized framework for the construction of natural language processing lexicons.The US delegation is the first which started the work on LMF in 2003.In early 2004, the ISO/TC37 committee decided to form a common ISO project with Nicoletta Calzolari (Italy) as convenor and Gil Francopoulo (France) and Monte George (US) as editors.The first step in developing LMF was to design an overall framework based on the general features of existing lexicons and to develop a consistent terminology to describe the components of those lexicons.The next step was the actual design of a comprehensive model that best represented all of the lexicons in detail.A large panel of 60 experts contributed a wide range of requirements for LMF that covered many types of NLP lexicons.The editors of LMF worked closely with the panel of experts to identify the best solutions and reach a consensus on the design of LMF.Special attention was paid to the morphology in order to provide powerful mechanisms for handling problems in several languages that were known as difficult to handle.
The aims of LMF are to provide a common model for the creation and use of lexical resources at all levels, to manage the exchange of data between and among these resources, and to enable the merging of large number of individual electronic resources to form extensive global electronic resources.It allows instantiation of monolingual, bilingual, or multilingual lexical resources and works at a small scale or large scale.Finally, it tries to cover all natural languages (including languages with rich and complex morphology such as Arabic).
For Arabic language, this new method is still in progress, and some works try to standardize it as discussed elsewhere [12][13][14].As we discussed before, LMF covered all levels of linguistic description (morphologic, syntactic, and semantic).To compare this method with our approach, we will be limited to the morphological level.
The morphological extension model allows representing the morphological information of lexicons.Given the inflectional and derivational aspect of Arabic, they have used a package of classes defined in the morphological extension of LMF to describe the scheme and the written lemmatized form and the root and the properties of inflected forms as discussed by Baccar et al. [12].They defined two strategies to describe the morphology of a word.The first one is to represent explicitly all inflected forms, and the second strategy is to use an inflectional paradigm as discussed by Francopoulo and George [13].Figure 8 shows an example of representing the lexical entry "kataba" ( ).
Related to the example above and the description of our approach seen before, the main remark that we make when we compare the two approaches is concerning the representation of any Arabic word.Our approach permits a remarkable reduction of the lexical entries in the morphological database, because the main idea of our approach is to represent Arabic words by their schemes and not by their roots and schemes as we saw in the LMF model.So, this advantage will affect immediately the size of the morphological database and which was considered as one of the problems of morphological analysis.The future works which will be done on the construction of Arabic lexicon using LMF will prove this remark.

System Description
In this part, we describe the Arabic morphological analyzer.This latter is based on using morphological automaton technology.The implementation of each morphological analyzer for any language needs a main resource.This resource is the morphological database, so the first task is the conception and the realization of a morphological database.We also used a new language, XMODEL language, to create this database.After that, the second task is the development of a set of morphological automatons for Arabic language each of which represents a very definite category of morphology.

Arabic Morphological Automaton.
To implement a morphological analyzer for any language, especially Arabic language, the use of morphological automaton is considered among the most efficient methods.It can be used for both analysis and generation.This latter is based on the notion of finite state automaton (DFA).A word is accepted by the morphological automaton if it belongs to a correct word in Arabic and rejected in the contrary case.Generally, the Arabic morphological automaton will have the following features.
(i) Q is a finite set of states of the control unit which represents the states of a morphological automaton.
(ii) is a finite input tape alphabet symbols.Concerning morphological automaton for Arabic, it is constituted of the alphabets of Arabic language.
(iii) q 0 is the start state of the morphological automaton.
It is constituted of only one start state in the case of a morphological automaton.
(iv) F is a subset of Q.It also represents the accepting states of the morphological automaton for Arabic.This latter has a very important role, because it permits to give us a set of information of the Arabic words analyzed.This information is called the morphological descriptors and they also characterize these words.
(v) The set τ also represents the transition function of the morphological automaton.
Consequently, the building of the morphological automaton of the Arabic language needs to use the XMODEL database discussed before.We have to extract all the morphological rules from this database and construct a morphological automaton of each rule.So, to realize that constructing, we have to use some automaton operations such as concatenation and union operation.Let us clarify how we can use these two operations to generate a morphological automaton for a definite morphological rule.So, we consider the following rule: <?xml version="1.0"encoding="ISO-8859-1"?> <package name="RulesPackage"> <rules_class name="CardNbCRules"> <rule id="rule_1"> <morpheme key="CardNumber.CNAccepteSCID.CNAccepteSC"/> <morpheme key="CasSuffixe.SCD " component="u"/> <idp name="CNADefMarfUc"/> </rule> ... </rules_class> </package> ---So, to generate the morphological automaton which represent this rule, we have to use the operation of concatenation to concatenate the first morpheme (key = "CardNumber.CNAccepteSCID.CNAccepteSC") with the second one (key = "CasSuffixe.SCD" component = "u").Therefore, the morphological automaton that represents this morphological rule is the following shown in Figure 9.
In the following paragraphs, we present a detail of how to construct all the morphological automatons of the Arabic language and technique used in this constructing.
So as to build a morphological automaton of the Arabic language, we have classified words of the Arabic language in to two categories: the first category is that which submits to the derivation process, while the second one does not.This process of derivation is generated by a set of morphological rules known in the Arabic grammar under the name "qawAcidu eaSSarfi" / /.They repose on the manipulation of a set of very determined schemes named "ealeawzAn" / /.
A scheme "ealwazn" / / is an abstract linguistics term that represents a family of varied derived words; these words might be verbs or derived nouns which share the same linguistic features as discussed by Tahir et al. [15]).At the graphical level, a scheme generally constitutes of the following.
(i) three main consonants that are represented by the letters "f " / /, "c" / /, and "l" / / with a possibility to duplicate the last letter "l" as in the case of schemes that correspond to a four letters root like "faclala'' / /, (ii) some consonants that serve as tools to extend the root like "stafcala" / / and "tafAcala" / /, (iii) a group of adequate vowels.
We have grouped in the first category of words the following items:  (i) derived nouns "ealaSmAe ealmu taqqa " / /, (ii) strong verbs "ealeafcAl eaSSaHIHa" / /: these are the verbs that contain no weak letters.In the Arabic language, there are three weak letters: "w" / /, "A" / /, and "y" / /, (iii) weak verbs "ealeafcAl ealmuctalla" / /: these are the verbs that contain a weak letter.Weak verbs are also classified into three categories as discussed by Attia [10]: (a) assimilated "ealmi∼al" / /: a verb that contains an initial weak letter, (b) hollow "ealajwaf " / /: a verb that contains a middle weak letter, (c) defective "eannAqiS" / /: a verb that contains a final weak letter.
Finally, after the generation the morphological automaton which contains the Arabic vocalized, its size is about 120 MB.
Concerning the number of the entries generated by our system, it is about 5961 entries, which represent a remarkable reduction of the number of the entries and makes our system as one of the best existing systems.These obtained results confirm what we explained before about our new language (XMODEL) to represent the Arabic morphological knowledge with an optimal way.To concretize this representation, let us take an example of how to generate a morphological automaton of verbs that have these schemes: "facala", "facila", "facula", and "faclala".So, related to the example in Figure 11, the automaton contains 19 states including 4 accepting states ("19", "16", "17", and "18") which represent the four schemes.This representation permits a remarkable reduction of the number of the morphological entries and explains the results seen before.
morphological analyzer for our language.So, in the following paragraph, we will present the proposed technique for the Arabic language (Figure 11).

The Arabic Morphological Analyzer.
First of all, because our morphological analyzer is based on the morphological automaton which is the main idea of this work, so this part it will be shorter than the part of the Arabic morphological automaton.So, in this section, we describe our morphological analyzer for Arabic language.This analyzer is developed using three principal components.
(i) A morphological database constructed using the XMODEL language based on XML language integrating all the data suitable for Arabic language.Its regroups three packages: package of morphological components that contains verbs, nouns, particles, and affixes.The second package includes the morphological rules and the last package is concerned with the morphological properties.
(ii) A set of morphological automatons for the Arabic language each of which represents a very specific morphological category.
(iii) A program handling the morphological database and the morphological automaton.It is developed through the use of Java language.
In addition, the method presented is meant to give a set of information about any Arabic word given to it.This set of information is about the following.
(i) The gender of the word: masculine or feminine.
(ii) The person of the word: first, second, or third person.
(iii) The number of the word: singular, dual, or plural.
(v) The type of the word: verb, noun or particle.
(vii) The origin scheme of the word is given if available.
Let us analyze some examples of Arabic verbs and nouns using our system.These examples are taken from a standard input text provided by ALECSO (Arab League, Educational, Cultural, and Scientific Organization) which organized a competition in April 2009 of the Arabic Analyzers in Damascus.The standard input text provided by ALECSO is unvocalized, in our test, we used a vocalized version.This standard input text is provided in this file: http://www.alecso.org.tn/images/stories/OULOUM/MOHALLILAT%20SARFIA DAMAS 2009/020%20NIZAR .html(Figure 12).
The example above shows the results of the morphological analysis of some verbs using our system.The example below will show the results of the morphological analysis of some nouns.
We note that this work is very rich for the information giving about every noun analyzed which makes it one of the best Arabic morphological analyzers.So, let us take the "properties" column in the case of nouns, we conclude that our system gives the maximum of information about each word analyzed which will be very useful for the future works to be done (Figure 13).
We note that this set of information has an important role especially in future works like for example the building of a syntactic analyzer, a semantic analyzer, machine translation, and so forth.
Finally, the development of our morphological analyzer for Arabic language has many advantages such as the following.
(i) The separation between the task of the linguist and the developer.
(ii) We can also reuse our programs in future works.
Figure 12: A morphological analysis of some verbs using our system.
(iii) Development standardization means in our application that we have build all the applications with the same standards technologies (Java language, XML technologies, SAX, DOM, etc.).
(iv) The work presented is developed using Java language.Therefore, our analyzer can be run in any platform such as Windows, Linux, UNIX, and Mac OS.
(v) The facility of maintenance: it is easy to add some new features to our system if the user or the linguist needs them for his Arabic morphological analysis.It is also easy to extend our system to include some new works related to NLP of Arabic such as information retrieval, syntactic and semantic analyzers, correction, and generation of Arabic texts.

Evaluation
In this section, we are going to evaluate Xerox Arabic morphological analyzer, the Arabic morphological analyzer by Otakar Smrz, and our system.We note that a standard annotated corpus for Arabic language is not yet available, and for this reason, the process of evaluation will be difficult.So, we have chosen Xerox morphological analyzer and the morphological analyzer by Otakar Smrz, because they are one of the best known morphological analyzers for MSA, and they are also available and well documented as shown in Tables 1, 2, and 3. On the one hand, the first remark when we compare the three morphological analyzers is about the information giving by each one.Used an innovative language (XMODEL) for Figure 13: A morphological analysis of some nouns using our system.
representing the morphological knowledge and the notion of the morphological automaton for Arabic language, our morphological analyzer gives more information about each word analyzed and more precision compared to Xerox Arabic morphological analyzer and Smrz's analyzer.To clarify this point, let us take some Arabic words and try to analyze them using the three morphological analyzers.
Related to Tables 1, 2, and 3 which represented the results of ten different Arabic words analyzed using the three morphological analyzers, we note that our morphological analyzer provides more information and more precision about the word analyzed compared to the others.Thanks to our new innovative language (XMODEL) which permits to represent the morphological knowledge in an optimal way and the power of the morphological automaton for Arabic, this advantage will be very useful especially in the future works which will be done later.It should be noted that our system could provide more information about the word analyzed according to the user needs.
On the other hands, let us see the evaluation process from another view.So, we have selected a corpus of 975 words containing different type of the word in Arabic (verbs, nouns, and particles).Then, we tested them on each morphological analyzer, after that, we draw a detailed analysis for the three analyzers.Our corpus contains 975 words divided into 481 nouns, 362 verbs, and 132 particles.Table 4 shows the number of words which are not found when they are analyzed using the three morphological analyzers.
To conclude this part of evaluation, using a new innovative language (XMODEL) and the notion of morphological automaton, our morphological analyzer can reach an average of performance around 90% which will make it one of the best existing morphological analyzers for the Arabic language, and it will be very useful for the next future  works to be done in NLP.We note that an update of our morphological database could resolve these errors seen in Table 4. Represented as a set of XMODEL files, the process of updating the morphological database becomes very easy which make our innovative language one of the best languages to represent the Arabic morphological knowledge.

Discussion
To compare our morphological analyzer for the Arabic language to the other existing systems, the task is difficult to do, because there is no standard to make this comparison, and every system has its own target.For this reason, each analyzer has some advantages and disadvantages compared to the others.Our morphological analyzer for the Arabic language has some advantages compared to the others analyzers.These advantages are the following.
(i) Our morphological analyzer can be used in both analysis and generation.
(ii) It handles diacritized texts which permit to reduce the rate of ambiguity.(iii) Our new and innovative language (XMODEL) used for the representation of the morphological knowledge and the use of morphological automaton for the Arabic language permit to avoid a huge problems of ambiguity in the Arabic language which the most analyzers cannot resolve.
(iv) The use of XMODEL language permit to reduce the number of the entries in the morphological database which present a big problem of the other morphological analyzers.
(v) Represented as a set of XMODEL files, the process of updating the Arabic morphological database is very easy to develop.This makes our system very flexible and one of the best existing morphological analyzers.
(vi) The major advantage of our system is that it permits, on the one hand, to give the affixes, the stem and the scheme (if the word is a noun or a verb and if it has a scheme) for any word given.On the other hands, it gives the information about the word analyzed using a list of morphological descriptors which permits to characterize every Arabic word.
Our system has some disadvantages compared to some other systems.Firstly, it cannot handle undiacritized texts.Secondly, it does not provide an English glossary, and finally, it handles words which do not exist in the Arabic language.
To clarify the last disadvantage, let us take an example of some invalid words analyzed with our system (Figure 14).The example above shows five invalid words analyzed as they are valid words in the Arabic language.So, the next works to be done is to solve this problem.To solve it, we will keep the current design of morphological analysis.Our idea is to use a lexicon of any existing morphological analyzer (e.g., Buckwalter's analyzer, Xerox analyzer, etc.) and try to eliminate the invalid words generated by the morphological automaton and reconstruct it without these errors.This operation will reduce the number of the invalid words.But, this kind of problems is not very serious to take into consideration, because Arabic Language is very rich in words, and every year, there is some news words added to the language.

Conclusion and Future Works
The production of a morphological database is the first step in morphological analysis and most natural language processing.In this work, we presented an approach for representing the lexicon resource for Arabic language constructed using a flexible and extensible language (XMODEL).This morphological database used to develop a morphological analyzer for Arabic language.The strength of our morphological analyzer is the portability and the reusability because we have used Java for the development and the XML technology.
To extend our platform, we can also think to develop some works in the future such as the following.
(i) Develop the syntactic analyzer: our morphological database holds a lot of syntactic information, which makes it very useful for syntactic analysis.
(ii) Develop the semantic analyzer: our morphological database holds semantic information, which makes it very useful in semantic analysis.
(iv) Develop a system for Arabic learning.
(v) The help of the correction and the generation of texts.
(vi) Automatic understanding of the texts: classification of texts, automatic summary, and automatic extraction of the key words.
(vii) Realize some specific applications such as machine translation, Q/R systems, information retrieval systems; that is, these applications need morphological database and morphological analysis to be developed.
The next step is to use this morphological database and morphological analyzer to develop a syntactic analyzer for Arabic language using especially the syntactic information given by our system.

Figure 4 :
Figure 4: Example of the property of classes.

Figure 6 :
Figure 6: The conjugated forms of some verbs.

Figure 7 :Figure 8 :
Figure7: Class of rules which represent the components prefixed by the prefix "la" and "bi".

Figure 9 :
Figure 9: A morphological automaton representing the above morphological rule.

Figure 10 :
Figure 10: A morphological automaton representing the above rule.

Figure 14 :
Figure 14: Some invalid words analyzed by our system.

Table 1 :
Words analyzed using Xerox Arabic morphological analyzer.

Table 3 :
Words analyzed using our morphological analyzer.

Table 4 :
Results of the evaluation process.

Table 5 :
The acronyms signification in morphological analysis.

Table 6 :
The english translation of some Arabic words.