Abstract:
This paper presents an efficient mechanism to convert the Sana’ani dialect to
Modern Standard Arabic (MSA). The mechanism is based on morphological rules
for both the Sana’ani dialect and MSA. These rules
facilitate converting dialect text to its corresponding MSA. The mechanism
tokenizes the input dialect text and divides each token into a stem and its
affixes. The affixes fall into two categories, dialect affixes
and MSA affixes, and the stem itself may be a dialect stem or an
MSA stem. Our mechanism, which is implemented using a simple MSA
stemmer, must therefore handle each of these cases. Our dialect stemmer is then
applied to strip the resulting token and extract the dialect affixes. At this point,
the rules are applied to decide when to carry out the extraction of an affix.
The experiment shows that the Sana’ani dialect has three classes of distortion:
prefix, suffix, and stem distortions. The algorithm normalizes
these distortions based on the morphological rules. For each morphological
rule, the mechanism checks whether the rule can be applied: if the
rule's conditions are met, the dialect affix is replaced by its
corresponding MSA affix. If there is no restriction on applying the rule related to
the distorted stem, then the rule can be treated as a parallel corpus of the
dialect and MSA. Finally, the experiment computes the distortion ratio of
MSA in Sana’ani dialect. For a Sana’ani dialect sample of 9386 words,
16.29% have distorted suffixes, 0.70% have distorted prefixes, and
2.17% contain distorted stems. These percentages relate only to the
processed words.
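
The pipeline described in the abstract (tokenize, separate affixes from the stem, apply a rule only when its condition holds, then count distortions per class) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The affix strings, the stem lexicon, and the `min_stem_len` condition are hypothetical placeholders, not actual Sana’ani rules, and the sketch folds the separate MSA and dialect stemming passes into a single step.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AffixRule:
    dialect: str           # dialect affix to look for (placeholder value)
    msa: str               # MSA affix that replaces it (placeholder value)
    min_stem_len: int = 2  # assumed condition: the remaining stem must be this long


# Hypothetical placeholder rules; these are NOT real Sana'ani affixes.
PREFIX_RULES = [AffixRule(dialect="xa", msa="sa")]
SUFFIX_RULES = [AffixRule(dialect="sh", msa="")]
# Hypothetical stem lexicon standing in for a dialect/MSA parallel corpus.
STEM_MAP = {"dstem": "mstem"}


def convert_token(token):
    """Return (converted_token, set of distortion classes found in it)."""
    classes = set()
    prefix, stem, suffix = "", token, ""
    for rule in PREFIX_RULES:
        rest_len = len(stem) - len(rule.dialect)
        if stem.startswith(rule.dialect) and rest_len >= rule.min_stem_len:
            prefix, stem = rule.msa, stem[len(rule.dialect):]
            classes.add("prefix")
            break
    for rule in SUFFIX_RULES:
        rest_len = len(stem) - len(rule.dialect)
        if stem.endswith(rule.dialect) and rest_len >= rule.min_stem_len:
            suffix, stem = rule.msa, stem[:rest_len]
            classes.add("suffix")
            break
    if stem in STEM_MAP:  # distorted stem: replace it with its MSA equivalent
        stem = STEM_MAP[stem]
        classes.add("stem")
    return prefix + stem + suffix, classes


def distortion_ratios(text):
    """Convert whitespace-separated tokens and compute per-class ratios."""
    tokens = text.split()
    counts = Counter()
    converted = []
    for token in tokens:
        out, classes = convert_token(token)
        converted.append(out)
        counts.update(classes)
    total = len(tokens)
    ratios = {
        kind: (counts[kind] / total if total else 0.0)
        for kind in ("prefix", "suffix", "stem")
    }
    return " ".join(converted), ratios


if __name__ == "__main__":
    print(distortion_ratios("xakatab katabsh dstem katab"))
```

In this sketch, a token is counted once per distortion class it exhibits, and each ratio is taken over all tokens in the input. The abstract notes that the reported percentages relate only to the processed words, so the denominator used in the paper may differ.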
Machine summary:
The mechanism is based on morphological rules related to Sana’ani dialect as well as Modern Standard Arabic.
If there is no restriction on applying the rule related to the distorted stem, then the rule can be considered as a parallel corpus of the dialect and MSA.
In fact, the main objective of this paper is to design and implement an algorithm to convert the Sana’ani dialect into Modern Standard Arabic.
These rules could be applied to handle any distortion in MSA language (Sana’ani dialect).
Table I: Syntactic Rules. The translation process from dialect to MSA could take place as shown in Table 2. Table 2: Sample of Translation {see the attached table file} {see the attached table file}. In example 1 and example 2 of Table 2, different rules are selected and applied to the same enclitic ' '.
In general, our rules do not have such a deep dependency except in distorted MSA stems which have dialect clitics.
The algorithm accepts Sana’ani dialect text as input, processes the corpus, and produces the contents of Table 4.
That means the algorithm works well as long as it is able to accept a Sana’ani dialect corpus of 9386 words and process it to produce 77.
The other dependent rule will be applicable by removing clitics (stemming) and/or replacing the dialect stem with its MSA equivalent using the corpus.
Conclusion: The experimental results show how to use a rule-based algorithm to convert the dialect to MSA.