A Method to Convert Sana’ani Accent to Modern Standard Arabic (Article)

Authors: G. H. Al-Gaphari; D. M. Al-Yadoumi

International Journal of Information Science and Management, January & June 2010, Volume 8, Number 1. Rank: Scientific-Research (Ministry of Science/ISC). 12 pages, pp. 39 to 50.

Keywords: Methods, Conversion, NLP, Dialects, Algorithms, MSA

Abstract:

This paper presents an efficient mechanism to convert the Sana’ani dialect to Modern Standard Arabic (MSA). The mechanism is based on morphological rules for both the Sana’ani dialect and MSA; these rules drive the conversion of dialect text to its corresponding MSA. The mechanism tokenizes the input dialect text and divides each token into a stem and its affixes. The affixes fall into two categories, dialect affixes and MSA affixes, and the stem itself may be either a dialect stem or an MSA stem. The mechanism, implemented using a simple MSA stemmer, must therefore handle each of these combinations. The dialect stemmer is then applied to strip the resulting token and extract the dialect affixes. At this point, the rules are applied to decide when to carry out the extraction of an affix. The experiment shows that the Sana’ani dialect has three classes of distortion: prefix, suffix, and stem distortions. The algorithm normalizes these distortions using the morphological rules. For each morphological rule, the mechanism checks whether the rule can be applied: if the rule's conditions are met, the dialect affix is replaced by its corresponding MSA affix. If there is no restriction on applying the rule related to the distorted stem, then the rule can be considered a parallel corpus of the dialect and MSA. Finally, the experiment computes the distortion ratio of MSA in the Sana’ani dialect. For a Sana’ani dialect sample of 9386 words, 16.29% have distorted suffixes, 0.70% have distorted prefixes, and 2.17% contain distorted stems. These percentages relate only to the processed words.
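The pipeline described in the abstract (tokenize, strip affixes, check rule conditions, replace dialect material with its MSA counterpart, then tally distortion classes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule tables, the minimum-stem-length condition, and the function names `convert_token` and `convert_text` are all hypothetical, and the affix strings are made-up placeholders rather than real Sana’ani or MSA morphemes. The sketch also skips the preliminary MSA stemming pass and handles only the dialect-side rules.

```python
from collections import Counter

# Hypothetical rule tables. The strings are placeholders, NOT real
# Sana'ani or MSA morphemes; the paper's actual rules are not reproduced.
PREFIX_RULES = {"zz": "sa"}        # dialect prefix -> MSA prefix
SUFFIX_RULES = {"kk": "ka"}        # dialect suffix -> MSA suffix
STEM_LEXICON = {"dstem": "mstem"}  # dialect stem -> MSA stem
MIN_STEM_LEN = 3                   # assumed rule condition, for illustration


def convert_token(token):
    """Return (converted_token, set of distortion classes found)."""
    prefix, stem, suffix = "", token, ""
    found = set()

    # Strip a dialect prefix only if the rule condition holds.
    for dialect, msa in PREFIX_RULES.items():
        if stem.startswith(dialect) and len(stem) - len(dialect) >= MIN_STEM_LEN:
            prefix, stem = msa, stem[len(dialect):]
            found.add("prefix")
            break

    # Strip a dialect suffix under the same kind of condition.
    for dialect, msa in SUFFIX_RULES.items():
        if stem.endswith(dialect) and len(stem) - len(dialect) >= MIN_STEM_LEN:
            suffix, stem = msa, stem[:-len(dialect)]
            found.add("suffix")
            break

    # Replace a distorted stem through a dialect-to-MSA lookup.
    if stem in STEM_LEXICON:
        stem = STEM_LEXICON[stem]
        found.add("stem")

    return prefix + stem + suffix, found


def convert_text(text):
    """Convert whitespace-separated tokens and report distortion ratios (%)."""
    tokens = text.split()
    counts = Counter()
    converted = []
    for token in tokens:
        new_token, found = convert_token(token)
        converted.append(new_token)
        counts.update(found)
    total = len(tokens)
    ratios = {
        cls: (100.0 * counts[cls] / total if total else 0.0)
        for cls in ("prefix", "suffix", "stem")
    }
    return " ".join(converted), ratios
```

Calling `convert_text` on a dialect string returns the normalized text together with the percentage of tokens showing each distortion class, which mirrors the kind of prefix, suffix, and stem ratios the abstract reports. A single token can count toward more than one class, so the three percentages need not sum to the share of distorted tokens.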

Machine summary:

The mechanism is based on morphological rules for both the Sana’ani dialect and Modern Standard Arabic. If there is no restriction on applying the rule related to the distorted stem, then the rule can be considered a parallel corpus of the dialect and MSA. The main objective of this paper is to design and implement an algorithm to convert the Sana’ani dialect into Modern Standard Arabic. These rules could be applied to handle any distortion in the MSA language (Sana’ani dialect). Table I: Syntactic Rules. The translation process from dialect to MSA could take place as shown in Table 2. Table 2: Sample of Translation [refer to the attached table file] [refer to the attached table file]. In example 1 and example 2 of Table 2, different rules are selected and applied to the same enclitic ' '. In general, our rules do not have such a deep dependency, except for distorted MSA stems that carry dialect clitics. The algorithm accepts Sana’ani dialect text as input, processes the corpus, and produces the contents of Table 4. That means the algorithm works fine as long as it is able to accept a Sana’ani dialect corpus of 9386 words and process such a corpus to produce 77. The other dependent rule becomes applicable by removing clitics (stemming) and/or replacing the dialect stem with its MSA equivalent using the corpus. Conclusion: The experimental results show how to use a rule-based algorithm to convert the dialect to MSA.
