Word Sense Disambiguation Focusing on POS Tag Disambiguation in Persian: A Rule-based Approach (Article)

Authors: Alayiaboozar, Elham; Moloodi, Amirsaeid; Kouhestani, Manouchehr

International Journal of Information Science and Management, July & December 2019, Volume 17, Number 2. International rank (Ministry of Science/ISC). 16 pages, from 119 to 134.

Keywords: POS Tagging; Homographs; POS Disambiguation; Noun and Adjective Homographs Ending in <ی>; Context-sensitive Rules

Abstract:

The present study deals with ambiguity at the word level, focusing on homographs. In many languages, homographs may cause ambiguity in text processing. In Persian, the number of homographs is high because of its orthographic structure and its complex derivational and inflectional morphology. In this study, a broad list of homographs was first extracted from several Persian corpora. The list indicates that the number of homographs in Persian corpora is high, and that the most frequent homographs are those that arise from the identical orthographic representation of some inflectional and derivational morphemes. According to the list, the most frequent homographs are nouns and adjectives ending in <ی> /i/. POS tag disambiguation of such homographs would make word sense disambiguation easier and lead to better text processing. A list of noun and adjective homographs ending in <ی> was therefore extracted in order to decide their correct POS tag. The result was studied to extract context-sensitive rules that assign the right POS tag to the homograph in syntactic structures. The accuracy of the rules was then checked, and the results showed that most of the rules are highly accurate.
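As a rough illustration of what a context-sensitive POS rule for a noun/adjective homograph could look like, the following is a minimal sketch, not the authors' rule set. The two rules, the tag names (`N`, `ADJ`, `P`, `V`, and the ambiguity marker `N|ADJ`), the default tag, and the placeholder tokens are all assumptions made for this example; the abstract does not list the actual rules.

```python
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]  # (word form, POS tag); "N|ADJ" marks an unresolved homograph
# A rule inspects the tags to the left and right and returns a tag, or None if it does not apply.
Rule = Callable[[Optional[str], Optional[str]], Optional[str]]


def adjective_after_noun(prev_tag: Optional[str], next_tag: Optional[str]) -> Optional[str]:
    # Hypothetical rule: a homograph directly after a noun is read as its modifier.
    return "ADJ" if prev_tag == "N" else None


def noun_after_preposition(prev_tag: Optional[str], next_tag: Optional[str]) -> Optional[str]:
    # Hypothetical rule: a homograph directly after a preposition is read as a noun.
    return "N" if prev_tag == "P" else None


RULES: List[Rule] = [adjective_after_noun, noun_after_preposition]


def disambiguate(sentence: List[Token], rules: List[Rule] = RULES, default: str = "N") -> List[Token]:
    """Resolve every 'N|ADJ' token with the first rule that fires, else the default tag."""
    resolved: List[Token] = []
    for i, (form, tag) in enumerate(sentence):
        if tag == "N|ADJ":
            # Left context uses already-resolved tags; right context uses the input tags.
            prev_tag = resolved[i - 1][1] if i > 0 else None
            next_tag = sentence[i + 1][1] if i + 1 < len(sentence) else None
            tag = default
            for rule in rules:
                decision = rule(prev_tag, next_tag)
                if decision is not None:
                    tag = decision
                    break
        resolved.append((form, tag))
    return resolved


# Placeholder tokens; "w2-i" stands in for a homograph ending in <ی>.
print(disambiguate([("w1", "N"), ("w2-i", "N|ADJ"), ("w3", "V")]))
print(disambiguate([("w1", "P"), ("w2-i", "N|ADJ")]))
```

Trying the rules in a fixed order and stopping at the first match keeps each rule independently checkable, which fits the abstract's point that the accuracy of individual rules was evaluated.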

Machine-generated summary:

In this study, different classifications of Persian homographs are presented, and the frequency of homographs is then studied in three Persian corpora: the Persian written corpus or Peykare, also known as the Bijankhan corpus (Bijankhan, Sheykhzadegan, Bahrani & Ghayoomi, 2011); the Farsi linguistic database, also known as paygah-e dadegan-e zaban-e Farsi (Assi, 1997); and the Persian syntactic dependency Treebank (Rasooli, Kouhestani & Moloodi, 2013). Method: a rule-based approach for studying homographs. Word Sense Disambiguation (WSD) is the task of determining which sense of an ambiguous word (a word with multiple meanings) is intended in a particular use of that word, by considering its context (Abed et al., 2015). WSD methods, as introduced in Wilks & Stevenson (1998), Montoyo, Suarez, Rigau & Palomar (2005), Bakx (2006), Makki & Homayounpour (2008), Riahi & Sedghi (2012), Singh & Gupta (2015), and Mahmoodvand & Hoourali (2015), are grouped into machine learning methods (supervised and unsupervised) and methods based on external knowledge sources. For example, Jani and Pilevar (2012) disambiguate Persian words that share a written form but differ in sense, using a combination of supervised and unsupervised methods that draws on a thesaurus and a corpus. A general study of the list of homographs shows that the number of homographs in the different Persian corpora is considerable, which means that POS tag disambiguation is necessary; otherwise, text processing would face problems.
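The corpus step described above, listing homographs and studying their frequency, can be sketched as follows. This is only an illustration under stated assumptions: it treats any word form observed with more than one POS tag as a candidate homograph, which is a simplification, and the function names and toy tokens are hypothetical. It does not reproduce the procedure or the file formats of the three corpora.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def find_homograph_candidates(tagged_tokens: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Return word forms seen with more than one POS tag, with per-tag counts."""
    tags_by_form: Dict[str, Counter] = defaultdict(Counter)
    for form, tag in tagged_tokens:
        tags_by_form[form][tag] += 1
    # Assumption: more than one observed tag marks a candidate homograph.
    return {form: dict(tags) for form, tags in tags_by_form.items() if len(tags) > 1}


def rank_by_frequency(candidates: Dict[str, Dict[str, int]]) -> List[str]:
    """Order candidate forms by total number of occurrences, most frequent first."""
    return sorted(candidates, key=lambda form: sum(candidates[form].values()), reverse=True)


# Toy tagged corpus with placeholder forms (not real corpus data).
corpus = [
    ("a-i", "N"), ("a-i", "ADJ"), ("a-i", "ADJ"),
    ("b", "N"), ("b", "N"),
    ("c-i", "N"), ("c-i", "ADJ"),
]
candidates = find_homograph_candidates(corpus)
print(candidates)
print(rank_by_frequency(candidates))
```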


