Developing a Comprehensive Standard Persian Positional Tagset (article)

Author: Mahdavi, Mohammad Amin

International Journal of Information Science and Management, January & June 2018, Volume 16, Number 1. Rank A (Ministry of Science/ISC). 26 pages, from 165 to 190.

Keywords: Persian Positional Tagset; Persian POS tagset; Standard Persian Tagset; Persian Morphosyntactic tagset

Abstract:

One of the primary tools used in text processing tasks such as information retrieval, text extraction, and text mining is a corpus enhanced with linguistic tags. In a corpus development effort, the role of a POS tagger is to assign a linguistic tag to every textual token. POS annotation relies heavily on a tagset based on a linguistic theory. Text processing in Persian follows this common practice as well. Several tagsets have been introduced so far to annotate Persian corpora. However, each tagset follows its own standard and linguistic theory, and each contains a limited number of tags, which renders it inadequate for a larger scope of research. This study draws on the EAGLES and MULTEXT-East positional tagset standards to produce a comprehensive standard positional tagset for Persian. The proposed tagset is also informed by the existing Persian tagsets. The proposed Persian Positional Tagset (PPT) is designed to be used for morphological, lexical, and syntactic annotations of Persian corpora.

Machine-generated summary:

The proposed Persian Positional Tagset (PPT) is designed to be used for morphological, lexical, and syntactic annotations of Persian corpora. The additional motivation is to produce a comprehensive set of part-of-speech categories and their respective features for Persian along with the proposed positional tagging scheme. This study, therefore, intends to propose a comprehensive positional tagset that can be used for morphological, lexical, and syntactic annotation of Persian corpora. Pioneering corpus development efforts for English include the Brown corpus (Greene & Rubin, 1971), the Lancaster/Oslo-Bergen corpus (LOB) (Johansson, 1986), the Spoken English Corpus (SEC) (Taylor & Knowles, 1988), the Polytechnic of Wales corpus (PoW) (Souter, 1989), the University of Pennsylvania corpus (UPenn) (Santorini, 1990), the London-Lund Corpus (LLC) (Eeg-Olofsson, 1991), the International Corpus of English (ICE) (Greenbaum, 1992, 1993), the British National Corpus (BNC) (Burnard, 2000), and the Spoken Corpus Recordings in British English (SCRIBE) (Huckvale, 2004), among others. The Persian Dependency Treebank uses a tagset that, in addition to morphosyntactic annotation, introduces 43 categories for dependency relations (Rasooli, Moloodi, Kouhestani, & Minaei-Bidgoli, 2011). Another initiative that intends to introduce consistent morphological tagsets for Indo-European languages is MULTEXT-East, which is informed by MULTEXT (Derzhanski & Kotsyba, 2013; Erjavec, 2012).

Introducing the Persian Positional Tagset (PPT)

The tagset proposed in this study is intended to cover a wide range of annotations, from morphological analysis to treebank, syntactic, and lexical analysis. Similar to EAGLES and MULTEXT-East, the Persian Positional Tagset (here we adopt the PPT label for the proposed tagset) reserves the first position for specifying the main categories.
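
To illustrate how a positional tag can be read, the following minimal Python sketch decodes a tag whose first character names the main category and whose remaining characters each encode one feature. The category codes, feature names, value codes, and the use of a hyphen for an unspecified feature are all hypothetical choices made for this illustration; they are not taken from the PPT specification, which this summary does not reproduce.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory for illustration only; NOT the actual PPT tables.
# Position 0 of a tag holds the main category code.
CATEGORIES: Dict[str, str] = {"N": "noun", "V": "verb"}

# For each category, an ordered list of (feature name, value codes).
# The i-th entry describes position i + 1 of the tag.
FEATURES: Dict[str, List[Tuple[str, Dict[str, str]]]] = {
    "N": [
        ("type", {"c": "common", "p": "proper"}),
        ("number", {"s": "singular", "p": "plural"}),
    ],
    "V": [
        ("tense", {"p": "present", "a": "past"}),
        ("person", {"1": "first", "2": "second", "3": "third"}),
        ("number", {"s": "singular", "p": "plural"}),
    ],
}


def decode(tag: str) -> Dict[str, str]:
    """Expand a positional tag into a feature dictionary."""
    if not tag or tag[0] not in CATEGORIES:
        raise ValueError(f"unknown main category in tag {tag!r}")
    category = tag[0]
    slots = FEATURES[category]
    if len(tag) - 1 > len(slots):
        raise ValueError(f"too many positions in tag {tag!r}")

    result = {"category": CATEGORIES[category]}
    # Pair each remaining character with the feature defined for its position.
    for (name, values), code in zip(slots, tag[1:]):
        if code == "-":  # assumed marker for an unspecified feature
            continue
        if code not in values:
            raise ValueError(f"invalid code {code!r} for {name} in {tag!r}")
        result[name] = values[code]
    return result


if __name__ == "__main__":
    print(decode("Ncp"))
    print(decode("V-3s"))
```

Because the meaning of each character depends only on its position and on the category in the first position, a tagset of this kind can grow by adding positions or value codes without renaming existing tags.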
