Developing a Feature Selection Method Based on Information Theory and the Genetic Algorithm Journal Article

Sciences and Techniques of Information Management, Autumn 1402 (Solar Hijri), No. 32. Ranking: A (Ministry of Science/ISC). 24 pages, from 8 to 31.

Keywords: data mining, genetic algorithm, information theory, feature selection, data pre-processing, classification, classifier

Abstract:

When dealing with high-dimensional datasets, dimensionality reduction is an important pre-processing step for achieving high accuracy, efficiency and scalability in classification problems and, more broadly, in most tasks that extract knowledge from data. Besides useful values, such datasets also contain outliers and redundant, surplus and irrelevant data. This study proposes a feature selection algorithm based on a new criterion for measuring the mutual information between the selected features and the target class. The criterion belongs to the information-gain family of measures: it identifies the relationships between features and the target function through pairwise comparisons, aiming to reflect those relationships realistically while also reducing computational complexity. The proposed method mitigates the limitations of earlier feature selection methods, which select irrelevant and redundant features and take a long time to run, and it increases classifier accuracy.

The method becomes time-consuming in high dimensions, because a larger number of original features increases the number of pairwise comparisons among the features and with their target functions. A metaheuristic based on the genetic algorithm is therefore used to reduce the time needed to select features. Once the proposed criterion has been computed and the features with the greatest influence on identifying the target component have been determined, the selected features are drawn from the original feature set. The selected features are then fed into a KNN classifier to determine and validate the classification accuracy obtained with the reduced dimensions.

The method was evaluated on several datasets from the UCI repository, with the number of features ranging from 13 to 60, and compared with mRMR, DISR, JMI and NJMIM in terms of classifier accuracy. The average accuracies obtained from the outputs of the proposed method are 65.32, 74.51, 70.88 and 58.2 percent, which indicates the effectiveness of the proposed method. According to the results, except for the sonar dataset, where a better result than that of the proposed method was obtained, the average performance of the proposed method was better than DISR, JMI and NJMIM and similar to mRMR; on the other datasets, the average accuracy of the proposed method was better than that of all the other methods. The proposed method could perform better when combined with machine learning algorithms, and a combination of metaheuristic methods could also be used to improve the results.
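The pipeline described above (a mutual-information score, a genetic search over feature subsets, and a KNN check) can be illustrated with a minimal sketch. The abstract does not give the new criterion, so the `fitness` function below uses an mRMR-style relevance-minus-redundancy score purely as a stand-in. All function names, parameter values and the toy data are assumptions made for illustration, not the authors' implementation.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def fitness(mask, columns, labels):
    """Stand-in criterion (assumption): mean relevance minus mean pairwise redundancy."""
    chosen = [col for bit, col in zip(mask, columns) if bit]
    if not chosen:
        return float("-inf")  # an empty subset is never acceptable
    relevance = sum(mutual_information(c, labels) for c in chosen) / len(chosen)
    pairs = [(a, b) for i, a in enumerate(chosen) for b in chosen[i + 1:]]
    redundancy = (
        sum(mutual_information(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    )
    return relevance - redundancy


def genetic_search(columns, labels, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Search over 0/1 feature masks; needs at least two features."""
    rng = random.Random(seed)
    n = len(columns)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, columns, labels)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the best half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit independently with probability mutation_rate
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)


def knn_accuracy(columns, labels, mask, k=3):
    """Leave-one-out accuracy of k-NN (Hamming distance) on the selected features."""
    rows = [[col[i] for bit, col in zip(mask, columns) if bit]
            for i in range(len(labels))]
    correct = 0
    for i, row in enumerate(rows):
        others = sorted(
            (sum(a != b for a, b in zip(row, other)), j)
            for j, other in enumerate(rows) if j != i
        )
        votes = Counter(labels[j] for _, j in others[:k])
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(labels)


# Toy data (hypothetical): feature 0 predicts the label, feature 1 duplicates
# feature 0, and features 2 and 3 are independent of the label.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
columns = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
best = genetic_search(columns, labels)
print(best, fitness(best, columns, labels), knn_accuracy(columns, labels, best))
```

The search keeps the best half of each generation, so the best score never decreases from one generation to the next. On real data the features would first have to be discretised before the counts in `mutual_information` are meaningful.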

When dealing with high-dimensional datasets, dimensionality reduction as a pre-processing step helps to achieve high accuracy, efficiency and scalability, particularly in classification problems. In this study, a feature selection algorithm based on information theory is proposed, focusing on dimensionality reduction for classification tasks. In this approach, the mutual information (MI) between the candidate features and the class label is measured using a new optimal metric. Alongside the new MI metric, a metaheuristic based on the genetic algorithm is applied to increase the speed and efficiency of the proposed method. The approach is applied to datasets whose number of features ranges from 13 to 60. The evaluation shows promising results in terms of classification accuracy in comparison with other similar methods. The proposed method was compared with mRMR, DISR, JMI and NJMIM, and the gap between their results and those of the proposed algorithm was examined.
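For reference, the standard definition of mutual information and the mRMR selection score, one of the baselines named above, can be written as follows. The abstract does not state the paper's new metric, so it is not reproduced here. The symbols $f$ (candidate feature), $C$ (class label) and $S$ (set of already selected features) are notation assumed for this sketch.

```latex
% Mutual information between discrete variables X and Y
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}

% mRMR score of a candidate feature f given the selected set S:
% relevance to the class minus average redundancy with S
J_{\mathrm{mRMR}}(f) = I(f;C) - \frac{1}{|S|}\sum_{s \in S} I(f;s)
```

The first term rewards features that are informative about the class, and the second penalises features that repeat information already carried by the selected set.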
