Effectiveness of a Face-to-Face Training Program on Oral Performance Assessment: The Analysis of Tasks Using the Multifaceted Rasch Analysis


The current popularity of second/foreign language oral performance assessment has led to a growing interest in tasks as a tool for assessing language learners' oral abilities. However, most oral assessment studies so far have investigated tasks separately; therefore, any possible relationship among them has remained unexplored. In this study, twenty English as a foreign language (EFL) teachers rated the oral performances of 200 EFL learners on five tasks (description, narration, summarizing, role-play, and exposition) before and after a rater training program. The findings demonstrated the usefulness of multifaceted Rasch measurement (MFRM) in detecting rater effects and in revealing consistency and variability in rater behavior, thereby allowing the quality of ratings to be evaluated. The outcomes indicated that identifying test difficulty is a complex, multidimensional undertaking. Moreover, test takers' ability accounted for more of the variation in their scores than other intervening variables did. No relationship was found between task difficulty and raters' interrater reliability measures. The findings suggest that tasks, and most importantly performance conditions, have various effects on oral performance assessment when estimating test takers' oral ability. Although different groups of raters showed biases toward different tasks, the findings indicated that training programs can reduce raters' biases and increase their consistency. The findings imply that decision makers need not be overly concerned about raters' prior expertise in oral assessment; instead, they should establish better rater training programs to increase assessment reliability.
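The abstract credits multifaceted Rasch measurement (MFRM) with detecting rater effects but does not print the model itself. A common specification is Linacre's many-facet rating scale model, sketched below with the facets this study examines (test takers, tasks, raters, rating-scale steps); whether the authors used exactly this parameterization is an assumption.

```latex
\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here $P_{nijk}$ is the probability that test taker $n$ receives score $k$ on task $i$ from rater $j$, $B_n$ is the test taker's ability, $D_i$ the task's difficulty, $C_j$ the rater's severity, and $F_k$ the difficulty of moving from score $k-1$ to $k$, all in logits. Rater-by-task bias analyses of the kind reported here are commonly run by adding an interaction term to the right-hand side (written here, hypothetically, as $-\,I_{ij}$) and checking whether it departs significantly from zero for each rater-task pair.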

Journal of Modern Research in English Language Studies 5(4), 27-53 (2018)

The Speaking Test

Test takers' oral proficiency was elicited through five different tasks: description, narration, summarizing, role-play, and exposition (see the attached table file).

At the pre-training phase, the results showed a significant mean difference between every pair of tasks in the scores assigned to test takers' oral performance, except for the narration-role play pair (p = 0.…). At the post-training phase, the table likewise shows a significant mean difference between every pair of tasks, except for the following pairs: description-summarizing (p = 0.52), …

As in the pre-training phase, an independent-samples t-test was run to determine whether NEW and OLD raters differed significantly in the difficulty they assigned to each task. For the first and second research questions, which concern raters' biases toward tasks of varying difficulty, the results indicated significant differences between NEW and OLD raters in their biases toward the oral tasks at the pre-training phase. Unlike the pre-training phase, however, the post-training data showed no significant difference between NEW and OLD raters' biases in scoring tasks of varying difficulty. The large residual obtained, compared with the effects of raters and tasks, indicates that test takers' ability plays a more significant role in their oral performance scores than the other factors involved.
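The NEW-versus-OLD comparisons above rely on an independent-samples t-test. The paper does not say which variant or software was used, so the following is a minimal sketch assuming Student's pooled-variance test. It uses only the Python standard library and computes the two-sided p-value by numerically integrating the t density with Simpson's rule. The function names and the rater measures in the example are hypothetical and are not the study's data.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work in log space (lgamma) so large df does not overflow math.gamma.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value 2 * P(T > |t|), integrating the density on [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 2 * (0.5 - area))


def independent_t_test(group_a, group_b):
    """Student's independent-samples t-test with pooled variance.

    Returns (t statistic, degrees of freedom, two-sided p-value).
    """
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / se
    return t, df, two_sided_p(t, df)


# Hypothetical bias measures (logits) of NEW and OLD raters on one task.
new_raters = [0.42, 0.35, 0.51, 0.28, 0.47]
old_raters = [0.10, 0.05, 0.22, -0.03, 0.12]
t, df, p = independent_t_test(new_raters, old_raters)
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```

In practice, `scipy.stats.ttest_ind(new_raters, old_raters)`, whose default `equal_var=True` gives the same pooled-variance test, would replace the hand-written integration; the standard-library version only makes each step of the calculation explicit. Running one such test per task mirrors the task-by-task comparison described above, though with many tasks a multiple-comparison correction would be worth considering.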

