Abstract:
In this study, the researcher used the many-facet Rasch measurement model (MFRM) to detect two pervasive rater errors among peer-assessors rating EFL essays, and compared the peer-assessors' ratings with those of teacher assessors to gain a clearer understanding of peer-assessor rating behavior. To that end, a fully crossed design was used in which all peer-assessors rated all essays written by MA students enrolled in two Advanced Writing classes at two private universities in Iran. The peer-assessors used a 6-point analytic rating scale to evaluate the essays on 15 assessment criteria. The results of the Facets analyses showed that, as a group, the peer-assessors exhibited neither a central tendency effect nor a halo effect; individual peer-assessors, however, showed varying degrees of both effects. Further, the ratings of peer-assessors and those of teacher assessors did not differ statistically significantly.
Machine summary:
"Keywords: Peer-assessment; Rater effects; Rating; Many-facet Rasch measurement model. *Corresponding author: English Language Department, Faculty of Humanities, Imam Khomeini International University, Nourozian Blvd.
Peer-assessment aids students in reflecting on their learning by observing other students' performance (Falchikov, 1986; Gielen, Dochy, & Onghena, 2011; Nulty, 2010; Somervell, 1993; Vickerman, 2009), generates positive attitudes in students (Haaga, 1993; Murakami, Valvona, & Broudy, 2012; Saito & Fujita, 2004), develops a sense of shared responsibility among students (Saito, 2008), and promotes higher-order cognitive thinking (Cheng & Warren, 2005; Davis, 2009).
The findings have shown that a lengthy period of training can affect peer evaluation positively (Stanley, 1992); that training can provide students with more feedback and prompt them to interact with each other (Zhu, 1995); that training results in better writing skills, greater confidence, and more use of metacognitive strategies (Min, 2005); that peer-assessor training can make peer-assessors more consistent, leading to fewer misfitting peer-assessors (Saito, 2008); and that trained peer-assessors tend to provide more valid ratings (Liu & Li, 2014).
In an empirical study to detect the central tendency effect among peer-assessors using the many-facet Rasch measurement model, Farrokhi, Esfandiari, and Vaez (2011) employed 188 peer-assessors to rate the five-paragraph essays their classmates had written.
It provides a statistical framework, enabling researchers to analyze rating data and summarize the overall ratings in terms of group-level effects for facets such as peer-assessors, assessment criteria, and students' essays (Myford & Wolf, 2003).
Peer-assessors rated their peers' essays reliably; when the researcher reviewed the group-level statistical indicators, the ratings showed neither a central tendency effect nor a halo effect."
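As an illustrative aside, the central tendency effect discussed above can be sketched with a crude, non-Rasch indicator: a rater whose scores cluster tightly around the midpoint of the rating scale. This is only a minimal sketch with hypothetical data and a hypothetical `central_tendency_flag` helper; the study itself used the Facets software and MFRM fit statistics, not this heuristic.

```python
# Minimal sketch of a central tendency check (NOT the MFRM/Facets analysis
# used in the study). All rater names and scores below are hypothetical.
from statistics import mean, stdev

# ratings[rater] = scores on a 6-point analytic scale across essays/criteria
ratings = {
    "rater_A": [1, 2, 5, 6, 3, 4, 1, 6],   # spreads across the whole scale
    "rater_B": [3, 4, 3, 4, 3, 4, 3, 3],   # clusters near the midpoint
}

def central_tendency_flag(scores, scale_min=1, scale_max=6, sd_threshold=1.0):
    """Flag a rater whose scores cluster near the scale midpoint:
    low spread AND a mean close to the midpoint of the scale."""
    midpoint = (scale_min + scale_max) / 2
    return stdev(scores) < sd_threshold and abs(mean(scores) - midpoint) < 0.5

for rater, scores in ratings.items():
    print(rater, "central tendency?", central_tendency_flag(scores))
```

A halo effect could be probed in a similarly crude way by correlating one rater's scores across the 15 criteria; near-perfect correlations would suggest the rater is not distinguishing the criteria. MFRM improves on both heuristics by modeling rater severity, criterion difficulty, and essay quality jointly.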