Unreliability in marking is well documented, yet we lack studies that have investigated assessors’ detailed use of assessment criteria. This project used a form of Kelly’s repertory grid method to examine the characteristics that 24 experienced UK assessors notice when distinguishing between students’ performances in four contrasting subject disciplines: that is, their implicit assessment criteria. Variation in the choice, ranking and scoring of criteria was evident. Inspection of the individual construct scores in a sub-sample of academic historians revealed five factors in the use of criteria that contribute to marking inconsistency. The results imply that, whilst more effective and social marking processes that encourage the sharing of standards within institutions and disciplinary communities may help align standards, assessment decisions at this level are so complex, intuitive and tacit that variability is inevitable. We conclude that universities should be more honest with themselves and with students, and should actively help students to understand that the application of assessment criteria is a complex judgement and that there is rarely an incontestable interpretation of their meaning.
"Accepting the inevitability of grading variation means that we should review whether current efforts to moderate are addressing the sources of variation. This study does add some support to the comparison of grade distributions across markers to tackle differences in the range of marks awarded. However, the real issue is not about artificial manipulation of marks without reference to evidence. It is more that we should recognise the impossibility of a ‘right’ mark in the case of complex assignments, and avoid overextensive, detailed, internal or external moderation. Perhaps, a better approach is to recognise that a profile made up of multiple assessors’ judgements is a more accurate, and therefore fairer, way to determine the final degree outcome for an individual. Such a profile can identify the consistent patterns in students’ work and provide a fair representation of their performance, without disingenuously claiming that every single mark is ‘right’. It would significantly reduce the staff resource devoted to internal and external moderation, reserving detailed, dialogic moderation for the borderline cases where it has the power to make a difference. This is not to gainsay the importance of moderation which is aimed at developing shared disciplinary norms, as opposed to superficial procedures or the mechanical resolution of marks."
It's quite easy to criticize this paper: it's a small-scale study (n=24) with no attempt at statistical analysis or validation. But there's still an inescapable feeling that, as the stakes have escalated, HE is kidding itself about its assessment practices.