Department of Communication Studies
The performance of six major indices of inter-coder reliability was evaluated against the actual judgments of human coders in a behavior-based Monte Carlo (BMC) experiment. The correlations between the indices’ estimated chance agreements (ac) and the observed chance agreements (oac) turned out to be negative for Cohen’s κ, Scott’s π and Krippendorff’s α, and positive but weak for Bennett et al.’s S, Perreault and Leigh’s Ir and Gwet’s AC1. Although each index was designed to improve on percent agreement (ao), each underperformed percent agreement when estimating observed true agreement (at) in the BMC experiment.
The poor or negative correlations between the calculated estimates and the observed estimands call into question the validity of the estimators, namely the indices. The findings support the emerging theory that the reliability indices available today assume dishonest coders who deliberately maximize chance coding, and that they are therefore unsuitable for typical studies, where coders resort to chance coding involuntarily when the task is too difficult. A new index or indices are needed.
This manuscript also reports an effort to develop such a new index, the agreement index (ai), which assumes honest coders and involuntary chance coding. Subsequent analysis shows that ai is free of the 23 known paradoxes that plague other indices. In the BMC experiment, the chance agreement estimated by ai was by far the best predictor of the observed chance agreement between coders. Index ai also outperformed percent agreement and all six other indices in predicting true agreement among the coders.
Empirical testing of theories and indices should continue, especially by different researchers using different methods, and so should the search for a better index. Until better evidence is available, however, researchers may refrain from using κ, π, and α, and add ai as a reasonable measure of true agreement between two coders on a nominal scale. Online software is available at http://reliability.hkbu.edu.hk/ to facilitate calculation.
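For readers unfamiliar with how percent agreement (ao) and an index's estimated chance agreement (ac) relate, the following is a minimal sketch using the standard textbook formulas for Cohen's κ, Scott's π and Bennett et al.'s S for two coders on a nominal scale. It is not the paper's code and does not implement ai, whose formula is not given in this abstract; the function name `chance_corrected_indices` and the example data are hypothetical.

```python
from collections import Counter
from math import nan


def chance_corrected_indices(coder1, coder2, categories):
    """Return ao plus (ac, index) pairs for kappa, pi and S.

    Illustrative helper (hypothetical name), two coders, nominal scale.
    """
    if len(coder1) != len(coder2) or not coder1:
        raise ValueError("coders must rate the same non-empty set of units")

    n = len(coder1)
    # Percent agreement: share of units on which both coders agree.
    ao = sum(a == b for a, b in zip(coder1, coder2)) / n

    c1, c2 = Counter(coder1), Counter(coder2)

    # Cohen's kappa: chance agreement from each coder's own marginals.
    ac_kappa = sum((c1[c] / n) * (c2[c] / n) for c in categories)
    # Scott's pi: chance agreement from the pooled (averaged) marginals.
    ac_pi = sum(((c1[c] + c2[c]) / (2 * n)) ** 2 for c in categories)
    # Bennett et al.'s S: chance agreement is 1 / number of categories.
    ac_s = 1 / len(categories)

    def corrected(ac):
        # Shared form of these indices: (ao - ac) / (1 - ac).
        return nan if ac == 1 else (ao - ac) / (1 - ac)

    return {
        "ao": ao,
        "kappa": (ac_kappa, corrected(ac_kappa)),
        "pi": (ac_pi, corrected(ac_pi)),
        "S": (ac_s, corrected(ac_s)),
    }


if __name__ == "__main__":
    # Hypothetical codings of four units into categories "a" and "b".
    result = chance_corrected_indices(
        ["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"]
    )
    for name, value in result.items():
        print(name, value)
```

The three indices share one correction formula and differ only in how ac is estimated, which is why the abstract evaluates each index's ac against the observed chance agreement (oac).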
reliability, intercoder reliability, interrater reliability, agreement index, estimator, estimate, estimand, maximum randomness, variable randomness, behavioral Monte-Carlo experiment, BMC, simulation-augmented behavior experiment, SAB, kappa, alpha, pi
Source Publication Title
Annual conference of Association for Education in Journalism and Mass Communication
Zhao, Xinshu (2012). “A Reliability index(ai) that assumes honest coders and variable randomness.” Paper presented at the annual conference of Association for Education in Journalism and Mass Communication, Chicago, USA, August.