We find that kappa shows greater agreement between A and B in the second case than in the first. Although the percentage of agreement is the same in both cases, the percentage of agreement that would occur "by chance" is considerably higher in the first case (0.54 vs. 0.46).

The weighted kappa allows disagreements to be weighted differently [21] and is particularly useful when the codes are ordered. [8]:66 Three matrices are involved: the matrix of observed scores, the matrix of expected scores based on chance agreement, and the weight matrix. Weight matrix cells on the diagonal (upper left to lower right) represent agreement and therefore contain zeros. Off-diagonal cells contain weights that indicate the seriousness of the disagreement. Often, cells one step off the diagonal are weighted 1, those two steps off are weighted 2, and so on.

Interrater reliability is, to some degree, a concern in most large studies, because the many people who collect data may experience and interpret the phenomena of interest differently. Variables that are subject to interrater error are easy to find in clinical research and diagnostics. Examples include studies of pressure ulcers (1, 2), in which the variables include items such as the amount of redness, edema, and erosion in the affected area. While data collectors can use measuring tools for size, color is quite subjective, as is edema.
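To make the kappa and weighted kappa calculations described above concrete, here is a minimal sketch in plain Python. The function names (`weighted_kappa`, `linear_weights`) and all three tables of counts are illustrative assumptions, not data from the studies cited; the two 2x2 tables were simply chosen so that chance agreement comes out at 0.54 and 0.46, matching the figures quoted above. The sketch builds the three matrices (observed, expected by chance, and weights) and computes kappa as one minus the ratio of weighted observed disagreement to weighted expected disagreement.

```python
def linear_weights(k):
    """Weight matrix with 0 on the diagonal, 1 one step off, 2 two steps off, ..."""
    return [[abs(i - j) for j in range(k)] for i in range(k)]


def weighted_kappa(observed, weights=None):
    """Cohen's kappa from a square table of counts.

    observed[i][j] is the number of items that rater A placed in category i
    and rater B placed in category j. If no weight matrix is given, every
    off-diagonal cell gets weight 1, which yields the ordinary (unweighted) kappa.
    """
    k = len(observed)
    n = sum(sum(row) for row in observed)
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    if weights is None:
        weights = [[0 if i == j else 1 for j in range(k)] for i in range(k)]
    # Expected counts under chance agreement: product of the marginals over n.
    expected = [[row_tot[i] * col_tot[j] / n for j in range(k)] for i in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_disagreement = sum(weights[i][j] * observed[i][j] for i, j in cells)
    expected_disagreement = sum(weights[i][j] * expected[i][j] for i, j in cells)
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical 2x2 tables (rows: rater A, columns: rater B), both with 60% agreement.
case_1 = [[45, 15], [25, 15]]   # chance agreement 0.54
case_2 = [[25, 35], [5, 35]]    # chance agreement 0.46
print(round(weighted_kappa(case_1), 2))  # 0.13
print(round(weighted_kappa(case_2), 2))  # 0.26

# Hypothetical 3-category ordinal table, scored with linear weights.
ordinal = [[20, 5, 0], [3, 15, 2], [1, 4, 10]]
print(round(weighted_kappa(ordinal, linear_weights(3)), 2))  # 0.68
```

With the same observed agreement, the second table yields the higher kappa because less of that agreement is attributable to chance. In the ordinal table, linear weights penalize a two-category disagreement twice as heavily as a one-category disagreement; other weighting schemes (for example, squared distances) are equally possible and would give different values.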

In head trauma research, data collectors estimate the size of the patient's pupils and the degree to which the pupils constrict in response to light. In the laboratory, people reading Papanicolaou (Pap) smears for cervical cancer have been found to vary in their interpretations of the cells on the slides (3). Because this variability is a potential source of error, researchers are expected to train data collectors in order to reduce variability in how they view and interpret data and how they record it on data collection instruments. Finally, researchers are expected to measure the effectiveness of their training and to report the degree of agreement (interrater reliability) among their data collectors. As Marusteri and Bacarea (9) noted, there is never 100% certainty about research results, even when statistical significance is reached. Statistical results used to test hypotheses about the relationship between independent and dependent variables become meaningless when raters are inconsistent in how they score the variables. If agreement is less than 80%, more than 20% of the data being analyzed are erroneous. With a reliability of only 0.50 to 0.60, it must be understood that 40% to 50% of the data being analyzed are erroneous. When kappa values are below 0.60, the confidence intervals around the obtained kappa are so wide that one can surmise that about half of the data may be incorrect (10).

Clearly, statistical significance means little when there is so much error in the results being tested.

The disagreement is 14/16, or 0.875. The disagreement is due to quantity, because the allocation is optimal. Kappa is 0.01.

A number of statistics have been used to measure interrater and intrarater reliability. A partial list includes percent agreement, Cohen's kappa (for two raters), Fleiss' kappa (an adaptation of Cohen's kappa for three or more raters), the contingency coefficient, Pearson's r and Spearman's rho, the intraclass correlation coefficient, the concordance correlation coefficient, and Krippendorff's alpha (useful when there are multiple raters and multiple possible ratings). Correlation coefficients such as Pearson's r can be a poor reflection of the amount of agreement between raters, leading to extreme overestimates or underestimates of the true level of rater agreement (6).
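A small sketch can show why a correlation coefficient may misrepresent agreement. The ratings below are hypothetical and deliberately extreme: the second rater always scores exactly one point higher than the first. The helper names `percent_agreement` and `pearson_r` are illustrative, and Pearson's r is computed by hand so that the example relies only on the standard library.

```python
from math import sqrt


def percent_agreement(a, b):
    """Share of items on which the two raters gave exactly the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pearson_r(a, b):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical scores on a 1-to-6 scale; rater B is systematically one point higher.
rater_a = [1, 2, 3, 4, 5, 1, 2, 3]
rater_b = [score + 1 for score in rater_a]

print(percent_agreement(rater_a, rater_b))    # 0.0
print(round(pearson_r(rater_a, rater_b), 2))  # 1.0
```

Here the raters never assign the same score, yet the correlation is perfect, because Pearson's r measures how consistently the scores move together rather than whether they match. This is the overestimation case; an underestimate can arise in the opposite situation, for example when nearly all items fall into one category and the few disagreements dominate the correlation.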