Attribute agreement analysis was developed to assess the effects of repeatability and reproducibility on accuracy at the same time. It lets the analyst examine the responses of several appraisers as they look at multiple scenarios multiple times. It produces statistics that measure how well appraisers agree with themselves (repeatability), with each other (reproducibility), and with a master or correct value known for each characteristic (overall accuracy), across repeated evaluations.

Look at Bob's data in Table 1: Bob evaluated each part on repeated trials. For five parts (6, 14, 21, 22 and 26), Bob did not rate the part the same way on every trial. This corresponds to a percent agreement of 25/30 = 83.3%, so Bob rated a part the same way on every trial 83.3% of the time. If we ran this study again, would Bob have the same percent agreement? We do not know, but probably not, because common causes of variation are always present. We can construct a confidence interval around this value to get an idea of the possible variation in Bob's results. You can also determine an overall effectiveness score for all three appraisers.
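The percent agreement and its confidence interval can be sketched in a few lines of Python. This is a minimal illustration, not the calculation used in the original study: the ratings are hypothetical (only the 25-of-30 count comes from the text), three trials per part is an assumption, and the Wilson score interval stands in for whichever interval method the source intended. The function names `agreement_fraction` and `wilson_interval` are made up for this example.

```python
import math


def agreement_fraction(ratings_by_part):
    """Return (agreed, total), where 'agreed' counts parts whose ratings are all identical."""
    agreed = sum(1 for ratings in ratings_by_part if len(set(ratings)) == 1)
    return agreed, len(ratings_by_part)


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (assumed method)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical data: 30 parts, 3 trials each (trial count is an assumption),
# with 5 parts rated inconsistently, matching the 25/30 figure in the text.
ratings = [["P", "P", "P"]] * 25 + [["P", "F", "P"]] * 5

agreed, total = agreement_fraction(ratings)
low, high = wilson_interval(agreed, total)
print(f"{agreed}/{total} = {agreed / total:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

The same `agreement_fraction` helper would also work for an all-appraiser figure if each inner list pooled every appraiser's trials for that part, since it only checks whether all ratings for a part are identical.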

The calculations are the same as before, including the confidence interval. The difference is that you first count the parts on which all three appraisers agreed on every trial, i.e. every appraiser rated the part the same way each time, either pass or fail. As the data in Table 1 show, this happened for 22 parts. On the other 8 parts, the appraisers disagreed on one or more trials. The overall agreement is therefore 22/30 = 73.3%.

We introduced the kappa value in the last newsletter. It can be used to measure an appraiser's agreement with the standard. Kappa can range from -1 to 1. A kappa value of 1 represents perfect agreement between the appraiser and the standard. A kappa value of -1 represents perfect disagreement between the appraiser and the standard.
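To make the kappa scale concrete, here is a minimal sketch that assumes Cohen's kappa for one appraiser against the standard; the source does not say which kappa variant it uses, and the function name `cohen_kappa` and all ratings below are hypothetical. Kappa compares the observed agreement with the agreement expected by chance from each rater's pass/fail proportions.

```python
def cohen_kappa(appraiser, standard):
    """Cohen's kappa between two equal-length lists of categorical ratings."""
    n = len(appraiser)
    categories = set(appraiser) | set(standard)
    # Fraction of parts on which the appraiser matches the standard.
    p_observed = sum(a == s for a, s in zip(appraiser, standard)) / n
    # Agreement expected by chance, from each list's category proportions.
    p_chance = sum(
        (appraiser.count(c) / n) * (standard.count(c) / n) for c in categories
    )
    if p_chance == 1:  # both lists contain one and the same category only
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical standard with an even pass/fail split.
standard = ["P", "F"] * 4
perfect = list(standard)                                   # always matches
opposite = ["F" if s == "P" else "P" for s in standard]    # never matches
coin_like = ["P", "P", "F", "F"] * 2                       # matches half the time

print(cohen_kappa(perfect, standard))
print(cohen_kappa(opposite, standard))
print(cohen_kappa(coin_like, standard))
```

The standard is split evenly on purpose: with two categories, the -1 extreme is reached only when passes and fails are balanced and the appraiser gets every part wrong.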

A kappa value of 0 indicates that the agreement is no better than what would be expected by chance alone. Kappa values close to 1 are therefore desired.

The Each Appraiser vs. Standard Disagreement table breaks down each appraiser's misclassifications (compared with a known reference standard). This table applies only to two-level binary responses (e.g. 0/1, G/NG, Pass/Fail, True/False, Yes/No).

The lower confidence limit is expressed by the equation:

Unlike a continuous measurement system, which can be accurate (on average) yet imprecise, any lack of precision in an attribute measurement system inevitably leads to accuracy problems.
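The appraiser-versus-standard disagreement breakdown described above can be sketched as follows. This is an illustration under stated assumptions: one rating per part per appraiser, binary Pass/Fail labels, and hypothetical data. The function name `disagreement_breakdown` and the labels "false alarm" (a good part rated Fail) and "miss" (a bad part rated Pass) are naming choices for this example, not terms taken from the source.

```python
def disagreement_breakdown(appraiser, standard, good="Pass", bad="Fail"):
    """Count an appraiser's misclassifications against a binary reference standard."""
    pairs = list(zip(appraiser, standard))
    # Good part (per the standard) rejected by the appraiser.
    false_alarms = sum(1 for a, s in pairs if s == good and a == bad)
    # Bad part (per the standard) accepted by the appraiser.
    misses = sum(1 for a, s in pairs if s == bad and a == good)
    n_good = standard.count(good)
    n_bad = standard.count(bad)
    return {
        "false_alarms": false_alarms,
        "misses": misses,
        "false_alarm_rate": false_alarms / n_good if n_good else None,
        "miss_rate": misses / n_bad if n_bad else None,
    }


# Hypothetical data: 6 good parts and 4 bad parts according to the standard.
standard = ["Pass"] * 6 + ["Fail"] * 4
appraiser = ["Pass"] * 5 + ["Fail"] + ["Fail"] * 3 + ["Pass"]

result = disagreement_breakdown(appraiser, standard)
print(result)
```

Separating the two error types matters because they usually carry different costs: a miss passes a defective part along, while a false alarm rejects a good one.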