The first aim of this study was to investigate the intra-rater and inter-rater reproducibility of the assessment of range of motion in three planes for patients with neck pain, using the Cybex EDI-320 inclinometer. For intra-rater reproducibility we compared the first rating with the second rating of each rater, and for inter-rater reproducibility we compared rater A with rater B. Some systematic differences were observed for both the intra-rater and inter-rater agreement; however, these were small relative to the overall active ROM in each plane. Overall, we found good intra-rater and inter-rater reliability statistics (ICCs of 0.86 or higher). As expected, both agreement and reliability were slightly higher for the intra-rater comparisons than for the inter-rater comparisons. High reliability does not necessarily mean that the raters agree in an absolute sense on the active ROM (agreement) [13, 14]. For this reason we included parameters of both agreement and reliability in the present study.
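The distinction between reliability and agreement can be illustrated with a minimal sketch. The ratings below are hypothetical, and the Pearson correlation is used only as a simple stand-in for a relative (reliability-type) statistic; it is not the ICC used in this study. A constant offset of 10 degrees between two raters leaves the relative ordering of patients intact, while the absolute agreement is poor.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference(x, y):
    """Mean of (y - x): the systematic difference between the two raters."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


# Hypothetical active ROM ratings in degrees (not data from this study).
rater_a = [95.0, 110.0, 120.0, 135.0, 150.0]
# Rater B reads exactly 10 degrees higher than rater A for every patient.
rater_b = [a + 10.0 for a in rater_a]

r = pearson_r(rater_a, rater_b)
bias = mean_difference(rater_a, rater_b)
print(f"correlation = {r:.3f}, mean difference = {bias:.1f} degrees")
```

In this constructed example the relative statistic is essentially perfect, yet the two raters never agree to within less than 10 degrees, which is why an agreement parameter is reported alongside a reliability parameter.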
The SDDs, based on intra-rater agreement, for flexion-extension (11.1°) and rotation (13.5°) were almost equal to the cut-off values of our predefined criterion for an acceptable clinical difference (10.1° and 13.9°, respectively). However, for lateral flexion (10.4°) an acceptable clinical difference may be somewhat more difficult to detect, as the SDD was higher than our predefined acceptable difference of 10% (7.2°).
Measurements performed by different raters are likewise insufficiently reproducible to detect the predefined difference of 10% of the used range of the measurement scale. However, this holds for SDDs calculated at the individual level. In research, when groups of patients are used, the EDI-320 is sufficiently reproducible for all measurements of range of motion, because the SDD at group level is obtained by dividing the individual-level SDD by √N, where N is the group size.
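The √N scaling can be sketched in a few lines. The example below uses the intra-rater SDDs and cut-off values quoted above; the function names `group_sdd` and `min_group_size` are illustrative and were not part of the study protocol. The inter-rater SDDs are not repeated in this section and are therefore not included.

```python
import math

# Individual-level intra-rater SDDs and 10% cut-off values (degrees),
# as quoted in the text above.
SDD_INDIVIDUAL = {"flexion-extension": 11.1, "rotation": 13.5, "lateral flexion": 10.4}
CUTOFF = {"flexion-extension": 10.1, "rotation": 13.9, "lateral flexion": 7.2}


def group_sdd(sdd_individual: float, n: int) -> float:
    """Group-level SDD: the individual-level SDD divided by sqrt(N)."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    return sdd_individual / math.sqrt(n)


def min_group_size(sdd_individual: float, cutoff: float) -> int:
    """Smallest N for which SDD / sqrt(N) does not exceed the cut-off."""
    return max(1, math.ceil((sdd_individual / cutoff) ** 2))


for plane, sdd in SDD_INDIVIDUAL.items():
    n_min = min_group_size(sdd, CUTOFF[plane])
    print(f"{plane}: N >= {n_min}, group SDD = {group_sdd(sdd, n_min):.1f} degrees")
```

Under these figures even very small groups bring the group-level SDD below the cut-off in all three planes, which is consistent with the statement that the instrument is sufficiently reproducible for group comparisons.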
To minimize random error, the inter-rater statistics were based on the mean of two ratings, as outlined in our protocol. We investigated whether just one rating per rater instead of two would yield acceptable reproducibility statistics (second aim). Although a duplicate rating did not improve the reproducibility much, the second rating with the EDI-320 can be done easily. Similarly, we evaluated whether reproducibility was affected by the severity of pain. Patients with high pain intensity had, on average, less ROM than patients in the low pain intensity group (p ≤ 0.05). However, reliability and agreement were acceptable in both the low and the high pain intensity groups.
We hypothesized that pain and limitation of movement could either increase or decrease during the course of a series of movements and thus pose sources of systematic variation in the assessment of reproducibility. Comparing the first and second consecutive pairs of ratings (independent of the rater) revealed a small, statistically significant but not clinically relevant difference for flexion-extension (3.4 degrees difference; 95% CI 0.2 to 6.5). We therefore conclude that the effect of repeated movements on cervical ROM was minimal.
In the present study we assessed intra-rater reproducibility by comparing two consecutive ratings with a minimal time interval, and inter-rater reproducibility with an interval of approximately 10 minutes. The main reason for choosing a 10-minute interval was a practical one: we could measure a patient in a single visit. Our assumption was that within 10 minutes the patients would be stable in pain perception and range of motion. Had we chosen a larger time interval, however, our results might have been different. Ideally, true intra-rater variability is evaluated for a disorder that is stable within the time frame evaluated. However, we consider a large time interval undesirable for the assessment of measurement variation because of the biological variation within subjects over time [6, 23].
More than half of all studies on the reproducibility of cervical ROM have inappropriately used t-tests or repeated measures ANOVA, which are not considered true reliability statistics. The ICC is used in only a few studies. ICC values are known to depend on the variation in the study population. As can be seen from the visual representation of agreement (Figure 2), the active ROM values for lateral flexion are somewhat more clustered together (a smaller range) than those for the other two planes. The more homogeneous values might partly explain the somewhat lower ICCs for lateral flexion, and the wider range of values might explain the higher ICCs for rotation. Likewise, the larger variation in active ROM values in the high pain intensity group might explain its higher ICCs compared with the low pain intensity group.
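The dependence of the ICC on population heterogeneity can be made concrete with a minimal sketch. It assumes the simplified variance-ratio form of the ICC (between-subject variance divided by between-subject plus error variance), which is not necessarily the ICC model used in this study, and all standard deviations below are hypothetical.

```python
def icc_from_sds(sd_between: float, sd_error: float) -> float:
    """Simplified ICC: between-subject variance over total variance."""
    var_between = sd_between ** 2
    var_error = sd_error ** 2
    total = var_between + var_error
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Hypothetical values in degrees: identical measurement error in both
# populations, but a different spread of true ROM between subjects.
SD_ERROR = 5.0
icc_homogeneous = icc_from_sds(sd_between=8.0, sd_error=SD_ERROR)
icc_heterogeneous = icc_from_sds(sd_between=20.0, sd_error=SD_ERROR)

print(f"homogeneous population:   ICC = {icc_homogeneous:.2f}")
print(f"heterogeneous population: ICC = {icc_heterogeneous:.2f}")
```

With the measurement error held constant, the more heterogeneous population yields the higher ICC. Absolute agreement is unchanged in this sketch, because it depends only on the error term.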
Studies that measure ROM in patients with neck disorders are scarce. A systematic review found that only 6 studies assessed reliability in patients with cervical disorders, and of these only 2 had more than 30 subjects. Two studies reported on the reproducibility of the EDI-320 for cervical ROM in healthy subjects [10, 11]. The first reported acceptable agreement and found that more than 90% of the successive ratings for cervical flexion and lateral flexion by two raters fell within a range of 0–10 degrees. The other study investigated only flexion and extension, and reported moderate to high intra-rater reliability (flexion ICC 0.77; extension ICC 0.79–0.83) and somewhat lower inter-rater reliability (flexion ICC 0.66–0.73; extension ICC 0.66–0.80). The authors of that study reported that the reliability could be improved by using a standardized protocol. Comparison of ICC values between studies is hampered by the dependency of ICC values on the variability of range of motion values in the population under study. De Winter et al. showed that, for measurements of range of motion in 155 patients with shoulder complaints, the ICC was high for the affected shoulder (ICC = 0.83) and low for the non-affected shoulder (ICC = 0.28). This difference was entirely due to the variability in range of motion, which was large for the affected shoulder and small for the non-affected shoulder.
The Cervical Range of Motion device (CROM device) is the most frequently reported instrument for measuring cervical ROM, and variable ICC values have been reported for it, both alone and when compared with other ROM instruments [8, 25, 26]. One study of patients with cervical spine disorders reported inter-rater ICCs for active ROM greater than 0.80 with the CROM device, compared with ICCs lower than 0.80 for visual estimation and a universal goniometer (Youdas et al. 1991). Considering the results of this study, it would be interesting to compare the CROM device directly with the EDI-320 inclinometer in a future study.
Our population consisted of patients with non-specific neck pain; readers can compare the patient profile presented in this article with that of their own patients. The measurement procedure is quick and simple, which we hope will facilitate replication of our reproducibility design in other clinical settings.