Patient characteristics
Consecutive patients with neck pain, referred by local general practitioners for physical therapy in Zoetermeer, the Netherlands, were invited to participate. The selection criteria were: age between 18 and 70 years, pain and/or stiffness in the neck for at least 2 weeks, and written informed consent. Patients were excluded if they had undergone surgery in the cervical region or had evidence of specific pathology, such as malignancy, neurological disease, fracture, herniated disc or systemic rheumatic disease. Data on demographics (e.g. age and gender), clinical factors (duration, concomitant complaints), neck pain on a numerical rating scale ranging from 0 (no pain) to 10 (maximal pain), and disability assessed with the Neck Disability Index (NDI) [12] were collected by an independent research assistant prior to the actual active ROM measurements.
Rater characteristics
The raters were two physical therapists with 3 months' experience using the EDI-320 inclinometer (Lumex, Inc., Ronkonkoma, New York) [9]; both performed weekly cervical ROM assessments in another study. The measurement procedures were practiced on 5 healthy volunteers prior to the start of the present study.
Measurement protocol
For the measurements of cervical flexion-extension and lateral flexion the patient was seated upright in a high chair, with the hands resting on the upper thighs. For the measurement of cervical flexion-extension, the position of 0 degrees was in maximal cervical flexion ("chin to chest"), followed by maximal cervical extension. Likewise, the measurements of lateral flexion were initiated with the position of 0 degrees in maximal lateral flexion to the left ("ear to left shoulder"), followed by maximal lateral flexion to the right. Because active ROM using the EDI-320 inclinometer can only be measured against gravity, the ratings of cervical rotation were performed with the patient in a supine position. The position of 0 degrees was in maximal left rotation, followed by maximal right rotation. During rotation the head slid over a cushioned treatment table and the patient was not allowed to make any compensatory lateral flexion with the head (see Figure 1).
We chose full-cycle ROM (for example, from left to right rotation) because the neutral head position is difficult to reproduce in half-cycle ROM assessments (for example, from neutral to left rotation) of the cervical spine [7]. The reference point for the EDI-320 was on the forehead for both flexion-extension and rotation, and just above the ear for lateral flexion. Throughout the motion, the physical therapist kept the EDI-320 in contact with the reference point on the head.
The subjects were instructed in the movement and practiced it twice before the actual measurement. The patient was instructed to move only the head, and to avoid compensatory movements in the thoracic or lumbar region. The patient was gently guided through the whole range of motion, with manual contact applied by the rater. The patient was encouraged to perform a maximal movement until the end of the active ROM was reached, or until pain prevented the patient from going any further.
Procedure reproducibility study
Active ROM of the cervical spine was assessed twice in three planes in the following order: maximal flexion to maximal extension (2×), maximal lateral flexion from left to right (2×), and maximal rotation from left to right (2×). The time interval between the first and second ratings of a single rater was 5 minutes, and the interval between raters was 10 minutes. The order of the raters was randomized using a computer-generated random sequence table. At all times only one rater was present in the examination room, together with the research assistant. The research assistant recorded the number of degrees displayed electronically on the EDI-320. To keep the raters blind to the outcome of the measurement, the readout on the electronic display of the EDI-320 was concealed from both raters and patients. Thus, the raters were unaware of the previous measurements by the other rater.
Data analysis
We used two different measures that are increasingly used in reproducibility studies: one to assess agreement and one to assess reliability [13, 14]. Figure 2 shows an overview of the intra-rater and inter-rater comparisons we made.
Agreement parameters
Parameters of agreement measure the ability to achieve the same value in two measurements and give an indication of the size of the measurement errors. We assessed the 95% limits of agreement (LoA) according to Bland and Altman as a measure of agreement [15].
The mean difference between the scores of both raters was calculated, representing the systematic difference (bias) between the measurements. The standard deviation (SD) of these differences reflected the extent to which the raters recorded the same value in each plane. The 95% limits of agreement (LoA) were then calculated as the mean difference ± 1.96 × SD, indicating the 'total error', i.e. systematic and random error together [16].
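For illustration, the limits of agreement for a single movement plane can be computed as in the following sketch. This is not the analysis software used in the study; the data values and variable names are hypothetical and serve only to show the calculation.

```python
import numpy as np

# Hypothetical full-cycle ROM values (degrees) for the same patients,
# rated once by rater A and once by rater B (illustrative data only).
rom_rater_a = np.array([112.0, 98.0, 105.0, 120.0, 88.0, 101.0])
rom_rater_b = np.array([108.0, 101.0, 102.0, 118.0, 92.0, 97.0])

differences = rom_rater_a - rom_rater_b
mean_diff = differences.mean()        # systematic difference (bias)
sd_diff = differences.std(ddof=1)     # spread of the differences

# 95% limits of agreement: bias +/- 1.96 * SD of the differences
loa_lower = mean_diff - 1.96 * sd_diff
loa_upper = mean_diff + 1.96 * sd_diff
print(f"bias = {mean_diff:.1f} deg, 95% LoA = [{loa_lower:.1f}, {loa_upper:.1f}] deg")
```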
As no clear criteria exist for acceptable values of intra-rater and inter-rater agreement for active ROM outcome measures, we defined a priori that a difference between the raters of 10% of the total range of measurement values would be acceptable.
The Bland and Altman method can be visualized by plotting the differences between the first and the second ratings against the corresponding mean of the first and the second rating. This visual representation of agreement illustrates the magnitude and range of the differences, bias or outliers, and the relation between the magnitude of the differences and the magnitude of the mean values [15].
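A Bland and Altman plot of this kind can be produced along the following lines (a minimal, self-contained sketch with the same hypothetical data as above, not the figures generated in the study):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paired ratings (degrees) for one movement plane.
rom_rater_a = np.array([112.0, 98.0, 105.0, 120.0, 88.0, 101.0])
rom_rater_b = np.array([108.0, 101.0, 102.0, 118.0, 92.0, 97.0])

means = (rom_rater_a + rom_rater_b) / 2.0      # x-axis: mean of the two ratings
differences = rom_rater_a - rom_rater_b        # y-axis: difference between ratings
mean_diff = differences.mean()
sd_diff = differences.std(ddof=1)

plt.scatter(means, differences)
plt.axhline(mean_diff, linestyle="-", label="bias")
plt.axhline(mean_diff - 1.96 * sd_diff, linestyle="--", label="95% LoA")
plt.axhline(mean_diff + 1.96 * sd_diff, linestyle="--")
plt.xlabel("Mean of the two ratings (degrees)")
plt.ylabel("Difference between ratings (degrees)")
plt.legend()
plt.show()
```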
Based on the agreement results of rater A, the smallest detectable difference (SDD) at the individual level was calculated for each movement by multiplying the SD of the differences by 1.96 (SDD = 1.96 × SDchange). The SDD represents the change that can be detected with the EDI-320 beyond measurement error [17, 18].
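In code, this step amounts to the following sketch (again with hypothetical test and retest values for rater A, not the study data):

```python
import numpy as np

# Hypothetical first and second ratings (degrees) by rater A for one movement.
rater_a_first = np.array([112.0, 98.0, 105.0, 120.0, 88.0, 101.0])
rater_a_second = np.array([110.0, 100.0, 104.0, 117.0, 90.0, 99.0])

sd_change = (rater_a_first - rater_a_second).std(ddof=1)
sdd = 1.96 * sd_change   # change detectable beyond measurement error in an individual
print(f"SDD = {sdd:.1f} degrees")
```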
Reliability parameters
Reliability parameters reflect the extent to which a measurement instrument can differentiate between patients [19]. If persons differ considerably, it is easier to distinguish them from each other despite some measurement error; reliability thus relates the measurement error to the differences between persons.
As a parameter of reliability the Intraclass Correlation Coefficient (ICC) was used (Figure 2) [16] (Streiner and Norman, 2003). We used ICCs that take systematic differences between the measurements into account. These ICCs are defined as the ratio of the variance among patients (patient variability) to the total variance (variance among patients, variance among raters plus the error variance), and range between 0 (no reliability) and 1 (perfect reliability). A cut-off point of ICC > 0.75 was chosen a priori as an indication of acceptable reliability [20]. We used SPSS 9.0 statistical software (SPSS Inc., Chicago, Illinois) to calculate the ICCs [21]. In cases where the unit of analysis was the mean of two ratings by one rater, the variances in which the raters were involved were divided by a factor of 2 [21].
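An ICC of this type (two-way random effects, absolute agreement, single rating, i.e. ICC(2,1) in the Shrout and Fleiss terminology) can be computed from first principles as in the sketch below. This is our own illustration of the formula, not the SPSS routine used in the study, and the data are hypothetical.

```python
import numpy as np

def icc_two_way_random(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    `scores` is an (n_patients, n_raters) array of ROM values in degrees.
    """
    n, k = scores.shape
    grand_mean = scores.mean()
    row_means = scores.mean(axis=1)   # per-patient means
    col_means = scores.mean(axis=0)   # per-rater means

    ss_rows = k * ((row_means - grand_mean) ** 2).sum()   # between patients
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()   # between raters
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Systematic rater differences enter the denominator, lowering the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical inter-rater data: rows are patients, columns are raters A and B.
scores = np.array([[112.0, 108.0], [98.0, 101.0], [105.0, 102.0],
                   [120.0, 118.0], [88.0, 92.0], [101.0, 97.0]])
print(f"ICC(2,1) = {icc_two_way_random(scores):.2f}")
```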