Inter-observer reproducibility of measurements of range of motion in patients with shoulder pain using a digital inclinometer
© de Winter et al; licensee BioMed Central Ltd. 2004
Received: 27 October 2003
Accepted: 14 June 2004
Published: 14 June 2004
Reproducible measurements of the range of motion are an important prerequisite for the interpretation of study results. The digital inclinometer is considered to be a useful instrument because it is inexpensive and easy to use. No previous study assessed inter-observer reproducibility of range of motion measurements with a digital inclinometer by physical therapists in a large sample of patients.
Two physical therapists independently measured the passive range of motion of the glenohumeral abduction and the external rotation in 155 patients with shoulder pain. Agreement was quantified by calculation of the mean differences between the observers and the standard deviation (SD) of this difference and the limits of agreement, defined as the mean difference ± 1.96*SD of this difference. Reliability was quantified by means of the intraclass correlation coefficient (ICC).
The limits of agreement were 0.8 ± 19.6 for glenohumeral abduction and -4.6 ± 18.8 for external rotation (affected side), and were quite similar for the contralateral side and for the differences between sides. The percentage agreement within 10° for these measurements was 72% and 70%, respectively. The ICC ranged from 0.28 to 0.90 (0.83 and 0.90 for the affected side).
The inter-observer agreement was found to be poor. If individual patients are assessed by two different observers, differences in range of motion of less than 20–25 degrees cannot be distinguished from measurement error. In contrast, acceptable reliability was found for the inclinometric measurements of the affected side and the differences between the sides, indicating that the inclinometer can be used in studies in which groups are compared.
Measurement of the range of motion of the shoulder joint plays a vital role in the understanding of the nature and the expected course of shoulder pain, as well as in the evaluation of treatment effects. Systematic reviews evaluating the efficacy of medication, steroid injection or physical therapy for shoulder disorders show that in most randomised clinical trials a measurement of range of motion was included [1–4]. The degree of external rotation and glenohumeral abduction is relevant for the evaluation of treatment effects, especially in patients with adhesive capsulitis.
Reproducible measurements of the range of motion are an important prerequisite for the interpretation of study results. Visual inspection, goniometric measurements, inclinometry and high-speed cinematography are examples of methods that have been used to quantify the range of motion. For this purpose the digital inclinometer is considered to be a useful instrument because it is inexpensive and easy to use. A few studies have assessed the reproducibility of inclinometric assessment of the range of motion of the shoulder joint [6–10]. The first study showed that two trained physical therapists could obtain reproducible measurements for the assessment of external rotation and glenohumeral abduction of the shoulder joint. However, only a small sample of healthy subjects was included. Three later studies that included patients reported poor reproducibility of range of shoulder motion [7, 9, 10]. One of these studies, however, was conducted in a very specific group of patients with reflex sympathetic dystrophy. In addition, in all these studies the measurements were done by physicians (e.g. rheumatologists or surgeons) from a single practice. Hoving et al. showed that physical therapists achieved higher reliability than rheumatologists, especially for external rotation, but the physical therapists from the study that Hoving referred to assessed only 6 patients [8, 9].
Therefore, the purpose of our study was to evaluate the inter-observer reproducibility of the external rotation and glenohumeral abduction measurements by physical therapists in a large sample of patients from many different practices and with different degrees of shoulder pain, using the Cybex Electronic Digital Inclinometer-320 (EDI 320).
Within the framework of a study on inter-observer agreement on the diagnosis of shoulder disorders, which involved history taking and physical examination, an evaluation was made of the inter-observer reproducibility of external rotation and glenohumeral abduction measurements by physical therapists, using the EDI 320 inclinometer. During a 20-month period, consecutive patients with shoulder complaints who consulted one of the 20 participating general practitioners, one of the 2 participating physicians in an orthopaedic practice, or one of the 20 participating rheumatologists in a secondary care rheumatology clinic, were considered for participation in the study. Patients were eligible for participation if they met the following inclusion criteria: aged between 18 and 75 years, ability to co-operate (no dementia, sufficient knowledge of the Dutch language) and informed consent given. Patients with shoulder problems due to neurological, vascular or internal disorders, systemic rheumatic diseases, prior dislocations or fractures were excluded. The study was approved by the local institutional review board of the VU University Medical Center.
Two observers (MPJ & AFW), both experienced physical therapists, independently measured the range of motion of the shoulder joint using the Cybex Electronic Digital Inclinometer-320 (EDI 320) (Cybex Inc, Ronkonkoma, NY). This device is gravity dependent and indicates range of motion on a 360° scale. The EDI 320 consists of a hand-held unit and portable display unit with an integral rechargeable power source. The EDI 320 records gross movement and then calculates the differential range of motion by subtracting the initial position reading from the final position reading. The EDI 320 can be used to measure single joint motions of the elbow, forearm, wrist, thumb, fingers, shoulder, scapula, hip, knee and ankle, and combined motions of the spine and shoulder.
Each observer measured both shoulders of each patient once. Passive glenohumeral abduction was measured first, followed by measurement of the passive external rotation. Within one hour the second observer repeated the measurements of the first observer. In order to prevent the occurrence of systematic differences between the observers, due to repeated testing, the sequence of the observers was randomly allocated. The patients did not receive any therapy between the two measurements.
Prior to the study, the performance of all measurements was standardised, to make sure that the physiotherapists assessed the patients in the same way. For the measurement of passive glenohumeral abduction, the patient was seated upright, and the position of 0° was defined as the upper arm in a neutral position. While palpating the lower angle of the scapula with the thumb, the examiner elevated the upper arm of the patient until the scapula began to rotate or pain limited further motion. This range of motion was recorded in degrees.
For the measurement of passive external rotation, the patient was in a supine position, with the shoulder in 0° of abduction and rotation, the elbow flexed at 90° and the forearm in a neutral position. This position was defined as the position of 0°. The observer then performed external rotation until pain limited the range of motion or the extreme of the range was reached. This range of motion was recorded in degrees.
Prior to the measurements, demographic characteristics (age, gender) and clinical characteristics (e.g. previous episodes of shoulder problems, duration of complaints, sleep disturbances) of the patients were recorded by means of a structured questionnaire. In addition, all patients recorded the severity of pain during the day and at night in the preceding week on a 100 mm visual analogue scale (VAS), ranging from 0 'no pain' to 100 'very severe pain'.
Assessment of reproducibility
The reproducibility of the measurements of the affected side and the contralateral side, and the difference in range of motion between the sides was calculated. The difference between the sides was quantified by subtracting the results of the affected shoulder from those of the contralateral shoulder. The difference between the sides is an important outcome, since in clinical practice a conclusion on abnormal range of motion of the affected shoulder is usually drawn after comparison of the measurements of the affected shoulder with those of the contralateral shoulder. In this manner, differences in mobility between subjects due to age, gender or other factors can be taken into account.
For the quantification of reproducibility, we distinguished two different types of measures of reproducibility with different interpretations: measures of agreement and measures of reliability. Measures of agreement refer to the absolute measurement error (presented in the units of measurement of the instrument) that is associated with one measurement taken from one individual patient. Measures of agreement provide insight into the ability of two or more observers to achieve the same value. Measures of reliability refer to the relative measurement error, i.e. the variation between patients in relation to the total variance of the measurements (see below). They provide information on the ability of two or more observers to differentiate between subjects in a group [13, 14].
The inter-observer agreement was quantified by calculating the mean difference between the two observers (A-B) and the standard deviation (SD) of this difference. Subsequently, the 95% limits of agreement were calculated according to the method of Bland & Altman, defined as the mean difference between the observers ± 1.96*SD of this difference. These limits represent the range in which 95% of the differences between the two observers fall. If the values of observer A were subtracted from those of observer B (B-A instead of A-B), the width of the limits of agreement would stay the same, but the signs (+ / -) of the mean difference and of the upper and lower limits of agreement would be reversed. In this situation, the choice between B-A and A-B is arbitrary. Therefore, the signs are irrelevant and should be ignored when interpreting the results. For the interpretation of the measurement error, the largest limit of agreement in absolute value (either upper or lower limit) is most relevant.
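The calculation described in this paragraph can be illustrated with a minimal Python sketch. The function name `limits_of_agreement` and the readings are hypothetical and are not data from this study; the use of the sample SD (n - 1 denominator) is an assumption, since the text does not state which SD was used.

```python
from statistics import mean, stdev


def limits_of_agreement(obs_a, obs_b):
    """Return (mean difference, SD of differences, lower limit, upper limit).

    Differences are taken as A-B, following the convention in the text.
    """
    if len(obs_a) != len(obs_b) or len(obs_a) < 2:
        raise ValueError("need two equally long series with at least 2 patients")
    diffs = [a - b for a, b in zip(obs_a, obs_b)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # assumption: sample SD (n - 1 denominator)
    half_width = 1.96 * sd_diff
    return mean_diff, sd_diff, mean_diff - half_width, mean_diff + half_width


# Hypothetical readings in degrees, one value per patient and observer.
observer_a = [80.0, 65.0, 90.0, 72.0, 55.0]
observer_b = [76.0, 70.0, 85.0, 75.0, 60.0]

mean_diff, sd_diff, lower, upper = limits_of_agreement(observer_a, observer_b)
print(f"mean difference {mean_diff:.1f}, limits of agreement {lower:.1f} to {upper:.1f}")
```

Calling the function with the observers swapped reverses the signs of the mean difference and of both limits but leaves the distance between the limits unchanged, which is why only the largest absolute limit matters for the interpretation.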
Furthermore, plots of the differences between observers against the corresponding mean of the two observers for each patient were constructed to examine homoscedasticity, as proposed by Bland and Altman. In addition, the frequency of agreement of the observers within 5° and 10° was calculated. Although no clear criteria for the acceptable degree of inter-observer agreement are available, based on our clinical experience, we decided prior to the study that differences exceeding 10° would be considered unacceptable because they are likely to affect decisions on patient management.
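The frequency of agreement within a given tolerance can be sketched in a few lines of Python. The function name `percent_within` and the readings are hypothetical; counting a difference of exactly 5° or 10° as agreement is an assumption based on the wording "differences exceeding 10°".

```python
def percent_within(obs_a, obs_b, tolerance_deg):
    """Percentage of patients whose two readings differ by at most tolerance_deg."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("need two equally long, non-empty series")
    # Assumption: the boundary value itself counts as agreement.
    hits = sum(1 for a, b in zip(obs_a, obs_b) if abs(a - b) <= tolerance_deg)
    return 100.0 * hits / len(obs_a)


# Hypothetical readings in degrees, not data from the study.
observer_a = [80.0, 65.0, 90.0, 72.0, 55.0]
observer_b = [76.0, 77.0, 85.0, 75.0, 49.0]

within_5 = percent_within(observer_a, observer_b, 5.0)
within_10 = percent_within(observer_a, observer_b, 10.0)
print(f"within 5 degrees: {within_5:.0f}%, within 10 degrees: {within_10:.0f}%")
```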
The intra-class correlation coefficient (ICC) was derived from a random-effects two-way analysis of variance. By means of analysis of variance the variation in measurements is partitioned into the potential sources of variation: observer differences, patient differences and random error. The ICC is defined as the ratio of the variance between patients over the total variance. The values of the ICC can theoretically range from 0 to 1, with a higher value indicating that less variance is due to other factors such as differences between observers. An intraclass correlation coefficient of at least 0.70 is considered to be satisfactory for group comparisons, and a value of 0.90–0.95 for individual comparisons.
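A minimal Python sketch of this variance partitioning is given below. It assumes that the total variance includes the patient, observer and error components, which corresponds to the ICC for absolute agreement often labelled ICC(2,1); the text does not state explicitly which ICC form was used. The function name `icc_agreement` and the readings are hypothetical.

```python
def icc_agreement(ratings):
    """ICC from a two-way ANOVA without replication.

    ratings: one list per patient, holding one reading per observer.
    Assumption: total variance = patient + observer + error variance.
    """
    n = len(ratings)            # number of patients
    k = len(ratings[0])         # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 observers, no missing values")

    grand = sum(sum(row) for row in ratings) / (n * k)
    patient_means = [sum(row) / k for row in ratings]
    observer_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for patients, observers and the residual.
    ss_patients = k * sum((m - grand) ** 2 for m in patient_means)
    ss_observers = n * sum((m - grand) ** 2 for m in observer_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_patients - ss_observers

    ms_patients = ss_patients / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Variance components estimated from the mean squares.
    var_patients = (ms_patients - ms_error) / k
    var_observers = (ms_observers - ms_error) / n
    var_error = ms_error
    total = var_patients + var_observers + var_error
    if total == 0:
        raise ValueError("no variation in the data; ICC is undefined")
    return var_patients / total


# Hypothetical readings in degrees: [observer A, observer B] per patient.
identical = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]
offset = [[10.0, 15.0], [20.0, 25.0], [30.0, 35.0]]
print(icc_agreement(identical), icc_agreement(offset))
```

In the second hypothetical data set observer B reads 5° higher for every patient. The ranking of patients is preserved, but the systematic difference enters the total variance and lowers the ICC under the assumption stated above.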
Table 1. Main characteristics of the participants (n = 155) and the non-participants (n = 46)
Patients recruited by (%):
- general practitioner
- physician in orthopaedic practice
- rheumatologist in secondary care rheumatology clinic
Mean age in years (SD)
Previous episode(s) of shoulder pain (%)
Duration of current episode (%):
- < 3 months
- 3 – 6 months
- 6 – 12 months
- > 12 months
Mean pain score* (SD):
- at night
- during the day
Mean, standard deviation (SD) and range of the glenohumeral abduction and the external rotation according to observer A and B, followed by the mean differences between both observers, and the frequency of agreement within 5 and 10 degrees.
Columns: Observer A (in degrees); Observer B (in degrees); Observer A-B (in degrees); upper and lower limit of agreement; agreement (%) within 5° and 10°.
Lower and upper limit of agreement (in degrees):
Glenohumeral abduction
- affected side: -18.8 to 20.4
- contralateral side: -17.9 to 19.7
- contralateral – affected: -19.6 to 19.8
External rotation
- affected side: -23.4 to 14.2
- contralateral side: -25.2 to 12.0
- contralateral – affected: -21.7 to 17.7
Since the pain level of the non-participants was higher than that of the participants (Table 1), patients with a high pain intensity (pain score on the VAS during the day > 65; n = 54) were compared with patients with moderate pain intensity (pain score on the VAS during the day ≤ 65; n = 101). The inter-observer agreement was not different between these patient groups (data not shown).
Results of the analysis of variance
Source of variation, reported per measurement:
Glenohumeral abduction
- affected side
- contralateral side
- contralateral – affected
External rotation
- affected side
- contralateral side
- contralateral – affected
This study investigated the inter-observer reproducibility of the assessment of the passive range of motion of the glenohumeral abduction and the external rotation of the shoulder joint, using the EDI 320 digital inclinometer. A large number of patients from different clinics with different levels of mobility and varying severity of shoulder pain were examined. We chose to measure passive rather than active range of motion because according to diagnostic guidelines the degree of passive external rotation and glenohumeral abduction is important for the evaluation of adhesive capsulitis.
The results showed that there was considerable variation in measurement between the observers across the whole range of values of the tested movements. In a maximum of 75% of the various measurements, the differences between observers did not exceed 10°. Although it is a matter of clinical judgement, with which other clinicians might not agree, it was decided that differences between observers which exceed 10° are not acceptable for clinical purposes. The limits of agreement show that if patients who are considered to be stable are assessed by two different observers, the differences in the measured range of motion between the observers can be as large as 20–25 degrees (referring to the largest of the upper and lower limit of agreement). This means that if patients are assessed, e.g. before and after therapy, by two different observers, changes in range of motion of less than 20–25 degrees cannot be distinguished from measurement error.
In the present study, inclinometric measurements could often not be performed at all because of the high severity of shoulder pain, resulting in a large number of non-participants (n = 46). However, one could argue that if patients are not able to perform this kind of test because of their pain, there is no need to measure the range of motion anyway. In addition, in our study population no association was found between the level of pain and the inter-observer differences. We believe that our study provides a reasonably valid estimate of the reproducibility of inclinometric measurements of patients with shoulder pain, based on one measurement of each range of motion.
Contrary to those of the glenohumeral abduction, the measurements of the external rotation showed systematic differences between the observers, which is consistent with the findings of Croft et al. Although several factors might contribute to the systematic differences, differences in defining the limits of motion might explain the results. For glenohumeral abduction the limits of motion are determined by rotation of the scapula, whereas pain and reaching the extreme of the range of motion are the criteria for the limits of motion of external rotation. It was suggested that the amount of passive force applied is one of the reasons why passive movements are more difficult to reproduce than active movements. However, Tousignant et al. also found systematic differences in their study on reliability of the EDI-320 for measurement of active neck flexion and extension.
In contrast to the poor level of agreement, the reliability of most inclinometric measurements was acceptable for group comparisons (ICC above 0.70), but not for individual comparisons (which require an ICC of 0.90–0.95). These findings are in accordance with the findings of Green et al., who also reported acceptable reliability for the measurement of glenohumeral abduction and external rotation by physical therapists using an inclinometer. Tousignant et al. found comparable ICCs for measurement of active neck flexion and extension with the EDI 320.
Several other studies have evaluated observer variation in the measurement of range of motion of the shoulder joint based on various other measurement methods [18–30]. The results vary considerably across studies and estimates are difficult to compare due to differences in patient groups, raters, and measurement methods.
In general, it seems that most methods are reliable enough to use for group comparisons, but not for individual comparisons (ICCs between 0.70 and 0.90). This means that most instruments can be used in studies. Several authors suggested that visual estimation may be as reliable as measurement instruments, such as an inclinometer or a goniometer [7, 21, 24]. In our patient group reliability obtained with inclinometer measurements was higher than reliability obtained with visual estimation in a previous study on the same subjects (ICC for abduction was 0.83 compared with 0.71; ICC for external rotation was 0.90 compared with 0.78 for the affected side, data submitted for publication).
As in our study, most other studies that presented data on agreement found large measurement errors, especially for the assessment of external rotation. For example, large standard errors of measurement were found in the studies of Geertzen et al. (approximately 25°) and Triffitt et al. (approximately 25–30°).
Poor inter-observer agreement combined with acceptable reliability of measurements may seem to be a puzzling result. ICCs, however, are strongly influenced by the heterogeneity of the population studied. In a patient group with large differences between patients, it is easier to distinguish between patients than in a patient group with small differences between patients. Therefore, it is possible that an instrument is able to discriminate adequately between groups of patients despite a large measurement error in a heterogeneous patient population. For the measurement of individual patients in clinical practice, or to assess intra-individual changes in range of motion over time, the measurement error of the observers, using the inclinometer, or most other instruments, seems to be too large. For the purpose of comparing groups in studies, the inclinometer seems to be a useful instrument, and is probably better than visual estimation. Finally, reproducibility is a function of the instrument that is used, the measurement conditions, the movements tested, the observers and the study population. Which method of assessment of range of motion is preferable should therefore be evaluated within one single study.
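The dependence of the ICC on the heterogeneity of the population can be made concrete with a small Python sketch. It uses the simplified relation ICC = between-patient variance / (between-patient variance + error variance) and ignores systematic observer differences, which is a simplifying assumption. The function name `icc_from_sds` and all numbers are hypothetical and are not estimates from this study.

```python
def icc_from_sds(sd_between_patients, sd_error):
    """ICC under a simplified model with only patient and error variance."""
    var_between = sd_between_patients ** 2
    var_error = sd_error ** 2
    if var_between + var_error == 0:
        raise ValueError("both standard deviations are zero; ICC is undefined")
    return var_between / (var_between + var_error)


# Same hypothetical measurement error (7 degrees) in both populations.
homogeneous = icc_from_sds(sd_between_patients=7.0, sd_error=7.0)
heterogeneous = icc_from_sds(sd_between_patients=21.0, sd_error=7.0)
print(f"homogeneous group: {homogeneous:.2f}, heterogeneous group: {heterogeneous:.2f}")
```

With an identical measurement error, the hypothetical heterogeneous group yields a much higher ICC than the homogeneous group, which illustrates how acceptable reliability and poor agreement can coexist.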
Investigators should quantify the reproducibility of their assessments before commencing a clinical trial, since the level of reproducibility has considerable impact on the power of a clinical trial. In general, intra-observer reproducibility is better than inter-observer reproducibility [18, 19, 25, 29], so it is recommended that in clinical trials the same observer should be responsible for the measurement of treatment outcome for each patient. Reproducibility of measurements may also be improved by using the mean value of multiple measurements. Further psychometric studies should examine the validity of the EDI 320.
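The expected gain from averaging can be sketched as follows, under the strong assumption that the errors of repeated measurements are independent and have the same SD. In practice, repeated measurements by the same observer on the same patient are likely to have correlated errors, so the actual gain may be smaller. The function name `error_sd_of_mean` and the value of 8° are hypothetical.

```python
import math


def error_sd_of_mean(error_sd_deg, n_measurements):
    """Error SD of the mean of n independent measurements with equal error SD."""
    if n_measurements < 1:
        raise ValueError("need at least one measurement")
    return error_sd_deg / math.sqrt(n_measurements)


single_error_sd = 8.0  # hypothetical error SD of one measurement, in degrees
for n in (1, 2, 4):
    print(n, round(error_sd_of_mean(single_error_sd, n), 2))
```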
In conclusion, the inter-observer agreement was found to be poor. If patients are assessed by two different observers, differences in range of motion of less than 20–25 degrees cannot be distinguished from measurement error. In contrast, acceptable reliability was found for the inclinometric measurements of the affected side and the differences between the sides, indicating that the inclinometer can be used in studies.
Since the measurements were already standardised and the observers trained prior to the study, the best way to reduce variation in measurements would seem to be to use the mean value of multiple measurements at each time point, preferably done by the same observer.
This study was supported by a grant from the Foundation "Stichting Anna Fonds".
- van der Windt DAWM, van der Heijden GJMG, Scholten RJPM, Koes BW, Bouter LM: The efficacy of Non-Steroidal Anti-Inflammatory Drugs (NSAIDs) for shoulder complaints. A systematic review. J Clin Epidemiol. 1995, 48: 691-704. 10.1016/0895-4356(94)00170-U.
- van der Heijden GJMG, van der Windt DAWM, Kleijnen J, Koes BW, Bouter LM, Knipschild PG: Steroid injections for shoulder disorders: a systematic review of randomised clinical trials. Br J Gen Pract. 1996, 46: 309-316.
- van der Heijden GJMG, van der Windt DAWM, de Winter AF: Physical therapy for patients with soft tissue shoulder disorders: a systematic review of randomised clinical trials. BMJ. 1997, 315: 25-30.
- Green S, Buchbinder R, Glazier R, Forbes A: Systematic review of randomised controlled trials of interventions for painful shoulder: selection criteria, outcome assessment, and efficacy. BMJ. 1998, 316: 354-360.
- Chiarello CM, Savidge R: Interrater reliability of the Cybex EDI-320 and fluid goniometer in normals and patients with low back pain. Arch Phys Med Rehabil. 1993, 74: 32-37.
- Heemskerk MAMB, van Aarst M, van der Windt DAWM: De reproduceerbaarheid van het meten van de passieve beweeglijkheid van de schouder met de EDI-320 digitale hoekmeter. Ned T Fysiother. 1997, 107: 146-149.
- Geertzen JHB, Dijkstra PU, Stuwart RE, Groothoof JW, ten Duis HJ, Eisma WH: Variation of measurements of range of motion: A study in reflex sympathetic dystrophy patients. Clin Rehabil. 1998, 12: 254-264. 10.1191/026921598675343181.
- Green S, Buchbinder R, Forbes A, Bellamy N: A standardized protocol for measurement of range of movement of the shoulder using the Plurimeter-V inclinometer and assessment of its intrarater and interrater reliability. Arthritis Care Res. 1998, 11: 43-52.
- Hoving JL, Buchbinder R, Green S, Forbes A, Bellamy N, Brand C, Buchanan R, Hall S, Patrick M, Ryan P, Stockman A: How reliably do rheumatologists measure shoulder movement?. Ann Rheum Dis. 2002, 61: 612-616. 10.1136/ard.61.7.612.
- Triffitt PD, Wildin C, Hajioff D: The reproducibility of measurement of shoulder movement. Acta Orthop Scand. 1999, 70: 322-324.
- de Winter AF, Jans MP, Scholten RJPM, Devillé W, van Schaardenburg D, Bouter LM: Diagnostic classification of shoulder disorders: Inter-observer agreement and determinants of disagreement. Ann Rheum Dis. 1999, 58: 272-277.
- van Schaardenburg D, van den Brande KJS, Ligthart GJ, Breedveld FC, Hazes JMW: Musculoskeletal disorders in persons aged 85 and over, a community survey. Ann Rheum Dis. 1994, 53: 807-811.
- de Vet HCW: Observer reliability and agreement. In Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 1998, Boston University: John Wiley & Sons Ltd, 4: 3123-3128.
- Stratford P: Reliability: consistency or differentiating between subjects?. Physical Therapy. 1989, 69: 299-300.
- Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986, 1: 307-310.
- Fleiss JF: The design and analysis of clinical experiments. Chapter 1: Reliability of measurement. 1986, John Wiley & Sons, 1-33.
- Scientific Advisory Committee of the Medical Outcome Trust: Assessing health status and quality-of-life instruments: Attributes and review criteria. Qual Life Res. 2002, 11: 193-205. 10.1023/A:1015291021312.
- Croft P, Pope D, Boswell R, Rigby A, Silman A: Observer variability in measuring elevation and external rotation of the shoulder. Br J Rheumatol. 1994, 33: 942-946.
- Gajdosik RL, Bohannon RW: Clinical measurement of range of motion; Review of goniometry emphasizing reliability and validity. Phys Ther. 1987, 67: 1867-1872.
- Tousignant M, Boucher N, Bourbonnais J, Gravelle T, Quesnel M, Brosseau L: Intratester and intertester reliability of the Cybex electronic digital inclinometer (EDI-320) for measurement of active neck flexion and extension in healthy subjects. Man Ther. 2001, 6: 235-241. 10.1054/math.2001.0419.
- Awan R, Smith J, Boon AJ: Measuring shoulder internal rotation range of motion: A comparison of 3 techniques. Arch Phys Med Rehab. 2002, 83: 1229-1234. 10.1053/apmr.2002.34815.
- Boone DC, Azen SP, Lin C, Spence C, Baron C, Lee L: Reliability of goniometric measurements. Phys Ther. 1978, 58: 1355-1360.
- Bostrom C, Harms-Ringdahl K, Nordemar R: Clinical reliability of shoulder function assessment in patients with rheumatoid arthritis. Scand J Rheumatol. 1991, 20: 36-48.
- Hayes K, Walton JR, Szomor ZL, Murrell GAC: Reliability of five methods for assessing shoulder range of motion. Austr J Physiother. 2001, 47: 289-294.
- Johnson GR, Fyfe NCM, Heward M: Ranges of movement at the shoulder complex using an electromagnetic movement sensor. Ann Rheum Dis. 1991, 50: 824-827.
- Jordan K, Dziedzic K, Jones PW, Ong BN, Dawes PT: The reliability of the three-dimensional FASTRAK measurement system in measuring cervical spine and shoulder range of motion in healthy subjects. Rheumatology. 2000, 39: 382-388. 10.1093/rheumatology/39.4.382.
- MacDermid JC, Chesworth BM, Patterson S, Roth JH: Intratester and intertester reliability of goniometric measurement of passive lateral shoulder rotation. J Hand Ther. 1999, 12: 187-192.
- Mayerson NH, Rose MA, Milano A: Goniometric measurement reliability in physical medicine. Arch Phys Med Rehabil. 1984, 65: 92-94.
- Riddle DL, Rothstein JM, Lamb RL: Goniometric reliability in a clinical setting: shoulder measurements. Phys Ther. 1987, 67: 668-673.
- Williams JG, Callaghan M: Comparison of visual estimation and goniometry in determination of a shoulder joint angle. Physiotherapy. 1990, 6: 655-657.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2474/5/18/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.