Inter-rater reliability of clinical mobility measures in ankylosing spondylitis
BMC Musculoskeletal Disorders volume 17, Article number: 382 (2016)
Several measurements are often used in daily clinical practice in the assessment of Ankylosing Spondylitis (AS) patients. The Assessment of SpondyloArthritis international Society (ASAS) recommends in its core set: chest expansion, modified Schöber test, occiput to wall distance, lateral lumbar flexion, cervical rotation and the Bath Ankylosing Spondylitis Metrology Index (BASMI). BASMI itself includes five measurements, some of them recommended by ASAS. Three versions of BASMI have been published with different scales and intervals for each component of the index. However, studies of the reliability of these measurements are needed. The aim of this study was to analyze the inter-rater reliability of recommended spinal mobility measures in AS.
We examined the reproducibility of spinal mobility measurements on 33 AS patients, performed by two experienced rheumatologists on the same day. Descriptive statistics, Intraclass Correlation Coefficients (ICC), and Smallest Detectable Difference (SDD) using the Bland-Altman criteria were obtained for all the measurements.
Chest expansion showed the lowest value of ICC (0.66) and occiput-wall the highest (0.97). SDD was 2.43 units for BASMI2 and 1.27 units for BASMI10.
Reliability according to ICC was moderate to high in all measurements. BASMI10, rather than BASMI2, must be used: the measurements used to calculate it are the same, but its reliability is better. Inter-rater variation, expressed as SDD, must be taken into account: smaller improvements do not demonstrate the efficacy of treatment because they can be due to experimental error and not to the treatment itself.
Ankylosing Spondylitis (AS) is a subtype of Spondyloarthropathies (SpA) that mainly affects the spine. Spinal mobility impairment in AS patients is caused both by inflammation and structural damage of the spine [1, 2]. Assessment of the reduction in spinal mobility is fundamental to evaluate disease stage and disease evolution in patients. Several studies showed that the evolution of the disease is highly correlated with the reduction in spinal mobility [4, 5]. Several measurements have been defined to assess spinal mobility in AS. The most widely accepted are those recommended by the ASAS (Assessment of SpondyloArthritis international Society) group, which comprise different spinal mobility measures in a core set: chest expansion, modified Schöber test, occiput to wall distance, and lateral lumbar flexion or BASMI (Bath Ankylosing Spondylitis Metrology Index).
BASMI was defined in 1994 by Jenkinson et al. and includes five measurements: cervical rotation, tragus to wall distance, lateral lumbar flexion, modified Schöber, and intermalleolar distance. Each of these measurements is classified into three levels of severity (0 = Mild, 1 = Moderate, 2 = Severe) according to values defined for each interval. Summing the results obtained for the five measurements gives an index between zero and ten. This index was validated in several studies showing good reliability and correlation with radiological measures. BASMI is used as a tool for patient classification and to analyze the sensitivity to change of different treatments. In 1995, a second definition of BASMI was published by Jones et al., using the same measurements but establishing ten intervals for each measurement. Averaging the individual scores gives an index between zero and ten. This new scale gave greater precision to the evaluation obtained from each measurement (multiples of 0.2 units instead of multiples of 1 unit). More recently, Van der Heijde et al. proposed a BASMI version in which a linear function is applied to calculate each index component. According to the authors, this definition of BASMI, whose results were very similar to Jenkinson’s version, provides better reliability and sensitivity to change. Thus, there are three versions of BASMI: the original BASMI2, BASMI10 and the linear BASMILIN.
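The difference in resolution between the two step-based definitions can be illustrated with a minimal Python sketch. It assumes that the five component scores have already been assigned from the published cut-off tables (which are not reproduced here), and the function names `basmi2_index` and `basmi10_index` are hypothetical, chosen only for this illustration.

```python
def _check_scores(component_scores, max_score):
    """Validate five integer component scores in the range 0..max_score."""
    if len(component_scores) != 5:
        raise ValueError("BASMI uses exactly five component scores")
    for score in component_scores:
        if not (isinstance(score, int) and 0 <= score <= max_score):
            raise ValueError(f"score {score!r} is outside 0..{max_score}")


def basmi2_index(component_scores):
    """Original definition: sum of five scores, each graded 0, 1 or 2."""
    _check_scores(component_scores, 2)
    return sum(component_scores)  # integer steps of 1 unit, range 0..10


def basmi10_index(component_scores):
    """Second definition: mean of five scores, each graded 0 to 10 (assumed)."""
    _check_scores(component_scores, 10)
    return sum(component_scores) / 5  # steps of 0.2 units, range 0..10


# Hypothetical patient: the scores below are invented for illustration.
print(basmi2_index([0, 1, 2, 1, 0]))
print(basmi10_index([1, 3, 5, 2, 4]))
```

Because the second definition averages five integer scores, a one-step change in a single component moves the index by 0.2 units, whereas in the original definition it moves the index by a full unit.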
BASMI is not always performed in daily clinical practice because certain measurements are difficult to obtain (cervical rotation requires a goniometer, and intermalleolar distance needs more physical space). Although BASMI2 is used more than BASMI10, the latter has higher accuracy for the same measurements. Although BASMI2 is used in most publications, it is often not clear to readers which of the three definitions was applied.
In this study, we analyzed the inter-observer variability of different spinal mobility measures, including the ASAS core set and BASMI with its three different versions. Our aim was to determine the reliability of the spinal measurements and the smallest detectable difference, which must be considered in order to demonstrate the efficacy of a treatment assessed with spinal measurements.
We included 33 consecutive patients from daily clinical practice at the Rheumatology Department of the University Hospital Reina Sofia, Córdoba. Inclusion criteria were: patients diagnosed with AS according to the modified New York criteria, with at least 5 years of disease duration and aged between 18 and 80 years. They were all informed and consented to participate in the study, which was approved by the Reina Sofia Hospital Research Ethics Committee. Exclusion criteria were: pregnancy, spinal surgery and scoliosis.
Only four of them were female. These patients had different levels of mobility impairment, varying from 0.94 to 8.78 (average value 4.75) according to BASMI10. The mean age was 50.35 years, and the mean disease duration was 24.61 years.
Spinal measurements were performed by two experienced rheumatologists. Each rheumatologist performed two assessments on the same day, independently and in isolation. All tests were done in the evening over a three-month period.
There are more than 20 measurements used for AS assessment. Mobility measures were reviewed by Sieper et al. This review gives a precise description of the most commonly used metrology measures (including all the measures analyzed in our study) and how to calculate them.
ASAS recommends chest expansion, modified Schöber, occiput to wall distance, and lateral lumbar flexion or BASMI. BASMI includes: cervical rotation, tragus to wall distance, lateral lumbar flexion, modified Schöber and intermalleolar distance. Three versions of BASMI were considered: BASMI2, BASMI10 and BASMILIN. As the last two are very similar, we will use only BASMI2 and BASMI10. Finger to floor distance was also included because it is a measure often used in studies. In total, we analyzed eight measurements and two indexes.
We used the intraclass correlation coefficient (ICC) for statistical analysis of inter-observer reliability. A value above 0.6 indicates good reliability, a value above 0.8 indicates very good reliability and a value above 0.9 represents excellent reliability. The smallest detectable difference was determined using the Bland-Altman method. According to this method, the 95 % limits of agreement are defined as the mean difference between observers for each measurement ± 1.96 times the standard deviation of the differences. Assuming that the differences are normally distributed, the mean difference must be near zero. We used SPSS® 14.0 (SPSS International BV, Chicago, USA) and Medcalc® 11.3.6 (Medcalc Software bvba, Mariakerke, Belgium) to analyze the data.
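As an illustration of these statistics, the following Python sketch computes an ICC and the Bland-Altman quantities for two raters. It is not the SPSS or Medcalc procedure used in the study. The text does not state which ICC form was computed, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here, and the SDD is taken as 1.96 times the standard deviation of the between-rater differences. The function names and the example values are hypothetical.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array-like of shape (n_subjects, k_raters).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def interpret_icc(icc):
    """Label an ICC using the thresholds quoted in the text."""
    if icc > 0.9:
        return "excellent"
    if icc > 0.8:
        return "very good"
    if icc > 0.6:
        return "good"
    return "below the 'good' threshold"


def bland_altman(rater_a, rater_b):
    """Return the mean difference, the 95 % limits of agreement and the SDD."""
    d = np.asarray(rater_a, dtype=float) - np.asarray(rater_b, dtype=float)
    mean_diff = d.mean()
    sdd = 1.96 * d.std(ddof=1)  # assumed SDD: half-width of the limits
    return mean_diff, (mean_diff - sdd, mean_diff + sdd), sdd


# Invented chest expansion readings (cm) from two raters, for illustration only.
rater_1 = [3.0, 4.5, 2.0, 5.5, 1.5, 6.0]
rater_2 = [3.5, 4.0, 2.5, 5.0, 2.5, 5.5]
icc = icc_2_1(np.column_stack([rater_1, rater_2]))
mean_diff, limits, sdd = bland_altman(rater_1, rater_2)
print(interpret_icc(icc), round(float(sdd), 2))
```

Under these assumptions, a change between two visits that is smaller than `sdd` cannot be distinguished from between-rater measurement error.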
Results obtained by both observers for the analyzed parameters are shown in Table 1. High variability was observed in chest expansion and the Schöber test. The SDD was high for both indexes: over one unit for BASMI10 and over two units for BASMI2.
Table 2 shows inter-observer reliability according to ICC. Reliability results from already published studies are also included for comparison. Although ICC values are high, an instrument suitable for individual decision-making requires values over 0.9. Not all measurements fulfill this condition.
Correlations between measures (Pearson) are shown in Table 3. The BASMI indexes correlated highly with the rest of the measures.
Figure 1 shows Bland-Altman plots comparing the scores of the two BASMI definitions obtained for both observers.
The main conclusion of our study is that inter-observer variability, expressed as SDD, must be kept in mind when judging patient improvement in short-term follow-up of treatments. Every measurement included in the ASAS core set and BASMI was analyzed.
Davis et al., in a bibliographical review, studied the different spinal mobility measurements used to assess loss of mobility in AS, including BASMI, and analyzed their validity by applying the OMERACT (Outcome Measures in Rheumatoid Arthritis Clinical Trials) filter. Although Davis showed good results, some studies show certain problems of reliability, accuracy and variability in these mobility measurements. Auleley et al. calculated the smallest detectable difference (SDD) of several measurements used in AS assessment (chest expansion, occiput to wall distance and modified Schöber). This was the first study providing SDD as an outcome measurement in AS based on Bland-Altman’s 95 % limits of agreement method. The SDD was relatively high and, although ICCs were high, the measurements appeared poorly reliable when judged by SDD. Consequently, changes smaller than the SDD could be considered measurement error.
Different reliability results were obtained for each of the measures analyzed. Next we review some of these results and compare them with those reported in the current literature.
Chest expansion showed the worst results (CV 47.31 %, ICC 0.659, SDD 3.27 cm). It is one of the most difficult measures to obtain, and it is especially problematic in women. According to Auleley, ICC was 0.85 and SDD 2.4 cm. This measurement may be useful, because the reduction of pulmonary capacity in AS is well known, but the current measurement method is not appropriate. Tzelepis et al. used thoracoabdominal movement during breathing as an outcome parameter of the disease.
Modified Schöber is one of the most widely used mobility measures. Clear correlations with radiology and symptom duration have been described. Reliability results were good (CV 42.8 %, ICC 0.756, SDD 2.39 cm), higher than chest expansion but lower than the rest of the measures. Our results were very similar to those of Auleley (ICC 0.60, SDD 3.3 cm).
Occiput-wall and tragus-wall distances are related to the kyphosis seen in AS. A greater level of involvement implies greater kyphosis. ASAS recommends occiput-wall, although this parameter is difficult to measure when the patient is only a few millimeters from the wall. Measuring tragus-wall is easier in this situation (BASMI includes this measure), but this distance depends on the size of the patient’s head. In BASMI2, values less than 15 cm are scored as 0 (no involvement). Normally the distance from tragus to occiput is about 11 cm. In this case, there is no problem if the patient touches the wall. However, in BASMI10, the zero value for this measure is 10 cm, so the patient will be scored with 1 unit and will be considered affected according to this index. Some studies [18, 19] assessed BASMI in healthy subjects and found that it is unusual for healthy individuals to score zero on the BASMI. Both measures obtained good reliability results (especially tragus to wall). They also showed good correlations with the rest of the parameters. Therefore, kyphosis is a good indicator of the level of involvement.
Finger to floor distance is included neither in the ASAS core set nor in BASMI. In spite of its high reproducibility (ICC = 0.948), it does not correlate well with the BASMI index (r = 0.44), nor with the rest of the parameters. This may be due to the influence of the height of the subject, although some studies have ruled this out. Some studies show poor correlation with radiology for this measure.
Unlike finger to floor distance, lateral lumbar flexion showed good correlation with the rest of the measures (Schöber, occiput and tragus to wall distance, and BASMI, p < 0.01). ICC was good (0.817), somewhat lower than that obtained by other authors (0.95–0.98).
Intermalleolar distance showed excellent repeatability (ICC = 0.944) and good correlations with some of the analyzed parameters. It is complex to measure and requires more space and time in clinical practice; for this reason it is not routinely used.
We calculated BASMI using the two scales previously described (BASMI2 and BASMI10). BASMI10 showed better results than BASMI2. BASMI correlated well with almost all of the measures. Finger to floor distance did not show correlation with this index. We obtained an SDD of 2.43 units for BASMI2, while for BASMI10 it was about half that value (1.27 units). This result is shown in Fig. 1. CV also decreased from 30.47 % to 13.71 %. ICC was good in both cases (0.894 and 0.956); therefore the variability is smaller for the second index. The published results of variability are close to the values obtained in our study.
According to Madsen, the SDD in BASMI2 was ±1.4 (±2 units in valuations on individual patients). Another study, performed by Martindale et al., showed that for repeat assessments of the same participant, differences in BASMI of 1.0 or less are within the bounds of error. Therefore, an improvement below these values may be due to experimental error and not to the treatment itself.
Some studies showed improvements of less than one unit in BASMI2 and BASMI10 in patients treated with biological agents [24, 25]. These values are less than the smallest detectable difference established for BASMI and therefore the improvement could be due to the experimental error of the measure. It is a fact that BASMI is seldom used to evaluate the short-term effectiveness of treatment. Some authors prefer to use lateral side flexion instead of BASMI. Braun et al. indicated that although biological treatment improves AS activity indices, this improvement is less marked with respect to spinal mobility (assessed with BASMI), but they pointed out that BASMI does not have much sensitivity to change. Jauregui et al. analyzed BASMI in a controlled trial using pamidronate and concluded that the responsiveness of the BASMI was poor with either scoring system (BASMI2 and BASMI10).
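The comparison made above, an observed change against the SDD, can be written as a short Python check. This is a minimal sketch: the SDD values are those obtained in this study, while the helper name `exceeds_sdd` and the patient scores are hypothetical.

```python
# SDD values reported in this study, in index units.
SDD = {"BASMI2": 2.43, "BASMI10": 1.27}


def exceeds_sdd(baseline, follow_up, index):
    """Return True when the absolute change is larger than the SDD."""
    return abs(follow_up - baseline) > SDD[index]


# Invented example: a BASMI10 improvement of 0.8 units stays within
# the measurement error, whereas an improvement of 1.6 units does not.
print(exceeds_sdd(4.8, 4.0, "BASMI10"))
print(exceeds_sdd(5.0, 3.4, "BASMI10"))
```

A result of `False` does not mean that the treatment had no effect, only that the change is not larger than what two raters might produce on the same patient on the same day.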
To summarize, BASMI10 must be used because the measures included are the same and it requires only a little extra effort in its calculation. Although we obtained better results for BASMI10, BASMI2 is still used in clinical practice.
As a limitation of our study, the results we provide for SDD are based on a relatively small number of patients and observers, but the results are similar to those of other studies.
Regarding the clinical significance of our results, the SDD of the different measurements must be kept in mind when demonstrating the efficacy of treatments in short-term studies. Another possibility is to search for advanced metrology tools with better reliability to assess mobility in AS patients.
ASAS: Assessment of SpondyloArthritis international Society
BASMI: Bath Ankylosing Spondylitis Metrology Index
CV: Coefficient of variation
ICC: Intraclass correlation coefficient
OMERACT: Outcome Measures in Rheumatoid Arthritis Clinical Trials
SDD: Smallest detectable difference
Machado P, Landewé R, Braun J, et al. Both structural damage and inflammation of the spine contribute to impairment of spinal mobility in patients with ankylosing spondylitis. Ann Rheum Dis. 2010;69(8):1465–70.
Calvo-Gutierrez J, Garrido-Castro JL, Gil-Cabezas J, et al. Is spinal mobility in patients with spondylitis determined by age, structural damage, and inflammation? Arthritis Care Res (Hoboken). 2015;67(1):74–83.
Viitanen JV, Heikkila S, Kokko ML, et al. Clinical assessment of spinal mobility measurements in ankylosing spondylitis: a compact set for follow-up and trials? Clin Rheumatol. 2000;19(2):131–8.
Gran JT, Skomsvoll JF. The outcome of ankylosing spondylitis: a study of 100 patients. Br J Rheumatol. 1997;36(7):766–71.
Viitanen JV, Kokko ML, Lehtinen K, et al. Correlation between mobility restrictions and radiologic changes in ankylosing spondylitis. Spine. 1995;20(4):492–8.
Van der Heijde D, Dougados M, Davis J, et al. ASsessment in Ankylosing Spondylitis International Working Group/Spondylitis Association of America recommendations for conducting clinical trials in ankylosing spondylitis. Arthritis Rheum. 2005;52:386–94.
Jenkinson TR, Mallorie PA, Whitelock HC, et al. Defining spinal mobility in ankylosing spondylitis (AS). The Bath AS Metrology Index (BASMI). J Rheumatol. 1994;21:1694–8.
Haywood KL, Garratt AM, Jordan K, et al. Spinal mobility in ankylosing spondylitis: reliability, validity and responsiveness. Rheumatology (Oxford). 2004;43(6):750–7.
Jones SD, Porter J, Garrett SL, et al. A new scoring system for the Bath Ankylosing Spondylitis Metrology Index (BASMI). J Rheumatol. 1995;22(8):1609.
Van der Heijde D, Landewe R, Feldtkeller E. Proposal of a linear definition of the Bath Ankylosing Spondylitis Metrology Index (BASMI) and comparison with the 2-step and 10-step definitions. Ann Rheum Dis. 2008;67:489–93.
Maksymowych WP, Mallon C, Richardson R, et al. Development and validation of the Edmonton Ankylosing Spondylitis Metrology Index. Arthritis Rheum. 2006;55(4):575–82.
Garrido-Castro JL, Escudero A, Medina-Carnicer R, et al. Validation of a new objective index to measure spinal mobility: the University of Cordoba Ankylosing Spondylitis Metrology Index (UCOASMI). Rheumatol Int. 2014;34(3):401–6.
Sieper J, Rudwaleit M, Baraliakos X, Brandt J, Braun J, Burgos-Vargas R, Dougados M, Hermann KG, Landewé R, Maksymowych W, van der Heijde D. The Assessment of SpondyloArthritis international Society (ASAS) handbook: a guide to assess spondyloarthritis. Ann Rheum Dis. 2009;68 Suppl 2:ii1–44.
Bland M, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–10.
Davis Jr JC, Gladman DD. Spinal mobility measures in spondyloarthritis: application of the OMERACT filter. J Rheumatol. 2007;34(4):666–70.
Auleley G, Benbouazza K, Spoorenberg A, et al. Evaluation of the smallest detectable difference in outcome or process variables in ankylosing spondylitis. Arthritis Rheum. 2002;47(6):582–7.
Tzelepis GE, Kalliakosta G, Tzioufas AG, et al. Thoracoabdominal motion in ankylosing spondylitis: association with standardised clinical measures and response to therapy. Ann Rheum Dis. 2009;68(6):966–1037.
Chilton-Mitchell L, Martindale J, Hart A, Goodacre L. Normative values for the Bath Ankylosing Spondylitis Metrology Index in a UK population. Rheumatology (Oxford). 2013;52(11):2086–90.
Ramiro S, van Tubergen A, Stolwijk C, van der Heijde D, Royston P, Landewé R. Reference intervals of spinal mobility measures in normal individuals: the MOBILITY study. Ann Rheum Dis. 2015;74(6):1218–24.
Maksymowych WP, Mallon C, Richardson R, Conner-Spady B, Chung C, Russell AS. Does height influence the assessment of spinal and hip mobility measures used in ankylosing spondylitis? J Rheumatol. 2006;33(10):2035–75.
Viitanen JV, Kautiainen H, Suni J, Kokko ML, Lehtinen K. The relative value of spinal and thoracic mobility measurements in ankylosing spondylitis. Scand J Rheumatol. 1995;24(2):94–101.
Madsen OR, Hansen LB, Rytter A, et al. The Bath metrology index as assessed by a trained and an untrained rater in patients with spondylarthropathy: a study of intra- and inter-rater agreements. Clin Rheumatol. 2009;28(1):35–40.
Martindale JH, Sutton CJ, Goodacre L. An exploration of the inter- and intra-rater reliability of the Bath Ankylosing Spondylitis Metrology Index. Clin Rheumatol. 2012;31(11):1627–31.
Van der Heijde D, Kivitz A, Schiff MH, et al. Efficacy and safety of adalimumab in patients with ankylosing spondylitis: results of a multicenter, randomized, double-blind, placebo controlled trial. Arthritis Rheum. 2006;54:2136–46.
Van der Heijde D, Deodhar A, Inman RD, et al. Comparison of three methods for calculating the Bath Ankylosing Spondylitis Metrology Index in a randomized placebo-controlled study. Arthritis Care Res (Hoboken). 2012;64(12):1919–22.
Brandt J, Listing J, Sieper J, et al. Development and preselection of criteria for short term improvement after anti-TNF alfa treatment in ankylosing spondylitis. Ann Rheum Dis. 2004;63:1438–44.
Braun J, Baraliakos X, Listing J, Fritz C, Alten R, Burmester G, Krause A, Schewe S, Schneider M, Sorensen H, Zeidler H, Sieper J. Persistent clinical efficacy and safety of anti-tumour necrosis factor alfa therapy with infliximab in patients with ankylosing spondylitis over 5 years: evidence for different types of response. Ann Rheum Dis. 2008;67:340–6.
Jauregui E, Conner-Spady B, Russell AS, et al. Clinimetric evaluation of the bath ankylosing spondylitis metrology index in a controlled trial of pamidronate therapy. J Rheumatol. 2004;31(12):2422–8.
No funding was obtained for this study.
Availability of data and material
The data and materials in the current paper may be made available upon request by e-mail to the first author.
The authors declare that they have no competing interests.
JCG and JLGC designed the methods and aims of the study, interpreted the data and drafted the manuscript. ROC and CLM obtained the metrology measures on patients. CGN and PFU performed and analyzed the statistics. MCCV, AEC and ECE performed critical revision of the manuscript. All authors read and approved the final manuscript.
Consent for publication
Ethics approval and consent to participate
A statement relating to ethical approval and consent was included in the methods section as follows: “They were all informed and consented to participate in the study, which was approved by the Reina Sofia Hospital Research Ethics Committee.”
Calvo-Gutiérrez, J., Garrido-Castro, J.L., González-Navas, C. et al. Inter-rater reliability of clinical mobility measures in ankylosing spondylitis. BMC Musculoskelet Disord 17, 382 (2016). https://doi.org/10.1186/s12891-016-1242-1
- Ankylosing spondylitis
- Smallest detectable difference