Spondyloarthritis-related and degenerative MRI changes in the axial skeleton - an inter- and intra-observer agreement study

Background The Back Pain Cohort of Southern Denmark (BaPa Cohort) was initiated with the aim of evaluating the clinical relevance of magnetic resonance imaging (MRI) in the diagnosis of early spondyloarthritis (SpA). In order to facilitate the collection of MRI data for this study, an electronic evaluation form was developed including both SpA-related and degenerative axial changes. The objective of the current study was to assess the intra- and inter-observer agreement of the MRI changes assessed. Methods Three radiologists evaluated 48 MRI scans of the whole spine and the sacroiliac joints from a subsample of the BaPa Cohort, consisting of patients with non-specific low back pain and patients with different stages of SpA features. The spine was evaluated for SpA-related and degenerative MRI changes and the SIJ for SpA-related changes. Inter- and intra-observer agreements were calculated with kappa statistics. In the interpretation of the kappa coefficient, the standards for strength of agreement reported by Landis and Koch were followed. Results A total of 48 patients, 40% men and mean age of 31 years (range 18 – 40 years), were evaluated once by all three readers and re-evaluated by two of the readers after 4-12 weeks. For MRI changes in the spine, substantial to almost perfect observer agreement was found for the location and the size of vertebral signal changes and for disc degeneration and disc contour. For the sacroiliac joints, substantial or almost perfect observer agreement was found for the grading of bone marrow oedema and fatty marrow deposition, the depth of bone marrow oedema and for subchondral sclerosis. Global assessment of the SpA diagnosis had substantial to almost perfect observer agreements. 
Conclusion The acceptable agreement for key MRI changes in the spine and sacroiliac joints makes it possible to use these MRI changes in the BaPa Cohort study and other studies investigating MRI changes in patients with non-specific low back pain and suspected SpA.


Background
Spondyloarthritis (SpA) is a group of rheumatological disorders that result in back pain and stiffness of the spine due to inflammatory and structural changes in the spine and the sacroiliac joints (SIJ). Plain-film radiography can detect structural changes but not early inflammatory changes. Magnetic resonance imaging (MRI) has been reported to identify both structural and inflammatory changes [1,2] and is considered essential in the diagnosis of SpA. However, there are still several uncertainties regarding the utility of MRI in the diagnosis of SpA [3], especially in the early stages when the clinical signs of SpA can be difficult to distinguish from non-specific low back pain (LBP) and the MRI signs of SpA can be difficult to distinguish from the much more common findings of degeneration. Signal changes related to degeneration, such as Modic changes, are an important pitfall in the assessment of SpA [4], and some studies have shown substantial variation in the extent of MRI lesions in the SIJ previously considered to be specific for SpA [5]. Therefore, studies encompassing patients reflecting the target population and using an MRI protocol including both SpA-related and degenerative changes are needed to validate the utility of this new imaging modality for the diagnosis of SpA.
On this basis, the Back Pain Cohort of Southern Denmark (BaPa Cohort) was initiated in 2011 at the Spine Centre of Southern Denmark with the aim of evaluating the clinical relevance of MRI in the diagnosis of early SpA. In order to facilitate the quantification of MRI changes in detail, an electronic evaluation form was developed for the evaluation of SpA-related and degenerative changes in the spine and SpA-related changes in the SIJ. The electronic MRI evaluation protocol was based on existing grading systems of active and chronic SpA changes in the spine [6] and SIJ [7]. These grading systems have been tested for inter- and intra-observer agreement in sum-scores with good results [6,7]. However, the current evaluation form was more detailed and included both SpA-related and degenerative spinal MRI changes. Thus, a new assessment of observer agreement was required.
The objective of the current study was therefore to assess the intra- and inter-observer agreement of SpA-related and degenerative changes in the spine and SpA-related changes in the SIJ, assessing each lesion separately.

The study population
The analysis encompassed 48 sets of whole spine MRI scans in addition to MRI of the SIJ. All MRI scans were acquired from a subset of patients (n = 350) of the BaPa Cohort enrolled between March 2011 and February 2012. The BaPa Cohort consists of randomly selected patients aged between 18 and 40 years, referred to a secondary care sector outpatient spine clinic (Spine Centre of Southern Denmark). Patients were referred to the Centre for episodes of LBP ranging from 2 to 12 months, where there had been insufficient effect following conservative treatment in the primary care sector and there was no suspicion of specific LBP conditions such as SpA, fracture, cancer or infection. All patients who were included in the BaPa cohort received an MRI scan of the whole spine and the SIJ.
The patients included in the current analysis were selected by the primary investigator (BA) without involvement of the evaluating radiologists. Due to the low prevalence of some MRI changes to be evaluated in this cohort, 38 patients were chosen based on data from previous systematic evaluations of the MRI scans. The previous systematic evaluations were done at least 4 months prior to the readings in the current study. This selection method was used to increase the number of 'positive' MRI changes, thereby ensuring sufficient statistical power to calculate reliable kappa values. The remaining 10 patients were randomly selected from the remaining 312 patients.

Magnetic resonance imaging technique and evaluation
MRI of the whole spine and the SIJ was performed with a 1.5 T Philips Achieva (Best, The Netherlands) MRI System. A SENSE spine coil was used for imaging with the study participants in the supine position. The whole spine sequences were performed in three steps (cervical, thoracic and lumbar) subsequently fused digitally and encompassing: The images were read on dedicated radiological workstations with two 21-inch high-resolution screens. All MRIs were anonymised and blinded for all clinical information including previous readings and the patient's age and gender.
Three observers evaluated the images independently. They were all senior consultant radiologists at the Department of Radiology, Aarhus University Hospital, and were specialised in musculoskeletal imaging and SpA. Prior to the study, two calibration sessions were conducted. After a period of 4-12 weeks, two observers (AJ and AZ) re-evaluated all 48 MRI scans for intra-observer agreement.
The evaluation form consists of two parts: 1) evaluation of the spine and 2) evaluation of the SIJ. The spine was divided into 23 disco-vertebral units (DVU) from C2-C3 to L5-S1. A DVU was defined as the region between two virtual horizontal lines through the centre of two adjacent vertebrae (Figure 1). Furthermore, each vertebral endplate and subjacent bone marrow area of a DVU were assessed separately for variables related to signal changes or erosions. An estimate of the total vertebral endplate and subchondral bone marrow areas was based on all sagittal slices, creating a "3D-like picture" of the changes. The spinal MRI changes assessed are listed in Table 1. For a detailed definition of the MRI changes assessed, see Additional file 1.
The SIJs were subdivided into four osseous locations for each joint: the iliac and sacral bone corresponding to the cartilaginous and the ligamentous portion of the joint, respectively. An estimate of the total cartilaginous and ligamentous joint facets and the adjacent subchondral bone marrow areas was based on all semicoronal and semiaxial slices, creating a 3D picture of both joint portions. The MRI changes assessed at the SIJ are listed in Table 2 according to the Danish method described previously [7]. For a detailed definition of the MRI changes assessed, see Additional file 1.
Global assessment of the SpA diagnosis was based on MRI changes in both the spine and the SIJ. Both regions were assessed at the same session. For each patient, the observer was asked to rate how strongly he/she agreed with the following: 'This patient has SpA'. For a detailed definition of the MRI changes assessed, see Additional file 1.
In the statistical analysis, the number of observations varied according to the variables assessed. In the spine, variables related to signal changes (with the exception of the total size of the signal changes) or erosions were evaluated for both the upper and lower endplates of 23 DVUs in the 48 patients (2208 endplates). 'Bone marrow oedema (BMO) in the costovertebral joints' was evaluated at 12 vertebral levels in 48 patients (576 levels). The remaining spinal variables were evaluated in 23 DVUs from 48 patients (1104 DVUs). In the SIJ, 8 regions in the 48 patients were evaluated (384 regions).
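The observation counts above follow directly from the number of patients and the number of anatomical units rated per patient. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative only and are not part of the study's evaluation form or database.

```python
# Reproduces the observation counts stated in the text.
# All names here are hypothetical, chosen for illustration.
N_PATIENTS = 48
N_DVU = 23                    # disco-vertebral units, C2-C3 to L5-S1
ENDPLATES_PER_DVU = 2         # upper and lower endplate
N_COSTOVERTEBRAL_LEVELS = 12  # vertebral levels rated for costovertebral BMO
N_SIJ_REGIONS = 8             # four osseous locations in each of two joints


def observation_counts(n_patients=N_PATIENTS):
    """Return the number of rated units per variable group."""
    return {
        "endplates": n_patients * N_DVU * ENDPLATES_PER_DVU,
        "costovertebral_levels": n_patients * N_COSTOVERTEBRAL_LEVELS,
        "dvus": n_patients * N_DVU,
        "sij_regions": n_patients * N_SIJ_REGIONS,
    }


counts = observation_counts()
for name, value in counts.items():
    print(f"{name}: {value}")
```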

Data entry
The data were entered directly into a comprehensive clinical and imaging electronic database (the SpineData database) using an internet-based evaluation form. Data were subsequently exported to, and stored in, STATA11 format (StataCorp, 2000, Stata Statistical Software: Release 11.2, College Station, TX: STATA Corporation, USA) and checked for logic and consistency using the STATA 'do files' as documentation.

Statistical analysis
To assess the inter- and intra-observer agreement, ratings from each observer were cross-tabulated and agreement was measured using kappa statistics [8]. Results were reported as observed agreement, expected agreement and kappa values with 95% confidence intervals (CI) for each pair of observers and combined for all three observers.
Figure 1 Discovertebral unit modified from [6].
Kappa is defined as the difference between observed and expected agreement (by chance), expressed as a fraction of the maximum difference: kappa = (observed agreement - expected agreement) / (1 - expected agreement) [8]. Dichotomous and nominal categorical variables were tested with ordinary kappa statistics and ordered categorical variables were tested with weighted kappa. Quadratic weights were applied according to the number of categories. The quadratic weights are specified as 1 - {(i - j)/(k - 1)}^2, where i and j index the rows and columns of the ratings by the two readers and k is the number of categories. The intra-class correlation coefficient (ICC), which is similar to an overall quadratic weighted kappa [9], was used as a measure of overall agreement between the three observers, with the exception of two nominal categorical variables which had more than two categories: 'type of signal change' and 'location of signal change in the vertebral endplate', which were analysed with ordinary kappa. ICC was tested in a one-way ANOVA model (absolute agreement).
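To make the definitions above concrete, the following Python sketch computes ordinary and quadratic-weighted kappa from a cross-tabulation of two readers' ratings. It is a minimal illustration of the formulas quoted in the text, not the STATA routine used in the study; the function names and the example table are invented for demonstration.

```python
def agreement_weights(k, quadratic=False):
    """Return a k x k matrix of agreement weights.

    Ordinary kappa uses 1 on the diagonal and 0 elsewhere. Quadratic
    weights follow 1 - ((i - j) / (k - 1))**2, as given in the text.
    Requires k >= 2.
    """
    if quadratic:
        return [[1.0 - ((i - j) / (k - 1)) ** 2 for j in range(k)]
                for i in range(k)]
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def kappa(table, quadratic=False):
    """Return (observed, expected, kappa) for a square contingency table.

    table[i][j] is the number of units placed in category i by the first
    reader and in category j by the second reader. The result is undefined
    (division by zero) when expected agreement equals 1.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    w = agreement_weights(k, quadratic)
    row_prop = [sum(row) / n for row in table]
    col_prop = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    observed = sum(w[i][j] * table[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row_prop[i] * col_prop[j]
                   for i in range(k) for j in range(k))
    return observed, expected, (observed - expected) / (1.0 - expected)


# Invented example: 50 units rated absent (0) or present (1) by two readers.
example = [[20, 5],
           [10, 15]]
po, pe, kap = kappa(example)
print(f"observed={po:.2f} expected={pe:.2f} kappa={kap:.2f}")
```

With only two categories the quadratic weights reduce to the ordinary weights, so both variants give the same value; the weighting only matters for ordered variables with three or more categories, where near-misses receive partial credit.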
The 95% confidence intervals (CI) were calculated with an analytical method for dichotomous variables [10] and by bootstrap resampling with 3000 repetitions for categorical variables with more than two categories [11,12].
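The bootstrap approach mentioned above can be sketched as follows. The text states only that 3000 repetitions were used; the percentile interval, the resampling of individual rated units (rather than patients), and all function names below are assumptions made for illustration and may differ from the STATA procedure actually applied.

```python
import random


def cohen_kappa(pairs):
    """Ordinary kappa from a list of (reader_1, reader_2) category pairs."""
    n = len(pairs)
    categories = {c for pair in pairs for c in pair}
    observed = sum(a == b for a, b in pairs) / n
    expected = sum(
        (sum(a == c for a, _ in pairs) / n) * (sum(b == c for _, b in pairs) / n)
        for c in categories
    )
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1.0 - expected)


def bootstrap_ci(pairs, reps=3000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for kappa (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = []
    for _ in range(reps):
        # Resample rated units with replacement, keeping each pair intact.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        value = cohen_kappa(sample)
        if value == value:  # drop NaN results from degenerate resamples
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper


# Invented example data: 50 paired ratings in two categories.
pairs = [(0, 0)] * 20 + [(0, 1)] * 5 + [(1, 0)] * 10 + [(1, 1)] * 15
print(cohen_kappa(pairs), bootstrap_ci(pairs))
```

Because several endplates or joint regions come from the same patient, resampling whole patients instead of individual units would be a reasonable alternative design; the source does not say which was used.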
In the interpretation of the kappa coefficient, the standards for strength of agreement given by Landis and Koch were followed, defined as slight (κ < 0.2), fair (0.2 ≤ κ < 0.4), moderate (0.4 ≤ κ < 0.6), substantial (0.6 ≤ κ < 0.8) and almost perfect (0.8 ≤ κ < 1) [13]. Only endplates where both readers agreed on the presence of a signal change were included in the analyses for the following variables: 'signal change in the corner', 'location of signal change in the vertebral endplate', and 'size of signal change', so the statistical analysis was a measure of agreement on location and size and not on the presence of the given signal change. Similarly, only endplates where both readers agreed on the presence of erosions were included in the analyses for 'erosions in the corner'. In relation to intensity and depth of BMO and the depth of fatty marrow deposition (FMD), only observations where both readers agreed on the presence of BMO or FMD, respectively, were included in the analyses.
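The Landis and Koch cut-offs listed at the start of the preceding paragraph amount to a simple lookup. A minimal Python sketch is given below; the function name is hypothetical, and treating κ = 1 as 'almost perfect' is an assumption, since the stated top interval is 0.8 ≤ κ < 1.

```python
def strength_of_agreement(kappa):
    """Map a kappa value to the agreement label used in the text."""
    if kappa < 0.2:
        return "slight"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "substantial"
    # Assumption: kappa == 1 is also labelled "almost perfect".
    return "almost perfect"


for value in (0.12, 0.52, 0.67, 0.90):
    print(value, strength_of_agreement(value))
```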
Analogous to the requirements for valid inference for contingency tables, we used a criterion of having at least 5 positive ratings for each variable for inclusion in the kappa analyses.
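The two inclusion rules described above (a minimum of 5 positive ratings, and restriction of location, size, intensity and depth variables to units where both readers recorded the change as present) can be sketched in Python as below. The function names and the coding of a positive rating as any value greater than 0 are assumptions for illustration; requiring the minimum from each reader separately is one possible reading of the criterion.

```python
MIN_POSITIVE = 5  # minimum number of positive ratings stated in the text


def eligible_for_kappa(ratings_a, ratings_b, min_positive=MIN_POSITIVE):
    """True if each reader has at least `min_positive` ratings above 0."""
    positives_a = sum(r > 0 for r in ratings_a)
    positives_b = sum(r > 0 for r in ratings_b)
    return positives_a >= min_positive and positives_b >= min_positive


def restrict_to_agreed_presence(presence_a, presence_b, detail_a, detail_b):
    """Keep detail ratings only for units both readers rated as present.

    presence_* hold the presence ratings (0 = absent) and detail_* hold the
    dependent variable, for example the size or depth of the change.
    """
    return [
        (da, db)
        for pa, pb, da, db in zip(presence_a, presence_b, detail_a, detail_b)
        if pa > 0 and pb > 0
    ]


# Invented example: six endplates rated by two readers.
presence_a = [1, 1, 0, 1, 0, 1]
presence_b = [1, 0, 0, 1, 1, 1]
size_a = [2, 1, 0, 3, 0, 1]
size_b = [2, 0, 0, 2, 1, 1]
print(restrict_to_agreed_presence(presence_a, presence_b, size_a, size_b))
print(eligible_for_kappa(presence_a, presence_b))
```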
For statistical analysis, the STATA11 statistical package was used.

Ethics
The project was conducted in accordance with the Helsinki-II declaration. The Regional Scientific Ethical Committee for Southern Denmark evaluated the study and determined that it was not subject to notification. Each patient gave written informed consent for research use and publication of their data. The establishment of the database is registered at the Danish Data Protection Agency, and all clinical information about the participants is kept confidential, in line with the Danish Act on Processing of Personal Data.

Results
A total of 48 patients, 40% men and a mean age of 31 years (range 18-40 years), were evaluated once by all three readers and re-evaluated by two of the readers after 4-12 weeks.

Spinal MRI changes
In relation to the combined inter-observer agreement of the spinal MRI changes, four findings ('erosions in the corner', 'BMO at the costovertebral joints', 'FMD at the apophyseal joints' and 'soft tissue oedema') were excluded because of too few positive ratings (Table 3).
The strength of the combined inter-observer agreement for spinal MRI changes ranged from slight (κ = .12) to almost perfect (κ = .90). Almost perfect agreement was found for 'location of signal changes in the vertebral endplate'. Substantial agreement was found for 'size of signal change', 'disc degeneration' and 'disc contour'. Moderate agreement was found for 'type of signal change', 'signal change in the corner', 'total size of FMD lesions in the DVU' and 'herniation in the vertebral endplate'. Fair agreement was found for the 'total size of BMO in the DVU' and 'total size of mixed lesions in the DVU'. Slight agreement was found for 'Scheuermann's changes'. For 'erosions', 'syndesmophytes or vertebral fusion' and 'BMO at the apophyseal joint', only single pairwise analyses of inter-observer agreement were possible because of too few positive ratings. These analyses showed fair, moderate and moderate agreement, respectively (Table 4).
In relation to the intra-observer agreement, four MRI findings ('erosions in the corner', 'BMO at the costovertebral joints', 'FMD at the apophyseal joints' and 'soft tissue oedema') were excluded because of too few positive ratings. Furthermore, 'erosions' and 'BMO at the apophyseal joint' could only be analysed for one reader because of too few positive ratings (Table 3).
The strength of the intra-observer agreement for the spinal MRI changes ranged from moderate (κ = .56) to almost perfect (κ = .98) for reader A and from substantial (κ = .67) to almost perfect (κ = .93) for reader B. In general, the strength of intra-observer agreement was notably higher than the strength of inter-observer agreement (Table 5). All kappa values were above 0.7, except for two MRI changes for reader A ('total size of BMO in the DVU' and 'total size of FMD in the DVU') and one finding for reader B ('Scheuermann's changes').

Changes in the sacroiliac joints
The strength of the combined inter-observer agreement for evaluation of the SIJ changes ranged from moderate (κ = .52) to almost perfect (κ = .81) (Table 6). Almost perfect agreement was found for 'BMO' and substantial agreement was found for 'depth of BMO', 'FMD' and 'subchondral sclerosis'. Moderate agreement was found for 'depth of FMD' and 'erosions'. For 'intensity of BMO' and 'ankylosis', only single pairwise analyses were possible because of too few positive ratings (Table 3). These analyses showed moderate and substantial agreement, respectively. The intra-observer agreement was stronger than the inter-observer agreement. For reader A, the strength of agreement ranged from substantial (κ = .77) to almost perfect (κ = .96) and for reader B also from substantial (κ = .75) to almost perfect (κ = .91). For details, see Table 7.

Discussion
In this study, the agreement of different SpA-related and degenerative changes in the spine and SpA-related changes in the SIJ were tested jointly in a sample of patients with non-specific LBP only and patients with LBP associated with different stages of SpA. The majority of earlier studies on agreement on SpA-related and degenerative changes have been focused on separate regions of the spine, primarily the lumbar spine, whereas this study included the whole spine and the SIJ. Moreover, for the MRI changes evaluated in the SIJ, this is the first time agreement has been tested assessing each lesion separately. In general, the agreement ranged from slight to almost perfect. As expected, the level of intra-observer agreement was higher than the inter-observer agreement. Agreements for MRI changes in the SIJ were generally stronger than for the spine. For the spinal MRI changes, 'disc degeneration' and 'disc contour' yielded the highest level of agreement followed by 'location of signal changes in the vertebral endplate', 'size of signal change' and 'type of signal change'. In relation to the evaluation of the SIJ, 'BMO', 'depth of BMO', 'FMD', and 'ankylosis' were the changes with the best agreement. Global assessment showed substantial to almost perfect agreements.
[Table footnote: The numbers in the table refer to ratings > 0 (not normal) for each of the MRI changes. Only MRI changes with at least 5 positive ratings per reading were included in the kappa analyses. BMO, bone marrow oedema; FMD, fatty marrow deposition. 1: Only observations where the reader agreed on both readings on the presence of a signal change, BMO or FMD, respectively, were kept in the analysis.]
The tendency of better reliability of the SpA-related findings in the SIJ compared to the spine could be explained by low prevalence of SpA-related findings in the spine. In addition, changes in the posterior spinal elements often are relatively small and can be difficult to assess on sagittal MRI slices.

Comparison with previously published studies
The number of previous studies on observer agreements on spinal MRI changes related to SpA is limited. One previous study evaluated the agreement of structural SpA-related changes at each vertebral level in 20 patients with established SpA [14]. Kappa values of 0.60, 0.21, and 0.59 were found for 'non-corner vertebral endplate erosions', 'vertebral corner spurs' and 'ankylosis', respectively. However, differences in the definitions and in the study sample preclude a direct comparison with our results. Furthermore, there are published studies evaluating the agreement of sum scores for the whole spine [6,15], which unfortunately preclude comparison with the evaluation of changes at the endplate level.
In relation to the evaluation of signal changes in the endplates, these changes are not only observed in patients with suspected SpA but also in other populations. Several authors have reported inter- and intra-observer agreement in the range of .30-.88 [16-25] and .70-.94 [16-20], respectively, for populations of LBP patients [18,20-23], unspecified patients [17,24], asymptomatic patients [25] and general populations [16,19]. Of these studies, four report confidence intervals [16-19], thereby allowing reliable comparison of results between studies. In relation to the evaluation of the type of signal changes, two of the four studies reporting CIs had statistically higher inter-observer agreement [16,19] and two had comparable results [17,18]. The intra-observer agreements found in all four studies were comparable with the results from the current study. However, both study samples and the definitions of signal changes in these studies differed from the current study. Agreements regarding location of signal changes were reported in two of the studies [16,19] and were in concordance with the current study; however, these definitions also varied from the one used in the current study. Agreements regarding size of signal changes were reported in one of the four studies, with results in concordance with the current study [16]. In relation to the evaluation of signal changes located in the vertebral corner, agreement of BMO and FMD corner lesions has been analysed in a previous study sample encompassing 20 patients with established SpA. The reported kappa values ranged from 0.23 to 0.72 for BMO lesions [26] and from 0.60 to 0.72 for FMD lesions [14]. However, differences in the definitions and in the study sample preclude a direct comparison of results.
Disc degeneration was assessed using Pfirrmann's grading system [27] and substantial to almost perfect inter- and intra-observer agreements, respectively, were found in accordance with earlier reports on this grading system [27,28], although no studies with CIs were identified. In relation to disc contour, similar agreements were found which are also comparable with previous reports [29,30].
The inter-observer agreement for herniations in the vertebral endplate was found to be fair. This is slightly inferior to the results of a previous study on LBP patients, but the intra-observer agreements were comparable [31].
In relation to Scheuermann's changes, the inter-observer agreement was slight and the intra-observer agreement moderate. To our knowledge, there are no previous agreement studies regarding Scheuermann's changes using MRI.
In relation to the evaluations of the SIJ, either substantial or almost perfect inter- and intra-observer agreements were found for the majority of MRI changes in the current study. The exceptions were the intensity of BMO, the depth of FMD and erosions, which had a moderate inter-observer agreement. To our knowledge, no earlier studies report on the agreement of these changes assessed as single lesions. Several studies that assess each lesion individually were identified. However, these studies report only results on analysis performed on combinations of these findings, e.g. sum score of total findings or anatomical regions [32-36], which are not comparable with assessing agreement on each lesion.
[Table footnote: BMO: bone marrow oedema; FMD: fatty marrow deposition; -: too few positive ratings for one of the observers to be included in the analysis. 1: Intraclass correlation coefficient. 2: Only observations where both readers agreed on the presence of BMO/FMD were kept in the analysis. 3: Assessed on the entire MRI scan.]
Regarding global assessment, one recent study investigated the inter-observer agreement for global evaluation of MRI of the SIJ in SpA versus non-SpA patients. The kappa value for inter-observer agreement for 5 categories of confidence in the SpA diagnosis was found to be .73 (.62-.81) in a cohort of back pain patients referred to a secondary care outpatient clinic in Switzerland due to suspicion of SpA and .74 (.65-.80) in cohorts of back pain patients with anterior uveitis referred to an ophthalmology department in Canada [37]. This is higher than the inter-observer agreement found for global assessment in the current study but with overlapping CIs. In general, the spinal MRI findings related to SpA are not as clearly defined as the findings related to the SIJ, which is reflected in the incorporation of only SIJ changes in the ASAS criteria for SpA. Therefore, one reason for the lower agreement in the current study could be that the inclusion of spinal changes in the global assessment increases the uncertainty of the diagnosis.

Application of the findings
The acceptable agreement for the evaluation of key MRI changes in the spine and SIJ makes it possible to use these MRI changes in the BaPa Cohort study and other studies investigating MRI changes in patients with non-specific LBP and suspected SpA.
Earlier publications on the evaluation of SpA-related MRI findings have mainly been focused on grading systems for active and chronic SpA changes as a measurement of disease severity in already diagnosed SpA patients. However, the assessment of each lesion separately creates the potential for additional analysis of the diagnostic and prognostic value of each individual MRI finding. It also creates the potential for describing the development of the changes in subsequent longitudinal studies and it provides a possibility for analysing location-specific alterations, e.g. to compare MRI changes with pain location. Furthermore, the inclusion of both SpA-related and degenerative changes in the same evaluation protocol facilitates an accessible assessment of MRI findings that could mimic SpA-related findings, assessed under the same standardized evaluation session.

Strengths and weaknesses of the study
This study has potential weaknesses that have to be addressed. Firstly, some MRI changes could not be analysed because of too few positive ratings, and the agreement of the evaluation of these findings could not be tested. If this problem were to be addressed, the study population would have to have contained patients with more pronounced SpA. However, this would have made the study sample less applicable to the BaPa Cohort, to which the evaluation protocol will be applied. For some of the MRI changes, the inter-observer agreement varied between reader pairs, despite training and calibration sessions, indicating that more effort could have been put into calibration, especially regarding vertebral disc herniation and Scheuermann's changes.
[Table footnote: SIJ: sacroiliac joints; BMO: bone marrow oedema; FMD: fatty marrow deposition; -: too few positive ratings for one of the observers to be included in the analysis. 1: Only observations where the reader agreed on both readings on the presence of BMO/FMD were kept in the analysis. 2: Assessed on the entire MRI scan, not only the SIJ. 3: Number of patients assessed.]
This study also has a number of strengths. MRI scans of the whole spine and SIJ were read by three independent readers and intra-observer agreement was tested by two of the readers. The involvement of more than two readers improves the generalisability of the evaluation method. Moreover, for the MRI changes related to SpA in both the spine and SIJ, this is the first time agreement has been tested assessing each lesion separately. This creates the potential for describing the development of the changes in subsequent studies, and the possibility for analysing location-specific alterations. Furthermore, the readers were highly specialized musculoskeletal radiologists, and training and calibration sessions were conducted prior to the readings.

Conclusion
The inter- and intra-observer agreement for the evaluation of spondyloarthritis-related and degenerative MRI changes in the spine and spondyloarthritis-related changes in the sacroiliac joints was investigated in this study. In the spine, substantial to almost perfect observer agreement was found for the evaluation of the location and the size of vertebral signal changes and for disc degeneration and disc contour. In the sacroiliac joints, substantial to almost perfect observer agreement was found for the grading of bone marrow oedema and fatty marrow deposition, the depth of bone marrow oedema and for subchondral sclerosis. Also, 'Global assessment' regarding the spondyloarthritis diagnosis had substantial or almost perfect observer agreements.