Semi-quantitative MRI biomarkers of knee osteoarthritis progression in the FNIH biomarkers consortium cohort − Methodologic aspects and definition of change

Background: To describe the scoring methodology and MRI assessments used to evaluate the cross-sectional features observed in cases and controls, to define change over time for different MRI features, and to report the extent of changes over a 24-month period in the Foundation for the National Institutes of Health Osteoarthritis Biomarkers Consortium study nested within the larger Osteoarthritis Initiative (OAI) Study.
Methods: We conducted a nested case-control study. Cases (n = 194) were knees having both radiographic and pain progression. Controls (n = 406) were knee osteoarthritis subjects who did not meet the case definition. Groups were matched for Kellgren-Lawrence grade and body mass index. MRIs were acquired using 3 T MRI systems and assessed using the semi-quantitative MOAKS system. MRIs were read at baseline and 24 months for cartilage damage, bone marrow lesions (BML), osteophytes, meniscal damage and extrusion, and Hoffa- and effusion-synovitis. We provide the definition and distribution of change in these biomarkers over time.
Results: Seventy-three percent of the cases had subregions with BML worsening (vs. 66 % in controls) (p = 0.102). Little change in osteophytes was seen over 24 months. Twenty-eight percent of cases and 10 % of controls had worsening in meniscal scores in at least one subregion (p < 0.001). Seventy-three percent of cases and 53 % of controls had at least one area with worsening in cartilage surface area (p < 0.001). More cases than controls experienced worsening in Hoffa-synovitis (17 % vs. 6 %, p < 0.001) and in effusion-synovitis (41 % vs. 18 %, p < 0.001).
Conclusions: A wide range of MRI-detected structural pathologies was present in the FNIH cohort. More severe changes, especially for BMLs, cartilage and meniscal damage, were detected primarily among the case group, suggesting that early changes in multiple structural domains are associated with radiographic worsening and symptomatic progression.


Background
Knee osteoarthritis (OA) is a major public health concern. Current treatment focuses on controlling symptoms because no interventions have yet been approved for modifying the course of the disease or improving structural alterations in joint tissues [1]. The Foundation for the National Institutes of Health (FNIH) sample was selected for a nested case-control study designed to evaluate the predictive validity of a broad spectrum of imaging and biochemical markers of disease progression in knee OA. The sample was derived from the Osteoarthritis Initiative (OAI) public database, an ongoing multi-center prospective observational cohort study of knee OA [2]. A biomarker that exhibits change over the near term and is associated with longer-term clinically important outcomes would have potential as a marker of treatment efficacy [2].
While radiography depicts structural bony tissue changes only in advanced stages of OA, magnetic resonance imaging (MRI) is able to visualize all involved joint tissues, even in the earliest stages of disease, in which radiographs are normal [3,4]. Recent data suggest that non-cartilaginous tissue changes in particular play an important role in the onset and progression of osteoarthritis [5,6].
Using multivariable logistic regression models to examine associations between structural MRI markers and progression of radiographic and pain outcomes, we recently showed that all baseline structural joint features, with the exception of effusion-synovitis and meniscal morphology, were able to predict 48-month case status. We also showed that 24-month change was associated with progression of disease for all joint features evaluated, including size of bone marrow lesions, cartilage thickness and surface area, effusion-synovitis, meniscal morphology and extrusion, osteophyte size, and Hoffa-synovitis [7]. However, change is challenging to define with complex scoring systems, and the definitions need to be set carefully before engaging in detailed analyses focused on outcomes and prediction models. As only sparse data are currently available on reliability and definitions of change in semi-quantitatively assessed MRI studies, we believe that a detailed description will be helpful to investigators focusing on samples at risk for progression; these data were not covered in the recent publication [7].
Thus, the aims of our study were to describe the scoring methodology and MRI assessments used to evaluate the cross-sectional features observed in cases and controls, to define change over time for different MRI features and to report the extent of changes over a 24-month period, which may serve as a potential reference for future studies focusing on MRI features and progression over similar observational periods.

Study design
The Osteoarthritis Initiative (OAI) is an ongoing multicenter prospective observational cohort study of knee OA (http://www.oai.ucsf.edu/) that enrolled 4796 participants aged 45-79 years at four clinical centers. Clinical data, MRI scans, radiographs and serum and urine specimens were obtained at baseline, 12, 24, 36, and 48 months (M) follow-up [8]. Eligible participants for the present study were those with at least one knee with a Kellgren-Lawrence grade (KLG) of 1-3 at baseline.

Criteria for case-control selection
Radiographic progression was defined as a loss of ≥0.7 mm in minimal joint space width in the medial tibio-femoral compartment from baseline to 24, 36 or 48 M.
Knee pain was assessed using the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) pain subscale. Symptomatic progression was defined as a persistent increase of ≥9 points on a 0-100 normalized score from baseline to 24, 36, 48 or 60 months. This difference has been documented to be clinically relevant [9].
For the nested case-control study, a predetermined number of index knees was selected in the following outcome groups for measurement of imaging biomarkers [6]: 1) case knees had both radiographic and pain progression; control knees did not have this combination, and included 2) knees with radiographic but not pain progression, 3) knees with pain but not radiographic progression, and 4) knees with neither radiographic nor pain progression. The sample size for cases and these three control groups was 194, 103, 103 and 200 knees, respectively. For the purposes of this analysis we compared 194 cases vs. 406 controls.
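The four outcome groups described above can be expressed as a simple decision rule. The following is a minimal sketch only, not the study's actual selection code: the function and group names are hypothetical, and it assumes that the joint space width loss and the persistent WOMAC pain increase have already been computed for each knee over the relevant follow-up window.

```python
from typing import Literal

# Cut-offs taken from the progression definitions in the text.
JSW_LOSS_THRESHOLD_MM = 0.7   # loss in minimal medial joint space width
WOMAC_PAIN_THRESHOLD = 9.0    # increase on the 0-100 normalized pain scale

Group = Literal["case", "radiographic_only", "pain_only", "non_progressor"]


def classify_knee(jsw_loss_mm: float, womac_pain_increase: float) -> Group:
    """Assign a knee to one of the four outcome groups (hypothetical helper).

    jsw_loss_mm: baseline minus follow-up minimal medial joint space width.
    womac_pain_increase: follow-up minus baseline normalized WOMAC pain score;
        persistence of the increase is assumed to be checked upstream.
    """
    radiographic = jsw_loss_mm >= JSW_LOSS_THRESHOLD_MM
    pain = womac_pain_increase >= WOMAC_PAIN_THRESHOLD
    if radiographic and pain:
        return "case"
    if radiographic:
        return "radiographic_only"
    if pain:
        return "pain_only"
    return "non_progressor"


def is_case(group: Group) -> bool:
    """Cases are compared against the three control groups pooled together."""
    return group == "case"
```

Under this rule, only knees meeting both thresholds count as cases; the three remaining groups are pooled as controls for the comparison reported here.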

MRI acquisition and assessment
MRIs of both knees were acquired using 3 T systems (Siemens Trio) at the 4 OAI clinical sites. A dedicated quadrature transmit/receive knee coil was used and the sequence protocol included a coronal intermediate-weighted 2-dimensional turbo spin echo sequence, a sagittal 3-dimensional dual-echo steady-state sequence, and a sagittal intermediate-weighted fat-suppressed turbo spin-echo sequence [10].
Two musculoskeletal radiologists with 13 (FWR) and 15 (AG) years' experience of semi-quantitative assessment of knee OA, blinded to clinical data and case-control status, read the baseline and 24 month MRIs according to a validated scoring system [11], and with knowledge of the chronological order of the scans. The following joint structures were assessed: cartilage morphology, osteophytes, subchondral bone marrow lesions (BMLs), meniscal structural damage and meniscal extrusion, Hoffa-synovitis and effusion-synovitis.
In addition, within-grade changes were coded that fulfill the definition of a definite visual change but do not fulfill the definition of a full grade change on the ordinal scales applied [12].

Reliability
One experienced musculoskeletal radiologist (FWR) re-evaluated 20 randomly selected MRIs in random order after a 4-week interval to assess intra-reader reliability. Inter-observer reliability between the two readers was determined using the same 20 cases.

Definition of change over time
BMLs
Change in overall number of subregions affected by any BML was defined as the difference between the number of subregions affected by any BML at 24 months (size > 0) and the number of subregions affected by any BML at baseline. This was further categorized into improvement, no change, worsening in one subregion, and worsening in two or more subregions. An example of incident BML at follow-up is shown in Fig. 1.
We also determined the number of subregions with worsening, and the number of subregions with improvement. In both instances we took into account within-grade changes in BML size. We further classified these measures into any subregions with worsening vs. no subregions with worsening and any subregions with improvement vs. no subregions with improvement.
To determine maximum change in BML size score, we first evaluated change in size score in each of the 14 articular subregions between baseline and 24 months. Change in size score in each subregion could range from a maximal improvement by three to a maximal worsening by three. The second step was to create an overall change in size score that was defined as the maximum change in size score across the 14 articular subregions. It was categorized into improvement, no change, worsening within grade, worsening by one grade, and worsening by two or more grades. Based on the observed distribution, the final grouping included: worsening by <2 grades (comprising improvement, no change, within-grade worsening and worsening by at most one grade in size score) vs. worsening by two or more grades.
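The two BML change measures defined above can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the analysis code used in the study (which was run in SAS): size scores are taken to be integers from 0 to 3 for each of the 14 subregions, within-grade changes are not modeled, and all function names are hypothetical.

```python
from typing import List, Sequence

N_SUBREGIONS = 14  # articular subregions scored for BMLs, per the text


def _check(baseline: Sequence[int], followup: Sequence[int]) -> None:
    if len(baseline) != N_SUBREGIONS or len(followup) != N_SUBREGIONS:
        raise ValueError("expected one size score per articular subregion")


def change_in_affected_count(baseline: Sequence[int], followup: Sequence[int]) -> int:
    """Subregions with any BML (size > 0) at 24 months minus those at baseline."""
    _check(baseline, followup)
    return sum(s > 0 for s in followup) - sum(s > 0 for s in baseline)


def categorize_count_change(delta: int) -> str:
    """Map the count difference onto the four categories named in the text."""
    if delta < 0:
        return "improvement"
    if delta == 0:
        return "no change"
    if delta == 1:
        return "worsening in one subregion"
    return "worsening in two or more subregions"


def subregion_changes(baseline: Sequence[int], followup: Sequence[int]) -> List[int]:
    """Per-subregion change in size score (follow-up minus baseline, -3 to +3)."""
    _check(baseline, followup)
    return [f - b for b, f in zip(baseline, followup)]


def worsened_two_or_more_grades(baseline: Sequence[int], followup: Sequence[int]) -> bool:
    """Final dichotomy: maximum change across subregions of two grades or more."""
    return max(subregion_changes(baseline, followup)) >= 2


# Hypothetical knee: one subregion worsens by two grades, one improves,
# and one incident BML appears at follow-up.
baseline = [1, 0, 0, 2] + [0] * 10
followup = [3, 0, 0, 1, 0, 1] + [0] * 8
count_delta = change_in_affected_count(baseline, followup)
```

The example knee illustrates why the two measures are reported separately: the count of affected subregions rises by only one, while the maximum change flags a two-grade worsening.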

Osteophytes
The change in number of locations affected by any osteophyte was defined as the difference between the number of locations affected by any osteophyte at 24 months (Grade > 0) and the number of locations affected by any osteophyte at baseline. This change was classified as no change, worsening in one location, and worsening by two or more locations and then further classified into no change vs. any worsening. To determine maximum worsening in osteophyte score, we evaluated change in score in each of the 12 locations between baseline and 24 months. Maximum worsening in score was defined as the greatest amount of worsening among the 12 locations. Maximum worsening in score was initially classified as no change, worsening one grade, and worsening by two or more grades. Based on the distribution, the final categorization included no worsening vs. any worsening.

Meniscus
We assessed whether there was worsening in meniscal morphology from baseline to 24 months in each of the six meniscal subregions. We defined worsening as an increase in grade in at least one subregion. Figure 2 shows an example of increase in meniscal extrusion over time. We further categorized worsening in meniscal morphology into number of subregions with worsening (range 0-6) and whether any of the subregions had worsening (yes/no). We assessed changes in meniscal extrusion separately in the medial and lateral compartments. We categorized change in extrusion as improvement, no change, and worsening. We further dichotomized change in extrusion as no worsening vs. any worsening.

Cartilage
MOAKS uses a two-digit score for cartilage assessment that incorporates both the size of the affected area per subregion and the percentage of the subregion affected by full-thickness cartilage loss. In this analysis separate scores for cartilage thickness and surface area were considered. The number of subregions with worsening (i.e., a higher score at 24 months vs. baseline) was defined separately for surface area and thickness. Change over time for surface area was computed in two ways: including within-grade changes and excluding within-grade changes. Within-grade scoring for cartilage refers to within-grade change in area or thickness. For both thickness and surface area, worsening was grouped into four levels: 0, 1, 2, or 3 or more areas with worsening.
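To make the separate handling of the two cartilage components concrete, the sketch below counts subregions with worsening for surface area and thickness independently and then applies the four-level grouping. It is a hedged illustration only: representing each two-digit score as an (area, thickness) tuple is an assumption made here for clarity, within-grade changes are not modeled, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

# One subregion's two-digit score, split into its two components:
# (surface-area grade, full-thickness grade).
Score = Tuple[int, int]


def count_worsening(baseline: Sequence[Score], followup: Sequence[Score]) -> Tuple[int, int]:
    """Return (subregions with area worsening, subregions with thickness worsening)."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and follow-up must cover the same subregions")
    area = sum(f[0] > b[0] for b, f in zip(baseline, followup))
    thickness = sum(f[1] > b[1] for b, f in zip(baseline, followup))
    return area, thickness


def group_worsening(n_subregions: int) -> str:
    """Collapse a count into the four levels used in the text: 0, 1, 2, 3 or more."""
    return "3 or more" if n_subregions >= 3 else str(n_subregions)


# Hypothetical scores for four subregions.
baseline = [(1, 0), (2, 1), (0, 0), (3, 2)]
followup = [(2, 0), (2, 2), (1, 0), (3, 3)]
area_n, thickness_n = count_worsening(baseline, followup)
```

In this example a subregion can worsen in thickness without any change in surface area (second and fourth entries), which is why the two counts are kept apart.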

Hoffa-synovitis and effusion-synovitis
As MRI markers of inflammation, so-called effusion- and Hoffa-synovitis were evaluated. Fluid-sensitive sequences as applied in the OAI are capable of delineating intra-articular joint fluid, but a distinction between true joint effusion and synovial thickening is not possible as both are visualized as hyperintense signal within the joint cavity. For this reason the term effusion-synovitis has been introduced, which is scored based on the distension of the joint capsule. Hoffa-synovitis is a term used for signal changes in Hoffa's fat pad that are commonly used as a surrogate for synovitis on non-contrast-enhanced MRI. Effusion-synovitis is scored from 0 to 3 according to the distension of the joint capsule (1 = small, 2 = moderate, 3 = large). Hoffa-synovitis is scored based on the amount of hyperintense signal in Hoffa's fat pad on sagittal fat-suppressed intermediate-weighted sequences (1 = mild, 2 = moderate, 3 = severe). Twenty-four-month changes in both Hoffa-synovitis and effusion-synovitis were categorized as improvement, no change, or worsening.

Analytic approach
Descriptive statistics were used to report frequencies for the different features and parameters for baseline and change over time. Logistic regression was used to identify factors associated with statistically significant differences between cases and controls. For some features raw distributions were grouped into categories as described above. In these instances descriptive statistics are presented for both raw and categorical versions of features, and regression was used only for the categorical version. Statistical significance was set at p < 0.05; p values refer to differences between cases and controls across all grades. Weighted kappa statistics were applied to determine inter- and intra-observer reliability. All analyses were conducted in SAS 9.4 (SAS Institute, Cary NC).
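For readers unfamiliar with the weighted kappa statistic, the following sketch shows how it can be computed for two sets of ordinal readings. It is not the SAS procedure used in the study. The text does not state which weighting scheme was applied, so linear disagreement weights are assumed by default here, with quadratic weights as an option; the function name and example readings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def weighted_kappa(
    reader1: Sequence[int],
    reader2: Sequence[int],
    n_categories: int,
    quadratic: bool = False,
) -> float:
    """Weighted kappa for ordinal grades coded 0 .. n_categories - 1.

    Uses disagreement weights |i - j| (linear) or (i - j)**2 (quadratic):
    kappa_w = 1 - sum(w * observed) / sum(w * expected).
    """
    if len(reader1) != len(reader2) or not reader1:
        raise ValueError("readings must be non-empty and of equal length")
    n = len(reader1)
    power = 2 if quadratic else 1
    observed = Counter(zip(reader1, reader2))   # joint counts of grade pairs
    marg1, marg2 = Counter(reader1), Counter(reader2)  # marginal counts

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) ** power
            weighted_observed += w * observed[(i, j)] / n
            weighted_expected += w * (marg1[i] / n) * (marg2[j] / n)
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both readers use a single grade")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical readings on a 0-3 scale.
perfect = weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_categories=4)
chance = weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], n_categories=2)
```

Unlike unweighted kappa, this statistic penalizes a disagreement of two grades more heavily than a disagreement of one grade, which suits ordinal scores such as those used here.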

Results
Mean age of the participants was 62 years, 60 % were women and average BMI was 30 kg/m² [5]. Cases and controls were balanced on all covariates, with the exception of baseline KLG with a higher proportion of KL3 knees in the case group (44 %) compared to the controls (33 %). Summarizing the intra- and inter-observer reliability results, all of the measures showed at least substantial agreement ranging between 0.68 for Hoffa-synovitis and 0.97 for medial and lateral meniscal morphology. Table 1 gives a detailed overview of the reliability results.

BMLs
The number of sub-regions affected by any BML ranged from zero to eight and the maximum BML score per knee ranged from zero to three. The change in number of subregions affected by any BML ranged from −3 (three fewer subregions affected at 24 months compared to baseline) to 5 (five more subregions affected at 24 months compared to baseline). Fourteen percent of subjects showed improvement in number of subregions with BMLs (fewer subregions with BMLs at 24 months as compared to baseline) and 52 % showed no change based on this definition. Seventy-three percent of the cases had any subregions with worsening (vs. 66 % in the control group).

Osteophytes
The number of locations with any osteophytes ranged from zero to 12. The maximum osteophyte score per knee was zero for 3 % of knees, one for 48 %, two for 34 % and three for 15 % of the knees. Overall there was very little change in osteophytes over 24 months. Nine percent of the cohort had at least one location that worsened in osteophyte score over 24 months. Across all locations, the maximum amount of worsening was 2 grades (i.e., zero to two or one to three) and 83 % had no change in any location.

Meniscus
Thirty percent of the knees had any meniscal tear and 28 % showed meniscal substance loss (i.e. maceration).
The number of regions with meniscal morphology worsening ranged from zero to five, with 16 % of subjects having worsening in at least one subregion. Fourteen percent showed an increase in medial meniscal extrusion while only one knee had an increase in lateral extrusion.

Cartilage
The number of subregions with worsening in cartilage surface area, including within-grade changes, ranged from zero to eight with 59 % of subjects having at least one area with worsening in surface area. The number of subregions with cartilage thickness score > 0 ranged from zero to seven. Across the entire knee, the number of areas with worsening in cartilage thickness ranged from zero to six with 42 % of subjects having at least one area with worsening in thickness.

Hoffa-synovitis
MOAKS Hoffa-synovitis score ranged from zero to three, with 24-month change ranging from −2 to 2. While only 10 % of subjects experienced worsening, more cases experienced worsening than controls (17 % vs. 6 %).

Effusion-synovitis
MOAKS effusion-synovitis score ranged from zero to three with 24-month changes ranging from −2 to 2. Forty-one percent of cases worsened compared to 18 % of controls.
Apart from meniscal damage and effusion-synovitis, baseline frequencies of all measures showed statistically significant differences for cases vs. controls. For change parameters, maximum worsening of BML score, 24-month change in osteophytes and meniscal damage and extrusion, all cartilage measures, and Hoffa- and effusion-synovitis showed significant differences between cases and controls (Tables 2 and 3).

Discussion
In this cohort of subjects at risk for OA progression, the values for several tissue-specific MRI features associated with progression of disease vary widely and show great change or fluctuation. The subgroup defined as cases based on composite progression of structural and clinical features exhibited changes to a greater extent than the controls on several features. Specifically, we observed greater change in the case group on maximum change in BMLs, worsening of BMLs in two or more subregions, worsening of cartilage surface area and thickness in three or more subregions and worsening of meniscal damage. Inflammatory markers of disease, i.e. Hoffa- and effusion-synovitis, also worsened more frequently in the case group compared to the controls, emphasizing the potential role of inflammation in disease progression [13-15]. Overall little change was observed for osteophytes, reflecting the generally slow course of the disease. Using the identical dataset and a multivariable approach, we showed that 24-month change in cartilage thickness, cartilage surface area, effusion-synovitis, Hoffa-synovitis, and meniscal morphology were independently associated with disease progression, suggesting that they may serve as efficacy biomarkers in clinical trials of disease-modifying interventions for knee OA [7]. Definition of change using semi-quantitative approaches is challenging as there are multiple possible definitions including subregional or maximum-grade approaches. To gain additional understanding of the frequencies and categories encountered in this cohort, whose knees were selected on the basis of progression or as controls, we performed the current analysis, which may help researchers to power planned observational studies or clinical trials.
Few studies are available that have focused on longitudinal change of MRI parameters using semi-quantitative assessment. Most available studies are centered around baseline predictors of subsequent cartilage loss as the outcome [16]; only a few studies focus on cartilage as a predictor of worsening BMLs as the outcome [17]. When assessing change using semi-quantitative scoring in OA, scores are commonly presented as mean values or summed over a defined anatomical region (commonly compartment or knee) [18,19]. Such approaches have drawbacks that need to be considered. One of the main shortcomings is that sums are challenging to compare. As an example, a sum of three acquired over three distinct subregions may mean one lesion with a grade 3 (considered severe) while two other subregions exhibit no lesion (grade 0); alternatively, it may reflect grade 1 lesions in all three subregions. More work is needed on the prognostic implications of having widespread low-grade involvement vs. a focal severe lesion. It appears likely that both play a role with regard to disease progression [3]. Other approaches to define progression have been published recently [20].
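The limitation of summed scores can be illustrated with a small sketch. The two score profiles below are hypothetical and assume an ordinal 0 to 3 grade per subregion; the helper name is likewise an assumption made for illustration.

```python
from typing import Dict, Sequence


def summarize(scores: Sequence[int]) -> Dict[str, int]:
    """Three alternative summaries of per-subregion ordinal grades."""
    return {
        "sum": sum(scores),                        # summed score
        "max": max(scores),                        # maximum grade in any subregion
        "n_affected": sum(s > 0 for s in scores),  # subregions with any lesion
    }


focal_severe = summarize([3, 0, 0])   # one severe lesion, two unaffected subregions
diffuse_mild = summarize([1, 1, 1])   # mild lesions in every subregion
```

Both profiles yield the same sum, yet they differ in maximum grade and in the number of affected subregions, which are the two quantities a summed score cannot distinguish.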
Part of the study design was sequential reading of MRIs not blinded to time point but blinded to case or control status, as it has been shown that this approach increases sensitivity to change [21]. Reading unblinded to time point also allowed for the application of within-grade changes, further increasing sensitivity to detect minor changes [12]. In assessing MRI data semi-quantitatively, we advocate scoring the number of subregions or locations involved by pathology, with further stratification using cut-offs related to severity of a certain feature. In addition, an approach assessing a maximum change over a pre-defined unit, such as a knee compartment or the entire joint, adds to the understanding of the degree of change observed, which may be lost using a summative approach. Our definition of controls included both non-progressors and non-composite progressors, i.e. those that progressed either clinically (but not radiographically) or radiographically (but not clinically). A further subanalysis is needed to look at differences in changes for these subgroups separately.

Conclusions
In summary, a wide range of MRI-detected structural pathologies was present in the FNIH cohort. More severe changes, especially for BMLs, cartilage and meniscal damage, were detected primarily among the case group, suggesting that early changes in multiple structural domains are associated with radiographic worsening and symptomatic progression. Particularly the role of structural predictors of progression that are potentially amenable to therapeutic approaches, such as inflammatory markers of disease (depicted as Hoffa- and effusion-synovitis on MRI) or subchondral bone changes (visualized as BMLs on MRI), should be the focus of further evaluation. In addition, the complexity of the different semi-quantitative scoring systems needs consideration when engaging in analyses focusing on change over time. Simply summing scores does not seem to be sufficient, and further validation of analyses taking into account potentially improving features or within-grade scoring is urgently needed to take full advantage of the richness of semi-quantitative data that is considered complementary to more quantitative approaches based on segmentation of 3D datasets.