Meta-analysis of long-term joint structural deterioration in minimally treated patients with rheumatoid arthritis

Background Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by inflammation and joint structural deterioration. Because of recent expectations that patients in clinical trials who are randomized to placebo should be 'rescued' with active therapy within 6 months of starting treatment, the relative benefit of arresting joint damage with biologic agents beyond this period is unclear. With longer-term evidence of the rate of joint deterioration with minimal treatment, the efficacy of biologic agents and novel treatments might be projected beyond the placebo-controlled phase observed in clinical trials. The aim of this study was to estimate radiographic structural deterioration over time in patients with moderate-to-severe RA minimally treated with DMARDs. Methods A literature review identified evidence of joint structural deterioration in patients with (DMARD-IR population) and without (non-DMARD-IR population) a history of inadequate response to DMARDs. Patients were minimally treated with one non-biologic DMARD or palliative care (non-DMARD-IR population only). Outcomes of interest were the (modified) Total Sharp Score (TSS), its subscores (Erosion Subscore [ES] and Joint Space Narrowing [JSN] Subscore), and the Larsen score. Pooled joint-deterioration curves over time were obtained with meta-analysis models. Results In the DMARD-IR population, mean change from baseline in TSS increased from 1.14 (95 % credible interval [CrI] 0.66, 1.67) at Week 12 to 9.84 (5.68, 14.46) at Week 104; in the non-DMARD-IR population, it increased non-linearly from 1.56 (0.79, 2.34) to 5.13 (−1.35, 11.67). At the same time points, mean changes (95 % CrI) were 0.51 (0.27, 0.83) and 4.43 (2.38, 7.21) for ES and 0.36 (0.09, 0.67) and 3.14 (0.80, 5.78) for JSN in the DMARD-IR population, whereas corresponding changes in the non-DMARD-IR population were 0.69 (0.31, 1.12) and 2.93 (0.92, 5.02) for ES, and 0.29 (0.17, 0.44) and 2.55 (1.45, 3.80) for JSN.
Pooled Larsen score estimates were only available for the non-DMARD-IR population, with mean standardized changes (95 % CrI) of 0.08 (0.04, 0.11) and 0.65 (0.36, 0.96) at Weeks 12 and 104, respectively. Conclusion Minimal treatment of RA with one non-biologic DMARD results in deterioration of joint structure in patients with or without a history of inadequate response to non-biologic DMARDs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12891-016-1195-4) contains supplementary material, which is available to authorized users.


Background
Rheumatoid arthritis (RA) is a common chronic inflammatory joint disorder. Without treatment, most patients with RA become severely disabled. The goals of RA treatment are to reduce disease activity, reduce or inhibit the rate of joint damage and, if possible, achieve remission. Current pharmacologic therapies include traditional disease-modifying anti-rheumatic drugs (DMARDs) and biologic agents [1][2][3].
Biologic agents have been shown to inhibit radiographic joint destruction in patients with an inadequate response to non-biologic DMARDs. Because of recent expectations that patients in clinical trials who are randomized to placebo should be 'rescued' with active therapy within 6 months of starting treatment, the relative benefit of arresting joint damage with biologic agents beyond this period is unclear. With longer-term evidence of the rate of joint deterioration with placebo or minimal treatment, the efficacy of biologic agents and novel treatments might be projected beyond the placebo-controlled phase observed in clinical trials.
The objective of the current study was to estimate radiographic joint destruction over time with minimal treatment among the following populations of biologic DMARD-naïve RA patients: (1) moderate-to-severe RA patients with a history of inadequate response to non-biologic DMARDs who were treated with one (other) non-biologic DMARD; and (2) moderate-to-severe RA patients without a history of inadequate response to a DMARD who received palliative care (non-steroidal anti-inflammatory drugs [NSAIDs], analgesics, low-dose glucocorticoids) or were minimally treated with one non-biologic DMARD. The first population was termed the "DMARD-IR population" and the second the "non-DMARD-IR population". The evidence for this analysis was obtained by means of a systematic literature review.

Study identification and selection
A systematic literature search was performed to identify studies that provided information on joint structural deterioration among minimally treated RA patients. The MEDLINE® and EMBASE® databases were searched simultaneously, using a predefined search strategy, for articles published in English, French, or German from 1970 to October 2009. Search terms included a combination of free-text and thesaurus terms related to RA, NSAIDs, glucocorticoids, non-biologic DMARDs, clinical trials, and observational studies. (See Additional file 1 for details of the search strategy.) The relevance of each citation identified from the databases was assessed from its title and abstract according to the predefined selection criteria outlined below.

Populations of interest
DMARD-IR, i.e., adult RA patients naïve to biologic DMARDs with a history of inadequate response to one or more non-biologic DMARDs; and non-DMARD-IR, i.e., adult RA patients naïve to biologic DMARDs without a history of inadequate response to a non-biologic DMARD. The non-DMARD-IR population could include both non-biologic DMARD-naïve (completely DMARD-naïve) and non-naïve (non-biologic DMARD-experienced) patients.

Study design
Randomized controlled trials (RCTs), and prospective and retrospective observational cohort studies. Only study arms concerning the interventions of interest were included.
Publications were obtained, if available, for any abstracts that potentially met the selection criteria. Based on these full-text reports, two reviewers evaluated whether each study met the selection criteria and any disagreements were resolved in a consensus meeting.

Data extraction
For each of the selected studies that reported sufficient follow-up data, details were extracted from the relevant study arms on study design, population characteristics, interventions, and the outcomes of interest, i.e., the (modified) TSS and its two subscores (ES and JSN). Data were extracted into a study database according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) 2009 [11]. Mean change from baseline (CFB) in the outcomes of interest was extracted from tables, text, or graphs. If not reported, CFBs were calculated as the difference between reported follow-up and baseline values. Corresponding standard errors were extracted directly or calculated indirectly based on the following data (if available): reported standard deviation (SD) with sample size, 95 % confidence interval, or p-values (in this order of preference).
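The change-from-baseline and standard-error conversions described above can be sketched as follows. The text does not state the exact formulas used; this sketch assumes the usual normal-theory conversions (SE = SD/√n; SE = CI width/(2 × 1.96); SE = |mean change|/z for a two-sided p-value testing zero change). All function names are hypothetical.

```python
import math
from statistics import NormalDist

def change_from_baseline(baseline, follow_up):
    """CFB when only baseline and follow-up means are reported."""
    return follow_up - baseline

def se_from_sd(sd, n):
    """SE of a mean change from its SD and sample size."""
    return sd / math.sqrt(n)

def se_from_ci(lower, upper, level=0.95):
    """SE from a symmetric, normal-theory confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95 % interval
    return (upper - lower) / (2 * z)

def se_from_p(mean_change, p_two_sided):
    """SE from a two-sided p-value for H0: mean change = 0 (requires 0 < p < 1)."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(mean_change) / z

def extract_se(mean_change, sd=None, n=None, ci=None, p=None):
    """Use the first available source, in the order of preference stated in the text."""
    if sd is not None and n is not None:
        return se_from_sd(sd, n)
    if ci is not None:
        return se_from_ci(*ci)
    if p is not None:
        return se_from_p(mean_change, p)
    return None  # no usable information: the arm cannot be weighted

# Hypothetical study arm: mean TSS change 2.5 with SD 6.0 and n = 36.
print(extract_se(2.5, sd=6.0, n=36))
```

Because SD with sample size is tried first, a CI or p-value is only used when the SD route is unavailable, mirroring the stated order of preference.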
Larsen scores were not consistently evaluated or reported. Different numbers and sets of joints were evaluated in the various studies, including hands and feet, or hands, feet, and wrists, and many studies did not report which or how many joints were evaluated. Moreover, some studies reported the total scores and some reported an average of scores per joint. Consequently, the analyses were based on standardized mean CFB in Larsen score, calculated as the reported CFB divided by the corresponding SD of this change.
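A minimal sketch of the Larsen-score standardization, with hypothetical numbers: the same change expressed as a total score or as a per-joint average yields the same standardized value, which is what makes the heterogeneously reported Larsen scores combinable. The SE formula is a common large-sample approximation for a single-group standardized mean change; the text does not say how the SE of the standardized change was obtained, so treat it as an assumption.

```python
import math

def standardized_cfb(mean_change, sd_change):
    """Standardized mean CFB: reported CFB divided by the SD of that change."""
    if sd_change <= 0:
        raise ValueError("SD of the change must be positive")
    return mean_change / sd_change

def approx_se_standardized(d, n):
    """Assumed large-sample SE of a single-group standardized mean change."""
    return math.sqrt(1.0 / n + d * d / (2.0 * n))

# Hypothetical study reporting a total Larsen score over 30 joints ...
d_total = standardized_cfb(6.0, 12.0)
# ... and the same data reported as an average score per joint.
d_per_joint = standardized_cfb(6.0 / 30, 12.0 / 30)
print(d_total, d_per_joint)
print(approx_se_standardized(d_total, n=50))
```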

Meta-analysis of joint structural deterioration over time
Mean CFB in TSS, ES, JSN, and the standardized Larsen scores obtained from the selected studies were combined with Bayesian random-effects meta-analysis models to estimate joint deterioration over time for the DMARD-IR and non-DMARD-IR populations [12]. Any study that did not explicitly state whether patients had previously shown an inadequate response to DMARDs was assigned to the non-DMARD-IR population. Depending on the availability of data by endpoint and population, two sets of analyses were performed. In the first set, all non-biologic DMARDs were considered as one group and the development of the outcomes of interest over time was estimated; studies evaluating only NSAIDs were not combined with studies evaluating DMARDs. In the second set, the development of the outcomes over time was compared among individual DMARDs (e.g., methotrexate [MTX], leflunomide [LEF], and azathioprine [AZA]) using only data from comparative studies. All analyses provided curves reflecting the pooled mean CFB in TSS, ES, JSN, and the standardized Larsen score over time, along with their respective 95 % credible intervals (95 % CrIs).
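How a pooled curve and its 95 % CrI follow from posterior draws can be sketched for the simplest case, a linear trend CFB(t) = β·t through the origin. The draws below are simulated placeholders, not the study's posterior, and the through-origin form is an assumption; the actual models in Additional file 2 may be parameterized differently.

```python
import random
import statistics

def pooled_curve(slope_draws, weeks):
    """Summarize a linear pooled curve CFB(t) = slope * t from posterior draws.

    Returns {week: (posterior mean, 2.5th percentile, 97.5th percentile)}.
    """
    summary = {}
    for t in weeks:
        values = [b * t for b in slope_draws]
        # 40 quantile groups give cut points at 2.5 %, 5 %, ..., 97.5 %
        q = statistics.quantiles(values, n=40, method="inclusive")
        summary[t] = (statistics.fmean(values), q[0], q[-1])
    return summary

# Placeholder posterior draws for the weekly slope (hypothetical values).
rng = random.Random(1)
draws = [rng.gauss(0.095, 0.02) for _ in range(5000)]
for week, (mean, lower, upper) in pooled_curve(draws, [12, 52, 104]).items():
    print(f"Week {week}: {mean:.2f} (95 % CrI {lower:.2f}, {upper:.2f})")
```

Summarizing the draws at each time point, rather than transforming a summary of the slope, keeps the interval valid for non-linear curves as well.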
Each Bayesian analysis comprised the data, a likelihood, the model parameters, and a model linking them. Bayesian methods formally combine a prior probability distribution (reflecting prior beliefs about the possible values of the parameters of interest) with the likelihood of the observed data to obtain a posterior probability distribution of the parameters of interest [13]. A normal likelihood was assumed.
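The prior-times-likelihood combination can be illustrated with a grid approximation for a single common mean θ and three hypothetical study results. The normal likelihood matches the text; the common-mean (no heterogeneity) set-up and the data are assumptions made for illustration, and the prior is the vague N(0, 10⁴) distribution used for the model parameters.

```python
import math

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def grid_posterior(grid, log_prior, log_likelihood):
    """Posterior on a grid: prior x likelihood, renormalized to sum to 1."""
    log_post = [log_prior(th) + log_likelihood(th) for th in grid]
    peak = max(log_post)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical mean changes (and SEs) from three studies at one time point.
y = [1.0, 1.4, 1.2]
se = [0.3, 0.4, 0.3]

def log_prior(theta):
    return normal_logpdf(theta, 0.0, 100.0)  # N(0, 10^4): variance 10^4 means SD 100

def log_likelihood(theta):
    return sum(normal_logpdf(yi, theta, si) for yi, si in zip(y, se))

grid = [i / 1000 for i in range(-1000, 4001)]  # theta from -1 to 4 in steps of 0.001
post = grid_posterior(grid, log_prior, log_likelihood)
post_mean = sum(th * p for th, p in zip(grid, post))
print(round(post_mean, 3))
```

With such a vague prior, the posterior mean is essentially the inverse-variance weighted mean of the data (about 1.17 here), i.e., the data dominate.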
We considered both statistical models that assume outcomes develop linearly over time and models that allow outcomes to develop non-linearly over time [14][15][16]. An advantage of these meta-analysis models is that all available data points from each included study are captured, even if time points differ across studies, and (non-)linear trends in the development of outcomes over time are estimated [12]. Details of the meta-analysis models are provided in Additional file 2. Models 1 and 2 were used to estimate the development over time with all non-biologic DMARDs grouped together; Models 3 and 4 were used for the comparative analyses of different DMARDs. The deviance information criterion (DIC), a measure of model fit that penalizes model complexity, was used to compare the different models [17,18]. The model with the lowest DIC, and therefore the "best fit", was considered the most appropriate.
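The DIC comparison can be sketched with the standard definition DIC = D̄ + pD, where D̄ is the posterior mean deviance and pD = D̄ − D(θ̄) is the effective number of parameters. In practice the deviance draws come from the MCMC output; the numbers in the usage lines are hypothetical.

```python
import math
import statistics

def normal_deviance(y, se, fitted):
    """Deviance (-2 x log-likelihood) of observed means under a normal likelihood."""
    return sum(
        math.log(2 * math.pi * s * s) + ((yi - f) / s) ** 2
        for yi, s, f in zip(y, se, fitted)
    )

def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD) with pD = mean deviance - deviance at the posterior mean."""
    d_bar = statistics.fmean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d

# Hypothetical summaries for two candidate models (e.g., linear vs non-linear trend).
dic_linear, pd_linear = dic([40.2, 41.0, 39.8, 40.6], 37.9)
dic_fp, pd_fp = dic([38.9, 39.5, 38.7, 39.3], 35.8)
best = "linear" if dic_linear < dic_fp else "fractional polynomial"
print(best)
```

The pD term is what penalizes complexity: a more flexible curve must reduce the mean deviance by more than it adds in effective parameters to win.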
To prevent prior beliefs from influencing the results, non-informative prior distributions were used. The prior distributions of all model parameters were normal distributions with a mean of 0 and a variance of 10⁴, except for the heterogeneity parameter, which had a uniform prior over the range 0 to 10. With such a "flat" prior, it is assumed that, in advance of the actual data, any parameter value is "equally" likely. As a consequence, the posterior results are driven by the data rather than by the prior distribution. The result of the Bayesian analysis is a (joint) posterior distribution for the model parameters of interest. The model parameters were estimated using a Markov chain Monte Carlo method as implemented in the WinBUGS software package [19].
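To make the estimation step concrete, the sketch below fits a normal random-effects model to hypothetical data with a random-walk Metropolis sampler. This is a deliberately simple, swapped-in sampler in plain Python, not WinBUGS (which uses its own samplers). It reuses the stated priors, N(0, 10⁴) for the pooled mean and Uniform(0, 10) for the heterogeneity, here assumed to be the between-study SD τ. Study-specific effects are integrated out analytically, so each study contributes y_i ~ N(μ, se_i² + τ²).

```python
import math
import random
import statistics

def log_posterior(mu, tau, y, se):
    """Unnormalized log posterior of (mu, tau) with study effects integrated out."""
    if not 0.0 < tau < 10.0:
        return -math.inf  # outside the Uniform(0, 10) prior on tau
    lp = -0.5 * mu * mu / 1e4  # N(0, 10^4) prior on mu, up to a constant
    for yi, si in zip(y, se):
        v = si * si + tau * tau  # within-study variance + between-study variance
        lp += -0.5 * math.log(v) - 0.5 * (yi - mu) ** 2 / v
    return lp

def metropolis(y, se, n_iter=20000, burn_in=5000, step=0.2, seed=42):
    """Random-walk Metropolis; returns post-burn-in draws of (mu, tau)."""
    rng = random.Random(seed)
    mu, tau = 0.0, 1.0
    current = log_posterior(mu, tau, y, se)
    draws = []
    for i in range(n_iter):
        mu_new = mu + rng.gauss(0.0, step)
        tau_new = tau + rng.gauss(0.0, step)
        proposed = log_posterior(mu_new, tau_new, y, se)
        # Accept with probability min(1, exp(proposed - current)); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < proposed - current:
            mu, tau, current = mu_new, tau_new, proposed
        if i >= burn_in:
            draws.append((mu, tau))
    return draws

# Hypothetical mean changes and SEs from four study arms at one time point.
y = [1.0, 1.2, 0.9, 1.1]
se = [0.2, 0.2, 0.2, 0.2]
draws = metropolis(y, se)
mus = [m for m, _ in draws]
taus = [t for _, t in draws]
print(statistics.fmean(mus), statistics.median(taus))
```

A real analysis would also check convergence (e.g., several chains from different starting values) before summarizing the draws.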

Study selection
The study selection process, including the reasons for exclusion, is summarized in Fig. 1. The literature search identified 2076 potentially relevant studies, of which 1892 (91 %) were excluded at the first (title and abstract) review. The full-text review of the 184 remaining studies excluded another 111 studies. Of the 73 articles meeting the selection criteria, another 29 were excluded because of insufficient data on the outcomes of interest during follow-up. Overall, 44 studies were included.
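The selection counts can be cross-checked in a few lines; every number below comes from the text above.

```python
identified = 2076
excluded_at_screening = 1892      # title/abstract review
excluded_at_full_text = 111
excluded_insufficient_data = 29   # met criteria but lacked follow-up outcome data

full_text_reviewed = identified - excluded_at_screening
met_criteria = full_text_reviewed - excluded_at_full_text
included = met_criteria - excluded_insufficient_data
share_screened_out = 100 * excluded_at_screening / identified

print(full_text_reviewed, met_criteria, included, round(share_screened_out, 1))  # 184 73 44 91.1
```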

Study characteristics
Key study and patient characteristics are presented in Tables 1 and 2. All 44 studies were RCTs, except for one retrospective [31] and two prospective cohort studies [33,52], and were published between 1982 and 2009, with follow-up periods ranging from 24 weeks to 2 years. Twelve studies concerned the DMARD-IR population [20][21][22][23][24][25][26][27][28][29][30][31] and the remaining 32 studies concerned the non-DMARD-IR population. Only two studies provided data on the Larsen score for the DMARD-IR population, while 10 provided data on the (modified) TSS in this population. For the non-DMARD-IR population, 17 studies provided data on the Larsen score only, 14 studies provided data on the (modified) TSS, and one study provided data on both. Because some studies contributed data from more than one arm (three studies in the DMARD-IR population [21,22,31] and 13 studies in the non-DMARD-IR population [35,36,38,41,48,50,51,54,57,59,61,62,63]), the total number of treatment arms included in the analyses was 63: 16 for DMARD-IR and 47 for non-DMARD-IR. Of the 47 arms that formed the non-DMARD-IR population, only 12 included patients who had previously been exposed to DMARDs. Hence, the majority of patients in this population could be considered DMARD-naïve.

Patient characteristics
The DMARD-IR and non-DMARD-IR populations showed comparable distributions of gender, age, baseline C-reactive protein (CRP) level, and erythrocyte sedimentation rate (ESR). On average, 73 % of the patients were women in the DMARD-IR studies versus 70 % in the non-DMARD-IR studies; the average ages were 54 and 52 years, respectively. The median of the reported baseline CRP levels was 2.2 mg/dL across the DMARD-IR studies and 2.6 mg/dL across the non-DMARD-IR studies. The median of the reported baseline ESR values was 43 mm/h across the DMARD-IR studies and 44 mm/h across the non-DMARD-IR studies. Baseline CRP level and ESR varied widely across the non-DMARD-IR studies. As expected, disease duration had a skewed distribution and was longer among DMARD-IR patients (median of 88 months) than among non-DMARD-IR patients (median of 15 months). For the DMARD-IR population, the median of the reported baseline Health Assessment Questionnaire (HAQ) scores across the studies was 1.7 (the HAQ scale ranges from zero [no disability] to three [completely disabled]), and the median baseline TSS was 53. For the non-DMARD-IR population, the median of the reported HAQ scores across studies was 1.0 and the median TSS was 11.9.

Joint structural deterioration over time in the DMARD-IR population
The mean CFB in TSS within the DMARD-IR population, as obtained from the individual studies, is presented in Fig. 2. These results were combined with a random-effects meta-analysis (Additional file 2, Model 1), in which the change in TSS developed linearly over time, from 1.14 at Week 12 to 9.84 at Week 104 (Table 3). There was a high probability that continuation of treatment with any one non-biologic DMARD in the setting of inadequate response would result in deterioration of the joint structure over time. Table 3 also presents the results of the analysis (Additional file 2, Model 3) that compared joint deterioration observed with MTX and AZA. Continuation of treatment with AZA was associated with greater joint deterioration than continuation of treatment with MTX in this DMARD-IR population.
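Because the pooled TSS curve for this population is linear, its two reported point estimates imply a weekly rate. This back-of-envelope sketch uses only the published point estimates (not the posterior, so no CrI can be recovered) and shows that the implied line passes very close to the origin, at roughly 0.095 TSS units per week, or about 4.9 units per year. The helper name is hypothetical.

```python
def implied_linear_rate(t1, y1, t2, y2):
    """Slope of the straight line through two (week, mean CFB) points."""
    return (y2 - y1) / (t2 - t1)

rate = implied_linear_rate(12, 1.14, 104, 9.84)  # TSS units per week
per_year = rate * 52                               # approximate TSS units per year
intercept = 1.14 - rate * 12                       # implied CFB at week 0

print(f"{rate:.4f} per week, {per_year:.2f} per year, intercept {intercept:.3f}")
```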
The progression in ES extracted from the individual studies and the pooled results (0.51 at Week 12 and 4.43 at Week 104) obtained with the meta-analysis (Additional file 2, Model 1) are presented in Fig. 3. There was a 98 % to 100 % chance that ES would deteriorate over time when DMARD-IR patients received minimal treatment with a non-biologic DMARD (Table 3). As inferred from the comparative analysis, a greater rate of deterioration was expected with AZA than with MTX.
When DMARD-IR patients were treated with MTX alone, mean changes in JSN were 0.36 at Week 12 and 3.14 at Week 104 (Fig. 4; Table 3).
For joint deterioration as measured with the Larsen score, only two studies with sufficient data were available for the DMARD-IR population [27,31]. As neither study reported repeated intermediate observations, no meta-analysis model for change over time was estimated. At 24 weeks, a deterioration of 0.52 was observed with MTX [27]. In the other study, the deterioration at 52 weeks ranged from 0.53 points with SSZ to 1.06 points with gold salts [31].

Joint structural deterioration over time in the non-DMARD-IR population
The rate of deterioration in the non-DMARD-IR population was lower than in the DMARD-IR population. The progression of the TSS for the non-DMARD-IR population is presented in Fig. 5 and Table 4. Individual study results were combined with a random-effects meta-analysis model in which the change in TSS from baseline developed non-linearly (fractional polynomial with p1 = p2 = 1; Additional file 2, Model 2), showing an increase in TSS from 1.56 at Week 12 to 5.13 at Week 104. Up to at least 104 weeks, there was at least a 94 % chance that continuing treatment with one DMARD would result in deterioration of the joint structure in the non-DMARD-IR population (Table 4). An analysis (Additional file 2, Model 4) comparing LEF and MTX, based on two head-to-head RCTs, indicated a similar rate of deterioration with LEF and MTX [37,59].
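The fractional polynomial terms named here (p1 = p2 = 1 for TSS; p1 = 1, p2 = 0.5 for ES) can be sketched with the usual second-order fractional polynomial convention: a power of 0 denotes log(t), and a repeated power p contributes t^p and t^p·log(t). As a purely illustrative exercise, the coefficients are back-calculated so that a no-intercept curve passes exactly through the two published TSS point estimates (1.56 at Week 12, 5.13 at Week 104); they are not the fitted parameters of Model 2, which also include study-level random effects. Function names are hypothetical.

```python
import math

def fp_term(t, p):
    """Fractional polynomial transform; power 0 denotes log(t)."""
    return math.log(t) if p == 0 else t ** p

def fp2_basis(t, p1, p2):
    """Second-order fractional polynomial basis for t > 0.

    Repeated powers (p1 == p2) use t^p and t^p * log(t).
    """
    first = fp_term(t, p1)
    second = fp_term(t, p2) * math.log(t) if p1 == p2 else fp_term(t, p2)
    return first, second

def solve_two_point(t_a, y_a, t_b, y_b, p1, p2):
    """Solve b1*x1(t) + b2*x2(t) = y through two points (no intercept; Cramer's rule)."""
    a11, a12 = fp2_basis(t_a, p1, p2)
    a21, a22 = fp2_basis(t_b, p1, p2)
    det = a11 * a22 - a12 * a21
    b1 = (y_a * a22 - a12 * y_b) / det
    b2 = (a11 * y_b - a21 * y_a) / det
    return b1, b2

b1, b2 = solve_two_point(12, 1.56, 104, 5.13, p1=1, p2=1)

def tss_curve(t):
    x1, x2 = fp2_basis(t, 1, 1)
    return b1 * x1 + b2 * x2

print(round(tss_curve(52), 2))
```

With these illustrative coefficients b2 is negative, so the curve flattens and peaks at roughly 143 weeks before declining, a reminder that such curves should not be extrapolated beyond the 2-year range of the data.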
For ES, the individual study results were also combined with a random-effects non-linear meta-analysis model (fractional polynomial with p1 = 1 and p2 = 0.5; Additional file 2, Model 2), which showed that ES worsened over time from 0.69 at Week 12 to 2.93 at Week 104 when non-DMARD-IR patients continued to receive one traditional DMARD (Fig. 6; Table 4). The comparative analysis indicated no expected difference in the rate of deterioration between LEF and MTX (Additional file 2, Model 4).
For JSN and the Larsen score, linear meta-analysis models were appropriate to reflect the deterioration up to 104 weeks (Fig. 7 and Table 4; Additional file 2, Model 1). One study evaluated treatment with NSAIDs only and reported a mean standardized change from baseline in the Larsen score of 0.01 up to 52 weeks [54].

Discussion
In this study, the development of joint structural deterioration among minimally treated patients with moderate-to-severe RA was estimated based on currently available published data. Estimates were obtained for two populations: a DMARD-IR population, consisting of patients who had previously shown an inadequate response to non-biologic DMARDs, and a non-DMARD-IR population, consisting of both non-biologic DMARD-naïve and non-biologic DMARD-experienced patients without an inadequate response to any DMARD. In the identified studies, the minimally treated DMARD-IR patients were receiving monotherapy with MTX, AZA, sulfasalazine (SSZ), or gold salts, with most patients receiving MTX. In the included non-DMARD-IR studies, DMARD treatment consisted of MTX, SSZ, LEF, ciclosporin A (CSA), hydroxychloroquine (HCQ), or D-penicillamine.
Only one study was identified in which patients were treated with NSAIDs only, and this study was not included in the meta-analysis. For both populations, treatment with one DMARD resulted in deterioration of joint structure over a 2-year period as measured with the TSS, ES, JSN, and Larsen scores. Under the assumption that the minimal clinically important difference is about 1 % of the maximum possible TSS and Larsen scores, the estimated changes in TSS over a 2-year period can be considered clinically relevant, in particular for the DMARD-IR population [64,65]. Depending on the time point assessed and the measure examined, the rate of deterioration in the DMARD-IR population was about 1.5 to 2 times that in the non-DMARD-IR population. Based on RCT evidence, the rate of deterioration with AZA was greater than with MTX in the DMARD-IR population. For the non-DMARD-IR population, LEF and MTX showed similar progression over time. The greater rate of deterioration observed in the DMARD-IR population than in the non-DMARD-IR population makes sense, given the negative impact that a history of non-biologic DMARD failure would be expected to have on the effectiveness of continued treatment with a non-biologic DMARD. Other possible underlying causes of the difference in progression rates include differences in disease duration, rheumatoid factor status, and disease activity.
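The relevance threshold and the DMARD-IR to non-DMARD-IR ratios can be checked against the pooled Week 104 estimates from the Results. The maximum score is an assumption: 448 is the maximum of the van der Heijde modification of the Sharp score, and other modifications used in the included studies may have different maxima.

```python
MAX_TSS = 448          # assumption: van der Heijde-modified Sharp score maximum
mcid = 0.01 * MAX_TSS  # about 1 % of the maximum possible score

# Pooled mean CFB at Week 104 as (DMARD-IR, non-DMARD-IR), from the Results.
week104 = {"TSS": (9.84, 5.13), "ES": (4.43, 2.93), "JSN": (3.14, 2.55)}

tss_exceeds_mcid = {
    "DMARD-IR": week104["TSS"][0] > mcid,
    "non-DMARD-IR": week104["TSS"][1] > mcid,
}
ratios = {measure: ir / non_ir for measure, (ir, non_ir) in week104.items()}

print(round(mcid, 2), tss_exceeds_mcid)
print({m: round(r, 2) for m, r in ratios.items()})
```

At Week 104 the ratios come out at about 1.92 (TSS), 1.51 (ES), and 1.23 (JSN), so the size of the difference depends on the measure. Both TSS point estimates exceed the assumed 4.48-unit threshold, the non-DMARD-IR estimate only narrowly and with a CrI that includes zero.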
However, it is important to note that for a subset of the identified studies it was not clear whether the patients were exclusively DMARD-IR. These studies were assigned to the non-DMARD-IR group to keep the DMARD-IR group as homogeneous as possible. As a result, the defined non-DMARD-IR population may partly have consisted of patients with a history of failed treatment with a DMARD. This possible misclassification may have led to an overestimation of the deterioration in this group and should be kept in mind when comparing the degree of joint deterioration in the DMARD-IR and non-DMARD-IR populations.
The relevant studies were identified by means of a systematic search of the literature and included both RCTs and observational designs. Given the objective of the meta-analysis, only those arms of comparative studies were selected in which patients were treated with NSAIDs or a single DMARD (with or without additional NSAID or corticosteroid use). Although many RCTs were included, often only one treatment arm was used (e.g., the MTX-only arm from biologic trials in DMARD-IR populations). Consequently, evidence obtained from observational studies and from RCTs was handled in the same way. RCTs that compared different single non-biologic DMARDs were included and provided the evidence for comparisons between DMARDs. Comparative analyses were only possible for AZA versus MTX and for LEF versus MTX. Although RCTs provided comparative data for some other DMARDs, these were not part of a connected network of RCTs and therefore could not be used in the planned analyses.
Many of the MTX treatment arms comprising the DMARD-IR population were obtained from RCTs in which a biologic DMARD was evaluated. In the included studies the patients in these MTX arms were not assigned to biologic treatment within the study time horizon; only the MTX dose could be increased in case of non-response. Hence, the observed structural deterioration is a reflection of the limitations of MTX in this population.
The included studies reported joint deterioration at different time points, with outcomes reported up to 2 years of follow-up. With the meta-analysis models used, all available time points were analyzed simultaneously to estimate a curve reflecting joint structural deterioration over time. It cannot be assumed that extrapolating these curves beyond this 2-year period gives a valid representation of joint structural deterioration over the longer term.

The vast majority of studies used the modified Sharp score to assess joint erosion and joint space narrowing. The modified score includes the feet in the radiographic assessment, in addition to the wrists and hands scored in the original Sharp score [6][7][8][9][10]. The study by Hamdy [21] used the earlier version of the Sharp score. Despite the difference in scoring, we included the study by Hamdy in the analysis of the DMARD-IR population. We do not expect this difference to be a major source of between-study heterogeneity in the development of TSS over time; in fact, the TSS values reported by Hamdy are very consistent with those of the other studies included in that analysis (Fig. 2).

The included studies were characterized by variability in patient characteristics, especially among the non-DMARD-IR studies. As a result, heterogeneity in joint structural deterioration over time was observed. Random-effects models were used to capture this heterogeneity; however, these models do not explain it. In the future, it will be of interest to evaluate whether certain patient characteristics are associated with differences in joint deterioration. However, meta-regression analyses that use study-level data to evaluate the impact of patient characteristics on outcomes or treatment effects can be prone to ecological bias [66,67]. For such an evaluation, access to patient-level data is preferable. In this context it would be interesting, for example, to evaluate the independent effects of steroid use, disease duration, rheumatoid factor status, and disease activity on joint deterioration.

Table 3 Mean change from baseline in TSS and subscores in the DMARD-IR population as estimated with meta-analysis. p-value, probability of joint structural deterioration relative to baseline. a Estimated based on comparative data only, using models for relative treatment effects (see Models 3 and 4 in Additional file 2), which allows comparative interpretation of MTX and AZA findings.

Table 4 Mean change from baseline in TSS and subscores plus Larsen score in the non-DMARD-IR population as estimated with meta-analysis.

This study was sponsored by Pfizer Inc. As specified in the Competing interests statement, all authors were funded by Pfizer either as consultants or employees, and Pfizer was involved in the decision to submit this manuscript for publication.

Availability of data and materials
All data supporting the findings are contained in the manuscript. Two supplementary files include the literature search strategy and the statistical models used for the meta-analysis.

Authors' contributions
JPJ designed the study, performed the analyses, and co-wrote the manuscript. MCV coordinated and helped conduct the systematic literature review and data extraction, performed part of the analyses, and co-wrote the manuscript. JCC made substantial contributions to the analysis and interpretation of data, and revised the article critically for important intellectual content. JDB, SHZ, and GCW made substantial contributions to the study design, interpretation of results, and development of the manuscript. All authors read and approved the final version of the manuscript.

Competing interests
At the time of performing the analysis presented in this manuscript, JPJ and MCV were employees of Mapi and were paid consultants to Pfizer Inc in connection with the manuscript's development. All other authors are employees of Pfizer Inc.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable, as this is a meta-analysis of published data.