Disease activity level, remission and response in established rheumatoid arthritis: Performance of various criteria sets in an observational cohort, treated with anti-TNF agents

Background Most composite indices of disease activity and response criteria in RA have been validated and compared in clinical trials rather than routine care. We therefore wanted to compare the performance of the DAS28, SDAI and CDAI activity indices, their activity states, their response criteria, and also compare with the ACR response criteria in an observational clinical setting. Methods Agreement between the criteria sets was investigated using κ statistics in a non-randomized cohort of 1789 RA patients from southern Sweden, starting their first course of anti-TNF-treatment. Mean disease duration was 12 years. Completer analysis was used. Results Agreement between high, moderate and low activity states was moderate or substantial, with κ = 0.5 or better for all criteria. Agreement between SDAI and CDAI disease states was > 90% in these categories with κ > 0.8. DAS28 original and modified cut point remission had good agreement (κ = 0.91). Agreement between responses was substantial at the overall/ACR20 level (about 95%, κ = 0.7 or better) for all criteria. By contrast, agreement was poor between moderate and high level responses. Conclusion Disease activity states according to the various indices perform similarly and show substantial agreement at all levels except remission. Agreement between SDAI and CDAI states is excellent. Response criteria, applied at the individual patient level, are hard to interpret and show poor agreement, except at the lowest level of response. Thus, they should not be applied uncritically in clinical practice.

trial setting, where the aim is to compare one treatment to another (standard) or none at all (placebo), response is the outcome measure of choice. The efficacy of a treatment is expressed as the proportion of a patient group that meets a certain response criterion. Indeed, disease activity indices, their LDA and remission criteria are not recommended as primary end points in trials due to low sensitivity to change [5]. On the other hand, response criteria, such as the DAS based EULAR moderate or good [6], SDAI [2] and CDAI [3] minor or major, and American College of Rheumatology (ACR) 20, 50 or 70% response [7], may be less suitable in routine care. Unlike the group level, individual responses will depend on the criteria set chosen, at least at the stricter levels [8].
The cut-points for the various activity indexes and the components of the response criteria are summarized in Table 1.
The aim of the present study was to apply and compare the performance of various activity indices and response criteria in an observational cohort of patients from southern Sweden with established RA, treated with their first course of TNF-blockers. The findings may also serve as a reference of what levels of activity and response are to be expected in the "real life" setting without any formal or financial restrictions. The comparisons will also serve as a repository for future studies using one or the other measure as their primary end-point, relating them to results observable with other primaries.

Methods
The South Swedish Arthritis Treatment Group (SSATG), a network of hospital and office based rheumatologists in southern Sweden, maintains a database into which all courses of treatment with biologic drugs for RA and other arthritides are entered as described elsewhere [9].
Patients eligible for the study had a diagnosis of RA, as judged by the treating rheumatologist, and started their first course of treatment with infliximab, etanercept, or adalimumab from March 1999 through December 2006. Due to the quality and safety monitoring character of the register, no formal ethics committee approval was required.
Patients were evaluated at baseline and 3 months. The follow up protocol included tender and swollen 28 joint counts, visual analogue scale (VAS) for pain and patient global, evaluator global Likert scale, ESR, and CRP. Thus DAS28, SDAI and CDAI could be calculated at each time point as well as fulfilment of EULAR, ACR, SDAI and CDAI response criteria at 3 months. The number of patients falling into the different categories of disease activity (including remission) when applying the cut off levels proposed for DAS28 (original and modified), SDAI and CDAI were then calculated.

Statistical methods
Descriptive statistics was used throughout. For each level of disease activity and response, a reference criteria set was chosen. Patients fulfilling and not fulfilling this com- prised the basis for comparison, as another criteria set was applied to the same patients. The agreement, i e patients fulfilling both sets (positive agreement) or neither (negative agreement) were then calculated as percentages and assessed with κ statistics [10]. The procedure was then repeated with a new reference criteria set. Thus, all comparisons were made between pairs of disease states and according to different indices, but each comparison was on the same level of disease activity. Likewise, pair wise comparisons of response criteria fulfilment at each level of response were performed. For example, DAS28 original cut point low was compared to SDAI low, and ACR20 was compared to EULAR overall response, etc. The κ values indicate the level of agreement beyond chance between two dichotomous variables. A frequently cited rule of thumb [11] is that κ values > 0.8 correspond to almost perfect agreement, 0.61-0.8 to substantial, 0.41-0.6 to moderate and 0.2-0.4 to fair agreement. κ = 0 denotes chance agreement. Completer analysis has been applied due to the observational nature of the study.

Results
The eligibility criteria were met by 1789 patients. The baseline characteristics for all patients, the 1258 with 3 month data and the 531 patients lacking data at 3 months are given in Table 2. At baseline the majority (> 95%) of patients were in high/intermediate disease activity irrespective of criteria sets used ( Figure 1A). At 3 months, 12-19% of patients had high, 39-46% had moderate, and 38-49% had low disease activity depending on criteria set. According to DAS28 original and modified cut points, 23 and 19%, respectively, were in remission, whereas about 8% were in remission according to SDAI and CDAI ( Figure 1B).
The agreement between each defined disease state at baseline and 3 months, e.g. DAS28 low original cut point, DAS28 low modified cut point, SDAI low and CDAI low, are given in Table 3, along with their respective κ values. In general, agreement between the activity states determined by the cut-points of the various activity indices was moderate or substantial with κ values of about 0.5 or higher. At 3 months, remission according to the 2 DAS28based categories had excellent agreement, κ = 0.84, and so did SDAI and CDAI, κ = 0.91, whereas agreement between DAS28-based agreement criteria and SDAI/CDAI was moderate, κ = 0.40-0.47. SDAI and CDAI had excellent agreement at all levels of disease activity (Table 3).
For the calculation of response at 3 months, 1258 patients had complete data, with baseline characteristics similar to the total cohort, but also to those with missing 3 month data (table 2). Response rates were 60-70% at the ACR20/ overall level, 42% for EULAR original moderate, 19% for EULAR modified moderate, 25% for SDAI and CDAI minor response. At the major/good level, 35% responded according to EULAR original and ACR50 and 50-55% according to EULAR modified, SDAI and CDAI. Fourteen per cent were ACR70 responders ( Figure 2).
The responder agreements at 3 months are summarized in Table 4. Substantial agreement (κ = 0.54-0.91) was found at the modest response level of ACR20, EULAR overall, SDAI overall and CDAI overall. Agreement at good/major level between EULAR original on one hand and, ACR70, SDAI, or CDAI on the other was poor (κ = 0.17-0.27), whereas EULAR modified was in much better agreement with SDAI and CDAI, κ = 0.69. SDAI and CDAI showed good agreement with κ values between 0.68 and 0.91 at the different response levels, while the agreement between other criteria sets at different response levels was more variable.

Discussion
The major findings in this observational study of a nonrandomized cohort of established RA patients, receiving their first course of anti-TNF treatment and followed for 3 months, were that the disease activity states according to the various indices performed similarly and showed moderate or substantial agreement at all levels except remission. SDAI and CDAI stages show excellent agreement. Agreement of response criteria is substantial at low Disease stages at A) baseline and B) at 3 month follow up according to the different criteria sets  The κ value tells to which extent agreement is not explained by chance. The positive/negative agreements give, however, an idea of the degree of agreement; the closer total agreement (positive + negative) is to 100%, the higher degree of agreement. Furthermore, κ depends on sample size, and it is hard to interpret, if the compared groups are small. This is illustated by the limited κ agreement regarding both ACR70 responders and those reaching remission.
Completer analysis may lead to bias, if the drop-outs at 3 months substantially differ from the completers. The baseline data of the drop-outs and patients with 3 month data was therefore examined separately and not found to differ from the whole cohort in any clinically relevant way (Table 2). Also, in the observational setting, only patients remaining on therapy are contributing their data as time  goes by, which tends to inflate the results. Missing information at 3 month follow up in our setting is higher than in the British Biologics register [12]. There may be several explanations for this, but the mandatory response demanded for drug continuation in the British setting may be one important factor. Also, the patients in the British register differ on numerous baseline characteristics such as higher disease activity and very high disability measured by the HAQ, all factors that may influence response rates and future disease states [12,13]. Thus each setting has its features and face to face comparison should always be done with great care.
The distribution between disease state categories at baseline and after 3 months is what may be expected in RA patients with mean disease duration of 12 years and who failed at least 2 different DMARDs. Achieving remission and even low disease activity thus appears not to be a too common event in established RA, treated with TNF-blockers. This is in agreement with other observational studies [16]. In our setting with many long-standing cases, erosive disease may preclude reaching remission in many patients although inflammation per se is suppressed by treatment. The small number of patients in low disease activity (LDA) at baseline is explained by patients taking high doses of prednisolone, also an accepted indication for commencing TNF-blocker treatment according to the Swedish guidelines.
The agreement between disease states, as defined by the DAS28 original and modified cut points, SDAI and CDAI proposed cut points, seems to be substantial. This is in agreement with comparisons based on trial data [14]. The excellent agreement of the disease state groups based on SDAI and CDAI irrespective of activity level may be accounted for by their great similarity: they contain the same components, added together, except for the CRP that is excluded from the CDAI. Our data thus support the notion of acute phase reactant values contributing little to the overall disease activity estimate [3]. The SDAI and CDAI remission is achieved in fewer patients than the DAS28 remission that appears less strict in this cohort, in agreement with previous findings [15]. At the remission level, agreement is almost perfect between EULAR original and modified cut off, and also between SDAI and CDAI at 3 months, as may be expected ( Table 3). The moderate agreement between the DAS28-based and SDAI/CDAI remission, may partly be accounted for by the greater strictness of the latter ( Figure 1B). Due to low patient numbers reaching remission, κ must be interpreted with caution in this category.
Response criteria are intended for clinical trials, and it is thus not surprising that they perform poorly when applied to individual RA patients in clinical practice. Agreement (Table 3) is variable across criteria and response levels with a tendency to be better at the less strict overall level, where EULAR moderate and good are merged to one group, as are SDAI and CDAI minor and major responders. In this manner, substantial agreement is achieved with ACR20, which includes all ACR50 and 70 responders, in accordance with previous findings [5,8].
The very poor κ values (often close to 0) for moderate response comparisons seem to indicate random agreement at the individual level. This may in part be due to the construction of the criteria. Thus, the ACR50 responders include all the ACR70 responders, whereas the EULAR moderate category does not include the EULAR good. The same mechanism may be operative concerning SDAI and CDAI minor and major responders. EULAR response according to original and modified definitions also exhibit poor agreement, especially at the moderate level. This is an expected finding, given the different cut-points of DAS28 used in each case. The value of response criteria in monitoring patients in routine care thus seems to be limited. In contrast, group level responses in a clinical setting can give some indication of the value of a particular treatment in routine care. As far as the management of individual RA patients with established disease is concerned, at least in our setting, the achievement of an absolute degree of disease activity seems to be a more relevant treatment goal than fulfilling a response criterion, i.e. achieving a given degree of improvement [8]. Our data thus do not support the use of response criteria as aid in the monitoring of RA-patients treated routinely with TNFblockers, but this should be verified in other clinical cohorts.
In the development and evaluation of new treatment modalities, as well as in routine care, a unified concept of disease activity measurement and treatment aims will be beneficial. The widespread use of reproducible and simple composite measures of RA activity will facilitate this development. The present study provides support for this, but further validation of the indices in other cohorts is desirable.

Conclusion
Disease activity states according to the various indices perform similarly and show substantial agreement at all levels except remission. Agreement between SDAI and CDAI states is excellent. Response criteria, applied at the individual patient level, are hard to interpret and show poor agreement, except at the lowest level of response. Thus, they should not be applied uncritically in clinical practice.