Minimal clinically important decline in physical function over one year: EPOSA study

Background The Australian/Canadian hand Osteoarthritis Index (AUSCAN) and the Western Ontario and McMaster Universities knee and hip Osteoarthritis Index (WOMAC) are the most commonly used clinical tools to manage and monitor osteoarthritis (OA). Few studies have as yet reported longitudinal changes in the AUSCAN index regarding the hand. While there are published data regarding WOMAC assessments of the hip and the knee, the two sites have always been evaluated separately. The current study therefore sought to determine the minimal clinically important difference (MCID) in decline in the AUSCAN hand and WOMAC hip/knee physical function scores over 1 year using anchor-based and distribution-based methods. Methods The study analysed data collected by the European Project on Osteoarthritis, a prospective observational study investigating six adult cohorts with and without OA by evaluating changes in the AUSCAN and WOMAC physical function scores at baseline and 12–18 months later. Pain and stiffness scores, performance-based measures (grip strength and walking speed) and health-related quality of life measures were used as the study’s anchors. Receiver operating characteristic curves and distribution-based methods were used to estimate the MCID in the AUSCAN and WOMAC physical function scores; only the data of participants with paired (baseline and follow-up) AUSCAN and WOMAC scores were included in the analysis. Results Out of the 1866 participants who were evaluated, 1842 had paired AUSCAN scores and 1845 had paired WOMAC scores. The changes in the AUSCAN physical function score correlated significantly with those in the AUSCAN pain score (r = 0.31). Anchor- and distribution-based approaches converged in identifying 4 as the MCID for decline in the AUSCAN hand physical function. Changes in the WOMAC hip/knee physical function score were significantly correlated with changes in both the WOMAC pain score (r = 0.47) and the WOMAC stiffness score (r = 0.35). The different approaches converged in identifying 2 as the MCID for decline in the WOMAC hip/knee physical function. Conclusions The most reliable MCID estimates of decline over 1 year in the AUSCAN hand and WOMAC hip/knee physical function scores were 4 and 2 points, respectively.


Background
The Australian/Canadian Hand Osteoarthritis Index (AUSCAN) [1] and the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) [2,3] scales are self-report instruments measuring pain, stiffness and physical function linked to osteoarthritis (OA), and have been used by the European Project on Osteoarthritis (EPOSA) to assess personal and societal variables affected by OA, such as quality of life (QoL), social participation, and health care use in several ageing European cohorts. The individuals enrolled in the project were receiving treatment for severe OA, had undiagnosed or untreated OA or did not have OA at all [4].
The Minimum Clinically Important Difference (MCID) is defined as the smallest change in a score that a patient perceives as beneficial or detrimental [5]. There are different types of MCID, depending on whether there has been an improvement or a worsening in the variable being measured and on the external standard being employed [6].
Until now, to our knowledge, the MCID in the AUSCAN hand and WOMAC hip/knee physical function scales has received scant attention. Specifically, few studies report longitudinal changes in the AUSCAN [7]. While some studies have investigated the WOMAC scales [7][8][9], the hip and the knee have always been evaluated separately [10][11][12][13][14][15]. Moreover, the MCID has almost always been considered from an improvement perspective, as the majority of studies have aimed to examine the efficacy of pharmacological interventions [11,13], rehabilitation programs [8,9], and/or surgical treatments [10,12,14,15].
The aim of the current study was therefore to estimate the MCID in the AUSCAN and WOMAC physical function subscales using distribution-based and anchor-based methods for longitudinal changes. We postulated that the AUSCAN and the WOMAC physical function scores would worsen [16,17] with time (i.e., there would be a rise in both) and that the changes in the AUSCAN and WOMAC physical function scores would correlate significantly with changes in other well-established OA health variables or performance-based measures [18][19][20][21][22][23][24][25].

Participants
The current study analysed data collected by the European Project on OSteoArthritis (EPOSA), a population-based study involving cohorts living in Germany, Italy, the Netherlands, Spain, Sweden, and the UK, that recruited 2942 adults between the ages of 65 and 85 years. All the participants gave written informed consent; the study design and methodology are outlined in detail elsewhere [4]. The study design was granted approval by the appropriate local ethics committees. The project aimed to evaluate the participants once at baseline (between November 2010 and November 2011) and a second time 12-18 months later. During the assessment the participants underwent a clinical examination and were interviewed at home or in a health care centre by trained physicians and nurses using a standardized questionnaire.
Physical function, pain and stiffness of hip and/or knee OA were measured at baseline and 12-18 months later using the three subscales of the WOMAC (24 items grouped into 3 scales: pain (5 items), stiffness (2 items), and physical function (17 items)) that utilized a 5-point Likert scale. Hip/knee pain and stiffness were defined as the maximum value of two joints [2,3].
Grip strength of both hands was measured twice at baseline and again 12-18 months later using a strain gauge dynamometer; the result (the highest values of the right and left hands summed and divided by 2) is expressed in kilograms [26].
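The grip strength summary described above can be illustrated with a minimal sketch. The function name and the readings are hypothetical and serve only to show the arithmetic (best reading per hand, then the mean of the two).

```python
def grip_strength_kg(right_trials, left_trials):
    """Mean of the best right-hand and best left-hand readings, in kg."""
    if not right_trials or not left_trials:
        raise ValueError("at least one reading per hand is required")
    return (max(right_trials) + max(left_trials)) / 2


# Hypothetical readings: two trials per hand, in kilograms.
score = grip_strength_kg(right_trials=[28.0, 30.0], left_trials=[26.0, 25.0])
print(score)  # 28.0
```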
Walking speed was measured as the time, registered in seconds, needed to walk a 3-m course marked out on the floor, with no obstructions for an additional 2 ft at both ends.
Anxiety and depression were evaluated at baseline and 12-18 months later using the Hospital Anxiety and Depression Scale (HADS), a 14-item self-assessment instrument that measures anxiety and depression separately [27]. A score of 8 or higher (scores range between 0 and 21) on either subscale indicates an altered state.
Health-related QoL was measured at baseline and 12-18 months later using the 5-level EQ-5D, which consists of a descriptive system comprising five dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression), and the EQ VAS, a vertical visual analogue scale [28]. The scores of the EQ-5D were converted into a single index using the time trade-off (TTO) valuations from the general population of the UK; index scores range between −0.594 and 1, with values approaching 1 indicating good or satisfying health. Scores of the EQ VAS range between 0 and 100, with higher scores indicating better health.
Clinical diagnosis of OA was formulated on the basis of the participant's medical history and a physical examination (only at baseline), utilising algorithms in accordance with the clinical criteria developed by the American College of Rheumatology (ACR) [29] and the recommendations of the European League Against Rheumatism [30].
Clinical hand OA (classified as present vs absent) was diagnosed using specific AUSCAN sections [1]: the cut-off in the algorithm was ≥3 for hand pain and ≥1 for stiffness. At least 2 of the following criteria were required for a diagnosis of hand OA: a) hard tissue enlargement of two or more joints, b) hard tissue enlargement of two or more distal inter-phalangeal joints, c) deformity of at least one hand joint. Swelling of the metacarpophalangeal joints was a criterion assessed only in the English and German participants.
Clinical hip/knee OA, defined as the presence of OA in at least one of these joints, was diagnosed on the basis of the outcome of specific WOMAC sections. Pain in the hip/knee on at least one side [2,3] was evaluated during the physical examination using a cut-off ≥3. For a diagnosis of knee OA, at least two of the following were necessary: a) a morning stiffness score from mild to extreme; b) crepitus with active motion on at least one side at the physical examination; c) bone tenderness on at least one side at the physical examination; d) bone enlargement at the physical examination on at least one side; e) no palpable warmth of synovium at the physical examination in either knee. All of the following were needed for a positive hip OA assessment: a) pain in the hip on at least one side associated with restricted hip internal rotation at a physical examination; b) morning stiffness of the hip lasting < 60 min, evaluated using the stiffness section of the WOMAC with a score from mild to extreme.
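The knee and hip rules above can be read as simple counting rules. The sketch below is one possible encoding, not the EPOSA algorithm itself: the class, field and function names are hypothetical, and it assumes that the pain cut-off must be met in addition to the listed knee criteria, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class KneeExam:
    womac_pain: int           # WOMAC pain score on the worse side
    morning_stiffness: bool   # stiffness score from mild to extreme
    crepitus: bool            # crepitus with active motion, at least one side
    bone_tenderness: bool     # at least one side
    bone_enlargement: bool    # at least one side
    no_palpable_warmth: bool  # no warmth of synovium in either knee


def clinical_knee_oa(exam, pain_cutoff=3, min_criteria=2):
    """Assumed rule: pain at or above the cut-off AND at least two criteria."""
    criteria = [
        exam.morning_stiffness,
        exam.crepitus,
        exam.bone_tenderness,
        exam.bone_enlargement,
        exam.no_palpable_warmth,
    ]
    return exam.womac_pain >= pain_cutoff and sum(criteria) >= min_criteria


def clinical_hip_oa(pain_with_restricted_rotation, stiffness_under_60_min):
    """Both hip criteria are required."""
    return pain_with_restricted_rotation and stiffness_under_60_min


def clinical_hip_knee_oa(knee_exam, hip_pain_rotation, hip_stiffness):
    """Hip/knee OA is present if at least one of the two joints qualifies."""
    return clinical_knee_oa(knee_exam) or clinical_hip_oa(hip_pain_rotation, hip_stiffness)
```

Under this reading, a participant with a pain score of 3 plus morning stiffness and crepitus would be classified as having clinical knee OA, whereas one meeting all five criteria with a pain score of 2 would not.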

Statistical analysis
Data analyses and graphical presentations were carried out using SAS software (SAS System, SAS Institute Inc., Cary, NC), version 9.4. Data were analysed using a set of weights calculated per sex and per 5-year age class with respect to the 2010 Standard European Population [4].
The changes over time (in the 12-18 months between the baseline and follow-up evaluations) were evaluated as continuous variables using the non-parametric signed rank test. Spearman's correlation was used to compare the changes in the AUSCAN and WOMAC physical function scales with the changes in the other variables; the Cronbach α coefficient was used to measure the scales' reliability (internal consistency), with values of α ≥ 0.7 reflecting good reliability [31].
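For readers unfamiliar with the internal-consistency measure mentioned above, the following is a minimal sketch of the standard Cronbach α formula, α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The function name and the toy responses are illustrative assumptions; the study itself used SAS.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, each holding k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    columns = list(zip(*item_scores))                  # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of five people to three items scored 0-4.
responses = [[0, 1, 1], [1, 1, 2], [2, 3, 2], [3, 3, 4], [4, 4, 4]]
alpha = cronbach_alpha(responses)
print(alpha >= 0.7)  # True: these toy items would count as reliable
```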
Only the data of the participants whose assessments were considered complete, that is, those who had completed both the baseline and follow-up assessments, were included in the statistical analysis. The MCID was calculated from the changes in scores between the baseline and follow-up measurements. Since the MCID for subject-reported outcome measures may vary across populations and contexts, we followed the recommendation of Revicki et al. [32] and used multiple approaches to estimate the MCID in the AUSCAN and WOMAC physical function scores, in order to triangulate on a single value or on a small range of values.
For anchor-based estimation of MCID we used the receiver operating characteristic (ROC) curve on the change score in the anchor. The variables assessed as possible anchors for the AUSCAN hand OA physical function score were: the AUSCAN for hand OA pain, the AUSCAN for hand OA stiffness, the grip strength, the HADS anxiety, the HADS depression, the EQ-5D-5L, and the EQ VAS. The variables evaluated as possible anchors for the WOMAC for hip/knee OA physical function score were: the WOMAC for hip/knee OA pain, the WOMAC for hip/knee OA stiffness, the walking-test time, the HADS anxiety, the HADS depression, the EQ-5D-5L, and the EQ VAS.
A variable was considered a suitable anchor if the change in the physical function score correlated significantly with the change in the anchor, with a correlation coefficient |r| ≥ 0.30 [31].
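The magnitude part of this anchor screen can be sketched as follows. This is an illustrative stdlib implementation of Spearman's rank correlation with the 0.30 threshold; the function names are hypothetical, and the significance test that the criterion also requires is deliberately omitted here.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def passes_anchor_threshold(rho, threshold=0.30):
    """Magnitude check only; significance must be assessed separately."""
    return abs(rho) >= threshold
```

With the correlations reported in this study, a coefficient of 0.31 or 0.47 passes the magnitude check, whereas one of 0.25 would not.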
An ROC curve was constructed for those participants showing stable or worsened anchor scores; the area under the curve (AUC) summarizes the instrument's ability to distinguish between individuals who have or do not have a minimal clinically important difference in functionality. The criteria used to select the optimal cut-off were: the Youden index (J) [33], the Euclidean distance (D), and the equality of sensitivity and specificity (S). The percentage of participants exceeding the MCID was estimated for each cut-off value.
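The three cut-off criteria can be illustrated with a small sketch. It assumes that the anchor has been dichotomised into worsened (1) versus stable (0) and that a participant is flagged when the change in the physical function score is at or above the candidate cut-off; the data and function names are hypothetical.

```python
from math import hypot


def roc_points(changes, worsened):
    """(cut-off, sensitivity, specificity) for each rule 'change >= cut-off'."""
    n_pos = sum(worsened)
    n_neg = len(worsened) - n_pos
    points = []
    for c in sorted(set(changes)):
        tp = sum(1 for x, w in zip(changes, worsened) if w and x >= c)
        tn = sum(1 for x, w in zip(changes, worsened) if not w and x < c)
        points.append((c, tp / n_pos, tn / n_neg))
    return points


def optimal_cutoffs(changes, worsened):
    pts = roc_points(changes, worsened)
    # J: maximise sensitivity + specificity - 1 (Youden index).
    j = max(pts, key=lambda p: p[1] + p[2] - 1)
    # D: minimise the distance to the perfect corner (sens = spec = 1).
    d = min(pts, key=lambda p: hypot(1 - p[1], 1 - p[2]))
    # S: minimise the gap between sensitivity and specificity.
    s = min(pts, key=lambda p: abs(p[1] - p[2]))
    return {"J": j[0], "D": d[0], "S": s[0]}


# Hypothetical change scores and anchor status (1 = anchor worsened).
changes = [0, 0, 1, 1, 2, 3, 4, 5]
worsened = [0, 0, 0, 1, 0, 1, 1, 1]
print(optimal_cutoffs(changes, worsened))  # {'J': 3, 'D': 3, 'S': 2}
```

Even on this toy data set the criteria do not all select the same cut-off, which is why several criteria are compared before a single MCID is chosen.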
Anchor-based and distribution-based methods were used to determine the MCID, and on that basis the participants were divided into two categories: worse/not worse functionality at the second assessment point. Triangulation was used to examine multiple values from different approaches and to converge on a single value; agreement between approaches was assessed with Cohen's k (range from −1 to 1, with 1 indicating perfect agreement).
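As an illustration of the agreement statistic, the sketch below computes Cohen's k for two worse/not worse classifications of the same participants, using the standard definition k = (p_o − p_e) / (1 − p_e). The function name and the classifications are hypothetical.

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    p_expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_expected == 1:
        return 1.0  # both methods assign one identical label to everyone
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical classifications (1 = worse, 0 = not worse) from two MCID cut-offs.
method_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
method_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(method_a, method_b), 2))  # 0.74
```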

Results
Out of the original 2942 participants who completed the baseline evaluation (Fig. 1), 2455 (83%) agreed to undergo a follow-up evaluation 12-18 months later. It was not possible to re-evaluate 487 participants (16.6% of the baseline sample) because they had died, were untraceable, or declined to participate. Compared with the completers, the non-completers were significantly older, more likely to be female, less educated, and predominantly Italian.
Since information from the German group was incomplete, data from that country (n = 336, 14% of 2455) were not analysed. The data that were analysed, and upon which our results are therefore based, represent 1866 participants who completed both of the study's evaluations. Of these, 1842 had paired AUSCAN measurements and 1845 had paired WOMAC measurements. Table 1 (weighted data) shows that approximately 17% of the participants had clinical hand OA and more than 22% had clinical hip/knee OA. The median changes in the physical function scores detected using the AUSCAN hand and WOMAC hip/knee subscales were significant, as were the median changes in the WOMAC stiffness score. There were no significant changes in the AUSCAN pain and stiffness subscales and in the WOMAC pain subscale over time.
All other changes in grip strength, walking-test time, the HADS scales, and the EQ-5D-5L were significant over time. Table 2 shows the correlation coefficients between the change in the AUSCAN and the WOMAC physical function scores and the change in the other measures that were considered as possible anchors. For the hand, only the changes in the AUSCAN pain scores were significantly correlated with the changes in the AUSCAN physical function score (r = 0.31).

AUSCAN physical function estimates of MCID
Using hand pain as an external anchor, the estimates of the MCID in the AUSCAN hand physical function were consistent and equal to one. The only divergent criterion was the Youden index, according to which the estimated MCID for the hand was four. Using distribution-based methods, the estimate for significant worsening in the AUSCAN physical function score ranged from 1 to 8.
Based on these cut-offs, the participants were divided into worse vs not worse in functionality 12-18 months after baseline (Fig. 2).
When the percentages of participants classified as worse by the distribution-based MCID methods were compared with those produced by the anchor-based methods, two sets of methods agreed most strongly according to Cohen's k. The first set (k values of approximately 0.97), formed by the ROC D (Euclidean distance), the ROC S (equal sensitivity and specificity), and the SRM2, identified approximately 34-35% of the participants as having a clinically significant physical function decline at the 12-18 month follow-up evaluation. The second set (k values ranging from 0.95 to 1), formed by the ROC J (Youden index), the SRM5, and the SEM63, identified 24% of the participants as having a clinically significant decline. In view of the concordance and the recommendation to privilege the anchor-based methods [32], we compared the MCID based on the ROC J/SEM63 and the one based on the ROC D/S. Out of the 639 participants classified as worse by the ROC D/S criterion, 453 were also identified by the ROC J/SEM63. The MCID based on the ROC J/SEM63, which estimated a change of 4 points, was found to be the most reliable criterion to analyse the loss of hand functionality at 12-18 months.

WOMAC physical function estimates of MCID
The changes in the WOMAC hip/knee physical function score were significantly correlated with the changes in the WOMAC pain (r = 0.47) and stiffness (r = 0.35) scores, and the correlations were moderate. The ROC analysis of the anchor responses to the WOMAC pain and stiffness scales estimated that the MCID was almost always one. Once again, the divergent criterion was the Youden index with stiffness as the anchor, which estimated a two-point MCID for hip/knee physical function. Using distribution-based methods, the estimate for significant worsening in the WOMAC physical function score ranged from 1 to 9. Figure 3 shows the percentage of participants who had worse hip/knee functionality 12-18 months after baseline according to the different methods utilized.
The MCIDs for hip/knee physical function decline that showed the highest degree of agreement were those based on the SRM2 and those that, using the WOMAC pain score as the anchor, minimized the Euclidean distance or the difference between sensitivity and specificity, or maximized the Youden index (k = 0.94). The SRM2 also agreed with those that, using the WOMAC stiffness score as the anchor, minimized the Euclidean distance or the difference between sensitivity and specificity (k = 0.94), or maximized the Youden index (k = 1). These methods identified 30 and 33% of participants with clinically significant hip/knee physical function decline 12-18 months after baseline, respectively. Finally, there was strong agreement (k = 0.89) between the SEM63 and the Youden index with the stiffness score used as an anchor; they detected 26 and 30% of the participants, respectively.
As the highest degree of agreement was found between the Youden index (using ROC with stiffness as the anchor) and the SRM2, the MCID based on these criteria seemed to be the most suitable one to analyse the loss of hip/knee functionality at 12-18 months. Both criteria were consistent in identifying two as the best discriminating WOMAC physical function change cut-off.

Discussion
While the AUSCAN and WOMAC scales are the most commonly used clinical tools to manage and monitor OA patients, to our knowledge the MCID for decline picked up by these measures has never been evaluated. Only the study of Angst et al. [9], which focused on patients with OA of the lower extremities after a rehabilitation intervention, reported an MCID showing worsening in the WOMAC hip/knee physical function of approximately 1.33, based on a scale from 0 to 10. The study's initial premise that hand and hip/knee physical function would deteriorate significantly over a year's time was confirmed by our data showing higher AUSCAN and WOMAC physical function scores.
Although the relevance of the MCID approach remains controversial and despite the fact that physical function values can depend on the population being examined, the context, the time and methods used [25], it remains an important assessment instrument. The magnitude of the MCID for the WOMAC instrument was smaller in the participants studied here than in other studies [7][8][9][10][11][12][13][14][15]. As those studies focused on patients before and after interventions, the differences in the magnitude of the MCID might be connected to patient expectations regarding surgical interventions, as compared to non-surgical interventions [38]. Other factors that might explain the differing MCID values could be: the severity of the participant's baseline health status, the length of the period being examined, the accuracy of the measurement instruments, and the direction of the change captured by the MCID (i.e. towards improvement or worsening).
Other studies have demonstrated that changes in the AUSCAN and WOMAC physical function scores correlate significantly with changes in other generic, adapted, or performance-based measures used to gauge pain and function in the hand and hip/knee OA [18][19][20][21][22][23][24][25]. Our study did not, however, find any correlations between the changes in the AUSCAN and WOMAC physical function and the changes in other more generic measures such as the Hospital Anxiety and Depression Scale and the European QoL Surveys. The anchors with the strongest correlations were the pain-specific questionnaires (AUSCAN and WOMAC pain subscale), presumably because they are basically measures of pain during physical activities rather than unspecific pain measures [39].
The estimates of the MCID in both the AUSCAN hand physical function and the WOMAC hip/knee physical function according to the ROC analysis, using different anchor responses and criteria, were consistent. The only divergent criterion was the Youden index, which overestimated the MCIDs.
But as explained above, besides an anchor-based method, we also used a distribution-based approach to estimate the MCID, given that the two are complementary [40]. Distribution-based approaches, which are based on the statistical characteristics of the samples studied and reliable measures, generated a wide range of different estimates of the MCID for the AUSCAN hand and WOMAC hip/knee physical function that were greater than the anchor-based estimates. Both methods converged to a common result.
While distribution-based estimates are able to furnish supportive information when the change is significant, they do not provide a direct measure of the minimum clinically important difference. That is why precedence was given to the anchor-based estimates. Moreover, since the MCIDs estimated using distribution-based methods were greater than the mean change reported 12-18 months after baseline, it is possible that the data from the distribution-based methods provide information about clinical significance but might overestimate the true MCID.
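To make concrete what "distribution-based" means, the sketch below computes two commonly used quantities: the standard error of measurement, SEM = SD_baseline · √(1 − reliability), and the standardized response mean, SRM = mean change / SD of change. These are textbook definitions given as an assumption; the exact variants used in this study (labelled SRM2, SRM5 and SEM63) are not reproduced here, and the scores below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error_of_measurement(baseline_scores, reliability):
    """SEM = SD of baseline scores * sqrt(1 - reliability coefficient)."""
    return stdev(baseline_scores) * sqrt(1 - reliability)


def standardized_response_mean(baseline_scores, followup_scores):
    """SRM = mean of the change scores / SD of the change scores."""
    changes = [f - b for b, f in zip(baseline_scores, followup_scores)]
    return mean(changes) / stdev(changes)


# Hypothetical physical function scores for four participants.
baseline = [2, 4, 6, 8]
followup = [3, 6, 7, 10]
sem = standard_error_of_measurement(baseline, reliability=0.91)
srm = standardized_response_mean(baseline, followup)
```

Both quantities depend only on the spread and reliability of the scores in the sample, not on any external judgement of whether a change matters, which is why they are treated as supportive rather than primary evidence.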
This study has several limitations. First, the changes in outcome measures could hypothetically be associated with baseline levels. Second, there is the possibility that the participants selected did not experience much or any change over the 12-18 month study period. Third, there may be important differences between the populations studied and in the cut-off values of the MCID for physical function decline. Indeed, estimates can differ depending on the instrument, domain, country, and condition, at least for condition-specific measures, and further research is required before the estimates presented here can be generalized to other instruments [7].
The study's most important strength was its large population base: the participants were randomly selected from older community-dwelling European populations. Not only persons with OA but also large numbers of healthy individuals not affected by OA were analysed. The methodology used was the same in all of the countries, and OA was diagnosed in accordance with standardized international guidelines [4]. Our study was based on valid, standardized, internationally used measures (the WOMAC and AUSCAN Indexes) suggested by guidance documents [41,42], and they proved to be quite reliable. The data are longitudinal in nature. Another strength is that the study describes the decision-making process leading to the selection of a single value from a range of different MCID cut-offs by comparing the percentages of change scores exceeding the MCID. The process considered the concordance between the estimates from anchor-based methods and those from distribution-based approaches, privileging the former [40,43,44], and evaluated the differences in terms of clinical OA.

Conclusion
To conclude, the study shows that the AUSCAN hand and WOMAC hip/knee physical function scores are indeed sensitive to the effects of OA. The data analysed using various health and physical performance measures as external anchors showed that the minimally important decline over 1 year in the AUSCAN and WOMAC physical function scores was four and two points respectively. Further research is required to confirm the robustness of these estimates and to evaluate their temporal consistency and country-dependency.

Availability of data and materials
The data that support the findings of this study are available from https://www.eposa.org/ but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the EPOSA Research Group.