Nonsteroidal anti-inflammatory drugs (NSAIDs), cyclooxygenase-2 selective inhibitors (coxibs) and gastrointestinal harm: review of clinical trials and clinical practice

Background Gastrointestinal harm, known to occur with NSAIDs, is thought to be lower with NSAID and gastroprotective agent, and with inhibitors selective to cyclooxygenase-2 (coxibs) at usual plasma concentrations. We examine competing strategies for available evidence of reduced gastrointestinal bleeding in clinical trials and combine this evidence with evidence from clinical practice on whether the strategies work in the real world, whether guidance on appropriate prescribing is followed, and whether patients adhere to the strategies.
Methods We used a series of systematic literature searches to find full publications of relevant studies for evidence about the efficacy of these different gastroprotection strategies in clinical trials, and for evidence that they worked and were adhered to in clinical practice – whether they were effective. We chose to use good quality systematic reviews and meta-analyses when they were available.
Results Evidence of efficacy of coxibs compared to NSAIDs for upper gastrointestinal bleeding was strong, with consistent reductions in events of about 50% in large randomised trials (34,460 patients), meta-analyses of randomised trials (52,474 patients), and large observational studies in clinical practice (3,093 bleeding events). Evidence on the efficacy of NSAID plus gastroprotection with acid suppressants (proton pump inhibitors, PPIs, and histamine antagonists, H2As) was based mainly on the surrogate measure of endoscopic ulcers. The limited information on damage to the bowel suggested that NSAID plus PPI was more damaging than coxibs. Eleven observational studies studied 1.6 million patients, of whom 911,000 were NSAID users, and showed that 76% (range 65% to 90%) of patients with at least one gastrointestinal risk factor received no prescription for gastroprotective agent with an NSAID. The exception was a cohort of US veterans with previous gastrointestinal bleeding, where 75% had gastroprotection with an NSAID. When gastroprotection was prescribed, it was often described as inadequate. A single study suggested that patient adherence to prescribed gastroprotection was low.
Conclusion Evidence for efficacy of gastroprotection strategies with NSAIDs is limited. In clinical practice few patients who need gastroprotection get it, and those who get it may not take it. For coxibs, gastroprotection is inherent, although probably not complete.


Background
Patient-reported outcome measures have become an important part of the assessments used in clinical studies. One of the outcome measures intended for upper extremity disorders is the 30-item disabilities of the arm, shoulder and hand (DASH) questionnaire, which has been assessed regarding reliability, cross-sectional validity and longitudinal validity in a variety of arm disorders [1][2][3]. The use of the DASH has been growing rapidly in clinical trials and other studies of upper extremity disorders and it is now available in several languages [4].
From the original DASH questionnaire a shorter version, named the QuickDASH, has been developed using what was called a "concept-retention" approach [5]. The QuickDASH consists of 11 items from the original 30-item DASH. The QuickDASH may be more appealing to use than the DASH because a shorter questionnaire is associated with less burden on the responder as well as less administrative burden. To date, data regarding the development process and various aspects of reliability and validity have been published only for the English version of the QuickDASH [5]. It is important that translated versions of shortened questionnaires are also subjected to an appropriate validation process. Furthermore, little is known about how the QuickDASH scores can be interpreted in comparison to the DASH scores or which version is more favorable with respect to precision of the scoring.
To determine whether a shortened questionnaire may be used to replace an existing full-length questionnaire, several assessments can be performed to show that the short version measures what the original version measures. Different aspects of cross-sectional validity can be compared [6]. Further, longitudinal construct validity, which concerns the measure's ability to detect a true change in health status and its precision in detecting changes of different magnitudes (also referred to as responsiveness or sensitivity to change), needs to be addressed to determine the clinical usefulness of the short version [7][8][9].
The purpose of this study was to evaluate the performance of the 11-item QuickDASH in comparison to the full-length 30-item DASH regarding different aspects of validity and reliability. The data for the QuickDASH were extracted from the full-length DASH.

Design
This study was designed as a reanalysis of collected data for the 30-item DASH questionnaire, from which scores for both the DASH and the QuickDASH were calculated. The data collection process for the assessment of the longitudinal construct validity of the DASH has been described previously [10]. The study was conducted in agreement with the local ethical guidelines for clinical studies and informed consent was obtained from the participants.

Questionnaire
The DASH questionnaire mainly consists of a 30-item disability/symptom scale. The two optional scales of the DASH (sport/music and work) were not part of the study. Each item in the disability/symptom scale has 5 response options. If at least 27 of the 30 items are completed, a scale score ranging from 0 (no disability) to 100 (most severe disability) can be calculated.
From the full-length DASH the 11 items that constitute the QuickDASH were extracted. To calculate a QuickDASH score at least 10 of the 11 items must be completed. Similar to the DASH, each item has 5 response options and, from the item scores, scale scores are calculated, ranging from 0 (no disability) to 100 (most severe disability).
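The scoring rules described above can be illustrated with a minimal sketch. The text gives only the completion thresholds and the 0 to 100 range; the transformation used here (mean of the answered items minus 1, multiplied by 25, with items coded 1 to 5) is the commonly published DASH scoring rule and is an assumption with respect to this study. The function names are hypothetical.

```python
from typing import Optional, Sequence


def scale_score(responses: Sequence[Optional[int]], max_missing: int) -> Optional[float]:
    """Return a 0-100 disability/symptom score, or None if too many items are missing.

    Assumption (not stated in the text): items are coded 1-5 and the score is
    (mean of answered items - 1) * 25.
    """
    answered = [r for r in responses if r is not None]
    if len(responses) - len(answered) > max_missing:
        return None  # not enough completed items to compute a score
    if any(r < 1 or r > 5 for r in answered):
        raise ValueError("item responses must be coded 1 to 5")
    mean_response = sum(answered) / len(answered)
    return (mean_response - 1) * 25


def dash_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """30-item DASH: at least 27 of the 30 items must be completed."""
    if len(responses) != 30:
        raise ValueError("the DASH disability/symptom scale has 30 items")
    return scale_score(responses, max_missing=3)


def quickdash_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """11-item QuickDASH: at least 10 of the 11 items must be completed."""
    if len(responses) != 11:
        raise ValueError("the QuickDASH has 11 items")
    return scale_score(responses, max_missing=1)
```

Because both scores are rescaled means of the answered items, a QuickDASH score can be computed from the same response sheet as the DASH score by selecting the 11 relevant items, which is how the data were handled in this reanalysis.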
The follow-up questionnaire included an item inquiring about change in the status of the arm as compared to its status before surgery. The item had 5 response options: much better, somewhat better, unchanged, somewhat worse, much worse. This item was accidentally missing in the initially mailed questionnaires and was therefore only completed by the last 83 participants, 82 of whom had QuickDASH scores and could be included in the present analysis.

Setting and participants
From an orthopedic department 109 of 118 consecutive patients with upper extremity disorders who fulfilled the eligibility criteria (scheduled for elective surgery, 18 years or older, symptom duration of at least 2 months, able to answer questionnaires) responded to the Swedish version of the DASH before surgery and at the follow-up evaluation. The follow-up was done at 6 to 21 (mean 12) months after surgery.
Of the 109 responders, 105 had responded to at least 10 of the 11 items used in the QuickDASH and were included in the analysis. The mean age of the 105 participants was 52 (range 18-83) years; 60 (57%) were women and 45 were men.

Analysis
The baseline, follow-up and change scores for the DASH and the QuickDASH were calculated for the whole population and for specific diagnostic groups.
To study the longitudinal construct validity the effect size (mean change score divided by the standard deviation of the baseline scores) and the standardized response mean (mean change score divided by the standard deviation of the change scores) for the DASH and QuickDASH were calculated.
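The two responsiveness statistics defined above can be written out directly. This is a minimal sketch, assuming that change is computed as baseline minus follow-up (so that improvement gives a positive value) and that the sample standard deviation is used; neither choice is stated explicitly in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def change_scores(baseline: Sequence[float], follow_up: Sequence[float]) -> list:
    """Per-patient change; assumed here to be baseline minus follow-up."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    return [b - f for b, f in zip(baseline, follow_up)]


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change score divided by the standard deviation of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change score divided by the standard deviation of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)
```

The two statistics share a numerator and differ only in the denominator, so a population with heterogeneous baseline disability but uniform improvement would tend to show a smaller effect size than standardized response mean.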
To compare the performance of the DASH and the QuickDASH in discriminating among patients who differed in the degree of arm-related disability, receiver operating characteristic (ROC) curves were constructed using change scores (baseline to follow-up) as the test variable and patients' responses to the global item concerning perceived change in arm status after surgery as the dichotomized classifying variable; the difference in the areas under the ROC curves for the two questionnaire versions was calculated [11,12]. In the first ROC analysis the DASH and QuickDASH were compared with regard to their ability to discriminate the patients who rated their arm status as "much better" or "somewhat better" (combined into one group) from those who rated it as "unchanged". In the second analysis the ability to discriminate the "much better" group from the "somewhat better" group was compared. The difference in the areas under the ROC curves indicates the magnitude of the difference in the discriminant ability of the two measures. The number of patients who had reported worsening was too small to perform an analysis comparing the ability of the 2 measures to detect deterioration.
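As an illustration of the quantity being compared, the area under the ROC curve can be computed as the proportion of patient pairs (one from each group) in which the patient from the "better" group has the larger change score, with ties counted as one half. The sketch below uses this pairwise formulation, which is not necessarily the computation used in the study; the confidence interval for the difference between correlated areas [11,12] is not reproduced here, and the function names are hypothetical.

```python
from typing import Sequence


def roc_auc(scores_improved: Sequence[float], scores_reference: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient from the improved
    group has a larger change score than a randomly chosen patient from the
    reference group (ties count as 0.5).
    """
    if not scores_improved or not scores_reference:
        raise ValueError("both groups must contain at least one patient")
    wins = 0.0
    for improved in scores_improved:
        for reference in scores_reference:
            if improved > reference:
                wins += 1.0
            elif improved == reference:
                wins += 0.5
    return wins / (len(scores_improved) * len(scores_reference))


def auc_difference(
    dash_improved: Sequence[float],
    dash_reference: Sequence[float],
    quick_improved: Sequence[float],
    quick_reference: Sequence[float],
) -> float:
    """Point estimate of the difference in AUC (DASH minus QuickDASH)."""
    return roc_auc(dash_improved, dash_reference) - roc_auc(quick_improved, quick_reference)
```

An area of 0.5 corresponds to no discrimination between the two groups and an area of 1.0 to perfect discrimination, so a difference close to zero indicates similar discriminative ability of the two questionnaire versions.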
To assess reliability the Cronbach alpha coefficient was calculated for the baseline and follow-up item responses. Agreement between the QuickDASH and the full-length DASH was assessed with the intraclass correlation coefficient (ICC) using the 2-way mixed and absolute agreement model [13]. The difference between the DASH scores and the QuickDASH scores was assessed with the paired-samples t-test. Because the QuickDASH responses were extracted from the full-length DASH some degree of correlation between part of the questionnaire and the whole is expected. To explore the possible effect of this factor we created two hypothetical 11-item short-forms by computer-generated random selection from the 30 items of the full-length DASH. These random 11-item short-forms were analyzed with regard to reliability in a similar fashion as done with the QuickDASH.
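The internal consistency statistic and the random short-form construction described above can be sketched as follows. The Cronbach alpha formula is the standard one; the item indexing (0 to 29), the seed parameter and the function names are assumptions for illustration, and the text does not specify how the study's random selection was generated.

```python
import random
from statistics import variance
from typing import List, Optional, Sequence


def cronbach_alpha(item_rows: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha for complete responses; one row per patient, one column per item."""
    k = len(item_rows[0])
    if k < 2 or len(item_rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item across patients
    item_variances = [variance([row[i] for row in item_rows]) for i in range(k)]
    # Variance of the summed score across patients
    total_variance = variance([sum(row) for row in item_rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def random_short_form(n_items: int = 30, n_selected: int = 11,
                      seed: Optional[int] = None) -> List[int]:
    """Pick item indices for a hypothetical random short form, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_items), n_selected))


def extract_items(full_row: Sequence[float], item_indices: Sequence[int]) -> List[float]:
    """Extract the short-form responses from a full-length response row."""
    return [full_row[i] for i in item_indices]
```

Comparing the QuickDASH with randomly drawn 11-item forms gives a reference point for how much of the observed reliability and agreement is explained simply by any 11-item subset being part of the 30-item whole.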
Test-retest reliability was studied in a subgroup of 30 patients (14 women) with a mean age of 54 (range 27-79) years, who had completed the full-length DASH on two occasions prior to surgery with a median interval of 5 (range 5-17) days [14]. The scores for the DASH, QuickDASH and the random short-forms from both response times were calculated. The ICC (2-way mixed, absolute agreement) and the paired-samples t-test were used for this analysis.
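A minimal sketch of an absolute-agreement ICC from a 2-way layout is given below. The text does not state whether the single-measures or average-measures form was used; the single-measures form, ICC(A,1), is assumed here, and the function name is hypothetical.

```python
from typing import Sequence


def icc_absolute_agreement(rows: Sequence[Sequence[float]]) -> float:
    """Single-measures, absolute-agreement ICC from a 2-way ANOVA decomposition.

    rows: one row per patient, one column per measurement (for example the
    two response times, or the DASH and QuickDASH scores).
    Assumption: the single-measures form is used.
    """
    n = len(rows)      # number of patients
    k = len(rows[0])   # number of measurements per patient
    if n < 2 or k < 2:
        raise ValueError("need at least 2 patients and 2 measurements")
    grand_mean = sum(sum(row) for row in rows) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(row[j] for row in rows) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Unlike a consistency-type ICC, the absolute-agreement form is lowered by a systematic offset between the two measurements, which is relevant when one questionnaire version scores consistently higher than the other.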

Cross-sectional validity
The baseline mean DASH score was 34 (SD 22) and the mean QuickDASH score was 39 (SD 24) (Table 1). A best possible score of zero (ceiling) at baseline was recorded for the QuickDASH in 3 patients (2.9%) and for the DASH in 1 patient (1%) and a score of less than 10 was found in 19 patients (18%) and 20 patients (19%), respectively (Figure 1). At follow-up, 12 patients (14%) had a best possible QuickDASH and 10 (9.5%) a best possible DASH score. No patient had a score exceeding 90 at any evaluation except for 1 patient who had a QuickDASH score of 93 at follow-up.
The mean difference between the QuickDASH and the DASH scores at baseline was 4.2 (SD 5.4) and the mean difference at follow-up was 2.6 (SD 4.6). The mean difference between the QuickDASH and DASH change scores was 1.7 (SD 5.8; 95% CI 0.6-2.8).
For the different diagnostic groups the mean and median QuickDASH scores were higher than the corresponding DASH scores by up to 5 points in most groups (Table 2). Among patients with shoulder disorders the mean DASH score was 44 (SD 15) and the mean QuickDASH score was 49 (SD 18); the difference among patients with CTS was even larger. In the ROC analysis of the change scores for the patients who rated their arm status after surgery as better (including "much better" and "somewhat better") and those who rated it as "unchanged", the difference in the area under the ROC curves for the DASH and QuickDASH was 0.01 (95% CI -0.05-0.07), indicating no difference in their ability to discriminate between the 2 groups (Table 3). In the ROC analysis comparing the ability to discriminate the "much better" group from the "somewhat better" group, the difference in the area under the ROC curves for the DASH and the QuickDASH was 0.03 (95% CI -0.03-0.09).

QuickDASH
In the assessment of cross-sectional reliability among the 105 responders, the alpha coefficient for the scores exceeded 0.90 and the corrected item-total correlations (ITC) exceeded 0.62, except for 1 item with ITC of 0.42 at baseline (Table 4). The ICC values for the agreement between the QuickDASH and the DASH scores were high, exceeding 0.90 at baseline and follow-up.
In the analysis of test-retest reliability, the ICC for the QuickDASH scores on the 2 response times was high. The mean difference between the QuickDASH scores on the first and second response times was almost zero, and the 95% confidence interval was within 4 points in each direction.

Discussion
The aim of this study was to compare the performance of the 11-item QuickDASH with that of the 30-item DASH, with the QuickDASH scores extracted from the responses to the full-length DASH. The results indicate that the DASH can be replaced by the shorter QuickDASH. The magnitude of the differences between the DASH and the QuickDASH scores found in this study implies that the same questionnaire should be used in longitudinal studies because the score differences between the questionnaires may inflate small random differences and make them reach the level of an important change.
In all analyses the QuickDASH scores were slightly higher than the corresponding DASH scores, which may be an advantage for the QuickDASH as this allows for larger improvement to occur, provided that the scores considered as "normal" are equal. Among the different diagnostic groups the QuickDASH mean scores were higher; in fact this difference was more pronounced among patients with greater disability, such as those with shoulder disorder, than patients with little disability, such as those with wrist ganglion (Table 2). This suggested that the QuickDASH potentially had better precision in detecting different degrees of disability. To further assess possible differences in the two measures' ability to detect improvement, ROC curves were studied. In all analyses, the confidence intervals for the difference contained null, indicating that no differences were found between the DASH and the QuickDASH in their ability to discriminate among groups that differed in the degree of self-rated improvement in arm status after surgery.
Table notes: *calculated as the score at time 1 minus the score at time 2. † All ICC values were statistically significant (p < 0.001). ‡ All differences between the DASH and the other 3 forms were significant (p ≤ 0.001) except for random-11 form 2 at follow-up (p = 0.053). a p = 0.6, b p = 0.9, c p = 0.2. ICC, intraclass correlation coefficient; alpha, Cronbach alpha coefficient; ITC, item-total correlation.
Table 3: The area under the receiver operating characteristic (ROC) curve for the DASH and the QuickDASH, constructed using the change scores for patients classified, according to their response to the global item about self-rated improvement in arm status after surgery, into "much better", "somewhat better" or "unchanged"; item administered to 82 participants. The remaining responders to the global item were 6 with "much worse" and 2 "somewhat worse" responses.
In the study assessing the English-version QuickDASH the standardized response mean, calculated for the total population of 171 patients with various disorders, was 0.78 for the DASH and 0.79 for the QuickDASH [5]. In our study the standardized response means for the DASH and QuickDASH were also similar, with values of 0.61 and 0.63, respectively. The mean scores for the DASH and QuickDASH in different diagnostic groups were more similar in the study of the English-version QuickDASH than in our study. However, limited data were available and the score distributions for the groups were not shown, making comparisons difficult.
In this study, as in the study that reported the development and validation of the English version [5], the QuickDASH scores were computed from the full-length DASH responses. It is not known if patients' responses to the 11 items would have differed if only the QuickDASH were administered. In a study of the performance of three SF-36 scales (physical functioning, bodily pain and general health perceptions) no significant differences were found when the scales were administered independently compared to when they were administered within the full 8-scale questionnaire [15]. However, these were full scales and not selected items as is the case with the QuickDASH.
The results of the present study, based on QuickDASH responses extracted from the full-length DASH, are promising but further assessment of the short version administered to different patient groups would be useful. Because of the small number of patients in certain diagnostic groups as well as the small number of patients with unchanged or worsened self-rated arm status the results involving these groups may need to be interpreted with caution.
The reliability of the QuickDASH was good. However, the 2 randomly constructed 11-item forms also had similarly good reliability and agreement with the DASH. The 2 random short forms showed higher scores than the DASH at baseline and follow-up, which also was found with the QuickDASH. Although the differences were statistically significant, their magnitude may not be considered as clinically important. The findings suggest that the 30-item full DASH may contain redundant items and that fewer items would be sufficient for assessing disability with the same degree of reliability and validity. It might be argued that the random short forms may not cover all relevant domains. However, the results of the DASH or QuickDASH are usually not presented as a number of separate components or domains because they are not validated as such. Moreover, the DASH and QuickDASH are predominantly composed of activity items that measure physical disability, leaving little impact for the non-activity items. Because the item responses were extracted from the responses to the full-length DASH it may not be possible to compare with certainty the individual performance of the QuickDASH as compared to other possible short forms of the DASH.
In this study all participants underwent surgery, an intervention that often results in large score change. The effect size and standardized response mean measured with DASH and QuickDASH in populations treated with surgery may be larger than those measured after other interventions. However, the overall effect size in this population was moderate, probably because the different diagnostic groups had large variation in the degree of baseline disability, with some groups having low scores before treatment allowing only small score improvement. The results support the use of the QuickDASH even in the assessment of interventions expected to have smaller effect size.
Figure 1: Scatter plot of the QuickDASH and DASH scores at baseline and follow-up.
The findings of this study are primarily related to the validity and reliability of the Swedish version of the QuickDASH (available online [4]). Although many aspects also may apply to QuickDASH versions that are derived from other translated full-length versions with established validity and reliability, other language versions would still require appropriate assessment.

Conclusion
The results of this study indicate that the QuickDASH can be used instead of the DASH to measure disability/symptom severity with similar precision in a variety of arm disorders.