Comparison of the SF6D, the EQ5D, and the oswestry disability index in patients with chronic low back pain and degenerative disc disease

Johnsen, Lars G; Hellum, Christian; Nygaard, Øystein P; Storheim, Kjersti; Brox, Jens I; Rossvoll, Ivar; Leivseth, Gunnar; Grotle, Margreth

doi:10.1186/1471-2474-14-148

Research article
Open access
Published: 26 April 2013

Comparison of the SF6D, the EQ5D, and the oswestry disability index in patients with chronic low back pain and degenerative disc disease

Lars G Johnsen^1,2,3,
Christian Hellum⁴,
Øystein P Nygaard^1,3,7,
Kjersti Storheim^4,6,
Jens I Brox⁴,
Ivar Rossvoll^1,2,3,
Gunnar Leivseth⁵ &
…
Margreth Grotle^8,9

BMC Musculoskeletal Disorders volume 14, Article number: 148 (2013) Cite this article

12k Accesses
109 Citations
3 Altmetric
Metrics details

Abstract

Background

The need for cost effectiveness analyses in randomized controlled trials that compare treatment options is increasing. The selection of the optimal utility measure is important, and a central question is whether the two most commonly used indexes - the EuroQuol 5D (EQ5D) and the Short Form 6D (SF6D) – can be used interchangeably. The aim of the present study was to compare change scores of the EQ5D and SF6D utility indexes in terms of some important measurement properties. The psychometric properties of the two utility indexes were compared to a disease-specific instrument, the Oswestry Disability Index (ODI), in the setting of a randomized controlled trial for degenerative disc disease.

Methods

In a randomized controlled multicentre trial, 172 patients who had experienced low back pain for an average of 6 years were randomized to either treatment with an intensive back rehabilitation program or surgery to insert disc prostheses. Patients filled out the ODI, EQ5D, and SF-36 at baseline and two-year follow up. The utility indexes was compared with respect to measurement error, structural validity, criterion validity, responsiveness, and interpretability according to the COSMIN taxonomy.

Results

At follow up, 113 patients had change score values for all three instruments. The SF6D had better similarity with the disease-specific instrument (ODI) regarding sensitivity, specificity, and responsiveness. Measurement error was lower for the SF6D (0.056) compared to the EQ5D (0.155). The minimal important change score value was 0.031 for SF6D and 0.173 for EQ5D. The minimal detectable change score value at a 95% confidence level were 0.157 for SF6D and 0.429 for EQ5D, and the difference in mean change score values (SD) between them was 0.23 (0.29) and so exceeded the clinical significant change score value for both instruments. Analysis of psychometric properties indicated that the indexes are unidimensional when considered separately, but that they do not exactly measure the same underlying construct.

Conclusions

This study indicates that the difference in important measurement properties between EQ5D and SF6D is too large to consider them interchangeable. Since the similarity with the “gold standard” (the disease-specific instrument) was quite different, this could indicate that the choice of index should be determined by the diagnosis.

Peer Review reports

Background

An important way of assessing the effects of treatment in health economic evaluations is the use of utility indexes. The outcome scores of general health-related quality of life (HRQoL) questionnaires are stratified into different health states [1, 2] that can then be validated in a community population [3, 4]. Treatment benefit is thus expressed in a way that allows health states that are considered less preferable (0) to full health (1) to be given quantitative values. Because these quantitative values represent a valuation or preference of health states for the patients, they are called utility indexes (more utility for the patient with increasing value). When combined with a follow up period, health utility indexes are used to calculate quality-adjusted life years (QALYs). There are several utility indexes that could be used, and discrepancies exist regarding which index is most suitable [1, 5]. These discrepancies could have implications for calculating cost-effectiveness when comparing alternative treatment options for the same disease [6–10]. Two of the most widely used indexes are the EuroQuol 5D (EQ5D) and the Short Form 6D (SF6D) [4, 7].

Two papers assessed the impact that the measure has on cost-utility estimates [8, 9]. Sach et al. found that the SF6D and EQ5D favored different treatment options for alleviating knee pain when applying the same cost per QALY threshold. Søgaard et al. [11] reported on the interchangeability of the two indexes. When plotting difference between change scores of SF6D and EQ5D against their average in a Bland-Altman plot , they found that the expected between-measure variation was 0.546 [12]. They conclude that although both indexes appear to be psychometrically valid for generic assessment of long-lasting back pain, the variation between them was too great to be considered interchangeable.

From other studies, we could hypothesize that there would be a discrepancy between the EQ5D and SF6D because of differences in valuing similar health states, evidence of a floor effect in the SF6D and a ceiling effect in the EQ5D, and because the SF6D can describe severe health states better than EQ5D [7, 13, 14].

Further work is required in this field to understand these discrepancies. Therefore, the aim of this study was to evaluate change scores values of the EQ5D and SF6D utility indexes in terms measurement error, structural validity, criterion validity, responsiveness, and interpretability according to the COSMIN taxonomy. The psychometric properties of the two utility indexes were compared to a disease-specific instrument, the Oswestry Disability Index (ODI), in the setting of a randomized controlled trial for degenerative disc disease.

Methods

Details about the RCT on which this work is based is reported in detail in Hellum et al. [15]. Between April 2004 and September 2007, 172 patients with diagnosed chronic low back pain and degenerative disc disease were randomized to either surgery with total disc replacement or multidisciplinary rehabilitation. The results from this study have been published previously [15].

Briefly, data were collected in a multicentre randomized controlled trial involving the five university hospitals in Norway. Inclusion criteria included age between 25 and 55 years, LBP for more than a year, degenerative changes in the intervertebral disc in one of the two lowest levels of the lumbar spine and an Oswestry Disability Index score of 30% points or more. Exclusion criteria included generalized chronic pain syndrome and degeneration established in more than two levels. Part of this study was an economic evaluation of chronic low back pain treatment. Patients were randomized to either surgery with insertion of an artificial disc or to non-surgical treatment (a multidisciplinary back rehabilitation program).

The outcomes of patients who completed the SF6D, EQ5D, and ODI at baseline and at 2-year follow up were included in this study.

Instruments

ODI

The ODI is a back-specific questionnaire [16, 17]. Patients rate physical disability in activities of daily living due to low back pain in 10 questions, each of which has verbal response alternatives. Ratings are summed to yield a score ranging from 0 (not disabled at all) to 100 (completely disabled). We used the Norwegian translation of the validated questionnaire (version 2.0) [18].

SF6D

The SF6D utility index is comprised of 11 items from the SF-36 [19] that were revised into a six-dimensional health state classification system. The six dimensions are physical functioning, role limitations, social functioning, pain, mental health, and vitality. It reflects a continuous outcome scored on a 0.29–1.00 scale, with 1.00 indicating full health [3]. SF6D health states were evaluated against a normal population using the Standard Gamble (SG) method. We used the United Kingdom (UK) tariff [3]. The SF6D was calculated based on the Norwegian SF-36 (version 2) with the use of syntax files in SPSS 17(SPSS, New York, US). The syntax files were kindly provided by Dr J. Brazier, University of Sheffield, UK.

EQ5D

For the EQ5D utility index, responses on a questionnaire with five dimensions, each comprised of three levels, are revised into an index with a range from −0.59–1, with 1.00 indicating full health. The 243 possible health states on the EQ5D are evaluated against a normal population using the time trade off method (TTO) [20, 21]. We used the Norwegian version of the EQ5D and syntax files obtained from the EQ5D society using the UK tariff to calculate the index.

Seven-point scale for patient assessment

Many authors suggest a seven-point scale to assess patient outcome in terms of a global score [22]. On the question: “How much benefit do you think you have had from the treatment you have received?” patients answered on a 7-category response scale that ranged from “I am completely disabled” to “I am completely recovered”.

Data analysis

We followed the definitions and recommendations from The COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) checklist when analyzing the psychometric properties of the two utility indexes and ODI in this study [23].

If not otherwise mentioned, SPSS version 17 was used in the statistical analysis.

Measurement error

Measurement error concerns the systematic and random error of a patient`s score that is not attributed to true changes in the construct to be measured [24]. We used the standard error of measurement (SEM) to express instrument imprecision [22, 25–27]. The advantage of using SEM is that it is considered to be an attribute of the measure and not a characteristic of the sample itself [28]. The SEM value could be calculated from a test-retest study or in a group of stable patients. The SEM in this study was calculated as:

S_{w} = S E M = \sqrt{\frac{1}{2 n} \sum d_{t}^{2}}

where s_w is the within-subject standard deviation, d is the difference between two observations in patients i who reported “unchanged” on a four-point scale between 3 and 6 months follow up and n is the number of subjects [29]. The s_w statistics is also called the SEM_consistency[30].

The lowest change that exceeds measurement error and noise at a 95% confidence level is defined as:

M D C_{95} = 1.96 * \sqrt{2} * S E M = 2.77 * S E M

Here, the * $\sqrt{2}$ is introduced because there are two measurements for each patient. The minimum detectable change (MDC) at a 95% confidence level, is denoted MDC₉₅[31]. With a scale value ≥MDC₉₅, we can be 95% certain that a change in the measured underlying construct has really occurred [32].

To assess the agreement between EQ5D and SF6D, a Bland Altman plot was constructed. [12]. The average EQ5D and SF6D change score values were plotted against the mean difference in change score values of both instruments. Limits of Agreement (LoA) based on a +/− 1.96*SD_difference interval for the differences were also constructed.

Structural validity

Structural validity concerns the degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured [33]. Both EQ5D and SF6D are constructed to measure the dimension of general health related quality of life (HRQoL) alongside a continuous scale (from low to high). Using Item Response Theory (IRT), the unidimensionality of the two utility indexes was tested. The category ordering of the questionnaire items (the probability of moving from an easier to a harder accomplished category of item answers in parallel with being increasingly disabled) was also tested.

We employed the unrestricted (Partial-Credit) polytomous model of the Rasch model (for general information about fit to the Rasch model, see Additional file 1) and the test proposed by Smith to reveal unidimensionality [34]. The SF6D and EQ5D were tested for unidimensionality in a principal component analysis (PCA) [35]. We performed a test equating procedure with baseline values from the SF6D and the EQ5D. The response of each patient to a question was tested against what was predicted by the Rasch model. Deviation from the model is expressed in residuals. Independent t-tests were used to test if the magnitude of the residuals represents a significant deviation. The CI calculated for this was 95%. We carried out a binominal test for the proportion of t-tests outside the range of −1.96–1.96. The software used in the Rasch analysis was RUMM 2020 (RUMM Laboratory Pty Ltd.).

Criterion validity

Criterion validity concerns the degree to which the scores of an instrument are an adequate reflection of a “gold standard” when this is present [33]. In this analysis we compared the scores of the EQ5D and SF6D to the disease specific instrument ODI. The rationale was that the ODI has been found to be a responsive and valid measure for patients with LBP [16, 18, 36] and that an improvement assessed by the ODI should be correlated with an improvement assessed by the two utility indexes.

Spearman rank correlation coefficient (r) with 1000 bootstrap replications of the baseline scores was calculated to assess the correlation between the scores of the EQ5D and ODI and SF6D and ODI.

Responsiveness

Responsiveness is defined as the ability of an instrument to detect change over time in the construct to be measured [33]. Responsiveness was assessed by using the ODI and the seven-point global scores at 2-year follow-up as “gold standard”. First, we calculated the Spearman rank correlation coefficient (r) with 1000 bootstrap replications for the correlation between change scores from baseline to 2 year FU for the EQ5D, SF6D and ODI. Second, we analyzed the area under the Receiver Operator Curve (ROC) for the change scores of the EQ5D, SF6D and ODI by using a dichotomization of the patient global scores as follows: Categories 1 to 3 was considered “improved” and categories 4 to 7 were “non-improved”. Sensitivity was defined as the proportion of patients who were correctly classified as “improved” and specificity was defined as the proportion of patients who were correctly classified as “non-improved”. A receiver operating characteristic (ROC) curve was then calculated by plotting every possible change score from baseline to 2 year FU for EQ5D, SF6D and ODI using the global score as an anchor [37, 38]. The area under the ROC curve (AUC) was then calculated. This value corresponds to the possibility of correctly diagnosing a patient as having improved when this is really the case [38] and reflects how responsive the instruments are to detect a change in the underlying construct.

The calculation of ROC curves was performed with MedCalc Statistica software (version 11.1.1. for Windows, Brussels, Belgia).

Interpretability

Interpretability concerns the qualitative meaning of quantitative scores or change in scores. A core question is: “What is the smallest change in score in the construct to be measured which patients consider important? This is expressed as the Minimal Important Change (MIC) value [33], and is calculated based on the sensitivity and specificity results from the ROC analysis described above. The cut-off value for differentiating between patients with or without improvement at optimum sensitivity and specificity was determined using ROC analysis [38]. This corresponds to the upper left point on the ROC curve and it can be interpreted as the point or value that yields the lowest overall misclassification [25, 39].

Study approval

The study was evaluated and approved by the regional Committee for Medical Research Ethics in east Norway. Storage of data was allowed by the Norwegian Data Inspectorate. The study was conducted in accordance with the Helsinki Declaration and the ICH-GCP guidelines and registered at clinicaltrial.gov under the identifier NCT00394732.

Results

At inclusion, there were 52,6% females. Mean age was 41 years and mean (SD) duration of low back pain was 6 years (5,74). Response rates at baseline and 2-year follow up and pre- and post-treatment scores are presented in Table 1. At baseline, 133 out of 173 patients had completely filled out the ODI, the EQ5D, and the SF-36, so values for each of the instruments could be calculated. At 2-year follow up, 113 patients had values for all three instruments, so change scores could be calculated.

Table 1 Response rate at baseline and two year follow up together with pre- and post-treatment scale scores

Full size table

Measurement error

The SEM values calculated for patients who were stable for a period of 3 months are presented in Table 2.

Table 2 SEM and MDC ₉₅ values

Full size table

The smallest change score that could be said to represent a real change beyond measurement error with 95% probability in one individual (MDC₉₅) are presented in Table 2.

The proportion of patients with a change score value ≥MCD₉₅ was 69% for ODI, 57% for SF6D, and 45% for EQ5D.

Figure 1 shows a Bland-Altman plot of the SF6D and EQ5D baseline values. It illustrates a systematic variation (proportional error) in the EQ5D and SF6D scores, with less healthy individuals tending to have a higher score on the SF6D and healthier individuals tending to have a higher score on the EQ5D. The 95% Limits of Agreement (LOA) varied from −0.3 to 0.83 with a mean difference in scale scores (SD) of 0.23 (0.29).

Structural validity

When the SF6D items were used as one subset and the EQ5D items as another, the binominal test showed overlap of the 5% expected value with the 95% CI for each of the indexes. When the EQ5D and SF6D items were combined on a common scale, no overlap was identified. This finding could indicate that the indexes are unidimensional when considered separately, but that they do not exactly measure the same underlying construct [34, 40].

Figures 2 and 3 are graphic representations of the targeting of the SF6D and EQ5D items. Patients “ability” (level of health-related quality) and the item location (moving from an easy to a more difficult category of item answers in parallel with being increasingly disabled) are plotted on the same logarithmic scale. The bars in the top panels represent patient responses, and the bars in the bottom panels represent item thresholds on the scales. A threshold is the 0.5 probability point between adjacent item categories [41]. HRQoL levels (i.e., scoring values) decrease from left to right. Scoring responses outside the range of items represent a floor effect (to the right) or a ceiling effect (to the left). Responses outside the range of the scale give no additional information, and the test cannot discriminate between patients who fall in this area.

From Figures 2 and 3 it can be seen that the EQ5D was relatively well targeted for this group, with no sign of floor or ceiling effects, i.e., all responders were captured within the scale. With a me an person-location value of −0.132, the patients were at a slightly higher level of HRQoL than the scale could express. No floor or ceiling effect could be seen in the SF6D, but here the mean person-location was 1.423. This indicates that there is a tendency for patients to score at the lower end of the scale of this index.

Three of the items in the SF6D showed disordered threshold: question 1: Physical functioning, question 2: Role limitation and question 4: Pain. A better fit to the model was achieved if some of the response categories of these items were omitted. None of the questions in the EQ5D showed disordered thresholds.

Criterion validity

The correlation between baseline scores of ODI and EQ5D was r = 0,58 (n = 114, p=0.000) and for ODI and SF6D: r = 0.38 (n = 114, p = 0.000).

Responsiveness

a)
The correlation between change scores of ODI and EQ5D was r = 0,64 (n = 108, p=0.000) and between ODI and SF6D change scores: r = 0.77 (n = 108, p = 0.000).
b)
Spearman’s rho for the correlation between change scores of the instruments and global score categories was 0.84, 0.55 and 0.76 for ODI, EQ5D and SF6D respectively. The area under the ROC curve, the possibility of correctly discriminating between “improved” or “non-improved” patients with a 95% CI was: 94% (87.5–97.6) for ODI, 90% (82.1–94.6) for SF6D, and 83% (75–90) for EQ5D. The ROC curves are presented in Figure 4.

Interpretability

The MIC values defined as the most optimal cut-off point of change scores plotted on the ROC curve was for ODI: 12.88,(sensitivity 88%, specificity 85%), EQ5D: 0.173 (sensitivity: 73%, specificity 79%) and SF6D: 0,031 (sensitivity 93%, specificity 78%) (Figure 4).

Discussion

The present study failed to show similarity between EQ5D and SF6D in several important measurement properties. EQ5D had a higher value of inherent measurement error than SF6D. The mean difference between baseline score values had a wide 95% Limits of Agreement in the Bland-Altman plot signifying a low degree of agreement between the instruments [12, 42]. Rasch analysis showed that although EQ5D and SF6D separately seem to have unidimensional scale properties they probably do not measure the same underlying construct. SF6D show less similarity with the baseline scores of the disease specific instrument but were more responsive to detect a change in the underlying construct in addition to better ability to correctly diagnosing a patient as having improved when this was really the case even though it did not reach the level of the ODI. The MIC values were quite different and SF6D had a better ability to identify truly change in scale score beyond measurement error.

Van Stel et al. showed that the EQ5D and the SF6D yield dissimilar scores in patients with coronary heart disease, and consequently, they cannot be used interchangeably [43]. This is in line with the Bland-Altman plot pattern we found in our study and in agreement with other previously published reports [6, 13, 43]. Furthermore, we observed that the magnitude of difference between the two instruments in the Bland-Altman plot was beyond the MIC for both instruments and therefore interpreted as clinically significant.

In this study, sensitivity was defined as the proportion of patients that truly improved (true-positive rate), and the sensitivity was the proportion of patients that did not actually improve (true-negative rate). The EQ5D diagnosed fewer patients as clinically improved (change score values beyond MIC). This was also reflected in the MIC/MDC₉₅ ratio (the proportion of patients who truly changed with a possibility of 95% predicted by the instruments): For the MIC value to reach the MDC₉₅, the specificity for the SF6D would have to increase from 78.1 to 87.5, but the sensitivity would then fall from 92.5 to 73.7. For the EQ5D, this would necessitate an increase in specificity from 78.9 to 86.8 and a decrease in sensitivity from 72.8 to 57.6. In other words, to reach a value beyond the 95% CI for measurement error, the probability of correctly classifying a patient as improved would fall dramatically for the EQ5D, nearly reaching 50% or classifying by chance. The effect was not as dramatic for the SF6D, which would still correctly classify over 70% of patients as “improved”.

We found that the difference in the range of the scales between the SF6D and the EQ5D could be reflected in their targeting properties. Based on the Rasch analysis (Figures 3 and 4), we could hypothesize that patients were at a lower level of HRQoL than the SF6D could express (floor effect). The range of patient abilities was better captured within the EQ5D scale. Barton et al. [6] compared the performance of the EQ5D and the SF6D in 1865 individuals over ≥45 years old. They found that healthier individuals had higher scores on the EQ5D, and less healthy individuals such as patients with back pain had higher scores on the SF6D. In a study that compared the SF6D and the EQ5D in liver transplant patients, Longworth et al. observed that the SF6D does not describe health states at the lower end of the utility scale but is more sensitive than the EQ5D in detecting small changes at the top of the scale [14]. This result is somewhat confusing because the same group later published a paper in which they conclude that the SF6D can describe some “poor health states including states that (according to the EQ5D scoring algorithm) are viewed as worse than the state of being dead” [13].

The Rasch analysis also revealed that some of the SF6D items did not function as intended. A better fit to the model was achieved if some of the response categories of these items were collapsed (i.e., the category was removed from the item). An interpretation of this is that for these items, patients could not differentiate between two adjacent response categories and the information in the removed categories was therefore redundant. None of the items in the EQ5D showed similar signs of dysfunction. When treated as separate scales, both instruments showed signs of unidimensionality, but significant invariance across items was noted when analyzed as one scale (all items from the SF6D and the EQ5D put together). The interpretation of this was that the two scales seem to measure different aspects of HRQoL. Walters and Brazier mentioned that a fundamental assumption in their comparison of the EQ5D and the SF6D was that the instruments should measure the same underlying HRQOL variable [44].

Strengths and limitations of the study

Compared to Brazier et al. [7], SF6D in our study had a higher percentage of missing data at both assessment time points (baseline and 2-year follow up). As Brazier mentioned in another paper, this has important consequences for data quality [45].

The use of global assessment score has been questioned in several studies [46, 47]. Criticism of the reliability of anchor based methods includes no standardization of anchors, time dependence of patients perception of health, dependence on only one question and failure of the anchor question to differentiate between quantitative and qualitative perception of change [48]. The COSMIN study did not reach any consensus about which method to use to determine the MIC value but conclude that there is an ongoing discussion about this in the literature [23]. Some authors now suggest ROC analysis for determining MIC values mainly because it uses all available data and maximizes the number of individuals correctly classified [49]. The question and answer categories in our 7-point global scale was not a standardized scale but Spearman`s rho for the correlation between change scores of the instruments and global score categories used in the ROC analysis was considered acceptable (0.84, 0.55 and 0.76 for ODI, EQ5D and SF6D respectively) [46, 50, 51].

Conclusions

EQ5D and SF6D measure different aspects of HRQoL. The difference in psychometric properties between them and the lack of agreement is probably clinically significant. Because the ability to detect a change in the underlying construct and similarity to a disease-specific instrument is quite different, the choice of instrument should probably be guided by diagnosis and/or treatment choice. In our study of patients with chronic low back pain, the SF6D had the best ability to detect change and correctly identify patients as improved or non-improved beyond a 95% confidence level of measurement error.

Finally, our study supports the findings of Soegaard et al. [11]. They concluded that the SF6D and EQ5D cannot be used interchangeably for measurement of preference value and that sensitivity analysis examining the impact of between-measure discrepancy remains a necessary condition for cost-utility evaluation results.

Authors’ information

LGJ: M.D. orthopaedic surgeon, CH: M.D. orthopaedic surgeon., KS: Ph.D. physiotherapist, ØPN: M.D, Ph.D. neurosurgeon, professor, JIB: M.D., Ph.D. specialist in physical medicine and rehabilitation, Ivar Rossvoll: M.D., Ph.D. orthopaedic surgeon, GL: M.D., Ph.D. specialist in physical medicine and rehabilitation, professor.

References

Brazier J: Measuring and valuing health benefits for economic evaluation. 2007, New York: Oxford University Press
Google Scholar
Nord E: Health state values from multiattribute utility instruments need correction. Ann Med. 2001, 33 (5): 371-374. 10.3109/07853890109002091.
Article CAS PubMed Google Scholar
Brazier J, Roberts J, Deverill M: The estimation of a preference-based measure of health from the SF-36. J Health Econ. 2002, 21 (2): 271-292. 10.1016/S0167-6296(01)00130-8.
Article PubMed Google Scholar
Dolan P: Modeling valuations for EuroQol health states. Med Care. 1997, 35 (11): 1095-1108. 10.1097/00005650-199711000-00002.
Article CAS PubMed Google Scholar
Drummond MF: Methods for the economic evaluation of health care programmes, 3rd edn. Oxford. 2005, New York: Oxford University Press
Google Scholar
Barton GR, Sach TH, Avery AJ, Jenkinson C, Doherty M, Whynes DK, Muir KR: A comparison of the performance of the EQ-5D and SF-6D for individuals aged >or= 45 years. Health Econ. 2008, 17 (7): 815-832. 10.1002/hec.1298.
Article PubMed Google Scholar
Brazier J, Roberts J, Tsuchiya A, Busschbach J: A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ. 2004, 13 (9): 873-884. 10.1002/hec.866.
Article PubMed Google Scholar
Grieve R, Grishchenko M, Cairns J: SF-6D versus EQ-5D: reasons for differences in utility scores and impact on reported cost-utility. Eur J Health Econ. 2009, 10 (1): 15-23. 10.1007/s10198-008-0097-2.
Article PubMed Google Scholar
Sach TH, Barton GR, Jenkinson C, Doherty M, Avery AJ, Muir KR: Comparing cost-utility estimates: does the choice of EQ-5D or SF-6D matter?. Med Care. 2009, 47 (8): 889-894. 10.1097/MLR.0b013e3181a39428.
Article PubMed Google Scholar
Soegaard R: Interchangeability of the EQ-5D and the SF-6D in Long-Lasting Low Back Pain Source: Value in Health 12, no. 4 (2009): 606–612 Additional Info: Blackwell Publishing; 20090601 Standard No: ISSN: 1098–3015. Value Health. 2009, 12 (4): 606-612. 10.1111/j.1524-4733.2008.00466.x.
Article Google Scholar
Sogaard R, Christensen FB, Videbaek TS, Bunger C, Christiansen T: Interchangeability of the EQ-5D and the SF-6D in long-lasting low back pain. Value Health. 2009, 12 (4): 606-612. 10.1111/j.1524-4733.2008.00466.x.
Article PubMed Google Scholar
Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986, 1 (8476): 307-310.
Article CAS PubMed Google Scholar
Bryan S, Longworth L: Measuring health-related utility: why the disparity between EQ-5D and SF-6D?. Eur J Health Econ. 2005, 6 (3): 253-260. 10.1007/s10198-005-0299-9.
Article PubMed Google Scholar
Longworth L, Bryan S: An empirical comparison of EQ-5D and SF-6D in liver transplant patients. Health Econ. 2003, 12 (12): 1061-1067. 10.1002/hec.787.
Article PubMed Google Scholar
Hellum C, Johnsen LG, Storheim K, Nygaard OP, Brox JI, Rossvoll I, Ro M, Sandvik L, Grundnes O: Surgery with disc prosthesis versus rehabilitation in patients with low back pain and degenerative disc: two year follow-up of randomised study. BMJ. 2011, 342: d2786-10.1136/bmj.d2786.
Article PubMed PubMed Central Google Scholar
Fairbank JC, Couper J, Davies JB, O’Brien JP: The Oswestry low back pain disability questionnaire. Physiotherapy. 1980, 66 (8): 271-273.
CAS PubMed Google Scholar
Fairbank JC, Pynsent PB: The Oswestry Disability Index. Spine. 2000, 25 (22): 2940-2952. 10.1097/00007632-200011150-00017. discussion 2952
Article CAS PubMed Google Scholar
Grotle M, Brox JI, Vollestad NK: Cross-cultural adaptation of the Norwegian versions of the Roland-Morris Disability Questionnaire and the Oswestry Disability Index. J Rehabil Med. 2003, 35 (5): 241-247. 10.1080/16501970306094.
Article CAS PubMed Google Scholar
Ware JE, Sherbourne CD: The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992, 30 (6): 473-483. 10.1097/00005650-199206000-00002.
Article PubMed Google Scholar
Dolan P, Gudex C, Kind P, Williams A: The time trade-off method: results from a general population study. Health Econ. 1996, 5 (2): 141-154. 10.1002/(SICI)1099-1050(199603)5:2<141::AID-HEC189>3.0.CO;2-N.
Article CAS PubMed Google Scholar
The EuroQol Group: EuroQol--a new facility for the measurement of health-related quality of life. Health Policy. 1990, 16 (3): 199-208.
Article Google Scholar
Ostelo RW, de Vet HC: Clinically important outcomes in low back pain. Best Pract Res Clin Rheumatol. 2005, 19 (4): 593-607. 10.1016/j.berh.2005.03.003.
Article PubMed Google Scholar
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, De Vet HC: COSMIN checklist manual. 2012
Google Scholar
Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, Bouter LM, de Vet HC: The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content. BMC Med Res Methodol. 2010, 10: 22-10.1186/1471-2288-10-22.
Article PubMed PubMed Central Google Scholar
van der Roer N, Ostelo RW, Bekkering GE, van Tulder MW, de Vet HC: Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine. 2006, 31 (5): 578-582. 10.1097/01.brs.0000201293.57439.47.
Article PubMed Google Scholar
Wyrwich KW, Nienaber NA, Tierney WM, Wolinsky FD: Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care. 1999, 37 (5): 469-478. 10.1097/00005650-199905000-00006.
Article CAS PubMed Google Scholar
Wyrwich KW, Tierney WM, Wolinsky FD: Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol. 1999, 52 (9): 861-873. 10.1016/S0895-4356(99)00071-2.
Article CAS PubMed Google Scholar
Crosby RD, Kolotkin RL, Williams GR: Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol. 2003, 56 (5): 395-407. 10.1016/S0895-4356(03)00044-1.
Article PubMed Google Scholar
Bland JM, Altman DG: Measurement error. BMJ. 1996, 313 (7059): 744-
Article CAS PubMed PubMed Central Google Scholar
de Vet HC, Terwee CB, Knol DL, Bouter LM: When to use agreement versus reliability measures. J Clin Epidemiol. 2006, 59 (10): 1033-1039. 10.1016/j.jclinepi.2005.10.015.
Article PubMed Google Scholar
Beaton DE: Understanding the relevance of measured change through studies of responsiveness. Spine. 2000, 25 (24): 3192-3199. 10.1097/00007632-200012150-00015.
Article CAS PubMed Google Scholar
Hagg O, Fritzell P, Nordwall A: The clinical importance of changes in outcome scores after treatment for chronic low back pain. Eur Spine J. 2003, 12 (1): 12-20.
CAS PubMed Google Scholar
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HC: The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010, 63 (7): 737-745. 10.1016/j.jclinepi.2010.02.006.
Article PubMed Google Scholar
Smith EV: Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. J Appl Meas. 2002, 3 (2): 205-231.
PubMed Google Scholar
Chou Y-T, Wang W-C: Checking Dimensionality in Item Response Models With Principal Component Analysis on Standardized Residuals. Educ Psychol Meas. 2010, 70 (5): 717-731. 10.1177/0013164410379322.
Article Google Scholar
Fairbank JC, Pynsent PB, 22: The Oswestry Disability Index. Spine (Phila Pa 1976). 2000, 25: 2940-2952. 10.1097/00007632-200011150-00017. discussion 2952
Article CAS Google Scholar
Zweig MH, Campbell G: Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem. 1993, 39 (4): 561-577.
CAS PubMed Google Scholar
Deyo RA, Centor RM: Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis. 1986, 39 (11): 897-906. 10.1016/0021-9681(86)90038-X.
Article CAS PubMed Google Scholar
Copay AG, Glassman SD, Subach BR, Berven S, Schuler TC, Carreon LY: Minimum clinically important difference in lumbar spine surgery patients: a choice of methods using the Oswestry Disability Index, Medical Outcomes Study questionnaire Short Form 36, and pain scales. Spine J. 2008, 8 (6): 968-974. 10.1016/j.spinee.2007.11.006.
Article PubMed Google Scholar
Tennant A, Conaghan PG: The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper?. Arthritis Rheum. 2007, 57 (8): 1358-1362. 10.1002/art.23108.
Article PubMed Google Scholar
Pallant JF, Tennant A: An introduction to the Rasch measurement model: an example using the Hospital Anxiety and Depression Scale (HADS). Br J Clin Psychol. 2007, 46 (Pt 1): 1-18.
Article PubMed Google Scholar
Bland JM, Altman DG: Measuring agreement in method comparison studies. Stat Methods Med Res. 1999, 8 (2): 135-160. 10.1191/096228099673819272.
Article CAS PubMed Google Scholar
van Stel HF, Buskens E: Comparison of the SF-6D and the EQ-5D in patients with coronary heart disease. Health Qual Life Outcomes. 2006, 4: 20-10.1186/1477-7525-4-20.
Article PubMed PubMed Central Google Scholar
Walters SJ, Brazier JE: Comparison of the minimally important difference for two health state utility measures: EQ-5D and SF-6D. Qual Life Res. 2005, 14 (6): 1523-1532. 10.1007/s11136-004-7713-0.
Article PubMed Google Scholar
Brazier J, Deverill M: A checklist for judging preference-based measures of health related quality of life: learning from psychometrics. Health Econ. 1999, 8 (1): 41-51. 10.1002/(SICI)1099-1050(199902)8:1<41::AID-HEC395>3.0.CO;2-#.
Article CAS PubMed Google Scholar
de Vet HC, Ostelo RW, Terwee CB, van der Roer N, Knol DL, Beckerman H, Boers M, Bouter LM: Minimally important change determined by a visual method integrating an anchor-based and a distribution-based approach. Qual Life Res. 2007, 16 (1): 131-142. 10.1007/s11136-006-9109-9.
Article PubMed Google Scholar
Guyatt GH, Norman GR, Juniper EF, Griffith LE: A critical look at transition ratings. J Clin Epidemiol. 2002, 55 (9): 900-908. 10.1016/S0895-4356(02)00435-3.
Article PubMed Google Scholar
Terwee CB, Roorda LD, Dekker J, Bierma-Zeinstra SM, Peat G, Jordan KP, Croft P, de Vet HC: Mind the MIC: large variation among populations and methods. J Clin Epidemiol. 2010, 63 (5): 524-534. 10.1016/j.jclinepi.2009.08.010.
Article PubMed Google Scholar
Turner D, Schunemann HJ, Griffith LE, Beaton DE, Griffiths AM, Critch JN, Guyatt GH: The minimal detectable change cannot reliably replace the minimal important difference. J Clin Epidemiol. 2010, 63 (1): 28-36. 10.1016/j.jclinepi.2009.01.024.
Article PubMed Google Scholar
Cella D, Hahn EA, Dineen K: Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening. Qual Life Res. 2002, 11 (3): 207-221. 10.1023/A:1015276414526.
Article PubMed Google Scholar
Guyatt GH, Jaeschke RJ: Reassessing quality-of-life instruments in the evaluation of new drugs. Pharmaco Economics. 1997, 12 (6): 621-626. 10.2165/00019053-199712060-00002.
Article CAS Google Scholar

Pre-publication history

The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1471-2474/14/148/prepub

Download references

Acknowledgements

All authors involved declare that they have no conflict of interests and no financial disclosures to report.

Financial disclosures

We want to thank the patients participating in the study, the South Eastern Norway Regional Health Authority and EXTRA funds from the Norwegian Foundation for Health and Rehabilitation, through the Norwegian Back Pain Association, for financial support and Hege Andresen at St.Olavs Hospital, Trondheim, for data coordination. Editorial assistance was delivered by Charlesworth Publishing Services at a price of 360$.

Author information

Authors and Affiliations

Neuroclinic; National Center of spinal disorder, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
Lars G Johnsen, Øystein P Nygaard & Ivar Rossvoll
Clinic of Orthopedics and Rheumatology, Department of Orthopaedic Surgery, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
Lars G Johnsen & Ivar Rossvoll
Department Of Neuromedicine, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
Lars G Johnsen, Øystein P Nygaard & Ivar Rossvoll
Clinic for Surgery and Neurology, Department of Orthopedics, Oslo University Hospital and University of Oslo, Oslo, Norway
Christian Hellum, Kjersti Storheim & Jens I Brox
Department of Clinical Medicine, Neuromuscular Diseases and Research Group, University of Tromsø, Tromsø, Norway
Gunnar Leivseth
Clinic for Surgery and Neurology, Oslo University, Oslo, Norway
Kjersti Storheim
Department of neurosurgery, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
Øystein P Nygaard
FORMI, Clinic for surgery and neurology, Ullevaal, Oslo, N-0407, Norway
Margreth Grotle
Faculty of health Sciences, Department of Physiotherapy, Oslo and Akershus University College of Applied Sciences, Oslo, Norway
Margreth Grotle

Authors

Lars G Johnsen
View author publications
You can also search for this author in PubMed Google Scholar
Christian Hellum
View author publications
You can also search for this author in PubMed Google Scholar
Øystein P Nygaard
View author publications
You can also search for this author in PubMed Google Scholar
Kjersti Storheim
View author publications
You can also search for this author in PubMed Google Scholar
Jens I Brox
View author publications
You can also search for this author in PubMed Google Scholar
Ivar Rossvoll
View author publications
You can also search for this author in PubMed Google Scholar
Gunnar Leivseth
View author publications
You can also search for this author in PubMed Google Scholar
Margreth Grotle
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lars G Johnsen.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

LGJ takes responsibility for the integrity of the data and the accuracy of the data analysis. LGJ performed the statistical analysis. LGJ, MG, IR and CH participated in the design of the study. LGJ and CH: Acquisition of data. LGJ, CH, ØPN, KS, JIB, IR, GL and MG conceived of the study and helped to draft the manuscript. All authors had full access to the data. All authors read and approved the final manuscript.

Lars G Johnsen and Margreth Grotle contributed equally to this work.

Electronic supplementary material

Additional file 1:Rasch analysis.(DOCX 50 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Johnsen, L.G., Hellum, C., Nygaard, Ø.P. et al. Comparison of the SF6D, the EQ5D, and the oswestry disability index in patients with chronic low back pain and degenerative disc disease. BMC Musculoskelet Disord 14, 148 (2013). https://doi.org/10.1186/1471-2474-14-148

Download citation

Received: 14 May 2012
Accepted: 19 April 2013
Published: 26 April 2013
DOI: https://doi.org/10.1186/1471-2474-14-148

Comparison of the SF6D, the EQ5D, and the oswestry disability index in patients with chronic low back pain and degenerative disc disease

Abstract

Background

Methods

Results

Conclusions

Background

Methods

Instruments

ODI

SF6D

EQ5D

Seven-point scale for patient assessment

Data analysis

Measurement error

Structural validity

Criterion validity

Responsiveness

Interpretability

Study approval

Results

Measurement error

Structural validity

Criterion validity

Responsiveness

Interpretability

Discussion

Strengths and limitations of the study

Conclusions

Authors’ information

References

Pre-publication history

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ contributions

Electronic supplementary material

Additional file 1:Rasch analysis.(DOCX 50 KB)

Authors’ original submitted files for images

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Musculoskeletal Disorders

Contact us