Hip disability and osteoarthritis outcome score (HOOS) – validity and responsiveness in total hip replacement

Background: The aim of the study was to evaluate whether physical functions usually associated with a younger population are also important to an older population, and to construct an outcome measure for hip osteoarthritis with improved responsiveness compared with the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC LK 3.0).

Methods: A 40-item questionnaire (hip disability and osteoarthritis outcome score, HOOS) was constructed to assess patient-relevant outcomes in five separate subscales (pain, symptoms, activity of daily living, sport and recreation function, and hip related quality of life). The HOOS contains all WOMAC LK 3.0 questions in unchanged form. The HOOS was distributed to 90 patients with primary hip osteoarthritis (mean age 71.5, range 49–85, 41 females) assigned for total hip replacement (THR), preoperatively and at six months follow-up.

Results: The HOOS met the predefined criteria for validity and responsiveness. It was more responsive than the WOMAC for the subscales pain (SRM 2.11 vs. 1.83) and other symptoms (SRM 1.83 vs. 1.28). The responsiveness (SRM) of the two added subscales, sport and recreation function and hip related quality of life, was 1.29 and 1.65, respectively. Patients aged 66 years or younger (range 49–66) showed higher responsiveness in all five subscales than patients older than 66 (range 67–85) (pain SRM 2.60 vs. 1.97, other symptoms SRM 3.0 vs. 1.60, activity of daily living SRM 2.51 vs. 1.52, sport and recreation function SRM 1.53 vs. 1.21, and hip related quality of life SRM 1.95 vs. 1.57).

Conclusion: The HOOS 2.0 appears to be useful for the evaluation of patient-relevant outcome after THR and is more responsive than the WOMAC LK 3.0. The added subscales sport and recreation function and hip related quality of life were highly responsive for this group of patients, with responsiveness being highest for those aged 66 years or younger.


Introduction
Some 20 different scores have been introduced to evaluate the results of total hip replacement (THR) [1]. In recent years a number of generic and disease-specific outcome measures have been developed to measure outcome from the patient's point of view [2][3][4][5]. Patient-relevant outcomes are now considered the primary outcome measure in clinical trials [6][7][8][9]. The WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index) LK 3.0 is a widely used disease-specific instrument validated for osteoarthritis (OA) of the lower extremities and for evaluating outcome after THR [5]. It has been shown to be responsive to clinical change over time after THR [10]. The KOOS (Knee injury and Osteoarthritis Outcome Score) [11] is a further development of the WOMAC, initially constructed as a measure of patient-relevant outcomes in studies of the treatment of anterior cruciate ligament and meniscus injuries. Special emphasis was given to ascertaining validity for young and middle-aged patients with OA. It is currently being validated for OA patients assigned for total knee replacement. As an extension of the WOMAC, the KOOS also evaluates Sport and recreation function (SP) and knee related quality of life (QOL). These two subscales have consistently shown as high or higher sensitivity and responsiveness than the three WOMAC subscales in young and middle-aged patients with knee injury and/or knee OA [11,12]. Today, many patients eligible for THR expect to perform physical activities more demanding than those required for daily living. This encouraged us to study the use of an adapted form of the KOOS in patients receiving a total hip replacement for OA.
The aim of the study was to validate the HOOS (hip disability and osteoarthritis outcome score) for use in patients with hip OA. First, the instrument was adapted for use in patients with hip OA. Second, we studied the content validity, construct validity and responsiveness of the adapted instrument. Third, the responsiveness of the WOMAC LK 3.0 [5] was compared with that of the HOOS in patients with hip OA assigned for THR.

HOOS
The HOOS is an adaptation of the KOOS [11,13] intended to evaluate symptoms and functional limitations related to the hip. The HOOS consists of 40 items, selected from 51 original items (Tables 1 and 2), assessing five separate patient-relevant dimensions: Pain (P) (ten items); Symptoms (S), including stiffness and range of motion (five items); Activity limitations-daily living (A) (17 items); Sport and Recreation Function (SP) (four items); and Hip Related Quality of Life (Q) (four items).
The HOOS contains all WOMAC LK 3.0 questions in unchanged form [5], so WOMAC scores can be calculated from the HOOS questionnaire. The HOOS subscale Activity of Daily Living is equivalent to the WOMAC subscale Function.
Each question was answered using five Likert boxes (no, mild, moderate, severe, extreme). All items were scored from zero to four, and each of the five subscale scores was calculated as the sum of its items. To ease interpretation, the HOOS subscale scores are transformed to a 0-100 scale, worst to best [14,15]. The subscale scores can be presented graphically as a HOOS profile (Fig. 1). Missing data were handled as follows: one or two missing values in a subscale were substituted with the average value for that subscale. If more than two items were omitted, the response for that subscale was considered invalid. Questions and answer options are given in Table 1. The questionnaire and user's guide can be found on the web site http://www.koos.nu. The instrument is self-administered and takes seven to 10 minutes to complete.
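The scoring and missing-data rules above can be sketched in Python. This is a minimal sketch, not the official scoring algorithm: the function name `hoos_subscale_score` is hypothetical, and the inversion formula 100 − (mean item score × 100 / 4) is assumed from the 0-4 item coding and the 0-100 worst-to-best convention described in the text; the user's guide at http://www.koos.nu remains the authoritative source.

```python
from statistics import mean

# Assumed coding: 0 = no problems ... 4 = extreme problems (higher = worse).
MAX_ITEM_SCORE = 4
MAX_MISSING = 2  # from the text: more than two omitted items invalidates the subscale


def hoos_subscale_score(responses):
    """Return a 0-100 (worst to best) subscale score, or None if invalid.

    `responses` holds one item score (0-4) per item, with None for a missing item.
    """
    answered = [r for r in responses if r is not None]
    n_missing = len(responses) - len(answered)
    if n_missing > MAX_MISSING or not answered:
        return None
    # Substitute each missing item with the mean of the answered items,
    # so the raw sum equals that mean times the number of items.
    item_mean = mean(answered)
    raw_sum = item_mean * len(responses)
    max_sum = MAX_ITEM_SCORE * len(responses)
    # Invert and rescale so that 0 = extreme problems and 100 = no problems.
    return 100 - raw_sum * 100 / max_sum
```

For example, a five-item Symptoms subscale answered `[1, 2, None, 0, 1]` has one missing item, an answered-item mean of 1.0 and therefore a score of 75.0. Because the missing items are imputed with the mean, the transformed score depends only on the mean of the answered items.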

SF-36
The SF-36 is a self-administered generic health status measure that contains 36 items [4]. It measures three major health attributes (functional status, well-being, overall health) and has eight subscales (physical function, role limitations due to physical health, bodily pain, general health, vitality, social function, role limitations due to emotional health, and mental health) [4,16].

Study design and statistics

Subjects
HOOS and SF-36 data were obtained preoperatively from 90 patients (mean age 71.5, range 49-85, 41 females) who were assigned for THR due to primary hip OA. At follow-up six months later, the patients completed the same questionnaires.

Content validity
To assess the content validity of the 40 items, a subgroup of patients (N = 26) was asked to rate the relevance or importance of each item on a scale from one to three: 1 = irrelevant, unimportant; 2 = somewhat relevant, somewhat important; 3 = very relevant, very important. Mean relevance scores were calculated for each item. An item's mean score had to be at least 2.0 (possible range 1.0 to 3.0) to justify its inclusion in the HOOS. Additionally, the percentages of patients grading each item as unimportant, somewhat important, and very important were calculated. At least 67% of the patients had to grade an item as at least somewhat important to justify its inclusion in the HOOS. These cut-off levels were in agreement with a previous study in which the KOOS was adapted for patients with foot and ankle problems [17]. The percentages of patients with floor effects (worst possible score) or ceiling effects (best possible score) were calculated for each subscale.
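The two inclusion criteria and the floor/ceiling percentages reduce to simple counts. The sketch below assumes each patient's relevance rating is stored as an integer from 1 to 3 and each subscale score is already on the 0-100 worst-to-best scale; the function names are hypothetical, and treating the 67% cut-off as the fraction 0.67 is our assumption about how it was rounded.

```python
from statistics import mean

MIN_MEAN_RELEVANCE = 2.0   # mean rating on the 1-3 scale must be at least 2.0
MIN_SHARE_IMPORTANT = 0.67  # at least 67% must rate the item 2 or 3 (assumed rounding)


def item_passes_content_validity(ratings):
    """ratings: one 1-3 relevance rating per patient for a single item."""
    share_important = sum(1 for r in ratings if r >= 2) / len(ratings)
    return mean(ratings) >= MIN_MEAN_RELEVANCE and share_important >= MIN_SHARE_IMPORTANT


def floor_ceiling_percent(scores, worst=0.0, best=100.0):
    """Percentages of patients at the worst (floor) and best (ceiling) subscale score.

    Invalid subscale scores (None) are excluded from the denominator.
    """
    valid = [s for s in scores if s is not None]
    floor = 100 * sum(1 for s in valid if s == worst) / len(valid)
    ceiling = 100 * sum(1 for s in valid if s == best) / len(valid)
    return floor, ceiling
```

An item rated `[3, 3, 2, 1, 2, 3]` passes both criteria (mean about 2.33, five of six patients rating it at least somewhat important), whereas `[1, 1, 2, 2, 1, 3]` fails on the mean (about 1.67).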

Dimensionality
Dimensionality was assessed by principal component factor analysis, first by entering all 40 items into one analysis, and second by performing a separate analysis for each subscale. Failure of the items to load on a single major factor suggests that they do not all measure the same aspect. Factors with an eigenvalue above 1.0 were retained, and the results are given as the percentage of variance in the scale score explained by the principal factor.
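A sketch of the dimensionality check using unrotated principal components of the item correlation matrix, written with NumPy. It assumes a complete patients-by-items matrix (no missing values) and reports the share of total standardized item variance carried by the first component; the exact SPSS extraction and rotation settings used in the study are not stated, so this is an approximation rather than a reproduction. The function name is hypothetical.

```python
import numpy as np


def principal_component_summary(item_scores):
    """Summarize the dimensionality of a set of items.

    item_scores: 2-D array, rows = patients, columns = items.
    Returns (number of components with eigenvalue > 1.0,
             percentage of total variance explained by the first component).
    """
    corr = np.corrcoef(item_scores, rowvar=False)  # item-by-item correlation matrix
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh is ascending; reverse it
    n_retained = int(np.sum(eigenvalues > 1.0))    # eigenvalue > 1.0 retention rule
    # The eigenvalues of a correlation matrix sum to the number of items.
    pct_first = 100.0 * eigenvalues[0] / eigenvalues.sum()
    return n_retained, pct_first
```

A subscale whose items all reflect one underlying construct should yield a single retained component with a large first-component percentage; two or more retained components suggest that some items load on another factor.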

Construct validity
The Hip disability and Osteoarthritis Outcome Score was developed to assess patient-relevant aspects of hip-related problems. Spearman's correlation coefficients (r_s) were calculated to assess the construct validity of the HOOS against the SF-36. It was hypothesized a priori that correlations with the SF-36 subscales physical function and bodily pain would be high, the correlation with the subscale general health would be low, and correlations with the other subscales (role physical, vitality, social function, role emotional and mental health) would be moderate.
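Spearman's r_s is the Pearson correlation computed on ranks, with tied values given their average rank. The following dependency-free sketch shows the calculation for two paired lists of scores (for example a HOOS subscale and an SF-36 subscale from the same patients); the helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman_rs(x, y):
    """Spearman's r_s: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

On Python 3.12 and later, `statistics.correlation(x, y, method='ranked')` should give the same coefficient.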

Responsiveness
Responsiveness was calculated as the standardized response mean (SRM), defined as the mean score change divided by the standard deviation of that score change [18]. An SRM above 0.8 is considered large.
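The SRM definition translates directly into code. A minimal sketch, assuming paired preoperative and follow-up scores on the 0-100 worst-to-best scale (so improvement is positive) and the sample standard deviation (n − 1 denominator) for the change scores; the text does not state which denominator was used, and the function name is hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(pre, post):
    """SRM = mean(post - pre) / SD(post - pre), using paired scores per patient."""
    changes = [b - a for a, b in zip(pre, post)]
    return mean(changes) / stdev(changes)  # sample SD (n - 1 denominator)
```

With preoperative scores `[30, 40, 20, 50]` and follow-up scores `[70, 80, 70, 80]`, the changes are `[40, 40, 50, 30]` and the SRM is about 4.9, far above the 0.8 threshold for a large response.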

Statistics
The sampling distributions (mean, standard deviation) of the SRMs of the two instruments were estimated with a jackknife procedure [18], programmed in SPSS 11.5, and the SRMs were then compared with a paired t-test.
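The text does not specify how the jackknife and the paired t-test were combined. One common approach, sketched here as an assumption rather than the authors' SPSS procedure, computes Tukey jackknife pseudo-values of each instrument's SRM from the same patients' change scores and then applies a paired t-test to the pseudo-values. The function names are hypothetical, and only the t statistic and degrees of freedom are returned; a p-value could be obtained from a t distribution (for example with SciPy).

```python
from math import sqrt
from statistics import mean, stdev


def srm(changes):
    """Standardized response mean of a list of change scores."""
    return mean(changes) / stdev(changes)


def jackknife_pseudovalues(changes):
    """Tukey jackknife pseudo-values of the SRM, one per patient (needs n >= 4)."""
    n = len(changes)
    full = srm(changes)
    pseudo = []
    for i in range(n):
        leave_one_out = changes[:i] + changes[i + 1:]
        pseudo.append(n * full - (n - 1) * srm(leave_one_out))
    return pseudo


def compare_srms(changes_a, changes_b):
    """Paired t-test on jackknife pseudo-values of SRM(a) - SRM(b).

    Both lists must hold change scores for the same patients in the same order.
    Returns (jackknife estimate of the difference, its standard error, t, df).
    """
    pa = jackknife_pseudovalues(changes_a)
    pb = jackknife_pseudovalues(changes_b)
    diffs = [x - y for x, y in zip(pa, pb)]
    n = len(diffs)
    estimate = mean(diffs)
    se = stdev(diffs) / sqrt(n)
    return estimate, se, estimate / se, n - 1
```

Pseudo-values of a ratio statistic such as the SRM can be widely spread in small samples, so the resulting standard errors are themselves imprecise when only a few dozen patients are available.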

Missing data
Of the individual items, 2.6% were missing (96 of 3,600 items; 90 patients × 40 items), and scores could be calculated for all subscales for 99% of the patients.
Of the 90 patients, 28 were excluded during the six-month follow-up (22 declined further participation, 3 were operated on the contralateral side, and 3 suffered from other diseases that made participation impossible). Thus, the results of 62 patients (mean age 72.8, range 53-85, 28 females) are presented at the six-month follow-up.
Item selection
Forty of the 51 original items were selected. A prerequisite was that all WOMAC LK 3.0 items be retained. Items were selected on the basis of the content validity, dimensionality, construct validity and responsiveness of the items and the new subscales. The selected items are presented in Table 1 and the items not selected in Table 2.

Content validity
The criterion for including an item in the HOOS was a mean relevance score of at least 2.0. One selected item, S10 (difficulty spreading your legs), had a mean relevance score of 1.7 but was selected because of its high responsiveness (SRM = 1.44). Three selected items from the subscale ADL (A13-15, Table 1) had mean relevance scores of 1.3-1.7 but were selected because they are included in the WOMAC LK 3.0. All other selected items had a mean relevance score of at least 2.0 (range 2.0-3.0). All items were considered at least somewhat important by more than 67% of the patients (range 69% to 100%), the limit set to justify inclusion in the HOOS. All items in Sport and Recreation (5/5), Hip Related Quality of Life (4/4), Pain (10/10) and Activities of Daily Living (17/17), and most items in Symptoms (4/5), were considered at least somewhat important by more than 80% of the patients.

Score distributions
The subscales indicating the most problems preoperatively were Sport and Recreation Function and Hip Related Quality of Life, with mean scores of 17.2 and 21.4 on a 0-100 scale, worst to best. The subscale indicating the fewest problems preoperatively was Activities of Daily Living, with a mean score of 37.8 (Figure 1). Twelve patients reported the worst possible score (floor effect) in the subscale Sport and Recreation Function, four patients in the subscale Hip Related Quality of Life and two patients in the subscale Symptoms. One patient reported the best possible score (ceiling effect) preoperatively in the subscale Symptoms.
The subscales indicating the most problems six months postoperatively were Sport and Recreation Function and Hip Related Quality of Life, with mean scores of 56.3 and 66.2. The subscale indicating the fewest problems at follow-up was Pain, with a mean score of 82.3 (Table 3).
No patient reported the worst possible score (floor effect) in any subscale at follow-up, while the best possible score (ceiling effect) was reported by 19% of the patients for the Pain scale, 10% for the Symptoms scale, 5% for the Activity of Daily Living scale, 9% for the Sport and Recreation scale and 9% for the Hip Related Quality of Life scale. For comparison, the WOMAC showed no floor effects postoperatively, but 26% of the patients reported the best possible score in the subscale Pain and 17% in the subscale Stiffness.

Construct validity
The highest correlations occurred between SF-36 subscales and HOOS subscales intended to measure similar constructs (physical function vs. ADL, r_s = 0.66; physical function vs. sport and recreation, r_s = 0.49; bodily pain vs. pain, r_s = 0.61). In general, correlations were higher between HOOS subscales and SF-36 subscales that mainly measure physical health, and lower between HOOS subscales and SF-36 subscales that mainly measure mental health (Table 4).

Responsiveness
The scores of all subscales improved significantly (p < 0.0001) postoperatively compared with preoperative values (Table 3, Figure 1). … (Table 5). Statistical comparison between the SRMs of the two age groups was not possible because of the small sample sizes (n = 17 and n = 47). Single items that were not included in the HOOS because of relatively low responsiveness were S4, S5, A19, SP3 and SP5 (Table 2).

Discussion
With improved general health and increased life span, older people's expectations of physical activity and function are ever higher. This raises the standard expected of outcome after THR. When measuring outcome after THR, it is therefore important to take the patient's expectations into consideration [19].
For active persons as well as for the more disabled, relief of pain is the most important reason for surgery. Nevertheless, improved physical function is one of the main goals of the operation. A previous study showed that younger patients obtain a better postoperative outcome than older patients, as assessed by the WOMAC and SF-36 [20]. This result was confirmed with the HOOS in the present study. We also found that Sport and Recreation Function and Hip Related Quality of Life were highly responsive in this group of patients with an average age of 73 years at surgery (range 53-85), with responsiveness being highest for those aged 66 years or younger. These dimensions are usually associated with a younger population but appear to be important for older patients as well. The results of this study show that the HOOS is more responsive than the WOMAC LK 3.0 in these patients and may be useful in evaluating hip OA and intervention outcome in different age groups.

Content validity
The time-consuming process of developing a questionnaire could be shortened if existing questionnaires could be adapted for use in similar patient groups. The present study and previous studies [5,21] indicate that existing outcome measures can be adapted to obtain increased responsiveness. A subgroup of patients in this study rated the relevance of each item included in the HOOS, as shown in Table 1. The added questions dealt with walking on hard and uneven ground and with spreading the legs; these problems were rated as important by patients with hip dysfunction. This contrasts with two of the KOOS questions (Table 2), which dealt with swelling and with whether the hip would catch or hang up when moving. These appear to be knee-related questions.

Dimensionality
Factor analysis can be used to check that each item has been attributed to the right subscale [22]. When the analysis was performed on all 51 items, some of the items in each subscale clearly loaded on another factor. For this reason, P2, S8 and Q5 (Table 2) were not selected, even though they were relevant and sufficiently responsive. If an item loads on more than one factor, it is difficult to know whether it measures what it is intended to measure. Weakness (S8), for example, can certainly be a problem for patients with hip dysfunction, but it may have a cause other than OA. When the 40 selected items were analyzed, all items loaded on a major factor within their subscale (data not shown).

Construct validity
It is generally accepted that convergent construct validity is demonstrated if the correlations between scores on the same health component, as measured by two different instruments, are positive and appreciably above zero [23].
McDowell and Newell [24] noted in a review of rating scales and questionnaires that correlations for convergent construct validity often fall between 0.2 and 0.6 and rarely exceed 0.7. Because such correlations are modest, the instrument used for comparison must itself be well validated. The SF-36 is such an instrument and has been used to assess outcome after THR [25]. In the present study, the correlations of all HOOS subscales except Symptoms with the SF-36 subscales Physical Function and Bodily Pain ranged from 0.44 to 0.66 (Table 4). The HOOS subscale Symptoms showed lower correlations (0.35-0.38). Likewise, the correlations of the WOMAC subscale Stiffness with the SF-36 subscales Physical Function and Bodily Pain were lower than those of the two other WOMAC subscales. This may be due to the relatively low sensitivity to change over time of the two questions dealing with stiffness. However, these questions seemed to be relevant to the patients.

Responsiveness
Responsiveness to clinical change is an important characteristic of outcome measures. High responsiveness indicates that fewer subjects are needed to demonstrate a significant difference [26,27].
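Because the SRM is the mean change divided by the SD of that change, it plays the role of the effect size in a standard paired (pre/post) sample-size calculation. The sketch below uses the usual two-sided normal approximation n ≈ ((z_{1−α/2} + z_{1−β}) / SRM)²; the function name and the defaults α = 0.05 and power = 0.80 are assumptions for illustration, not values from the study.

```python
from math import ceil
from statistics import NormalDist


def patients_needed(srm, alpha=0.05, power=0.80):
    """Approximate number of patients needed to detect a pre/post change of
    standardized size `srm` (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    return ceil(((z_alpha + z_beta) / srm) ** 2)
```

Under these assumptions, an instrument with an SRM of 0.5 needs about 32 patients, whereas one with an SRM of 1.0 needs about 8, which illustrates why a more responsive measure reduces the required sample size.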
Hip replacement is above all surgery for relieving pain and improving physical function [28]. In the present study the HOOS subscale Pain showed the highest responsiveness (Table 5). The difference in responsiveness between the younger and older patients was smaller for the subscale Pain than for the subscale Activity of Daily Living. This finding agrees with a previous study in which the patient's age seemed to matter more for postoperative improvement in physical function than for pain [20]. In other words, pain is the most serious problem in this group of patients, independent of age. Nevertheless, there is a great improvement in Sport and Recreation Function as well as Hip Related Quality of Life (Figure 1), most pronounced in the younger patients (Table 5). It is noteworthy that these dimensions, usually associated with young patients, are important also in this comparatively old group of patients. The items added to the WOMAC Pain and Stiffness subscales to form the HOOS subscales Pain and Symptoms resulted in higher responsiveness. The follow-up time in the present study was limited to 6 months. A previous study [20] showed that pain relief is experienced very soon after surgery, while adaptation to the new health status takes at least 1 year. An even larger effect size might therefore be expected after one year.

Limitations
A limitation of the present study is that reliability was not assessed. However, when the corresponding measure KOOS was used in patients assigned for total knee replacement the test-retest stability was high with intraclass correlation coefficients for all subscales exceeding 0.75 (unpublished data). The reliability of HOOS in hip OA will need to be confirmed in further studies.
In conclusion, the HOOS 2.0 appears to be useful for the evaluation of patient-relevant outcome after THR in hip OA and is more responsive than the WOMAC LK 3.0. The subscales Sport and Recreation Function and Hip Related Quality of Life were highly responsive in this group of patients with an average age at surgery of 73 years, with responsiveness being highest for those aged 66 years or younger.

Competing interests
None declared