Research article | Open | Open Peer Review
Clinically important improvement thresholds for Harris Hip Score and its ability to predict revision risk after primary total hip arthroplasty
BMC Musculoskeletal Disorders volume 17, Article number: 256 (2016)
Some aspects of the validity of the Harris Hip Score (HHS) have not been established. Our objective was to examine the clinically meaningful change thresholds, responsiveness, and predictive ability of the HHS questionnaire.
We included a cohort of patients who underwent primary total hip arthroplasty (THA) and completed the HHS preoperatively and at 2 or 5 years post-THA. Using the pre- to postoperative change score, we examined clinically meaningful change thresholds (minimal clinically important improvement, MCII, and moderate improvement) and responsiveness (effect size, ES, and standardized response mean, SRM). We also examined the ability of the change score, and of the absolute postoperative score at 2 and 5 years post-THA, to predict future revision.
Two thousand six hundred sixty-seven patients with a mean age of 64 years completed the baseline HHS; 1036 completed both baseline and 2-year HHS, and 669 completed both baseline and 5-year HHS. MCII and moderate improvement thresholds ranged from 15.9 to 18 points and from 39.6 to 40.1 points, respectively. ES was 3.12 and 3.02 at 2 and 5 years, and the respective SRMs were 2.73 and 2.52. There were 3195 hips with HHS scores at 2 years and 2699 hips with HHS scores at 5 years, regardless of completion of the baseline HHS (absolute postoperative scores). Compared with patients with absolute HHS scores of 81–100 (score range, 0–100), patients with scores <55 at 2 and 5 years had higher hazards of subsequent revision: hazard ratio (95 % confidence interval) 4.34 (2.14, 7.95; p < 0.001) and 3.08 (1.45, 5.84; p = 0.002), respectively. Compared with an HHS improvement of >50 points from preoperative to 2 years post-THA, lack of improvement/worsening and a 1–20 point improvement were associated with hazard ratios of revision of 18.10 (1.41, 234.83; p = 0.02) and 6.21 (0.81, 60.73; p = 0.10), respectively.
HHS is a valid measure of THA outcomes and is responsive to change. Both absolute postoperative HHS scores and postoperative HHS change scores are predictive of revision risk after primary THA. We defined MCII and moderate improvement thresholds for HHS in this study.
Total hip arthroplasty (THA) is the second most commonly performed arthroplasty in the U.S. In 2010, 438,000 THAs were performed in the U.S., and the utilization rate is increasing rapidly. U.S. Medicare spent $9 billion on implantable medical device procedures in 2009.
Improvements in pain and function after THA are measured with instruments such as the Harris Hip Score (HHS). HHS is the most commonly used instrument for assessing outcomes after THA. HHS is valid and reliable [5–8] and is often used as a reference (gold) standard for assessing the construct validity of other patient-reported outcome measures (PROs) of hip outcomes. HHS is more responsive than the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC; a composite pain and function measure), the Short Form-36 (SF-36; a generic health-related quality of life measure) [8, 10, 11], and walking speed (an objective measure). HHS is joint-specific, measures hip outcomes, and is widely available. It is usually completed by a surgeon or another health professional.
To our knowledge, despite its widespread use, there are no published data on what constitutes a clinically important difference on the HHS or on whether HHS scores predict the risk of future revision surgery. Defining clinically important thresholds for outcome instruments such as HHS is critical because these thresholds are directly relevant to both clinical care and clinical trials. In clinical care, a threshold could define the proportion of patients who improved with any new clinical initiative, judged by what matters to patients; this is more informative than a mean change for a population or a p-value comparing group means. In clinical trials, thresholds would allow one intervention to be compared with another and would allow trials to be designed with a sample size adequate to differentiate two interventions. If HHS can predict the risk of early revision surgery, future studies could assess its utility as a screening tool for implant failure after THA. The objective of this study was to define clinically meaningful improvement thresholds for HHS, assess its responsiveness, and examine its ability to predict early revision surgery in patients with primary THA.
Study participants and the HHS questionnaire
The Human Ethics Committee at the Mayo Clinic approved the study, and the research was carried out in compliance with the Declaration of Helsinki. We received a waiver of written informed consent for this database study. We examined two cohorts of patients who underwent primary THA at the Mayo Clinic between 1993 and 2005: (1) patients who completed the HHS at baseline and at follow-up at 2 or 5 years (cohort 1); and (2) patients who completed the follow-up HHS at 2 or 5 years, regardless of baseline HHS completion (cohort 2). All analyses were performed in cohort 1; in addition, we analyzed the predictive ability of the final postoperative HHS scores at 2 and 5 years in cohort 2. HHS is a composite measure, heavily weighted toward pain and function, with scores ranging from 0 to 100; a higher score is better. It includes four domains: pain (1 item; 44 points), physical function (7 items; 47 points), deformity (5 items; 5 points), and range of motion (5 items; 4 points).
Responsiveness and predictive ability
The MCII and moderate improvement thresholds were calculated as the mean change in HHS from baseline to the 2- and 5-year follow-up in patients who answered “somewhat better now” (MCII) or “much better now” (moderate improvement), respectively, to the global question “Compared to before surgery, how is your hip?” at each time point. To assess responsiveness, we calculated the effect size (ES) and the standardized response mean (SRM). The ES was calculated as the mean change in HHS from baseline to 2 (or 5) years divided by the standard deviation (SD) of the baseline (preoperative) scores. According to Cohen’s rule of thumb, an ES of 0.20–0.49 represents a small change, 0.50–0.79 a medium change, and ≥ 0.80 a large change. The SRM is the mean change in score divided by the SD of the change scores. These analyses were performed in the cohort with a preoperative HHS and at least one postoperative (2- or 5-year) HHS.
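The ES, SRM, and anchor-based thresholds described above can be sketched as follows. This is a minimal illustration using Python's standard `statistics` module, assuming paired lists of baseline and follow-up HHS values and one anchor response per patient; the function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

# Anchor responses to "Compared to before surgery, how is your hip?"
SOMEWHAT_BETTER = "somewhat better now"  # anchor for the MCII
MUCH_BETTER = "much better now"          # anchor for moderate improvement


def change_scores(baseline, follow_up):
    """Per-patient change from baseline (positive = improvement on HHS)."""
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of the baseline (preoperative) scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def cohen_label(es):
    """Cohen's rule of thumb applied to the magnitude of an effect size."""
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "medium"
    if magnitude >= 0.20:
        return "small"
    return "trivial"


def anchor_threshold(baseline, follow_up, anchors, category):
    """Mean change among patients who chose the given anchor response."""
    changes = change_scores(baseline, follow_up)
    selected = [c for c, a in zip(changes, anchors) if a == category]
    return mean(selected)


# Toy data (hypothetical): three patients
baseline = [40, 50, 60]
follow_up = [60, 90, 80]
anchors = [SOMEWHAT_BETTER, MUCH_BETTER, SOMEWHAT_BETTER]

print(round(effect_size(baseline, follow_up), 2))                 # 2.67
print(round(standardized_response_mean(baseline, follow_up), 2))  # 2.31
print(anchor_threshold(baseline, follow_up, anchors, SOMEWHAT_BETTER))  # 20
```

Because the ES divides by the baseline SD and the SRM by the SD of the change scores, the two indices can differ noticeably for the same data. The ES values of 3.12 and 3.02 reported in this study would be labeled "large" by `cohen_label`.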
Descriptive statistics are reported as number (percentage) or mean (SD), as appropriate. We examined the associations of the final HHS score at 2 or 5 years, and of the change in HHS score from baseline to 2 and 5 years, with the risk of subsequent revision THA occurring ≥ 731 days after surgery for the 2-year analysis and ≥ 1826 days after surgery for the 5-year analysis; these days served as day 0 for the respective revision analyses. Final HHS scores were categorized into quintiles: ≤ 55, 56–63, 64–71, 72–80, and 81–100. Only the first hip in each patient was included; patients with simultaneous bilateral THAs were excluded. In sensitivity analyses, we used the traditional categorization of HHS: < 70, poor; 70–79, fair; 80–89, good; and 90–100, excellent. Improvement in HHS was categorized a priori as ≤ 0 (no improvement or worsening), 1–20, 21–50, and > 50 points, based on the clinical judgment of an orthopedic surgeon and co-author of this study (D.G.L.). We used Cox proportional hazards regression, reporting hazard ratios (HRs) with 95 % confidence intervals (CIs). Kaplan-Meier analysis was used to estimate implant survival by absolute and change HHS scores at 2 or 5 years. A p-value of less than 0.05 was considered statistically significant. Because death is a competing risk, we also fitted competing risk models that accounted for death.
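The categorization rules above translate directly into code. The sketch below is a hypothetical illustration: the bin edges and labels come from the text, whereas the function names, the dummy-coding helper, and the treatment of the 731- and 1826-day cut points as a landmark (patients who left follow-up before the landmark are excluded, and the clock restarts at zero on the landmark day) are assumptions about how such an analysis could be set up, not a description of the authors' code.

```python
# Landmark days from the text: revisions counted from day 731 (2-year
# analysis) or day 1826 (5-year analysis) after the index THA.
LANDMARK_DAYS = {2: 731, 5: 1826}


def final_score_quintile(score):
    """Quintile categories of the absolute 2- or 5-year HHS (reference: 81-100).

    The published bins are integer-valued; a non-integer score such as 55.5
    falls into the next bin here (an assumption).
    """
    if score <= 55:
        return "<=55"
    if score <= 63:
        return "56-63"
    if score <= 71:
        return "64-71"
    if score <= 80:
        return "72-80"
    return "81-100"


def final_score_traditional(score):
    """Traditional clinical categories used in the sensitivity analyses."""
    if score < 70:
        return "poor"
    if score < 80:
        return "fair"
    if score < 90:
        return "good"
    return "excellent"


def change_category(baseline, follow_up):
    """A priori change categories (reference: >50 points)."""
    delta = follow_up - baseline
    if delta <= 0:
        return "<=0"
    if delta <= 20:
        return "1-20"
    if delta <= 50:
        return "21-50"
    return ">50"


def dummy_code(label, levels, reference):
    """0/1 indicator per non-reference level, as used in a Cox model."""
    return {lvl: int(label == lvl) for lvl in levels if lvl != reference}


def landmark_time(days_to_event, years):
    """Days from the landmark to revision or censoring (hypothetical rule).

    Returns None when follow-up ended before the landmark, i.e. the
    patient is not eligible for that landmark analysis.
    """
    start = LANDMARK_DAYS[years]
    if days_to_event < start:
        return None
    return days_to_event - start
```

With indicators from `dummy_code`, a Cox regression would estimate hazard ratios relative to the reference groups used in the text (81–100 for absolute scores and > 50 points for change scores).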
Two thousand six hundred sixty-seven patients completed the baseline HHS, of whom 1036 completed both baseline and 2-year HHS and 669 completed both baseline and 5-year HHS; 338 patients completed all three assessments (baseline, 2-year, and 5-year HHS; Fig. 1). Mean age was 64 years, and 51 % were women. The demographic and clinical characteristics of the two cohorts, used for (1) responsiveness, clinically important improvement thresholds, and change HHS scores (both pre- and 2- or 5-year post-arthroplasty scores) and (2) predictive ability of 2- or 5-year HHS scores only, are shown in Appendix: Table 3. The causes of early revision in the 2- and 5-year cohorts are shown in Appendix: Table 4.
Clinically important improvement thresholds for the HHS and its responsiveness
The MCII threshold for HHS ranged from 15.9 to 18 points, and the moderate improvement threshold from 39.6 to 40.1 points (Table 1). Effect sizes for HHS from pre- to post-THA were large: 3.12 at 2 years and 3.02 at 5 years. The SRM was 2.73 and 2.52 at 2 and 5 years, respectively (Table 1).
Predictive ability of the absolute and change HHS scores
There were 3195 hips with HHS scores at 2 years and 2699 hips with HHS scores at 5 years (regardless of completion of the baseline HHS). Low total HHS scores at 2 and 5 years were associated with a higher risk of subsequent revision surgery (Table 2). Compared with patients with follow-up HHS scores of 81–100, patients with HHS scores < 55 at 2 and 5 years had statistically significantly higher hazards of subsequent revision, with HRs of 4.34 and 3.08, respectively (Table 2). Results were similar when we used the traditional clinical cut-offs of < 70, 70–79, 80–89, and 90–100 (Table 2).
Compared with an improvement of > 50 points, no improvement or worsening in HHS (change ≤ 0) from baseline to 2 years post-primary THA was associated with an 18-fold higher hazard of revision (p = 0.02), and an improvement of only 1–20 points with a 6-fold higher hazard that was not statistically significant (p = 0.10) (Table 2). Kaplan-Meier curves showed that 2-year HHS scores were associated with revision risk (p = 0.0018), and the 2-year change in HHS score also appeared to be associated, with borderline statistical significance (p = 0.062; Fig. 2a–d). Models accounting for death as a competing risk confirmed these findings, although the number of patients in the improvement (change score) categories was small (Appendix: Table 5).
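The Kaplan-Meier curves and the competing-risk adjustment mentioned above rest on two nonparametric estimators, sketched here in plain Python. The data layout (follow-up time from the landmark plus an event code of 0 = censored, 1 = revision, 2 = death) and the function names are assumptions for illustration. The second function is the Aalen-Johansen cumulative incidence estimator; it is not necessarily the competing risk regression model used in the study.

```python
from collections import Counter

CENSORED, REVISION, DEATH = 0, 1, 2


def kaplan_meier(times, events):
    """Kaplan-Meier survival S(t) at each time with at least one event.

    events: 1 = event of interest, 0 = censored. Subjects censored at t are
    kept in the risk set at t (the usual convention).
    """
    n_events = Counter(t for t, e in zip(times, events) if e)
    n_leaving = Counter(times)
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(n_leaving):
        d = n_events.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving[t]
    return curve


def cumulative_incidence(times, causes, cause=REVISION):
    """Aalen-Johansen cumulative incidence of one cause when other events
    (here, death) compete; returns (t, CIF) at each time the cause occurs."""
    by_time_cause = Counter(zip(times, causes))
    n_leaving = Counter(times)
    at_risk = len(times)
    overall_surv, cif, curve = 1.0, 0.0, []
    for t in sorted(n_leaving):
        d_cause = by_time_cause.get((t, cause), 0)
        d_any = n_leaving[t] - by_time_cause.get((t, CENSORED), 0)
        if d_cause:
            # P(event-free just before t) times the cause-specific hazard at t
            cif += overall_surv * d_cause / at_risk
            curve.append((t, cif))
        if d_any:
            overall_surv *= 1 - d_any / at_risk
        at_risk -= n_leaving[t]
    return curve


# Toy example (hypothetical): revisions at t=1 and t=3, death at t=2,
# censoring at t=4
times = [1, 2, 3, 4]
causes = [REVISION, DEATH, REVISION, CENSORED]

# Treating death as censoring: 1 - S(3) = 0.625
km = kaplan_meier(times, [int(c == REVISION) for c in causes])
print(1 - km[-1][1])
# Treating death as a competing event: CIF(3) = 0.5
print(cumulative_incidence(times, causes)[-1][1])
```

In this toy example, treating death as censoring gives a higher estimated probability of revision by t = 3 than the cumulative incidence that treats death as a competing event, which illustrates why competing-risk methods are used when death is common during follow-up.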
To our knowledge, our study is the first to define thresholds for patient-relevant, meaningful improvement on the HHS: 16–18 points for MCII and 40 points for moderate improvement. We used a patient anchor to define these thresholds. The HHS MCII can now be used for sample size calculations in trials and for comparative effectiveness studies in THA. Using the MCII will allow future arthroplasty trials to be adequately powered to detect differences between groups in the proportion of patients who achieve a meaningful improvement on the HHS. This information will complement comparisons of mean HHS scores, an approach that combines responders and non-responders in a single group. Consensus recommendations from an international pain trial group call for reporting both means and the percentage of responders in clinical trials in which pain is one of the primary outcomes. Similar thresholds have been reported for another instrument, the Mayo Hip Score, which is patient-administered rather than physician-reported.
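As an illustration of how an MCII can feed a sample size calculation, the sketch below classifies patients as responders (change ≥ MCII) and computes the per-group size needed to compare two responder proportions with the standard normal-approximation formula. The responder rule, the default of 18 points (the upper end of the reported 15.9–18 range), and the example proportions are assumptions chosen for illustration, not values from the study.

```python
from math import ceil, sqrt
from statistics import NormalDist


def is_responder(change, mcii=18.0):
    """Hypothetical responder rule: HHS improvement at or above the MCII."""
    return change >= mcii


def responder_rate(changes, mcii=18.0):
    """Proportion of patients whose HHS change reaches the MCII."""
    return sum(is_responder(c, mcii) for c in changes) / len(changes)


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of two proportions
    (normal approximation, equal allocation, no continuity correction)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical trial: 70 % vs 50 % of patients reaching the MCII
print(n_per_group(0.70, 0.50))  # 93
```

Under these assumptions, about 93 patients per arm would be needed; smaller expected differences in responder rates increase the required sample size quickly.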
Our MCII threshold is similar to or lower than MCII estimates for other 0–100 pain scales, for example, thresholds of 15–20 mm on WOMAC pain, 24 mm on HOOS pain, 20 mm on VAS pain in pain trials, and 22 mm on VAS pain in gout trials. This finding supports the sensitivity to change of HHS, an important property of an outcome instrument. The moderate improvement threshold in our study is similar to a substantial improvement of 50 % in pain [14, 18] and to the American College of Rheumatology 50 % (ACR50) response in patients with rheumatoid arthritis, i.e., a 50 % improvement in composite criteria. An earlier study in revision THA that did not use a patient anchor defined the MCII statistically and reported it to be 2.44 points, much lower than the 20-unit threshold previously defined on 0–100 pain scales in trials of chronic pain and gout. This may be due to very low variability of HHS in that small sample, and the estimate is discordant with the available literature for other similar pain scales. A 2-point improvement on a 0–100 scale (~2–4 % change, depending on the baseline) means little to patients or providers and in most instances will be indistinguishable from baseline noise.
We found that HHS scores < 55 at 2 and 5 years, as well as a lack of improvement or worsening in HHS from baseline to 2 years, were each predictive of early revision in patients with primary THA. To our knowledge, this is a new finding. Its robustness is supported by the consistency of estimates at 2 and 5 years. Our findings link the HHS at early follow-up to future implant failure and indicate that absolute HHS scores and their trajectory after THA may help screen for early implant failure after primary THA. More research is needed to determine whether risk prediction models for early implant failure using PROs such as HHS can be developed and validated, similar to a model proposed based on Oxford scores.
Wright et al. reported an ES of 2.5 and an SRM of 1.8 for HHS in 78 patients 6 months after THA; our study confirms these findings and extends them to a longer follow-up. A previous study showed a much larger effect size of 8.6 for HHS in patients with revision THA, and the SRM of HHS in hip fracture was 0.75. Possible reasons for the difference in HHS effect size between our study and the previous study include differences in patient population (primary vs. revision THA), length of follow-up (up to 5 years vs. 6 months), and setting (USA vs. China).
Our study strengths include a large primary THA patient sample from a Joint Registry, the use of well-accepted methods to examine validity, the performance of sensitivity analyses and robustness of estimates across 2- and 5-year data.
Our study has several limitations. MCII thresholds are not absolute; they are estimates that can vary somewhat between populations and between studies. Our findings were derived from a single-center primary THA cohort, but our patient characteristics are similar to those in several other THA studies, including one with a national sample [23–25], suggesting that our sample is representative of patients undergoing primary THA. However, our findings may not be generalizable to revision THA. Further studies in other populations can determine how much MCII thresholds vary or confirm that these thresholds also hold for those groups. Incorporation of 2- or 5-year HHS scores into surgeons’ clinical decision-making may have biased our findings toward greater significance than is actually the case. However, many revisions occurred long after these assessments, and surgeons may not have accessed the earlier scores. We recognize that radiographic changes after arthroplasty are very important in deciding whether revision surgery is needed, since asymptomatic patients sometimes have loose hip implants. The present study did not assess the predictive ability of hip radiographs alongside the HHS; this should be investigated in future studies. Finally, although our study establishes thresholds for clinically meaningful change on HHS, ceiling effects have been noted with HHS.
In conclusion, in patients who underwent primary THA, HHS was responsive to change and predictive of the risk of revision. The thresholds for minimal and moderate clinically important improvement on HHS defined in this study can now be used in arthroplasty clinical trials and clinical care. This report also establishes an additional use of HHS: predicting early revision surgery after THA, an important outcome.
CI, confidence interval; HHS, Harris Hip Score; MCID, Minimal clinically important difference; MCII, Minimal clinically important improvement; SD, standard deviation; THA, Total hip arthroplasty
1. Healthcare Cost and Utilization Project (HCUP). HCUP Facts and Figures 2009, Section 3: Inpatient Hospital Stays by Procedure. Exhibit 3.1 Most Frequent All-listed Procedures. http://hcup-us.ahrq.gov/reports/factsandfigures/2009/pdfs/FF_2009_section3.pdf. Accessed 3 June 2016.
2. Report to the Chairman, Committee on Finance, U.S. Senate. Medicare: Lack of Price Transparency May Hamper Hospitals’ Ability to Be Prudent Purchasers of Implantable Medical Devices. GAO-12-126. http://www.gao.gov/assets/590/587688.pdf. Accessed 3 June 2016.
3. Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am. 1969;51(4):737–55.
4. Riddle DL, Stratford PW, Singh JA, Strand CV. Variation in outcome measures in hip and knee arthroplasty clinical trials: a proposed approach to achieving consensus. J Rheumatol. 2009;36(9):2050–6.
5. Soderman P, Malchau H. Is the Harris hip score system useful to study the outcome of total hip replacement? Clin Orthop Relat Res. 2001;384:189–97.
6. Soderman P, Malchau H, Herberts P. Outcome of total hip replacement: a comparison of different measurement methods. Clin Orthop Relat Res. 2001;390:163–72.
7. Kavanagh BF, Fitzgerald Jr RH. Clinical and roentgenographic assessment of total hip arthroplasty. A new hip score. Clin Orthop Relat Res. 1985;193:133–40.
8. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997;50(3):239–46.
9. Shields RK, Enloe LJ, Evans RE, Smith KB, Steckel SD. Reliability, validity, and responsiveness of functional tests in patients with total joint replacement. Phys Ther. 1995;75(3):169–76; discussion 176–179.
10. Shi HY, Chang JK, Wong CY, Wang JW, Tu YK, Chiu HC, Lee KT. Responsiveness and minimal important differences after revision total hip arthroplasty. BMC Musculoskelet Disord. 2010;11:261.
11. Hoeksma HL, Van Den Ende CH, Ronday HK, Heering A, Breedveld FC. Comparison of the responsiveness of the Harris Hip Score with generic measures for hip function in osteoarthritis of the hip. Ann Rheum Dis. 2003;62(10):935–8.
12. Singh J, Sloan JA, Johanson NA. Challenges with health-related quality of life assessment in arthroplasty patients: problems and solutions. J Am Acad Orthop Surg. 2010;18(2):72–82.
13. Cohen J. A power primer. Psychol Bull. 1992;112(1):155–9.
14. Dworkin RH, Turk DC, McDermott MP, Peirce-Sandner S, Burke LB, Cowan P, Farrar JT, Hertz S, Raja SN, Rappaport BA, et al. Interpreting the clinical importance of group differences in chronic pain clinical trials: IMMPACT recommendations. Pain. 2009;146(3):238–44.
15. Singh JA, Schleck C, Harmsen WS, Lewallen D. Validation of the Mayo Hip Score: construct validity, reliability and responsiveness to change. BMC Musculoskelet Disord. 2016 (in press).
16. Tubach F, Ravaud P, Baron G, Falissard B, Logeart I, Bellamy N, Bombardier C, Felson D, Hochberg M, van der Heijde D, et al. Evaluation of clinically relevant changes in patient reported outcomes in knee and hip osteoarthritis: the minimal clinically important improvement. Ann Rheum Dis. 2005;64(1):29–33.
17. Paulsen A, Roos EM, Pedersen AB, Overgaard S. Minimal clinically important improvement (MCII) and patient-acceptable symptom state (PASS) in total hip arthroplasty (THA) patients 1 year postoperatively. Acta Orthop. 2014;85(1):39–48.
18. Farrar JT, Young Jr JP, LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain. 2001;94(2):149–58.
19. Singh JA, Yang S, Strand V, Simon L, Forsythe A, Hamburger S, Chen L. Validation of pain and patient global scales in chronic gout: data from two randomised controlled trials. Ann Rheum Dis. 2011;70(7):1277–81.
20. Felson DT, Anderson JJ, Boers M, Bombardier C, Furst D, Goldsmith C, Katz LM, Lightfoot Jr R, Paulus H, Strand V, et al. American College of Rheumatology. Preliminary definition of improvement in rheumatoid arthritis. Arthritis Rheum. 1995;38(6):727–35.
21. Rothwell AG, Hooper GJ, Hobbs A, Frampton CM. An analysis of the Oxford hip and knee scores and their relationship to early joint revision in the New Zealand Joint Registry. J Bone Joint Surg Br. 2010;92(3):413–8.
22. Frihagen F, Grotle M, Madsen JE, Wyller TB, Mowinckel P, Nordsletten L. Outcome after femoral neck fractures: a comparison of Harris Hip Score, EQ-5D and Barthel Index. Injury. 2008;39(10):1147–56.
23. Huddleston JI, Wang Y, Uquillas C, Herndon JH, Maloney WJ. Age and obesity are risk factors for adverse events after total hip arthroplasty. Clin Orthop Relat Res. 2012;470(2):490–6.
24. Jimenez-Garcia R, Villanueva-Martinez M, Fernandez-de-Las-Penas C, Hernandez-Barrera V, Rios-Luna A, Garrido PC, de Andres AL, Jimenez-Trujillo I, Montero JS, Gil-de-Miguel A. Trends in primary total hip arthroplasty in Spain from 2001 to 2008: evaluating changes in demographics, comorbidity, incidence rates, length of stay, costs and mortality. BMC Musculoskelet Disord. 2011;12:43.
25. Kirksey M, Chiu YL, Ma Y, Della Valle AG, Poultsides L, Gerner P, Memtsoudis SG. Trends in in-hospital major morbidity and mortality after total joint arthroplasty: United States 1998–2008. Anesth Analg. 2012;115(2):321–7.
26. Wamper KE, Sierevelt IN, Poolman RW, Bhandari M, Haverkamp D. The Harris hip score: Do ceiling effects limit its usefulness in orthopedics? Acta Orthop. 2010;81(6):703–7.
We thank Youlonda Lochler for assisting data extraction from the Mayo Clinic Total Joint Registry.
Role of the funding source
The funding source, Department of Orthopedic Surgery at Mayo Clinic, played no role in protocol development, study conduct, interpretation of results, or the decision to submit the manuscript for publication.
JS designed the study, wrote the protocol and submitted it for approval by the ethics committee. CS obtained the data and performed data programming. CS and WSH performed data analyses. JS, CS, WSH, and DGL reviewed the analyses. JS wrote the first draft of the manuscript. JS, CS, WSH, and DGL made critical revisions and edits. All authors approved the final version of the report.
There are no financial or non-financial conflicts related directly to this work. JAS has received research grants from Takeda and Savient and consultant fees from Savient, Takeda, Regeneron, Iroko, Merz, Bioiberica, Crealta, and Allergan pharmaceuticals, WebMD, UBM LLC and the American College of Rheumatology. JAS serves as the principal investigator for an investigator-initiated study funded by Horizon pharmaceuticals through a grant to DINORA, Inc., a 501c3 entity. JAS is a member of the executive of OMERACT, an organization that receives arms-length funding from 36 companies; a member of the American College of Rheumatology’s Guidelines Subcommittee of the Quality of Care Committee; and a member of the Veterans Affairs Rheumatology Field Advisory Committee. DGL has received royalties/speaker fees from Zimmer, Orthosonic and Osteotech, has been a paid consultant and owns stock in Pipeline Biomedical and his institution has received research funds from DePuy, Stryker, Biomet and Zimmer. CS and WSH have no conflicts to declare.
“The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.”
This material is the result of work supported by the Department of Orthopedic Surgery, Mayo Clinic School of Medicine, Rochester, MN, USA. JAS is also supported by the resources and the use of facilities at the VA Medical Center at Birmingham, Alabama, USA.
We will share data with any investigator interested in replicating these findings or interested in future collaborations, pursuant to institutional and IRB regulations, in accordance with patient privacy, confidentiality, and HIPAA laws/regulations.
The Institutional Review Board (IRB) at the Mayo Clinic approved the study. Each author certifies that all investigations were conducted in conformity with ethical principles.
Keywords
- Harris Hip Score
- Total Hip Arthroplasty
- Discriminant ability
- Clinically important improvement
- Minimal clinically important improvement
- Minimal clinically important difference