In this study, we found that the Mayo hip score is a valid and reliable measure of hip outcomes in patients who had undergone primary THA. The Mayo hip score was also associated with the risk of revision surgery at 2-years, both when examined as the final score and as a change score. These psychometric properties compared well to those of the HHS [5, 19–21], which at present is the most commonly used instrument for THA outcome assessment [3, 4]. An advantage of the Mayo hip score is that either physicians or patients can complete it, whereas the HHS has separate physician and patient portions. The excellent agreement between these assessments also indicates that, in an appropriate setting, the Mayo hip score may be an alternative to the HHS. Several findings from this validation study deserve further discussion.
First, our study established that the Mayo hip score was responsive to change, with large effect sizes, and was associated with the risk of subsequent revision at 2-years. This is an important finding, because one of the main applications of arthroplasty outcome instruments is to compare different surgical techniques or implant types, for which a responsive instrument is highly desirable [2]. The HHS has been shown to be responsive, with large ES and SRM of 2.5 and 1.8, respectively [21]; the corresponding ES and SRM for the Mayo hip score in our study, 2.8 and 2.6, are comparable. We also determined the MCII and moderate improvement thresholds for the Mayo hip score at 2- and 5-years.
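As a reminder of how the two responsiveness statistics quoted above differ, the following minimal Python sketch computes them under the usual definitions: ES as the mean change divided by the standard deviation of the baseline scores, and SRM as the mean change divided by the standard deviation of the change scores. We assume these standard definitions here; the function names and the scores are hypothetical and purely illustrative, not data from this study.

```python
from statistics import mean, stdev


def effect_size(pre, post):
    """ES: mean change divided by the SD of the baseline (pre) scores."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(pre)


def standardized_response_mean(pre, post):
    """SRM: mean change divided by the SD of the change scores."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)


# Hypothetical pre- and postoperative scores for five patients (not study data).
pre_scores = [30, 40, 35, 45, 50]
post_scores = [70, 72, 75, 78, 80]

es = effect_size(pre_scores, post_scores)
srm = standardized_response_mean(pre_scores, post_scores)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

Because the two statistics share a numerator and differ only in the denominator, an instrument can show a large ES but a smaller SRM when the amount of change varies widely between patients.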
The association of the Mayo hip score with the risk of future revision surgery is particularly noteworthy: Mayo hip scores at or below 55 points at 2- or 5-years were associated with a two-fold higher risk of early revision surgery after each time-point, compared to scores of 72–80 points (maximum score is 80). This was statistically significant at 2-years (p = 0.0003) and borderline statistically significant at 5-years (p = 0.05). A lack of improvement (or worsening) of the Mayo hip score from preoperative to 2-years was associated with a 3- to 4-fold higher revision risk compared to improvement by >50 points. The K-M survival curves illustrate these findings: both absolute Mayo hip scores and change in Mayo hip scores were significantly associated with the risk of revision, the only exception being absolute scores at 5-years, which were not statistically significantly associated with future revision risk (p = 0.14). These findings indicate that Mayo hip scores might allow screening for early THA implant failures. This has practical implications beyond instrument validation. It may be possible to develop an early implant risk score for clinical use that incorporates the Mayo hip score (or similar scales) and other risk factors, a project currently underway.
Second, this study adds significant validation data related to the construct validity of the Mayo hip score. Our finding of a significant association of Mayo hip scores at 2- and 5-years with important baseline characteristics previously shown to impact pain and function outcomes, i.e., age, gender, BMI, ASA class and Deyo-Charlson index [9–16], provides critical evidence for its convergent and divergent validity. The high correlation of the Mayo hip score with the HHS, a validated outcome instrument [19, 20, 22], and with overall activity level establishes the construct validity of the Mayo hip score for assessment of outcomes after THA. More validation studies in other populations, such as patients undergoing revision THA, hip arthroscopy, or partial hip replacement, may be needed before the Mayo hip score can be used for the assessment of outcomes in these populations.
We found that the Mayo hip score has fair test-retest inter-rater reliability, which extends the findings from an independent sample of 97 THA patients from our institution (i.e., Mayo Clinic) [6]. We found a slightly higher (better) score with the physician-administered than with the patient-administered Mayo hip score, and the difference of 5 points in our study was slightly larger than the 1.8-point difference in the previous study. As expected, this difference was slightly smaller than that for the HHS, given the complexity of the HHS and its inclusion of range of motion and deformity. This indicates that the reliability of the Mayo hip score is at least as high as that of the HHS. Inter-rater ICCs in the range of 0.39 to 0.86 have been reported for outcome measures in pemphigus [23] with patient vs. physician scoring, similar to our ICC of 0.55 for the Mayo hip score. A higher ICC would have implied closer agreement between patient and physician assessments; nevertheless, in our study the ICC for the Mayo hip score was numerically better than that for the most commonly used instrument, the HHS. For a patient-reported measure such as the Mayo hip score, it is not surprising to see differences between physician-completed and patient-completed assessments. We recommend that this PRO be completed by patients, not physicians; in studies where both physicians and patients have completed the survey, Mayo hip scores may not be interchangeable.
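To illustrate why a systematic difference between physician and patient scores lowers an agreement-type ICC, the following minimal Python sketch computes a two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)). The choice of this ICC form is an assumption made for illustration, since the form used is not restated here; the function name and the scores are hypothetical and are not data from this study.

```python
def icc_absolute_agreement(ratings):
    """ICC(2,1) from a list of rows, one row per patient, one column per rater.

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of raters (e.g., patient and physician)
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for a two-way layout without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-patient mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical [patient-completed, physician-completed] scores (not study data):
# the physician score is always exactly 5 points higher than the patient score.
scores = [[60, 65], [70, 75], [50, 55]]
print(round(icc_absolute_agreement(scores), 3))
```

In this toy example the two raters rank the patients identically, yet the constant 5-point offset alone keeps the ICC below 1, because an absolute-agreement ICC penalizes systematic differences between raters as well as random disagreement.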
Key advantages of the Mayo hip score are that it has fewer questions than the HHS and does not require physician assessment in the clinic. Therefore, it can be sent as a mailed survey and completed by patients. This allows rapid and efficient assessment not only of the patients’ current pain/function status but also of possible THA failures, through screening during short- and long-term arthroplasty follow-up. As suggested in earlier studies, the Mayo hip score can be coupled with a radiographic score [5] for a more comprehensive assessment. The advent of telemedicine may allow for a virtual patient visit and follow-up (mailed survey and remote review of radiographs) that can replace the annual in-person clinician visit. In an era of health care cost reduction and a declining supply of arthroplasty surgeons [24], this approach might allow more cost-efficient arthroplasty follow-up and monitoring.
The study limitations must be considered when interpreting these findings. Non-response rates were 36 % and 45 % at 2- and 5-years, respectively, which could have affected the study findings. We decided a priori not to impute values for missing data, since this was a validation study and we wanted to use only observed data. It is unclear whether this strengthened or weakened the study findings, since we are unaware of the impact of non-response on validation statistics. These findings may therefore be generalizable only to patients who regularly respond to post-arthroplasty surveys. The overall revision rate in the cohort was low, and we may have missed significant associations due to the rarity of this outcome.
So, what does this study add? There is already a large diversity of instruments being used in THA assessments [25], so why have another instrument? Several instruments in current use in arthroplasty have limited data on validity and reliability; we provide an assessment of the validity and responsiveness of an instrument that has been in use in our Total Joint Registry for >3 decades and is patient self-administered. One challenge in post-arthroplasty outcome assessment is the use of several different outcome measures [25]. Harmonization of THA outcome instrument use across arthroplasty studies would be a step forward and would allow comparison across studies. This effort is currently underway [26].