- Open Access
Differences among the observers in the assessments of Japanese orthopedic association hip scores between surgeons and physical therapists and the correlations to patients’ reported outcomes after total hip arthroplasty
BMC Musculoskeletal Disorders volume 23, Article number: 27 (2022)
We aimed to assess the utility of a clinician-reported outcome (the Japanese Orthopedic Association [JOA] hip score) as evaluated by clinicians and physiotherapists. This assessment was made by comparing these scores to those of the JOA hip disease evaluation questionnaire (JHEQ), which is a measurement of patient-reported outcomes after total hip arthroplasty.
In this retrospective case-control study, 52 hips that underwent primary total hip arthroplasty were included in the analyses. The mean age of the participants was 66.8 years (seven male and 45 female participants). The JOA hip score included four categories: pain, range of motion, ability to walk, and activities of daily living. The JHEQ included three categories: pain, movement, and mental health. The JOA hip scores were evaluated preoperatively and postoperatively by clinicians and physiotherapists. Pearson’s correlation coefficients were used to analyze the association between the JOA hip scores and the JHEQ scores.
The JOA hip scores were determined by clinicians and physiotherapists (scores of 46.8 and 57.3, respectively) preoperatively and at 24 months (scores of 94.4 and 91.7, respectively) postoperatively. The JHEQ points were 28.8 and 66.2 preoperatively and at 24 months postoperatively, respectively. The correlation coefficients between the JOA hip and JHEQ scores were .66 and .69 preoperatively and .57 and .76 at 24 months postoperatively, as evaluated by clinicians and physiotherapists, respectively.
Although the JHEQ scores were positively correlated with the JOA hip scores recorded by clinicians and physiotherapists preoperatively and postoperatively, this study implies that clinicians may interpret the results in a way that might have been beneficial to them. To comprehend a patient’s health status, we should inclusively understand the varying range of information among different evaluators.
To eliminate observational bias among clinicians in the evaluation of quality of life (QOL) after surgery, patient-reported outcome measures (PROMs) should be regarded as indispensable measurements that directly reflect a patient’s subjective condition and satisfaction. Since 2002, the Swedish Hip Arthroplasty Register, a nationwide arthroplasty registry conducted by the Swedish Orthopedic Society, has run an observation program for total hip arthroplasty (THA) cases using PROMs [1, 2]. Moreover, the Food and Drug Administration and the National Institute for Health and Care Excellence have recognized PROMs as essential methods for clinical investigators to measure the efficacy of medical interventions.
In contrast, clinician-reported outcomes (ClinROs) are commonly used as objective measures for surgical evaluations (e.g., the Harris hip and Merle d’Aubigné scores); they do not require lengthy questionnaires and are less affected by the patients’ subjective views than PROMs. However, it has not been fully established which of the various types of medical staff, including clinicians, nurses, and physical therapists, are suitable to record ClinROs correctly. Moreover, although significant differences between evaluations performed by physicians and patients have been noted, the range of discrepancy between PROMs and ClinROs has not yet been evaluated.
In this study, based on the hypothesis that physiotherapists could evaluate postoperative function from more independent viewpoints, we compared the ClinROs scored by surgeons or physiotherapists for patients with THA before and after surgery. These assessments were then compared and correlated with the PROMs. The Japanese Orthopedic Association hip disease evaluation questionnaire (JHEQ, Supplement 1), which was specifically developed for the lifestyles of East Asian countries, was used as the PROM [7, 8], while the Japanese Orthopedic Association (JOA) hip score (Supplement 2) was used as the ClinRO.
From June 2012 to December 2017, we collected data on patients who underwent THA for degenerative hip disease and received regular assessments using PROMs and ClinROs at our hospital. Patients who had osteoarthritis, osteonecrosis, or rapidly destructive coxarthropathy and consented to participate in the study were included. We excluded those who underwent revision THA or acetabular reconstruction, as well as those who had rheumatic diseases or serious postoperative comorbidities or complications.
All cases were reconstructed using cementless implants with the Revelation® hip system (DJO Global, Lewisville, TX, USA), SL-PLUS™ femoral hip system (Smith & Nephew, Hull, UK), MODULUS® femoral stem (Lima Corp., San Daniele, Italy), or C2® femoral stem (Lima Corp.). The acetabular components were reconstructed using the FMP® acetabular system (DJO Global) for the Revelation®, the R3 Acetabular system® (Smith & Nephew) for the SL-PLUS™, and the Delta TT cup® (Lima Corp.) for the MODULUS® and C2® femoral stems. All THAs were performed by one surgical team (organized by NW) using a modified Dall’s anterolateral approach.
Patient-reported outcome measurement (PROM)
The JHEQ was evaluated as a PROM preoperatively and at 12 and 24 months postoperatively (Supplement 1). The JHEQ (maximum of 84 points) consisted of 20 questions in three subsections: pain, movement, and mental health (up to 28 points each) [7, 8]. At the same time, the patients rated their satisfaction with the surgical procedure on a visual analog scale (VAS) using a 100-mm horizontal line.
Clinician-reported outcome (ClinRO)
Concerning ClinROs, the physicians and physical therapists who were engaged in the physical therapy and the rehabilitation programs after THA recorded the JOA hip score preoperatively and at 12 and 24 months postoperatively (Supplement 2). The JOA hip score had four categories: pain, range of motion (ROM), ability to walk, and activities of daily living (ADLs) (up to 40, 20, 20, and 20 points, respectively).
Shapiro–Wilk tests were performed to confirm the normality of the distribution of each characteristic. The ClinROs (JOA hip scores) recorded before and after THA by the different observers were compared using Student’s t-test or the Mann–Whitney test, in accordance with the results of the Shapiro–Wilk test. The correlations between the JOA hip and JHEQ scores were evaluated using Pearson’s correlation coefficients. The correlations of the VAS scores with the JOA hip or JHEQ scores were evaluated using Spearman’s correlation coefficients. A P-value <.01 was considered significant. The Statistical Package for the Social Sciences (SPSS ver. 24; IBM Corp., Armonk, NY, USA) was used for statistical analysis. The total sample size was determined based on a test of whether a correlation coefficient differed from zero (α = .01 [two-tailed], β = .20, and r = .45; target number = 52.7). Bland–Altman analysis and evaluation of the limits of agreement between the physicians and physical therapists were performed to assess systematic bias [12, 13]. Fixed bias was identified based on whether the mean of the differences differed significantly from 0 in a one-sample t-test. Moreover, the presence of proportional bias was investigated using a linear regression model.
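The target sample size stated above can be reproduced with the standard Fisher z approximation for testing whether a correlation coefficient differs from zero. The text does not name the formula or the software used for this step, so the choice of formula is an assumption, and the function name is illustrative only.

```python
from math import atanh
from statistics import NormalDist


def sample_size_for_correlation(r, alpha, power):
    """Approximate n needed to detect a correlation r (two-tailed test of rho = 0).

    Fisher z approximation: n = ((z_{alpha/2} + z_{beta}) / atanh(r))^2 + 3.
    Assumption: the paper does not state which method it used.
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-tailed test
    z_beta = z.inv_cdf(power)           # quantile corresponding to power = 1 - beta
    return ((z_alpha + z_beta) / atanh(r)) ** 2 + 3


# Values stated in the text: alpha = .01 (two-tailed), beta = .20, r = .45
n = sample_size_for_correlation(r=0.45, alpha=0.01, power=1 - 0.20)
print(round(n, 1))
```

Under these assumptions, the result rounds to 52.7, which agrees with the target number reported in the text.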
Role of the funding source
No funders participated in the design, conduct, or reporting of this study.
During this period, THAs were performed for degenerative hip disease at our institution in 160 hips in total. We excluded revision THA cases (n = 17), acetabular reconstruction cases with acetabular support (n = 3), and cases of rheumatic diseases (n = 8). During the follow-up period, we also omitted patients with the following: revision of THA because of loosening (n = 1), deep postoperative infection (n = 2), dislocation (n = 1), periprosthetic fracture (n = 1), and dementia, which would have affected the acquisition of accurate postoperative evaluation data (n = 1). Thirty patients dropped out from the routine surveys after THA in our hospital. Among the remaining 96 cases, a total of 52 patients (osteoarthritis [n = 46], osteonecrosis [n = 4], and rapidly destructive coxarthropathy [n = 2]) agreed to participate in this study and completed the consecutive questionnaires. The average age was 66.8 (standard deviation [SD], 8.9) years. In total, THAs were performed for 45 joints of women and seven joints of men. The median operation time was 98 min, the median total amount of surgical bleeding was 250 mL, and the average body mass index was 23.5 kg/m². The details of the patients’ characteristics and implant information are presented in Tables 1 and 2.
ClinROs between different observers
Preoperatively, the median JOA hip scores, as assessed by physicians, were 46.5 points in total: 10, 10, 10, and 10 points for pain, ROM, ability to walk, and ADL, respectively. In contrast, the median JOA hip scores, as assessed by physical therapists, were 57.0 points in total: 20, 14, 10, and 12 points for pain, ROM, ability to walk, and ADL, respectively. Thus, the JOA hip scores evaluated by the orthopedic surgeons were significantly lower (P < .01) than those evaluated by the physical therapists, except for the scores for ability to walk (Table 3).
After THA, the mean total JOA hip score improved over time from the preoperative scores to 94.0 and 92.0 (12 months, P < .001) and to 96.0 and 94.5 (24 months, P = .004) postoperatively, as evaluated by surgeons and physical therapists, respectively. Unlike in the preoperative evaluations, several subcategories of the JOA hip score, including pain, ROM, and ability to walk (at 12 months only), as well as the total scores, were significantly overestimated by the orthopedic surgeons (Table 3).
Preoperatively, the Bland–Altman analysis suggested a downward fixed bias of about 10 points in the total JOA hip scores evaluated by physicians (P < .001). In contrast, postoperatively, the Bland–Altman analysis suggested the presence of an upward fixed bias in the scores evaluated by physicians (12 months, 4.3 points [P < .001]; 24 months, 2.8 points [P = .006]). Moreover, there were proportional errors (R = -.46, P < .001 and R = -.48, P < .001 at 12 and 24 months, respectively; Supplement 3).
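The quantities behind a Bland–Altman analysis of this kind (mean difference, limits of agreement, and checks for fixed and proportional bias) can be sketched as follows. This is a minimal illustration rather than the authors' SPSS procedure: the scores are hypothetical, the function and variable names are invented for the example, and the P-values (which need a t distribution) are omitted.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Summarize agreement between two raters who scored the same patients."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]        # rater A minus rater B
    means = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]  # per-patient mean score
    n = len(diffs)
    bias = mean(diffs)   # mean difference (fixed bias estimate)
    sd = stdev(diffs)    # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)  # 95% limits of agreement
    # Fixed bias: one-sample t statistic against 0 (n - 1 degrees of freedom)
    t_stat = bias / (sd / sqrt(n))
    # Proportional bias: regress the differences on the per-patient means
    m_bar = mean(means)
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    syy = sum((d - bias) ** 2 for d in diffs)
    return {
        "bias": bias,
        "limits_of_agreement": limits,
        "t_statistic": t_stat,
        "slope": sxy / sxx,          # slope of difference vs. mean
        "r": sxy / sqrt(sxx * syy),  # correlation of difference vs. mean
    }


# Hypothetical total JOA hip scores for eight patients (not study data)
physician = [96, 94, 98, 90, 100, 92, 95, 97]
therapist = [92, 91, 93, 88, 94, 90, 92, 91]
result = bland_altman(physician, therapist)
print(result)
```

A positive `bias` here means the first rater scored higher on average, and a nonzero `slope` or `r` indicates that the disagreement changes with the magnitude of the score, which is what the text calls proportional error.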
Correlations of ClinROs and PROMs
Preoperatively, the median total JHEQ score was 30.0 points (pain, 9; movement, 5; mental health, 12). At 12 months postoperatively, the mean total JHEQ score was 67.5 points (pain, 28; movement, 18; mental health, 24.5) (Supplement 4). At 24 months postoperatively, the total JHEQ score was 67.5 points (pain, 27; movement, 18.5; mental health, 25). Preoperatively, the correlations between the total JOA and JHEQ scores were .66 and .69 (evaluated by physicians and therapists, respectively; Table 4 and Supplement 5). At 24 months postoperatively, the correlations between the total JOA and JHEQ scores were .57 and .76 (evaluated by physicians and therapists, respectively; Table 4 and Supplement 5).
Correlations of patients’ satisfaction and pain measured by VAS with JOA hip scores or JHEQs
As a representative continuous outcome, VAS-satisfaction for the hip joint was evaluated. Preoperatively, the median VAS-satisfaction was 13 points. After THA, the median VAS-satisfaction improved to more than 95 points within 12 months. The relationships between VAS-satisfaction and the JOA hip or JHEQ scores were assessed using Spearman’s correlation coefficients. Preoperatively, VAS-satisfaction was more highly correlated with the total JHEQ score than with the JOA hip score, and it was moderately correlated at 12 and 24 months after THA (Table 5, Supplement 6).
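Spearman's coefficient is Pearson's coefficient computed on ranks, with tied values sharing the average of their ranks, which matters for VAS ratings that cluster near the top of the scale. The sketch below illustrates that calculation only; the data are hypothetical, the function names are invented for the example, and the study itself used SPSS.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical VAS-satisfaction (mm) and total JHEQ scores (not study data)
vas = [95, 80, 100, 60, 95, 70]
jheq = [70, 55, 84, 40, 66, 50]
rho = spearman(vas, jheq)
print(round(rho, 3))
```

Because only the ordering of the values enters the calculation, the coefficient is insensitive to the ceiling effect of a 100-mm scale, which is one common reason for preferring it over Pearson's coefficient for VAS data.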
In this study, we described postoperative evaluation after THA using ClinROs recorded by different observers and analyzed the relationships between the PROMs and ClinROs using measurements customized for East Asian populations, namely the JHEQ and JOA hip scores. Interestingly, we found that physical therapists could serve as evaluators for the ClinROs from more independent viewpoints compared with physicians.
For the assessment of postoperative function, pain, satisfaction, or QOL, more reliable patient-oriented evaluation criteria are desired. As a representation of changes for multiple symptoms, these criteria clarify the impact of treatment and enhance the interpretation of clinical studies for clinicians. To date, the Short-Form 36-Item Health Survey (SF-36), Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), and Oxford Hip Score are generally used as PROMs [7, 8]. The JHEQ was designed to adjust for the sedentary Asian lifestyle, which requires deep hip flexion for several activities, including sitting upright and usage of traditional toilets. Moreover, the JHEQ covers the subjective dissimilarities of the patients, which are difficult to determine from objective examinations. Thus, the JHEQ provides meaningful information in the actual clinical setting.
Originally, Seki et al. reported that the JHEQ in cases of osteoarthritis or necrosis presented excellent reliability (intraclass correlation coefficient [ICC] > .8), while the JOA hip score was reliable in Japanese patients with osteoarthritis (Cronbach’s α = .70). Moreover, in a previous work, we reported that the ICC (1.2) in the JHEQ subgroup of patients with labral tear was .88 in all categories (pain, .85; movement, .89; mental, .8), while the Cronbach’s α was .94 in all categories (pain, .92; movement, .94; mental, .89 in subcategory). Based on these sufficient preceding results, we did not repeat the reliability tests in our patients.
For various types of illness, PROMs and ClinROs have been compared, and disagreements between them have been reported in numerous studies. Generally, patient-reported symptoms provide an independent patient perspective on the treatment benefit and the expected risk, which occasionally exceeds the clinicians’ expectations. Flores et al. reported that patients with rectal cancer who were treated with chemo-radiotherapy described the presence of diarrhea and proctitis more often than was recorded by clinicians throughout treatment. For patients with breast cancer treated with radiotherapy, Mukesh et al. reported that moderate-to-severe toxicity was underestimated as low toxicity by clinicians, and the overall concordance between clinicians and patients was not sufficient. In this study, VAS-satisfaction, used as a reliable reference value, was more closely correlated with the JHEQ score than with the JOA hip scores.
To our knowledge, this is the first study to compare the JOA hip scores recorded by surgeons with those recorded by physiotherapists. There were significant differences between the JOA hip scores recorded by physicians and physiotherapists for almost all assessments preoperatively and postoperatively. Although the maximum differences (< 5 points) might not represent a critical discrepancy in clinical settings, this study indicated that the JOA scores were overestimated after THA by clinicians. The reasons for these systematic tendencies were not fully elucidated; clinicians may have unconsciously reported higher postoperative scores (a sort of rater bias). In contrast, preoperative JOA-pain, ROM, and ADL scores were underestimated by clinicians. These tendencies might reflect a bias at the time of selection of patients for THA, as securing a stable number of cases for surgery is important for the clinicians. These data suggest that physical therapists can correctly report the pre- and postoperative functions from more objective viewpoints.
Moreover, the correlation coefficients between the JOA hip and JHEQ scores were higher for physiotherapists than for clinicians, especially for the preoperative JOA-pain/JHEQ-pain, preoperative JOA-ADL/JHEQ-movement, postoperative JOA-walk/JHEQ-movement, and postoperative JOA-ADL/JHEQ-movement. These findings may be partially explained by the fact that well-educated rehabilitation staff, who have closer relationships with the patients, can evaluate the patients’ status more accurately than physicians and provide a more open environment in which patients can present their problems before and after the operation. The assessment by physiotherapists might give clinicians a more objective perception of the patients’ status and help exclude observational bias. As assessments by objective observers, ClinROs recorded by physiotherapists should be considered in evaluating postoperative outcomes.
This study had several limitations. First, the number of patients in this study was small, and institutional differences should be noted. These differences included the conditions of the surgeries and the experience of the young physicians in residency and the medical staff who evaluated the JOA hip scores. Second, we excluded technically difficult cases (revision THA, acetabular reconstruction cases with acetabular support, and extensive infection cases), for which postoperative function is generally not guaranteed and varies widely. Third, the presence of selection bias should be noted; in particular, patients who were willing to participate in the study and answer the questionnaires were mainly selected for this study. Moreover, the relatively low response rate might have influenced the results. This was attributed to the accessibility of our institution and to referrals to clinics located near the participants’ homes. However, the low response rate was not the result of intentional selection and is unlikely to have affected the results of this study.
The JHEQ score was correlated with the JOA hip score, as measured by clinicians and physiotherapists. However, this study implied that rater bias might have influenced the results. To determine a patient’s status, it is recommended that the varied information collected by different observers be inclusively understood and evaluated.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
ADL: Active daily living
FDA: Food and Drug Administration
JHEQ: Japanese Orthopedic Association hip disease evaluation questionnaire
JOA: Japanese Orthopedic Association
OHS: Oxford Hip Score
PROM: Patient-reported outcomes measure
QOL: Quality of life
ROM: Range of movement
THA: Total hip arthroplasty
NICE: National Institute for Health and Care Excellence
WOMAC: Western Ontario and McMaster Universities Osteoarthritis Index
Rolfson O, Eresian Chenok K, Bohm E, Lübbeke A, Denissen G, Dunn J, et al. Patient-reported outcome measures in arthroplasty registries. Acta Orthop. 2016;8:3–8.
Rolfson O, Kärrholm J, Dahlberg LE, Garellick G. Patient-reported outcomes in the Swedish hip arthroplasty register: results of a nationwide prospective observational study. J Bone Joint Surg Br. 2011;93:867–75.
Weldring T, Smith SM. Patient-reported outcomes (PROs) and patient-reported outcome measures (PROMs). Health Serv Insights. 2013;6:61–8.
Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am. 1969;51:737–55.
d'Aubigne RM, Postel M. Functional results of hip arthroplasty with acrylic prosthesis. J Bone Joint Surg Am. 1954;36-A:451–75.
Studenic P, Radner H, Smolen JS, Aletaha D. Discrepancies between patients and physicians in their perceptions of rheumatoid arthritis disease activity. Arthritis Rheum. 2012;64:2814–23.
Matsumoto T, Kaneuji A, Hiejima Y, Sugiyama H, Akiyama H, Atsumi T, et al. Japanese Orthopaedic association hip disease evaluation questionnaire (JHEQ): a patient-based evaluation tool for hip-joint disease. The subcommittee on hip disease evaluation of the clinical outcome committee of the Japanese Orthopaedic association. J Orthop Sci. 2012;17:25–38.
Seki T, Hasegawa Y, Ikeuchi K, Ishiguro N, Hiejima Y. Reliability and validity of the Japanese Orthopaedic association hip disease evaluation questionnaire (JHEQ) for patients with hip disease. J Orthop Sci. 2013;18:782–7.
Imura S. The Japanese Orthopaedic association: evaluation chart of hip joint functions. J Jpn Orthop Assoc. 1995;69:864–7.
Dall DM. Modified Dall approach. In: Bono JV, McCarthy JC, Thornhill TS, Bierbaum BE, Turner RH, editors. Revision Total hip arthroplasty. New York: Springer; 1999. p. 263–5.
Brokelman RB, Haverkamp D, van Loon C, Hol A, van Kampen A, Veth R. The validation of the visual analogue scale for patient satisfaction after total hip arthroplasty. Eur Orthop Traumatol. 2012;3(2):101–5.
Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol. 2012;8(1):23–34.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10.
Cleeland CS, Sloan JA. ASCPRO Organizing Group. Assessing the symptoms of cancer using patient-reported outcomes (ASCPRO): searching for standards. J Pain Symptom Manage. 2010;39:1077–85.
McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-item short-form health survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247–63.
McConnell S, Kolopack P, Davis AM. The Western Ontario and McMaster universities osteoarthritis index (WOMAC): a review of its utility and measurement properties. Arthritis Rheum. 2001;45:453–61.
Dawson J, Fitzpatrick R, Carr A, Murray D. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br. 1996;78:185–90.
Fujita K, Makimoto K, Mawatari M. Three-year follow-up study of health related QOL and lifestyle indicators for Japanese patients after total hip arthroplasty. J Orthop Sci. 2016;21:191–8.
Watanabe N, Murakami S, Uchida S, et al. Validity of the Japanese Orthopaedic association hip disease evaluation questionnaire (JHEQ) for Japanese patients with labral tear. J Hip Preserv Surg. 2020;7(3):466–73.
Flores LT, Bennett AV, Law EB, Hajj C, Griffith MP, Goodman KA. Patient-reported outcomes vs. clinician symptom reporting during chemoradiation for rectal cancer. Gastrointest Cancer Res. 2012;5:119–24.
Mukesh MB, Qian W, Wah Hak CC, Wilkinson JS, Barnett GC, Moody AM, et al. The Cambridge Breast Intensity-modulated Radiotherapy Trial: comparison of clinician- versus patient-reported outcomes. Clin Oncol (R Coll Radiol). 2016;28:354–64.
Japanese Hip Society. Japanese Orthopaedic Association Hip-Disease Evaluation Questionnaire (JHEQ). http://hip-society.jp/jheq/jheq_eng.pdf. Accessed 2 Nov 2021.
Kuribayashi M, Takahashi KA, Fujioka M, Ueshima K, Inoue S, Kubo T. Reliability and validity of the Japanese Orthopaedic association hip score. J Orthop Sci. 2010;15(4):452–8. https://doi.org/10.1007/s00776-010-1490-0.
We thank the physical therapists at Tosei General Hospital, Nobuaki Baba, Yukari Suzuki, Yasuhiro Shimada, and Tatsu Ishikawa, for the assessment of the JOA hip scores. We also thank the medical assistants, Mamiko Hayashi, Yukari Kato, and Rei Goto, who supported the input of the patients’ characteristics data. Moreover, we would like to thank the Copyright Clearance Center and the Journal of Orthopaedic Science for permission to use the JOA hip score and JHEQ (order date: Sep 30, 2021; order number: 5158591285781).
No funding support.
Ethics approval and consent to participate
This research was approved by the institutional review board of the authors’ affiliated institutions (the Ethics Committee of Tosei General Hospital, approved on August 12, 2020), and written informed consent from the participants for this study was obtained from all patients. This study complied with the Declaration of Helsinki.
Consent for publication
The authors declare that they have no competing interests.
“Japanese Orthopedic Association Hip-Disease Evaluation Questionnaire: A Patient-Based Evaluation Tool for Hip-Joint Disease”.
Bland–Altman in the total JOA hip scores.
Correlation between JOA hip scores and JHEQ scores in 12 months.
The scatter plots of the distribution of JOA hip scores and JHEQ preoperatively and 24 months after THA.
The scatter plots of the distribution of JOA hip scores /JHEQ and VAS-satisfaction preoperatively and 24 months after THA.
Cite this article
Aiba, H., Watanabe, N., Inagaki, T. et al. Differences among the observers in the assessments of Japanese orthopedic association hip scores between surgeons and physical therapists and the correlations to patients’ reported outcomes after total hip arthroplasty. BMC Musculoskelet Disord 23, 27 (2022). https://doi.org/10.1186/s12891-021-04980-5
- Total hip arthroplasty
- JOA hip score
- Patient-reported outcome
- Clinician-reported outcome