Traditional Chinese-Hong Kong version of Forgotten Joint Score-12 (FJS-12) for patients with osteoarthritis of the knee who underwent joint replacement surgery: cross-cultural and sub-cultural adaptation, and validation

Background A patient-reported outcome (PRO) tool that reflects the outcomes of patients who underwent total knee arthroplasty (TKA) should be free of the ceiling effect that commonly used PRO tools face. The Forgotten Joint Score-12 (FJS-12) has been shown to reduce or even eliminate the ceiling effect, and it has been translated into different languages. The objectives of this study were to validate the FJS-12 in the Traditional Chinese-Hong Kong language and to examine whether the strengths of the FJS-12 are retained in this language-adapted version.
Methods Between September 2019 and March 2020, the FJS-12 was administered to 75 patients who had undergone TKA, the majority of whom were obese. Patients completed 3 sets of questionnaires (FJS-12, Oxford Knee Score (OKS), and Numeric Rating Scale (NRS)) twice, 2 weeks apart. Reliability, internal consistency, responsiveness, test–retest agreement and discriminant validity were evaluated.
Results The FJS-12 showed moderate to excellent internal consistency (Cronbach’s α = 0.870). Test–retest reliability was good (ICC = 0.769), and the Bland–Altman plot showed good test–retest agreement. Construct validity, in terms of correlations between FJS-12 and OKS and between FJS-12 and NRS, was moderate at baseline (Pearson’s r = 0.598) and good at follow-up (r = 0.879). The smallest detectable change (SDC), used to assess responsiveness, was higher than the minimal important change (MIC). No floor effect was observed, and the ceiling effect was low. Discriminant validity analysis found no significant correlations between FJS-12 scores and baseline demographics. BMI (obesity) did not affect FJS-12 outcomes.
Conclusions The Traditional Chinese-Hong Kong version of the FJS-12 showed good test–retest reliability, validity and responsiveness, was not BMI-specific, and had no floor and low ceiling effects in patients who underwent TKA. Sub-cultural differences in individual PRO tools should be considered in certain ethnicities and languages.
Supplementary Information The online version contains supplementary material available at 10.1186/s12891-022-05156-5.

suffering from their disorder severity [1]. PROs also help to provide timely and appropriate therapeutic and rehabilitation strategies. The success of a disease-specific PRO depends on how well it can be cross-culturally adapted, which makes it locality- and language-friendly [2].
The Forgotten Joint Score-12 (FJS-12) is a recently developed, well-recognized joint-specific patient-reported outcome (PRO) measure focusing on patients' awareness of a specific joint in everyday life [3]. A joint is usually 'forgotten' until strong sensations arise, e.g. pain, mild stiffness, subjective dysfunction, or any discomfort [3]. The FJS-12 has been used in different joint-related studies [4][5][6][7][8][9][10] together with some "gold standards", such as the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) [11,12], Oxford Knee Score (OKS) [13], Knee Injury and Osteoarthritis Outcome Score (KOOS) [14], and Knee Society Score (KSS) and Function Score (KFS) [15]. Modern technology allows patients to look up information about their disease symptoms, the treatments they receive and the expected outcomes. This knowledge benefits patients and, at the same time, leads them to expect better health outcomes as medical technology (knee arthroplasty) advances. Because their internal constructs were developed years ago, some of the tools mentioned above now have difficulty differentiating between higher levels of function and patient satisfaction (i.e. the known ceiling and floor effects) [16]. One advantage of the FJS-12 is its low ceiling and floor effects [3,17]. The FJS-12 was also found to be the most responsive tool compared with the PROs mentioned above in patients following total knee arthroplasty (TKA) [18]. The FJS-12 was developed to assess the outcomes of hip and knee arthroplasty by evaluating a patient's awareness of the artificial joint during twelve activities of daily living. It is based on the assumption that the goal of total knee arthroplasty is a joint that the patient can "forget" about. Studies have started using the FJS-12 as the sole PRO assessment tool [19,20] to assess knee function, and it has been used to assess long-term results after TKA [21].
The World Health Organization (WHO) developed a universal tool for measuring quality of life (QOL), the WHOQOL questionnaire, which has been translated into different languages, including Chinese (China), Chinese (Hong Kong), and Chinese (Taiwan). The WHOQOL development teams from mainland China, Hong Kong and Taiwan looked for similarities and differences among these 3 language versions [24]. The authors found that, although the three regions use a similar written and spoken "Chinese" language and are deeply influenced by the same ancient Chinese philosophies, variations were still found. The report attributed the differences to a combination of historical and geo-political factors [24]. Similarities and dissimilarities can be found within subcultures [24]. They can also be found in other well-recognized QOL measures, e.g., the Short Form-36 (SF-36), which has China, Hong Kong, and Taiwan versions. Another example of sub-cultural difference is that the WHOQOL was developed in USA (American English), Canadian (Canadian English), UK (British English), and Australian (Australian English) versions. This also shows that subcultural differences exist among English-speaking countries.
Why does the FJS-12 need a "Traditional Chinese-Hong Kong" version when "Simplified Chinese-Mandarin Chinese" and "Traditional Chinese-Taiwan" versions are available? The FJS-12 has already been translated into Simplified Chinese-Mandarin Chinese [25], and translated and linguistically validated in Traditional Chinese-Taiwan [26]. "Simplified Chinese" is officially used in mainland China, Singapore and the Chinese community in Malaysia, while "Traditional Chinese" is officially and commonly used in Taiwan, Hong Kong, and Macau. Within the "Traditional Chinese" societies, however, a fundamental cross-cultural difference between Taiwan and Hong Kong/Macau has been reported. In a cross-society comparison of general happiness and personal life satisfaction between 1222 participants from Taiwan and 1044 participants from Hong Kong using an identical survey platform, Hong Kong participants reported a happier attitude towards their recent life than the Taiwanese participants [27]. However, the Taiwanese respondents were more satisfied with their personal quality of life than the Hong Kong respondents. A Traditional Chinese-Hong Kong version of the FJS-12 is therefore needed even though two other Chinese versions are available.
The purpose of this study was to evaluate the psychometric properties of the Traditional Chinese-Hong Kong version of the FJS-12 by testing its reliability, validity, and responsiveness. Floor and ceiling effects of the translated version were also examined. The Oxford Knee Score (OKS) and Numeric Rating Scale (NRS) were administered alongside the Traditional Chinese-Hong Kong version of the FJS-12, and correlations between FJS-12 and OKS, and between FJS-12 and NRS, were assessed.

Methods
Between September 2019 and March 2020, 75 patients who had undergone unilateral total knee arthroplasty (TKA) for end-stage knee osteoarthritis were invited to join this study. The inclusion criteria were 1) male or female patients of any age, 2) presence of unilateral knee osteoarthritis (Kellgren-Lawrence grade III-IV), 3) unilateral total knee arthroplasty received at least 1 year before this study, and 4) fluency in reading and comprehending Chinese (Cantonese). The exclusion criteria were 1) impaired cognitive function, 2) inability to understand Chinese (Cantonese), and 3) inability to self-administer both questionnaires. Every participant signed informed consent. Ethics approval was received from the institutional ethics review committee (ethics approval number: 2019.337). The study was performed in accordance with the Declaration of Helsinki and ICH-GCP.

Translation and cross-cultural adaptation
The FJS-12 was translated into the Traditional Chinese-Hong Kong version using the "translation and back-translation" method, in accordance with the International Quality of Life Assessment (IQOLA) guideline [28,29]. Following the guideline, the FJS-12 was translated from English into Traditional Chinese-Hong Kong by two independent bilingual medical professionals and one non-health worker. The translated version was then back-translated into English by two different independent bilingual medical professionals and another non-health worker. The final version was reviewed and discussed for consistency by all 6 members and subsequently verified (Version 1.1, Appendix 1). Minor modifications were made to several questions for cultural adaptation; these are summarized in Appendix 2. The modifications concerned the wording used for the same activities and actions in different regions and were not intended to alter the meaning of the questions.

Forgotten Joint Score-12 (FJS-12)
The FJS-12 comprises 12 questions, each answered on a 5-point Likert scale (score = 1 (never, leftmost) to 5 (mostly, rightmost)). The raw score is transformed to a 0-100 scale and then reversed to obtain the final score, so that a higher score indicates a better outcome. The final score was calculated according to the recommended scoring algorithm.
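The transformation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the official scoring algorithm: the function name `fjs12_score` is hypothetical, all 12 items are assumed to be answered, and the 1-5 coding follows the description in this section.

```python
def fjs12_score(responses):
    """Return a 0-100 FJS-12 score from twelve 1-5 Likert responses.

    Coding assumed here: 1 = never aware of the joint, 5 = mostly aware.
    A higher returned score means less joint awareness (better outcome).
    """
    if len(responses) != 12:
        raise ValueError("FJS-12 has exactly 12 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    # Shift each item to 0-4, average, and rescale to 0-100 (100 = most aware).
    awareness = sum(r - 1 for r in responses) / 12 * 25
    # Reverse the scale so that 100 = joint fully "forgotten".
    return 100 - awareness


print(fjs12_score([1] * 12))  # every item answered "never"
print(fjs12_score([5] * 12))  # every item answered "mostly"
```

Handling of missing answers is deliberately left out of this sketch because the section does not describe it.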

Oxford Knee Score (OKS)
The OKS has a scoring algorithm similar to that of the FJS-12. It consists of 12 questions on pain and function after TKA, each scored from 0 to 4 (0 being the worst and 4 being the best) [13,30]. The 12 scores are summed to form the final score, which ranges from 0 (most severe symptoms) to 48 (least symptoms). In recent cross-cultural adaptation and translation studies, translated versions of the OKS showed good reliability, validity and responsiveness, e.g. Arabic [31], Slovenian [32], and Malaysian Chinese, Hong Kong Chinese and Singaporean Chinese [33]. The Hong Kong Traditional Chinese version of the OKS was used in this study.
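For comparison, the OKS total can be sketched the same way. The function names are hypothetical, and the optional rescaling to 0-100 is only an illustrative ratio conversion for placing the OKS on the same range as the FJS-12; it is not part of the OKS itself.

```python
def oks_total(items):
    """Sum twelve OKS items, each scored 0 (worst) to 4 (best); range 0-48."""
    if len(items) != 12:
        raise ValueError("OKS has exactly 12 items")
    if any(i not in (0, 1, 2, 3, 4) for i in items):
        raise ValueError("each item must be an integer from 0 to 4")
    return sum(items)


def oks_percent(items):
    """Illustrative ratio conversion of the 0-48 total to a 0-100 range."""
    return oks_total(items) / 48 * 100


print(oks_total([4] * 12))    # best possible total
print(oks_percent([2] * 12))  # mid-range responses on the 0-100 range
```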

Numeric Rating Scale (NRS)
The NRS is routinely used to let patients rate their pain level on a defined scale. It is a single 11-point numeric scale ranging from 0 to 10, with 0 representing "no pain" and 10 representing the most extreme pain [34].

Data collection
The translated FJS-12 and the OKS were administered to the patients during their routine clinic follow-up visits (baseline). NRS was routinely recorded at each patient visit. All patients were invited to return to the clinic 1-2 weeks later to complete these questionnaires again (follow-up).
Patients' baseline demographics, e.g., age, sex, body height, body weight, and side of surgery, were collected from the hospital's electronic medical records. Details of patients' education level were not routinely collected; however, obesity in terms of body mass index (BMI) has been found to be inversely associated with education level [35].

Statistical analysis
Demographic characteristics were summarized as mean ± standard deviation (SD) for numeric data and N (%) for categorical data. Reliability was measured through test-retest reliability expressed as the intra-class correlation coefficient (ICC) (two-way random, single measure), internal consistency using Cronbach's alpha, and the smallest detectable change (SDC) [36]. SDC was calculated using the formula SDC = 1.96 × √2 × SEM, where SEM (standard error of measurement) = SD × √(1 − ICC) [37]. A Bland-Altman plot was used to assess test-retest agreement. Correlations between FJS-12 and OKS, and between FJS-12 and NRS, were tested to assess the validity of the translated version against a gold standard (construct validity). Responsiveness, which reflects measurement error in longitudinal validity under repeated measures, was assessed by comparing SDC with the minimal important change (MIC). Floor and ceiling effects were defined as the percentages of participants choosing the leftmost option "never" ("Floor"; score = 1) and the rightmost option "mostly" ("Ceiling"; score = 5) in individual questions. Percentages at or above 15% were considered significant [37]. Discriminant validity was evaluated using correlations between the FJS-12 final score and patients' baseline demographics. Data analysis was carried out using IBM SPSS 27.0 (Armonk, New York). A two-sided p value ≤ 0.05 was considered statistically significant.
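Two of the reliability statistics named above can be illustrated with a short standard-library Python sketch. It assumes the commonly used forms SDC = 1.96 × √2 × SEM and SEM = SD × √(1 − ICC), which reproduce the SDC reported in the Results; the function names are hypothetical, the SD of 11.84 is inferred from the reported MIC (5.92, defined as half the SD) rather than stated in the text, and the study itself used SPSS rather than this code.

```python
import math
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of per-patient lists of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def smallest_detectable_change(sd, icc):
    """SDC = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * standard_error_of_measurement(sd, icc)


# SD = 11.84 is an inferred value (2 x reported MIC); ICC = 0.769 is reported.
print(round(smallest_detectable_change(11.84, 0.769), 2))  # 15.77
```

The agreement between this calculation and the reported SDC of 15.77 supports the assumed form of the formula.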

Bootstrapping
Bootstrapping was used to compare the differences in responsiveness estimates between the measures, and the results were expressed in terms of bias, standard error, and 95% confidence interval (CI) [38]. Bootstrapping is a resampling technique that draws numerous samples from the original sample with replacement [39]. In this study, a bias-corrected and accelerated (BCa) bootstrap method with 200 and 1000 resamples was used [40][41][42]. Two numbers of resamples, 200 and 1000, were used because 1) a statistical "rule of thumb" holds that 200 samples provide adequate statistical power for data analysis, and 2) 1000 is a commonly presumed sample size for bootstrapping. Bootstrapping was also carried out using IBM SPSS 27 (Armonk, New York).
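The resampling idea can be sketched as follows. Note that this illustration uses a simple percentile interval, not the BCa method applied in SPSS for this study, and that the function name, the seed, and the input scores are hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_mean_difference(baseline, followup, n_resamples=1000, seed=0):
    """Percentile bootstrap of the mean paired difference (follow-up - baseline)."""
    if len(baseline) != len(followup):
        raise ValueError("paired samples must have the same length")
    rng = random.Random(seed)
    diffs = [float(f) - float(b) for b, f in zip(baseline, followup)]
    observed = mean(diffs)
    n = len(diffs)
    # Resample the paired differences with replacement and recompute the mean.
    estimates = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo_idx = (25 * n_resamples) // 1000       # 2.5th percentile position
    hi_idx = (975 * n_resamples) // 1000 - 1  # 97.5th percentile position
    return {
        "observed": observed,
        "bias": mean(estimates) - observed,
        "std_error": stdev(estimates),
        "ci95": (estimates[lo_idx], estimates[hi_idx]),
    }


# Hypothetical scores for five patients, resampled 200 and 1000 times.
base = [50, 60, 70, 80, 65]
follow = [52, 58, 73, 79, 66]
for n_boot in (200, 1000):
    print(n_boot, bootstrap_mean_difference(base, follow, n_resamples=n_boot))
```

Running the same data with 200 and 1000 resamples mirrors the comparison made in this study: if the bias, standard error and interval change little between the two runs, the estimate is considered stable.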

Results
The baseline demographics of the 75 patients are tabulated in Table 1. Of the 75 patients, 74.6% were obese. The mean interval between baseline and follow-up was 9.53 days. Obese patients constituted 70.67% of the 75 patients, 16% were overweight and 12% fell into the normal BMI range.

Reliability
FJS-12 showed moderate to excellent internal consistency in individual questions, with a Cronbach's α of 0.870 for the final score (Table 2). Test-retest reliability in terms of ICC was good for the FJS-12 final score (ICC = 0.769 (95% CI = 0.560, 0.886)) using the definitions established by Koo et al. [43]. Question 1 was "excellent" and most of the questions were at least "moderate". The Bland-Altman plot for the repeated measures (follow-up − baseline) showed that nearly all measurement differences fell within the 95% limits of agreement (LOA), i.e. the mean ± 1.96 standard deviations (Fig. 1).
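The limits of agreement behind a Bland-Altman plot can be computed as in the sketch below. The function name and the four pairs of scores are hypothetical and serve only to show the calculation of the mean difference ± 1.96 SD; they are not study data.

```python
from statistics import mean, stdev


def bland_altman_limits(baseline, followup):
    """Mean difference (follow-up - baseline) and 95% limits of agreement."""
    diffs = [float(f) - float(b) for b, f in zip(baseline, followup)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    # Share of paired differences lying inside the limits of agreement.
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {"bias": bias, "lower": lower, "upper": upper, "within": within}


# Hypothetical baseline and follow-up FJS-12 scores for four patients.
print(bland_altman_limits([50, 60, 70, 80], [52, 58, 73, 79]))
```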

Construct validity
Correlation analyses of construct validity showed a moderate correlation with OKS at baseline (FJS-12 baseline vs. OKS baseline; Pearson's coefficient = 0.598, p < 0.01) and a very strong correlation at follow-up (Pearson's coefficient = 0.879, p < 0.01) (Table 3). Similar results were observed for correlations between FJS-12 and NRS (moderate at baseline and very strong at follow-up) (Table 4).

Responsiveness
Responsiveness in terms of SDC was 15.77. MIC was calculated as half the standard deviation, as proposed by Norman et al. [44], and came out to be 5.92, which was smaller than SDC. No floor effect was observed in any question (Table 5). A significant ceiling effect was observed only in question 8, at both baseline and follow-up.
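The comparison of MIC with SDC and the floor and ceiling percentages can be illustrated with a short sketch. The function names are hypothetical, the SD of 11.84 is inferred from the reported MIC rather than stated in the text, and the ten item responses are invented for illustration only.

```python
def mic_half_sd(sd):
    """Minimal important change estimated as half the standard deviation."""
    return 0.5 * sd


def floor_ceiling(item_responses, threshold=15.0):
    """Percentage answering 1 ("never", floor) and 5 ("mostly", ceiling)."""
    n = len(item_responses)
    floor_pct = 100 * item_responses.count(1) / n
    ceiling_pct = 100 * item_responses.count(5) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        # Flag percentages at or above the 15% threshold used in this study.
        "floor_flag": floor_pct >= threshold,
        "ceiling_flag": ceiling_pct >= threshold,
    }


mic = mic_half_sd(11.84)  # inferred SD; gives the reported MIC of 5.92
sdc = 15.77               # reported SDC
print(mic, sdc > mic)
print(floor_ceiling([1, 1, 5, 3, 2, 4, 5, 5, 3, 2]))  # invented responses
```

When SDC exceeds MIC, as here, a change as small as the MIC cannot be distinguished from measurement error in an individual patient.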

Discriminant validity
FJS-12 scores at baseline and follow-up had no significant correlation with patients' age, sex, BMI, or side of surgery (Table 5). OKS scores at baseline and follow-up were included in the same analysis and also showed no significant correlation with the respective baseline demographics.

Bootstrapping
Bias and standard error of the mean and standard deviation of individual questions, as well as of the total score, were low at both baseline and follow-up after bootstrapping with 200 samples (Table 6). Similar results (low bias and standard error) were found after bootstrapping with 1000 samples (Table 6). For OKS, bias and standard error were low, similar to those for FJS-12 (Table 7). Table 8 shows the mean differences, correlation coefficients, and p values for FJS-12 and OKS after bootstrapping with 200 and 1000 samples. The calculations were based on the score differences between baseline and follow-up. For mean differences, the 95% CIs after bootstrapping with 200 and 1000 samples were similar (for example, for the mean difference in FJS-12 Question 1 between baseline and follow-up: -0.46 to 0.00 with N = 200, and -0.42 to 0.00 with N = 1000). Similarly, the 95% CIs of correlation coefficients after bootstrapping with N = 200 and N = 1000 were similar (Table 8). Comparisons that were not statistically significant (p > 0.05) without bootstrapping remained non-significant after bootstrapping with both sample sizes. In the comparison "OKS Question 12 Baseline - OKS Question 12 Follow-up", the score difference was statistically significant (p = 0.05). Statistical significance remained after both bootstrapping runs (p = 0.02 with N = 200 and p = 0.05 with N = 1000). Cross-comparisons between FJS-12 and OKS individual scores at baseline and follow-up followed. In the comparisons between FJS-12 and OKS across the 13 individual items (12 questions and total) at baseline, mean differences and correlation coefficients were similar (Table 9). These results were reflected by the p values without bootstrapping, with bootstrapping for N = 200, and with bootstrapping for N = 1000 (Table 10).
Comparing the 95% CIs of the mean differences and of the correlation coefficients for FJS-12 and OKS after bootstrapping with N = 200 and N = 1000 showed similar results (Table 11). For example, in the comparison "FJS-12 Q01 Follow-up - OKS Q01 Follow-up", the 95% CI of the mean difference was 0.43 to 2.14 (bootstrapping for N = 200) and 0.43 to 2.14 (bootstrapping for N = 1000) (Table 11).

Discussion
Obesity is a well-known risk factor for OA, and end-stage OA patients require TKA. The World Health Organization (WHO) released the "Global Strategy on Diet, Physical Activity and Health" in 2004 [45], followed in 2018 by a global action plan on physical activity 2018-2030 [46]. A recent report projected that by 2030 the number of overweight people might reach 2.16 billion, with another 1.12 billion obese, or 38% and 20% of the world's adult population respectively [47]. The mean BMI of patients in our previous studies always fell within the "overweight" or "obese" categories [48][49][50]. Consequently, it is important that a PRO questionnaire for patients who underwent TKA is accurate and highly responsive for respondents (patients) who are "overweight" or "obese". Reports of the effect of BMI on results from different PRO questionnaires are somewhat conflicting [51][52][53]. The FJS-12 has been shown to be simple, valid and reliable in its original and translated versions [3,17,20][54][55][56]. A study in New York found that although obese patients (BMI ≥ 30 kg/m²) who received primary TKA reported lower post-surgery FJS-12 scores, the difference was not statistically significant [57]. This suggests that the FJS-12 can accurately reflect the outcomes of patients undergoing conservative or operative treatment of the knee, regardless of BMI. The mean BMI of our patients was 27.48 kg/m², which is classified as "obese" using BMI categories for Asians [58]. We speculate that the percentage of obese patients will continue to increase.
The education level of our patients also reflects the need for a Traditional Chinese-Hong Kong version of the FJS-12 for the local community. The validated FJS-12 is therefore suitable for any patient who linguistically prefers the Traditional Chinese-Hong Kong version. There were 3 questions for which either the ICC or Cronbach's alpha was lower than 0.7: Q3 (when you are walking for more than 15 min), Q8 (when you are standing up from a low-sitting position), and Q10 (when you are doing housework or gardening). The percentages of "floor" and "ceiling" answers in these questions help to identify the causes. In question 3, 24.7% of patients were never aware of their artificial joints when walking for more than 15 min (a higher percentage of "never" is better, meaning that patients had already forgotten their artificial joints), and this percentage decreased to 17.9% after 2 weeks. Similarly, the percentage answering "mostly" increased by 4.2 percentage points (from 13.7% to 17.9%), meaning that more patients paid extra attention to their knee implants after a walk of at least 15 min. Patients tended not to "forget their knee implants" within the test-retest period. In this study, the interval between the 2 rounds of questionnaire administration was about 2 weeks, similar to other validation studies [25,55,59]. As a result, the percentages changed, and these changes made the ICC and Cronbach's alpha lower than for other questions. A similar phenomenon was observed in question 10 (patients were "alerted" and fewer patients forgot their knee implants when doing housework). In question 8, statistically significant differences were found between patients scoring "mostly" and "never" at both baseline and follow-up. The percentage of patients who "never" thought of their artificial knee joints increased from 32.0% at baseline to 42.9% at follow-up, while the percentage of patients mostly aware of their knees decreased.
The results for Q8 (when you are standing up from a low-sitting position) are contrary to those for Q3 (when you are walking for more than 15 min) and Q10 (when you are doing housework or gardening) because walking for more than 15 min and doing housework or gardening are continuous activities, whereas standing up from a low-sitting position is a split-second movement. Patients with artificial knee joints tend to become aware of their joints after these kinds of continuous activities over time (reflected by the decreased ceiling percentages and increased floor percentages). Patients gain confidence in short movements over time; therefore, more patients "forget" their artificial joint(s) when they stand up from a low-sitting position. No significant floor or ceiling effect was observed in a recent UK validity study evaluating the Oxford Knee Score using a national patient-reported outcome measure dataset [60].
Correlations between the translated FJS-12 and OKS are promising. We correlated FJS-12 with OKS at baseline and follow-up, and the coefficients were 0.598 at baseline and 0.879 at follow-up. In other validation studies of language adaptations using OKS as the gold standard, correlation coefficients were 0.366 for the German version [55] and 0.37 for the Hindi version [59]. Our results showed a moderate correlation when patients first answered the FJS-12 and a good correlation at the repeated administration. Previous studies showed that the FJS-12 was more responsive at 6 months and 12 months [61], and 1 to 2 years after surgery [18]. We conclude that the responsiveness of the FJS-12 is good for knee OA patients after TKA. The subjects in this study had undergone TKA at least 1 year earlier, and the responsiveness of the FJS-12 has been shown to be better 1 to 2 years after surgery [18]. A further study inviting patients to complete the FJS-12 from shortly after TKA up to 6 or 9 months after surgery could fill the gap in responsiveness data for the first year after TKA. We chose OKS as the gold standard because both questionnaires share a similar construct (12 questions) and the total is calculated by simply adding all 12 scores ("final score" in FJS-12 and "overall score" in OKS; in FJS-12, a conversion reversing the score direction is required, but no other data transformation). Both totals can be scaled to a maximum of 100 (native in FJS-12 and by ratio conversion in OKS), which makes the two questionnaires easily comparable. Furthermore, only OKS was used in this study, unlike other studies that employed multiple PRO tools to validate the language-adapted version. The mean age of our patients was around 70 years [49], and response bias can occur when elderly patients are required to fill out multiple questionnaires.
Telephone interviews instead of face-to-face interviews could have been an alternative, but this option was eventually rejected because the target participants were elderly patients, who are prone to lower response rates [62][63][64] and can cope with only short interview durations [62,63]. Mailing all sets of questionnaires to participants, in the hope that they would complete and return them at different time points, has been reported to yield low response rates. The Dutch version study reported a limitation in getting all questionnaires back after two sets of questionnaires were sent in one mailing with the expectation that the second set would be returned within 2 weeks [65]. A further study developing an electronic version of the FJS-12, accessible through a web/mobile browser or a mobile phone application, could possibly increase the response rate. Furthermore, if the electronic version could switch languages instantly, this would likely increase the response rate in communities with more than one official language, e.g., switching between English and French in Canada.
A Bland-Altman (BA) plot is a useful and clear way to show the agreement between two methods, or test-retest reliability, and to reveal any systematic error between the two measures. In this study, it confirmed the good test-retest (baseline vs. follow-up) agreement and reproducibility of the FJS-12. In our previous experience, BA plots also proved useful for evaluating the agreement between a new imaging technology and conventional X-ray methods [66].
Another important message from this study is the need to raise awareness of sub-cultural differences within the same ethnicity or race. We introduced this point with reference to the cultural adaptation and validation of the WHOQOL questionnaire, which has been translated into Chinese (China), Chinese (Hong Kong), and Chinese (Taiwan) [24]. Later, the Taiwan Chinese language adaptation group published another article testing the agreement between the "Taiwan Chinese" version and the "Taiwanese" version of the brief version of the WHOQOL [67]. The authors pointed out that > 50% of elderly Taiwanese aged over 65 used only one spoken language, Taiwanese. Another classic example, mentioned earlier, is that the WHOQOL is also available in American English, Canadian English, British English, and Australian English. We speculate that sub-cultural variations occur in African countries, European countries, Middle Eastern countries, Southeast Asian countries, and possibly any country with a multicultural society or federal multicultural policies. In summary, we recommend that sub-cultural differences be reviewed and considered for inclusion in future versions of the IQOLA project. A further longitudinal study examining how FJS-12 scores reflect the long-term outcomes of patients who underwent TKA is also recommended to look for any practical change over time.

Limitations of this study
The small sample size in this study reduces the generalizability of the data and affects the accuracy and reliability of the results. This study was carried out during the COVID-19 pandemic, and patients were recruited when the local situation was easing. We adhered to the original research protocol and collected two sets of questionnaires through face-to-face interviews. Moreover, we introduced bootstrapping to address the small sample size. Bootstrapping is an appropriate way to check the stability of the results. The estimates of standard errors and confidence intervals were both promising after bootstrapping for N = 200 and N = 1000. Second, we acknowledge that using multiple gold standards would increase the validity of the translated version. However, in our experience, when patients move on to a second questionnaire, they start asking why the questions are similar to those in the first one, and some patients request to opt out of the study. This affects the compliance rate. Therefore, we chose a well-recognized patient-reported outcome assessment (i.e., OKS) as the sole gold standard in this study. In view of this, NRS was added to correlate with FJS-12, although NRS might not be classified as a "gold standard" tool. The minimal important change (MIC) used in assessing responsiveness is an estimate for which a gold standard has yet to be established. The MIC is 8.67 for the Hindi version and 10.9 for the German version. Further study on standardizing the calculation of MIC is recommended.

Conclusions
The Traditional Chinese-Hong Kong version of the FJS-12 showed good reliability and validity for patients who underwent TKA. The "Forgotten joint" score questionnaire performed well in evaluating how far patients "forget" their artificial joint during daily activities. The FJS-12 is also suitable for patients who are obese (i.e., it is not body mass index (BMI) specific). The individual questions and final score showed no floor effect and a low ceiling effect. The FJS-12 was also found to have good agreement, responsiveness and discriminant validity. The FJS-12 is an important PRO questionnaire for patients who have undergone TKA, with advantages over other PRO tools. Moreover, sub-cultural adaptation should be considered along with the standard guideline during cross-cultural adaptation and validation.