Adaptation of Chinese and English versions of the Ankylosing Spondylitis quality of life (ASQoL) scale for use in Singapore

Background To cross-culturally adapt and validate the Singapore Chinese and Singapore English versions of the Ankylosing Spondylitis Quality of Life (ASQoL) scales. Methods Translation of the ASQoL into Singapore Chinese and English was performed by professional and lay translation panels. Field-testing for face and content validity was performed by interviewing ten Chinese speaking and ten English speaking axial spondyloarthritis (AxSpA) patients. AxSpA patients (either Chinese or English speaking) were invited to take part in validation surveys. The Health Assessment Questionnaire (HAQ), Short Form Health Survey (SF-36), Bath Indices, and other measures of disease activity were used as comparator scales for convergent validity. A separate sample of AxSpA patients were invited to participate in a test-retest postal study, with 2 weeks between administrations. Results The cross-sectional study included 183 patients (77% males, 82% English speaking), with a mean (SD) age of 39.4 (13.7) years. The ASQoL had excellent internal consistency (Cronbach’s alpha = 0.88), and correlated moderately with all the comparator scales. The ASQoL was able to distinguish between patients grouped by disease activity and perceived general health. The ASQoL fulfilled the Rasch model analysis for fit, reliability and unidimensionality requirements. No significant differential item functioning was noted for gender, age below or above 50 years, and language of administration. Test–retest reliability was good (r = 0.81). Conclusions The ASQoL was adapted into Singapore Chinese and English language versions, and shown to be culturally relevant, valid and reliable when used with combined samples of AxSpA patients who speak either Chinese or English.


Background
Axial spondyloarthritis (AxSpA) is a chronic inflammatory rheumatic disease affecting the spine, and to a lesser extent, the peripheral joints. It affects young adults who are economically active, causing pain and stiffness, and therefore limiting their ability to perform various activities of daily living. Over time, AxSpA can lead to structural and functional impairments and significantly impairs quality of life (QoL) [1]. QoL has been increasingly recognised as an essential outcome in chronic diseases. With the development of new treatment modalities for AxSpA, such as tumour necrosis factor (TNF) blockers, it is vital that there are accurate and reliable methods of measuring QoL in AxSpA.
The Ankylosing Spondylitis Quality of Life (ASQoL) questionnaire is a measure of QoL specific to ankylosing spondylitis (AS), and has been validated for use with SpA patients [2]. It was developed simultaneously in the United Kingdom (UK) and the Netherlands (NL) [3], and has been adapted and validated for use in numerous languages including Italian and Spanish [3][4][5][6]. Content of the ASQoL was derived from interviews with AS patients exploring the impact of the condition on their QoL. The ASQoL is based on a clear theoretical construct, the needs-based QoL model. According to the need-based model, QoL is defined as the extent to which an individual is able to meet his or her fundamental human needs. QoL is high when these needs are fulfilled and poor when they are not [7]. The model has been used as the theoretical basis in the development of patient-reported outcome measures (PROMs) for over 30 diseases, including measures for rheumatic diseases such as rheumatoid arthritis, osteoarthritis, psoriatic arthritis and systemic lupus erythematosus [8][9][10][11]. The ASQoL has been used in a number of clinical trials [12][13][14][15] and has been shown to be sensitive to change [16]. Furthermore, the ASQoL has been demonstrated to fit the Rasch model, providing evidence of its unidimensionality [3] and construct validity.
Currently, the ASQoL is not available in Singapore Chinese or Singapore English and therefore has not been available for use in this country. The aim of the present study was to develop versions of the ASQoL suitable for use in Singapore, to enable the measure to be used in multi-centre clinical trials and longitudinal studies. With a population below 7 million it is difficult to justify adapting a measure into the several different languages spoken in the country. Singapore is a multiethnic country, with 80% of Singaporeans literate in English, and 71% literate in two or more languages, PROMs available in both English and Chinese will cover 98% of the population [17]. Consequently, a decision was taken to produce Chinese and English language versions of the ASQoL and to validate them with a combined sample of Chinese and English speaking AxSpA patients. This is practical and aligned to the daily clinical practice, observational cohorts and clinical trial in the local settings.

Translation of the ASQoL
The ASQoL was translated into Singapore Chinese and Singapore English using the dual-panel methodology [18]. Studies have shown that the dual-panel methodology produces translations that are more acceptable to patients than the standard forward-backward methodology [19,20]. The methodology involves conducting two independent translation panels; a bilingual panel followed by a lay panel. First, a bilingual panel produces an initial translation of the questionnaire into the target language. The bilingual panel was conducted with the presence of the original developer of ASQoL (SM), who advised on the closest meaning of the items and instructions to the original English language wording. Following this, a lay panel consisting of individuals with an average to below average educational level, assess the questionnaire items and instructions to ensure that they are written in clear, everyday language appropriate for typical patients. As the ASQoL was originally developed in English, only a lay panel was required for the Singapore English version, to ensure that the wording of item and instructions were appropriate for local patients.

Field-testing for face and content validity
Cognitive debriefing interviews (CDIs) were conducted with Chinese speaking and English speaking patients with AxSpA to assess the relevance, acceptability and comprehensiveness of questionnaire items and instructions. AxSpA patients completed either the English or Chinese version of the ASQoL according to their primary language. Patients were chosen by the attending clinician to represent a range of disease severity, gender and age. Questionnaires were completed in the presence of a trained interviewer (YYL), who observed and made note of difficulties or hesitation in answering any items. The time of start and completion of the ASQoL were recorded. After completing the ASQoL, reasons for taking a longer time to complete specific items were elicited (eg whether the item wording was unclear, item was not relevant to the subject, etc). Patients were also asked whether all the items were relevant in assessing their QoL. With an open ended question, patients were invited to give further comments about the questionnaire and asked if any important aspects of their experience had been omitted. This method of evaluation of content validity fulfilled the recommended requirements, and has been used in the development of ASQoL [3] and other needs-based PROMs [10,21].

Psychometric validation
Due to the small number of SpA patients in Singapore it was decided to validate the translations with a combined sample of Singapore Chinese and Singapore English speakers. Patients fulfilling the Assessment of Spondy-loArthritis international Society (ASAS) criteria for AxSpA [22] were recruited from a designated SpA outpatient clinic of a tertiary hospital. The ASAS criteria for AxSpA allow classification of patients with inflammatory back pain, other SpA features or signs of inflammation on imaging (either Magnetic Resonance Imaging or radiography); which is representative of the cases of AxSpA in daily clinical practice. Clinical features of patients with AxSpA in Singapore have been reported to be similar to those in Caucasian countries [23]. For the purpose of culturally adapting the ASQoL, we conducted a crosssectional study with consecutive patients with AxSpA to determine the validity of ASQOL via classical theory testing (internal consistency, convergent validity and known-group validity) and Item Response Theory. We excluded non-residents (non-Singapore permanent residents or citizens), patients who could not read English or Chinese, and patients who could not give informed consent. To assess the test-retest reliability of ASQoL, a second postal survey was conducted with a separate sample of AxSpA patients who completed the ASQoL on two separate occasions, two weeks apart.
All study protocols were approved by the SingHealth Centralized Institutional Review Board (ref. CIRB 2012/ 498/E and 2012/696/E). Prior to their inclusion in the study, all patients provided informed consent.

Cross-sectional study
Patients were examined by a rheumatologist and physician global assessment (PhGA) was recorded. Patients then completed a demographic questionnaire, followed by the Chinese or English ASQoL (depending on their primary language) and the comparator scales described below. All PROMs were filled in paper and pencil format in outpatient clinic.

ASQoL [3]
The ASQoL has 18 items, each with a dichotomous response format (True/Not true). 'True' responses are summed to create a total score. High scores indicate poor QoL. The original versions had excellent internal consistency (Cronbach's apha = 0.89-0.91), test-retest reliability (r = 0.92 UK and 0.91 NL), and fulfilled the requirement of unidimensionality (fit to the Rasch model) [3].
Health Assessment Questionnaire (HAQ) [24] This measures functional limitations in arthritic disease. It consists of 20 items covering eight categories each with a four-point scale. Total scores range from 0 'no disability' to 3 'completely disabled'. The Chinese version of HAQ have been proven valid and reliable (test re-test reliability, r = 0.84) in rheumatology settings in Singapore [25].

Short form-36 health survey (SF-36) version 2 [26]
This is a generic health status that has been extensively validated in general population and numerous disease groups globally. The instrument consisting of 36 items which form eight subdomains (Physical Functioning, Role physical, Bodily pain, General health, Vitality, Social function, Role emotion and Mental health) and two summary scales (Physical Component Summary and Mental Component Summary). Scores for each scale are calculated by summing the items in each subdomain, which are then transformed onto a scale of 0 'worst health' to 100 'best health'. Both English and Chinese versions of the SF-36 have been validated for use in the general population of Singapore [27]. The validity and reliability of SF-36 have been evaluated, with Cronbach's alpha of subscales ranged from 0.88-0.90 [28].
The bath indices [29][30][31] Patients completed the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), the Bath Ankylosing Spondylitis Functional Index (BASFI) and the Bath Ankylosing Spondylitis Global Score (BAS-G), which measure disease activity, physical functioning, and the effect of AS on the patient's well-being in past 1 week, respectively. Items are answered using a 100 mm visual analogue scale (VAS). Scores on each of the scales range from 0 to 100, with a high score indicating higher disease activity/worse disability/greater effect on well-being.
Patient global disease activity (PGA) and perceived pain in the past week were also measured using a 100 mm VAS.

Statistical analyses
Internal consistency was assessed using Cronbach's alpha coefficient. Alpha measures the extent to which items in a scale are inter-related. A value of 0.7 or higher indicates adequate inter-relatedness between the items [32], and >0.90 are considered necessary for individual level use.
Convergent validity was determined by correlating scores on the new measure with those on existing PROMs. ASQoL scores were correlated with scores on the comparator measures and clinical assessments using Spearman's rank correlation coefficients. Strong (r ≥ 0.7) and moderate correlation coefficients (r = 0.5-0.7) suggest that the scores from two PROMs are measuring related construct whereas weak correlation coefficients (r ≤ 0.3) suggest the PROMs are measuring different construct [33]. We hypothesize the ASQoL should correlate with SF-36 subscales at least moderately, and to a lesser extent with pain or global assessment indexes.
Known group validity assesses whether a scale can distinguish between groups of patients that are known to differ by some factor that would be expected to influence their scores. The factors used for the present study were physical function and disease activity (determined by HAQ and BASDAI score) and perceived general health (response to item 1 of the SF-36). Responses on the general health item were dichotomized to good (excellent/very good/good) and poor (fair/poor) for comparison. Non-parametric tests for independent samples (Mann-Whitney U test for two groups or Kruskal-Wallis One-Way Analysis of Variance for three or more groups) were employed.

Rasch model analysis
The Rasch model analysis provides a robust evaluation of the internal construct validity of PROMs, and has been increasingly applied [34,35]. We used WINSTEPS version 3.93.2 for Rasch model analysis and a dichotomous model was fitted. We evaluated fit to the model by a chi-square statistic, the mean-square (MNSQ) of informationweighted fit statistics (INFIT) and outlier-sensitive statistics (OUTFIT). The acceptable range of fit statistics is in the range of 0.7-1.3.A high fit statistic, > 1.3, denotes noise in the data and implies the item may not belong to the unidimensional construct. A low fit statistic, < 0.7, indicates that the item may have interdependence with another item. We examined the person separation reliability and item separation. Person separation reliability illustrates how well the index differentiates persons, while item separation provides a measure on the spread of item difficulties. An index is considered reliable if person separation reliability is >0.8.
Items of ASQoL were examined for differential item functioning (DIF) or item bias by comparing item performance for different subgroups using t-tests. Subgroups analysed included gender, age <50 years or ≥50 years, and language of administration (Chinese vs English). We reported the probability using Mantel-Haenszel statistics, a Bonferroni adjusted p-value of 0.0027 was considered significant.
We further examined unidimensionality via Principal Component Analysis (PCA) of the residuals [36], followed by comparison with simulation data. The absence of other meaningful patterns in the residuals were evaluated through plotting the person estimates of two subsets of positive and negative loading items (as defined by correlated at above or below 0.3 on the first factor of the PCA of the residuals).

Test-retest reliability study
Test-retest reliability of a measure is an estimate of its reproducibility over time when no change in condition has taken place. In a separate study, the ASQoL was completed by the same patients with AxSpA on two separate occasions, 2 weeks apart. We only included patients with stable SpA, and excluded those we were expecting a change in condition, such as those having changes in medication regimen or changing exercise intervention or plan. We standardized the administrations so that the forms were completed in their rest time at home on each occasion. Two sets of questionnaires in paper and pencil format were given to patients and completed forms sent back via return envelopes. The 2 week time period was chosen because it is unlikely that the disease status will change within this short period -yet it is long enough to avoid recall bias. The test-retest reliability was assessed by correlating scores on the ASQoL on two different occasions using Spearman's rank correlation coefficients. As data of the ASQoL are ordinal in nature, Spearman rank correlation coefficients were appropriate (intraclass correlation coefficients are also reported for information only). The Spearman's rank correlation of >0.85 for rank correlation coefficients were considered low random measurement error and evidence of high reproducibility [37]. Other than Rasch model analysis, all analyses were performed using the SPSS version 21 statistical package.

Translation of the ASQoL
All translation panels consisted of five participants (two males, three females). Participants in the three panels were aged from 23 to 72 years. The panels reported very few difficulties in producing translations for most of the items and instructions. In both the Chinese and English panels, the item 'I struggle to do jobs around the house' was interpreted as referring to housework, which in Asian culture is not usually done by men. Therefore, the panel replaced 'jobs' with 'chores' to make the item relevant to both genders. Also, the item 'I worry about letting people down' was changed to 'I worry about disappointing people'. This was because, in Singapore, 'letting people down' may mean a person has committed a grave offence and put their family to shame. Therefore, in both language versions it was modified to convey a much milder meaning.

Field-testing for face and content validity
Cognitive Debriefing Interviews (CDI) were conducted with ten English speaking (age range 22-55 years, 90% male, 50% married, 20% unemployed) and ten Chinese speaking (age range 24 -60 years, 60% male, 80% married, 20% unemployed) patients with AxSpA. Overall, patients found the instrument easy to understand and relevant to their condition. Five patients highlighted areas which affected their lives that may be missed. This included restricted neck and back mobility, lower body mobility, difficult to drive long distances, impairment in work productivity. After discussion, patients agreed the above aread were assessed in item 1 (limits places can go), item 4 (struggle with chores), item 7 and 12 (tired), and item 8 (keep taking break when work). Therefore, there was no important aspects of the impact of their condition had been omitted. The patients took a mean of 3 min and 2.4 min to complete the English and Chinese versions. No changes to the questionnaires were necessary as a result of the CDIs.

Psychometric validation
A total of 187 consecutive patients with AxSpA were recruited in the cross-sectional study, of whom 183 gave complete data. 82% and 18% completed the English and Chinese versions. Of these, 175 patients had radiographic images available. One-hundred and 45 (83%) fulfilled the New York modified criteria for sacroiliitis [38]. Among the 30 patients with radiographic features not fulfilling these criteria, 20 had active sacroiliitis on magnetic resonance imaging [22]. Seventy-seven percent of the patients were male, mean (SD) age and duration of illness were 39.5 (13.7) and 7.8 (8.9) years. Demographic and disease information are shown in Table 1. The mean (SD) ASQoL score was 4.6 (4.2). The ASQoL was noted to have a ceiling effect in 19.1% of patients, and no floor effect was noted. Table 2 shows scores on all the outcome measures.
Cronbach's alpha coefficient for the ASQoL was 0.88 indicating good internal consistency. The Cronbach's alpha coefficient for Singapore ASQoL Chinese and English version separately were 0.93 and 0.86. Table 3 shows the Spearman's rank correlations between ASQoL scores and scores on the comparator measures. Moderate correlations were observed between ASQoL scores and SF-36 subdomain scores and summary scores, indicating that these impairments and functional limitations influence QoL. There were moderate correlations (but lesser extent) to pain, global assessments, and physical function. This aligned with the hypothesis testing of ASQoL as a measure of QoL.    Figure 1 shows mean ASQoL scores grouped by disease status (HAQ and BASDAI scores) and perceived general health (response to item 1 of the SF-36). Significant differences were found between patients grouped by these factors, demonstrating the ability of the ASQoL to distinguish between subgroups of patients (all p-values <0.0001).

Rasch model analysis
Chi-square based fit statistics are presented in Table 4. The fit of the 18 items of ASQoL were good. There were two items showing minor misfit, with MNSQ values outside the required 0.7-1.3 range. The person separation reliability was 0.87, indicating adequate reliability. The person and item separation index were 1.51 and 4.77, respectively. DIF analysis were performed for age (<50 years = 143 vs. ≥50 years =49); gender (male = 141 vs. female = 42); and language of ASQoL administered (Chinese = 33 vs. English = 150). Slight DIF was revealed with item 2 which asked about "feel like crying", which suggests that males and females may response differently toward this item. There was no significant DIF noted by age and language ( Table 4).
PCA of the residuals indicate that 37.2% of the variance of the residuals was explained by the Rasch model. Out of the 62.8% of the variance of residuals that were not explained, the first contrast explained 7.2%, corresponding to a eigenvalue of 2.007 ( Table 5). The variance of residual explained by the Rasch model, and the Eigenvalue of the first contrast were slightly lower than the recommended 40% and 2.0. However, the percentage of variance attributed to the first residual factor compared to the Rasch factor was only 19.4%, which was within the recommended limits. We compared this result with two sets of simulation data, generated with perfect fit to the Rasch model. The simulation dataset showed similar results (Table 5). We further plotted the person estimates of two subsets of positive and negative loading items (as defined by correlated at above or below 0.3 on the first factor of the PCA of the residuals), which indicated no likelihood of a second dimension (Fig. 2). These findings suggest that when the Rasch factor was removed, there was no leftover patterns found in the residuals, thus confirming unidimensionality of the adapted ASQoL.

Test-retest reliability study
A separate sample of 42 AxSpA patients completed the ASQoL on two separate occasions, 2 weeks apart. Over 70% of the sample was male and the mean (SD) age was 39.1 (12.8) years. Nineteen (45.2%) patients completed the Chinese version and 23 (54.8%) the English ASQoL. There were no significant differences in patient global assessment, pain scores, and Bath Indices for these patients between two time points (all p values >0.19). The mean (SD) ASQoL scores at time 1 and time 2 were 4.07 (4.34) and 4.17 (4.44), respectively (p = 0.95). The Spearman's rank correlation coefficient between ASQoL at time 1 and time 2 was 0.81, indicating adequate reproducibility. Limiting analysis to 36 patients who had changes of BAS-G less than the reported minimal clinically difference of <15 mm between two time points, the Spearman's rank correlation coefficient of ASQoL at time 1 and 2 was 0.85. The intraclass correlation coefficient of ASQoL between time 1 and 2 was 0.86 (95%CI: 0.74-0.92).

Discussion
The ASQoL was successfully translated and validated for use with Chinese and English speakers with AxSpA in Singapore. The dual-panel translation methodology has been used successfully in the adaptation of all other language versions of needs-based QoL instruments. Research has shown that this approach produces translations that are more acceptable to patients than forwardbackward translation [19]. Patients who completed the CDIs found ASQoL to be relevant, applicable and easy to understand, and the ASQoL demonstrated excellent psychometric properties, with good internal consistency, convergent validity, known-group validity. The construct validity of the adapted ASQoL was supported by fulfilling the requirement of item fit, local independent of items and unidimensionality of the Rasch model. Testretest reliability was confirmed in a separate study. These findings are comparable to those found for the original UK measure, and address the truth and feasibility filter domains of the Outcome Measure in Rheumatology (OMERACT) consensus initiative [39]. PROMs commonly used in AxSpA, such as the Bath Indices (BASDAI, BASFI, BAS-G) provide limited information on the impact of AxSpA and its treatment on the patient. QoL has been recognised by the ASAS as an important domain [40], and one of the core domains and outcome measures of rheumatic diseases in the Core Set for Longitudinal Observational Studies in Rheumatology [41]. Furthermore, maximising QoL has been recognised as the long term goal of treating AS patients in the ASAS/European League Against Rheumatism (EULAR) recommendation for management of AS [42]. The ASQoL is a disease-specific measure, and its content was derived from qualitative interviews with patients, and as a result all the items are directly relevant to AS patients. The measure was also based on a clear theatrical construct, the needs-based model of QoL.
These are both essential requirements for QoL instruments in clinical research [43].
A limitation of the study was the small sample size available for the validation study. Particularly, a larger sample size is required for Rasch analysis. Recommendations suggest seven times the number of items of the PROM of interest, and >100 patients; and more than 200 patients per subgroup for proper DIF evaluation [44]. However, resource constraints did not allow us to recruit this number of AS patients, particular for the Chinese Speaking subgroup. Therefore, the small sample size for subgroups in our studies may have an impact particularly on the DIF analysis, and therefore may not be considered definite. It is also a limitation to validate the combined Chinese and English versions of ASQoL for a practical consideration. However the language of  administration (English or Chinese) did not affect the psychometric properties of the ASQoL, given the low level of DIF related to language, despite the small number of patients in the Chinese subgroup. The instruments for assessing patient's status were all relying on patient reported outcomes with the exception of PhGA, which may limit the comparison of ASQoL with a more objective assessment of disease activity. This is particularly true for the known-groups validity, which would be more robust if the "known groups" were established with a more clinical and objective standard. Nonetheless, except Ankylosing Spondylitis Disease Activity Score (ASDAS) that takes into account acute phase reactants in blood, other assessments of AxSpA at the current moment heavily rely on PROMs [12]. Besides, as the intended use of ASQoL is to measure the QoL in the patient's perspective [45], it is relevant to use other PROMs for a comparison. It is also known that QoL is not necessarily strongly related to clinical severity. Of note is that the Bath indices used in this study were not properly culturally adapted and validated in the Singaporean languages. However, the main comparator PROMs, SF-36 and HAQ have been properly validated in our population. We acknowledge the slightly low Spearman's coefficient (0.81) for test-retest reliability, and was improved to 0.85 when the analysis was limited to patients with more stringent criteria of "no change". The internal consistency of the adapted ASQoL falls marginally short of the recommended (>0.90) level for individual level use. However, the Rasch model analysis supported the necessary internal consistency and structure validity of the Singapore versions of ASQoL. A sample size of at least 50 is recommended for establishing test-retest reliability [44]. The actual sample available for test-retest reliability in our study (n = 42) may risk producing an inaccurate estimate of this property. Finally, given the cross-sectional study design, it was not possible to evaluate the responsiveness of the new ASQoL language versions.

Conclusion
The present study has demonstrated that the Singapore English and Chinese versions of the ASQoL are culturally relevant, valid and reliable instruments when used with combined samples of AxSpA patients who speak either Chinese or English.