First the English version of the LEFS was translated into Dutch according to a standardized procedure described by Beaton et al., and secondly it was tested for psychometric quality by use of prospective data.
Procedure of translation
The translation procedure consisted of four steps. First, two persons independently translated the English version of the LEFS into Dutch (forward translation; T1 and T2). One translator (TJH) had a medical background and was familiar with the concepts of the questionnaire; the other (VvS) was a certified translator without a medical background. Both were native speakers. In a consensus meeting, the two translations were merged into one version (T-12). Second, two bilingual persons (T3 and T4) translated the T-12 questionnaire back into English (BT1 and BT2) to check the consistency of the translation. Both back-translators (PA and DKJ) were unfamiliar with the original questionnaire and its concepts, and had no medical background. DKJ is also a certified translator. Third, an expert meeting was organised in which all translators, two health professionals (CKS, ML), a methodologist (CHMvE) and two language experts participated. During this meeting all versions of the questionnaire (T1, T2, T-12, BT1, BT2) were compared and consensus on semantic, idiomatic, experiential and conceptual equivalence was reached, resulting in a pre-final version of the questionnaire. The developers of the original questionnaire approved all previous steps and the final version. Finally, the pre-final version was presented to a group of 33 patients (20 women and 13 men; mean (SD) age: 63 (13) years) to explore the clarity of the questionnaire. All patients were asked whether they understood the items and whether they could interpret the questionnaire correctly. The time needed to complete the questionnaire was also recorded. The findings were discussed among the translators and led to only minor changes in the final Dutch version of the LEFS. Mean completion time was 3.5 (SD = 1.5) minutes. For the final version of the Dutch LEFS see Appendix 1.
Patients and procedure
Individuals (≥18 years) diagnosed with hip or knee osteoarthritis by an orthopaedic surgeon in the Sint Maartenskliniek hospital Nijmegen were eligible (inclusion period June to October 2009). People reporting concurrent rheumatoid arthritis, fibromyalgia or psoriatic arthritis were excluded. Written materials were sent by mail: an information letter, an informed consent form, the questionnaires and a return envelope. At baseline, all patients completed four questionnaires: the LEFS, the HOOS or KOOS (depending on index joint), the SF-36 and the Hospital Anxiety and Depression Scale (HADS). To ensure a high response rate, a reminder was sent to those patients who did not respond within three weeks. One hundred and twenty participants were sent a follow-up questionnaire to evaluate test-retest reliability (within 3 weeks) and another 120 participants were sent a follow-up questionnaire to evaluate responsiveness (after 3 months), as 100 participants were deemed sufficient. The 240 patients were allocated to either the reliability or the responsiveness study by use of random numbers. Both follow-up mailings consisted of three questionnaires (LEFS, HOOS or KOOS, and the SF-36) and a global perceived effect question. For test-retest reliability, we considered a time interval of 3 weeks to be appropriate for the current population. For responsiveness, we deemed a period of up to 3 months long enough to allow for improvement and brief enough to minimize the risk of a response shift [18, 19].
The study was approved by the Institutional Review Board of the University Medical Centre Nijmegen (ID: 2009/20).
The LEFS is a 20-item condition-specific questionnaire designed to be applicable to individuals with musculoskeletal conditions of the lower extremity. Each item of the LEFS is scored on a 5-point scale ranging from 0 to 4 points. When scoring the LEFS, up to 4 missing item responses are permitted; for more detailed information see Stratford et al. (2005). Accordingly, LEFS scores range from 0 to 80 points, with higher scores representing higher levels of functioning.
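As an illustration of the scoring rule described above, the following minimal sketch sums 20 item scores (0 to 4 each) and rejects questionnaires with more than 4 missing responses. The function name `lefs_total` is hypothetical, and the way missing items are filled in (prorating from the answered items) is an assumption made only for this sketch; the text refers to Stratford et al. (2005) for the actual procedure.

```python
from typing import Optional, Sequence


def lefs_total(items: Sequence[Optional[int]], max_missing: int = 4) -> Optional[float]:
    """Return a LEFS total (0-80), or None if too many items are missing.

    `items` holds 20 item scores (0-4); a missing response is None.
    """
    if len(items) != 20:
        raise ValueError("the LEFS has exactly 20 items")
    answered = [x for x in items if x is not None]
    if any(not 0 <= x <= 4 for x in answered):
        raise ValueError("item scores must lie between 0 and 4")
    if 20 - len(answered) > max_missing:
        return None  # more than 4 missing responses: no total score
    # ASSUMPTION (not from the source): prorate the sum of the answered
    # items to the full 20-item scale.
    return sum(answered) * 20 / len(answered)
```

With complete data the function reduces to a plain sum of the 20 items, so the prorating assumption only matters when 1 to 4 responses are missing.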
The HOOS and the KOOS include five subscales: Pain, other Symptoms, Function in Daily living (ADL), Function in Sport and Recreation (Sport/Rec), and hip/knee-related quality of life (QoL). Standardized response options are given (5-point Likert scale) and each question is scored from 0 to 4 points. Subsequently, a normalized score (100 indicating no symptoms and 0 indicating extreme symptoms) is calculated for each subscale. The Dutch HOOS and KOOS have good internal consistency, construct validity, no floor and ceiling effects and have been found to be reliable [10, 11]. Both the HOOS and KOOS questionnaires include the WOMAC osteoarthritis-index in its complete and original format (with permission, http://www.koos.nu).
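The normalization of a subscale can be sketched as follows. The sketch assumes the commonly used transformation 100 minus the mean item score multiplied by 25, which maps a mean item score of 0 to 100 (no symptoms) and a mean of 4 to 0 (extreme symptoms); the function name `normalized_subscale` is hypothetical and missing-item rules are not handled.

```python
from statistics import mean
from typing import Sequence


def normalized_subscale(item_scores: Sequence[int]) -> float:
    """Convert 0-4 item scores of one subscale to a 0-100 score.

    100 indicates no symptoms, 0 indicates extreme symptoms.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    if any(not 0 <= x <= 4 for x in item_scores):
        raise ValueError("item scores must lie between 0 and 4")
    # Mean item score (0-4) rescaled to 0-100 and reversed.
    return 100 - mean(item_scores) * 100 / 4
```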
The SF-36 is a generic health status questionnaire which contains 36 items. It measures eight major attributes (bodily pain; physical function; social function; role limitations because of physical problems; role limitations because of emotional problems; mental health; vitality; general health perceptions). It is widely used, reliable, validated in Dutch and easy to complete. Higher scores indicate better health.
The Hospital Anxiety and Depression Scale (HADS) is a 14-item scale designed to detect anxiety and depression, independent of somatic symptoms. It consists of two 7-item subscales measuring depression and anxiety on a 4-point response scale (from 0, no symptoms, to 3, maximum symptoms), with possible scores for each subscale ranging from 0 to 21. The HADS is a valid and reliable screening instrument for detecting mood disorder in people with osteoarthritis [24, 25]. Higher scores indicate higher levels of disorder.
Fatigue was measured with the 8-item “Subjective Fatigue” subscale of the Checklist Individual Strength (CIS). Each question is answered on a 7-point scale, ranging from the statement ‘totally right’ to the statement ‘totally wrong’. Each question is scored from 1 to 7 points, giving a total score range of 8-56 points. The CIS is a sensitive instrument with good discriminating power and reliability.
The external criterion for distinguishing between improved and unimproved subjects was a 7-point global perceived effect (GPE) scale. The categories of improvement included the following: completely recovered, much improved, slightly improved, not changed, slightly worse, much worse, and vastly worsened.
Descriptive statistics were used to describe the study population and the number of missing values. Data symmetry was assessed by visual inspection of histograms of the data distribution. Psychometric qualities of the LEFS were expressed by floor and ceiling effects, internal consistency, test-retest reliability, minimal detectable change, construct validity, discriminant validity and responsiveness.
Floor and ceiling effects
Floor and ceiling effects were determined by calculating the number of individuals that obtained the lowest (0) or highest (80) possible score, and were considered present if more than 15% of the participants achieved the lowest or highest score.
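The 15% criterion can be written down directly. This is a minimal sketch; the function name `floor_ceiling_effects` and the returned dictionary keys are illustrative choices, not part of the original analysis.

```python
from typing import Dict, Sequence, Union


def floor_ceiling_effects(
    scores: Sequence[float],
    lowest: float = 0,
    highest: float = 80,
    threshold: float = 0.15,
) -> Dict[str, Union[float, bool]]:
    """Proportion of participants at the lowest/highest possible score.

    An effect is flagged when the proportion exceeds `threshold` (15%).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no scores given")
    floor = sum(s == lowest for s in scores) / n
    ceiling = sum(s == highest for s in scores) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_effect": floor > threshold,
        "ceiling_effect": ceiling > threshold,
    }
```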
Internal consistency and dimensionality
Internal consistency, an indicator of the homogeneity of a questionnaire, was assessed with Cronbach’s alpha and 95% confidence intervals (95% CIs). Internal consistency is considered good when Cronbach’s alpha lies between 0.7 and 0.9. Dimensionality was assessed by performing principal component factor analysis, with suppression of absolute loading coefficients below 0.40, on the LEFS, KOOS-PF and HOOS-PF to determine whether the individual items loaded on a single factor. Factor extraction had three requirements: scree plot point of inflection at the second Eigenvalue, Eigenvalue cut-off >1.0, and ≥10% variance.
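For reference, Cronbach’s alpha for k items equals k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below computes the point estimate only; the confidence interval and the factor analysis are not covered, and the function name `cronbach_alpha` and the row-per-respondent data layout are assumptions of this example.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("all respondents must have the same number of items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```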
Reliability and minimal detectable change
Reliability concerns the degree to which the results of measurement are consistent across repeated measurements. Test-retest reliability of the Dutch LEFS was determined by means of Intraclass Correlation Coefficients (ICCs) (two-way random effects model, absolute agreement) and Bland and Altman plots. The ICC(2,1) equals the variance between patients divided by the sum of the variance between patients, the variance between measurements and the error variance. The value of the ICC ranges from 0 to 1, where 1 represents perfect reliability of the measurement. Subsequently, to quantify the measurement error of the LEFS scores, we determined the standard error of measurement (SEM = SD × √(1 − ICC)). The SEM is a representation of measurement error expressed in the same units as the original measurement. We quantified the minimal detectable change at the 90% and 95% confidence level (MDC90 and MDC95) by multiplying the point estimate of the SEM by the square root of 2 (to account for the error associated with repeated measurements) and by the z score of 1.65 or 1.96 (for the 90% or 95% confidence level, respectively): MDC90 = SEM × 1.65 × √2 and MDC95 = SEM × 1.96 × √2.
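The chain from ICC to SEM to MDC can be sketched as below. The ICC(2,1) is computed with the standard two-way ANOVA mean-squares expression for a two-way random effects, absolute agreement, single-measurement model. The function names are hypothetical, and which SD enters the SEM (for example the baseline SD or a pooled SD) is not specified in the text, so the SD is left as an input.

```python
import math
from statistics import mean
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): rows are patients, columns are repeated measurements."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-patient mean square
    msc = ss_cols / (k - 1)               # between-measurement mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem_from_icc(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC); guards against tiny negative rounding."""
    return sd * math.sqrt(max(0.0, 1.0 - icc))


def mdc(sem: float, z: float) -> float:
    """MDC = SEM * z * sqrt(2); z = 1.65 for MDC90, 1.96 for MDC95."""
    return sem * z * math.sqrt(2)
```

A typical call would be `mdc(sem_from_icc(sd, icc_2_1(test_retest_scores)), 1.96)`, where `test_retest_scores` holds one `[test, retest]` pair per patient.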
Construct validity reflects the extent to which a particular measure consistently relates to other measures with theoretically derived hypotheses for the constructs that are being measured. To evaluate the construct validity of the LEFS, we formulated a set of 16 hypotheses (eight for knee osteoarthritis and eight for hip osteoarthritis) about the expected magnitude and direction of relationships between the LEFS and other instruments. If 75% or more of the arbitrarily set number of 16 hypotheses were confirmed, we considered the construct validity of the LEFS to be good [32, 33].
Discriminant validity was examined for the LEFS and the physical function subscale of the HOOS and KOOS, by contrasting the correlation of each instrument with the PF subscale of the SF-36 with its correlation with the bodily pain subscale of the SF-36. Meng et al.’s test for dependent data was used to evaluate the differences between those correlations.
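A comparison of two dependent correlations can be sketched as follows, using the z test of Meng, Rosenthal and Rubin as it is commonly stated: both correlations are Fisher-transformed and their difference is scaled by a term that depends on the correlation between the two comparison variables. The sketch is given under the assumption that this is the variant used here; the function name `meng_z_test` and its argument names are hypothetical. In the setting above, `r1` and `r2` would be the correlations of one instrument with SF-36 PF and SF-36 bodily pain, and `r_x` the correlation between those two SF-36 subscales.

```python
import math
from typing import Tuple


def meng_z_test(r1: float, r2: float, r_x: float, n: int) -> Tuple[float, float]:
    """Return (z, two-sided p) for H0: the two dependent correlations are equal.

    r1, r2: correlations of the same variable with two other variables.
    r_x:    correlation between those two other variables.
    n:      number of subjects.
    """
    if n <= 3:
        raise ValueError("n must be larger than 3")
    mean_r2 = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r_x) / (2 * (1 - mean_r2)), 1.0)  # f is capped at 1
    h = (1 - f * mean_r2) / (1 - mean_r2)
    # Difference of Fisher z-transformed correlations, scaled.
    z = (math.atanh(r1) - math.atanh(r2)) * math.sqrt((n - 3) / (2 * (1 - r_x) * h))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal distribution
    return z, p
```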
We studied the responsiveness of the LEFS and the WOMAC-PF (extracted from the HOOS-PF and KOOS-PF) in a combined hip and knee group, because only a very small number of patients reported clinically important change, which did not allow the responsiveness of the HOOS and KOOS to be studied separately. A variety of responsiveness statistics is available. Because it is not yet known which of these statistics is best for assessing responsiveness, we used three different analyses. First, we determined the Responsiveness Ratio of Guyatt (GRI: average change of recovered patients (GPE = 1-2) / SD of the change of stable patients (GPE = 3-5)). If the responsiveness ratio is larger than 1, the mean change score in clinically improved patients exceeds the measurement error and the instrument may be considered responsive, to an extent that is proportional to the magnitude of the responsiveness ratio [36, 37]. Second, we determined the Standardized Response Mean (SRM: average score change / SD of score change). Differences in SRM were tested statistically by use of the modified Jackknife test. Third, we calculated receiver operating characteristic (ROC) curves for the improved subjects and for the worsened subjects, using the change scores of the questionnaires and the patients’ ratings of change. The patients’ rating of change was dichotomized to identify those subjects who experienced a clinically meaningful change in symptoms. Important change was defined as ‘Much Improvement (GPE = 1-2)’ or ‘Much Decline (GPE = 6-7)’. Subsequently, we computed the area under the curve (AUC). An AUC of 1.0 indicates perfect discrimination, whereas an AUC of 0.50 indicates performance no better than chance.
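The three responsiveness statistics can be illustrated with a short sketch. The AUC is computed here through its rank interpretation (the probability that a randomly chosen improved patient has a larger change score than a randomly chosen non-improved patient, with ties counted as one half), which is equivalent to the area under the empirical ROC curve. The function names are hypothetical, and the jackknife comparison of SRMs is not included.

```python
from statistics import mean, stdev
from typing import Sequence


def guyatt_ratio(change_improved: Sequence[float], change_stable: Sequence[float]) -> float:
    """Mean change of improved patients / SD of change in stable patients."""
    return mean(change_improved) / stdev(change_stable)


def standardized_response_mean(change: Sequence[float]) -> float:
    """Mean change score / SD of the change scores."""
    return mean(change) / stdev(change)


def auc(change_improved: Sequence[float], change_not_improved: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of change scores."""
    wins = 0.0
    for a in change_improved:
        for b in change_not_improved:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties count as half
    return wins / (len(change_improved) * len(change_not_improved))
```

For the worsened subjects the same `auc` function could be applied after reversing the sign of the change scores, so that a larger value again indicates the change of interest.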