This study was designed and performed in accordance with the COSMIN checklist [11] and guidelines for PROMs [12].
Design
We used a cross-sectional design, including a test-retest assessment.
Translation and cross-cultural adaptation
The translation and cross-cultural adaptation followed international guidelines [9, 10]. Two translators (one philologist and one clinician), both native Norwegian speakers, independently translated the 26 items into Norwegian and synthesized their translations into one Norwegian version, which was then translated back into English. Two translators who were native English speakers and blinded to the original PWQ items independently performed the back-translation and synthesized their two versions into one English version. An expert committee consisting of the translators and two researchers from the research group (MG, RMK) reviewed the translations and agreed on a prefinal version. Ten patients with musculoskeletal disorders reviewed the prefinal Norwegian version and confirmed that the items and response options were relevant and understandable, without proposing any alterations. Since the prefinal version was acceptable and easy to comprehend, no changes were made for the final version.
Participants
Participants were recruited from an outpatient rehabilitation clinic in Akershus, Norway, between November 2015 and January 2018. Eligible participants were patients with different types of musculoskeletal disorders, aged 18 or above, working or on sick leave, who had been referred for specialist assessment and rehabilitation at the clinic. Patients unable to speak, read or write Norwegian were excluded. Patients were enrolled by clinicians, primarily physiotherapists, during consultations at the clinic. At baseline, all patients received written and oral information about the study and provided signed informed consent.
Following the quality criteria recommended by Terwee et al. [12] and Kline [13], we planned to recruit a minimum of 100 patients. These criteria suggest a minimum of 100 participants for assessing internal consistency, at least 50 participants for assessing reliability and floor or ceiling effects [12], and at least 4–10 participants for each item included in factor analysis [13].
Procedures and measurements
At baseline, patients completed the PWQ as part of a comprehensive questionnaire which also included sociodemographic variables, pain localization, intensity and history, psychosocial work environment, productivity costs and health-related quality of life.
The McGill pain drawing was used to measure pain localization during the last week [14]. The Numeric Rating Scale (NRS) (range 0–10, a higher score indicates more severe pain) was used to measure average pain intensity in the last week [15]. The General Nordic Questionnaire for psychological and social factors at work (QPSnordic) was used to measure characteristics of the psychosocial work environment [16]. The iMTA Productivity Cost Questionnaire (iPCQ) was used to measure work status (occupation, paid job, working days/hours a week, sick leave and rehabilitation/work disability) and productivity costs [17]. The Short Form 36 Health Status Questionnaire (SF-36) (range 0–100, a higher score indicates better health-related quality of life) was used to measure health-related quality of life. In addition, the Mechanical Exposure Index (MEI) (range 0–24, a higher score indicates higher physical workload) was used to measure physical workload [18].
Patients consenting to participate in the test-retest part of the study filled out the PWQ and a global question recording change in work status at a second meeting, preferably within 1 week. Patients reporting “unchanged” work status were considered stable and included in the test-retest reliability analysis.
The physical workload questionnaire
The PWQ is a self-report questionnaire for assessing physical workload [8]. The questionnaire consists of 26 items assessing force, dynamic and static load, repetitive load, (uncomfortable) postures, sitting, standing, and walking. The only previous study assessed dimensionality, internal consistency, and construct validity among patients with upper and lower extremity musculoskeletal disorders in the Netherlands. In that study, factor analysis revealed two subscales: twelve items loaded on the first subscale, “Heavy physical work”, and six items on the second subscale, “Long-lasting postures and repetitive movements” [8]. The remaining eight items were excluded because of low loadings or similar loadings on both subscales. Each item is scored on a 4-point Likert scale with the response options “seldom or never” (0), “sometimes” (1), “often” (2), and “(almost) always” (3). The responses to the items of a subscale are summed to produce a raw score. The final score is calculated by dividing the raw score by the maximum possible score on the subscale and multiplying by 100, giving a final score between 0 (no workload) and 100 (highest workload) for each subscale [8]. The Norwegian version of the 26 items on the PWQ is shown in Additional file 1.
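The scoring rule can be illustrated with a minimal sketch. The function name and the example answers are hypothetical, and the sketch assumes that every item on the subscale has been answered, since the handling of missing responses is not described here.

```python
# Response options and their numeric codes, as described for the PWQ.
RESPONSE_CODES = {
    "seldom or never": 0,
    "sometimes": 1,
    "often": 2,
    "(almost) always": 3,
}
MAX_ITEM_SCORE = 3


def subscale_score(item_scores):
    """Return the 0-100 subscale score from a list of item codes (0-3).

    Assumption: all items on the subscale were answered.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("item scores must be integers from 0 to 3")
    raw = sum(item_scores)                         # raw score
    maximum = MAX_ITEM_SCORE * len(item_scores)    # maximum possible score
    return raw / maximum * 100


# Hypothetical answers to a six-item subscale.
answers = ["sometimes", "often", "seldom or never",
           "(almost) always", "sometimes", "often"]
example = subscale_score([RESPONSE_CODES[a] for a in answers])
```

In this made-up case the raw score is 9 out of a possible 18, so the final subscale score is 50.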
Analysis
All data analyses were performed using SPSS version 26 (IBM Corporation, Armonk, NY, USA). Structural validity was explored using Exploratory Factor Analysis (EFA) of the same 26 items that formed the basis of the study of Bot et al. [8]. The suitability of the data for factor analysis was confirmed using the Kaiser-Meyer-Olkin measure of sampling adequacy (values above 0.6 considered acceptable), a significant Bartlett’s Test of Sphericity and inspection of the correlation matrix (correlation coefficients of 0.3 and above preferable) [19]. Principal Component Analysis (PCA) was used to extract the factors. The number of factors to be retained was guided by three decision rules: Kaiser’s criterion (retention of factors with eigenvalues above 1), Cattell’s scree plot [20], and Horn’s parallel analysis [21]. To aid in the interpretation of the retained factors, we computed factor loadings after direct oblimin rotation, an oblique rotation that allows factors to correlate [19]. The next step involved interpreting the rotated solution by identifying which items loaded on each retained factor. Items with factor loadings below 0.5 [22] and communality values below 0.3 [23] were excluded. Items that cross-loaded were retained in the factor on which they loaded most strongly.
Hypothesis testing was based on 14 a priori hypotheses: “known” group validity (eight), convergent validity (two) and discriminant validity (four). The “known” group hypotheses are identical to those in the original study. They were tested with the same procedure as in the study of Bot et al., where it was hypothesised that physical workload would vary among different occupational groups [8]. As in the original study, the occupations of all included patients were classified into four groups based on expected physical load, and the subscale scores of the occupational groups were compared.
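The item retention rules can be sketched as follows. This is an illustration only and not the SPSS procedure itself: the function name, the item names and all loadings are made up, and the sketch assumes that an item is excluded when either cut-off is missed.

```python
LOADING_CUTOFF = 0.5
COMMUNALITY_CUTOFF = 0.3


def assign_items(loadings, communalities):
    """Map each retained item to the factor on which it loads most strongly.

    loadings: {item: [loading on factor 1, loading on factor 2, ...]}
    communalities: {item: communality}
    Returns (assignment, excluded), where assignment is {item: factor index}.
    """
    assignment, excluded = {}, []
    for item, row in loadings.items():
        # Index of the factor with the largest absolute loading.
        strongest = max(range(len(row)), key=lambda j: abs(row[j]))
        too_weak = abs(row[strongest]) < LOADING_CUTOFF
        poorly_explained = communalities[item] < COMMUNALITY_CUTOFF
        if too_weak or poorly_explained:  # assumption: either rule excludes
            excluded.append(item)
        else:
            assignment[item] = strongest
    return assignment, excluded


# Made-up rotated loadings for four hypothetical items on two factors.
loadings = {
    "item_a": [0.78, 0.10],
    "item_b": [0.15, 0.66],
    "item_c": [0.41, 0.38],  # no loading reaches 0.5
    "item_d": [0.55, 0.52],  # cross-loads; kept on the stronger factor
}
communalities = {"item_a": 0.62, "item_b": 0.47, "item_c": 0.33, "item_d": 0.58}
assignment, excluded = assign_items(loadings, communalities)
```

Here `item_c` is excluded and the cross-loading `item_d` is assigned to the first factor (index 0).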
- Group 1: no physical load (for example teacher, manager)
- Group 2: heavy physical load (for example nurse, childcare worker)
- Group 3: long-lasting postures and repetitive movements (for example cashier, civil servant, engineer)
- Group 4: both heavy physical load and long-lasting postures and repetitive movements (for example electrician, farmer, mechanic)
Two investigators (LGK, ØNV) made the classifications independently, based on available occupation descriptions [24, 25]. Disagreements were resolved in a consensus meeting with a third investigator (RMK). Three occupations could not be classified (police, shop assistant and service employee) due to considerable physical workload variability within the occupations, and patients with these occupations were therefore excluded from the hypothesis analyses.
To assess convergent validity, both subscales were validated against the MEI [18]. The MEI includes questions similar to those of the PWQ, especially regarding heavy physical workload. We therefore expected a high correlation between the MEI and the “Heavy physical work” subscale and a moderate to high correlation between the MEI and the “Long-lasting postures and repetitive movements” subscale. To assess discriminant validity of the PWQ subscales, we formulated hypotheses regarding two dimensions from the SF-36: “physical function” and “general health” [26]. These dimensions measure different constructs from the PWQ. We therefore expected low correlations between both PWQ subscales and the SF-36 dimensions. If > 75% of the predefined hypotheses were confirmed, construct validity was considered acceptable [12]. Mann-Whitney U tests and Wilcoxon signed-rank tests were used in “known” group analyses. Spearman’s rho was used in all correlation analyses (convergent and discriminant validity) because the scale scores were not normally distributed. Correlation coefficients under 0.3, between 0.3 and 0.6 and over 0.6 were considered low, moderate and high, respectively [27]. The hypotheses are listed in Table 3.
The internal consistency of the subscales was examined using Cronbach’s alpha. A Cronbach’s alpha between 0.70 and 0.95 was rated as positive [12]. Item-total correlations were examined, and items with values below 0.3 were excluded [28].
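The two decision rules in this paragraph can be sketched in a few lines. The function names are hypothetical. The sketch assumes that the sign of a coefficient is ignored and that the boundary values 0.3 and 0.6 count as moderate, neither of which is stated explicitly.

```python
def classify_correlation(rho):
    """Label a correlation as low (<0.3), moderate (0.3-0.6) or high (>0.6).

    Assumptions: the sign is ignored and both boundary values are moderate.
    """
    size = abs(rho)
    if size < 0.3:
        return "low"
    if size <= 0.6:
        return "moderate"
    return "high"


def construct_validity_acceptable(confirmed, total):
    """True if more than 75% of the predefined hypotheses were confirmed."""
    return confirmed / total > 0.75


# Made-up coefficients, one per category.
labels = [classify_correlation(r) for r in (0.12, -0.45, 0.71)]

# With 14 hypotheses, 11 confirmed is about 78.6% and 10 is about 71.4%.
enough = construct_validity_acceptable(11, 14)
not_enough = construct_validity_acceptable(10, 14)
```

With 14 predefined hypotheses, the > 75% rule therefore requires at least 11 confirmed hypotheses.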
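As an illustration, Cronbach’s alpha can be computed from the item variances and the variance of the total score. The sketch below uses the standard formula with sample variances. The function names and the data are made up, complete responses are assumed, and the rating bounds are treated as inclusive.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for complete data.

    responses: list of respondents, each a list of item scores of equal length.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(n_items)]
    # Sample variance of the summed score.
    total_variance = variance([sum(r) for r in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def positive_rating(alpha):
    """Positive rating if alpha lies between 0.70 and 0.95 (assumed inclusive)."""
    return 0.70 <= alpha <= 0.95


# Made-up scores (0-3) from five hypothetical respondents on four items.
data = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 1, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 2],
]
alpha = cronbach_alpha(data)
rated_positive = positive_rating(alpha)
```

The upper bound of 0.95 matters because very high values may indicate redundant items and not only strong consistency.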
For test-retest assessment, a paired t-test was used to assess the mean difference between test and re-test. An intraclass correlation coefficient ($\mathrm{ICC}_{2,1}$) was used to assess relative reliability. The acceptable level of ICC was set to ≥0.70 [12]. Absolute reliability (measurement error) was evaluated by standard error of measurement (SEM) and smallest detectable change (SDC). $\mathrm{ICC}_{2,1}$ and $\mathrm{SEM}_{\mathrm{agreement}}$ were used to account for the systematic difference between test and re-test [28]. SEM was estimated from the SPSS VARCOMP analysis: $\mathrm{SEM}_{\mathrm{agreement}} = \sqrt{\sigma^2_{o} + \sigma^2_{po,e}}$, where $\sigma^2_{o}$ is the variance due to systematic error between observations and $\sigma^2_{po,e}$ is the random error. Based on this, the SDC was estimated using the formula $\mathrm{SDC}_{95\%\,\mathrm{ind}} = 1.96 \times \sqrt{2} \times \mathrm{SEM}_{\mathrm{agreement}}$ [28].
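The reliability statistics can be sketched for two measurement occasions as follows. This is not the SPSS VARCOMP routine: the sketch assumes that the variance components are estimated from a two-way ANOVA by the method of moments, and it truncates negative estimates at zero. The function name and the data are made up.

```python
from math import sqrt


def agreement_statistics(first, second):
    """ICC(2,1), SEM_agreement and SDC_95 for two measurement occasions.

    Assumption: variance components come from a two-way ANOVA (method of
    moments); negative estimates are truncated at zero.
    """
    n, k = len(first), 2
    if n < 2 or len(second) != n:
        raise ValueError("need two equally long lists with at least two patients")
    grand = (sum(first) + sum(second)) / (n * k)
    patient_means = [(a + b) / k for a, b in zip(first, second)]
    occasion_means = [sum(first) / n, sum(second) / n]

    # Sums of squares for patients, occasions and the residual.
    ss_patients = k * sum((m - grand) ** 2 for m in patient_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in occasion_means)
    ss_total = sum((x - grand) ** 2 for x in list(first) + list(second))
    ss_error = ss_total - ss_patients - ss_occasions

    ms_patients = ss_patients / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    var_patients = max(0.0, (ms_patients - ms_error) / k)
    var_occasions = max(0.0, (ms_occasions - ms_error) / n)  # systematic difference
    var_error = max(0.0, ms_error)                           # random error

    sem = sqrt(var_occasions + var_error)
    icc = var_patients / (var_patients + var_occasions + var_error)
    sdc = 1.96 * sqrt(2) * sem
    return {"icc": icc, "sem": sem, "sdc": sdc}


# Made-up subscale scores: every patient scores 2 points higher at re-test.
result = agreement_statistics([10, 20, 30, 40], [12, 22, 32, 42])
```

In this made-up example there is no random error, so the SEM reflects only the systematic shift between occasions, which is why the agreement form of the SEM is larger than one based on random error alone.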
Proportions of missing data and floor and/or ceiling effects were described. Floor or ceiling effects were considered to be present if more than 15% of patients reported either the lowest or the highest possible score [12].
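The 15% rule can be sketched as follows. The function name and the scores are made up, and the default range of 0 to 100 assumes that the rule is applied to the final subscale scores.

```python
def floor_ceiling_effects(scores, lowest=0, highest=100, threshold=0.15):
    """Return the share of patients at each extreme and whether it exceeds 15%."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Made-up subscale scores for ten hypothetical patients.
scores = [0, 0, 8.3, 25, 41.7, 50, 58.3, 75, 91.7, 100]
effects = floor_ceiling_effects(scores)
```

In this example two of ten patients have the lowest possible score, so a floor effect is flagged, while the single patient at the maximum does not constitute a ceiling effect.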