Study design
We assessed the reproducibility of the findings of our initial cohort study (recruited in 2016) by following two subsequent patient cohorts (recruited in 2017 and 2018) that completed the same diagnostic before–after study as the initial cohort [14]. We did not use a control group because our original study also applied a before–after design. In brief, patients rated their functional ability before and after FCE. The FCE was performed on two consecutive days; this is the standard procedure and is intended to avoid overloading patients. Assessing self-reported functional ability twice, once before and once immediately after the FCE, is also part of the standardized FCE protocol. The time between pretest and posttest assessment was kept short to minimize the risk of interfering factors influencing patients’ perception of their functional ability, e.g. improvement due to the natural course of musculoskeletal trauma-related disorders or effects of an intervention. For consistency, the rehabilitation unit, testing therapists, medical staff, FCE protocol and primary outcome were kept the same while repeating the study in two different populations. The two subsequent studies were performed on cohorts of patients who had experienced trauma and had been referred to FCE due to uncertainty about whether they could return to work. The original sample size was determined by the inclusion of all eligible patients, and we assumed that comparably sized samples could be recruited in the following two years. The original study and the recruitment of the following two cohorts were reviewed and approved by the Ethics Committee of the Provincial Government of Lower Austria (GS1-EK-4/502–2017). We used the STROBE checklist when preparing the manuscript to ensure transparent and complete reporting of our study design and findings [18].
Setting
All patients were treated in the same 200-bed in-patient trauma rehabilitation center in Lower Austria, Austria. In 2016, approximately 1200 patients were treated in our unit. (For further details refer to https://www.auva.at/cdscontent/?contentid=10007.670948).
Participants
In the original study [14], patients were assigned to our inpatient rehabilitation center located in the eastern part of Austria following a non-work or work accident (monotrauma or polytrauma, burns, amputation, and spinal cord injury). While the rehabilitation program is a multi-professional one, work-related functional capacity training and other work-related treatment components are not routinely used in these programs. In 2016, patients were referred to FCE if, at the end of the inpatient rehabilitation program, the rehabilitation team was uncertain as to whether the patient was able to perform the work demands of their previous job. If the team considered a return to work likely, patients were not referred for FCE. Patients who were referred for FCE were eligible for the study. A rehabilitation physician checked the inclusion and exclusion criteria, including medical stability status, ambulation without walking aids, and the ability to read and understand German. Patients in the following years were eligible for study inclusion if they fulfilled the same inclusion and exclusion criteria as in 2016.
Intervention
The WorkWell Systems FCE was developed by Susan Isernhagen in the 1990s as a systematic method to observe a subject’s ability to perform work-related tasks [8, 19, 20]. The complete test battery consists of 29 items in 5 performance categories (weight handling and strength, posture and mobility, locomotion, balance, and hand coordination). For the 6 weight handling tests, the tasks must be repeatedly performed while the load is gradually increased to the level of maximal safe performance. Other tests use norms (for example, grip strength, walking speed, hand coordination), while posture and mobility tests use a time ceiling (for example, working overhead for 5 min) or qualitative descriptors such as movement patterns, base of support, posture, and order of muscle recruitment (for example, when pushing a weighted cart over a distance of 9 m) to describe the respective functional capacity and to terminate a test.
The FCE was performed on two consecutive half-days, with a therapy-free afternoon between the two test days.
The FCE was administered by either a physiotherapist or an occupational therapist experienced in the FCE procedure. All therapists had performed at least 10 FCEs per year during the last 5 years. The final report was then confirmed by one of two rehabilitation physicians with 5–10 years of experience in performing FCE. Both physicians had performed approximately 50 to 80 FCEs per year during the last 5 years. All therapists and rehabilitation physicians were trained and certified to perform the WorkWell Systems FCE.
Measures
Self-reported functional ability
Our primary outcome was the Spinal Function Sort (SFS). The SFS is a picture-based questionnaire comprising 50 items that assess the patient’s ability to perform various work tasks and instrumental activities of daily living (for example, picking up a small tool, lifting a 10 kg tool box, or climbing a ladder) [16, 21, 22]. The SFS was developed in the 1980s to assess self-reported functional capacity in patients with low back pain. However, the SFS has also been reported to be valid and reliable in patients with other musculoskeletal problems, including whiplash injuries and trauma [22, 23]. Moreover, the SFS was shown to be a predictor of return to work [16]. Items are rated on a 5-point scale from “unable” to “able”. A total score ranging from 0 to 200 points was calculated, with higher scores indicating better perceived functional capacity. The SFS was completed by the patient before testing and a second time after the FCE was finalized on the second day. This immediate assessment was used to reduce the effect of other treatments.
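As an illustration of the scoring described above, the following minimal Python sketch sums 50 item ratings into a total score and computes a pretest to posttest change. The coding of each item as an integer from 0 (“unable”) to 4 (“able”) is an assumption inferred from the 50 items and the 0 to 200 range; the function name `sfs_total` and the example ratings are hypothetical, and the sketch does not reproduce the official SFS scoring rules (for example, the handling of missing items).

```python
def sfs_total(item_scores):
    """Sum 50 item ratings into a total score.

    Assumption: each item is coded as an integer from 0 ("unable")
    to 4 ("able"), which yields the 0 to 200 range.
    """
    if len(item_scores) != 50:
        raise ValueError("expected 50 item ratings")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("each rating must be an integer from 0 to 4")
    return sum(item_scores)


# Hypothetical ratings for one patient before and after the FCE.
pretest = [2] * 50
posttest = [3] * 25 + [2] * 25

# Positive values mean a higher self-rating after the FCE.
change = sfs_total(posttest) - sfs_total(pretest)
print(sfs_total(pretest), sfs_total(posttest), change)
```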
Other variables
We additionally assessed age, sex and time between injury and start of the rehabilitation program to describe the samples. All data generated or analyzed during this study are included in this published article and its supplementary information files.
Statistics
Patient cohorts were described with absolute frequencies, means, medians and 95% confidence intervals (CIs) plus corresponding graphs (box and whisker plots, density plots, scatterplots).
Our statistical approach in the present replication study did not aim to obtain non-significant p-values. We mainly compared the estimates of the first cohort and their associated 95% confidence intervals with the estimates for the two subsequent cohorts. In more detail, reproducibility of the findings of the first cohort in the two succeeding cohorts was analyzed in a number of ways. For rank correlation analysis based on scatterplots including linear regression lines, we calculated Spearman’s rs. Reproducibility was considered achieved when Spearman’s rs of every cohort was included in the 95% CIs of the other cohorts, as described by Zou [24]. For comparison of regression estimates, we calculated regression slopes and considered the findings reproducible if the slope of every cohort was included in the 95% CIs of the other cohorts. For graphic analysis of reproducibility, we used Bland–Altman scatterplots with corresponding data ellipse density plots. Reproducibility was considered achieved when the confidence intervals for the 95% limits of agreement of the linear regression lines overlapped.
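The inclusion criterion for correlation coefficients can be sketched as follows. This is not the analysis code of the study, which was run in R, and it does not implement the procedure of Zou [24]: as a simpler stand-in, the 95% CI for Spearman’s rs is approximated with the Fisher z-transformation. All function names and the example data are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_ci(x, y, z_crit=1.959964):
    """Spearman's rs with an approximate 95% CI (Fisher z, assumption).

    Requires n > 3 and |rs| < 1, otherwise the transformation fails.
    """
    rs = pearson(ranks(x), ranks(y))
    z = math.atanh(rs)
    se = 1 / math.sqrt(len(x) - 3)
    return rs, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def mutually_included(estimates):
    """True if every cohort's estimate lies in every other cohort's CI.

    estimates maps a cohort label to a tuple (estimate, lower, upper).
    """
    return all(
        lo_b <= est_a <= hi_b
        for a, (est_a, _, _) in estimates.items()
        for b, (_, lo_b, hi_b) in estimates.items()
        if a != b
    )


# Hypothetical pretest and posttest scores for one small cohort.
pre = [1, 2, 3, 4, 5, 6]
post = [2, 1, 4, 3, 6, 5]
rs, lower, upper = spearman_ci(pre, post)
print(round(rs, 3), round(lower, 3), round(upper, 3))
```

The same `mutually_included` check applies unchanged to regression slopes, given each cohort’s slope and its 95% CI.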
Although we primarily inspected the overlap of the confidence intervals, we additionally performed hypothesis tests to present further evidence of reproducibility. We used Fisher’s exact test for categorical variables and the Kruskal–Wallis H test for continuous variables. We also compared the slopes for regressing posttest SFS scores on pretest SFS scores using an analysis of variance with Tukey’s p-value adjustment. For hypothesis tests, the type I error was set to 20%, and reproducibility was considered achieved when the probability of error (p-value) exceeded 20%.
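The decision rule for the hypothesis tests can be illustrated with a small Python sketch of the Kruskal–Wallis H test for exactly three cohorts. It is an illustration under stated assumptions, not the R code used in the study: the p-value relies on the chi-square approximation, which for three groups (2 degrees of freedom) reduces to exp(−H/2), and the function name, the constant `ALPHA` and the example values are hypothetical.

```python
import math
from collections import Counter

ALPHA = 0.20  # type I error level stated in the text


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H (tie-corrected) and approximate p for 3 groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the mean rank of its tied block.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction; it is zero only if all observations are identical.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    # Chi-square survival function with 2 degrees of freedom.
    p = math.exp(-h / 2)
    return h, p


# Hypothetical values of a continuous variable in three cohorts.
cohorts = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
h, p = kruskal_wallis_3(cohorts)

# Under the rule in the text, p above ALPHA counts as reproducible.
reproducible = p > ALPHA
print(round(h, 3), round(p, 3), reproducible)
```

With small cohorts the chi-square approximation is rough, so an exact or permutation-based p-value would be preferable in practice.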
Statistical and graphical analyses were performed using R (version 3.6.1) with the packages car, bestNormalize, boot, cocor, psych and lsmeans. We have provided our data as Supplementary File 1.