Although our patients had moderate limitations and most had been treated previously, all theory-based treatments were, as hypothesized, more effective than WL. The uncorrected data already showed a relevant decrease in RDQ score for all active treatments. After correction for baseline imbalance and adjustment for patient cluster dependence and centre of treatment, CBT and CT showed a clinically relevant decrease of ≥ 2.5 points on the RDQ, whereas APT fell just short of this level (2.4 points). Furthermore, hardly any adverse effects were reported. Several secondary outcomes, such as main complaints, current pain and global improvement, further confirmed this conclusion. The more functional limitations patients had, the higher the treatment satisfaction they reported after attending CBT or CT. Patients attending APT were more satisfied than WL patients, although this difference became non-significant when pre-treatment functional limitations were high. This indicates that, judged by patients' overall satisfaction, APT was less effective in patients with moderate to severe functional limitations.
The performance tasks that seemed to be more physically demanding improved significantly in treatments with a physical modality.
When CT was compared with APT or CBT, CT showed a significantly greater walking distance than CBT, but it is debatable whether a difference of 22.8 on 419 meters is clinically relevant. APT showed a greater reduction in depression than CT, although this change was not clinically relevant, since the mean baseline BDI score was already low in all treatment groups. On all other outcome measures, CT was not more effective than either APT or CBT, so the hypothesis that CT has a stronger effect than APT and CBT could not be supported.
Until now, it was not known whether functional limitations are more effectively reduced by purely psychological or purely physical interventions [3, 54]. To our knowledge, the present study is the first trial in CLBP to address this issue by explicitly comparing theory-based treatments with WL, and CT with APT and CBT. Both the purely physical and the purely psychological treatment, as well as CT, showed clinically relevant improvement. The overall results of all active treatment groups are comparable to those presented in the most recent reviews and meta-analyses on different active treatments [7, 8, 17, 18].
The lack of large differential effects between active treatments has been attributed to the fact that these treatments are not sufficiently theory-driven. But even in our study, CT showed no additional effect. This might be due to the relatively low compliance rate in CT. Furthermore, total treatment intensity might have been a crucial factor for obtaining an additional effect. Although CT had a total duration of 78 hours, this is still lower than the 100 hours of therapy in daily intensive CT programs that showed an additional effect when compared with non-multidisciplinary treatment. Furthermore, no functional restoration was applied. On the other hand, we cannot rule out that the combination of APT and CBT had a counteracting effect. For example, the increase of exercise load in APT was based on training physiology, whereas the increase of activity in CBT was based on time-contingency, which could have obscured the supplemental effect of both treatments.
Because the WL was used as the reference treatment, it cannot be ruled out that the positive effects of all active treatments were caused by non-specific factors such as attention, a standardized treatment program, or emphasis on active participation. On the other hand, using the WL allowed us to control for time effects, showing that the WL group neither improved nor deteriorated on any outcome measure. Furthermore, due to ethical regulations, it was not possible to use an attention-control/placebo treatment once a patient had been given an indication for rehabilitation treatment.
By including a total of 223 patients, of whom 212 were available for analysis, sufficient power was assured for the comparison between the three active treatments and the WL on the primary outcome measure, functional limitations. However, the power might have been insufficient to detect differences between the active treatments, although the point estimates showed no clear tendency in favour of CT.
Although patient compliance was not very high, 95% of all randomized patients were assessed. This means that most patients, including those who did not receive sufficient treatment intensity or who showed serious protocol deviations, were included in the analyses. In this way the intention-to-treat method was approximated as closely as possible, ensuring that the real effectiveness of theory-based treatments in comparison with WL was determined. The results appeared to be quite robust, since the alternative analyses hardly changed them.
At randomization, all groups were quite similar on demographics and patient characteristics. Only the duration of functional limitations was clearly not equally distributed, and this variable was additionally controlled for in the statistical analyses.
In this study, quite liberal inclusion criteria were used. In other studies, for example, patients with psychosocial problems were excluded, patients were treated as inpatients, or patients were required to have had no previous back surgery, ongoing somatic or psychiatric disease or generalized disc degeneration. The level of functional limitations of our patients was relatively high compared with that reported in other studies [58–61], but in accordance with the Dutch health care system, in which CLBP patients with moderate to severe functional limitations are treated in outpatient rehabilitation centers. This means that the generalizability of the results to clinical practice is very high.
To improve the quality of the interventions, all treatments were highly structured by means of detailed treatment protocols and were given by well-trained and skilled therapists. To avoid possible confounding between treatment elements, each therapist was trained to deliver only one specific treatment element, in both the single and the combination treatment. It was therefore not possible to keep the therapists blinded. The patients could not be blinded for ethical reasons. However, before randomization the patients were told that all active treatments are effective, but that the exact effectiveness is not yet clear. Furthermore, patients with an absolute preference for one treatment were excluded. Concealment of randomization was successfully achieved, since none of the referring physicians was aware of the type of treatment to which the referred patient would be randomized. The blinding of the research assistants also seemed to be successfully maintained.
Compliance with the treatment protocol by patients and clinicians is rarely assessed or adequately reported in RCTs. Therefore, additional registration forms for the therapists and diaries for the patients were used. Inspection of these forms indicated that there were no statistically significant differences between the treatment groups regarding protocol deviations. The treatment quality was judged to be sufficient in those patients who received at least 2/3 of all possible treatment sessions. This was further confirmed by the significant increase in VO2 max in the treatments using a physical training modality. The lack of additional improvement in negative problem solving after completion of PST was also reported by van den Hout et al. Despite this lack of improvement, van den Hout et al. found a decreased level of functional limitations in the problem solving group at 12-month follow-up, meaning that PST probably exerted its effect in another way.
Furthermore, in only a limited number of patients were the reasons for insufficient adherence exclusively related to the type of treatment. The reported rate of compliance was quite similar to that in a few other RCTs using comparable treatments, which ranged from 68% to 73% [60, 64], and to the rate of only 69% in regular multidisciplinary rehabilitation programs.
Several investigators recommend that, besides subjective questionnaires, more objective outcome measures such as performance tasks should be used [15, 46]. The improvement on performance tasks mainly occurred in APT and CT. This might be due to an increase in aerobic capacity or endurance strength. On the other hand, several patients did not report difficulties with, for example, stair climbing, and for them a ceiling effect seemed to apply for this particular task. Furthermore, the performance tasks were possibly not specific enough to detect a change in the ability to perform patient-relevant activities. For instance, a patient who wanted to improve walking distance would have increased the distance and not the speed. Because of this, performance on the five-minute walking task would probably not change dramatically. Another explanation may be that the subjective experience of functional limitations and main complaints, as rated by questionnaires, changes more easily, while changes in performance tasks take more time to become established. It is not known what these results mean for clinical practice, since besides our own study only one RCT using similar performance tasks was identified, and its only conclusion was that the number of patients showing improvement on these performance tasks was lower than for the self-rated outcome measures.
Since all active treatment groups showed similar effectiveness, the question arises how the effects are mediated. Previous studies have shown that, irrespective of the treatment modality, improvement was mediated by a reduction in pain catastrophizing and an increase in experienced pain control [67–69]. In our study too, patients in all treatment conditions were exposed to situations that may have challenged their catastrophic beliefs that pain is a serious threat to their health. Results of a mediation analysis based on our data will be reported later.
To improve effectiveness, it is necessary not only to find out how the treatment exerts its effect but also to answer the question "what works for whom?" One way to explore this further might be to identify subgroups of patients using objective, valid and reliable criteria, in order to enhance the effectiveness of treatment programs.