The effectiveness of interventions aimed at increasing physical activity in adults with persistent musculoskeletal pain: a systematic review and meta-analysis

Background Individuals with persistent musculoskeletal pain (PMP) have an increased risk of developing co-morbid health conditions and of early mortality compared to those without pain. Despite irrefutable evidence supporting the role of physical activity in reducing these risks, there has been limited synthesis of the evidence, potentially impacting the optimisation of these forms of interventions. This review examines the effectiveness of interventions in improving levels of physical activity and the components of these interventions.
Methods Randomised and quasi-randomised controlled trials were included in this review. The following databases were searched from inception to March 2016: CENTRAL in the Cochrane Library, Cochrane Database of Systematic Reviews (CDSR), MEDLINE, Embase, CINAHL, PsycINFO and AMED. Two reviewers independently screened citations, assessed eligibility, extracted data, assessed risk of bias and coded intervention content using the behaviour change taxonomy (BCTTv1) of 93 hierarchically clustered techniques. GRADE was used to rate the quality of the evidence.
Results The full texts of 276 articles were assessed for eligibility; twenty studies involving 3441 participants were included in the review. Across the studies the mean number of BCTs coded was eight (range 0–16), with ‘goal setting’ and ‘instruction on how to perform the behaviour’ most frequently coded. For measures of subjective physical activity, interventions were ineffective in the short term, based on very low quality evidence; had a small effect in the medium term, based on low quality evidence (SMD 0.25, 95% CI 0.01 to 0.48); and had a small effect in the longer term (SMD 0.21, 95% CI 0.08 to 0.33), based on moderate quality evidence. For measures of objective physical activity, interventions were ineffective, based on very low to low quality evidence.
Conclusions There is some evidence supporting the effectiveness of interventions in improving subjectively measured physical activity; however, the evidence is mostly based on low quality studies and the effects are small. Given the quality of the evidence, further research is likely or very likely to have an important impact on our confidence in effect estimates and is likely to change the estimates. Future studies should provide details on intervention components and incorporate objective measures of physical activity.
Electronic supplementary material The online version of this article (10.1186/s12891-017-1836-2) contains supplementary material, which is available to authorized users.


Background
Epidemiological studies suggest one in five people across Europe suffer from persistent pain [1,2]. Most persistent pain arises from musculoskeletal disorders, such as low back pain and osteoarthritis, both of which are considered leading causes of disability worldwide [3]. It can be expected that with aging populations, the health, economic and social problems associated with these conditions are likely to rise [1,2,4]. In addition to causing considerable disability, persistent musculoskeletal pain (PMP) also increases an individual's risk of developing other health conditions, including depression, obesity, heart disease [5-7], cancer [8] and indeed early mortality [7-9]. Despite this, efforts to address these broader health implications of PMP are somewhat lacking.

Description of the intervention
Clinical guidelines widely endorse exercise and/or physical activity (PA) in the management of PMP [10-17]. This is largely due to the positive impact these interventions can have on reducing pain and disability. However, improving levels of PA can lead to broader health benefits, with even small changes in PA levels leading to substantial health gains [18,19].
PA can be defined as any movement produced by skeletal muscles resulting in energy expenditure. It occurs across several domains, including social and domestic activities, commuting, and recreational and leisure activities [20]. PA may or may not include exercise: exercise is a subset of PA that tends to be planned, structured or repetitive [20], with a specific purpose such as improving strength. It has been recommended that the terms PA and exercise are not confused [21].

How the intervention might work
Improving levels of PA requires behaviour change. Behaviour change interventions are coordinated sets of activities designed to change specified patterns of behaviour [22]. Behaviour change techniques (BCTs) are the components of interventions that effect change [23]. Taxonomies of BCTs have been used to describe intervention content in a number of PA behaviour change interventions [24-28]. Across these interventions, and in line with NICE recommendations for individual level behaviour change [29], some consistent techniques appear to be associated with effective interventions, e.g. self-monitoring behaviour, providing feedback, and goal setting.

Why it is important to do this review
PA and exercise interventions are often recommended in the management of PMP as they can have a positive effect on pain and disability levels. However, the extent to which these interventions actually result in changes to behaviour and consequently increased levels of physical activity is less clear. Although individual studies have demonstrated it is possible to increase PA levels in those with back pain [30] or osteoarthritis [31,32], the results of systematic reviews are conflicting and limited. In adults with osteoarthritis, a systematic review concluded that self-management programmes achieve small improvements in subjectively measured PA in the short term [32], whereas a review of PA interventions in adults with PMP reported no improvements in objectively measured PA [33]. Furthermore, the BCTs used within these forms of interventions, and their relationship, if any, to outcomes, have not yet been systematically explored.

Objectives
This systematic review investigated the effectiveness of any form of intervention with a clear aim of increasing PA in adults with PMP. Possible associations between BCTs or intervention characteristics and intervention effects were also investigated.
The objectives of this review are to:
1. Determine the effectiveness of interventions in increasing PA levels in adults with PMP.
2. Identify BCTs used within interventions.
3. Determine if particular BCTs or other intervention characteristics (intensity, recruitment route, type of PA, etc.) are associated with greater effect sizes.

Methods
The full protocol for this review has been published [34].

Population
Randomised and quasi-randomised controlled trials in adults (≥18 years) with PMP (pain lasting ≥3 months) in the axial skeleton or large peripheral joints were included. We excluded studies focusing on fibromyalgia, inflammatory and/or autoimmune disorders and perioperative patients, as these groups may require a different management strategy.

Types of interventions
All interventions that had a clear aim of increasing PA in adults with PMP were eligible for inclusion. We excluded site specific rehabilitative exercise interventions unless it was clear the intervention also addressed habitual PA. We included trials with a comparative control group and trials with multiple intervention arms. We did not include population or community-wide interventions.

Types of outcome measures
The primary outcome of interest was PA measured by self-reported or objective measures: questionnaires, recall diaries, pedometers or actigraphy. Measurements of adherence or attendance at classes alone were not sufficient. The secondary outcome of interest was adverse incidents. Reference lists of systematic reviews and articles retrieved from the search were scanned for additional references.

Data collection and analysis

Selection of studies
Results from the searches were imported into EndNote (X7) bibliographic software (Thomson Reuters, Philadelphia, PA, USA) and duplicates removed. Titles and abstracts obtained from the search were independently screened by two authors (JM 100%, MAT 70% and SMcD 30%). Articles not meeting the inclusion criteria and outside the scope of the review were removed. Full text reports of the remaining publications were retrieved. Two review authors (JM, SMcD) used a standardised form, tested prior to use, to select trials eligible for inclusion. Non-English papers were assessed and, where necessary, translated in part or in full.

Data extraction and management
Data were extracted independently by two reviewers (JM, SMcD) using a customised form tested prior to use. Relevant data were extracted for methodological issues, intervention characteristics, study design, study characteristics and adverse events. Intervention content was coded according to the BCTTv1 [35]. Two coders (JM, SH) independently coded BCTs; inter-rater reliability was assessed using the prevalence-adjusted bias-adjusted Kappa (PABAK) statistic [36]. PABAK adjusted for the high frequency of agreement on absent BCTs. Values of 0.60-0.79 indicated 'substantial' reliability and 0.80 and above 'outstanding' reliability [37].
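As an illustration of the reliability statistic described above, the following is a minimal sketch of PABAK for two coders making binary (present/absent) judgements, for which PABAK reduces to twice the observed proportion of agreement minus one. The function name and the example codings are hypothetical and are not data from this review.

```python
def pabak(coder_1, coder_2):
    """Prevalence-adjusted bias-adjusted kappa for two raters, binary codes.

    For two categories, PABAK = 2 * observed_agreement - 1.
    """
    if len(coder_1) != len(coder_2) or not coder_1:
        raise ValueError("ratings must be non-empty and of equal length")
    agreements = sum(1 for a, b in zip(coder_1, coder_2) if a == b)
    observed_agreement = agreements / len(coder_1)
    return 2 * observed_agreement - 1


# Hypothetical codings of ten techniques (1 = present, 0 = absent).
coder_1 = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
coder_2 = [1, 0, 0, 0, 1, 0, 0, 0, 1, 1]
score = pabak(coder_1, coder_2)  # coders agree on 9 of 10 techniques
```

Because most of the 93 techniques are absent from any single intervention, agreement on absence dominates; PABAK is used in place of the unadjusted kappa to avoid the distortion this skewed prevalence would otherwise cause.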

Assessment of risk of bias in included studies
Two reviewers (JM, SMcD) independently assessed studies for risk of bias (ROB) using the Cochrane risk of bias tool [38]. An additional domain was added to determine if studies were adequately powered. For cluster randomised controlled trials, five additional domains were assessed, as recommended by Cochrane (section 16.3.2) [38].

Quality of the evidence
The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach was used to interpret and evaluate the quality of the evidence [39,40]. The methods and recommendations described in the Cochrane handbook [38] and by the GRADE working group [33] were used to assess the quality of a body of evidence using five domains: risk of bias, inconsistency, indirectness of evidence, imprecision of effect estimates and potential publication bias. Data for each outcome were entered into GRADEpro to create a 'Summary of Findings' table, and footnotes were used to justify all decisions on the downgrading of the quality of the evidence.
The definitions described by the GRADE working group were used to grade the quality of evidence as follows:
High: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low: Any estimate of effect is very uncertain.

Measures of treatment effect
Continuous outcomes were analysed using post intervention measures. We reported effect sizes using the standardised mean difference (SMD), as outcomes were reported across different scales. For comparisons of the results we categorised studies by effect size according to Cohen's classification: SMD of 0.2 to <0.3 as small, 0.3-0.8 as moderate and >0.8 as large [41]. P-values of <0.05 and confidence intervals that excluded null values were considered statistically significant.
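The calculation and classification described above can be sketched as follows. This is an illustrative sketch only: it assumes the small-sample (Hedges) correction commonly applied to the SMD, which the text does not state explicitly, and the function names and example values are hypothetical. The size bands follow the thresholds given in this section, with values below 0.2 treated as negligible.

```python
import math


def standardised_mean_difference(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Return (SMD, standard error) from post-intervention summary data.

    Assumption: Hedges' small-sample correction is applied.
    """
    n_total = n_i + n_c
    pooled_sd = math.sqrt(
        ((n_i - 1) * sd_i ** 2 + (n_c - 1) * sd_c ** 2) / (n_total - 2)
    )
    correction = 1 - 3 / (4 * n_total - 9)
    smd = (mean_i - mean_c) / pooled_sd * correction
    se = math.sqrt(n_total / (n_i * n_c) + smd ** 2 / (2 * (n_total - 3.94)))
    return smd, se


def classify_effect(smd):
    """Label an SMD using the size bands described in this section."""
    size = abs(smd)
    if size > 0.8:
        return "large"
    if size >= 0.3:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical trial: intervention mean 30 (SD 10, n 50), control mean 27 (SD 10, n 50).
smd, se = standardised_mean_difference(30, 10, 50, 27, 10, 50)
label = classify_effect(smd)
```

Standardising by the pooled standard deviation is what allows questionnaires, diaries and step counts, which are reported on different scales, to be combined in a single analysis.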

Unit of analysis issues
Where studies involved multiple intervention groups we followed recommendations suggested by the Cochrane collaboration (16.5.4) [38] by combining similar intervention groups to perform a single pairwise comparison.
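As a sketch of how two similar intervention arms can be merged into a single group before a pairwise comparison, the following uses the standard formulae for combining sample sizes, means and standard deviations across groups. The function name and the example values are hypothetical and do not come from any included study.

```python
import math


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Merge two arms into one group: returns (n, mean, sd) of the combined arm."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (mean1 - mean2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)


# Hypothetical arms: 30 participants (mean 10, SD 2) and 20 participants (mean 12, SD 3).
combined_n, combined_mean, combined_sd = combine_groups(30, 10.0, 2.0, 20, 12.0, 3.0)
```

Combining arms in this way avoids counting the shared control group more than once, which would otherwise overstate the precision of the pooled estimate.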
Where studies reported PA domains separately or reported more than one PA outcome, data were extracted for each. However, for the effect size analysis, measures of overall PA were given preference; if these were not available, leisure time PA was given preference.
To facilitate exploration of results not suitable for quantitative synthesis we grouped studies by effect size using an aggregate of subjective and objective measures (objective measures given preference to subjective where available) at the post intervention time point.

Dealing with missing data
Attempts were made to contact original investigators to request missing data.
The frequency and duration of the intervention were used to calculate an estimated overall intervention contact time ('intensity'). The calculation was based on the full intervention being delivered as planned. If the duration of a session was not reported or the data were unobtainable from authors, we allocated 20 min for telephone follow up and 45 min for face to face interventions.
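The contact-time estimate described above can be sketched as below. The default durations (20 min for telephone contact, 45 min for face to face contact) are taken from the text; the data layout, names and the example intervention are hypothetical.

```python
# Default session lengths (minutes) used when a duration was not reported.
DEFAULT_MINUTES = {"telephone": 20, "face_to_face": 45}


def contact_hours(sessions):
    """Estimate total planned contact time in hours.

    sessions: iterable of (mode, number_of_sessions, minutes_per_session),
    where minutes_per_session is None if it was not reported.
    """
    total_minutes = 0
    for mode, count, minutes in sessions:
        if minutes is None:
            minutes = DEFAULT_MINUTES[mode]
        total_minutes += count * minutes
    return total_minutes / 60


# Hypothetical intervention: eight 60-min classes plus four calls of unreported length.
hours = contact_hours([("face_to_face", 8, 60), ("telephone", 4, None)])
```

The estimate assumes every planned session was delivered, so it reflects intended rather than received contact time.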

Assessment of heterogeneity
Diversity across the studies was qualitatively assessed in terms of the intervention, participant demographics, outcome measures and follow-up. Data were assessed for statistical heterogeneity with the I² statistic, calculated in RevMan version 5.3. Values of I² ranging from 30% to 60% were considered to represent moderate heterogeneity and values from 50% to 90% substantial heterogeneity [38].
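For illustration, I² can be derived from Cochran's Q as the percentage of variability in effect estimates that exceeds what would be expected by chance. The sketch below assumes inverse-variance weights; the function name and the example inputs are hypothetical.

```python
def i_squared(effects, std_errors):
    """Return I² (%) from study effect estimates and their standard errors."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    degrees_of_freedom = len(effects) - 1
    if q == 0:
        return 0.0
    # Negative values are truncated to zero.
    return max(0.0, (q - degrees_of_freedom) / q) * 100


# Hypothetical SMDs from three studies with equal standard errors.
heterogeneity = i_squared([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
```

Note that the interpretation bands quoted above overlap (30% to 60% and 50% to 90%), so a value in the overlap needs to be judged alongside the size and direction of the individual effects.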

Data synthesis
Separate meta-analyses were completed for subjective and objective outcome data at three time points: short term (not longer than 12 weeks post randomisation), medium term (not longer than 6 months post randomisation) and long term (greater than 6 months post randomisation). Outcomes were analysed using the SMD, with the inverse variance method used to calculate the overall effect and standard error; a random effects model was applied to incorporate heterogeneity.
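A minimal sketch of inverse-variance pooling under a random effects model is given below. It assumes the DerSimonian and Laird estimator of between-study variance, which is a common choice but is not named in the text; the function name and the example inputs are hypothetical.

```python
import math


def random_effects_pool(effects, std_errors):
    """Pool study effects; returns (pooled, se, (ci_low, ci_high), tau_squared).

    Assumption: DerSimonian-Laird estimate of between-study variance.
    """
    k = len(effects)
    fixed_weights = [1 / se ** 2 for se in std_errors]
    sum_w = sum(fixed_weights)
    fixed_mean = sum(w * y for w, y in zip(fixed_weights, effects)) / sum_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(fixed_weights, effects))
    scaling = sum_w - sum(w ** 2 for w in fixed_weights) / sum_w
    tau_squared = max(0.0, (q - (k - 1)) / scaling) if scaling > 0 else 0.0

    # Random effects weights add the between-study variance to each study variance.
    weights = [1 / (se ** 2 + tau_squared) for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, se_pooled, ci, tau_squared


# Hypothetical SMDs from three studies with equal standard errors.
pooled, se_pooled, ci, tau_squared = random_effects_pool(
    [0.1, 0.5, 0.9], [0.1, 0.1, 0.1]
)
significant = ci[0] > 0 or ci[1] < 0  # interval excludes the null value of 0
```

When between-study variance is large, the random effects weights become more equal across studies and the confidence interval widens relative to a fixed effect analysis.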

Subgroup analysis and investigation of heterogeneity
We performed the following pre-specified subgroup analyses: (1) clinical subgroups, classified as 'persistent low back pain' and 'osteoarthritis'; and (2) frequency and duration of intervention (intensity), classified as 'higher' or 'lower' relative to the median number of contact hours across the studies. Subgroup analyses by BCTs and recruitment route were planned but not conducted, as the data generated were deemed insufficient. Descriptive statistics were therefore used to explore possible associations between these factors and other intervention characteristics and intervention effects.

Sensitivity analysis
A sensitivity analysis was performed to check if excluding studies with a higher ROB affected results. The threshold for sensitivity analysis was set for studies meeting at least 50% of the criteria of the ROB assessment, excluding blinding of participants and providers.
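The inclusion rule for the sensitivity analysis can be sketched as a simple filter. This is an assumption-laden illustration: it treats a criterion as met when the domain was judged 'low' risk, and the domain names and example judgements are hypothetical.

```python
# Domain left out of the threshold calculation, as described in the text.
EXCLUDED_DOMAIN = "blinding_of_participants_and_providers"


def meets_rob_threshold(judgements, threshold=0.5):
    """Return True if the share of criteria met reaches the threshold.

    judgements: dict mapping domain name to 'low', 'unclear' or 'high'.
    Assumption: a criterion counts as met only when judged 'low' risk.
    """
    assessed = {
        domain: rating
        for domain, rating in judgements.items()
        if domain != EXCLUDED_DOMAIN
    }
    met = sum(1 for rating in assessed.values() if rating == "low")
    return met / len(assessed) >= threshold


# Hypothetical study: two of four assessed criteria met, blinding ignored.
example_study = {
    "random_sequence_generation": "low",
    "allocation_concealment": "unclear",
    "blinding_of_participants_and_providers": "high",
    "incomplete_outcome_data": "high",
    "selective_reporting": "low",
}
retained = meets_rob_threshold(example_study)
```

Blinding of participants and providers is set aside because it is rarely achievable in PA interventions and would otherwise count against nearly every study.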
Eight non-English language studies were translated but none were eligible for inclusion.

Excluded studies
A total of 250 studies were excluded from the review. Exclusions were most often due to absent or unacceptable measures of PA and to studies having no clear aim of increasing PA (Fig. 1).

Participants in included studies
The studies involved 3441 suitable participants (4875 in total) (Table 1); over half were female (approx. 59.2%). Thirteen studies focused on osteoarthritis (1874 participants; n = 7 knee, n = 5 hip and/or knee, n = 1 generalised) and seven on persistent low back pain (n = 1567 participants). The mean age of participants with osteoarthritis ranged from 61 to 73.8 years, and for persistent low back pain from 40.4 years to 51.9 years.
Table 2 summarises modes of delivery, intervention content, provider and intensity for each intervention. Most studies incorporated more than one mode of delivery but have been described according to what was considered the 'primary' delivery mode. Most interventions were provided by healthcare professionals (12/20); other providers included exercise and fitness professionals and a counsellor. Intervention contact times ranged from <1 h for an educational pamphlet [32] to approximately 200 h of contact time [46] occurring over a twelve month intervention. The median number of contact hours was 8.3 h. Walking was the most common form of PA, followed by multicomponent programmes utilising a mixture of aerobic, strengthening and/or general flexibility exercises. All of the interventions incorporated some form of educative component relating to the role of PA in managing PMP.

Interventions
A total of 160 BCTs (mean per study 8, range 0-16) were coded across the 20 studies (Table 2). The most frequently coded techniques were 'goal setting (behaviour)' and 'instruction on how to perform the behaviour' (65%), followed by 'behavioural practice/rehearsal' and 'self-monitoring of the behaviour' (55%). A mean PABAK score of 0.9 indicated outstanding agreement on identification of BCTs.

Control groups
The content of control groups varied (Table 1). Seven studies referred to control groups as 'treatment as usual' or some form of 'standard care' [30,43,49,52,55-57]. Two studies [45,58] used waiting list control groups. A clinical guideline posted to GPs was used as a control in the study by Becker et al. [44]. Pamphlets were used as a control in the study by Brosseau et al. [46], and a copy of the 'Arthritis Help book' was given to controls in the study by Hughes et al. [50]. Two studies used self-management programmes in their intervention, but provided them as a stand-alone intervention for controls [31,47]. Two studies directly compared two forms of back rehabilitation programmes of varying intensity and content [42,54]. In the study by Williams et al. [32] the control booklet content differed from that of the intervention booklet. Krein et al. [53] provided controls with an uploading pedometer and reminder emails to upload data, but not access to the web-based intervention available to the intervention group. In two studies [48,51], intervention groups received additional intervention components in addition to exercise classes.

Outcome measures
Across the 20 studies, 13 scales or tools for measuring PA were identified (Table 1). Twelve studies reported subjective PA, five reported objective PA and three reported both. Self-reported measures of PA included estimates of total PA and estimates of frequency, intensity and time in different domains of activity. Only two tools were used in more than one study: the International Physical Activity Questionnaire [32,51,52] and the Freiburg Questionnaire of PA [44,54]. Objective measures of PA included steps per day or total PA and/or time in different intensities of PA, measured by accelerometers and/or pedometers.

Follow-up (post randomisation) (Table 1)
The longest follow up was 18 months [46], six months after a twelve month intervention. Eleven studies reported outcomes at 12 months [42-45, 48, 50, 52-54, 56, 57]; however, the latter two studies involved interventions that lasted the 12 months. Four studies reported outcomes at 6 months [30,31,51,58] and one at 3 months [32]. One study had only post-intervention outcomes at four weeks [49] and one study reported outcomes at nine months [47]. Pisters et al. [55] reported outcomes at 65 weeks; the intervention duration was described as 12 weeks, but booster sessions were provided to participants up until week 55.

Risk of bias in included studies (Figs. 2 and 3)
The ROB in the included studies is summarised in Figs. 2 and 3. Blinding, inadequately powered studies and attrition bias were considered the greatest sources of ROB in the included studies. Due to the difficulty in blinding participants and providers in PA interventions, the risk of performance bias was considered high in all but one study, which involved posting an intervention or control pamphlet to participants [32]; the review authors felt there was insufficient information in the report to support a judgement of high or low ROB for this study.
The majority of studies included in the review were not sufficiently powered; only nine reported conducting a power calculation for their primary outcome [32, 43-45, 48, 52-55]. Only two studies [45,55] conducted power calculations for PA outcomes. Attrition bias was considered high in just over one third of the included studies (35%).

Risk of bias in cluster randomised controlled trials
Three studies utilised cluster RCTs [43,44,55], summarised in Figs. 2 and 3. Two studies [43,55] were judged to be of unclear ROB in relation to loss of clusters, because the loss of clusters was not reported or discussed in the analysis or results. ROB on comparability with individually randomised trials was unclear in all three studies, largely due to a lack of reporting of comparability or of the influence of clustering on intervention effects.

Meta-analysis 1: Effects of intervention versus control on subjectively measured PA
Short term: no longer than 12 weeks post randomisation. Nine studies (1096 participants) reported short term subjective PA outcomes (Fig. 4) [30, 32, 42, 45, 50-52, 57, 58]. Based on very low quality evidence, the pooled effects of the interventions showed no demonstrable effect (SMD 0.24, 95% CI -0.07, 0.55). The quality of the evidence was downgraded from high to very low due to substantial statistical heterogeneity (I² = 83%), wide confidence intervals around the effect estimate and ROB (Table 3).
Medium term: greater than 3 months, not more than 6 months post randomisation.
Nine studies (1309 participants) reported medium term measures (Fig. 4) [30, 44, 50-52, 54-56, 58]. Based on low quality evidence, the pooled effect of the studies at the medium term was significant, with a small effect size (SMD 0.25, 95% CI 0.01, 0.48). The quality of the evidence was downgraded from high due to the substantial heterogeneity in the observed effects (I² = 72%) and the weighting of studies at high ROB included in the analysis (Table 3).
Long term: greater than 6 months post randomisation.
Based on moderate quality evidence, the pooled effects were small and statistically significant (SMD 0.21, 95% CI 0.08, 0.33); heterogeneity was moderate in the observed effects (I² = 40%). The quality of the evidence was downgraded from high to moderate due to the weighting applied to studies judged as high ROB in the analysis (Table 3).
Fig. 4 Forest plot of comparison: 1 Effects of intervention versus control on subjectively measured physical activity: short-term, medium-term and long-term
... one area
d. Imprecision: a serious downgrade is applied when the total population size is less than 400 (provided there is more than one study), or if the 95% CI includes 0 (no effect) or the upper and lower confidence interval cross an effect size (SMD) of 0.5 in either direction. A very serious downgrade is applied where there is a small population and imprecision of the effect estimate
e. Where there were sufficient papers (10), a funnel plot was prepared and inspected; a serious downgrade was applied if this suggested a publication bias

Meta-analysis 2: Effects of intervention versus control on objectively measured PA
Eight studies reported objective measures of PA [30, 31, 45, 47-49, 51, 53]. Short term: no longer than 12 weeks post randomisation. Seven studies (441 participants) reported short term measures (Fig. 5, Table 3) [30,31,45,47-49,51]. Based on very low quality evidence, the pooled effect was positive but not significant (SMD 0.31, 95% CI -0.11, 0.74) with substantial heterogeneity (I² = 76%). The quality of the evidence was downgraded from high to very low due to wide confidence intervals in the effect estimates and the weighting applied to studies judged as high ROB in the analysis (Table 3).
Medium term: greater than 3 months, not more than 6 months post randomisation.
Four studies (245 participants) reported medium term measures (Fig. 5) [30,31,51,53]. Based on low quality evidence, the pooled effect was negative and not significant (SMD -0.02, 95% CI -0.40, 0.36) with moderate heterogeneity in the observed effects (I² = 41%). The quality of the evidence was downgraded due to the small number of participants included in the analysis and wide confidence intervals that included no effect.
Long term: greater than 6 months post randomisation. Four studies (435 participants) reported long term follow-up measures (Fig. 5) [45,47,48,53]. Based on low quality evidence, the pooled effect was positive but not significant (SMD 0.22, 95% CI -0.02, 0.46) with low heterogeneity in the observed effects (I² = 29%). The quality of the evidence was downgraded from high to low due to imprecision of the effect estimates, as evidenced by confidence intervals that included no effect, and the weighting applied in the analysis to studies at high ROB.

Sensitivity analysis
We examined the pooled effects for the two types of outcomes (subjective and objective) at each time point by an assessment of the ROB. When limited to studies with a lower ROB, effect sizes were not significant at any timepoint.
Subgroup Analyses: To increase statistical power for the planned subgroup analysis we used subjective measures of PA (n = 16 studies).

Subgroup analysis 1: Clinical conditions osteoarthritis and low back pain:
Effects were demonstrated for the osteoarthritis subgroup only; effect sizes were moderate in the medium term (SMD 0.41, 95% CI 0.10, 0.72) and small in the longer term (SMD 0.29, 95% CI 0.08, 0.49).
Only interventions of higher intensity, relative to the median calculated contact hours of the interventions (8.3 h), reached important effect sizes (seven studies). Higher intensity interventions resulted in moderate effect sizes for short term (SMD 0.66, 95% CI 0.41, 0.91) and medium term (SMD 0.47, 95% CI 0.20, 0.74) outcomes, and small effect sizes for longer term outcomes (SMD 0.25, 95% CI 0.02, 0.48).

Influence of BCTs and recruitment route
It was not possible to conduct the quantitative subgroup analysis of BCTs and recruitment routes as the data generated from the review was not sufficient to permit valid comparisons. Descriptive statistics were used to describe possible associations between these factors and other intervention characteristics. To facilitate this exploration, all studies were grouped by effect size, post intervention (Fig. 6).
Thirteen studies demonstrated no effect, or negligible effects (<0.2), post intervention (Table 2). Across these studies 100 BCTs were coded, with a mean of 7.7 per study (range 0-15) and 31 unique BCTs present. The most commonly coded BCTs were 'goal setting behaviour', 'information on health consequences', 'instruction on how to perform the behaviour' and 'behavioural practice/rehearsal', which featured in 61.5% of the studies.
Recruitment route and other intervention characteristics (Tables 1 and 2): No notable differences were observed with regard to the influence of recruitment route, type of PA or mode of delivery on post-intervention effect sizes.
Of the seven studies demonstrating positive effects, five (71.4%) were delivered by healthcare professionals (2 multidisciplinary and 3 by physiotherapists). In comparison, studies with no effect (<0.2) were less frequently delivered by healthcare professionals (53.8%).

Adverse incidents
Only six studies made explicit statements regarding adverse incidents; two studies, although not explicitly stated, documented adverse incidents. Allen et al. [43] reported four adverse incidents unrelated to the intervention; one study [51] reported no adverse incidents related to the exercise components. Relatively minor musculoskeletal complaints were reported in three studies [30,52,53]. Allergic reactions to pedometer clips [30] and minor cardiovascular events [53] were also reported. One author [52] noted that half of the participants in a walking group who developed increases in musculoskeletal complaints withdrew from the study. A fall resulting in a hip fracture sustained during a session was reported in one study [57] and three withdrawals due to increasing back pain were reported [42].

Summary of findings
This is the first systematic review and meta-analysis examining the effectiveness of interventions in improving subjective and/or objective levels of PA in adults with PMP and possible associations between BCTs and other intervention characteristics on effect sizes.
It builds on the findings of two similar reviews: Williamson et al. [60], who assessed the effectiveness of behavioural PA interventions in participants with lower-limb osteoarthritis, and Oliveira et al. [33], who assessed the effectiveness of interventions in increasing objectively measured PA in chronic musculoskeletal pain. In contrast to the latter study, this review makes a clear distinction between therapeutic exercise programmes and interventions specifically aimed at increasing PA levels or 'habitual PA behaviours'.
With respect to subjective PA, interventions were ineffective in the short term (up to 12 weeks, very low quality evidence) and had a small effect in the medium term (3-6 months: SMD 0.25, 95% CI 0.01 to 0.48, low quality evidence) and in the long term (SMD 0.21, 95% CI 0.08 to 0.33, moderate quality evidence). Given the quality of the evidence, further research is likely or very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Analysis of the evidence for objective outcomes showed that interventions were not effective at any time point. These observations were based on very low to low quality evidence; therefore the estimate of effect is very uncertain and further research is very likely to change the estimate.
Subgroup analyses indicated that interventions were more effective in improving PA levels in adults with osteoarthritis than in those with persistent low back pain. Intervention effects were also consistently higher in interventions with a greater number of contact hours (> 8.3 h). These subgroup analyses should be interpreted with caution, as differences may not relate to their classifications. However, subgrouping participants by condition was clinically plausible and intervention intensity has previously been associated with effectiveness.

Comparison of subjective outcomes with published literature
Two reviews examining long term outcomes of PA interventions, a Cochrane review of face-to-face interventions to promote PA [61] and a systematic review of PA interventions for adults aged 55-70 years [62], both reported significant but very small effects (SMD 0.19) at 12 months. Similarly, this review found small effects for outcomes measured beyond six months (SMD 0.21, 95% CI 0.08, 0.33). These findings may indicate that individuals with pain respond to PA interventions in a similar manner to non-pain populations.
In a subgroup analysis Williamson et al. [60] found intervention effects were greatest between 6 and 12 months (SMD 0.53, 95% CI 0.41 to 0.65) and that the effectiveness of interventions declined over time, reporting no significant benefit compared to controls in outcomes beyond 12 months. Similarly, in our osteoarthritis sub-group we found a moderate effect size for medium term outcomes (>3 months ≤6 months) (SMD 0.41, 95% CI 0.10, 0.72) that diminished over time (>6 months) (SMD 0.29, 95% CI 0.08, 0.49). These findings may suggest that individuals with osteoarthritis make changes to their PA levels gradually. However, without ongoing support or increased efforts directed towards maintenance of PA, individuals with osteoarthritis may struggle to sustain increased levels of PA.

Comparison of objective outcomes with published literature
In line with our own findings of no detectable effect on objectively measured PA, Oliveira et al. [33] also found no effect on short, intermediate or long term objective outcomes. Williamson et al. [60] were unable to conduct a meta-analysis using objective measures due to a lack of studies reporting them. In contrast to our findings, the review of interventions aimed at increasing PA in adults aged 55 to 70 years found larger effects for objective measures (steps per day) (SMD 1.08; 95% CI 0.16, 1.99) at 12 months [62]. A possible explanation for this difference is that the participants included in the review by Hobbs et al. [62] were essentially 'healthy populations'; in contrast, our review and those of Williamson et al. [60] and Oliveira et al. [33] all involved participants with PMP.

Intervention characteristics
We found interventions with a higher number of contact hours resulted in greater effect sizes. Similarly in a post hoc meta-regression, Williamson et al. [60] also found, that a higher number of contact hours had a significant influence on intervention effectiveness. In contrast Hobbs et al. [62] found less intensive interventions were more effective than higher intensity interventions. A plausible explanation for these contrasting findings, is that those with PMP may need additional interventional support, in order to successfully change their PA behaviours in comparison to healthy populations.
In this review the influence of BCTs on PA outcomes is unclear, but the findings are consistent with those of previous reviews. Bishop et al. [63] published a review and meta-analysis exploring the effects of contextual and BCT content of control and target interventions in 42 trials included in a Cochrane review of interventions to improve adherence to exercise for chronic musculoskeletal pain [64]. In keeping with the findings from our review, among the most frequently coded BCTs were 'instruction on how to perform the behaviour' and 'behavioural practice and rehearsal', a finding also reported by Keogh et al. [65], who reviewed BCTs utilised in chronic low back pain self-management programmes. We found 'self-monitoring of the behaviour' was amongst the most frequently coded techniques in interventions with greater effect sizes, a finding not replicated in either the Bishop et al. [63] or the Keogh et al. [65] reviews, but consistent with the findings of PA reviews in healthy populations [24], older adults [66] and obese adults [28]. As our review was more narrowly focused on habitual PA as opposed to adherence to exercise or self-management, this finding (although tentative) lends some support to the evidence that this technique may be particularly useful in PA interventions.
Interventions included in this review were generally multifaceted, often involving several modes of delivery with varying degrees of complexity. It was therefore difficult to draw firm conclusions regarding which intervention characteristics are associated with greater effectiveness.
Few studies provided explicit statements regarding adverse incidents; where they were reported, they were largely limited to minor musculoskeletal complaints. Although the risk of adverse incidents in PA interventions is generally regarded as low, it is plausible that exacerbating pain may have a deleterious effect on participation, particularly in those with PMP.

Completeness and quality of the evidence
The quality of the evidence within this review ranged from moderate to very low across the different timepoints and outcomes. Effect sizes are at best small and limited to subjective measures. Key factors leading to downgrading of the quality of the evidence were ROB, statistical heterogeneity in the observed effects, and imprecision as evidenced by wide confidence intervals. With respect to ROB, many studies were designed to identify changes in pain and function/disability as their primary outcomes and were thus underpowered to detect changes in physical activity levels; as such, the results of this review should be interpreted with caution. Furthermore, a number of studies failed to provide adequate detail regarding blinding of outcome assessors and allocation concealment. In cluster randomised controlled trials it was often unclear whether authors had considered the effect of trial design, the influence clustering may have had on results, and whether this was taken into account when comparing effects with other trials.
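To illustrate why clustering matters when comparing effects across trials, the following is a minimal sketch of the standard design effect adjustment, 1 + (m - 1) × ICC, where m is the mean cluster size and ICC is the intracluster correlation coefficient. The function names and the numerical values are hypothetical and chosen only for illustration; they are not taken from any study included in this review, and the review does not state that this adjustment was applied.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Standard design effect for a cluster randomised trial: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n: float, mean_cluster_size: float, icc: float) -> float:
    """Sample size an individually randomised trial would need for equal precision."""
    return n / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical trial: 200 participants, about 20 per cluster, ICC of 0.05.
    deff = design_effect(mean_cluster_size=20, icc=0.05)
    n_eff = effective_sample_size(n=200, mean_cluster_size=20, icc=0.05)
    print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.1f}")
```

Under these assumed values the trial carries roughly half the information its nominal sample size suggests, so an unadjusted analysis would report confidence intervals that are too narrow.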
Whilst the use of validated measures of PA was in itself a strength, a more standardised approach to reporting PA data would have permitted a more robust statistical analysis, strengthening the evidence. Self-report measures are known to be prone to recall bias. It has been suggested that, as both the intervention and control groups complete the measure, any misclassification should be non-differential [67]. However, it could be argued that when self-report measures are used in interventions where participants and providers are unlikely to be blinded, the potential for recall bias is increased. Only three studies included both subjective and objective measures; this approach might be considered ideal given the relative strengths and limitations of each.
Descriptions of intervention content varied greatly, which affected the number of BCTs that could be reliably reported as occurring within an intervention. In this review we only coded BCTs clearly delivered to the participants and directed towards the target behaviour. As reported by others [24,63], this approach, although more rigorous, may result in fewer BCTs being coded than were actually delivered.
The variation noted across the control conditions could have influenced effect estimates, with smaller between-group effects associated with comparisons against more active control treatments [68]. However, we did not detect this when reviewing individual effect size comparisons.
Six studies initially assessed as suitable for inclusion did not report means, standard deviations or sample sizes, and requests to obtain these data from study authors were unsuccessful; these data could have added to the quality of the evidence in this review.
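The following sketch shows why all three quantities are needed before a study can contribute a standardised mean difference to a meta-analysis. It uses Hedges' g, a commonly used small-sample-corrected SMD estimator; this is an assumption for illustration, as the estimator used in this review is not specified in this section. The function name and the input values are hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return Hedges' g and an approximate 95% CI from group summary statistics.

    Illustrative only: assumes two independent groups and a pooled SD.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across intervention and control groups.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate variance of d, then standard error of g.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    se_g = j * math.sqrt(var_d)
    return g, (g - 1.96 * se_g, g + 1.96 * se_g)


if __name__ == "__main__":
    # Hypothetical summary data, not drawn from any included study.
    g, (lower, upper) = hedges_g(mean_t=10, sd_t=4, n_t=50, mean_c=8, sd_c=4, n_c=50)
    print(f"g = {g:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Without a mean, a standard deviation and a sample size for each group, neither the point estimate nor its confidence interval can be computed, which is why such studies could not be pooled.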
Study participants were largely recruited from primary or secondary care (General Practitioners, physiotherapy clinics). It is quite possible that the effects seen in those recruited via these settings differ from those in people accessing, for example, specialist pain services.

Potential biases in the review process
Studies were primarily excluded from the review because a suitable measure of PA was not reported. This may reflect a selective reporting bias; however, it is suggested this is more likely to reflect the changing emphasis of healthcare interventions, particularly the drive towards self-management and a public health approach to managing long term conditions. Although databases were searched from inception, only two studies included in the review were published prior to 2003 [42,57].

Implications for practice
Based on the findings of this review it is not possible to conclude which characteristics of interventions are more effective. However, based on observational analysis and in line with the findings of previous reviews, integration of behavioural techniques such as 'self-monitoring of the behaviour', 'instruction on how to perform the behaviour' and 'goal setting (behaviour)' may be indicated. Higher intensity interventions, in terms of the estimated contact time with the intervention, may be more effective than less intensive interventions.
The emphasis of PA and exercise interventions in PMP has largely been directed at reducing pain and disability. However, these interventions may have little impact on the overall level of PA in which an individual engages. Targeted behaviour change interventions are likely to be required to address the risk of morbidity and mortality in this population.

Implications for research
Persistent pain, like many other non-communicable diseases, is influenced by several determinants of health such as socioeconomic status, education, employment and mental health [69]. There is a need for future studies to adopt methods to encourage and secure participation from individuals representing the broad spectrum of persistent pain patients. In particular, those accessing specialist pain services were under-represented in this review. Individuals accessing specialist pain services are often deemed to be on the more severe end of the pain spectrum and typically report much higher levels of disability and poorer health related quality of life scores [2]. We agree with previous suggestions [70] that health inequalities may actually be increased because of differences in responses to recruitment. A clear finding from this review is the need to standardise the measurement of PA in PMP populations.
To improve the quality of evidence, future studies should be sufficiently powered, collect longer term follow up data and report on cost-effectiveness. Study authors should report methods for blinding outcome assessors clearly. Providing access to supplementary data may improve the quality of coding and reporting of intervention content. Future reviews should consider incorporating meta-regression or moderator analysis to explore whether specific components or characteristics of interventions are associated with greater effectiveness.

Differences between published protocol and review
The review authors reappraised the decision to include unpublished studies and included only those that had been published.
Study authors were amended: SH was added to the review team and coded intervention content. LA was added to the review team and provided expert input on aspects relating to coding of BCTs.
The review team agreed to limit the extraction of secondary outcomes to adverse incidents relating to the intervention. There were two main reasons. Firstly, this maintained the focus and specificity of the review. Secondly, a number of systematic reviews describing many of the secondary measures (pain, disability and function) have recently been published, and it was felt that extracting these outcomes would be of little additional value to readers of the review.
ROB: The validity of the PA outcome measure was not added as an additional domain within the ROB. This data was included in the data extraction forms and is discussed in relation to outcome measures. An additional domain of sample size calculation for the primary outcome (not specifically for PA) was added to the ROB table, and a priori agreements on cut-offs for attrition bias were made during piloting of the ROB table.
The GRADE approach was adopted post-protocol to rate the quality of evidence generated within the review process.