Selection of the studies
To be included in our meta-analysis, studies had to meet the following criteria: (a) The study had to evaluate physical therapy treatments for LBP; (b) The study could include one or more treatment groups, with or without a control group, but all groups had to have pretest and posttest measures; (c) Studies could be published or unpublished; (d) Studies had to have a group design; single-case designs were excluded; (e) Studies had to report the statistical data necessary for calculating the effect sizes; (f) Years considered: no restriction on the starting date, but the study had to be published or carried out by March 2011; (g) Languages: English, Spanish, French, Italian and Portuguese; (h) Age: 6 to 18 years; (i) LBP had to be present in the whole sample or part of it; (j) Studies were excluded if the LBP of the subjects in the sample was secondary to serious spinal pathologies or deformities, or to neurological conditions that alter motor tone.
To identify the studies that met the selection criteria, the following databases were consulted: Cochrane Library, ISI Web of Knowledge, Medline, PEDro and LILACS. The search period went up to March 2011. The key words were combined as follows: [adolescent* or child* or youn* or school*] and [back pain or low back pain or back complaint* or back care] and [treatment or intervention or education or postural hygiene or posture education or back function or physiotherapy or ergonomics or physical therapy or exercise or exercise therapy or management or chiropractic or physical fitness or movement techniques or acupuncture or tens or massage or spinal manipulation or rehabilitation or back school or conservative or manual therapy or recuperation]. In the Medline search, this combination of key words was applied with the following additional settings: all years, all languages, all publication types, all citation subsets, all child (0–18 years), species (humans), all genders, all databanks, all statuses, and the field tag "topic". An example of the full electronic search strategy for Medline is provided in Additional file 1.
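The way the three keyword groups were combined can be sketched in a few lines of Python. This is only an illustration of the boolean structure described above: the function names are hypothetical, the intervention list is deliberately shortened, and the real syntax accepted by each database may differ.

```python
def or_group(terms):
    """Join the terms of one concept group with 'or', inside square brackets."""
    return "[" + " or ".join(terms) + "]"


def build_query(*groups):
    """Combine several concept groups with 'and'."""
    return " and ".join(or_group(group) for group in groups)


population = ["adolescent*", "child*", "youn*", "school*"]
condition = ["back pain", "low back pain", "back complaint*", "back care"]
# Only the first few intervention terms are listed here (shortened for brevity);
# the complete list is the one given in the text.
intervention = ["treatment", "intervention", "education", "physiotherapy", "exercise"]

query = build_query(population, condition, intervention)
print(query)
```

Keeping each concept in its own list makes it straightforward to add or remove a term without disturbing the rest of the search string.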
The electronic search identified 1,337 references, which were reviewed to determine whether they met the selection criteria. The main reasons for excluding references were that the participants in the samples were adults (about 40%), that pharmacological treatments for LBP were applied (about 35%), or other reasons (about 25%). Specialist electronic journals, conference papers and doctoral theses were also consulted. Finally, the references of the studies we identified were reviewed and research experts in the field were contacted.
The search process allowed us to select 8 articles that met the selection criteria, comprising a total of 16 groups, of which 11 were treatment groups and 5 were control groups. Additional file 2 describes the process of selection and exclusion of studies. The 16 groups formed a total sample of 334 subjects at the posttest (221 participants in the treatment groups and 113 in the control groups).
Coding of the studies
In order to analyze the heterogeneity of the study results, the characteristics that could be related to the effect sizes were coded. The moderator variables were grouped into three categories according to the recommendations of Lipsey: substantive variables (of treatment, context and subject), methodological variables and extrinsic variables.
The following treatment characteristics were coded: (a) The type of treatment (back education, exercise, manual therapy and therapeutic physical conditioning); (b) The type of back education (acquisition of knowledge, posture training habits, body awareness training); (c) The teaching mode of back education (theoretical, practical); (d) The type of exercise (stretching, strengthening, breathing, posture correction, balance exercises, functional exercises, warm-up, relaxation, coordination, stabilization); (e) The type of manual therapy (mobilization, manipulation, massage); (f) The type of therapeutic physical conditioning (walking, running, cycling, swimming); (g) The duration of the treatment (in weeks); (h) The intensity of treatment (number of hours per week of treatment per subject); (i) The treatment magnitude (total number of hours of treatment per subject); (j) The number of sessions established; (k) The homogeneity of the treatment; (l) The inclusion of a home exercise program; (m) The inclusion of a follow-up program; (n) The use of external agents (people who are not part of the therapeutic group and are not professionals, but who have an influence on the subjects and can support them in achieving their therapy goals); (o) The presence of relatives or sports coaches acting as co-therapists (who continue or carry out the treatment in other settings); (p) The mode of treatment (therapist, previously trained co-therapist, subject with therapist, unsupervised subject); (q) The type of training (group, individual or mixed); (r) The use of informed consent. With regard to the characteristics of the therapists, the following variables were coded: (s) The number of therapists; (t) Whether or not the study’s authors were also the therapists; (u) The therapist’s training (physical therapist, or other); (v) The therapist’s experience (high, medium, low or mixed).
The following subject characteristics were coded: (a) The average age of the sample (in years); (b) The gender of the sample (% of males); (c) The level of physical activity of the subjects during the study (low, moderate, regular); (d) The average duration of pain (in months); (e) Whether they have received previous treatment or not; (f) The presence or absence of other disorders. The following contextual characteristics were coded: (a) The country and (b) The place where the treatment took place (university, clinic, health centre / day care centre, hospital, school, sports centre, mixed).
Regarding the methodological characteristics, the following were included: (a) The assignment of subjects to treatment groups (random versus non-random); (b) The type of control group (active versus inactive), when there was one; (c) The longest follow-up (in months); (d) The sample size; (e) The attrition at the posttest; (f) The attrition at the follow-up; (g) The methodological quality of the study on a scale of 0 to 8 points, obtained as the sum of the scores on eight quality items (random assignment, type of control group, sample size, attrition, intent-to-treat analysis, evaluator blinding, homogeneous assessment, and inter-rater reliability), which were assessed according to the criteria of van Tulder, with some items adapted to our work. Finally, the following extrinsic characteristics were coded: (a) The year of publication of the study; (b) The training of the first author (physical therapist or other) and (c) The publication source (published versus unpublished). To ensure the maximum possible objectivity, a coding manual was created that specified the rules followed in coding each of the characteristics of the studies. The coding of certain characteristics required complex decisions. In order to test the appropriateness of these decisions, we conducted a reliability study of the coding process in which two researchers independently coded all of the studies. For the quantitative moderator variables the coding reliability was calculated using the intra-class correlation coefficient (ICC), while for the qualitative moderator variables Cohen’s kappa coefficient was applied. The mean ICC was 0.988 (range: 0.886 to 1) and the mean kappa coefficient was 0.977 (range: 0.792 to 1), which is highly satisfactory according to the criteria proposed by Orwin. The inconsistencies between the coders were resolved by consensus, and the coding manual was corrected when an inconsistency was caused by an error in it. The coding manual can be obtained from the corresponding author.
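As an illustration of the agreement index used for the qualitative moderators, the following Python sketch computes Cohen's kappa for two coders. It is a minimal example using the standard definition of kappa, not the software used in this study; the function name and the codings shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders rating the same items on a nominal scale."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    # Observed proportion of agreement.
    p_obs = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if p_exp == 1.0:
        return 1.0  # both coders used one and the same category throughout
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical codings of "assignment of subjects" for eight studies.
coder_1 = ["random", "random", "non-random", "random",
           "non-random", "random", "random", "non-random"]
coder_2 = ["random", "random", "non-random", "random",
           "random", "random", "random", "non-random"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the coders disagree on one study out of eight, so kappa falls well below the raw agreement of 87.5% because chance agreement is subtracted.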
The effect size
Given the lack of control groups in this area of research, we chose to use the group as the unit of analysis instead of the comparison between a treatment group and a control group. The standardized mean change index, d, was used as the effect size index; d is defined as the difference between the pretest and posttest means, divided by the standard deviation of the pretest. Positive d values indicated an improvement from pretest to posttest. The within-study sampling variance of the d index was calculated following Morris (2000). Calculating this sampling variance requires the correlation between the pretest and posttest scores. As the studies did not report it, a common value of 0.5 was assumed for all of them. To check whether the value of the correlation coefficient affected the meta-analytic results, a sensitivity analysis was carried out in which the sampling variances of the effect sizes were recalculated assuming r values of 0.2 and 0.8.
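The definition of d and the sensitivity analysis on r can be sketched in Python as follows. This is a minimal illustration rather than the SPSS macros used in the study: the variance uses the common large-sample approximation 2(1 - r)/n + d^2/(2n), which may differ from the exact expression in Morris (2000), and the function names, the `lower_is_better` flag and the sample numbers are hypothetical.

```python
def standardized_mean_change(mean_pre, mean_post, sd_pre, lower_is_better=True):
    """Pretest-posttest difference divided by the pretest SD.

    The sign is arranged so that a positive d means improvement; the
    lower_is_better flag is an assumption of this sketch (e.g. True for pain).
    """
    if sd_pre <= 0:
        raise ValueError("sd_pre must be positive")
    change = mean_pre - mean_post if lower_is_better else mean_post - mean_pre
    return change / sd_pre


def approx_sampling_variance(d, n, r):
    """Large-sample approximation to the variance of d (not the exact formula)."""
    if n <= 0 or not -1.0 <= r <= 1.0:
        raise ValueError("n must be positive and r must lie in [-1, 1]")
    return 2.0 * (1.0 - r) / n + d ** 2 / (2.0 * n)


# Hypothetical pain scores for one group of 20 subjects.
d = standardized_mean_change(mean_pre=6.0, mean_post=4.0, sd_pre=2.0)

# Sensitivity analysis: the same d with r = 0.2, 0.5 (default) and 0.8.
sensitivity = {r: approx_sampling_variance(d, n=20, r=r) for r in (0.2, 0.5, 0.8)}
for r, variance in sensitivity.items():
    print(f"r = {r}: d = {d:.2f}, variance = {variance:.3f}")
```

The assumed correlation changes only the variance, not d itself, so it affects the weights and confidence intervals: a higher r gives a smaller variance and therefore a larger weight.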
The d index is methodologically weaker than comparing a treatment group with a control group, since it is more prone to bias due to factors such as the mere passing of time, the effects of history or spontaneous remission. However, it is the only viable alternative if not all studies include a control group. Nevertheless, since we obtained d indices for the treatment and control groups, the difference between the two enabled us to estimate the net effects of treatment.
With the purpose of checking whether the standardized pretest-posttest difference might be offering a biased estimate of the treatment effects, we also calculated the between-groups standardized mean difference with the five studies that included a control group. In this case, the effect size in each study was calculated as the difference between the standardized mean change of the treatment and the control groups. The sampling variance of this new effect size index was the sum of the sampling variances of the treatment and control within-study effect sizes. A comparison between the mean within-group effect size and the mean between-groups effect size enabled us to examine the potential existence of an overestimation of the treatment effects with the within-group effect sizes.
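The between-groups index described in this paragraph amounts to a subtraction of effect sizes and an addition of variances. A minimal Python sketch follows; the function name and the input values are hypothetical, and the 95% confidence interval assumes approximate normality, which the text does not state.

```python
import math


def between_groups_effect(d_treat, var_treat, d_control, var_control):
    """Difference between the treatment and control standardized mean changes.

    The sampling variance of the difference is the sum of the two
    within-group sampling variances, as described in the text.
    """
    d_diff = d_treat - d_control
    var_diff = var_treat + var_control
    # 95% confidence interval under a normal approximation (an assumption here).
    half_width = 1.96 * math.sqrt(var_diff)
    return d_diff, var_diff, (d_diff - half_width, d_diff + half_width)


# Hypothetical within-group results for one controlled study.
d_diff, var_diff, ci = between_groups_effect(
    d_treat=1.0, var_treat=0.075, d_control=0.2, var_control=0.060
)
print(f"d_diff = {d_diff:.2f}, variance = {var_diff:.3f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the two variances add, the between-groups index is always less precise than either within-group index taken alone.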
The results of each study were classified according to outcome measure: (a) pain, (b) disability, (c) flexibility, (d) endurance and (e) mental health. The results were also classified according to the measurement type: self-reports and clinician assessments. In addition, an overall effect size was calculated for each group by averaging the effect sizes for the different outcome measures reported in the study. For each outcome measure and measurement type, both a within-group and a between-groups d index were calculated.
In order to check the reliability of the effect size calculations, two independent researchers carried out the calculations for all of the studies, reaching an intra-class correlation coefficient of 0.987 (range: 0.882-1), which is also highly satisfactory.
Separate meta-analyses were carried out for each combination of outcome measures and measurement types. In order to give more weight to the effect sizes with larger sample sizes, each effect size was weighted by the inverse variance.
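Inverse-variance weighting can be illustrated with the short Python sketch below. It shows only the basic weighting step, with weights that ignore between-study variance; the function name and the numbers are hypothetical.

```python
import math


def inverse_variance_mean(effects, variances):
    """Weighted mean effect size, each effect weighted by 1 / variance."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and equal in length")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / total_weight
    # Standard error of the weighted mean.
    standard_error = math.sqrt(1.0 / total_weight)
    return mean, standard_error


# Two hypothetical effect sizes; the second is more precise (smaller variance).
mean_d, se_d = inverse_variance_mean(effects=[0.2, 0.8], variances=[0.10, 0.05])
print(f"weighted mean d = {mean_d:.2f}, SE = {se_d:.3f}")
```

The more precise estimate receives twice the weight of the other, so the weighted mean lies closer to it than the simple average would.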
In each meta-analysis and assuming a random-effects model, we calculated a weighted mean effect size together with its confidence interval for the treatment groups and control groups separately. The same calculations were done with the between-groups effect sizes. Following Cohen, we interpreted the effect sizes of 0.20, 0.50, and 0.80 as representing low, medium and high effect magnitudes, respectively. The comparison between the treated and control groups was carried out by weighted ANOVA so that the Q_b test enabled us to check if there were significant differences between the average effects of the treated and control groups. To test the influence of other moderator variables, weighted ANOVAs and simple meta-regressions were used for the qualitative and continuous variables, respectively. The residual heterogeneity variance was estimated by the method of moments proposed by DerSimonian and Laird. There are other heterogeneity variance estimators proposed in the literature. In order to check whether the selection of the variance estimator can affect the meta-analytic results, the analyses were repeated by using a variance estimator based on the restricted maximum likelihood (REML) method.
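The random-effects calculations described above can be sketched in Python as follows. This is a simplified illustration, not the SPSS macros used in the study: it shows the DerSimonian and Laird moment estimator for a single set of effect sizes, a random-effects mean with a normal-approximation 95% confidence interval, and a between-groups Q statistic for the two-group case only (1 degree of freedom). All function names and numbers are hypothetical, and a full mixed-effects ANOVA would normally pool the residual variance across categories.

```python
import math


def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments estimate of the between-study variance (tau^2)."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed_mean = sum(w * d for w, d in zip(weights, effects)) / sum_w
    # Heterogeneity statistic Q around the fixed-effect mean.
    q = sum(w * (d - fixed_mean) ** 2 for w, d in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    return tau2, q


def random_effects_mean(effects, variances, tau2):
    """Weighted mean using weights 1 / (variance + tau^2), with a 95% CI."""
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / sum_w
    se = math.sqrt(1.0 / sum_w)
    return mean, se, (mean - 1.96 * se, mean + 1.96 * se)


def q_between_two_groups(mean_a, var_a, mean_b, var_b):
    """Between-groups Q for two subgroup means and its p value (df = 1)."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    grand_mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    q_b = w_a * (mean_a - grand_mean) ** 2 + w_b * (mean_b - grand_mean) ** 2
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(q_b / 2.0))
    return q_b, p_value


# Hypothetical effect sizes for three groups with equal sampling variances.
effects, variances = [0.1, 0.5, 0.9], [0.04, 0.04, 0.04]
tau2, q = dersimonian_laird_tau2(effects, variances)
mean_re, se_re, ci_re = random_effects_mean(effects, variances, tau2)
print(f"tau2 = {tau2:.3f}, Q = {q:.2f}, mean = {mean_re:.2f}, SE = {se_re:.3f}")

# Hypothetical subgroup means (treated versus control) and their variances.
q_b, p_value = q_between_two_groups(0.9, 0.02, 0.3, 0.04)
print(f"Q_b = {q_b:.2f}, p = {p_value:.4f}")
```

Replacing the moment estimator with a REML estimate of the heterogeneity variance, as in the sensitivity check described above, would change only the value of `tau2` passed to `random_effects_mean`.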
To test the differential effectiveness of the different types of treatment we applied a mixed-effects multiple meta-regression model.
Finally, we checked whether publication bias could be a source of bias in the effect size estimates of our meta-analysis.
All statistical analyses were performed using SPSS macros created by David B. Wilson and the programs REVMAN 2.0 and Comprehensive Meta-analysis 2.0. The PRISMA checklist was used to check the reporting quality of the meta-analysis (Additional file 3).