Inclusion Criteria
Types of Studies
Randomised controlled clinical trials were included. The study must have been reported in English as translation funding was not available. Studies must have reported that one group performed aquatic exercise and the comparison group participated in land based exercise; this could have included any exercise training for strength, endurance, resistance or aerobic capacity whether gym or home-based. To allow conclusions regarding the relative effects of aquatic and land exercise, papers were only included if they provided data that enabled outcomes following aquatic and land based exercise to be tested for significant differences.
Types of participants
Participants had to be people with rheumatoid arthritis or osteoarthritis.
Types of Outcome measures
Trials must have reported function, mobility or patient satisfaction outcomes using any assessment instruments.
Exclusion Criteria
Trials in which participants performed aquatic or land based exercise in conjunction with other interventions were excluded unless the effects of aquatic compared to land based exercise could be partitioned from reported data. Participants less than 18 years of age were excluded due to the additional management implications associated with an immature musculoskeletal system. Participants who exercised as part of rehabilitation immediately following joint replacement surgery were excluded as the review focus was effectiveness for people with joints affected by arthritis.
Search Strategy
Medline, CINAHL, AMED and the Cochrane Central Register of Controlled Trials were searched from the commencement of each database to July 2010. A sensitive search was developed using the terms 'aquatic physiotherapy', 'hydrotherapy' or 'water exercise' for interventions and 'arthritis', 'osteoarthritis' or 'rheumatoid arthritis' for participants. No terms relating to the 'comparison' and 'outcome' of trials were searched to avoid excessive exclusion of trials in an area where limited research has been conducted. The full electronic search strategy is available from the first author on request.
Study Selection
Papers were initially screened and excluded based on title and abstract by two independent researchers. Full text was obtained for the remaining papers and these were assessed independently by both researchers against an inclusion and exclusion checklist. Disagreements were resolved through discussion; if this failed a third researcher was consulted.
Quality Assessment
All included trials were critically appraised using the 11 item PEDro scale [12–14], 10 items of which were scored using explicit decision rules. All trials were independently assessed by the first author. A search for included papers was then performed on PEDro and quality assessment scores were compared to those determined by two independent PEDro assessors where these were available [15]. If there was disagreement on an item's assessment, the item was assessed independently by another researcher. If no quality score was available in the PEDro database, the paper was independently assessed by both reviewers.
Item 4 (baseline comparability) was not fulfilled if there was a significant and important difference (95% confidence that SMD > 0.2) between groups at baseline for one measure of disease severity or one key outcome measure. If more than one outcome was measured by trials, only one outcome had to achieve baseline similarity to fulfil this criterion. Item 8 (key outcome measures were obtained for more than 85% of participants who were assessed at baseline) was calculated using data for each group (rather than for the pooled intervention and comparison group) when relevant data were reported.
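The per-group rule for Item 8 can be illustrated with a minimal sketch. The function names and the participant counts below are hypothetical and serve only to show how a pooled calculation can pass the 85% threshold while one group fails it.

```python
def followup_rate(n_baseline, n_followup):
    """Proportion of baseline participants with key outcome data."""
    return n_followup / n_baseline


def item8_per_group(groups, threshold=0.85):
    """Item 8 is met only if every group exceeds the threshold.

    groups: mapping of group name -> (n_baseline, n_followup).
    """
    return all(followup_rate(b, f) > threshold for b, f in groups.values())


def item8_pooled(groups, threshold=0.85):
    """Pooled version, shown only for contrast with the per-group rule."""
    total_baseline = sum(b for b, _ in groups.values())
    total_followup = sum(f for _, f in groups.values())
    return followup_rate(total_baseline, total_followup) > threshold


# Hypothetical counts: the land group retains 80%, the aquatic group 100%.
example = {"aquatic": (20, 20), "land": (20, 16)}
print(item8_per_group(example), item8_pooled(example))
```

In this hypothetical case the pooled follow-up is 90%, yet the land group alone falls below 85%, so the per-group rule would not award the item.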
Data extraction
All data extraction and calculations were performed independently by two reviewers. Both sets of data were then compared for discrepancies and these were resolved through discussion.
The following data were systematically extracted: study design details, participant characteristics and baseline demographics, affected joints, duration of arthritis, group numbers, participant age and inclusion criteria, intervention and control group conditions including pool temperature, group size, supervision of the exercise intervention, provision of a home exercise program, compliance of participants, number of drop outs, length of interventions, duration and number of sessions, features and components of aquatic and land-based exercise including the provision of warm-up, stretching, cool down, balance, strengthening and functional exercises.
Data assessing function, pooled indices and mobility outcomes were also extracted. The World Health Organization defined six domains for the assessment of health [16]. These include pain, self care, usual activities, cognition, mobility and affect. Domains considered relevant to function were usual activities and self care. Outcomes that encompassed multiple domains were classified as pooled indices. Mobility was assessed through the extraction of data on walking ability and dynamic balance. If trials reported data collected under a range of walking speeds, data for the fast pace were extracted. To assess patient perception of the program, any outcome that assessed patient enjoyment, satisfaction or any other type of feedback on the exercise programs was extracted.
To compare effectiveness of interventions for each relevant outcome, point measures and measures of variability were extracted. Means and standard deviations of outcomes measured immediately following the intervention were extracted and analysed. When necessary, the standard deviation [sd] was approximated by dividing the inter-quartile range by 1.35. Medians were used as best estimates of means. Standard error [SE] was converted to sd using the formula SE = sd/(√n). These data points were then used to calculate Hedges' corrected standardised mean differences [SMD] [17] and 95% confidence intervals [CI] to assess intervention effects. The SMD was the difference between two means normalised using either pooled or control group standard deviations (the former where no significant difference in control and intervention standard deviations was observed). This index is useful for comparing data collected using different scales. An SMD <0.2 was considered a small effect, 0.5 (>0.2, <0.8) a moderate effect and >0.8 a large effect [18]. Data at baseline and immediately following the intervention were extracted. Long term effectiveness of interventions was not assessed as it was beyond the scope of the review. If trials did not use Intention-To-Treat analysis [ITT], per-protocol data were extracted for analysis.
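The conversions and effect size calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the review's actual computation: the function names are hypothetical, the pooled standard deviation is used as the normaliser, and the standard error of the SMD uses a common large-sample approximation that may differ slightly from the formula implemented in the software used for the review.

```python
import math


def sd_from_iqr(q1, q3):
    """Approximate the sd as the inter-quartile range divided by 1.35."""
    return (q3 - q1) / 1.35


def sd_from_se(se, n):
    """Rearrange SE = sd / sqrt(n) to recover the sd."""
    return se * math.sqrt(n)


def hedges_smd(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Hedges' corrected SMD with an approximate 95% CI.

    Assumes the pooled sd as normaliser and a large-sample
    approximation for the standard error of the SMD.
    """
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    d = (mean_a - mean_b) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = correction * d
    se_g = math.sqrt((n_a + n_b) / (n_a * n_b) + g ** 2 / (2 * (n_a + n_b)))
    return g, g - 1.96 * se_g, g + 1.96 * se_g


# Hypothetical post-intervention data for two groups of 20 participants.
smd, ci_low, ci_high = hedges_smd(10.0, 2.0, 20, 8.0, 2.0, 20)
print(round(smd, 2), round(ci_low, 2), round(ci_high, 2))
```

With equal standard deviations in both groups, the pooled sd equals the group sd, so the uncorrected difference here is one standard deviation and the correction factor shrinks it slightly.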
Meta-analysis
Pooling of data across multiple studies can provide an improved estimate of the effect of the intervention as a consequence of the larger number of total participants and reduction in random error due to sampling differences.
Meta-analysis was performed using Review Manager (RevMan5) software [19]. Heterogeneity between trials was assessed using the I² statistic. Heterogeneity was considered substantial if I² was greater than 50%, in which case a random effects model was applied; otherwise a fixed effects model was used for the analysis [20]. SMDs were used where different scales were used to measure comparable outcomes across trials. Scale directions were aligned by reversing the sign of values where required.
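The model selection rule can be illustrated with a short sketch. It assumes the usual definition of I² from Cochran's Q with inverse-variance weights; the function names and the trial values are hypothetical, and the code is not a reproduction of the RevMan implementation.

```python
def i_squared(effects, variances):
    """I-squared (in percent) from Cochran's Q, using inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def choose_model(i2_percent):
    """Random effects if I-squared exceeds 50%, otherwise fixed effects."""
    return "random" if i2_percent > 50 else "fixed"


# Hypothetical SMDs and variances from three trials.
smds = [0.10, 0.25, 0.15]
smd_variances = [0.04, 0.05, 0.06]
i2 = i_squared(smds, smd_variances)
print(round(i2, 1), choose_model(i2))
```

When the observed variation between trials is no larger than expected from sampling error, Q falls below its degrees of freedom and I² is truncated at 0%, which leads to the fixed effects model under this rule.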