The reliability of postural balance measures in single and dual tasking in elderly fallers and non-fallers

Background: The purpose of this study was to determine the reliability of a forceplate postural balance protocol in a group of elderly fallers and non-fallers. The measurements were tested in single and dual-task conditions, with and without vision.
Methods: Thirty-seven elderly community-dwellers (mean age 73 ± 6 years) were included in this study. All were tested in a single-task (two-legged stance) and in a dual-task (two-legged stance while counting backwards aloud in steps of seven) condition, with and without vision. A forceplate was used for registering postural variables: the maximal and the root-mean-square amplitude in medio-lateral (Max-ML, RMS-ML) and antero-posterior (Max-AP, RMS-AP) direction, mean velocity (MV), and the area of the 95% confidence ellipse (AoE). Reliability of the test protocol was expressed with intraclass correlation coefficients (ICC), with 95% limits of agreement (LoA), and with the smallest detectable difference (SDD).
Results: The ICCs for inter-rater reliability and test-retest reliability of the balance variables were r = 0.70–0.89. For the variables Max-AP and RMS-AP the ICCs were r = 0.52–0.74. The SDD values for the variables Max-ML and Max-AP were between 0.37 cm and 0.83 cm, for MV between 0.48 cm/s and 1.2 cm/s, and for AoE between 1.48 cm² and 3.75 cm². The LoA analysis by Bland-Altman plots showed no systematic differences between test-retest measurements.
Conclusion: The study showed good reliability results for group assessment and no systematic errors of the measurement protocol in measuring postural balance in the elderly in a single-task and dual-task condition.

Although tests for postural control with functional balance scales are easy to perform and are suitable for daily clinical use, they often lack accuracy. Technology-based laboratory systems may give more detailed information about postural balance [8], but are often difficult to use in a clinical setting.
Quantitative posturography is a frequently used technique for measuring postural control [9]. This technique covers all force platforms used to quantify postural control in upright stance in either static or dynamic conditions. The employed force platform indirectly detects changes of postural sway by assessing the ground-reaction forces. These ground-reaction forces are used to calculate the centre of pressure (COP), which reflects the trajectory of the centre of mass and the torque acting on the surface [10]. Various balance variables can be derived from the COP movement, e.g. the root mean square (RMS) of COP amplitudes in anterior-posterior and medio-lateral direction or the maximum COP displacement in anterior-posterior and medio-lateral direction [11][12][13][14][15]. It is assumed that these measures relate to impaired postural control in humans. However, in spite of the frequent use of these measures, only a small number of studies have reported on the reliability of postural balance measures [12,13,15,16,17].
Commonly identified flaws in reliability studies are the exclusive use of healthy individuals, questionable applicability in clinical practice, low sample size, the absence of a protocol and the use of inadequate statistics [18]. It is questionable whether the test results of healthy elderly for example can be generalized to specific sub-populations, e.g. fallers, in clinical practice. Only very few studies tested the reliability of postural assessment with a force platform in patient groups. Benvenuti and colleagues (1999) assessed patients with a variety of chronic pathologic conditions resulting in balance problems; however, they did not specifically focus on fallers or non-fallers [16]. Stroke survivors and patients suffering from diabetic neuropathy were assessed by Corriveau and colleagues (2001) but these authors excluded subjects if they reported visual or somatosensory impairments or reported at least 1 fall in the past year [17]. The same exclusion of fallers was performed by Lafond et al. (2004) [15].
There seems to be a need to perform reliability assessments of postural control in groups with identified fallers and non-fallers. No reliability studies have been reported that specifically included fallers. However, since one-third of community-dwelling people over 65 years of age experience one or more falls each year, it seems important to include elderly fallers in reliability studies [18][19][20][21][22].
The applicability of test measures in clinical practice is another important point to consider. Most reliability studies used single-task procedures consisting of standing quietly while manipulating the visual input and/or changing the base of support (BOS). Mulder et al. (2002) argued that although a motor system may deteriorate across time, many assessment procedures show no changes in performance. The authors state that this phenomenon is related to the fact that the level of functional reorganization of a (changing) motor system is not necessarily reflected in the 'pure' end-result of a task, but might be reflected also in the increasing compensatory costs across time [23]. This would mean that assessment procedures that are used in clinical practice should also be sensitive to this phenomenon when we want to be able to detect possible underlying pathologies. In other words the compensatory costs, necessary to keep the motor output optimal, should be estimated in clinical protocols.
The basic idea behind the dual-task methodology is that the performance of a difficult (non-automated) task interferes with other simultaneously performed tasks [24]. Hence, by employing an attention-demanding task, it is possible to use the degree of interference of this task with the primary task (e.g. standing) as a measure of the attention demands (cognitive compensation) of the primary task. Dual-task procedures are indeed increasingly used in studies focusing on recovery after damage to the motor system [25] or in neurological assessment [26]. However, few studies report on the reliability of the protocols used. We found only one study that focused on the reliability of the postural measures and that used a simultaneous secondary task during performance of the primary (postural) task [4]. At the same time, it has been reported that falls seem to occur frequently during activities in which attention has to be divided between two tasks [27]. This observation further underscores the potential value and necessity of dual-task testing. Furthermore, because of inconsistencies in the design and analysis of method evaluation studies, a high proportion of prognostic studies have poor methodology, which has resulted in conflicting interpretations of the variability of the measures. This led Work Package 3 of the Prevention of Falls Network Europe to formulate criteria for the evaluation of measurement properties of clinical balance measures for fall prevention studies [28]. The purpose of the present study was, therefore, to determine the interrater and test-retest reliability of quantitative postural control measures in elderly fallers and non-fallers, tested under single and dual-task conditions, with and without vision, and considering both relative and absolute reliability.

Participants
Thirty-seven community dwellers participated in the study (29 women); the average age was 73 ± 6 years (range 61-85 years). Fallers and non-fallers of both genders older than 60 years of age were eligible for inclusion. Exclusion criteria were inability to understand the purpose of the study (language), severe psychological or psychiatric problems, chronic substance abuse (medication, drugs and/or alcohol), and chronic therapy with neuroleptics, sedatives, antiepileptics or antidepressants. A structured interview that considered recommendations on falls outcome measures [29] was used to assess the number of falls in the previous year. A fall was defined as any event that caused unintentional contact by the torso or upper limbs to the ground or to some lower level, other than as a consequence of a violent blow, loss of consciousness, or a sudden onset of paralysis as in stroke or epileptic seizure [30]. A faller was defined in this study as a subject who sustained more than one fall within the last 12 months. The measurements took place at the Institute of Physical Medicine (Department of Rheumatology), University Hospital Zurich. All participants gave their informed written consent and were blinded to the purpose of the measurements. The study was approved by the local ethics committee.

Experimental procedure
The AMTI Accusway system for balance and postural sway measurement (Advanced Mechanical Technology, Inc., Watertown, Massachusetts) was used for collecting the data. The Accusway system consists of a portable force platform and SWAYWIN software for data acquisition and analysis. The system measures ground reaction forces and moments in 3 orthogonal directions at a sampling frequency of 50 Hz. These provide the COP coordinates, which enable the calculation of the maximum displacement in the anterior-posterior and medial-lateral direction (Max-AP; Max-ML), the root-mean-square amplitude in anterior-posterior and medial-lateral direction from the centroid in x- and y-axis (RMS-AP; RMS-ML), the mean velocity (MV) and the area of the 95th percentile ellipse (AoE).
Before the measurements took place, the balance platform was strapped with an anti-slip plastic cover (1 mm). The participant then took a comfortable barefooted, double-legged stance on the platform. Because changes in the Base of Support (BOS) have a substantial effect on postural control [14], the outlines of both feet were marked on the plastic cover with a permanent marker in order to obtain standardised individual foot positions for the repeated measurements. After the participant left the platform, the individual's BOS was entered in the Accusway Plus system [31]. Maximal BOS width and hip width, measured at the major trochanter femoris, were recorded with an anthropometric calliper (Lafayette Instrument Company, Lafayette, IN).
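The derivation of the six sway variables from the COP coordinates can be sketched as follows. This is a minimal illustration and not the SWAYWIN implementation: the function name `cop_measures`, the mapping of the ML direction to x and the AP direction to y, the definition of the maximal amplitude as the largest absolute excursion from the centroid, and the ellipse formula (chi-square value 5.991 for 2 degrees of freedom applied to the COP covariance matrix) are all assumptions made for this sketch.

```python
import math

CHI2_95_2DF = 5.991  # assumed scaling for a 95% ellipse (chi-square, 2 d.o.f.)

def cop_measures(ml, ap, fs=50.0):
    """Summary sway measures from COP coordinates (cm) sampled at fs Hz."""
    n = len(ml)
    if n < 2 or n != len(ap):
        raise ValueError("need two equally long series with at least 2 samples")

    # Centre both axes on the centroid (mean COP position).
    mean_ml = sum(ml) / n
    mean_ap = sum(ap) / n
    x = [v - mean_ml for v in ml]
    y = [v - mean_ap for v in ap]

    # Maximal and root-mean-square amplitude per axis.
    max_ml = max(abs(v) for v in x)
    max_ap = max(abs(v) for v in y)
    rms_ml = math.sqrt(sum(v * v for v in x) / n)
    rms_ap = math.sqrt(sum(v * v for v in y) / n)

    # Mean velocity: total COP path length divided by the recording duration.
    path = sum(math.hypot(x[i + 1] - x[i], y[i + 1] - y[i]) for i in range(n - 1))
    mv = path / ((n - 1) / fs)

    # Ellipse area from the sample covariance matrix of the COP.
    sxx = sum(v * v for v in x) / (n - 1)
    syy = sum(v * v for v in y) / (n - 1)
    sxy = sum(a * b for a, b in zip(x, y)) / (n - 1)
    det = max(sxx * syy - sxy * sxy, 0.0)
    aoe = math.pi * CHI2_95_2DF * math.sqrt(det)

    return {"Max-ML": max_ml, "Max-AP": max_ap, "RMS-ML": rms_ml,
            "RMS-AP": rms_ap, "MV": mv, "AoE": aoe}
```

With a 20-second trial at 50 Hz each input series would hold 1000 samples; the returned amplitudes are in cm, MV in cm/s and AoE in cm².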

Measurement Design
The participants were tested individually within a single session that lasted about 25 minutes. First, instructions for the cognitive task were given, followed by a full performance of the cognitive tasks while seated. Thereafter, the participants were instructed to stand on the pre-marked plastic cover with the arms by the sides and eyes open while looking straight ahead.
The postural balance measurements were collected under two task conditions: standing quiet (without a secondary cognitive task) and standing quiet combined with counting backwards in steps of seven. Each task consisted of 4 trials and the average of the 4 trials was taken to obtain a reliable measure [17]. Each separate trial lasted 20 seconds, followed by a break of 20 seconds [32].
The full 20 seconds of each trial were used for the calculations. Between tasks, the participants were allowed to sit down for a 2-minute break. Both tasks were measured with and without vision. The order of tasks (single, dual, with and without vision) was randomised to control for the effects of fatigue and learning. The rationale for this procedure was primarily based on the fact that the duration of a trial in quiet standing is limited due to fatigue, particularly in elderly people with pathology [15]. Furthermore, the optimum test-retest reliability for our protocol was assumed to be obtained at 20 s trial durations [32], and we wanted a test that is feasible to implement in a clinical setting where time constraints play an important role.

Cognitive task
Counting backwards as a cognitive task has been shown to significantly degrade postural stability in healthy adults and healthy elderly [33][34][35]. Therefore, counting backwards in steps of seven was also used as the additional task in the present study. The participant was asked to count backwards as fast and as accurately as possible for 20 seconds [36,37]. If counting backwards in steps of seven was too difficult, steps of three or one were used instead. The starting number was selected at random from the range 80-99. For those participants who were able to count back to zero within 20 seconds, a starting number was selected from the range 121-199. The counting was monitored continuously for accuracy and every mistake was noted. No feedback on performance was given during the testing.
To evaluate the performance of the cognitive task, the difficulty of the subtraction (sevens, threes or ones) and the number of mistakes made by the participant during the calculation were used to define 6 performance scores (Cognitive Difficulty Score, CDS). The lowest score (1) is given when mistakes are made while counting backwards in ones. The highest score (6) is given when counting backwards in sevens is possible without making mistakes. The CDS increases with increasing numerical complexity. An overall group score (GS) was calculated by taking the mean of all individual scores (Table 1).
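The scoring scheme can be illustrated with a short sketch. The text fixes only the two end points (score 1 for mistakes when counting in ones, score 6 for error-free counting in sevens); the intermediate scores used below (2 to 5) are an assumption that follows the stated ordering by numerical complexity, and the function names are hypothetical.

```python
# Rank of the subtraction step by numerical complexity (ones < threes < sevens).
STEP_RANK = {1: 0, 3: 1, 7: 2}

def cognitive_difficulty_score(step, mistakes):
    """Return a CDS from 1 to 6 for a subtraction step and a mistake count.

    Assumed mapping: within each step size, counting with mistakes scores
    one point lower than counting without mistakes.
    """
    if step not in STEP_RANK:
        raise ValueError("step must be 1, 3 or 7")
    if mistakes < 0:
        raise ValueError("mistakes must be non-negative")
    return 2 * STEP_RANK[step] + (1 if mistakes > 0 else 2)

def group_score(scores):
    """Group score (GS): mean of the individual CDS values."""
    return sum(scores) / len(scores)

# Example: every participant counts in sevens, 20 with and 17 without mistakes.
scores = [cognitive_difficulty_score(7, 2)] * 20 + [cognitive_difficulty_score(7, 0)] * 17
print(round(group_score(scores), 1))
```

Under this assumed mapping, a group in which all 37 participants count in sevens and 20 of them make mistakes obtains a GS of 202/37, which rounds to 5.5, the value the Results report for the first rater.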

Visual conditions
The two tasks were tested under two different visual conditions: a) Normal vision: the participants were instructed to view a fixed grey cross. The horizontal arms of the cross were 1 meter long and the vertical arms were 0.5 meter long. The cross was located in the middle of a screen (1.5 m × 1.5 m), which was positioned 2 meters in front of the forceplate. The height of the grey cross was fixed at 1.5 m. All participants used their own glasses when needed, to have optimal individual visual acuity. b) Occluded vision: vision was occluded with a pair of custom-made opaque goggles that prevented the subject from perceiving visual information without blocking the light in general. The participants were instructed to keep their eyes open inside the goggles.

Reproducibility Protocol
For the test-retest study, all participants were evaluated by the first rater on 2 occasions with an inter-measurement interval of 7 days. Both measurements were performed at the same time of day in the same measurement room. Additionally, at the second measurement occasion the second rater performed a third measurement to evaluate the interrater reliability. The order of the raters was changed after each participant (Figure 1).

Statistical Analysis
Descriptive statistics were used to describe the participants' characteristics. The one-sample Kolmogorov-Smirnov test was used to check the normality of the distributions.
De Vet and colleagues (2006) recently suggested using both reliability and agreement parameters in reliability studies because this allows gaining a better insight on the performance of measuring a variable [38]. Reliability parameters assess whether a measurement device can distinguish between groups of patients and between individual patients [39]. Agreement parameters measure the ability to achieve the same value in two measurements, and thus give an indication of the size of the measurement errors [40].

Reliability parameters
The intraclass correlation coefficient (ICC) was used as a parameter of reliability. The ICC (2,1) model was selected to test the interrater reliability, and the ICC (3,1) model to estimate the test-retest reliability [41,42].
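The two single-measure ICC models named above can be computed from the mean squares of a two-way analysis of variance, following the usual Shrout and Fleiss definitions. The sketch below is illustrative only: the function names are hypothetical, the ratings are made-up numbers rather than data from this study, and the analysis in the paper itself was run in SPSS.

```python
def two_way_mean_squares(ratings):
    """Mean squares for a subjects x raters table (list of equal-length rows)."""
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters or occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                  # between-subjects mean square
    jms = ss_cols / (k - 1)                  # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return bms, jms, ems, n, k

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single measure."""
    bms, jms, ems, n, k = two_way_mean_squares(ratings)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, single measure."""
    bms, jms, ems, n, k = two_way_mean_squares(ratings)
    return (bms - ems) / (bms + (k - 1) * ems)

# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2), round(icc_3_1(example), 2))
```

ICC(2,1) counts systematic differences between raters as error, whereas ICC(3,1) does not, which is why the two values can differ markedly when raters score at different levels, as in the illustrative table.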
The 95% limits of agreement (LoA) were assessed for both the test-retest and the interrater reliability according to Bland and Altman. The LoA were calculated as the mean of the differences ± 1.96 × SD of the differences. The LoA indicate the total error, which is systematic error and random error combined. Discrepancies between measurements were also assessed visually by plotting the mean of two trials against the difference between the trials (Bland-Altman plots). The 95% limits of the range of differences between the two trials show how closely the measurements agree on different occasions. Results were considered significant at the 5% level [45].
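The limits-of-agreement calculation described above (mean of the differences ± 1.96 × SD) can be sketched as follows. The function names and the example values are hypothetical and serve only to show the arithmetic.

```python
import math

def limits_of_agreement(first, second):
    """Return (mean difference, lower limit, upper limit) for paired values."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n   # systematic error (bias)
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff

def bland_altman_points(first, second):
    """Points for a Bland-Altman plot: (mean of the pair, difference)."""
    return [((a + b) / 2.0, a - b) for a, b in zip(first, second)]

# Hypothetical Max-ML values (cm) for four participants on two occasions.
occasion_1 = [1.0, 2.0, 3.0, 4.0]
occasion_2 = [1.1, 1.9, 3.2, 3.8]
bias, lower, upper = limits_of_agreement(occasion_1, occasion_2)
```

A mean difference close to zero, with differences scattered on both sides of it, is what the absence of systematic error looks like in such a plot.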
The data were entered, stored, and analysed in SPSS 12.0.1 statistical software (SPSS, Inc., Chicago, IL).

Results
A total of 37 participants were recruited (29 women); the average age was 73 ± 6 years (range 61-85 years) and a total of 11 fallers were identified. The participants' characteristics are shown in Table 2. All participants were able to count backwards in steps of seven. A total of 20 participants made counting mistakes, whereas 17 made no mistakes. The group score (GS) for the first rater's measurements was 5.5 on both occasions. The GS for the second rater's measurement was 5.4 (maximal GS possible is 6; see Table 3). There was no significant difference in GS between the raters.

Reliability parameters
Two postural balance variables that were not normally distributed were log transformed and are marked accordingly (see Table 4 and Table 5). Our study showed good ICC values for the postural balance measurement protocol, for test-retest as well as for interrater reliability. The ICC(2,1) for interrater reproducibility and the ICC [...] Table 4 and the results of the test-retest are presented in Table 5. [...] (Table 4). To detect change in clinical practice beyond measurement error, potential changes should be larger than these SDD values.
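How an SDD is used to judge an individual change can be sketched as follows. The formula applied by the authors is not given in this section; the sketch assumes the commonly used definition SDD = 1.96 × √2 × SEM, with the standard error of measurement (SEM) estimated as the SD of the test-retest differences divided by √2. The function names and the numbers are hypothetical.

```python
import math

def sem_from_differences(diffs):
    """SEM estimated from paired differences: SD of the differences / sqrt(2)."""
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least 2 differences")
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return sd_diff / math.sqrt(2.0)

def smallest_detectable_difference(sem):
    """SDD under the assumed definition 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem

def exceeds_measurement_error(change, sdd):
    """True if an observed individual change is larger than the SDD."""
    return abs(change) > sdd

# Hypothetical test-retest differences for one balance variable (cm).
diffs = [-0.1, 0.1, -0.2, 0.2]
sdd = smallest_detectable_difference(sem_from_differences(diffs))
print(exceeds_measurement_error(0.5, sdd), exceeds_measurement_error(0.2, sdd))
```

Under this definition the SDD equals 1.96 times the SD of the differences, that is, the half-width of the 95% limits of agreement.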

Agreement parameters
The LoA showed very small systematic error for both test-retest and interrater agreement. The mean of the differences for the variables Max-ML and Max-AP was between 0.0 cm and 0.08 cm, for MV between 0.03 cm/s and 0.18 cm/s, and for AoE between 0.06 cm² and 0.51 cm² (see Table 6).
Bland-Altman plots indicated that most points lie within the 95% limits of agreement for test-retest measurements.
Only 2 to 3 outliers were found within the plots. In all tables the outliers show both positive and negative differences of the mean, which indicates no systematic effect. Balance variables had the smallest 95% limits of agreement when testing in a single task situation with vision. The opposite was found in the dual-task situation with and without vision. The Bland-Altman plots are presented in the additional file [see Additional file 1].

Discussion
[Table: Group Score (GS) by rater and measurement occasion for all participants (n = 37), non-fallers (n = 26) and fallers (n = 11), with p values.]
The purpose of this study was to evaluate the reliability of a forceplate postural balance assessment protocol under single and dual-task conditions, with and without vision, in a group of elderly fallers and non-fallers. This study showed good reliability parameters for the total group of participants, although the values were higher in the non-faller subgroup than in the fallers (see Table 4). Hence, our findings show the relevance of including symptomatic populations in a reliability study, as previously suggested by Hoving and colleagues (2005) [18]. Furthermore, the results of our study are in line with previous studies that included symptomatic populations, e.g. stroke survivors or patients suffering from diabetic neuropathy [17]. From a clinical perspective our procedure makes sense because we included symptomatic individuals in our sample. It can be expected that a normal population will, similar to our sample, consist of both fallers and non-fallers, which suggests that our results can be generalised to comparable populations in clinical settings.
The ICC values were different for each balance variable that was assessed. Between the test conditions, vision or no-vision and single or dual task, there were differences in ICC values as well (Tables 4 and 5). The results were consistently better in the medial-lateral direction compared to the moderate ICC values in the anterior-posterior direction. From a clinical perspective these results are encouraging. Day and colleagues (1993) have demonstrated that deterioration of balance control in the elderly primarily occurs in the ML direction during quiet stance [46]. When responding to a plate perturbation, older adults also frequently step specifically to preserve lateral stability [47]. These findings might be an indication that the main focus in assessment should be put on the medio-lateral force plate variables. For these variables our protocol reveals no large differences in reliability between vision and no-vision or between single and dual-task testing conditions. The most suitable variables to assess when groups of subjects are compared seem to be Max-ML, RMS-ML, and MV since [...]
These results are not in accordance with the results of Corriveau and colleagues (2001), who found better ICC values in the anterior-posterior direction than in the medial-lateral direction [17]. A possible explanation for these differences could be found in the different assessment protocols used. Our participants were expected to take a comfortable stance position and to use this individualised position repeatedly. This meant that foot position was standardised for each subject, but not across subjects. This was in contrast to Corriveau and colleagues, who asked their participants to take a pre-determined stance position of pelvis width.
It is well documented that increasing stance width produces a disproportionate reduction in the angular motion about the ankles and feet (e.g. the ankle joint mobility in the frontal plane is reduced with the feet apart [48]), which causes a large reduction in lateral body motion [46]. It is for this reason that we standardised foot positions as previously recommended [50].
The limits of agreement showed no systematic error (bias) between the two measurements of rater 1 (test-retest) or between the measurements of rater 1 and rater 2 (interrater). Our protocol, therefore, seems to be well suited for clinical applications where several clinicians are often responsible for the same kinds of measurement. The resulting SDD values were rather large. At this moment it is difficult to say whether the obtained SDD values are too large to detect clinically meaningful differences on an individual level and would, therefore, not be clinically relevant. SDD values provide information about the size of the error related to a measured value and about the amount of measurement error that should be taken into account when comparing two consecutive measurements. These SDD values therefore imply that the protocol is less reliable for assessing individual changes than for assessing group changes. This assumption should be substantiated in further research. It might very well be that the changes caused by interventions are larger, especially in clinical populations, than the SDD found in our study.
Now that our protocol has been shown to have good reliability in both fallers and non-fallers, the next step in research would be to test the validity of this protocol. For that purpose a prospective study should be performed in a group of older individuals at risk of falling. It can be argued that in such a measurement design our protocol may have predictive value for subsequent falls.

Conclusion
In conclusion, our measurement protocol showed good reliability for group assessment with no systematic errors in measuring postural balance in single-task and in dual-task conditions in a group of elderly fallers and non-fallers. These results may form a basis for further research examining, for example, the effects of physical exercise in elderly people suffering from balance impairments. The value of the test protocol for individualised assessment remains unclear and should be subject to further research.