Test–retest reliability for performance-based outcome measures among individuals with arthrogryposis multiplex congenita

Background: Most individuals with arthrogryposis multiplex congenita, a rare condition characterized by joint contractures in ≥ 2 body regions, have foot and ankle involvement leading to compromised gait and balance. The purpose of this study was to establish between-days, test–retest reliability for performance-based outcome measures evaluating gait and balance, i.e., the 10-m Walk Test, Figure-of-8 Walk Test, 360-degree Turn Test, and modified Four Square Step Test, among adolescents and adults with arthrogryposis multiplex congenita.
Methods: This reliability study included ambulatory participants, aged 10 to 50 years, with a medical diagnosis of arthrogryposis multiplex congenita. Participants completed performance-based measures, in a randomized order, on two separate occasions. Intraclass correlation coefficients with 95% confidence intervals and minimal detectable changes at the 90% and 95% confidence levels were calculated.
Results: Participants included 38 community-ambulators with a median of 13 out of 14 upper and lower joint regions affected. Intraclass correlation coefficient point estimates and 95% confidence intervals ranged from 0.85-0.97 and 0.70-0.98, respectively. Minimal detectable changes were 10 to 39% of sample means and were largest for the modified Four Square Step Test.
Conclusions: Among individuals with arthrogryposis, gait speed per the 10-m Walk Test, as well as non-linear walking and dynamic balance assessment per the Figure-of-8 Walk and 360-degree Turn Tests, have adequate test–retest reliability enabling evaluation of individual patient changes. Changes in groups of ambulatory individuals with arthrogryposis multiplex congenita may be reliably evaluated with all of the studied outcome measures.

Sions et al. BMC Musculoskeletal Disorders (2022) 23:121
[...] to AMC-provider specialists as they age out of pediatric healthcare facilities [10]. Longitudinal monitoring requires quick and reliable outcome measures with known minimal detectable changes (MDCs) that allow assessment of whether 'true change' exceeding measurement error has occurred [11].
Performance-based outcome measures evaluate an individual's capacity under a given set of conditions, complementing patient-reported outcomes, which evaluate an individual's perceived ability [12]. While self-report measures are discriminative when evaluating if individuals are able or unable to perform a given task, performance-based measures may further stratify levels of physical functioning [12]. Assuming standardized testing procedures, an additional inherent benefit to timed performance-based measures is objectivity. Objectivity may be critical when justifying care, including surgical procedures, orthotics, assistive technology, and rehabilitation, particularly when there is a potential conflict of interest for the examiner and/or patient.
Whereas performance-based functional outcomes, such as the 10-m Walk Test (10mWT), are staples in adult rehabilitation, use in pediatric orthopedics is less common, especially among patients with congenital conditions, including AMC [7]. If non-condition-specific performance-based outcome measures used in other patient populations are established as psychometrically sound among children and adults with AMC, practitioners may be able to easily track function from adolescence through adulthood, without the need for novel measures, overcoming a primary barrier to adoption of new outcome measures: lack of practitioner familiarity [13]. Further, use of outcome measures that are not condition-specific allows for comparison to normative data, as well as comparison across patient samples; such comparisons can assist with prioritization of patient subgroups most in need of limited resources due to greater risk for poor outcomes.
Among adolescents and young adults with AMC, lower-extremity involvement, especially of the foot and ankle, occurs in 80-90% of individuals [14]. Consequently, gait and balance impairments are ubiquitous. In community-dwelling older adults, reduced gait speed is predictive of adverse health outcomes, such as disability, institutionalization, and mortality [15]. Among pediatric populations, reduced gait speed is associated with worse disease severity and greater disability, [16,17] and in some populations like Hereditary Motor Sensory Neuropathy, gait speed is a predictor of subsequent functional decline [18]. Given increased challenges to gait stability and symmetry with curved path walking, the use of straight-path walking alone is discouraged [19]. Thus, the purpose of this study was to establish between-days, test-retest reliability and minimal detectable changes for four outcome measures of gait and balance among adolescents and adults with AMC. The four measures were the 10mWT, which evaluates straight-path walking, and the Figure-of-8 Walk Test (F8WT), 360-degree Turn Test (360TT), and modified Four Square Step Test (mFSST), which evaluate non-linear walking.

Study design and participants
This test-retest reliability study recruited participants, aged 10 to 50 years, with a medical diagnosis of AMC who were ambulatory and able to walk at least 30 m with or without an assistive device. Recruitment occurred verbally at community events and through print advertisements from April to July of 2019. Exclusion criteria included cognitive impairment precluding assent/consent, spine or lower-limb surgery in the past 6 months, a history of lower-limb amputation, current dizziness, an acute illness, or a progressive neuromuscular disease. The study was conceptualized in November of 2018, approved by the University of Delaware Institutional Review Board for Human Subjects Research (project number: 1354682; initial approval date: 1/16/2019), and conformed to the World Medical Association's Helsinki Declaration; written informed consent/assent and parental permission (as applicable based on age) were obtained for all participants. See Fig. 1 for the study timeline.

Data collection
Data collections occurred at a University of Delaware clinical research laboratory in Newark, Delaware, and at a Hilton hotel conference space in Norfolk, Virginia. After informed consent, trained research staff conducted standardized interviews for participant characterization; parental input was encouraged for medical history recall among adolescent participants. Participants independently completed the Gillette Functional Assessment Questionnaire to characterize their mobility status, where scores range from 1 = "unable to take steps" to 10 = "walks, runs, climbs on level and uneven terrain without difficulty or assistance" [20]. For further participant characterization, average pain intensity rating over the past 7 days was obtained with the Patient-Reported Outcomes Measurement Information System, where 0 = "no pain" and 10 = "worst imaginable pain" [21]. Participants then completed performance-based outcome measures in a randomized order, and returned 1-10 days later for repeat performance-based testing.

Performance-based outcome measures

10mWT
'Usual' gait speed was obtained over the middle 6 m of a 10-m course, allowing 2 m for acceleration and deceleration at either end (Fig. 2) [22]. For some children, adolescent, and adult populations, such as those with neurological conditions and hip dysplasia, between-days test-retest reliability for the 10mWT has been reported [23,24]. The 10mWT, using ≥ 2 trials, is included in a core set of rehabilitation outcome measures recommended for adults with neurological conditions based on its established psychometric properties and clinical utility [22]. Gait speed was determined from a three-trial average based on prior test-retest reliability research [25].
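The gait speed calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' scoring procedure: it assumes each trial is timed over the middle 6 m only and that the per-trial speeds (rather than the trial times) are averaged, which the text does not specify. The function name and the example times are hypothetical.

```python
TIMED_DISTANCE_M = 6.0  # middle 6 m of the 10-m course, per the text


def gait_speed(trial_times_s):
    """Mean gait speed (m/s) across trials timed over the middle 6 m.

    Assumption: speeds are computed per trial and then averaged; the
    source only states that a three-trial average was used.
    """
    if not trial_times_s:
        raise ValueError("at least one trial time is required")
    if any(t <= 0 for t in trial_times_s):
        raise ValueError("trial times must be positive")
    speeds = [TIMED_DISTANCE_M / t for t in trial_times_s]
    return sum(speeds) / len(speeds)


# Hypothetical three-trial example (times in seconds)
example_speed = gait_speed([6.0, 5.9, 5.8])
```

Averaging the times first and then dividing would give a slightly different value when trial times differ, so whichever convention is chosen should be applied consistently across sessions.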

F8WT
The F8WT, which times curved-path walking around two cones arranged in a figure-of-8 (Fig. 2), is a valid measure of walking skill, and provides complementary information to gait speed [26]. The F8WT can differentiate between adults with lower-limb pathology with varying functional mobility levels [27]. Among adults post-stroke and post-total knee arthroplasty, test-retest reliability for time and number of steps, where a greater number of steps indicates poorer performance, has been reported [26,28-30].

360TT
During community ambulation, up to 50% of steps may include turning [31], and impaired turning has been associated with recurrent falls [32]. Rehabilitation balance batteries, including the Pediatric Balance Scale, Berg Balance Scale, and Tinetti Performance-Oriented Mobility Assessment, have a 360-degree turn task, but the administration time required to complete such batteries may reduce clinical adoption [33]. Therefore, in this study, four 360-degree turns (two to the left and two to the right [34]; Fig. 2) were completed, and the average time for completion and number of steps were determined. More steps to complete the turn indicates poorer performance. Test-retest reliability for the 360TT, in isolation, has been reported in adult patient populations, such as those with Multiple Sclerosis, Parkinson's disease, and post-stroke [34-37].

mFSST
In various pediatric and adult populations, Four Square Step Test (FSST) test-retest reliability has been reported, as has concurrent, construct, and predictive validity for falls among adults [23,38-40]. With the FSST, individuals complete multi-directional stepping (i.e., forwards, lateral, and backwards) over canes arranged in a '+' in a specified sequence [38]. Requirement of a specified sequence for a valid trial increases cognitive load, but the requirement for foot clearance alongside correct sequencing can result in 'invalid' trials, contributing to the FSST's known floor effect [38]. Thus, the mFSST, which substitutes taped lines for canes, was used (Fig. 2) [27,39]. Aligning with prior research [39], reliability for 'average' and 'best' performance over two trials was determined.
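To make the distinction between 'average' and 'best' mFSST performance concrete, the sketch below scores two trials. It is an illustration under stated assumptions: an invalid trial is represented as `None`, the 'best' score is the fastest valid time, and the 'average' is left undefined whenever any trial is invalid. The text does not describe how invalid trials were handled in scoring, and the function name is hypothetical.

```python
def mfsst_scores(trial_times_s):
    """Return (average, best) mFSST times in seconds.

    trial_times_s: list of trial times; use None for an invalid trial.
    Assumption: 'average' requires every trial to be valid, while
    'best' needs only one valid trial (lower time = better).
    """
    valid = [t for t in trial_times_s if t is not None]
    best = min(valid) if valid else None
    all_valid = bool(valid) and len(valid) == len(trial_times_s)
    average = sum(valid) / len(valid) if all_valid else None
    return average, best


# Hypothetical examples
both_valid = mfsst_scores([11.0, 10.0])   # average and best available
one_invalid = mfsst_scores([None, 12.0])  # only best available
```

Under this convention, a participant with one invalid trial still contributes a 'best' score but not an 'average' score.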

Data Analyses
IBM SPSS Statistics 26 (Armonk, NY, USA) was used for all analyses. Descriptive statistics were determined. The Wilcoxon signed-rank test was used to evaluate intraindividual differences in pain intensity between testing sessions (p ≤ 0.050). Intraclass correlation coefficients (ICCs) were used to evaluate between-days, test-retest reliability for performance-based outcome measures using two-way mixed effects, absolute agreement (ICC 3,1 or 3,k models) [41,42]. Given the nonparametric distribution of performance-based data, Bland-Altman plots were evaluated to ensure ICC analyses were appropriate. ICCs > 0.90 may be considered excellent [41] and desirable for evaluating individual patient changes [43], while ICCs > 0.70 may be adequate for evaluating group changes [44,45]. Standard errors of measurement (SEMs) were calculated. The decision to use the MDC at the 90% (MDC90) or 95% confidence level (MDC95) may depend on the intervention. For example, meeting or exceeding MDC90 may be appropriate for evaluating success of conservative interventions, such as rehabilitation, but when evaluating success of surgical interventions with greater inherent risks, MDC95 may be more appropriate [46]; thus, both MDC90 and MDC95 were determined.
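As a rough illustration of the reliability statistics named above, the sketch below computes a single-measure, absolute-agreement ICC from a participants-by-sessions table using two-way ANOVA mean squares, then derives SEM and MDC values. It assumes the conventional formulas SEM = SD × sqrt(1 − ICC) and MDC = z × sqrt(2) × SEM (z = 1.645 for MDC90, 1.96 for MDC95). The text does not state which formulas or which SD were used in SPSS, so this is not the authors' procedure; the function names and example numbers are hypothetical.

```python
import math


def icc_absolute_single(scores):
    """Single-measure, absolute-agreement ICC from two-way ANOVA.

    scores: one row per participant, one column per session.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)               # between participants
    ms_cols = ss_cols / (k - 1)               # between sessions
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def sem(sd, icc):
    """Standard error of measurement (assumed conventional formula)."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sem_value, z):
    """Minimal detectable change for a given z value."""
    return z * math.sqrt(2.0) * sem_value


# Hypothetical gait-speed example: SD = 0.20 m/s, ICC = 0.91
example_sem = sem(0.20, 0.91)
example_mdc90 = mdc(example_sem, 1.645)
example_mdc95 = mdc(example_sem, 1.96)
```

Because the ICC here reflects absolute agreement, a uniform shift between sessions lowers it even when participants keep the same rank order. An observed change that meets or exceeds the chosen MDC would be read as exceeding measurement error.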

Results
Of the 63 interested individuals with AMC screened for study participation, 12 were ineligible due to an inability to walk at least 30 m. Fifty-one enrolled participants completed the first onsite data collection, which was part of a larger cross-sectional research project. Of these 51 participants, 38 were available and agreed to participate in an optional second onsite data collection within 1-10 days. Thus, 38 participants were included in this between-days, test-retest reliability study, exceeding the recommended sample size of at least 30 participants for reliability studies in rehabilitation [47]. The sample was largely female and Caucasian, and most participants reported no known genetic cause for their AMC (Table 1). Nearly 50% of the sample had spinal involvement; the median number of affected upper-limb regions was 8 out of 8, and the median number of affected lower-limb regions was 5 out of 6. Home assistive device use was rare, but 29% of the sample reported community assistive device use. The median number of lower-limb orthopedic surgeries was 4, with spinal surgeries and upper-limb surgeries reported much less frequently. Most participants were ambulatory outside the home for community distances (i.e., GFAQ ≥ 8), and had mild-to-moderate pain over the course of the last 7 days. Pain was not significantly different between testing sessions 1 and 2 (p = 0.368), with approximately two-thirds of the sample reporting 0/10 pain at the time of performance-based testing.
Bland-Altman plots for performance-based measures indicated ICC analyses were appropriate; however, differences between timepoints were significantly different from 0 for F8WT, average mFSST, and best mFSST scores (t = 2.061-3.783, p = 0.001-0.047). It was not unexpected for a practice effect to occur with repeated trials of novel tasks; while the absolute F8WT and mFSST scores improved between timepoints, individuals' relative standing did not change (all r > 0.91). Therefore, ICC analyses were deemed appropriate for evaluating reliability in this subset.
[...] for average number of steps during the 360TT and time to complete the mFSST, regardless of 'average' or 'best' performance. Lower MDCs for average mFSST performance support using the average of 2 trials, when available, but 6 participants, i.e., 16% of the sample, had at least 1 invalid trial, suggesting averaging might not be possible in all ambulatory individuals with AMC.
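The between-session agreement checks mentioned in this section (mean difference, limits of agreement, and whether the mean difference differs from 0) can be sketched as follows. This is an illustration only: it assumes a paired t statistic on session 2 minus session 1 differences and conventional 1.96 × SD limits of agreement, neither of which is spelled out in the text. The function name and data are hypothetical, and no p-value is computed.

```python
import math
import statistics


def bland_altman(session1, session2, z=1.96):
    """Bias, limits of agreement, and paired t statistic.

    Differences are taken as session2 - session1, so a negative bias
    on a timed test means faster performance at retest.
    """
    if len(session1) != len(session2) or len(session1) < 2:
        raise ValueError("need two equal-length lists with >= 2 values")
    diffs = [b - a for a, b in zip(session1, session2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    n = len(diffs)
    t_stat = bias / (sd / math.sqrt(n)) if sd > 0 else math.nan
    return {
        "bias": bias,
        "lower": bias - z * sd,
        "upper": bias + z * sd,
        "t": t_stat,
    }


# Hypothetical timed scores (seconds) for four participants
result = bland_altman([10.0, 12.0, 14.0, 16.0], [9.0, 11.0, 14.0, 14.0])
```

A nonzero bias with preserved rank order is the pattern one would expect from a practice effect: scores shift together while relative standing is unchanged.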

Discussion
Outcome measures enable objective evaluation of functional changes and can inform clinical decisions, predict future ability, and fulfill healthcare documentation requirements [48]. Kennedy and colleagues called for core gait and functional ambulation outcome measures for use in pediatric clinical and research settings [16]. This study is a first step towards a possible core set of performance-based functional outcome measures for use in adolescents and adults born with AMC. Good-to-excellent between-days, test-retest reliability was found for the 10mWT, F8WT, 360TT, and mFSST, which are currently used in other patient populations. ICCs ≥ 0.90 suggest gait speed per the 10mWT, as well as evaluation of curved path walking and dynamic balance per the timed F8WT and 360TT, may be reliably evaluated among individuals with AMC on two separate occasions. Provided MDCs may enable clinicians to determine whether changes surpass measurement error and indicate 'true change' in their patients with AMC, i.e., pre-to-post rehabilitation (using MDC90) and pre-to-post surgery (using MDC95). Additionally, based on ICCs ≥ 0.70, evaluation of quality of movement, i.e., number of steps, as well as dynamic balance per the mFSST may have clinical trials utility in evaluating changes among groups of ambulatory individuals with AMC, although a potential floor effect should be considered when using the average of 2 mFSST trials, due to 'invalid' trials.
While our study did not include a control group, it appears our participants with AMC present with worse gait speed and dynamic balance when compared to controls and peers with other lower-extremity pathologies. Among typically-developing children and young adults, 'self-selected' gait speeds for 11-30 year olds are, on average, 1.28-1.36 m/sec [49], which is significantly faster than speeds obtained in our participants with AMC (i.e., 1.01-1.02 m/sec). Further, Scott et al. found gait speeds of 1.2 ± 0.2 m/sec among adolescents and young adults with hip dysplasia (n = 24), suggesting gait speeds with AMC, where multiple lower-limb regions are typically involved, are worse [23]. Collectively, results highlight the importance of evaluating and addressing reduced gait speed among individuals with AMC, particularly since 'self-selected' gait speed is better correlated to perceived gait quality when compared to other performance-based measures, like the 6-Minute Walk Test, in young adults with congenital, mobility-limiting conditions [50]. Scott et al. also reported adolescents and young adults with hip dysplasia (n = 24) had FSST times of 6.6 ± 2.5 s, as compared to controls (n = 21; 4.0 ± 0.7 s) [23]. Individuals with AMC in our study had mFSST times that were double that of individuals with single-joint involvement and triple that of controls. Hence, with AMC, dynamic balance appears considerably compromised.
Mean 360TT times among young healthy adults (n = 34) have been reported to be 2.2 s [35], which is about 20% faster than timed 360TT among our participants with AMC. Among children and young adults who are typically developing, peak turn velocity during 180-degree turns is, on average, 221-289 degrees/sec [49]. Among our participants with AMC, as evaluated with a 'quick' 360-degree turn, mean turn velocity was about 129 degrees/sec. Combined, data suggest impaired turning with AMC. As community ambulation requires frequent turning [31], and impaired turning has been associated with recurrent falls [32], it may be imperative to incorporate turning into gait training among individuals with AMC.
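The relationship between turn time and mean turn velocity used in this comparison is simple arithmetic, sketched below. The calculation assumes mean velocity is the turn angle divided by the turn time; the text does not state whether the reported 129 degrees/sec was derived this way, so the implied turn time is an approximation. Function names are hypothetical.

```python
def mean_turn_velocity(turn_time_s, angle_deg=360.0):
    """Mean turn velocity in degrees/sec for a turn of angle_deg."""
    if turn_time_s <= 0:
        raise ValueError("turn time must be positive")
    return angle_deg / turn_time_s


def implied_turn_time(velocity_deg_per_s, angle_deg=360.0):
    """Turn time in seconds implied by a mean turn velocity."""
    if velocity_deg_per_s <= 0:
        raise ValueError("velocity must be positive")
    return angle_deg / velocity_deg_per_s


# Values taken from the text: about 129 degrees/sec with AMC and a
# reported 2.2 s mean 360TT time in young healthy adults.
amc_time_s = implied_turn_time(129.0)
healthy_time_s = 2.2
relative_difference = (amc_time_s - healthy_time_s) / amc_time_s
```

Under this assumption, 129 degrees/sec corresponds to a turn time of roughly 2.8 s, and the 2.2 s reference time is about one-fifth shorter, in line with the "about 20% faster" comparison above.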
Our participants with AMC had better FSST performance (median: 10.52-12.01 s) than children, aged 5-12 years, with Down syndrome and cerebral palsy (mean: 18.7 ± 5.7 s) [40], which might be due to use of the mFSST without canes in our study and/or impaired cognition or inattention in the aforementioned pediatric study. Conversely, our participants had worse dynamic balance performance as compared to adults with unilateral lower-limb amputation [51] despite being younger and our use of the mFSST [39]; differences might be attributed to multi-region, lower-limb involvement with AMC.

Study strengths and limitations
Our gait speed assessments were scientifically robust as we used a static starting position for all trials, allowed 2.0 m for acceleration, and provided standardized examiner instructions ('usual pace'); failure to control for any of these factors may negatively impact test-retest reliability [53]. Nevertheless, we could not establish the minimal clinically important difference (MCID) for gait speed, or the other performance-based measures, given the study design. Based on a systematic review of gait speed among individuals with pathology, however, the MCID is likely around 0.1 m/s [54], which is similar to the MDCs calculated in our study.
Study strengths include recruitment of a mixed sample of adolescents and adults with AMC, as well as selection of clinically-feasible outcome measures, which may enhance clinician adoption [55]. However, we acknowledge some additional limitations. First, without access to medical records, we could not confirm self-reported data, including whether participants had amyoplasia- or distal-type arthrogryposis, as defined by Hall et al. [5]. The extent of limb involvement, however, would suggest the majority of our participants might be classified within the amyoplasia-type subgroup. Second, we standardized testing based on current practice, where individuals complete 1-3 recorded trials; we did not evaluate for practice effects (by completing trials until fatigue), which might have resulted in underestimation of performance. Third, we did not specifically target a care-seeking sample, who might have had worse mobility status or greater between-days pain fluctuations, which might have increased floor effects for some measures or negatively influenced test-retest reliability. Finally, performance in a laboratory, or clinical setting, may not reflect real-world performance; for example, gait speed among minors with developmental disorders has been reported to be slower in the real-world when compared to laboratory-obtained gait speed [56].

Conclusions
This study supports subsequent research evaluating linear and curved-path walking, as well as dynamic balance, via the 10mWT, F8WT, 360TT, and mFSST, among individuals with AMC. Future studies with larger sample sizes may seek to establish MCIDs, evaluate floor and ceiling effects among a more diverse sample in terms of mobility and musculoskeletal pain, and determine responsiveness of these performance-based measures to commonly employed interventions, such as bracing, rehabilitation, and surgery, in this patient population. In the interim, practitioners may adopt the 10mWT, F8WT, and 360TT and use provided MDCs when evaluating individuals with AMC for objectively determining intervention effectiveness.