This study has provided information concerning the measurement properties of four performance measures used to complement information concerning lower extremity functional status in patients with advanced OA undergoing THA or TKA. The test-retest reliability component of this study was conducted over a median interval of 178 days, which is a longer period than would typically be chosen to assess stability. This extended reassessment interval was chosen to accommodate the fact that random measurement error is often time dependent and that, in practice, the period between clinical visits is often greater than several months. A potential concern when applying a reassessment interval of this duration is that true change in the sample will occur; however, in this study the LEFS MDC90 was applied to further define a stable patient sample. The reliability coefficients (Table 2) for the time and distance components of the tests met or exceeded 0.90, with the exception of the TUG. These coefficients are believed to represent conservative estimates of the reliability likely to be associated with most clinical reassessment intervals.
It is important to remember that the reliability of a measure intended for individual patient application must be greater than the reliability of a measure designed for group use. Different authors have advocated different standards for individual patient use: Nunnally recommended 0.95, Kelley 0.94, and Weiner and Stewart 0.85. Although the reliability of the TUG at 0.75 would meet the standards for group application, it would not meet any of the aforementioned standards for individual patient use. The SPWT, ST and 6MWT would each meet at least one of these standards.
The preoperative mean and quartile scores of the performance measures (Table 3) indicate higher function than that reported in other studies [14, 16, 17], including our own prior work, which examined a large dataset of over 1800 patients. One potential explanation for these findings is the age of our sample: 25% of the patients were 57 or younger. As noted in the Canadian Joint Replacement Registry, the number of THAs and TKAs in the 45–54 year age group increased between 1994/1995 and 1999/2000. A second factor potentially accounting for the preoperative scores is the nature of the study. Individuals who could not complete all the performance measures preoperatively were not included, thereby filtering out the individuals with the highest disability.
To be useful in clinical practice, the scores obtained on outcome measures must have meaning to clinicians. In this study, the SEM was used to identify the error associated with a patient's reported score and to estimate the value of MDC90. Because the SEM is reported in the same units as the measure itself, it enhances the interpretability of a patient's score and change score. To the authors' knowledge, this is the first study to provide estimates of MDC90 for each of the four physical performance measures in the hip/knee end stage OA-arthroplasty population. These benchmarks will help clinicians monitor change in these types of patients more effectively.
Using a different methodology, Redelmeier et al. determined the smallest difference in the 6MWT associated with a noticeable difference in perceived walking ability for COPD patients to be a distance of 54 meters. Using this value as a benchmark in arthroplasty patients would underestimate the distance required to be confident that a change had truly occurred. This illustrates the importance of population specificity when determining MDC90.
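To illustrate how the SEM and MDC90 relate to a reliability coefficient, the following is a minimal sketch based on the formulas commonly used for these quantities: SEM = SD × √(1 − r) and MDC90 = SEM × 1.645 × √2. It is an assumption that these match the computations behind the study's tables. The function names and the example numbers are hypothetical and are not study data.

```python
import math

Z_90 = 1.645  # z-value for a two-sided 90% confidence level


def sem_from_reliability(sd: float, reliability: float) -> float:
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def mdc90(sem: float) -> float:
    """Minimal detectable change at the 90% confidence level.

    The sqrt(2) factor accounts for measurement error in both the
    baseline and the follow-up score.
    """
    return sem * Z_90 * math.sqrt(2.0)


def exceeds_mdc(baseline: float, followup: float, mdc: float) -> bool:
    """True if the absolute change is at least as large as the MDC."""
    return abs(followup - baseline) >= mdc


# Hypothetical example: a walk test with SD = 100 m and reliability = 0.91
example_sem = sem_from_reliability(100.0, 0.91)
example_mdc = mdc90(example_sem)
print(f"SEM = {example_sem:.1f} m, MDC90 = {example_mdc:.1f} m")
```

With these illustrative inputs, a patient whose distance changed by less than the computed MDC90 could not be confidently distinguished from a stable patient, whether the change was an improvement or a deterioration.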
Many studies assessing change have focused on improvement only; the current investigation assessed both deterioration and improvement [14, 21, 38, 39]. Based on prior work, it was hypothesized that surgical intervention would induce a reduction in lower extremity functional status when assessed within 16 days of surgery. All time/distance performance measures demonstrated deterioration over this interval. Subsequently, all of the measures demonstrated significant improvements between the first and second postoperative visits. These findings suggest that the four performance measures are capable of detecting both types of change. The greatest changes were associated with the ST and 6MWT. Examination of the SRMs for these two tests demonstrated similar responsiveness over the studied time intervals.
This parallels the findings of the study by Parent et al. examining early recovery after TKA using locomotor tests, including gait speed, stair ascent cycle duration, and the 6MWT. Of these measures, the authors found the 6MWT to be the most responsive over the study's three time points, ranging from preoperatively to 4 months postoperatively. Of interest, the stair ascent cycle duration, measured using a 2-dimensional biomechanical analysis system, was the least responsive, and the authors recommended evaluating the responsiveness of a timed stair measure, which has been accomplished in this study.
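For readers less familiar with the SRM referred to above, the following minimal sketch uses its most common definition: the mean change score divided by the standard deviation of the change scores. It is an assumption that the study's SRMs were computed in exactly this way. The function name and the scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, followup):
    """SRM = mean of the change scores / sample SD of the change scores."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and followup must be paired")
    if len(baseline) < 2:
        raise ValueError("at least two paired scores are required")
    changes = [after - before for before, after in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


# Hypothetical timed-test scores in seconds for three patients.
# Times increase at follow-up, so a positive SRM here reflects deterioration.
before_surgery = [10.0, 12.0, 14.0]
early_postop = [12.0, 15.0, 18.0]
print(standardized_response_mean(before_surgery, early_postop))
```

Because lower times indicate better performance on timed tests while greater distances indicate better performance on the 6MWT, the sign of the SRM must be interpreted according to the direction of each measure's scale.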
In addition to providing information concerning the psychometric properties of the performance measures, our results also offer insights into the clinical application of these measures. The TUG was originally developed as a simple means of evaluating the risk of falls based on balance and basic functional mobility. When tested in the frail elderly population, scores under 10 seconds were associated with individuals who were functionally independent. Considering this benchmark and the normative values reported for community dwelling elders, the mean TUG score of the patients in this sample did not indicate much disability. Consequently, there was less opportunity for detecting change. However, the usefulness of the TUG in an elderly orthopaedic population, including patients post THA and TKA, has been reported.
In considering the SPWT and the 6MWT, it is not surprising that the 6MWT demonstrated greater responsiveness in this study, as it was measured over a longer distance and duration. Unlike the SPWT, which in this study was used to determine fast walking speed, the 6MWT has both speed and endurance components. However, as is apparent in Table 5, the TUG and SPWT might be preferred if the goal is measurement in the early acute postoperative phase, when patients deteriorate and may be unable to perform the ST or 6MWT. This was the case for over 25% of the current study's sample when assessed within 16 days of surgery. Therefore, the time period of administration and the patient's preoperative level of disability can serve as useful guides for clinicians faced with the decision of choosing the most informative measures.
This study has several limitations. As is apparent in the tables, different numbers of patients were assessed at postoperative assessments one and two. This is partially a reflection of the study design: as mentioned earlier, not all patients were assessed at the same time points due to the goals of the larger ongoing observational study. However, some patients were also missed at both time points due to unexpected changes in appointments that were not communicated to the investigators. Referral bias might also be a potential concern because the institution is a specialized tertiary care facility. This must be balanced against the fact that it is one of the largest joint arthroplasty centers in Canada and draws from a wide catchment area. Considering the higher preoperative function of the patients in this sample, it will be important to replicate the current study's findings in different settings with other samples of arthroplasty patients. In addition, as responsiveness is a highly contextualized attribute, it would be informative to study the results over additional time points in the postoperative continuum.