The main results of this study showed that the test-retest reliability and concurrent validity of the OMAS were good for patients surgically treated for a uni-, bi- or trimalleolar ankle fracture.
Reliability is an important dimension of any patient-based outcome measure, as it is essential to establish whether observed changes are due to the intervention and not to variations related to problems with the outcome instrument. The larger the random error, the larger the sample size needed to obtain precise estimates of effects in a trial. Two aspects have to be considered when evaluating reliability: reproducibility and internal consistency. A correlation coefficient in a test-retest of a measurement tool should be at least 0.70 when studying groups of patients and exceed 0.9 when studying individuals [45, 46]. Internal consistency can be evaluated using Cronbach’s alpha (0–1). When items are used to form a scale they should have internal consistency, which means they should measure the same thing and be correlated with each other. When comparing groups an alpha level of 0.7–0.8 is recommended [43, 45]. Too high levels of alpha indicate that all items are identical, addressing a rather narrow aspect of an attribute, or that there is redundancy among items. Too low levels indicate that the items included in the scale are not related to each other.
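As an illustration of the internal-consistency measure described above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the example data are hypothetical and are not taken from the present study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: one list of item scores per respondent, all of the same length.
    (Hypothetical helper for illustration; not the study's own code.)
    """
    k = len(rows[0])  # number of items in the scale
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed score across respondents
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up data: three respondents, three items that rise together.
example = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(cronbach_alpha(example))
```

In this made-up example the items are perfectly redundant, so alpha reaches its upper limit, which corresponds to the "too high" case discussed above.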
To the best of our knowledge, no previous studies have evaluated the test-retest reliability of OMAS in subjects surgically treated for an ankle fracture. Both Spearman’s rank correlation coefficient and the ICC between the two measurements were very high (rho = 0.95 and ICC = 0.94), although the circumstances on the two occasions when the questionnaires were filled in were not exactly the same; the first questionnaire was filled in at home and the second at the clinic. Cronbach’s alpha was within the recommended limits (0.76), and thus the OMAS can be regarded as a reliable measure in subjects after an ankle fracture.
Even if an ICC value is high, it does not mean that a test is appropriate for clinical use. Before a test is recommended for that purpose, the measurement errors both for groups of subjects and for single individuals have to be analyzed as well. The measurement error of an instrument should be small enough for the instrument to detect real changes in scored function. In the present study the SEM and SRD were used, which give the measurement errors in absolute values. SEM was 4.4 points and SEM% was 5.8%. These figures are both low. They should be regarded as the threshold for true change beyond measurement error and applied as the smallest difference between two measurements when evaluating, for example, an intervention at group level. The 95% SRD ranged from −10 to 13.7 points, which indicates that a real clinical change for a single subject should fall outside this range. Like SEM%, the SRD% is independent of the unit of measurement and may be easier to use in clinical practice. The SRD% found was 15.8%; thus, if a subject scores, for example, 60 points, this subject must improve by about 10 points (15.8% of 60 is 9.5 points) to indicate a real change. From a clinical point of view these values seem reasonable and confirm that OMAS can be used to detect real changes in subjects after an ankle fracture.
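The arithmetic behind the individual-level threshold can be sketched as follows. The relation SRD = 1.96 × √2 × SEM is a commonly used definition and is an assumption here; it is not necessarily the exact calculation used in the present study. The function names are hypothetical.

```python
import math


def srd_from_sem(sem, z=1.96):
    """Smallest real difference from the SEM (assumed common definition).

    The factor sqrt(2) reflects that a change score combines the
    measurement error of two occasions; z = 1.96 gives a 95% level.
    """
    return z * math.sqrt(2) * sem


def min_real_change(score, srd_percent):
    """Points a subject must change for the change to exceed SRD%."""
    return score * srd_percent / 100.0


# Worked example from the text: SRD% = 15.8% and a score of 60 points.
print(round(min_real_change(60, 15.8), 1))  # 9.5, i.e. about 10 points
```

Because SRD% is expressed relative to the score, the required change in points scales with the subject's score, which is why a percentage may be more convenient in clinical practice.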
Criterion validity is the extent to which one measure is related to other measures or outcomes. This type of validity can be divided into concurrent and predictive validity; concurrent validity is assessed when a new tool is compared, at the same point in time, with another measurement regarded as the gold standard. Often, however, a true gold standard against which a new measurement can be compared does not exist, and this was the case in our study. In the present study OMAS was validated using two different instruments, the disease-specific questionnaire FAOS and the global rating scale, GSRF. FAOS has been tested for reliability and validity in the Swedish version, the Turkish version and the Iranian version. The ICC values for test-retest reliability were high [40, 48], and the validity assessed against SF-36 varied between low and moderate in the Iranian version and between low and high in the Turkish version, but in that study only three subscales of SF-36 were presented. SF-36 is an instrument evaluating generic health-related quality of life, including equal parts of mental health and physical health, whereas FAOS is a disease-specific instrument evaluating functional outcome of the ankle. Thus, not all subscales of SF-36 can be expected to correlate well with FAOS.
We found that the correlations between OMAS and the five subscales of FAOS were all high. It can be noticed, however, that for the ADL subscale of FAOS the median value was 99 and the interquartile range was nine. These figures are more extreme than those for the other subscales, and it seems that the functions asked about were only minor problems for the patients included in our study. FAOS was developed from the self-reported questionnaire KOOS, which is aimed at evaluating function in subjects with osteoarthritis of the knee, and the questions are adjusted and relevant to those patients. We think that not all items are suitable for patients after an ankle fracture, especially not in the ADL subscale, where many items deal with problems in non-weight-bearing positions such as “bending to floor”, “putting on socks”, “taking off socks” and “lying in bed”. It is logical to believe that if patients do not identify with the problems presented, the total score of those items would be rated higher. In the validation study of FAOS the authors came to the same conclusion; only two of 17 items were considered as “at least of some importance” by the responders. Regarding the pain subscale, the correlation was high but again the numerical value for FAOS was higher, and again many of the items of that subscale deal with pain in non-weight-bearing positions such as “at night while in bed”, “sitting or lying”, “bending foot/ankle fully” and “stretching foot/ankle fully”. These situations are probably a minor problem in the studied group with ankle fractures.
Global rating of function was the traditional way of presenting results after treatment of injuries or diseases in earlier studies [8, 11, 49]. For a clinician it is important to take into account the patient’s overall opinion about recovery and function after an injury. Ideally, the responses from a disease-specific instrument should agree with the patient’s overall rating of function for the same region. OMAS has been shown to correlate well in that respect using LAS. In the present study the evaluation using GSRF agreed well with the results from OMAS. The scores of the three groups “good”, “fair” and “poor” were significantly separated at both the six-month and the 12-month follow-up.
Responsiveness is a dimension of great importance when evaluating the methodological quality of a measurement. The responsiveness of an instrument expresses its capability to detect changes over time or changes due to intervention. In the present study the responsiveness was evaluated by calculating the effect size, which was found to be 0.44 and could be considered medium. In the study by Rose et al. the effect size was evaluated in the early stages after an acute ankle ligament injury and was found to range from 1.3 to 2.3. It is well known that ankle ligament injuries recover quickly during the first two weeks, so the figures for that group and for the group that we have focused on would be expected to differ. Between six and 12 months after an ankle fracture the improvement can be expected to progress more slowly, and the effect size found should therefore be regarded as realistic. Little is known about the responsiveness of OMAS in the early phases of the rehabilitation process after ankle fractures, and further studies of this are required.
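For readers unfamiliar with the measure, the following is a minimal sketch of one common definition of effect size for responsiveness: the mean change between two occasions divided by the standard deviation of the scores at the first occasion. That this exact definition was used in the present study is an assumption, and the scores below are invented for illustration only.

```python
from statistics import mean, stdev


def effect_size(first, second):
    """Standardized effect size: mean change / SD of first-occasion scores.

    first, second: paired scores for the same subjects, in the same order.
    (Assumed definition; hypothetical helper, not the study's own code.)
    """
    if len(first) != len(second):
        raise ValueError("score lists must be paired")
    changes = [b - a for a, b in zip(first, second)]
    return mean(changes) / stdev(first)


# Invented scores at two follow-ups for three subjects.
six_months = [50, 60, 70]
twelve_months = [55, 65, 75]
print(effect_size(six_months, twelve_months))  # 0.5
```

The same mean change yields a larger effect size when the first-occasion scores are less spread out, which is one reason effect sizes from different patient groups and time windows are not directly comparable.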
However, to be able to detect changes over time, floor and ceiling effects should be considered as well. A floor effect occurs when patients report the lowest possible score; the number of patients doing so at baseline should be limited in order to be able to observe deterioration. In the same way, a ceiling effect occurs when a patient reports excellent function and receives the best possible score. Questionnaires with good validity are expected to have fewer categories with floor or ceiling effects, and no more than 15% of the individuals should score at these levels. In the present study no floor or ceiling effects regarding OMAS were found at six months. At 12 months 15% scored 100, which is within the recommended limits, and one year after injury it can be expected that some of the patients have attained normal function.
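The 15% criterion described above can be checked with a few lines of code. This is a minimal sketch assuming a score range of 0 to 100; the function names and the example scores are hypothetical.

```python
def floor_ceiling_share(scores, lowest=0, highest=100):
    """Return the shares of subjects at the lowest and highest score."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return floor_share, ceiling_share


def within_limit(share, limit=0.15):
    """True if the share does not exceed the recommended limit."""
    return share <= limit


# Invented sample: 3 of 20 subjects reach the maximum score.
scores = [100] * 3 + [80] * 17
floor_share, ceiling_share = floor_ceiling_share(scores)
print(floor_share, ceiling_share, within_limit(ceiling_share))
```

Note that the limit is inclusive in this sketch, matching the wording "no more than 15%", so a share of exactly 15% is treated as acceptable.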
OMAS is well known and has been used in many studies over several years [16, 17, 19, 22, 23, 26, 28, 29, 37, 50]. With relatively few items it can be completed easily, and the items included are all relevant to normal activities of daily life. The score is simple for the researcher to use, as the raw score is summed without any further calculation. The different symptoms are assigned different points according to the extent of disability that the authors considered they would lead to. These differences seem reasonable, as OMAS has shown good correlations with self-reported global function, using both LAS and, in the present study, GSRF. Pain during different weight-bearing situations is the most heavily weighted item. This seems relevant, as the foot and ankle bear weight during every step all day long, and pain in these situations should be strongly disabling. In two of our earlier studies we found that one year after the injury more than half of the patients still experienced pain while walking [22, 31].
Surgically treated ankle fractures normally need stabilization of the ankle mortise using either staples or a screw [8, 9, 11] and immobilization for six to eight weeks after surgery in a plaster cast or an orthosis. These treatments influence the mobility of the ankle. The surgical technique might diminish the flexibility of the mortise and perhaps also the range of dorsiflexion in the ankle joint. Maximum dorsiflexion is needed during many activities of daily living, such as climbing onto a stool, rising from a chair, walking downstairs, walking uphill and rising from the floor. Problems in those situations remind the patient of the injury and would thus be reflected in the score. Many authors have reported that patients still complained of stiffness in the ankle one year after the injury [16, 17, 22, 31].
Swelling has been frequently reported in the studied patient group [15, 17, 22, 31]. Swelling probably also affects the experience of stiffness and pain. Both stiffness and swelling are graded relatively high in the scoring scale, which seems relevant. Running and jumping are weighted lower. Ankle fractures occur at all ages, but the incidence increases with age [1, 6], particularly in women. Most ankle fractures happen to people stumbling or slipping in everyday life rather than to athletes. It is thus likely that these functions are less important in the main group of injured persons, which might be the reason why these items were weighted lower by the authors. However, in younger age groups these functions are probably more important to regain, and the weighting could then be questioned. Furthermore, impediments to or inability to return to work or to earlier activities of daily life might have a great impact on a person’s life. This item is weighted high, which seems appropriate.
Despite having relatively few items, the OMAS includes all dimensions of the ICF recommended by the WHO. Pain, stiffness and swelling belong to the domain Body function; stair-climbing, jumping, running and squatting to the domain Activity; and work/activities of daily life to the domain Participation. OMAS thus also fulfills these demands.