A comparison of outcome measures used to report clubfoot treatment with the Ponseti method: results from a cohort in Harare, Zimbabwe

Background There are various established scoring systems to assess the outcome of clubfoot treatment after correction with the Ponseti method. We used five measures to compare the results in a cohort of children followed up for 3.5 to 5 years.
Methods In January 2017 two experienced physiotherapists assessed children who had started treatment between 2011 and 2013 in one clinic in Harare, Zimbabwe. The length of time in treatment was documented. The Roye score, the Bangla clubfoot assessment tool, the Assessing Clubfoot Treatment (ACT) tool, and the proportions of relapsed and of plantigrade feet were used to assess the outcome of treatment in the cohort. Inter-observer variation was calculated for the two physiotherapists. A comparative analysis of the entire cohort, the children who had completed casting and the children who completed more than two years of bracing was undertaken. Diagnostic accuracy was calculated for the five measures against full clinical assessment (gold standard), which determined whether referral for further intervention (re-casting or surgical review) was required.
Results 31% (68/218) of the cohort attended for examination and were assessed. Of the children who were assessed, 24 (35%) had attended clinic reviews for 4–5 years, and 30 (44%) for less than 2 years. There was good inter-observer agreement between the two expert physiotherapists on all assessment tools. Overall success of treatment varied between 56 and 93% using the different outcome measures. The relapse assessment had the highest proportion of unnecessary referrals (19.1%), and the Roye score the highest proportion of missed referrals (22.7%). The ACT and Bangla scores missed the fewest referrals (7.4%). The Bangla score demonstrated 79.2% (95%CI: 57.8–92.9%) sensitivity and 79.5% (95%CI: 64.7–90.2%) specificity and the ACT score had 79.2% (95%CI: 57.8–92.9%) sensitivity and 100% (95%CI: 92–100%) specificity in predicting the need for referral.
Conclusion At three to five years of follow up, the Ponseti method has a good success rate that improves if the child has completed casting and at least two years of bracing. The ACT score demonstrates good diagnostic accuracy for the need for referral for further intervention (specialist opinion or further casting). All tools demonstrated good reliability. Electronic supplementary material The online version of this article (10.1186/s12891-018-2365-3) contains supplementary material, which is available to authorized users.


Background
Clubfoot, or congenital talipes equinovarus, is a condition present at birth in which the foot is held in a rigid, turned-in position. High-quality corrective treatment remains a key requirement for reducing disability and improving function related to the deformity. Over the past decades there has been an increase in the use of the Ponseti method to correct clubfoot [1]. This method involves the simultaneous correction of three components of the clubfoot deformity through manipulation and serial casting. The equinus (downward pointing of the foot) is corrected last, often with a percutaneous Achilles tenotomy. This is followed by long-term use of a foot abduction brace at night to maintain the foot position [2]. Despite the global trend toward increased use of the Ponseti method, there remains variation in how success of clubfoot treatment is measured [3,4].
The Ponseti method is administered by locally trained therapists in resource-constrained settings in Africa [5]. These clubfoot therapists often work alone, with no specialised physiotherapy or surgical support in the clinic or nearby. It is important that they have a user-friendly assessment system with agreed criteria for when treatment is not working and referral to a specialist for further management is indicated.
No globally accepted outcome scoring system exists to inform locally trained clubfoot therapists of the need for referral for further intervention. The most frequently used approach to measuring whether the Ponseti method has been successful (or not) is clinical assessment. In sub-Saharan Africa 68 to 98% of cases are reported to have a successful outcome with the Ponseti method [4]. This study aims to compare the results of the Ponseti method of clubfoot management at three to five years from initial correction using five different outcome measures. We explore the diagnostic accuracy of the outcome measures, which is the ability of the assessments to discriminate between the need for referral for further intervention and a successful outcome [6]. To assess diagnostic accuracy, the outcome score results in this study are compared with a reference standard of 'true' treatment success status (defined by full clinical assessment). The results are categorised as true positive, false positive (referred but not needed), true negative, and false negative (should have been referred but was missed) [7]. Sensitivity is the proportion of children who need referral for further intervention and who are correctly classified by the outcome measure as requiring referral. Specificity is the proportion of children who do not need referral and who are correctly classified as not requiring referral by the outcome measure. Positive and negative predictive values give the probability that a child with a positive or negative outcome score result, respectively, has been correctly classified: that a child flagged for referral does need further intervention, and that a child not flagged does not.
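The four categories and the derived measures described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the ten example children are hypothetical and are not taken from the study data, and `True` is assumed to mean "referral needed".

```python
from collections import Counter


def confusion_counts(reference, predicted):
    """Tally TP/FP/TN/FN for paired results; True means 'referral needed'."""
    counts = Counter(TP=0, FP=0, TN=0, FN=0)
    for ref, pred in zip(reference, predicted):
        if pred:
            counts["TP" if ref else "FP"] += 1
        else:
            counts["FN" if ref else "TN"] += 1
    return counts


def accuracy_measures(c):
    """Sensitivity, specificity, PPV and NPV from the four counts."""
    def ratio(num, den):
        return num / den if den else None  # undefined when the denominator is 0

    return {
        "sensitivity": ratio(c["TP"], c["TP"] + c["FN"]),
        "specificity": ratio(c["TN"], c["TN"] + c["FP"]),
        "ppv": ratio(c["TP"], c["TP"] + c["FP"]),
        "npv": ratio(c["TN"], c["TN"] + c["FN"]),
    }


# Hypothetical example: 10 children, 4 of whom need referral
# according to the reference standard (full clinical assessment).
reference = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

counts = confusion_counts(reference, predicted)
measures = accuracy_measures(counts)
print(dict(counts))
print(measures)
```

In this toy example one child who needed referral was missed (a false negative) and one was referred unnecessarily (a false positive), which is the distinction the outcome measures are compared on.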

Study design and population
This study was conducted and reported according to established STARD (Standards for Reporting of Diagnostic Accuracy Studies) guidelines [8] (Additional file 1). A cohort study of 218 children with idiopathic clubfoot was conducted in 2016. The children were managed with manipulation and casting at Parirenyatwa Hospital, Harare and the results are published elsewhere [9]. All children with a diagnosis of unilateral or bilateral idiopathic clubfoot who started treatment with the Ponseti method at the study hospital between 22nd March 2011 and 23rd April 2013 (25 months) were included in the cohort. The only exclusion criterion was a foot condition other than idiopathic clubfoot, for example clubfoot associated with neural tube defects such as spina bifida.

Sampling technique
The phone numbers of all carers of the cohort children were extracted from the clinic records in January 2017 and contact with them was attempted at least three times. Caregivers and their children were invited to attend the study. The children were between 3.5 and 5 years from initial casting.

Ethics, consent and permissions
Ethical approval for this study was granted by the Medical Research Council of Zimbabwe (MRCZ) and the London School of Hygiene & Tropical Medicine (LSHTM) (ref:11132 /RR/4725). All children and their caregivers were read an information sheet about the study and given an opportunity to ask questions. If they agreed to participate, written consent was taken from the caregiver who remained present throughout the assessment as per national requirements. Transport costs were reimbursed and referral services available in Harare were mapped pre-emptively to ensure appropriate onward referral for any children that required further intervention.

Data collection
Two physiotherapists who are experienced in co-ordinating national clubfoot programmes reviewed the assessment tools over three days for contextual relevance. The questionnaires were available in English and Shona and were cognitively tested. We used five outcome measures: three that give a score and two that give a binary (success/failure) outcome. The Roye score [10] is a self-reported measurement that is used in high income settings. The Bangla clubfoot assessment tool [11] and the Assessing Clubfoot Treatment (ACT) score [12] combine physical assessment and parent reported outcome measures, and have been developed for low resource settings. The Bangla score includes a functional assessment. The two binary outcomes were assessment of a plantigrade foot [5] and the relapse pattern [13]. The study protocol was pilot tested for suitability in July 2016. Children were examined independently in January 2017 by the two physiotherapists, who decided whether referral for further intervention (re-casting or surgical review) was required. Clinical examination comprised observation, physical assessment and functional review; it included assessment of passive and active range of motion (plantarflexion, dorsiflexion, eversion and inversion of the foot, and knee extension), muscle strength tests of the calf and evertors of the foot, heel raises, squatting ability and gait analysis (walking and running).

Data management and analysis strategy
The data were entered into a Microsoft Excel 2000 (Microsoft Inc., Redmond, Washington) software package. Data were analysed using Stata 14.1 (StataCorp, 4905 Lakeway Drive, College Station, Texas 77845, USA). Statistical significance was set at the 95% confidence level. Inter-observer variation in the measurement of the physical assessment tools was assessed with the intra-class correlation coefficient (ICC), using a threshold of ICC ≥0.75 [10]. Outcomes of children who had completed casting and ≥ two years of bracing were compared to all of the children who were followed up, and to those who had only completed casting. A two-tailed paired t-test was used to assess the mean difference between the outcome measures of Roye, Bangla and ACT scores. Fisher's exact test of independence was used to assess the difference in proportion of children with an outcome of relapse and plantigrade foot. The five measures were compared against the standard of whether referral for further intervention was required (for re-casting or surgical review) as defined by a consensus agreement of two expert physiotherapists with experience of managing clubfoot in countries in Africa. Sensitivity, specificity, positive and negative predictive values were calculated for the five measures and compared to full clinical assessment (gold standard). The threshold for diagnostic accuracy was based on previous studies and was defined prior to the study. It was set at 70% for the three scores with continuous scales [14] and positive/negative for the binary outcomes [7].

Results
All tools demonstrated good reliability, with an intraclass correlation coefficient (ICC) of ≥0.82 on all criteria (Table 1). An ICC of 1.00 indicates perfect agreement.
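As an illustration of how agreement between raters can be quantified, the sketch below computes a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1). The source does not state which form of ICC was used, so this choice is an assumption, and the paired scores shown are hypothetical rather than study data.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per child, one column per rater."""
    n = len(ratings)       # number of children
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: children (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores from two raters for six children (not study data).
scores = [(9, 8), (7, 7), (10, 9), (6, 6), (8, 9), (5, 5)]
print(round(icc_2_1(scores), 2))
```

Because this form penalises systematic differences between raters as well as random disagreement, it is a conservative choice when two examiners score the same children independently.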
In the children who were followed up (n = 68) the success of treatment with different scores varied between 56 and 89% (Table 2). In the children who completed casting (n = 63) it was between 57 and 93%; and in the children who completed casting and at least two years of bracing (n = 38) it was from 58 to 97% (Table 3). The individual category calculations for each outcome measurement are in Additional files 2, 3, 4 and 5.
The relapse assessment and the Bangla tool gave the lowest proportions of good outcome (56 and 59% respectively). Figure 2 demonstrates the variation in outcome when compared to full clinical assessment (the gold standard, illustrated in the first row of the figure). 87% (33/38) of the children who completed ≥2 years of bracing were assessed as successfully treated with full clinical assessment. The scores that demonstrate a higher success (Plantigrade: 97% and Roye score: 94%) miss cases that require further intervention. The scores that demonstrate a lower success (Relapse: 58% and Bangla: 66%) are restrictive in the measurement of success.
Fig. 1 Length of time child attended clubfoot clinic appointments
There was strong evidence for a difference between the outcomes of the Roye score and the Bangla score (p < 0.0001), the Roye and the ACT score (p = 0.0013), and the ACT and Bangla score (p < 0.0001). It follows that these three scores do not provide equivalent estimates of success and cannot be used interchangeably.
There was a difference in the relative proportion of the cohort with relapse and plantigrade foot when assessed with Fisher's exact test (p = 0.012). The binary outcomes are therefore not interchangeable.
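For readers unfamiliar with the test, the sketch below shows how a two-sided Fisher's exact p-value is obtained from a 2x2 table using only the Python standard library. The cross-tabulation behind the reported p-value is not given in this section, so the counts used here are hypothetical and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts (not study data): rows are two outcome measures,
# columns are 'good outcome' and 'poor outcome'.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

The small tolerance in the comparison guards against floating-point ties between equally likely tables.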
No adverse events occurred as a result of any of the outcome measures undertaken. When compared to the standard of full clinical assessment and the subsequent decision on the need for referral for further intervention, the Roye score had a sensitivity of 31.8% (95%CI: 13.9-54.9%) and a specificity of 100% (95%CI: 92-100%), with positive and negative predictive values of 100 and 74.6% respectively. The Bangla score demonstrated 79.2% (95%CI: 57.8-92.9%) sensitivity and 79.5% (95%CI: 64.7-90.2%) specificity with 67.9% positive predictive and 87.5% negative predictive values, and the ACT score had 79.2% (95%CI: 57.8-92.9%) sensitivity and 100% (95%CI: 92-100%) specificity in predicting the need for referral, with positive and negative predictive values of 100 and 89.8% respectively. Of the 44 children that did not require referral for further intervention, all achieved plantigrade or more (positive predictive value: 100%) and of those who did require referral (n = 24), 14 were identified with the plantigrade assessment (achieved less than   Table 4.
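The reported ACT figures can be checked by hand. The counts used below (19 true positives, 5 false negatives, 0 false positives, 44 true negatives) are not quoted from the source; they are back-calculated here from the reported percentages and totals (24 children needing referral, 44 not), so they should be read as an assumption. The exact (Clopper-Pearson) binomial interval is likewise an assumption about how the confidence intervals were derived.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n, found by bisection."""
    def bisect(func, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if (func(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= k) rises with p and equals alpha/2 at the limit.
    lower = 0.0 if k == 0 else bisect(lambda p: 1 - binom_cdf(k - 1, n, p), True)
    # Upper limit: P(X <= k) falls with p and equals alpha/2 at the limit.
    upper = 1.0 if k == n else bisect(lambda p: binom_cdf(k, n, p), False)
    return lower, upper


# Assumed counts for the ACT score, back-calculated from the reported values.
tp, fn, fp, tn = 19, 5, 0, 44

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
missed = fn / (tp + fn + fp + tn)  # missed referrals as a share of all children

sens_lo, sens_hi = exact_ci(tp, tp + fn)
spec_lo, spec_hi = exact_ci(tn, tn + fp)
print(f"sensitivity {sensitivity:.1%} ({sens_lo:.1%} to {sens_hi:.1%})")
print(f"specificity {specificity:.1%} ({spec_lo:.1%} to {spec_hi:.1%})")
print(f"PPV {ppv:.1%}, NPV {npv:.1%}, missed referrals {missed:.1%}")
```

Under these assumed counts the calculation gives 79.2% sensitivity, 100% specificity, 100% positive and 89.8% negative predictive values and 7.4% missed referrals, with intervals close to the reported 57.8–92.9% and 92–100%.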

Discussion
This study found that five scoring systems that are used to report outcomes of clubfoot treatment provided a wide spectrum of success (from 56 to 89% of cases) in a cohort with 3.5-5 years of follow up. When compared with the standard of clinical assessment, missed referrals ranged from 7.4% (the Bangla and ACT scores) to 22.7% (the Roye score). The measurements assess different aspects of clubfoot correction, from parent reported outcome measures (the Roye score) to scores that include physical assessment (the Bangla and ACT scores) and single measurements (plantigrade foot and evidence of recurrence). Success improves in all measures with the completion of casting and at least two years of bracing.

Comparison to previous studies
Few studies compare measurement tools in the same patients, which limits the comparisons that can be made with our findings. However, success of treatment in this cohort is similar to other studies in sub-Saharan Africa (between 63 and 98% of cases) [9]. Non-adherence and surgical intervention, often defined as failure, are reported to vary from 7 to 61% and from 3 to 39.4% [15] respectively. Ponseti and Laaveg [16] describe a scoring system that rates functional results as satisfactory in 88.5% of feet. Further studies describe success using the Ponseti and Laaveg system as 89.3% [17]. The criteria require a goniometer, and the tool was therefore not included in the evaluation of this cohort.

Use of outcome measures
The ease of use and the rate of incorrect classification of the tools used to measure success need to be considered when selecting an outcome measure. Single item scales for assessment of individual children (such as plantigrade foot or evidence of relapse) require no further calculation and may be easier to use in clinics; however, their simplicity may not allow a full assessment of success. Multi-scale items prove difficult to transform into useful statistics without technology and are unlikely to be routinely used in clinics. This study found no clear agreement between the different outcome measurements in use.
All of the assessments used in this study have limitations. The Roye score has been validated in high income settings, and parents in our study reported difficulty in answering the question "How often does your child have problems finding shoes that he or she likes?" as it was understood to be related to the availability of a variety of shoes. The Bangla score took the longest time to transform with statistical analysis. The acceptability and feasibility of the ACT score need to be studied in future research. The ACT score is likely easy to teach; however, this is unknown as the examiners were physiotherapists, and the time taken for other cadres of health workers to use the ACT tool is also unknown. With regard to the relapse score, Bhaskar et al. (2013) considered ankle dorsiflexion < 15 degrees with the knee in extension as grade IA relapse. This may be a reason for the restriction in defining good outcome, as an evaluation of 85 normal feet in children found that the mean ankle dorsiflexion was 12.8 degrees with knees in extension [18]. Greater than 15 degrees may therefore be difficult to achieve.

Relationship between the outcome measures and clinical assessment
The Bangla and ACT tools were most helpful in predicting the need for referral for further intervention (specialist opinion or further manipulation and casting). The five referrals that were missed with the ACT score were children who required review of a mobile curvature of the lateral border of the foot or supination in swing phase, neither of which is assessed with the score. Despite this, the ACT tool demonstrates the best diagnostic accuracy for the need for referral for further intervention.

Strengths and limitations of study
This study reports on five measurements of success in a cohort at 3.5-5 years from initial treatment. Repeat phone calls facilitated assessments when caregivers were initially unavailable. Two independent raters reduced the likelihood of reporting bias and all outcome measures were verified by the reference standard. The threshold for diagnostic accuracy was based on previous studies and was defined prior to the study. There were also study limitations. No distinction between a clubfoot that may not have been fully corrected and a relapsed clubfoot was made, and all cases with elements of the deformity were classified with the relapse score, which may be a source of potential bias that underestimates the accuracy of the relapse score. The tools were chosen based on ease of use in low resource high volume clinics and were not all initially developed to identify need for referral for further intervention.

Implications for practice
Task shifting and task sharing between orthopaedic and non-specialised health workers in some clinics means that outcome measures are even more important as teams expand. As older children are being treated with the principles of the Ponseti method [19], expert guidance on assessment and measurement in these cases is needed. The Roye score is overly optimistic of good outcomes, the Bangla score is restrictive in identifying good outcome, and the ACT score most closely aligns to full clinical assessment.

Consent for publication
Not applicable.