This is the first study to adapt the SHAP for pediatric use and to assess the reliability of this adapted version, the SHAP-C. Children were able to perform all SHAP-C tasks using the corresponding objects (including the downsized objects). The task means were significantly different in 7/26 tasks when a single assessor tested twice and in 7/26 tasks when three different assessors tested (P-values < 0.05). The intra-rater reliability of the SHAP-C was better than the inter-rater reliability. The variation values within the same assessor, the RCs, were < 75% of the task mean in 17 out of 26 SHAP-C tasks (7/12 abstract object tasks and 10/14 ADL tasks), indicating relatively good repeatability of the procedure within the same assessor, at least in ADL tasks. The time scores per task varied largely between the three assessors: in 22/26 SHAP-C tasks, the RCs were higher than 75% of the task mean, revealing poor SHAP-C repeatability across assessors. The small differences in task means on a group level indicate that the SHAP-C can be used for group comparisons. However, in clinical practice on an individual level, the SHAP-C may be used when a single assessor is engaged, albeit with considerable within-subject variation (Table 3), and should be used with caution when more assessors are engaged. Further adjustments are required to provide clinicians with a reliable SHAP-C.
In the current study, the mean values for the abstract object tasks of the SHAP-C are much lower than the means of the corresponding SHAP tasks in adults (overall SHAP-C means < 1.21 seconds vs. SHAP means > 1.58 seconds). This discrepancy most likely arises because the SHAP-C tasks were timed differently than in the SHAP (timing by the assessor vs. self-timing). Compared with the SHAP, the SHAP-C means do not include the times of two phases: (1) from stopwatch activation until the object is reached and (2) from release of the object until the stopwatch is stopped. On the other hand, the SHAP-C means in the more complex ADL tasks were overall higher than those of the SHAP in adults (e.g., pick up coins, undo buttons, or food cutting: SHAP-C means = 5.64-19.13 seconds vs. SHAP means = 3.12-6.77 seconds). This finding of children being slower than adults in executing complex tasks is in line with reports in the literature explaining age-related differences in (neuro)motor development (e.g., gradual maturation of the neural cortex over time) [31, 32]. Nevertheless, our means for the SHAP-C tasks represent the first estimations of norm values. Because of the observed variability in task times (Tables 2 and 3), a larger sample is required to determine the norms once the SHAP-C protocol is more definitive.
Bland and Altman recommended the use of the RC to determine consistency in the outcomes of a measurement instrument [29, 33]. The precision of the RC over Pearson’s correlation coefficient and the intra-class correlation has been highlighted [29, 34]. There are, however, no standardized rules for the interpretation of RCs. The suggested approach is that the lower the RC, the better the repeatability of the instrument. Comparing the RC to the minimum clinically important difference/change (MCID) would indicate good reliability of the instrument if RC < MCID, and vice versa. In the absence of MCID values for the SHAP or SHAP-C, we chose to represent the RCs as percentages relative to the task means, as done by others [35, 36]. These relative RCs quantify the degree of agreement between different or single assessors and facilitate the interpretation of RCs. The cut-off point for the relative RC (75%) was chosen arbitrarily; higher (80%) and lower (50%) cut-off points have been reported previously [35, 36]. Thus, one may shift the cut-off value and interpret the RCs found in this study accordingly. Using a cut-off point of 80% for our relative RCs, for example, would not have changed the current results because all non-reliable tasks had relative RCs higher than 80%.
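For illustration, the calculation of a Bland-Altman RC and its relative form can be sketched as follows. This is a minimal Python sketch, not the study's analysis code: the function names and the test-retest times are hypothetical, and we assume the common formulation RC = 1.96 × SD of the paired differences, with the relative RC expressed as a percentage of the overall task mean.

```python
import math

def repeatability_coefficient(pairs):
    """Bland-Altman repeatability coefficient:
    1.96 x SD of the paired (test minus retest) differences."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the differences (n - 1 denominator)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return 1.96 * sd_d

def relative_rc(pairs):
    """RC expressed as a percentage of the overall task mean,
    as used for the 75% cut-off in this study."""
    times = [t for pair in pairs for t in pair]
    task_mean = sum(times) / len(times)
    return 100.0 * repeatability_coefficient(pairs) / task_mean

# Hypothetical test-retest task times (seconds) for one abstract object task
pairs = [(1.1, 0.9), (1.3, 1.0), (0.8, 1.2), (1.0, 1.1), (1.2, 0.9)]
print(round(relative_rc(pairs), 1))  # relative RC in % of the task mean
```

Under the 75% cut-off used here, a task with a relative RC below 75% would be judged repeatable; shifting the cut-off (e.g., to 80% or 50%) changes only the classification threshold, not the RC itself.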
The majority of tasks were reliable (relative RCs < 75% in 17/26 tasks) for assessor 1. Compared with the intra-rater reliability of the adult version of the SHAP, we found approximately the same number of less-replicable tasks. In the SHAP, seven tasks were found to be less reliable (light power, light tip, heavy extension, page turning, pour water from carton, rotate a key 90°, rotate a screw 90°), although not to a significant extent. In the SHAP-C, nine tasks were significantly less reliable (light power, light tip, light extension, heavy tripod, heavy power, undo buttons, food cutting, rotate a key 90°, and open/close a zip). The difference in less-reliable tasks between adults and children may be due to age differences in upper-limb motor abilities [31, 32]. For instance, rotating a screw 90° requires fine-motor skills. Adults have acquired hand motor skills to different extents, hence the variability when rotating a screw, whereas the tested children did not vary in this task.
Interestingly, five of the SHAP-C tasks with relative RCs > 75% were abstract object tasks. Given that SHAP-C tasks are timed by the assessor, a possible explanation might be variation in the assessor’s reaction time, especially in rapidly executed tasks involving abstract objects (< 1.2 seconds). The literature reports a response time of 0.18-0.20 seconds after visual stimuli and notes that many factors affect reaction time: practice, gender, age, fatigue, distraction, and even the breathing cycle. Practice might have played a role: the first measurement by the same assessor most likely served as practice and led to lower scores (faster performance) in the second measurement. We cannot exclude that learning effects of the SHAP-C tasks within a child occurred, but distinguishing learning effects from the reaction time of the assessor is not possible in this case.
An alternative, more objective method for SHAP-C data collection would be to use a different timing system. A system that recognizes a certain opening of the thumb-index finger angle or the lifting of the hand from the table, combined with sensors able to detect the movement or position of the objects, might time the performance more accurately. Solutions for recording performance accurately can be extended to computerized systems able to capture the hand positions and the objects’ shapes. The inclusion of kinematic measurements would provide information about the movement time and the quality of the movement. Alternatively, each abstract object task could be executed repeatedly within a fixed time (e.g., 10 seconds) and the number of completed repetitions scored accordingly, as in the pegboard hand-dexterity test, for example. However, these solutions would increase the assessment time, costs, and dimensions of the SHAP-C kit, which is beyond the SHAP/SHAP-C purpose. At the expense of increasing the time needed to determine the functionality scores, the simplest and least expensive approach would be to videotape the performance; the task times could then be accurately evaluated from the recordings, as has previously been done for pediatric functionality tests [22, 23]. Although we avoided introducing procedures that increase data-collection or analysis time, our results suggest that such changes might be necessary after all, as the influence of the assessor would be diminished considerably.
Clinically, the RCs of 0.58-1.20 seconds observed in the abstract object tasks would be a negligible variation, but relative to the task means this variation was rather large (≥ 75%). Here, again, reaction time might have influenced the abstract object task times. Moreover, practical experience differed across our assessors. Two of the assessors had extensive experience with applying the SHAP, whereas assessor 2 had no previous experience. Given that the means of assessor 2 differed in several tasks (Table 3), assessors may require a longer training period before applying the SHAP-C. The training might include studying instructional movies centralized in an online database, as in the case of the SHAP. Creating a benchmark test to evaluate assessors’ instructional and data-collection skills after the training would ensure a level of proficiency when applying the SHAP-C.
Furthermore, distraction, another risk factor for variation in performance, might have affected our participants. Engaging 5-y/o children in performing tasks requires good motivational techniques. Our assessors used intrinsic motivation by stimulating a playful atmosphere and extrinsic motivation by rewarding performance with positive reinforcement, such as candy, allowing the child to color or draw, or offering an animal sticker. However, the motivation of some children varied across tasks and sessions, especially in ADLs, causing delays in performance and thus variability in task times. One study referred to SHAP tasks as being unattractive to children. If this is the case, then substituting current SHAP-C tasks with tasks simulating child-play activities and using colorful objects may improve motivation and reduce distraction. Furthermore, the necessity of providing clear instructions and using good motivational techniques with children has been emphasized in the literature on other measurement instruments for pediatric hand functionality. The flow theory provides some suggestions on how to stimulate intrinsic motivation in children: use age-appropriate tasks, promote a ‘fun’ environment, give children control over some of the tasks (e.g., allowing them to choose the object/task they want to continue with), set clear and achievable goals for the tasks, and avoid negative feedback (oral or non-verbal).
The observed variability of the SHAP-C means may be partly explained by the variability in (neuro)motor development from preschool age up to adolescence [43–45]. Therefore, scores on functionality assessments have to be interpreted with this variability in children in mind. Importantly, timed performance in children requires a standardized test, well-trained assessors, and norms for different age categories.
Summarizing the steps to be considered for improving the reliability of the SHAP-C, future research has to identify an appropriate data-collection method that diminishes the assessor’s influence. In addition, researchers should consider providing assessors with techniques to improve children’s motivation, either as part of training or by including motivational techniques in the instructions of the SHAP-C protocol.
A limitation of this study is the relatively small sample size. Although the SHAP reliability was also determined with 24 participants, we feel that for pediatric populations a larger sample is necessary to determine the SHAP-C norms and reliability. The study design for the inter-rater reliability is limited by the fact that data were collected on separate days. A more adequate design would involve simultaneous timing of the participant’s performance by the three assessors; however, having the task instructions given by one assessor would potentially bias the measurements of the other two, and having three assessors in the same room would be overwhelming for a child. Another alternative would be to measure the participants on the same day, three times with different assessors. This approach was not possible because of the limited availability of the children during school hours. More importantly, fatigue and disinterest may occur if children are asked to repeat 26 tasks three times in one day. Videotaping the performance might solve this issue of three consecutive sessions and limit measurement to a single session. In addition, the order of assessors was the same per participant and measurement day. Therefore, the task means might have been affected by the order of the assessors and/or by the measurement day. For practical reasons, we could not randomize assessors per measurement day, but further studies should consider randomizing the assessors.
A future approach for assessing the inter-rater reliability of the SHAP-C in children may consider the following: (1) allowing each child to perform the SHAP-C once with a randomly assigned assessor and (2) live broadcasting the performance of the participants to the other assessors, who would time the performance simultaneously. This way, the children would not be solicited more than once or by more than one assessor, and the possible bias of rating the performance of participants who received instructions from another assessor would be evenly distributed throughout the data.
Another limitation of this study is the inability to estimate the norms for the prehensile patterns of the FP and IOF, which are of interest to clinicians. Based on the means and standard deviations in our study, estimates of norms for the prehensile patterns of the FP and the IOF could have been calculated, but the formulas for such calculations [1, 26, 28, 48] were not clear to us or to our statistician. Not having the exact procedure for determining the norm values needed for the calculation of FP z-scores and IOF z-scores made the calculation of the FP and IOF impossible for the SHAP-C data. Explanations regarding the formulas were denied to us because of the holders’ exclusive rights to the SHAP (intellectual property).
The sizes of the objects were not systematically evaluated. Therefore, research is also needed to determine the appropriate size of the objects for the hands of older children (> 6 y/o), for larger prosthetic hands with an opening width > 5 cm, or for spastic hands with an opening < 5 cm. In addition, the clinimetric properties should be studied in older children because of changes in performance with age. The reliability of the SHAP-C should be evaluated in children with different hand impairments and in prosthetic users because the SHAP was also designed for such patients. The evaluation of learning effects of the SHAP-C in prosthetic users would be valuable for clinicians using the SHAP-C repeatedly.