Development and validation of the patient-rated ulnar nerve evaluation

Background: Compression neuropathy at the elbow causes substantial pain and disability. Clinical research on this disorder is hampered by the lack of a specific outcome measure. A patient-reported outcome measure, the Patient-Rated Ulnar Nerve Evaluation (PRUNE), was developed to assess pain, symptoms and functional disability in patients with ulnar nerve compression at the elbow.
Methods: An iterative process was used to develop and test items. Content validity was addressed using patient/expert interviews and review; linking of the scale items to International Classification of Functioning, Disability, and Health (ICF) codes; and cognitive coding of the items. Psychometric analysis was performed on data collected from 89 patients. Patients completed a longer version of the PRUNE at baseline. Item reduction was performed using statistical analyses and patient input to obtain the final 20-item version. Score distribution, reliability, exploratory factor analysis, correlational construct validity, discriminative known-group construct validity, and responsiveness to change were evaluated.
Results: Content analysis indicated that items were aligned with the subscale concepts of pain, sensory/motor symptoms (impairments) and specific upper extremity-related tasks, and that the usual function subscale provided a broad view of self-care, household tasks, major life areas and recreation/leisure. Four subscales were demonstrated by factor analysis (pain, sensory/motor symptoms (impairments), specific activity limitations, and usual activity/role restrictions). The PRUNE and its subscales had high reliability coefficients (ICCs > 0.90; 0.98 for total score) and low absolute error. The minimal detectable change was 7.1 points. The PRUNE was able to discriminate between clinically meaningful subgroups determined by an independent evaluation (assessing work status, residual symptoms, motor recovery, sensory recovery and global improvement; p < 0.01). Responsiveness was excellent (SRM = 1.55).
Conclusion: The PRUNE is a brief, open-access, patient-reported outcome measure for patients with ulnar nerve compression that demonstrates strong measurement properties.


Background
Compression of the ulnar nerve at the elbow (UNE), sometimes referred to as cubital tunnel syndrome, is the second most common upper extremity compression neuropathy [1]. The mean annual incidence has been estimated at 25 cases per 100,000 person-years (33 in males versus 17 in females). Work-relatedness has been suggested, since males performing manual work have an elevated incidence of 57 cases per 100,000 person-years [2]. A number of studies have related development of symptoms to occupational activities, including sustained flexion postures [3,4] and repetitive elbow movement, or to sporting activities like cycling [5]. The prevalence of UNE is 3.5 times higher in people who report occupational activities that involve 'holding a tool in position' [4] compared to workers in the same setting who do not perform this task. However, other physical and psychosocial factors can also contribute to UNE [6]. In a multidimensional risk study, smoking, education level and work experience were identified as risk factors, whereas gender, BMI, alcohol consumption, trauma to the elbow, diabetes mellitus, and hypertension were not [7].
Evidence on the management of UNE is problematic and indicates a need for better outcome reporting. A Cochrane review was only able to locate six low-quality clinical trials relating to management of ulnar neuropathy. This review concluded that "available evidence is not sufficient to identify the best treatment for idiopathic ulnar neuropathy at the elbow on the basis of clinical, neurophysiological and imaging characteristics. We do not know when to treat a patient conservatively or surgically." [8] The review recommended that future research would be improved by the use of validated disease-specific clinical outcome measures. Our systematic review of prognosis following anterior ulnar nerve transposition faced similar challenges [9]. Although we were able to locate 26 studies addressing prognosis following ulnar nerve transposition surgery, only two of these were high-quality. We found profound inconsistencies in the design and conclusions of available studies, and were unable to draw any conclusions based on the studies located. The lack of standardized evaluation of outcomes was a substantial barrier to the conduct of an effective systematic review.
Although the evidence about treatment and prognosis has flaws, it is clear that ulnar nerve compromise can lead to substantial disability. In a qualitative study of patients with ulnar nerve palsy, the majority of people had difficulties with simple, everyday tasks including holding soap, eating, buttoning clothes, holding a glass or lifting small objects [10]. Ulnar nerve compression causes less compromise to the ulnar nerve, but also results in substantial hand impairments [11]. Despite the unique and potentially profound consequences of ulnar nerve problems, there has been little attention to developing and validating UNE disease-specific patient-reported outcome measures. A systematic review of the outcome measures used in UNE identified 42 clinical studies [12] that used 21 different health outcome measures, including 2 generic instruments; 10 nonstandardized measures; 3 symptom-specific patient-reported instruments; and 6 patient questionnaires. A review of standardized rating systems for evaluation of the elbow did not locate any tools specific for UNE [13]. Further, this review noted that most scoring systems used to evaluate elbow function have limited supporting psychometric data.
In 2006, Mondelli and colleagues described a 9-item UNE scale that was developed in Italian and published with an English translation [14]. Data from this questionnaire were compared to nerve physiological features, clinical measures and the Boston Carpal Tunnel Questionnaire [15]. The scale had low correlation to electrophysiology grade, and moderate correlation to clinical severity. Test-retest reliability in the first 44 patients was excellent (0.97) and responsiveness was acceptable in 25 patients followed for 6-8 months of conservative management (effect size = 0.46). The item development/reduction process and translation process were not reported. The scale has not been widely used in subsequent research.
The purpose of this paper is to report the development and validation of a patient-reported outcome measure designed for use in patients with ulnar nerve compression. Specific objectives are to describe the development process, reliability, content and construct validity, and factor/structural validity.

Methods
The Patient-Rated Ulnar Nerve Evaluation (PRUNE) was developed based on iterative revisions and stakeholder consultation. A formal structured examination of content validity and statistical analyses of test-retest reliability, factor structure, and construct validity were used for item reduction and evaluation of the final instrument.

Scale development process
The first author has previously developed patient-reported outcome (PRO) measures for other disorders that have a common structure of pain, specific activity and usual activity subscales [16][17][18][19]. This structure informed the structure of the PRUNE. The specific items were developed through patient interviews, epidemiological and biomechanical studies. Symptoms that were relevant to patients with ulnar nerve compression neuropathy were grouped into: pain and ulnar-nerve "specific" sensory symptoms or motor symptoms. Together with the specific activities and usual functional activities, 4 subscales were derived. A long version of the instrument was completed by all participants and the final reduced scale was determined by a structured item reduction process. Cross correlations between items, factor analysis, inspection for floor and ceiling effects and item distribution were used to reduce the item pool to the optimal set of items.
The patient-rated ulnar nerve evaluation (PRUNE)
The final version of the PRUNE is presented in Figure 1. The PRUNE is a 20-item scale that measures pain, sensory/motor symptoms and functional disability in patients with UNE. The 20 items include: 6 pain; 4 sensory/motor symptoms; 6 specific activity; and 4 usual activity (personal care, household, work and recreation) items. Each item is scored on a scale from zero (none/no difficulty) to 10 (worst possible/completely unable). Each subscale is scored by adding the component questions (pain out of 60, sensory/motor symptoms out of 40, specific activity out of 60, usual activity out of 40). The total score ranges from 0 to 100 points, with zero meaning no symptoms or difficulty and 100 meaning worst possible symptoms and completely unable to do all functional activity. The total score equally weights the 10 symptom items and the 10 functional items (by dividing the grand total by 2).
The individual items retained during item reduction are in the appended instrument (Figure 1), which is the final format for the PRUNE. Items modified in the final stage of beta testing, and the rationale for each modification, are listed in Table 1.
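The scoring rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the developers' scoring software: the function name `score_prune`, the subscale keys, and the decision to reject incomplete forms are assumptions made for the example (the text does not describe how missing items are handled).

```python
from typing import Dict, Sequence

# Item counts per subscale as described in the text (6 + 4 + 6 + 4 = 20 items).
# The key names are hypothetical labels chosen for this sketch.
SUBSCALE_SIZES: Dict[str, int] = {
    "pain": 6,
    "sensory_motor": 4,
    "specific_activity": 6,
    "usual_activity": 4,
}


def score_prune(responses: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Return the subscale sums and the 0-100 total for one completed form.

    `responses` maps each subscale name to its item ratings (each 0-10).
    Assumption: incomplete forms are rejected rather than imputed.
    """
    scores: Dict[str, float] = {}
    for name, n_items in SUBSCALE_SIZES.items():
        items = responses[name]
        if len(items) != n_items:
            raise ValueError(f"{name}: expected {n_items} items, got {len(items)}")
        if any(not 0 <= rating <= 10 for rating in items):
            raise ValueError(f"{name}: ratings must be between 0 and 10")
        # Each subscale is the plain sum of its items (out of 60 or 40).
        scores[name] = float(sum(items))
    # The 20 items sum to 0-200; dividing by 2 gives the 0-100 total.
    scores["total"] = sum(scores[name] for name in SUBSCALE_SIZES) / 2
    return scores
```

For example, a form with every item rated 10 yields subscale scores of 60, 40, 60 and 40 and a total of 100.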

Treatment
This study was not designed to study intervention effects. However, response to treatment was used as a context to evaluate the clinical measurement property of responsiveness. Patients underwent a submuscular or subcutaneous ulnar nerve transposition using established procedures [20][21][22].

Comparison study measures
The SF-36
The SF-36 is a 36-item scale that addresses general health. Subscales address Physical Function, Physical Role, Bodily Pain, General Health, Vitality, Social Function, Emotional Role, and Mental Health. These subscales are summarized into Physical and Mental Summary Component scores. While the SF-36 is less responsive than disease-specific scales [23,24], it is a valid indicator of general health status in musculoskeletal disorders [25]. General health status measures are commonly used in construct validation and are expected to have a low to moderate relationship with a disease-specific measure like the PRUNE.

The Bishop scale
The Bishop scale (sometimes referred to as Kleinman and Bishop) [26] is a clinician-administered measure developed specifically for UNE. The scale addresses satisfaction, improvement, severity of symptoms, work status, leisure activity, strength, and sensibility. There is no description of standardized application of the tool, nor has reliability, validity or responsiveness been reported [12]. The Bishop scale was administered by an independent evaluator, and its items were used as criteria to test known-group validity, since the scale provides criteria for a number of clinically relevant subgroups.

[Figure 1: The Patient-Rated Ulnar Nerve Evaluation (PRUNE). The form opens with name and date fields and the instruction: "The questions below will help us to understand the amount of pain or difficulty you experience because of your hand/arm. Please describe your average experience over the past week."]

Patient recruitment
Patients (n = 89) were diagnosed in a multi-stage process. The preliminary diagnosis was made by the family physician, who then referred the patient for electrodiagnosis and examination by a physical medicine physician. The electrodiagnostic parameters, as reported in patient demographics and clinical presentation, were considered to make the definitive diagnosis. Patients with a confirmed ulnar neuropathy were sent for surgical consultation with a fellowship-trained hand surgeon, who again confirmed the diagnosis using electrodiagnostic findings and clinical examination. Patients undergoing anterior nerve transposition were approached and agreed to participate in this study. Inclusion criteria were electrodiagnostically confirmed ulnar nerve compression at the elbow, persistent symptoms for at least 3 months with failed conservative management, and ability to return for follow-up. Exclusion criteria were: inability to complete self-report forms, central or spinal neurological disorders, other neuropathy affecting the hand (excluded by electrodiagnosis) and medical conditions that precluded participation. Respondents completed the full version of the PRUNE within 2 weeks prior to surgery, and again 3 and 24 months following surgery. The study was approved by the Western University Ethics Board. Written informed consent for participation in the study was obtained from all participants; none were under 18 years of age.

Scale distributions and floor/ceiling
Box plots were used to examine the distribution of scores for individual items and subscales to examine potential floor/ceiling effects or distribution problems.
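The study inspected floor/ceiling effects visually with box plots. A common numeric complement, shown here as an assumption rather than as the authors' method, is the percentage of respondents scoring at the lowest and highest possible values of an item. The function name `floor_ceiling` and the default 0-10 range (taken from the item scale described earlier) are choices made for this sketch.

```python
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[float], minimum: float = 0, maximum: float = 10) -> Tuple[float, float]:
    """Return (floor %, ceiling %) for one item's scores.

    Floor % is the share of respondents at the lowest possible score and
    ceiling % the share at the highest possible score.
    """
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct
```

An item on which no respondent scores zero, as reported for numbness in this study, would return a floor percentage of 0.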

Content validity
Content validity is fundamental to scale validity and was assessed by four methods. Patient interviews were used during development and reduction of items, pilot testing and the psychometric study to assess content relevancy. The prototype instrument was reviewed by patients, 3 physical therapists, 1 physiatrist, 3 orthopedic surgeons and 2 research assistants. Patients and experts provided feedback on the appropriateness and wording of the items.
Structured content analysis was performed using 2 methods. The International Classification of Functioning, Disability and Health (ICF) linking procedures were performed according to established linking rules [27,28]. ICF coding provides a common international language to describe the elements of body structure, function, disability and environment contained in questionnaire items. The Item Perspective Classification was used to perform a 2-level classification of type of decision (rational/emotional) and content of items (psychological, social, biological, inorganic or pure experience). More detail on this coding method can be found at https://sites.google.com/site/ipcframework/.

Reliability
A subset of patients was re-tested 2-7 days after first completing the PRUNE. The following statistics were calculated to establish the reliability of the PRUNE: a) intraclass correlation coefficients, ICC(2,1) [29]; and b) the standard error of measurement (SEM), together with the minimal detectable change (MDC) reported in the Results.
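The reliability statistics named above can be illustrated with the conventional formulas. The sketch below is a hedged example rather than the authors' analysis code: it computes ICC(2,1) from two-way ANOVA mean squares, SEM as SD × sqrt(1 − ICC), and MDC as z × sqrt(2) × SEM. The 90% confidence multiplier (z = 1.645) is an assumption; it is chosen because it reproduces the reported MDC of 7.2 from the reported SEM of 3.1. The function names are hypothetical.

```python
import math
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `data` has one row per patient and one column per test occasion.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)               # between-patient mean square
    ms_cols = ss_cols / (k - 1)               # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float = 1.645) -> float:
    """MDC = z * sqrt(2) * SEM; z = 1.645 (90% confidence) is an assumption."""
    return z * math.sqrt(2.0) * sem
```

With the reported SEM of 3.1 for the total score, `minimal_detectable_change(3.1)` gives approximately 7.2 points, which matches the value reported in the Results.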

Structural validity
Exploratory factor analyses (principal components analysis using varimax rotation) were used to assess how scale items distributed into subscales [32]. Analysis was performed on data collected at baseline which included a larger subset of items; and these results contributed to decisions about item reduction. Thus, the factor analysis performed at 3 and 24 months included only the final items.

Construct validity
The following hypotheses/expectations were constructed to assess construct validity of the PRUNE based on convergent relationships expected from theoretical and evidence perspectives [33,34]. The strength of these associations was assessed using Pearson correlations.

[Table: items modified during final testing, with rationale]
Hold an object: Item performance was variable; however, there was strong biomechanical support and patient endorsement that some type of holding an object with the arm bent was difficult. Qualitative interviews indicated that respondents used a variety of reading devices and positions, and were not always clear that the item meant a continuous activity. The item was modified to specify a one-hour interval, to allow multiple options for the object that was held, and to clarify that the elbow is bent.
Eating: Specification of different eating utensils was added for cultural transferability.
Control of the small finger: Different respondents use small, little or fifth finger to indicate the fifth digit. Motor dysfunction related to ulnar nerve compromise could include either deformity or lack of motor control; lay terms were used for these phenomena.
Finger use: Finger use was a common difficulty reported by patients. It was most remarkably noted for keyboarding or musical instrument use, but not all respondents perform these tasks; therefore the question was modified to: Repeated finger movement (like when typing, playing instruments or moving small objects).
Table note: After item generation and initial iterative changes to the PRUNE, a larger potential instrument was tested on respondents. This larger subset of items underwent both cognitive interviewing and statistical analyses to determine the final subset of items included in the PRUNE.

[Table 3 (excerpt): ICF codes linked to PRUNE items]
… Functions related to the force generated by the contraction of a muscle or muscle groups. Inclusions: functions associated with the power of specific muscles and muscle groups, muscles of one limb, one side of the body, the lower half of the body, all limbs, the trunk and the body as a whole; impairments such as weakness of small muscles in feet and hands…
Specific activities items
Eat (use fork, knife, or chopsticks); d550 - Eating: Carrying out the coordinated tasks and actions of eating food that has been served, bringing it to the mouth and consuming it in culturally acceptable ways, cutting or breaking food into pieces, opening bottles and cans, using eating implements, having meals, feasting or dining.
Lift a heavy object; d4300 - Lifting: Raising up an object in order to move it from a lower to a higher level, such as when lifting a glass from the…

Constructed hypotheses for known-group validity
The following known group differences were tested (by ANOVA) to assess construct validity of the PRUNE. The subgroups were defined by an independent assessor through patient interview and examination using criteria defined by Kleinman and Bishop [26].
1. Patients who perceived their global rating of change at 2 years as improved, versus no change or worse
2. Patients who were asymptomatic versus those who had mild-occasional, moderate, or severe symptoms
3. Patients who were able to return to work at their regular job versus those who were unable to work because of continued symptoms
4. Patients whose leisure was unlimited versus those who were limited
5. Patients in whom both grip and pinch reached 80% of the opposite hand, versus those in whom either grip or pinch reached 80% of the opposite hand, versus those in whom both grip and pinch were less than 80% of the opposite hand
6. Patients who had normal sensibility, defined as two-point discrimination less than 5 mm, versus those in whom it was abnormal (greater than 5 mm)
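The known-group comparisons listed above rely on one-way ANOVA. As a minimal sketch of that computation, the function below returns the F statistic and its degrees of freedom for any number of groups. The function name and the example scores are hypothetical and are not study data; in practice a statistics package (for example `scipy.stats.f_oneway`) would also supply the p-value.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared deviations from each group's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up PRUNE total scores for three hypothetical symptom groups.
mild, moderate, severe = [10, 20, 30], [40, 50, 60], [70, 80, 90]
f_stat, df_b, df_w = one_way_anova_f([mild, moderate, severe])
```

A large F relative to its degrees of freedom indicates that the group means differ by more than the within-group scatter would explain.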

Responsiveness
Changes over time were evaluated by calculating a standardized response mean (mean change score divided by the standard deviation of the change scores) and an effect size (mean change score divided by the standard deviation of the initial scores) [35].
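The two responsiveness indices defined above differ only in their denominator, which the following sketch makes explicit. The function names are hypothetical, and the sign convention (baseline minus follow-up, so that improvement on a scale where higher scores are worse is positive) is an assumption made for this example.

```python
from statistics import mean, stdev
from typing import List, Sequence


def change_scores(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Paired change scores; positive values mean improvement (lower follow-up score)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    return [b - f for b, f in zip(baseline, follow_up)]


def standardized_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the standard deviation of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the standard deviation of the baseline scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(baseline)
```

Because the standardized response mean uses the variability of change while the effect size uses the variability at baseline, the two values can diverge when patients change by very similar (or very different) amounts.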

Results
The final version is presented in Figure 1. Patients (See Table 1 for demographics) completed the PRUNE with few missing items (<1.0%).

Content validity
A prototype 25-item scale was developed based on items obtained from item generation and early refinement procedures (expert and patient feedback). Subsequent item reduction of the 25-item prototype scale was based on statistical analyses and cognitive interviewing [36,37]. Cognitive interviewing with patients indicated wide variability in interpretation (and use) in relation to the telephone item, which led to modification of that item (Table 2). ICF codes for the items are presented in Table 3. Pain items were coded to the ICF code for "Sensation of unpleasant feeling indicating potential or actual damage to some body structure felt in either one or both upper limbs, including hands" (b28014). Concepts that relate to severity, like frequency and intensity, are not linked to ICF. Experts considered pain assessment fundamental to self-report and approved the range of qualifiers used for the pain items. The sensory/motor subscale captures four separate ICF codes addressing touch function, sensation related to the skin, control of movement and muscle power. The Specific Activities subscale was linked to six ICF disability codes at the third or fourth level and comprises codes that describe specific ADLs. The Usual Activity items coded to a high ICF level (i.e., chapter), which is consistent with the intent of this subscale to address broad domains of usual activity. In ICF language this subscale addresses self-care, household tasks, major life areas and recreation and leisure.
Using Item Perspective Classification all items were rational judgments because patients needed to recall information over the past week. Pain, symptoms and specific activities were classified as rational biological judgments. The first item of the usual activities subscale addresses personal care and hence falls under a rational judgment within the biological domain. The remaining three items (household, work and recreation) were classified as rational judgments about the social domain.

Item behavior/distributions
The full range of scores was used for all items except for numbness. Boxplots of the items for each subscale at baseline are shown in Figure 2 (a, b, c, d and e).
The 6 pain items (Figure 2a) indicate that the least and most pain items behaved as expected, with the remainder of items falling in a moderate range. From the sensory/motor symptoms subscale (labeled "Other Symptoms"), numbness, pins and needles and weakness in the hand had high overall ratings, whereas the motor control item was less severely affected. Numbness was the only item for which no participant reported a score of zero. Item responses for baseline specific functional activities (Figure 2b) indicated that "Carrying a heavy object" and holding an object with the arm bent were difficult tasks, whereas eating and use of the telephone had lower ratings. Work was the most difficult "usual activity".
The subscale and total score distributions and indicators of central tendency are reported in the boxplots in Figure 2e. These illustrate that subscales are normally distributed.
[Table 3 (excerpt)] Recreational activities; d920 - Recreation and leisure: Engaging in any form of play, recreational or leisure activity, such as informal or organized play and sports, programmes of physical fitness, relaxation, amusement or diversion, going to art galleries, museums, cinemas or theatres; engaging in crafts or hobbies, reading for enjoyment, playing musical instruments; sightseeing, tourism and travelling for pleasure.

Reliability
High ICCs were obtained, with all subscales exceeding 0.90 except for usual activities (0.87). The ICC was 0.98 for the total score. Lower limits of the confidence interval were high, with the exception of usual activities where the confidence interval was wide (Table 4). The standard error of measurement (SEM) and minimal detectable change (MDC) were 3.1 and 7.2 respectively for the total score (Table 4).

Construct validity
The PRUNE was highly discriminative between different functional outcomes across all of the defined subgroups (see Tables 5 and 6 for subgroup scores). The follow-up scores were significantly different between subgroups based on whether they were working or not (16.5 vs. 53.8), were able to do their normal activities or not (10.8 vs. 37.8), or had normal or abnormal 2-point discrimination. All subscale scores and the total score were significantly different between subgroups for both work status and activity status. For sensory subgroups only the sensory/motor (SM) symptoms scale was different between subgroups. All p-values were significant at p < 0.01. There was a linear trend in PRUNE scores based on whether patients reported mild to severe disability (Figure 3). The PRUNE correlated more strongly with the physical health domains of the SF-36 than with the mental health domains (Table 7). The sensory/motor symptoms subscale correlated most strongly with overall physical health status, indicating the importance of the ulnar nerve symptom items. Stronger correlations were observed between more conceptually similar subscales of the PRUNE and SF-36. All of these findings supported the construct validity.

Factor validity
The baseline factor analysis included the items from the longer version of the PRUNE (before final item reduction). Items dispersed into 4 factors representing pain, symptoms, specific function and usual functions at baseline (Table 8). At baseline, the larger subscales (pain, 24%, and specific activity, 25%) explained the largest portion of the variance. The smaller subset of final items included in the 3 and 24 month factor analysis also loaded on these same 4 factors (see Table 9). Pain explained more than 20% of the variance at 3 and 24 months. Although the item "weakness in the hand" loaded most strongly on its assigned subscale (sensory/motor symptoms), it exhibited some cross-loading onto Specific Activity. At 24 months recreational activities cross-loaded on pain and specific activity, although its highest loading was on Usual Activity. Overall, factor analysis supported the structural validity of the subscales.

Responsiveness
A large effect size (and standardized response mean) was observed from baseline to 24 months (Table 10) for all subscales (all SRM > 0.90) and the total score (SRM = 1.55).

Discussion
This study provides evidence that the PRUNE is capable of providing reliable, valid and responsive assessment of symptoms and function experienced by patients with ulnar nerve compression.
Content validity analyses supported the theoretical content of the PRUNE. ICF linking indicated that the PRUNE crosses a number of domains of the ICF. The sensory/motor symptoms scale items linked to ICF codes for touch function, sensation related to the skin, control of movement and muscle power, which fits the conceptual target of measuring symptoms arising from ulnar nerve compression. Use of patients, experts, ICF coding and cognitive content coding provided a comprehensive assessment of the content validity of the PRUNE. Given that content validity is the most foundational element of scale validity, it is critical that content issues be rigorously evaluated and resolved before proceeding to more statistically based clinical measurement evaluations. The extent of content validation performed for this study exceeds that of previous instrument development, which reflects the development of clinical measurement methods for ICF linking and the use of cognitive interviews in measure development.
Box plots indicated that the full range of scores was endorsed on almost every item. However, there were no participants who reported having "zero" numbness, indicating that this was a consistent UNE symptom. Sensory symptoms (numbness and tingling) were rated as being more severe than motor control of the little finger. This is consistent with the pathology of nerve compression, where sensory changes are an early impairment and motor changes occur with more severe or prolonged compression [38].
[Figure/table legend] Although there were variations between subsets of scores, there was a significant linear trend across all scores; patients in the best clinical outcome category were always significantly better than those in the lowest clinical outcome category, regardless of the subscale (p < 0.01).
A reliability coefficient of 0.90 has been recommended for measures to be used on individual patients [39], whereas greater than 0.75 is considered excellent for group comparisons [40]. The reliability coefficients of the total score and subscale scores of the PRUNE were all high, indicating that either total or component scores could be used to make decisions about individual patients. A minimal detectable change of approximately 7 points for the total score suggests that clinicians should be confident that the PRUNE has indicated a true change in symptoms and disability when the score changes by this amount. Some have suggested that a MDC of less than 10% of the score range is excellent. Whether the ICC, SEM or MDC is used to indicate reliability, the PRUNE demonstrated high reliability. We speculate that test-retest reliability can be influenced by the retest interval, the number of items on a subscale, the acuity of the patients tested, and the extent to which the construct being measured is stable and definable by patients. For example, the "usual" activities performed over the past 24 hours can vary and influence how people calibrate that item even when the patient's condition is stable.
Structural validity was supported by factor analysis that indicated the items fell into 4 subscales that matched the proposed structure. Only minor cross-loading was found. Pain explained more than 20% of the variance at all time points. The confirmation that pain and sensory/motor symptoms were separate concepts in the response patterns is important, as it verifies the importance of an ulnar nerve specific measure which goes beyond pain questions to capture these additional, more disease-specific symptoms.
[Table 8 legend] Principal Component Analysis with Varimax rotation and Kaiser Normalization for items in the longer version of the PRUNE, before the final item reductions were completed. Factor loadings over 0.40 are highlighted. (Pain subscale items in bold, sensory/motor symptoms items in italics, specific activity items (SA) underlined and usual activity items (UA) in bold.) Cross loading occurred on a number of items and was one issue considered in the item reduction process (along with the cross correlation between items, and results of patient interviews with respect to item interpretation and clarity).
The known-groups validity analysis supported the ability of the PRUNE to discriminate between different clinical subgroups, such as those who have or have not improved following surgery, or those able or not able to return to work. Known-group differences can be useful clinically as benchmark comparison scores when assessing whether patient profiles match different categories of outcome. There were 10-fold differences in score between those who rated themselves as asymptomatic versus those who experienced severe symptoms, and a linear pattern was present for the scores for mild, moderate and severe ratings. This suggests that increasing PRUNE scores reflect a linear trend of worsening outcomes that mirrors patient and clinician outcome ratings.
The construct validity of the scale was supported since the observed correlations matched the expected convergent relationships. The expectation that stronger relationships would be observed between PRUNE subscales and the physical subscales of the SF-36 was confirmed. We also anticipated that pain might interfere with social roles, which was also confirmed. However, overall, the PRUNE demonstrated low correlation to mental health status, which is consistent with its focus on the physical symptoms of UNE.
Finally, the large effect sizes observed in measuring change over time support the responsiveness of the PRUNE, i.e., its ability to detect change following treatment. This is an important measurement feature because assessing change in response to treatment is the predominant use of outcome measures. A previous study demonstrated a smaller (moderate) effect size for a different PRO, although the intervention was conservative management in that study [14]. Others have cautioned that responsiveness can vary by treatment [35], and thus it would be premature to state that the PRUNE is more responsive than this measure.

Conclusion
This study led to the development of a reliable and valid measurement tool designed specifically for the patient population with ulnar nerve pathology. The next steps in evaluating the PRUNE should include analysis of the measurement properties through Rasch analysis which would address scale and differential item functioning issues (e.g. gender or age effects) not addressed in this study; and analysis of its responsiveness to detect clinical change in head-to-head comparison against other outcome measures. The PRUNE is provided by open access from the developer/copyright owner (J MacDermid-jmacderm@uwo.ca) for free use.