Diagnostic accuracy of sensory and motor tests for the diagnosis of carpal tunnel syndrome: a systematic review

Background: Carpal tunnel syndrome (CTS) is the most common entrapment mononeuropathy of the upper extremity. The previous systematic review of the diagnostic tests for CTS was outdated. The objective of this study was to compile and appraise the evidence on the accuracy of sensory and motor tests used for the diagnosis of CTS.
Methods: MEDLINE, CINAHL, and Embase databases were searched on January 20, 2020. Studies assessing at least one diagnostic accuracy property of the sensory or motor tests for CTS diagnosis were selected by two independent reviewers. The diagnostic test accuracy extension of the PRISMA guidelines was followed. Risk of bias and applicability concerns were rated using the QUADAS-2 tool. Any reported diagnostic accuracy property was summarized. Study characteristics and any information on the accuracy of the sensory and motor tests for CTS diagnosis were extracted.
Results: We included sixteen clinical studies, assessing thirteen different sensory or motor tests. The most sensitive test for CTS diagnosis was the Semmes-Weinstein monofilament test (with 3.22 in any radial digit as the normal threshold), with sensitivity from 0.49 to 0.96. The tests with the highest specificity (Sp) were palmar grip strength (Sp = 0.94), pinch grip strength (Sp from 0.78 to 0.95), thenar atrophy (Sp from 0.96 to 1.00), and two-point discrimination (Sp from 0.81 to 0.98).
Conclusions: The evidence was inconclusive on which sensory or motor test for CTS diagnosis had the highest diagnostic accuracy. The results suggest that clinicians should not use a single sensory or motor test when deciding on CTS diagnosis.
Trial registration: PROSPERO CRD42018109031, on 20 December 2018.


Background
Carpal tunnel syndrome (CTS) is the most common compression neuropathy of the upper extremity and results from median nerve entrapment in the carpal canal [1]. Persons with CTS have sensory or motor problems in the area innervated by the median nerve [2]. The prevalence of CTS has been estimated to be 4-5% in the general population, with a higher prevalence in the working population [3].
In its latest guideline, the American Academy of Orthopaedic Surgeons (AAOS) has categorized CTS clinical diagnostic tests into four main categories: 1) provocative maneuvers (e.g. Durkan's test, Phalen's test), 2) sensory and motor tests (e.g. heat/cold sensation, thenar muscle atrophy), 3) questionnaires and scales (e.g. Boston carpal tunnel questionnaire, CTS-6 scale), and 4) hand symptom diagrams/maps (such as Katz and Stirrat's hand symptom diagram) [4]. Clinical diagnostic tests have the advantages of being quick, inexpensive, and painless, and of yielding immediate results.
A systematic review (SR) of the diagnostic accuracy of clinical examination tests was conducted by one of our research team members in 2004 and is now outdated [5]. Several original studies published after 2004 have not been included in any other review in the past 16 years [6-11]. This paper is one of a series of updated SRs related to the diagnostic accuracy of CTS clinical diagnostic tests categorized by the AAOS. We previously published an SR of scales, questionnaires, and hand symptom diagrams [12]. The focus of this SR is on sensory and motor tests, and we aimed to identify, critically appraise and synthesize the evidence on the diagnostic accuracy of the sensory and motor tests for diagnosing CTS in individuals with suspected CTS.

Methods
We registered the protocol of this SR on December 20, 2018 with the International Prospective Register of Systematic Reviews (PROSPERO), under registration number CRD42018109031 [13]. We followed the Diagnostic Test Accuracy extension of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (DTA-PRISMA) [14] and the Cochrane Collaboration guidelines in developing and reporting this SR [15].

Information sources
We conducted a systematic computerized search of MEDLINE and Embase through Ovid, as well as CINAHL, all from inception until January 20, 2020. We developed our search strategy over two meetings in consultation with a health science research methodologist librarian at McMaster University. We originally developed a search strategy that captured all four categories of clinical diagnostic tests outlined by the AAOS. However, due to the large number of search results and the variety of identified tests, we focused only on sensory and motor tests in this SR to improve readability for the target audience. Our search strategy included search terms for three main concepts: CTS, diagnostic accuracy properties, and names of the diagnostic tests for CTS. The search strategy can be found in Appendix A.

Study selection
Two authors (AD, JY) independently selected studies in two consecutive phases. In the first phase of study selection, titles and abstracts of the retrieved citations were reviewed against a pre-determined set of eligibility criteria. To enhance the quality of the review process, AD and JY initially reviewed 100 of the citations and resolved their disagreements through discussion. In the second phase, after retrieving the full texts of the included articles, the two authors again independently assessed the eligibility of the articles for inclusion in this SR. The kappa agreement between the authors in the first phase of screening (titles and abstracts) was calculated using Stata statistical analysis software, version 15 [16]. Kappa values below 0.20 suggest poor agreement, and values larger than 0.80 indicate near-perfect agreement [17]. Any disagreements between AD and JY in the process of study selection were resolved by the most experienced research team member (JM) through discussion.
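As an illustration of the agreement statistic mentioned above, the following minimal sketch computes Cohen's kappa for two reviewers making include/exclude decisions. The function name and the example counts are hypothetical and are not taken from this review; the review itself used Stata, and this sketch only shows the standard formula (observed agreement corrected for chance agreement).

```python
def cohens_kappa(both_include, a_only, b_only, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions.

    Arguments are the four cells of the reviewer-by-reviewer agreement table.
    """
    n = both_include + a_only + b_only + both_exclude
    if n == 0:
        raise ValueError("no decisions to compare")
    observed = (both_include + both_exclude) / n
    # Marginal probability that each reviewer votes "include".
    a_include = (both_include + a_only) / n
    b_include = (both_include + b_only) / n
    # Agreement expected by chance if the reviewers decided independently.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical pilot round of 100 citations (invented counts).
kappa = cohens_kappa(both_include=10, a_only=3, b_only=2, both_exclude=85)
print(f"kappa = {kappa:.2f}")
```

With raw agreement of 95 out of 100 in this invented example, kappa is noticeably lower than 0.95 because most of the agreement on exclusions would be expected by chance alone.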

Eligibility criteria
We did not exclude any studies based on their language, sample size, choice of reference standard, or gender of the included participants. We included studies that met the following inclusion criteria.

Design
We included case-control, cross-sectional, and cohort (both retrospective and prospective) study designs that were in a full-report format.

Participants
We included studies on persons who were diagnosed with or suspected to have CTS and were older than 18 years. The studies must have had a control group of people diagnosed with any type of upper limb musculoskeletal, neurological, or vascular condition, such as cervical radiculopathy or De Quervain's tenosynovitis. We excluded studies that had healthy control groups, as healthy control groups would falsely inflate the diagnostic accuracy properties and do not reflect actual clinical settings.

Diagnostic test
We included studies that assessed the diagnostic accuracy of at least one sensory or motor test for CTS diagnosis.

Comparison
Since there is no gold standard for CTS diagnosis, we decided to accept studies with any reference standard, ranging from electrodiagnostic testing to carpal tunnel release surgery and clinical examination tests.

Outcome
We included articles that reported at least one diagnostic accuracy property, such as sensitivity (Sn), specificity (Sp), positive predictive value (PPV), or negative predictive value (NPV), or that provided enough data on their test results to enable us to (re)construct 2 × 2 contingency tables.

Time
We included studies from any time frame that reported the diagnostic accuracy of the sensory or motor tests for CTS diagnosis.

Data extraction
Initially, MG and AD extracted data from three of the included studies, and since the agreement was high, MG did the remainder of the extraction independently, and AD cross-checked the information. We used a self-developed, pre-determined extraction sheet previously developed to extract information for an SR of the diagnostic accuracy of scales, questionnaires and hand symptom diagrams for CTS diagnosis. We extracted the following data: 1) Information about the studies, such as authors, study design, year and country, and conflicts of interest. 2) Information on the participants, such as sample size, age, gender, inclusion and exclusion criteria, diagnoses, severity and duration of symptoms, and CTS prevalence in the sample. 3) Information regarding the index test, index test methodology and threshold criteria for positive results, as well as information on the reference standard. 4) Any information on the diagnostic accuracy properties of the sensory or motor tests, such as Sn, Sp, NPV, PPV.

Data synthesis and analysis
We extracted information on Sn, Sp, PPV, NPV, positive likelihood ratio (+LR), negative likelihood ratio (−LR) and their associated 95% confidence intervals (95% CIs) from the included studies, where possible. When this information was not directly reported in the studies, we tried to calculate it by reconstructing 2 × 2 contingency tables based on the available data on true and false positives and negatives. PPV and NPV are affected by the prevalence of the condition in the sample; for instance, an increase in the prevalence of a given condition in a sample increases the PPV and decreases the NPV [18]. To overcome this issue with NPV and PPV, we tried to calculate and report +LR and −LR, where possible. Likelihood ratios are independent of the prevalence of the condition in the sample and provide a more accurate clinical judgment [18]. The likelihood ratios are interpreted as follows: +LR > 10 and −LR < 0.1 indicate a great change in the posttest probability and are very valuable in the clinical decision-making process [18]. +LR of 5 to 10 and −LR of 0.1 to 0.2 indicate a moderate change in the posttest probability of having a condition [18]. +LR of 2 to 5 and −LR of 0.2 to 0.5 indicate a slight change in the posttest probability [18]. Lastly, +LR < 2 and −LR > 0.5 have no clinical value in calculating the posttest probability [18].
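The calculations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in this review: the class and function names are hypothetical, the example counts are invented, and the handling of values that fall exactly on a band boundary (for example a +LR of exactly 5) is an assumption, since the interpretation bands above share their endpoints.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from cross-classifying an index test against a reference standard."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def positive_lr(self) -> float:
        # +LR = Sn / (1 - Sp); the false positive rate is computed from counts.
        false_positive_rate = self.fp / (self.fp + self.tn)
        if false_positive_rate == 0:
            return float("inf")
        return self.sensitivity() / false_positive_rate

    def negative_lr(self) -> float:
        # -LR = (1 - Sn) / Sp
        false_negative_rate = self.fn / (self.fn + self.tp)
        if self.specificity() == 0:
            return float("inf")
        return false_negative_rate / self.specificity()


def interpret_positive_lr(lr: float) -> str:
    """Change in posttest probability; boundary handling is an assumption."""
    if lr > 10:
        return "great"
    if lr >= 5:
        return "moderate"
    if lr >= 2:
        return "slight"
    return "none"


def interpret_negative_lr(lr: float) -> str:
    """Change in posttest probability; boundary handling is an assumption."""
    if lr < 0.1:
        return "great"
    if lr <= 0.2:
        return "moderate"
    if lr <= 0.5:
        return "slight"
    return "none"


# Invented counts: 100 people with CTS and 100 without.
table = TwoByTwo(tp=32, fp=6, fn=68, tn=94)
print(f"Sn={table.sensitivity():.2f} Sp={table.specificity():.2f}")
print(f"+LR={table.positive_lr():.2f} ({interpret_positive_lr(table.positive_lr())})")
print(f"-LR={table.negative_lr():.2f} ({interpret_negative_lr(table.negative_lr())})")
```

The invented counts give Sn = 0.32 and Sp = 0.94, so the sketch shows how a test can be useful for ruling in (moderate +LR) while contributing little to ruling out (−LR above 0.5).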
We categorized and presented the information on the diagnostic accuracy of the sensory and motor tests for CTS diagnosis in separate tables. The results were grouped into 'sensory tests for CTS diagnosis' and 'motor tests for CTS diagnosis', with each category organized by the frequency of the diagnostic test being assessed. Due to the heterogeneity of the data (different sample characteristics, different index and reference tests methodology and criteria for positive results) we could not conduct a meta-analysis.

Assessment of risk of bias and applicability concerns
Two authors (AD, JY) independently rated the risk of bias and applicability concerns of the included studies based on the revised tool for the quality assessment of diagnostic accuracy studies (QUADAS-2) [19]. In case of any disagreement in rating the quality of the studies, a third research team member (JM) was engaged and the disagreement was resolved through discussion. The QUADAS-2 tool assesses risk of bias in four domains: a) patient selection, b) index test, c) reference standard, and d) study flow and timing [19]. Moreover, QUADAS-2 rates the applicability concerns in three domains addressing patient selection, index test, and reference standard [19].

Results
We identified 5552 citations through the electronic database search. After removing the duplicates, we reviewed the titles and abstracts of 4052 citations. In the second phase of screening, we reviewed the full texts of 161 articles, of which 16 articles were included in this SR (Fig. 1, PRISMA diagram). The reviewers had a kappa agreement of 0.70 (SE: 0.02, 95% CI = [0.66-0.74]) in screening the titles and abstracts. The studies were conducted in the USA, Sweden, France, Canada, Spain, Portland, Italy, and Turkey. Appendix B summarises the reported conflicts of interest of the included studies. The characteristics of the included studies are presented in Table 1. All of the studies had prospective cross-sectional designs, except for two studies that had retrospective designs [6,23] and one that had a prospective cohort design [11].
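The reported confidence interval for kappa is consistent with a normal-approximation interval built from the estimate and its standard error. The sketch below shows that arithmetic; the function name is hypothetical, and the assumption that the interval was computed this way is ours, as the source does not state the method.

```python
def normal_approx_ci(estimate, standard_error, z=1.96):
    """Two-sided interval: estimate +/- z * standard error (z = 1.96 for 95%)."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


low, high = normal_approx_ci(0.70, 0.02)
print(f"95% CI = [{low:.2f}-{high:.2f}]")  # prints: 95% CI = [0.66-0.74]
```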
Participants' characteristics are summarized in Table 3, including their age, gender, duration and severity of symptoms, sampling method, process of selection, and eligibility criteria. Overall, 2763 individuals were included in these studies, of whom 1131 had CTS.

Risk of bias and applicability concerns of the included studies
All of the studies had a low risk of bias rating in the patient selection domain of the QUADAS-2: they enrolled a consecutive sample of participants and avoided a case-control design and inappropriate exclusions. Six studies had unclear risk of bias ratings in the index test domain, because it was unclear whether the index test results were interpreted without knowledge of the results of the reference standard. Three studies had high, seven studies had unclear, and six studies had low risk of bias in the reference standard domain. The main reason for poor ratings in this domain was the lack of blinding of the person performing the reference standard test. Eleven studies had unclear ratings on the flow and timing domain, because there was no mention of the appropriate interval between index and reference standard test administration.
Regarding the applicability concerns of the included studies, nine studies had low concerns, four had unclear, and three studies had high concerns. In the patient selection domain, three studies had high, one study had unclear, and eleven studies had low applicability concerns. In the index test domain, only three studies had unclear concerns and the rest of the studies (thirteen studies) had no concerns regarding applicability. Lastly, in the reference standard domain, one study had high concerns, two studies had unclear concerns, and thirteen studies had no concerns regarding applicability.

Table 2: test methods and criteria for a positive result, by test

Functional dexterity test [10]
Method:
• Was administered independently on both sides. The task was to overturn all the pegs using only the movement of the first three fingers of a hand (without supinating or pronating the forearm or resting the elbow), starting from the top and the opposite side from the hand with which the test is performed. A penalty of 5 s was added to the time taken to complete the test each time the patient pronated the forearm or touched the edge of the hole with the peg, and a penalty of 10 s if the patient dropped the peg [10].
Criteria for a positive result:
• If the total time is greater than the value corresponding to the 97th percentile of the normative data of the healthy Italian population, corrected by sex and age class [10].

Graphesthesia [20]
Method:
• A figure was written on the finger pad with a blunt pencil [20].
Criteria for a positive result:
• The threshold was defined as the height in mm of the smallest figure that was identified by the patient [20].

Hand grip strength [22,29]
Method:
• Measured using a Jamar Hydraulic Hand Dynamometer (J.A. Preston Corporation, Jackson, Michigan) [22].
• Measured using either the Jamar dynamometer (Preston, Jackson, MI) or the Greenleaf Solo System (Palo Alto, CA). Grip was measured at each setting (I to V). Key (side-to-side) pinch, 3-jaw (tripod) pinch, and tip-to-tip pinch strengths were also measured using the Greenleaf Solo System. Each test was performed 3 times and the resultant mean values were used for data analysis [29].
Criteria for a positive result:
• Hand grip and palmar pinch grip results were considered abnormal if they were more than 1.65 standard deviations below the mean for persons of the same age and sex [22].
• Grip strength was evaluated by comparing subjects' right hands with their left hands. Strength was considered diminished if grip strength at position III on the dynamometer was more than 12% less on the affected side than the contralateral side. The same assumptions were applied to key pinch, 3-jaw (tripod) pinch, and tip-to-tip pinch strengths [29].

Hypoesthesia [20,24,28]
Method:
• The sensibility screening was carried out with cotton wool, pins and warm and cold metallic rollers (40°C and 20°C, respectively) [20].
• A pinwheel was rolled across the palmar aspect of the index and small fingers [24].
• The sensitivity was evaluated by perception of pinprick [28].
Criteria for a positive result:
• The test was considered positive if the subject reported hypoesthesia of the index finger compared with the small finger [24].

Pinch grip strength [22,26]
Method:
• Measured with a B&L Pinch Gauge (B&L Engineering, Santa Fe Springs, California) [22].
• The pinch was performed by having the patient actively pinch a piece of paper between the tips of the thumb, index and long fingers using MP flexion and IP extension [26].
Criteria for a positive result:
• Hand grip and palmar pinch grip results were considered abnormal if they were more than 1.65 standard deviations below the mean for persons of the same age and sex [22].
• Positive if symptoms were reproduced within 60 s [26].

Semmes-Weinstein monofilament testing (SWMFs) [6,8,21,25-27,29]
Method:
• The 20-piece kit of SWMFs (North Coast Medical, San Jose, CA) was used to test sensory thresholds of the tips of the thumb, the index finger, and the long and small fingers using standard clinical techniques. Monofilaments were applied three times, with a positive response in one or more of the applications indicating that the stimulus was perceived [25].
• SWMF testing was done on the distal palmar pad of each digit of the hand with enough force to bow the monofilament for a total of 1.5 s. The monofilaments were applied three times, with a positive response to one or more of the applications indicating that the stimulus was perceived [27].
• The monofilament was applied 3 times to each digit and the palm; a patient's affirmative response to 1 or more of the monofilament applications indicated the stimulus was perceived. The monofilament kit contains 5 monofilaments to mark 5 selected thresholds: 2.83 (normal), 3.61 (diminished light touch), 4.31 (diminished protective sensation), 4.56 (loss of protective sensation), and 6.65 (loss of deep pressure sensation). The numeric value represents the logarithm of 10 times the force in milligrams required to bow the monofilament. All subjects were tested with their wrists in neutral position. The tests were then repeated after the subjects held their wrists flexed (Phalen's maneuver) for 5 min [29].
Criteria for a positive result:
• Recorded thresholds were categorized as normal or abnormal using four decision rules and two criterion measures. The decision rules were (1) a threshold higher than 2.83, (2) a threshold higher than 2.83 and higher than the threshold of the small finger (D5), (3) a threshold higher than 3.22, and (4) a threshold higher than 3.22 and higher than the threshold of the small finger. The two criterion measures were (1) the highest threshold of the three radial digits (D1-D3) and (2) the threshold of the long finger alone (D3) [25].
• A classification of abnormal was assigned if the SWMF threshold for any of the radial three digits was greater than 2.83 and greater than the threshold for the small finger [6,26].
• Two separate sets of criteria [27]:
  • SWM 1: a positive test was defined as stimulus perception by the patient in any one of the radial three digits at a threshold value of 2.83 or an absent stimulus perception.
  • SWM 2: a positive test was defined by stimulus perception at threshold value of 2.83 or an absence of stimulus perception using only digit 3 and using digit 5 for internal comparison. The patient must have had a digit 3 SWM test of 2.83 and a digit 5 test of 2.83.

Tactile thresholds [20]
Method:
• Pulses consisted of half sinusoids of 100 Hz from a Bruel & Kjaer shaker and were applied perpendicularly to the skin of the finger pads via a 2 mm diameter blunt plastic probe. The amplitude of the stimulus pulse was increased or decreased in small increments [20].
Criteria for a positive result:
• The lowest amplitude that was felt in at least three of four consecutive stimulations was taken as the "yes response", and the lowest amplitude that was not felt in 3 of 4 stimulations as the "no response". The threshold was defined as the average of these 2 values [20].

Thenar atrophy [9,30]
Method:
• Thenar atrophy was defined as concavity of the thenar muscle group along the plane parallel to the palm and was scored as either present or absent [7].
Criteria for a positive result:
• No description

Thumb abduction weakness [24,28,30]
Method:
• The subject placed the touch pads of the thumb and small finger together. The examiner then applied a strong posteriorly directed force at the thumb interphalangeal joint toward the metacarpophalangeal joint of the index finger while instructing the subject to give maximum effort to keep the touch pads together [24].
• The strength of the abductor pollicis was tested, ensuring that the thumb was parallel to the index finger and the movement was occurring at the metacarpal trapezial joint [28].
Criteria for a positive result:
• The test was positive if any weakness was detected [24].

Two-point discrimination (2PD) [20,21,26]
Method:
• The gap was successively decreased between the 2 points of a pair of blunted dividers, applied perpendicularly to the pulp of the finger [20].
• Static 2PD was tested on the pulp of the index finger using the Disk-Criminator [21].
• Moving (dynamic) 2PD was tested with electrocardiogram calipers with tips set 4 mm apart. The index and fifth fingertips were stroked five times with either one or two caliper tips [23].
• Two-point discrimination was performed in order to determine sensory loss. The Dellon discriminator was used on the index and third-finger fingertips [7].
Criteria for a positive result:
• The threshold was defined as the smallest gap in mm at which the patient could identify that there were 2 points [20].
• Normal was taken as less than 6 mm [21].
• Failure to identify correctly the number of points on two or more strokes was considered abnormal [23].
• Greater than or equal to 6 mm was accepted as altered sensation [7].

Vibrometry [20-22,26]
Method:
• A 100 Hz sine wave was produced by an electromagnetic vibrator. The peak-to-peak vertical movement of the 13 mm diameter blunt stimulus probe was recorded continuously in microns by means of an accelerometer. The variable tissue damping of the vibration amplitude was thus excluded as a source of error [20].
• Tested by the application of a branch of a tuning fork (256 cycles per second) to the pulp of the index finger and comparing the perceived intensity to that in the little finger in the same hand [21].
• Determined in the 2nd finger of each hand with a Vibratron II (Physitemp, Clifton, New Jersey) using a standard psychophysical technique and published normal values based on age and height [22].
• Testing with the prong of a 256 cycle per second tuning fork was performed on the fingertip [26].
Criteria for a positive result:
• The perception threshold was determined with the method of limits, i.e. as the average of appearance and disappearance thresholds when the stimulus was successively increased and decreased. Vibratory threshold was determined at least 3 times at each site and the mean was calculated [20].
• A vibratory threshold was considered abnormal if it was more than 1.65 standard deviations above the mean for persons of that age and height [22].

Von Frey hairs [20]
Method:
• A series of 10 nylon filaments of different diameters and length with logarithmically spaced bend pressures from 0.02 to 10 g were applied perpendicularly to the pulp of the finger. Each filament was applied 10 times at irregular intervals (to avoid the error of rhythmical response) [20].
Criteria for a positive result:
• The threshold was defined as the pressure which was felt closest to half of the 10 stimulations [20].
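To make the Semmes-Weinstein monofilament entries in the methods table above more concrete, the following sketch encodes two things stated there: the relation between a filament marking and its bending force (the marking is the base-10 logarithm of 10 times the force in milligrams), and the four decision rules combined with the two criterion measures. All function and parameter names are hypothetical, the example hands are invented, and this is an illustration of the stated rules rather than the procedure used in any included study.

```python
RADIAL_DIGITS = ("D1", "D2", "D3")


def filament_force_mg(marking):
    """Invert marking = log10(10 * force in mg) to get the force in mg."""
    return 10 ** marking / 10


def swmf_abnormal(thresholds, cutoff=2.83, compare_to_d5=False,
                  criterion="radial_max"):
    """Classify one hand as abnormal (True) or normal (False).

    thresholds    dict of filament markings per digit, keys "D1".."D5"
    cutoff        2.83 or 3.22, the marking that must be exceeded
    compare_to_d5 if True, the value must also exceed the small finger (D5)
    criterion     "radial_max" (highest of D1-D3) or "d3" (long finger only)
    """
    if criterion == "radial_max":
        value = max(thresholds[d] for d in RADIAL_DIGITS)
    elif criterion == "d3":
        value = thresholds["D3"]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    abnormal = value > cutoff
    if compare_to_d5:
        abnormal = abnormal and value > thresholds["D5"]
    return abnormal


# Invented example: raised thresholds in the index and long fingers only.
hand = {"D1": 2.83, "D2": 3.61, "D3": 3.61, "D5": 2.83}
print(swmf_abnormal(hand, cutoff=2.83))                      # rule 1
print(swmf_abnormal(hand, cutoff=3.22, compare_to_d5=True))  # rule 4
print(round(filament_force_mg(3.61)), "mg bends the 3.61 filament")
```

Comparing against the small finger makes the rule stricter: a hand with uniformly raised thresholds in all digits, which points away from an isolated median nerve problem, is classified as normal under rules 2 and 4.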

Diagnostic accuracy of sensory tests for CTS diagnosis
The diagnostic accuracies of the SWMFs, two-point discrimination, vibrometry, hypoesthesia, tactile thresholds, Von Frey hairs, graphesthesia, and warm and cold thresholds were assessed in the included studies. See Tables 4 and 5 for detailed results.
The Semmes-Weinstein monofilaments (SWMFs) test was assessed in seven of the included studies [6,8,21,25-27,29]. The reported sensitivities and specificities ranged from 13 to 98% and from 9 to 93%, respectively [6,8,21,25-27,29]. The authors of this SR calculated +LR and -LR, which ranged from 1.6 to 7 and from 0.98 to 0.12, respectively. Different decision rules were tested in the studies, which resulted in different diagnostic accuracies; these are summarized in Table 4. In the study by Szabo et al. 1999, the SWMFs test was performed in two positions, neutral and Phalen's position (90 degrees of wrist flexion) [29]. The results from this study indicated a better diagnostic accuracy for the SWMFs test when done with wrist flexion (Sn = 83%, Sp = 44%, +LR = 1.48, −LR = 0.38) [29]. Furthermore, Szabo et al. calculated the PPV and NPV based on five hypothetical CTS prevalences, ranging from 1 to 20% [29]; the details of this analysis are summarized in Table 4.
The two-point discrimination test was assessed in four studies [7,20,21,23]. In the study by Borg & Lindblom, only the Sn was calculated, which was 30% [20]. In the other three studies, the Sn was 6, 32, and 63%, the Sp was 98, 81, and 85%, the +LR was 3, 1.68, and 4.2, and the -LR was 0.95, 0.84, and 0.43 [7,20,21]. In the study by Katz et al. 1990, PPV and NPV were calculated based on two CTS prevalences [23]. In a sample with 40% CTS […] [23]. In sample 2, with a CTS prevalence of 15%, the PPV was 23% and the NPV was 87% [23].
Vibrometry was assessed in four studies [20-22,26]. In the study by Borg et al. 1988 [20], only Sn was calculated for vibrometry testing, which was 52%. Franzblau et al. incorporated three different reference standards, which were NCS; NCS + symptoms consistent with CTS; and physical examination findings and symptoms consistent with CTS [22]. The highest diagnostic accuracy values occurred when taking physical examination findings as the reference standard (Sn = 11%, Sp = 93%, +LR = 1.57, −LR = 0.95) [22]. In the study by MacDermid et al. 1997, two testers performed the vibrometry [26], which resulted in different diagnostic accuracies as summarised in Table 5.
Lastly, tactile thresholds, Von Frey hairs, graphesthesia, and warm and cold thresholds were each assessed in only one study [20]. In this study by Borg & Lindblom, only the Sn was calculated, which was 52% for tactile thresholds, 52% for the Von Frey hairs test, 24% for graphesthesia, and 15% for warm and cold thresholds [20]. Borg & Lindblom assessed the diagnostic accuracy of six sensory tests, which were vibrometry, two-point discrimination, tactile thresholds, Von Frey hairs, graphesthesia, and warm and cold thresholds. They called this combination quantitative sensory testing (QST), and it had a Sn of 82% [20].

Diagnostic accuracy of motor tests for CTS diagnosis
The motor tests assessed in the included studies were thumb abduction weakness, thenar atrophy, hand grip strength, pinch grip strength, and functional dexterity tests. Each test is summarized below, and detailed information can be found in Table 6.
Thumb abduction weakness was assessed in three studies [24,28,30]. The reported sensitivities and specificities from these studies ranged from 12.1 to 66% and from 66 to 73%, respectively [24,28,30]. As calculated by the authors of this study, the +LR were 1.37 and 1.94, and the -LR were 0.51 and 0.86 for thumb abduction weakness testing [24,30]. We could only obtain the values for sensitivity from the Raudino 2000 study [28].
Thenar atrophy was assessed in three studies [7,9,11]. The Sn of the thenar atrophy test was low, with values ranging from 5.5 to 22%, but it was a highly specific test, with Sp ranging from 96 to 100% [9,11].
Hand grip strength was assessed in two studies. In Franzblau et al.'s study [22], hand grip strength was compared to three different reference standards: 1) electrodiagnosis, 2) electrodiagnosis and symptoms consistent with CTS, and 3) physical examination findings and symptoms consistent with CTS [22]. The highest diagnostic accuracy results came from taking physical examination findings and symptoms consistent with CTS as the reference standard, which yielded a Sn of 32% and a Sp of 94% [22]. As calculated by the authors of this study, hand grip strength testing had a +LR of 5.33 and a -LR of 0.72. In addition, Szabo et al. 1999 found that hand grip strength had the following diagnostic accuracy: Sn = 48% (95% CI 26-70%), Sp = 30% (95% CI 14-46%) [29]. Positive and negative predictive values were calculated using five hypothetical CTS prevalences, which are summarized in Table 6. In general, the lowest CTS prevalence (1%) resulted in the worst PPV (1%) and the best NPV (98%) [29].
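The dependence of predictive values on prevalence can be reproduced from the reported Sn and Sp with Bayes' rule. The sketch below is illustrative: the function name is hypothetical, and only the 1% and 20% endpoints of the prevalence range come from the text, so the intermediate prevalences in the loop are assumed values chosen for illustration.

```python
def predictive_values(sn, sp, prevalence):
    """Return (PPV, NPV) for a test with the given Sn and Sp at a prevalence.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sn * prevalence
    false_neg = (1 - sn) * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    true_neg = sp * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hand grip strength as reported by Szabo et al.: Sn = 48%, Sp = 30%.
# Intermediate prevalences are assumptions; the text gives only the range.
for prevalence in (0.01, 0.05, 0.10, 0.15, 0.20):
    ppv, npv = predictive_values(0.48, 0.30, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

At a prevalence of 1% this yields a PPV of about 1% and an NPV of about 98%, matching the values quoted above, and it shows the PPV rising and the NPV falling as prevalence increases.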
Pinch grip strength was assessed in two studies. In a study by MacDermid et al., two testers performed the pinch grip strength testing and identified Sn of 72 and 70% for testers 1 and 2, respectively [26]. The Sp values […] [10].

Reference standards for CTS diagnosis
Out of the 16 included studies, 11 studies had nerve conduction studies (NCS) as their reference standard. These studies had different criteria for positive test results, which are summarized in Appendix C. In the remaining five studies, the following reference standards were considered. Borg & Lindblom (1988) [20] used a combined battery of tests as the reference standard. This combined battery of tests included formal CTS screening, the neurological examination and the […] [25,26]. Finally, Franzblau et al. 1993 had three different reference standards, which were 1) NCS; 2) NCS + surveillance symptom definitions for CTS; and 3) physical examination + surveillance symptom definitions for CTS [22].

Discussion
This study synthesized sixteen clinical studies reporting on thirteen different sensory and motor tests. Among these tests, none had consistent evidence for high diagnostic accuracy. These results suggest clinicians should not rely on the results of one single sensory or motor test for CTS diagnosis, and should instead use a combination of several sensory and motor tests, or other combinations of tests from different AAOS categories, to rule in or rule out CTS.
In this SR, we found that the most specific tests for CTS diagnosis were the hand (palmar) grip strength test (Sp of 94%) [22], pinch grip strength (Sp from 78% [26] to 95% [22]), thenar atrophy (Sp from 96 to 100%) [7,9,30], and 2PD (Sp from 81 to 98%) [7,21,23]. Tests with high Sp identify true negative cases with great precision and produce few false positives [18]. This feature can assist clinicians in differentiating between CTS and non-CTS cases. Of the included sensory and motor tests, the most sensitive for CTS diagnosis was the SWMFs test, using the 3.22 monofilament size in any radial finger as the normal threshold, with Sn values ranging from 49% [6] to 96% [26]. Tests with high sensitivity have few false negative results, which is an important factor for screening purposes [31]. In other words, when the objective of a clinician is to screen people with suspected CTS, they should use a highly sensitive test (low specificity values are tolerable); therefore, the SWMFs test is potentially a useful screening tool.
Our results confirm the findings of a recent clinical practice guideline by the Academy of Hand and Upper Extremity Physical Therapy and the Academy of Orthopedic Physical Therapy of the American Physical Therapy Association [32]. This guideline recommended using the SWMFs test with either 3.22 or 2.83 as the normal threshold for mild to moderate CTS cases, and for more severe CTS cases, a 3.22 threshold should be considered. Compared to the previous SR on this topic by MacDermid and Wessel in 2004 [5], in this updated SR we mainly focused on a sample with no healthy controls, and this was the main difference between the two SRs. Moreover, MacDermid and Wessel concluded that the most specific (but not sensitive) tests for CTS were the 2PD and testing of thumb abduction weakness [5]. We did find the 2PD to be one of the most specific tests; however, the palmar and pinch grip strength tests and atrophy of the thenar muscles proved more specific than the thumb abduction weakness test.
Only two studies reported the prevalence of CTS in the underlying population from which they sampled their participants [23,29]. Prevalence is important when considering applying the results, since the pretest probability is determined by the prevalence [18]. Settings with a higher prevalence of CTS, such as electrodiagnosis labs and hand therapy clinics, likely have a higher pretest probability of CTS than other screening contexts such as pre-employment screening, where the prevalence would be expected to be very low. Except for two studies [8,22], all of the included studies recruited their participants from neurophysiology/electrodiagnosis and hand clinics. To overcome the effect of CTS pretest probability, we ensured likelihood ratios were reported in this SR. Likelihood ratios report diagnostic accuracy independent of the prevalence of a condition in a given sample, and it is suggested that clinicians consider likelihood ratios in their clinical diagnosis decision making [18].
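A likelihood ratio is applied in practice by converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back to a probability. The sketch below illustrates this standard calculation; the function name is hypothetical and the two pretest probabilities (40% for a specialist clinic, 1% for a screening context) are assumed values for illustration, not estimates from this review.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Update a pretest probability with a likelihood ratio via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Same positive test result (+LR = 5.33) in two assumed settings.
for setting, pretest in (("specialist clinic", 0.40), ("screening", 0.01)):
    posttest = posttest_probability(pretest, 5.33)
    print(f"{setting}: pretest {pretest:.0%} -> posttest {posttest:.0%}")
```

The same positive result moves an assumed 40% pretest probability to roughly 78%, but an assumed 1% pretest probability only to about 5%, which is why the recruitment setting matters when applying the reported accuracy values.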
Administration methods of the sensory and motor tests for CTS diagnosis were very diverse across the included studies. For instance, the four studies assessing the diagnostic accuracy of vibrometry used four different testing methods and different decision rules for a positive test result. The same applies to the hand grip strength, hypoesthesia, pinch grip strength, and SWMF tests. We advise that clinicians and researchers carefully consider their ability to replicate the test methods (as reported in Table 2) when selecting a sensory or motor test to rule in/out CTS.
We did not exclude studies based on the choice of reference standard. Due to the lack of a gold standard for CTS diagnosis [4], and the nature of CTS as a clinical syndrome, there is no universal agreement on a reference standard. The most commonly used reference standard in the included studies was NCS. While some might consider NCS the most definitive reference standard, it can have false positive and false negative results [4]. That is, there can be abnormal results in patients who have no symptoms, and patients with persistent symptoms but without positive NCS findings can show benefit following carpal tunnel release. Similar to our previous SR of the diagnostic accuracy of scales, questionnaires and hand symptom diagrams [12], the highest sensitivities and specificities occurred when taking other clinical tests and history as the reference standard [8,22,25,26]. For instance, in the study by Dale et al. 2011, among the three reference standards used, the highest diagnostic accuracy values occurred when taking Katz and Stirrat's hand symptom diagram as the reference standard [8].

Study limitations and future directions
A limitation of the current study was that we did not conduct a meta-analysis. Heterogeneity in the test methods, reference standards, and decision rules for positive test thresholds precluded meta-analysis, so we reported the results narratively. A second limitation is the possibility of publication bias, because we included only published literature and not the gray literature. This choice is justifiable because we intended to produce a synthesis of the available peer-reviewed, evidence-based literature. As with any other review, we might have missed some studies. Although we designed the search strategy in consultation with a health science research librarian, it is possible that we did not capture all of the available evidence.
We recommend future studies produce evidence with the highest quality and the lowest risk of bias by adhering strictly to the established guidelines. Moreover, there is a great need for studies assessing the clinical triangulation process of combining several categories of clinical diagnostic tests.

Conclusion
The evidence reported in this study was obtained mostly from studies at risk of bias. Among the included studies, none of the sensory or motor tests had consistently high diagnostic accuracy properties reported by high quality evidence. Confirming the value of a single sensory or motor test for CTS diagnosis awaits robust future research. From the evidence available at present, none of these methods appears promising in helping to make a definitive diagnosis in the individual patient (though they are useful in demonstrating that both sensory and motor function are impaired by CTS when used in cohorts of patients in research studies).

Appendix
1) Reference standard: NCS
Description: Bilateral limited electrophysiologic testing of the median and ulnar nerves at the wrists. Measured parameters included sensory amplitude, peak latency and takeoff latency in each nerve tested.
Criterion for a positive result: A difference of at least 0.5 milliseconds between median and ulnar sensory peak latencies in the same wrist.

2) Reference standard: NCS + various surveillance symptom definitions for CTS
Description: The self-administered questionnaire focused on demographic information, prior medical conditions, occupational history, current health status, and symptoms which may be related to upper extremity cumulative trauma disorders.
Criterion for a positive result: Eight CTS cases were defined. In addition to the subjective symptoms of CTS, one of the three objective electrodiagnostic criteria must have been met for a patient to be diagnosed with CTS.

MacDermid et al. 1994 [25]
Reference standard: Clinical profile of CTS
Description: The electrodiagnostic testing was performed in the hospital laboratory using the laboratory standards for abnormality of median nerve conduction velocity and/or distal sensory latency.
Criterion for a positive result: A clinical profile of CTS determined by hand surgeons based on history and gross motor and sensory inspection, combined with independently obtained electrodiagnostic evidence of CTS. Evaluation of normality was considered in the context of the entire neurophysiologic examination, which included testing of ulnar, radial and proximal nerves as sources of pathology and examination of distal latencies, amplitudes and conduction times for motor and sensory nerves.

Makanji et al. 2013 [11]
Reference standard: NCS
Description: All of the patients had electrophysiological testing (nerve conduction velocity and electromyography) in the same office. Median nerve conduction studies were performed across the wrist.
Criterion for a positive result: Standards based on the American Association of Neuromuscular and Electrodiagnostic Medicine (AANEM).

Naranjo et al. 2007 [9]
Reference standard: NCS
Description: Tests were performed with the guidance of two neurologists following the American Academy of Neurology protocol. These included median sensory or motor nerve conduction studies.
Criterion for a positive result: An initial latency over 3.4 ms was considered abnormal.

Pagel et al. 2002 [27]
Reference standard: NCS
Description: All electrodiagnostic testing was performed with a Nicolet Viking IV D (Nicolet, Madison, WI). The median and ulnar nerves were stimulated in the palm, and the response was recorded 8 cm proximally at the wrist.
Criterion for a positive result: A median-ulnar latency difference of 0.3 msec, or an absent median response with a normal ulnar response.

Raudino 2000 [28]
Reference standard: NCS
Description: Motor latencies of the median and ulnar nerves were recorded using surface electrodes placed over the abductor pollicis brevis and abductor digiti minimi, respectively, with supramaximal stimulation at the wrist at a distance of 6 cm.
Criterion for a positive result: According to their normal values (mean + 2 SD), latencies greater than 3 ms were considered abnormal.

Sartorio 2017 [10]
Reference standard: NCS
Description: Subjects with suspected CTS were subdivided into 4 groups based on EMG (severe/extreme-GrA, moderate-GrB, mild/minimal-GrC, negative-GrD).
Criterion for a positive result: The presence of CTS was defined as positive EMG (GrA to GrC), while subjects with negative EMG, included in GrD, were considered healthy.

Szabo et al. 1999 [29]
Reference standard: NCS
Description: Bilateral median and ulnar motor and sensory nerve conduction testing were the electrodiagnostic parameters considered in this study.
Criterion for a positive result: Abnormal if the motor or sensory latency across the wrist was ≥4.5 ms or ≥3.5 ms, respectively. If either one or both were abnormal, the patient was considered to have a positive electrodiagnostic test.

Yildirim & Gunduz 2015 [6]
Reference standard: NCS
Description: The instrument used was a Medelec Sapphire 4 ME. Bilateral median motor and sensory nerve conduction potentials were recorded using standard techniques according to the practice parameters for the electrodiagnosis of CTS outlined by the American Academy of Neurology, the American Association of Neuromuscular and Electrodiagnostic Medicine, and the American Academy of Physical Medicine and Rehabilitation.
Criterion for a positive result: Abnormal electrophysiological findings suggesting CTS were categorized into three grades according to Stevens' classification: mild, moderate, and severe.