  • Research article
  • Open access

Physical examination tests of the shoulder: a systematic review and meta-analysis of diagnostic test performance



Physical examination tests of the shoulder (PETS) are clinical examination maneuvers designed to aid the assessment of shoulder complaints. Although more than 180 PETS have been described in the literature, their validity and usefulness in diagnosing shoulder disorders remain in question.


This meta-analysis aims to use diagnostic odds ratio (DOR) to evaluate how much PETS shift overall probability and to rank the test performance of single PETS in order to aid the clinician’s choice of which tests to use. This study adheres to the principles outlined in the Cochrane guidelines and the PRISMA statement. A fixed effect model was used to assess the overall diagnostic validity of PETS by pooling DOR for different PETS with similar biomechanical rationale when possible. Single PETS were assessed and ranked by DOR. Clinical performance was assessed by sensitivity, specificity, accuracy and likelihood ratio.


Six thousand nine-hundred abstracts and 202 full-text articles were assessed for eligibility; 20 articles were eligible and data from 11 articles could be included in the meta-analysis. All PETS for SLAP (superior labral anterior posterior) lesions pooled gave a DOR of 1.38 [1.13, 1.69]. The Supraspinatus test for any full thickness rotator cuff tear obtained the highest DOR of 9.24 (sensitivity was 0.74, specificity 0.77). Compression-Rotation test obtained the highest DOR (6.36) among single PETS for SLAP lesions (sensitivity 0.43, specificity 0.89) and Hawkins test obtained the highest DOR (2.86) for impingement syndrome (sensitivity 0.58, specificity 0.67). No single PETS showed superior clinical test performance.


The clinical performance of single PETS is limited. However, when the different PETS for SLAP lesions were pooled, we found a statistically significant change in post-test probability, indicating overall statistical validity. We suggest that clinicians choose their PETS among those with the highest pooled DOR and, to assess validity for their own specific clinical settings, review the inclusion criteria of the included primary studies. We further propose that future studies on the validity of PETS use randomized research designs rather than the accuracy design, relying less on well-established gold standard reference tests and efficient treatment options.


Physical examination tests of the shoulder (PETS) aim to reproduce specific symptoms and signs as an aid for clinicians in diagnosing the painful shoulder. However, more than 180 different single PETS have been described in the literature [1], making the choice of which tests to use challenging. In addition, confusion arises because different names are used for the same test (e.g. Supraspinatus test = Empty can test = Jobe’s test [2–4]). Also, different criteria of positivity have been used for the same test (e.g. ‘weakness’ [2] and/or ‘pain’ [3] as the criterion of positivity for the supraspinatus test). Last but not least, several of the single PETS have been used for several different shoulder diagnoses (e.g. Yergason’s test, originally published as a test of biceps pathology [5], is also used as a test of glenoid labral pathology [6]). At present, therefore, there is a need to clarify the basis for an evidence-based approach [7].

Evidence on the validity of PETS based on meta-analysis of studies in primary care settings is scarce because the primary studies are of insufficient quality [8]. However, several meta-analyses on PETS have been published in the specialty care setting. One of these, a meta-analysis limited to PETS for subacromial impingement syndrome [9], concluded that the diagnostic validity of the ‘Hawkins’, ‘Supraspinatus’, ‘Drop arm’ and ‘Lift-off’ tests was limited by low pooled likelihood ratios (LR), but that the ‘Lift-off’ test could be used to rule in a subscapularis tear. A more recent meta-analysis on rotator cuff tear recommended the ‘External rotation lag sign’ and ‘Painful arc’ tests because they had the highest pooled estimates of positive likelihood ratio and the smallest confidence intervals [10]. However, there was no overlap between the two meta-analyses regarding the studies finally retained for statistical pooling. Two additional meta-analyses have been published on PETS for superior labral anterior posterior (SLAP) lesions. In the first, the ‘Active compression’, ‘Anterior slide’, ‘Crank’ and ‘Speed’ tests were included and assessed by estimated receiver operating characteristic curves [11]. ‘Anterior slide’ was concluded to perform worse than the other three tests, but there were otherwise no significant differences [11]. The second meta-analysis on SLAP lesions [12] assessed the Compression-rotation, Crank, Relocation, Speed and Yergason tests by pooled positive likelihood ratios and concluded that only the Yergason test showed statistically significant validity, based on a likelihood ratio of 2.29 [1.21, 4.33]. In the update [13] of the only previous meta-analysis that analyzed single PETS for all shoulder diagnoses (not limited to a specific diagnosis) [14], the conclusion was that no single PETS was pathognomonic for any specific diagnosis and that the performance of PETS in general was low.

Given that the previous meta-analyses included different PETS and came to different conclusions, there is still a lack of robust evidence guiding clinicians on which tests to use in clinical practice, and there is a need to assess whether they are useful at all. The previous meta-analyses [9–14] all aimed to pool data for single PETS, assuming they were based on different biomechanical rationales. Only one of them included PETS for all shoulder diagnoses. It is therefore reasonable to suggest a different approach to meta-analysis of PETS.

In this systematic review we want to initially include PETS for all shoulder diagnoses commonly seen in specialty shoulder clinics, but limit the meta-analysis to include only high quality primary studies with a low risk of bias. Furthermore, we will try to pool different PETS that are based on similar biomechanical rationales in order to evaluate the validity of PETS in general.

This meta-analysis aims to use diagnostic odds ratio (DOR) [15], to evaluate how much PETS shift overall probability and to rank the test performance of single PETS in order to aid the clinician’s choice of which tests to use.


The protocol for this systematic review and meta-analysis adhered to the principles outlined in the handbooks of the Cochrane Collaboration [16], the Norwegian Knowledge Center for Health Services [17] and the preferred reporting items in systematic reviews and meta-analysis (PRISMA) statement [18].

Search methods for identification and processing of the literature

The electronic database searches were done in two stages (up to 2011; 2010 to June 2016). In the first stage, searches were made in Medline (1946-), Embase (1980-), SPORTDiscus (1975-), AMED (1985-), PEDro (1929-) and the Cochrane Library/Central. The original search strategies were modified in 2015, and the modified strategies were used to search the databases from 2010 to 2016. The modified search strategy included additional database-specific search terms as well as relevant text-words. A modified version of the methodological filter for diagnostic accuracy studies [19, 20] was applied in all searches. Additional citation searching and tracking was performed using ISI, SCOPUS and Google Scholar. Relevant reference lists of guidelines and systematic reviews were also checked. For a detailed description of the search strategy for Ovid Medline and PubMed see Additional file 1.

The search results were imported into an electronic reference database (EndNote) for removal of duplicates and further processing. Abstracts and full text articles were thereafter screened by the eligibility criteria for the meta-analysis. All evaluations, including assessments of eligibility and quality, were done by pairs of authors. Consistent interpretation of the eligibility and quality assessment process was ensured in consensus meetings with all authors before the respective processes were started. If doubt or dissent arose within the pair, consensus was sought with the other authors.

Eligibility criteria, quality assessment and meta-analysis

Full-text articles which met the initial eligibility criteria 1–8 (Table 1) were assessed for potential sources of bias by use of the original quality assessment tool for diagnostic accuracy studies (QUADAS) [21].

In line with recommendations [16, 21], the 14 original QUADAS questions were adapted and a scoring guide was developed specifically for this review (see Appendix 2 in Additional file 2 for a detailed description). 2 × 2 tables were constructed from articles which met all eligibility criteria (Table 1). In line with convention [22], 0.5 was automatically added to all cells of the 2 × 2 table if one cell was 0. A fixed effect model was used to calculate sensitivity, specificity, accuracy, likelihood ratios (LR+/−) and DOR from pooled 2 × 2 tables. Exclusion of potential outlier studies before final pooling of data was based on visual outlier appearance in a funnel plot, measurement of Cook’s distance and assessment of spectrum effects [23], including disease prevalence in primary studies deviating from the average for all PETS within each diagnostic category. The performance of single PETS was assessed and ranked by pooled DOR for each test, and likelihood ratios were calculated to assess clinically relevant shifts in probability. The diagnostic validity of PETS in general was assessed by pooling DOR for different PETS based on similar biomechanical rationale (only possible for SLAP lesions). DOR pooled for detection of SLAP lesions was visualized in a forest plot. Heterogeneity for data in the forest plot was assessed by chi-square and I-square. Both bivariate and hierarchical random effects modelling were planned as options in the case of pooling five or more studies with high levels of heterogeneity.
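The continuity correction and the pooling of DOR described above can be sketched in Python. This is a minimal illustration, not the analysis code used in this review: inverse-variance weighting on the ln(DOR) scale is one common fixed effect approach and is an assumption here, since the text does not name the estimator. The function names and the example counts are hypothetical.

```python
import math

def corrected_table(tp, fp, fn, tn):
    """Return the 2x2 cells, adding 0.5 to every cell if any cell is zero."""
    cells = (tp, fp, fn, tn)
    if any(c == 0 for c in cells):
        return tuple(c + 0.5 for c in cells)
    return tuple(float(c) for c in cells)

def ln_dor_with_se(tp, fp, fn, tn):
    """Natural log of the diagnostic odds ratio and its standard error."""
    tp, fp, fn, tn = corrected_table(tp, fp, fn, tn)
    ln_dor = math.log((tp * tn) / (fp * fn))
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return ln_dor, se

def pool_dor_fixed_effect(tables, z=1.96):
    """Inverse-variance fixed effect pooling of ln(DOR) (assumed estimator)."""
    estimates = [ln_dor_with_se(*t) for t in tables]
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return {
        "dor": math.exp(pooled),
        "ci_low": math.exp(pooled - z * pooled_se),
        "ci_high": math.exp(pooled + z * pooled_se),
    }

# Hypothetical counts (TP, FP, FN, TN), not data from the included studies.
example_tables = [(40, 10, 10, 40), (12, 0, 8, 20)]
print(pool_dor_fixed_effect(example_tables))
```

The second hypothetical table contains a zero cell, so 0.5 is added to all four of its cells before ln(DOR) and its standard error are computed.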

Table 1 Eligibility criteria for inclusion in the meta-analysis


Articles and PETS included in the meta-analysis

The flow of the search and selection process is presented in Fig. 1.

Fig. 1

The flow of the search and selection process in this systematic review and meta-analysis of physical examination tests of the shoulder. ¹ QUADAS was scored for all the articles that met the initial eligibility criteria. QUADAS: quality assessment tool for diagnostic accuracy studies

From the 6900 abstracts and 202 full-text articles assessed for eligibility, 20 articles [2, 3, 6, 24–40] were found to have an acceptable risk of bias after QUADAS scoring (Fig. 2, Additional files 3, 4, and 5).

Fig. 2

Risk of bias in the 104 articles assessed by QUADAS

All the PETS reported in the 20 articles are listed in Appendix 1 (Additional file 2, see also Additional file 5 for extracted raw-data). Data from 11 articles, where at least two articles had described and interpreted the same single PETS the same way, was available for meta-analysis (see Additional file 6). The meta-analysis included PETS from three shoulder diagnoses (10 for SLAP lesions, two for subacromial impingement syndrome and one for rotator cuff tear). Subsequent assessments of outlier characteristics led to excluding one of the PETS [30] from the meta-analysis (Fig. 3).

Fig. 3

a Evidence for validity of PETS in diagnosing SLAP lesions. The diamond represents a pooled DOR of 1.38 with a 95% confidence interval of [1.13, 1.69]. The forest plot also visualizes that the variation in performance between the presumably different PETS was low. Heterogeneity chi-squared was 26.6 (d.f. = 19), p = 0.12; I-squared (variation in DOR attributable to heterogeneity) was 28.5%. PETS: physical examination tests of the shoulder; DOR: diagnostic odds ratio. b Funnel plot of 2 × 2 tables constructed for SLAP lesions. Nos. 15, 17 and 19 were omitted from the meta-analysis due to outlier characteristics, i.e. visual outlier appearance (No. 19), Cook’s distance (No. 19) and disease prevalences (for the 10 PETS) deviating from the average of 46% (72% for Nos. 15 and 17 and 31% for No. 19). Assessment of spectrum effects showed that No. 19 (Biceps load II test, (Kim, S.H -01)) and Nos. 15 and 17 (the O’Brien and Crank tests, (Myers, T.H -05)) had included a non-representative spectrum of patients; they had low average ages (30.6 years [No. 19] and 23.9 years [Nos. 15 and 17]), and for Nos. 15 and 17 only athletes younger than 50 were included. Ln(DOR): natural logarithmic transformation of diagnostic odds ratio

Evidence of diagnostic validity of PETS

Only PETS for SLAP lesions could be assessed for overall validity by pooling several different PETS based on similar biomechanical rationales. The pooled DOR of the included PETS for SLAP lesions was 1.38 [1.13, 1.69]. Heterogeneity chi-squared was 26.6 (d.f. = 19), p = 0.12; I-squared (variation in DOR attributable to heterogeneity) was 28.5% (Fig. 3a). A summary of results for the single PETS included in the meta-analysis is presented in Table 2.
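The reported I-squared can be approximately reproduced from the heterogeneity chi-squared statistic (Cochran's Q) and its degrees of freedom. The short sketch below assumes the usual definition, I-squared = max(0, (Q − d.f.)/Q), expressed as a percentage; the function name is illustrative.

```python
def i_squared(q, df):
    """Percentage of variation attributable to heterogeneity, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Values reported for the pooled SLAP analysis: Q = 26.6, d.f. = 19.
print(round(i_squared(26.6, 19), 1))
```

With the rounded Q of 26.6 this gives about 28.6%, close to the reported 28.5%; the small gap is presumably due to rounding of Q.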

Table 2 Diagnostic measures of single PETS ranked by DOR

The Compression-Rotation test [41] obtained the highest pooled DOR among single PETS in the SLAP category: DOR = 6.36 [1.41, 28.59]; specificity 0.89 and sensitivity 0.43. The highest ranks by pooled DOR for single PETS within the remaining shoulder diagnoses analyzed were the Hawkins test [42] for subacromial impingement syndrome: DOR = 2.86 [1.14, 7.17]; specificity 0.67, sensitivity 0.58; and the Supraspinatus test [4] for diagnosing any full thickness rotator cuff tear. The Supraspinatus test obtained the highest DOR overall: DOR = 9.24 [1.99, 42.84]; sensitivity 0.74, specificity 0.77.


This meta-analysis found statistical evidence for diagnostic validity of PETS when different tests for SLAP lesions were pooled (DOR = 1.38). Among the single PETS included in the meta-analysis, the highest DOR overall (9.24) was obtained for the Supraspinatus test in diagnosing any full thickness rotator cuff tear. The Compression-Rotation test ranked highest among the SLAP tests (DOR 6.36) and the Hawkins test ranked highest for subacromial impingement syndrome (DOR 2.86) (see Table 2 for details). However, the high risk of bias in primary studies and the fact that single PETS were performed and interpreted in diverging ways limited the number of single PETS available for meta-analysis.

What constitutes superior clinical performance of a clinical test? In line with previous findings [13], no single PETS in this meta-analysis showed superior diagnostic validity when pooled test performance was assessed. An ideal test should be able to discriminate between subjects with and without the condition in question, i.e. concurrently high sensitivity and specificity are sought. LR and DOR both measure this concurrency (LR+ = sensitivity/(1 − specificity); LR− = (1 − sensitivity)/specificity; DOR = LR+/LR−), of which DOR is the most sensitive single indicator of test performance [15]. For instance, when sensitivity and specificity both rise above 0.91, LR+ rises above 10 and DOR rises above 100. As test performance approaches perfection, DOR rises towards infinity. Nevertheless, LR may be more intuitive to the clinician when assessing clinical performance. According to Jaeschke et al. [43], an LR+ >10 or an LR− <0.1 is needed to generate clinically conclusive changes in probability, and moderate shifts are generated by an LR+ of 5–10 or an LR− of 0.1–0.2.
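The relationship between sensitivity, specificity, LR and DOR can be checked with a few lines of Python. This is a small sketch using the standard definitions LR+ = sensitivity/(1 − specificity), LR− = (1 − sensitivity)/specificity and DOR = LR+/LR−; the function name is hypothetical, and returning infinity for a perfect test is a convention chosen here for illustration.

```python
import math

def likelihood_ratios_and_dor(sensitivity, specificity):
    """Return (LR+, LR-, DOR) for a test with the given sensitivity and specificity."""
    if not (0 < sensitivity <= 1 and 0 < specificity <= 1):
        raise ValueError("sensitivity and specificity must be in (0, 1]")
    # A specificity of 1 means no false positives, so LR+ is unbounded.
    lr_pos = math.inf if specificity == 1 else sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    # A sensitivity or specificity of 1 makes the DOR unbounded.
    if lr_neg == 0 or lr_pos == math.inf:
        dor = math.inf
    else:
        dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor

for value in (0.90, 0.91):
    lr_pos, lr_neg, dor = likelihood_ratios_and_dor(value, value)
    print(value, round(lr_pos, 2), round(lr_neg, 3), round(dor, 1))
```

At a sensitivity and specificity of 0.90 the LR+ is still below 10 and the DOR below 100; at 0.91 both thresholds are passed.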

When Walton et al. [12] recommended the Yergason test for SLAP lesions this was based on a pooled LR+ of 2.29. We found a similar LR+ (2.50) for the Yergason test and a slightly higher LR+ (3.91) for the Compression-Rotation test. However, when ranked by DOR the Yergason test performed second to Compression-Rotation test in our results (Table 2). None of the pooled results for single PETS resulted in LR+ above the range of 2–5 representing a small shift in probability [43].
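To illustrate what an LR+ in the range of 2–5 means for an individual patient, the sketch below converts a pre-test probability to a post-test probability through odds. The pre-test probability of 0.46 is an illustrative assumption taken from the average SLAP prevalence quoted in the Fig. 3 caption; the function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Illustrative pre-test probability of 0.46 (assumed clinical setting).
for name, lr_pos in [("Yergason", 2.50), ("Compression-Rotation", 3.91)]:
    print(name, round(post_test_probability(0.46, lr_pos), 2))
```

Under this assumption a positive Yergason test raises the probability of a SLAP lesion from 0.46 to roughly 0.68, and a positive Compression-Rotation test to roughly 0.77.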

The original study of the validity of a single PETS tends to report much better performance than later, less biased attempts to replicate the results. Despite the high sensitivity and specificity reported in the first study on Biceps load II [30], outlier characteristics led to its exclusion from our meta-analysis (Fig. 3b). This decision is supported by previous reports about extensive bias in original studies and is in line with the exclusion of the original study on the Active Compression Test in a previous meta-analysis [13].

The forest plot (Fig. 3) visualizes the variation in the estimated performance of presumably different PETS. The estimated performance tends to vary more between studies than between the different tests, with a possible exception for the anterior slide test, which was also found inferior to other SLAP tests in a previous meta-analysis [11]. Most PETS aimed at detecting SLAP lesions are designed to manipulate the superior labrum by stressing the glenohumeral joint, often in combination with pulling on the biceps tendon (e.g. the Yergason or O’Brien test). This could be one of the reasons that the performance of different tests varies relatively little, but it cannot explain why the general validity of PETS is poor. However, the pathoanatomical/biomechanical rationale that most PETS are based on has recently been debated. For example, in subacromial impingement syndrome, the rationale for PETS (e.g. the Hawkins and Neer’s sign tests) is that the greater tuberosity is rotated up underneath the acromion to force pinching of the bursa and supraspinatus tendon and thereby reproduce impingement pain. Evidence for this postulated biomechanical explanation of the pain elicited is lacking [44]. Moreover, the fact that the interplay between genetics and psychological factors predicts shoulder pain in experimental and postoperative settings [45] also challenges the idea of a sole biomechanical explanation of shoulder pain.

In some of the previous meta-analyses of PETS, hierarchical statistical modeling has been used to estimate receiver operating characteristic curves [9, 13]. No optimal curves for any single PETS have been documented, apart from one possible exception for the Lift-off test, though there was great uncertainty in the estimated curve. Hierarchical and bivariate random effects modeling were also attempted in our review but were not feasible because of the low number of articles with acceptable risk of bias included for each single PETS. As heterogeneity was not statistically significant, a fixed effect model was used.

Despite the meticulous procedure to ensure high-quality input with an acceptable risk of bias, 9 of the 20 studies identified as eligible could not be included in the meta-analysis. In some, this was due to problems that prevented reliable reconstruction of 2 × 2 tables, such as test performance reported in the text of the results section differing from that reported in the tables [24] and the labels of several tables having been switched [28]. Unfortunately, some of these results have been included in previous systematic reviews [13].

Due to low quality of primary studies and strict selection criteria, we were only able to pool data for PETS within three shoulder diagnoses (SLAP lesions, subacromial impingement syndrome and for different degrees of rotator cuff tears only the supraspinatus test). Since gold standard reference tests have not been established for all shoulder diagnoses (e.g. multidirectional instability [46]), the accuracy study design itself may also present a challenge for the complete review of PETS as the validity of some PETS cannot be compared to a gold standard reference test. This may partially explain why no single PETS for multidirectional instability and adhesive capsulitis or other glenohumeral pathologies could be included in this meta-analysis. However, these and other shoulder diagnoses should still be assessed by the clinician as part of the general clinical examination.

The lack of uniform diagnostic labeling used in randomized controlled trials has led Schellingerhout et al. [7] to argue for abolishing diagnostic labels in shoulder pain patients altogether. Hence, there is a need for a new approach in future research on the validity of PETS and shoulder diagnoses. The GRADE initiative [47] suggests that validity of different diagnostic subgrouping strategies should be evaluated in a randomized design providing direct comparison of effects on patient-important outcomes (e.g. pain and shoulder function) for different diagnostic strategies, rather than the indirect evidence provided by the accuracy design. We therefore suggest that future research on the validity of PETS consider using such a randomized design.

Limitations and strengths

This study adhered to state-of-the-art methodology for systematic reviews and diagnostic meta-analysis. A broad scope without limitations to any specific shoulder diagnoses was chosen to strengthen the potential clinical applicability of results. In the meta-analysis, a clear description of inclusion criteria was made mandatory for primary studies to ensure that applicability in other clinical settings can be assessed for all studies included. The chosen QUADAS cutoff in this study was in line with that used in several previous reviews [14, 48], and particularly strong selection criteria were used for the meta-analysis to ensure inclusion of only high quality primary studies with a low risk of bias. However, with strong selection criteria, there is a risk that relevant primary studies were excluded from the meta-analysis and that this may have biased our conclusions. In addition, the application of a QUADAS cutoff score has been advised against by its developers [49], and our choice may have induced a selection bias of primary studies. Also, due to the small number of primary studies available for pooling, hierarchical or bivariate random effects modeling was not feasible. However, since heterogeneity was low, a fixed effects approach could be used. A revised edition of the original QUADAS tool has been published [50]. Implementation was not possible in this review as QUADAS scoring had already started with the original tool. This was a meta-analysis of single PETS, but in clinical practice a combination of tests is commonly used. Several of the included primary studies reported diagnostic performance when different tests were combined [3, 26, 34, 35, 37]. However, as test combinations differ, meaningful statistical pooling was not feasible, and assessment of test combinations was beyond the specific scope of this meta-analysis.

Another important limitation regarding the conclusions and recommendations of this meta-analysis is the designated context of specialist care, with a high prevalence of shoulder pathology and co-morbidity. Care should be taken to assess the applicability of results to any specific clinical context. To enable clinicians to assess the transferability of primary research findings to their own specific spectrum of patients, we only included studies where inclusion criteria had been clearly described. The raw data extracted from the included primary studies have been provided for clinicians’ own scrutiny (Additional file 5).


The clinical performance of single PETS is limited. However, our evidence indicates statistical validity when the different PETS for SLAP lesions were pooled. We suggest that clinicians choose their PETS among those with the highest rank of pooled DOR (Compression rotation, Yergason, Anterior apprehension or Crank tests for SLAP lesions; Hawkins-Kennedy for subacromial impingement; and the supraspinatus/empty can/Jobe’s test for full thickness rotator cuff tears). Furthermore, we recommend that clinicians review the inclusion criteria in relevant primary studies to assess the validity for their own clinical setting. There is still a need for a new research approach to the evidence-based shoulder examination. A new approach to the diagnostic labels in the shoulder has also been called for by Schellingerhout et al. [7]. We therefore propose that future studies on the validity of PETS use a randomized research design [47] in order to compare the validity of different diagnostic strategies in terms of their effect on patient outcomes.



DOR:

Diagnostic Odds Ratio

LR +/−:

Likelihood ratio positive/negative

PETS:

Physical Examination Test(s) of the Shoulder

PRISMA:

Preferred Reporting Items in Systematic Reviews and Meta-Analysis

QUADAS:

Quality Assessment of Diagnostic Accuracy Studies

SLAP:

Superior Labral Anterior Posterior


  1. McFarland EG. Examination of the Shoulder: The Complete Guide. Thieme; 2006. ISBN: 1588903710.

  2. Holtby R, Razmjou H. Validity of the supraspinatus test as a single clinical test in diagnosing patients with rotator cuff pathology. J Orthop Sports Phys Ther. 2004;34:194–200.

  3. Chew K, Pua YH, Chin J, Clarke M, Wong YS. Clinical predictors for the diagnosis of supraspinatus pathology. Physiotherapy Singapore. 2010;13(2):12–7.

  4. Jobe FW, Moynes DR. Delineation of diagnostic criteria and a rehabilitation program for rotator cuff injuries. Am J Sports Med. 1982;10:336–9.

  5. Yergason RM. Supination sign. J Bone Joint Surg Am. 1931;13:160–160.

  6. Holtby R, Razmjou H. Accuracy of the Speed’s and Yergason’s tests in detecting biceps pathology and SLAP lesions: comparison with arthroscopic findings. Arthroscopy. 2004;20:231–6.

  7. Schellingerhout JM, Verhagen AP, Thomas S, Koes BW. Lack of uniformity in diagnostic labeling of shoulder pain: time for a different approach. Man Ther. 2008;13:478–83.

  8. Hanchard NC, Lenza M, Handoll HH, Takwoingi Y. Physical tests for shoulder impingements and local lesions of bursa, tendon or labrum that may accompany impingement. Cochrane Database Syst Rev. 2013;4, CD007427.

  9. Alqunaee M, Galvin R, Fahey T. Diagnostic accuracy of clinical tests for subacromial impingement syndrome: a systematic review and meta-analysis. Arch Phys Med Rehabil. 2012;93:229–36.

  10. Hermans J, Luime JJ, Meuffels DE, Reijman M, Simel DL, Bierma-Zeinstra SM. Does this patient with shoulder pain have rotator cuff disease?: The Rational Clinical Examination systematic review. JAMA. 2013;310:837–47.

  11. Meserve BB, Cleland JA, Boucher TR. A meta-analysis examining clinical test utility for assessing superior labral anterior posterior lesions. Am J Sports Med. 2009;37:2252–8.

  12. Walton DM, Sadi J. Identifying SLAP lesions: a meta-analysis of clinical tests and exercise in clinical reasoning. Phys Ther Sport. 2008;9:167–76.

  13. Hegedus EJ. Which physical examination tests provide clinicians with the most value when examining the shoulder? Update of a systematic review with meta-analysis of individual tests. Br J Sports Med. 2012.

  14. Hegedus EJ, Goode A, Campbell S, Morin A, Tamaddoni M, Moorman 3rd CT, Cook C. Physical examination tests of the shoulder: a systematic review with meta-analysis of individual tests. Br J Sports Med. 2008;42:80–92. discussion 92.

  15. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003;56:1129–35.

  16. Cochrane Collaboration. Handbook for DTA Reviews. Cochrane Collaboration; 2011.

  17. NOKC. Handbook for Norwegian Knowledge Center for the Health Services. 2009.

  18. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med. 2009;6, e1000100.

  19. Haynes RB, Wilczynski NL. Optimal search strategies for retrieving scientifically strong studies of diagnosis from Medline: analytical survey. BMJ. 2004;328:1040.

  20. Beynon R, Leeflang MM, McDonald S, Eisinga A, Mitchell RL, Whiting P, Glanville JM. Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE. Cochrane Database Syst Rev 2013:MR000022.

  21. Whiting PF, Weswood ME, Rutjes AW, Reitsma JB, Bossuyt PN, Kleijnen J. Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies. BMC Med Res Methodol. 2006;6:9.

  22. Deville WL, Buntinx F, Bouter LM, Montori VM, de Vet HC, van der Windt DA, Bezemer PD. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol. 2002;2:9.

  23. Mulherin SA, Miller WC. Spectrum bias or spectrum effect? Subgroup variation in diagnostic test evaluation. Ann Intern Med. 2002;137:598–602.

  24. Ardic F, Kahraman Y, Kacar M, Kahraman MC, Findikoglu G, Yorgancioglu ZR. Shoulder impingement syndrome: relationships between clinical, functional, and radiologic findings. Am J Phys Med Rehabil. 2006;85:53–60.

  25. Bak K, Sorensen AKB, Jorgensen U, Nygaard M, Krarup AL, Thune C, Sloth C, Pedersen ST. The value of clinical tests in acute full-thickness tears of the supraspinatus tendon: does a subacromial lidocaine injection help in the clinical diagnosis? a prospective study. Arthroscopy. 2010;26(6):734–42.

  26. Fodor D, Poanta L, Felea I, Rednic S, Bolosiu H. Shoulder impingement syndrome: correlations between clinical tests and ultrasonographic findings. Ortop Traumatol Rehabil. 2009;11:120–6.

  27. Hertel R, Ballmer FT, Lombert SM, Gerber C. Lag signs in the diagnosis of rotator cuff rupture. J Shoulder Elbow Surg. 1996;5:307–13.

  28. Kim E, Jeong HJ, Lee KW, Song JS. Interpreting positive signs of the supraspinatus test in screening for torn rotator cuff. Acta Med Okayama. 2006;60:223–8.

  29. Kim HA, Kim SH, Seo YI. Ultrasonographic findings of painful shoulders and correlation between physical examination and ultrasonographic rotator cuff tear. Mod Rheumatol. 2007;17:213–9.

  30. Kim SH, Ha KI, Ahn JH, Choi HJ. Biceps load test II: A clinical test for SLAP lesions of the shoulder. Arthroscopy. 2001;17:160–4.

  31. Miller CA, Forrester GA, Lewis JS. The validity of the lag signs in diagnosing full-thickness tears of the rotator cuff: a preliminary investigation. Arch Phys Med Rehabil. 2008;89:1162–8.

  32. Myers TH, Zemanovic JR, Andrews JR. The resisted supination external rotation test: a new test for the diagnosis of superior labral anterior posterior lesions. Am J Sports Med. 2005;33:1315–20.

  33. Nakagawa S, Yoneda M, Hayashida K, Obata M, Fukushima S, Miyazaki Y. Forced shoulder abduction and elbow flexion test: a new simple clinical test to detect superior labral injury in the throwing shoulder. Arthroscopy. 2005;21:1290–5.

  34. Oh JH, Kim JY, Kim WS, Gong HS, Lee JH. The evaluation of various physical examinations for the diagnosis of type II superior labrum anterior and posterior lesion. Am J Sports Med. 2008;36:353–9.

  35. Park HB, Yokota A, Harpreet GS, El Rassi G, McFarland EG. Diagnostic accuracy of clinical tests for the different degrees of subacromial impingement syndrome. J Bone Joint Surg Am. 2005;87:1446–55.

  36. Razmjou H, Holtby R, Myhr T. Pain provocative shoulder tests: reliability and validity of the impingement tests. Physiother Can. 2004;56:229–36.

  37. Walton J, Mahajan S, Paxinos A, Marshall J, Bryant C, Shnier R, Quinn R, Murrell GA. Diagnostic values of tests for acromioclavicular joint pain. J Bone Joint Surg Am. 2004;86-A:807–12.

  38. Zaslav KR. Internal rotation resistance strength test: a new diagnostic test to differentiate intra-articular pathology from outlet (Neer) impingement syndrome in the shoulder. J Shoulder Elbow Surg. 2001;10:23–7.

  39. Collin P, Treseder T, Denard PJ, Neyton L, Walch G, Ladermann A. What is the best clinical test for assessment of the teres minor in massive rotator cuff tears? Clin Orthop Relat Res. 2015;473:2959–66.

  40. Toprak U, Ustuner E, Ozer D, Uyanik S, Baltaci G, Sakizlioglu SS, Karademir MA, Atay AO. Palpation tests versus impingement tests in Neer stage I and II subacromial impingement syndrome. Knee Surg Sports Traumatol Arthrosc. 2013;21:424–9.

  41. Snyder SJ, Karzel RP, Del Pizzo W, Ferkel RD, Friedman MJ. SLAP lesions of the shoulder. Arthroscopy. 1990;6:274–9.

  42. Hawkins RJ, Kennedy JC. Impingement syndrome in athletes. Am J Sports Med. 1980;8:151–8.

  43. Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA. 1994;271:703–7.

  44. Papadonikolakis A, McKenna M, Warme W, Martin BI, Matsen 3rd FA. Published evidence relevant to the diagnosis of impingement syndrome of the shoulder. J Bone Joint Surg Am. 2011;93:1827–32.

  45. George SZ, Wallace MR, Wright TW, Moser MW, Greenfield 3rd WH, Sack BK, Herbstman DM, Fillingim RB. Evidence for a biopsychosocial influence on shoulder pain: pain catastrophizing and catechol-O-methyltransferase (COMT) diplotype predict clinical pain ratings. Pain. 2008;136:53–61.

  46. Saccomanno MF, Fodale M, Capasso L, Cazzato G, Milano G. Generalized joint laxity and multidirectional instability of the shoulder. Joints. 2013;1:171–9.

  47. Schunemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Vist GE, Williams Jr JW, Kunz R, Craig J, Montori VM, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ. 2008;336:1106–10.

  48. Wright AA, Wassinger CA, Frank M, Michener LA, Hegedus EJ. Diagnostic accuracy of scapular physical examination tests for shoulder disorders: a systematic review. Br J Sports Med. 2012.

  49. Whiting P, Harbord R, Kleijnen J. No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Med Res Methodol. 2005;5:19.

  50. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MM, Sterne JA, Bossuyt PM. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155:529–36.


Acknowledgements

The authors wish to extend their gratitude to Research Librarian Solveig Isabel Taylor (University Library, NTNU) for designing and executing the electronic database searches, to Kari Skinningsrud for help with preparation of figures, tables and the manuscript, and to Dr. Ulrich Schattel who facilitated this project through continuous support and contributions in discussions with SG.


Funding

This study was funded by Trondheim University Hospital, Department of Physical Medicine and Rehabilitation, where four of the authors have been employed (SG, FG, MR and GL). The funding body granted 6 months for SG to work on this systematic review but otherwise had no role in the design, analysis and interpretation of data, the writing of the manuscript or the decision to submit the manuscript for publication.

Availability of data and materials

All relevant data (including raw data) have been provided in the figures, tables and supplements.

Authors’ contributions

SG conceived the study and developed its design and protocol together with GL. SG organized the search and selection process (i.e. the electronic database search and removal of duplicates), coordinated the contributions of the other authors and drafted the manuscript. Eligibility and quality assessments were done by the following pairs of authors: MR/JD, SG/GL, MR/FG and SG/FG. The reference hand search was done by FG and SG. Data extraction in preparation for the meta-analysis was done by FG and SG. Figures and tables were prepared by FG and SG. GL conducted the statistical pooling of data in STATA and helped to draft the first manuscript. All authors have read and approved the final manuscript.

Authors’ information

Four of the authors are medical doctors: three are specialists in physical medicine and rehabilitation (SG, MR and GL) and one is an orthopedic surgeon (JD). One of the authors (FG) is a physiotherapist. GL and JD are professors; SG is a PhD candidate.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable. This systematic review and meta-analysis did not involve research on any human subjects requiring informed consent.

Corresponding author

Correspondence to Sigmund Ø. Gismervik.

Additional files

Additional file 1: Table S1.

Detailed description of the literature search strategy. (XLS 31 kb)

Additional file 2:

Contains: a) Overview of PETS in the 20 articles with low risk of bias. b) Adapted QUADAS assessment tool and scoring guide. c) Full initial eligibility criteria for abstracts and full text articles. (DOC 104 kb)

Additional file 3: Table S2.

Quality scores for the 20 full text articles with acceptable risk of bias. (XLSX 17 kb)

Additional file 4:

QUADAS score table (containing scores for all articles assessed). (XLS 2873 kb)

Additional file 5:

Data extraction from the 20 articles with low risk of bias (raw data). (XLS 109 kb)

Additional file 6:

Data extraction prepared for the meta-analysis (raw data). (XLS 83 kb)

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Gismervik, S.Ø., Drogset, J.O., Granviken, F. et al. Physical examination tests of the shoulder: a systematic review and meta-analysis of diagnostic test performance. BMC Musculoskelet Disord 18, 41 (2017).
