A systematic review of the diagnostic performance of orthopedic physical examination tests of the hip
BMC Musculoskeletal Disorders volume 14, Article number: 257 (2013)
Previous reviews of the diagnostic performances of physical tests of the hip in orthopedics have drawn limited conclusions because of the low to moderate quality of primary studies published in the literature. This systematic review aims to build on these reviews by assessing a broad range of hip pathologies, and employing a more selective approach to the inclusion of studies in order to accurately gauge diagnostic performance for the purposes of making recommendations for clinical practice and future research. It specifically identifies tests which demonstrate strong and moderate diagnostic performance.
A systematic search of Medline, Embase, Embase Classic and CINAHL was conducted to identify studies of hip tests. Our selection criteria included an analysis of internal and external validity. We reported diagnostic performance in terms of sensitivity, specificity, predictive values and likelihood ratios. Likelihood ratios were used to identify tests with strong and moderate diagnostic utility.
Only a small proportion of tests reported in the literature have been assessed in methodologically valid primary studies. Sixteen studies were included in our review, producing 56 independent test-pathology combinations. Two tests demonstrated strong clinical utility: the patellar-pubic percussion test for excluding radiologically occult hip fractures (negative LR 0.05, 95% Confidence Interval [CI] 0.03-0.08) and the hip abduction sign for diagnosing sarcoglycanopathies in patients with known muscular dystrophies (positive LR 34.29, 95% CI 10.97-122.30). Fifteen tests demonstrated moderate diagnostic utility for diagnosing and/or excluding hip fractures, symptomatic osteoarthritis and loosening of components post-total hip arthroplasty.
We have identified a number of tests demonstrating strong and moderate diagnostic performance. These findings must be viewed with caution as there are concerns over the methodological quality of the primary studies from which we have extracted our data. Future studies should recruit larger, representative populations and allow for the construction of complete 2×2 contingency tables.
The diagnostic value of many physical tests in orthopedic practice has been called into question, and a number of these tests have been found to correspond poorly with anatomical models [1, 2]. In some cases, clinicians proceed directly to more invasive or technologically-involved ‘definitive’ investigations; however, this is not always desirable, practical or economical. For example, the more direct approach has been blamed for diagnostic delays and misclassification of hip joint pathologies.
Recently, several diagnostic reviews of physical tests of the hip have been published [5–8] and they generally support the view that most studies are of low to moderate quality. Three of these reviews examined labral pathologies and/or femoroacetabular impingement [5, 6, 8] while a fourth looked at a wider range of pathologies. This systematic review aims to build on these reviews by assessing a broad range of hip pathologies, and employing a more selective approach to the inclusion of studies in order to accurately gauge diagnostic performance for the purposes of making recommendations for clinical practice and future research. We aim to determine:
which physical tests of the hip or physical clinical prediction rules have valid evidence from which their diagnostic performance in clinical practice can be calculated; and
whether any physical tests or clinical prediction rules have strong diagnostic utility; and
whether any physical tests or clinical prediction rules have moderate diagnostic utility.
In this systematic review, a preliminary search of various textbooks, medical journal databases, websites and grey literature sources was conducted to identify physical tests of the hip. Subsequently, an electronic database search strategy was developed, aided by a medical librarian (see Additional file 1), and applied to Medline (1950-July 2010), Embase (1980-July 2010), Embase Classic (1947–1979) and the Cumulative Index to Nursing and Allied Health Literature (CINAHL) (1982-July 2010). A follow up search was performed in March 2013 using Medline, Embase and CINAHL to identify studies published in the interim period following the original search (see Additional file 1).
Studies included in our review were required to:
compare a physical (index) test for the diagnosis of a particular hip pathology against a ‘gold standard’ (reference) test representing the true diagnostic result. Physical tests were defined as non-invasive bedside maneuvers, beyond inspection, point tenderness and palpation alone, which were intended to increase the probability of a particular diagnosis; and
report sufficient information to construct complete 2×2 contingency tables; and
recruit predominantly adult populations (where ages were indicated); and
be written in English.
Studies were excluded if they:
used physical tests under anesthesia or intra-operatively; or
used physical tests to diagnose vascular or neurologic pathologies.
Studies were also excluded if they did not meet our criteria for internally and externally valid methodology. These criteria are listed below.
For the purposes of internal validity, reference tests could not: (1) be dependent upon the index test result for interpretation, (2) be discredited for diagnosing the chosen pathology, or (3) allow for only partial construction of 2×2 contingency tables (e.g. by excluding persons with negative index test results from the study).
For the purposes of external validity, (1) the sample population had to reasonably represent a typical population presenting for diagnosis in clinical practice (e.g. they could not use healthy or asymptomatic controls who had no indications for testing), and (2) the index test needed to provide a threshold for dichotomizing results.
Assessments of validity were made independently by two authors and disputes arbitrated by a third author. No further restrictions were placed on study design, date of publication or clinical setting.
For the literature search in 2010, one author screened citations for inclusion on the basis of their title. The remaining citations were assessed independently by two authors, first by title and abstract and then by full text. Opposing views regarding inclusion were resolved by arbitration with the remaining authors. When new tests were identified, new search strategies were executed for them using Medline, Embase and Embase Classic (see Additional file 1). The follow up literature search and sorting process in March 2013 were conducted entirely by a single author.
The diagnostic performances of included physical tests are presented in terms of sensitivity, specificity, predictive values and likelihood ratios (LRs), with the latter being used to further identify tests demonstrating “strong” and “moderate” diagnostic utility. We favor the use of likelihood ratios because they offer the most valuable and comprehensive diagnostic information in the individual patient [9, 10]. Roughly speaking, tests with positive LRs greater than or equal to 10 or negative LRs less than or equal to 0.1 will cause almost conclusive, “strong” changes in post-test probability of disease. Positive LRs between 5 and 9.99 and negative LRs between 0.11 and 0.2 cause “moderate” changes in post-test probability. In order to limit the uncertainty caused by studies recruiting small sample populations, we required “strong” tests to meet our likelihood ratio criteria within their entire 95% confidence intervals (otherwise the test was classified as “moderate”). When diagnostic data were presented only in the form of percentages or fractions, we attempted to convert them back to integer form to determine the original population numbers in each diagnostic category of a 2×2 contingency table. We only pooled data from studies involving the exact same index test and target pathology.
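The calculations described above can be illustrated with a minimal Python sketch. It derives the reported measures from a complete 2×2 contingency table and applies the “strong” and “moderate” thresholds. Several parts are assumptions made for illustration and are not taken from the review: the interval method is the common log-scale approximation (the review does not state which method was used), the names `Table2x2`, `diagnostic_metrics` and `classify` are hypothetical, the label “limited” for tests meeting neither threshold is ours, and the cell counts in the example are invented.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


@dataclass
class Table2x2:
    tp: int  # index test positive, reference test positive
    fp: int  # index test positive, reference test negative
    fn: int  # index test negative, reference test positive
    tn: int  # index test negative, reference test negative


def log_scale_ci(ratio, standard_error):
    """Approximate 95% CI for a ratio, computed on the log scale."""
    margin = math.exp(Z_95 * standard_error)
    return (ratio / margin, ratio * margin)


def diagnostic_metrics(t):
    """Point estimates plus LR intervals; assumes every cell is non-zero."""
    diseased = t.tp + t.fn
    healthy = t.fp + t.tn
    sensitivity = t.tp / diseased
    specificity = t.tn / healthy
    pos_lr = sensitivity / (t.fp / healthy)
    neg_lr = (t.fn / diseased) / specificity
    # Standard errors of ln(LR) under the log-scale approximation (assumed method).
    se_pos = math.sqrt(1 / t.tp - 1 / diseased + 1 / t.fp - 1 / healthy)
    se_neg = math.sqrt(1 / t.fn - 1 / diseased + 1 / t.tn - 1 / healthy)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "pos_lr": pos_lr,
        "pos_lr_ci": log_scale_ci(pos_lr, se_pos),
        "neg_lr": neg_lr,
        "neg_lr_ci": log_scale_ci(neg_lr, se_neg),
    }


def classify(lr, ci, positive):
    """Apply the review's thresholds; 'strong' needs the whole CI to qualify."""
    low, high = ci
    if positive:
        if lr >= 10:
            return "strong" if low >= 10 else "moderate"
        return "moderate" if lr >= 5 else "limited"
    if lr <= 0.1:
        return "strong" if high <= 0.1 else "moderate"
    return "moderate" if lr <= 0.2 else "limited"


# Invented counts, for illustration only (not data from any included study).
example = diagnostic_metrics(Table2x2(tp=90, fp=5, fn=10, tn=95))
print(classify(example["pos_lr"], example["pos_lr_ci"], positive=True))
print(classify(example["neg_lr"], example["neg_lr_ci"], positive=False))
```

In this invented example the positive LR point estimate exceeds 10, but the lower bound of its interval does not, so the test is downgraded to “moderate”. This is the effect the confidence interval requirement is intended to have for small samples.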
Only a small proportion of hip tests identified in our preliminary search had their diagnostic performance assessed in methodologically valid primary studies. We identified sixteen studies containing data that satisfied our inclusion and exclusion criteria [11–26] (Figure 1). This produced a total of 56 independent test-pathology combinations (Additional file 2).
Two physical tests demonstrated strong diagnostic utility with the patellar-pubic percussion (PPP) test strongly excluding radiologically occult hip fractures (negative LR 0.05, 95% CI 0.03-0.08), and the hip abduction sign strongly diagnosing sarcoglycanopathies in patients with known muscular dystrophies (positive LR 34.29, 95% CI 10.97-122.30) (Table 1). The original description of these tests from the primary studies can be found in Additional file 2.
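To show what likelihood ratios of this size mean for an individual patient, the following sketch converts a pre-test probability into a post-test probability through odds. The function name `post_test_probability` and the 30% pre-test probability are hypothetical choices for illustration; only the two likelihood ratios come from the results above.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical pre-test probability; not a prevalence reported by any study.
HYPOTHETICAL_PRE_TEST = 0.30

# Negative PPP test (negative LR 0.05) and positive hip abduction sign
# (positive LR 34.29), using the point estimates reported above.
after_negative_ppp = post_test_probability(HYPOTHETICAL_PRE_TEST, 0.05)
after_positive_abduction = post_test_probability(HYPOTHETICAL_PRE_TEST, 34.29)

print(f"After a negative PPP test: {after_negative_ppp:.1%}")
print(f"After a positive hip abduction sign: {after_positive_abduction:.1%}")
```

Under this assumed starting point, a negative PPP test would lower the probability of fracture to roughly 2%, and a positive hip abduction sign would raise the probability of a sarcoglycanopathy to roughly 94%. Different pre-test probabilities would give different results, and the calculation ignores the uncertainty expressed by the confidence intervals.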
Fifteen independent test-pathology combinations demonstrated, at most, moderate diagnostic utility (Table 2). These included five tests for diagnosing symptomatic osteoarthritis, seven tests for diagnosing loosening of various components post-total hip arthroplasty and three tests for diagnosing and excluding various hip fractures [11, 13, 24].
Previous reviews of physical tests have found much of the existing literature to be methodologically flawed and insufficient for guiding clinical practice. This review sought to identify clinically useful physical tests or combinations of tests that demonstrated strong and moderate diagnostic performance. This information could potentially be used to form future clinical prediction rules or guide future research. We found the PPP test strongly excluded radiologically occult hip fractures and the hip abduction sign strongly diagnosed sarcoglycanopathies in patients with known muscular dystrophies. In addition, we identified a number of tests with moderate usefulness for diagnosing and/or excluding hip fractures, symptomatic osteoarthritis and loosening of components post-THA.
While some of our results are promising at face value, the raw data needs to be considered in more detail.
Firstly, it is possible that we have overstated the utility of the PPP test since we have based our conclusions primarily on a single study by Tiru et al. Two other studies recruiting smaller populations [11, 13] also employed the principle of osteophony when testing for hip fractures and found only moderate diagnostic utility. We did not pool the data from these studies because they tested for radiologically apparent fractures, and the Barford test employed by Bache and Cross auscultated for sound transmitted by a tuning fork rather than percussion.
The hip abduction sign may also not perform as strongly as we suggested because Khadilkar and Singh relied on retrospective testing of patients with known diagnoses of variable duration and severity. It is therefore possible that some of the recruited sample population may not have reflected clinical practice. Khadilkar and Singh’s findings need to be confirmed prospectively in a pre-diagnosis setting.
There was significant uncertainty about the true diagnostic performance of some of the moderately useful physical tests because of the small sample populations recruited in the primary studies [11, 13, 24–26]. We suggest further testing with large sample populations would be of benefit to better assess if these tests should be considered for inclusion in future clinical prediction rules.
While we acknowledge that previous hip test reviews have found much of the literature to be methodologically flawed, we did not use cumulatively-scored quality assessment tools to analyze our data as the implications of these numerical values are not clear. Instead, we used our methodological validity criteria to provide a minimum standard to serve our primary purpose, which was to identify tests with strong and moderate diagnostic performance for use in clinical practice. Although our criteria are generally consistent with quality assessment tools and have been empirically associated with design-related bias, we acknowledge that this does not eliminate all bias and that there remain significant shortcomings in the literature. We believe our criteria represent a reasonable compromise for the sake of drawing basic conclusions. That said, since our criteria have not been independently validated, we have reported data from excluded studies in Additional file 3 when complete 2×2 contingency tables could be formed and Additional file 4 for the remaining studies and case reports. There were some discrepancies between this review and those that have been previously published. In some instances this was explained by calculation errors and in others this was because we found there was insufficient information in the primary study to construct 2×2 contingency tables for calculation of diagnostic performance.
There is valid evidence for the diagnostic performance of only a small proportion of physical tests of the hip in routine clinical practice. Two tests demonstrated strong diagnostic utility: the patellar-pubic percussion test for excluding radiologically occult hip fractures and the hip abduction sign for diagnosing sarcoglycanopathies in patients with known muscular dystrophies. In addition, we identified a number of tests with moderate usefulness for diagnosing and/or excluding hip fractures, symptomatic osteoarthritis and loosening of components post-THA. The primary studies from which our data are derived contain methodological flaws that bias their results. Future studies should recruit larger and more representative populations and allow for construction of complete 2×2 contingency tables.
THA: Total hip arthroplasty
Positive LR: Positive likelihood ratio
Negative LR: Negative likelihood ratio
CINAHL: Cumulative Index to Nursing and Allied Health Literature
NPV: Negative predictive value
PPV: Positive predictive value
Hegedus EJ, Goode A, Campbell S, Morin A, Tamaddoni M, Moorman CT, Cook C: Physical examination tests of the shoulder: a systematic review with meta-analysis of individual tests. Br J Sports Med. 2008, 42: 80-92. discussion 92
Sackett DL, Rennie D: The science of the art of the clinical examination. JAMA. 1992, 267: 2650-2652.
McAlister FA, Straus SE, Sackett DL: Why we need large, simple studies of the clinical examination: the problem and a proposed solution. CARE-COAD1 group: clinical assessment of the reliability of the examination-chronic obstructive airways disease group. Lancet. 1999, 354: 1721-1724.
McBeath AA: Some common causes of hip pain: physical diagnosis is the key. Postgrad Med. 1985, 77: 189-192. 194–185, 198
Burgess RM, Rushton A, Wright C, Daborn C: The validity and accuracy of clinical diagnostic tests used to detect labral pathology of the hip: a systematic review. Manual Ther. 2011, 16: 318-326.
Leibold MR, Huijbregts PA, Jensen R: Concurrent criterion-related validity of physical examination tests for hip labral lesions: a systematic review. J Man Manip Ther. 2008, 16: E24-41.
Reiman MP, Goode AP, Hegedus EJ, Cook CE, Wright AA: Diagnostic accuracy of clinical tests of the hip: a systematic review with meta-analysis. Br J Sports Med. 2012, 47: 893-902.
Tijssen M, van Cingel R, Willemsen L, de Visser E: Diagnostics of femoroacetabular impingement and labral pathology of the hip: a systematic review of the accuracy and validity of physical tests. Arthroscopy. 2012, 28: 860-871.
Jaeschke R, Guyatt GH, Sackett DL: Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The evidence-based medicine working group. JAMA. 1994, 271: 703-707.
Simel DL, Samsa GP, Matchar DB: Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991, 44: 763-770.
Adams SL, Yarnold PR: Clinical use of the patellar-pubic percussion sign in hip trauma. Am J Emerg Med. 1997, 15: 173-175.
Anwar MM, Sugano N, Masuhara K, Kadowaki T, Takaoka K, Ono K: Total hip arthroplasty in the neglected congenital dislocation of the hip: a five- to 14-year follow-up study. Clin Orthop Relat Res. 1993, 295: 127-134.
Bache JB, Cross AB: The Barford test. A useful diagnostic sign in fractures of the femoral neck. Practitioner. 1984, 228: 305-308.
Bird PA, Oakley SP, Shnier R, Kirkham BW: Prospective evaluation of magnetic resonance imaging and physical examination findings in patients with greater trochanteric pain syndrome. Arthritis Rheum. 2001, 44: 2138-2145.
Birrell F, Croft P, Cooper C, Hosie G, Macfarlane G, Silman A: Predicting radiographic hip osteoarthritis from range of movement. Rheumatology (Oxford). 2001, 40: 506-512.
Garcia FL, Picado CH, Nogueira-Barbosa MH: Sonographic evaluation of the abductor mechanism after total hip arthroplasty. J Ultrasound Med. 2010, 29: 465-471.
Hananouchi T, Yasui Y, Yamamoto K, Toritsuka Y, Ohzono K: Anterior impingement test for labral lesions has high positive predictive value. Clin Orthop. 2012, 470: 3524-3529.
Holla JF, van der Leeden M, Roorda LD, Bierma-Zeinstra SM, Damen J, Dekker J, Steultjens MP: Diagnostic accuracy of range of motion measurements in early symptomatic hip and/or knee osteoarthritis. Arthritis Care Res (Hoboken). 2012, 64: 59-65.
Hossain M, Barwick C, Sinha AK, Andrew JG: Is magnetic resonance imaging (MRI) necessary to exclude occult hip fracture?. Injury. 2007, 38: 1204-1208.
Khadilkar SV, Singh RK: Hip abduction sign: a new clinical sign in sarcoglycanopathies. J Clin Neuromuscul Dis. 2001, 3: 13-15.
Narvani AA, Tsiridis E, Kendall S, Chaudhuri R, Thomas P: A preliminary report on prevalence of acetabular labrum tears in sports patients with groin pain. Knee Surg Sports Traumatol Arthrosc. 2003, 11: 403-408.
Olsson SS, Jernberger A, Tryggo D: Clinical and radiological long-term results after Charnley-Muller total hip replacement. A 5 to 10 year follow-up study with special reference to aseptic loosening. Acta Orthop Scand. 1981, 52: 531-542.
Röder C, Eggli S, Aebi M, Busato A: The validity of clinical examination in the diagnosis of loosening of components in total hip arthroplasty. J Bone Joint Surg Br. 2003, 85: 37-44.
Shin AY, Morin WD, Gorman JD, Jones SB, Lapinsky AS: The superiority of magnetic resonance imaging in differentiating the cause of hip pain in endurance athletes. Am J Sports Med. 1996, 24: 168-176.
Sutlive TG, Lopez HP, Schnitker DE, Yawn SE, Halle RJ, Mansfield LT, Boyles RE, Childs JD: Development of a clinical prediction rule for diagnosing hip osteoarthritis in individuals with unilateral hip pain. J Orthop Sports Phys Ther. 2008, 38: 542-550.
Tiru M, Goh SH, Low BY: Use of percussion as a screening tool in the diagnosis of occult hip fractures. Singapore Med J. 2002, 43: 467-469.
Whiting P, Harbord R, Kleijnen J: No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Med Res Methodol. 2005, 5: 19.
Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, Bossuyt PM: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999, 282: 1061-1066.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2474/14/257/prepub
We thank the staff at the Ken Merten Library at Liverpool Hospital, Sydney, Australia, for their assistance in developing the search strategy for this review. We also thank the staff at the Fairfield Hospital Library, Sydney, Australia, for their assistance in retrieving studies for this review.
The authors declare that they have no competing interests.
LAR contributed to the design of the review; acquisition, analysis and interpretation of data; and drafting and revising of the manuscript. SA contributed to the conception and design of the review; analysis and interpretation of data; and revising of the manuscript. JMN contributed to the conception and design of the review; analysis and interpretation of data; and drafting and revising of the manuscript. RM contributed to the conception and design of the review; analysis and interpretation of data; and revising of the manuscript. SS contributed to the acquisition, analysis and interpretation of data, and revising of the manuscript. IAH contributed to the conception and design of the study; analysis and interpretation of the data; and revision of the manuscript. All authors read and approved the final manuscript.
Sam Adie and Justine Maree Naylor contributed equally to this work.
Electronic supplementary material
Additional file 1: Search strategy for Medline, Embase, Embase Classic and CINAHL. File shows search strategy, search terms and results for Medline, Embase, Embase Classic and CINAHL. (DOCX 57 KB)
Additional file 2: Diagnostic performances of physical test-hip pathology combinations included in review. File is a table of diagnostic characteristics of physical test-hip pathology combinations (sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios) from studies included in this review. (DOCX 81 KB)
Additional file 3: Diagnostic performances of physical test-hip pathology combinations from excluded studies (2×2 contingency tables). File is a table of diagnostic characteristics of physical test-hip pathology combinations (sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios) from excluded studies that allowed for the construction of complete 2×2 contingency tables. (DOCX 66 KB)
Additional file 4: Overview of excluded studies and case reports not presented in complete 2×2 contingency tables. File is a basic description of studies and case reports that were excluded from our studies and did not allow for the construction of complete 2×2 contingency tables (for example, because they excluded patients with negative index tests from their study). (DOCX 45 KB)
Cite this article
Rahman, L.A., Adie, S., Naylor, J.M. et al. A systematic review of the diagnostic performance of orthopedic physical examination tests of the hip. BMC Musculoskelet Disord 14, 257 (2013). https://doi.org/10.1186/1471-2474-14-257