Femoral neck fracture: the reliability of radiologic classifications
BMC Musculoskeletal Disorders volume 22, Article number: 1063 (2021)
Femoral neck fractures (FNF) are among the most common injuries in the elderly. A valid radiographic classification system is essential for choosing the correct treatment and for facilitating communication among surgeons. This study aims to evaluate the reliability of the 2018 AO/OTA classification, the simplified AO/OTA classification, and the Garden classification.
Six orthopaedic surgeons, divided into three groups based on trauma experience, evaluated 150 blinded anteroposterior and latero-lateral radiographs of FNF using the Garden, 2018 AO/OTA, and simplified AO/OTA classifications. One month later, the radiographs were renumbered and each observer evaluated them a second time. Kappa statistics were used to determine the reliability of the classifications: Cohen’s kappa was calculated to determine intra- and interobserver reliability, and Fleiss’ kappa was used to determine multi-rater agreement.
The k values of interobserver reliability for the Garden classification ranged from 0.28 to 0.73, with an average of 0.49. The AO classification showed reliability ranging from 0.2 to 0.42, with an average of 0.30. The simplified AO/OTA classification showed reliability ranging from 0.38 to 0.58, with an average of 0.48.
The k values of intraobserver reliability for the Garden classification ranged from 0.48 to 0.79, with an average of 0.63. The AO classification showed reliability ranging from 0.2 to 0.64, with an average of 0.5. The simplified AO/OTA classification showed reliability ranging from 0.4 to 0.75, with an average of 0.61.
The revised 2018 AO/OTA classification simplified the previous classification of intracapsular fractures but remains unreliable, with only fair interobserver reliability. The simplified AO/OTA classification shows reliability similar to that of the Garden classification, with moderate interobserver reliability. Surgeon experience does not seem to improve reliability. No classification was shown to be superior in terms of reliability.
Proximal femur fracture is one of the most common types of fracture in the elderly. It occurs in 18% of women and 6% of men worldwide. It is typically caused by accidental falls in elderly patients with osteoporosis. The incidence of proximal femur fracture has risen worldwide over the last two decades along with the increase in the average age of the population. In fact, the global number of hip fractures is expected to increase from 1.26 million in 1990 to 4.5 million by 2050.
The incidence of femoral neck fractures (FNF) is approximately equal to that of pertrochanteric fractures; together they account for over 90% of all proximal femur fractures.
In Italy, the number of hip fractures in people over 65 years increased from 89,601 in 2007 to 94,525 in 2014. This leads to a growing number of hospital admissions and rising hospitalization costs. Furthermore, hip fractures affect patients’ quality of life. For this reason, it is important to reach a fast and correct diagnosis and to provide adequate and prompt treatment in order to reduce postoperative complications and mortality.
In almost all cases, the treatment of choice is surgical. The choice of a specific treatment option is based on the stability and orientation of the fracture and on patient factors such as age, function, and bone quality [9, 10]. For unstable FNF the treatment of choice is hip replacement (total hip arthroplasty or hemiarthroplasty), whereas for stable FNF the most commonly used treatment is internal fixation with cannulated screws or other hip implants.
Radiographic FNF classification supports clinical decision making, communication, and research on prognosis and treatment. The most common classifications used for intracapsular FNF are the Garden classification and the AO/OTA classification. Both systems are based on two-dimensional X-ray images. Garden classified femoral neck fractures into four types based on displacement on the anteroposterior radiograph [13, 14]. A type I fracture is an incomplete or valgus-impacted fracture. A type II fracture is a complete fracture without displacement of the fracture fragments. A type III fracture is a complete fracture with partial displacement of the fracture fragments. A type IV fracture is a complete fracture with total displacement of the fracture fragments, allowing the femoral head to rotate back to an anatomic position. The AO/OTA classification system is organized into hierarchies of severity, as the descriptions generally proceed from simple to multifragmentary fractures. In the 2018 revision, femoral neck fractures (31B) are classified as subcapital (type B1), transcervical (type B2), or basicervical (type B3); types B1 and B2 are further divided into subgroups. In clinical practice, the AO/OTA classification is usually simplified by considering only the three types (B1, B2, B3).
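The relationship between the full and simplified 2018 AO/OTA schemes can be made concrete with a small lookup. The sketch below, a hypothetical helper named `simplify_ao_code`, accepts one of seven full femoral neck (31B) codes and returns its simplified type. It assumes the 2018 grouping in which B1 and B2 each carry three subgroups and B3 (basicervical) has none, which matches the seven AO levels discussed in this paper; the code strings are an illustrative convention, not an official AO data format.

```python
import re

# Seven full 2018 AO/OTA femoral neck (31B) categories, assuming B1 and B2
# each have three subgroups and B3 (basicervical) has none.
FULL_AO_CODES = {
    "31B1.1", "31B1.2", "31B1.3",
    "31B2.1", "31B2.2", "31B2.3",
    "31B3",
}

# Captures the type ("B1", "B2" or "B3") that follows the "31" bone/segment prefix.
_TYPE_PATTERN = re.compile(r"^31(B[123])")


def simplify_ao_code(code: str) -> str:
    """Collapse a full 31B code (e.g. '31B1.2') to its simplified type ('B1')."""
    normalized = code.strip().upper()
    if normalized not in FULL_AO_CODES:
        raise ValueError(f"not a recognised 2018 AO/OTA 31B code: {code!r}")
    return _TYPE_PATTERN.match(normalized).group(1)


if __name__ == "__main__":
    for full_code in ("31B1.2", "31B2.3", "31B3"):
        print(full_code, "->", simplify_ao_code(full_code))
```

Rejecting unknown codes rather than guessing keeps transcription errors in a rating grid from silently turning into a valid simplified type.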
The aim of this study was to assess the reliability of these classifications by examining the intra- and interobserver agreement of trauma surgeons, and to determine how reliability depends on the observers’ experience.
This retrospective study included patients admitted to a single institution for FNF from January 2017 to December 2019.
The inclusion criterion was a femoral neck fracture in a patient aged 18 years or older.
The exclusion criteria were: an incomplete series of preoperative radiographs (digital files of an anteroposterior view of the pelvis and a lateral view of the hip were required), advanced hip osteoarthritis, a previous contralateral femoral neck fracture or contralateral prosthetic replacement, hip dysplasia, and associated pelvic fractures. Pathologic fractures were also excluded.
The final sample consisted of 150 patients, 57 men and 93 women, with an average age of 75.6 years. The right hip was involved in 43% of cases. Of this sample, 49 patients underwent closed or open reduction and internal fixation (CRIF or ORIF), and 101 patients underwent prosthetic replacement (in 4 cases, computed tomography was used to guide the surgical choice).
All possible patient identification marks were obscured on the radiographs. The radiographs were then numbered and analyzed by 6 observers: 2 experienced trauma surgeons (GC and SD), 2 junior trauma surgeons (GM and MSO), and 2 orthopaedic trauma residents (AS and MM). All observers were familiar with the classifications analyzed, and all of them were provided with the classifications’ definitions and schemes. Surgeons with different levels of experience were chosen to assess how much experience could affect reproducibility. Radiographs were classified according to the 2018 AO/OTA classification, the simplified 2018 AO/OTA classification (B1, B2, B3 only), and the Garden classification.
Each observer first classified the radiographs in anteroposterior (AP) and latero-lateral (LL) projections, recording the results in a dedicated grid. At the end of the evaluation, the grid was archived without being shared with the other observers. One month later, the radiographs were renumbered and each observer evaluated them a second time (Figs. 1 and 2).
Kappa statistics were used to determine the reliability of the classifications. Cohen’s kappa was calculated to determine intra- and interobserver reliability. Fleiss’ kappa was used to calculate the multi-rater reliability of the more and less experienced trauma surgeons.
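Cohen’s kappa corrects the raw proportion of agreement for the agreement expected by chance from each rating’s marginal category frequencies: k = (p_o - p_e) / (1 - p_e). A minimal Python sketch follows. It assumes unweighted kappa on nominal labels (the paper does not state whether a weighted variant was used), and the function name `cohen_kappa` is illustrative. The same function covers both uses described here: two observers rating the same radiographs (interobserver) or one observer’s first and second readings (intraobserver).

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long sequences of labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating sequences of equal length")
    n = len(ratings_a)
    # Observed agreement: share of cases given the same label in both ratings.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over labels of the product of the two marginal proportions.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both ratings use a single identical label throughout; kappa is undefined.
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy Garden grades for six hypothetical radiographs (not study data).
    first = ["I", "II", "III", "IV", "IV", "III"]
    second = ["I", "II", "IV", "IV", "IV", "III"]
    print(round(cohen_kappa(first, second), 3))  # 0.769
```

In the toy example five of six grades agree (p_o = 0.833), but because grade IV dominates both ratings the chance agreement is already 0.278, so kappa falls to about 0.77 rather than matching the raw agreement.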
Intra- and interobserver agreement were interpreted using the Landis and Koch criteria: k values of 0.00–0.20 were considered slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and 0.81–1.00, almost perfect agreement.
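The Landis and Koch bands can be applied mechanically to any reported coefficient. The sketch below treats each band’s upper bound as inclusive, an assumption that only matters for values falling exactly on a boundary; the "poor" label for negative values comes from Landis and Koch’s original scale rather than from this paper, and the function name is illustrative.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to its Landis and Koch agreement band."""
    if kappa < 0:
        return "poor"  # below-chance agreement, from the original Landis and Koch scale
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:  # upper bounds treated as inclusive (assumption)
            return label
    raise ValueError(f"kappa cannot exceed 1: {kappa}")


if __name__ == "__main__":
    # Average interobserver k reported for Garden, AO/OTA and simplified AO/OTA.
    for k in (0.49, 0.30, 0.48):
        print(k, landis_koch_label(k))
```

Applied to the reported interobserver averages, this yields moderate, fair, and moderate, matching the interpretation given in the Results.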
Cohen’s kappa values for the interobserver reliability of the AO/OTA, simplified AO/OTA, and Garden classifications based on radiographs are reported in Tables 1, 2, and 3. The interobserver reliability of the Garden classification ranged from 0.28 (CI 0.17–0.39) to 0.73 (CI 0.65–0.82), with an average of 0.49. The AO classification showed reliability ranging from 0.2 (CI 0.11–0.29) to 0.42 (CI 0.33–0.52), with an average of 0.30. The simplified AO/OTA classification showed reliability ranging from 0.38 (CI 0.26–0.50) to 0.58 (CI 0.47–0.69), with an average of 0.48.
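The paper reports a range and an average of pairwise coefficients but does not spell out how the averages or confidence intervals were computed. The sketch below assumes that the average is the arithmetic mean of Cohen’s kappa over every observer pair (15 pairs for six observers) and uses one common large-sample approximation for the standard error, SE = sqrt(p_o(1 - p_o) / (n(1 - p_e)^2)), with a normal 95% interval. Both choices, and the helper names `kappa_with_ci` and `mean_pairwise_kappa`, are assumptions for illustration, not the authors’ documented method.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean


def _agreement(ratings_a, ratings_b):
    """Return observed (p_o) and chance (p_e) agreement for two label sequences."""
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return p_o, p_e


def kappa_with_ci(ratings_a, ratings_b, z=1.96):
    """Cohen's kappa with an approximate large-sample confidence interval."""
    n = len(ratings_a)
    p_o, p_e = _agreement(ratings_a, ratings_b)
    if p_e == 1.0:
        raise ValueError("chance agreement is 1; kappa is undefined")
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple approximation to the standard error (an assumption here).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - z * se, kappa + z * se)


def mean_pairwise_kappa(ratings_by_observer):
    """Average Cohen's kappa over every pair of observers.

    ratings_by_observer maps an observer ID to that observer's labels,
    listed in the same case order for every observer.
    """
    kappas = {
        (a, b): kappa_with_ci(ratings_by_observer[a], ratings_by_observer[b])[0]
        for a, b in combinations(sorted(ratings_by_observer), 2)
    }
    return mean(kappas.values()), kappas
```

Returning the per-pair dictionary alongside the mean makes it straightforward to report both the minimum and maximum pairwise values and the average, as in Tables 1 to 3.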
We also analyzed the agreement between observers after dividing them into groups according to trauma experience: trauma surgeons, young trauma surgeons, and residents. There were no significant differences in agreement between observer groups (Table 4). The results show moderate agreement for both the Garden and the simplified AO/OTA classifications; the mean k value was lower for the AO/OTA classification, indicating fair agreement.
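Fleiss’ kappa extends chance-corrected agreement to any fixed number of raters per case: for each radiograph it measures the share of agreeing rater pairs, averages this over cases, and compares the result with the agreement expected from the pooled category proportions. A minimal sketch follows; the function name and the list-of-labels input layout are illustrative assumptions.

```python
from collections import Counter


def fleiss_kappa(ratings_per_case):
    """Fleiss' kappa for a list of cases, each a list of labels from m raters.

    Every case must be rated by the same number of raters (m >= 2).
    """
    if not ratings_per_case:
        raise ValueError("need at least one case")
    n_cases = len(ratings_per_case)
    m = len(ratings_per_case[0])
    if m < 2 or any(len(labels) != m for labels in ratings_per_case):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    category_totals = Counter()
    per_case_agreement = []
    for labels in ratings_per_case:
        counts = Counter(labels)
        category_totals.update(counts)
        # P_i: proportion of agreeing (ordered) rater pairs on this case.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_case_agreement.append(agreeing_pairs / (m * (m - 1)))
    p_bar = sum(per_case_agreement) / n_cases
    # Chance agreement from category proportions pooled over all ratings.
    total = n_cases * m
    p_e = sum((t / total) ** 2 for t in category_totals.values())
    if p_e == 1.0:
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)
```

Applied to one experience group, each case’s list holds two labels. With only two raters, Fleiss’ kappa pools the category proportions across raters, so it coincides with Scott’s pi and can differ slightly from Cohen’s kappa computed on the same pair.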
Intraobserver reliability
Cohen’s kappa values for the intraobserver reliability of the AO/OTA, simplified AO/OTA, and Garden classifications based on radiographs are reported in Table 5. The intraobserver reliability of the Garden classification ranged from 0.48 (CI 0.37–0.58) to 0.79 (CI 0.71–0.87), with an average of 0.63. The AO/OTA classification showed reliability ranging from 0.2 (CI 0.13–0.3) to 0.64 (CI 0.56–0.73), with an average of 0.5. The simplified AO/OTA classification showed reliability ranging from 0.4 (CI 0.28–0.52) to 0.75 (CI 0.66–0.84), with an average of 0.61.
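Because the radiographs were renumbered before the second session, intraobserver agreement requires mapping each case back to its new number before pairing the two readings. The sketch below assumes a hypothetical `renumbering` dictionary from first-session to second-session numbers and reports each observer’s Cohen’s kappa together with the mean across observers; the data layout and function names are illustrative assumptions, not the study’s actual grid format.

```python
from collections import Counter
from statistics import mean


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def intraobserver_kappas(first_session, second_session, renumbering):
    """Per-observer test-retest kappa and its mean across observers.

    first_session / second_session: observer -> {radiograph number: label}
    renumbering: first-session number -> second-session number (hypothetical).
    """
    per_observer = {}
    for observer, first in first_session.items():
        second = second_session[observer]
        cases = sorted(first)
        # Pair each case's first reading with the reading of the same case
        # under its new number in the second session.
        readings_1 = [first[case] for case in cases]
        readings_2 = [second[renumbering[case]] for case in cases]
        per_observer[observer] = cohen_kappa(readings_1, readings_2)
    return per_observer, mean(per_observer.values())
```

Keeping the renumbering as an explicit mapping, rather than relying on list order, guards against pairing a reading with the wrong radiograph after the blinded reshuffle.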
Successful treatment starts with an adequate classification of the pathology and an accurate evaluation of the patient’s clinical condition (age, comorbidities), which guide surgeons in choosing the correct management and facilitate communication.
Ideally, a classification system should be easily applicable, highly reliable, comprehensive, and highly reproducible, and in many cases it should indicate outcomes. For proximal femur fractures there is still no agreement on a universally accepted, reliable classification, and this can fuel debate about the appropriate treatment options. Any classification system used should possess a high degree of inter- and intraobserver reliability, facilitating communication about the patient’s condition and providing clear guidance for treatment.
A valid classification allows surgeons to determine the correct treatment and predict outcomes. Femoral neck fractures were first classified by Waldenström in 1924 as “stable” or “unstable”. The reliability of this classification has been widely analyzed in the literature; the data show that it is higher than that of the other classifications because it considers only two levels, instead of the four and seven levels of the Garden and AO classifications, respectively, thus reducing possible bias.
In this paper we studied the inter- and intraobserver agreement of three different classification systems. Six orthopedic trauma surgeons with different years of experience (two young trauma surgeons, two residents, and two trauma surgeons) graded 150 radiographs of proximal femur fractures using the Garden classification and the 2018 AO classification, both complete and simplified. We decided not to use the Waldenström and Pauwels classifications because they are not used in everyday clinical practice.
The interobserver reliability of the Garden classification was moderate, as was that of the simplified AO/OTA classification (average k values of 0.49 and 0.48, respectively). Interobserver reliability decreased to fair, with an average k value of 0.30, for the AO classification.
In the literature, the interobserver agreement of the Garden classification varies from fair to moderate [21,22,23]. Our results showed higher reliability for the Garden classification than previous studies: Masionis et al., Gaspar et al., and Van Embden et al. found k values of 0.33, 0.41, and 0.31, respectively [18, 21, 22].
We found substantial intraobserver reliability for the Garden and simplified AO/OTA classifications (mean k values of 0.63 and 0.61, respectively). Intraobserver reliability decreased to moderate, with a mean k value of 0.50, for the AO/OTA classification.
For the intraobserver reliability of the Garden classification, too, we found a higher k value than previous studies: Masionis et al. reported intraobserver reliability ranging from 0.40 to 0.57.
We observed, as have all studies in the literature, that inter- and intraobserver reliability decrease as the classification becomes more complex; indeed, kappa values strongly depend on the number of categories in the classification investigated [18, 20, 24].
To our knowledge, our work is the first in the literature to consider the reliability of the 2018 AO/OTA classification for femoral neck fractures. A recent study analyzed this classification, but only for extracapsular femur fractures (31A): the simplified AO k value was 0.479 and the complete AO k value was 0.376.
Masionis et al. reported k values from 0.26 to 0.48 for intraobserver reliability and from 0.11 to 0.43 for interobserver reliability of the previous AO classification; Blundell et al. found that the AO system had fair agreement; and Gaspar et al. calculated a k value of 0.17 for interobserver reliability.
It is important to note that the radiographs were graded using the latest version of the AO classification (2018); despite its complexity, it showed higher reliability than the previous version. Another strength of our study is that reliability was analyzed according to the observers’ experience. In the literature, this analysis has been described only for the Garden classification and for the previous version of the AO classification [12, 18, 20].
Our results comparing more and less experienced surgeons are similar to those reported in the literature [18, 20]. Our data support the view that experience does not improve inter- or intraobserver reliability. This may be because the learning curve for classifying fractures is steeper in the first couple of years of practice and then flattens; the trauma residents taking part in this study already had 3 years of experience in treating this type of fracture.
The authors are aware of the limitations of the present study. First, the number of evaluating surgeons was small. Second, this is a retrospective study and patient outcomes were not evaluated. All observers work in the same hospital and in the same university orthopedic department, which may have made their classifications more uniform. This is not a multicenter study: patients were selected from a single department, and consequently the radiographs were acquired using the same protocol. Lastly, we considered only three classifications; we excluded other classifications, such as Pauwels and Waldenström, because the three selected are the most commonly used in clinical practice.
The latest version of the AO classification (2018), despite its complexity, has higher reliability than the previous version. Furthermore, our results on reliability in more versus less experienced surgeons are similar to data reported in the literature. The Garden and simplified AO/OTA classifications are more reliable than the AO/OTA classification with subgroups; as in previous literature, inter- and intraobserver reliability decrease as the classification becomes more complex. This does not mean that these classifications can be considered successful, because their interobserver reliability is not high enough, and even trauma experience did not improve it.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
FNF: Femoral neck fractures
AO: Arbeitsgemeinschaft für Osteosynthesefragen
OTA: Orthopaedic Trauma Association
CRIF: Closed reduction internal fixation
ORIF: Open reduction internal fixation
AO/OTA: AO Foundation/Orthopaedic Trauma Association
Veronese N, Maggi S. Epidemiology and social costs of hip fracture. Injury. 2018;49:1458–60. https://doi.org/10.1016/j.injury.2018.04.015 PMID: 29699731.
Compston JE, McClung MR, Leslie WD. Osteoporosis. Lancet. 2019;393:364–76. https://doi.org/10.1016/S0140-6736(18)32112-3 PMID: 30696576.
Bäcker HC, Wu CH, Maniglio M, Wittekindt S, Hardt S, Perka C. Epidemiology of proximal femoral fractures. J Clin Orthop Trauma. 2021;12:161–5. https://doi.org/10.1016/j.jcot.2020.07.001 PMID: 33716441.
Piscitelli P, Feola M, Rao C, Neglia C, Rizzo E, Vigilanza A, et al. Incidence and costs of hip fractures in elderly Italian population: first regional-based assessment. Arch Osteoporos. 2019;14:81. https://doi.org/10.1007/s11657-019-0619-9 PMID: 31342284.
Williamson S, Landeiro F, McConnell T, Fulford-Smith L, Javaid MK, Judge A, et al. Costs of fragility hip fractures globally: a systematic review and meta-regression analysis. Osteoporos Int. 2017;28:2791–800. https://doi.org/10.1007/s00198-017-4153-6 PMID: 28748387.
Dyer SM, Crotty M, Fairhall N, Magaziner J, Beaupre LA, Cameron ID, et al. A critical review of the long-term disability outcomes following hip fracture. BMC Geriatr. 2016;16:158 PMID: 27590604.
Basilico M, Vitiello R, Oliva MS, Covino M, Greco T, Cianni L, et al. Predictable risk factors for infections in proximal femur fractures. J Biol Regul Homeost Agents. 2020;34(3 Suppl. 2):77–81 PMID: 32856444.
Vicenti G, Bizzoca D, Pascarella R, Delprete F, Chiodini F, Daghino W, et al. Development of the Italian fractures registry (RIFra): a call for action to improve quality and safety. Injury. 2020;(10):052.
Florschutz AV, Langford JR, Haidukewych GJ, Koval KJ. Femoral neck fractures: current management. J Orthop Trauma. 2015;29:121–9 PMID: 25635363.
Vitiello R, Perisano C, Covino M, Perna A, Bianchi A, Oliva MS, et al. Euthyroid sick syndrome in hip fractures: valuation of vitamin D and parathyroid hormone axis. Injury. 2020;51(Suppl 3):S13–6 PMID: 31983423.
Bigoni M, Turati M, Leone G, Caminita AD, D’Angelo F, Munegato D, et al. Internal fixation of intracapsular femoral neck fractures in elderly patients: mortality and reoperation rate. Aging Clin Exp Res. 2020;32:1173–8 PMID: 31175608.
Crijns TJ, Janssen SJ, Davis JT, Ring D, Sanchez HB. Science of variation group. Reliability of the classification of proximal femur fractures: does clinical experience matter? Injury. 2018;49:819–23 PMID: 29549969.
Zlowodzki M, Bhandari M, Keel M, Hanson BP, Schemitsch E. Perception of Garden’s classification for femoral neck fractures: an international survey of 298 orthopaedic trauma surgeons. Arch Orthop Trauma Surg. 2005;125:503–5 PMID: 16075274.
Garden RS. Reduction and fixation of subcapital fractures of the femur. Orthop Clin North Am. 1974;5:683–712.
Meinberg EG, Agel J, Roberts CS, Karam MD, Kellam JF. Fracture and dislocation classification Compendium-2018. J Orthop Trauma. 2018;32(Suppl 1):S1–170 PMID: 29256945.
Caviglia HA, Osorio PQ, Comando D. Classification and diagnosis of intracapsular fractures of the proximal femur. Clin Orthop. 2002;399:17–27 PMID: 12011690.
Fung W, Jonsson A, Buhren V, Bhandari M. Classifying intertrochanteric fractures of the proximal femur: does experience matter? Med Princ Pract Int J Kuwait Univ Health Sci Cent. 2007;16:198–202 PMID: 17409754.
Masionis P, Uvarovas V, Mazarevičius G, Popov K, Venckus Š, Baužys K, et al. The reliability of a Garden, AO and simple II stage classifications for intracapsular hip fractures. Orthop Traumatol Surg Res OTSR. 2019;105:29–33 PMID: 30639032.
Waldenström J. Fractures récentes du col femoral: traitement operatoire ou orthopédique. J Chir. 1924;24:129.
Turgut A, Kumbaracı M, Kalenderer Ö, İlyas G, Bacaksız T, Karapınar L. Is surgeons’ experience important on intra- and inter-observer reliability of classifications used for adult femoral neck fracture? Acta Orthop Traumatol Turc. 2016;50:601–5 PMID: 27889406.
Gašpar D, Crnković T, Durović D, Podsednik D, Slišurić F. AO group, AO subgroup, Garden and Pauwels classification systems of femoral neck fractures: are they reliable and reproducible? Med Glas Off Publ Med Assoc Zenica-Doboj Cant Bosnia Herzeg. 2012;9:243–7 PMID: 22926358.
Van Embden D, Rhemrev SJ, Genelin F, Meylaerts S, a. G, Roukema GR. The reliability of a simplified Garden classification for intracapsular hip fractures. Orthop Traumatol Surg Res OTSR. 2012;98:405–8 PMID: 22560590.
Aggarwal A, Singh M, Aggarwal AN, Bhatt S. Assessment of interobserver variation in Garden classification and management of fresh intracapsular femoral neck fracture in adults. Chin J Traumatol Zhonghua Chuang Shang Za Zhi. 2014;17:99–102 PMID: 24698579.
Chan G, Hughes K, Barakat A, Edres K, da Assuncao R, Page P, et al. Inter- and intra-observer reliability of the new AO/OTA classification of proximal femur fractures. Injury. 2020;10:067.
Blundell CM, Parker MJ, Pryor GA, Hopkinson-Woolley J, Bhonsle SS. Assessment of the AO classification of intracapsular fractures of the proximal femur. J Bone Joint Surg Br. 1998;80:679–83 PMID: 9699837.
About this supplement
This article has been published as part of BMC Musculoskeletal Disorders Volume 22 Supplement 2 2021: All about the hip. The full contents of the supplement are available at https://bmcmusculoskeletdisord.biomedcentral.com/articles/supplements/volume-22-supplement-2.
Publication costs are funded by the Orthopedic and Traumatology School of Università Cattolica del Sacro Cuore – Roma. The funders did not play any role in the design of the study, the collection, analysis, and interpretation of data, or in writing the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare no potential conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Cazzato, G., Oliva, M.S., Masci, G. et al. Femoral neck fracture: the reliability of radiologic classifications. BMC Musculoskelet Disord 22 (Suppl 2), 1063 (2021). https://doi.org/10.1186/s12891-022-05007-3
- Hip fractures
- Femoral neck fracture
- Femoral fractures’ classification