Clinical validation of artificial intelligence-based preoperative virtual reduction for Neer 3- or 4-part proximal humerus fractures

Abstract

Background

If reduction images of fractures can be provided in advance using artificial intelligence (AI)-based technology, they can assist with preoperative surgical planning. We recently developed an AI-based preoperative virtual reduction model for orthopedic trauma that provides automatic segmentation and reduction of fractured fragments. The purpose of this study was to validate the quality of the reduction models of Neer 3- or 4-part proximal humerus fractures produced by this AI-based technology.

Methods

To develop the AI-based preoperative virtual reduction model, deep learning was used to segment the fracture fragments, and a Monte Carlo simulation performed the virtual reduction to determine the best model. A total of 20 pre/postoperative three-dimensional computed tomography (CT) scans of proximal humerus fractures were prepared. The preoperative CT scans were used as the input for AI-based automated reduction (AI-R) to derive reduction models of the fracture fragments; manual reduction (MR) was conducted using the same CT images. The Dice similarity coefficient (DSC) and intersection over union (IoU) between the reduction models from AI-R/MR and the postoperative CT scans were evaluated. Working times were compared between the two groups. Clinical validity agreement (CVA) and reduction quality score (RQS) were assessed by 20 orthopedic surgeons as clinical validation outcomes.

Results

The mean DSC and IoU were better with AI-R than with MR (0.78 ± 0.13 vs. 0.69 ± 0.16, p < 0.001 and 0.65 ± 0.16 vs. 0.55 ± 0.18, p < 0.001, respectively). The working time of AI-R was, on average, 1.41% of that of MR. The mean CVA of all cases was 81% ± 14.7% (AI-R, 82.25% ± 14.27%; MR, 76.75% ± 14.17%, p = 0.06). The mean RQS was significantly higher with AI-R than with MR (91.47 ± 1.12 vs. 89.30 ± 1.62, p = 0.045).

Conclusion

The AI-based preoperative virtual reduction model produced good-quality reduction models of proximal humerus fractures with faster working times. Beyond diagnosis, classification, and outcome prediction, AI-based technology can change the paradigm of preoperative surgical planning in orthopedic surgery.

Level of evidence

Level IV.

Background

Comminuted proximal humerus fractures can cause serious complications and poor clinical outcomes [1]. Anatomic reduction can improve postoperative (Post-OP) clinical outcomes, for which proper preoperative planning is essential [2, 3]. Orthopedic surgeons use different methods, including three-dimensional (3D) printing and dedicated two-dimensional (2D)/3D software, for preoperative surgical planning [4, 5]. However, such methods require manual reduction (MR) and are time-consuming; therefore, efficient preoperative planning techniques are required.

Currently, artificial intelligence (AI) and deep learning techniques are leading innovations in medical fields, such as radiology, pharmaceuticals, and oncology [6,7,8]. In orthopedic surgery, various deep learning models are rapidly expanding, including those for diagnosis, risk analysis, clinical decision-making, and outcome prediction [9,10,11,12,13,14,15,16]. Although AI impacts clinical results and fracture diagnosis, many orthopedic surgeons hope to also use AI for surgeries and treatment. If AI could automatically provide a preoperative surgical plan, including the virtual reduction of proximal humerus fracture, surgeons could easily decide the surgical strategy, and surgical planning time would be reduced relative to MR, improving surgery efficacy.

Current research has examined computer-assisted surgical planning software for fractures [17,18,19,20]. Such software mostly requires manual user interaction for fracture reduction, and while studies have attempted automatic virtual reduction of fracture fragments, their application to complex fractures is limited [21]. We developed the AI-based preoperative virtual reduction model for orthopedic trauma, which can provide automatic segmentation and reduction of fractured fragments.

The AI-based preoperative virtual reduction model employs several AI techniques, such as ‘You only look once’ to detect the fractured region, semantic segmentation to identify the fractured fragments, and reinforcement learning with Monte Carlo simulation for automatic reduction [22, 23]. The model can automatically display a 3D reduction model of a given fracture. We postulate that adequate visualization, along with segmentation and virtual reduction of fractures using 3D-computed tomography (CT), plays a key role in planning, potentially expanding surgical strategies. This may help orthopedic surgeons establish preoperative plans for trauma surgery; however, clinical validation has not yet been demonstrated. Therefore, this study aimed to validate the clinical effectiveness of planning outcomes using the AI-based preoperative virtual reduction model through several approaches. Moreover, to evaluate the quality of the AI-generated reduction model, surgeons’ clinical opinions alongside a novel evaluation method based on computer vision (without human subjectivity) are required. We hypothesize that the performance of the AI-based preoperative virtual reduction model will be superior to that of MR for proximal humerus fractures.

Methods

This study was approved by the Institutional Review Board of Ulsan University Hospital (No. UUH-2022-10-030). The requirement for informed consent was waived because of the retrospective nature of the study. All methods adhered to relevant guidelines and regulations.

Application of semantic segmentation model for identification of fractured fragments

Initially, the deep learning model for semantic segmentation was trained using a dataset designed to enable automatic segmentation of fractured bones. The training dataset consisted of 5,619,032 CT images (training: 0.6, validation: 0.2, test: 0.2, 5-fold cross validation) and their corresponding masked images, which were color-coded to distinguish fractured fragments. The deep learning models for semantic segmentation in this study were developed using MATLAB (2023b, MathWorks, MA, USA). The equipment used for model training comprised dual Intel(R) Xeon(R) Silver processors running at 2.10 GHz with 128 GB of RAM, and two NVIDIA GeForce RTX 3090 GPUs, each with 24 GB of GPU memory. The deep learning architecture chosen for this study was the DeepLab v3+ model with Inception-ResNet-v2 as the backbone [24]. The detailed structure of the deep learning model for the automatic segmentation of fractured fragments is shown in Fig. 1a. The solver was stochastic gradient descent with a momentum of 0.9. The initial learning rate was set to 0.0005 and was multiplied by a drop factor of 0.1 every 10 epochs. The maximum number of epochs was initially set to 100, and the minibatch size was set to 128. To evaluate and optimize the deep learning model, five metrics of semantic segmentation performance were employed: Global Accuracy, Mean Accuracy, Mean intersection over union (IoU), Weighted IoU, and Mean Boundary F1 (BF) Score. The accuracy score is essentially defined as (True Positive / (True Positive + False Negative)). This metric encompasses ‘Global Accuracy,’ the ratio of correctly classified pixels to the total pixels, irrespective of class, and ‘Mean Accuracy,’ the ratio of correctly classified pixels in each class to the total pixels of that class, averaged over all classes.
For ‘IoU,’ the score is determined as (True Positive / (True Positive + False Positive + False Negative)). ‘Mean IoU’ is the average IoU across all classes, and we also assessed ‘Weighted IoU,’ which is the average IoU of all classes weighted by the number of pixels in each class. Lastly, the ‘Mean BF Score’ denotes the average contour matching score for image segmentation. The BF Score gauges the proximity of the predicted object boundary to the ground truth boundary. Essentially, the equation for the BF Score mirrors that of the Dice similarity coefficient (DSC) (2 × precision × recall / (recall + precision)).
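The accuracy and IoU definitions above can be illustrated with a small sketch. It assumes a confusion matrix whose rows are true classes and whose columns are predicted classes; the function name and the toy numbers are hypothetical and are not taken from the study, and the BF Score is omitted because it requires boundary extraction.

```python
def segmentation_metrics(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = [confusion[k][k] for k in range(n)]
    true_count = [sum(confusion[k]) for k in range(n)]  # TP + FN per class
    pred_count = [sum(confusion[i][k] for i in range(n)) for k in range(n)]  # TP + FP

    # Per-class accuracy: TP / (TP + FN). Assumes every class has at least one pixel.
    class_acc = [tp[k] / true_count[k] for k in range(n)]
    # Per-class IoU: TP / (TP + FP + FN).
    class_iou = [tp[k] / (true_count[k] + pred_count[k] - tp[k]) for k in range(n)]

    return {
        "global_accuracy": sum(tp) / total,
        "mean_accuracy": sum(class_acc) / n,
        "mean_iou": sum(class_iou) / n,
        "weighted_iou": sum(true_count[k] / total * class_iou[k] for k in range(n)),
    }


# Toy example (hypothetical counts): a dominant class and a small class.
toy = [[90, 2],
       [4, 4]]
metrics = segmentation_metrics(toy)
```

In this toy case the Weighted IoU is higher than the Mean IoU because the larger class dominates the weighting.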

Fig. 1

Deep learning model structure for automatic segmentation of the fracture fragment from computed tomography scans and outline of the Monte Carlo simulation and decision tree for automatic virtual reduction. (a) The model is based on DeepLab v3+ with Inception-ResNet-v2 as the backbone. The model has 853 layers with 956 connections, including skip connections. Layers are classified by color. The encoder is composed of multiple residual blocks, and the decoder involves Atrous Spatial Pyramid Pooling. (b) The Monte Carlo simulation provides random numbers for the location of the fragments, the physical arrangement (rotation), and the detection threshold for the collision as weighting factors. Several decision trees find the best simulation model from the numerous simulation scenarios using the Dice similarity coefficient score

Preparation of Neer 3- or 4-part proximal humerus fracture CT images

Twenty cases of Neer classification 3- or 4-part proximal humeral fractures were prepared. Patients were included if anatomic reduction had been achieved after surgery, as assessed on Post-OP 3D-CT scans. Anatomical reduction was determined by a radiologist with more than 14 years of experience. Reduction was considered anatomical when the head-shaft displacement and greater tuberosity displacement were anatomically reduced and the head-shaft alignment was restored to its normal range of 120 to 150 degrees [25].

AI-based automated reduction and manual reduction

AI-based automated reduction (AI-R) refers to the virtual reduction of proximal humerus fractures using the AI-based preoperative virtual reduction model. AI-R involves four steps: (1) import of the Digital Imaging and Communications in Medicine (DICOM) CT images; (2) automatic segmentation of all fractured fragments using semantic segmentation; (3) automatic reduction using the Monte Carlo simulation and decision tree (Fig. 1b); and (4) export of the resulting 3D reduction model. The Monte Carlo simulation randomly relocates each fragment within the regular range [26]. In particular, a random number was applied to the initial position and rotation angle of each fragment. In this process, weighting factors for the random number were determined according to the size of each fragment using a probability model (weighting factor = total volume / fragment volume). The probability model, which limits the maximum shift range and rotation angle, was calculated using the distribution of the weighting factor. After weighting, the fragments progressively moved toward the closest fragments in the simulation. When a fragment detected a collision with another fragment, its movement stopped, and the combined fragments then moved toward the next closest fragment; this was repeated continuously. A decision tree was employed to identify the optimal intact bone model against which the performance of the reduction model was compared, so that the DSC between the reduction model and the intact bone model could be calculated. Several decision trees were used to deduce the best simulation results for the virtual reduction, allowing users to select the most suitable reduction model from those recommended by each decision tree. Figure 2a shows the process of AI-R, including the automatic segmentation and reduction results and the exported 3D reduction model.
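The idea of random relocation scored by the DSC can be sketched on a toy 2D grid. This is only an illustration under strong assumptions and is not the authors' implementation: fragments are sets of grid cells, only integer translations are sampled (no rotation), the collision check simply rejects overlapping poses, a single known template stands in for the intact bone model chosen by the decision tree, and the mapping from weighting factor to shift range is not described in the text, so a fixed `max_shift` is used. All function names are hypothetical.

```python
import random


def weighting_factor(total_volume, fragment_volume):
    # As stated in the text: weighting factor = total volume / fragment volume.
    return total_volume / fragment_volume


def dice(a, b):
    # a, b: sets of occupied (x, y) cells.
    return 2 * len(a & b) / (len(a) + len(b))


def shift(cells, dx, dy):
    return {(x + dx, y + dy) for x, y in cells}


def monte_carlo_reduce(fixed, fragment, template, max_shift, trials, seed=0):
    """Randomly translate `fragment`; keep the pose with the best Dice vs `template`."""
    rng = random.Random(seed)
    best_pose = (0, 0)
    best_score = dice(fixed | fragment, template)
    for _ in range(trials):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        moved = shift(fragment, dx, dy)
        if moved & fixed:  # crude collision check: reject overlapping poses
            continue
        score = dice(fixed | moved, template)
        if score > best_score:
            best_pose, best_score = (dx, dy), score
    return best_pose, best_score


# Toy data: a 4 x 6 "intact bone" split into two halves, one of them displaced.
template = {(x, y) for x in range(4) for y in range(6)}
fixed = {(x, y) for x in range(4) for y in range(3)}
true_fragment = {(x, y) for x in range(4) for y in range(3, 6)}
displaced = shift(true_fragment, 2, 2)

weights = [weighting_factor(len(template), len(part)) for part in (fixed, displaced)]
initial_score = dice(fixed | displaced, template)
pose, score = monte_carlo_reduce(fixed, displaced, template, max_shift=3, trials=2000)
```

With only 49 candidate translations, the random search is expected to recover the shift that undoes the displacement; real 3D fragments with rotations have a far larger search space, which is where the weighting and decision-tree steps described above come in.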

Fig. 2

Process of AI-based automated reduction and manual reduction. (a) The segmentation and reduction are fully automatic processes using AI. The AI-based preoperative virtual reduction model can export the results of segmentation or reduction in the 3D-STL format. The exported STL was uploaded to Metasequoia 4 which is a 3D modeling tool. (b) The manual segmentation using MIMICS and manual reduction using Metasequoia 4 were conducted. (c) Representative outcomes from reduction models achieved through AI-based automated reduction and manual reduction. The first and fourth columns in the figure illustrate the reduction model results obtained through the AI-based automated reduction. The second and fifth columns showcase the reduction models devised through manual reduction. Additionally, the reconstructed three-dimensional images of the postoperative computed tomography are presented in the third and last columns. AI, artificial intelligence; 3D, three-dimensional; STL, stereolithography

The MR process for virtual reduction of proximal humerus fractures consists of several distinct steps (Fig. 2b). Two orthopedic specialists (Y.D.J. and K.B.P.) and two image processing engineers (D.K.Y. and M.S.K.) performed MR using 20 CT scans of proximal humerus fractures. The first step involves importing the DICOM CT images into the segmentation and annotation software, MIMICS (version 21.0; Materialise, Leuven, Belgium). Within MIMICS, the “mask” function segments the bone region by applying a Hounsfield unit-based threshold to the 2D-CT images, which generates multiple masks, each representing a different fractured fragment. Following segmentation, the segmented region on the 2D-CT images is reconstructed as a 3D mesh, using the masks for guidance. Finally, the orthopedic specialists and engineers used the 3D-modeling tool Metasequoia 4 (Ver 4.7.4, Tetraface, Tokyo, Japan) to relocate the bone fragments and achieve bone reduction. Figure 2c displays examples of virtual fracture reduction by AI-R and MR.

Evaluation of reduction quality using DSC and IoU

To assess the reduction quality achieved with AI-R and MR, Post-OP CT images from the same patients were obtained for structural comparison. The proximal humerus fracture virtual reduction model produced by AI-R was exported from the AI-based preoperative virtual reduction model in STL format and imported into Metasequoia 4. After the reduction models had been created using AI-R and MR, each 3D reduction model was aligned with the corresponding Post-OP CT image using the iterative closest point (ICP) algorithm [27]. Regions containing metal implants and screws were excluded to enable a focused comparison of the bone structure. The 3D bone images from the reduction model and Post-OP CT were merged and uniformly sliced, and the DSC was calculated for each 2D axial slice image (Fig. 3). The DSC was calculated with the following formula using MATLAB [28].

$$Dice = {{2 \times TP} \over {(TP + FP) + (TP + FN)}}$$
Fig. 3

Method to calculate the DSC and IoU. To calculate the DSC and IoU between two 3D-images (reduction model and Post-OP CT), the two 3D-images should be registered using the algorithm of the iterative closest point, and the 3D-images should be sliced as 2D-images. The ground truth is considered as the Post-OP CT. DSC, dice similarity coefficient; IoU, intersection over union; 3D, three-dimensional; 2D, two-dimensional; TN, true negative; TP, true positive; FP, false positive; FN, false negative

Where TP, FP, and FN are true positive, false positive, and false negative, respectively. The mean DSC for all slice images was determined by comparing the virtual reduction slice images with the corresponding Post-OP CT slice images. DSC was categorized as: Near Perfect Agreement (0.80 ≤ DSC < 1.00), Substantial Agreement (0.60 ≤ DSC < 0.80), Moderate Agreement (0.40 ≤ DSC < 0.60), Fair Agreement (0.20 ≤ DSC < 0.40), and Slight Agreement (0.10 ≤ DSC < 0.20). DSC > 0.6 indicated ‘good performance’ when the reduction model was compared with Post-OP CT images [29].
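As a minimal sketch of the formula and the agreement bands above, the following computes a DSC from pixel counts and maps a value to its category. The function names are hypothetical, the counts are invented, and values outside the listed bands (below 0.10, or exactly 1.00) are returned as uncategorized because the text does not assign them a label.

```python
DSC_BANDS = [
    (0.80, 1.00, "Near Perfect Agreement"),
    (0.60, 0.80, "Substantial Agreement"),
    (0.40, 0.60, "Moderate Agreement"),
    (0.20, 0.40, "Fair Agreement"),
    (0.10, 0.20, "Slight Agreement"),
]


def dice_from_counts(tp, fp, fn):
    # Dice = 2*TP / ((TP + FP) + (TP + FN))
    return 2 * tp / ((tp + fp) + (tp + fn))


def dsc_category(dsc):
    for low, high, label in DSC_BANDS:
        if low <= dsc < high:
            return label
    return None  # outside the bands listed in the text


def is_good_performance(dsc):
    # Threshold used in the text: DSC > 0.6.
    return dsc > 0.6


example = dice_from_counts(tp=6, fp=2, fn=2)  # hypothetical pixel counts
label = dsc_category(example)
```

Under these bands, the reported mean values of 0.78 (AI-R) and 0.69 (MR) both fall in the Substantial Agreement range.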

Another indicator for comparing the reduction model with the Post-OP CT is the IoU (Fig. 3), more familiarly described as the ratio between the overlapping region and the combined region of the two objects. The mean IoU was calculated over the slice images using the equation below. Typically, an IoU > 0.5 is taken to represent ‘good agreement’ [30].

$$IoU = {{TP} \over {(TP + FP + FN)}}$$
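The slice-wise evaluation can be sketched as follows, assuming the registered reduction model and the Post-OP CT have already been converted to stacks of binary 2D masks (nested lists of 0/1), with the Post-OP CT as ground truth. Skipping slices in which both masks are empty is an assumption of this sketch, not something stated in the text, and the function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN between two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def slice_scores(pred, truth):
    tp, fp, fn = confusion_counts(pred, truth)
    if tp + fp + fn == 0:
        return None  # both masks empty: skipped (assumption)
    dsc = 2 * tp / ((tp + fp) + (tp + fn))
    iou = tp / (tp + fp + fn)
    return dsc, iou


def mean_scores(pred_slices, truth_slices):
    scores = [s for s in map(slice_scores, pred_slices, truth_slices) if s is not None]
    mean_dsc = sum(d for d, _ in scores) / len(scores)
    mean_iou = sum(i for _, i in scores) / len(scores)
    return mean_dsc, mean_iou


# Three toy slices: partial overlap, perfect overlap, and an empty slice.
pred_slices = [[[1, 1, 0], [0, 1, 0]], [[1, 1], [1, 1]], [[0, 0], [0, 0]]]
truth_slices = [[[1, 1, 0], [0, 0, 1]], [[1, 1], [1, 1]], [[0, 0], [0, 0]]]
mean_dsc, mean_iou = mean_scores(pred_slices, truth_slices)
```

For any single slice the two metrics are linked by IoU = DSC / (2 − DSC), so they rank slices identically; the means over slices need not satisfy that identity exactly.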

Working time of the process and evaluation of clinical validity using survey

The working time (in seconds) of the whole process was measured. The working time for AI-R was measured automatically during processing. The working time for MR was measured with a timer, from the start of segmentation of the DICOM CT images to the completion of the manual virtual reduction.

To evaluate reduction quality, 20 orthopedic surgeons, blinded to the models and with 3–20 years’ experience, completed a survey-based screening evaluation. Surgeons confirmed the results of the reduction model for all 20 cases based on 3D-images of proximal humerus fracture, before and after planning, with several view angles.

First, we investigated the clinical validity agreement (CVA) of the reduction model results for each case. Clinical validity here refers to whether the reduction model could be applied as reference data in real surgery. The surgeon responded “yes” if they considered the clinical validity to be high, regardless of the reduction quality, and “no” if they considered the clinical validity to be low, despite high reduction quality. The percentage of “yes” responses determined the clinical validity.
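As a small illustration of the CVA calculation, the sketch below turns yes/no answers into a percentage per case and then summarizes across cases. The response data are invented, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def cva_percent(responses):
    """Percentage of 'yes' answers for one case; `responses` is a list of booleans."""
    return 100.0 * sum(responses) / len(responses)


# Hypothetical answers from 20 surgeons for two cases (True = "yes").
case_a = [True] * 17 + [False] * 3
case_b = [True] * 14 + [False] * 6

per_case = [cva_percent(case_a), cva_percent(case_b)]
summary = (mean(per_case), stdev(per_case))  # mean and sample SD across cases
```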

Second, we investigated the reduction quality score (RQS) for the virtual reduction results. To our knowledge, no consensus on reduction quality evaluation exists; therefore, the inspectors were asked to establish a subjective RQS. The scores ranged 0–100, and the inspector subjectively scored the reduction quality, regardless of clinical validation.

Statistical analysis

Continuous variables were compared using independent t-tests or Mann-Whitney U tests, as applicable, and are presented as mean ± standard deviation. Statistical significance was set at p < 0.05. Statistical analyses were performed using IBM SPSS, v.17.0 (IBM Corp., Armonk, NY, USA). The required sample size was estimated using the formula below, calculating the number of cases for clinical validity based on previous DSC results. We acquired the weighted mean of extreme cases, including the largest difference in DSC between AI-R and MR, as 0.8500 and 0.6287, respectively.

$$n = {{{{({Z_{\alpha /2}} + {Z_\beta })}^2} \times {P_1}(1 - {P_1})} \over {{{({P_1} - {P_0})}^2}}}$$

Where P1 (0.8500) and P0 (0.6287) are the lower bounds of the 95% confidence interval (CI) and weighted mean, respectively. Zα/2 and Zβ were set at 1.96 (95% CI) and 0.84 (power: 80%).
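Plugging the stated values into the formula gives a quick arithmetic check. The function name is hypothetical, and how the result was rounded to a case count is not stated in the text, so the sketch reports only the raw value.

```python
def required_sample_size(p1, p0, z_alpha_2=1.96, z_beta=0.84):
    # n = (Z_{alpha/2} + Z_beta)^2 * P1 * (1 - P1) / (P1 - P0)^2
    return (z_alpha_2 + z_beta) ** 2 * p1 * (1 - p1) / (p1 - p0) ** 2


n = required_sample_size(p1=0.8500, p0=0.6287)
print(f"n = {n:.2f}")
```

With these inputs the expression evaluates to roughly 20.4.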

Results

Performance of semantic segmentation

Figure 4 shows a representative result of the deep learning-based semantic segmentation of CT images along with the 3D image reconstruction. The deep learning model achieved a Global Accuracy of 0.9756, Mean Accuracy of 0.7633, Mean IoU of 0.2841, Weighted IoU of 0.9438, and Mean BF Score of 0.8677.

Fig. 4

Representative result of deep learning-based semantic segmentation of CT images and their 3D image reconstruction. (a) Original CT images of proximal humerus fracture. (b) Masks segmented by the deep learning model from the CT images: the segmented masks are identified using different colors; in particular, each fractured fragment within the fractured region is shown in a different color. (c) 3D image reconstruction of the original bone and of the segmented mask with various colors, that is, the bone before and after segmentation. CT, computed tomography; 3D, three-dimensional

Reduction quality using DSC and IoU

The mean DSC for AI-R and MR was 0.78 ± 0.13 and 0.69 ± 0.16, respectively (p < 0.001). The IoU for AI-R and MR was 0.65 ± 0.16 and 0.55 ± 0.18, respectively (p < 0.001). The percentage of cases demonstrating good performance, with DSC exceeding 0.6, was 100% in AI-R compared to 80% in MR. The percentage of good performance over 0.5 IoU was 100% in AI-R, while it was 75% in MR (Fig. 5).

Fig. 5

DSC (a) and IoU (b) values comparing Post-OP CT with reduction model created through MR. DSC (c) and IoU (d) values comparing Post-OP CT with reduction model generated through AI-R. The yellow highlighted area indicates the region of “good performance”; a DSC value > 0.6 and an IoU value > 0.5 when comparing the reduction model from MR with the Post-OP CT images. DSC, dice similarity coefficient; IoU, intersection over union; Post-OP, postoperative; CT, computed tomography

Working time and survey-derived clinical validity

The working time for AI-R was automatically measured during the computation using an additional timing algorithm. AI-R established the reduction model significantly faster than MR, requiring 1.41% of the MR working time (49.17 ± 8.64 vs. 3,477 ± 618 s; p < 0.001).
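As a worked check, the ratio of the two reported mean working times reproduces the stated percentage (the equivalent speed-up factor is derived here and is not reported in the text):

$$\frac{49.17\ \text{s}}{3477\ \text{s}} \approx 0.0141 = 1.41\%, \qquad \frac{3477\ \text{s}}{49.17\ \text{s}} \approx 70.7$$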

The mean CVA of all cases was 81% ± 14.7% (AI-R, 82.25% ± 14.27%; MR, 76.75% ± 14.17%; p = 0.06). The mean RQS was significantly higher in AI-R than in MR (91.47 ± 1.12 vs. 89.30 ± 1.62; p = 0.0045).

Discussion

This study confirmed the excellent clinical validation of the proximal humerus fracture virtual reduction model using AI-R. Fracture segmentation and reduction using the AI-based preoperative virtual reduction model showed good performance based on DSC and IoU compared to MR. AI-R significantly reduced the virtual reduction working time, requiring only 1.41% of the time needed for MR. The RQS of AI-R was better than that of MR.

Two key technologies were crucial in achieving the results of this study. The first is semantic segmentation, for which recent studies have shown similar trends and outcomes. In particular, Kim et al. utilized a DeepLab v3+ based ResNet50 model for the automatic segmentation of tibia and fibula fracture fragments from CT images [31]. This study employed the DeepLab v3+ based Inception-ResNet-v2 model. Despite using only half the amount of data used by Kim et al., this study achieved similar levels of performance in semantic segmentation with shorter training times. The second is the Monte Carlo simulation for automatic reduction. Although there have been few related studies recently, many studies have depended on the ICP algorithm for automatic reduction. Zhou et al. proposed an improved technique for aligning fragments in cases of highly fragmented fractures [32]. Their approach used a two-category Bayesian classifier that utilized surface vertex intensity values to distinguish between fractured and intact surfaces. Chowdhury et al. recommended the use of a maximum weight graph-matching (MWGM) algorithm to precisely match fractured surfaces [33]. For aligning these surfaces, they advocated using a customized version of the ICP algorithm. In this study, the Monte Carlo simulation and decision tree were used instead of the ICP algorithm. ICP-based automatic reduction has shown poor accuracy in most complex cases. However, the combination of the Monte Carlo simulation and decision tree could overcome the complexity of trauma and showed relatively high performance in highly complex cases.

Comminuted proximal humerus fractures are difficult to anatomically reduce, sometimes requiring arthroplasty surgery [34]. Preoperative surgical planning through virtual reduction can help surgeons decide on treatment methods. Several studies have reported computer-assisted surgery for proximal humerus fracture [18,19,20]. Chen et al. reported superior clinical outcomes and efficacy of proximal humerus fracture surgery using computer-assisted virtual surgical planning compared to conventional planning [17]. Nonetheless, computer-assisted planning requires MR software for segmentation of the fracture fragments. The current study is the first to directly compare MR using such software with AI-based planning; the AI-based software performed better than MR. Although we have demonstrated its effectiveness in surgical planning, Post-OP clinical outcomes using the AI-based preoperative virtual reduction model have not been analyzed, necessitating additional research.

Moolenaar et al. indicated the usefulness of computer-assisted planning for bone fracture fixation in their meta-analysis, comprising 79 articles [21]. However, in the included studies, reduction of the fracture fragments was performed manually. Previous studies have shown the feasibility of AI-based automatic segmentation of bone shape [35,36,37]. Zhao et al. tested automated reduction performance of pelvic bone fractures and found a mean global distance error < 4 mm [38]. This error margin meets the pelvic bone fracture reduction criteria; however, achieving more accurate results in reduction quality assessments is essential for AI-based automated software for preoperative planning.

Several engineering indicators to evaluate reduction quality have been introduced, such as length, apposition, and angulation [39]. When the fractured site includes the articular surface, step-off and gap are important indicators, which are measured using 2D/3D-images [38, 40]. However, measurement error can easily arise depending on the type of image, human skill, tools, method, and point set used for measurement. To reduce error, we suggest applying the DSC and IoU for comparison between two objects. The DSC and IoU are standard measures for similarity analysis between two images in the field of computer vision [41], and they are particularly effective for comparing segmented regions on different images. Hence, our perspective herein is that the DSC and IoU are adequate for comparing two different bone models. In this study, both DSC and IoU were better in the reduction model using AI-R than in that using MR, and AI-R showed good performance in all cases.

Many virtual reduction and planning software packages require considerable working time, user interaction, and expertise, with reported planning durations ranging from 22 to 258 min [21]. However, advancements in AI technology and machine learning for automated segmentation hold the promise of significantly reducing the working time [42]. In this study, AI-R demonstrated notable speed, averaging 49 s. This substantial difference from MR offers the potential for a significant reduction in surgeons’ planning time.

When the surgeons’ perspective, based on clinical experience, supports the DSC and IoU results, these metrics can serve as strong and accurate indicators of reduction quality. AI-R showed good reduction quality according to both the engineering indicators and the surgeons’ opinions within an extremely short time and without human labour. Both the surgeons’ clinical opinion (RQS) and the engineering indicators (DSC and IoU) showed statistically significant differences, meaning that the AI-R results were superior not only from an engineering standpoint but also from the surgeons’ perspective. Moreover, CVA also tended to be higher with AI-R. The significance of this study lies in the clinical validity of applying DSC and IoU together with surgeons’ opinions to evaluate reduction quality, rather than in the performance of the AI-based preoperative virtual reduction model itself.

Nevertheless, this study had some limitations. First, the Post-OP CT, not the mirrored image, was the comparator. The mirrored image, an image of the side opposite the fractured region, can serve as a standard model for comparing reduction quality. However, this study used 20 cases selected based on the surgeons’ opinions, and reduction quality was evaluated by how accurately the reduction model corresponded with the Post-OP CT classified as ‘anatomic reduction was achieved’. We believe that a superior reduction outcome is better clarified by retrospective evaluation of the follow-up outcome than by the similarity to the bone shape of the opposite side. Second, data were limited. Because this is the first proposal to evaluate reduction quality using DSC and IoU instead of measurements of length, apposition, or angulation, this study proceeded cautiously on a verifiable basis with a relatively small dataset. The data acquired had 80% power with a standardized difference of 0.9 based on the sample size calculation. Despite limited data, the good and consistent performance in establishing the reduction model was clear. To accurately evaluate the performance of the AI-based preoperative virtual reduction model, more data are essential. The evaluation method for reduction quality using DSC and IoU should also be developed further using additional data and surgeons’ clinical opinions. Finally, there is a potential impact on the surgeon’s understanding of the fracture and reduction process. While AI-based technology can expedite the planning process by providing preoperative reduction images, it may also limit the learning opportunity for surgeons who traditionally gain in-depth knowledge of the fracture by manually planning the surgery. To confirm whether AI-based technology is helpful in real situations, further studies comparing surgery quality with and without AI-based technology are needed.

Conclusion

This study evaluated the quality of the reduction models produced by the AI-based preoperative virtual reduction model, considering its clinical validity. The reduction quality with AI-R showed good and consistent performance based on the DSC, IoU, and surgeons’ clinical opinions (RQS) with faster working times. Beyond diagnosis, classification, and outcome prediction, AI-based technology can change the paradigm of preoperative surgical planning in orthopedic surgery.

Data availability

The full original data generated during the current study are available from the corresponding author on reasonable request.

Abbreviations

2D:

Two-Dimensional

3D:

Three-Dimensional

AI:

Artificial Intelligence

AI-R:

AI-Based Automated Reduction

BF:

Boundary F1

CI:

Confidence Interval

CT:

Computed Tomography

CVA:

Clinical Validity Agreement

DSC:

Dice Similarity Coefficient

IoU:

Intersection over Union

MR:

Manual Reduction

Post-OP:

Postoperative

RQS:

Reduction Quality Score

STL:

Stereolithography

References

  1. Roddy E, Kandemir U. High rate of avascular necrosis but excellent patient-reported outcomes after open reduction and internal fixation (ORIF) of proximal humerus fracture dislocations: should ORIF be considered as primary treatment? J Shoulder Elb Surg. 2023;32(10):2097–104.

  2. Han R, Uneri A, Vijayan RC, Wu P, Vagdargi P, Sheth N, Vogt S, Kleinszig G, Osgood GM, Siewerdsen JH. Fracture reduction planning and guidance in orthopaedic trauma surgery via multi-body image registration. Med Image Anal. 2021;68:101917.

  3. Augat P, von Rüden C. Evolution of fracture treatment with bone plates. Injury. 2018;49(Suppl 1):S2–7.

  4. Tomaževič M, Kristan A, Kamath AF, Cimerman M. 3D printing of implants for patient-specific acetabular fracture fixation: an experimental study. Eur J Trauma Emerg Surg. 2021;47(5):1297–305.

  5. Hu Y, Li H, Qiao G, Liu H, Ji A, Ye F. Computer-assisted virtual surgical procedure for acetabular fractures based on real CT data. Injury. 2011;42(10):1121–4.

  6. Hamet P, Tremblay J. Artificial intelligence in medicine. Metabolism. 2017;69S:S36–40.

  7. Hessler G, Baringhaus KH. Artificial intelligence in drug design. Molecules. 2018;23(10).

  8. Duong MT, Rauschecker AM, Rudie JD, Chen PH, Cook TS, Bryan RN, Mohan S. Artificial intelligence for precision education in radiology. Br J Radiol. 2019;92(1103):20190389.

    Article  PubMed  PubMed Central  Google Scholar 

  9. Chung SW, Han SS, Lee JW, Oh KS, Kim NR, Yoon JP, Kim JY, Moon SH, Kwon J, Lee HJ, et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 2018;89(4):468–73.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Hui AT, Alvandi LM, Eleswarapu AS, Fornari ED. Artificial intelligence in modern orthopaedics: current and future applications. JBJS Rev 2022, 10(10).

  11. Cheng CT, Wang Y, Chen HW, Hsiao PM, Yeh CN, Hsieh CH, Miao S, Xiao J, Liao CH, Lu L. A scalable physician-level deep learning algorithm detects universal trauma on pelvic radiographs. Nat Commun. 2021;12(1):1066.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  12. Ramkumar PN, Karnuta JM, Navarro SM, Haeberle HS, Scuderi GR, Mont MA, Krebs VE, Patterson BM. Deep learning preoperatively predicts value metrics for primary total knee arthroplasty: development and validation of an artificial neural network model. J Arthroplasty. 2019;34(10):2220–e22272221.

    Article  PubMed  Google Scholar 

  13. Olczak J, Fahlberg N, Maki A, Razavian AS, Jilert A, Stark A, Sköldenberg O, Gordon M. Artificial intelligence for analyzing orthopedic trauma radiographs. Acta Orthop. 2017;88(6):581–6.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Borjali A, Chen AF, Bedair HS, Melnic CM, Muratoglu OK, Morid MA, Varadarajan KM. Comparing the performance of a deep convolutional neural network with orthopedic surgeons on the identification of total hip prosthesis design from plain radiographs. Med Phys. 2021;48(5):2327–36.

    Article  PubMed  Google Scholar 

  15. Jain D, Durand W, Burch S, Daniels A, Berven S. Machine learning for predictive modeling of 90-day readmission, major medical complication, and discharge to a facility in patients undergoing long segment posterior lumbar spine fusion. Spine (Phila Pa 1976). 2020;45(16):1151–60.

    Article  PubMed  Google Scholar 

  16. Levine B, Fabi D, Deirmengian C. Digital templating in primary total hip and knee arthroplasty. Orthopedics. 2010;33(11):797.

    Article  PubMed  Google Scholar 

  17. Chen Y, Jia X, Qiang M, Zhang K, Chen S. Computer-assisted virtual Surgical Technology Versus three-dimensional Printing Technology in Preoperative Planning for Displaced three and four-part fractures of the proximal end of the Humerus. J bone Joint Surg Am. volume 2018;100(22):1960–8.

  18. Wu RJ, Zhang W, Lin YZ, Fang ZL, Wang KN, Wang CX, Yu DS. Influence of preoperative simulation on the reduction quality and clinical outcomes of open reduction and internal fixation for complex proximal humerus fractures. BMC Musculoskelet Disord. 2023;24(1):243.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Xia S, Zhang Y, Wang X, Wang Z, Wang W, Ma X, Tian S. Computerized virtual surgery planning for ORIF of Proximal Humeral fractures. Orthopedics. 2015;38(5):e428–433.

    Article  PubMed  Google Scholar 

  20. Chen Y, Zhang K, Qiang M, Li H, Dai H. Computer-assisted preoperative planning for proximal humeral fractures by minimally invasive plate osteosynthesis. Chin Med J (Engl). 2014;127(18):3278–85.

    Article  PubMed  Google Scholar 

  21. Moolenaar JZ, Tumer N, Checa S. Computer-assisted preoperative planning of bone fracture fixation surgery: a state-of-the-art review. Front Bioeng Biotechnol. 2022;10:1037048.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition: 2016; 2016: 779–788.

  23. Vodopivec T, Samothrakis S, Ster B. On monte carlo tree search and reinforcement learning. J Artif Intell Res. 2017;60:881–936.

    Article  Google Scholar 

  24. Wang J, Liu X. Medical image recognition and segmentation of pathological slices of gastric cancer based on Deeplab v3 + neural network. Comput Methods Programs Biomed. 2021;207:106210.

    Article  PubMed  Google Scholar 

  25. Schnetzke M, Bockmeyer J, Porschke F, Studier-Fischer S, Grützner PA, Guehring T. Quality of reduction influences Outcome after locked-plate fixation of Proximal Humeral Type-C fractures. J bone Joint Surg Am Volume. 2016;98(21):1777–85.

    Article  Google Scholar 

  26. Yoon D-K, Jung J-Y, Suh TS. Application of proton boron fusion reaction to radiation therapy: a Monte Carlo simulation study. Appl Phys Lett 2014, 105(22).

  27. Liang L, Wei M, Szymczak A, Petrella A, Xie H, Qin J, Wang J, Wang FL. Nonrigid iterative closest points for registration of 3D biomedical surfaces. Opt Lasers Eng. 2018;100:141–54.

    Article  Google Scholar 

  28. Krithika Alias AnbuDevi M, Suganthi K. Review of semantic segmentation of medical images using modified architectures of UNET. Diagnostics (Basel) 2022, 12(12).

  29. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–82.

    Article  PubMed  Google Scholar 

  30. Van Beers F. Capsule Networks with Intersection over Union Loss for Binary Image Segmentation. In: ICPRAM: 2021; 2021: 71–78.

  31. Kim H, Jeon YD, Park KB, Cha H, Kim MS, You J, Lee SW, Shin SH, Chung YG, Kang SB, et al. Automatic segmentation of inconstant fractured fragments for tibia/fibula from CT images using deep learning. Sci Rep. 2023;13(1):20431.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  32. Zhou B, Willis A, Sui Y, Anderson D, Thomas T, Brown T. Improving inter-fragmentary alignment for virtual 3D reconstruction of highly fragmented bone fractures. Medical imaging 2009: image Processing: 2009. SPIE; 2009. pp. 1033–41.

  33. Chowdhury AS, Bhandarkar SM, Robinson RW, Yu JC. Virtual multi-fracture craniofacial reconstruction using computer vision and graph matching. Comput Med Imaging Graph. 2009;33(5):333–42.

    Article  PubMed  Google Scholar 

  34. Ratajczak K, Szczesny G, Maldyk P. Comminuted fractures of the proximal humerus - principles of the diagnosis, treatment and rehabilitation. Ortop Traumatol Rehabil. 2019;21(2):77–93.

    Article  PubMed  Google Scholar 

  35. Deng Y, Wang L, Zhao C, Tang S, Cheng X, Deng HW, Zhou W. A deep learning-based approach to automatic proximal femur segmentation in quantitative CT images. Med Biol Eng Comput. 2022;60(5):1417–29.

    Article  PubMed  Google Scholar 

  36. Liu P, Han H, Du Y, Zhu H, Li Y, Gu F, Xiao H, Li J, Zhao C, Xiao L, et al. Deep learning to segment pelvic bones: large-scale CT datasets and baseline models. Int J Comput Assist Radiol Surg. 2021;16(5):749–56.

    Article  PubMed  Google Scholar 

  37. Verhelst PJ, Smolders A, Beznik T, Meewis J, Vandemeulebroucke A, Shaheen E, Van Gerven A, Willems H, Politis C, Jacobs R. Layered deep learning for automatic mandibular segmentation in cone-beam computed tomography. J Dent. 2021;114:103786.

    Article  PubMed  CAS  Google Scholar 

  38. Zhao C, Guan M, Shi C, Zhu G, Gao X, Zhao X, Wang Y, Wu X. Automatic reduction planning of pelvic fracture based on symmetry. Comput Methods Biomech Biomedical Engineering: Imaging Visualization. 2022;10(6):577–84.

    CAS  Google Scholar 

  39. Kim MS, Yoon DK, Shin SH, Choe BY, Rhie JW, Chung YG, Suh TS. Quantitative Assessment of the restoration of original anatomy after 3D virtual reduction of long bone fractures. Diagnostics (Basel) 2022, 12(6).

  40. Zhang B, Lu H, Quan Y, Wang Y, Xu H. Fracture mapping of intra-articular calcaneal fractures. Int Orthop. 2023;47(1):241–9.

    Article  PubMed  Google Scholar 

  41. Cui Z, Fang Y, Mei L, Zhang B, Yu B, Liu J, Jiang C, Sun Y, Ma L, Huang J, et al. A fully automatic AI system for tooth and alveolar bone segmentation from cone-beam CT images. Nat Commun. 2022;13(1):2096.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  42. He Y, Liu Y, Yin B, Wang D, Wang H, Yao P, Zhou J. Application of finite element analysis combined with virtual computer in Preoperative Planning of Distal femoral fracture. Front Surg. 2022;9:803541.

    Article  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Contributions

Young Dae Jeon: writing the main manuscript and interpretation of data. Kwang-Hwan Jung: design of the work and analysis. Moo-Sub Kim: development of AI software. Hyeonjoo Kim: conception and proofreading. Do-Kun Yoon: development of AI software. Ki-Bong Park: conception, proofreading, and writing (review and editing). The authors read and approved the final manuscript.

Corresponding author

Correspondence to Ki-Bong Park.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Institutional Review Board of Ulsan University Hospital (No. 2022-10-030).

Consent for publication

The requirement for informed consent was waived because of the retrospective nature of the study.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Jeon, Y.D., Jung, KH., Kim, MS. et al. Clinical validation of artificial intelligence-based preoperative virtual reduction for Neer 3- or 4-part proximal humerus fractures. BMC Musculoskelet Disord 25, 669 (2024). https://doi.org/10.1186/s12891-024-07798-z

Keywords