HEMGN and SLC2A1 might be potential diagnostic biomarkers of steroid-induced osteonecrosis of femoral head: study based on WGCNA and DEGs screening

Background Steroid-induced osteonecrosis of the femoral head (SONFH) is a chronic and crippling bone disease. This study aims to reveal novel diagnostic biomarkers of SONFH. Methods The GSE123568 dataset based on peripheral blood samples from 10 healthy individuals and 30 SONFH patients was used for weighted gene co-expression network analysis (WGCNA) and differentially expressed genes (DEGs) screening. The genes in the module related to SONFH and the DEGs were extracted for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Genes with |gene significance| > 0.7 and |module membership| > 0.8 were selected as hub genes in modules. The DEGs with the degree of connectivity ≥5 were chosen as hub genes in DEGs. Subsequently, the overlapping genes of hub genes in modules and hub genes in DEGs were selected as key genes for SONFH. And then, the key genes were verified in another dataset, and the diagnostic value of key genes was evaluated by receiver operating characteristic (ROC) curve. Results Nine gene co-expression modules were constructed via WGCNA. The brown module with 1258 genes was most significantly correlated with SONFH and was identified as the key module for SONFH. The results of functional enrichment analysis showed that the genes in the key module were mainly enriched in the inflammatory response, apoptotic process and osteoclast differentiation. A total of 91 genes were identified as hub genes in the key module. Besides, 145 DEGs were identified by DEGs screening and 26 genes were identified as hub genes of DEGs. Overlapping genes of hub genes in the key module and hub genes in DEGs, including RHAG, RNF14, HEMGN, and SLC2A1, were further selected as key genes for SONFH. The diagnostic value of these key genes for SONFH was confirmed by ROC curve. The validation results of these key genes in GSE26316 dataset showed that only HEMGN and SLC2A1 were downregulated in the SONFH group, suggesting that they were more likely to be diagnostic biomarkers of SOFNH than RHAG and RNF14. Conclusions Our study identified that two key genes, HEMGN and SLC2A1, might be potential diagnostic biomarkers of SONFH. Supplementary Information The online version contains supplementary material available at 10.1186/s12891-021-03958-7.


Background
Steroid-induced osteonecrosis of the femoral head (SONFH) is a chronic and crippling disease of the femoral head, due to the disruption of the blood supply of the femoral head and the subsequent death of bone cells after chronic exposure to excessive glucocorticoids, which ultimately results in the collapse of the femoral head and dysfunction of the hip joint [1]. There are approximately 10,000 to 20,000 new non-traumatic osteonecrosis of the femoral head (NONFH) cases reported each year in the United States alone, while the estimated NONFH cases in China were 8.12 million in the population aged 15 years and over, among which SONFH counted for 47.4% of the total NONFH cases [2][3][4]. Although the diagnostic criteria for SONFH have been established, SONFH patients are often diagnosed at the advanced stage (ARCO stage III-IV), due to the nonspecific clinical symptoms at the early stage and the absence of specific biomarkers for the early diagnosis, which often leads to total hip arthroplasty of the patients [5]. Therefore, it is exceptionally urgent to establish novel early diagnostic molecular markers for SONFH.
Many studies have identified the biomarkers of diseases by screening differentially expressed genes (DEGs).
It is worth noting that few studies have explored clusters of highly correlated genes, which may play key a role in clinical features of interests. Weighted gene coexpression network analysis (WGCNA), a particularly useful systems biology method in this context, helps to construct free-scale gene co-expression networks and detect gene modules [6]. By analyzing the connectivity between modules and clinical features, we can determine which modules are associated with which features. Notably, WGCNA has been widely used to identify the hub genes related to clinical features in different diseases, such as osteoarthritis [7], acute myocardial infarction [8], bladder cancer [9] and pancreatic cancer [10].
In the present study, we proposed to identify novel diagnostic biomarkers of SONFH based on WGCNA and DEGs screening on the basis of GSE123568 dataset. The flowchart used in the present study was presented in Fig. 1. The key module with the highest level of significant correlation with SONFH was identified, and the genes with |gene significance| > 0.7 and |module mem-bership| > 0.8 were then selected as hub genes in the key module. DEGs were integrated into a protein-protein interaction network analysis using the STRING database and Cytoscape software to identify hub genes (degree≥5). The overlapped genes between hub genes in the key module and hub genes in DEGs were defined as key genes (RHAG, RNF14, HEMGN and SLC2A1), and then the key genes were validated in GSE26316 dataset. This study may provide novel insight into the underlying mechanisms of SONFH and contribute to identifying potential diagnostic biomarkers of SONFH.

Data collection and preprocessing
The GSE123568 dataset was downloaded from the GEO database [11]. The gene annotation platform of the GSE123568 dataset was GPL15207 (Affymetrix Human Gene Expression Array). The GSE123568 dataset consisted of 10 peripheral blood samples from healthy individuals and 30 peripheral blood samples from SONFH patients (Table S1). The RAW data from the GSE123568 dataset were preprocessed using the R package "affy" and normalized using the "rma" method. Then, the data were filtered using the R package "genefilter" and the "nsFilter" method. Subsequently, probes were annotated using the R package "annotate". To ensure the reliability of the annotation, probe sets mapped to > 1 gene were removed. In cases where a gene corresponds to a plurality of probe sets, the maximum value was used as the gene expression value. Eventually 40 samples and 12,162 genes were processed for subsequent analysis.

WGCNA
After ranking the genes according to the median absolute deviation from largest to smallest, we selected the top 5000 genes for WGCNA using the R package "WGCNA" [6,12]. Subsequently, we screened out the power parameter ranging from 1 to 20 using the "pickSoftThreshold" (package WGCNA) function. A suitable soft threshold of 4 was selected because it is the minimum power value that satisfied the degree of independence of 0.95. And then we obtained 9 modules through dynamic branch cutting with 0.25 as the merging threshold. Furthermore, based on the difference of the Topological Overlap Matrix and their cluster dendrogram, we visualized the resulting gene network as a heatmap.

Identification of key modules
To identify key modules that were significantly related to clinical features, the correlation between the module feature genes and clinical features was analyzed. The correlation values were shown in a heatmap. The module most significantly associated with the SONFH statue was considered as the key module of SONFH. Gene significance (GS) referred to the correlation between gene expression and each trait, while module membership (MM) referred to the correlation between gene expression and each module eigengene. To verify specific module-trait associations, we examined the correlation between GS and MM. All correlation analyses in this study were carried out by using the Pearson correlation described in the 'WGCNA' package [6]. The genes with |GS| > 0.7 and |MM| > 0.8 were then selected as hub genes in the key module.

DEGs screening
The "limma" package from R [13] was used to screen for the DEGs between healthy individuals and SONFH patients. The adjusted P value < 0.05 and |log2(fold change)| > 1.5 were used as the cut-off thresholds for the GSE123568 dataset. The heatmap of DEGs was plotted with R package "pheatmap".
Protein-protein interaction (PPI) network analysis and key genes screening PPI network can help us identify the hub genes in DEGs between healthy individuals and SONFH patients. PPI information of DEGs was acquired from the Search Tool for the Retrieval of Interacting Genes (STRING) database. Then, Cytoscape software (v3.7.1) was used for the construction of PPI network. Hub genes in DEGs were identified as genes that had a ≥ 5 degree of connectivity. The overlapping genes of hub genes in the brown module and hub genes in DEGs were defined as key genes, which were significantly correlated with SONFH and easily affected by SONFH.

Functional enrichment analysis of key module
To assess the biological function of key module genes and DEGs, the Gene Ontology (GO) functional terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis were performed using Database for Annotation, Visualization, and Integrated Discovery (DAVID) tool (version 6.8) [14][15][16]. The GO terms and KEGG pathways with pvalue < 0.05 and count > 1 were considered significant. For the genes in the key module, the top 10 GO terms and KEGG pathways functional enrichment ranked by gene count were visualized. For the DEGs between healthy individuals and SONFH patients, all significant GO terms and KEGG pathways functional enrichment were visualized.

Validation of key genes in a public database
To validate these key genes in an external dataset, GEO was searched with the keywords "femoral head necrosis". The inclusion criteria for the dataset used for validation were as follows: i) femoral head necrosis induced by steroid (glucocorticoid); ii) gene expression profiles; iii) the sample size of each group was ≥3. Based on these criteria, we identified 1 appropriate dataset, namely GSE26316, which included femoral heads from 3 normal rats and 3 SONFH rats. Subsequently, the key genes were assessed in the GSE26316 dataset for further validation. The expression profile from GSE26316 dataset was analyzed using the same method as described above for GSE123568.

Receiver operating characteristic (ROC) curve analysis
To investigate the effect of gene expression on the disease state of SONFH, we plotted ROC curve based on the gene expression data and the status of the samples from the GSE123568 dataset. The ROC curves were plotted by the "pROC" package in R software [17]. The entropy weight method was used to determine the weight of each gene, and then the ROC curve of the four key genes was plotted. The diagnostic value of these four genes for SONFH was assessed using the area under the curve (AUC) obtained from the ROC curve analysis.

Construction of WGCNA network
After preprocessing and filtering for the GSE123568 dataset, we selected the top 5000 genes ranked by median absolute deviation from large to small for subsequent analysis. The results of sample clustering showed that there is no significant difference in the samples included in the WGCNA (Fig. 2a). When the power value (β) was set to 4, the scale independence value achieved 0.95 (Fig. 2b), and the co-expression networks met the requirements of scale-free topology (Fig. 2c, d). Thus, β = 4 was selected to produce a hierarchical clustering tree with different colors representing different modules.

Identification and visualization of the key module
We then set power = 4 to produce a hierarchical clustering tree. Nine modules were identified with a merging threshold of 0.25 (Fig. 3a). Among all modules, the brown module with 1258 genes was the most relevant for SONFH status (Fig. 3b, c). All genes were selected for the heatmap (Fig. 3D). The eigengene dendrogram and heatmap were used to identify groups of correlated eigengenes. The result also indicated that the brown module was significantly associated with SONFH status (Fig. 3e). The GS for SONFH and MM in the brown module showed a strongly significant correlation, which indicated that genes in the brown module were highly correlated with SONFH (Fig. 3f). Subsequently, the genes with the |GS| > 0.7 and the |MM| > 0.8 were then selected as hub genes, and a total of 91 hub genes in the brown module were chosen for further analysis (Table S2).

Functional enrichment analysis of brown module genes
The genes in the brown module were used for GO analysis and KEGG pathway enrichment analysis to explore the underlying biological process correlated to SONFH. The top 10 significant GO terms and KEGG pathways functional enrichment ranked by gene count were visualized. For GO-biological process (GO-BP) enrichment analysis, the results showed that the genes involved in inflammatory response, apoptotic process, etc. were significantly enriched (Fig. 4a). For GOcellular component (GO-CC) enrichment analysis, the genes involved in cytoplasm, plasma membrane, etc. were significantly enriched (Fig. 4b). For GOmolecular function (GO-MF) enrichment analysis, the genes involved in protein binding, protein homodimerization activity, etc. were significantly enriched (Fig. 4c). The results of the KEGG pathway analysis showed that the genes involved in osteoclast differentiation, tuberculosis, etc. were significantly enriched (Fig. 4d).

Identification and analysis of DEGs
Subsequently, we screened DEGs between the peripheral blood of healthy individuals and SONFH patients on the basis of GSE123568 dataset. One hundred forty-five DEGs were screened out, including 17 upregulated genes and 128 downregulated genes (Fig. 5a, b, and Table S2). We further performed GO and KEGG enrichment analysis for DEGs. The results of GO enrichment analysis showed that DEGs were mainly involved in negative regulation of transcription from RNA polymerase II promoter, cytosol, cytoskeleton and protein binding, with KEGG pathways enrichment analysis showing no significant enrichment (Fig. 5c). The DEGs with the degree of connectivity ≥5 were then selected as hub genes, of which, 26 hub genes in DEGs were chosen for further analysis (Fig. 5d, Table S4).

Identification and verification of key genes
Among the 26 hub genes in DEGs, RHAG, RNF14, HEMGN and SLC2A1 were identified as the key genes for SONFH status and were selected for subsequent analysis (Fig. 6a, Table S4). We further visualized the expression of RHAG, RNF14, HEMGN and SLC2A1 in the GSE123568 dataset and found that the expression of these four genes was significantly lower in SONFH group compared with the normal group (P < 0.05, Fig. 6b-e). Subsequently, the 4 genes were verified in another dataset GSE26316, and the results showed that the expression levels of HEMGN and SLC2A1 were lower in the SONFH group (P < 0.05, Fig. 6f, g). However, no statistical difference in the gene expression of RHAG and RNF14 was obtained (Fig. 6h, i).

ROC curve analysis of key genes
Based on the RHAG, RNF14, HEMGN and SLC2A1 expression data in GSE123568 and the disease state of the samples, the ROC curve was plotted to assess the diagnostic value of these four genes for SONFH. As shown in Fig. 7, the AUC of these genes were all > 0.9, which indicated that these genes might serve as potential diagnostic markers for SONFH.

Discussion
SONFH is a complex disorder of the hip, of which the pathogenesis is multifactorial, with genetic and environmental factors playing a role [18]. Clinically, SONFH is sometimes difficult to be predicted or diagnosed because it is often asymptomatic at the early stage. Therefore, the femoral heads of some patients have already collapsed at the onset of the symptoms. Although total hip arthroplasty is considered as a definitive therapy of femoral head osteonecrosis, SONFH patients still face the psychological and economic burdens of revision surgery, as SONFH patients are often quite young when they received total hip arthroplasty [19]. Therefore, it is necessary to identify novel biomarkers for the early diagnosis of SONFH and the subsequent procedures for joint preservation. The advancement of genomics and proteomics contribute to significant advances in the diagnosing and treating of many rare diseases and pathological conditions [20]. The biochip of GSE123568 consists of 10 peripheral blood samples from healthy individuals and 30 peripheral blood samples from SONFH patients. In the current study, the WGCNA algorithm and DEGs screening method were adopted to identify SONFH diagnostic biomarkers based on GSE123568. We found that the module with 1258 genes was the most relevant to SONFH status. The key module genes were significantly enriched in inflammatory response, apoptotic process and osteoclast differentiation. In fact, inflammation has been shown to play an essential role in the occurrence of SONFH. Animal experiments already indicated that the pathogenesis of SONFH involves inflammatory macrophage polarization mediated by pro-inflammatory tumor necrosis factor-α (TNF-α) [21]. The administration of anti-inflammatory IFN-β could inhibit IL-6 secretion and suppress osteonecrosis [22]. Additionally, the death of bone cell was found to be an essential pathogenic process of SONFH [23,24]. Moreover, it is well known that abnormal osteoclast activity could lead to loss of bone structural integrity and subchondral fracture in osteonecrosis of the femoral head. A recent study has found that osteoclast numbers were elevated in rat or human femoral head of SONFH, which ultimately resulted in bone loss in the femoral heads and collapse of the femoral head [25]. To sum up, the key module genes were found to be significantly enriched in the inflammatory response, apoptotic process and osteoclast differentiation during the pathogenesis of SONFH.
Notably, 145 DEGs with 26 hub genes were screened out between peripheral blood samples of healthy individuals and SONFH patients, and the DEGs were mainly involved in negative regulation of transcription from RNA polymerase II promoter. Then, the intersection genes of hub genes in the key module and hub genes in DEGs were identified as the key genes for SONFH status, namely RHAG, RNF14, HEMGN and SLC2A1. To verify the results of bioinformatics analysis, we used the ROC curve to predict the diagnostic value of RHAG, RNF14, HEMGN, SLC2A1. The results showed that these genes might serve as diagnostic markers for SONFH because the AUC of these four genes was > 0.9.
RHAG, a member of the solute transporter family SLC42, is involved in the transportation of ammonia across erythrocyte membranes and appears to affect the function of erythrocyte in oxygen transport [26][27][28]. However, no existing study reported the role of RHAG in SONFH so far. RNF14, a human prostate coactivator, could induce androgen receptor (AR) target gene expression and regulate the AR signaling pathway [29]. Yang et al. [30] reported that the low expression of heterogeneous nuclear ribonucleoprotein A1 might activate RNF14-enhanced cell growth and contribute to prostate cancer progression. Wu et al. [31] reported that RNF14 is a regulator of TCF/β-catenin to activate the Wnt pathway in colon cancer cells. These studies indicate that RNF14 is an oncoprotein in several human cancers. However, whether RNF14 is associated with SONFH remains unclear. HEMGN, also known as RP59 in Rattus norvegicus, is expressed in bone marrow cells and osteoblasts and participates in osteoblast recruitment [32,33]. Jiang et al. [34] reported that HEMGN is a direct transcriptional target of HOXB4 and induces the expansion of myeloid progenitor cells. Thus, we speculate that HEMGN could participate in the occurrence of SONFH based on its involvement in the process of myeloid progenitor cell expansion and osteoblast recruitment. SLC2A1, a facilitative glucose transporter, is responsible for constitutive or basal glucose uptake [35,36]. Chen et al. [37] showed that deletion of SLC2A1 in osteoblast mostly blocked the bone anabolic function of Wnt7b in vivo, and impaired osteoblast differentiation and mineralization in vitro.
Another study showed that miR-140-5p may contribute to the development femoral head osteonecrosis via ubiquitin proteasome system [38], while SLC2A1 was a target of miR-140-5p [39]. In addition, the abolition of PI3K/AKT signaling induced by miR-186 may contribute to the pathogenesis of non-traumatic osteonecrosis [40], while miR-186 targets the 3′ UTR of Glut1 (SLC2A1) [41]. These studies suggest that SLC2A1 may be related to the occurrence of SOFNH. Furthermore, these four genes were verified in the GSE26316 dataset, and only the expression of HEMG  N and SLC2A1 was reduced in the SONFH group. Taken together, among these four key genes, HEMGN and SLC2A1 are more likely to participate in the occurrence of SOFNH and are more suitable as diagnostic biomarkers of SOFNH than RHAG and RNF14, which requires further experimental validation in the future.
There are several highlights in the current study. Firstly, few studies have focused on identifying diagnostic biomarkers of SONFH. An expression profile of peripheral serum samples from healthy individuals and SONFH patients could contribute to a comprehensive understanding of SONFH and identifying SONFH diagnostic biomarkers. Secondly, WGCNA has a particular advantage in processing gene expression datasets because it could analyze the connectivity between modules and clinical features. However, the current study also has a limitation. The potential diagnostic biomarkers were validated on a dataset from rats of SONFH rather than blood samples of real patients with this disease. The main cause is that the number of SONFH cases is decreasing due to the more and more standardized clinical use of steroid hormones. Therefore, we chosen to use the data set from rats of SONFH to verify the key genes. These potential diagnostic biomarkers remain to be validated using blood samples of real patients with SONFH in further studies.

Conclusions
This study was the first study that used the WGCNA and DEGs screening method to identify key genes and