Identification of transcription factors and construction of a novel miRNA regulatory network in primary osteoarthritis by integrated analysis

Backgrounds As osteoarthritis (OA) disease-modifying therapies are not available, novel therapeutic targets need to be discovered and prioritized. Here, we aim to identify miRNA signatures in patients to fully elucidate regulatory mechanism of OA pathogenesis and advance in basic understanding of the genetic etiology of OA. Methods Six participants (3 OA and 3 controls) were recruited and serum samples were assayed through RNA sequencing (RNA-seq). And, RNA-seq dataset was analysed to identify genes, pathways and regulatory networks dysregulated in OA. The overlapped differentially expressed microRNAs (DEMs) were further screened in combination with the microarray dataset GSE143514. The expression levels of candidate miRNAs were further validated by quantitative real-time PCR (qRT-PCR) based on the GEO dataset (GSE114007). Results Serum samples were sequenced interrogating 382 miRNAs. After screening of independent samples and GEO database, the two comparison datasets shared 19 overlapped candidate micRNAs. Of these, 9 up-regulated DEMs and 10 down-regulated DEMs were detected, respectively. There were 236 target genes for up-regulated DEMs and 400 target genes for those down-regulated DEMs. For up-regulated DEMs, the top 10 hub genes were KRAS, NRAS, CDC42, GDNF, SOS1, PIK3R3, GSK3B, IRS2, GNG12, and PRKCA; for down-regulated DEMs, the top 10 hub genes were NR3C1, PPARGC1A, SUMO1, MEF2C, FOXO3, PPP1CB, MAP2K1, RARA, RHOC, CDC23, and CREB3L2. Mir-584-5p-KRAS, mir-183-5p-NRAS, mir-4435-PIK3R3, and mir-4435-SOS1 were identified as four potential regulatory pathways by integrated analysis. Conclusions We have integrated differential expression data to reveal putative genes and detected four potential miRNA-target gene pathways through bioinformatics analysis that represent new mediators of abnormal gene expression and promising therapeutic targets in OA. Supplementary Information The online version contains supplementary material available at 10.1186/s12891-021-04894-2.

associated with disability after diabetes and dementia [1,3]. It is estimated that nearly 400 million people will suffer from OA by 2030 in China [4]. The etiology of OA is affected by factors including age, female, obesity, occupational exposure to high levels of joint loading activity, smoking status and family history, and the molecular pathophysiology is still poorly understood [4]. Epidemiological studies have reported the heritability of >40% at individual skeletal sites, while genome-wide association studies (GWAS) have revealed that OA is a heterogeneous chronic disease with multiple genetics [5]. As such, the disease is genetically complex and multi-factorial.
MicroRNA (miRNA) are a group of universally present and multifunctional small non-coding RNAs with 22-28 bases encoded by endogenous genes [6,7], and participate in many biological processes such as cell proliferation, differentiation, apoptosis [8]. Recent breakthroughs have revealed numerous examples of miRNAs involvement in normal development and disease [9]. As one of the epigenetic mechanisms, miRNAs play a significant role in OA pathogenesis, such as cartilage homeostasis, extracellular matrix regulation, bone metabolism, apoptosis and inflammation [10]. And, much attention has recently focused on miRNA because increasing evidence indicates that miRNA affect gene transcription through a number of regulatory processes [10].
Mature miRNAs are structurally stable and essential regulators of many physiological processes. Several studies have identified numerous differentially expressed micro-RNAs (DEMs) between OA joint tissues and controls [11,12]. Presumable studies demonstrated the role of miRNAs as potential biomarkers for the assessment of diagnosis and prognosis in OA [13,14]. Nakamura et al. showed that miR-181a-5p induces articular cartilage degeneration by promoting inflammation, cartilage catabolism and apoptosis/cell death [15]. Ntoumou et al. identified that a three-miRNA signature, hsa-miR-140-3p, hsa-miR-681-3p and hsa-miR-33b-3p in OA patients, which were involved in the regulation of OA metabolism [16]. Presently, medical care is mainly based on alleviating pain symptoms and there is still a lack of effective means for the treatment of disease. It is necessary to fully understand the molecular and cellular pathways of OA and novel breakthrough in the sector of early diagnosis and treatment of OA.
In the present study, we performed miRNA profiling using high-density miRNA-arrays in peripheral blood leukocytes (PBL) and compared the expression profiles of miRNAs between OA patients and controls. Then, overlapped DEMs were screened by downloading the miR-NAs expression data from the Gene Expression Omnibus (GEO) database (GSE143514). We created a molecular profile of the OA transcriptome and performed functionl enrichment analyses to elucidate perturbed molecular functions and pathways in OA, including transcription factors (TFs)-DEMs, DEMs-target genes, TFs of target genes, and DEMs-hub genes network. Hub genes were further performed to validate the expression levels based on the GSE114007 dataset. Finally, we developed a highresolution molecular profile of OA and elucidated a novel miRNA regulatory network of OA pathogenesis.

Sample collection
A two-stage study was conducted in this study. In the first RNA-sequencing screening stage, blood samples were collected form 3 OA patients and 3 healthy controls (age over 40 years old) in March 2018.
The inclusion criteria for OA were presented as follows: (i) Complete basic personal information; (ii) Conforming to the diagnosis criteria of disease [17]. The normal controls were recruited from the survey that had no signs or symptoms of arthritis or joint diseases. Secondary OA patients such as inflammatory arthritis, rheumatoid, bone fracture and developmental dysplasia were excluded. MicroRNA expression dataset was downloaded from the GEO database as another screening stage under the accession number GSE143514.
Subsequently, in the second stage validation, the expression levels of hub genes were further verified using quantitative reverse transcription polymerase chain reaction (qRT-PCR) based on the GSE114007 dataset.

Library preparation and sequencing
White blood cells are extracted by centrifugation at 1500 g for 20 min with 5 ml whole blood. Following the manufacture's protocol, we used TRIzol (Invitrogen, Carlsbad, CA, USA) to isolate total RNA. RNA-seq was operated by Gminix Biotechnology Co., Ltd. (Shanghai, China) using 150 ng of total RNA as input, and the results was evaluated by an Illumina HiSeq 2500 sequencing platform with 10 M reads (Illumina, San Diego, CA, USA). Hisat2 [18] were used to compare DEMs, and feature Counts [19] were adopted to annotate and quantify miRNAs. MiRNAs expression was normalized by principal component analysis (PCA) using R software. Then, using the DESeq2 package in R (http:// bioco nduct or. org/ packa ges/ relea se/ bioc/ html/ DESeq2. html), counts were analyzed and DEMs were screened. Herein, miRNAs with an absolute value fold changes (FC) ≥ 1.2 and P < 0.05 were considered as the significance.

Data Collection for GSE143514 and Identification of candidate DEMs
The GEO database (https:// www. ncbi. nlm. nih. gov/ gds) is an international public repository that provides highthroughput gene expression data [20]. The GEO dataset GSE143514, including five OA patients and three controls were screened [21]. All samples were detected by the platform HiSeq x-ten platform (Illumina). For data processing, the raw data were downloaded (https:// www. ncbi. nlm. nih. gov/ geo/ query/ acc. cgi? acc= GSE14 3514). The limma software package in R (https:// bioco nduct or. org/ packa ges/ relea se/ bioc/ html/ limma. html) was employed to screen DEMs for OA patients and healthy controls. The absolute value fold changes (FC) ≥1.2 and P < 0.05 were set as the threshold for identifying DEMs.
Respectively, the volcano maps of DEMs in the internal dataset and GSE143514 database were drawn using GraphPad Prism. Venn diagram was utilized to analyze the overlapped DEMs between the internal dataset and GEO dataset (GSE143514), and these miRNAs were considered as candidate DEMs.

Prediction of Upstream Transcription Factors and Downstream Target Genes of DEMs
FunRich, an independent software tool for the analysis of gene and protein functional enrichment and interaction networks [22], was employed to predict the potential upstream TFs of candidate DEMs. Finally, we presented the top 10 transcription factors. The P value<0.05 was considered statistically significant.

Functional Analysis of Target Genes
Bioinformatics analysis was used to investigate the involved pathways and target genes for the above miR-NAs. Gene Ontology (GO) is a comprehensive gene annotation tool that provides information on gene function from molecular level to organism point, including biological process (BP), molecular function (MF), and cellular component (CC) [23]. Kyoto Encyclopedia of Genes and Genomes (KEGG) is a comprehensive database that broadly interprets biological pathways and identifies the functions of candidate DEMs in biological pathways [24]. In our study, the database for annotation, visualization, and integrated discovery (DAVID) v6.8 (https:// david. ncifc rf. gov) online tool was performed for candidate DEMs enrichment analysis. P < 0.05 for GO analysis and P < 0.1 and count>2 for the KEGG pathway were considered for further analysis.

Construction of PPI Network and Screening of Hub Genes
Investigation on the key miRNA and target genes protein-protein interaction (PPI) network is beneficial for the exploration of molecular mechanism of the disease. The Search Tool for the Retrieval of Interacting Genes/ Proteins (STRING) online database (http:// strin gdb. org/) was employed to analyze the interaction between miRNA and target genes. PPI node pairs with a combined score ≥ 0.4 were selected for further analysis. We further matched up-regulated target genes with downregulated target genes and down-regulated miRNA with up-regulated genes. Cytoscape 3.7.1 was used to visualize the network and annotate genes within enriched pathways in the network. The hub genes were screened by CytoHubba, a plugin of Cyoscape, according to the degree. The top 30 target genes were selected using the Maximal Clique Centrality (MCC).

Expression Analysis of Hub Genes based on GSE114007
For further verification analysis, the GSE114007 database was introduced to analyze the expression level of hub genes. The dataset was performed on the platform GPL11154 Illumina HiSeq 2000 and GPL18573 Illumina NextSeq 500, including 20 OA knee cartilages and 18 normal cartilage tissues. DEMs between OA and control group were analyzed by Student's t-test. The hub genes must conform to the following criteria: (i) the relationship of gene regulation is definitely consistent with previous studies; (ii) the P value should be less than 0.05.

Statistical analysis
Body mass index (BMI) was calculated by dividing the weight (kg) by the squared height (m 2 ), and the participants were divided into four categories according to the Working Group on Obesity in China recommended criteria (Underweight: BMI < 18.5 kg/m2; Normal: 18.5 ≤ BMI < 24; Overweight: 24 ≤ BMI < 28; and General obesity: BMI ≥ 28), [25]. A value of P < 0.05 was considered statistically significant. All analysis was performed with SPSS 22.0 (IBM Corp) and GraphPad Prism version 9.0 (GraphPad Software, lnc).

Results
A schematic diagram of the study design is displayed in Fig. 1, and the characteristics of the internal database are shown in Table 1.

Identification of DEMs
Six serum samples were sequenced based on the selection criteria of adjusted P < 0.05 and |logFC| > 1.2. Briefly, a total of 382 DEMs were identified from the internal database, including 194 up-regulated miRNAs and 188 down-regulated miRNAs. Similarly, 128 DEMs (47 up-regulated and 81 down-regulated) were identified in the GSE143514 dataset. The volcano maps of the two databases were presented in Fig. 2a and b, respectively.

Prediction of Transcription Factors of DEMs
In the study, we used FunRich software to predict the upstream transcription factors of up-regulated and down-regulated DEMs and presented the top 10 transcription factors in Fig. 3a

Prediction of Target Genes of DEMs
To identify the miRNA-mRNA regulatory axes involved in OA, we used miRTarBase 7.2, miRDB, miRwalk and MicroT-CDS databases to predict the target genes of DEMs. As shown in Table 2, we explored the target genes of miRNAs based on different databases. The overlapped target genes from the above four databases were screened by Venn diagram analysis and presented in Supplementary Table 1. There were 236 target genes for up-regulated DEMs and 400 ones for down-regulated DEMs. The DEMs-target gene network were plotted to more intuitively explore the relationship between DEMs and target genes ( Fig. 4a and b).

Enrichment Analysis of Target Genes
We further explored biological functions of those candidate target genes through the DAVID database. The top 10 GO enrichment results of up-regulated and down-regulated target genes were shown in Fig. 5a and b, respectively. In the BP group, the up-regulated genes were significantly enriched in gene expression, regulation of nitrogen compound metabolic process, RNA metabolic process, regulation of gene expression, and regulation of nucleobase-containing compound metabolic process, while the down-regulated genes were enriched in regulation of nitrogen compound metabolic process, positive regulation of metabolic process, regulation of cell  In the CC group, the up-regulated genes were significantly enriched in nucleoplasm, neuron part, ribonucleoprotein complex, intracellular ribonucleoprotein complex, and nucleoplasm part, while the down-regulated genes were mainly involved the neuron part, cell junction, plasma membrane region, neuron projection, and synapse, etc. In the MF group, the up-regulated genes were significantly enriched in organic cyclic compound binding, heterocyclic compound binding, nucleic acid binding, RNA binding, and nucleic acid binding transcription factor activity, while the down-regulated genes were mainly involved in organic cyclic compound binding, heterocyclic compound binding, small molecule  binding, enzyme binding, and purine ribonucleotide binding, etc. Subsequently, we conducted a KEGG pathway enrichment analysis on these target genes of DEMs. As shown in Fig. 6a and b, we identified 18 critical pathways for up-regulated target genes and the top 20 pathways for down-regulated target genes, respectively. KEGG analysis suggested that the up-regulated genes were significantly enriched in insulin resistance, PI3K-Akt signaling pathway, and insulin signaling pathway, while the downregulated genes mainly participated in the Wnt signaling, cancer, melanogenesis, and proteoglycans in cancer, etc.

Construction of DEM-Hub Genes Network
The top 30 hub genes for up-regulated and down-regulated DEMs are shown in Fig. 7a (Table 3). We constructed the candidate miRNA-hub genes regulatory network to precisely explore the molecular mechanism of these DEMs, which was presented in the form of a diagram (Fig. 8).

Discussion
OA is the most common chronic condition associated with aging and progressive joint dysfunction and the disease with the greatest socioeconomic cost. There is a clear and urgent need to explore the aetiology and progression and support the development of disease-modifying  therapies for OA. Our study provided strong evidence for the replication of previous screening integrating RNA-seq with integrated bio-informatics analysis, which may considerably increase the possibility of identifying potential biomarkers and greatly enhance the stability of our findings. To the best of our knowledge, this is the first study to comprehensively explore and identify hub miRNAs of OA from PBL of participants in Southern Chinese population. The study shed a light on plausible etiology of OA, although their biological mechanism still remains to be elucidated in the future.
In the study, we screened out 9 up-regulated DEMs and 10 down-regulated DEMs by exploring overlapped DEMs in the PBL of internal samples and synovial tissue of GSE143514 database. Among those up-regulated DEMs, mir-34a-5p could play an important role in the cell progression of osteosarcoma and could induce chondrocyte apoptosis during OA progression [26]. A previous study reported that mir-138-5p could inhibit the apoptosis and inflammation of chondrocytes and regulate cartilage phenotype and osteogenesis [27]. For down-regulated DEMs, mir-1246 could be involved in the regulation of chondrocyte cell apoptosis and IL-6 secretion from chondrocyte cells [28]. Notably, those critical miRNAs play important roles in regulating biological functions in the process of OA.
Previous studies have shown that TFs can regulate miRNA expression [29]. We found that ERG1 definitely co-exist as the top transcription factor among those up-regulated DEMs and down-regulated DEMs. EGR1 belongs to the EGR family of C2H2-type zinc-finger protein [30]. EGR1 could accelerate cartilage hyper proliferation and degeneration by regulating KLF5 expression and β-catenin signaling pathway [31]. Specificity Fig. 8 Identification of the hub genes for DEMs in the PPI network. Diamond represents genes, ellipse represents miRNAs. Red represents up-regulated DEMs and hub genes, Blue represents down-regulated DEMs and hub genes protein 1(SP1), as a C2H2-type zinc-finger transcription factor, inhibits the proliferation of chondrocytes and promotes the apoptosis of chondrocytes by binding with miR-145 [32]. Eukaryotic Translation Termination Factor 1(E2F1) could regulate the potentials in modulating articular chondrocytes and participate in the regulatory mechanism of knee OA [33]. This suggest that these candidate DEMs play important roles in the pathogenesis of OA. The expression of 4 genes (KRAS, NRAS, PIK3R3, SOS1) was consistent with that in GSE114007 dataset from the results of 12 hub genes, representing high priority candidates for therapeutic interventions. KRAS, a heterodimer of RAS, leads to chondrocyte growth by stimulating the MAPK/ERK1/2/Cyclin D1 signaling pathway [34]. RAS has been shown to stabilize KRAS levels in the cytoplasmic membrane, aggregating KRAS to amplify the signal output of the mitogen-activated protein kinase (MAPK) pathway [35]. As a member of the pleiotrophin, Midkine (MK) can interact with RAS and further lead to the formation of MK-RAS-KRAS complexes [36]. Then, this action stimulates the downstream MAPK/ErK1/2 signalling pathway, thereby increasing the expression of cyclin D1 and accelerating cell cycle progression [37]. Previous studies reported positive correlations between chondrocyte death by apoptosis and the severity of OA [38,39]. Moreover, NRAS is an N-RAS oncogene that encodes a membrane protein travelling between the Golgi body and the plasma membrane. Actually, only a small number of studies investigated the role of NRAS in the development of OA. Huang et al. have reported that NRAS is an up-regulated gene in synovial tissue and associated with the MAPK pathway based on pathway analysis [40]. The MAPK pathway is the most important signal transduction system that mediates osteoarthritis cartilage injury, causing a series of reactions, such as the increased expression of matrix MMPs, chondrocyte apoptosis and cartilage destruction [40]. As a key subunit of the IAPI3K protein complex [41], PIK3R3 regulates Fig. 9 The mRNA expression of the top 12 hub genes was determined from the GSE114007 dataset (a). KRAS (b). NRAS (c). CDC42 (d). SOS1 (e). PIK3R3 f ). NR3C1 (g). PPARGC1A (h). SUMO1 (i). MEF2C (j). FOXO3 (k). PPP1CB (l). MAP2K1. P < 0.05 was considered statistically significant anabolism effector T cell differentiation, T cell senescence, and activation-induced cell death [42]. PIK3R3 is down-regulated in human cartilage tissue with OA and also mediates the effects of miR-1236 on chondrocyte proliferation and apoptosis [43]. Additionally, SOS1 is a guanine nucleotide exchange factor (GEF) for RAS involved in the RAS signaling pathway [44]. Major components of RAS, including ACE, AT1R, and AT2R, are expressed in synovial tissue in humans and participate in the pathogenesis of OA [45]. Kawakami et al. confirmed that chondrocytes partially expressed AT1R and AT2R and presented increased expression with the stimulation of inflammatory cytokines IL-1 [46]. This study has several merits. First, we have systematically characterized the profile of miRNAs and mRNAs expression between OA patients and healthy controls (blood samples) integrating with public data GSE143514(synovial tissues) to screen overlapped DEMs, which contain different tissues from participants. Second, we performed miRNA-associated integrated network analysis to identify potential hub genes by bioinformatics analysis. More importantly, the acquisition of blood samples is actually feasible as a means of early screening and diagnosis compared with cartilage samples and/or synovial tissues in clinical practice.
Also, there are still some limitations need to be considered. First of all, the sample size of the integrated analysis, neither the internal data nor the GEO dataset, is sufficient to explore and verify those candidate miRNAs and key genes. Considering the veracity and credibility, further studies with larger sample size are still needed to validate the findings. And, although we employed the internal samples and the public databases to comprehensively screen the critical miRNAs and key genes, the involvement of these factors and their value as therapeutic targets need to be validated in studies on their function in vitro and in vivo.
In summary, using integrated analysis, we have detected a series of potential miRNA-target gene pathways (mir-584-5p-KRAS, mir-183-5p-NRAS, mir-4435-PIK3R3, and mir-4435-SOS1) that represent new mediators of abnormal gene expression and high priority targets for therapeutic interventions. Our findings could advance in basic understanding of risk factors and molecular mechanisms of OA disease. Screening from large populations also remains to be further elucidated and validated.