
Analysis of mass spectrometry data from the secretome of an explant model of articular cartilage exposed to pro-inflammatory and anti-inflammatory stimuli using machine learning



Osteoarthritis (OA) is an inflammatory disease of synovial joints involving the loss and degeneration of articular cartilage. The gold standard for evaluating cartilage loss in OA is the measurement of joint space width on standard radiographs. However, in most cases the diagnosis is made well after the onset of the disease, when the symptoms are well established. Identification of early biomarkers of OA can facilitate earlier diagnosis, improve disease monitoring and predict responses to therapeutic interventions.


This study describes the bioinformatic analysis of data generated by high-throughput proteomics for the identification of potential biomarkers of OA. The mass spectrometry data were generated using a canine explant model of articular cartilage treated with the pro-inflammatory cytokine interleukin 1β (IL-1β). The bioinformatic analysis involved the application of machine learning and network analysis to the proteomic mass spectrometry data. A rule-based machine learning technique, BioHEL, was used to create a model that classified the samples into their treatment groups by identifying the proteins that separated the samples into their respective groups. The proteins identified were considered to be potential biomarkers. Protein networks were also generated, and from these networks the proteins pivotal to the classification were identified.


BioHEL correctly classified eighteen of twenty-three samples, giving a classification accuracy of 78.3% for the dataset, which comprised four classes: control, IL-1β, carprofen, and IL-1β and carprofen together. This accuracy exceeded that of the other machine learning methods applied to the same dataset for comparison, with the exception of JRip, another rule-based method, which performed equally well. The proteins most frequently used in the rules generated by BioHEL included a number of relevant proteins, such as matrix metalloproteinase 3, interleukin 8 and matrix gla protein.


Using this protocol, combining an in vitro model of OA with bioinformatics analysis, a number of relevant extracellular matrix proteins were identified, thereby supporting the application of these bioinformatics tools for analysis of proteomic data from in vitro models of cartilage degradation.



Articular cartilage is a mechanically resilient connective tissue with unique load-bearing and shock-absorbing properties, which are largely dependent on the structural and functional integrity of its highly charged and hydrated extracellular matrix (ECM) [1]. Cartilage contains three principal components: chondrocytes, aggregating proteoglycans and collagens, all of which are embedded within the ECM and contribute to the homeostasis of the tissue [2]. Cartilage is avascular, relies on the synovial fluid for the delivery of oxygen and nutrients [3], and is recalcitrant to repair [4]. Osteoarthritis (OA) is a degenerative disease of synovial joints, involving the loss of articular cartilage, synovial inflammation and changes to the subchondral bone, resulting in impaired articulation, reduced mobility, joint stiffness and pain [5, 6]. OA is estimated to affect up to 85% of the human population over 60 years old [7] and is also common in companion animals [8]. A number of risk factors for OA have been identified, including age, obesity, previous joint trauma or instability, metabolic or endocrine disease and oestrogen status [9, 10]. Currently, diagnosis is made through clinical examination and the imaging “gold standard”, radiography. However, radiographic diagnosis of OA is usually made only after the clinical signs of pain and loss of mobility have appeared. Consequently, the disease can remain undiagnosed until its later stages, when interventions may not alter the course of progression.

Biomarkers have the capacity to identify early changes in joint tissues, to diagnose OA during the pre-radiographic stages of the disease and to determine the course of its progression, as well as to aid drug discovery and clinical trials [11–15]. The term biomarker can be used to describe molecules or molecular fragments that indicate the presence of a biological or disease process. Early detection may also help prioritize treatments to slow progression, such as weight loss and a reduction in high-impact loading of the affected joints [16]. To be useful, individual biomarkers or combinations of biomarkers must clearly differentiate between healthy and diseased states. Ideally, biomarkers should be disease-specific and not be influenced by other disorders. Biomarkers should also be easily measurable in a clinical setting [17]. In rheumatology, biomarkers can be “tissue fingerprints” or combinations of “neo-epitopes”, reflecting catabolic effects downstream of inflammatory signals.

Recent advances in post-genomic technologies, including genomics, transcriptomics, proteomics and metabolomics, have allowed the development of novel methods for the identification of biomarkers of disease. Proteomics is a particularly promising technology as it allows the identification of individual proteins and their peptides, neo-epitopes and degradation “fingerprints”. This information can then be used to develop sensitive, rapid antibody-based assays. In addition, proteomic analyses provide an overview of changes in the proteome of biological systems across a range of conditions [18].

Through the combined use of proteomics, transcriptomics and other biochemical and immunological techniques, a number of proteins and protein families have previously been associated with OA. These include ECM proteins such as aggrecan, the major structural proteoglycan found in the cartilage ECM, cartilage oligomeric matrix protein (COMP), a non-collagenous protein involved in the organization and assembly of articular cartilage, and matrix metalloproteinases (MMPs), a family of proteins expressed by chondrocytes, which are involved in the degradation of ECM macromolecules and lead to the fibrillation of articular cartilage [11, 19–24]. In the ECM, matrix metalloproteinase-3 (MMP-3) in particular appears to be vital for matrix turnover and homeostasis. This protein is up-regulated in early OA, but has been found to be down-regulated in later stages of the disease [25].

Many omics technologies, such as microarrays, next-generation sequencing and mass spectrometry (MS), generate large amounts of data. Bioinformatic tools therefore play an important role in the analysis of such data, and a wide range of methods have been developed for this purpose [26, 27]. Supervised machine learning techniques use a training set of labelled samples to build models that can automatically label previously unclassified samples [28, 29]. Samples can be assigned a label (e.g. a treatment group) based on whether or not they contain a certain attribute (e.g. a protein, or a group of proteins) and at what level the attribute is found within the samples [30, 31]. There are many types of machine learning techniques, such as decision trees, rule-based learners and support vector machines [28, 32]. Rule-based machine learning methods automatically produce human-readable production rules that assign samples to their respective treatment groups. In proteomics-based approaches, the rules created contain the proteins that best divide the samples into disease or treatment groups. Proteins that differ most consistently between groups are suitable for further investigation as potential biomarkers.

The aim of this study was to identify suitable bioinformatic methods for the analysis of proteomics data generated to investigate cytokine-induced catabolic changes associated with the early stages of OA [33]. This involved using an explant model of cartilage to investigate the secretome of canine articular cartilage. The cartilage explant model was selected because it allows a rapid and ‘clean’ analysis of secreted proteins in the context of joint disease. Many of the proteins present in the secretome of explant cultures are involved in the control of physiological and pathophysiological processes in the joint [34] and may enter the blood stream where they may be accessible as systemic biomarkers.


Animal tissues and statement of ethical approval

Forelimbs and hind limbs were taken from male German Shepherd army dogs, over 5 years of age, that were euthanized for clinical reasons unrelated to research. Therefore, this project does not fall under the Animals (Scientific Procedures) Act 1986 or the Veterinary Surgeons Act 1966. Approval for the use of clinical materials was obtained from the Ethics Committee of the School of Veterinary Science and Medicine with input from members of the University of Nottingham's Animal Welfare and Ethical Review Body (AWERB). The British Army owned the animals that were used in this study. Informed consent was obtained for the use of joint tissues.

Cartilage explant culture

Limbs were washed in disinfectant and soaked in sodium hypochlorite prior to spraying with ethanol. The stifle and elbow joints were dissected under sterile conditions and full thickness articular cartilage was placed in serum free collection media. The media consisted of Hyclone® liquid medium: DMEM supplemented with penicillin and streptomycin.

After washing the harvested cartilage, a 3 mm biopsy punch was used to cut discs, which were placed in a randomized manner into wells of a 24 well plate, containing serum free DMEM (as above). The media was removed and the explants were incubated in media alone (control), or supplemented with recombinant canine IL-1β (10 ng/ml), the non-steroidal anti-inflammatory drug carprofen (Rimadyl®, 100 μg/ml), or carprofen and IL-1β combined (100 μg/ml and 10 ng/ml, respectively). For each treatment, three samples were used per dog, giving six samples per treatment. After 5 days in culture, supernatants and explants were removed and processed for mass spectrometric analysis.

Sample preparation and mass spectrometry

Samples from 2 dogs were chosen for MS/MS analysis based on the general profile of proteins as visualized on SDS-PAGE (data not included in the manuscript, see Additional file 1: Table S1; Additional file 2: Table S2; Additional file 3: Table S3; Additional file 4: Table S4; Additional file 5: Figure S1; Additional file 6: Figure S2). Each set of dog samples consisted of three treatments (IL-1β, carprofen, IL-1β + carprofen), with three replicates for each treatment for both dogs. A set of control samples was also analyzed, providing a total of 24 samples (12 samples per dog).

The secretome samples were digested with trypsin before mass spectrometry. Soluble proteins in each sample were reduced by adding DTT to a final concentration of 10 mM, and thiol groups were blocked by adding iodoacetamide to a final concentration of 55 mM. The proteins were subsequently precipitated with ice-cold acetone before being resuspended in trypsin solution (10 ng/μl in 50 mM ammonium bicarbonate) (Trypsin Gold, Mass Spectrometry Grade, Promega). Trypsin digestion was terminated by the addition of formic acid to a final concentration of 0.1%. Before MS analysis, an aliquot of the digest was desalted and any insoluble particulates were removed using a C18 Zip-Tip (Millipore).

Peptides were separated on a 15 cm C18 PepMap™ column (LC Packings) using a Bruker Easy-nLC platform with a flow rate of 300 nl/min. The sample was added to solvent A (95% v/v H2O, 5% v/v ACN, 0.1% v/v formic acid) and was injected into the HPLC column via the autosampler. Following binding and washing of the sample on the column in solvent A, peptides were separated and eluted in a gradient of solvent B (95% v/v ACN, 5% v/v H2O, 0.1% v/v formic acid).

Eluted peptides were delivered on-line and detected in a Bruker AmaZon ETD ion trap instrument. The five most abundant peptides in each MS scan were selected for fragmentation. The raw data were processed to produce peptide and fragment mass lists, which were submitted to the MS/MS ions tool of the Mascot search engine; Mascot uses protein sequence databases to predict the identity of the proteins present in a sample from the peptides identified. The fragment mass values for each peptide were compared with the mammalian entries in the UniProtKB database. The modifications incorporated into the search were fixed carbamidomethylation of cysteine and variable oxidation of methionine.

One sample, treated with both IL-1β + carprofen, was removed from the dataset at this stage as it was considered to be anomalous due to the very small number of proteins that were identified from it by Mascot. This resulted in 23 samples for further analysis: six samples per treatment, except for IL-1β + carprofen, for which there were five samples.

Further MS data analysis pipeline

The pipeline for the analysis of mass spectrometry data is described in Figure 1. The results generated by Mascot include the exponentially modified protein abundance index (emPAI) score for each protein identified. The emPAI score gives an estimate of the absolute amount of a protein present in a sample [35]. It is based on the protein abundance index (PAI), which is defined as ‘the number of peptides identified divided by the number of theoretically observable tryptic peptides’ [36]. PAI was then adapted to emPAI so that the score is proportional to the protein's content in the sample [35].
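As an illustration of the arithmetic only, the sketch below assumes the standard definition introduced with emPAI [35], emPAI = 10^PAI − 1, together with the usual estimate of a protein's molar share as its emPAI divided by the sum of emPAI values in the sample. Mascot reports emPAI itself, so this is not a step of the pipeline; the function names and the peptide counts in the usage lines are hypothetical.

```python
"""Sketch of the PAI/emPAI arithmetic (illustrative only; Mascot reports emPAI itself)."""


def pai(n_observed: int, n_observable: int) -> float:
    """Protein abundance index: identified peptides / theoretically observable tryptic peptides."""
    if n_observable <= 0:
        raise ValueError("n_observable must be a positive number of peptides")
    return n_observed / n_observable


def empai(n_observed: int, n_observable: int) -> float:
    """Exponentially modified PAI: 10**PAI - 1 (0 when no peptides are observed)."""
    return 10 ** pai(n_observed, n_observable) - 1


def molar_percent(scores: dict[str, float]) -> dict[str, float]:
    """Approximate mol% of each protein: its emPAI as a share of the sample's total emPAI."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one protein must have a non-zero emPAI")
    return {protein: 100 * score / total for protein, score in scores.items()}


# Hypothetical peptide counts for two proteins in one sample.
scores = {"protein_A": empai(10, 10), "protein_B": empai(0, 12)}
print(scores)                 # {'protein_A': 9.0, 'protein_B': 0.0}
print(molar_percent(scores))  # {'protein_A': 100.0, 'protein_B': 0.0}
```

The exponential form is what makes emPAI, unlike PAI, scale roughly linearly with protein content.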

Figure 1 Pipeline for label-free quantification of mass spectrometry data. TPP – stages included in the Trans-Proteomic Pipeline.

Mascot outputs were also submitted to ProteinProphet [37, 38], part of the Trans-Proteomic Pipeline [39], which is used for the statistical validation of protein identifications. ProteinProphet assigns a probability score to each protein identification made by Mascot. Strictly speaking, ProteinProphet is not a quantification method, but its probability scores were treated here as an approximately quantitative measure, which makes them usable as attributes for further analysis using machine learning techniques.

Machine learning was applied to the emPAI and ProteinProphet datasets, as well as to a dataset combining both scores, using a number of methods from the WEKA machine learning package [32] and BioHEL, a rule-based learner [31].

Comparison of machine learning techniques

To determine the most suitable machine learning method for the analysis of the canine articular cartilage mass spectrometry dataset, seven different machine learning techniques, including BioHEL, were applied and their performance compared. The other methods used were Naive Bayes, Support Vector Machines, C4.5, IBk, JRip and Random Forest, all implemented in WEKA. The source code and user manual for BioHEL are available at

As described above, one carprofen + IL-1β sample was removed from the dataset because of anomalous Mascot results: only a very small number of proteins were identified in it compared with the other samples. This left a dataset of 23 samples spanning four treatment classes. Because of this small number of samples, leave-one-out cross-validation was used to divide them into training and test sets [40], as this method extracts the most information from the available data. Twenty-three training sets and the same number of test sets were created. Each test set contained only one sample, with the remainder of the dataset in the corresponding training set. The classification models were then evaluated by how accurately they predicted the held-out samples.
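The leave-one-out procedure can be sketched as follows. This is a minimal illustration, not the WEKA or BioHEL implementation: `fit_predict` is a hypothetical callable standing in for any learner, and the single-nearest-neighbour classifier (in the spirit of IBk with one neighbour) and the toy data are assumptions made for the example.

```python
"""Leave-one-out cross-validation sketch (illustrative; not the WEKA/BioHEL code)."""
from typing import Callable, Iterator, Sequence


def leave_one_out(n: int) -> Iterator[tuple[list[int], int]]:
    """Yield (training indices, held-out index): n folds, each testing one sample."""
    for test_idx in range(n):
        yield [i for i in range(n) if i != test_idx], test_idx


def loocv_accuracy(samples: Sequence[Sequence[float]], labels: Sequence[str],
                   fit_predict: Callable[[list, list, Sequence[float]], str]) -> float:
    """Fraction of samples whose label is predicted correctly when held out."""
    correct = 0
    for train_idx, test_idx in leave_one_out(len(samples)):
        predicted = fit_predict([samples[i] for i in train_idx],
                                [labels[i] for i in train_idx],
                                samples[test_idx])
        correct += predicted == labels[test_idx]
    return correct / len(samples)


def nearest_neighbour(train_x: list, train_y: list, query: Sequence[float]) -> str:
    """Hypothetical example learner: label of the closest training sample (squared Euclidean)."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


# Toy data: two well-separated groups, so every held-out sample is recovered.
toy_x = [[0.0], [0.1], [1.0], [1.1]]
toy_y = ["control", "control", "IL-1β", "IL-1β"]
print(loocv_accuracy(toy_x, toy_y, nearest_neighbour))  # 1.0
```

With the 23 samples used here this produces 23 folds; 18 correct held-out predictions correspond to 18/23 ≈ 78.3%, the BioHEL accuracy reported in the abstract.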

Significance testing

The significance of the BioHEL classification accuracies was tested by calculating p-values using one-tailed permutation testing [41]. For each permutation, a new version of the dataset was created in which the samples were randomly reassigned to treatments, while maintaining the same number of samples per treatment as in the original data. BioHEL was then run on this dataset using leave-one-out cross-validation; 50 such permutations were generated for each of the emPAI, ProteinProphet and combined datasets. The accuracies achieved on the permuted datasets were compared with those achieved on the real, non-randomized datasets, and a p-value was computed to estimate the likelihood that the accuracy on the original data belongs to the randomized distribution.
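A one-tailed permutation test of this kind can be sketched as below. The text does not state exactly how the p-value was derived from the 50 permuted accuracies, so this sketch assumes one common choice, the empirical estimate (b + 1)/(m + 1), where b is the number of permutations scoring at least the observed accuracy and m is the number of permutations; with m = 50 its smallest possible value is 1/51 (fitting a distribution to the permuted accuracies is an alternative that can give smaller values). `accuracy_fn` is a hypothetical callable that would rerun the leave-one-out evaluation on the relabelled data.

```python
"""One-tailed permutation test sketch (the p-value estimator is an assumption)."""
import random
from typing import Callable, Sequence


def permutation_p_value(labels: Sequence[str],
                        accuracy_fn: Callable[[list[str]], float],
                        observed_accuracy: float,
                        n_permutations: int = 50,
                        seed: int = 0) -> float:
    """Estimate P(accuracy >= observed) under random relabelling of the samples."""
    rng = random.Random(seed)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # shuffling keeps the number of samples per treatment
        if accuracy_fn(shuffled) >= observed_accuracy:
            at_least_as_good += 1
    # (b + 1) / (m + 1): counts the observed labelling as one of the permutations
    return (at_least_as_good + 1) / (n_permutations + 1)


# Class sizes as in the study: six samples per treatment, five for carprofen + IL-1β.
labels = ["control"] * 6 + ["IL-1β"] * 6 + ["carprofen"] * 6 + ["carprofen+IL-1β"] * 5
# Hypothetical stand-in: every permuted labelling scores at chance level.
print(permutation_p_value(labels, lambda y: 0.3, observed_accuracy=18 / 23))
```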

Identification of top ranking proteins

Due to the performance of BioHEL in the comparison with other machine learning methods, analyses using BioHEL continued through the identification of proteins that were pivotal to the classification, using a methodology previously used for the analysis of transcriptomics data [42, 43].

BioHEL classifies samples by automatically producing rule sets, each consisting of a number of rules that use the proteins found in the samples to determine the treatment group to which each sample belongs. When applied to mass spectrometry data, each rule uses one or more proteins and assigns a sample to the class shown at the end of the rule if the sample's protein values satisfy all the conditions specified by the rule. An example of a rule set for this data follows:

  1. If the abundance of TPIS is greater than 0.01, then the sample belongs to the IL-1β group
  2. If the abundance of IL-8 is greater than 0.02, then the sample belongs to the carprofen+IL-1β group
  3. If the abundance of MMP-3 is greater than 0 and the abundance of UBIB is less than 0.2, then the sample belongs to the IL-1β group
  4. If the abundance of MGP is greater than 0 and the abundance of A1AT is less than 0.9, then the sample belongs to the carprofen group
  5. If the abundance of ALBU is greater than 0.01, then the sample belongs to the carprofen+IL-1β group
  6. Any sample not assigned to a group belongs to the control group

The rules in a rule set are used in combination to assign samples to their respective treatment groups. Each rule contains one or more proteins, each with a threshold on its score (either emPAI or ProteinProphet) that the protein must be above or below, depending on the comparison used. Each rule ends with the treatment class to which it relates. For example, the first rule of the rule set shown assigns a sample to the IL-1β class if its value for the protein attribute TPIS is greater than 0.01. There are no rules for the control class: any sample that is not assigned to one of the other three classes by the generated rules is, by default, considered a control sample.
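To make the rule format concrete, the example rule set can be encoded and applied to a sample as below. This is a minimal sketch rather than BioHEL's own code, and it rests on two assumptions: that the rules are tried in order, with the first matching rule deciding the class and the default class (control) applied when none match, and that a protein not detected in a sample has a score of 0. The protein scores in the usage lines are hypothetical.

```python
"""Applying the example rule set as an ordered list with a default class (sketch)."""
import operator
from typing import Callable

Condition = tuple[str, Callable[[float, float], bool], float]  # (protein, comparison, threshold)
Rule = tuple[list[Condition], str]                             # (conditions, class)

EXAMPLE_RULES: list[Rule] = [
    ([("TPIS", operator.gt, 0.01)], "IL-1β"),
    ([("IL-8", operator.gt, 0.02)], "carprofen+IL-1β"),
    ([("MMP-3", operator.gt, 0.0), ("UBIB", operator.lt, 0.2)], "IL-1β"),
    ([("MGP", operator.gt, 0.0), ("A1AT", operator.lt, 0.9)], "carprofen"),
    ([("ALBU", operator.gt, 0.01)], "carprofen+IL-1β"),
]
DEFAULT_CLASS = "control"


def classify(sample: dict[str, float], rules: list[Rule] = EXAMPLE_RULES,
             default: str = DEFAULT_CLASS) -> str:
    """Return the class of the first rule whose conditions all hold, else the default."""
    for conditions, label in rules:
        # Assumption: proteins not detected in the sample are scored as 0.
        if all(compare(sample.get(protein, 0.0), threshold)
               for protein, compare, threshold in conditions):
            return label
    return default


print(classify({"TPIS": 0.05}))             # IL-1β
print(classify({"MGP": 0.3, "A1AT": 0.5}))  # carprofen
print(classify({"COMP": 1.2}))              # control
```

Under the first-match assumption, order matters: a sample satisfying both the third and fifth rules is assigned to IL-1β rather than carprofen+IL-1β.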

Because BioHEL is stochastic, running it multiple times on the same dataset produces different rule sets. BioHEL was therefore run 10,000 times and the results were analyzed to determine recurrent patterns. Proteins were ranked by the number of times they appeared in rules across the 10,000 runs, to highlight the proteins used most frequently. Proteins ranking at the top are those that most successfully distinguish between samples of different treatments. As these proteins differ most between treatment classes, they may be suitable candidates for biomarkers, or further analysis of them may provide information about possible novel methods for diagnosis or treatment.
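The frequency ranking can be sketched as a simple count over all rule sets. In this illustration each run's rule set is reduced to a list of (proteins, class) pairs, and a protein is counted once per rule in which it appears; both the representation and the counting convention are assumptions, since the exact counting used is not specified. The three toy runs stand in for the 10,000 BioHEL runs.

```python
"""Ranking proteins by how often they appear in rules across many runs (sketch)."""
from collections import Counter
from typing import Optional

RuleSet = list[tuple[list[str], str]]  # one run: [(proteins used by a rule, class), ...]


def rank_proteins(rule_sets: list[RuleSet], target_class: Optional[str] = None,
                  top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n proteins by number of rules using them (optionally for one class)."""
    counts = Counter()
    for rule_set in rule_sets:
        for proteins, label in rule_set:
            if target_class is None or label == target_class:
                counts.update(set(proteins))  # count each protein once per rule
    return counts.most_common(top_n)


# Hypothetical rule sets from three runs.
toy_runs = [
    [(["TPIS"], "IL-1β"), (["MMP-3", "UBIB"], "IL-1β")],
    [(["MMP-3"], "IL-1β"), (["IL-8"], "carprofen+IL-1β")],
    [(["MMP-3", "TPIS"], "IL-1β")],
]
print(rank_proteins(toy_runs, "IL-1β"))  # [('MMP-3', 3), ('TPIS', 2), ('UBIB', 1)]
```

Calling it once per treatment class, with `top_n=10`, gives per-class lists of the kind reported in Tables 3 to 5.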

Network generation

To investigate interactions between the proteins used by our prediction model, we used network analysis to identify proteins that worked together in the rules generated by BioHEL. Protein pairs can be identified within these rules, and networks were generated from them. These networks can be used to identify relationships between proteins; they also provide a visual way of viewing the proteins that appear frequently in rules, through identification of the most connected proteins. In the example rule set shown in the “Identification of top ranking proteins” subsection, some rules use more than one protein; these were used to form protein pairs. For example, the third rule uses both MMP-3 and UBIB, so these are considered a protein pair. For each individual treatment class, the 100 protein pairs most frequently used within rules across the 10,000 runs of BioHEL were extracted, and a network was generated from them in Cytoscape [44]. The networks consist of nodes representing the proteins found in the BioHEL rules, with edges connecting proteins that were frequently included in rules together. The edges were then coloured according to the treatment class to which each protein pair relates.
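Pair extraction and a per-class edge table can be sketched as follows, with each rule reduced to a hypothetical (proteins, class) pair. The text does not say how pairs were formed from rules with more than two proteins, so this sketch assumes that every unordered pair within a rule counts once. The output is a tab-separated edge table with the class as an edge attribute, a layout that Cytoscape can import as a network and use for edge colouring; the function names are illustrative.

```python
"""Per-class protein-pair counting and an edge table for network import (sketch)."""
from collections import Counter
from itertools import combinations

RuleSet = list[tuple[list[str], str]]  # one run: [(proteins used by a rule, class), ...]


def top_pairs(rule_sets: list[RuleSet], target_class: str,
              top_n: int = 100) -> list[tuple[tuple[str, str], int]]:
    """Most frequent unordered protein pairs co-occurring in rules for one class."""
    counts = Counter()
    for rule_set in rule_sets:
        for proteins, label in rule_set:
            if label == target_class and len(set(proteins)) > 1:
                # Assumption: a rule with k proteins contributes all k*(k-1)/2 pairs.
                counts.update(combinations(sorted(set(proteins)), 2))
    return counts.most_common(top_n)


def edge_table(rule_sets: list[RuleSet], classes: list[str], top_n: int = 100) -> str:
    """Tab-separated edges (source, target, class, count), top_n pairs per class."""
    lines = ["source\ttarget\tclass\tcount"]
    for cls in classes:
        for (a, b), count in top_pairs(rule_sets, cls, top_n):
            lines.append(f"{a}\t{b}\t{cls}\t{count}")
    return "\n".join(lines)


# Hypothetical rule sets from two runs.
toy_runs = [
    [(["MMP-3", "UBIB"], "IL-1β"), (["TPIS"], "IL-1β")],
    [(["UBIB", "MMP-3"], "IL-1β"), (["MGP", "A1AT"], "carprofen")],
]
print(edge_table(toy_runs, ["IL-1β", "carprofen"]))
```

Sorting the proteins within each pair makes (MMP-3, UBIB) and (UBIB, MMP-3) count as the same undirected edge.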

Results and discussion

Proteomic techniques are increasingly being used for the identification of novel joint disease biomarkers [11, 19, 20, 45]. This study tests the hypothesis that the secretome of canine articular cartilage may provide a simple but well-defined model for studying potential biomarkers of early cartilage damage. To study the secretome of canine articular cartilage in an explant model we used a combination of conventional and high throughput proteomic techniques, followed by the application of bioinformatics techniques.

Although the cartilage explant system has not been used extensively in proteomic studies, a similar equine explant model of articular cartilage has been used to examine changes in the secretome in response to pro-inflammatory and anti-inflammatory stimuli [33]. The present study indicates that canine cartilage explants can also serve as a model for targeted and high-throughput proteomic studies. This is supported by the identification of a large number of proteins whose functions are relevant to articular cartilage and to biological processes involved in joint disease and OA. Using the explant model, this study has shown that it is feasible to incorporate pathophysiologically relevant stimuli, such as pro-inflammatory cytokines (e.g. IL-1β) to simulate catabolic changes and NSAIDs (e.g. carprofen) to simulate pharmacotherapy, in a well-controlled in vitro model.

The SDS-PAGE protein profiles of the IL-1β stimulated samples show that some proteins are present at higher abundance in the presence of IL-1β. This was demonstrated by extra bands in the IL-1β treated samples that were not detected in the controls (see Additional files 1–6). Protein profiles were also generally consistent across all groups of treated samples for the two animals (see Additional files 1–6).

A range of machine learning methods were compared, and BioHEL successfully classified both the ProteinProphet and emPAI datasets. The accuracies of the machine learning techniques tested on the canine articular cartilage data are shown in Table 1. For the BioHEL classifications of each dataset, confusion matrices (which show, treatment by treatment, the classes to which the samples were assigned) were generated to identify which samples were predicted correctly. The matrices for the emPAI, ProteinProphet and combined datasets (Figure 2) show that the most frequent error was the prediction of control samples as carprofen-treated samples. This probably reflects the similarity between these two groups: carprofen was added in the absence of IL-1β, so there was no pro-inflammatory stimulus for this NSAID to act on. No IL-1β sample was predicted as a control sample. Table 1 shows that BioHEL achieved the highest accuracies on both the ProteinProphet dataset and the dataset combining emPAI and ProteinProphet scores; for this reason, further analysis was carried out using BioHEL. Combining the two scoring systems increased the classification accuracy. The permutation-test p-values shown in Table 2 were all close to zero, supporting the significance of the BioHEL classification accuracies. This outcome indicates that the models generated by BioHEL (rule sets) capture non-random structure in the data, so that they can reasonably be analyzed to extract rankings of important variables and to generate interaction networks.
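A confusion matrix of the kind shown in Figure 2 can be tabulated from the held-out predictions as sketched below, with rows for actual classes and columns for predicted classes. The class order and the short lists of labels in the usage lines are hypothetical and are not the study's predictions.

```python
"""Confusion matrix and accuracy from leave-one-out predictions (sketch)."""
from collections import Counter
from typing import Sequence

CLASSES = ["control", "IL-1β", "carprofen", "carprofen+IL-1β"]


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str] = CLASSES) -> list[list[int]]:
    """Entry [i][j] counts samples of class classes[i] predicted as classes[j]."""
    pair_counts = Counter(zip(actual, predicted))
    return [[pair_counts[(a, p)] for p in classes] for a in classes]


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of samples on the matrix diagonal (predicted == actual)."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical labels: one control sample mistaken for a carprofen sample.
actual = ["control", "control", "IL-1β", "carprofen"]
predicted = ["control", "carprofen", "IL-1β", "carprofen"]
for cls, row in zip(CLASSES, confusion_matrix(actual, predicted)):
    print(f"{cls:>16}", row)
print(accuracy(actual, predicted))  # 0.75
```

Off-diagonal entries show which classes are confused; in this toy example the single error sits in the control row, carprofen column, the same kind of error described above.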

Table 1 Comparison of performance accuracies, as percentage of samples correctly classified, for classification of canine articular cartilage data for seven different machine-learning methods, using leave-one-out cross-validation
Figure 2 Confusion matrices, for the emPAI, ProteinProphet and combined datasets, to show the number of samples in each class and which class they were predicted to be in, using BioHEL.

Table 2 P-values generated by significance testing of BioHEL for the emPAI, ProteinProphet and combined datasets

From the rules generated by BioHEL, the top ranking mammalian proteins for the three treatments are shown in Tables 3 and 4. There is no ranking for the control class because it was used as the default and so did not include any proteins in rules. The default is included at the end of a rule set, so that any sample that has not been assigned to a class by the rules in the set is automatically placed into the default class. Table 5 shows the top ranking mammalian proteins for the emPAI and ProteinProphet combined datasets. It shows that both the emPAI and ProteinProphet scores were useful in the classification as some proteins, including triosephosphate isomerase, MMP-3, IL-8 and HPLN1, are top ranking proteins using both emPAI and ProteinProphet values.

Table 3 The ten mammalian proteins found most frequently in rules for each of the three classes, not including the default control class, from the ProteinProphet dataset
Table 4 The ten mammalian proteins found most frequently in rules for each of the three classes, not including the default control class, from the emPAI dataset
Table 5 The ten mammalian proteins found most frequently in rules for each of the three classes, not including the default control class, from the ProteinProphet and emPAI combined dataset

The interaction network generated from the ProteinProphet probabilities is shown in Figure 3. Particular proteins, those most connected to other proteins, can be identified from the network. These include matrix metalloproteinase-3 (MMP-3), interleukin 8 (IL-8), HPLN1, matrix gla protein (MGP) and APOE, and are detailed in Table 6. The interaction network generated from the emPAI scores is shown in Figure 4. This network contains fewer highly connected proteins than the ProteinProphet network, although MMP-3 and IL-8 are again connected to many other proteins. The smaller number of highly connected proteins in the emPAI network could be because some proteins have similar emPAI scores but differing ProteinProphet probabilities. If so, where only one protein was suitable in the ProteinProphet network, several proteins in the emPAI network gave the same results and were interchangeable.
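The most connected proteins can be read off an edge list by counting each protein's distinct neighbours (its degree). The sketch below assumes that connectivity means degree in the undirected network, ignoring edge colour and how often a pair occurs; the edge list in the usage lines is hypothetical.

```python
"""Node degree (number of distinct neighbours) in an undirected protein network (sketch)."""
from collections import defaultdict


def node_degrees(edges: list[tuple[str, str, str]]) -> list[tuple[str, int]]:
    """Proteins sorted by degree (descending), ties broken alphabetically."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b, _cls in edges:
        neighbours[a].add(b)  # undirected: record the link in both directions
        neighbours[b].add(a)
    return sorted(((p, len(n)) for p, n in neighbours.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical edges: (protein, protein, treatment class of the pair).
edges = [
    ("MMP-3", "UBIB", "IL-1β"),
    ("MMP-3", "IL-8", "IL-1β"),
    ("IL-8", "MGP", "carprofen"),
    ("MMP-3", "IL-8", "carprofen+IL-1β"),  # same pair in another class: no new neighbour
]
print(node_degrees(edges))  # [('IL-8', 2), ('MMP-3', 2), ('MGP', 1), ('UBIB', 1)]
```

Counting distinct neighbours, rather than edges, keeps a pair that recurs across several treatment classes from inflating a protein's connectivity.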

Figure 3 Protein interaction network generated from the top 100 BioHEL protein pairs for the ProteinProphet canine articular cartilage dataset. The most frequently used protein pairs for IL-1β in blue, carprofen in red and carprofen + IL-1β in green.

Table 6 Most connected proteins identified from the ProteinProphet protein pairs network
Figure 4 Protein interaction network generated from the top 100 BioHEL protein pairs for the emPAI canine articular cartilage dataset. The most frequently used protein pairs for IL-1β in blue, carprofen in red and carprofen + IL-1β in green.

COMP is a noncollagenous ECM protein that is abundantly expressed in articular cartilage and has been considered by other groups as a possible marker of articular cartilage degradation. This protein was not included in any top ranking protein list, or in either network, because COMP was found at similar levels across all samples, regardless of treatment. This suggests that the bioinformatics methods discussed here select proteins that may be suitable as putative biomarkers, rather than simply proteins that are abundant. We also expected to detect MMPs, a family of proteins expressed by chondrocytes with roles in cartilage development, remodelling and disease [54]. Matrix metalloproteinase-3 (MMP-3), a surrogate biomarker of psoriatic and rheumatoid arthritis [55, 56], was pivotal in the classification of IL-1β samples and was the top ranking protein for the IL-1β class. MMP-3 is a proteolytic enzyme known to degrade components of the ECM, including collagens and cartilage proteoglycans. No other MMPs were highlighted by the bioinformatics techniques applied. Interleukin 8 (IL-8) was dominant in the classification of IL-1β and carprofen treated samples. IL-8 is the major chemotactic factor released in response to pro-inflammatory cytokines in synovial tissues from rheumatoid arthritis and osteoarthritis affected joints [57–59]. Matrix gla protein, which is involved in the inhibition of calcification in cartilage [51], was also frequently found in the BioHEL rules from the analysis of the ProteinProphet dataset. This protein was found in many samples across all treatment groups, except for the carprofen + IL-1β group. MMP-3, IL-8 and MGP were also the most connected proteins in the ProteinProphet network. The inclusion of proteins such as these in the top ranking lists and among the most connected proteins supports the ability of these techniques to identify proteins involved in cartilage degradation. Other proteins, such as APOE and HPLN1, were also found frequently in the rules; however, the supplementary tables show that they are not present in all the samples of any group.

The proteins identified by this protocol were compared with those found in a study that used the same proteomics protocol, but without the bioinformatics analysis, on equine explant tissue [33]. Some proteins highlighted in the equine study, including COMP, fibronectin and chondroadherin, were not highlighted in this canine study because, although they were abundant in the samples, they did not differ significantly across the treatment groups. The bioinformatics methods used therefore provide a way to focus on the most relevant proteins.

The data indicate that, in the absence of IL-1β, carprofen had little effect on the cartilage explant secretome. As a result, proteins that aided the classification may have been included in the classification model without necessarily being intrinsically involved in the processes being investigated. This resulted in some non-mammalian proteins being identified as top ranking proteins. One possibility is that traces of contaminating proteins entered the analysis and were correctly identified. Alternatively, the proteins may have been incorrectly identified by Mascot because the true proteins were not in the database used, in which case the highest-ranking closest match was reported.

The major challenge faced by many proteomic studies is the under-representation of lower-abundance proteins that are potentially of interest. This under-representation is due to the very wide dynamic range of protein abundance in complex biological samples such as serum, cerebrospinal fluid and urine, or in marginally less complex samples such as the secretome [60], where highly abundant proteins dominate the MS/MS signal. Highly abundant cartilage proteins such as COMP hinder the identification of less abundant proteins relevant to biological processes. Sample preparation techniques such as proteome fractionation and deglycosylation should enable the identification of less abundant proteins, so more information could be uncovered using these techniques.

As described, additional analyses were performed on a number of the top-ranking proteins identified by these methods. However, further work is required to investigate the individual proteins highlighted and the other proteins in the networks. This includes laboratory-based experiments to confirm the presence of individual proteins and measure their levels in different sample types, and further literature and pathway analyses to mine previously published information.

Because the machine learning methods used here produce more reliable models, and more precise estimates of their accuracy, when trained on larger datasets, future work should include a similar study on a larger scale, with more replicate samples and a larger number of animals.
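One reason to prefer larger datasets is the precision of the accuracy estimate itself. As a rough sketch, the code below treats each of the 23 classification outcomes (18 correct) as an independent trial and computes a Wilson score interval. This interval is our choice for illustration, not an analysis from the study, and cross-validated predictions are not strictly independent. The 180/230 case is hypothetical and assumes the same accuracy with ten times as many samples.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(correct: int, total: int, confidence: float = 0.95):
    """Wilson score interval for a proportion, here a classification accuracy."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# 18 of 23 samples correctly classified (reported accuracy 78.3%).
print(wilson_interval(18, 23))    # roughly (0.58, 0.90)
# Hypothetical: the same accuracy with ten times as many samples.
print(wilson_interval(180, 230))  # roughly (0.72, 0.83)
```

With 23 samples the 95% interval spans roughly 58% to 90%. The same observed accuracy on 230 samples would narrow it to roughly 72% to 83%.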


This study involved the bioinformatic analysis of high-throughput proteomic data generated using an explant model of cytokine-induced articular cartilage degradation. The approach described in this paper may be used in future studies to identify early structural changes in cartilage, for drug testing, and for screening novel anti-inflammatory compounds derived from natural products. Extending our previous work with explant models of articular cartilage, we applied bioinformatics techniques to high-throughput proteomics data to identify proteins suitable for use as exploratory biomarkers. This combination of laboratory-based and computational methods has provided results that experimental techniques alone could not have provided. The study detected a number of proteins with established roles in the ECM and in inflammation, including MMP-3, IL-8 and MGP. This indicates that these bioinformatics tools are suitable for this purpose and could be applied to proteomics data from other areas.




Authors’ information

Jaume Bacardit and Ali Mobasheri

The D-BOARD European Consortium for Biomarker Discovery, University of Nottingham, University Park, Nottingham, NG7 2RD, United Kingdom;



Apolipoprotein E


Cartilage oligomeric matrix protein


Dulbecco’s modified eagle medium




Extracellular matrix


Exponentially modified protein abundance index


High-performance liquid chromatography


Hyaluronan and proteoglycan link protein 1


Interleukin 1 beta




Matrix gla protein


Matrix metalloproteinase


Matrix metalloproteinase-3 (stromelysin-1)


Mass spectrometry




  1. Lammi MJ, Häyrinen J, Mahonen A: Proteomic analysis of cartilage- and bone-associated samples. Electrophoresis. 2006, 27 (13): 2687-2701. 10.1002/elps.200600004.

  2. Eyre D: Articular cartilage and changes in arthritis: collagen of articular cartilage. Arthritis Res. 2002, 4 (1): 30-35. 10.1186/ar380.

  3. Bian L, Angione SL, Ng KW, Lima EG, Williams DY, Mao DQ, Ateshian GA, Hung CT: Influence of decreasing nutrient path length on the development of engineered cartilage. Osteoarthr Cartilage. 2009, 17 (5): 677-685. 10.1016/j.joca.2008.10.003.

  4. Newman AP: Articular cartilage repair. Am J Sports Med. 1998, 26 (2): 309-324.

  5. Vaughan-Scott T, Taylor JH: The pathophysiology and medical management of canine osteoarthritis. J S Afr Vet Assoc. 1997, 68 (1): 21-25.

  6. Buckwalter JA, Saltzman C, Brown T: The impact of osteoarthritis: implications for research. Clin Orthop Relat Res. 2004, 427: S6-S15. 10.1097/1001.blo.0000143938.0000130681.0000143939d

  7. Goldring MB: Update on the biology of the chondrocyte and new approaches to treating cartilage diseases. Best Pract Res Clin Rheumatol. 2006, 20 (5): 1003-1025. 10.1016/j.berh.2006.06.003.

  8. Macrory L, Vaughan-Thomas A, Clegg P, Innes J: An exploration of the ability of tepoxalin to ameliorate the degradation of articular cartilage in a canine in vitro model. BMC Vet Res. 2009, 5 (1): 25. 10.1186/1746-6148-5-25.

  9. Radin EL, Burr DB, Caterson B, Fyhrie D, Brown TD, Boyd RD: Mechanical determinants of osteoarthrosis. Semin Arthritis Rheum. 1991, 21 (3, Supplement 2): 12-21. 10.1016/0049-0172(91)90036-Y.

  10. Sack KE: Osteoarthritis. A continuing challenge. West J Med. 1995, 163 (6): 579-586.

  11. Mobasheri A: Osteoarthritis year 2012 in review: biomarkers. Osteoarthr Cartilage. 2012, 20 (12): 1451-1464. 10.1016/j.joca.2012.07.009.

  12. Bai JP, Bell R, Buckman S, Burckart GJ, Eichler HG, Fang KC, Goodsaid FM, Jusko WJ, Lesko LL, Meibohm B: Translational biomarkers: from preclinical to clinical a report of 2009 AAPS/ACCP biomarker workshop. AAPS J. 2011, 13 (2): 274-283. 10.1208/s12248-011-9265-x.

  13. Kraus VB, Burnett B, Coindreau J, Cottrell S, Eyre D, Gendreau M, Gardiner J, Garnero P, Hardin J, Henrotin Y: Application of biomarkers in the development of drugs intended for the treatment of osteoarthritis. Osteoarthr Cartilage. 2011, 19 (5): 515-542. 10.1016/j.joca.2010.08.019.

  14. Qvist P, Christiansen C, Karsdal MA, Madsen SH, Sondergaard BC, Bay-Jensen AC: Application of biochemical markers in development of drugs for treatment of osteoarthritis. Biomarkers. 2010, 15 (1): 1-19. 10.3109/13547500903295873.

  15. Bay-Jensen AC, Sondergaard BC, Christiansen C, Karsdal MA, Madsen SH, Qvist P: Biochemical markers of joint tissue turnover. Assay Drug Dev Technol. 2010, 8 (1): 118-124. 10.1089/adt.2009.0199.

  16. Bauer DC, Hunter DJ, Abramson SB, Attur M, Corr M, Felson D, Heinegård D, Jordan JM, Kepler TB, Lane NE: Classification of osteoarthritis biomarkers: a proposed approach. Osteoarthr Cartilage. 2006, 14 (8): 723-727. 10.1016/j.joca.2006.04.001.

  17. Felson DT, Lohmander LS: Whither osteoarthritis biomarkers? Osteoarthr Cartilage. 2009, 17 (4): 419-422. 10.1016/j.joca.2009.02.004.

  18. Moore RE, Kirwan J, Doherty MK, Whitfield PD: Biomarker discovery in animal health and disease: the application of post-genomic technologies. Biomark Insights. 2007, 2: 185-196.

  19. Ruiz-Romero C, Blanco FJ: Proteomics role in the search for improved diagnosis, prognosis and treatment of osteoarthritis. Osteoarthr Cartilage. 2010, 18 (4): 500-509. 10.1016/j.joca.2009.11.012.

  20. Gharbi M, Deberg M, Henrotin Y: Application for proteomic techniques in studying osteoarthritis: a review. Front Physiol. 2011, 2: 90.

  21. Royce PM, Steinmann B: Connective tissue and its heritable disorders: molecular, genetic, and medical aspects. 2002, New York: John Wiley & Sons

  22. Buckwalter JA, Mankin HJ: Articular cartilage: tissue design and chondrocyte-matrix interactions. Instr Course Lect. 1998, 47: 477-486.

  23. Kiviranta I, Jurvelin J, Tammi M, Säämänen A-M, Helminen HJ: Weight bearing controls glycosaminoglycan concentration and articular cartilage thickness in the knee joints of young beagle dogs. Arthritis Rheum. 1987, 30 (7): 801-809. 10.1002/art.1780300710.

  24. Dickinson SC, Vankemmelbeke MN, Buttle DJ, Rosenberg K, Heinegård D, Hollander AP: Cleavage of cartilage oligomeric matrix protein (thrombospondin-5) by matrix metalloproteinases and a disintegrin and metalloproteinase with thrombospondin motifs. Matrix Biol. 2003, 22 (3): 267-278. 10.1016/S0945-053X(03)00034-9.

  25. Aigner T, Zien A, Hanisch D, Zimmer R: Gene expression in chondrocytes assessed with use of microarrays. J Bone Joint Surg Am. 2003, 85-A (Suppl 2): 117-123.

  26. Deutsch EW, Lam H, Aebersold R: Data analysis and bioinformatics tools for tandem mass spectrometry in proteomics. Physiol Genomics. 2008, 33 (1): 18-25. 10.1152/physiolgenomics.00298.2007.

  27. Kanehisa M, Bork P: Bioinformatics in the post-sequence era. Nat Genet. 2003, 33: 305-310. 10.1038/ng1109.

  28. Larrañaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armañanzas R, Santafé G, Pérez A: Machine learning in bioinformatics. Brief Bioinform. 2006, 7 (1): 86-112. 10.1093/bib/bbk007.

  29. Kotsiantis SB: Proceedings of the 2007 conference on emerging artificial intelligence applications in computer engineering: real word AI systems with applications in eHealth, HCI, information retrieval and pervasive technologies. Supervised machine learning: a review of classification techniques. 2007, IOS Press: Emerging Artificial Intelligence Applications in Computer Engineering, 3-24.

  30. Fürnkranz J: Separate-and-conquer rule learning. Artif Intell Rev. 1999, 13 (1): 3-54. 10.1023/A:1006524209794.

  31. Bacardit J, Burke E, Krasnogor N: Improving the scalability of rule-based evolutionary learning. Memetic Computing. 2009, 1: 55-67. 10.1007/s12293-008-0005-4.

  32. Witten I, Frank E, Hall M: Data mining: practical machine learning tools and techniques. 2011, San Francisco, CA: Morgan Kaufmann, 3rd edition.

  33. Clutterbuck AL, Smith JR, Allaway D, Harris P, Liddell S, Mobasheri A: High throughput proteomic analysis of the secretome in an explant model of articular cartilage inflammation. J Proteomics. 2011, 74 (5): 704-715. 10.1016/j.jprot.2011.02.017.

  34. Zwickl H, Traxler E, Staettner S, Parzefall W, Grasl-Kraupp B, Karner J, Schulte-Hermann R, Gerner C: A novel technique to specifically analyze the secretome of cells and tissues. Electrophoresis. 2005, 26 (14): 2779-2785. 10.1002/elps.200410387.

  35. Ishihama Y, Oda Y, Tabata T, Sato T, Nagasu T, Rappsilber J, Mann M: Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics. 2005, 4 (9): 1265-1272. 10.1074/mcp.M500061-MCP200.

  36. Rappsilber J, Ryder U, Lamond A, Mann M: Large-scale proteomic analysis of the human spliceosome. Genome Res. 2002, 12: 1231-1245. 10.1101/gr.473902.

  37. Keller A, Nesvizhskii AI, Kolker E, Aebersold R: Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002, 74 (20): 5383-5392. 10.1021/ac025747h.

  38. Nesvizhskii AI, Keller A, Kolker E, Aebersold R: A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 2003, 75 (17): 4646-4658. 10.1021/ac0341261.

  39. Deutsch EW, Mendoza L, Shteynberg D, Farrah T, Lam H, Tasman N, Sun Z, Nilsson E, Pratt B, Prazen B: A guided tour of the trans-proteomic pipeline. Proteomics. 2010, 10 (6): 1150-1159. 10.1002/pmic.200900375.

  40. Kohavi R: 14th International joint conference on artificial intelligence: 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. 1995, 1137-1145.

  41. Urbanowicz RJ, Granizo-Mackenzie A, Moore JH: An analysis pipeline with statistical and visualization-guided knowledge discovery for Michigan-style learning classifier systems. IEEE Comput Intell Mag. 2012, 7 (4): 35-45.

  42. Bassel GW, Glaab E, Marquez J, Holdsworth MJ, Bacardit J: Functional network construction in Arabidopsis using rule-based machine learning on large-scale data sets. Plant Cell. 2011, 23 (9): 3101-3116. 10.1105/tpc.111.088153.

  43. Glaab E, Bacardit J, Garibaldi JM, Krasnogor N: Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS ONE. 2012, 7 (7): e39932. 10.1371/journal.pone.0039932.

  44. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 2498-2504. 10.1101/gr.1239303.

  45. Patra D, Sandell LJ: Recent advances in biomarkers in osteoarthritis. Curr Opin Rheumatol. 2011, 23 (5): 465-470. 10.1097/BOR.0b013e328349a32b.

  46. Straubinger RK, Straubinger AF, Härter L, Jacobson RH, Chang YF, Summers BA, Erb HN, Appel MJ: Borrelia burgdorferi migrates into joint capsules and causes an up-regulation of interleukin-8 in synovial membranes of dogs experimentally infected with ticks. Infect Immun. 1997, 65 (4): 1273-1285.

  47. Parks WC, Wilson CL, Lopez-Boado YS: Matrix metalloproteinases as modulators of inflammation and innate immunity. Nat Rev Immunol. 2004, 4 (8): 617-629. 10.1038/nri1418.

  48. Okada Y, Konomi H, Yada T, Kimata K, Nagase H: Degradation of type IX collagen by matrix metalloproteinase 3 (stromelysin) from human rheumatoid synovial cells. FEBS Lett. 1989, 244 (2): 473-476. 10.1016/0014-5793(89)80586-1.

  49. Luo CC, Li WH, Chan L: Structure and expression of dog apolipoprotein A-I, E, and C-I mRNAs: implications for the evolution and functional constraints of apolipoprotein structure. J Lipid Res. 1989, 30 (11): 1735-1746.

  50. Mahley RW, Ji Z-S: Remnant lipoprotein metabolism: key pathways involving cell-surface heparan sulfate proteoglycans and apolipoprotein E. J Lipid Res. 1999, 40 (1): 1-16.

  51. Zebboudj AF, Imura M, Boström K: Matrix GLA protein, a regulatory protein for bone morphogenetic protein-2. J Biol Chem. 2002, 277 (6): 4388-4394. 10.1074/jbc.M109683200.

  52. Dhore CR, Cleutjens JPM, Lutgens E, Cleutjens KBJM, Geusens PPM, Kitslaar PJEHM, Tordoir JHM, Spronk HMH, Vermeer C, Daemen MJAP: Differential expression of bone matrix regulatory proteins in human atherosclerotic plaques. Arterioscler Thromb Vasc Biol. 2001, 21 (12): 1998-2003. 10.1161/hq1201.100229.

  53. Hardingham TE, Fosang AJ: Proteoglycans: many forms and many functions. FASEB J. 1992, 6 (3): 861-870.

  54. Clutterbuck AL, Asplin KE, Harris P, Allaway D, Mobasheri A: Targeting matrix metalloproteinases in inflammatory conditions. Curr Drug Targets. 2009, 10 (12): 1245-1254. 10.2174/138945009789753264.

  55. Chandran V, Gladman DD: Update on biomarkers in psoriatic arthritis. Curr Rheumatol Rep. 2010, 12 (4): 288-294. 10.1007/s11926-010-0107-0.

  56. Keyszer G, Lambiri I, Nagel R, Keysser C, Keysser M, Gromnica-Ihle E, Franz J, Burmester GR, Jung K: Circulating levels of matrix metalloproteinases MMP-3 and MMP-1, tissue inhibitor of metalloproteinases 1 (TIMP-1), and MMP-1/TIMP-1 complex in rheumatic disease. Correlation with clinical activity of rheumatoid arthritis versus other surrogate markers. J Rheumatol. 1999, 26 (2): 251-258.

  57. Rai MF, Sandell LJ: Inflammatory mediators: tracing links between obesity and osteoarthritis. Crit Rev Eukaryot Gene Expr. 2011, 21 (2): 131-142. 10.1615/CritRevEukarGeneExpr.v21.i2.30.

  58. Nishiura H, Tanaka J, Takeya M, Tsukano M, Kambara T, Imamura T: IL-8/NAP-1 is the major T-cell chemoattractant in synovial tissues of rheumatoid arthritis. Clin Immunol Immunopathol. 1996, 80 (2): 179-184. 10.1006/clin.1996.0112.

  59. Goldring MB: The role of cytokines as inflammatory mediators in osteoarthritis: lessons from animal models. Connect Tissue Res. 1999, 40 (1): 1-11. 10.3109/03008209909005273.

  60. Righetti PG, Castagna A, Antonioli P, Boschetti E: Prefractionation techniques in proteome analysis: the mining tools of the third millennium. Electrophoresis. 2005, 26 (2): 297-319. 10.1002/elps.200406189.


This study received grant support from the Biotechnology and Biological Sciences Research Council (BBSRC; grant number BB/F017014/1), Mars® and WALTHAM®.


The authors disclose no competing financial interests. None of the authors have any relationships that could be construed as biased or inappropriate. A. Mobasheri is the coordinator of the D-BOARD Consortium funded by the European Commission Framework 7 program (EU FP7; HEALTH.2012.2.4.5–2, project number 305815, Novel Diagnostics and Biomarkers for Early Identification of Chronic Inflammatory Joint Diseases). D. Allaway is an associate at the WALTHAM Centre for Pet Nutrition. S. Liddell and J. Bacardit are participants in D-BOARD. A. Mobasheri is a member and work package coordinator in the Arthritis Research UK Centre for Sport, Exercise and Osteoarthritis (Grant reference 20194).

Author information


Corresponding author

Correspondence to Ali Mobasheri.

Additional information

Competing interests

This study has received industrial grant support from Mars® and WALTHAM®. The funding provided supported Kirsty L. Hillier (through a one-year M.Res. studentship) and supplemented the BBSRC Industrial CASE Ph.D. studentship awarded to Anna L. Swan.

Authors’ contributions

Read, edited and approved the final manuscript: ALS, KLH, JRS, DA, SL, JB, AM. Conceived and designed the experiments: ALS, JB, SL, DA, AM. AM and JB contributed equally to the study design and conception. Performed the experiments: KLH, ALS, JRS. Analyzed the data: ALS, JB, AM. Contributed reagents/materials/analysis tools: ALS, JB. Wrote the paper: ALS, JB, SL, AM. All authors read and approved the final manuscript.

Jaume Bacardit and Ali Mobasheri contributed equally to this work.

Electronic supplementary material


Additional file 1: Table S1: Proteins identified by Mascot in the control (untreated) samples with their corresponding Mascot scores. The Mascot score is a probability-based score used to assess the significance of a protein match: the higher the score, the less likely it is that the match occurred by chance. (DOC 108 KB)


Additional file 2: Table S2: Proteins identified by Mascot in the IL-1β treated samples with their corresponding Mascot scores. (DOC 134 KB)


Additional file 3: Table S3: Proteins identified by Mascot in the carprofen treated samples with their corresponding Mascot scores. (DOC 102 KB)


Additional file 4: Table S4: Proteins identified by Mascot in the samples treated with a combination of carprofen and IL-1β and their corresponding Mascot scores. (DOC 102 KB)


Additional file 5: Figure S1: SDS-PAGE protein profile of the secretome from dog one. (a) control (lanes 1-4), IL-1β (lanes 5-8); (b) control (lanes 1-4), carprofen (lanes 5-8); (c) control (lanes 1-4), IL-1β + carprofen (lanes 5-8). Molecular weight markers (M) (in kDa) were Bio-Rad Precision Plus unstained standards. (EPS 2 MB)


Additional file 6: Figure S2: SDS-PAGE protein profile of the secretome from dog two. (a) control (lanes 1-4), IL-1β (lanes 5-8); (b) control (lanes 1-4), carprofen (lanes 5-8); (c) control (lanes 1-4), IL-1β + carprofen (lanes 5-8). Molecular weight markers (M) (in kDa) were Bio-Rad Precision Plus unstained standards. Lanes 1-8 each contain 14 μg of protein. Lane 9 contains a blank loading buffer control. Arrows indicate differences in protein bands between sample sets. Gels were silver stained. (EPS 2 MB)
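For reference, the probability-based scoring mentioned for Table S1 (Additional file 1) follows, according to Matrix Science's Mascot documentation for its peptide-level (ions) score, the form below. Here P is the probability that the observed match is a random event:

$$S = -10\,\log_{10} P$$

A score of about 13 therefore corresponds to P = 0.05, and a score of 30 to P = 0.001. Each 10-point increase in S reflects a tenfold decrease in P.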

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Swan, A.L., Hillier, K.L., Smith, J.R. et al. Analysis of mass spectrometry data from the secretome of an explant model of articular cartilage exposed to pro-inflammatory and anti-inflammatory stimuli using machine learning. BMC Musculoskelet Disord 14, 349 (2013).
