Birmingham Behçet’s service: classification of disease and application of the 2014 International Criteria for Behçet’s Disease (ICBD) to a UK cohort

Background This study reports on the analysis of the application and diagnostic predictability of the revised 2014 ICBD criteria in an unselected cohort of UK patients, and the ensuing organ associations and patterns of disease. Methods A retrospective cohort study was conducted using a database of electronic medical records. Three categories were recognised: clinically defined BD, incomplete BD and rejected diagnoses of BD. We applied the ISG 1990 and ICBD 2014 classification criteria to these subgroups to validate diagnostic accuracy against the multidisciplinary assessment. Results Between 2012 and 2015, 281 patients underwent initial assessment at an urban tertiary care centre: 190 patients with a confirmed diagnosis of BD, 7 with an incomplete diagnosis, and 84 with a rejected diagnosis. ICBD 2014 demonstrated an estimated sensitivity of 97.89% (95% CI: 94.70 to 99.42) and positive likelihood ratio of 1.21 (1.10 to 1.28). The strongest independent predictors were: Central nervous lesions (OR = 10.57, 95% CI: 1.34 to 83.30); Genital ulceration (OR = 9.05, 95% CI: 3.35 to 24.47); Erythema nodosum (OR = 6.59, 95% CI: 2.35 to 18.51); Retinal vasculitis (OR = 6.25, 95% CI: 1.47 to 26.60); Anterior uveitis (OR = 6.16, 95% CI: 2.37 to 16.02); Posterior uveitis (OR = 4.82, 95% CI: 1.25 to 18.59). Conclusions The ICBD 2014 criteria were more sensitive at picking up cases than ISG 1990 using the multidisciplinary assessment as the gold standard. ICBD may over-diagnose BD in a UK population. Patients who have an incomplete form of BD represent a distinct group that should not be given an early diagnostic label. Behçet’s disease is a complex disease that is best diagnosed by multidisciplinary clinical assessment. Patients in the UK differ in their clinical presentation and genetic susceptibility from the original descriptions. This study also highlights an incomplete group of Behçet’s patients that are less well defined by their clinical presentation.


Background
Behçet's disease (BD) is a complex multisystem autoinflammatory disorder which, in its classic form, presents with recurrent oral aphthous ulcers, genital ulcers, and uveitis. Its aetiology is unknown but likely involves interplay between genetic and environmental factors [1]. BD has a very heterogeneous and unpredictable phenotype with the potential to involve the cardiovascular, renal, gastrointestinal, pulmonary, vascular, musculoskeletal, urological and central nervous systems to varying degrees [2][3][4].
Behçet's disease is more common, and often more severe, along the ancient Silk Road, which extended from eastern Asia to the Mediterranean [5,6]. It is most common in Turkey (80 to 370 cases per 100,000) [6], while the prevalence is much lower in Northern European and North American populations (1 per 15,000 to 1 per 500,000) [7]. The first symptoms often occur in young adults between 20 and 40 years of age but it is also infrequently seen in children. Familial clustering has been reported, although most cases of Behçet's are thought to be sporadic in onset [8,9].
As there is no universally accepted pathognomonic test for BD, diagnosis is based on the recognition of a particular but variable group of clinical manifestations. In 1990, these features were incorporated into the International Study Group (ISG) diagnostic/classification criteria based on data from a computer analysis of 914 patients with BD and 308 diseased controls with clinical features mimicking those of BD [10]. Although these were originally intended for the definition of patients participating in research programmes, they have since been shown to perform well in a clinical context and may be helpful in establishing a diagnosis [11]. More recent evaluations of the ISG criteria have found lower sensitivity when compared to other classification criteria, leading to an initiative to develop newer criteria. In 2014, an international team from 27 countries (not including the UK) described the new International Criteria for Behçet's disease (ICBD), capable of "performing with good discriminatory potential regardless of country" and being "intuitive and easy to use in a wide variety of settings". In comparison to the earlier ISG criteria, the ICBD 2014 criteria included both vascular and neurological features, and assigned more points for the presence of oral or genital aphthosis and ocular findings. In the published evaluation, the newly proposed criteria exhibited much improved sensitivity over the older, widely accepted ISG criteria while maintaining specificity. As a result, it was proposed that these revised criteria could be used as a tool for mass screening and identification of possible Behçet's patients in different clinical settings. The authors attempted to standardise and define a group of patients who had definite disease, and contrast these with patients who had fewer clinical manifestations and were unlikely to have Behçet's [12].
Here we report on the application and performance of both the ISG and ICBD criteria in a cohort of unselected patients referred to the Birmingham Behçet's Syndrome National Centre of Excellence from its inception in July 2012 until July 2015. The clinic was established in 1990 and has traditionally adopted a multidisciplinary approach to the management of patients with BD [13]. The aim of the study was to determine the frequency distribution of clinical characteristics for patients with clinically confirmed BD, possible but unlikely (termed 'incomplete') BD and a rejected diagnosis of BD. We compared the ISG and ICBD points-based scoring criteria with the gold standard multidisciplinary clinical assessment process (a core feature of our service design) to determine the probability of BD in our UK-based cohort, and reviewed the potential impact of the application of the new classification system as a screening tool for new referrals to the service. We also investigated how many patients who presented with typical clinical features of BD would be reclassified if the newer ICBD 2014 classification criteria were employed.

Methods
This was a retrospective study of an unselected inception cohort in a tertiary referral centre. It was conducted in accordance with the Declaration of Helsinki, and approved by the local Trust Research and Development team and The London-Westminster Research Ethics Committee. Informed consent, including that from a responsible legal guardian in certain instances, was obtained from all patients before their details were entered onto the electronic database.

Data collection
All patients seen at the Birmingham Centre of Excellence for Behçet's disease since the Centre's inception in July 2012 until July 2015 were recruited consecutively. Data were obtained from the in-house database for all 281 patients irrespective of disease duration. Patients' clinical and demographic characteristics were collected and summarised, so as to discern local patterns of disease. With respect to pathergy, we assigned a total score based on the result of any formal testing as well as reported reaction, as this phenomenon is not currently routinely assessed in a UK population.
Patient diagnosis was established at the time of first presentation following multidisciplinary combined clinical assessment. Clinicians involved in this assessment included specialists in Rheumatology, Ophthalmology, Oral Medicine, Gynaecology, and where appropriate Gastroenterology and Dermatology, after review of the referral information and pre-clinic contact with the patient. Diagnoses were made collaboratively following independent specialist assessment. Three categories were recognised: clinically defined BD, incomplete BD and rejected diagnoses of BD. The groups were analysed separately to determine clinically defining or discriminating features. Patients who were classified as incomplete BD were felt to represent an interesting category, demonstrating some features consistent with but not necessarily diagnostic of BD, and not needing systemic disease modifying therapies at the time of assessment. Patients who presented with some clinical features of BD but were believed to have an alternative diagnosis were classified as a rejected group and usually discharged from follow up. We applied the ISG 1990 and ICBD 2014 classification criteria to these subgroups to validate diagnostic accuracy against the gold standard multidisciplinary assessment process in our UK-based cohort. A selection of cases was subsequently analysed in more detail: those who were clinically diagnosed with BD following multidisciplinary review and did not satisfy the ISG 1990 criteria, but nevertheless went on to meet the newer ICBD 2014 criteria. We compared variables between those patients who were newly classified as BD by ICBD and those in the BD group who were not reclassified.

Statistical analysis
All statistical analyses were conducted using IBM SPSS Statistics for Windows, Version 22.0 (Armonk, NY: IBM Corp.). Levels of continuous variables were expressed as means ± standard deviations. Continuous variables were compared between the categories of disease using a one-way analysis of variance with Tukey's Honestly Significant Difference test, and categorical variables were compared with Fisher's exact test. Diagnostic accuracies of ISG 1990 and ICBD 2014 criteria were represented by sensitivities, specificities, and likelihood ratios, where patients in the incomplete group were excluded from analysis. Odds ratios were obtained from a multivariable logistic regression analysis among variables with P values ≤ 0.05 in univariable analyses. For all statistical evaluations, P values ≤ 0.05 were considered to indicate statistical significance.

Results
A total of 281 patients were analysed: 190 patients with a confirmed diagnosis of BD, 7 with an incomplete diagnosis, and 84 with a rejected diagnosis following multispecialty clinical evaluation. Table 1 displays demographic data for the patient groups.
One-way analysis of variance indicated that the mean age of the three groups was significantly different. Post hoc Tukey's Honestly Significant Difference test also showed that the mean age was significantly different between the BD and rejected groups (P < 0.001). The proportion of males was not significantly different between the groups (P = 0.205, Fisher's exact test); however, the three groups differed significantly in ethnicity (P = 0.002, Fisher's exact test). Forty-two of the 190 clinically confirmed BD cases failed to meet ISG 1990 criteria; 38 of these went on to meet the newer ICBD 2014 criteria (reclassified). Twenty-six cases were classified as BD by the ISG criteria but were not confirmed clinically, all of whom were in the rejected diagnosis group; this increased to 68 cases for the ICBD 2014 criteria. The 16 patients who were both clinically and ICBD negative were given alternative diagnoses based on clinical assessment. These included ocular toxoplasmosis, vesicobullous autoimmune diseases, inflammatory bowel disease, Sweet's syndrome, idiopathic aphthous ulceration, nutritional and dietary deficiencies, and potential auto-inflammatory syndromes. All seven patients who were categorised as having incomplete BD exhibited oral ulcers, along with various other clinical manifestations that included: acneiform rash, pseudofolliculitis, non-specific arthralgia, inflammatory arthritis, enthesopathy, fibromyalgia, chronic diarrhoea and peripheral neuropathy. None of these patients satisfied either the ISG 1990 or ICBD 2014 criteria.
A comparison of the ISG 1990 and ICBD 2014 criteria and subsequent diagnostic accuracies compared with the gold standard clinical diagnoses are displayed in Table 2.
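The headline indices can be recomputed from a 2×2 table. The sketch below is illustrative only: the cell counts are inferred from the cohort sizes, false-positive counts and sensitivities reported in this paper (incomplete cases excluded), not copied from Table 2, and the exact (Clopper-Pearson) interval is an assumption, since the interval method used in the study is not stated. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson (exact) confidence interval for the proportion x/n."""

    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p increases, so bisect on p.
        low, high = 0.0, 1.0
        for _ in range(60):
            mid = (low + high) / 2
            if binom_cdf(k, n, mid) > target:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


def diagnostic_accuracy(tp, fn, fp, tn):
    """Accuracy indices of a criteria set against the reference diagnosis.

    Assumes 0 < specificity < 1 so that both likelihood ratios are defined.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity_ci": exact_ci(tp, tp + fn),
        "specificity_ci": exact_ci(tn, tn + fp),
    }


# Inferred counts: 190 confirmed BD and 84 rejected diagnoses.
# ISG 1990: 42 confirmed cases missed, 26 rejected cases classified as BD.
# ICBD 2014: 4 confirmed cases missed, 68 rejected cases classified as BD.
isg = diagnostic_accuracy(tp=148, fn=42, fp=26, tn=58)
icbd = diagnostic_accuracy(tp=186, fn=4, fp=68, tn=16)

for name, result in (("ISG 1990", isg), ("ICBD 2014", icbd)):
    ci_low, ci_high = result["sensitivity_ci"]
    print(
        f"{name}: sensitivity {result['sensitivity']:.2%} "
        f"(95% CI {ci_low:.2%} to {ci_high:.2%}), "
        f"specificity {result['specificity']:.2%}, "
        f"LR+ {result['lr_positive']:.2f}"
    )
```

Under these inferred counts, the point estimates agree with the sensitivity, specificity and positive likelihood ratio quoted in the text for ICBD 2014.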
The Receiver Operating Characteristic (ROC) curve was calculated for the 2014 criteria in the study population (0-9 cut-off points; n = 274) (Fig. 1). The area under the ROC curve (AUC) was 0.818.
The frequencies of individual clinical features according to diagnosis are shown in Table 3. The majority of patients referred to the Behçet's service had recurrent oro-genital aphthous ulceration; however, three of the patients with confirmed BD did not present with oral ulcers, and all of these had significant inflammatory eye disease that was typical of BD.
All the statistically significant variables above remained significant when entered together into a multivariable binary logistic regression model. For genital ulceration, P < 0.001 and the odds ratio was 9.05 (95% confidence interval 3.35 to 24.47); for anterior uveitis, P < 0.001 and the odds ratio was 6.16 (95% confidence interval 2.37 to 16.02). The odds ratios for the other independent predictors were 10.57 (1.34 to 83.30) for central nervous lesions, 6.59 (2.35 to 18.51) for erythema nodosum, 6.25 (1.47 to 26.60) for retinal vasculitis and 4.82 (1.25 to 18.59) for posterior uveitis.
Thirty-eight patients were given a clinical diagnosis of BD based on multidisciplinary clinical assessment and met the ICBD but not the earlier ISG criteria. Table 4 presents a comparison between those patients in the BD group who were reclassified based on ICBD and those who were not. The reclassified patients exhibited higher vascular and neurological scores.

Discussion
Our data share some similarities with those obtained during the development of the new multinational classification scheme for BD [12]. In our cohort, ICBD 2014 demonstrated an estimated sensitivity of 97.89% (95% CI: 94.70 to 99.42), compared to 94.8% (95% CI: 93.40 to 95.9) quoted in the original validation set, and considerably higher than that of the ISG 1990 criteria (77.89%). We found a positive likelihood ratio of 1.21 (1.10 to 1.28). Nevertheless, we measured the specificity of ICBD 2014 to be much lower than that reported in the original data set: 19.05% (95% CI: 11.30 to 29.08) compared with 90.5% (95% CI: 87.9 to 92.8). The reason for this discrepancy is that, in the new classification, a score of four can be achieved solely with the presence of orogenital ulcers; however, following clinical assessment, patients were often given an alternative explanation for their ulcers, such as idiopathic or postinfectious aetiology, which would have given rise to higher false positive rates. This is supported by the finding that the symptoms showing the highest frequency but the least discriminatory utility in our cohort were oral and genital ulcers.
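The way orogenital ulceration alone reaches the ICBD threshold can be illustrated with a small scoring sketch. The point weights and the ISG rule below are written from the published criteria as generally described (ICBD 2014: 2 points each for ocular lesions, genital aphthosis and oral aphthosis, 1 point each for skin, neurological and vascular manifestations and an optional positive pathergy test, with a score of 4 or more classifying as BD; ISG 1990: recurrent oral ulceration plus at least two supporting features). They are not reproduced in this paper and should be checked against the original criteria tables before use. The feature labels and function names are hypothetical.

```python
# Point weights assumed from the published ICBD 2014 scheme (verify before use).
ICBD_POINTS = {
    "ocular_lesions": 2,
    "genital_aphthosis": 2,
    "oral_aphthosis": 2,
    "skin_lesions": 1,
    "neurological_manifestations": 1,
    "vascular_manifestations": 1,
    "positive_pathergy": 1,  # optional item in ICBD 2014
}
ICBD_THRESHOLD = 4

# ISG 1990 (assumed): oral ulceration is mandatory, plus two of these features.
ISG_SUPPORTING = {
    "genital_aphthosis",
    "ocular_lesions",
    "skin_lesions",
    "positive_pathergy",
}


def icbd_score(features):
    """Sum the ICBD points for the features present."""
    return sum(points for name, points in ICBD_POINTS.items() if name in features)


def meets_icbd(features):
    return icbd_score(features) >= ICBD_THRESHOLD


def meets_isg(features):
    present = set(features)
    return "oral_aphthosis" in present and len(ISG_SUPPORTING & present) >= 2


# Orogenital ulceration only: classified as BD by ICBD but not by ISG.
orogenital_only = {"oral_aphthosis", "genital_aphthosis"}
# Eye and genital involvement without oral ulcers: ISG cannot be met.
no_oral_ulcers = {"ocular_lesions", "genital_aphthosis"}

for label, features in (("orogenital only", orogenital_only),
                        ("no oral ulcers", no_oral_ulcers)):
    print(label, icbd_score(features), meets_icbd(features), meets_isg(features))
```

A patient whose only features are oral and genital ulcers scores 4 and is classified as BD by ICBD, while the same patient fails ISG for lack of a second supporting feature, which is the mechanism proposed above for the low specificity.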
There were three patients who were diagnosed with BD on clinical grounds but did not report classic recurrent oral aphthosis; all of these were diagnosed following identification of characteristic ophthalmic changes of panuveitis and retinal vasculitis. These patients would not have satisfied the earlier ISG 1990 criteria for diagnosing BD given the lack of recurrent oral ulcers; however, it is recognised that ocular disease may be the initial manifestation in about 20% of cases. In one series, anterior uveitis was present in 59% of cases, posterior uveitis was present in 76% of cases, and panuveitis was present in 88.1% of cases [14]. Since there is no pathognomonic clinical sign or laboratory test to distinguish BD from other uveitic causes, the diagnosis must be made based on characteristic ocular and systemic findings in the absence of evidence of other disease that can explain the findings. This has led some to develop diagnostic or classification criteria, for use in the uveitis community, that rely on a minimum number and/or combination of clinical findings to identify Behçet's disease [15].
The incomplete group, described in the original combined testing and validation sets of the ICBD criteria as those with 'possible but unlikely BD', demonstrated undisputed presence of oral ulcers (100%), along with several skin and thrombotic manifestations. Patients in this group were predominantly female. The authors believe that these patients form an interesting subgroup, which should ideally be monitored for the development of more specific target organ associations but should not be given a diagnostic label due to uncertainty about progression and treatment. None of the patients in the incomplete group were started on systemic immunomodulatory therapies. In the reclassification group, more patients were female, and central nervous system lesions appeared statistically predictive for the development of BD. Peripheral nervous system lesions also showed a trend towards redefining disease according to the newer criteria. There was no history of pathergy in any patient in this group, nor was it observed on testing.
Behçet's disease, like most other rheumatic diseases, lacks a gold-standard test with a high degree of sensitivity and specificity, making it necessary to develop classification and diagnostic criteria to guide researchers and clinicians. Classification criteria are designed to define a homogeneous population with similar clinical features suitable for research studies. They are essential for our understanding of disease pathogenesis and treatment outcomes and for entry into clinical trials, and as such they increase the specificity for underlying disease at the expense of sensitivity on Receiver Operating Characteristic (ROC) curve analysis. Conversely, the goal of diagnostic criteria is to have a high sensitivity and positive predictive value (PPV) so as not to exclude individuals with possible disease. The specificity and PPV of ICBD were considerably lower than those of ISG in this cohort, which reflects the number of false positive cases referred to the service and falsely thought to have BD. These indices are likely to be even lower in local Rheumatology centres where the prevalence of true BD is far lower. For these reasons, it is important for criteria to be tailored to the practice setting and for clinicians to formulate a diagnosis based on sound clinical judgement and experience in rare conditions.
Fig. 1 The receiver operator curve and sensitivity and specificity for the ICBD criteria
To further understand how "universal" classification criteria can be employed in individual clinical settings, one should appreciate the role of Bayes' theorem, named after the 18th century English statistician, philosopher and minister Thomas Bayes. It states that the post-test odds of having a disease are equal to the pre-test odds multiplied by the likelihood ratio; the former is determined by the prevalence in the population and the latter by the sensitivity and specificity in the data set [16]. Both types of criteria are highly dependent on the disease prevalence in the patient population being investigated. Bayes' theorem informs us that a set of criteria can only be accurately applied to the cohort for which it was designed. In light of this, it is important to realise that there are important genotypic and phenotypic variations in disease expression between different populations in BD.
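The odds form of Bayes' theorem described above can be made concrete with a short calculation. This is a minimal sketch: the cohort figures are inferred from the counts reported in this paper (190 confirmed and 84 rejected diagnoses, with 186 and 68 of them respectively meeting ICBD 2014), and the 5% pre-test probability for a general clinic is a purely hypothetical value chosen for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply Bayes' theorem in odds form.

    post-test odds = pre-test odds * likelihood ratio
    """
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Positive likelihood ratio of ICBD 2014 inferred from this cohort:
# sensitivity 186/190 divided by the false positive rate 68/84.
lr_positive = (186 / 190) / (68 / 84)

# Pre-test probability in the referred cohort (incomplete cases excluded).
cohort_prevalence = 190 / (190 + 84)
cohort_post_test = post_test_probability(cohort_prevalence, lr_positive)

# Hypothetical lower-prevalence setting (assumed value, not from the study).
hypothetical_prevalence = 0.05
hypothetical_post_test = post_test_probability(hypothetical_prevalence, lr_positive)

print(f"LR+ = {lr_positive:.2f}")
print(f"Tertiary cohort: {cohort_prevalence:.1%} -> {cohort_post_test:.1%}")
print(f"Hypothetical clinic: {hypothetical_prevalence:.1%} -> {hypothetical_post_test:.1%}")
```

In the referred cohort the post-test probability after a positive ICBD result equals the positive predictive value (186/254, about 73%), whereas with the same likelihood ratio and a hypothetical 5% pre-test probability it is only about 6%. This shows why the same criteria behave differently as the prevalence of true BD changes.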
The genetic locus most widely studied in BD is the human leukocyte antigen (HLA) complex on chromosome 6p21. Disease susceptibility has consistently been associated with polymorphisms in the HLA-B gene, particularly HLA-B*51 [17]. A recent meta-analysis showed a significantly increased risk of developing BD in HLA-B*51 carriers compared with non-carriers across multiple geographic locations [18]. Nevertheless, the HLA-B*51 association is not invariable: the relative risk of disease with this haplotype is known to be stronger in Turkish, Middle Eastern, and Japanese populations than in Caucasians [19], and ethnic differences are thought to have a major impact on clinical expression of BD [20]. More recently, Caucasian patients from the UK have also been found to express HLA-B*57, another susceptibility gene that carries a relative risk of disease equivalent to that of HLA-B*51 [21]. Other HLA alleles may also increase or decrease the risk for Behçet's in various populations and in men and women [22][23][24][25]. Moreover, in a UK population, pathergy reaction is a relatively rare phenomenon when tested for [26]. There are likely to be further gender-specific differences such as those identified by the German Adamantiades-Behçet's disease registry data [27]. These findings deserve greater attention to define the exact biological and clinical profiles of both true and incomplete Behçet's in the UK in case they represent distinct pathobiological entities that are separate from the classic Silk Road descriptions. Our study revealed a low proportion of patients from both Turkish and Middle Eastern ethnic groups, which reflects the demography of Birmingham.
This study has several strengths. It is the first time that the application and diagnostic predictability of the newer ICBD 2014 classification criteria have been investigated for clinical use in a cohort of patients referred to a National Behçet's Centre in the UK. The patients referred to the service represent an unselected cohort, which arguably assesses a set of discriminatory criteria in a more accurate and real-time manner than selecting out a control group beforehand with other final diagnoses. Our data compare the classification criteria against the conventional clinical diagnosis, and depict the rate of reclassification in patients who presented with clinical manifestations but did not fulfil the previous classification criteria. This study goes further in defining a relatively under-recognised group of 'incomplete' patients who appear to have an undifferentiated inflammatory condition but who do not currently exhibit sufficient diagnostic certainty for BD. Future research may help to further understand the biological factors that are relevant for these patients.
Our study has some limitations. Firstly, it was retrospective and conducted at a single time point, implying that it is difficult to be certain about future development of clinically defining features or severity of disease over time. Secondly, the ophthalmic indicators of BD may have been over-represented in our cohort as our hospital is an internationally renowned tertiary referral centre for uveitis.

Conclusions
In summary, the in-house data collection system linked to the electronic medical records enabled effective evaluation and investigation of the Birmingham National Behçet's Centre of Excellence to be undertaken. As expected, the proposed ICBD 2014 criteria were more sensitive at picking up cases than ISG 1990 using the multidisciplinary clinical assessment process as the gold standard. Specificity was lower than expected for both criteria but particularly so for ICBD, as in our hands certain clinical features were not always judged to be attributable to a BD diagnosis and gave rise to a high number of false positives, though time and future follow-ups are likely to improve performance. ICBD may serve as a useful validated screening tool for BD but, in our hands and in a predominantly UK population, it appears to over-diagnose BD. The gold standard for diagnosis should remain the multidisciplinary clinical assessment, and if criteria are to be used to assist with diagnosis, then we would suggest reverting to the older ISG 1990 criteria until a more suitable alternative can be validated for use in a UK population. This study also highlights the need for further international harmonisation on potential geographic variations in BD clinical presentations. Future research may wish to investigate the revised ICBD criteria in all three UK National Behçet's Centres.

Funding
No funding was obtained for this study.

Availability of data and materials
The datasets used and/or analysed during the study are available from the corresponding author on reasonable request.