Usefulness of the SF-36 Health Survey in screening for depressive and anxiety disorders in rheumatoid arthritis

Background: This study aimed to assess the accuracy of the Short-Form Health Survey (SF-36) mental health subscale (MH) and mental component summary (MCS) scores in identifying the presence of probable major depressive or anxiety disorder in patients with rheumatoid arthritis.
Methods: SF-36 data were collected in 100 hospital outpatients with rheumatoid arthritis. MH and MCS scores were compared against depression and anxiety data collected using validated measures as part of routine clinical practice. Sensitivity and specificity of the SF-36 were established using receiver operating characteristic (ROC) curve analysis, and area under the curve (AUC) compared the performance of the SF-36 components with the 9-item Patient Health Questionnaire (PHQ9) for depression and the 7-item Generalised Anxiety Disorder (GAD7) questionnaire for anxiety.
Results: The MH with a threshold of ≤52 had sensitivity and specificity of 81.0 and 71.4 % respectively to detect anxiety, correctly classifying 73.5 % of patients with probable anxiety disorder. A threshold of ≤56 had sensitivity and specificity of 92.6 and 73.2 % respectively to detect depression, correctly classifying 78.6 % of patients, and the same threshold could also be used to detect either depression or anxiety with a sensitivity of 87.9 %, specificity of 76.9 % and accuracy of 80.6 %. The MCS with a threshold of ≤35 had sensitivity and specificity of 85.7 and 81.9 % respectively to detect anxiety, correctly classifying 82.8 % of patients with probable anxiety disorder. A threshold of ≤40 had sensitivity and specificity of 92.3 and 70.2 % respectively to detect depression, correctly classifying 76.3 % of patients. A threshold of ≤38 could be used to detect either depression or anxiety with a sensitivity of 87.5 %, specificity of 80.3 % and accuracy of 82.8 %.
Conclusion: This analysis may increase the utility of a widely-used questionnaire. Overall, optimal use of the SF-36 for screening for mental disorder may be through using the MCS with a threshold of ≤38 to identify the presence of either depression or anxiety.
Electronic supplementary material: The online version of this article (doi:10.1186/s12891-016-1083-y) contains supplementary material, which is available to authorized users.


Background
Rheumatoid arthritis (RA) is a chronic, painful, progressive condition, which has a substantial impact on patients' quality-of-life (QoL) [1]. The prevalence of depression in this condition is high, with a recent meta-analysis [2] revealing that an estimated 38.8 % of patients screen positive for probable major depressive disorder (pMDD) according to the 9-item Patient Health Questionnaire (PHQ9; [3]). Common mental disorders such as pMDD or probable generalised anxiety disorder (pGAD) can have implications for long-term health outcomes; depression and anxiety are associated with increased fatigue [4], impaired long-term disease activity and physical disability [5], and reduced treatment efficacy [6].
Despite its prevalence and importance, mental health is rarely measured either in rheumatological research or in clinical practice, and is reported as an outcome in less than 8 % of published research [7]. QoL is more frequently measured (in 19 % of studies), most often with the Short-Form Health Survey (SF-36 [8]) [7]. The SF-36 has been extensively validated as a measure of QoL in multiple populations and is the most widely used and evaluated QoL outcome measure [9]. The SF-36 consists of 8 domains, which assess physical function (PF), role physical (RP), bodily pain (BP), general health (GH), vitality (VI), social function (SF), role emotional (RE) and mental health (MH). Scores on these subscales can also be combined to create two higher-order summary scores: the physical component summary (PCS) and mental component summary (MCS). The PCS is calculated by positively weighting the 4 physical subscales (PF, RP, BP and GH), and by negatively weighting the psychological subscales (VI, SF, RE and MH). Conversely, the MCS is created by positively weighting the psychological subscales and negatively weighting the physical subscales.
There are several similarities between the SF-36 MH subscale and typical depression and anxiety screening questionnaires. Items relating to low mood ("Have you felt downhearted and low?"), tiredness ("Did you feel tired?"), nerves ("Have you been a very nervous person?") and restlessness ("Did you have a lot of energy?") are comparable to items such as "Feeling down, depressed or hopeless" (PHQ9 item 2), "Feeling tired or having little energy" (PHQ9 item 4), "Feeling nervous, anxious or on edge" (GAD7 item 1) and "Being so restless that it is hard to keep still" (GAD7 item 5). Additionally, the weighting of other QoL domains introduced when combining subscale scores for the MCS brings in other depression and anxiety symptoms, such as psychosomatic symptomatology and emotional interference with daily activities.
Validating the MH and MCS constructs within the SF-36 as screening tools for depression and anxiety may add extra utility to a questionnaire which is already frequently used for research purposes, and could also allow further interrogation of clinical trial datasets which measure QoL but not mental health. The identification of useful thresholds can also have implications for implementing change in clinical practice. For example, patients attending general outpatient appointments at King's College Hospital NHS Foundation Trust are required to complete the PHQ9 and GAD7 along with other patient-reported outcomes, such as pain and fatigue visual analogue scales (VAS), and the Health Assessment Questionnaire (HAQ [10]) [11], on tablet devices while they wait for their appointment. The results of these assessments are made available immediately on their electronic health record, with advice for onward referral if required [11]. To date, the SF-36 MH and MCS scores have been validated as screening tools for depression and anxiety in an elderly population [12]; however, a similar validation process has yet to be performed in an RA sample.
We aimed to: 1) examine the relationships between MH and MCS SF-36 domains and depression, anxiety, and indicators of disease severity; 2) assess the accuracy of the MH and MCS SF-36 domains in identifying the presence of pMDD and pGAD, and describe the sensitivity and specificity of cut-off scores in the SF-36 MH and MCS in screening for psychological disorder; and 3) recommend the most appropriate threshold with which to identify pMDD, pGAD, or presence of any psychological disorder (pMDD or pGAD).

Method
Setting
Data were collected using questionnaires administered to consecutive RA outpatients attending outpatient appointments at King's College Hospital, an inner city hospital.

Eligibility criteria
Several inclusion and exclusion criteria determined eligibility to participate in this study. Inclusion criteria were: 1) having sufficient English to complete the questionnaire, or having a translator present to assist; 2) being able to give informed consent, i.e. no substantial learning disability or dementia. The following exclusion criteria were applied: 1) no clinic data collected within an appropriate timeframe (same day for psychological variables, ± 3 months for disease activity and disability measures); 2) severe disability, such as blindness or extreme frailty, precluding the ability to answer independently.

Procedure
Consecutive patients attending outpatient appointments were introduced by their clinicians to the researcher, who then confirmed eligibility and provided patients with the study information sheet. This information sheet explained the purpose of the research project, and confirmed that all responses would remain anonymous and be analysed in a confidential manner. Patients were also informed that they would be free to withdraw at any time.
Consenting participants were asked to provide their hospital number and asked to complete the SF-36.
Patients were asked either to complete the SF-36 questionnaire before leaving the hospital, or to complete it at home and return it in a stamped addressed envelope provided by the researcher.
The SF-36 data were combined with data collected routinely in clinic. These clinical data include assessment of disease activity (DAS28), and patient-reported outcomes including physical disability, pain and fatigue, as well as depression and anxiety. Every patient attending an outpatient appointment is asked to complete these patient-reported outcomes as part of their pre-appointment assessments. Patients can choose not to complete these assessments. To be eligible for inclusion in the current analysis, clinic data had to be collected on the same day as the SF-36, to ensure SF-36 data represented current mood and physical health. Hospital numbers were used to link questionnaire data with clinical data, and data were pseudonymised (with hospital numbers replaced with study identification numbers) by an independent database manager before release to the researcher. This procedure was approved by the Midlands National Research Ethics Service Committee (reference: 14/WM/0173).

Recruitment
Data were collected between July 2014 and February 2015, on one day per week. This day was consistent across the recruitment period, as it had previously been identified as having several dedicated RA clinics running and was therefore likely to yield more eligible patients than other days of the week.
The target sample size was set at 100. This decision was based on the accuracy with which the area under the ROC curve (AUC) would be estimated. Specifically, for an AUC of 80 % or higher, width of the 95 % confidence interval would be no larger than +/− 8 %, which was deemed acceptable.

Outcome measures
Depression
Depression was measured routinely in clinic using the 9-item Patient Health Questionnaire (PHQ9 [3]), which has been recommended by the National Institute for Health and Care Excellence (NICE) for use in adult patients with chronic physical health problems [13]. Probable major depressive disorder (pMDD) was defined as scoring "more than half the days" or "nearly every day" within the last two weeks on at least one of the first two items of the PHQ9 (low mood and anhedonia), and on at least five out of all nine symptoms. This categorical algorithm for identifying pMDD with the PHQ9 has 83 % (95 % CI: 72-91 %) sensitivity and 90 % (95 % CI: 87-93 %) specificity when validated against the "gold standard" Structured Clinical Interview for DSM-IV for identifying pMDD, and has an overall accuracy of 89 % (95 % CI: 86-92 %) [14].
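The categorical definition above can be expressed as a short rule. The following is a minimal sketch only, following the definition as worded in this section; it assumes the nine PHQ9 items are coded 0-3 (0 = "not at all", 1 = "several days", 2 = "more than half the days", 3 = "nearly every day"), and the function name is hypothetical. The published PHQ9 scoring instructions may differ in detail and should be consulted before any clinical use.

```python
from typing import Sequence


def screens_positive_pmdd(items: Sequence[int]) -> bool:
    """Apply the categorical pMDD rule described in the text.

    `items` holds the nine PHQ9 responses in order, each coded 0-3
    (assumed coding: 2 = "more than half the days", 3 = "nearly every day").
    """
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected nine item scores, each between 0 and 3")

    # A symptom counts as present when scored "more than half the days" or above.
    endorsed = [score >= 2 for score in items]

    # At least one of the two core items (low mood, anhedonia) must be present.
    core_symptom_present = endorsed[0] or endorsed[1]

    # At least five of the nine symptoms must be present overall.
    return core_symptom_present and sum(endorsed) >= 5
```

A respondent with five endorsed symptoms but neither core item endorsed would therefore not screen positive under this rule.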

Disease variables
Disease activity was quantified using the 28-joint disease activity score (DAS28). The DAS28 is recommended by all major RA guidelines and is considered to be the gold-standard indicator of disease activity [16]. The DAS28 takes into account subjective and objective markers of disease severity: erythrocyte sedimentation rate (ESR) and a clinician-recorded swollen joint count (SJC) provide objective indication of inflammation; patient-reported tender joint count (TJC) and a patient global assessment (PGA) provide subjective elements of disease activity. These are combined and weighted to form an overall DAS28 score, with higher scores indicating worsened disease activity. A score of ≤2.6 would suggest the patient is in a state of remission; a score of 2.6-3.2 would suggest low disease activity; scores between 3.2 and 5.1 would indicate moderate disease activity; and scores of over 5.1 would suggest high disease activity [17].
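The DAS28 bands listed above can be applied mechanically, as in the minimal sketch below. The text gives overlapping band edges (2.6 and 3.2 each appear in two bands), so the handling of scores falling exactly on a boundary is an assumption made here for illustration, and the function name is hypothetical.

```python
def das28_category(score: float) -> str:
    """Map a DAS28 score onto the disease activity bands given in the text.

    Assumption: a score exactly on a band edge is assigned to the lower band.
    """
    if score <= 2.6:
        return "remission"
    if score <= 3.2:
        return "low disease activity"
    if score <= 5.1:
        return "moderate disease activity"
    return "high disease activity"


# The sample mean DAS28 of 3.9 reported in the Results falls in the moderate band.
print(das28_category(3.9))
```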
The HAQ [10] is a patient-reported measure of physical disability. It contains questions relating to 8 domains of activities of daily living: dressing and grooming; rising; eating; walking; hygiene; reach; gripping and opening things; and daily activities. Scoring for each section ranges from 0 ("without any difficulty") to 3 ("unable to do"), and total scores are on a scale of 0-3, with higher scores indicating higher levels of disability.
Pain and fatigue data were each collected via 0-100 visual analogue scales, with higher scores indicating increased levels of pain/fatigue.

SF-36
SF-36 subscales were calculated according to the SF-36 manual [18], resulting in 8 subscale scores: physical function (PF), role physical (RP), bodily pain (BP), general health (GH), vitality (VI), social functioning (SF), role emotional (RE), and mental health (MH). Physical component summary (PCS) and mental component summary (MCS) scores were calculated by norming subscale scores against population scores obtained from a normative UK dataset [19]. Normed subscale results were then weighted appropriately to calculate PCS and MCS totals, where a score of 50 represents the mean of the UK population (SD = 10).
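The norming and weighting steps can be sketched as follows. This is an illustration of the general norm-based approach only: each subscale is standardised against a population mean and SD, the standardised scores are combined with domain weights, and the result is rescaled to a mean of 50 and SD of 10. The function name and parameters are hypothetical, and the population means, SDs and weights must be taken from the normative dataset [19] and scoring manual [18]; no values are supplied here.

```python
from typing import Mapping


def component_summary_score(
    raw: Mapping[str, float],
    norm_mean: Mapping[str, float],
    norm_sd: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Norm-based summary score on a scale with population mean 50 and SD 10.

    raw       -- subscale scores (0-100) keyed by domain, e.g. "PF", "MH"
    norm_mean -- population mean for each domain (from the normative data)
    norm_sd   -- population SD for each domain (from the normative data)
    weights   -- domain weights; positive or negative depending on whether
                 the PCS or the MCS is being calculated
    """
    # Step 1: standardise each subscale against the population (z-score).
    z_scores = {d: (raw[d] - norm_mean[d]) / norm_sd[d] for d in weights}

    # Step 2: combine the standardised subscales using the domain weights.
    aggregate = sum(weights[d] * z_scores[d] for d in weights)

    # Step 3: rescale so that 50 is the population mean and 10 is one SD.
    return 50.0 + 10.0 * aggregate
```

Under this scheme a respondent scoring exactly at the population mean on every subscale receives a summary score of 50, whichever weights are used.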

Statistical analyses
Linear regression analyses examined any differences in mental and physical health between patients who participated in the study and patients who did not, to assess for potential selection bias. Patients who did not have clinical data collected within the specified timeframe (same day for psychological variables, ± 3 months for disease activity and disability measures) had their next closest clinical data included in this analysis only, to test for significant differences in disease state between those recruited and the remaining clinical population. Pearson's correlation analyses assessed the associations between PHQ9 depression scores, the SF-36 mental health subscale and mental component summary and: age; anxiety; disease duration; fatigue; pain; HAQ; DAS28 and its components TJC, SJC, ESR and PGA; CRP; BMI; the SF-36 subscales PF, RP, BP, GH, VI, RE, SF and PCS; and illness perceptions of consequence, timeline, personal control, treatment control, identity, concern, coherence and emotional representation.
The main analysis included sensitivity, specificity, predictive value and likelihood ratio assessment, along with ROC curve analysis to combine sensitivity and specificity. The AUC was used to compare the performance of the MH and MCS SF-36 components with the PHQ9 to define depression and the GAD7 to identify anxiety. Optimal thresholds for identifying depression with the MH and MCS were found by identifying the threshold which provided the highest level of sensitivity with the least sacrifice in specificity. Positive and negative likelihood ratios described the accuracy of the SF-36 in detecting cases of pMDD and pGAD, with positive likelihood ratio thresholds of >10, 5-10, 2-5 and 1-2 indicating large, moderate, small (important) and small (unimportant) changes in probability, and negative likelihood ratio thresholds of <0.1, 0.1-0.2, 0.2-0.5 and 0.5-1.0 indicating large, moderate, small (important) and small (unimportant) changes in probability [20]. Analyses were conducted using STATA v11.
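The quantities named above can be illustrated with a minimal sketch. It is not the STATA code used in the study; the function names are hypothetical and the data in the usage example are invented toy values. Because lower SF-36 scores indicate worse mental health, a score at or below the threshold is treated as a positive screen, and the AUC is computed as the probability that a randomly chosen case scores lower than a randomly chosen non-case.

```python
from typing import Dict, Sequence


def screening_metrics(
    scores: Sequence[float], has_disorder: Sequence[bool], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and likelihood ratios at one cut-off.

    Assumes at least one case and one non-case are present.
    """
    tp = fp = tn = fn = 0
    for score, disorder in zip(scores, has_disorder):
        screen_positive = score <= threshold  # low score = positive screen
        if screen_positive and disorder:
            tp += 1
        elif screen_positive:
            fp += 1
        elif disorder:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_negative": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
    }


def auc(scores: Sequence[float], has_disorder: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    cases = [s for s, d in zip(scores, has_disorder) if d]
    non_cases = [s for s, d in zip(scores, has_disorder) if not d]
    concordant = sum(
        (c < n) + 0.5 * (c == n) for c in cases for n in non_cases
    )
    return concordant / (len(cases) * len(non_cases))


# Toy data for illustration only (not study data).
toy_scores = [20, 30, 40, 50, 60, 70, 80, 90]
toy_status = [True, True, True, False, True, False, False, False]
print(screening_metrics(toy_scores, toy_status, threshold=56))
print(auc(toy_scores, toy_status))
```

Repeating `screening_metrics` over every candidate cut-off gives the table from which a threshold balancing sensitivity against specificity can be chosen.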

Results
A total of 244 individual patients attended appointments throughout the recruitment period. Of these, 119 (48.8 %) met all eligibility criteria and were invited to participate. Of the patients who did not meet eligibility criteria (N = 18): 2 had severe learning disabilities and could not provide informed consent; 6 patients were too disabled to answer independently; and 10 patients did not speak enough English and did not have a translator present. Nine eligible patients (7.6 %) were unable to be approached due to time constraints in the clinic. Ten eligible patients (8.4 %) declined to participate. A total of 107 patients attending appointments did not have clinic data collected within the previously specified eligible timeframe (same day for psychological variables, ± 3 months for disease activity and disability measures), therefore precluding them from meaningful comparison with the SF-36 data (as SF-36 data would not represent current mood state, or recent disease status). In total, 100 patients were successfully recruited, yielding a total participation rate of 84.0 % (Fig. 1).
Table 1 shows the mean scores of pertinent available mental and physical variables across all patients attending appointments during the recruitment period. In comparison to patients who were recruited, ineligible patients were significantly older (p = 0.007), and had higher levels of pain (p = 0.03), disability (p = 0.01), and DAS28 (p = 0.02). Neither depression nor anxiety levels significantly differed across participation statuses.
Table 2 reports the descriptive statistics for the study sample. A total of 27 % of patients screened positive for pMDD and 22.5 % for pGAD. The mean disease duration was 6.5 years, and overall the sample had moderate disease activity (mean DAS28 = 3.9, SD = 1.7). On a scale of 0-100, where higher scores represent better quality-of-life, the mean MH and MCS scores were 59.7 (SD = 23.4) and 41.7 (SD = 12.5) respectively.
Table 3 summarises the correlational relationships between continuous variables.

Associations between variables
There were several commonalities in associations between depression, anxiety, MH and MCS. All showed similar strength and direction of (or lack of) association with fatigue, pain, disability (HAQ), TJC, SJC, PGA, DAS28, and all SF-36 variables.
However, whereas lower age was associated with increased anxiety, MH and MCS, no association was found between age and depression. Higher depression scores were found to be associated with increased ESR; no such association was found between ESR and anxiety, MH or MCS.

Sensitivity, specificity and receiver operating characteristic (ROC) curves for probable Major Depressive Disorder (pMDD)
The results of the ROC curve are shown in Fig. 2. The overall accuracy with which the SF-36 MH and MCS scales identify patients with pMDD is 86.0 and 88.9 % respectively. A full list of cut-points identified for pMDD using the MH and MCS is provided in Additional file 1: Table S1.

SF-36 Mental Health (MH) subscale
A threshold of ≤56 provides a sensitivity of 92.6 % and a specificity of 73.2 %, correctly classifying the pMDD status of 78.6 % of RA patients. This is equivalent to a normed MH subscale score of ≤40. The positive likelihood ratio of 3.5 suggests a small but important increase in the likelihood of the presence of pMDD in the case of an overall score ≤56. The negative likelihood ratio of 0.1 indicates a moderate decrease in the likelihood of pMDD in the case of an overall score of >56.

SF-36 Mental Component Summary (MCS)
A threshold of ≤40 on the MCS provides a sensitivity of 92.3 % and a specificity of 70.2 %, correctly classifying the pMDD status of 76.3 % of RA patients.

Sensitivity, specificity and receiver operating characteristic (ROC) curves for probable Generalised Anxiety Disorder (pGAD)
The results of the ROC curve are shown in Fig. 3. A threshold of ≤52 on the MH provides a sensitivity of 81.0 % and a specificity of 71.4 %, correctly classifying the pGAD status of 73.5 % of RA patients. A threshold of ≤35 on the MCS provides a sensitivity of 85.7 % and a specificity of 81.9 %, correctly classifying 82.8 % of patients. A full list of cut-points identified for pGAD using the MH and MCS is provided in Additional file 1: Table S1.

Table 1 Mean scores for mental and physical health variables across attending patients

Sensitivity, specificity and receiver operating characteristic (ROC) curves for any mental disorder (pMDD or pGAD)
The results of the ROC curve are shown in Fig. 4. The overall accuracy with which the SF-36 MH and MCS scales identify patients with any mental disorder (pMDD or pGAD) is 89.4 % and 89.5 % respectively. A full list of cut-points identified for pMDD or pGAD using the MH and MCS is provided in Additional file 1: Table S1.

SF-36 Mental Health (MH) subscale
A threshold of ≤56 provides a sensitivity of 87.9 % and a specificity of 76.9 %, correctly classifying the pMDD or pGAD status of 80.6 % of RA patients. This is equivalent to a normed MH subscale score of ≤40. The positive likelihood ratio of 3.8 suggests a small but important increase in the likelihood of the presence of mental disorder in the case of an overall score ≤56. The negative likelihood ratio of 0.2 indicates a moderate decrease in the likelihood of any mental disorder in the case of an overall score of >56.
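The likelihood ratios quoted for the MH subscale follow directly from the reported sensitivity and specificity, as the short check below illustrates. The function names are hypothetical, and because the interpretation bands in the Statistical analyses section share their edges, the assignment of a value falling exactly on a band edge is an assumption.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def describe_positive_lr(lr: float) -> str:
    """Interpretation bands for LR+ as listed in the text (edge handling assumed)."""
    if lr > 10:
        return "large"
    if lr >= 5:
        return "moderate"
    if lr >= 2:
        return "small (important)"
    if lr >= 1:
        return "small (unimportant)"
    return "no increase"


def describe_negative_lr(lr: float) -> str:
    """Interpretation bands for LR- as listed in the text (edge handling assumed)."""
    if lr < 0.1:
        return "large"
    if lr <= 0.2:
        return "moderate"
    if lr <= 0.5:
        return "small (important)"
    if lr <= 1.0:
        return "small (unimportant)"
    return "no decrease"


# MH threshold of <=56 for pMDD: sensitivity 92.6 %, specificity 73.2 %.
pmdd_pos, pmdd_neg = likelihood_ratios(0.926, 0.732)
print(round(pmdd_pos, 1), round(pmdd_neg, 1))  # 3.5 0.1

# MH threshold of <=56 for pMDD or pGAD: sensitivity 87.9 %, specificity 76.9 %.
any_pos, any_neg = likelihood_ratios(0.879, 0.769)
print(round(any_pos, 1), round(any_neg, 1))  # 3.8 0.2
```

Both positive ratios fall in the "small (important)" band and both negative ratios in the "moderate" band, matching the interpretation given in the text.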

SF-36 Mental Component Summary (MCS)
A cut-point of 38 on the normed MCS provides a sensitivity of 87.5 % and a specificity of 80.3 %. This threshold correctly classified the pMDD or pGAD status of 82.8 % of RA patients. A positive likelihood ratio of 4.4 suggests a small but important increase in the likelihood of pMDD or pGAD being present, and the negative likelihood ratio of 0.2 suggests a moderate decrease in the likelihood of pMDD or pGAD in the case of an MCS score of >38.

Discussion
The results of this analysis suggest that the SF-36 can be used to determine the presence of pMDD, pGAD, or general psychological disorder, and potential thresholds have been suggested for these diagnoses. However, optimal use of the SF-36 for screening for mental disorder may be by utilising a threshold of ≤38 on the MCS, to identify the presence of any psychological disorder. This threshold had good sensitivity (88 %) and specificity (80 %), and correctly classified 83 % of patients with either pMDD or pGAD. This can be compared to sensitivity and specificity of 95 and 66 % respectively for the NICE-recommended questions for identifying depression in patients with chronic physical health problems [13]. A similar validation study in an elderly population identified 42 as an appropriate MCS threshold to identify significant psychological distress [12]. Our identified threshold of 38 is comparable to this. As the MCS is normed, with a score of 50 (standard deviation = 10) representing the UK population, this threshold identifies patients with a score of approximately 1 standard deviation below the population mean (scoring in the bottom 16 % of the population) as being at risk of pMDD or pGAD.
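The relationship between a normed MCS score and its position in the reference population can be worked through as below. This is a sketch under the assumption that MCS scores are normally distributed in the reference population with mean 50 and SD 10; the function names are hypothetical. Under that assumption the 16 % figure corresponds to a score exactly 1 SD below the mean (40), while a score of 38 sits 1.2 SD below the mean and corresponds to a somewhat smaller share.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def population_share_at_or_below(score: float, mean: float = 50.0, sd: float = 10.0) -> float:
    """Share of a normally distributed reference population scoring <= `score`."""
    return normal_cdf((score - mean) / sd)


# Exactly 1 SD below the mean (score of 40): about 16 % of the population.
print(round(population_share_at_or_below(40.0), 3))

# The identified threshold of 38 lies 1.2 SD below the mean: about 11.5 %.
print(round(population_share_at_or_below(38.0), 3))
```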
There are several elements to take into consideration when evaluating the research process described. The primary limitation is the lack of a "gold-standard" depression measure with which to validate the SF-36 domains.