Study population
The cohort was recruited between 2002 and 2005 and has been described previously in Cibere et al. [20] In brief, 255 subjects aged 40–79 with knee pain were recruited as a population-based random sample in Vancouver, Canada. Stratified sampling was used in order to achieve equal representation within age decades and gender. The pain criteria for inclusion specifically were, “pain, aching, or discomfort in or around the knee on most days of the month at any time in the past” [20], and “any pain, aching, or discomfort in or around the knee in the past 12 months” [20]. While all subjects had knee pain, not all had osteoarthritis. Based on radiograph and magnetic resonance imaging (MRI) cartilage scores, subjects were classified into three subgroups: “no OA” on both MRI and x-ray (both normal) (13%), “pre-radiographic OA” on an abnormal MRI (but normal x-ray) (49%), and “radiographic OA” on an abnormal x-ray (38%). Subjects were excluded both at baseline and/or at follow-up if any of the following applied, as described in Cibere et al. [20]: inflammatory arthritis (or fibromyalgia), knee arthroplasty, knee injury or surgery within the past 6 months, knee pain referred from hips or back, or unable to undergo MRI. Of 255 subjects seen at baseline, 122 subjects successfully completed the second follow-up. Details about reasons for dropout have been described previously in Sayre et al. [21].
Outcomes and predictor variables
Depression and anxiety were measured via the Hospital Anxiety and Depression Scale (HADS), and the binary outcome variables were based on HADS scales measuring 5+ for depression and 7+ for anxiety [22,23,24].
Predictor variables related to OA were selected a priori by the study team for clinically relevant items from the literature, or those thought to be clinically relevant, from a much larger set of variables that had been collected on these subjects. We focused on items that are easily accessible to clinical practitioners (such as those measured during a clinical exam, collected via self-reported questionnaire, or scored on a plain radiograph), and/or easily noticeable by (and potentially worrisome to) patients. We identified three major sources of such potential predictors evaluated at baseline.
The first source of potential predictors was derived from a baseline physical examination of the knee. These variables have all previously demonstrated good reliability according to an inter-rater intraclass correlation coefficient (ICC) of at least 0.8 [25]. Data included: alignment by visual inspection (normal/varus/valgus; ICC = 0.94); effusion (present/absent; ICC = 0.97); flexion (degrees; ICC = 0.85); flexion contracture (degrees; ICC = 0.95); crepitus (none/fine/coarse; ICC = 0.96); quadriceps atrophy (none/mild/severe; ICC = 0.97). Alignment was entered into the models as any malalignment vs. normal alignment. Flexion was dichotomized as > = 135 vs. < 135 degrees in the models for ease of interpretation and application by clinicians, who don’t use a goniometer to measure flexion; > = 135 was chosen as the cut point due to its representing “normal” flexion. Similarly, flexion contracture was dichotomized as > = 1 vs. < 1 degree in the models for ease of interpretation and application by clinicians; > = 1 was chosen as the cut point due to its representing the presence of a flexion contracture. Quadriceps atrophy was entered into the models as any atrophy vs. none (i.e., normal).
The second source was derived from a self-reported questionnaire at baseline. Data included: knee swelling in the past 12 months (yes/no); painful joint count using a homunculus; and a previous physician’s diagnosis of study knee OA (no/probable/definite). For knee-related questions, information on the study knee was used in the analysis. In subjects with bilateral pain, the worse (more painful) knee was used as the study knee. Subjects also completed the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index VA3.1 [26]. WOMAC scales for pain and stiffness were normalized to a 0–100 range (higher worse), and dichotomized for ease of interpretation and application by clinicians as 25+ vs. < 25 in the analysis; 25 was chosen as the cutpoint for pain based on the univariate distribution (75th percentile), and was also used for stiffness for consistency as it was numerically close to that 75th percentile.
The third source was derived from an x-ray at baseline, obtained using a weight-bearing fixed-flexion posteroanterior view with the SynaFlexer (BioClinica Inc., Newark, CA) positioning frame, and a skyline view in the supine position [27]. Radiographs were read blinded to clinical information, and read and scored by two independent readers for Kellgren-Lawrence (KL) 0–4 grading [28]. Previous studies using these data have demonstrated good interrater reliability (ICC = 0.79) [29]. Differences in readings were adjudicated by consensus readings with both readers. Baseline KL grade was collapsed into 0, 1, 2 or 3+ in the analysis. KL grades 3 and 4 were collapsed due to distribution—only 6% had KL grade 4 at baseline.
Finally, age decade, sex and body mass index (BMI) ≥25 (representing overweight) were included as potential predictors.
Statistical analysis
Baseline data were summarized using frequencies or means (+ standard deviation), stratified separately by baseline depression and anxiety. Logistic regression was used to model binary depression and anxiety at baseline, as well as at first and second follow-up. In the follow-up models for depression, those with depression at baseline were excluded. Similarly, in the follow-up models for anxiety, those with anxiety at baseline were excluded. Multivariable models were selected via lowest Akaike’s information criterion (AIC), with the additional restriction of p-values < alpha = 0.15. The p-value < 0.15 restriction was not a pre-screening, but rather imposed during selection, and was meant to prevent possible instability in the model if variables less significant than that were included. Odds ratios (ORs) along with 95% confidence intervals (CIs) were computed from each selected model. Model fit was assessed via the Hosmer and Lemeshow goodness of fit test, which is based on the agreement between observed and predicted probabilities [30]. In addition to selecting models on the basis of predictive utility via lowest AIC, the predictive utility of each selected multivariable model was also assessed via area under the receiver operating characteristic (ROC) curve (AUC), which were reported along with their 95% confidence intervals.
For consistency, the primary analysis utilized the common sample that was successfully followed to 7.5 years in all three analyses (baseline, 3-year and 7.5-year models). However, this could raise the question about a possible survival bias. To assess whether this may have played a role in any observed associations, we performed additional sensitivity analyses in which the selected multivariable models at first follow-up and baseline were re-fit using all available data for their respective time points, rather than only the smaller sample followed to 7.5 years. In addition to this, we compared baseline covariates as well as baseline and 3-year follow-up depression and anxiety in the portion of the baseline sample that was followed up to 7.5 years vs. the portion that was not, via chi-square tests for categorical variables, or t-tests for continuous variables.
To obtain population-based estimates, all analyses were performed using age decade-gender stratum sampling weights [20]. This was required as the original sample was collected in a manner to ensure adequate sample sizes across all in-scope age decade-gender groups (namely 40–79 male/female), for example older age decades were over-sampled and younger decades under-sampled. Descriptive statistics are also reported using the sampling weight, for population representativeness and to reflect the data used in the models.
All analyses were performed using SAS v9.4 (SAS Institute, Cary, North Carolina).