Setting
Kaiser Permanente Northern California (KPNC) is a large integrated healthcare delivery system that serves over four million members. Since 1995, centralized electronic databases comprising pharmacy records, ambulatory visit and hospitalization diagnoses, clinical encounters, and imaging reports have been maintained, with the ability to link across administrative databases and membership records. Digital radiologic images have been centrally available starting in 2002, with such images from all KPNC imaging centers accessible by 2005.
Study population
The study population included an existing cohort of all KPNC female members aged 45–84 years old who were identified from health plan pharmacy records as having initiated oral BP therapy with alendronate, risedronate or ibandronate between January 2002 and September 2014 [6]. Women without health plan membership for the 2 years before BP initiation and those who received intravenous BP (zoledronic acid, pamidronate or ibandronate) or etidronate any time before oral BP initiation were excluded. We also excluded women with any of the following within 2 years before BP initiation: diagnosed metastatic cancer beyond lymph nodes (ICD-9 197.x-199.0), multiple myeloma (ICD-9 203.0x), Paget’s disease of the bone (ICD-9 731.0), osteogenesis imperfecta (ICD-9 756.51), hypophosphatasia (ICD-9 275.3), and primary hyperparathyroidism (ICD-9 252.01); receipt of teriparatide or denosumab; advanced kidney disease defined by outpatient estimated glomerular filtration rate (eGFR) < 30 mL/min/1.73 m2 calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [12] or by prior receipt of chronic dialysis or renal transplantation. The index date was defined as the date the initial BP prescription was dispensed.
Bisphosphonate (BP) exposure assessment
Bisphosphonate exposure was determined based on dispensing dates and days’ supply of each prescription. Stockpiling was allowed for prescriptions that overlapped 30 days or less; the second prescription took precedence when prescriptions overlapped for more than 30 days [13]. Each patient’s BP use was updated at each successive quarter (90-day interval) of follow-up after the index date. Women were categorized as “on BP” if at least half of the 90-day period was covered by a prescription (i.e., proportion of days covered (PDC) 0.50 or greater) [14]; otherwise women were considered “off BP”. Because our goal was to examine adverse outcomes associated with treatment rather than its efficacy (a measure potentially requiring a higher adherence threshold), we chose a 50% PDC adherence threshold to define exposure for this study. Classification of BP exposure regimens is described in Statistical Analyses.
Follow-up and atypical femur fracture (AFF) assessment
Women were followed from the time of initial BP prescription until they: (a) experienced a complete AFF [6], (b) died, (c) developed an exclusionary condition as defined above, (d) ended health plan membership, or (f) reached 10 years of follow-up or end of study on 9/30/2015, whichever came first.
As previously described for the source cohort [6], potential AFF cases were identified from principal hospital discharge diagnoses for fractures of the femoral subtrochanter (ICD-9 820.22) and shaft (821.0x), pathologic fracture of the femur specified as other and not neck (ICD-9 733.15) and stress fracture of the femur (ICD-9 733.97), where case review was expanded to include principal diagnoses of open fracture codes (820.23 and 821.1x) and (per) trochanteric fractures (820.20, 820.21) when associated with a secondary diagnosis of diaphyseal, pathologic, or stress fracture accompanied by evidence (radiologic report or image) of diaphyseal fracture location. After excluding periprosthetic fractures, pathologic fractures, and fractures occurring outside the femoral diaphysis, radiologic images of diaphyseal fractures were adjudicated by a board-certified orthopedic surgeon with experience in identifying atypical femur fractures (blinded to BP treatment duration) [6].
The major criteria for AFF identified in the American Society for Bone and Mineral Research (ASBMR) Task Force 2013 Revised Case Definition of AFF [1] were used to classify complete diaphyseal fractures (fracture line extending through both cortices) and included the following required radiographic and clinical features: (1) noncomminuted or minimally comminuted fracture [4]; (2) fracture originating at the lateral cortex and primarily transverse in pattern (with or without a medial spike) [4, 5]; (3) evidence of localized periosteal or endosteal thickening of the lateral cortex at the site of fracture origin [1, 4, 15]; and (4) fracture occurring with minimal to no trauma. These radiographic features were first reported more than 10 years ago [3, 16, 17] and comprise the well-established criteria for AFF used by experts, especially the finding of focal periosteal or endosteal thickening at the transverse fracture origin [4, 18, 19]. The original ASBMR criteria for AFF [20] were revised in 2013 [1] to include periosteal callous formation as a major feature of AFF, with recognition that the initially transverse fracture originating in the lateral cortex can propagate medially at an oblique angle, with or without a medial spike [1, 15, 21]. It has also been noted that these atypical fractures can present initially with focal cortical hypertrophy that progresses to partial or complete AFF [1, 15, 16, 22]. Figure 1 shows a radiographic example of complete AFF [22], demonstrating the transverse fracture that develops in the lateral cortex at the site of periosteal hypertrophy, with the fracture becoming oblique as it propagates medially.
Covariates
Electronic health record databases were used to derive covariate data for this study. Age was determined at index date and self-reported race-ethnicity was classified as non-Hispanic White, Black, Asian/Pacific Islander, Hispanic, and other or unknown. Neighborhood educational attainment and household income were estimated using 2010 US Census block data; residence in a census block with more than 25% of adults over age 25 years reporting below 12th grade education was used as a proxy for low educational attainment and residence in a census block with median household income <$35,000 was used as a proxy for low income. Patient index year (i.e. year of cohort entry) was classified as before 2008 or 2008 and later. We also ascertained the following baseline variables using the closest data available within 5 years before or after BP initiation: self-reported smoking status; body mass index (BMI) classified as normal or underweight (< 25 kg/m2), overweight (25 to < 30 kg/m2), or obese (≥30 kg/m2); and low vitamin D level (25OHD < 20 ng/mL), as well as an indicator of whether vitamin D levels were assessed. We did not have information on non-prescription cholecalciferol (Vitamin D3) supplementation, the most common approach for optimizing Vitamin D levels in patients with osteoporosis, including those with vitamin D deficiency who receive initial pharmacologic ergocalciferol therapy.
Time-dependent covariates, assessed at baseline and updated every 90 days, included: (1) the Charlson-Deyo Comorbidity Index (CCI) [23] derived from diagnostic and procedure codes associated with health plan encounters in the prior year (assessed annually) and categorized as 0, 1, 2, 3 or more; (2) diabetes mellitus, defined by having at least two clinical diagnoses and pharmacologic treatment; (3) rheumatoid arthritis, defined by at least two diagnoses; (4) grade 3A (eGFR 45–59 mL/min/1.7m2) and 3B (eGFR 30–44 mL/min/1.7m2) chronic kidney disease (CKD), using the most recent outpatient serum creatinine level [12, 24] (the mean eGFR of 74 mL/min/1.7m2 was used to impute baseline eGFR when missing, and an indicator of whether the value was imputed [25, 26] was used in analyses); (5) receipt of proton-pump inhibitors, aromatase inhibitors, estrogen, or raloxifene from pharmacy databases; (6) recent oral glucocorticoid exposure defined as receiving a cumulative prednisone dose equivalent of at least 1825 mg (average 5 mg/day) in the prior year; (7) indicators of hip, spine, humerus, wrist, or other clinical fractures; and (8) bone mineral density (BMD) T-score. Baseline fracture history was determined from hospitalization, institutional care, emergency or ambulatory visit diagnoses of clinical fracture within the 5 years prior to the index date (ICD-9 805, 807–815, 817–825, 827–829, excluding codes associated with open fractures, spinal cord injury, fractures of the skull, face, fingers or toes, high energy trauma (ICD-9 E800-E848), and fracture under age 40 years), classified by fracture site. For time-dependent fracture events during follow-up, we identified fractures of the hip, spine, humerus, wrist, and other clinical site based on qualifying diagnoses restricted to hospitalization, emergency, orthopedic and urgent care encounters for new fractures events occurring in the prior 12 months. BMD findings up to 5 years before BP initiation and during follow-up included femoral neck, total hip, and lumbar spine assessed by dual energy X-ray absorptiometry (Hologic, Inc.; Marlborough, MA). Accompanying T-scores were calculated using peak BMD derived from the manufacturer and from the NHANES III reference data for non-Hispanic white women according to expert recommendations [27] and were updated each quarter based on data from any new BMD test in that quarter (if available) or were otherwise carried forward. For each BMD study, the lowest T-score of the femoral neck, total hip and lumbar spine was used. Those without available BMD at baseline (27.6%) had their BMD T-score imputed based on the mean value for the cohort (T-score − 2.5), and an indicator of whether the BMD T-score was imputed was included in analyses. The most recent BMD (T-score) measurements were considered in analyses, categorized as above − 2.0, − 2.0 to − 2.4, − 2.5 to - 2.9, and ≤ − 3.0.
Statistical analyses
We used inverse probability weighting [10, 11] to estimate the counterfactual cumulative incidence (risk) of AFF if women followed one of two stochastic [28,29,30,31,32] BP exposure regimens: (1) Short-term treatment: discontinuation of BP therapy within the first 3 years of BP initiation (with equal probability of BP discontinuation in each quarter of years 1–3), followed by continued absence of BP exposure based on a PDC < 50%. A woman who discontinued BP within the first 3 years of follow-up would continue to contribute follow-up time to the short-term treatment arm while off BP, but her follow-up would end if, and at the time she went back on BP. (2) Longer-term treatment: continuous exposure to BP for three or more years. The longer-term treatment arm required continuous exposure for the first 3 years, after which women could discontinue (with equal probability of BP discontinuation in each quarter of years 4–10) or remain on treatment (Fig. 2). Note that a woman could contribute person-time to both the short and longer-term treatment arms while continuously on BP treatment during the first 3 years. Our approach aimed to emulate [33] inferences from an ideal clinical trial where women would have been randomly assigned to the short- or long-term treatment arms above.
Analytic datasets were created in SAS 9.3 (Cary, NC) using the MSMstructure SAS Macro [34]. Measurements on exposure, outcome (subsequent AFF), censoring and time-dependent covariates were updated every quarter between the date of BP initiation until the end of follow-up.
We used inverse probability weighting instead of standard covariate adjustment methods in order to properly adjust for potential time-varying confounding affected by prior exposures. Time-dependent confounding occurs if some factor (e.g. BMD) is affected by past BP exposure, impacts future BP exposure decisions, and additionally affects future AFF risk. Inverse probability weighting was used to fit the stochastic-intervention analog of a saturated marginal structural model (MSM) [10] to estimate the two discrete-time, counterfactual hazard functions [35, 36] (i.e. the hazards in each of the two arms of the ideal randomized experiment we aimed to emulate). These inverse probability weighted hazard estimates were then mapped into estimates of counterfactual AFF-free survival over 10 years of follow-up, as well as estimates of the adjusted risk differences for AFF comparing the longer-term (≥3 years) to the short-term (< 3 years) BP treatment arm. We compared differences in area under the two (discrete-time) counterfactual survival curves and report corresponding p-values [36].
The propensity scores that define the inverse probability weights were estimated using multivariable logistic regression with all previously described covariates included. Propensity score models were additionally adjusted for (1) current follow-up time, (2) time since the most recent BMD testing and (3) indicators of whether vitamin D levels were assessed and eGFR and BMD T-scores were imputed. Separate logistic models were fit to estimate the probability of BP initiation and continuation. The model for treatment continuation at any given quarter of follow-up also included a main term for cumulative exposure to date. Four additional logistic models were separately fit to estimate the propensity scores for each of the four censoring events (death, end of study, disenrollment, and occurrence of exclusion event). All inverse probability weights were stabilized and truncated at 50 [37, 38]. The inverse probability weights were centered around 1, with median values of 0.88 (interquartile range (IQR): 0.40–1.23) and 1.05 (IQR: 0.92–1.53) for short-term and longer-term treatment followers, respectively. Truncation of stabilized weights affected less than 0.2% of the observations. Analyses were performed using the stremr package [39, 40] in R [41]. A p-value less than 0.05 was considered significant.