
Study protocol: The back pain outcomes using longitudinal data (BOLD) registry



Back pain is one of the most important causes of functional limitation, disability, and utilization of health care resources for adults of all ages, but especially among older adults. Despite the high prevalence of back pain in this population, important questions remain unanswered regarding the comparative effectiveness of commonly used diagnostic tests and treatments in the elderly. The overall goal of the Back pain Outcomes using Longitudinal Data (BOLD) project is to establish a rich, sustainable registry to describe the natural history and evaluate prospectively the effectiveness, safety, and cost-effectiveness of interventions for patients 65 and older with back pain.


BOLD is enrolling 5,000 patients ≥ 65 years old who present to a primary care physician with a new episode of back pain. We are recruiting study participants from three integrated health systems (Kaiser-Permanente Northern California, Henry Ford Health System in Detroit and Harvard Vanguard Medical Associates/Harvard Pilgrim Health Care in Boston). Registry patients complete validated, standardized measures of pain, back pain-related disability, and health-related quality of life at enrollment and 3, 6 and 12 months later. We also have available for analysis the clinical and administrative data in the participating health systems’ electronic medical records. Using registry data, we will conduct an observational cohort study of early imaging compared to no early imaging among patients with new episodes of back pain. The aims are to: 1) identify predictors of early imaging; and 2) compare pain, functional outcomes, diagnostic testing and treatment utilization of patients who receive early imaging versus patients who do not receive early imaging. In terms of predictors, we will examine patient factors as well as physician factors.


By establishing the BOLD registry, we are creating a resource that contains patient-reported outcome measures as well as electronic medical record data for elderly patients with back pain. The richness of our data will allow better matching for comparative effectiveness studies than is currently possible with existing datasets. BOLD will enrich the existing knowledge base regarding back pain in the elderly to help clinicians and patients make informed, evidence-based decisions regarding their care.


Back pain is a particularly important problem for older adults. The prevalence of severe, disabling back pain increases in older adults [1, 2]. Moreover, with an aging population, the importance of back pain in the U.S. will only increase in coming decades. Despite this, there is a paucity of research on back pain in older age, and most studies to date have been small [1].

The Back pain Outcomes using Longitudinal Data (BOLD) project establishes a large, community-based registry of patients aged 65 years and older presenting with new episodes of healthcare visits for back pain. BOLD’s primary aim is to establish an infrastructure that allows the conduct of prospective, controlled studies comparing the effectiveness of diagnostic and treatment strategies for back pain in the elderly. The importance of BOLD stems from the high prevalence, clinical impact and cost of back pain, combined with a relative lack of comparative effectiveness data, especially for older adults. Back pain, an Institute of Medicine priority condition [3], is one of the most important causes of functional limitations and disability among adults in the United States. Back pain is also one of the most common reasons for physician visits [4]. The economic impact of back pain is substantial. Martin et al. estimated that in 2005, the marginal direct cost of care for people with back pain compared to those without was over $86 billion [5].

Although there are numerous guidelines regarding the diagnosis and treatment of back pain in general, these evidence-based guidelines do not focus on the elderly. Age-related differences in the causes of back pain highlight the need for specific guidelines for diagnosing and treating back pain in older adults. For example, back pain due to metastatic cancer has a higher prevalence in older adults. In a study of primary care patients with back pain, age older than 50 years was associated with a higher likelihood of having cancer (positive likelihood ratio = 2.7), although the absolute probability of having cancer remained small at 1.2% [6]. This increased risk of cancer, as well as the greater prevalence in older adults of other conditions such as spinal stenosis, vertebral compression fractures and aortic aneurysms, has led most guidelines to call for early diagnostic imaging in the elderly. However, it remains unclear how early imaging in the elderly affects clinical outcomes and costs associated with the treatment of back pain. A primary goal of the BOLD project is to enrich the existing knowledge base regarding back pain in the elderly to help clinicians and patients make informed, evidence-based decisions regarding their care.


BOLD registry


The overall goal of this project is to establish a sustainable and rich registry to evaluate prospectively the effectiveness, safety, and cost-effectiveness of diagnostic approaches and interventions for elderly patients with back pain. The registry can also be used to identify and recruit patients for additional studies. We plan to recruit 5,000 patients age 65 and older with new episodes of health care visits for back pain (defined as no prior visits to a health care provider for back pain care within 6 months). Patients who enroll in the registry complete validated, standardized measures of pain, back pain-related disability, and health-related quality of life at enrollment and 3, 6 and 12 months later. Our project includes a demonstration comparative effectiveness study of early (within six weeks of the initial medical visit) imaging versus no early imaging for elderly patients with back pain. In this observational cohort study we will test the hypothesis that early imaging is associated with more interventions and adverse labeling (where simply assigning a diagnostic label results in worse health-related quality of life), greater disability and higher levels of pain compared to matched controls who do not undergo early imaging. We will also test the hypothesis that racial and ethnic minorities will have lower rates of early imaging than non-minorities. In parallel with construction of the BOLD registry, we are also performing a double-blind, randomized controlled trial of epidural steroid with local anesthetic compared with a local anesthetic injection alone for spinal stenosis; this component of the study is described elsewhere [7].

Participating centers

BOLD is recruiting patients at three integrated health care systems: Kaiser Permanente of Northern California (KPNC), Henry Ford Health System (HFHS), and Harvard Vanguard Medical Associates/Harvard Pilgrim Health Care (HVMA/HPHC). We chose these sites for their geographic and demographic diversity. Confining our registry to the integrated health systems with comprehensive electronic medical record systems allows us to take advantage of the well-defined populations that their patients comprise as well as the wealth of data available in these systems, including health care utilization.

The University of Washington’s Comparative Effectiveness, Cost and Outcomes Research Center (CECORC) and Center for Biomedical Statistics (CBS) serve as the Data Coordinating Center (DCC) for BOLD. A collaborator at Oregon Health & Science University (RAD) is also part of the DCC.

Institutional review board (IRB) approval

The IRBs at all participating institutions (University of Washington, Harvard Vanguard, Harvard Pilgrim, Henry Ford Health System, and Kaiser-Permanente Northern California) reviewed and approved the protocols for the BOLD Registry and the Observational Cohort Study of Early Imaging.

Subject eligibility

We identify patients at their health care visits for back pain using International Classification of Diseases, Ninth Revision (ICD-9) codes [8]. We recruit subjects from both primary care clinics and urgent care/emergency care settings. Since our aim is to evaluate treatment effectiveness (how an intervention performs in the real world) rather than efficacy (how an intervention performs under ideal conditions), our inclusion criteria are as broad as possible. Our inclusion and exclusion criteria are listed in Table 1.

Table 1 Inclusion and Exclusion Criteria

Patient identification

We screen for study eligibility patients ≥ 65 years old who have had a primary care visit (including by telephone) or urgent/emergency care visit and been assigned a diagnosis code indicating back pain within the past three weeks (Table 2). In addition to patient encounters with physicians, we also include patients who have had encounters with non-physician primary care providers (registered nurses, nurse practitioners and physician assistants). We select for patients with relatively new onset episodes of back pain by excluding those with visits for back pain in the previous 6 months.
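The screening rules in this paragraph can be illustrated with a minimal sketch. It is not the sites' actual query logic: the function name, its arguments, and the 183-day approximation of "6 months" are assumptions made for illustration, and the diagnosis-code lookup (Table 2) is taken as already done.

```python
from datetime import date, timedelta

def is_screen_eligible(age, index_visit, prior_back_pain_visits, today):
    """Return True if a patient passes the automated screen sketched here.

    age: age in years at the index visit
    index_visit: date of the visit coded for back pain
    prior_back_pain_visits: dates of earlier back pain visits
    today: date on which the screen is run
    """
    # Age criterion: 65 years or older.
    if age < 65:
        return False
    # The index visit must fall within the past three weeks.
    if not (timedelta(0) <= today - index_visit <= timedelta(weeks=3)):
        return False
    # Exclude patients with a back pain visit in the 6 months before the
    # index visit (assumption: 6 months is approximated as 183 days).
    lookback_start = index_visit - timedelta(days=183)
    return not any(lookback_start <= v < index_visit
                   for v in prior_back_pain_visits)
```

For example, a 70-year-old whose only earlier back pain visit was more than a year before the index visit would pass this screen, while the same patient with a visit three months earlier would not.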

Table 2 ICD-9 Inclusion and Exclusion Codes for the Registry/Observational Cohort

Patient enrollment

The exact method of the initial subject contact varies slightly at each site. The site research staff identify and contact potential subjects by telephone, email, mail or in person, describing the study and inviting them to participate using a standardized script. In the invitation we provide a web address that has additional information about the study.

If the patient agrees, the research staff determines eligibility, verifying inclusion/exclusion criteria that were assessed during the query of the electronic health information system (Table 3). Patients provide verbal assent for their participation in the registry, which includes access to their medical records.

Table 3 Additional Exclusion Criteria Assessed at Initial Telephone Contact

We offer subjects a $10 gift card or check for each completed interview (baseline and 3, 6, and 12 months later). The total time for completing the study questionnaires at each assessment is approximately 15–30 minutes.

Data collection

Our data come primarily from two sources: subject questionnaires and electronic data records. We have attempted to minimize the questionnaire burden while still obtaining important information regarding the patient’s back pain. At baseline, trained research coordinators/interviewers administer the questionnaires either in person or over the telephone within three weeks of a subject’s index primary care visit.


We contact each registry patient at three, six and 12 months after baseline to collect data on patient treatments and outcomes. For follow-ups, the questionnaires are either self-administered by the subject using a mailed hard copy or administered by a research coordinator over the telephone. We plan to develop an on-line version of the questionnaire as an option for subjects to complete after being sent a link by email.

Follow-up questionnaires can be completed within a two-week window on either side of the follow-up time-point. We use a computerized tracking system to identify when patients enter the interview window and when interviews are complete. If patients withdraw from the study, we attempt to identify the reason.
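As a rough illustration of how a tracking system might compute these interview windows, the sketch below derives the two-week window on either side of each follow-up time-point. It assumes that time-points are counted in calendar months from the baseline interview date; the helper names are hypothetical and do not describe the study's tracking software.

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    """Shift a date by whole calendar months, clamping to the month's end."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def follow_up_windows(baseline):
    """Map each follow-up month (3, 6, 12) to its (open, close) window dates."""
    windows = {}
    for months in (3, 6, 12):
        target = add_months(baseline, months)
        # Two weeks on either side of the follow-up time-point.
        windows[months] = (target - timedelta(weeks=2),
                           target + timedelta(weeks=2))
    return windows
```

Under these assumptions, a baseline interview on 31 January 2012 gives a 3-month target of 30 April 2012 and a window running from 16 April to 14 May 2012.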

Baseline and follow-up measures

We collect demographic information and information regarding back pain duration and back pain recovery expectations at baseline. We also administer the following measures at each assessment: 1) Roland-Morris Disability Questionnaire (RMDQ) [9], modified slightly to indicate disability due to back or leg pain (sciatica); 2) 0–10 numerical rating scales (NRS) of average back and leg pain in past 7 days; 3) Brief Pain Inventory activity interference scale [10, 11]; 4) Patient Health Questionnaire (PHQ)-4 Depression and Anxiety screen [12]; 5) the EuroQol-5D (EQ-5D) [13]; and 6) Behavioral Risk Factor Surveillance System (BRFSS) survey (2 questions about falls) [14]. We repeat the same measures at each follow-up time-point except for the duration of pain and patient recovery expectation questions.

Baseline descriptive measures

Pain Duration: We ask subjects at baseline to categorize the length of the current episode of back or leg pain (sciatica) as follows: 1) less than 1 month; 2) 1–3 months; 3) 3–6 months; 4) 6–12 months; 5) 1–5 years; and 6) more than 5 years.

Patient Expectations: We ask subjects to use a 0–10 NRS to rate their confidence that their pain will be completely gone or much better in 3 months.

Primary outcome measure

Roland-Morris Disability Questionnaire: Our primary outcome measure is the Roland-Morris Disability Questionnaire (RMDQ) [9], a back pain-specific functional status questionnaire adapted from the generic Sickness Impact Profile (SIP) [15]. The original version consists of 24 yes/no items, which represent common dysfunctions in daily activities experienced by patients with back pain [9]. We use a slightly modified version of the questionnaire in which we add “or leg (sciatica)” to the words “back pain” where appropriate. A single score is derived by summing the items endorsed by the respondent, with higher scores indicating worse function. Both the original and modified RMDQ have proven to be more responsive to change over time than most subscales of the SF-36 [16] or disability day questions from national health surveys [16]. Its internal consistency is excellent [17]. Its construct validity is supported by significant associations in the expected directions with symptom severity, neurologic deficits, opioid medication use, work absenteeism, and other measures of health status (subscales of the SF-36, disability days) [18, 19]. The RMDQ was the measure most responsive to clinical changes over time in the Maine Low Back Pain Cohort study [16].
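The scoring rule described above (sum of endorsed items across 24 yes/no questions) can be sketched as follows. The function name is hypothetical, and requiring all 24 responses is an assumption; the protocol does not state how incomplete questionnaires are handled.

```python
def rmdq_score(responses):
    """Score an RMDQ form as the count of endorsed ("yes") items.

    responses: sequence of 24 booleans, True where the item was endorsed.
    Returns an integer from 0 (no disability) to 24 (worst function).
    """
    # Assumption: a complete form is required; handling of missing items
    # is not specified in the protocol.
    if len(responses) != 24:
        raise ValueError("expected responses to all 24 items")
    return sum(1 for endorsed in responses if endorsed)
```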

Additional patient-reported measures

Pain Numerical Rating Scale (NRS): We ask subjects to rate separately their average back and leg pain within the past seven days on 0–10 scales, with 0 = no pain and 10 = worst pain imaginable. Investigators commonly use NRSs of pain intensity as outcomes in clinical trials of pain therapies, and these ratings have been demonstrated to be valid, reliable, and sensitive to detecting change in pain intensity after treatment [20]. The Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) recommended a 0–10 NRS measure of pain intensity as a core outcome measure in pain clinical trials and noted that NRS measures had advantages over visual analogue scale (VAS) measures, including ability to be administered by telephone, preference by patients, and less missing and incomplete data [21]. Further, older adults may have difficulty completing VAS measures [20]. The IMMPACT group also recommended that clinical trials report the percentage of patients obtaining reductions in pain intensity from baseline of at least 30% on the NRS, and suggested that investigators may also wish to report the percentages of patients obtaining reductions in pain intensity of at least 50%. We plan to use both of these indicators of clinically meaningful change.
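To make the 30% and 50% reduction indicators concrete, here is a minimal sketch for a single patient's 0–10 NRS ratings. The function name and return structure are illustrative assumptions, as is the choice to treat a baseline rating of 0 as undefined.

```python
def pain_response(baseline, follow_up):
    """Classify change in a 0-10 NRS rating against the 30% and 50% thresholds.

    Returns None when the baseline rating is 0, because a percentage
    reduction from zero is undefined (an assumption of this sketch).
    """
    if baseline <= 0:
        return None
    drop = baseline - follow_up
    return {
        "pct_reduction": 100 * drop / baseline,
        # Integer comparisons avoid floating-point edge cases at the cutoffs.
        "reduced_30": 100 * drop >= 30 * baseline,
        "reduced_50": 100 * drop >= 50 * baseline,
    }
```

A patient who moves from 8 to 5 has a 37.5% reduction and meets the 30% threshold only; a move from 8 to 4 meets both.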

Pain Interference

The validated Brief Pain Inventory (BPI) Interference scale measures pain interference with activities [11]. The scale consists of 7 ratings (0–10) of how much back pain interferes with the following: general activity, mood, ability to walk, normal work, relations with other people, sleep and enjoyment of life.

Patient Health Questionnaire-4 Depression and Anxiety Screen

The PHQ-4 is a four-item screen for depression and anxiety that has good sensitivity and specificity for identifying depression and anxiety disorders [22].


The EQ-5D is a standardized health outcome instrument consisting of five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). In addition, the instrument includes a “feeling thermometer” to assess respondents’ current health-related quality of life (0–100). The EQ-5D has been extensively validated and studied for a wide variety of conditions and populations, including the elderly, and is used as a utility measure in cost-effectiveness analyses [23].

Behavioral Risk Factor Surveillance System (BRFSS) Falls

The BRFSS Falls screen is a two-item questionnaire that assesses the number of falls the respondent has had in the past 3 weeks and how many resulted in injury [24–29].

Additional data

In addition to the patient-reported outcome measures, we will use electronic medical record and administrative data that are available at these integrated health systems. These health systems have standardized administrative and clinical data collected across their systems (Figure 1). We will generate queries of each health system’s information system to acquire demographic, pharmacy, laboratory, vital sign, and provider data. Table 4 contains a list of variables that we plan provisionally to obtain from each site for each subject.

Figure 1. Virtual Data Warehouse data elements.

Table 4 Variables to be obtained directly from Health Information Systems

Data management

For all interviews, research coordinators enter data on specially formatted, paper data collection forms that are stored securely at each site. In addition, sites have the option of entering data directly into the on-line REDCap data system [30]. This has the added advantage of automated range and logic checks that reduce data entry errors. The study has two classes of data: 1) data containing protected health information (PHI) that is only stored locally at each site on a secure server; and 2) a limited data set with dates of service but no other PHI that is uploaded to a central database at the DCC using a web interface.

Research assistants check data from the interviews for missing or unclear responses while the subject is still available. The data coordinating center’s senior programmer directs the data management and re-checks the data for quality. We defined specific logic rules for establishing the internal consistency of responses across several variables. When necessary, we check the original data collection forms or re-contact the subject.

Analytic approach

In many registries, there may be a relatively small subgroup of interest (e.g., a treatment or diagnostic testing prevalence of 1 or 2%) and a large number of controls available for a comparative analysis of outcomes. As a general analytic approach in BOLD, we will evaluate cases in comparative studies using 3:1 matched controls with the nearest propensity score [31].

Propensity-based matching is a strategy for assembling similar groups of patients in the absence of randomization. We will use this method to select control patients who are similar to patients who have selectively received the intervention of interest (e.g., early imaging). If we do not appropriately match controls on important baseline characteristics to patients who receive the intervention, there is a risk of obtaining biased results when comparing cases with controls due to confounding [31–37]. Propensity score matching aims to provide a valid estimate of the intervention effect by comparing patients who have and have not had the intervention and who also have similar observed characteristics.

For a given research question and set of patients, we will use logistic regression models (or multi-level logistic regression models for ordinal or multilevel treatments; e.g., levels of treatment dose) to generate a propensity score for each person, using variables that are significant predictors of the intervention of interest. We will then match the propensity score of each case who received the intervention to the nearest propensity scores (up to 3) available among control patients whose propensity scores lie within a caliper window of 0.2σ of the case index, where σ is a measure of variation in the propensity score distributions of cases and controls as given in Rosenbaum and Rubin [35]. If no control propensity scores fall within the caliper window of a particular index case, then the index case will be excluded from any analyses.
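The matching step described above can be sketched in a few lines, taking the estimated propensity scores as given. Several details here are assumptions rather than the protocol's specification: matching is done on the logit of the propensity score, σ is taken as the square root of the average of the case and control variances of that logit, controls are used at most once, and cases are processed greedily in input order. The function names are hypothetical.

```python
import math
from statistics import variance

def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def caliper_match(case_ps, control_ps, max_controls=3, caliper_sd=0.2):
    """Match each case to up to max_controls nearest controls within a caliper.

    case_ps, control_ps: dicts mapping patient id -> propensity score.
    Each group needs at least two patients so a variance can be computed.
    Returns a dict of case id -> list of matched control ids; cases with
    no control inside the caliper are omitted (excluded from analysis).
    """
    case_l = {k: logit(p) for k, p in case_ps.items()}
    ctrl_l = {k: logit(p) for k, p in control_ps.items()}
    # Assumed definition of sigma: pooled SD of the logit propensity score.
    sigma = math.sqrt((variance(list(case_l.values()))
                       + variance(list(ctrl_l.values()))) / 2)
    width = caliper_sd * sigma
    available = set(ctrl_l)  # assumption: matching without replacement
    matches = {}
    for case_id, case_logit in case_l.items():
        in_window = sorted(
            (abs(ctrl_l[c] - case_logit), c)
            for c in available
            if abs(ctrl_l[c] - case_logit) <= width
        )
        chosen = [c for _, c in in_window[:max_controls]]
        if chosen:
            matches[case_id] = chosen
            available -= set(chosen)
    return matches
```

Greedy matching in input order is the simplest choice, not necessarily the best one; an optimal-matching algorithm could retain more cases when controls are scarce.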

Observational cohort of early imaging

We will conduct an observational cohort study of early imaging in seniors with new visits for back pain as our first comparative study using data from the BOLD registry. Our goal is to test the hypothesis that imaging of the lumbar spine within 6 weeks of the index visit (early imaging) is associated with worse patient outcomes and increased health care utilization and costs. Patients who get early imaging may be those with the worst pain or most alarming clinical presentation. However, given the variability in clinician ordering patterns, there is also a reasonable likelihood that those patients who do and do not get early imaging have considerable overlap.

Prior work has suggested an association between early imaging and subsequent interventions [38, 39] but lacked the statistical power to detect a significant association.

Subject eligibility

All subjects enrolled in the registry will be eligible for the observational study of early imaging. Cases selected for the observational cohort will be registry patients who had early imaging of the lumbar spine. We will identify propensity score-matched controls from the registry (see below) who did not have early imaging of their spine.

Analytic approach to the observational cohort of early imaging

Our overall aim for the observational cohort study is to compare the pain, function, and resource utilization and associated costs of patients who have early (within six weeks of index medical visit) imaging (radiographs, magnetic resonance imaging (MRI), computed tomography (CT) and bone scans) to those who do not have early imaging. The sample will consist of registry patients with new episodes of back pain. Our primary hypothesis is that patients who undergo early imaging will have worse modified RMDQ scores at one year compared with those who do not receive early imaging, after controlling for baseline back pain-related disability, pain severity and pain duration. Our rationale is that imaging may lead to adverse labeling [40] or more interventions (injections, surgery) [39], with resultant complications. We will also test the hypothesis that early-imaged subjects undergo more invasive and more resource-intensive subsequent interventions than those who do not.


We will construct a propensity score based upon the logit function of the probability of receiving early imaging (i.e., the log odds) for a patient with specific characteristics or prognostic factors [37]. We will use fixed matching of age (5-year strata), sex (male/female), and race (Caucasian/African American/other) in the generation of the propensity score and include candidate baseline covariates such as other co-morbidities or diagnoses identified at baseline, modified RMDQ score, and pain intensity rating. Patients receiving early imaging will be matched to the closest control whose propensity score differs by less than 0.2σ among those patients within five years of age.

Primary analysis

Our primary outcome measure is back-specific disability measured by the RMDQ at 12 months. We have selected the 12-month assessment as the primary outcome because this allows adequate time for any intervention benefit to manifest, and is the final assessment opportunity for the initial registry study design.

We will first assess comparability of baseline characteristics between the matched groups to gauge the effectiveness of the propensity matching and then address any residual covariate imbalances through model adjustment. Rosenbaum and Rubin suggested that an approach combining both the propensity score and covariate adjustment is superior to the use of either strategy alone [41].

Using the propensity-matched pairs, we plan to use a paired t-test to compare the between-group 12-month change in RMDQ. In conjunction with this primary analysis, we will use multivariate linear regression models adjusting for the propensity score and baseline factors that appear to have residual imbalance in order to compare groups with and without early imaging.

We will use multivariate linear regression models adjusting for the propensity score or conditional logistic models to identify predictors of patient outcome at the one-year follow-up. We will use interaction terms between the early imaging and baseline characteristics to identify variables that predict differences in the outcome associations between the two groups.

We will include subjects who have subsequent imaging more than six weeks after entry to the study in the non-early imaging group. We will compare characteristics of subjects who receive later imaging to those who do not in a sensitivity analysis.

Secondary analyses

We will conduct similar analyses for the RMDQ at three and six months as well as for the pain NRS and EQ-5D using all data through one year. We will use methods appropriate for the analysis of repeated measures such as linear mixed models or repeated measures ANCOVA [42], adjusting for the propensity score. We will assess binary secondary outcomes such as achievement of a 30% reduction in pain using conditional logistic regression models.

Using the patient‐reported data and the electronic health system information systems, we will enumerate the number and type of invasive interventions that patients undergo following enrollment. These interventions are listed in Table 5. We will use fixed effects conditional Poisson regression models to compare adjusted spinal surgery rates between those patients who did or did not receive early imaging, conditional on matched pair [43]. In addition, we will examine the time to first invasive intervention using survival analysis with a Cox proportional hazards model and adjust for the propensity score for early imaging.

Table 5 Invasive Interventions

Another hypothesis is that racial and ethnic minorities will have lower rates of early imaging than non-minorities. To test this hypothesis, we will use the registry to compare rates of early imaging between African Americans/Blacks and Whites as well as between Hispanics and Whites. We will test for differences in rates using fixed effects conditional Poisson regression models, controlling for the propensity score and residual imbalances among important covariates. We will also examine subsequent invasive interventions as well as outcomes in each of these ethnic and racial subgroups. If early imaging rates are indeed lower in racial and ethnic minorities, we would expect subsequent invasive interventions to be fewer and functional status better.

Economic analysis

The primary economic hypothesis is that patients receiving early imaging will have higher health care utilization, higher costs, and worse outcomes at one year compared to those not receiving early imaging. The primary economic outcome will be one-year incremental cost per quality-adjusted life year (QALY) gained from the private/public payer perspective [44].
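For readers less familiar with the outcome named here, the incremental cost per QALY is the difference in mean cost between two strategies divided by the difference in mean QALYs. The sketch below shows only that arithmetic; the function name and the figures in the usage note are hypothetical and are not study data.

```python
def incremental_cost_per_qaly(cost_new, qaly_new, cost_ref, qaly_ref):
    """Incremental cost-effectiveness ratio of one strategy versus a reference.

    Returns None when the two strategies yield identical QALYs, because
    the ratio is then undefined.
    """
    delta_cost = cost_new - cost_ref
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        return None
    return delta_cost / delta_qaly
```

With hypothetical one-year means of $12,000 and 0.70 QALYs for one strategy against $10,000 and 0.66 QALYs for the reference, the ratio is about $50,000 per QALY gained. If a strategy is both more costly and less effective, as hypothesized for early imaging, the ratio is negative and the strategy is usually described as dominated rather than summarized by the ratio.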

The cost-effectiveness assessment will use the health systems’ electronic medical records and administrative data as well as patient-reported outcome data. We will use the electronic data to assess within-health system categories of resource utilization (e.g., office visits, procedures, surgeries, tests, medications). We will use the Marketscan® data warehouse to obtain an estimate of 2012 private payer average unit costs for medications and medical procedures/services.

We will report short-term costs and consequences (baseline to 3 months) and assess six-month and one-year outcomes incorporating the linear mixed-model approach used in the primary outcomes analysis. Sensitivity and specific scenario analyses will be undertaken to evaluate uncertainty on cost-effectiveness parameters [45].

Sample size

Prior studies suggest that approximately 15%–30% of back pain patients will have early imaging of the lumbar spine [46]. Given a registry size of 5,000 subjects, we expect 750–1,500 patients in the BOLD registry will have early imaging and comprise cases for the observational matched cohort study.

In a matched study, missing data at follow-up in either the case or matched control imply that neither patient’s data will be included in a matched analysis. That is, if we anticipate between 10–15% loss to follow-up equally balanced between comparison groups, the number of missing data points can be as much as doubled in any matched or conditional analysis. To compensate for this, we will enrich the control sample with 3:1 matched sampling so that each case will have up to three controls followed in an identical manner. In Table 6, we see that this number of patients offers adequate power to detect minimally clinically relevant differences in functional and pain outcomes, as well as important differences in rates of surgery, complications, or adverse events. Given that one of our enrolling sites (Kaiser Permanente Northern California) is much larger than the other sites, we anticipate enrolling approximately three times as many subjects from KPNC as from each of the other two sites (3,000 vs. 1,000 subjects each).
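The arithmetic behind the expected number of cases and the "as much as doubled" remark can be checked with a short sketch. It assumes a 1:1 pair and that loss to follow-up is independent between the case and the control, which the protocol does not state; the function names are hypothetical.

```python
def expected_cases(registry_size, low_pct=15, high_pct=30):
    """Range of expected early-imaging cases for a given registry size."""
    # Integer percentages keep the arithmetic exact.
    return registry_size * low_pct // 100, registry_size * high_pct // 100

def pair_incomplete_probability(loss_rate):
    """Chance that a case-control pair is unusable at follow-up.

    A pair is lost if either member is lost; with independent losses at
    the same rate p this is 1 - (1 - p)**2, just under 2p for small p.
    """
    return 1 - (1 - loss_rate) ** 2
```

For a registry of 5,000 this gives 750 to 1,500 cases. A 10% individual loss rate makes about 19% of pairs incomplete, and a 15% rate about 28%, which is the near-doubling described above.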

Table 6 BOLD Power Estimates (registry size = 5,000 patients)*

An important advantage of a registry is the ability to detect relatively rare events due to the large sample size. We base our sample size estimates on the ability to detect and make inference on relatively rare events. In the primary care setting, examples of rare events would be subsequent surgery or adverse outcomes from interventions such as epidural steroid injections. While our first planned use of the registry is for the comparative effectiveness evaluation of early imaging vs. no-early imaging in the elderly, we envision other evaluations such as the comparative effectiveness of physical therapy vs. no physical therapy.

Data access

As the registry progresses in size and maturity, we anticipate making the BOLD resources available to researchers interested in evaluating diagnostic tests, treatments, and outcomes among elderly patients with back pain. Detailed information regarding data sharing will be available at


Back pain registries

In the U.S., many spine-related registries are device- or procedure-focused and hence recruit patients primarily from specialists. Outside the U.S., several prospective spine registries/cohorts have been established to study various aspects of back pain and while somewhat broader in scope, most still have a specialist focus [48–50]. In contrast, Costa and colleagues established an inception cohort of 973 primary care patients with acute (less than two weeks) low back pain [51], demonstrating both the feasibility and value of such an approach. Identifying patients early in the course allowed measurement of baseline factors that predicted progression and chronicity.

The Back Complaints in the Elders (BACE) consortium is an international group of investigators who have independently established prospective cohorts in a primary care setting to investigate back pain among seniors [52]. Investigators from the Netherlands, Australia and Brazil are collaborating to identify prognostic indicators leading to the transition from acute to chronic back pain in the elderly. The objectives of BOLD parallel those of BACE and similar study structures facilitate international comparisons.

Strengths of registries

A great advantage of registries is that patient enrollment is easier than in intervention trials, so large sample sizes are feasible. This increases generalizability and the ability to detect rare events, such as complications.

Limitations of registries

Roovers highlighted the limitations of registries, including lack of proper control groups, confounding, bias, poor data quality control, and potential conflicts of interest due to industry sponsorship [53]. Due to these limitations, registries will never replace randomized controlled trials (RCTs). The most important limitation of registries in general is the lack of a pre-defined control group. However, we can identify important subgroups contained within the registry to use for comparative effectiveness evaluations, such as patients with and without early imaging, and use propensity-matched controls to minimize selection bias associated with treatment or diagnostic testing.

Another limitation of registries is selection bias associated with enrollment. Physicians might be more likely to enroll uncomplicated patients who are likely to have better outcomes. We avoid this shortcoming by using the health information systems to identify potential patients and having a research coordinator contact and enroll them (without prior screening by the primary care physician). Limiting enrollment to integrated health systems somewhat limits generalizability, since the delivery of care within these systems is distinct from the fee-for-service delivery system, with unique incentives. Nevertheless, we believe that the advantages of integrated health systems (comprehensive tracking of healthcare utilization and a well-defined population) far outweigh the limitations.

Diagnostic imaging and back pain in the elderly

Patients and clinicians both tend to under-appreciate the disadvantages of diagnostic testing. The degree of potential controversy associated with this issue was recently emphasized by the release of the U.S. Preventive Services Task Force report on mammography [54]. The panel recommended against screening women in the 40–49 year old age group because of the high rate of false positives, which could lead to unnecessary further testing and invasive procedures resulting in morbidity without benefit. In addition, these false positives could lead to anxiety and poorer health-related quality of life. Spine imaging in the elderly has similar problems. The rate of incidental findings is high, reaching 90% for some findings [55]. These findings can lead to adverse labeling as well as more unnecessary interventions, with associated morbidity [40]. Most guidelines exclude patients older than 50 or 65 years from imaging constraints because of the increased prevalence of serious conditions in the elderly. However, it is in the elderly that the rate of incidental lumbar imaging findings is highest [55]. One of our goals is to examine the consequences of early imaging in the elderly by comparing elderly patients who receive early imaging to those who do not.

The BOLD registry establishes an infrastructure for studying back pain in the elderly and performing future comparative effectiveness and cost-effectiveness evaluations. Strengths include accessing patients from a community-based setting and using integrated health plans to facilitate the tracking of resource use. Since the aims and design parallel studies by an international consortium of investigators, comparison with cohorts from the Netherlands, Australia and Brazil will be possible. Even without future collaborations with international investigators, the BOLD registry is a valuable new resource for comparative effectiveness research in the United States.


  1. Docking RE, Fleming J, Brayne C, Zhao J, Macfarlane GJ, Jones GT: Epidemiology of back pain in older adults: prevalence and risk factors for back pain onset. Rheumatology (Oxford). 2011, 50 (9): 1645-1653. 10.1093/rheumatology/ker175.

  2. Dionne CE, Dunn KM, Croft PR: Does back pain prevalence really decrease with increasing age? A systematic review. Age Ageing. 2006, 35 (3): 229-234. 10.1093/ageing/afj055.

  3. Initial National Priorities for Comparative Effectiveness Research. 2009, Institute of Medicine, Washington DC

  4. Schappert SM, Rechtsteiner EA: Ambulatory medical care utilization estimates for 2006. Natl Health Stat Report. 2008, 8: 1-29.

  5. Martin BI, Deyo RA, Mirza SK, Turner JA, Comstock BA, Hollingworth W, Sullivan SD: Expenditures and health status among adults with back and neck problems. JAMA. 2008, 299 (6): 656-664. 10.1001/jama.299.6.656.

  6. Deyo RA, Diehl AK: Cancer as a cause of back pain: frequency, clinical presentation, and diagnostic strategies. J Gen Intern Med. 1988, 3: 230-238. 10.1007/BF02596337.

  7. Friedly J, Bresnahan BW, Comstock B, Turner JA, Deyo RA, Sullivan SD, Heagerty P, Bauer Z, Nedeljkovic SS, Avins AL: Study Protocol- Lumbar Epidural Steroid Injections for Spinal Stenosis (LESS): a double-blind randomized controlled trial of epidural steroid injections for lumbar spinal stenosis among older adults. BMC Musculoskelet Disord. 2012, 13 (1): 48. 10.1186/1471-2474-13-48.

  8. International Classification of Diseases, 9th Revision, 5th Edition, Clinical Modification. 1993, Practice Management Information Corp, Los Angeles, Calif

  9. Roland M, Morris R: A study of the natural history of back pain. Part 1: Development of a reliable and sensitive measure of disability in low back pain. Spine. 1983, 8: 141-144. 10.1097/00007632-198303000-00004.

  10. Cleeland CS, Nakamura Y, Mendoza TR, Edwards KR, Douglas J, Serlin RC: Dimensions of the impact of cancer pain in a four country sample: new information from multidimensional scaling. Pain. 1996, 67 (2–3): 267-273.

  11. Cleeland CS, Ryan KM: Pain assessment: global use of the Brief Pain Inventory. Ann Acad Med Singapore. 1994, 23 (2): 129-138.

  12. Kroenke K, Spitzer RL, Williams JB, Lowe B: An ultra-brief screening scale for anxiety and depression: the PHQ-4. Psychosomatics. 2009, 50 (6): 613-621.

  13. Brooks R: EuroQOL: the current state of play. Health Policy. 1996, 37 (1): 53-72. 10.1016/0168-8510(96)00822-6.

  14. Centers for Disease Control and Prevention: Self-reported falls and fall-related injuries among persons aged > or =65 years--United States, 2006. MMWR Morb Mortal Wkly Rep. 2008, 57 (9): 225-229.

  15. Bergner M, Bobbitt RA, Carter WB, Gilson BS: The Sickness Impact Profile: Development and final revision of a health status measure. Med Care. 1981, 19: 787-805. 10.1097/00005650-198108000-00001.

  16. Patrick DL, Deyo RA, Atlas SJ, Singer DE, Chapin A, Keller RB: Assessing health-related quality of life in patients with sciatica. Spine. 1995, 20 (17): 1899-1908. 10.1097/00007632-199509000-00011.

  17. Patrick DL, Deyo RA: Generic and disease-specific measures in assessing health status and quality of life. Med Care. 1989, 27 (suppl): S217-S232. 10.1097/00005650-198903001-00018.

  18. Deyo RA, Andersson G, Bombardier C, Cherkin DC, Keller RB, Lee CK, Liang MH, Lipscomb B, Shekelle P, Spratt KF: Outcome measures for studying patients with low back pain. Spine. 1994, 19 (18 Suppl): 2032S-2036S.

  19. Deyo RA, Diehr P, Patrick DL: Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials. 1991, 12 (4 Suppl): 142S-158S.

  20. Jensen M, Karoly P: Self-report scales and procedures for assessing pain in adults. Handbook of Pain Assessment Second ed. 2001, The Guilford Press, New York, 15-34.

  21. Dworkin RH, Turk DC, Farrar JT, Haythornthwaite JA, Jensen MP, Katz NP, Kerns RD, Stucki G, Allen RR, Bellamy N: Core outcome measures for chronic pain clinical trials: IMMPACT recommendations. Pain. 2005, 113 (1–2): 9-19.

  22. Kroenke K, Spitzer RL, Williams JB: The Patient Health Questionnaire-2: validity of a two-item depression screener. Med Care. 2003, 41 (11): 1284-1292. 10.1097/01.MLR.0000093487.78664.3C.

  23. Barton GR, Sach TH, Avery AJ, Jenkinson C, Doherty M, Whynes DK, Muir KR: A comparison of the performance of the EQ-5D and SF-6D for individuals aged > or = 45 years. Health Econ. 2008, 17 (7): 815-832. 10.1002/hec.1298.

  24. Buchner DM, Hornbrook MC, Kutner NG, Tinetti ME, Ory MG, Mulrow CD, Schechtman KB, Gerety MB, Fiatarone MA, Wolf SL: Development of the common data base for the FICSIT trials. J Am Geriatr Soc. 1993, 41 (3): 297-308.

  25. Ganz DA, Higashi T, Rubenstein LZ: Monitoring falls in cohort studies of community-dwelling older people: effect of the recall interval. J Am Geriatr Soc. 2005, 53 (12): 2190-2194. 10.1111/j.1532-5415.2005.00509.x.

  26. Hannan MT, Gagnon MM, Aneja J, Jones RN, Cupples LA, Lipsitz LA, Samelson EJ, Leveille SG, Kiel DP: Optimizing the tracking of falls in studies of older participants: comparison of quarterly telephone recall with monthly falls calendars in the MOBILIZE Boston Study. Am J Epidemiol. 2010, 171 (9): 1031-1036. 10.1093/aje/kwq024.

  27. Mackenzie L, Byles J, D’Este C: Validation of self-reported fall events in intervention studies. Clin Rehabil. 2006, 20 (4): 331-339. 10.1191/0269215506cr947oa.

  28. Pluijm SM, Smit JH, Tromp EA, Stel VS, Deeg DJ, Bouter LM, Lips P: A risk profile for identifying community-dwelling elderly with a high risk of recurrent falling: results of a 3-year prospective study. Osteoporos Int. 2006, 17 (3): 417-425. 10.1007/s00198-005-0002-0.

  29. Tinetti ME, Baker DI, McAvay G, Claus EB, Garrett P, Gottschalk M, Koch ML, Trainor K, Horwitz RI: A multifactorial intervention to reduce the risk of falling among elderly people living in the community. N Engl J Med. 1994, 331 (13): 821-827. 10.1056/NEJM199409293311301.

  30. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG: Research electronic data capture (REDCap)–a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009, 42 (2): 377-381. 10.1016/j.jbi.2008.08.010.

  31. Rosenbaum P, Rubin DB: The central role of the propensity score in observational studies for causal effects. Biometrika. 1983, 70: 41-55. 10.1093/biomet/70.1.41.

  32. Dehejia RH, Wahba S: Propensity score-matching methods for nonexperimental causal studies. Rev Econ Stat. 2002, 84 (1): 151-161. 10.1162/003465302317331982.

  33. Pearl J: Causality: Models, Reasoning and Inference. 2000, Cambridge University Press, Cambridge

  34. Rosenbaum PR: Design of observational studies. 2009, Springer, New York, 1

  35. Rosenbaum PR, Rubin DB: The bias due to incomplete matching. Biometrics. 1985, 41 (1): 103-116. 10.2307/2530647.

  36. Rowan KM, Welch CA, North E, Harrison DA: Drotrecogin alfa (activated): real-life use and outcomes for the UK. Crit Care. 2008, 12 (2): R58. 10.1186/cc6879.

  37. Stukel TA, Fisher ES, Wennberg DE, Alter DA, Gottlieb DJ, Vermeulen MJ: Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA. 2007, 297 (3): 278-285. 10.1001/jama.297.3.278.

  38. Jarvik JG, Hollingworth W, Martin B, Emerson SS, Gray DT, Overman S, Robinson D, Staiger T, Wessbecher F, Sullivan SD: Rapid magnetic resonance imaging vs radiographs for patients with low back pain: a randomized controlled trial. JAMA. 2003, 289 (21): 2810-2818. 10.1001/jama.289.21.2810.

  39. Webster BS, Cifuentes M: Relationship of early magnetic resonance imaging for work-related acute low back pain with disability and medical utilization outcomes. J Occup Environ Med. 2010, 52 (9): 900-907. 10.1097/JOM.0b013e3181ef7e53.

  40. Modic MT, Obuchowski NA, Ross JS, Brant-Zawadzki MN, Grooff PN, Mazanec DJ, Benzel EC: Acute low back pain and radiculopathy: MR imaging findings and their prognostic role and effect on outcome. Radiology. 2005, 237 (2): 597-604. 10.1148/radiol.2372041509.

  41. Rosenbaum PR, Rubin DB: Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. Am Stat. 1985, 39 (1): 33-38.

  42. Diggle PJ, Heagerty P, Zeger SL: Analysis of longitudinal data. 1993, Oxford University Press, Oxford

  43. Cummings P, McKnight B: Analysis of matched cohort data. Stata J. 2004, 4: 274-281.

  44. Drummond MF, O’Brien B, Stoddart GL, Torrance GW: Methods for the economic evaluation of health care programmes. 1997, Oxford University Press, Oxford, 2

  45. Briggs A, Claxton K, Sculpher M: Decision modeling for health economic evaluation. 2008, Oxford University Press, Oxford

  46. Graves JM, Fulton-Kehoe D, Martin DP, Jarvik JG, Franklin GM: Factors Associated with Early MRI Utilization for Acute Occupational Low Back Pain: A Population-Based Study from Washington State workers compensation. Spine (Phila Pa 1976). 2011, 10.1097/BRS.0b013e31823a03cc.

  47. Glick HA, Doshi JA, Sonnad SS, Polsky D: Economic evaluation in clinical trials. 2007, Oxford University Press, Oxford

  48. Melloh M, Staub L, Aghayev E, Zweig T, Barz T, Theis JC, Chavanne A, Grob D, Aebi M, Roeder C: The international spine registry SPINE TANGO: status quo and first results. Eur Spine J. 2008, 17 (9): 1201-1209. 10.1007/s00586-008-0665-2.

  49. Stromqvist B, Fritzell P, Hagg O, Jonsson B: The Swedish Spine Register: development, design and utility. Eur Spine J. 2009, 18 (Suppl 3): 294-304.

  50. Schluessmann E, Diel P, Aghayev E, Zweig T, Moulin P, Roder C: SWISSspine: a nationwide registry for health technology assessment of lumbar disc prostheses. Eur Spine J. 2009, 18 (6): 851-861. 10.1007/s00586-009-0934-8.

  51. Costa Lda C, Maher CG, McAuley JH, Hancock MJ, Herbert RD, Refshauge KM, Henschke N: Prognosis for patients with chronic low back pain: inception cohort study. BMJ. 2009, 339: b3829. 10.1136/bmj.b3829.

  52. Scheele J, Luijsterburg PA, Ferreira ML, Maher CG, Pereira L, Peul WC, van Tulder MW, Bohnen AM, Berger MY, Bierma-Zeinstra SM: Back complaints in the elders (BACE); design of cohort studies in primary care: an international consortium. BMC Musculoskelet Disord. 2010, 12: 193.

  53. Roovers JP: Registries: what level of evidence do they provide?. Int Urogynecol J Pelvic Floor Dysfunct. 2007, 18 (10): 1119-1120. 10.1007/s00192-007-0434-5.

  54. U.S. Preventive Services Task Force: Screening for breast cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med. 2009, 151 (10): 716-726. W-236

  55. Jarvik JJ, Hollingworth W, Heagerty P, Haynor DR, Deyo RA: The Longitudinal Assessment of Imaging and Disability of the Back (LAIDBack) Study: baseline data. Spine. 2001, 26 (10): 1158-1166. 10.1097/00007632-200105150-00014.


The study is supported by the Agency for Healthcare Research and Quality (AHRQ): R01 HS019222.

Author information


Corresponding author

Correspondence to Jeffrey G Jarvik.

Additional information

Competing interests

Dr. Jarvik has the following potential conflicts of interest. Although they do not relate directly to the subject of this manuscript, he lists them in the spirit of full disclosure. He serves on the Comparative Effectiveness Advisory Board for GE Healthcare. He is a co-founder and stockholder of PhysioSonics, a high intensity focused ultrasound company, and receives royalties for intellectual property. He is also a consultant for HealthHelp, a radiology benefits management company.

Authors’ contributions

JGJ, BAC, ALA, BWB, RAD, JLF, PH, LK, SSN, DRN, SDS and JAT developed the original concept of the study and the design of the BOLD Registry study. ZB and KJ participated in the design of BOLD and are coordinators. All authors have read and approved the final version of the article.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Jarvik, J.G., Comstock, B.A., Bresnahan, B.W. et al. Study protocol: The back pain outcomes using longitudinal data (BOLD) registry. BMC Musculoskelet Disord 13, 64 (2012).
