We searched the electronic databases PubMed and EMBASE (to May 2016) using controlled vocabulary and keyword variations of the concepts: emergency department, low back pain and prevalence (see Additional file 1 and Additional file 2). We conducted citation searches of seminal studies [9, 10, 12–15]. For studies with more than 500 citations, we searched within citations for “emergency department” using Google Scholar. We reviewed reference lists of included studies to identify other potentially relevant studies. Additionally, our literature search incorporated all relevant literature identified in a broad scoping review that mapped published research about back pain in the emergency department. For any abstracts identified, we searched for relevant subsequent publications.
We searched the grey literature guided by the ‘Grey Matters’ checklist; we searched all websites listed in the checklist under the headings of health economics [e.g. Public Health Agency of Canada] or health statistics [e.g. Canadian Institute for Health Information and the CDC National Centre for Health Statistics], excluding pharmacologically based websites (see Additional file 3). The websites that we reviewed collected data from Canada, the United States, Australia, Ireland, England and Scotland, and included five international databases (e.g. World Health Organization). We searched these websites 10 pages deep using the following search terms: “low back pain” and “prevalence” and “emergency department”. We did not restrict searches by language or date. The grey literature search was conducted in May 2016.
We included studies that investigated patients presenting to emergency settings. We defined ‘emergency setting’ as all pre-hospital, emergency, ambulatory, outpatient, accident, trauma, triage and urgent care services. Standard emergency settings provide initial treatment to patients with a broad spectrum of illnesses and injuries, some of which may be life threatening and require immediate action. For completeness, we included non-standard emergency settings, which provide care for a limited population and/or limited spectrum of illness and injuries (for example, orthopedic emergency settings). Additionally, we included studies from any year, written in any language.
We classified emergency settings by size. Emergency settings with fewer than 10,000 annual visits were categorized as ‘rural’ and those with more than 10,000 annual visits as ‘metropolitan’. We considered studies that used nationally representative samples of emergency settings separately.
We categorized emergency settings by country-level health care system funding. Studies were classified as using primarily either a public or a private funding system. If this information was not provided in the publication, we collected it from governmental websites and online encyclopedias identified using the Google search engine. We defined publicly funded healthcare systems as systems with no out-of-pocket costs associated with care in an emergency setting. We defined privately funded healthcare systems as systems that require out-of-pocket payments for most visits to emergency settings and many procedures.
We included studies that measured adults presenting with low back pain. We defined adults as individuals over the age of 14, as this is an age at which patients are likely to be diagnosed and treated as adults. If study selection criteria were mixed or unclear, we defined studies with an adequately ‘adult’ population as those with a minimum mean age of 30 years.
We included studies that used any definition of back pain. We used subgroups to explore the impact of study definitions of low back pain. We categorized studies according to whether they identified patients from presenting complaint codes or from diagnostic codes, and we collected information on the specific coding system used.
We categorized low back pain definitions as ‘broad’ or ‘narrow’. Studies were defined as ‘broad’ if they used a general definition of ‘back pain’ to define their prevalence estimate. These studies may have included some individuals with back pain in regions other than the low back (for example, thoracic spine). Studies were defined as ‘narrow’ if they used the definition of ‘low back pain’ or ‘non-specific low back pain’, or were limited to pain complaints in the lumbar region.
We included studies that presented data about the prevalence, including presentation of a prevalence rate (total number of adults presenting to an emergency setting with low back pain / total number of individuals presenting to the emergency setting over a specified period of time), or raw data to allow prevalence calculation.
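The prevalence calculation described above can be sketched in a few lines of Python. This is an illustration only: the function name `prevalence` and the example counts are hypothetical and are not taken from any included study.

```python
def prevalence(lbp_presentations: int, total_presentations: int) -> float:
    """Proportion of presentations to an emergency setting that were
    adults with low back pain, over a specified period of time."""
    if total_presentations <= 0:
        raise ValueError("total_presentations must be positive")
    if not 0 <= lbp_presentations <= total_presentations:
        raise ValueError("lbp_presentations must lie between 0 and the total")
    return lbp_presentations / total_presentations


# Hypothetical counts, for illustration only.
rate = prevalence(440, 10_000)
print(f"Prevalence: {rate:.1%}")
```

When a study reported only raw counts, the same ratio gives the prevalence estimate used for pooling.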
Study selection and data collection
Two independent reviewers screened the titles and abstracts from the electronic database searches for studies meeting our selection criteria. In the case of disagreement, resolution was achieved by discussion with a third reviewer. The primary author screened the titles from the grey literature searches, reference lists (from included studies), results of the scoping review, and citation searches. Full articles were obtained for potentially relevant studies, or where relevance was unclear; two authors independently assessed the full text to determine eligibility prior to data extraction.
Two independent reviewers performed data extraction. In the case of disagreement, resolution was achieved by including a third reviewer. We used a data extraction form (see Additional file 4) to record information about the methods and results of each included study, including study objectives, location and type of emergency setting, study period, sample size, the definition of low back pain used by the study authors to calculate prevalence, population characteristics including age and sex, and the prevalence estimate. Where several studies used the same dataset, we extracted the prevalence data from the study that was conducted over the longest period of time and rated as having the lowest risk of bias. Finally, one reviewer collected information from an independent Google search to characterize the country-level healthcare funding system of each study.
Two independent reviewers critically appraised each included study using a modified version of a tool developed by Hoy et al. to assess prevalence studies (see Additional file 5). In the case of disagreement or uncertainty, discussion with a third reviewer was used to reach consensus. The modified tool assesses each study according to nine domains: three external validity domains and six internal validity domains, plus one item assessing overall risk of bias. The external validity domains assess the target population, sampling and non-response bias, while the internal validity domains assess data collection, case definitions, assessment tools, prevalence period and an assessment of the numerator and denominator. We modified the tool by omitting an additional domain that assesses whether the study population represents the national population, which was not relevant to our review. The reviewers rated each of the nine domains as either high or low risk of bias; the overall risk of bias was rated as low, moderate or high. We judged a study to have an overall low risk of bias if it scored ‘low risk of bias’ on all domains. A study with a moderate risk of bias had one or two domains rated as high risk of bias, and a study with an overall high risk of bias had three or more domains rated as high risk of bias.
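The rule for deriving the overall risk of bias from the nine domain ratings can be expressed as a short Python sketch. The function name `overall_risk_of_bias` and the use of the strings "high" and "low" are assumptions made for illustration; the review itself recorded ratings on the appraisal form.

```python
def overall_risk_of_bias(domain_ratings: list[str]) -> str:
    """Map nine domain ratings ('high' or 'low') to an overall rating:
    0 high-risk domains -> 'low', 1-2 -> 'moderate', 3 or more -> 'high'."""
    if len(domain_ratings) != 9:
        raise ValueError("expected ratings for exactly nine domains")
    if any(r not in ("high", "low") for r in domain_ratings):
        raise ValueError("each rating must be 'high' or 'low'")
    n_high = domain_ratings.count("high")
    if n_high == 0:
        return "low"
    if n_high <= 2:
        return "moderate"
    return "high"


# Hypothetical study with two domains rated as high risk of bias.
print(overall_risk_of_bias(["high", "high"] + ["low"] * 7))
```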
Descriptive analyses were used to report study characteristics. We reported prevalence ranges, information on emergency settings, study methodology, and study populations.
We used meta-analyses to pool prevalence estimates for sufficiently homogeneous groups of studies conducted in standard emergency settings. Subgroup analyses explored the impact of study-level characteristics on prevalence estimates: back pain definition, coding system used for definitions of low back pain, health care system and emergency setting. This is an essential part of analyzing prevalence studies, as the largest contributor to heterogeneity is most likely differences in the way the studies were carried out. We were interested in further subgroup analyses (e.g., rural settings); however, we were limited by the available literature.
For all meta-analyses, we used a random-effects model to calculate mean prevalence rates and 95% confidence intervals. In this model, larger studies have narrower confidence intervals and carry more weight in the pooled estimate. We normalized the distribution of the prevalence rates by transforming the prevalence estimates reported in the publications (or calculated using reported data) using a double arcsine transformation. This transformation addresses the main issues associated with performing meta-analysis of prevalence estimates: it stabilizes the variance when pooling prevalence estimates and reduces the bias when combining prevalence estimates close to 0 or 100. The rates were back-transformed for presentation of results. We assessed statistical heterogeneity using the Q statistic and I² index. We used forest plots to graphically present prevalence estimates and 95% CIs. We tested subgroups for inter-group heterogeneity using the Q statistic.
We performed a random-effects meta-regression analysis to explore the independent association of three clinically relevant characteristics with prevalence: the coding system used for definitions of low back pain, health care system funding, and study risk of bias. Results of the analysis were used to determine the variance explained by the covariates and their contribution to the total variance in the prevalence estimates. For our analysis we used the Knapp-Hartung variance estimator and associated t-test to calculate p-values and confidence intervals. We performed sensitivity analyses by excluding studies judged to have a high risk of bias. All analyses were performed using STATA 13.1.
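The pooling steps above can be illustrated with a minimal Python sketch. It is not the STATA procedure used in the review, and it rests on several assumptions that the text does not specify: the Freeman-Tukey form of the double arcsine transformation, the DerSimonian-Laird estimator of between-study variance, a normal-approximation 95% interval, and the simple back-transformation sin²(t/2). All function names and the example counts are hypothetical.

```python
import math


def double_arcsine(cases: int, total: int) -> tuple[float, float]:
    """Freeman-Tukey double arcsine transform and its approximate variance."""
    t = (math.asin(math.sqrt(cases / (total + 1)))
         + math.asin(math.sqrt((cases + 1) / (total + 1))))
    return t, 1.0 / (total + 0.5)


def back_transform(t: float) -> float:
    """Approximate inverse, sin^2(t/2), with t clamped to [0, pi]."""
    t = min(max(t, 0.0), math.pi)
    return math.sin(t / 2.0) ** 2


def pool_prevalence(studies: list[tuple[int, int]]) -> dict[str, float]:
    """Random-effects pooling of (cases, total) pairs on the transformed scale."""
    if len(studies) < 2:
        raise ValueError("need at least two studies to pool")
    transformed = [double_arcsine(x, n) for x, n in studies]
    ys = [t for t, _ in transformed]
    vs = [v for _, v in transformed]

    # Fixed-effect (inverse-variance) mean, needed for the Q statistic.
    w = [1.0 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1

    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate and standard error.
    w_re = [1.0 / (v + tau2) for v in vs]
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "prevalence": back_transform(pooled),
        "ci_lower": back_transform(pooled - 1.96 * se),
        "ci_upper": back_transform(pooled + 1.96 * se),
        "Q": q,
        "I2": i2,
    }


# Hypothetical (cases, total) counts, for illustration only.
result = pool_prevalence([(20, 1000), (80, 1000), (50, 2000)])
print({k: round(v, 4) for k, v in result.items()})
```

Working on the transformed scale keeps the variance of each study close to 1/(n + 0.5) regardless of how near its prevalence is to 0, which is why the weights in the sketch depend only on sample size and the between-study variance.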
Assessing the quality of evidence (GRADE)
We adapted components of the GRADE framework to assess the overall quality of the available evidence on the prevalence of low back pain in the emergency setting, judged as high, moderate, low or very low quality evidence based on: study limitations (overall risk of bias of the evidence identified), imprecision (study sample sizes), indirectness (generalizability of included studies) and inconsistency (unexplained heterogeneity) [17, 24]. Additional file 6 provides additional detail about our assessment of the overall quality of the evidence.