Characteristics of patients with chronic back pain who benefit from acupuncture

Background: Although many clinicians believe there are clinically important subgroups of persons with "non-specific" low back pain, such subgroups have not yet been clearly identified. As part of a large trial evaluating acupuncture for chronic low back pain, we sought to identify subgroups of participants that were particularly responsive to acupuncture.
Methods: We performed a secondary analysis of data for the 638 participants in our clinical trial comparing different types of acupuncture to usual care to identify baseline characteristics that predicted responses to individualized, standardized, or simulated acupuncture treatments. After identifying factors that predicted improvements in back-related function or symptoms, we determined if these factors were more likely to predict improvement for those receiving the acupuncture treatments than for those receiving usual care. This was accomplished by testing for an interaction between the prognostic factors and treatment group in four models: functional outcomes (measured by the Roland-Morris Disability Scale) at 8 and 52 weeks post-randomization and symptom outcomes (measured with a numerical rating scale) at 8 and 52 weeks.
Results: Overall, the strongest predictors of improvement in back function and symptoms were higher baseline levels of these measures, receipt of an acupuncture treatment, and non-use of narcotic analgesics. Benefit from acupuncture compared to usual care was greater with worse pre-treatment levels of back dysfunction (interaction p < 0.004 for the functional outcome, Roland Morris Disability Scale at 8 weeks). No other consistent interactions were observed.
Conclusion: This secondary analysis found little evidence for the existence of subgroups of patients with chronic back pain that would be especially likely to benefit from acupuncture. However, persons with chronic low back pain who had more severe baseline dysfunction had the most short-term benefit from acupuncture.


Background
Low back pain is a common and costly problem that plagues the developed world [1-5]. Although there is a plethora of treatment options for low back pain, most evaluated treatments are of modest, if any, benefit [6,7]. One explanation for this observation is that there are, as yet, unidentified subgroups of persons with back pain who would be more likely to respond to some treatments than to others [8,9]. The challenge, then, would be to identify which individuals would be most likely to benefit from which treatments. In fact, most primary care clinicians believe there are distinct subgroups of patients within currently defined "non-specific" low back pain [9]. If true, determining which treatments would be most beneficial for specific types of patients could substantially improve the effectiveness and efficiency of treatment [10,11].
Although many studies have identified factors associated with improvement in back pain [12-14], only a few have looked at those which predict improvement from a specific treatment. For example, analyses by Underwood [15] suggested that greater expectations of treatment success might be more likely to lead to improved outcomes for patients receiving exercise combined with spinal manipulation than for those receiving usual care, exercise alone, or manipulation alone. Other studies found that persons with high levels of fear avoidance were more likely to benefit from an educational booklet and an exercise program [16] than were those without high levels of fear avoidance.
This study contributes to this nascent field of research by analyzing data from a large randomized trial of acupuncture for chronic low back pain [17] in order to determine if there are identifiable subsets of patients who were especially likely to benefit from acupuncture.

Methods
We performed a secondary analysis of data for the 638 participants in our clinical trial of acupuncture for chronic back pain. In that trial, we found that participants who received acupuncture or simulated acupuncture had greater improvements in functional status and symptoms at the end of the treatment and at follow-up than those receiving usual medical care [17]. Details of the primary results and trial design are presented elsewhere [17,18], but a brief summary of the trial design is provided below. The trial was conducted in two integrated health care systems (Group Health in Seattle and Kaiser Permanente in Northern California) whose institutional review boards approved the study.
We recruited 638 participants 20 to 70 years of age with non-specific low back pain that had lasted at least 3 months. Persons with prior use of acupuncture were excluded. Participants were randomized to one of four treatments: individualized acupuncture, standardized acupuncture, simulated acupuncture (non-insertive stimulation of acupuncture points), or usual medical care. Participants were informed that the study was evaluating "different methods of stimulating acupuncture points". Those randomized to acupuncture or simulated acupuncture received 10 treatments over 7 weeks. All participants also received a self-care book and retained full access to the medical care provided by their insurance benefit. Telephone interviewers, masked to treatment, administered questionnaires to participants at baseline and at 8 and 52 weeks post-randomization.

Baseline Data Collection
We collected baseline information on sociodemographic characteristics, status of current back pain, back pain history, health status, perceived likelihood of self-managing future back pain, and expectations of acupuncture's helpfulness. Back pain history included pain duration and prior use of injections, hospitalization or surgery for back pain. Participants who reported use of any of these three treatments were labeled as having received "intensive treatment". To characterize the current episode, we asked about activity limitations (i.e., the number of days spent in bed, lost from work or school, or cutting down on usual activities due to back problems during the past month) [19], back-related functional status (using the 23-item modified Roland Morris Disability Scale [Roland score], where a higher score indicates greater dysfunction) [17,20], bothersomeness of current back symptoms using a 0 to 10 numerical rating scale (where a higher score is associated with worse symptoms) [17], pain below the knee, and medication use in the past week. The SF-36 Mental Health Component Summary Score was used to measure mental health status [21]. Finally, expectations of acupuncture's helpfulness were assessed using a 0 to 10 numerical rating scale and then analyzed in three categories: top tertile of expectations (8-10), lower two tertiles of expectations (0-7), or "could not rate" for those who could not provide a rating. The characteristics of the study population are presented in Table 1.
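The two derived baseline variables described above (the three-level expectation category and the "intensive treatment" flag) can be sketched in a few lines of Python. This is only an illustration of the coding rules as stated in the text; the function names and category labels are hypothetical, and the original analysis was carried out in SAS.

```python
from typing import Optional


def expectation_category(rating: Optional[int]) -> str:
    """Map a 0-10 expectation rating to the three analysis categories.

    None stands for a participant who could not provide a rating
    (an assumed encoding, not taken from the trial data set).
    """
    if rating is None:
        return "could not rate"
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    # Ratings of 8, 9 or 10 form the top tertile; 0-7 form the lower two.
    return "top tertile (8-10)" if rating >= 8 else "lower two tertiles (0-7)"


def intensive_treatment(injection: bool, hospitalization: bool, surgery: bool) -> bool:
    """True if the participant reported any of the three treatments."""
    return injection or hospitalization or surgery
```

For example, `expectation_category(9)` falls in the top tertile, while a participant with prior injections but no hospitalization or surgery is still flagged by `intensive_treatment(True, False, False)`.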

Statistical Methods
Prior to undertaking the analysis, we identified 15 variables as potential predictors of change in outcomes. These were four demographic measures, three questions about back pain history, five questions about current back pain episode, one scale about mental health status, and one question each about likelihood for self-managing future back pain and expectations of acupuncture. Sociodemographic information included age, gender, educational level (trichotomized as at least college graduate vs. other vs. unknown) and physical demands at work (categorized as unemployed, sedentary, light/medium lifting, heavy lifting, unknown). Back pain history included length of time since first back pain (categorized as less than one year, a year or more, unknown), ever had any intensive back treatments (i.e., injection, hospitalization, or surgery), and number of days of back pain in the last 3 months. The questions characterizing the current episode of back pain included the baseline dysfunction score (Roland score), the baseline bothersomeness score, whether there were any of the 3 types of activity limitations in the last month described above (yes vs. no), medication use in the past week (narcotics, any other, none), and pain below the knee (yes vs. no). Mental health status was measured by the Mental Component Score of the SF-36, and self-efficacy and expectation of acupuncture helpfulness were categorized as described above.
Because important outcomes among persons with back pain include both functional improvement and symptom relief in the near and long-term, we constructed four separate ordinary least squares linear regression models to explore whether functional status or symptoms changed in response to acupuncture or simulated acupuncture treatment: dysfunction (Roland score) and bothersomeness score at both 8 and 52 weeks. These four initial models evaluated which of the 15 candidate variables predicted change in one of the outcomes at a specific time point. In addition to the 15 candidate variables described above, we included treatment group.
We then constructed reduced models for each of the four outcome/time points. These models included only those variables that were significant predictors of outcome at p ≤ 0.05 (for a two tailed test) in at least one of the initial models. We also chose, a priori, to include gender and expectation of acupuncture helpfulness because gender is often related to outcome and many other studies have found patient expectations for a treatment associated with better outcomes among those who received it [22-25]. The reduced models also included interaction terms between each treatment group and all other independent variables. This allowed us to test the hypothesis that specific predictor variables could predict response to different acupuncture treatments. Because of the large number of comparisons, we only report interactions where p ≤ 0.01 for a two tailed test. Furthermore, we looked for consistency of the interactions among both acupuncture groups and the simulated acupuncture group to rule out spurious results that might occur for one group, but not the other acupuncture groups.
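To make the idea of a treatment-by-predictor interaction term concrete, the following is a minimal Python sketch of an ordinary least squares fit with one baseline covariate, one binary treatment indicator, and their product. It is a simplification under stated assumptions: the trial models had three active arms plus usual care and several covariates, the data below are synthetic and noise-free (not trial data), and the helper names `solve`, `ols` and `design_row` are hypothetical.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(baseline: float, treated: int) -> List[float]:
    """Columns: intercept, baseline score, treatment, baseline x treatment."""
    return [1.0, float(baseline), float(treated), float(baseline * treated)]


# Synthetic toy data: the outcome is the change in score, generated without
# noise from assumed coefficients so that the fit can be checked exactly.
true_beta = [1.0, -0.2, -1.0, -0.15]
data = [(b, t) for b in (4, 8, 12, 16, 20) for t in (0, 1)]
X = [design_row(b, t) for b, t in data]
y = [sum(c * x for c, x in zip(true_beta, row)) for row in X]

beta = ols(X, y)
print([round(b, 6) for b in beta])
```

The last coefficient is the interaction: a nonzero value means the treatment effect on the change score depends on the baseline score, which is the kind of effect the reduced models were designed to test.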
Finally, we decided to test whether the percentage change in the dysfunction outcome at 8 weeks was constant over the range of baseline values for each of the three acupuncture or simulated acupuncture groups. We used a log transformation of our data, first adding a constant value of 1 to the dysfunction score to eliminate values equal to zero. Therefore, the model we tested was: $\log_{10}\left[(\text{Roland at 8 weeks} + 1)/(\text{Roland at baseline} + 1)\right] = \log_{10}(\text{Roland at 8 weeks} + 1) - \log_{10}(\text{Roland at baseline} + 1)$. All data were analyzed using SAS/STAT version 9.1 [26].
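The transformed outcome can be computed directly from the two Roland scores. The sketch below, with hypothetical function names, evaluates the log ratio, and converts it back to a percentage change in the shifted (score + 1) scale; it illustrates the transformation only and is not the authors' SAS code.

```python
import math


def log_ratio(roland_8wk: float, roland_baseline: float) -> float:
    """log10 of (8-week score + 1) / (baseline score + 1).

    Adding 1 to each score keeps the logarithm defined when a score is 0.
    """
    return math.log10((roland_8wk + 1) / (roland_baseline + 1))


def percent_change(log_ratio_value: float) -> float:
    """Percentage change in (score + 1) implied by a log10 ratio."""
    return (10 ** log_ratio_value - 1) * 100


# Example with made-up scores: baseline 13, 8-week score 6.
value = log_ratio(6, 13)
print(value, percent_change(value))
```

Because a constant difference in logarithms corresponds to a constant ratio, a treatment effect that is the same at every baseline value on this scale is a constant percentage change rather than a constant absolute change.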

Results

Baseline Predictors of Outcome
Treatment group and six of the 15 potential predictor variables were significant predictors of changes in outcome in at least one of the four models (Tables 2 and 3). Treatment group and baseline dysfunction score were significant in three of the four models. Persons receiving acupuncture or simulated acupuncture improved more in function than those who received usual care at both 8 and 52 weeks and in symptom reduction at 8 weeks. Higher baseline dysfunction scores were associated with worse dysfunction scores at both 8 and 52 weeks and with worse bothersomeness scores at 52 weeks. Higher baseline bothersomeness scores predicted higher bothersomeness scores at both 8 and 52 weeks. Use of narcotics was associated with worse functional and symptom outcomes at 52 weeks. No other measures were found to be significant predictors of outcome in more than one of the four models.

Interaction between Baseline Predictor Variables and Treatment Response
Compared to those receiving usual care, the 8 week dysfunction score improved more for those randomized to any of the three acupuncture or simulated acupuncture groups who had higher levels of dysfunction at baseline (overall interaction p = 0.004) (Table 4). There were consistent effects for each acupuncture group relative to usual care. A similar overall interaction between dysfunction and treatment was found for the 8-week bothersomeness score (interaction p = 0.01), but it was found in only two of the three acupuncture or simulated acupuncture groups, i.e., those receiving individualized or simulated acupuncture had greater improvement in the 8 week bothersomeness score if they had worse baseline back-related dysfunction (Table 5). There was no suggestion of an interaction in the standardized acupuncture group, however. By 52 weeks, these interactions were no longer evident for either dysfunction or bothersomeness (Tables 4 and 5). The interaction between baseline dysfunction and acupuncture treatment appears to be due to an increased absolute improvement of the treatment groups compared to usual care when the baseline dysfunction score was worse. Figure 1 depicts the adjusted mean 8 week dysfunction score for each treatment group as a function of the baseline value of the dysfunction score. This figure shows that, as the baseline dysfunction score increases, the absolute difference between the 8 week dysfunction score of the acupuncture groups and the usual care group increases, with the acupuncture groups showing greater improvement in function. This is on an additive scale where the change in dysfunction score is the difference between the 8 week and baseline measures. However, when we measured the 8 week dysfunction score on a multiplicative scale in terms of the percent reduction from baseline, we found an approximately 30% reduction for the acupuncture groups across all levels of the baseline dysfunction score.
This illustrates the importance of the scale used when determining whether there is statistical interaction between two variables. There were no other significant interactions between predictor variables and treatment group.
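A small numerical illustration may help show why the same data can display an interaction on one scale and none on the other. The sketch below assumes, purely for illustration, that treated participants end at 70% of their baseline score and usual care participants at 90%; the 10% figure for usual care and the function name `compare_scales` are hypothetical and are not results from the trial.

```python
from typing import List, Tuple


def compare_scales(
    baselines: List[float],
    treated_reduction: float = 0.30,  # proportional reduction, treated arm
    usual_reduction: float = 0.10,    # assumed value for usual care
) -> List[Tuple[float, float, float]]:
    """Return (baseline, absolute gap, ratio) for each baseline score.

    The absolute gap is usual care minus treated at follow-up (additive
    scale); the ratio is treated divided by usual care (multiplicative scale).
    """
    rows = []
    for baseline in baselines:
        treated = baseline * (1.0 - treated_reduction)
        usual = baseline * (1.0 - usual_reduction)
        rows.append((baseline, usual - treated, treated / usual))
    return rows


for baseline, gap, ratio in compare_scales([5, 10, 20]):
    print(f"baseline={baseline:>2}  absolute gap={gap:.1f}  ratio={ratio:.3f}")
```

Under these assumptions the absolute gap between arms grows in proportion to the baseline score, which appears as an interaction on the additive scale, while the ratio between arms is the same at every baseline value, so there is no interaction on the multiplicative scale.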

Discussion
This secondary data analysis found that persons with more severe pre-treatment back dysfunction demonstrated the greatest benefits from acupuncture or simulated acupuncture treatment, as measured by changes on the Roland score. Few other significant interactions emerged and none were consistent for both short and long term follow-ups. Regression to the mean is probably responsible for some of the greater improvement after 8 weeks in members of the usual care group with the worst back pain, given our finding that individuals in the usual care group improved more if their baseline dysfunction scores were worse. However, this phenomenon is unlikely to explain why the difference between usual care and acupuncture at 8 weeks increased as the baseline dysfunction scores increased. Thus, we have demonstrated interaction on an additive scale with the measurement of the Roland scores as absolute changes from baseline. This may merely reflect the greater opportunity for absolute change in those with higher baseline Roland scores. However, viewed from the perspective of relative percentage change from baseline, which is consistent with working on a multiplicative scale, the acupuncture (and simulated acupuncture) groups reduced their dysfunction approximately 30% more than did the usual care group, regardless of the baseline dysfunction score. Thus, there was no interaction on the multiplicative scale. In fact, the measurement of interaction here, as is commonly found, depends on the scale that is being used.
Our findings are generally consistent with those of the few prior studies attempting to identify subgroups of individuals who respond best to specific treatments for back pain. Typically, these studies report few strong and consistent characteristics that identify subgroups of treatment responders for a specific intervention [15,27]. However, most of these studies are not large enough to identify any but the strongest interactions. In one of the largest pragmatic trials of acupuncture for chronic back pain including over 2000 patients, Witt et al. [28] found that 3 of 9 evaluated characteristics of patients (younger age, worse baseline back dysfunction, more than 10 years of education) were effect modifiers indicating better response to acupuncture. One of the challenges in comparing results across studies is that studies typically assess a somewhat different list of possible characteristics as potential moderators of response to treatment.
* For ease of identification, all independent variables with P < 0.5 are in boldface type.
** The parameter estimates β refer to the amount of change in the outcome that is based on a one unit change in that covariate (continuous variables) or a change in category (categorical variables).
Our finding that pre-treatment expectations did not predict response to specific types of acupuncture differs from the findings of previous researchers. Kalauokalani [22] and Linde [23] both found that more optimistic expectations of treatment led to better outcomes from acupuncture. Thomas's [29] results were more complex. She found no benefit of acupuncture over usual care for persons with positive beliefs about acupuncture, but found acupuncture more effective for those who were agnostic about its benefits.
The results of these trials cannot be directly compared to our study because of differences in the way the data were collected and analyzed. However, our finding that individuals who could not provide a rating of their expectation of acupuncture's effectiveness did not have worse outcomes clearly demonstrates that our findings differ from those of Linde [23]. In that study, participants who could not rate their expectation of acupuncture's effectiveness did worse than the others, who nearly always believed that acupuncture would be "effective" or "very effective". Given the variability in the findings across these studies, further research is needed to understand the different effects of pre-treatment expectations on outcomes of acupuncture care.
Our study has a number of limitations. For one thing, our study only explored characteristics of individuals that were predictive of superior outcomes for acupuncture (or a type of acupuncture) versus usual care. Conceivably, our findings may have differed had we used a different comparison group. We did not collect data on fear avoidance, which is associated with poor prognosis in some data sets [27]. If patients with higher levels of fear avoidance were less likely to improve from acupuncture, our lack of information on this variable would be a limitation of our study.
Our study was large and had high follow-up rates. However, the sample sizes required to detect interactions must be four times larger than those required for detecting a main effect of similar magnitude [30]. Thus, we would be able to detect only large interactions.
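One common way to see where a factor of four can come from is to compare the variance of a main effect estimate with that of an interaction estimate in a balanced two-by-two design. The sketch below assumes equal cell sizes and a common outcome variance; it is offered as an illustration of the arithmetic and is not necessarily the derivation used in reference [30]. The function names are hypothetical.

```python
from fractions import Fraction


def var_main_effect(n_total: int, sigma2: int = 1) -> Fraction:
    """Variance of a difference between two arm means, n_total/2 per arm."""
    per_arm = Fraction(n_total, 2)
    return 2 * Fraction(sigma2) / per_arm


def var_interaction(n_total: int, sigma2: int = 1) -> Fraction:
    """Variance of a difference of differences across four cells of n_total/4.

    The interaction contrast combines four independent cell means, each with
    variance sigma2 / (n_total / 4).
    """
    per_cell = Fraction(n_total, 4)
    return 4 * Fraction(sigma2) / per_cell


n = 600  # arbitrary illustrative sample size
print(var_interaction(n) / var_main_effect(n))   # variance ratio at equal n
print(var_interaction(4 * n) == var_main_effect(n))
```

With the same total sample size, the interaction contrast has four times the variance of the main effect, so matching the precision of the main effect estimate requires roughly four times as many participants.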
Finally, as with all post-hoc analyses, the results must be interpreted with caution and need to be replicated in other data sets. We suspect that replication would best be undertaken in the context of a meta-analysis using individual patient level data from all included studies, as that would increase the sample size substantially [31]. The Acupuncture Trialist Collaboration, a new international collaboration among researchers to share data and conduct meta-analyses from large trials of acupuncture for pain, may be well-suited to conduct such analyses.
Researchers have employed two different approaches in their attempts to identify subgroups of persons with low back pain that would benefit from specific treatments. Some studies, including ours, have searched for subgroups using regression analyses to see what characteristics are associated with superior outcomes for specific treatments. Others have developed "clinical prediction rules" wherein patients are initially categorized into more homogeneous groups based on clinical findings and pain history. Such rules can then be tested in studies where patients are given treatments that are matched to the type of treatment that is believed better able to address their underlying problem [32]. For example, Childs [11] used this approach to validate a clinical prediction rule for spinal manipulation.
Clinical prediction rules have yet to be identified for acupuncture. In principle, various Chinese medicine findings, including Chinese medicine diagnosis, might be useful for developing such a rule. In practice, however, progress in this area has been limited because there is typically poor diagnostic concordance among traditional Chinese medicine (TCM) practitioners [33,34] and because individual patients with chronic low back pain are often given multiple TCM diagnostic labels [34,35].
Thoughtful collaboration among practitioners and researchers may ultimately lead to the development of prediction rules that match patients to the most appropriate health care provider. Such collaborations are most likely to be fruitful if they initially focus on developing comprehensive models that incorporate the physiological underpinnings of the biopsychosocial model [36].

Conclusion
This analysis found little evidence for the existence of subgroups of patients with chronic back pain that would be especially likely to benefit from acupuncture. The only statistically significant and consistent finding was that persons starting with greater back dysfunction improved the most from acupuncture or simulated acupuncture after 8 weeks, in terms of change score, although the percentage improvement from baseline was consistent across all levels of baseline dysfunction. Future studies are needed to confirm the findings of this post-hoc analysis.
Figure 1. Predicted values of the 8-week dysfunction score (Roland score) by baseline dysfunction score (Roland score) for each treatment group. The predicted values are adjusted for baseline values of: Roland score, bothersomeness score, and age (as continuous variables); gender, employment type, medication use, acupuncture expectation, self-efficacy, and group (as categorical variables); and interaction between baseline Roland score and treatment group. The adjusted means assume a mean age of 47 years, bothersomeness = 5, and equal weighting in each level of the categorical covariates.