Accuracy and reproducibility of a retrospective outcome assessment for lumbar spinal stenosis surgery

Background Retrospective assessment of surgery outcome is considered problematic. The aims of this study were to evaluate the reproducibility and accuracy of a retrospective outcome assessment of lumbar spinal stenosis surgery with reference to prospective outcome scale measurements. Method The outcome of surgery in 100 lumbar spinal stenosis (LSS) patients was evaluated retrospectively by two independent researchers from the patient files of a 3-month outpatient visit performed according to a standard clinical protocol. In the retrospective analysis, outcome was graded as 2 = good if the clinical condition had clearly improved, 1 = moderate if it had only slightly improved, 0 = poor if it had not improved or was even worse than before the surgical treatment (Retrospective 3-point scale). A prospectively assessed Oswestry Disability Index questionnaire (ODI), Visual analogue pain scale (VAS) and a patient satisfaction questionnaire were used as reference standards. Reproducibility of the measurements was evaluated. Results The retrospective 3-point scale correlated with ODI (r = 0.528; P < 0.001) and VAS (r = 0.368; P < 0.001). Agreement was better for the good and poor outcomes than for the moderate outcome. The retrospective 3-point scale demonstrated substantial intra-rater and inter-rater repeatability (κ = 0.682, P < 0.001 and κ = 0.630, P < 0.001, respectively). Conclusions Retrospective assessment of spinal surgery outcome is highly reproducible. Accuracy is highest in patients with a poor or good surgical result.


Background
Lumbar spinal stenosis (LSS) is the most common indication for lumbar spinal surgery in people aged over 65 years [1]. The long-term results of surgery are poor in one third of patients [1,2], emphasizing the need for investigation of the predictive factors of surgical outcome [2,3] and patient selection for surgery [4]. Prospective studies are the preferred design for such research. In prospective studies, however, patient selection may differ from that of daily clinical routine. In addition, comparison of treatment with historical controls is not feasible. Retrospective studies can include large patient series, but the assessment of outcome in retrospective analysis is considered questionable. To the best of our knowledge, no previous study has investigated the accuracy and reproducibility of retrospective outcome measurements. Accordingly, the aims of this study were to evaluate the reproducibility and accuracy of a retrospective outcome assessment for lumbar spinal stenosis surgery with reference to prospective outcome scale measurements. As a model cohort we used a well-characterized patient cohort that had undergone surgery for lumbar spinal stenosis in a prospective study design.

Patients
The study included 102 patients with both clinically and radiologically defined lumbar spinal stenosis (LSS) who had been selected for surgical treatment. The collection of the study cohort has been described in detail previously [5,6]. Briefly, selection for surgery was made by an orthopaedist or neurosurgeon between October 2001 and October 2004 in Kuopio University Hospital, Kuopio, Finland. The inclusion criteria were: (1) presence of severe back, buttock, and/or lower extremity pain with radiographic (computed tomography, magnetic resonance imaging, myelography) evidence of compression of the cauda equina or exiting nerve roots by degenerative changes (ligamentum flavum, facet joints, osteophytes and/or disc material), and (2) the surgeon's clinical evaluation that the patient had degenerative LSS that could be treated operatively. In addition, all patients had a history of ineffective response to conservative treatment over three months. At the 3-month follow-up, two of the 102 baseline patients had missing BDI and ODI data; thus the final sample size was 100.
The exclusion criteria were: emergency or urgent spinal operation precluding recruitment and protocol investigations; cognitive impairment prohibiting completion of the questionnaires or other failures in co-operation; and the presence of metallic particles in the body preventing the MRI investigation. The surgeons sent the information on eligible patients to the Department of Physical and Rehabilitation Medicine, which organized the study. A previous spine operation and co-existing disc herniation (N = 13) were not exclusion criteria. Sixteen of the 100 study patients had previously undergone one or more lumbar spine operations. Seventeen patients had only lateral spinal stenosis.
All 100 patients had open or microscopic decompressive surgery with (N = 19) or without (N = 81) arthrodesis or with extirpation of disc herniation (N = 7). Decompressive surgery included laminotomy, hemilaminectomy or laminectomy with undercutting facetectomy. Decompression was done at 1 level in 23 patients, 2 levels in 51 patients, 3 levels in 24 patients and 4 levels in 2 patients. The most common level for decompression was L4-L5. Of the 19 cases with concomitant degenerative spondylolisthesis leading to posterolateral fusion, three spanned two levels, and the remaining 16 cases were single level.
The study was approved by the Ethics Committee of Kuopio University Hospital, and the patients provided informed consent.

Retrospective outcome scale measurement
In the retrospective analysis, surgical outcome was evaluated from the medical records by two independent researchers blinded to the prospective questionnaire data. Patient outcome was graded on a retrospective 3-point scale: 2 = good if the clinical condition had clearly improved, that is, the patient was satisfied with the surgical treatment and symptom free; 1 = moderate if the symptoms had only slightly improved and the patient was not fully satisfied with the surgical treatment; 0 = poor if the symptoms had not improved or were worse than before the surgical treatment, that is, the patient was totally dissatisfied with the surgical treatment. The judgement was based on the information entered in the medical records at the postoperative 3-month clinical check-up, when the surgeon met the patient and the patient told the surgeon how he or she was doing and how satisfied he or she was with the surgical treatment. To assess the inter-rater repeatability of the retrospective scale, the evaluation of the patient files was repeated completely for all patients (N = 100) by an independent senior neurosurgeon blinded to the previous evaluation. To assess the intra-rater repeatability, the retrospective evaluation of the patient files was repeated completely (N = 100) at least 2 months after the first evaluation by the first independent researcher, who was again blinded to the previous results and the prospective questionnaire data.

Prospective outcome scale measurements
Overall back and leg pain intensity was assessed with a self-administered Visual analogue scale (VAS) (range 0-100 mm). The VAS has been shown to be a valid index of experimental, clinical and chronic pain [7]. Subjective disability was measured with the validated Finnish version of the Oswestry Disability Index (ODI), where 0 % represents no disability and 100 % extreme debilitating disability [8,9]. Depression was assessed with the Finnish version of the 21-item Beck Depression Inventory (BDI), with scores ranging from 0 to 63 [10,11]. Patients completed the ODI, VAS and BDI questionnaires at baseline and 3 months after the operation.

Statistical analyses
Associations between the retrospective 3-point surgical outcome scale and the prospectively measured (baseline, 3-month follow-up and change) ODI, VAS and BDI were analysed using Spearman correlation coefficients. Patients with isolated lateral spinal stenosis were analysed separately to study possible differences in outcome between central and lateral spinal stenosis patients. The inter-rater and intra-rater repeatability of the retrospective scale was analysed by calculating kappa coefficients (κ). Statistical significance was set at the P < 0.05 level.
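The two statistics named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code, which the paper does not describe. It assumes an unweighted Cohen's kappa (the paper does not state whether a weighted variant was used) and Spearman's coefficient computed as the Pearson correlation of tie-averaged ranks. The function names and all ratings and ODI values are hypothetical and invented for illustration only.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of cases where both ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical 3-point grades (2 = good, 1 = moderate, 0 = poor) from two raters.
rater_1 = [2, 2, 1, 0, 2, 1, 0, 2, 1, 2]
rater_2 = [2, 2, 1, 0, 1, 1, 0, 2, 2, 2]
# Hypothetical ODI improvement (baseline minus 3-month score, percentage points).
odi_improvement = [30, 24, 10, -2, 18, 8, 0, 28, 12, 22]

print("kappa:", round(cohen_kappa(rater_1, rater_2), 3))
print("Spearman r:", round(spearman_r(rater_1, odi_improvement), 3))
```

In this invented example the raters agree on 8 of 10 cases, yet kappa is lower than the raw agreement because part of that agreement is expected by chance. This is why a kappa coefficient rather than a simple percentage is the usual measure of repeatability.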
The 3-point retrospective outcome scale correlated with the mean 3-month VAS (Spearman r = 0.368, P < 0.001) (Table 2). The Spearman correlation coefficient was higher in patients with lateral canal stenosis only (r = 0.592, P = 0.012, N = 17) than in patients with central canal stenosis (r = 0.335, P = 0.002, N = 83). The 3-point retrospective outcome scale also correlated with the mean 3-month BDI (Spearman r = 0.300, P < 0.005) (Table 2). We did not find any statistically significant difference in the baseline and follow-up ODI, VAS and BDI scores, or in their change, between the patients with pure spinal stenosis and those with concomitant instability or concomitant disc herniation (Table 3).
Both the intra-rater and the inter-rater repeatability of the retrospective 3-point surgical outcome scale were substantial (κ = 0.682, P < 0.001 and κ = 0.630, P < 0.001, respectively). Overall agreement was 83 % (N = 68) and there was only one case with total disagreement in the surgical result between the researchers.
In prospective studies, the outcome of treatment can be measured with standard questionnaires such as the Oswestry Disability Index (ODI) [8] and the Roland-Morris Disability Questionnaire (RDQ) [12], the Visual analogue pain scale (VAS) [13], the work disability time [14,15] and quality of life questionnaires such as SF-36 [16], EQ-5D [17] and 15D [18]. Comorbidity measures such as the Beck Depression Inventory (BDI) [10] and the Fear-Avoidance Belief Questionnaire (FABQ) [19] are also used.
Our results show that the outcome of surgery can also be evaluated retrospectively. Accuracy is highest in patients with a poor or good surgical result. Both the intra-rater and the inter-rater reproducibility of retrospective assessments are acceptable. The moderate outcome is the most challenging to determine, and its retrospective evaluation could be questioned (Figure 1).
This study indicates that patients who had worse baseline scores in the ODI, VAS and BDI also had a worse surgical outcome according to the retrospective 3-point scale. A larger change in ODI between baseline and the 3-month follow-up also correlated with a better outcome (Figure 3). These data could be used in clinical work to predict the likely surgical outcome.
The higher correlation of the 3-point outcome scale with the ODI than with the VAS and BDI is logical. The VAS measured overall back pain, which is, in contrast to neurogenic claudication, usually not the worst symptom relieved by surgery in LSS patients. With regard to the BDI, improvement in disability and pain are the most important aspects of good outcome [4], and depression is only a comorbid condition, although a potential predictor of outcome. Interestingly, correlations with the VAS and BDI were almost two times higher in patients with only lateral stenosis than in central stenosis patients. One explanation for this could be that severe lateral spinal stenosis causing nerve compression is the major cause of pain and disability, and these patients may have fewer other symptomatic structural changes in their spine. One limitation of this study is the relatively small number of patients with lateral spinal stenosis.

Conclusions
Retrospective assessment of spinal surgery outcome is highly reproducible. Accuracy is highest in patients with a poor or good surgical result.