Towards standardized measurement of adverse events in spine surgery: conceptual model and pilot evaluation

Background: Independent of efficacy, information on safety of surgical procedures is essential for informed choices. We seek to develop standardized methodology for describing the safety of spinal operations and apply these methods to study lumbar surgery. We present a conceptual model for evaluating the safety of spine surgery and describe development of tools to measure principal components of this model: (1) specifying outcome by explicit criteria for adverse event definition, mode of ascertainment, cause, severity, or preventability, and (2) quantitatively measuring predictors such as patient factors, comorbidity, severity of degenerative spine disease, and invasiveness of spine surgery.

Methods: We created operational definitions for 176 adverse occurrences and established multiple mechanisms for reporting them. We developed new methods to quantify the severity of adverse occurrences, degeneration of lumbar spine, and invasiveness of spinal procedures. Using kappa statistics and intra-class correlation coefficients, we assessed agreement for the following: four reviewers independently coding etiology, preventability, and severity for 141 adverse occurrences, two observers coding lumbar spine degenerative changes in 10 selected cases, and two researchers coding invasiveness of surgery for 50 initial cases.

Results: During the first six months of prospective surveillance, rigorous daily medical record reviews identified 92.6% of the adverse occurrences we recorded, and voluntary reports by providers identified 38.5% (surgeons reported 18.3%, inpatient rounding team reported 23.1%, and conferences discussed 6.1%). Trained observers had fair agreement in classifying etiology of 141 adverse occurrences into 18 categories (kappa = 0.35), but agreement was substantial (kappa ≥ 0.61) for 4 specific categories: technical error, failure in communication, systems failure, and no error. Preventability assessment had moderate agreement (mean weighted kappa = 0.44). Adverse occurrence severity rating had fair agreement (mean weighted kappa = 0.33) when using a scale based on the JCAHO Sentinel Event Policy, but agreement was substantial for severity ratings on a new 11-point numerical severity scale (ICC = 0.74). There was excellent inter-rater agreement for a lumbar degenerative disease severity score (ICC = 0.98) and an index of surgery invasiveness (ICC = 0.99).

Conclusion: Composite measures of disease severity and surgery invasiveness may allow development of risk-adjusted predictive models for adverse events in spine surgery. Standard measures of adverse events and risk adjustment may also facilitate post-marketing surveillance of spinal devices, effectiveness research, and quality improvement.



Background
An early warning system is needed to identify surgical devices and techniques that perform poorly when introduced into general practice [1]. Expensive technological innovations commonly gain widespread use based on limited comparative data and minimal systematic postmarketing surveillance [2]. Thus, awareness of adverse effects associated with these innovations accumulates haphazardly and disseminates slowly [3].
Adverse event assessment in spine surgery is mired in additional difficulties. In contrast to certain other procedures (such as hip and knee arthroplasty) that are fairly standardized across patients, spine surgery is much more individualized for the specific spinal pathology, combining various graft materials and fixation devices with varying degrees of vertebral decompression and fusion. Randomized trials of spine surgery typically focus on one or a few specific types of procedures, providing limited comparative data on the safety of different surgical approaches and devices. In observational studies, which in many ways are better suited for safety assessment [4,5], procedural variations might obscure the impact of a specific treatment. Also, the effects of treatment may differ across different groups of patients. This study was designed to develop measures and an analytical model to adjust for these variations when assessing safety of spine surgery.
We propose studying the safety of spine surgery for degenerative disease through a conceptual model in which safety is broadly defined as a function of preoperative patient, disease, and treatment characteristics:

Safety = f(patient characteristics, disease characteristics, treatment characteristics)

In this framework, the effect of an individual treatment factor on safety can potentially be distinguished from the effects of other relevant patient and disease characteristics (Figure 1).
Specification of therapeutic safety is central to this model. Safety may be specified as a narrowly defined particular outcome, or it may be described as a set of adverse events characterized by specific criteria for timing, setting, severity, preventability, or causal pathway. Consistent terminology and definitions for safety outcomes are essential, both for comparing treatments and for assessing improvements over time [6].
Patient characteristics relevant for predicting surgical adverse events include age [7], height and weight (body mass index) [8], smoking status [9], burden of coexisting medical conditions [10], gender, and race [11,12]. When assessing consequences of an adverse event on clinical outcomes, such as pain or function, adjustment may also be necessary for psychosocial factors such as education, work conditions, and psychological stress [13].
To measure the severity of spinal disease, new methods are needed. Neurological function may be designated simply as normal or abnormal, or quantified by a score such as the American Spinal Injury Association (ASIA) motor score [14]. Prior surgery at the involved spinal segments may be measured as yes-no or as the number of prior operations. Quantifying degenerative structural changes across multiple spinal segments is more challenging, but at minimum, the methods must account for the severity of disc space and facet joint degeneration [15], spinal stenosis [16,17], and vertebral mal-alignment such as spondylolisthesis [12], scoliosis [18], and kyphosis [19].
New methods are also needed to measure treatment (surgical procedure) factors. Differences in the "invasiveness" of surgical procedures (e.g., route of surgical access, location of nerve roots decompressed, number of vertebrae fused and instrumented) influence risks.
A multivariate analytical model provides a more detailed specification of the conceptual framework for evaluating the safety of spine surgery for degenerative disease. Multiple regression methods such as logistic regression can estimate independent effects of each variable on the likelihood of particular adverse events.
We are evaluating the feasibility and utility of this conceptual model for measuring the safety of different types of lumbar spine surgery. The initial goals of this project are: (1) to identify the frequency, nature, and severity of adverse occurrences associated with lumbar spine surgery; (2) to quantify the severity of lumbar degenerative changes; (3) to quantify the invasiveness of the surgical procedure.
Longer term goals are: (4) to measure the consequences of adverse events on pain and patient-reported health status two years after surgery; and (5) to combine these new measures of disease severity and surgical invasiveness with established medical co-morbidity measures in predictive models of adverse events.
In this report, using data from the initial six months of the study, we describe the methods and the preliminary results for the first three goals.
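The multivariate model described above can be sketched as a logistic regression. This is a minimal illustration under stated assumptions, not the authors' own specification: p denotes the probability of a particular adverse event, the vectors x_P, x_D, and x_T are hypothetical names for the patient, disease, and treatment predictors, and the beta terms are the coefficients to be estimated.

```latex
\[
\log\frac{p}{1-p}
  = \beta_0
  + \boldsymbol{\beta}_P^{\top}\mathbf{x}_P
  + \boldsymbol{\beta}_D^{\top}\mathbf{x}_D
  + \boldsymbol{\beta}_T^{\top}\mathbf{x}_T
\]
```

Under this form, each exponentiated coefficient is an odds ratio for the corresponding predictor with the other predictors held fixed, which is what allows the effect of a treatment factor to be separated from patient and disease characteristics.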

Definitions
We define an adverse occurrence as any medical event in the course of a patient's treatment that has the potential for causing harm to the patient. We selected the term "adverse occurrence" to avoid the connotation of blame often associated with the term "complication." We reserve the term "adverse event" for the subset of adverse occurrences where the patient experiences harm or requires additional monitoring or intervention [20].

Study design
This report describes research conducted to develop analytical tools for a prospective cohort study of adverse occurrences in lumbar spine surgery. The inclusion and exclusion criteria for the lumbar study are listed in Table 1. The University of Washington (UW) institutional review board approved the study. For this report, we relied on data collected during the first six months of that study.

Outcomes
The primary outcome is a discrete variable that indicates the presence of an adverse occurrence (1 = yes, 0 = no). In the future, we will measure the sensitivity of the safety assessment to different thresholds of adverse occurrence type, etiology, severity, and preventability. In addition to evaluating the association of adverse occurrences with patient, disease, and treatment factors, we will also examine their effect on hospital stay duration, re-admission, reoperation, and patient-reported health status at two years following surgery. We hypothesize that some complications that appear to resolve with treatment post-operatively (e.g., wound infection, cerebrospinal fluid leak) may have lasting effects on pain and function. We are measuring back and leg pain using numerical ratings of intensity and bothersomeness [21][22][23] and health status by the Short Form-36 [24][25][26]. We are also measuring pain medication use, work status, and patient satisfaction.

Figure 1 Framework for Safety Assessment. The relationship of patient, disease, and treatment factors to adverse outcomes.

Ascertaining adverse occurrences
We created a priori definitions and ascertainment criteria for 176 adverse occurrences. One orthopedic surgeon and two neurosurgeons specializing in spinal surgery reviewed a list of spine surgery complications [27], eliminated redundancy, and developed explicit definitions for 70 adverse occurrences. Two hospitalists with experience studying surgical complications provided operational definitions for 56 other adverse occurrences [28]. Anesthesiologists experienced in studying anesthetic adverse occurrences provided definitions for 30 peri-operative anesthetic events [29]. With input from operating room nurses, technicians, and managers, we developed criteria for 20 adverse process-of-surgical care issues (e.g., lack of appropriate equipment, implants, documentation, or diagnostic studies). The final list of adverse occurrences and their definitions are provided in the Appendix [see Additional file 1].
In addition to prospective, daily, rigorous medical record review by research staff, we established six other mechanisms for surgeons, residents, fellows, and other team members to independently and voluntarily report adverse occurrences: (1) confidential forms in the operating rooms, inpatient areas, and outpatient clinics with secured collection-boxes; (2) dedicated telephone lines at each hospital; (3) privacy-protected email; (4) weekly spine clinical conferences; (5) daily inpatient rounds; and (6) outpatient clinics [30]. Occurrences from the last three sources were recorded by a designated nurse or physician assistant. We tracked all the modes through which each occurrence was identified.

Categorizing adverse occurrences
Adverse events in spine surgery are often arbitrarily reported as "device-related," "major," or "preventable." These judgments are not always straightforward, and they profoundly influence interpretation of safety data. Comparisons are difficult unless the terms are applied consistently. We, therefore, used four reviewers to evaluate the consistency of assigning etiology, severity, and preventability to adverse occurrences.
Reviewers were selected from different backgrounds to allow broad clinical perspective. They included a spine fellowship-trained orthopedic surgeon with 7 years of experience, a spine fellowship-trained neurosurgeon with more than 5 years of experience, a neurosurgeon with more than 25 years of experience, and an anesthesiologist with more than 5 years of experience. Reviewers individually classified adverse occurrences using pre-established operational definitions [see Additional file 1] and categorization schemes (Tables 2, 3, and 4) and then discussed them as a group in three one-hour training sessions. Subsequently, the four reviewers independently coded adverse occurrences recorded during the first six months of the study.
The reviewers were provided a brief narrative describing each adverse occurrence and the patient's history, surgery, and other information available at discharge. Reviewers were asked to confirm that the reported event met the predefined ascertainment criteria and to judge the event's causes, preventability, and severity. Reviewers could select multiple etiologic factors but identified a dominant or most important factor. Reviewers coded preventability as clearly unpreventable, potentially preventable, or clearly preventable [31,32].
For severity coding, we provided the reviewers with the adverse event severity categorization scheme based on the Sentinel Event Reporting Policy required by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) (Table 3) [33]. By design, this scheme does not distinguish quality of care concerns from patient outcomes, or real effects from potential effects, requiring institutions to define "sentinel event" specifically for their own purposes with "latitude in setting more specific parameters to define 'unexpected,' 'serious,' and 'the risk thereof'" [33]. To measure the impact of adverse occurrences independent of quality of care, with separation of potential risk and actual effect, we developed an "Adverse Occurrence Severity Score" similar to the Index for Categorizing Medication Errors developed by the National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) (Table 4) [34]. For each adverse occurrence, each reviewer identified the most important factor for etiology, rated preventability, and provided both a JCAHO severity rating and an Adverse Occurrence Severity Score.
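The separation of actual effect from potential risk in the Adverse Occurrence Severity Score (Table 4) can be sketched as a small lookup. This is an illustrative encoding only: the string labels ("none", "minor", "major", "death", "life_threatening") and the function name are hypothetical, and only the score levels 1 through 10 that are described in Table 4 are represented.

```python
# Hypothetical encoding of the Table 4 categories: (actual effect, potential risk) -> score.
SEVERITY_BY_EFFECT_AND_RISK = {
    ("none", "minor"): 1,
    ("none", "major"): 2,
    ("none", "death"): 3,
    ("minor", "minor"): 4,
    ("minor", "major"): 5,
    ("minor", "death"): 6,
    ("major", "major"): 7,
    ("major", "death"): 8,
}


def adverse_occurrence_severity_score(effect, risk=None):
    """Return the severity score for an actual effect and a potential risk.

    Life-threatening effects and deaths are scored on the effect alone,
    so the risk argument is ignored for them.
    """
    if effect == "death":
        return 10
    if effect == "life_threatening":
        return 9
    try:
        return SEVERITY_BY_EFFECT_AND_RISK[(effect, risk)]
    except KeyError:
        raise ValueError(f"no score defined for effect={effect!r}, risk={risk!r}") from None


# Example: a minor intervention was needed, but major loss of function was possible.
print(adverse_occurrence_severity_score("minor", "major"))
```

Keeping effect and risk as separate inputs mirrors the stated design goal of the score: two reviewers can disagree about potential risk while still agreeing on what actually happened to the patient.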

Measuring medical comorbidity
Risk evaluation is crucial to predicting surgical outcomes, but the specific methods most appropriate for spine surgery are unclear. We therefore collected medical comorbidity information using multiple methods. Patients completed a medical history questionnaire to allow calculation of a Charlson comorbidity score [35][36][37]. We also reviewed medical records to identify presence of 32 medical conditions [38]. We additionally recorded the American Society of Anesthesiologists (ASA) grade for anesthetic risk [39] and each patient's height, weight, and tobacco, alcohol, and drug use.

Table 3 Severity categories based on the JCAHO Sentinel Event Policy
1 Did not and unlikely to have had an adverse effect.
2 Did not but had the potential to have had an adverse effect.
3 Had an adverse effect but not life threatening.
4 Resulted in loss of major physical function or potentially life threatening.
5 Demonstrated a life threatening situation or resulted in death.

Measuring disease severity
Lumbar degeneration (spondylosis) is a broad category with varying degrees of severity, and surgical procedures to treat it are individualized to address various aspects of this condition. Technical difficulty of the surgical procedure, and the associated risk of adverse occurrences, may be affected by the anatomical changes, such as the severity of spinal stenosis or the presence and severity of concurrent spondylolisthesis and scoliosis. Also, because patients with more severe and complex spinal disease may seek out particular providers and hospitals, it is important to control for disease severity when comparing adverse occurrences in different settings. We desired a measure of severity of lumbar degeneration to use in predicting the probability of an adverse occurrence.
To test the reliability of this disease severity scoring method, two observers scored 10 imaging studies of patients showing a broad range of degenerative lumbar spine changes. Image panels showed lumbar spine anterior-posterior and lateral radiographs, lateral flexion and extension views, and sagittal views on MR images. To show the neural tissue space, the panels included an axial image of the spinal canal, sagittal view of the right foramen, and sagittal view of the left foramen for each lumbar level. Each observer rated the 10 cases at two times, approximately 3 weeks apart, identifying a score for each case on all 9 imaging dimensions.

Measuring surgery invasiveness
Surgical complexity influences risk of adverse occurrences. When comparing different surgeons, hospitals, or devices, the extent and nature of the spinal surgery may be a confounding factor. To control for variations in spinal procedures, we developed a quantitative index to rate the invasiveness of surgery.
We based the index on three fundamental elements of spinal procedures: decompression, fusion, and instrumentation of individual vertebrae. Combinations of these three elements on different vertebrae, when combined with surgical approach (anterior or posterior), can be useful in describing many spinal operations. Each operated vertebra can be assigned a score of 0 to 6, based on how many of six procedural elements were performed at that level: anterior decompression, anterior fusion, anterior instrumentation, posterior decompression, posterior fusion, and posterior instrumentation.

Table 4 Adverse Occurrence Severity Score
1 No effect, minor risk: Adverse occurrence required no intervention, resulted in no adverse consequences, but had the potential to result in minor consequences.
2 No effect, major risk: Adverse occurrence required no intervention, resulted in no adverse consequences, but had the potential to result in major but not life threatening adverse consequences.
3 No effect, risk of death: Adverse occurrence required no intervention, but had the potential to result in a life-threatening situation or death.
4 Minor effect, minor risk: Adverse occurrence required a minor intervention or resulted in minor loss of function, and had the potential to result in only minor adverse consequences.
5 Minor effect, major risk: Adverse occurrence required a minor intervention or resulted in minor loss of function, but had the potential to result in major loss of function, though not life-threatening.
6 Minor effect, risk of death: Adverse occurrence required a minor intervention or resulted in minor loss of function, but had the potential to result in a life-threatening situation or death.
7 Major effect, major risk: Adverse occurrence required extensive intervention such as unexpected re-operation or re-admission, or resulted in major loss of function, but was not life-threatening.
8 Major effect, risk of death: Adverse occurrence required extensive intervention such as unexpected re-operation or re-admission, or resulted in major loss of function, and had the potential to result in a life-threatening situation or death.
9 Life-threatening effect: Adverse occurrence resulted in a life-threatening situation.
10 Death: Adverse occurrence resulted in death.
We scored the six constituent procedure components using the following definitions:
(1) Anterior decompression: 1 unit for each vertebra requiring partial or complete excision of the vertebral body or the disc caudal to that vertebra.
(2) Anterior fusion: 1 unit for each vertebra that has graft material attached to or replacing that vertebral body.
(3) Anterior instrumentation: 1 unit for each vertebral body that has screws, plate, cage, or structural graft attached to its vertebral body or replacing its vertebral body.
(4) Posterior decompression: 1 unit for each vertebra requiring laminectomy or foraminotomy at the foramen caudal to its pedicle and/or discectomy at the disc caudal to that vertebral body.
(5) Posterior fusion: 1 unit for each vertebra that has graft material on its lamina, facets, or transverse processes.
(6) Posterior instrumentation: 1 unit for each vertebra that has screws, hooks, or wires attached to its pedicles, facets, lamina, or transverse processes.
Each of the six procedure elements can thus be assigned an integer value corresponding to the number of vertebrae on which that procedural component was performed. We also defined a composite "Spine Surgery Invasiveness Index" as the sum of the six procedural element scores for a given surgery. We developed a graphical grid for coding each surgery (Figure 2).
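The scoring rules above can be sketched in a few lines of code. This is a minimal illustration, assuming the surgical grid is represented as a mapping from vertebra label to the set of procedural elements performed at that level; the element identifiers, function names, and the example operation are hypothetical and are not taken from the study's forms.

```python
ELEMENTS = (
    "anterior_decompression",
    "anterior_fusion",
    "anterior_instrumentation",
    "posterior_decompression",
    "posterior_fusion",
    "posterior_instrumentation",
)


def element_scores(grid):
    """Count, for each procedural element, the vertebrae on which it was performed.

    grid maps a vertebra label (e.g. "L4") to the set of elements done at that level.
    """
    scores = {name: 0 for name in ELEMENTS}
    for level, performed in grid.items():
        unknown = set(performed) - set(ELEMENTS)
        if unknown:
            raise ValueError(f"unknown element(s) at {level}: {sorted(unknown)}")
        for name in set(performed):
            scores[name] += 1
    return scores


def vertebra_score(grid, level):
    """Per-vertebra score (0 to 6): number of elements performed at that level."""
    return len(set(grid.get(level, ())))


def invasiveness_index(grid):
    """Composite index: sum of the six procedural element scores."""
    return sum(element_scores(grid).values())


# Hypothetical example: posterior decompression at L4 with instrumented fusion L4-L5.
example = {
    "L4": {"posterior_decompression", "posterior_fusion", "posterior_instrumentation"},
    "L5": {"posterior_fusion", "posterior_instrumentation"},
}
print(element_scores(example))
print(invasiveness_index(example))
```

Because each element contributes one unit per vertebra, the composite index equals the sum of the per-vertebra scores as well as the sum of the six element scores.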
A surgeon-investigator or a trained research assistant completed the surgical procedure grid based on the treating surgeon's operative report. To determine if this grid method could be reliably used in routine clinical documentation, we made available a medical record form to allow surgeons to record the spinal procedure using the grid format in their immediate hand-written brief operative note. Using the treating surgeon's dictated operation report as the reference, we assessed the reliability of invasiveness coding by comparing the surgeons with the two researchers for fifty consecutive cases.

Data analysis
We used the kappa statistic to assess agreement between reviewers, using weighted kappa for ranked scales (preventability and JCAHO severity scores) [53,54]. We report kappa values for each pair of observers. Calculations were made using STATA version 8 (College Station, Texas). For evaluating etiology code agreement across four reviewers, we calculated the kappa statistic using the "kap" command in STATA, where each observation is assumed to be a subject, the number of raters is fixed (4 raters), and more than two outcomes are possible (18 etiology codes). We set a goal of >0.60 as the desirable kappa value for designating agreement as "substantial" or better according to the following published scale [55]: below 0, poor; 0 to 0.20, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; 0.81 to 1.00, almost perfect.
We assessed agreement on continuous measures (Adverse Occurrence Severity Score, Degenerative Disease Severity Score, and Spine Surgery Invasiveness Index) using intraclass correlation methods implemented in a SAS procedure (SAS Institute, Cary, NC) [56]. We selected the intra-class correlation coefficient (ICC) appropriate for a random sample of reviewers, selected from a larger population, where each reviewer rates each target. We set the significance level (alpha) at 0.05 to calculate 95% confidence intervals (CI).
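The pairwise kappa calculation can be illustrated with a short sketch. It is not the STATA procedure used in the study: it computes Cohen's kappa for one pair of raters, and the linear weighting for ranked scales is an assumption, since the text does not state which weights were applied. The labelling function uses the commonly cited Landis and Koch cut-points, which match the terms "fair", "moderate", and "substantial" as used in this report; the example ratings are hypothetical.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Cohen's kappa for two raters; linear agreement weights if weighted=True.

    categories lists every possible code, in rank order for ordinal scales.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {code: i for i, code in enumerate(categories)}
    n = len(ratings_a)

    # obs[i][j] = proportion of items coded i by rater A and j by rater B.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        obs[index[a]][index[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1.0 - abs(i - j) / (k - 1)  # partial credit for near misses
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(weight(i, j) * obs[i][j] for i, j in cells)
    p_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def agreement_label(kappa):
    """Verbal label for a kappa value (Landis and Koch cut-points, assumed)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical preventability ratings from two reviewers on four occurrences.
scale = ["unpreventable", "potential", "preventable"]
reviewer_1 = ["unpreventable", "potential", "preventable", "preventable"]
reviewer_2 = ["unpreventable", "preventable", "preventable", "potential"]
kappa = cohen_kappa(reviewer_1, reviewer_2, scale, weighted=True)
print(round(kappa, 3), agreement_label(kappa))
```

With unweighted kappa, the two disagreements in the example would count as complete misses; linear weights give half credit because the codes differ by one rank on a three-level scale.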

Sample
Between January 1, 2003 and July 1, 2003, 350 patients had lumbar surgical procedures performed at the two participating institutions. Among these, 210 consented for enrollment in the study and 11 declined participation. Patients were offered enrollment only in clinics staffed by a research coordinator, and because of limited resources, only the busiest spine clinics were staffed by research coordinators. Target enrollment for the lumbar spine surgery study is 1000 patients.

Ascertaining adverse occurrences
During the initial six months of this study, we recorded 172 adverse occurrences for patients undergoing lumbar surgery for degenerative disease. Rigorous daily medical record review identified 92.6% of the total number of adverse occurrences and voluntary reports identified 38.5%; 31.1% of adverse occurrences were identified by both voluntary reports and medical records. Surgeons reported 18.3% of the total number of adverse occurrences ascertained; the inpatient team reported 23.1%, and 6.1% of the total number of adverse occurrences were reviewed or discussed in clinical care conferences, such as morbidity and mortality conferences. Most adverse occurrences were identified only in medical records, such as progress notes, laboratory reports, imaging reports, operation reports, and discharge summaries (61.5%). Surgeons were the sole source for 3.2% and inpatient team members (nurse practitioners, residents, and fellows) were the only source for 4.2%.
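The percentages reported above can be cross-checked with simple inclusion-exclusion arithmetic. The sketch below is only a consistency check on the published figures; the function name and dictionary keys are hypothetical, and values are held in tenths of a percent so that the arithmetic stays exact.

```python
def overlap_summary(source_a, source_b, both):
    """Inclusion-exclusion for two overlapping ascertainment sources.

    All values are in tenths of a percent (926 means 92.6%).
    """
    if both > min(source_a, source_b):
        raise ValueError("overlap cannot exceed either source")
    return {
        "either": source_a + source_b - both,
        "a_only": source_a - both,
        "b_only": source_b - both,
    }


# a = medical record review (92.6%), b = voluntary reports (38.5%), overlap 31.1%.
summary = overlap_summary(926, 385, 311)
print(summary)
```

Under these figures the two sources together account for all recorded occurrences, the record-only share matches the 61.5% reported, and the voluntary-only share (7.4%) equals the sum of the sole-source figures given for surgeons (3.2%) and inpatient team members (4.2%).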

Categorizing adverse occurrences
After classifying some adverse occurrences during the initial training sessions, the four reviewers independently coded the remaining 141 occurrences in 53 patients (Tables 6 and 7). Agreement was substantial for four of the 18 categories of error examined: technical error, failure in communication, systems failure, and no error (Table 8). Agreement across all four reviewers was fair when combined across all 18 error categories, and moderate (using weighted kappa) for preventability and JCAHO severity (Table 9). Numerical severity ratings using the Adverse Occurrence Severity Score showed substantial inter-rater agreement (ICC = 0.74, 95% CI = 0.68-0.79).
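The intraclass correlation reported for the numerical severity ratings can be illustrated with a short sketch. The study used a SAS procedure; the function below is an independent illustration that assumes the two-way random effects, single-rater, absolute-agreement form (often written ICC(2,1)), which fits the description of reviewers sampled from a larger population with every reviewer rating every target. The text does not state whether a single-rater or averaged form was used, so that choice is an assumption, and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows, one per rated target, each holding one score per rater.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 targets and 2 raters, with no missing scores")

    grand = sum(sum(row) for row in ratings) / (n * k)
    target_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: targets, raters, and residual.
    ss_targets = k * sum((m - grand) ** 2 for m in target_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_targets - ss_raters

    ms_targets = ss_targets / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_targets + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_targets - ms_error) / denominator


# Hypothetical severity scores: five adverse occurrences rated by four reviewers.
scores = [
    [2, 3, 2, 2],
    [7, 8, 7, 6],
    [4, 4, 5, 4],
    [9, 9, 10, 9],
    [1, 2, 1, 1],
]
print(round(icc_2_1(scores), 2))
```

Because this form penalizes systematic differences between raters as well as random disagreement, a reviewer who consistently scores one point higher than the others lowers the coefficient even when the rank ordering of occurrences is identical.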

Measuring disease severity
Overall (mean) agreement for disease severity dimensions was moderate across observers and substantial within observers (Table 10). Inter-observer agreement was lowest for herniation and instability and strongest for degeneration. There was excellent agreement for the Degenerative Disease Severity Score (ICC = 0.98, 95% CI = 0.96-0.99) (Figure 3).

Measuring surgery invasiveness
Inter-researcher agreement was almost perfect for the Invasiveness Index and for its six constituent elements (Table 11). Surgeons completed the grid operative report form as part of their medical record documentation in only 53% of the cases. Agreement between the surgeons and the researchers was very high on the forms completed (Table 11) (Figure 4).

Discussion
Adverse occurrences are unwanted but common, often carrying burdens of blame, guilt, or fear of sanctions [57,58]. Terms such as complication, adverse event, and medical error exacerbate the punitive atmosphere surrounding undesirable outcomes, particularly when these events are related to surgical procedures [59,60]. As a result, despite a century-old tradition among surgeons to focus intensely on complications in regular morbidity and mortality conferences [61], discussions of adverse occurrences in the surgical literature are frequently dismissive or defensive, leaving lessons buried under quality assurance protections [62]. Sanitized or closed quality-of-care discussions prevent systematic review of experience across institutions or cumulative experience over time, restricting knowledge that may prevent future occurrences [63]. Mistakes get repeated. Patient safety suffers.

Figure 2 Graphical Grid for Coding Surgical Procedures.
Approaches to measuring the safety of spine surgery are not well-developed. We undertook preliminary evaluations to help define a protocol to monitor adverse occurrences associated with spine surgery. We chose a design engineering perspective to create a conceptual framework with desirable components and specifications, including multi-modal, standardized, comprehensive surveillance of outcomes and detailed measurement of risk-adjustment factors. Establishing multiple methods to track 176 adverse occurrences requires extensive resources and is not practical for routine clinical surveillance. Identifying the most common or most severe of these events may help to select a smaller set of indicator events. Since many adverse occurrences tended to cluster in cascades, understanding associations among occurrences may allow selection of a shorter list of critical surveillance items. Quantifying disease severity on imaging studies and surgical invasiveness from medical records requires additional extensive resources. While such a complex and bulky system can be implemented in rigorous regulatory approval studies of new devices or other well-funded trials, widespread acceptance and application may require selecting subsets of risk factors and adverse outcomes that directly relate to specific patient safety concerns, or choosing those parameters in this framework that can be ascertained reliably from brief medical record reviews or administrative data alone.
Comprehensive surveillance of all adverse occurrences is difficult, if not impossible. Tracking surgical complications may be particularly troublesome because of issues relating to responsibility and liability surrounding invasive interventions. Although the true number of adverse occurrences cannot be determined, our experience confirms that complementary surveillance methods provide more complete assessment [64]. Our multi-modal attempt at capturing adverse occurrences showed that self-report by surgeons was not sufficient for identifying most adverse occurrences, and neither was reliance on voluntary reports by the spine team conducting daily ward rounds. Contrary to experience reported for some settings [30], in our study even designated professionals integrated into the daily team rounds were not sufficient to discern most adverse occurrences, perhaps because these personnel were not consistently aware of intra-operative occurrences, near-miss occurrences, or occurrences only observed by consulting services. Also, surgical team members may not have completely trusted the study goals during the early study period reported here. We hope that voluntary reporting will improve as team members develop greater awareness of reporting methods and more certainty that prevention through learning is the sole motive for surveillance, and that, in time, a culture of safety that encourages openness will take hold.
Categorizing adverse occurrences is problematic. Reviewers agreed in their discrimination of error from no error, and they consistently identified errors related to technical, communication, or systems failures. They were also able to reliably assign severity ratings to adverse outcomes using a scale that separated actual from potential effects. Reviewers, however, had difficulty determining preventability of adverse occurrences and assessing severity using a classification based on the JCAHO Sentinel Event Policy. Adverse occurrences are products of complex patient and treatment factors, often occurring in cascades where it is difficult to isolate the causes and effects of individual events. Reviewer agreement may be limited in part due to lack of detailed information. Also, some consequences may not be apparent at the time of hospital discharge, when ratings were assigned. Agreement among reviewers may improve with more experience, with provision of more detailed narratives, or with development of simpler coding scales.
Initial assessment of severity scoring for degenerative changes in the lumbar spine is promising. Two orthopedic surgeons showed good agreement in distinguishing patients with mild degeneration from those with severe degenerative changes. More work is needed to assess generalizability and to describe how different aspects of degeneration may be related to presenting symptoms and functional impairment. Such research may allow hierarchical ranking of broad diagnostic categories within lumbar spondylosis or permit weighting of different components of degeneration.

Figure 3 Degenerative Disease Severity Score. The degenerative disease severity score assigned by two observers for 10 sample cases. Score by Observer 1 highly correlates with the score given by Observer 2 and with repeat scores for each observer.

Surgical procedures on the spine can be quantitatively ranked for invasiveness. Although surgeons were only able to provide this information routinely in just over half the cases, when the information was provided, it was reliable. Compliance may improve with time, encouragement, or proof of the value of such coding. Validation of this ranking system by comparison to other indicators of invasiveness, such as duration of surgery or blood loss, may help better assess utility of the ranking system and add meaning to the relative invasiveness of various procedural elements.
Our study only included the busiest spine centers within our network. This choice may have introduced bias. Surgical volume may influence both the frequency and the reporting of adverse occurrences. Busier centers and surgeons may have lower rates of some occurrences and higher rates of others. Incorporating additional tasks of surveillance and reporting into routine care processes may be more difficult in busy, high-volume settings. Some of these concerns could be addressed by limiting surveillance to only a select few adverse occurrences that are routinely recorded in operation reports and hospital discharge summaries.
Our study placed emphasis on explicitly recording absence of an adverse occurrence when none occurred. Lack of occurrence of a particular complication with a particular procedure is important information. The efficiency of surveillance of what occurred cannot be judged without explicit data on what did not occur. No report does not equal no occurrence. To be meaningful, adverse occurrence reports should specify what was monitored, how often it occurred, and how often it did not occur.
We hope that sharing this protocol development will stimulate discussion of these methodological issues and push the field towards greater standardization in reporting and comparing adverse occurrence rates for devices, techniques, and healthcare providers. Although our focus is lumbar surgery for degenerative disease, the methods described may be applicable also to surgery in other regions of the spine. The analytic approach described may also have relevance for efficacy level evaluation of current and new procedures. Individual hospital and provider level analyses may be useful for effectiveness research and quality improvement.

Table 10 footnotes: 1 Kappa value for inter-observer agreement between Observer 1 and Observer 2. 2 Kappa value for intra-observer agreement for Observer 1. 3 Kappa value for intra-observer agreement for Observer 2.

Table 11 footnotes: 1 Intraclass correlation coefficients for agreement between the two researchers: surgeon-investigator and a trained research assistant. 2 Intraclass correlation coefficients for agreement between the treating surgeon and the researchers.

Figure 4 Spine Surgery Invasiveness Index.

Conclusion
The approach to measuring the safety of spine surgery can be standardized. Scales for rating the impact of adverse occurrences, severity of lumbar spine degeneration, and invasiveness of spine surgery have acceptable reproducibility. Reviewers frequently disagree on causes of adverse occurrences.

Competing interests
Support of spine-related research at the University of Washington (UW) includes a gift of an endowed chair established in 1999 by support from Surgical Dynamics, a past manufacturer of spinal implants, to conduct outcomes research in spine surgery. The UW Department of Orthopedics has also received gifts of endowed chairs from Synthes (Paoli, PA) in 2003 and Depuy Spine (Raynham, MA) in 2005, current manufacturers of spinal implants. Synthes and Depuy also provide spine fellowship support at UW. In addition, Synthes has established a Spine End-Results Research (SERR) Fund at UW for conducting safety and outcomes research on spine surgery patients. The principal investigator for this fund is a faculty member in the orthopedics department and the fund is managed through the Grant and Contract Services Office of the University of Washington. The sponsors of the endowments and the research fund have no control over design, conduct, data, analysis, review, reporting, or interpretation of clinical research conducted with the funds. SM and the University of Washington also hold two patents on surgical drills. These patents are licensed by Synthes. SM and the University of Washington do not conduct research to evaluate use of these surgical drills in patients.