Design
A two-phase, prospective, observational study was conducted, involving the development and subsequent validation of a Mechanical and Inflammatory Low Back Pain (MIL) index.
Phase 1: Mechanical and Inflammatory LBP (MIL) Index development
A total of 27 items indicating signs and symptoms of potential mechanical and inflammatory NSLBP were extracted from the Walker and Williamson study [23] and assembled in a usable, testable format (Additional file 1). A panel of five experts was formed as part of the content validity assessment and included a sports physician, a rheumatologist, a general practice physician and two physiotherapists. Each panel member was experienced in treating back pain, had worked in both clinical and research environments, and presented opinions representative of their field of expertise and qualification.
This panel identified areas of omission and items needing improvement or modification through a consensus approach, using the content validity guidelines of a minimum of four votes with an average score of 3 on a four-point ordinal scale. This enabled a diverse and balanced approach that minimized medical or health management bias. This procedure yielded an initial MIL Index with 11 items.
Content validity
A four-point ordinal rating scale was used to rate each of the 11 items: “1” = not relevant, “2” = unable to assess relevance without item revision, “3” = relevant but needs minor alteration, “4” = very relevant and succinct. The item evaluation content validity index [24] calculations were applied to both the items and the entire instrument with an a-priori requirement of 3 points with four panel votes.
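The content validity index calculation described above can be illustrated with a minimal sketch. It assumes the common convention that an item-level index (I-CVI) is the proportion of experts rating the item 3 or 4, and that the instrument-level index is the average of the item-level values. The function names, the example ratings and the reading of the a priori rule (at least four votes and a mean rating of at least 3) are illustrative assumptions, not the authors' procedure.

```python
from statistics import mean


def item_cvi(ratings):
    """Proportion of panel members rating the item 3 or 4 (assumed I-CVI rule)."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def scale_cvi_average(ratings_by_item):
    """Average of the item-level indices across the whole instrument."""
    return mean(item_cvi(r) for r in ratings_by_item.values())


def meets_apriori(ratings, min_votes=4, min_mean=3):
    """One reading of the a priori rule: enough votes and a mean rating >= 3."""
    return len(ratings) >= min_votes and mean(ratings) >= min_mean


# Hypothetical ratings from a five-member panel on the four-point scale.
panel_ratings = {
    "item_a": [4, 4, 3, 4, 3],
    "item_b": [4, 3, 2, 4, 3],
}

for name, ratings in panel_ratings.items():
    print(name, item_cvi(ratings), meets_apriori(ratings))
print("instrument", scale_cvi_average(panel_ratings))
```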
Face validity
A 5-point numerical rating scale was used (0 = not easy, 4 = very easy) to evaluate item accuracy, comprehensiveness and ease of response with an a-priori requirement of 3 points.
Phase 2: Mechanical and Inflammatory LBP (MIL) Index validation
Design
A prospective observational study investigated the responses of participants (n = 170) recruited for the study. Three instruments and one physical test were administered: the Roland-Morris Questionnaire (RMQ), the Short-form Health Status survey (SF-12) and the newly created MIL. The “Backache Index” (BAI) was used as the physical test. The evaluators were two physiotherapists with more than 2 years of professional experience. For test-retest reliability, a subgroup of participants (n = 37) was assessed on two separate occasions three days apart. On each test occasion the second assessor was blinded to the original scores to ensure independent data collection.
Patients and setting
The participants (n = 170, 38 ± 14 years old, 96 females) were diagnosed with NSLBP by a general practitioner (GP) using Waddell’s classification for acute and chronic conditions [20], and were then referred to two Spanish physiotherapy outpatient clinics. Exclusion criteria were refusal to participate in the study and LBP resulting from a specific spinal disease, infection, presence of a tumor, osteoporosis, fracture, structural deformity, inflammatory disorder, radicular symptoms or cauda equina syndrome. The study was authorized by the Ethics and Research Committee of the Faculty of Medicine at Malaga University. All participants gave written informed consent; confidentiality and anonymity were preserved at all times, and the principles of the “Declaration of Helsinki” and its subsequent updates were respected.
The standardized measures administered in the study are described below:
1. The Roland-Morris Questionnaire (RMQ) [25] is a 24-item dichotomous scale used to indicate functional disability with a score range from 0 (no disability) to 24 (maximum disability). The cut-offs are determined at 8/24 points for low to moderate disability and 16/24 for high disability [26]. The Spanish version has high reliability (ICC = 0.87) [27].
2. The Short-form Health Status survey (SF-12) [28] is a 12-item questionnaire designed to estimate general health status based on physical and mental components (SF-12 PCS and SF-12 MCS). The reliability of the Spanish version is documented with an ICC = 0.90 [28].
3. The Mechanical and Inflammatory LBP Index (MIL) was the 11-item draft. The items used in each sub-section are 1) Mechanical - pain on trunk flexion, pain on lateral bending and palpation pain (spinous process); 2) Inflammatory - intermittent pain during the day, morning pain on waking and initial getting up, stiffness after resting and pain on repetitive bending. Scoring is performed by use of the standardized scores with regression methods determined from factor analysis.
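As an illustration of the RMQ scoring rule given in the list above, the following minimal sketch sums the 24 dichotomous answers and applies the 8/24 and 16/24 cut-offs. How the boundary scores are assigned (below 8 as low, 8 to 15 as moderate, 16 or more as high) is an assumption made for the example, as are the function names and band labels.

```python
def rmq_score(answers):
    """Sum of endorsed items on the 24-item dichotomous questionnaire (0 to 24)."""
    if len(answers) != 24:
        raise ValueError("the RMQ has 24 items")
    return sum(1 for a in answers if a)


def rmq_band(score):
    """Disability band from the 8/24 and 16/24 cut-offs.

    Assumption: scores equal to a cut-off fall in the higher band.
    """
    if not 0 <= score <= 24:
        raise ValueError("score must be between 0 and 24")
    if score >= 16:
        return "high"
    if score >= 8:
        return "moderate"
    return "low"


# Hypothetical participant who endorses the first ten statements.
answers = [True] * 10 + [False] * 14
total = rmq_score(answers)
print(total, rmq_band(total))
```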
Physical tests used in the study
The “Backache Index” (BAI) [29] determines the physical status from a single test of 5 simple trunk movements of a patient standing still in erect position: (1) flexion (with knee flexion limited to 10 degrees), (2) bilateral side-flexion to the left and (3) to the right, and (4) bilateral combined extension and lateral flexion to the left and (5) to the right. Observer assessment is performed by means of scoring pain factors obtained by asking the patient, and stiffness estimation at the end of the 5 trunk motions assessed by a physiotherapist according to the BAI criteria [29]. The results are recorded with a four-point score per outcome (0–3 points) and the sum of the five outcomes yields the BAI with a maximum of 15 points. Reliability coefficients of the Spanish version of BAI were excellent (n = 42; ICC = 0.97 at three-day follow-up) [30].
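The BAI total described above (five outcomes, each scored 0 to 3, summed to a maximum of 15) can be sketched as follows. The movement labels and the function name are hypothetical and chosen only for readability; the scoring criteria for each outcome are those of the BAI itself [29] and are not reproduced here.

```python
# Hypothetical labels for the five trunk movements.
BAI_MOVEMENTS = (
    "flexion",
    "side_flexion_left",
    "side_flexion_right",
    "extension_lateral_flexion_left",
    "extension_lateral_flexion_right",
)


def bai_total(outcome_scores):
    """Sum the five outcome scores (each 0 to 3) into a total of 0 to 15."""
    total = 0
    for movement in BAI_MOVEMENTS:
        score = outcome_scores[movement]
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{movement}: score must be 0, 1, 2 or 3")
        total += score
    return total


# Hypothetical assessment of one participant.
example = {
    "flexion": 2,
    "side_flexion_left": 1,
    "side_flexion_right": 0,
    "extension_lateral_flexion_left": 3,
    "extension_lateral_flexion_right": 1,
}
print(bai_total(example))
```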
Statistical analyses
LISREL v.8.0 and the Statistical Package for the Social Sciences (SPSS) v.17.0 were used for the statistical analyses. The factor structure, internal consistency, and construct validity were assessed from the full sample. Test-retest reliability was assessed with the intraclass correlation coefficient, ICC(2,1), expressed with 95% CI, using MIL scores from participants at baseline and three days later during a non-treatment period. Participants' ratings of perceived overall status on an 11-point numerical rating scale (NRS) at baseline and on day three provided the reference criterion to determine change. The size of the test-retest subsample (n = 37) was determined by power analysis [31].
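For readers unfamiliar with the ICC type 2,1, the point estimate can be sketched from the two-way ANOVA mean squares (two-way random effects, absolute agreement, single measurement). This is a minimal illustration under that standard definition, not the SPSS routine used in the study; it omits the 95% CI, and the function name and example scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) point estimate.

    scores: one row per participant, one column per test occasion.
    """
    n = len(scores)      # participants
    k = len(scores[0])   # occasions (or raters)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants, occasions and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical baseline and day-three scores for five participants.
baseline_and_day3 = [[4, 5], [7, 7], [2, 3], [9, 8], [5, 5]]
print(round(icc_2_1(baseline_and_day3), 3))
```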
The participants were initially randomized into two equal groups for the purpose of cross-sample validation, allowing for exploratory factor analysis (Maximum Likelihood using Oblimin rotation and Kaiser’s normalization) with one half and confirmatory factor analysis with the other.
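The random split into two equal groups can be sketched as below. The seed, the identifier scheme and the function name are assumptions for illustration; the source does not describe the randomization mechanism used.

```python
import random


def split_halves(participant_ids, seed=0):
    """Shuffle identifiers reproducibly and split them into two halves."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical identifiers 1..170: one half for exploratory factor analysis,
# the other for confirmatory factor analysis.
efa_ids, cfa_ids = split_halves(range(1, 171), seed=42)
print(len(efa_ids), len(cfa_ids))
```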
The “Root Mean Square Error of Approximation” (RMSEA), the “Comparative Fit Index” (CFI), and the “Normed Fit Index” (NFI) were used to evaluate the model fit. For the RMSEA, ≤0.08 reflects a reasonable fit [32]. The NFI and CFI vary along a continuum of 0 to 1, with ≥0.90 being satisfactory [33]. Since the components/factors of signs and symptoms of LBP are continuous variables, and the factor loadings obtained by CFA cannot be used directly to assess the MLBP and ILBP factors, an MLBP and ILBP index was developed. This is calculated as the sum of the standardized scores, obtained with regression methods, of the two factors that comprise the proposed model.
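The three fit indices can be illustrated from the chi-square statistics of the fitted model and of the baseline (independence) model. The sketch uses common textbook definitions; the exact formulas implemented in LISREL may differ in detail (for example, N versus N - 1 in the RMSEA), and all numeric inputs below are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention assumed)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline noncentrality."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def nfi(chi2_model, chi2_base):
    """Normed fit index: relative reduction in chi-square versus baseline."""
    return (chi2_base - chi2_model) / chi2_base


def acceptable_fit(rmsea_value, cfi_value, nfi_value):
    """Thresholds stated in the text: RMSEA <= 0.08, CFI and NFI >= 0.90."""
    return rmsea_value <= 0.08 and cfi_value >= 0.90 and nfi_value >= 0.90


# Hypothetical chi-square values for illustration only.
r = rmsea(chi2=30.0, df=20, n=101)
c = cfi(30.0, 20, 230.0, 30)
f = nfi(30.0, 230.0)
print(round(r, 3), round(c, 3), round(f, 3), acceptable_fit(r, c, f))
```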
To determine whether the MIL instrument measures relatively specific constructs, the corrected item-total correlations were examined. The internal consistency of the dimensions was then determined by means of Cronbach's α. Test-retest reliability was assessed at three days during a period of no treatment [34]. Convergent validity was assessed by correlating the BAI, SF-12, RMQ and MIL measures. Discriminant validity was determined by examining the area under the curve (AUC) of the receiver operating characteristic (ROC) curves [35].
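Cronbach's α and the corrected item-total correlations can be sketched as follows, using the standard definitions (α from item and total-score variances; each item correlated with the sum of the remaining items). The data layout, function names and example responses are hypothetical, and this is not the SPSS procedure used in the study.

```python
import math
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per participant."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(rows):
    """Correlate each item with the total of all other items."""
    k = len(rows[0])
    correlations = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical responses: four participants, three items.
responses = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4]]
print(round(cronbach_alpha(responses), 3))
print([round(r, 3) for r in corrected_item_total(responses)])
```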
Sample size
The minimum sample sizes for the validation study were verified from the results, based on an 80% chance of detecting goodness of fit with an effect size w = 0.5, alpha = 0.05, beta = 0.08, allowing for 15% attrition. This gave minimum samples for convergent validity (n = 61), test-retest reliability (n = 36), discriminant validity (n = 52) and the pooled samples for internal consistency and factor analysis (n > 100) [31].
Practical characteristics
Readability was assessed using the Flesch-Kincaid grading scale, a recognised measurement standard that is available within the grammar section of most standard word-processing software [36]. Missing responses were determined from all participant responses. Completion and scoring times were determined from participants and clinicians, respectively, each calculated as the average of three separate scores.
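For reference, the Flesch-Kincaid grade level is a function of average sentence length and average syllables per word. The sketch below applies the standard formula to counts supplied by the caller; counting words, sentences and syllables is left to the word processor, and the example counts are hypothetical.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw text counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short questionnaire.
print(round(flesch_kincaid_grade(words=100, sentences=10, syllables=150), 2))
```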