A staged methodology similar to that used in the development of the RA-WIS was followed (see figure 1).
The entry criteria for the participants for all the stages of the study were as follows:
Confirmed diagnosis of AS by modified New York criteria
In paid work (including those currently employed but "off sick" for less than 6 months in the current period); self-employed and part-time workers were included.
Aged between 18 and 60 years.
The aim of this stage of the study was to identify characteristics of WI unique to AS. Thirteen qualitative interviews were undertaken with participants who were attending rheumatology outpatients and who fulfilled the above recruitment criteria. This satisfies current guidelines for the minimum sample size needed to achieve saturation. Selection of participants was based on a theoretical sample frame to ensure representation for age, gender and work type. Participants had a wide range of occupations, including sedentary work (for example an Administrator and a Town Planner who was mainly office based), light work (for example a Hairdresser and a Dental Technician) and manual work (for example a Plumber and a Refuse Collector). The interviews took the form of informal conversations, with the interviewer introducing areas for discussion using a topic list. The main areas were as follows:
Details of occupation at onset of AS, impact of AS then and now, adaptations required, need for part-time working, job security
Employer – including disclosing diagnosis, attitude, flexibility of employment
Access to and from work and within work
Relationships (work and home)
The interviews were tape recorded in full and typed transcripts produced.
Thematic analysis of the interview transcripts was undertaken. Common issues relevant to WI were formulated into potential items for the draft AS-WIS. Where possible the exact words of the interviewees were used. New items generated from the interviews were combined with items thought to be relevant from the existing RA-WIS.
First postal survey
A questionnaire booklet including demographic details, the ASQoL and the draft of the AS-WIS was sent to subjects with a confirmed diagnosis of AS who were of the relevant age. Subjects were all attending for treatment at the local rheumatology clinic in Leeds. Only a single questionnaire was distributed, without follow-up. The ASQoL was chosen as a comparator measure because it is a disease-specific 'needs based' quality of life measure that would be expected to correlate strongly with work instability, which focuses on the construct of participation (i.e. the need amongst this age group to maintain work).
A filter question about employment status ensured that the AS-WIS was only completed by those currently working. The aims of this stage were to test the scaling properties of the draft WIS, to facilitate item reduction and to provide preliminary evidence of construct validity.
Criterion validity: comparison of draft instruments against a 'Gold Standard'.
A sample of volunteers from the postal survey completed the draft AS-WIS a second time. On the same day they were assessed by an experienced Occupational Health Physiotherapist/Ergonomist who was blind to the responses on the draft WIS. A facilitator, who asked the patients to complete the questionnaire, also performed a cognitive debriefing at the same time. The expert allocated each participant a WI score between 0 and 4. This scoring system has been used successfully in the development of other WIS [4, 6]. The draft AS-WIS questionnaire responses were then validated against the results of the gold standard assessments. Those items that were shown to discriminate across the levels of risk ascertained by the expert were retained for further analysis. Cut points for level of risk were then set at the values that maximised the sensitivity and specificity of the screening questionnaire for concordance with the expert judgement.
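The cut point selection described above can be illustrated with a minimal sketch. It assumes that the expert rating has been dichotomised into "at risk" and "not at risk" at a given level, and that the best cut is the score maximising sensitivity + specificity - 1 (Youden's index); both are assumptions made for illustration rather than details stated in this study. The function name and the data are hypothetical.

```python
from typing import List, Tuple


def best_cut_point(scores: List[float], at_risk: List[bool]) -> Tuple[float, float, float]:
    """Return (cut, sensitivity, specificity) for the cut that maximises
    sensitivity + specificity - 1 (Youden's index, an assumed criterion)."""
    positives = sum(at_risk)
    negatives = len(at_risk) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one at-risk and one not-at-risk participant")

    best = None
    for cut in sorted(set(scores)):
        # A participant screens positive when the questionnaire score is >= cut.
        true_pos = sum(1 for s, r in zip(scores, at_risk) if r and s >= cut)
        true_neg = sum(1 for s, r in zip(scores, at_risk) if not r and s < cut)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        youden = sensitivity + specificity - 1
        # Ties keep the lowest cut, because only a strictly larger index replaces it.
        if best is None or youden > best[0]:
            best = (youden, cut, sensitivity, specificity)
    return best[1], best[2], best[3]


# Invented questionnaire scores and dichotomised expert ratings (not study data).
scores = [2, 4, 5, 7, 9, 11, 12, 15]
at_risk = [False, False, False, False, True, True, True, True]
cut, sensitivity, specificity = best_cut_point(scores, at_risk)
print(cut, sensitivity, specificity)
```

With several levels of risk, the same search could be repeated once for each boundary between adjacent expert ratings.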
The Rasch model is the current standard for the development of unidimensional scales (e.g. of impairment or quality of life) delivering metric quality outcomes in health care. Briefly, questionnaire data for the items of a new (or existing) scale, which are intended to be summated into an overall score, are tested against the expectations of this measurement model. The model defines how responses to items should behave if measurement (at the metric level) is to be achieved. The observed response patterns are tested against what is expected (a probabilistic form of Guttman scaling), and a variety of fit statistics determine whether the data meet these expectations.
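As an illustration of what the model expects, the sketch below computes the probability of affirming an item under the dichotomous Rasch model, in which that probability depends only on the difference between the person location and the item location (both in logits). It assumes dichotomous (affirm / not affirm) items; the function name is hypothetical, and the sketch is not the estimation procedure used by the software.

```python
import math


def rasch_probability(person_location: float, item_location: float) -> float:
    """Probability of affirming an item under the dichotomous Rasch model.

    Both locations are expressed in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(person_location - item_location)))


# A person located exactly at the item's difficulty affirms it half the time.
p_equal = rasch_probability(0.0, 0.0)
# A person one logit above the item is more likely than not to affirm it.
p_above = rasch_probability(1.0, 0.0)
print(p_equal, p_above)
```

The probabilistic Guttman pattern follows from this form: for any given person, items located lower on the scale are always more likely to be affirmed than items located higher.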
Within the framework of Rasch measurement, the scale should also work in the same way irrespective of which group is being assessed. For example, in the case of WI, males and females should have the same probability of affirming an item if they have the same underlying level of WI. If one group did not display the same probability of affirming the item, the item would be deemed to display differential item functioning (DIF) and would violate the requirement of unidimensionality. Consequently, every item is checked for DIF by age and gender and, in the current study, by time, to ensure stability in the test-retest sample. Finally, a rigorous check for unidimensionality is undertaken by identifying contrasting sets of items on the first principal component of the residuals and testing whether person estimates derived from these sets differ. The confidence interval for the proportion of individual t-tests showing a difference between estimates should overlap 5% if the scale is strictly unidimensional.
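The final unidimensionality check can be sketched as follows. The sketch assumes that person estimates and their standard errors from the two contrasting item sets are already available, uses 1.96 as a large-sample critical value for each individual test, and uses a normal-approximation binomial confidence interval; these choices and all names are assumptions for illustration and may differ from what the analysis software computes.

```python
import math
from typing import List, Tuple


def proportion_significant(
    est_a: List[float], se_a: List[float],
    est_b: List[float], se_b: List[float],
    critical: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (proportion, lower, upper): the share of persons whose two
    estimates differ significantly, with an approximate 95% interval."""
    n = len(est_a)
    significant = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        # Compare the two estimates for one person, pooling their error variances.
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            significant += 1
    p = significant / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def supports_unidimensionality(lower: float) -> bool:
    """Interval reaches 5% or below, so an excess of significant tests is not shown."""
    return lower <= 0.05


# Invented person estimates (logits) from two item subsets; not study data.
est_a = [0.0, 1.0, -1.0, 0.5]
est_b = [0.1, 0.9, -1.2, 2.5]
se = [0.5, 0.5, 0.5, 0.5]
p, lower, upper = proportion_significant(est_a, se, est_b, se)
print(p, lower, upper, supports_unidimensionality(lower))
```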
The sample size requirements for Rasch analysis are based upon the degree of precision required for estimates of item difficulty and person ability. For example, in most cases a sample size of 50 will give an item calibration within 1 logit with 99% confidence. This varies according to how well the scale is targeted at the patient sample. Thus a well targeted sample of 108 will give an estimate to within 0.5 logits with 99% confidence. It is important to note that Rasch analysis does not require a 'representative' sample, as item difficulty is estimated independently of the ability of the persons taking the test. It is more important to have a uniform distribution of persons, so that the precision of item estimates is similar across the whole of the construct (i.e. work instability) being measured.
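The arithmetic behind these figures can be sketched under a commonly quoted rule of thumb, which is an assumption introduced here and is not stated in this text: the standard error of an item calibration is roughly 2/sqrt(N) logits for a well targeted sample and up to about 3/sqrt(N) for a poorly targeted one. The function name and the targeting factor are hypothetical.

```python
import math


def calibration_half_width(n_persons: int, targeting_factor: float = 2.0,
                           z: float = 2.576) -> float:
    """Approximate 99% half-width (logits) of an item calibration.

    Assumes SE ~= targeting_factor / sqrt(N), with a factor of about 2 for
    a well targeted sample and about 3 for a poorly targeted one (rule of
    thumb). z = 2.576 is the two-sided 99% normal quantile.
    """
    standard_error = targeting_factor / math.sqrt(n_persons)
    return z * standard_error


well_targeted_108 = calibration_half_width(108)        # just under 0.5 logits
well_targeted_50 = calibration_half_width(50)          # under 1 logit
poorly_targeted_50 = calibration_half_width(50, 3.0)   # slightly over 1 logit
print(round(well_targeted_108, 3), round(well_targeted_50, 3), round(poorly_targeted_50, 3))
```

Under this assumed rule, the dependence on targeting is visible directly: the same 50 persons give a noticeably wider interval when the scale is poorly targeted.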
Data are fitted to the Rasch model using the RUMM2020 software.
Test-retest postal survey
A sample of in-work patients was asked to complete the new draft of the WIS on two occasions, two weeks apart. These patients were attending routine rheumatology clinic appointments in Bradford, a city in northern England adjacent to Leeds. The aim of this stage of the study was to assess the test-retest reliability of the scale and to provide further evidence of its internal construct validity.
Ethical committee approval was granted by Leeds Teaching Hospitals NHS Trust Local Research Ethics Committee under a programme of work for 'Reducing Work Disability in common rheumatic conditions' [Ref CA03/035].