Outcome measures have been selected based on those recommended for clinical trials of OA [19, 20]. In those with bilateral hip OA, only the most symptomatic hip will be assessed.
Age, gender, duration of hip OA symptoms, previous treatment, surgery and medication use for hip OA, employment status, marital status, education level and previous health problems will be obtained by questionnaire. Radiographic disease severity will be assessed from the baseline x-ray using the Kellgren and Lawrence grading system, while individual features of osteophytes and joint space narrowing will be rated using the Osteoarthritis Research Society International grading system.
Secondary outcome measures
A number of secondary measures will be included (Table 3). The Hip Osteoarthritis Outcome Scale (HOOS) is a patient-administered measure that assesses the patient's opinion of their hip and associated problems over the previous week. It consists of five subscales: pain, other symptoms, function in daily living, function in sport and recreation, and hip-related quality of life. A normalised score (100 indicating no symptoms and 0 indicating extreme symptoms) is calculated for each subscale.
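The normalisation of a HOOS subscale can be illustrated with a minimal sketch. It assumes that each item is scored from 0 (no problems) to 4 (extreme problems), and the function name `hoos_subscale_score` is hypothetical. Rules for handling missing items are not covered here.

```python
from statistics import mean


def hoos_subscale_score(item_responses):
    """Return a normalised 0-100 score for one subscale.

    Assumption: each item is scored 0 (no problems) to 4 (extreme problems),
    so 100 means no symptoms and 0 means extreme symptoms.
    """
    if not item_responses:
        raise ValueError("at least one answered item is required")
    if any(r < 0 or r > 4 for r in item_responses):
        raise ValueError("item responses must lie between 0 and 4")
    # Rescale the mean item score (0-4) to 0-100 and invert it.
    return 100 - mean(item_responses) * 100 / 4


print(hoos_subscale_score([1, 2, 1, 0]))  # mean item score 1 gives 75.0
```

A participant who answers "no problems" on every item scores 100, and one who answers "extreme" on every item scores 0.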
A number of functional performance tests and musculoskeletal impairments will be measured at baseline and at 13 weeks. The passive range of hip flexion, extension, abduction, internal rotation and external rotation will be measured using clinical methods and an inclinometer. The test-retest reliability ICCs for these measures ranged from 0.82 to 0.94 in 25 patients with hip OA tested one week apart. For all measures, two trials will be performed and the mean reading used in analysis. Hip flexion, extension and abduction will be measured in supine, while internal and external rotation will be measured in sitting.
An instrumented manual muscle tester will be used to measure maximum, normalised isometric strength (peak torque; Nm/kg) of the hip abductor, extensor, flexor and internal and external rotator muscles. Our reliability ICCs ranged from 0.65 to 0.85 in six patients with hip OA tested a week apart. Measurements of hip abductors and extensors will be taken in supine, while those of the flexor and rotator muscles will be taken in sitting. Maximum voluntary isometric torque of the quadriceps and hamstring muscles at 60° knee flexion will be measured in sitting using a KinCom isokinetic dynamometer (reliability ICCs were 0.85 for quadriceps and 0.85 for hamstrings for ten patients with painful hip OA). Testing will comprise two maximal contractions with the peak value used for analysis.
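As an illustration of how a normalised peak torque in Nm/kg might be derived, the sketch below assumes that the muscle tester reports force in newtons, that the lever arm (joint centre to the point of force application) is measured in metres, and that the peak of the recorded contractions is taken. The function name and the example values are hypothetical and are not data from this trial.

```python
def normalised_peak_torque(forces_n, lever_arm_m, body_mass_kg):
    """Return peak isometric torque normalised to body mass (Nm/kg).

    Assumptions: forces are in newtons, the lever arm is in metres,
    and the largest recorded contraction is used.
    """
    if not forces_n:
        raise ValueError("at least one contraction is required")
    if lever_arm_m <= 0 or body_mass_kg <= 0:
        raise ValueError("lever arm and body mass must be positive")
    # Torque (Nm) is the peak force multiplied by the lever arm.
    peak_torque_nm = max(forces_n) * lever_arm_m
    return peak_torque_nm / body_mass_kg


# Two hypothetical contractions of 200 N and 220 N, 0.4 m lever arm, 80 kg.
print(round(normalised_peak_torque([200.0, 220.0], 0.4, 80.0), 2))
```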
Several functional tests will be included. The stair climb test involves timing how long it takes participants to ascend and descend six steps at their own pace. For the 30 second sit-to-stand test, the number of times participants can rise to a full standing position from sitting and return to sitting in 30 seconds is counted. Walking performance will be assessed by calculating walking velocity (m/sec) as participants walk 20 metres with the instructions 'walk as quickly as you can without overexerting yourself'. Dynamic standing balance will be assessed by the step test and the 4-square step test.
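Walking velocity follows directly from the timed 20 metre walk. A minimal sketch, assuming the time is recorded in seconds over the full distance (the function name is hypothetical):

```python
def walking_velocity(time_s, distance_m=20.0):
    """Return walking velocity in m/sec for a timed walk."""
    if time_s <= 0:
        raise ValueError("walk time must be positive")
    return distance_m / time_s


# A hypothetical participant who covers 20 metres in 16 seconds.
print(walking_velocity(16.0))  # 1.25
```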
In a subset of participants, gait analysis will be performed at baseline and at 13 weeks. Kinematic and ground reaction force data will be recorded simultaneously for five walking trials in usual footwear at self-selected pace using a 3-D motion analysis system with 12 cameras (Vicon MX, Oxford, UK) and three force plates (AMTI, Massachusetts, USA) concealed in the floor. Kinematics will be derived from the standard Davis-Kadaba marker set using the Vicon Plug-in Gait model. Data will be combined using inverse dynamics to yield measures of external joint moments. The main measures will be peak external hip adduction and abduction moments, peak external hip flexion and extension moments, pelvic drop/rise in the frontal plane and pelvic rotation ranges of motion, and peak and total range of hip motion in the sagittal and transverse planes. Reliability in our laboratory is good for the measures we have subjected to reliability testing, with ICCs of 0.56 to 0.95 (the majority > 0.80) from six patients with hip OA and six controls tested twice one week apart.
The Assessment of Quality of Life instrument version 2 (AQoL II) has 20 questions that cover six dimensions of health-related quality of life including independent living, social relationships, physical senses, coping, pain and psychological wellbeing. The AQoL has strong psychometric properties and is more responsive than other widely used scales [31, 32]. It produces a single utility index that ranges from -0.04 (worst possible health-related quality of life) to 1.00 (full health-related quality of life). A clinically important difference in health-related quality of life can be defined as a change of 0.04 AQoL units. The AQoL will be collected at weeks 0, 13, and 36.
Habitual physical activity will be measured in two ways: by questionnaire and by pedometer. The Physical Activity Scale for the Elderly (PASE) will be used to measure both the level and type of recreational and occupational physical activity undertaken by participants over the previous week. The PASE was developed and validated in samples of older adults (age 55+ years). A pedometer (HJ-005 Omron Healthcare, Japan) will be worn for a week on three occasions (baseline, 13 weeks and 36 weeks) to record the number of steps taken per day. Participants will be asked to wear the pedometer full time during their waking hours. Pedometers have been found to be a simple and inexpensive means to estimate physical activity levels [35, 36]. At least three days of sampling are recommended to assess activity levels accurately, given differences between weekends and weekdays.
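One way the pedometer record for a week could be summarised is sketched below. It assumes that days on which the pedometer was not worn are recorded as `None` and that a week with fewer than three recorded days is treated as missing. Both the function name and this handling of missing days are illustrative assumptions, not procedures specified in the protocol.

```python
from statistics import mean

MIN_VALID_DAYS = 3  # recommended minimum number of sampled days


def mean_daily_steps(daily_steps):
    """Return mean steps per day, or None if too few days were recorded.

    Assumption: days without a pedometer reading are given as None.
    """
    valid_days = [steps for steps in daily_steps if steps is not None]
    if len(valid_days) < MIN_VALID_DAYS:
        return None
    return mean(valid_days)


# Hypothetical week with one unrecorded day.
print(mean_daily_steps([5000, 7000, None, 6000, 6500, 5500, 6000]))  # 6000
```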
The Arthritis Self Efficacy Scale will be used to measure psychological status. It has three subscales that assess self-efficacy for control of pain management, physical function and other arthritis symptoms. Prior studies have supported both the reliability and validity of this scale. Pain catastrophizing will be measured using the 13-item Pain Catastrophizing Scale. It measures tendencies to ruminate about pain, magnify pain, and feel helpless about pain. It has high internal consistency (coefficient alpha = 0.87) and is associated with heightened pain, psychological distress, and physical disability. We will use the Coping Attempts Scale of the Coping Strategies Questionnaire to assess the use of pain coping skills. This 42-item scale measures how often a patient engages in seven different pain coping strategies. This instrument has demonstrated sensitivity to change from treatment in chronic pain samples as well as good internal consistency and construct validity.
Participants will rate their perceived overall change and their change specifically in pain and in physical function with treatment (compared to baseline) on seven-point ordinal scales (1 = much worse to 7 = much better). Scales of this kind are frequently used as an external criterion for comparison with changes in scores of other outcomes. Measuring patient-perceived change using a rating of change scale has been shown to be a clinically relevant and stable concept for interpreting truly meaningful improvements from the individual perspective. We will also dichotomise the group according to their perceived change rating, where improved will be defined as 'moderately' or 'much' better, and not improved will be defined as 'slightly' better and below. The proportion of improved participants from each group will determine success of the treatment.
A number of other measures will be obtained (Table 3). Participant adherence will be obtained by recording the number of physiotherapy sessions attended (out of a maximum of ten). Those in the active physiotherapy group will complete a daily log-book to record the number of home exercise sessions completed during the treatment phase. To indicate adherence to the home program during the six-month follow-up period, participants in both groups will be mailed a short questionnaire at weeks 24 and 36 which asks how many times in the past week they have performed the home exercises or applied the gel. Adverse events and the use of co-interventions will be recorded in a log-book and by open-probe questioning by the assessor at trial completion. At the 13-week and 36-week measurement time points, study participants will be asked to indicate which treatment they believe they have received (active or sham) and reasons for that choice to assess the success of blinding.
Information on health care costs and direct non-health care costs over the last month will be collected retrospectively at weeks 0, 5, 9, 13 and 36 by questionnaire. Direct health care costs will include costs of physiotherapy attendance (assumed zero in the sham group), additional health provider visits (doctors, specialists, other health care professionals), investigative procedures, purchase of prescription and over-the-counter medication, and hospitalisation. These will be valued using published prices for medical costs. Direct non-health care resources will include number of lost days from work.
The monetary valuation of health status pre- and post-treatment is potentially a more comprehensive patient-relevant measure of treatment gains than measures of either a clinical outcome or health-related quality of life. A simple open-ended questionnaire will ask participants in each group about their willingness to pay for the treatment given the outcomes they experience.