Summary of findings
The aim of this study was to investigate whether data from repeated assessments improved the model fit and prognostic performance of primary care prognostic models, in order to better predict long-term outcomes and reduce uncertainty in clinical decision-making. Such information can support decisions about whether to monitor symptoms or to refer or treat immediately. The results show that a short-term repeat assessment of pain had only slightly better prognostic performance than short-term change or baseline-only scores in predicting long-term improvement in disability. Furthermore, a full baseline prediction model incorporating multiple prognostic factors does not necessarily add prognostic value to a repeat short-term assessment of pain. In a hypothetical clinical scenario, combining information from a full prediction model assessed at baseline with a brief repeat assessment of pain, applied only to those with an uncertain prognosis at initial assessment, may provide an efficient and effective strategy for reducing uncertainty and improving discrimination between those at high and low probability of long-term recovery from a painful shoulder condition.
Comparison with existing literature
The results of this study partially confirm those of two similar studies [8, 11], which also examined the added prognostic value of short-term monitoring for longer-term outcomes. Dunn & Croft used the same dataset as one of those used in the present study, and found that repeat assessment of other prognostic factors (e.g. fear-avoidance beliefs) improved longer-term predictions. The similarity between this previous study and the findings from the other datasets included here strengthens the argument that repeat assessment can provide additional prognostic information across different outcome measures. Wand et al. found that a ‘subacute profile’ (predictors measured at six weeks) predicted long-term disability better than either an ‘acute profile’ (baseline scores only) or the change between baseline and six-week assessment. Our study extends the findings of both studies by testing a possible clinical scenario in which a more efficient strategy (re-assessing only those with uncertain prognosis rather than the entire sample) could be applied.
The highest c-statistics obtained in objective 2 are not as high as those reported for other prognostic models [2–5]. For example, the STarT Back tool, which can be used to stratify patients with low back pain, has demonstrated better prognostic performance (e.g. [25, 26]). Variability in performance can be explained by differences in the outcome measure used in each study (function versus recovery), the prognostic factors included in the model, the measures used to assess those factors, or the study population. However, these other studies could not be used here because they did not include a short-term assessment point, which the present analysis required. The aim of our analysis was not to find the “best” model, but rather to find the most appropriate studies with which to test our hypotheses and the clinical scenario for incorporating repeat assessment. Further research is required to confirm whether the improvement gained from repeat assessment holds for other baseline prognostic models, regardless of their prognostic performance.
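For readers less familiar with the c-statistic referred to above, the following is a minimal sketch of how it can be computed for a binary outcome such as recovery. It is not the code used in this study; the function name `c_statistic` and the example values are hypothetical and serve only to illustrate the definition (the proportion of event/non-event pairs in which the person with the event received the higher predicted probability, with ties counted as one half).

```python
from itertools import product


def c_statistic(predicted, outcomes):
    """Concordance (c) statistic for a binary outcome.

    predicted: predicted probabilities of the event (e.g. recovery)
    outcomes:  observed outcomes, 1 = event, 0 = no event
    """
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0   # pair ranked correctly
        elif p_event == p_non_event:
            concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical illustrative values, not data from the included studies.
example_predicted = [0.9, 0.8, 0.4, 0.3, 0.2]
example_outcomes = [1, 0, 1, 0, 0]
example_c = c_statistic(example_predicted, example_outcomes)
```

A value of 0.5 corresponds to discrimination no better than chance and a value of 1.0 to perfect discrimination, which is the scale on which the comparisons in this section are made.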
Strengths and limitations
This study included four relatively large (n ≥ 512) datasets covering two common musculoskeletal complaints. The studies included different measures of disability and used slightly different measurement time points, but the results overall were very similar, which strengthens the study findings. The number of cases included in each analysis differed because of dropout over time in each of the studies. Imputation was considered, but a sensitivity analysis using complete case data did not result in any significant differences in results. The sample sizes available for each dataset met the guidance on the number of cases required for building prognostic models, namely an events-per-predictor ratio of at least 10 (e.g. ), although we realise that such guidance may not be entirely suitable for the specific analysis performed in this study.
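As an illustration of the events-per-predictor guidance mentioned above, the ratio can be checked with a few lines of code. This is a sketch under stated assumptions: the function name is hypothetical, the numbers are invented for illustration and are not taken from the included datasets, and the ratio is calculated here using the smaller of the two outcome groups, which is a common convention but not necessarily the one applied in this study.

```python
def events_per_predictor(n_events, n_non_events, n_predictors):
    """Events-per-predictor ratio, based on the smaller outcome group."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_events, n_non_events) / n_predictors


# Hypothetical example: 120 recovered, 380 not recovered, 8 candidate predictors.
example_ratio = events_per_predictor(120, 380, 8)
meets_guidance = example_ratio >= 10  # threshold of 10 as stated in the text
```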
The inclusion of different musculoskeletal pain regions could be considered problematic as shoulder pain problems may have a different rate of recovery, and different prognostic factors compared to back pain problems. However, recent cohort studies and systematic reviews have highlighted similarities in symptom trajectories and identified generic prognostic factors across different regional musculoskeletal pain problems (e.g. [6, 29, 30]). This is supported by our findings from objective 1, confirming the prognostic value of short-term changes in pain for predicting long-term disability outcomes in both back and shoulder pain.
The included studies were selected according to specific criteria to test the present study hypothesis, but were not identified through a systematic search. This means that we may have missed additional studies that could have been included in our analysis. However, a search of the wider literature found that very few prognosis studies set in primary care include a short-term assessment point, which was a requirement for our analysis. We would nevertheless welcome further analysis of the prognostic value of repeat pain assessment in other settings or populations.
Not every dataset could be used for the analysis of each objective. While both the Van der Windt and Kuijpers datasets included a prognostic model, differences between these models in the proportion of participants recovered (outcome event rate) would require different classifications of predicted probabilities in order to address objective 3, making them difficult to compare. Only the Kuijpers model was therefore used for this analysis, to provide an example of how such a strategy may be used in clinical practice and how researchers can investigate prognosis in the future. To strengthen these findings, replication in other datasets and other musculoskeletal conditions is needed, particularly because the prognostic performance of the models presented here could be considered moderate at best.
The datasets examined here contain only patients who consulted for musculoskeletal pain and provided data at both baseline and follow-up, and therefore may not be representative of people who have musculoskeletal pain but choose not to consult, or who would not re-consult when invited for a repeat assessment. The datasets included in this study did not all take baseline assessments at the point of consultation. As most patients are likely to consult when their pain is at its worst, and may therefore experience a natural reduction in pain shortly after consulting, the scores obtained in these studies may differ in interpretation and prognostic value from scores obtained during the consultation. Studies investigating prognostic factors at the point of care found that pain reduced shortly following consultation, suggesting that the point at which measures are taken is important and could affect the accuracy of any prediction models derived.
The clinical scenario presented here does not offer optimum prediction. This version of the scenario (with three equal risk categories), while realistic for this example, was meant as an illustration of the approach; different cut-off points for low, intermediate and high probability might result in fewer people being asked to re-consult. The choice of cut-off point should therefore depend on clinically relevant thresholds for treatment and referral. The number of patients in the intermediate group at baseline who needed to return for repeat assessment was still large in our scenario, highlighting the large amount of uncertainty in the prognosis of musculoskeletal pain conditions. Similar proportions for intermediate groups were used in the STarT Back study, which included active management of this group rather than a ‘watch and wait’ scenario, although in the STarT Back study these proportions reflected the patients included rather than being pre-specified as in the present study. The scenario presented here builds on this by ensuring that the low- and high-probability groups are optimally identified, either at first consultation or within a few weeks of it. It could be that a prognostic model with better prognostic performance is needed in order to differentiate more clearly between patients who do and do not require referral.
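The two-stage logic of the scenario can be sketched in code as follows. This is an illustrative sketch only, not the analysis performed in this study. The three equal risk categories follow the text, but the function names, the simple tertile calculation, the 0 to 10 pain scale, the `pain_threshold` value and the rule used to reclassify the intermediate group are all hypothetical assumptions introduced for the example.

```python
def tertile_cutoffs(probabilities):
    """Cut-offs that split predicted probabilities into three equal groups."""
    ordered = sorted(probabilities)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three predicted probabilities")
    return ordered[n // 3], ordered[(2 * n) // 3]


def triage(baseline_prob, lower, upper, repeat_pain=None, pain_threshold=3):
    """Classify the probability of long-term recovery in two stages.

    Stage 1: the baseline model assigns low, intermediate or high probability.
    Stage 2: only the intermediate group is re-assessed with one pain score.
    The reclassification rule and pain_threshold are assumptions (0-10 scale).
    """
    if baseline_prob < lower:
        return "low"
    if baseline_prob >= upper:
        return "high"
    if repeat_pain is None:
        return "reassess"  # uncertain prognosis: invite for repeat assessment
    return "high" if repeat_pain <= pain_threshold else "low"


# Hypothetical predicted probabilities of recovery from a baseline model.
example_probs = [0.5, 0.1, 0.9, 0.4, 0.2, 0.7]
lower_cut, upper_cut = tertile_cutoffs(example_probs)
first_pass = [triage(p, lower_cut, upper_cut) for p in example_probs]
```

In this sketch only those labelled "reassess" after the first pass would be asked to provide a repeat pain score, so moving the two cut-offs directly changes how many people are invited back.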
The hypothetical scenario itself, while aiming to reduce the number of inappropriate clinical decisions, may also lead to an increase in consultations, with an impact on GP time and costs. The single question about pain could instead be asked via a phone call, SMS text message or smartphone application 4–6 weeks after the initial consultation, with only those who still report pain at that point being invited back to see their GP.