How do we make sure our evidence base is robust?

Jack Martin describes 6 common problems in evaluating services for children, and offers advice on how to overcome them 

High-quality evidence on ‘what works’ plays an essential part in improving the way social care services are designed and delivered, making sure they result in the best possible outcomes for children and families – from preventing abuse and neglect or changing challenging behaviour to supporting mental health and improving educational attainment. I’m a research officer at the Early Intervention Foundation (EIF), and we’ve conducted over 100 in-depth assessments of the evidence used in programme evaluations. We rate everything against our standards of evidence and have published an online Guidebook about the early intervention programmes that have been shown to improve outcomes for children and young people.

As part of this process, we’ve examined thousands of pages of technical evaluation reports in great detail. The quality of these studies varies. So our assessments consider not only the findings of each evaluation – whether it suggests a programme is effective or not – but also the quality of that evidence. If a study hasn’t been well planned or properly carried out, we can’t always be confident that the findings of the study are robust.

I’m going to share some of the issues that we come across frequently, which undermine the confidence we have in published evaluation results. These six common pitfalls could, in many cases, be avoided or mitigated when an evaluation is being planned or carried out, and this would strengthen the evidence base for children’s services.

1. The need for robust comparison groups

What’s the problem?
It can be difficult to conclude whether participation in a programme has caused positive changes for children and families or whether changes are due to other factors. So as well as looking at the changes for the people who took part in the intervention, it’s helpful to have a comparison group made up of people who were in the same situation but who didn’t participate in the service. But some studies don’t use a comparison group at all. Others use a comparison group which is not sufficiently robust, which can lead to biased results.

What’s the solution?
It’s essential to use a comparison group in impact evaluations. Ideally, people should be randomly assigned to either the intervention group or the comparison group, as in a randomised controlled trial (RCT). Where that isn’t possible, the groups should be formed using a sufficiently rigorous quasi-experimental method.
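
As a minimal sketch of what random assignment involves, the Python snippet below shuffles a list of participant IDs and splits it into an intervention group and a comparison group. The function name `randomly_assign`, the ID format and the fixed seed are illustrative assumptions, not part of EIF's standards. A real trial would normally follow a pre-specified allocation procedure.

```python
import random


def randomly_assign(participant_ids, seed=None):
    """Randomly split participants into intervention and comparison groups.

    Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)      # seeded generator so the allocation can be reproduced
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)          # every ordering of participants is equally likely
    midpoint = len(shuffled) // 2
    return {
        "intervention": shuffled[:midpoint],
        "comparison": shuffled[midpoint:],
    }


# Example with 40 made-up participant IDs
groups = randomly_assign([f"P{i:03d}" for i in range(1, 41)], seed=2024)
print(len(groups["intervention"]), len(groups["comparison"]))
```

Because chance alone decides who ends up in which group, any differences between the groups at the start of the study are unlikely to be systematic.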

2. Drop-out rates

Attrition is when participants drop out of an evaluation and data on their outcomes is not collected. In our case, attrition could take place because people stop taking part in a service or because they don’t want to engage in the evaluation.

As the number of people participating in the evaluation decreases, the study sample may become less representative of the population, and the intervention group and control group could become less similar to each other. This can result in misleading or biased data and affect the reliability of any conclusions we draw about a programme’s effectiveness.

You can use a range of strategies to encourage participants to engage with data collection, such as offering financial compensation. In addition, researchers can conduct data analyses to help work out whether attrition has introduced bias and what effect it’s had on the results.
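
One simple first check is to compare how many people were lost from each group. The sketch below is an illustration under assumed inputs: the function name `attrition_summary` and the participant counts are invented, and it is not a substitute for a full attrition analysis.

```python
def attrition_summary(n_assigned_intervention, n_retained_intervention,
                      n_assigned_control, n_retained_control):
    """Return attrition per group, overall, and the gap between groups (as proportions).

    Hypothetical helper, for illustration only.
    """
    lost_intervention = n_assigned_intervention - n_retained_intervention
    lost_control = n_assigned_control - n_retained_control
    rate_intervention = lost_intervention / n_assigned_intervention
    rate_control = lost_control / n_assigned_control
    overall = (lost_intervention + lost_control) / (
        n_assigned_intervention + n_assigned_control
    )
    return {
        "intervention": rate_intervention,
        "control": rate_control,
        "overall": overall,
        # Differential attrition: how unevenly drop-out is spread across the groups
        "differential": abs(rate_intervention - rate_control),
    }


# Invented example: 100 people per group, 80 and 90 with outcome data at follow-up
summary = attrition_summary(100, 80, 100, 90)
for name, value in summary.items():
    print(f"{name}: {value:.0%}")
```

In this made-up example, 20% of the intervention group and 10% of the control group are lost, giving 15% attrition overall and a 10 percentage point gap between the groups. A large gap is a warning sign that the groups may no longer be comparable.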

3. Excluding participants from the analysis

Sometimes evaluators might decide not to collect or analyse data from a particular participant because they didn’t attend all the programme sessions, or because some members of the control group unintentionally received some or all of the programme. But excluding participants risks making the intervention and control groups less comparable and so biasing the results.

Aim to collect outcome data on all participants and include this in the final analysis, regardless of how much of the programme they took part in. This keeps the intervention and control groups as similar as possible, and reduces the likelihood of bias.

4. Using the right measures

Validity is how far an evaluation tool describes or quantifies what it’s intended to measure. Reliability is its consistency when used multiple times in similar circumstances. If evaluators use measures which haven’t been tested and shown to be valid and reliable, we can’t be fully confident in their findings. And evaluation tools are all designed for a specific purpose, so it’s important to make sure they are appropriate for the programme being evaluated – otherwise we can’t be sure the data they’ve provided is correct.

Always use validated measures which are suitable for the programme being evaluated, and appropriate for the people taking part in the evaluation.

5. Sample size

It’s hard to have confidence in the results of an evaluation if there aren’t enough participants in the study. If a sample is too small, it’s more likely that genuinely positive effects won’t be detected – and also that any seemingly positive results might be erroneous. And having a small sample size in an RCT increases the probability that the intervention and control groups will not be equivalent.

Be realistic about potential drop-out rates, and use power calculations [1] to identify the appropriate sample size. EIF will not consider evaluations with fewer than 20 participants in the intervention group, so it’s important to recruit the correct number of participants and use strategies to keep them engaged with the study.
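
To give a rough sense of what a power calculation looks like, the sketch below uses a standard normal approximation for comparing two group means, then inflates the result to allow for drop-out. The effect size (0.5), significance level (0.05), power (80%) and 20% drop-out rate are all assumptions chosen for illustration, and the function names are hypothetical. A real evaluation should base these inputs on its own design and outcomes.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group to compare two means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardised effect size (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def inflate_for_attrition(n_needed, expected_dropout):
    """Number to recruit so that n_needed remain after the expected drop-out."""
    return math.ceil(n_needed / (1 - expected_dropout))


n_analysed = sample_size_per_group(effect_size=0.5)
n_recruited = inflate_for_attrition(n_analysed, expected_dropout=0.20)
print(n_analysed, n_recruited)
```

With these assumed inputs, the approximation suggests about 63 participants per group in the final analysis, which means recruiting about 79 per group if a fifth are expected to drop out. Smaller expected effects need considerably larger samples.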

6. Long-term follow-up

Most programmes aim to bring about lasting change for children and families. So long-term outcomes (at least one year post-intervention) are often the most important and meaningful. Studies which don’t assess long-term outcomes – or don’t assess them well – can’t tell us if the changes that happen as a result of the programme are likely to last over a longer period of time. It can also be difficult to collect enough data to analyse long-term effects if large numbers of people drop out of the programme and/or evaluation.

Plan your data collection to capture both potential short- and long-term outcomes. Guard against problems which are particularly likely to damage the quality of long-term outcome analyses: maintain comparison groups, attempt to minimise attrition, and make sure you account for attrition in your analysis.

We’ve developed a new short guide on these 6 pitfalls to help programme developers and evaluators conduct high-quality evaluation studies. This will help establish if a programme is achieving the outcomes it was designed to achieve, and add to the UK evidence base on what works for children.

The NSPCC offers advice about evaluating services and engaging service users in evaluation, which you might also find helpful.

1. For more information about power calculations, see: McCrum-Gardner, E. (2010) Sample size and power calculations made simple. International Journal of Therapy and Rehabilitation, 17(1): 10-14.