Thursday, 15 September 2011

'Three-fold variation' in UK bowel cancer death rates

[I rewrote this post on 19th September, having done further analysis of the data.  The overall conclusion is the same, but better supported by the analysis.  I corrected the penultimate paragraph on 4th October.]

I've copied the title from this BBC story.  The story is an uncritical account of a press release by the charity Beating Bowel Cancer.  "Beating Bowel Cancer calculates that over 5,000 lives could be saved every year".

The press release announces an on-line Bowel Cancer Map which allows one to find the (age-standardized) bowel cancer incidence and mortality for each local authority in England and Scotland.  This is based on 2008 figures provided by UKCIS.  (The raw data are available from UKCIS to registered users only.)

The headline finding is that death rates vary from 9.16 deaths per 100,000 in the semi-rural district of Rossendale (in Lancashire) to 31.09 deaths per 100,000 in the city of Glasgow.  I suppose that the calculation that over 5,000 lives a year could be saved is based on reducing the national death rate from 17.68 to 9.16 per 100,000: for a population of 61 million that would save 5,197 lives each year.  (17.68 is my calculation of the UK-wide death rate, using their data.  They give 17.27 for the death rate in England, and higher figures for Scotland, Wales and Northern Ireland.)
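The arithmetic behind that guess is short enough to check directly.  This is only a sketch of my reconstruction, not a calculation published by the charity; the function name `lives_saved` is made up for illustration.

```python
# Rates are deaths per 100,000 people per year, as quoted in the post.
uk_rate = 17.68          # my UK-wide estimate from the map data
best_rate = 9.16         # Rossendale, the lowest rate on the map
population = 61_000_000  # approximate UK population in 2008


def lives_saved(current_rate, target_rate, population):
    """Annual deaths avoided if the rate fell from current_rate to target_rate."""
    return (current_rate - target_rate) * population / 100_000


saved = lives_saved(uk_rate, best_rate, population)
print(round(saved))  # 5197
```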

But this statistical analysis is completely wrong.  Beating Bowel Cancer didn't reply to my request for a spreadsheet containing the numbers behind the map, so I've scraped them out by semi-automated postcode query.  The total annual deaths in the data I've got is 15,936, compared with 16,259 reported for 2008 by Cancer Research UK, so I think I've been successful enough in extracting the data.

For a given expected death rate, the actual number of deaths in each district will be a random number sampled from a Poisson distribution with mean equal to the expected number of deaths in that district.  I assumed that the expected death rate is the same throughout the country, and simulated on a spreadsheet the numbers of deaths for every district in the UK.  After each simulation, I found the district with the lowest actual death rate in that simulation, and the one with the highest.

The result was that on average the district with the lowest death rate has about 7 deaths per 100,000, and the district with the highest has about 32 deaths per 100,000.  So the observed range - 9 to 31 deaths per 100,000 - is in fact slightly (not significantly) smaller than one would expect if the variation were purely random.  There's nothing in that range to suggest that the expectation is any different from one district to another.
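The same idea can be sketched in Python rather than a spreadsheet.  This is not the original spreadsheet: the district populations below are invented (400 districts of between 50,000 and 700,000 people), so the output will not reproduce the 7 and 32 quoted above.  The names `poisson` and `simulate_extremes` are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, which is adequate for expected counts of up to a few hundred.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_extremes(populations, rate_per_100k, n_sims, seed=0):
    """Average lowest and highest district rates when every district shares one true rate."""
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(n_sims):
        rates = []
        for pop in populations:
            expected_deaths = rate_per_100k * pop / 100_000
            deaths = poisson(expected_deaths, rng)
            rates.append(deaths / pop * 100_000)
        lows.append(min(rates))
        highs.append(max(rates))
    return sum(lows) / n_sims, sum(highs) / n_sims


# ASSUMPTION: made-up district sizes, not the real figures behind the map.
size_rng = random.Random(1)
populations = [size_rng.randint(50_000, 700_000) for _ in range(400)]

avg_low, avg_high = simulate_extremes(populations, 17.68, n_sims=50)
print(f"average lowest rate:  {avg_low:.1f} per 100,000")
print(f"average highest rate: {avg_high:.1f} per 100,000")
```

Even with one identical underlying rate everywhere, the lowest and highest simulated districts end up well apart, and the spread widens as the smallest districts get smaller.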

What is happening here is that the analysis has been done over districts that are small enough for random variation to swamp any systematic variation.  Taking this to extremes, one might calculate deaths per household, and find that in the best households no one at all died that year of any cause.  If we could duplicate that for all households, we could all live for ever.

It's worth saying a bit more about the population sizes in each area.  The Bowel Cancer Map doesn't give these numbers directly, but it reports numbers of deaths and "age-standardised" deaths per 100,000.
"Age-standardisation adjusts rates to take into account how many old or young people are in the population being looked at. When rates are age-standardised, you know that differences between the local authority areas do not simply reflect variations in the age structure of the populations..."
From the data given it is simple to calculate the age-standardized population used for each area: divide the number of deaths by the rate per 100,000, then multiply by 100,000.  On this basis, Rossendale has 76,000 people and City of Glasgow has 675,000.  It is not surprising that the lowest death rate is in one of the smaller areas - they are the ones with the greatest random variation.  But if the results are simply random it is surprising that the highest death rate is in a large area.  So I do think that the high death rate in City of Glasgow is not purely random - the Cancer Research UK data confirm that death rates are significantly higher in Scotland.
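A rough way to put numbers on "greatest random variation" is to use the fact that a Poisson count has standard deviation equal to the square root of its mean.  The sketch below assumes the same 17.68 per 100,000 everywhere and uses the two implied populations quoted above; the function names are made up, and the 20-deaths example is purely illustrative, not a figure from the map.

```python
import math


def implied_population(deaths, rate_per_100k):
    """Population implied by a death count and a rate per 100,000."""
    return deaths / rate_per_100k * 100_000


def rate_sd(population, rate_per_100k):
    """Standard deviation of a district's observed rate under a Poisson model."""
    expected_deaths = rate_per_100k * population / 100_000
    return math.sqrt(expected_deaths) / population * 100_000


def z_score(observed_rate, population, national_rate):
    """How many standard deviations the observed rate sits from the national rate."""
    return (observed_rate - national_rate) / rate_sd(population, national_rate)


# Hypothetical district: 20 deaths at a rate of 10.0 per 100,000.
print(implied_population(20, 10.0))  # 200000.0

national = 17.68
z_rossendale = z_score(9.16, 76_000, national)
z_glasgow = z_score(31.09, 675_000, national)
print(f"Rossendale: {z_rossendale:+.1f} standard deviations")
print(f"Glasgow:    {z_glasgow:+.1f} standard deviations")
```

On these assumptions Rossendale comes out a little under 2 standard deviations below the national rate, which is unremarkable for the most extreme of several hundred districts, while Glasgow comes out roughly 8 above, which is very hard to put down to chance.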

One more thing.  Adding up the populations for each area, calculated from the deaths and death rates, I get a UK population of 89.75 million.  Since the true figure for 2008 was about 61 million, that's rather surprising.  Doing the same calculation on the data for number of cases, which the map also gives, I get a UK population of 83 million.  The Cancer Research UK table is using a population of just over 61 million, and therefore gets a noticeably higher death rate for about the same number of deaths.  [Update: I find that the CRUK table offers age-standardized rates also: they are very similar to the Beating Bowel Cancer rates.  The heading in the table reads "Age-standardised rate (European)...", so it seems that the standardization is to a European-average age distribution.]

I'm not sure whether I should be railing against the use of stupid statistics in a good cause... 


  1. Just read about your statistical [re]analysis in an article in The Guardian. Since your header says "If you like something here, please let me know" I just wanted to thank you for the work you did to illuminate the misinterpretation of statistics (even if it was for a good cause). As more data becomes available - especially regarding health issues - I hope your story will inspire others to take a closer, more skeptical, look at claims being made about correlations and causation.

  2. Is there any chance you could post your spreadsheet so we can see how the simulations were constructed? I for one would find that incredibly enlightening!


  3. I've posted a simulation spreadsheet at

  4. The very nice for content. Thanks

  5. Very nice content, I always follow you.

  6. Your analysis seems to be arguing that regional death rates are random (in a Poisson distribution) - but also that they are not random (eg Glasgow).

    Please could you explain?

    Surely the complex range of determinants of whether people get bowel cancer, and whether it is successfully treated, must vary from region to region? It certainly would if you looked at say death rates across Europe or the world, where presumably countries with worse health systems would create multiple peaks in any distribution curve.