Tuesday, 6 January 2015

Two-thirds of cancers - collected links

News sites getting the meaning of "two thirds of cases" wrong: Independent, Telegraph, Mail, Express, Mirror, Huffington Post
News site getting it wrong in the headline but right in the text without one having to scroll down: Reuters.
News sites getting it right: BBC, Guardian.

The press release.


The Science abstract, with paywalled link to the paper
Free preview of the paper
Supplement on the data and methodology


Long critical review of the paper and its reporting: David Gorski
Discussion of the reporting: Andrew Maynard, Science-Presse (in French)
Criticism of the interpretation of correlation: Guardian, statsguy, Antonio Rinaldi (in Italian), with his own model, me, with a toy model
Criticism of the correlation calculation: StatsChat
Criticism of the clustering methodology: Understanding Uncertainty (with discussion of the reporting), statsguy, me, with discussion of the methodology generally
Criticism of the message: Cancer Research UK (with discussion of the reporting and the paper)
Expressing doubts about the accuracy of the data: Paul Knoepfler

A few comments on the paper: Science

Support for the paper: Steven Novella
Support for the message: PZ Myers, expressing disdain for those reluctant to accept the role of random chance

Monday, 5 January 2015

Cancer risk - an analysis

My previous post discussed this paper, and its claim that two thirds of cancer types are largely unaffected by environmental or hereditary carcinogenic factors.  While I'm unimpressed by the paper, the idea behind it is interesting, so here's my analysis of its data.

The hypothesis is that "many genomic changes occur simply by chance during DNA replication rather than as a result of carcinogenic factors.  Since the endogenous mutation rate of all human cell types appears to be nearly identical, this concept predicts that there should be a strong, quantitative correlation between the lifetime number of divisions among a particular class of cells within each organ (stem cells) and the lifetime risk of cancer arising in that organ."

So let's suppose that each stem cell division gives rise to cancer with a small probability p.  Then if there are n lifetime divisions, the probability that none of them leads to cancer is (1-p)^n, so the lifetime risk of cancer, R, is 1 - (1-p)^n.  We can rearrange that to find an expression for p: ln(1-p) = ln(1-R)/n.  For very small p, ln(1-p) ≈ -p, so p ≈ -ln(1-R)/n.  If we plot ln(1-R) against n we should expect to find that for all the organs where carcinogenic factors are absent the values fall on the same straight line through the origin.

However, the values of n range through several orders of magnitude, so we can't create this plot unless we're willing to make all the rare cancers invisibly close to the origin.  Instead, let's take logs again, giving log(p) = log(-ln(1-R)) - log(n).  So on a graph of log(-ln(1-R)) against log(n), all the cancers satisfying our hypothesis should fall on a straight line with slope one crossing the y axis at log(p).  (I've switched to base-10 logarithms for this step, to make the powers of ten easier to follow.)
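The calculation above can be sketched in a few lines of Python.  This is only an illustration: the function names are my own, and the sample risk and division count are invented, not taken from the paper's supplement.

```python
import math

def implied_p(lifetime_risk, divisions):
    """Per-division probability implied by R = 1 - (1-p)^n.

    Uses the small-p approximation ln(1-p) ~ -p, so p ~ -ln(1-R)/n.
    """
    return -math.log1p(-lifetime_risk) / divisions

def plot_coordinates(lifetime_risk, divisions):
    """Return (x, y) = (log10(n), log10(-ln(1-R))) for the log-log chart."""
    x = math.log10(divisions)
    y = math.log10(-math.log1p(-lifetime_risk))
    return x, y

# Hypothetical inputs, for illustration only (not the paper's data).
R, n = 0.01, 1e10
p = implied_p(R, n)
x, y = plot_coordinates(R, n)

# y - x recovers log10(p): points sharing one p lie on a slope-one line.
print(p, x, y, y - x)
```

Any cancer types sharing the same p would give the same value of y - x, which is what "a straight line with slope one" means on this chart.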

Here's the graph, which looks not unlike the one in the paper.  The correlation between the x and y data series is 0.787, again not unlike in the paper.  But the slope of a line through the points is not unity, nor is there a subset of points at the bottom of the envelope of points for which the slope is unity.

(I've arbitrarily given FAP colorectal a cancer risk of one millionth less than one, because the method doesn't allow a risk of exactly one.  Its point could be moved vertically by choosing a different number.)

To explore further how well the data fit the model, I've backed out implied values of p for each cancer type.

Here's the problem.  If the data matched the theory, there would be a group of cancer types at the left end of the chart with similar implied probabilities.  It seems in particular that the risk of small-intestine adenocarcinoma is anomalously low.

[A commentator points out that there is a group of cancer types near the left end of the chart which do have similar implied probabilities (the same eight cancers lie roughly in a straight line in the scatter plot).  But the theory in the paper is that there's a background rate of cancer in any tissue type, depending only on the number of stem cell divisions, because "the endogenous mutation rate of all human cell types appears to be nearly identical".  This theory can't be casually modified to allow for a background rate of cancer in all tissue types except for in the small intestine.  (Oncologists are of course aware that small-bowel cancers are strangely rare.)]

Let's try an alternative theory: that for every tissue type, some fraction of stem cell divisions, call it α, are affected by environmental or hereditary influences in a way which gives them a probability, call it q, of causing cancer.  q is the same for all tissue types.  The remaining divisions carry negligible risk by comparison.  Somewhat arbitrarily, we'll assume α is one for the cancer with the highest implied probability in our previous analysis: that is, q is equal to the p implied for Gallbladder non-papillary adenocarcinoma.  We can now back out a value of α for each cancer.
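Backing out α is simple arithmetic: if only a fraction α of divisions carry risk q, the implied per-division probability is p = αq, so α = p/q.  A minimal sketch, with an invented function name and invented probabilities standing in for the real implied values:

```python
def implied_alphas(implied_p_by_cancer):
    """Return alpha = p / q for each cancer, taking q as the largest implied p.

    By construction the cancer with the largest implied p gets alpha = 1.
    """
    q = max(implied_p_by_cancer.values())
    return {name: p / q for name, p in implied_p_by_cancer.items()}

# Hypothetical implied probabilities, for illustration only.
example_p = {"tissue A": 4e-10, "tissue B": 1e-11, "tissue C": 2e-13}
alphas = implied_alphas(example_p)

for name, alpha in alphas.items():
    print(name, alpha)
```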

Well, it's a simplistic theory, but it does have the advantage over our previous model that it fits the data.

It seems to me that picking out gallbladder cancer as high-alpha is a plus for this model, because that cancer has a peculiar geographic spread which can only be due to environmental or hereditary factors.

And I've been mischievous.  In this theory, despite the correlation in the input data between stem cell divisions and cancer risk, every cancer is caused by environmental or hereditary factors.

Saturday, 3 January 2015

Science by press release

Yesterday's Times has a front page story "Two thirds of cancer cases are the result of bad luck rather than poor lifestyle choices...". (paywall)

That doesn't match my preconceptions, so I looked for the story online.    The Independent and the Telegraph agree.  So does the Mail.  And the Express. And the Mirror.

Reuters' headline agrees, but its story suggests something a bit different - that two thirds of an arbitrary selection of cancer types occur mainly at random.

The BBC speaks unambiguously of "most cancer types" and so does The Guardian.

The press release which must have given rise to this story features the phrase "two thirds of adult cancer incidence across tissues can be explained primarily by 'bad luck'".  I can't make much sense of "cancer incidence across tissues", so I can't blame the journalists for stumbling over it likewise.  But the reporters who got the story right must have managed to scan down to the paragraph where the press release explains that the researchers "found that 22 cancer types could be largely explained by the “bad luck” factor of random DNA mutations during cell division. The other nine cancer types had incidences higher than predicted by "bad luck" and were presumably due to a combination of bad luck plus environmental or inherited factors."

I emphasize that "two thirds of cancer types" is not at all the same as "two thirds of cancer cases".  Two rare cancers apparently unrelated to environmental factors will count for far fewer cases than one common cancer in the other category.

So what of the paper behind the press release?  Here's the abstract, with a paywalled link to the whole paper.  Or you can 'preview' the paper for free here, to the extent your conscience permits.  Supplementary data and methodology descriptions are here.

The hypothesis behind the paper is that cancer is to a large extent caused by errors arising during stem cell division, at a rate which is independent of the tissue type involved.  The researchers therefore obtain estimates of the lifetime number of stem cell divisions in various tissue types, and plot that against lifetime cancer incidence, obtaining a significant-looking scatter plot (Figure 1 in the published paper).  So far so good.

But they've used a log-log plot, necessary to cover the orders of magnitude variations in the data.  Now, if you think, as the researchers apparently do, that cancer risk is proportional to number of stem cell divisions, it follows that the slope of a log-log plot should be unity.  It isn't: by eye it's more like two thirds.  The researchers, busy calculating a linear correlation between the log values, seem not to have noticed this surprising result.  Instead they square the correlation to get an R^2 of 65%, which may (it's not clear) be the source of the "two-thirds of cancer types" claim.

If so, that claim is based on a total failure of comprehension of what correlation means.  Imagine a hypothetical world in which cancer occurs during stem cell division with some significant probability only if a given environmental factor is present, and that environmental factor is present equally in all tissue types.  In this world cancer incidence across tissue types is perfectly correlated with the number of stem cell divisions, but nevertheless all cancer is caused by the environmental factor.
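The hypothetical world is easy to simulate.  In the sketch below every number is invented: q is the assumed per-division probability when the environmental factor is present, and the division counts are arbitrary.  No cancer in this toy world arises without the environmental factor, yet the correlation between the log values comes out very close to one.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Assumed, illustrative values: cancer arises ONLY via the environmental
# factor, with the same per-division probability q in every tissue.
q = 1e-13
divisions = [1e6, 1e8, 1e9, 1e10, 1e11, 1e12]

# Lifetime risk R = 1 - (1-q)^n, computed stably for tiny q.
risks = [-math.expm1(n * math.log1p(-q)) for n in divisions]

log_n = [math.log10(n) for n in divisions]
log_r = [math.log10(r) for r in risks]

r = pearson(log_n, log_r)
print(r, r ** 2)
```

A high R^2 here says nothing about how much cancer is due to chance, because the environmental factor never entered the calculation.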

It's simply impossible to say anything about the importance of environmental factors in a statistical analysis without including those factors as an input to the analysis.

However, the press release also features the paragraph I quoted about 22 out of 31 cancer types being largely explained by bad luck.  Perhaps that's what they mean by two thirds.  To get this number, they devised an Extra Risk Score - ERS for short.  Then they used AI methods to divide cancer types into two types based on the ERS values.  So what's the ERS?  The Supplement describes it as "the (negative value of the) area of the rectangle formed in the upper-left quadrant of Fig. 1 by the two coordinates (in logarithmic scale) of a data point as its sides." That is, it's the product of the {base-10 logarithm of stem cell divisions} and the {base-10 logarithm of lifetime cancer risk}.   (The cancer risk logarithm is negative (or zero) since lifetime risk is less than (or equal to) one.)

Shorn of the detail, it's the product of two logarithms.  How does that make sense?  Multiplying two logarithms is bizarre; for all ordinary purposes you're supposed to add them.  For this analysis, a simple measure would seem to be the ratio of lifetime incidence to stem cell divisions, or you might prefer the log of that ratio, which would be the log of the incidence minus the log of the stem cell divisions.
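To see why the product is an odd choice, compare it with the log of the ratio on two made-up tissues.  The function names and the numbers are mine, chosen only to make the arithmetic easy; the ERS formula follows the description quoted above, not any code from the Supplement.

```python
import math

def ers(divisions, lifetime_risk):
    """Product of the two base-10 logs, per the Supplement's description of ERS."""
    return math.log10(divisions) * math.log10(lifetime_risk)

def log_risk_per_division(divisions, lifetime_risk):
    """log10(risk / divisions) = log10(risk) - log10(divisions)."""
    return math.log10(lifetime_risk) - math.log10(divisions)

# Two hypothetical tissues (divisions, lifetime risk), for illustration only.
tissue_a = (1e12, 1e-2)   # many divisions, higher risk
tissue_b = (1e8, 1e-3)    # fewer divisions, lower risk

print(ers(*tissue_a), ers(*tissue_b))
print(log_risk_per_division(*tissue_a), log_risk_per_division(*tissue_b))
```

Both tissues get the same product (12 x -2 and 8 x -3), although tissue B's risk per division is a thousand times that of tissue A.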

(On further reflection, the number I'd use would be {log(1-incidence)/divisions}.  That doesn't give a defined answer for lifetime incidence of unity, but you can get a number by using an incidence of just less than unity.  Among the other cancer types, it picks out gallbladder cancer as having the highest environmental or hereditary risk, which is consistent with that cancer's unusual geographical variation of incidence.)

The Supplement attempts to justify multiplying the logarithms by explaining why dividing them wouldn't make sense.  Which is a bit like advocating playing football in ballet shoes because it would be foolish to wear stilettos.

Whatever ERS calculation you used, the clustering method would still divide the cancers into two groups, because that's what clustering methods do, but different calculations would put different cancer types in the high-ERS group.  If you want, as the senior author does, to draw conclusions from the composition of the high-ERS cluster, you need a sound justification for your ERS calculation.


To its credit, The Guardian has published a piece pointing out the correlation misunderstanding.  This piece is also highly unimpressed by the paper, and this review of it has mixed feelings.

Me, I suppose the underlying idea has some truth in it.  But the methodology is the worst I've ever seen in a prominently published paper.

Update: more commentary from Understanding Uncertainty and StatsGuy 

Update: Bradley J Fikes, author of this piece in the San Diego Union-Tribune, complains in comments here that the title of this post is ill-chosen.  He points out that he didn't write his story simply from the press release, but checked it with Johns Hopkins before it was published.  He's got a point about the title: more than half of what I say here is criticism of the paper not of the press release.

Wednesday, 8 October 2014

Income Tax Chart 2014-15

This is an update to the charts I posted three years ago.


Tax + Employee's National Insurance:

The changes from three years ago are:

- The personal allowance has increased significantly, to £10,000.  This has the side-effect of increasing the width of the trap at an income of £100,000, where the personal allowance rolls off.  All the other income tax thresholds are unchanged (in cash terms) except that the 40% rate now starts a few hundred pounds earlier.

- The top ('additional') income tax rate has been reduced from 50% to 45%.

- The minimum income for NI contributions has not increased in line with the personal allowance, creating a kink at the left hand side of the Tax+NI chart.

I'll repeat what I wrote last time: In a rational world, we would abandon the artificial distinction between Income Tax and National Insurance. We would determine an appropriate shape for the marginal tax curve - preferably piecewise linear and non-decreasing. And we would apply a multiplier to that curve to raise what revenue the government deemed appropriate.

Monday, 6 October 2014

Dies Mali

Economics is widely called "The Dismal Science", not least by its practitioners.  The term seems to apply naturally to a field which treats avarice as the key motivator in human affairs, contrary to our sense of our personal high-mindedness.

I've been aware that the phrase originated with Thomas Carlyle, the 19th-century writer best known for his history of the French Revolution - he memorably and unflatteringly described Robespierre, known to his contemporaries as l'Incorruptible, and remembered today for his enthusiasm for political executions, as "the sea-green incorruptible".  But recently I learnt from Robert Dixon, via Derek Thompson and Chris Dillow, the context in which Carlyle came up with the phrase.  He introduced it, in contrast to the 'gay science'* of poetry, in his 1849 essay Occasional Discourse on the Negro Question (original here, more readably here, historical background here).
...the social science - not a 'gay science,' but a rueful - which finds the secret of this universe in 'supply-and-demand,' and reduces the duty of human governors to that of letting men alone, is also wonderful. Not a "gay science," I should say, like some we have heard of; no, a dreary, desolate and, indeed, quite abject and distressing one; what we might call, by way of eminence, the dismal science.
Carlyle's subject matter was not economics but the use of slave labour on sugar plantations in the Caribbean, which had been banned in 1833.  He was being intentionally provocative - the essay was originally published anonymously, purportedly transcribed in an elaborate fiction by a Dr Phelim M'Quirk, and evoked a vigorous riposte, The Negro Question, from John Stuart Mill (original here, more readably here).  (None of this old-fashioned debate is ancient history: Mill lived long enough to be briefly godfather to Bertrand Russell, who was the grandson of the Lord John Russell, then prime minister, upbraided by Carlyle in his essay.)

By 1849 British sugar plantations in the Caribbean were struggling: slave labour was no longer legal, and the 1846 Sugar Duties Act had started the process of removing the preferential tariffs protecting the plantations from competition from countries where slavery was still legal, and from European sugar beet.  (At the end of his essay, Carlyle tacitly acknowledges the problem of competition by advocating the sending of gunships to stop shipments of African slaves to Cuba and Brazil.)   The former slaves, Carlyle complains, were unwilling to cut sugar cane for the wages on offer, preferring to work growing pumpkins for themselves.

The economists' solution to the problem of insufficient labour was to increase supply.  Plantation owners tried bringing in indentured labour from India - a practice strongly opposed by the anti-slavery movement - but Carlyle says the economists had recommended importing more (free) African labour.  He points out that newly arrived Africans preferred growing pumpkins just as much as did the original freed slaves, and that the only way to make them work on the sugar plantations would be to bring in so many Africans that they would otherwise starve - he makes a comparison with the Great Famine then causing mass emigration from Ireland.  But he doesn't admit the implication that the only thing that could make a man choose to work under the pay and conditions of a sugar plantation was starvation.

Carlyle's proposal was that the Africans should be compelled to work on the plantations "with beneficent whip".  The economist on the other hand would say that sugar should be grown on the plantations only if it were profitable enough that an attractive wage could be paid.  The plantations might be protected by tariffs from competition from slave countries, but if they were unable to compete with European sugar production from beet, it would be best to let the Europeans grow sugar for us and grow other crops in the Caribbean.  Pumpkins, if nothing else, had shown themselves capable of paying their way.  Whereas Carlyle seemed to be speaking for the owners of sugar plantations, the economist would have in mind the interests of humanity in general.

The true dies mali - evil days - were those of slave labour enforced by men with whips.  Economics might well follow Carlyle in calling itself the 'rueful science', but it's Carlyle's racism which was dismal.


*The Consistori del Gay Saber ("Consistory of the Gay Science") was an academy of poetry founded in Toulouse in 1323 to perpetuate the lyric school of the troubadours.  In the early 1840s Ralph Waldo Emerson described himself as "a Professor of the Joyous Science", and the term was revisited explicitly by E. S. Dallas in his The Gay Science (1866).  Nietzsche adopted it for his Die fröhliche Wissenschaft in 1882.

Wednesday, 24 September 2014

How to fund drug research

Healthcare costs money.  To decide whether to spend that money, we have to go through the unattractive process of putting a price on life and health.  In writing what follows, I remain painfully aware that I am talking about human beings who love and are loved.

We have a bottle of pills which will improve a particular patient's health by an amount we value at $10,000.  The bottle of pills costs $100 to produce.  One way or another the patient has $100 to pay for it.  So he buys the pills, takes the pills, and the world is $9,900 better off.

But it doesn't work like that.  It costs a lot of money to develop a new drug - $1bn is a round-number estimate.  That cost is paid for through the price of the pills during the period of patent protection.  Patents in the USA and EU last for twenty years, but much of that period can be taken up with testing and licensing.  (An extension of up to five years is available to compensate for the delay.)  Let's assume that leaves ten years (another round number) for the drug company to make some money selling the drug.  And let's suppose the drug could benefit 10,000 patients a year.  That's 100,000 patients over the ten years; to get a billion dollars out of them you need to charge $10,100 for the bottle of pills.  Unfortunately, for half the patients there's no chance at all of their paying that much.  And you need to run a $1bn advertising campaign to persuade the other patients and their doctors to use the drug.  So now you need to charge $40,100 for the bottle.  Which makes giving the drugs to the patient we started with a heavily losing proposition.
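For anyone who wants to check the arithmetic, here it is as a short Python sketch.  The figures are the round numbers from the paragraph above; the function name is just for illustration.

```python
def price_per_bottle(fixed_costs, paying_patients, unit_cost):
    """Price needed to recover fixed costs from the patients who actually pay."""
    return fixed_costs / paying_patients + unit_cost

development = 1_000_000_000   # $1bn development cost
marketing = 1_000_000_000     # $1bn advertising campaign
patients = 10_000 * 10        # 10,000 patients a year for ten years
unit_cost = 100               # manufacturing cost per bottle

# Everyone pays, no marketing needed.
everyone_pays = price_per_bottle(development, patients, unit_cost)

# Half the patients can't pay, and the campaign doubles the fixed costs.
half_pay = price_per_bottle(development + marketing, patients // 2, unit_cost)

print(everyone_pays)  # 10100.0
print(half_pay)       # 40100.0
```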

Sometimes we do unattractive, even inhumane things for sound economic reasons - devoting resources to where they can do most good.  But this situation is economic madness.  We have created a system where a transaction which would make the world $9,900 richer can't happen.

Let's clear our minds of any destructive obsession with the granting of patent monopolies, and think of a way to pay for the development of new drugs which will encourage the development of the new drugs we want without making it impossible to use them to best advantage.

First, we abolish all patent restrictions on drug manufacture - we allow any manufacturer to make and distribute any licensed drug, subject only to regulatory controls on quality.  Instead, we will award patent holders a share of a global drugs fund, according to how much good their patented drug is doing.

What should be the measure of the value of a drug?  It should be how many people take it multiplied by how much they benefit.  And we should assess that benefit relative to the previous best treatment.  NICE in the UK uses the Quality Adjusted Life Year as the basis for its decisions on what drugs to fund.  Personally, I'm not comfortable with the assumption that one person's life is worth more than another's just because they're in better health, and I'd be content to use unadjusted life years.  However, it's important that the measure we use should reflect the impact of side effects, so that a treatment which is as effective as another but with milder side effects can be properly rewarded.

In any case we assume that a year of healthy life is worth the same for everyone in the world, rich or poor.

We establish an international body to assess the expected life-year benefit of each drug in each class of patient, using published data.  We log the use of the drugs.  And we award benefit years to each patent holder accordingly.

There's one wrinkle, which is that the patent holder of the drug actually used gets only the marginal benefit value over the previous best treatment, and if that previous best treatment is still in patent its patent holder is assigned its marginal benefit over its previous best treatment, and so on.   The point of this is that the developer of a slightly superior "me-too" drug gets paid only for the drug's slight superiority.  (There's the complication that an alternative drug may be much better for a particular patient, perhaps because of idiosyncratic side effects: we would need a mechanism for the doctor to record this so that the benefits can be assigned appropriately.)
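The chain of marginal benefits could be worked out along these lines.  This is only a sketch under my own assumptions: the company names and life-year figures are invented, each benefit is measured against no treatment at all, and the chain stops at the first treatment that is out of patent.

```python
def assign_marginal_benefits(chain):
    """Split one patient's benefit along a chain of treatments.

    `chain` runs from the drug actually used back through each previous
    best treatment, as (holder, life_years_benefit, in_patent) tuples.
    Each in-patent holder is credited with its benefit minus that of the
    next treatment in the chain; the walk stops at the first treatment
    that is out of patent.
    """
    credits = {}
    for i, (holder, benefit, in_patent) in enumerate(chain):
        if not in_patent:
            break
        previous = chain[i + 1][1] if i + 1 < len(chain) else 0.0
        credits[holder] = credits.get(holder, 0.0) + (benefit - previous)
    return credits

# Hypothetical chain, for illustration only.
chain = [
    ("MeToo Pharma", 2.1, True),   # drug actually used
    ("Pioneer Labs", 2.0, True),   # previous best, still in patent
    ("Generic", 0.5, False),       # older off-patent treatment
]
credits = assign_marginal_benefits(chain)
print(credits)
```

Here the me-too developer is credited with about 0.1 life years and the pioneer with 1.5, which is the outcome the scheme is after.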

Now we need a big pot of money to feed all those billions of dollars to the patent holders.  That's not an impossible problem: global drugs spending is currently about a trillion dollars, of which about three quarters is spent on branded drugs, almost all of it in the developed nations.  The price of the branded drugs will fall precipitously once all drug manufacturing becomes generic, freeing up perhaps $700 billion a year of healthcare spending.  We just have to collect that up and distribute it to the drug companies according to their logged QALY contributions.

Individual governments would be responsible for gathering the money by whatever mechanism works best for them.  Initially, the amount would be assessed according to the savings in each country as prices fall.  Over time we would transition to a specified share of GDP from each country.

For distribution, the money could either be treated as a pool to be distributed in proportion to QALY contributions, or a price per QALY could be set, and funding managed to fulfil it.  The latter would require more financial structuring, but be simpler for drug developers.

Who would gain and lose out of this scheme?  Marketing spending on drugs would fall dramatically - there would be little gain in promoting a drug unless its life-year contribution had been established, and little need to do so once it had.  So it would be bad for the marketing guys.

It would be very good for people in poor countries who are vulnerable to diseases not affecting the rich - sleeping sickness for example.  Because their lives would become as valuable for drug development purposes as anyone else's.

That implies that relatively less effort would go into developing drugs for the rich.  But that needn't be the case in absolute terms, because of all the marketing money being saved.  And the drugs being developed would at least be the ones with the biggest life-year benefits.  (I suppose it would be possible for opponents of this scheme to point to some useful drug which might not have been developed under it: it's inevitable that making the best decision a priori will occasionally result in an inferior decision a posteriori.)

Everyone everywhere would be able to get the drugs they need so long as they or their insurance or their national health service could cover the manufacturing cost of the drug.  In the UK we would no longer be making hard decisions about whether to fund a drug costing tens of thousands of pounds to extend the life of cancer patients by a few months - if the drug worked, we'd use it.

The big weakness is that we would be creating a body of experts whose decisions would determine the destination of hundreds of billions of dollars.  There could be a lot of money available to suborn them.  To make that harder, its decision making process needs to be open: all the quality adjustments should be published, all the criteria for an acceptable drug trial should be published, and all the (anonymized) trials data should be published, so that every calculation is repeatable.

This body could use its power for good: one of the major weaknesses of drug research is that unfavourable results get buried.  By requiring all trials it considers to have been registered with it in advance, and by requiring all registered trials to report results, it can end the cherry-picking which bedevils drug research.

One of the defects of the current licensing system is that a patent owner need not seek a licence for all uses of a drug, so long as it is licensed for one use, and so can suppress adverse findings for an 'off-label' use - notoriously GSK withheld results suggesting that Paroxetine should not be prescribed to adolescents.  This scheme would end that - a drug's developer would not be rewarded for off-label uses, and would therefore be incentivized to seek to demonstrate the effectiveness of its drug for any widespread use.



For a change I've left links out of the main text: I'll gather them here instead.

The cost of a new drug
Marketing spending compared with development spending

Discussion of the EQ-5D QALY measure used by NICE

Global spending on drugs

US proposal for a Medical Innovation Prize Fund
Joseph Stiglitz on prizes instead of patents

GSK and Paroxetine
Bad Pharma by Ben Goldacre

Thursday, 18 September 2014

Hell and high water

Voting is under way in the independence referendum in Scotland.  The media and politicians keep telling us how important a decision it is, so naturally I ask myself what they're lying about.  Westminster politicians of all parties have promised more devolution if the answer is 'no', whereas the (devolved) Scottish government intends to "work in partnership with the rest of the UK" if the answer is 'yes'.  What difference does it make which way the vote goes?  If the vote is 'yes' (the bookies are quoting 4 to 1 against that) there will be 18 months of negotiations to decide what actually happens, so we don't know.  But we can at least look at what the parties say...

An independent country is responsible for its own foreign policy and defence.  The Scottish government intends to stay in the EU and NATO: the only concrete change would be that "we would make early agreement on the speediest safe removal of nuclear weapons a priority. This would be with a view to the removal of Trident within the first term of the Scottish Parliament following independence."  There's no good alternative to Faslane as a base for Trident-carrying submarines, so if the Scots stick to this, the logical choice for the (rest of) UK government would be to abandon plans to replace the current system, and plan to phase out the system from 2016.  They hate that option*, so they'll make continued use of Faslane a priority in the negotiations.  It's less important to the Scottish government, so I'd expect the roUK to win this point.

Scotland needs a currency: the Scots government says it will keep the pound.  No one can stop any country using any currency it chooses, so that's certainly possible.  Osborne says he won't agree to any sort of currency union, which means that Scotland would get no say in UK monetary policy, instead of almost no say as at present.  It could try to negotiate a seat for itself on the BoE's Monetary Policy Committee - I don't see why the roUK shouldn't agree to that, since the Scot could be outvoted 8-1.  But much more importantly, the Bank of England would not act as a lender of last resort to Scottish banks.  So if Scotland wants any sort of financial sector, it would need to set up its own central bank.  However, the Scottish government blithely claims that "The Bank of England, accountable to both countries, will continue to provide lender of last resort facilities".  I don't think Faslane is enough of a bargaining chip for it to win this one.

Scotland would get most of the oil.  That would make Scotland a bit richer than the rest of the UK.

Scotland would get some of the debt, and it would find itself borrowing to fund the debt at a slightly higher interest rate.

Sorting out the separation would be difficult and expensive.  And in the end, there would be more politicians and civil servants, costing more money.

Scotland would get its own team in the Olympics.

Scotland would no longer send MPs to Westminster.  The Conservatives won no MPs in Scotland in the 1997 election, and have won one in each general election since, with the result that Labour gets a net gain of about 40 MPs.  Without Scotland, the Conservatives would have a majority in the House of Commons, and the Liberal Democrats would not be in opposition as usual.  However, the overall result of most UK general elections would be unaffected.  Over time, one might expect the political centre to shift a little to the right as Labour strives to improve its electoral prospects.

Overall, independence would make some difference.  Arguments for it are:
- smaller states are more democratic
- it's a cleaner constitutional settlement than increased devolution, in which the role of Scottish MPs in Westminster would be questionable.
- if you don't want Trident renewed, you might get your way
- if you're a Scottish athlete not quite good enough for the UK team, you get to go to the Olympics anyway
- if you're a Conservative politician in the rest of the UK, you're more likely to get into government.
- if you're Scottish, your country gets the government share of future oil revenues

Arguments against it are:
- separating two countries is difficult and expensive
- if you want Trident renewed, you'll run into difficulties
- the Olympic ceremonies will get that bit longer, and the roUK will win fewer medals
- if you're a Labour politician in the rest of the UK, you're less likely to get into government.
- if you're in the roUK, your country loses the government share of future oil revenues.  And if you're Scottish, isn't it a bit tawdry to demand independence just because you've lucked into some mineral resources?
- we'd need to think of a better name than 'roUK' for the rest of the United Kingdom.

* Because having nuclear weapons makes politicians more important globally.  Or perhaps because they think Trident is needed to stop Putin invading.