Randomization

William Sealy Gosset aka Student

Randomization in Theory and Practice: Reply to Jed Friedman

By Stephen T. Ziliak and Edward R. Teather-Posadas

April 25, 2014

 The case against the [randomized controlled Chinese Eyeglass] trial is that we no more need a randomised trial of spectacles than we need a randomised trial of the parachute.

        – Tim Harford, Financial Times

A study we did on the ethics and econometrics of randomized trials, titled “The Unprincipled Randomization Principle in Economics and Medicine,” has generated a number of helpful replies. (See, for example, “The Economics of Randomized Experiments,” by Casey Mulligan for The New York Times, and “The Random Risks of Randomized Trials,” by Tim Harford for the Financial Times.) Other replies we’ve received are mostly emotional, lacking substantive engagement with logic and facts.

A recent reply from World Bank Senior Economist Jed Friedman, “Should impact evaluation be justified by clinical equipoise or policy equipoise?”, published in The World Bank’s Development Impact blog (April 17th 2014), falls into the latter camp.

Friedman is all steamed up, which is weird, as he seems to agree with Ziliak and Teather-Posadas (ZTP) on at least five of the main points we make in our study of randomization in theory and practice.  Friedman agrees with us that:

1) Field experiments in economics ought to “control for imbalance of influential unobserved characteristics”;

2) Experiments should be “very concerned with the balance of possibly correlated observable characteristics”;

3) That “it is far from clear that clinical equipoise should constitute the guiding ethical principle in the design of social policy research”;

4) And: “From the standpoint of policy, the salient question is not whether an intervention produces any benefit, but how much benefit and delivered at what cost.”

(We do not understand, however, Friedman’s qualification of claim number 4, where he writes in the very next sentence that “While clinical medicine is chiefly concerned with the improvement of health, the motivation for all economic inquiry comes down to the founding observation that resources are finite.”)

Regardless, the unbiased reader will wonder what the fuss is about. With such sizeable agreement, why does Friedman find the paper by Ziliak and Teather-Posadas “so painful,” as he put it, “in parts, to read”? Why, the reader wonders, did Friedman react to the ZTP paper with a sneering, sarcastic tone?

The plaque hangs in the Barley Room of The Guinness Storehouse. But Gosset was “Head Brewer”; he never worked as a chemist. Photo by Steve Ziliak, July 2008

Randomization is not the gold standard.  The reason is in your pint of Guinness!  Photo by Steve Ziliak, July 2008

The agreement between us is large, converging toward consensus. For example, after raising false doubt, Friedman eventually concedes our point about the lack of ethics involved in not giving prescription eyeglasses to sight-defective Chinese schoolchildren in the control group of the Glewwe et al. study:

5) “Let’s leave aside the particular question,” Friedman writes, speaking of the eyeglass/school achievement study by Glewwe et al., “of whether you or I believe the eyeglass study is ethical or not – maybe there are ways to assess the cost-effectiveness of this program without a leave-out control group.”

Maybe?  “Maybe there are ways” to test the efficacy of prescription eyeglasses on school performance “without a leave-out group”? Maybe is a word he could have dropped, and should have. Do Friedman, Glewwe et al., and others at The World Bank seriously doubt that a pair of prescription eyeglasses brings at least $15.00 (U.S. 2004 nominal) of total benefit to each sight-defective schoolchild, and thus to the Chinese economy, a benefit equal to or greater than the average cost of the eyeglasses? One of us needs a pair of eyeglasses just to find another pair of eyeglasses that went missing, and he is not alone in the world of sight defectives.  Q.E.D.

But anyway, why, and employing what ethical principle, ought one to “leave aside” the question of the ethical treatment of impoverished, sight-defective schoolchildren living in a developing nation? As Tim Harford wrote in the Financial Times, “There are perils to treating patients not as human beings but as means to some glorious end.”

Gosset aka Student: drawings of kurtosis

The ethical question is the question which Friedman and all researchers are supposed to face.  The charge we were given by the editors of the Oxford Handbook of Professional Economic Ethics (Oxford University Press, 2014) was to survey, analyze, and evaluate the ethics and economics of randomized controlled trials.  (The editors are George DeMartino and Deirdre N. McCloskey.) That is what we have done, but for some reason Friedman, joined by other randomizing economists, wants to “leave aside” the ethics question.  Or perhaps Friedman believes, with at least one of his colleagues at The World Bank, a self-proclaimed “consequentialist” of our acquaintance, that the ends justify the means.  End of story, as Chairman Mao would say, despite the large number of untreated Chinese schoolchildren who can’t read the writing on the blackboard.

The real point at issue seems to be about the extent of statistical and ethical malpractice in randomized trials, which Friedman denies and we find to be oomphfully large.

Friedman and other critics have not examined the facts.  In our study we applied a 25-question survey of randomization, statistical significance, and validity to all of the full-length articles using randomization in the pages of the American Economic Review, 2000-2009, and the New England Journal of Medicine, 2000-2003.  The entire survey focuses on size-matters/how-much questions in the spirit of Gosset aka Student (1938, Biometrika) and Ziliak and McCloskey (2008, The Cult of Statistical Significance).  The fact is that none of the AER papers (0%) offered data showing the extent of balance or imbalance of their experimental design and results.  Friedman, who mentions neither our survey nor any other, asserts nevertheless that “Many, if not most, randomized designs are stratified random designs; some take Student’s advice to the extreme with pair-matched randomization.”  Adding: “Hence RCTs are typically very concerned with the balance of possibly correlated observable characteristics.” But he has never before met a randomista, he claims.
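To make concrete what “offering data showing the extent of balance or imbalance” would look like, here is a minimal sketch, in Python, of a baseline balance table. The data and covariate names are hypothetical, and the standardized difference shown is one common reporting convention, not a reconstruction of any surveyed paper’s procedure.

```python
# A minimal sketch of a baseline balance table for a two-arm trial.
# All data and covariate names here are hypothetical; the standardized
# difference is one common way to report "how balanced" the arms are.

from statistics import mean, stdev

treatment = {"age": [10, 11, 12, 10, 13, 11],
             "baseline_score": [61, 55, 70, 64, 58, 67]}
control = {"age": [10, 12, 14, 11, 13, 12],
           "baseline_score": [52, 60, 66, 49, 63, 57]}

def standardized_difference(x, y):
    """Difference in means, scaled by the pooled standard deviation."""
    pooled_sd = ((stdev(x) ** 2 + stdev(y) ** 2) / 2) ** 0.5
    return (mean(x) - mean(y)) / pooled_sd

print(f"{'covariate':<16}{'treat mean':>12}{'control mean':>14}{'std. diff':>11}")
for name in treatment:
    d = standardized_difference(treatment[name], control[name])
    print(f"{name:<16}{mean(treatment[name]):>12.2f}"
          f"{mean(control[name]):>14.2f}{d:>11.2f}")
```

A table of this kind, for both observables and results, is the sort of evidence on balance that none of the surveyed AER papers supplied.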

Friedman seems to agree with us that randomizers ought to take Student’s advice.  (Though he calls “extreme” Student’s method of paired matching; go figure.) But his assertion that randomistas habitually balance is not grounded in anything more than fancy, a random coin flip.  Put it this way: if randomized trials were obsessed with balance, as Friedman claims, the trials would be called balanced controlled trials and randomistas would be called Guinnessometricians.
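Student’s paired matching, which Friedman calls “extreme,” takes only a few lines to illustrate. The sketch below, again in Python and with made-up units, sorts on a single baseline covariate, forms adjacent pairs, and flips a coin within each pair; it illustrates the general idea only, not Gosset’s own procedure or that of any particular trial.

```python
# Illustrative pair-matched assignment: sort units on a baseline covariate,
# form adjacent pairs, and randomize treatment within each pair, so the two
# arms stay roughly balanced on that covariate by construction.

import random

units = [("u1", 48), ("u2", 71), ("u3", 55),
         ("u4", 63), ("u5", 50), ("u6", 69)]  # (id, hypothetical baseline score)

random.seed(1938)  # seed chosen only to make the example reproducible
ranked = sorted(units, key=lambda u: u[1])           # order by baseline score
pairs = [ranked[i:i + 2] for i in range(0, len(ranked), 2)]

assignment = {}
for a, b in pairs:
    treated, untreated = random.sample([a, b], 2)    # one coin flip per pair
    assignment[treated[0]] = "treatment"
    assignment[untreated[0]] = "control"

print(assignment)
```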

Friedman believes he is contradicting Ziliak and Teather-Posadas when he asserts:

“From the standpoint of policy, the salient question is not whether an intervention produces any benefit, but how much benefit and delivered at what cost.”

We agree.  That is exactly what we say in our 25-question survey and in our discussion of the survey results. And the “size matters/how much?” question is exactly what Ziliak has been recommending, together with Deirdre McCloskey, in more than two decades of research and in a full-length book on the topic, The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives (2008).
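The “how much benefit and delivered at what cost” question is a question about magnitudes and intervals, not about p-values alone. A minimal sketch, with made-up numbers, of the kind of oomph reporting we have in mind:

```python
# Hypothetical test-score data. The policy question is not "is the effect
# statistically significant?" but "how large is it, and is that size worth
# the cost of delivering the intervention?"

from statistics import mean, stdev

treated = [68, 74, 71, 77, 65, 72, 70, 75]
control = [66, 69, 64, 71, 62, 68, 67, 70]

effect = mean(treated) - mean(control)   # estimated benefit, in score points
se = (stdev(treated) ** 2 / len(treated)
      + stdev(control) ** 2 / len(control)) ** 0.5
low, high = effect - 1.96 * se, effect + 1.96 * se   # rough 95% interval

cost_per_child = 15.00  # purely illustrative, echoing the $15 figure above
print(f"Estimated effect: {effect:.1f} points (95% CI {low:.1f} to {high:.1f})")
print(f"Policy question: is {effect:.1f} points worth ${cost_per_child:.2f} per child?")
```

The interval and the cost comparison, not the significance test, are what bear on Friedman’s own “how much benefit and delivered at what cost” question.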

Friedman struggles to defend the Glewwe et al. study, which invokes the randomization principle to justify the authors’ decision to leave untreated thousands of sight-defective schoolchildren, though it was clearly within the budget to treat them.  We understand the struggle. The authors were not inclined to treat the control group, though the economics of opportunity cost (not to mention the ethics of the impartial spectator and of attending physicians) would direct them to.

In truth, data were dropped from the 19,000-student study for fear of spoiling the randomized experimental design with a leave-out control group. “Unfortunately,” Glewwe et al. write, “in a few cases students in control townships were given eyeglasses because, after providing the eyeglasses in the treatment townships, the remaining funds were used to buy eyeglasses for students with poor vision in the paired control township. This occurred in two control townships in Tianzhu and three control townships in Yongdeng” (Glewwe, Park, and Zhao 2012, p. 8).  In the remaining townships, the authors of the China study state, the “randomization was done according to the plan” (Glewwe et al., p. 8). But one ought to question a practice which elevates abstract method over ethics and over the chance to really help.  Friedman does not.

On the point about equipoise, we wonder whether Friedman, joined by Ravallion, Glennerster, and Glewwe et al., among others, would be willing to consult eye doctors about the efficacy of eyeglasses.  What about contact lenses?  Why not test eyeglasses against contact lenses, to see which helps more at school, and by how much?  Eye doctors and economists alike would be more convinced by such a study, knowing that all of the participants, the controls included, were offered best-practice treatment, the opportunity cost of the treatment in question.

If parents of the children shun the gift, and that was an issue in China, perhaps economists should examine the rhetoric and sociology of the gift and of gift-giving strategy. Have they considered the cultural barriers between American Ph.D. benefactors and impoverished rural Chinese recipients?  Glewwe et al. did not, so far as they say in their paper.  Our survey of the literature suggests that little attention is paid in general to the role of culture clash (or class clash, or ethnic clash) in randomized controlled study participation rates.

Guinness Brewery, St. James’s Gate, Dublin. Photo by Steve Ziliak

Friedman’s mocking tone and numerous little errors suggest that he wrote his reply to our study a bit too hastily.  (In his blog post at The World Bank he said, countering ZTP, that “randomistas” do not exist; at least he has never met one, he said.  But when a supportive comment appeared after his post, he called the author of the comment “A true randomista!”  For Friedman, then, randomistas don’t exist, but he knows one when he sees one. For further proof that randomistas are real and numerous, see Ziliak’s article on the philosophy of John List and Steve Levitt, published in volume 1, issue 1 of the Review of Behavioral Economics.)

We agree on the main theoretical and policy point: that randomization is neither necessary nor sufficient for good economics, policy, and medicine.

Related articles by Ziliak:

The Validus Medicus and a New Gold Standard (The Lancet, 2010)

Significant Errors – Reply to Stephen Senn (The Lancet, 2010)

W.S. Gosset and Some Neglected Concepts in Experimental Statistics: Guinnessometrics II (JWE, 2012)