Editorials and Viewpoints

From Hot Hands to Declining Effects: The Risks of Small Numbers

Michael S. Lauer, MD
The views expressed herein are those of the author, a full-time employee of the NHLBI, and do not necessarily reflect those of the NHLBI, the NIH, or the United States Department of Health and Human Services. Reprint requests and correspondence: Dr. Michael S. Lauer, National Heart, Lung, and Blood Institute, Division of Cardiovascular Sciences, 6701 Rockledge Drive, Room 8128, Bethesda, Maryland 20892.

American College of Cardiology Foundation

J Am Coll Cardiol. 2012;60(1):72-74. doi:10.1016/j.jacc.2012.02.048
Published online

About 25 years ago, a group of researchers demonstrated that there is no such thing as the “hot hand” in professional basketball. When a player hits 5 or 7 shots in a row (or misses 10 in a row), what's at work is random variation, nothing more. However, random causes do not stop players, coaches, fans, and media from talking about and acting on “hot hands,” telling stories and making choices that ultimately are based on randomness. The same phenomenon occurs in medicine. Some clinical trials with small numbers of events yielded positive findings, which in turn led clinicians, academics, and government officials to tell stories and sometimes make choices that were later shown to be based on randomness. I provide some cardiovascular examples, such as the use of angiotensin receptor blockers for chronic heart failure, nesiritide for acute heart failure, and cytochrome P-450 (CYP) 2C19 genotyping for the acute coronary syndromes. I also review the more general “decline effect,” by which drugs appear to yield a lower effect size over time. The decline effect is due at least in part to overinterpretation of small studies, which are more likely to be noticed because of publication bias. As funders of research, we at the National Heart, Lung, and Blood Institute seek to support projects that will yield robust, credible evidence that will affect practice and policy in the right way. We must be alert to the risks of small numbers.

Just over 25 years ago, the cognitive psychologist Amos Tversky and colleagues (1) published a provocative analysis of the “hot hand” in professional basketball. After analyzing thousands of shots, they found that, contrary to popular belief, there is no such thing as a hot hand. Instead, people (players, coaches, fans, and the media) observe variations that are entirely due to random chance and read into them actionable stories. These variations are invariably based on small numbers of events (e.g., 5 consecutive baskets). The study by Tversky et al. was not well received. The legendary Boston Celtics coach Arnold “Red” Auerbach commented, “Who is this guy? So he makes a study. I couldn't care less” (2).

Although mistaking randomness for cohesive stories may seem innocent enough when talking about professional basketball, Tversky's analysis revealed a truth about human psychology that has far-reaching implications. People are reluctant to ascribe unexpected observations to random chance and much prefer to tell stories, stories that, in the words of psychologist and Nobel Prize winner Daniel Kahneman, describe “a view of the world around us that is simpler and more coherent than the data justify. Jumping to conclusions is a safer sport in the world of our imagination than it is in reality” (2).

Fast forward from the classic 1985 paper by Gilovich et al. (1) to the close of 2011, when Holmes et al. (3) published a paper on the purported associations between CYP2C19 genotype and cardiovascular events among patients receiving the platelet antagonist clopidogrel. There has been concern that patients with certain CYP2C19 variants may be less likely to manifest decreased platelet activity with clopidogrel and may therefore be at increased risk for poor clinical outcomes. Despite a great deal of controversy, the Food and Drug Administration issued a “black box warning,” and some companies are selling direct-to-consumer genetic tests. Holmes et al. (3) reviewed 32 studies involving 42,016 patients and 3,545 cardiovascular events and found no association between genotype and risk. But perhaps of even greater interest, the researchers found that an apparent association was noted only in small studies, that is, studies with relatively few events. There was a strong gradient whereby the association weakened as the number of events per study increased. CYP2C19 genotype predicted a substantially increased risk of events in studies with 99 or fewer events (pooled risk ratio [RR]: 1.83), a modestly increased risk in studies with 100 to 199 events (pooled RR: 1.26), and no association in studies with at least 200 events (pooled RR: 0.97) (3). An important likely contributor to this phenomenon is “publication bias,” whereby small studies showing an effect are more likely to be published than are small studies not showing an effect. In any case, just like basketball's hot hand, the phenomenon disappeared when investigators analyzed large samples.
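
The gradient described above can be mimicked with a toy simulation. The sketch below is not the method of Holmes et al. (3); it assumes a truly null association, equally sized genotype groups, and a hypothetical publication rule in which "positive" studies are always published and null studies are published only with probability `p_publish_null`. All study counts and probabilities are illustrative assumptions.

```python
import random


def pooled_rr(events_per_study, n_studies, p_publish_null, rng):
    """Pooled risk ratio across 'published' studies of a truly null association.

    Toy model: the two genotype groups are equally sized and the true risk
    ratio is 1, so each event falls in the carrier group with probability 0.5.
    """
    carrier_events = 0
    noncarrier_events = 0
    for _ in range(n_studies):
        a = sum(1 for _ in range(events_per_study) if rng.random() < 0.5)
        b = events_per_study - a
        # Normal approximation to a one-sided test of "more events in carriers".
        z = (a - b) / events_per_study ** 0.5
        looks_positive = z > 1.645
        # Hypothetical rule: positive studies are always published,
        # null studies only with probability p_publish_null.
        if looks_positive or rng.random() < p_publish_null:
            carrier_events += a
            noncarrier_events += b
    return carrier_events / noncarrier_events


rng = random.Random(2012)  # fixed seed so the sketch is reproducible
small = pooled_rr(events_per_study=40, n_studies=5000, p_publish_null=0.1, rng=rng)
large = pooled_rr(events_per_study=400, n_studies=200, p_publish_null=1.0, rng=rng)
print(f"pooled RR, small studies with selective publication: {small:.2f}")
print(f"pooled RR, large studies, all published: {large:.2f}")
```

Even though no real effect exists in this model, the pooled estimate from selectively published small studies drifts above 1, whereas the large, fully published studies pool close to 1.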

In a sense, the clopidogrel CYP2C19 story is, in baseball legend Lawrence “Yogi” Berra's words, “déjà vu all over again.” In 1997, Pitt et al. (4) published the primary results of the ELITE (Evaluation of Losartan in the Elderly) study, in which 722 patients with heart failure were randomly assigned to captopril or losartan. The investigators powered the trial to test an effect on renal function, but noted an “unexpected” lower mortality rate (17 vs. 32 deaths in the losartan and captopril groups, respectively; RR: 0.54). Fortunately, the investigators cautioned that “whether the apparent mortality advantage … holds true … requires further study” (4). They proceeded to design and perform a second trial (ELITE II), which enrolled 3,152 patients and recorded 530 deaths, >10 times the number of deaths in the original ELITE trial. This time there was no effect of losartan on mortality (280 vs. 250 deaths in the losartan and captopril groups, respectively; RR: 1.13) (5).
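
To see how much uncertainty 49 deaths carry compared with 530, one can compute a crude risk ratio with a 95% confidence interval on the log scale. This is a minimal sketch, not the analysis used in either trial: it assumes equal allocation (361 patients per arm in ELITE and 1,576 per arm in ELITE II), which the text above does not state, so the results will not exactly reproduce the published estimates of 0.54 and 1.13.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A vs. group B with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Assumption: patients were split evenly between losartan (A) and captopril (B).
elite = risk_ratio_ci(17, 361, 32, 361)
elite2 = risk_ratio_ci(280, 1576, 250, 1576)

for name, (rr, lower, upper) in (("ELITE", elite), ("ELITE II", elite2)):
    print(f"{name}: RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under these assumptions the ELITE interval is far wider than the ELITE II interval, and the ELITE II interval includes 1, which is the pattern the two trials illustrate.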

And this past year we have seen at least 1 other prominent example of a large study laying to rest beliefs generated by multiple smaller studies. In 2002, a trial in 489 patients suggested that nesiritide could lead to more rapid resolution of decompensated heart failure (6). In 2005, however, Sackner-Bernstein et al. (7) published an 862-patient meta-analysis that found an increased risk of death within 30 days of therapy; the analysis hinged on 50 deaths. This past year saw the publication of a much larger trial of 7,141 patients, among whom 267 died (8). In this trial, nesiritide did cause hypotension (as noted in prior trials), but had minimal to no impact on resolution of decompensated heart failure or 30-day mortality. In the nesiritide case, a large study refuted beliefs of benefit and concerns about harm.

The “disappearance” of seemingly real effects, as was seen with CYP2C19-clopidogrel and losartan in heart failure, is becoming increasingly familiar to scientists and observers of science. Journalist Jonah Lehrer (9) published an article in The New Yorker on the so-called “decline effect,” with the intriguing headline question, “Is there something wrong with the scientific method?” Lehrer cited a number of examples of effects or associations that disappeared upon further, more rigorous investigation, involving drugs for the treatment of psychosis, the impact of feather symmetry on the behavior of swallows, and a range of topics in ecological and evolutionary biology (10). And, indeed, no less a luminary than the late paleontologist and evolutionary biologist Stephen Jay Gould fell into the trap of small numbers when he attempted to synthesize reasons why latter-day baseball players have failed to bat over .400; his whole analysis was based on only 9 “endpoints” of .400 or better season batting averages throughout the history of baseball (11).

For students of probability, none of this should be surprising. In a widely acclaimed paper, Ioannidis (12) offered a mathematical proof showing that small, underpowered studies are not only less likely to discover real effects but are also more likely to yield false positive findings. That is, if an underpowered study shows an effect, one that meets standards for statistical significance, there is a high likelihood that the study findings were the result of random variation, variation that misleads us to think that an effect is real. Because of publication and confirmation biases, small positive studies are more likely to be published, discussed, and ultimately believed (13-15). Once beliefs become entrenched, it becomes difficult to perform definitive large-scale studies, or to believe them if their results are negative.
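
The arithmetic behind this argument can be sketched with Bayes' theorem. The function below computes the probability that a statistically significant finding reflects a true effect, given the study's power, the significance threshold, and the prior probability that the hypothesis is true. It is written in the spirit of the argument by Ioannidis (12), not copied from it, and the prior of 10% and the power values of 80% and 20% are hypothetical inputs chosen only for illustration.

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Probability that a 'significant' result reflects a true effect.

    prior: assumed probability, before the study, that the effect is real
    power: probability the study detects the effect if it is real
    alpha: probability of a significant result if the effect is not real
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical scenario: 1 in 10 tested hypotheses is actually true.
well_powered = positive_predictive_value(prior=0.10, power=0.80)
underpowered = positive_predictive_value(prior=0.10, power=0.20)

print(f"well-powered study: {well_powered:.2f}")
print(f"underpowered study: {underpowered:.2f}")
```

With these illustrative inputs, a significant result from the underpowered study is more likely to be false than true, whereas the same result from the well-powered study is more likely to be true than false.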

What is the role of small studies? We often perform small trials to see if there is a “biological signal,” a surrogate endpoint that would suggest clinical benefit. If enough small trials are positive, we might then proceed to a large definitive trial. A problem arises, though, when a body of evidence based on small trials becomes the basis for firm beliefs that translate into practice. Califf (16) cited the stories of antiarrhythmic drugs and hormone therapy as cases in which much money was spent on many small trials, trials with misleading results that led to real harm that only stopped once large-scale trials were completed and reported. Califf argues that excessive complexity and bureaucratic requirements prevent investigators from moving rapidly to large-scale pragmatic trials that are more likely to provide clinically meaningful answers (16).

Perhaps we should decrease our reliance on small studies and instead move more quickly to large-scale studies. The only way we could contemplate this would be to ensure that the large trials are simple, inexpensive, and, of course, large enough to yield believable results (17). We know from experience that it is possible to perform high-impact large cardiovascular trials at very low cost (18). One of our top priorities during this period of fiscal austerity is to discover and “rediscover” our ability to design and execute large, simple, and inexpensive trials. The National Heart, Lung, and Blood Institute is now considering proposals for the testing of novel methods that enable low-cost conduct of clinical trials (19). We are particularly interested in approaches that minimize specialized infrastructure, minimize clinic visits designed solely for collecting trial data, offer novel low-burden methods of obtaining informed consent, and employ low-cost methods of monitoring study conduct. Scientific thought leaders have suggested other approaches, including cluster randomization (20), point-of-care randomization (21), better use of the Internet (22), passive follow-up, incorporation of trials into existing clinical registries (23), and forgoing adjudication (24).

As a funder of clinical studies and trials, the National Heart, Lung, and Blood Institute has a primary interest in generating robust scientific evidence that is most likely to have an impact on practice and policy in the right ways. As Kahneman (2) and Gilovich et al. (1) pointed out, irrespective of training, humans are poor intuitive statisticians. We should recognize the risks of overinterpreting findings generated from small, underpowered studies (2,12). We should seriously consider employing alternative approaches such as Bayesian methods to construct measured, statistically appropriate hypotheses on the basis of small studies, whether they be basketball shooting streaks (25) or early phase clinical trials. Bayesian methods may also help us avoid overinterpretation of small, frequentist p values that often accompany large studies (26). Furthermore, a Bayesian perspective may help us make better decisions about how to spend scarce research dollars. One could argue, for example, that from a Bayesian perspective, the “out-of-context” endpoint findings from the ELITE study (4) were so unlikely to represent a true effect that the ELITE II study (5) should never have been performed. We should all keep in mind the real opportunity costs borne by researchers and research sponsors when going on “wild goose chases” resulting from spurious leads from underpowered studies.
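
One simple way to make the Bayesian point concrete is a normal approximation on the log risk ratio, in which a skeptical prior is combined with the observed ELITE result. The sketch below is an illustration under stated assumptions, not an analysis from the cited papers: the prior is hypothetical (centered on no effect, with 95% of its mass between RR 0.67 and 1.5), and the standard error is approximated from the 17 and 32 deaths alone.

```python
import math


def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Combine a normal prior and a normal likelihood (precision weighting)."""
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / se ** 2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, math.sqrt(post_var)


# Hypothetical skeptical prior: no effect expected, RR plausibly 0.67 to 1.5.
prior_sd = math.log(1.5) / 1.96

# Observed ELITE mortality result from the text: RR 0.54 with 17 vs. 32 deaths.
log_rr = math.log(0.54)
se = math.sqrt(1 / 17 + 1 / 32)  # rare-event approximation, an assumption

post_mean, post_sd = normal_posterior(0.0, prior_sd, log_rr, se)
post_rr = math.exp(post_mean)
interval = (
    math.exp(post_mean - 1.96 * post_sd),
    math.exp(post_mean + 1.96 * post_sd),
)
print(f"posterior RR {post_rr:.2f} (95% interval {interval[0]:.2f} to {interval[1]:.2f})")
```

Under this particular prior, the posterior estimate is pulled well back toward 1 and its 95% interval includes no effect. A different prior would give a different answer, which is the judgment a Bayesian analysis makes explicit.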

Our many experiences of the “decline effect” (9), whereby the therapeutic effect of drugs appears to wane over time, should remind us of our obligation to better strategize when it is most appropriate to conduct small studies and how best to interpret their results. The implications go well beyond misrepresenting hot hands in basketball—they go to the heart of the power of the scientific method to improve human health.

References

1. Gilovich T., Vallone R., Tversky A. The hot hand in basketball: on the misperception of random sequences. Cogn Psychol. 1985;17:295-314.
2. Kahneman D. Thinking, Fast and Slow. 1st edition. New York, NY: Farrar, Straus and Giroux; 2011.
3. Holmes M.V., Perel P., Shah T., Hingorani A.D., Casas J.P. CYP2C19 genotype, clopidogrel metabolism, platelet function, and cardiovascular events. JAMA. 2011;306:2704-2714.
4. Pitt B., Segal R., Martinez F.A. Randomised trial of losartan versus captopril in patients over 65 with heart failure (Evaluation of Losartan in the Elderly Study, ELITE). Lancet. 1997;349:747-752.
5. Pitt B., Poole-Wilson P.A., Segal R. Effect of losartan compared with captopril on mortality in patients with symptomatic heart failure: randomised trial – the Losartan Heart Failure Survival Study ELITE II. Lancet. 2000;355:1582-1587.
6. Publication Committee for the VMAC Investigators. Intravenous nesiritide vs nitroglycerin for treatment of decompensated congestive heart failure: a randomized controlled trial. JAMA. 2002;287:1531-1540.
7. Sackner-Bernstein J.D., Kowalski M., Fox M., Aaronson K. Short-term risk of death after treatment with nesiritide for decompensated heart failure: a pooled analysis of randomized controlled trials. JAMA. 2005;293:1900-1905.
8. O'Connor C.M., Starling R.C., Hernandez A.F. Effect of nesiritide in patients with acute decompensated heart failure. N Engl J Med. 2011;365:32-43.
9. Lehrer J. The truth wears off: is there something wrong with the scientific method? New Yorker. December 13, 2010. http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer Accessed December 29, 2011.
10. Jennions M.D., Moller A.P. Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution. Proc Biol Sci. 2002;269:43-48.
11. Gould S.J. Triumph and Tragedy in Mudville: A Lifelong Passion for Baseball. London, UK: Jonathan Cape; 2004.
12. Ioannidis J.P. Why most published research findings are false. PLoS Med. 2005;2:e124.
13. Schooler J. Unpublished results hide the decline effect. Nature. 2011;470:437.
14. Fanelli D. Do pressures to publish increase scientists' bias? An empirical support from US data. PLoS One. 2010;5:e10271.
15. Nissen S.E. Pharmacogenomics and clopidogrel. JAMA. 2011;306:2727-2728.
16. Califf R.M. Clinical trials bureaucracy: unintended consequences of well-intentioned policy. Clin Trials. 2006;3:496-502.
17. Yusuf S., Collins R., Peto R. Why do we need some large, simple randomized trials? Stat Med. 1984;3:409-422.
18. Tavazzi L., Maggioni A.P., Tognoni G. Participation versus education: the GISSI story and beyond. Am Heart J. 2004;148:222-229.
19. RFA-HL-12-019: Pilot Studies to Develop and Test Novel, Low-Cost Methods for the Conduct of Clinical Trials (R01). http://grants.nih.gov/grants/guide/rfa-files/rfa-hl-12-019.html Accessed December 29, 2011.
20. Platt R., Takvorian S.U., Septimus E., Hickok J., Moody J., Perlin J. Cluster randomized trials in comparative effectiveness research: randomizing hospitals to test methods for prevention of healthcare-associated infections. Med Care. 2010;48(Suppl):52-57.
21. Fiore L.D., Brophy M., Ferguson R.E. A point-of-care clinical trial comparing insulin administered using a sliding scale versus a weight-based regimen. Clin Trials. 2011;8:183-195.
22. Yancy W.S. Jr., Maciejewski M.L., Schulman K.A. Animal, vegetable, or… clinical trial? Ann Intern Med. 2010;153:337-339.
23. Frobert O., Lagerqvist B., Gudnason T. Thrombus Aspiration in ST-Elevation Myocardial Infarction in Scandinavia (TASTE trial). A multicenter, prospective, randomized, controlled clinical registry trial based on the Swedish Angiography and Angioplasty Registry (SCAAR) platform. Study design and rationale. Am Heart J. 2010;160:1042-1048.
24. Pogue J., Walter S.D., Yusuf S. Evaluating the benefit of event adjudication of cardiovascular outcomes in large simple RCTs. Clin Trials. 2009;6:239-251.
25. Berry D.A. Statistics: A Bayesian Perspective. Belmont, CA: Duxbury Press; 1996.
26. Diamond G.A., Kaul S. Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials. J Am Coll Cardiol. 2004;43:1929-1939.

