The first results from a major project to measure the reliability of cancer research have highlighted a big problem: Labs trying to repeat published experiments often can’t.
That’s not to say that the original studies are wrong. But the results of a review published Thursday in the open-access journal eLife are a sobering reminder that science often fails at one of its most basic requirements: An experiment in one lab ought to be reproducible in another.
The fact that experiments often aren’t reproducible could have big health implications. Many exciting ideas in cancer research never pan out. One reason is that findings from the initial studies don’t stand the test of time.
“Reproducibility is a central feature of how science is supposed to be,” says Brian Nosek, who spearheaded this research at the Center for Open Science.
Nosek is also a psychology professor at the University of Virginia. A few years ago, he organized a similar effort to examine research in his field. And his results garnered worldwide attention when two-thirds of the original findings in psychology couldn’t be reproduced.
Nosek decided to explore the work from cancer biology labs after two high-profile studies, from drugmakers Bayer and Amgen, reported dismal results when they tried to reproduce some cancer papers. Only 25 percent of the papers Bayer examined were reproduced. Amgen was able to replicate only six of the 53 studies it examined, or about 11 percent.
“Those were earthshaking reports, in the sense that the community responded very strongly to these reports of challenges to reproduce some of these core findings,” Nosek says.
But scientists at Bayer and Amgen wouldn’t say which experiments they examined, so their work raised many questions but left no way for scientists to answer them.
“The cancer reproducibility project in cancer biology was an attempt to advance that discussion with an open project,” Nosek says.
This project is transparent about how it picked the studies to reproduce. It also published methods and study plans in advance. In collaboration with a California company called Science Exchange, the reviewers got grants to replicate key experiments from as many as 50 high-profile studies. (They will very likely run out of money before they’re able to complete that work, however.)
They’ve now published the results of their first five attempts, in eLife.
“Three of the five show very, very striking differences from the original,” says Timothy Errington, a biologist at the Center for Open Science and a collaborator on the project. As for findings from the other two studies, he says, “I think you’ll get a lot of opinions about whether they replicate or not.”
Errington says he was quite surprised by the results.
In one case, the original scientists went the extra mile to help the labs doing the follow-up studies reduce potential sources of error. “The lab gave us the same drug. This is wonderful. Because that could have been a sticking point,” Errington says. “They gave us the same tumor cells that they used.”
Yet the replicating lab didn’t end up with the same results.
Scientists have had so much confidence in two of the original studies that drug companies have already sunk millions of dollars into efforts to try the concepts out in people. But the follow-up experiments for one of those didn’t validate the original results.
The inevitable question is whether the original science was wrong, or whether the scientists who tried to repeat that work somehow got tripped up.
The review project farmed out its actual laboratory work to commercial labs that perform experiments for the pharmaceutical industry, or to university “core facilities,” such as centralized labs that do a lot of research on mice. Those labs generally work to standards required by the Food and Drug Administration.
But research with living systems is never simple, so there are many possible sources of variation in any experiment, ranging from the animals and cells to the details of lab technique.
And there isn’t even clear agreement about when a study’s findings can be considered to have been reproduced.
Sean Morrison, an editor at eLife and a Howard Hughes Medical Institute investigator at the University of Texas Southwestern Medical Center, says that by his count, two studies’ findings were substantially reproduced. The findings of one other were not, he says, and two others have results that simply can’t be interpreted.
“One of the difficulties of the reproducibility project is they have limited time and resources to spend on any one study,” Morrison says. “As a result, they can’t go back and do these things over and over again when the first results turn out to be uninterpretable.”
Errington agrees that the reproducibility project leaves hanging the big question of why a given result didn’t hold up, but the scientists don’t plan to answer it.
“As exciting as that is, and as important as that is — and we hope someone else will follow up on it — we’re more curious about, ‘What does that look like when we do this across many, many, many studies.’ ”
But Dr. Erkki Ruoslahti, at the nonprofit Sanford Burnham Prebys Medical Discovery Institute in La Jolla, Calif., is worried that the reproducibility project could do real damage. The reviewers couldn’t reproduce his original study but didn’t follow up to understand why.
“I am really worried about what this will do to our ability to raise funding for our clinical development,” he writes in an email to Shots. “If we, and the many laboratories who have reproduced our results, are right and the reproducibility study is wrong — which I think is the case — they will not be doing a favor to cancer patients.”
Dr. Irving Weissman, a professor of pathology and developmental biology at Stanford University, is also disappointed in how the reproducibility project handled his experiment. His paper reported finding a protein that’s present on all human cancer cells — a finding that Weissman says has been replicated many times in other labs.
The reproducibility project chose to repeat a peripheral part of Weissman’s paper — an experiment involving mice, not human tissues. And, Weissman says, the replicating lab stumbled over an early step in the experiment, but plowed ahead anyway.
Weissman says he offered to bring scientists into his lab to train them in the technique, but the reproducibility project didn’t take him up on it. (Doing so would have undercut one of its goals, which is to see whether scientists working independently can verify published results.)
It’s important to replicate important studies, Weissman tells Shots, “but you can’t do it halfheartedly. You have to be serious about it.”
Errington and Nosek hope people who hear about the project’s findings don’t jump to conclusions about why individual replications produced different results. They’re trying to look at the big picture across dozens of studies, the two scientists say, and they don’t place too much confidence in any single result.
The reproducibility project is looking for patterns across cancer research and also trying to identify common reasons that labs might have trouble reproducing one another’s work. Are the directions offered in the methods section of a paper too sketchy? Or maybe experiments frequently work only under unusual conditions.
Morrison, who is involved as a journal editor rather than a participant, says the entire reproducibility project is itself one big experiment.
“I think it’s too early for us to know whether this approach is the right approach or the best approach for testing the reproducibility of cancer biology,” he says. “But it will be a data point, and it will start the conversation.”
The conversation is important because the vast majority of treatment ideas that come from the lab fail when they’re tried in people. Cathy Tralau-Stewart, a pharmacologist at the University of California, San Francisco, says scientists often don’t know why those clinical failures occur, “and so that’s why I think studies like this are really, really important.”
Unfortunately, Nosek says, there are few incentives today for scientists to repeat experiments from other labs. The rewards are for publishing new ideas, not for the less glamorous, but still critical, work of verifying somebody else’s findings.
“If we’re going to take reproducibility seriously,” Nosek says, experiments that attempt to reproduce the findings of others “need to be a valued part of scientific contribution.”