The Noughties and early 2010s were a good time to be a behavioural scientist. At least, they were if you happened to be doing the right kind of behavioural science – the sexy kind. Pick an eye-catching, media-friendly topic, garner a surprising result in your research, and the world was your oyster. A well-trodden path that included university tenure, a TED talk watched by millions, a New York Times bestseller and a lucrative gig on the public speaking circuit lay ahead. Career and cash-wise, you’d be set for life.
Theories from this compelling and accessible branch of science made their way firmly into the mainstream via pop science books. People well outside the sphere of academia could be found quoting the likes of Richard Thaler and Cass Sunstein’s Nudge (exploring why we make bad decisions based on biases); Barry Schwartz’s The Paradox of Choice (positing that too many options lead to bad decision making); and Stephen J Dubner and Steven Levitt’s Freakonomics (an immensely readable melding of pop culture and economics).
A whole stratum of academics became akin to rock stars thanks to their ability to explain why humans behaved the way they did. Research was applied by government policymakers and businesses alike; Thaler even won the Nobel Prize in 2017 for his work providing psychological insights into how people make economic decisions.
There was just one problem. Though much of the science in question was robust (you don’t tend to win Nobel Prizes based on woolly research), not every success story that landed in the spotlight was quite as rigorously backed up. It was, experts explain now, something of a Wild West period.
“There was a lot of slackness in how researchers collected data,” says David Comerford, professor of economics at the University of Stirling. He cites a 2012 survey of 2,000 psychology academics where the majority openly admitted to engaging in at least one questionable research practice during their career. “It was a race to get those sorts of cutesy results,” adds Comerford. “It’s almost like the more simplistic you could make your theory, the more fun, the more likely it was that you were going to get big rewards.”
The peer review process at that time didn’t help: instead, it may have allowed dodgy experiments to slip through the net. “Back then, it would largely examine whether your finding was interesting enough, whether it was novel enough, whether it was in line with existing research,” says Joe Simmons, a behavioural scientist at the University of Pennsylvania’s Wharton School. “But it was never asking: is it true? Because that was seen as being accusatory.”

Most of the iffy studies were not outright fraudulent – they were published in good faith, but would come under the umbrella of “bad science”. One of the most famous theories, which gained so much traction that it was adopted by the Conservative Party and made its way into NHS management training materials, was “power posing”. Amy Cuddy, who was an associate professor at Harvard Business School when her research came out in 2010, claimed that adopting body postures associated with dominance and power – such as a stance with feet spread wide – for two minutes could increase testosterone, decrease cortisol and result in candidates performing better in job interviews.
But the findings didn’t replicate – replication being the technical term for repeating an experiment to check that the original findings are borne out. With a robust scientific study, you’d expect the results to be roughly the same every time. “It’s supposed to be a recipe, right?” says Simmons. “Mix these two chemicals together, and there’s an explosion – and when I do it, I should get the same result.”
Sometimes, with psychology, that doesn’t happen because the initial participants were all from a specific group (undergrad students in the US, for example), and therefore the outcomes don’t translate when applied to a different demographic – this is referred to as a “fragile” study. The results are true, but only for a certain subset of people.
In the case of the power pose, one reason it likely didn’t translate was because Cuddy’s study had involved a minuscule sample size of just 42 participants. After the data was called into question, the claims that posing had measurable chemical impacts were swiftly debunked. This isn’t simply “fragile”, according to Michael Sanders, a professor of public policy at King’s College London and author of a forthcoming book titled Bad Behaviour: Fragility, Failure and Fraud in Behavioural Science. It’s what is referred to as a “failure”.
“That’s when the only reason we found this result in the first case was because we did dodgy things, statistically – not fraud, not making things up, but bad practice,” he says. “Data is like people. If you torture it for long enough, it’ll tell you more or less whatever you want to hear.”
Other questionable research practices abounded during this free-and-easy period. They included conducting research without first positing any kind of hypothesis, instead running tests until you came up with something of note. This approach is akin to taking a dart, throwing it at the wall, and then painting the dartboard around it “so that it looks like you’ve got a bullseye”, says Sanders.
Then there’s “p-hacking”, so called because it involves manipulating data, whether consciously or not, until the p-value – a number between zero and one expressing how likely it would be to see results at least as striking as yours purely by chance, if there were no real effect – drops to the coveted 0.05 or below. Clearing this threshold lets researchers call their results statistically significant, raising the chances that a study will get published.
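To see how easily this inflates flukes, here is a minimal simulation sketch – an illustration only, not taken from any of the studies discussed here, assuming Python with numpy and scipy and invented sample sizes – comparing one pre-specified test against a “best of five outcomes” approach when there is no real effect at all:

```python
# A toy demonstration: both groups always come from the same distribution,
# so any "significant" difference is a false positive. Measuring several
# outcomes and reporting whichever clears p < 0.05 makes flukes much likelier.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_studies, n_per_group, n_outcomes = 10_000, 20, 5  # invented numbers

honest_hits = 0
hacked_hits = 0
for _ in range(n_studies):
    # Honest approach: one pre-specified outcome, one test.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    honest_hits += ttest_ind(a, b).pvalue < 0.05

    # "Hacked" approach: five outcomes measured, best p-value reported.
    pvals = [
        ttest_ind(rng.normal(size=n_per_group), rng.normal(size=n_per_group)).pvalue
        for _ in range(n_outcomes)
    ]
    hacked_hits += min(pvals) < 0.05

print(f"False positives, one planned test:   {honest_hits / n_studies:.1%}")
print(f"False positives, best of {n_outcomes} outcomes: {hacked_hits / n_studies:.1%}")
# Roughly 5% versus well over 20%, despite there being no real effect anywhere.
```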
Another eyebrow-raising tactic is called salami slicing, whereby a scientist runs one study but passes it off as multiple studies. “That’s bad because the strength of any of these tests is weakened by knowing that it was actually part of a battery of tests,” says Sanders. He uses the analogy of telling everyone you rolled a six – but failed to mention you’d actually rolled six dice at the same time to get that outcome.
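His analogy is easy to put into numbers – a quick illustrative calculation, not from the article itself:

```python
# Sanders's dice analogy in numbers: one die shows a six about 17% of the
# time, but roll six dice and report only the best one, and the chance that
# at least one of them shows a six jumps to roughly two-thirds.
p_one_die = 1 / 6
p_at_least_one_six = 1 - (5 / 6) ** 6
print(f"One die:  {p_one_die:.1%}")           # ~16.7%
print(f"Six dice: {p_at_least_one_six:.1%}")  # ~66.5%
```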
One of the most famous proponents of these sketchy techniques was the now-disgraced social science mega-star Brian Wansink. The head of Cornell’s prestigious food psychology research unit, he became a media darling thanks to a number of accessible theories on how lifestyle tweaks could trigger weight loss (his much-cited experiment involving refilling soup bowls was credited with inspiring the endlessly refreshing newsfeed on X, formerly Twitter).
But Wansink’s star came crashing back down to Earth in 2016 after he wrote an ill-judged blog post openly describing how he had encouraged a PhD student to engage in both p-hacking and salami slicing. After analysing data relating to an all-you-can-eat pizza buffet, the student initially had a “failed study” with “null results” on her hands, wrote Wansink. But he encouraged her to keep combing through the data until she “discover[ed] solutions that held up”, he added.
“It’s basically like me writing a blog saying that sometimes, when I want things from the shops but don’t want to pay for them, I just steal them – and that’s a really good thing,” says Sanders.

It’s no wonder he’s angry, having been burned more than most by malpractice in behavioural science. In 2015, Sanders was part of a team that had been given a hefty grant by the World Bank to apply the findings of a previous seminal study from Harvard star researcher Francesca Gino – a woman who specialised, somewhat ironically, in “honesty and ethical behaviour” – in Guatemala. Her work had conclusively found that making people sign a declaration of honesty before filling in forms online, rather than afterwards, made them more likely to tell the truth. Sanders and his colleagues implemented the findings on tax returns – but when they compared the returns of those who were given an honesty declaration to sign beforehand to those who signed it afterwards, there was zero perceptible difference.
Their experience suggested that the original study might have been fragile. Then, in 2021, accusations emerged that it had been neither fragile nor a failure, but fraudulent. This meant the data hadn’t just been massaged or misrepresented – it was allegedly made up.
When Sanders found out that the original research he’d based his work on was null and void, he felt “a sense of betrayal”. “There is an anger,” he says. “This is a field that I love and a field that I’ve dedicated a lot of my life to, and these people have… well, I’m trying to think of a better metaphor than ‘pissed in the pool’.”
Simmons was part of the team of three scientists who brought the truth to light. Along with fellow academics Uri Simonsohn and Leif Nelson, he has been acting as something of a research detective since 2013, when the trio set up a now-legendary blog called Data Colada. The initial impetus had been to improve standards in research and eradicate questionable research practices; little did they know they’d end up exposing a case of seemingly intentional deception.
“Way back in 2010, we were independently reading a lot of newly published articles and basically being like: we don’t believe these,” Simmons says. The straw that broke the incredulous camel’s back was a paper on precognition, published in a prominent and respected journal in 2011, claiming to have “proven” that some people could predict the future.
“That was a wake-up call,” says Comerford. “After that, a lot of scientists started to wonder: are there ways that you can get your result to look as though it’s not a fluke, simply by running a lot of different experiments and excluding some of the conditions that don’t work?”
That same year, Simmons, Simonsohn and Nelson published their own groundbreaking paper, “False-Positive Psychology”, which shone a light on the questionable practices they were seeing conducted on a regular basis. It was the beginning of a reckoning for the field, producing the spark that would ignite a revolution.
Since then, seismic shifts have occurred. Pre-registration, whereby you have to say in advance what you’re going to be testing for and what you expect to find out, so you can’t cherry-pick your findings, has become standard practice. “You have to show that your dartboard is in place before you start throwing darts at it,” as Sanders puts it.
It’s far more common for scientists to make their data publicly available when they publish their results – previously unheard of – and, thanks in part to Data Colada’s work, academics are much more mindful of the fact that papers may be subject to probing and scrutiny. “It used to be that they thought the probability of being caught was zero – and they were basically right,” says Simmons. “Now it’s not zero. Do you want to take that risk?”
As for tiny sample sizes like the one Cuddy made her name from? Unthinkable. “Small samples, sexy findings – that gets laughed out of the room now for the most part,” says Simmons. “There’s a lot of scepticism around that stuff.”
It helps that the old guard, who were the most hostile towards change, are on their way out – and that the young academics coming up are much more on board with tightening up and working to a higher standard.
There’s still more to do; Sanders proposes implementing salary caps to disincentivise bad actors. “There’s no reason in practice why we couldn’t say, look, a quarter of a million dollars a year is the maximum you can get paid as an academic.”
The peer review system, while improved, is also imperfect. “The gatekeepers of science are academics who are really pushed for time with lots of other commitments, and when they’re invited to peer review an article, there’s no incentive for them to do it other than civic-mindedness,” says Comerford. “In all but a very few cases, it’s unpaid.”
But the experts all feel hopeful for the future. “Things are better, and I think they’ll just keep getting a little better,” says Simmons.
The allegations against Gino and co may have temporarily tarnished the discipline of psychology and behavioural science, but they also might have done it a favour, ushering in a new era of higher quality, more checks and balances – and, in short, better science.