
People Have A Hard-To-Explain Bias Against Experimental Testing of Policies And Interventions, Preferring Just To See Them Implemented


By Jesse Singal

Randomised experiments (also known as A/B tests) are a critical tool for evaluating everything from online marketing campaigns to new pharmaceutical drugs to school curricula. Rather than making decisions based on ideology, intuition or educated guesswork, you randomly assign people to one of two groups, expose one group to intervention A (one version of a social media headline, a new drug, or whatever, depending on the context) and the other to intervention B (a different version of the headline, a different drug, etc.), and then compare outcomes for the two groups.
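The logic is simple enough to sketch in a few lines of Python. This is a toy simulation, not anything from the paper: the outcome values and noise model are invented purely for illustration, but it shows why random assignment matters – each participant has an equal chance of landing in either condition, so any systematic difference in outcomes can be attributed to the intervention rather than to who happened to receive it.

```python
import random
import statistics

def run_ab_test(participants, effect_a, effect_b, seed=0):
    """Randomly assign participants to conditions A and B, simulate a noisy
    outcome for each, and return the mean outcome per group.

    effect_a / effect_b are hypothetical true mean outcomes (e.g. problems
    solved, survival score) under each intervention -- invented for this demo.
    """
    rng = random.Random(seed)
    group_a, group_b = [], []
    for _ in participants:
        # Random assignment: a fair coin flip per participant removes
        # selection bias between the two conditions.
        if rng.random() < 0.5:
            group_a.append(effect_a + rng.gauss(0, 1))  # outcome under A
        else:
            group_b.append(effect_b + rng.gauss(0, 1))  # outcome under B
    return statistics.mean(group_a), statistics.mean(group_b)

# With enough participants, the observed difference in group means
# approaches the true difference between the interventions (0.5 here).
mean_a, mean_b = run_ab_test(range(10_000), effect_a=5.0, effect_b=5.5)
```

In a real test you would follow the comparison of means with a significance test (e.g. a t-test) before concluding that B genuinely outperforms A.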

To anyone who believes in evidence-based decision making, medicine and policy, randomised tests make sense. But as a team led by Michelle N. Meyer, a researcher at the Center for Translational Bioethics and Health Care Policy at the Geisinger Health System in Pennsylvania, write in a new article in PNAS, for some reason A/B testing sometimes elicits moral outrage. As an example, they point to the anger that ensued when Pearson Education “randomized math and computer science students at different schools to receive one of three versions of its instructional software: two versions displayed different encouraging messages as students attempted to solve problems, while a third displayed no messages.” The goal had been to test objectively whether the encouraging messages would, well, encourage students to do more problems, yet for this, the company received much criticism, including accusations that they’d treated students like guinea pigs, and failed to obtain their consent.

Viewed from a certain angle, this reaction is strange – prior to the A/B test, Pearson’s default policy had been a lack of encouraging messages, which didn’t appear to generate any complaints. People didn’t object to the absence of encouraging messages, nor to the messages themselves – they only objected to comparing the two conditions. Which doesn’t quite make sense. (As Meyer’s team point out, there are situations in which A/B testing could be genuinely unethical. Giving one group an already validated cancer treatment but withholding it from another, for example, is clearly morally problematic. But Meyer and her colleagues focus entirely on “unobjectionable policies or treatments.”)

At root, Meyer et al.’s paper had two goals: to determine how widespread this phenomenon is (after all, sometimes there’s a perception that many people are mad about something, when it’s really just a small group of loud people online with strong opinions), and to poke and prod people’s reasons for experiencing discomfort at the idea of A/B testing. The team used online samples to probe these issues, conducting “16 studies on 5,873 participants from three populations spanning nine domains.”

As it turns out, it isn’t just a small group of online complainers who are uncomfortable: Based on the new findings, it appears that humans have a more general bias against this sort of A/B testing, for reasons that are hard to pin down.

Take Meyer and her colleagues’ first study. They presented online participants with a vignette in which a hospital director, seeking to lower the rate of death and illness caused by a procedure being performed improperly, thinks it might be helpful to present doctors with a safety checklist. Participants then read one of four versions of what happened next and they had to rate the appropriateness of the course of action taken:

Badge (A): The director decides that all doctors who perform this procedure will have the standard safety precautions printed on the back of their hospital ID badges.

Poster (B): The director decides that all rooms where this procedure is done will have a poster displaying the standard safety precautions.

A/B short: The director decides to run an experiment by randomly assigning patients to be treated by a doctor wearing the badge or in a room with the poster.

A/B learn: Same as A/B short, with an added sentence noting that after a year, the director will have all patients treated in whichever way turns out to have the highest survival rate.

As the researchers predicted, there was more opposition to both forms of the A/B testing than to the unilateral introduction of either safety policy. This finding was robust to multiple versions of the vignette and held up whether the researchers recruited participants via Pollfish or Amazon’s Mechanical Turk.

The same phenomenon also popped up in a wide variety of other (hypothetical) situations, from the design of self-driving cars to interventions to boost teacher wellbeing. And the authors write that “the effect is just as strong among those with higher educational attainment and science literacy and those with STEM degrees, and among professionals in the relevant domain.” So it’s not as though this bias can be chalked up to a lack of knowledge about the scientific process, or some sort of lack of critical-thinking skills.

What does explain it, then? The researchers believe that a combination of factors are at work, among them “a belief that consent is required to impose a policy on half of a population but not on the entire population; an aversion to controlled but not to uncontrolled experiments; and the proxy illusion of knowledge,” the last of which the researchers define as the belief that “randomized evaluations are unnecessary because experts already do or should know ‘what works.’”

To many of the sorts of people who rely on A/B testing, of course, this sort of reasoning doesn’t pass muster (why would it be okay to impose a policy on the full population but not on half of it?). We clearly need more research to better understand the public’s concerns and how to respond to them, given how important A/B testing is in so many different circumstances (and that it is only going to become more common as organisations become more science- and data-focused). For now, though, it’s an important first step to have established that this bias generalises across different populations and isn’t driven by any one simple factor.

Objecting to experiments that compare two unobjectionable policies or treatments

Post written by Jesse Singal (@JesseSingal) for the BPS Research Digest. Jesse is a contributing writer at BPS Research Digest and New York Magazine, and he publishes his own newsletter featuring behavioral-science-talk. He is also working on a book about why shoddy behavioral-science claims sometimes go viral for Farrar, Straus and Giroux.



