Chapter 2 Examine Assumptions and Account for Pitfalls

Before designing and running experiments, it helps to examine our assumptions about the topic, and to track and communicate them clearly. Similarly, the statistical models we later use to analyze the collected data also rest on assumptions. For example, a linear regression model relies on four assumptions:

- Linearity: E(y) = Xβ
- Independence of the errors
- Equal (constant) error variance, σ²
- Normality of the errors

If we do not check these assumptions, we risk using the wrong model and generating misguided insights or misinformation. Likewise, to avoid relying on models with unintended consequences, we need to examine our assumptions about the topic of interest clearly and transparently.
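
As a rough illustration of what checking these assumptions can look like, the Python sketch below fits a straight line to simulated data and computes one informal diagnostic per assumption: the correlation of the residuals with a squared term (leftover curvature, for E(y) = Xβ), the Durbin-Watson statistic (independence), a ratio of residual variances (equal variance), and residual skewness (one symptom of non-normality). The function names, the simulated data, and the choice of diagnostics are assumptions made for this example; in practice, residual plots and formal tests from a statistics package would usually be preferred.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def corr(a, b):
    """Pearson correlation, written out by hand to avoid version-specific helpers."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def residual_diagnostics(x, y):
    """Informal residual checks, one per linear-regression assumption."""
    b0, b1 = fit_simple_ols(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    x_bar = statistics.fmean(x)
    # E(y) = X beta: leftover curvature shows up as residuals correlated with (x - mean)^2.
    curvature = corr(resid, [(xi - x_bar) ** 2 for xi in x])
    # Independence: Durbin-Watson on residuals in collection order; values near 2
    # suggest little lag-1 autocorrelation.
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid))) / sum(
        r * r for r in resid
    )
    # Equal variance: residual variance for the upper half of x over the lower half;
    # values near 1 are consistent with a constant sigma^2.
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    upper = [resid[i] for i in order[half:]]
    lower = [resid[i] for i in order[:half]]
    var_ratio = statistics.variance(upper) / statistics.variance(lower)
    # Normality (partial check only): residual skewness near 0 is consistent with
    # symmetric, normal-like errors.
    sd = statistics.pstdev(resid)
    skewness = statistics.fmean([(r / sd) ** 3 for r in resid])
    return {
        "slope": b1,
        "curvature": curvature,
        "durbin_watson": dw,
        "var_ratio": var_ratio,
        "skewness": skewness,
    }


# Simulated data that satisfies all four assumptions by construction.
random.seed(1)
x = list(range(200))
y = [2 + 0.5 * xi + random.gauss(0, 1) for xi in x]
report = residual_diagnostics(x, y)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on curved data (for example y = 2 + 0.01x² plus noise) pushes the curvature diagnostic close to 1, flagging that a straight-line model is the wrong choice for that data.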

Another consideration is how much information we need from an experiment and how much we can get from it (degrees of freedom). Generally, the more variables we include, the less information is left out. However, each additional variable in an analysis model uses up a degree of freedom, so our models can run out of steam when there are too many variables relative to the number of observations (a common problem with big data). We therefore need to strike a balance between gathering information and data on the one hand and our analysis methods on the other.
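
For a linear model, the degrees-of-freedom trade-off can be made concrete: every estimated coefficient uses up one observation's worth of information, leaving n − p residual degrees of freedom to estimate the error variance, where p counts the coefficients including the intercept. The `residual_df` helper below is a hypothetical illustration, and the sample of 30 observations is an assumed example size.

```python
def residual_df(n_obs, n_predictors, intercept=True):
    """Residual degrees of freedom of a linear model: observations minus estimated coefficients."""
    n_coefficients = n_predictors + (1 if intercept else 0)
    return n_obs - n_coefficients


# Hypothetical study with 30 observations and a growing number of predictors.
n_obs = 30
for n_predictors in (1, 5, 15, 29):
    df = residual_df(n_obs, n_predictors)
    status = "saturated: nothing left to estimate error" if df <= 0 else "ok"
    print(f"{n_predictors:>2} predictors -> {df:>2} residual df ({status})")
```

With 29 predictors plus an intercept the model is saturated: it can fit the 30 observations exactly, but it has nothing left over to estimate the error variance or to check the assumptions listed earlier.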

To summarize, experiments are more prone to failure if we do any of the following (not an exhaustive list):

- Devise the wrong metrics (i.e., metrics that don’t answer the question at hand)
- Have no clear scope (i.e., no defined boundaries for the ‘system under test’)
- Omit the assumptions and limitations of the study
- Use unrepresentative metrics, have no comparison groups, or allow cross-contamination between groups
- Fail to recognize the limitations of the experiment
- Overlook significant parameters that affect the behaviour of a system
- Report averages without variability (fall for statistical tricks, or report no statistics at all!)
- Offer no interpretation of what the results mean, or overgeneralize the conclusions
- Ignore errors and outliers
- Fail to consider ethical issues and scenarios, or fail to obtain informed consent from participants
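
To illustrate why an average alone can mislead, the sketch below summarizes two hypothetical samples that share the same mean but differ greatly in spread. The `summarize` helper and the data are invented for illustration, and the interval uses a normal approximation for simplicity; with samples this small, a t-based interval would be more appropriate and somewhat wider.

```python
import statistics


def summarize(values, confidence=0.95):
    """Mean together with spread and an approximate (normal-theory) confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / n ** 0.5
    return {"n": n, "mean": mean, "sd": sd,
            "ci_low": mean - half_width, "ci_high": mean + half_width}


# Two hypothetical samples with the same average but very different variability.
steady = [9, 10, 10, 11, 10, 9, 11, 10]
erratic = [2, 18, 5, 15, 10, 1, 19, 10]
for label, data in (("steady", steady), ("erratic", erratic)):
    s = summarize(data)
    print(f"{label}: mean={s['mean']:.1f}, sd={s['sd']:.1f}, "
          f"approx. 95% CI=({s['ci_low']:.1f}, {s['ci_high']:.1f})")
```

Both samples report a mean of 10, but only the standard deviation and interval reveal that the second one is far less predictable.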

2.1 It is not all bad

Evidence-based policy making can be a political ideal

Experimentation is not all bad news. Many breakthroughs and transformations, from medicine and technology to positive social change, have come about through a willingness to experiment. For example, experimentation with the design of cell phones has made them easier to use over time.

There are also many advantages to experimentation. For instance, experiments can achieve a higher degree of internal validity. By randomly assigning a treatment condition, experimental designs allow us to examine the effect of one variable while other conditions are, on average, held equal across groups. Randomization is the key here because it makes the treatment and control groups comparable. Systematic differences in outcomes between the two groups, beyond what chance alone would produce, can then be attributed to the treatment.
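
The role of randomization can be sketched in a few lines of Python. The example assumes a hypothetical population of 1,000 units with simulated baseline outcomes and a constant treatment effect of 2.0; it randomly splits the units in half, checks that baseline outcomes are balanced across the groups, and estimates the effect as a simple difference in means. All names and numbers are illustrative.

```python
import random
import statistics


def randomize(units, seed=None):
    """Randomly split units into equal-sized treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(outcome, treatment, control):
    """Estimated average treatment effect: mean(treated) - mean(control)."""
    return (statistics.fmean([outcome[u] for u in treatment])
            - statistics.fmean([outcome[u] for u in control]))


# Hypothetical population: each unit's baseline is its outcome without treatment.
rng = random.Random(42)
units = list(range(1000))
baseline = {u: rng.gauss(50, 10) for u in units}
TRUE_EFFECT = 2.0  # assumed constant effect, for illustration only

treatment, control = randomize(units, seed=7)
treated = set(treatment)
observed = {u: baseline[u] + (TRUE_EFFECT if u in treated else 0.0) for u in units}

# Balance check: randomization should make baseline outcomes similar across groups.
balance_gap = difference_in_means(baseline, treatment, control)
estimate = difference_in_means(observed, treatment, control)
print(f"baseline gap between groups: {balance_gap:.2f}")
print(f"estimated effect: {estimate:.2f} (true effect {TRUE_EFFECT})")
```

In this setup the estimate differs from the true effect by exactly the baseline gap between the groups, and because the groups are formed by chance, that gap tends to shrink as the sample grows.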

As an example, a group in the U.K. tested different letter framings on the tax-reporting behaviour of over 7,300 sole proprietors. The different treatments were offers of assistance with tax forms, rational argument, and threats of audit. By and large, the treatments proved effective at encouraging taxpayers to declare more, with the threat messages being the most effective (Hasseldine et al. (2007), Persuasive Communications: Tax Compliance Enforcement Strategies for Sole Proprietors, Contemporary Accounting Research).

Experimentation can be useful even without prior knowledge, because sometimes there is no theory to start from, or existing theories fall short. Since experimenters directly control how the data are generated, the experimental approach can survive and thrive without previous knowledge. For instance, electoral researchers with no prior knowledge can carry out a field experiment to examine how different ways of contacting voters affect voter turnout.

Experimentation can also help clarify mixed results. Mixed results are sometimes inherent in observational studies that examine similar phenomena using different measurements or different data sources, and one way to validate or clarify them is to run an experiment. Returning to the example of electoral politics, researchers can choose to run an experiment to detect whether campaign spending affects votes for incumbents and challengers differently.

2.2 Communication of intent to experiment

Once we are considering the option to experiment, we need clear and open communication and collaboration with cross-functional teams.

The aim is to get feedback from as diverse a group of people as possible. These conversations can help us decide which ideas to take forward, refine our questions based on feedback, and eventually set them out in an experimentation proposal.

These conversations presume that executive-level buy-in is already in place and that stakeholders are invested in the experimentation process. They also assume that a clear and thoughtful design and implementation plan exists before communication begins, so that it can be shared with executives both as a whole and incrementally. This is not a tautology; rather, it hopefully drives home the point that iteration, re-iteration, agility, and an open attitude are key qualities in the initial process.

Executive buy-in usually requires a thorough risk assessment and contingency plans, more so in the public service than in academic environments. It is therefore important that executives are aware of the experimentation cycle (which can also be thought of as the problem-solving process) and endorse the methods, approach, tests, and tools that generate evidence. Some, in fact, argue that experimentation is the creation of something new in the face of uncertainty and risk, which requires time, effort, and relevant resources.

It is noteworthy that the word “experimental” has come to mean “innovative” or “radical” rather than simply “untested” (The Experimenter’s Inventory).

Genuine experimentation is about committing to rigorous assessments and evaluation of evidence, not just freewheeling “trying stuff out” or doing things differently and expecting to succeed.

Therefore, even though a methodology is key to answering a specific question, the starting point is a problem you are trying to resolve, preferably with the social good in mind. The purpose of experimenting is to test key questions and assumptions using quick, low-risk, rigorous experiments.

From the thousands of experiments Thomas Edison conducted to develop a practical incandescent lightbulb, to Gregor Mendel’s long-running experiments on inheritance in pea plants, whose principles underpin concepts in modern agriculture, through to trials in medicine, carefully testing ideas in practice is a cornerstone of scientific and technological discovery.

2.3 Experiments in the public sphere

Today, experiments are critical to sectors where innovation and optimization are routine, such as web development, digital transformation, and electric vehicles. The practice has caught on in business: the largest financial institutions, retailers, and restaurants run randomized experiments, and companies like Google, Facebook, and Amazon run tens of thousands of experiments a year. A/B testing is now the standard means through which Silicon Valley improves its online products. In government, however, experimentation remains relatively rare and is still a new field.
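
A/B tests with a yes/no outcome (such as whether a visitor signs up) are often analyzed with a two-proportion z-test. The sketch below is one common textbook version, not a description of how any particular company runs its tests; the conversion counts are hypothetical, and the normal approximation assumes reasonably large samples in each variant.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pool both arms to estimate the common rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical A/B test: 5,000 visitors per variant.
lift, z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"lift={lift:.3%}, z={z:.2f}, p={p:.4f}")
```

Here variant B converts at 5% against 4% for A, and the test asks how surprising a gap of that size would be if the two variants actually performed the same.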

One of the most famous nudge experiments is the ‘Save More Tomorrow’ (SMarT) program, which used defaults to increase employees’ savings rates by automatically increasing the percentage of their wage devoted to saving. Average saving rates for SMarT program participants increased from 3.5% to 13.6% over the course of 40 months, while savings rates remained stagnant in the study’s two comparison groups (Benartzi & Thaler (2004), Save More Tomorrow, Journal of Political Economy).
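
The default mechanism behind this kind of nudge can be sketched as a simple schedule: unless an employee opts out, the savings rate rises automatically at each step. In the hypothetical `escalate_savings_rate` helper below, each step is assumed to coincide with a pay raise, and the step size and cap are illustrative values rather than the parameters of the SMarT program; only the 3.5% starting rate comes from the figures above. The sketch follows a single employee and ignores opt-outs, so it is not expected to reproduce the reported 13.6% average.

```python
def escalate_savings_rate(start_rate, n_steps, step, cap):
    """Savings-rate path (% of wage) under a hypothetical automatic-escalation default.

    Each step is assumed to coincide with a pay raise; the rate rises by `step`
    percentage points but never exceeds `cap`. Opt-outs are not modelled.
    """
    rates = [start_rate]
    for _ in range(n_steps):
        rates.append(min(rates[-1] + step, cap))
    return rates


# Illustrative parameters only; they are not taken from the SMarT study.
path = escalate_savings_rate(start_rate=3.5, n_steps=4, step=3.0, cap=15.0)
print(path)
```

The design choice that matters is that doing nothing leads to higher saving: the employee has to act to stop the increases, rather than act to start them.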

A small but growing movement of policy experimenters is bringing fresh ideas on how to solve public problems. From crafting better services to making the back office of government more efficient, new methods and tools are needed to develop and test policy. In fact, governments must rigorously and systematically put policy to the test, or risk stagnation.

For example, a group of researchers conducted three field experiments on increasing savings with text-message reminders and found that goal-specific reminders were considerably more effective than generic ones (Karlan et al. (2010), Getting to the Top of Mind: How Reminders Increase Saving, NBER Working Paper).

In the context of government agencies, experiments aim to evaluate a program, policy, or service, and to test an idea or innovation, by investigating what difference it has made, or will make, for the people it aims to help.

Like laboratory experiments, public-sphere experiments need a control group so that an innovation can be tested against “business-as-usual”. The trial does not have to be as large as a drug trial; it can be fast and flexible. In fact, the best experiments often start small, as prototypes, before they are scaled up.

For example, the World Bank advocates for “nimble randomized control trials”. It has funded nimble evaluations on how best to improve the take-up of health insurance in Azerbaijan, expand the use of contraceptives in Burundi, and support teachers to deliver tailored education to children affected by war and displacement in Lebanon.

2.4 Summary: deciding to experiment

- Do you need to experiment? Why or why not?
- Find a behaviour, program, policy, or service to test
- Use out-of-the-box thinking to brainstorm, and make observations
- Look for natural experiments
- Talk to experts and get feedback
- Think small and short term
- Start with a proof-of-concept question and hypothesis
- Keep it simple and try to test one thing at a time
- Measure everything that matters
- Have control and treatment groups when possible