Experiments

Economic opportunities are distributed very unequally along dimensions such as race and gender. Why is this so? There are many channels through which such inequalities might be created. These include early childhood influences, the neighborhood in which one grows up, different access to and quality of primary, middle, and high school education, the formation of aspirations, different access to and treatment in higher education, different chances of being hired when applying for a job, different wages conditional on being hired, different chances of being promoted or fired in a given job, differential treatment by customers or clients, and so on.

The channel of *hiring* might in turn be decomposed into several components. What is the chance of being invited to an interview, and what is the chance of being hired given an interview? How does the chance of being invited to an interview depend on the neighborhood of residence, the high school attended, or the (perceived) race and gender of an applicant? It is this very last question that the paper by Bertrand and Mullainathan (2004), which we discuss next, addresses. How does the chance of being invited to an interview depend on perceived race, for otherwise identical CVs? You should keep in mind that this is only one of many channels through which discrimination might affect labor market outcomes.

This paper gives us occasion to review potential outcomes, causality, and randomized experiments in the way they are conceptualized by empirical economists today. This framework is useful for making precise (i) what we mean by race "causing" lower/higher chances of being offered an interview, and (ii) how we can learn about this causal effect.

Consider a **treatment** \(D\), which can take one of two values, \(D=0\) or \(D=1\). The language applied economists use to talk about causality these days has its roots in biostatistics and medical trials, where \(D=0\) corresponds to a placebo and \(D=1\) to the actual treatment; hence the terminology of "treatments" and "treatment effects." In our application, \(D\) is the race implied by the name on a given CV. Denote by \(Y_i\) the **outcome** of interest for CV \(i\); in our application, this is whether CV \(i\) received a callback inviting the applicant to a job interview. In order to talk about causality, we use the notion of **potential outcomes**. Potential outcomes provide the answer to "what if" questions:

- Potential outcome \(Y^0_i\): Would CV \(i\) have received a callback if the race implied by the name on it were \(0\)?
- Potential outcome \(Y^1_i\): Would CV \(i\) have received a callback if the race implied by the name on it were \(1\)?

**Observed outcomes** are determined by the equation

$$ Y_i = D_i \cdot Y^1_i + (1 - D_i) \cdot Y^0_i. $$
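This switching equation, \(Y_i = D_i \cdot Y^1_i + (1 - D_i) \cdot Y^0_i\), can be illustrated in code. The following sketch uses made-up potential outcomes for three hypothetical CVs; all numbers are invented for illustration:

```python
# Switching equation Y = D*Y1 + (1-D)*Y0, with made-up values.
Y0 = [0, 1, 0]  # hypothetical callbacks if the name implies race 0
Y1 = [1, 1, 0]  # hypothetical callbacks if the name implies race 1
D  = [1, 0, 1]  # which treatment each CV actually received

# The data reveal only one potential outcome per CV:
Y = [d * y1 + (1 - d) * y0 for d, y0, y1 in zip(D, Y0, Y1)]
print(Y)  # [1, 1, 0]
```

Note that for the first CV we observe \(Y^1\) but never \(Y^0\), and for the second CV the reverse: the missing potential outcome is not in the data.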

It is not obvious at this point that potential outcomes are an empirically meaningful idea. As we will see in section 2 below, they become meaningful once we introduce the notion of a controlled experiment.

With the notion of potential outcomes at hand, we can define the causal effect or **treatment effect** for CV \(i\) as \(Y_i^1-Y_i^0\).
Correspondingly, we can define the average causal effect or **average treatment effect**,

$$ ATE = E[Y^1 - Y^0], $$

which averages the causal effect over the population of interest.

Given this formalism, we can also state **the fundamental problem of causal inference**:

We never observe both \(Y^0\) and \(Y^1\) at the same time!

One of the potential outcomes is always missing from the data. Which treatment \(D\) was assigned determines which of the two potential outcomes we observe (recall that \(Y= D\cdot Y^1 + (1-D) \cdot Y^0\)).

Closely related is the **selection problem**: Simply comparing the average outcomes of those who got \(D=1\) and those who got \(D=0\) in general tells us nothing about causal effects. The reason is that the distribution of \(Y^1\) among those with \(D=1\) need not be the same as the distribution of \(Y^1\) among everyone, and similarly for \(Y^0\). It might, for instance, be the case that the CVs with "black" names have higher educational qualifications on average than those with "white" names, so that their chances of receiving a callback are higher no matter what the name on the CV is. Making the same point formally, we get that in general

$$ E[Y | D=1] - E[Y | D=0] = E[Y^1 | D=1] - E[Y^0 | D=0] \neq E[Y^1] - E[Y^0] = ATE. $$
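The selection problem can be made concrete with a small simulation. This is a sketch with invented numbers: an unobserved "qualification" drives both the callback chance and the treatment assignment, while the true causal effect is zero by construction:

```python
import random

random.seed(0)

# Invented example: CVs with higher (unobserved) qualifications are more
# likely to end up in the D=1 group, and the true treatment effect is zero
# (Y^1 = Y^0 for every CV).
outcomes = {0: [], 1: []}
for _ in range(200_000):
    qual = random.random()                    # unobserved qualification
    y0 = 1 if random.random() < qual else 0   # callback chance rises with qual
    y1 = y0                                   # zero causal effect by construction
    d = 1 if random.random() < qual else 0    # selection: D depends on qual
    outcomes[d].append(y1 if d else y0)

naive = sum(outcomes[1]) / len(outcomes[1]) - sum(outcomes[0]) / len(outcomes[0])
print(round(naive, 2))  # a sizable positive "effect" -- pure selection bias
```

The naive comparison of means suggests a large positive effect even though the true effect is exactly zero for every CV.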

The selection problem arises because potential outcomes and treatment are not statistically independent. There is one way to ensure that they actually are: by assigning treatment in a controlled way in an experiment, possibly using randomization. This guarantees that there is no selection, i.e.

$$ (Y^0, Y^1) \perp D. $$

In this case, the selection problem is solved and

$$ E[Y | D=1] - E[Y | D=0] = E[Y^1] - E[Y^0] = ATE. $$

The statistical independence ensures that, when comparing averages for the treatment and control group, we actually compare "apples with apples."
Note how this idea of controlled experiments gives **empirical content to** the "metaphysical" notion of **potential outcomes**!

In Bertrand and Mullainathan (2004), for instance, statistically "white" or "black" names were randomly assigned to given resumes, which were sent out as job applications. This allows one to estimate the causal effect of race on the likelihood of getting invited to a job interview by simply comparing means. They actually used a design that was slightly more complicated than simple randomization: for each job opening they submitted two (or four) randomly chosen CVs, and of those, one (or two) were randomly assigned a "black" name and the other(s) a "white" name.

So far we have talked about expectations, that is, population averages, for the treatment and control groups. We can easily construct estimators by replacing expectations with sample averages in equation (4). Consider a randomized trial with \(N\) individuals. Suppose that the estimand of interest is the ATE: $$ ATE = E[Y^1 - Y^0] = E[Y|D=1] - E[Y|D=0]. $$ Replacing the conditional expectation \(E[Y|D=1]\) by its sample analog, the sample mean \(\overline{Y}_1\) of the treated group, and similarly for \(E[Y|D=0]\), we construct the estimator $$\widehat{\alpha} = \overline{Y}_1 - \overline{Y}_0,$$

where

$$ \overline{Y}_1 = \frac{1}{N_1} \sum_i D_i Y_i, \qquad \overline{Y}_0 = \frac{1}{N_0} \sum_i (1 - D_i) Y_i, $$

with \(N_1 = \sum_i D_i\) and \(N_0 = N - N_1\). As you can easily show, \(\widehat{\alpha}\) is an unbiased estimator of the \(ATE\), $$E[\widehat{\alpha}] = ATE.$$
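The difference-in-means estimator \(\widehat{\alpha}\) is straightforward to implement. A minimal sketch in plain Python, with invented data:

```python
def ate_estimate(Y, D):
    """Difference-in-means estimator: alpha_hat = Ybar_1 - Ybar_0."""
    y1 = [y for y, d in zip(Y, D) if d == 1]   # outcomes of the treated group
    y0 = [y for y, d in zip(Y, D) if d == 0]   # outcomes of the control group
    return sum(y1) / len(y1) - sum(y0) / len(y0)

# Invented data: callbacks (Y) and name assignments (D) for six CVs.
Y = [1, 0, 1, 0, 0, 1]
D = [1, 1, 1, 0, 0, 0]
print(ate_estimate(Y, D))  # treated mean 2/3 minus control mean 1/3
```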

We not only want a point estimate of the average treatment effect; we also want a range of likely values, to assess whether our estimates are just the result of chance or reflect a true causal effect. This can be done using the \(t\)-statistic, which is defined as

$$ t = \frac{\widehat{\alpha}}{\sqrt{\widehat{\sigma}_1^2 / N_1 + \widehat{\sigma}_0^2 / N_0}}, \qquad \widehat{\sigma}_1^2 = \frac{1}{N_1 - 1} \sum_i D_i \left( Y_i - \overline{Y}_1 \right)^2. $$

\(\widehat{\sigma}_0^2\) is analogously defined. The \(t\)-statistic is approximately standard normal distributed (for samples of a reasonable size), \(t \overset{approx}{\sim} N(0,1)\).

We get a range of plausible values, a 95% confidence interval, by calculating the interval

$$ CI = \left[ \widehat{\alpha} - 1.96 \cdot \widehat{se}, \ \widehat{\alpha} + 1.96 \cdot \widehat{se} \right], \qquad \widehat{se} = \sqrt{\widehat{\sigma}_1^2 / N_1 + \widehat{\sigma}_0^2 / N_0}. $$

As an exercise, try to show that

$$P(\alpha \in CI) \approx 0.95.$$

Note that in this expression the true effect \(\alpha\) is fixed, while the interval \(CI\) is random!
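The \(t\)-statistic and confidence interval above can be computed as follows. This is a sketch using the large-sample normal approximation (critical value 1.96) and sample variances with the \(N-1\) normalization; the data are invented:

```python
import math

def t_and_ci(Y, D):
    """t-statistic and 95% CI for the difference in means (normal approximation)."""
    y1 = [y for y, d in zip(Y, D) if d == 1]
    y0 = [y for y, d in zip(Y, D) if d == 0]
    m1, m0 = sum(y1) / len(y1), sum(y0) / len(y0)
    v1 = sum((y - m1) ** 2 for y in y1) / (len(y1) - 1)   # sigma_hat_1 squared
    v0 = sum((y - m0) ** 2 for y in y0) / (len(y0) - 1)   # sigma_hat_0 squared
    alpha_hat = m1 - m0
    se = math.sqrt(v1 / len(y1) + v0 / len(y0))
    return alpha_hat, alpha_hat / se, (alpha_hat - 1.96 * se, alpha_hat + 1.96 * se)

# Invented callback data: 5 treated CVs, 5 control CVs.
Y = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
D = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
alpha_hat, t, ci = t_and_ci(Y, D)
print(alpha_hat, t, ci)
```

With only ten observations the normal approximation is of course poor; the example merely shows the mechanics of the formulas.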