Redistributive Taxation: Optimal Tax Theory

REFERENCE: Using Elasticities to Derive Optimal Income Tax Rates
REFERENCE: After Piketty?

One of the primary policy tools to address economic inequality is redistributive taxation. Redistributive taxation is, obviously, a very contested field of policy. There is a field of economics that aims to derive "optimal taxes," including optimal redistributive income taxes, inheritance taxes, etc. We will discuss some of the basic ideas of this field in the present chapter. A key reference for our discussion is Saez (2001).

Recall that we discussed the distributional impact of changes in prices on individuals' welfare in the previous chapter. What we will do next is very similar, with changing taxes taking the place of changing prices. Additional complications arise because we need to talk about government revenues, and about how to compare the welfare of different people.

There are many different kinds of taxes in practice, including value-added taxes, income taxes, wealth taxes, inheritance taxes, etc. The framework we discuss applies, in principle, to the analysis of all of these.

1. General principles

Some general principles are common to the analysis of "optimal taxes" for different kinds of taxes:

  1. Marginal policy changes:
    The theory of optimal taxation is concerned with finding policies that maximize some notion of social welfare. As usual, we can characterize maximizers by first order conditions. At the optimum, any (feasible) marginal policy change has no effect on social welfare. We thus need to understand the effect of marginal policy changes on welfare.

  2. Envelope theorem:
    The first key ingredient to understand such marginal changes is the result we proved in the previous chapter: If we (i) measure individual welfare by realized utility and (ii) assume that individuals are maximizing utility subject to the constraints they face, then we can ignore the effect of behavioral responses to policy changes. This result is called the envelope theorem.

  3. Welfare weights:
    The envelope theorem allows us to evaluate the effect of a policy change on any individual, in terms of the amount of dollars that we could equivalently have given to or taken from them. But how do we get from there to social welfare? We have to somehow decide how much we care about an additional dollar for a rich person versus an additional dollar for a poor person, or an additional dollar for a disabled person versus for an able-bodied person, etc. It is important to recognize that there is no "scientific" way to make this decision! In particular, it is meaningless from the point of view of economic theory to sum up dollars across people. The decision of how to make these trade-offs depends on our moral judgments, and in practice, on the outcome of distributional struggles between different groups.

    If we have settled on how to make these trade-offs between different people, we can express them in terms of welfare weights \(\omega_i\) that measure the value we attach to an additional dollar for person \(i\).

  4. Government budget constraint:
    When we think about changing taxes, we also have to think about the impact of these changes on government revenues. One way to do this is to only consider tax changes that do not change total revenues. Another way, which is mathematically equivalent, assumes that there is a marginal value \(\lambda\) of additional government revenues, where \(\lambda\) is on the same scale as the welfare weights \(\omega_i\). This is the approach we will take.

    When we are considering the effect of tax changes on government revenues, we cannot ignore behavioral responses to these changes. Usually, the tax base, and thereby government revenues, are affected by such behavioral responses. Rich individuals might, for instance, respond to a tax increase by exploiting additional loopholes in the tax code or by tax evasion.

  5. Effects on prices:
    When thinking about the effect of changing some tax, we also have to think about how prices and wages are affected by this change. This can be complicated, and is an empirical matter. To simplify our exposition, we will assume in this chapter that prices and wages do not change in response to policy changes.

Let us now state these principles in a more formal way. Suppose we are changing a tax parameter \(\alpha\), individual welfare for person \(i\) is given by \(v_i\), and government revenues are given by \(g\). A choice of \(\alpha\) is optimal if

$$ \partial_\alpha SWF = 0. $$

Adding up all components of social welfare, and using the appropriate welfare weights, we get

$$ \begin{equation} \partial_\alpha SWF = \sum_i \omega_i \cdot \partial_\alpha v_i + \lambda \cdot \partial_\alpha g. \end{equation} $$

The envelope theorem tells us that \(\partial_\alpha v_i\) can be calculated as the effect on the individual's budget constraint, holding behavior constant.

The effect on government revenue \(\partial_\alpha g\) has two components, the direct effect (holding behavior fixed), and the behavioral effect of individuals reacting to the policy change.

2. Linear income tax

Let us go through these terms in a more specific context, where individuals choose their labor supply \(l\) and consumption \(x\) subject to a linear income tax \(t=\alpha + \beta \cdot l \cdot w\), where \(w\) denotes the wage. Real-world income tax schedules are rarely linear, but this assumption considerably simplifies our discussion. Different individuals have different utility functions and different wages. Generalizing the setup we considered in section 1, assume individuals solve

$$ \begin{equation} (x_i, l_i) = {\arg\!\max}_{x, l}\; u_i (x, l) \end{equation} $$

subject to the budget constraint

$$ \begin{equation} x_i \cdot p \leq -\alpha + w_i \cdot l_i \cdot (1 - \beta). \end{equation} $$

Note that the choice variables \(x_i\) and \(l_i\) are functions of prices \(p\), wages \(w_i\), and the tax parameters \(\alpha\) and \(\beta\). Realized utility, as before, is given by

$$v_i = u_i(x_i, l_i).$$

By exactly the same arguments as in CHAPTER 9, we get that the envelope theorem in this setting implies that the equivalent variation of marginally increasing \(\alpha\), and of marginally increasing \(\beta\), is given by

$$ \begin{align*} EV_\alpha &= -1\\ EV_\beta &= - w_i \cdot l_i. \end{align*} $$

As an exercise, try to prove this, going step by step through the arguments of CHAPTER 9.
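
For readers who want to check their work, one possible route through this exercise is sketched here. The Lagrangian \(\mathcal{L}_i\) and the multiplier \(\mu_i\) (the marginal utility of income) are notation introduced for this sketch only, and the argument assumes an interior optimum at which the budget constraint binds.

$$ \begin{align*} \mathcal{L}_i &= u_i(x, l) + \mu_i \cdot \left( -\alpha + w_i \cdot l \cdot (1-\beta) - x \cdot p \right) \\ \partial_\alpha v_i &= \partial_\alpha \mathcal{L}_i = -\mu_i, \qquad \partial_\beta v_i = \partial_\beta \mathcal{L}_i = -\mu_i \cdot w_i \cdot l_i \\ EV_\alpha &= \partial_\alpha v_i / \mu_i = -1, \qquad EV_\beta = \partial_\beta v_i / \mu_i = -w_i \cdot l_i. \end{align*} $$

The derivatives of \(\mathcal{L}_i\) are evaluated at the optimal choices \((x_i, l_i)\), holding those choices fixed; dividing by \(\mu_i\) converts utility units into dollars.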

What about government revenues? Effects on these are given by the sum of a mechanical and a behavioral component,

$$ \begin{align*} \partial_\alpha g &= N + \beta \cdot \sum_i w_i \cdot \partial_\alpha l_i \\ \partial_\beta g &= \sum_i w_i \cdot l_i + \beta \cdot \sum_i w_i \cdot \partial_\beta l_i, \end{align*} $$

where \(N\) is the number of people in the population. To simplify exposition, we shall assume that there are no effects of changing \(\alpha\) on labor supply, so that \(\partial_\alpha l_i =0\) and thus \(\partial_\alpha g = N\).

Now we have all terms that we need to calculate the marginal effect on social welfare of changing \(\alpha\) and \(\beta\):

$$ \begin{align*} \partial_\alpha SWF &= \sum_i (\lambda - \omega_i)\\ \partial_\beta SWF &= \sum_i (\lambda - \omega_i)\cdot w_i \cdot l_i + \lambda \cdot \beta \cdot \sum_i w_i \cdot \partial_\beta l_i. \end{align*} $$

These expressions are obtained by simply adding up everyone's equivalent variation, weighted by \(\omega_i\), and the impact on government revenues, weighted by \(\lambda\).
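
As a rough numerical illustration of these two expressions, the following sketch evaluates them for a hypothetical three-person economy. The function name `marginal_swf_effects`, its argument names, and all numbers are assumptions made for illustration; none of them come from the text or from Saez (2001).

```python
import numpy as np


def marginal_swf_effects(omega, wage, labor, dlabor_dbeta, beta, lam):
    """Return (d SWF / d alpha, d SWF / d beta) for the linear income tax.

    Assumes, as in the text, that labor supply does not respond to alpha.
    """
    omega = np.asarray(omega, dtype=float)
    wage = np.asarray(wage, dtype=float)
    earnings = wage * np.asarray(labor, dtype=float)
    # Behavioral change in the tax base: w_i * (d l_i / d beta).
    dbase = wage * np.asarray(dlabor_dbeta, dtype=float)

    d_alpha = np.sum(lam - omega)
    d_beta = np.sum((lam - omega) * earnings) + lam * beta * np.sum(dbase)
    return d_alpha, d_beta


# Hypothetical inputs (made up for illustration only).
omega = [2.0, 1.0, 0.0]            # welfare weights, highest for the poorest
wage = [10.0, 20.0, 30.0]
labor = [1.0, 1.0, 1.0]
dlabor_dbeta = [-0.5, -0.5, -0.5]  # labor supply falls as beta rises

d_alpha, d_beta = marginal_swf_effects(
    omega, wage, labor, dlabor_dbeta, beta=0.2, lam=1.0
)
print(d_alpha, d_beta)
```

In this made-up example \(\lambda\) equals the average welfare weight, so the first expression is zero, while the second is positive, which would indicate that a small increase of \(\beta\) above 0.2 raises social welfare.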

At the optimal linear income tax, both of these expressions have to equal zero. This implies

$$ \begin{align*} \lambda &= E[ \omega_i]\\ \lambda \cdot \beta \cdot E[ w \cdot \partial_\beta l] &= \text{Cov}(\omega, w\cdot l), \end{align*} $$

where \(E\) denotes the average across individuals, and \(\text{Cov}\) the covariance across individuals.
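
The step from the first-order conditions to the covariance form can be spelled out as follows, using only the definitions above. Dividing \(\partial_\beta SWF = 0\) by the population size and substituting \(\lambda = E[\omega_i]\) gives

$$ \begin{align*} 0 &= E[(\lambda - \omega)\cdot w \cdot l] + \lambda\cdot\beta\cdot E[w\cdot\partial_\beta l] \\ &= -\left(E[\omega \cdot w\cdot l] - E[\omega]\cdot E[w \cdot l]\right) + \lambda\cdot\beta\cdot E[w\cdot\partial_\beta l] \\ &= -\text{Cov}(\omega, w\cdot l) + \lambda\cdot\beta\cdot E[w\cdot\partial_\beta l]. \end{align*} $$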

The first equation says that the value of an additional dollar for the government is the same as the average value of an additional dollar across the population. The second equation can be rewritten as

$$ \beta = \frac{\text{Cov}(\omega / \lambda, w\cdot l)}{ E[ w \cdot \partial_\beta l]}. $$

This equation says that the marginal tax rate \(\beta\), that is the degree of redistribution,

  1. is decreasing in the covariance of welfare weights and earnings.
    This covariance is negative if we assign larger welfare weights to people with lower earnings, and it is more negative the more welfare weights reflect a desire for redistribution. It would be a good exercise, involving some calculus, to verify this claim assuming that \(\omega\) is a decreasing function of \(w\cdot l\).

  2. is decreasing in the behavioral response of the tax base to an increase in tax rates, \(-E[ w \cdot \partial_\beta l]\).
    If higher tax rates lead to increased tax evasion, for instance, then \(E[ w \cdot \partial_\beta l]\) is negative as well, so that the ratio above, and hence \(\beta\), is positive. This second item reflects constraints on feasible redistribution. To the extent that there are behavioral responses to taxation, it is not possible to take one dollar from a rich person and give one dollar to a poor person. If behavioral responses are small (as seems to be the case, with the exception of tax evasion), we might get close, though.
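
The formula for \(\beta\) can be evaluated numerically along the following lines. This is a minimal sketch: the function name `optimal_linear_rate`, the argument names, and the example numbers are hypothetical. It also treats earnings and their response to \(\beta\) as fixed inputs, whereas in the model both depend on \(\beta\) itself, so the formula is really an implicit condition rather than a closed-form solution.

```python
import numpy as np


def optimal_linear_rate(omega, earnings, dearnings_dbeta):
    """Evaluate beta = Cov(omega / lambda, w*l) / E[w * d l / d beta].

    omega            welfare weights, one per person
    earnings         w_i * l_i, one per person
    dearnings_dbeta  w_i * (d l_i / d beta), one per person
    """
    omega = np.asarray(omega, dtype=float)
    earnings = np.asarray(earnings, dtype=float)
    dearnings_dbeta = np.asarray(dearnings_dbeta, dtype=float)

    # First-order condition for alpha: lambda equals the mean welfare weight.
    lam = omega.mean()
    # Population covariance across individuals (divides by N, not N - 1).
    cov = np.mean(omega * earnings) - omega.mean() * earnings.mean()
    return (cov / lam) / dearnings_dbeta.mean()


# Hypothetical three-person example (numbers made up for illustration).
beta = optimal_linear_rate(
    omega=[2.0, 1.0, 0.0],
    earnings=[10.0, 20.0, 30.0],
    dearnings_dbeta=[-5.0, -10.0, -15.0],
)
print(round(beta, 4))  # 0.6667
```

Both the covariance and the average behavioral response are negative here, so the implied rate is positive. Making the behavioral response twice as large in absolute value would halve it.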

3. Optimal top tax rate

Let us now turn to nonlinear income taxes, where we go through a simplified exposition of the arguments in Saez (2001). We will only consider how to set the top tax rate. In standard models, welfare weights ("the marginal welfare value of additional income") go to zero as income goes to infinity, relative to the welfare weights of people with average income. Put differently, an additional dollar for a billionaire is considered to be of much smaller value than an additional dollar for a poor person. If that is so, we want to set the top tax rate to maximize revenues, since the assumption implies

$$ \partial_\tau SWF = \lambda \cdot \partial_\tau g, $$

where \(\tau\) is the top tax rate. This top tax rate applies to everyone above the income threshold \(\underline{y}\).

Assume, returning to CHAPTER 3, that top incomes follow a Pareto distribution with parameter \(\alpha\) (not to be confused with the lump-sum tax parameter of the previous section):

$$ P(Y>y | Y \geq \underline{y}) = \left ( \underline{y} /y\right )^{\alpha}. $$

Assume further that the elasticity of taxable income with respect to the "net of tax" rate \(1-\tau\) is equal to \(\eta\) for those above the income threshold \(\underline{y}\), that is

$$ \begin{equation} \eta = -\partial_\tau y_i \cdot \frac{1-\tau}{y_i}. \end{equation} $$

Government revenues from taxes on top income receivers are equal to

$$ g(\tau) = \tau \cdot N \cdot\left (E[Y| Y \geq \underline{y} ] - \underline{y}\right ), $$

where \(N\) is the number of individuals above the threshold. We have all terms that we need to calculate the effect of a change of \(\tau\) on government revenues, which is given by a sum of mechanical and behavioral effects:

$$ \begin{align} \tfrac{1}{N}\cdot\partial_\tau g &= \left (E[Y| Y \geq \underline{y} ] - \underline{y}\right ) - \frac{\tau}{1-\tau} \cdot \eta \cdot E[Y| Y \geq \underline{y} ] \nonumber \\ &= \underline{y} \cdot \left ( \frac{\alpha}{\alpha - 1}\cdot \left (1 - \frac{\tau}{1-\tau}\cdot \eta \right ) - 1 \right ) \end{align} $$
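
The second line uses the conditional mean of the Pareto distribution, which is finite only if \(\alpha > 1\). As a sketch of that step, integrating the tail probability stated above gives

$$ E[Y| Y \geq \underline{y} ] = \underline{y} + \int_{\underline{y}}^{\infty} \left ( \underline{y} /y\right )^{\alpha} dy = \underline{y} + \frac{\underline{y}}{\alpha - 1} = \frac{\alpha}{\alpha - 1}\cdot \underline{y}. $$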

Solving the first order condition \(\partial_\tau g = 0\) yields

$$ \left ( \frac{\alpha}{\alpha - 1}\cdot \left (1 - \frac{\tau}{1-\tau}\cdot \eta \right ) - 1 \right ) =0, $$

or, after some algebra

$$ \tau = \frac{1}{1+\alpha \cdot \eta}. $$
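
The omitted algebra can be filled in as follows:

$$ \begin{align*} \frac{\alpha}{\alpha - 1}\cdot \left (1 - \frac{\tau}{1-\tau}\cdot \eta \right ) = 1 \;&\Longrightarrow\; \frac{\tau}{1-\tau}\cdot \eta = 1 - \frac{\alpha - 1}{\alpha} = \frac{1}{\alpha} \\ &\Longrightarrow\; \frac{\tau}{1-\tau} = \frac{1}{\alpha \cdot \eta} \\ &\Longrightarrow\; \tau = \frac{1}{1+\alpha \cdot \eta}. \end{align*} $$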

If we plug in the realistic parameter values \(\alpha = 2\) and \(\eta = .25\), this formula implies an optimal top tax rate of \(1/(1+0.5) \approx 67\%\). More generally, optimal top tax rates are larger (i) the more unequal the distribution of incomes is (small \(\alpha\)), and (ii) the less responsive taxable incomes are to changes in tax rates (fewer tax loopholes, better tax enforcement).
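
The formula is easy to explore numerically. The sketch below is illustrative only: the function names `top_tax_rate` and `marginal_revenue`, the `threshold` argument, and the grid of parameter values are assumptions, not part of the original exposition. `marginal_revenue` implements the expression for \(\tfrac{1}{N}\cdot\partial_\tau g\) derived above, so it should be approximately zero at the revenue-maximizing rate.

```python
def top_tax_rate(pareto_alpha, eta):
    """Revenue-maximizing top rate, tau = 1 / (1 + alpha * eta)."""
    return 1.0 / (1.0 + pareto_alpha * eta)


def marginal_revenue(tau, pareto_alpha, eta, threshold=1.0):
    """(1/N) * d g / d tau under the Pareto assumption (requires alpha > 1)."""
    mean_ratio = pareto_alpha / (pareto_alpha - 1.0)  # E[Y | Y >= y_] / y_
    return threshold * (mean_ratio * (1.0 - tau / (1.0 - tau) * eta) - 1.0)


# Parameter values used in the text.
tau_star = top_tax_rate(2.0, 0.25)
print(round(tau_star, 3))  # 0.667

# The first-order condition should hold (up to rounding) at tau_star.
foc = marginal_revenue(tau_star, 2.0, 0.25)

# Hypothetical grid to illustrate the comparative statics.
for a in (1.5, 2.0, 3.0):
    for e in (0.25, 0.5):
        print(a, e, round(top_tax_rate(a, e), 3))
```

Smaller values of \(\alpha\) (a more unequal top of the distribution) and smaller values of \(\eta\) (less responsive taxable income) both push the implied top rate up, in line with points (i) and (ii).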


Saez, E. (2001). Using elasticities to derive optimal income tax rates. The Review of Economic Studies, 68(1):205–229.