The 1%
Estimating the Pareto parameter

1. By hand exercises

1. Suppose a dataset contains the following observations on the wealth of 10 randomly drawn rich individuals (in thousands): \(100, 105, 112, 120, 129, 141, 158, 183, 224, 316\).

Calculate an estimate of the Pareto parameter \(\alpha\), and of \(E[Y|Y>100]\).

2. Now suppose that you only observe that the number of people in the tax bracket \([100, 200]\) equals \(800\), and that the number of people with wealth above \(200\) equals \(200\).

Calculate an estimate of the Pareto parameter \(\alpha\), and of \(E[Y|Y>100]\).
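To check a hand calculation, here is a minimal Python sketch. The course itself uses Matlab. The sketch assumes the lower bound of the Pareto distribution is \(\underline{y} = 100\) and uses the two estimators that appear in the Matlab exercises below. It estimates \(E[Y|Y>100]\) by plugging \(\widehat{\alpha}\) into the Pareto formula \(E[Y|Y>\underline{y}] = \frac{\alpha}{\alpha-1}\underline{y}\), which is finite only for \(\alpha > 1\). The function names are hypothetical.

```python
import math


def alpha_mle(ys, y_lower):
    """Estimate alpha from individual observations: n / sum_i log(y_i / y_lower)."""
    return len(ys) / sum(math.log(y / y_lower) for y in ys)


def alpha_from_counts(n_above, n_total, cutoff, y_lower):
    """Estimate alpha from counts: log(N_2 / n) / log(y_lower / cutoff)."""
    return math.log(n_above / n_total) / math.log(y_lower / cutoff)


def pareto_mean(alpha, y_lower):
    """E[Y | Y > y_lower] = alpha / (alpha - 1) * y_lower; infinite if alpha <= 1."""
    if alpha <= 1:
        return math.inf
    return alpha / (alpha - 1) * y_lower


Y_LOWER = 100  # assumed lower bound of the Pareto distribution (in thousands)

# Exercise 1: ten individual observations.
wealth = [100, 105, 112, 120, 129, 141, 158, 183, 224, 316]
alpha_1 = alpha_mle(wealth, Y_LOWER)

# Exercise 2: 800 people in [100, 200] and 200 people above 200.
alpha_2 = alpha_from_counts(200, 800 + 200, 200, Y_LOWER)

for name, a in [("exercise 1", alpha_1), ("exercise 2", alpha_2)]:
    print(f"{name}: alpha = {a:.3f}, E[Y|Y>100] = {pareto_mean(a, Y_LOWER):.1f}")
```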

2. Matlab exercises

Write code that performs the following:

1. Generate \(n\) independent draws from the Pareto distribution with parameters \(\underline{y}\) and \(\alpha\).

Hint: You can take \(Y_i = \underline{y} \cdot U_i^{-1/\alpha}\), where \(U_i\) is uniformly distributed on \([0,1]\). Why?

2. Save these data to a .csv file, and exchange your file with a classmate.

3. Use your classmate's data to estimate \(\alpha\) with the maximum likelihood estimator $$\widehat{\alpha}^{MLE} = \frac{n}{\sum_i \log \left (y_i/ \underline{y} \right )}.$$

4. Now generate new data using the same procedure as before, and tell your classmate only the value of \(\underline{y}\), as well as the numbers of observations below and above the cutoff \(y_l = 2\cdot \underline{y}\). Ask her or him to estimate \(\alpha\) based on these numbers, using $$\widehat{\alpha}^{MLE} = \frac{\log (N_2 /n)}{\log \left ( \underline{y} /y_l\right )},$$ where \(N_2\) is the number of observations above the cutoff and \(n\) is the total number of observations.

5. Now we are going to verify the argument of section "\(r-g\)" by simulation. Generate data following the process $$Y_{t+1} = w_t + R_t\cdot Y_{t},$$ where \(w_t\) and \(R_t\) are independent draws from uniform distributions with boundary values that you pick. Generate 10,000 observations, and keep only the last 2,000. Save them, and give them to a classmate.

6. Sort the data you got from your classmate, and keep only the top 200. Use these observations to estimate the Pareto parameter as in step 3, taking \(\underline{y}\) to be the smallest of the 200 observations you kept.

7. Repeat the last two steps, but for a different distribution of \(R_t\). Does the estimated Pareto parameter change in the way that you would expect?
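The exercises ask for Matlab. The following is a minimal Python sketch of steps 1 to 4, using only the standard library. It draws Pareto samples with the transformation given in the hint and writes and reads a one-column CSV file. It then applies the two estimators from steps 3 and 4. The file name pareto_draws.csv, the function names, the seed, and the parameter values \(\underline{y} = 1\), \(\alpha = 1.5\), \(n = 10{,}000\) are all hypothetical choices.

```python
import csv
import math
import os
import random
import tempfile


def draw_pareto(n, y_lower, alpha, rng):
    """Step 1: Y_i = y_lower * U_i^(-1/alpha), as in the hint."""
    # 1 - rng.random() lies in (0, 1], so U_i is never exactly 0.
    return [y_lower * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def save_csv(path, values):
    """Step 2: write one value per row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for v in values:
            writer.writerow([v])


def load_csv(path):
    with open(path, newline="") as f:
        return [float(row[0]) for row in csv.reader(f)]


def alpha_mle(ys, y_lower):
    """Step 3: alpha_hat = n / sum_i log(y_i / y_lower)."""
    return len(ys) / sum(math.log(y / y_lower) for y in ys)


def alpha_from_counts(n_below, n_above, y_lower):
    """Step 4: alpha_hat = log(N_2 / n) / log(y_lower / y_l), cutoff y_l = 2 * y_lower."""
    n = n_below + n_above
    return math.log(n_above / n) / math.log(y_lower / (2 * y_lower))


rng = random.Random(42)           # fixed seed for reproducibility
y_lower, alpha_true = 1.0, 1.5    # hypothetical parameter choices
data = draw_pareto(10_000, y_lower, alpha_true, rng)

path = os.path.join(tempfile.gettempdir(), "pareto_draws.csv")  # hypothetical file name
save_csv(path, data)
classmate_data = load_csv(path)
print("step 3 estimate:", alpha_mle(classmate_data, y_lower))

# Step 4: new data; the classmate only receives y_lower and these two counts.
new_data = draw_pareto(10_000, y_lower, alpha_true, rng)
n_above = sum(1 for y in new_data if y > 2 * y_lower)
n_below = len(new_data) - n_above
print("step 4 estimate:", alpha_from_counts(n_below, n_above, y_lower))
```

Computing the counts separately and passing only them to the step-4 estimator mirrors the information the classmate actually gets.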

Matlab commands which you might find useful:

csvwrite, csvread
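Steps 5 to 7 can likewise be sketched in Python. The bounds for \(w_t\) and \(R_t\) below are arbitrary, hypothetical choices. Both distributions of \(R_t\) satisfy \(E[\log R_t] < 0\), a condition under which this process settles down to a stationary distribution. The hypothetical helper alpha_top_k applies the step-3 estimator to the 200 largest values, taking \(\underline{y}\) to be the smallest retained value. Saving and exchanging the data is omitted.

```python
import math
import random


def simulate(n_steps, n_keep, w_bounds, r_bounds, rng, y0=1.0):
    """Step 5: simulate Y_{t+1} = w_t + R_t * Y_t with independent uniform w_t and R_t.

    Only the last n_keep values are returned; the earlier ones serve as burn-in.
    """
    y = y0
    path = []
    for _ in range(n_steps):
        w = rng.uniform(*w_bounds)
        r = rng.uniform(*r_bounds)
        y = w + r * y
        path.append(y)
    return path[-n_keep:]


def alpha_top_k(ys, k):
    """Step 6: keep the k largest values and apply the step-3 estimator,
    taking y_lower to be the smallest retained value."""
    top = sorted(ys, reverse=True)[:k]
    y_lower = top[-1]
    return k / sum(math.log(y / y_lower) for y in top)


W_BOUNDS = (0.5, 1.5)   # hypothetical bounds for w_t
R_BASE = (0.0, 1.5)     # hypothetical bounds for R_t in step 5
R_ALT = (0.0, 2.0)      # a more dispersed R_t for step 7

data_base = simulate(10_000, 2_000, W_BOUNDS, R_BASE, random.Random(1))
data_alt = simulate(10_000, 2_000, W_BOUNDS, R_ALT, random.Random(1))

alpha_base = alpha_top_k(data_base, 200)
alpha_alt = alpha_top_k(data_alt, 200)
print(f"R ~ U{R_BASE}: alpha_hat = {alpha_base:.2f}")
print(f"R ~ U{R_ALT}: alpha_hat = {alpha_alt:.2f}")
```

Running the sketch with the two ranges for \(R_t\) gives the comparison that step 7 asks you to interpret.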

3. Empirical exercises

Estimating top income shares using the Survey of Consumer Finances (“SCF”)

The purpose of this exercise is to give you a feel for the basic tasks involved in analyzing actual data. These include, in particular:

  • downloading the data from the internet,
  • getting rid of all the unnecessary data in these datasets,
  • converting the data to an appropriate file format,
  • reading them into the statistical software used,
  • and generating some descriptive statistics.

Once these tasks are completed, we can proceed to ask statistical questions that can be answered using methods you learned in this class, such as:

  • Can we conclude from these data that inequality has increased?
  • Are there significant differences in poverty rates between different demographic groups?
  • Is the distribution of wealth more unequal than the distribution of incomes?

We will analyze data from the Survey of Consumer Finances, or SCF. As described on the homepage of the Federal Reserve:

“The Survey of Consumer Finances (SCF) is a triennial survey of the balance sheet, pension, income, and other demographic characteristics of U.S. families. The survey also gathers information on the use of financial institutions.”

The data we will be using are available in Stata format at the following addresses:



These datasets contain many variables. Columns correspond to variables, and rows correspond to observations (entries).

The variables we will be using are the following:

  • WGT (sample weight)
  • ASSET (total value of assets held)
  • DEBT (total value of debt)
  • NETWORTH (difference between assets and debt)
  • INCOME (total income)
  • WAGEINC (wage and salary income)
  • EDUC (education of the household head)
  • AGE (age of the household head)
  • RACE (race of the respondent)

Furthermore, because the SCF handles missing answers by multiple imputation, these datasets contain five entries (so-called implicates) for every household. We will keep only the first entry and delete the remaining four.

Recall that all kinds of tutorials for learning Stata can be found at


In this exercise, the problem of calculating standard errors for weighted data comes up. If you are interested in how Stata handles this problem, have a look at


Steps of your assignment:

  1. Pick one of the survey years

  2. Download the corresponding data from the internet

  3. Extract the zip file

  4. Open the file in OpenOffice Calc or in Excel

  5. Save the file as a “comma-separated values” (.csv) file by selecting File > Save As

  6. You can now quit Calc (or Excel), and open Stata

  7. In the Stata command line, type doedit. This will open the do-file editor, in which you can save all steps of your analysis.

  8. On the first line, type clear

  9. On the second line, type insheet using filename.csv, where “filename” is the name you gave to your file

  10. Delete all variables except those listed above: keep wgt asset debt networth income wageinc educ age race

  11. Save the do file, and then execute it using the arrow in the top right corner of the do-file editor

  12. Open the data browser (by clicking on the symbol of a table with a magnifying glass) and have a look at the imported dataset

  13. Close the data browser

  14. Familiarize yourself with the following Stata commands: gen, sum, svyset, svy. To do so, type, for instance, “help gen” or “help sum” in the command line. Look in particular at the weighting options of these commands!
    • The dataset has five lines for each household, and we need to delete all but the first of them. Assuming the five lines of each household are stored in consecutive rows, the following sequence of commands accomplishes this:
      gen no = mod(_n, 5)
      keep if no == 1
      drop no

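  15. Continue working on your do file. Calculate the poverty line (0.6 times the median income), the .9 quantile of the income distribution, and the .99 quantile. Don't forget to use weights: sum income [aweight=wgt], detail
    This command stores the median and the .9 and .99 quantiles in r(p50), r(p90), and r(p99); for instance, gen povertyline = 0.6*r(p50) creates the variable povertyline used below.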

  16. Generate indicator variables for the following:
    • whether a household is below the poverty line: gen poor = (income < povertyline)
    • whether a household is among the top 10 percent (top 1 percent) of income earners
    • whether the survey respondent is non-Hispanic white (race=1), and
    • whether the respondent is black (race=2)

  17. Set the weights for the survey commands using svyset [pweight=wgt]

  18. Calculate the poverty rate for the full population, for the subpopulation of non-Hispanic white households, and for the subpopulation of black households:
    • svy: mean poor
    • svy, subpop(nonhispwhite): mean poor, etc.
    • These commands also provide the correct standard errors you will have to use for your statistical tests.

  19. Calculate a variable incsharetop10 containing household income divided by mean income, times an indicator for whether the household is among the top 10 percent. The mean of this variable will be the share of incomes going to the top 10 percent. Calculate this mean using svy: mean incsharetop10

  20. Do the same for the top 1% income share

  21. Replicate the entire analysis for a different survey year. To do so, you just have to change the data set you import in your do file!
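Here is a minimal Python sketch of what the weighted Stata commands above compute, run on made-up numbers that are not SCF data. The variable names mirror those in the steps, but the helper functions are hypothetical. The weighted quantile uses one common convention: the smallest value whose cumulative weight share reaches \(q\). This may differ slightly from what sum ..., detail reports. The sketch reproduces only point estimates, not the survey standard errors that svy computes.

```python
def weighted_mean(x, w):
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def weighted_quantile(x, w, q):
    """Smallest x whose cumulative weight share is at least q (one common convention)."""
    total = sum(w)
    cum = 0.0
    for xi, wi in sorted(zip(x, w)):
        cum += wi
        if cum >= q * total:
            return xi
    return max(x)


def subpop_mean(values, w, in_group):
    """Weighted mean over households with in_group True (point estimate of a subpopulation mean)."""
    num = sum(v * wi for v, wi, g in zip(values, w, in_group) if g)
    den = sum(wi for wi, g in zip(w, in_group) if g)
    return num / den


def top_share(x, w, q):
    """Weighted mean of (income / mean income) * 1{income >= q quantile}, as in step 19."""
    cutoff = weighted_quantile(x, w, q)
    mean_x = weighted_mean(x, w)
    return weighted_mean([xi / mean_x if xi >= cutoff else 0.0 for xi in x], w)


# Made-up example households (not SCF data).
income = [5, 12, 18, 25, 31, 40, 52, 70, 110, 400]
wgt = [3, 2, 2, 1, 2, 1, 1, 1, 1, 0.5]
race = [2, 1, 2, 1, 1, 2, 1, 1, 1, 1]  # 1 = non-Hispanic white, 2 = black

povertyline = 0.6 * weighted_quantile(income, wgt, 0.5)
poor = [1.0 if y < povertyline else 0.0 for y in income]
nonhispwhite = [r == 1 for r in race]
black = [r == 2 for r in race]

print("poverty line:", povertyline)
print("poverty rate, all:", weighted_mean(poor, wgt))
print("poverty rate, non-Hispanic white:", subpop_mean(poor, wgt, nonhispwhite))
print("poverty rate, black:", subpop_mean(poor, wgt, black))
print("top 10% income share:", top_share(income, wgt, 0.9))
```

The top-share function shows why the mean of incsharetop10 equals the top 10 percent share. The weighted mean of \(x_i/\bar{x}\) times the top-10 indicator equals the weighted income of top households divided by total weighted income.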