1. Suppose a data set contains the following observations on the wealth of 10 randomly sampled rich individuals (in thousands): \(100, 105, 112, 120, 129, 141, 158, 183, 224, 316.\)
Calculate an estimate of the Pareto parameter \(\alpha\), and of \(E[Y|Y>100]\).
2. Now suppose that you only observe that the number of people in the tax bracket \([100, 200]\) equals \(800\), and the number of people with wealth above \(200\) equals \(200\).
Calculate an estimate of the Pareto parameter \(\alpha\), and of \(E[Y|Y>100]\).
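Both calculations can be checked numerically. The sketch below is in Python (a choice made for this example), takes \(\underline{y} = 100\) as the lower bound in both problems, and relies on the standard Pareto result \(E[Y|Y>\underline{y}] = \frac{\alpha}{\alpha-1}\cdot\underline{y}\) for \(\alpha > 1\). The function names are illustrative, not part of the assignment.

```python
import math

def pareto_mle(ys, y_min):
    """Estimate alpha from individual observations: n / sum(log(y_i / y_min))."""
    return len(ys) / sum(math.log(y / y_min) for y in ys)

def pareto_grouped(n_above, n_total, y_min, cutoff):
    """Estimate alpha from bracket counts: log(N_2 / n) / log(y_min / cutoff)."""
    return math.log(n_above / n_total) / math.log(y_min / cutoff)

def conditional_mean(alpha, y_min):
    """E[Y | Y > y_min] = alpha / (alpha - 1) * y_min, finite only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the conditional mean is infinite for alpha <= 1")
    return alpha / (alpha - 1) * y_min

# Problem 1: individual observations, lower bound assumed to be 100.
wealth = [100, 105, 112, 120, 129, 141, 158, 183, 224, 316]
alpha_1 = pareto_mle(wealth, y_min=100)

# Problem 2: 800 people in [100, 200], 200 people above 200.
alpha_2 = pareto_grouped(n_above=200, n_total=800 + 200, y_min=100, cutoff=200)

print(f"Problem 1: alpha = {alpha_1:.3f}, E[Y|Y>100] = {conditional_mean(alpha_1, 100):.1f}")
print(f"Problem 2: alpha = {alpha_2:.3f}, E[Y|Y>100] = {conditional_mean(alpha_2, 100):.1f}")
```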
Write code that performs the following:
1. Generate \(n\) independent draws from the Pareto distribution with parameters \(\underline{y}\) and \(\alpha\).
Hint: You can take \(Y_i = \underline{y} \cdot U_i^{-1/\alpha}\), where \(U_i\) is uniformly distributed on \([0,1]\). Why?
2. Save these data to a .csv file, and exchange your file with a classmate.
3. Use your classmate's data to estimate \(\alpha\), using the formula $$\widehat{\alpha}^{MLE} = \frac{n}{\sum_i \log \left (y_i/ \underline{y} \right )}.$$
4. Now generate new data using the same procedure as before, and tell your classmate only the value \(\underline{y}\) and the number of observations below and above the cutoff \(2\cdot \underline{y}\). Ask her/him to provide an estimate of \(\alpha\) based on these numbers, using the formula $$\widehat{\alpha}^{MLE} = \frac{\log (N_2 /n)}{\log \left ( \underline{y} /y_l\right )},$$ where \(n\) is the total number of observations, \(N_2\) is the number of observations above the cutoff, and \(y_l = 2\cdot \underline{y}\) is the cutoff.
5. Now we are going to verify the argument of section "\(r-g\)" by simulations. Generate data following the process $$Y_{t+1} = w_t + R_t\cdot Y_{t},$$ where \(w_t\) and \(R_t\) are independent draws from uniform distributions with boundary values that you pick. Generate 10,000 observations, and only keep the last 2,000. Save them, and give them to a classmate.
6. Sort the data you got from your classmate, and only keep the top 200. Use these observations to estimate the Pareto parameter as in step 3.
7. Repeat the last two steps, but for a different distribution of \(R_t\). Does the estimated Pareto parameter change in the way that you would expect?
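Steps 1 to 4 can be sketched as follows. This is a minimal Python sketch using only the standard library rather than the Matlab commands listed below; the function names, the file name `pareto_draws.csv`, and the parameter values \(\underline{y}=100\), \(\alpha=2\), \(n=5000\) are assumptions chosen for illustration. The draw follows the inverse-CDF hint: since \(P(\underline{y}\cdot U^{-1/\alpha} > y) = P(U < (\underline{y}/y)^{\alpha}) = (\underline{y}/y)^{\alpha}\) for \(y \geq \underline{y}\), the transformed uniform has the Pareto survival function.

```python
import csv
import math
import os
import random
import tempfile

def draw_pareto(n, y_min, alpha, rng):
    """Inverse-CDF draws: Y_i = y_min * U_i**(-1/alpha) with U_i uniform on (0, 1]."""
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power.
    return [y_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def save_csv(values, path):
    """Write one observation per row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([v] for v in values)

def load_csv(path):
    """Read back a single-column file of floats."""
    with open(path, newline="") as f:
        return [float(row[0]) for row in csv.reader(f) if row]

def pareto_mle(ys, y_min):
    """Step 3: n / sum(log(y_i / y_min))."""
    return len(ys) / sum(math.log(y / y_min) for y in ys)

def pareto_grouped(n_above, n_total, y_min, cutoff):
    """Step 4: log(N_2 / n) / log(y_min / cutoff), using only the counts."""
    return math.log(n_above / n_total) / math.log(y_min / cutoff)

rng = random.Random(0)                 # fixed seed so the run is reproducible
y_min, alpha, n = 100.0, 2.0, 5000     # illustrative values, not prescribed

# Steps 1 and 2: draw and save (the file would then go to a classmate).
draws = draw_pareto(n, y_min, alpha, rng)
path = os.path.join(tempfile.gettempdir(), "pareto_draws.csv")
save_csv(draws, path)

# Step 3: estimate alpha from the full data.
data = load_csv(path)
alpha_mle = pareto_mle(data, y_min)

# Step 4: estimate alpha from the counts around the cutoff 2 * y_min only.
cutoff = 2 * y_min
n_above = sum(y > cutoff for y in data)
alpha_grouped = pareto_grouped(n_above, len(data), y_min, cutoff)

print(f"true alpha = {alpha}, MLE = {alpha_mle:.3f}, grouped = {alpha_grouped:.3f}")
```

Comparing the two estimates across several seeds gives a feel for how much precision is lost when only the bracket counts are available.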
Matlab commands which you might find useful:
rand
csvwrite, csvread
sort
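For steps 5 to 7, the random-growth process can be simulated along the following lines. The sketch is in Python rather than Matlab, and the uniform bounds for \(w_t\) and \(R_t\), the starting value, and the function names are all assumptions picked for illustration. The tail estimate applies the individual-data formula \(n/\sum_i \log(y_i/\underline{y})\) to the top 200 observations, taking the smallest retained value as \(\underline{y}\), which is also an assumption of this sketch.

```python
import math
import random

def simulate_wealth(n_steps, n_keep, w_bounds, r_bounds, rng, y0=1.0):
    """Iterate Y_{t+1} = w_t + R_t * Y_t and return the last n_keep values."""
    y = y0
    path = []
    for _ in range(n_steps):
        w = rng.uniform(*w_bounds)   # w_t, uniform on w_bounds
        r = rng.uniform(*r_bounds)   # R_t, uniform on r_bounds, independent of w_t
        y = w + r * y
        path.append(y)
    return path[-n_keep:]

def top_tail_alpha(values, n_top):
    """Apply n / sum(log(y_i / y_min)) to the n_top largest values,
    with y_min set to the smallest of them (an assumption of this sketch)."""
    top = sorted(values, reverse=True)[:n_top]
    y_min = top[-1]
    return n_top / sum(math.log(y / y_min) for y in top)

rng = random.Random(1)      # fixed seed so the run is reproducible
w_bounds = (0.0, 1.0)       # illustrative bounds for w_t

estimates = {}
for r_bounds in [(0.5, 1.3), (0.5, 1.4)]:   # two illustrative distributions of R_t
    kept = simulate_wealth(10_000, 2_000, w_bounds, r_bounds, rng)
    estimates[r_bounds] = top_tail_alpha(kept, 200)
    print(f"R_t uniform on {r_bounds}: estimated alpha = {estimates[r_bounds]:.2f}")
```

Because consecutive values of \(Y_t\) are serially dependent and only 200 observations enter the estimate, results from a single run can be noisy; rerunning with different seeds helps separate the effect of the distribution of \(R_t\) from simulation noise.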
Estimating top income shares using the Survey of Consumer Finances (“SCF”)
The purpose of this exercise is to give you some feel for the basic tasks involved in analyzing actual data. This includes in particular
Once these tasks are completed, we can proceed to ask statistical questions that can be answered using methods you learned in this class, such as
We will analyze data from the Survey of Consumer Finances, or SCF. As described on the homepage of the Federal Reserve:
“The Survey of Consumer Finances (SCF) is a triennial survey of the balance sheet, pension, income, and other demographic characteristics of U.S. families. The survey also gathers information on the use of financial institutions.”
The data we will be using are available in Stata format at the following addresses:
These data sets contain a lot of variables. Different columns correspond to different variables, different rows correspond to different observations (or entries).
The variables we will be using are the following:
Furthermore, for some technical reasons, these data sets contain five entries for every household. We will only keep the first entry, and delete the remaining four entries.
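The assignment itself is carried out in Stata, but the logic of this step can be illustrated in a few lines. The Python sketch below keeps the first entry per household in a toy table; the column names `household_id`, `entry`, and `income` and all values are made up for illustration and are not the actual SCF variable names.

```python
def keep_first_entry(rows, id_key):
    """Return the first row seen for each household identifier, preserving order."""
    seen = set()
    first_rows = []
    for row in rows:
        if row[id_key] not in seen:
            seen.add(row[id_key])
            first_rows.append(row)
    return first_rows

# Toy data: two households with five entries each (hypothetical values).
rows = [
    {"household_id": h, "entry": k, "income": 50 * h + k}
    for h in (1, 2)
    for k in range(1, 6)
]

first = keep_first_entry(rows, "household_id")
for row in first:
    print(row)
```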
Recall that all kinds of tutorials for learning Stata can be found at
In this exercise, the problem of calculating standard errors for weighted data comes up. If you are interested in how Stata handles this problem, have a look at
Steps of your assignment: