Taxsim supplement to the Survey of Consumer Finances

Provided in Stata, SAS and CSV formats

The SCF is a survey of income and wealth conducted by the US Federal Reserve Board every three years. It is particularly appropriate for tax analysis because it over-samples high-income households, which are the source of the majority of tax revenue. Here we offer small files with just the taxsim-relevant variables, and large files with those variables joined to the full SCF public use files. Although the SCF began in 1963, only the surveys from 1989 onward are available here.

Downloads

There is taxsim data for each SCF from 1989 to 2019. These files contain the variables needed by the taxsim32 calculator to calculate federal income tax liability. The "byhousehold" files have one record per SCF respondent. Since a single respondent may encompass several taxpaying units, the taxsim data in these files is aggregated to the household level, and variables that don't make sense to sum are omitted. These files have all the SCF variables, so even if you don't want taxsim they may be convenient for Stata or other packages that can't read the sas7bdat format.

The files "bytaxunit" include a record for each combination of household, replication and taxunit. This means that you need to be careful comparing taxsim variables with household variables. For example, if you add an SCF income item over all records, that double counts the income of households with more than one tax unit. Nor would you want to find an average tax rate by dividing tax liability by household income.
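The double-counting problem can be seen in a small made-up example. This is only a sketch: the values are invented, hhinc is a hypothetical stand-in for a household-level SCF income item, and implicates are ignored to keep it short.

```stata
* Toy data, not SCF values: household 1 has two tax units, household 2 has one.
* hhinc is a hypothetical household-level income item repeated on every
* tax-unit record; fiitax is the liability of each tax unit.
clear
input hh taxunit hhinc fiitax
1 1 100 12
1 2 100  3
2 0  50  5
end

* Naive sum over tax-unit records counts household 1's income twice.
quietly summarize hhinc
scalar naive_total = r(sum)

* Tag one record per household before summing a household-level item.
egen byte hhtag = tag(hh)
quietly summarize hhinc if hhtag
scalar hh_total = r(sum)
display "naive total = " naive_total "   household total = " hh_total

* For an average tax rate, sum liability within the household first,
* then divide by household income.
collapse (first) hhinc (sum) fiitax, by(hh)
generate avgrate = fiitax/hhinc
list
```

Dividing each tax unit's fiitax by hhinc before collapsing would instead understate the rate for household 1, since neither of its tax units carries the whole household's liability.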

Note that the SCF does not provide child care expense, so dep13 isn't provided. Nor are the business tax deduction variables available in 2019. As a result -taxsim27.ado- and -taxsim32.ado- should generate the same tax liability.

Recall that naive calculations of the standard error will be incorrect because each actual interview is divided into 5 records (implicates) to allow the calculation of standard errors for imputed variables. A document on the SCF website describes the calculation of standard errors.
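As a rough illustration of why the implicates matter, the sketch below applies the standard multiple-imputation combining rules (Rubin's rules) to made-up per-implicate estimates. The numbers are hypothetical, and the sketch covers only the imputation part of the uncertainty, not the sample design; the SCF documentation remains the reference for the full procedure.

```stata
* Hypothetical results from running the same estimate on each implicate:
* q is the point estimate, u is its squared standard error.
clear
input imp q u
1 10.2 0.25
2  9.8 0.30
3 10.5 0.28
4 10.0 0.26
5  9.5 0.31
end

quietly summarize q
scalar qbar = r(mean)           // combined point estimate
scalar B    = r(Var)            // between-implicate variance (divisor m-1)
quietly summarize u
scalar W    = r(mean)           // average within-implicate variance
scalar T    = W + (1 + 1/5)*B   // total variance with m = 5 implicates

display "estimate = " qbar "   se = " sqrt(T)
```

The combined variance T exceeds the average within-implicate variance W whenever the estimates differ across implicates, which is the extra uncertainty a single-implicate calculation would miss.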

If you are using Stata you may wish to install -micombine- to simplify the calculation:

    ssc install micombine

An example showing the calculation of total federal income tax liability by filing status is simply:

    log using example1, text replace
    use "https://www.nber.org/taxsim/to-taxsim/scf27-32/taxsim/txpydata19"
    generate weight = x42001/5
    taxsim32, replace full
    table mstat [pw=weight], c(sum fiitax)

Taxsim uses only 32 variables, while the ./bytaxunit and ./byhousehold files include all the thousands of variables in the SCF, with the addition of computed variables suitable for taxsim. None of the SCF variables can go directly into taxsim. The ./taxsim directory provides just the taxsim variables, in the ./taxsim/txpydataNN.dta files.

In the example above, dividing the original weight by 5 compensates for the five implicates in the calculation of the sum, and since fiitax is a tax-unit variable there is no problem with multiple tax units in a single household.
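The effect of the weight adjustment can be checked on a small made-up dataset. This is a sketch only: the values are invented, w plays the role of x42001, and each household has a single tax unit.

```stata
* Toy data, not SCF values: 2 households x 5 implicates.
* tax differs across implicates because some inputs are imputed.
clear
input hh imp w tax
1 1 1000 10
1 2 1000 12
1 3 1000 11
1 4 1000  9
1 5 1000 13
2 1 3000  2
2 2 3000  2
2 3 3000  3
2 4 3000  1
2 5 3000  2
end
generate weight = w/5

* The raw weights count every household five times; the adjusted ones do not.
quietly summarize w
scalar raw_pop = r(sum)
quietly summarize weight
scalar adj_pop = r(sum)

* Weighted total of tax: with weight/5 this is the average over implicates
* of the five single-implicate totals.
generate wtax = weight*tax
quietly summarize wtax
scalar total_tax = r(sum)

display "raw = " raw_pop "   adjusted = " adj_pop "   total tax = " total_tax
```

Here the two households represent 1000 + 3000 = 4000 units, which is what the adjusted weights sum to; the raw weights sum to five times that.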

But perhaps you will need something from the SCF. Here we tabulate by x5819 ("Expect inheritance"). Note the -set maxvar- command, needed because the file holds thousands of variables.

    log using example, text replace
    set maxvar 8000
    use "https://www.nber.org/taxsim/to-taxsim/scf27-32/bytaxunit/dta/scftax19"
    generate weight = x42001/5
    taxsim32, replace full
    table x5819 [pw=weight], c(sum fiitax)

Because of the multiple implicates, the -table- command isn't sufficient to get the correct standard error of the estimate. The -micombine- command is an improvement on multiplying the estimated standard error by the square root of 5. Here we find the standard error of the mean of income tax liability. We do so with a regression on a constant only, taking advantage of the fact that the coefficient on the constant (_cons) is then just the mean:

    micombine regress fiitax [aw=x42001], obsid(y1) impid(rep)

A multivariate regression just adds independent variables. Here we regress liability on AGI:

    micombine regress fiitax v10 [aw=x42001], obsid(y1) impid(imp)

Packages -scfses- and -scfcombo- take account of the complex sample design, which -micombine- does not. This requires merging the weighting matrix to the survey. Here we do the same regression, with standard errors corrected for survey design:

    set maxvar 10000
    use "http://www.nber.org/taxsim/to-taxsim/scf27-32/wts/dta/p19_rw1"
    merge 1:m y1 using "http://www.nber.org/taxsim/to-taxsim/scf27-32/orig/dta/p19i6"
    generate weight = x42001/5
    generate rep = y1 - 10*int(y1/10)
    scfses fiitax [pw=weight], p(mean)
    scfcombo fiitax v10 [aw=weight], command(reg)

So far the examples are based on tax filing units. For households, we need to sum the tax-unit variables to the level of the survey record:

    collapse (mean) x5729 weight rep (sum) fiitax, by(y1)
    scfcombo fiitax x5729 [aw=weight], command(reg)

Within each level of y1, x5729 will be constant, so the mean will do for that variable. The sum of fiitax over y1 will give total tax liability in that household. Clustered standard errors are left as an exercise for the reader.

Notes for merging ../txpydataNN.dta with ../orig:

x42001 sums to the number of respondents in the universe.
y1 indexes household*implicate and identifies each record in ./wts
yy1 indexes household
rep indexes the implicate number (1-5)
taxsimid is 1000*y1+100*imp+taxunit and identifies each record in ./txpydata and ./bytaxunit.
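Taking the taxsimid formula above at face value, the components can be recovered with integer arithmetic. The sketch below uses invented identifier values and assumes imp is a single digit and taxunit is below 100. It builds taxsimid as a double, since a float cannot hold integers this large exactly once they pass about 16.7 million.

```stata
* Hypothetical identifier values, combined with the formula from the notes.
clear
input long y1 byte imp byte taxunit
11    1 0
12    2 1
12    2 2
65135 5 3
end
generate double taxsimid = 1000*y1 + 100*imp + taxunit

* Recover each component from taxsimid alone.
generate long y1_back      = floor(taxsimid/1000)
generate byte imp_back     = mod(floor(taxsimid/100), 10)
generate byte taxunit_back = mod(taxsimid, 100)

* Implicate number as the last digit of y1; same result as y1 - 10*int(y1/10).
generate byte rep = mod(y1, 10)

format taxsimid %12.0f
list
assert y1_back == y1 & imp_back == imp & taxunit_back == taxunit
```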

The SAS program frbscftax.sas is by Kevin Moore of the Federal Reserve Board, and I thank him for making it available. His comments on the taxsim integration follow: ...

1) In my SAS program, if you turn the HTAXFILE=YES then the program looks for the file from TAXSIM, reads it in and then creates three different datasets (see line 3839 in the SAS program). Basically the three datasets are 1) tax units, 2) combine all the tax units in the primary economic unit (PEU) back into a household, 3) same as 2), but also adds any tax units from the non-primary economic unit (NPEU) back into the household. The PEU is the main household in the survey, the NPEU consists of individuals who are not financially dependent on the PEU. An example is a financially independent sibling (over 18) that lives with his brother and the brother's spouse or partner. I know it may seem a bit confusing, but I was trying to give users as much flexibility as possible.

2) So your Stata code in the readme file will produce dataset 3), which includes PEU and NPEU tax units. If you didn't want to keep the NPEU tax units, you could only keep observations from the TAXSIM file where the last digit of the taxsimid is less than 3, as that would exclude all the NPEU tax units. Tax units from the PEU have a last taxsimid digit of 0, 1, or 2, where 0=tax unit and household are the same, and 1 or 2 means the original household has been split into 2 tax units. ...
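The suggestion of keeping only PEU tax units might look like this in Stata. This is a sketch on invented taxsimid values; the only rule it relies on is the one quoted above, that a last digit of 0, 1 or 2 marks a PEU tax unit.

```stata
* Hypothetical taxsimid values; the last digit is the tax unit code.
clear
input double taxsimid
11100
11101
11102
11103
12200
end

* Flag PEU tax units (last digit below 3), then drop the rest.
generate byte peu = mod(taxsimid, 10) < 3
count if !peu
keep if peu
list
```

With real data the -keep- line would go before any -collapse- to the household level, so that household totals cover the PEU only.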

Sources:

https://www.federalreserve.gov/econres/scfindex.htm
Much more documentation is available at that URL.

URL of this directory: http://www.nber.org/taxsim/to-taxsim/scf27-32

Daniel Feenberg
feenberg@nber.org
May 28, 2021