Index of /to-taxsim/scf-2022
In these directories are the original Survey of Consumer Finances
public use datasets (in ./frbsurvey), Taxsim versions of those files
(in ./txpydata) and the results of running those files through Taxsim (in
./results). The SAS code that does this is in ./code.
Here is an example in Stata for running the FRB file through Taxsim:
insheet using http://taxsim.nber.org/to-taxsim/scf-2022/txpydata/scf22pubtaxtu.$
taxsim35,replace
I want to thank Kevin Moore for sending the PUF files and the code for
transforming them.
Please see http://taxsim.nber.org/ for information about Taxsim.
Kevin kindly provided these notes on the code:
================================================================
I have updated the SCF TAXSIM SAS program to include the 2022 SCF. I have
also updated the program to run on TAXSIM v35, but I still have not
incorporated the business tax deduction variables and I continued to use
the dep13, dep17, and dep18 variables instead of age1, age2, and age3.
The business tax deduction variables are an issue I'm continuing to work
on, but I wanted to put out the updated program. The documentation page
you have for my code, https://taxsim.nber.org/to-taxsim/scf27-32/, only
needs a few minor updates. The first is that the program runs on TAXSIM
v35; the other edits are highlighted below.
There is taxsim data for each SCF from 1989-2022. These files contain the
necessary variables to submit to the taxsim35 calculator to calculate
federal income tax liability. The files "byhousehold" are one record per
SCF respondent. Since a single respondent may encompass several taxpaying
units, the taxsim data is aggregated to the household level. Variables
that don't make sense to be summed are omitted. These files have all the
SCF variables and even if you don't want taxsim, it may be convenient to
use these files for Stata or other packages that can't read sas7bdat
format.
1) In my SAS program, if you set HTAXFILE=YES then the program looks
for the file from TAXSIM, reads it in, and then creates three different
datasets (see line 4035 in the SAS program). The three datasets are
1) tax units, 2) all the tax units in the primary economic unit (PEU)
combined back into a household, and 3) the same as 2), but with any tax
units from the non-primary economic unit (NPEU) also added back into the
household. The PEU is the main household in the survey; the NPEU
consists of individuals who are not financially dependent on the PEU.
An example is a financially independent sibling (over 18) who lives
with his brother and the brother's spouse or partner. I know it may
seem a bit confusing, but I was trying to give users as much
flexibility as possible.
2) So your Stata code in the readme file will
produce dataset 3), which includes PEU and NPEU tax units. If you
don't want to keep the NPEU tax units, you can keep only the
observations from the TAXSIM file where the last digit of the taxsimid
is less than 3, as that excludes all the NPEU tax units. Tax units
from the PEU have a last taxsimid digit of 0, 1, or 2, where 0 means the
tax unit and household are the same, and 1 or 2 means the original
household has been split into 2 tax units. ...
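The last-digit rule described above can be sketched in Python. This is only
an illustration and not part of the SAS program: the function name
is_peu_tax_unit and the sample ids are made up, and the only fact taken from
the notes is that PEU tax units have a final taxsimid digit of 0, 1, or 2.

```python
def is_peu_tax_unit(taxsimid):
    """Return True when the last digit of taxsimid is 0, 1, or 2 (a PEU tax unit)."""
    # int() guards against ids that were read from a text file as floats.
    return int(taxsimid) % 10 < 3


# Hypothetical ids, for illustration only (not real SCF records).
sample_ids = [1100, 1201, 1202, 1303]

# Keep only the tax units that belong to the primary economic unit.
peu_only = [t for t in sample_ids if is_peu_tax_unit(t)]
print(peu_only)
```

The same test, applied to the taxsimid column of the TAXSIM output, would
drop the NPEU tax units before any roll-up to the household.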
The ftaxNNp* files contain TAXSIM output; I've just renamed the output
variables. See line 3786:
RENAME taxsimid=TAXUNITID fiitax=FEDTAXLIAB siitax=STTAXLIAB
frate=FEDMTR1 srate=STMTR1 v10=TAGI v18=TAXINC v41=STBR fica=FICA
state=STATE v25=EITC v28=FEDTAXBC v22=CHILDTC v23=CHILDTCREF
v24=CHILDCARECR;
The taxsim input variables in the scfNNpubtaxtu files have
different names in the SCF file:
RENAME TAXUNITID=taxsimid TUAGE=page SPAGE=sage KIDS=depx KIDSU13=dep13
KIDSU17=dep17 KIDSU18=dep18 WSINCOME=pwages WSINCSP=swages TBUSINC=psemp
TBUSINCSP=ssemp DIVINC=dividends INTINC=intrec STCAPINC=stcg LTCAPINC=ltcg
PENINC=pensions GSSINC=gssi UNEMPINC=pui UNEMPINCSP=sui AFDCINC=transfers
RENT=rentpaid RESTAXM1=proptax CHCAREXP=childcare ;
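For users working outside SAS, the input RENAME statement above can be held
as a lookup table. The sketch below is in Python and is an assumption about
how one might use it, not something the SAS program does: the helper
rename_record, the sample record, and the SOMEVAR column are hypothetical.
Only the name pairs themselves are copied from the RENAME statement.

```python
# SCF name -> TAXSIM input name, copied from the RENAME statement above.
SCF_TO_TAXSIM = {
    "TAXUNITID": "taxsimid", "TUAGE": "page", "SPAGE": "sage",
    "KIDS": "depx", "KIDSU13": "dep13", "KIDSU17": "dep17", "KIDSU18": "dep18",
    "WSINCOME": "pwages", "WSINCSP": "swages",
    "TBUSINC": "psemp", "TBUSINCSP": "ssemp",
    "DIVINC": "dividends", "INTINC": "intrec",
    "STCAPINC": "stcg", "LTCAPINC": "ltcg",
    "PENINC": "pensions", "GSSINC": "gssi",
    "UNEMPINC": "pui", "UNEMPINCSP": "sui",
    "AFDCINC": "transfers", "RENT": "rentpaid",
    "RESTAXM1": "proptax", "CHCAREXP": "childcare",
}

# Reverse lookup, for going from TAXSIM names back to SCF names.
TAXSIM_TO_SCF = {taxsim: scf for scf, taxsim in SCF_TO_TAXSIM.items()}


def rename_record(record, mapping=SCF_TO_TAXSIM):
    """Return a copy of record with keys renamed; unmapped keys are kept as-is."""
    return {mapping.get(name, name): value for name, value in record.items()}


# Hypothetical record; SOMEVAR stands in for any column not in the mapping.
scf_record = {"TAXUNITID": 1100, "TUAGE": 45, "WSINCOME": 52000, "SOMEVAR": 1.0}
taxsim_record = rename_record(scf_record)
print(taxsim_record)
```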
The tax unit version of those files has many more variables than the
household level versions for the simple reason that I can't
aggregate up all the tax unit variables to the household level, so I
chose to do what I could. The main thrust of the SAS program is
splitting up households into tax units, which entails dividing up
income, deductions, exemptions and such, and computing as many
itemized deductions as I can from the SCF data.
Just to be clear, if HTAXFILE=NO, then the program just creates the
input files for TAXSIM, and those input files need to be
pushed through TAXSIM and then the SAS program rerun with
HTAXFILE=YES to merge the TAXSIM output with the underlying SCF
data. The SAS program outputs three SAS datasets: one with
observations at the tax unit level, one with tax units rolled back up into
households (PEU), and one with tax units, including any NPEU tax
units, rolled back up into the household (PEU). All three of those
datasets contain Y1 (which is YY1 with the 1-5 replicate number on
the end).
In terms of linking the taxsimid back to the Y1 id in the public
dataset, the taxsimid = Y1*100+TAXUNIT, where TAXUNIT indicates if
the observation has been split into multiple tax units. But beware
the one-to-many merge you will encounter when merging the TAXSIM
input file to the public dataset, as the TAXSIM input file is in tax
units and the public dataset is households.
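The id arithmetic and the one-to-many issue can be sketched in Python. This
is a rough illustration under stated assumptions, not the SAS program's
logic: the function names and the sample rows are hypothetical, summing
fiitax is just one possible way to collapse tax units, and split_y1 assumes
that "on the end" means the replicate number is the final digit of Y1.

```python
from collections import defaultdict


def split_taxsimid(taxsimid):
    """Invert taxsimid = Y1*100 + TAXUNIT and return (Y1, TAXUNIT)."""
    return divmod(int(taxsimid), 100)


def split_y1(y1):
    """Return (YY1, replicate), assuming the replicate is the last digit of Y1."""
    return divmod(int(y1), 10)


# Hypothetical TAXSIM output rows: (taxsimid, fiitax). Not real SCF data.
tax_rows = [(1100, 5000.0), (1201, 1200.0), (1202, 300.0)]

# Group tax units under their Y1. A household that was split into several
# tax units has several rows here, which is the one-to-many case.
by_household = defaultdict(list)
for taxsimid, fiitax in tax_rows:
    y1, taxunit = split_taxsimid(taxsimid)
    by_household[y1].append((taxunit, fiitax))

# Collapse to one value per Y1 so that a later merge on Y1 is one-to-one.
household_tax = {
    y1: sum(fiitax for _, fiitax in units) for y1, units in by_household.items()
}
print(household_tax)
```

Collapsing to one row per Y1 first is only one option; keeping the data at
the tax unit level and merging the household variables onto every tax unit
row is equally valid, as long as household totals are not then double
counted.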
Kevin B. Moore
Federal Reserve Board
https://www.federalreserve.gov/econres/kevin-b-moore.htm
========================================================================
Daniel Feenberg
feenberg@nber.org
617-682-6204