Index of /to-taxsim/scf-2022


In these directories are the original Survey of Consumer Finances 
public use datasets (in ./frbsurvey), Taxsim versions of those files
(in ./txpydata) and the results of running those files through Taxsim (in 
./results). The SAS code that does this is in ./code. 

Here is an example in Stata for running the FRB file through Taxsim:

  insheet using http://taxsim.nber.org/to-taxsim/scf-2022/txpydata/scf22pubtaxtu.$
  taxsim35,replace


I want to thank Kevin Moore for sending the PUF files and the code for 
transforming them. 

Please see http://taxsim.nber.org/ for information about Taxsim.

Kevin kindly provided these notes on the code:
================================================================

I have updated the SCF TAXSIM SAS program to include the 2022 SCF.  I have 
also updated the program to run on TAXSIM v35, but I still have not 
incorporated the business tax deduction variables and I continued to use 
the dep13, dep17, and dep18 variables instead of age1, age2, and age3.  
The business tax deduction variables are an issue I'm continuing to work 
on, but I wanted to put out the updated program.  The documentation page 
you have for my code, https://taxsim.nber.org/to-taxsim/scf27-32/, only 
needs a few minor updates. The first is that the program runs on TAXSIM 
v35, the other edits are highlighted below.

There is taxsim data for each SCF from 1989-2022. These files contain the 
necessary variables to submit to the taxsim35 calculator to calculate 
federal income tax liability. The "byhousehold" files have one record per 
SCF respondent. Since a single respondent may encompass several taxpaying 
units, the taxsim data is aggregated to the household level. Variables 
that don't make sense to sum are omitted. These files have all the 
SCF variables, so even if you don't want taxsim, they may be convenient 
for Stata or other packages that can't read the sas7bdat format.


1) In my SAS program, if you set HTAXFILE=YES then the program looks 
   for the file from TAXSIM, reads it in and then creates three different 
   datasets (see line 4035 in the SAS program). The three datasets are 
   1) tax units, 2) all the tax units in the primary economic unit (PEU) 
   combined back into a household, and 3) the same as 2), but with any 
   tax units from the non-primary economic unit (NPEU) also added back 
   into the household. The PEU is the main household in the survey; the 
   NPEU consists of individuals who are not financially dependent on the 
   PEU. An example is a financially independent sibling (over 18) who 
   lives with his brother and the brother's spouse or partner. I know it 
   may seem a bit confusing, but I was trying to give users as much 
   flexibility as possible.

2) So your Stata code in the readme file will produce dataset 3), which 
   includes PEU and NPEU tax units. If you didn't want to keep the NPEU 
   tax units, you could keep only the observations from the TAXSIM file 
   where the last digit of the taxsimid is less than 3, as that would 
   exclude all the NPEU tax units. Tax units from the PEU have a last 
   taxsimid digit of 0, 1, or 2, where 0 means the tax unit and the 
   household are the same, and 1 or 2 means the original household has 
   been split into 2 tax units. ...
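
The last-digit rule above can be sketched in a few lines. This is only an
illustration in Python, not part of the SAS program: the function names
and the sample records are made up, and it assumes each record carries an
integer taxsimid.

```python
def is_peu_tax_unit(taxsimid):
    """Return True when the last digit of taxsimid is 0, 1, or 2."""
    return int(taxsimid) % 10 < 3


def drop_npeu(rows):
    """Keep only rows whose taxsimid marks a PEU tax unit."""
    return [row for row in rows if is_peu_tax_unit(row["taxsimid"])]


# Made-up records for illustration; these are not real SCF IDs or amounts.
sample = [
    {"taxsimid": 1100, "fiitax": 5200.0},  # 0: tax unit and household are the same
    {"taxsimid": 2101, "fiitax": 3100.0},  # 1: PEU household split into tax units
    {"taxsimid": 2102, "fiitax": 800.0},   # 2: PEU household split into tax units
    {"taxsimid": 2103, "fiitax": 450.0},   # 3 or more: dropped as an NPEU tax unit
]

peu_only = drop_npeu(sample)
print([row["taxsimid"] for row in peu_only])
```

The same test in Stata or SAS would be a one-line condition on the
remainder of taxsimid divided by 10.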



The ftaxNNp* files contain TAXSIM output; I've just renamed the output
variables.

See line 3786:  

    RENAME taxsimid=TAXUNITID fiitax=FEDTAXLIAB siitax=STTAXLIAB 
      frate=FEDMTR1 srate=STMTR1 v10=TAGI v18=TAXINC v41=STBR fica=FICA 
      state=STATE v25=EITC v28=FEDTAXBC v22=CHILDTC v23=CHILDTCREF 
      v24=CHILDCARECR;


The taxsim input variables in the scfNNpubtaxtu files have 
different names in the SCF file:

    RENAME TAXUNITID=taxsimid TUAGE=page SPAGE=sage KIDS=depx KIDSU13=dep13 
      KIDSU17=dep17 KIDSU18=dep18 WSINCOME=pwages WSINCSP=swages TBUSINC=psemp 
      TBUSINCSP=ssemp DIVINC=dividends INTINC=intrec STCAPINC=stcg LTCAPINC=ltcg 
      PENINC=pensions GSSINC=gssi UNEMPINC=pui UNEMPINCSP=sui AFDCINC=transfers 
      RENT=rentpaid RESTAXM1=proptax CHCAREXP=childcare ;
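
For readers working outside SAS, the RENAME list above can be kept as a
lookup table. The sketch below is a Python illustration only: the
dictionary is transcribed from the RENAME statement, while the names
SCF_TO_TAXSIM, TAXSIM_TO_SCF and rename_header are made up for this
example.

```python
# SCF variable name -> TAXSIM input name, transcribed from the RENAME statement.
SCF_TO_TAXSIM = {
    "TAXUNITID": "taxsimid", "TUAGE": "page", "SPAGE": "sage", "KIDS": "depx",
    "KIDSU13": "dep13", "KIDSU17": "dep17", "KIDSU18": "dep18",
    "WSINCOME": "pwages", "WSINCSP": "swages", "TBUSINC": "psemp",
    "TBUSINCSP": "ssemp", "DIVINC": "dividends", "INTINC": "intrec",
    "STCAPINC": "stcg", "LTCAPINC": "ltcg", "PENINC": "pensions",
    "GSSINC": "gssi", "UNEMPINC": "pui", "UNEMPINCSP": "sui",
    "AFDCINC": "transfers", "RENT": "rentpaid", "RESTAXM1": "proptax",
    "CHCAREXP": "childcare",
}

# Reverse lookup: TAXSIM input name -> SCF variable name.
TAXSIM_TO_SCF = {taxsim: scf for scf, taxsim in SCF_TO_TAXSIM.items()}


def rename_header(header, mapping=SCF_TO_TAXSIM):
    """Rename the columns found in the mapping; leave all others unchanged."""
    return [mapping.get(name, name) for name in header]


print(rename_header(["TAXUNITID", "TUAGE", "WSINCOME", "Y1"]))
print(TAXSIM_TO_SCF["proptax"])
```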

The tax unit version of those files has many more variables than the 
household level versions for the simple reason that I can't 
aggregate up all the tax unit variables to the household level, so I 
chose to do what I could. The main thrust of the SAS program is 
splitting up households into tax units, which entails dividing up 
income, deductions, exemptions and such, and computing as many 
itemized deductions as I can from the SCF data.

Just to be clear, if HTAXFILE=NO, then the program just creates the 
input files for TAXSIM. Those input files then need to be pushed 
through TAXSIM, and the SAS program rerun with HTAXFILE=YES, to merge 
the TAXSIM output with the underlying SCF data.  The SAS program 
outputs three SAS datasets: one with observations at the tax unit 
level, one with tax units rolled back up into households (PEU), and 
one with tax units, including any NPEU tax units, rolled back up into 
the household (PEU).  All three of those datasets contain Y1 (which 
is YY1 with the 1-5 replicate number on the end).
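
As a rough picture of what "rolled back up" means, the sketch below sums
one variable over the tax units of each household. It is a Python
illustration under stated assumptions, not the SAS program's logic: the
function name roll_up and the sample records are made up, each record is
assumed to carry Y1, TAXUNITID and FEDTAXLIAB, and NPEU tax units are
identified by a last TAXUNITID digit of 3 or more.

```python
from collections import defaultdict


def roll_up(tax_units, value="FEDTAXLIAB", include_npeu=False):
    """Sum one variable over tax units within each household, keyed by Y1."""
    totals = defaultdict(float)
    for unit in tax_units:
        is_npeu = int(unit["TAXUNITID"]) % 10 >= 3
        if is_npeu and not include_npeu:
            continue  # leave NPEU tax units out of the PEU-only roll-up
        totals[unit["Y1"]] += unit[value]
    return dict(totals)


# Made-up records for illustration; these are not real SCF IDs or amounts.
sample = [
    {"Y1": 11, "TAXUNITID": 1100, "FEDTAXLIAB": 5200.0},
    {"Y1": 21, "TAXUNITID": 2101, "FEDTAXLIAB": 3100.0},
    {"Y1": 21, "TAXUNITID": 2102, "FEDTAXLIAB": 800.0},
    {"Y1": 21, "TAXUNITID": 2103, "FEDTAXLIAB": 450.0},
]

peu_households = roll_up(sample)                     # analogous to dataset 2
all_households = roll_up(sample, include_npeu=True)  # analogous to dataset 3
print(peu_households, all_households)
```

Only variables that make sense as sums should be treated this way; rates
such as FEDMTR1 would need a different rule.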

In terms of linking the taxsimid back to the Y1 id in the public 
dataset, the taxsimid = Y1*100+TAXUNIT, where TAXUNIT indicates if 
the observation has been split into multiple tax units. But beware 
the one-to-many merge you will encounter when merging the TAXSIM 
input file to the public dataset, as the TAXSIM input file is in tax 
units and the public dataset is households.
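
The ID arithmetic and the merge caveat can be sketched as follows. This is
a Python illustration only: the function names, the sample records and the
household variable hh_value are made up, and the split of Y1 into YY1 and
a replicate number assumes the replicate is the single last digit.

```python
def split_taxsimid(taxsimid):
    """Invert taxsimid = Y1*100 + TAXUNIT, then split Y1 into YY1 and replicate."""
    y1, taxunit = divmod(int(taxsimid), 100)
    yy1, replicate = divmod(y1, 10)
    return {"Y1": y1, "TAXUNIT": taxunit, "YY1": yy1, "replicate": replicate}


def merge_to_households(tax_units, households):
    """Attach household variables to each tax unit (one household, many units)."""
    by_y1 = {hh["Y1"]: hh for hh in households}
    merged = []
    for unit in tax_units:
        y1 = split_taxsimid(unit["taxsimid"])["Y1"]
        # Raises KeyError if a tax unit has no matching household record.
        merged.append({**by_y1[y1], **unit})
    return merged


# Made-up records for illustration; hh_value is a hypothetical household variable.
households = [{"Y1": 21, "hh_value": 100000.0}]
tax_units = [{"taxsimid": 2101}, {"taxsimid": 2102}]

merged = merge_to_households(tax_units, households)
naive_total = sum(row["hh_value"] for row in merged)
print(split_taxsimid(2101), len(merged), naive_total)
```

Because the household record is repeated once per tax unit, summing a
household-level variable over the merged rows counts it more than once,
which is the hazard the note above warns about.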


Kevin B. Moore
Federal Reserve Board
https://www.federalreserve.gov/econres/kevin-b-moore.htm

========================================================================

Daniel Feenberg
feenberg@nber.org
617-682-6204