In these directories are the original Survey of Consumer Finances public use datasets (in ./frbsurvey), Taxsim versions of those files (in ./txpydata), and the results of running those files through Taxsim (in ./results). The SAS code that does this is in ./code.

Here is an example in Stata for running the FRB file through Taxsim:

    insheet using http://taxsim.nber.org/to-taxsim/scf-2022/txpydata/scf22pubtaxtu.$
    taxsim35,replace

I want to thank Kevin Moore for sending the PUF files and the code for transforming them. Please see http://taxsim.nber.org/ for information about Taxsim.

Kevin kindly provided these notes on the code:

================================================================

I have updated the SCF TAXSIM SAS program to include the 2022 SCF. I have also updated the program to run on TAXSIM v35, but I still have not incorporated the business tax deduction variables, and I continued to use the dep13, dep17, and dep18 variables instead of age1, age2, and age3. The business tax deduction variables are an issue I'm continuing to work on, but I wanted to put out the updated program.

The documentation page you have for my code, https://taxsim.nber.org/to-taxsim/scf27-32/, only needs a few minor updates. The first is that the program runs on TAXSIM v35; the other edits are highlighted below.

There is taxsim data for each SCF from 1989-2022. These files contain the necessary variables to submit to the taxsim32 calculator to calculate federal income tax liability.

The files "byhousehold" are one record per SCF respondent. Since a single respondent may encompass several taxpaying units, the taxsim data is aggregated to the household level. Variables that don't make sense to be summed are omitted. These files have all the SCF variables, and even if you don't want taxsim, it may be convenient to use these files for Stata or other packages that can't read sas7bdat format.

1) In my SAS program, if you set HTAXFILE=YES then the program looks for the file from TAXSIM, reads it in, and then creates three different datasets (see line 4035 in the SAS program). Basically the three datasets are 1) tax units, 2) combine all the tax units in the primary economic unit (PEU) back into a household, 3) same as 2), but also adds any tax units from the non-primary economic unit (NPEU) back into the household. The PEU is the main household in the survey; the NPEU consists of individuals who are not financially dependent on the PEU. An example is a financially independent sibling (over 18) who lives with his brother and the brother's spouse or partner. I know it may seem a bit confusing, but I was trying to give users as much flexibility as possible.

2) So your Stata code in the readme file will produce dataset 3), which includes PEU and NPEU tax units. If you didn't want to keep the NPEU tax units, you could keep only observations from the TAXSIM file where the last digit of the taxsimid is less than 3, as that would exclude all the NPEU tax units. Tax units from the PEU have a last taxsimid digit of 0, 1, or 2, where 0 means the tax unit and household are the same, and 1 or 2 means the original household has been split into 2 tax units.

...

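
As an illustration only (this is not part of the SAS program or the Stata example), the last-digit rule in note 2) can be sketched in Python. The function name and the sample taxsimid values are made up for the example; only the "last digit less than 3" rule comes from the notes above.

```python
def is_peu_tax_unit(taxsimid):
    """Return True when the last digit of taxsimid is 0, 1, or 2.

    Per the notes, those digits mark tax units from the primary
    economic unit (PEU); anything else is treated as NPEU here.
    """
    return int(taxsimid) % 10 < 3


# Hypothetical taxsimid values, invented for illustration only.
sample_ids = [1100, 1201, 1202, 1203]

# Keep only the PEU tax units, dropping any NPEU tax units.
peu_only = [t for t in sample_ids if is_peu_tax_unit(t)]
print(peu_only)
```

The same test applied to the taxsimid column of the TAXSIM file would drop the NPEU tax units before any household-level aggregation.
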
The ftaxNNp* files contain TAXSIM output; I've just renamed the output variables. See line 3786:

    RENAME taxsimid=TAXUNITID fiitax=FEDTAXLIAB siitax=STTAXLIAB
           frate=FEDMTR1 srate=STMTR1 v10=TAGI v18=TAXINC v41=STBR
           fica=FICA state=STATE v25=EITC v28=FEDTAXBC v22=CHILDTC
           v23=CHILDTCREF v24=CHILDCARECR;

The taxsim input variables in the scfNNpubtaxtu files have different names in the SCF file:

    RENAME TAXUNITID=taxsimid TUAGE=page SPAGE=sage KIDS=depx
           KIDSU13=dep13 KIDSU17=dep17 KIDSU18=dep18
           WSINCOME=pwages WSINCSP=swages TBUSINC=psemp TBUSINCSP=ssemp
           DIVINC=dividends INTINC=intrec STCAPINC=stcg LTCAPINC=ltcg
           PENINC=pensions GSSINC=gssi UNEMPINC=pui UNEMPINCSP=sui
           AFDCINC=transfers RENT=rentpaid RESTAXM1=proptax
           CHCAREXP=childcare ;

The tax unit version of those files has many more variables than the household level versions for the simple reason that I can't aggregate up all the tax unit variables to the household level, so I chose to do what I could. The main thrust of the SAS program is splitting up households into tax units, which entails dividing up income, deductions, exemptions and such, and computing as many itemized deductions as I can from the SCF data.

Just to be clear, if HTAXFILE=NO, then the program just creates the input files for TAXSIM, and those input files would need to be pushed through TAXSIM and then the SAS program rerun with HTAXFILE=YES to merge the TAXSIM output with the underlying SCF data.

The SAS program outputs three SAS datasets: one with observations at the tax unit level, one with tax units rolled back up into households (PEU), and one with tax units, including any NPEU tax units, rolled back up into the household (PEU). All three of those datasets contain Y1 (which is YY1 with the 1-5 replicate number on the end). In terms of linking the taxsimid back to the Y1 id in the public dataset, the taxsimid = Y1*100+TAXUNIT, where TAXUNIT indicates if the observation has been split into multiple tax units.
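
As an illustration only (not taken from the SAS program), the relation taxsimid = Y1*100+TAXUNIT can be inverted with integer division. The sketch below is in Python; the function name and the sample ids are made up, and it assumes TAXUNIT is always below 100, as the formula implies.

```python
from collections import defaultdict


def split_taxsimid(taxsimid):
    """Invert taxsimid = Y1*100 + TAXUNIT, returning (Y1, TAXUNIT)."""
    return divmod(int(taxsimid), 100)


# Hypothetical taxsimid values, invented for illustration only.
sample_ids = [1100, 1201, 1202]

# Group tax units under their household id (Y1). A household that was
# split appears with more than one TAXUNIT value.
units_by_household = defaultdict(list)
for t in sample_ids:
    y1, taxunit = split_taxsimid(t)
    units_by_household[y1].append(taxunit)

print(dict(units_by_household))
```

Grouping on the recovered Y1 shows which households carry more than one tax unit record.
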
But beware the one-to-many merge you will encounter when merging the TAXSIM input file to the public dataset, as the TAXSIM input file is in tax units and the public dataset is households.

Kevin B. Moore
Federal Reserve Board
https://www.federalreserve.gov/econres/kevin-b-moore.htm

========================================================================

Daniel Feenberg
feenberg@nber.org
617-682-6204