Guidelines for non-NBER users of -taxpuf.ado-
From time to time I have run regressions or tables on the SOI
Individual Income Tax Public Use Files for other researchers.
These guidelines are for researchers sending me Stata .do files to run
against the SOI PUF files kept here at NBER. I'd like to keep my role
quick and mechanical so that turnaround for users is short.
I will send each such user a .zip file with test data. Once you are
satisfied with the tests, send me the .do file and I will run it against
the full sample, including more recent files, and return the results to
you. Turnaround should be a day or less, but no promises.
Data
The sample file of test data is entirely random. The range and mean of each
variable will be similar to those in the real data, but there is zero
correlation among the variables or across years, so tax calculations will not
produce sensible aggregates. The only purpose of the test data is to let you
confirm that you are not referring to missing data or confusing codes with
values. Note that different years have slightly different variable sets.
Information about the NBER collection of SOI PUF data is at:
The original SOI files cannot be used as input for taxsim, which
requires files that are more uniform over time.
SOI files did not name the variables until recently, and data elements
were inconsistent over time. The NBER has created .dta files with a
highly consistent naming (actually numbering) convention over time for
a subset of the original variables. For example, wages are always "data11".
Full long-term gains are calculated by dividing the SOI-supplied amount by
1, .5 or .4 and are stored as "data70". Similar calculations are done for
various items subject to a floor or ceiling.
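In formula form, the data70 construction reads as below. Here A is the SOI-supplied amount and r is the divisor used for that year's file; reading r as the share of long-term gains included in the SOI amount is an interpretation, not something stated by SOI.

```latex
\text{data70} = \frac{A_{\mathrm{SOI}}}{r}, \qquad r \in \{1,\ 0.5,\ 0.4\}
```

For instance, an SOI-supplied amount of 3,000 in a year with r = 0.4 gives data70 = 3,000 / 0.4 = 7,500.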
Tax Calculator
-taxpuf.ado- is a tax calculator that can use the PUF data. Note that the
better-known Internet Taxsim operates on a subset of the data, while
-taxpuf.ado- uses all the data available in the PUF, with the variable names
established at NBER back in 1976.
From within Stata, install taxpuf with:
net from "https://www.nber.org/stata/taxpuf"
net install taxpuf, all
Submissions
Your .do file should be named "useridN.do", where userid is an id that
will help me keep track of who is submitting what, and N is an integer
that increments with each submission. For example, "joseph1.do" would be
Joseph's first attempt.
Your program should start with a log command:
log using joseph1, text replace
This will help me keep track of what is happening. I will be keeping the .do
files, but please send a complete file each time; don't ask me to edit your
earlier file. The "text replace" options are important to me: -text- writes a
plain-text log rather than SMCL, and -replace- lets the program be rerun
without failing because the log file already exists.
Programs may read data only from /home/data/soi/taxsim/dta. Use a macro to
specify the filename so that switching to the NBER system requires minimal
edits:
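* Test data on your own machine (use the path to your copy)
local taxfile D:\Data\tax\randomtax
* On the NBER system, point the macro at the SOI file instead
if c(username)=="feenberg" local taxfile /home/data/soi/taxsim/dta/s2008
* ...
use `taxfile', clear
* The test file holds every year, so keep only 2008
keep if data103==2008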
With this setup, the directory for SOI data will be set correctly on the
NBER system without my editing the program. Also, if the program appears to
be working, it is simple for me to switch from the subsets to the full
dataset by changing "s" to "x" in one place.
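Putting the log command and the filename macro together, a complete first submission might look like the minimal sketch below. The file name joseph1.do, the test-data path and the wage summary are illustrative assumptions; replace the analysis section with your own code.

```stata
* joseph1.do : hypothetical first submission from user "joseph"
capture log close
log using joseph1, text replace

* Test data on your machine (assumed path; use your own)
local taxfile D:\Data\tax\randomtax
* On the NBER system, switch to the 2008 SOI subsample
if c(username)=="feenberg" local taxfile /home/data/soi/taxsim/dta/s2008

use `taxfile', clear
* The test file holds every year; the NBER s2008 file holds only 2008
keep if data103==2008

* --- your analysis goes here; this wage summary is only a placeholder ---
summarize data11, detail

log close
```

Nothing in this file names a directory except inside the macro, which is what lets it run unchanged on both systems.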
You can read multiple years in a loop:
clear
forvalues i=1965/1991 {
append using /home/data/soi/taxsim/dta/s`i'
}
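The loop above reads the NBER directory directly, so it will not find those files on your machine. A minimal sketch that follows the same username switch is shown below. It assumes your test data is the single all-years random file described under Data; the D:\ path is hypothetical.

```stata
clear
if c(username)=="feenberg" {
    * NBER system: one subsample file per year, appended in turn
    forvalues i=1965/1991 {
        append using /home/data/soi/taxsim/dta/s`i'
    }
}
else {
    * Your machine: the random test file already contains every year
    use D:\Data\tax\randomtax, clear
    keep if inrange(data103, 1965, 1991)
}
* Number of records in each file year
tabulate data103
```

Because variable sets differ by year, -append- leaves missing values for any variable that is absent from a given year's file.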
data103 always gives the file year. The file of random data will include
records for all years.
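Since data103 identifies the year, you can check before submitting that the variables your program uses are present and populated in every year you need. A minimal sketch follows. It assumes data11 and data70 are the variables you rely on and uses a hypothetical path to the test data.

```stata
use D:\Data\tax\randomtax, clear
* Stop with an error (r(111)) if any required variable is absent
confirm variable data11 data70 data103
* Non-missing count and mean of each variable by file year; a count of
* zero presumably marks a year in which that item is not available
tabstat data11 data70, by(data103) statistics(n mean)
```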
Programs may write only to the current directory or /tmp (Stata temp
files are ok).
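For intermediate results, Stata's -tempfile- command supplies a file name in Stata's temporary directory and erases the file when the do-file ends, so your program never has to name a directory. A minimal sketch follows. The test-data path, the variable mean_wages and the merge step are illustrative assumptions.

```stata
use D:\Data\tax\randomtax, clear
* Declare a temporary file; Stata picks its location and erases it at the end
tempfile wages
* Mean wages by file year, written to the temporary file
preserve
collapse (mean) mean_wages=data11, by(data103)
save `wages'
restore
* Attach the yearly mean back to each record
merge m:1 data103 using `wages', nogenerate
```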
If you want .dta files returned, please place a -summarize- command
after the -save- command so that I can see in the log what was written.
Start each file name with your userid so that I can zip them up easily:
zip joseph.zip joseph*
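For example, a program that returns a small table of yearly means might end as in the sketch below. The output name joseph1_means is hypothetical, chosen to be lowercase, free of spaces, and to begin with the userid so that the joseph* pattern picks it up. The choice of statistics is only illustrative.

```stata
use D:\Data\tax\randomtax, clear
* One record per file year: means plus a count of non-missing wage records
collapse (mean) data11 data70 (count) nrec=data11, by(data103)
* Written to the current directory, which programs are allowed to use
save joseph1_means, replace
* Summarize right after saving so the log shows what was written
summarize
```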
Before sending me a program, look it over carefully for signs of code that is
specific to your system, such as directory names or system commands. Do
not use uppercase letters or spaces in filenames! That will make me mad.
This is a new service, so expect changes in the guidelines as I gain
experience.
Daniel Feenberg
617-863-0343