This is a project to create Taxsim 22 variable files from the March CPS
files. I start with IPUMS data. While a program (cps229.sas) exists, it
has not been fully debugged and tested. I became discouraged when the
following problems became apparent:
DEPSTAT:
This is an indicator that is supposed to point from the dependent to the taxpayer,
but it has several problems beyond the difficulty in knowing if a person receives
half or more of his support from the taxpayer:
1) Available 2004+ only
2) for 2004-2005 the taxpayer's spouse is (always?) shown as a dependent, which
is not consistent with the tax law.
3) for 2006 all married taxpayers are listed as dependents of each other, which
is confusing, as well as wrong.
4) for 2004-2006 many taxpayers are shown as dependent on themselves. In 2004
and 2005 many of these are very young, and a momloc or poploc is shown and
usually little or no income. In 2006 this appears to indicate a child with
significant income.
5) for 2007+ this seems correct.
6) for all years, the depstat points to the person line number (lineno)
rather than to the person number (pernum), making for rather complex
programming to unite parents and children.
Proposed corrections: change DEPSTAT to zero for all married persons. For persons
shown as dependent on themselves, change DEPSTAT to zero if they have income, and
to momloc or poploc if they don't. It would be an uncorrectable error for an
individual with no income to have no parent.
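The proposed corrections can be sketched as a small function. This is only an
illustration, not the logic in cps229.sas. It assumes each person is a dict with
hypothetical keys "married" (a boolean) and "income" (a single number, where
non-zero means the person has income), alongside lineno, depstat, momloc and
poploc. It also assumes a value of zero in momloc or poploc means no such parent
is present.

```python
def correct_depstat(person):
    """Return a corrected DEPSTAT for one person record (a dict).

    Keys "married" and "income" are hypothetical names for this sketch.
    """
    # Married persons are never treated as dependents.
    if person["married"]:
        return 0
    # A person shown as dependent on himself (DEPSTAT equals own line number).
    if person["depstat"] != 0 and person["depstat"] == person["lineno"]:
        if person["income"] != 0:
            return 0
        # No income: point to a parent instead (mother first, then father).
        parent = person["momloc"] or person["poploc"]
        if parent == 0:
            raise ValueError("no income and no parent: uncorrectable")
        return parent
    # Everyone else keeps the original pointer.
    return person["depstat"]
```

Note that the original DEPSTAT is a line number while momloc and poploc are
person pointers, so a caller would need to keep track of which kind of pointer
each corrected value holds.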
FILESTAT:
The correspondence between AGI and filing isn't consistent through time. There is
legitimate ambiguity as to which low income individuals will file, and the CPS
changes its decision rule for low income individuals through time. But
sometimes high income individuals are not shown as filing, which can't be right.
7) For 2007+ a small percentage of non-filers have non-zero AGI and FEDTAX. Of
course a small AGI is compatible with non-filing, but 47 of these have AGI
so high it is top-coded, and anyone with FEDTAX should surely file.
8) For 2004 and 2005 there are 74,000+ non-filers with AGI of zero, and only
7 (2004) and 4,472 (for 2005) filers with AGI of zero. For all other years
there are 25,000 to 42,000 filers with AGI of zero.
Provided one doesn't try to count filers, and calculates taxes for all
individuals with non-zero, non-missing income, there is no correction needed.
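A minimal sketch of the kind of tabulation behind items 7 and 8, together with
the selection rule just described. The record keys "year", "is_filer" and "agi",
and the use of None for a missing income, are assumptions made for this
illustration and are not the CPS or IPUMS codings.

```python
from collections import Counter


def tabulate_filing(records):
    """Count records by (year, filer status, zero or non-zero AGI)."""
    counts = Counter()
    for r in records:
        filer = "filer" if r["is_filer"] else "nonfiler"
        agi = "zero_agi" if r["agi"] == 0 else "nonzero_agi"
        counts[(r["year"], filer, agi)] += 1
    return counts


def needs_tax_calculation(income):
    """True for non-zero, non-missing income, whatever the filing flag says."""
    return income is not None and income != 0
```

Comparing the counts across years would show the kind of break described for
2004 and 2005, while the second function ignores filing status entirely.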
PROPTAX and HOUSRET:
It isn't clear where these values come from. They are some sort of imputation, and
there is no indication of what HOUSRET is intended to proxy - it can be negative,
so perhaps it includes a capital gain or loss. Oddly, the values for both these
items are repeated for all dependents of the taxpayer, at the same value imputed
to the taxpayer. That is, for a taxpayer imputed $1,000 in PROPTAX, and with 2
dependents, each dependent will also have $1,000 for PROPTAX. These are the only
two tax variables treated in this way. This oddity seems to be consistent from
1992 on.
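Because of the repetition, a simple sum of PROPTAX over persons counts the same
imputed amount several times. One possible way to count each amount once, shown
here only as a sketch, is to sum over records that are not marked as dependents.
It assumes a DEPSTAT of zero means "not a dependent" and that DEPSTAT is usable
for the year in question, which the DEPSTAT notes above suggest is not always so.

```python
def total_proptax_naive(records):
    """Sum PROPTAX over every person record, repeats included."""
    return sum(r["proptax"] for r in records)


def total_proptax_once(records):
    """Sum PROPTAX only over records that are not dependents (depstat == 0)."""
    return sum(r["proptax"] for r in records if r["depstat"] == 0)
```

For the example in the text (a taxpayer with $1,000 in PROPTAX and 2 dependents)
the first function gives $3,000 and the second gives $1,000.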
Summary:
It is not possible to use the information in the tax variables of the March CPS to
form tax families, and any attempt to do so will lead to inconsistent treatments
across time. Therefore, it makes sense to form your own tax families, using the
relationship information which is consistently provided through the years with the
momloc and poploc variables provided by IPUMS. However, this sacrifices any
ability to allocate adult dependents to taxpayers.
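One way to form such tax families within a household is sketched below. It is an
illustration under stated assumptions, not a tested procedure. It assumes momloc
and poploc hold the person number of a parent in the same household, with zero
meaning none. It also uses a spouse pointer called sploc to join married couples,
and an age cutoff (max_child_age) to decide who counts as a child. The cutoff and
the record keys are choices made for this example.

```python
def tax_unit_heads(household, max_child_age=18):
    """Map each pernum in one household to the pernum heading its tax unit."""
    by_pernum = {p["pernum"]: p for p in household}

    def head(p, seen=()):
        parent = p["momloc"] or p["poploc"]
        # Unmarried children at or under the cutoff join a parent's unit.
        if (parent and p["age"] <= max_child_age and not p["sploc"]
                and parent not in seen):
            return head(by_pernum[parent], seen + (p["pernum"],))
        # Spouses share a unit, headed by the lower person number.
        if p["sploc"]:
            return min(p["pernum"], p["sploc"])
        # Everyone else heads his own unit.
        return p["pernum"]

    return {p["pernum"]: head(p) for p in household}
```

In this sketch an adult child living with his parents heads his own unit, which
is exactly the loss of adult dependents noted above.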
Comparison of Aggregates:
We can compare the aggregate income and liability from the tax data to known
aggregates. However, since the CPS values are top-coded, it can be important to
take that into consideration. Using the IRS Statistics of Income Division Public
Use Files (PUF), I can create top-coded samples of actual tax returns and
compute top-coded aggregates from them. I do this in two ways. First, I simply
drop all values over $99,997. Second, I replace all top-coded values with
$99,997 and drop all missing values. The results are
surprising.
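The two top-coding treatments can be sketched as follows. This is an
illustration only. The use of sampling weights, and the use of None to stand for
a missing value (skipped in both treatments), are assumptions made for the
example.

```python
TOPCODE = 99_997


def aggregate_dropping(values, weights):
    """Weighted total after dropping every value over the top-code."""
    return sum(w * v for v, w in zip(values, weights)
               if v is not None and v <= TOPCODE)


def aggregate_capping(values, weights):
    """Weighted total after replacing values over the top-code with it."""
    return sum(w * min(v, TOPCODE) for v, w in zip(values, weights)
               if v is not None)
```

Applying the same function to the CPS values and to the PUF values gives totals
that can be compared on an equal footing.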
In 1998 income and tax in the CPS are about half what they are in the comparably
top-coded PUF. In all other years, the CPS values for tax are 20% to 35% higher
than the PUF values for the top-coded sample, and 5% to 23% higher for the sample
where top-coded values are dropped entirely. I don't find these differences large,
considering the difficulties in survey data, but the large year to year
variation is a concern.
Daniel Feenberg
August 2010