{smcl}
{* 3 August 2012}
{hline}
help for {it:{hi:tsls}}                                         
{hline}

{title: Fast and Small 2SLS with FE, IV and Clustered SE}

{title:Description}

{pstd}
{cmd:tsls} {depvar} [{indepvars}] 
[{cmd:(}{it:varlist2} {cmd:=} {it:varlist_iv}{cmd:)}]
{cmd:,}
{cmd:fe(}{it:panelid}{cmd:)}
[{cmdab:a:reg}
{cmdab:c:luster(}{it:clusterid}{cmd:)}
{cmdab:d:emean}
{cmdab:r:eplace}]

{pstd}This procedure performs two-stage least squares with fixed effects, 
instrumental variables and clustered standard errors. It does not cover all 
the capabilities of {cmd:xtivreg2} or {cmd:ivregress}, but it is memory 
efficient and many times faster. Coefficients and standard errors are 
unaffected. It is intended for datasets with hundreds of millions of 
observations and hundreds of variables, and for users with time for a bit of 
care and preparation. {p_end}

{title:Options}

{pstd}{opt areg} Use {cmd:areg} instead of {cmd:regress} for the second-stage 
regression, absorbing {it:panelid} with means calculated on the fly. 
{opt fe(panelid)} must also be specified. This option is incompatible with 
{opt demean} and {opt replace}, and makes them unnecessary. See the notes 
below. Standard errors are corrected to match {cmd:xtivreg}. {p_end}

{pstd}{opt cluster(clusterid)} Cluster standard errors by {it:clusterid}, 
which may be different from {it:panelid}. 
{p_end}

{pstd}{opt demean} Demean the variables by {opt fe(panelid)} before running 
the regression. This option is incompatible with {opt areg}, which makes it 
unnecessary. If {opt replace} is specified, the demeaning is done in place 
and the original data are overwritten. This reduces the memory load. If you 
have multiple regressions with overlapping variables, it is efficient to 
include all your variables in an initial regression with {opt demean} and 
{opt replace}, and then run subsequent regressions with only 
{opt fe(panelid)}. The first regression will drop rows with missing data, so 
subsequent regressions will use the same subsample. Note that if you add an 
un-demeaned variable in one of the subsequent regressions, there will be no 
error message but the result will be wrong. {p_end}
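
{pstd}As a rough illustration only, and not necessarily how {cmd:tsls} is 
implemented, demeaning a single variable in place by panel unit can be 
written with built-in commands. The temporary variable name {it:y1_mean} is 
hypothetical. {p_end}

{phang2}{cmd:. bysort panelid: egen double y1_mean = mean(y1)}{p_end}
{phang2}{cmd:. replace y1 = y1 - y1_mean}{p_end}
{phang2}{cmd:. drop y1_mean}{p_end}

{pstd}After these commands {it:y1} holds deviations from its panel-unit 
means, which is the form that later regressions specifying only 
{opt fe(panelid)} expect for every variable they list. {p_end}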

{pstd}{opt fe(panelid)} Required. Specify the variable identifying panel units. 
If neither {opt demean} nor {opt areg} is specified, the data are taken to be 
already demeaned and {it:panelid} only affects the degrees of freedom. {p_end}

{pstd}{opt replace} Used with {opt demean} to replace the variables listed in 
the regression with their deviations from panel-unit means. {p_end}

{title:Examples}

{pstd}Fixed effects under a storage constraint, with clustered errors. This 
does not alter the data. {p_end}

{phang2}{cmd:. tsls y1 y2, areg fe(panelid) cluster(clusterid)}{p_end}

{pstd}Fixed effects and an instrumental variable; the original data are overwritten. {p_end}
{phang2}{cmd:. preserve}{p_end}
{phang2}{cmd:. tsls y1 (y2 = z1), demean fe(panelid) replace}{p_end}

{pstd}Add clustered standard errors, reusing the previously demeaned data. {p_end}
{phang2}{cmd:. tsls y1 (y2 = z1), fe(panelid) cluster(clusterid)}{p_end}

{pstd}Drop the IV procedure, still using the demeaned data. {p_end}
{phang2}{cmd:. tsls y1 y2, fe(panelid)}{p_end}

{pstd}Check the IV result against {cmd:xtivreg2}. {p_end}
{phang2}{cmd:. restore}{p_end}
{phang2}{cmd:. xtset panelid}{p_end}
{phang2}{cmd:. xtivreg2 y1 (y2 = z1), fe cluster(clusterid)}{p_end}

{title:Notes}
{pstd}If any regression that expects demeaned data refers to variables that 
have not been demeaned, the result will be incorrect. Hence the order of 
commands in the examples. {p_end}

{pstd}Variables listed in {it:varlist2} and {it:varlist_iv} must not 
overlap with any variables listed among {it:indepvars}. {p_end}

{pstd}{cmd:tsls} will always use less memory than {cmd:xtivreg} because 
{cmd:xtivreg} stores the demeaned variables as additional doubles. 
{cmd:tsls} stores the demeaned values as floats and writes over the 
in-memory original data. If the original data are integer or byte, the 
option {opt areg} demeans each row on the fly and uses no extra memory at 
all for demeaned data. If {opt replace} is specified, there is additional 
time for input/output. {p_end}

{pstd}Standard errors are corrected for degrees of freedom, IV and 
clustering, but you should compare against {cmd:xtivreg2} on a subset of your 
data to confirm this is done correctly. Coefficients and standard errors have 
matched to the full printed precision in our tests, but it is possible we 
haven't considered every possible situation. {p_end}
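
{pstd}One way to run such a check is sketched here, assuming that 
{it:panelid} is numeric and that {cmd:xtivreg2} is installed: keep a fraction 
of the panel units and run both commands on them. Because {cmd:tsls} does not 
accept {cmd:if} or {cmd:in}, the subset is selected beforehand. The divisor 
100 is arbitrary. {p_end}

{phang2}{cmd:. preserve}{p_end}
{phang2}{cmd:. keep if mod(panelid, 100) == 0}{p_end}
{phang2}{cmd:. xtset panelid}{p_end}
{phang2}{cmd:. xtivreg2 y1 (y2 = z1), fe cluster(clusterid)}{p_end}
{phang2}{cmd:. tsls y1 (y2 = z1), demean fe(panelid) replace cluster(clusterid)}{p_end}
{phang2}{cmd:. restore}{p_end}

{pstd}Selecting whole panel units, rather than random rows, keeps each 
unit's observations together so the fixed effects are estimated on complete 
panels. {cmd:xtivreg2} runs first because {opt replace} overwrites the 
variables in memory. {p_end}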

{pstd}Standard errors in the second-stage regression are obtained from a 
regression of the predicted errors on the RHS variables, using the true 
values of the endogenous variables. We would like to thank Doug Staiger for 
this suggestion, and Jeffrey Wooldridge for noting that, because the 2SLS 
residuals are always uncorrelated in sample with the first-stage fitted 
values, regressing them on the actual data leads to Equation 5.34 of his 
textbook. We remain responsible for all errors. {p_end}
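
{pstd}The 2SLS residuals are built from the actual, not the fitted, values of 
the endogenous variable. The following hand-computed sketch shows this for 
one endogenous variable {it:y2}, one instrument {it:z1} and one exogenous 
regressor. It ignores fixed effects and clustering, and it is an illustration 
of the idea rather than the code {cmd:tsls} runs. The names {it:x1}, 
{it:y2_hat} and {it:u_hat} are hypothetical. {p_end}

{phang2}{cmd:. regress y2 z1 x1}{p_end}
{phang2}{cmd:. predict double y2_hat, xb}{p_end}
{phang2}{cmd:. regress y1 y2_hat x1}{p_end}
{phang2}{cmd:. generate double u_hat = y1 - _b[y2_hat]*y2 - _b[x1]*x1 - _b[_cons]}{p_end}

{pstd}The last line applies the second-stage coefficients to {it:y2} itself 
instead of {it:y2_hat}. The residuals that {cmd:regress} reports after the 
second stage would use {it:y2_hat} and would give incorrect standard 
errors. {p_end}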

{pstd}This is beta-level software. Please report problems. 

{title:Limitations}
{pstd}
Weights, {cmd:if}, {cmd:in} and factor variables are not implemented. 
The procedure is not byable. 
{p_end}

{title:More Information}

{pstd}There is more information at
{browse "http://www.nber.org/stata/efficient":http://www.nber.org/stata/tsls}. 
tsls.ado is by Jacob Robbins; 
tsls.sthlp is by Daniel Feenberg (feenberg@nber.org). {p_end}

{title:Reference}

{pstd}Wooldridge, Jeffrey M.,
{it:Econometric Analysis of Cross Section and Panel Data}. {p_end}