=Stata Tips=
Index of Stata Tips
* Reading the General Social Survey (GSS) Data from the Web
* Tip #1--Distribution Functions
* Tip #2--Downloading Files
* Tip #3--Changing the Default Directories in Stata
* Tip #4--The foreach command
* Tip #5--preserve/restore and collapse commands
* Tip #6--Temporary variables
* Tip #7--Using tabulate to create dummy variables
* Tip #8--Producing summary statistics using collapse
* Tip #9--Setting up Stata to Run from Lab Computers
* Tip #10--Testing extra sums of squares
* Tip #11--Remapping the Function keys in Stata

Reading the General Social Survey (GSS) Data from the Web

capture log close
log using "Read_GSS_Data.txt", text replace
#delimit ;
/* Create a subset of the GSS for the Abortion Attitude Study */
use year cohort socbar socrel socommun socfrend race age sex educ attend
    marital childs premarsx reliten childs polviews sexfreq if race!=3
    using http://terpconnect.umd.edu/~smilex3/GSS-Cumulative-72-08.dta, clear;
#delimit cr
log close

You can also read the data from http://www.bsos.umd.edu/socy/alan/stata/. When I tried it, reading from that site took about 7 minutes 36 seconds; reading from Terpconnect took 3 minutes 58 seconds.

Tip #1--Distribution Functions
Stata has many built-in distribution and density functions. Essentially, you can calculate any number found in the tables at the back of most statistics books. These functions can be used interactively from the command line or built into programs you write. To show the results on your computer screen you need to use the display command. For example, to determine the area under a normal distribution from the left side of the distribution to a z-score equal to +1, enter the following command at the Stata command line:

. display normal(1)

This produces the output value .84134475. The following example displays the area under a normal distribution that lies beyond the value of 1.96 (i.e., 0.025):

. display 1-normal(1.96)

For more help enter " help density_functions " and " help display " at the command prompt. Also see my handout, which you can download using Tip #2; it has lots of examples using probability density functions in Stata. See density functions.
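The other functions listed under help density_functions are used the same way. The lines below are a small sketch of a few of them; the particular values (a 0.975 quantile, 20 degrees of freedom, and so on) are only illustrations chosen for this tip and are not taken from the handout.

```stata
* Inverse normal: the z-score with 97.5% of the area to its left (about 1.96)
display invnormal(0.975)

* Upper-tail area of a t distribution: ttail(df, t)
display ttail(20, 2.086)

* Upper-tail area of a chi-squared distribution: chi2tail(df, x)
display chi2tail(1, 3.84)

* Upper-tail area of an F distribution: Ftail(df1, df2, F)
display Ftail(3, 63, 1.92)
```

The last three functions return upper-tail areas directly, so there is no need to subtract from 1 as in the normal() example.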

Tip #2--Downloading Files
Stata has great features for downloading user-written programs, program updates, and data files from Web sites. I will be posting a number of program files and datasets using these features. For more information enter "help net" at the Stata command prompt. From the Stata command window enter:

. net from http://terpconnect.umd.edu/~smilex3/stata/

This tells Stata to link to this Web site and display its contents. You can click on the colored words to open a window that describes the files and optionally allows you to download them. You can also use the command line. For example, you could enter " net describe gss-06 " to learn about this file or " net get gss-06 " to download it. If you don't want to store the data on your computer you can simply use it "over" the Internet. The following commands will read the GSS data into Stata:

. set memory 60m
. use http://www.bsos.umd.edu/socy/alan/stata/gss-98-06

See net.

Tip #3--Changing the Default Directories in Stata
The sysdir command queries and sets the system directories. For example, typing sysdir produces the following:

. sysdir
   STATA:  C:\Program Files\Stata10\
 UPDATES:  C:\Program Files\Stata10\ado\updates\
    BASE:  C:\Program Files\Stata10\ado\base\
    SITE:  C:\Program Files\Stata10\ado\site\
    PLUS:  c:\ado\plus\
PERSONAL:  c:\ado\personal\
OLDPLACE:  c:\ado\

You can ignore most of this. However, the line labeled PLUS can be important. This line points to the directory that Stata uses to install user-written software. In this case the hard drive location is c:\ado\plus\. This location may not be the same on different computers, and it can be changed. On my home computer I enter the following to change the location:

. sysdir set PLUS "C:\Documents and Settings\HP_Owner\My Documents\My Research\stata\ado"
. sysdir
   STATA:  C:\Program Files\Stata10\
 UPDATES:  C:\Program Files\Stata10\ado\updates\
    BASE:  C:\Program Files\Stata10\ado\base\
    SITE:  C:\Program Files\Stata10\ado\site\
    PLUS:  C:\Documents and Settings\HP_Owner\My Documents\My Research\stata\ado\
PERSONAL:  c:\ado\personal\
OLDPLACE:  c:\ado\

Now, whenever I download a convenient piece of user-written software (e.g., vreverse, esttab, estout) it is installed in this directory. You can use this feature to change the default directory to a flash drive or to a networked drive in a computer lab and have access to your programs on any computer you use. See sysdir.
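If you want to confirm where user-written software will be installed without reading the whole sysdir listing, the following sketch may help. It relies on the c(sysdir_plus) system value and the adopath command; the output will of course differ from computer to computer.

```stata
* Show only the current PLUS directory
display c(sysdir_plus)

* Show the order in which Stata searches the system directories for programs
adopath
```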

Tip #4--The foreach command
The foreach command allows you to loop over items. In this tip I show how to loop over variables. This is useful when you have some set of Stata commands that you want to execute repeatedly across a large number of variables. My examples are unrealistic in that I don't have that many variables but you should understand the efficiency as the number of repetitions (i.e. variables) increases. Consider the following lines of code:

sysuse auto, clear
/* Create new variables based on old variable divided by ten */
gen price1=price/10
gen weight1=weight/10
gen length1=length/10
gen displacement1=displacement/10

Here we repeat four generate commands that are identical except for the variables they use. Here is the same code using the foreach command:

sysuse auto, clear
/* Create new variables based on old variable divided by ten */
foreach y of varlist price weight length displacement {
    gen `y'1=`y'/10
}

The foreach command creates a local macro that contains the name of an existing variable. Here the local macro is called y and must be referenced as `y'. This macro contains, sequentially, the variable names price, weight, length, and displacement. The generate command within the foreach loop creates four new variables called price1, weight1, length1, and displacement1. Each time the loop is executed a new variable name is used in the generate statement.

Here is an example of replacing two tabulate commands with a foreach loop:

sysuse auto, clear
tabulate rep78
tabulate foreign

foreach x of varlist rep78 foreign {
    tabulate `x'
}

Finally, here is a hypothetical example that replaces values of 99 in a large number of variables with the Stata value for missing values, the period:

/* Replace all values of 99 with missing for vars. sex through age */
foreach x of varlist sex-age {
    replace `x'=. if `x'==99
}

Notice that the local macro is called //x// in this example and must be referenced as //`x'//.
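The foreach command can also loop over lists that are not variable names. The sketch below shows two other forms, of numlist and of local, using the auto data; the new variable names (price_10 and so on) and the macro name sizevars are made up for this illustration.

```stata
sysuse auto, clear

* Loop over a list of numbers: create price divided by 10, 100, and 1000
foreach d of numlist 10 100 1000 {
    gen price_`d' = price/`d'
}

* Loop over the words stored in a local macro (hypothetical list of variables)
local sizevars weight length
foreach v of local sizevars {
    summarize `v'
}
```

In both loops the loop macro is referenced in the same way as before, as `d' and `v'.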

See foreach.

Tip #5--preserve/restore and collapse commands
The preserve command preserves or saves the data, guaranteeing that the data will be restored after your program ends. The restore command forces a restore of the data immediately. This pair is useful in combination with other Stata commands that replace (i.e. destroy) the dataset in memory. One useful command that replaces the dataset in memory with a new dataset is the collapse command. The collapse command converts the dataset in memory into a dataset of means, sums, medians, etc. The example below uses the system-supplied auto dataset to create a dataset containing the average MPG for each group defined by rep78, the repair record. These data are then graphed as a scatterplot.

sysuse auto, clear
preserve
collapse (mean) mpg (count) n=mpg, by(rep78)
list
graph twoway (scatter mpg rep78 [fweight=n], connect(l)), scheme(s2color)
restore
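To convince yourself that restore really brings the original data back, you can check the number of observations before and after. This is only a small sketch using the auto data, which has 74 cars and two values of foreign.

```stata
sysuse auto, clear
preserve
collapse (mean) mpg, by(foreign)
display _N        // 2: one row for each value of foreign
restore
display _N        // 74: the original auto data are back in memory
```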

See preserve or restore and collapse for more help.

Tip #6--Temporary variables
You can define temporary variables that only exist while your Stata program is running. The command tempvar assigns names to the specified local macro names that may be used as temporary variable names in a dataset. When the program or do-file concludes, any variables with these assigned names are dropped. For example, the following code creates a new temporary variable named mpg_z which is the z-transformed variable mpg:

sysuse auto, clear
tempvar mpg_z
egen `mpg_z'=std(mpg)
sum mpg `mpg_z'

Because tempvar stores the temporary variable name in a local macro, you must enclose the macro name in the `' quote marks whenever you use it.
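The following sketch makes the point about the local macro visible. Displaying the macro shows the name Stata actually assigned (a system-generated name, typically beginning with two underscores), and summarize confirms that the standardized variable behaves as expected.

```stata
sysuse auto, clear
tempvar mpg_z
egen `mpg_z' = std(mpg)

* The macro holds a system-generated name, not the literal text mpg_z
display "`mpg_z'"

* A standardized variable should have mean 0 and standard deviation 1
summarize `mpg_z'
```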

See tempvar and quotes.

Tip #7--Using tabulate to create dummy variables
One easy way to create a set of dummy or indicator variables is to use the generate option to the tabulate command. The syntax for this command is:

tabulate varname [if] [in] [weight], generate(stubname)

For example, in the GSS the variable marital has the following categories: 1=married, 2=widowed, 3=divorced, 4=separated, and 5=never married. To create five new dummy variables called mardum1 through mardum5 use the following command:

tabulate marital, generate(mardum)
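If you do not have the GSS loaded, you can try the same idea with the auto data. In this sketch the stub name repdum is made up for the illustration; rep78 takes the values 1 through 5, so five dummy variables are created.

```stata
sysuse auto, clear

* Creates repdum1 through repdum5, one for each value of rep78
tabulate rep78, generate(repdum)

* Check one dummy against the original variable
tabulate rep78 repdum1, missing
```

Note that the dummy variables are set to missing for cases where the original variable is missing.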

See tabulate.

Tip #8--Producing summary statistics using collapse
The command collapse converts the dataset in memory into a dataset of means, sums, medians, etc. This command destroys the dataset in memory unless you use the preserve and restore commands discussed in tip #5. Consider the following command applied to the complete GSS dataset:

collapse (mean) tvhours (count) n=tvhours, by(year)

The GSS dataset contains one record for each respondent for each year. That is about 51K cases from 1972 through 2006. After running the collapse command a new dataset is created that contains twenty-six cases, one for each year, and three variables (year, tvhours, and n). The variable tvhours contains the average number of hours spent watching television for each year. The variable n contains the number of nonmissing cases for tvhours. This is needed if we want to calculate a (weighted) average of tvhours. Here is what a portion of the new dataset looks like:

year   tvhours      n
 . . .
1977   2.92721   1525
1978   2.79254   1528
1980   2.92779   1454
1982   3.19094   1854
1983   2.95486   1595
 . . .
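To see why the count variable n is worth keeping, here is a sketch using the auto data in place of the GSS (mpg plays the role of tvhours and rep78 the role of year). Weighting the group means by n recovers the overall mean of the original data. Cases with a missing grouping variable are dropped first so that both calculations use the same cars.

```stata
sysuse auto, clear
drop if missing(rep78)

* Overall mean of mpg in the original (uncollapsed) data
quietly summarize mpg
scalar overall = r(mean)

* Collapse to one row per repair-record group, keeping the group sizes
collapse (mean) mpg (count) n=mpg, by(rep78)

* Weighted mean of the group means, using n as a frequency weight
quietly summarize mpg [fweight=n]
display "weighted mean of group means = " r(mean)
display "overall mean in original data = " overall
```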

The next example shows how collapse can be used to calculate averages by year, and within year by another variable, in this case sex:

collapse tvhours, by(year sex)

The new dataset contains fifty-two cases and three variables (year, sex, and tvhours) and looks like the following excerpt:

year   sex      tvhours
 . . .
1977   MALE     2.66232
1977   FEMALE   3.14611
1978   MALE     2.479
1978   FEMALE   3.02034
1980   MALE     2.57771
1980   FEMALE   3.20073
 . . .

Many statistics can be requested, including: means (default), medians, 1st percentile, 2nd percentile, 3rd-49th percentiles, 50th percentile (same as median), 51st-97th percentiles, 98th percentile, 99th percentile, standard deviations, sums, sums ignoring an optionally specified weight, number of nonmissing observations, maximums, minimums, interquartile range, first value, last value, first nonmissing value, and last nonmissing value. The last two examples show how to use the collapse command to produce interesting graphics. Note the use of several advanced Stata features in these examples (e.g. collapse, local macros, legends, graph formatting):

preserve
collapse (mean) tvhours (count) n=tvhours, by(year)
list
quietly sum tvhours [fweight=n]
#delimit ;
graph twoway (scatter tvhours year, ms(i) connect(l)),
    title("Average Television Viewing Trend: 1975-2006")
    xtitle("") ytitle("Hours") ylabel(, format(%4.1f))
    xtick(1970(5)2010) xmtick(1970(1)2010) ytick(2.8(.05)3.2)
    yline(`r(mean)', lw(medthick)) name(tv1, replace);
#delimit cr
restore

preserve
collapse tvhours, by(year sex)
list
#delimit ;
graph twoway (scatter tvhours year if sex==1, ms(i) connect(l))
    (scatter tvhours year if sex==2, ms(i) connect(l)),
    title("Average Television Viewing Trend by Gender: 1975-2006")
    xtitle("") ytitle("Hours") ylabel(, format(%4.1f))
    xtick(1970(5)2010) xmtick(1970(1)2010) ytick(2.4(.1)3.4)
    legend(on cols(1) ring(0) position(4) label(1 "Men") label(2 "Women"))
    name(tv2, replace);
#delimit cr
restore

[[image:http://image.wetpaint.com/image/1/eP9w3ZelllrhfzPTgGyxkA27060/GW390H284 width="390" height="284" caption="Stata - SOCY602"]]
[[image:http://image.wetpaint.com/image/1/n1lWL-z24d3IXWOviFkg1w29889/GW390H284 width="390" height="284" caption="Stata - SOCY602"]]

See collapse.

Tip #9--Setting up Stata to Run from Lab Computers
You can create a Stata do file that contains Stata commands that you want to use each time you run Stata. For example, you might always want a specific directory to be your default directory; you might always want more memory than the default; or you might want to reprogram the function keys.

Let's say that you are using a flash drive that is always assigned the drive letter e: when you connect it to the USB port. Every time you use Stata you want to 1) make the default directory "e:\socy602", 2) increase the memory that Stata can use for data, 3) change the PLUS directory where you store user-written Stata programs to e:\socy602\ado, and 4) reprogram the F7 function key to - net from - the class Stata user site. One way to do this is to put all of the relevant commands into a Stata do file stored on your flash drive and run the file every time you use Stata. Store the following Stata code in a file called setup.do and save this file to your flash drive:

/* Save these commands in a file called setup.do */
cd e:\socy602
capture log close
set more off
set mem 50m
sysdir set PLUS "e:\socy602\ado"
global F7 net from http://terpconnect.umd.edu/~smilex3/stata;

Then you would run this file from the command window to execute all of these commands like this:

. do e:\setup.do

Note that the drive letter is "hard-coded" to always be "e:". If your flash drive is always assigned to that letter this works well. If, however, the drive assignment changes from computer to computer the following Stata program allows you to pass a parameter--send the program the correct drive letter--when you run it. Here is the program:

cd `1'\SOCY602
capture log close
set more off
set mem 50m
sysdir set PLUS "`1'\SOCY602\ado"
global F7 net from http://terpconnect.umd.edu/~smilex3/stata;

If your flash drive is assigned the letter k: you would run your program like this:

. do k:\setup.do k:\

Or, if your flash drive was assigned to drive z: you would run the program like this:

. do z:\setup.do z:\

The part " do z:\setup.do " points to the root directory of your flash drive and, if the file setup.do is found, it is executed. The part " z:\ " is a parameter that is passed to setup.do and substituted into the program wherever `1' appears. Obviously you would substitute the appropriate drive letter and include whatever Stata commands you want in setup.do (e.g. load a data file, etc.).
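If the positional macro `1' seems cryptic, the args command lets you give the parameter a name. The sketch below wraps the idea in a small program so that you can try it without a flash drive; the program name plusdir and the macro name drive are hypothetical, and the program only builds the directory name rather than changing any settings.

```stata
capture program drop plusdir
program define plusdir, rclass
    args drive                          // first parameter, the same value as `1'
    return local path "`drive'\SOCY602\ado"
end

* Pass the drive letter as the parameter, here without a trailing backslash
plusdir k:
display "`r(path)'"
```

Inside setup.do itself you could place " args drive " on the first line and then write `drive' wherever the original program uses `1'.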

See macro, keyboard, sysdir, set memory, Stata Journal (reprogram F-Keys).

Tip #10--Testing Extra Sums of Squares
Stata provides an easy way to calculate the //F//-tests associated with extra sums of squares using the nestreg command prefix. Consider the following Stata code using the auto.dta dataset:

sysuse auto, clear
#delimit ;
nestreg: regress mpg (weight displacement)        /* Block 1 */
                     (rep78 foreign price)        /* Block 2 */
                     (headroom length gear_ratio) /* Block 3 */;
#delimit cr

To make the example a little clearer I used the #delimit command and "inline" comments. The nestreg prefix fits nested models by sequentially adding blocks of variables and then reports comparison tests between the nested models. In this example Stata will produce partial regression coefficient estimates for mpg regressed on 1) weight and displacement (block 1), 2) weight, displacement, rep78, foreign, and price (blocks 1 & 2), and 3) weight, displacement, rep78, foreign, price, headroom, length, and gear_ratio (blocks 1, 2, & 3). In addition to these models Stata will produce the //F//-tests associated with the extra sums of squares (or change in //r2//) for the addition of blocks of variables.

That output, shown below, indicates that the //r2// in the reduced or baseline model (block 1 only; //r2//=0.6492) is statistically significantly different from zero. Adding the variables contained in block 2 to a model that already contains the variables from block 1 increases the //r2// to 0.6787 (an increase of 0.0294) but the difference is not statistically significant (//F//=1.92; //p//=0.1350). Finally, adding the variables contained in block 3 to a model that contains all of the block 1 and block 2 variables increases the //r2// by 0.0334 (from 0.6787 to 0.7121) and this increase is not statistically significant (//F//=2.32; //p//=0.0842).
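You can reproduce these //F//-tests by hand with the display command and the standard formula F = [(change in r2)/q] / [(1 - r2 of the larger model)/(n - k - 1)], where q is the number of variables added and k is the number of predictors in the larger model. The sketch below assumes the estimation sample is the 69 cars with nonmissing rep78, and it uses the rounded //r2// values quoted above, so the results agree with the nestreg output only to about two decimal places.

```stata
* Block 2 added to block 1: r2 rises by 0.0294 to 0.6787
* q = 3 added variables; residual df = 69 - 5 - 1 = 63 (assumed sample of 69)
scalar F2 = (0.0294/3) / ((1 - 0.6787)/63)
display "F = " %5.2f F2 "   p = " %6.4f Ftail(3, 63, F2)

* Block 3 added to blocks 1 and 2: r2 rises by 0.0334 to 0.7121
* q = 3 added variables; residual df = 69 - 8 - 1 = 60
scalar F3 = (0.0334/3) / ((1 - 0.7121)/60)
display "F = " %5.2f F3 "   p = " %6.4f Ftail(3, 60, F3)
```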



If you are only interested in this table and want to suppress all of the regression output you can use the quietly option for nestreg shown in the following box:

#delimit ;
nestreg, quietly: regress mpg (weight displacement)        /* Block 1 */
                              (rep78 foreign price)        /* Block 2 */
                              (headroom length gear_ratio) /* Block 3 */;
#delimit cr

Finally, we can put together a number of Stata techniques to produce regression estimates that are easy to compare across models. The following code 1) uses local macros to define the blocks of variables, 2) uses eststo and esttab to present side-by-side model comparisons (this assumes that the user-written eststo package is installed), and 3) uses nestreg to produce the extra sums of squares //F//-tests. Note that the order of the regressions is important: the full model must be estimated first so that e(sample) can be used to estimate all of the regressions on the same cases.

local block1 weight displacement
local block2 rep78 foreign price
local block3 headroom length gear_ratio

quietly {
    eststo block3: regress mpg `block1' `block2' `block3'
    eststo block1: regress mpg `block1' if e(sample)
    eststo block2: regress mpg `block1' `block2' if e(sample)
    noisily esttab block1 block2 block3, r2
}

#delimit ;
nestreg, quietly: regress mpg (`block1') (`block2') (`block3');
#delimit cr

Stata is full of little building blocks that can be put together in interesting and useful ways like the last example!

See nestreg, e.

Tip #11--Remapping the Function Keys in Stata
Stata allows you to use global macros to remap the keyboard keys to any value you want, for example, a Stata command you use frequently. I look at the SOCY498C Stata materials frequently and so have created a shortcut key to get there. The following command reprograms the F7 key to access the Stata course materials:

global F7 net from http://terpconnect.umd.edu/~smilex3/stata;

This command can be entered every time you start Stata or put into a profile.do file and executed when you start Stata. See Tip #9 for additional information about setting up personal Stata defaults.
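The same approach works for other keys and commands. In the sketch below the key assignments are hypothetical examples; the thing to notice is the trailing semicolon, which acts like pressing Enter, so the command runs as soon as the key is pressed. Without the semicolon the text is only typed into the command window and you can finish the command yourself.

```stata
* Runs describe immediately when F8 is pressed
global F8 "describe;"

* Only types "summarize " so that you can add variable names before pressing Enter
global F9 "summarize "

* Show what is currently assigned to these keys
macro list F8 F9
```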

See macro, keyboard, Stata Journal (reprogram F-Keys).