The Do-file
[IMPORTANT: Try not to copy and paste the programs; write them yourself. There are two reasons: it is a better learning experience, and pasted programs may not work properly.]
Do-files are ASCII files containing a series of Stata commands (which could include programs). They are plain-text files with the extension *.do. You can write do-files with Notepad, WordPad or any other text editor (if you use a word processor, save the file as plain text). You can also use the built-in do-file editor in Stata.

To open it, type in the command window:
doedit
The Stata do-file editor window will appear[1].

As tradition dictates, all programming courses start with the “Hello, world” example, and this basic course will not be an exception.
In the Stata do-file editor, write the following
[IMPORTANT: Hit Enter at the end of each line]
DO-FILE:
hello.do
display "Hello, world"
exit
NOTE: YOU MAY
HAVE TO RETYPE THE SINGLE AND DOUBLE QUOTES WHEN COPYING THE DO-FILES CODES
INTO STATA
To save the file, go to File-Save and save it as hello.do (you can use whatever name you want; just make sure to save it with the extension *.do).

[IMPORTANT 1: Remember to change to your working directory, for example cd "h:\statadata\", and save the do-file there.]
[IMPORTANT 2: If you copy and paste the code, be careful with quotation marks. You may have to retype them in Stata; otherwise the program or do-file may not run.]
Once the file is saved, go back to the command window and type
do hello[2]
You can also run the do-file from the do-file editor by clicking on the right arrow.

Either way, Stata echoes each command in the Results window followed by its output, so you will see Hello, world.

Stata finds the file hello.do when you type only “hello”, as long as you are in the working directory. Otherwise you will have to specify the path of the do-file.
The display command does more than print text; it is also a hand calculator.
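Here is a minimal sketch of the calculator use (the numbers are arbitrary examples). display evaluates whatever expression follows it, and you can also keep the result of an expression in a local macro:

```stata
* display prints text and also evaluates numeric expressions
display 2 + 2
display (100 - 32) * 5 / 9
display sqrt(144)
display "Ten squared is " 10^2

* a result can be stored in a local macro and reused later
local answer = 7 * 6
display "The answer is `answer'"
```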

For further details, type help display
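You can also write programs directly in Stata. In the command window, type the following. Stata shows the line prompts 1. and 2. while you are defining the program; you do not type them.
program hello
1. display "Hello, world"
2. end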
To execute it just type
. hello
Hello, world
The main problem with programs defined this way is that they last only as long as the current Stata session. To make them permanent, you can write the program in a do-file.
If you type doedit again in the command window, a new do-file editor window will appear. Type the following in the do-file editor:
DO-FILE:
hello1.do
program hello
display "Hello, world"
end
exit
And save it as hello1.do
Back in Stata, when you type do hello1 you may get the following error message:
do hello1
program hello
hello already defined
r(110);
end of do-file
r(110);
This happens because a hello program already exists (you defined it in the command window). You need to drop the previous hello program first, so type
program drop hello
Type do hello1 again. This is what you will see in the Results window:
do hello1 /*You type this*/
program hello /*This is what you'll see*/
1. display "Hello, world"
2. end
exit
end of do-file
To see if it works type hello
and you should get:
. hello /*You type this*/
Hello, world
You can put it all together in one do-file:
DO-FILE:
hello1.do [modified]
capture program drop hello
program hello
display "Hello, world"
end
hello
exit
Notice the new command capture. If there is no program called hello, program drop hello returns an error and the do-file stops. capture ignores that error and lets the rest of the do-file run. Use this feature only when errors are an annoyance, as when testing programs. For more details, type help capture.
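Here is a small sketch of what capture does with an error. The name nosuchprogram is hypothetical and is never defined. Stata stores the return code of the captured command in _rc: it is 0 when the command succeeded and nonzero when it failed.

```stata
* dropping a program that may not exist: no error is shown either way
capture program drop hello
program hello
    display "Hello, world"
end
hello

* capture hides the error, but its return code is kept in _rc
capture program drop nosuchprogram
display "Return code: " _rc
```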
Now let's use some of the examples from the previous sections to write a do-file. It will open a log, read ASCII data with a dictionary, describe the data and save it in Stata format.
Go to the do-file editor and
type
DO-FILE:
read1.do
capture log close /* In case no log is open */
log using log1.log, append /* Start a log, you can append or replace an existing log */
clear /* Reset memory, remove active data */
infile using lat544 /* Read ASCII data using dictionary lat544 */
describe /* Get general info on your data */
save lat544, replace /* Save the data[3] */
exit /* Always end a do-file with "exit" */
Save the do-file as read1.do
In the Stata command window
type
do read1
To see the output check the
log file log1.log using notepad, wordpad, word or in Stata go to File-Log-View
Let's try with data from Excel (saved in *.csv format):
DO-FILE:
read2.do
capture log close /* In case no log is open */
log using log2.log, append /* Start a log, you can append or replace an existing log */
clear /* Reset memory, remove active data */
insheet using "H:\statadata\Testdata03.csv" /* Read *.csv data */
describe /* Get general info on your data */
save Testdata03, replace /* Save the data[4] */
exit /* Always end a do-file with "exit" */
Save the do-file as read2.do
In the Stata command window
type
do read2
When the output does not fit on the screen, Stata pauses (showing --more--) and waits for you to press Enter (or any key) to continue. This is useful but sometimes annoying. Let's modify read2.do as follows:
DO-FILE:
read2.do [modified]
set more off /* New */
capture log close /* In case no log is open */
log using log2.log, append /* Start a log, you can append or replace an existing log[5] */
clear /* Reset memory, remove active data */
insheet using "H:\statadata\Testdata03.csv" /* Read *.csv data */
describe /* Get general info on your data */
save Testdata03, replace /* Save the data[6] */
set more on /* New */
exit /* Always end a do-file with "exit" */
In the Stata command window
type
do read2
To see the output check the
log file log2.log using notepad, wordpad, word or in Stata go to
File-Log-View
Let's say you are only interested in a few variables:
Variable | Content
var3 | Population
var4 | Imports
var5 | Exports
We will modify the do-file as follows:
DO-FILE:
read2.do [version 2.0]
set more off
capture log close /* In case no log is open */
log using log2.log, append /* Start a log, you can append or replace an existing log[7] */
clear /* Reset memory, remove active data */
insheet using "H:\statadata\Testdata03.csv" /* Read *.csv data */
keep var3 var4 var5 /* Keep the vars you need[8] */
describe /* Get general info on your data */
save Testdata03, replace /* Save the data[9] */
set more on
exit /* Always end a do-file with "exit" */
Save the do-file and in the
Stata command window type
do read2
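If you do not have Testdata03.csv at hand, you can try the same keep step on auto.dta, the example dataset that ships with Stata. It is used here only as a stand-in; the variable names make, price and mpg belong to that dataset.

```stata
* auto.dta stands in for Testdata03.csv in this sketch
sysuse auto, clear
keep make price mpg      // keep only the variables you need
describe                 // only the three kept variables remain
```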
Let's say you have several datasets in *.csv or *.dat format. You want to read each one, describe and summarize it, save it in Stata format, and keep a separate log for each.
Suppose you have three files in *.csv format: file1.csv, file2.csv and file3.csv.
DO-FILE:
read3.do [generic]
clear
set more off
capture log close
log using "H:\statadata\\`1'.log", replace
insheet using "H:\statadata\\`1'.csv"
describe
summarize
codebook
save H:\statadata\\`1', replace
set more on
log close
exit
Or, if you want to add variable labels, use this version:
DO-FILE:
read3.do [with variable labels]
clear
set more off
capture log close
log using "H:\statadata\\`1'.log", replace
insheet using "H:\statadata\\`1'.csv"
capture label variable var1 "Area in Square Km"
capture label variable var2 "Area in Square Mi"
capture label variable var3 "Population"
capture label variable var4 "Imports"
capture label variable var5 "Exports"
capture label variable var6 "Type of regime"
capture label variable var7 "Books read per month"
capture label variable var8 "Newspaper readership per wk"
capture label variable var9 "TV per capita"
capture label variable var10 "GDP per capita"
capture label variable var11 "School Enrollment"
capture label variable var12 "Personal income"
describe
summarize
codebook
save H:\statadata\\`1', replace
set more on
log close
exit
Putting capture before label variable makes the do-file applicable to any of the files. Stata will not yell at you because it can't find a variable[10]. Save this do-file as read3.do
Let’s convert file1.csv by
typing:
do read3 file1
Let’s convert file2.csv by
typing:
do read3 file2
Let’s convert file3.csv by
typing:
do read3 file3
IMPORTANT: Remember to specify the path or to change
the working directory when running do-files or any command that refers to a
particular directory (like use), otherwise you will get the error message file not found.
You can see the output by opening the log files file1.log, file2.log and file3.log (with Notepad, WordPad or Word) in the directory "H:\statadata\".
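With do-file read3.do we read each *.csv file and then described, summarized and codebooked it. We saved it as a Stata dataset and kept a separate log for each file.
As an exercise, do the same thing using infile from a previous section.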
What makes this work is the macro `1'. Stata replaces it with the first argument typed after the do-file name (file1, file2 or file3 above). To understand it better, let's do a math exercise by creating a do-file that converts Fahrenheit into Celsius and vice versa. Go to the do-file editor in Stata and enter the following commands.
DO-FILE:
convert.do
clear
capture program drop convert
program convert
    display
    display " I will do the following: `1' is in Fahrenheit and will be converted into Celsius"
    display " and `2' is in Celsius and will be converted into Fahrenheit"
    display
    display " " `1' " degrees Fahrenheit = " ((`1'-32)/9)*5 " degrees Celsius"
    display
    display " " `2' " degrees Celsius = " ((`2'*9)/5)+32 " degrees Fahrenheit"
    display
    display " `1' minus `2' is " `1'-`2' " (this is just for fun)"
end
exit
The formulas are the standard formulas for temperature conversion. Pay attention to the two apostrophes around 1 and 2. The left one is on a standard keyboard right below the 'Esc' key, and the right one is next to the 'Enter' key. This is how Stata distinguishes the macros (for individual arguments) from regular numbers. Text is enclosed in double quotation marks. Save the do-file as convert.do. Run it as:
do convert

Now run the program
convert 32 10
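As an alternative sketch (not part of the original do-file), Stata's args command can give the positional macros `1' and `2' readable names. The program name convert2 and the returned scalars to_celsius and to_fahrenheit are hypothetical names chosen for this example:

```stata
capture program drop convert2
program convert2, rclass
    * name the first and second arguments instead of using `1' and `2'
    args fahrenheit celsius
    local c = (`fahrenheit' - 32) * 5 / 9
    local f = `celsius' * 9 / 5 + 32
    * parentheses keep a negative result from being read as a subtraction
    display "`fahrenheit' degrees Fahrenheit = " (`c') " degrees Celsius"
    display "`celsius' degrees Celsius = " (`f') " degrees Fahrenheit"
    * store the results so they can be reused, see return list
    return scalar to_celsius = `c'
    return scalar to_fahrenheit = `f'
end

convert2 32 10
```

Run it the same way as convert, for example convert2 212 100. Afterwards, return list shows the two converted values.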

Sometimes we need to perform
the same task more than once. Let’s use the Stata file cnvselect.dta (you
created this in a previous section, see also do-file read3.do) to practice
repetitions.
Let's first rename some of the variables for easy identification. The syntax is rename [oldname] [newname]. For example, rename var1 areakm renames var1 to areakm. Do this for the following variables:
rename var1 areakm
rename var2 areami
rename var3 population
rename var4 imports
rename var5 exports
rename var6 regime
We need to add value labels to regime (because it is a categorical variable). You need two commands: label define and label values.
Defining regime labels:
label define regimelab 1 "Civilian" 2 "Military/civilian" 3 "Military" 4 "Other"
Assigning regime labels:
label values regime regimelab
If you run a frequency on regime (type tab regime), let's assume you see a number “9”. We will treat this as missing, so it will be added to the other missing data for this variable. To replace “9” with missing, we use replace:
replace regime=. if regime==9
Note the difference between “=” and “==”. The “=” assigns a value, while “==” tests a condition (equality); it is not an assignment. Here “=.” means “set to missing”. Run the frequency again and the “9” will be gone. If you want to see the number of missing values using tab, type the following:
tab regime, missing
If you want to know the average imports by type of regime, type
tab regime, summarize(imports)

In this dataset, a civilian government imports, on average, more than a military one.
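You can try the labeling and recoding steps above on a tiny made-up dataset. The values below are hypothetical and do not come from cnvselect.dta.

```stata
* made-up data: regime codes 1-4, plus a 9 that should become missing
clear
input regime imports
1 120
2 80
3 60
1 140
9 75
4 90
end

label define regimelab 1 "Civilian" 2 "Military/civilian" 3 "Military" 4 "Other"
label values regime regimelab

replace regime = . if regime == 9    // "==" tests, "=" assigns
tab regime, missing                  // the former 9 is now counted as missing
tab regime, summarize(imports)       // mean imports for each regime
```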
Let's write a program to calculate the z-value for some variables. Remember that z-values (standardized scores) have mean 0 and standard deviation 1. The formula is:
Z = (X - mean of X) / (standard deviation of X)
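In symbols, the formula uses the sample standard deviation (the n - 1 version that summarize reports). A small worked example with three made-up values follows:

```latex
z_i = \frac{x_i - \bar{x}}{s_x}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

x = (2,\ 4,\ 6):\quad \bar{x} = 4,\quad
s_x = \sqrt{\frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{2}} = 2,\quad
z = (-1,\ 0,\ 1)
```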
A nice Stata feature is the set of stored results (type help return for further details). Many Stata commands store their results in r(). For example, type summarize imports followed by return list to see them.
The statistics from the summarize command stay in r() until another command that stores r() results replaces them.
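For instance, here is a minimal sketch on auto.dta, the example dataset shipped with Stata. Its price variable stands in for imports.

```stata
sysuse auto, clear
summarize price
return list                          // r(N), r(mean), r(sd), r(min), r(max), among others
display "Mean price: " r(mean)
display "Std. dev.:  " r(sd)
```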
To get a Z-distribution for
variable “imports” we can use the return values from the summarize command. We
will use the generate command to create a new variable with mean 0 and
standard deviation 1. We do this right after running the summarize command for the variable “imports”.
generate imports_z=(imports-r(mean))/r(sd)
Another way to do this is to use egen (here with a new name, because imports_z already exists):
egen imports_std=std(imports)
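Here is a quick check that both routes give the same z-values, sketched on Stata's example auto.dta. price stands in for imports, and price_z and price_std are names chosen for this example.

```stata
sysuse auto, clear
quietly summarize price
generate price_z = (price - r(mean)) / r(sd)   // by hand, from the r() results
egen price_std = std(price)                    // the same thing with egen
summarize price_z price_std                    // both: mean about 0, sd about 1
```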
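/* Here is a do-file to create z-scores from all numeric variables in the dataset (egen's std() cannot handle string variables, so ds lists only the numeric ones) */
ds, has(type numeric)
local c `r(varlist)'
foreach var of varlist `c' {
    egen std`var'=std(`var')
}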
Type help egen for more details.

The results are very close to mean = 0 and sd = 1. Type
browse imports imports_z
You will see the original variable and its z-value
side-by-side

To close the browser click on the “x” in the
upper-right corner of the window.
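You can also try the loop over all variables on auto.dta. That dataset contains a string variable (make), which egen's std() cannot standardize. This sketch therefore uses ds with the has(type numeric) option to list only the numeric variables.

```stata
sysuse auto, clear
ds, has(type numeric)        // make (a string variable) is left out
local c `r(varlist)'
foreach var of varlist `c' {
    egen std`var' = std(`var')
}
summarize std*
```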

But hold on! Your dataset has data for countries across time, so we might need to compute z-values by country rather than for the whole sample. Imagine that we need to create z-distributions of more than one variable for a lot of countries. Let's try it.
This is a bit trickier. Let's explore the “country” variable first; type tab country. “Country” is a string variable with the names of the countries in the dataset. We need to convert it into a numeric variable coded 1 = country A, 2 = country B, 3 = country C, etc.
Fortunately, Stata has a nice little command called encode (type help encode for
details):
encode country, generate(country1)
The syntax is encode [var with strings], generate(new variable with numeric coding)
Now type
tab country1
It looks no different from “country”, because encode stores the country names as value labels of the new numeric codes. Try this one:
tab country1, nolabel
Now we have a numeric variable for a total of 37 countries. Whether or not there are some duplicates does not matter right now.
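Here is a minimal sketch of encode on made-up country names (hypothetical data). encode assigns the codes in alphabetical order of the strings and stores the names as value labels. That is why tab shows names until you add nolabel.

```stata
* made-up data
clear
input str10 country year
"Chile"     2000
"Argentina" 2000
"Brazil"    2000
"Chile"     2001
end

encode country, generate(country1)
tab country1              // shows the names (value labels)
tab country1, nolabel     // shows the numeric codes 1, 2, 3
labelbook country1        // lists the code-to-name mapping
```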
[IMPORTANT: We will treat years within countries as cases (they could be firms, families, individuals, etc.). In reality it may not make much sense to create z-values from time series; we are using this dataset to show how the commands work.]
Let's say you need to generate z-distributions for three variables and 37 countries, and you also need to graph the z-distributions for each country.
Let’s do the z-values first and the graphs later.
DO-FILE:
zvalue.do
capture log close
log using "H:\statadata\zvalue.log", replace
capture program drop zvalue
program zvalue
    set more off
    sort country year
    foreach var of local 0 {
        by country: egen `var'mean=mean(`var')
        by country: egen `var'sd=sd(`var')
        generate `var'_z=(`var'-`var'mean)/`var'sd
        by country: summarize `var'_z
    }
    log close
end
exit
Do-file zvalue.do introduces some new commands: foreach, egen and local 0.
- foreach loops over a list of variables.
- egen generates 'analytical' variables (means, standard deviations, counts, sums, etc.).
- local 0 is a macro that holds everything typed after the program name. Local 1, 2 and 3 hold the first, second and third argument individually; local 0 holds all of them together.
Notice the 'by': it runs the command separately for each group in your data (each country in this case).
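Here is a log-free sketch of the same by-group logic, run on Stata's example auto.dta. The variable foreign (domestic vs. foreign cars) stands in for country, and the program name zsketch is hypothetical. According to help egen, std() cannot be combined with by. That is why the group means and standard deviations are computed separately, as in zvalue.do.

```stata
sysuse auto, clear
capture program drop zsketch
program zsketch
    * `0' holds everything typed after the program name, e.g. "price mpg"
    foreach var of local 0 {
        bysort foreign: egen `var'mean = mean(`var')
        bysort foreign: egen `var'sd = sd(`var')
        generate `var'_z = (`var' - `var'mean) / `var'sd
    }
end

zsketch price mpg
bysort foreign: summarize price_z mpg_z     // mean about 0, sd about 1 in each group
```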

Make sure Stata is working in the following
directory
H:\statadata\
Save the do-file zvalue.do here. From the command
window in Stata type
do zvalue
You can also run the do-file from the do-file editor
by clicking on the right arrow

It will create the program “zvalue”. We want to
create z-distributions for three variables: population, imports and exports. To
do this using the zvalue program type the following:
zvalue population imports exports
The program will create 9 new variables with the suffixes *mean, *sd and *_z. The summaries of the z-values for each country are in the log file zvalue.log, which you can check with any word processor. Pay attention to the left and right apostrophes around “var”. The loop runs three times because we passed three variables, which are stored in local 0. We could have written “foreach var in population imports exports {”, but reading the list from local 0 lets us use the program with any other variables.
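Here are some other examples of how to use “foreach”:
foreach var of varlist var1-var147 {
    replace `var'=999 if `var'==.     /* recode system missing values to 999 */
    sort `var'                         /* sort the data by this variable */
    gen id`var'=_n                     /* position of each observation after sorting */
    replace id`var'=0 if id`var'>3     /* keep positions 1-3, set the rest to 0 */
}
foreach var of varlist x2000-x2006 {
    gen y`var'=`var'                   /* copies x2000 into yx2000, etc. */
}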
Using levelsof (a separate regression for each country, storing the coefficients and standard errors):
gen betax1=.
gen betax2=.
gen betax3=.
gen sex1=.          /* sex1-sex3 hold the standard errors of x1-x3 */
gen sex2=.
gen sex3=.
gen constant=.
gen seconstant=.
/* countries is assumed to be numeric; for a string variable write countries=="`nation'" */
levelsof countries, local(states)
foreach nation of local states {
    reg y x1 x2 x3 if countries==`nation'
    replace betax1=_b[x1] if countries==`nation'
    replace betax2=_b[x2] if countries==`nation'
    replace betax3=_b[x3] if countries==`nation'
    replace sex1=_se[x1] if countries==`nation'
    replace sex2=_se[x2] if countries==`nation'
    replace sex3=_se[x3] if countries==`nation'
    replace constant=_b[_cons] if countries==`nation'
    replace seconstant=_se[_cons] if countries==`nation'
}
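You can sketch the same levelsof pattern on auto.dta. It runs one regression of price on mpg for each value of foreign (0 = domestic, 1 = foreign) and stores the slope and its standard error. beta_mpg and se_mpg are names chosen for this example.

```stata
sysuse auto, clear
generate beta_mpg = .
generate se_mpg = .
levelsof foreign, local(groups)          // groups holds "0 1"
foreach g of local groups {
    quietly regress price mpg if foreign == `g'
    replace beta_mpg = _b[mpg] if foreign == `g'
    replace se_mpg = _se[mpg] if foreign == `g'
}
tabulate foreign, summarize(beta_mpg)    // one slope per group
```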
/* If you have a big dataset with lots of variables, use this to delete variables with lots of missing values */
/* You may have to install 'mdesc' since it is not a built-in Stata command. Type "ssc install mdesc". If this does not work, type "findit mdesc", select a link and click on 'install' */
/* The example below deletes variables that have more than 20% missing values; the cutoff is up to you. Check in help mdesc whether r(percent) is a proportion or a percentage and set the cutoff to match */
set more off
foreach var of varlist var1-var200 {
    mdesc `var'
    local percent=r(percent)
    if `percent'>0.2 drop `var'
}
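If mdesc is not available, you can sketch a similar filter with built-in commands only. Count the missing values of each variable and compare their share with the cutoff. The tiny dataset below is made up: a has no missing values, and b has 2 out of 5 missing.

```stata
* made-up data
clear
input a b
1 .
2 3
3 .
4 5
5 6
end

local cutoff = 0.2                   // drop variables with more than 20% missing
foreach var of varlist a b {
    quietly count if missing(`var')
    if r(N) / _N > `cutoff' drop `var'
}
describe                             // b (40% missing) has been dropped
```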
Now that we have the z-values, we can generate a line graph of the three z-values for each country. We will use the following do-file:
DO-FILE:
linegraph.do
capture program drop linegraph
program linegraph
    set more off
    local titles Country01 Country02 Country03 Country04 Country05 Country06 ///
        Country07 Country08 Country09 Country10 Country11 Country12 ///
        Country13 Country14 Country15 Country16 Country17 Country18 ///
        Country19 Country20 Country21 Country22 Country23 Country24 ///
        Country25 Country26 Country27 Country28 Country29 Country30 ///
        Country31 Country32 Country33 Country34 Country35 Country36 ///
        Country37
    forvalues num = 1(1)37 {
        local i : word `num' of `titles'
        graph twoway line population_z imports_z exports_z year if country1==`num', ///
            legend(order(1 "Population" 2 "Imports" 3 "Exports")) ///
            xtitle(Year) ytitle(Z-value) title(country `num' `i') ///
            subtitle(Z-values) lpattern(solid longdash shortdash) ///
            lwidth(medthick medthick medthick) saving(`i', replace)
    }
end
exit
Do-file linegraph.do looks rather long and scary, but it really is not. Here we have the commands forvalues and graph twoway line, and two macros (local titles and local i :).
foreach loops over a list (here, a list of variables), whereas forvalues loops over a range of numbers (here, the country codes 1 to 37). The dataset has 37 countries, so we need to generate 37 graphs. forvalues num = 1(1)37 defines a macro “num” (the name could be anything) that counts from 1 to 37 one by one. The number in parentheses is the step: 1(2)37 would do 1, 3, 5, 7, etc.
Local titles defines the list of country names. Make sure the names are in the same order as the codes; for example, the seventh name must belong to the country coded 7 (check the value labels by typing labelbook country1).
Local i (local i : word `num' of `titles') synchronizes the titles with the country number. It uses the name in the graph title and in the graph file name when saving it. In other words, when we are working with country 7, it takes the seventh name in the list and uses it to name and save the graph.
Graph twoway line is how Stata defines line graphs (type help graph for details on this and other types of graphs).
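Here is a small sketch of the two loop ingredients, using three hypothetical country names. word `num' of `titles' picks the num-th name, and the number in parentheses in forvalues is the step.

```stata
* hypothetical names, listed in the same order as their codes 1, 2, 3
local titles Argentina Brazil Chile
forvalues num = 1(1)3 {
    local i : word `num' of `titles'
    display "Country `num' is `i'"
}

* a step of 2 visits 1, 3, 5, 7
forvalues num = 1(2)7 {
    display `num'
}

local third : word 3 of `titles'
```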
To run this do-file type in
Stata command window
do linegraph
To run the program that
makes the 37 graphs just type (and grab a cup of coffee or tea)
linegraph
You will see graphs appearing and disappearing, and your directory will be crowded with 37 graphs. Once it finishes, check some of the graphs by typing
graph use [country name]
or
graph use
"H:\statadata\[country name].gph"
To verify country name and number, type
browse country if country1==[TYPE THE NUMBER OF THE COUNTRY YOU
WANT TO CHECK]
The browse window will
appear with the column of country name you called, for example
browse country if country1==23
You can do the same in case
you need to run 50 regressions or do some additional transformation.
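Here is another example of how to use “forvalues”:
/* For each of observations 1-147, look for matches among observations 148-5000 */
forvalues var = 1(1)147 {
    gen file`var'=.
    forvalues x = 148(1)5000 {
        /* age within 6 and date within 365 of observation `var' */
        if abs(age[`var']-age[`x'])<=6 {
            if abs(date[`var']-date[`x'])<=365 {
                /* same income and same turn */
                if ((income[`var']-income[`x']))==0 {
                    if ((turn[`var']-turn[`x']))==0 {
                        /* store the squared distance in observation `x' */
                        replace file`var'=((a[`var']-a[`x'])/b)^2+((c[`var']-c[`x'])/d)^2 in `x' if file`var'==.
                    }
                }
            }
        }
    }
}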
For a comprehensive list of Stata commands you should know, check the following site (some commands may not work in Stata 10):
http://www.ats.ucla.edu/stat/stata/notes2/commands.htm
[1] Every time you type doedit, a new do-file editor window will appear, even if others are already open.
[2] When using do-files you do not have to open Stata beforehand; just double-click on the file name to run it.
[3] Make sure you are in the working directory,
otherwise you will have to specify the path where you want to save the file.
[4] Make sure you are in the working directory,
otherwise you will have to specify the path where you want to save the file.
[5] When testing do-files, it is a good idea to use replace instead of append. Otherwise every test run, with all its errors and changes, is appended to the file, and you will end up with a very long log file.
[6] Make sure you are in the working directory,
otherwise you will have to specify the path where you want to save the file.
[7] When testing do-files, it is a good idea to use replace instead of append. Otherwise every test run, with all its errors and changes, is appended to the file, and you will end up with a very long log file.
[8] For further details type help keep in the command line. To drop variables use drop
(type help drop for details)
[9] Make sure you are in the working directory,
otherwise you will have to specify the path where you want to save the file.
[10] The command capture before label
variable helps to make the do-file
applicable for datasets lacking some variables.