Before we introduce you to programming in Stata, we need to make sure you know how to enter data into Stata; you will learn some basic commands along the way[1].
We will download real data in their original formats and proceed from there. Once you are familiar with the basic principles of Stata, we will move on to the famous “do-files”. If you have some experience with Stata, you can go straight to the programming part of the course.
Stata is a multipurpose statistical package for data analysis, data manipulation and graphics.
There are several ways to enter data into Stata. In this course we will learn five of them:
· The ‘infile’ way, for ASCII files read with a dictionary (*.dat, *.txt)
· The ‘use’ way, to open Stata files (*.dta), including files exported from other statistical packages such as SPSS
· The ‘quickie’ way, from Excel (copy and paste)
· The ‘insheet’ way, for CSV files (*.csv)
· The ‘infix’ way, for ASCII data (you need the layout or codebook to know which columns to read)
Depending on your experience
with Stata you should go ahead and study the following sections at your own
pace.
To start using Stata you
need two things: a properly licensed version of the software and data.
Let’s take a brief moment to introduce Stata to those who
have never used it before. Stata users can skip this part.
When you first run Stata you will see the main Stata screen.

Stata has four windows:
· The Review window (upper left) shows the history of the commands you submit (“_rc” marks the error code of a command that failed)
· The Results window (upper right) shows the output of every procedure you run
· The Variables window (lower left) shows the variables in your dataset
· The Command window (lower right) is where you type your Stata code and also DOS commands
You can always use the “point-and-click” method through the menus. For most procedures, however, we recommend using the command line.
When you work with Stata
there are three basic procedures you may want to do first: create a log file, set your working directory, and set the correct memory allocation for
your data.
First, the log file. Go to File - Log - Begin.

Stata will ask you where to save the log file (you also have the option of appending new output to an existing log). Choose the directory where you want to save it; for this course we will use the following directory in your H: drive:
H:\statadata\
I will call the log file Log1 and save it as a plain-text log file (which can be read by any word processor).
Click on “Save as type:” right below “File name:” and select Log (*.log). This creates the file Log1.log, which can be read by any word processor or by Stata (go to File - Log - View). If you save it as *.smcl (Formatted Log), only Stata can read it, so we recommend saving the log file as *.log.

Once you save it, the Results window confirms that the log is open, listing the path of the log file, its type (text) and the date and time it was opened.

The Review window lists the commands you have typed so far, and the Results window shows their output. This is useful if you need to re-type a previous command: just click on it in the Review window and it will appear in the Command window.
Alternatively, skip the point-and-click method and type the following in the Command window (it is always recommended to work with a log file open):
log using "H:\statadata\Log1.log", append
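If you later start each session from a do-file, a common pattern is to close any log that may still be open before opening yours. This is a minimal sketch, assuming you have already changed to your working directory (see cd below), so the log is named without a path; capture only suppresses the error that log close gives when no log is open.

```stata
* Start-of-session sketch (assumes the working directory is already set)
capture log close                    // close an open log; no error if none is open
log using "Log1.log", text append    // plain-text log; new output is added at the end
set more off                         // do not pause long output with --more--
```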
The second thing to do is to check your working directory. In the Command window type:
pwd
‘pwd’ stands for ‘print working directory’. It shows your current working directory, which in this example is H:\statadata.
You can also type in the
command window
dir
This will show you what is
in that directory (good old DOS command).
To change directory, type in the Command window:
cd H:\statadata\
You can see your current directory
by looking at the lower left of the Stata screen.

The third step
is to set the necessary memory allocation.
When Stata starts, the Results window shows the current memory allocation after “Notes:” (10 MB in our example). This is enough for a medium-sized dataset, but sometimes you may need more memory to store your data.
To estimate the size of your dataset, use this rough formula:
Size (in bytes) = 8 * (number of cases or rows) * (number of variables + 8)
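As a worked example of the formula, take a hypothetical survey with 1,000 cases and 50 variables: 8 * 1000 * (50 + 8) = 464,000 bytes, well under half a megabyte. The same arithmetic in Stata (the numbers are made up; plug in your own):

```stata
* Worked example of the size formula (hypothetical dataset dimensions)
local n_obs  = 1000                          // number of cases (rows)
local n_vars = 50                            // number of variables
local bytes  = 8*`n_obs'*(`n_vars' + 8)      // 8 * 1000 * 58
display "Estimated size: " `bytes' " bytes (about " %4.2f `bytes'/1024^2 " MB)"
```

This prints an estimate of 464000 bytes, about 0.44 MB.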
Depending on your Stata version and computer power, you can allocate up to around 2 gigabytes. To allocate 1 GB type:
set mem 1g
(In Stata 12 and later, memory is managed automatically and set mem is no longer needed.)
Dealing with public opinion data
Please go to the Roper Center website:
http://www.ropercenter.uconn.edu/
Go to the drop-down menu,
select “Dataset Collection” - “Recent Acquisitions”

Find the following study (if
you can’t find it try “search for datasets” in the same dropdown menu):
Conducted by Los Angeles Times, field dates: April 5-9, 2007, sample:
National adult [USLAT2007-544]
You can search by using the
study # USLAT2007-544.
Click on the link for the
study description. You will see links to the documentation (codebook) and the
data.

The codebook is offered in two formats, PDF and Word; download either or both.
The data come in two formats: ASCII (*.dat, sometimes *.txt or *.data) and SPSS portable. You can open the portable file in SPSS and then save it as a Stata file (File - Save As); this only works in SPSS version 15 or later.
Download both files. You
will be required to enter your Princeton email (no password required as long as
you access it from the

Please save your files in
your “H:\” drive. For the course purposes we will be working in
the following directory:
H:\statadata\
Reading ASCII data using Stata
Most raw data files are in ASCII format (American Standard Code for Information Interchange), usually fixed-format files with extensions such as *.dat, *.txt or *.data.
When dealing with large datasets it is useful to create a dictionary file (*.dct)[2] using the “data locations” in the codebook (you can find this section in either the *.pdf or the *.doc).
A dictionary file is especially useful when dealing with fixed-format ASCII data.
The dictionary will help you
to let Stata know how many columns it has to read to process each variable (it
is like a map of your data)
To create a dictionary file
you can use notepad, wordpad or the do-file editor in Stata (the latter highly
recommended)
By “creating” I mean opening the editor, typing the code and saving the file with the proper extension (*.dct).
Let’s say we are interested
in the following variables (see the codebook for the lat544
database):
[NOTE: The following is an extract from a codebook. Codebooks differ in how they present the layout of the data; you need to look for the variable name, the start column, the end column or length, and the format of the variable (whether it is numeric, and with how many decimals if any, or string).]
Data Locations
Variable | Rec | Start | End | Format
WTVAR | 1 | 1 | 7 | F7.2
GWBUSHJO | 1 | 24 | 25 | A2
GWBECON | 1 | 26 | 27 | A2
ECOSTATU | 1 | 32 | 33 | A2
DEMORREP | 1 | 44 | 45 | A2
…
· The first column gives the variable name.
· The second column gives the record in which the variable is located (for more on this visit http://www.columbia.edu/acis/eds/stat_pak/stata/stata-write.html or http://www.stata.com/support/faqs/data/dict.html).
· The third column indicates where the variable starts.
· The fourth column, “End”, shows where the variable ends, and the last column shows the format of the variable (numeric or alphanumeric). For example, variable WTVAR is in record 1, starts at column 1 and ends at column 7, is numeric (letter “F”) and has 2 decimal places.
For the lat544
data example there are two ways we can do this. One is using infix and
the other is using infile.
The easiest way to extract
ASCII data and put it into Stata is to type directly in the command window the
layout of the variables you want by using infix. Type
infix varname [start column-end column] using mydatafile.*
For example:
infix wtvar 1-7 gwbushjo 24-25 gwbecon 26-27 ecostatu 32-33 str demorrep 44-45 using lat544.dat
NOTE: If a
variable is a string character you should add ‘str’ before the variable name
(not after) so Stata reads it as string.
Stata reports in the Results window how many observations it read, and the new variables appear in the Variables window (with either method).

You can also open a do-file
and use it to run the code (this may be useful when dealing with a lot of
variables)
infix ///
wtvar 1-7 ///
str gwbushjo 24-25 ///
str gwbecon 26-27 ///
str ecostatu 32-33 ///
str demorrep 44-45 ///
using lat544.dat
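NOTE: The “///” at the end of a line tells Stata that the command continues on the next line, so all the lines above form a single command. It works in do-files, not in the Command window.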
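To practice infix without downloading anything, the sketch below writes a two-line fixed-width file to a temporary file and reads it back; run it from the do-file editor. The variable names and values are invented for illustration: columns 1-6 hold a weight, columns 7-8 a numeric rating and column 9 a one-letter string code.

```stata
* Practice sketch: a tiny fixed-width file with made-up values
clear
tempfile demo
tempname fh
file open `fh' using "`demo'", write text replace
file write `fh' "  1.25 3D" _n     // cols 1-6 weight, 7-8 rating, 9 party code
file write `fh' "  0.8012R" _n
file close `fh'

* Read the columns by position; str marks the string variable
infix wt 1-6 rating 7-8 str party 9 using "`demo'"
list
```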
You can also use infile to read fixed-format data (the dictionary file is a
bit more complex, type help
infile for further details).
Using Notepad or the do-file editor, TYPE the following dictionary (do not copy and paste it; if you do, the quotation marks need to be re-typed):
dictionary using lat544.dat {
_column(1) wtvar %7.2f "Weight"
_column(24) gwbushjo %2s "GW Bush Job Rating"
_column(26) gwbecon %2s "GW Bush and the economy"
_column(32) ecostatu %2s "Status of Nat Econ"
_column(44) demorrep %2s "Vote demo or rep"
}
Where
· “dictionary using” is the standard code that defines the file as a Stata dictionary (the pattern is “dictionary using [name of the file with the ASCII data] {”). The curly brackets should always be in the positions you see them.
· [IMPORTANT: Always put a hard return at the end of every line.]
· “lat544.dat” is the name of the ASCII data file.
· _column(#) indicates the column where the variable starts.
· Next comes the name of the variable.
· “%7.2f” indicates the format of the variable: the number after % is the number of columns (7 in this case), followed by the number of decimal places (2 in this example); the letter “f” refers to a numeric fixed format, while an “s” indicates a string variable (type help format in the Stata Command window for details).
· After the format you have the option to add a variable label in quotation marks.
Save the file as “lat544.dct”. When saving, make sure to select “All files” in “Save as type:” (right below “File name:”). This is how you create a dictionary file. To run it, you need to open Stata.
For the rest of this example we use a version of the dictionary that reads gwbushjo, gwbecon and ecostatu as numeric (%2f):
dictionary using lat544.dat {
_column(1) wtvar %7.2f "Weight"
_column(24) gwbushjo %2f "GW Bush Job Rating"
_column(26) gwbecon %2f "GW Bush and the economy"
_column(32) ecostatu %2f "Status of Nat Econ"
_column(44) demorrep %2s "Vote demo or rep"
}
Save it as lat544.dct
To read the data using the dictionary we import it with the command infile. If you prefer the menu, go to File - Import - “ASCII data in fixed format with a data dictionary”.
With infile we run the dictionary as follows (for details type help infile in the Stata Command window; you may also want to check out infix and outfile):
infile using lat544

In the Results window you will see your dictionary and the number of observations read (press the space bar at the end to let the command finish). The selected variables appear in the Variables window. The Results window also reports that one variable contains a nonnumeric character (“&”); Stata cannot read it as a number and stores a missing value instead.
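You can try the whole dictionary workflow on a toy example. This sketch writes a small fixed-format data file and a matching dictionary from within Stata and then reads them with infile. The file names demo.dat and demo.dct and the values are hypothetical; the layout only mimics wtvar and ecostatu and is not the lat544 data.

```stata
* Sketch: build a tiny fixed-format data file and a matching dictionary
* (file names and values are hypothetical)
clear
tempname fh
file open `fh' using "demo.dat", write text replace
file write `fh' "  1.25 3" _n               // cols 1-6 weight, col 8 rating
file write `fh' "  0.80 5" _n
file close `fh'

file open `fh' using "demo.dct", write text replace
file write `fh' "dictionary using demo.dat {" _n
file write `fh' `"  _column(1) wtvar    %6.2f "Weight""' _n
file write `fh' `"  _column(8) ecostatu %1f   "Status of Nat Econ""' _n
file write `fh' "}" _n
file close `fh'

infile using demo                           // the .dct extension is assumed
list
```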
To practice try to import
some other variables. This skill may be useful later on during this training.
If you type browse (in the command line), another window will open
showing your data in a spreadsheet format
Another way to use infile is to type the variable list directly in the Command window (use this method if your data are separated by commas, tabs or spaces):
infile [list of variables] using [name of the data file including the extension]
This command works fine when you have few variables. If one of them is a string, you need to specify its type, for example:
infile str7 v1 v2 v3 using mydata.txt
In this case v1 is a string up to 7 characters long, and v2 and v3 are numeric, which is the default.
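A self-contained sketch of the free-format case, using a made-up file demo_free.txt whose values are separated by spaces (the names and values are hypothetical):

```stata
* Write a small space-separated file with made-up values
clear
tempname fh
file open `fh' using "demo_free.txt", write text replace
file write `fh' "Alice 3 1.5" _n
file write `fh' "Bob 12 0.8" _n
file close `fh'

* name is a string of up to 7 characters; rating and wt are numeric
infile str7 name rating wt using demo_free.txt
list
```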
Sometimes your data have a more complicated structure, in which each case spans more than one line (record). Here is an example of a dictionary file for data with three records per case:
dictionary using tree.dat {
_lines(3)
_line(1)
_column(1) idnum %4f
_column(5) treetype %2f
_line(2)
_column(5) soilphn %3.2f "Soil PH - North Side"
_column(8) soilphe %3.2f "Soil PH - East Side"
_column(11) soilphs %3.2f "Soil PH - South Side"
_column(14) soilphw %3.2f "Soil PH - West Side"
_line(3)
_column(5) height %5.1f
_column(10) circ %5.1f
}
Source: http://www.columbia.edu/acis/eds/stat_pak/stata/stata-write.html
Save the dictionary with extension *.dct (for example mydictionary.dct) and run it by typing:
infile using mydictionary
Reading ASCII data using SPSS
If you get your data in ASCII format together with an SPSS setup file (extension *.sps), run SPSS, go to File - Open - Syntax, and find and open the *.sps file. An SPSS text editor will open (it is called ‘syntax’ and works the same way as ‘do-files’ in Stata). You will see something like this:
…
DATA LIST FILE="path to data" /
wtvar 1-7
gwbushjo 24-25
gwbecon 26-27
ecostatu 32-33
demorrep 44-45 (A).
EXECUTE.
In ‘DATA LIST FILE’ write the path to your data with the full name of the dataset (including the extension):
…
DATA LIST FILE="C:\mydata\lat544.dat" /
wtvar 1-7
gwbushjo 24-25
gwbecon 26-27
ecostatu 32-33
demorrep 44-45 (A).
EXECUTE.
NOTE: The “(A)” means that ‘demorrep’ is a string variable. Also notice the period at the end of each command.
Select all the syntax and click the Run button on the toolbar to run it.

In the data window you will
see your data.
If the data are already available in a statistical package format other than Stata, it is easier to use that format (provided you have the software) and save or export it to Stata. This has the advantage of keeping the variable labels and, in some cases, the value labels. You can also use DBMS/Copy.
Assuming that you have SPSS,
go ahead and open the file by double clicking on lat544.por or by opening SPSS and using the menu:

Change the ‘file of type’ to
“SPSS Portable (*.por)”, select the file and click “Open”

Once in SPSS you can save the data in Stata format. In SPSS go to File - Save As; the Save Data As dialog box comes up.

In “Save as type”, select the latest version of Stata from the list (or the version you are working with). Optionally, you can select only the variables you need by clicking “Variables”. For now, let’s keep the whole dataset.
Save the data as lat544
and click “Save”. The data will be exported as lat544.dta
Going back to Stata.
A note on the log file: you can close the log file and continue using it later on.
To close the log, type
log close
To continue working on the same log file type the
following
log using "H:\statadata\Log1.log", append
The option append will add new output to your
existing log file.
If you still have your
previous data from the dictionary, type
clear
This will clear the data in
memory so you can start with a new dataset
To read a
Stata file, type
use lat544
or
use "C:\myfolder\mydata\lat544.dta"
You can also use the menu to
read a Stata file, go to File - Open.

The variable window will be
populated with all the variables in the dataset (with the variable name, label,
type and format)
In the command line type browse if you want to check the data.
A new spreadsheet-like
window will come up.
Close the window to go back
to the command line in Stata (it is important to know that when you browse or edit
your data you cannot use any of the other four windows until you close the
data editor)
Once you have the data
in Stata you can explore it by running the commands: describe, list, summarize and codebook
In the Command window type
describe

The describe command gives you information about the active dataset and the format of the variables (“display format”). [Hit Enter or the space bar to see the rest of the list.] Type help describe for further details (if the “--more--” message bugs you, type set more off).
The list command lists the data in table format. Since we have many variables it will be hard to read (try it; to stop the listing type the letter “q” or click the Break button, the red circle with a white “x” in the toolbar below the menu). However, you can list some variables as follows:
list ecostatu ownfinan gender

The summarize command provides more information about your data. Type:
summarize

The summarize table reports the number of observations, mean, standard deviation, minimum and maximum of each variable. Notice the 0 observations for most of the variables: those variables are stored as text (string), not as numbers, although from the codebook we know they are supposed to be numeric.
To get percentiles and other statistics, add the detail option to summarize, for example summarize income, detail[3]

codebook is another useful Stata command for exploring your data; type codebook, optionally followed by the names of the variables you want to inspect.

Type help
describe, help list, help summarize and help codebook in the command window for further details (check
also help inspect)
Numeric-string to numeric-numeric
To convert numeric variables
with a string (numbers in red) format into numeric we use the command destring
destring, replace

As you can see, some variables were converted to numeric format but others were not (your screen may differ a bit), because they contain some string characters.
Let’s run a frequency of one
of these variables to see what is going on. To do this we use the command tab

In the output note the string character “&”; all the other values are numbers. We can still convert this variable by using the option ‘ignore’. Type the following:
destring, replace ignore(&)
Now all
variables with the string character “&” will be converted to numeric and “&” set to missing. You can do this with any character (a, b, x, y, *,
etc).
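Here is a self-contained sketch with made-up values that combines a thousands separator and the “&” code. Listing several characters inside ignore() removes each of them, and a value that is left empty afterwards becomes missing.

```stata
* Made-up string variable mixing numbers, commas and an "&" code
clear
input str6 var1
"123"
"1,345"
"&"
"5,677"
end

* Remove commas and "&", then convert; the empty string becomes missing
destring var1, replace ignore(",&")
list
```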
NOTE: If destring still does not work, here are some special cases you may want to check:
1) Commas:
Var1
123
1,345
345
5,677
In this case type destring Var1, replace ignore(",")
WARNING: Sometimes decimals are separated by commas. When using this option, make sure the commas indicate thousands, not decimals. If decimals are separated by commas type: destring Var1, replace dpcomma
2) Spaces:
Var1
123
1 345
345
5 677
In this case type destring Var1, replace ignore(" ")
3) Dots:
Var1
..
1 345
..
5 677
In this case type destring Var1, replace ignore(".") or, since there is also a space, type: destring Var1, replace ignore(". ") (ignore() removes every dot, so use it only when the variable has no decimal points).
You can
also destring an individual variable, type
destring [variable(s)], replace
To save the data go to File - Save As or type
save lat544, replace
Continuing with the command tab (tabulate), type the following:
tab ecostatu
In the Results window you will see a frequency distribution of the variable ‘ecostatu’.

If you tab two variables you
will get a crosstabulation of those variables (not two different frequencies)
tab ecostatu gender

Note the frequency distribution and compare it with the one for ecostatu in the codebook.

The difference between the codebook frequencies and Stata’s is due to weighting: the codebook presents weighted data, so we need to weight the data to get the same numbers. The variable “wtvar” (the first variable in the list) contains the weights. Type the following command and compare the result with the previous one:
tab ecostatu [aw=wtvar]

Where “aw” means ‘analytic
weights’ (type help weight in the Stata command line for more details)
If you want to generate
frequencies for more than one variable you use tab1 instead of tab[4]:
tab1 gwbushjo gwbecon ecostatu

tab is a powerful command (type help tab in the Command window for more details). For example, if you want to test the hypothesis that two variables are independent and also see row and column percentages, you can use tab with the following options:
tab ecostatu ownfinan, row col chi2

By now you might be
wondering what 1, 2, 3, etc. mean. When working with public opinion data you
work mostly with categorical variables whose values most of the time need to be
labeled[5]
Stata stores value labels separately from the variables themselves, and you need two commands to label the values of a variable: label define and label values. label define creates a label that attaches text to each category, and label values assigns that label to a specific variable.
In the case of gender, according to the codebook ‘1’ is for ‘male’ and ‘2’ is for ‘female’. So we create a label called ‘sex’ as follows. Type:
label define sex 1 male 2 female
And we
assign it to the variable gender:
label values gender sex
Type tab gender and you will see the frequency
distribution with the labels
The same
thing with ‘ecostatu’. According to the codebook these are the codes for the
values
1 VERY WELL
2 FAIRLY WELL
3 FAIRLY BADLY
4 VERY BADLY
5 NOT SURE
6 REFUSED
Let’s create a value label called ‘scale1’:
label define scale1 1 "Very well" 2 "Fairly well" 3 "Fairly badly" 4 "Very badly" 5 "Not sure" 6 "Refused"
And
apply it to variable “ecostatu”
label values ecostatu scale1
Type tab
ecostatu to see it with the labels.
If you do not want to see
the labels, type:
tab ecostatu, nolabel
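The two label commands can be checked on a tiny made-up dataset. label list shows the codes stored in a label, and the extended macro function : label returns the text attached to one code. This is a sketch; only the label name sex comes from the text above.

```stata
* Tiny made-up dataset to check a value label
clear
input gender
1
2
2
end
label define sex 1 "male" 2 "female"
label values gender sex
label list sex                  // prints the codes stored in label "sex"
local lbl : label sex 2         // text attached to code 2
display "`lbl'"
tab gender
```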

Converting to/from missing values
--- From value to missing
For one value
mvdecode [name of the variable (type ‘_all’ if using all
variables)], mv([# to missing])
For more than one
mvdecode [name of the variable (type _all if using all
variables)], mv([# to missing] [# to missing] …)
or
mvdecode [name of the variable (type _all if using all
variables)], mv([# to missing]=. \ [# to missing]=.a \ [# to missing]=.b)
--- From missing to value
For one value
mvencode [name of the variable (type ‘_all’ if using all
variables)], mv([# assigned to missing])
For more than one
mvencode [name of the variable (type _all if using all
variables)], mv(.=[# assigned to missing]\ .a=[# assigned to missing] \ .b=[#
assigned to missing] )
Type help mvdecode or help mvencode for more details.
Example…
Using the previous example
let’s say you want to convert the option 5 (“Not sure”) and 6 (“Refused”) to
missing. Type
mvdecode ecostatu, mv(5=. \ 6=.a)
You get this

To do the reverse type:
mvencode ecostatu, mv(.=5 \ .a=6)

Converting text to date
Let’s say you have a date
variable in either one of the following formats: “date1” and “date2” are
strings (text) and “date3” is a plain number. This data goes from Jan 1st,
1995 to Jan 14, 2008. Neither of them is formally in date format. We will deal
with each in turn.

Summary
For ‘date1’ use:
STATA 10 - gen datevar1 = date(date1,"DMY", 2008)
STATA 9.2 - gen datevar1 = date(date1,"dmy", 2008)
For ‘date2’ use:
STATA 10 - gen datevar2 = date(date2,"MDY", 2008)
STATA 9.2 - gen datevar2 = date(date2,"mdy", 2008)
In both cases, after creating the date variable you need to format it as follows:
format datevar1 %td[6]
Converting “date1”
The structure of “date1” is day-month-two-digit year. We will use the function date() to convert string variables into date variables. Type
STATA 10
gen datevar1 = date(date1,"DMY", 2008)[7]
STATA 9.2
gen datevar1 = date(date1,"dmy", 2008)[8]
Note: the third argument, 2008, tells Stata how to handle two-digit years. Each two-digit year is read as the latest year that does not exceed 2008, so 95 becomes 1995 and 08 becomes 2008; here it is the last year in the series. Without it, date() cannot tell which century a two-digit year belongs to.
Then type:
format datevar1 %td[9]
For more details type help date.

“Datevar1” should have the
same structure as “date1”. Check the change in century.
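A minimal sketch with two made-up strings in the day-month-two-digit-year layout shows the effect of the 2008 argument (Stata 10 syntax):

```stata
* Made-up day-month-year strings with two-digit years
clear
input str8 date1
"01-01-95"
"14-01-08"
end

gen datevar1 = date(date1, "DMY", 2008)   // two-digit years mapped to years <= 2008
format datevar1 %td
list
```

The first row becomes 01jan1995 and the second 14jan2008.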

If you need quarterly data you can transform that variable as follows:
gen quarters1 = qofd(datevar1)
format quarters1 %tq

Converting “date2”
The structure of “date2” is month-day-four-digit year. Again we use the function date() to convert the string variable into a date variable. Type
STATA 10
gen datevar2 = date(date2,"MDY", 2008)
STATA 9.2
gen datevar2 = date(date2,"mdy", 2008)
Then type:
format datevar2 %td
Notice that here we do not actually need to specify 2008, since the years already have four digits; the third argument only matters for two-digit years and is harmless here.

If you need quarterly data you can transform that variable as follows:
gen quarters2 = qofd(datevar2)
format quarters2 %tq

Converting “date3”
“Date3” has the following structure:
year(four digits)-month-day. It is numeric with different lengths.
We need first to separate
its date components.
We will generate a string variable “date3a”. Type
gen date3a= string(date3,"%11.0g")

In “date3a” the year always occupies the first four characters, so we can extract it with the substr() function:
gen year3=substr(date3a,1,4)

We cannot distinguish
between months and days since the rest of the characters in “date3a” have
different lengths. So we extract the rest after year. Type:
gen monthday3=substr(date3a, 5,.)

The maximum length of
“monthday3” is 4: two-digit months and two-digit days. We will extract these
first.
gen month3=substr(monthday3,1,2) if length(monthday3)==4
gen day3=substr(monthday3,3,2) if length(monthday3)==4

When “monthday3” has length 2, we can be sure the first digit is one of the first nine months and the second digit one of the first nine days of the month. We can extract these in the same way, but this time we replace the missing values (values of length 3 are ambiguous, for example 112 could be January 12 or November 2, so they stay missing for now):
replace month3=substr(monthday3,1,1) if length(monthday3)==2
replace day3=substr(monthday3,2,1) if length(monthday3)==2

Browse the data to check the result. If you scroll down you will notice that some dates are still missing.

We will convert these partial
dates to a date variable. First we need to format “year3”, “month3” and “day3”
as numbers:
destring year3 month3 day3, replace

Now we generate the new date
variable using the mdy() function and format as %td:
gen datevar3=mdy( month3, day3, year3)
format datevar3 %td

“Datevar3” is now a partial
date variable.

Notice that the date format and the actual date variable are two different things. As the following table shows, dates are a special case of a numeric variable, where numbers are codes for dates in consecutive order. We will fill in the missing dates by simply filling in the consecutive numbers in the series.
| This is what you see | This is what the computer "sees" |
|---|---|
| 10-Oct-95 | 13066 |
| 11-Oct-95 | 13067 |
| 12-Oct-95 | 13068 |
| 13-Oct-95 | 13069 |
| 14-Oct-95 | 13070 |
| 15-Oct-95 | 13071 |
| 16-Oct-95 | 13072 |
| 17-Oct-95 | 13073 |
| 18-Oct-95 | 13074 |
| 19-Oct-95 | 13075 |
| 20-Oct-95 | 13076 |
| 21-Oct-95 | 13077 |
| 22-Oct-95 | 13078 |
| 23-Oct-95 | 13079 |
| 24-Oct-95 | 13080 |
| 25-Oct-95 | 13081 |
| 26-Oct-95 | 13082 |
| 27-Oct-95 | 13083 |
| 28-Oct-95 | 13084 |
| 29-Oct-95 | 13085 |
| 30-Oct-95 | 13086 |
| 31-Oct-95 | 13087 |
To find the computer date codes you can use the display command together with the td() function. For example:
display td(1jan1995)
The computer code for Jan 1st, 1995 is 12784. If, for example, the first date were missing, this is how you would replace it (this is just an example):
replace datevar3=12784 in 1
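Stata stores a daily date as the number of days since 1 January 1960, which is day 0. The lines below check the codes quoted in this section.

```stata
* Daily dates are day counts from 01jan1960 (day 0)
display td(01jan1960)        // 0
display td(01jan1995)        // 12784, the code used above
display mdy(10,10,1995)      // 13066, first row of the table
display %td 13066            // 10oct1995: a code shown as a readable date
```

The arithmetic behind 12784: 35 years of 365 days plus 9 leap days (1960, 1964, ..., 1992) gives 12775 + 9 = 12784.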

Going back to the data.
Let’s make a copy of “datevar3”:
gen datevar3a=datevar3
Now we fill in the time
series:
replace datevar3a= datevar3a[_n-1]+1 if datevar3a==.

Format “datevar3a”
format datevar3a %td
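The filling step can be tried on a made-up series with a two-day gap. Like the text, it assumes the data are sorted by date with one observation per consecutive day; it uses missing() instead of ==. so that extended missing values such as .a are filled too.

```stata
* Made-up daily codes with a gap (10oct1995 to 13oct1995)
clear
input datevar3
13066
.
.
13069
end

gen datevar3a = datevar3
* replace works row by row, so row 3 already sees the value filled in row 2
replace datevar3a = datevar3a[_n-1] + 1 if missing(datevar3a)
format datevar3a %td
list
```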

Deconstructing date variables
Let’s say you already
have a date variable (“01jan1995”) and you need to extract days, months and
years. Using date functions type:
gen day=day(datevar1)
gen month=month(datevar1)
gen year=year(datevar1)
order date1 date2 date3 datevar1 day month year
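A quick check with a single made-up date shows what each function returns. dow() (day of the week, 0 = Sunday) is an extra here that the list above does not use.

```stata
* One made-up observation to see what each date function returns
clear
set obs 1
gen datevar1 = mdy(1,1,1995)
format datevar1 %td
gen day   = day(datevar1)      // 1
gen month = month(datevar1)    // 1
gen year  = year(datevar1)     // 1995
gen dow   = dow(datevar1)      // 0: 1 January 1995 was a Sunday
list
```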


Date variables with day of the week
If your date variable is a string that starts with the day of the week, here is a do-file to create a date variable:
/* Generating date1: drop the day of the week */
gen date1=ltrim(subinword(date,"Monday,"," ",.))
replace date1=ltrim(subinword(date1,"Tuesday,"," ",.))
replace date1=ltrim(subinword(date1,"Wednesday,"," ",.))
replace date1=ltrim(subinword(date1,"Thursday,"," ",.))
replace date1=ltrim(subinword(date1,"Friday,"," ",.))
replace date1=ltrim(subinword(date1,"Saturday,"," ",.))
replace date1=ltrim(subinword(date1,"Sunday,"," ",.))
/* Generating date2: add a comma after the month name */
gen date2=subinstr(date1,"er","er,",.)
replace date2=subinstr(date2,"y","y,",.)
replace date2=subinstr(date2,"April","April,",.)
replace date2=subinstr(date2,"March","March,",.)
replace date2=subinstr(date2,"August","August,",.)
replace date2=subinstr(date2,"June","June,",.)
/* Generating datevar */
gen datevar=date(date2,"MDY",2009)
format datevar %td
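The do-file creates date1 (the date without the day of the week), date2 (with a comma after the month) and the formatted date variable datevar.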
For further details and
other formats type[10]
help date
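If you are using Stata 10 or later, date() can usually read spelled-out month names directly, so a shorter route is to drop everything up to the first comma. This is a sketch that assumes the strings look like “Monday, January 5, 2009”; the original example data are not reproduced here, so check your own layout first.

```stata
* Made-up strings that start with the day of the week
clear
input str30 date
"Monday, January 5, 2009"
"Friday, March 13, 2009"
end

* Keep everything after the first comma, then drop the leading blank
gen date1 = ltrim(substr(date, strpos(date, ",") + 1, .))
gen datevar = date(date1, "MDY")
format datevar %td
list
```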
Time variable
For details type help mf_date
Run this do-file and see what happens.
clear                              /* Start with an empty dataset */
set obs 100                        /* Set the number of rows to 100 */
gen hr=0+int((23-0+1)*uniform())   /* Random integers from 0 to 23 to represent hours */
gen min=0+int((59-0+1)*uniform())  /* Random integers from 0 to 59 to represent minutes */
gen sec=0+int((59-0+1)*uniform())  /* Random integers from 0 to 59 to represent seconds */
tostring hr min sec, replace format(%02.0f)  /* Convert numbers to two-digit strings */
gen time= hr+":"+ min+":"+ sec     /* Create 'time' as a string hh:mm:ss */
destring hr min sec, replace       /* Convert strings back to numbers */
gen double time1=hms( hr, min, sec)    /* Time variable (milliseconds) using hms() */
gen double elapse= time1- time1[_n-1]  /* Elapsed time in machine code (milliseconds) */
gen elapsehr=hours( elapse)        /* Convert elapse into hours */
gen elapsemin=minutes( elapse)     /* Convert elapse into minutes */
You should have something
like this… well, not exactly since you are generating random numbers.
If you look at the variable
“elapsehr” you will notice in this example going from the
first row to the second took about 4 hrs or 237.1333 minutes.

Let’s say you have “time” in the form “hh:mm:ss”. To separate it into hours, minutes and seconds as numbers (not time variables), you can use the substr() function and then destring the results:
gen hh=substr(time,1,2)
gen mm=substr(time,4,2)
gen ss=substr(time,7,2)
destring hh mm ss, replace
To create a time variable from a string variable you can use the function clock(). Time values are stored in milliseconds, so store them as double:
generate double time2 = clock(time, "hms")
Then format it as
format time2 %tcHH:MM:SS
or
format time2 %tcHh:MM:SSam
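A short check of clock() and minutes() with two made-up times: 12:45:30 minus 08:30:00 is 4 hours 15 minutes 30 seconds, or 255.5 minutes.

```stata
* Two made-up clock times as strings
clear
set obs 2
gen str8 time = "08:30:00" in 1
replace time = "12:45:30" in 2

* %tc values are milliseconds, so store them as double
gen double time2 = clock(time, "hms")
format time2 %tcHH:MM:SS
gen double elapse = time2 - time2[_n-1]   // missing for the first row
gen elapsemin = minutes(elapse)           // 255.5 for the second row
list
```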
MOVING AVERAGE FOR PANEL DATA
Source: http://www.stata.com/support/faqs/stat/moving.html
Use the egen functions in the egenmore package; you may have to install it first by typing
ssc install egenmore
For the lags to work you may need to xtset your data by typing
xtset [name of panel variable] [time variable]
For example:
xtset country year
For a three-year moving average (the current year and the two previous years) type
egen moveave = filter(x1), lags(0/2) normalise
where x1 is the variable of interest.
Type help egenmore for more details.
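If you would rather not install a package, Stata’s built-in lag operators give a three-year average once the data are xtset. This is a swapped-in alternative to egenmore’s filter(), sketched on a made-up two-country panel; with this formula the first two years of each country are missing because their lags do not exist.

```stata
* Made-up two-country panel
clear
input country year x1
1 2000 3
1 2001 6
1 2002 9
1 2003 12
2 2000 1
2 2001 2
2 2002 3
2 2003 4
end
xtset country year

* Three-year moving average: current year plus the two previous years
gen ma3 = (x1 + L.x1 + L2.x1)/3
list
```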
EXTRACTING FROM A NUMERIC/STRING COMBINATION
To remove letters and other nonnumeric characters from var1, use the following commands (note that regexr() replaces only the first match it finds):
gen var2=regexr(var1,"[.\}\)\*a-zA-Z]+","")
destring var2, replace

To extract strings from a combination
of strings and numbers
gen var2=regexr(var1,"[.0-9]+","")

More info see: http://www.ats.ucla.edu/stat/stata/faq/regex.htm
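Because regexr() stops after the first match, a value such as “x7y8” keeps its second letter. In Stata 14 or later, ustrregexra() replaces every match; this is a sketch on made-up values, and it removes only letters so that decimal points survive.

```stata
* Made-up values mixing digits and letters (requires Stata 14 or later)
clear
input str8 var1
"12a"
"3.5kg"
"x7y8"
end

display regexr("x7y8", "[a-zA-Z]+", "")     // 7y8: only the first match is removed

* ustrregexra() removes every run of letters
gen var2 = ustrregexra(var1, "[a-zA-Z]+", "")
destring var2, replace
list
```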
Dealing with zip files within Stata
You can use the unzipfile command to extract compressed data, for example
unzipfile mydata.zip
You can also zip files using zipfile:
zipfile mydata.dta, saving(myzip.zip)
For some extra info please
check:
http://www.stata.com/statalist/archive/2007-08/msg00519.html
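A round-trip sketch, assuming a Stata version that includes zipfile and unzipfile. It uses the auto dataset shipped with Stata, and the file names demo_auto.dta and demo_auto.zip are hypothetical.

```stata
* Round trip with Stata's example dataset (file names are hypothetical)
sysuse auto, clear
save demo_auto, replace
zipfile demo_auto.dta, saving(demo_auto.zip, replace)
erase demo_auto.dta                       // remove the original copy
unzipfile demo_auto.zip, replace          // extract it again
use demo_auto, clear
```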
[1] A parallel goal of this course is to
introduce you to some useful data resources and show you how to deal with some
downloading issues and transformations.
[2] ASCII files come in two flavors: stream mode (where no specific structure is defined) and record mode (where each line is defined in a certain way). Type help infile for further details.
[3] If it does not work, you may have
to type destring income before summarize
[4] [NOTE: tab1 does not accept weights]. To simplify things we will work with
unweighted data.
[5] For a general guide on how to deal
with categorical data see http://www.ats.ucla.edu/stat/Stata/examples/icda/icdast2.htm
[6] Here are other useful date functions:
mdy(month,day,year) for daily data
yw(year,week) for weekly data
ym(year,month) for monthly data
yq(year,quarter) for quarterly data
yh(year,half-year) for half-yearly data
[7] Here are other functions:
weekly(stringvar,"wy")
monthly(stringvar,"my")
quarterly(stringvar,"qy")
halfyearly(stringvar,"hy")
yearly(stringvar,"y")
[8] Here are other functions:
weekly(stringvar,"wy")
monthly(stringvar,"my")
quarterly(stringvar,"qy")
halfyearly(stringvar,"hy")
yearly(stringvar,"y")
[9] Here are other useful date functions:
mdy(month,day,year) for daily data
yw(year,week) for weekly data
ym(year,month) for monthly data
yq(year,quarter) for quarterly data
yh(year,half-year) for half-yearly data
[10] To specify a range of dates (or integers in general) you can use the tin() and twithin() functions. tin() includes the first and last date; twithin() does not.
tab a if tin(01jan1995,01jun1995)
tab b if twithin(01jan2000,01jan2001)