Reshaping files from
the World Development Indicators and datasets with similar format
If you
do not see the menu on the left, please click here to see it
Preparing the data

Before putting this data in Stata you need to do some
editing. First, you need
to add a character to the column headings. Stata does not take numbers as
variable names. In this case we add an “x” to the years. In excel you do this
by using the ‘replace’ function. For the 1900s we replace “19” for “x19”, same
for the 2000s. See the following


So we
have something like this

Replace
the dots “..” (or any string character) with a blank

Make
sure the numbers are numbers. Go to Format
- Cells and select “Number” in the “Number” tab and click OK. This will
format the data as numeric with two decimals

The
table should look like:

Now
select the whole data set, press Ctrl-C, go to the Stata’s data editor and press Ctrl-V to paste the data,
you should have the following:

Another
alternative is to save the file as *.csv format. We will save here as
GDPTest.csv
Open
Stata and type in the command window
insheet
using gdptest.csv
The
Stata screen should look like this

Reshaping wide to long
In wide
to long from
|
id |
x2001 |
x2002 |
x2003 |
|
1 |
2 |
7 |
1 |
|
2 |
3 |
5 |
9 |
|
3 |
1 |
1 |
8 |
To this
you type:
reshape long x, i(id) j(year)
|
id |
year |
x |
|
1 |
2001 |
2 |
|
1 |
2002 |
7 |
|
1 |
2003 |
1 |
|
2 |
2001 |
3 |
|
2 |
2002 |
5 |
|
2 |
2003 |
9 |
|
3 |
2001 |
1 |
|
3 |
2002 |
1 |
|
3 |
2003 |
8 |
First create
a unique identifier for each row of data. Type:
gen
id = _n

Years
are organized by columns and variables as observations which are not suitable
for time series analysis or other type of analysis. We need to put it in a way
where each year represents an observation (row) and each column has a variable.
Type:
reshape
long x, i(id) j(year)

Where:
NOTE: If you have more
than one variable you can list them as follows:
reshape long x y z, i(id) j(year)
If you browse
your data, it should look like this, variable “x” has all the values for both var1
and var2.

To separate var1 and var2 we need to do a little bit of
work.
First we need to create a new variable with the labels of
each variable, type
encode
variable, gen(varlabel)

Create a do-file with the labels for each variable. This
comes in handy when dealing with a lot of variables.
label
save varlabel using varname, replace
You
will notice that varname.do is created.
![]()
Using
notepad or the do-file editor within Stata open varname.do.
Using the do-file editor this is what you will see

Do the
following changes:
1.
Change
“label define” to “label variable”
2.
Change
“varlabel 1” to “x1” and “varlabel 2” to “x2”
3.
Delete
“, modify”
4.
Save
the do-file.

Separating two stacked variables
You
need to create an id variable to identify the groups. The combination of egen and group creates an id
variable for the two time series within each group (‘var1’ and ‘var2’). Type
egen
id2 = group(country year)
move
id2 year
drop
id
drop
variable
To
reshape long to wide you should have a table that looks like the following
where one variable holds two different variables per country. Each country has
two variables from 1960 to 2006.
Now let’s
reshape from long to wide because we want our variables to have one column
each. Type
reshape
wide x, i(id2) j(varlabel)

Where:
NOTE: If “j” is not
available in your dataset, you may be able to generate one using the following
command:
bysort
id: gen jvar=_n
Then
reshape
reshape
wide data, i(id) j(jvar)
Run the do-file varname.do by selecting all and clicking
on the last icon, this will change the labels for x1 and x2
![]()
The variable window should look like

The final dataset should look like

Reshaping long to wide
In long
to wide we go from
|
id |
time |
r |
|
1 |
1 |
2 |
|
1 |
2 |
7 |
|
1 |
3 |
1 |
|
2 |
1 |
3 |
|
2 |
2 |
5 |
|
2 |
3 |
9 |
|
3 |
1 |
1 |
|
3 |
2 |
1 |
|
3 |
3 |
8 |
To this
|
id |
r.time1 |
r.time2 |
r.time3 |
|
1 |
2 |
7 |
1 |
|
2 |
3 |
5 |
9 |
|
3 |
1 |
1 |
8 |
For the example above you tipe:
reshape
wide r, i(id) j(time)
Here is a more complex example.
Let’s use the following dataset (click
here to get it)

Where “r” is return
rate and “i” is interest rate.
IN THIS EXAMPLE WE NEED TO WORK ON THE DATE VARIABLE
FIRST. IF YOU DO NOT HAVE THIS YOU CAN SKIP THIS PART.
Notice that time is separated
in years and months (monthly data). We need to combine them into one date
variable. Since they are numeric variables we need to put them together as
string variables. We convert them to string first and then we concatenate them.
Type
tostring month
year, replace

We then
concatenate them. Type
gen
date=year+"_0"+month if length(month)==1
![]()
This
will create the date variable for dates with months 1 to 9 (with “0” before to
make it double digit), see below

Now we
add the rest. Type
replace
date=year+"_"+month if date==""
![]()

We can
drop ‘year’ and ‘month’.
drop year
month
Let’s
put some order, type
order
id date
RESHAPING
WIDE TO LONG
Now,
let’s reshape. Type
reshape
wide r i, i(id) j(date) str
We add
‘str’ because ‘date’ is a string variable.

Where:
The
variable window should look like

And the
data is now in the following format:

NOTICE
ABOVE ‘RETURNS’ AND ‘INTEREST RATES’ ARE TOGETHER FOR THE SAME TIME PERIOD. IF
YOU WANT TO HAVE ALL ‘RETURNS’ AND ‘INTEREST RATES’ TOGETHER YOU NEED TO DO THE
FOLLOWING:
We need
to transpose the data as follows
xpose, clear varname
The
data will be transposed with an additional variable “_varname” (option varname
after comma)

We then
sort by _varname
sort _varname

And
transpose again as follows:
xpose, clear
Move
“id” to the top
order
id

You can
export the data all together or by subsets. Let’s export in two files, one for
returns and another for interest.
The following
will export the return data into a *.csv file (readable in excel)
outsheet id r* using
testr.csv, comma replace
Now the
interest data
outsheet i* using
testr.csv, comma replace
You can
open these two files in excel.
Noticed
the name of the files and beware of the path. The format above will save the
files in the current active directory.
To specify a path you will need to put it in quotation like
outsheet id r* using
“H:\mydata\testr.csv”, comma replace
Some
useful links:
http://www.ats.ucla.edu/stat/stata/notes/reshape.htm
http://www.ats.ucla.edu/stat/stata/modules/reshapel.htm
http://www.ats.ucla.edu/STAT/stata/modules/reshapew.htm
For
more details on this command type
help
reshape