Reshaping files from the World Development Indicators and datasets with similar format

 

 

If you do not see the menu on the left click here to see it

 

 

Preparing the data (click here to get it)

 

Description: Description: reshape1.bmp

 

 

Before putting this data in Stata you need to do some editing. First, you need to add a character to the column headings. Stata does not take numbers as variable names. In this case we add an “x” to the years. In excel you do this by using the ‘replace’ function. For the 1900s we replace “19” for “x19”, same for the 2000s. See the following

 

 

 

 

So we have something like this

 

Description: Description: reshape2.bmp

 

 

Replace the dots “..” (or any string character) with a blank

 

 

 

 

Make sure the numbers are numbers. Go to Format  - Cells and select “Number” in the “Number” tab and click OK. This will format the data as numeric with two decimals

 

 

 

The table should look like:

 

Description: Description: reshape3.bmp

 

 

 

 

Now select the whole data set, press Ctrl-C, go to the Stata’s data  editor and press Ctrl-V to paste the data, you should have the following:

 

Description: Description: reshape4.bmp

 

 

Another alternative is to save the file as *.csv format. We will save here as GDPTest.csv

 

Open Stata and type in the command window

 

insheet using gdptest.csv

 

The Stata screen should look like this

 

 

Description: Description: reshape5.bmp

 

 

Reshaping wide to long

 

In wide to long from

 

 

id

x2001

x2002

x2003

1

2

7

1

2

3

5

9

3

1

1

8

 

To this you type:

 

reshape long x, i(id) j(year)

 

id

year

x

1

2001

2

1

2002

7

1

2003

1

2

2001

3

2

2002

5

2

2003

9

3

2001

1

3

2002

1

3

2003

8

 

 

First create a unique identifier for each row of data. Type:

 

gen id = _n

 

Description: Description: reshape6.bmp

 

 

 

Years are organized by columns and variables as observations which are not suitable for time series analysis or other type of analysis. We need to put it in a way where each year represents an observation (row) and each column has a variable. Type:

 

reshape long x, i(id) j(year)

 

 

 

Where:

 

 

NOTE: If you have more than one variable you can list them as follows:

 

     reshape long x y z, i(id) j(year)

 

 

 

If you browse your data, it should look like this, variable “x” has all the values for both var1 and var2.

 

Description: Description: reshape7.bmp

 

 

To separate var1 and var2 we need to do a little bit of work.

 

First we need to create a new variable with the labels of each variable, type

 

encode variable, gen(varlabel)

 

 

 

 

Create a do-file with the labels for each variable. This comes in handy when dealing with a lot of variables.

 

label save varlabel using varname, replace

 

You will notice that varname.do is created.

 

 

Using notepad or the do-file editor within Stata open varname.do. Using the do-file editor this is what you will see

 

Description: Description: reshape8.bmp

 

Do the following changes:

 

1.    Change “label define” to “label variable”

2.    Change “varlabel 1” to “x1” and “varlabel 2” to “x2”

3.    Delete “, modify”

4.    Save the do-file.

 

 

Description: Description: reshape9.bmp

 

 

Separating two stacked variables

 

 

You need to create an id variable to identify the groups. The combination of egen and group creates an id variable for the two time series within each group (‘var1’ and ‘var2’). Type

 

 

egen id2 = group(country year)

move id2 year

drop id

drop variable

 

 

To reshape long to wide you should have a table that looks like the following where one variable holds two different variables per country. Each country has two variables from 1960 to 2006.

 

Now let’s reshape from long to wide because we want our variables to have one column each. Type

 

reshape wide x, i(id2) j(varlabel)

 

 

 

Where:

 

 

NOTE: If “j” is not available in your dataset, you may be able to generate one using the following command:

 

bysort id: gen jvar=_n

 

Then reshape

 

reshape wide data, i(id) j(jvar)

 

 

Run the do-file varname.do by selecting all and clicking on the last icon, this will change the labels for x1 and x2

 

Description: Description: reshape9c.bmp 

 

 

The variable window should look like

 

Description: Description: reshape9d.bmp

 

 

The final dataset should look like

 

Description: Description: reshape9e.bmp

 

 

 

 

Reshaping long to wide

 

In long to wide we go from

 

 

id

time

r

1

1

2

1

2

7

1

3

1

2

1

3

2

2

5

2

3

9

3

1

1

3

2

1

3

3

8

 

 

To this

 

id

r.time1

r.time2

r.time3

1

2

7

1

2

3

5

9

3

1

1

8

 

 

For the example above you tipe:

 

reshape wide r, i(id) j(time)

 

Here is a more complex example.

 

Let’s use the following dataset (click here to get it)

 

 

 

Where “r” is return rate and “i” is interest rate.

 

IN THIS EXAMPLE WE NEED TO WORK ON THE DATE VARIABLE FIRST. IF YOU DO NOT HAVE THIS YOU CAN SKIP THIS PART.

 

Notice that time is separated in years and months (monthly data). We need to combine them into one date variable. Since they are numeric variables we need to put them together as string variables. We convert them to string first and then we concatenate them. Type

 

tostring month year, replace

 

 

We then concatenate them. Type

 

 

gen date=year+"_0"+month if length(month)==1

 

 

 

This will create the date variable for dates with months 1 to 9 (with “0” before to make it double digit), see below

 

 

Now we add the rest. Type

 

 

replace date=year+"_"+month if date==""

 

 

 

 

 

We can drop ‘year’ and ‘month’.

 

drop year month

 

Let’s put some order, type

 

order id date

 

 

RESHAPING WIDE TO LONG

 

 

Now, let’s reshape. Type

 

reshape wide r i, i(id) j(date) str

 

We add ‘str’ because ‘date’ is a string variable.

 

 

Where:

 

 

 

The variable window should look like

 

 

 

And the data is now in the following format:

 

 

NOTICE ABOVE ‘RETURNS’ AND ‘INTEREST RATES’ ARE TOGETHER FOR THE SAME TIME PERIOD. IF YOU WANT TO HAVE ALL ‘RETURNS’ AND ‘INTEREST RATES’ TOGETHER YOU NEED TO DO THE FOLLOWING:

 

 

We need to transpose the data as follows

 

xpose, clear varname

 

The data will be transposed with an additional variable “_varname” (option varname after comma)

 

 

We then sort by _varname

 

sort _varname

 

 

 

And transpose again as follows:

 

xpose, clear

 

Move “id” to the top

 

order id

 

 

You can export the data all together or by subsets. Let’s export in two files, one for returns and another for interest.

 

The following will export the return data into a *.csv file (readable in excel)

 

outsheet id r* using testr.csv, comma replace

 

Now the interest data

 

outsheet i* using testr.csv, comma replace

 

You can open these two files in excel.

 

Noticed the name of the files and beware of the path. The format above will save the files in the current active directory. To specify a path you will need to put it in quotation like

 

outsheet id r* using “H:\mydata\testr.csv”, comma replace

 

 

 

Some useful links:

 

http://www.ats.ucla.edu/stat/stata/notes/reshape.htm

http://www.ats.ucla.edu/stat/stata/modules/reshapel.htm

http://www.ats.ucla.edu/STAT/stata/modules/reshapew.htm

 

For more details on this command type

 

help reshape