Part 2 - Get and explore the data

R

Date: 04/15/2020
Data source
Developer Information
GitHub

This is second part of a two-part series on accessing the International Debt Statistics (IDS) database through the World Bank Data API. In Part 1, we queried the World Bank Data API to retrieve indicator names and location codes. In this guide, we will use that information to explore the regional trends of long-term external debt stocks from the IDS database.

The following code in this guide will show step-by-step how to:

  1. Setup up your environment with the needed R packages
  2. Input your data specifications (as selected in Part 1)
  3. Use the World Bank Data API call to return the specified data
  4. Explore the data through basic descriptive analysis and create a pretty chart.

1. Setup

To start, make sure you have the following packages installed on your machine. To install an R package type install.packages(“WDI”) with the correct package name into R. You can also visit each of the linked packages below for reference.

Then, open up your preferred mode of writing R, like R Studio. Now follow the rest of the steps below to retreive and analyze the IDS data.

# Load packages
library(WDI)
library(plotly)

2. Data Specifications

You will specify the the data that you want to explore using the following parameters:

Indicator(s)

In this guide, we will be looking at “long-term external debt stock” from the IDS database. To find the indicator for the data in which you’re interested, you can either explore the World Bank data catalog or use an API query as outlined in Part 1 of this series. You can also view the IDS indicators and codes in their hierarchical order on our data tables. The Analytical view shows the select indicators from the IDS publication and the Standard view shows all available indicators.

# Selecting the indicator
dataSeries = "DT.DOD.DLXF.CD"

Location(s)

To select a location by country, region, or income level category you will need to know its 2 or 3 letter code. To figure out what this code is, you can either use an API query as outlined in Part 1 of this series or use the convenient location-codes csv* in this repo.

We will select regional aggregates (these exclude high-income countries):

  • ECA: Europe & Central Asia
  • SSA: Sub-Saharan Africa
  • SAS: South Asia
  • LAC: Latin America & the Caribbean
  • MNA: Middle East & North Africa
  • EAP: East Asia & Pacific
*The location-codes csv was created using the API query: http://api.worldbank.org/v2/country?per_page=500&format=json&source=6
# Select the countries or regions
location = c("ECA","SSA","SAS","LAC","MNA","EAP")

Time

Here you will select the time frame for the data you are retrieving, by picking the first and last year of the time frame.

# Selecting the time frame
firstYear = 2009
lastYear = 2018

3. API Call

In this step, we will retrieve the data using the World Bank Data API call using the R package “WDI.” The following code uses the parameters selected above to request our data.

data = WDI(indicator=dataSeries, country=location, start=firstYear, end=lastYear)

If you want a quick preview of your freshly retrieved DataFrame, you can print the first 5 lines

head(data)
##   iso2c                                     country DT.DOD.DLXF.CD year
## 1    4E East Asia & Pacific (excluding high income)   1.404732e+12 2018
## 2    4E East Asia & Pacific (excluding high income)   1.287079e+12 2017
## 3    4E East Asia & Pacific (excluding high income)   1.170963e+12 2016
## 4    4E East Asia & Pacific (excluding high income)   1.037230e+12 2015
## 5    4E East Asia & Pacific (excluding high income)   1.039942e+12 2014
## 6    4E East Asia & Pacific (excluding high income)   8.537726e+11 2013

4. Explore the data!

Congratulations! At this point you should have the long-term external debt stock for regions (excluding high-income economies) from 2009 - 2018 all in a DataFrame called “data.”

Now we can do the following:

Data Cleaning

As you saw in the preview of the data in section 3, the data needs some basic cleaning.

The data for the long-term external debt stock is currently in units. To improve a table’s or chart’s readability, convert the units to billions and change the visible decimal places to zero.

# change units to billions
data$DT.DOD.DLXF.CD = data$DT.DOD.DLXF.CD/1000000000
# hide decimal places
data$DT.DOD.DLXF.CD = round(data$DT.DOD.DLXF.CD, 0)

These next sections of code will clean up the naming of headers and regions. First, it will rename the column headers. Second, it will remove the redundant “(excluding high income)” from the region names. We can instead include that information in the title of the chart. Finally, the code will remove any unneeded columns.

# Rename column headers
colnames(data)[colnames(data)=="country"] <- "Region"
colnames(data)[colnames(data)=="DT.DOD.DLXF.CD"] <- "ExternalDebtStock"
colnames(data)[colnames(data)=="year"] <- "Year"
# Remove "(excluding high income)" from each of the region names
data$Region <- gsub("excluding high income", "", data$Region)
# Remove the parenthesis
data$Region <- gsub("\\()", "", data$Region)
# Remove the iso2c column
data <- subset(data, select = -c(iso2c))

Now our data should be ready to present in a table or visualize. Let’s take a look at the first five lines again so we can compare the cleaned up data to the output in section 3.

head(data)
##                 Region ExternalDebtStock Year
## 1 East Asia & Pacific               1405 2018
## 2 East Asia & Pacific               1287 2017
## 3 East Asia & Pacific               1171 2016
## 4 East Asia & Pacific               1037 2015
## 5 East Asia & Pacific               1040 2014
## 6 East Asia & Pacific                854 2013

Data Visualization

Now use the package “plotly” to create a basic line graph of the regional trends in long-term external debt stock.

p<-plot_ly(
  data, x = ~Year, y=~ExternalDebtStock, 
  type="scatter", mode = "lines", 
  hoverinfo = "text",
  text=~paste("External Debt Stock: ",ExternalDebtStock, "<br>Region: ",Region, "<br>Year: ",Year),
  color=~Region)%>%
  layout(
    title= "Regional Long-term External Debt Stock (excluding high-income countries)(USD billion)"
  )
p