Skip to contents

Introduction

The Global Monitoring Database (GMD) is the World Bank’s central repository of harmonized, multitopic household income and expenditure surveys. These surveys, collected by national statistical offices, are compiled and processed by the Data for Goals (D4G) team, with support from regional statistics teams in the Poverty and Equity Global Practice. The Global Poverty & Inequality Data Team (GPID) in the Development Economics Data Group (DECDG) also contributes historical and external data, including from the Luxembourg Income Study (LIS).

The GMD harmonizes key variables to enable meaningful comparisons of poverty and sociodemographic trends across countries and over time. Its microdata underpin major World Bank initiatives, including the Poverty and Inequality Platform (PIP), the Multidimensional Poverty Measure (WB MPM), the Global Database of Shared Prosperity (GDSP), and the Poverty, Prosperity and Planet Reports.

# devtools::load_all(".")
library(dlw)

GMD server

The Datalibweb system is organized into several servers, each hosting one or more collections of data. For example, the GMD collection is hosted on the GMD server, which can be confusing since both the collection and the server share the same name.

To explore the catalog of data available on a server, use dlw_server_catalog(). If you do not specify a server, the default is dlw_server_catalog(server = "GMD").

ctl <- dlw_server_catalog(server = "GMD")
#> ℹ Returning ServerCatalog_GMD from .dlwevn
names(ctl)
#>  [1] "ServerAlias"    "Country"        "Year"           "Survey"        
#>  [5] "FilePath"       "Ext"            "FileSize"       "Timestamp"     
#>  [9] "Checksum"       "FileName"       "Country_code"   "Survey_year"   
#> [13] "Survey_acronym" "Vermast"        "Veralt"         "Collection"    
#> [17] "Module"         "ext"

The catalog contains all the metadata required to download any dataset from the GMD collection, provided you have the necessary access rights. For example, let’s examine the data available for Paraguay in 2020 within the GPWG module:

# Variables to display
vars_to_see <- c("FileName", "Country_code", "Survey_year", "Survey_acronym", "Vermast", "Veralt")

# The dlw package fully imports data.table, so you can use its syntax directly
ctl[Country_code == "PRY" & Year == 2020 & Module == "GPWG", ..vars_to_see]
#>                                 FileName Country_code Survey_year
#>                                   <char>       <char>      <char>
#> 1: PRY_2020_EPH_V01_M_V03_A_GMD_GPWG.dta          PRY        2020
#> 2: PRY_2020_EPH_V01_M_V02_A_GMD_GPWG.dta          PRY        2020
#> 3: PRY_2020_EPH_V01_M_V01_A_GMD_GPWG.dta          PRY        2020
#>    Survey_acronym Vermast Veralt
#>            <char>  <char> <char>
#> 1:            EPH     V01    V03
#> 2:            EPH     V01    V02
#> 3:            EPH     V01    V01

Notice that there are three different versions available for this survey. To download a specific version, you must explicitly provide all required arguments to the dlw_get_data() function, which is the main function for downloading data in the dlw package. For example:

dlw_get_data(
  country_code = "PRY",
  year = 2020L,
  server = "GMD",
  survey = "EPH",
  module = "GPWG",
  filename = "PRY_2020_EPH_V01_M_V03_A_GMD_GPWG.dta",
  collection = "GMD"
)

Downloading data

While specifying all these arguments is necessary, since the Datalibweb API requires them, it can be tedious. To simplify this process, the dlw_get_gmd() function acts as a convenient wrapper around dlw_get_data(). With dlw_get_gmd(), you only need to provide the country code, year, and module, and it will automatically download the most recent version of the data for you.

pry20 <- dlw_get_gmd(country_code = "PRY", year = 2020, module = "GPWG")
#> 
#> ── dlw_get_data Calls ─────────────────────────────────────────────────────────
#> Call 1:
#> dlw_get_data(
#>   country_code = "PRY",
#>   year = 2020L,
#>   server = "GMD",
#>   survey = "EPH",
#>   module = "GPWG",
#>   filename = "PRY_2020_EPH_V01_M_V03_A_GMD_GPWG.dta",
#>   collection = "GMD"
#> )

Notice that dlw_get_gmd() prints the actual call to dlw_get_data() in the console for your reference and verification.

All versions

By default, dlw_get_gmd() returns the most recent version of the data for a particular year. However, if you set the argument latest_version = FALSE, the function will instead return a list of calls—one for each survey version available for that year. This allows you to review and select the specific version you wish to download.

calls_pry20 <- dlw_get_gmd(country_code = "PRY",
                     year = 2020,
                     module = "GPWG",
                     latest_version = FALSE)
#> → your arguments do not uniquely identify a dataset.
#> You need execute one of the following:
#> 
#> ── dlw_get_data Calls ─────────────────────────────────────────────────────────
#> Call 1:
#> dlw_get_data(
#>   country_code = "PRY",
#>   year = 2020L,
#>   server = "GMD",
#>   survey = "EPH",
#>   module = "GPWG",
#>   filename = "PRY_2020_EPH_V01_M_V03_A_GMD_GPWG.dta",
#>   collection = "GMD"
#> )
#> Call 2:
#> dlw_get_data(
#>   country_code = "PRY",
#>   year = 2020L,
#>   server = "GMD",
#>   survey = "EPH",
#>   module = "GPWG",
#>   filename = "PRY_2020_EPH_V01_M_V02_A_GMD_GPWG.dta",
#>   collection = "GMD"
#> )
#> Call 3:
#> dlw_get_data(
#>   country_code = "PRY",
#>   year = 2020L,
#>   server = "GMD",
#>   survey = "EPH",
#>   module = "GPWG",
#>   filename = "PRY_2020_EPH_V01_M_V01_A_GMD_GPWG.dta",
#>   collection = "GMD"
#> )

Most recent year

If you do not specify the year, dlw_get_gmd() will return calls for all surveys available for the given country and module. If you want to retrieve only the most recent year (which may differ by country and module), set latest_year = TRUE. This will ensure you get the latest available data for your selection.

pry23 <- dlw_get_gmd(country_code = "PRY",
                     module = "GPWG",
                     latest_year = TRUE)
#> 
#> ── dlw_get_data Calls ─────────────────────────────────────────────────────────
#> Call 1:
#> dlw_get_data(
#>   country_code = "PRY",
#>   year = 2023L,
#>   server = "GMD",
#>   survey = "EPHC",
#>   module = "GPWG",
#>   filename = "PRY_2023_EPHC_V01_M_V01_A_GMD_GPWG.dta",
#>   collection = "GMD"
#> )

Old version

It is possible that, depending on the country and module, you may need to access an older version of the data. In such cases, you can specify the veralt argument to indicate which version you want to download. For example, if you want to download the second version of the Paraguay 2020 survey, you can do so as follows:

pry20_v2 <- dlw_get_gmd(country_code = "PRY",
                     year = 2020,
                     module = "GPWG",
                     veralt = "v02")
#> 
#> ── dlw_get_data Calls ─────────────────────────────────────────────────────────
#> Call 1:
#> dlw_get_data(
#>   country_code = "PRY",
#>   year = 2020L,
#>   server = "GMD",
#>   survey = "EPH",
#>   module = "GPWG",
#>   filename = "PRY_2020_EPH_V01_M_V02_A_GMD_GPWG.dta",
#>   collection = "GMD"
#> )