dlw is an R client for the World Bank's internal datalibweb API. Datalibweb is a data system designed to give users seamless access to microdata and documentation in the World Bank. Users can access the most up-to-date and historical versions of harmonized data collections, as well as raw/non-harmonized data, for subsequent analysis.
Installation
Since dlw is hosted in the World Bank’s GitHub organization, you need to authorize your Personal Access Token (PAT) for SAML Single Sign-On (SSO) before you can install it. Follow these steps to authorize your PAT:
Go to your GitHub account: Visit https://github.com and log in.
Check your PAT: Go to Settings → Developer settings → Personal access tokens. Find the token you are using (or create a new one if needed).
Authorize your PAT for SAML SSO: Go to https://github.com/settings/tokens. Next to your token, you’ll see a button that says “Configure SSO.” Click it to authorize your token for the worldbank organization.
Try installing the package.
You can install the development version of dlw from GitHub with:
# install.packages("pak")
pak::pak("worldbank/dlw")
# or
remotes::install_github("worldbank/dlw")
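Before installing, it can help to confirm that R can actually see a GitHub token. The check below is a minimal sketch that assumes the token is exposed through the GITHUB_PAT environment variable, which remotes reads. The helper name has_github_pat() is hypothetical, and a token stored only in the git credential store will not be detected by it.

```r
# Hypothetical helper: TRUE if a non-empty token is visible to this R session
# through the given environment variable (GITHUB_PAT by default).
has_github_pat <- function(var = "GITHUB_PAT") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_github_pat()) {
  message("No GITHUB_PAT found; set one (and authorize it for SSO) before installing.")
}
```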
Usage
Token
To begin working with dlw, you must provide your datalibweb API token using the dlw_set_token() function.
You can obtain or renew your token by visiting the datalibweb page and following the instructions there. Once you have your token, set it in your R session as follows:
dlw_set_token("your_token_here")
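To avoid hardcoding the token in scripts, one option is to read it from an environment variable. This is a sketch under assumptions: the variable name DLW_TOKEN and the helper read_dlw_token() are hypothetical and not part of dlw; only dlw_set_token() comes from the package.

```r
# Hypothetical helper: fetch the token from an environment variable,
# failing early with a clear message if it is missing.
read_dlw_token <- function(var = "DLW_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  token
}

# Only call into dlw when the package is installed and the variable is set.
if (requireNamespace("dlw", quietly = TRUE) && nzchar(Sys.getenv("DLW_TOKEN"))) {
  dlw::dlw_set_token(read_dlw_token())
}
```

The variable can be defined in ~/.Renviron so that it is available in every new R session.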
Server catalog
After setting your token, you can download the catalog for the corresponding server using dlw_server_catalog(). By default, this downloads the “GMD” catalog. The downloaded catalog is saved for the current session in a hidden environment within the dlw package, allowing for easy and efficient access in subsequent operations.
ctl <- dlw_server_catalog()
#> Pruning cache
#> ℹ saving ServerCatalog_GMD in .dlwevn
The dlw package returns results as data.table objects, enabling fast data manipulation and filtering with concise syntax. For example, you can list all available files for Colombia in 2010 in module “ALL” of the GMD collection with a single expression:
ctl[Country_code == "COL" & Module == "ALL" & Survey_year == 2010,
.(FileName, Vermast, Veralt )]
#> FileName Vermast Veralt
#> <char> <char> <char>
#> 1: COL_2010_GEIH_V02_M_V09_A_GMD_ALL.dta V02 V09
#> 2: COL_2010_GEIH_V02_M_V08_A_GMD_ALL.dta V02 V08
#> 3: COL_2010_GEIH_v02_M_v07_A_GMD_ALL.dta v02 v07
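Note that the version columns in the output above mix upper and lower case (V02 and v02), so an exact string comparison can silently drop rows. The sketch below uses a small mock catalog built from the three rows printed above, not the real catalog, and normalizes case with toupper() before filtering.

```r
library(data.table)

# Mock catalog containing only the rows printed above (not real catalog data).
ctl_mock <- data.table(
  FileName = c(
    "COL_2010_GEIH_V02_M_V09_A_GMD_ALL.dta",
    "COL_2010_GEIH_V02_M_V08_A_GMD_ALL.dta",
    "COL_2010_GEIH_v02_M_v07_A_GMD_ALL.dta"
  ),
  Vermast = c("V02", "V02", "v02"),
  Veralt  = c("V09", "V08", "v07")
)

# An exact match misses the lower-case entry.
exact <- ctl_mock[Vermast == "V02"]

# A case-insensitive match keeps all three files.
any_case <- ctl_mock[toupper(Vermast) == "V02"]

nrow(exact)
nrow(any_case)
```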
Downloading files
The workhorse function to download data is dlw_get_data(). However, it requires several pieces of information that you may not have at hand, such as the survey acronym and the exact file name:
dlw_get_data(
country_code = "PRY",
year = 2011L,
server = "GMD",
survey = "EPH",
module = "GPWG",
filename = "PRY_2011_EPH_V01_M_V03_A_GMD_GPWG.dta",
collection = "GMD"
)
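Most of these arguments also appear to be encoded in the file name itself. The sketch below splits a file name on underscores to recover them. The layout (country, year, survey, master version, "M", alternative version, "A", collection, module) is inferred from the file names shown in this document and is an assumption. parse_dlw_filename() is a hypothetical helper, not a dlw function, and it does not handle survey acronyms that themselves contain underscores.

```r
# Hypothetical helper: split a datalibweb-style file name into its components,
# assuming exactly nine underscore-separated fields.
parse_dlw_filename <- function(filename) {
  stem  <- tools::file_path_sans_ext(filename)
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  if (length(parts) != 9L) {
    stop("Unexpected file name layout: ", filename, call. = FALSE)
  }
  list(
    country_code = parts[1],
    year         = as.integer(parts[2]),
    survey       = parts[3],
    vermast      = parts[4],  # field 5 is the literal "M"
    veralt       = parts[6],  # field 7 is the literal "A"
    collection   = parts[8],
    module       = parts[9]
  )
}

info <- parse_dlw_filename("PRY_2011_EPH_V01_M_V03_A_GMD_GPWG.dta")
str(info)
```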
To simplify this process, we have developed a wrapper function that works only for the GMD server: dlw_get_gmd(). This function is much easier to use:
pry <- dlw_get_gmd(country_code = "PRY", year = 2011, module = "GPWG", vermast = "v01", veralt = "v03")
#>
#> ── dlw_get_data Calls ──────────────────────────────────────────────────────────
#> Call 1:
#> dlw_get_data(
#> country_code = "PRY",
#> year = 2011L,
#> server = "GMD",
#> survey = "EPH",
#> module = "GPWG",
#> filename = "PRY_2011_EPH_V01_M_V03_A_GMD_GPWG.dta",
#> collection = "GMD"
#> )
#> ℹ saving last_req in .dlwevn
#> ℹ saving last_raw_data in .dlwevn
#> Creating new version '20250617T202722Z-662d2'
#> Writing to pin 'PRY_2011_EPH_V01_M_V03_A_GMD_GPWG.parquet'
pry[, weighted.mean(welfare, weight, na.rm = TRUE)]
#> [1] 1099527
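One detail of weighted.mean() is worth keeping in mind: na.rm = TRUE drops observations where the value is missing, but a missing weight still propagates to the result. The sketch below uses made-up numbers, with the names welfare and weight borrowed from the call above, to show the difference.

```r
# Made-up values for illustration only; not taken from any survey.
welfare <- c(10, 20, NA, 40)
weight  <- c(1, NA, 3, 1)

# The NA in `weight` survives na.rm = TRUE, so the result is NA.
naive <- weighted.mean(welfare, weight, na.rm = TRUE)

# Keep only observations where both the value and the weight are present.
ok    <- !is.na(welfare) & !is.na(weight)
clean <- weighted.mean(welfare[ok], weight[ok])
```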
If you are interested in downloading the most recent version of a file, you can simply omit the version arguments:
pry <- dlw_get_gmd(country_code = "PRY", year = 2011, module = "GPWG")
#>
#> ── dlw_get_data Calls ──────────────────────────────────────────────────────────
#> Call 1:
#> dlw_get_data(
#> country_code = "PRY",
#> year = 2011L,
#> server = "GMD",
#> survey = "EPH",
#> module = "GPWG",
#> filename = "PRY_2011_EPH_V02_M_V01_A_GMD_GPWG.dta",
#> collection = "GMD"
#> )
#> ℹ saving last_req in .dlwevn
#> ℹ saving last_raw_data in .dlwevn
#> Creating new version '20250617T202725Z-6bd54'
#> Writing to pin 'PRY_2011_EPH_V02_M_V01_A_GMD_GPWG.parquet'
pry[, weighted.mean(welfare, weight, na.rm = TRUE)]
#> [1] 12675293
In this case, the most recent version available will be used (for example, master version v02 and alternative version v01).
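How dlw_get_gmd() resolves the latest version internally is not shown here. The following is only a sketch of one way to pick it from a catalog, assuming that higher version numbers are more recent and that the master version takes precedence over the alternative version. The mock catalog holds the two PRY file names that appear above; the helper ver_num() and the columns vm and va are introduced for illustration.

```r
library(data.table)

# Mock catalog with the two PRY files shown above (not real catalog data).
ctl_mock <- data.table(
  FileName = c(
    "PRY_2011_EPH_V01_M_V03_A_GMD_GPWG.dta",
    "PRY_2011_EPH_V02_M_V01_A_GMD_GPWG.dta"
  ),
  Vermast = c("V01", "V02"),
  Veralt  = c("V03", "V01")
)

# Strip the leading "V" or "v" so versions compare as integers.
ver_num <- function(x) as.integer(sub("^[Vv]", "", x))

# Add numeric helper columns, then sort by master version and alternative
# version, both descending, and keep the first row.
ctl_mock[, `:=`(vm = ver_num(Vermast), va = ver_num(Veralt))]
latest <- ctl_mock[order(-vm, -va)][1L]

latest$FileName
```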