Chapter 2 Datalibweb: Getting Access to SARMD


SARMD is exclusively available through the datalibweb system to guarantee replicability, security, and efficiency. Datalibweb is a data system specifically designed to enable users to access the most up-to-date versions of non-harmonized (original raw data) and harmonized datasets of different collections across Global Practices. SARMD is one collection among other collections available through datalibweb. This section provides reference to the documentation provided by the Global Team for Statistical Development (GTSD) in the Poverty and Equity Global Practice.

Using the datalibweb system has several advantages:

  • Instant access to raw and harmonized surveys ready to use for country or regional level analysis
  • Automatically generated time-deflated variables and poverty lines
  • User friendly interactive interface in Stata
  • A suitable Stata command for replicability, research, and programing purposes
  • Ability to combine different country surveys and survey years
  • Merging raw data and modules of harmonized collections on the fly (option in progress)

2.1 Request access to the system

Unlike GMD, access to regional harmonized collections like SARMD or to the original raw datasets is not given to users automatically. Most datasets in SARMD are not public and the user must be granted access for internal use. Each user must submit a request for the different collections and surveys specifying the purpose of the data request. Users must contact the regional/collection focal point and ask for permission to use the data. Without any subscription, users will only be able to check the availability of the data, and query relevant metadata of all surveys.

To get access to SARMD, you may send an email to Francisco Javier Parada Gomez Urquiza and the datalibweb team specifying the following:

  • Small description of the project in which you are planning to use the data
  • Particular survey, country, year or collection that you are interested in accessing
  • Who is the TTL in charge of the project
  • Names and UPIs of the people who will be working directly with the data

2.2 Install the datalibweb command in Stata

The datalibweb system can be accessed through the datalibweb Stata command. This command is already installed in all World Bank computers with an official Stata licence. In case it is not installed on your computer, you could do the following:

Option 1. Direct Installation

  • Close all Stata sessions
  • Enter the following lines in Stata
net from "http://eca/povdata/datalibweb/_ado"`
net install datalibweb

Option 2. Manual Installation

  • Get the file from this link
  • Copy with replacement all the files into c:/ado without changing the folder structure.

2.3 Filename structure

All the data filenames in the SARMD repository follow the convention established by the International Household Survey Network (IHSN) for archiving and managing data. This structure can be generalized as follows:

CCC_YYYY_SSSS_ vNN_M_vNN_A_TYPE_MODULE

where,

  • CCC refers to 3 letter ISO country code
  • YYYY refers to survey year when data collection started
  • SSSS refers to survey acronym (e.g., LSMS, CWIQ, HBS, etc.)
  • vNN is the version vintage of the raw/master data if followed by an M or of the alternative/adapted data if followed by an A. The first version will always be v01, and when a newer version is available it should be named sequentially (e.g. v02, v03, etc.). Note that when a new version is available, the previous ones must still be kept.
  • TYPE refers to the collection name. In the case of the South Asia region the user would refer to SARMD or, at the global level, to the GMD collection type.
  • MODULE refers to the module of the collection. In the case of SARMD, it could be IND, which refers to the questions at the individual level, or FULL, which refers to the whole survey.

For example, if the user wanted to load the most recent version of the harmonized Pakistani Household Survey of 2015, she would refer to the file PAK_2015_PSLM_v01_M_v02_A_SARMD_IND.dta. In this case, PSLM refers to the acronym of the Pakistan Social and Living Standards Measurement Survey. v01_M means that the version of the raw data has not changed since it is still the first. v02_A means that the most recent version of the harmonization is 02.

2.4 Using datalibweb

Using datalibweb is straightforward. You may read the help file, which is detailed and clear, but you may also dive right in to get your data. There are two possible ways to access a dataset:

Clicking on the Stata results window

You may access the data by typing datalibweb in Stata and hitting Enter (see Figure 2.1). Then, click SAR on Stata’s results window and then click on the ISO code of your country of preference. Click PAK in this case. Then, select either the modules of the collection or its vintages next to SARMD. Finally, click 01-02, which refers to the version of the harmonization, to load the data.

Accessing datalibweb in Stata

Figure 2.1: Accessing datalibweb in Stata

Providing the complete datalibweb syntax

Alternatively, a more efficient method when coding is to type the following to load your data:

datalibweb, country(PAK) year(2015) type(SARMD) vermast(01) veralt(02) survey(PSLM) clear

Because there is only one survey for Pakistan in 2015 and you are interested in the most recent version, you could access that same dataset with a shorter version of the command:

datalibweb, country(PAK) year(2015) type(SARMD) clear

Only when you request a previous version of the dataset or when there is more than one survey available for a country in a given year, you would need to specify the name of the survey and the version. Otherwise, you may just use the shorter command.

2.5 Citing datalibweb:

datalibweb is a Stata program developed and supported by the Global Poverty Team for Statistical Development of the World Bank, with the contribution of many different teams:

  • Stata front-end application: Minh Cong Nguyen, Raúl Andrés Castañeda Aguilar, José Montes, and João Pedro Azevedo, with support from Paul Andrés Corral Rodas.

  • Plugin, IT coordinator/support: Paul Ricci, Louis Wahsieh Elliott, and Antonio Ramos-Izquierdo.

  • SharePoint web application: Intekhab Alam Sheikh, Monisha Menon, and Nishant Nitin Trivedi.

  • Overall datalibweb project supervision: João Pedro Azevedo and Minh Cong Nguyen.

datalibweb was based on the initial effort of datalib created by Raúl Andrés Castañeda Aguilar and João Pedro Azevedo in LAC TSD; and Datalib2 created by César Cancho and João Pedro Azevedo in ECA TSD.

Please cite the use of datalibweb and SARMD in one or several of the following ways depending on the nature of your work.

  1. “Source: SARMD (SARTSD/World Bank)”
  2. “Source: SARMD Harmonization (SARTSD)”
  3. SARTSD 2019 DATALIBWEB: SARMD Ex-post Harmonization. Countries: [country names/years (separated by semi-colon)]. As of [date of access (dd/mm/yyyy)]