South Asia Regional Micro Database (SARMD) User Guidelines
2019-07-22
Welcome
This is a preliminary draft. Please do not cite or distribute without permission of the SARTSD.
This book contains the user guidelines of the South Asia Regional Micro Database (SARMD), a collection of harmonized household surveys from the South Asia Region (SAR). SARMD serves as the underlying data for estimating comparable socioeconomic indicators at the country and regional levels in SAR. In addition, SARMD is used in the Global Monitoring Database (GMD), a globally comparable micro database hosted by the World Bank. The GMD spans countries, regions, and years and is used for global poverty monitoring and welfare measurement. The household surveys contained in SARMD provide a rich environment to study expenditure patterns, demographics, educational attainment, employment, the acquisition of durable assets, and housing. The harmonization of these surveys makes it possible to compare social and economic statistics among the eight countries in South Asia.
These guidelines are intended to teach you everything you need to work efficiently with SARMD. You will learn how to access the microdata through the datalibweb system and how to perform basic (and not so basic) socioeconomic calculations. We will inspect how comparable household surveys are across the region and evaluate the quality of the harmonization process. In our analytical notes, we also provide several examples of how to conduct analysis at the regional or country level.
Team
This book has been created by the South Asia Region Team for Statistical Development (SARTSD) under the direction of Benu Bidani. The team is composed of Raúl Andrés Castañeda Aguilar, Jayne Jungsun Yoo, and Francisco Javier Parada Gómez Urquiza. We are grateful for the comments and suggestions made by…
Replicability and license
This book is fully replicable. All the text files, code, underlying data, and Tableau dashboards can be found in its GitHub repository, worldbank/SARMD_guidelines. In addition, all the do-files used to harmonize the SARMD collection are available in the worldbank/SARMD GitHub repository. Chapter 8, Variable derivation of SARMD, provides a deeper exploration of these harmonization do-files.
You are allowed to freely download and use any of the files in these repositories. Please keep in mind, however, that all the files contained in this book and in the SARMD collection (in particular the code) are available under the GNU General Public License v3.0. This means that you are free to use any file in your own projects as long as you cite the source. If you publish your project, you must also make your source code available under the same license. This way, we guarantee that any source code derived from this project is made freely available and cited properly.
How to read this book
This book is not intended to be read from cover to cover. Each chapter is independent of the others, although we have added some cross-references throughout the book to clarify and expand on certain topics. The book is divided into four main topics:
- Basic information. Explains how to use SARMD through datalibweb and provides a basic introduction to poverty measurement in SAR.
- Metadata analysis. This section contains a clear presentation of the SARMD inventory and the different components of the consumption aggregate (i.e., food, non-food, durables, and housing) for all of the surveys in the region.
- Quality check. This section summarizes the quality of the raw data and the harmonization. Even though several dashboards have been built to showcase the information, we are still working on the contents of these chapters.
- Analytical notes. Here we present about five (hopefully eight) analytical notes of socioeconomic findings at the regional level using SARMD. These notes could be easily converted into blogs.
In addition, many of the graphs in this book are interactive Tableau dashboards. Figure 1.1, for instance, is one of these dashboards. It lets you visualize the share of households with access to a certain asset at a subnational level. You may change "access to bicycle" to "access to a computer" or "cellphone" by using the filters provided. You may also switch to a different category of variables, such as demographic or education variables. The filters allow the user to explore SARMD and learn what kind of data is available to study household welfare.
About the technical composition of this book
This book is written in R Markdown syntax and compiled with the bookdown package. The source files are available in the World Bank GitHub repository worldbank/SARMD_guidelines. To become a contributor to this project, please send an email to … requesting access. Once you are granted access, you may clone or fork this repository and start contributing.
There are several advantages to composing books (especially technical books) using Markdown:
- You can create different output formats, such as PDF, HTML, Word, EPUB, and even Kindle files.
- Everything is written in plain text, so you may enjoy the benefits of tracking changes through version control systems such as Git.
- You can execute R or Stata code directly from the text, include code chunks, and present the corresponding results inline wherever you want within the document.
- You can add features like multi-page HTML output, numbering and cross-referencing of figures, tables, sections, and equations, inserting parts and appendices, and importing GitBook styles to create elegant and beautiful books.
Software information and conventions
Short pieces of code are written like this, and sometimes you may find chunks of code like this:
This is a chunk of code,
and everything in it may be copied and
pasted directly into the corresponding
execution console, which could be
Stata or R.
Most of the code chunks in this book are written in Stata syntax, but you may find some pieces of code written in R as well.
The R session information used when this book was compiled is the following:
sessionInfo()
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] bib2df_1.1.1 tidyr_0.8.3 xml2_1.2.0 readr_1.3.1
## [5] DiagrammeR_1.0.1 png_0.1-7 dplyr_0.8.1 ggplot2_3.1.1
## [9] spData_0.3.0 raster_2.9-5 sp_1.3-1 sf_0.7-4
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.0 viridis_0.5.1 jsonlite_1.6
## [4] viridisLite_0.3.0 shiny_1.3.2 assertthat_0.2.1
## [7] highr_0.8 yaml_2.2.0 pillar_1.4.1
## [10] backports_1.1.4 lattice_0.20-38 glue_1.3.1
## [13] downloader_0.4 digest_0.6.19 RColorBrewer_1.1-2
## [16] promises_1.0.1 colorspace_1.4-1 htmltools_0.3.6
## [19] httpuv_1.5.1 plyr_1.8.4 XML_3.98-1.19
## [22] pkgconfig_2.0.2 bookdown_0.11 purrr_0.3.2
## [25] xtable_1.8-4 scales_1.0.0 brew_1.0-6
## [28] later_0.8.0 tibble_2.1.2 influenceR_0.1.0
## [31] withr_2.1.2 humaniformat_0.6.0 fortunes_1.5-4
## [34] lazyeval_0.2.2 cli_1.1.0 rgexf_0.15.3
## [37] magrittr_1.5 crayon_1.3.4 mime_0.6
## [40] evaluate_0.14 fansi_0.4.0 class_7.3-15
## [43] Rook_1.1-1 tools_3.5.3 hms_0.4.2
## [46] formatR_1.6 stringr_1.4.0 munsell_0.5.0
## [49] packrat_0.5.0 compiler_3.5.3 e1071_1.7-1
## [52] rlang_0.3.4 classInt_0.3-3 units_0.6-3
## [55] rstudioapi_0.10 htmlwidgets_1.3 visNetwork_2.0.7
## [58] igraph_1.2.4.1 miniUI_0.1.1.1 labeling_0.3
## [61] rmarkdown_1.13 gtable_0.3.0 codetools_0.2-16
## [64] DBI_1.0.0 R6_2.4.0 gridExtra_2.3
## [67] knitr_1.23 utf8_1.1.4 zeallot_0.1.0
## [70] addinexamples_0.1.0 KernSmooth_2.23-15 stringi_1.4.3
## [73] Rcpp_1.0.1 vctrs_0.1.0 tidyselect_0.2.5
## [76] xfun_0.7