Chapter 1 Acquiring household survey data
This chapter describes the data sources used for measuring poverty and inequality as well as how the data are selected and obtained.
1.1 Selection criteria and source of data
For most countries, the Poverty and Inequality Platform (PIP) relies on household surveys carried out by countries’ National Statistical Offices (NSOs). Typically, the official surveys used to report poverty for the country are used. These include household budget surveys and household income and expenditure surveys. Such surveys ask a representative sample of households in the country detailed questions about their income and/or consumption, among other topics. On some occasions, the World Bank is a co-producer of these surveys.
Other surveys on consumption or income, such as surveys carried out by academic institutions and private sector companies, are generally not included in PIP. The primary reason is that the estimates from PIP are the official vehicle for countries’ reporting on Sustainable Development Goal indicator 1.1.1 on extreme poverty, which ought to come from sources that are officially recognized by countries.
For high-income countries in the European Union (with the exception of Germany), the data are obtained from the EU Statistics on Income and Living Conditions (EU-SILC). EU-SILC data are partly obtained from surveys and partly obtained from administrative records. For other high-income economies (Australia; Canada; Germany; Israel; Japan; South Korea; Taiwan, China; and the United States) data from the Luxembourg Income Study (LIS) Database are used. These data rely on surveys carried out by NSOs or by academic institutions. Data from the LIS database are also used for high-income countries from the European Union for years predating EU-SILC, which naturally causes a break in the series.
Occasionally, a survey fulfilling the above criteria is not incorporated into the platform because the World Bank does not have access to the data, because of concerns about the survey design or welfare aggregate, or because the auxiliary data needed (such as consumer price indices and purchasing power parities) are unavailable. Decisions to exclude a previously available survey from the platform are discussed carefully in the Global Poverty Working Group within the World Bank, and the reason for excluding a data point is summarized in the “What’s New” documents.
All surveys used by PIP are assigned to a particular year using two variables: year and datayear. If all fieldwork for a survey took place during one calendar year and relates to the consumption or income of that year, year and datayear are both equal to this calendar year. When a household survey spans two calendar years, the datayear noted in the Poverty and Inequality Platform is not an integer. The Gambia, for example, has a datayear of 2015.31. This means that 69% of the fieldwork for this particular survey took place in 2015 while 31% took place in 2016. In these cases, year is the floor of datayear. The datayear is used for interpolations, extrapolations, and, at times, for the inflation adjustment that converts the welfare vector to constant prices. Note that in some cases neither the datayear nor the year aligns with the year in which the survey was carried out. For EU-SILC surveys, for example, the income information asked in a survey in a given year relates to income of the previous year. Here, the datayear and year will both be one year prior to the year the survey was carried out.
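The relationship between fieldwork timing, datayear, and year can be illustrated with a minimal Python sketch. The function name `survey_years`, the dictionary input, and the rounding to two decimals are assumptions made for illustration; they are not taken from the PIP code base. The sketch covers only the fieldwork-share case and does not handle reference-period shifts such as the one described for EU-SILC.

```python
import math


def survey_years(fieldwork_shares):
    """Return (year, datayear) for a survey.

    fieldwork_shares maps each calendar year to the share of fieldwork
    carried out in that year (hypothetical input format). The shares
    must sum to 1.
    """
    total = sum(fieldwork_shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fieldwork shares must sum to 1")

    # datayear is the share-weighted average of the calendar years.
    # Rounding to two decimals is an assumption for this sketch.
    datayear = round(sum(y * s for y, s in fieldwork_shares.items()), 2)

    # year is the floor of datayear.
    year = math.floor(datayear)
    return year, datayear


# The Gambia example from the text: 69% of fieldwork in 2015, 31% in 2016.
print(survey_years({2015: 0.69, 2016: 0.31}))  # (2015, 2015.31)
```

When all fieldwork falls in a single calendar year, the same function returns that year for both values.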
1.2 How data are obtained
In many low-income countries, the Poverty and Equity Global Practice of the World Bank cooperates with the National Statistical Offices and supports their efforts to conduct household surveys and measure poverty. Data are obtained through these partnerships. This concerns most countries in East Asia & Pacific, the Middle East & North Africa, South Asia, and Sub-Saharan Africa.
For Latin America and the Caribbean, most of the data are obtained from and harmonized by the Socio-Economic Database for Latin America and the Caribbean (SEDLAC), a partnership between the Center for Distributive, Labor and Social Studies (CEDLAS) at the National University of La Plata in Argentina and the World Bank’s Poverty and Equity Global Practice for Latin America and the Caribbean.
For most high-income countries, data are obtained through the Luxembourg Income Study or EU-SILC.
All the data go through validity checks by the Poverty and Equity Global Practice of the World Bank. On certain occasions, the Poverty and Equity Global Practice is also responsible for harmonizing and constructing the welfare aggregates. The harmonized welfare aggregates are passed on to PIP.
In the majority of cases, the surveys used are nationally representative of all households in a country. Often, the selection of households is based on the most recent census. Yet three factors can prevent surveys from being fully representative of the entire population:
- Some households are less likely to agree to participate in national surveys on income and consumption, or, if they participate, less likely to answer the questions used for calculating poverty. This is particularly a concern for wealthy households, which means that the distributions captured might have too thin a tail at the top.
- Some individuals do not reside in households as defined in household surveys and hence are not captured by the above sampling frame. This particularly concerns homeless people and individuals residing in nursing homes, prisons, and displacement camps.
- At times, conflicts and extreme weather events prevent enumerators from going to certain parts of a country. In these cases, some geographical areas may be excluded from the survey. The 2018-19 Nigeria Living Standards Survey, for example, is not representative of the Borno state, since parts of the state became inaccessible over the course of the survey.
For the purpose of calculating global or regional poverty, the poverty rates estimated in the surveys are assumed to be representative of the entire population despite the three shortcomings above. This is largely because acquiring any alternative poverty estimate for the missing populations is challenging.
PIP also contains some surveys that explicitly target only the urban or rural part of the population. These surveys are not used for calculating regional or global poverty, except in Argentina and Suriname, which only have surveys representative of their urban populations. For the urban populations of Argentina and Suriname, these surveys are used for the regional and global poverty counts, while the rural populations of Argentina and Suriname receive the same treatment as countries for which we have no data.
1.4 Grouped vs. unit record data
In some cases, countries are unwilling to share unit-record data with the World Bank but are willing to share grouped data. Grouped data are aggregated data, often to 5, 10, 20, or 100 quantiles of the distribution. Such data could reflect, for example, the mean income of each percentile or the income share of each percentile. For the purposes of estimating poverty and inequality, a Lorenz curve is estimated from these grouped data, and poverty and inequality are then derived from the estimated curve (described here). Grouped data are used for Algeria, China, Guyana, Suriname, Turkmenistan, Trinidad & Tobago, Venezuela, and the United Arab Emirates, and for additional countries in earlier years.
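To illustrate how grouped data relate to a Lorenz curve, the following Python sketch builds Lorenz curve points from income shares of equally sized groups and computes a Gini index by connecting the points with straight lines. This piecewise-linear approach is a simplification chosen for illustration and is not the Lorenz curve estimation method referred to above; because it ignores inequality within each group, it understates the Gini index. The function names and the decile shares are hypothetical.

```python
def lorenz_points(income_shares):
    """Return Lorenz curve points (p, L) from the income shares of
    equally sized population groups, ordered from poorest to richest."""
    total = sum(income_shares)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("income shares must sum to 1")

    n = len(income_shares)
    points = [(0.0, 0.0)]
    cumulative = 0.0
    for i, share in enumerate(income_shares, start=1):
        cumulative += share
        # p: cumulative population share, L: cumulative income share
        points.append((i / n, cumulative))
    return points


def gini_piecewise_linear(income_shares):
    """Gini index from a piecewise-linear Lorenz curve (trapezoid rule).

    Assumes everyone within a group has the same income, so the result
    is a lower bound on the Gini of the underlying distribution.
    """
    points = lorenz_points(income_shares)
    area_sum = 0.0
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        area_sum += (p1 - p0) * (l0 + l1)
    return 1.0 - area_sum


# Hypothetical decile income shares (poorest to richest), for illustration only.
decile_shares = [0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.11, 0.13, 0.17, 0.26]
print(gini_piecewise_linear(decile_shares))
```

With finer groupings, such as 100 quantiles, the piecewise-linear curve follows the underlying distribution more closely and the understatement shrinks.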
1.5 Multiple imputed micro data
In some countries, a full household survey with a comprehensive consumption measure cannot be collected due to concerns over the safety of enumerators and respondents. However, shorter interviews without a full welfare aggregate, which minimize safety risks, can be conducted. In these cases, a full welfare aggregate can be imputed using statistical models. These methods, described in greater detail in Arayavechkit et al. (2021) and Castaneda Aguilar et al. (2022), typically use multiple imputations, resulting in a dataset with many consumption values for each household. These multiple consumption values reflect the uncertainty that arises from the statistical modelling. Statistics of interest, such as poverty or inequality, are computed separately on each of these consumption vectors and then averaged to arrive at a final estimate. Currently, such data are used for South Sudan 2016, Somalia 2017, Zimbabwe 2017, and Nigeria 2010, 2012, and 2015.
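The averaging step described above can be sketched in Python as follows. This is a minimal illustration, assuming each imputation is stored as a list of per-household consumption values aligned with a common list of household weights. The function names, the data, and the poverty line value are hypothetical, and the sketch covers only the point estimate, not its variance.

```python
def headcount_ratio(consumption, weights, poverty_line):
    """Weighted share of the population with consumption below the line."""
    poor_weight = sum(w for c, w in zip(consumption, weights) if c < poverty_line)
    return poor_weight / sum(weights)


def multiple_imputation_estimate(imputed_vectors, weights, poverty_line):
    """Compute the headcount on each imputed consumption vector
    separately, then average the results across imputations."""
    estimates = [
        headcount_ratio(vector, weights, poverty_line)
        for vector in imputed_vectors
    ]
    return sum(estimates) / len(estimates)


# Hypothetical data: four households, three imputed consumption vectors.
weights = [1.0, 1.0, 1.0, 1.0]
imputed_vectors = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 1.8, 4.0],
    [2.5, 2.6, 3.0, 4.0],
]
poverty_line = 2.15  # illustrative value in the same units as consumption

print(multiple_imputation_estimate(imputed_vectors, weights, poverty_line))
```

Computing the statistic on each vector first and averaging afterwards differs from averaging the consumption values per household and then computing the statistic once: the latter would smooth away the imputation uncertainty and generally yields a different poverty rate.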