Chapter 10 Availability, missing values, and zeros in SARMD
Figures 10.1, 10.2, and 10.3 provide three ways of visualizing the availability, percentage of missing values, and frequency of zeros in the harmonized variables.
According to the SARMD protocols, if the raw data does not have the necessary information to harmonize a particular SARMD variable, the variable must still be included in the dataset as a vector of missing values. If the variable is absent from the dataset, there are two possible explanations: either the raw data contains the necessary information to harmonize that variable but it has not been harmonized yet, or there is no information in the raw data and previous harmonizers decided not to include the variable in the dataset as a vector of missing values. Because it is impossible to tell which explanation applies, figure 10.1 shows all the variables that have not been harmonized, and are therefore absent, in each dataset available in SARMD.
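The distinction between a variable that is absent and one that is included as a vector of missing values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dataset is modeled as a plain dictionary of columns, `None` stands for a missing value, and the function name `availability` and the toy data are hypothetical, not part of the SARMD tooling.

```python
from typing import Dict, List, Optional

# Assumption: a dataset is a mapping from variable name to its column of
# values, and None stands for a missing value.
Dataset = Dict[str, List[Optional[float]]]


def availability(dataset: Dataset, expected: List[str]) -> Dict[str, str]:
    """Label each expected variable as 'absent', 'all missing', or 'available'."""
    status = {}
    for name in expected:
        if name not in dataset:
            # Not in the dataset at all: the reason cannot be recovered
            # from the data alone.
            status[name] = "absent"
        elif all(value is None for value in dataset[name]):
            # Present, but only as a vector of missing values.
            status[name] = "all missing"
        else:
            status[name] = "available"
    return status


# Hypothetical toy dataset with three observations.
toy = {
    "welfare": [1200.0, 950.5, None],
    "computer": [None, None, None],
}
result = availability(toy, ["welfare", "computer", "cellphone"])
print(result)
```

In this toy case `computer` follows the protocol (present but entirely missing), while `cellphone` is the ambiguous case that figure 10.1 reports.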
For example, all the variables of the assets category are absent from the Maldives datasets of 1997 and 2004. As another example, Pakistan 2015 lacks several variables of the assets category that were present in the surveys of previous years. Moreover, some variables, such as landphone, cellphone, and computer, are present in 2015 but were absent before 2015.
For those variables that are present in the dataset, figure 10.2 shows the share of observations with missing values in each dataset. For example, the variable welfare, which is used to estimate poverty and inequality measures, is available for nearly all the observations of the SARMD datasets. However, Nepal 2010 has an astonishing 18% of missing values in this variable. In other words, almost a fifth of the households surveyed in 2010 are not included in any socioeconomic indicator of Nepal.
Finally, figure 10.3 shows, for each variable, its proportion of missing values (as in figure 10.2), its proportion of zeros, and its mean.
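The three statistics named above could be computed for a single variable roughly as follows. This is a minimal sketch, not the code behind figure 10.3. It assumes that missing values are encoded as `None`, that both shares are taken over all observations, and that the mean is taken over non-missing values only; the text does not specify these choices. The function name `summarize` and the sample values are hypothetical.

```python
from typing import Dict, List, Optional


def summarize(values: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Return the share of missing values, the share of zeros, and the mean."""
    n = len(values)
    observed = [v for v in values if v is not None]
    return {
        # Assumption: both shares use the total number of observations.
        "share_missing": (n - len(observed)) / n if n else None,
        "share_zero": sum(1 for v in observed if v == 0) / n if n else None,
        # Assumption: the mean ignores missing values.
        "mean": sum(observed) / len(observed) if observed else None,
    }


# Hypothetical column with five observations.
summary = summarize([0.0, 2.0, None, 4.0, 0.0])
print(summary)
```

Reporting zeros separately from missing values matters because a zero is a recorded answer, whereas a missing value carries no information about the observation.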