Chapter 9 Temporal consistency of categorical variables


Unlike continuous variables, for which averages, standard deviations, and ranges convey statistical and socioeconomic meaning, categorical variables may be analyzed by tabulating the frequency of their values. The dashboard in Figure 9.1 plots the relative frequencies of categorical variables over time for a single country. For example, the user may tabulate the absolute and relative frequencies of the values of categorical variables such as relationharm, marital, urban, and educat7 for Pakistan, as presented below.

Figure 9.1: Categorical quality check by country
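
The tabulation behind a dashboard like this can be approximated outside the tool. The following is a minimal sketch in Python, not the dashboard's actual implementation: it assumes the harmonized data are available as a list of records with a `year` field, and the function name `tabulate_by_year` and the sample records are hypothetical, made up purely for illustration.

```python
from collections import Counter, defaultdict


def tabulate_by_year(records, variable):
    """Return {year: {value: (count, share)}} for one categorical variable."""
    counts = defaultdict(Counter)
    for row in records:
        value = row.get(variable)
        if value is None:  # skip observations with a missing value
            continue
        counts[row["year"]][value] += 1

    table = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        # Absolute frequency and relative frequency within the survey year
        table[year] = {
            value: (n, n / total) for value, n in sorted(counts[year].items())
        }
    return table


# Made-up illustrative records, not real survey data (rural=0, urban=1).
sample = [
    {"year": 2013, "urban": 1},
    {"year": 2013, "urban": 0},
    {"year": 2013, "urban": 0},
    {"year": 2013, "urban": 0},
    {"year": 2015, "urban": 1},
    {"year": 2015, "urban": 0},
]

urban_table = tabulate_by_year(sample, "urban")
for year, values in urban_table.items():
    for value, (count, share) in values.items():
        print(year, value, count, f"{share:.2f}")
```

Each survey year is normalized separately, so the shares within a year sum to one and can be compared across rounds of different sample sizes. Survey weights are ignored in this sketch.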

Figure 9.2 does the same but presents the results for all eight countries at once. For example, the frequencies of the values “Yes” and “No” for the harmonized variable ownhouse are presented below.

Figure 9.2: Categorical quality check for all countries

This tool is useful for evaluating whether categorical variables have been harmonized properly. A large change in the relative frequency of the values of a categorical variable between survey rounds could indicate that the harmonization process has been inconsistent. For example, if someone mistakenly swaps the value labels for urban (i.e., rural=1, urban=0 instead of rural=0, urban=1), the inconsistency with previous survey rounds is easy to detect in these dashboards. As a concrete case, as of June 19, 2019, the variable computer in Pakistan (in the Assets category) shows an anomalous trend from 2013 to 2015. This clearly indicates an error either in the harmonization or in the raw data.
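
The visual check described above can also be complemented with a simple automated screen. The sketch below is only an illustration under stated assumptions: the function name `flag_large_shifts`, the 0.2 threshold, and the shares are all hypothetical, and the made-up numbers mimic what swapped urban labels in the latest round might look like.

```python
def flag_large_shifts(shares_by_year, threshold=0.2):
    """Flag values whose relative frequency moves by more than `threshold`
    between consecutive survey rounds.

    shares_by_year: {year: {value: share}}
    Returns a list of (previous_year, year, value, change) tuples.
    """
    flags = []
    years = sorted(shares_by_year)
    for prev, curr in zip(years, years[1:]):
        # A value absent from one round is treated as having a share of zero
        values = set(shares_by_year[prev]) | set(shares_by_year[curr])
        for value in sorted(values):
            change = (shares_by_year[curr].get(value, 0.0)
                      - shares_by_year[prev].get(value, 0.0))
            if abs(change) > threshold:
                flags.append((prev, curr, value, round(change, 4)))
    return flags


# Made-up shares for urban (rural=0, urban=1); the 2015 round looks swapped.
shares = {
    2011: {0: 0.66, 1: 0.34},
    2013: {0: 0.65, 1: 0.35},
    2015: {0: 0.36, 1: 0.64},
}

flags = flag_large_shifts(shares)
for prev, curr, value, change in flags:
    print(f"urban={value}: share changed by {change:+.2f} from {prev} to {curr}")
```

The threshold is arbitrary and would need tuning per variable, since some categories legitimately shift between rounds. A flag is a prompt to inspect the harmonization code and the raw data, not proof of an error.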