Chapter 3 Measuring the Statistical Performance of Countries: An Overview of the Statistical Performance Indicators and Index

The World Bank’s Statistical Capacity Index (SCI) has been widely employed to measure country statistical capacity since its inception two decades ago. We build on the conceptual and empirical advantages of the SCI to offer a new Statistical Performance Index (SPI) that can better measure a country’s statistical performance. We present clearer conceptual motivations, employ a stronger mathematical foundation, and significantly expand the number of indicators and countries covered by this index. We also provide empirical evidence that illustrates the strong correlation of this new index with other commonly used development indicators regarding human capital, governance, poverty, and inequality. Our framework can accommodate future improvements to this index as the global data landscape evolves.

3.0.1 Technical Information

3.0.1.1 Handling Missing Data

For a full description of the methodology behind each specific indicator, please consult the technical documentation. Our general approach to handling missing values and lining up data is described below. For indicators where we had a value for a previous year but not for the current one (say a value in 2018 but not 2019), we filled in the previous value. For instance, the Open Data Watch indicators on geospatial data were released only in 2018, not 2019, so we filled in the value for 2019 with the value from 2018 as our best estimate of that indicator.

For indicators where we had no data for any years, we chose not to impute a value. In that case, the value for that indicator is null and the country will not have a value for any pillars or dimensions where that value is used.
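The two rules above can be illustrated with a minimal sketch. This is not the code used to produce the SPI; the function name `carry_forward` and the example values are hypothetical, and years before the first observation are assumed to stay null.

```python
def carry_forward(series, years):
    """Fill each missing year with the most recent earlier value, if any.

    series maps year -> value (missing years are absent or None).
    Years before the first observation stay None (an assumption).
    """
    filled = {}
    last_seen = None
    for year in sorted(years):
        value = series.get(year)
        if value is not None:
            last_seen = value
        filled[year] = last_seen
    return filled


# Hypothetical geospatial indicator observed only in 2018.
geospatial = carry_forward({2018: 0.5}, [2016, 2017, 2018, 2019])

# An indicator with no data in any year is not imputed and stays null.
never_observed = carry_forward({}, [2018, 2019])
```

In this sketch the 2019 entry of `geospatial` takes the 2018 value, while every entry of `never_observed` remains `None`.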

3.0.1.2 Process for Inquiring about and Validating the Data

The data for the Statistical Performance Indicators are collected from established public and open sources. The team makes every effort to ensure that the data presented in the Statistical Performance Indicators are accurate, but it is possible that the sources used to assign values for the indicators are occasionally not up to date or accurate despite these efforts. Countries and all other users have the opportunity to inquire about the values that make up the indicators through contacting the Bank directly or via .

3.0.1.3 Process for Updating Indicators

While the framework put forth in this note is designed to capture the contours of statistical systems around the world over the next decade and beyond, the indicators themselves are expected to improve over time. Changing indicators over time does come with a trade-off: while it would improve the measurement of statistical performance, it would break the comparability with previous measures. Recognizing this trade-off, we plan to follow a 2- to 3-year cycle in which new indicators may be introduced and current indicators re-evaluated based on feedback from stakeholders and users, including national authorities. Whenever such a change takes place, the historical SPI series would need to be updated for comparability over time. In order to be completely transparent, all changes to methodology will be tracked through a publicly available GitHub repository, and all code and underlying data used to produce the indicators will be published.

3.0.2 SPI Index Methodology

An overall score is produced by combining the Statistical Performance Indicators to yield one single index. The statistical performance indicators have a nested structure, and the SPI overall score is formed by sequentially aggregating each level.

To begin we produce a score for each dimension, which, unless otherwise stated, is an unweighted average of the indicators within that dimension. For instance, the Standards and Methods dimension will be formed by taking the unweighted average of the indicators for the system of national accounts in use, national accounts base year, classification of national industry, CPI base year, classification of household consumption, etc.

\[ SPI.DIM_{ctpd} = \sum_{i=1}^{N_I} \frac{SPI.IND_{ctpdi}}{N_I} \]

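where \(SPI.DIM_{ctpd}\) is the score for dimension d in pillar p, in time period t, for country c; \(SPI.IND_{ctpdi}\) is the score for indicator i in that dimension (e.g. the population census score); and \(N_I\) is the number of indicators in the dimension.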
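A minimal sketch of the dimension score follows. The function name and the indicator values are hypothetical. Returning `None` when any indicator is missing reflects our reading of the missing-data rule that a country has no value for a dimension in which a null indicator is used.

```python
def dimension_score(indicators):
    """Unweighted average of the indicator scores in one dimension.

    Returns None if the list is empty or any indicator is null.
    """
    if not indicators or any(v is None for v in indicators):
        return None
    return sum(indicators) / len(indicators)


# Hypothetical indicator scores for one country, dimension, and year.
score = dimension_score([1.0, 0.5, 0.0, 0.5])

# A null indicator leaves the dimension without a value.
missing = dimension_score([1.0, None, 0.5])
```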

After computing a score for each dimension, a score for each pillar is computed as either an unweighted or a weighted average of the dimensions in that pillar. For pillars 1, 2, 4, and 5, the unweighted average of the dimensions within each pillar is taken. For pillar 3 on data products, we take a weighted average of the dimension scores, where the weights are based on the number of SDGs in each dimension (6 SDGs in dimension 3.1 on social statistics, 6 SDGs in dimension 3.2 on economic statistics, 2 in dimension 3.3 on environmental statistics, and 2 in dimension 3.4 on institutional statistics).2 This reflects a perspective that all SDGs are of equal importance, and therefore the dimensions are weighted accordingly. Additionally, for Pillar 4 on data sources, censuses and surveys are given separate weights, so that censuses, surveys, administrative data, and geospatial data each receive a weight of 1/4. Censuses and surveys are in the same dimension in the framework, so each would typically receive a weight of only 1/6 (for a combined weight of 1/3) in this pillar; because of their importance in producing many indicators, they are given extra weight so that each gets a weight of 1/4 (for a combined weight of 1/2).

\[ SPI.PIL_{ctp} = \sum_{d=1}^{N_d} \frac{\omega_{pd} \times SPI.DIM_{ctpd}}{N_d} \]

\(\omega_{pd}\) is the weight for dimension d in pillar p.
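As an illustration for pillar 3, the sketch below turns the SDG counts per dimension into weights. Because the formula above divides by \(N_d\), we assume here that the weights are scaled to sum to \(N_d\) (so that \(\omega_{pd}=1\) gives the unweighted case). This scaling is our reading rather than something stated explicitly, and the dimension scores are hypothetical.

```python
def pillar_score(dim_scores, omega=None):
    """Weighted average of dimension scores, following the pillar formula.

    omega defaults to 1 for every dimension (the unweighted case).
    """
    n_d = len(dim_scores)
    if omega is None:
        omega = [1.0] * n_d
    return sum(w * s for w, s in zip(omega, dim_scores)) / n_d


# Number of SDGs in dimensions 3.1 to 3.4, as listed in the text.
sdg_counts = [6, 6, 2, 2]
n_d = len(sdg_counts)

# Assumed scaling: weights proportional to SDG counts and summing to n_d.
omega_p3 = [n_d * k / sum(sdg_counts) for k in sdg_counts]

# Hypothetical dimension scores on a 0-100 scale.
p3 = pillar_score([80.0, 60.0, 40.0, 20.0], omega_p3)
```

Under this scaling the result equals the SDG-count-weighted average (6×80 + 6×60 + 2×40 + 2×20) / 16.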

After calculating the scores for each pillar, the SPI overall score is derived by taking the simple average across the 5 pillars.

The SPI overall score has a maximum of 100 and a minimum of 0. A score of 100 would indicate that a country has every single element that we measure. A score of 0 indicates that none are available. To be precise:

\[ SPI.INDEX_{ct} = \sum_{p=1}^{N_p} \frac{SPI.PIL_{ctp}}{N_p} \]

Where SPI.INDEX is the SPI overall score and SPI.PIL are the scores for the 5 SPI pillars listed above. In the notation, c is a country, t is the date, and p is a pillar.

The nested structure of our index and the summation methods used to build an overall score ensure the axiomatic properties outlined in [@cameron2021measuring].3 These properties include symmetry, monotonicity, and subgroup decomposability. Symmetry refers to the property that if the values of two indicators in a nesting are switched, the resulting index scores are unaffected. Monotonicity implies that if the value of an indicator improves, then the resulting index scores improve as well. Subgroup decomposability results from the fact that the scores are a weighted average of the subgroups (indicators, dimensions, or pillars) that make up that score and so can be written as a linear combination of those subgroups.
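These properties can be checked numerically at the pillar level with a small sketch. The pillar scores below are hypothetical, and the checks illustrate the properties for the simple average only; they are not a proof.

```python
def overall_index(pillars):
    """Simple average of the pillar scores (each on a 0-100 scale)."""
    return sum(pillars) / len(pillars)


pillars = [90.0, 70.0, 60.0, 50.0, 80.0]  # hypothetical pillar scores
base = overall_index(pillars)

# Symmetry: swapping two pillar scores leaves the index unchanged.
swapped = [70.0, 90.0, 60.0, 50.0, 80.0]
assert overall_index(swapped) == base

# Monotonicity: raising one pillar score raises the index.
improved = [90.0, 70.0, 65.0, 50.0, 80.0]
assert overall_index(improved) > base

# Subgroup decomposability: the index is a linear combination of its pillars.
contributions = [p / len(pillars) for p in pillars]
assert abs(sum(contributions) - base) < 1e-9
```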

3.0.3 SPI Overall Scores

The purpose of the SPI is to help countries assess and improve the performance of their statistical systems. The presentation of SPI overall scores is designed to reflect that aim. Small differences between countries should not be highlighted since they can reflect imprecision arising from the currently available indicators rather than meaningful differences in performance. Instead, the presentation of overall SPI scores focuses on larger groupings of countries reflecting broad categories of performance as measured by the indicator framework. In total, there are 174 countries with sufficient data to compute an index value. This set of countries covers 99.2 percent of the world population.

The map is color coded based on the performance of countries on our index. Given the imprecision inherent in the calculations, we recommend treating the color coding as the most detailed subdivision of performance. Finer distinctions are unlikely to provide meaningful differentiation between countries.

Countries are grouped into five categories as shown on the map in figure 6.1:

  • Top Quintile: Countries in the top 20% of the SPI overall score (shading in dark green).

  • 4th Quintile: Countries in the 4th quintile, or those above the 60th percentile but below the 80th percentile (shading in light green).

  • 3rd Quintile: Countries in the 3rd quintile, or those between the 40th and 60th percentile (shading in yellow).

  • 2nd Quintile: Countries in the 2nd quintile, or those above the 20th percentile but below the 40th percentile (shading in light orange).

  • Bottom Quintile: Countries in the bottom 20% (shading in dark orange).
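One way to assign the five groups is by rank, as in the sketch below. The rank-based rule, the function name, and the scores are assumptions for illustration; the treatment of ties and of scores exactly on a percentile boundary may differ in the published figures.

```python
def quintile_groups(scores):
    """Map each country to a quintile from 1 (bottom 20%) to 5 (top 20%).

    scores maps country name -> SPI overall score. Countries are ranked
    from lowest to highest and split into five groups by rank.
    """
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {name: 5 * i // n + 1 for i, name in enumerate(ranked)}


# Ten hypothetical countries with scores 10, 20, ..., 100.
scores = {f"C{k}": 10.0 * k for k in range(1, 11)}
groups = quintile_groups(scores)
```

With ten countries, each quintile contains two of them: the two lowest scores fall in group 1 and the two highest in group 5.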

Figure 6.1: SPI Overall Score by Quintile in 2019
 


The countries scoring in the top 20% have an average SPI overall score of 86.4. A maximum score of 100 would indicate that a country has every single element in place that is measured through the SPI. Meanwhile, countries in the bottom 20% score significantly worse, with an average score of 37.5. In Pillar 1, countries in the top quintile have an average score of 98.9, while countries in the bottom quintile have an average score of 53. One area where even top quintile countries are not doing as well as possible is our data products pillar. Countries in the top quintile score only 72.3 points out of 100 in this area. Pillar 3 measures whether countries have a value in the past 5 years for each SDG indicator, so even the top quintile countries report, averaging over the SDGs, only around 72.3% of indicators.

Table 6.1: Table of SPI Overall and Pillar Scores by Quintile Group in 2019

To highlight a few specific indicators, countries in the top 20% on average get nearly perfect scores on debt reporting to the World Bank. The top 20% of countries in the SPI overall score have an average score on their debt reporting of 1, with a maximum score of 1 and minimum score of 0, while those in the bottom 20% on the SPI overall score have an average score of 0.7. To take another example, when looking at scores on whether a population census has been conducted recently, countries in the top 20% score on average 1 on our population census indicator, with a maximum score of 1 and minimum score of 0, while those in the bottom 20% on our SPI overall score receive 0.8 points on the population census indicator. The table below shows differences across quintile groups for a select set of indicators.

Table 6.2: Table of SPI Overall and Select SPI Indicators by Quintile Group in 2019

3.0.3.1 SPI Overall Score by Region and Income Group

There are large differences in the SPI overall score across World Bank regions and income groups. Overall, North America has the highest average overall SPI score, while the Sub-Saharan Africa region has the lowest average score.4 There is also a clear gradient with respect to income groups. Countries classified as low income have lower scores on average than countries classified as middle income. High income countries have the highest average SPI overall score.

Finally, when looking over time, since 2016 the SPI overall scores are relatively stable across regions. There has been some modest increase in the SPI overall scores for middle income and high income countries, with little progress for low income countries.

Figure 6.2.A: SPI Overall Score by Region in 2019
 


Figure 6.2.B: SPI Overall Score by Income Group in 2019
 


Figure 6.2.C: SPI Overall Score by Lending Group in 2019
 


Figure 6.2.D: SPI Overall Score by Fragile/Conflict Situation Status in 2019
 


Figure 6.2.E: SPI Overall Score by Year (2016-19)
 


As well as large differences across regions, there is significant variation in the SPI overall scores within regions. For instance, in the Latin America & Caribbean World Bank region, Mexico is the highest scoring country with a score of 87.5. However, Haiti, the lowest scoring country in the region, earns a substantially lower score of 37.5. In Sub-Saharan Africa, the highest scoring country is Mauritius with a score of 75.9, while the lowest scoring country is Somalia with a score of 19.6. In the East Asia and Pacific region, the top scoring country is Korea, Rep. with a score of 88.3, while the lowest scoring country is Marshall Islands with a score of 20.9.

Figure 6.3: SPI Overall Scores by Country and Region in 2019
 


3.0.3.2 SPI Scores by Pillar

By presenting scores for each of the pillars – data use, data services, data products, data sources, and data infrastructure – that compose the SPI overall score, it is possible to show where variation across countries is coming from and to assess the specific areas in which countries may be struggling.

3.0.3.2.1 Pillar 1: Data Use

Figure 6.4 shows a world map displaying countries based on data use score quintiles. Overall, the high income countries in North America and Europe tend to rate most highly on this pillar, whereas Sub-Saharan African countries tend to lag.

Figure 6.4: Pillar 1: Data Use Scores by Quintile in 2019
 


Countries within each region show significant dispersion in performance. It is worth noting that several countries, such as the United States, Mexico, Finland, and Costa Rica, among others, receive the maximum possible score of 100 in this pillar. Several countries receive the minimum score as well, including the Syrian Arab Republic, South Sudan, Namibia, and Nauru, among others.

Figure 6.5: Pillar 1: Data Use Scores by Country and Region in 2019
 


The table below shows the average score on each indicator in pillar 1. Countries in the top 20% possess nearly all of the components in pillar 1, while countries in the bottom 20% are lacking in many areas. Countries in the bottom 20% have an average of 0.1 on the comparable poverty estimate indicator, an average of 0.6 on the availability of under 5 mortality data, and an average of 0.7 on debt reporting.

Table 6.3: Select Pillar 1 Indicator Scores by SPI Overall Score Quintile Group in 2019

3.0.3.2.2 Pillar 2: Data Services

For pillar 2 on data services, high income countries tend to score highest. India, Morocco, and the Philippines scored highly as well in this pillar. This pillar contains four indicators on the quality of data releases, online accessibility, advisory and analytical services (not included due to lack of data), and the availability and use of data access services.

Figure 6.6: Pillar 2: Data Services Scores by Quintile in 2019
 


Figure 6.7: Pillar 2: Data Services Scores by Country and Region in 2019
 


In terms of specific indicators, the top 20% of countries receive an average score of 1 on the indicator measuring country adoption of the IMF’s data dissemination standards, while those in the bottom 20% receive an average score of 0.5. Countries in the bottom 20% receive an average data openness score, produced by Open Data Watch, of 0. This indicator is made up of several sub-indicators, including whether data is available in a machine readable format and in a non-proprietary format, and whether download options, metadata, and terms of use are available. For instance, in the bottom 20% of countries the score on whether data is available in a machine readable format is only 0.3 out of a maximum of 1. Additionally, countries in the bottom 20% only score an average of 0.4 on the measure of whether metadata meeting the standards of the Data Documentation Initiative (DDI) is available on surveys conducted.5

Table 6.4: Select Pillar 2 Indicator Scores by SPI Overall Score Quintile Group in 2019

3.0.3.2.3 Pillar 3: Data Products

The data products pillar is based on whether countries are reporting data to monitor the SDGs. The SDG goals and targets provide a comprehensive, internationally agreed framework that is relevant to all nations and encompasses indicators reflecting all domains of official statistics: economic, social, environmental, and institutional. For this pillar, indicators are produced using the UN Global SDG monitoring database. For each SDG indicator, we check whether a value is available within a five year window ending in a particular year. For instance, for 2019, we check whether any value is available between 2015 and 2019.
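The five year window check can be sketched as follows. The function names, the indicator labels, and the observation years are hypothetical; scoring a goal as the share of its indicators with a value in the window is our reading of how the per-goal scores are interpreted in this chapter.

```python
def available_in_window(obs_years, ref_year, window=5):
    """True if any observation falls in the window ending in ref_year.

    For ref_year=2019 and window=5, the window is 2015-2019 inclusive.
    """
    first_year = ref_year - window + 1
    return any(first_year <= y <= ref_year for y in obs_years)


def goal_score(indicator_years, ref_year):
    """Share of a goal's indicators with at least one value in the window."""
    flags = [available_in_window(ys, ref_year) for ys in indicator_years.values()]
    return sum(flags) / len(flags)


# Hypothetical observation years for five indicators under one goal.
reported = {
    "ind_a": [2012, 2016],
    "ind_b": [2014],
    "ind_c": [2019],
    "ind_d": [2015],
    "ind_e": [],
}
score_2019 = goal_score(reported, 2019)
```

Here three of the five hypothetical indicators have a value between 2015 and 2019, so the goal score for 2019 is 0.6.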

For this pillar, there is a weaker relationship between a country’s income level and its performance. No countries receive a maximum score of 100, which would indicate that they report on every SDG indicator at least once inside a 5 year window. Additionally, no country scores 0, which would indicate that they have reported no information for any of the SDG indicators.

The results for this pillar reflect two important aspects of the performance of a national statistical system, both of which need to be in place for a good score. First, do the SDG indicators required by the globally agreed framework exist for that country? And second, are those indicators available, in the agreed formats, on the UN Global SDG monitoring database? It is likely that some countries are producing more indicators but are not making them available through an internationally comparable database. For OECD countries, we supplemented the UN Global SDG monitoring database with comparable data submitted to the OECD, following the methodology in [@oecdsdg]; this was possible because a clear methodology had been established to do so. Even with this supplemental data from the OECD included, there is considerable room for improvement in reporting on the SDGs.

Indicator values that are country reported, country adjusted, estimated, or included as global monitoring data are counted. Values that were produced by an international organization through modeling are excluded. These classifications are based on the UN SDG metadata, where for each value of the indicator, the responsible international agency has been requested to indicate whether the national data were adjusted, estimated, modeled, or are the result of a harmonized global monitoring exercise. The “nature” of the data classification in the SDG database is determined as follows:

  • Country data (C): Produced and disseminated by the country (including data adjusted by the country to meet international standards);

  • Country data adjusted (CA): Produced and provided by the country, but adjusted by the international agency for international comparability to comply with internationally agreed standards, definitions and classifications;

  • Estimated (E): Estimated based on national data, such as surveys or administrative records, or other sources but on the same variable being estimated, produced by the international agency when country data for some year(s) is not available, when multiple sources exist, or when there are data quality issues;

  • Modeled (M): Modeled by the agency on the basis of other covariates when there is a complete lack of data on the variable being estimated;

  • Global monitoring data (G): Produced on a regular basis by the designated agency for global monitoring, based on country data. There is no corresponding figure at the country level.
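The inclusion rule for these categories can be sketched as a simple filter. The record layout (dictionaries with `indicator`, `year`, and `nature` keys) and the values are hypothetical; only the set of included codes comes from the text above.

```python
# Codes counted toward pillar 3; modeled values ("M") are excluded.
INCLUDED_NATURES = {"C", "CA", "E", "G"}


def usable_values(records):
    """Keep only values whose nature code is counted toward the pillar."""
    return [r for r in records if r["nature"] in INCLUDED_NATURES]


# Hypothetical records for one country.
records = [
    {"indicator": "ind_a", "year": 2017, "nature": "C"},
    {"indicator": "ind_b", "year": 2018, "nature": "M"},
    {"indicator": "ind_c", "year": 2016, "nature": "E"},
]
kept = usable_values(records)
```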

Figure 6.8: Pillar 3: Data Products Scores by Quintile in 2019
 


Figure 6.9: Pillar 3: Data Products Scores by Country and Region in 2019
 


Even countries in the top quintile have significant room for improvement in reporting on the SDGs. The average score for the top 20% of countries is 72.3, while for the bottom 20% it is 48. For a specific goal, such as SDG 1 on poverty, the top 20% receive an average of 0.8 out of a maximum of 1. This means that the top performing countries are only reporting on 80 percent of the SDG 1 indicators between 2015 and 2019. The bottom 20% receive an average score of 0.4. The table below shows the full set of breakdowns by SDG, suggesting that reporting on SDG 5 on gender lags other goals.

Table 6.5: Select Pillar 3 Indicator Scores by SPI Overall Score Quintile Group in 2019

3.0.3.2.4 Pillar 4: Data Sources

Pillar 4 examines whether countries have the data sources available that are necessary to produce statistics for public use. It includes three aspects of data sources: censuses and surveys, administrative data, and geospatial data. Private and citizen generated data is an important area that is at present not incorporated due to the lack of an established source. For censuses and surveys the score reflects both whether a data source exists and how recently the source was produced, as both are needed for accurate reporting of current conditions. For administrative data and geospatial data, proxies regarding the state of these data systems are used.

Figure 6.10: Pillar 4: Data Sources Scores by Quintile in 2019
 


When looking at the country chart, again there is significant variability across countries. No country receives the maximum score of 100, with many countries falling short particularly on our geospatial indicator. Somalia is the only country that receives a score of 0.

Figure 6.11: Pillar 4: Data Sources Scores by Country and Region in 2019
 


Countries in the top 20% have an average score of 72.5, while countries in the bottom 20% have an average score of 22.3. The gap between the top 20% and bottom 20% for the population and housing census indicator is 0.2 points on a scale from 0 to 1. The gap for countries conducting a business/establishment survey is 0.9. The scores on the geospatial indicator are low. The average score for the top 20% is 0.3, while it is 0.1 for the bottom 20%.

Table 6.6: Select Pillar 4 Indicator Scores by SPI Overall Score Quintile Group in 2019

3.0.3.2.5 Pillar 5: Data Infrastructure

The fifth pillar measures whether countries have the hard and soft infrastructure to produce the data sources, data products, and data services to produce useful data. For the pillar 5 sub-score, the only usable data is on a set of ten methods, standards, and classifications. Internationally accepted and recommended methodology, classifications and standards provide the basis for national statistical offices (NSOs) on data integration, facilitating data exchange and providing the foundation for the preparation of relevant statistical indicators. The following methods and standards are considered: System of national accounts in use, National Accounts base year, Classification of national industry, CPI base year, Classification of household consumption, Classification of status of employment, Central government accounting status, Compilation of government finance statistics, Compilation of monetary and financial statistics, and Business process. Data has also been collected on statistical legislation and governance, as well as on finance, but these indicators currently lack adequate country coverage to include in the index.

Countries scoring in the top 20% tend to be concentrated among the high income countries. These top scoring countries have an average score for pillar 5 of 97.1, which is near the maximum score of 100. Countries in the bottom 20% have an average score of 25.4.

Figure 6.12: Pillar 5: Data Infrastructure Scores by Quintile in 2019
 


Several countries score the maximum of 100 in the data infrastructure pillar. One country, Somalia, scores 0 points in this pillar.

Figure 6.13: Pillar 5: Data Infrastructure Scores by Country and Region in 2019
 


Among countries in the top 20%, the average score is near 100 across all the indicators of pillar 5. In the bottom 20%, countries on average score close to zero points for the CPI base year, classification of status of employment, central government accounting status, compilation of government finance statistics, and business process indicators. Bottom 20% countries score above 0.5 points on the system of national accounts in use and compilation of monetary and financial statistics indicators.

Table 6.8: Select Pillar 5 Indicator Scores by SPI Overall Score Quintile Group in 2019

3.0.3.2.6 Correlations between SPI pillars

In the following chart, correlations between the SPI overall score and the individual SPI pillar scores are shown. All pillars are positively correlated with one another. At the same time, no pillar is perfectly correlated with any other pillar; a perfect correlation would indicate that a pillar was not providing any additional information on the statistical performance of countries. The pillar with the single highest correlation with the overall measure is pillar 1 on data use; pillar 3 on data products has the lowest overall correlation.

Figure 6.14: Correlation Between SPI pillars in 2019
 


3.0.3.3 SPI Scores by Dimension and Indicator

While there are large differences across regions, income groups, and countries in the SPI overall score, there are also differences across indicators in the strengths and weaknesses of countries that have the same final score. For instance, some countries may reach a final score by excelling in the area of data sources, while others may reach a final score by excelling in data infrastructure. There are many paths that countries may take to reach an SPI overall score and only by studying the pillars and indicators for a particular country in detail can you understand why a country gets a particular score. To help interpret the scores, it is possible to analyze which of the pillars and indicators are most responsible for differences between countries.

In order to do so, a leave-out approach to calculating the SPI overall scores has been taken, which can help to understand which pillars and indicators are most impactful. The leave-out approach is similar in spirit to a jackknife approach (see [@miller1974jackknife] for an introduction), which has been used in a variety of situations to examine the sensitivity of estimates to leaving out specific observations. Specifically for this case, the leave-out approach consists of sequentially deleting indicators one at a time and recalculating the SPI overall scores. By sequentially omitting indicators, the total difference between the SPI overall score incorporating all indicators and an alternative SPI score that is calculated by omitting a single indicator can be computed. The pillars and indicators can then be ranked, based on which produce the greatest total difference between the SPI overall score and the alternative score.

Specifically, the approach is as follows:

  1. Calculate the SPI overall score for each country using all dimensions and indicators, \(SPI.INDEX_{c}\).

  2. For each indicator \(j=1,...,J\) do:

  • Generate an alternative SPI score omitting indicator \(j\), \(SPI.INDEX^{-j}_{c}\).

  • Calculate the difference between the original score and the alternative score for each country \(c\) in a given year, \(e_{c}^{-j}=SPI.INDEX_{c}-SPI.INDEX^{-j}_{c}\).

  • Calculate the mean absolute difference across all \(N_c\) countries, \(E^{-j}=\sum_{c=1}^{N_c} |e_{c}^{-j}|/N_c\).

  3. Sort indicators based on \(E^{-j}\).
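The steps above can be sketched on a toy nested structure. This is an illustration under stated assumptions, not the published implementation: the structure (a list of pillars, each a list of dimensions, each a list of indicator names), the country data, and the function names are hypothetical; every level uses an unweighted mean, ignoring the pillar 3 and pillar 4 weights; and a dimension left empty by the omission is assumed to drop out of its pillar.

```python
def leaves(node):
    """List the indicator names under a node of the nested structure."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def nested_mean(node, values, omit=None):
    """Unweighted mean at every level; the omitted indicator is skipped."""
    if isinstance(node, str):
        return None if node == omit else values[node]
    scores = [nested_mean(child, values, omit) for child in node]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


def importance(structure, data):
    """Mean absolute change in the index when each indicator is left out."""
    result = {}
    for j in leaves(structure):
        diffs = [
            abs(nested_mean(structure, v) - nested_mean(structure, v, omit=j))
            for v in data.values()
        ]
        result[j] = sum(diffs) / len(diffs)
    # Step 3: sort indicators from largest to smallest difference.
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


# Two pillars: the first has one dimension (a, b), the second has two (c), (d).
structure = [[["a", "b"]], [["c"], ["d"]]]
data = {
    "X": {"a": 1.0, "b": 0.0, "c": 1.0, "d": 1.0},
    "Y": {"a": 1.0, "b": 0.0, "c": 0.0, "d": 1.0},
}
ranking = importance(structure, data)
```

In this toy example, omitting `a` or `b` changes the score of both countries, while omitting `c` or `d` changes only country Y's score, so `a` and `b` rank above `c` and `d`.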

Figure 6.15: SPI Indicator Importance for Top 10 and Bottom 10 Indicators in 2019

 


The indicator that drives the single largest difference between the SPI overall score and an alternative score omitting the indicator is the NADA metadata availability indicator in dimension 2.4 on data access services. This is closely followed by the indicator on whether geospatial information is available and the indicator on whether a complete civil registration and vital statistics (CRVS) system is available. The indicators creating the smallest difference when omitted tend to be the individual indicators in pillar 3 on data products. The indicator with the single smallest difference is in Pillar 3:4 on SDG 4.

3.0.4 Analysis

3.0.4.1 Unique Values

As a check of the data, the number of unique scores for the SPI overall score is calculated. If the SPI overall score produces a large number of tied scores, for instance, then the index will be less able to distinguish between the statistical performance of countries. When calculating the number of unique values for 2019, it is found that there are 174 unique scores for 174 countries. This means there are 0 tied values.
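This check amounts to counting distinct values, as in the sketch below. The function name, the optional rounding argument, and the scores are hypothetical; the precision at which scores are compared is not stated here, so no rounding is applied by default.

```python
def count_unique(scores, decimals=None):
    """Return (number of unique scores, number of tied scores).

    Tied scores are counted as scores beyond the first occurrence of a
    value. If decimals is given, scores are rounded before comparison.
    """
    if decimals is not None:
        scores = [round(s, decimals) for s in scores]
    n_unique = len(set(scores))
    return n_unique, len(scores) - n_unique


# Hypothetical overall scores for four countries, two of them tied.
n_unique, n_tied = count_unique([86.4, 37.5, 86.4, 50.0])
```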

When looking at each specific pillar, there are only 18 unique scores for pillar 1 on data use. The data use score currently comes solely from dimension 1.5 on data use by international organizations. For pillar 2, there are 163 unique scores. For pillar 3, there are 173 unique scores. There are 173 unique scores for pillar 4, and there are 20 unique scores for pillar 5.

3.0.4.2 Relationship to GDP Per Capita and the Human Capital Index

The correlations of the SPI overall score with the log of GDP per capita and with the World Bank’s Human Capital Index (World Bank 2020) provide a face validity check between the SPI index and other outcomes. This analysis is not meant to assert a causal relationship, only to assess whether the SPI index is correlated with other outcomes in ways that might be expected. GDP per capita comes from the World Bank’s World Development Indicators (WDI) database (NY.GDP.PCAP.KD). The GDP per capita numbers are in constant 2010 US$.

A strong positive relationship would be expected between the GDP per capita of countries and the performance of their statistical systems, as higher income countries tend to have more resources available for statistical production. In fact, there is a strong relationship between the two. The correlation in 2019 between log GDP per capita and the SPI overall score is 0.66.6
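A correlation of this kind can be computed as in the sketch below, which uses the Pearson correlation coefficient and the natural logarithm. Both are assumptions, since the text does not name the coefficient or the log base, and the GDP and SPI values are hypothetical rather than taken from the data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical GDP per capita (constant US$) and SPI overall scores.
gdp_per_capita = [800.0, 4500.0, 12000.0, 48000.0]
spi_scores = [41.0, 58.0, 66.0, 85.0]

r = pearson([math.log(g) for g in gdp_per_capita], spi_scores)
```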

Another measure of a country’s development is the Human Capital Index (HCI) developed by the World Bank ([@hci2019]). The Human Capital Index is designed to capture the amount of human capital a child born today can expect to attain by age 18 in a country. The index combines a country’s child mortality, learning adjusted years of schooling, adult survival rates, and stunting into one index.7 Again, a strong positive relationship between a country’s HCI value and its SPI overall score might be expected, as countries with a more developed human capital stock are likely to have greater capacity to produce statistics. Again, this is what is seen. The correlation between the 2018 value of the HCI (the latest value available at the time of this writing) and the 2018 value of the SPI overall score is 0.79.

The scatter plot of the relationship between log GDP per capita, the HCI, and our SPI overall score for the years 2016-2019 is shown below. In general, countries with higher per capita income and higher levels of human capital tend to have better performing statistical systems according to the SPI measure.

Figure 7.1: Plot of SPI overall score on Human Capital Index and GDP per capita
 


To highlight countries where these relationships do not hold as well, the next figure shows the 15 countries that most over-perform and the 15 countries that most under-perform on the SPI Index compared to their levels of GDP per capita and the Human Capital Index. A perfect fit between the SPI overall score and GDP per capita and the Human Capital Index is not to be expected, as countries differ in the resources put into their statistical system, even conditional on their levels of development. Highlighting outliers can sometimes be a useful exercise for determining whether a measure is identifying on the ground realities.

In order to produce this figure, an OLS regression of the SPI overall score in 2019 on log GDP per capita has been estimated. The residual, which can be interpreted as the difference between a country’s SPI overall score and the score expected based on its GDP per capita, has then been calculated. Countries with residuals greater than zero are over-performing relative to their GDP per capita, and countries with residuals less than zero are under-performing. The corresponding figure for the Human Capital Index is calculated similarly.
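The residual calculation can be sketched with a closed-form simple regression. The country labels and values are hypothetical, and the natural logarithm is assumed; the published figure ranks the top and bottom 15 countries, whereas this toy example has only four.

```python
import math


def ols_residuals(x, y):
    """Residuals from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical countries, GDP per capita, and SPI overall scores.
countries = ["A", "B", "C", "D"]
log_gdp = [math.log(g) for g in [800.0, 4500.0, 12000.0, 48000.0]]
spi = [41.0, 66.0, 58.0, 85.0]

residuals = dict(zip(countries, ols_residuals(log_gdp, spi)))

# Positive residuals over-perform relative to GDP per capita.
ranked = sorted(residuals, key=residuals.get, reverse=True)
over_performers = [c for c in ranked if residuals[c] > 0]
under_performers = [c for c in ranked if residuals[c] < 0]
```

Because the regression includes an intercept, the residuals sum to zero, so some countries always appear as over-performers and others as under-performers.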

This figure identifies some countries that appear to have better performing statistical systems than might be expected (Rwanda, Uganda, Egypt, Mexico, Philippines, Armenia, and Turkey appear in green on both charts). There are also countries that appear to have poorer performing statistical systems than expected; for example, several small island states appear in red on both charts.

Figure 7.2: Top 15 Over/Under-Performers on SPI overall score compared to GDP per capita and Human Capital Index in 2019

3.0.4.3 Relationship to Government Effectiveness

A common justification for improving statistical systems is that doing so can lead to better governance. Without good statistics, countries may be flying blind on where to target resources to improve the public welfare. Also, good statistics can help hold public officials accountable for progress toward reaching a country’s goals. This section analyzes the relationship between the SPI overall score and an estimate of government effectiveness produced by the Worldwide Governance Indicators (WGI). A strong relationship between the SPI measure of statistical performance and the WGI measure of government effectiveness is found.

[@kraay2010worldwide] produce a set of Worldwide Governance Indicators, including a measure of government effectiveness. According to the WGI metadata, the government effectiveness indicator captures perceptions of the quality of public services, the quality of the civil service and the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government’s commitment to such policies. The estimate gives the country’s score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.⁸ The government effectiveness indicator is available from 1996 to 2019.

There is a strong relationship between the SPI overall score and the government effectiveness indicator. The correlation between the two in 2019 is 0.77. The scatterplot below shows the relationship between the SPI overall score and the government effectiveness indicator.

Figure 7.3: Plot of SPI overall score on Government Effectiveness in 2019

To tease out to what extent the relationship between the government effectiveness indicator and the SPI is due to other factors such as income or regional characteristics, the results from an OLS regression are presented below. While it is acknowledged that a more detailed study could be conducted to better understand the processes relating the two, a relationship is found between government effectiveness and statistical performance after accounting for income and regional characteristics of a country.

The regression model used takes the following form:

\[ G_{ctr} = \alpha_t + \gamma_r + \beta SPI.INDEX_{ctr} + \theta X_{ctr} + \epsilon_{ctr} \]

where \(G_{ctr}\) is the government effectiveness estimate for country c, in time period t, and region r. \(SPI.INDEX_{ctr}\) is the SPI overall score. \(X_{ctr}\) is a set of control variables, which includes log GDP per capita from the World Bank WDI. \(\epsilon_{ctr}\) is the error term. \(\alpha_t\) is an indicator variable for each year and \(\gamma_r\) is a regional indicator variable.
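To make the specification concrete, the sketch below estimates a regression of this form on a small synthetic panel using only the Python standard library. Everything in it is an assumption made for illustration: the panel values, the "true" parameters used to generate the outcome, and the helper names `solve` and `ols`. Year and region indicators enter as dummy variables, with 2016 and region R1 as the omitted categories. It does not reproduce the chapter's estimates or its standard errors.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(design, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic panel: (year, region, SPI overall score, log GDP per capita).
panel = [
    (2016, "R1", 40.0, 7.0), (2016, "R1", 55.0, 9.0),
    (2016, "R2", 60.0, 8.0), (2016, "R2", 80.0, 9.5),
    (2017, "R1", 45.0, 8.2), (2017, "R1", 58.0, 7.4),
    (2017, "R2", 62.0, 9.3), (2017, "R2", 85.0, 8.1),
]

# Made-up parameters, used only to generate a noise-free outcome.
alpha = {2016: 0.0, 2017: 0.1}      # year effects
gamma = {"R1": -1.5, "R2": -1.2}    # region effects
beta_true, theta_true = 0.02, 0.05

g = [alpha[t] + gamma[r] + beta_true * spi + theta_true * gdp
     for t, r, spi, gdp in panel]

# Columns: intercept, 2017 dummy, R2 dummy, SPI score, log GDP per capita.
design = [[1.0, float(t == 2017), float(r == "R2"), spi, gdp]
          for t, r, spi, gdp in panel]

coefs = ols(design, g)
beta_hat = coefs[3]  # coefficient on the SPI overall score
print(round(beta_hat, 4))
```

Since the synthetic outcome contains no noise, the fitted coefficient on the SPI score recovers the value used to generate the data, which is a convenient way to confirm that the dummy-variable setup matches the equation.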

The table below shows the estimate of \(\beta\), the coefficient on the statistical performance measure in the government effectiveness regression. Full results from this regression are shown in a table in the appendix. The estimated coefficient is statistically significant at the 0.1% level, and implies that a 10 point increase in the SPI overall score is associated with a 0.2 standard deviation increase in government effectiveness. For context, an increase of this size would take a country from the median in terms of government effectiveness to roughly the 58th percentile.
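The percentile statement can be checked with a short calculation. The sketch assumes, as the WGI description above states, that the government effectiveness estimate is in standard normal units, so a country at the median sits at zero. The per-point coefficient is not quoted in the text; `implied_beta` below is simply backed out from the reported 0.2 standard deviation change per 10 points.

```python
from statistics import NormalDist

points = 10          # SPI change considered in the text
effect_in_sd = 0.2   # associated change in government effectiveness (from the text)

# Per-point coefficient implied by the two numbers above (an inference, not a
# figure reported in the chapter).
implied_beta = effect_in_sd / points

# Starting at the median (0 in standard normal units), the normal CDF converts
# the new score into a percentile.
percentile_after = NormalDist().cdf(0.0 + effect_in_sd) * 100
print(round(implied_beta, 3), round(percentile_after, 1))
```

The resulting percentile is just under 58, consistent with the figure quoted above.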

Table 7.1: Linear Regressions of Government Effectiveness Score on SPI Overall Score from 2016-19

3.0.4.4 Country Changes in the Index

In order to assess how stable the index values are over time, comparisons have been made between the index values in 2016 and the 2019 values. Overall, the SPI overall score is quite stable over time. The correlation between the 2016 value and the 2019 value is 0.96.

Figure 7.4: Scatterplot of 2019 SPI overall score & 2016 SPI overall score

While the scores were relatively stable over time, some countries did see large improvements in their score from 2016 to 2019. The country that improved most on the index over this period was Myanmar, which improved by 20.3 points out of 100. The table below shows the changes in the SPI overall score for the 10 largest improvers.
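Both checks in this subsection, the correlation between the two years and the ranking of improvers, are simple to compute. The sketch below uses hypothetical scores for four made-up countries; the `pearson` helper and all values are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical SPI overall scores for the same countries in two years.
spi_2016 = {"A": 40.0, "B": 55.0, "C": 70.0, "D": 85.0}
spi_2019 = {"A": 60.0, "B": 57.0, "C": 74.0, "D": 86.0}

countries = sorted(spi_2016)
r = pearson([spi_2016[c] for c in countries],
            [spi_2019[c] for c in countries])

# Change in score per country, largest improvement first.
changes = {c: spi_2019[c] - spi_2016[c] for c in countries}
top_improvers = sorted(changes, key=changes.get, reverse=True)[:3]
print(round(r, 3), top_improvers)
```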

Table 7.2: Top 10 Countries with Largest Changes from 2016-2019.

3.0.4.5 Density Plots

As another check, the distribution of scores across countries for each year is plotted and compared to a normal distribution. This exercise checks whether the distribution of the SPI overall scores contains significant skew or fat tails. There is some indication of bunching near the top of the distribution of SPI scores. This is due to a large number of OECD countries possessing similar scores, which is not unexpected, as the OECD requires member countries to adhere to several methodological standards and to regularly report on a large set of indicators. The OECD also includes several of the highest income countries, which tend to be on the frontier of statistical production.

Figure 7.5: Distribution of SPI overall score across Countries
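A numerical counterpart to this visual check is to compute the sample skewness and excess kurtosis, both of which are zero for a normal distribution. The sketch below uses moment-based (population) formulas and hypothetical scores; the chapter itself relies on density plots, so this is offered only as a complementary illustration.

```python
from statistics import mean


def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal)."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical scores bunched near the top of the 0-100 range.
bunched_scores = [30, 50, 60, 70, 80, 85, 88, 90, 90, 91]
skew, excess_kurt = skew_and_excess_kurtosis(bunched_scores)

# Negative skewness indicates a long left tail, i.e. bunching at high scores.
print(round(skew, 2), round(excess_kurt, 2))
```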

3.0.5 Conclusion

The new Statistical Performance Indicators (SPI) will replace the Statistical Capacity Index (SCI), which the World Bank has regularly published since 2004. Although the goal is the same, to offer a better tool to measure the statistical systems of countries, the new SPI framework has expanded into new areas, including data use, administrative data, geospatial data, data services, and data infrastructure. The SPI provides a framework that can help countries measure where they stand in several dimensions and offers an ambitious measurement agenda for the international community.

The goal of the SPI is to offer a framework that is forward looking, measures less mature statistical systems as well as advanced systems, covers the broader national statistical system beyond the National Statistical Office (NSO), and gives countries incentives to build a modern statistical system. The project uses open data and open code to build confidence in the work. The data will also be updated on a yearly basis to track progress over time.

More research and data collection are, however, needed to improve measurement. Several of the dimensions of the SPI do not have measurable indicators yet. One of the functions of the dashboard is to motivate action on the part of the international community to help improve the collection of data so that these areas can be better measured.

Countries and international organizations need to know the current capabilities of national statistical systems, and the new SPI is meant to provide insights into this. Through the SPI, there is the potential for countries and donors to create mechanisms for learning from their peers to develop a virtuous cycle of investment, effectiveness, innovation, value added, and impact.

3.0.6 Appendix

3.0.6.1 Country SPI overall scores

Below, the full list of countries by their SPI overall score in 2019 is presented. The first column is the country name, and the following columns are the SPI overall score and then the sub-scores for pillars 1, 2, 3, 4, and 5.

The purpose of the SPI is to help countries assess and improve the performance of their statistical systems. The presentation of SPI overall scores is designed to reflect that aim. Small differences between countries should not be highlighted since they are likely to reflect imprecision arising from the methodology rather than meaningful differences in performance. Instead, presentation of overall SPI scores focuses on larger groupings of countries reflecting broad categories of performance as measured by the indicator framework.

Countries shaded in dark orange are the lowest performing, countries in dark green are the highest performing. Countries are grouped into five groups:

  • Top Quintile: Countries in the top quintile are classified in this group. Shading in dark green.
  • 4th Quintile: Countries in the 4th quintile, or those above the 60th percentile but below the 80th percentile, are in this group. Shading in light green.
  • 3rd Quintile: Countries in the 3rd quintile, or those between the 40th and 60th percentile, are classified in this group. Shading in yellow.
  • 2nd Quintile: Countries in the 2nd quintile, or those above the 20th percentile but below the 40th percentile, are in this group. Shading in light orange.
  • Bottom 20%: Countries in the bottom 20% are classified in this group. Shading in dark orange.
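The grouping above can be sketched as a rank-based assignment. The group labels and shading colours come from the list; the function name, the hypothetical scores, and the treatment of ties and boundary cases (countries are ranked from lowest to highest and split into five equal-sized bins) are assumptions, since the text does not specify them.

```python
# (label, shading) pairs, ordered from lowest to highest performing group.
GROUPS = [
    ("Bottom 20%", "dark orange"),
    ("2nd Quintile", "light orange"),
    ("3rd Quintile", "yellow"),
    ("4th Quintile", "light green"),
    ("Top Quintile", "dark green"),
]


def assign_quintiles(scores):
    """Map each country to (group label, shading) by its rank among all scores."""
    ordered = sorted(scores, key=scores.get)  # lowest score first
    n = len(ordered)
    result = {}
    for rank, country in enumerate(ordered):
        group_index = (5 * rank) // n  # 0..4, because rank < n
        result[country] = GROUPS[group_index]
    return result


# Hypothetical SPI overall scores for ten made-up countries.
scores = {f"Country {i}": 30.0 + 6.5 * i for i in range(10)}
groups = assign_quintiles(scores)
for country, (label, shade) in groups.items():
    print(country, label, shade)
```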

Table A.1: SPI overall score in 2019 and Pillar Scores

3.0.6.2 Country SPI overall scores over time

Table A.2: SPI overall scores over time

3.0.6.3 SPI Overall and Pillar Scores by Region, Income, & Lending

Table A.3: Table of SPI Overall and Pillar Scores by Region in 2019

Table A.4: Table of SPI Overall and Pillar Scores by Income Group in 2019

Table A.5: Table of SPI Overall and Pillar Scores by Lending Group in 2019

Figure A.1: SPI Pillar 1 - Data Use Score - By Region in 2019

Figure A.2: SPI Pillar 2 - Data Services Score - By Region in 2019

Figure A.3: SPI Pillar 3 - Data Products Score - By Region in 2019

Figure A.4: SPI Pillar 4 - Data Sources Score - By Region in 2019

Figure A.5: SPI Pillar 5 - Data Infrastructure Score - By Region in 2019

3.0.6.4 Comparison to other measures of statistical performance

Next, the SPI is compared with several other indices of statistical performance. This provides a sense of how rankings differ across measures, how they correlate with other outcomes, and how the distributions of scores compare. These are the SCI, the Open Data Watch index (ODIN), and version 0 of the SPI overall score that was produced in [@cameron2019measuring].

We first compare the SPI overall score to the older World Bank Statistical Capacity Index. The correlation between the SPI overall score and the SCI is 0.765.

Figure A.6: Scatterplot of Statistical Capacity Index (SCI) and SPI Overall Score

Next, the relationship between log GDP per capita, the SCI, and the SPI overall score is shown using linear regression. The Open Data Watch ODIN score and the version of the Statistical Performance Index developed in [@cameron2019measuring] are also included. While a strong relationship between an index and log GDP per capita does not mean the index is necessarily correct, and is certainly not necessarily causal, it does provide a face validity check of the index. Heteroskedasticity-robust standard errors are shown in the table.

Overall, the new SPI overall score has the strongest relationship to GDP per capita. The linear regression estimates indicate that a 1% increase in GDP per capita is associated with a 0.1 point increase in the SPI overall score. The r-squared from this regression is 0.38.
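The "1% increase" reading follows from regressing a score in levels on the log of GDP per capita: a 1% rise in GDP per capita changes the fitted score by the coefficient times ln(1.01), which is close to the coefficient divided by 100. The sketch below backs out the coefficient implied by the 0.1 point figure; the coefficient itself is not quoted in this section, so `implied_coef` is an inference, and the doubling example is added only for scale.

```python
import math

points_per_one_percent = 0.1  # from the text

# In a level-log regression, score = a + b * ln(GDP per capita), so a 1%
# increase in GDP per capita changes the fitted score by b * ln(1.01).
implied_coef = points_per_one_percent / math.log(1.01)

# The same coefficient implies this change in the score when GDP per capita doubles.
change_if_doubled = implied_coef * math.log(2)
print(round(implied_coef, 2), round(change_if_doubled, 2))
```

Under these assumptions the implied coefficient is about 10, so a doubling of GDP per capita would be associated with an increase of roughly 7 points in the SPI overall score.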

Table A.6: Relationship between Statistical Performance Measures and GDP per capita.


We compare the SPI overall score to the [Open Data Watch rankings of country statistical systems](https://odin.opendatawatch.com/report/rankings). The correlation between the SPI overall score and the ODIN index is 0.84.

As a final comparison, the new SPI overall score is compared with the index developed by [@cameron2019measuring], which can be thought of as version 0 of the SPI overall score. The authors use similar data sources for their index, but there are some differences.

First, the similarities. The methodology for constructing the index is the same. Also, the censuses and surveys (Indicator 4.1) and standards and methods (Indicator 5.2) indicators are identical. The indicators for data releases (Indicator 2.1) and data services (Indicator 2.4) are drawn from the information collected in the fourth dimension, on dissemination practices, of version 0 of the SPI in Cameron et al. (2019). Finally, both indices include an indicator for Civil Registration and Vital Statistics (CRVS). The CRVS indicator is in the administrative data section of the new SPI, while it was in the standards, methods, and classifications section of SPI version 0.

As for the differences, SPI version 0 had four dimensions, namely: (i) Methodology, Standards and Classifications (MSC), which provides information on the technology being used by the NSS; (ii) Census and Surveys (CS), which describes the intermediate products of the NSS; (iii) Availability of Key Indicators (AKI), which focuses on key final products needed for policy; and (iv) Dissemination Practices and Openness (DPO), which evaluates the extent to which products are publicly disseminated. The AKI dimension from version 0 is conceptually similar to the Data Products dimension in the new SPI but uses different sources of data, and the DPO dimension is similar to the SPI data services section but draws on different sources in some cases.

The correlation between the SPI overall score and the version 0 index is 0.933.

Figure A.7: Scatterplot of new SPI overall score on SPI Version 0 Index

Other Tables and Figures

Table A.7: Linear Regressions of Government Effectiveness Score on SPI Overall Score from 2016-19

3.0.6.5 Indicator Metadata

Table A.8: SPI Indicator Metadata

3.0.7 References


  1. SDG 14 - Life Below Water - is omitted because land-locked countries do not report on these indicators.↩︎

  2. An earlier version of the journal publication was released as a World Bank Policy Research paper, [@cameron2019measuring]↩︎

  3. All tables and figures show unweighted summary statistics (i.e. the summary statistics do not weight by population).↩︎

  4. For more information, see https://ddialliance.org/↩︎

  5. To understand what effect taking the log has on this correlation, the correlation in 2019 between (non-logged) GDP per capita and the SPI overall score is 0.58.↩︎

  6. For more details, visit the Human Capital Index website↩︎

  7. Detailed documentation of the WGI, interactive tools for exploring the data, and full access to the underlying source data are available at www.govindicators.org. The WGI are produced by Daniel Kaufmann (Natural Resource Governance Institute and Brookings Institution) and Aart Kraay (World Bank Development Research Group).↩︎