1 Background

This document builds upon research first presented in the discussion paper Options for Improving Use of ESG for Sovereign Bond Analysis (World Bank, 2018).

Interviews with ESG data providers found that most obtain at least some and often a substantial amount of data from World Bank databases. 137 indicators were specifically identified from these interviews, of which 127 could be mapped to active databases in the World Bank’s data API, enabling the authors to perform a rapid assessment of data coverage and gaps, as included in the 2018 discussion paper. The 2018 paper found that “data coverage is a significant issue in WBG data used for ESG.” Looking at most recent available values (MRVs) by indicator and country, the paper found that just 41 ESG indicators (out of 127) had a value from 2017 or later (1 year old) for at least 50% of countries; 74 ESG indicators had a value from 2015 or later (3 years old) for at least 50% of countries. However, while the 2018 paper suggested a set of options for improving the availability and usefulness of ESG data, it stopped short of further investigating the reasons that might give rise to gaps in data coverage or suggesting specific strategies to address them.

The objective of this report is to pick up where the previous paper left off, and to better understand the circumstances that explain gaps in data coverage. The hope is that with a better understanding of why gaps occur and the significance of various explanatory factors, effective steps can be established to eliminate or mitigate gaps, and better understand which kinds of gaps are most relevant for ESG analysis.

The study set of ESG indicators in this report is different than the one in the 2018 report. Whereas the 2018 report excluded indicators used by providers that the Bank no longer actively maintains, this report includes those since they are relevant to the analysis. Additionally, this report includes indicators from various products introduced since the 2018 report, including the World Bank’s own curated ESG dataset. Conversely, we decided to remove a subset of indicators used by a single provider because the strong similarities among them (e.g., very similar trade or debt measures) were resulting in double-counting that could potentially skew the findings. Accordingly, this report is based on a body of 134 indicators, compared to 137 in the 2018 report.

The other major difference between the indicators in the two reports is that many of them have been updated since the 2018 report was completed. Many statistical indicators have been updated several times. Accordingly, if the 2018 analysis were re-run using the indicators from this analysis, the findings would likely be quite different, and different yet again if the analysis were run a year hence. One of the goals of this analysis is to provide a framework for thinking about data availability and coverage that is reasonably independent of the data curation cycles for the indicators under study.

1.1 What is a “Data Gap”?

The term “data gap” is somewhat ambiguous, so we should start by discussing what kinds of gaps can exist in datasets. For instance, data could be unavailable for a number of relevant economies, or there could be gaps in the time series over a relevant time period. There could also be gaps in metadata and other documentation. Data could also simply be unavailable or undefined for important concepts (such as “resilience”), necessitating the use of data proxies. While all of these are potentially relevant, the most important gaps in the context of ESG likely involve the most recently available values compared to the current time period, since ESG analysis concerns investment decisions being made today and in the near future. Accordingly, this paper defines a “data gap” as a significant difference between the current calendar year and the most recent available value(s) (MRVs) for the indicators and economies under study. Gaps in metadata or in time periods before the MRV are not a primary focus of this analysis.

1.2 How This Paper is Structured

This paper applies three separate approaches to better understand coverage gaps in ESG indicators:

Coverage Analysis. This approach provides a more detailed and visual picture of temporal gaps in the ESG indicator set, both historically and by MRV.
Explanation Framework Analysis. This approach sets out a set of reasons why data gaps might occur as a framework for classifying ESG indicators, and looking at what approaches might be used to mitigate coverage issues.
Variance Analysis. This approach looks at the temporal variance of ESG indicators to better understand the impacts of missing data for analysis. It may be possible to impute missing observations for indicators with low variance, mitigating the impact of data gaps.

The paper then concludes with a discussion section and set of recommendations based on the analysis and findings of each of these sections.

1.3 About the Data Used in This Report

The indicator database used in this report consists of 134 indicators extracted from the World Development Indicators and other World Bank Databases in October, 2019.

Table 1.1 provides a summary of the 134 indicators analyzed in this report disaggregated by pillar and origin. 44 indicators are environmental indicators, 66 are social indicators, and 24 are governance indicators. The World Bank is the primary source of 36 indicators, whereas the UN system is primary source of 66 indicators, and other organizations are the source for 32 indicators.

Table 1.1: Indicators by origin and sector

Origin	Env	Soc	Gov	Total
WBG	10	12	14	36
UN System	9	49	8	66
Other orgs	25	5	2	32
Total	44	66	24	134

Unless otherwise noted, the study period is limited to 2000-2018 since collection of 2019 data was still in its early stages at the time of compilation. 4 indicators include only projections data for the year 2050, and thus have been excluded from analysis unless otherwise noted. Another 15 indicators have been dropped or deprecated and, except as noted in the chapter on “Explanation Framework Analysis,” have also been excluded, leaving 115 indicators as the primary focus of analysis.