Foreword

For most of our history, data have been scarce and expensive to obtain, and economic research has advanced at a slow pace, especially in data-poor countries. The high cost of impact evaluations, for example, has mainly been determined by the cost of collecting data. This is rapidly changing because of recent advances in technology (see the World Bank’s World Development Report 2021). Technologies such as smartphones, mobile networks, e-services, and e-government have changed the process through which data are generated and have created massive amounts of real-time transaction-level data, with more data being generated in the last two years than in all human history combined.

Yet most data are unused, especially in the public sector and in low-capacity environments. The scarcity we now face is of a different nature: we are limited by high-quality research skills and institutional capabilities for making sense of all the data and turning them into a resource for human progress.

Building knowledge takes time, and evidence, especially in development economics, is scarce and difficult to generate. Realistically, evidence informs only a tiny proportion of the policy decisions being made every day. Making steady progress toward bridging the knowledge gap requires taking research from a microentrepreneurial to a corporate mode of delivery. At Development Impact Evaluation (DIME), we have invested in a research production function and workflows that capitalize on scale and specialization in the delivery of research. In so doing, we have lowered the costs of research while optimizing the quantity and quality of research output. We set up the DIME Analytics team as an institutional solution to increase the quality of data collection and analysis for our research portfolio. This has been a high-return investment. The team has helped transform the way we work and, in the process, created a host of open-source resources like the DIME Wiki, toolkits, and reproducibility protocols that have been made available to the wider community of development economics researchers and practitioners.

The provision of tools and innovations that help improve the global production of development economics research is, we feel, part and parcel of the responsibility of institutions like the World Bank and other academic centers of excellence that have the organization and resources to create and deliver public knowledge goods. In so doing, we hope to contribute to building capacities both internally and in the rest of the global research community, especially for those that may not operate with the same level of organization and resources as we do.

Although research is a great organizing principle for creating and extracting value from data, it is very rare for data to be ready for use. Identifying sources, obtaining permissions, integrating data from different sources, and triangulating and ground-truthing to understand biases in coverage and representativeness are all necessary steps to developing an understanding of how data quality can be improved, what the data can be used for, and how results should be interpreted. A high level of technical specialization and the right combination of disciplines brought into a research team can significantly increase the quality of data and economic research.

At DIME, as we have made steady progress on the great puzzle of creating and using data responsibly and have invested in the quality of data and research for development, we have made our tools available and open source. Now we feel we should go one step further. The idea behind this handbook is to provide a step-by-step guide to high-quality, reproducible data work over the full life cycle of an empirical research project. The book is directed at development researchers all over the world, to be read cover to cover or as a desk reference as needs arise.

Arianna Legovini
Adviser, Development Impact Evaluation (DIME)
World Bank