Introduction
In the past decade, there has been a substantial increase in the availability of socio-economic data accessible to researchers and policymakers. Simultaneously, advancements in tools and methodologies have empowered users to leverage these datasets. This surge provides unparalleled opportunities for the research community and development practitioners to enhance the utilization and value of existing data.
Data that were initially collected with one intention can be reused for a completely different purpose. (…) Because the potential of data to serve a productive use is essentially limitless, enabling the reuse and repurposing of data is critical if data are to lead to better lives. (Source: World Bank, World Development Report 2021)
Despite this potential, challenges persist in finding, accessing, and effectively using data, leaving numerous valuable datasets underutilized. Data repositories, libraries, and the associated catalogs play a pivotal role in rendering data more discoverable, visible, and usable. However, many of these catalogs are built upon sub-optimal standards and technological solutions, resulting in limited findability and visibility of their assets. Addressing these shortcomings requires the development of a more robust marketplace for data, modeled after successful e-commerce platforms that efficiently serve both buyers and sellers.
In the context of a marketplace for data, the “buyers” are the data users, and the “sellers” are the organizations that own or curate datasets, aiming to make them available to users, ideally free of charge. To accomplish this, data platforms must be optimized to offer users convenient ways to identify, locate, and acquire data, necessitating the implementation of a user-friendly search and recommendation system. Furthermore, these platforms should provide data owners with a trustworthy mechanism to make their datasets visible, discoverable, and shareable in a cost-effective, convenient, and secure manner.
The achievement of these objectives hinges on the use of detailed and structured metadata that accurately describe the data products. Notably, search algorithms and recommender systems rely on metadata rather than raw data. Metadata play a crucial role in establishing the credibility, discoverability, visibility, and usability of data. Adopting metadata standards is a pragmatic and effective way to ensure the comprehensiveness and quality of metadata. A metadata standard, also referred to as a metadata schema, comprises a carefully organized set of clearly defined elements designed for documenting a dataset. The standard is accompanied by rules and instructions that ensure its elements are implemented uniformly and consistently.
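To make the idea of a metadata schema concrete, the following minimal sketch shows a set of defined elements together with simple rules (which elements are required, and what type each must have) and a function that checks a metadata record against them. The element names and the `validate` helper are hypothetical and chosen for illustration only; they are not taken from any of the standards recommended in this Guide.

```python
# Hypothetical, illustrative schema. The element names below are assumptions
# made for this example, not elements of any standard described in the Guide.
SCHEMA = {
    "title":    {"type": str,  "required": True},
    "producer": {"type": str,  "required": True},
    "year":     {"type": int,  "required": True},
    "keywords": {"type": list, "required": False},
}


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Check every element defined by the schema, in schema order.
    for name, rule in schema.items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing required element: {name}")
            continue
        if not isinstance(record[name], rule["type"]):
            expected = rule["type"].__name__
            problems.append(f"element {name} should be of type {expected}")
    # Flag elements that the schema does not define.
    for name in record:
        if name not in schema:
            problems.append(f"unknown element: {name}")
    return problems


if __name__ == "__main__":
    record = {"title": "Household Survey", "year": "2020"}
    for problem in validate(record):
        print(problem)
```

Because every record is checked against the same element definitions, a catalog that applies such rules can rely on the presence and type of each element when indexing and searching, which is the practical benefit of a shared schema.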
This Guide puts forth a set of recommended metadata standards covering various data types, offering guidance for their implementation. The proposed standards are those selected and adapted by the World Bank’s Office of the Chief Statistician, and used for data curation at the World Bank. Encompassing microdata, statistical tables, indicators and time series, geographic datasets, text, images, video recordings, programs, and scripts, these standards aim to elevate the quality and effectiveness of metadata across diverse data categories. The Guide adheres to the FAIR Guiding Principles for scientific data management and stewardship, aiming to improve the Findability, Accessibility, Interoperability, and Reusability of digital data assets.
Chapter 1 of the Guide outlines the challenges associated with finding and using data. Chapter 2 describes the essential features of a modern data catalog, and Chapter 3 explains how rich and structured metadata, compliant with the metadata standards we describe in the Guide, can enable advanced search algorithms and recommender systems. Chapter 4 provides practical guidelines for the production and publishing of structured metadata. Finally, Chapters 5 to 14 present the recommended standards, along with examples of their use.
This Guide was produced by the Office of the World Bank Chief Statistician as a reference guide for World Bank staff and for partners involved in the curation and dissemination of data related to social and economic development. The standards it describes are used by the World Bank in its data management and dissemination systems, and for the development of systems and tools for the acquisition, documentation, cataloguing, and dissemination of data. Among these tools are a specialized Metadata Editor, designed to facilitate the documentation of datasets in compliance with the recommended standards, and a cataloguing application (“NADA”). Both applications are openly available.