[DRAFT - WORK IN PROGRESS] Metadata Standards for Improved Data Discoverability and Usability

Annex 1: References and links

Documents

  • Asian Development Bank (ADB). 2021. Mapping the Spatial Distribution of Poverty Using Satellite Imagery in Thailand. ISBN 978-92-9262-768-3 (print), 978-92-9262-769-0 (electronic), 978-92-9262-770-6 (ebook). Publication Stock No. TCS210112-2. DOI: http://dx.doi.org/10.22617/TCS210112-2

  • Balashankar, A., L. Subramanian, and S.P. Fraiberger. 2021. Fine-grained prediction of food insecurity using news streams

  • British Ecological Society. 2017. Guide to Reproducible Code in Ecology and Evolution

  • Google. Google’s Search Engine Optimization (SEO) Starter Guide

  • Jurafsky, D. and J.H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, N.J.: Prentice Hall. ISBN 978-0-13-095069-7

  • Mikolov, T., K. Chen, G. Corrado, and J. Dean. 2013. Efficient Estimation of Word Representations in Vector Space

  • Min, B. and Z. O’Keeffe. 2021. http://www-personal.umich.edu/~brianmin/HREA/index.html

  • Priest, G. 2010. The Struggle for Integration and Harmonization of Social Statistics in a Statistical Agency - A Case Study of Statistics Canada

  • Stodden et al. 2013. Setting the Default to Reproducible - Reproducibility in Computational and Experimental Mathematics

  • Turnbull, D. and J. Berryman. 2016. Relevant Search: With applications for Solr and Elasticsearch

Links (standards, schemas, controlled vocabularies)

  • American Psychological Association (APA): APA Style (example of a publication-specific style for tables)

  • Consortium of European Social Science Data Archives (CESSDA)

  • US Census Bureau, CSPro Users Guide: Parts of a Table

  • Data Documentation Initiative (DDI) Alliance

  • DDI Alliance, Data Documentation Initiative (DDI) Codebook

  • Dublin Core Metadata Initiative (DCMI)

  • eMathZone: Construction of a Statistical Table

  • GO FAIR: Findable, Accessible, Interoperable, and Reusable (FAIR) principles

  • International Household Survey Network (IHSN)

  • International Press Telecommunications Council (IPTC)

  • International Organization for Standardization (ISO) 19139: Geographic information — Metadata — XML schema implementation

  • LabWrite: Designing Tables

  • schema.org

  • Microsoft Bing: Bing Webmaster Tools Help & How-To Center, Bing Webmaster Guidelines

  • Vedantu: Tabulation

Links (tools)

  • CKAN open-source data management system
  • Elasticsearch
  • GeoNetwork
  • Milvus
  • NADA cataloguing application, web page
  • NADA cataloguing application, demo page
  • NADA cataloguing application, GitHub repository
  • NADAR package
  • Nesstar Publisher (DDI 1.n Metadata Editor)
  • R: The R Project for Statistical Computing
  • R Bookdown: Write HTML, PDF, ePub, and Kindle books with R Markdown
  • R geometa: Tools for Reading and Writing ISO/OGC Geographic Metadata
  • Solr

Links (others)

  • WorldPop: https://www.worldpop.org/

[to do]