From 4b47085264b07cc7f251c9904617300df852b5a3 Mon Sep 17 00:00:00 2001
From: Olivier Dupriez <35276300+odwb@users.noreply.github.com>
Date: Thu, 23 Nov 2023 13:29:00 -0500
Subject: [PATCH] Update 00_introduction.Rmd

---
 00_introduction.Rmd | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/00_introduction.Rmd b/00_introduction.Rmd
index 205fff4..fab8ba7 100644
--- a/00_introduction.Rmd
+++ b/00_introduction.Rmd
@@ -4,16 +4,17 @@ output: html_document
 # Introduction {-}
-Over the last decade, the supply of socio-economic data available to researchers has increased considerably, along with advances in the tools and methods available to exploit these data. This provides the research community and development practitioners with unprecedented opportunities to increase the use and value of existing data, as "data that were initially collected with one intention can be reused for a completely different purpose. (…) Because the potential of data to serve a productive use is essentially limitless, enabling the reuse and repurposing of data is critical if data are to lead to better lives.” ([World Bank, World Development Report 2021](https://www.worldbank.org/en/publication/wdr2021))
+Over the last decade, the supply of socio-economic data available to researchers and policy makers has increased considerably, along with advances in the tools and methods available to exploit these data. This provides the research community and development practitioners with unprecedented opportunities to increase the use and value of existing data.
-But data can be challenging to find, access, and use, resulting in many valuable datasets remaining underutilized. Data libraries and data repositories, and the data catalogs they maintain, play a crucial role in making data more discoverable, visible, and usable. But many are built on sub-optimal standards and technological solutions, resulting in limited findability and visibility of their assets. To address such market failures, a better market place for data is needed.
+#Note:
+Data that were initially collected with one intention can be reused for a completely different purpose. (…) Because the potential of data to serve a productive use is essentially limitless, enabling the reuse and repurposing of data is critical if data are to lead to better lives. ([World Bank, World Development Report 2021](https://www.worldbank.org/en/publication/wdr2021))
-A better market place for data can be developed on the model of large e-commerce platforms, which are designed to effectively and efficiently serve both buyers (the data users in a market place for data) and sellers (the data producers in a market place for data). Data platforms must be optimized to provide data users with convenient ways of identifying, locating, and acquiring data (which implies user-friendly advanced search tools and a recommender system), and data producers with a trustable mechanism to make their datasets visible and discoverable and to share them in a cost-effective, convenient, and responsible manner.
+But data can be challenging to find, access, and use, resulting in many valuable datasets remaining underutilized. Data repositories and libraries, and the data catalogs they maintain, play a crucial role in making data more discoverable, visible, and usable. But many of these catalogs are built on sub-optimal standards and technological solutions, resulting in limited findability and visibility of their assets. To address such market failures, a better market place for data is needed.
-Achieving such objectives requires detailed and **structured metadata** that properly describe the data products. Indeed, search algorithms and recommender systems exploit metadata, not data. Metadata are essential to the credibility, discoverability, visibility, and usability of the data. Adopting **metadata standards and schemas**[^1] is a practical and efficient solution to achieve completeness and quality of the metadata. This Guide presents a set of recommended standards and schemas covering multiple types of data, both structured and unstructured, along with guidance and justification for their implementation. The data types covered include microdata, statistical tables, indicators and time series, geographic datasets, text, images, video recordings, and programs and scripts.
+A better market place for data can be developed on the model of large e-commerce platforms, which are designed to effectively and efficiently serve both buyers and sellers. In a market place for data, the "buyers" are the data users, and the "sellers" are the organizations who own or curate datasets and seek to make them available to users -- preferably free of charge to maximize the use of data. Data platforms must be optimized to provide data users with convenient ways of identifying, locating, and acquiring data (which requires the implementation of a user-friendly search and recommendation system), and to provide data owners with a trustable mechanism to make their datasets visible and discoverable and to share them in a cost-effective, convenient, and safe manner.
-Chapter 1 of the Guide outlines the challenges associated with finding and using data. Chapter 2 describes the essential features of a modern data catalog, and Chapter 3 explains how rich and structured metadata, compliant with the metadata standards and schemas we describe in the Guide, can enable advanced search algorithms and recommender systems optimized for data discoverability. Finally, Chapters 4 to 13 present a set of recommended standards and schemas, along with examples of their use.
+Achieving such objectives requires detailed and structured metadata that properly describe the data products. Indeed, search algorithms and recommender systems exploit metadata, not data. Metadata are essential to the credibility, discoverability, visibility, and usability of the data. Adopting metadata standards and schemas is a practical and efficient solution to achieve completeness and quality of the metadata. This Guide presents a set of recommended standards and schemas covering multiple types of data along with guidance for their implementation. The data types covered include microdata, statistical tables, indicators and time series, geographic datasets, text, images, video recordings, and programs and scripts.
-This Guide was produced by the Office of the World Bank Chief Statistician as a reference guide for World Bank staff and partners involved in the curation and dissemination of data related to social and economic development. The standards and schemas it describes are used by the World Bank in its data management and dissemination systems, and for the development of systems and tools for data acquisition, documentation, cataloguing, and dissemination. Among these tools is a specialized **Metadata Editor** designed to facilitate the documentation of datasets in compliance with the recommended standards and schemas, and a cataloguing application ("NADA"). Both applications are openly available.
+Chapter 1 of the Guide outlines the challenges associated with finding and using data. Chapter 2 describes the essential features of a modern data catalog, and Chapter 3 explains how rich and structured metadata, compliant with the metadata standards and schemas we describe in the Guide, can enable advanced search algorithms and recommender systems. Finally, Chapters 4 to 13 present the recommended standards and schemas, along with examples of their use.
-[^1]: We refer to metadata *standards* when a community or organization is in charge of the development and maintenance of a metadata schema, with formal governance. We use the term metadata *schema* when no formal governance is established.
+This Guide was produced by the Office of the World Bank Chief Statistician as a reference guide for World Bank staff and for partners involved in the curation and dissemination of data related to social and economic development. The standards and schemas it describes are used by the World Bank in its data management and dissemination systems, and for the development of systems and tools for the acquisition, documentation, cataloguing, and dissemination of data. Among these tools is a specialized **Metadata Editor** designed to facilitate the documentation of datasets in compliance with the recommended standards and schemas, and a cataloguing application ("NADA"). Both applications are openly available.