diff --git a/source/includes/_terms.md b/source/includes/_terms.md
index 7049e7f2..7b3ea9f4 100644
--- a/source/includes/_terms.md
+++ b/source/includes/_terms.md
@@ -8,4 +8,6 @@ Here's list of terms used and what we mean with them. The meaning of terms is mo
| Data product | As a strategic resource for companies, data is considered an asset that, like any other material good, has a financial value and whose management generates costs. Data created, collected or used in individual business processes can be sold to other organisations as raw or processed data, so that it no longer serves as an enabler of products, but is the product itself. This leads to the paradigm that data assets can be monetised by exchanging and trading data between organisations as data products and services.

There are multiple definitions for a data product. In an [article authored by Jian Pei (2020)](https://arxiv.org/abs/2009.04462), data products "*refer to data sets as products and information services derived from data sets.*" Simon O'Regan defines a data product as *a product whose primary objective is to use data to facilitate an end goal*. From the academic literature we have found several subtypes of data products: raw data, derived data, data sets, reports, analytic views, 3D visualisations, algorithms, decision support (dashboards) and automated decision-making (Netflix product recommendations or Spotify’s Discover Weekly would be common examples).

Typically, raw data, derived data and algorithms have technical users, and most often they are internal products within an organisation.

If we dive into the data mesh world, this quote from [Zhamak Dehghani’s book](https://www.oreilly.com/library/view/data-mesh/9781492092384/) is key to understanding the definition of data as a product: “*Domain data teams must apply product thinking […] to the datasets that they provide; considering their data assets as their products and the rest of the organization’s data scientists, ML and data engineers as their customers.*”

While many of the standard product development rules apply — solve a customer need, learn from feedback, prioritise relentlessly, etc. — data has characteristics that differ from those of tangible products and prevent the direct transfer of established processes and rules for trading goods, especially in terms of pricing mechanisms.

In trading data, there is less willingness to pay than for tangible goods. For example, data buyers often do not recognise the potential value of data items because that value cannot be fully disclosed prior to purchase (known as the ‘Arrow paradox’).

In addition, it is often not recognised that the creation, processing, storage and distribution of high-quality data is a major cost factor for the data provider. Another obstacle is a lack of trust and security, which causes potential data providers to fear that competitors could benefit from the disclosure of in-house data.

One of the aims of this specification is to tackle the above-mentioned issues, which hinder the growth of the data ecosystem and contribute to market volatility. | | Data as a service | [In computing, data as a service, or DaaS](https://en.wikipedia.org/wiki/Data_as_a_service), is a term used to describe cloud-based software tools for working with data, such as managing data in a data warehouse or analyzing data with business intelligence. It is enabled by software as a service (SaaS). DaaS, like all "as a service" (aaS) technologies, builds on the concept that its data product can be provided to the user on demand, regardless of geographic or organizational separation between provider and consumer.

[According to Daniel Newman in Forbes (2017)](https://www.forbes.com/sites/danielnewman/2017/02/07/data-as-a-service-the-big-opportunity-for-business/), DaaS is essentially a data stream that subscribers can access on demand.

Some people use the term data product in a sense that also covers data commodities with more service-like attributes than product attributes. In those cases we prefer to use the term *data as a service* and call the creation process *data servitization*. The term *productizement* is reserved for the process that creates data products as its end result. | | Data as a service business model | Data as a service as a business model is a concept in which two or more organizations buy, sell, or trade machine-readable data in exchange for something of value. Data as a service is a general term that encompasses data-related services. DaaS providers are now replacing traditional data analytics services, or clustering with existing services, to offer more value to customers. These DaaS providers curate, aggregate and analyze multi-source data in order to provide more valuable analytical data or information.

Typically, DaaS business is based on subscriptions, and customers pay for a package of services or for specific services. |
-| Data pipeline | According to [Aiswarya et al.](https://research.chalmers.se/publication/523476/file/523476_Fulltext.pdf) the complex chain of interconnected activities or processes from data gen- eration through data reception constitutes a data pipeline. In other words, data pipelines are the connected chain of processes where the output of one or more processes becomes an input for another. It is a piece of software that removes many manual steps from the workflow and permits a streamlined, automated flow of data from one node to another. Moreover, it automates the operations involved in the selection, extraction, transformation, aggregation, validation, and loading of data for further analysis and visualization. It offers end to end speed by removing errors and resisting bottlenecks or delay. Data pipelines can process multiple streams of data simultaneously | 
\ No newline at end of file
+| Data pipeline | According to [Aiswarya et al.](https://research.chalmers.se/publication/523476/file/523476_Fulltext.pdf), the complex chain of interconnected activities or processes from data generation through data reception constitutes a data pipeline. In other words, data pipelines are a connected chain of processes in which the output of one or more processes becomes an input for another. A pipeline is a piece of software that removes many manual steps from the workflow and permits a streamlined, automated flow of data from one node to another. Moreover, it automates the operations involved in the selection, extraction, transformation, aggregation, validation, and loading of data for further analysis and visualization. It offers end-to-end speed by eliminating errors and reducing bottlenecks and delays. Data pipelines can process multiple streams of data simultaneously. |
+| Infrastructure as Code | Infrastructure as Code (IaC) transforms infrastructure management by using code instead of manual processes. Configuration files capture infrastructure specifications, ensuring consistent environment provisioning. The "as code" paradigm extends beyond infrastructure to encompass quality control and data product processes. This approach, applied to the entire data pipeline, enhances repeatability, traceability, and scalability, fostering collaboration and systematic data management. |
+
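
The data pipeline entry added above describes a chain in which selection, extraction, transformation, validation and loading stages are connected so that the output of one becomes the input of the next. The following is a minimal, illustrative Python sketch of that idea only; the stage names, the in-memory record format and the plausibility range are assumptions made for this example and are not defined by this specification.

```python
from typing import Callable, Iterable

# Hypothetical record and stage types, assumed only for this sketch.
Record = dict
Stage = Callable[[Iterable[Record]], Iterable[Record]]

def extract() -> Iterable[Record]:
    """Data generation/reception: here just an in-memory source."""
    return [
        {"sensor": "s1", "reading": "21.5"},
        {"sensor": "s2", "reading": "not-a-number"},
    ]

def transform(records: Iterable[Record]) -> Iterable[Record]:
    """Convert raw readings into typed values; drop unparsable ones."""
    for record in records:
        try:
            yield {**record, "reading": float(record["reading"])}
        except ValueError:
            continue  # a real pipeline might route these to an error channel

def validate(records: Iterable[Record]) -> Iterable[Record]:
    """Keep only readings within an assumed plausible range."""
    for record in records:
        if -50.0 <= record["reading"] <= 50.0:
            yield record

def load(records: Iterable[Record]) -> Iterable[Record]:
    """Loading: here just materialise the results into a list."""
    return list(records)

def run_pipeline(source: Iterable[Record], stages: list[Stage]) -> list[Record]:
    """Chain the stages so the output of each becomes the input of the next."""
    data = source
    for stage in stages:
        data = stage(data)
    return list(data)

print(run_pipeline(extract(), [transform, validate, load]))
# -> [{'sensor': 's1', 'reading': 21.5}]
```

A production pipeline would add scheduling, monitoring and the ability to process multiple streams in parallel on top of this basic shape.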