Commit

Add data products for snowplow-cli (#1101)
* Add snowplow-cli documentation for data products

* Add data products cli instructions

* Apply suggestions from code review

Co-authored-by: Costas Kotsokalis <[email protected]>

* One more missed spec -> specification

* Remove github annotate on publish

* Even more renames

* Amend to new sidebar

* Add the manage data products in CLI section

* Fix typo

---------

Co-authored-by: Costas Kotsokalis <[email protected]>
gleb-lobov and cksnp authored Dec 13, 2024
1 parent 8e8b7b5 commit 8035f87
Showing 9 changed files with 933 additions and 75 deletions.
100 changes: 100 additions & 0 deletions docs/data-product-studio/data-products/cli/index.md
@@ -0,0 +1,100 @@
---
title: "Managing data products via the CLI"
description: "Use the 'snowplow-cli data-products' command to manage your data products."
sidebar_label: "Using the CLI"
sidebar_position: 999
---
```mdx-code-block
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
```
The `data-products` subcommand of [Snowplow CLI](/docs/data-product-studio/snowplow-cli/index.md) provides a collection of functionality to ease the integration of custom development and publishing workflows.
## Snowplow CLI Prerequisites
Installed and configured [Snowplow CLI](/docs/data-product-studio/snowplow-cli/index.md)
## Available commands
### Creating a data product
```bash
./snowplow-cli dp generate --data-product my-data-product
```
This command creates a minimal data product template in a new file `./data-products/my-data-product.yaml`.
### Creating a source application
```bash
./snowplow-cli dp generate --source-app my-source-app
```
This command creates a minimal source application template in a new file `./data-products/source-apps/my-source-app.yaml`.
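The two `generate` commands can be wrapped in a small helper that also checks the files landed at the documented default paths. This is a minimal POSIX shell sketch, not part of Snowplow CLI: the function name `scaffold_data_product` is hypothetical, and it assumes the CLI path is passed as the first argument and that you run it from the directory that should contain `./data-products`. It prints the `$ref` entry to add under `data.sourceApplications` when linking the two.

```bash
# Hypothetical helper: scaffold a data product and a source application with
# the documented `dp generate` commands. The body runs in a subshell, so its
# variables do not leak into the caller.
scaffold_data_product() (
  cli=$1      # path to the CLI, e.g. ./snowplow-cli
  product=$2  # data product name
  app=$3      # source application name

  "$cli" dp generate --data-product "$product" || exit 1
  "$cli" dp generate --source-app "$app" || exit 1

  # Check that the files exist at the documented default locations.
  for f in "data-products/$product.yaml" "data-products/source-apps/$app.yaml"; do
    if [ ! -f "$f" ]; then
      echo "expected file not found: $f" >&2
      exit 1
    fi
  done

  # Print the reference to add under data.sourceApplications
  # (the path is relative to the data product file).
  printf '%s\n' "- \$ref: ./source-apps/$app.yaml"
)
```

For example, `scaffold_data_product ./snowplow-cli my-data-product my-source-app` would print `- $ref: ./source-apps/my-source-app.yaml`.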
### Creating an event specification
To create an event specification, edit the existing data product file and add an event specification object to the `data.eventSpecifications` list. Here's a minimal example:
```yaml title="./data-products/test-cli.yaml"
apiVersion: v1
resourceType: data-product
resourceName: 3d3059c4-d29b-4979-a973-43f7070e1dd0
data:
  name: test-cli
  sourceApplications: []
  eventSpecifications:
    - resourceName: 11d881cd-316e-4286-b5d4-fe7aebf56fca
      name: test
      event:
        source: iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0
```
:::caution Warning
The `source` fields of events and entities must refer to a deployed data structure. Referring to a locally created data structure is not yet supported.
:::
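The `source` values in these examples are Iglu schema URIs of the form `iglu:<vendor>/<name>/<format>/<version>`, with the version in SchemaVer `MODEL-REVISION-ADDITION` form. A quick local check can catch malformed URIs before you run `dp validate`. This is a minimal sketch that assumes the `jsonschema` format; `is_iglu_uri` is a hypothetical name, and it only checks the shape of the URI. It cannot tell whether the data structure is actually deployed.

```bash
# Hypothetical helper: succeed (exit 0) if $1 looks like an Iglu schema URI,
# e.g. iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0.
# Checks the shape only; it does not check that the data structure is deployed.
is_iglu_uri() {
  printf '%s\n' "$1" |
    grep -Eq '^iglu:[A-Za-z0-9_.-]+/[A-Za-z0-9_-]+/jsonschema/[0-9]+-[0-9]+-[0-9]+$'
}
```

For example, `is_iglu_uri 'iglu:com.acme/checkout/jsonschema/1-0'` fails because the version has only two parts.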
### Linking a data product to a source application
To link a data product to a source application, provide a list of references to the source application files in the `data.sourceApplications` field. Here's an example:
```yaml title="./data-products/test-cli.yaml"
apiVersion: v1
resourceType: data-product
resourceName: 3d3059c4-d29b-4979-a973-43f7070e1dd0
data:
  name: test-cli
  sourceApplications:
    - $ref: ./source-apps/my-source-app.yaml
```
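`dp validate` checks these references for you, but a local pre-check can be handy while editing. The sketch below is a hypothetical `check_refs` helper in POSIX shell. It assumes each reference sits on its own line as `$ref: <path>` (unquoted, with no spaces in the path) and that paths resolve relative to the data product file's directory, as in the example where `./source-apps/my-source-app.yaml` inside `./data-products/test-cli.yaml` points into `./data-products/source-apps/`.

```bash
# Hypothetical helper: report $ref entries in a data product file that do not
# point at an existing file. Assumes unquoted paths without spaces, resolved
# relative to the data product file's directory.
check_refs() (
  dp_file=$1
  dir=$(dirname "$dp_file")
  # Print the path that follows "$ref:" on each matching line
  # (an optional leading list dash is allowed).
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*\$ref:[[:space:]]*//p' "$dp_file" |
    (
      status=0
      while IFS= read -r ref; do
        if [ ! -f "$dir/$ref" ]; then
          echo "missing: $ref" >&2
          status=1
        fi
      done
      exit "$status"
    )
)
```

Running `check_refs ./data-products/test-cli.yaml` prints nothing and succeeds when every referenced file exists; otherwise it lists the missing paths on stderr and exits with status 1.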
### Modifying the source applications of an event specification
By default, event specifications inherit all the source applications of the data product. To customise this, use the `excludedSourceApplications` field of an event specification to remove a given source application from it.
```yaml title="./data-products/test-cli.yaml"
apiVersion: v1
resourceType: data-product
resourceName: 3d3059c4-d29b-4979-a973-43f7070e1dd0
data:
  name: test-cli
  sourceApplications:
    - $ref: ./source-apps/generic.yaml
    - $ref: ./source-apps/specific.yaml
  eventSpecifications:
    - resourceName: 11d881cd-316e-4286-b5d4-fe7aebf56fca
      name: All source apps
      event:
        source: iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0
    - resourceName: b9c994a0-03b2-479c-b1cf-7d25c3adc572
      name: Not quite everything
      excludedSourceApplications:
        - $ref: ./source-apps/specific.yaml
      event:
        source: iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0
```
In this example, the event specification `All source apps` is related to both the `generic` and `specific` source applications, while `Not quite everything` is related only to the `generic` source application.
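The inheritance rule amounts to a set difference: an event specification's source applications are the data product's source applications minus its `excludedSourceApplications`. The hypothetical `effective_source_apps` function models that rule in POSIX shell, taking each list as a space-separated string of `$ref` paths. It is only an illustration of the rule; the actual resolution is done by BDP Console.

```bash
# Hypothetical model of the inheritance rule:
#   effective = data product source apps - excludedSourceApplications
# $1: space-separated source app refs of the data product
# $2: space-separated excluded refs of one event specification
effective_source_apps() (
  set -f  # disable globbing while splitting the lists on spaces
  for app in $1; do
    keep=1
    for excluded in $2; do
      if [ "$app" = "$excluded" ]; then
        keep=0
      fi
    done
    if [ "$keep" -eq 1 ]; then
      printf '%s\n' "$app"
    fi
  done
)
```

Applied to the `Not quite everything` specification, `effective_source_apps "./source-apps/generic.yaml ./source-apps/specific.yaml" "./source-apps/specific.yaml"` prints only `./source-apps/generic.yaml`.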
### Downloading data products, event specifications and source apps
```bash
./snowplow-cli dp download
```
This command retrieves all of your organization's data products, event specifications, and source applications. By default, it creates a folder named `data-products` in your current working directory. You can specify a different folder name as an argument if needed.
The command creates the following structure:
- A main `data-products` folder containing your data product files
- A `source-apps` subfolder containing source application definitions
- Event specifications embedded within their related data product files.
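To sanity-check a download, you can count what ended up in each part of that layout. This is a minimal sketch assuming the default `data-products` folder, the `source-apps` subfolder described above, and `.yaml`/`.json` file extensions; `summarize_download` is a hypothetical helper, not a CLI command. Event specifications are not counted separately because they live inside the data product files.

```bash
# Hypothetical helper: count data product files and source application files
# in a downloaded folder (default: data-products).
summarize_download() (
  dir=${1:-data-products}
  # Count YAML/JSON files directly inside the given folder.
  count() {
    find "$1" -maxdepth 1 -type f \( -name '*.yaml' -o -name '*.json' \) |
      wc -l | tr -d ' '
  }
  printf 'data products: %s\n' "$(count "$dir")"
  printf 'source apps: %s\n' "$(count "$dir/source-apps")"
)
```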
### Validating data products, event specifications and source applications
```bash
./snowplow-cli dp validate
```
This command scans all files under `./data-products` and validates them using the BDP console. It checks:
1. Whether each file is in a valid format (YAML/JSON) with correctly formatted fields
2. Whether all source application references in the data product files are valid
3. Whether event specification rules are compatible with their schemas
If validation fails, the command displays the errors in the console and exits with status code 1.
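Because a failed validation exits with status code 1, the command can gate other steps, for example a git pre-commit hook or a CI job. This is a minimal sketch, assuming the CLI path is passed as the first argument; `run_validation` is a hypothetical wrapper that reports the outcome and passes the CLI's exit status through unchanged.

```bash
# Hypothetical wrapper around `dp validate` that reports the outcome and
# returns the CLI's own exit status, so callers can stop on failure.
run_validation() (
  cli=${1:-./snowplow-cli}
  "$cli" dp validate
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "data products: valid"
  else
    echo "data products: validation failed (exit $status)" >&2
  fi
  exit "$status"
)
```

In a pre-commit hook, `run_validation ./snowplow-cli || exit 1` would block the commit when validation fails.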
### Publishing data products, event specifications and source applications
```bash
./snowplow-cli dp publish
```
This command locates all files under `./data-products`, validates them, and publishes them to the BDP console.
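A common git-based workflow validates on every branch and publishes only from the main branch. The sketch below is hypothetical: the branch name `main`, the helper name `ci_step`, and how the branch name reaches the script are assumptions to adapt to your CI system. It relies only on the `dp validate` and `dp publish` commands documented above.

```bash
# Hypothetical CI step: publish from the main branch, validate everywhere else.
# $1: current branch name (how you obtain it depends on your CI system)
# $2: path to the CLI (defaults to ./snowplow-cli)
ci_step() (
  branch=$1
  cli=${2:-./snowplow-cli}
  if [ "$branch" = "main" ]; then
    # `dp publish` validates before publishing, so no separate validate call.
    "$cli" dp publish
  else
    "$cli" dp validate
  fi
)
```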
2 changes: 1 addition & 1 deletion docs/data-product-studio/data-quality/index.md
@@ -1,7 +1,7 @@
---
title: "Data quality"
date: "2024-12-04"
sidebar_position: 7
sidebar_position: 8
---

There are a number of ways you can test and QA your pipeline to follow good data practices.
71 changes: 2 additions & 69 deletions docs/data-product-studio/data-structures/manage/cli/index.md
@@ -13,78 +13,11 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
```

The `data-structures` subcommand of [Snowplow CLI](https://github.com/snowplow-product/snowplow-cli) provides a collection of functionality to ease the integration of custom development and publishing workflows.
The `data-structures` subcommand of [Snowplow CLI](/docs/data-product-studio/snowplow-cli/index.md) provides a collection of functionality to ease the integration of custom development and publishing workflows.

## Snowplow CLI Prerequisites

### Install

Snowplow CLI can be installed with [homebrew](https://brew.sh/)
```
brew install snowplow-product/taps/snowplow-cli
```

Verify the installation with
```
snowplow-cli --help
```

For systems where homebrew is not available binaries for multiple platforms can be found in [releases](https://github.com/snowplow-product/snowplow-cli/releases).

Example installation for `linux_x86_64` using `curl`

```bash
curl -L -o snowplow-cli https://github.com/snowplow-product/snowplow-cli/releases/latest/download/snowplow-cli_linux_x86_64
chmod u+x snowplow-cli
```

Verify the installation with
```
./snowplow-cli --help
```

### Configure

You will need three values.

API Key Id and API Key Secret are generated from the [credentials section](https://console.snowplowanalytics.com/credentials) in BDP Console.

Organization Id can be retrieved from the URL immediately following the .com when visiting BDP console:

![](images/orgID.png)

Snowplow CLI can take its configuration from a variety of sources. More details are available from `./snowplow-cli data-structures --help`. Variations on these three examples should serve most cases.

<Tabs groupId="config">
<TabItem value="env" label="env variables" default>

```bash
SNOWPLOW_CONSOLE_API_KEY_ID=********-****-****-****-************
SNOWPLOW_CONSOLE_API_KEY=********-****-****-****-************
SNOWPLOW_CONSOLE_ORG_ID=********-****-****-****-************
```

</TabItem>
<TabItem value="defaultconfig" label="$HOME/.config/snowplow/snowplow.yml" >

```yaml
console:
api-key-id: ********-****-****-****-************
api-key: ********-****-****-****-************
org-id: ********-****-****-****-************
```
</TabItem>
<TabItem value="args" label="inline arguments" >
```bash
./snowplow-cli data-structures --api-key-id ********-****-****-****-************ --api-key ********-****-****-****-************ --org-id ********-****-****-****-************
```

</TabItem>
</Tabs>

Snowplow CLI defaults to yaml format. It can be changed to json by either providing a `--output-format json` flag or setting the `output-format: json` config value. It will work for all commands where it matters, not only for `generate`.
Installed and configured [Snowplow CLI](/docs/data-product-studio/snowplow-cli/index.md)


## Available commands
@@ -2,7 +2,7 @@
title: "Versioning data structures"
date: "2020-02-25"
sidebar_position: 2
sidebar_label: "Verson and amend"
sidebar_label: "Version and amend"
---

Snowplow is designed to make it easy for you to change your tracking design in a safe and backwards-compatible way as your organisational data needs evolve.
84 changes: 84 additions & 0 deletions docs/data-product-studio/snowplow-cli/index.md
@@ -0,0 +1,84 @@
---
title: Snowplow CLI
sidebar_label: Snowplow CLI
sidebar_position: 7
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

`snowplow-cli` brings data management elements of Snowplow Console into the command line. It allows you to download your data structures and data products to YAML/JSON files and publish them back to Console. This enables git-ops-like workflows, with reviews and branching.

# Install

Snowplow CLI can be installed with [homebrew](https://brew.sh/):
```
brew install snowplow-product/taps/snowplow-cli
```

Verify the installation with
```
snowplow-cli --help
```

For systems where Homebrew is not available, binaries for multiple platforms can be found in [releases](https://github.com/snowplow-product/snowplow-cli/releases).

Example installation for `linux_x86_64` using `curl`:

```bash
curl -L -o snowplow-cli https://github.com/snowplow-product/snowplow-cli/releases/latest/download/snowplow-cli_linux_x86_64
chmod u+x snowplow-cli
```

Verify the installation with
```
./snowplow-cli --help
```
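The examples in these docs call `./snowplow-cli` (a downloaded binary in the current directory), while a Homebrew install puts `snowplow-cli` on your `PATH`. A small helper can pick whichever is available; this is a hypothetical sketch named `find_snowplow_cli`, not part of the CLI.

```bash
# Hypothetical helper: print how to invoke the CLI, preferring a binary on
# PATH (e.g. a Homebrew install) over one in the current directory.
find_snowplow_cli() {
  if command -v snowplow-cli >/dev/null 2>&1; then
    echo "snowplow-cli"
  elif [ -x ./snowplow-cli ]; then
    echo "./snowplow-cli"
  else
    echo "snowplow-cli not found" >&2
    return 1
  fi
}
```

Usage: `cli=$(find_snowplow_cli) && "$cli" --help`.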

# Configure

You will need three values:

- An API key ID and the corresponding API key (secret), both generated in the [credentials section](https://console.snowplowanalytics.com/credentials) of BDP Console.
- The organization ID, which appears in the URL immediately after `.com` when you visit BDP Console:

![](./images/orgID.png)

Snowplow CLI can take its configuration from a variety of sources. More details are available from `./snowplow-cli data-structures --help`. Variations on these three examples should serve most cases.

<Tabs groupId="config">
<TabItem value="env" label="env variables" default>

```bash
SNOWPLOW_CONSOLE_API_KEY_ID=********-****-****-****-************
SNOWPLOW_CONSOLE_API_KEY=********-****-****-****-************
SNOWPLOW_CONSOLE_ORG_ID=********-****-****-****-************
```

</TabItem>
<TabItem value="defaultconfig" label="$HOME/.config/snowplow/snowplow.yml" >

```yaml
console:
  api-key-id: ********-****-****-****-************
  api-key: ********-****-****-****-************
  org-id: ********-****-****-****-************
```
</TabItem>
<TabItem value="args" label="inline arguments" >
```bash
./snowplow-cli data-structures --api-key-id ********-****-****-****-************ --api-key ********-****-****-****-************ --org-id ********-****-****-****-************
```

</TabItem>
</Tabs>
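If you already have the three values in the `SNOWPLOW_CONSOLE_*` environment variables, you can turn them into the default config file. This is a minimal sketch assuming the file location and keys shown in the tabs above; `write_snowplow_config` is a hypothetical helper that refuses to run if any variable is missing, overwrites any existing file, writes the values as quoted YAML strings, and makes the file readable only by you.

```bash
# Hypothetical helper: write $HOME/.config/snowplow/snowplow.yml from the
# SNOWPLOW_CONSOLE_* environment variables, using the keys shown above.
write_snowplow_config() (
  missing=0
  for var in SNOWPLOW_CONSOLE_API_KEY_ID SNOWPLOW_CONSOLE_API_KEY SNOWPLOW_CONSOLE_ORG_ID; do
    eval "value=\${$var:-}"  # read the variable whose name is in $var
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || exit 1

  umask 077  # new directories and files are private to the current user
  dir="$HOME/.config/snowplow"
  mkdir -p "$dir"
  printf 'console:\n  api-key-id: "%s"\n  api-key: "%s"\n  org-id: "%s"\n' \
    "$SNOWPLOW_CONSOLE_API_KEY_ID" "$SNOWPLOW_CONSOLE_API_KEY" "$SNOWPLOW_CONSOLE_ORG_ID" \
    > "$dir/snowplow.yml"
  chmod 600 "$dir/snowplow.yml"  # also tighten permissions of a pre-existing file
)
```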

Snowplow CLI defaults to YAML format. You can switch to JSON either by passing the `--output-format json` flag or by setting the `output-format: json` config value. This applies to all commands where the format matters, not only to `generate`.


# Use cases

- [Manage your data structures with snowplow-cli](/docs/data-product-studio/data-structures/manage/cli/index.md)
- [Set up a GitHub CI/CD pipeline to manage data structures and data products](/docs/resources/recipes-tutorials/recipe-data-structures-in-git/index.md)