docs: add data catalog and fix links

cds-snc · Jan 21, 2025 · commit af1ab2a · 1 parent d197fb6

Showing 6 changed files with 83 additions and 4 deletions.
diff --git a/docs/data/catalog/README.md b/docs/data/catalog/README.md
@@ -1,6 +1,7 @@
 # Data catalog
 Provides an inventory that contains metadata, descriptions, and technical details about all data sources within the data lake.
 
+- [Platform / Support / Freshdesk](./platform/support/freshdesk.md)
 - [Operations / AWS / Cost and Usage Report](./operations/aws/cost-and-usage-report.md)
 
 :page_facing_up: When adding a new pipeline, please use [template.md](./template.md) as a starting point.
diff --git a/docs/data/catalog/operations/aws/cost-and-usage-report.md b/docs/data/catalog/operations/aws/cost-and-usage-report.md
@@ -4,7 +4,7 @@ Dataset describing how much was spent on Amazon Web Services (AWS) by CDS.
 
 Each row describes the cost of using a particular AWS service (i.e., a line item) within a billing period.
 
-This dataset is represented in [Superset](https://superset.cdssandbox.xyz/) as the Physical dataset [`cost_usage_report_by_account`](https://superset.cdssandbox.xyz/explore/?datasource_type=table&datasource_id=68). All of the Virtual datasets in the "Operations / AWS / Cost and Usage" group are derived from it.
+This dataset is represented in [Superset](https://superset.cds-snc.ca/) as the Physical dataset [`cost_usage_report_by_account`](https://superset.cds-snc.ca/explore/?datasource_type=table&datasource_id=68). All of the Virtual datasets in the "Operations / AWS / Cost and Usage" group are derived from it.
 
 **Keywords:** AWS, Amazon, cost, usage, fees
 

diff --git a/docs/data/catalog/operations/aws/examples/cost-and-usage-report.sql b/docs/data/catalog/operations/aws/examples/cost-and-usage-report.sql
@@ -3,7 +3,7 @@ This query will return the first ten rows of the "AWS Cost and Usage Report"
 dataset. To run it, open the SQL Lab in Superset and cut and paste the whole
 query into the query window.
 
-SQL Lab: https://superset.cdssandbox.xyz/sqllab/
+SQL Lab: https://superset.cds-snc.ca/sqllab/
 
 The example dataset is provided as a query instead of a CSV to limit
 visibility to only those with Superset access.
@@ -13,4 +13,4 @@ SELECT
   *
 FROM
   operations_aws_production.cost_usage_report_by_account 
-LIMIT 10;
+LIMIT 10;
diff --git a/docs/data/catalog/platform/support/examples/freshdesk.sql b/docs/data/catalog/platform/support/examples/freshdesk.sql
@@ -0,0 +1,16 @@
+/*
+This query will return the first ten rows of the "Freshdesk"
+dataset. To run it, open the SQL Lab in Superset and cut and paste the whole
+query into the query window.
+
+SQL Lab: https://superset.cds-snc.ca/sqllab/
+
+The example dataset is provided as a query instead of a CSV to limit
+visibility to only those with Superset access.
+*/
+
+SELECT
+  *
+FROM
+  platform_support_production.platform_support_freshdesk 
+LIMIT 10;
diff --git a/docs/data/catalog/platform/support/freshdesk.md b/docs/data/catalog/platform/support/freshdesk.md
@@ -0,0 +1,62 @@
+# Platform / Support / Freshdesk
+
+Dataset of Freshdesk support tickets raised by users of CDS products.
+
+Each row describes one Freshdesk ticket and excludes any personally identifiable information (PII) or user-entered content.
+
+This dataset is represented in [Superset](https://superset.cds-snc.ca/) as the Physical dataset `platform_support_freshdesk`.
+
+**Keywords:** Platform, Freshdesk, support, tickets
+
+---
+
+[:information_source:  View the data pipeline](../../../pipelines/platform/support/freshdesk.md)
+
+## Provenance
+
+This dataset is extracted daily using the Freshdesk API. Each day, the extract process downloads all tickets created or updated on the previous day. These tickets are then merged with the existing tickets in the dataset.
+
+Further details are available in the [Freshdesk pipeline documentation](../../../pipelines/platform/support/freshdesk.md).
+
+* **Updated:** Daily
+* **Steward:** Platform Core Services
+* **Contact:** [Pat Heard](mailto:[email protected])
+* **Location:** s3://cds-data-lake-transformed-production/platform/support/freshdesk/month=YYYY-MM/*.parquet
+
+## Fields
+
+Almost all fields are sourced directly from Freshdesk's [Tickets](https://developers.freshdesk.com/api/#tickets), [Contacts](https://developers.freshdesk.com/api/#contacts), and [Conversations](https://developers.freshdesk.com/api/#conversations) API endpoints.
+
+A [query to return example data](examples/freshdesk.sql) has also been provided.
+
+Here's a descriptive list of the Freshdesk ticket fields:
+
+* `id` (bigint) - Unique identifier for each support ticket.
+* `status` (bigint) - Numerical code representing the ticket's current status.
+* `status_label` (string) - Human-readable label for the ticket status (e.g., "Open", "Pending", "Resolved").
+* `priority` (bigint) - Numerical code indicating the ticket's priority level.
+* `priority_label` (string) - Human-readable label for the priority level (e.g., "Low", "Medium", "High", "Urgent").
+* `source` (bigint) - Numerical code indicating how the ticket was created.
+* `source_label` (string) - Human-readable label for the ticket source (e.g., "Email", "Phone", "Portal", "Chat").
+* `created_at` (timestamp) - Date and time when the ticket was initially created.
+* `updated_at` (timestamp) - Date and time of the most recent update to the ticket.
+* `due_by` (timestamp) - Deadline for ticket resolution based on support policies.
+* `fr_due_by` (timestamp) - First response due time based on support policies.
+* `is_escalated` (boolean) - Indicates whether the ticket has been escalated to a higher support tier.
+* `tags` (array<string>) - List of labels or categories assigned to the ticket for classification.
+* `spam` (boolean) - Indicates whether the ticket has been marked as spam.
+* `requester_email_suffix` (string) - Domain portion of the requester's email address. For requesters outside the Government of Canada, the value is `external`.
+* `type` (string) - Classification of the ticket type.
+* `product_id` (bigint) - Unique identifier for the product associated with the ticket.
+* `product_name` (string) - Name of the product associated with the ticket.
+* `conversations_total_count` (bigint) - Total number of messages in the ticket thread.
+* `conversations_reply_count` (bigint) - Number of replies to and from the user in the ticket thread.
+* `conversations_note_count` (bigint) - Number of internal notes from the support team added to the ticket.
+* `language` (string) - Primary language used in the ticket communication.
+* `province_or_territory` (string) - Canadian province or territory where the ticket originated.
+* `organization` (string) - Government of Canada department or Crown corporation associated with the ticket requester.
+* `month` (string) - Month of ticket creation, used as a partition key for data organization.
+
+## Notes
+
+The `language`, `province_or_territory`, and `organization` fields are custom fields managed by the Platform Support team, so they are not always populated.
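The Provenance section of `freshdesk.md` describes an incremental load: each day, pull the tickets created or updated on the previous day, then merge them into the existing tickets. The following is a minimal Python sketch of that merge, not the pipeline's actual code. It assumes that tickets are keyed on `id`, that the copy with the later `updated_at` wins, that the extract window is one calendar day in UTC, and that `month` is the `YYYY-MM` of `created_at` (as the `month` field description states). The `Ticket` class and all function names are hypothetical, and no real Freshdesk API call is made; `api_results` stands in for whatever the API returns.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone


@dataclass(frozen=True)
class Ticket:
    """Hypothetical subset of the dataset's fields, enough to show the merge."""
    id: int
    created_at: datetime
    updated_at: datetime
    status_label: str


def extract_window(run_date: date) -> tuple[datetime, datetime]:
    """Return the [start, end) UTC range covering the day before run_date."""
    start = datetime.combine(run_date - timedelta(days=1), time.min, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)


def select_daily(tickets: list[Ticket], run_date: date) -> list[Ticket]:
    """Keep tickets created or updated during the extract window.

    Assumption: a newly created ticket has updated_at >= created_at, so
    filtering on updated_at alone catches both new and updated tickets.
    """
    start, end = extract_window(run_date)
    return [t for t in tickets if start <= t.updated_at < end]


def merge_tickets(existing: list[Ticket], daily: list[Ticket]) -> list[Ticket]:
    """Upsert the daily extract into the existing tickets, keyed on id."""
    merged: dict[int, Ticket] = {t.id: t for t in existing}
    for ticket in daily:
        current = merged.get(ticket.id)
        # New ticket, or a newer copy of a known ticket: replace it.
        if current is None or ticket.updated_at >= current.updated_at:
            merged[ticket.id] = ticket
    return sorted(merged.values(), key=lambda t: t.id)


def partition_month(ticket: Ticket) -> str:
    """Derive the `month` partition value (YYYY-MM) from the creation date."""
    return ticket.created_at.strftime("%Y-%m")


def group_by_partition(tickets: list[Ticket]) -> dict[str, list[Ticket]]:
    """Group tickets by partition, e.g. for writing month=YYYY-MM/ prefixes."""
    partitions: dict[str, list[Ticket]] = {}
    for ticket in tickets:
        partitions.setdefault(partition_month(ticket), []).append(ticket)
    return partitions


# Usage with made-up tickets.
utc = timezone.utc
existing = [
    Ticket(1, datetime(2024, 12, 30, tzinfo=utc), datetime(2024, 12, 30, tzinfo=utc), "Open"),
    Ticket(2, datetime(2025, 1, 5, tzinfo=utc), datetime(2025, 1, 5, tzinfo=utc), "Pending"),
]
api_results = [
    Ticket(1, datetime(2024, 12, 30, tzinfo=utc), datetime(2025, 1, 20, 9, tzinfo=utc), "Resolved"),
    Ticket(3, datetime(2025, 1, 20, 14, tzinfo=utc), datetime(2025, 1, 20, 14, tzinfo=utc), "Open"),
    Ticket(4, datetime(2025, 1, 21, 8, tzinfo=utc), datetime(2025, 1, 21, 8, tzinfo=utc), "Open"),
]
daily = select_daily(api_results, date(2025, 1, 21))  # ticket 4 falls outside the window
merged = merge_tickets(existing, daily)
partitions = group_by_partition(merged)
```

Because `month` follows `created_at` rather than `updated_at`, an update to an older ticket (ticket 1 above) lands in that ticket's original `month=2024-12` partition. A merge like this may therefore need to rewrite earlier partitions, not only the current month.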
diff --git a/docs/data/catalog/template.md b/docs/data/catalog/template.md
@@ -20,7 +20,7 @@ Briefly describe where the dataset comes from using words. If from a database, i
 
 ## Fields
 
-[Link to the first 10-20 rows of the table as CSV](http://www.example.com/dataset.csv). If the head of the table is not representative (e.g., missing data) or sensitive (contains PII), more appropriate rows may be selected instead. Alternatively, a [SQL query](http://www.example.com/dataset.sql) may be provided that returns appropriate example data. This query must be complete, so much that it can be run directly in [Superset's SQL Lab](https://superset.cdssandbox.xyz/sqllab/) without modification.
+[Link to the first 10-20 rows of the table as CSV](http://www.example.com/dataset.csv). If the head of the table is not representative (e.g., missing data) or sensitive (contains PII), more appropriate rows may be selected instead. Alternatively, a [SQL query](http://www.example.com/dataset.sql) may be provided that returns appropriate example data. The query must be complete enough to run directly in [Superset's SQL Lab](https://superset.cds-snc.ca/sqllab/) without modification.
 
 A bulleted list of field names must be included, alongside a brief description of the field. Boolean descriptions can simply be the question answered by the boolean. The data type of a field and the unit of measurement should be included as well. The names of data types are dictated by the storage format. For example, the Parquet storage format commonly includes booleans, dates, floats, integers, strings, times, and timestamps. There is no need to include the integer or float width unless the circumstances are exceptional.