From 265d6562ccff2d37ee3e65b826267963d25de702 Mon Sep 17 00:00:00 2001 From: PierreDemailly <39910767+PierreDemailly@users.noreply.github.com> Date: Mon, 20 Nov 2023 22:39:32 +0100 Subject: [PATCH] docs(config): improve docs (#151) --- src/config/README.md | 434 +---------------------------- src/config/docs/composite-rules.md | 160 +++++++++++ src/config/docs/interfaces.md | 237 ++++++++++++++++ src/config/docs/rules.md | 424 ++++++++++++++++++++++++++++ src/config/docs/self-monitoring.md | 89 ++++++ src/config/docs/templates.md | 67 +++++ src/config/docs/throttle.md | 13 + 7 files changed, 995 insertions(+), 429 deletions(-) create mode 100644 src/config/docs/composite-rules.md create mode 100644 src/config/docs/interfaces.md create mode 100644 src/config/docs/rules.md create mode 100644 src/config/docs/self-monitoring.md create mode 100644 src/config/docs/templates.md create mode 100644 src/config/docs/throttle.md diff --git a/src/config/README.md b/src/config/README.md index 7a73cac..d8d5120 100644 --- a/src/config/README.md +++ b/src/config/README.md @@ -146,12 +146,7 @@ The `selfMonitoring` property defines how/when Sigyn should emit alert for self |------------|------------|----------|-------------| | `apiUrl` | `string` | ✔️ | The Loki API url | --- -- `templates` (Object, Optional): - - This object specifies templates to be used in the `rules`. - - | Property | Type | Required | Description | - |----------------|----------|----------|-------------| - | `[key:string]` | `object` | ✔️ | A record of template object that can have either of `title`, `content` or `extends` properties (**See below**) | +- `templates` See [templates](./docs/templates.md) --- - `extends` (String[], Optional): - This array specifies the configuration paths to extends from. 
@@ -165,156 +160,11 @@ The `selfMonitoring` property defines how/when Sigyn should emit alert for self | `ignore` | (**Default**) Skip the rule creation for each unknown label | | `error` | Invalidate config and throws when an unknown label is given | --- -- `rules` (Required, Array of Objects): - - This property holds an array of monitoring rules. - - Each rule object must have the following properties: - - | Property | Type | Required | Description | - |-------------|------------------------|----------|-------------| - | `name` | `string` | ✔️ | The name of the rule. Must be unique between each rule. | - | `logql` | `string` **or** `object` | ✔️ | The LogQL query associated with the rule. You can use `{label.x}` where `x` is provided in `labelFilters` (see example below) | - | `polling` | `string` or `string[]` | ❌ | The polling interval for the rule. You can use a `duration` i.e. `2m` or a **Cron expression**. If given an array of polling, it should only be **Cron expressions**, this is useful if you want a different polling the day and the night. Default to `1m`. | - | `pollingStrategy` | `bounded` or `unbounded` | ❌ | **For CRON polling only**. Defines how Sigyn should fetch logs given a range. For instance, given `* 7-20 * * *` at `7:00` it will fetch logs since `20:59` last day with `unbounded` strategy. It will skip and wait the next poll given a `bounded` strategy. Default to `unbounded` - | `alert` | `object` | ✔️ | An object defining the alerting configuration for the rule. | - | `disabled` | `boolean` | ❌ | Weither the rule is enabled, default to `false`. | - | `notifiers` | `string[]` | ❌ | An array of strings representing the notifiers for the rule. It will enables all configured `notifiers` by default. | - -- `rules.logql` (Object, Required): - - This object specifies rule LogQL options - - You can either use this object pattern **or** a simple string. 
- - | Property | Type | Required | Description | - |-------------|----------------------------------------|----------|-------------| - | `query` | `string` | ✔️ | The LogQL query e.g. `{app="foo"} |= "error"` | - | `vars` | `Record` | ✔️ | A record of vars that you can use in the `query` with `{vars.yourVar}` syntax | - -- `rules.alert` (Object, Required): - - This object specifies the alerting configuration for the rule. - - It must have the following properties: - - | Property | Type | Required | Description | - |------------|----------|----------|-------------| - | `on` | `object` | ✔️ | An object specifying when the alert should trigger. | - | `template` | `object` or `string` | ✔️ | An object or a string representing the notification template. | - -- `rules.alert.on` (Object, Required): - - An object specifying when the alert should trigger. - - It must have the following properties: - - | Property | Type | Description | - |---------------------|----------------------|-------------| - | `count` | `number` or `string` | The count threshold of log or label that must triggers an alert. You can use a range string i.e. `<= 5`, `> 6`. For **label based** alert, this property **MUST** be a valid number i.e `900` or `"900"` | - | `interval` | `string` | The time interval for the alerting condition. | - | `label` | `string` | The label key to check. | - | `value` | `string` | The label value to check. | - | `valueMatch` | `string` | The label regexp to check. | - | `percentThreshold` | `number` | The percent threshold of label value. | - | `minimumLabelCount` | `number` | The minimum count of label to compare percent threshold. | - - > [!NOTE] - > There are 2 sorts of alert: **basic** and **label based** - > For **basic** alert, both `count` and `interval` are **required**, other properties **must** be omitted. 
- > For **label based** alert, `label`, `value` **or** `valueMatch` are **required** plus at least one of `minimumLabelCount` or `interval` which defines the minimum logs to be fetched to have a revelant alert when `percentThreshold` is set, or `count` which works the same as basic alerting. - > `minimumLabelCount` and/or `interval` are optional when rule is based on `count` label. - > You cannot use both `value` and `valueMatch` - -- `rules.alert.template` (Object or String, Required): - - Can be an object representing the notification template or a string refering to a root template. - - It can have either of the following properties: - - | Property | Type | Required | Description | - |------------|------------|----------|-------------| - | `title` | `string` | ❌ | The title of the notification template. | - | `content` | `string[]` or `object` | ❌ | The content of the notification template. It can be an object when extending another template | - | `content.before` | `string[]` | ❌ | The content of the notification template to add **after** the extended template's content | - | `content.after` | `string[]` | ❌ | The content of the notification template to add **before** the extended template's content | - | `content.at.index` | `number` | ❌ | The index indicating where the new content should be added. Negative index works i.e. `-1` mean "before the last line" | - | `content.at.value` | `string` | ❌ | The specific content line to be included at the provided index. | - | `extends` | `string` | ❌ | The content of the notification template. | - - > [!NOTE] - > One of `title`, `content` or `at` must be provided. - - > [!NOTE] - > When extending template with `extends`: - > - if `title` is specified then it replaces the extended template's title - > - if `content` is `string[]` then it has the same behavior as using `content.after` i.e. it adds the content **after** the extended template's content. 
- - > [!NOTE] - > Extending templates can be nested - -- `rules.alert.severity` (String or Number, Optional): - - If not specified, the default value is `config.defaultSeverity`, if not specified the default is Severity 3 (`error`). Theses severities change the alert UI sent by the notifiers. - **Allowed values:** - - `critical` - - `error` | `major` - - `warning` | `minor` - - `information` | `info` | `low` - -- `rules.alert.throttle` (Object, Optional): - - Can be an object representing the maximum amount of alert in a given interval. - - It must have the following properties: - - | Property | Type | Required | Description | - |------------|------------|----------|-------------| - | `interval` | `string` | ✔️ | The throttle duration (e.g. `1m`, `1h`) after sending an alert. | - | `count` | `number` | ❌ | The count threshold to bypass throttle, default to `0` (never send alert before the end of interval). | - | `activationThreshold` | `number` | ❌ | The number of alerts allowed to be sent before the throttle to be activated. | - | `labelScope` | `string[]` | ❌ | Allow for the implementation of a dedicated throttle mechanism per label value. For example, when the labelScope is `["app"]`, if an alert is triggered by logs from the 'foo' app, then subsequently, if new logs come from the 'bar' app, a second alert will also be triggered, resulting in a total of two alerts where both app have its own throttle. | - -- `rule.labelFilters` (Object, Optional): - - This object specifies label filters to add for a given rule. - - Each key represents a label - - | Property | Type | Required | Description | - |----------------|------------|----------|-------------| - | `[key:string]` | `string[]` | ✔️ | A list of label values | +- `rules` See [Rules](./docs/rules.md) --- -- `selfMonitoring` (Object, Optional): - - Represents the configuration to enable self-monitoring. 
- - | Property | Type | Required | Description | - |-------------|------------------------|----------|-------------| - | `template` | `string` or `object` | ✔️ | The notifiers template, works same as `rules.alert.template` | - | `notifiers` | `string []` | ✔️ | An array of strings representing the notifiers for the rule. It will enables all configured `notifiers` by default. | - | `errorFilters` | `string[]` | ❌ | An array of strings representing the error to be filtered. Each value can be a strict-equal value or a RegExp. Examples of errors: `Bad Gateway`, `Bad Request` (if `rule.logql` is wrong), `Gateway Timeout`, etc | - | `ruleFilters` | `string[]` | ❌ | An array of strings representing the rules to be filtered, **by their name**. Can be useful for instance if you have a rule with a very big potential count of logs that could often get a timeout | - | `minimumErrorCount` | `number` | ❌ | The minimum of error before triggering an alert | - | `throttle.interval` | `string` | ✔️ | The throttle duration (e.g. `1m`, `1h`) after sending an alert. | - | `throttle.count` | `number` | ❌ | The count threshold to bypass throttle, default to `0` (never send alert before the end of interval). | - | `throttle.activationThreshold` | `number` | ❌ | The number of alerts allowed to be sent before the throttle to be activated. | - -> [!WARNING] -> Self-monitoring templates can be a root template reference, however the available variables are differents. +- `compositeRules` See [Composite Rules](./docs/composite-rules.md) --- -- `compositeRules` (Required, Array of Objects): - - This property holds an array of composite rules. - - Composite rules are based on rules and allow to send alert when a given set of rules triggers too much alert - - Each composite rule object must have the following properties: - - | Property | Type | Required | Description | - |-------------|------------------------|----------|-------------| - | `name` | `string` | ✔️ | The name of the rule. 
Must be unique between each rule. | - | `notifiers` | `string []` | ✔️ | An array of strings representing the notifiers for the rule. It will enables all configured `notifiers` by default. | - | `include` | `string[]` | ❌ | An array of strings representing the rule to monitor. You can use glob syntax i.e `["PROD*"]` | - | `exclude` | `string[]` | ❌ | An array of strings representing the rule to exclude from monitoring. You can use glob syntax i.e `["PROD*"]` | - | `notifCount` | `number` | ✔️ | The minimum alerts to be sent from each watched rules | - | `ruleCountThreshold` | `number` | ❌ | The minimum count of matching rules to triggers an alert to unlock composite rule. For instance, if you have 10 rules and `ruleCountThreshold` is 7, it means 7 rules must triggers an alert | - | `interval` | `string` | ❌ | A duration (i.e `1d`, `15m`) that represents the maximum interval date to count rules alerts | - | `template` | `object` | ✔️ | The number of alerts allowed to be sent before the throttle to be activated. Works same as `rules`. | - | `template.title` | `string` | ❌ | The title of the notification template. | - | `template.content` | `string[]` or `object` | ❌ | The content of the notification template. It can be an object when extending another template | - | `template.content.before` | `string[]` | ❌ | The content of the notification template to add **after** the extended template's content | - | `template.content.after` | `string[]` | ❌ | The content of the notification template to add **before** the extended template's content | - | `template.content.at.index` | `number` | ❌ | The index indicating where the new content should be added. Negative index works i.e. `-1` mean "before the last line" | - | `template.content.at.value` | `string` | ❌ | The specific content line to be included at the provided index. | - | `template.extends` | `string` | ❌ | The content of the notification template. 
| - | `throttle` | `object` | ❌ | The maximum amount of alert in a given interval. | - | `throttle.interval` | `string` | ✔️ | The throttle duration (e.g. `1m`, `1h`) after sending an alert. | - | `throttle.count` | `number` | ❌ | The count threshold to bypass throttle, default to `0` (never send alert before the end of interval). | - | `throttle.activationThreshold` | `number` | ❌ | The number of alerts allowed to be sent before the throttle to be activated. | - | `muteRules` | `boolean` | ❌ | Weither matched rules should stop trigger alert when a higher-lever composite rule triggers.
Default `false`. | - | `muteDuration` | `string` | ❌ | Defines the duration for which rules should be muted.
Default `30m` | +- `selfMonitoring` See [Self Monitoring](./docs/self-monitoring.md) --- **Notifiers** @@ -351,46 +201,6 @@ Each notifier is a key-value object where key represents the notifier name to re > [!NOTE] > [You can also use your own notifier](../notifiers/README.md) or any third-party notifier -You can use any of theses variables, surrounding with `{}` (see example below): -- `ruleName` -- `logql` -- `count` (count of logs retrievied within the interval) -- `counter` -- `threshold` (`alert.on.count`) -- `interval` -- `lokiUrl` - -> [!NOTE] -> You can use hyperlink with Markdown i.e. `[See logs]({lokiUrl})` - -For self-monitoring, you can use theses variables, surrounding with `{}`: -- `agentFailure.errors` which is equal to the joined error messages -- `agentFailure.rules` which is equal to the joined failed rules - -For composite rules, you can use theses variables, surrounding with `{}`: -- `compositeRuleName` -- `label` which includes each combined labels from all rules -- `rules` joined rules names - -You can also use a label variable from your LogQL using `{label.x}`: -```json -{ - ... - "logql": "{app=\"foo\", env=\"preprod\"} |= `my super logql`", - "template": { - "content": [ - "app: {label.app} | env: {label.env}" - ] - } - ... -} -``` - -You can also use any variable extracted from `stream` vector. - -> [!NOTE] -> You **MUST NOT** use markdown in `title` or `content`, this is handled by notifiers. - ## 🧠 Visual Studio Code JSON schema You can easily enjoy autocompletion & documentation from JSON schema for your `sigyn.config.json` on Visual Studio Code. @@ -433,241 +243,7 @@ Validate Sigyn extended configuration against an internal AJV Schema. 
## 🖋️ Interfaces -```ts -interface SigynConfig { - loki: LokiConfig; - notifiers: Record; - rules: SigynRule[]; - templates?: Record; - extends?: string[]; - missingLabelStrategy: "ignore" | "error"; - defaultSeverity: AlertSeverity; - selfMonitoring?: SigynSelfMonitoring; - compositeRules?: SigynCompositeRule[]; -} - -interface SigynInitializedConfig { - loki: LokiConfig; - notifiers: Record; - rules: SigynInitializedRule[]; - templates?: Record; - extends?: string[]; - missingLabelStrategy: "ignore" | "error"; - defaultSeverity: AlertSeverity; - selfMonitoring?: SigynInitializedSelfMonitoring; - compositeRules?: SigynInitializedCompositeRule[]; -} - -interface PartialSigynConfig { - loki: LokiConfig; - notifiers: Record; - rules: PartialSigynRule[]; - templates?: Record; - extends?: string[]; - missingLabelStrategy?: "ignore" | "error"; - defaultSeverity?: AlertSeverity; - selfMonitoring?: SigynSelfMonitoring; - compositeRules?: SigynCompositeRule[]; -} - -type ExtendedSigynConfig = Pick; - -interface LokiConfig { - apiUrl: string; -} - -interface SigynRule { - name: string; - logql: string | { query: string; vars?: Record }; - polling: string | string[]; - pollingStrategy: "bounded" | "unbounded"; - alert: SigynAlert; - disabled: boolean; - notifiers: string[]; - labelFilters?: Record; -} - -interface SigynInitializedRule { - name: string; - logql: string; - polling: string | string[]; - pollingStrategy: "bounded" | "unbounded"; - alert: SigynInitializedAlert; - disabled: boolean; - notifiers: string[]; - labelFilters?: Record; -} - -interface PartialSigynRule { - name: string; - logql: string | { query: string; vars?: Record }; - polling?: string | string[]; - pollingStrategy?: "bounded" | "unbounded"; - alert: PartialSigynAlert; - disabled?: boolean; - notifiers?: string[]; - labelFilters?: Record; -} - -type NotifierFormattedSigynRule = Omit & { - alert: Omit -}; - -type AlertSeverity = - "critical" | - "error" | "major" | - "warning" | "minor" | - 
"information" | "info" | "low"; - -interface SigynAlert { - on: { - count?: string | number; - interval?: string; - label?: string; - value?: string; - valueMatch?: string; - percentThreshold?: number; - minimumLabelCount?: number; - }, - template: string | SigynAlertTemplate; - severity: Extract; - throttle?: { - count: number; - interval: string; - activationThreshold?: number; - labelScope?: string[]; - }; -} - -interface SigynInitializedAlert { - on: { - count?: string | number; - interval?: string; - label?: string; - value?: string; - valueMatch?: string; - percentThreshold?: number; - minimumLabelCount?: number; - }, - template: SigynInitializedTemplate; - severity: Extract; - throttle?: { - count: number; - interval: string; - activationThreshold: number; - labelScope: string[]; - }; -} - -interface PartialSigynAlert { - on: { - count?: string | number; - interval?: string; - label?: string; - value?: string; - valueMatch?: string; - percentThreshold?: number; - minimumLabelCount?: number; - }, - template: string | SigynAlertTemplate; - severity?: AlertSeverity; - throttle?: { - count?: number; - interval: string; - activationThreshold?: number; - labelScope?: string[]; - }; -} - -interface SigynAlertTemplateExtendedContent { - before?: string[]; - after?: string[]; -} - -interface SigynAlertTemplate { - title?: string; - content?: string[] | SigynAlertTemplateExtendedContent; - extends?: string; -} - -interface SigynInitializedTemplate { - title?: string; - content?: string[]; -} - -interface SigynSelfMonitoring { - template: string | SigynInitializedTemplate; - notifiers: string[]; - errorFilters?: string[]; - ruleFilters?: string[]; - minimumErrorCount?: number; - throttle?: { - count?: number; - interval: string; - activationThreshold?: number; - }; -} - -interface SigynInitializedSelfMonitoring { - template: SigynInitializedTemplate; - notifiers: string[]; - errorFilters?: string[]; - ruleFilters?: string[]; - minimumErrorCount?: number; - throttle?: { 
- count: number; - interval: string; - activationThreshold: number; - }; -} - -interface SigynCompositeRule { - name: string; - include?: string[]; - exclude?: string[]; - notifCount: number; - ruleCountThreshold?: number; - interval?: string; - template: string | SigynAlertTemplate; - notifiers?: string[]; - throttle?: { - count?: number; - interval: string; - activationThreshold?: number; - }; -} - -interface SigynInitializedCompositeRule { - name: string; - rules: string[]; - notifCount: number; - ruleCountThreshold?: number; - interval: string; - template: string | SigynInitializedTemplate; - notifiers: string[]; - throttle?: { - count: number; - interval: string; - activationThreshold: number; - }; -} -``` -> [!NOTE] -> `SigynInitializedConfig` represents the config after initialization. -> For instance, given a rule with a `logql` object with `query` & `vars`, the rule is updated upon initialization then `logql` is always as **string**. - -> [!NOTE] -> `PartialSigynConfig`, `PartialSigynRule` and `PartialSigynAlert` are the allowed types to **validate** config. -> These types have extra optional fields that are set by their default values upon initialization (`initConfig()`). +See [Interfaces](./docs/interfaces.md) ## License MIT diff --git a/src/config/docs/composite-rules.md b/src/config/docs/composite-rules.md new file mode 100644 index 0000000..dd81745 --- /dev/null +++ b/src/config/docs/composite-rules.md @@ -0,0 +1,160 @@ +# Composite Rules + +Composite rules are built on top of rules: they send an alert when a given set of rules triggers too many alerts within a given interval. + +The `compositeRules` root config field takes an array of composite rule objects.
+ +## Summary + +- [Example Configuration](#example-configuration) +- [Schema Properties](#schema-properties) + - [`name`](#name) + - [`notifiers`](#notifiers) + - [`include`](#include) + - [`exclude`](#exclude) + - [`notifCount`](#notifcount) + - [`ruleCountThreshold`](#rulecountthreshold) + - [`interval`](#interval) + - [`template`](#template) (See [Templates](./templates.md)) + - [`throttle`](#throttle) (See [Throttle](./throttle.md)) + - [`muteRules`](#muterules) + - [`muteDuration`](#muteduration) + +## Example configuration + +```json +{ + "compositeRules": [ + { + "name": "Composite Rule", + "template": { + "title": "title", + "content": ["content"] + }, + "notifCount": 6, + "throttle": { + "interval": "5m", + "count": 3 + }, + "ruleCountThreshold": 2, + "muteRules": true + } + ] +} +``` + +## Schema Properties + +### `name` + +The name of the composite rule. Must be unique among composite rules. + +| Type | Required | +|------------------------|----------| +| `string` | ✔️ | + +### `notifiers` + +Defines the notifiers to send alerts on. + +| Type | Required | Default | +|-----------|----------|--------------------------------| +| `string[]` | ❌ | All root configured notifiers | + +**Examples** +```json +{ + ..., + "notifiers": { + "slack": { + "notifier": "slack", + "webhookUrl": "https://hooks.slack.com/services/aaa/bbb" + }, + "discord": { + "notifier": "discord", + "webhookUrl": "https://discord.com/api/webhooks/aaa/bbb" + } + }, + "compositeRules": [ + { + "name": "Send alerts to Slack notifier only", + "notifiers": ["slack"], + ... + }, + { + "name": "notifiers omitted: send alerts to both Slack & Discord", + ... + } + ], + ... +} +``` + +### `include` + +A list of rules to monitor. You can use glob syntax, e.g. `My Service -*`. +By default, the composite rule monitors every rule. + +| Type | Required | +|------------|----------| +| `string[]` | ❌ | + +### `exclude` + +A list of rules to exclude from monitoring. You can use glob syntax, e.g. `My Service -*`.
+ +| Type | Required | +|------------|----------| +| `string[]` | ❌ | + +### `notifCount` + +The minimum number of alerts sent from each rule required to trigger the composite rule. + +| Type | Required | +|----------|----------| +| `number` | ✔️ | + +### `ruleCountThreshold` + +The minimum number of matching rules that must trigger an alert to unlock the composite rule. +For instance, if you have 10 rules and `ruleCountThreshold` is 7, then 7 rules must trigger an alert before the composite rule triggers. + +| Type | Required | +|----------|----------| +| `number` | ❌ | + +### `interval` + +A duration (e.g. `1d`, `15m`) that represents the maximum interval over which rule alerts are counted. + +| Type | Required | Default | +|----------|----------|---------| +| `string` | ❌ | `1d` | + +### `template` + +See [Templates](./templates.md) + +> [!NOTE] +> `template` is **required**. + +### `throttle` + +See [Throttle](./throttle.md) + +### `muteRules` + +Whether matched rules should stop triggering alerts when a higher-level composite rule triggers. + +| Type | Required | Default | +|-----------|----------|---------| +| `boolean` | ❌ | `false` | + +### `muteDuration` + +Defines the duration for which rules should be muted when `muteRules` is `true`.
+ +| Type | Required | Default | +|----------|----------|---------| +| `string` | ❌ | `30m` | diff --git a/src/config/docs/interfaces.md b/src/config/docs/interfaces.md new file mode 100644 index 0000000..c1371b5 --- /dev/null +++ b/src/config/docs/interfaces.md @@ -0,0 +1,237 @@ +# Interfaces + +```ts +interface SigynConfig { + loki: LokiConfig; + notifiers: Record; + rules: SigynRule[]; + templates?: Record; + extends?: string[]; + missingLabelStrategy: "ignore" | "error"; + defaultSeverity: AlertSeverity; + selfMonitoring?: SigynSelfMonitoring; + compositeRules?: SigynCompositeRule[]; +} + +interface SigynInitializedConfig { + loki: LokiConfig; + notifiers: Record; + rules: SigynInitializedRule[]; + templates?: Record; + extends?: string[]; + missingLabelStrategy: "ignore" | "error"; + defaultSeverity: AlertSeverity; + selfMonitoring?: SigynInitializedSelfMonitoring; + compositeRules?: SigynInitializedCompositeRule[]; +} + +interface PartialSigynConfig { + loki: LokiConfig; + notifiers: Record; + rules: PartialSigynRule[]; + templates?: Record; + extends?: string[]; + missingLabelStrategy?: "ignore" | "error"; + defaultSeverity?: AlertSeverity; + selfMonitoring?: SigynSelfMonitoring; + compositeRules?: SigynCompositeRule[]; +} + +type ExtendedSigynConfig = Pick; + +interface LokiConfig { + apiUrl: string; +} + +interface SigynRule { + name: string; + logql: string | { query: string; vars?: Record }; + polling: string | string[]; + pollingStrategy: "bounded" | "unbounded"; + alert: SigynAlert; + disabled: boolean; + notifiers: string[]; + labelFilters?: Record; +} + +interface SigynInitializedRule { + name: string; + logql: string; + polling: string | string[]; + pollingStrategy: "bounded" | "unbounded"; + alert: SigynInitializedAlert; + disabled: boolean; + notifiers: string[]; + labelFilters?: Record; +} + +interface PartialSigynRule { + name: string; + logql: string | { query: string; vars?: Record }; + polling?: string | string[]; + pollingStrategy?: 
"bounded" | "unbounded"; + alert: PartialSigynAlert; + disabled?: boolean; + notifiers?: string[]; + labelFilters?: Record; +} + +type NotifierFormattedSigynRule = Omit & { + alert: Omit +}; + +type AlertSeverity = + "critical" | + "error" | "major" | + "warning" | "minor" | + "information" | "info" | "low"; + +interface SigynAlert { + on: { + count?: string | number; + interval?: string; + label?: string; + value?: string; + valueMatch?: string; + percentThreshold?: number; + minimumLabelCount?: number; + }, + template: string | SigynAlertTemplate; + severity: Extract; + throttle?: { + count: number; + interval: string; + activationThreshold?: number; + labelScope?: string[]; + }; +} + +interface SigynInitializedAlert { + on: { + count?: string | number; + interval?: string; + label?: string; + value?: string; + valueMatch?: string; + percentThreshold?: number; + minimumLabelCount?: number; + }, + template: SigynInitializedTemplate; + severity: Extract; + throttle?: { + count: number; + interval: string; + activationThreshold: number; + labelScope: string[]; + }; +} + +interface PartialSigynAlert { + on: { + count?: string | number; + interval?: string; + label?: string; + value?: string; + valueMatch?: string; + percentThreshold?: number; + minimumLabelCount?: number; + }, + template: string | SigynAlertTemplate; + severity?: AlertSeverity; + throttle?: { + count?: number; + interval: string; + activationThreshold?: number; + labelScope?: string[]; + }; +} + +interface SigynAlertTemplateExtendedContent { + before?: string[]; + after?: string[]; +} + +interface SigynAlertTemplate { + title?: string; + content?: string[] | SigynAlertTemplateExtendedContent; + extends?: string; +} + +interface SigynInitializedTemplate { + title?: string; + content?: string[]; +} + +interface SigynSelfMonitoring { + template: string | SigynInitializedTemplate; + notifiers: string[]; + errorFilters?: string[]; + ruleFilters?: string[]; + minimumErrorCount?: number; + throttle?: { + 
 count?: number; + interval: string; + activationThreshold?: number; + }; +} + +interface SigynInitializedSelfMonitoring { + template: SigynInitializedTemplate; + notifiers: string[]; + errorFilters?: string[]; + ruleFilters?: string[]; + minimumErrorCount?: number; + throttle?: { + count: number; + interval: string; + activationThreshold: number; + }; +} + +interface SigynCompositeRule { + name: string; + include?: string[]; + exclude?: string[]; + notifCount: number; + ruleCountThreshold?: number; + interval?: string; + template: string | SigynAlertTemplate; + notifiers?: string[]; + throttle?: { + count?: number; + interval: string; + activationThreshold?: number; + }; +} + +interface SigynInitializedCompositeRule { + name: string; + rules: string[]; + notifCount: number; + ruleCountThreshold?: number; + interval: string; + template: string | SigynInitializedTemplate; + notifiers: string[]; + throttle?: { + count: number; + interval: string; + activationThreshold: number; + }; +} +``` +> [!NOTE] +> `SigynInitializedConfig` represents the config after initialization. +> For instance, given a rule whose `logql` is an object with `query` & `vars`, the rule is updated upon initialization so that `logql` is always a **string**. + +> [!NOTE] +> `PartialSigynConfig`, `PartialSigynRule` and `PartialSigynAlert` are the allowed types to **validate** config. +> These types have extra optional fields that are set to their default values upon initialization (`initConfig()`). diff --git a/src/config/docs/rules.md b/src/config/docs/rules.md new file mode 100644 index 0000000..1089438 --- /dev/null +++ b/src/config/docs/rules.md @@ -0,0 +1,424 @@ +# Rules + +Rules are the main field of the Sigyn configuration. They define when alerts should trigger based on specified conditions. + +The `rules` root config field takes an array of rule objects. + +> [!NOTE] +> There are 2 sorts of rules: **basic** and **label based**.
+> - **basic** rule: both `count` and `interval` are **required**, other properties **must** be omitted. +> - **label based** rule: `label` and either `value` **or** `valueMatch` are **required**, plus at least one of `minimumLabelCount` or `interval`, which defines the minimum logs to be fetched to have a relevant alert when `percentThreshold` is set, or `count`, which works the same as basic alerting. +> `minimumLabelCount` and/or `interval` are optional when the rule is based on `count` label. +> You cannot use both `value` and `valueMatch`. + +## Summary + +- [Example Configuration](#example-configuration) +- [Schema Properties](#schema-properties) + - [`name`](#name) + - [`logql`](#logql) + - [`polling`](#polling) + - [`pollingStrategy`](#pollingstrategy) + - [`disabled`](#disabled) + - [`notifiers`](#notifiers) + - [`labelFilters`](#labelfilters) + - [`alert`](#alert) + - [`alert.on`](#alerton) + - [`alert.on.count`](#alertoncount) + - [`alert.on.interval`](#alertoninterval) + - [`alert.on.label`](#alertonlabel) + - [`alert.on.value`](#alertonvalue) + - [`alert.on.valueMatch`](#alertonvaluematch) + - [`alert.on.percentThreshold`](#alertonpercentthreshold) + - [`alert.template`](#alerttemplate) (See [Templates](./templates.md)) + - [`alert.severity`](#alertseverity) + - [`alert.throttle`](#alertthrottle) (See [Throttle](./throttle.md)) + +## Example configuration + +```json +{ + "rules": [ + { + "name": "Foo", + "logql": "{app=\"foo\", env=\"preprod\"} |= `Error`", + "polling": [ + "*/10 * 0-15 * * *", + "*/30 * 16-23 * * *" + ], + "alert": { + "on": { + "count": "10", + "interval": "5m" + }, + "template": { + "title": "{ruleName} - Triggered {counter} times!", + "content": [ + "- LogQL: {logql}", + "- Threshold: {count}", + "- Interval: {interval}", + "- [See logs on Grafana]({lokiUrl})" + ] + } + } + }, + { + "name": "My rule on env: {label.env}", + "logql": "{app=\"foo\", env={label.env}} |= `your awesome logql`", + "polling": "30s", + "labelFilters": { + "env": ["prod",
"preprod"] + }, + "alert": { + "on": { + "count": "< 10", + "interval": "5m" + }, + "template": "onlyTitle" + } + }, + { + "name": "A rule based on label matching", + "logql": "{app=\"foo\"} |~ `state: (ok|ko)` | regexp `state: (?Pok|ko)`", + "alert": { + "on": { + "label": "state", + "value": "ko", + "percentThreshold": 80, + "interval": "5d" + }, + "template": { + "title": "Too much KO" + } + } + } + ], + ... +} +``` + +## Schema Properties + +### `name` + +The name of the rule. Must be unique between each rule. + +| Type | Required | +|------------------------|----------| +| `string` | ✔️ | + +You can use labels from `labelFilters` (see below), **example**: + +```json +{ + "name": "[Foo] Error - {label.env}", + "labelFilters": { + "env": ["production", "preprod"] + }, + ... +}, +``` + +> [!NOTE] +> When a rule use `labelFilters`, since rule name must be unique, Sigyn will update the name automatically if you don't use each label filters in the name like in the above example. + +### `logql` + +The LogQL query associated with the rule. + +| Type | Required | +|------------------------|----------| +| `string` or `object` | ✔️ | + +You can use labels from `labelFilters` (see below), **example**: + +```json +{ + "logql": "{app=\"foo\", env={label.env}} |= `Error`", + "labelFilters": { + "env": ["production", "preprod"] + }, + ... +}, +``` + +The object syntax can be useful with variables, **example**: + +```json +{ + "logql": { + "query": "{app=\"foo\"} |~ `connect ({vars.tcpErrors})`", + "vars": { + "tcpErrors": [ + "ECONNREFUSED", + "ECONNRESET", + "ECONNABORTED", + "EHOSTUNREACH", + "ETIMEDOUT" + ] + } + }, + ... +}, +``` + +The **LogQL** query will become ``{app=\"foo\"} |~ `connect (ECONNREFUSED|ECONNRESET|ECONNABORTED|EHOSTUNREACH|ETIMEDOUT)` `` upon initialization. + +### `polling` + +The polling interval for the rule. You can use a `duration` i.e. `2m` or a **Cron expression**. 
+If given an array, it must contain only **Cron expressions**; this is useful if you want different polling during the day and at night.
+
+| Type                   | Required | Default |
+|------------------------|----------|---------|
+| `string` or `string[]` | ❌       | `1m`    |
+
+**Examples**
+```json
+{
+  "polling": "1h",
+  ...
+}
+```
+
+With this polling, Sigyn will fetch logs every hour.
+```json
+{
+  "polling": [
+    "* 8-20 * * 1-5",
+    "*/10 21-7 * * 1-5"
+  ],
+  ...
+}
+```
+With this config, Sigyn will fetch logs Monday through Friday, every minute from 8:00 am to 8:59 pm and every 10 minutes from 9:00 pm to 7:59 am.
+
+### `pollingStrategy`
+
+The `pollingStrategy` defines how Sigyn fetches logs when a polling range resumes.
+
+Given this `polling`: `* 7-20 * * *`, at 7:00 am it will fetch logs since 8:59 pm (the previous day) when the strategy is `unbounded`.
+You can use the `bounded` strategy to make Sigyn skip that first poll, so polling effectively starts at 7:01.
+
+| Type                     | Required | Default     |
+|--------------------------|----------|-------------|
+| `bounded` or `unbounded` | ❌       | `unbounded` |
+
+**Example**
+```json
+{
+  "polling": "* 7-20 * * *",
+  "pollingStrategy": "bounded",
+  ...
+}
+```
+
+### `disabled`
+
+Whether the rule should be disabled. When a rule is disabled, Sigyn simply ignores it.
+
+| Type      | Required | Default     |
+|-----------|----------|-------------|
+| `boolean` | ❌       | `false`     |
+
+### `notifiers`
+
+Defines the notifiers to send alerts to.
+
+| Type       | Required | Default                        |
+|------------|----------|--------------------------------|
+| `string[]` | ❌       | All root configured notifiers  |
+
+**Examples**
+```json
+{
+  ...,
+  "notifiers": {
+    "slack": {
+      "notifier": "slack",
+      "webhookUrl": "https://hooks.slack.com/services/aaa/bbb"
+    },
+    "discord": {
+      "notifier": "discord",
+      "webhookUrl": "https://discord.com/api/webhooks/aaa/bbb"
+    }
+  },
+  "rules": [
+    {
+      "name": "Send alerts to Slack notifier only",
+      "notifiers": ["slack"],
+      ...
+    },
+    {
+      "name": "notifiers is omitted: send alerts to both Slack & Discord",
+      ...
+    }
+  ],
+  ...
+}
+```
+
+### `labelFilters`
+
+The `labelFilters` field allows you to duplicate a rule for multiple labels (e.g. envs, services...).
+It works together with the `logql` field, where you can reference the wanted labels.
+
+| Type      | Required | Default     |
+|-----------|----------|-------------|
+| `object`  | ❌       | `false`     |
+
+Where all items are defined as:
+
+| Property       | Type       | Required |
+|----------------|------------|----------|
+| `[key:string]` | `string[]` | ✔️       |
+
+**Examples**
+
+```json
+{
+  "logql": "{app=\"foo\", env={label.env}} |= `Error`",
+  "labelFilters": {
+    "env": ["production", "preprod"]
+  }
+}
+```
+
+Upon initialization, Sigyn will create 2 distinct rules, with these queries:
+- ``{app=\"foo\", env=\"production\"} |= `Error` `` (the first `production` label filter)
+- ``{app=\"foo\", env=\"preprod\"} |= `Error` `` (the second `preprod` label filter)
+
+### `alert`
+
+The `alert` field lets you configure alerting behavior through the properties described below.
+
+### `alert.on`
+
+An object specifying when the alert should trigger.
+
+| Type       | Required |
+|------------|----------|
+| `object`   | ✔️       |
+
+### `alert.on.count`
+
+The count threshold of logs (or labels when the rule is based on labels) that triggers an alert.
+You can use a range string, e.g. `<=50`, `>6`.
+
+For a **label based** rule, this property **must** be a valid number, e.g. `900` or `"900"`.
+
+| Type                 | Required |
+|----------------------|----------|
+| `number` or `string` | ✔️       |
+
+### `alert.on.interval`
+
+The interval range over which fetched logs are compared. When `count` represents a minimum number of logs, Sigyn makes sure the interval has been reached.
+
+For instance, given this config:
+```json
+{
+  "polling": "1m",
+  "alert": {
+    "on": {
+      "interval": "1h",
+      "count": 100
+    }
+  }
+}
+```
+
+If Sigyn retrieves 20 logs per poll, then after the 5th poll (5 minutes) it will trigger the alert, because the interval doesn't need to be fully completed.
+
+If `count` is `0` or `<= 20`, Sigyn will count the retrieved logs only after one hour has elapsed. This allows you to create alerts based on a minimum count of logs for a given interval.
+Imagine an authentication service: if there are zero connections in a 1 hour interval, there is probably a problem somewhere (this use case works nicely with CRON polling, where you might not be interested in monitoring logs at night).
+
+### `alert.on.label`
+
+The `label` field bases the rule on a given label.
+Imagine this **LogQL** query: ``{app=\"foo\"} |~ `statusCode: [0-9]+` | regexp `((?P<responseTime>\\d+\\.\\d+)ms)` ``. The label `responseTime` will be added to all retrieved logs. You can then make an alert rule based on `responseTime`.
+
+| Type     | Required |
+|----------|----------|
+| `string` | ✔️       |
+
+### `alert.on.value`
+
+> [!IMPORTANT]
+> For label based rules only.
+
+The `value` field specifies the value the label should match. It can be either a `string` or a `range`, e.g. `> 500`.
+Based on the previous example query: ``{app=\"foo\"} |~ `statusCode: [0-9]+` | regexp `((?P<responseTime>\\d+\\.\\d+)ms)` ``, we can trigger an alert when a response time is above 1s:
+
+```json
+{
+  "label": "responseTime",
+  "value": "> 1000"
+}
+```
+
+### `alert.on.valueMatch`
+
+> [!IMPORTANT]
+> For label based rules only.
+
+The `valueMatch` field is similar to the `value` field, but it accepts a RegExp.
+Based on the previous example query: ``{app=\"foo\"} |~ `statusCode: [0-9]+` | regexp `((?P<responseTime>\\d+\\.\\d+)ms)` ``, we can trigger an alert when a response time is above 1s:
+```json
+{
+  "label": "responseTime",
+  "valueMatch": "[0-9]{4}"
+}
+```
+
+### `alert.on.percentThreshold`
+
+> [!IMPORTANT]
+> For label based rules only.
+>
+> **Cannot be used together with `count`**
+
+The `percentThreshold` field compares, over the given interval, the count of labels that match the given `value` (or `valueMatch`).
+Let's take this query as an example: ``{app=\"sigyn\"} |~ `state: (ok|ko)` | regexp `state: (?P<state>ok|ko)` ``:
+```json
+{
+  "label": "state",
+  "value": "ko",
+  "percentThreshold": 70
+}
+```
+
+After each poll over the given interval, Sigyn will compare the ratio of `ko` states to `ok` ones; if the percentage is at least 70%, Sigyn will trigger the alert.
+
+### `alert.template`
+
+See [Templates](./templates.md)
+
+### `alert.severity`
+
+The `severity` field is used by the notifiers; the alert UI changes based on these severities.
+
+| Type      | Required | Default                              |
+|-----------|----------|--------------------------------------|
+| `string`  | ❌       | `config.defaultSeverity` or `error`  |
+
+**Allowed values:**
+- `critical`
+- `error` | `major`
+- `warning` | `minor`
+- `information` | `info` | `low`
+
+> [!NOTE]
+> You can specify the root property `defaultSeverity` to change the default severity, which is `error`.
+
+- `alert.severity` (String or Number, Optional):
+  - If not specified, the default value is `config.defaultSeverity`; if that is not specified either, the default is Severity 3 (`error`). These severities change the alert UI sent by the notifiers.
+  **Allowed values:**
+  - `critical`
+  - `error` | `major`
+  - `warning` | `minor`
+  - `information` | `info` | `low`
+
+### `alert.throttle`
+
+See [Throttle](./throttle.md)
diff --git a/src/config/docs/self-monitoring.md b/src/config/docs/self-monitoring.md
new file mode 100644
index 0000000..22f5e0c
--- /dev/null
+++ b/src/config/docs/self-monitoring.md
@@ -0,0 +1,89 @@
+# Self Monitoring
+
+Self monitoring allows triggering an alert when something goes wrong. It can be your Loki instance being down, a rule with a bad **LogQL** query, etc.
+
+- [Example Configuration](#example-configuration)
+- [Schema Properties](#schema-properties)
+  - [`template`](#template) (See [Templates](./templates.md))
+  - [`notifiers`](#notifiers)
+  - [`errorFilters`](#errorfilters)
+  - [`ruleFilters`](#rulefilters)
+  - [`minimumErrorCount`](#minimumerrorcount)
+  - [`throttle`](#throttle) (See [Throttle](./throttle.md))
+
+## Example configuration
+
+```json
+{
+  "selfMonitoring": {
+    "notifiers": ["discord"],
+    "template": {
+      "title": "Bad Gateway",
+      "content": [
+        "Loki is down!"
+      ]
+    },
+    "errorFilters": ["Bad Gateway"]
+  }
+}
+```
+
+## Schema Properties
+
+### `template`
+
+See [templates](./templates.md)
+
+> [!NOTE]
+> The `template` property is **required**.
+
+> [!WARNING]
+> Self-monitoring templates can be a root template reference; however, the available variables are different.
+
+### `notifiers`
+
+Defines the notifiers to send alerts to.
+
+| Type       | Required | Default                        |
+|------------|----------|--------------------------------|
+| `string[]` | ❌       | All root configured notifiers  |
+
+### `errorFilters`
+
+The `errorFilters` property allows filtering errors.
+Each item can be either a strict-equal match value or a RegExp.
+
+| Type       | Required |
+|------------|----------|
+| `string[]` | ❌       |
+
+For instance, you can filter on errors caused by malformed **LogQL** queries (**Bad Request** errors):
+
+```json
+{
+  "errorFilters": [
+    "Bad Request"
+  ]
+}
+```
+
+### `ruleFilters`
+
+The `ruleFilters` property allows rules to be filtered **by their name**. This can be useful, for instance, if you have a rule with a very high count of logs that may throw a timeout.
+
+| Type       | Required |
+|------------|----------|
+| `string[]` | ❌       |
+
+### `minimumErrorCount`
+
+The minimum number of errors before triggering an alert.
+
+| Type     | Required | Default |
+|----------|----------|---------|
+| `number` | ❌       | `0`     |
+
+### `throttle`
+
+See [Throttle](./throttle.md)
+
diff --git a/src/config/docs/templates.md b/src/config/docs/templates.md
new file mode 100644
index 0000000..575b32b
--- /dev/null
+++ b/src/config/docs/templates.md
@@ -0,0 +1,67 @@
+# Templates
+
+## Config
+
+There are 2 sorts of templates you can configure with Sigyn:
+1. Root templates: these are meant to be used or referenced by notification templates.
+2. Notification templates: these concern [rules](./rules.md), [composite rules](./composite-rules.md) & [self monitoring](./self-monitoring.md).
+
+Both can be set up with an object as follows:
+
+| Property | Type | Required | Description |
+|---|---|---|---|
+| `extends` | `string` | ❌ | The template to extend from. |
+| `title` | `string` | ❌ | The title of the notification template. |
+| `content` | `string[]` or `object` | ❌ | The content of the notification template. It can be an object when extending another template (using `extends`) |
+| `content.before` | `string[]` | ❌ | The content of the notification template to add **before** the extended template's content |
+| `content.after` | `string[]` | ❌ | The content of the notification template to add **after** the extended template's content |
+| `content.at.index` | `number` | ❌ | The index indicating where the new content should be added. Negative indexes work, e.g. `-1` means "before the last line" |
+| `content.at.value` | `string` | ❌ | The specific content line to be included at the provided index. |
+
+> [!IMPORTANT]
+> Either `title` or `content` is **required**.
+
+> [!NOTE]
+> Extending templates can be nested: a **root template** can extend another **root template**.
+
+> [!NOTE]
+> When extending another template, `title` & `content` simply replace the base template's properties (except when `content` is an object, which updates the extended template's `content` instead).
+
+## Variables
+
+You can use any of these variables (both for `title` & `content`), surrounded with `{}` (see example below):
+- `ruleName`
+- `logql`
+- `count` (count of logs retrieved within the interval)
+- `counter`
+- `threshold` (`alert.on.count`)
+- `interval`
+- `lokiUrl`
+
+> [!NOTE]
+> You can use Markdown hyperlinks, e.g. `[See logs]({lokiUrl})`.
+
+For self-monitoring, you can use these variables, surrounded with `{}`:
+- `agentFailure.errors` which is equal to the joined error messages
+- `agentFailure.rules` which is equal to the joined failed rules
+
+For composite rules, you can use these variables, surrounded with `{}`:
+- `compositeRuleName`
+- `label` which includes the combined labels from all rules
+- `rules` which is equal to the joined rule names
+
+You can also use a label variable from your LogQL using `{label.x}`:
+```json
+{
+  ...
+  "logql": "{app=\"foo\", env=\"preprod\"} |= `my super logql`",
+  "template": {
+    "content": [
+      "app: {label.app} | env: {label.env}"
+    ]
+  }
+  ...
+}
+```
+
+You can also use any variable extracted from the `stream` vector.
diff --git a/src/config/docs/throttle.md b/src/config/docs/throttle.md
new file mode 100644
index 0000000..bfaa989
--- /dev/null
+++ b/src/config/docs/throttle.md
@@ -0,0 +1,13 @@
+# Throttle
+
+You can set up a throttle for multiple things (`rules`, `compositeRules`, `selfMonitoring`); it works the same way every time.
+
+## Schema Properties
+
+| Property              | Type     | Required | Description |
+|-----------------------|----------|----------|-------------|
+| `interval`            | `string` | ✔️       | The throttle duration (e.g. `1m`, `1h`) after sending an alert. |
+| `count`               | `number` | ❌       | The count threshold to bypass the throttle; defaults to `0` (never send an alert before the end of the interval). |
+| `activationThreshold` | `number` | ❌       | The number of alerts allowed to be sent before the throttle is activated. |
+
+
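+As a minimal sketch of how the throttle properties fit together, here is a hypothetical rule that sets `alert.throttle`. The rule name, query and numeric values are illustrative assumptions, not recommended defaults:
+
+```json
+{
+  "rules": [
+    {
+      "name": "Throttled rule (example)",
+      "logql": "{app=\"foo\"} |= `Error`",
+      "alert": {
+        "on": {
+          "count": 10,
+          "interval": "5m"
+        },
+        "template": {
+          "title": "{ruleName} - Triggered {counter} times!"
+        },
+        "throttle": {
+          "interval": "1h",
+          "count": 3,
+          "activationThreshold": 5
+        }
+      }
+    }
+  ]
+}
+```
+
+Going by the throttle property descriptions, the first 5 alerts would be sent before the throttle is activated, the throttle would then last `1h` after an alert is sent, and `count` would act as the threshold that bypasses it. The same `throttle` object shape should apply under `compositeRules` and `selfMonitoring`.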