Add arg_min, arg_max #182

Open
wants to merge 2 commits into
base: main
158 changes: 158 additions & 0 deletions apl/aggregation-function/arg-max.mdx
@@ -0,0 +1,158 @@
---
title: arg_max
description: 'This page explains how to use the arg_max aggregation in APL.'
---

The `arg_max` aggregation in APL helps you identify the record with the maximum value for a specific numeric field and return one or more additional fields from that record. Use `arg_max` when you want to determine key details associated with a record having the maximum value, such as the longest request duration, highest transaction amount, or most significant span duration.

This aggregation is particularly useful in scenarios like:

- Pinpointing the slowest HTTP requests in log data.
- Identifying the longest span durations in OpenTelemetry traces.
- Highlighting the highest severity security alerts in logs.

Review comment on lines +6 to +12 (@mhr3, Contributor, Jan 10, 2025):

I think this mostly describes a `max()` aggregation and doesn't focus on the 'arg' part of it.

o1 makes it more clear imo:

- You group your data by one or more columns (using `by` in `summarize`).
- Within each group, Kusto finds the row where a particular expression (often a column) has the maximum value.
- It then returns specified columns from that “maximum” row.
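
The three steps in the comment above can be modeled in a few lines of Python. This is only a sketch of the semantics, not APL or its implementation: the `arg_max` helper, its parameters, and the sample rows are hypothetical, and the tie-breaking rule (first row wins) is an assumption made for the sketch.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]


def arg_max(
    rows: Iterable[Row],
    value_field: str,
    return_fields: List[str],
    by: Optional[List[str]] = None,
) -> List[Row]:
    """Illustrative model of `summarize arg_max(value_field, return_fields...) by ...`."""
    group_fields = by or []
    best: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        # Step 1: assign the row to a group (one global group if `by` is empty).
        key = tuple(row[f] for f in group_fields)
        # Step 2: keep the row with the largest value seen so far in that group.
        # Assumption: on ties the first row wins; the source does not state APL's rule.
        if key not in best or row[value_field] > best[key][value_field]:
            best[key] = row
    # Step 3: return the group keys, the maximum, and the requested fields.
    return [
        {
            **dict(zip(group_fields, key)),
            value_field: row[value_field],
            **{f: row[f] for f in return_fields},
        }
        for key, row in best.items()
    ]


# Hypothetical sample rows, not real data from ['sample-http-logs'].
logs = [
    {"method": "GET", "uri": "/home", "req_duration_ms": 120},
    {"method": "GET", "uri": "/api/products", "req_duration_ms": 950},
    {"method": "POST", "uri": "/api/login", "req_duration_ms": 300},
    {"method": "POST", "uri": "/api/cart", "req_duration_ms": 80},
]

# Models: summarize arg_max(req_duration_ms, uri) by method
slowest_by_method = arg_max(logs, "req_duration_ms", ["uri"], by=["method"])
print(slowest_by_method)
```

Each output row carries the group key, the maximum value, and the extra fields taken from the same input row. Returning fields from the winning row is the part that a plain `max` does not do.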

## For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

<AccordionGroup>
<Accordion title="Splunk SPL users">

In Splunk SPL, you use `stats` with a combination of `max` and `by` clauses to achieve similar results. APL provides a dedicated `arg_max` aggregation that simplifies this process.

<CodeGroup>
```sql Splunk example
| stats max(req_duration_ms) as max_duration by id, uri
```

Review comment (Contributor):

AFAIK Splunk doesn't have an equivalent for `arg_max`; this is just `| summarize max(req_duration_ms) by id, uri`.

```kusto APL equivalent
['sample-http-logs']
| summarize arg_max(req_duration_ms, id, uri)
```
</CodeGroup>

</Accordion>
<Accordion title="ANSI SQL users">

In ANSI SQL, you typically use a subquery to find the maximum value per group and then join it back to the original table to retrieve additional fields from the matching row. APL’s `arg_max` provides a more concise alternative.

<CodeGroup>
```sql SQL example
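WITH MaxValues AS (
  SELECT id, MAX(req_duration_ms) AS max_duration
  FROM sample_http_logs
  GROUP BY id
)
SELECT logs.id, logs.uri, MaxValues.max_duration
FROM sample_http_logs logs
JOIN MaxValues
  ON logs.id = MaxValues.id
  -- Keep only the row that holds the maximum, not every row for the id
  AND logs.req_duration_ms = MaxValues.max_duration;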
```

```kusto APL equivalent
['sample-http-logs']
| summarize arg_max(req_duration_ms, uri) by id
```
</CodeGroup>

</Accordion>
</AccordionGroup>

## Usage

### Syntax

```kusto
| summarize arg_max(numeric_field, field1[, field2, ...])
```

### Parameters

| Parameter | Description |
|------------------|-------------------------------------------------------------------------------------|
| `numeric_field` | The numeric field whose maximum value determines the selected record. |
| `field1, field2` | The additional fields to retrieve from the record with the maximum numeric value. |

### Returns

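`arg_max` returns one row for each group, or a single row for the entire dataset if no grouping is specified. Each row contains the maximum value and the additional fields specified in the query, taken from the record that holds that maximum.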
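
To make the return shape concrete, here is a small Python sketch contrasting a `max`-style result with an `arg_max`-style result when no grouping is specified. It models the behavior only and is not APL; the sample rows are hypothetical.

```python
# Hypothetical rows standing in for ['sample-http-logs'].
logs = [
    {"method": "GET", "uri": "/home", "req_duration_ms": 120},
    {"method": "GET", "uri": "/api/products", "req_duration_ms": 950},
    {"method": "POST", "uri": "/api/login", "req_duration_ms": 300},
]

# max-style result: only the largest value.
max_duration = max(row["req_duration_ms"] for row in logs)

# arg_max-style result: the record that holds the largest value,
# from which the additional fields are read.
slowest_row = max(logs, key=lambda row: row["req_duration_ms"])
slowest = {
    "req_duration_ms": slowest_row["req_duration_ms"],
    "uri": slowest_row["uri"],
    "method": slowest_row["method"],
}

print(max_duration)
print(slowest)
```

The first result answers "how slow was the slowest request?", while the second also answers "which request was it?".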

## Use case examples

<Tabs>
<Tab title="Log analysis">

Find the slowest HTTP request for each URI in the `['sample-http-logs']` dataset.

**Query**

```kusto
['sample-http-logs']
| summarize arg_max(req_duration_ms, method) by uri
```

Review comment (@mhr3, Contributor, Jan 10, 2025):

IMO a better example would be with `uri` and `method` swapped, i.e. `arg_max(duration, uri) by method`, which would return the slowest paths for any given HTTP method.

> Find the slowest HTTP request for each URI in the ['sample-http-logs'] dataset.

That doesn't need `arg_max`; it would be just `summarize max(duration) by uri`.

Review comment (@mhr3, Contributor, Jan 10, 2025):

Or perhaps replace the `method` with `request_id` (in the original); that also makes sense.

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20arg_max(req_duration_ms%2C%20method)%20by%20uri%22%7D)

**Output**

| uri | method | req_duration_ms |
|-------------------|--------|-----------------|
| /home | GET | 1200 |
| /api/products | POST | 2500 |

This query identifies the slowest HTTP request for each URI along with the corresponding method.

</Tab>
<Tab title="OpenTelemetry traces">

Identify the span with the longest duration for each service in the `['otel-demo-traces']` dataset.

**Query**

```kusto
['otel-demo-traces']
| summarize arg_max(duration, span_id, trace_id) by ['service.name']
```

Review comment (Contributor):

yep, that's a good one 👍🏼

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'otel-demo-traces'%5D%20%7C%20summarize%20arg_max(duration%2C%20span_id%2C%20trace_id)%20by%20%5B'service.name'%5D%22%7D)

**Output**

| service.name | span_id | trace_id | duration |
|--------------------|----------|-----------|----------|
| frontend | span123 | trace456 | 3s |
| checkoutservice | span789 | trace012 | 5s |

This query identifies the span with the longest duration for each service, returning the `span_id`, `trace_id`, and `duration`.

</Tab>
<Tab title="Security logs">

Find the highest status code for each country in the `['sample-http-logs']` dataset.

**Query**

```kusto
['sample-http-logs']
| summarize arg_max(toint(status), uri) by ['geo.country']
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20arg_max(toint(status)%2C%20uri)%20by%20%5B'geo.country'%5D%22%7D)

**Output**

| geo.country | uri | status |
|--------------|-------------------|--------|
| USA | /admin | 500 |
| Canada | /dashboard | 503 |

This query identifies the URI with the highest status code for each country.

</Tab>
</Tabs>

## List of related aggregations

- [arg_min](/apl/aggregation-function/arg-min): Retrieves the record with the minimum value for a numeric field.
- [max](/apl/aggregation-function/max): Retrieves the maximum value for a numeric field but does not return additional fields.
- [percentile](/apl/aggregation-function/percentile): Provides the value at a specific percentile of a numeric field.
146 changes: 146 additions & 0 deletions apl/aggregation-function/arg-min.mdx
@@ -0,0 +1,146 @@
---
title: arg_min
description: 'This page explains how to use the arg_min aggregation in APL.'
---

The `arg_min` aggregation in APL allows you to identify the row in a dataset where a specific numeric field has the minimum value. You can use this to retrieve other associated fields in the same row, making it particularly useful for pinpointing details about the smallest value in large datasets. Typical use cases include identifying the shortest request duration in web logs, the fastest span duration in OpenTelemetry traces, or the lowest security risk score in logs.

## For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

<AccordionGroup>
<Accordion title="Splunk SPL users">

In Splunk SPL, one way to achieve similar functionality is to compute the minimum with `eventstats` and then keep only the events whose value matches it. In APL, `arg_min` simplifies this by directly providing the row with the minimum value for a given field.

<CodeGroup>
```sql Splunk example
| eventstats min(req_duration_ms) as minDuration
| where req_duration_ms=minDuration
| table id, uri, req_duration_ms
```

```kusto APL equivalent
['sample-http-logs']
| summarize arg_min(req_duration_ms, id, uri)
```
</CodeGroup>

</Accordion>
<Accordion title="ANSI SQL users">

In ANSI SQL, achieving similar functionality often requires a subquery that computes `MIN` and a filter on the outer query to retrieve the associated fields. APL's `arg_min` eliminates the need for multiple steps by directly returning the row with the minimum value.

<CodeGroup>
```sql SQL example
SELECT id, uri
FROM sample_http_logs
WHERE req_duration_ms = (
SELECT MIN(req_duration_ms)
FROM sample_http_logs
);
```

```kusto APL equivalent
['sample-http-logs']
| summarize arg_min(req_duration_ms, id, uri)
```
</CodeGroup>

</Accordion>
</AccordionGroup>

## Usage

### Syntax

```kusto
| summarize arg_min(numeric_field, field1, ..., fieldN)
```

### Parameters

- `numeric_field`: The numeric field to evaluate for the minimum value.
- `field1, ..., fieldN`: Additional fields to return from the row with the minimum value.

### Returns

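`arg_min` returns one row for each group, or a single row for the entire dataset if no grouping is specified. Each row contains the minimum value of the numeric field and the corresponding values of the specified additional fields.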
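
As a rough illustration of how grouping changes what `arg_min` returns, the Python sketch below models both cases. It is not APL: the sample rows are hypothetical, the string-valued `status` field is an assumption suggested by the `toint(status)` call used in this page's examples, and ties are resolved by taking the first matching row.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows; `status` is stored as a string, so it is converted before comparing.
logs = [
    {"geo.country": "USA", "uri": "/admin", "status": "200"},
    {"geo.country": "USA", "uri": "/home", "status": "404"},
    {"geo.country": "Canada", "uri": "/dashboard", "status": "201"},
    {"geo.country": "Canada", "uri": "/api/cart", "status": "500"},
]


def status_as_int(row):
    """Plays the role of toint(status) in the APL query."""
    return int(row["status"])


# No grouping: a single row for the whole dataset.
lowest_overall = min(logs, key=status_as_int)

# Grouping by country: one row per group.
by_country = itemgetter("geo.country")
lowest_by_country = [
    min(group, key=status_as_int)
    for _, group in groupby(sorted(logs, key=by_country), key=by_country)
]

print(lowest_overall)
print(lowest_by_country)
```

The grouped call corresponds in spirit to `summarize arg_min(toint(status), uri) by ['geo.country']`, and the ungrouped call to the same aggregation without a `by` clause.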

## Use case examples

<Tabs>
<Tab title="Log analysis">

You can use `arg_min` to identify the HTTP request with the shortest duration and its associated details.

**Query**

```kusto
['sample-http-logs']
| summarize arg_min(req_duration_ms, uri, method)
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20arg_min(req_duration_ms%2C%20uri%2C%20method)%22%7D)

**Output**

| req_duration_ms | uri | method |
|-----------------|--------------------|--------|
| 12 | /api/login | POST |

This query identifies the shortest HTTP request duration and provides details about the request.

</Tab>
<Tab title="OpenTelemetry traces">

Use `arg_min` to find the span with the shortest duration and retrieve its associated details.

**Query**

```kusto
['otel-demo-traces']
| summarize arg_min(duration, trace_id, span_id, ['service.name'], kind)
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'otel-demo-traces'%5D%20%7C%20summarize%20arg_min(duration%2C%20trace_id%2C%20span_id%2C%20%5B'service.name'%5D%2C%20kind)%22%7D)

**Output**

| duration | trace_id | span_id | service.name | kind |
|----------|------------|------------|--------------------|----------|
| 00:00:01 | abc123 | span456 | frontend | server |

This query identifies the span with the shortest duration along with its metadata.

</Tab>
<Tab title="Security logs">

Find the lowest status code for each country in the `['sample-http-logs']` dataset.

**Query**

```kusto
['sample-http-logs']
| summarize arg_min(toint(status), uri) by ['geo.country']
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20arg_min(toint(status)%2C%20uri)%20by%20%5B'geo.country'%5D%22%7D)

**Output**

| geo.country | uri | status |
|--------------|-------------------|--------|
| USA | /admin | 200 |
| Canada | /dashboard | 201 |

This query identifies the URI with the lowest status code for each country.

</Tab>
</Tabs>

## List of related aggregations

- [arg_max](/apl/aggregation-function/arg-max): Returns the row with the maximum value for a numeric field, useful for finding peak metrics.
- [min](/apl/aggregation-function/min): Returns only the minimum value of a numeric field without additional fields.
- [percentile](/apl/aggregation-function/percentile): Provides the value at a specific percentile of a numeric field.
2 changes: 2 additions & 0 deletions apl/aggregation-function/statistical-functions.mdx
@@ -10,6 +10,8 @@ The table summarizes the aggregation functions available in APL. Use all these a

| Function | Description |
| ------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| [arg_min](/apl/aggregation-function/arg-min) | Returns the row containing the minimum value of a numeric field. |
| [arg_max](/apl/aggregation-function/arg-max) | Returns the row containing the maximum value of a numeric field. |
| [avg](/apl/aggregation-function/avg) | Returns an average value across the group. |
| [avgif](/apl/aggregation-function/avgif) | Calculates the average value of an expression in records for which the predicate evaluates to true. |
| [count](/apl/aggregation-function/count) | Returns a count of the group without/with a predicate. |
2 changes: 2 additions & 0 deletions mint.json
@@ -399,6 +399,8 @@
"icon": "sigma",
"pages": [
"apl/aggregation-function/statistical-functions",
"apl/aggregation-function/arg-min",
"apl/aggregation-function/arg-max",
"apl/aggregation-function/avg",
"apl/aggregation-function/avgif",
"apl/aggregation-function/count",