Add arg_min, arg_max #182
@@ -0,0 +1,158 @@
---
title: arg_max
description: 'This page explains how to use the arg_max aggregation in APL.'
---

The `arg_max` aggregation in APL finds the record with the maximum value for a specific numeric field and returns one or more additional fields from that same record. Use `arg_max` when you need the details of the record that holds the maximum, not only the maximum itself: for example, the request with the longest duration, the transaction with the highest amount, or the span with the longest duration.

This aggregation is particularly useful in scenarios like:

- Pinpointing the slowest HTTP requests in log data.
- Identifying the longest span durations in OpenTelemetry traces.
- Highlighting the highest severity security alerts in logs.
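
To make the difference from a plain `max` concrete, here is a minimal Python sketch of the idea. It illustrates the semantics only and is not how APL implements the aggregation; the records and their values are made up for the example.

```python
# Toy records standing in for rows of a log dataset (made-up values).
records = [
    {"id": "a1", "uri": "/home", "req_duration_ms": 120},
    {"id": "b2", "uri": "/api/products", "req_duration_ms": 2500},
    {"id": "c3", "uri": "/home", "req_duration_ms": 1200},
]

# max-style aggregation: only the largest value survives.
max_duration = max(r["req_duration_ms"] for r in records)

# arg_max-style aggregation: keep the whole record that holds the largest
# value, then project the extra fields that were asked for.
slowest = max(records, key=lambda r: r["req_duration_ms"])
arg_max_row = {k: slowest[k] for k in ("req_duration_ms", "id", "uri")}

print(max_duration)  # 2500
print(arg_max_row)   # {'req_duration_ms': 2500, 'id': 'b2', 'uri': '/api/products'}
```

The second result carries `id` and `uri` from the same record as the maximum, which is the part a plain maximum cannot provide.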

## For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

<AccordionGroup>
<Accordion title="Splunk SPL users">

In Splunk SPL, the closest built-in pattern is `stats` with `max` and a `by` clause, which returns only the maximum value per group and not the other fields of the record that holds it. APL provides a dedicated `arg_max` aggregation that also returns fields from that record.

<CodeGroup>
```sql Splunk example
| stats max(req_duration_ms) as max_duration by id, uri
```

Review comment (on the Splunk example above): AFAIK Splunk doesn't have an equivalent for arg_max; this is just `| summarize max(req_duration_ms) by id, uri`.

```kusto APL equivalent
['sample-http-logs']
| summarize arg_max(req_duration_ms, id, uri)
```
</CodeGroup>

</Accordion>
<Accordion title="ANSI SQL users">

In ANSI SQL, you typically use a subquery to find the maximum value and then join it back to the original table to retrieve additional fields. APL's `arg_max` provides a more concise alternative.

<CodeGroup>
```sql SQL example
WITH MaxValues AS (
    SELECT MAX(req_duration_ms) AS max_duration
    FROM sample_http_logs
)
-- Join on the maximum so that only the row(s) holding it are returned.
SELECT logs.id, logs.uri, MaxValues.max_duration
FROM sample_http_logs logs
JOIN MaxValues
    ON logs.req_duration_ms = MaxValues.max_duration;
```

```kusto APL equivalent
['sample-http-logs']
| summarize arg_max(req_duration_ms, id, uri)
```
</CodeGroup>

</Accordion>
</AccordionGroup>
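
The subquery-and-join pattern described for SQL users can be tried locally. The sketch below uses Python's built-in `sqlite3` module with an in-memory table and made-up rows; the table layout is an assumption for illustration and is not taken from the real `sample-http-logs` dataset. It joins on the maximum value so that only the row holding it comes back.

```python
import sqlite3

# In-memory database with a hypothetical, simplified log table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_http_logs (id TEXT, uri TEXT, req_duration_ms INTEGER)"
)
conn.executemany(
    "INSERT INTO sample_http_logs VALUES (?, ?, ?)",
    [("a1", "/home", 120), ("b2", "/api/products", 2500), ("c3", "/home", 1200)],
)

# The subquery finds the maximum; the join keeps only the row(s) that hold it.
rows = conn.execute(
    """
    WITH MaxValues AS (
        SELECT MAX(req_duration_ms) AS max_duration FROM sample_http_logs
    )
    SELECT logs.id, logs.uri, logs.req_duration_ms
    FROM sample_http_logs logs
    JOIN MaxValues ON logs.req_duration_ms = MaxValues.max_duration
    """
).fetchall()
conn.close()

print(rows)  # [('b2', '/api/products', 2500)]
```

If several rows share the maximum, this SQL pattern returns all of them.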

## Usage

### Syntax

```kusto
| summarize arg_max(numeric_field, field1[, field2, ...])
```

### Parameters

| Parameter        | Description                                                                       |
|------------------|-----------------------------------------------------------------------------------|
| `numeric_field`  | The numeric field whose maximum value determines the selected record.             |
| `field1, field2` | The additional fields to retrieve from the record with the maximum numeric value. |

### Returns

`arg_max` returns one row for each group (or a single row for the entire dataset if no grouping is specified). Each row contains the maximum value of the numeric field and the additional fields specified in the query, taken from the record that holds that maximum.
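
As a rough model of this return shape, the following Python sketch groups toy records and keeps, for each group, the record holding the maximum. The helper name `arg_max_by` and the sample data are hypothetical, and keeping the first record on ties is an assumption of the sketch, not documented APL behavior.

```python
def arg_max_by(rows, value_field, extra_fields, group_field):
    """Return one dict per group, built from the record with the largest value."""
    best = {}
    for row in rows:
        key = row[group_field]
        # Strict ">" keeps the first record seen on ties (an assumption of this sketch).
        if key not in best or row[value_field] > best[key][value_field]:
            best[key] = row
    return [
        {group_field: key, value_field: row[value_field],
         **{f: row[f] for f in extra_fields}}
        for key, row in best.items()
    ]

rows = [
    {"uri": "/home", "method": "GET", "req_duration_ms": 300},
    {"uri": "/api/products", "method": "POST", "req_duration_ms": 2500},
    {"uri": "/home", "method": "GET", "req_duration_ms": 1200},
    {"uri": "/api/products", "method": "GET", "req_duration_ms": 40},
]

result = arg_max_by(rows, "req_duration_ms", ["method"], "uri")
for r in result:
    print(r)
# {'uri': '/home', 'req_duration_ms': 1200, 'method': 'GET'}
# {'uri': '/api/products', 'req_duration_ms': 2500, 'method': 'POST'}
```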

## Use case examples

<Tabs>
<Tab title="Log analysis">

Find the slowest HTTP request for each URI in the `['sample-http-logs']` dataset.

**Query**

```kusto
['sample-http-logs']
| summarize arg_max(req_duration_ms, method) by uri
```

Review comment (on the query above): IMO a better example would be with swapped uri & method, i.e. `arg_max(duration, uri) by method`, which would return slowest paths for any given HTTP method.
that doesn't need arg_max, would be just

Review comment (reply): or perhaps replace the method with request_id (in the original), that also makes sense

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20arg_max(req_duration_ms%2C%20method)%20by%20uri%22%7D)

**Output**

| uri           | method | req_duration_ms |
|---------------|--------|-----------------|
| /home         | GET    | 1200            |
| /api/products | POST   | 2500            |

This query identifies the slowest HTTP request for each URI along with the corresponding method.
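
The choice of grouping field changes what this kind of query returns. The Python sketch below, on made-up rows, contrasts a plain maximum grouped by both `uri` and `method` with an `arg_max`-style selection grouped by `uri` alone. It models the semantics only and makes no claim about how APL evaluates the query.

```python
from collections import defaultdict

rows = [
    {"uri": "/home", "method": "GET", "req_duration_ms": 300},
    {"uri": "/home", "method": "POST", "req_duration_ms": 1200},
    {"uri": "/api/products", "method": "POST", "req_duration_ms": 2500},
]

# Plain maximum grouped by (uri, method): one row per pair.
max_by_pair = defaultdict(int)
for r in rows:
    key = (r["uri"], r["method"])
    max_by_pair[key] = max(max_by_pair[key], r["req_duration_ms"])

# arg_max-style selection grouped by uri: one row per uri, carrying the
# method of whichever request was slowest for that uri.
slowest_by_uri = {}
for r in rows:
    current = slowest_by_uri.get(r["uri"])
    if current is None or r["req_duration_ms"] > current["req_duration_ms"]:
        slowest_by_uri[r["uri"]] = r

print(len(max_by_pair))                   # 3
print(slowest_by_uri["/home"]["method"])  # POST
```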

</Tab>
<Tab title="OpenTelemetry traces">

Identify the span with the longest duration for each service in the `['otel-demo-traces']` dataset.

**Query**

```kusto
['otel-demo-traces']
| summarize arg_max(duration, span_id, trace_id) by ['service.name']
```

Review comment (on the query above): yep, that's a good one 👍🏼

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'otel-demo-traces'%5D%20%7C%20summarize%20arg_max(duration%2C%20span_id%2C%20trace_id)%20by%20%5B'service.name'%5D%22%7D)

**Output**

| service.name    | span_id | trace_id | duration |
|-----------------|---------|----------|----------|
| frontend        | span123 | trace456 | 3s       |
| checkoutservice | span789 | trace012 | 5s       |

This query identifies the span with the longest duration for each service, returning the `span_id`, `trace_id`, and `duration`.

</Tab>
<Tab title="Security logs">

Find the highest status code for each country in the `['sample-http-logs']` dataset.

**Query**

```kusto
['sample-http-logs']
| summarize arg_max(toint(status), uri) by ['geo.country']
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20arg_max(toint(status)%2C%20uri)%20by%20%5B'geo.country'%5D%22%7D)

**Output**

| geo.country | uri        | status |
|-------------|------------|--------|
| USA         | /admin     | 500    |
| Canada      | /dashboard | 503    |

For each country, this query returns the highest status code and the URI of a request that produced it. The `toint(status)` call converts `status` to an integer before the comparison.
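
The query above casts `status` with `toint` before taking the maximum. As a general illustration of why a numeric cast can matter, this Python sketch compares the same made-up values as text and as numbers. The two-digit value `"99"` is hypothetical and chosen only to make the difference visible; for three-digit status codes the two orderings coincide, and the source does not state how APL would compare uncast values.

```python
# Hypothetical values stored as strings.
statuses = ["200", "503", "99"]

# Text comparison orders character by character, so "99" sorts last.
as_text = max(statuses)

# Numeric comparison after casting, analogous in spirit to toint(status).
as_number = max(int(s) for s in statuses)

print(as_text)    # 99
print(as_number)  # 503
```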

</Tab>
</Tabs>

## List of related aggregations

- [arg_min](/apl/aggregation-function/arg-min): Retrieves the record with the minimum value for a numeric field.
- [max](/apl/aggregation-function/max): Retrieves the maximum value for a numeric field but does not return additional fields.
- [percentile](/apl/aggregation-function/percentile): Provides the value at a specific percentile of a numeric field.

@@ -0,0 +1,146 @@
---
title: arg_min
description: 'This page explains how to use the arg_min aggregation in APL.'
---

The `arg_min` aggregation in APL identifies the row in a dataset where a specific numeric field has its minimum value and returns other fields from that same row. This makes it useful for pinpointing the details behind the smallest value in large datasets. Typical use cases include identifying the request with the shortest duration in web logs, the fastest span in OpenTelemetry traces, or the event with the lowest security risk score in logs.

## For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

<AccordionGroup>
<Accordion title="Splunk SPL users">

In Splunk SPL, achieving similar functionality typically takes several steps, such as computing the minimum with `eventstats` and then filtering for the events that match it. In APL, `arg_min` simplifies this by directly providing the row with the minimum value for a given field.

<CodeGroup>
```sql Splunk example
| eventstats min(req_duration_ms) as minDuration
| where req_duration_ms=minDuration
| table req_duration_ms, id, uri
```

```kusto APL equivalent
['sample-http-logs']
| summarize arg_min(req_duration_ms, id, uri)
```
</CodeGroup>

</Accordion>
<Accordion title="ANSI SQL users">

In ANSI SQL, achieving similar functionality often requires a subquery with `MIN`, or a combination of `MIN`, `GROUP BY`, and `JOIN` for per-group results, to retrieve the associated fields. APL's `arg_min` eliminates the need for multiple steps by directly returning the row with the minimum value.

<CodeGroup>
```sql SQL example
SELECT id, uri
FROM sample_http_logs
WHERE req_duration_ms = (
    SELECT MIN(req_duration_ms)
    FROM sample_http_logs
);
```

```kusto APL equivalent
['sample-http-logs']
| summarize arg_min(req_duration_ms, id, uri)
```
</CodeGroup>

</Accordion>
</AccordionGroup>

## Usage

### Syntax

```kusto
| summarize arg_min(numeric_field, field1, ..., fieldN)
```

### Parameters

- `numeric_field`: The numeric field to evaluate for the minimum value.
- `field1, ..., fieldN`: Additional fields to return from the row with the minimum value.

### Returns

One row for each group (or a single row for the entire dataset if no grouping is specified), containing the minimum value of the numeric field and the corresponding values of the specified additional fields.
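
A minimal Python sketch of this behavior without grouping, on made-up span records (the field names and values are hypothetical). It also shows what happens when two records tie for the minimum: Python's `min` keeps the first one it encounters. How APL resolves ties is not stated in the source, so treat that part as a property of the sketch only.

```python
spans = [
    {"span_id": "s1", "service": "frontend", "duration_ms": 40},
    {"span_id": "s2", "service": "checkout", "duration_ms": 15},
    {"span_id": "s3", "service": "frontend", "duration_ms": 15},
]

# arg_min-style selection: the record with the smallest duration,
# projected onto the value field plus the requested extra fields.
shortest = min(spans, key=lambda s: s["duration_ms"])
arg_min_row = {
    "duration_ms": shortest["duration_ms"],
    "span_id": shortest["span_id"],
    "service": shortest["service"],
}

# Records that share the minimum value; only one of them ends up in the result.
tied = [s["span_id"] for s in spans if s["duration_ms"] == shortest["duration_ms"]]

print(arg_min_row)  # {'duration_ms': 15, 'span_id': 's2', 'service': 'checkout'}
print(tied)         # ['s2', 's3']
```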

## Use case examples

<Tabs>
<Tab title="Log analysis">

You can use `arg_min` to identify the HTTP request with the shortest duration and its associated details.

**Query**

```kusto
['sample-http-logs']
| summarize arg_min(req_duration_ms, uri, method)
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20arg_min(req_duration_ms%2C%20uri%2C%20method)%22%7D)

**Output**

| req_duration_ms | uri        | method |
|-----------------|------------|--------|
| 12              | /api/login | POST   |

This query identifies the shortest HTTP request duration and provides details about the request.

</Tab>
<Tab title="OpenTelemetry traces">

Use `arg_min` to find the span with the shortest duration and retrieve its associated details.

**Query**

```kusto
['otel-demo-traces']
| summarize arg_min(duration, trace_id, span_id, ['service.name'], kind)
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'otel-demo-traces'%5D%20%7C%20summarize%20arg_min(duration%2C%20trace_id%2C%20span_id%2C%20%5B'service.name'%5D%2C%20kind)%22%7D)

**Output**

| duration | trace_id | span_id | service.name | kind   |
|----------|----------|---------|--------------|--------|
| 00:00:01 | abc123   | span456 | frontend     | server |

This query identifies the span with the shortest duration along with its metadata.

</Tab>
<Tab title="Security logs">

Find the lowest status code for each country in the `['sample-http-logs']` dataset.

**Query**

```kusto
['sample-http-logs']
| summarize arg_min(toint(status), uri) by ['geo.country']
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/explorer?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20arg_min(toint(status)%2C%20uri)%20by%20%5B'geo.country'%5D%22%7D)

**Output**

| geo.country | uri        | status |
|-------------|------------|--------|
| USA         | /admin     | 200    |
| Canada      | /dashboard | 201    |

For each country, this query returns the lowest status code and the URI of a request that produced it. The `toint(status)` call converts `status` to an integer before the comparison.

</Tab>
</Tabs>

## List of related aggregations

- [arg_max](/apl/aggregation-function/arg-max): Returns the row with the maximum value for a numeric field, useful for finding peak metrics.
- [min](/apl/aggregation-function/min): Returns only the minimum value of a numeric field without additional fields.
- [percentile](/apl/aggregation-function/percentile): Provides the value at a specific percentile of a numeric field.

Review comment: I think this mostly describes a max() aggregation and doesn't focus on the 'arg' part of it.
o1 makes it more clear imo:

Review comment: https://chatgpt.com/share/e/6780d15a-8794-800f-b066-a1414fd83a66