Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(instrumentation): add fw metrics #187

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

GALLLASMILAN
Copy link
Contributor

Description

I added the new custom middleware, that collects module_usage statistics with data about os, version etc. This middleware is enabled by default and sends data about the (agent, llm, tool) entities.
Data are automatically sent via the open telemetry SDK that is part of the framework right now, but it's configured only for metrics. I use PeriodicMetric reader, but I need to push the metrics when each run ends. Otherwise, I would drop a lot of metrics because the runtime does not wait for the SDK pushing.
This default telemetry can be disabled via the BEE_FRAMEWORK_INSTRUMENTATION_METRICS_DISABLED environment variable. See the new ./docs/docs/native-telemetry.md file for more info.

Checklist

  • I have read the contributor guide
  • Linting passes: yarn lint or yarn lint:fix
  • Formatting is applied: yarn format or yarn format:fix
  • Unit tests pass: yarn test:unit
  • E2E tests pass: yarn test:e2e
  • Tests are included
  • Documentation is changed or added
  • Commit messages and PR title follow conventional commits

@GALLLASMILAN GALLLASMILAN requested a review from Tomas2D November 20, 2024 10:19
@GALLLASMILAN GALLLASMILAN force-pushed the telemetry-framework-metrics branch from 89064aa to 5de3232 Compare November 20, 2024 10:26
const metricExporter = new OTLPMetricExporter({
url: "https://bee-collector.apps.fmaas-backend.fmaas.res.ibm.com/v1/metrics",
});
Copy link
Contributor

@pilartomas pilartomas Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will likely need some better domain that is stable. @matoushavlena

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @GALLLASMILAN. He is going to lead the next actions required for this.

Copy link
Contributor Author

@GALLLASMILAN GALLLASMILAN Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will switch the URL to https://bee-telemetry.res.ibm.com/v1/metrics when it's prepared.

package.json Outdated
Comment on lines 168 to 172
"@opentelemetry/exporter-metrics-otlp-http": "^0.54.2",
"@opentelemetry/resources": "^1.27.0",
"@opentelemetry/sdk-metrics": "^1.27.0",
"@opentelemetry/sdk-node": "^0.54.2",
"@opentelemetry/semantic-conventions": "^1.27.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to specify the metrics deps directly? Following should work

import { metrics } from "@opentelemetry/sdk-node"

README.md Outdated
@@ -152,6 +152,10 @@ This project and everyone participating in it are governed by the [Code of Condu

All content in these repositories including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.

## Telemetry

Some metrics are collected by default. See the [Native Telemetry](./docs/native-telemetry.md) for more detail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use absolute path + run yarn docs:build afterward.

package.json Outdated
Comment on lines 150 to 156
"test:unit": "BEE_FRAMEWORK_INSTRUMENTATION_METRICS_DISABLED=true vitest run src",
"test:unit:watch": "BEE_FRAMEWORK_INSTRUMENTATION_METRICS_DISABLED=true vitest run src",
"test:e2e": "BEE_FRAMEWORK_INSTRUMENTATION_METRICS_DISABLED=true vitest run tests/e2e",
"test:e2e:watch": "BEE_FRAMEWORK_INSTRUMENTATION_METRICS_DISABLED=true vitest watch tests/e2e",
"test:all": "yarn test:unit && yarn test:e2e && yarn test:examples",
"test:examples": "vitest --config ./examples/vitest.examples.config.ts run tests/examples",
"test:examples:watch": "vitest --config ./examples/vitest.examples.config.ts watch tests/examples",
"test:examples": "BEE_FRAMEWORK_INSTRUMENTATION_METRICS_DISABLED=true vitest --config ./examples/vitest.examples.config.ts run tests/examples",
"test:examples:watch": "BEE_FRAMEWORK_INSTRUMENTATION_METRICS_DISABLED=true vitest --config ./examples/vitest.examples.config.ts watch tests/examples",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add such variable to .env.test (just create it, it is going to be automatically used: see tests/setup.ts)

@@ -18,6 +18,9 @@ import { parseEnv } from "@/internals/env.js";
import { z } from "zod";

export const INSTRUMENTATION_ENABLED = parseEnv.asBoolean("BEE_FRAMEWORK_INSTRUMENTATION_ENABLED");
export const INSTRUMENTATION_METRICS_ENABLED = !parseEnv.asBoolean(
"BEE_FRAMEWORK_INSTRUMENTATION_METRICS_DISABLED",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not switch the flag to _ENABLED and set the default to true?

// send metrics to the public collector
emitter.match(
(event) => event.path === `${basePath}.run.${finishEventName}`,
async () => await metricReader.forceFlush(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIP: metricReader.forceFlush.bind(metricReader)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, this inside forceFlush will always correctly refer to metricReader, so bind() is not required.

@GALLLASMILAN GALLLASMILAN force-pushed the telemetry-framework-metrics branch from 5de3232 to cbfaa9b Compare November 22, 2024 08:15
docs/native-telemetry.md Show resolved Hide resolved

We understand that not all users want to send telemetry data. You can easily disable this feature by setting an environment variable:

```bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there support for ```env ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen the ```env yet, I will remove the bash part and the markdown will highlight it correctly.

@GALLLASMILAN GALLLASMILAN requested a review from Tomas2D November 22, 2024 12:13
@mmurad2 mmurad2 added the blocked Waiting for an external factor to be completed. label Dec 16, 2024
@GALLLASMILAN GALLLASMILAN force-pushed the telemetry-framework-metrics branch from c8e45fb to af722fe Compare January 23, 2025 09:00
Signed-off-by: GALLLASMILAN <[email protected]>
@GALLLASMILAN GALLLASMILAN force-pushed the telemetry-framework-metrics branch from af722fe to 969a12a Compare January 23, 2025 09:06
@GALLLASMILAN GALLLASMILAN marked this pull request as ready for review January 23, 2025 09:31
@GALLLASMILAN GALLLASMILAN requested a review from a team as a code owner January 23, 2025 09:31
package.json Outdated
@@ -277,6 +279,7 @@
"@googleapis/customsearch": "^3.2.0",
"@grpc/grpc-js": "^1.12.4",
"@grpc/proto-loader": "^0.7.13",
"@ibm-generative-ai/node-sdk": "~3.2.3",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BAM is no longer supported

Comment on lines +25 to +27
const name = "bee-agent-framework";

export const activeTracesMap = new Map<string, string>();
export const meter = api.metrics.getMeter(name, Version);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assume name and version are taken from the OTEL SDK instance. Is that not the case?

Comment on lines 142 to 151
moduleUsageGauge.record(1, {
source: instance.constructor.name,
type: instance instanceof BaseAgent ? "agent" : instance instanceof Tool ? "tool" : "llm",
framework_version: Version,
os_type: os.type(),
os_release: os.release(),
os_arch: os.arch(),
trace_id: traceId,
event_id: eventId,
});
Copy link
Contributor

@pilartomas pilartomas Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is weird to see most of these labels,

os, fw version, trace are likely supplied by the SDK (not necessarily as a label but as OTEL resource or something like that)

trace_id and event_id are not a good fit for a metric

source and type are good but it signals this should be an incrementing counter not a gauge, it will also likely blow up our collector/backend.

Lastly, and this is likely the most important. I'd say we'd want a label that establishes an anonymous identity once framework is cloned (or first used). A random ID that's stable.

Comment on lines 57 to 58
const answer = response.messages.at(-1)?.content;
return { next: Workflow.END, update: { answer: answer?.toString() } };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the main branch this is not needed.

Comment on lines 48 to 49
const answer = response.messages.at(-1)?.content;
return { next: Workflow.END, update: { answer: answer?.toString() } };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

package.json Outdated Show resolved Hide resolved
throw new Error("Method not implemented.");
}
public readonly emitter = Emitter.root.child<BaseLLMEvents>({
namespace: ["bam", "llm"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update namespace please

@@ -116,3 +127,26 @@ export function buildTraceTree({
},
);
}

export const isMeasurementedInstance = (instance: any) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use named function instead

@@ -14,22 +14,22 @@
* limitations under the License.
*/

import { Attributes, SpanStatus, TimeInput } from "@opentelemetry/api";
import { api } from "@opentelemetry/sdk-node";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import type { api } ...

@@ -20,6 +20,9 @@ import * as R from "remeda";
import { extractClassName } from "@/serializer/utils.js";
import { SerializerError } from "@/serializer/error.js";
import { Cache } from "@/cache/decoratorCache.js";
import { startMetricNodeSdkReader } from "@/instrumentation/sdk.js";

startMetricNodeSdkReader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the start function is here in serializable module?

Copy link
Contributor Author

@GALLLASMILAN GALLLASMILAN Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each exported measurable module inherits from this class.

Signed-off-by: Milan Gallas <[email protected]>
Signed-off-by: Milan Gallas <[email protected]>
@GALLLASMILAN GALLLASMILAN requested a review from Tomas2D January 23, 2025 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Waiting for an external factor to be completed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants