
New UI for Observe #22

GALLLASMILAN commented Jan 2, 2025

Observe UI

Current state

We currently use MLflow as the UI tool for traces. This approach has several limitations, which I describe below.
The motivation is to adopt software that eliminates these limitations.


MLflow limitations

A few things limit us right now.

  • We have to follow the external route format of the MLflow API = when we upload traces to MLflow, we have to use the MLflow API, which does not accept the OpenTelemetry format directly, so we use custom logic to parse it. The API is still experimental, which adds more complexity to keeping Observe up to date with MLflow. (An illustrative sketch of this remapping follows the list.)
  • MLflow is not scalable = you can deploy more pods, but it has no effect on performance, and when we use COS and a database to save data, we hit timeout problems.
  • The MLflow UI is not user friendly = you cannot easily find a specific trace by id, and the UI lacks smart sorting and filtering. The tabs are also confusing, so users cannot easily find the traces page.
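Illustrative only: the kind of per-span remapping hinted at in the first bullet above. The OTLP field names are real OpenTelemetry JSON keys, but the output shape is hypothetical, since MLflow's experimental trace API defines its own request body.

def otel_span_to_mlflow(span: dict) -> dict:
    # Target field names are hypothetical; MLflow expects its own payload,
    # not the raw OTLP span.
    return {
        "name": span["name"],
        "start_time_ns": span["startTimeUnixNano"],
        "end_time_ns": span["endTimeUnixNano"],
        "attributes": {attr["key"]: attr["value"] for attr in span.get("attributes", [])},
    }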

UI solutions

The evaluation system is a good business opportunity

bee-ui included ⛔

This solution would create a dependency between the framework and bee-ui, and it is needlessly complicated.

New UI app ✅

This is the better solution from my point of view. We can create a very simple app that will be part of bee-stack and bee-agent-framework-starter.
The application will have only one dependency: the bee-observer (API server).

!! The scope is important !!

Features

Trace list

A paginated table of traces with basic information about each one (a hypothetical API query sketch follows the list).

  • The table will be sortable.
  • A search input will filter traces by id.
  • There will be an option to show only errored traces.
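A hypothetical sketch of how the new UI could query bee-observer for this list; the base URL, route, and parameter names are assumptions, not a settled API.

import requests

OBSERVE_URL = "http://localhost:4002"  # hypothetical bee-observer base URL

resp = requests.get(
    f"{OBSERVE_URL}/v1/traces",
    params={
        "limit": 50,                    # page size
        "offset": 0,                    # pagination offset
        "sort": "startTime:desc",       # sortable table
        "search": "trace-id-fragment",  # filter by trace id
        "onlyErrors": "true",           # show only errored traces
    },
)
traces = resp.json()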

Trace detail

The detail page shows the dependency tree plus selected data that are important to us for quick debugging.
The selected data for the trace execution:

  • token count
  • execution time

The selected data for each iteration (a sketch of this shape follows the list):

  • raw prompt
  • token count
  • execution time
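A small sketch of the shape this picked data could take on the trace detail page; the dataclasses and field names are illustrative, not a defined schema.

from dataclasses import dataclass, field

@dataclass
class IterationDetail:
    raw_prompt: str           # the raw prompt sent in this iteration
    token_count: int
    execution_time_ms: float

@dataclass
class TraceDetail:
    trace_id: str
    token_count: int          # total tokens for the whole trace execution
    execution_time_ms: float
    iterations: list[IterationDetail] = field(default_factory=list)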

Features v2

Features V3 (Evaluation)

A simple version of evaluation, without datasets and runs.

When Observe accepts a trace and saves it to the database, it enqueues a BullMQ evaluation job.
The Python service accepts the traceId from BullMQ and calls inference to get the evaluation metrics. The service then saves the evaluation metrics to the trace entity (see the worker sketch after the implementation list below).

Implementation:

  • [UI] The evaluation metrics pages => only the simple judge type without the expected answer. (Only a static list; it could be hardcoded for the first evaluation version.)
  • [Observe] = Add a PATCH /v1/traces/${traceId} route that will accept the list of evaluation metrics. TODO: specify the format
  • [Observe] = Create a queue for the evaluation job. This job will be triggered automatically when a trace is created.
  • [Python-eval] = Update the evaluation service to work with BullMQ. @jezekra1 TODO: what inference will we use (try to avoid the bee-api dependency)
  • [Infrastructure] = Add Python-eval to bee-agent-framework-starter and bee-stack
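A minimal sketch of the Python-eval worker described above, assuming the bullmq Python package and a local Redis. The PATCH route comes from the implementation list; the queue name, base URL, payload shape, and run_evaluation helper are hypothetical (the metric format is still a TODO).

import asyncio
import requests
from bullmq import Worker

OBSERVE_URL = "http://localhost:4002"  # hypothetical bee-observer base URL

def run_evaluation(trace_id: str) -> list[dict]:
    # Hypothetical helper: call the chosen inference backend and compute
    # judge-style metrics for the given trace.
    return [{"name": "helpfulness", "value": 0.87}]

async def process(job, job_token):
    # Observe enqueues the job with the traceId when a trace is created.
    trace_id = job.data["traceId"]
    metrics = run_evaluation(trace_id)
    # Write the metrics back to the trace entity; the payload format is still TODO.
    requests.patch(f"{OBSERVE_URL}/v1/traces/{trace_id}", json={"metrics": metrics})

async def main():
    worker = Worker("evaluation", process, {"connection": "redis://localhost:6379"})
    # Keep the worker process alive; shutdown handling omitted for brevity.
    while True:
        await asyncio.sleep(1)

if __name__ == "__main__":
    asyncio.run(main())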

Evaluation (part 2)

  • Datasets in Observe
  • Run entity in Observe
  • Compare functionality

Inspiration and opportunities

Based on this analysis, I summarize what the right way forward would be for us.

  • A well-arranged UI for traces with the important information picked out. = All the tools I analyzed have a partly messy UI for tracing.
  • A good evaluation UI = only Langfuse has good UI integration. There is a big opportunity for us.
  • A good open-source solution = when you want to try tools like AgentOps and Langtrace, you are navigated to the hosted app. That is great for first use, but when you develop locally, an easily runnable Docker image is a big advantage. Our goal could be to simplify the Docker image so that it runs in one command without other dependencies.

Let's simplify Observe so that it can also run without Redis and MongoDB.

  • 👎 A more universal solution - we should stay connected to our framework and not build a universal observability tool. When we support only the defined data format, we can work with the data more efficiently and visualize it in a well-arranged way for the user.

GALLLASMILAN commented Jan 13, 2025

Crew AI Observability

AgentOps.ai

crewAI - agentops-observability => AgentOps.ai => (repo), default UI data, Node SDK. Configured via the AGENTOPS_API_KEY env variable, and the dependency pip install 'crewai[agentops]' must be installed. Then:

import agentops
agentops.init()

cons:

  • DOES NOT HAVE EVALUATION FN
  • the UI is not intuitive
  • The tree detail on the trace detail page (session record) is a joke.

pros:

  • The trace detail page header with picked information is cool.

Langtrace

Agent Monitoring with Langtrace => Langtrace, (repo), Evaluation docs page = but the UI is very naive.

from langtrace_python_sdk import langtrace
langtrace.init(api_key='<LANGTRACE_API_KEY>')

cons:

  • The evaluation and observability are not implemented in the native crewAI stack but in the external Langtrace tool.
  • 👎 The user cannot run the evaluation in the UI directly. They are redirected to the evaluation page with instructions on how to use the command line to create an evaluation.
  • The docs page does not have valid instructions.
  • The trace page is not clean. The default page contains only irrelevant information and the table is very messy. The user needs to click several times to see anything useful.
  • Bugs in UI (e.g. datasets)

pros:

OpenLIT

  • Agent Monitoring with OpenLIT => OpenLIT, evaluation docs = it looks like a good tool for tracing in general, but not for an LLM solution like ours. The trace (request) detail is very simple. The product core is not very useful. It has functions like prompt management and secret management, but they add no extra value for the base telemetry use cases. It looks like an open-source tool made from a custom in-house tool 😄

They have an auto-evaluation task. But it is not implemented yet.

portkey

Does not have evaluation; it provides only some predefined guardrails.

Some base guardrails are implemented in the UI (see docs), but for custom ones they only provide a webhook solution.

This is the only library that is configured in the LLM provider class (a hedged sketch of what that looks like follows).
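To illustrate being set in the LLM provider class: a sketch of pointing an OpenAI-compatible client at the Portkey gateway. The gateway URL and header names are assumptions based on Portkey's docs, not something verified in this setup.

from openai import OpenAI

# Assumed Portkey gateway endpoint and headers; substitute real keys.
client = OpenAI(
    api_key="<PROVIDER_API_KEY>",
    base_url="https://api.portkey.ai/v1",
    default_headers={
        "x-portkey-api-key": "<PORTKEY_API_KEY>",
        "x-portkey-provider": "openai",
    },
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)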

------------------------ bonus -------------------------------------

Langfuse

template => config (join traces and templates)

cons:

  • The default trace table contains a lot of empty columns. They should simplify the trace list and show only the important columns by default.
  • I cannot filter traces by user prompt.

pros:

  • very nice trace detail
  • session for trace grouping

@jezekra1

I created the evaluation-observe integration proposal as a separate issue:

https://github.com/i-am-bee/internal/issues/90
