Problem
We recently refactored the RAG subsystem built into the agent implementation into an independent "memory" tool. The desire was to:
enhance modularity
allow this subsystem to be swapped out for other implementations
most importantly, allow this subsystem to be evaluated independently
However, this has created a few other downstream issues we need to solve.
There is confusion between this memory tool and the "Memory / MemoryBanks" APIs -- what goes where? Beyond the naming overlap, the division of responsibility is not at all clear.
The ergonomics of accessing this "special built-in" tool are poor, since client SDK calls look like client.tools_runtime.invoke_tool("builtin::memory", bag_of_arguments)
The goal of this issue is to propose a solution for this.
Proposed Solution
reframe our existing "Memory + MemoryBanks" APIs as "raw indexing" APIs. Specifically, we will have { vector_index, vector_io } and { keyvalue_index, keyvalue_io } APIs. The vector indices will accept raw chunks with metadata to be embedded; they will not do any intelligent chunking.
move the chunking functionality and semantics up into the builtin memory tool.
rename the memory tool to "builtin::rag_tool" to make the naming more straightforward.
introduce a special sugar API to make addressing this tool easier: client.tools_runtime.memory.insert_documents(documents, ...)
Examples
The user starts by registering a vector index with Llama Stack:
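A minimal sketch of what that registration might look like under the proposed { vector_index, vector_io } split. The LlamaStackClient import is the standard client, but vector_indexes.register and its parameters are illustrative, not an existing SDK surface:

```python
# Hypothetical sketch: registering a raw vector index under the proposed split.
# Method and parameter names are illustrative, not final.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# The index embeds and stores raw chunks exactly as given; it performs no chunking.
client.vector_indexes.register(
    index_id="my_documents",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)
```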
Ingesting documents via the RAG tool
This is the recommended approach, since it lets the RAG system ingest the documents automatically with minimal user configuration:
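A sketch using the sugar API proposed above, continuing with the client from the previous example. The document shape and the chunk_size_in_tokens knob are assumptions for illustration:

```python
# Hypothetical sketch of the proposed sugar API. The RAG tool owns chunking,
# so the user hands over whole documents plus a target index.
documents = [
    {
        "document_id": "doc-1",
        "content": "https://example.com/my-paper.pdf",
        "mime_type": "application/pdf",
        "metadata": {"source": "example"},
    },
]

client.tools_runtime.memory.insert_documents(
    documents=documents,
    index_id="my_documents",    # the vector index registered earlier
    chunk_size_in_tokens=512,   # illustrative knob; chunking lives in the tool
)
```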
Chunking yourself
If the user has a pre-existing ingestion system or prefers to do the chunking themselves, they can write pre-chunked data directly through the raw vector_io API:
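A sketch of that path; vector_io.insert and the chunk shape are again assumptions, and the word-window chunker is deliberately naive:

```python
# Hypothetical sketch: bypass the RAG tool and insert pre-made chunks directly.
# The naive word-window chunker below exists only for demonstration.
def chunk_text(text: str, max_words: int = 200) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

with open("my_document.txt") as f:
    text = f.read()

chunks = [
    {"content": c, "metadata": {"document_id": "doc-1", "chunk_index": i}}
    for i, c in enumerate(chunk_text(text))
]

# The index embeds and stores these chunks exactly as given -- no further chunking.
client.vector_io.insert(index_id="my_documents", chunks=chunks)
```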
Providing this tool to the agent
There are no changes in terms of how the tool is given to the agent. The rough structure stays the same:
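A sketch of that configuration; only the tool name changes from builtin::memory to builtin::rag_tool, and the exact config keys and agent-creation call here are illustrative:

```python
# Hypothetical sketch of attaching the renamed tool to an agent. The overall
# shape mirrors today's built-in tool configuration; field names are illustrative.
agent_config = {
    "model": "Llama3.1-8B-Instruct",
    "instructions": "You are a helpful assistant.",
    "toolgroups": [
        {
            "name": "builtin::rag_tool",              # was builtin::memory
            "args": {"index_ids": ["my_documents"]},  # illustrative argument
        }
    ],
}

agent = client.agents.create(agent_config=agent_config)
```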