Problem
We recently refactored the RAG subsystem built into the agent implementation into an independent "memory" tool. The desire was to:
enhance modularity
allow this subsystem to be swapped out for other implementations
most importantly, allow this subsystem to be evaluated independently
However, this has created a few other downstream issues we need to solve.
There is confusion between this memory tool and the "Memory / MemoryBanks" APIs -- what goes where? Beyond the naming overlap, the division of responsibility is not at all clear.
The ergonomics of accessing this "special built-in" tool are poor, since client SDK calls look like client.tools_runtime.invoke_tool("builtin::memory", bag_of_arguments)
The goal of this issue is to propose a solution for this.
Proposed Solution
reframe our existing "Memory + MemoryBanks" APIs as "raw indexing" APIs. Specifically, we will have { vector_index, vector_io } and { keyvalue_index, keyvalue_io } APIs. The vector indices will accept raw chunks with metadata to be embedded; they will not do any intelligent chunking.
move the chunking functionality and semantics up into the builtin memory tool.
rename the memory tool to "builtin::rag_tool" to make the naming more straightforward.
introduce a special sugar API to make addressing this tool easier: client.tools_runtime.memory.insert_documents(documents, ...)
Examples
The user starts by registering a vector index with Llama Stack:
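A minimal sketch of what that registration might look like under the proposed { vector_index, vector_io } split. The LlamaStackClient import is the standard client, but vector_indexes.register and its parameters are illustrative, not an existing SDK surface:

```python
# Hypothetical sketch: registering a raw vector index under the proposed split.
# Method and parameter names are illustrative, not final.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# The index embeds and stores raw chunks exactly as given; it performs no chunking.
client.vector_indexes.register(
    index_id="my_documents",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)
```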
Ingesting documents via the RAG tool
This is the recommended approach, since it lets the RAG system ingest the documents automatically with minimal user configuration:
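A sketch using the sugar API proposed above, continuing with the client from the previous example. The document shape and the chunk_size_in_tokens knob are assumptions for illustration:

```python
# Hypothetical sketch of the proposed sugar API. The RAG tool owns chunking,
# so the user hands over whole documents plus a target index.
documents = [
    {
        "document_id": "doc-1",
        "content": "https://example.com/my-paper.pdf",
        "mime_type": "application/pdf",
        "metadata": {"source": "example"},
    },
]

client.tools_runtime.memory.insert_documents(
    documents=documents,
    index_id="my_documents",    # the vector index registered earlier
    chunk_size_in_tokens=512,   # illustrative knob; chunking lives in the tool
)
```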
Chunking yourself
If the user has a pre-existing ingestion system or prefers to do the chunking themselves, they can write pre-chunked data directly through the raw vector_io API:
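A sketch of that path; vector_io.insert and the chunk shape are again assumptions, and the word-window chunker is deliberately naive:

```python
# Hypothetical sketch: bypass the RAG tool and insert pre-made chunks directly.
# The naive word-window chunker below exists only for demonstration.
def chunk_text(text: str, max_words: int = 200) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

with open("my_document.txt") as f:
    text = f.read()

chunks = [
    {"content": c, "metadata": {"document_id": "doc-1", "chunk_index": i}}
    for i, c in enumerate(chunk_text(text))
]

# The index embeds and stores these chunks exactly as given -- no further chunking.
client.vector_io.insert(index_id="my_documents", chunks=chunks)
```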
Providing this tool to the agent
There are no changes in terms of how the tool is given to the agent. The rough structure stays the same:
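A sketch of that configuration; only the tool name changes from builtin::memory to builtin::rag_tool, and the exact config keys and agent-creation call here are illustrative:

```python
# Hypothetical sketch of attaching the renamed tool to an agent. The overall
# shape mirrors today's built-in tool configuration; field names are illustrative.
agent_config = {
    "model": "Llama3.1-8B-Instruct",
    "instructions": "You are a helpful assistant.",
    "toolgroups": [
        {
            "name": "builtin::rag_tool",              # was builtin::memory
            "args": {"index_ids": ["my_documents"]},  # illustrative argument
        }
    ],
}

agent = client.agents.create(agent_config=agent_config)
```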