Improve ergonomics for the builtin "memory" (RAG) tool #827

Open
ashwinb opened this issue Jan 19, 2025 · 0 comments · May be fixed by #830 or #828
ashwinb commented Jan 19, 2025

Problem

We recently refactored the RAG subsystem built into the agent implementation into an independent "memory" tool. The desire was to:

  • enhance modularity
  • allow for swapping of this subsystem by other implementations
  • most importantly, allow for evaluating this subsystem independently

However, this has created a few other downstream issues we need to solve.

  • There's confusion between this memory tool and the "Memory / MemoryBanks" APIs -- what goes where? Beyond the naming overlap, the division of responsibilities is not at all clear.
  • Ergonomics for accessing this "special built-in" tool are poor, since the client SDK call looks like client.tools_runtime.invoke_tool("builtin::memory", bag_of_arguments) (see the sketch below)
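
For concreteness, a minimal sketch of what the current invocation looks like. The argument keys are hypothetical -- the lack of a typed surface is exactly the problem being described:

# Today: a generic invoke_tool() call with an untyped bag of arguments.
result = client.tools_runtime.invoke_tool(
    "builtin::memory",
    {
        "operation": "insert_documents",        # hypothetical key
        "documents": documents,                 # hypothetical key
        "memory_bank_id": "my_knowledge_base",  # hypothetical key
    },
)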

The goal of this issue is to propose a solution for this.

Proposed Solution

  • change our existing "Memory + MemoryBanks" APIs and reframe them as "raw indexing" APIs. Specifically, we will have { vector_index, vector_io } and { keyvalue_index, keyvalue_io } APIs. The vector indices will accept raw chunks with metadata to be embedded. They will not do any intelligent chunking. (A keyvalue sketch follows this list.)
  • move chunking functionality and semantics up to the builtin memory tool.
  • rename the memory tool to "builtin::rag_tool" to make the naming more straightforward.
  • introduce a special sugar API to make addressing this tool easier: client.tools_runtime.memory.insert_documents(documents, ...)
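
The examples below cover only the vector side. For the keyvalue APIs, a hypothetical sketch, assuming register/insert naming symmetric with the vector APIs (none of these names are specified in this issue):

# Hypothetical -- assumes the keyvalue surface mirrors the
# vector_indices / vector_io split proposed above.
client.keyvalue_indices.register(
    "my_kv_store",
    provider_id=...,
)
client.keyvalue_io.insert(
    "my_kv_store",
    key="doc-1",
    value={"title": "...", "body": "..."},
)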

Examples

The user starts by registering a vector index with Llama Stack:

client.vector_indices.register(
    "my_knowledge_base",
    embedding_model=...,
    # ...other params...
    provider_id="remote::weaviate",
)

Ingesting documents via the RAG tool

This is the recommended approach: it lets the RAG system ingest the documents automatically, with minimal user configuration:

client.tools_runtime.memory.insert_documents(
    documents,
    metadata,
    vector_indices=["my_knowledge_base"],
    # ... params for chunking, etc.
)
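
The shape of documents passed above is not pinned down in this issue; one plausible shape, purely for illustration:

# Hypothetical document shape -- field names are illustrative only.
documents = [
    {
        "document_id": "doc-1",
        "content": "Llama Stack is ...",
        "metadata": {"source": "docs/getting_started.md"},
    },
]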

Chunking yourself

If the user has a pre-existing ingestion system or prefers to do chunking themselves, they can insert the chunks directly:

chunks = get_chunks_from_documents(documents)
client.vector_io.insert_chunks(
    "my_knowledge_base",
    chunks,
    # embedding params
)
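
Here get_chunks_from_documents is user code, not part of the stack. A minimal sketch of what such a chunker might do, assuming the hypothetical document shape above and fixed-size character windows:

def get_chunks_from_documents(documents, chunk_size=512, overlap=64):
    """Split each document's text into overlapping fixed-size chunks.

    Purely illustrative: real pipelines typically chunk on token or
    sentence boundaries and attach richer metadata.
    """
    chunks = []
    step = chunk_size - overlap
    for doc in documents:
        text = doc["content"]
        for start in range(0, len(text), step):
            chunks.append(
                {
                    "content": text[start : start + chunk_size],
                    "metadata": {**doc.get("metadata", {}), "offset": start},
                }
            )
    return chunks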

Providing this tool to the agent

There are no changes in terms of how the tool is given to the agent. The rough structure stays the same:

agent_config = AgentConfig(
    tools=[
        ("builtin::rag_tool", dict(vector_indices=["my_knowledge_base"])),
    ]
)