ChatCompletionClient to support request caching #4752

Open
ekzhu opened this issue Dec 18, 2024 · 5 comments · May be fixed by #4924
ekzhu (Collaborator) commented Dec 18, 2024

Support client-side caching for any ChatCompletionClient type.

Simplest way to do it is to create a ChatCompletionCache type that implements the ChatCompletionClient protocol but wraps an existing client.

An example of how this might work:

from autogen_ext.stores.diskcache import DiskCacheStore
from autogen_ext.models.cache import ChatCompletionCache
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Cached client.
cached_client = ChatCompletionCache(OpenAIChatCompletionClient(model="gpt-4o"), store=DiskCacheStore())
@ekzhu ekzhu added this to the 0.4.1 milestone Dec 18, 2024
@ekzhu ekzhu assigned ekzhu and srjoglekar246 and unassigned ekzhu Jan 4, 2025
srjoglekar246 (Contributor) commented

Here's a basic idea I have, based on what we had in 0.2:

  1. We add AbstractStoreBase as the primary interface in autogen_core for cache stores.
  2. We implement InMemoryStore etc. in autogen_ext, along with a general Store factory (a minimal sketch of such a store follows the example below).
  3. This will allow the implementation of a Cached Client interface similar to what Eric mentioned above:
    from autogen_core.models import SystemMessage, UserMessage
    from autogen_ext.models.cache import ChatCompletionCache  # proposed above
    from autogen_ext.models.openai import OpenAIChatCompletionClient
    from autogen_ext.store.in_memory_store import InMemoryStore

    # Placeholder prompts for the example.
    SYSTEM_PROMPT = "You are a helpful assistant."
    USER_PROMPT = "What is the capital of France?"

    model_client = OpenAIChatCompletionClient(
        model="gpt-4o",
        api_key="...",
    )
    # Wrap the real client; repeated identical prompts are served from the store.
    model_client = ChatCompletionCache(model_client, InMemoryStore())

    print(f"Model info: {model_client.model_info}")
    print("\n")

    prompt_messages = [
        SystemMessage(content=SYSTEM_PROMPT),
        UserMessage(content=USER_PROMPT, source="user"),
    ]

    num_prompt_tokens = model_client.count_tokens(prompt_messages)
    print(f"Prompt tokens: {num_prompt_tokens}")

    result = await model_client.create(messages=prompt_messages)
    print(f"create output: {result.content}")

We can modify the Result instances returned by this class to have cached=True, etc.

As for the actual caching: since we use Pydantic models for the messages, we can encode the incoming prompt info as JSON and hash it to form the cache key.
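
As a sketch of that scheme (import paths assume autogen 0.4; the cached flag and the store's get/set interface are assumptions from this thread, not settled API):

    import hashlib

    from autogen_core.models import CreateResult, LLMMessage

    def cache_key(messages: list[LLMMessage]) -> str:
        # Each message is a Pydantic model, so model_dump_json() gives a
        # stable serialization that we can hash into a deterministic key.
        payload = "\x1e".join(m.model_dump_json() for m in messages)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    async def cached_create(client, store, messages: list[LLMMessage]) -> CreateResult:
        key = cache_key(messages)
        hit = store.get(key)
        if hit is not None:
            hit.cached = True  # assumes CreateResult grows a cached field, per above
            return hit
        result = await client.create(messages=messages)
        store.set(key, result)
        return result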

WDYT @ekzhu / @jackgerrits ?

ekzhu (Collaborator, Author) commented Jan 6, 2025

For the abstract interface, we can keep it super simple so that existing libraries like diskcache and redis can duck-type it, e.g., an interface with just set and get. Then there is no need to create another extension module just for this: a user can import redis and use it directly as an in-memory cache, so a separate in-memory store implementation is not needed.
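
Something like this would be enough (a sketch; CacheStore is an illustrative name, not an existing type):

    from typing import Any, Protocol

    class CacheStore(Protocol):
        """Anything with get/set duck-types as a store."""

        def get(self, key: str, default: Any = None) -> Any: ...
        def set(self, key: str, value: Any) -> Any: ...

    # Both of these already satisfy the shape, no adapter needed:
    #   import diskcache; store = diskcache.Cache()
    #   import redis; store = redis.Redis()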

rickyloynd-microsoft (Contributor) commented
On a related note: for cases where the user requires all responses to be pulled from the cache, such as quick regression tests, it could be useful to have the cached client throw an error (rather than calling the model client) for any prompt that is not found in the cache. This could be enabled by passing None as the model_client parameter. I've implemented a client wrapper that provides this caching and checking (plus numeric result checking) for my own regression tests, but it isn't a complete ChatCompletionClient replacement.
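
For instance, reusing the cache_key helper sketched above, the wrapper's create path might look roughly like this (the None convention and the error type are just one option):

    async def create(self, messages, **kwargs):
        key = cache_key(messages)
        hit = self._store.get(key)
        if hit is not None:
            return hit
        if self._client is None:
            # Cache-only mode (e.g. regression tests): a miss is an error,
            # not a reason to call the underlying model.
            raise LookupError(f"Prompt not found in cache: {key}")
        result = await self._client.create(messages=messages, **kwargs)
        self._store.set(key, result)
        return result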

srjoglekar246 (Contributor) commented Jan 6, 2025

@rickyloynd-microsoft Can you share a pointer/branch to your code, if possible?

> useful to have the cached client throw an error (rather than calling the model_client) for any prompt that is not found in the cache

Since the original client is passed during init (for other methods like model_info, etc.), perhaps this could instead be implemented as a kwarg on the create method?
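
For example (cache_only is a hypothetical parameter name):

    # Normal operation: fall through to the wrapped client on a miss.
    result = await cached_client.create(messages=prompt_messages)

    # Regression-test mode: raise on any cache miss instead of calling the model.
    result = await cached_client.create(messages=prompt_messages, cache_only=True)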

rickyloynd-microsoft (Contributor) commented

It will be in a PR soon.
