codeably-io/jest-ai

jest-ai

Custom jest matchers for testing AI applications


The problem

Developing AI tools and applications requires a lot of manual testing and prompt tweaking. On top of that, for many developers the world of AI still feels like "uncharted land".

This solution

The jest-ai library provides a set of custom jest matchers that you can use to extend jest. They let you test the calls and responses of LLMs in a more familiar way.

Installation

This module is distributed via npm which is bundled with node and should be installed as one of your project's devDependencies:

npm install --save-dev jest-ai

or, if you use the yarn package manager:

yarn add --dev jest-ai

Usage

First things first: make sure OPENAI_API_KEY is set in your environment variables, as this library uses the OpenAI API to run the tests.

Import jest-ai once (for instance in your tests setup file) and you're good to go:

// In your own jest-setup.js (or any other name)
import "jest-ai";

// In jest.config.js, add the setup file (if you haven't already)
module.exports = {
  setupFilesAfterEnv: ["<rootDir>/jest-setup.js"],
};

With @jest/globals

If you are using @jest/globals with injectGlobals: false, you will need to use a different import in your tests setup file:

// In your own jest-setup.js (or any other name)
import "jest-ai/jest-globals";

With TypeScript

If you're using TypeScript, make sure your setup file is a .ts file and not a .js file, so that the necessary types are included.

You will also need to include your setup file in your tsconfig.json if you haven't already:

  // In tsconfig.json
  "include": [
    ...
    "./jest-setup.ts"
  ],

If TypeScript is not able to resolve the matcher methods, you can add the following to your tsconfig.json:

{
  "compilerOptions": {
    "types": ["jest", "jest-ai"]
  }
}

Custom matchers

toSemanticallyMatch

toSemanticallyMatch();

Checks whether the response from the AI matches or includes the expected response. The comparison is semantic, which means that "What is your age?" and "When were you born?" could both pass. This accommodates the natural, flexible wording of AI responses.

Examples

const response = await ai.getResponse("Hello");
// AI Response: "Hello, I am a chatbot set to help you with information for your flight. Can you please share your flight number with me?"
await expect(response).toSemanticallyMatch("What is your flight number?");

or

await expect("What is your surname?").toSemanticallyMatch(
  "What is your last name?"
);

⚠️ This matcher is async: use async/await when calling it. The library uses a cosine calculation to measure how similar the two strings are, so a range of phrasings can pass or fail. The threshold is currently set to 0.75.
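The cosine check mentioned above can be sketched as follows. This is a minimal illustration, not the library's actual code. It assumes each string has already been turned into a numeric embedding vector, and that a score at or above the threshold counts as a match. The function names and the toy vectors are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Threshold value taken from the note above.
const THRESHOLD = 0.75;

// Assumption: a score greater than or equal to the threshold passes.
function isSemanticMatch(embeddingA, embeddingB, threshold = THRESHOLD) {
  return cosineSimilarity(embeddingA, embeddingB) >= threshold;
}

// Toy vectors (made up for illustration, not real embeddings).
console.log(isSemanticMatch([1, 2, 3], [2, 4, 6])); // same direction
console.log(isSemanticMatch([1, 0], [0, 1])); // orthogonal
```

Because the score depends only on the angle between the vectors, two differently worded strings can still pass as long as their embeddings point in a similar direction.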


toSatisfyStatement

toSatisfyStatement();

This checks if the response from the AI satisfies a simple true or false statement. It uses a custom prompt and a separate chat completion to determine the truthiness of the statement. If the truthiness of the statement cannot be determined from the response, the assertion will fail.

Examples

const response = await ai.getResponse("Hello");
// AI Response: "Hello, I am a chatbot set to help you with information for your flight. Can you please share your flight number with me?"
await expect(response).toSatisfyStatement(
  "It contains a question asking for your flight number."
);

or

await expect("What is your surname?").toSatisfyStatement(
  "It asks for your last name."
);

⚠️ This matcher is async: use async/await when calling it. This assertion uses the OpenAI chat completion API, with the gpt-4-turbo model by default. As always, be aware of your API usage!


toHaveUsedSomeTools

toHaveUsedSomeTools();

Assert that a Chat Completion response requests the use of a particular tool.

Examples

const getResponse = async () =>
  await ai.getResponse("Will my KL1234 flight be delayed?");
await expect(getResponse).toHaveUsedSomeTools(["get_flight_status"]);
await expect(getResponse).toHaveUsedSomeTools([
  { name: "get_flight_status", arguments: "KL1234" },
]);

⚠️ This matcher is async: use async/await when calling it. This matcher uses the OpenAI chat completion API to check tool calls.


toHaveUsedSomeAssistantTools

toHaveUsedSomeAssistantTools();

Assert that an Assistants API Run response requests the use of a particular tool.

Examples

const assistant = await openai.beta.assistants.create({
  name: "Weather Reporter",
  instructions: "You are a reporter who answers questions on the weather.",
  tools: [getWeatherTool],
  model: "gpt-3.5-turbo-0125",
});

const thread = await openai.beta.threads.create();
await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "What is the weather in New York City?",
});

let run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
});

// Assert on just function name
await expect(run).toHaveUsedSomeAssistantTools(["getWeather"]);

// Assert on function name and arguments
await expect(run).toHaveUsedSomeAssistantTools([
  { name: "getWeather", arguments: "New York City" },
]);

⚠️ This matcher is async: use async/await when calling it. This matcher polls the OpenAI Run API to check for tool calls.


toHaveUsedAllTools

toHaveUsedAllTools();

Checks if all the tools given to the LLM were used. Will fail if any of the tools were not used.

Examples

const getResponse = async () =>
  await ai.getResponse("Will my KL1234 flight be delayed?");
await expect(getResponse).toHaveUsedAllTools([
  "get_flight_status",
  "get_flight_delay",
]);
await expect(getResponse).toHaveUsedAllTools([
  { name: "get_flight_status", arguments: "KL1234" },
  { name: "get_flight_delay", arguments: "KL1234" },
]);

⚠️ This matcher is async: use async/await when calling it. This matcher uses the OpenAI chat completion API to check tool calls.
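To make the difference between toHaveUsedSomeTools and toHaveUsedAllTools concrete, here is a small sketch of the two checks on tool names only. It is an illustration under assumptions, not the library's implementation: it assumes "some" means at least one expected tool was called and "all" means every expected tool was called. It leaves out argument matching, because this README does not specify how arguments are compared. The helper names are hypothetical.

```javascript
// A bare string is treated as shorthand for { name }.
function toName(expected) {
  return typeof expected === "string" ? expected : expected.name;
}

// Assumed "some" semantics: at least one expected tool was called.
function usedSomeTools(calledNames, expectedTools) {
  return expectedTools.some((tool) => calledNames.includes(toName(tool)));
}

// Assumed "all" semantics: every expected tool was called.
function usedAllTools(calledNames, expectedTools) {
  return expectedTools.every((tool) => calledNames.includes(toName(tool)));
}

// Hypothetical list of tool names that the model requested.
const calledTools = ["get_flight_status"];
const expectedTools = ["get_flight_status", "get_flight_delay"];

console.log(usedSomeTools(calledTools, expectedTools)); // one of two was called
console.log(usedAllTools(calledTools, expectedTools)); // get_flight_delay is missing
```

Under these assumptions, the "some" variant suits prompts where the model may pick one of several acceptable tools, while the "all" variant suits prompts that should trigger every listed tool.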


toHaveUsedAllAssistantTools

toHaveUsedAllAssistantTools();

Assert that an Assistants API Run response requests the use of all of the given tools.

Examples

const assistant = await openai.beta.assistants.create({
  name: "Weather Reporter",
  instructions: "You are a reporter who answers questions on the weather.",
  tools: [getWeatherTool],
  model: "gpt-3.5-turbo-0125",
});

const thread = await openai.beta.threads.create();
await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "What is the weather in New York City and in San Francisco?",
});

let run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
});
// Assert simply on function name
await expect(run).toHaveUsedAllAssistantTools(["getWeather"]);

// Assert on function name and arguments
await expect(run).toHaveUsedAllAssistantTools([
  { name: "getWeather", arguments: "New York City" },
  { name: "getWeather", arguments: "San Francisco" },
]);

⚠️ This matcher is async: use async/await when calling it. This matcher polls the OpenAI Run API to check for tool calls.


toMatchZodSchema

toMatchZodSchema();

Many times, we would like our LLMs to respond in a JSON format that's easier to work with later. This matcher allows us to check if the response from the LLM matches a given Zod schema.

Examples

const response = await ai.getResponse(`
    Name 3 animals, their height, and weight. Respond in the following JSON format:
    {
        "animals": [
            {
                "name": "Elephant",
                "height": "3m",
                "weight": "6000kg"
            }
        ]
    }
`);
const expectedSchema = z.object({
  animals: z.array(
    z.object({
      name: z.string(),
      height: z.string(),
      weight: z.string(),
    })
  ),
});
expect(response).toMatchZodSchema(expectedSchema);

LICENSE

MIT