LLM-RAC

This repo provides a library for building retrieval-augmented classification (RAC) systems, which pair embedding models with LLMs to improve text classification accuracy.

There are a few simple steps to building a RAC system:

Collect data of paired text and labels
Choose an embedding model
[Optional] Fine-tune the embedding model on your data

We provide logic for fine-tuning the model that derives positive and negative pairs from the provided data. Texts with the same class label are used as positive examples, and texts with different class labels are used as negative examples.
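
The pairing rule above can be sketched as follows. This is a minimal illustration, not the library's own training code: the function name `make_pairs` and the `(text_a, text_b, target)` output format are assumptions. It enumerates every pair, which grows quadratically with the dataset, so a real training loop would more likely sample pairs.

```python
from itertools import combinations
from typing import List, Tuple


def make_pairs(texts: List[str], labels: List[str]) -> List[Tuple[str, str, int]]:
    """Hypothetical helper: build (text_a, text_b, target) triples.

    target = 1 when both texts share a class label (positive pair),
    target = 0 when the labels differ (negative pair).
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    pairs = []
    # Enumerate every unordered pair of examples (O(n^2); sample in practice).
    for i, j in combinations(range(len(texts)), 2):
        target = 1 if labels[i] == labels[j] else 0
        pairs.append((texts[i], texts[j], target))
    return pairs


texts = ["refund please", "I want my money back", "app crashes on start"]
labels = ["billing", "billing", "bug"]
for a, b, target in make_pairs(texts, labels):
    print(target, a, "|", b)
```

Only the two "billing" texts form a positive pair here; both pairs involving the "bug" text are negatives.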

Build an ANN (approximate nearest neighbor) index by embedding all of the data, storing metadata such as the text and label alongside each vector.
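
A minimal sketch of an index that keeps metadata next to each vector is shown below. Two substitutions are assumed and labeled in the code. First, `toy_embed` is a hashed bag-of-words stand-in for a real embedding model. Second, `VectorIndex` performs exact brute-force cosine search instead of approximate search. A production system would swap in a real model and an ANN library, but the shape of the data (vector plus `{"text", "label"}` metadata) stays the same.

```python
import math
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Placeholder for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class VectorIndex:
    """Hypothetical index: exact search standing in for an ANN index."""
    vectors: List[List[float]] = field(default_factory=list)
    metadata: List[Dict[str, str]] = field(default_factory=list)

    def add(self, text: str, label: str) -> None:
        # Store the embedding and, at the same position, its metadata.
        self.vectors.append(toy_embed(text))
        self.metadata.append({"text": text, "label": label})

    def search(self, query: str, k: int = 5) -> List[Tuple[float, Dict[str, str]]]:
        q = toy_embed(query)
        # Vectors are unit-length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), meta)
            for v, meta in zip(self.vectors, self.metadata)
        ]
        # Highest similarity first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = VectorIndex()
for text, label in [
    ("refund please", "billing"),
    ("I want my money back", "billing"),
    ("app crashes on start", "bug"),
]:
    index.add(text, label)

for score, meta in index.search("refund please", k=2):
    print(f"{score:.2f}", meta["label"], meta["text"])
```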

At this point, we can already build a simple RAC system from just the embedding model and the ANN index: for a given query, retrieve the top-K most similar examples and take a majority vote over their labels. This is a reasonable baseline, but we can do better.
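
The majority-vote baseline reduces to a few lines once the top-K neighbors have been retrieved. In this sketch the input is assumed to be a list of `(similarity, label)` pairs, and the tie-breaking rule (prefer the label with the larger summed similarity) is an assumption, since a plain vote leaves ties undefined.

```python
from collections import Counter, defaultdict
from typing import List, Tuple


def majority_vote(neighbors: List[Tuple[float, str]]) -> str:
    """Predict a label from the top-K retrieved (similarity, label) pairs.

    Vote counts decide first; ties are broken by summed similarity (an assumption).
    """
    if not neighbors:
        raise ValueError("need at least one neighbor")
    counts = Counter(label for _, label in neighbors)
    total_sim = defaultdict(float)
    for score, label in neighbors:
        total_sim[label] += score
    return max(counts, key=lambda label: (counts[label], total_sim[label]))


print(majority_vote([(0.91, "billing"), (0.88, "bug"), (0.85, "billing")]))
```

Here "billing" wins two votes to one. With a tie such as `[(0.9, "bug"), (0.7, "billing")]`, the more similar neighbor's label, "bug", is returned.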

Instead, we use the top-K retrieved text/label pairs as in-context learning examples in the prompt for an LLM, which is instructed to predict the label for the query.
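
One way to turn retrieved examples into an in-context prompt is sketched below. The prompt wording, the `Text:`/`Label:` layout, and the helper names `build_prompt` and `parse_label` are illustrative assumptions, not the library's actual template. The LLM call itself is omitted because it depends on whichever model client you use.

```python
from typing import Optional, Sequence, Tuple


def build_prompt(query: str, examples: Sequence[Tuple[str, str]],
                 label_set: Sequence[str]) -> str:
    """Format retrieved (text, label) pairs as few-shot examples, ending with the query."""
    lines = [
        "Classify the text into one of these labels: " + ", ".join(label_set) + ".",
        "Here are similar labeled examples:",
        "",
    ]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query goes last with an empty label slot for the LLM to complete.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(response: str, label_set: Sequence[str]) -> Optional[str]:
    """Map a raw LLM response onto a known label; None if it matches none."""
    normalized = response.strip().rstrip(".").lower()
    for label in label_set:
        if normalized == label.lower():
            return label
    return None


prompt = build_prompt(
    "charged twice this month",
    [("refund please", "billing"), ("app crashes on start", "bug")],
    ["billing", "bug"],
)
print(prompt)
```

Restricting the parsed output to the known label set makes it easy to detect responses that fall outside the allowed labels, for example to fall back to the majority-vote prediction.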

We provide a notebook with examples of how to use this library, as well as two scripts, build_rac.py and eval.py, to build the embedding index and evaluate the RAC system.

ManavR123/llm_rac