LLM Documentation Explorer

Overview

The LLM Documentation Explorer is an application for browsing and querying NoRedInk's internal documentation. It uses a locally hosted open-source Large Language Model (LLM) with Retrieval Augmented Generation (RAG) to answer questions from the company knowledge base. The tool combines a core RAG pipeline, an API layer, and a frontend interface.

Technologies

Langchain: Manages the RAG pipeline.
Langserve: API management built on FastAPI.
Langsmith: Used for monitoring and evaluation.
Streamlit: Powers the frontend interface.
Ollama: Hosts and runs the LLM.
PostgreSQL with PGVector: Manages the remote vector database.

Architecture

Core RAG pipeline

The core RAG implementation allows for either local or remote operation. Local operation saves a FAISS vector store to a local file, running it in memory. Remote operation saves the vector database to a PostgreSQL database using the PGVector extension.
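To illustrate the local mode described above (persist to a file, search in memory), here is a minimal sketch using only the standard library. It is a toy stand-in, not FAISS and not the project's code: the `LocalVectorStore` class, the JSON file format, and the document ids are all hypothetical.

```python
import json
import math
import tempfile
from pathlib import Path


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LocalVectorStore:
    """Toy file-backed, in-memory vector store (hypothetical, not FAISS)."""

    def __init__(self, entries=None):
        # Each entry is {"id": str, "vector": [float, ...]}
        self.entries = entries or []

    def add(self, doc_id, vector):
        self.entries.append({"id": doc_id, "vector": list(vector)})

    def search(self, query_vector, k=1):
        """Return the ids of the k entries most similar to the query."""
        ranked = sorted(
            self.entries,
            key=lambda e: cosine(query_vector, e["vector"]),
            reverse=True,
        )
        return [e["id"] for e in ranked[:k]]

    def save(self, path):
        Path(path).write_text(json.dumps(self.entries))

    @classmethod
    def load(cls, path):
        return cls(json.loads(Path(path).read_text()))


def demo_roundtrip():
    """Save a store to disk, reload it, and search the reloaded copy."""
    store = LocalVectorStore()
    store.add("deploy-guide", [1.0, 0.0])      # hypothetical document ids
    store.add("oncall-runbook", [0.0, 1.0])
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "store.json"
        store.save(path)
        reloaded = LocalVectorStore.load(path)
    return reloaded.search([0.9, 0.1], k=1)
```

The remote mode would swap the file for a PostgreSQL table, while the search interface seen by the rest of the pipeline could stay the same.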

The implementation uses the Llama 3 8B model, run locally using Ollama.

Multiple optimizations have been implemented to maximise performance:

A parent document retrieval system is used: the knowledge base is kept in two copies, the child documents used for vector db retrieval and the parent documents sent to the LLM
The child documents embedded and stored in the vector db are preprocessed to optimize vector search, using the following steps:
- Noise removal: unhelpful symbols and whitespace are removed
- Case normalization: all text is converted to lowercase
- Lemmatization: words are reduced to their base forms for better matching
Retrieved documents are reranked to find the most appropriate matches
Final documents are summarised to provide only useful and contextually relevant information to the LLM
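The preprocessing steps and the parent/child split can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `naive_lemma` is a toy suffix stripper standing in for a real lemmatizer, token overlap stands in for vector search, and the document ids and texts are made up.

```python
import re

# Toy suffix rules standing in for a real lemmatizer (an assumption; the
# project may use a library such as NLTK or spaCy instead).
_SUFFIXES = ("ing", "ed", "es", "s")


def naive_lemma(word):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply noise removal, case normalization, and (toy) lemmatization."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # noise removal: drop symbols
    text = re.sub(r"\s+", " ", text).strip()     # noise removal: collapse whitespace
    text = text.lower()                          # case normalization
    return " ".join(naive_lemma(w) for w in text.split())


def build_children(parents):
    """Pair each preprocessed child text with the id of its parent."""
    return [(preprocess(text), parent_id) for parent_id, text in parents.items()]


def retrieve_parent(query, parents):
    """Match the query against child texts, then return the untouched parent.

    Token overlap is a stand-in for vector similarity search.
    """
    query_tokens = set(preprocess(query).split())
    _, best_parent_id = max(
        build_children(parents),
        key=lambda child: len(query_tokens & set(child[0].split())),
    )
    return parents[best_parent_id]


# Hypothetical knowledge base entries
PARENTS = {
    "fire-01": "Deploys FAILED: the Servers were restarted.",
    "fire-02": "Database migration timed out!",
}

print(preprocess("  Deploying **Servers**!!  "))   # deploy server
print(retrieve_parent("server restarts", PARENTS))
```

The point of the split is visible in the last call: matching happens on the normalized child text, but the LLM would receive the original, readable parent text.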

API

The API serves the following endpoints:

/query/*: Processes queries using the RAG pipeline to generate responses.
/search/*: Searches the vector database without LLM intervention.
/feedback: Collects user feedback on query responses.
/ingest: Initiates re-ingestion of knowledge base data.
/upload: Manages the uploading and ingestion of new datasets.

*Langserve automatically implements invoke, batch, stream, stream_log, and stream_events endpoint expansions for the query and search base endpoints (as well as their async versions).
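As a rough example of calling one of these expansions, the sketch below posts to /query/invoke using only the standard library. Langserve's invoke route takes a JSON body with an `input` field and returns the result under `output`; the base URL (uvicorn's default host and port) and the assumption that the chain accepts a plain string as input are guesses about this deployment, so check the schema in the playground first.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default host and port


def build_invoke_request(endpoint, question):
    """Build a POST request for a Langserve `invoke` route.

    Assumes the chain accepts a plain string as its input.
    """
    body = json.dumps({"input": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(endpoint, question, timeout=60):
    """Send the request and return the `output` field of the JSON response."""
    request = build_invoke_request(endpoint, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["output"]
```

With the backend running, `invoke("query", "...")` would go through the LLM, while `invoke("search", "...")` would only hit the vector database.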

NOTE: the /feedback, /ingest, and /upload endpoints have schemas that can be viewed and interacted with in the application playground at the /docs URL.

Frontend

The Streamlit frontend provides a straightforward interface for querying the knowledge base.

It also allows for the latest version of the knowledge base documents to be uploaded directly.

Running Locally

Install, set up, and run Ollama and Llama 3

# https://github.com/ollama/ollama?tab=readme-ov-file
brew install ollama
brew services start ollama
ollama pull llama3
OLLAMA_NUM_PARALLEL=2 OLLAMA_MAX_LOADED_MODELS=2 ollama serve  # Run with concurrency
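
To confirm that the server is up and the model was pulled, one option is to query Ollama's /api/tags endpoint, which lists locally available models. The sketch below assumes Ollama is listening on its default address; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # Ollama's default listen address


def model_names(tags_payload):
    """Extract model names from the JSON returned by GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def has_model(tags_payload, wanted="llama3"):
    """True if a model matches `wanted`, ignoring the tag suffix (e.g. ":latest")."""
    return any(name.split(":")[0] == wanted for name in model_names(tags_payload))


def fetch_tags(timeout=5):
    """Ask the running Ollama server for its locally available models."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=timeout) as response:
        return json.loads(response.read())
```

With the server running, `has_model(fetch_tags())` should be true once `ollama pull llama3` has finished.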

Download repo (if not in monorepo)
Install poetry and project dependencies

pip install poetry
poetry install

Set up the .env file

LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=llm-docs-explorer
LANGCHAIN_API_KEY=<key>

# Local values for these can be found by starting postgres: 
# $ aide setup-postgres
PG_HOST=127.0.0.1
PG_PORT=<port>
PG_USER=<user>
PG_PASSWORD=<password>
PG_DBNAME=llm_docs_explorer
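
For reference, the PG_* values above could be combined into a SQLAlchemy-style connection URL as sketched below. The `postgresql+psycopg` driver prefix and the function name are assumptions; the backend may assemble its connection differently.

```python
import os
from urllib.parse import quote_plus


def pg_connection_url(env=None):
    """Build a connection URL from the PG_* variables.

    The "postgresql+psycopg" driver prefix is an assumption.
    """
    env = os.environ if env is None else env
    user = quote_plus(env["PG_USER"])
    password = quote_plus(env["PG_PASSWORD"])  # escape characters such as "@"
    return (
        f"postgresql+psycopg://{user}:{password}"
        f"@{env['PG_HOST']}:{env['PG_PORT']}/{env['PG_DBNAME']}"
    )
```

Escaping the user and password matters because characters such as "@" or ":" would otherwise be parsed as URL delimiters.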

Ensure PGVector is installed locally

Apply the changes found in Micah's PR
Run CREATE EXTENSION IF NOT EXISTS vector; in postgres to verify the installation

Run server and client

# Backend
cd backend
poetry run uvicorn main:app --reload

# Frontend
cd frontend
poetry run streamlit run main.py

Use the frontend to upload documents for the knowledge base
- Go to the Engineering folder in Dropbox Paper
- Select the Fires folder
- Select "Export" from the bottom menu
- Select "Export as Markdown"
- Upload the .zip file using the frontend
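
Before uploading, it can help to check that the export actually contains Markdown files. The sketch below is an optional sanity check using the standard library; the file names in the sample archive are hypothetical and the backend's own ingestion logic may differ.

```python
import io
import zipfile


def markdown_members(zip_source):
    """Return the sorted names of .md files inside a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".md")
        )


def make_sample_export():
    """Build an in-memory zip resembling an export (hypothetical file names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Fires/2024-01-outage.md", "# Outage\n")
        archive.writestr("Fires/notes.txt", "not markdown")
    buffer.seek(0)
    return buffer
```

`markdown_members` accepts a path as well as a file-like object, so it can be pointed directly at the downloaded .zip file.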

mandla-noredink/LLM-Documentation-Exploration
