Commit
Merge pull request #10 from Bessouat40/refacto
Refacto
Bessouat40 authored Dec 28, 2024
2 parents ba4bba0 + a53f335 commit 42540a4
Showing 21 changed files with 77 additions and 77 deletions.
9 changes: 8 additions & 1 deletion .gitignore
@@ -4,4 +4,11 @@ chroma*/
 **/__pycache__/
 .env
 .DS_Store
-api/
+api/
+build/
+setup.py
+MANIFEST.in
+pyproject.toml
+src/raglight.egg-info/
+dist/
+build/
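The hunk above adds Python packaging artifacts (`build/`, `dist/`, `src/raglight.egg-info/`, etc.) to `.gitignore`; note that `build/` appears twice in the new list, which is harmless but redundant. A quick way to verify rules like these is `git check-ignore`. The sketch below recreates the new rules in a throwaway repository, so paths and markers are illustrative; in the real repo you would run only the last command from its root.

```bash
# Recreate the new ignore rules in a throwaway repo and verify them with
# `git check-ignore` (-v also prints which .gitignore line matched).
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
printf '%s\n' 'api/' 'build/' 'setup.py' 'MANIFEST.in' 'pyproject.toml' 'src/raglight.egg-info/' 'dist/' > .gitignore
mkdir -p build dist src/raglight.egg-info
touch setup.py MANIFEST.in pyproject.toml
git check-ignore -v build/ dist/ setup.py src/raglight.egg-info/
```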
106 changes: 49 additions & 57 deletions README.md
@@ -1,99 +1,89 @@
 # RAGLight
 
-**RAGLight** is a lightweight and modular framework for implementing **Retrieval-Augmented Generation (RAG)**. It enhances the capabilities of Large Language Models (LLMs) by combining document retrieval with natural language inference.
+[![PyPI version](https://badge.fury.io/py/raglight.svg)](https://badge.fury.io/py/raglight)
 
-Designed for simplicity and flexibility, RAGLight leverages **Ollama** for LLM interaction and vector embeddings for efficient document similarity searches, making it an ideal tool for building context-aware AI solutions. ✨
+**RAGLight** is a lightweight and modular Python library for implementing **Retrieval-Augmented Generation (RAG)**. It enhances the capabilities of Large Language Models (LLMs) by combining document retrieval with natural language inference.
+
+Designed for simplicity and flexibility, RAGLight provides modular components to easily integrate various LLMs, embeddings, and vector stores, making it an ideal tool for building context-aware AI solutions. ✨
 
 ---
 
-## Features
+## Features 🔥
 
-- 🌐 **Embeddings Model**: Uses `all-MiniLM-L6-v2` for creating compact and efficient vector embeddings, ideal for document similarity searches.
-- 🧙🏽 **LLM Integration**: Employs `llama3` for natural language inference, enabling human-like and context-aware responses.
-- ⚖️ **RAG Pipeline**: Seamlessly integrates document retrieval with natural language generation into a unified workflow.
-- 🖋️ **PDF Support**: Supports ingestion and indexing of PDF files for easy querying.
+- 🌐 **Embeddings Model Integration**: Plug in your preferred embedding models (e.g., HuggingFace `all-MiniLM-L6-v2`) for compact and efficient vector embeddings.
+- 🧙🏽 **LLM Agnostic**: Seamlessly integrates with different LLMs, such as `llama3` or custom providers, for natural language inference.
+- ⚖️ **RAG Pipeline**: Combines document retrieval and language generation in a unified workflow.
+- 🖋️ **Flexible Document Support**: Ingest and index various document types (e.g., PDF, TXT, DOCX).
+- 🛠️ **Extensible Architecture**: Easily swap vector stores, embedding models, or LLMs to suit your needs.
 
 ---
 
-## Prerequisites
+## Installation 🛠️
 
-Before you get started, make sure you have the following:
+Install RAGLight directly from PyPI:
 
-- **Python**: Version >= 3.11
-- **Ollama Client**: Ensure you have a properly configured Ollama client. You may need an API key or a local Ollama server instance.
-- **Python Dependencies**: See the Installation section below.
+```bash
+pip install raglight
+```
 
 ---
 
-## Installation
+## Quick Start 🚀
 
-### 1. Clone the Repository
+### **1. Configure Your Pipeline**
 
-```bash
-git clone https://github.com/Bessouat40/rag-example.git
-cd rag-example
-```
+Set up the components of your RAG pipeline:
 
-### 2. Install Dependencies
+```python
+from raglight.rag.builder import Builder
+from src.raglight.config.settings import Settings
 
-```bash
-pip install -r requirements.txt
+rag = Builder() \
+.with_embeddings(Settings.HUGGINGFACE, model_name=model_embeddings) \
+.with_vector_store(Settings.CHROMA, persist_directory=persist_directory, collection_name=collection_name) \
+.with_llm(Settings.OLLAMA, model_name=model_name, system_prompt_file=system_prompt_directory) \
+.build_rag()
 ```
 
-### 3. Configure Environment Variables
+### **2. Ingest Documents**
 
-```bash
-mv .env.example .env
+Use the pipeline to ingest documents into the vector store:
+
+```python
+rag.vector_store.ingest(file_extension='**/*.pdf', data_path='./data')
 ```
 
-Then fill in the `.env` file with the necessary configuration:
+### **3. Query the Pipeline**
 
-```bash
-# Example configuration
-OLLAMA_CLIENT=<URL or key for the Ollama client>
-PERSIST_DIRECTORY=<Path to store inference data>
-PERSIST_DIRECTORY_INGESTION=<Path to store ingestion data>
-MODEL_EMBEDDINGS=all-MiniLM-L6-v2
-MODEL_NAME=llama3
-SYSTEM_PROMPT_DIRECTORY=<Path to the system prompt file>
-COLLECTION_NAME=<Collection name for inference>
-COLLECTION_NAME_INGESTION=<Collection name for ingestion>
-DATA_PATH=./data
+Retrieve and generate answers using the RAG pipeline:
+
+```python
+response = rag.question_graph("How can I optimize my marathon training?")
+print(response)
 ```
 
 ---
 
-## Document Ingestion
+## Advanced Configuration ⚙️
 
-To ingest your files (currently only PDF files are supported), place them in the `data` folder or the path specified by the `DATA_PATH` variable in the `.env` file.
+### Environment Variables
 
-Run the following script to index the documents:
+Configure the pipeline with environment variables for better modularity:
 
 ```bash
-python ingestion_example.py
+export PERSIST_DIRECTORY=./vectorstore
+export MODEL_EMBEDDINGS=all-MiniLM-L6-v2
+export MODEL_NAME=llama3
 ```
 
-This script:
-
-- ⏳ Loads the embeddings model specified in `.env`.
-- 🎮 Uses the `VectorStore` (Chroma) to index the documents.
-- 🔐 Creates a persistent index in the directory defined by `PERSIST_DIRECTORY_INGESTION`.
-
----
-
-## Query the Model (RAG Pipeline)
-
-To query the RAG pipeline, use the following script:
+You can also define these in a `.env` file:
 
 ```bash
-python rag_example.py
+PERSIST_DIRECTORY=./vectorstore
+MODEL_EMBEDDINGS=all-MiniLM-L6-v2
+MODEL_NAME=llama3
 ```
 
-The pipeline:
-
-- 🔍 Retrieves the most relevant documents using the vector model.
-- 🤖 Uses the `llama3` model to generate a response based on the retrieved context.
-
 ---
 
 ## TODO
@@ -103,3 +93,5 @@ The pipeline:
 - [ ] **Feature**: Integrate new LLM providers (e.g., VLLM, HuggingFace, GPT-Neo).
 
 ---
+
+🚀 **Get started with RAGLight today and build smarter, context-aware AI solutions!**
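The retrieve-then-generate flow described in the new Quick Start can be pictured with a dependency-free toy: a bag-of-words "embedding" and a brute-force similarity search stand in for `all-MiniLM-L6-v2` and the Chroma vector store, and a real LLM would consume the retrieved context. Everything below is illustrative, not RAGLight's API.

```python
# Toy retrieve-then-generate sketch: embed, retrieve most similar, hand off.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a sentence-embedding model: token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "interval training improves marathon pace",
    "chroma is a vector store for embeddings",
]
query = "how to improve my marathon training"

# Stand-in for the vector store: brute-force nearest document.
best = max(docs, key=lambda d: cosine(embed(query), embed(d)))
print(best)  # -> interval training improves marathon pace
```

In the real pipeline, `rag.question_graph(...)` would then pass the retrieved context to the configured LLM instead of just printing the document.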
4 changes: 2 additions & 2 deletions discussion_example.py
@@ -1,5 +1,5 @@
-from src.rag.builder import Builder
-from src.config.settings import Settings
+from src.raglight.rag.builder import Builder
+from src.raglight.config.settings import Settings
 from dotenv import load_dotenv
 import os
 
4 changes: 2 additions & 2 deletions ingestion_example.py
@@ -1,5 +1,5 @@
-from src.rag.builder import Builder
-from src.config.settings import Settings
+from src.raglight.rag.builder import Builder
+from src.raglight.config.settings import Settings
 from dotenv import load_dotenv
 import os
 
4 changes: 2 additions & 2 deletions rag_example.py
@@ -1,5 +1,5 @@
-from src.rag.builder import Builder
-from src.config.settings import Settings
+from src.raglight.rag.builder import Builder
+from src.raglight.config.settings import Settings
 from dotenv import load_dotenv
 import os
 
14 changes: 1 addition & 13 deletions src/__init__.py
@@ -1,13 +1 @@
-from .vectorestore.vectorStore import VectorStore
-from .vectorestore.chroma import ChromaVS
-
-from .embeddings.embeddingsModel import EmbeddingsModel
-
-from .llm.llm import LLM
-from .embeddings.huggingfaceEmbeddings import HuggingfaceEmbeddings
-from .llm.ollamaModel import OllamaModel
-
-from .rag.rag import RAG
-from .rag.builder import Builder
-
-from .config.settings import Settings
+from .raglight import *
13 changes: 13 additions & 0 deletions src/raglight/__init__.py
@@ -0,0 +1,13 @@
+from .vectorestore.vectorStore import VectorStore
+from .vectorestore.chroma import ChromaVS
+
+from .embeddings.embeddingsModel import EmbeddingsModel
+
+from .llm.llm import LLM
+from .embeddings.huggingfaceEmbeddings import HuggingfaceEmbeddings
+from .llm.ollamaModel import OllamaModel
+
+from .rag.rag import RAG
+from .rag.builder import Builder
+
+from .config.settings import Settings
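The refactor above slims `src/__init__.py` down to `from .raglight import *`, so existing `src.*` imports keep resolving while the public names now live in the `src.raglight` package. Below is a minimal, self-contained sketch of that re-export pattern, using in-memory stand-in modules; the class names are placeholders, not the real RAGLight implementations.

```python
# Sketch of the re-export pattern: a parent package wildcard-imports its
# subpackage so old import paths survive a move. Built with in-memory
# modules purely for illustration.
import sys
import types

# Stand-in for the new src/raglight/__init__.py: it owns the public names.
raglight_pkg = types.ModuleType("src.raglight")
raglight_pkg.Builder = type("Builder", (), {})
raglight_pkg.Settings = type("Settings", (), {})

# Stand-in for the slimmed src/__init__.py: `from .raglight import *`.
src_pkg = types.ModuleType("src")
for name in ("Builder", "Settings"):
    setattr(src_pkg, name, getattr(raglight_pkg, name))

sys.modules["src"] = src_pkg
sys.modules["src.raglight"] = raglight_pkg

# Both the old top-level path and the new subpackage path resolve.
from src import Builder
from src.raglight import Settings
print(Builder.__name__, Settings.__name__)  # -> Builder Settings
```

This is why the three example scripts only needed their imports updated from `src.rag.*`/`src.config.*` to `src.raglight.*` rather than any behavioral change.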
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
