Commit
Merge pull request #10 from Bessouat40/refacto
Refacto
Bessouat40 authored Dec 28, 2024
2 parents ba4bba0 + a53f335 commit 42540a4
Showing 21 changed files with 77 additions and 77 deletions.
9 changes: 8 additions & 1 deletion .gitignore
@@ -4,4 +4,11 @@ chroma*/
 **/__pycache__/
 .env
 .DS_Store
-api/
+api/
+build/
+setup.py
+MANIFEST.in
+pyproject.toml
+src/raglight.egg-info/
+dist/
+build/
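The hunk above adds Python packaging artifacts (`build/`, `dist/`, `src/raglight.egg-info/`, etc.) to `.gitignore`; note that `build/` appears twice in the new list, which is harmless but redundant. A quick way to verify rules like these is `git check-ignore`. The sketch below recreates the new rules in a throwaway repository, so paths and markers are illustrative; in the real repo you would run only the last command from its root.

```bash
# Recreate the new ignore rules in a throwaway repo and verify them with
# `git check-ignore` (-v also prints which .gitignore line matched).
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
printf '%s\n' 'api/' 'build/' 'setup.py' 'MANIFEST.in' 'pyproject.toml' 'src/raglight.egg-info/' 'dist/' > .gitignore
mkdir -p build dist src/raglight.egg-info
touch setup.py MANIFEST.in pyproject.toml
git check-ignore -v build/ dist/ setup.py src/raglight.egg-info/
```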
106 changes: 49 additions & 57 deletions README.md
@@ -1,99 +1,89 @@
 # RAGLight
 
-**RAGLight** is a lightweight and modular framework for implementing **Retrieval-Augmented Generation (RAG)**. It enhances the capabilities of Large Language Models (LLMs) by combining document retrieval with natural language inference.
+[![PyPI version](https://badge.fury.io/py/raglight.svg)](https://badge.fury.io/py/raglight)
 
-Designed for simplicity and flexibility, RAGLight leverages **Ollama** for LLM interaction and vector embeddings for efficient document similarity searches, making it an ideal tool for building context-aware AI solutions. ✨
+**RAGLight** is a lightweight and modular Python library for implementing **Retrieval-Augmented Generation (RAG)**. It enhances the capabilities of Large Language Models (LLMs) by combining document retrieval with natural language inference.
+
+Designed for simplicity and flexibility, RAGLight provides modular components to easily integrate various LLMs, embeddings, and vector stores, making it an ideal tool for building context-aware AI solutions. ✨
 
 ---
 
-## Features
+## Features 🔥
 
-- 🌐 **Embeddings Model**: Uses `all-MiniLM-L6-v2` for creating compact and efficient vector embeddings, ideal for document similarity searches.
-- 🧙🏽 **LLM Integration**: Employs `llama3` for natural language inference, enabling human-like and context-aware responses.
-- ⚖️ **RAG Pipeline**: Seamlessly integrates document retrieval with natural language generation into a unified workflow.
-- 🖋️ **PDF Support**: Supports ingestion and indexing of PDF files for easy querying.
+- 🌐 **Embeddings Model Integration**: Plug in your preferred embedding models (e.g., HuggingFace `all-MiniLM-L6-v2`) for compact and efficient vector embeddings.
+- 🧙🏽 **LLM Agnostic**: Seamlessly integrates with different LLMs, such as `llama3` or custom providers, for natural language inference.
+- ⚖️ **RAG Pipeline**: Combines document retrieval and language generation in a unified workflow.
+- 🖋️ **Flexible Document Support**: Ingest and index various document types (e.g., PDF, TXT, DOCX).
+- 🛠️ **Extensible Architecture**: Easily swap vector stores, embedding models, or LLMs to suit your needs.
 
 ---
 
-## Prerequisites
+## Installation 🛠️
 
-Before you get started, make sure you have the following:
+Install RAGLight directly from PyPI:
 
-- **Python**: Version >= 3.11
-- **Ollama Client**: Ensure you have a properly configured Ollama client. You may need an API key or a local Ollama server instance.
-- **Python Dependencies**: See the Installation section below.
+```bash
+pip install raglight
+```
 
 ---
 
-## Installation
+## Quick Start 🚀
 
-### 1. Clone the Repository
+### **1. Configure Your Pipeline**
 
-```bash
-git clone https://github.com/Bessouat40/rag-example.git
-cd rag-example
-```
+Set up the components of your RAG pipeline:
 
-### 2. Install Dependencies
+```python
+from raglight.rag.builder import Builder
+from src.raglight.config.settings import Settings
 
-```bash
-pip install -r requirements.txt
+rag = Builder() \
+.with_embeddings(Settings.HUGGINGFACE, model_name=model_embeddings) \
+.with_vector_store(Settings.CHROMA, persist_directory=persist_directory, collection_name=collection_name) \
+.with_llm(Settings.OLLAMA, model_name=model_name, system_prompt_file=system_prompt_directory) \
+.build_rag()
 ```
 
-### 3. Configure Environment Variables
+### **2. Ingest Documents**
 
-```bash
-mv .env.example .env
+Use the pipeline to ingest documents into the vector store:
+
+```python
+rag.vector_store.ingest(file_extension='**/*.pdf', data_path='./data')
 ```
 
-Then fill in the `.env` file with the necessary configuration:
+### **3. Query the Pipeline**
 
-```bash
-# Example configuration
-OLLAMA_CLIENT=<URL or key for the Ollama client>
-PERSIST_DIRECTORY=<Path to store inference data>
-PERSIST_DIRECTORY_INGESTION=<Path to store ingestion data>
-MODEL_EMBEDDINGS=all-MiniLM-L6-v2
-MODEL_NAME=llama3
-SYSTEM_PROMPT_DIRECTORY=<Path to the system prompt file>
-COLLECTION_NAME=<Collection name for inference>
-COLLECTION_NAME_INGESTION=<Collection name for ingestion>
-DATA_PATH=./data
+Retrieve and generate answers using the RAG pipeline:
+
+```python
+response = rag.question_graph("How can I optimize my marathon training?")
+print(response)
 ```
 
 ---
 
-## Document Ingestion
+## Advanced Configuration ⚙️
 
-To ingest your files (currently only PDF files are supported), place them in the `data` folder or the path specified by the `DATA_PATH` variable in the `.env` file.
+### Environment Variables
 
-Run the following script to index the documents:
+Configure the pipeline with environment variables for better modularity:
 
 ```bash
-python ingestion_example.py
+export PERSIST_DIRECTORY=./vectorstore
+export MODEL_EMBEDDINGS=all-MiniLM-L6-v2
+export MODEL_NAME=llama3
 ```
 
-This script:
-
-- ⏳ Loads the embeddings model specified in `.env`.
-- 🎮 Uses the `VectorStore` (Chroma) to index the documents.
-- 🔐 Creates a persistent index in the directory defined by `PERSIST_DIRECTORY_INGESTION`.
-
----
-
-## Query the Model (RAG Pipeline)
-
-To query the RAG pipeline, use the following script:
+You can also define these in a `.env` file:
 
 ```bash
-python rag_example.py
+PERSIST_DIRECTORY=./vectorstore
+MODEL_EMBEDDINGS=all-MiniLM-L6-v2
+MODEL_NAME=llama3
 ```
 
-The pipeline:
-
-- 🔍 Retrieves the most relevant documents using the vector model.
-- 🤖 Uses the `llama3` model to generate a response based on the retrieved context.
-
 ---
 
 ## TODO
@@ -103,3 +93,5 @@ The pipeline:
 - [ ] **Feature**: Integrate new LLM providers (e.g., VLLM, HuggingFace, GPT-Neo).
 
 ---
+
+🚀 **Get started with RAGLight today and build smarter, context-aware AI solutions!**
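The retrieve-then-generate flow described in the new Quick Start can be pictured with a dependency-free toy: a bag-of-words "embedding" and a brute-force similarity search stand in for `all-MiniLM-L6-v2` and the Chroma vector store, and a real LLM would consume the retrieved context. Everything below is illustrative, not RAGLight's API.

```python
# Toy retrieve-then-generate sketch: embed, retrieve most similar, hand off.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a sentence-embedding model: token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "interval training improves marathon pace",
    "chroma is a vector store for embeddings",
]
query = "how to improve my marathon training"

# Stand-in for the vector store: brute-force nearest document.
best = max(docs, key=lambda d: cosine(embed(query), embed(d)))
print(best)  # -> interval training improves marathon pace
```

In the real pipeline, `rag.question_graph(...)` would then pass the retrieved context to the configured LLM instead of just printing the document.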
4 changes: 2 additions & 2 deletions discussion_example.py
@@ -1,5 +1,5 @@
-from src.rag.builder import Builder
-from src.config.settings import Settings
+from src.raglight.rag.builder import Builder
+from src.raglight.config.settings import Settings
 from dotenv import load_dotenv
 import os
 
4 changes: 2 additions & 2 deletions ingestion_example.py
@@ -1,5 +1,5 @@
-from src.rag.builder import Builder
-from src.config.settings import Settings
+from src.raglight.rag.builder import Builder
+from src.raglight.config.settings import Settings
 from dotenv import load_dotenv
 import os
 
4 changes: 2 additions & 2 deletions rag_example.py
@@ -1,5 +1,5 @@
-from src.rag.builder import Builder
-from src.config.settings import Settings
+from src.raglight.rag.builder import Builder
+from src.raglight.config.settings import Settings
 from dotenv import load_dotenv
 import os
 
14 changes: 1 addition & 13 deletions src/__init__.py
@@ -1,13 +1 @@
-from .vectorestore.vectorStore import VectorStore
-from .vectorestore.chroma import ChromaVS
-
-from .embeddings.embeddingsModel import EmbeddingsModel
-
-from .llm.llm import LLM
-from .embeddings.huggingfaceEmbeddings import HuggingfaceEmbeddings
-from .llm.ollamaModel import OllamaModel
-
-from .rag.rag import RAG
-from .rag.builder import Builder
-
-from .config.settings import Settings
+from .raglight import *
13 changes: 13 additions & 0 deletions src/raglight/__init__.py
@@ -0,0 +1,13 @@
+from .vectorestore.vectorStore import VectorStore
+from .vectorestore.chroma import ChromaVS
+
+from .embeddings.embeddingsModel import EmbeddingsModel
+
+from .llm.llm import LLM
+from .embeddings.huggingfaceEmbeddings import HuggingfaceEmbeddings
+from .llm.ollamaModel import OllamaModel
+
+from .rag.rag import RAG
+from .rag.builder import Builder
+
+from .config.settings import Settings
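The refactor above slims `src/__init__.py` down to `from .raglight import *`, so existing `src.*` imports keep resolving while the public names now live in the `src.raglight` package. Below is a minimal, self-contained sketch of that re-export pattern, using in-memory stand-in modules; the class names are placeholders, not the real RAGLight implementations.

```python
# Sketch of the re-export pattern: a parent package wildcard-imports its
# subpackage so old import paths survive a move. Built with in-memory
# modules purely for illustration.
import sys
import types

# Stand-in for the new src/raglight/__init__.py: it owns the public names.
raglight_pkg = types.ModuleType("src.raglight")
raglight_pkg.Builder = type("Builder", (), {})
raglight_pkg.Settings = type("Settings", (), {})

# Stand-in for the slimmed src/__init__.py: `from .raglight import *`.
src_pkg = types.ModuleType("src")
for name in ("Builder", "Settings"):
    setattr(src_pkg, name, getattr(raglight_pkg, name))

sys.modules["src"] = src_pkg
sys.modules["src.raglight"] = raglight_pkg

# Both the old top-level path and the new subpackage path resolve.
from src import Builder
from src.raglight import Settings
print(Builder.__name__, Settings.__name__)  # -> Builder Settings
```

This is why the three example scripts only needed their imports updated from `src.rag.*`/`src.config.*` to `src.raglight.*` rather than any behavioral change.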
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
