This project implements a vector search-based real estate recommendation system using MongoDB, OpenAI embeddings, and Flask. It allows users to search for properties using natural language queries, leveraging vector similarity to find relevant listings and providing AI-enhanced responses.
- Natural language property search using vector embeddings
- AI-powered response generation for property recommendations
- MongoDB Atlas vector search integration
- RESTful API endpoint for property queries
- Python 3.8+
- Flask (Web framework)
- MongoDB Atlas (Database with vector search capability)
- OpenAI API (for embeddings and response generation)
- Python 3.8 or higher
- MongoDB Atlas account with vector search enabled
- OpenAI API key
/vector_search_project
│
├── /app
│ ├── __init__.py
│ ├── embeddings.py # Handles embedding generation
│ ├── db.py # Database connection and operations
│ └── api.py # Flask API endpoints
│
├── /data
│ └── dataset.csv # Real estate dataset
│
├── /scripts
│ └── load_data.py # Script to load and embed data
│
├── .env # Environment variables
├── requirements.txt # Python dependencies
└── app.py # Main application entry point
- Clone the repository:
git clone https://github.com/yourusername/vector_search_project.git
cd vector_search_project
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
Create a `.env` file in the project root with the following variables:
OPENAI_API_KEY=your_openai_api_key
MONGO_URI=your_mongodb_connection_string
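A quick sanity check can catch a missing variable before the app or the loading script starts. The sketch below assumes the values from `.env` have already been loaded into the process environment (for example by a loader such as python-dotenv, which this README does not confirm is used); `missing_env_vars` is a hypothetical helper name, not part of the project.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("OPENAI_API_KEY", "MONGO_URI")


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` at startup and stopping when the list is non-empty gives a clearer error than a failed API call or database connection later on.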
Before running the API, you need to load and embed the real estate data:
- Ensure your dataset is in the correct format and placed in `data/dataset.csv`
- Run the data loading script:
python scripts/load_data.py
This script will:
- Load the real estate data
- Generate embeddings for each property
- Store the data and embeddings in MongoDB
- Create the necessary vector search index
- If the index is not created automatically, create a custom vector search index in MongoDB Atlas
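If the index has to be created by hand, the definition might look like the sketch below. The field path `embedding_vector` comes from the search pipeline in this README, and 1536 is the default output size of `text-embedding-3-small`; the `cosine` similarity setting and the helper name `vector_index_definition` are assumptions, since the project's actual index settings are not shown here.

```python
import json


def vector_index_definition(path="embedding_vector", dimensions=1536,
                            similarity="cosine"):
    """Build an Atlas Vector Search index definition as a plain dict.

    Assumptions: cosine similarity and 1536 dimensions (the default size
    of text-embedding-3-small). Adjust both to match the stored vectors.
    """
    if similarity not in {"cosine", "euclidean", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


# JSON form, suitable for pasting into the Atlas UI index editor.
definition_json = json.dumps(vector_index_definition(), indent=2)
```

The index should be named `vector_index` so that it matches the name the search pipeline refers to.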
Start the Flask application:
python app.py
The API will be available at http://localhost:5000
Endpoint: POST /vector_search
Request Body:
{
"query": "3 bedroom house in Aguadilla under $200,000"
}
Response:
{
"response": "Detailed AI-generated response about matching properties",
"source_information": "Information about the properties used to generate the response"
}
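As a rough illustration, the endpoint can be called from Python with only the standard library. This is a sketch under assumptions: the server is running locally on port 5000 as described above, and it accepts a JSON body with a `Content-Type: application/json` header. The function names `build_search_request` and `search` are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/vector_search"


def build_search_request(query, url=API_URL):
    """Build a POST request carrying the query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, url=API_URL, timeout=30):
    """Send the query and return the decoded JSON response as a dict."""
    request = build_search_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires the Flask app to be running):
# result = search("3 bedroom house in Aguadilla under $200,000")
# print(result["response"])
# print(result["source_information"])
```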
- Basic location and bedroom query:
{
"query": "3 bedroom houses in Aguadilla"
}
- Price range query:
{
"query": "homes under $150,000 in San Juan"
}
- Complex feature query:
{
"query": "large houses with more than 2000 square feet and a pool"
}
The system uses the following pipeline for vector search:
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding_vector",
            "numCandidates": 150,
            "limit": 5
        }
    },
    {
        "$project": {
            "_id": 0,
            "brokered_by": 1,
            "status": 1,
            "price": 1,
            # ... other fields
        }
    }
]
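To make the `$vectorSearch` stage less of a black box, the sketch below shows an exact, brute-force version of the same idea: score every document against the query vector and keep the best few. Atlas itself performs an approximate nearest-neighbor search, considering `numCandidates` candidates and returning `limit` results, so this is an illustration rather than what the database actually runs. Cosine similarity is an assumption here; the real metric depends on how the index was defined.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=5, path="embedding_vector"):
    """Return the k documents whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, doc[path]), doc)
              for doc in documents]
    # Sort on the score only, so documents themselves are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```

A brute-force scan like this is fine for a few thousand listings in memory, but its cost grows linearly with the collection size, which is the reason an indexed approximate search is used in the database.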
Properties are embedded using OpenAI's text-embedding-3-small model. The embedding input combines various property features:
embedding_input = f"{property['brokered_by']}, {property['status']}, Price: {property['price']}, Beds: {property['bed']}, ..."
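A minimal sketch of that step as a function, using only the four fields named in the line above (the remaining fields are elided in this README and are not guessed at here). The name `build_embedding_input` is hypothetical, and the parameter is called `prop` to avoid shadowing Python's built-in `property`. The commented-out call assumes the `openai` Python SDK, version 1.x.

```python
def build_embedding_input(prop):
    """Combine property fields into the text that gets embedded.

    Only the fields shown in the README are used; a missing key raises
    KeyError, which surfaces incomplete rows early.
    """
    return (
        f"{prop['brokered_by']}, {prop['status']}, "
        f"Price: {prop['price']}, Beds: {prop['bed']}"
    )


# Turning the text into a vector (assumes the openai 1.x SDK and that
# OPENAI_API_KEY is set in the environment):
#
# from openai import OpenAI
# client = OpenAI()
# result = client.embeddings.create(
#     model="text-embedding-3-small",
#     input=build_embedding_input(prop),
# )
# embedding_vector = result.data[0].embedding
```

The same text-building function should be used whenever documents are re-embedded, so that all stored vectors are derived from consistently formatted input.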
Common issues and solutions:
- No results returned:
  - Verify that the vector index was created correctly
  - Check that documents have embedding vectors
  - Ensure the query embedding dimensionality matches the document embeddings
- MongoDB connection issues:
  - Verify your MongoDB URI in the `.env` file
  - Ensure your IP address is whitelisted (added to the IP access list) in MongoDB Atlas
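The embedding checks for the "no results" case can be partly automated. The sketch below takes a query embedding and a sample of documents already fetched from the collection, then reports missing or mismatched vectors. The helper name `diagnose_embeddings` is hypothetical, and the default of 1536 dimensions assumes `text-embedding-3-small` is used with its default output size.

```python
EXPECTED_DIMENSIONS = 1536  # default size for text-embedding-3-small


def diagnose_embeddings(query_embedding, documents,
                        path="embedding_vector",
                        expected=EXPECTED_DIMENSIONS):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    if len(query_embedding) != expected:
        problems.append(
            f"query embedding has {len(query_embedding)} dimensions, "
            f"expected {expected}"
        )
    for i, doc in enumerate(documents):
        vector = doc.get(path)
        if not vector:
            problems.append(f"document {i} has no '{path}' field")
        elif len(vector) != expected:
            problems.append(
                f"document {i} has {len(vector)} dimensions, "
                f"expected {expected}"
            )
    return problems
```

An empty result only rules out these two causes; a missing or misnamed index still has to be checked in the Atlas UI.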
- Fork the repository
- Create a new branch for your feature
- Commit your changes
- Push to the branch
- Create a new Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing the embedding and language models
- MongoDB for their vector search capability
- Thanks, MongoDB, for letting me explore so much for free