The `Index` module indexes and post-processes the data extracted from multimodal documents. It builds an indexed vector store database backed by Milvus and supports hybrid retrieval, which combines dense and sparse retrieval. You can customize various parts of the pipeline by defining an indexing config file.
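This section does not say how the `Index` module merges the dense and sparse result lists. Reciprocal Rank Fusion (RRF) is one common choice for hybrid retrieval, and Milvus provides an RRF reranker for its hybrid search. The sketch below shows the idea in plain Python. The function name `reciprocal_rank_fusion`, the document IDs, and `k=60` are illustrative assumptions, not the module's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking (hypothetical helper).

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Documents missing from a list get nothing from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the order is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 hits from a dense (embedding) search and a sparse (SPLADE-style) search
dense_hits = ["doc3", "doc1", "doc7"]
sparse_hits = ["doc1", "doc9", "doc3"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])  # ['doc1', 'doc3', 'doc9', 'doc7']
```

Because RRF uses only ranks, it does not need the dense cosine scores and the sparse term-weight scores to share a scale. That is why it is a popular default for combining the two retrievers.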
Here is a minimal example to index processed documents.
- Create a config file:

  ```yaml
  indexer:
    dense_model_name: sentence-transformers/all-MiniLM-L6-v2
    sparse_model_name: splade
    db:
      uri: ./proc_demo.db
      name: my_db
  collection_name: my_docs
  documents_path: './output'
  ```
- Index your documents by calling the inference script:

  ```bash
  python run_index.py --config_file /path/to/config.yaml
  ```
See `examples/index` for other examples.
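To make the config layout concrete, here is a minimal sketch that maps an already-loaded config mapping (for example, the dict returned by a YAML loader) onto typed objects. It assumes `dense_model_name`, `sparse_model_name`, and `db` (with `uri` and `name`) sit under `indexer`, and that `collection_name` and `documents_path` are top-level keys. The class and function names are hypothetical and are not the module's own config classes.

```python
from dataclasses import dataclass


@dataclass
class DBConfig:
    uri: str   # e.g. a local database file path such as ./proc_demo.db
    name: str


@dataclass
class IndexerConfig:
    dense_model_name: str   # embedding model for dense retrieval
    sparse_model_name: str  # model for sparse (term-weight) retrieval
    db: DBConfig


@dataclass
class IndexConfig:
    indexer: IndexerConfig
    collection_name: str
    documents_path: str


def parse_index_config(raw: dict) -> IndexConfig:
    """Build typed config objects from a loaded mapping (hypothetical helper).

    Missing keys raise KeyError and unexpected keys under `db` raise TypeError,
    so mistakes in the config surface before any indexing starts.
    """
    idx = raw["indexer"]
    return IndexConfig(
        indexer=IndexerConfig(
            dense_model_name=idx["dense_model_name"],
            sparse_model_name=idx["sparse_model_name"],
            db=DBConfig(**idx["db"]),
        ),
        collection_name=raw["collection_name"],
        documents_path=raw["documents_path"],
    )


# The minimal example config, written out as the dict a YAML loader would produce
raw = {
    "indexer": {
        "dense_model_name": "sentence-transformers/all-MiniLM-L6-v2",
        "sparse_model_name": "splade",
        "db": {"uri": "./proc_demo.db", "name": "my_db"},
    },
    "collection_name": "my_docs",
    "documents_path": "./output",
}
cfg = parse_index_config(raw)
print(cfg.indexer.db.uri)  # ./proc_demo.db
```

Validating the structure up front like this is optional. It simply makes the expected nesting explicit.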