Questions about model evaluation #32
Hi @KAGAII, make sure you update neural-cherche. Here is the full ColBERT evaluation script:

```python
from neural_cherche import models, rank, retrieve, utils

device = "cpu"  # or "mps" or "cuda"
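
# Load the BEIR arguana dataset: documents, queries and relevance judgments.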
documents, queries, qrels = utils.load_beir(
"arguana",
split="test",
)
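
# First-stage retriever: lexical BM25 over the title and text fields.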
retriever = retrieve.BM25(
key="id",
on=["title", "text"],
)
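
# Second-stage ranker: a ColBERT late-interaction model.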
ranker = rank.ColBERT(
key="id",
on=["title", "text"],
model=models.ColBERT(
model_name_or_path="raphaelsty/neural-cherche-colbert",
device=device,
).to(device),
)
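
# Encode every document and index it in the BM25 retriever.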
retriever = retriever.add(
documents_embeddings=retriever.encode_documents(
documents=documents,
)
)
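
# Retrieve the top 30 BM25 candidates for every query.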
candidates = retriever(
queries_embeddings=retriever.encode_queries(
queries=queries,
),
k=30,
tqdm_bar=True,
)
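
# Re-rank the candidates with ColBERT and keep the top 10 per query.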
batch_size = 32
scores = ranker(
documents=candidates,
queries_embeddings=ranker.encode_queries(
queries=queries,
batch_size=batch_size,
tqdm_bar=True,
),
documents_embeddings=ranker.encode_candidates_documents(
candidates=candidates,
documents=documents,
batch_size=batch_size,
tqdm_bar=True,
),
k=10,
)
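
# Evaluate the run: nDCG@10 plus hits@1 through hits@10.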
scores = utils.evaluate(
scores=scores,
qrels=qrels,
queries=queries,
metrics=["ndcg@10"] + [f"hits@{k}" for k in range(1, 11)],
)
print(scores)
```

This yields:

```python
{
    "ndcg@10": 0.3686831610778578,
    "hits@1": 0.01386748844375963,
    "hits@2": 0.27889060092449924,
    "hits@3": 0.40061633281972264,
    "hits@4": 0.4861325115562404,
    "hits@5": 0.5562403697996918,
    "hits@6": 0.6194144838212635,
    "hits@7": 0.6556240369799692,
    "hits@8": 0.6887519260400616,
    "hits@9": 0.7218798151001541,
    "hits@10": 0.74884437596302,
}
```

These are good scores, and the run takes about 3 minutes on an mps device. The results you get are due to duplicate queries, which are now handled by neural-cherche's evaluation. EDIT: sorry, I just saw that you mentioned SparseEmbed and not ColBERT; running the benchmark now.
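For reference, here is a minimal sketch of what that SparseEmbed benchmark could look like, assuming `retrieve.SparseEmbed` and `models.SparseEmbed` mirror the key/on/model interface of the pipeline above; the exact signatures and the `batch_size` arguments are assumptions, not confirmed API:

```python
from neural_cherche import models, retrieve, utils

device = "cpu"  # or "mps" or "cuda"

documents, queries, qrels = utils.load_beir(
    "arguana",
    split="test",
)

# Single-stage sparse retriever; assumes retrieve.SparseEmbed follows the
# same key/on/model pattern as retrieve.BM25 and rank.ColBERT above.
retriever = retrieve.SparseEmbed(
    key="id",
    on=["title", "text"],
    model=models.SparseEmbed(
        model_name_or_path="raphaelsty/neural-cherche-sparse-embed",
        device=device,
    ).to(device),
)

retriever = retriever.add(
    documents_embeddings=retriever.encode_documents(
        documents=documents,
        batch_size=32,
    )
)

scores = retriever(
    queries_embeddings=retriever.encode_queries(
        queries=queries,
        batch_size=32,
    ),
    k=100,
)

print(
    utils.evaluate(
        scores=scores,
        qrels=qrels,
        queries=queries,
        metrics=["map", "ndcg@10", "ndcg@100", "recall@10", "recall@100"],
    )
)
```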
@KAGAII There is definitely something wrong with SparseEmbed right now. We recently updated SparseEmbed, but we may need to roll it back to the previous version @arthur-75. I'll make an update in the coming days.
Thank you for your prompt reply; looking forward to the new version!
When I used the pre-trained model 'raphaelsty/neural-cherche-sparse-embed' to evaluate on the arguana dataset with a retrieval k value of 100, the results were very poor:

```python
{
    "map": 0.033567943638956016,
    "ndcg@10": 0.042417859280348115,
    "ndcg@100": 0.08691780846498275,
    "recall@10": 0.09815078236130868,
    "recall@100": 0.32147937411095306,
}
```

As shown above, ndcg@10 is only 4.2%.
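As context for the duplicate-queries explanation above, a quick hypothetical check for repeated query strings in the split (this assumes `queries` is a plain list of strings, as it is used in the scripts above):

```python
from collections import Counter

from neural_cherche import utils

documents, queries, qrels = utils.load_beir("arguana", split="test")

# Count query strings that occur more than once; duplicated queries can
# skew per-query metrics when an evaluation does not account for them.
counts = Counter(queries)
duplicated = sum(n for n in counts.values() if n > 1)
print(f"{duplicated} of {len(queries)} query occurrences are duplicates")
```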