Questions about model evaluation #32

Open
yaxundai opened this issue Jun 7, 2024 · 3 comments

yaxundai commented Jun 7, 2024

When I used the pre-trained model 'raphaelsty/neural-cherche-sparse-embed' to evaluate the arguana dataset with a retrieval k value of 100, the results were very poor:
{'map': 0.033567943638956016,
'ndcg@10': 0.042417859280348115,
'ndcg@100': 0.08691780846498275,
'recall@10': 0.09815078236130868,
'recall@100': 0.32147937411095306}
As shown above, ndcg@10 is only 4.2%.
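Roughly the setup I ran (a sketch; I assume retrieve.SparseEmbed follows the same encode_documents / add / call interface as the other retrievers, so exact argument names may differ):

from neural_cherche import models, retrieve, utils

documents, queries, qrels = utils.load_beir(
    "arguana",
    split="test",
)

# SparseEmbed used as a standalone retriever over the title and text fields
retriever = retrieve.SparseEmbed(
    key="id",
    on=["title", "text"],
    model=models.SparseEmbed(
        model_name_or_path="raphaelsty/neural-cherche-sparse-embed",
        device="cpu",
    ),
)

retriever = retriever.add(
    documents_embeddings=retriever.encode_documents(documents=documents)
)

scores = retriever(
    queries_embeddings=retriever.encode_queries(queries=queries),
    k=100,
)

print(
    utils.evaluate(
        scores=scores,
        qrels=qrels,
        queries=queries,
        metrics=["map", "ndcg@10", "ndcg@100", "recall@10", "recall@100"],
    )
)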


raphaelsty commented Jun 7, 2024

Hi @KAGAII, make sure you update neural-cherche using pip install neural-cherche --upgrade to get version 1.4.3.
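You can confirm which version is installed with the standard library (not a neural-cherche API):

from importlib import metadata

print(metadata.version("neural-cherche"))  # should print 1.4.3 or later

With the upgrade in place, the ColBERT pipeline below gives the expected results: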

from neural_cherche import models, rank, retrieve, utils

device = "cpu" # or "mps" or "conda"

# Load the ArguAna documents, queries and relevance judgments (qrels) from BEIR
documents, queries, qrels = utils.load_beir(
    "arguana",
    split="test",
)

# First-stage retriever: lexical BM25 over the title and text fields
retriever = retrieve.BM25(
    key="id",
    on=["title", "text"],
)


# Second-stage ranker: ColBERT re-ranks the BM25 candidates
ranker = rank.ColBERT(
    key="id",
    on=["title", "text"],
    model=models.ColBERT(
        model_name_or_path="raphaelsty/neural-cherche-colbert",
        device=device,
    ).to(device),
)


# Index the documents in the retriever
retriever = retriever.add(
    documents_embeddings=retriever.encode_documents(
        documents=documents,
    )
)


# Retrieve the top 30 candidates per query
candidates = retriever(
    queries_embeddings=retriever.encode_queries(
        queries=queries,
    ),
    k=30,
    tqdm_bar=True,
)

batch_size = 32

# Re-rank the candidates with ColBERT and keep the top 10 per query
scores = ranker(
    documents=candidates,
    queries_embeddings=ranker.encode_queries(
        queries=queries,
        batch_size=batch_size,
        tqdm_bar=True,
    ),
    documents_embeddings=ranker.encode_candidates_documents(
        candidates=candidates,
        documents=documents,
        batch_size=batch_size,
        tqdm_bar=True,
    ),
    k=10,
)

# Compute ranking metrics against the relevance judgments
scores = utils.evaluate(
    scores=scores,
    qrels=qrels,
    queries=queries,
    metrics=["ndcg@10"] + [f"hits@{k}" for k in range(1, 11)],
)

print(scores)

Yields:

{
    "ndcg@10": 0.3686831610778578,
    "hits@1": 0.01386748844375963,
    "hits@2": 0.27889060092449924,
    "hits@3": 0.40061633281972264,
    "hits@4": 0.4861325115562404,
    "hits@5": 0.5562403697996918,
    "hits@6": 0.6194144838212635,
    "hits@7": 0.6556240369799692,
    "hits@8": 0.6887519260400616,
    "hits@9": 0.7218798151001541,
    "hits@10": 0.74884437596302,
}

These are good scores, and it ran in 3 minutes on an mps device. The results you got are due to duplicate queries, which are now handled by neural-cherche's evaluation.
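For illustration only (assuming queries is a plain list of raw query strings, as returned by utils.load_beir), dropping the duplicates before scoring looks like this:

# Hypothetical sketch: keep one copy of each query string, preserving order
queries = list(dict.fromkeys(queries))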

EDIT: sorry, I just saw you mentioned SparseEmbed and not ColBERT; running the benchmark now.

raphaelsty added the question (Further information is requested) label on Jun 7, 2024
raphaelsty self-assigned this on Jun 7, 2024

raphaelsty commented Jun 7, 2024

@KAGAII There is definitely something wrong with SparseEmbed right now. We recently updated SparseEmbed, but we may need to roll it back to the previous version, @arthur-75. I'll make an update in the coming days.

raphaelsty added the bug (Something isn't working) label on Jun 7, 2024

yaxundai commented Jun 8, 2024

Thank you for your prompt reply; looking forward to the new version!
