-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrating FollowIR #321
Comments
Thanks for adding me to this. I haven't looked much at the reranking code, but generally, we have discussed allowing task-dependent encode functions using additional arguments (some more discussion on this in #215). Something like: def encode(
sentences: list[str],
encode_type: Literal["queries", "corpus"] | None # currently formalized in encode_corpus and encode_queries
language: list[str] | None, # e.g. required for bitext mining. None can e.g. solve cases where language is unknown (e.g. lang clf)
task: str # or some kind of metadata object - this allows e.g. embedding differently depending on clustering, retrieval etc.. It also allows for prompt-based encoding.
) Generally, this addition seems to require multiple PRs, so I would make sure to make them smaller adding one aspect at a time. |
Thanks @KennethEnevoldsen and @Muennighoff!
I was planning to make a leaderboard, so integrating into MTEB works great for me, less maintenance work. Do you have any pointers on how to do that - does the legal one have a PR somewhere?
Do you have suggestions for the best way to implement this (both @KennethEnevoldsen and @Muennighoff)? I made it an alternative to DRESModel, does that seem reasonable to you or did you have another suggestion? It seems like we just need one encode function, so I suppose we might be able to just re-use the existing class actually. We'd just need some flag to give the encode function which portions to rerank. Re: which results to rerank, I'm thinking we either allow that as an option to be provided and/or make the default a BM25 run on them (which we can host the a BM25 top ranking dataset on Huggingface, like the InPars people did with BEIR). We can either make this an option to the
These are interesting issues, thanks for pointing them out! I agree they support the "bi-encoder models do matching only" hypothesis! And definitely agree about making a bi-encoder model that can follow instructions, that was my on my top next todo projects! I had some thoughts both in terms of data and architectures for that could work (definitely inspired from GritLM). Maybe a separate thread we can talk about. |
Yes, this was the leaderboard PR https://huggingface.co/spaces/mteb/leaderboard/discussions/90, this the code PR #311. We also need a results PR, e.g. like this https://huggingface.co/datasets/mteb/results/discussions/27/files. Happy to help, e.g. I can do the LB one if you find it too complex.
Sounds reasonable to me but curious what @KennethEnevoldsen thinks, too
I'd allow it as an option. The whole snippet here mteb/mteb/abstasks/AbsTaskRetrieval.py Line 282 in 5bd11fb
Cool! Happy to help if I can be useful - maybe I can help with compute if you plan to scale up, as a bi-encoder may require more compute to train. |
Without knowing the exact implementation, we would want to keep our models as clear as possible. Atm. I believe we only require models with an encode method and optionally allow for the encode_queries or encode_corpus. Instead of replacing DRESModel, it might be better to consider an extension to the model if that makes sense. E.g. either if a model has method X we will use that instead or pass in an argument to one of the related functions which the model can use if they wish (e.g. see outline above). |
Thanks so much for the links @Muennighoff! Will add a PR for the leaderboard and ask questions about it in a separate thread.
Works for me! The only downside I see is that BM25 is not current implemented in MTEB and it's a common first-stage model, but that's a fixable problem :)
I like that approach, thanks @KennethEnevoldsen! I'll start with a PR on FollowIR, then do a separate one for rerankers. |
Thanks for both of your help! It's almost all integrated now! Two questions about the last step, of the leaderboard:
|
@KennethEnevoldsen what do you think about adding these instructions to MTEB? Maybe something like |
Sound great @Muennighoff, I would just add a link to the readme as well (similar to |
Closing this one as done via Feel free to reopen @orionw if this is premature |
Moving our discussion on integrationg the cool FollowIR work here @orionw so others can chime in if they have thoughts :) Especially cc'ing @KennethEnevoldsen who knows MTEB internals better than me at this point!
@orionw
I'll also add a PR for the instructions abstract task of MTEB - part of which involves adding rerankers to MTEB, which could warrant a more technical discussion about design (but perhaps best done when the PR is there so it's easy to reference).
@Muennighoff
Had a quick look and it seems to me that Abstaskinstructionretrieval.py may be better handled on the user side similar to what's done in gritlm, where instructions are defined here https://github.com/ContextualAI/gritlm/blob/main/evaluation/eval_mteb.py#L10 & simply added to the text in the encode function https://github.com/ContextualAI/gritlm/blob/28d932b8703576736d6dd64d7cc7130895b2c49f/evaluation/eval_mteb.py#L1199 / https://github.com/ContextualAI/gritlm/blob/28d932b8703576736d6dd64d7cc7130895b2c49f/gritlm/gritlm.py#L117
But I may be missing something, feel free to either open a PR / issue / discussion for more technical discussion, whichever you prefer!
@orionw:
The prefixes (which E5 and others refer to as "instructions") should definitely be in the model section, like you're suggesting. However, the instructions in FollowIR are query-specific and so they come from the data - it's comparable to instructions for normal LLMs, where they are user-given and specific to the instance, as opposed to the E5-like instructions which are dataset-level model prefixes applied to all inputs of that task/dataset. Because of that, they have to come from the data side (the abstract task) rather than from the model side.
Two overarching questions actually, before I make the PR:
@Muennighoff
Yes you're right, sorry! And we can also not just put them in the dataset and reuse everything as is, as some models treat instructions differently where they don't use the embedding over the instruction. In that case, your implementation makes the most sense. While it'd be great to add instructions in a general way that extends to any embedding task without needing AbsTaskInstructionClustering , AbsTaskInstructionClassification, I think that the highest priority for MTEB should be making it as easy to use & extend as possible. Thus similar to how models are duplicated in hf/transformers, it's fine to duplicate code for ease of use/extension!
If you want, we can also add a new leaderboard tab for FollowIR! It would likely be under the Retrieval section: https://huggingface.co/spaces/mteb/leaderboard where we also just added "Law" ; I think this will give your great work more visibility as the leaderboard is viewed >100K times per month.
The text was updated successfully, but these errors were encountered: