fix: add bge-m3 ModelMeta
#1821
# https://huggingface.co/BAAI/bge-m3/discussions/29
bgem3_languages = [
    "afr_Latn",  # af
    # als
I'm a bit unsure why these are commented out?
I've taken these language codes from the discussion, but either I can't find them in the language mapping or I'm not sure which ones they correspond to.
Oooh, okay. ChatGPT usually does a remarkable job at matching these. There is also a Python library that can do this for you; wait a sec, I'll find it.
I think not all languages are in LANG_MAPPING:
mteb/mteb/evaluation/LangMapping.py, line 5 in d7a7791
LANG_MAPPING = {
Hm. It seems that LANG_MAPPING is used only in the MTEB class. I think it should be removed in v2.
hmm yea interesting
I can't find an easy way to get script from the language, so I'll leave it as is for now.
Hmm, again, I think LLMs can be a good friend in doing that. If you have the name of the language, you're also probably a Google search away from the solution. And most languages use Latin, Arabic, or Cyrillic script anyway, so there are some sensible defaults to go with.
Looks great otherwise, feel free to merge!
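For illustration, the "sensible defaults" idea could be sketched roughly as below. This is not code from the PR or from mteb: the tables `ISO1_TO_ISO3` and `SCRIPT_OVERRIDES` and the helpers `to_lang_script` and `convert_all` are hypothetical names, and the tables hold only a few hand-picked entries. Defaulting to Latin script is an assumption that should be checked per language.

```python
from typing import List, Optional, Tuple

# Hypothetical, hand-maintained tables for illustration only (not part of mteb).
# Two-letter language code -> three-letter language code.
ISO1_TO_ISO3 = {"af": "afr", "ar": "ara", "en": "eng", "ru": "rus"}
# Languages that do not use the assumed default script.
SCRIPT_OVERRIDES = {"ara": "Arab", "rus": "Cyrl"}
# Assumption: fall back to Latin when no override is listed.
DEFAULT_SCRIPT = "Latn"


def to_lang_script(code: str) -> Optional[str]:
    """Return a code such as "afr_Latn", or None if the language is unknown."""
    iso3 = ISO1_TO_ISO3.get(code.lower())
    if iso3 is None:
        return None
    return f"{iso3}_{SCRIPT_OVERRIDES.get(iso3, DEFAULT_SCRIPT)}"


def convert_all(codes: List[str]) -> Tuple[List[str], List[str]]:
    """Split codes into resolved "lang_Script" strings and unresolved inputs."""
    resolved: List[str] = []
    unresolved: List[str] = []
    for code in codes:
        mapped = to_lang_script(code)
        if mapped is None:
            unresolved.append(code)  # keep for manual review
        else:
            resolved.append(mapped)
    return resolved, unresolved


if __name__ == "__main__":
    ok, todo = convert_all(["af", "als", "ru"])
    print(ok)    # resolved codes
    print(todo)  # codes that still need a manual lookup
```

Returning the unresolved codes separately keeps ambiguous entries such as `als` visible for manual review instead of silently guessing a mapping.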
ref #1803
Checklist
make test
make lint