fix: Added way more training dataset annotations #1765

Open
wants to merge 18 commits into
base: main

@KennethEnevoldsen (Contributor) commented Jan 11, 2025

Addresses #1720

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

@x-tabdeveloping (Collaborator) left a comment

Looks good overall; I had a couple of comments.

mteb/models/model2vec_models.py (outdated, resolved)
mteb/models/sentence_transformers_models.py (outdated, resolved)
mteb/models/sentence_transformers_models.py (outdated, resolved)
mteb/models/sentence_transformers_models.py (resolved)
@x-tabdeveloping (Collaborator)

Also, let's make sure the tests pass.

@isaac-chung (Collaborator)

Re: tests, please merge main once #1775 has been merged.

* fix: update max tokens for OpenAI (#1772)

update max tokens

* ci: skip AfriSentiLID for now (#1785)

* skip AfriSentiLID for now

* skip relevant test case instead

---------

Co-authored-by: Isaac Chung <[email protected]>

* 1.28.7

Automatically generated by python-semantic-release

* ci: fix model loading test (#1775)

* pass base branch into the make command as an arg

* test a file that has custom wrapper

* what about overview

* just dont check overview

* revert instance check

* explicitly omit overview and init

* remove test change

* try on a lot of models

* revert test model file

---------

Co-authored-by: Isaac Chung <[email protected]>

* feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)

* feat: Update task filtering, fixing bug on MTEB

- Updated task filtering, adding exclusive_language_filter and hf_subsets
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)

The following code outlines the problems:

```py
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# this was previously equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# the old filtering for English returned:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However, it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```
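
The difference between the two filtering modes can be sketched without mteb itself. This is a minimal illustration, not the library's implementation: the `filter_subsets` helper, the subset-to-language mappings, and the plain three-letter language codes are all assumptions made for the example.

```python
# Hypothetical subset -> languages mapping, modelled on the STS22 example above.
STS22_LIKE = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["cmn", "eng"],
    "de": ["deu"],
}

# Hypothetical task with two English-only subsets.
TWO_ENGLISH_SPLITS = {"en": ["eng"], "en-ext": ["eng"], "de": ["deu"]}


def filter_subsets(eval_langs, languages, exclusive=False):
    """Return the subset names that match the requested languages.

    Non-exclusive: keep a subset if it contains ANY requested language.
    Exclusive: keep a subset only if ALL of its languages were requested.
    (Illustrative helper, not part of mteb.)
    """
    wanted = set(languages)
    selected = []
    for subset, langs in eval_langs.items():
        subset_langs = set(langs)
        if exclusive:
            keep = subset_langs <= wanted
        else:
            keep = bool(subset_langs & wanted)
        if keep:
            selected.append(subset)
    return selected


inclusive = filter_subsets(STS22_LIKE, ["eng"])
exclusive = filter_subsets(STS22_LIKE, ["eng"], exclusive=True)
both_english = filter_subsets(TWO_ENGLISH_SPLITS, ["eng"], exclusive=True)

print(inclusive)     # cross-lingual subsets are kept
print(exclusive)     # only the monolingual English subset remains
print(both_english)  # exclusive filtering cannot tell the two English splits apart
```

In this sketch the exclusive filter still returns both `en` and `en-ext` for the second mapping, which is why selecting a subset by name (as `hf_subsets=["en"]` does above) is needed when a task has several English splits.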

* format

* remove "en-ext" from AmazonCounterfactualClassification

* fixed mteb(deu)

* fix: simplify in a few areas

* fix: Add gritlm

* 1.29.0

Automatically generated by python-semantic-release

* fix: Added more annotations!

* fix: Added C-MTEB (#1786)

Added C-MTEB

* 1.29.1

Automatically generated by python-semantic-release

* docs: Add contact to MMTEB benchmarks (#1796)

* Add myself to MMTEB benchmarks
* lint

* fix: loading pre 11 (#1798)

* fix loading pre 11

* add similarity

* lint

* run all task types

* 1.29.2

Automatically generated by python-semantic-release

* fix: allow to load no revision available (#1801)

* fix allow to load no revision available

* lint

* add require_model_meta to leaderboard

* lint

* 1.29.3

Automatically generated by python-semantic-release

---------

Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
3 participants