How did you train SparseEmbed? #31

Open
richardklafter opened this issue May 6, 2024 · 2 comments
Labels
question Further information is requested

Comments

@richardklafter

First, awesome project!

How did you train your model at https://huggingface.co/raphaelsty/neural-cherche-sparse-embed? Did you train it from scratch? I found an old copy of your sparsembed library. Was that library used, or was this repository? What data did you train on exactly?

I am surveying various sparse embedding models, and SparseEmbed, while interesting, has very little code or documentation beyond the original Google paper. Any assistance would be appreciated. Thanks!

@raphaelsty
Owner

Hi @richardklafter, I used the MS MARCO dataset and trained using neural-search without negative samples. The checkpoint is either a DistilBERT or a coCondenser; I don't remember which. I trained the model on a single GPU on Google Colab and wasn't aiming for extraordinary accuracy. I'm sure it's easy to do better.

@raphaelsty added the question label on May 12, 2024
@richardklafter
Author

Thanks for letting me know! Feel free to close this.

But if you want to share more specifics, a notebook or something, I would happily run it on an H100 and give you the results. I am curious where this lands, given that I could find very few public implementations.
