Why use low chunksizes? #39
Comments
The chunk size does not affect empirical results. Use the highest one that works for you! The higher it is, the faster the training. A few other factors affect GPU RAM usage, such as model size and sequence length. I think I was bottlenecked by one of them and hence had to go very low in chunk size.
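To illustrate why the chunk size would not change results, here is a minimal sketch. It assumes the chunk size only controls how many examples pass through the model at once, while the loss is still computed over the whole batch. `encode`, `encode_in_chunks`, and `similarity_matrix` are hypothetical toy helpers, not functions from this repository.

```python
def encode(texts):
    """Toy stand-in for a model forward pass: one small vector per text."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def encode_in_chunks(texts, chunk_size):
    """Encode chunk by chunk so only `chunk_size` texts are processed at once."""
    embeddings = []
    for start in range(0, len(texts), chunk_size):
        embeddings.extend(encode(texts[start:start + chunk_size]))
    return embeddings


def similarity_matrix(queries, docs):
    """Dot-product score between every query and every document in the batch."""
    return [[sum(a * b for a, b in zip(q, d)) for d in docs] for q in queries]


batch = ["what is sgpt", "gpu memory", "chunk size", "learning rate", "bi-encoder"]
full = similarity_matrix(encode(batch), encode(batch))

# The batch-level scores are identical for every chunk size;
# only the number of texts processed per forward pass changes.
for chunk_size in (1, 2, 4):
    chunked = encode_in_chunks(batch, chunk_size)
    assert similarity_matrix(chunked, chunked) == full
```

Under that assumption, a smaller chunk size trades speed for lower peak memory without altering what the loss sees.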
Thanks Niklas! I had a quick question as well: I see you used a bunch of different LRs. Which LR did you find to be the best? Did you also schedule the LRs in any way?
I didn't experiment extensively with the LRs. I think they're based on the SentenceTransformer defaults. It automatically uses a scheduler; see sgpt/biencoder/nli_msmarco/sentence-transformers/sentence_transformers/SentenceTransformer.py, line 616 in 9728de4.
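For readers wondering what a learning-rate schedule looks like in practice, the sketch below shows one common shape: a linear warmup followed by a linear decay. This is an assumption for illustration only; the thread does not state which schedule the referenced line configures, and `warmup_linear_lr` and all numeric values here are hypothetical.

```python
def warmup_linear_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear ramp up to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative values only (not taken from the repository).
for step in (0, 50, 100, 550, 1000):
    print(step, warmup_linear_lr(step, base_lr=2e-5, warmup_steps=100, total_steps=1000))
```

With a schedule of this kind, the configured LR is only the peak value; the effective LR at any step depends on the warmup and total step counts.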
Hi!
I saw that you used low chunk sizes (2-4) when training the models. May I know why? I would expect a GPU with 40GB of RAM to handle more. Do lower chunk sizes give better empirical results?
Thanks!