Could I fine-tune this model for Chinese datasets? #41
Sure, if you want to fine-tune, you can follow some of what is outlined in this issue: #2. For asymmetric search (e.g. retrieval), you can also try https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco, which has seen lots of Chinese during pretraining and might be good enough.
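For context, SGPT bi-encoders pool token hidden states with a position-weighted mean, so later tokens contribute more to the sentence embedding. Below is a minimal numpy sketch of that pooling; the shapes and random vectors are made-up stand-ins for real model hidden states:

```python
import numpy as np

def weighted_mean_pool(hidden, mask):
    """Position-weighted mean pooling in the style of SGPT bi-encoders:
    token at position i gets weight (i + 1), so later tokens count more.
    hidden: (seq_len, dim) token embeddings; mask: (seq_len,) 1 for real
    tokens, 0 for padding."""
    weights = np.arange(1, hidden.shape[0] + 1, dtype=float) * mask
    weights /= weights.sum()
    return (hidden * weights[:, None]).sum(axis=0)

rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 8))    # toy "hidden states" for 5 tokens
mask = np.array([1, 1, 1, 1, 0.0])  # last position is padding
emb = weighted_mean_pool(hidden, mask)
print(emb.shape)  # (8,)
```

In practice you would feed real last-layer hidden states from the model through this pooling rather than random vectors.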
Do many SGPT models on Hugging Face support Chinese?
If I want to fine-tune the SGPT model, do I just change the dataset?
I think only the BLOOM ones perform well for Chinese.
Which Chinese dataset should I evaluate the fine-tuned model on?
I would evaluate on the Chinese datasets in MTEB. Also see embeddings-benchmark/mteb#134.
Are the evaluation metrics also Pearson and Spearman?
For retrieval datasets it's nDCG@10. But don't worry about the evaluation: if you use MTEB, it takes care of automatically calculating the scores.
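For reference, nDCG@10 is straightforward to compute by hand (MTEB does this for you). A small self-contained sketch, where each entry in the list is the graded relevance of the document retrieved at that rank:

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """nDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = sorted(relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0

# Relevant docs at ranks 1 and 3 (1 = relevant, 0 = not).
print(ndcg_at_k([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]))
```

A perfect ranking scores 1.0; pushing relevant documents down the list lowers the score logarithmically.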
Thank you very much!
What about fine-tuning for Spanish?
Sure, you can do that too. https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco has also seen a lot of Spanish, so it may work well for you.
Could you please tell me how I can fine-tune on my custom Chinese datasets?
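For a custom dataset of (query, positive passage) pairs, the usual recipe is a contrastive objective with in-batch negatives: each query is trained to score its own passage above every other passage in the batch. A hedged numpy sketch of just the loss, with random vectors standing in for model embeddings (the real training would backpropagate this through the encoder):

```python
import numpy as np

def in_batch_negatives_loss(q, p, scale=20.0):
    """Cross-entropy over scaled cosine similarities: for each query,
    the same-index passage is the positive and all other passages in
    the batch act as negatives. q, p: (batch, dim) embeddings."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    sims = scale * q @ p.T                   # (batch, batch) similarity matrix
    sims -= sims.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))      # NLL of the true (diagonal) pairs

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
loss = in_batch_negatives_loss(q, q)  # identical embeddings -> near-zero loss
print(loss)
```

This is the same idea as sentence-transformers' MultipleNegativesRankingLoss; with that library you would only need to swap the training data for your Chinese pairs.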