
Reproduce results #4

Open
adita15 opened this issue Apr 12, 2021 · 14 comments


@adita15 commented Apr 12, 2021

I am trying to reproduce these results and I am quite confused about the structure. Could you provide detailed setup instructions?

@kamalojasv181 (Owner) commented

Can you please point out which part in particular confuses you?

@kamalojasv181 (Owner) commented

Here is a generic workflow:

1. Get the dataset. We have not posted it here due to the policy of the Constraint Shared Task; register with them to obtain it and put it in the Dataset folder. The dataset must have exactly two columns: 1) the data and 2) the labels (we deleted the first row containing the column names and the first column containing a serial number for each tweet).
2. If you want to train the models yourself, make a directory named models and run main_multitask_learning.py or main_bin_classification.py. If you wish to use our models instead, download them into that models folder.
3. You can then write your own script to generate results or use ours, at your convenience.

For anything specific, feel free to ask.
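A minimal sketch of the preprocessing described in step 1, assuming pandas; the raw file name and the column positions are placeholders, since the file you receive from the shared task may be laid out differently.

```python
import pandas as pd

# Load the raw file as distributed by the Constraint Shared Task organisers
# ("raw_constraint_data.csv" is a placeholder name).
raw = pd.read_csv("raw_constraint_data.csv")

# Keep only the tweet text and the labels, assuming the first column holds
# the serial numbers, the second the text, and the third the labels.
prepared = raw.iloc[:, 1:3]

# Write the file without the header row and without an index so the result
# has exactly two columns: the data and the labels.
prepared.to_csv("Dataset/train.csv", header=False, index=False)
```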

@adita15 (Author) commented Apr 13, 2021 via email

@kamalojasv181 (Owner) commented

  1. Nope. For training, use the training data; for validation, use the validation data; and generate the CSV on the test data.
  2. Use 10 epochs for all models (see the sketch after this list).
  3. Pass the one you want to generate results for.
  4. The baseline model is the one described in the workshop organisers' paper (https://arxiv.org/abs/2011.03588). For the auxiliary approach, use the file main_multitask_learning.py. I can see why this might confuse someone; we were naive about the code. For now, use this info, and I will update the repo in a day or two.
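For concreteness, here is a hedged sketch of what points 2 and 3 boil down to, using Hugging Face's Trainer purely as an illustration; the repository's own scripts (main_multitask_learning.py / main_bin_classification.py) may wire this up differently, and the tiny in-memory datasets are stand-ins for the real train and validation CSVs.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "ai4bharat/indic-bert"   # pass whichever model you want results for
tokenizer = AutoTokenizer.from_pretrained(model_name)

def encode(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

# Toy stand-ins for the real train/validation splits read from the CSVs.
train_dataset = Dataset.from_dict({"text": ["sample tweet"], "label": [0]}).map(encode, batched=True)
valid_dataset = Dataset.from_dict({"text": ["another tweet"], "label": [1]}).map(encode, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

args = TrainingArguments(
    output_dir="models",   # checkpoints land in the models/ directory
    num_train_epochs=10,   # 10 epochs for all models, as stated above
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset, eval_dataset=valid_dataset)
trainer.train()      # train on the training split
trainer.evaluate()   # validate on the validation split
```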

@adita15 (Author) commented Apr 13, 2021 via email

@kamalojasv181 (Owner) commented

  1. Yes, our best results were obtained with ai4bharat/indic-bert using the auxiliary approach.

  2. The binaries we released are already fine-tuned on the workshop dataset. You can either fine-tune the original ai4bharat/indic-bert model on the same dataset and reproduce the models that we have released, or just use our released models to directly generate results on the test set (a minimal inference sketch follows below). There is no point fine-tuning our model on the same dataset again.
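A minimal inference sketch for the second option, assuming the released checkpoint has been downloaded into models/ in a Hugging Face-compatible format; the checkpoint path and the test-file name are placeholders, not the repository's actual naming.

```python
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "models/indic_bert_aux"   # hypothetical path to a released checkpoint
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
model.eval()   # disable dropout so repeated runs give identical predictions

# test.csv in the same two-column (text, label) format described earlier.
test = pd.read_csv("Dataset/test.csv", header=None, names=["text", "label"])

with torch.no_grad():
    enc = tokenizer(list(test["text"]), truncation=True, padding=True,
                    max_length=128, return_tensors="pt")
    preds = model(**enc).logits.argmax(dim=-1).tolist()

pd.DataFrame({"text": test["text"], "prediction": preds}).to_csv("results.csv", index=False)
```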

@kamalojasv181 (Owner) commented

Anything else? Should I close it?

@adita15 (Author) commented Apr 13, 2021 via email

@kamalojasv181 (Owner) commented

  1. Actually, we did a very sloppy job. We combined the train and valid data (in the CSV) and passed the split parameter accordingly (see the sketch below). For now, please bear with us; I have noted this and will fix it very soon.

  2. Can you please elaborate? Are you talking about the baseline paper?
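A sketch of what "combined train and valid data plus a split parameter" amounts to in practice; the file name and the 0.2 split value are illustrative, not the repository's actual choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# "train_and_valid.csv" is a placeholder for the combined CSV mentioned above.
combined = pd.read_csv("Dataset/train_and_valid.csv", header=None,
                       names=["text", "label"])

# The split parameter decides how much of the combined data is held out for
# validation; 0.2 here is illustrative, not the value used in the repository.
train_df, valid_df = train_test_split(combined, test_size=0.2, random_state=42)
```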

@adita15 (Author) commented Apr 13, 2021 via email

@kamalojasv181 (Owner) commented

Ok, our bad again! We actually tried ensembling in the generate-CSV code, which did not work out for us. That is not the baseline implementation but result generation with ensembling; we forgot to delete the code. Thanks for pointing it out.
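For readers unfamiliar with the term, this is the general idea of ensembling being referred to (averaging the logits of several fine-tuned models before taking the argmax); it is a generic sketch, not the leftover code in the repository.

```python
import torch

def ensemble_predict(models, encoded_batch):
    """Average the logits of several fine-tuned models, then take the argmax."""
    with torch.no_grad():
        logits = torch.stack([model(**encoded_batch).logits for model in models])
    return logits.mean(dim=0).argmax(dim=-1)
```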

@adita15 (Author) commented Apr 13, 2021 via email

@adita15 (Author) commented Apr 14, 2021

I tried fine-tuning using your script. I am still not able to reproduce the results; the F1 scores lag by about 2 points for all tasks.

@siddjags commented Apr 14, 2021

I am also facing a similar issue. Are we supposed to train the model with batch size 16? The current version of the code uses batch_size=8. Also, the pre-trained models do not give identical results when running generate_csv.py. Could you please help me with this?
FYI, I am trying to reproduce the results for AUX Indic-BERT. Here are the results obtained after running main_multitask_learning.py to train/fine-tune the model.

[Screenshot of the resulting scores: Screen Shot 2021-04-14 at 1.20.34 AM]
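For reference, a hedged sketch of the two settings in question: the DataLoader batch size (the scripts currently use 8; the question is whether 16 was intended) and explicit seeding, which, together with calling model.eval(), is what usually makes repeated runs of a generation script give identical numbers. All names here are placeholders, not the repository's actual code.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def set_seed(seed: int = 42):
    """Seed every RNG the training/generation code might touch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)

# Toy stand-in for the tokenized dataset the scripts actually build.
dataset = TensorDataset(torch.zeros(32, 128, dtype=torch.long))
loader = DataLoader(dataset, batch_size=16, shuffle=False)  # 16 vs. the scripts' current 8
```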
