Integrate bolts + torch hub #442
Well, we can register the Bolts models with torch hub, but still, for getting the pre-trained weights we need some heavy GPU machines...
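For context, registering models with torch hub just means shipping a `hubconf.py` with entrypoint functions at the repo root. A minimal sketch of what that could look like for bolts; the entrypoint name, model arguments, and checkpoint URL are hypothetical placeholders, not real released weights:

```python
# hubconf.py -- hypothetical sketch of exposing a bolts model via torch hub.
dependencies = ["torch", "pl_bolts"]


def bolts_vae_cifar10(pretrained=False, **kwargs):
    """Callable via torch.hub.load('<user>/<repo>', 'bolts_vae_cifar10')."""
    import torch
    from pl_bolts.models.autoencoders import VAE

    model = VAE(input_height=32, **kwargs)
    if pretrained:
        # Placeholder URL -- this is where weights trained on the "heavy GPU machines" would live.
        state = torch.hub.load_state_dict_from_url(
            "https://example.com/vae_cifar10.ckpt", map_location="cpu"
        )
        model.load_state_dict(state["state_dict"])
    return model
```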
Can we do vice-versa too?
I would be highly interested in implementing such a feature.
Hi, With best regards,
Great! Let's also sync up with the Bolts refactoring =)
Just for information, there is currently a refactor of torchvision.models going on, available in the prototype folder, so the API with the hub might change. Edit: Let me know if I can help. P.S. A book on PyTorch Lightning will be out at the end of this year!
I will start working on this. :) With best regards,
Hi, With best regards,
Torch hub allows you to load the model, but you need to do model surgery for specifying the number of classes, etc. I have an example for DETR: we can load the DETR model, but we need to adjust the head classifier for our own number of classes. Similarly for CNNs, one needs to load the backbone and modify the head classifier for a custom num_classes. You also need to freeze / unfreeze layers while doing transfer learning and fine-tuning. We can think about this a little bit more; this is something Flash does well, I think.
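To make the model-surgery point concrete, here is a rough sketch (assuming the facebookresearch/detr hub entrypoint and its `class_embed` head attribute; `num_classes` is illustrative):

```python
import torch
import torch.nn as nn

num_classes = 5  # your own number of classes

# Load pre-trained DETR from torch hub.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)

# Surgery: replace the classification head (+1 for the "no object" class).
in_features = model.class_embed.in_features
model.class_embed = nn.Linear(in_features, num_classes + 1)

# Freeze the backbone for transfer learning; unfreeze parts of it later for fine-tuning.
for param in model.backbone.parameters():
    param.requires_grad = False
```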
Ok, thank you @oke-aditya
A single PR will not be a solution for this, though.
Ok, thank you @oke-aditya
Hi, can I do it like this?

```python
import torch
from torch.nn import Linear
from torchvision.models import googlenet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
classes = ["cat", "dog"]  # your own class names (illustrative)

model = googlenet(pretrained=True).to(device)
print(model)  # Prints the model architecture

# Replace the final fully connected layer for our own classes.
model.fc = Linear(model.fc.in_features, len(classes))
```

then use the model as usual.
Yes, you can, and this is the correct way. But note that the `fc` layer applies to GoogLeNet and ResNet; for models like MobileNet the head is called `classifier`.
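For example, a small sketch for torchvision's MobileNetV2, where the last layer inside `classifier` is the one to replace (`num_classes` is illustrative):

```python
import torch.nn as nn
from torchvision.models import mobilenet_v2

num_classes = 10  # illustrative

model = mobilenet_v2(pretrained=True)
# The head is `classifier`: a Sequential(Dropout, Linear).
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, num_classes)
```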
Hi, I don't know if this is the best way, but when I am testing transfer-learning models I use this:

```python
preds = self.tl_model(X)
preds = self.output(preds)
```
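For reference, a snippet like the one above typically sits inside a LightningModule roughly like this (a sketch; the attribute names `tl_model` / `output` mirror the snippet, and the backbone, optimizer, and `num_classes` are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from torchvision.models import resnet50


class TransferLearningModel(pl.LightningModule):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        backbone = resnet50(pretrained=True)
        in_features = backbone.fc.in_features
        backbone.fc = nn.Identity()            # strip the original head
        for param in backbone.parameters():    # freeze the backbone
            param.requires_grad = False
        self.tl_model = backbone
        self.output = nn.Linear(in_features, num_classes)

    def forward(self, x):
        preds = self.tl_model(x)
        preds = self.output(preds)
        return preds

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return loss

    def configure_optimizers(self):
        # Only the new head is trainable here.
        return torch.optim.Adam(self.output.parameters(), lr=1e-3)
```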
Hi! Best way is to... Thanks for asking.
Hi, for freezing layers and for fine-tuning, what are the features I need to create? I am sorry for asking this many questions. Thank you.
Ok, so let me elaborate a bit more and explain the transfer learning scenarios. These examples are written for CNNs, but they kind of generalize to other models too. Note that when we are doing transfer learning, it means we are using the pre-trained weights. The first two scenarios are clearly described in the Transfer Learning Tutorial (a great one by @chsasank).
Strategy 1: train the whole network. This is the most simple approach; we aren't freezing the backbone. Refer here in the tutorial:

```python
import torch.nn as nn
from torchvision.models import resnet50

num_classes = 10  # your own number of classes

model = resnet50(pretrained=True)
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)
```

Simply train the model. We train each and every parameter, with the only difference being that we have num_classes outputs instead of 1000.
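Since nothing is frozen here, every parameter goes to the optimizer (a small sketch continuing the block above; the optimizer and learning rate are illustrative):

```python
import torch

# All parameters (backbone + new head) are trained.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```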
Strategy 2: freeze the backbone and train only the head. Refer here in the tutorial. This is what you tried above; here we are interested in training only the classification head.

```python
model = resnet50(pretrained=True)

# Freeze all the parameters.
for param in model.parameters():
    param.requires_grad = False

# "Unfreeze" the head: replacing it creates fresh, trainable parameters
# with num_classes outputs.
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)

# You may prefer to add an extra fully connected layer, but that isn't needed in most cases.
# It is left to you; many don't prefer it, as it can cause a large increase in parameters.
# This would work well if you have BERT / millions of params in the backbone, where adding a
# few hundred params in the head of the model won't make a big difference.
# Basically, the number of params with pre-trained weights >>> the number of fully connected params.

# Adding an extra fc layer to the head.
hidden_params = 256  # e.g. hidden size of the extra layer
in_features = model.fc.in_features
model.fc = nn.Sequential(
    nn.Linear(in_features, hidden_params),
    # Many prefer dropout in between to avoid over-fitting.
    # nn.Dropout(0.2),
    nn.Linear(hidden_params, num_classes),
)
```
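With this strategy only the new head has `requires_grad=True`, so you would typically hand just those parameters to the optimizer (a small sketch; the optimizer choice and learning rate are illustrative):

```python
import torch

# Only the trainable (unfrozen) parameters are optimized.
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3
)
```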
Strategy 3: fine-tuning. This is where fine-tuning comes into play; we really want to make the most of every block of the network. You can first freeze the backbone and train the head with Strategy 2; that can be trained for a few epochs with a decent learning rate of 1e-3. Then comes the second training routine: now you unfreeze specific blocks, say the last 5 conv layers (or a residual block) in ResNet, and continue training. You may unfreeze more blocks or probably stop here; it is very much left to you (a small sketch of this routine follows below). I don't know if there is any other way of transfer learning (I haven't seen any other approach); these work well in practice.

P.S. First of all, my appreciation to you! You are a very young developer (I guess 14), and I'm super excited that you know so much stuff at such a tender age! At your age I was probably more interested in knowing how to install an anti-virus and knew nothing about coding (forget a GitHub account, I didn't even know the word GitHub).
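A minimal sketch of that Strategy 3 routine, assuming a torchvision ResNet-50 (the block to unfreeze, the learning rates, and `num_classes` are illustrative choices):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

num_classes = 10  # illustrative

# Stage 1: Strategy 2 -- freeze the backbone, train only the new head (lr ~1e-3).
model = resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)

# ... train the head for a few epochs ...

# Stage 2: unfreeze the last residual block and fine-tune it with a lower lr.
for param in model.layer4.parameters():
    param.requires_grad = True

optimizer = torch.optim.SGD(
    [
        {"params": model.layer4.parameters(), "lr": 1e-4},
        {"params": model.fc.parameters(), "lr": 1e-3},
    ],
    momentum=0.9,
)
```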
OK, thank you, I understand the issue now. Thank you very much @oke-aditya
Hi, again I am really sorry for asking this many questions, but I am not understanding this correctly. So, what I need to implement is the above features in Lightning Bolts in an easier way. Is my understanding correct? If not, what are the specific things I need to work on or implement? With best regards,