Why is only dino_vits8 supported? #13
Comments
+1
@juancamilog I read the code again and found that this may be because the authors have only tried head_idxs = [0, 2, 4, 5] for vits. Since vit-b/l/g have different numbers of heads, suitable head_idxs for those models would also need to be found.
Hi! The saliency maps used in the co-segmentation, part co-segmentation, and correspondences examples are obtained by aggregating heads 0, 2, 4, 5 of dino_vits8. We removed heads 1 and 3 because they empirically attended to background areas. It is also possible to change the code to use different DINO ViTs aggregating all heads, but that would require adjusting some of the hyperparameters in each application. LMK if you have further questions 🙏
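For anyone wondering what "aggregating heads 0, 2, 4, 5" means in practice, here is a minimal, dependency-free sketch. The nested-list attention shape, the function name, and the min-max normalization are my assumptions for illustration, not the repo's actual implementation (which works on tensors inside `ViTExtractor`):

```python
import random

def aggregate_head_saliency(attn, head_idxs):
    """Average the CLS-token attention of selected heads into one saliency map.

    `attn` is a nested list of shape (num_heads, num_tokens, num_tokens),
    standing in for a ViT's last-layer self-attention (assumed layout).
    `head_idxs` picks which heads to keep, e.g. [0, 2, 4, 5] for dino_vits8.
    """
    num_heads = len(attn)
    assert all(0 <= h < num_heads for h in head_idxs), "head index out of range"
    # CLS row (token 0) of each chosen head, dropping the CLS->CLS entry
    rows = [attn[h][0][1:] for h in head_idxs]
    # average over the chosen heads, then min-max normalize to [0, 1]
    saliency = [sum(col) / len(head_idxs) for col in zip(*rows)]
    lo, hi = min(saliency), max(saliency)
    return [(s - lo) / (hi - lo + 1e-8) for s in saliency]

# toy example: 6 heads (ViT-S), 1 CLS token + 4 patch tokens
random.seed(0)
attn = [[[random.random() for _ in range(5)] for _ in range(5)] for _ in range(6)]
sal = aggregate_head_saliency(attn, [0, 2, 4, 5])
```

Dropping heads 1 and 3 here simply means they never enter the average, so background-attending heads don't dilute the foreground signal.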
Thanks for your reply. If using ViT models with more heads, can the 0, 2, 4, 5 indices stay the same? I'm not sure the larger models' heads will give similar attention results. By the way, I tried using DINOv2's weights but found the results were even worse. It seems the patch_size significantly influences the foreground segmentation results: the smaller the patch_size, the better the result. Do you have any ideas about using DINOv2's features? I only found a DINO_vit14 pretrained model in their repo.
@RickyYXY did you get it working for vitb? |
In the examples, if you change the model_type to anything other than dino_vits8, the code crashes because of an assert in ViTExtractor.extract_saliency_maps. What needs to change to properly support other model types?
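For anyone hitting this assert: one way to generalize, given the discussion above, is to stop hard-coding the head subset for dino_vits8 and look it up per model, falling back to all heads when no tuned subset is known. This is a hypothetical sketch — the dict, function name, and fallback policy are my assumptions, not the repo's API — and, as the author notes, untuned models would likely still need per-model head selection and hyperparameter adjustment:

```python
# Assumed mapping: only dino_vits8 has an empirically tuned head subset
# (heads 1 and 3 were dropped because they attended to background areas).
SALIENCY_HEAD_IDXS = {"dino_vits8": [0, 2, 4, 5]}

def get_saliency_head_idxs(model_type, num_heads):
    """Return the attention-head indices to aggregate for `model_type`.

    Falls back to all heads for models without a tuned subset, instead of
    asserting; results for those models are untuned and may be worse.
    """
    if model_type in SALIENCY_HEAD_IDXS:
        return SALIENCY_HEAD_IDXS[model_type]
    return list(range(num_heads))
```

With something like this in place of the assert, e.g. a 12-head ViT-B would aggregate all 12 heads by default until a better subset is found empirically.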