Add Vision Transformer from ScaleMAE #422

Draft · anwai98 wants to merge 5 commits into main
Conversation

anwai98 (Collaborator) commented Nov 29, 2024

This PR adds the vision transformer from ScaleMAE and plugs it into the UNETR backbone (thanks to @Mareike79 for the implementation).

A few details to share with @Mareike79:

  • In our previous setup, the dimension mismatch came from the fact that the outputs from the attention heads were flattened; they need to be reshaped back to the target patch grid (see the first sketch below this list). This takes care of the issue we discussed.
  • In addition, loading the model with your setup did not quite work. There are also some parts of the pretrained checkpoint which need to be dropped before loading (e.g. decoder-related parameters, FPN and FCN heads); see the second sketch below.
  • I added the configurations for vit_b, vit_l and vit_h (from what I understand, we have pretrained the vit_b model and now want to look at the downstream task); see the third sketch below.
  • I decoupled the vision transformer class from the ScaleMAE repository, because it does not provide installation support for its module, which makes it difficult to import submodules from the original repo (e.g. functions like CustomCompose, get_2d_sincos_pos_embed_with_resolution, etc.).
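
To illustrate the first point, a minimal sketch of restoring a flattened token sequence to a spatial patch grid (assuming a square grid and no class token; the function name is illustrative):

```python
import torch

def tokens_to_patch_grid(x: torch.Tensor) -> torch.Tensor:
    """Reshape a (B, N, C) token sequence into a (B, C, H, W) patch grid."""
    b, n, c = x.shape
    h = w = int(n ** 0.5)  # assumes a square patch grid and no class token
    assert h * w == n, "token count must form a square grid"
    return x.permute(0, 2, 1).reshape(b, c, h, w)

# e.g. a vit_b feature map with 14 x 14 patches:
feats = torch.randn(2, 196, 768)
print(tokens_to_patch_grid(feats).shape)  # torch.Size([2, 768, 14, 14])
```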
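
For the second point, a rough sketch of the kind of filtering I mean when dropping the decoder-related parameters before loading the pretrained weights (the key prefixes and the nested "model" entry are assumptions about the checkpoint layout, not its exact structure):

```python
import torch

def load_encoder_state(checkpoint_path: str, model: torch.nn.Module):
    state = torch.load(checkpoint_path, map_location="cpu")
    state = state.get("model", state)  # unwrap if the weights are nested
    # Drop everything that is not part of the encoder (decoder, FPN / FCN heads, ...).
    drop_prefixes = ("decoder", "fpn", "fcn", "mask_token")
    encoder_state = {k: v for k, v in state.items() if not k.startswith(drop_prefixes)}
    missing, unexpected = model.load_state_dict(encoder_state, strict=False)
    return missing, unexpected
```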
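
For the third point, the configurations follow the standard ViT sizes; the exact keyword names depend on the ScaleMAE vision transformer class, so this mapping is only a sketch:

```python
# Standard ViT-B / ViT-L / ViT-H settings (embedding dim, depth, attention heads).
VIT_CONFIGS = {
    "vit_b": dict(embed_dim=768, depth=12, num_heads=12),
    "vit_l": dict(embed_dim=1024, depth=24, num_heads=16),
    "vit_h": dict(embed_dim=1280, depth=32, num_heads=16),
}
```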

PS. I'll leave this PR open for now. We are working on this branch at the moment and will merge it later if everything works as desired.

PPS. This is still a work-in-progress.

cc: @constantinpape
