I'd like to ask: did you consider using a point-cloud global feature learned by PointNet (or a similar network), fine-tuning a fully connected layer on top, and comparing the result against CLIP's text features? Or did you go directly with the 2D depth-map projection approach because the features learned by CLIP's image encoder and text encoder are already aligned? If you did try the former, was it because the results were poor?
Right. Because CLIP's image encoder has already been aligned with the text encoder through pre-training, zero-shot classification works directly. If you use a 3D network such as PointNet, an extra training stage is needed to align PointNet's features with CLIP's text encoder. We did try this: it hurts the network's transfer ability, and it is no longer zero-shot classification on 3D data.
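The mechanism described above can be sketched as follows. This is a minimal illustration of CLIP-style zero-shot scoring, not the repository's actual code: the image feature stands in for a CLIP-encoded depth-map rendering of a point cloud, and the text features stand in for encoded class prompts. Both are random placeholders here, and all names are hypothetical.

```python
import numpy as np

# Hypothetical sketch: zero-shot classification by cosine similarity
# between one image feature and per-class text features. Real CLIP
# features come from its pre-trained encoders; random vectors here.
rng = np.random.default_rng(0)
num_classes, dim = 3, 512

image_feat = rng.standard_normal(dim)                  # stand-in for CLIP image feature
text_feats = rng.standard_normal((num_classes, dim))   # stand-ins for class-prompt features

# L2-normalize, then score each class by cosine similarity.
image_feat = image_feat / np.linalg.norm(image_feat)
text_feats = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
scores = text_feats @ image_feat          # cosine similarity per class
pred = int(np.argmax(scores))             # predicted class index
```

Because CLIP's two encoders were pre-trained jointly, these similarities are meaningful with no further training; substituting a PointNet encoder for the image branch would require an extra alignment stage, which is exactly what the answer above says breaks the zero-shot property.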