Question about the architecture (graphTransformer) #87
Comments
Copying the answer from #47:
Your observation is correct. It is not exactly the standard attention mechanism. I have not thoroughly compared the two, but the current code was written this way on purpose. The reason is that we have to manipulate features of size (bs, n, n, de) anyway, so using vector attention scores instead of scalar ones does not create a strong memory bottleneck. It would be interesting to investigate this further, though.
Clement
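The trade-off described above can be illustrated with a small NumPy sketch (shapes and variable names are illustrative, not taken from the DiGress code): summing the vector scores over the feature axis recovers the standard scalar scores, and the vector-score tensor is exactly the size of the (bs, n, n, de) edge features the model already carries.

```python
import numpy as np

# Illustrative shapes only; names are hypothetical, not from the DiGress code.
bs, n, n_head, df = 2, 5, 4, 8
de = n_head * df

rng = np.random.default_rng(0)
Q = rng.normal(size=(bs, n, n_head, df))
K = rng.normal(size=(bs, n, n_head, df))

# Standard attention: one scalar score per (i, j) pair and head.
scalar_scores = np.einsum('bihf,bjhf->bijh', Q, K)  # (bs, n, n, n_head)

# DiGress-style: element-wise product keeps a vector score per pair.
vector_scores = Q[:, :, None] * K[:, None, :]       # (bs, n, n, n_head, df)

# Summing the vector scores over df recovers the scalar scores.
assert np.allclose(vector_scores.sum(-1), scalar_scores)

# The vector scores are no larger than the (bs, n, n, de) edge features
# that the model manipulates anyway, hence no new memory bottleneck.
assert vector_scores.size == bs * n * n * de
```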
On Tue, Jan 30, 2024 at 2:59 PM, Adrien B ***@***.***> wrote:
I was looking at your implementation of attention here:
https://github.com/cvignac/DiGress/blob/main/src/models/transformer_model.py#L158
I have some questions about the code:
Q = Q.unsqueeze(2)  # (bs, 1, n, n_head, df)
K = K.unsqueeze(1)  # (bs, n, 1, n_head, df)
# Compute unnormalized attentions. Y is (bs, n, n, n_head, df)
Y = Q * K
Here I have a question, because in the classic attention mechanism Y has dimension (bs, n, n, n_head), with no per-feature score. I don't know if this is what the authors wanted (this is not a proper outer product; it is an element-wise multiplication).
Also, a few lines later we have:
attn = masked_softmax(Y, softmax_mask, dim=2)  # bs, n, n, n_head
print("attn.shape : ", attn.shape)  # I added this
As the attention shape I obtain (bs, n, n, n_head, df), contrary to the comment.
The code is not really implementing the "real" graph transformer attention found in other implementations, such as:
https://docs.dgl.ai/_modules/dgl/nn/pytorch/gt/egt.html#EGTLayer
But since your code gives me better results than the one above (which uses a proper attention mechanism), I wonder if this is something the authors did intentionally.
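The shape behavior described above can be reproduced with a minimal NumPy sketch (a plain stand-in softmax is used here; the masked_softmax in the DiGress code additionally applies a mask): a softmax over dim=2 normalizes each feature channel of Y independently, so the attention tensor keeps the trailing df axis.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

bs, n, n_head, df = 2, 5, 4, 8
rng = np.random.default_rng(1)
Y = rng.normal(size=(bs, n, n, n_head, df))  # element-wise Q * K

# Softmax over the key axis (dim=2) normalizes each feature channel
# independently, so the trailing df axis survives.
attn = softmax(Y, axis=2)

assert attn.shape == (bs, n, n, n_head, df)  # not (bs, n, n, n_head)
assert np.allclose(attn.sum(axis=2), 1.0)    # each channel sums to 1
```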
I am doing some experiments on my own graph dataset. Your implementation seems to be more performant than the standard graph transformer (at least the one I tried from the DGL library); yours clearly manages to generate more plausible edges. Your implementation is equivalent to a classic graph transformer but with as many heads as the original dimension, so you end up with heads of only one dimension (I mean that with df = 1 you would obtain the same results).
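The claimed equivalence can be checked with a small NumPy sketch (names and shapes are illustrative, not from either codebase): with df = 1 per head, the element-wise product and the dot-product score coincide up to a trailing singleton axis.

```python
import numpy as np

bs, n, n_head = 2, 5, 6
df = 1  # one feature per head

rng = np.random.default_rng(2)
Q = rng.normal(size=(bs, n, n_head, df))
K = rng.normal(size=(bs, n, n_head, df))

# Vector scores: element-wise product, (bs, n, n, n_head, df).
vector = Q[:, :, None] * K[:, None, :]

# Scalar scores: dot product over df, (bs, n, n, n_head).
scalar = np.einsum('bihf,bjhf->bijh', Q, K)

# With df = 1 the dot product has a single term, so the two coincide
# up to the trailing singleton axis.
assert np.allclose(vector.squeeze(-1), scalar)
```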