
Question about the architecture (graphTransformer) #87

Open
Forbu opened this issue Jan 30, 2024 · 2 comments

@Forbu
Forbu commented Jan 30, 2024

I was looking at your implementation of attention here:
https://github.com/cvignac/DiGress/blob/main/src/models/transformer_model.py#L158

I have some questions about the code:

Q = Q.unsqueeze(2)  # (bs, 1, n, n_head, df)
K = K.unsqueeze(1)  # (bs, n, 1, n_head, df)

# Compute unnormalized attentions. Y is (bs, n, n, n_head, df)
Y = Q * K

Here I have a question: in the classic attention mechanism, Y has dimension (bs, n, n, n_head), not a per-feature one. I don't know whether this is what the authors wanted (this is an element-wise multiplication, not the usual dot product over the feature dimension).
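For comparison, here is a minimal shape-only sketch of the two variants (random tensors, not the actual DiGress code; the dimension sizes are made up):

import torch

bs, n, n_head, df = 2, 5, 4, 8
Q = torch.randn(bs, n, n_head, df)
K = torch.randn(bs, n, n_head, df)

# Per-feature scores as in the snippet above: element-wise product, the df axis is kept
Y_featurewise = Q.unsqueeze(2) * K.unsqueeze(1)                  # (bs, n, n, n_head, df)

# Standard scaled dot-product scores: sum over df, one scalar per head and node pair
Y_standard = torch.einsum('bihd,bjhd->bijh', Q, K) / df ** 0.5   # (bs, n, n, n_head)

print(Y_featurewise.shape)  # torch.Size([2, 5, 5, 4, 8])
print(Y_standard.shape)     # torch.Size([2, 5, 5, 4])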

Also, a few lines later we have:

attn = masked_softmax(Y, softmax_mask, dim=2)  # bs, n, n, n_head
print("attn.shape:", attn.shape)  # I added this line

The attention shape I obtain is (bs, n, n, n_head, df), contrary to the comment.
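A quick way to check this (a plain torch.softmax stands in for DiGress's masked_softmax here, since masking does not change the shape):

import torch

bs, n, n_head, df = 2, 5, 4, 8
Q = torch.randn(bs, n, n_head, df)
K = torch.randn(bs, n, n_head, df)
Y = Q.unsqueeze(2) * K.unsqueeze(1)   # (bs, n, n, n_head, df)

# Softmax over the key axis (dim=2) is applied independently for every feature,
# so the df axis survives and the result is not (bs, n, n, n_head).
attn = torch.softmax(Y, dim=2)
print(attn.shape)  # torch.Size([2, 5, 5, 4, 8])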
So the code does not really implement "standard" graph transformer attention, unlike other implementations such as:
https://docs.dgl.ai/_modules/dgl/nn/pytorch/gt/egt.html#EGTLayer

But since your code gives me better results than the one above (which uses a proper attention mechanism), I wonder whether the authors did this intentionally.

@cvignac
Owner

cvignac commented Jan 30, 2024 via email

@Forbu
Author

Forbu commented Jan 30, 2024

I am running some experiments on my own graph dataset. Your implementation seems to perform better than the standard graph transformer (at least the one I tried from the DGL library); yours clearly generates more plausible edges.
I am running more experiments to confirm this (for now I only have "visual" clues and noisy loss curves to back this claim).

Your implementation is equivalent to a classic graph transformer that uses as many heads as there are feature dimensions, so you end up with heads of dimension one (I mean that with df = 1 you would obtain the same results). A quick check of this equivalence is sketched below.
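Here is a minimal sketch of that equivalence (tensor names and sizes are made up for illustration): the per-feature scores are exactly the dot-product scores of n_head * df heads of dimension one, up to a reshape.

import torch

bs, n, n_head, df = 2, 5, 4, 8
Q = torch.randn(bs, n, n_head, df)
K = torch.randn(bs, n, n_head, df)

# Per-feature scores as in the DiGress snippet above
Y = Q.unsqueeze(2) * K.unsqueeze(1)               # (bs, n, n, n_head, df)

# The same numbers, viewed as dot-product scores of n_head * df heads of dimension 1
Q1 = Q.reshape(bs, n, n_head * df, 1)
K1 = K.reshape(bs, n, n_head * df, 1)
scores = torch.einsum('bihd,bjhd->bijh', Q1, K1)  # (bs, n, n, n_head * df)

print(torch.allclose(Y.reshape(bs, n, n, n_head * df), scores))  # True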
