
[feat] Optional Attention Mask to Prevent Cross-Document Attention in Sequences #62

Open
tscholak opened this issue Nov 22, 2024 · 0 comments
Labels
enhancement New feature or request


🧐 Problem Description

Fast-LLM currently allows self-attention across all tokens in a packed sequence, including tokens from different documents separated by EOS tokens. However, document-level isolation in attention has been found to be beneficial. While a model can in principle learn to ignore tokens across document boundaries, this behaviour is neither guaranteed nor efficient.

💡 Proposed Solution

Implement an optional attention mask that prevents self-attention between different documents within a sequence.

For example, the Llama 3 paper found that such masking had limited impact during standard pre-training but proved crucial in continued pre-training on very long sequences.
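Below is a minimal sketch of how such a mask could be constructed, assuming documents are packed into one training sequence and delimited by an EOS token. The function and argument names (`document_attention_mask`, `eos_token_id`) are illustrative, not existing Fast-LLM APIs:

```python
import torch

def document_attention_mask(tokens: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    """Boolean mask of shape (batch, 1, seq, seq) combining causal masking with
    document isolation: a token may only attend to earlier tokens of the same
    document, where documents are separated by EOS tokens."""
    batch, seq_len = tokens.shape
    is_eos = (tokens == eos_token_id).long()
    # Document id increases by one after each EOS; subtract the EOS flag so the
    # EOS token itself still belongs to the document it terminates.
    doc_ids = torch.cumsum(is_eos, dim=1) - is_eos
    same_doc = doc_ids.unsqueeze(2) == doc_ids.unsqueeze(1)  # (batch, seq, seq)
    causal = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=tokens.device)
    )
    return (same_doc & causal).unsqueeze(1)  # broadcasts over attention heads
```

The resulting boolean mask can be passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions that may be attended to. Materializing a dense (seq, seq) mask like this is mainly useful as a reference implementation.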

🔄 Alternatives Considered

Rely on the model to learn when (and when not) to attend across document boundaries, which is the current behaviour.

📈 Potential Benefits

  • Improved Performance: Preventing attention across unrelated documents could make training more efficient and improve inference on tasks where documents should be treated independently.
  • Enhanced Flexibility: By making this masking optional, users can tailor attention behavior to specific tasks or datasets.
  • Support for Long-Sequence Training: As noted in the Llama 3 paper, this feature could become essential in scenarios involving very long sequences or continued pre-training (see the sketch after this list for one way to avoid a dense mask in that regime).
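For the long-sequence scenarios above, materializing a dense (seq, seq) mask would be prohibitive. If the attention backend is flash-attn, one possible implementation is its variable-length interface, which treats each document in a packed sequence as an independent sequence via cumulative sequence lengths, so no explicit mask is needed. This is only a sketch under that assumption; the helper name is hypothetical:

```python
import torch
from flash_attn import flash_attn_varlen_func  # flash-attn >= 2.x

def document_cu_seqlens(tokens: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    """Cumulative document lengths for a single packed 1-D token sequence, in
    the int32 format expected by variable-length flash attention kernels."""
    # A document ends one position past each EOS token; the last document may
    # be unterminated, so always close the sequence at its final position.
    ends = (tokens == eos_token_id).nonzero(as_tuple=True)[0] + 1
    if ends.numel() == 0 or ends[-1].item() != tokens.numel():
        ends = torch.cat(
            [ends, torch.tensor([tokens.numel()], device=tokens.device)]
        )
    starts = torch.zeros(1, dtype=ends.dtype, device=tokens.device)
    return torch.cat([starts, ends]).to(torch.int32)

# q, k, v: (total_tokens, n_heads, head_dim), flattened over the batch.
# eos_token_id=2 is illustrative; use the tokenizer's actual EOS id.
# cu_seqlens = document_cu_seqlens(tokens, eos_token_id=2)
# max_len = int((cu_seqlens[1:] - cu_seqlens[:-1]).max())
# out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens,
#                              max_len, max_len, causal=True)
```

With this approach, cross-document attention is prevented by construction, and the kernel skips the masked blocks entirely rather than computing and discarding them.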

📝 Additional Context
