Training Fast Weight Programmers by backpropagating through the Delta Rule #9

Open
wants to merge 71 commits into
base: main
from

Conversation

proger
Owner

@proger proger commented Aug 20, 2024

Together with @ischlag, this PR introduces a kernel for training a fast weight programmer by backpropagating through the delta rule (online linear regression).
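For readers new to the delta rule, here is a minimal pure-Python sketch of the sequential recurrence read as online linear regression: at each step the fast weights `W` take one gradient step on the squared error between the stored prediction `W k` and the target value `v`. The function name `delta_rule_forward`, the per-step learning rate `betas`, and the choice to read out with the query after the write are illustrative assumptions, not the kernel's actual interface.

```python
def delta_rule_forward(queries, keys, values, betas):
    """Sequential delta rule (hypothetical reference, not the PR's kernel).

    Update: W <- W + beta * (v - W k) k^T, then read out y = W q.
    """
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]  # fast weights start at zero (assumption)
    outputs = []
    for q, k, v, beta in zip(queries, keys, values, betas):
        # What the fast weights currently return for key k
        pred = [sum(W[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
        # One gradient step on 0.5 * ||v - W k||^2 with step size beta
        for i in range(d_v):
            err = v[i] - pred[i]
            for j in range(d_k):
                W[i][j] += beta * err * k[j]
        # Read out with the query after the write
        outputs.append([sum(W[i][j] * q[j] for j in range(d_k)) for i in range(d_v)])
    return outputs, W


# Two orthonormal keys, each storing its own value
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
outputs, W = delta_rule_forward(keys, keys, values, betas=[1.0, 1.0])
```

With orthonormal keys and `beta = 1`, each query retrieves exactly the value written under the matching key. Training the model means differentiating through this loop, which is the part the kernel is meant to make fast.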

Improving on first-order recurrences with a scalar hidden state, this kernel uses vector-valued updates like the transformer. This allows the use of matrix multiplication hardware and, thanks to the delta rule, avoids saturating the capacity of the fast weight network.
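To make the capacity point concrete, the sketch below writes two different values under the same key, once with a purely additive outer-product update and once with the delta rule. The helper names `write` and `read` are hypothetical, and the key is assumed to have unit norm with a write strength of 1.

```python
def write(W, k, v, beta=1.0, use_delta=True):
    """Return new fast weights after writing value v under key k."""
    d_v, d_k = len(W), len(W[0])
    # Value currently stored under k
    old = [sum(W[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # The delta rule subtracts what is already stored; the additive rule does not
    target = [v[i] - old[i] if use_delta else v[i] for i in range(d_v)]
    return [[W[i][j] + beta * target[i] * k[j] for j in range(d_k)]
            for i in range(d_v)]


def read(W, q):
    """Retrieve the value associated with query q."""
    return [sum(w * x for w, x in zip(row, q)) for row in W]


key = [1.0, 0.0]
additive = [[0.0, 0.0], [0.0, 0.0]]
delta = [[0.0, 0.0], [0.0, 0.0]]
for value in ([1.0, 2.0], [5.0, 7.0]):
    additive = write(additive, key, value, use_delta=False)
    delta = write(delta, key, value, use_delta=True)

additive_result = read(additive, key)  # sum of both writes
delta_result = read(delta, key)        # only the latest write
```

The additive memory returns the sum of everything ever written under the key, so repeated writes pile up, while the delta rule replaces the old association with the new one.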

The implementation uses the chunking given by @sustcsonglin's equation 6 and currently performs best at a head dimension of 32, which fits perfectly into the registers of a warp on a 3090. The tensor core code uses ThunderKittens, which enables effortless WMMA.
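A rough back-of-the-envelope check of the register claim, under assumptions that are not confirmed by the PR: the per-head state is a `head_dim x head_dim` matrix, each element occupies one 32-bit register, and the elements are spread evenly across the 32 threads of a warp. The function `state_regs_per_thread` is a hypothetical helper for this arithmetic only.

```python
WARP_SIZE = 32             # threads per warp on NVIDIA GPUs
MAX_REGS_PER_THREAD = 255  # per-thread register limit on sm_86


def state_regs_per_thread(head_dim):
    """Registers per thread needed to hold a head_dim x head_dim state.

    Assumes one 32-bit register per element, split evenly across the warp.
    """
    return (head_dim * head_dim) // WARP_SIZE


budget = {d: state_regs_per_thread(d) for d in (32, 64, 128)}
fits = {d: regs <= MAX_REGS_PER_THREAD for d, regs in budget.items()}
```

Under these assumptions a head dimension of 32 needs 32 registers per thread for the state, leaving room for operands and temporaries, while 128 would exceed the per-thread limit on its own. The real kernel's register usage depends on its layout and on what else it keeps live.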

proger added 30 commits June 16, 2024 17:12
ptxas info    : 19 bytes gmem
ptxas info    : Compiling entry function '_Z20causal_attend_kernelIN3c108BFloat16ELi4ELi2ELi4EEviPKT_S4_S4_PKfPS2_' for 'sm_86'
ptxas info    : Function properties for _Z20causal_attend_kernelIN3c108BFloat16ELi4ELi2ELi4EEviPKT_S4_S4_PKfPS2_
    72 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads
ptxas info    : Used 255 registers, 400 bytes cmem[0]
@proger proger changed the title from "Delta Rule Fast Weight Programmers" to "Training Fast Weight Programmers through Delta RUle" Aug 20, 2024
@proger proger changed the title from "Training Fast Weight Programmers through Delta RUle" to "Training Fast Weight Programmers through Delta Rule" Aug 20, 2024
@proger proger changed the title from "Training Fast Weight Programmers through Delta Rule" to "Training Fast Weight Programmers by backpropagating through the Delta Rule" Aug 20, 2024
1 participant