Port of μP to JAX (Haiku specifically) #10528
davisyoshida
asked this question in
Show and tell
μP is a method for reparameterizing NNs so that hyperparameter optima stay fixed as you scale them up. I wrote a port to JAX here.
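To make the reparameterization concrete, here is a minimal, self-contained sketch of the μP scaling rules for a hidden weight matrix under Adam (this is illustrative only, not the API of the port): initialization variance scales like 1/fan_in, and the per-layer learning rate also scales like 1/fan_in, which is why a learning rate tuned at a small base width transfers to larger widths.

```python
import math

def mup_hidden_scales(base_width, width, base_lr=1e-3):
    """Return (init_std, lr) for a hidden weight matrix at `width`,
    given hyperparameters tuned at `base_width`.

    Illustrative μP rules for Adam:
      - init std ~ 1/sqrt(fan_in)  (variance ~ 1/fan_in)
      - learning rate ~ 1/fan_in
    """
    ratio = width / base_width
    init_std = 1.0 / math.sqrt(width)  # variance scales like 1/fan_in
    lr = base_lr / ratio               # Adam LR scales like 1/fan_in
    return init_std, lr

# Doubling the width halves the hidden-layer Adam learning rate,
# so the *tuned* base_lr stays optimal across widths.
std, lr = mup_hidden_scales(base_width=128, width=256)
```

The names `mup_hidden_scales` and `base_lr` are hypothetical; the actual port organizes these multipliers differently.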
As an example, I trained a series of transformers on PTB, with and without μP. With μP, the optimal learning rate stays fixed as the models are scaled up.
(In the paper, the authors do this experiment but for LMs with 6.7B parameters).
This is specifically for Haiku, but if people are interested in making versions for Flax etc., it might be helpful to take a look at this. I had to change the design substantially from the authors' original repo, since a lot of things that are easy in Torch (via mutability) aren't in JAX, and vice versa. I'm also interested in any suggestions on improving the design or increasing usability.
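One way to picture the functional design constraint: where the Torch repo can attach multipliers by mutating modules and optimizer state in place, a JAX version has to thread per-parameter multipliers through explicitly, e.g. as a pytree with the same structure as the parameters. A hypothetical sketch (plain nested dicts standing in for pytrees; `scale_updates` and `lr_mults` are illustrative names, not from the port):

```python
def scale_updates(updates, lr_mults):
    """Multiply each update leaf by its μP learning-rate multiplier.

    Both arguments are pytrees (here: plain nested dicts) with the
    same structure, mirroring how jax.tree_util maps over params.
    """
    return {
        k: scale_updates(v, lr_mults[k]) if isinstance(v, dict) else v * lr_mults[k]
        for k, v in updates.items()
    }

# Per-parameter multipliers: hidden weights get LR scaled down with
# width, biases are left alone (a common μP convention).
updates = {"dense": {"w": 0.5, "b": 0.1}}
lr_mults = {"dense": {"w": 0.5, "b": 1.0}}
scaled = scale_updates(updates, lr_mults)
```

In real JAX code the recursion would be a single `jax.tree_util.tree_map` over the two pytrees; the point is that the multipliers travel as data alongside the params rather than living inside mutable module state.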