add configurable unique layer init, clean up lr and loss display #64
Merged
Conversation
facebook-github-bot added the CLA Signed label on Feb 16, 2024
lessw2020 changed the title from "add configurable unique layer init, clean up lr display" to "add configurable unique layer init, clean up lr and loss display" on Feb 16, 2024
wanchaol approved these changes on Feb 21, 2024
lgtm, one nit
tianyu-l reviewed on Feb 21, 2024
lessw2020 added a commit that referenced this pull request on Apr 18, 2024
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request on Aug 17, 2024
Small PR:
1 - Add a configurable init style in model_args: with 'use_unique_init', the layer_id is used in the init stddev denominator; otherwise the original init style, based on the total layer count, is used (a sketch follows this list). Verified that both work on 7B Llama; it is not yet clear whether one is better than the other.
2 - Clean up the lr display formatting: the lr was printed with 12+ digits of precision, which isn't very informative, and was wrapped in list brackets. This PR rounds the display to a maximum of 8 digits of precision and removes the []'s around the lr value.
(Note this is purely UI; the full float precision is still used in the actual lr calculations.)
3 - Clean up the loss display: rounds the displayed loss to 4 digits of precision to make it more readable and informative (see the formatting sketch after the screenshots).
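A minimal sketch of the two init styles described in point 1. The function name, the 0.02 base std, and the sqrt(2 * denom) depth scaling are illustrative assumptions, not the PR's exact code:

```python
import math
import torch.nn as nn

def init_transformer_layer(linear: nn.Linear, layer_id: int,
                           n_layers: int, use_unique_init: bool) -> None:
    # With use_unique_init, the stddev denominator comes from this
    # layer's own id (+1 so layer 0 doesn't divide by zero); otherwise
    # from the total layer count, which is the original init style.
    denom = layer_id + 1 if use_unique_init else n_layers
    std = 0.02 / math.sqrt(2 * denom)  # base std and scaling are assumptions
    nn.init.normal_(linear.weight, mean=0.0, std=std)
```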
Previously:
<img width="1198" alt="Screenshot 2024-02-16 at 2 33 34 PM" src="https://github.com/pytorch-labs/torchtrain/assets/46302957/77733af0-42db-4fab-a047-fccc7d404278">
Now:
<img width="1063" alt="Screenshot 2024-02-16 at 2 51 53 PM" src="https://github.com/pytorch-labs/torchtrain/assets/46302957/4eb75b98-67f4-41ec-83d8-dd84a0e8b29e">
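A rough sketch of the display cleanup from points 2 and 3, with hypothetical values. In PyTorch, scheduler.get_last_lr() returns a single-element list, which is what produced the []'s in the old output:

```python
def format_metrics(last_lr: list, loss: float) -> str:
    # last_lr is the list returned by scheduler.get_last_lr(), e.g.
    # [0.0002999999999999999]; unwrapping the single element drops the
    # []'s, and rounding caps the display at 8 digits for lr and 4 for
    # loss. Display-only: full-precision floats still drive the math.
    return f"lr: {round(last_lr[0], 8)}  loss: {round(loss, 4)}"

print(format_metrics([0.0002999999999999999], 5.432109876))
# lr: 0.0003  loss: 5.4321
```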