
add configurable unique layer init, clean up lr and loss display #64

Merged: 5 commits into pytorch:main from configurable_init on Feb 22, 2024

Conversation

@lessw2020 (Contributor) commented Feb 16, 2024

Small PR:

1 - add a configurable init style in model_args: when 'use_unique_init' is set, the init stddev denominator uses the layer_id; otherwise it falls back to the original init style based on the total layer count. (Verified both work on 7B Llama; not yet clear whether one is better than the other.) A rough sketch of both styles appears below, after the screenshots.

2 - clean up the lr display formatting: the lr was printing out to 12+ digits, which isn't that informative, and was wrapped in list brackets. This PR rounds it to a max of 8 digits of precision and removes the []'s that were around the lr display. (Note this is purely UI; the full float precision is still used in the actual lr calcs.)

3 - clean up the loss display: rounds the displayed loss to 4 digits of precision to make it more readable and informative.
Previously: [screenshot, 2024-02-16 at 2:33 PM]

Now: [screenshot, 2024-02-16 at 2:51 PM]
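A minimal sketch of the two init styles and the display rounding described above. The 'use_unique_init' flag and layer_id come from this PR's description; the helper name `weight_init_std`, the base std of 0.02, and the example values are illustrative assumptions, not the PR's actual code.

```python
import math

def weight_init_std(layer_id: int, n_layers: int, use_unique_init: bool) -> float:
    """Hypothetical helper contrasting the two init styles from item 1."""
    base_std = 0.02  # assumed base scale, not taken from the PR
    if use_unique_init:
        # Unique style: this layer's own depth (1-indexed here) in the stddev denominator.
        return base_std / math.sqrt(2 * (layer_id + 1))
    # Original style: the same total-layer-count denominator for every layer.
    return base_std / math.sqrt(2 * n_layers)

# Display-only rounding from items 2 and 3: only the printed string is rounded,
# while full float precision is kept for the actual lr math.
lr, loss = 1.2345678901e-04, 7.4321098  # example values
print(f"lr: {round(lr, 8)} | loss: {round(loss, 4)}")  # no [] around the lr
```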

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Feb 16, 2024
@lessw2020 changed the title from "add configurable unique layer init, clean up lr display" to "add configurable unique layer init, clean up lr and loss display" on Feb 16, 2024
@wanchaol (Contributor) left a comment

lgtm, one nit

Review comment on torchtrain/models/llama/model.py (outdated, resolved)
@lessw2020 merged commit 78878f5 into pytorch:main on Feb 22, 2024 (3 checks passed)
@lessw2020 deleted the configurable_init branch on February 22, 2024 at 21:58
lessw2020 added a commit that referenced this pull request on Apr 18, 2024
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request on Aug 17, 2024