[bug] Incorrect exported config #102

RaymondLi0 · 2025-01-02T20:07:19Z

🐞 Describe the Bug

Exporting a model where num_heads * kv_channels != hidden_size and then loading it with HF transformers raises an error like:
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([3072, 4096]) from checkpoint, the shape in current model is torch.Size([4080, 4096]).
    size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([1360, 4096]).
The exported configuration is missing the head_dim field, so transformers falls back to its default of hidden_size // num_heads.
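
To make the mismatch concrete, here is a small sketch of the shape arithmetic. The values num_heads = 24, num_kv_heads = 8 and kv_channels = 128 are not stated in the issue; they are inferred from the shapes in the error message and are the only assumption here. `projection_shapes` is a hypothetical helper, not part of either library.

```python
def projection_shapes(hidden_size, num_heads, num_kv_heads, head_dim=None):
    """Return (q_proj, k_proj) weight shapes for a Llama-style attention layer."""
    if head_dim is None:
        # Fallback described in the issue when head_dim is absent from the config.
        head_dim = hidden_size // num_heads
    q_shape = (num_heads * head_dim, hidden_size)
    k_shape = (num_kv_heads * head_dim, hidden_size)
    return q_shape, k_shape


# Assumed values, inferred from the error message above.
hidden_size, num_heads, num_kv_heads, kv_channels = 4096, 24, 8, 128

# Shapes stored in the checkpoint (head_dim = kv_channels).
checkpoint_shapes = projection_shapes(hidden_size, num_heads, num_kv_heads, head_dim=kv_channels)
# Shapes the loader builds when head_dim is missing.
default_shapes = projection_shapes(hidden_size, num_heads, num_kv_heads)

print(checkpoint_shapes)  # ((3072, 4096), (1024, 4096))
print(default_shapes)     # ((4080, 4096), (1360, 4096))
```

With these values the two results match the checkpoint and model shapes reported in the error.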

cc @nitsanluke

We should map kv_channels to head_dim when exporting configs.
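
A minimal sketch of what that mapping could look like. The function `export_attention_config` and the dictionary layout are hypothetical and only illustrate the idea; the actual export code is not shown in this issue. The example values are the ones inferred from the error message, not stated by the reporter.

```python
def export_attention_config(internal):
    """Hypothetical exporter: map internal attention fields to HF-style keys."""
    return {
        "hidden_size": internal["hidden_size"],
        "num_attention_heads": internal["num_heads"],
        # Always write head_dim explicitly so the loader does not
        # derive it from hidden_size and num_attention_heads.
        "head_dim": internal["kv_channels"],
    }


# Assumed example values (inferred, not from the issue text).
exported = export_attention_config(
    {"hidden_size": 4096, "num_heads": 24, "kv_channels": 128}
)
print(exported["head_dim"])  # 128
```

Writing head_dim unconditionally, rather than only when it differs from the default, keeps the exported config self-describing.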

🔄 Steps to Reproduce

Export a model with num_heads * kv_channels != hidden_size, then reload it with HF transformers.

🎯 Expected Behavior

The exported model reloads without error.


jlamypoirier · 2025-01-15T17:08:11Z

Was this addressed?

RaymondLi0 added the bug label Jan 2, 2025

RaymondLi0 changed the title ~~[bug] Brief description of the issue~~ [bug] Incorrect exported config Jan 2, 2025
