🐞 Describe the Bug
Exporting a model where `num_heads * kv_channels != hidden_size` and then loading it with HF `transformers` raises an error like:

    RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([3072, 4096]) from checkpoint, the shape in current model is torch.Size([4080, 4096]).
        size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([1360, 4096]).

The exported configuration is missing the `head_dim` field, so `transformers` defaults it to `hidden_size / num_heads`.

cc @nitsanluke
We should map `kv_channels` to the `head_dim` field when exporting configs (see the sketch after the expected-behavior section below).

🔄 Steps to Reproduce
Export a model with `num_heads * kv_channels != hidden_size`, then reload it with HF `transformers`.
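For example (a minimal repro sketch; the checkpoint path is a placeholder, and how the model is exported depends on your setup):

```python
# Hypothetical repro: assumes a Llama-style model with
# num_heads * kv_channels != hidden_size was already exported to "exported_model/".
from transformers import AutoModelForCausalLM

# Fails with "size mismatch for model.layers.0.self_attn.q_proj.weight ..."
# because head_dim is absent from the exported config, so transformers
# falls back to hidden_size // num_heads.
model = AutoModelForCausalLM.from_pretrained("exported_model")
```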
🎯 Expected Behavior
The exported model should reload without errors.
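For reference, a minimal sketch of the proposed handling. The attribute names on the exporting side (`source_config.*`) are hypothetical; the point is only that `head_dim` should be written explicitly into the exported Hugging Face config:

```python
# Sketch of the proposed fix (source-side attribute names are hypothetical):
# write head_dim = kv_channels into the exported Hugging Face config instead of
# letting transformers derive it from hidden_size // num_attention_heads.
from transformers import LlamaConfig

def export_hf_config(source_config) -> LlamaConfig:
    return LlamaConfig(
        hidden_size=source_config.hidden_size,
        num_attention_heads=source_config.num_heads,
        num_key_value_heads=source_config.num_kv_heads,
        # Without this, transformers defaults head_dim to
        # hidden_size // num_attention_heads, which breaks models where
        # num_heads * kv_channels != hidden_size.
        head_dim=source_config.kv_channels,
        # ... remaining fields unchanged ...
    )
```

Writing `"head_dim": kv_channels` directly into the exported `config.json` should have the same effect.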