```python
def data_collator(features: list) -> dict:
    len_ids = [len(feature["input_ids"]) for feature in features]
    longest = max(len_ids)
    input_ids = []
    labels_list = []
    # Sort by length (longest first) and pad every sequence to the longest one.
    for ids_l, feature in sorted(zip(len_ids, features), key=lambda x: -x[0]):
        ids = feature["input_ids"]
        seq_len = feature["seq_len"]
        # Mask the prompt and the padding with -100 so the loss ignores them.
        labels = (
            [-100] * (seq_len - 1)
            + ids[(seq_len - 1):]
            + [-100] * (longest - ids_l)
        )
        ids = ids + [tokenizer.pad_token_id] * (longest - ids_l)
        _ids = torch.LongTensor(ids)
        labels_list.append(torch.LongTensor(labels))
        input_ids.append(_ids)
    input_ids = torch.stack(input_ids)
    labels = torch.stack(labels_list)
    return {
        "input_ids": input_ids,
        "labels": labels,
    }
```
The dict returned by the code above contains no attention mask and no position ids. I have looked at a lot of finetuning code on GitHub and noticed that some people add an attention mask and some don't. I also ran the version without an attention mask; it seemed to work fine and the results were good. I'm quite confused about this: without an attention mask, wouldn't the model be able to see the labels directly during finetuning?
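For reference, the variant that some finetuning scripts use only adds a padding indicator to the returned dict. A minimal, hypothetical sketch of that extra tensor (built from the same lengths as in the collator above; the function name is illustrative and not taken from any particular repo):

```python
import torch

def padding_attention_mask(len_ids: list, longest: int) -> torch.Tensor:
    # 1 marks a real token, 0 marks padding; this padding map is the only
    # thing the "with attention_mask" collator variants add to the output.
    return torch.stack([
        torch.LongTensor([1] * ids_l + [0] * (longest - ids_l))
        for ids_l in len_ids
    ])

# Example: two sequences of length 3 and 5, padded to length 5.
print(padding_attention_mask([3, 5], 5))
```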
The source code of models like ChatGLM and baichuan (e.g. modeling_chatglm.py) already constructs the causal-LM attention mask itself, so you don't need to build it by hand.
For example: https://huggingface.co/THUDM/chatglm2-6b/blob/main/modeling_chatglm.py#L674
Other GPT-style models likewise don't need an explicitly supplied attention mask; each model generates the mask internally in its own way.
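For intuition, here is a minimal sketch of the kind of causal mask a decoder-only model builds internally (illustrative only, not the actual modeling_chatglm.py code; the function name and return convention are assumptions):

```python
import torch

def build_causal_mask(seq_len: int) -> torch.Tensor:
    # Lower-triangular matrix: position i may attend to positions 0..i only,
    # so future tokens (the "labels") are never visible during training.
    return torch.tril(torch.ones(seq_len, seq_len)).bool()

# Example: with seq_len = 4, row 1 (the second token) can only see columns 0 and 1.
print(build_causal_mask(4))
```

Because this mask is generated inside the model's forward pass, omitting attention_mask in the collator does not let the model peek at later tokens.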
Taking Llama as a reference: the attention mask passed to forward usually only marks which positions are padding and which are not, while the model internally constructs the causal attention mask (the familiar lower-triangular matrix), and it is that step that masks out the later tokens.
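A rough sketch of how the two masks combine, assuming a PyTorch-style attention implementation (tensor names and shapes are illustrative, not the actual Llama source):

```python
import torch

def combine_masks(padding_mask: torch.Tensor) -> torch.Tensor:
    # padding_mask: (batch, seq_len) with 1 for real tokens and 0 for padding,
    # i.e. what the user-supplied attention_mask usually encodes.
    seq_len = padding_mask.shape[1]
    causal = torch.tril(torch.ones(seq_len, seq_len)).bool()  # (seq_len, seq_len)
    padding = padding_mask.bool()[:, None, :]                 # (batch, 1, seq_len)
    # A key position is attendable only if it is not in the future AND not padding.
    return causal[None, :, :] & padding                       # (batch, seq_len, seq_len)

# Example: one sequence of length 5 whose last two positions are padding.
print(combine_masks(torch.tensor([[1, 1, 1, 0, 0]])))
```

Even without the padding mask, the lower-triangular part still prevents attending to future tokens; the only difference is that attention may also land on pad positions, which is presumably why the run without an attention mask still trained fine here.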