Can lmdeploy compute rewards with a DPO-trained model? #3056

ghntd · 2025-01-20T11:35:04Z

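I have a VLM trained with DPO, and I would like to use it as a reward model for some experiments. To do that, I want to pass the model a complete conversation and have it return the logits for all input tokens instead of generating a continuation. If this is possible, it would make my work much easier.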

lvhan028 · 2025-01-20T13:23:18Z

from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl import load_image

# Build an inference pipeline for the vision-language model.
pipe = pipeline('OpenGVLab/InternVL2_5-8B')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')

# output_logits='all' requests logits for the whole sequence;
# max_new_tokens=1 keeps generation to a minimum.
response = pipe(('describe this image', image),
                gen_config=GenerationConfig(output_logits='all', max_new_tokens=1))
logits = response.logits
print(logits)

Does this meet your needs?
It requires lmdeploy v0.7.0.
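
For readers who want to go from these logits to a reward, here is a minimal sketch of the standard DPO implicit reward, beta * (log pi(y|x) - log pi_ref(y|x)). It is not from the thread and not part of lmdeploy: the helper names `log_softmax`, `sequence_logprob`, and `dpo_reward` are hypothetical, the logits are taken as plain nested lists (for example after converting the returned tensor), and it assumes row `t` of the logits is the next-token distribution after token `t`. That alignment should be checked against the actual lmdeploy output before relying on the numbers.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def sequence_logprob(logits, token_ids, response_start):
    """Sum of log p(token_ids[t] | token_ids[:t]) for t >= response_start.

    Assumption: logits[t] scores the token at position t + 1, so the
    log-probability of token_ids[t] is read from logits[t - 1].
    """
    total = 0.0
    # The first token has no preceding logits row, so start at 1 at the earliest.
    for t in range(max(response_start, 1), len(token_ids)):
        total += log_softmax(logits[t - 1])[token_ids[t]]
    return total


def dpo_reward(policy_logp, ref_logp, beta=0.1):
    """DPO implicit reward: beta times the policy/reference log-ratio."""
    return beta * (policy_logp - ref_logp)


# Toy example with a 2-token vocabulary and uniform logits.
toy_logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
toy_ids = [0, 1, 1]          # position 0 is the prompt, positions 1-2 the response
toy_logp = sequence_logprob(toy_logits, toy_ids, response_start=1)
```

In practice the same conversation would be scored twice, once with the DPO-trained model and once with the reference model it was trained from, and the two sums passed to `dpo_reward`. The token ids and the index where the response starts have to come from the model's own tokenizer and chat template; for real vocabulary sizes a vectorized tensor implementation would be preferable to these Python loops.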

ghntd · 2025-01-20T13:53:27Z

Thank you so much, this feature is extremely useful to me.

lvhan028 self-assigned this Jan 20, 2025

lvhan028 added the awaiting response label Jan 20, 2025
