Tasks: Add image-text-to-text pipeline and inference API to task page #1039

Merged: 13 commits, Dec 12, 2024

Conversation

merveenoyan
Contributor

..and remove the long inference

merveenoyan changed the title from "Tasks: Add image-text-to-text pipeline to task page" to "Tasks: Add image-text-to-text pipeline and inference API to task page" on Nov 18, 2024
Member

@pcuenca left a comment

Very cool! 🔥

packages/tasks/src/tasks/image-text-to-text/about.md (two outdated review threads, resolved)
Comment on lines 59 to 64
{
    "role": "assistant",
    "content": [
        {"type": "text", "text": "There's a pink flower"},
    ],
},
Member

It's a bit strange to me that the input ends with an assistant turn. I see in the example later that the model completes the sentence with more details, but I'm not sure this is compatible with all chat VLMs. Can we maybe skip the assistant role from the input and see if the model provides a good description of the image?

Member

This has not been addressed. I think it's unusual for users to supply an assistant turn with the input.

Contributor Author

Sorry, I thought I had answered this. Basically, it's to give more control to further align the output during inference. I used the same example as here, where you can see the output: https://huggingface.co/docs/transformers/en/tasks/image_text_to_text

Member

But that example ends with a user role, while this one ends with an assistant role. I don't think models are expected to be queried with an assistant role in the last turn: they receive a conversation that always ends with a user role, and then they respond with an assistant message.
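
For illustration, the two input shapes being discussed in this thread can be sketched in plain Python, with no model call. This is only a sketch under assumptions: the image URL, the user prompt text, and the helper name `last_role` are hypothetical and not taken from the PR; only the assistant turn mirrors the snippet under review.

```python
# Illustrative sketch only: IMAGE_URL, the user prompt, and last_role are
# assumptions made for this example, not code from the PR.
IMAGE_URL = "https://example.com/flower.jpg"  # hypothetical placeholder

user_turn = {
    "role": "user",
    "content": [
        {"type": "image", "image": IMAGE_URL},
        {"type": "text", "text": "Describe this image."},
    ],
}

# Partial assistant reply, as in the snippet under review.
assistant_prefix = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "There's a pink flower"},
    ],
}

# Shape 1: the conversation ends with a user turn.
ends_with_user = [user_turn]

# Shape 2: the conversation ends with an assistant turn that the model
# would be asked to continue.
ends_with_assistant = [user_turn, assistant_prefix]


def last_role(messages):
    """Return the role of the final turn in a chat-style message list."""
    return messages[-1]["role"]


print(last_role(ends_with_user))
print(last_role(ends_with_assistant))
```

Whether a given chat VLM continues a trailing assistant turn sensibly depends on the model and its chat template, which is the compatibility concern raised above.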

Contributor Author

Sorry, I should have linked the specific section. Here you go, this is the one I meant: https://huggingface.co/docs/transformers/en/tasks/image_text_to_text#pipeline

Member

Still looks weird / confusing to me, but ok if you feel strongly about it.

packages/tasks/src/tasks/image-text-to-text/about.md (three outdated review threads, resolved)
@merveenoyan merveenoyan requested a review from pcuenca December 10, 2024 15:40
@merveenoyan
Contributor Author

ah need to lint

Member

@pcuenca left a comment

Looking good, let's try to get this merged soon 🔥

@merveenoyan
Contributor Author

@pcuenca I changed it since it looked counterintuitive as an example. Merging, thanks for the review.

@merveenoyan merveenoyan merged commit 8c62f4a into main Dec 12, 2024
5 checks passed
@merveenoyan merveenoyan deleted the add-vlm-pipeline branch December 12, 2024 16:50
@huggingface huggingface deleted a comment from code30x58 Dec 17, 2024