Generating the caption of a given image #3

Open
claudiogreco opened this issue May 31, 2022 · 3 comments

@claudiogreco

claudiogreco commented May 31, 2022

Hello,

Thank you for implementing this model. Have you already written code to generate the caption of a given image? If not, do you have an idea of how you would do it with this particular architecture?

Thank you in advance.

@mk-runner

logits = coca(
    text = text,
    images = images
) # (4, 512, 20000)

I have the same question. The code above returns the caption logits, but it takes text as an input: in the inference phase, text_tokens are not available and only image_tokens can be used.

Thank you in advance.
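
One common way around the missing text at inference time is greedy autoregressive decoding: start from a start-of-sequence token, call the model on the tokens generated so far together with the image, take the highest-scoring token at the last position, append it, and repeat. The sketch below is not taken from this repository. It assumes the model is called as in the snippet above, returns logits of shape (batch, seq_len, vocab), and attends causally over the text so that the last position scores the next token. The names `greedy_caption`, `bos_id`, `eos_id`, `max_len`, and the `ToyCaptioner` stand-in are hypothetical and exist only to make the example runnable.

```python
import torch


@torch.no_grad()
def greedy_caption(model, images, bos_id, eos_id, max_len=32):
    """Greedy decoding sketch; argument names are assumptions, not library API."""
    batch = images.shape[0]
    # Every caption starts with the (assumed) start-of-sequence id.
    tokens = torch.full((batch, 1), bos_id, dtype=torch.long, device=images.device)
    finished = torch.zeros(batch, dtype=torch.bool, device=images.device)
    for _ in range(max_len):
        logits = model(text=tokens, images=images)   # (batch, seq_len, vocab)
        next_id = logits[:, -1].argmax(dim=-1)       # best token at the last position
        # Captions that already ended keep emitting the end-of-sequence id.
        next_id = torch.where(finished, torch.full_like(next_id, eos_id), next_id)
        tokens = torch.cat([tokens, next_id[:, None]], dim=1)
        finished |= next_id == eos_id
        if finished.all():
            break
    return tokens


class ToyCaptioner(torch.nn.Module):
    """Stand-in with the same call signature; always predicts previous id + 1."""

    def __init__(self, vocab=10):
        super().__init__()
        self.vocab = vocab

    def forward(self, text, images):
        nxt = (text + 1).clamp(max=self.vocab - 1)
        return torch.nn.functional.one_hot(nxt, self.vocab).float()


images = torch.zeros(2, 3, 8, 8)
caption_ids = greedy_caption(ToyCaptioner(), images, bos_id=0, eos_id=4)
```

With a real model, you would likely put it in eval mode first and replace `ToyCaptioner` with the trained network. Sampling or beam search can be substituted for the `argmax` step if greedy captions turn out too repetitive.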

@SeaN0X

SeaN0X commented Apr 18, 2024

Same problem here: the logits come back as a huge tensor, and I haven't figured out how to convert it to text.
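
For what it is worth, a logits tensor of shape (batch, seq_len, vocab) holds one score per vocabulary entry at each position, so taking the argmax over the last dimension yields token ids, and those ids can be mapped back to strings with the same tokenizer that produced the `text` ids during training. The sketch below illustrates only that conversion. The `logits_to_text` helper, the `id_to_token` list, and the four-word vocabulary are hypothetical; a real setup would call its own tokenizer's decode routine instead.

```python
import torch


def logits_to_text(logits, id_to_token, eos_id=None):
    """Turn (batch, seq_len, vocab) logits into strings; names are assumptions."""
    ids = logits.argmax(dim=-1)  # (batch, seq_len) token ids
    captions = []
    for row in ids.tolist():
        # Cut the sequence at the first end-of-sequence id, if one is given.
        if eos_id is not None and eos_id in row:
            row = row[:row.index(eos_id)]
        captions.append(" ".join(id_to_token[i] for i in row))
    return captions


# Toy vocabulary and hand-made logits, purely for illustration.
vocab = ["<eos>", "a", "dog", "runs"]
logits = torch.tensor([[[0.0, 5.0, 0.0, 0.0],
                        [0.0, 0.0, 5.0, 0.0],
                        [0.0, 0.0, 0.0, 5.0],
                        [5.0, 0.0, 0.0, 0.0]]])
captions = logits_to_text(logits, vocab, eos_id=0)
```

Note that logits computed from a ground-truth `text` input are conditioned on that text, so decoding them this way recovers the model's per-position predictions rather than a caption generated from the image alone.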

@elmekkiMalek

Hello, have you figured out how to do that?
