Questions about the paper experiments #5

Open

RichardSunnyMeng opened this issue Jun 25, 2024 · 3 comments

Comments


RichardSunnyMeng commented Jun 25, 2024

Hi, authors. Thank you for your efforts toward safe AIGC and for your creative work. However, I have a few questions about the experiments in the paper.

  1. Do the experiments for attacking open-source models (Sec. 4.2) and online services (Sec. 4.3) only involve the text modality? If yes, how can you disable the safety checkers of online services such as Midjourney?
  2. Are the adv. prompts used in the above experiments all optimized using SD v1.5?
  3. Are the multimodal attack results (Sec. 4.4) also obtained using SD v1.5? If yes, can the adversarial images be transferred to black-box scenarios like the text modality?
  4. Is there a time-cost comparison? Many attack methods are very time-consuming, so I think this is an important consideration.

Best.

@yangyijune
Collaborator

Hi, Richard! For your concerns:

  1. Yes. We cannot disable the online services' safety checkers; our adv. prompts bypass the prompt checker in a black-box manner (see the sketch below).
  2. Yes.
  3. Yes, and no. The adversarial images can only fool the image-modal safety checker.
  4. No. We only report MMA's time cost in the paper.
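For anyone wanting to reproduce the local (open-source) side of this check, here is a minimal sketch, not taken from the paper, assuming the Hugging Face diffusers library and the public SD v1.5 checkpoint ID: it generates from a candidate prompt with the built-in post-hoc safety checker left enabled and reports whether the output was flagged.

```python
# Minimal sketch (not the authors' code): test whether a candidate prompt
# slips past the SD v1.5 post-hoc safety checker. Assumes the `diffusers`
# library and the public "runwayml/stable-diffusion-v1-5" checkpoint ID.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

adv_prompt = "..."  # placeholder for a candidate adversarial prompt
result = pipe(adv_prompt, num_inference_steps=50)

# The pipeline runs its safety checker by default; `nsfw_content_detected`
# is a per-image list of booleans (flagged images are returned blacked out).
for i, flagged in enumerate(result.nsfw_content_detected):
    print(f"image {i}: {'flagged by safety checker' if flagged else 'passed'}")
```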

@RichardSunnyMeng
Author

Thanks for your response! I still have two follow-up questions, though.

  1. For 1, the results on online services reflect the performance of not only the prompt filters but also the post-hoc checkers, so if those checkers could be disabled, would the attack perform even better? Also, Tab. 2 shows that many inappropriate images generated by adversarial prompts get through. If that is because the checkers' ability is limited, it seems that adversarial images are not needed at all?
  2. For 3, I meant to ask whether you evaluated the multimodal attacks on commercial services such as Midjourney.

@yangyijune
Collaborator

Hi Richard,

  1. Midjourney appears to rely solely on the prompt filter, making adv. prompts sufficient; I suspect this is because a post-hoc safety checker is costly and would have a high false-positive rate (see the sketch after this list).
  2. For multimodal attacks, we did not evaluate them on Midjourney/Leonardo.Ai, as the text-modal attacks are already sufficient there.
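To make the distinction above concrete, here is a purely illustrative sketch of the two moderation stages being discussed; the function names and the keyword blocklist are hypothetical and do not correspond to any real service's API. A prompt filter runs before generation, while a post-hoc checker inspects the generated image, so an adversarial prompt only needs to defeat the first stage when the second is absent.

```python
# Purely illustrative: a two-stage moderation pipeline. All names here are
# hypothetical; real services' filters are likely classifier-based, not
# simple keyword blocklists.
def prompt_filter(prompt: str, blocklist: set[str]) -> bool:
    """Stage 1: reject the request if the raw prompt contains blocked terms."""
    return not any(term in prompt.lower() for term in blocklist)

def post_hoc_checker(image) -> bool:
    """Stage 2 (optional): classify the generated image itself.
    Stubbed out here; a service that skips this stage behaves the same way."""
    return True

def moderated_generate(prompt: str, generate, blocklist: set[str]):
    if not prompt_filter(prompt, blocklist):
        raise ValueError("prompt rejected by the prompt filter")
    image = generate(prompt)
    if not post_hoc_checker(image):
        raise ValueError("image rejected by the post-hoc checker")
    return image

# An adversarial prompt that avoids the blocked surface strings passes
# stage 1; with no stage-2 image check, nothing else blocks the output.
```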
