Better prompting might give better results at 0 shot #1
Comments
Code to generate the images (courtesy gpt-4):

import os
import random
from PIL import Image, ImageDraw, ImageFont


def draw_text_with_outline(draw, text, position, font, text_color, outline_color):
    # Draw outline by offsetting the text one pixel along each diagonal
    x, y = position
    for adj in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        draw.text((x + adj[0], y + adj[1]), text, font=font, fill=outline_color)
    # Draw text
    draw.text(position, text, font=font, fill=text_color)


def label_and_combine_images(input_dir, output_dir):
    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    # Loop through each subdirectory in the input directory
    for subdir in os.listdir(input_dir):
        subdir_path = os.path.join(input_dir, subdir)
        if os.path.isdir(subdir_path):
            image_paths = [
                os.path.join(subdir_path, filename)
                for filename in sorted(os.listdir(subdir_path))
                if filename.lower().endswith(('.png', '.jpg', '.jpeg'))
            ]
            # Randomize the order of the images
            random.shuffle(image_paths)
            images = []
            # Loop through the first four image files in the randomized list
            for i, img_path in enumerate(image_paths[:4]):
                img = Image.open(img_path)
                # Draw label on image
                draw = ImageDraw.Draw(img)
                # Use a TrueType font with a larger size
                font_size = 60
                font = ImageFont.truetype("arial.ttf", font_size) if os.path.isfile("arial.ttf") else ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", font_size)
                label = str(i + 1)
                text_color = "white"
                outline_color = "black"
                # Draw the text with outline on the image
                draw_text_with_outline(draw, label, (10, 10), font, text_color, outline_color)
                images.append(img)
            # Skip subdirectories without images (zip(*...) would fail on an empty list)
            if not images:
                continue
            # Combine images vertically
            widths, heights = zip(*(img.size for img in images))
            total_height = sum(heights)
            max_width = max(widths)
            combined_img = Image.new('RGB', (max_width, total_height))
            y_offset = 0
            for img in images:
                combined_img.paste(img, (0, y_offset))
                y_offset += img.height
            # Save the combined image
            combined_img.save(os.path.join(output_dir, f'{subdir}_combined.jpg'))


# Example usage
input_directory = './data/screenshots'
output_directory = './results/combined_images'
label_and_combine_images(input_directory, output_directory)
With some more tweaks to the prompt and workflow, I think it should be possible to get 12/12 correct.
@tohrnii would you mind sharing your better prompt? Very curious!
It wasn't really much better. I just wanted to test the hypothesis that the problem with the original prompt was that you asked for the answer first and then the reasoning. So I wanted to tweak your prompt as little as possible to test the hypothesis. Here's the prompt:
The prompt I tried is actually very bad, so it seemed reasonable to assume that with better prompting it should be able to get to 100% on this test. I'm much more interested in testing open source vision models, so I did some more testing with better prompts on llava-1.5-13b. It was surprising to see that it performed much better than random, which is not what I was expecting, and its reasoning was quite coherent in a few cases. For this I included some basic design principles in the prompt (did some iterations on that). I think a basic improvement to the prompt is to either include some basic design principles or ask gpt-4 to write some design principles itself and critique the options based on those. It's important to give these models as much time/tokens as possible to get a better answer.
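To make the "principles in the prompt, reasoning before the answer" idea concrete, here is a minimal sketch. It is not the prompt used in this thread: the principles listed, the prompt wording, and the helper names `build_prompt` and `extract_answer` are all hypothetical, and it assumes the task is to pick one of the numbered screenshots in the combined image.

```python
import re

# Hypothetical principles for illustration only; the ones actually
# used in the experiments above are not given in this thread.
DESIGN_PRINCIPLES = [
    "Clear visual hierarchy",
    "Consistent spacing and alignment",
    "Sufficient contrast between text and background",
]


def build_prompt(principles, num_options=4):
    """Build a reasoning-first prompt: critique every option, answer last."""
    bullet_list = "\n".join(f"- {p}" for p in principles)
    return (
        f"The image shows {num_options} designs stacked vertically, "
        f"labeled 1 to {num_options}.\n"
        "Judge them against these design principles:\n"
        f"{bullet_list}\n\n"
        "First, critique each labeled design against the principles.\n"
        "Only after the critiques, finish with one line of the form "
        "'Answer: <label>' naming the best design."
    )


def extract_answer(response):
    """Return the label from the last 'Answer: <n>' line, or None if absent."""
    matches = re.findall(r"Answer:\s*(\d+)", response)
    return int(matches[-1]) if matches else None
```

Putting the answer on a fixed final line keeps the reasoning ahead of the verdict and makes the reply easy to score automatically, whichever model produced it.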
Also, I used the ChatGPT interface instead of the API, so it's possible there are some differences. The model on their interface could be better tuned for instructions, so you can more easily get away with bad prompts.
I ran the experiment with a slight tweak in the prompt to give the reasoning first and then the answer. Got 8/12 correct.
Here are the results: