Better prompting might give better results at 0 shot #1

Open
tohrnii opened this issue Dec 15, 2023 · 5 comments

@tohrnii

tohrnii commented Dec 15, 2023

I ran the experiment with a slight tweak to the prompt so that the model gives its reasoning first and then the answer. It got 8/12 correct.
Here are the results:

  1. Correct
  2. Correct
  3. Correct
  4. Incorrect (given B should be A)
  5. Correct
  6. Correct
  7. Correct
  8. Incorrect (given D should be B)
  9. Incorrect (given C should be A)
  10. Correct
  11. Correct
  12. Incorrect (given D should be B)
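
To put the 8/12 score in context, here is a small sketch that tallies the results listed above and compares them with blind guessing. It assumes each question has four options and that a random guesser picks uniformly among them (p = 0.25); that baseline is an assumption, not something measured in this experiment.

```python
from math import comb

# Per-question outcomes reported above (True = correct), questions 1 through 12.
results = [True, True, True, False, True, True, True, False, False, True, True, False]

n = len(results)
correct = sum(results)
accuracy = correct / n

# Assumption: a random guesser picks one of four options uniformly (p = 0.25).
p_chance = 0.25
# Probability of getting at least `correct` answers right by guessing alone.
p_at_least = sum(
    comb(n, k) * p_chance**k * (1 - p_chance) ** (n - k)
    for k in range(correct, n + 1)
)

print(f"{correct}/{n} correct, accuracy {accuracy:.1%}")
print(f"P(>= {correct} correct by chance) = {p_at_least:.4f}")
```

Under that assumption, scoring 8 or more out of 12 by guessing has a probability of about 0.3%, so the result is well above chance even though the sample is small.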
@tohrnii
Author

tohrnii commented Dec 15, 2023

Code to generate the images (courtesy of GPT-4):

import os
import random
from PIL import Image, ImageDraw, ImageFont

def draw_text_with_outline(draw, text, position, font, text_color, outline_color):
    # Draw outline
    x, y = position
    for adj in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        draw.text((x + adj[0], y + adj[1]), text, font=font, fill=outline_color)
    # Draw text
    draw.text(position, text, font=font, fill=text_color)

def label_and_combine_images(input_dir, output_dir):
    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Loop through each subdirectory in the input directory
    for subdir in os.listdir(input_dir):
        subdir_path = os.path.join(input_dir, subdir)
        if os.path.isdir(subdir_path):
            image_paths = [
                os.path.join(subdir_path, filename)
                for filename in sorted(os.listdir(subdir_path))
                if filename.lower().endswith(('.png', '.jpg', '.jpeg'))
            ]
            # Randomize the order of the images
            random.shuffle(image_paths)
            images = []
            # Load the label font once; fall back to DejaVu Sans Bold if Arial cannot be found
            font_size = 60
            try:
                font = ImageFont.truetype("arial.ttf", font_size)
            except OSError:
                font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", font_size)
            text_color = "white"
            outline_color = "black"
            # Loop through the first four images in the randomized list
            for i, img_path in enumerate(image_paths[:4]):
                # Convert to RGB so every image can be drawn on and saved as JPEG
                img = Image.open(img_path).convert("RGB")

                # Draw the label ("1" to "4") with an outline in the top-left corner
                draw = ImageDraw.Draw(img)
                label = str(i + 1)
                draw_text_with_outline(draw, label, (10, 10), font, text_color, outline_color)

                images.append(img)

            # Skip subdirectories that contain no images
            if not images:
                continue

            # Combine images vertically
            widths, heights = zip(*(img.size for img in images))
            total_height = sum(heights)
            max_width = max(widths)
            combined_img = Image.new('RGB', (max_width, total_height))

            y_offset = 0
            for img in images:
                combined_img.paste(img, (0, y_offset))
                y_offset += img.height

            # Save the combined image
            combined_img.save(os.path.join(output_dir, f'{subdir}_combined.jpg'))

# Example usage
input_directory = './data/screenshots'
output_directory = './results/combined_images'
label_and_combine_images(input_directory, output_directory)

@tohrnii
Author

tohrnii commented Dec 15, 2023

With some more tweaks to the prompt and workflow, I think it should be possible to get 12/12 correct.

@r00dY
Owner

r00dY commented Dec 16, 2023

@tohrnii would you mind sharing your better prompt? Very curious!

@tohrnii
Author

tohrnii commented Dec 16, 2023

It wasn't really much better. I just wanted to test the hypothesis that the problem with the original prompt was that you asked for the answer first and then the reasoning, so I tweaked your prompt as little as possible. Here's the prompt:

Hey, here a few screenshots of a section from a webpage. Each section has the same content but is designed a bit differently. Sections are labelled "1", "2", "3" and "4".
Only one of the sections is correctly and cleanly designed, the rest have some obvious design flaws. Tell me which one is correct. Think step by step and finally based on your reasoning give the answer in the format #1, #2, #3 or #4.
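
Because the prompt asks for the reasoning first and the answer last in the format #1 to #4, the final answer can be pulled out of a response mechanically. The sketch below is one way to do that; `extract_answer` is a hypothetical helper, and it assumes the last `#N` token in the response is the model's final answer (earlier mentions are treated as part of the reasoning).

```python
import re
from typing import Optional


def extract_answer(response: str) -> Optional[int]:
    """Return the last '#N' (N in 1..4) found in the response, or None if absent."""
    # \b keeps '#12' from being read as '#1'.
    matches = re.findall(r"#([1-4])\b", response)
    return int(matches[-1]) if matches else None


# Example with a made-up response, for illustration only.
sample = "Section #2 has uneven padding and #4 clips the heading. Answer: #3"
print(extract_answer(sample))
```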

The prompt I tried is actually very bad, so it seemed reasonable to assume that with better prompting it should be able to get to 100% on this test. I'm much more interested in testing open-source vision models, so I did some more testing with better prompts on llava-1.5-13b. It was surprising to see that it performed much better than random, which is not what I was expecting, and its reasoning was quite coherent in a few cases. For this, I included some basic design principles in the prompt (and did some iterations on that).

I think a basic improvement to the prompt is to either include some basic design principles in it or ask GPT-4 to write some design principles itself and then critique the options against them. It's important to give these models as much time/tokens as possible to get a better answer.
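
As a rough sketch of the first option, the prompt can be assembled from the original text plus a list of design principles. The `build_prompt` helper and the three example principles below are placeholders made up for illustration; they are not the principles I actually used with llava-1.5-13b.

```python
from typing import List

# Original prompt text, split so that principles can be inserted in between.
INTRO = (
    'Hey, here a few screenshots of a section from a webpage. Each section has '
    'the same content but is designed a bit differently. Sections are labelled '
    '"1", "2", "3" and "4".'
)
QUESTION = (
    "Only one of the sections is correctly and cleanly designed, the rest have "
    "some obvious design flaws. Tell me which one is correct. Think step by step "
    "and finally based on your reasoning give the answer in the format "
    "#1, #2, #3 or #4."
)


def build_prompt(principles: List[str]) -> str:
    """Insert a bulleted list of design principles between intro and question."""
    bullet_list = "\n".join(f"- {p}" for p in principles)
    return (
        f"{INTRO}\n\n"
        f"Use these design principles when judging each section:\n{bullet_list}\n\n"
        f"{QUESTION}"
    )


# Hypothetical principles, for illustration only.
example_principles = [
    "Spacing between elements is consistent.",
    "Text and images are aligned to a common grid.",
    "Text has enough contrast with its background.",
]
print(build_prompt(example_principles))
```

Keeping the question and the answer format at the end preserves the reasoning-first order that the original tweak was testing.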

@tohrnii
Author

tohrnii commented Dec 16, 2023

Also, I used the ChatGPT interface instead of the API, so it's possible there are some differences. The model behind the interface could be better tuned for instructions, so you can more easily get away with bad prompts.


2 participants