GitHub - TheRoadQaQ/Prompt-Engineering-Improves-Video-Understanding.

Prompt Engineering Improves Video Understanding

This repository contains the code for our paper, "Prompt Engineering Improves Video Understanding."

Directory Structure

./llama: Contains code that uses LLaMA 3.1 to generate accuracy scores, rewrite the original questions, and generate automatic prefixes.

./llava_next_video: Includes code for using the LLaVA-NeXT-Video-7B-DPO model to perform inference across all prompt settings.

./llava_one_vision: Contains code for using the LLaVA-OneVision-Qwen2-7B-OV-Chat model to perform inference across all prompt settings.

./qwen2_vl: Contains code for using the Qwen2-VL-7B-Instruct model to perform inference across all prompt settings.

Scripts and Examples

./inference.sh: An example script demonstrating how to perform inference on the TGIF dataset using the LLaVA-NeXT-Video-7B-DPO model with all prompt settings.

./prompts.py: Contains all the prompts used in our experiments.
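
To illustrate what running "all prompt settings" over one question can look like, here is a minimal sketch. It is not taken from ./prompts.py: the setting names, the templates, and the helpers `build_prompt` and `build_all_prompts` are hypothetical and only show the general pattern of keeping prompt templates in one module and filling them per question.

```python
# Hypothetical prompt settings for illustration only; the actual prompts
# used in the experiments are defined in ./prompts.py.
PROMPT_SETTINGS = {
    "original": "{question}",
    "prefixed": "{prefix} {question}",
}


def build_prompt(setting: str, question: str, prefix: str = "") -> str:
    """Fill the template of one prompt setting with a question and an optional prefix."""
    template = PROMPT_SETTINGS[setting]
    # str.format ignores unused keyword arguments, so every template
    # can be filled with the same call.
    return template.format(question=question, prefix=prefix).strip()


def build_all_prompts(question: str, prefix: str = "") -> dict:
    """Return one filled prompt per setting, keyed by setting name."""
    return {name: build_prompt(name, question, prefix) for name in PROMPT_SETTINGS}


if __name__ == "__main__":
    prompts = build_all_prompts("What is the cat doing?", prefix="Watch the video carefully.")
    for name, text in prompts.items():
        print(f"{name}: {text}")
```

Each filled prompt would then be passed, together with the video, to whichever model directory is being evaluated.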
