This repository accompanies the paper Can LLMs Design Good Questions Based on Context? available on arXiv. It provides datasets, experimental scripts, and visualization tools to explore the capability of Large Language Models (LLMs) in designing meaningful and contextually relevant questions.
This project uses Poetry for dependency management.
- Install Poetry (if not already installed) by following the official installation guide.
- Install dependencies:
poetry install
- Install the punkt tokenizer data:
python setup.py
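If setup.py does not fetch the tokenizer data in your environment, it can also be downloaded manually. The snippet below assumes the project relies on NLTK's punkt model, which is an assumption on our part:

```bash
# Fallback (assumption: the project tokenizes with NLTK's punkt model).
# Downloads the punkt data into the default NLTK data directory.
python -c "import nltk; nltk.download('punkt')"
```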
The entire pipeline for running experiments is automated through the run.py script. To run an experiment, execute:
python run.py <experiment> [--flags]
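A concrete invocation might look like the following; the experiment name and flag shown are illustrative placeholders, not options guaranteed to exist in this repository:

```bash
# If run.py uses a standard argument parser, this lists the available
# experiments and flags (assumption).
python run.py --help

# Hypothetical example: "hotpot" and --model are placeholders.
python run.py hotpot --model gpt-4
```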
Visualization scripts are available to generate plots based on experimental results.
python plot.py <plot_name> [--flags]
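For example (the plot name below is a placeholder; the actual names are defined in plot.py):

```bash
# Hypothetical example: replace question_length with a plot name
# actually defined in plot.py.
python plot.py question_length
```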
The data directory contains the datasets used in the experiments:
- hotpot: Datasets related to the HotpotQA project.
- llmqg_gpt & llmqg_llama: our QA datasets generated from WikiText contexts with GPT and LLaMA models, respectively.
- trivia: Trivia QA datasets, both filtered and unfiltered.
Use the following command to download the datasets:
bash download_data.sh
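After the script finishes, a quick check can confirm the expected directories are in place (this assumes the four dataset folders sit directly under data/):

```bash
# Sanity check: the dataset directories described above should exist.
ls data/hotpot data/llmqg_gpt data/llmqg_llama data/trivia
```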
To generate our QA dataset, use the following command:
python utils/datasets.py