NarrativeQA benchmark

NarrativeQA is an English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents. The paper proposes two tasks, "summaries only" and "stories only", depending on whether the human-written summary or the full story text is used to answer the question.
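The two task settings differ only in which text the reader model receives as context. The sketch below is a minimal illustration, assuming a hypothetical record layout (`question`, `summary`, `story`) and a hypothetical prompt template; neither is the official release format or the evaluation code of any paper in the leaderboard.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class NarrativeQAExample:
    # Hypothetical record layout, for illustration only.
    question: str
    summary: str  # human-written summary of the story
    story: str    # full story text (book or movie script)


Setting = Literal["summaries_only", "stories_only"]


def build_context(example: NarrativeQAExample, setting: Setting) -> str:
    """Return the text a reader model may use to answer the question."""
    if setting == "summaries_only":
        return example.summary
    if setting == "stories_only":
        return example.story
    raise ValueError(f"unknown setting: {setting!r}")


def build_prompt(example: NarrativeQAExample, setting: Setting) -> str:
    # Assumed prompt template; real systems use their own wording.
    context = build_context(example, setting)
    return f"Context:\n{context}\n\nQuestion: {example.question}\nAnswer:"
```

The question is identical in both settings; only the context changes, and in the stories-only setting that context is far longer than the summary, which is what makes it a long-document task.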

Performance

1. Leaderboard from SOTA

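Paper	Year	Model	Model Details	NDCG@10	Recall@5	EM
_XXX	2024	InstructRAG	R: DPR, G: GPT-4o-mini	-	-	71.6
_XXX	2024	InstructRAG	R: DPR, G: Llama-3-Ins-70B	-	-	70.8
_XXX	2024	InstructRAG	R: DPR, G: Llama-3-Ins-8B	-	-	65.0
_XXX	2024	Baseline1	R: ❌, G: Llama-3-Ins-8B	-	-	39.2
_XXX	2024	Baseline2	R: ❌, G: Llama-3-Ins-70B	-	-	54.2

R = retriever; G = generator; ❌ = no retrieval; EM = exact match (%); - = not reported.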
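For reference, exact match is usually computed by normalizing the prediction and each gold answer, scoring a question as 1 if the prediction matches any gold answer, and averaging over questions. The sketch below assumes SQuAD-style normalization (lowercasing, stripping punctuation and the articles "a", "an", "the", collapsing whitespace); the scripts behind the numbers in the table may normalize differently, so this is illustrative rather than a reproduction of their scoring.

```python
import re
import string
from typing import Sequence

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: Sequence[str]) -> float:
    """1.0 if the normalized prediction equals any normalized reference, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def corpus_em(predictions: Sequence[str], references: Sequence[Sequence[str]]) -> float:
    """Mean exact match over all questions, as a percentage (like the EM column)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

For example, `corpus_em(["The Beast", "a castle"], [["beast", "the beast"], ["the forest"]])` gives `50.0`: the first prediction normalizes to "beast" and matches, the second does not. Taking the maximum over references matters because NarrativeQA provides more than one gold answer per question.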

2. LLM-based Methods (Reproducible)

Files

NarrativeQA.md

Latest commit

History

NarrativeQA.md

File metadata and controls

NarrativeQA benchmark

Performance

1. Leaderboard from SOTA

2. LLM-based Methods (Reproducable)