From 2772107781145687c112e063a548ea196a67c654 Mon Sep 17 00:00:00 2001
From: DefTruth <31974251+DefTruth@users.noreply.github.com>
Date: Fri, 5 Apr 2024 09:55:38 +0800
Subject: [PATCH] DeFT: Flash Tree-Attention with IO-Awareness for Efficient
 Tree-Search-Based LLM Inference

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 7b44bd0..aef412c 100644
--- a/README.md
+++ b/README.md
@@ -146,6 +146,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2023.12|[SCCA] SCCA: Shifted Cross Chunk Attention for long contextual semantic expansion(@Beihang University)| [[pdf]](https://arxiv.org/pdf/2312.07305.pdf) | ⚠️ |⭐️ |
 |2023.12|🔥[**FlashLLM**] LLM in a flash: Efficient Large Language Model Inference with Limited Memory(@Apple)| [[pdf]](https://arxiv.org/pdf/2312.11514.pdf) | ⚠️ |⭐️⭐️ |
 |2024.03|🔥🔥[CHAI] CHAI: Clustered Head Attention for Efficient LLM Inference(@cs.wisc.edu etc)| [[pdf]](https://arxiv.org/pdf/2403.08058.pdf) | ⚠️ |⭐️⭐️ |
+|2024.04|[Flash Tree Attention] DeFT: Flash Tree-Attention with IO-Awareness for Efficient Tree-Search-Based LLM Inference(@Westlake University etc)| [[pdf]](https://arxiv.org/pdf/2404.00242.pdf) | ⚠️ |⭐️⭐️ |
 
 ### 📖KV Cache Scheduling/Quantize/Dropping ([©️back👆🏻](#paperlist))