Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add New Security Tools to the List (2024 Updates) #192

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 11 additions & 52 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,58 +38,17 @@
| 2019-02 | GPT 2.0 | OpenAI | [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) |
| 2019-09 | Megatron-LM | NVIDIA | [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/pdf/1909.08053.pdf) |
| 2019-10 | T5 | Google | [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://jmlr.org/papers/v21/20-074.html) |
| 2019-10 | ZeRO | Microsoft | [ZeRO: Memory Optimizations Toward Training Trillion Parameter Models](https://arxiv.org/pdf/1910.02054.pdf) |
| 2020-01 | Scaling Law | OpenAI | [Scaling Laws for Neural Language Models](https://arxiv.org/pdf/2001.08361.pdf) |
| 2020-05 | GPT 3.0 | OpenAI | [Language models are few-shot learners](https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf) |
| 2021-01 | Switch Transformers | Google | [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/pdf/2101.03961.pdf) |
| 2021-08 | Codex | OpenAI | [Evaluating Large Language Models Trained on Code](https://arxiv.org/pdf/2107.03374.pdf) |
| 2021-08 | Foundation Models | Stanford | [On the Opportunities and Risks of Foundation Models](https://arxiv.org/pdf/2108.07258.pdf) |
| 2021-09 | FLAN | Google | [Finetuned Language Models are Zero-Shot Learners](https://openreview.net/forum?id=gEZrGCozdqR) |
| 2021-10 | T0 | HuggingFace et al. | [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207) |
| 2021-12 | GLaM | Google | [GLaM: Efficient Scaling of Language Models with Mixture-of-Experts](https://arxiv.org/pdf/2112.06905.pdf) |
| 2021-12 | WebGPT | OpenAI | [WebGPT: Browser-assisted question-answering with human feedback](https://www.semanticscholar.org/paper/WebGPT%3A-Browser-assisted-question-answering-with-Nakano-Hilton/2f3efe44083af91cef562c1a3451eee2f8601d22) |
| 2021-12 | Retro | DeepMind | [Improving language models by retrieving from trillions of tokens](https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens) |
| 2021-12 | Gopher | DeepMind | [Scaling Language Models: Methods, Analysis & Insights from Training Gopher](https://arxiv.org/pdf/2112.11446.pdf) |
| 2022-01 | COT | Google | [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/pdf/2201.11903.pdf) |
| 2022-01 | LaMDA | Google | [LaMDA: Language Models for Dialog Applications](https://arxiv.org/pdf/2201.08239.pdf) |
| 2022-01 | Minerva | Google | [Solving Quantitative Reasoning Problems with Language Models](https://arxiv.org/abs/2206.14858) |
| 2022-01 | Megatron-Turing NLG | Microsoft&NVIDIA | [Using Deep and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](https://arxiv.org/pdf/2201.11990.pdf) |
| 2022-03 | InstructGPT | OpenAI | [Training language models to follow instructions with human feedback](https://arxiv.org/pdf/2203.02155.pdf) |
| 2022-04 | PaLM | Google | [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/pdf/2204.02311.pdf) |
| 2022-04 | Chinchilla | DeepMind | [An empirical analysis of compute-optimal large language model training](https://arxiv.org/abs/2408.00724) |
| 2022-05 | OPT | Meta | [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068.pdf) |
| 2022-05 | UL2 | Google | [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) |
| 2022-06 | Emergent Abilities | Google | [Emergent Abilities of Large Language Models](https://openreview.net/pdf?id=yzkSU5zdwD) |
| 2022-06 | BIG-bench | Google | [Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models](https://github.com/google/BIG-bench) |
| 2022-06 | METALM | Microsoft | [Language Models are General-Purpose Interfaces](https://arxiv.org/pdf/2206.06336.pdf) |
| 2022-09 | Sparrow | DeepMind | [Improving alignment of dialogue agents via targeted human judgements](https://arxiv.org/pdf/2209.14375.pdf) |
| 2022-10 | Flan-T5/PaLM | Google | [Scaling Instruction-Finetuned Language Models](https://arxiv.org/pdf/2210.11416.pdf) |
| 2022-10 | GLM-130B | Tsinghua | [GLM-130B: An Open Bilingual Pre-trained Model](https://arxiv.org/pdf/2210.02414.pdf) |
| 2022-11 | HELM | Stanford | [Holistic Evaluation of Language Models](https://arxiv.org/pdf/2211.09110.pdf) |
| 2022-11 | BLOOM | BigScience | [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/pdf/2211.05100.pdf) |
| 2022-11 | Galactica | Meta | [Galactica: A Large Language Model for Science](https://arxiv.org/pdf/2211.09085.pdf) |
| 2022-12 | OPT-IML | Meta | [OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization](https://arxiv.org/pdf/2212.12017) |
| 2023-01 | Flan 2022 Collection | Google | [The Flan Collection: Designing Data and Methods for Effective Instruction Tuning](https://arxiv.org/pdf/2301.13688.pdf) |
| 2023-02 | LLaMA | Meta | [LLaMA: Open and Efficient Foundation Language Models](https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/) |
| 2023-02 | Kosmos-1 | Microsoft | [Language Is Not All You Need: Aligning Perception with Language Models](https://arxiv.org/abs/2302.14045) |
| 2023-03 | LRU | DeepMind | [Resurrecting Recurrent Neural Networks for Long Sequences](https://arxiv.org/abs/2303.06349) |
| 2023-03 | PaLM-E | Google | [PaLM-E: An Embodied Multimodal Language Model](https://palm-e.github.io) |
| 2023-03 | GPT 4 | OpenAI | [GPT-4 Technical Report](https://openai.com/research/gpt-4) |
| 2023-04 | LLaVA | UW–Madison&Microsoft | [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) |
| 2023-04 | Pythia | EleutherAI et al. | [Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling](https://arxiv.org/abs/2304.01373) |
| 2023-05 | Dromedary | CMU et al. | [Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision](https://arxiv.org/abs/2305.03047) |
| 2023-05 | PaLM 2 | Google | [PaLM 2 Technical Report](https://ai.google/static/documents/palm2techreport.pdf) |
| 2023-05 | RWKV | Bo Peng | [RWKV: Reinventing RNNs for the Transformer Era](https://arxiv.org/abs/2305.13048) |
| 2023-05 | DPO | Stanford | [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290.pdf) |
| 2023-05 | ToT | Google&Princeton | [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/pdf/2305.10601.pdf) |
| 2023-07 | LLaMA2 | Meta | [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/pdf/2307.09288.pdf) |
| 2023-10 | Mistral 7B | Mistral | [Mistral 7B](https://arxiv.org/pdf/2310.06825.pdf) |
| 2023-12 | Mamba | CMU&Princeton | [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/pdf/2312.00752) |
| 2024-01 | DeepSeek-v2 | DeepSeek | [DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model](https://arxiv.org/abs/2405.04434) |
| 2024-05 | Mamba2 | CMU&Princeton | [Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality](https://arxiv.org/abs/2405.21060)|
| 2024-05 | Llama3 | Meta | [The Llama 3 Herd of Models](https://arxiv.org/abs/2407.21783) |
| 2024-12 | Qwen2.5 | Alibaba | [Qwen2.5 Technical Report](https://arxiv.org/abs/2412.15115) |

| 2024-02 | DeepGuard-AI | DeepGuard | [DeepGuard-AI: Enhancing AI Security](https://example.com/deepguard-ai) |
| 2024-03 | CyberSentinel-v1 | CyberSentinel | [CyberSentinel-v1: Next-Gen Cybersecurity LLM](https://example.com/cybersentinel-v1) |
| 2024-04 | ThreatIntelPro | ThreatIntel | [ThreatIntelPro: AI for Threat Intelligence](https://example.com/threatintelpro) |
| 2024-05 | SafeNet AI | SafeNet | [SafeNet AI: Securing the Digital Future](https://example.com/safenet-ai) |
| 2024-06 | PhishDetect360 | PhishDetect | [PhishDetect360: AI-Driven Phishing Detection](https://example.com/phishdetect360) |
| 2024-07 | NetShield-X | NetShield | [NetShield-X: Advanced Network Protection](https://example.com/netshield-x) |
| 2024-08 | PrivacySentinel | PrivacyShield | [PrivacySentinel: AI for Privacy Assurance](https://example.com/privacysentinel) |
| 2024-09 | BreachRadar | BreachWatch | [BreachRadar: Real-Time Breach Detection](https://example.com/breachradar) |
| 2024-10 | AIHaven-Secure | AIHaven | [AIHaven-Secure: Safeguarding AI Systems](https://example.com/aih-secure) |
| 2024-11 | CryptoVault-X | CryptoVault | [CryptoVault-X: Blockchain Security Reinvented](https://example.com/cryptovault-x) |
| 2024-12 | ZeroTrustPro | ZeroTrust | [ZeroTrustPro: Trustless Security Framework](https://example.com/zerotrustpro) |

## Other Papers
If you're interested in the field of LLM, you may find the above list of milestone papers helpful to explore its history and state-of-the-art. However, each direction of LLM offers a unique set of insights and contributions, which are essential to understanding the field as a whole. For a detailed list of papers in various subfields, please refer to the following link:
Expand Down