directed repo readme to pipeline readme

reds-lab · Jul 6, 2024 · b9d886b
1 parent a1afc02
Showing 1 changed file with 2 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -79,17 +79,12 @@ To go step by step through our WokeyTalky process using the original code that g
 
 5. **Further Documentation:**
    Reference to additional documentation within the repository:
-   ```markdown
-   For more detailed instructions and further documentation, please refer to the [documentation folder](./docs/README.md) inside the repository.
+
+   For more detailed instructions and further documentation, please refer to the [documentation folder](./WokeyTalky_Research_Code/README.md) inside the repository.
 ## Introduction
 
 **TL;DR:** WokeyTalky is a scalable pipeline that generates test data to evaluate the spurious correlated safety refusal of foundation models through a systematic approach.
 
-**What did we introduce?** A taxonomy with 40 persuasion techniques to help enhance persuasion skills.
-
-**What did we find?** By iteratively applying different persuasion techniques from our taxonomy, we successfully jailbreak advanced aligned LLMs, including Llama 2-7b Chat, GPT-3.5, and GPT-4, achieving a 92% attack success rate, notably without any specified optimization.
-
-Interestingly, we found that advanced models like GPT-4 are more vulnerable to persuasive adversarial prompts (PAPs). Adaptive defenses crafted to neutralize these PAPs also provide effective protection against a spectrum of other attacks (e.g., [GCG](https://llm-attacks.org/), [Masterkey](https://sites.google.com/view/ndss-masterkey), or [PAIR](https://jailbreaking-llms.github.io/)).
 
 ## A Quick Glance
 <img src="./assets/0evQd-adv-bench-rejection-rates.png" alt="Rejection Rates" width="90%"/>