🤗 Models & Datasets | 📃 Blog Post
This work explores the application of Constitutional AI (CAI) techniques in a multilingual context. This work covers 8 languages (Arabic, Filipino, French, Hindi, English, Russian, Serbian, Spanish).
construct_principles.ipynb
is a notebook that walks through the adaptation of Anthropic's constitution to create targeted critiques and revisions to the Aya redteaming dataset.create_ultrafeedback_multilingual.py
is a script to translate Ultrafeedback Binarized into our 8 languages using NLLB-3.3B.generate_critiques_revisions.py
is an optimised vLLM script which generates the constitutional preference pairs via critiquing and revising the LLM responses to the red teaming prompts.data_cleaning.py
is a script that helps us to remove some of the unwanted examples that resulted from thegenerate_critiques_revisions.py
script.finetuning
contains scripts and configs to do supervised finetuning and DPO for both the safety trained model and the baseline.evaluate.py
is a script that generates outputs on the test set of the red team prompts, and uses GPT-4o for LLM-as-a-judge to categorise each as either HARMFUL or HARMLESS. We also provide an explanation for the categorisation for interpretability.plots.ipynb
is a notebook used for generating the plots shown in the blog post.