Skip to content

pbevan1/multilingual-constitutional-ai

Repository files navigation

🤗 Models & Datasets | 📃 Blog Post

Multilingual Constitutional AI

This work explores the application of Constitutional AI (CAI) techniques in a multilingual context. This work covers 8 languages (Arabic, Filipino, French, Hindi, English, Russian, Serbian, Spanish).

How to navigate this project

  • construct_principles.ipynb is a notebook that walks through the adaptation of Anthropic's constitution to create targeted critiques and revisions to the Aya redteaming dataset.
  • create_ultrafeedback_multilingual.py is a script to translate Ultrafeedback Binarized into our 8 languages using NLLB-3.3B.
  • generate_critiques_revisions.py is an optimised vLLM script which generates the constitutional preference pairs via critiquing and revising the LLM responses to the red teaming prompts.
  • data_cleaning.py is a script that helps us to remove some of the unwanted examples that resulted from the generate_critiques_revisions.py script.
  • finetuning contains scripts and configs to do supervised finetuning and DPO for both the safety trained model and the baseline.
  • evaluate.py is a script that generates outputs on the test set of the red team prompts, and uses GPT-4o for LLM-as-a-judge to categorise each as either HARMFUL or HARMLESS. We also provide an explanation for the categorisation for interpretability.
  • plots.ipynb is a notebook used for generating the plots shown in the blog post.