This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".
We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the task of entity tracking in Llama-7B and in its fine-tuned variants: Vicuna-7B, Goat-7B, and Float-7B.
Our findings suggest that fine-tuning enhances, rather than fundamentally alters, the mechanistic operation of the model.
Please check finetuning.baulab.info for more information.
To discover the underlying mechanism for performing the entity tracking task, we employed: 1) Path Patching (experiment_1/path_patching.py) and 2) Desiderata-based Component Masking (experiment_2/DCM.py). Both methods are implemented using baukit and can be easily adapted to other tasks.
Moreover, to uncover why fine-tuned models that employ the same mechanism perform better, we introduce a novel approach called CMAP (Cross-Model Activation Patching), which patches activations across models to reveal the enhanced mechanisms. The notebook experiment_3/cmap.ipynb demonstrates how to run the complete experiment.
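The idea of patching activations across models can be illustrated with a toy sketch. This is not the code in experiment_3/cmap.ipynb and it does not use baukit: the "models" here are hypothetical lists of scalar functions standing in for layers, and the names run_model and cmap_sweep are invented for this illustration. It only shows the general pattern, assuming two models with the same architecture: record the activations of the fine-tuned model, then re-run the base model while overwriting one activation at a time and observe how the output changes.

```python
from typing import Callable, Dict, List, Optional

# A toy "layer" maps one scalar activation to the next (hypothetical stand-in
# for a real transformer component).
Layer = Callable[[float], float]


def run_model(
    layers: List[Layer],
    x: float,
    patch: Optional[Dict[int, float]] = None,
    record: Optional[Dict[int, float]] = None,
) -> float:
    """Run the layers in order.

    If `patch` maps a layer index to a value, that layer's output is
    overwritten with the value. If `record` is given, every layer's
    (possibly patched) output is stored in it.
    """
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if patch is not None and i in patch:
            h = patch[i]  # overwrite with the activation from the other model
        if record is not None:
            record[i] = h
    return h


def cmap_sweep(base: List[Layer], tuned: List[Layer], x: float) -> List[float]:
    """Patch each fine-tuned activation into the base model, one layer at a time.

    Returns the base model's output for each patched layer.
    """
    tuned_acts: Dict[int, float] = {}
    run_model(tuned, x, record=tuned_acts)  # cache fine-tuned activations
    return [
        run_model(base, x, patch={i: tuned_acts[i]})
        for i in range(len(base))
    ]


# Two toy models that share an architecture and differ only in layer 1.
base_model: List[Layer] = [lambda h: h + 1.0, lambda h: h * 2.0, lambda h: h - 3.0]
tuned_model: List[Layer] = [lambda h: h + 1.0, lambda h: h * 3.0, lambda h: h - 3.0]

if __name__ == "__main__":
    print("base output: ", run_model(base_model, 2.0))
    print("tuned output:", run_model(tuned_model, 2.0))
    print("patched outputs per layer:", cmap_sweep(base_model, tuned_model, 2.0))
```

In this toy setup, patching layer 0 leaves the base output unchanged because both models agree there, while patching layer 1 or later moves the base output to the fine-tuned model's value. That localizes the difference between the two models to layer 1.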
Note: You need the weights for the Llama-7B model, which is released under a non-commercial license. If you do not already have them, use this form to request access to the model.
To install all the dependencies, run:
conda env create -f environment.yml
conda activate finetuning
@inproceedings{prakash2023fine,
title={Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking},
author={Prakash, Nikhil and Shaham, Tamar Rott and Haklay, Tal and Belinkov, Yonatan and Bau, David},
booktitle={Proceedings of the 2024 International Conference on Learning Representations},
note={arXiv:2402.14811},
year={2024}
}