This repository contains the source code for the paper titled "The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes", published at CVPR 2024.
- Create and activate the Conda environment:
conda create -n data-infl python=3.8.16
conda activate data-infl
pip install -r requirements.txt
This section outlines the steps to verify the Mirrored Influence Hypothesis.
- Execution of scripts (convex models):
- Begin by running the following script to compute a set of LOO and Dual-LOO scores:
python LOO-DualLOO-Convex.py
- Analysis:
- After running the script, proceed with the analysis using the Jupyter Notebook:
LOO-DualLOO-Convex_Analysis.ipynb
- Analysis (non-convex models):
- Use the following Jupyter Notebook for the analysis of non-convex models:
LOO-DualLOO-Group-Nonconvex-mnist.ipynb
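The convex-model check compares two sets of scores. As a rough illustration of what such a comparison involves (this is not the code in `LOO-DualLOO-Convex.py`), the NumPy sketch below assumes an L2-regularized logistic regression fit exactly with Newton's method on synthetic data. LOO scores each training sample by how much the test loss rises when that sample is removed; the Dual-LOO score here is our assumed mirror, scoring each training sample by how much its own loss falls when the test sample is added to training. The sign conventions, the data, and every function name (`fit_logreg`, `loo_scores`, `dual_loo_scores`, `spearman`) are illustrative assumptions. If the hypothesis holds for this model, the two score vectors should be strongly rank-correlated.

```python
import numpy as np

def fit_logreg(X, y, lam=1.0, iters=30):
    """L2-regularized logistic regression (labels in {-1, +1}) fit with Newton's method.
    Objective: sum_i log(1 + exp(-y_i * x_i.w)) + lam/2 * ||w||^2 (strongly convex)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        s = 1.0 / (1.0 + np.exp(y * (X @ w)))  # sigmoid(-margin) per sample
        grad = -(X.T @ (y * s)) + lam * w
        hess = (X.T * (s * (1.0 - s))) @ X + lam * np.eye(X.shape[1])
        w = w - np.linalg.solve(hess, grad)
    return w

def point_losses(w, X, y):
    """Per-sample logistic loss; works for one sample or a batch."""
    return np.logaddexp(0.0, -y * (X @ w))

def loo_scores(X, y, x_t, y_t):
    """LOO: increase in the test loss when training sample i is removed."""
    base = point_losses(fit_logreg(X, y), x_t, y_t)
    return np.array([
        point_losses(fit_logreg(np.delete(X, i, axis=0), np.delete(y, i)), x_t, y_t) - base
        for i in range(len(y))
    ])

def dual_loo_scores(X, y, x_t, y_t):
    """Dual-LOO (as assumed here): decrease in each training sample's loss
    when the test sample is added to the training set."""
    w_full = fit_logreg(X, y)
    w_plus = fit_logreg(np.vstack([X, x_t]), np.append(y, y_t))
    return point_losses(w_full, X, y) - point_losses(w_plus, X, y)

def spearman(a, b):
    """Spearman rank correlation (assumes no ties)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Synthetic, noisily labeled 2-D data (hypothetical).
rng = np.random.default_rng(0)
direction = np.array([1.0, -1.0])
X = rng.normal(size=(40, 2))
y = np.where(X @ direction + 0.5 * rng.normal(size=40) > 0, 1.0, -1.0)
x_t = rng.normal(size=2)
y_t = 1.0 if x_t @ direction > 0 else -1.0

loo = loo_scores(X, y, x_t, y_t)
dual = dual_loo_scores(X, y, x_t, y_t)
print("Spearman(LOO, Dual-LOO):", spearman(loo, dual))
```

A first-order argument suggests why the two should agree here: under the usual influence-function approximation, both scores reduce to the same term `grad_test^T H^{-1} grad_i`, so any disagreement comes from higher-order effects.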
This section provides an example of applying our algorithm to one of our applications: the data leakage experiment.
- To review the implementation, refer to the provided Jupyter Notebook in the data-leakage directory:
FINF-Duplication-ResNet18-main.ipynb
- The same codebase can be adapted for various applications.
- For text-to-image model data attribution experiments, use the codebase, pre-trained models, and environment detailed in this paper.
- For NLP fact-tracing experiments, refer to the codebase, pre-trained models, and environment described in this paper.
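The data-leakage notebook applies the method to a ResNet-18. To show the forward-pass mechanics in isolation, here is a minimal NumPy sketch under our own assumptions: a linear logistic model stands in for the network, a few plain gradient-descent steps on the single test sample stand in for the update on test data, and `forward_influence`, `lr`, and `steps` are hypothetical names, not the notebook's API. Each training sample is scored by how much its loss drops after the model is updated on the test sample, so the training set is only ever evaluated, never differentiated.

```python
import numpy as np

def point_losses(w, X, y):
    """Per-sample logistic loss log(1 + exp(-y * x.w)), labels in {-1, +1}."""
    return np.logaddexp(0.0, -y * (X @ w))

def grad_loss(w, x, y):
    """Gradient of a single sample's logistic loss with respect to w."""
    return -y * x / (1.0 + np.exp(y * (x @ w)))

def forward_influence(w, X_train, y_train, x_test, y_test, lr=0.1, steps=5):
    """Hypothetical helper: update a copy of the model on the test sample only,
    then score each training sample by how much its loss dropped.
    The training set is used only through loss evaluations (forward passes)."""
    before = point_losses(w, X_train, y_train)
    w_new = w.copy()
    for _ in range(steps):
        w_new = w_new - lr * grad_loss(w_new, x_test, y_test)
    after = point_losses(w_new, X_train, y_train)
    return before - after  # larger = updating on the test sample helps this sample more

# Hypothetical setup: a "leaked" test sample that duplicates training sample 7.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = np.where(X_train @ rng.normal(size=5) > 0, 1.0, -1.0)
w = rng.normal(size=5)                   # stand-in for an already-trained model
x_test, y_test = X_train[7], y_train[7]

scores = forward_influence(w, X_train, y_train, x_test, y_test)
print("rank of the duplicate:", int(np.sum(scores > scores[7])) + 1)
```

In a duplication check one would hope that training samples copying the test sample rank near the top of `scores`; this toy sketch only guarantees that, for a small enough step size, the duplicate's own score is positive.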
Feel free to reach out if you have any questions.
If you find "The Mirrored Influence Hypothesis" useful in your research, please consider citing:
@article{ko2024mirrored,
title={The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes},
author={Ko, Myeongseob and Kang, Feiyang and Shi, Weiyan and Jin, Ming and Yu, Zhou and Jia, Ruoxi},
journal={arXiv preprint arXiv:2402.08922},
year={2024}
}