Evaluating and Enhancing the Robustness of Code Pre-trained Models through Structure-Aware Adversarial Samples Generation
This is the repository of the EMNLP 2023 paper Evaluating and Enhancing the Robustness of Code Pre-trained Models through Structure-Aware Adversarial Samples Generation.
We leverage code structure to generate adversarial examples that attack code pre-trained models, and then enhance model robustness with adversarial training strategies that exploit this structural information.
More details are provided in our EMNLP'23 paper.
conda create --name cat python=3.7
conda activate cat
pip install -r requirements.txt
git clone https://github.com/nchen909/CodePrompt
cd CodePrompt/evaluator/CodeBLEU/parser
bash build.sh
cd ../../../
cp evaluator/CodeBLEU/parser/my-languages.so build/
# make sure git-lfs is installed, e.g. 'apt-get install git-lfs'
bash get_models.sh
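Optionally, you can sanity-check that the tree-sitter parser library built and was copied correctly. This is only an illustrative check: it assumes the pinned py-tree-sitter version supports Language(lib_path, name), and that 'java' is one of the grammars build.sh compiled into my-languages.so.

```bash
# Optional check (run from the repository root); 'java' is an assumed grammar name.
python -c "from tree_sitter import Language; Language('build/my-languages.so', 'java')"
```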
For CUDA 11.0+:
pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
For torch_geometric, install the wheels matching your torch/CUDA version from:
https://pytorch-geometric.com/whl/torch-1.6.0%2Bcu101.html
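Afterwards, a quick check that the CUDA build of torch (and torch_geometric, if installed) is picked up:

```bash
# Optional environment check; versions are only what the commands above install.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import torch_geometric; print(torch_geometric.__version__)"
```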
The dataset comes from CodeXGLUE.
mkdir data
cd data
pip install gdown
gdown "https://drive.google.com/uc?export=download&id=1BBeHFlKoyanbxaqFJ6RRWlqpiokhDhY7"
unzip data.zip
rm data.zip
Point WORKDIR and HUGGINGFACE_LOCALS in run.sh and run_few_shot.sh to your own paths.
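For example, the corresponding lines in run.sh might look like the following (the paths below are placeholders, not actual defaults):

```bash
# Placeholder paths -- substitute your own checkout and local HuggingFace model cache.
WORKDIR="/path/to/CodePrompt"
HUGGINGFACE_LOCALS="/path/to/huggingface_models"
```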
export MODEL_NAME=
export TASK=
export SUB_TASK=
# to run one task
bash run.sh $MODEL_NAME $TASK $SUB_TASK
# to run few shot
bash run_few_shot.sh $MODEL_NAME $TASK $SUB_TASK
# to run multi task
bash run_multi_task.sh
MODEL_NAME can be any one of ["roberta", "codebert", "graphcodebert", "unixcoder", "t5", "codet5", "bart", "plbart"].
TASK can be any one of ['summarize', 'translate', 'refine', 'generate', 'defect', 'clone'] (generate refers to concode in CodeXGLUE; we do not consider complete).
SUB_TASK can be found in the table below.
Category | Dataset | Task | Sub_task (LANG) | Type | Model Type | Description
---|---|---|---|---|---|---
C2C | BCB | clone | [] (java) | bi-directional | encoder | code clone detection in Java data
C2C | Devign | defect | [] (c) | bi-directional | encoder | code defect detection in C/C++ data
C2C | CodeTrans | translate | ['java-cs', 'cs-java'] | end2end | en2de | code-to-code translation between Java and C#
C2C | Bugs2Fix | refine (repair) | ['small', 'medium'] (java) | end2end | en2de | code refinement on code repair data with small/medium functions
C2T | CodeSN | summarize | ['java', 'python', 'javascript', 'php', 'ruby', 'go'] | end2end | en2de | code summarization task on CodeSearchNet data with six PLs
T2C | CONCODE | generate (concode) | [] (java) | end2end | en2de | text-to-code generation on Concode data
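For example, putting the pieces together (these invocations are only illustrative; pick any valid combination from the lists and table above):

```bash
# Illustrative runs -- model, task and sub_task values come from the table above.
bash run.sh codet5 summarize python
bash run.sh graphcodebert translate java-cs
bash run.sh codebert refine small
```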
For the different adversarial attack methods, please refer to the /custom/attacks directory. All scripts can be found in the scripts directory.
Please consider citing us if you find this repository useful.👇
@inproceedings{chen2023coderobustness,
title={Evaluating and Enhancing the Robustness of Code Pre-trained Models through Structure-Aware Adversarial Samples Generation},
author={Chen, Nuo and Sun, Qiushi and Wang, Jianing and Gao, Ming and Li, Xiaoli and Li, Xiang},
booktitle={EMNLP},
year={2023}
}