Commit

modify readme
t1101675 committed Oct 23, 2024
1 parent a5f6316 commit e680467
Showing 2 changed files with 16 additions and 12 deletions.
22 changes: 16 additions & 6 deletions README.md
@@ -1,9 +1,12 @@
# MiniPLM: Knowledge Distillation of Pre-Training Language Models
[paper]() | [huggingface](https://huggingface.co/MiniLLM)

[paper](https://arxiv.org/abs/2410.17215) | [huggingface](https://huggingface.co/MiniLLM)

<img src="figures/method.png"></img>

#### See also:
+ [MiniLLM](https://github.com/microsoft/LMOps/tree/main/minillm) (Knowledge Distillation in SFT for LLMs)
+ [DPKD](https://github.com/microsoft/LMOps/tree/main/dpkd) (An improved version of MiniLLM with better performance and a simpler implementation)

## 1 Setup
```bash
pip3 install -r requirements.txt
@@ -25,9 +28,9 @@ The processed data is stored in `processed_data/pretrain/pile/qwen-1025`, contai

## 3 Models
### 3.1 Teacher Model
We use [QWen1.5-1.8B](https://huggingface.co/Qwen/Qwen1.5-1.8B) as the teacher LM. You can download this model can put it in `checkpoints/qwen/1.8B`.
We use [Qwen1.5-1.8B](https://huggingface.co/Qwen/Qwen1.5-1.8B) as the teacher LM. You can download this model and put it in `checkpoints/qwen/1.8B`.
### 3.2 Reference Model
The [reference model](https://huggingface.co/MiniLLM/MiniPLM-QWen-104M-ref) is a 104M QWen LM trained on 5B tokens randomly split from the Pile, which should be put in `checkpoints/qwen/104M_ref`.
The [reference model](https://huggingface.co/MiniLLM/MiniPLM-Qwen-104M-ref) is a 104M Qwen LM trained on 5B tokens randomly split from the Pile; it should be placed in `checkpoints/qwen/104M_ref`.
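
If you prefer to script the downloads, the two checkpoints above can be fetched with `huggingface_hub` (a minimal sketch, assuming `huggingface_hub` is installed; equivalent to downloading them manually into the paths above):
```python
# Minimal sketch (not part of this repository): download the teacher and
# reference checkpoints into the directory layout the training scripts expect.
from huggingface_hub import snapshot_download

# Teacher LM: Qwen1.5-1.8B
snapshot_download(repo_id="Qwen/Qwen1.5-1.8B", local_dir="checkpoints/qwen/1.8B")

# Reference LM: 104M Qwen model trained on 5B tokens from the Pile
snapshot_download(repo_id="MiniLLM/MiniPLM-Qwen-104M-ref", local_dir="checkpoints/qwen/104M_ref")
```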
### 3.3 Pre-Trained Models
The [MiniPLM models](https://huggingface.co/collections/MiniLLM/miniplm-6712c0fdf09ef7e8da7d39bd) and baseline models can be found in the [HuggingFace Hub](https://huggingface.co/MiniLLM).
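
To try a released checkpoint directly, it can be loaded with `transformers` (a minimal sketch; the repo id `MiniLLM/MiniPLM-Qwen-1.2B` is an assumption, check the collection for the exact model names):
```python
# Minimal sketch: load a MiniPLM checkpoint from the HuggingFace Hub.
# The repo id below is an assumption -- see the collection for exact names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MiniLLM/MiniPLM-Qwen-1.2B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```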

@@ -59,7 +62,7 @@ bash scripts/miniplm/pretraining/qwen/1.2B.sh /PATH/TO/MiniPLM
```

#### KD Across Model Families
To distill the knowledge of QWen models to Mamba or LLaMA3.1, first prepare the `config.json` and tokenization-related files in `checkpoints/mamba/130M` and `checkpoints/llama3.1/212M`, which can be downloaded from [our huggingface hub](https://huggingface.co/collections/MiniLLM/miniplm-6712c0fdf09ef7e8da7d39bd). Then, convert the QWen tokenization to the target tokenization:
To distill the knowledge of Qwen models into Mamba or LLaMA3.1, first prepare the `config.json` and tokenization-related files in `checkpoints/mamba/130M` and `checkpoints/llama3.1/212M`, which can be downloaded from [our huggingface hub](https://huggingface.co/collections/MiniLLM/miniplm-6712c0fdf09ef7e8da7d39bd). Then, convert the Qwen tokenization to the target tokenization:
```bash
bash scripts/tools/convert_tokenization/convert_tokenization_qwen_mamba.sh /PATH/TO/MiniPLM
bash scripts/tools/convert_tokenization/convert_tokenization_qwen_llama3_1.sh /PATH/TO/MiniPLM
@@ -106,4 +109,11 @@ bash scripts/eval/lm.sh /PATH/TO/MiniPLM --model-path /PATH/TO/TRAINED_CKPT --ck
```
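
Outside the provided scripts, a quick language-modeling sanity check of a trained checkpoint can be done with plain `transformers` (a rough sketch, assuming the checkpoint is in HuggingFace format; it is not a replacement for `scripts/eval/lm.sh`):
```python
# Rough perplexity sanity check (not the repository's evaluation pipeline).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "/PATH/TO/TRAINED_CKPT"  # placeholder for a trained checkpoint in HF format
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt).eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Using the inputs as labels gives the average next-token cross-entropy.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```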

## 6 Citation
TODO
```
@article{miniplm,
  title={MiniPLM: Knowledge Distillation for Pre-Training Language Models},
  author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
  journal={arXiv preprint arXiv:2410.17215},
  year={2024}
}
```
6 changes: 0 additions & 6 deletions utils.py
@@ -291,9 +291,3 @@ def save_parallel(model, save_dir):
checkpoint_name = os.path.join(save_dir, f"mp{mpu.get_model_parallel_world_size()}", f"pytorch_model_{mp_rank}.bin")
torch.save(model.state_dict(), checkpoint_name)
print(f"Rank {get_rank()}: {checkpoint_name} saved.")


def remove_path(path):
print("Remove", path)
if os.path.exists(path):
os.system(f"rm -rf {path}")
