
Load edited GRACE and WISE for evaluation offline #469

Open
shariqahn opened this issue Jan 5, 2025 · 15 comments
Labels
question Further information is requested

Comments

@shariqahn

shariqahn commented Jan 5, 2025

Is there a way to load a saved WISE or GRACE model without creating a WISE/GRACE object? I am performing a separate evaluation of the edited model, so ideally I would be able to load the model outside of the EasyEdit repo.

I understand (from here) that there is a load_path hparam to load a WISE model, but loading the WISE object in my separate evaluation repo is difficult due to dependency issues. For other methods, I have been using save_pretrained and from_pretrained to save the edited model and load it for evaluation respectively, but I understand that GRACE and WISE have special parameters.

I tried to use torch.save and load_state_dict for GRACE, as the original paper authors did, but the edits are not showing up as I expect. I think the edited version of the model is not being saved or loaded properly.

Any help you have to offer would be immensely appreciated! And thank you for putting together such a great framework for model editing.

@zxlzr added the question label on Jan 5, 2025
@pengzju
Collaborator

pengzju commented Jan 7, 2025

It's quite challenging to load the WISE or GRACE models without creating a WISE/GRACE object, as both methods keep memory parameters in an external structure that does not inherit from HuggingFace's PreTrainedModel. I apologize, but the scenario you mentioned is not currently feasible. It would be best to separate the generation and evaluation processes within the evaluation module. At least WISE supports offline generation.
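
For illustration, here is a minimal sketch of that split (this is not an EasyEdit API: dump_generations, the prompt list, and the output file are placeholders, and edited_model is assumed to be the object returned by editor.edit()):

import json
import torch

def dump_generations(edited_model, tokenizer, prompts, out_file, device="cuda"):
    # Generate inside the EasyEdit environment and persist the raw outputs
    # so a separate repo can compute metrics on them offline.
    records = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(device)
        with torch.no_grad():
            out = edited_model.generate(**inputs, max_new_tokens=64, do_sample=False)
        records.append({"prompt": prompt,
                        "output": tokenizer.decode(out[0], skip_special_tokens=True)})
    with open(out_file, "w") as f:
        json.dump(records, f, indent=2)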

@zxlzr
Contributor

zxlzr commented Jan 7, 2025

Hi, do you have any further questions?

@shariqahn
Author

shariqahn commented Jan 7, 2025

I tried to perform the generation by creating the WISE/GRACE object, but it doesn't seem like the edits are there. Here is how I did it:

if ("WISE" in cfg.model_path):
    hparams = WISEHyperParams.from_hparams('../EasyEdit/hparams/WISE/eval.yaml')
    hparams.load_path = os.path.join(cfg.model_path, "wise.pt")
    editor = BaseEditor.from_hparams(hparams)
    model = editor.model
elif ('GRACE' in cfg.model_path):
    path = <llama path>
    model = AutoModelForCausalLM.from_pretrained(path, config=config, use_flash_attention_2=False, torch_dtype=torch.float16, trust_remote_code = True, device_map=device_map)
    checkpoint = os.path.join(cfg.model_path, "model.pt")
    state_dict = torch.load(checkpoint, map_location='cpu')
    model.load_state_dict(state_dict, False)
else:
    model = AutoModelForCausalLM.from_pretrained(cfg.model_path, config=config, use_flash_attention_2=False, torch_dtype=torch.float16, trust_remote_code = True, device_map=device_map)

Are you saying that I cannot use editor.model directly, and that my only option is to call the WISE object's generation method and compute metrics on that output?

Since we can calculate metrics on GRACE and WISE within your framework, is there a way to recreate those objects, generate outputs just like we do for the EasyEdit metrics, and calculate other metrics on that output?

@shariqahn
Author

I'm also having a similar problem with AdaLoRA where the edits aren't showing up properly; it just outputs gibberish. I am using save_pretrained and from_pretrained to save and load the model. Does the same issue apply here, where there are external parameters that are not being loaded properly?

@pengzju
Collaborator

pengzju commented Jan 9, 2025

You can add editor.load(hparams.load_path) after editor = BaseEditor.from_hparams(hparams). This should allow WISE to run normally. However, GRACE currently doesn't have a save_pt interface, and the original official implementation doesn't provide it either.

@pengzju
Collaborator

pengzju commented Jan 9, 2025

AdaLoRA might be experiencing overfitting. Have you tried reducing the learning rate? Currently, we haven't observed a large amount of garbled output when running other methods.

@shariqahn
Author

shariqahn commented Jan 9, 2025

My AdaLoRA evaluation returns identical, poor metrics for every dataset I have tried, so I thought the issue was with how the model is loaded for evaluation rather than with the method itself. Otherwise it would be strange (though possible) that models with differing metrics in your framework all produced identical metrics in my evaluation.

Am I right to be using save_pretrained and from_pretrained to save and load the model? I am not sure whether AdaLoRA has extra parameters like WISE and GRACE do.

@pengzju
Collaborator

pengzju commented Jan 10, 2025

Saving the model using save_pretrained is incorrect here because WISE adds parameter structures outside the base network. WISE has implemented an offline save function, which you can use: EasyEdit/easyeditor/models/wise/WISE.py at main · zjunlp/EasyEdit.

Additionally, I have already explained in the previous response that GRACE does not have an offline caching interface. You can implement offline model saving by following the example of WISE's save function.
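
As a purely hypothetical sketch of what a GRACE equivalent could look like (the adapter attribute names keys, values, and epsilons below are assumptions, not EasyEdit's actual fields; check easyeditor/models/grace/ for the real names):

import torch

def save_grace_memory(grace_model, layer_name, path):
    # Persist GRACE's external codebook separately from the backbone,
    # mirroring the spirit of WISE's save function.
    adapter = eval(f"grace_model.model.{layer_name}")  # the wrapped layer
    torch.save({"keys": adapter.keys,          # cached activation keys (assumed name)
                "values": adapter.values,      # learned value vectors (assumed name)
                "epsilons": adapter.epsilons}, # deferral radii (assumed name)
               path)

def load_grace_memory(grace_model, layer_name, path):
    # Rebuild the codebook on a freshly wrapped model before evaluating.
    state = torch.load(path, map_location="cpu")
    adapter = eval(f"grace_model.model.{layer_name}")
    adapter.keys = state["keys"]
    adapter.values = state["values"]
    adapter.epsilons = state["epsilons"]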

@pengzju
Collaborator

pengzju commented Jan 10, 2025

We have conducted numerous experiments with AdaLoRA, and the evaluation metrics are completely consistent with other methods. The peft_model is returned in the edit interface (code), and both saving and loading can be achieved through Hugging Face's default interfaces (save_pretrained and from_pretrained).
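
One caveat worth checking on your side: save_pretrained on a PEFT-wrapped model typically writes only the adapter weights, so reloading with a plain AutoModelForCausalLM.from_pretrained would silently drop the edit. A minimal sketch of the round trip (paths are placeholders):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Save: on a PeftModel this writes the adapter weights and adapter config.
edited_model.save_pretrained("./adalora_ckpt")

# Load: rebuild the base model, then reattach the saved adapter.
base = AutoModelForCausalLM.from_pretrained("<llama path>")
model = PeftModel.from_pretrained(base, "./adalora_ckpt")
model = model.merge_and_unload()  # optional; check that your peft version supports merging AdaLoRA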

@zxlzr
Contributor

zxlzr commented Jan 13, 2025

Hi, do you have any further questions?

@shariqahn
Author

shariqahn commented Jan 13, 2025

Yes, I was referring to AdaLoRA when asking about using save_pretrained. I understand that WISE and GRACE require a special implementation. For AdaLoRA, I am running

metrics, edited_model, _ = editor.edit(
    ...
)

edited_model.save_pretrained(model_save_dir)

but the results look strange: several experiments that produced different metrics in your framework all got identical metrics in my separate evaluation on a different task. So I just wanted to confirm that my approach was correct. Thank you for clarifying that!

For GRACE, I saw that the original code saves the model this way, which is why I used the implementation I mentioned earlier. I saved with:

checkpoint = os.path.join(model_save_dir, "model.pt")
torch.save(edited_model.model.state_dict(), checkpoint)

and loaded:

model = AutoModelForCausalLM.from_pretrained(path, config=config,
                                             use_flash_attention_2=False,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             device_map=device_map)
checkpoint = os.path.join(cfg.model_path, "model.pt")
state_dict = torch.load(checkpoint, map_location='cpu')
model.load_state_dict(state_dict, strict=False)

However, the model didn't seem to load correctly because the loaded model doesn't seem edited. I will look into additional parameters that are necessary to save for GRACE.

For WISE, I made a slight change to your suggestion here

You can add editor.load(hparams.load_path) after editor = BaseEditor.from_hparams(hparams). This should allow WISE to run normally. However, GRACE currently doesn't have a save_pt interface, and the original official implementation doesn't provide it either.

since the editor object doesn't have a load method. I loaded a WISE object instead using:

editor = BaseEditor.from_hparams(hparams)
model = WISE(model=editor.model, config=hparams, device=editor.model.device)
model.load(hparams.load_path)

The results still don't seem to include the edits, but there was a slight change this time. I will try to reproduce the metrics reported for the original saved model on the loaded model to verify.

If I misunderstood something in your suggestions, please let me know. Otherwise, I will investigate further.

@pengzju
Collaborator

pengzju commented Jan 18, 2025

I also think that your use of LoRA is correct.

For GRACE, I believe that your offline saving method is incorrect. The code here only saves the state_dict, which preserves just the backbone parameters. The keys and values of GRACE are not part of the model architecture; they are external memory and need to be saved separately. You can refer to WISE's saving method, but since GRACE is not my work, you will need to explore this part yourself.
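
One quick diagnostic, since load_state_dict(strict=False) silently ignores mismatches but returns them:

# load_state_dict returns (missing_keys, unexpected_keys) when strict=False;
# if the GRACE memory had been in the checkpoint, it would show up here.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing[:10])
print("unexpected keys:", unexpected[:10])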

For WISE, as you said, “there was a slight change this time,” which suggests that WISE's saving is working. As for the other benchmarks, if the metrics/output do not change significantly, that might be normal. Perhaps this is a flaw of WISE that you have discovered!

@shariqahn
Author

Thank you for the clarification! I will investigate further and see if I can resolve this.

@shariqahn
Author

shariqahn commented Jan 22, 2025

For WISE, I am seeing a difference in the logits but not in the generated outputs. I see a TODO: generation comment above the WISE generate method. Is this not implemented yet? Or is there something additional I need to do to handle generation with WISE? I am running:

out = model.generate(inputs.input_ids,
                     attention_mask=inputs.attention_mask,
                     max_length=cfg.generation.max_length,
                     max_new_tokens=cfg.generation.max_new_tokens,
                     do_sample=False,
                     use_cache=True,
                     pad_token_id=left_pad_tokenizer.eos_token_id)

@pengzju
Collaborator

pengzju commented Jan 23, 2025

def generate(self, *args, **kwargs):
    # Reset the adapter's key_id before delegating to the underlying
    # model's generate.
    setattr(eval(f"self.model.{self.layer}"), "key_id", -1)
    return self.model.generate(*args, **kwargs)

"generate" might be problematic because the token generation process is not necessarily classified as "side memory." Additionally, "wise" can currently only operate under the condition of teacher forcing.

sry, I have no way to fix this issue.
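
If teacher forcing fits your metrics, one workaround is to score gold continuations instead of free-running generation. A minimal sketch (this assumes the edited model's forward returns HuggingFace-style outputs with .logits; the WISE wrapper may differ):

import torch
import torch.nn.functional as F

def teacher_forced_logprob(model, tokenizer, prompt, target, device="cuda"):
    # Total log-probability of the gold continuation under the model,
    # computed in a single teacher-forced forward pass.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    target_ids = tokenizer(target, add_special_tokens=False,
                           return_tensors="pt").input_ids.to(device)
    input_ids = torch.cat([prompt_ids, target_ids], dim=-1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits at position i predict token i+1, so slice the rows that
    # predict each target token.
    tgt_logits = logits[0, prompt_ids.size(-1) - 1 : -1]
    logps = F.log_softmax(tgt_logits, dim=-1)
    return logps.gather(1, target_ids[0].unsqueeze(-1)).sum().item()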
