DeepSpeedPlugin with activation checkpoint fails #9144

Thanks @nachshonc!

I've managed to reproduce the same case without DeepSpeed, using torch.utils.checkpoint and our bug report model:

import deepspeed
import torch
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.plugins import DeepSpeedPlugin
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.L…
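
For readers unfamiliar with torch.utils.checkpoint, here is a minimal sketch of how it wraps a layer, separate from the reproduction above. The module name `CheckpointedBlock` and the layer sizes are hypothetical and not taken from the discussion, and the `use_reentrant=False` argument assumes a PyTorch version that accepts it (1.11 or later), which may differ from the version used in the thread.

```python
import torch
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(torch.nn.Module):
    """Hypothetical module for illustration only (not from the discussion)."""

    def __init__(self):
        super().__init__()
        # Layer sizes are arbitrary and chosen only for this sketch.
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        # Intermediate activations of `self.layer` are not kept after the
        # forward pass; they are recomputed when backward runs.
        # Assumption: `use_reentrant` is supported by the installed PyTorch.
        return checkpoint(self.layer, x, use_reentrant=False)


block = CheckpointedBlock()
x = torch.randn(4, 32)
out = block(x)
out.sum().backward()  # triggers the recomputation of the checkpointed forward
```

After `backward()`, `block.layer.weight.grad` should be populated just as it would be without checkpointing; only the memory/compute trade-off changes.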

Answer selected by nachshonc