the model learns very badly on cifar10 32*32 #23
Still, there might be some other problems. The pictures look very hazy, as if they were covered by a neon glow or something.
@RowenaHe Same problem here. Did you mean fixing it with something like this?
@potatoker Sorry for taking so long to reply. No, this is not how I fixed the problem. I made no change to the training procedure. The only modification was in the sampling process. It goes something like this:
I printed the intermediate data to look for the cause. I think the cause is that, for some unknown reason, the trained model cannot keep the generated data within or near [-1, 1] during the iterative sampling process. Without clamping inside the sampling loop, almost every single sample is outside [-1, 1] after 1000 inference steps. So clamping only once, after the whole sampling process is over, inevitably produces pure-color pictures.
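To make the difference between clamping once at the end and clamping inside the loop concrete, here is a minimal NumPy sketch. It is not the commenter's actual code: the linear beta schedule, the function names, and the `predict_noise` callable (a stand-in for the trained network) are all assumptions for illustration.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule and the derived alpha quantities."""
    beta = np.linspace(beta_start, beta_end, num_steps)
    alpha = 1.0 - beta
    alpha_hat = np.cumprod(alpha)
    return beta, alpha, alpha_hat

def sample_with_clamp(predict_noise, shape, num_steps=1000,
                      clamp_each_step=True, seed=0):
    """Simple 'one-line' DDPM sampling with optional per-step clamping.

    predict_noise(x, t) is a hypothetical stand-in for the trained model.
    """
    rng = np.random.default_rng(seed)
    beta, alpha, alpha_hat = make_schedule(num_steps)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)
        # No fresh noise is added on the very last step.
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = (x - (1.0 - alpha[t]) / np.sqrt(1.0 - alpha_hat[t]) * eps) \
            / np.sqrt(alpha[t]) + np.sqrt(beta[t]) * noise
        if clamp_each_step:
            # Keep the running sample inside [-1, 1] at every step.
            x = np.clip(x, -1.0, 1.0)
    # Final clamp, as in the "clamp once at the end" variant.
    return np.clip(x, -1.0, 1.0)
```

With `clamp_each_step=False` this reduces to clamping once after sampling, which is the variant described above as producing pure-color images.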
I am experiencing the same type of issue with 64x64 landscapes. Half of the images are beautiful, but the other half are solid colors, mostly black and white. I believe this runaway effect, where the model can't keep the values within [-1, 1], is simply due to the choice of hyperparameters, and the model may also be trying to overfit. Maybe increase the weight decay and decrease the learning rate? The hyperparameters that worked for the 64x64 model may not necessarily work for your 32x32 model.
@volcanik Thank you for your insight! I will let you know if I manage to solve this problem in my later experiments.
Hi! I found that using the more complex 'residual' sampling procedure (the one used by the samplers in the diffusers library), which was supposed to give the same results as the simple one-line sampling, in fact gives better results! @potatoker @volcanik @dome272 DDPM 'residual' sampler result from diffusers (from the same model):
@RowenaHe That is very interesting! I have also done a few small experiments on CIFAR10 64x64. I only trained for around 50 epochs, but I found that increasing the weight decay of AdamW to 0.1 and decreasing the learning rate to 1e-4 completely removed the solid-color images with the regular sampling algorithm. I think this runaway effect is most likely caused by excessively large weights. As for the learning rate, I'm not really sure whether it made any difference. Could you tell me where exactly you got this new sampling procedure? BTW your results look really good now!
@volcanik Hugging Face has a public repository, diffusers, with several plug-in schedulers, including DDPMScheduler, DDIMScheduler, etc. You can install it with "pip install diffusers". To use a scheduler, you only need "from diffusers import DDPMScheduler" and "scheduler = DDPMScheduler()", and then replace the sampling with another loop that uses the scheduler. There are some great Jupyter notebooks demonstrating how to use diffusers. I put the link here for you to check out. https://github.com/huggingface/diffusion-models-class
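A scheduler-driven sampling loop of the kind described above might look roughly like this. It is only a sketch, not the commenter's code: `sample_with_scheduler` and `model_fn` are hypothetical names, and the function relies only on the `timesteps`, `set_timesteps()` and `step(...).prev_sample` members that diffusers schedulers expose.

```python
def sample_with_scheduler(model_fn, scheduler, x, num_inference_steps=None):
    """Run a reverse-diffusion loop driven by a diffusers-style scheduler.

    model_fn(x, t) is a hypothetical wrapper around the trained network that
    returns the predicted noise for sample x at timestep t.
    """
    if num_inference_steps is not None:
        # Optionally sample with fewer steps than were used in training.
        scheduler.set_timesteps(num_inference_steps)
    for t in scheduler.timesteps:  # iterated from high noise to low noise
        model_output = model_fn(x, t)
        # The scheduler computes x_{t-1} from x_t and the model prediction.
        x = scheduler.step(model_output, t, x).prev_sample
    return x

# Intended use (requires `pip install diffusers` and PyTorch; `model`, `y`
# and `n` are placeholders, and how `t` is passed depends on your network):
#   import torch
#   from diffusers import DDPMScheduler
#   scheduler = DDPMScheduler(num_train_timesteps=1000)
#   with torch.no_grad():
#       x = sample_with_scheduler(lambda x, t: model(x, t, y),
#                                 scheduler, torch.randn(n, 3, 32, 32))
```

The scheduler's beta schedule and number of training timesteps need to match the ones the model was trained with; otherwise the predicted noise will not correspond to the noise level the scheduler assumes.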
@RowenaHe Thank you for your suggestions. We are also trying to get this DDPM version to work on our own dataset, and get very bright colors. We will try to clamp The scheduler you mention works on a conditional model. Is that the DDPM_conditional from this github? I don't see what |
You guys are so awesome! Was having all of the above issues and you helped me solve them ❤️ @RowenaHe @thomasnicolet
I think y is the conditional label |
Hiya! So what worked for me was clamping the output between -1 and 1 at each timestep when using the cosine schedule. This produces the greyish images that were posted earlier in this thread. Then, to fix the greyish images, I used code from the Hugging Face diffusers package, which is also a technique posted about in this thread. I would post my code, but it's part of coursework that's currently being marked, so I can't for obvious reasons. Doing the above two things really helped, though 😁
Thank you very much for your answer, @Dart120; I'll try to apply this information to check if I can improve my results. Good luck with your coursework!
I have finally implemented what you said, @Dart120, and it works as expected. Thank you again! For anyone reaching this thread in the future: the scheduler of the diffusers library seems to avoid the problem of values outside the range [-1, 1] by doing something more sophisticated when you call its step() function.
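One plausible reading of "more sophisticated", as far as I can tell from the library's `clip_sample` option: instead of clamping the running sample, the DDPM step first estimates the clean image x0 from the predicted noise, clips that estimate to [-1, 1], and then forms the posterior mean from the clipped x0 and the current sample. The NumPy sketch below illustrates this idea under that assumption; the function name and signature are hypothetical, and this is not the library's code.

```python
import numpy as np

def ddpm_step_clip_x0(x_t, eps, t, beta, rng):
    """One reverse DDPM step that clips the predicted x0 (illustrative only).

    x_t: current noisy sample, eps: predicted noise, t: integer timestep,
    beta: 1-D array of betas, rng: a numpy random Generator.
    """
    alpha = 1.0 - beta
    alpha_hat = np.cumprod(alpha)
    a_hat_t = alpha_hat[t]
    a_hat_prev = alpha_hat[t - 1] if t > 0 else 1.0

    # Estimate the clean image from the noise prediction, then clip it.
    x0 = (x_t - np.sqrt(1.0 - a_hat_t) * eps) / np.sqrt(a_hat_t)
    x0 = np.clip(x0, -1.0, 1.0)

    # Posterior mean of x_{t-1} given the (clipped) x0 and x_t.
    coef_x0 = np.sqrt(a_hat_prev) * beta[t] / (1.0 - a_hat_t)
    coef_xt = np.sqrt(alpha[t]) * (1.0 - a_hat_prev) / (1.0 - a_hat_t)
    mean = coef_x0 * x0 + coef_xt * x_t

    if t == 0:
        return mean  # no noise is added on the final step
    var = beta[t] * (1.0 - a_hat_prev) / (1.0 - a_hat_t)
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)
```

When the x0 estimate already lies inside [-1, 1], the mean computed here is algebraically the same as in the simple one-line update; the two only differ when the estimate leaves that range. At `t == 0` the step returns the clipped x0 itself, so the final sample stays inside [-1, 1].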
Hello! Firstly, thank you for the awesome video and code which explain so well how the diffusion models are implemented in code.
I ran into some issues while reimplementing your code. I'm wondering if you could give me some advice on how to make it work.
I tried to use your code to generate conditional CIFAR10 images (32*32 resolution), but so far the results look rather bad. In my training, I changed the UNet input and output size to 32, adjusted the corresponding resolution in the self-attention layers, and set the batch size to 256. The number of down and up blocks, as well as the bottleneck layers, were kept the same as in your original setting. After training for 400 epochs, the generated images were almost pure color.
Then I tried adding a learning-rate warmup, up to 10 times the original lr (1e-4), over the first 1/10 of the epochs, followed by cosine annealing for the remaining epochs, and trained for 1800 epochs. But the final results still look the same as before.
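For reference, a schedule of this shape can be sketched in a few lines of plain Python. The details are assumptions, since the post does not specify them: the warmup here is linear, starts at the original lr, and the cosine phase anneals to `final_lr=0.0`. The function name is hypothetical.

```python
import math

def lr_at_epoch(epoch, total_epochs=1800, base_lr=1e-4,
                peak_factor=10.0, warmup_frac=0.1, final_lr=0.0):
    """Linear warmup to peak_factor * base_lr, then cosine annealing."""
    peak_lr = base_lr * peak_factor
    warmup_epochs = max(1, int(total_epochs * warmup_frac))
    if epoch < warmup_epochs:
        # Linear ramp from base_lr up to peak_lr (assumed shape of the warmup).
        return base_lr + (peak_lr - base_lr) * epoch / warmup_epochs
    # Cosine decay from peak_lr down to final_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```

Note that with these numbers the learning rate peaks at 1e-3, which is ten times higher than the 1e-4 that another commenter in this thread reports using.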
Do you have any ideas about what's wrong with my reimplementation? I would really appreciate any insights.