Programming Generative AI

18+ hours of video taking you all the way from VAEs to near real-time Stable Diffusion with PyTorch and Hugging Face... with plenty of hands-on examples to make deep learning fun again!

This repository contains the code, slides, and examples from my Programming Generative AI video course.

Overview

Programming Generative AI is a hands-on tour of deep generative modeling, taking you from building simple feedforward neural networks in PyTorch all the way to working with large multimodal models capable of understanding both text and images. Along the way, you will learn how to train your own generative models from scratch to create an infinity of images, generate text with large language models (LLMs) similar to the ones that power applications like ChatGPT, write your own text-to-image pipeline to understand how prompt-based generative models actually work, and personalize large pretrained models like Stable Diffusion to generate images of novel subjects in unique visual styles (among other things).

Course Materials

The code, slides, and exercises in this repository are (and will always be) freely available. The corresponding videos can be purchased on:

InformIT: individual à la carte purchase (40% off with code: VIDEO40)
O'Reilly Learning: monthly subscription

The easiest way to get started (videos or not) is to use a cloud notebook environment such as Google Colab (or Kaggle, Paperspace, etc.). For convenience, I've provided links to the raw Jupyter notebooks for local development, an NBViewer link if you would like to browse the code without cloning the repo (or you can use the built-in GitHub viewer), and a Colab link if you would like to run the code interactively without setting up a local development environment (and fighting with CUDA libraries).

Notebook	Slides	NBViewer (static)	Google Colab (interactive)
Lesson 1: The What, Why, and How of Generative AI	pdf
Lesson 2: PyTorch for the Impatient	pdf
Lesson 3: Latent Space Rules Everything Around Me	pdf
Lesson 4: Demystifying Diffusion	pdf
Lesson 5: Generating and Encoding Text with Transformers	pdf
Lesson 6: Connecting Text and Images	pdf
Lesson 7: Post-Training Procedures for Diffusion Models	pdf

If you find any errors in the code or materials, please open a GitHub issue or email [email protected].

Local Setup

git clone https://github.com/jonathandinu/programming-generative-ai.git
cd programming-generative-ai

Code implemented and tested with Python 3.10.12 (other versions >=3.8 are likely to work fine but buyer beware...). To install all of the packages used across the notebooks in a local virtual environment:

# pyenv install 3.10.12
python --version
# => Python 3.10.12

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

If using pyenv to manage Python versions, pyenv should automatically use the version listed in .python-version when changing into this directory.
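To confirm from inside a notebook that the active interpreter matches the versions mentioned above, a minimal sketch like the following can help. The helper name `check_python_version` and the status labels are illustrative assumptions, not part of the repository; the version bounds are taken from the note above.

```python
import sys

# Assumed bounds, taken from the version note above:
# 3.10 is the tested release, >=3.8 is "likely to work".
MIN_VERSION = (3, 8)
TESTED_VERSION = (3, 10)


def check_python_version(version=None):
    """Return a short status string for a (major, minor) version tuple."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    if major_minor < MIN_VERSION:
        return "unsupported"
    if major_minor == TESTED_VERSION:
        return "tested"
    return "untested"


status = check_python_version()
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

An "untested" result does not mean the notebooks will fail, only that the combination has not been verified.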

Additionally, the notebooks are set up with a cell that automatically selects an appropriate device (GPU) based on what is available. On a Windows or Linux machine, both NVIDIA and AMD GPUs should work (though this has only been tested with NVIDIA). On an Apple Silicon Mac, Metal Performance Shaders will be used.

import torch

# default device boilerplate
device = (
    "cuda" # Device for NVIDIA or AMD GPUs
    if torch.cuda.is_available()
    else "mps" # Device for Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

If no compatible device can be found, the code will default to a CPU backend. This should be fine for Lessons 1 and 2, but for any of the image generation examples (pretty much everything after Lesson 2), not using a GPU will likely be uncomfortably slow. In that case, I would recommend using the Google Colab links in the table above.

Skill Level

Intermediate to advanced

Learn How To

Train a variational autoencoder (VAE) with PyTorch to learn a compressed latent space of images.
Generate and edit realistic human faces with unconditional diffusion models and SDEdit.
Use large language models such as GPT-2 to generate text with Hugging Face Transformers.
Perform text-based semantic image search using multimodal models like CLIP.
Program your own text-to-image pipeline to understand how prompt-based generative models like Stable Diffusion actually work.
Properly evaluate generative models, both qualitatively and quantitatively.
Automatically caption images using pretrained foundation models.
Generate images in a specific visual style by efficiently fine-tuning Stable Diffusion with LoRA.
Create personalized AI avatars by teaching pretrained diffusion models new subjects and concepts with DreamBooth.
Guide the structure and composition of generated images using depth- and edge-conditioned ControlNets.
Perform near real-time inference with SDXL Turbo for frame-based video-to-video translation.

Who Should Take This Course

Engineers and developers interested in building generative AI systems and applications.
Data scientists interested in working with state-of-the-art deep learning models.
Students, researchers, and academics looking for an applied or hands-on resource to complement their theoretical or conceptual knowledge.
Technical artists and creative coders who want to augment their creative practice.
Anyone interested in working with generative AI who does not know where or how to start.

Prerequisites

Comfortable programming in Python
Knowledge of machine learning basics
Familiarity with deep learning and neural networks will be helpful but is not required

Lesson Descriptions

Lesson 1: The What, Why, and How of Generative AI

Lesson 1 starts off with an introduction to what generative AI actually is, at least as it's relevant to this course, before moving into the specifics of deep generative modeling. It covers the plethora of possible multimodal models (in terms of input and output modalities) and how it is possible for algorithms to actually generate rich media seemingly out of thin air. The lesson wraps up with a bit of the formalization and theory of deep generative models, and the tradeoffs between the various types of generative modeling architectures.

Lesson 2: PyTorch for the Impatient

Lesson 2 begins with an introduction to PyTorch and deep learning frameworks in general. I show you how the combination of automatic differentiation and transparent computation on GPUs has enabled the current explosion of deep learning research and applications. Next, I show you how you can use PyTorch to implement and fit a linear regression model as a stepping stone to building much more complex neural networks. Finally, the lesson wraps up by combining all of the components that PyTorch provides to build a simple feedforward multi-layer perceptron.
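As a rough illustration of the linear regression step described above, here is a minimal sketch in plain Python rather than PyTorch, so it runs without any dependencies. The data, learning rate, and the `fit_linear` helper are made up for illustration and are not taken from the course notebooks. The gradients are written out by hand, which is exactly the bookkeeping that automatic differentiation takes over.

```python
import random

random.seed(0)

# Synthetic data from y = 2x + 1 with a little Gaussian noise (made-up values).
xs = [i / 10 for i in range(-20, 21)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.1) for x in xs]


def fit_linear(xs, ys, lr=0.1, steps=500):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = mean((w*x + b - y)^2), derived by hand here.
        # An autodiff framework would compute these from the loss for you.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_linear(xs, ys)
print(f"w={w:.2f}, b={b:.2f}")
```

In PyTorch, the two hand-derived gradient lines would be replaced by a call to `loss.backward()`, with the rest of the loop keeping the same shape.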

Lesson 3: Latent Space Rules Everything Around Me

Lesson 3 starts with a primer on how computer programs actually represent images as tensors of numbers. I cover the details of convolutional neural networks and the specific architectural features that enable computers “to see”. Next, you get your first taste of latent variable models by building and training a simple autoencoder to learn a compressed representation of input images. At the end of the lesson, you encounter your first proper generative model by adding probabilistic sampling to the autoencoder architecture to arrive at the variational autoencoder (VAE), a key component of the generative models we will encounter later in the course.
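The probabilistic sampling step that turns an autoencoder into a VAE is commonly implemented with the reparameterization trick. The sketch below assumes the encoder outputs a mean and a log-variance per latent dimension (a common convention, not something this README confirms), and uses plain Python with made-up values rather than the course's PyTorch code.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)  # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)   # all randomness lives in eps
        z.append(m + sigma * eps)
    return z


# Hypothetical encoder outputs for a 3-dimensional latent space.
mu = [0.5, -1.0, 2.0]
log_var = [0.0, -2.0, -4.0]

random.seed(0)
print(reparameterize(mu, log_var))
```

Because the noise is drawn separately from `mu` and `log_var`, the sample is a deterministic function of the encoder outputs given `eps`, which is what lets gradients flow back through the sampling step during training.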

Lesson 4: Demystifying Diffusion

Lesson 4 begins with a conceptual introduction to diffusion models, a key component in current state-of-the-art text-to-image systems such as Stable Diffusion. It is also your first real introduction to the Hugging Face ecosystem of open-source libraries, where you will see how we can use the Diffusers library to generate images from random noise. The lesson then slowly peels back the layers on the library to deconstruct the diffusion process and show you the specifics of how a diffusion pipeline actually works. Finally, you learn how to leverage the unique affordances of a diffusion model’s iterative denoising process to interpolate between images, perform image-to-image translation, and even restore and enhance images.
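Generation with a diffusion model reverses a gradual noising process, and the forward (noising) direction can be sketched in a few lines. This assumes the DDPM-style formulation with a linear variance schedule from 1e-4 to 0.02 over 1000 steps, which is a common default but may not match the notebooks. All function names are illustrative, and the code uses plain Python instead of Diffusers.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per timestep (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): how much signal survives to step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(x0, t, a_bars, rng=random):
    """Sample x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    signal = math.sqrt(a_bars[t])
    noise = math.sqrt(1.0 - a_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


a_bars = alpha_bars(linear_betas(1000))
pixels = [0.8, -0.2, 0.5, 1.0]  # a made-up "image" with values in [-1, 1]

random.seed(0)
for t in (0, 499, 999):
    print(t, round(a_bars[t], 4), add_noise(pixels, t, a_bars))
```

At early timesteps the output stays close to the original values, while at the last timestep almost no signal remains. A trained denoiser learns to walk this process backwards, one step at a time.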

Lesson 5: Generating and Encoding Text with Transformers

Just as Lesson 4 was all about images, Lesson 5 is all about text. It starts with a conceptual introduction to the natural language processing pipeline, as well as an introduction to probabilistic models of language. You then learn how you can convert text into a representation more readily understood by generative models, and explore the broader utility of representing words as vectors. The lesson ends with a treatment of the transformer architecture, where you will see how you can use the Hugging Face Transformers library to perform inference with pre-trained large language models (LLMs) to generate text from scratch.
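To make the idea of a probabilistic model of language concrete, here is a minimal sketch of a bigram model, the simplest case where each token depends only on the one before it. The toy corpus and helper names are made up for illustration. The LLMs used in the lesson condition on far longer contexts with a transformer, but they generate text with the same sample-one-token-at-a-time loop.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn the counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}


def generate(counts, start, max_tokens=8, rng=random):
    """Sample one token at a time, each conditioned on the previous token."""
    out = [start]
    # Stop at the length limit or when the last token has no known successor.
    while len(out) < max_tokens and counts[out[-1]]:
        candidates = list(counts[out[-1]].keys())
        weights = list(counts[out[-1]].values())
        out.append(rng.choices(candidates, weights=weights)[0])
    return out


corpus = "the cat sat on the mat and the cat ate".split()  # toy corpus
model = train_bigram(corpus)

print(next_token_probs(model, "the"))
random.seed(0)
print(" ".join(generate(model, "the")))
```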

Lesson 6: Connecting Text and Images

Lesson 6 starts off with a conceptual introduction to multimodal models and their requisite components. You see how contrastive language-image pre-training (CLIP) jointly learns a shared model of images and text, and learn how that shared latent space can be used to build a semantic image search engine. The lesson ends with a conceptual overview of latent diffusion models, before deconstructing a Stable Diffusion pipeline to see precisely how text-to-image systems can turn a user-supplied prompt into a never-before-seen image.
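The search step described above boils down to comparing embeddings in the shared space. Here is a minimal sketch using cosine similarity. The three-dimensional vectors are made-up stand-ins for real encoder outputs (actual CLIP embeddings have hundreds of dimensions), and the helper names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, image_embeddings, top_k=2):
    """Rank image names by similarity to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, emb), name)
        for name, emb in image_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Made-up embeddings standing in for the output of an image encoder.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.7, 0.6, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
# Made-up embedding standing in for an encoded text query.
query = [1.0, 0.2, 0.0]

print(search(query, image_embeddings))
```

In a real pipeline the image embeddings would be computed once and cached, so that each text query costs only one encoder call plus the similarity comparison.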

Lesson 7: Post-Training Procedures for Diffusion Models

Lesson 7 is all about adapting and augmenting existing pre-trained multimodal models. It starts with the more mundane, but exceptionally important, task of evaluating generative models before moving on to methods and techniques for parameter-efficient fine-tuning. You then learn how to teach a pre-trained text-to-image model such as Stable Diffusion about new styles, subjects, and conditionings. The lesson finishes with techniques that make diffusion efficient enough to approach near real-time image generation.
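As a rough sketch of why low-rank adaptation (LoRA, one parameter-efficient fine-tuning method) is cheap: instead of updating a full weight matrix, it trains two thin matrices whose product is added to the frozen weight. The layer sizes, rank, and helper names below are made up for illustration, and the code uses plain Python lists rather than PyTorch tensors.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters: full fine-tuning vs. a LoRA adapter of given rank."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, lora


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, rank):
    """Effective weight W + (alpha / rank) * B @ A; W itself stays frozen."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


full, lora = lora_param_counts(d_in=768, d_out=768, rank=4)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4f}")

# Tiny made-up example: 2x3 frozen weight with a rank-1 adapter.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
a = [[1.0, 2.0, 3.0]]  # rank x d_in
b = [[0.5], [0.0]]     # d_out x rank
print(lora_weight(w, a, b, alpha=2.0, rank=1))
```

Only `a` and `b` would receive gradient updates during fine-tuning, which is why the adapter can be stored and shared as a small file separate from the base model.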

Copyright Notice and License

For permission to use the content in your own presentation (blog posts, lectures, videos, courses, etc.) please contact [email protected].
