Ryan D Burgert, Brian L. Price, Jason Kuen, Yijun Li, Michael S Ryoo
Presented at CVPR 2024, the MAGICK dataset is a comprehensive collection of over 140,000 high-quality, captioned, single-subject 1024x1024 RGBA images. It is large enough to be used for image generation tasks!
- Project Website: ryanndagreat.github.io/MAGICK
- Paper: Read it here!
Explore the dataset using our custom-built explorer: MAGICK Dataset Explorer
To use this dataset, download the index file and use its page_id and subject columns for data handling. The subject column provides the text caption for each image. Convert a page_id to its corresponding image URL using the following Python function:
def page_id_to_url(page_id):
    # Images are grouped into folders named after the first two characters of the page_id.
    return f"https://huggingface.co/datasets/OneOverZero/MAGICK/resolve/main/images/{page_id[:2]}/{page_id}.png"
- Index File: MAGICK Index (TSV Format)
- Of course, you can also clone the Hugging Face repository via
git clone https://huggingface.co/datasets/OneOverZero/MAGICK
to download the whole thing in one go. It's approximately 200GB.
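If you would rather work from the index than from a full clone, the page_id and subject columns can be paired up with Python's standard csv module. The following is a minimal sketch, not an official loader: it assumes the index is tab-separated with a header row naming the page_id and subject columns, the helper name iter_captioned_urls is hypothetical, and the sample row is made up purely for illustration.

```python
import csv
import io


def page_id_to_url(page_id):
    # Same mapping as the function given above.
    return f"https://huggingface.co/datasets/OneOverZero/MAGICK/resolve/main/images/{page_id[:2]}/{page_id}.png"


def iter_captioned_urls(tsv_lines):
    """Yield (image_url, caption) pairs from an iterable of TSV lines.

    Assumes a header row containing 'page_id' and 'subject' columns.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    for row in reader:
        yield page_id_to_url(row["page_id"]), row["subject"]


# Illustrative in-memory sample; the page_id and caption are invented.
sample = io.StringIO("page_id\tsubject\nab12cd\ta red apple\n")
for url, caption in iter_captioned_urls(sample):
    print(caption, url)
```

To run this against the real index, pass an open file handle instead of the in-memory sample, for example open(path, newline="", encoding="utf-8"), where path is wherever you saved the TSV.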