[model] support open-sora-plan v1.2 (#222)
* rename

* update

* update readme

* update

* readme

* readme

* polish

* update
oahzxl authored Sep 27, 2024
1 parent ff918ec commit 868c489
Showing 7 changed files with 3,829 additions and 58 deletions.
9 changes: 5 additions & 4 deletions README.md
@@ -8,12 +8,13 @@ An easy and efficient system for video generation
 </p>
 
 ### Latest News 🔥
+- [2024/09] Support [Vchitect-2.0](https://github.com/Vchitect/Vchitect-2.0) and [Open-Sora-Plan v1.2.0](https://github.com/PKU-YuanGroup/Open-Sora-Plan).
 - [2024/08] 🔥 Evolve from [OpenDiT](https://github.com/NUS-HPC-AI-Lab/VideoSys/tree/v1.0.0) to <b>VideoSys: An easy and efficient system for video generation.</b>
-- [2024/08] 🔥 <b>Release PAB paper: [Real-Time Video Generation with Pyramid Attention Broadcast](https://arxiv.org/abs/2408.12588).</b>
-- [2024/06] Propose Pyramid Attention Broadcast (PAB)[[paper](https://arxiv.org/abs/2408.12588)][[blog](https://oahzxl.github.io/PAB/)][[doc](./docs/pab.md)], the first approach to achieve <b>real-time</b> DiT-based video generation, delivering <b>negligible quality loss</b> without <b>requiring any training</b>.
+- [2024/08] 🔥 Release PAB paper: <b>[Real-Time Video Generation with Pyramid Attention Broadcast](https://arxiv.org/abs/2408.12588)</b>.
+- [2024/06] 🔥 Propose Pyramid Attention Broadcast (PAB) [[paper](https://arxiv.org/abs/2408.12588)][[blog](https://oahzxl.github.io/PAB/)][[doc](./docs/pab.md)], the first approach to achieve <b>real-time</b> DiT-based video generation, delivering <b>negligible quality loss</b> without <b>requiring any training</b>.
 - [2024/06] Support [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan) and [Latte](https://github.com/Vchitect/Latte).
-- [2024/03] Propose Dynamic Sequence Parallel (DSP)[[paper](https://arxiv.org/abs/2403.10266)][[doc](./docs/dsp.md)], achieves **3x** speed for training and **2x** speed for inference in Open-Sora compared with sota sequence parallelism.
-- [2024/03] Support [Open-Sora: Democratizing Efficient Video Production for All](https://github.com/hpcaitech/Open-Sora).
+- [2024/03] 🔥 Propose Dynamic Sequence Parallel (DSP) [[paper](https://arxiv.org/abs/2403.10266)][[doc](./docs/dsp.md)], which achieves **3x** training speed and **2x** inference speed in Open-Sora compared with state-of-the-art sequence parallelism.
+- [2024/03] Support [Open-Sora](https://github.com/hpcaitech/Open-Sora).
 - [2024/02] 🎉 Release [OpenDiT](https://github.com/NUS-HPC-AI-Lab/VideoSys/tree/v1.0.0): An Easy, Fast and Memory-Efficient System for DiT Training and Inference.

# About
26 changes: 23 additions & 3 deletions examples/open_sora_plan/sample.py
@@ -2,17 +2,18 @@
 
 
 def run_base():
-    # num frames: 65 or 221
+    # open-sora-plan v1.2.0
+    # transformer_type (len, res): 93x480p 93x720p 29x480p 29x720p
     # change num_gpus for multi-gpu inference
-    config = OpenSoraPlanConfig(num_frames=65, num_gpus=1)
+    config = OpenSoraPlanConfig(version="v120", transformer_type="93x480p", num_gpus=1)
     engine = VideoSysEngine(config)
 
     prompt = "Sunset over the sea."
     # seed=-1 means random seed. >0 means fixed seed.
     video = engine.generate(
         prompt=prompt,
         guidance_scale=7.5,
-        num_inference_steps=150,
+        num_inference_steps=100,
         seed=-1,
     ).video[0]
     engine.save_video(video, f"./outputs/{prompt}.mp4")
@@ -36,7 +37,26 @@ def run_pab():
     engine.save_video(video, f"./outputs/{prompt}.mp4")
 
 
+def run_v110():
+    # open-sora-plan v1.1.0
+    # transformer_type: 65x512x512 or 221x512x512
+    # change num_gpus for multi-gpu inference
+    config = OpenSoraPlanConfig(version="v110", transformer_type="65x512x512", num_gpus=1)
+    engine = VideoSysEngine(config)
+
+    prompt = "Sunset over the sea."
+    # seed=-1 means random seed. >0 means fixed seed.
+    video = engine.generate(
+        prompt=prompt,
+        guidance_scale=7.5,
+        num_inference_steps=150,
+        seed=-1,
+    ).video[0]
+    engine.save_video(video, f"./outputs/{prompt}.mp4")
+
+
 if __name__ == "__main__":
     run_base()
     # run_low_mem()
     # run_pab()
+    # run_v110()
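The comments in run_base above enumerate the v1.2.0 transformer_type options as length-by-resolution strings (93x480p, 93x720p, 29x480p, 29x720p). As an aside, such a string splits cleanly into its (num_frames, resolution) parts; the helper below is purely illustrative and not part of the VideoSys API:

```python
# Illustrative helper (not VideoSys API): split an Open-Sora-Plan v1.2.0
# transformer_type string such as "93x480p" into (num_frames, resolution).
VALID_V120_TYPES = {"93x480p", "93x720p", "29x480p", "29x720p"}


def parse_transformer_type(transformer_type: str) -> tuple[int, str]:
    if transformer_type not in VALID_V120_TYPES:
        raise ValueError(f"unknown transformer_type: {transformer_type!r}")
    frames, resolution = transformer_type.split("x", 1)
    return int(frames), resolution


print(parse_transformer_type("93x480p"))  # (93, '480p')
```

Note that the v1.1.0 strings added in run_v110 (65x512x512, 221x512x512) use a different frames-by-height-by-width layout, which is why the set check above rejects them.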
@@ -124,7 +124,7 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
             last_ckpt_file = ckpt_files[-1]
             config_file = os.path.join(pretrained_model_name_or_path, cls.config_name)
             model = cls.from_config(config_file)
-            print("init from {}".format(last_ckpt_file))
+            # print("init from {}".format(last_ckpt_file))
             model.init_from_ckpt(last_ckpt_file)
             return model
         else:
@@ -778,7 +778,7 @@ def disable_tiling(self):
 
     def init_from_ckpt(self, path, ignore_keys=list(), remove_loss=False):
         sd = torch.load(path, map_location="cpu")
-        print("init from " + path)
+        # print("init from " + path)
         if "state_dict" in sd:
             sd = sd["state_dict"]
         keys = list(sd.keys())
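For context, the from_pretrained hunk above sits in code that loads the newest checkpoint by taking the last element of a sorted file list (ckpt_files[-1]). A minimal sketch of that selection pattern, with an assumed helper name:

```python
def pick_last_checkpoint(ckpt_files: list[str]) -> str:
    # Sketch (assumed name, not VideoSys API): sort checkpoint filenames
    # lexicographically and take the last one, mirroring ckpt_files[-1] above.
    if not ckpt_files:
        raise ValueError("no checkpoint files found")
    return sorted(ckpt_files)[-1]


print(pick_last_checkpoint(["epoch=1.ckpt", "epoch=3.ckpt", "epoch=2.ckpt"]))  # epoch=3.ckpt
```

A known pitfall of this pattern: lexicographic order only matches training order when step numbers are zero-padded ("epoch=10.ckpt" sorts before "epoch=9.ckpt").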
