waifu2x

WolframRhodium edited this page Feb 29, 2024 · 38 revisions

Waifu2x is a well-known image super-resolution neural network for anime-style art.

Link:

Models

Includes all known publicly available waifu2x models:

  • anime_style_art: requires pre-scaled input for the scale2.0x variant
    • noise1 noise2 noise3 scale2.0x
  • anime_style_art_rgb: requires pre-scaled input for the scale2.0x variant
    • noise0 noise1 noise2 noise3 scale2.0x
  • photo: requires pre-scaled input for the scale2.0x variant
    • noise0 noise1 noise2 noise3 scale2.0x
  • ukbench: requires pre-scaled input
    • scale2.0x
  • upconv_7_anime_style_art_rgb
    • scale2.0x noise3_scale2.0x noise2_scale2.0x noise1_scale2.0x noise0_scale2.0x
  • upconv_7_photo
    • scale2.0x noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
  • cunet: tile size (block_w and block_h) must be multiples of 4.
    • noise0 noise1 noise2 noise3
    • scale2.0x
    • noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
  • upresnet10
    • scale2.0x
    • noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
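The model list above can be captured as a small lookup table. The sketch below is illustrative only: `VARIANTS`, `onnx_filename`, and `needs_prescale` are hypothetical helpers, not part of vsmlrt, and the `<family>_<variant>.onnx` naming is an assumption extrapolated from the two file names used in the Raw Model Usage section.

```python
# Variants per model family, transcribed from the list above.
_NOISE = [f"noise{n}" for n in range(4)]
_NOISE_SCALE = [f"noise{n}_scale2.0x" for n in range(4)]

VARIANTS = {
    "anime_style_art": ["noise1", "noise2", "noise3", "scale2.0x"],
    "anime_style_art_rgb": _NOISE + ["scale2.0x"],
    "photo": _NOISE + ["scale2.0x"],
    "ukbench": ["scale2.0x"],
    "upconv_7_anime_style_art_rgb": ["scale2.0x"] + _NOISE_SCALE,
    "upconv_7_photo": ["scale2.0x"] + _NOISE_SCALE,
    "cunet": _NOISE + ["scale2.0x"] + _NOISE_SCALE,
    "upresnet10": ["scale2.0x"] + _NOISE_SCALE,
}

# Families whose scale2.0x model expects input that is already upscaled.
PRESCALE_FAMILIES = {"anime_style_art", "anime_style_art_rgb", "photo", "ukbench"}


def onnx_filename(family: str, variant: str) -> str:
    """Build a file name, assuming the pattern '<family>_<variant>.onnx'."""
    if variant not in VARIANTS.get(family, []):
        raise ValueError(f"{family} has no variant {variant!r}")
    return f"{family}_{variant}.onnx"


def needs_prescale(family: str, variant: str) -> bool:
    """True when the caller must upscale the input 2x before inference."""
    return family in PRESCALE_FAMILIES and variant == "scale2.0x"


print(onnx_filename("upconv_7_anime_style_art_rgb", "scale2.0x"))
print(needs_prescale("anime_style_art_rgb", "scale2.0x"))
```

When using the vsmlrt wrapper, none of this is needed; the table is only useful for scripts that load the ONNX files directly.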

vsmlrt.py Wrapper Usage

To simplify usage, we provide a Python wrapper module, vsmlrt, that exposes the full functionality of waifu2x-caffe through a more Pythonic interface:

import vapoursynth as vs
from vapoursynth import core

from vsmlrt import Waifu2x, Waifu2xModel, Backend

src = core.std.BlankClip(format=vs.RGBS)

# backend could be:
#  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
#  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort CPU backend.
#  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#     - use device_id to select the device
#     - set cudnn_benchmark=False to reduce script reload latency when debugging, at the cost of slightly lower throughput.
#  - GPU Backend.TRT(fp16=True, device_id=0, num_streams=1): TensorRT runtime, the fastest NVIDIA GPU runtime.
# noise=-1 disables denoising; scale=2 upscales 2x.
flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend=Backend.ORT_CUDA())

Raw Model Usage

This section is mostly for reference; the recommended approach is to use the vsmlrt.py wrapper.

import vapoursynth as vs
from vapoursynth import core

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
flt = core.ov.Model(src, "upconv_7_anime_style_art_rgb_scale2.0x.onnx")

The anime_style_art, anime_style_art_rgb, photo, and ukbench models do not include built-in upscaling, so you need to upscale the input 2x with Catmull-Rom (bicubic(b=0, c=0.5)) before feeding it to the model:

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
flt = core.ov.Model(src.fmtc.resample(scale=2, kernel="bicubic", a1=0, a2=0.5), "anime_style_art_rgb_scale2.0x.onnx")
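For readers unfamiliar with the b/c notation, the following sketch evaluates the general two-parameter (Mitchell-Netravali) bicubic kernel and shows that b=0, c=0.5 yields the Catmull-Rom weights. It is a standalone illustration of the kernel formula, not the code fmtc uses internally; `bicubic_weight` is a hypothetical helper name.

```python
def bicubic_weight(x: float, b: float = 0.0, c: float = 0.5) -> float:
    """Mitchell-Netravali bicubic kernel; b=0, c=0.5 is Catmull-Rom."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2.0:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0  # the kernel has a support of two pixels on each side


# Weights of the four nearest source pixels for a sample that lies
# halfway between two of them (distances 1.5, 0.5, 0.5, 1.5).
taps = [bicubic_weight(d) for d in (-1.5, -0.5, 0.5, 1.5)]
print(taps)       # [-0.0625, 0.5625, 0.5625, -0.0625]
print(sum(taps))  # 1.0
```

The negative outer taps are what give Catmull-Rom its mild sharpening compared with a purely positive kernel.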

Notes

  • cunet networks work best when the tile size (block_w/block_h) is a multiple of 4 in the range 60 to 150.
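The tile-size constraint for cunet can be enforced with a few lines of arithmetic. The sketch below is a hypothetical helper (`cunet_tile_size` is not part of vsmlrt or any plugin); it clamps a requested size to the recommended range and snaps it to a multiple of 4.

```python
def cunet_tile_size(requested: int, lo: int = 60, hi: int = 150) -> int:
    """Clamp `requested` to [lo, hi], then snap to a multiple of 4 inside it."""
    clamped = max(lo, min(hi, requested))
    tile = clamped - clamped % 4   # round down to a multiple of 4
    if tile < lo:                  # rounding down left the range: step up once
        tile += 4
    return tile


# All sizes that satisfy the note above.
candidates = [s for s in range(60, 151) if s % 4 == 0]

print(cunet_tile_size(128))            # 128
print(cunet_tile_size(150))            # 148
print(candidates[0], candidates[-1])   # 60 148
```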

Benchmarking

Measurements: FPS / Device Memory (MB)

Device memory:

  • CPU: private memory including VapourSynth
  • GPU: device memory including context
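Each table cell below follows the "FPS / Device Memory (MB)" format, optionally followed by a note such as "(540p patch)". As a rough aid for comparing backends, the following sketch parses such cells and computes a throughput ratio. `Cell`, `parse_cell`, and `speedup` are hypothetical names introduced for illustration; the sample values are taken from the RTX 3090 FP16 table.

```python
import re
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    fps: float
    memory_mb: float
    note: Optional[str]  # e.g. "540p patch", or None when absent


_CELL = re.compile(r"^\s*([\d.]+)\s*/\s*([\d.]+)\s*(?:\((.+)\))?\s*$")


def parse_cell(text: str) -> Optional[Cell]:
    """Parse 'FPS / MB (note)'; return None for 'N/A'."""
    if text.strip() == "N/A":
        return None
    match = _CELL.match(text)
    if match is None:
        raise ValueError(f"unrecognized cell: {text!r}")
    return Cell(float(match.group(1)), float(match.group(2)), match.group(3))


def speedup(candidate: Cell, baseline: Cell) -> float:
    """Throughput ratio of `candidate` over `baseline`."""
    return candidate.fps / baseline.fps


ort_cuda = parse_cell("7.64 / 6204")      # upconv7, [1] ort-cuda
trt_2_streams = parse_cell("25.4 / 7852") # upconv7, [1] trt (2 streams)
print(f"{speedup(trt_2_streams, ort_cuda):.2f}x")  # roughly 3.3x
```

Cells measured with a patch note are not directly comparable with full-frame cells, so the note is kept alongside the numbers rather than discarded.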

RTX 3090

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. vapoursynth-waifu2x-ncnn-vulkan r4
  3. vs-mlrt v8 (driver 511.79)

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [2] vulkan (540p patch) | [3] ort-cuda | [3] trt | [3] trt (no tf32) |
| --- | --- | --- | --- | --- | --- | --- |
| upconv7 | 6.12 / 6592 | 7.22 / 5694 | 2.83 / 10578 | 7.24 / 6408 | 7.99 / 5761 | 7.86 / 5785 |
| upresnet10 | 4.72 / 5820 | N/A | N/A | 5.79 / 5634 | N/A | N/A |
| cunet | 2.70 / 18624 | N/A | 0.71 / 15082 | 3.28 / 18435 | N/A | N/A |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] vulkan | [3] ort-cuda | [3] trt | [3] trt (2 streams) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| upconv7 | 7.64 / 6204 | 13.4 / 4652 | 25.4 / 7852 | 4.20 / 20750 | 10.6 / 5764 | 16.2 / 2385 | 30.1 / 4096 |
| upresnet10 | 6.38 / 5818 | N/A | N/A | N/A | 8.15 / 5632 | N/A | N/A |
| cunet | 3.55 / 10172 | N/A | N/A | 0.91 / 7696 (540p patch) | 4.53 / 9983 | N/A | N/A |

RTX 2080 Ti

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. VapourSynth-Waifu2x-caffe r14
  3. vapoursynth-waifu2x-ncnn-vulkan r4

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [2] caffe (540p patch) | [3] vulkan (540p patch) |
| --- | --- | --- | --- | --- |
| upconv7 | 4.36 / 5922 | 4.73 / 5072 | 1.08 / 3159 | 1.40 / 10568 |
| upresnet10 | 3.31 / 5150 | N/A | 1.03 / 7280 | N/A |
| cunet | 1.77 / 5170 (540p patch) | N/A | 0.73 / 6957 (360p patch) | 0.60 / 6992 (360p patch) |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [3] vulkan (540p patch) |
| --- | --- | --- | --- | --- |
| upconv7 | 5.84 / 5278 | 11.9 / 3055 | 19.2 / 5263 | 2.60 / 5438 |
| upresnet10 | 5.14 / 5148 | N/A | N/A | N/A |
| cunet | 1.64 / 9502 | N/A | N/A | 0.88 / 7686 |

Tesla V100

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. VapourSynth-Waifu2x-caffe r14
  3. vapoursynth-waifu2x-ncnn-vulkan r4, Graphics Driver 471.68

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] caffe (540p patch) | [3] vulkan (540p patch) |
| --- | --- | --- | --- | --- | --- |
| upconv7 | 5.98 / 5065 | 6.60 / 5033 | 8.43 / 9253 | 1.63 / 3248 | 1.67 / 11197 |
| upresnet10 | 4.36 / 5061 | N/A | N/A | 1.54 / 7232 | N/A |
| cunet | 2.58 / 9155 | N/A | N/A | 1.11 / 11657 | 0.53 / 15705 |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [3] vulkan |
| --- | --- | --- | --- | --- |
| upconv7 | 10.4 / 5189 | 13.8 / 3041 | 26.2 / 5253 | 3.97 / 21369 |
| upresnet10 | 6.43 / 5059 | N/A | N/A | N/A |
| cunet | 4.10 / 9535 | N/A | N/A | 0.86 / 29848 |

Tesla A10

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23, GPU clocks locked at maximum frequency.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. vapoursynth-waifu2x-ncnn-vulkan r4, Graphics Driver 471.68

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] vulkan (540p patch) |
| --- | --- | --- | --- | --- |
| upconv7 | 6.94 / 9765 | 7.83 / 5511 | 8.61 / 9731 | 1.63 / 10892 |
| upresnet10 | 3.90 / 5665 | N/A | N/A | N/A |
| cunet | 2.20 / 18469 | N/A | N/A | 0.53 / 15397 |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] vulkan |
| --- | --- | --- | --- | --- |
| upconv7 | 9.66 / 6049 | 16.1 / 3501 | 19.9 / 5701 | 3.03 / 21075 |
| upresnet10 | 6.53 / 5663 | N/A | N/A | N/A |
| cunet | 3.26 / 10017 | N/A | N/A | 0.78 / 8011 (540p patch) |

Tesla A10G

Software: VapourSynth R58, Windows Server 2022, Graphics Driver 511.65, GPU clocks locked at maximum frequency.

Input size: 1920x1080

Backends

  1. vs-mlrt v8

Performance

FP32

| Model | [1] trt |
| --- | --- |
| upconv7 | 7.20 / 5668 |

FP16

| Model | [1] trt | [1] trt (2 streams) |
| --- | --- | --- |
| upconv7 | 16.4 / 2255 | 22.2 / 3981 |

Tesla A100 (PCIe, 40 GB)

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v6

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) |
| --- | --- | --- | --- |
| upconv7 | 17.3 / 9827 | 20.0 / 5713 | 27.2 / 10051 |
| upresnet10 | N/A | N/A | N/A |
| cunet | N/A | N/A | N/A |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) |
| --- | --- | --- | --- |
| upconv7 | 18.3 / 6111 | 32.8 / 4539 | 57.3 / 7719 |
| upresnet10 | N/A | N/A | N/A |
| cunet | N/A | N/A | N/A |

Tesla A100 (SXM4, 80 GB)

test1

Software: VapourSynth R57-A4, Windows Server 2022, Graphics Driver 516.94.

Input size: 1920x1080

Backends

  1. vs-mlrt v9

Performance

FP16

| Model | [1] trt | [1] trt (2 streams) |
| --- | --- | --- |
| upconv7 | 30.4 / 2359 | 57.4 / 4037 |
| cunet | 19.4 / 4647 | 26.9 / 8558 |

test2

  • vsmlrt v14.test2
  • driver 545.84
  • Windows Server 2022
  • VapourSynth-classic R57.A8

Waifu2x.swin_unet_art

Input size: 1920x1080 (RGBS)

Measurements: FPS / Device Memory (MB)

| Precision | TRT 1 stream | TRT 2 streams | TRT 3 streams |
| --- | --- | --- | --- |
| fp16 | 5.43 / 7623.5 | 5.77 / 14742 | 5.84 / 21857 |
| bf16 | 4.69 / 8058.3 | 4.92 / 15591 | 4.98 / 23124 |

Icelake Server

Hardware: Xeon Icelake Server 32C64T @2.90 GHz

Software: VapourSynth R57, Windows Server 2019.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. VapourSynth-Waifu2x-w2xc r8

Performance

FP32

| Model | [1] ov-cpu | [2] w2xc |
| --- | --- | --- |
| upconv7 | 1.22 / 18750 | N/A |
| upresnet10 | 1.40 / 18278 | N/A |
| cunet | 0.65 / 22447 | N/A |
| anime rgb | 0.69 / 34619 | 0.26 / 7895 |

EPYC Milan

Hardware: EPYC Milan 32C64T @2.55 GHz

Software: VapourSynth R57, Windows Server 2019.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. VapourSynth-Waifu2x-w2xc r8

Performance

FP32

| Model | [1] ov-cpu | [2] w2xc |
| --- | --- | --- |
| upconv7 | 0.36 / 19583 | N/A |
| upresnet10 | 0.35 / 18694 | N/A |
| cunet | 0.20 / 21644 | N/A |
| anime rgb | 0.20 / 34619 | 0.28 / 5398 |