
Encoder FFmpeg NVENC

Xaymar edited this page Jan 5, 2022 · 64 revisions

NVIDIA NVENC (via FFmpeg) Windows Linux Unstable

With the new NVIDIA NVENC integration through FFmpeg you can achieve greater recording and stream quality, at no extra expense. Since it simply uses the FFmpeg integration and exposes it to OBS Studio, including all the necessary zero-copy logic, you can switch your stream over, set some parameters, and get started with a higher quality stream right now!

Version Information
Status Version
Added 0.8
Unstable 0.9
Stable N/A
Deprecated N/A
Removed N/A

Guides

Settings

This encoder shares some settings with all other FFmpeg-based encoders. Please take a look at this document to learn more about them.

Preset

Changes the default values for all encoder settings except for a few FFmpeg settings like Bitrate. A field left at its default shows either -1 or an explicit Default entry. A preset is a good baseline for customized configurations: most of the time the High Quality preset performs well enough for streaming, while the High Performance preset is great for Constant Quantization Parameter (CQP) recording.

Performance Reference
FFmpeg >=4.4 p1 p2 p3 p4 p5 p6 p7
FPS 106 106 98 98 79 76 56
% 100 100 92.45 92.45 74.53 71.70 52.83
FFmpeg <=4.3 fast medium slow hp hq bd ll llhq llhp lossless losslesshp
FPS 106 98 98 106 98 98 106 98 106 48 48
% 100 92.45 92.45 100 92.45 92.45 100 92.45 100 45.28 45.28
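
The % rows in the performance tables on this page appear to be each FPS value relative to the first column of the same row. A minimal sketch of that calculation, assuming the first entry is the baseline (the function name `relative_percent` is made up for illustration):

```python
def relative_percent(fps_values):
    """Express each FPS value as a percentage of the first (baseline) value."""
    baseline = fps_values[0]
    return [round(100 * fps / baseline, 2) for fps in fps_values]


# FPS row for FFmpeg >=4.4, presets p1 through p7.
print(relative_percent([106, 106, 98, 98, 79, 76, 56]))
```

The same helper can be applied to your own measurements to compare them against the reference rows.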

H264

Settings exclusive to H.264/AVC encoding.

Profile

The profile determines which features in the codec are supposed to be enabled, with each higher option requiring additional encoder work. While the Baseline profile has the widest support compared to Main or High, most devices released within the last 5-10 years should have no issues with High.

Generally it is recommended to pick Main if you're targeting devices from around the iPhone 3 era, and High if you're targeting devices released within the last 8 years. If the platform offers transcoding, you can freely choose the High profile, with some exceptions (such as Mixer and WebRTC based platforms).

Performance Reference
baseline main high high444
FPS (p1, p2) 108 106 106 74
% 100 98.15 98.15 68.52
FPS (p3, p4) 98 98 84 73
% 100 100 85.71 74.49
FPS (p5) 80 80 62 61
% 100 100 77.50 76.25
FPS (p6) 77 76 60 59
% 100 98.70 77.92 76.62
FPS (p7) 56 56 56 56
% 100 100 100 100

Level

Level determines the maximum macroblocks per second, which limits the resolution and framerate, while also affecting some other things like the maximum number of B-Frames. This setting is best left at Automatic, as the encoder knows what the ideal setting is for the given resolution, framerate, profile and bitrate.
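
To get a feeling for how macroblocks per second constrain the level, here is a rough sketch that estimates the lowest H.264 level for a given resolution and framerate. It uses only an excerpt of the level limits (MaxMBPS and MaxFS from Table A-1 of the H.264 specification) and ignores bitrate and decoded picture buffer limits, so treat the result as a lower bound rather than what the encoder will actually select. The function name is hypothetical.

```python
import math

# (level, max macroblocks per second, max macroblocks per frame).
# Excerpt only; intermediate levels are omitted for brevity.
H264_LEVELS = [
    ("3.1", 108000, 3600),
    ("4", 245760, 8192),
    ("4.2", 522240, 8704),
    ("5.1", 983040, 36864),
    ("5.2", 2073600, 36864),
]


def minimum_h264_level(width, height, fps):
    """Return the lowest listed level that fits, or None if none does."""
    # A macroblock covers 16x16 pixels; partial blocks count as whole ones.
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    mbs_per_second = mbs_per_frame * fps
    for level, max_mbps, max_frame_size in H264_LEVELS:
        if mbs_per_frame <= max_frame_size and mbs_per_second <= max_mbps:
            return level
    return None


print(minimum_h264_level(1920, 1080, 60))
```

This illustrates why leaving the setting at Automatic is usually the safer choice: the real decision also depends on profile and bitrate.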

H265

Settings exclusive to H.265/HEVC encoding.

Profile

The profile determines which features of the codec are available and enabled, while also affecting other restrictions. The Main profile is capable of 8-bit, Main 10-bit is capable of 10-bit, and Range Extended is capable of more than 10-bit. Any of these profiles are capable of 4:2:0, 4:2:2 and 4:4:4, however the support depends on the installed hardware.

Tier

Bitrate limitations are controlled with the Tier and Level settings. The Main tier is usually meant for network streaming, while the High tier is more aimed towards intermediate storage.

Level

Just like in H264/AVC, the Level determines the maximum macroblocks and the maximum bitrate as well. This affects the maximum possible resolution and quality at any given level. Unlike H264/AVC however, the support for Levels, Tiers and Profiles depends on the power of the target device instead of the age. This setting is best left at Automatic, as the encoder knows what the ideal setting is for the given resolution, framerate, profile, tier and bitrate.

Rate Control Options

Settings that control the final quality and size.

Mode

NVIDIA's NVENC offers several different options for bitrate control, such as CBR, VBR, CQP and CQ. Each mode has different targets, which have to be taken into account when selecting the mode to use:

  • Constant Quantization Parameter (CQP) is used for visually indistinguishable or lossless recording. It applies a constant level of compression, based on the values set in the Quantization Parameters, with lower values producing larger files at higher quality.
  • Constant Bitrate (CBR) is used for streaming on older protocols where a relatively constant and controllable packet size matters a lot. CBR uses the Buffer Size and Bitrate Limits to achieve its goal. There is also a High Quality variant of this which enables additional options by default.
  • Variable Bitrate (VBR) is used for local archiving as well as for modern streaming protocols. It can save space when there isn't much to encode, while pushing out high quality content when it matters. VBR uses the Buffer Size, Bitrate Limits and Quantization Parameters to achieve its goal. This also comes with a High Quality variant that enables additional options by default.
  • Constant Quality (CQ) is, like the name implies, a constant quality encoding method with the option of having an upper bitrate limit. This mode could be considered similar to x264's CRF, and achieves comparable quality. It is enabled by setting the Mode to any of the VBR options, setting Buffer Size and Target Bitrate to 0, and setting the Target Quality to a non-zero value up to 100.
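
For readers who want to try these modes on the FFmpeg command line, the following sketch maps each mode to h264_nvenc/hevc_nvenc arguments. The function and its parameter names are hypothetical, and the mapping is an assumption about how the modes translate, not a description of what the plugin does internally. Note that FFmpeg's own `-cq` option uses a 0 to 51 scale, so `cq` here is assumed to already be on that scale rather than the 0 to 100 Target Quality slider.

```python
def rate_control_args(mode, bitrate_kbps=0, max_bitrate_kbps=0,
                      buffer_kbit=0, qp=23, cq=23):
    """Build FFmpeg NVENC rate control arguments for one mode (sketch)."""
    if mode == "cqp":
        # Fixed quantizer: output size follows from the QP alone.
        return ["-rc", "constqp", "-qp", str(qp)]
    if mode == "cbr":
        # Target and maximum bitrate are the same for constant bitrate.
        return ["-rc", "cbr", "-b:v", f"{bitrate_kbps}k",
                "-maxrate", f"{bitrate_kbps}k", "-bufsize", f"{buffer_kbit}k"]
    if mode == "vbr":
        return ["-rc", "vbr", "-b:v", f"{bitrate_kbps}k",
                "-maxrate", f"{max_bitrate_kbps}k", "-bufsize", f"{buffer_kbit}k"]
    if mode == "cq":
        # VBR with a target bitrate of 0 and a quality target instead.
        args = ["-rc", "vbr", "-cq", str(cq), "-b:v", "0"]
        if max_bitrate_kbps:
            # Optional upper bitrate limit.
            args += ["-maxrate", f"{max_bitrate_kbps}k"]
        return args
    raise ValueError(f"unknown rate control mode: {mode}")


print(" ".join(rate_control_args("cbr", bitrate_kbps=6000, buffer_kbit=12000)))
```

The resulting list can be spliced into an `ffmpeg ... -c:v h264_nvenc` command after the codec selection.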

Two Pass

Enables/Disables a second pass over the frames, which improves bitrate distribution and may increase or decrease quality slightly. This is an expensive feature and should be turned off first if performance is an issue. Requires additional CUDA GPU resources.

Performance Reference
Disabled Quarter Res Full Res
FPS 106 82 57
% 100 77.36 53.77

Look Ahead

The number of frames to "look ahead", achieved by increasing the encoding delay by N frames. Setting this to 0 disables look ahead, which also disables both 'Adaptive I/B-Frames' options automatically. Increasing the number past a certain point results in diminishing returns, but the exact point is undefined and depends on the hardware used as well as the content encoded. Requires additional CUDA GPU resources.

Performance Reference
0 >0
FPS 106 99
% 100 93.40

Adaptive I-Frames

Adaptively insert I-Frames instead of P- or B-Frames when the use of such a frame would increase quality. Similar to x264's scenecut option, and seems to work fine for streaming services like Twitch and YouTube.

Adaptive B-Frames

Adaptively insert B-Frames instead of P-Frames, or P-Frames instead of B-Frames, when the use of such a frame would increase quality.

Limits

All the possible limits for the Rate Control modes.

Target Quality

Ideal quality to achieve in CQ mode. Values closer to 0 are higher quality, while values closer to 100 are worse quality. A value of 0 disables CQ mode.

Target Bitrate

Ideal target bitrate to achieve, if possible.

Maximum Bitrate

Ideal maximum bitrate to achieve in VBR or CQ mode.

Buffer Size

Defines the ideal upper boundary for the bitrate over twice the Keyframe Interval duration.

Quantization Parameters

Minimum/Maximum QP

Lower and upper limits for the calculated quantization parameters, which affect how much something can be compressed. May break any intended bitrate limits if used incorrectly.

I/P/B-Frame QP

Quantization parameter values for I/P/B-Frames, which have different meanings depending on the rate control method:

  • CQP: Fixed QP values for the entire encoding duration.
  • VBR: Initial QP values for the start of the encoding session.

Adaptive Quantization

Adaptive Quantization helps improve bitrate distribution over the frame (Spatial) or multiple frames (Temporal).

Spatial Adaptive Quantization & Strength

Spatial Adaptive Quantization improves the bitrate distribution and/or quality slightly by redistributing bitrate towards low-complexity areas (flat, smooth surfaces). In some high-motion games, enabling this actually results in lower quality; more testing is required. The strength of this redistribution can be controlled in 16 steps using the slider. Requires additional CUDA GPU resources.

Temporal Adaptive Quantization

Temporal Adaptive Quantization improves the bitrate distribution and/or quality slightly by favoring I-Frames for low-motion or constant parts of a frame. This improves the quality for those parts, while freeing up some bitrate for the rest of the frame. Requires additional CUDA GPU resources.

Other Options

Options that do not fit into any other category.

Maximum B-Frames

Sets the maximum number of B-Frames there might be at any given point in time. When "Adaptive B-Frames" and "Look Ahead" are disabled, sets a fixed number of B-Frames instead. Not all GPUs support B-Frames for H.264/AVC and H.265/HEVC, so ensure that yours does by checking with NVIDIA customer support.

B-Frame Reference Mode

Controls if B-Frames can be used as reference or not, which has no performance impact and improves quality at the same time. The following options are available:

  • Disabled (none): Disables using B-Frames as reference, which reduces quality and compression efficiency.
  • (# of B-Frames)/2 (middle): Enables using half the number of B-Frames as reference, favoring the "middle" of the IPB group.
  • Each (each): Enables using all the B-Frames as reference, only available for H.265/HEVC.

Zero Latency

Requires testing. Seems to remove reordering delay from certain types of encoding.

Weighted Prediction

Enables weighted prediction which improves quality with luminance changes when no B-Frames are enabled. Enabling this reduces encoder performance, and it is automatically disabled if B-Frames are not set to 0. B-Frames give better quality in almost all cases. Requires additional CUDA GPU resources.
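
Several options on this page switch each other off: Look Ahead at 0 disables both Adaptive I/B-Frames options, Weighted Prediction is disabled whenever B-Frames are not 0, and the Each reference mode is only available for H.265/HEVC. The sketch below collects those rules in one place. The dictionary keys and the function name are hypothetical and do not correspond to the plugin's real setting identifiers.

```python
def resolve_dependencies(settings):
    """Return a copy of settings with the documented overrides applied."""
    resolved = dict(settings)

    # Look Ahead of 0 disables both adaptive frame type options.
    if resolved.get("look_ahead", 0) == 0:
        resolved["adaptive_i_frames"] = False
        resolved["adaptive_b_frames"] = False

    # Weighted Prediction only works when no B-Frames are used.
    if resolved.get("max_b_frames", 0) != 0:
        resolved["weighted_prediction"] = False

    # The "each" B-Frame reference mode is documented as HEVC only.
    if resolved.get("b_ref_mode") == "each" and resolved.get("codec") != "hevc":
        raise ValueError("b_ref_mode 'each' is only available for H.265/HEVC")

    return resolved


example = resolve_dependencies({
    "codec": "h264",
    "look_ahead": 0,
    "adaptive_i_frames": True,
    "max_b_frames": 2,
    "weighted_prediction": True,
    "b_ref_mode": "middle",
})
print(example)
```

Running a planned configuration through a check like this before a benchmark avoids measuring options that end up silently disabled.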

Non-reference P-Frames

Requires testing. Seems to reduce references to some not-so-useful P-Frames. No solid data on this yet.

Access Unit Delimiter

Requires testing.

Decoded Picture Buffer Size

Requires testing.

Further Information

NVENC Generations

NVIDIA's own sources give conflicting information on this. Their documentation lists only 6 different generations, but their website says that Turing and Ampere are 7th Generation.

  1. CUDA-based (any GPU)
  2. Kepler
  3. Maxwell
  4. Maxwell 2nd Generation
  5. Pascal
  6. Volta & GTX 1650
  7. GTX 1650 Super and up, Turing, Ampere
  8. Not yet released

Sources: 1 2

Performance Measurements

You can measure impact of options this way:

  1. Generate a colored noise file which usually is problematic for every encoder:
    ffmpeg -hide_banner -f lavfi -y -i color=#7e7e7e:size=3840x1080:rate=60:duration=60s,noise=alls=100:allf=t+u,format=nv12 -c:v rawvideo <path to file>.yuv
    
  2. Test each option for impact with the following command (adjust as necessary):
    ffmpeg -hide_banner -s 3840x2160 -r 60/1 -pix_fmt nv12 -color_range pc -colorspace bt709 -color_primaries bt709 -color_trc bt709 -i <path to file>.yuv -c:v h264_nvenc -preset p7 -tune hq -g 120 -b:v 20000k -maxrate 20000k -bufsize 40000k -rc cbr -rc-lookahead 32 -multipass 2 -no-scenecut 0 -b_adapt 1 -spatial-aq 1 -temporal-aq 1 -nonref_p 1 -aq-strength 7 -b_ref_mode middle -bf 4 -f null -
    
  3. Write down the fps=# part after encoding is done.
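
Reading the fps value by hand gets tedious when testing many options. The following sketch pulls the last `fps=` value out of FFmpeg's progress output, which FFmpeg writes to stderr. The helper names are hypothetical, and `run_benchmark` assumes `ffmpeg` is on the PATH and that you pass it the full argument list from step 2.

```python
import re
import subprocess


def final_fps(ffmpeg_log):
    """Return the last fps value reported in an FFmpeg log, or None."""
    # Progress lines look like "frame= 3600 fps=106 q=-0.0 ...".
    matches = re.findall(r"fps=\s*([0-9.]+)", ffmpeg_log)
    return float(matches[-1]) if matches else None


def run_benchmark(ffmpeg_args):
    """Run ffmpeg with the given arguments and return the final fps."""
    result = subprocess.run(["ffmpeg", *ffmpeg_args],
                            capture_output=True, text=True)
    return final_fps(result.stderr)


sample_log = (
    "frame= 1800 fps= 98 q=22.0 size=N/A time=00:00:30.00 speed=1.63x\r"
    "frame= 3600 fps=106 q=-0.0 Lsize=N/A time=00:01:00.00 speed=1.77x"
)
print(final_fps(sample_log))
```

Only the last match is used because earlier progress lines reflect the running average partway through the encode.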