
Encoder FFmpeg NVENC

Xaymar edited this page May 16, 2023 · 64 revisions

NVIDIA NVENC (via FFmpeg) 🟢Windows Linux

With the new NVIDIA NVENC integration through FFmpeg you can achieve greater recording and stream quality, at no extra expense. Since it simply uses the FFmpeg integration and exposes it to OBS Studio, including all the necessary zero-copy logic, you can switch your stream over, set some parameters, and get started with a higher quality stream right now!

Version Information
Status Version
🔴Added 0.8
🟠Unstable 0.9
🟢Stable 0.12.0a279
⚠️Deprecated N/A
❌Removed N/A

Settings

This encoder shares some settings with all other FFmpeg-based encoders. Please take a look at this document to learn more about them.

Preset (-preset)

Changes the default values of all encoder settings, except for a few FFmpeg settings such as Bitrate. This can give you a good baseline for your own configuration. Depending on the field, the default value is shown as -1, 0 or an actual Default entry, due to a limitation of OBS's properties UI.

Tune (-tune)

Selects the tuning the encoder should use, if supported by the hardware. Available with FFmpeg 4.4 and OBS Studio 27.2.x.

H264

Settings exclusive to H.264/AVC encoding.

Profile (-profile)

The profile determines which features in the codec are supposed to be enabled, with each higher option requiring additional encoder work. While the Baseline profile has the widest support compared to Main or High, most devices released within the last 5-10 years should have no issues with High.

Generally, it is recommended to pick Main if you're targeting devices from around the iPhone 3 era, and High if you're targeting devices released within the last 8 years. If the platform offers transcoding, you can freely choose the High profile, with some exceptions (such as Mixer and WebRTC-based platforms).

Level (-level)

Level determines the maximum macroblocks per second, which limits the resolution and framerate, while also affecting some other things like the maximum number of B-Frames. This setting is best left at Automatic, as the encoder knows what the ideal setting is for the given resolution, framerate, profile, tier and bitrate.
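As a rough sketch of how the Profile and Level recommendations translate to the FFmpeg command line, the helper below prints the flags for `h264_nvenc`. The function name and the `legacy`/`modern` labels are made up for this example, and it assumes the settings map directly onto the `-profile` and `-level` options named in the headings.

```shell
# Hypothetical helper: print H.264 profile/level flags for a playback target.
h264_profile_flags() {
    case "$1" in
        legacy) profile=main ;;   # very old playback devices
        *)      profile=high ;;   # devices from roughly the last decade
    esac
    # "auto" leaves the level choice to the encoder, as recommended above.
    printf '%s\n' "-profile:v $profile -level auto"
}

h264_profile_flags modern
```

The printed flags could then be placed after `-c:v h264_nvenc` in an ffmpeg command.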

H265

Settings exclusive to H.265/HEVC encoding.

Profile (-profile)

The profile determines which features of the codec are available and enabled, while also affecting other restrictions. The Main profile supports 8-bit, Main 10-bit supports 10-bit, and Range Extended supports more than 10-bit. Any of these profiles is capable of 4:2:0, 4:2:2 and 4:4:4; however, actual support depends on the installed hardware.

Tier (-tier)

Bitrate limitations are controlled by the Tier and Level settings. The Main tier is usually meant for network streaming, while the High tier is aimed more at intermediate storage.

Level (-level)

Just like in H264/AVC, the Level determines the maximum macroblocks and the maximum bitrate as well. This affects the maximum possible resolution and quality at any given level. Unlike H264/AVC however, the support for Levels, Tiers and Profiles depends on the power of the target device instead of the age. This setting is best left at Automatic, as the encoder knows what the ideal setting is for the given resolution, framerate, profile, tier and bitrate.
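The following sketch shows one way the H.265 Profile, Tier and Level settings might be expressed for `hevc_nvenc`. The helper name and the idea of selecting the profile from a bit depth are assumptions for illustration; the `main`, `main10` and `rext` values are the profile names FFmpeg uses for this encoder.

```shell
# Hypothetical helper: pick an HEVC profile from the desired bit depth.
hevc_profile_flags() {
    case "$1" in
        8)  profile=main ;;     # 8-bit
        10) profile=main10 ;;   # 10-bit
        *)  profile=rext ;;     # more than 10-bit (Range Extended)
    esac
    # Main tier for streaming; level left on automatic as recommended above.
    printf '%s\n' "-profile:v $profile -tier main -level auto"
}

hevc_profile_flags 10
```

Whether a given profile actually works still depends on the installed GPU.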

Rate Control Options

Settings that control the final quality and size.

Mode (-rc)

NVIDIA's NVENC offers several different options for bitrate control, such as CBR, VBR, CQP and CQ. Each mode has a different target, which has to be taken into account when selecting the mode to use:

  • Constant Quantization Parameter (CQP) is used for indistinguishable or lossless recording. It compresses at a constant level based on the values set in the Quantization Parameters, with lower values producing larger output at higher quality.
  • Constant Bitrate (CBR) is used for streaming on older protocols where a relatively constant and controllable packet size matters a lot. CBR uses the Buffer Size and Bitrate Limits to achieve its goal. There is also a High Quality variant of this which enables additional options by default.
  • Variable Bitrate (VBR) is used for local archiving as well as for modern streaming protocols. It can save space when there isn't much to encode, while pushing out high quality content when it matters. VBR uses the Buffer Size, Bitrate Limits and Quantization Parameters to achieve its goal. This also comes with a High Quality variant that enables additional options by default.
    • Constant Quality (CQ) is a subset of VBR which attempts to achieve a certain quality level. It works similarly to x264's CRF, but is not a replacement for CRF and should not be treated as CRF. It requires that the Buffer Size and Target Bitrate are set to 0, while Target Quality is set to any value equal to or above 1.0.
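To make the differences between the modes more concrete, here is a minimal sketch that prints plausible FFmpeg flags for each one. The helper name is hypothetical, bitrates are given in kbit/s, the buffer size of twice the bitrate follows the test command in the Performance Measurements section, and the 1.5x maximum bitrate for VBR is an arbitrary value chosen only for illustration.

```shell
# Hypothetical helper: print rate-control flags for a mode and a value.
# For cbr/vbr the value is a bitrate in kbit/s; for cqp a QP; for cq a target quality.
nvenc_rc_flags() {
    mode="$1"
    value="$2"
    case "$mode" in
        cqp) printf '%s\n' "-rc constqp -qp $value" ;;
        cbr) printf '%s\n' "-rc cbr -b:v ${value}k -maxrate ${value}k -bufsize $((value * 2))k" ;;
        vbr) printf '%s\n' "-rc vbr -b:v ${value}k -maxrate $((value * 3 / 2))k -bufsize $((value * 2))k" ;;
        # CQ is VBR with a target quality and the target bitrate set to 0.
        cq)  printf '%s\n' "-rc vbr -cq $value -b:v 0" ;;
        *)   echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

nvenc_rc_flags cbr 6000
```

The output is meant to be inserted after `-c:v h264_nvenc` (or `hevc_nvenc`) in an ffmpeg command; the actual values should be tuned to your content and platform.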

Two Pass (-2pass)

Enables/Disables a second pass over the frames, which improves bitrate distribution and may increase or decrease quality slightly. This is an expensive feature and should be turned off first if performance is an issue. Available with OBS Studio 27.1 and earlier, or FFmpeg 4.3 and earlier. Requires additional CUDA GPU resources.

Multi-Pass (-multipass)

Controls whether the encoder performs multiple passes while encoding the video and, if so, how many and of which kind. Multiple passes are expensive and should be disabled first if performance is a problem. Available with OBS Studio 27.2 and newer, or FFmpeg 4.4 and newer. Requires additional CUDA GPU resources.

Look Ahead (-rc-lookahead)

The number of frames to "look ahead", achieved by increasing the encoding delay by N frames. Setting this to 0 disables look ahead, which also disables both 'Adaptive I/B-Frames' options automatically. Increasing the number past a certain point results in diminishing returns, but the exact point is undefined and depends on the hardware used as well as the content encoded. Requires additional CUDA GPU resources.

Adaptive I-Frames (-no-scenecut but inverted)

Adaptively insert I-Frames instead of P- or B-Frames when the use of such a frame would increase quality. Similar to x264's scenecut option, and seems to work fine for streaming services like Twitch and YouTube.

Adaptive B-Frames (-b_adapt)

Adaptively insert B-Frames instead of P-Frames, or P-Frames instead of B-Frames, when the use of such a frame would increase quality.
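The relationship between Look Ahead and the two adaptive options can be sketched as follows. The helper name is hypothetical; the flags are the ones listed in the headings above, with `-no-scenecut 0` meaning that Adaptive I-Frames stay enabled.

```shell
# Hypothetical helper: print look-ahead flags for a given number of frames.
lookahead_flags() {
    frames="$1"
    if [ "$frames" -gt 0 ]; then
        # Look ahead is active, so both adaptive options can be enabled.
        printf '%s\n' "-rc-lookahead $frames -no-scenecut 0 -b_adapt 1"
    else
        # With look ahead disabled, the adaptive options are disabled anyway.
        printf '%s\n' "-rc-lookahead 0"
    fi
}

lookahead_flags 32
```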

Limits

All the possible limits for the Rate Control modes.

Target Quality (-cq)

The quality to attempt to achieve in VBR encoding, with values closer to 1.0 being indistinguishable from the original. Values below 1 disable the quality control and let the driver handle the ideal quality level.

Target Bitrate (-b)

Ideal target bitrate to achieve, if possible.

Maximum Bitrate (-maxrate)

Ideal maximum bitrate to achieve in VBR or CQ mode.

Buffer Size (-bufsize)

Defines the ideal upper boundary for the bitrate over twice the Keyframe Interval duration.

Quantization Parameters

Minimum/Maximum QP (-qmin, -qmax)

Lower and upper limits for the calculated quantization parameters, which affect how much something can be compressed. May break any intended bitrate limits if used incorrectly.

I/P/B-Frame QP (-init_qpI, -init_qpP, -init_qpB)

Quantization parameter values for I/P/B-Frames, which have a different meaning depending on the rate control method:

  • CQP: Fixed QP values for the entire encoding duration.
  • VBR: Initial QP values for the start of the encoding session.
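For the CQP case, a minimal sketch of fixed per-frame-type QP values might look like this. The helper name is hypothetical, and the 0 to 51 range check reflects the usual H.264 QP range rather than anything stated in this document.

```shell
# Hypothetical helper: print constant-QP flags for I-, P- and B-Frames.
cqp_flags() {
    for qp in "$1" "$2" "$3"; do
        if [ "$qp" -lt 0 ] || [ "$qp" -gt 51 ]; then
            echo "QP out of range (0-51): $qp" >&2
            return 1
        fi
    done
    printf '%s\n' "-rc constqp -init_qpI $1 -init_qpP $2 -init_qpB $3"
}

cqp_flags 18 20 22
```

In VBR mode the same three `-init_qp*` options would only set the starting point for the encoding session.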

Adaptive Quantization

Adaptive Quantization adjusts bitrate distribution and usually improves subjective quality while sacrificing objective quality. This can be enabled in two different ways:

Spatial Adaptive Quantization & Strength (-spatial-aq, -aq-strength)

Adaptively adjusts the bitrate distribution within a single frame to favor smoother surfaces. The strength controls how aggressively it will drain bitrate from textured and complex areas of the frame. The strength of this redistribution can be controlled in 15 steps using the slider, with 0 being an invalid setting. Can improve subjective quality for individual frames. Requires additional CUDA GPU resources.

Temporal Adaptive Quantization (-temporal-aq)

Adaptively adjusts the bitrate distribution over multiple frames by favouring some frames over others based on their content and other information. Can improve subjective quality while the frame is in motion. Requires additional CUDA GPU resources.
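Both forms of Adaptive Quantization can be combined. The sketch below prints the corresponding `h264_nvenc` flags and rejects strengths outside the 1 to 15 range described above; the helper name is an assumption for this example.

```shell
# Hypothetical helper: print flags enabling spatial and temporal AQ.
aq_flags() {
    strength="$1"
    # 0 is not a valid strength; the slider covers 15 steps.
    if [ "$strength" -lt 1 ] || [ "$strength" -gt 15 ]; then
        echo "AQ strength must be between 1 and 15" >&2
        return 1
    fi
    printf '%s\n' "-spatial-aq 1 -aq-strength $strength -temporal-aq 1"
}

aq_flags 8
```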

Other Options

Options that do not fit into any other category.

Maximum B-Frames (-bf)

Sets the highest number of B-Frames there may be at any given point in time. When "Adaptive B-Frames" and "Look Ahead" are disabled, this sets a fixed number of B-Frames instead. Not all GPUs support B-Frames for H.264/AVC and H.265/HEVC, so ensure that yours does by checking with NVIDIA customer support. Requires additional GPU resources.

B-Frame Reference Mode (-b_ref_mode)

Controls whether, and how many, B-Frames can be used as references by the encoder, which may have an impact on encoding performance and memory requirements. Requires additional GPU resources.

Zero Latency (-zerolatency)

Attempts to remove any reordering delay on the decoding side from the encoded video, if at all possible. Requires additional GPU resources.

Weighted Prediction (-weighted_pred)

Enables weighted prediction which improves quality with luminance changes when no B-Frames are enabled. Enabling this reduces encoder performance, and it is automatically disabled if B-Frames are not set to 0. B-Frames give better quality in almost all cases. Requires additional CUDA GPU resources.

Non-reference P-Frames (-nonref_p)

If enabled, sometimes P-Frames will be marked as unreferencable and will not be used by any other frame in the encoded bitstream. Sometimes frees up some references to be used by more useful frames for improved quality, but the chances of hitting such a condition are extremely low. Requires additional CUDA GPU resources.

Reference Frames (-refs)

Defines the ideal limit of Reference Frames the encoder may use. This is not a hard limit, as the encoder may choose to use more if it needs them. Too many reference frames can significantly reduce the number of devices that can play back the encoded bitstream, as many devices on the market only implement the absolute minimum. This limit will not be reflected in tools like MediaInfo, and may even be overridden by other limits.

Low Delay Key-Frame Scale (-ldkfs)

Specifies the ratio of I-Frame bits to P-Frame bits for single frame VBV and CBR rate control mode. Has a direct impact on encoding latency and quality, in that setting it reduces quality significantly but improves latency. Mostly useful for Low Latency and Ultra Low Latency tune. Available with OBS Studio 27.2 and newer, or FFmpeg 4.4 and newer. Requires additional CUDA GPU resources.

Further Information

NVENC Generations

There is conflicting information from NVIDIA and NVIDIA on this. Their own documentation lists only 6 different generations, but their own website says that Turing and Ampere are 7th Generation.

  1. CUDA-based (any GPU)
  2. Kepler
  3. Maxwell
  4. Maxwell 2nd Generation
  5. Pascal
  6. Volta & GTX 1650
  7. GTX 1650 Super and up, Turing, Ampere
  8. 4000 Series

Sources: 1 2

Performance Measurements

You can measure the impact of options this way:

  1. Generate a colored noise file, which is usually problematic for every encoder:
    ffmpeg -hide_banner -f lavfi -y -i color=#7e7e7e:size=1920x1080:rate=60:duration=60s,noise=alls=100:allf=t+u,format=yuv420p -pix_fmt yuv420p -f yuv4mpegpipe <FILE>.y4m
    
  2. Test each option for impact with the following command (adjust as necessary):
    ffmpeg -hide_banner -s 1920x1080 -r 60/1 -pix_fmt yuv420p -color_range pc -colorspace bt709 -color_primaries bt709 -color_trc bt709 -i <FILE>.y4m -c:v h264_nvenc -preset p7 -tune hq -g 120 -b:v 20000k -maxrate 20000k -bufsize 40000k -rc cbr -rc-lookahead 32 -multipass 2 -no-scenecut 0 -b_adapt 1 -spatial-aq 1 -temporal-aq 1 -nonref_p 1 -aq-strength 7 -b_ref_mode middle -bf 4 -f null -
    
  3. Write down the fps=# part after encoding is done.
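Step 3 can be automated with a small filter. This is only a sketch: `last_fps` is a hypothetical helper, and it assumes the usual ffmpeg progress line format in which `fps=` is followed by a number.

```shell
# Hypothetical helper: read an ffmpeg log on stdin and print the last fps= value.
last_fps() {
    # ffmpeg separates progress updates with carriage returns, so split on them first.
    tr '\r' '\n' |
        sed -n 's/.*fps= *\([0-9.]*\).*/\1/p' |
        tail -n 1
}
```

Since ffmpeg writes its progress to standard error, the helper would be used by appending `2>&1 | last_fps` to the test command from step 2.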

Measurements

The numbers below are for reference and are not guarantees, as even small differences can change a lot of numbers.

Presets with Multi-Pass
1920x1080 NV12, CBR, HQ
H.264 on RTX 3090

                                 P1   P2   P3   P4   P5   P6   P7
Single Pass                     458  460  315  317  237  235  235
Two Pass at Quarter Resolution  384  386  384  371  318  313  315
Two Pass at Full Resolution     241  241  241  236  212  212  211

Profiles at each Preset
1920x1080, CBR, HQ
H.264 on RTX 3090

                        P1   P2   P3   P4   P5   P6   P7
High 4:4:4 Predictive  448  448  315  311  235  235  234
High                   489  488  315  311  235  235  234
Main                   487  485  391  388  297  298  293
Baseline               763  760  409  411  308  298  293

Look Ahead

       0     >0
FPS  106     99
%    100  93.40
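
The percentage row is each FPS value relative to the run with Look Ahead disabled. A minimal sketch of that arithmetic, using a hypothetical `relative_fps` helper:

```shell
# Hypothetical helper: express a measured FPS as a percentage of a baseline FPS.
relative_fps() {
    awk -v base="$1" -v value="$2" 'BEGIN { printf "%.2f\n", value / base * 100 }'
}

relative_fps 106 99   # prints 93.40
```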