Release v2.8.0 · nvidia-holoscan/holoscan-sdk

Release Artifacts

🐋 Docker container: tags v2.8.0-dgpu and v2.8.0-igpu
🐍 Python wheel: pip install holoscan==2.8.0
📦️ Debian packages: 2.8.0.1-1
📕 Documentation

See supported platforms for compatibility.

Release Notes

New Features and Improvements

Core

The Arg class now handles the const char* type differently. Both the constructor and the assignment operators now automatically convert a value provided as const char* to a std::string. This makes it easier to pass arguments for string-valued parameters to Fragment::make_operator and related methods. For example, the "input_tensor_name" parameter of BayerDemosaicOp can now be specified via Arg("input_tensor_name", "input_frame") without having to explicitly use a std::string or string literal for the second argument to the Arg constructor. Previously, passing such a const char* value would have compiled, but led to an error when setting the parameter at application run time.
It is now also possible to add a message-based condition that takes a "receiver" or "transmitter" argument as a positional argument to Fragment::make_operator (C++) or the operator's constructor (Python). Any "receiver" or "transmitter" parameter of the condition should be specified via a string-valued argument that takes the name of the port to which the condition would apply. The SDK will then take care of automatically swapping in the actual underlying Receiver or Transmitter object used by the named port when the application is run.
The gRPC Health Checking Service is now disabled by default, regardless of whether the --driver or --worker option is used. To enable the service, set the HOLOSCAN_ENABLE_HEALTH_CHECK environment variable.
The CLI Packager no longer enables the health check service by default. Use the --health-check option to enable it with the CLI Runner.
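As a minimal sketch of opting back in when an application is launched from Python, the helper below copies an environment mapping and sets HOLOSCAN_ENABLE_HEALTH_CHECK. The helper name is hypothetical and the value "true" is an assumption; the SDK documentation is the authority on accepted values.

```python
import os


def env_with_health_check(base_env=None):
    """Return a copy of the environment with the health check service enabled.

    Hypothetical helper for illustration; the value "true" is an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HOLOSCAN_ENABLE_HEALTH_CHECK"] = "true"
    return env


# The resulting mapping can be passed to subprocess.run(..., env=env)
# when starting the driver or a worker process.
base = {"PATH": "/usr/bin"}
env = env_with_health_check(base)
print(env["HOLOSCAN_ENABLE_HEALTH_CHECK"])
```

The input mapping is copied rather than modified, so the setting only affects processes started with the returned environment.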
New stream-handling APIs for use from an operator's compute method have been added. Note that previously, several of the built-in operators used a provided CudaStreamHandler utility class to implement this type of functionality, but using it required adding this utility class as a data member on the operator and then using lower-level GXF Entity APIs to work with the streams. The CudaStreamHandler also could not be used from native Python operators. The new built-in stream handling functionality is now available from both C++ and Python via public APIs that accept or return the CUDA C Runtime API's standard cudaStream_t type and do not require using any underlying GXF Entity or CudaStream classes. The new methods on InputContext are receive_cuda_stream and receive_cuda_streams. ExecutionContext now provides allocate_cuda_stream, synchronize_streams and device_from_stream methods and OutputContext provides a new set_cuda_stream method. A new user guide section covers this new stream handling capability in additional detail. The prior CudaStreamHandler utility class continues to be provided for backwards compatibility and operators currently using it will continue to interoperate seamlessly with operators using the new built-in stream handling mechanism.
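The call order inside a compute method can be sketched roughly as below. To keep the sketch runnable without the SDK or a GPU, it is written as a free function over duck-typed context objects, and FakeInput and FakeOutput are stand-ins invented for this example. Only the method names receive_cuda_stream and set_cuda_stream come from the notes above; the argument order, the integer stream handle, and whether the explicit set_cuda_stream call is needed when the stream came from receive_cuda_stream are assumptions to check against the user guide.

```python
def forward_on_stream(op_input, op_output, in_port="in", out_port="out"):
    """Sketch of a compute() body: receive a message and its CUDA stream, then emit."""
    # Receive the message first; the stream lookup refers to the received message.
    message = op_input.receive(in_port)
    # In Python the stream is assumed to be returned as an integer handle.
    stream = op_input.receive_cuda_stream(in_port)
    # ... enqueue GPU work on `stream` here ...
    # Attach the stream to the output port, then emit the data.
    op_output.set_cuda_stream(stream, out_port)
    op_output.emit(message, out_port)
    return stream


class FakeInput:
    """Stand-in for InputContext that records calls (not an SDK class)."""

    def __init__(self):
        self.calls = []

    def receive(self, port):
        self.calls.append(("receive", port))
        return {"frame": "tensor placeholder"}

    def receive_cuda_stream(self, port):
        self.calls.append(("receive_cuda_stream", port))
        return 0x1234  # made-up stream handle


class FakeOutput:
    """Stand-in for OutputContext that records calls (not an SDK class)."""

    def __init__(self):
        self.calls = []

    def set_cuda_stream(self, stream, port):
        self.calls.append(("set_cuda_stream", stream, port))

    def emit(self, message, port):
        self.calls.append(("emit", port))


fake_in, fake_out = FakeInput(), FakeOutput()
stream = forward_on_stream(fake_in, fake_out)
print(fake_in.calls)
print(fake_out.calls)
```

In a real operator the same sequence would live in the compute method, with the op_input and op_output objects supplied by Holoscan.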

Operators/Resources/Conditions

The InferenceOp now has a new parameter, enable_cuda_graphs, which defaults to true. CUDA Graphs were unconditionally enabled for the TensorRT backend with version 2.6. However, CUDA Graphs do not support models that include loops or conditions. For such models, CUDA Graphs must be disabled by setting enable_cuda_graphs to false.
When using the new stream handling APIs, the Operator will be able to find a provided CudaStreamPool resource even if a dedicated parameter has not been added for it. To do this for a C++ operator, just pass an Arg of type std::shared_ptr<CudaStreamPool> to make_operator. For Python, a CudaStreamPool object can be passed as a positional argument to the operator's constructor.
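A minimal sketch of choosing the setting per model, assuming the keyword arguments are collected in a dictionary before the operator is constructed. The helper name is hypothetical, and the other parameters that InferenceOp requires are omitted.

```python
def inference_op_kwargs(model_has_control_flow):
    """Keyword arguments for InferenceOp (hypothetical helper, incomplete on purpose)."""
    return {
        "backend": "trt",
        # Models with loops or conditions need CUDA Graphs disabled.
        "enable_cuda_graphs": not model_has_control_flow,
    }


kwargs = inference_op_kwargs(model_has_control_flow=True)
print(kwargs)
# Inside Application.compose(), the dictionary would be expanded into the
# InferenceOp constructor call together with the remaining parameters.
```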
The C++ API documentation for all provided Condition classes (CountCondition, PeriodicCondition, etc.) now includes detailed descriptions of the available parameters.
Condition classes that need to specify the "receiver" or "transmitter" object to which they apply previously could not be passed as an Arg to make_operator (C++) or as a positional argument to the operator's constructor (Python). This is because Holoscan creates the actual receiver and transmitter objects automatically behind the scenes, and the application author does not have access to them. To resolve this, starting from release v2.8, any Condition class that takes a "receiver" argument can instead be given a const char* or std::string (in C++) or a str (in Python) containing the name of the input port to which the condition should apply. During initialization of the condition, Holoscan automatically substitutes the actual Receiver object created for that port. Similarly, any "transmitter" argument can be specified via the name of the output port whose Transmitter the condition will apply to. For example, a CudaStreamCondition can be applied to an input port as follows:

In C++, assuming we have defined a class named MyOperator with an input port named "in", we could require stream synchronization for that port using:

auto my_op = make_operator<MyOperator>(
	"my_op",
	make_condition<CudaStreamCondition>("stream_sync", Arg("receiver", "in")),
	// any additional Arg or ArgList here
	);

or from Python using:

my_op = MyOperator(
	self,
	CudaStreamCondition(self, receiver="in", name="stream_sync"),
	name="my_op",
	# any additional kwargs here
)
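The name substitution described above can be pictured with a toy model. This is not Holoscan's implementation: the function, the dictionaries and the placeholder objects are all invented for illustration, and raising KeyError for an unknown port name is a choice made by this sketch.

```python
def resolve_port_names(condition_args, receivers, transmitters):
    """Replace port names in 'receiver'/'transmitter' arguments with port objects."""
    lookup = {"receiver": receivers, "transmitter": transmitters}
    resolved = dict(condition_args)
    for key, ports in lookup.items():
        value = resolved.get(key)
        if isinstance(value, str):
            # An unknown port name raises KeyError in this sketch.
            resolved[key] = ports[value]
    return resolved


receivers = {"in": object()}  # placeholder for the Receiver behind port "in"
args = resolve_port_names({"receiver": "in"}, receivers, transmitters={})
print(args["receiver"] is receivers["in"])
```

Arguments other than "receiver" and "transmitter" pass through unchanged, which mirrors the idea that only those two names are treated specially.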

Holoviz module

Holoinfer module

Utils

HoloHub

Documentation

Breaking Changes

Bug fixes

Issue	Description
4903377	The newly introduced `RMMAllocator` and `StreamOrderedAllocator` incorrectly parse the Bytes suffix ("B") as Megabytes ("MB") for all parameters related to memory size. Please specify any memory size parameters using the "KB", "MB", "GB" or "TB" suffixes instead.
	A bug in the number of channels returned by FormatConverterOp in the specific case of a RGB VideoBuffer input with RGB output was fixed. The bug resulted in 4 channels instead of 3 being allocated if no resize or channel conversion was being performed. The output should now always have the correct number of channels.
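For issue 4903377, a small pre-flight check along the following lines could flag size values that do not use one of the recommended suffixes. It is a sketch only: the function names are hypothetical, the parameter names in the sample dictionary are illustrative, and the strict uppercase pattern is an assumption rather than the allocators' actual parsing rule.

```python
import re

# Suffixes the release notes recommend while the "B" suffix is misparsed.
_SAFE_SIZE = re.compile(r"\d+(KB|MB|GB|TB)")


def has_safe_size_suffix(value):
    """Return True if value is digits followed by KB, MB, GB or TB."""
    return _SAFE_SIZE.fullmatch(value.strip()) is not None


def unsafe_size_params(params):
    """Return the names of parameters whose values lack a recommended suffix."""
    return sorted(name for name, value in params.items() if not has_safe_size_suffix(value))


# Illustrative parameter names and values, not a complete allocator configuration.
params = {"device_memory_initial_size": "512MB", "device_memory_max_size": "1073741824B"}
print(unsafe_size_params(params))
```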

Known Issues

This section details issues discovered during development and QA but unresolved in this release.

Issue	Description
4062979	When Operators connected in a Directed Acyclic Graph (DAG) are executed in a multithreaded scheduler, it is not ensured that their execution order in the graph is adhered to.
4267272	AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to missing `nv-p2p.h`. Expected to be addressed in IGX SW 1.0 GA.
4384768	No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to missing `nv-p2p` kernel module. Expected to be addressed in JP 6.0 GA and IGX SW 1.0 GA respectively.
4190019	Holoviz segfaults on multi-gpu setup when specifying device using the `--gpus` flag with `docker run`. Current workaround is to use `CUDA_VISIBLE_DEVICES` in the container instead.
4210082	v4l2 applications segfault at exit or crash at start with '_PyGILState_NoteThreadState: Couldn't create autoTSSkey mapping'
4339399	High CPU usage observed with video_replayer_distributed application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the MultiThreadScheduler (with the `check_recession_period_ms` parameter set to `0` by default) may still experience high CPU usage. Setting the `HOLOSCAN_CHECK_RECESSION_PERIOD_MS` environment variable to a value greater than 0 (e.g. `1.5`) can help reduce CPU usage. However, this may result in increased latency for the application until the MultiThreadScheduler switches to an event-based multithreaded scheduler.
4318442	UCX cuda_ipc protocol doesn't work in Docker containers on x86_64. As a workaround, we are currently disabling the UCX cuda_ipc protocol on all platforms via the `UCX_TLS` environment variable.
4325468	The `V4L2VideoCapture` operator only supports `YUYV` and `AB24` source pixel formats, and only outputs the `RGBA` GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they're assumed to be equivalent to RGBA8888.
4325585	Applications using MultiThreadScheduler may exit early due to timeouts. This occurs when the `stop_on_deadlock_timeout` parameter is improperly set to a value equal to or less than `check_recession_period_ms`, particularly if `check_recession_period_ms` is greater than zero.
4301203	HDMI IN fails in v4l2_camera on IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes expected in IGX SW 1.0 GA.
4384348	UCX termination (via Ctrl+C, pressing 'Esc', or clicking the close button) is not smooth and can show multiple error messages.
4481171	Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads.
4458192	In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, there's a possibility of encountering "Address already in use" errors. A potential solution is to assign a different port number to the `HOLOSCAN_HEALTH_CHECK_PORT` environment variable (default: `8777`), for example, by using `export HOLOSCAN_HEALTH_CHECK_PORT=8780`.
4782662	Installing Holoscan wheel 2.0.0 or later as root causes an error.
4768945	Distributed applications crash when the engine file is unavailable or is being generated.
4753994	Debugging Python application may lead to segfault when expanding an operator variable.
	Wayland: holoscan::viz::Init() with existing GLFW window fails.
4394306	When Python bindings are created for a C++ Operator, it is not always guaranteed that the destructor will be called prior to termination of the Python application. As a workaround to this issue, it is recommended that any resource cleanup should happen in an operator's `stop()` method rather than in the destructor.
4902749	V4L2 applications segfault at start if using underlying NVV4L2
4909073	V4L2 and AJA applications in x86 container report Wayland `XDG_RUNTIME_DIR not set` error
4909088	CPP `video_replayer_distributed` example throws UCX errors and segfaults on close
4911129	HoloHub Endoscopy Tool Tracking application latency exceeds 50ms on Jetson devices
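For issue 4325585, the relationship between the two MultiThreadScheduler parameters can be checked before the scheduler is configured. The helper below is a hypothetical sketch, not an SDK function, and it assumes both values are expressed in milliseconds.

```python
def deadlock_timeout_is_safe(stop_on_deadlock_timeout, check_recession_period_ms):
    """True when the timeout is strictly greater than the recession period."""
    return stop_on_deadlock_timeout > check_recession_period_ms


print(deadlock_timeout_is_safe(500, 5))
print(deadlock_timeout_is_safe(5, 5))
```

A value equal to the recession period is rejected as well, since the issue description covers timeouts that are equal to or less than check_recession_period_ms.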
