v2.8.0

@tbirdso tbirdso released this 08 Jan 16:15
· 1 commit to main since this release
c6647da

Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

Core
  • The Arg class now handles the const char* type differently. Both the constructor and the assignment operators automatically convert a value provided as const char* to a std::string. This makes it easier to pass arguments for string-valued parameters to Fragment::make_operator and related methods. For example, the "input_tensor_name" parameter of BayerDemosaicOp can now be specified via Arg("input_tensor_name", "input_frame") without having to explicitly use a std::string or string literal for the second argument to the Arg constructor. Previously, passing such a const char* value would have compiled, but led to an error when setting the parameter at application run time.

  • It is now also possible to add a message-based condition that takes a "receiver" or "transmitter" argument as a positional argument to Fragment::make_operator (C++) or the operator's constructor (Python). Any "receiver" or "transmitter" parameter of the condition should be specified via a string-valued argument that takes the name of the port to which the condition would apply. The SDK will then take care of automatically swapping in the actual underlying Receiver or Transmitter object used by the named port when the application is run.

  • The gRPC Health Checking Service is now disabled by default, regardless of whether the --driver or --worker option is used. To enable the service, set the HOLOSCAN_ENABLE_HEALTH_CHECK environment variable.

  • The CLI Packager no longer enables the health check service by default. Use the --health-check option to enable it with the CLI Runner.

  • New stream-handling APIs for use from an operator's compute method have been added. Note that previously, several of the built-in operators used a provided CudaStreamHandler utility class to implement this type of functionality, but using it required adding this utility class as a data member on the operator and then using lower-level GXF Entity APIs to work with the streams. The CudaStreamHandler also could not be used from native Python operators. The new built-in stream handling functionality is now available from both C++ and Python via public APIs that accept or return the CUDA C Runtime API's standard cudaStream_t type and do not require using any underlying GXF Entity or CudaStream classes. The new methods on InputContext are receive_cuda_stream and receive_cuda_streams. ExecutionContext now provides allocate_cuda_stream, synchronize_streams and device_from_stream methods and OutputContext provides a new set_cuda_stream method. A new user guide section covers this new stream handling capability in additional detail. The prior CudaStreamHandler utility class continues to be provided for backwards compatibility and operators currently using it will continue to interoperate seamlessly with operators using the new built-in stream handling mechanism.
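
For the health-check change described in the list above, the following is a minimal sketch of preparing a process environment with the service switched on or left at its new default. The helper names `health_check_env` and `launch` are hypothetical, and the value "true" is an assumption about what the variable accepts; the notes only say that HOLOSCAN_ENABLE_HEALTH_CHECK must be set.

```python
import os
import subprocess


def health_check_env(base_env=None, enabled=True):
    """Return a copy of the environment with the health check toggled.

    Hypothetical helper, not part of the Holoscan SDK.
    """
    env = dict(os.environ if base_env is None else base_env)
    if enabled:
        # Assumption: "true" is an accepted truthy value for this variable.
        env["HOLOSCAN_ENABLE_HEALTH_CHECK"] = "true"
    else:
        # Leaving the variable unset keeps the v2.8 default (service disabled).
        env.pop("HOLOSCAN_ENABLE_HEALTH_CHECK", None)
    return env


def launch(command, enabled=True):
    """Start `command` (a list of program arguments) with the adjusted environment."""
    return subprocess.Popen(command, env=health_check_env(enabled=enabled))
```

A call such as `launch(["python3", "my_app.py", "--driver"])` would then start a (hypothetical) application script with the service requested explicitly rather than relying on the old default.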

Operators/Resources/Conditions
  • The InferenceOp has a new parameter, enable_cuda_graphs, which defaults to true. Since version 2.6, CUDA Graphs had been unconditionally enabled for the TensorRT backend. However, CUDA Graphs do not support models that include loops or conditions. For such models, set enable_cuda_graphs to false.

  • When using the new stream handling APIs, the Operator will be able to find a provided CudaStreamPool resource even if a dedicated parameter has not been added for it. To do this for a C++ operator, just pass an Arg of type std::shared_ptr<CudaStreamPool> to make_operator. For Python, the CudaStreamPool class can be passed as a positional argument to the operator's constructor.

  • The C++ API documentation for all provided Condition classes (CountCondition, PeriodicCondition, etc.) now includes detailed descriptions of the available parameters.

  • Condition classes that need to specify the "receiver" or "transmitter" object to which they apply previously could not be passed as an Arg to make_operator (C++) or as a positional argument to the operator's constructor (Python). This is because the actual receiver or transmitter objects are created automatically behind the scenes by Holoscan, so the application author does not have access to them. To resolve this, starting from release v2.8, any Condition class that takes a "receiver" argument can instead be given a const char* or std::string (in C++) or a str (in Python) containing the name of the input port to which the condition should apply. Holoscan will automatically substitute the actual Receiver object created for that port during initialization of the condition. Similarly, any "transmitter" argument can be specified via the name of the output port corresponding to the Transmitter the condition will apply to. As a concrete example, this allows a CudaStreamCondition to be applied to an input port, as in the following example:

In C++, assuming we have defined a class named MyOperator with an input port named "in" we could require stream synchronization for that port using:

auto my_op = make_operator<MyOperator>(
	"my_op",
	make_condition<CudaStreamCondition>("stream_sync", Arg("receiver", "in"))
	// append any additional Arg or ArgList here, each preceded by a comma
);

or from Python using

my_op = MyOperator(
	self,
	CudaStreamCondition(self, receiver="in", name="stream_sync"),
	name="my_op",
	# any additional kwargs here
)

Holoviz module
Holoinfer module
Utils
HoloHub
Documentation

Breaking Changes

Bug fixes

Issue Description
4903377 The newly introduced RMMAllocator and StreamOrderedAllocator incorrectly parse the Bytes suffix ("B") as Megabytes ("MB") for all parameters related to memory size. Please specify any memory size parameters using the "KB", "MB", "GB" or "TB" suffixes instead.
Fixed a bug in the number of channels returned by FormatConverterOp in the specific case of an RGB VideoBuffer input with RGB output. The bug caused 4 channels instead of 3 to be allocated when no resize or channel conversion was performed. The output should now always have the correct number of channels.
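
As a workaround sketch for issue 4903377 above, a small helper can turn a byte count into a size string that never uses the bare "B" suffix. The function name `format_memory_size` is hypothetical, and two things are assumptions rather than facts from these notes: that each unit step is a factor of 1024, and that whole-number values are what the allocator parameters expect. Sizes that are not a whole number of kilobytes are rounded up to the next kilobyte.

```python
import math

# Assumption: binary units (1 KB = 1024 bytes); adjust if the allocators differ.
_UNITS = (("TB", 1024**4), ("GB", 1024**3), ("MB", 1024**2), ("KB", 1024))


def format_memory_size(num_bytes):
    """Format a positive byte count using only the KB, MB, GB or TB suffixes."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    # Prefer the largest unit that represents the size exactly.
    for suffix, factor in _UNITS:
        if num_bytes % factor == 0:
            return f"{num_bytes // factor}{suffix}"
    # Otherwise round up to whole kilobytes so the bare "B" suffix is never used.
    return f"{math.ceil(num_bytes / 1024)}KB"
```

For example, `format_memory_size(8 * 1024**2)` yields "8MB", which can then be passed as the value of a memory-size parameter.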

Known Issues

This section details issues discovered during development and QA but unresolved in this release.

Issue Description
4062979 When operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order in the graph is not guaranteed to be adhered to.
4267272 AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to missing nv-p2p.h. Expected to be addressed in IGX SW 1.0 GA.
4384768 No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to missing nv-p2p kernel module. Expected to be addressed in JP 6.0 GA and IGX SW 1.0 GA respectively.
4190019 Holoviz segfaults on multi-GPU setups when specifying the device using the --gpus flag with docker run. The current workaround is to use CUDA_VISIBLE_DEVICES in the container instead.
4210082 V4L2 applications segfault at exit or crash at start with '_PyGILState_NoteThreadState: Couldn't create autoTSSkey mapping'.
4339399 High CPU usage observed with video_replayer_distributed application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the MultiThreadScheduler (with the check_recession_period_ms parameter set to 0 by default) may still experience high CPU usage. Setting the HOLOSCAN_CHECK_RECESSION_PERIOD_MS environment variable to a value greater than 0 (e.g. 1.5) can help reduce CPU usage. However, this may result in increased latency for the application until the MultiThreadScheduler switches to an event-based multithreaded scheduler.
4318442 UCX cuda_ipc protocol doesn't work in Docker containers on x86_64. As a workaround, we are currently disabling the UCX cuda_ipc protocol on all platforms via the UCX_TLS environment variable.
4325468 The V4L2VideoCapture operator only supports YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they're assumed to be equivalent to RGBA8888.
4325585 Applications using MultiThreadScheduler may exit early due to timeouts. This occurs when the stop_on_deadlock_timeout parameter is improperly set to a value equal to or less than check_recession_period_ms, particularly if check_recession_period_ms is greater than zero.
4301203 HDMI IN fails in v4l2_camera on the IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes are expected in IGX SW 1.0 GA.
4384348 UCX termination (via Ctrl+C, pressing 'Esc', or clicking the close button) is not smooth and can show multiple error messages.
4481171 Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads.
4458192 In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, there's a possibility of encountering "Address already in use" errors. A potential solution is to assign a different port number to the HOLOSCAN_HEALTH_CHECK_PORT environment variable (default: 8777), for example, by using export HOLOSCAN_HEALTH_CHECK_PORT=8780.
4782662 Installing the Holoscan wheel 2.0.0 or later as root causes an error.
4768945 Distributed applications crash when the engine file is unavailable or is still being generated.
4753994 Debugging a Python application may lead to a segfault when expanding an operator variable.
Wayland: holoscan::viz::Init() with existing GLFW window fails.
4394306 When Python bindings are created for a C++ Operator, it is not always guaranteed that the destructor will be called prior to termination of the Python application. As a workaround to this issue, it is recommended that any resource cleanup should happen in an operator's stop() method rather than in the destructor.
4902749 V4L2 applications segfault at start if using underlying NVV4L2
4909073 V4L2 and AJA applications in x86 container report Wayland XDG_RUNTIME_DIR not set error
4909088 CPP video_replayer_distributed example throws UCX errors and segfaults on close
4911129 HoloHub Endoscopy Tool Tracking application latency exceeds 50ms on Jetson devices
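
The two MultiThreadScheduler entries above (4339399 and 4325585) pull in opposite directions, so a quick sanity check on the chosen values can help. The sketch below is a hypothetical helper, not an SDK API. It assumes both values are expressed in milliseconds and flags only the combinations the table describes.

```python
def check_scheduler_settings(check_recession_period_ms, stop_on_deadlock_timeout):
    """Return notes about MultiThreadScheduler settings named in the known issues.

    Hypothetical helper; both arguments are assumed to be in milliseconds.
    """
    notes = []
    if check_recession_period_ms > 0:
        # Issue 4325585: a timeout equal to or less than the period may cause an early exit.
        if stop_on_deadlock_timeout <= check_recession_period_ms:
            notes.append(
                "4325585: stop_on_deadlock_timeout should be greater than "
                "check_recession_period_ms to avoid early exit"
            )
    else:
        # Issue 4339399: a period of 0 may cause high CPU usage in distributed applications.
        notes.append(
            "4339399: a check_recession_period_ms of 0 may cause high CPU usage "
            "in distributed applications"
        )
    return notes
```

An empty list means neither of the two documented conditions applies; it does not guarantee that the settings suit a particular application.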