v2.8.0
Release Artifacts
- Docker container: tags v2.8.0-dgpu and v2.8.0-igpu
- Python wheel: pip install holoscan==2.8.0
- Debian packages: 2.8.0.1-1
- Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- The Arg class now handles the const char* type differently. Both the constructor and the assignment operators now automatically convert a value provided as const char* to a std::string. This makes it easier to pass arguments for string-valued parameters to Fragment::make_operator and related methods. For example, the "input_tensor_name" parameter of BayerDemosaicOp can now be specified via Arg("input_tensor_name", "input_frame") without having to explicitly use a std::string or string literal for the second argument to the Arg constructor. Previously, passing such a const char* value would have compiled, but led to an error when setting the parameter at application run time.
- It is now also possible to add a message-based condition that takes a "receiver" or "transmitter" argument as a positional argument to Fragment::make_operator (C++) or the operator's constructor (Python). Any "receiver" or "transmitter" parameter of the condition should be specified via a string-valued argument that takes the name of the port to which the condition applies. The SDK then automatically swaps in the actual underlying Receiver or Transmitter object used by the named port when the application is run.
- The gRPC Health Checking Service is now disabled by default, regardless of whether the --driver or --worker option is used. To enable the service, set the HOLOSCAN_ENABLE_HEALTH_CHECK environment variable.
- The CLI Packager no longer enables the health check service by default. Use the --health-check option to enable it with the CLI Runner.
- New stream-handling APIs for use from an operator's compute method have been added. Previously, several of the built-in operators used the provided CudaStreamHandler utility class to implement this type of functionality, but using it required adding the utility class as a data member of the operator and then using lower-level GXF Entity APIs to work with the streams. The CudaStreamHandler also could not be used from native Python operators. The new built-in stream handling functionality is available from both C++ and Python via public APIs that accept or return the CUDA C Runtime API's standard cudaStream_t type and do not require using any underlying GXF Entity or CudaStream classes. The new methods on InputContext are receive_cuda_stream and receive_cuda_streams. ExecutionContext now provides allocate_cuda_stream, synchronize_streams and device_from_stream methods, and OutputContext provides a new set_cuda_stream method. A new user guide section covers this stream handling capability in more detail. The prior CudaStreamHandler utility class continues to be provided for backwards compatibility, and operators currently using it will continue to interoperate seamlessly with operators using the new built-in stream handling mechanism.
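Because the health check service is now opt-in, a launcher script may want to set the variable explicitly. The following is a minimal Python sketch, not an official recipe: it assumes that the value "true" is accepted for HOLOSCAN_ENABLE_HEALTH_CHECK (the notes above only say the variable must be set), and the helper name build_env is hypothetical.

```python
import os


def build_env(enable_health_check, base_env=None):
    """Return a copy of the environment for launching a distributed app.

    Assumption: the value "true" enables the service. The release notes
    only state that the variable has to be set.
    """
    env = dict(os.environ if base_env is None else base_env)
    if enable_health_check:
        env["HOLOSCAN_ENABLE_HEALTH_CHECK"] = "true"
    else:
        # Leave the variable unset so the new default (disabled) applies.
        env.pop("HOLOSCAN_ENABLE_HEALTH_CHECK", None)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the driver or worker process of an application.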
Operators/Resources/Conditions
- The InferenceOp now has a new parameter enable_cuda_graphs, which defaults to true. Usage of CUDA Graphs had been unconditionally enabled since version 2.6 for the TensorRT backend. However, models that include loops or conditions are not supported with CUDA Graphs; for these models, CUDA Graphs must be disabled.
- When using the new stream handling APIs, the Operator will be able to find a provided CudaStreamPool resource even if a dedicated parameter has not been added for it. To do this for a C++ operator, pass an Arg of type std::shared_ptr<CudaStreamPool> to make_operator. For Python, a CudaStreamPool object can be passed as a positional argument to the operator's constructor.
- The C++ API documentation for all provided Condition classes (CountCondition, PeriodicCondition, etc.) now includes detailed descriptions of the available parameters.
- Condition classes that need to specify the "receiver" or "transmitter" object to which they apply previously could not be passed as an Arg to make_operator (C++) or as a positional argument to the operator's constructor (Python). This is because the actual receiver and transmitter objects are created automatically behind the scenes by Holoscan, and the application author does not have access to them. To resolve this, starting from release v2.8, any Condition class that takes a "receiver" argument can now specify a const char* or std::string (in C++) or str (in Python) with the name of the input port to which the condition should apply. Holoscan will automatically substitute the actual Receiver object created for that port during initialization of the condition. Similarly, any "transmitter" argument can be specified via the name of the output port corresponding to the Transmitter the condition will apply to. As a concrete example, this allows a CudaStreamCondition to be used on an input port.
In C++, assuming we have defined a class named MyOperator with an input port named "in", we could require stream synchronization for that port using:
    auto my_op = make_operator<MyOperator>(
        "my_op",
        make_condition<CudaStreamCondition>("stream_sync", Arg("receiver", "in"))
        // , any additional Arg or ArgList here
    );
or from Python using:
    my_op = MyOperator(
        self,
        CudaStreamCondition(self, receiver="in", name="stream_sync"),
        name="my_op",
        # any additional kwargs here
    )
Holoviz module
Holoinfer module
Utils
HoloHub
Documentation
Breaking Changes
Bug fixes
Issue | Description |
---|---|
4903377 | The newly introduced RMMAllocator and StreamOrderedAllocator incorrectly parse the Bytes suffix ("B") as Megabytes ("MB") for all parameters related to memory size. Please specify any memory size parameters using the "KB", "MB", "GB" or "TB" suffixes instead. |
- | A bug in the number of channels returned by FormatConverterOp in the specific case of an RGB VideoBuffer input with RGB output was fixed. The bug resulted in 4 channels instead of 3 being allocated if no resize or channel conversion was being performed. The output should now always have the correct number of channels. |
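For issue 4903377, configuration values can be screened before they reach the allocators. The following is a minimal Python sketch under stated assumptions: the helper name uses_bare_bytes_suffix is hypothetical and not part of the Holoscan API, and it assumes size values are written as a number followed by a unit suffix.

```python
import re

# Matches a number followed only by "B" (for example "512B").
# Values such as "512MB" or "16KB" do not match.
_BARE_BYTES = re.compile(r"^\s*\d+(\.\d+)?\s*B\s*$", re.IGNORECASE)


def uses_bare_bytes_suffix(size):
    """Return True if the size string relies on the plain "B" suffix."""
    return bool(_BARE_BYTES.match(size))
```

A configuration loader could call this helper on each memory size parameter and ask the user to switch to "KB", "MB", "GB" or "TB" when it returns True.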
Known Issues
This section details issues discovered during development and QA but unresolved in this release.
Issue | Description |
---|---|
4062979 | When Operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order in the graph is not guaranteed to be followed. |
4267272 | AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to missing nv-p2p.h. Expected to be addressed in IGX SW 1.0 GA. |
4384768 | No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to missing nv-p2p kernel module. Expected to be addressed in JP 6.0 GA and IGX SW 1.0 GA respectively. |
4190019 | Holoviz segfaults on multi-GPU setups when specifying the device using the --gpus flag with docker run. The current workaround is to use CUDA_VISIBLE_DEVICES in the container instead. |
4210082 | V4L2 applications segfault at exit or crash at start with '_PyGILState_NoteThreadState: Couldn't create autoTSSkey mapping' |
4339399 | High CPU usage observed with video_replayer_distributed application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the MultiThreadScheduler (with the check_recession_period_ms parameter set to 0 by default) may still experience high CPU usage. Setting the HOLOSCAN_CHECK_RECESSION_PERIOD_MS environment variable to a value greater than 0 (e.g. 1.5) can help reduce CPU usage. However, this may result in increased latency for the application until the MultiThreadScheduler switches to an event-based multithreaded scheduler. |
4318442 | UCX cuda_ipc protocol doesn't work in Docker containers on x86_64. As a workaround, we are currently disabling the UCX cuda_ipc protocol on all platforms via the UCX_TLS environment variable. |
4325468 | The V4L2VideoCapture operator only supports YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they're assumed to be equivalent to RGBA8888. |
4325585 | Applications using MultiThreadScheduler may exit early due to timeouts. This occurs when the stop_on_deadlock_timeout parameter is improperly set to a value equal to or less than check_recession_period_ms, particularly if check_recession_period_ms is greater than zero. |
4301203 | HDMI IN fails in v4l2_camera on IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes expected in IGX SW 1.0 GA. |
4384348 | UCX termination (via Ctrl+C, pressing 'Esc', or clicking the close button) is not smooth and can show multiple error messages. |
4481171 | Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads. |
4458192 | In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, there's a possibility of encountering "Address already in use" errors. A potential solution is to assign a different port number to the HOLOSCAN_HEALTH_CHECK_PORT environment variable (default: 8777), for example, by using export HOLOSCAN_HEALTH_CHECK_PORT=8780. |
4782662 | Installing Holoscan wheel 2.0.0 or later as root causes an error. |
4768945 | Distributed applications crash when the engine file is unavailable or still being generated. |
4753994 | Debugging a Python application may lead to a segfault when expanding an operator variable. |
- | Wayland: holoscan::viz::Init() with existing GLFW window fails. |
4394306 | When Python bindings are created for a C++ Operator, it is not always guaranteed that the destructor will be called prior to termination of the Python application. As a workaround to this issue, it is recommended that any resource cleanup should happen in an operator's stop() method rather than in the destructor. |
4902749 | V4L2 applications segfault at start if using underlying NVV4L2 |
4909073 | V4L2 and AJA applications in x86 container report Wayland XDG_RUNTIME_DIR not set error |
4909088 | CPP video_replayer_distributed example throws UCX errors and segfaults on close |
4911129 | HoloHub Endoscopy Tool Tracking application latency exceeds 50ms on Jetson devices |
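For issue 4458192, the port can also be chosen programmatically instead of being hard-coded. The following is a minimal Python sketch, not an official workaround: the helper names are hypothetical, and the port is only known to be free at the moment it is probed, so another process could still claim it before the application starts.

```python
import os
import socket


def pick_free_port():
    """Ask the operating system for a TCP port that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def health_check_env(base_env=None):
    """Return a copy of the environment with a fresh health check port."""
    env = dict(os.environ if base_env is None else base_env)
    env["HOLOSCAN_HEALTH_CHECK_PORT"] = str(pick_free_port())
    return env
```

Each driver or worker process started on the same host can then be given its own environment from health_check_env().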