Releases: nvidia-holoscan/holoscan-sdk
v4.1.0
Release Artifacts
- 🐋 Docker container: tags `v4.1.0-cuda13`, `v4.1.0-cuda12-dgpu` and `v4.1.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==4.1.0`
- 📦️ Debian packages: `4.1.0.1-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- GPU-resident operators now support multiple input and output ports per connection.
- Renamed the existing flow-oriented graph API to use the canonical `FlowGraph`/`FlowGraphImpl` C++ names and the Python `holoscan.flow_graphs` module, while reserving the `Graph` name and `core/graphs` path for future use.
- The `EventBasedScheduler` now exposes advanced performance-tuning parameters from the underlying GXF scheduler that can reduce lock contention and improve scaling with higher worker thread counts. New optimizations include work stealing between worker queues, a worker-side post-check fast path that bypasses the dispatcher for READY/WAIT_TIME transitions, and sharding of internal notification and wait-state queues. In this release, the work-stealing (`enable_queue_stealing`) and post-check fast path (`enable_worker_postcheck_fastpath`) optimizations are not enabled by default, to better preserve v4.0 scheduling behavior and to gather more real-world experience with these options. See the `EventBasedScheduler` documentation for details on each parameter.
- The underlying GXF runtime now automatically uses an entity pool to reuse message entities, eliminating repeated entity creation/destruction overhead in high-throughput pipelines. This is transparent and requires no application changes. To disable pooling (e.g., for debugging), set the environment variable `GXF_ENTITY_POOL_SIZE=0` before launching the application.
- `Fragment::add_subgraph` now takes ownership of the subgraph, enabling a factory pattern where subgraphs of runtime-determined type can be created externally and added to a fragment or parent subgraph.
- Added `Fragment::subgraphs()` and `Subgraph::nested_subgraphs()` accessors for inspecting the subgraph hierarchy (e.g., for graph visualization).
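The entity-pool toggle above can be applied from the launch environment. A minimal sketch (the application binary name is hypothetical):

```shell
# Disable GXF message-entity pooling (e.g., for debugging) before launching
export GXF_ENTITY_POOL_SIZE=0
# ./my_holoscan_app   # hypothetical application binary
```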
Operators/Resources/Conditions
- HolovizOp now supports 16-bit signed and unsigned integer RGB, 32-bit floating point RGB, and 16-bit floating point R, RGB and RGBA image formats.
Examples
Holoviz module
- Added support for the `R16G16B16_UNORM`, `R16G16B16_SNORM`, `R16G16B16_SFLOAT` and `R32G32B32_SFLOAT` image formats.
Holoinfer module
Utils
HoloHub
Source build & Release Artifacts
Documentation
Breaking Changes
- Removed the `holoscan.cli` Python module stub that was deprecated in v2.9. Users who need CLI functionality should install the standalone `holoscan-cli` package via `pip install holoscan-cli`.
Bug fixes
| Issue | Description |
|---|---|
| 5606400, 5929120 | Resolved Python GIL crash (Fatal Python error: take_gil: PyMUTEX_LOCK(gil->mutex) failed or SIGSEGV) that occurred when running the V4L2 camera Python example on AGX Orin iGPU and AGX Thor iGPU. The issue was caused by libv4l2 plugin loading, which corrupted glibc TLS destructor pointers through dlopen/dlclose of NVIDIA V4L2 codec plugins, as well as a stale PyThreadState left in CPython's GILState TSS, leading to a poisoned PyGILState_Ensure() on GXF worker threads. The immediate solution was to avoid using libv4l2 wrappers in V4L2VideoCaptureOp. A permanent fix is being developed on the NVIDIA V4L2 plugin side. |
| 5933258 | Fixed SIGSEGV in __nptl_deallocate_tsd during thread teardown when running the V4L2 camera C++ example on AGX Orin iGPU and AGX Thor iGPU. Root cause: v4l2_open() loaded all NVIDIA V4L2 plugins indiscriminately, including the Thor/OpenRM CUVID plugin (libv4l2_nvcuvidvideocodec.so) whose libEGL/libGLdispatch dependency chain registered a pthread_key_create TLS destructor that became a dangling pointer after dlclose. Fix: replace v4l2_open()/v4l2_close() with raw POSIX open()/close() in V4L2VideoCaptureOp. A permanent fix is being developed on the NVIDIA V4L2 plugin side. |
| 5946872 | Fixed CuPy failing with Permission denied when the container runs as a non-root user other than the default Ubuntu user (UID 1000), e.g. docker run -u 1001:1001. CuPy was writing its kernel cache under $HOME/.cupy/kernel_cache, which may not exist or be writable for arbitrary UIDs. Official Holoscan Docker images now set CUPY_CACHE_DIR=/tmp/.cupy/kernel_cache (world-writable), and the run script exports a writable cache path for dev-container workflows. Custom images that omit that ENV should set CUPY_CACHE_DIR to a writable directory before running Python. |
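For custom images that omit the `CUPY_CACHE_DIR` environment variable, a minimal sketch of the workaround described for issue 5946872 (any directory writable by the container user works):

```shell
# Point CuPy's kernel cache at a directory writable by any UID
export CUPY_CACHE_DIR=/tmp/.cupy/kernel_cache
mkdir -p "$CUPY_CACHE_DIR"
```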
Known Issues
| Issue | Description |
|---|---|
| 5211869 | On IGX Orin dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code. |
| 5928213 | On IGX Thor dGPU (Blackwell), holoviz_conditions C++ example application launches with no frame detected and reports the error VK_KHR_present_wait is not available. |
v4.0.0
Release Artifacts
- 🐋 Docker container: tags `v4.0.0-cuda13`, `v4.0.0-cuda12-dgpu` and `v4.0.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==4.0.0`
- 📦️ Debian packages: `4.0.0.0-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- GPU-resident graph (C++): Holoscan SDK introduces a new execution mode, called GPU-resident graph execution, that keeps the CUDA compute pipeline on the GPU for the lifetime of the application, reducing CPU scheduling overhead, CUDA kernel launch cost, and CPU-GPU coordination, and improving deterministic, low-latency behavior.
  - GPU-resident operators and fragments: Operators inheriting from `holoscan::GPUResidentOperator` use device memory for ports and are captured into CUDA Graphs. All operators in a fragment must be GPU-resident for that fragment to use GPU-resident graph execution; currently only linear chains of operators are supported.
  - Port declaration: Ports can be declared by memory block size (executor-allocated shared device memory) or by device pointer (operator-managed); see the GPU-resident user guide for connection semantics.
  - Data ready handler (optional): A separate GPU-resident fragment can be registered as a data ready handler to run at the start of each iteration and decide when input data is ready, enabling sensor-driven pipelines with GPU-direct technologies (e.g., Holoscan Sensor Bridge) without host CPU involvement.
  - Host control API: After `run_async()`, use `Fragment::gpu_resident()` to signal data ready, poll result ready, and configure timeout and optional host sync; see the GPU-resident user guide and examples.
  - GPU-resident graph execution is supported in C++ only; Python support is planned for a future release.
- Data Flow Tracking - programmatic access and probing: Data flow tracking now supports richer observability and targeting of arbitrary operators.
  - Programmatic access in operators: Operators can read data flow tracking information during execution via `get_data_flow_tracking_label()` (C++/Python). This returns a `MessageLabel` for a given input port, with path names, number of paths, and timing information, enabling adaptive behavior, debugging, and in-operator runtime analysis without waiting for post-run results.
  - Tracking to arbitrary operators: Use `DataFlowTracker::add_probe_operator()` to treat any intermediate (non-root, non-leaf) operator as a probe point. The tracker then reports latency from root operators to the probed operator and the number of messages published by that operator. Probe operators are currently validated with `DoubleBufferTransmitter`/`DoubleBufferReceiver` and default connection configurations; see the Data Flow Tracking documentation for details and the `flow_tracker` example for usage.
Operators/Resources/Conditions
- New classes related to publish/subscribe (pub/sub) messaging were added (`PubSubContext`, `PubSubReceiver`, `PubSubTransmitter`) as wrappers around the underlying GXF pub/sub interfaces. In this release, Holoscan does not ship a concrete pub/sub backend implementation, so these APIs should be considered experimental scaffolding for future releases. Relatedly, `IOSpec` now includes `topic` and `qos` methods to describe the intended pub/sub topic and QoS settings, which will become active once a concrete backend is provided.
Examples
- The `flow_tracker` example demonstrates the new data flow tracking capabilities: programmatic access via `get_data_flow_tracking_label()` inside an operator and tracking to intermediate operators via `add_probe_operator()`.
- Added `gpu_resident_example`, `gpu_resident_input`, and `gpu_resident_inference_example` demonstrating GPU-resident execution: operator chains, device memory ports, device-driven data ready control (in `gpu_resident_input`), and GPU-resident inference with the TensorRT (TRT) backend (in `gpu_resident_inference_example`).
- Added the `matx_allocator` example (`examples/matx/matx_allocator/`) demonstrating how to use the `MatXAllocator` adapter with `RMMAllocator` in a Holoscan pipeline with DLPack tensor interop.
Holoviz module
Holoinfer module
Utils
- Added the `MatXAllocator` utility class (`holoscan/utils/matx_allocator.hpp`), a lightweight adapter that enables any Holoscan SDK allocator (e.g., `RMMAllocator`, `BlockMemoryPool`, `StreamOrderedAllocator`) to be used with MatX GPU tensor operations. It supports stream-aware async allocation/deallocation when used with `CudaAllocator`-derived allocators and a CUDA stream.
HoloHub
Source build & Release Artifacts
- Added Holoscan Sensor Bridge 2.5.0 basic feature functionality in the Holoscan SDK developer container environment. Holoscan Sensor Bridge libraries and tools are built with RDMA over Converged Ethernet (RoCE) support targeting discrete GPU platforms with a ConnectX Network Interface Card (NIC), such as DGX Spark or IGX platforms. Libraries may be found under the `/opt/nvidia/hololink` path in the discrete GPU developer containers. Please refer to the Holoscan Sensor Bridge User Guide for detailed requirements.
  - Note that the containerized Holoscan Sensor Bridge libraries do not support integrated GPU networking stacks (Jetson, IGX Orin 500). Please rebuild Holoscan Sensor Bridge from source for iGPU platform support.
  - Holoscan Sensor Bridge support for x86_64 configurations is experimental and not formally verified. Holoscan Sensor Bridge binaries have been included in the Holoscan SDK x86_64 development container to support developers with Holoscan Sensor Bridge emulation via software loopback. We do not recommend using real Holoscan Sensor Bridge hardware with x86_64 platforms at this time.
  - Please run the container as the root user to enable full Holoscan Sensor Bridge runtime features.
- Updated Holoscan SDK CMake import rules with improved handling for external dependencies. Holoscan SDK C++ and Python binary distributions (Debian packages, Python wheels, etc.) traditionally contain runtime libraries and development rules for dependencies such as RMM, spdlog, yaml-cpp, and more, to reduce the downstream developer burden. In this version we improve `holoscan-config.cmake` and associated files in the Holoscan SDK binary distribution to describe embedded dependencies with dedicated project configuration files such as `rmm-config.cmake`, and rely on CMake's `find_package` mechanism to relax dependency embeddings.
  - Downstream projects may override Holoscan SDK embedded CMake transitive dependencies with `find_package`:

    ```cmake
    # Find RMM from a custom installation in your development environment
    find_package(rmm HINTS path/to/my/rmm)

    # Holoscan SDK will favor the custom RMM installation over its own embedded distribution
    find_package(holoscan)
    ```

  - Downstream projects may depend directly on the Holoscan SDK embedded dependencies:

    ```cmake
    # Populate any embedded dependencies not already provided
    find_package(holoscan)

    # If not set in advance, will use the RMM installation embedded in the Holoscan SDK distribution
    find_package(rmm)  # result: rmm_DIR path is already set to the Holoscan SDK installation
    # ...
    target_link_libraries(mylibrary PUBLIC rmm::rmm)  # dependency imported targets are available
    ```

  - Note that some header-only dependencies (concurrentqueue, MatX) remain in the Holoscan SDK distribution without explicit CMake imported targets to describe them.
- Removed the legacy Holoscan SDK runtime Dockerfile, which is no longer supported. Please refer to Holoscan CLI and HoloHub sample apps for guidance and examples for writing an optimized Dockerfile for your Holoscan-based project.
- The NGC `holoscan_dev_deb` resource (Debian packages up to HSDK 0.6) has been archived and removed from NGC. Debian packages are now distributed exclusively via NVIDIA Holoscan downloads; use the container image, Python wheel, or Debian package options listed in the Release Artifacts section above.
Documentation
Breaking Changes
- Removed the `holoscan.cli` Python module stub that was deprecated in v2.9. Users who need CLI functionality should install the standalone `holoscan-cli` package via `pip install holoscan-cli`.
Bug fixes
| Issue | Description |
|---|---|
| 5601128 | On IGX Orin iGPU, fix segmentation fault by ensuring matx::make_tensor does not fall back from managed memory to host-pinned memory during allocation |
| 5898338 | Fix error where InferenceOp ignores CudaStreamPool passed positionally (only the named Arg works) |
Known Issues
| Issue | Description |
|---|---|
| 5211869 | On IGX Orin dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code. |
| 5606400 | On... |
v3.11.0
Release Artifacts
- 🐋 Docker container: tags `v3.11.0-cuda13`, `v3.11.0-cuda12-dgpu` and `v3.11.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==3.11.0`
- 📦️ Debian packages: `3.11.0.0-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- Stream-Aware Deallocation: Added automatic CUDA stream propagation to `Tensor` and `VideoBuffer` components during `OutputContext::emit()` to enable pool-based allocators (`BlockMemoryPool`, `RMMAllocator`, `StreamOrderedAllocator`) to defer memory reuse until GPU operations complete on the associated stream. This prevents potential race conditions where memory could be returned to the pool while GPU kernels were still reading from it because an operator returned from compute while GPU work was still in progress.
  For sink operators (operators that consume data but don't emit it), a new `set_deallocation_stream(cudaStream_t stream)` method has been added to `holoscan::Tensor` (`holoscan.Tensor.set_deallocation_stream(stream)` in Python). Sink operators that return asynchronously while GPU work is ongoing should call this method on received tensors to inform the allocator which stream last accessed the tensor's memory. The method returns `true` if the stream was set successfully, or `false` for tensors not managed by a Holoscan/GXF allocator (e.g., tensors from CuPy or PyTorch via DLPack).
  Stream propagation can be disabled by setting the environment variable `HOLOSCAN_DISABLE_ENTITY_STREAM_PROPAGATION=1`. See the CUDA Stream Handling documentation for more details.
- Subgraph API Enhancements: Several improvements have been made to the `Subgraph` class to simplify authoring and usage:
  - Config file support: The `Subgraph` constructor now accepts an optional `config_file` parameter to specify a YAML configuration file for the subgraph, enabling `from_config()` during `compose()`.
  - Broadcast input ports: Input interface ports can now be connected to multiple internal operators, enabling a single external port to broadcast data to multiple destinations within the subgraph.
  - Simplified `add_interface_port`: The internal operator port name is now optional (it defaults to the external port name), and the port direction (input/output) can be auto-detected from the operator's port definitions.
  - `Fragment::add_subgraph` (`Fragment.add_subgraph` in Python): New method to add a subgraph without interface ports that doesn't need to be connected via `add_flow`.
  - `Subgraph::operators()` (`Subgraph.operators` in Python): New method to retrieve all operators belonging to the subgraph and its nested subgraphs.
  - Convenience wrappers: Added `add_data_logger` and `register_service` methods to `Subgraph` that delegate to the parent `Fragment`.
  - Python bindings: The `InterfacePort` class and interface port retrieval methods (`interface_ports`, `get_interface_operator_port`) are now exposed in Python.
- The `MetadataDictionary` object returned by the C++ `Operator::metadata()` method now has a `deep_copy()` method that creates independent copies of both the dictionary structure and all contained `MetadataObject` instances, preserving each entry's `MetadataPolicy`. For Python, the dict-like `MetadataDictionary` object returned by the `Operator.metadata` property implements the `__copy__` (shallow copy) and `__deepcopy__` (calls `deep_copy()`) dunder methods, so `copy.copy()` and `copy.deepcopy()` from Python's built-in `copy` module behave accordingly. The deep copy creates truly independent metadata dictionaries where modifications to one won't affect another. Note that values stored as `std::shared_ptr<T>` are not deep-copied; only the pointer is copied.
- A unique, descriptive default name is assigned to all C++ Operator, Condition, Resource, Scheduler, and DataLogger classes provided by the SDK. Previously, for C++, a generic name such as "unnamed_condition" or "unnamed_resource" was used if the user did not provide a name string as the first argument to `make_condition`, `make_resource`, etc. The default names in C++ improve consistency with the existing Python API. This change should not require any changes to existing applications, but it should make logging more informative where default names were used.
- All scheduler classes (`EventBasedScheduler`, `GreedyScheduler`, `MultiThreadScheduler`) support a new `network_connection_timeout` parameter, which defaults to 5000 ms. This parameter is used only by distributed applications. During the initial phase when network connections are being established, this `network_connection_timeout` (in ms) is used instead of `stop_on_deadlock_timeout` for deadlock detection. The parameter should be set long enough to allow sufficient time for UCX connections to be established between all fragments without triggering false deadlock detection. After the `network_connection_timeout` period has expired, the application switches to using the standard `stop_on_deadlock_timeout` for deadlock detection. Single-fragment applications ignore `network_connection_timeout` and use only `stop_on_deadlock_timeout`.
- For distributed applications, if `Application::scheduler` (`Application.scheduler` in Python) is called, it will now automatically set that scheduler for all fragments of the application unless those fragments already have their own scheduler set via `Fragment::scheduler`. Previously, the `Application::scheduler` method only worked for non-distributed applications and had no effect for distributed applications.
- Various inconsistencies in custom scheduler assignment for distributed applications have been resolved. Whenever the user explicitly sets a scheduler via `Fragment::scheduler` (`Fragment.scheduler` in Python), that scheduler type is now always respected and the user-specified parameters are applied (unless overridden by environment variables). As before, when the environment variables `HOLOSCAN_STOP_ON_DEADLOCK`, `HOLOSCAN_STOP_ON_DEADLOCK_TIMEOUT`, `HOLOSCAN_MAX_DURATION_MS` and/or `HOLOSCAN_CHECK_RECESSION_PERIOD_MS` are specified, the values they specify will override any corresponding argument that was set for the scheduler passed to `Fragment::scheduler`. Similarly, if `HOLOSCAN_DISTRIBUTED_APP_SCHEDULER` is set and that scheduler type does not match the user-defined one passed to `Fragment::scheduler`, the scheduler specified by the environment variable will be used.
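The `__copy__`/`__deepcopy__` semantics described for `MetadataDictionary` above can be illustrated with a plain-Python stand-in (this `MetaDict` class is illustrative only, not the real Holoscan type):

```python
import copy

class MetaDict:
    """Illustrative stand-in for MetadataDictionary's copy semantics."""
    def __init__(self, data=None):
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __copy__(self):
        # Shallow copy: new dictionary structure, shared value objects
        return MetaDict(self._data)

    def __deepcopy__(self, memo):
        # Deep copy: independent copies of all contained values
        return MetaDict({k: copy.deepcopy(v, memo) for k, v in self._data.items()})

meta = MetaDict({"roi": [0, 0, 64, 64]})
shallow = copy.copy(meta)
deep = copy.deepcopy(meta)

meta["roi"].append(99)
# The shallow copy shares the list object with `meta`; the deep copy does not.
```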
Operators/Resources/Conditions
- `CudaStreamCondition` Enhancements: The `CudaStreamCondition` has been reimplemented to address several limitations of the original:
  - Multi-receiver support: Now supports multi-receiver ports (`IOSpec::kAnySize`) where the number of connections is not known at compile time. Specify the base port name (e.g., `"receivers"`) and all matching ports (`receivers:0`, `receivers:1`, etc.) will be automatically discovered.
  - Queue size > 1: Now supports input ports with queue sizes greater than 1. By default, all messages in the queue are checked for CUDA streams (`check_all_messages=true`). Set this to `false` to only check the first message per receiver.
  - Multiple CudaStreamId per entity: Now correctly handles GXF entities containing more than one `CudaStreamId` component.
  - Multiple port names: The new `receivers` parameter accepts a `std::vector<std::string>` (`list[str]` in Python) of receiver names to monitor multiple input ports.
  - Backwards compatibility: The legacy `receiver` parameter (single port name) is still supported but deprecated. A warning will be logged recommending migration to the `receivers` parameter.
- A new `notify_scheduler()` method has been added to the base `Condition` class (`holoscan::Condition::notify_scheduler` in C++, `holoscan.core.Condition.notify_scheduler` in Python). This method allows event-based conditions (those returning `kWaitEvent` from `check()`) to signal to the scheduler that an asynchronous event has completed and the condition should be re-evaluated. This is used internally by `CudaStreamCondition` and `AsynchronousCondition`, and is now available for custom native conditions.
- The `Receiver` class now exposes `peek()` and `sync()` methods. `peek(index)` allows conditions to inspect messages in the queue without consuming them, and `sync()` moves messages from the back stage to the main stage for double-buffered queues.
- `AsyncDataLoggerResource` (and derived classes like `AsyncConsoleLogger`) now supports a configurable `queue_type` parameter (`DataLoggerQueueType` enum) to select between two queue implementations:
  - `LockFree` (default): high-throughput lock-free queue with per-producer FIFO ordering. Best for most use cases.
  - `Ordered`: mutex-based queue with strict global FIFO ordering across all producers. Use this when strict temporal ordering is required (e.g., compliance logging, debugging, causal analysis).
  In YAML configuration, use `queue_type: "lock_free"` or `queue_type: "ordered"` (case-insensitive).
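A minimal YAML sketch of the queue selection (the resource key name here is illustrative, not a fixed SDK name):

```yaml
my_async_console_logger:   # illustrative resource name
  queue_type: "ordered"    # strict global FIFO; "lock_free" is the default (case-insensitive)
```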
Examples
Holoviz module
- New present modes are now supported:
  - `SHARED_DEMAND_REFRESH` and `SHARED_CONTINUOUS_REFRESH`, which can be used to implement front buffer rendering. These present modes are available in exclusive display (direct rendering) mode only.
  - `FIFO_LATEST_READY`, which waits for vblank, presents the current image and...
v3.10.0.post1
Post 3.10.0 Release Fixes:
- Update GXF downloading URL in the Dockerfile to use the public NVIDIA artifactory URL.
v3.10.0
Release Artifacts
- 🐋 Docker container: tags `v3.10.0-cuda13`, `v3.10.0-cuda12-dgpu` and `v3.10.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==3.10.0`
- 📦️ Debian packages: `3.10.0.0-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- The C++ API now supports more convenient and flexible assignment of an `Arg` with any integer type to a typed `Parameter<T>` (where `T` is a specific integer type). An error will be raised if the value stored in the `Arg` is outside the representable range for `Parameter<T>` (i.e., only safe conversions are allowed). Similarly, an integer-valued `Arg` can be used to set `Parameter<float>` or `Parameter<double>` as well. This allows passing arguments to `make_operator`, `make_resource`, `make_condition`, etc. without having to match the exact target type of the parameter in these common cases.
- UCX is now built with gdrcopy support for GPU Direct RDMA, enabling lower-latency GPU memory transfers in distributed applications when the gdrcopy kernel module is installed on the host.
- Holoscan now emits a warning when an input port is configured with `size > 1` (including `IOSpec.PRECEDING_COUNT`) and the default `MessageAvailableCondition` is used. In this configuration, `min_size` is implicitly set to the same value as the queue capacity, enabling batched execution. To future-proof applications and prepare for planned API evolution that will decouple queue capacity from batching, set `min_size` explicitly whenever `size > 1` where possible. For `IOSpec.PRECEDING_COUNT`, the resolved size is computed from the graph at run time; a planned `batch_size` configuration is intended to make this behavior explicitly configurable.
Operators/Resources/Conditions
- The `AsyncDataLoggerResource` class has a new parameter, `shutdown_wait_period_ms`, that controls how long (in milliseconds) the application will continue trying to log any pending messages in the queue(s) during application shutdown. This interval is respected both for normal application termination and when the application is terminated via a signal interrupt (e.g., Ctrl+C). The default value of -1 indicates that all remaining items in the queue should be logged prior to exiting.
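A minimal YAML sketch of the shutdown interval (the resource key name is illustrative):

```yaml
my_async_logger:                  # illustrative resource name
  shutdown_wait_period_ms: 2000   # allow up to 2 s to drain pending log messages; -1 (default) drains fully
```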
Examples
Holoviz module
Holoinfer module
Utils
- UCX diagnostic utilities (`ucx_info`, `ucx_perftest`) are now packaged with the Holoscan SDK for verifying UCX transport capabilities and benchmarking performance.
HoloHub
Source build & Release Artifacts
- The Holoscan SDK build process can find libtorch libraries and headers from a PyTorch wheel installation on the system.
- The build container (root Dockerfile) and dev container (on NGC) include a functional installation of PyTorch which can be used with `import torch`: version 2.8.0 for CUDA 12 images and 2.9.0 for CUDA 13 images. This supplements the existing installation of libtorch, which is still leveraged by the Holoinfer module and inference operator.
  - Note: CUDA dependencies defined by PyTorch are explicitly ignored or deleted in favor of the system-provided ones to ensure compatibility with other components, whether PyTorch is re-vendoring them or requiring them with pip.
- The Holoscan SDK iGPU dev container on NGC (which targets AGX Orin and IGX Orin iGPU) was downgraded from Ubuntu 24.04 to Ubuntu 22.04 to match JP6 and IGX SW 1.x OS versions, in order to provide compatibility with the PyTorch 2.8.0 wheel from pypi.jetson-ai-lab.io.
Documentation
Breaking Changes
- The `Subgraph` class constructor argument has been renamed from `instance_name` to `name` for consistency with other Holoscan components. The `Subgraph::instance_name()` method is now deprecated in favor of `Subgraph::name()`. The `instance_name` keyword argument and `instance_name` property are still supported in Python but will emit a deprecation warning. Users should update their code to use `name` instead.
- Planned breaking change (future release): the `size` argument to `OperatorSpec::input`/`OperatorSpec.input` will control receiver queue capacity only and will no longer implicitly enable batching by setting `MessageAvailableCondition.min_size`. Batching will require explicit configuration (e.g., set `min_size` directly, or use a planned `batch_size` configuration). Applications relying on implicit batching should set `min_size` explicitly now. For the `IOSpec.PRECEDING_COUNT` case, a planned `batch_size` configuration is intended to provide an explicit equivalent to today's "batch by preceding-count" behavior.
Bug fixes
| Issue | Description |
|---|---|
| 5749574 | "import holoscan.pose_tree" fails for "holoscan" Conda package due to incompatibility with "rapidsai" UCXX |
| 5731859, 5681319 | When signal interrupt (e.g. SIGINT) is used to terminate an application, it may not wait for pending items in the AsyncDataLoggerResource's queues to be processed (or discarded cleanly). |
| 5722190 | Restored RDMA/InfiniBand support for UCX that was missing since v3.7.0. Added RDMA build dependencies (rdma-core, libibverbs-dev, librdmacm-dev) and gdrcopy to enable libuct_ib.so, libuct_ib_mlx5.so, and libuct_rdmacm.so transports for high-performance distributed applications. |
| 5536564 | Duplicate profiling paths with 0 avg latency numbers in cyclic app |
| 5061275, 5721978 | Python operators created via the decorator API (holoscan.decorator.create_op) do not support construction by passing Subgraph as the first argument to the constructor |
Known Issues
| Issue | Description |
|---|---|
| 5764538 | Holoscan SDK CUDA 12 is not compatible with PyTorch v2.9. See Q30: How do I fix segmentation faults when using PyTorch 2.9.x with Holoscan SDK v3.10 CUDA 12? in Holoscan SDK 3.10 FAQs |
| 5606400 | On AGX Thor Jedha, v4l2_camera_usb_webcam Python application fails with segmentation fault |
| 5427783 | On Jetson, distributed_pose_tree_multinode application fails between Jetson and nano/IGX/Jetson |
| 5211869 | On IGX Orin dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code. |
v3.9.0
Release Artifacts
- 🐋 Docker container: tags `v3.9.0-cuda13`, `v3.9.0-cuda12-dgpu` and `v3.9.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==3.9.0`
- 📦️ Debian packages: `3.9.0.1-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
- Added explicit and additional support for data flow tracking with asynchronous lock-free buffers.
- Conditions note: sharing a single `BooleanCondition` instance across multiple operators is not supported. In particular, attempting to share a `window_close_condition` across multiple `HolovizOp` instances can prevent application shutdown when a window is closed. To support coordinated shutdown, Holoviz now provides a `window_close_callback` parameter that lets applications perform custom shutdown logic (e.g., `Application::stop_execution()`) when the Holoviz window is closed.
Core
- The `Subgraph` class has a new method `set_dynamic_flows`, which is a wrapper around the associated fragment's `set_dynamic_flows` method. This allows calling `set_dynamic_flows` directly when composing a fragment, providing a consistent syntax for defining dynamic flows within a subgraph vs. within a fragment. In Holoscan v3.8, using dynamic flows from a subgraph was possible but required first retrieving the subgraph's fragment and then calling that fragment's `set_dynamic_flows` method.
- Note (internal SDK changes; typical apps unaffected):
  - Improved and unified parameter error handling and diagnostics:
    - C++ `ArgumentSetter::SetterFunc` now returns `bool` to communicate success/failure.
    - `ArgumentSetter::set_param` throws with a detailed message when a setter reports failure, including `arg_type` details for faster debugging.
    - YAML decode/parse errors now emit clearer diagnostics and consistent log severity to aid configuration troubleshooting.
Operators/Resources/Conditions
- New method `ExecutionContext::is_gpu_available` can be used to check whether a GPU was detected in the system. This can be used by operators to provide an alternative code path for systems without a GPU.
- HolovizOp: added a `window_close_callback` parameter (C++ and Python) invoked when the display window is closed. This enables applications to perform custom shutdown logic (e.g., `Application::stop_execution()`), including in multi-window setups.
- Note (internal SDK changes; typical apps unaffected):
  - Aggregated error reporting when setting parameters:
    - `GXFOperator::set_parameters()` collects all parameter set failures and throws one exception summarizing all issues, including GXF error strings and codes, and the operator GXF type.
    - `GXFComponentResource::set_parameters()` aggregates GXF parameter errors in the same fashion.
    - `GXFScheduler::set_parameters()` aggregates GXF parameter errors and throws one exception summarizing all issues, including GXF error strings/codes and the scheduler GXF type.
    - `GXFNetworkContext::set_parameters()` aggregates GXF parameter errors in the same fashion.
    - `ComponentBase::update_params_from_args()` aggregates setter failures; unknown arguments are logged and included in the summary when other setter errors are present.
- The visibility of the parameter member variables of `AsyncDataLoggerResource` has been changed from private to protected. This way, loggers inheriting from this class can directly access these parameters.
Examples
- Video Replayer (C++/Python): updated the examples to wire `window_close_callback` so that closing the Holoviz window cleanly stops the application. Dual-window variants are supported.
Holoviz module
- Introduced `WindowCloseCallbackFunction` and a corresponding parameter in `HolovizOp`; the callback is executed during the window-close path. The YAML converter and Python bindings were updated to accept `window_close_callback`.
- Added tests: a C++ system test to verify callback invocation, plus Python unit/system tests to validate parameter wiring and non-intrusive behavior.
Holoinfer module
Operators
Utils
HoloHub
Documentation
Breaking Changes
- Due to improved data flow tracking with asynchronous lock-free buffers, the term `_old` is now reserved and cannot be used in operator names. If your operator names include `_old`, rename them.
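A simple pre-flight check an application could run before upgrading, to catch operator names that use the newly reserved term. `validate_operator_name` is a hypothetical helper, not a Holoscan API:

```python
# Hypothetical helper: reject operator names containing the reserved "_old"
# term introduced by data flow tracking in Holoscan v4.1.0.
def validate_operator_name(name: str) -> str:
    if "_old" in name:
        raise ValueError(
            f"operator name {name!r} contains '_old', which is reserved "
            "in Holoscan v4.1.0; please rename the operator"
        )
    return name
```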
- Note (internal SDK changes; typical apps unaffected):
  - C++ API change (for advanced/custom integrations): the `ArgumentSetter::SetterFunc` signature changed from `std::function<void(ParameterWrapper&, Arg&)>` to `std::function<bool(ParameterWrapper&, Arg&)>`.
    - If you registered custom argument setters, update them to return `true` on success and `false` on failure.
  - Exception semantics during configuration:
    - `ArgumentSetter::set_param` now throws `std::runtime_error` when a setter reports failure. Previously, failures might have only been logged.
    - Higher-level code now aggregates and throws after attempting all parameter sets (Operators/Resources/Conditions/Schedulers/Network Context). Applications that previously continued after warnings may now see a single configuration-time exception summarizing all issues.
  - Error propagation responsibilities: adaptor callbacks return `gxf_result_t`; callers compose context and throw as needed.
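The aggregated error-reporting pattern described above can be sketched language-neutrally: each setter reports success/failure via its return value, all parameters are attempted, and one exception summarizes every failure. This is a hedged illustration, not the SDK implementation (which is C++):

```python
# Hedged sketch of the aggregated-error pattern: setters return True/False
# (mirroring the new bool-returning SetterFunc), every argument is attempted,
# and a single exception summarizes all failures at the end.
def set_parameters(args: dict, setters: dict) -> None:
    failures = []
    for name, value in args.items():
        setter = setters.get(name)
        if setter is None:
            failures.append(f"{name}: unknown argument")
            continue
        if not setter(value):
            failures.append(f"{name}: setter reported failure")
    if failures:
        raise RuntimeError("parameter errors: " + "; ".join(failures))
```

Compared to failing fast on the first bad parameter, this surfaces all configuration problems in one pass.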
- If a configuration file is specified (a non-empty filename string passed to `Fragment::config` or `Application::Config`), a runtime error is now raised if the file does not exist. This changes the previous behavior, which was to log a warning and attempt to continue with default parameters. The warning was easy to miss and could lead to confusion, so an explicit error is preferable.
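A sketch of the new strict behavior. `check_config_path` is a hypothetical helper illustrating the rule (empty filename: no check; non-empty but missing: error), not Holoscan's actual implementation:

```python
from pathlib import Path

# Hedged sketch: a non-empty config filename that does not exist now raises
# instead of logging a warning and continuing with default parameters.
def check_config_path(filename: str) -> None:
    if filename and not Path(filename).exists():
        raise RuntimeError(f"configuration file does not exist: {filename}")
```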
Bug fixes
| Issue | Description |
|---|---|
| 5489793 | Holoscan 'as_tensor' methods cannot convert singleton or null PyTorch GPU Tensors |
| 4384348 | warnings and errors logged on window close (or Ctrl+C termination) of video_replayer_distributed |
| 5327270 | an error should be thrown if the specified YAML configuration file does not exist |
Known Issues
| Issue | Description |
|---|---|
| 5606400 | The v4l2_camera_usb_webcam (Python) example may segfault on Jetson AGX Thor due to unexpected py::gil_scoped_acquire behavior. |
| 5427783 | The distributed_pose_tree example may fail when distributed between two Jetson AGX Orin nodes |
| 5061275 | The ping_distributed example may fail to create a path between two Jetson AGX Orin nodes |
| 5681319 | The video_replayer_distributed example reports an error when closing the replayer window |
v3.8.0
Release Artifacts
- 🐋 Docker container: tag
v3.8.0-cuda13,v3.8.0-cuda12-dgpuandv3.8.0-cuda12-igpu - 🐍 Python wheel:
pip install holoscan==3.8.0 - 📦️ Debian packages:
3.8.0.0-1 - 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
- It is now possible to group operators into subgraphs that can be reused when building more complex, modular Holoscan applications. For concrete examples, see the new `examples/subgraph` folder as well as the corresponding test applications (C++/Python).
Core
- Codec Registry: decoupled from GXF and moved from `holoscan::gxf::CodecRegistry` to `holoscan::CodecRegistry`.
- The experimental `DataLogger` interface now has an additional `std::optional<CudaStream_t>` stream argument. This CUDA stream information is now logged to the console by `BasicConsoleLogger`, `GXFConsoleLogger` and `AsyncConsoleLogger`. The stream can optionally be used by concrete logger implementations when copying data from device to host during logging (using `cudaMemcpyAsync`).
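To illustrate how a concrete logger might consume the new optional stream argument, here is a hedged Python mock. `ConsoleLoggerSketch` is hypothetical (the real interface is C++); it simply shows stream information flowing into the logged console line, as the console loggers now do:

```python
from typing import Any, Optional

# Hypothetical mock of a console logger handling the new optional stream
# argument; real implementations could also use the stream with
# cudaMemcpyAsync when copying device data to host for logging.
class ConsoleLoggerSketch:
    def log_data(self, data: Any, unique_id: str,
                 stream: Optional[int] = None) -> str:
        stream_info = f" stream={stream}" if stream is not None else ""
        return f"[{unique_id}]{stream_info} {data!r}"
```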
Operators/Resources/Conditions
- `HolovizOp`: added specialized scheduling conditions that enable operators to synchronize their execution with display events. The `FirstPixelOutCondition` and `PresentDoneCondition` conditions are particularly useful for rate-limiting and for ensuring that frame generation is synchronized with the display's refresh cycle, reducing latency and improving visual quality. Note that `PresentDoneCondition` is not supported on Orin iGPU. Even on supported systems, applications using `PresentDoneCondition` might hang or crash when tools are used to remotely access the desktop.
Examples
- Added the `holoviz_conditions` example showing the usage of the new Holoviz scheduling conditions.
Holoviz module
Holoinfer module
Operators
Utils
HoloHub
Documentation
Breaking Changes
- The experimental `DataLogger` interface has a breaking change to its API, adding a new `std::optional<CudaStream_t>` stream argument. Concrete implementations must be updated to add this new argument to the method signatures for `log_data`, `log_tensor_data`, `log_tensormap_data` and/or `log_backend_specific`.
- The `Fragment::make_executor(ArgsT&&... args)` method was fixed so that `this` (`Fragment*`) is automatically passed as the first argument to the `Executor` before any `args`. This is unlikely to affect existing applications, as these methods are called internally by the Holoscan SDK; however, any code calling `Fragment::make_executor` directly should take note of this change.
Bug fixes
| Issue | Description |
|---|---|
| 5529488 | Fix a bug where the default condition was also added to a port when a native Python Condition with a "receiver" or "transmitter" parameter was applied |
| 5381536 | CTest cases for examples/multithread/python/multithread.py tests were updated so that the timing tests only require a system with >4 threads to pass (previously required >8 threads) |
| 5552584 | When the incoming message contains a video tensor, it will ignore in_tensor_name, even if the tensor with the correct name is not a video |
| 5489793 | Holoscan 'as_tensor' methods cannot convert singleton or null PyTorch GPU Tensors |
Known Issues
| Issue | Description |
|---|---|
| | A `Subgraph::set_dynamic_flows` convenience method for adding dynamic flows within subgraphs is missing. The workaround is to use the underlying `Fragment`, as obtained by the `Subgraph::fragment()` method (C++) / `Subgraph.fragment` property (Python), and call `Fragment::set_dynamic_flows` instead. |
v3.7.0.post1
Post 3.7.0 Release Fixes:
- Update GXF downloading URL in the Dockerfile to use the public NVIDIA artifactory URL.
v3.7.0
Release Artifacts
- 🐋 Docker container: tags `v3.7.0-cuda13`, `v3.7.0-cuda12-dgpu` and `v3.7.0-cuda12-igpu`
- 🐍 Python wheel for CUDA 12: `pip install holoscan-cu12==3.7.0`
- 🐍 Python wheel for CUDA 13: `pip install holoscan-cu13==3.7.0`
- 📦️ Debian packages: `3.7.0-3.7.0-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- Debian packages and Python wheels are now available for both CUDA 12 and CUDA 13 across Holoscan target platforms:
  - `apt install holoscan-cuda-12`, `apt install holoscan-cuda-13`, or `apt install holoscan` (defaults to CUDA 12)
  - `python3 -m pip install holoscan-cu12` or `python3 -m pip install holoscan-cu13`
- `HOLOSCAN_APP_DRIVER_PORT` environment variable: added support for overriding the default App Driver port via the `HOLOSCAN_APP_DRIVER_PORT` environment variable. This allows users to customize the port number used by distributed applications when no explicit address is specified with the `--address` option. The default port is now `57777` instead of `8765`.
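The resulting port precedence (explicit `--address` wins, then `HOLOSCAN_APP_DRIVER_PORT`, then the new default of `57777`) can be sketched as below. `resolve_driver_port` is a hypothetical helper for illustration, not a Holoscan API:

```python
import os
from typing import Dict, Optional

# Hedged sketch of the documented precedence: explicit address first, then
# the HOLOSCAN_APP_DRIVER_PORT environment variable, then the default 57777.
def resolve_driver_port(address: Optional[str] = None,
                        env: Optional[Dict[str, str]] = None) -> int:
    env = dict(os.environ) if env is None else env
    if address and ":" in address:
        return int(address.rsplit(":", 1)[1])  # port from --address host:port
    return int(env.get("HOLOSCAN_APP_DRIVER_PORT", "57777"))
```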
Operators/Resources/Conditions
- `HolovizOp`: in fullscreen mode, G-SYNC is now supported correctly, and applications flip instead of blit when displaying the next frame. This significantly improves performance and reduces latency for swap-bound applications.
- `FormatConverterOp`: added support for converting RGBA16161616 to RGB888 and RGB161616 to RGB888.
- Holoinfer: added support for using a user-provided CUDA stream pool in the `torch` backend.
- Green context: added support for the default CUDA green context pool in a fragment.
- Stream pool and green context: added support for a user-provided NVTX identifier used in Nsight profiling.
Examples
Holoviz module
Holoinfer module
Operators
Utils
HoloHub
Documentation
Breaking Changes
- App Driver default port changed: the default port number for the App Driver service has been changed from `8765` to `57777` to avoid conflicts with the FIO (Flexible I/O Tester) service, whose default port is also `8765`. This change affects distributed applications that rely on the default port without explicitly specifying an address.
  - Impact: applications using the `--driver` and `--worker` options without the `--address` parameter will now use port `57777` instead of `8765`.
  - Backward compatibility: users can override the default port using the `HOLOSCAN_APP_DRIVER_PORT` environment variable (e.g., `HOLOSCAN_APP_DRIVER_PORT=8765`) or by explicitly specifying the address with the `--address` option.
  - Migration: no code changes are required for most users. Only users who have hardcoded expectations about port `8765`, or have firewall rules/network configurations specific to that port, may need to update their configurations.
- The `HOLOSCAN_UCX_ASYNCHRONOUS` environment variable is now deprecated. In the future, only the synchronous mode of the underlying UCX transmitters will be supported (synchronous mode is the current default, so there is no change in behavior for users who were not explicitly setting this environment variable). Aside from raising a warning on application startup, this environment variable will continue to work as before for upcoming 3.x releases, but it will be removed in v4.0.
Bug fixes
| Issue | Description |
|---|---|
| 5472498 | `OutputContext::emit` of `std::shared_ptr<Tensor>` or `TensorMap` results in logging via the `DataLogger::log_backend_specific` interface instead of the expected `log_tensor_data` or `log_tensormap_data` methods. |
| | Construction of `holoscan.data_loggers.AsyncConsoleLogger` fails if `holoscan.data_loggers.SimpleTextSerializer` has not already been explicitly imported. |
| | `FormatConverterOp` assumes that U, V, or UV-plane offsets for NV12 and YUV formats are equal to the Y color plane size when converting these types to RGB format. This may not always be true; `color_planes[i].offset` should be used instead. |
Known Issues
| Issue | Description |
|---|---|
| 5539840 | Holoinfer ONNX Runtime backend Input/Output CUDA buffer test segfault on IGX Orin iGPU |
| 5538962 | [L4T SIPL, JP7.0, Thor, Jedha, IQ, VB1940, HOLOLINK] I2C issues on Jedha with SIPL COE |
| 5529488 | Native Python port-based condition does not automatically disable the default condition on a port |
| 5518571 | [HSB FPGA] Disabling packetizer in build causes second camera to fail to stream [IRIS] |
| 5502972 | HDMI Cameras not working with AJA card using aja_video_capture application |
| 5489793 | Holoscan 'as_tensor' methods cannot convert singleton or null Torch tensors |
| 5484505 | Thread Pools are not created properly in python |
| 5475816 | EXAMPLE_CPP_ACTIVATION_MAP fails on x86_64 w/ Debian container |
| 5470011 | CUDA 13 container is not compatible on IGX-dGPU. CUDA forward compatibility does not support graphics interop. |
| 5469227 | [Orin Nano] Failed to initialize component 00023 (cuda_green_context_pool) |
| 5469136 | Address Sanitizer pipeline build fails due to CUDA minimum version |
| 5464864 | Segmentation fault happens when close multiai_ultrasound window |
| 5460314 | PoseTree service access causes "pybind11::detail::get_type_info: type has multiple pybind11-registered bases" error messages |
| 5457599 | EXAMPLE_PYTHON_TENSOR_INTEROP_TEST passes despite edge case failure |
| 5427783 | [Concord] distributed_pose_tree_multinode failed between concord and nano/IGX/Concord |
| 5427678 | v4l_camera example visualization fails with Holoviz + >=R550 driver |
| 5425996 | Build Warnings with CMake 3.31 |
| 5424351 | [X86 Server] holoscan-sdk/public/python/tests/unit/test_gxf.py test failed. |
| 5411188 | GXF dispatcher thread runs with SCHED_OTHER while worker threads run with Linux RT scheduling policies |
| 5404972 | Real-time thread test failures |
| 5404947 | Libtorch configuration warnings |
| 5404926 | Address Sanitizer failures |
| 5392624 | ERROR: LeakSanitizer: detected memory leaks |
| 5390206 | Distributed PoseTree service using UCX cannot accept more than three clients |
| 5389580 | is_user_defined_root() incorrectly classifies operators as root in acyclic flow graphs |
| 5381536 | EXAMPLE_PYTHON_MULTITHREAD_TIMES_VALIDATION_TEST fails on CAGX |
| 5372221 | No support for int64 and bool in Holoscan Inference Operator |
| 5370867 | Async Distributed Video Replayer segfaults at exit on IGX dGPU |
| 5348375 | getting error running nsys profile |
| 5327270 | Holoscan only warns when a config file cannot be found on disk |
| 5211869 | [IGX-dGPU] Error "Failed to start server on 0.0.0.0:10002" in VScode Debug Distributed Endoscopy Tool Tracking application |
| 5162855 | Dual-GPU Configuration fails Holoscan application v2.9 |
| 5144233 | python-api-tracing-profile failure on Ubuntu 24.04 / Python 3.12 |
| 5061275 | Ping distributed multi-nodes failed to create path between two nodes |
| 5014059 | [IGX-dGPU] VScode Debug Distributed Endoscopy Tool Tracking application on an IGX w/dGPU, does not hit the expected debug points. |
| 4953020 | The HOLOINFER_TEST is failing, or a segmentation fault occurs during the parseFromFile() call. |
| 4789382 | [Holoscan SDK v2.2] InferenceOp with libtorch backend reports undefined symbols |
| 4768945 | Distributed applications crash when the engine file is unavailable/generating engine file |
| 4753994 | Debugging Python application may lead to segfault when expanding an operator variable |
| 4384348 | UCX termination (either ctrl+c , press 'Esc' or clicking close button) is not smooth and can show multiple error messages |
v3.6.1
Release Artifacts
- 🐋 Docker container: tag `v3.6.1-cuda13-dgpu`
- 📦️ Debian package (Jetson AGX Thor only): `apt install holoscan=3.6.1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
Holoscan SDK v3.6.1 is a limited release providing Holoscan SDK CUDA 13.0 support for x86_64 workstations and the Jetson AGX Thor platform.
New Features and Improvements
Core
- Added support for CUDA 13.0 build environment and binary artifacts.
- `HOLOSCAN_APP_DRIVER_PORT` environment variable: added support for overriding the default App Driver port via the `HOLOSCAN_APP_DRIVER_PORT` environment variable. This allows users to customize the port number used by distributed applications when no explicit address is specified with the `--address` option. The default port is now `57777` instead of `8765`.
Breaking Changes
- App Driver default port changed: the default port number for the App Driver service has been changed from `8765` to `57777` to avoid conflicts with the FIO (Flexible I/O Tester) service, whose default port is also `8765`. This change affects distributed applications that rely on the default port without explicitly specifying an address.
  - Impact: applications using the `--driver` and `--worker` options without the `--address` parameter will now use port `57777` instead of `8765`.
  - Backward compatibility: users can override the default port using the `HOLOSCAN_APP_DRIVER_PORT` environment variable (e.g., `HOLOSCAN_APP_DRIVER_PORT=8765`) or by explicitly specifying the address with the `--address` option.
  - Migration: no code changes are required for most users. Only users who have hardcoded expectations about port `8765`, or have firewall rules/network configurations specific to that port, may need to update their configurations.
Bug fixes
| Issue | Description |
|---|---|