Releases: nvidia-holoscan/holoscan-sdk
v4.1.0
Release Artifacts
- 🐋 Docker container: tags `v4.1.0-cuda13`, `v4.1.0-cuda12-dgpu` and `v4.1.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==4.1.0`
- 📦️ Debian packages: `4.1.0.1-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- GPU-resident operators now support multiple input and output ports per connection.
- Renamed the existing flow-oriented graph API to use the canonical `FlowGraph`/`FlowGraphImpl` C++ names and the Python `holoscan.flow_graphs` module, while reserving the `Graph` name and `core/graphs` path for future use.
- The `EventBasedScheduler` now exposes advanced performance-tuning parameters from the underlying GXF scheduler that can reduce lock contention and improve scaling with higher worker thread counts. New optimizations include work stealing between worker queues, a worker-side post-check fast path that bypasses the dispatcher for READY/WAIT_TIME transitions, and sharding of internal notification and wait-state queues. In this release, the work-stealing (`enable_queue_stealing`) and post-check fast path (`enable_worker_postcheck_fastpath`) optimizations are not enabled by default, to better preserve v4.0 scheduling behavior and to gather more real-world experience with these options. See the `EventBasedScheduler` documentation for details on each parameter.
- The underlying GXF runtime now automatically uses an entity pool to reuse message entities, eliminating repeated entity creation/destruction overhead in high-throughput pipelines. This is transparent and requires no application changes. To disable pooling (e.g., for debugging), set the environment variable `GXF_ENTITY_POOL_SIZE=0` before launching the application.
- `Fragment::add_subgraph` now takes ownership of the subgraph, enabling a factory pattern where subgraphs of runtime-determined type can be created externally and added to a fragment or parent subgraph.
- Added `Fragment::subgraphs()` and `Subgraph::nested_subgraphs()` accessors for inspecting the subgraph hierarchy (e.g., for graph visualization).
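The entity-pool toggle above can be applied from the launch environment. A minimal sketch (the application binary name is hypothetical):

```shell
# Disable GXF message-entity pooling (e.g., for debugging) before launching
export GXF_ENTITY_POOL_SIZE=0
# ./my_holoscan_app   # hypothetical application binary
```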
Operators/Resources/Conditions
- HolovizOp now supports 16-bit signed and unsigned integer RGB, 32-bit floating point RGB, and 16-bit floating point R, RGB and RGBA image formats.
Examples
Holoviz module
- Added support for the `R16G16B16_UNORM`, `R16G16B16_SNORM`, `R16G16B16_SFLOAT` and `R32G32B32_SFLOAT` image formats.
Holoinfer module
Utils
HoloHub
Source build & Release Artifacts
Documentation
Breaking Changes
- Removed the `holoscan.cli` Python module stub that was deprecated in v2.9. Users who need CLI functionality should install the standalone `holoscan-cli` package via `pip install holoscan-cli`.
Bug fixes
| Issue | Description |
|---|---|
| 5606400, 5929120 | Resolved Python GIL crash (Fatal Python error: take_gil: PyMUTEX_LOCK(gil->mutex) failed or SIGSEGV) that occurred when running the V4L2 camera Python example on AGX Orin iGPU and AGX Thor iGPU. The issue was caused by libv4l2 plugin loading, which corrupted glibc TLS destructor pointers through dlopen/dlclose of NVIDIA V4L2 codec plugins, as well as a stale PyThreadState left in CPython's GILState TSS, leading to a poisoned PyGILState_Ensure() on GXF worker threads. The immediate solution was to avoid using libv4l2 wrappers in V4L2VideoCaptureOp. A permanent fix is being developed on the NVIDIA V4L2 plugin side. |
| 5933258 | Fixed SIGSEGV in __nptl_deallocate_tsd during thread teardown when running the V4L2 camera C++ example on AGX Orin iGPU and AGX Thor iGPU. Root cause: v4l2_open() loaded all NVIDIA V4L2 plugins indiscriminately, including the Thor/OpenRM CUVID plugin (libv4l2_nvcuvidvideocodec.so) whose libEGL/libGLdispatch dependency chain registered a pthread_key_create TLS destructor that became a dangling pointer after dlclose. Fix: replace v4l2_open()/v4l2_close() with raw POSIX open()/close() in V4L2VideoCaptureOp. A permanent fix is being developed on the NVIDIA V4L2 plugin side. |
| 5946872 | Fixed CuPy failing with Permission denied when the container runs as a non-root user other than the default Ubuntu user (UID 1000), e.g. docker run -u 1001:1001. CuPy was writing its kernel cache under $HOME/.cupy/kernel_cache, which may not exist or be writable for arbitrary UIDs. Official Holoscan Docker images now set CUPY_CACHE_DIR=/tmp/.cupy/kernel_cache (world-writable), and the run script exports a writable cache path for dev-container workflows. Custom images that omit that ENV should set CUPY_CACHE_DIR to a writable directory before running Python. |
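For custom images that omit the `CUPY_CACHE_DIR` environment variable, a minimal sketch of the workaround described for issue 5946872 (any directory writable by the container user works):

```shell
# Point CuPy's kernel cache at a directory writable by any UID
export CUPY_CACHE_DIR=/tmp/.cupy/kernel_cache
mkdir -p "$CUPY_CACHE_DIR"
```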
Known Issues
| Issue | Description |
|---|---|
| 5211869 | On IGX Orin dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code. |
| 5928213 | On IGX Thor dGPU (Blackwell), holoviz_conditions C++ example application launches with no frame detected and reports the error VK_KHR_present_wait is not available. |
v4.0.0
Release Artifacts
- 🐋 Docker container: tags `v4.0.0-cuda13`, `v4.0.0-cuda12-dgpu` and `v4.0.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==4.0.0`
- 📦️ Debian packages: `4.0.0.0-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- GPU-resident graph (C++): Holoscan SDK introduces a new execution mode, called GPU-resident graph execution, that keeps the CUDA compute pipeline on the GPU for the lifetime of the application, reducing CPU scheduling overhead, CUDA kernel launch cost, and CPU-GPU coordination, and improving deterministic, low-latency behavior.
  - GPU-resident operators and fragments: Operators inheriting from `holoscan::GPUResidentOperator` use device memory for ports and are captured into CUDA Graphs. All operators in a fragment must be GPU-resident for that fragment to use GPU-resident graph execution; currently only linear chains of operators are supported.
  - Port declaration: Ports can be declared by memory block size (executor-allocated shared device memory) or by device pointer (operator-managed); see the GPU-resident user guide for connection semantics.
  - Data ready handler (optional): A separate GPU-resident fragment can be registered as a data ready handler to run at the start of each iteration and decide when input data is ready, enabling sensor-driven pipelines with GPU-direct technologies (e.g., Holoscan Sensor Bridge) without host CPU involvement.
  - Host control API: After `run_async()`, use `Fragment::gpu_resident()` to signal data ready, poll result ready, and configure timeout and optional host sync; see the GPU-resident user guide and examples.
  - GPU-resident graph execution is supported in C++ only; Python support is planned for a future release.
- Data Flow Tracking - programmatic access and probing: Data flow tracking now supports richer observability and targeting of arbitrary operators.
  - Programmatic access in operators: Operators can read data flow tracking information during execution via `get_data_flow_tracking_label()` (C++/Python). This returns a `MessageLabel` for a given input port, with path names, number of paths, and timing information, enabling adaptive behavior, debugging, and in-operator runtime analysis without waiting for post-run results.
  - Tracking to arbitrary operators: Use `DataFlowTracker::add_probe_operator()` to treat any intermediate (non-root, non-leaf) operator as a probe point. The tracker then reports latency from root operators to the probed operator and the number of messages published by that operator. Probe operators are currently validated with `DoubleBufferTransmitter`/`DoubleBufferReceiver` and default connection configurations; see the Data Flow Tracking documentation for details and the `flow_tracker` example for usage.
Operators/Resources/Conditions
- New classes related to publish/subscribe (pub/sub) messaging were added (`PubSubContext`, `PubSubReceiver`, `PubSubTransmitter`) as wrappers around the underlying GXF pub/sub interfaces. In this release, Holoscan does not ship a concrete pub/sub backend implementation, so these APIs should be considered experimental scaffolding for future releases. Relatedly, `IOSpec` now includes `topic` and `qos` methods to describe the intended pub/sub topic and QoS settings, which will become active once a concrete backend is provided.
Examples
- The `flow_tracker` example demonstrates the new data flow tracking capabilities: programmatic access via `get_data_flow_tracking_label()` inside an operator and tracking to intermediate operators via `add_probe_operator()`.
- Added `gpu_resident_example`, `gpu_resident_input`, and `gpu_resident_inference_example` demonstrating GPU-resident execution: operator chains, device memory ports, device-driven data ready control (in `gpu_resident_input`), and GPU-resident inference with the TensorRT (TRT) backend (in `gpu_resident_inference_example`).
- Added the `matx_allocator` example (`examples/matx/matx_allocator/`) demonstrating how to use the `MatXAllocator` adapter with `RMMAllocator` in a Holoscan pipeline with DLPack tensor interop.
Holoviz module
Holoinfer module
Utils
- Added the `MatXAllocator` utility class (`holoscan/utils/matx_allocator.hpp`), a lightweight adapter that enables any Holoscan SDK allocator (e.g., `RMMAllocator`, `BlockMemoryPool`, `StreamOrderedAllocator`) to be used with MatX GPU tensor operations. It supports stream-aware async allocation/deallocation when used with `CudaAllocator`-derived allocators and a CUDA stream.
HoloHub
Source build & Release Artifacts
- Added Holoscan Sensor Bridge 2.5.0 basic feature functionality in the Holoscan SDK developer container environment. Holoscan Sensor Bridge libraries and tools are built with RDMA over Converged Ethernet (RoCE) support targeting discrete GPU platforms with a ConnectX Network Interface Card (NIC), such as DGX Spark or IGX platforms. Libraries may be found under the `/opt/nvidia/hololink` path in the discrete GPU developer containers. Please refer to the Holoscan Sensor Bridge User Guide for detailed requirements.
  - Note that the containerized Holoscan Sensor Bridge libraries do not support integrated GPU networking stacks (Jetson, IGX Orin 500). Please rebuild Holoscan Sensor Bridge from source for iGPU platform support.
  - Holoscan Sensor Bridge support for x86_64 configurations is experimental and not formally verified. Holoscan Sensor Bridge binaries have been included in the Holoscan SDK x86_64 development container to support developers with Holoscan Sensor Bridge emulation via software loopback. We do not recommend using real Holoscan Sensor Bridge hardware with x86_64 platforms at this time.
  - Please run the container as the root user to enable full Holoscan Sensor Bridge runtime features.
- Updated Holoscan SDK CMake import rules with improved handling for external dependencies. Holoscan SDK C++ and Python binary distributions (Debian packages, Python wheels, etc.) traditionally contain runtime libraries and development rules for dependencies such as RMM, spdlog, yaml-cpp, and more, to reduce the downstream developer burden. In this version we improve `holoscan-config.cmake` and associated files in the Holoscan SDK binary distribution to describe embedded dependencies with dedicated project configuration files such as `rmm-config.cmake`, and rely on CMake's `find_package` mechanism to relax dependency embeddings.
  - Downstream projects may override Holoscan SDK embedded CMake transitive dependencies with `find_package`:

    ```cmake
    # Find RMM from a custom installation in your development environment
    find_package(rmm HINTS path/to/my/rmm)

    # Holoscan SDK will favor the custom RMM installation over its own embedded distribution
    find_package(holoscan)
    ```

  - Downstream projects may depend directly on the Holoscan SDK embedded dependencies:

    ```cmake
    # Populate any embedded dependencies not already provided
    find_package(holoscan)

    # If not set in advance, will use the RMM installation embedded in the Holoscan SDK distribution
    find_package(rmm)  # result: rmm_DIR path is already set to the Holoscan SDK installation
    # ...
    target_link_libraries(mylibrary PUBLIC rmm::rmm)  # dependency imported targets are available
    ```

  - Note that some header-only dependencies (concurrentqueue, MatX) remain in the Holoscan SDK distribution without explicit CMake imported targets to describe them.
- Removed the legacy Holoscan SDK runtime Dockerfile, which is no longer supported. Please refer to Holoscan CLI and HoloHub sample apps for guidance and examples for writing an optimized Dockerfile for your Holoscan-based project.
- The NGC `holoscan_dev_deb` resource (Debian packages up to HSDK 0.6) has been archived and removed from NGC. Debian packages are now distributed exclusively via NVIDIA Holoscan downloads; use the container image, Python wheel, or Debian package options listed in the Release Artifacts section above.
Documentation
Breaking Changes
- Removed the `holoscan.cli` Python module stub that was deprecated in v2.9. Users who need CLI functionality should install the standalone `holoscan-cli` package via `pip install holoscan-cli`.
Bug fixes
| Issue | Description |
|---|---|
| 5601128 | On IGX Orin iGPU, fix segmentation fault by ensuring matx::make_tensor does not fall back from managed memory to host-pinned memory during allocation |
| 5898338 | Fix error where InferenceOp ignores CudaStreamPool passed positionally (only the named Arg works) |
Known Issues
| Issue | Description |
|---|---|
| 5211869 | On IGX Orin dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code. |
| 5606400 | On... |
v3.11.0
Release Artifacts
- 🐋 Docker container: tags `v3.11.0-cuda13`, `v3.11.0-cuda12-dgpu` and `v3.11.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==3.11.0`
- 📦️ Debian packages: `3.11.0.0-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- Stream-Aware Deallocation: Added automatic CUDA stream propagation to `Tensor` and `VideoBuffer` components during `OutputContext::emit()` to enable pool-based allocators (`BlockMemoryPool`, `RMMAllocator`, `StreamOrderedAllocator`) to defer memory reuse until GPU operations complete on the associated stream. This prevents potential race conditions where memory could be returned to the pool while GPU kernels were still reading from it because an operator returned from compute while GPU work was still in progress.
  For sink operators (operators that consume data but don't emit it), a new `set_deallocation_stream(cudaStream_t stream)` method has been added to `holoscan::Tensor` (`holoscan.Tensor.set_deallocation_stream(stream)` in Python). Sink operators that return asynchronously while GPU work is ongoing should call this method on received tensors to inform the allocator which stream last accessed the tensor's memory. The method returns `true` if the stream was set successfully, or `false` for tensors not managed by a Holoscan/GXF allocator (e.g., tensors from CuPy or PyTorch via DLPack).
  Stream propagation can be disabled by setting the environment variable `HOLOSCAN_DISABLE_ENTITY_STREAM_PROPAGATION=1`. See the CUDA Stream Handling documentation for more details.
- Subgraph API Enhancements: Several improvements have been made to the `Subgraph` class to simplify authoring and usage:
  - Config file support: The `Subgraph` constructor now accepts an optional `config_file` parameter to specify a YAML configuration file for the subgraph, enabling `from_config()` during `compose()`.
  - Broadcast input ports: Input interface ports can now be connected to multiple internal operators, enabling a single external port to broadcast data to multiple destinations within the subgraph.
  - Simplified `add_interface_port`: The internal operator port name is now optional (it defaults to the external port name), and the port direction (input/output) can be auto-detected from the operator's port definitions.
  - `Fragment::add_subgraph` (`Fragment.add_subgraph` in Python): New method to add a subgraph without interface ports that doesn't need to be connected via `add_flow`.
  - `Subgraph::operators()` (`Subgraph.operators` in Python): New method to retrieve all operators belonging to the subgraph and its nested subgraphs.
  - Convenience wrappers: Added `add_data_logger` and `register_service` methods to `Subgraph` that delegate to the parent `Fragment`.
  - Python bindings: The `InterfacePort` class and interface port retrieval methods (`interface_ports`, `get_interface_operator_port`) are now exposed in Python.
- The `MetadataDictionary` object returned by the C++ `Operator::metadata()` method now has a `deep_copy()` method that creates independent copies of both the dictionary structure and all contained `MetadataObject` instances, preserving each entry's `MetadataPolicy`. For Python, the dict-like `MetadataDictionary` object returned by the `Operator.metadata` property implements the `__copy__` (shallow copy) and `__deepcopy__` (calls `deep_copy()`) dunder methods, so `copy.copy()` and `copy.deepcopy()` from Python's built-in `copy` module behave accordingly. The deep copy creates truly independent metadata dictionaries where modifications to one won't affect another. Note that values stored as `std::shared_ptr<T>` are not deep-copied; only the pointer is copied.
- A unique, descriptive default name is assigned to all C++ Operator, Condition, Resource, Scheduler, and DataLogger classes provided by the SDK. Previously, for C++, a generic name such as "unnamed_condition" or "unnamed_resource" was used if the user did not provide a name string as the first argument to `make_condition`, `make_resource`, etc. The default names in C++ improve consistency with the existing Python API. This change should not require any changes to existing applications, but it should make logging more informative where default names were used.
- All scheduler classes (`EventBasedScheduler`, `GreedyScheduler`, `MultiThreadScheduler`) support a new `network_connection_timeout` parameter, which defaults to 5000 ms. This parameter is used only by distributed applications. During the initial phase when network connections are being established, this `network_connection_timeout` (in ms) is used instead of `stop_on_deadlock_timeout` for deadlock detection. The parameter should be set long enough to allow sufficient time for UCX connections to be established between all fragments without triggering false deadlock detection. After the `network_connection_timeout` period has expired, the application switches to using the standard `stop_on_deadlock_timeout` for deadlock detection. Single-fragment applications ignore `network_connection_timeout` and use only `stop_on_deadlock_timeout`.
- For distributed applications, if `Application::scheduler` (`Application.scheduler` in Python) is called, it will now automatically set that scheduler for all fragments of the application unless those fragments already have their own scheduler set via `Fragment::scheduler`. Previously, the `Application::scheduler` method only worked for non-distributed applications and had no effect for distributed applications.
- Various inconsistencies in custom scheduler assignment for distributed applications have been resolved. Whenever the user explicitly sets a scheduler via `Fragment::scheduler` (`Fragment.scheduler` in Python), that scheduler type is now always respected and the user-specified parameters are applied (unless overridden by environment variables). As before, when the environment variables `HOLOSCAN_STOP_ON_DEADLOCK`, `HOLOSCAN_STOP_ON_DEADLOCK_TIMEOUT`, `HOLOSCAN_MAX_DURATION_MS` and/or `HOLOSCAN_CHECK_RECESSION_PERIOD_MS` are specified, the values they specify will override any corresponding argument that was set for the scheduler passed to `Fragment::scheduler`. Similarly, if `HOLOSCAN_DISTRIBUTED_APP_SCHEDULER` is set and that scheduler type does not match the user-defined one passed to `Fragment::scheduler`, the scheduler specified by the environment variable will be used.
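The `__copy__`/`__deepcopy__` semantics described for `MetadataDictionary` above can be illustrated with a plain-Python stand-in (this `MetaDict` class is illustrative only, not the real Holoscan type):

```python
import copy

class MetaDict:
    """Illustrative stand-in for MetadataDictionary's copy semantics."""
    def __init__(self, data=None):
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __copy__(self):
        # Shallow copy: new dictionary structure, shared value objects
        return MetaDict(self._data)

    def __deepcopy__(self, memo):
        # Deep copy: independent copies of all contained values
        return MetaDict({k: copy.deepcopy(v, memo) for k, v in self._data.items()})

meta = MetaDict({"roi": [0, 0, 64, 64]})
shallow = copy.copy(meta)
deep = copy.deepcopy(meta)

meta["roi"].append(99)
# The shallow copy shares the list object with `meta`; the deep copy does not.
```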
Operators/Resources/Conditions
- `CudaStreamCondition` Enhancements: The `CudaStreamCondition` has been reimplemented to address several limitations of the original:
  - Multi-receiver support: Now supports multi-receiver ports (`IOSpec::kAnySize`) where the number of connections is not known at compile time. Specify the base port name (e.g., `"receivers"`) and all matching ports (`receivers:0`, `receivers:1`, etc.) will be automatically discovered.
  - Queue size > 1: Now supports input ports with queue sizes greater than 1. By default, all messages in the queue are checked for CUDA streams (`check_all_messages=true`). Set this to `false` to only check the first message per receiver.
  - Multiple CudaStreamId per entity: Now correctly handles GXF entities containing more than one `CudaStreamId` component.
  - Multiple port names: The new `receivers` parameter accepts a `std::vector<std::string>` (`list[str]` in Python) of receiver names to monitor multiple input ports.
  - Backwards compatibility: The legacy `receiver` parameter (single port name) is still supported but deprecated. A warning will be logged recommending migration to the `receivers` parameter.
- A new `notify_scheduler()` method has been added to the base `Condition` class (`holoscan::Condition::notify_scheduler` in C++, `holoscan.core.Condition.notify_scheduler` in Python). This method allows event-based conditions (those returning `kWaitEvent` from `check()`) to signal to the scheduler that an asynchronous event has completed and the condition should be re-evaluated. This is used internally by `CudaStreamCondition` and `AsynchronousCondition`, and is now available for custom native conditions.
- The `Receiver` class now exposes `peek()` and `sync()` methods. `peek(index)` allows conditions to inspect messages in the queue without consuming them, and `sync()` moves messages from the back stage to the main stage for double-buffered queues.
- `AsyncDataLoggerResource` (and derived classes like `AsyncConsoleLogger`) now supports a configurable `queue_type` parameter (`DataLoggerQueueType` enum) to select between two queue implementations:
  - `LockFree` (default): high-throughput lock-free queue with per-producer FIFO ordering. Best for most use cases.
  - `Ordered`: mutex-based queue with strict global FIFO ordering across all producers. Use this when strict temporal ordering is required (e.g., compliance logging, debugging, causal analysis).
  In YAML configuration, use `queue_type: "lock_free"` or `queue_type: "ordered"` (case-insensitive).
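A minimal YAML sketch of the queue selection (the resource key name here is illustrative, not a fixed SDK name):

```yaml
my_async_console_logger:   # illustrative resource name
  queue_type: "ordered"    # strict global FIFO; "lock_free" is the default (case-insensitive)
```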
Examples
Holoviz module
- New present modes are now supported:
  - `SHARED_DEMAND_REFRESH` and `SHARED_CONTINUOUS_REFRESH`, which can be used to implement front buffer rendering. These present modes are available in exclusive display (direct rendering) mode only.
  - `FIFO_LATEST_READY`, which waits for vblank, presents the current image and...
v3.10.0.post1
Post 3.10.0 Release Fixes:
- Update GXF downloading URL in the Dockerfile to use the public NVIDIA artifactory URL.
v3.10.0
Release Artifacts
- 🐋 Docker container: tags `v3.10.0-cuda13`, `v3.10.0-cuda12-dgpu` and `v3.10.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==3.10.0`
- 📦️ Debian packages: `3.10.0.0-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- The C++ API now supports more convenient and flexible assignment of an `Arg` with any integer type to a typed `Parameter<T>` (where `T` is a specific integer type). An error will be raised if the value stored in the `Arg` is outside the representable range for `Parameter<T>` (i.e., only safe conversions are allowed). Similarly, an integer-valued `Arg` can be used to set `Parameter<float>` or `Parameter<double>` as well. This allows passing arguments to `make_operator`, `make_resource`, `make_condition`, etc. without having to match the exact target type of the parameter in these common cases.
- UCX is now built with gdrcopy support for GPU Direct RDMA, enabling lower-latency GPU memory transfers in distributed applications when the gdrcopy kernel module is installed on the host.
- Holoscan now emits a warning when an input port is configured with `size > 1` (including `IOSpec.PRECEDING_COUNT`) and the default `MessageAvailableCondition` is used. In this configuration, `min_size` is implicitly set to the same value as the queue capacity, enabling batched execution. To future-proof applications and prepare for planned API evolution that will decouple queue capacity from batching, set `min_size` explicitly whenever `size > 1` where possible. For `IOSpec.PRECEDING_COUNT`, the resolved size is computed from the graph at run time; a planned `batch_size` configuration is intended to make this behavior explicitly configurable.
Operators/Resources/Conditions
- The `AsyncDataLoggerResource` class has a new parameter, `shutdown_wait_period_ms`, that controls how long (in milliseconds) the application will continue trying to log any pending messages in the queue(s) during application shutdown. This interval is respected both for normal application termination and when the application is terminated via a signal interrupt (e.g., Ctrl+C). The default value of -1 indicates that all remaining items in the queue should be logged prior to exiting.
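A minimal YAML sketch of the shutdown interval (the resource key name is illustrative):

```yaml
my_async_logger:                  # illustrative resource name
  shutdown_wait_period_ms: 2000   # allow up to 2 s to drain pending log messages; -1 (default) drains fully
```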
Examples
Holoviz module
Holoinfer module
Utils
- UCX diagnostic utilities (`ucx_info`, `ucx_perftest`) are now packaged with the Holoscan SDK for verifying UCX transport capabilities and benchmarking performance.
HoloHub
Source build & Release Artifacts
- The Holoscan SDK build process can find libtorch libraries and headers from a PyTorch wheel installation on the system.
- The build container (root Dockerfile) and dev container (on NGC) include a functional installation of PyTorch which can be used with `import torch`: version 2.8.0 for CUDA 12 images and 2.9.0 for CUDA 13 images. This supplements the existing installation of libtorch, which is still leveraged by the Holoinfer module and inference operator.
  - Note: CUDA dependencies defined by PyTorch are explicitly ignored or deleted in favor of the system-provided ones to ensure compatibility with other components, whether PyTorch is re-vendoring them or requiring them with pip.
- The Holoscan SDK iGPU dev container on NGC (which targets AGX Orin and IGX Orin iGPU) was downgraded from Ubuntu 24.04 to Ubuntu 22.04 to match JP6 and IGX SW 1.x OS versions, in order to provide compatibility with the PyTorch 2.8.0 wheel from pypi.jetson-ai-lab.io.
Documentation
Breaking Changes
- The `Subgraph` class constructor argument has been renamed from `instance_name` to `name` for consistency with other Holoscan components. The `Subgraph::instance_name()` method is now deprecated in favor of `Subgraph::name()`. The `instance_name` keyword argument and `instance_name` property are still supported in Python but will emit a deprecation warning. Users should update their code to use `name` instead.
- Planned breaking change (future release): the `size` argument to `OperatorSpec::input`/`OperatorSpec.input` will control receiver queue capacity only and will no longer implicitly enable batching by setting `MessageAvailableCondition.min_size`. Batching will require explicit configuration (e.g., set `min_size` directly, or use a planned `batch_size` configuration). Applications relying on implicit batching should set `min_size` explicitly now. For the `IOSpec.PRECEDING_COUNT` case, a planned `batch_size` configuration is intended to provide an explicit equivalent to today's "batch by preceding-count" behavior.
Bug fixes
| Issue | Description |
|---|---|
| 5749574 | "import holoscan.pose_tree" fails for "holoscan" Conda package due to incompatibility with "rapidsai" UCXX |
| 5731859, 5681319 | When signal interrupt (e.g. SIGINT) is used to terminate an application, it may not wait for pending items in the AsyncDataLoggerResource's queues to be processed (or discarded cleanly). |
| 5722190 | Restored RDMA/InfiniBand support for UCX that was missing since v3.7.0. Added RDMA build dependencies (rdma-core, libibverbs-dev, librdmacm-dev) and gdrcopy to enable libuct_ib.so, libuct_ib_mlx5.so, and libuct_rdmacm.so transports for high-performance distributed applications. |
| 5536564 | Duplicate profiling paths with 0 avg latency numbers in cyclic app |
| 5061275, 5721978 | Python operators created via the decorator API (holoscan.decorator.create_op) do not support construction by passing Subgraph as the first argument to the constructor |
Known Issues
| Issue | Description |
|---|---|
| 5764538 | Holoscan SDK CUDA 12 is not compatible with PyTorch v2.9. See Q30: How do I fix segmentation faults when using PyTorch 2.9.x with Holoscan SDK v3.10 CUDA 12? in Holoscan SDK 3.10 FAQs |
| 5606400 | On AGX Thor Jedha, v4l2_camera_usb_webcam Python application fails with segmentation fault |
| 5427783 | On Jetson, distributed_pose_tree_multinode application fails between Jetson and nano/IGX/Jetson |
| 5211869 | On IGX Orin dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code. |
v3.9.0
Release Artifacts
- 🐋 Docker container: tags `v3.9.0-cuda13`, `v3.9.0-cuda12-dgpu` and `v3.9.0-cuda12-igpu`
- 🐍 Python wheel: `pip install holoscan==3.9.0`
- 📦️ Debian packages: `3.9.0.1-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
- Added explicit and additional support for data flow tracking with asynchronous lock-free buffers.
- Conditions note: sharing a single `BooleanCondition` instance across multiple operators is not supported. In particular, attempting to share a `window_close_condition` across multiple `HolovizOp` instances can prevent application shutdown when a window is closed. To support coordinated shutdown, Holoviz now provides a `window_close_callback` parameter that lets applications perform custom shutdown logic (e.g., `Application::stop_execution()`) when the Holoviz window is closed.
Core
- The `Subgraph` class has a new method `set_dynamic_flows`, which is a wrapper around the associated fragment's `set_dynamic_flows` method. This allows calling `set_dynamic_flows` directly when composing a fragment, providing a consistent syntax for defining dynamic flows within a subgraph vs. within a fragment. In Holoscan v3.8, using dynamic flows from a subgraph was possible but required first retrieving the subgraph's fragment and then calling that fragment's `set_dynamic_flows` method.
- Note (internal SDK changes; typical apps unaffected):
  - Improved and unified parameter error handling and diagnostics:
    - C++ `ArgumentSetter::SetterFunc` now returns `bool` to communicate success/failure.
    - `ArgumentSetter::set_param` throws with a detailed message when a setter reports failure, including `arg_type` details for faster debugging.
    - YAML decode/parse errors now emit clearer diagnostics and consistent log severity to aid configuration troubleshooting.
Operators/Resources/Conditions
- New method `ExecutionContext::is_gpu_available` can be used to check whether a GPU was detected in the system. This can be used by operators to provide an alternative code path for systems without a GPU.
- HolovizOp: added a `window_close_callback` parameter (C++ and Python) invoked when the display window is closed. This enables applications to perform custom shutdown logic (e.g., `Application::stop_execution()`), including in multi-window setups.
- Note (internal SDK changes; typical apps unaffected):
  - Aggregated error reporting when setting parameters:
    - `GXFOperator::set_parameters()` collects all parameter set failures and throws one exception summarizing all issues, including GXF error strings and codes, and the operator GXF type.
    - `GXFComponentResource::set_parameters()` aggregates GXF parameter errors in the same fashion.
    - `GXFScheduler::set_parameters()` aggregates GXF parameter errors and throws one exception summarizing all issues, including GXF error strings/codes and the scheduler GXF type.
    - `GXFNetworkContext::set_parameters()` aggregates GXF parameter errors in the same fashion.
    - `ComponentBase::update_params_from_args()` aggregates setter failures; unknown arguments are logged and included in the summary when other setter errors are present.
- The visibility of the parameter member variables of `AsyncDataLoggerResource` has been changed from private to protected. This way, loggers inheriting from this class can directly access these parameters.
Examples
- Video Replayer (C++/Python): updated the examples to wire `window_close_callback` so that closing the Holoviz window cleanly stops the application. Dual-window variants are supported.
Holoviz module
- Introduced `WindowCloseCallbackFunction` and a corresponding parameter in `HolovizOp`; the callback is executed during the window-close path. The YAML converter and Python bindings were updated to accept `window_close_callback`.
- Added tests: a C++ system test to verify callback invocation, plus Python unit/system tests to validate parameter wiring and non-intrusive behavior.
Holoinfer module
Operators
Utils
HoloHub
Documentation
Breaking Changes
- Due to improved data flow tracking with asynchronous lock-free buffers, the term `_old` is now reserved and cannot be used in operator names. If your operator names include `_old`, rename them.
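A simple pre-flight check an application could run before upgrading, to catch operator names that use the newly reserved term. `validate_operator_name` is a hypothetical helper, not a Holoscan API:

```python
# Hypothetical helper: reject operator names containing the reserved "_old"
# term introduced by data flow tracking in Holoscan v4.1.0.
def validate_operator_name(name: str) -> str:
    if "_old" in name:
        raise ValueError(
            f"operator name {name!r} contains '_old', which is reserved "
            "in Holoscan v4.1.0; please rename the operator"
        )
    return name
```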
- Note (internal SDK changes; typical apps unaffected):
  - C++ API change (for advanced/custom integrations): the `ArgumentSetter::SetterFunc` signature changed from `std::function<void(ParameterWrapper&, Arg&)>` to `std::function<bool(ParameterWrapper&, Arg&)>`.
    - If you registered custom argument setters, update them to return `true` on success and `false` on failure.
  - Exception semantics during configuration:
    - `ArgumentSetter::set_param` now throws `std::runtime_error` when a setter reports failure. Previously, failures might have only been logged.
    - Higher-level code now aggregates and throws after attempting all parameter sets (Operators/Resources/Conditions/Schedulers/Network Context). Applications that previously continued after warnings may now see a single configuration-time exception summarizing all issues.
  - Error propagation responsibilities: adaptor callbacks return `gxf_result_t`; callers compose context and throw as needed.
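The aggregated error-reporting pattern described above can be sketched language-neutrally: each setter reports success/failure via its return value, all parameters are attempted, and one exception summarizes every failure. This is a hedged illustration, not the SDK implementation (which is C++):

```python
# Hedged sketch of the aggregated-error pattern: setters return True/False
# (mirroring the new bool-returning SetterFunc), every argument is attempted,
# and a single exception summarizes all failures at the end.
def set_parameters(args: dict, setters: dict) -> None:
    failures = []
    for name, value in args.items():
        setter = setters.get(name)
        if setter is None:
            failures.append(f"{name}: unknown argument")
            continue
        if not setter(value):
            failures.append(f"{name}: setter reported failure")
    if failures:
        raise RuntimeError("parameter errors: " + "; ".join(failures))
```

Compared to failing fast on the first bad parameter, this surfaces all configuration problems in one pass.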
- If a configuration file is specified (a non-empty filename string passed to `Fragment::config` or `Application::Config`), a runtime error is now raised if the file does not exist. This changes the previous behavior, which was to log a warning and attempt to continue with default parameters. The warning was easy to miss and could lead to confusion, so an explicit error is preferable.
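A sketch of the new strict behavior. `check_config_path` is a hypothetical helper illustrating the rule (empty filename: no check; non-empty but missing: error), not Holoscan's actual implementation:

```python
from pathlib import Path

# Hedged sketch: a non-empty config filename that does not exist now raises
# instead of logging a warning and continuing with default parameters.
def check_config_path(filename: str) -> None:
    if filename and not Path(filename).exists():
        raise RuntimeError(f"configuration file does not exist: {filename}")
```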
Bug fixes
| Issue | Description |
|---|---|
| 5489793 | Holoscan 'as_tensor' methods cannot convert singleton or null PyTorch GPU Tensors |
| 4384348 | warnings and errors logged on window close (or Ctrl+C termination) of video_replayer_distributed |
| 5327270 | an error should be thrown if the specified YAML configuration file does not exist |
Known Issues
| Issue | Description |
|---|---|
| 5606400 | The v4l2_camera_usb_webcam (Python) example may segfault on Jetson AGX Thor due to unexpected py::gil_scoped_acquire behavior. |
| 5427783 | The distributed_pose_tree example may fail when distributed between two Jetson AGX Orin nodes |
| 5061275 | The ping_distributed example may fail to create a path between two Jetson AGX Orin nodes |
| 5681319 | The video_replayer_distributed example reports an error when closing the replayer window |
v3.8.0
Release Artifacts
- 🐋 Docker container: tag
v3.8.0-cuda13,v3.8.0-cuda12-dgpuandv3.8.0-cuda12-igpu - 🐍 Python wheel:
pip install holoscan==3.8.0 - 📦️ Debian packages:
3.8.0.0-1 - 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
- It is now possible to group operators into subgraphs that can be reused when building more complex, modular Holoscan applications. For concrete examples, see the new `examples/subgraph` folder as well as the corresponding test applications (C++/Python).
Core
- Codec Registry: decoupled from GXF and moved from `holoscan::gxf::CodecRegistry` to `holoscan::CodecRegistry`.
- The experimental `DataLogger` interface now has an additional `std::optional<CudaStream_t>` stream argument. This CUDA stream information is now logged to the console by `BasicConsoleLogger`, `GXFConsoleLogger` and `AsyncConsoleLogger`. The stream can optionally be used by concrete logger implementations when copying data from device to host during logging (using `cudaMemcpyAsync`).
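To illustrate how a concrete logger might consume the new optional stream argument, here is a hedged Python mock. `ConsoleLoggerSketch` is hypothetical (the real interface is C++); it simply shows stream information flowing into the logged console line, as the console loggers now do:

```python
from typing import Any, Optional

# Hypothetical mock of a console logger handling the new optional stream
# argument; real implementations could also use the stream with
# cudaMemcpyAsync when copying device data to host for logging.
class ConsoleLoggerSketch:
    def log_data(self, data: Any, unique_id: str,
                 stream: Optional[int] = None) -> str:
        stream_info = f" stream={stream}" if stream is not None else ""
        return f"[{unique_id}]{stream_info} {data!r}"
```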
Operators/Resources/Conditions
- `HolovizOp`: added specialized scheduling conditions that enable operators to synchronize their execution with display events. The `FirstPixelOutCondition` and `PresentDoneCondition` conditions are particularly useful for rate-limiting and for ensuring that frame generation is synchronized with the display's refresh cycle, reducing latency and improving visual quality. Note that `PresentDoneCondition` is not supported on Orin iGPU. Even on supported systems, applications using `PresentDoneCondition` might hang or crash when tools are used to remotely access the desktop.
Examples
- Added the `holoviz_conditions` example showing the usage of the new Holoviz scheduling conditions.
Holoviz module
Holoinfer module
Operators
Utils
HoloHub
Documentation
Breaking Changes
- The experimental `DataLogger` interface has a breaking change to its API, adding a new `std::optional<CudaStream_t>` stream argument. Concrete implementations must be updated to add this new argument to the method signatures for `log_data`, `log_tensor_data`, `log_tensormap_data` and/or `log_backend_specific`.
- The `Fragment::make_executor(ArgsT&&... args)` method was fixed so that `this` (`Fragment*`) is automatically passed as the first argument to the `Executor` before any `args`. This is unlikely to affect existing applications, as these methods are called internally by the Holoscan SDK; however, any code calling `Fragment::make_executor` directly should take note of this change.
Bug fixes
| Issue | Description |
|---|---|
| 5529488 | Fix a bug where the default condition was also added to a port when a native Python Condition with a "receiver" or "transmitter" parameter was applied |
| 5381536 | CTest cases for examples/multithread/python/multithread.py tests were updated so that the timing tests only require a system with >4 threads to pass (previously required >8 threads) |
| 5552584 | When the incoming message contains a video tensor, it will ignore in_tensor_name, even if the tensor with the correct name is not a video |
| 5489793 | Holoscan 'as_tensor' methods cannot convert singleton or null PyTorch GPU Tensors |
Known Issues
| Issue | Description |
|---|---|
| | A `Subgraph::set_dynamic_flows` convenience method for adding dynamic flows within subgraphs is missing. The workaround is to use the underlying `Fragment`, as obtained by the `Subgraph::fragment()` method (C++) / `Subgraph.fragment` property (Python), and call `Fragment::set_dynamic_flows` instead. |
v3.7.0.post1
Post 3.7.0 Release Fixes:
- Update GXF downloading URL in the Dockerfile to use the public NVIDIA artifactory URL.
v3.7.0
Release Artifacts
- 🐋 Docker container: tags `v3.7.0-cuda13`, `v3.7.0-cuda12-dgpu` and `v3.7.0-cuda12-igpu`
- 🐍 Python wheel for CUDA 12: `pip install holoscan-cu12==3.7.0`
- 🐍 Python wheel for CUDA 13: `pip install holoscan-cu13==3.7.0`
- 📦️ Debian packages: `3.7.0-3.7.0-1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
New Features and Improvements
Core
- Debian packages and Python wheels are now available for both CUDA 12 and CUDA 13 across Holoscan target platforms:
  - `apt install holoscan-cuda-12`, `apt install holoscan-cuda-13`, or `apt install holoscan` (defaults to CUDA 12)
  - `python3 -m pip install holoscan-cu12` or `python3 -m pip install holoscan-cu13`
- `HOLOSCAN_APP_DRIVER_PORT` environment variable: added support for overriding the default App Driver port via the `HOLOSCAN_APP_DRIVER_PORT` environment variable. This allows users to customize the port number used by distributed applications when no explicit address is specified with the `--address` option. The default port is now `57777` instead of `8765`.
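The resulting port precedence (explicit `--address` wins, then `HOLOSCAN_APP_DRIVER_PORT`, then the new default of `57777`) can be sketched as below. `resolve_driver_port` is a hypothetical helper for illustration, not a Holoscan API:

```python
import os
from typing import Dict, Optional

# Hedged sketch of the documented precedence: explicit address first, then
# the HOLOSCAN_APP_DRIVER_PORT environment variable, then the default 57777.
def resolve_driver_port(address: Optional[str] = None,
                        env: Optional[Dict[str, str]] = None) -> int:
    env = dict(os.environ) if env is None else env
    if address and ":" in address:
        return int(address.rsplit(":", 1)[1])  # port from --address host:port
    return int(env.get("HOLOSCAN_APP_DRIVER_PORT", "57777"))
```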
Operators/Resources/Conditions
- `HolovizOp`: in fullscreen mode, G-SYNC is now supported correctly, and applications flip instead of blit when displaying the next frame. This significantly improves performance and reduces latency for swap-bound applications.
- `FormatConverterOp`: added support for converting RGBA16161616 to RGB888 and RGB161616 to RGB888.
- Holoinfer: added support for using a user-provided CUDA stream pool in the `torch` backend.
- Green context: added support for the default CUDA green context pool in a fragment.
- Stream pool and green context: added support for a user-provided NVTX identifier used in Nsight profiling.
Examples
Holoviz module
Holoinfer module
Operators
Utils
HoloHub
Documentation
Breaking Changes
- App Driver default port changed: the default port number for the App Driver service has been changed from `8765` to `57777` to avoid conflicts with the FIO (Flexible I/O Tester) service, whose default port is also `8765`. This change affects distributed applications that rely on the default port without explicitly specifying an address.
  - Impact: applications using the `--driver` and `--worker` options without the `--address` parameter will now use port `57777` instead of `8765`.
  - Backward compatibility: users can override the default port using the `HOLOSCAN_APP_DRIVER_PORT` environment variable (e.g., `HOLOSCAN_APP_DRIVER_PORT=8765`) or by explicitly specifying the address with the `--address` option.
  - Migration: no code changes are required for most users. Only users who have hardcoded expectations about port `8765`, or have firewall rules/network configurations specific to that port, may need to update their configurations.
- The `HOLOSCAN_UCX_ASYNCHRONOUS` environment variable is now deprecated. In the future, only the synchronous mode of the underlying UCX transmitters will be supported (synchronous mode is the current default, so there is no change in behavior for users who were not explicitly setting this environment variable). Aside from raising a warning on application startup, this environment variable will continue to work as before for upcoming 3.x releases, but it will be removed in v4.0.
Bug fixes
| Issue | Description |
|---|---|
| 5472498 | `OutputContext::emit` of `std::shared_ptr<Tensor>` or `TensorMap` results in logging via the `DataLogger::log_backend_specific` interface instead of the expected `log_tensor_data` or `log_tensormap_data` methods. |
| | Construction of `holoscan.data_loggers.AsyncConsoleLogger` fails if `holoscan.data_loggers.SimpleTextSerializer` has not already been explicitly imported. |
| | `FormatConverterOp` assumes that U, V, or UV-plane offsets for NV12 and YUV formats are equal to the Y color plane size when converting these types to RGB format. This may not always be true; `color_planes[i].offset` should be used instead. |
Known Issues
| Issue | Description |
|---|---|
| 5539840 | Holoinfer ONNX Runtime backend Input/Output CUDA buffer test segfault on IGX Orin iGPU |
| 5538962 | [L4T SIPL, JP7.0, Thor, Jedha, IQ, VB1940, HOLOLINK] I2C issues on Jedha with SIPL COE |
| 5529488 | Native Python port-based condition does not automatically disable the default condition on a port |
| 5518571 | [HSB FPGA] Disabling packetizer in build causes second camera to fail to stream [IRIS] |
| 5502972 | HDMI Cameras not working with AJA card using aja_video_capture application |
| 5489793 | Holoscan 'as_tensor' methods cannot convert singleton or null Torch tensors |
| 5484505 | Thread Pools are not created properly in python |
| 5475816 | EXAMPLE_CPP_ACTIVATION_MAP fails on x86_64 w/ Debian container |
| 5470011 | CUDA 13 container is not compatible on IGX-dGPU. CUDA forward compatibility does not support graphics interop. |
| 5469227 | [Orin Nano] Failed to initialize component 00023 (cuda_green_context_pool) |
| 5469136 | Address Sanitizer pipeline build fails due to CUDA minimum version |
| 5464864 | Segmentation fault happens when close multiai_ultrasound window |
| 5460314 | PoseTree service access causes "pybind11::detail::get_type_info: type has multiple pybind11-registered bases" error messages |
| 5457599 | EXAMPLE_PYTHON_TENSOR_INTEROP_TEST passes despite edge case failure |
| 5427783 | [Concord] distributed_pose_tree_multinode failed between concord and nano/IGX/Concord |
| 5427678 | v4l_camera example visualization fails with Holoviz + >=R550 driver |
| 5425996 | Build Warnings with CMake 3.31 |
| 5424351 | [X86 Server] holoscan-sdk/public/python/tests/unit/test_gxf.py test failed. |
| 5411188 | GXF dispatcher thread runs with SCHED_OTHER while worker threads run with Linux RT scheduling policies |
| 5404972 | Real-time thread test failures |
| 5404947 | Libtorch configuration warnings |
| 5404926 | Address Sanitizer failures |
| 5392624 | ERROR: LeakSanitizer: detected memory leaks |
| 5390206 | Distributed PoseTree service using UCX cannot accept more than three clients |
| 5389580 | is_user_defined_root() incorrectly classifies operators as root in acyclic flow graphs |
| 5381536 | EXAMPLE_PYTHON_MULTITHREAD_TIMES_VALIDATION_TEST fails on CAGX |
| 5372221 | No support for int64 and bool in Holoscan Inference Operator |
| 5370867 | Async Distributed Video Replayer segfaults at exit on IGX dGPU |
| 5348375 | getting error running nsys profile |
| 5327270 | Holoscan only warns when a config file cannot be found on disk |
| 5211869 | [IGX-dGPU] Error "Failed to start server on 0.0.0.0:10002" in VScode Debug Distributed Endoscopy Tool Tracking application |
| 5162855 | Dual-GPU Configuration fails Holoscan application v2.9 |
| 5144233 | python-api-tracing-profile failure on Ubuntu 24.04 / Python 3.12 |
| 5061275 | Ping distributed multi-nodes failed to create path between two nodes |
| 5014059 | [IGX-dGPU] VScode Debug Distributed Endoscopy Tool Tracking application on an IGX w/dGPU, does not hit the expected debug points. |
| 4953020 | The HOLOINFER_TEST is failing, or a segmentation fault occurs during the parseFromFile() call. |
| 4789382 | [Holoscan SDK v2.2] InferenceOp with libtorch backend reports undefined symbols |
| 4768945 | Distributed applications crash when the engine file is unavailable/generating engine file |
| 4753994 | Debugging Python application may lead to segfault when expanding an operator variable |
| 4384348 | UCX termination (either ctrl+c , press 'Esc' or clicking close button) is not smooth and can show multiple error messages |
v3.6.1
Release Artifacts
- 🐋 Docker container: tag `v3.6.1-cuda13-dgpu`
- 📦️ Debian package (Jetson AGX Thor only): `apt install holoscan=3.6.1`
- 📕 Documentation
See supported platforms for compatibility.
Release Notes
Holoscan SDK v3.6.1 is a limited release providing Holoscan SDK CUDA 13.0 support for x86_64 workstations and the Jetson AGX Thor platform.
New Features and Improvements
Core
- Added support for CUDA 13.0 build environment and binary artifacts.
- `HOLOSCAN_APP_DRIVER_PORT` environment variable: added support for overriding the default App Driver port via the `HOLOSCAN_APP_DRIVER_PORT` environment variable. This allows users to customize the port number used by distributed applications when no explicit address is specified with the `--address` option. The default port is now `57777` instead of `8765`.
Breaking Changes
- App Driver default port changed: the default port number for the App Driver service has been changed from `8765` to `57777` to avoid conflicts with the FIO (Flexible I/O Tester) service, whose default port is also `8765`. This change affects distributed applications that rely on the default port without explicitly specifying an address.
  - Impact: applications using the `--driver` and `--worker` options without the `--address` parameter will now use port `57777` instead of `8765`.
  - Backward compatibility: users can override the default port using the `HOLOSCAN_APP_DRIVER_PORT` environment variable (e.g., `HOLOSCAN_APP_DRIVER_PORT=8765`) or by explicitly specifying the address with the `--address` option.
  - Migration: no code changes are required for most users. Only users who have hardcoded expectations about port `8765`, or have firewall rules/network configurations specific to that port, may need to update their configurations.
Bug fixes
| Issue | Description |
|---|---|