Releases: nvidia-holoscan/holoscan-sdk

v4.1.0

01 Apr 20:57
77d6387


Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

Core
  • GPU-resident operators now support multiple input and output ports per connection.
  • Renamed the existing flow-oriented graph API to use the canonical FlowGraph / FlowGraphImpl C++ names and the Python holoscan.flow_graphs module, while reserving the Graph name and core/graphs path for future use.
  • The EventBasedScheduler now exposes advanced performance-tuning parameters from the underlying GXF scheduler that can reduce lock contention and improve scaling with higher worker thread counts. New optimizations include:
    • work stealing between worker queues (enable_queue_stealing),
    • a worker-side post-check fast path that bypasses the dispatcher for READY/WAIT_TIME transitions (enable_worker_postcheck_fastpath), and
    • sharding of internal notification and wait-state queues.
  In this release, the work stealing and post-check fast path optimizations are not enabled by default, to better preserve v4.0 scheduling behavior and gather more real-world experience with these options. See the EventBasedScheduler documentation for details on each parameter.
  • The underlying GXF runtime now automatically uses an entity pool to reuse message entities, eliminating repeated entity creation/destruction overhead in high-throughput pipelines. This is transparent and requires no application changes. To disable pooling (e.g., for debugging), set the environment variable GXF_ENTITY_POOL_SIZE=0 before launching the application.
  • Fragment::add_subgraph now takes ownership of the subgraph, enabling a factory pattern where subgraphs of runtime-determined type can be created externally and added to a fragment or parent subgraph.
  • Added Fragment::subgraphs() and Subgraph::nested_subgraphs() accessors for inspecting the subgraph hierarchy (e.g. for graph visualization).
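Based on the entity-pool note above, here is a minimal launcher-side sketch for disabling pooling during debugging. The wrapper pattern and the application name are illustrative; only the GXF_ENTITY_POOL_SIZE variable comes from the release note.

```python
import os
import subprocess

# Build an environment with GXF entity pooling disabled; the variable must be
# set before the Holoscan application process starts.
env = dict(os.environ, GXF_ENTITY_POOL_SIZE="0")

# "my_holoscan_app" is a placeholder for your application binary or script:
# subprocess.run(["./my_holoscan_app"], env=env, check=True)
print(env["GXF_ENTITY_POOL_SIZE"])
```

Setting the variable in a copied environment (rather than mutating os.environ directly) keeps the change scoped to the launched process.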
Operators/Resources/Conditions
  • HolovizOp now supports 16-bit signed and unsigned integer RGB, 32-bit floating point RGB, and 16-bit floating point R, RGB and RGBA image formats.
Examples
Holoviz module
  • Added support for R16G16B16_UNORM, R16G16B16_SNORM, R16G16B16_SFLOAT and R32G32B32_SFLOAT image formats.
Holoinfer module
Utils
HoloHub
Source build & Release Artifacts
Documentation

Breaking Changes

  • Removed the holoscan.cli Python module stub that was deprecated in v2.9. Users who need CLI
    functionality should install the standalone holoscan-cli
    package via pip install holoscan-cli.

Bug fixes

Issue Description
5606400, 5929120 Resolved Python GIL crash (Fatal Python error: take_gil: PyMUTEX_LOCK(gil->mutex) failed or SIGSEGV) that occurred when running the V4L2 camera Python example on AGX Orin iGPU and AGX Thor iGPU. The issue was caused by libv4l2 plugin loading, which corrupted glibc TLS destructor pointers through dlopen/dlclose of NVIDIA V4L2 codec plugins, as well as a stale PyThreadState left in CPython's GILState TSS, leading to a poisoned PyGILState_Ensure() on GXF worker threads. The immediate solution was to avoid using libv4l2 wrappers in V4L2VideoCaptureOp. A permanent fix is being developed on the NVIDIA V4L2 plugin side.
5933258 Fixed SIGSEGV in __nptl_deallocate_tsd during thread teardown when running the V4L2 camera C++ example on AGX Orin iGPU and AGX Thor iGPU. Root cause: v4l2_open() loaded all NVIDIA V4L2 plugins indiscriminately, including the Thor/OpenRM CUVID plugin (libv4l2_nvcuvidvideocodec.so) whose libEGL/libGLdispatch dependency chain registered a pthread_key_create TLS destructor that became a dangling pointer after dlclose. Fix: replace v4l2_open()/v4l2_close() with raw POSIX open()/close() in V4L2VideoCaptureOp. A permanent fix is being developed on the NVIDIA V4L2 plugin side.
5946872 Fixed CuPy failing with Permission denied when the container runs as a non-root user other than the default Ubuntu user (UID 1000), e.g. docker run -u 1001:1001. CuPy was writing its kernel cache under $HOME/.cupy/kernel_cache, which may not exist or be writable for arbitrary UIDs. Official Holoscan Docker images now set CUPY_CACHE_DIR=/tmp/.cupy/kernel_cache (world-writable), and the run script exports a writable cache path for dev-container workflows. Custom images that omit that ENV should set CUPY_CACHE_DIR to a writable directory before running Python.
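For custom images affected by the CuPy cache issue above, this is a small sketch of the suggested workaround, run before importing CuPy. The exact cache path chosen here is illustrative; only the CUPY_CACHE_DIR variable comes from the fix description.

```python
import os
import tempfile

# Choose a cache directory writable by the current UID and point CuPy at it.
# This must happen before CuPy compiles its first kernel, i.e. before
# "import cupy" anywhere in the process (cupy itself is not imported here).
cache_dir = os.path.join(tempfile.gettempdir(), ".cupy", "kernel_cache")
os.makedirs(cache_dir, exist_ok=True)
os.environ["CUPY_CACHE_DIR"] = cache_dir
print(os.environ["CUPY_CACHE_DIR"])
```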

Known Issues

Issue Description
5211869 On IGX Orin dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code.
5928213 On IGX Thor dGPU (Blackwell), holoviz_conditions C++ example application launches with no frame detected and reports the error VK_KHR_present_wait is not available.

v4.0.0

09 Mar 19:47
874dc9d


Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

Core
  • GPU-resident graph (C++): Holoscan SDK introduces a new execution
    mode, called GPU-resident graph execution, that keeps the CUDA compute pipeline on
    the GPU for the lifetime of the application, reducing CPU scheduling
    overhead, CUDA kernel launch cost, and CPU-GPU coordination, and improving
    deterministic, low-latency behavior.

    • GPU-resident operators and fragments: Operators inheriting from holoscan::GPUResidentOperator use device memory for ports and are captured into CUDA Graphs. All operators in a fragment must be GPU-resident for that fragment to use GPU-resident graph execution; currently only linear chains of operators are supported.
    • Port declaration: Ports can be declared by memory block size (executor-allocated shared device memory) or by device pointer (operator-managed); see the GPU-resident user guide for connection semantics.
    • Data ready handler (optional): A separate GPU-resident fragment can be registered as a data ready handler to run at the start of each iteration and decide when input data is ready, enabling sensor-driven pipelines with GPU-direct technologies (e.g., Holoscan Sensor Bridge) without host CPU involvement.
    • Host control API: After run_async(), use Fragment::gpu_resident() to signal data ready, poll result ready, and configure timeout and optional host sync; see the GPU-resident user guide and examples.
    • GPU-resident graph execution is supported in C++ only; Python support is planned for a future release.
  • Data Flow Tracking - programmatic access and probing: Data flow tracking now supports richer observability and targeting of arbitrary operators.

    • Programmatic access in operators: Operators can read data flow tracking information during execution via get_data_flow_tracking_label() (C++/Python). This returns a MessageLabel for a given input port, with path names, number of paths, and timing information, enabling adaptive behavior, debugging, and in-operator runtime analysis without waiting for post-run results.
    • Tracking to arbitrary operators: Use DataFlowTracker::add_probe_operator() to treat any intermediate (non-root, non-leaf) operator as a probe point. The tracker then reports latency from root operators to the probed operator and the number of messages published by that operator. Probe operators are currently validated with DoubleBufferTransmitter/DoubleBufferReceiver and default connection configurations; see the Data Flow Tracking documentation for details and the flow_tracker example for usage.
Operators/Resources/Conditions
  • New classes related to publish/subscribe (pub/sub) messaging were added (PubSubContext, PubSubReceiver, PubSubTransmitter) as wrappers around underlying GXF pub/sub interfaces. In this release, Holoscan does not ship a concrete pub/sub backend implementation, so these APIs should be considered experimental scaffolding for future releases. Relatedly, IOSpec now includes topic and qos methods to describe intended pub/sub topic and QoS settings, which will become active once a concrete backend is provided.
Examples
  • The flow_tracker example demonstrates the new data flow tracking capabilities: programmatic access via get_data_flow_tracking_label() inside an operator and tracking to intermediate operators via add_probe_operator().
  • Added gpu_resident_example, gpu_resident_input, and gpu_resident_inference_example demonstrating GPU-resident execution: operator chains, device memory ports, (in gpu_resident_input) device-driven data ready control, and (in gpu_resident_inference_example) GPU-resident inference with the TensorRT (TRT) backend.
  • Added matx_allocator example (examples/matx/matx_allocator/) demonstrating how to use the MatXAllocator adapter with RMMAllocator in a Holoscan pipeline with DLPack tensor interop.
Holoviz module
Holoinfer module
Utils
  • Added MatXAllocator utility class (holoscan/utils/matx_allocator.hpp) — a lightweight adapter that enables any Holoscan SDK allocator (e.g., RMMAllocator, BlockMemoryPool, StreamOrderedAllocator) to be used with MatX GPU tensor operations. Supports stream-aware async allocation/deallocation when used with CudaAllocator-derived allocators and a CUDA stream.
HoloHub
Source build & Release Artifacts
  • Added basic Holoscan Sensor Bridge 2.5.0 feature support in the Holoscan SDK developer
    container environment. Holoscan Sensor Bridge libraries and tools are built with RDMA over
    Converged Ethernet (RoCE) support, targeting discrete GPU platforms with a
    ConnectX Network Interface Card (NIC) such as DGX Spark or IGX platforms. Libraries may be found under
    the /opt/nvidia/hololink path in the discrete GPU developer containers. Please refer to the Holoscan
    Sensor Bridge User Guide for
    detailed requirements.

    • Note that the containerized Holoscan Sensor Bridge libraries do not support integrated GPU networking stacks (Jetson, IGX Orin 500). Please rebuild Holoscan Sensor Bridge from source for iGPU platform support.
    • Holoscan Sensor Bridge support for x86_64 configurations is experimental and not formally verified. Holoscan Sensor Bridge binaries have been included in the Holoscan SDK x86_64 development container to support developers with Holoscan Sensor Bridge emulation via software loopback. We do not recommend using real Holoscan Sensor Bridge hardware with x86_64 platforms at this time.
    • Please run the container as the root user to enable full Holoscan Sensor Bridge runtime features.
  • Updated Holoscan SDK CMake import rules with improved handling for external dependencies.
    Holoscan SDK C++ and Python binary distributions (Debian packages, Python wheels, etc.) have traditionally
    bundled runtime libraries and development rules for dependencies such as RMM, SPDLOG, YAML-CPP, and more,
    to reduce the downstream developer burden. In this version, "holoscan-config.cmake" and
    associated files in the Holoscan SDK binary distribution describe embedded dependencies with
    dedicated project configuration files such as "rmm-config.cmake", and rely on CMake's "find_package"
    mechanism so that embedded dependencies can be discovered or overridden.

    • Downstream projects may override Holoscan SDK embedded CMake transitive dependencies with "find_package":

      # Find RMM from a custom installation in your development environment
      find_package(rmm HINTS path/to/my/rmm)
      
      # Holoscan SDK will favor the custom RMM installation over its own embedded distribution
      find_package(holoscan)
    • Downstream projects may depend directly on the Holoscan SDK embedded dependencies:

      # Populate any embedded dependencies not already provided
      find_package(holoscan)
      
      # If not set in advance, will use the RMM installation embedded in the Holoscan SDK distribution
      find_package(rmm)     # result: rmm_DIR path is already set to the Holoscan SDK installation
      ...
      target_link_libraries(mylibrary PUBLIC rmm::rmm)   # dependency imported targets are available
    • Note that some header-only dependencies (concurrentqueue, MatX) remain in the Holoscan SDK
      distribution without explicit CMake imported targets to describe them.

  • Removed the legacy Holoscan SDK runtime Dockerfile as not supported. Please refer to
    Holoscan CLI and
    HoloHub sample apps for guidance and examples for
    writing an optimized Dockerfile for your Holoscan-based project.

  • The NGC holoscan_dev_deb resource (Debian packages up to HSDK 0.6) has been archived and
    removed from NGC. Debian packages are now distributed exclusively via
    NVIDIA Holoscan downloads; use the container
    image, Python wheel, or Debian package options listed in the Release Artifacts section above.

Documentation

Breaking Changes

  • Removed the holoscan.cli Python module stub that was deprecated in v2.9. Users who need CLI
    functionality should install the standalone holoscan-cli
    package via pip install holoscan-cli.

Bug fixes

Issue Description
5601128 On IGX Orin iGPU, fix segmentation fault by ensuring matx::make_tensor does not fall back from managed memory to host-pinned memory during allocation
5898338 Fix error where InferenceOp ignores CudaStreamPool passed positionally (only the named Arg works)

Known Issues

Issue Description
5211869 On IGX Orin dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code.
5606400 On...

v3.11.0

04 Feb 23:07
c5d3327


Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

Core
  • Stream-Aware Deallocation: Added automatic CUDA stream propagation to Tensor and VideoBuffer components during OutputContext::emit() to enable pool-based allocators (BlockMemoryPool, RMMAllocator, StreamOrderedAllocator) to defer memory reuse until GPU operations complete on the associated stream. This prevents potential race conditions in which memory could be returned to the pool while GPU kernels were still reading from it, as could happen when an operator returned from compute() while GPU work was still in progress.

    For sink operators (operators that consume data but don't emit it), a new set_deallocation_stream(cudaStream_t stream) method has been added to holoscan::Tensor (holoscan.Tensor.set_deallocation_stream(stream) in Python). Sink operators that return asynchronously while GPU work is ongoing should call this method on received tensors to inform the allocator which stream last accessed the tensor's memory. The method returns true if the stream was set successfully, or false for tensors not managed by a Holoscan/GXF allocator (e.g., tensors from CuPy or PyTorch via DLPack).

    Stream propagation can be disabled by setting the environment variable HOLOSCAN_DISABLE_ENTITY_STREAM_PROPAGATION=1. See the CUDA Stream Handling documentation for more details.

  • Subgraph API Enhancements: Several improvements have been made to the Subgraph class to simplify authoring and usage:

    • Config file support: The Subgraph constructor now accepts an optional config_file parameter to specify a YAML configuration file for the subgraph, enabling from_config() during compose().
    • Broadcast input ports: Input interface ports can now be connected to multiple internal operators, enabling a single external port to broadcast data to multiple destinations within the subgraph.
    • Simplified add_interface_port: The internal operator port name is now optional (defaults to the external port name), and port direction (input/output) can be auto-detected from the operator's port definitions.
    • Fragment::add_subgraph (Fragment.add_subgraph in Python): New method to add a subgraph without interface ports that doesn't need to be connected via add_flow.
    • Subgraph::operators() (Subgraph.operators in Python): New method to retrieve all operators belonging to the subgraph and its nested subgraphs.
    • Convenience wrappers: Added add_data_logger and register_service methods to Subgraph that delegate to the parent Fragment.
    • Python bindings: The InterfacePort class and interface port retrieval methods (interface_ports, get_interface_operator_port) are now exposed in Python.
  • The MetadataDictionary object returned by the C++ Operator::metadata() method now has a deep_copy() method that creates independent copies of both the dictionary structure and all contained MetadataObject instances, preserving each entry's MetadataPolicy. For Python, the dict-like MetadataDictionary object returned by the Operator.metadata property implements __copy__ (shallow copy) and __deepcopy__ (calls deep_copy()) dunder methods, so copy.copy() and copy.deepcopy() from Python's built-in copy module behave accordingly. The deep copy creates truly independent metadata dictionaries where modifications to one won't affect another. Note that values stored as std::shared_ptr<T> are not deep-copied—only the pointer is copied.

  • A unique, descriptive default name is assigned to all C++ Operator, Condition, Resource, Scheduler, and DataLogger classes provided by the SDK. Previously, for C++, a generic name such as "unnamed_condition" or "unnamed_resource" was used if the user did not provide a name string as the first argument to make_condition, make_resource, etc. The default names in C++ improve consistency with the existing Python API. This change should not require any changes to existing applications but should make logging more informative when default names were used.

  • All scheduler classes (EventBasedScheduler, GreedyScheduler, MultiThreadScheduler) support a new network_connection_timeout parameter which defaults to 5000 ms. This parameter is used only by distributed applications. During the initial phase when network connections are being established, this network_connection_timeout (in ms) is used instead of stop_on_deadlock_timeout for deadlock detection. The parameter should be set long enough to allow sufficient time for UCX connections to be established between all fragments without triggering false deadlock detection. After the network_connection_timeout period has expired, the application switches to using the standard stop_on_deadlock_timeout for deadlock detection. Single fragment applications ignore network_connection_timeout and use only stop_on_deadlock_timeout.

  • For distributed applications, if Application::scheduler (Application.scheduler in Python) is called, it will now automatically set that scheduler for all fragments of the application unless those fragments already have their own scheduler set via Fragment::scheduler. Previously, the Application::scheduler method only worked for non-distributed applications and did not have any effect for distributed applications.

  • Various inconsistencies in custom scheduler assignment for distributed applications have been resolved. Whenever the user explicitly sets a scheduler via Fragment::scheduler (Fragment.scheduler in Python), that scheduler type is now always respected and the user-specified parameters are applied (unless overridden by environment variables). As before, when environment variables HOLOSCAN_STOP_ON_DEADLOCK, HOLOSCAN_STOP_ON_DEADLOCK_TIMEOUT, HOLOSCAN_MAX_DURATION_MS and/or HOLOSCAN_CHECK_RECESSION_PERIOD_MS are specified, the values specified by these environment variables will override any corresponding argument that was set for the scheduler passed to Fragment::scheduler. Similarly, if HOLOSCAN_DISTRIBUTED_APP_SCHEDULER is set and that scheduler type does not match the user-defined one passed to Fragment.scheduler, the scheduler specified by the environment variable will be used.
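The shallow vs. deep copy distinction described for MetadataDictionary above can be illustrated with a plain-Python analogue. This uses only the standard copy module, not the holoscan API; the MetadataObject class and the values used here are stand-ins.

```python
import copy

class MetadataObject:
    """Stand-in for a holoscan MetadataObject holding a single value."""
    def __init__(self, value):
        self.value = value

meta = {"frame_id": MetadataObject(7)}

shallow = copy.copy(meta)     # analogous to MetadataDictionary.__copy__
deep = copy.deepcopy(meta)    # analogous to MetadataDictionary.deep_copy()

meta["frame_id"].value = 8
print(shallow["frame_id"].value)  # 8: shallow copy shares the contained object
print(deep["frame_id"].value)     # 7: deep copy is independent
```

As the release note says, values held as std::shared_ptr&lt;T&gt; on the C++ side are only pointer-copied even by deep_copy().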

Operators/Resources/Conditions
  • CudaStreamCondition Enhancements: The CudaStreamCondition has been reimplemented to address several limitations of the original:

    • Multi-receiver support: Now supports multi-receiver ports (IOSpec::kAnySize) where the number of connections is not known at compile time. Specify the base port name (e.g., "receivers") and all matching ports (receivers:0, receivers:1, etc.) will be automatically discovered.
    • Queue size > 1: Now supports input ports with queue sizes greater than 1. By default, all messages in the queue are checked for CUDA streams (check_all_messages=true). Set to false to only check the first message per receiver.
    • Multiple CudaStreamId per entity: Now correctly handles GXF entities containing more than one CudaStreamId component.
    • Multiple port names: The new receivers parameter accepts a std::vector<std::string> (list[str] in Python) of receiver names to monitor multiple input ports.
    • Backwards compatibility: The legacy receiver parameter (single port name) is still supported but deprecated. A warning will be logged recommending migration to the receivers parameter.
  • A new notify_scheduler() method has been added to the base Condition class (holoscan::Condition::notify_scheduler in C++, holoscan.core.Condition.notify_scheduler in Python). This method allows event-based conditions (those returning kWaitEvent from check()) to signal to the scheduler that an asynchronous event has completed and the condition should be re-evaluated. This is used internally by CudaStreamCondition and AsynchronousCondition, and is now available for custom native conditions.

  • The Receiver class now exposes peek() and sync() methods. peek(index) allows conditions to inspect messages in the queue without consuming them, and sync() moves messages from the back stage to the main stage for double-buffer queues.

  • AsyncDataLoggerResource (and derived classes like AsyncConsoleLogger) now supports a configurable queue_type parameter (DataLoggerQueueType enum) to select between two queue implementations:

    • LockFree (default): High-throughput lock-free queue with per-producer FIFO ordering. Best for most use cases.
    • Ordered: Mutex-based queue with strict global FIFO ordering across all producers. Use this when strict temporal ordering is required (e.g., compliance logging, debugging, causal analysis).

    In YAML configuration, use queue_type: "lock_free" or queue_type: "ordered" (case-insensitive).
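As a hedged sketch of the YAML form mentioned above (the component key async_console_logger is illustrative; only the queue_type parameter and its values come from the note):

```yaml
async_console_logger:       # illustrative component key
  queue_type: "ordered"     # or "lock_free" (default); case-insensitive
```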

Examples
Holoviz module
  • New present modes are now supported:
    • SHARED_DEMAND_REFRESH and SHARED_CONTINUOUS_REFRESH, which are available in exclusive display (direct rendering) mode only and can be used to implement front buffer rendering.
    • FIFO_LATEST_READY which waits for vblank, presents the current image and...

v3.10.0.post1

12 Jan 22:26
134bebd


Post 3.10.0 Release Fixes:

  • Update GXF downloading URL in the Dockerfile to use the public NVIDIA artifactory URL.

v3.10.0

12 Jan 21:57
e7cd7d1


Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

Core
  • The C++ API now supports more convenient and flexible assignment of an Arg with any integer type to a typed Parameter<T> (where T is a specific integer type). An error will be raised if the value stored in Arg is outside the representable range for Parameter<T> (i.e. only safe conversions are allowed). Similarly, an integer-valued Arg can be used to set Parameter<float> or Parameter<double> as well. This allows passing arguments to make_operator, make_resource, make_condition, etc. without having to match the exact target type of the parameter in these common cases.
  • UCX is now built with gdrcopy support for GPU Direct RDMA, enabling lower-latency GPU memory transfers in distributed applications when the gdrcopy kernel module is installed on the host.
  • Holoscan now emits a warning when an input port is configured with size > 1 (including IOSpec.PRECEDING_COUNT) and the default MessageAvailableCondition is used. In this configuration, min_size is implicitly set to the same value as the queue capacity, enabling batched execution. To future-proof applications and prepare for planned API evolution that will decouple queue capacity from batching, set min_size explicitly whenever size > 1 where possible. For IOSpec.PRECEDING_COUNT, the resolved size is computed from the graph at run time; a planned batch_size configuration is intended to make this behavior explicitly configurable.
Operators/Resources/Conditions
  • The AsyncDataLoggerResource class has a new parameter, shutdown_wait_period_ms, that can be used to control how long (in milliseconds) the application will continue trying to log any pending messages in the queue(s) during application shutdown. This interval will be respected both for normal application termination as well as when terminated via signal interrupt (e.g. Ctrl+C). The default interval of -1 indicates that all remaining items in the queue should be logged prior to exiting.
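A hedged YAML sketch of the new parameter (the component key is illustrative; the parameter name and the -1 default semantics come from the note above):

```yaml
async_data_logger:            # illustrative component key
  # Keep draining queued log messages for up to 2 s at shutdown;
  # the default of -1 drains everything before exiting.
  shutdown_wait_period_ms: 2000
```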
Examples
Holoviz module
Holoinfer module
Utils
  • UCX diagnostic utilities (ucx_info, ucx_perftest) are now packaged with Holoscan SDK for verifying UCX transport capabilities and benchmarking performance.
HoloHub
Source build & Release Artifacts
  • The Holoscan SDK build process can find libtorch libraries and headers from a PyTorch wheel installation on the system.
  • The build container (root Dockerfile) and dev container (on NGC) include a functional installation of PyTorch, usable via import torch: version 2.8.0 for CUDA 12 images and 2.9.0 for CUDA 13 images. This supplements the existing installation of libtorch, which is still leveraged by the holoinfer module and inference operator.
    • Note: CUDA dependencies defined by PyTorch are explicitly ignored or deleted in favor of the system-provided ones to ensure compatibility with other components, whether PyTorch vendors them or requires them with pip.
  • The Holoscan SDK iGPU dev container on NGC (which targets AGX Orin and IGX Orin iGPU) was downgraded from Ubuntu 24.04 to Ubuntu 22.04 to match JP6 and IGX SW 1.x OS versions, in order to provide compatibility with the PyTorch 2.8.0 wheel from pypi.jetson-ai-lab.io.
Documentation

Breaking Changes

  • The Subgraph class constructor argument has been renamed from instance_name to name for consistency with other Holoscan components. The Subgraph::instance_name() method is now deprecated in favor of Subgraph::name(). The instance_name keyword argument and instance_name property are still supported in Python but will emit a deprecation warning. Users should update their code to use name instead.
  • Planned breaking change (future release): the size argument to OperatorSpec::input / OperatorSpec.input will control receiver queue capacity only and will no longer implicitly enable batching by setting MessageAvailableCondition.min_size. Batching will require explicit configuration (e.g., set min_size directly, or use a planned batch_size configuration). Applications relying on implicit batching should set min_size explicitly now. For the IOSpec.PRECEDING_COUNT case, a planned batch_size configuration is intended to provide an explicit equivalent to today's "batch by preceding-count" behavior.

Bug fixes

Issue Description
5749574 "import holoscan.pose_tree" fails for "holoscan" Conda package due to incompatibility with "rapidsai" UCXX
5731859, 5681319 When signal interrupt (e.g. SIGINT) is used to terminate an application, it may not wait for pending items in the AsyncDataLoggerResource's queues to be processed (or discarded cleanly).
5722190 Restored RDMA/InfiniBand support for UCX that was missing since v3.7.0. Added RDMA build dependencies (rdma-core, libibverbs-dev, librdmacm-dev) and gdrcopy to enable libuct_ib.so, libuct_ib_mlx5.so, and libuct_rdmacm.so transports for high-performance distributed applications.
5536564 Duplicate profiling paths with 0 avg latency numbers in cyclic app
5061275, 5721978 Python operators created via the decorator API (holoscan.decorator.create_op) do not support construction by passing Subgraph as the first argument to the constructor

Known Issues

Issue Description
5764538 Holoscan SDK CUDA 12 is not compatible with PyTorch v2.9. See Q30: How do I fix segmentation faults when using PyTorch 2.9.x with Holoscan SDK v3.10 CUDA 12? in Holoscan SDK 3.10 FAQs
5606400 On AGX Thor Jedha, v4l2_camera_usb_webcam Python application fails with segmentation fault
5427783 On Jetson, distributed_pose_tree_multinode application fails between Jetson and nano/IGX/Jetson
5211869 On IGX dGPU, error "Failed to start server on 0.0.0.0:10002" is reported when debugging the Distributed Endoscopy Tool Tracking application in VS Code.

v3.9.0

05 Dec 16:37
3500b33


Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

  • Added explicit and additional support for data flow tracking with asynchronous lock-free buffers.
  • Conditions note: sharing a single BooleanCondition instance across multiple operators is not
    supported. In particular, attempting to share a window_close_condition across multiple
    HolovizOp instances can prevent application shutdown when a window is closed. To support
    coordinated shutdown, Holoviz now provides a window_close_callback parameter that lets
    applications perform custom shutdown logic (e.g., Application::stop_execution()) when the Holoviz window is closed.
Core
  • The Subgraph class has a new method set_dynamic_flows which is a wrapper around the associated fragment's set_dynamic_flows method. This allows calling set_dynamic_flows directly when composing a fragment to provide a consistent syntax for defining dynamic flows within a subgraph vs. within a fragment. In Holoscan v3.8, using dynamic flows from a subgraph was possible, but required first retrieving the subgraph's fragment and then calling that fragment's set_dynamic_flows method.
  • Note (internal SDK changes; typical apps unaffected):
    • Improved and unified parameter error handling and diagnostics:
      • C++ ArgumentSetter::SetterFunc now returns bool to communicate success/failure.
      • ArgumentSetter::set_param throws with a detailed message when a setter reports failure, including arg_type details for faster debugging.
      • YAML decode/parse errors now emit clearer diagnostics and consistent log severity to aid configuration troubleshooting.
Operators/Resources/Conditions
  • New method ExecutionContext::is_gpu_available can be used to check if a GPU was detected in the system. This can be used by operators to provide an alternative code path for systems without a GPU.

  • HolovizOp: added window_close_callback parameter (C++ and Python) invoked when the
    display window is closed. This enables applications to perform custom shutdown logic
    (e.g., Application::stop_execution()), including multi-window setups.

  • Note (internal SDK changes; typical apps unaffected):

    • Aggregated error reporting when setting parameters:
      • GXFOperator::set_parameters() collects all parameter set failures and throws one exception summarizing all issues, including GXF error strings and codes, and the operator GXF type.
      • GXFComponentResource::set_parameters() aggregates GXF parameter errors in the same fashion.
      • GXFScheduler::set_parameters() aggregates GXF parameter errors and throws one exception summarizing all issues, including GXF error strings/codes and the scheduler GXF type.
      • GXFNetworkContext::set_parameters() aggregates GXF parameter errors in the same fashion.
      • ComponentBase::update_params_from_args() aggregates setter failures; unknown arguments are logged and included in the summary when other setter errors are present.
  • The visibility of the parameter member variables of AsyncDataLoggerResource has been changed from private to protected. This way loggers inheriting from this class can directly access these parameters.

Examples
  • Video Replayer (C++/Python): updated the examples to wire up window_close_callback so closing the
    Holoviz window cleanly stops the application. Dual-window variants are also supported.
Holoviz module
  • Introduced WindowCloseCallbackFunction and corresponding parameter in HolovizOp; the callback is
    executed during the window-close path. YAML converter and Python bindings were updated to accept window_close_callback.
  • Added tests: C++ system test to verify callback invocation, plus Python unit/system tests to validate parameter wiring and non-intrusive behavior.
Holoinfer module
Operators
Utils
HoloHub
Documentation

Breaking Changes

  • Due to improved data flow tracking with asynchronous lock-free buffers, the string _old is now reserved and cannot appear in operator names. If any of your operator names contain _old, rename them.

  • Note (internal SDK changes; typical apps unaffected):

    • C++ API change (for advanced/custom integrations): ArgumentSetter::SetterFunc signature changed from std::function<void(ParameterWrapper&, Arg&)> to std::function<bool(ParameterWrapper&, Arg&)>.
      • If you registered custom argument setters, update them to return true on success and false on failure.
    • Exception semantics during configuration:
      • ArgumentSetter::set_param now throws std::runtime_error when a setter reports failure. Previously failures might have only been logged.
      • Higher-level code now aggregates and throws after attempting all parameter sets (Operators/Resources/Conditions/Schedulers/Network Context). Applications that previously continued after warnings may now see a single configuration-time exception summarizing all issues.
    • Error-propagation responsibilities: adaptor callbacks now return gxf_result_t; callers are responsible for composing error context and throwing as needed.
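The signature migration can be illustrated with a minimal sketch; ParameterWrapper and Arg below are empty stand-ins for the SDK types of the same names:

```cpp
#include <cassert>
#include <functional>

// Stand-ins for the SDK's ParameterWrapper and Arg types.
struct ParameterWrapper {};
struct Arg {};

// Old style: void return; a failure could only be logged.
using OldSetterFunc = std::function<void(ParameterWrapper&, Arg&)>;

// New style: return true on success, false on failure, so the caller
// (ArgumentSetter::set_param) can throw with a detailed message.
using SetterFunc = std::function<bool(ParameterWrapper&, Arg&)>;

// Example factory for a new-style setter.
SetterFunc make_setter(bool will_succeed) {
  return [will_succeed](ParameterWrapper&, Arg&) -> bool {
    // ... perform the actual conversion/assignment here ...
    return will_succeed;
  };
}
```

Custom setters registered against the old signature need only change the return type and report success/failure explicitly.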
  • If a configuration file is specified (a non-empty filename passed to Fragment::config or Application::config), a runtime error is now raised if the file does not exist. Previously, the SDK logged a warning and attempted to continue with default parameters; that warning was easy to miss and could lead to confusion, so an explicit error is preferable.
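A sketch of the new behavior (load_config is a hypothetical stand-in for the SDK's config-loading path, not the actual implementation):

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>

// A non-empty config path that does not exist now raises a runtime error
// instead of logging a warning and falling back to defaults.
void load_config(const std::string& filename) {
  if (filename.empty()) return;  // no config requested: use defaults
  if (!std::filesystem::exists(filename)) {
    throw std::runtime_error("Config file '" + filename + "' does not exist");
  }
  // ... parse the YAML configuration here ...
}
```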

Bug fixes

Issue Description
5489793 Holoscan 'as_tensor' methods cannot convert singleton or null PyTorch GPU Tensors
4384348 warnings and errors logged on window close (or Ctrl+C termination) of video_replayer_distributed
5327270 an error should be thrown if the specified YAML configuration file does not exist

Known Issues

Issue Description
A Subgraph::set_dynamic_flows convenience method for adding dynamic flows within subgraphs is missing. The workaround is to use the underlying Fragment as obtained by the Subgraph::fragment() method (C++) / Subgraph.fragment property (Python) and call Fragment::set_dynamic_flows instead.
5606400 The v4l2_camera_usb_webcam (Python) example may segfault on Jetson AGX Thor due to unexpected py::gil_scoped_acquire behavior.
5427783 The distributed_pose_tree example may fail when distributed between two Jetson AGX Orin nodes
5061275 The ping_distributed example may fail to create a path between two Jetson AGX Orin nodes
5681319 The video_replayer_distributed example reports an error when closing the replayer window

v3.8.0

10 Nov 19:27

Choose a tag to compare

Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

  • It is now possible to group operators into subgraphs that can be reused when building more complex, modular Holoscan applications. For concrete examples, see the new examples/subgraph folder and the corresponding test applications (C++/Python).
Core
  • Codec Registry: decoupled from GXF and moved from the holoscan::gxf::CodecRegistry namespace to holoscan::CodecRegistry.

  • The experimental DataLogger interface now takes an additional std::optional<CudaStream_t> stream argument. This CUDA stream information is now logged to the console by BasicConsoleLogger, GXFConsoleLogger, and AsyncConsoleLogger. Concrete logger implementations can optionally use the stream when copying data from device to host during logging (e.g., via cudaMemcpyAsync).
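A minimal sketch of how a logger might branch on the optional stream; CudaStream_t is stood in by an integer handle here, and a real logger would pass the value to cudaMemcpyAsync when copying device data to host:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-in for the CUDA stream handle type named in the notes.
using CudaStream_t = long;

// A logger receiving std::optional<CudaStream_t> can distinguish
// "no stream provided" from a concrete stream to synchronize copies on.
std::string describe_stream(const std::optional<CudaStream_t>& stream) {
  if (!stream.has_value()) return "no stream";
  return "stream " + std::to_string(*stream);
}
```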

Operators/Resources/Conditions
  • HolovizOp: added specialized scheduling conditions that enable operators to synchronize their execution with display events. The FirstPixelOutCondition and PresentDoneCondition conditions are particularly useful for rate limiting and for ensuring that frame generation is synchronized with the display's refresh cycle, reducing latency and improving visual quality. Note that PresentDoneCondition is not supported on Orin iGPU. Additionally, even on supported systems, applications using PresentDoneCondition might hang or crash when the desktop is accessed through remote-desktop tools.
Examples
  • Added the holoviz_conditions example demonstrating the new Holoviz scheduling conditions.
Holoviz module
Holoinfer module
Operators
Utils
HoloHub
Documentation

Breaking Changes

  • The experimental DataLogger interface has a breaking change to its API, adding a new std::optional<CudaStream_t> stream argument. Concrete implementations must be updated to add this new argument to the method signatures for log_data, log_tensor_data, log_tensormap_data and/or log_backend_specific.

  • The Fragment::make_executor(ArgsT&&... args) method was fixed so that this (Fragment*) is automatically passed to the Executor as the first argument, before any args. This is unlikely to affect existing applications, as the method is normally called internally by the Holoscan SDK; however, code that calls Fragment::make_executor directly should note this change.
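The fix can be sketched with simplified stand-in Fragment and Executor classes (not the SDK's actual definitions), showing how `this` is now prepended before any forwarded arguments:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Fragment;

// Simplified stand-in: an executor bound to its owning fragment.
struct Executor {
  Executor(Fragment* frag, int worker_count)
      : fragment(frag), workers(worker_count) {}
  Fragment* fragment;
  int workers;
};

struct Fragment {
  // `this` is passed automatically as the first constructor argument,
  // so callers no longer supply the fragment pointer themselves.
  template <typename ExecutorT = Executor, typename... ArgsT>
  std::shared_ptr<ExecutorT> make_executor(ArgsT&&... args) {
    return std::make_shared<ExecutorT>(this, std::forward<ArgsT>(args)...);
  }
};
```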

Bug fixes

Issue Description
5529488 Fix a bug where the default condition was also added to a port when a native Python Condition with a "receiver" or "transmitter" parameter was applied
5381536 CTest cases for examples/multithread/python/multithread.py tests were updated so that the timing tests only require a system with >4 threads to pass (previously required >8 threads)
5552584 When the incoming message contained a video tensor, in_tensor_name was ignored, even if the tensor with that name was not a video
5489793 Holoscan 'as_tensor' methods cannot convert singleton or null PyTorch GPU Tensors

Known Issues

Issue Description
A Subgraph::set_dynamic_flows convenience method for adding dynamic flows within subgraphs is missing. The workaround is to use the underlying Fragment as obtained by the Subgraph::fragment() method (C++) / Subgraph.fragment property (Python) and call Fragment::set_dynamic_flows instead.

v3.7.0.post1

16 Oct 16:29
73f27ca

Choose a tag to compare

Post 3.7.0 Release Fixes:

  • Update GXF downloading URL in the Dockerfile to use the public NVIDIA artifactory URL.

v3.7.0

03 Oct 21:00
7b67907

Choose a tag to compare

Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

Core
  • Debian packages and Python wheels are now available for both CUDA 12 and CUDA 13 across Holoscan target platforms.
    • apt install holoscan-cuda-12, apt install holoscan-cuda-13, or apt install holoscan (default CUDA 12)
    • python3 -m pip install holoscan-cu12 or python3 -m pip install holoscan-cu13
  • HOLOSCAN_APP_DRIVER_PORT Environment Variable: Added support for overriding the default App Driver port via the HOLOSCAN_APP_DRIVER_PORT environment variable. This allows users to customize the port number used by distributed applications when no explicit address is specified with the --address option. The default port is now 57777 instead of 8765.
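The resolution order implied by the notes can be sketched as follows (resolve_driver_port is a hypothetical helper, not an SDK function): an explicit --address port wins, then the environment variable, then the new default of 57777.

```cpp
#include <cassert>
#include <cstdlib>

// Resolve the App Driver port: explicit value > env var > default.
int resolve_driver_port(const char* explicit_port) {
  if (explicit_port != nullptr) return std::atoi(explicit_port);
  if (const char* env = std::getenv("HOLOSCAN_APP_DRIVER_PORT")) {
    return std::atoi(env);
  }
  return 57777;  // new default (was 8765 in earlier releases)
}
```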
Operators/Resources/Conditions
  • HolovizOp: GSync is now supported correctly in fullscreen mode, and applications flip instead of blit when displaying the next frame. This significantly improves performance and reduces latency for swap-bound applications.
  • FormatConverterOp: added support for converting RGBA16161616 to RGB888 and RGB161616 to RGB888
  • Holoinfer: added support to use user-provided CUDA stream pool in torch backend
  • Green context: added support for the default CUDA green context pool in a fragment
  • Stream pool and green context: added support for user-provided NVTX identifier used in NSight profiling
Examples
Holoviz module
Holoinfer module
Operators
Utils
HoloHub
Documentation

Breaking Changes

  • App Driver Default Port Changed: The default port number for the App Driver service has been changed from 8765 to 57777 to avoid conflicts with the FIO (Flexible I/O Tester) service whose default port is also 8765. This change affects distributed applications that rely on the default port without explicitly specifying an address.

    • Impact: Applications using --driver and --worker options without the --address parameter will now use port 57777 instead of 8765.
    • Backward Compatibility: Users can override the default port using the HOLOSCAN_APP_DRIVER_PORT environment variable (e.g., HOLOSCAN_APP_DRIVER_PORT=8765) or by explicitly specifying the address with the --address option.
    • Migration: No code changes are required for most users. Only users who have hardcoded expectations about port 8765 or have firewall rules/network configurations specifically for that port may need to update their configurations.
  • The HOLOSCAN_UCX_ASYNCHRONOUS environment variable is now marked as deprecated. In the future only the synchronous mode for the underlying UCX transmitters will be supported (synchronous mode is the current default so there is no change in behavior for any users who were not explicitly changing this environment variable). Aside from raising a warning on application startup, this environment variable will continue to work as before for upcoming 3.x releases, but will be removed in v4.0.

Bug fixes

Issue Description
5472498 OutputContext::emit of std::shared_ptr<Tensor> or TensorMap results in logging via the DataLogger::log_backend_specific interface instead of the expected log_tensor_data or log_tensormap_data methods.
Construction of holoscan.data_loggers.AsyncConsoleLogger fails if holoscan.data_loggers.SimpleTextSerializer has not already been explicitly imported
FormatConverterOp assumes that U, V or UV-plane offsets for NV12 and YUV formats are equal to the Y color plane size when converting these types to RGB format. This may not always be true and color_planes[i].offset should be used instead.

Known Issues

Issue Description
5539840 Holoinfer ONNX Runtime backend Input/Output CUDA buffer test segfault on IGX Orin iGPU
5538962 [L4T SIPL, JP7.0, Thor, Jedha, IQ, VB1940, HOLOLINK] I2C issues on Jedha with SIPL COE 
5529488 Native Python port-based condition does not automatically disable the default condition on a port
5518571 [HSB FPGA] Disabling packetizer in build causes second camera to fail to stream [IRIS]
5502972 HDMI Cameras not working with AJA card using aja_video_capture application
5489793 Holoscan 'as_tensor' methods cannot convert singleton or null Torch tensors
5484505 Thread Pools are not created properly in python
5475816 EXAMPLE_CPP_ACTIVATION_MAP fails on x86_64 w/ Debian container
5470011 CUDA 13 container is not compatible on IGX-dGPU. CUDA forward compatibility does not support graphics interop.
5469227 [Orin Nano] Failed to initialize component 00023 (cuda_green_context_pool)
5469136 Address Sanitizer pipeline build fails due to CUDA minimum version
5464864 Segmentation fault when closing the multiai_ultrasound window
5460314 PoseTree service access causes "pybind11::detail::get_type_info: type has multiple pybind11-registered bases" error messages
5457599 EXAMPLE_PYTHON_TENSOR_INTEROP_TEST passes despite edge case failure
5427783 [Concord] distributed_pose_tree_multinode failed between concord and nano/IGX/Concord
5427678 v4l_camera example visualization fails with Holoviz + >=R550 driver
5425996 Build Warnings with CMake 3.31
5424351 [X86 Server] holoscan-sdk/public/python/tests/unit/test_gxf.py test failed.
5411188 GXF dispatcher thread runs with SCHED_OTHER while worker threads run with Linux RT scheduling policies
5404972 Real-time thread test failures
5404947 Libtorch configuration warnings
5404926 Address Sanitizer failures
5392624 ERROR: LeakSanitizer: detected memory leaks
5390206 Distributed PoseTree service using UCX cannot accept more than three clients
5389580 is_user_defined_root() incorrectly classifies operators as root in acyclic flow graphs
5381536 EXAMPLE_PYTHON_MULTITHREAD_TIMES_VALIDATION_TEST fails on CAGX
5372221 No support for int64 and bool in Holoscan Inference Operator
5370867 Async Distributed Video Replayer segfaults at exit on IGX dGPU
5348375 getting error running nsys profile
5327270 Holoscan only warns when a config file cannot be found on disk
5211869 [IGX-dGPU] Error "Failed to start server on 0.0.0.0:10002" in VScode Debug Distributed Endoscopy Tool Tracking application
5162855 Dual-GPU Configuration fails Holoscan application v2.9
5144233 python-api-tracing-profile failure on Ubuntu 24.04 / Python 3.12
5061275 Ping distributed multi-nodes failed to create path between two nodes
5014059 [IGX-dGPU] VScode Debug Distributed Endoscopy Tool Tracking application on an IGX w/dGPU, does not hit the expected debug points.
4953020 The HOLOINFER_TEST is failing, or a segmentation fault occurs during the parseFromFile() call.
4789382 [Holoscan SDK v2.2] InferenceOp with libtorch backend reports undefined symbols
4768945 Distributed applications crash when the engine file is unavailable/generating engine file
4753994 Debugging Python application may lead to segfault when expanding an operator variable
4384348 UCX termination (via Ctrl+C, pressing 'Esc', or clicking the close button) is not smooth and can show multiple error messages

v3.6.1

17 Sep 21:19
30d9d30

Choose a tag to compare

Release Artifacts

🐋 Docker container: tag v3.6.1-cuda13-dgpu
📦️ Debian package (Jetson AGX Thor only): apt install holoscan=3.6.1
📕 Documentation

See supported platforms for compatibility.

Release Notes

Holoscan SDK v3.6.1 is a limited release providing Holoscan SDK CUDA 13.0 support for x86_64 workstations and the Jetson AGX Thor platform.

New Features and Improvements

Core
  • Added support for CUDA 13.0 build environment and binary artifacts.
  • HOLOSCAN_APP_DRIVER_PORT Environment Variable: Added support for overriding the default App Driver port via the HOLOSCAN_APP_DRIVER_PORT environment variable. This allows users to customize the port number used by distributed applications when no explicit address is specified with the --address option. Default port is now 57777 instead of 8765.

Breaking Changes

  • App Driver Default Port Changed: The default port number for the App Driver service has been changed from 8765 to 57777 to avoid conflicts with the FIO (Flexible I/O Tester) service whose default port is also 8765. This change affects distributed applications that rely on the default port without explicitly specifying an address.
    • Impact: Applications using --driver and --worker options without the --address parameter will now use port 57777 instead of 8765.
    • Backward Compatibility: Users can override the default port using the HOLOSCAN_APP_DRIVER_PORT environment variable (e.g., HOLOSCAN_APP_DRIVER_PORT=8765) or by explicitly specifying the address with the --address option.
    • Migration: No code changes are required for most users. Only users who have hardcoded expectations about port 8765 or have firewall rules/network configurations specifically for that port may need to update their configurations.

Bug fixes

Issue Description