Robotics 46
☆ ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation ICRA 2025
Object 6D pose estimation is a critical challenge in robotics, particularly
for manipulation tasks. While prior research combining visual and tactile
(visuotactile) information has shown promise, these approaches often struggle
with generalization due to the limited availability of visuotactile data. In
this paper, we introduce ViTa-Zero, a zero-shot visuotactile pose estimation
framework. Our key innovation lies in leveraging a visual model as its backbone
and performing feasibility checking and test-time optimization based on
physical constraints derived from tactile and proprioceptive observations.
Specifically, we model the gripper-object interaction as a spring-mass system,
where tactile sensors induce attractive forces, and proprioception generates
repulsive forces. We validate our framework through experiments on a real-world
robot setup, demonstrating its effectiveness across representative visual
backbones and manipulation scenarios, including grasping, object picking, and
bimanual handover. Compared to the visual models alone, our approach overcomes
drastic failure modes when tracking the in-hand object pose. In our
experiments, our approach shows an average increase of 55% in AUC of ADD-S and
60% in ADD, along with an 80% lower position error compared to FoundationPose.
comment: Accepted by ICRA 2025
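As a rough illustration of the spring-mass analogy above, a minimal sketch (with hypothetical gains and interfaces, not the authors' implementation) might compute the net force nudging a pose hypothesis as:

```python
import numpy as np

# Hypothetical stiffness gains for the spring-mass analogy.
K_ATTRACT, K_REPULSE = 5.0, 20.0

def net_force(obj_points, tactile_contacts, finger_points, penetration_depths):
    """Net spring force acting on an object-pose hypothesis.

    obj_points:         (N, 3) object surface points nearest each tactile contact
    tactile_contacts:   (N, 3) contact locations measured by the tactile sensors
    finger_points:      (M, 3) finger surface points found inside the object
    penetration_depths: (M,)   penetration depth of each finger point (> 0 inside)
    """
    # Attractive: tactile contacts pull the hypothesized object surface toward them.
    f_attract = K_ATTRACT * (tactile_contacts - obj_points).sum(axis=0)
    # Repulsive: proprioception pushes the object out along crude outward directions.
    dirs = obj_points.mean(axis=0) - finger_points
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True) + 1e-9
    f_repulse = K_REPULSE * (penetration_depths[:, None] * dirs).sum(axis=0)
    return f_attract + f_repulse  # nudges the pose during test-time optimization
```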
☆ Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation
Visuomotor policies learned from teleoperated demonstrations face challenges
such as lengthy data collection, high costs, and limited data diversity.
Existing approaches address these issues by augmenting image observations in
RGB space or employing Real-to-Sim-to-Real pipelines based on physical
simulators. However, the former is constrained to 2D data augmentation, while
the latter suffers from imprecise physical simulation caused by inaccurate
geometric reconstruction. This paper introduces RoboSplat, a novel method that
generates diverse, visually realistic demonstrations by directly manipulating
3D Gaussians. Specifically, we reconstruct the scene through 3D Gaussian
Splatting (3DGS), directly edit the reconstructed scene, and augment data
across six types of generalization with five techniques: 3D Gaussian
replacement for varying object types, scene appearance, and robot embodiments;
equivariant transformations for different object poses; visual attribute
editing for various lighting conditions; novel view synthesis for new camera
perspectives; and 3D content generation for diverse object types. Comprehensive
real-world experiments demonstrate that RoboSplat significantly enhances the
generalization of visuomotor policies under diverse disturbances. Notably,
while policies trained on hundreds of real-world demonstrations with additional
2D data augmentation achieve an average success rate of 57.2%, RoboSplat
attains 87.8% in one-shot settings across six types of generalization in the
real world.
comment: Published at Robotics: Science and Systems (RSS) 2025
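The equivariant-transformation technique has a simple core: rigidly transforming the means and covariances of the object's Gaussians to re-pose it. A minimal sketch, assuming a structure-of-arrays layout (not RoboSplat's actual API):

```python
import numpy as np

def transform_gaussians(means, covs, R, t):
    """Rigidly transform 3D Gaussians: means (N, 3), covs (N, 3, 3),
    R (3, 3) rotation, t (3,) translation."""
    new_means = means @ R.T + t        # x' = R x + t
    new_covs = R @ covs @ R.T          # Sigma' = R Sigma R^T, broadcast over N
    return new_means, new_covs
```

Rendering the edited scene then yields a demonstration with the object in a new pose, with observations that stay photometrically consistent.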
☆ A New Semidefinite Relaxation for Linear and Piecewise-Affine Optimal Control with Time Scaling
We introduce a semidefinite relaxation for optimal control of linear systems
with time scaling. These problems are inherently nonconvex, since the system
dynamics involves bilinear products between the discretization time step and
the system state and controls. The proposed relaxation is closely related to
the standard second-order semidefinite relaxation for quadratic constraints,
but we carefully select a subset of the possible bilinear terms and apply a
change of variables to achieve empirically tight relaxations while keeping the
computational load light. We further extend our method to handle
piecewise-affine (PWA) systems by formulating the PWA optimal-control problem
as a shortest-path problem in a graph of convex sets (GCS). In this GCS,
different paths represent different mode sequences for the PWA system, and the
convex sets model the relaxed dynamics within each mode. By combining a tight
convex relaxation of the GCS problem with our semidefinite relaxation with time
scaling, we can solve PWA optimal-control problems through a single
semidefinite program.
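To make the source of nonconvexity concrete, here is a generic sketch of how a free time step produces bilinear terms and how a moment-matrix lifting relaxes them (symbols chosen for illustration, not the paper's exact construction):

```latex
% Discretized linear dynamics with a free time step h are bilinear in (h, x_k, u_k):
%   x_{k+1} = x_k + h (A x_k + B u_k).
% Stack z_k = (1, h, x_k, u_k), replace the rank-one outer product z_k z_k^T with a
% PSD matrix M_k, and write the dynamics linearly in the lifted entries:
\begin{equation*}
  M_k \simeq z_k z_k^\top \succeq 0, \qquad
  x_{k+1} = x_k + A\,[M_k]_{h x_k} + B\,[M_k]_{h u_k},
\end{equation*}
% where [M_k]_{h x_k} denotes the block of M_k standing in for the product h x_k.
% The paper's contribution is keeping only a carefully chosen subset of such
% lifted terms, with a change of variables, so the relaxation stays tight but small.
```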
☆ RUKA: Rethinking the Design of Humanoid Hands with Learning
Anya Zorin, Irmak Guzey, Billy Yan, Aadhithya Iyer, Lisa Kondrich, Nikhil X. Bhattasali, Lerrel Pinto
Dexterous manipulation is a fundamental capability for robotic systems, yet
progress has been limited by hardware trade-offs between precision,
compactness, strength, and affordability. Existing control methods impose
compromises on hand designs and applications. However, learning-based
approaches present opportunities to rethink these trade-offs, particularly to
address challenges with tendon-driven actuation and low-cost materials. This
work presents RUKA, a tendon-driven humanoid hand that is compact, affordable,
and capable. Made from 3D-printed parts and off-the-shelf components, RUKA has
5 fingers with 15 underactuated degrees of freedom enabling diverse human-like
grasps. Its tendon-driven actuation allows powerful grasping in a compact,
human-sized form factor. To address control challenges, we learn
joint-to-actuator and fingertip-to-actuator models from motion-capture data
collected by the MANUS glove, leveraging the hand's morphological accuracy.
Extensive evaluations demonstrate RUKA's superior reachability, durability, and
strength compared to other robotic hands. Teleoperation tasks further showcase
RUKA's dexterous movements. The open-source design and assembly instructions of
RUKA, code, and data are available at https://ruka-hand.github.io/.
comment: Website at https://ruka-hand.github.io/
☆ Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps
Matt Schmittle, Rohan Baijal, Nathan Hatch, Rosario Scalise, Mateo Guaman Castro, Sidharth Talia, Khimya Khetarpal, Byron Boots, Siddhartha Srinivasa
A robot navigating an outdoor environment with no prior knowledge of the
space must rely on its local sensing to perceive its surroundings and plan.
This can come in the form of a local metric map or local policy with some fixed
horizon. Beyond that, there is a fog of unknown space marked with some fixed
cost. A limited planning horizon can often result in myopic decisions, leading
the robot off course or, worse, into very difficult terrain. Ideally, we would
like the robot to have full knowledge that can be orders of magnitude larger
than a local cost map. In practice, this is intractable due to sparse sensing
information and is often computationally expensive. In this work, we make a key
observation that long-range navigation only necessitates identifying good
frontier directions for planning instead of full map knowledge. To this end, we
propose the Long Range Navigator (LRN), which learns an intermediate affordance
representation mapping high-dimensional camera images to `affordable' frontiers
for planning, and then optimizes for maximum alignment with the desired goal.
Notably, LRN is trained entirely on unlabeled ego-centric videos, making it easy
to scale and adapt to new platforms. Through extensive off-road experiments on
Spot and a Big Vehicle, we find that augmenting existing navigation stacks with
LRN reduces human interventions at test time and leads to faster decision
making, indicating the relevance of LRN. https://personalrobotics.github.io/lrn
comment: 10 pages, 9 figures
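The frontier-selection idea admits a small sketch (the scoring rule and names below are assumptions, not LRN's code): score candidate headings by predicted affordance, weight by goal alignment, and head toward the argmax.

```python
import numpy as np

def pick_frontier(affordance, goal_heading, w=0.5):
    """affordance: (n_bins,) predicted scores over candidate headings in [-pi, pi);
    goal_heading: desired direction in radians; w: goal-alignment weight."""
    n_bins = len(affordance)
    headings = np.linspace(-np.pi, np.pi, n_bins, endpoint=False)
    alignment = np.cos(headings - goal_heading)  # 1 when pointing at the goal
    score = (1 - w) * affordance + w * alignment
    return headings[np.argmax(score)]  # direction handed to the local planner
```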
☆ Force and Speed in a Soft Stewart Platform
Jake Ketchum, James Avtges, Millicent Schlafly, Helena Young, Taekyoung Kim, Ryan L. Truby, Todd D. Murphey
Many soft robots struggle to produce dynamic motions with fast, large
displacements. We develop a parallel 6 degree-of-freedom (DoF) Stewart-Gough
mechanism using Handed Shearing Auxetic (HSA) actuators. By using soft
actuators, we are able to use one third as many mechatronic components as a
rigid Stewart platform, while retaining a working payload of 2 kg and an
open-loop bandwidth greater than 16 Hz. We show that the platform is capable of
both precise tracing and dynamic disturbance rejection when controlling a ball
and sliding puck using a Proportional-Integral-Derivative (PID) controller. We
develop a machine-learning-based kinematics model and demonstrate a functional
workspace of roughly 10 cm in each translation direction and 28 degrees in each
orientation. This 6DoF device has many of the characteristics associated with
rigid components - power, speed, and total workspace - while capturing the
advantages of soft mechanisms.
comment: Published at Robosoft 2025
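For reference, the ball- and puck-control experiments rest on a textbook discrete PID loop of this form (gains and the sensor/actuator interface are placeholders, not the paper's tuning):

```python
class PID:
    """Discrete PID controller with a fixed sample time dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. one controller per tilt axis: tilt_x = pid_x.step(x_target - x_ball)
```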
☆ Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification
Deep learning-based trajectory prediction models have demonstrated promising
capabilities in capturing complex interactions. However, their
out-of-distribution generalization remains a significant challenge,
particularly due to imbalanced data and a lack of sufficient data and diversity
to ensure robustness and calibration. To address this, we propose SHIFT (Spectral
Heteroscedastic Informed Forecasting for Trajectories), a novel framework that
uniquely combines well-calibrated uncertainty modeling with informative priors
derived through automated rule extraction. SHIFT reformulates trajectory
prediction as a classification task and employs heteroscedastic
spectral-normalized Gaussian processes to effectively disentangle epistemic and
aleatoric uncertainties. We learn informative priors from training labels,
which are automatically generated from natural language driving rules, such as
stop rules and drivability constraints, using a retrieval-augmented generation
framework powered by a large language model. Extensive evaluations over the
nuScenes dataset, including challenging low-data and cross-location scenarios,
demonstrate that SHIFT outperforms state-of-the-art methods, achieving
substantial gains in uncertainty calibration and displacement metrics. In
particular, our model excels in complex scenarios, such as intersections, where
uncertainty is inherently higher. Project page:
https://kumarmanas.github.io/SHIFT/.
comment: 17 pages, 9 figures. Accepted to Robotics: Science and Systems (RSS),
2025
☆ Imperative MPC: An End-to-End Self-Supervised Learning with Differentiable MPC for UAV Attitude Control
Modeling and control of nonlinear dynamics are critical in robotics,
especially in scenarios with unpredictable external influences and complex
dynamics. Traditional cascaded modular control pipelines often yield suboptimal
performance due to conservative assumptions and tedious parameter tuning. Pure
data-driven approaches promise robust performance but suffer from low sample
efficiency, sim-to-real gaps, and reliance on extensive datasets. Hybrid
methods combining learning-based and traditional model-based control in an
end-to-end manner offer a promising alternative. This work presents a
self-supervised learning framework combining a learning-based inertial odometry
(IO) module with differentiable model predictive control (d-MPC) for Unmanned
Aerial Vehicle (UAV) attitude control. The IO module denoises raw IMU
measurements and predicts UAV attitudes, which the MPC then optimizes for
control actions in a bi-level optimization (BLO) setup: the inner level solves
the MPC for control actions, while the outer level minimizes the discrepancy
between real-world and predicted performance. The framework is thus end-to-end
and can be trained in a self-supervised manner, combining the strengths of
learning-based perception with interpretable model-based control. Results show
its effectiveness even under strong wind, simultaneously enhancing both MPC
parameter learning and IMU prediction performance.
comment: 14 pages, 3 figures, accepted by L4DC 2025
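A schematic form of the bi-level objective (symbols chosen here for illustration):

```latex
\begin{equation*}
  \min_{\theta}\; \mathcal{L}\!\left(x^{\text{real}},\; f\!\left(u^{\ast}(\theta)\right)\right)
  \quad \text{s.t.} \quad
  u^{\ast}(\theta) \;=\; \arg\min_{u}\; J_{\text{MPC}}\!\left(u;\, \hat{a}_{\theta}(\text{IMU})\right),
\end{equation*}
```

where the inner problem is the differentiable MPC over the IO module's attitude prediction, and gradients of the outer loss flow through the optimal controls back to the network parameters.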
☆ RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins CVPR 2025
Yao Mu, Tianxing Chen, Zanxin Chen, Shijia Peng, Zhiqian Lan, Zeyu Gao, Zhixuan Liang, Qiaojun Yu, Yude Zou, Mingkun Xu, Lunkai Lin, Zhiqiang Xie, Mingyu Ding, Ping Luo
In the rapidly advancing field of robotics, dual-arm coordination and complex
object manipulation are essential capabilities for developing advanced
autonomous systems. However, the scarcity of diverse, high-quality
demonstration data and real-world-aligned evaluation benchmarks severely limits
such development. To address this, we introduce RoboTwin, a generative digital
twin framework that uses 3D generative foundation models and large language
models to produce diverse expert datasets and provide a real-world-aligned
evaluation platform for dual-arm robotic tasks. Specifically, RoboTwin creates
varied digital twins of objects from single 2D images, generating realistic and
interactive scenarios. It also introduces a spatial relation-aware code
generation framework that combines object annotations with large language
models to break down tasks, determine spatial constraints, and generate precise
robotic movement code. Our framework offers a comprehensive benchmark with both
simulated and real-world data, enabling standardized evaluation and better
alignment between simulated training and real-world performance. We validated
our approach using the open-source COBOT Magic Robot platform. Policies
pre-trained on RoboTwin-generated data and fine-tuned with limited real-world
samples improve success rates by over 70% for single-arm tasks and over 40% for
dual-arm tasks compared to models trained solely on real-world data,
demonstrating significant potential for enhancing dual-arm robotic manipulation
systems.
comment: CVPR 2025 Highlight. 22 pages. Project page:
https://robotwin-benchmark.github.io/
☆ Adaptive Task Space Non-Singular Terminal Super-Twisting Sliding Mode Control of a 7-DOF Robotic Manipulator
This paper presents a new task-space Non-singular Terminal Super-Twisting
Sliding Mode (NT-STSM) controller with adaptive gains for robust trajectory
tracking of a 7-DOF robotic manipulator. The proposed approach addresses the
challenges of chattering, unknown disturbances, and rotational motion tracking,
making it well suited for high-DOF manipulators in dexterous manipulation tasks. A
rigorous boundedness proof is provided, offering gain selection guidelines for
practical implementation. Simulations and hardware experiments with external
disturbances demonstrate the proposed controller's robust, accurate tracking
with reduced control effort under unknown disturbances compared to other
NT-STSM and conventional controllers. The results demonstrate that the
proposed NT-STSM controller mitigates chattering and instability in complex
motions, making it a viable solution for dexterous robotic manipulation and
various industrial applications.
comment: 10 pages, 8 figures
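For readers unfamiliar with the ingredients, the generic (non-adaptive) NT-STSM building blocks take roughly this form; the paper's task-space, adaptive-gain formulation differs in detail:

```latex
\begin{align*}
  s &= e + \tfrac{1}{\beta}\,\dot{e}^{\,p/q}, \qquad \beta > 0,\;\; 1 < p/q < 2, \\
  u &= -k_1\,|s|^{1/2}\,\mathrm{sign}(s) + v, \qquad \dot{v} = -k_2\,\mathrm{sign}(s).
\end{align*}
```

The fractional-power surface avoids the singularity of classical terminal sliding modes, and the super-twisting reaching law confines the discontinuity to the derivative of v, which is what attenuates chattering.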
☆ Krysalis Hand: A Lightweight, High-Payload, 18-DoF Anthropomorphic End-Effector for Robotic Learning and Dexterous Manipulation
This paper presents the Krysalis Hand, a five-finger robotic end-effector
that combines a lightweight design, high payload capacity, and a high number of
degrees of freedom (DoF) to enable dexterous manipulation in both industrial
and research settings. This design integrates the actuators within the hand
while maintaining an anthropomorphic form. Each finger joint features a
self-locking mechanism that allows the hand to sustain large external forces
without active motor engagement. This approach shifts the payload limitation
from the motor strength to the mechanical strength of the hand, allowing the
use of smaller, more cost-effective motors. With 18 DoF and weighing only 790
grams, the Krysalis Hand delivers an active squeezing force of 10 N per finger
and supports a passive payload capacity exceeding 10 lbs. These characteristics
make Krysalis Hand one of the lightest, strongest, and most dexterous robotic
end-effectors of its kind. Experimental evaluations validate its ability to
perform intricate manipulation tasks and handle heavy payloads, underscoring
its potential for industrial applications as well as academic research. All
code related to the Krysalis Hand, including control and teleoperation, is
available on the project GitHub repository:
https://github.com/Soltanilara/Krysalis_Hand
☆ Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
Yuyang Li, Wenxin Du, Chang Yu, Puhao Li, Zihang Zhao, Tengyu Liu, Chenfanfu Jiang, Yixin Zhu, Siyuan Huang
Tactile sensing is crucial for achieving human-level robotic capabilities in
manipulation tasks. Vision-based tactile sensors (VBTSs) have emerged as a promising solution, offering high
spatial resolution and cost-effectiveness by sensing contact through
camera-captured deformation patterns of elastic gel pads. However, these
sensors' complex physical characteristics and visual signal processing
requirements present unique challenges for robotic applications. The lack of
efficient and accurate simulation tools for VBTS has significantly limited the
scale and scope of tactile robotics research. Here we present Taccel, a
high-performance simulation platform that integrates Incremental Potential Contact (IPC) and Affine Body Dynamics (ABD) to model
robots, tactile sensors, and objects with both accuracy and unprecedented
speed, achieving an 18-fold acceleration over real-time across thousands of
parallel environments. Unlike previous simulators that operate at sub-real-time
speeds with limited parallelization, Taccel provides precise physics simulation
and realistic tactile signals while supporting flexible robot-sensor
configurations through user-friendly APIs. Through extensive validation in
object recognition, robotic grasping, and articulated object manipulation, we
demonstrate precise simulation and successful sim-to-real transfer. These
capabilities position Taccel as a powerful tool for scaling up tactile robotics
research and development. By enabling large-scale simulation and
experimentation with tactile sensing, Taccel accelerates the development of
more capable robotic systems, potentially transforming how robots interact with
and understand their physical environment.
comment: 17 pages, 7 figures
☆ 3D-PNAS: 3D Industrial Surface Anomaly Synthesis with Perlin Noise
Large pretrained vision foundation models have shown significant potential in
various vision tasks. However, for industrial anomaly detection, the scarcity
of real defect samples poses a critical challenge in leveraging these models.
While 2D anomaly generation has significantly advanced with established
generative models, the adoption of 3D sensors in industrial manufacturing has
made leveraging 3D data for surface quality inspection an emerging trend. In
contrast to 2D techniques, 3D anomaly generation remains largely unexplored,
limiting the potential of 3D data in industrial quality inspection. To address
this gap, we propose a novel yet simple 3D anomaly generation method, 3D-PNAS,
based on Perlin noise and surface parameterization. Our method generates
realistic 3D surface anomalies by projecting the point cloud onto a 2D plane,
sampling multi-scale noise values from a Perlin noise field, and perturbing the
point cloud along its normal direction. Through comprehensive visualization
experiments, we demonstrate how key parameters, including noise scale,
perturbation strength, and octaves, provide fine-grained control over the
generated anomalies, enabling the creation of diverse defect patterns from
pronounced deformations to subtle surface variations. Additionally, our
cross-category experiments show that the method produces consistent yet
geometrically plausible anomalies across different object types, adapting to
their specific surface characteristics. We also provide a comprehensive
codebase and visualization toolkit to facilitate future research.
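The recipe lends itself to a compact sketch (parameter names and the crude axis-drop projection are assumptions; the paper uses a proper surface parameterization):

```python
import numpy as np
from noise import pnoise2  # common Perlin-noise package; any 2D Perlin field works

def perturb_along_normals(points, normals, scale=5.0, strength=0.02, octaves=4):
    """points, normals: (N, 3) arrays; returns a perturbed copy of points."""
    # Crude parameterization: drop the coordinate axis with the least variance.
    drop = np.argmin(points.var(axis=0))
    uv = np.delete(points, drop, axis=1)
    uv = (uv - uv.min(axis=0)) / (np.ptp(uv, axis=0) + 1e-9)  # normalize to [0, 1]
    h = np.array([pnoise2(u * scale, v * scale, octaves=octaves) for u, v in uv])
    return points + strength * h[:, None] * normals  # displace along normals
```

Raising `octaves` adds finer detail, while `strength` trades pronounced deformations against subtle surface variations.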
☆ Versatile, Robust, and Explosive Locomotion with Rigid and Articulated Compliant Quadrupeds
Achieving versatile and explosive motion with robustness against dynamic
uncertainties is a challenging task. Introducing parallel compliance into
quadrupedal designs is expected to enhance locomotion performance but makes
the control task even harder. This work aims to address this challenge by
proposing a general template model and establishing an efficient motion
planning and control pipeline. To start, we propose a reduced-order template
model, the dual-legged actuated spring-loaded inverted pendulum with trunk
rotation, which explicitly models parallel compliance by decoupling spring
effects from active motor actuation. With this template model, versatile
acrobatic motions, such as pronking, froggy jumping, and hop-turn, are
generated by a dual-layer trajectory optimization, where the singularity-free
body rotation representation is taken into consideration. Integrated with a
linear singularity-free tracking controller, enhanced quadrupedal locomotion is
achieved. Comparisons with the existing template model reveal the improved
accuracy and generalization of our model. Hardware experiments with a rigid
quadruped and a newly designed compliant quadruped demonstrate that i) the
template model enables generating versatile dynamic motion; ii) parallel
elasticity enhances explosive motion. For example, the maximal pronking
distance, hop-turn yaw angle, and froggy jumping distance increase by at least
25%, 15%, and 25%, respectively; iii) parallel elasticity improves the
robustness against dynamic uncertainties, including modelling errors and
external disturbances. For example, the allowable support surface height
variation increases by 100% for robust froggy jumping.
comment: 20 pages, 25 figures
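The decoupling that the template model makes explicit can be sketched as follows (symbols chosen for illustration):

```latex
\begin{equation*}
  F_{\text{leg}} \;=\; \underbrace{k\,(l_0 - l)}_{\text{parallel spring}} \;+\; \underbrace{u}_{\text{active motor}},
\end{equation*}
```

so the spring contributes leg force even at zero motor effort, which is what lets the trajectory optimizer exploit elasticity for explosive motions.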
☆ UncAD: Towards Safe End-to-end Autonomous Driving via Online Map Uncertainty
Pengxuan Yang, Yupeng Zheng, Qichao Zhang, Kefei Zhu, Zebin Xing, Qiao Lin, Yun-Fu Liu, Zhiguo Su, Dongbin Zhao
End-to-end autonomous driving aims to produce planning trajectories directly
from raw sensor inputs. Currently, most approaches integrate perception, prediction,
and planning modules into a fully differentiable network, promising great
scalability. However, these methods typically rely on deterministic modeling of
online maps in the perception module for guiding or constraining vehicle
planning, which may incorporate erroneous perception information and further
compromise planning safety. To address this issue, we delve into the importance
of online map uncertainty for enhancing autonomous driving safety and propose a
novel paradigm named UncAD. Specifically, UncAD first estimates the uncertainty
of the online map in the perception module. It then leverages the uncertainty
to guide motion prediction and planning modules to produce multi-modal
trajectories. Finally, to achieve safer autonomous driving, UncAD proposes an
uncertainty-collision-aware planning selection strategy according to the online
map uncertainty to evaluate and select the best trajectory. In this study, we
incorporate UncAD into various state-of-the-art (SOTA) end-to-end methods.
Experiments on the nuScenes dataset show that integrating UncAD, with only a
1.9% increase in parameters, can reduce collision rates by up to 26% and
drivable area conflict rate by up to 42%. Codes, pre-trained models, and demo
videos can be accessed at https://github.com/pengxuanyang/UncAD.
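One way to picture the uncertainty-collision-aware selection (the weights and risk model below are assumptions, not UncAD's exact strategy):

```python
import numpy as np

def select_trajectory(trajs, collision_risk, map_risk, map_uncertainty, w=1.0):
    """trajs: candidate multi-modal trajectories; collision_risk(tr): risk from
    predicted agents; map_risk(tr): penalty for leaving the estimated drivable
    area; map_uncertainty in [0, 1] from the perception module."""
    scores = [collision_risk(tr) + w * map_uncertainty * map_risk(tr)
              for tr in trajs]
    return trajs[int(np.argmin(scores))]  # trust the map less when it is uncertain
```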
☆ Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks
This paper investigates the integration of graph neural networks (GNNs) with
Qualitative Explainable Graphs (QXGs) for scene understanding in automated
driving. Scene understanding is the basis for any further reactive or proactive
decision-making. Scene understanding and related reasoning are inherently
explanation tasks: why is another traffic participant doing something, and what
or who caused their actions? While previous work demonstrated QXGs' effectiveness
using shallow machine learning models, these approaches were limited to
analysing single relation chains between object pairs, disregarding the broader
scene context. We propose a novel GNN architecture that processes entire graph
structures to identify relevant objects in traffic scenes. We evaluate our
method on the nuScenes dataset enriched with DriveLM's human-annotated
relevance labels. Experimental results show that our GNN-based approach
achieves superior performance compared to baseline methods. The model
effectively handles the inherent class imbalance in relevant object
identification tasks while considering the complete spatial-temporal
relationships between all objects in the scene. Our work demonstrates the
potential of combining qualitative representations with deep learning
approaches for explainable scene understanding in autonomous driving systems.
comment: Workshop "Advancing Automated Driving in Highly Interactive Scenarios
through Behavior Prediction, Trustworthy AI, and Remote Operations" @ 36th
IEEE Intelligent Vehicles Symposium (IV)
☆ Approaching Current Challenges in Developing a Software Stack for Fully Autonomous Driving
Autonomous driving is a complex undertaking. A common approach is to break
down the driving task into individual subtasks through modularization. These
sub-modules are usually developed and published separately. However, combining
these individually developed algorithms into full-stack autonomous driving
software poses particular challenges.
Drawing upon our practical experience in developing the software of TUM
Autonomous Motorsport, we have identified and derived these challenges in
developing an autonomous driving software stack within a scientific
environment. We do not focus on the specific challenges of individual
algorithms but on the general difficulties that arise when deploying research
algorithms on real-world test vehicles. To overcome these challenges, we
introduce strategies that have been effective in our development approach. We
additionally provide open-source implementations of these concepts on GitHub.
As a result, this paper's contributions will simplify future full-stack
autonomous driving projects, which are essential for a thorough evaluation of
the individual algorithms.
comment: Accepted at IEEE IV 2025
☆ Trajectory Adaptation using Large Language Models
Adapting robot trajectories based on human instructions in new situations is
essential for achieving more intuitive and scalable human-robot interactions.
This work proposes a flexible language-based framework to adapt generic robotic
trajectories produced by off-the-shelf motion planners such as RRT and A*, or
learned from human demonstrations. We utilize pre-trained
LLMs to adapt trajectory waypoints by generating code as a policy for dense
robot manipulation, enabling more complex and flexible instructions than
current methods. This approach allows us to incorporate a broader range of
commands, including numerical inputs. Compared to state-of-the-art
feature-based sequence-to-sequence models which require training, our method
does not require task-specific training and offers greater interpretability and
more effective feedback mechanisms. We validate our approach through simulation
experiments on a robotic manipulator, an aerial vehicle, and a ground robot in
the PyBullet and Gazebo simulation environments, demonstrating that LLMs can
successfully adapt trajectories to complex human instructions.
comment: Accepted to CoRL LangRob workshop 2024
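The code-as-policy pattern is easy to picture; the prompt and the generated function below are invented for illustration:

```python
PROMPT = """You are given waypoints as a list of (x, y, z) tuples.
Write a Python function adapt(waypoints) implementing: '{instruction}'.
Return only the code."""

# A plausible LLM response to "lift the trajectory 10 cm after the midpoint":
def adapt(waypoints):
    mid = len(waypoints) // 2
    return [(x, y, z + 0.10) if i >= mid else (x, y, z)
            for i, (x, y, z) in enumerate(waypoints)]
```

Because the policy is code, numerical arguments ("10 cm") pass through exactly, which is what distinguishes this from feature-based sequence-to-sequence adaptation.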
☆ Biasing the Driving Style of an Artificial Race Driver for Online Time-Optimal Maneuver Planning
In this work, we present a novel approach to bias the driving style of an
artificial race driver (ARD) for online time-optimal trajectory planning. Our
method leverages a nonlinear model predictive control (MPC) framework that
combines time minimization with exit speed maximization at the end of the
planning horizon. We introduce a new MPC terminal cost formulation based on the
trajectory planned in the previous MPC step, enabling ARD to adapt its driving
style from early to late apex maneuvers in real-time. Our approach is
computationally efficient, allowing for low replan times and long planning
horizons. We validate our method through simulations, comparing the results
against offline minimum-lap-time (MLT) optimal control and online minimum-time
MPC solutions. The results demonstrate that our new terminal cost enables ARD
to bias its driving style, and achieve online lap times close to the MLT
solution and faster than the minimum-time MPC solution. Our approach paves the
way for a better understanding of the reasons behind human drivers' choice of
early or late apex maneuvers.
☆ B*: Efficient and Optimal Base Placement for Fixed-Base Manipulators
B* is a novel optimization framework that addresses a critical challenge in
fixed-base manipulator robotics: optimal base placement. Current methods rely
on pre-computed kinematics databases generated through sampling to search for
solutions. However, they face an inherent trade-off between solution optimality
and computational efficiency when determining sampling resolution. To address
these limitations, B* unifies multiple objectives without database dependence.
The framework employs a two-layer hierarchical approach. The outer layer
systematically manages terminal constraints through progressive tightening,
particularly for base mobility, enabling feasible initialization and broad
solution exploration. The inner layer addresses non-convexities in each
outer-layer subproblem through sequential local linearization, converting the
original problem into tractable sequential linear programming (SLP). Testing
across multiple robot platforms demonstrates B*'s effectiveness. The framework
achieves solution optimality five orders of magnitude better than
sampling-based approaches while maintaining perfect success rates and reduced
computational overhead. Operating directly in configuration space, B* enables
simultaneous path planning with customizable optimization criteria. B* serves
as a crucial initialization tool that bridges the gap between theoretical
motion planning and practical deployment, where feasible trajectory existence
is fundamental.
☆ Embodied Neuromorphic Control Applied on a 7-DOF Robotic Manipulator
The development of artificial intelligence towards real-time interaction with
the environment is a key aspect of embodied intelligence and robotics. Inverse
dynamics is a fundamental robotics problem, which maps from joint space to
torque space of robotic systems. Traditional methods for solving it rely on
direct physical modeling of robots which is difficult or even impossible due to
nonlinearity and external disturbance. Recently, data-based model-learning
algorithms are adopted to address this issue. However, they often require
manual parameter tuning and high computational costs. Neuromorphic computing is
inherently suitable to process spatiotemporal features in robot motion control
at extremely low costs. However, current research is still in its infancy:
existing works control only low-degree-of-freedom systems and lack performance
quantification and comparison. In this paper, we propose a neuromorphic control
framework to control 7-degree-of-freedom robotic manipulators. We use a Spiking
Neural Network (SNN) to leverage the spatiotemporal continuity of the motion
data, improving control accuracy and eliminating manual parameter tuning. We
validated the algorithm on two robotic platforms, where it reduces torque
prediction error by at least 60% and successfully performs a target-position
tracking task. This work advances embodied neuromorphic control one step
forward, from proof of concept to applications in complex real-world tasks.
☆ A Genetic Approach to Gradient-Free Kinodynamic Planning in Uneven Terrains
This paper proposes a genetic algorithm-based kinodynamic planning algorithm
(GAKD) for car-like vehicles navigating uneven terrains modeled as triangular
meshes. The algorithm's distinct feature is trajectory optimization over a
fixed-length receding horizon using a genetic algorithm with heuristic-based
mutation, ensuring the vehicle's controls remain within its valid operational
range. By addressing challenges posed by uneven terrain meshes, such as
changing face normals, GAKD offers a practical solution for path planning in
complex environments. Comparative evaluations against Model Predictive Path
Integral (MPPI) and log-MPPI methods show that GAKD achieves up to a 20%
improvement in traversability cost while maintaining comparable path length.
These results demonstrate GAKD's potential in improving vehicle navigation on
challenging terrains.
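A toy version of the receding-horizon genetic search might look like this (the cost function, control bounds, and Gaussian mutation are placeholders for GAKD's terrain-aware cost and heuristic-based mutation):

```python
import numpy as np

def genetic_plan(cost_fn, horizon, u_min, u_max, pop=64, gens=30, sigma=0.1):
    """Evolve fixed-length [throttle, steer] sequences; return the best one."""
    rng = np.random.default_rng(0)
    popn = rng.uniform(u_min, u_max, size=(pop, horizon, 2))
    for _ in range(gens):
        costs = np.array([cost_fn(seq) for seq in popn])
        elite = popn[np.argsort(costs)[: pop // 4]]                   # selection
        children = elite[rng.integers(0, len(elite), pop - len(elite))]
        children = children + rng.normal(0.0, sigma, children.shape)  # mutation
        popn = np.clip(np.concatenate([elite, children]), u_min, u_max)
    return popn[np.argmin([cost_fn(seq) for seq in popn])]
```

Clipping to [u_min, u_max] is what keeps controls inside the vehicle's valid operational range; cost_fn would roll each sequence out over the terrain mesh.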
☆ Autonomous Drone for Dynamic Smoke Plume Tracking
This paper presents a novel autonomous drone-based smoke plume tracking
system capable of navigating and tracking plumes in highly unsteady atmospheric
conditions. The system integrates advanced hardware and software and a
comprehensive simulation environment to ensure robust performance in controlled
and real-world settings. The quadrotor, equipped with a high-resolution imaging
system and an advanced onboard computing unit, performs precise maneuvers while
accurately detecting and tracking dynamic smoke plumes under fluctuating
conditions. Our software implements a two-phase flight operation, i.e.,
descending into the smoke plume upon detection and continuously monitoring the
smoke movement during in-plume tracking. Leveraging
Proportional-Integral-Derivative (PID) control and a Proximal Policy
Optimization (PPO)-based Deep Reinforcement Learning (DRL) controller enables
adaptation to plume dynamics. Unreal Engine simulations evaluate performance
under various smoke-wind scenarios, from steady flow to complex, unsteady
fluctuations, showing that
while the PID controller performs adequately in simpler scenarios, the
DRL-based controller excels in more challenging environments. Field tests
corroborate these findings. This system opens new possibilities for drone-based
monitoring in areas like wildfire management and air quality assessment. The
successful integration of DRL for real-time decision-making advances autonomous
drone control for dynamic environments.
comment: 7 pages, 7 figures
☆ A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu, Jian Zhang, Minghao Guo, Youpeng Wen, Haoting Yang, Min Lin, Jianzheng Huang, Zhe Li, Kaidong Zhang, Liqiong Wang, Yuxuan Kuang, Meng Cao, Feng Zheng, Xiaodan Liang
Robotic manipulation faces critical challenges in understanding spatial
affordances--the "where" and "how" of object interactions--essential for
complex manipulation tasks like wiping a board or stacking objects. Existing
methods, including modular-based and end-to-end approaches, often lack robust
spatial reasoning capabilities. Unlike recent point-based and flow-based
affordance methods that focus on dense spatial representations or trajectory
modeling, we propose A0, a hierarchical affordance-aware diffusion model that
decomposes manipulation tasks into high-level spatial affordance understanding
and low-level action execution. A0 leverages the Embodiment-Agnostic Affordance
Representation, which captures object-centric spatial affordances by predicting
contact points and post-contact trajectories. A0 is pre-trained on 1 million
contact points and fine-tuned on annotated trajectories, enabling
generalization across platforms. Key components include Position Offset
Attention for motion-aware feature extraction and a Spatial Information
Aggregation Layer for precise coordinate mapping. The model's output is
executed by the action execution module. Experiments on multiple robotic
systems (Franka, Kinova, Realman, and Dobot) demonstrate A0's superior
performance in complex tasks, showcasing its efficiency, flexibility, and
real-world applicability.
☆ Graph-based Path Planning with Dynamic Obstacle Avoidance for Autonomous Parking
Farhad Nawaz, Minjun Sung, Darshan Gadginmath, Jovin D'sa, Sangjae Bae, David Isele, Nadia Figueroa, Nikolai Matni, Faizan M. Tariq
Safe and efficient path planning in parking scenarios presents a significant
challenge due to the presence of cluttered environments filled with static and
dynamic obstacles. To address this, we propose a novel and computationally
efficient planning strategy that seamlessly integrates the predictions of
dynamic obstacles into the planning process, ensuring the generation of
collision-free paths. Our approach builds upon the conventional Hybrid A*
algorithm by introducing a time-indexed variant that explicitly accounts for
the predictions of dynamic obstacles during node exploration in the graph, thus
enabling dynamic obstacle avoidance. We integrate the time-indexed Hybrid A*
algorithm within an online planning framework to compute local paths at
each planning step, guided by an adaptively chosen intermediate goal. The
proposed method is validated in diverse parking scenarios, including
perpendicular, angled, and parallel parking. Through simulations, we showcase
our approach's potential to greatly improve efficiency and safety compared to a
state-of-the-art spline-based planning method for parking scenarios.
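The key change over plain Hybrid A* is that each search node carries a time stamp, so collision checks query obstacle predictions at that time. A schematic sketch (the motion-primitive and collision interfaces and the reconstruct helper are hypothetical):

```python
import heapq

def plan(start, goal, primitives, in_collision, heuristic, dt=0.2):
    """start, goal: (x, y, theta) states. in_collision(state, t) consults the
    predicted footprints of dynamic obstacles at time t."""
    open_set = [(heuristic(start, goal), 0.0, start, 0.0)]  # (f, g, state, t)
    parents, seen = {}, set()
    while open_set:
        _, g, state, t = heapq.heappop(open_set)
        if heuristic(state, goal) < 0.5:                    # close enough
            return reconstruct(parents, state)              # hypothetical helper
        key = tuple(round(v, 1) for v in (*state, t))       # discretize incl. time
        if key in seen:
            continue
        seen.add(key)
        for prim in primitives:
            nxt, cost = prim(state)                         # kinematically feasible
            if in_collision(nxt, t + dt):                   # time-indexed check
                continue
            parents[nxt] = state
            heapq.heappush(open_set,
                           (g + cost + heuristic(nxt, goal), g + cost, nxt, t + dt))
    return None
```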
☆ Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Teaching robots dexterous manipulation skills often requires collecting
hundreds of demonstrations using wearables or teleoperation, a process that is
challenging to scale. Videos of human-object interactions are easier to collect
and scale, but leveraging them directly for robot learning is difficult due to
the lack of explicit action labels from videos and morphological differences
between robot and human hands. We propose Human2Sim2Robot, a novel
real-to-sim-to-real framework for training dexterous manipulation policies
using only one RGB-D video of a human demonstrating a task. Our method utilizes
reinforcement learning (RL) in simulation to cross the human-robot embodiment
gap without relying on wearables, teleoperation, or large-scale data collection
typically necessary for imitation learning methods. From the demonstration, we
extract two task-specific components: (1) the object pose trajectory to define
an object-centric, embodiment-agnostic reward function, and (2) the
pre-manipulation hand pose to initialize and guide exploration during RL
training. We found that these two components are highly effective for learning
the desired task, eliminating the need for task-specific reward shaping and
tuning. We demonstrate that Human2Sim2Robot outperforms object-aware open-loop
trajectory replay by 55% and imitation learning with data augmentation by 68%
across grasping, non-prehensile manipulation, and multi-step tasks. Project
Site: https://human2sim2robot.github.io
comment: 15 pages, 13 figures
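The object-centric reward can be illustrated simply (the weighting and pose-distance choice are assumptions, not the paper's exact reward):

```python
import numpy as np

def object_reward(obj_pos, obj_quat, demo_pos, demo_quat, w_rot=0.5):
    """Reward tracking the demonstrated object pose, independent of the hand."""
    pos_err = np.linalg.norm(obj_pos - demo_pos)
    # Quaternion geodesic distance: 2 * arccos(|<q1, q2>|).
    rot_err = 2.0 * np.arccos(np.clip(abs(np.dot(obj_quat, demo_quat)), 0.0, 1.0))
    return -(pos_err + w_rot * rot_err)
```

Because the reward only references the object's pose, any embodiment that moves the object along the demonstrated trajectory scores well, which is what lets it cross the human-robot gap.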
☆ Acoustic Analysis of Uneven Blade Spacing and Toroidal Geometry for Reducing Propeller Annoyance
Nikhil Vijay, Will C. Forte, Ishan Gajjar, Sarvesh Patham, Syon Gupta, Sahil Shah, Prathamesh Trivedi, Rishit Arora
Unmanned aerial vehicles (UAVs) are becoming more commonly used in populated
areas, raising concerns about noise pollution generated from their propellers.
This study investigates the acoustic performance of unconventional propeller
designs, specifically toroidal and uneven-blade spaced propellers, for their
potential in reducing psychoacoustic annoyance. Our experimental results show
that these designs noticeably reduced acoustic characteristics associated with
noise annoyance.
comment: For paper website, see https://tubaa.dev/ . 5 pages, 6 figures.
Manuscript originally completed on October 6, 2023 and revised on April 16,
2025
☆ UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
Generating natural and physically plausible character motion remains
challenging, particularly for long-horizon control with diverse guidance
signals. While prior work combines high-level diffusion-based motion planners
with low-level physics controllers, these systems suffer from domain gaps that
degrade motion quality and require task-specific fine-tuning. To tackle this
problem, we introduce UniPhys, a diffusion-based behavior cloning framework
that unifies motion planning and control into a single model. UniPhys enables
flexible, expressive character motion conditioned on multi-modal inputs such as
text, trajectories, and goals. To address accumulated prediction errors over
long sequences, UniPhys is trained with the Diffusion Forcing paradigm,
learning to denoise noisy motion histories and handle discrepancies introduced
by the physics simulator. This design allows UniPhys to robustly generate
physically plausible, long-horizon motions. Through guided sampling, UniPhys
generalizes to a wide range of control signals, including unseen ones, without
requiring task-specific fine-tuning. Experiments show that UniPhys outperforms
prior methods in motion naturalness, generalization, and robustness across
diverse control tasks.
comment: Project page: https://wuyan01.github.io/uniphys-project/
♻ ☆ Learning Diverse Robot Striking Motions with Diffusion Models and Kinematically Constrained Gradient Guidance ICRA 2025
Kin Man Lee, Sean Ye, Qingyu Xiao, Zixuan Wu, Zulfiqar Zaidi, David B. D'Ambrosio, Pannag R. Sanketi, Matthew Gombolay
Advances in robot learning have enabled robots to generate skills for a
variety of tasks. Yet, robot learning is typically sample inefficient,
struggles to learn from data sources exhibiting varied behaviors, and does not
naturally incorporate constraints. These properties are critical for fast,
agile tasks such as playing table tennis. Modern techniques for learning from
demonstration improve sample efficiency and scale to diverse data, but are
rarely evaluated on agile tasks. In the case of reinforcement learning,
achieving good performance requires training on high-fidelity simulators. To
overcome these limitations, we develop a novel diffusion modeling approach that
is offline, constraint-guided, and expressive of diverse agile behaviors. The
key to our approach is a kinematic constraint gradient guidance (KCGG)
technique that computes gradients through both the forward kinematics of the
robot arm and the diffusion model to direct the sampling process. KCGG
minimizes the cost of violating constraints while simultaneously keeping the
sampled trajectory in-distribution of the training data. We demonstrate the
effectiveness of our approach for time-critical robotic tasks by evaluating
KCGG in two challenging domains: simulated air hockey and real table tennis. In
simulated air hockey, we achieved a 25.4% increase in block rate, while in
table tennis, we saw a 17.3% increase in success rate compared to imitation
learning baselines.
comment: ICRA 2025
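A schematic of a KCGG-style guided denoising step (the sampler interface and constraint cost are placeholders; the point is that the gradient passes through both forward kinematics and the diffusion model):

```python
import torch

def guided_step(sampler, traj_t, t, fk, constraint_cost, scale=0.1):
    """One reverse-diffusion step nudged down the constraint-cost gradient."""
    traj_t = traj_t.detach().requires_grad_(True)
    traj_0 = sampler.predict_clean(traj_t, t)    # hypothetical denoiser call
    cost = constraint_cost(fk(traj_0))           # scalar, e.g. end-effector bounds
    grad = torch.autograd.grad(cost, traj_t)[0]  # flows through FK and the model
    with torch.no_grad():
        return sampler.prev_sample(traj_t - scale * grad, t)  # hypothetical step
```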
♻ ☆ Relevance for Human Robot Collaboration
Inspired by the human ability to selectively focus on relevant information,
this paper introduces relevance, a novel dimensionality reduction process for
human-robot collaboration (HRC). Our approach incorporates a continuously
operating perception module, evaluates cue sufficiency within the scene, and
applies a flexible formulation and computation framework. To accurately and
efficiently quantify relevance, we developed an event-based framework that
maintains a continuous perception of the scene and selectively triggers
relevance determination. Within this framework, we developed a probabilistic
methodology, which considers various factors and is built on a novel structured
scene representation. Simulation results demonstrate that the relevance
framework and methodology accurately predict the relevance of a general HRC
setup, achieving a precision of 0.99, a recall of 0.94, an F1 score of 0.96,
and an object ratio of 0.94. Relevance can be broadly applied to several areas
in HRC: it reduces task planning time by 79.56% compared with pure planning for
a cereal task, reduces perception latency by up to 26.53% for an object
detector, improves HRC safety by up to 13.50%, and reduces the number of
inquiries in HRC by 80.84%. A real-world demonstration showcases the relevance
framework's ability to intelligently and seamlessly assist humans in everyday
tasks.
comment: under review
♻ ☆ Perceive With Confidence: Statistical Safety Assurances for Navigation with Learning-Based Perception
Zhiting Mei, Anushri Dixit, Meghan Booker, Emily Zhou, Mariko Storey-Matsutani, Allen Z. Ren, Ola Shorinwa, Anirudha Majumdar
Rapid advances in perception have enabled large pre-trained models to be used
out of the box for transforming high-dimensional, noisy, and partial
observations of the world into rich occupancy representations. However, the
reliability of these models and consequently their safe integration onto robots
remains unknown when deployed in environments unseen during training. To
provide safety guarantees, we rigorously quantify the uncertainty of
pre-trained perception systems for object detection and scene completion via a
novel calibration technique based on conformal prediction. Crucially, this
procedure guarantees robustness to distribution shifts in states when
perception outputs are used in conjunction with a planner. As a result, the
calibrated perception system can be used in combination with any safe planner
to provide an end-to-end statistical assurance on safety in unseen
environments. We evaluate the resulting approach, Perceive with Confidence
(PwC), in simulation and on hardware where a quadruped robot navigates through
previously unseen indoor, static environments. These experiments validate the
safety assurances for obstacle avoidance provided by PwC. In simulation, our
method reduces obstacle misdetection by $70\%$ compared to uncalibrated
perception models. While misdetections lead to collisions for baseline methods,
our approach consistently achieves $100\%$ safety. We further demonstrate that
the conservatism of our method can be reduced without sacrificing safety,
achieving a $46\%$ increase in success rates in challenging environments while maintaining
$100\%$ safety. In hardware experiments, our method improves empirical safety
by $40\%$ over baselines and reduces obstacle misdetection by $93.3\%$. The
safety gap widens to $46.7\%$ when navigation speed increases, highlighting our
approach's robustness under more demanding conditions.
comment: Videos and code can be found at
https://perceive-with-confidence.github.io
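The conformal-calibration core is short; a generic split-conformal sketch (the nonconformity score is illustrative, not the paper's exact choice):

```python
import numpy as np

def conformal_margin(scores, alpha=0.1):
    """scores: (n,) nonconformity scores on held-out calibration data, e.g. how
    far each true obstacle point lies outside the predicted occupancy."""
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    return np.quantile(scores, min(q, 1.0), method="higher")

# At deployment, dilating predicted obstacles by this margin covers the true
# obstacles with probability >= 1 - alpha, assuming exchangeable data.
```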
♻ ☆ Scalable Multi-Robot Motion Planning Using Guidance-Informed Hypergraphs
In this work, we propose a method for multiple mobile robot motion planning
that efficiently plans for robot teams up to an order of magnitude larger than
existing state-of-the-art methods in congested settings with narrow passages in
the environment. We achieve this improvement in scalability by adapting the
state-of-the-art Decomposable State Space Hypergraph (DaSH) planning framework
to expand the set of problems it can support to include those without a highly
structured planning space and those with kinodynamic constraints. We accomplish
this by exploiting guidance about a problem's structure to limit exploration of
the planning space and through modifying DaSH's conflict resolution scheme.
This guidance captures when coordination between robots is necessary, allowing
us to decompose the intractably large multi-robot search space while limiting
risk of inter-robot conflicts by composing relevant robot groups together while
planning.
comment: This work has been submitted for review
♻ ☆ Learn2Decompose: Learning Problem Decomposition for Efficient Sequential Multi-object Manipulation Planning
We present a Reactive Task and Motion Planning (TAMP) approach for efficient
sequential multi-object manipulation in dynamic environments. Conventional TAMP
solvers experience an exponential increase in planning time as the planning
horizon and number of objects grow, limiting their applicability in real-world
scenarios. To address this, we propose learning problem decomposition from
demonstrations to accelerate TAMP solvers. Our approach consists of three key
components: goal decomposition learning, temporal distance learning, and object
reduction. Goal decomposition identifies the necessary sequences of states that
the system must pass through before reaching the final goal, treating them as
subgoal sequences. Temporal distance learning predicts the temporal distance
between two states, enabling the system to identify the closest subgoal from a
disturbed state. Object reduction minimizes the set of active objects
considered during replanning, further improving efficiency. We evaluate our
approach on three benchmarks, demonstrating its effectiveness in improving
replanning efficiency for sequential multi-object manipulation tasks in dynamic
environments.
♻ ☆ Minimum-Violation Temporal Logic Planning for Heterogeneous Robots under Robot Skill Failures
In this paper, we consider teams of robots with heterogeneous skills (e.g.,
sensing and manipulation) tasked with collaborative missions described by
Linear Temporal Logic (LTL) formulas. These LTL-encoded tasks require robots to
apply their skills to specific regions and objects in a temporal and logical
order. While existing temporal logic planning algorithms can synthesize
correct-by-construction plans, they typically lack reactivity to unexpected
failures of robot skills, which can compromise mission performance. This paper
addresses this challenge by proposing a reactive LTL planning algorithm that
adapts to unexpected failures during deployment. Specifically, the proposed
algorithm reassigns sub-tasks to robots based on their functioning skills and
locally revises team plans to accommodate these new assignments and ensure
mission completion. The main novelty of the proposed algorithm is its ability
to handle cases where mission completion becomes impossible due to limited
functioning robots. Instead of reporting mission failure, the algorithm
strategically prioritizes the most crucial sub-tasks and locally revises the
team's plans, as per user-specified priorities, to minimize mission violations.
We provide theoretical conditions under which the proposed framework computes
the minimum-violation task reassignments and team plans. We provide numerical
and hardware experiments to demonstrate the efficiency of the proposed method.
♻ ☆ CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image CVPR 2025
This paper tackles category-level pose estimation of articulated objects in
robotic manipulation tasks and introduces a new benchmark dataset. While recent
methods estimate part poses and sizes at the category level, they often rely on
geometric cues and complex multi-stage pipelines that first segment parts from
the point cloud, followed by Normalized Part Coordinate Space (NPCS) estimation
for 6D poses. These approaches overlook dense semantic cues from RGB images,
leading to suboptimal accuracy, particularly for objects with small parts. To
address these limitations, we propose a single-stage Network, CAP-Net, for
estimating the 6D poses and sizes of Categorical Articulated Parts. This method
combines RGB-D features to generate instance segmentation and NPCS
representations for each part in an end-to-end manner. CAP-Net uses a unified
network to simultaneously predict point-wise class labels, centroid offsets,
and NPCS maps. A clustering algorithm then groups points of the same predicted
class based on their estimated centroid distances to isolate each part.
Finally, the NPCS region of each part is aligned with the point cloud to
recover its final pose and size. To bridge the sim-to-real domain gap, we
introduce the RGBD-Art dataset, the largest RGB-D articulated dataset to date,
featuring photorealistic RGB images and depth noise simulated from real
sensors. Experimental evaluations on the RGBD-Art dataset demonstrate that our
method significantly outperforms the state-of-the-art approach. Real-world
deployments of our model in robotic tasks underscore its robustness and
exceptional sim-to-real transfer capabilities, confirming its substantial
practical utility. Our dataset, code and pre-trained models are available on
the project page.
comment: To appear in CVPR 2025 (Highlight)
♻ ☆ Robotic Optimization of Powdered Beverages Leveraging Computer Vision and Bayesian Optimization
The growing demand for innovative research in the food industry is driving
the adoption of robots in large-scale experimentation, as it offers increased
precision, replicability, and efficiency in product manufacturing and
evaluation. To this end, we introduce a robotic system designed to optimize
food product quality, focusing on powdered cappuccino preparation as a case
study. By leveraging optimization algorithms and computer vision, the robot
explores the parameter space to identify the ideal conditions for producing a
cappuccino with the best foam quality. The system also incorporates computer
vision-driven feedback in a closed-loop control to further improve the
beverage. Our findings demonstrate the effectiveness of robotic automation in
achieving high repeatability and extensive parameter exploration, paving the
way for more advanced and reliable food product development.
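The optimize-then-measure loop has a familiar shape; a toy sketch with a Gaussian-process surrogate and UCB acquisition (the parameter space and vision-derived foam score are stand-ins for the paper's setup):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def next_trial(X, y, bounds, n_candidates=1000, kappa=2.0):
    """X: (n, d) tried parameters (e.g. water temperature, stir time);
    y: (n,) foam-quality scores from the vision system; bounds: (d, 2)."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = np.random.uniform(bounds[:, 0], bounds[:, 1],
                             size=(n_candidates, X.shape[1]))
    mu, sigma = gp.predict(cand, return_std=True)
    return cand[np.argmax(mu + kappa * sigma)]  # upper-confidence-bound pick

# loop: params = next_trial(X, y, bounds); score the resulting foam from the
# camera image; append (params, score) and repeat.
```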
♻ ☆ Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor
Fausto Mauricio Lagos Suarez, Akshit Saradagi, Vidya Sumathy, Shruti Kotpaliwar, George Nikolakopoulos
This article introduces a curriculum learning approach to develop a
reinforcement learning-based robust stabilizing controller for a Quadrotor that
meets predefined performance criteria. The learning objective is to achieve
desired positions from random initial conditions while adhering to both
transient and steady-state performance specifications. This objective is
challenging for conventional one-stage end-to-end reinforcement learning, due
to the strong coupling between position and orientation dynamics, the
complexity in designing and tuning the reward function, and poor sample
efficiency, which necessitates substantial computational resources and leads to
extended convergence times. To address these challenges, this work decomposes
the learning objective into a three-stage curriculum that incrementally
increases task complexity. The curriculum begins with learning to achieve
stable hovering from a fixed initial condition, followed by progressively
introducing randomization in initial positions, orientations, and velocities. A
novel additive reward function is proposed to incorporate transient and
steady-state performance specifications. The results demonstrate that the
Proximal Policy Optimization (PPO)-based curriculum learning approach, coupled
with the proposed reward structure, achieves superior performance compared to a
single-stage PPO-trained policy with the same reward function, while
significantly reducing computational resource requirements and convergence
time. The curriculum-trained policy's performance and robustness are thoroughly
validated under random initial conditions and in the presence of disturbances.
comment: 8 pages, 7 figures
♻ ☆ Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions CVPR
Yifei Dong, Fengyi Wu, Sanjian Zhang, Guangyu Chen, Yuzhi Hu, Masumi Yano, Jingdong Sun, Siyu Huang, Feng Liu, Qi Dai, Zhi-Qi Cheng
Unmanned Aerial Vehicles (UAVs) are indispensable for infrastructure
inspection, surveillance, and related tasks, yet they also introduce critical
security challenges. This survey provides a wide-ranging examination of the
anti-UAV domain, centering on three core objectives (classification, detection,
and tracking) while detailing emerging methodologies such as diffusion-based
data synthesis, multi-modal fusion, vision-language modeling, self-supervised
learning, and reinforcement learning. We systematically evaluate
state-of-the-art solutions across both single-modality and multi-sensor
pipelines (spanning RGB, infrared, audio, radar, and RF) and discuss
large-scale as well as adversarially oriented benchmarks. Our analysis reveals
persistent gaps in real-time performance, stealth detection, and swarm-based
scenarios, underscoring pressing needs for robust, adaptive anti-UAV systems.
By highlighting open research directions, we aim to foster innovation and guide
the development of next-generation defense strategies in an era marked by the
extensive use of UAVs.
comment: Accepted at CVPR Workshop Anti-UAV 2025. 15 pages
♻ ☆ Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments
The growing integration of robots in shared environments -- such as
warehouses, shopping centres, and hospitals -- demands a deep understanding of
the underlying dynamics and human behaviours, including how, when, and where
individuals engage in various activities and interactions. This knowledge goes
beyond simple correlation studies and requires a more comprehensive causal
analysis. By leveraging causal inference to model cause-and-effect
relationships, we can better anticipate critical environmental factors and
enable autonomous robots to plan and execute tasks more effectively. To this
end, we propose a novel causality-based decision-making framework that reasons
over a learned causal model to predict battery usage and human obstructions,
understanding how these factors could influence robot task execution. This
reasoning framework assists the robot in deciding when and how to complete a
given task. To support this, we also developed PeopleFlow, a new Gazebo-based
simulator designed to model context-sensitive human-robot spatial interactions
in shared workspaces. PeopleFlow features realistic human and robot
trajectories influenced by contextual factors such as time, environment layout,
and robot state, and can simulate a large number of agents. While the simulator
is general-purpose, in this paper we focus on a warehouse-like environment as a
case study, where we conduct an extensive evaluation benchmarking our causal
approach against a non-causal baseline. Our findings demonstrate the efficacy
of the proposed solutions, highlighting how causal reasoning enables autonomous
robots to operate more efficiently and safely in dynamic environments shared
with humans.
comment: Causal Discovery and Inference - Robot Autonomy - Human-Robot Spatial
Interaction - Decision-Making
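As a rough illustration of how a learned causal model can gate task execution, the sketch below queries two stand-in predictors for expected battery drain and obstruction probability; both interfaces and all thresholds are hypothetical, not the paper's API.
```python
# Stand-in decision rule over a learned causal model, conveying only the
# reasoning pattern described in the abstract.
def should_start_task(predict_battery_drain, predict_obstruction, context,
                      battery_level, battery_margin=0.1,
                      obstruction_threshold=0.6):
    if battery_level - predict_battery_drain(context) < battery_margin:
        return False  # likely to run out of charge mid-task; recharge first
    if predict_obstruction(context) > obstruction_threshold:
        return False  # human congestion expected; defer or replan the route
    return True       # conditions look favourable; execute now
```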
♻ ☆ ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models ICRA
In robot manipulation, Reinforcement Learning (RL) often suffers from low
sample efficiency and uncertain convergence, especially in large observation
and action spaces. Foundation Models (FMs) offer an alternative, demonstrating
promise in zero-shot and few-shot settings. However, they can be unreliable due
to limited physical and spatial understanding. We introduce ExploRLLM, a method
that combines the strengths of both paradigms. In our approach, FMs improve RL
convergence by generating policy code and efficient representations, while a
residual RL agent compensates for the FMs' limited physical understanding. We
show that ExploRLLM outperforms both policies derived from FMs and RL baselines
in table-top manipulation tasks. Additionally, real-world experiments show that
the policies exhibit promising zero-shot sim-to-real transfer. Supplementary
material is available at https://explorllm.github.io.
comment: 6 pages, 6 figures, IEEE International Conference on Robotics and
Automation (ICRA) 2025
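A common way to combine a foundation model's coarse policy with a learned correction is residual control; the minimal sketch below assumes that structure. Both policy callables, the clipping range, and the residual scale are stand-ins, not the ExploRLLM implementation.
```python
import numpy as np

# Residual-control sketch: the FM proposes a base action and a small RL
# policy adds a bounded correction. Interfaces and scale are assumptions.
def combined_action(fm_policy, residual_policy, obs, residual_scale=0.2):
    a_base = np.asarray(fm_policy(obs))               # coarse action from the FM
    a_res = np.asarray(residual_policy(obs, a_base))  # learned correction
    return np.clip(a_base + residual_scale * a_res, -1.0, 1.0)
```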
♻ ☆ Listen to Your Map: An Online Representation for Spatial Sonification
Robotic perception is becoming a key technology for navigation aids,
especially helping individuals with visual impairments through spatial
sonification. This paper introduces a mapping representation that accurately
captures scene geometry for sonification, turning physical spaces into auditory
experiences. Using depth sensors, we encode an incrementally built 3D scene
into a compact 360-degree representation with angular and distance information,
aligning in this way with human auditory spatial perception. The proposed
framework performs localisation and mapping via VDB-Gaussian Process Distance
Fields for efficient online scene reconstruction. The key aspect is a
sensor-centric structure that maintains either a 2D-circular or 3D-cylindrical
raster-based projection. This spatial representation is then converted into
binaural auditory signals using simple pre-recorded responses from a
representative room. Quantitative and qualitative evaluations show improvements
in accuracy, coverage, timing and suitability for sonification compared to
other approaches, with effective handling of dynamic objects as well. An
accompanying video demonstrates spatial sonification in room-like environments.
https://tinyurl.com/ListenToYourMap
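To make the sensor-centric mapping concrete, here is a toy rendering of a 2D circular distance raster into per-bin stereo gains: angle drives left/right panning and distance drives loudness. The paper instead convolves with pre-recorded room responses for true binaural output; this sketch only shows the angular/distance mapping idea, and every parameter is a stand-in.
```python
import numpy as np

# Toy mapping from a circular distance raster to stereo gains. Real
# binaural rendering uses recorded responses; this is a simplification.
def raster_to_stereo_gains(distances, max_range=5.0):
    n = len(distances)
    angles = np.linspace(-np.pi, np.pi, n, endpoint=False)  # sensor-centric bins
    loudness = np.clip(1.0 - np.asarray(distances) / max_range, 0.0, 1.0)
    pan = (np.sin(angles) + 1.0) / 2.0   # 0 = hard left, 1 = hard right
    return loudness * (1.0 - pan), loudness * pan  # (left, right) gains
```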
♻ ☆ Embedding high-resolution touch across robotic hands enables adaptive human-like grasping
Zihang Zhao, Wanlin Li, Yuyang Li, Tengyu Liu, Boren Li, Meng Wang, Kai Du, Hangxin Liu, Yixin Zhu, Qining Wang, Kaspar Althoefer, Song-Chun Zhu
Developing robotic hands that adapt to real-world dynamics remains a
fundamental challenge in robotics and machine intelligence. Despite significant
advances in replicating human hand kinematics and control algorithms, robotic
systems still struggle to match human capabilities in dynamic environments,
primarily due to inadequate tactile feedback. To bridge this gap, we present
F-TAC Hand, a biomimetic hand featuring high-resolution tactile sensing (0.1 mm
spatial resolution) across 70% of its surface area. Through optimized hand
design, we overcome traditional challenges in integrating high-resolution
tactile sensors while preserving the full range of motion. The hand, powered by
our generative algorithm that synthesizes human-like hand configurations,
demonstrates robust grasping capabilities in dynamic real-world conditions.
Extensive evaluation across 600 real-world trials demonstrates that this
tactile-embodied system significantly outperforms non-tactile-informed
alternatives in complex manipulation tasks (p<0.0001). These results provide
empirical evidence for the critical role of rich tactile embodiment in
developing advanced robotic intelligence, offering new perspectives on the
relationship between physical sensing capabilities and intelligent behavior.
♻ ☆ Planning for quasi-static manipulation tasks via an intrinsic haptic metric: a book insertion case study
Contact-rich manipulation often requires strategic interactions with objects,
such as pushing to accomplish specific tasks. We propose a novel scenario where
a robot inserts a book into a crowded shelf by pushing aside neighboring books
to create space before slotting the new book into place. Classical planning
algorithms fail in this context due to limited space and their tendency to
avoid contact. Additionally, they do not handle indirectly manipulable objects
or consider force interactions. Our key contributions are: i) reframing
quasi-static manipulation as a planning problem on an implicit manifold derived
from equilibrium conditions; ii) utilizing an intrinsic haptic metric instead
of ad-hoc cost functions; and iii) proposing an adaptive algorithm that
simultaneously updates robot states, object positions, contact points, and
haptic distances. We evaluate our method on a crowded bookshelf insertion task,
though the approach applies generally to rigid-body manipulation tasks. We propose
proxies to capture contact points and forces, with superellipses to represent
objects. This simplified model guarantees differentiability. Our framework
autonomously discovers strategic wedging-in policies while our simplified
contact model achieves behavior similar to real-world scenarios. We also vary
the stiffness and initial positions to analyze our framework comprehensively.
The video can be found at https://youtu.be/eab8umZ3AQ0.
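Since the abstract highlights superellipses as differentiable object proxies, a minimal sketch of that representation may help; the exponent and semi-axes below are illustrative, not the paper's parameters.
```python
import numpy as np

# Superellipse implicit function, shifted so the residual is < 0 inside,
# 0 on the boundary, and > 0 outside. Smoothness away from the axes is
# what makes it attractive as a differentiable contact proxy.
def superellipse_residual(x, y, a=0.02, b=0.12, n=4.0):
    return np.abs(x / a) ** n + np.abs(y / b) ** n - 1.0
```
A thin book cross-section corresponds to a small semi-axis a and a larger b; raising n sharpens the corners toward a rectangle while keeping the function differentiable.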
♻ ☆ Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
Multimodal foundation models offer a promising framework for robotic
perception and planning by processing sensory inputs to generate actionable
plans. However, addressing uncertainty in both perception (sensory
interpretation) and decision-making (plan generation) remains a critical
challenge for ensuring task reliability. We present a comprehensive framework
to disentangle, quantify, and mitigate these two forms of uncertainty. We first
introduce a framework for uncertainty disentanglement, isolating perception
uncertainty arising from limitations in visual understanding and decision
uncertainty relating to the robustness of generated plans.
To quantify each type of uncertainty, we propose methods tailored to the
unique properties of perception and decision-making: we use conformal
prediction to calibrate perception uncertainty and introduce
Formal-Methods-Driven Prediction (FMDP) to quantify decision uncertainty,
leveraging formal verification techniques for theoretical guarantees. Building
on this quantification, we implement two targeted intervention mechanisms: an
active sensing process that dynamically re-observes high-uncertainty scenes to
enhance visual input quality and an automated refinement procedure that
fine-tunes the model on high-certainty data, improving its capability to meet
task specifications. Empirical validation in real-world and simulated robotic
tasks demonstrates that our uncertainty disentanglement framework reduces
variability by up to 40% and enhances task success rates by 5% compared to
baselines. These improvements are attributed to the combined effect of both
interventions and highlight the importance of uncertainty disentanglement,
which facilitates targeted interventions that enhance the robustness and
reliability of autonomous systems. Fine-tuned models, code, and datasets are
available at https://uncertainty-in-planning.github.io/.
comment: Fine-tuned models, code, and datasets are available at
https://uncertainty-in-planning.github.io/
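The abstract names conformal prediction for calibrating perception uncertainty; in its generic split-conformal form (the paper's task-specific nonconformity scores are not given here) the calibration step reduces to a quantile computation:
```python
import numpy as np

# Generic split conformal calibration: the (1 - alpha) quantile of
# held-out nonconformity scores, with the standard finite-sample
# correction. The scores themselves are task-specific and assumed given.
def conformal_threshold(cal_scores, alpha=0.1):
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(cal_scores)[min(k, n) - 1]

def prediction_set(candidate_scores, threshold):
    # keep every candidate whose nonconformity stays below the bound
    return [i for i, s in enumerate(candidate_scores) if s <= threshold]
```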
♻ ☆ A Centralized Planning and Distributed Execution Method for Shape Filling with Homogeneous Mobile Robots
Pattern formation is a common task in multi-robot systems. In this
paper, we study the problem of forming complex shapes with functionally limited
mobile robots, which have to rely on other robots to precisely locate
themselves. The goal is to decide whether a given shape can be filled by a
given set of robots and, if so, to complete the shape formation process as
quickly as possible with minimal communication. Traditional approaches either
require global coordinates for each robot or are prone to failure when
attempting to form complex shapes beyond their capability; the latter limitation
calls for a decision procedure that can tell whether a target shape can be
formed before the actual shape-forming process starts. In
this paper, we develop a method that does not require global coordinate
information during the execution process and can effectively decide whether it
is feasible to form the desired shape. The latter is achieved via a planning
procedure that is capable of handling a variety of complex shapes, in
particular, those with holes, and assigning a simple piece of scheduling
information to each robot, facilitating subsequent distributed execution, which
does not rely on the coordinates of all robots but only those of neighboring
ones. The effectiveness of our shape-forming approach is vividly illustrated in
several simulation case studies.
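As a toy stand-in for the feasibility decision (the paper's planning-based procedure additionally handles scheduling and shapes with holes), the check below verifies that there are enough robots and that the target cells form one 4-connected region on a grid:
```python
from collections import deque

# Toy feasibility check: enough robots and a 4-connected target region.
# Only a stand-in for the paper's full decision procedure.
def can_fill(shape_cells, num_robots):
    cells = set(shape_cells)
    if not cells or num_robots < len(cells):
        return False
    start = next(iter(cells))
    seen, frontier = {start}, deque([start])
    while frontier:
        x, y = frontier.popleft()
        for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nbr in cells and nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return seen == cells  # every target cell reachable from every other
```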
♻ ☆ Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports
Donggeon David Oh, Justin Lidard, Haimin Hu, Himani Sinhmar, Elle Lazarski, Deepak Gopinath, Emily S. Sumner, Jonathan A. DeCastro, Guy Rosman, Naomi Ehrich Leonard, Jaime Fernández Fisac
We propose a human-centered safety filter (HCSF) for shared autonomy that
significantly enhances system safety without compromising human agency. Our
HCSF is built on a neural safety value function, which we first learn scalably
through black-box interactions and then use at deployment to enforce a novel
state-action control barrier function (Q-CBF) safety constraint. Since this
Q-CBF safety filter does not require any knowledge of the system dynamics for
both synthesis and runtime safety monitoring and intervention, our method
applies readily to complex, black-box shared autonomy systems. Notably, our
HCSF's CBF-based interventions modify the human's actions minimally and
smoothly, avoiding the abrupt, last-moment corrections delivered by many
conventional safety filters. We validate our approach in a comprehensive
in-person user study using Assetto Corsa, a high-fidelity car racing simulator
with black-box dynamics, to assess robustness in "driving on the edge"
scenarios. We compare both trajectory data and drivers' perceptions of our HCSF
assistance against unassisted driving and a conventional safety filter.
Experimental results show that 1) compared to having no assistance, our HCSF
improves both safety and user satisfaction without compromising human agency or
comfort, and 2) relative to a conventional safety filter, our proposed HCSF
boosts human agency, comfort, and satisfaction while maintaining robustness.
comment: Accepted to Robotics: Science and Systems (RSS) 2025, 22 pages, 16
figures, 7 tables
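To convey the minimal-intervention idea behind the Q-CBF filter, here is a schematic sampling-based version: pass the human's action through when the learned safety value clears a threshold, otherwise substitute the nearest sampled action that does. The real filter enforces a CBF-style constraint; the interfaces and threshold below are assumptions.
```python
import numpy as np

# Schematic minimal-intervention filter over a learned safety value
# q_safety(state, action). Sampling-based projection is a stand-in for
# the paper's Q-CBF constraint enforcement.
def filter_action(q_safety, state, human_action, action_samples, eps=0.0):
    if q_safety(state, human_action) >= eps:
        return human_action  # deemed safe: leave human agency intact
    safe = [a for a in action_samples if q_safety(state, a) >= eps]
    if not safe:             # no safe sample: fall back to the safest one
        return max(action_samples, key=lambda a: q_safety(state, a))
    def dist(a):
        return np.linalg.norm(np.asarray(a) - np.asarray(human_action))
    return min(safe, key=dist)  # closest safe action: smooth, minimal change
```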