R²D²: Unlocking Robotic Assembly and Contact Rich Manipulation with NVIDIA Research

This blog post was originally published at NVIDIA’s website. It is reprinted here with the permission of NVIDIA.

This edition of NVIDIA Robotics Research and Development Digest (R2D2) explores several contact-rich manipulation workflows for robotic assembly tasks from NVIDIA Research and how they can address key challenges with fixed automation, such as robustness, adaptability, and scalability.

What is contact-rich manipulation?

Contact-rich manipulation refers to robotic tasks that involve continuous or repeated physical contact between the robot and objects in its environment, requiring precise control of forces and motion. Unlike simple pick-and-place operations, these tasks demand fine-grained interaction to manage friction, compliance, and alignment under uncertainty.

It plays a key role in industries like robotics, manufacturing, and automotive, where tasks such as inserting pegs, meshing gears, threading bolts, or assembling snap-fit parts are common. As a core capability in robotic assembly, contact-rich manipulation enables robots to perform complex, high-precision tasks, which are crucial for automating assembly and handling real-world variability.
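To make "precise control of forces and motion" concrete, here is a minimal, illustrative sketch (not taken from any of the frameworks below) of admittance-style control in one dimension: the probe is driven toward a position target, but backs off in proportion to the force error, so the contact force settles near a desired setpoint instead of crushing the part. All gains and the toy contact model are invented for illustration.

```python
def settle_force(x_target=0.05, f_des=10.0, kp=1.0, kf=0.5,
                 dt=5e-4, steps=5000, wall=0.02, stiffness=5000.0):
    """Toy 1-D admittance control: command a velocity that trades off
    the position error against the contact-force error."""
    x, f = 0.0, 0.0
    for _ in range(steps):
        f = max(0.0, (x - wall) * stiffness)   # toy contact model: stiff wall at x = wall
        v = kp * (x_target - x) - kf * (f - f_des)
        x += v * dt
    return f

print(round(settle_force(), 1))  # settles near the 10 N force setpoint
```

The key design choice is that force feedback, not just position feedback, shapes the motion; pure position control against a stiff surface would generate forces an order of magnitude larger.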

NVIDIA research workflows for challenging robotic assembly tasks

Solving robotic assembly tasks has been challenging due to the need for precise manipulation in dynamic environments. Traditionally, robotic assembly relied on fixed automation, limiting flexibility. However, advances in AI, machine learning, and robotic simulation are enabling robots to handle more complex tasks. The following NVIDIA Research workflows for robotic assembly tasks mark a move from rigid automation to more flexible and scalable robotic systems.

  • Factory: A fast, physics-based simulation and learning toolkit for real-time contact-rich interactions.
  • IndustReal: A toolkit of algorithms and systems enabling robots to learn assembly tasks in simulation using reinforcement learning and transfer them to the real world.
  • AutoMate: A novel policy-learning framework for training specialist and generalist robotic assembly policies across diverse geometries.
  • MatchMaker: A novel pipeline for auto-generating diverse, sim-ready assembly asset pairs using generative AI.
  • SRSA: A framework for retrieving preexisting skills for fine-tuning on a new robot assembly task.
  • TacSL: A library for GPU-based visuotactile sensor simulation and learning.
  • FORGE: Zero-shot sim-to-real transfer of reinforcement-learning policies that use force measurements as input.

Foundational advancements in robotic assembly: Factory, IndustReal, and Automate

Simulating contact-rich interactions in real time was long viewed as computationally intractable. Factory marked a breakthrough: a GPU-based simulation framework using signed distance field (SDF) collisions, contact reduction, and a Gauss-Seidel solver. The environments are now available in NVIDIA Isaac Lab, as seen in Figure 1. Building on this, IndustReal enabled zero-shot transfer of assembly skills from simulation to the real world, achieving 83% to 99% success over 600 trials, driven by innovations such as simulation-aware policy updates, SDF-based rewards, sampling-based curricula, and a policy-level action integrator. It was tested on the Franka Panda and the UR10e, opening the door to real-world industrial applications of these methods.

Figure 1. Contact-Rich Simulation Environments in Isaac Lab
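One of IndustReal's innovations, the policy-level action integrator, can be pictured as accumulating the policy's small incremental actions into a persistent pose target, rather than re-anchoring each action to the noisy measured pose. The sketch below is a simplified illustration of that idea, not the paper's implementation; the class name and scale factor are invented.

```python
class ActionIntegrator:
    """Accumulate incremental policy actions into a persistent pose target,
    so a low-level controller tracks a smooth target instead of one
    re-anchored each step to a noisy pose estimate."""
    def __init__(self, initial_pose, action_scale=0.001):
        self.target = list(initial_pose)   # [x, y, z] in meters
        self.scale = action_scale          # meters per unit action

    def update(self, delta_action):
        # delta_action components in [-1, 1]; scaled to a position offset.
        self.target = [t + self.scale * a
                       for t, a in zip(self.target, delta_action)]
        return self.target

integ = ActionIntegrator([0.4, 0.0, 0.2])
for _ in range(10):
    integ.update([0.0, 0.0, -1.0])  # push down 1 mm per step
print([round(v, 3) for v in integ.target])  # → [0.4, 0.0, 0.19]
```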

AutoMate expanded these efforts by introducing the first simulation-based framework to solve a wide range of challenging assembly tasks by combining reinforcement learning with imitation learning, achieving zero-shot sim-to-real transfer at scale. It offers 100 simulation-compatible assembly assets, specialist policies for solving around 80 tasks, and a generalist policy trained via distillation and fine-tuning to handle 20 tasks, all achieving ~80% success rates.

Notably, both policy types demonstrated zero-shot sim-to-real transfer, sometimes exceeding simulated performance. AutoMate was evaluated with over 5M simulated trials and 500 real-world trials, with an example shown in Figure 2. Specialist policies were trained using a novel combination of assembly-by-disassembly, reinforcement learning with imitation learning, and Dynamic Time Warping (DTW). The generalist policy used a PointNet autoencoder for geometry representation, distilled knowledge from specialist policies, and leveraged RL-based fine-tuning.

Figure 2. AutoMate policy deployment in simulation and the real world, with unique assembly IDs shown above each example
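Dynamic Time Warping, which AutoMate uses when combining reinforcement learning with imitation learning, measures how closely one trajectory tracks another while ignoring differences in timing. Here is a minimal DTW distance over 1-D trajectories (illustrative only; the paper's reward formulation differs in its details):

```python
def dtw_distance(traj_a, traj_b):
    """Dynamic Time Warping distance between two 1-D trajectories:
    the minimum total pointwise cost over all monotonic alignments."""
    n, m = len(traj_a), len(traj_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(traj_a[i - 1] - traj_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a point in traj_a
                                 D[i][j - 1],      # skip a point in traj_b
                                 D[i - 1][j - 1])  # match both
    return D[n][m]

demo    = [0.0, 0.1, 0.2, 0.3, 0.4]
rollout = [0.0, 0.0, 0.1, 0.2, 0.3, 0.4]  # same path, slower start
print(dtw_distance(demo, rollout))  # → 0.0: timing differences are ignored
```

Because DTW scores the path rather than the timing, a policy is not penalized for executing a demonstrated motion at a different speed.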

Exploring new frontiers in robotic assembly with advanced learning algorithms and automation

Building on the breakthroughs of Factory and IndustReal, the team pushed the boundaries of contact-rich manipulation by tackling more complex and varied assembly challenges—leveraging automated asset generation, skill retrieval and adaptation, reinforcement and imitation learning, and different sensory inputs. The following sections spotlight this next wave of innovation.

MatchMaker: automated asset generation for robotic assembly

Figure 3. Matchmaker asset generation pipeline

MatchMaker, featured at ICRA 2025, is a generative pipeline that automatically produces diverse, simulation-compatible assembly asset pairs to facilitate learning assembly skills. It addresses the burden of manually curating assets by generating interpenetration-free, geometrically mating parts. MatchMaker accepts three possible inputs (no input, a single asset, or an assembly pair) and outputs a simulation-ready pair of assets with adjustable clearance.

Key contributions:

  • Automated conversion: Transforms incompatible asset pairs into simulation-compatible models.
  • Asset pair generation: Creates a geometrically mating part from a single asset for new assembly tasks.
  • Realistic contact interactions: Erodes contact surfaces based on user-defined clearance, ensuring realistic part interactions.

MatchMaker frames assembly-pair generation as a shape-completion problem, carried out in three stages, as seen in Figure 3:

  • Contact-surface detection: A VLM (GPT-4o) identifies asset type, assembly direction, and axis to detect contact faces.
  • Shape completion: 3D generative models complete the second asset of the pair.
  • Clearance specification: Contact surfaces are eroded to avoid interpenetration and ensure compatibility with simulators.
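The clearance-specification stage can be pictured as offsetting contact-surface vertices inward along their normals. The following is a simplified sketch under that assumption (not MatchMaker's actual mesh-processing code), using a ring of vertices from a unit-radius peg as the contact surface:

```python
import numpy as np

def erode_contact_surface(vertices, normals, clearance):
    """Offset contact-surface vertices inward (opposite the outward
    normals) by half the clearance; eroding both parts of a pair
    by half each yields the full user-specified gap."""
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return vertices - 0.5 * clearance * unit

# Cylindrical contact surface of a unit-radius peg (8 sample vertices).
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
verts = np.stack([np.cos(theta), np.sin(theta), np.zeros(8)], axis=1)
norms = verts.copy()  # outward radial normals for a cylinder wall
eroded = erode_contact_surface(verts, norms, clearance=0.1)
print(round(float(np.linalg.norm(eroded[0, :2])), 3))  # radius shrinks to 0.95
```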

Figure 4 shows examples of the generated assets. MatchMaker has been validated in simulated and real-world environments, demonstrating its effectiveness in developing robust assembly policies.

Figure 4. Examples of assembly pairs generated by MatchMaker

SRSA: Skill retrieval and adaptation for robotic assembly tasks

SRSA, featured as an ICLR 2025 Spotlight, is a framework that enables data-efficient robot learning by reusing and adapting pre-trained skills from a library of assembly tasks. It selects the most suitable existing policy for a new task based on geometry, dynamics, actions, and predicted success, then fine-tunes it for the target task.

Figure 5. SRSA retrieves and fine-tunes the best prior skill before adding it to the skill library
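Skill retrieval can be pictured as scoring every prior skill against the new task and fine-tuning the best match. The sketch below uses plain cosine similarity over hypothetical task-feature vectors as a stand-in for SRSA's learned transfer-success predictor; the skill names and features are invented for illustration.

```python
import numpy as np

def retrieve_skill(new_task_feat, skill_library):
    """Return the name of the prior skill whose task features are most
    similar to the new task's features, plus all similarity scores."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {name: cos(new_task_feat, feat)
              for name, feat in skill_library.items()}
    return max(scores, key=scores.get), scores

# Hypothetical per-task feature vectors (e.g., geometry + dynamics embeddings).
library = {
    "round_peg_insert": np.array([0.9, 0.1, 0.0]),
    "gear_mesh":        np.array([0.1, 0.9, 0.2]),
    "bolt_thread":      np.array([0.0, 0.2, 0.9]),
}
best, _ = retrieve_skill(np.array([0.8, 0.2, 0.1]), library)
print(best)  # → round_peg_insert
```

The retrieved policy then serves as the initialization for fine-tuning on the target task, which is where the sample-efficiency gains come from.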

SRSA outperforms learning from scratch (e.g., training with AutoMate-style techniques), offering better performance, efficiency, and stability while supporting continual learning. It achieves 19% higher success on new tasks, needs 2.4x fewer samples, and reaches a 90% mean success rate in real-world tests.

TacSL: a library for visuotactile sensor simulation and learning

TacSL (pronounced "taxel") is a library for accelerated, GPU-based simulation of visuotactile images (i.e., the output of a popular class of robotic touch sensors) and the corresponding contact-force fields, running more than 200x faster than the prior state of the art.

Figure 6. Components of the TacSL toolkit

For humans, the sense of touch is a key part of how we understand and interact with the world, helping us feel pressure, texture, and shape. In robots, tactile sensing means using special touch sensors to detect contact with objects, which is especially useful for tasks like picking things up, assembling parts, or adjusting a grip. Despite its value, tactile sensing in robots remains underutilized compared to visual sensing, because it’s hard to understand the data, simulate touch realistically, and train robots to act on it.

TacSL tackles these longstanding challenges with a fast, GPU-accelerated tactile simulation module for visuotactile sensors and learning algorithms. It enables robots to learn contact-rich tasks, like peg placement, in simulation with realistic touch feedback. TacSL supports large-scale training and successful sim-to-real transfer (83% to 91% success rates), making touch-based learning more practical and scalable.

Video 1. Insertion policy execution with varying socket location, peg-in-hand position, and peg-in-hand orientation, demonstrating robustness to acute illumination changes
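As a rough mental model of what a tactile simulator produces, the toy function below splats contact-point forces onto a small taxel grid; this is only an invented stand-in, and actual visuotactile rendering in TacSL is far more sophisticated.

```python
import numpy as np

def contacts_to_tactile_image(points, forces, res=16, half_width=0.01):
    """Bin contact-point forces onto a square taxel grid covering a
    sensor pad of side 2 * half_width (meters), as a crude stand-in
    for a rendered visuotactile image."""
    img = np.zeros((res, res))
    for (x, y), f in zip(points, forces):
        i = int((x + half_width) / (2 * half_width) * res)
        j = int((y + half_width) / (2 * half_width) * res)
        if 0 <= i < res and 0 <= j < res:  # ignore off-pad contacts
            img[i, j] += f
    return img

pts = [(0.0, 0.0), (0.002, 0.0), (-0.02, 0.0)]  # last point is off the pad
img = contacts_to_tactile_image(pts, [1.0, 0.5, 2.0])
print(img.sum())  # only on-pad contacts contribute → 1.5
```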

FORGE: force-guided exploration for robust contact-rich manipulation under uncertainty

Figure 7. The FORGE training pipeline and multi-stage planetary gearbox assembly example

FORGE introduces a method that enables the zero-shot sim-to-real transfer of reinforcement-learning policies that utilize force measurements as input. This approach is important when there is significant uncertainty in part poses, or when tasks demand high precision and accuracy.

Key features:

  • Force observation inputs: Adaptively regulates exerted force during manipulation.
  • Force threshold mechanism: Limits maximum force to protect delicate parts.
  • Dynamic randomization scheme: Varies robot dynamics and part properties during training for robust real-world generalization.
  • Success predictor: Enables autonomous task completion instead of relying on fixed-duration executions.

With these features, FORGE supports safe exploration and successful execution, even with up to 5 mm of position estimation error. This capability is demonstrated in the assembly of a multi-stage planetary gear system, as shown in Figure 8. The task requires three assembly skills: insertion, gear meshing, and nut-and-bolt threading.

Figure 8. FORGE in action on three distinct assembly skills: insertion, gear meshing, and nut-and-bolt threading

FORGE handles forceful tasks like snap-fit insertion and auto-tunes force limits using success prediction when the required force is unknown. This demonstrates its ability to manage complex tasks with high precision and adapt to real-world uncertainties.
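Putting FORGE's safeguards together, an episode might look like the toy control loop below, where `ToyEnv` and `toy_policy` are stand-ins invented for illustration: the policy also predicts success (enabling early termination instead of fixed-duration execution), and the loop retracts whenever the measured force exceeds the limit.

```python
def run_episode(policy, env, force_limit=20.0,
                success_threshold=0.9, max_steps=500):
    """Illustrative FORGE-style loop: stop when the success predictor is
    confident; back off when contact force exceeds the safety limit."""
    obs = env.reset()
    for _ in range(max_steps):
        action, p_success = policy(obs)   # policy also predicts success
        if p_success > success_threshold:
            return True                   # autonomous early termination
        obs = env.step(action)
        if obs["force"] > force_limit:
            env.retract()                 # protect delicate parts

    return False

class ToyEnv:
    """Toy insertion: force grows linearly with insertion depth."""
    def reset(self):
        self.depth = 0.0
        return {"depth": 0.0, "force": 0.0}
    def step(self, action):
        self.depth += action
        return {"depth": self.depth, "force": 2.0 * self.depth}
    def retract(self):
        self.depth -= 0.5

def toy_policy(obs):
    # Push deeper; report success once inserted past depth 5.0.
    return 1.0, (1.0 if obs["depth"] >= 5.0 else 0.0)

print(run_episode(toy_policy, ToyEnv()))  # → True
```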

Summary

Robotic assembly is complex, needing precise contact and real-world adaptability. This post highlights research advancing sim-to-real transfer, tactile and force sensing, and automated asset generation—paving the way for more flexible, adaptable automation.

This post is part of our NVIDIA Robotics Research and Development Digest (R2D2) series that helps developers gain deeper insight into the SOTA breakthroughs from NVIDIA Research across physical AI and robotics applications.

Stay up-to-date by subscribing to the newsletter and following NVIDIA Robotics on YouTube, Discord, and developer forums.

To get started on your robotics journey, enroll in free NVIDIA Robotics Fundamentals courses.

Acknowledgments

For their contributions to the research mentioned in this post, thanks to Abhishek Gupta, Adam Moravanszky, Ankur Handa, Bingjie Tang, Bowen Wen, Chad Kessens, Chuang Gan, Dieter Fox, Fabio Ramos, Gaurav S. Sukhatme, Gavriel State, Iretiayo Akinola, Jan Carius, Jie Xu, Kaichun Mo, Karl Van Wyk, Kier Storey, Lukasz Wawrzyniak, Michael A. Lin, Michael Noseworthy, Michelle Lu, Miles Macklin, Nicholas Roy, Philipp Reist, Yashraj Narang, Yian Wang, Yijie Guo, Yunrong Guo.

Oyindamola Omotuyi
Technical Marketing Engineer, NVIDIA

Yashraj Narang
Senior Research Scientist, NVIDIA
