A New Grasp on Robotics: Teaching Robots to Hold the Future

4 June 2025

Walk into any modern warehouse or high-tech factory and you’ll find robots moving with impressive precision. They can zip down aisles, lift heavy loads, and work around the clock. But look closer, especially when they need to pick up something delicate or oddly shaped, and you’ll likely notice a struggle. Despite decades of progress, robots still have trouble with what might seem like a simple task: grasping objects as reliably and flexibly as a human hand.

This seemingly basic problem of “dexterous grasping” isn’t just a nuisance. It’s a fundamental barrier to the next wave of robotic applications.  Think of robots assisting in surgery, delicately assembling electronics, or even helping out at home by picking up toys or sorting groceries. The stakes are high, and the solution has remained elusive.

That’s what makes the recent research from Assistant Professor Shao Lin at NUS Computing so exciting. A team of scientists from NUS School of Computing and Shanghai Jiao Tong University has developed a novel framework called D(R, O) Grasp that promises to bring human-like dexterity within reach of robotic hands. It combines deep learning with a new spatial representation that allows robots to learn grasping strategies that are fast, accurate, and, critically, generalizable across different robot hands and objects.

The research, grounded in both theory and real-world validation, may well mark a turning point in robotic manipulation. And yes, it all begins with a grasp.

 

Why Grasping Is So Hard for Robots

Humans make grasping look easy. We can pick up a slippery glass, turn a doorknob, or gently cradle a bird. But under the surface, these actions require sophisticated spatial awareness, precise motor control, and the ability to adapt to countless variations in shape, size, weight, and texture.

For robots, there have traditionally been two approaches to this problem. The first is the robot-centric method, where the robot learns to move its own joints and fingers to pick up an object. These models are fast, but they’re also brittle: change the object slightly, or swap out the robot hand, and they often fail. The second approach is object-centric. These models analyze the object’s shape and attempt to compute good grasps independent of the robot body. They’re more flexible but often slow and computationally costly.

In other words, robot-centric methods are fast but narrow; object-centric methods are general but slow. What if there were a way to get the best of both?

 

Introducing D(R, O) Grasp: A Unified Approach

That’s exactly what D(R, O) Grasp tries to do. The core idea is deceptively simple: instead of focusing only on the object or only on the robot, D(R, O) Grasp looks at the relationship between the two.

This relationship is captured in something called a D(R, O) matrix, which encodes the relative spatial distances between key points on the robot’s hand (in its desired grasp pose) and points on the object. Think of it as a 3D map that shows exactly how the hand and object should align for a successful grasp. The beauty of the D(R, O) representation is that it works across different hand designs and object geometries. It captures the essence of the interaction between the robot and what it’s trying to hold.
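For readers who like to see the idea in code, here is a minimal sketch, in Python with NumPy, of how such a distance matrix could be assembled from two point clouds. The point counts and the random stand-in data are illustrative only, not the authors’ implementation.

```python
import numpy as np

def dro_matrix(hand_points: np.ndarray, object_points: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between hand points and object points.

    hand_points:   (N, 3) points sampled on the robot hand in its grasp pose.
    object_points: (M, 3) points sampled on the object surface.
    Returns an (N, M) matrix whose entry (i, j) is the distance between
    hand point i and object point j.
    """
    diff = hand_points[:, None, :] - object_points[None, :, :]  # (N, M, 3)
    return np.linalg.norm(diff, axis=-1)                        # (N, M)

# Toy usage with random stand-in point clouds.
hand = np.random.rand(512, 3)
obj = np.random.rand(512, 3)
print(dro_matrix(hand, obj).shape)  # (512, 512)
```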

 

Learning to Know Its Own Hand

But creating this kind of generalizable knowledge requires something more than just data; it requires self-awareness, or at least something close to it.

Before D(R, O) Grasp learns how to grasp objects, it needs to learn about itself. Specifically, it undergoes a training process called configuration-invariant pretraining. During this phase, the system is shown different poses of its own hand—from fully open to tightly closed—and learns to identify which points on the hand remain the same across these poses.

This might sound trivial, but it’s not. Robot hands change shape dramatically when they grasp. By learning which parts of the hand are “invariant” regardless of pose, the model develops a stable internal map of its own structure. This self-knowledge becomes crucial when the robot needs to plan how to move its joints into a specific grasp pose.
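One way to picture this pretraining step is as a matching game: the same hand is shown in two configurations, and the network is rewarded for giving each physical point the same feature in both. The PyTorch sketch below uses a toy per-point encoder and a simple contrastive objective to illustrate that idea; the actual system’s network and training recipe differ, so treat every name and number here as a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """Toy per-point encoder, a stand-in for a real point-cloud network."""
    def __init__(self, dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, dim),
        )

    def forward(self, pts):                        # pts: (N, 3)
        return F.normalize(self.mlp(pts), dim=-1)  # unit-length point features

def invariance_loss(feat_a, feat_b, temperature=0.1):
    """Contrastive loss: point i in pose A should match point i in pose B."""
    logits = feat_a @ feat_b.T / temperature   # (N, N) similarity scores
    targets = torch.arange(feat_a.shape[0])    # the correct match is the diagonal
    return F.cross_entropy(logits, targets)

# One hypothetical training step: the same hand points in two configurations,
# paired by index because forward kinematics tells us where each point went.
encoder = PointEncoder()
pts_open = torch.rand(256, 3)    # hand points, fully open pose (stand-in data)
pts_closed = torch.rand(256, 3)  # the same points after the fingers close
loss = invariance_loss(encoder(pts_open), encoder(pts_closed))
loss.backward()
```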

 

The Pipeline: From Sight to Touch

So, how does it all work when a robot is presented with a new object?

First, the system obtains two point clouds: it samples points on the hand’s link meshes and transforms them with forward kinematics to generate the hand points, then captures the object points with a depth camera. These point clouds are then encoded using neural networks, with the robot-hand encoder leveraging its earlier self-understanding.
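As a rough sketch of the first of those steps, assuming each link’s pre-sampled mesh points and its forward-kinematics pose are already available as NumPy arrays (the names and shapes here are hypothetical):

```python
import numpy as np

def hand_point_cloud(link_points: list, link_transforms: list) -> np.ndarray:
    """Assemble the robot-hand point cloud in the world frame.

    link_points:     per-link (N_i, 3) points pre-sampled on each link mesh.
    link_transforms: per-link (4, 4) world poses from forward kinematics
                     at the current joint configuration.
    """
    clouds = []
    for pts, T in zip(link_points, link_transforms):
        homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (N_i, 4) homogeneous
        clouds.append((homo @ T.T)[:, :3])                   # transform, drop w
    return np.concatenate(clouds, axis=0)
```

The object point cloud, by contrast, comes straight from the depth camera, so no kinematics are needed on that side.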

Next, the system uses a cross-attention transformer, a powerful deep learning architecture, to match features between the robot and object. This is followed by a conditional variational autoencoder (CVAE), a structure that allows the model to generate multiple valid grasp poses, not just one. After all, there’s usually more than one good way to pick something up.
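In code, the cross-attention step might look roughly like the short PyTorch snippet below, where robot-hand point features act as the queries and object point features as the keys and values. The dimensions are illustrative, and the CVAE that follows is omitted.

```python
import torch
import torch.nn as nn

# Minimal cross-attention block: robot-hand features attend to object features.
dim, heads = 128, 4
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)

robot_feats = torch.rand(1, 512, dim)   # (batch, robot points, feature dim)
object_feats = torch.rand(1, 512, dim)  # (batch, object points, feature dim)

fused, attn_weights = cross_attn(query=robot_feats, key=object_feats, value=object_feats)
print(fused.shape)  # (1, 512, 128): robot features enriched with object context
```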

The CVAE produces a D(R, O) matrix that defines how far each point on the robot hand should be from the object. Using this matrix, the robot then estimates the final 3D positions for its hand points using a method called multilateration (similar to how GPS works). It aligns its rigid finger segments with these positions and finally solves an optimization problem to compute the specific joint angles needed to execute the grasp—all while staying within its physical constraints.
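The multilateration step has a pleasingly simple core: subtracting one range equation from the others turns the problem into ordinary linear least squares. The self-contained sketch below shows that geometric idea for a single point; the real system additionally keeps finger segments rigid and respects joint limits, which this toy version ignores.

```python
import numpy as np

def multilaterate(anchors: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Recover a 3D point from its distances to known anchor points.

    anchors:   (K, 3) known positions (here, points on the object), K >= 4.
    distances: (K,) predicted distances from the unknown point to each anchor.
    """
    a0, d0 = anchors[0], distances[0]
    A = 2.0 * (anchors[1:] - a0)                     # linearized system: A x = b
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.dot(a0, a0)
         - distances[1:] ** 2 + d0 ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Sanity check with a known point.
rng = np.random.default_rng(0)
anchors = rng.random((10, 3))
true_point = np.array([0.3, 0.5, 0.2])
dists = np.linalg.norm(anchors - true_point, axis=1)
print(multilaterate(anchors, dists))  # approximately [0.3, 0.5, 0.2]
```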

 

Performance in the Real World

All this theory is impressive, but does it actually work?

The results say yes—resoundingly. D(R, O) Grasp was tested on three different robot hands: the Barrett Hand, the Allegro Hand, and the Shadow Hand. Despite their vastly different shapes and numbers of joints, the system adapted to all of them. In simulation, it demonstrated strong success rates across both known and novel objects.

But perhaps more importantly, the system was fast. The entire pipeline—from perception to grasp execution—took less than a second. That makes D(R, O) Grasp viable for real-time use in applications like pick-and-place tasks or object sorting.

The researchers also tested D(R, O) Grasp in real-world experiments using a robotic arm and hand setup. It achieved an 89% success rate when grasping objects it had never seen before. It even handled partial observations (i.e., situations where parts of the object were obscured) remarkably well.

 

Why This Matters: Unlocking the Next Generation of Robotics

The implications of this work are far-reaching.  In industrial automation, robots could be deployed more flexibly on production lines, quickly adapting to new tools or product shapes without manual reprogramming. In logistics, warehouse robots could handle a wider range of packages, reducing errors and improving throughput.

In the medical field, robotic assistants could assist surgeons by reliably handling instruments, even in dynamic and high-pressure environments. In eldercare or home robotics, helper robots could handle everything from medication bottles to laundry—objects they weren’t explicitly trained on.

And in space exploration or disaster recovery, environments where robots need to deal with unknown objects and environments, D(R, O) Grasp could make all the difference between failure and success.

But perhaps most exciting of all is the idea that the intelligence learned by one robot hand can be transferred to another. This kind of cross-embodiment generalization, demonstrated in the research, opens up new pathways for “robot brains” that aren’t tethered to a specific hardware body. Learn once, apply anywhere!

 

The Road Ahead

D(R, O) Grasp represents more than just a clever new algorithm; it embodies a new philosophy for robotic manipulation. It moves beyond the siloed thinking of robot-centric or object-centric models and instead embraces the complexity of interaction.

Its success hinges not on brute force but on understanding: the robot understanding itself, the object, and the task. And in doing so, it takes a significant step toward the dream of versatile, intelligent, reliable, and trustworthy robotic agents.

There’s still much work to be done. Future improvements could include more dynamic grasps (think soft deformation), learning from tactile feedback, or incorporating planning in cluttered environments. But with D(R, O) Grasp, the foundation is now in place.

If robots are to truly become helpful partners in our daily lives, not just in factories, but in homes, hospitals, and public spaces, then grasping is not a footnote. It’s the starting point. Thanks to this innovative work at NUS Computing, we’re now much closer to getting a firm hold on the future.  

 

Further Reading: Wei, Z., Xu, Z., Guo, J., Hou, Y., Gao, C., Cai, Z., Luo, J., and Shao, L. (2025) “D(R, O) Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping,” IEEE International Conference on Robotics and Automation (ICRA 2025), Atlanta, GA, May 19-23; available at https://arxiv.org/abs/2410.01702
