Researchers from UC Berkeley, Nvidia, and Stanford unveil T-Rex framework for robots to respond to physical contact in real time


Teaching a robot to see is hard. Teaching it to talk is harder. Teaching it to feel things, and then react to what it feels in real time, while also seeing and understanding language? That’s the problem a team from UC Berkeley, Nvidia, Stanford, and collaborating institutions just took a serious swing at.

The framework is called T-Rex, short for Tactile-Reactive Dexterous Manipulation. It was submitted to arXiv on June 15 under paper ID 2606.17055, and it represents a meaningful leap in how robots handle physical contact during complex tasks.

What T-Rex actually does

Most modern robot brains, known as Vision-Language-Action (VLA) models, are good at processing what they see and understanding instructions. But the moment something unexpected happens during physical contact, like an object slipping or deforming, these systems tend to fall apart.

T-Rex addresses this by adding a third sensory channel: high-frequency tactile data. The robot can feel what’s happening at its fingertips and adjust its grip or motion many times per second, rather than reacting only to what it sees.

The key architectural innovation is a variable-rate Mixture-of-Transformers, or MoT. This separates the robot’s brain into two processing speeds. Low-frequency visuomotor planning handles the big picture, things like where to reach and what sequence of actions to follow. High-frequency tactile reactivity handles the moment-to-moment adjustments, like how hard to squeeze an egg without cracking it.
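To make the two-speed idea concrete, here is a minimal scheduling sketch in Python. It is not the T-Rex architecture and contains no transformers. It only illustrates a slow planner and a fast tactile loop sharing one control cycle. The rates (10 Hz and 100 Hz), the function names, the proportional correction, and the toy sensor model are all assumptions made for illustration; the article does not specify any of them.

```python
from dataclasses import dataclass

# Hypothetical rates for illustration only; the article does not state them.
PLAN_HZ = 10
TACTILE_HZ = 100
TICKS_PER_PLAN = TACTILE_HZ // PLAN_HZ  # fast ticks between planner updates


@dataclass
class Plan:
    target_force: float  # desired fingertip force, arbitrary units


def slow_planner(tick: int) -> Plan:
    """Stand-in for low-frequency visuomotor planning (hypothetical)."""
    # A real planner would use vision and language; here it asks for a gentle grip.
    return Plan(target_force=1.0)


def fast_tactile_step(grip: float, measured: float, plan: Plan, gain: float = 0.5) -> float:
    """Stand-in for high-frequency tactile reactivity: a proportional correction."""
    return grip + gain * (plan.target_force - measured)


def run(ticks: int) -> tuple:
    """Run the loop and return (final grip, planner calls, tactile calls)."""
    grip = 0.0
    plan = None
    plan_calls = 0
    tactile_calls = 0
    for tick in range(ticks):
        # The planner runs only once every TICKS_PER_PLAN fast ticks.
        if tick % TICKS_PER_PLAN == 0:
            plan = slow_planner(tick)
            plan_calls += 1
        # Toy sensor model (assumption): measured force equals the current grip command.
        measured = grip
        # The tactile correction runs on every tick.
        grip = fast_tactile_step(grip, measured, plan)
        tactile_calls += 1
    return grip, plan_calls, tactile_calls


if __name__ == "__main__":
    final_grip, plan_calls, tactile_calls = run(100)
    print(final_grip, plan_calls, tactile_calls)
```

In this toy loop the planner is consulted once for every ten tactile corrections, so the grip keeps adjusting between planner updates. That is the general shape of the split the article describes, stripped of any learned model.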

Across 12 challenging real-world tasks, including page flipping, egg transfer, lock opening, and bulb screwing, T-Rex achieved an average success rate more than 30 percentage points higher than that of existing systems.

The dataset behind the magic

The team collected roughly 100 hours of tactile-rich demonstrations using teleoperated setups. Human operators wore MANUS gloves, which capture precise finger motion and multi-modal sensing data, while controlling Sharpa Wave robotic hands. The demonstrations covered interactions with over 200 different objects across 22 distinct motor primitives.

Why Nvidia’s involvement matters

The variable-rate MoT architecture is computationally demanding. Running high-frequency tactile inference alongside lower-frequency vision and language processing requires hardware that can handle parallel workloads efficiently.

What this means for the robotics industry

T-Rex makes a compelling case that tactile sensing isn’t just additive to robot performance. The 30-plus percentage point improvement over existing systems suggests it’s transformative for contact-rich manipulation tasks.

The risk, as always with academic research, is the gap between lab performance and real-world deployment. Succeeding at twelve tasks with carefully selected objects in a controlled setting is impressive, but it is not the same as a robot working an eight-hour shift in a warehouse. The 100-hour dataset, while large by current standards, is still tiny compared to what production systems will eventually need.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
