Robot Intelligence

We advance robot intelligence through foundation models and world models that enable robots to perceive, reason, and act in complex physical environments. Vision-Language-Action (VLA) models allow robots to ground language understanding in real-world manipulation and locomotion. Our world models learn physics-aware representations of environments, supporting sim-to-real transfer and long-horizon planning. We apply these approaches across manipulation, humanoid control, autonomous vehicles and racing, and multi-agent systems, combining multimodal sensing (vision, touch, proprioception) with deep reinforcement learning and computer vision.

Humanoid Control

Vision-Language-Action Model

Reinforcement Learning for Dynamic Control

Robotic Manipulation with Tactile Sensors

Dexterous Hand Manipulation

Autonomous Vehicles and Racing

Key Research Topics

Foundation Models for Robotic Perception and Action
World Models for Physical Reasoning and Sim-to-Real Transfer
Vision-Language-Action (VLA) Models
Learning-based Manipulation and Locomotion
Autonomous Vehicles and Racing
Multi-Agent Motion and Task Planning
3D Computer Vision and Scene Reconstruction
Tactile Sensing for Dexterous Manipulation

Foundation Models, World Models, VLA, Reinforcement Learning, Autonomous Vehicles, Computer Vision, Humanoid