MINT Lab

Robot Intelligence

We advance robot intelligence through foundation models and world models that enable robots to perceive, reason, and act in complex physical environments. Vision-Language-Action (VLA) models let robots ground language understanding in real-world manipulation and locomotion. Our world models learn physics-aware representations of environments, supporting sim-to-real transfer and long-horizon planning. We apply these approaches to manipulation, humanoid control, autonomous vehicles and racing, and multi-agent systems, combining multimodal sensing (vision, touch, proprioception) with deep reinforcement learning and computer vision.

Humanoid Control

Vision-Language-Action Model

Reinforcement Learning for Dynamic Control

Robotic Manipulation with Tactile Sensors

Dexterous Hand Manipulation

Autonomous Vehicles and Racing

Key Research Topics

  • Foundation Models for Robotic Perception and Action
  • World Models for Physical Reasoning and Sim-to-Real Transfer
  • Vision-Language-Action (VLA) Models
  • Learning-based Manipulation and Locomotion
  • Autonomous Vehicles and Racing
  • Multi-Agent Motion and Task Planning
  • 3D Computer Vision and Scene Reconstruction
  • Tactile Sensing for Dexterous Manipulation

Foundation Models, World Models, VLA, Reinforcement Learning, Autonomous Vehicles, Computer Vision, Humanoid