This project presents a biologically inspired hierarchical Vision-Language-Action (VLA) system for robotic control. It separates high-level reasoning from low-level execution, echoing the way the human brain divides deliberate planning from motor control. A Phi-4-multimodal model interprets visual scenes and natural language instructions, using chain-of-thought reasoning to break a task into discrete steps. A CLIPort-based visuomotor policy then executes each step precisely. Tested in simulation with a Franka Emika Panda arm, the system generalizes well across manipulation tasks. This modular design combines the flexibility of language-model reasoning with the precision of a learned visuomotor policy, offering a scalable path toward more adaptable and intelligent robotic behavior.
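To make the two-level structure concrete, here is a minimal Python sketch of the hierarchical control loop: a high-level planner turns an instruction and observation into a list of subtask steps, and a low-level policy executes each step in order. The class names (`HighLevelPlanner`, `LowLevelPolicy`, `Observation`) and their interfaces are illustrative placeholders for the Phi-4-multimodal reasoner and the CLIPort-style policy, not the project's actual API.

```python
# Illustrative sketch of the hierarchical VLA loop described above.
# HighLevelPlanner and LowLevelPolicy are hypothetical stand-ins for the
# Phi-4-multimodal reasoner and the CLIPort-based policy; real interfaces
# in this project may differ.
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Camera image plus the natural-language instruction for the task."""
    rgb: bytes
    instruction: str


class HighLevelPlanner:
    """Stand-in for the Phi-4-multimodal reasoner: scene + instruction -> subtask steps."""

    def plan(self, obs: Observation) -> List[str]:
        # A real implementation would prompt the VLM with the image and
        # instruction and parse its chain-of-thought output into steps.
        return [f"step for: {obs.instruction}"]


class LowLevelPolicy:
    """Stand-in for the CLIPort-style visuomotor policy executing one step."""

    def execute(self, step: str, obs: Observation) -> bool:
        # A real implementation would predict pick/place actions from the
        # image conditioned on the step text and send them to the arm.
        print(f"executing: {step}")
        return True


def run_episode(obs: Observation) -> bool:
    """Hierarchical loop: plan once at the top, execute each step below."""
    planner, policy = HighLevelPlanner(), LowLevelPolicy()
    for step in planner.plan(obs):
        if not policy.execute(step, obs):
            return False  # abort the episode if a sub-step fails
    return True


if __name__ == "__main__":
    run_episode(Observation(rgb=b"", instruction="stack the red block on the blue block"))
```

The key design point the sketch captures is that the planner and the policy only communicate through short textual subtask descriptions, so either module can be swapped out independently.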