Vision-Language-Action Model (VLA)

SmolVLA on SO-ARM101

This project implements a Vision-Language-Action (VLA) model trained on the SO-ARM101 robotic arm platform. Built on SmolVLA, a lightweight and efficient VLA model, the system enables the robot to understand visual scenes, interpret natural-language instructions, and execute precise manipulation tasks across six degrees of freedom.
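
At inference time the policy consumes a camera frame, the arm's current joint state, and a text instruction, and returns the next joint-space action. The snippet below is a minimal sketch of that loop, assuming the LeRobot SmolVLA API: the import path, the `lerobot/smolvla_base` checkpoint name, the `observation.images.top` camera key, and the tensor shapes are assumptions, not details taken from this project, and may vary with the lerobot version and the fine-tuning dataset.

```python
# Minimal inference sketch, assuming the LeRobot SmolVLA API. The import path,
# checkpoint name, camera key, and tensor shapes are assumptions; they depend
# on the installed lerobot version and the dataset the policy was tuned on.
import torch
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load a pretrained checkpoint from the Hugging Face Hub. A checkpoint
# fine-tuned on SO-ARM101 data would be substituted here.
policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()

# One observation: an RGB frame, the current joint positions, and a natural
# language instruction. Keys and shapes are illustrative placeholders.
batch = {
    "observation.images.top": torch.rand(1, 3, 256, 256),  # camera frame in [0, 1]
    "observation.state": torch.zeros(1, 6),                # 6 joint positions
    "task": ["pick up the red cube"],                      # language instruction
}

with torch.no_grad():
    action = policy.select_action(batch)  # one step of the predicted action chunk

print(action.shape)  # expected: torch.Size([1, 6]), one 6-DoF joint command
```

In a real control loop this call would run at the camera frame rate; LeRobot-style chunked policies typically queue the remaining steps of each predicted action chunk internally, so `select_action` returns one action per call between model invocations.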
