SmolVLA on SO-ARM101
This project implements a Vision Language Action (VLA) model for the SO-ARM101 robotic arm. Built on SmolVLA, a lightweight and efficient VLA model, the system enables the robot to understand visual scenes, interpret natural language instructions, and execute precise manipulation tasks across the arm's six degrees of freedom.
Vision Language Action models represent a breakthrough in robotic manipulation by combining:

- visual perception of the scene,
- natural language understanding of the task instruction, and
- action generation for low-level motor control.
This unified model allows robots to learn from demonstrations and generalize to new tasks through language guidance.
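To make the pipeline concrete, below is a minimal inference sketch using the LeRobot library that distributes SmolVLA. It assumes the `lerobot/smolvla_base` checkpoint from the Hugging Face Hub; the import path can vary between LeRobot versions, and the batch keys (camera name, state layout, image resolution) are illustrative placeholders that must match the features of your recorded SO-ARM101 dataset.

```python
# Minimal SmolVLA inference sketch (assumes the LeRobot SmolVLA API;
# batch keys and shapes below are illustrative, not authoritative).
import torch
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load the pretrained SmolVLA checkpoint from the Hugging Face Hub.
policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()

# One observation step: a camera frame, the arm's joint state,
# and the natural language task instruction.
batch = {
    "observation.images.top": torch.rand(1, 3, 256, 256),  # placeholder camera frame
    "observation.state": torch.rand(1, 6),                 # 6 joint positions (SO-ARM101)
    "task": ["pick up the red cube"],                      # language instruction
}

with torch.no_grad():
    # select_action returns the next action for the arm's 6 motors;
    # internally the policy predicts a chunk of future actions and
    # replays it step by step.
    action = policy.select_action(batch)

print(action.shape)  # expected: (1, 6)
```

In a real control loop this call would run at the arm's control frequency, with `batch` filled from live camera frames and encoder readings rather than random tensors.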