Machine Learning @lemmy.ml fox @lemm.ee 11 mo. ago

RT-2: New model translates vision and language into action

www.deepmind.com

Introducing Robotic Transformer 2 (RT-2), a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control, while retaining web-scale capabilities. This work builds upon Robotic Transformer 1 (RT-1...