RIO: Flexible Real-time Robot I/O for Cross-Embodiment Robot Learning

Authors:
Pablo Ortega-Kral*,1, Eliot Xing*,1, Arthur Fender Coelho Bucker1, Vernon Luk1, Jason Kim2, Owen Kwon1, Angchen Xie1, Nikhil Sobanbabu1, Yifu Yuan1, Megan Lee1, Deepam Ameria1, Bhaswanth Ayapilla1, Jaycie Bussell3, Guanya Shi1, Jonathan Francis1,3, Jean Oh1,4,†

Affiliations:
1Carnegie Mellon University
2Delft University of Technology
3Bosch Center for AI
4Lavoro AI Research

*equal contribution, †corresponding author

Despite recent efforts to collect multi-task or multi-embodiment datasets, to design efficient recipes for training Vision-Language-Action models (VLAs), and to showcase these models on selected robot platforms, generalist robot capabilities and cross-embodiment transfer remain largely elusive ideals. Cross-embodiment robot learning is limited by fragmented data-collection infrastructure, the lack of standardization around versatile data formats, and the significant engineering effort required to reproduce hardware setups and organize multiple control stacks for quickly deploying models on diverse robot platforms. As a result, most robot code is highly specific to a particular hardware setup, adding substantial overhead when reusing, recycling, or sharing artifacts between users. To bridge this gap, we present Flexible Real-time Robot I/O (RIO), an open-source Python-based framework that provides flexible, lightweight components for robot control, teleoperation, data formatting, sensor configuration, and policy deployment across diverse hardware platforms and morphologies. RIO provides abstractions that let users choose any combination of robots, sensors, teleoperation interfaces, middlewares, data formats, and policies, and switch between them with minimal reconfiguration effort. We validate RIO on VLA deployment workflows across three morphologies (single-arm, bimanual, humanoid) and four robot hardware platforms with varying grippers and cameras. We showcase policy rollouts by collecting teleoperated data to fine-tune state-of-the-art VLAs, including $\pi_{0.5}$ and GR00T, on household tasks such as pick-and-place, folding, and bowl scrubbing. By open-sourcing all of our efforts, we hope to help the wider robotics community accelerate the pace of robot learning on real-world robot hardware.
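To make the component abstraction concrete, the following is a minimal sketch of the kind of interchangeable robot/sensor/policy interfaces the abstract describes. It is not RIO's actual API: all names here (Robot, Sensor, Policy, rollout, and the dummy implementations) are hypothetical stand-ins for illustration only.

    # Hypothetical sketch, NOT RIO's real API: shows how small interfaces
    # let robots, sensors, and policies be swapped without touching the
    # control loop, as the abstract describes.
    from dataclasses import dataclass
    from typing import Protocol

    import numpy as np


    class Sensor(Protocol):
        def read(self) -> np.ndarray: ...


    class Robot(Protocol):
        def get_state(self) -> np.ndarray: ...
        def send_action(self, action: np.ndarray) -> None: ...


    class Policy(Protocol):
        def act(self, state: np.ndarray, image: np.ndarray) -> np.ndarray: ...


    @dataclass
    class DummyCamera:
        """Stand-in sensor; a real setup would wrap a camera driver."""
        height: int = 224
        width: int = 224

        def read(self) -> np.ndarray:
            return np.zeros((self.height, self.width, 3), dtype=np.uint8)


    @dataclass
    class DummyArm:
        """Stand-in robot; a real setup would wrap a vendor SDK or middleware."""
        dof: int = 7

        def get_state(self) -> np.ndarray:
            return np.zeros(self.dof)

        def send_action(self, action: np.ndarray) -> None:
            print(f"commanding joints: {action}")


    @dataclass
    class ZeroPolicy:
        """Stand-in policy; a real setup would query a fine-tuned VLA."""
        dof: int = 7

        def act(self, state: np.ndarray, image: np.ndarray) -> np.ndarray:
            return np.zeros(self.dof)


    def rollout(robot: Robot, camera: Sensor, policy: Policy, steps: int = 10) -> None:
        # The loop depends only on the interfaces, so changing hardware,
        # sensors, or policies requires no changes here.
        for _ in range(steps):
            state = robot.get_state()
            image = camera.read()
            robot.send_action(policy.act(state, image))


    if __name__ == "__main__":
        rollout(DummyArm(), DummyCamera(), ZeroPolicy())

Under this pattern, moving a policy from one embodiment to another amounts to constructing different components behind the same interfaces, which is the kind of minimal-reconfiguration workflow the abstract claims.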
