NeurIPS 2024 Workshop on Open-World Agents
Over the last few decades, robot teleoperation has been widely adopted across real-world domains, including manufacturing, healthcare, and the military. It is recognized as an effective way to help humans remotely carry out tasks that would pose significant challenges and risks if undertaken by humans alone. To make collaboration between human and robot in teleoperated systems more efficient, the robot must be able to infer human intentions accurately. In this work, we introduce RoHIE, a novel architecture designed to reason about the intentions of the human partner at different levels of granularity. In particular, it leverages non-verbal observations that capture motion and gaze information in shared autonomy, and learns a flexible intention hierarchy that characterizes the relationship between low-level action primitives and higher-level task goals, thereby enabling robust inference. Moreover, by learning a compact representation in the embedding space, our framework captures the latent structure of human behavior from human partners’ demonstrations, allowing the robot to estimate the intentions of new human partners robustly and accurately. We further collect a teleoperation dataset featuring different human participants engaged in a variety of building block assembly tasks, and validate our approach against baseline methods using several evaluation metrics.