DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Bibliographic Details
Main Authors:	Johns, E. (Author), Kapelyukh, I. (Author), Vosylius, V. (Author)
Format:	Article
Language:	English
Published:	Institute of Electrical and Electronics Engineers Inc. 2023
Subjects:	AI-Based Methods; Big Data in Robotics and Automation; Deep Learning in Grasping and Manipulation; Image segmentation; Pipelines; Predictive models; Robots; Task analysis; Training; Visualization
Online Access:	View Fulltext in Publisher; View in Scopus

Description
Summary:	We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that goal image. We show that this is possible zero-shot using DALL-E, without needing any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. Encouraging real-world results, with both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning. Videos are available on our webpage at: https://www.robot-learning.uk/dall-e-bot.
Physical Description:	8
ISSN:	2377-3766
DOI:	10.1109/LRA.2023.3272516
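
The three stages named in the summary (describe the objects in text, generate a goal image, arrange the objects to match it) can be sketched roughly as below. This is only an illustrative outline, not the authors' implementation: the prompt wording, the planar `Pose` type, and the `generate_image`, `extract_goal_poses`, and `move_object` callables are all hypothetical placeholders standing in for a diffusion model, a perception step, and a robot controller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical planar pose: (x, y, rotation). Not taken from the paper.
Pose = Tuple[float, float, float]


@dataclass
class DetectedObject:
    name: str   # text label inferred for the object, e.g. "fork"
    pose: Pose  # where the object currently sits


def build_prompt(objects: List[DetectedObject]) -> str:
    """Stage 1: turn the detected objects into a text description.

    The wording here is an assumption chosen for illustration.
    """
    return "a tidy arrangement of " + ", ".join(o.name for o in objects)


def rearrange(
    objects: List[DetectedObject],
    generate_image: Callable[[str], Any],                       # hypothetical text-to-image call
    extract_goal_poses: Callable[[Any, List[str]], Dict[str, Pose]],  # hypothetical perception step
    move_object: Callable[[DetectedObject, Pose], None],        # hypothetical robot action
) -> Tuple[str, List[Tuple[str, Pose]]]:
    """Run the describe -> generate -> arrange loop once."""
    prompt = build_prompt(objects)

    # Stage 2: ask the image generator for a goal arrangement.
    goal_image = generate_image(prompt)

    # Read a goal pose for each named object out of the goal image.
    goal_poses = extract_goal_poses(goal_image, [o.name for o in objects])

    # Stage 3: move every object that has a matching goal pose.
    moved: List[Tuple[str, Pose]] = []
    for obj in objects:
        goal = goal_poses.get(obj.name)
        if goal is None:
            continue  # no match found in the generated image; leave it in place
        move_object(obj, goal)
        moved.append((obj.name, goal))
    return prompt, moved


if __name__ == "__main__":
    # Stub components so the sketch runs without a model or a robot.
    scene = [
        DetectedObject("fork", (0.1, 0.4, 0.0)),
        DetectedObject("plate", (0.7, 0.2, 0.0)),
    ]
    prompt, moved = rearrange(
        scene,
        generate_image=lambda text: {"prompt": text},
        extract_goal_poses=lambda image, names: {n: (0.5, 0.5, 0.0) for n in names},
        move_object=lambda obj, goal: None,
    )
    print(prompt)
    print(moved)
```

Passing the three components in as callables keeps the outline independent of any particular image generator or robot interface, which the record above does not specify.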

Similar Items