Abstract: We present a visual system for a humanoid robot that supports efficient online learning and recognition of various elements of the environment. Taking inspiration from children's perception and following the principles of developmental robotics, our algorithm does not require image databases, predefined objects, or face/skin detectors. The robot explores the visual space through interactions with people and its own experiments. Object detection is based on the hypothesis of coherent motion and appearance during manipulation. A hierarchical object representation is constructed from SURF points and superpixel colors, which are grouped into local geometric structures and form the basis of a multiple-view object model. The learning algorithm accumulates statistics of feature occurrences and identifies objects using a maximum likelihood approach and temporal coherency. The proposed visual system is implemented on the iCub robot and achieves an 85% average recognition rate for 10 objects after 30 minutes of interaction.
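
As an illustration of the last step, the following minimal Python sketch shows one way that accumulating feature-occurrence statistics, maximum likelihood identification, and temporal coherency could fit together. It is not the implementation described here: the class `OccurrenceModel`, the function `smooth_labels`, the string feature identifiers (standing in for quantized SURF descriptors or superpixel colors), the add-one smoothing, and the sliding-window majority vote used for temporal coherency are all assumptions made for the example.

```python
import math
from collections import Counter


class OccurrenceModel:
    """Hypothetical per-object feature-occurrence statistics (illustrative only)."""

    def __init__(self, smoothing=1.0):
        self.counts = {}          # object label -> Counter of feature ids
        self.vocabulary = set()   # all feature ids seen so far
        self.smoothing = smoothing  # additive smoothing, an assumption of this sketch

    def update(self, label, features):
        """Accumulate the features observed while `label` was in view."""
        self.counts.setdefault(label, Counter()).update(features)
        self.vocabulary.update(features)

    def log_likelihood(self, label, features):
        """Log-likelihood of the features under a smoothed multinomial model."""
        counts = self.counts.get(label, Counter())
        total = sum(counts.values()) + self.smoothing * len(self.vocabulary)
        return sum(
            math.log((counts[f] + self.smoothing) / total) for f in features
        )

    def classify(self, features):
        """Return the known label with the highest likelihood."""
        return max(self.counts, key=lambda lab: self.log_likelihood(lab, features))


def smooth_labels(frame_labels, window=5):
    """Enforce temporal coherency with a majority vote over recent frames."""
    smoothed = []
    for i in range(len(frame_labels)):
        recent = frame_labels[max(0, i - window + 1): i + 1]
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


if __name__ == "__main__":
    model = OccurrenceModel()
    model.update("cup", ["f1", "f2", "f2"])
    model.update("ball", ["f3", "f4"])
    print(model.classify(["f2", "f1"]))
    print(smooth_labels(["cup", "cup", "ball", "cup", "cup"], window=3))
```

In this sketch a single misclassified frame is overruled by its neighbors, which is the intuition behind combining per-frame likelihoods with temporal coherency; the window length is a free parameter chosen here only for illustration.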