Self-developing Proprioception-Based Robot Internal Models



Introduction
Arm motion skills such as reaching, grasping, placing and other manipulations are fundamental for humanoid robots and have been studied intensively for decades. Traditionally, control methods such as inverse kinematics work well in well-defined environments [21] [10]. However, such methods adapt poorly and perform badly in volatile environments because they rely strongly on prior knowledge of the robot's kinematic parameters and of the environment. Because model learning can do without knowledge of the robot parameters and can work in unstructured environments, it is attracting growing interest in robot motion control [14] [18] [8]. Nguyen-Tuong and Peters pointed out that the uncertainty of environments is the key problem in robotics [14]. In this respect, humans usually develop new skills in quite a short time, and the acquired abilities are robust across varied environments. Human-inspired learning methods could therefore offer a promising alternative for robot skill acquisition, especially for humanoids, which have human-like body structures [13] [1].
Studies on human cognitive development reveal that the central nervous system (CNS) makes use of internal models in planning, controlling and learning motor behaviors [23]. Generally, there are two kinds of internal models: the forward model (FM) and the inverse model (IM). The FM, also called the predictor, predicts the next sensory state from the current state and the motor commands [9]. The IM, also known as the controller, converts a desired state into motor commands. The FM and IM work together in a feedback manner to fulfill complex arm motion control tasks [22] [15]. These views from cognitive science provide solid inspiration and imply that developing the FM and IM can help robots gain the ability to accomplish various motion tasks.
Many works have focused on how to build internal models for robots. For example, back in 2001, D'Souza et al. employed the then-popular Locally Weighted Projection Regression (LWPR) to learn inverse kinematic mappings [5]. LWPR performed well on the given task. However, it depends largely on the data distribution, and different tasks require different optimization criteria, which limits its extensibility. In 2007, Castellini et al. tried to establish human internal models for motion behaviors using a learning machine [3]. They concentrated on reaching and grasping, collected biometric data from human users during grasping, and then used the data to train a support vector machine (SVM). Yet that model is learned for predicting the behaviors of humans or robots, not for motion control. In 2010, Baranes and Oudeyer introduced the SAGG-RIAC algorithm to learn the internal models of a simulated robot efficiently and actively [2], although the accuracy of both the inverse model and the forward model appears unsatisfactory. There is also related work on humanoids. In 2012, Schillaci et al. proposed mapping the arm and neck joint angles to the estimated 3D position of the hand (using a marker) to establish the forward and inverse models for the humanoid robot NAO [19]. However, previous research did not pay much attention to the actual human mechanism by which internal models are developed.
As cognitive psychologists have suggested, infants primarily employ proprioceptive information instead of vision to control and direct their hands towards a specific target [17]. Besides, the emergence of human arm motion abilities such as reaching is the product of a deeply embodied process in which infants first learn to direct their arm movements in space using proprioception [4]. The term proprioception, first introduced by Sherrington in 1907 [20], denotes the sense of the relative position of neighboring parts of the body and of the strength of effort employed in movement. Intuitively, it originates from the embodied perspective that humans have a sense of their own body even when their eyes are closed. These findings on human mechanisms involving proprioception have been shown to be beneficial for robots [7] [11] [12]. In 2010, Huffmann et al. pointed out that body representations contribute to better adaptation to new environments [7]. In 2016 and 2017, Luo et al. reproduced the findings of Corbetta et al. on human infants with a humanoid robot and illustrated their effectiveness [11] [12].
In this work, inspired by the fact that proprioception plays an important role when human infants develop their internal models, we further discuss how a robot can establish its internal models for arm motion control. Borrowing the basic idea from Luo et al. [11] [12], we propose new proprioception-based internal models for robot arm motion control. Unlike other works, the proposed models not only represent proprioception with an autoencoder network and let the robot develop the internal models by itself, as human infants do, but also model both the FM and the IM with deep neural networks that involve modified cascade structures. The effectiveness of the new models is illustrated by the improved performance in the experiments, which suggests that the new models capture the human mechanism well.

Developing the Internal Models
To develop the internal models is to establish the mapping between motions and states. Since changes in the arm joint angles describe arm motions most directly, we choose the arm joint angles to represent the motions (for dynamic control it is better to describe the motions with joint velocity and acceleration). It has been shown that, when reaching for a target, subjects align their arms by matching hand position rather than elbow angle [6], so we choose the hand pose (position and orientation) to represent the state of the arm. The internal models are therefore essentially the mappings between the arm joint angles and the hand pose.
As shown in Fig. 1, the learning process originates from random self-produced arm movements, just like the babbling of human infants. During the learning process, the proprioception is developed to represent the joint angles, and the two internal models are developed on top of the currently learned proprioception. In more detail, each time the robot conducts a self-produced arm movement, the proprioception is updated once, and the two internal models are then updated once based on the current proprioception. Because deep neural networks are well suited to describing mappings between different variables, we employ neural networks to formalize both the forward model and the inverse model.
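The one-update-per-movement schedule described above can be sketched as a simple loop. This is only an illustrative outline under stated assumptions: the function name `babble`, the joint limits, the stand-in for reading the hand pose and the logging update functions are all hypothetical and are not taken from the original implementation.

```python
import random

def babble(n_steps, joint_limits, read_hand_pose, update_fns, seed=0):
    """Run n_steps random self-produced movements; after each one, call
    every update function once with the new (q, p) sample."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        # Random arm movement within the joint limits (motor babbling).
        q = [rng.uniform(lo, hi) for lo, hi in joint_limits]
        p = read_hand_pose(q)        # hand pose observed after the movement
        for update in update_fns:    # proprioception first, then FM and IM
            update(q, p)

# Usage with stand-ins: each "model" only logs that it was updated.
calls = []

def make_logger(name):
    return lambda q, p: calls.append(name)

babble(
    n_steps=5,
    joint_limits=[(-2.0, 2.0)] * 4,               # assumed limits, in rad
    read_hand_pose=lambda q: (sum(q), 0.0, 0.0),  # stand-in for the simulator
    update_fns=[make_logger("proprioception"),
                make_logger("forward"),
                make_logger("inverse")],
)
```

Passing the update functions in order keeps the proprioception update ahead of the two internal-model updates within every movement, which mirrors the description in the text.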

Developing the proprioception
Proprioception is the sense of one's own body and is provided by different proprioceptors. From an embodied view, humans have a sense of each part of their own body. For example, even with our eyes closed, we can feel the position of our arms. What we perceive is not the exact joint angles but a vague sense of them. For the arm motion control discussed here, the most important proprioception is the sense of the arm joints. For humans, the fibrous capsules act as the proprioceptors that provide the sense of the joints. For robots, the joint servos serve the same purpose and help the robot learn the proprioception of its joints.
To model the proprioception of robots, we consider the characteristics of human proprioception. First, the exact value of proprioception is unknowable for humans, which means that the learning of proprioception is unsupervised. Second, proprioception should be able to represent the body states, and for robots it should be convertible back into the joint state that is sent to the joint servos. We therefore consider the autoencoder neural network, whose target output is identical to its input. We take the joint state as the input of the autoencoder and regard the resulting hidden layer as the proprioception. This accords with the fact that proprioception is invisible to human consciousness yet can rebuild the joint state. In this way, the features of the joint state that represent proprioception may help reduce the influence of noise and be more suitable for modeling.
Autoencoders are therefore employed to learn the proprioception of the joints. The joint angles act as the input and the hidden layer as the proprioception. The proprioception of each joint is developed separately. The structures of the models are shown in Fig. 2, where q_i, i = 1, 2, ..., n, is the joint angle read from the servo, S_qi represents the sense of the i-th joint, and n is the total number of joints.
Fig. 2: The structure of the autoencoders employed to develop the proprioception of each joint separately (the number of circles in the figure does not represent the sizes of the layers).
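A minimal sketch of one per-joint autoencoder, assuming a single tanh hidden layer, a linear output and plain stochastic gradient descent with one update per babbling movement. The layer size, learning rate, initialization and angle range are illustrative assumptions; the text does not specify them.

```python
import math
import random

class JointAutoencoder:
    """1 input -> n_hidden tanh units -> 1 linear output, trained online."""

    def __init__(self, n_hidden=8, lr=0.01, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden
        self.b = [0.0] * n_hidden
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output
        self.c = 0.0
        self.lr = lr

    def encode(self, q):
        """Joint angle -> hidden code S_q (the 'sense' of the joint)."""
        return [math.tanh(w * q + b) for w, b in zip(self.w, self.b)]

    def decode(self, s):
        """Hidden code -> reconstructed joint angle."""
        return sum(v * h for v, h in zip(self.v, s)) + self.c

    def update(self, q):
        """One SGD step on the squared reconstruction error of one sample."""
        s = self.encode(q)
        err = self.decode(s) - q
        for j, h in enumerate(s):
            grad_hidden = err * self.v[j] * (1.0 - h * h)  # back-prop through tanh
            self.v[j] -= self.lr * err * h
            self.w[j] -= self.lr * grad_hidden * q
            self.b[j] -= self.lr * grad_hidden
        self.c -= self.lr * err
        return abs(err)

def mean_abs_error(model, angles):
    return sum(abs(model.decode(model.encode(q)) - q) for q in angles) / len(angles)

rng = random.Random(1)
test_angles = [rng.uniform(-2.0, 2.0) for _ in range(200)]  # assumed range, in rad
ae = JointAutoencoder()
error_before = mean_abs_error(ae, test_angles)
for _ in range(5000):                 # one update per self-produced movement
    ae.update(rng.uniform(-2.0, 2.0))
error_after = mean_abs_error(ae, test_angles)
```

Here `encode` plays the role of S_qi and `decode` turns the sense back into a joint command, so the same network covers both requirements listed above: unsupervised learning and convertibility back to the joint state.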

Developing the forward model
Developing the forward model is essentially learning the mapping from joint angles onto hand states. In this research, we introduce proprioception and propose to establish the forward model as the mapping from the proprioception of the joints S_q onto the hand pose p, where S_q is the concatenation of the proprioception of all joints. We name it the proprioception-based neural forward model (PNFM). The most intuitive and common form of the forward model, however, is the mapping from the joint angles q onto the hand pose p, and we name this classical neural forward model NFM. The two forward models are illustrated in Fig. 3. Comparing the performance of NFM and PNFM shows how effective it is to involve proprioception in modeling the FM.
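The difference between the two forward models lies only in what is fed to the network. The sketch below contrasts the two inputs, assuming each joint has its own small tanh encoder; the encoder weights and the three-unit code size are hypothetical values chosen for illustration.

```python
import math

def encode_joint(q_i, weights):
    """Hidden code of one joint's autoencoder (tanh units, biases omitted)."""
    return [math.tanh(w * q_i) for w in weights]

def nfm_input(q):
    """Classical NFM: the raw joint angles are the network input."""
    return list(q)

def pnfm_input(q, encoder_weights):
    """PNFM: concatenate the per-joint codes S_qi into one vector S_q."""
    s_q = []
    for q_i, weights in zip(q, encoder_weights):
        s_q.extend(encode_joint(q_i, weights))
    return s_q

q = [-1.74, 0.2, 0.5, -0.3]                  # four right-arm joint angles (rad)
encoder_weights = [[0.5, -0.25, 1.0]] * 4    # hypothetical, 3 hidden units per joint
x_nfm = nfm_input(q)                         # 4 values
x_pnfm = pnfm_input(q, encoder_weights)      # 4 joints x 3 units = 12 values
```

Both vectors would then be mapped onto the hand pose p by a network of the same kind, so any difference in accuracy can be attributed to the input representation.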

Developing the inverse model
The inverse model is essentially the mapping from the hand pose onto the arm joints. As with the forward model, we propose to use the proprioception of the joints S_q in place of the joint angles q; the resulting proprioception-based inverse model, named PNIM, is shown in Fig. 4 (b). The most basic form of the neural inverse model (NIM), which directly employs the arm joint angles q, is shown in Fig. 4 (a). Unlike the forward map, however, the inverse map is more complicated and thus more difficult to learn, especially for redundant robot arms with complex structures.

Experiments
In this section, experiments are conducted to show the effectiveness of our proprioception model and to compare the two forward models and the three inverse models in order to evaluate the role of proprioception. We conduct all experiments on our PKU-HR6.0 robot platform (Fig. 5 (a)). The models are trained in the Gazebo simulation system (Fig. 5 (b)), where the hand pose can be read directly from the supervisor view and wear on the real robot is avoided. The learned models are then deployed on the real robot, where the hand pose is calculated by a 3D vision system. PKU-HR6.0 is the 6th-generation kid-sized humanoid robot (3.93 kg in weight and 59.50 cm in height) designed by our lab. It has 24 degrees of freedom (DOFs), with four DOFs for moving plus one DOF for grasping in each arm. In this paper, we only consider the right arm (Shoulder Pitch, Shoulder Roll, Elbow Roll and Elbow Yaw). The proposed model could nevertheless be applied to more complex tasks such as whole-body movements.

Development of proprioception
In this experiment, the learning process of proprioception is shown to demonstrate developmental learning, and the performance of the learned proprioception is shown to evaluate how well the autoencoder network models proprioception. As described before, the whole learning process is conducted in a developmental manner: each time the robot conducts a self-produced movement, the proprioception model and the two internal models are updated once. During learning, the angle of each joint is recorded by the joint servo as a numerical value (in rad).
We use Joint Position Matching, an established protocol for measuring the joint position proprioception of humans, to evaluate the performance of proprioception. In this protocol, individuals are blindfolded while a joint is moved to a specific angle q for a given period of time. The subjects are then asked to replicate the specified angle, and the replicated angle is denoted q'. The difference between q and q' reflects the accuracy of proprioception. To evaluate the robot's performance, after each iteration we conduct Num random Joint Position Matching trials and calculate the average error (we set Num = 100). Fig. 6 shows the average error at each iteration while learning the proprioception of the Shoulder Pitch joint. The three small pictures in Fig. 6 show the results of the same Joint Position Matching task at different learning stages.
Fig. 6: The learning process of the proprioception of the Shoulder Pitch joint. The three small pictures visualize the robot's performance in Joint Position Matching at iterations 10, 3000 and 45000, with the same target joint angle of -1.74 rad, which is visualized by the shaded arm. The solid arm in each small picture is what the robot reproduces using the currently developed proprioception.
The learning curves and the robot's performance in the three small pictures show that, as learning proceeds, the robot replicates the given joint angle more and more accurately. The proprioception thus keeps improving, which is similar to the development of human infants. After learning, the testing errors of the proprioception of the four joints are 0.0658, 0.0387, 0.0471 and 0.0353 rad respectively, which indicates that the learned proprioception is quite accurate for every joint and that the proposed model is adequate for modeling proprioception.
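The evaluation protocol can be sketched as follows, assuming the robot replicates an angle by encoding it into its proprioceptive code and decoding that code back into a joint command. The function name, the angle range and the two stand-in models are hypothetical; only Num = 100 comes from the text.

```python
import random

def joint_position_matching(encode, decode, q_min, q_max, num=100, seed=0):
    """Average |q - q'| over `num` random target angles, where q' is the
    angle reproduced from the proprioceptive code of q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num):
        q = rng.uniform(q_min, q_max)       # target angle shown to the robot
        q_replicated = decode(encode(q))    # angle the robot reproduces
        total += abs(q - q_replicated)
    return total / num

# Stand-in models: a perfect sense, and one with a constant 0.05 rad offset.
perfect = joint_position_matching(lambda q: q, lambda s: s, -2.0, 2.0)
biased = joint_position_matching(lambda q: q, lambda s: s + 0.05, -2.0, 2.0)
```

Calling this after every learning iteration with the current encoder and decoder would yield a curve of the kind described for Fig. 6.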

Comparison of the forward models
In this experiment, we compare our proprioception-based neural forward model (PNFM) with the classical neural forward model (NFM) to verify the benefits of using proprioception rather than the original joint angles. The hidden layers of the two structures are both configured with 100 units. The learning rate and the number of iterations are the same for both.
To evaluate the learning results of the two models, we randomly create 56118 sets of joint angles within the range limits of the robot's arm. We send each set to the robot as a motor command, and the robot uses NFM and PNFM respectively to predict the corresponding hand pose p_predicted. We calculate the average prediction error after testing all 56118 motor commands. Fig. 7 compares the average prediction error of NFM and PNFM along the three axes. PNFM has noticeably lower mean errors than NFM in all three directions, which indicates that the proprioception-based forward model is more effective than the classical forward model that uses the raw joint angles. Although the idea is inspired by humans, the improvement may arise because the proprioception (a feature of the data) is both robust to noise and suitable in format, which makes it easier to build a more accurate map. Moreover, the mean errors of the proposed PNFM along the three axes are all lower than 0.35 cm, which means the model is accurate enough to be used in real tasks.
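The per-axis averaging can be illustrated with a short sketch. It assumes the error is the mean absolute difference along each axis, which the text does not state explicitly, and the two sample positions are made-up numbers rather than experimental data.

```python
def mean_axis_errors(predicted, actual):
    """Mean absolute error along x, y and z over paired hand positions."""
    n = len(predicted)
    sums = [0.0, 0.0, 0.0]
    for p_hat, p in zip(predicted, actual):
        for axis in range(3):
            sums[axis] += abs(p_hat[axis] - p[axis])
    return tuple(s / n for s in sums)

# Made-up positions in cm, for illustration only.
predicted = [(10.0, 5.0, 2.0), (11.0, 4.0, 3.0)]
actual = [(10.5, 5.0, 1.0), (10.5, 4.0, 3.0)]
errors = mean_axis_errors(predicted, actual)   # (0.5, 0.0, 0.5)
```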

Comparison of the inverse models
As with the FM, we compare the proprioception-based neural inverse model (PNIM) with the classical neural inverse model (NIM). Furthermore, we compare the cascaded proprioception-based neural inverse model (cPNIM) with PNIM to show that the cascaded structure offers a more efficient way to form the IM.
To compare the effectiveness of the three inverse models in controlling the robot arm, the testing inputs are chosen as 56118 hand poses, which are the expected outputs of the forward models. The output of the inverse model is the set of joint angles that should drive the hand to the expected pose. To evaluate the performance of an inverse model, the robot's arm joints are set according to the output of the inverse model, and the difference between the actual hand pose and the expected one is calculated. Fig. 8 shows the average error of the three inverse models along the three axes.
Fig. 8 shows that PNIM performs better than NIM, which means that proprioception is also effective in improving the accuracy of the inverse model. Further, cPNIM has even lower mean errors than PNIM, which means that learning the inverse map of each joint in a cascading way does improve the performance. To some extent, the results support the findings of Riemann that the movement of one joint may induce the movement of another and that different joints may play the dominant role in different directions.

Evaluation of the integrated internal models
To show the effectiveness of the internal models in different arm movements, we integrate them to accomplish reaching, grasping and placing tasks on the PKU-HR6.0 robot, as shown in Fig. 9.
In the integration framework, the input is the 3D pose of the object(s) in the robot coordinate system. The robot determines the expected hand pose p_expected according to the specific task and the pose of the object. The inverse model (cPNIM in this experiment) then outputs the joint proprioception according to p_expected, and the forward model (PNFM in this experiment) predicts the hand pose p_predicted from that proprioception. If the difference ∆p between p_expected and p_predicted is small enough, the robot executes the command according to the proprioception. If ∆p is too large, the robot chooses not to execute the command and responds with something like "Sorry, I can't do it".
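The decision rule of the integration framework can be sketched as below. The sketch assumes that ∆p is measured as a Euclidean distance and that a fixed tolerance decides between executing and refusing; the tolerance value and the two toy models are hypothetical stand-ins, not cPNIM or PNFM themselves.

```python
import math

def plan_command(p_expected, inverse_model, forward_model, tolerance):
    """Return ('execute', s_q) if the predicted pose is close enough to the
    expected one, else ('refuse', None)."""
    s_q = inverse_model(p_expected)       # IM: expected pose -> proprioception
    p_predicted = forward_model(s_q)      # FM: proprioception -> predicted pose
    delta_p = math.dist(p_expected, p_predicted)
    if delta_p <= tolerance:
        return "execute", s_q
    return "refuse", None                 # "Sorry, I can't do it"

# Toy stand-ins: codes saturate at +/-1, so targets beyond 10 are unreachable.
toy_inverse = lambda p: [max(-1.0, min(1.0, x / 10.0)) for x in p]
toy_forward = lambda s: [10.0 * x for x in s]

reachable = plan_command((5.0, 2.0, 0.0), toy_inverse, toy_forward, tolerance=0.5)
unreachable = plan_command((30.0, 0.0, 0.0), toy_inverse, toy_forward, tolerance=0.5)
```

Checking the inverse model's output with the forward model before moving is what lets the robot decline a command instead of executing a pose it predicts will miss the target.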
Table 1 shows the records of the 20 executions of each task, in which S represents a successful trial, F represents a failed one and R means the robot responds "Sorry, I can't do it". Both S and R are considered successful trials. The records show that the success rates of the three tasks are all high. However, there are still some failed cases. The reasons may be that the real environment is less stable and that the hand pose calculated by the 3D vision system is not accurate enough.
Table 1: The records of the three different tasks in evaluating the integrated framework of the internal models on PKU-HR6.0. Each task is conducted with two different desk heights (1 cm and 3.5 cm) and random cuboid poses.
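Counting both S and R as successes gives the success rate of a task. The record in this small sketch is made up purely to show the arithmetic and does not reproduce any row of Table 1.

```python
def success_rate(record):
    """Fraction of trials counted as successful: S (success) or R (refusal)."""
    return sum(1 for outcome in record if outcome in "SR") / len(record)

example_record = "SSSSRSSSFSSSSSRSSSSS"   # made-up 20-trial record, not from Table 1
rate = success_rate(example_record)       # 19 of 20 trials count as successful
```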

Conclusion
In this paper, an approach for a robot to develop its internal models is proposed. The approach mimics human infants and enables the robot to learn the internal models by itself through self-produced movements. To summarize, the proprioception is first developed with autoencoder neural networks through the robot's motor babbling. Then, based on the proprioception rather than directly on the motor command, the forward model and the inverse model are built with deep neural networks. For the more complicated inverse map, a cascaded proprioception-based inverse model (cPNIM) is further proposed to learn the inverse map of proprioception in a cascading manner. The learned forward model and inverse model are then integrated in a feedback framework to fulfill different arm movement tasks including reaching, grasping and placing.
In the experiments, we demonstrated the learning process of proprioception and compared the two forward models and the three inverse models. The learning process of proprioception shows that the progress of the robot's ability is similar to the cognitive development of human infants. The performance of the learned proprioception indicates that the proposed autoencoder model works well in mimicking human proprioception. The comparisons among the two forward models and among the three inverse models both support the effectiveness and superiority of using proprioception rather than the original numerical values of the body configuration, and show that the cascaded model does improve the performance of the inverse model. The integration of the forward model and the inverse model provides a more reliable mechanism for accomplishing arm motion tasks, similar to what humans do. In the future, we will extend our model to more complex tasks that involve whole-body motion control.

Fig. 1: Illustration of the architecture for learning internal models based on proprioception. The robot learns the proprioception, the forward model and the inverse model during the same process of random self-produced arm babbling. Each movement is accompanied by a pair of arm joint angles q and hand pose p, which are the raw materials of learning.

Fig. 7: Comparison of the averaged error of NFM and PNFM in three axes.

Fig. 8: Comparison of the averaged error of NIM, PNIM and cPNIM models in the three axes.

Fig. 9: Examples of successful executions of the three arm motion tasks: (a) reaching, (b) grasping and (c) placing.