The Acquisition of Kiwifruit Feature Point Coordinates Based on the Spatial Coordinates of Image

Abstract: Obtaining the spatial coordinates of kiwi fruit is one of the key techniques for a kiwi fruit harvesting robot. Based on the growth characteristics and the scaffolding cultivation pattern of kiwi fruit, this paper proposes a way to obtain the spatial coordinates of kiwi fruit feature points from the bottom of the target fruit with the help of a Microsoft camera and a Kinect sensor. The paper also covers the coordinate conversion between images from the Microsoft camera and images from the Kinect sensor, followed by an analysis of the precision of the spatial coordinates of kiwi fruit obtained with the two devices. The process is as follows: first, capture images of the target fruit from below with the Microsoft camera and extract the coordinates of the target fruit's feature points, in order to determine the corresponding feature point coordinates in the Kinect sensor image; second, analyze the correspondence between the Microsoft camera image coordinate system and the Kinect sensor image coordinate system so as to establish a mathematical model for the image coordinate conversion; finally, capture the spatial coordinates of the target feature points with the Kinect sensor and conduct tests. The results show that the precision of the coordinate conversion model and of the kiwi fruit spatial coordinates meets the requirements of harvesting robots.

1 Introduction
The acreage and production of China's kiwi fruit rank first in the world. However, at present kiwi fruit is mainly harvested manually, which is highly labor-intensive. With the progress of urbanization and industrialization, more and more young and middle-aged people are attracted to work in cities. As a result, the loss of agricultural labor is becoming serious, which in turn raises the cost of agricultural production and lowers the market competitiveness of agricultural products. Therefore, the development of a kiwi fruit picking robot is of great significance to China's kiwi fruit industry.
The key techniques of a kiwi fruit picking robot involve three parts: fruit identification, fruit location and nondestructive picking.
The widespread adoption of the standardized scaffolding pattern in kiwi fruit production makes robotic picking feasible.
However, several factors still hinder the development of kiwi fruit picking robots. Firstly, kiwi fruits grow in clusters, each usually composed of 3-5 fruits, and the fruits usually grow very close to one another and even overlap. Moreover, occlusion by foliage and the similar color of fruits and background make it difficult for a harvesting robot to identify and separate fruits precisely and to extract fruit features. Secondly, kiwi fruit positioning and spatial coordinate acquisition are also problems to be solved. The positioning systems of existing fruit and vegetable harvesting robots are low in precision, time-consuming, complex in structure and high in cost, so it is imperative to develop a new, efficient positioning system. Among the existing fruit and vegetable harvesting robots at home and abroad, some harvest fruits whose colors differ greatly from the background, such as strawberry picking robots [1], tomato picking robots [2][3][4] and citrus harvesting robots [5][6], and some harvest fruits whose colors are similar to the background, such as cucumber picking robots [7]. Such robots usually adopt near-infrared spectroscopy or laser technology for detection. In terms of detection and identification of kiwi fruits, Zhan Wentian [8] used the AdaBoost algorithm, and Ding Yalan [9] used the RB color component method to separate kiwi fruits. These two methods can only separate regions with fruits from regions without fruits; they fail to identify individual fruits. Cui Yongjie [10] used the 0.9R-G color feature in fruit image segmentation, but in a complex background this method requires a large amount of computation and time. In terms of kiwi fruit positioning and coordinate acquisition, other methods have been used, such as monocular vision, binocular vision, multi-camera vision, hyperspectral imaging and laser. However, these methods have drawbacks such as complex computation, low accuracy, high cost and poor reliability. Meanwhile, the conversion among pixel coordinates, spatial coordinates and mechanical arm coordinates is still a problem to be solved.
On-site investigation shows that the space below the fruits is open, with little occlusion and a simple background, so this paper proposes that the fruits be identified, positioned and picked from below the plants. The principle is as follows: identify the fruits, extract their feature points and determine the picking sequence by using the elliptical Hough transform; acquire the feature point coordinates of the images with the Kinect sensor made by Microsoft, drawing on published research on the Kinect sensor in robot navigation [11][12] and feature recognition [13][14][15]; finally, carry out coordinate conversion between the camera and the sensor and construct a mathematical model of the conversion to obtain the 3D coordinates of the feature points.

Feature Extraction and Image Acquisition
The kiwi fruit pictures were taken in October 2014, during harvest time, at the Kiwi Experimental Station of Northwest Agriculture and Forestry University; the cultivar is "Hayward". The camera used is a Microsoft LifeCam Studio with a CMOS sensor and auto-focus. Images were acquired at 640 × 360 pixels in jpg format. Each picture, containing 2-5 fruits, was taken from below at a distance of 20 cm from the fruits. Images were acquired mainly from the side and from the bottom, as shown in Figure 1. Because of the greater scene depth and the complicated background, the images taken from the side contain not only the leaves of nearby branches but also distant non-target fruits, and show serious mutual occlusion between the target fruits. All of these affect the accuracy of target fruit segmentation and recognition.
In contrast, the picture shot from the bottom has less mutual occlusion between fruits and no interference from distant non-target fruits, which is favorable for extracting target fruits. With side images, because of the mutual occlusion between fruits, identification can only proceed from the outside to the inside, picking one fruit at a time, which results in low harvest efficiency.
In the image shot from the bottom, the occluded area between fruits is smaller, which makes it possible to identify all the fruits at once. As a result, the picking sequence can be determined and efficiency improved.
In order to improve the fruit recognition success rate under complex backgrounds, Cui Yongjie et al. [16], researchers at Northwest Agriculture and Forestry University, presented a comprehensive method to identify fruits and extract fruit features based on kiwi fruit shape characteristics, color features and the elliptical Hough transform. This method can minimize the impact of different complex backgrounds and illumination on the identification and the extraction of fruit features. The specific steps are shown in Figure 2.
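The color-feature step can be illustrated with a minimal sketch. It assumes the 1.1R-G feature named in the flow chart of Fig. 2, assumes that fruit pixels are the ones with the larger feature value, and uses a fixed threshold of 0, which is a hypothetical value and not one reported in the paper. The function name `fruit_mask` and the sample pixels are likewise illustrative only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def fruit_mask(image: List[List[Pixel]], threshold: float = 0.0) -> List[List[bool]]:
    """Mark pixels whose 1.1*R - G value exceeds a threshold.

    The threshold (default 0.0) is a hypothetical value for illustration;
    the paper does not state the value it uses.
    """
    return [[1.1 * r - g > threshold for (r, g, _b) in row] for row in image]


# Hypothetical 2x2 image: brownish pixels versus a green (leaf-like) pixel.
sample = [
    [(150, 120, 80), (80, 140, 60)],
    [(90, 90, 90), (200, 150, 100)],
]
print(fruit_mask(sample))
```

In the method described by the paper, a mask of this kind would then be cleaned by morphological operations before boundary extraction; those later steps are not reproduced here.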

2.2 Picking Order
Figure 3 shows the pixel coordinates of the feature points once each fruit has been identified. The X axis ranges over 0-360 and the Y axis over 0-640.

Fig. 3. Coordinates of target fruit
The picking sequence is determined according to the values of the feature point coordinates. In Figure 3, the labels 1, 2, 3, 4, 5 represent the picking sequence, which is determined by sorting the Y coordinate values of the feature points from small to large, so that the picking arm of the robot travels the minimum stroke and achieves maximum efficiency over the whole picking process.
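The ordering rule described above can be sketched in a few lines. The function name `picking_order` and the sample coordinates are illustrative assumptions and are not values read from Figure 3.

```python
from typing import List, Tuple


def picking_order(points: List[Tuple[float, float]]) -> List[int]:
    """Return 1-based indices of feature points, sorted by ascending Y pixel coordinate."""
    order = sorted(range(len(points)), key=lambda i: points[i][1])
    return [i + 1 for i in order]


# Hypothetical feature points (x, y) in pixels.
fruits = [(120, 400), (200, 90), (310, 250), (60, 520), (280, 30)]
print(picking_order(fruits))  # → [5, 2, 3, 1, 4]
```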

3 Coordinate Conversion
The principle of acquiring kiwi fruit spatial coordinates is shown in Fig. 4. On the left is a front view and on the right is a top view, where 1 stands for the kiwi fruits, 2 for the Microsoft camera, and 3 for the Kinect sensor. The camera is used to identify fruits, extract features and determine the picking sequence. The Kinect sensor is used to obtain the spatial coordinates of the feature points of the kiwi fruits. The intersection of the optical axis with the outer surface of the infrared camera is used as the origin of the Kinect coordinate system; the intersection of the optical axis with the lens surface of the Microsoft camera is used as the origin of the camera's coordinate system, as shown in Figure 4.
According to the signs of Δx and Δy, the pixel offsets of a feature point from the projection of the optical center in the image, the image can be divided into four regions, '1', '2', '3' and '4'. When Δx < 0 and Δy < 0, the feature point is in region '1', on the upper left of the projection point. When Δx < 0 and Δy > 0, it is in region '2', on the lower left of the projection point. When Δx > 0 and Δy > 0, it is in region '3', on the lower right of the projection point. When Δx > 0 and Δy < 0, it is in region '4', on the upper right of the projection point.
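The four-region rule can be written as a small lookup. This is only a sketch: the function name `region` is illustrative, and returning 0 for a point that lies exactly on an axis is an assumption, since the text does not assign such points to a region.

```python
def region(dx: float, dy: float) -> int:
    """Map the signs of the pixel offsets (dx, dy) to regions 1-4 as described in the text."""
    if dx < 0 and dy < 0:
        return 1  # upper left of the projection point
    if dx < 0 and dy > 0:
        return 2  # lower left
    if dx > 0 and dy > 0:
        return 3  # lower right
    if dx > 0 and dy < 0:
        return 4  # upper right
    return 0  # on an axis: not covered by the text (assumption)


# Camera pixel (100, 50) in a 640 x 360 image, relative to the center (320, 180).
print(region(100 - 320, 50 - 180))  # → 1
```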
When the distance between the camera and the feature point is h, let one image pixel represent the actual length a (mm); then the 3D coordinates Xw, Yw, Zw of the recognized feature point relative to the origin are respectively as follows:

$$X_w = a\,\Delta x \tag{3}$$
$$Y_w = a\,\Delta y \tag{4}$$
$$Z_w = h \tag{5}$$

In addition, the signs of Xw and Yw determine the region in which the feature point lies.
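As a worked example, the sketch below assumes that the camera-side relations have the same form as equations (8)-(10) for the Kinect sensor, that is, Xw = aΔx, Yw = aΔy and Zw = h. The function name `camera_world_coords` and the sample pixel are illustrative; a = 0.445 mm per pixel at h = 200 mm are the calibration values reported later in the paper.

```python
from typing import Tuple


def camera_world_coords(x: float, y: float, a: float, h: float,
                        cx: float = 320.0, cy: float = 180.0) -> Tuple[float, float, float]:
    """Convert a camera pixel (x, y) to (Xw, Yw, Zw) in mm.

    a is the length in mm represented by one pixel at distance h;
    (cx, cy) is the image center of the 640 x 360 camera image.
    """
    dx = x - cx  # equation (1)
    dy = y - cy  # equation (2)
    return (a * dx, a * dy, h)


# Hypothetical feature point at pixel (420, 130).
print(camera_world_coords(420, 130, a=0.445, h=200.0))
```

For this sample point the result is approximately (44.5, -22.25, 200.0) mm.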
The Kinect sensor is installed below the Microsoft camera at a fairly large distance from it, so that the images acquired by the Kinect sensor include the shooting area of the Microsoft camera, as shown in Figure 5. From equations (11) and (12), we get the following formulas:

$$x' = \frac{a}{b}(x - 320) + \frac{X}{b} + 320 \tag{13}$$
$$y' = \frac{a}{b}(y - 180) + \frac{Y}{b} + 240 \tag{14}$$

When the Microsoft camera recognizes and extracts the pixel coordinates (x, y) of a kiwi fruit feature point, the corresponding pixel on the Kinect sensor screen is (x′, y′) in theory. In practice, however, the image obtained by the RGB video camera on the Kinect sensor is reversed in the left-right direction, that is, along the X axis. In this case, the coordinates (x″, y″) of the point in the acquired image that corresponds to (x, y) are (640 − x′, y′).

... visualization, near-infrared spectroscopy, laser scanners, etc., but each of them has some problems to be solved. In this study, Microsoft's Kinect sensor is used to obtain the spatial coordinates of the feature points of the fruits, and the development platform is Kinect for Windows SDK. The sensing device is shown in Figure 6 (a). It mainly consists of three parts. They are, from left to right, an infrared projector, an RGB camera, and an infrared camera. The infrared projector actively projects near-infrared light. When this light falls on objects with rough surfaces or on ground glass, it is distorted and generates random spots of reflected light (called speckles). The speckles are then read by the infrared camera, which analyzes the near-infrared pattern and creates depth images of the objects within the field of view. The RGB camera captures color images within the field of view. The measurement range is shown in Figure 6 (b). The range centers on the infrared camera with an upper angle of 43° and a lower angle of 43°, 400-4000 mm in front of the camera. The precision of the depth images captured within this area can reach the millimeter level.

The Kinect sensor was used to obtain the images of the feature points, their pixel coordinates and their spatial coordinates.
4) Equations (13) and (14) were verified with the pixel coordinates acquired by the Microsoft camera and the Kinect sensor and the values of a and b. Errors in coordinate conversion were derived from the actual pixel coordinates of the feature points in the images acquired by the Kinect sensor, the pixel coordinates calculated from the equations, the differences (D-values) between the actual and the calculated pixel coordinates, and the actual length represented by each D-value.
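The error evaluation in step 4) can be sketched as follows, assuming that a D-value is the per-axis difference between the actual and the calculated Kinect pixel coordinates and that its actual length is the absolute D-value multiplied by b. The function name `conversion_error`, the sample coordinates and the value b = 1.5 mm per pixel are hypothetical and are not data from the paper.

```python
from typing import Tuple


def conversion_error(actual: Tuple[float, float],
                     calculated: Tuple[float, float],
                     b: float) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return the per-axis D-values in pixels and their lengths in mm.

    b is the length in mm represented by one Kinect pixel at the working distance.
    """
    dx = actual[0] - calculated[0]
    dy = actual[1] - calculated[1]
    return (dx, dy), (abs(dx) * b, abs(dy) * b)


# Hypothetical actual and calculated pixel coordinates of one feature point.
d_pixels, d_mm = conversion_error((352, 247), (350, 248), b=1.5)
print(d_pixels, d_mm)  # → (2, -1) (3.0, 1.5)
```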

Result and Analysis
Through calibration, it is found that the actual length represented by one pixel at 200 mm from the Microsoft camera is 0.445 mm, that is to say, a = 0.445 mm, and that the actual length represented by one pixel at 928 mm from the Kinect sensor is 1.32 mm, that is to say, b = 1.778. In this experiment, we obtained the pixel coordinates of 24 groups of feature points at different positions on the images captured by the Microsoft camera and on the images captured by the Kinect sensor. In addition, with the help of the Kinect sensor, we obtained the spatial coordinates of the feature points at these positions.
In the diagram, the red curves represent the actual distribution of points, and the blue ones represent the distribution of calculated points. It can be seen that the two curves coincide with each other. Table 1 indicates that the error between the calculated point coordinates and the actual coordinates is less than 3 pixels or 5 mm; thus formulas (13), (14) accurately reflect the correspondence of the same point on the image acquired by the Microsoft camera and on the image acquired by the Kinect sensor.

Conclusions
(1) In light of the growth characteristics of kiwi fruit, an automatic identification method is studied, including how to acquire fruit images from the bottom and how to combine fruit shape and color features in recognition.
(2) Considering the drawbacks of existing fruit and vegetable harvesting robots, a new method based on the Kinect sensor is proposed to acquire the coordinates of the target kiwi fruits.
(3) This paper discusses the coordinate conversion between the Microsoft camera and the Kinect sensor, and the mathematical model constructed can perform accurate coordinate conversion.

Fig. 2. Flow chart of fruit recognition. Steps in the chart: find threshold in 1.1R-G; eliminate noise by a morphological operation and an area thresholding method; extract fruit boundaries by Canny operator; label local region boundaries; compute the minimal bounding rectangle; calculate the most elliptical shape by elliptical Hough transform; decide whether it is a fruit boundary.

Fig. 4. Schematic of fruit space coordinates acquisition

As shown in Fig. 4, the horizontal distance between the feature point and the Microsoft camera is 'h' and the distance between the feature point and the Kinect sensor is 'H', so the distance between the Microsoft camera and the Kinect sensor is 'H-h'. When the distance between the Microsoft camera and the Kinect sensor remains unchanged, let the spatial offset between the camera and the Kinect sensor be (X, Y, Z); we then get Figure 5.

Fig. 5. Relationship between two coordinate systems

The coordinate system diagram of the Microsoft camera image is shown in Figure 5 (a), with a pixel area of 640 × 360. The center coordinates are (320, 180), which is also the projection position of the camera's optical center in the image. Let the pixel coordinates of feature point A recognized by the camera be (x, y) and its pixel coordinates relative to the image center be (Δx, Δy). Then:

$$\Delta x = x - 320 \tag{1}$$
$$\Delta y = y - 180 \tag{2}$$

In Figure 5 (b), 'XOY' is the Kinect image screen coordinate system and 'xoy' is the Microsoft camera image screen coordinate system. When the Kinect sensor captures images at 640 × 480 pixels, the projection of the optical center of the infrared video camera in the image is the center point of the image, at pixel coordinates (320, 240). With the spatial position of the feature point recognized by the Microsoft camera unchanged, let the coordinates of feature point A in the screen coordinate system of the Kinect sensor be (x′, y′); its pixel coordinates relative to the center point of the image are then:

$$\Delta x' = x' - 320 \tag{6}$$
$$\Delta y' = y' - 240 \tag{7}$$

Supposing one pixel of the image on plane H of the Kinect sensor represents the actual length b (mm), the 3D coordinates $X_K$, $Y_K$, $Z_K$ of feature point A relative to the origin of the Kinect sensor are respectively:

$$X_K = b\,\Delta x' \tag{8}$$
$$Y_K = b\,\Delta y' \tag{9}$$
$$Z_K = H \tag{10}$$

Similarly, the signs of $X_K$ and $Y_K$ correspond to the four image regions ①②③④ and to the actual location of the feature point. Since the spatial positions of the Microsoft camera and the Kinect sensor remain unchanged, the pixel coordinates of feature point A in the image captured by the Kinect sensor can be derived from the pixel coordinates of feature point A in the image shot by the Microsoft camera. That is:

$$a(x - 320) + X = b(x' - 320) \tag{11}$$
$$a(y - 180) + Y = b(y' - 240) \tag{12}$$

When the values of the distances H and h remain unchanged, the values of X, Y, a and b are constant, and so are a/b, X/b and Y/b.
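The conversion can be sketched in code. The sketch assumes that the camera-side terms in equations (11) and (12) carry the factor a, so that solving for the Kinect pixel gives x′ = (a/b)(x − 320) + X/b + 320 and y′ = (a/b)(y − 180) + Y/b + 240, and it applies the left-right mirroring x″ = 640 − x′ that the paper describes for the acquired Kinect image. The function name and the parameter values in the example are hypothetical.

```python
from typing import Tuple


def camera_to_kinect_pixel(x: float, y: float,
                           a_over_b: float, x_over_b: float, y_over_b: float,
                           mirror: bool = True) -> Tuple[float, float]:
    """Map a camera pixel (x, y) to the corresponding Kinect pixel.

    a_over_b, x_over_b and y_over_b stand for the constants a/b, X/b and Y/b.
    With mirror=True the x coordinate is flipped (640 - x') to match the
    left-right reversed image acquired by the Kinect sensor.
    """
    xk = a_over_b * (x - 320) + x_over_b + 320
    yk = a_over_b * (y - 180) + y_over_b + 240
    if mirror:
        xk = 640 - xk
    return (xk, yk)


# Hypothetical constants: a/b = 0.25, X/b = 10 px, Y/b = -20 px.
print(camera_to_kinect_pixel(320, 180, 0.25, 10.0, -20.0))  # → (310.0, 220.0)
```

Passing the ratios rather than a, b, X and Y separately matches the observation that only a/b, X/b and Y/b need to be known once H and h are fixed.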

Fig. 6. Hardware components and field of view of the Kinect sensor

Fig. 8. Test equipment and images acquired

1) In order to facilitate verification testing, the whole coordinate conversion system was inverted: the kiwi fruit was placed at the bottom, with the Microsoft camera above it and the Kinect sensor on top. The testing platform is shown in Figure 8 (a). The Kinect sensor and the Microsoft camera are fixed onto the bracket and kept level with the desktop supporting the kiwi fruit. To facilitate verification, a point was marked at a random position on the kiwi fruit surface as the recognition feature point. Figure 8 (b) is an image acquired by the Kinect sensor and Figure 8 (c) is the image captured by the Microsoft camera.
2) In this test, the vertical distance from the Microsoft camera to the feature point is 200 mm, and the vertical distance from the Kinect sensor to the feature point is 928 mm. Graph paper was used to calibrate the actual length represented by one pixel at 200 mm from the Microsoft camera and the actual length represented by one pixel at 928 mm from the Kinect sensor (i.e. the values of a and b). The calibration setup is shown in Figure 9, where the Microsoft camera is fastened to the height gauge, parallel to the coordinate plane. The Kinect sensor is calibrated in the same way as the Microsoft camera.

Fig. 9. Calibration correspondence between the pixel value and the actual distance

3) The Microsoft camera was used to obtain the images of the feature points and the pixel coordinates of the feature points.
By plugging into formulas (13), (14) the pixel coordinates of point 1 and point 24 on the images acquired by the Microsoft camera and on the symmetrical (mirrored) image of the image acquired by the Kinect sensor, we obtain the values of a/b, X/b and Y/b. The value of a/b coincides with the previous calibration. By plugging these values of a/b, X/b and Y/b, together with the coordinates of the remaining 22 points on the images acquired by the Microsoft camera, into formulas (13), (14), we can work out the coordinates of these 22 points on the symmetrical image of the Kinect sensor image.
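One way to recover the three constants from two point pairs is sketched below. It assumes the linear form x′ = (a/b)(x − 320) + X/b + 320 and y′ = (a/b)(y − 180) + Y/b + 240, and it takes a/b from the difference of the x coordinates; the paper does not state how the authors solved the system, so this choice is an assumption. The function name and the sample coordinates are hypothetical and are not the measured values of points 1 and 24.

```python
from typing import Tuple

Point = Tuple[float, float]


def fit_conversion(cam1: Point, kin1: Point,
                   cam2: Point, kin2: Point) -> Tuple[float, float, float]:
    """Estimate (a/b, X/b, Y/b) from two camera/Kinect pixel pairs.

    kin1 and kin2 are coordinates on the symmetrical (un-mirrored) Kinect image.
    """
    if cam1[0] == cam2[0]:
        raise ValueError("the two camera points need different x coordinates")
    a_over_b = (kin1[0] - kin2[0]) / (cam1[0] - cam2[0])
    x_over_b = kin1[0] - 320 - a_over_b * (cam1[0] - 320)
    y_over_b = kin1[1] - 240 - a_over_b * (cam1[1] - 180)
    return (a_over_b, x_over_b, y_over_b)


# Hypothetical pairs generated with a/b = 0.25, X/b = 10, Y/b = -20.
params = fit_conversion((100, 60), (275, 190), (500, 300), (375, 250))
print(params)  # → (0.25, 10.0, -20.0)
```

With more than two point pairs, a least-squares fit over all pairs would be a natural alternative to this two-point solution.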

Figure 10 is a diagram drawn with MATLAB to show the calculated coordinate values and the actual coordinate values.

Fig. 10. The calculated and actual coordinate distribution curves

Table 1. The coordinates of points in different coordinate systems