An Improved Method for Image Retrieval Based on Color and Texture Features

Abstract: With the development of technology and the Internet, people need more and more information. The carrier of information has also shifted from text to images, video, and other media, so image retrieval plays an increasingly important role in research. This paper studies image retrieval based on color and texture features. The main algorithms include the color histogram, color moments, the gray level co-occurrence matrix (GLCM), and Tamura texture features. Based on the search results, we then propose an improvement that combines the color histogram with Tamura texture features: images are first retrieved by color histogram, and the results are then retrieved again by Tamura texture. Experimental results demonstrate that the retrieval rate of the combined approach is higher than that of a single method.


Introduction
With the development of multimedia network technology, we can obtain information in many ways. Besides text, digital images are now widely used in areas including medical science, industrial manufacturing, aerospace engineering, and remote sensing. To manage and retrieve images effectively, the traditional approach is to annotate each image with text and to link the two. However, it is often difficult to describe the complete meaning of an image with only a few keywords, and the annotation is subjective (different people describe the same image differently). As a result, the keywords entered by a user may differ from the keywords stored in the database, and the user will not find the target image. To overcome this disadvantage, content-based image retrieval [1] was proposed: features are extracted from the images, the features of the query image are matched against those of the images in the database, and the target images are then retrieved. Image features include text features and visual features; visual features can be described by color, texture, and shape.
Color is the most straightforward descriptor of an image. Swain [2] first proposed a method that uses the color histogram to search images. Stricker used the first three color moments to characterize the content of an image. Texture is another important feature of an image, and extracting texture features is also an important way to retrieve images. In the 1970s, Haralick proposed the gray level co-occurrence matrix method; Tamura [3] proposed a texture representation consisting of coarseness, contrast, directionality, linelikeness, regularity, and roughness.
In this paper, we retrieve images using both color and texture. The main algorithms include the color histogram, color moments, GLCM, gray-scale difference statistics, and Tamura texture features. The recall ratio and the precision ratio are used to measure the performance of the different algorithms. Based on the experiments, we also propose a method that combines the color histogram and Tamura texture features. The results demonstrate that the retrieval rate of the combined approach is higher than that of a single method.
The retrieval process is divided into five modules, and the procedure is as follows:

Color Feature
Color is the most widely used visual feature in image retrieval, because color is closely related to the objects and scenes in an image. Common methods for describing color features include color moments, the color correlogram, and the color histogram [4].
(1) Color Moment The color moment method is numerical: it describes the color distribution of an image by calculating its moments. Since the information about the color distribution is concentrated mainly in the low-order moments, the first moment (mean), the second moment (variance), and the third moment (skewness) are commonly used to represent the color distribution of an image. In the defining formulas, $p_{ij}$ denotes the probability that a pixel with gray level $j$ occurs in the $i$-th color channel of the image, and $N$ denotes the number of pixels in the image.
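For reference, a commonly cited form of the first three color moments (following Stricker and Orengo) is shown below. It is an assumption that this matches the authors' formulas: in this form $p_{ij}$ is read as the value of the $i$-th color channel at the $j$-th pixel, and the second moment is reported as the square root of the variance.

```latex
\mu_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij}, \qquad
\sigma_i = \left( \frac{1}{N}\sum_{j=1}^{N} \left(p_{ij}-\mu_i\right)^{2} \right)^{1/2}, \qquad
s_i = \left( \frac{1}{N}\sum_{j=1}^{N} \left(p_{ij}-\mu_i\right)^{3} \right)^{1/3}
```

With three color channels, this gives a nine-dimensional feature vector per image.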
(2) Color Correlogram The color correlogram is another representation of the color distribution of an image. It reflects the spatial correlation between pairs of neighboring pixels, as well as the correlation between the local and the overall pixel distribution. It is easy to calculate and gives good retrieval results.

Texture
Texture is an attribute of an image region rather than of a single pixel: it must be considered in context, because the texture of one pixel is meaningless. Texture reflects the relationships among the gray values of image pixels. Some common texture descriptors are introduced as follows.
(1) GLCM [5] Before calculating the GLCM, the images should be preprocessed. The number of gray levels in an image is generally large, so compressing the gray levels first saves time and greatly reduces the amount of computation.
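The gray-level compression step can be sketched as follows, together with a co-occurrence count on the compressed image. This is a minimal illustration, not the authors' implementation: the number of levels (16), the offset (one pixel to the right), and the function names `quantize` and `glcm` are assumptions made for the example.

```python
def quantize(image, levels=16, max_value=255):
    """Map gray values in [0, max_value] onto `levels` bins numbered 0 .. levels-1."""
    return [[pixel * levels // (max_value + 1) for pixel in row] for row in image]


def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of gray levels at pixel offset (dx, dy).

    Entry [i][j] is the fraction of valid pixel pairs whose first pixel has
    level i and whose second pixel, displaced by (dx, dy), has level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    total = 0
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image is too small for the given offset")
    return [[count / total for count in row] for row in counts]


# Hypothetical 2 x 4 image with two flat regions.
small = quantize([[0, 0, 255, 255], [0, 0, 255, 255]], levels=4)
matrix = glcm(small, levels=4)
```

With 16 levels the matrix has 256 entries instead of the 65,536 needed for 256 gray levels, which is where the saving in computation comes from.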
The GLCM describes texture by studying the spatial correlation of gray levels, and the texture features are second-order statistics computed from the matrix. The features are described as follows:
Entropy: Entropy measures the amount of information in an image. It indicates the complexity and non-uniformity of the texture. When the values of the GLCM are uniformly distributed, the entropy is large.
(2) Gray-scale difference statistics Assume that the gray value of any point $(x, y)$ of the image is $g(x, y)$, and that a point displaced from it, $(x+\Delta x, y+\Delta y)$, has gray value $g(x+\Delta x, y+\Delta y)$. The difference between the gray values is $g_\Delta(x, y) = g(x, y) - g(x+\Delta x, y+\Delta y)$. From the histogram of $g_\Delta$ we can compute the texture features. In this paper, the texture features are the mean, contrast, and entropy. The mean reflects the smoothness of the texture: the smaller the mean, the smoother the texture. Contrast reflects the clarity of the image: the greater the contrast, the easier it is to distinguish details in the image. Entropy reflects the roughness of the texture: the higher the value, the coarser the texture.
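A minimal sketch of the gray-scale difference statistics is given below. The exact feature formulas are not reproduced in the text, so the definitions used here are common ones and are assumptions: the histogram is taken over absolute differences, the mean is $\sum_i i\,p(i)$, the contrast is $\sum_i i^2 p(i)$, and the entropy is $-\sum_i p(i)\log_2 p(i)$. The same entropy expression applies to the entries of a normalized GLCM. The function names are hypothetical.

```python
import math


def difference_histogram(image, dx=1, dy=0, levels=256):
    """Normalized histogram of |g(x, y) - g(x+dx, y+dy)| over all valid pixel pairs."""
    hist = [0] * levels
    height, width = len(image), len(image[0])
    total = 0
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                hist[abs(image[y][x] - image[y2][x2])] += 1
                total += 1
    if total == 0:
        raise ValueError("image is too small for the given offset")
    return [count / total for count in hist]


def glds_features(p):
    """Return (mean, contrast, entropy) of a normalized difference histogram p."""
    mean = sum(i * p_i for i, p_i in enumerate(p))
    contrast = sum(i * i * p_i for i, p_i in enumerate(p))
    # Empty bins contribute nothing to the entropy and are skipped.
    entropy = -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)
    return mean, contrast, entropy


# Hypothetical one-row image with gray levels 0..3.
p = difference_histogram([[0, 1, 3]], levels=4)
mean, contrast, entropy = glds_features(p)
```

A perfectly flat region puts all of its mass in the zero-difference bin, so all three features are zero, which matches the reading that a smaller mean indicates a smoother texture.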

Comprehensive color histogram and Tamura texture feature
The color histogram is the simplest and most common image feature. It describes the proportion of each color in the whole image and is invariant to rotation, translation, and scale. However, it ignores the spatial position of each color. Therefore, different objects with the same color histogram may be treated as the same target by color histogram based methods, which significantly increases the error in the retrieval results. Color histogram based methods are suitable when images are difficult to segment or when the spatial distribution of color is not important.
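The point that a color histogram ignores spatial position can be illustrated with a small sketch. The color space (RGB) and the quantization (4 bins per channel, 64 colors) are assumptions for the example, since the text does not state which were used; `color_histogram` is a hypothetical helper name.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of RGB pixels quantized to bins_per_channel**3 colors.

    `pixels` is a flat list of (r, g, b) tuples with channel values in 0..255.
    """
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        rq = r * bins_per_channel // 256
        gq = g * bins_per_channel // 256
        bq = b * bins_per_channel // 256
        hist[(rq * bins_per_channel + gq) * bins_per_channel + bq] += 1
    n = len(pixels)
    return [count / n for count in hist]


# Two hypothetical "images" with the same pixels in a different arrangement.
image_a = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
image_b = list(reversed(image_a))
same = color_histogram(image_a) == color_histogram(image_b)
```

Because the two pixel lists differ only in order, `same` is true: the histogram cannot tell the two arrangements apart, which is exactly the weakness that a texture feature is meant to compensate for.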
Based on psychological studies of the visual perception of texture, Tamura et al. proposed a representation of texture features.
It consists of six components corresponding to six psychological attributes of texture [6]: coarseness, contrast, directionality, regularity, linelikeness, and roughness.
Typically, coarseness, contrast, and directionality are sufficient to express the texture features of images.
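As an illustration of one of these three attributes, the sketch below computes Tamura contrast in its usual form, $F_{con} = \sigma / \alpha_4^{1/4}$ with kurtosis $\alpha_4 = \mu_4 / \sigma^4$, where $\sigma$ is the standard deviation and $\mu_4$ the fourth central moment of the gray values. It is an assumption that this paper uses the same exponent of 1/4; coarseness and directionality are more involved and are not shown. The function name is hypothetical.

```python
import math


def tamura_contrast(image):
    """Tamura contrast of a gray-level image given as a list of rows."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    if variance == 0:
        return 0.0  # a constant image has no contrast
    fourth_moment = sum((p - mean) ** 4 for p in pixels) / n
    kurtosis = fourth_moment / variance ** 2
    return math.sqrt(variance) / kurtosis ** 0.25


low = tamura_contrast([[0, 2], [0, 2]])
high = tamura_contrast([[0, 4], [0, 4]])
```

Doubling the spread of the gray values doubles the contrast, while a constant image yields zero.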
Accordingly, we take the coarseness, contrast, and directionality of each image in the database as its feature vector, calculate the Euclidean distance to the query image, and match and retrieve the target images. The mathematical expressions of these three attributes are given in [7].
Color [8] is a global feature that describes the surface properties of the scene in an image or image region. Because color is insensitive to changes in the orientation and size of the image, it cannot capture local features well. Texture reflects the relationships among the gray values of image pixels; the global regularities and local irregularities of an image may be deterministic or random. The experiments show that each single retrieval method has its advantages but also various shortcomings, so that the retrieval efficiency is not very high. Therefore, to remedy these shortcomings, this paper presents a combined method of color histogram and Tamura texture features.
Integrating the color histogram and Tamura texture features allows images to be searched more comprehensively.
The main principle is: starting from the color histogram results, we perform a second retrieval using Tamura texture features. Texture similarity is compared after color similarity, so that the results are judged by both indexes, which makes the result more accurate and the retrieval more efficient. Experiments show that the retrieval rate of the combined color histogram and Tamura texture method is higher than that of a single retrieval method.
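The two-stage principle can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text does not say how many color candidates are passed to the texture stage, so `n_candidates` is hypothetical, as are the dictionary layout of the feature store and the function names. Both stages use the Euclidean distance.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_retrieval(query, database, n_candidates=40, n_results=20):
    """Rank by color histogram first, then re-rank the best candidates by texture.

    `query` is a dict with "color" and "texture" feature vectors;
    `database` maps an image id to a dict of the same shape.
    """
    by_color = sorted(
        database, key=lambda k: euclidean(query["color"], database[k]["color"])
    )
    candidates = by_color[:n_candidates]
    by_texture = sorted(
        candidates, key=lambda k: euclidean(query["texture"], database[k]["texture"])
    )
    return by_texture[:n_results]


# Hypothetical feature store with toy two-bin color and one-value texture vectors.
db = {
    "a": {"color": [1.0, 0.0], "texture": [0.9]},
    "b": {"color": [0.9, 0.1], "texture": [0.5]},
    "c": {"color": [0.0, 1.0], "texture": [0.5]},
    "d": {"color": [0.8, 0.2], "texture": [0.0]},
}
q = {"color": [1.0, 0.0], "texture": [0.5]}
ranked = two_stage_retrieval(q, db, n_candidates=3, n_results=2)
```

In this toy example image "c" matches the query texture exactly but is removed by the color stage, so the texture stage only reorders images that are already similar in color.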

Image similarity measure
The measurement of image similarity plays an important role in image retrieval, and it affects the accuracy of image matching. When matching image features, the common approach is to calculate the distance between two feature points. Distance measures include the absolute distance, the quadratic-form distance, and the Euclidean distance [9]. The Euclidean distance is commonly used and has the advantage of low complexity, so this paper measures the similarity of images by calculating the Euclidean distance. The formula is as follows: $D(A, B) = \sqrt{\sum_{i=1}^{m} (a_i - b_i)^2}$, where $a_i$ and $b_i$ are the $i$-th components of the feature vectors of images $A$ and $B$, and $m$ is the dimension of the feature vector.
Selecting the picture to be retrieved:
As can be seen from the above results, the numbers of images retrieved by the five methods are 7, 3, 3, 4, and 8. So for Fig. 4.1, the best retrieval method is Tamura texture, followed by the color histogram; the worst are color moments and gray-scale difference statistics. However, this is only one picture in the database and is not representative, so we also search with the remaining 99 images and aggregate the data.

The first method of evaluation
We need indexes to measure whether retrieval is accurate and how accurate it is. The common indexes are the recall ratio and the precision ratio [10], defined as follows:
Recall ratio: n/N
Precision ratio: n/T
Here n is the number of images in the query results that are relevant to the query image [11], N is the number of images in the test set that are relevant to the query image, and T is the number of images returned by the query. Given the composition and size of the image database, N is 10 and T is 20.
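The two ratios can be computed as in the following sketch. The function name and the image identifiers are hypothetical; the example uses N = 10 and T = 20 as in the text, with a made-up query for which 7 of the 20 returned images are relevant.

```python
def recall_precision(returned, relevant):
    """Return (recall, precision) = (n / N, n / T) for one query.

    `returned` holds the T image ids returned by the query and
    `relevant` holds the N image ids in the test set that match it.
    """
    returned = list(returned)
    relevant = set(relevant)
    n = sum(1 for image_id in returned if image_id in relevant)
    return n / len(relevant), n / len(returned)


# Hypothetical query: ids 0..9 are relevant (N = 10), 20 images are returned (T = 20).
relevant_ids = range(10)
returned_ids = list(range(7)) + list(range(100, 113))
recall, precision = recall_precision(returned_ids, relevant_ids)
```

Since T is twice N in this setup, the precision of a query is always half of its recall, and the precision cannot exceed 50%.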
Experiments were conducted on each of the 100 images with five retrieval methods (color histogram, color moments, GLCM, gray-scale difference statistics, and Tamura texture features). This gives five sets of data, from which we calculate the average recall and precision ratios; the results are given in Table 5.1. Analysis of the results: Table 5.1 shows that the best retrieval method is the color histogram, with an average precision ratio of 33%, and the worst is gray-scale difference statistics, with 15.7%.
In addition, the system response time is also an index of retrieval performance. In the experiments, the fastest retrieval methods are the color histogram and color moments, which take only one second. Gray-scale difference statistics takes about 10 seconds, GLCM about 20 seconds, and Tamura texture is the slowest at more than one minute. Therefore, in terms of retrieval efficiency, the color histogram and color moments are the most efficient.
In summary, the color histogram achieves the best retrieval results on the image database used in this experiment.

The second method of evaluation
In addition to calculating the recall and precision ratios, we can collate the data into a line chart to see which retrieval method is better. From the 100 samples we selected 50 pictures at equal intervals (five images of each kind of flower); the line chart is as follows:

Fig. 5.1 Line chart of search results
Averaging the results of the five methods over these 50 pictures gives Table 5.2. From Table 5.2 we can see that the color histogram is better than the others, especially for the first 20 pictures, where the average number of retrieved images is 6.82. The Tamura texture feature is the next best.
The overall trend line of gray-scale difference statistics lies below those of the other four retrieval methods, indicating a poor result: the average number of images retrieved is only 3.02. This is consistent with the experimental results in Table 5.1.

Improved methods
As can be seen from the above experiments, the Tamura texture feature and the color histogram are better than the other methods, but the retrieval rate of a single method is still low. Therefore, this paper proposes combining these two methods, i.e., performing Tamura texture retrieval on top of the color histogram results.
For the same Fig. 4.1, the experimental results are as follows: To make the test representative, we selected ten images (numbered 11, 21, ... 81, 91) from the database and counted the number of similar images among the first 10 results. From Table 6.1 and Fig. 6.2, we can see that for images 51 and 61 the integrated method is worse than the color histogram but better than the Tamura texture feature. Overall, the efficiency of a single retrieval method is relatively poor, and integrating the color histogram with the Tamura texture feature increases the retrieval rate.

Summary and expectation
This paper studies image retrieval based on color and texture features. The five methods used are the color histogram, color moments, GLCM, gray-scale difference statistics, and Tamura texture features. The methods are then compared by calculating the recall ratio and the precision ratio.
The experiments show that the color histogram is better than the other methods in both time and accuracy. The Tamura texture feature gives better results than the remaining methods, but it is not suitable for real-time applications because it is time-consuming. The preliminary results showed that no method based on a single feature provides acceptable performance, and thus a retrieval model using both the color histogram and texture was proposed. The experimental results showed that the retrieval accuracy of the improved model is increased.
In the future, the study will focus on three aspects. To avoid the effects caused by irregular images, all images will be automatically cropped before being saved into the database. The retrieval efficiency on large-scale databases needs to be further studied and optimized. Additionally, the choice of similarity measure is crucial to improving retrieval accuracy, and various similarity calculations should be tested.