Classification of Mammograms Using Cartesian Genetic Programming Evolved Artificial Neural Networks

We developed a system that classifies masses or microcalcifications observed in a mammogram as either benign or malignant. The system assumes prior manual segmentation of the image. The image segment is then processed for its statistical parameters and applied to a computational intelligence system for classification. We used Cartesian Genetic Programming Evolved Artificial Neural Network (CGPANN) for classification. To train and test our system we selected 2000 mammogram images with equal number of benign and malignant cases from the well-known Digital Database for Screening Mammography (DDSM). To find the input parameters for our network we exploited the overlay files associated with the mammograms. These files mark the boundaries of masses or microcalcifications. A Gray Level Co-occurrence matrix (GLCM) was developed for a rectangular region enclosing each boundary and its statistical parameters computed. Five experiments were conducted in each fold of a 10-fold cross validation strategy. Testing accuracy of 100% was achieved in some experiments.


Introduction
Breast cancer is a leading cause of death in women worldwide. The disease often has no symptoms until it advances to a dangerous level. It is therefore highly recommended to carry out screening tests after a certain age. The best screening method to date is mammography, in which both breasts are imaged with low-dose X-ray radiation. Nowadays digital mammography has become the standard screening practice.
As the interpretation of mammograms depends heavily on the expertise of the radiologist and is prone to errors, a decision support system based on machine learning is highly desirable.
There are two main indicators of breast cancer in mammography: masses and microcalcifications. Often, before a suspicious mass appears in the mammogram, very fine specks caused by microcalcifications (MCs) appear in the image. MCs that are 1 mm or smaller in diameter and appear in clusters are more likely to be cancerous, while those that are larger and scattered randomly are often benign. A mass, on the other hand, is considered benign if its shape is round or oval and its margins are circumscribed, while it is considered malignant if it is irregular, stellate or micro-lobulated in shape. The density of the breast is such that the mammogram shows very low contrast between parenchymal tissue and a mass, which makes it very difficult to separate the two visually. A number of methods have been developed for computer-based classification of masses and microcalcifications. Before classification the mammogram image is preprocessed using digital filters to enhance the different regions of the image. The masses and microcalcifications are then outlined using segmentation techniques. Before these segmented images can be classified by machine, a machine learning system is trained with a large set of mammograms that have been classified by expert radiologists. A number of mammogram databases are available on the internet. The two most popular are the Digital Database for Screening Mammography (DDSM) at the University of South Florida and the MIAS database.
In this paper we present our work on classifying the masses and MCs seen in mammograms as either benign or malignant. The system that we developed assumes that the physician segments the masses or MCs manually using the CROP function found in most windows-based graphics software packages. To train our system for classification we used sample mammogram images from the Digital Database for Screening Mammography (DDSM).
These images are available in compressed lossless JPEG format. The Windows versions of the uncompressed images were downloaded under a transfer agreement with Dr. Thomas Deserno (né Lehmann), Department of Medical Informatics, Aachen University of Technology, D-52057 Aachen, Germany, under the IRMA (Image Retrieval in Medical Applications) project. The database contains 2620 cases comprising normal, benign and malignant high-resolution mammograms. Each case consists of two views of each breast, the cranio-caudal view and the oblique medio-lateral view. Besides the images, each case has a .ics file that contains information about the patient and the image file. Section 4 gives the details of the work done in preparing the training and testing data for the CGPANN. After training the CGPANN with the former, the system is evaluated with the latter.
The paper is organized as follows: The next section titled "Literature Survey" describes the latest research done in the field of mammogram classification. The sections "Cartesian Genetic Programming Evolved Artificial Neural Network (CGPANN)" and "Evolution Strategy" describe the evolutionary computation system that we used for the classification task. The section "Experimental setup" describes the preprocessing and statistical parameters extraction steps needed before mammogram classification. It also describes the network parameters that we chose. The section "Results and Analyses" compares the results of our experiments with those of the competitors. Finally, the section "Conclusion and Future work" summarizes the paper and states our intentions for future work in this field.

Literature Survey
A brief review of the work done in the field of mammogram classification is presented below. As the abnormality in mammograms can be in the form of microcalcifications or masses, separate subsections review each of them.

Microcalcifications
In [1] Xuejun et al. presented a number of ANN architectures for detecting microcalcifications (MC) in mammogram images. A back propagation neural network responds to an MC lying in the middle of a region of interest. To reduce the image size, FFT of the image was also determined. Another network with multiple outputs had one of its outputs as 1 when its corresponding input had a microcalcification at its position in the image. A Shift Invariant ANN (SIANN) was also experimented with. The performance of the networks was compared with Triple Ring Filter (TRF) which is a rule based filter [2]. A sensitivity of 95% was achieved by using SIANN and TRF together.
In [3] Issam et al. presented an SVM based microcalcification (MC) classification technique. In [4] Rolando et al. presented a technique in which a mammogram is first preprocessed with a median filter to remove noise. This is followed by binarization. The resulting image is then applied to two Gaussian filters. The optimized difference of Gaussian (DoG) filters is used to enhance those regions in the image that contain bright points. A number of techniques are then applied to this image to detect potential MCs.
In [5] Walker et al. discussed multi-chromosome Cartesian Genetic Programming in general and its application to the classification of microcalcifications (MCs) in mammograms in particular. The process is termed multi-chromosome evolution (MCE). About 80% of the test images were classified correctly. In [6] Zhang et al. presented a technique in which areas marked by radiologists in the DDSM database were resized and fourteen features extracted from each area. These features are represented by genes in a chromosome of a neural-genetic algorithm. A subset of features is applied as input and the NN trained. A random population of subsets is generated and each individual is evaluated. Genetic operators of crossover and mutation are applied. The NN with the best classification rate, together with the selected feature subset, is chosen. The highest classification result obtained using the proposed algorithm with the NN classifier was 85%.

Masses
In [7] the authors preprocessed and segmented images from popular databases on the basis of grey level distribution and texture using Otsu's method. They extracted the intensity histogram and Gray Level Co-occurrence Matrix (GLCM) features. To improve prediction accuracy the authors used a hybrid of Genetic Algorithm (GA) and Linear Forward Selection (LFS) algorithms. For classification they used the machine learning package WEKA and trained the system using the J48 decision tree method. The classification accuracy based on the selected features was 86.7%. In [8] the author presents a mammogram classification technique using Linear Vector Quantization (LVQ). A best classification accuracy of 99% was achieved with 100 DDSM cases.
In [9] the authors present an algorithm based on the principles that a mass segment has uniform texture, that its margins should coincide with the maximum gradient points that correspond to the edges, and that the mass profile shape has minimum changes. In [10], [11] the authors present mammogram image segmentation based on Iso-level contour map representation of the image.
In [12] a detailed review of the work done in segmentation and classification of the different anatomical regions of the breast is presented. In [13] the authors used Gray Level Co-occurrence Matrix (GLCM) to extract second order statistical descriptors of the texture from the masses. These features were applied to error back propagation ANN for classification. The best performance result that they obtained had 91.67% sensitivity, and 84.17% specificity.
In [14] the authors made use of the chain code in the overlay files of mammograms, in the DDSM database. A rectangular image is formed from the ROI defined by the chain code. The image is preprocessed and a GLCM formed from it. The Haralick energy descriptors are then determined from the GLCM. Adaptive thresholding is applied to the image so formed to detect edges. A binary image is thus formed. The morphological operation of image filling is applied to all closed path edges. The filled area that has the largest overlapping with the central part of ROI is the segmented mass.
In [15] the authors present a technique for segmentation and classification of masses. For segmentation the highest intensity point inside the ROI is taken as the center and radial lines are drawn to its margins. The critical point, i.e. the point with the highest local variance, is detected on each line and a new ROI is drawn by linearly interpolating between these points. Sixteen different morphological and textural features of the segmented masses are computed and applied to an ANN for classifying the mass as benign or malignant.
In [16] the authors present a technique focusing on utilization of mass margin descriptors called Fourier Transform of Radial Distance (FTRD). These features are applied to an MLP ANN for classification of the mass as either benign or malignant. The best accuracy that they achieved was 92.8%.
In [17] the authors present a mammogram classification system based on computational and human features. These features include shape, size and margins of the mass, patient age and tissue density. The density values are represented in Breast Imaging Reporting and Data System (BI-RADS). Textural features are found using Spatial Gray Level Dependence Method (SGLDM) and Run Difference Method (RDM). Sequential Forward Selection technique and genetic algorithm technique were used for classification.
In [18] the authors present a mass detection technique in which the pectoral muscles are segmented and removed. The mass inside the breast is then segmented using intensity thresholding. The segmented mass is analyzed for its textural features, proposed by Haralick, using the gray level co-occurrence matrix (GLCM). The textural indices are applied to a support vector machine (SVM) for training and classification. An average classification rate of 95% was achieved.

Cartesian Genetic Programming Evolved Artificial Neural Network (CGPANN)
In CGPANN a neural network is developed by evolving its inter-node connections, weights, output connection(s) and activation functions. It is an extension of the popular Cartesian genetic programming [19]. The nodes and connections in CGPANN are arranged as a graph with rows and columns. A CGPANN node, shown in Fig. 1b, consists of a summer and an activation function, similar to an ANN neuron. The summer takes its inputs from other nodes through weighted connections. Each connection additionally has a switch that can be turned on or off. The number of inputs to each node, the number of rows, the number of columns and the levels-back are predefined. A CGPANN genotype is a string of numbers, each representing one of the network quantities. A typical phenotype and its related genotype can be seen in Fig. 1a. The letters F, I, W, C and O in the genotype have the following meanings and possible values. F: activation function, sigmoid or hyperbolic tangent; I: the node connected to an input of this node, where only a node in a column to the left or a network input is allowed; W: connection weight (-1 to +1); C: on-off switch (0 or 1); O: network output, to which all nodes and network inputs are allowed to connect. Nodes of the corresponding phenotype are numbered sequentially, starting with the top node in the first column followed by the nodes down the column, followed by all the columns to the right in the same manner. When the genotype is decoded to the phenotype, nodes whose outputs do not connect to any other node or to a network output are called inactive nodes. During evolution, a certain percentage of genes are randomly picked and mutated to allowed values only. Unlike other ANN configurations that use both crossover and mutation, CGP and its derivative CGPANN give excellent results with mutation only. During the process many inactive nodes become active and vice versa. CGP and its derivative CGPANN have been investigated on a number of problems in the past [20][21][22].
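The genotype-to-phenotype decoding described above can be illustrated with a minimal Python sketch. The data layout (a list of nodes, each holding F and a list of (I, W, C) triples, plus a list of output genes O), the function names and the choice of letting levels-back span all previous columns are assumptions made for illustration; they are not taken from the implementation used in this work.

```python
import math
import random

# Hypothetical encoding: each node is (F, [(I, W, C), ...]). The letters follow
# the text; the container layout and all function names are assumptions.
ACTIVATIONS = [lambda s: 1.0 / (1.0 + math.exp(-s)), math.tanh]


def random_genotype(n_inputs, rows, cols, arity, n_outputs, rng):
    """Build a random but legal genotype (levels-back = all previous columns)."""
    nodes = []
    for col in range(cols):
        # A node may read network inputs or nodes in columns to its left.
        n_sources = n_inputs + col * rows
        for _ in range(rows):
            conns = [(rng.randrange(n_sources),      # I: source index
                      rng.uniform(-1.0, 1.0),        # W: weight
                      rng.randint(0, 1))             # C: switch
                     for _ in range(arity)]
            nodes.append((rng.randrange(len(ACTIVATIONS)), conns))
    # O: an output may connect to any network input or any node.
    outputs = [rng.randrange(n_inputs + rows * cols) for _ in range(n_outputs)]
    return nodes, outputs


def active_nodes(genotype, n_inputs):
    """Return indices (offset by n_inputs) of nodes that feed a network output."""
    nodes, outputs = genotype
    active, stack = set(), [o for o in outputs if o >= n_inputs]
    while stack:
        idx = stack.pop()
        if idx in active:
            continue
        active.add(idx)
        for src, _weight, switch in nodes[idx - n_inputs][1]:
            # Assumption: a switched-off connection does not activate its source.
            if switch and src >= n_inputs:
                stack.append(src)
    return active


def evaluate(genotype, inputs):
    """Decode the genotype and compute the network outputs for one input vector."""
    nodes, outputs = genotype
    values = list(inputs)
    for func, conns in nodes:
        total = sum(values[src] * w for src, w, switch in conns if switch)
        values.append(ACTIVATIONS[func](total))
    return [values[o] for o in outputs]
```

For example, `random_genotype(16, 10, 10, 3, 1, random.Random(0))` builds a 10 × 10 grid with three inputs per node and sixteen network inputs, and `active_nodes` then shows how few of the 100 nodes typically take part in the phenotype.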

Evolution Strategy
We use a 1+ λ (Number of Offspring) evolution strategy in which λ = 4. The pseudo-code for the algorithm that we use for training and testing is presented below.
- Prepare the set of parameters to be classified and a set of corresponding target outputs.
- Form the initial population of fifty genotypes by assigning random, yet legitimate values to the genes of each genotype as discussed above.
- Generate a new population with the fittest genotype and its four replicas. Accuracy of the result has been used as the fitness measure for this application.
- Randomly mutate 10% (Mutation Rate) of each replica, forming four offspring.
- Determine the fitness of the parent and the offspring and select the fittest to make the next parent. If the parent and an offspring have equal fitness, priority is given to the offspring over the parent [23]. The fitness improves from generation to generation until the iterations are stopped.
- Evaluate the performance of the trained network by applying the testing data.
- The system is now ready to classify a new pattern.
- END
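The steps above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the genotype is reduced to a flat list of real-valued genes in [-1, 1], and `mutate`, `evolve` and the fitness callback are hypothetical names. In the actual CGPANN each gene type (function, connection, weight, switch, output) would be mutated within its own set of legal values.

```python
import random


def mutate(genes, rate, rng, low=-1.0, high=1.0):
    """Return a copy in which about `rate` of the genes are resampled."""
    child = list(genes)
    n_mut = max(1, round(rate * len(child)))
    for pos in rng.sample(range(len(child)), n_mut):
        child[pos] = rng.uniform(low, high)
    return child


def evolve(population, fitness, generations, rng, lam=4, rate=0.10):
    """1+lambda evolution strategy in which offspring win ties with the parent."""
    # The fittest member of the initial population becomes the first parent.
    parent = max(population, key=fitness)
    parent_fit = fitness(parent)
    for _ in range(generations):
        # All lam offspring are mutated replicas of the same parent.
        offspring = [mutate(parent, rate, rng) for _ in range(lam)]
        scored = [(fitness(child), child) for child in offspring]
        best_fit, best_child = max(scored, key=lambda pair: pair[0])
        # ">=" gives priority to the offspring when fitness is equal.
        if best_fit >= parent_fit:
            parent, parent_fit = best_child, best_fit
    return parent, parent_fit
```

Accepting offspring of equal fitness lets the search drift across genotypes that decode to equally fit networks, which is the behaviour the tie-breaking rule in the list above is meant to provide.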

Experimental Setup
We trained our classifier system with 1000 benign and 1000 malignant mammogram images. Using the Matlab command GET DDSM GROUNDTRUTH we extracted the image information from the groundtruth file. The overlay file inside the groundtruth file contains the following information: lesion type, assessment, subtlety, pathology, annotations and a chain code representing the region of interest (ROI) marked by expert radiologists. The function is normally used to get a binary image of the ROI but we modified it to return a rectangular image that contains this ROI (see Fig. 2c). Training the network with parameters from these rectangular images, which contain masses and MCs, makes it easier for the network to accept parameters from manually segmented images (obtained using the CROP function). A Gray Level Co-occurrence Matrix (GLCM) of the image is then formed.
The matrix contains information about how frequently pairs of gray levels occur between pixels at a certain offset distance and angle, and can be used to extract statistical parameters of the image. Out of the many possible second order statistical texture descriptors proposed by Haralick [24], we extracted only contrast, energy, homogeneity and correlation. Each of these four parameters was determined for four different angles and a certain distance, using the graycoprops function in MATLAB. The sample parameters were divided into a training set and a testing set for training and testing our CGPANN. See equations (1) to (4) for the four Haralick parameters.
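For reference, the four descriptors can be written in the form in which MATLAB's graycoprops documents them. The assignment of equation numbers (1) to (4) below follows the order in which the descriptors are listed in the text and is an assumption. The symbols μ_i, μ_j and σ_i, σ_j denote the means and standard deviations of the row and column marginal distributions of p(i, j).

```latex
\begin{align}
\text{Contrast}    &= \sum_{i,j} |i-j|^{2}\, p(i,j) \tag{1}\\
\text{Energy}      &= \sum_{i,j} p(i,j)^{2} \tag{2}\\
\text{Homogeneity} &= \sum_{i,j} \frac{p(i,j)}{1+|i-j|} \tag{3}\\
\text{Correlation} &= \sum_{i,j} \frac{(i-\mu_i)(j-\mu_j)\, p(i,j)}{\sigma_i\,\sigma_j} \tag{4}
\end{align}
```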
Here p(i, j) is the normalized entry in row i and column j of the GLCM; i is the intensity of one pixel and j is that of the next pixel making the pair for the GLCM [13]. With four descriptors at four angles we get a total of 16 parameters. A 10-fold cross validation strategy was adopted, in which ten distinct data sets are formed. Each set has nine parts for training and one for testing. We experimented with the following six distance offsets (D): 4, 8, 12, 16, 20 and 24, and found that D=20 gave the best average 10-fold result, i.e. Accuracy=90.85%, Sensitivity=86.2% and Specificity=95.5%. We performed five experiments with this offset value, each time using a different seed for the random number generator that forms the initial population in CGPANN. The tenfold results for these experiments are shown in Table 1.
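The feature extraction described above (one GLCM per angle at a fixed offset distance, four descriptors per GLCM) can be sketched with NumPy as follows. This is an illustrative re-implementation and not the MATLAB code used in this work. The function names, the choice of 0°, 45°, 90° and 135° as the four angles, the requirement that the image is already quantized to integer gray levels, and returning 0 for the correlation of a constant image are all assumptions.

```python
import numpy as np


def glcm(image, dx, dy, levels):
    """Normalized co-occurrence matrix for pixel pairs offset by (dy, dx)."""
    img = np.asarray(image)
    rows, cols = img.shape
    counts = np.zeros((levels, levels), dtype=float)
    if abs(dy) >= rows or abs(dx) >= cols:
        return counts  # offset larger than the image: no pixel pairs exist
    # Select every pixel that has a neighbour at the requested offset.
    r0, r1 = max(0, -dy), min(rows, rows - dy)
    c0, c1 = max(0, -dx), min(cols, cols - dx)
    first = img[r0:r1, c0:c1]
    second = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    np.add.at(counts, (first.ravel(), second.ravel()), 1.0)
    total = counts.sum()
    return counts / total if total else counts


def haralick_four(p):
    """Contrast, energy, homogeneity and correlation of a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum((i - j) ** 2 * p)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    denom = sd_i * sd_j
    # Assumption: report 0 when the correlation is undefined (constant image).
    correlation = np.sum((i - mu_i) * (j - mu_j) * p) / denom if denom > 0 else 0.0
    return [contrast, energy, homogeneity, correlation]


def texture_features(image, distance, levels=8):
    """16 features: four descriptors at 0, 45, 90 and 135 degrees (assumed)."""
    offsets = [(0, distance), (-distance, distance),
               (-distance, 0), (-distance, -distance)]  # (dy, dx) pairs
    feats = []
    for dy, dx in offsets:
        feats.extend(haralick_four(glcm(image, dx, dy, levels)))
    return np.array(feats)
```

Calling `texture_features(roi, distance=20)` on a quantized rectangular ROI would give one 16-element input vector of the kind applied to the network, with the offset D=20 that gave the best results above.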
The CGPANN network that we used has the following features: Number of nodes = 100 (10 rows × 10 columns), Inputs per node = 3 and Number of outputs = 1 (average of ten outputs). An input-output set contains the 16 statistical parameters as inputs and a target output that is 0 for benign and 1 for malignant. In the process of evolution, an initial population of 50 networks was formed randomly. All the 2000 sample parameter sets are applied to each network and its fitness is determined using the following metrics:
Where N = number of samples, A = actual, T = target. We chose accuracy as the measure of fitness for our network. Thus in the initial population the network that has the highest accuracy is chosen as the parent for the next generation. The parent, together with four of its mutated replicas, forms the next generation. The fittest of these five networks forms the new parent. This process repeats until we get the required fitness. In our case the fitness became almost stable after 100,000 generations, so we stopped the experiments at 200,000 generations.

Table 1: Average values for training and testing results for five independent evolution runs with offset 20 and 10-fold cross validation; Acc: accuracy, Sen: sensitivity, Spec: specificity

Table 2: Comparative results from other authors and the overall 10-fold average from the proposed method; MC: microcalcifications, Acc: accuracy, Sen: sensitivity, Spec: specificity (all values in %; "-" marks a value that is not listed)

Results from other researchers for comparison                  Acc      Sen      Spec
Al Mutaz et al. [13] (mass)                                     87.9     91.67    84.17
M Vasantha et al. [7] (mass)                                    86.7     -        -
Zhang et al. [6] (MC)                                           85       -        -
Amir Tahmasabi et al. [16] (mass)                               92.8     -        -
Fatima Eddaoudi et al. [18] (mass)                              95       -        -
Proposed method (overall 10-fold average) (both MC and mass)    90.58    85.32    95.84
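The accuracy, sensitivity and specificity values reported in the tables can be computed from the network outputs and the targets along the lines of the sketch below. It uses the usual confusion-matrix definitions with malignant (target 1) as the positive class. The function name and the 0.5 decision threshold on the network output are assumptions, and the sketch is not a reconstruction of the exact fitness expression used in this work.

```python
def classification_metrics(actual, target, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) in percent.

    `actual` holds network outputs and `target` holds 0 (benign) or
    1 (malignant). The threshold value is an assumption.
    """
    predicted = [1 if a >= threshold else 0 for a in actual]
    pairs = list(zip(predicted, target))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # malignant found
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # benign found
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # benign flagged
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # malignant missed
    n = len(pairs)
    accuracy = 100.0 * (tp + tn) / n if n else 0.0
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity
```

Used as the fitness callback during evolution, only the first returned value (accuracy) would be maximized, while sensitivity and specificity are reported for the test fold.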

Results and analyses
In this project we tried to provide a user-friendly mammogram classification system using image statistical parameters and CGPANN. Table 1 shows that many of the accuracy, sensitivity and specificity values in the 10-fold cross validation test results are above 90%, and some are in fact 100%. Table 2 compares the results of a few other authors with those of our proposed method. All these authors have tried to classify either masses or microcalcifications alone. In comparison, our method classifies a sample set containing both masses and microcalcifications. Amongst the authors who worked in this area, Al Mutaz et al. [13] used the same Haralick statistical texture descriptors as we did and the same database for training and testing their system. The main difference, however, is that they used an MLP ANN as the computational intelligence system for classification while we used CGPANN.

Conclusions and Future Work
In this paper we have attempted to classify mammograms for both masses and microcalcifications. The reason for this versatility is that, irrespective of the type of abnormality, the technique relies on the textural characteristics of the abnormality, on which our network is trained. In contrast, most of the other techniques that we reviewed in the literature classify either masses or microcalcifications alone. We also developed a method to convert the chain codes for ROIs, associated with the images, into rectangular image sections and to train our network with the statistical parameters obtained from them. This makes the system simple to use in a windows-based graphical environment for real mammogram classification. We evaluated our system with a 10-fold cross validation strategy and obtained an accuracy of 100% in some experiments (see Table 1). In the current work the mammogram is segmented manually. Our system only classifies the region as benign or malignant. In future, we intend to work on automatic mammogram segmentation as well. We would then be able to segment and classify the image on a single platform.

Acknowledgments
Many thanks to TM Deserno, Dept. of Medical Informatics, RWTH Aachen, Germany, for providing the windows version of DDSM mammogram images.