Word Recognition by Combining Outline Emphasis and Synthesized Background

Abstract. Character recognition collects item keywords from images on e-commerce websites; however, it requires a huge amount of training data. In this paper, we propose an efficient method to collect training data by generating synthesized images and emphasizing character outlines to obtain realistic images. The proposed method improves recognition accuracy on both generated images and real images from e-commerce websites.


Introduction
Deep Convolutional Neural Networks (DCNNs) are a common approach to recognizing handwritten characters and characters in scene images. Character recognition in scene images consists of two parts: character detection and character recognition [4]. This technique can be applied to item images from e-commerce websites to collect item information. DCNNs require a huge amount of training data in order to obtain high accuracy [3] [5]. Although public datasets are available, the range of fonts they cover is too small. To address this problem, a character synthesis method was proposed to reduce the image collection cost [1]; it generates synthesized images from font data and background images. However, it assumes English characters on simple backgrounds. In this paper, we propose a method that generates synthesized character and word images using Japanese font data and complex background images for item images on e-commerce websites. In addition, we introduce an approach that emphasizes the outlines of characters. Our method, trained with synthesized images that include characters both with and without outline emphasis, improves recognition accuracy.

Proposed method
The proposed method consists of three steps: generation of character images, addition of margins and emphasis of outlines, and synthesis of the characters with a complex background. Figure 1 shows the flow of character image generation. First, character images are generated from character lists with font data; these images are then synthesized with a background image.

Fig. 3. Synthesis of background image

We prepared 22 fonts commonly used on e-commerce websites, such as MS Mincho and Yu Gothic. Characters on the complex backgrounds of e-commerce images are sometimes decorated, for example with borders. To improve recognition accuracy for such characters, outline emphasis is introduced during character generation. First, a margin is added to the generated image. Then, an outline is added to emphasize the character in the image. At this point, two variants of each image are generated with different outline thicknesses. Figure 2 shows the flow of margin addition followed by outline emphasis.
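The margin and outline steps above can be sketched as a small mask operation. This is an illustrative sketch, not the paper's implementation: the function names `add_margin` and `render_labels`, the 4-neighbourhood dilation, and the label-map output (0 = background, 1 = outline, 2 = character) are all assumptions made for this example.

```python
import numpy as np

def add_margin(glyph, margin):
    """Pad a boolean glyph mask with `margin` background pixels on all sides."""
    return np.pad(glyph, margin, mode="constant", constant_values=False)

def render_labels(glyph, margin, thickness):
    """Add a margin, then an outline of the given thickness around the glyph.

    Returns a label map: 0 = background, 1 = outline, 2 = character.
    The outline is obtained by dilating the character mask `thickness`
    times with its 4-neighbourhood and removing the original pixels.
    """
    g = add_margin(glyph.astype(bool), margin)
    dilated = g.copy()
    for _ in range(thickness):
        grown = dilated.copy()
        grown[1:, :] |= dilated[:-1, :]   # shift down
        grown[:-1, :] |= dilated[1:, :]   # shift up
        grown[:, 1:] |= dilated[:, :-1]   # shift right
        grown[:, :-1] |= dilated[:, 1:]   # shift left
        dilated = grown
    labels = np.zeros(g.shape, dtype=np.uint8)
    labels[dilated & ~g] = 1   # outline ring
    labels[g] = 2              # character body
    return labels
```

Generating the two variants with different outline thicknesses then amounts to calling `render_labels` twice with different `thickness` values.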
The background region is then replaced with a complex background such as an item image. As shown in Fig. 3, the background of the generated character image is green, and the character and the outline are drawn in different colors. The background image is cropped randomly from a banner image of an e-commerce store.
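The chroma-key-style replacement described above can be sketched as follows. The function name `composite` and the label-map convention (0 = background, 1 = outline, 2 = character) are assumptions for illustration; only the idea of swapping the green region for a random crop of a banner image comes from the text.

```python
import numpy as np

def composite(labels, char_color, outline_color, banner):
    """Replace the background region of a label map with a random banner crop.

    labels: HxW array with 0 = background, 1 = outline, 2 = character
    (an assumed convention); banner: an e-commerce banner image as an
    HxWx3 uint8 array, larger than the character image.
    """
    h, w = labels.shape
    y = np.random.randint(0, banner.shape[0] - h + 1)
    x = np.random.randint(0, banner.shape[1] - w + 1)
    out = banner[y:y + h, x:x + w].copy()    # random background crop
    out[labels == 1] = outline_color         # paint outline pixels
    out[labels == 2] = char_color            # paint character pixels
    return out
```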
The DCNN is trained using the synthesized character images. To test the effectiveness of the method, we also applied it to word synthesis: word images are generated from a word list and synthesized with complex backgrounds in the same way.

Structure of DCNN
Figure 4 shows the DCNN network structure. The network consists of 4 layers: 3 convolution layers and 1 fully connected layer. The filter size of each convolution layer is 5 × 5, and max pooling is employed for the pooling layers. The fully connected layer has 4,096 units and employs Dropout [7] during the learning phase. The activation function of each layer is ReLU [6]. The output layer has 1,253 classes. AdaGrad [8] is used as the optimization method. The mini-batch size is 32 and the number of epochs is 50.
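As a sanity check on the architecture above, the spatial sizes can be traced through the three conv + pool stages. This is a minimal sketch under stated assumptions: the 64 × 64 input resolution, 'valid' (unpadded) convolutions, and 2 × 2 stride-2 pooling are not given in the paper.

```python
def conv_out(n, kernel=5, pad=0, stride=1):
    """Spatial size after a convolution ('valid' padding assumed)."""
    return (n + 2 * pad - kernel) // stride + 1

def pool_out(n, kernel=2, stride=2):
    """Spatial size after max pooling (2x2, stride 2 assumed)."""
    return (n - kernel) // stride + 1

size = 64                        # assumed input resolution
for _ in range(3):               # three conv + pool stages
    size = pool_out(conv_out(size))
# The fully connected layer then maps the flattened feature maps
# to 4,096 units, followed by the 1,253-class output layer.
```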

Evaluation
First, we evaluate the effectiveness of outline emphasis and background synthesis using top-5 accuracy. We trained on 1,253 characters with 145 images each (Fig. 5). For evaluation, 227 images collected from e-commerce websites were used. Table 1 shows the results on synthesized images. The method with both emphasis and synthesis achieves the best top-1 accuracy. The evaluation results of character recognition and word recognition on real e-commerce images are shown in Table 2 and Table 3, respectively. From Table 2, the method with emphasis performs 12.4% better on top-1 accuracy than the baseline, which uses neither emphasis nor synthesis. The method with synthesis alone also improves on the baseline by 13.8% in top-1 accuracy. The combination of emphasis and synthesis achieves the best performance, with an improvement of 14.0%. The results of word recognition on real images are shown in Table 3. The method with emphasis improves on the baseline by about 20.4% and 19.8% in top-1 and top-5 accuracy, respectively. Synthesis is also effective on real images, improving accuracy over the baseline by 24.4% and 22.3% in top-1 and top-5 accuracy, respectively. The combination of emphasis and synthesis again achieves the best performance.
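The top-1 and top-5 accuracy figures used in these comparisons can be computed with a few lines of NumPy; this helper is an illustrative sketch, not part of the paper's evaluation code.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true class is among the k highest scores.

    scores: (N, C) array of class scores; labels: (N,) true class indices.
    """
    topk = np.argsort(scores, axis=1)[:, -k:]       # k best class indices
    hits = (topk == labels[:, None]).any(axis=1)    # true class among them?
    return float(hits.mean())
```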

Table 1. Comparison of character recognition on synthesized images [%]

Table 2. Comparison of character recognition on real images [%]