A Web-Based Cooperation and Retrieval Model of Character Images for Ancient Chinese Character Research

Character images are central to research on ancient Chinese characters. Optical sensing devices such as scanners and digital cameras are increasingly used to convert ancient Chinese books into digital images, so that researchers can work from images instead of the rare and fragile paper originals. Coordinating these images among researchers in a web environment has therefore become a key issue for the efficiency and quality of research work. Considering the requirements of ancient Chinese character research, this paper constructs a web-based cooperation and retrieval model of character images. The model consists of several modules, including ancient Chinese character image management and retrieval, research work cooperation, and research conclusion data management. The key techniques employed in the system are discussed. First, a cooperation mechanism for ancient Chinese character images is designed that assigns character images to researchers, so that no image is analyzed by several researchers simultaneously or left unassigned for a long time. Second, an integrated image-text arrangement method is constructed for the special situation in ancient Chinese character research where conclusion records often mix text and images. Third, a sorting algorithm for conclusion records by Chinese character radicals is proposed that can arrange the records according to the radical orders of various dictionaries, as research work requires. Finally, a global and local retrieval method for ancient Chinese character images is proposed, with which researchers can search the database for similar character images using features of the whole image or of its local areas.
The experimental results show that the system built with the proposed model effectively assists research work on ancient Chinese characters.


INTRODUCTION
Research on ancient Chinese characters is meaningful work for the digitization of ancient documents and for the popularization and dissemination of Chinese civilization.
In a large-scale research project in this field, researchers are often confronted with large numbers of characters to be studied. To identify the attributes of an ancient Chinese character, including its pattern, pronunciation and meaning, researchers must apply their knowledge of ancient Chinese characters thoroughly, read many related references and discuss frequently with other researchers. Meanwhile, the researchers, the research tasks and the research resources (including the character images under study and the related references) must be managed scientifically. Information technology can clearly make these activities more convenient and more efficient.
The number of Chinese characters is hard to determine. At present, many ancient Chinese characters are still not included in any existing coded character set, which causes many problems when they are processed by computers. Characters in computers are stored as codes so that they can be transmitted and processed conveniently. When a character is displayed or printed, the operating system loads the corresponding glyph according to its code. A character outside the coded character set can therefore only be displayed or printed as an image, which consumes more memory and causes compatibility issues with normal text. Moreover, such character images cannot be found with normal text retrieval techniques; they can only be located with the more complex technology of image retrieval. Unfortunately, most of the ancient Chinese characters to be researched are not coded and have to be treated as images, which causes many problems when they must be processed together with ordinary text consisting of coded characters.
Given the above analysis, it is necessary to research and develop a dedicated model to assist ancient Chinese character research.
Building a network-based model to manage and assist research work on ancient Chinese characters draws on theory and technology from many fields, including CSCW (Computer Supported Cooperative Work) [1], networking, database management systems, and image processing and retrieval. Specifically, building the model involves website construction, ancient Chinese character resource management and cooperation, integrated image-text arrangement, sorting of ancient Chinese characters according to different dictionary radical orders, ancient Chinese character image retrieval, and so on.
The theory of CSCW has been researched for many years and provides basic support for the construction of our model. Its objective is to build computer-based systems with which people can cooperate to accomplish a common task [1]. Rama and Bishop compared CSCW groupware systems, including three commercial systems and four academic systems, and designed a set of multidimensional criteria for comparing CSCW systems [2]. Penichet et al. proposed a flexible and appropriate classification method for CSCW systems based on logical principles [3]. Chen discussed the key problems of cooperative platform systems, including system hierarchy, user interface, consistency maintenance, concurrency control, access control and record management, and constructed a prototype system, RITIS (Real-time Image and Text Interactive System), composed of clients and servers in centralized and peer-to-peer structures. It supports real-time image and text interaction in the Internet environment and a multiuser WYSIWIS interface [4].
Image-text integrated arrangement organizes text and images together in one layout. To achieve this, the spatial information of images and characters must be recorded and utilized [5]-[12]. Yang and Cheng designed an XML-based scheme for image-text mixed arrangement. They used the MVC design pattern in Java to separate text from its view, minimizing the coupling among modules and improving the extensibility of the system; because Java and XML are platform independent, the resulting prototype system is flexible [5]. Lu proposed a method for storing image-text mixed documents and editing them online using the open-source server control FreeTextBox and ASPJpeg in ASP.net [6]. In [7], a question bank management system in B/S mode was put forward; using the DSOFramer container and the editing functions of Word, it supports the input, editing and composition of test papers containing image-text integrated questions. Fan discussed the image-text mixed arrangement of test papers, in which images frequently appear alongside text. He classified layouts into three types, namely non-image layouts, single-image layouts and multiple-image layouts, and used VB and SQL Server to process and manage them [8]. Zhang and Chen studied the collection, storage and extraction of test papers containing text, formulas, tables and images, using Delphi as the development tool to import and export test papers [9].
Image retrieval has also been widely studied [13]-[20]. Yang studied image clustering and its application to image retrieval. After analyzing existing image clustering features and algorithms, Yang designed an image retrieval system based on image clustering: the images in the library are first clustered with the AP algorithm and an image index is built; the sample image is then searched in the index to find its class, and the subsequent image matching is performed only within that class [13]. Xie studied image clustering and retrieval. To solve problems of traditional image clustering algorithms, an image clustering method based on MRF (Markov random field) was proposed that transforms clustering into an energy minimization process, and a local image retrieval method was designed using graph cuts [14]. Zhuang et al. proposed a novel method for retrieving Chinese calligraphic characters in which character images are matched with an approximate point correspondence algorithm: after the contour points are extracted, the approximate point correspondence is computed and the character images are matched according to their accumulated matching cost [15]. Zhuang et al. put forward a retrieval method for Chinese calligraphic manuscript images based on a probabilistic indexing structure called PMF-Tree (Probabilistic Multiple-Feature-Tree). It uses integrated features such as the contour points of character images, character styles and character types, and allows users to select any one of these features as the retrieval component [16]. Chen proposed an image retrieval method that combines a global statistical feature with a local bitmap feature. The mean and variance of the RGB values of an image are calculated as the global feature; the image is then divided into sub-areas whose binarized mean values form the local feature; finally, retrieval is run with the combination of the global and local features [17]. Kong et al. designed a semi-supervised image retrieval method. The feature points of an image are extracted with an improved Harris algorithm, the image is divided into regions of interest, and color and texture features are extracted. The semantic relation between an image and its class is then established through semi-supervised learning in the image feature space, and finally the similarity between images and class centers is computed [18].
The theory and technology for building websites that assist research work have become mature, so their details are not discussed here.
The above work laid the foundation for our research and development. In this paper, considering the requirements of ancient Chinese character research, we construct a web-based cooperation and retrieval model of character images for ancient Chinese character research. It is composed of several modules, including ancient Chinese character image management and retrieval, research work cooperation, and research conclusion and reference management. The key techniques employed in the system are discussed, such as the character image cooperation mechanism, integrated image-text arrangement, sorting of research conclusion data by Chinese character radicals, and global and local retrieval of ancient Chinese character images.
The paper is organized as follows. Section 2 outlines the overall architecture and functions of the model. Section 3 analyzes and introduces the key techniques employed in the model. Section 4 presents and analyzes the experimental results. Finally, conclusions and further work are discussed.

ARCHITECTURE OF THE COOPERATION AND RETRIEVAL MODEL
The objective of the model is to manage the cooperation among the character images to be studied, the researchers and the records of research conclusions. Meanwhile, it provides image and document retrieval services to Chinese character researchers during their research work. The architecture of the model is shown in Fig. 1.

Fig. 1. The architecture of the cooperation and retrieval model
The inputs of the model are images of single ancient Chinese characters. Ancient books are first digitized with optical sensing devices (scanners or digital cameras) to form page layout images. A page layout analysis and character image segmentation program then segments these layout images into a series of single character images, which are supplied to ancient Chinese character researchers.
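The text does not describe the segmentation program itself. As an illustration only, the sketch below uses projection profiles, a common baseline technique and not necessarily the one used in the system. It assumes a clean binarized page in traditional vertical layout (columns read right to left) whose columns and characters are separated by blank gaps; the names `ink_runs` and `segment_characters` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized image: 1 = ink, 0 = background


def ink_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals where the projection profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start = None
    for idx, value in enumerate(profile):
        if value > 0 and start is None:
            start = idx                      # a run of ink begins
        elif value == 0 and start is not None:
            spans.append((start, idx))       # a blank gap closes the run
            start = None
    if start is not None:
        spans.append((start, len(profile)))  # run touching the last position
    return spans


def segment_characters(page: Image) -> List[Image]:
    """Split a vertically written page into columns, then each column into single characters."""
    height, width = len(page), len(page[0])
    # Vertical projection: ink count per x position separates the text columns.
    col_profile = [sum(page[y][x] for y in range(height)) for x in range(width)]
    characters: List[Image] = []
    # Traditional vertical layout is read from the rightmost column to the left.
    for x0, x1 in reversed(ink_runs(col_profile)):
        # Horizontal projection inside one column separates the characters.
        row_profile = [sum(page[y][x0:x1]) for y in range(height)]
        for y0, y1 in ink_runs(row_profile):
            characters.append([row[x0:x1] for row in page[y0:y1]])
    return characters
```

Touching or broken characters defeat this simple gap rule, so real pages would need additional layout analysis.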
The output of the system is the set of research conclusion records, which include the pattern, pronunciation, meaning and other attributes of each character image.
Based on an analysis of the requirements of ancient Chinese character researchers, the design principles of the model are as follows.
Principle 1. Uniqueness principle. Each image is given a unique key code when it is stored in the character image library, and is allocated to only one researcher for study.
This avoids situations in which one character image is assigned to more than one researcher at the same time, which would result in conflicting conclusions.
Principle 2. Hierarchy principle. The users of the model are divided into levels with different authority according to their roles in the research work.
Users with different authority have different scopes of operation, which effectively prevents erroneous operations on the research data.
Principle 3. Independence principle. A researcher's conclusions about a character image can only be modified by that researcher. Other people can give suggestions to the researcher but cannot change the researcher's records.
This principle protects research conclusion data from being modified by anyone other than the original researcher of the character image.
Principle 4. Compatibility principle. Whether a character is coded or is available only as an image, the system can organize it normally in integrated image-text arrangement.
The image-text integrated arrangement problem arises not only when research data are displayed but also when research records are imported into and exported from the database. A special structure is therefore needed to handle these problems so that researchers can use the conclusion records normally.
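Principle 2 can be pictured as a role table. The following minimal sketch assumes four hypothetical user levels and operation names (none of them are given in the text) and allows an operation when the user's level is at least the level the operation requires; unknown operations are denied.

```python
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical user levels; a higher value carries more authority."""
    VISITOR = 1
    RESEARCHER = 2
    REVIEWER = 3
    ADMINISTRATOR = 4


# Hypothetical operation table: minimum role required for each operation.
REQUIRED_ROLE = {
    "view_conclusions": Role.VISITOR,
    "edit_own_conclusion": Role.RESEARCHER,
    "comment_on_conclusion": Role.REVIEWER,
    "assign_images": Role.ADMINISTRATOR,
}


def is_allowed(role: Role, operation: str) -> bool:
    """Return True if a user with `role` may perform `operation`; unknown operations are denied."""
    required = REQUIRED_ROLE.get(operation)
    return required is not None and role >= required
```

Keeping the table separate from the check means the scopes of operation can be adjusted without touching the authorization logic.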
The data flow diagram of the model is shown in Fig. 2.

Cooperation Mechanism of Research Task
According to these principles governing the relationship among researchers, images and research conclusion records, control fields must be maintained in the libraries of the different data elements.
Let ResearchState be a field in the ancient Chinese character image library, SelectID a field in the researcher library, and ExpertNumber a field in the research conclusion library. The definitions of the field values used for cooperation are shown in Table 1.

Research conclusion / ExpertNumber: primary key of the researcher; the record can only be modified by that researcher.
By setting the semaphores shown in Table 1 properly, the principles for assigning research resources are observed and the research work runs normally.
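The semaphore logic can be sketched as follows. This is a minimal in-memory illustration that assumes a hypothetical three-state encoding of ResearchState (unassigned, in progress, finished), because the actual values of Table 1 are not reproduced here; only the field names ResearchState, SelectID and ExpertNumber come from the text. `assign_next` enforces Principle 1 and `edit_conclusion` enforces Principle 3; both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical ResearchState encoding; the actual values are those defined in Table 1.
UNASSIGNED, IN_PROGRESS, FINISHED = 0, 1, 2


@dataclass
class CharacterImage:
    key: str                          # unique key code assigned on storage (Principle 1)
    ResearchState: int = UNASSIGNED


@dataclass
class Researcher:
    key: str                          # primary key of the researcher
    SelectID: Optional[str] = None    # key of the image currently being studied, if any


@dataclass
class Conclusion:
    image_key: str
    ExpertNumber: str                 # primary key of the owning researcher (Principle 3)
    text: str = ""


def assign_next(images: Dict[str, CharacterImage], researcher: Researcher) -> Optional[str]:
    """Hand the researcher exactly one unassigned image; never re-issue an image in progress."""
    if researcher.SelectID is not None:
        return researcher.SelectID    # the current image must be finished first
    # dicts keep insertion order, so the earliest stored images are handed out first
    for image in images.values():
        if image.ResearchState == UNASSIGNED:
            image.ResearchState = IN_PROGRESS
            researcher.SelectID = image.key
            return image.key
    return None                       # nothing left to assign


def finish(images: Dict[str, CharacterImage], researcher: Researcher, text: str) -> Conclusion:
    """Close the researcher's current image and create a conclusion record owned by them."""
    key = researcher.SelectID
    if key is None:
        raise ValueError("researcher has no image in progress")
    images[key].ResearchState = FINISHED
    researcher.SelectID = None
    return Conclusion(image_key=key, ExpertNumber=researcher.key, text=text)


def edit_conclusion(record: Conclusion, editor_key: str, new_text: str) -> bool:
    """Apply an edit only if the editor owns the record; others may only make suggestions."""
    if record.ExpertNumber != editor_key:
        return False
    record.text = new_text
    return True
```

In a deployed web system, the check-and-set in `assign_next` would have to run atomically (for example inside a database transaction), because two researchers may request work at the same moment.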

Image-text Integrated Arrangement
Image-text integrated arrangement mainly consists of three modules: integrated image-text display, editing, and the import and export of research data.

Image-text integrated display
In this module, a Literal control is employed to realize integrated image-text display. The coded characters and the tags holding image addresses are stored in the character string of the Literal control, whose Text property is bound with Bind("literal"). The integrated image-text display procedure is shown in Algorithm 1.

Input:
The content of the research conclusions in the database that is to be displayed.
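The text does not specify the tag format used for image addresses. The sketch below assumes a hypothetical convention in which an uncoded character is stored as `[img:path]` inside the conclusion string, and shows in Python the conversion that would produce the HTML handed to a Literal control: plain text is HTML-escaped and each tag becomes an inline `<img>` sized to the text line. The function name `to_html` and the height parameter are illustrative assumptions.

```python
import html
import re

# Hypothetical storage convention: an uncoded character is stored as [img:relative/path.png]
IMG_TAG = re.compile(r"\[img:([^\]]+)\]")


def to_html(stored: str, char_height_px: int = 20) -> str:
    """Render a stored conclusion string as inline HTML, escaping text and inlining images."""
    parts = []
    pos = 0
    for match in IMG_TAG.finditer(stored):
        parts.append(html.escape(stored[pos:match.start()]))   # coded text before the tag
        src = html.escape(match.group(1), quote=True)
        # Scale the glyph image to the line height so it sits in the text flow.
        parts.append(f'<img src="{src}" style="height:{char_height_px}px;vertical-align:middle" alt="">')
        pos = match.end()
    parts.append(html.escape(stored[pos:]))                      # trailing coded text
    return "".join(parts)
```

For example, `to_html("甲[img:c/001.png]乙")` yields the two coded characters with the image inlined between them.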

Fig. 3. Mapping operation of radical sorting of research conclusion records
Then, the serial number of record $ACC_j$ in the sorted list can be calculated as
$$N(ACC_j) = PR(RC_j) + EQ(RC_j) + 1$$
where $PR(RC_j)$ is the number of records $RC_k$ whose $RS(RC_k)$ is less than $RS(RC_j)$, and $EQ(RC_j)$ is the number of records $RC_l$ whose $RS(RC_l)$ is equal to $RS(RC_j)$ and whose $ACC_l < ACC_j$.
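The serial-number formula can be checked with a direct transcription. In this sketch, assumed for illustration, each record is an `(ACC, radical)` pair with a unique ACC, and `radical_order` is a hypothetical table giving each radical's position in one dictionary's order, standing in for RS; swapping the table reorders the records for a different dictionary.

```python
from typing import Dict, List, Tuple


def serial_numbers(records: List[Tuple[int, str]],
                   radical_order: Dict[str, int]) -> Dict[int, int]:
    """Return the 1-based serial number N(ACC_j) of every record.

    records: (ACC, radical) pairs with unique ACC values.
    radical_order: hypothetical table mapping each radical to its position in one
    dictionary's radical order; it plays the role of RS(RC) here.
    """
    rs = {acc: radical_order[radical] for acc, radical in records}
    numbers: Dict[int, int] = {}
    for acc_j in rs:
        # PR: records whose RS is strictly smaller than that of record j
        pr = sum(1 for acc_k in rs if rs[acc_k] < rs[acc_j])
        # EQ: records with the same RS that precede record j by ACC (tie-break)
        eq = sum(1 for acc_l in rs if rs[acc_l] == rs[acc_j] and acc_l < acc_j)
        numbers[acc_j] = pr + eq + 1
    return numbers
```

The quadratic counting mirrors the formula term by term; the same order is obtained in O(N log N) by sorting on the key `(RS, ACC)`, e.g. `sorted(records, key=lambda r: (radical_order[r[1]], r[0]))`.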

Ancient Chinese Character Image Retrieval
A retrieval algorithm for character images is specially designed in the model to help researchers find character images in the database that are globally or locally similar. It supports not only traditional image retrieval over the whole area or a partial area of a character image, but also a new retrieval style, called image retrieval in symmetrical areas of character images, for searching radicals in Chinese characters.
Let A be the area of an ancient Chinese character image, composed of the sub-areas $a_{ij}$:
$$A = \bigcup_{i=0}^{m-1}\bigcup_{j=0}^{n-1} a_{ij}, \qquad i = 0,1,\dots,m-1;\; j = 0,1,\dots,n-1$$
where m is the number of rows and n the number of columns of the mesh, divided according to the elastic mesh principle [19] within the character image A, as shown in Fig. 4(a). The directional line element feature [20] is extracted in each sub-area $a_{ij}$ to form the corresponding feature vector, as shown in Fig. 4(b):
$$F(A) = \{\, f_{ij} \mid i = 0,1,\dots,m-1;\; j = 0,1,\dots,n-1 \,\}$$
where $f_{ij}$ consists of four directional components. In the definition of the central area $A_C$, $\gamma_{C1}$ and $\gamma_{C2}$ are the coordinates of its vertical margins, and $\mu_{C1}$ and $\mu_{C2}$ are the coordinates of its horizontal margins. The pixel area of $A_C$ in the character image is shown in Fig. 7, while $A_W$ covers the whole area of the character image.

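The mesh division and feature extraction can be sketched as below. This is a simplified stand-in, not the exact methods of [19] and [20]: the elastic mesh places cuts so that each row band and each column band holds roughly equal ink, and each ink pixel adds to a direction of $f_{ij}$ when a neighbour along that direction is also ink (the published directional line element feature defines its elements more carefully than this neighbour test). The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binarized character image: 1 = ink, 0 = background

# Neighbour offset pairs (dy, dx) defining the four directions; y grows downward.
DIRECTIONS = [
    ((0, -1), (0, 1)),    # horizontal
    ((-1, 0), (1, 0)),    # vertical
    ((-1, 1), (1, -1)),   # 45 degrees ("/")
    ((-1, -1), (1, 1)),   # 135 degrees ("\")
]


def elastic_bounds(profile: List[int], parts: int) -> List[int]:
    """Cut positions so that each of `parts` bands holds about the same amount of ink."""
    total = sum(profile)
    if total == 0:
        # Blank image: fall back to a uniform (fixed) mesh.
        return [k * len(profile) // parts for k in range(parts + 1)]
    bounds, acc, k = [0], 0, 1
    for idx, value in enumerate(profile):
        acc += value
        while k < parts and acc >= k * total / parts:
            bounds.append(idx + 1)   # cut after the row/column where the k-th share is reached
            k += 1
    bounds.append(len(profile))
    return bounds


def is_ink(img: Image, y: int, x: int) -> bool:
    return 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] == 1


def direction_features(img: Image, m: int, n: int) -> List[List[List[int]]]:
    """Return f[i][j] = [horizontal, vertical, 45deg, 135deg] counts for each mesh cell a_ij."""
    height, width = len(img), len(img[0])
    rows = elastic_bounds([sum(r) for r in img], m)
    cols = elastic_bounds([sum(img[y][x] for y in range(height)) for x in range(width)], n)
    f = [[[0] * 4 for _ in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for y in range(rows[i], rows[i + 1]):
                for x in range(cols[j], cols[j + 1]):
                    if img[y][x] != 1:
                        continue
                    # Count direction d when the ink pixel continues along that direction.
                    for d, ((dy1, dx1), (dy2, dx2)) in enumerate(DIRECTIONS):
                        if is_ink(img, y + dy1, x + dx1) or is_ink(img, y + dy2, x + dx2):
                            f[i][j][d] += 1
    return f
```

Because the cuts follow the ink distribution, a stroke-heavy part of a character receives more, narrower cells than a sparse part, which is the intent of elastic meshing.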
Fig. 2. Data flow diagram of the model

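Fig. 4. Area division and feature extraction of character image A
To improve the efficiency of image retrieval, a hierarchical strategy is employed in which each character image A is clustered in advance into sub-clusters according to the typical areas $A_U$, $A_D$, $A_L$, $A_R$, $A_C$ and $A_W$, which are defined by the structural characteristics of ancient Chinese characters. The local areas $A_U$ and $A_D$ in the vertical direction are defined as:
$$A_U = \bigcup_{i=0}^{U_1}\bigcup_{j=0}^{n-1} a_{ij}, \qquad A_D = \bigcup_{i=D_1}^{m-1}\bigcup_{j=0}^{n-1} a_{ij}$$
where $U_1$ is the last mesh row of the upper area and $D_1$ is the first mesh row of the lower area.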

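Fig. 5. Area division and feature extraction of $A_U$ and $A_D$ in character image A
The local areas $A_L$ and $A_R$ in the horizontal direction are defined as:
$$A_L = \bigcup_{i=0}^{m-1}\bigcup_{j=0}^{L_1} a_{ij}, \qquad A_R = \bigcup_{i=0}^{m-1}\bigcup_{j=R_1}^{n-1} a_{ij}$$
where $L_1$ is the last mesh column of the left area and $R_1$ is the first mesh column of the right area.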

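Fig. 6. Area division and feature extraction of character image A
The local areas $A_C$ and $A_W$ are defined as:
$$A_C = \bigcup_{i=\gamma_{C1}}^{\gamma_{C2}}\bigcup_{j=\mu_{C1}}^{\mu_{C2}} a_{ij}, \qquad A_W = A = \bigcup_{i=0}^{m-1}\bigcup_{j=0}^{n-1} a_{ij}$$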
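Retrieval over one typical area then reduces to comparing the corresponding sub-vectors of the $f_{ij}$ grid. The sketch below assumes hypothetical area boundaries (halves of the mesh for $A_U$, $A_D$, $A_L$ and $A_R$, the middle half for $A_C$, the whole mesh for $A_W$) and a plain Euclidean distance over a linear scan; the pre-clustering step of the hierarchical strategy is omitted, and the function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[List[float]]]   # m x n mesh cells, each f_ij with 4 directional components
Span = Tuple[int, int]           # half-open index range [start, end)


def area_spans(m: int, n: int) -> Dict[str, Tuple[Span, Span]]:
    """Hypothetical (row span, column span) of each typical area on an m x n mesh."""
    return {
        "U": ((0, m // 2), (0, n)),
        "D": ((m // 2, m), (0, n)),
        "L": ((0, m), (0, n // 2)),
        "R": ((0, m), (n // 2, n)),
        "C": ((m // 4, m - m // 4), (n // 4, n - n // 4)),
        "W": ((0, m), (0, n)),
    }


def area_vector(f: Grid, rows: Span, cols: Span) -> List[float]:
    """Concatenate f_ij over the cells of one local area into a single vector."""
    return [c for i in range(*rows) for j in range(*cols) for c in f[i][j]]


def search(query: Grid, database: Dict[str, Grid], area: str,
           top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank database images by Euclidean distance between features of the chosen area."""
    m, n = len(query), len(query[0])
    rows, cols = area_spans(m, n)[area]
    q = area_vector(query, rows, cols)
    scored = [(key, math.dist(q, area_vector(grid, rows, cols)))
              for key, grid in database.items()]
    scored.sort(key=lambda item: (item[1], item[0]))   # ties broken by key for stable output
    return scored[:top_k]
```

Restricting the comparison to, say, area "L" lets two characters that share a left-hand radical match closely even when their right-hand components differ.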

Table 1. Cooperation attributes in the libraries.