New classification quality estimators for analysis of documentary information: application to patent analysis and web mapping
Abstract
In information analysis, a general problem is the evaluation of the results of data classification methods. The complexity of the studied topics, combined with the weaknesses of the most widespread objective classification quality estimators, such as inertia, may ultimately lead to relying on an expert in the studied domain for a subjective evaluation of the quality of the classification results. In this paper we propose new objective classification quality estimators for both evaluating and optimising the results of classification and mapping methods, especially when they are applied to documentary databases. We have tested our estimators in two different ways. The first consists in using them to compare the efficiency of viewpoint-oriented data analysis methods with the efficiency of global analysis methods on the same data set, a patent collection. The second consists in using them to optimise the results of an original Webometrics experiment that combines content and link classification, starting from a large non-homogeneous set of web pages.