An Attribute-Value Block Based Method of Acquiring Minimum Rule Sets: A Granulation Method to Construct a Classifier

Abstract. Decision rule acquisition is an important topic in rough set theory and is drawing increasing attention. In this paper, the decision logic language and the attribute-value block technique are introduced first. Then, realization methods of rule reduction and rule set minimization are systematically studied using the attribute-value block technique, and effective algorithms for reducing decision rules and minimizing rule sets are proposed. Together with a related attribute reduction algorithm, these algorithms constitute an effective granulation method for acquiring minimum rule sets, which form a kind of classifier and can be used for class prediction. Finally, experiments are conducted to demonstrate that the proposed methods are effective and feasible.


Introduction
Rough set theory [1], as a powerful mathematical tool for dealing with insufficient, incomplete or vague information, has been widely used in many fields. In rough set theory, the study of attribute reduction has attracted more attention than that of rule acquisition, but in recent years there have been more and more studies of decision rule acquisition. Papers [2,3] gave discernibility matrix or discernibility function based methods for acquiring decision rules. These methods can, in theory, acquire all minimum rule sets of a given decision system, but they usually incur huge time and space costs, which severely narrows their applicability in real life. In addition, paper [4] discussed the problem of producing a set of certain and possible rules from incomplete data sets based on rough sets and gave a corresponding rule learning algorithm. Paper [5] discussed optimal certain rules and optimal association rules, and proposed two quantitative measures, the random certainty factor and the random coverage factor, to explain the relationship between the condition and decision parts of a rule in incomplete decision systems. Paper [6] also discussed rule acquisition in incomplete decision contexts; it presented the notion of an approximate decision rule and proposed an approach for extracting non-redundant approximate decision rules from an incomplete decision context. However, this method is also based on the discernibility matrix and discernibility function, which makes it relatively difficult to acquire decision rules from large data sets.
Attribute-value block technique is an important tool to analyze data sets [7,8].
Actually, it is a granulation method for dealing with data. This paper uses the attribute-value block technique and other related techniques to systematically study realization methods of rule reduction and rule set minimization, and proposes effective algorithms for reducing decision rules and minimizing decision rule sets. These algorithms, together with a related attribute reduction algorithm, constitute an effective solution to the acquisition of minimum rule sets, which form a kind of classifier and can be used for class prediction. The rest of the paper is organized as follows. In Section 2, we review some basic notions related to decision systems. Section 3 introduces the concept of minimum rule sets. Section 4 gives specific algorithms for rule reduction and rule set minimization based on attribute-value blocks. In Section 5, some experiments are conducted to verify the effectiveness of the proposed methods. Section 6 concludes this paper.

Preliminaries
In this section, we review some basic notions, such as attribute-value blocks and decision rule sets, in preparation for acquiring minimum rule sets in the following sections.

Decision systems and relative reducts
A decision system (DS) can be expressed as the following 4-tuple: DS = (U, C ∪ D, V, f), where U is a finite nonempty set of objects; C and D are the condition attribute set and the decision attribute set, respectively, with C ∩ D = ∅; V_a is the value domain of attribute a; and f_a : U → V_a is an information function, which maps an object in U to a value in V_a.
For simplicity, we write (U, C ∪ D) when V and f are understood. Without loss of generality, we suppose that D is composed of only one attribute d. For any B ⊆ C, the equivalence class of an object x with respect to B is [x]_B = {y ∈ U | f_a(y) = f_a(x) for any a ∈ B}. For any subset X ⊆ U, the lower approximation B̲X and the upper approximation B̄X of X with respect to B are defined as B̲X = {x ∈ U | [x]_B ⊆ X} and B̄X = {x ∈ U | [x]_B ∩ X ≠ ∅}. Then the positive region POS_B(X), boundary region BND_B(X) and negative region NEG_B(X) of X are defined as POS_B(X) = B̲X, BND_B(X) = B̄X − B̲X, and NEG_B(X) = U − B̄X.
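The definitions above can be illustrated with a short sketch. The toy decision table and attribute names below are purely illustrative, not taken from the paper:

```python
# Minimal sketch of equivalence classes and lower/upper approximations
# over a toy decision table (attribute names a1, a2, d are made up).
U = [
    {"a1": 0, "a2": 1, "d": "yes"},
    {"a1": 0, "a2": 1, "d": "no"},
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": 1, "a2": 1, "d": "yes"},
]

def eq_class(x, B, U):
    """[x]_B: objects indistinguishable from x on attribute set B."""
    return [y for y in U if all(y[a] == x[a] for a in B)]

def lower_upper(X, B, U):
    """B-lower and B-upper approximations of a set X of objects."""
    lower = [x for x in U if all(y in X for y in eq_class(x, B, U))]
    upper = [x for x in U if any(y in X for y in eq_class(x, B, U))]
    return lower, upper

X = [x for x in U if x["d"] == "yes"]        # the target concept
low, up = lower_upper(X, ["a1", "a2"], U)
# the first two objects are indiscernible on {a1, a2} but differ on d,
# so both fall outside the lower approximation of X
```

Here the boundary region is exactly the two conflicting objects, which is what makes this toy table inconsistent on the full condition attribute set.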

Decision logic and attribute-value blocks
Decision rules are in fact related formulae in decision logic. In rough set theory, a decision logic language depends on a specific information system, and a decision system (U, C ∪ D) can be regarded as being composed of two information systems: (U, C) and (U, D). Therefore, there are two corresponding decision logic languages, and attribute-value blocks act as a bridge between them. For the sake of simplicity, let B ⊆ C ∪ D. Then a decision logic language DL(B) is defined as a system composed of the following formulae [3]: (1) (a, v) is an atomic formula, where a ∈ B and v ∈ V_a; (2) an atomic formula is a formula in DL(B); (3) if φ is a formula, then ~φ is also a formula in DL(B); (4) if both φ and ψ are formulae, then φ∨ψ, φ∧ψ, φ→ψ and φ≡ψ are all formulae in DL(B); (5) only the formulae obtained according to the above steps (1) to (4) are formulae in DL(B).
The atomic formula (a, v) is also called attribute-value pair [7]. If φ is a simple conjunction, which consists of only atomic formulae and connectives ∧, then φ is called a basic formula.
For any x ∈ U, the satisfiability relationship between x and formulae in DL(B) is defined as follows: x |= (a, v) iff f_a(x) = v; x |= ~φ iff not x |= φ; x |= φ∧ψ iff x |= φ and x |= ψ; x |= φ∨ψ iff x |= φ or x |= ψ. Let [φ] = {x ∈ U | x |= φ}, which is the set of all objects that satisfy formula φ. Obviously, formula φ consists of several attribute-value pairs joined by connectives; therefore, [φ] is called an attribute-value block, and φ is called the (attribute-value pair) formula of the block. DL(C) and DL(D) are distinct decision logic languages and have no formulae in common. However, through attribute-value blocks, an association between DL(C) and DL(D) can be established. For example, suppose φ ∈ DL(C) and ψ ∈ DL(D); obviously φ and ψ are two different formulae, but if [φ] ⊆ [ψ], we can obtain a decision rule φ→ψ. Therefore, attribute-value blocks play an important role in acquiring decision rules, especially certain rules.
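The bridge between the two languages can be sketched directly: compute [φ] and [ψ] as index sets and test the inclusion. The toy table and attribute names below are illustrative only:

```python
# Sketch: attribute-value blocks [phi] for a basic (conjunctive) formula,
# and deriving a certain rule phi -> psi when [phi] ⊆ [psi].
U = [
    {"a1": 0, "a2": 1, "d": "yes"},
    {"a1": 0, "a2": 0, "d": "no"},
    {"a1": 1, "a2": 1, "d": "yes"},
]

def block(phi, U):
    """[phi]: indices of objects satisfying every pair in phi."""
    return {i for i, x in enumerate(U) if all(x[a] == v for a, v in phi)}

phi = [("a1", 0), ("a2", 1)]       # formula over the condition attributes
psi = [("d", "yes")]               # formula over the decision attribute

rule = None
if block(phi, U) <= block(psi, U): # [phi] ⊆ [psi] yields rule phi -> psi
    rule = (phi, psi)
```

Since [φ] = {0} is contained in [ψ] = {0, 2}, the certain rule φ→ψ is obtained.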

Minimum rule sets
Suppose that φ  DL(C) and ψ  DL(D). Implication form φ→ψ is said to be a (decision) rule in decision system (U, C∪D). If both φ and ψ are basic formula, then φ→ψ is called basic decision rule. A decision rule is not necessarily useful unless it satisfies some given indices. Below we introduce these indices.
A decision rule usually has two important measuring indices, confidence and support, which are defined as con(φ→ψ) = |[φ] ∩ [ψ]| / |[φ]| and sup(φ→ψ) = |[φ] ∩ [ψ]| / |U|, where con(φ→ψ) and sup(φ→ψ) are the confidence and support of decision rule φ→ψ, respectively.
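Under these set-based definitions, both measures are one-line computations over index sets; a minimal sketch (the example blocks are arbitrary):

```python
# Confidence and support of a rule phi -> psi, with blocks given as
# index sets over U and n = |U|.
def confidence(phi_block, psi_block):
    return len(phi_block & psi_block) / len(phi_block)

def support(phi_block, psi_block, n):
    return len(phi_block & psi_block) / n

phi_block, psi_block, n = {0, 1, 2}, {1, 2, 3}, 6
c = confidence(phi_block, psi_block)   # |{1,2}| / |{0,1,2}| = 2/3
s = support(phi_block, psi_block, n)   # |{1,2}| / 6 = 1/3
```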
For decision system DS = (U, C∪D), if rule φ→ψ is true in DL(C∪D), i.e., x |= φ→ψ for any x ∈ U, then rule φ→ψ is said to be consistent in DS, denoted by |=_DS φ→ψ; if there exists at least one object x ∈ U such that x |= φ∧ψ, then rule φ→ψ is said to be satisfiable in DS. Consistency and satisfiability are the basic properties that must be satisfied by decision rules.
For object xU and decision rule r: φ→ψ, if x|=r, then it is said that rule r covers object x, and let coverage(r) = {xU | x|= r}, which is the set of all objects that are covered by rule r; for two rules, r 1 and r 2 , if coverage(r 1 )  coverage(r 2 ), then it is said that r 2 functionally covers r 1 , denoted by r 1  r 2 . Obviously, if there exist such two rules, then rule r 1 is redundant and should be deleted, or in other words, those rules that are functionally covered by other rules should be removed out from rule sets.
In addition, for a rule φ→ψ, we say that φ→ψ is reduced if [φ] ⊆ [ψ] no longer holds when any attribute-value pair is removed from φ. This is known as rule reduction, which will be introduced in the next section.
A decision rule set Γ is said to be minimal if it satisfies the following properties [3]: (1) any rule in Γ is consistent; (2) any rule in Γ is satisfiable; (3) any rule in Γ is reduced; (4) for any two rules r_1, r_2 ∈ Γ, neither r_1 ⊑ r_2 nor r_2 ⊑ r_1.
In order to obtain a minimum rule set from a given data set, three steps are required: attribute reduction, rule reduction and rule set minimization. This paper does not discuss attribute reduction methods further; instead, we propose new methods for rule reduction and rule set minimization in the next sections.

Rule reduction
Rule reduction keeps the minimal set of attribute-value pairs in a rule such that the rule remains consistent and satisfiable, by removing redundant attributes from the rule. For the convenience of discussion, we let r(x) denote the decision rule generated from object x, and introduce the following definitions and properties.
Definition 1. For x ∈ U and B ⊆ C, block(x, B) = ∩_{a∈B} [(a, f_a(x))], i.e., the set of all objects that agree with x on every attribute in B.
Property 1. For any x ∈ U and B ⊆ C, [x]_B = block(x, B).
Property 2. block(x, B) ⊆ [x]_D if and only if f_d(y) = f_d(x) for all y ∈ block(x, B).
The proof of Property 2 is also straightforward. This property shows that the problem of judging whether block(x, B) is contained in [x]_D reduces to that of judging whether f_d(y) = f_d(x) for all y ∈ block(x, B); evidently, the latter is much easier than the former. Thus, we give the following algorithm for reducing a decision rule.
Algorithm 1: an algorithm for reducing the decision rule of an object
Input: decision system DS = (U, C∪D) and object x ∈ U
Output: reduced rule r(x)
Begin
Step 1. Let B = C;
Step 2. For each attribute a ∈ B, do Steps 3 to 5;
Step 3. Compute block(x, B − {a});
Step 4. If f_d(y) = f_d(x) for all y ∈ block(x, B − {a}), then let B = B − {a};
Step 5. Continue with the next attribute in B;
Step 6. Let pairs(x, B) be the conjunction of the attribute-value pairs (a, f_a(x)) for a ∈ B;
Step 7. Let φ = pairs(x, B) and ψ = (d, f_d(x));
Step 8. Let r(x) = φ→ψ;
Step 9. Return r(x);
End.
The time-consuming step in this algorithm is computing block(x, B), which requires |U||B| comparisons. Therefore, the complexity of this algorithm is O(|U||C|^2) in the worst case. According to Algorithm 1, it is guaranteed at all times that block(x, B) ⊆ [x]_D = block(x, D), so the confidence of rule r(x) is always equal to 1.
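The greedy attribute-dropping idea of Algorithm 1 can be sketched as follows; the toy table, attribute names and scan order are illustrative assumptions, not the paper's data:

```python
# Sketch of rule reduction: greedily drop condition attributes while
# block(x, B) stays inside the decision class of x, checked via
# f_d(y) == f_d(x) for all y in the block (Property 2).
def block_of(x, B, U):
    return [y for y in U if all(y[a] == x[a] for a in B)]

def reduce_rule(x, C, d, U):
    B = list(C)
    for a in list(C):                          # try removing each attribute
        trial = [b for b in B if b != a]
        if trial and all(y[d] == x[d] for y in block_of(x, trial, U)):
            B = trial                          # a is redundant for x
    phi = [(a, x[a]) for a in B]
    psi = (d, x[d])
    return phi, psi                            # rule phi -> psi, confidence 1

U = [
    {"a1": 0, "a2": 1, "a3": 0, "d": "yes"},
    {"a1": 0, "a2": 0, "a3": 1, "d": "no"},
    {"a1": 1, "a2": 1, "a3": 1, "d": "yes"},
]
phi, psi = reduce_rule(U[0], ["a1", "a2", "a3"], "d", U)
```

For the first object, a3 = 0 alone already separates it from the "no" class, so the reduced rule keeps only that single attribute-value pair.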

Minimization of decision rule sets
Using Algorithm 1, each object in U can be used to generate a rule. This means that after rule reduction there are still |U| rules. Obviously, many of these rules are covered by other rules, and hence we need to delete the rules that are covered by others.
For decision system (U, C∪D), after using Algorithm 1 to reduce the rule of each object x ∈ U, all generated rules r(x) constitute a rule set, denoted by RS, i.e., RS = {r(x) | x ∈ U}. Obviously, |RS| = |U|. Our purpose in this section is to delete those rules that are covered by other rules, or in other words, to minimize RS such that each remaining rule is consistent, satisfiable, reduced, and not covered by any other rule.
We use the decision attribute d to partition U into t attribute-value blocks (equivalence classes): U/D = {U_{v_1}, U_{v_2}, ..., U_{v_t}}, where v_1, ..., v_t are the values of d. Accordingly, RS is partitioned into t subsets RS_{v_1}, ..., RS_{v_t}, where RS_{v_i} = {r(x) | x ∈ U_{v_i}}. Rules in different subsets cannot cover each other, so each subset can be minimized separately.
Algorithm 2: an algorithm for minimizing a rule subset
Input: rule subset RS_{v_i}, where U_{v_i} = {x_1, x_2, ..., x_q}
Output: minimized rule subset Δ
Begin
Step 1. Let Δ = ∅;
Step 2. Compute coverage(r) for each rule r ∈ RS_{v_i};
Step 3. Sort the rules of RS_{v_i} in descending order of |coverage(r)|;
Step 4. For each rule r in this order, do Steps 5 and 6;
Step 5. If there is no r′ ∈ Δ such that coverage(r) ⊆ coverage(r′), then let Δ = Δ ∪ {r};
Step 6. Continue with the next rule;
Step 7. Return Δ;
End.
In Algorithm 2, judging whether x_j ∈ coverage(r) takes at most |C| comparisons. But because all rules in RS_{v_i} have been reduced by Algorithm 1, the number of comparisons is usually much smaller than |C|. Therefore, the complexity of Algorithm 2 is O(q^2 · |C|) = O(|U_{v_i}|^2 · |C|) in the worst case.
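The core of the minimization step is the containment test on coverages; a minimal sketch, with rule identifiers and coverage sets chosen arbitrarily for illustration:

```python
# Sketch of rule-set minimization within one decision class: a rule is
# dropped when its coverage is contained in the coverage of a retained
# rule. Coverages are index sets over U.
def minimize(rules):
    """rules: list of (rule_id, coverage_set); keep maximal coverages."""
    kept = []
    for rid, cov in sorted(rules, key=lambda rc: -len(rc[1])):
        if not any(cov <= kcov for _, kcov in kept):
            kept.append((rid, cov))
    return kept

rules = [("r1", {0, 1, 2}), ("r2", {1, 2}), ("r3", {3})]
kept = minimize(rules)   # r2 is functionally covered by r1 and removed
```

Sorting by descending coverage size ensures a rule is only ever compared against rules that could possibly cover it.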

An algorithm for acquiring minimum rule sets
Using the algorithms proposed above together with a related attribute reduction algorithm, we can now give an entire algorithm for acquiring a minimum rule set from a given data set. The algorithm is described as follows.
Algorithm 3: an algorithm for acquiring a minimum rule set from a data set
Input: decision system DS = (U, C∪D)
Output: a minimum rule set, minRS
Begin
Step 1. Use an attribute reduction algorithm to find a reduct of DS, and suppose the reduct is R;
Step 2. Compute U/R, and then select one object from each equivalence class in U/R to constitute a new decision system (U′, R∪D);
Step 3. Reduce each object (rule) in (U′, R∪D) using Algorithm 1, and denote the obtained rule set by RS;
Step 4. Use the decision attribute set D to partition U′ into several decision classes, and partition RS accordingly;
Step 5. Minimize each rule subset of RS using Algorithm 2;
Step 6. Let minRS be the union of the minimized rule subsets;
Step 7. Return minRS;
End.
In Algorithm 3, three steps are used to "evaporate" redundant data: Steps 2, 3 and 5. These steps also determine the complexity of the entire algorithm. Actually, the new decision system (U′, R∪D) generated in Step 2 is completely determined by Step 1, which performs attribute reduction and has a complexity of about O(|C|^2|U|^2). The complexity of Step 3 is O(|U′|^2|C|^2) in the worst case. Step 5 can be performed in parallel over the decision classes, so it can be made more efficient in a parallel environment. Generally, after attribute reduction the size of a data set decreases greatly, i.e., |U′| << |U|. Therefore, the computation time of Algorithm 3 is mainly determined by Step 1, and thus it has a complexity of O(|C|^2|U|^2) in most cases.
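The data flow of Algorithm 3 can be sketched end to end on a toy table. The attribute-reduction step is assumed already done and is represented by a given reduct R; the table, attribute names and helper structure are illustrative assumptions:

```python
# End-to-end sketch of Algorithm 3's data flow (Steps 2-6), with the
# reduct R assumed given by a prior attribute-reduction step.
from collections import defaultdict

def block_of(x, B, U):
    return [y for y in U if all(y[a] == x[a] for a in B)]

def reduce_rule(x, B0, d, U):                  # Algorithm 1 (greedy form)
    B = list(B0)
    for a in list(B0):
        trial = [b for b in B if b != a]
        if trial and all(y[d] == x[d] for y in block_of(x, trial, U)):
            B = trial
    return tuple((a, x[a]) for a in B), (d, x[d])

def acquire_min_rules(U, R, d):
    reps = {}                                  # Step 2: one object per U/R class
    for x in U:
        reps.setdefault(tuple(x[a] for a in R), x)
    U1 = list(reps.values())
    rules = {reduce_rule(x, R, d, U1) for x in U1}   # Step 3
    by_class = defaultdict(list)               # Step 4: split by decision class
    for phi, psi in rules:
        by_class[psi].append(phi)
    min_rs = []                                # Steps 5-6: drop covered rules
    for psi, phis in by_class.items():
        covs = {phi: frozenset(i for i, y in enumerate(U1)
                               if all(y[a] == v for a, v in phi))
                for phi in phis}
        kept = []
        for phi in sorted(phis, key=lambda p: -len(covs[p])):
            if not any(covs[phi] <= covs[k] for k in kept):
                kept.append(phi)
        min_rs += [(phi, psi) for phi in kept]
    return min_rs

U = [
    {"a1": 0, "a2": 1, "d": "yes"},
    {"a1": 0, "a2": 1, "d": "yes"},            # duplicate on R, merged in Step 2
    {"a1": 1, "a2": 0, "d": "no"},
    {"a1": 1, "a2": 1, "d": "yes"},
]
min_rs = acquire_min_rules(U, ["a1", "a2"], "d")
```

On this table the four objects collapse to two reduced rules, one per decision class, each with a single attribute-value pair in its condition part.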

Experiment analysis
This section aims to verify the effectiveness of the proposed methods through experiments. Four UCI data sets (http://archive.ics.uci.edu/ml/datasets.html) are used in our experiments; they are outlined in Table 1. Missing values were replaced with the most frequently occurring value of the corresponding attribute.
We executed Algorithm 3 on the four data sets to obtain minimum rule sets. Suppose that the set of finally obtained decision rules on each data set is denoted by minRS. The indices that we are interested in and their meanings are as follows. The experimental results on the four data sets are shown in Table 2. From Table 2 it can be seen that the rule sets obtained on the four data sets all have very high evaporation ratios, and each rule in these rule sets has certain support. In particular, on average 0.0689 × 8124 ≈ 560 objects support each rule in the rule set obtained on Mushroom. This shows that these rule sets have relatively strong generalization ability. Furthermore, the running time of Algorithm 3 on each data set is short enough to be acceptable to users. In addition, Algorithm 1 guarantees at all times that block(x, B) ⊆ [x]_D = block(x, D) for all x ∈ U, so the confidence of each rule is always equal to 1; in other words, all the obtained decision rules are deterministic. These results demonstrate that Algorithm 3 is effective and has good application value.

Conclusion
Acquiring decision rules from data sets is an important task in rough set theory. This paper conducted its study in the following three aspects so as to provide an effective granulation method for acquiring minimum rule sets. Firstly, we introduced the decision logic language and the attribute-value block technique. Secondly, we used the attribute-value block technique to study how to reduce rules and minimize rule sets, and proposed effective algorithms for rule reduction and rule set minimization. Together with a related attribute reduction algorithm, the proposed granulation method constitutes an effective solution to the acquisition of minimum rule sets, which form a kind of classifier and can be used for class prediction. Thirdly, we conducted a series of experiments showing that our methods are effective and feasible.