An Efficient Approach for Extracting Positive and Negative Association Rules from Big Data

Abstract. Mining association rules is a significant research area in knowledge extraction. Although negative association rules have notable advantages, they are less explored than positive association rules. In this paper, we propose a new approach for mining both positive and negative rules. We define an efficient method of support counting, called reduction-access-database, with which all the frequent itemsets can be obtained in a single scan over the whole database. For the generation of interesting association rules, we introduce a new efficient technique, called reduction-rules-space, so that only half of the candidate rules have to be examined. Experiments conducted on reference databases complete our study.


Introduction and Motivations
Since Agrawal's work [1], the extraction of association rules has been one of the most popular techniques in knowledge extraction. An association rule is an implication of the form "if Condition then Result". Association rules may be used for store layout, target marketing, organizing supermarket promotions, etc. In the literature, there exist two types of association rules: positive and negative. An association rule is said to be positive when it considers the presence of variables, and negative when it considers the absence of these same variables. Although negative rules have obvious advantages [6,10], they remain less explored than positive rules. One of the major obstacles lies in their difficult extraction: this type increases the exponential costs. Besides, the current approaches [10,11,15,16,18] are limited to Apriori's data structure and the support-confidence pair. This data structure imposes repeated passes over the database, which can be costly. In addition, the support-confidence pair is questionable: (i) finding frequent itemsets is very complex in large databases and/or for a low minimum support threshold; (ii) even after pruning, the number of rules remains high and many prove uninteresting. To overcome these notable limits, we propose an efficient approach for mining positive and negative association rules using a new pair, support-MGK. We introduce a new economical technique of support counting, called reduction-access-database, based on the new data structure MatrixSupport and on generator concepts. A single pass then suffices to extract all the frequent itemsets from the whole database. For association rule generation, we introduce an efficient method, called reduction-rules-space, which partitions the rule search space, so that only half of the candidate rules need to be examined. Based on these optimizations, we also propose the ERAPN algorithm, which consumes less memory. We present the experimental evaluation conducted on databases from the literature, showing performance compared to semantically close approaches such as the RAPN algorithm [14] and Wu's algorithm [19].
The rest of this paper is organized as follows. Section 2 introduces the formal concepts. Section 3 details our approach. Section 4 summarizes our experimental results. Section 5 reviews the related work. A conclusion is given in Section 6.

Preliminary Concepts
This section describes association rules terminology (Subsection 2.1) and limits of the support-confidence pair (Subsection 2.2).

Association rules and Terminology
A transactional context is a triple B = (T, I, R), where T, I and R are finite, non-empty sets. An element of I is called an item (or attribute); a set of items is called an itemset. An element of T is called a transaction (or object), represented by a TID (Transaction IDentifier), and R is a binary relation between T and I. Thus |T| and |I| denote the total numbers of transactions and items, respectively. Table 1 below gives an example. Given X ⊆ I, the logical negation of X, denoted ¬X or X̄, corresponds to the transactions that miss at least one item of X: ¬X = {t ∈ T | ∃i ∈ X : (i, t) ∉ R}. For example, with Table 1, we have AB = {3, 5}, so ¬(AB) = {1, 2, 4, 6}. A k-itemset is an itemset of length k. We will use the Galois correspondences [12] g(X) = {t ∈ T | ∀i ∈ X, iRt} for X ⊆ I and f(S) = {i ∈ I | ∀t ∈ S, iRt} for S ⊆ T.
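To make these definitions concrete, the correspondences g and f can be sketched as follows. The toy context below is hypothetical (Table 1 is not reproduced here) but chosen to be consistent with the values quoted in the text, e.g. AB = {3, 5} and ¬(AB) = {1, 2, 4, 6}.

```python
# Hypothetical transactional context B = (T, I, R): tid -> set of items.
# Chosen to match the supports quoted in the text, not the actual Table 1.
B = {
    1: {"B", "C"},
    2: {"B", "C", "E"},
    3: {"A", "B", "C"},
    4: {"D", "E"},
    5: {"A", "B", "C"},
    6: {"A", "C", "E"},
}
T = set(B)  # set of transaction identifiers

def g(itemset):
    """Tidset of an itemset: transactions containing every item of X."""
    return {t for t, items in B.items() if itemset <= items}

def f(tidset):
    """Itemset common to every transaction of a tidset."""
    return set.intersection(*(B[t] for t in tidset)) if tidset else set()

ab = g({"A", "B"})       # transactions containing both A and B
not_ab = T - ab          # logical negation ¬(AB)
```

On this context, `g({"A","B"})` gives {3, 5} and its complement {1, 2, 4, 6}, matching the example in the text; `f` and `g` form the Galois connection used throughout the section.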

The function g is antimonotone: for all X, Y ⊆ I, if X ⊆ Y then g(Y) ⊆ g(X).
A positive rule is an implication of the form X → Y. A rule is called negative when it considers the absence of items, i.e., rules of the form X → ¬Y, ¬X → Y and ¬X → ¬Y, where X ∩ Y = ∅. X is called the premise and Y the conclusion. To determine whether an association rule is interesting, two measures are classically used, support and confidence [1]. The support of X is the proportion of transactions that contain X, defined as supp(X) = |g(X)|/|T|. Denoting by P the intuitive probability measure defined on (T, P(T)) by P(Z) = |Z|/|T| for Z ⊆ T, the support of X can be written in terms of P as supp(X) = P(X). The itemset X is said to be frequent if its support reaches a minimum support threshold, minsup ∈ [0, 1], i.e. supp(X) ≥ minsup. The support and confidence of a rule X → Y are supp(X → Y) = supp(X ∪ Y) and conf(X → Y) = supp(X ∪ Y)/supp(X), respectively. Thereafter, we will omit the union sign and sometimes write XY instead of X ∪ Y. By De Morgan's laws, we obtain, for all X, Y ⊆ I, supp(¬X) = 1 − supp(X), supp(X¬Y) = supp(X) − supp(XY) and supp(¬X¬Y) = 1 − supp(X) − supp(Y) + supp(XY).
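These definitions and the De Morgan-style identities can be checked numerically. The context below is hypothetical (it is not the paper's Table 1); the helper names are ours.

```python
from fractions import Fraction

# Hypothetical context (not the paper's actual Table 1), used only to check
# the support/confidence definitions and the De Morgan-style identities.
B = {1: {"B", "C"}, 2: {"B", "C", "E"}, 3: {"A", "B", "C"},
     4: {"D", "E"}, 5: {"A", "B", "C"}, 6: {"A", "C", "E"}}
N = len(B)

def supp(x):
    """supp(X): fraction of transactions containing every item of X."""
    return Fraction(sum(1 for items in B.values() if x <= items), N)

def supp_not(x):
    """supp(¬X): fraction of transactions missing at least one item of X."""
    return Fraction(sum(1 for items in B.values() if not (x <= items)), N)

def supp_pos_not(x, y):
    """supp(X¬Y): transactions containing X but not the whole of Y."""
    return Fraction(sum(1 for items in B.values()
                        if x <= items and not (y <= items)), N)

def conf(x, y):
    """conf(X -> Y) = supp(X ∪ Y) / supp(X)."""
    return supp(x | y) / supp(x)

X, Y = {"A"}, {"B"}
# Negative supports derive from positive ones, with no extra database pass:
assert supp_not(X) == 1 - supp(X)
assert supp_pos_not(X, Y) == supp(X) - supp(X | Y)
```

These identities are what later allow the supports of negative itemsets to be deduced from the positive ones rather than counted directly.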

Limit of support-confidence pair
Despite its notable contribution, the support-confidence pair easily selects uninteresting association rules, namely in cases of stochastic independence between two itemsets (for X, Y ⊆ I, P(Y|X) = P(Y)) or of negative dependence (P(Y|X) < P(Y)). The examples in Table 2 illustrate this. The first four columns give the characteristics of the purchases of products A and B, the last four those of the purchases of coffee and tea. We obtain supp(A ∪ B) = 0.72 and P(B|A) = 0.9. These reasonably high values lead us to believe that the persons buying A also buy B. However, the confidence is equal to the probability of the conclusion regardless of the premise (i.e. P(B|A) = P(B)): A and B are stochastically independent, and the rule A → B that seemed interesting is therefore misleading. On the other hand, we obtain supp(tea ∪ coffee) = 0.2, which suggests that tea favors coffee. However, the share of people buying coffee regardless of whether they also buy tea is higher: there is a negative dependence between tea and coffee, and the rule tea → coffee that seemed interesting is likewise misleading. This is why the support-confidence pair sometimes extracts uninteresting rules, and the use of other, more effective measures is imperative.
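Both failure modes can be verified with a few lines of arithmetic. The A/B figures come from the text (supp(A ∪ B) = 0.72, P(B|A) = 0.9, P(B) = 0.9, hence P(A) = 0.72/0.9 = 0.8); the tea/coffee contingency counts below are hypothetical, chosen only to match supp(tea ∪ coffee) = 0.2 with P(coffee|tea) < P(coffee).

```python
def lift(p_xy, p_x, p_y):
    """Lift compares P(Y|X) with P(Y): 1 means independence, <1 negative dependence."""
    return (p_xy / p_x) / p_y

# Products A and B: values quoted in the text; P(A) = 0.72 / 0.9 = 0.8.
p_ab, p_a, p_b = 0.72, 0.8, 0.9
assert abs(p_ab / p_a - p_b) < 1e-12          # confidence equals P(B): independence
assert abs(lift(p_ab, p_a, p_b) - 1.0) < 1e-12

# Tea and coffee: hypothetical counts (n = 100) matching supp = 0.2.
n, tea, coffee, both = 100, 25, 90, 20
p_tc, p_t, p_c = both / n, tea / n, coffee / n
assert p_tc >= 0.2 and lift(p_tc, p_t, p_c) < 1  # high support, negative dependence
```

In both cases support and confidence look attractive, while lift immediately reveals that the rule is uninformative or misleading.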

Mining Positive and Negative Association Rules
In this section, we introduce our approach for mining positive and negative association rules. It decomposes into two subproblems: finding the frequent itemsets, and generating the potentially valid association rules from the previously extracted frequent itemsets. The first problem is often complex (in the worst case, the search space reaches 2^|I| itemsets) and becomes dramatic when negative items are considered: with the small database of Table 1, we get 1024 different itemsets instead of 32 positive ones. The second problem is also complex (for an m-itemset, we have 5^m − 2·3^m + 1 candidate rules instead of 3^m − 2^(m+1) + 1): from Table 1, we get 2640 different rules instead of 180 classical rules. At these dimensions, it is necessary to select only a part.
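The search-space sizes quoted above can be checked directly from the two closed forms; for m = 5 items (the Table 1 case) they reproduce the figures 32 vs 1024 itemsets and 180 vs 2640 rules.

```python
def classic_rules(m):
    """Candidate positive rules over m items: 3^m - 2^(m+1) + 1."""
    return 3**m - 2**(m + 1) + 1

def pos_neg_rules(m):
    """Candidate rules when negative items are allowed: 5^m - 2*3^m + 1."""
    return 5**m - 2 * 3**m + 1

m = 5  # Table 1 has 5 items
assert 2**m == 32 and 2**(2 * m) == 1024  # positive vs signed itemset space
assert classic_rules(m) == 180
assert pos_neg_rules(m) == 2640
```

The factor-of-ten blow-up between 180 and 2640 is what motivates pruning half of the candidate space in Subsection 3.2.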
In [5,8], we initiated the solution, which is refined in Subsections 3.1 (reduction-access-database method) and 3.2 (reduction-rules-space method).

Mining frequent itemsets: Reduction-Access-Database
This method is based on two steps: finding (in a single scan) the frequent 1- and 2-itemsets, then the frequent k-itemsets (k ≥ 3). After the first step, frequent 2-itemsets are used to generate candidate 3-itemsets, and the process continues until no more candidates can be generated. Given a minsup, the goal is to find the set of frequent itemsets F = {X ⊆ I | supp(X) ≥ minsup}. As noted, mining frequent itemsets is very complex, and the worst case concerns the small itemsets (1- and 2-itemsets). To answer this, we develop a new data structure, MatrixSupport. Table 3 below describes its formalism on the small database of Table 1. It is a projection of the database B onto its attributes. The idea is to acquire data as the structure develops and store it. To each pair of attributes corresponds a cell of the matrix, to which we associate the absolute support, noted |υij|, expressing the number of times item υj appears with item υi, where i (resp. j) denotes the i-th line (resp. j-th column) of the table. This is then used to obtain the relative support |υij|/|T|. An itemset X is a generator if it is minimal (with respect to inclusion) in its equivalence class [X] = {X′ ⊆ I | γ(X′) = γ(X)}, where γ = f ∘ g denotes the closure operator. Note that the computational cost of closures is exponential. However, the following lemma exploits the monotonicity of support under set inclusion.
Lemma 1 indicates that an itemset X is a generator if it has no proper subset with the same support. For example, from Table 3, AB is minimal in this sense, so it is a generator. If a candidate is not a generator, its support is obtained by the following proposition. Proposition 1. For every non-generator X, supp(X) = min{supp(X′) | X′ ⊂ X}.
Proof. Let X be a non-generator itemset on I, and let X1 ⊂ X achieve the minimum min{supp(X′) | X′ ⊂ X}. By the antimonotonicity of support, supp(X) ≤ supp(X1). Moreover, since X is not a generator, there exists X′ ⊂ X such that supp(X′) = supp(X). By minimality of supp(X1), supp(X1) ≤ supp(X′) = supp(X). Finally, supp(X) = supp(X1) = min{supp(X′) | X′ ⊂ X}.
The support of a non-generator of size k is exactly the smallest support of its (k − 1)-subsets. For example, from Table 3, we have supp(AC) = supp(A) = 3/6, therefore AC and its superset ABC are not generators. The support of ABC is then obtained from its 2-subsets: supp(ABC) = min{supp(AB), supp(AC), supp(BC)} = min{2/6, 3/6, 4/6} = 2/6. The following property generalizes this observation. Property 1. Given X ⊆ I, if X is a generator, then every Y ⊆ X is a generator; whereas if X is not a generator, then no Z ⊇ X is a generator. Theorem 1. Any subset of a generator itemset is a generator. Any superset of a non-generator itemset is a non-generator.
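Lemma 1 and Proposition 1 can be sketched as follows. The pair supports and supp(A) are the values quoted from Table 3; supp(B) and supp(C) are hypothetical single-item values added to make the demo self-contained, and the helper names are ours.

```python
from fractions import Fraction
from itertools import combinations

# Supports: pair supports and supp(A) quoted from Table 3 in the text;
# supp(B) and supp(C) are hypothetical values for the demo.
supports = {
    frozenset("A"): Fraction(3, 6), frozenset("B"): Fraction(4, 6),
    frozenset("C"): Fraction(5, 6), frozenset("AB"): Fraction(2, 6),
    frozenset("AC"): Fraction(3, 6), frozenset("BC"): Fraction(4, 6),
}

def is_generator(x):
    """Lemma 1: X is a generator iff no proper non-empty subset has X's support."""
    return all(supports[frozenset(s)] != supports[x]
               for k in range(1, len(x))
               for s in combinations(sorted(x), k))

def derived_support(x):
    """Proposition 1: a non-generator's support is the min over its (k-1)-subsets."""
    return min(supports[frozenset(s)] for s in combinations(sorted(x), len(x) - 1))

assert is_generator(frozenset("AB"))       # supp(AB) differs from all its subsets
assert not is_generator(frozenset("AC"))   # supp(AC) = supp(A): not a generator
supports[frozenset("ABC")] = derived_support(frozenset("ABC"))
assert supports[frozenset("ABC")] == Fraction(2, 6)  # min{2/6, 3/6, 4/6}
```

Only the generator check would require a database access; the support of ABC is deduced from its subsets, exactly as reduction-access-database intends.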
Proof. Let X and Z be two itemsets on I with X ⊆ Z, and suppose X is not a generator: there exists X′ ⊂ X with supp(X′) = supp(X), i.e. γ(X′) = γ(X). Then Z′ = (Z \ X) ∪ X′ is a proper subset of Z with γ(Z′) = γ(Z), hence supp(Z′) = supp(Z), and Z is not a generator. The first claim is the contrapositive. This theorem is central for the search space of frequent itemsets: no database pass is made for a non-generator candidate; only the generators are counted from the database.

Generating Association Rules: Reduction-Rules-Space
The most common framework for association rule generation is the support-confidence pair. As already mentioned (see Subsection 2.2), although this pair prunes many associations discovered in the data, many uninteresting rules may still be produced. We therefore use the new pair support-MGK. The next paragraph introduces the measure MGK [13,17,19]. Given X, Y ⊆ I with X ∩ Y = ∅, MGK(X → Y) = (P(Y|X) − P(Y))/(1 − P(Y)) if X favors Y, and MGK(X → Y) = (P(Y|X) − P(Y))/P(Y) if X disfavors Y. (3) In equation 3, X favors (resp. disfavors) Y when P(Y|X) > P(Y) (resp. P(Y|X) < P(Y)). In our approach, an association rule is exact when its MGK equals 1; otherwise, it is a positive approximate rule. The following definition characterizes the interesting and uninteresting rules.
The values of MGK vary in [−1, 1], over two zones: an attractive zone and a repulsive zone. The first ranges from independence (P(Y|X) = P(Y)) to logical implication (P(Y|X) = 1); the second from incompatibility (P(Y|X) = 0) to independence. If MGK(X → Y) = 1, then X and Y are strongly correlated, which denotes the logical implication between X and Y; the rule X → Y is exact. Similarly, if MGK(X → Y) = −1, then X and Y are incompatible, which corresponds to the repulsion limit between X and Y. If MGK(X → Y) = 0, then X and Y are stochastically independent and the rule is uninformative. Let minsup ∈ [0, 1] and minmgk ∈ [0, 1] be two minimum thresholds of support and MGK, respectively. The rule X → Y is said to be valid according to our approach if X ∪ Y is frequent and MGK(X → Y) ≥ minmgk. The set of all valid association rules from B is denoted E_RAPN, formally: E_RAPN = {X → Y | supp(X ∪ Y) ≥ minsup and MGK(X → Y) ≥ minmgk}. (4) For the sake of comprehension, we apply this model to the example of Table 1, with minimum support 0.1 and minmgk 0.8. Since MGK(A → B) = 0 < 0.8, A and B are stochastically independent and the association rule A → B is rejected. On the other hand, MGK(tea → coffee) < 0: coffee is negatively dependent on tea. This is a situation where we should consider the negative association rules.
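The two-case measure and the support-MGK validity test of equation (4) can be sketched directly; the numeric checks reuse the A/B and tea/coffee figures from Subsection 2.2 (the tea contingency being hypothetical), and the function names are ours.

```python
def mgk(p_xy, p_x, p_y):
    """MGK(X -> Y): normalized deviation of P(Y|X) from P(Y).
    Attractive case (X favors Y) divides by 1 - P(Y); repulsive by P(Y)."""
    p_y_given_x = p_xy / p_x
    if p_y_given_x >= p_y:                      # X favors Y (attractive zone)
        return (p_y_given_x - p_y) / (1 - p_y)
    return (p_y_given_x - p_y) / p_y            # X disfavors Y (repulsive zone)

def valid(p_xy, p_x, p_y, minsup, minmgk):
    """support-MGK validity test, as in equation (4)."""
    return p_xy >= minsup and mgk(p_xy, p_x, p_y) >= minmgk

assert abs(mgk(0.72, 0.8, 0.9)) < 1e-9      # A, B independent: MGK = 0
assert mgk(0.3, 0.3, 0.6) == 1.0            # P(Y|X) = 1: exact rule
assert mgk(0.2, 0.25, 0.9) < 0              # tea disfavors coffee: repulsive zone
assert not valid(0.72, 0.8, 0.9, 0.1, 0.8)  # A -> B rejected despite high support
```

Unlike confidence, MGK is 0 at independence and negative under repulsion, so the misleading rules of Subsection 2.2 are rejected even though their supports are high.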
In the following paragraph, we present our strategies for eliminating uninteresting association rules from B. We show that only half of the candidates need to be studied, using the new technique reduction-rules-space. Indeed, we partition the search space as shown in the following Proposition 2.
Proof. Let X and Y be itemsets of I. We first prove (a): within each class, only X → Y and Y → X are to be studied. That is why our method studies only half of the candidates. The following proposition describes the independence between pairs of variables. Proposition 3. Given two itemsets X and Y of I, if (X, Y) is a stochastically independent pair, so are the pairs (X, ¬Y), (¬X, Y) and (¬X, ¬Y).
Since X and Y play symmetric roles, we have the same result for (¬X, Y), and then, replacing Y with ¬Y, for (¬X, ¬Y). Proposition 3 is ideal: no association rule can be interesting if (X, Y) is stochastically independent. We continue our analysis by studying the candidate rules of the attractive class. To do this, we introduce Proposition 4 in order to prune certain positive and negative association rules.

Proposition 4. For all X and Y
From this property, MGK is implicative, so property (3) is immediate: it derives from the implicative character of MGK. It remains to show property (4). Indeed, according to Proposition 2(2), we have MGK(¬X → ¬Y) = [P(X)P(¬Y)/(P(¬X)P(Y))] · MGK(X → Y). In Proposition 4, the properties (1), (2), (3) and (4) guarantee that if X → Y is valid, then the three other rules of its class are valid as well, because MGK(X → Y) is less than or equal to each of their MGK values. The set of valid rules of the class is thus derived from the single rule X → Y, which significantly limits the search space. The following Proposition 5 is introduced to prune certain rules of the repulsive class. By hypothesis, X disfavors Y and P(X) ≥ P(Y), hence P(¬X) ≤ P(¬Y), which implies P(¬X)P(Y) ≤ P(X)P(¬Y); the factor above is therefore at least 1, which completes the argument.

Proposition 5. For all X and
Properties (1), (2) and (3) of this Proposition 5 indicate that if X → ¬Y is valid, then the three other rules of the repulsive class are valid as well, because its MGK is less than or equal to theirs. The single rule X → ¬Y thus suffices to decide the interest of the class.

Proposition 6. For all X and
Proof. We distinguish two cases: (1) X favors Y and (2) X disfavors Y. The following proposition then characterizes the exact negative association rules according to the support-MGK pair.
Since X, Y and Z are pairwise disjoint, the support supp(X ∪ Y) decomposes accordingly. Corollary 1 is a consequence of Proposition 7.

Corollary 1. Let X and Y be two itemsets on I, for all
Proof. For all Z such that Z ⊂ X, we have supp(Z) > 0; the result then follows from Proposition 7. Proposition 8. Let X, Y, T and Z be four itemsets of I such that X favors Y, Z favors T, X ∩ Y = Z ∩ T = ∅, and X ⊂ Z ⊆ γ(X). Then P(T|Z) − P(T) = P(T|X) − P(T), since X ⊂ Z ⊆ γ(X) implies g(Z) = g(X). The following subsection summarizes these different optimizations via Algorithm 1 and Algorithm 3.

Our Algorithm
As mentioned, our approach decomposes into two steps: mining frequent itemsets (Algorithm 1) and generating the potentially valid positive and negative association rules (Algorithm 3). Algorithm 1 takes as arguments a context B and a minsup. It returns a set F of frequent itemsets, where C_k denotes the set of candidate k-itemsets and CGM_k the set of generator k-itemsets.

A join between the elements of F_{k−1} is then made (algo. 2, lines 2 to 6). Indeed, two itemsets p and q of F_{k−1} form a candidate c if, and only if, they share a common (k − 2)-itemset. For example, joining ABC and ABD gives ABCD, whereas joining ABC and CDE fails because they share no common 2-itemset. Once C_k has been established, each of its elements is examined and its support is computed in one of two ways (algo. 1, lines 9 to 13): if c is a generator, one access to the database is made to obtain its support (algo. 1, line 10); otherwise, the support is derived from its subsets without going through the database (algo. 1, line 12). The support counter is then updated (algo. 1, line 14), and only the frequent itemsets are retained in F_k (algo. 1, line 17). For the sake of comprehension, we apply Algorithm 1 to the small database B of Table 1, with minimum support 2/6, where Gen. designates a generator itemset. Results are shown in Fig. 1. After reading the dataset B, D is not frequent, its support being smaller than the threshold. On this example, our approach performs a single pass over the database, unlike the existing approaches, which make 4 passes.
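The join step above can be sketched as a prefix join over sorted tuples; the function names are ours, and the subset-based pruning that Apriori-style generation normally performs afterwards is shown separately.

```python
from itertools import combinations

def join(freq_prev):
    """Join step: two (k-1)-itemsets combine iff they share their first k-2 items."""
    cands, items = set(), sorted(freq_prev)
    for i, p in enumerate(items):
        for q in items[i + 1:]:
            if p[:-1] == q[:-1]:          # common (k-2)-prefix
                cands.add(p + (q[-1],))
    return cands

def prune(cands, freq_prev):
    """Keep only candidates whose every (k-1)-subset is frequent."""
    return {c for c in cands
            if all(s in freq_prev for s in combinations(c, len(c) - 1))}

f3 = {("A", "B", "C"), ("A", "B", "D"), ("C", "D", "E")}
c4 = join(f3)
assert c4 == {("A", "B", "C", "D")}   # ABC and ABD join; ABC and CDE do not
assert prune(c4, f3) == set()         # ACD and BCD are absent, so ABCD is pruned
```

On the text's example, ABC and ABD produce ABCD at the join step; the prune step then discards it here because its subsets ACD and BCD are not frequent in this sketch.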
Algorithm 3 embodies the different optimizations defined in Subsection 3.2 above. It takes as arguments a set F and the thresholds minsup and minmgk, and returns the set E_RAPN. It is initialized with the empty set in line 1. Next, for each itemset of F, the set A is generated (line 3). For each subset X_{k−1} of A (line 4), the algorithm proceeds in two recursive steps, the first of which generates the attractive-class rules from the single rule X → Y.

Algorithm 3 Association Rules Generation
Require: A set F of frequent itemsets, thresholds minsup and minmgk. Ensure: The set E_RAPN of valid positive and negative rules.

Example illustrating Algorithm 3. Consider the frequent itemset ABC ⊆ F (cf. Fig. 1). We study a total of 72 candidate rules; results are shown in Table 4 below.

Complexity of ERAPN algorithm
Three cost models exist: average, best and worst case. The first evaluates the average time, which proves very difficult and falls outside the scope of this work. The second estimates the minimal time, which also falls outside this scope. We focus on the last one, because we want to evaluate the cost of the most expensive operations. In what follows, we study the complexity of our ERAPN algorithm, computed for each of its two constituent steps: frequent itemset mining, and positive and negative association rule mining.

Complexity of frequent itemsets mining (Algorithm 1):
The algorithm 1 takes as input the transaction context B = (T , I, R).
for an (m − 2)-itemset, and so on. Summing these costs, the total worst-case complexity of the ERAPN algorithm is of the order of O(n2^m + |F|(5^m − 2·3^m)/2).

Experimental results
This section presents the experimental study conducted to evaluate the performance of our algorithm. The latter is implemented in R and tested on a PC with a Core i3 and 4GB of RAM running Windows. We compare the results with those of Wu and RAPN on four databases from UCI: Adult, German, Income and Iris. For each algorithm, we chose the same thresholds to avoid biasing the results. Table 6 below reports the characteristics of the datasets and the numbers of positive and negative rules obtained by varying the minimum thresholds minsup and minmgk. The first three columns indicate the data characteristics; the last fifteen columns present the different results, where the column labelled "++" corresponds to rules of type X → Y, column "-+" to ¬X → Y, column "+-" to X → ¬Y, and "--" to ¬X → ¬Y. The behaviour of the algorithms varies according to the data characteristics. The large databases are much more time-consuming, and the number of rules increases as the thresholds decrease. On the dense databases (Adult and German) with relatively low thresholds (minsup = 1% and minmgk = 60%), the number of rules (see Table 6) is 100581 for Wu and 89378 for RAPN. These figures are relatively large, due to the strong contribution of positive rules of type X → Y and negative rules of type ¬X → ¬Y (see Table 6), compared to ERAPN (28784 rules). The rules of types X → ¬Y and ¬X → Y remain reasonable for each algorithm. On the less dense databases (Income and Iris), the algorithms give a reasonable number of rules. Note that, for the Iris data, RAPN does not extract the type ¬X → Y (see Table 6) for minsup (resp. minmgk) over 3% (resp. 80%). Figure 2 below shows the response times obtained by varying minsup while keeping minmgk = 60%. They also increase when the thresholds are lowered. The execution time of ERAPN is shorter than that of Wu and RAPN: ERAPN is up to 7 times faster in the worst cases. These performances can be explained as follows. RAPN and Wu are limited to the classical data structure, which requires repeated passes over the whole database. Wu has the lowest performance: one of the main reasons lies in its pruning technique, whose interest measure lacks effective properties for frequent itemset mining, so that the search space of valid association rules may be covered exhaustively.
Our algorithm, by contrast, introduces the different optimizations described above: all frequent itemsets are obtained in a single database pass, and only half of the rule search space is traversed. In all cases, our model remains the most selective and concise.

Related Work
Association rule mining is an active topic in Big Data. The Apriori algorithm [2] is the first model dealing with this topic; however, it scans the database multiple times, as long as large frequent itemsets are generated. The AprioriTID algorithm [2] generates candidate itemsets before the database is scanned, with the help of the Apriori-Gen function: the database is scanned only once to count supports, after which candidate itemsets are scanned instead of the database. Despite their notable contributions, Apriori and AprioriTID [2] are limited to a single type of classical (or positive) association rules; negative association rules are not studied. To address this, several extensions have been proposed. Brin et al. [10] propose a model generating negative association rules by using the χ2 measure, introducing for the first time in the literature the notion of negative relationships. The chi-square statistic is used to verify the independence between two variables, to determine the nature of the relationship, and as a correlation metric. Although effective, the model suffers from memory-space problems due to the use of chi-square. In [15], the authors present an approach to mine strong negative rules: they combine positive frequent itemsets with domain knowledge, in the form of a taxonomy, to mine negative association rules. However, as mentioned in many works [3,14], their approach is hard to generalize, since it is domain dependent and requires a predefined taxonomy. Boulicaut et al. [9] present an approach using constraints to generate associations of the form X ∧ Y → Z or X ∧ ¬Y → Z with negations, using closed itemsets. Despite its notable contribution, this method is limited to rules of this form. Wu et al. [19] propose an approach for generating both positive and negative association rules. They add, on top of the support-confidence framework, two other measures, called interest and CPIR, for better pruning of the frequent itemsets and association rules, respectively. One of its key problems lies in pruning: no optimized technique is used, and the search space can be exhaustively explored due to the interest measure used. In [3], the authors propose an approach for mining positive and negative association rules that adds the correlation coefficient on top of the support-confidence framework. Nevertheless, it still faces the challenging problem of finding the frequent association rules: its search-space strategy is not optimized, which can be costly. In [16], the authors propose a new algorithm, SRM (substitution rules mining), for mining only negative associations of the type X → ¬Y; although effective, SRM is limited to this single type. In [11], the authors propose the PNAR algorithm; despite notable contributions, PNAR suffers from a high volume of results, due to the support-confidence pair being used. Wilhelmiina proposes the Kingfisher algorithm [18], based on Fisher's exact test. A notable limitation of this model lies in the computation of p-values, which imposes exhaustive passes over the whole database and leads to high computation times. Guillaume and Papon [14] propose the RAPN algorithm, based on the support-confidence pair and another measure, MG (a modified MGK [13]). Although effective, RAPN suffers a relatively high computational cost on the frequent-itemset search space.
Note that the major handicap of these works stems mainly from the computational cost of frequent itemset mining (repeated passes over the whole database) and of association rule mining (exhaustive traversal of the search space).
Recently, we proposed a new algorithm, Eomf [5], for the extraction of frequent itemsets: a single pass over the database extracts all frequent itemsets, which significantly reduces the computation costs. For association rule mining, we introduced in [7,8] a new approach for extracting positive and negative association rules using a new pair, support-MGK; as a result, only half of all candidate rules are studied, which also reduces the search space significantly. In this paper, we combine our works [5,7,8]. Improvements have been made, especially in terms of accuracy and simplicity. In [5], the traversal of the frequent-itemset space was quite heavy: the non-generator itemsets were implicitly handled twice at each calculation step, which can be costly. This gap is corrected in the current work: we introduce a new search-space strategy via a notable property (cf. Property 1) exploiting the monotonicity of the generator itemsets, which consequently reduces the cost. Accordingly, improvements have been added in Algorithm 1 (lines 4 to 16), which makes the approach robust. In [7,8], we used the parameter vc_α(r) = (1/(n − n_Y))χ²(α) to prune association rules. Nevertheless, this parameter presents a notable limit: it requires exhaustive passes over the whole database to obtain its value for each candidate rule, i.e., for an m-itemset, computable in O(2^m) on its contingency table. This parameter is also not very selective: because of its critical value, it sometimes eliminates interesting (robust) rules while retaining uninteresting ones (far from the logical implication). In this paper, we remove this limit by using a simple parameter, minmgk ∈ [0, 1], which does not require access to the database. In addition, we introduce effective properties for the search space (cf. Propositions 7 to 10).

Conclusion
In this paper, we have studied the problem of mining positive and negative association rules from Big Data. Further optimizations have been defined. Experiments conducted on reference databases, in comparison with the RAPN and Wu algorithms, have shown the performance of our approach.

Proof. Let X and Y be itemsets of I. According to Proposition 2(2), X disfavors Y ⇔ X favors ¬Y ⇔ ¬Y favors X ⇔ Y favors ¬X ⇔ ¬X favors Y. Thus, due to the implicative character of MGK, properties (1) and (2) are immediate. It remains to show (3). As X favors Y, we get MGK(X → Y) = (P(Y|X) − P(Y))/(1 − P(Y)) = [P(¬X)P(Y)/(P(X)P(¬Y))] · (P(¬Y|¬X) − P(¬Y))/(1 − P(¬Y)).

Eomf-Gen returns C_k (algo. 1, line 5). It takes as argument F_{k−1} and returns a superset C_k of candidates; C_k is initialized to the empty set in line 1 of Algorithm 2. For the running example, there are |E_RAPN(ABC)| = 72 candidate rules, of which 12 are positive and 60 negative. We start with the positive rules. There are 6 possible rules: A → BC, B → AC, C → AB, AB → C, AC → B and BC → A. Since ABC is frequent, its subsets A, B and C are also frequent, which gives the other candidates A → B, A → C, B → C, B → A, C → A and C → B. We first study A → B, A → C and B → C. Given minsup = 0.1 and minmgk = 0.6, results are shown in Table 4 below.

Table 1 :
Example of the transactional context B

Table 2 :
Limit of the couple support-confidence

Table 3 :
Formalism of the MatrixSupport in dataset B

The MatrixSupport of B (Table 3) is built in line 1. Next, F1 and F2 are generated in a single pass (algo. 1, lines 2 and 3). The Eomf-Gen function (Algorithm 2) is called to generate the candidates C_k.
12: supp(c) = min{supp(c′) | c′ ⊂ c};
13: F_k ← {c ∈ C_k | supp(c) ≥ minsup}; // generate the frequent itemsets
18: end for
19: return F = ∪_k F_k
Since MGK(C → B) = 0.2 < 0.6, the rules B → C, B → A, C → A and C → B are not valid. So, by Propositions 7 and 8, A → BC, BC → A, B → AC and C → AB are also invalid.

Table 5 :
Potential valid association rules according to support-MGK

Let n = |T| and m = |I|. The worst case occurs when all candidates are generators (i.e., 2^|I| of them). The time complexity of support counting for the 1- and 2-itemsets is O(m × n) (line 1). The instructions of lines 2-3 are in O(2). The cost of finding the longest frequent itemsets (i.e., all itemsets of size ≥ 3) (lines 4-16) is the sum of the following costs. Eomf-Gen: there are 2^m − m − 1 candidates to generate, so the cost of this procedure is O(2^m − m) (lines 2-13 of Algorithm 2). The cost of support counting for the longest candidates is O(n(2^m − m)) (lines 6-16). The time complexity of filtering the frequent-itemset space is O(2^m − m) (line 17). The global complexity of Algorithm 1 is therefore O(mn + 2^m − m + n(2^m − m) + 2^m − m) = O(n2^m). Algorithm 3 takes as input a set of frequent itemsets F, obtained from a context B. Its global complexity is linear in |F|: it takes O(2^{-1}|F|(5^m − 2·3^m)). This complexity is obtained as follows. The "for" loop (line 2), which runs through all the itemsets of F, is done in O(|F|) at worst. The second "for" loop (line 4) is in O(|A|/2) at worst, because only half of the candidate rules are traversed in our approach to test their eligibility (instructions 6 to 16). It carries out two identical tests (lines 8 and 13); for each test, the number of possible rules generated from an m-itemset is 2^{2m} − 2^{m+1}, which gives the stated bound.

Table 6 :
Characteristics of the datasets and extracted results