*(Last updated: 17. April. 2020)*

This blog post is a supplement for Data Mining instruction at *Business Process Intelligence, RWTH-Aachen*.

### Concept

Association rule mining aims to discover interesting relations between variables (mostly sets of variables) in large databases. A typical example is that customers who purchase beer are also likely to buy diapers. It is important to distinguish frequent itemsets from association rules. In essence, a frequent itemset corresponds to a joint probability, e.g., $P(beer,diaper)$, while an association rule corresponds to a conditional probability, e.g., $P(diaper|beer)$. An association rule therefore captures the *relation* aspect more directly.

In fact, frequent itemsets are part of the calculation of association rules. Frequent itemsets are informally defined as itemsets with high *support*: $support(X)= \frac{N_X}{N}$, where $N$ is the number of instances and $N_X$ is the number of instances covering $X$. (You may understand it as the joint probability of the elements in $X$.)

Association rules are informally defined as relations between two sets with high *confidence*: $confidence(X \Rightarrow Y)= \frac{N_{X \cup Y}}{N_X}=\frac{support(X \cup Y)}{support(X)}$, where $N_X$ is the number of instances covering $X$ and $N_{X \cup Y}$ is the number of instances covering both $X$ and $Y$. (You may understand it as the conditional probability of $Y$ given $X$.)
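The two definitions above can be sketched in a few lines of Python. This is only a minimal illustration: the `transactions` list is a made-up toy dataset, and the helper names `support` and `confidence` are chosen here for readability, not taken from any library.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions, invented for illustration only.
transactions: List[FrozenSet[str]] = [
    frozenset({"beer", "diaper", "milk"}),
    frozenset({"beer", "diaper"}),
    frozenset({"beer", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"diaper", "milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """N_X / N: fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    covered = sum(1 for t in transactions if items <= t)
    return covered / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(X ∪ Y) / support(X), an estimate of P(Y | X).

    Assumes X occurs at least once; otherwise the division fails.
    """
    x_set, y_set = frozenset(x), frozenset(y)
    return support(x_set | y_set, transactions) / support(x_set, transactions)


# Joint view: 2 of the 5 transactions contain both beer and diaper.
print(support({"beer", "diaper"}, transactions))
# Conditional view: 2 of the 3 beer transactions also contain diaper.
print(confidence({"beer"}, {"diaper"}, transactions))
```

On this toy data the rule $beer \Rightarrow diaper$ has a higher confidence than the itemset has support, which mirrors the difference between the conditional and the joint probability.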

An association rule is considered “good” if it has high *support*, *confidence* close to 1, and *lift* greater than 1.

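*Lift* compares the confidence of a rule with the overall frequency of its consequent: $lift(X \Rightarrow Y)= \frac{confidence(X \Rightarrow Y)}{support(Y)}=\frac{support(X \cup Y)}{support(X) \cdot support(Y)}$. A *lift* of 1 means that $X$ and $Y$ occur independently of each other, a value above 1 means that they occur together more often than expected under independence, and a value below 1 means that they occur together less often.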

### Association Rule Exercise

Given the example below, let’s evaluate the association rule, $Tea \Rightarrow Coffee$.

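It has *support* of $0.15$, *confidence* of $0.75$, and *lift* of $0.83$. Although the *confidence* looks fairly high, the *support* is low and the *lift* is less than $1$, which means that tea buyers are less likely to buy coffee than customers overall. We can therefore say that this rule is not desirable.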
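The three figures can be checked with a short Python sketch. The counts below are hypothetical: they are chosen only because they reproduce the values quoted above and are not taken from the exercise data. The function name `rule_metrics` is also just an illustrative choice. Lift is computed with its standard definition, $confidence(X \Rightarrow Y)$ divided by $support(Y)$.

```python
from typing import Tuple


def rule_metrics(n: int, n_x: int, n_y: int, n_xy: int) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) of the rule X => Y from raw counts.

    n    : total number of instances
    n_x  : instances covering X
    n_y  : instances covering Y
    n_xy : instances covering both X and Y
    """
    support = n_xy / n                 # estimate of P(X, Y)
    confidence = n_xy / n_x            # estimate of P(Y | X)
    lift = confidence / (n_y / n)      # P(Y | X) / P(Y)
    return support, confidence, lift


# Hypothetical counts (an assumption): 100 customers, 20 buy tea,
# 90 buy coffee, 15 buy both.
s, c, l = rule_metrics(n=100, n_x=20, n_y=90, n_xy=15)
print(round(s, 2), round(c, 2), round(l, 2))  # 0.15 0.75 0.83
```

With these assumed counts, 90% of all customers buy coffee but only 75% of tea buyers do, which is why the lift falls below 1 even though the confidence looks high.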