UNIT-5
Association Rules and Clustering
• Association rule mining finds interesting associations and relationships among large sets of data items. This rule shows how frequently an itemset occurs in a transaction. A typical example is Market Basket Analysis.
• Market Basket Analysis is one of the key techniques used by large retailers to show associations between items. It allows retailers to identify relationships between the items that people buy together frequently.
• Given a set of transactions, we can find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
TID | Items |
1 | Bread, Milk |
2 | Bread, Diaper, Beer, Eggs |
3 | Milk, Diaper, Beer, Coke |
4 | Bread, Milk, Diaper, Beer |
5 | Bread, Milk, Diaper, Coke |
Before we start defining the rule, let us first see the basic definitions.
- Support Count (σ):– Frequency of occurrence of an itemset.
Here σ({Milk, Bread, Diaper}) = 2
- Frequent Itemset: – An itemset whose support is greater than or equal to minsup threshold.
- Association Rule: – An implication expression of the form X -> Y, where X and Y are any two disjoint itemsets.
Example: {Milk, Diaper}->{Beer}
5.1.1. Rule Evaluation Metrics:-
- Support(s):–
The number of transactions that include all items in both X and Y, as a fraction (or percentage) of the total number of transactions. It is a measure of how frequently the collection of items occurs together across all transactions.
Support(X => Y) = σ(X ∪ Y) / N, where N is the total number of transactions.
It is interpreted as the fraction of transactions that contain both X and Y.
- Confidence(c):-
It is the ratio of the number of transactions that include all items in both X and Y to the number of transactions that include all items in X.
Conf(X => Y) = Supp(X ∪ Y) / Supp(X)
It measures how often the items in Y appear in transactions that also contain X.
- Lift(l):–
The lift of the rule X => Y is the confidence of the rule divided by the expected confidence, assuming that the itemsets X and Y are independent of each other. The expected confidence is simply the support (frequency) of Y.
Lift(X => Y) = Conf(X => Y) / Supp(Y)
A lift value near 1 indicates that X and Y appear together about as often as expected; a value greater than 1 means they appear together more often than expected, and a value less than 1 means they appear together less often than expected. Greater lift values indicate a stronger association.
- Example – From the above table, consider the rule {Milk, Diaper} => {Beer}: σ({Milk, Diaper, Beer}) = 2, so Support = 2/5 = 0.4; σ({Milk, Diaper}) = 3, so Confidence = 2/3 ≈ 0.67; Supp({Beer}) = 3/5 = 0.6, so Lift = 0.67 / 0.6 ≈ 1.11. The short sketch below reproduces this calculation.
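As a rough illustration (not part of the original text), the Python sketch below computes these three metrics for the rule {Milk, Diaper} => {Beer} over the transaction table above; the function names are arbitrary.

```python
# Minimal sketch: support, confidence and lift for one rule.
# The transactions mirror the table above; helper names are illustrative.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    """Conf(X => Y) = Supp(X ∪ Y) / Supp(X)."""
    return support(X | Y) / support(X)

def lift(X, Y):
    """Lift(X => Y) = Conf(X => Y) / Supp(Y)."""
    return confidence(X, Y) / support(Y)

X, Y = {"Milk", "Diaper"}, {"Beer"}
print(support(X | Y))    # 0.4
print(confidence(X, Y))  # 0.666...
print(lift(X, Y))        # 1.111...
```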
The Association rule is very useful in analyzing datasets. The data is collected using bar-code scanners in supermarkets. Such database consists of a large number of transaction records which list all items bought by a customer on a single purchase.
So the manager could know if certain groups of items are consistently purchased together and use this data for adjusting store layouts, cross-selling, promotions based on statistics.
Key Takeaways
- Association rule mining finds interesting associations and relationships among large sets of data items. This rule shows how frequently an itemset occurs in a transaction.
- Support Count (σ):– Frequency of occurrence of an itemset.
- Frequent Itemset: – An itemset whose support is greater than or equal to minsup threshold.
- Association Rule: – An implication expression of the form X -> Y, where X and Y are any two disjoint itemsets.
- Rule Evaluation metrics:- Support, Confidence and Lift.
- If the items or attributes in an association rule reference only one dimension, then it is a single-dimensional association rule.
- For example, the rule
computer => antivirus_software [support = 2%, confidence = 60%] could be written as
buys(X, "computer") => buys(X, "antivirus_software") [support = 2%, confidence = 60%]
- Apriori is the best-known algorithm to mine association rules. It uses a breadth-first search strategy to count the support of itemsets and uses a candidate generation function which exploits the downward closure property of support.
- Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation).
- Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).
- As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the item sets.
- Apriori uses a “bottom up” approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data.
- The algorithm terminates when no further successful extensions are found.
- The purpose of the Apriori Algorithm is to find associations between different sets of data. It is sometimes referred to as “Market Basket Analysis”. Each set of data has a number of items and is called a transaction.
- The output of Apriori is sets of rules that tell us how often items are contained in sets of data. Here is an example, where each line is one transaction (a set of items); a short frequent-itemset sketch follows the table.
Alpha | Beta | Gamma |
Alpha | Beta | Theta |
Alpha | Beta | Epsilon |
Alpha | Beta | Theta |
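The following Python sketch (an illustration added here, not a reference implementation) mines frequent itemsets from the four transactions above with a level-wise, Apriori-style search; the helper names and the minimum support of 0.5 are assumptions chosen for the example.

```python
from itertools import combinations

# Minimal Apriori-style sketch: level-wise candidate generation plus support
# counting. The transactions mirror the Alpha/Beta table; min_sup is assumed.
transactions = [
    {"Alpha", "Beta", "Gamma"},
    {"Alpha", "Beta", "Theta"},
    {"Alpha", "Beta", "Epsilon"},
    {"Alpha", "Beta", "Theta"},
]
min_sup = 0.5  # an itemset must appear in at least half of the transactions

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
# Level 1: frequent 1-itemsets
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
all_frequent = list(frequent)

k = 2
while frequent:
    # Candidate generation: join frequent (k-1)-itemsets, keep size-k results
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    # Prune via downward closure: every (k-1)-subset must itself be frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    frequent = [c for c in candidates if support(c) >= min_sup]
    all_frequent.extend(frequent)
    k += 1

for itemset in all_frequent:
    print(sorted(itemset), support(itemset))
```

With these settings the sketch reports, among others, that {Alpha, Beta} occurs in every transaction (support 1.0) and {Theta} in half of them (support 0.5).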
- Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms.
- Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan).
- Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal frequent itemset S only after all 2^|S| − 1 of its proper subsets have been examined.
5.3.1. Pros of Apriori Algorithm:
- Easy to understand and implement
- Can be used on large itemsets
5.3.2. Cons of Apriori Algorithm:
- At times, it generates a large number of candidate itemsets and rules, which can become computationally expensive.
- Calculating support is also expensive because the calculation has to scan the entire database.
5.3.3. Limitations of Apriori Algorithm:
- The process can be very tedious on large databases, since repeated scans of the whole database are needed.
Key Takeaways
- Apriori is the best-known algorithm to mine association rules. It uses a breadth-first search strategy to count the support of itemsets and uses a candidate generation function which exploits the downward closure property of support.
- Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions
- Apriori uses a “bottom up” approach, where frequent subsets are extended one item at a time
- The purpose of the Apriori Algorithm is to find associations between different sets of data.
- Association rule mining finds interesting associations and relationships among large sets of data items. This rule shows how frequently an itemset occurs in a transaction. A typical example is Market Basket Analysis.
- Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Association rule generation is usually split up into two separate steps:
- A minimum support threshold is applied to find all frequent itemsets in a database.
- A minimum confidence constraint is applied to these frequent itemsets in order to form rules.
- While the second step is straightforward (a short sketch of it appears at the end of this list), the first step needs more attention.
- Finding all frequent itemsets in a database is difficult since it involves searching all possible itemsets (item combinations).
- The set of possible itemsets is the power set over I (the set of all items) and has size 2^|I| − 1 (excluding the empty set, which is not a valid itemset).
- Although the size of the power set grows exponentially in the number of items in I, efficient search is possible using the downward-closure property of support (also called anti-monotonicity), which guarantees that all subsets of a frequent itemset are also frequent, and thus no infrequent itemset can be a subset of a frequent itemset.
- Exploiting this property, efficient algorithms (e.g., Apriori and Eclat) can find all frequent itemsets.
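To make the two-step split above concrete, here is a hedged Python sketch of the second step only: forming rules from already-mined frequent itemsets by applying a minimum-confidence constraint. The support values below correspond to a few of the frequent itemsets of the transaction table in section 5.1 (assuming a minimum support of 0.4); in practice they would come from the frequent-itemset mining step, and the min_conf value is an assumption.

```python
from itertools import combinations

# Second step of association rule generation: turn frequent itemsets into rules
# X => Y that meet a minimum-confidence threshold.
supports = {
    frozenset(["Milk"]): 0.8,
    frozenset(["Diaper"]): 0.8,
    frozenset(["Beer"]): 0.6,
    frozenset(["Milk", "Diaper"]): 0.6,
    frozenset(["Milk", "Beer"]): 0.4,
    frozenset(["Diaper", "Beer"]): 0.6,
    frozenset(["Milk", "Diaper", "Beer"]): 0.4,
}
min_conf = 0.6

rules = []
for itemset, supp_xy in supports.items():
    if len(itemset) < 2:
        continue
    # Try every non-empty proper subset as the antecedent X; Y is the rest.
    for r in range(1, len(itemset)):
        for antecedent in combinations(itemset, r):
            X = frozenset(antecedent)
            Y = itemset - X
            conf = supp_xy / supports[X]  # Conf(X => Y) = Supp(X ∪ Y) / Supp(X)
            if conf >= min_conf:
                rules.append((set(X), set(Y), supp_xy, conf))

for X, Y, s, c in rules:
    print(f"{X} => {Y}  support={s:.2f}  confidence={c:.2f}")
```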
5.4.1. Other type of General Association Rules:
- Multi-Relation Association Rules: Multi-Relation Association Rules (MRAR) are association rules in which each item may have several relations. These relations indicate an indirect relationship between the entities.
- Consider the following MRAR, where the first item consists of three relations live in, nearby and humid: “Those who live in a place which is nearby a city with a humid climate type and also are younger than 20 -> their health condition is good”. Such association rules are extractable from RDBMS data or semantic web data.
- Contrast set learning is a form of associative learning. Contrast set learners use rules that differ meaningfully in their distribution across subsets.
- Weighted class learning is another form of associative learning in which weight may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results.
- High-order pattern discovery facilitates the capture of high-order (polythetic) patterns or event associations that are intrinsic to complex real-world data.
- K-optimal pattern discovery provides an alternative to the standard approach to association rule learning that requires that each pattern appear frequently in the data.
- Approximate Frequent Itemset mining is a relaxed version of Frequent Itemset mining that allows some of the items in some of the rows to be 0.
- Generalized Association Rules: based on a hierarchical taxonomy (concept hierarchy).
- Quantitative Association Rules: involve categorical and quantitative data.
- Interval Data Association Rules: e.g., partition age into 5-year-increment ranges.
- Sequential pattern mining discovers subsequences that are common to more than minsup sequences in a sequence database, where minsup is set by the user. A sequence is an ordered list of transactions.
- Subspace Clustering, a specific type of Clustering high-dimensional data, is in many variants also based on the downward-closure property for specific clustering models.
- Warmr, shipped as part of the ACE data mining suite, allows association rule learning for first-order relational rules.
Key Takeaways
- Association rule mining finds interesting associations and relationships among large sets of data items
- Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Association rule generation is usually split up into two separate steps:
- A minimum support threshold is applied to find all frequent itemsets in a database.
- A minimum confidence constraint is applied to these frequent itemsets in order to form rules.
5.5.1. Introduction: -
- Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.
- In simple words, the aim is to segregate groups with similar traits and assign them into clusters.
- Let’s understand this with an example. Suppose you are the head of a rental store and wish to understand the preferences of your customers to scale up your business.
- Is it possible for you to look at the details of each customer and devise a unique business strategy for each one of them? Definitely not.
- But what you can do is cluster all of your customers into, say, 10 groups based on their purchasing habits and use a separate strategy for the customers in each of these 10 groups. And this is what we call clustering.
5.5.2. Clustering Method: -
Broadly speaking, clustering can be divided into two subgroups:
1. Hard Clustering:-
In hard clustering, each data point either belongs to a cluster completely or does not belong to it at all. For example, in the scenario above each customer is put into exactly one of the 10 groups.
2. Soft Clustering:-
In soft clustering, instead of assigning each data point to exactly one cluster, a probability or likelihood of that data point belonging to each cluster is assigned. For example, in the scenario above each customer is assigned a probability of belonging to each of the 10 clusters of the retail store.
5.5.3. Types of clustering algorithms:-
Since the task of clustering is subjective, there are many means that can be used to achieve this goal. Every methodology follows a different set of rules for defining the ‘similarity’ among data points. In fact, more than 100 clustering algorithms are known, but only a few are used widely. Let’s look at them in detail:
- Connectivity models:-
- As the name suggests, these models are based on the notion that the data points closer in data space exhibit more similarity to each other than the data points lying farther away.
- These models can follow two approaches. In the first approach, they start with classifying all data points into separate clusters & then aggregating them as the distance decreases.
- In the second approach, all data points are classified as a single cluster and then partitioned as the distance increases. Also, the choice of distance function is subjective.
- These models are very easy to interpret but lack scalability for handling big datasets. Examples of these models are the hierarchical clustering algorithm and its variants.
- Centroid models:-
- These are iterative clustering algorithms in which the notion of similarity is derived by the closeness of a data point to the centroid of the clusters.
- K-Means clustering algorithm is a popular algorithm that falls into this category.
- In these models, the number of clusters required at the end has to be specified beforehand, which makes it important to have prior knowledge of the dataset. These models run iteratively to find a local optimum. (A short K-Means sketch appears after this list of model types.)
- Distribution models:-
- These clustering models are based on the notion of how probable it is that all data points in the cluster belong to the same distribution (for example, the normal/Gaussian distribution).
- These models often suffer from overfitting. A popular example of these models is Expectation-maximization algorithm which uses multivariate normal distributions.
- Density Models:-
- These models search the data space for regions of varying density of data points.
- They isolate the different density regions and assign the data points within a region to the same cluster. Popular examples of density models are DBSCAN and OPTICS.
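As a hedged illustration of a centroid model (point 2 above), the sketch below runs a bare-bones K-Means loop; the 2-D toy points, the choice of k = 2 and the fixed iteration count are assumptions made only for this example.

```python
import numpy as np

# Minimal K-Means sketch (a centroid model). Toy data and parameters are assumed.
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
                   [8.0, 8.0], [8.5, 7.5], [7.8, 8.2]])
k = 2
rng = np.random.default_rng(0)
centroids = points[rng.choice(len(points), size=k, replace=False)]

for _ in range(10):  # iterative relocation; 10 rounds is plenty for this toy data
    # Assignment step: attach each point to its nearest centroid
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: move each centroid to the mean of the points assigned to it
    centroids = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                          else centroids[j] for j in range(k)])

print("labels:", labels)        # e.g. [0 0 0 1 1 1] (cluster ids may be swapped)
print("centroids:", centroids)  # roughly the means of the two groups of points
```

The assignment step uses closeness to the current centroids, and the update step relocates each centroid to the mean of its assigned points; this is the iterative-relocation idea referred to in the partitioning method below.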
5.5.4. Partitioning Clustering:-
In this method, let us say that k partitions are made of the p objects of the database, where k ≤ p. Each partition represents a cluster, so k is the number of groups after the classification of objects. There are some requirements which need to be satisfied by the Partitioning Clustering Method:–
- Each object must belong to exactly one group.
- There should be no group without even a single object.
- There are some points which should be remembered about this type of Partitioning Clustering Method:
1. An initial partitioning is created once the number of partitions (say k) is given.
2. There is a technique called iterative relocation, which means an object may be moved from one group to another to improve the partitioning (this is what the update step of the K-Means sketch above does).
5.5.5. Hierarchical Clustering:-
- A Hierarchical clustering method works by grouping data into a tree of clusters. Hierarchical clustering begins by treating every data point as a separate cluster. Then, it repeatedly executes the subsequent steps:
- Identify the two clusters that are closest together, and
- Merge the two most similar (closest) clusters. We continue these steps until all the clusters are merged together.
- In Hierarchical Clustering, the aim is to produce a hierarchical series of nested clusters.
- A diagram called a Dendrogram (a tree-like diagram that records the sequences of merges or splits) graphically represents this hierarchy; it is an inverted tree that describes the order in which points are merged (bottom-up view) or clusters are split (top-down view).
- Agglomerative:-
- Initially, consider every data point as an individual cluster and at every step merge the nearest pair of clusters. (It is a bottom-up method.)
- At first, every data point is considered as an individual entity or cluster. At every iteration, clusters merge with other clusters until one cluster is formed.
- Algorithm for Agglomerative Hierarchical Clustering:
1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other clusters (compute the proximity matrix).
3. Merge the clusters which are most similar or closest to each other.
4. Recalculate the proximity matrix for the new set of clusters.
5. Repeat steps 3 and 4 until only a single cluster remains.
Let’s see the graphical representation of this algorithm using a dendrogram.
Note:-
This is just a demonstration of how the actual algorithm works; no calculation has been performed below, and all the proximities among the clusters are assumed (a small code sketch follows the worked example).
Let’s say we have six data points A, B, C, D, E, F.
Fig 1: Example of Agglomerative
• Step-1:-
Consider each alphabet as a single cluster and calculate the distance of one cluster from all the other clusters.
• Step-2:-
In the second step, comparable clusters are merged together to form a single cluster. Let’s say cluster (B) and cluster (C) are very similar to each other, so we merge them in this step; similarly with clusters (D) and (E). At last, we get the clusters
[(A), (BC), (DE), (F)]
• Step-3:-
We recalculate the proximity according to the algorithm and merge the two nearest clusters ([(DE), (F)]) together to form new clusters as [(A), (BC), (DEF)]
• Step-4:-
Repeating the same process; the clusters DEF and BC are comparable and merged together to form a new cluster. We’re now left with clusters [(A), (BCDEF)].
• Step-5:-
At last the two remaining clusters are merged together to form a single cluster [(ABCDEF)].
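Below is a small, illustrative single-linkage agglomerative sketch (not from the original text). The 1-D coordinates for A–F are invented so that the merge order roughly mirrors the walk-through above; any real dataset and distance function could be substituted.

```python
# Single-linkage agglomerative clustering sketch. Coordinates are assumed.
points = {"A": 0.0, "B": 10.0, "C": 10.5, "D": 20.0, "E": 20.6, "F": 22.0}

clusters = [frozenset([name]) for name in points]

def single_link(c1, c2):
    """Single-linkage proximity: smallest distance between any pair of members."""
    return min(abs(points[a] - points[b]) for a in c1 for b in c2)

while len(clusters) > 1:
    # Find the closest pair of clusters (proximities are recomputed each round)
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
    )
    merged = clusters[i] | clusters[j]
    print("merge", sorted(clusters[i]), "+", sorted(clusters[j]), "->", sorted(merged))
    clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
```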
- Divisive:-
- We can say that the Divisive Hierarchical clustering is precisely the opposite of the Agglomerative Hierarchical clustering.
- In Divisive Hierarchical clustering, we take into account all of the data points as a single cluster and, in every iteration, we separate out the data points that are not similar to the rest of their cluster. In the end, we are left with N clusters (where N is the number of data points).
Fig 2: Example of Divisive
Key Takeaways
- Clustering is the task of dividing the population or data points into a number of groups such that data points in the same groups are more similar to other data points in the same group than those in other groups.
- In simple words, the aim is to segregate groups with similar traits and assign them into clusters
- Broadly speaking, clustering can be divided into two subgroups:
- Hard Clustering
- Soft Clustering
- Types of clustering algorithms:-
1. Connectivity models:-
2. Centroid models:-
3. Distribution models:-
4. Density Models:-
- In Hierarchical Clustering, the aim is to produce a hierarchical series of nested clusters. A diagram called a Dendrogram graphically represents this hierarchy; it is an inverted tree that describes the order in which points are merged (bottom-up view) or clusters are split (top-down view).