Frequent pattern mining is a popular problem in data mining, which consists in finding frequent patterns in transaction databases. Let me describe first the problem of frequent itemset mining. Consider the following database. It is a transaction database.
A transaction database is a database containing a set of transactions made by customers. The goal of frequent itemset mining is to find frequent itemsets.
The problem of frequent itemset mining is popular. An important limitation is that purchase quantities are not taken into account.
Thus, an item may only appear once or zero time in a transaction.
Thus, if a customer has bought five breads, ten breads or twenty breads, it is viewed as the same. A second important limitation is that all items are viewed as having the same importance, utility of weight.
Thus, frequent pattern mining may find many frequent patterns that are not interesting. For example, consider the following transaction database.
Consider transaction T3. Now look at the table on the right. This table indicates the unit profit of each item. The problem of high-utility itemset mining is to find the itemsets group of items that generate a high profit in a database, when they are sold together.
The result of a high utility itemset mining algorithm would be the following. In general, the utility of an itemset in a transaction is the quantity of each item from the itemset multiplied by their unit profit.
For example, consider the figure below. The problem of high-utility itemset mining is quite interesting for the following reasons. Second, from a research perspective, the problem of high-utility itemset mining is more challenging. In high-utility itemset mining there is no such property.
Thus given an itemset, the utility of its supersets may be higher, lower or the same. Also, you may have a look at the book of high utility pattern mining There exists several algorithms for high-utility itemset mining that have been proposed over the year. To our knowledge, FHM is one of the fastest algorithm for this problem. It was shown to be up to six times faster than HUI-Miner, which was shown to be up to times faster than UPGrowth, which was shown to be up to times faster than Two-Phase.
Note that there also exists several variations of the problem of high-utility itemset mining, which I will not cover in this blog post. Hope that you enjoyed it. Thank you for your effort and time to write this post which provides a clear and informative introduction of HUIM. It is really useful for people who initially do research on this problem. Recently, many variations of frequent itemsets have been proposed such as high-utility itemsets, erasable itemsets, top-rank-k frequen itemsets, and so on.
Glad you enjoy this blog post. I think that it depends. I had implemented one algorithm for SPMF and it is basically the same thing as Apriori, whithout any challenge.
I did not read the recent paper on that topic, but at least the one that I read was straigthforward. In my opinion, high utility mining is probably the best topic among those.
It offers some new challenges because we need to use upper-bounds to solve the problem. Of course, it is still possible to do some incremental work on this topic which are not important. But there is also potential to create something new. Another possibility is to create some new problems related to pattern mining and show that they are useful and have some real applications it would be better. So I would say, it depends more on what you can do on the topic than the topic itself and some topics are more challenging.
Going for a more challenging topic is always better, because it raises challenges and at the same time force to think about new solutions.
Thank you for sharing your opinion.
Recently, I am working on the problem of frequent subgraph mining. It would be great if you post an overview of this topic.
Subgraph mining looks like a good topic. Graph are a more complicated structure than transactions so there should be more challenges than something more simple like itemset mining. Also, there are less work on subgraph mining in my opinion than itemset mining, so many problems have not been solved for graphs. Are you in that city? How complex depends on what you want to do in these research area and how difficult depends also on your background.
I would say that you should rather choose a topic that you like. Actually I inspired by your research in high utility itemset mining. Hi, There is a lot of possible problems on this topics.
You can actually combine high utility itemset mining with any other topic from pattern mining to create a new topic.
A blog by Philippe Fournier-Viger about data mining, data science, machine learning, big data…
For example, authors have previously combined high utility itemset mining with top-k pattern mining to do top-k high utility pattern mining. This is an example. But you can combine any topic with high utility pattern mining to create a new topic. It would be a little bit long to explain.
An Introduction to High-Utility Itemset Mining
The main idea is that unlike the support, the utility is not anti-monotonic. Thus, the supersets of an itemset may have a utility that is the same, lower or higher. For this reason, the utility cannot be directly used to prune the search space. To avoid this problems, high utility itemset mining algorithms introduces some upper-bounds. The TWU is a measure that is always equal or greater than the utility. This is why it is called an upper-bound.
Why is it interesting? Because the TWU is anti-monotonic. Thus it can be used to prune the search space. Besides, the TWU, better upper-bounds have been introduces. Thank u so much for your reply sir, which helps me a lot. Can you give details about it. I need a specific problem related to it.
Plz help me Sir.
To find topics that are combinations, you would need to search for some papers on pattern mining such as itemset mining and see what has not been done in high utility mining. Thank you for such an informative blog. I am doing my research in utility mining. Your blog and your SPMF is really helpful. Thanks for your wonderful contribution.
I also request you to post more blogs related to negative utility, on shelf utility, incremental utility etc. Sir, Can you list any other tools to calculate high utility itemset. Hi, Thanks for reading the blog and commenting.
Glad you like it and you are using SPMF. I will try to write more about high utility pattern mining in the future. As you have seen, I have worked quite a lot on this topic in the recent years, so yes, I could write more on this topic. I will remember that request. Dear authors, anyone able to give me suggestion for finding complexity of algorithms used in utility mining, I.
If there is no complexity analysis provided in the paper, you may do your own analysis of the complexity of these algorithms.
MINING HIGH UTILITY ITEM SETS IN TRANSACTIONAL DATABASE
TwoPhase is based on Apriori. You may get a discussion of the complexity of Apriori here:.
The EFIM algorithm which is the fastest high-utility itemset mining algorithm, do most of its operations in linear time for each pattern in the search space. Thus its complexity is roughly linear with the number of patterns in the search space.
Mining high utility item sets pdf files
I searched many IEEE paper regarding high utility mining. I think that you could start from the source code of high utility mining algorithms in SPMF. You will then save a lot of time because many algorithms are already implemented and you also have the datasets for doing experiments on the website.
Some topics ideas could be to modify the algorithm to add some novel ideas. There are a lot of possibilities. I have just listed a few examples. Can u please help me regarding this.