Nishant Srivastava, Stuti, Kanika Gupta
Department of Computer Science & Engineering
Jaypee Institute of Information Technology
A-10, Sector-62, Noida – 201309

Under the Supervision of – Dr. Niyati Baliyan
Department of Computer Science & Engineering
Jaypee Institute of Information Technology
A-10, Sector-62, Noida – 201309

Abstract: Data mining can be defined as an activity that extracts new, nontrivial information contained in large databases. Traditional data mining techniques have focused largely on detecting statistical correlations between items that occur frequently in transaction databases.
Also termed frequent item set mining, these techniques were based on the rationale that item sets which appear more frequently must be of greater importance to the user from a business perspective. In this project, we attempt to develop an algorithm based on an emerging area called Utility Mining, which considers not only the frequency of item sets but also the utility associated with them. The term utility refers to the importance or usefulness of an item set's appearance in transactions, quantified in terms such as profit, sales, or any other user preference. In High Utility Item set Mining, the objective is to identify item sets whose utility values exceed a given utility threshold. A limitation of support-based (frequent item set) mining is that the most useful item sets may be ignored because of their low support, which in turn affects the analysis of consumer behavior patterns.
Hence, we aim to devise an ensemble algorithm that tackles this issue and advances research in this area.

Keywords: market basket analysis; data mining; utility mining; customer relationship management

Introduction

The application of data mining techniques to Customer Relationship Management has become widely recognized as an important business approach. Many organizations have collected a wealth of data about their current customers, potential customers, and suppliers. Data mining uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information, and subsequently gain knowledge, from large databases, enabling organizations to make business decisions that facilitate Customer Identification, Customer Attraction, Customer Retention, and Customer Development.

Market basket analysis (also known as association rule mining) is a method of discovering customer purchasing patterns by extracting associations or co-occurrences from stores' transactional databases.
Discovering, for example, that supermarket customers are likely to purchase milk, bread, and cheese together can help managers design store layouts, websites, product mixes and bundles, and other marketing strategies. The Apriori algorithm [2] is by far the best-known algorithm for mining, from a transactional database, the association rules that satisfy user-specified minimum support and confidence levels. The methodology was introduced by Agrawal et al. [1] and can be stated as follows. Given two non-overlapping subsets of product items, X and Y, an association rule of the form X → Y indicates the purchase pattern that a customer who purchases X also purchases Y. Two measures, support and confidence, are commonly used to select association rules.
Support is a measure of how often the transactional records in the database contain both X and Y, and confidence is a measure of the accuracy of the rule, defined as the ratio of the number of transactional records containing both X and Y to the number of transactional records containing X.

However, the practical usefulness of frequent item set mining is limited by the significance of the discovered item sets. While the mining literature has focused almost exclusively on frequent item sets, in many practical situations rare ones are of higher interest. During the mining process, therefore, we should not be biased toward identifying either frequent or rare item sets; instead, our aim should be to identify item sets that have comparatively high utility in the database. An emerging area, Utility Mining, considers not only the frequency of item sets but also the utility associated with them, by taking metrics such as profit and sales into consideration. In utility-based mining, the term utility refers to the quantitative representation of user preference, i.e., the utility value of an item set measures the importance of that item set from the user's perspective. For example, a sales analyst in retail research who needs to find out which item sets earn the maximum sales revenue for the stores will define the utility of an item as the monetary profit that the store earns by selling each unit of that item.
This approach aims to overcome the limitations of existing frequent item set mining techniques.

Problem definition

The goal is to develop an ensemble algorithm that effectively finds item sets of high utility to the user while also implicitly using the minimum support and minimum confidence metrics of traditional association rule mining (ARM) algorithms. Such an algorithm would overcome the shortcomings of both traditional frequent item set algorithms and the HUIM (High Utility Item set Mining) method, yielding a pruned and useful set of high utility item sets for effective Market Basket Analysis.

Related Work

In this section, we review articles on utility mining and association rule mining that prove useful in Market Basket Analysis. Association rule mining was first introduced by Agrawal et al. [1]. Agrawal and Srikant later proposed the Apriori algorithm [2], which generates significant association rules from data repositories. Apriori is the most widely accepted algorithm for mining frequent item sets.
It is a multi-pass algorithm that scans the database several times. Many more efficient methods for extracting useful rules from databases have since been proposed. Han et al. [6] proposed a novel way of mining frequent patterns, and hence association rules, without generating candidate sets, by introducing a data structure called the FP-tree together with the FP-growth method. In real-life applications, an item can be valued because of its importance or utility.
Bhattacharya and Dubey [11] proposed a high utility mining algorithm based on the profit gained from an item set, which must satisfy a minimum threshold and thus have a minimum support. Reddy et al. [13] proposed an improved UP-Growth high utility mining algorithm. Jabbar et al. proposed a novel approach to utility-frequent item set mining.
Their method mines frequent item sets by giving importance to item quantity, significance weightage, utility, and user-defined support [16]. Traditional market basket analysis fails to discover important purchasing patterns in a multi-store environment because of an implicit assumption that the products under consideration are on the shelf at all times across all stores; Chen et al. proposed an Apriori-like algorithm for automatically extracting association rules in such a multi-store environment [8].
Liu et al. [5] proposed a novel algorithm for mining association rules with multiple minimum supports to tackle the rare item problem. Srikant et al. [4] proposed three integrated algorithms for mining association rules with item constraints. Apart from these data mining algorithms, a number of nature-inspired algorithms have also been proposed to advance research in Market Basket Analysis.
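The support and confidence measures, together with the level-wise, multi-pass search that Apriori performs, can be illustrated with a short sketch. The Python below is a minimal, unoptimized illustration written for this review, not code from any of the cited works; the function names `apriori` and `rules` and the toy baskets are hypothetical. Each pass counts candidate k-item sets in one scan of the transactions, candidates are formed only from frequent (k-1)-item sets, and a rule X → Y is kept when support(X ∪ Y) / support(X) reaches the minimum confidence.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise (multi-pass) search for frequent item sets.

    Pass k makes one scan over the transactions to count candidate
    k-item sets; candidates are built only from frequent (k-1)-item sets.
    Returns a dict mapping each frequent item set to its support.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        # One scan of the data per pass: count every candidate.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {c for c, cnt in counts.items() if cnt / n >= min_support}
        for c in current:
            frequent[c] = counts[c] / n
        k += 1
        # Join step: unions of two frequent (k-1)-sets with exactly k items.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in joined
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Rules X -> Y with confidence = support(X ∪ Y) / support(X)."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent set is frequent, so lhs is present.
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), sup, conf))
    return found


# Toy baskets echoing the milk/bread/cheese example (invented data).
baskets = [
    {"milk", "bread", "cheese"},
    {"milk", "bread"},
    {"bread", "cheese"},
    {"milk", "bread", "cheese", "eggs"},
    {"eggs"},
]

frequent = apriori(baskets, min_support=0.5)
for lhs, rhs, sup, conf in rules(frequent, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

With a minimum support of 0.5, {milk, cheese} (support 0.4) is infrequent, so the candidate {milk, bread, cheese} is pruned without ever being counted; this downward-closure pruning is what limits the number of candidates Apriori must count in each pass. FP-growth [6] avoids candidate generation altogether, which this sketch does not attempt.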
Literature Review

Research Gaps

Although frequent item set mining has been heavily researched ever since its inception, traditional association rule mining is still the most cited and most prevalent approach in market basket analysis. Though some research has been done in the field of utility item set mining, there is no benchmark algorithm that can serve as a base for further research in this avenue. Claims of algorithmic efficiency stand on thin ground, as measured performance varies widely with the size of the dataset and the specifications of the machine used to run the experiments. A large proportion of the research takes association rule mining as the standard algorithm and provides methods to optimize its output in terms of the number of interesting association rules generated. The use of neural networks in market basket analysis is an emerging area, and little work has been done to incorporate them. Lastly, even after a decade of research, frequent item set algorithms continue to dominate market basket analysis, limiting their usefulness in the business world.
Conclusion

Market basket analysis is of immense use in discovering customer purchasing patterns by extracting associations or co-occurrences from stores' transactional databases. It was popularized through the Apriori algorithm, the best-known algorithm for mining, from a transactional database, the association rules that satisfy user-specified minimum support and confidence levels. Apriori, however, produces a large number of association rules, not all of which are useful to the analyst; hence, efforts were made to extract useful rules from databases, and many efficient methods have been proposed.
Utility Mining then emerged as a new area of research that considers not only the frequency of item sets but also the utility associated with them, by taking metrics such as profit and sales into consideration. Even after a decade of research, market basket analysis continues to be dominated by algorithms that focus on statistical aspects rather than the semantic value of the generated rules, limiting their usefulness in the business world. An algorithm that generates rules with high utility while integrating the traditional concepts of minimum support and minimum confidence can open new avenues in Market Basket Analysis and its applications.
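The combination argued for above can be illustrated with a brute-force sketch, assuming that the utility of an item set in a transaction is the sum of purchased quantity times unit profit over its items, and that its total utility is summed over the transactions containing the whole item set. The Python below is a hypothetical illustration, not the ensemble algorithm this project sets out to devise: the profit table, the transactions, and the function `mine` are invented for the example. It enumerates every item set, which is exponential in the number of items, whereas dedicated high utility mining algorithms prune the search space. An item set is kept only if it meets both a minimum utility and a minimum support, and rules derived from the kept item sets must also meet a minimum confidence.

```python
from itertools import combinations

# Hypothetical unit profits: the utility of selling one unit of each item.
UNIT_PROFIT = {"milk": 1, "bread": 1, "cheese": 5, "caviar": 40}

# Hypothetical transactions, each mapping item -> quantity purchased.
TRANSACTIONS = [
    {"milk": 2, "bread": 1},
    {"milk": 1, "bread": 2, "cheese": 1},
    {"bread": 1, "caviar": 1},
    {"milk": 3, "bread": 1},
    {"cheese": 2, "caviar": 1},
    {"cheese": 1, "caviar": 1},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset.issubset(t)) / len(transactions)


def utility(itemset, transactions, profit):
    """Total of quantity * unit profit over the items of `itemset`,
    summed over the transactions that contain the whole item set."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in transactions
        if itemset.issubset(t)
    )


def mine(transactions, profit, min_utility, min_support, min_confidence):
    """Keep item sets meeting BOTH minimum utility and minimum support,
    then derive rules X -> Y from them that meet minimum confidence.
    Brute force: enumerates all item sets, so only for tiny examples."""
    items = sorted({i for t in transactions for i in t})
    kept = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, transactions)
            u = utility(s, transactions, profit)
            if sup > 0 and sup >= min_support and u >= min_utility:
                kept[s] = (u, sup)
    found = []
    for s, (u, sup) in kept.items():
        for r in range(1, len(s)):
            for lhs in map(frozenset, combinations(sorted(s), r)):
                conf = sup / support(lhs, transactions)  # support(lhs) >= sup > 0
                if conf >= min_confidence:
                    found.append((set(lhs), set(s - lhs), u, conf))
    return kept, found


kept, found = mine(TRANSACTIONS, UNIT_PROFIT,
                   min_utility=50, min_support=0.3, min_confidence=0.6)
for s, (u, sup) in kept.items():
    print(sorted(s), f"utility={u}", f"support={sup:.2f}")
for lhs, rhs, u, conf in found:
    print(sorted(lhs), "->", sorted(rhs), f"utility={u}", f"confidence={conf:.2f}")
```

On this toy data the filter keeps {caviar} (utility 120, support 0.50) and {caviar, cheese} (utility 95, support 0.33). A support-only threshold of 0.5 would discard {caviar, cheese} while retaining the low-value {bread, milk} (support 0.50, utility 10), which illustrates the gap between frequency and utility discussed above.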
References

[1] Agrawal, R., Imieliński, T., & Swami, A. (1993, June). Mining association rules between sets of items in large databases. In ACM SIGMOD Record (Vol. 22, No. 2, pp. 207-216). ACM.
[2] Agrawal, R., & Srikant, R. (1994, September). Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB (Vol. 1215, pp. 487-499).
[3] Agrawal, R., & Srikant, R. (1995, March). Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering (pp. 3-14). IEEE.
[4] Srikant, R., Vu, Q., & Agrawal, R. (1997, August). Mining association rules with item constraints. In KDD (Vol. 97, pp. 67-73).
[5] Liu, B., Hsu, W., & Ma, Y. (1999). Mining association rules with multiple minimum supports. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
[6] Han, J., Pei, J., & Yin, Y. (2000, May). Mining frequent patterns without candidate generation. In ACM SIGMOD Record (Vol. 29, No. 2, pp. 1-12). ACM.
[7] Hipp, J., Güntzer, U., & Nakhaeizadeh, G. (2000). Algorithms for association rule mining: a general survey and comparison. ACM SIGKDD Explorations Newsletter, 2(1), 58-64.
[8] Chen, Y. L., Tang, K., Shen, R. J., & Hu, Y. H. (2005). Market basket analysis in a multiple store environment. Decision Support Systems, 40(2), 339-354.
[9] Ngai, E. W., Xiu, L., & Chau, D. C. (2009). Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, 36(2), 2592-2602.
[10] Liao, C. W., Perng, Y. H., & Chiang, T. L. (2009). Discovery of unapparent association rules based on extracted probability. Decision Support Systems, 47(4), 354-363.
[11] Bhattacharya, S., & Dubey, D. (2012). High utility itemset mining. International Journal of Emerging Technology and Advanced Engineering, 2(8), 476-481.
[12] Raorane, A. A., Kulkarni, R. V., & Jitkar, B. D. (2012). Association rule–extracting knowledge using market basket analysis. Research Journal of Recent Sciences.
[13] Reddy, B. A., Rao, O. S., & Prasad, M. H. M. (2012). An improved UP-growth high utility itemset mining. arXiv preprint arXiv:1212.0317.
[14] Lee, D., Park, S. H., & Moon, S. (2013). Utility-based association rule mining: A marketing solution for cross-selling. Expert Systems with Applications, 40(7), 2715-2725.
[15] Mostafa, M. M. (2015). Knowledge discovery of hidden consumer purchase behaviour: a market basket analysis. International Journal of Data Analysis Techniques and Strategies, 7(4), 384-405.
[16] Jabbar, M. A., Deekshatulu, B. L., & Chandra, P. (2016). A novel algorithm for utility-frequent itemset mining in market basket analysis. In Innovations in Bio-Inspired Computing and Applications (pp. 337-345). Springer, Cham.