Mining Sets of Patterns

A tutorial at ECMLPKDD’10, 20 September 2010, Barcelona
by Björn Bringmann, Siegfried Nijssen, Nikolaj Tatti, Jilles Vreeken and Albrecht Zimmermann

Reference Overview

This document provides an overview of the methods and approaches we cover in our tutorial. Please find the slides for the presentation, as well as the most recent version of this document, at: http://www.cs.kuleuven.be/conference/msop/

1 Introduction: Mining (Sets of) Patterns

1.1 Pattern Mining
Definitions of patterns (Hand et al, 2002)
Apriori (Agrawal et al, 1993), (Agrawal and Srikant, 1994), (Agrawal et al, 1996)
FP-Growth (Han et al, 2004), (Han and Pei, 2000)
Frequent Itemset Mining implementation and data repository (Goethals and Zaki, 2003)
Perspectives (Han et al, 2007), (Zaki and Orihara, 1998)
Measures for association rules (Omiecinski, 2003), (Wu et al, 2007), (Tan et al, 2004), (Silverstein et al, 1998)

1.2 Constraint-based Mining
(Anti-)monotonic constraints (Mannila and Toivonen, 1996; Han et al, 1999; Pei et al, 2001; Bucila et al, 2003; Bonchi and Lucchese, 2007; De Raedt et al, 2008; Bonchi et al, 2009)
Hierarchies of items (generalized itemsets) (Srikant and Agrawal, 1995)
Unexpected itemsets (Jaroszewicz and Simovici, 2004; Jaroszewicz and Scheffer, 2005; Sun et al, 2009)

1.3 Condensed Representations
Multiple uses of frequent sets and condensed representations (Mannila and Toivonen, 1996)
Closed Frequent Itemsets (Pasquier et al, 1999)
Maximal Frequent Itemsets (Bayardo, 1998)
Non-Derivable Itemsets (Calders and Goethals, 2002)

1.4 Supervised Pattern Mining
Emerging patterns (Dong and Li, 1999, 2005)
Subgroups (Klösgen, 1996; Wrobel, 1997; Kavsek et al, 2003; Grosskreutz et al, 2008)
Contrast sets (Bay and Pazzani, 2001)
Correlating patterns (Morishita and Sese, 2000)
Discriminative patterns (Cheng et al, 2007)
(Interesting) rules (Bayardo Jr. and Agrawal, 1999; Morimoto et al, 1998; Webb, 1995, 2005)
Studies of relationships between approaches (Novak et al, 2009; Nijssen et al, 2009; Nijssen and Kok, 2005)
Supervised sequence patterns (Bringmann et al, 2006; Hirao et al, 2003)
Supervised tree patterns (Zimmermann and Bringmann, 2005; Hashimoto et al, 2008)
Supervised graph patterns (Bringmann et al, 2006; Geamsakul et al, 2003; Yan et al, 2008; Nowozin et al, 2007a)
Supervised patterns on itemsets, with FP trees (Han et al, 2004; Cheng et al, 2008b; Atzmüller and Puppe, 2006)
Supervised patterns on itemsets, with BDDs (Loekito and Bailey, 2006)
Supervised patterns on itemsets, with CP (Nijssen et al, 2009)
False positives in supervised pattern mining (Bay and Pazzani, 2001; Webb, 2007)

2 Mining Sets of Patterns: Unsupervised

2.1 Deviation-based Methods

2.1.1 Static
Lift = Interest = Strength (Brin et al, 1997; Dhar and Tuzhilin, 1993)
Related to the above (Aggarwal and Yu, 1998)
The Pattern Ranking Problem (Mielikäinen and Mannila, 2003)
Bayesian Network-based model (Jaroszewicz and Scheffer, 2005; Jaroszewicz and Simovici, 2004)
Exponential model-based model (Gallo et al, 2007)
Randomization-based methods, binary (Gionis et al, 2006), numeric (Ojala et al, 2009)
Maximum Entropy-based methods (Jaroszewicz and Simovici, 2002; Tatti, 2007; Meo, 2002)
Magnum Opus, pattern mining and statistical testing (Webb, 2007)

2.1.2 Dynamic
Probabilistic Summaries (Wang and Parthasarathy, 2006)
Swap-Randomization (Hanhijärvi et al, 2009)

2.2 Description-based Methods
Tiling, set-cover like (Geerts et al, 2004)
Nested-tiling (Gionis et al, 2004; Seppänen and Mannila, 2004)
Binary matrix factorization (Miettinen et al, 2008)
Junction-tree based data description (Tatti and Heikinheimo, 2008)
Constraint-based Pattern Set Mining (De Raedt and Zimmermann, 2007)

2.2.1 Compression-related methods
KRIMP, itemset selection by lossless compression of the data (Siebes et al, 2006)
LESS, low-entropy set selection (Heikinheimo et al, 2009)
PACK, itemset selection by compressing the data with decision trees (Tatti and Vreeken, 2008)
Profiles, lossy compression of itemset data (Yan et al, 2005)
R-KRIMP, KRIMP-like selection for relational itemsets (Koopman and Siebes, 2008)
RDB-KRIMP, KRIMP-like selection of multi-relational itemsets (Koopman and Siebes, 2009)
Information-theoretic noisy tiles (Kontonasios and De Bie, 2010)

2.3 Presence-based Methods
Most-Informative k-Itemsets (Knobbe and Ho, 2006a)
Exhaustive presence-based selection, Pattern Teams (Knobbe and Ho, 2006b)
Greedy presence-based selection (Bringmann and Zimmermann, 2007)
Feature Selection by MDL (Pfahringer, 1995)
Constraint-based Pattern Set Mining (De Raedt and Zimmermann, 2007)

2.4 Pattern Clustering-based
Redundancy aware top-k patterns (Xin et al, 2006)
Semantically meaningful patterns (Yuan et al, 2007)
Sampling representative patterns (Hasan et al, 2007)
Mining representative subgraphs (Zhang et al, 2009)

2.5 Using Pattern Sets
Difference measurement and characterisation (Vreeken et al, 2007)
Missing Value Estimation (Vreeken and Siebes, 2008)
Clustering (Van Leeuwen et al, 2009; Wang et al, 1999; Seeland et al, 2010; Fung et al, 2003)

3 Mining Sets of Patterns: Supervised

3.1 Scoring Pattern Sets Model Independently
Class correspondences (Thoma et al, 2009)
Class-correlated dispersion (Rückert and Kramer, 2007)

3.2 Post-processing Patterns into Models (Heuristic)
Processing patterns greedily (Dong et al, 1999; Wang and Karypis, 2005; Li et al, 2001a; Zaki and Aggarwal, 2003; Ramamohanarao and Fan, 2007; Arunasalam and Chawla, 2006)
Processing patterns in fixed order (Liu et al, 1998; Li et al, 2001b; Zimmermann and Bringmann, 2005; van Leeuwen et al, 2006; Bouzouita et al, 2006)
No processing; only pattern constraints (Zhang et al, 2000; Kramer and De Raedt, 2001; Bringmann et al, 2006)
Bayesian classification (Meretakis et al, 2000; Meretakis and Wüthrich, 1999)

3.3 Post-processing Patterns (Optimal)
Pattern teams (Knobbe and Ho, 2006b)
Aposteriori (De Raedt and Zimmermann, 2007)
Decision trees (DL8) (Nijssen and Fromont, 2007)

3.4 Iterative Mining
Optimized on parts of data, applied on part of the data (Bringmann and Zimmermann, 2005)
Optimized on all data, applied on all data (Thoma et al, 2009)
Optimized on parts of data, applied on all data (Zimmermann et al, 2010)
Patterns as tests in decision trees (Tree2, local) (Bringmann and Zimmermann, 2005; Geamsakul et al, 2003; Cheng et al, 2008a)
Patterns as weak learners in boosting (Nowozin et al, 2007a,b)
Patterns as attributes in regression (Saigo et al, 2008; Tsuda, 2007)
Instance-based (Veloso et al, 2006, 2007; Li et al, 2000)

3.5 Dedicated Experimental Comparisons
Molecules (Deshpande et al, 2005; Wale et al, 2008; Bringmann et al, 2006)
Greedy vs complete on attribute-value data (Janssen and Fürnkranz, 2009, 2010)

References

Aggarwal CC, Yu PS (1998) A new framework for itemset generation. In: PODS ’98: Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, ACM, New York, NY, USA, pp 18–24, DOI http://doi.acm.org/10.1145/275487.275490

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the VLDB’94, pp 487–499

Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the SIGMOD’93, ACM, pp 207–216

Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Advances in Knowledge Discovery and Data Mining, AAAI, pp 307–328

Arunasalam B, Chawla S (2006) Cccs: a top-down associative classifier for imbalanced class distribution. In: Eliassi-Rad et al (2006), pp 517–522

Atzmüller M, Puppe F (2006) Sd-map - a fast algorithm for exhaustive subgroup discovery. In: Fürnkranz et al (2006), pp 6–17

Balcázar JL, Bonchi F, Gionis A, Sebag M (eds) (2010) Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Proceedings, Part III, Lecture Notes in Computer Science, vol 6323, Springer

Bay SD, Pazzani MJ (2001) Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery 5(3):213–246

Bayardo R (1998) Efficiently mining long patterns from databases. In: Proceedings of the SIGMOD’98, pp 85–93

Bayardo Jr RJ, Agrawal R (1999) Mining the most interesting rules. In: KDD, pp 145–154

Bonchi F, Lucchese C (2007) Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl Eng 60(2):377–399

Bonchi F, Giannotti F, Lucchese C, Orlando S, Perego R, Trasarti R (2009) A constraint-based querying system for exploratory pattern discovery. Inf Syst 34(1):3–27

Bouzouita I, Elloumi S, Yahia SB (2006) GARC: A new associative classification approach. In: Proceedings of the DaWaK’06, pp 554–565

Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: Generalizing association rules to correlations. In: Proceedings of the SIGMOD’97, pp 265–276

Bringmann B, Zimmermann A (2005) Tree2 - decision trees for tree structured data. In: Jorge A, Torgo L, Brazdil P, Camacho R, Gama J (eds) PKDD, Springer, Lecture Notes in Computer Science, vol 3721, pp 46–58

Bringmann B, Zimmermann A (2007) The chosen few: On identifying valuable patterns. In: Proceedings of the ICDM’07, pp 63–72

Bringmann B, Zimmermann A, De Raedt L, Nijssen S (2006) Don’t be afraid of simpler patterns. In: PKDD, pp 55–66

Bucila C, Gehrke J, Kifer D, White WM (2003) Dualminer: A dual-pruning algorithm for itemsets with constraints. Data Min Knowl Discov 7(3):241–272

Calders T, Goethals B (2002) Mining all non-derivable frequent itemsets. In: Proceedings of the ECML PKDD’02, pp 74–85

Cheng H, Yan X, Han J, Hsu CW (2007) Discriminative frequent pattern analysis for effective classification. In: ICDE, pp 716–725

Cheng H, Han J, Yan X, Yu PS (2008a) Integration of classification and pattern mining: A discriminative and frequent pattern-based approach. In: ICDM Tutorials

Cheng H, Yan X, Han J, Yu PS (2008b) Direct discriminative pattern mining for effective classification. In: ICDE, pp 169–178

De Raedt L, Zimmermann A (2007) Constraint-based pattern set mining. In: SDM, SIAM

De Raedt L, Guns T, Nijssen S (2008) Constraint programming for itemset mining. In: KDD, pp 204–212

Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17(8):1036–1050

Dhar V, Tuzhilin A (1993) Abstract-driven pattern discovery in databases. IEEE Trans Knowl Data Eng 5(6):926–938

Dong G, Li J (1999) Efficient mining of emerging patterns: Discovering trends and differences. In: KDD, pp 43–52

Dong G, Li J (2005) Mining border descriptions of emerging patterns from dataset pairs. Knowledge and Information Systems 8(2):178–202

Dong G, Zhang X, Wong L, Li J (1999) Caep: Classification by aggregating emerging patterns. In: Arikawa S, Furukawa K (eds) Discovery Science, Springer, Lecture Notes in Computer Science, vol 1721, pp 30–42

Eliassi-Rad T, Ungar LH, Craven M, Gunopulos D (eds) (2006) Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, ACM

Fung BCM, Wang K, Ester M (2003) Hierarchical document clustering using frequent itemsets. In: Barbará D, Kamath C (eds) SDM, SIAM

Fürnkranz J, Scheffer T, Spiliopoulou M (eds) (2006) Knowledge Discovery in Databases: PKDD 2006, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, Berlin, Germany, September 18-22, 2006, Proceedings, Springer

Gallo A, Cristianini N, De Bie T (2007) Mini: Mining informative non-redundant itemsets. In: Proceedings of the ECMLPKDD’07, pp 438–445

Geamsakul W, Matsuda T, Yoshida T, Motoda H, Washio T (2003) Performance evaluation of decision tree graph-based induction. In: Grieser G, Tanaka Y, Yamamoto A (eds) Discovery Science, Springer, Sapporo, Japan, pp 128–140

Geerts F, Goethals B, Mielikäinen T (2004) Tiling databases. In: Proceedings of the DS’04, pp 278–289

Gionis A, Mannila H, Seppänen JK (2004) Geometric and combinatorial tiles in 0-1 data. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) PKDD, Springer, Lecture Notes in Computer Science, vol 3202, pp 173–184

Gionis A, Mannila H, Mielikäinen T, Tsaparas P (2006) Assessing data mining results via swap randomization. In: Eliassi-Rad et al (2006), pp 167–176

Goethals B, Zaki MJ (2003) Frequent itemset mining implementations repository (FIMI), http://fimi.cs.helsinki.fi

Grosskreutz H, Rüping S, Wrobel S (2008) Tight optimistic estimates for fast subgroup discovery. In: ECML/PKDD (1), pp 440–456

Han J, Pei J (2000) Mining frequent patterns by pattern-growth: methodology and implications. SIGKDD Explorations Newsletter 2(2):14–20

Han J, Lakshmanan LVS, Ng RT (1999) Constraint-based multidimensional data mining. IEEE Computer 32(8):46–50

Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87

Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: Current status and future directions. Data Mining and Knowledge Discovery 15(1):55–86

Hand D, Adams N, Bolton R (eds) (2002) Pattern Detection and Discovery. Springer-Verlag

Hanhijärvi S, Ojala M, Vuokko N, Puolamäki K, Tatti N, Mannila H (2009) Tell me something I don’t know: randomization strategies for iterative data mining. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28 – July 1, 2009, pp 379–388

Hasan MA, Chaoji V, Salem S, Besson J, Zaki MJ (2007) Origami: Mining representative orthogonal graph patterns. In: Ramakrishnan and Zaiane (2007), pp 153–162

Hashimoto K, Takigawa I, Shiga M, Kanehisa M, Mamitsuka H (2008) Mining significant tree patterns in carbohydrate sugar chains. Bioinformatics 24(16):i167–i173, DOI http://dx.doi.org/10.1093/bioinformatics/btn293

Heikinheimo H, Vreeken J, Siebes A, Mannila H (2009) Low-entropy set selection. In: Proceedings of the SDM’09, pp 569–579

Hirao M, Hoshino H, Shinohara A, Takeda M, Arikawa S (2003) A practical algorithm to find the best subsequence patterns. Theor Comput Sci 292(2):465–479

Janssen F, Fürnkranz J (2009) A re-evaluation of the over-searching phenomenon in inductive rule learning. In: SDM, pp 329–340

Janssen F, Fürnkranz J (2010) On the quest for optimal rule learning heuristics. Machine Learning 78(3):343–379

Jaroszewicz S, Scheffer T (2005) Fast discovery of unexpected patterns in data, relative to a bayesian network. In: Proceedings of the KDD’05, pp 118–127

Jaroszewicz S, Simovici D (2004) Interestingness of frequent itemsets using bayesian networks as background knowledge. In: Proceedings of the KDD’04, pp 178–186

Jaroszewicz S, Simovici DA (2002) Pruning redundant association rules using maximum entropy principle. In: Proceedings of the PAKDD’02, pp 135–147

Kavsek B, Lavrac N, Jovanoski V (2003) APRIORI-SD: Adapting association rule learning to subgroup discovery. In: IDA, pp 230–241

Klösgen W (1996) Explora: A multipattern and multistrategy discovery assistant. In: Fayyad U, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in Knowledge Discovery and Data Mining

Knobbe A, Ho EY (2006a) Maximally informative k-itemsets and their efficient discovery. In: Proceedings of the KDD’06, pp 237–244

Knobbe AJ, Ho EKY (2006b) Pattern teams. In: Proceedings of the ECML PKDD’06, pp 577–584

Kok JN, Koronacki J, de Mántaras RL, Matwin S, Mladenic D, Skowron A (eds) (2007) Knowledge Discovery in Databases: PKDD 2007, 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, Warsaw, Poland, September 17-21, 2007, Proceedings, Lecture Notes in Computer Science, vol 4702, Springer

Kontonasios KN, De Bie T (2010) An information-theoretic approach to finding informative noisy tiles in binary databases. In: SDM, SIAM, pp 153–164

Koopman A, Siebes A (2008) Discovering relational item sets efficiently. In: SDM, SIAM, pp 108–119

Koopman A, Siebes A (2009) Characteristic relational patterns. In: Proceedings of the KDD’09, pp 437–446, DOI http://doi.acm.org/10.1145/1557019.1557071

Kramer S, De Raedt L (2001) Feature construction with version spaces for biochemical applications. In: Brodley CE, Danyluk AP (eds) ICML, Morgan Kaufmann, pp 258–265

van Leeuwen M, Vreeken J, Siebes A (2006) Compression picks item sets that matter. In: Fürnkranz et al (2006), pp 585–592

Li J, Dong G, Ramamohanarao K (2000) Instance-based classification by emerging patterns. In: Zighed DA, Komorowski HJ, Zytkow JM (eds) PKDD, Springer, Lecture Notes in Computer Science, vol 1910, pp 191–200

Li J, Dong G, Ramamohanarao K (2001a) Making use of the most expressive jumping emerging patterns for classification. Knowl Inf Syst 3(2):131–145

Li W, Han J, Pei J (2001b) CMAR: Accurate and efficient classification based on multiple class-association rules. In: Cercone N, Lin TY, Wu X (eds) Proceedings of the 2001 IEEE International Conference on Data Mining, IEEE Computer Society, San José, California, USA, pp 369–376

Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Agrawal R, Stolorz PE, Piatetsky-Shapiro G (eds) Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, AAAI Press, New York City, New York, USA, pp 80–86

Loekito E, Bailey J (2006) Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams. In: Eliassi-Rad et al (2006), pp 307–316

Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations. In: Proceedings of the KDD’96, pp 189–194

Meo R (2002) Maximum independence and mutual information. IEEE Transactions on Information Theory 48(1):318–324

Meretakis D, Wüthrich B (1999) Extending naïve bayes classifiers using long itemsets. In: Proceedings of the KDD’99, pp 165–174

Meretakis D, Lu H, Wüthrich B (2000) A study on the performance of large bayes classifier. In: Proceedings of the ECML’00, pp 271–279

Mielikäinen T, Mannila H (2003) The pattern ordering problem. In: Proceedings of the ECML PKDD’03, pp 327–338

Miettinen P, Mielikäinen T, Gionis A, Das G, Mannila H (2008) The discrete basis problem. IEEE Trans Knowl Data Eng 20(10):1348–1362

Morimoto Y, Fukuda T, Matsuzawa H, Tokuyama T, Yoda K (1998) Algorithms for mining association rules for binary segmentations of huge categorical databases. In: VLDB, pp 380–391

Morishita S, Sese J (2000) Traversing itemset lattice with statistical metric pruning. In: PODS, pp 226–236

Nijssen S, Fromont E (2007) Mining optimal decision trees from itemset lattices. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery in data mining (KDD’07), pp 530–539

Nijssen S, Kok JN (2005) Multi-class correlated pattern mining. In: Bonchi F, Boulicaut JF (eds) KDID, Springer, Lecture Notes in Computer Science, vol 3933, pp 165–187

Nijssen S, Guns T, De Raedt L (2009) Correlated itemset mining in ROC space: a constraint programming approach. In: KDD, pp 647–656

Novak PK, Lavrac N, Webb GI (2009) Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research 10:377–403

Nowozin S, Tsuda K, Uno T, Kudo T, Bakir GH (2007a) Weighted substructure mining for image analysis. In: MLG

Nowozin S, Tsuda K, Uno T, Kudo T, Bakir GH (2007b) Weighted substructure mining for image analysis. In: CVPR

Ojala M, Vuokko N, Kallio A, Haiminen N, Mannila H (2009) Randomization methods for assessing data analysis results on real-valued matrices. Statistical Analysis and Data Mining 2(4):209–230

Omiecinski E (2003) Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering 15(1):57–69, DOI 10.1109/TKDE.2003.1161582

Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the ICDT’99, pp 398–416

Pei J, Han J, Lakshmanan LVS (2001) Mining frequent item sets with convertible constraints. In: Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp 433–442

Pfahringer B (1995) Compression-based feature subset selection. In: Proceedings of the IJCAI’95 Workshop on Data Engineering for Inductive Learning, pp 109–119

Ramakrishnan N, Zaiane O (eds) (2007) Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), October 28-31, 2007, Omaha, Nebraska, USA, IEEE Computer Society

Ramamohanarao K, Fan H (2007) Patterns based classifiers. In: World Wide Web, pp 71–83

Rückert U, Kramer S (2007) Optimizing feature sets for structured data. In: Kok JN, Koronacki J, de Mántaras RL, Matwin S, Mladenic D, Skowron A (eds) ECML, Springer, Lecture Notes in Computer Science, vol 4701, pp 716–723

Saigo H, Krämer N, Tsuda K (2008) Partial least squares regression for graph mining. In: KDD, pp 578–586

Seeland M, Girschick T, Buchwald F, Kramer S (2010) Online structural graph clustering using frequent subgraph mining. In: Balcázar et al (2010), pp 213–228

Seppänen JK, Mannila H (2004) Dense itemsets. In: KDD ’04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, New York, NY, USA, pp 683–688, DOI http://doi.acm.org/10.1145/1014052.1014140

Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of the SDM’06, pp 393–404

Silverstein C, Brin S, Motwani R (1998) Beyond market baskets: Generalizing association rules to dependence rules. Data Min Knowl Discov 2(1):39–68

Srikant R, Agrawal R (1995) Mining generalized association rules. In: Dayal U, Gray PMD, Nishio S (eds) VLDB, Morgan Kaufmann, pp 407–419

Sun H, De Bie T, Storms V, Fu Q, Dhollander T, Lemmens K, Verstuyf A, De Moor B, Marchal K (2009) Moduledigger: an itemset mining framework for the detection of cis-regulatory modules. BMC Bioinformatics 10(S-1)

Tan PN, Kumar V, Srivastava J (2004) Selecting the right objective measure for association analysis. Inf Syst 29(4):293–313

Tatti N (2007) Maximum entropy based significance of itemsets. In: Ramakrishnan and Zaiane (2007), pp 312–321

Tatti N, Heikinheimo H (2008) Decomposable families of itemsets. In: Proceedings of the ECMLPKDD’08

Tatti N, Vreeken J (2008) Finding good itemsets by packing data. In: Proceedings of the ICDM’08, pp 588–597

Thoma M, Cheng H, Gretton A, Han J, Kriegel HP, Smola AJ, Song L, Yu PS, Yan X, Borgwardt KM (2009) Near-optimal supervised feature selection among frequent subgraphs. In: SDM

Tsuda K (2007) Entire regularization paths for graph data. In: Ghahramani Z (ed) ICML, ACM, ACM International Conference Proceeding Series, vol 227, pp 919–926

Van Leeuwen M, Vreeken J, Siebes A (2009) Identifying the components. Data Min Knowl Discov 19(2):173–292

Veloso A, Meira Jr W, Zaki MJ (2006) Lazy associative classification. In: ICDM, IEEE Computer Society, pp 645–654

Veloso A, Meira Jr W, Gonçalves MA, Zaki MJ (2007) Multi-label lazy associative classification. In: Kok et al (2007), pp 605–612

Vreeken J, Siebes A (2008) Filling in the blanks – KRIMP minimisation for missing data. In: Proceedings of the ICDM’08, pp 1067–1072

Vreeken J, van Leeuwen M, Siebes A (2007) Characterising the difference. In: Proceedings of the KDD’07, pp 765–774

Wale N, Watson IA, Karypis G (2008) Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl Inf Syst 14(3):347–375

Wang C, Parthasarathy S (2006) Learning approximate mrfs from large transaction data. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) PKDD, Springer, Lecture Notes in Computer Science, vol 4213, pp 641–649

Wang J, Karypis G (2005) Harmony: Efficiently mining the best rules for classification. In: SDM

Wang K, Xu C, Liu B (1999) Clustering transactions using large items. In: Proceedings of the CIKM’99, pp 483–490

Webb GI (1995) Opus: An efficient admissible algorithm for unordered search. J Artif Intell Res (JAIR) 3:431–465

Webb GI (2005) K-optimal pattern discovery: An efficient and effective approach to exploratory data mining. In: Zhang S, Jarvis R (eds) Australian Conference on Artificial Intelligence, Springer, Lecture Notes in Computer Science, vol 3809, pp 1–2

Webb GI (2007) Discovering significant patterns. Machine Learning 68(1):1–33

Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Komorowski J, Zytkow J (eds) Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD ’97), pp 78–87

Wu T, Chen Y, Han J (2007) Association mining in large databases: A re-examination of its measures. In: Kok et al (2007), pp 621–628

Xin D, Cheng H, Yan X, Han J (2006) Extracting redundancy-aware top-k patterns. In: Eliassi-Rad et al (2006), pp 444–453

Yan X, Cheng H, Han J, Xin D (2005) Summarizing itemset patterns: A profile-based approach. In: Proceedings of the KDD’05, pp 314–323

Yan X, Cheng H, Han J, Yu PS (2008) Mining significant graph patterns by leap search. In: SIGMOD Conference, pp 433–444

Yuan J, Wu Y, Yang M (2007) From frequent itemsets to semantically meaningful visual patterns. In: Berkhin P, Caruana R, Wu X (eds) KDD, ACM, pp 864–873

Zaki M, Orihara M (1998) Theoretical foundations of association rules. In: Proceedings of SIGMOD’98 workshop on Research Issues in KDD

Zaki MJ, Aggarwal CC (2003) XRules: an effective structural classifier for XML data. In: Getoor L, Senator TE, Domingos P, Faloutsos C (eds) KDD, ACM, Washington, DC, USA, pp 316–325

Zhang S, Yang J, Li S (2009) Ring: An integrated method for frequent representative subgraph mining. In: Wang W, Kargupta H, Ranka S, Yu PS, Wu X (eds) ICDM, IEEE Computer Society, pp 1082–1087

Zhang X, Dong G, Ramamohanarao K (2000) Information-based classification by aggregating emerging patterns. In: Proceedings of the IDEAL’00, pp 48–53

Zimmermann A, Bringmann B (2005) CTC - correlating tree patterns for classification. In: ICDM, pp 833–836

Zimmermann A, Bringmann B, Rückert U (2010) Fast, effective molecular feature mining by local optimization. In: Balcázar et al (2010), pp 563–578
