Papers on Emerging Patterns, Changes, and Contrasts/Differences

This is a list of papers on (i) the mining of emerging patterns (EPs), contrast/difference patterns, and change patterns; (ii) the application of such patterns to classification; (iii) the application of such patterns in various contexts (e.g. microarray gene expression data analysis, rare event identification, and intrusion detection). Mining the pattern types listed above can be viewed as a special case of comparative data mining, whose aim is to mine patterns relating two or more given datasets.

Emerging patterns were originally introduced in (Dong+Li 99). The first emerging pattern based classification paper was (Dong+Zhang+Wong+Li 99). The first papers applying emerging patterns to microarray gene expression data analysis were (Li+Wong 01; Li+Wong 02).

The list is divided into three sections, one for each of the three topics above.

Please email guozhu dot dong at wright dot edu if you want to add other papers to this list.

Emerging Pattern Mining, Change Mining, Contrast/Difference Mining

  • Arunasalam, Bavani and Chawla, Sanjay and Sun, Pei. Striking Two Birds with One Stone: Simultaneous Mining of Positive and Negative Spatial Patterns. In Proceedings of the Fifth SIAM International Conference on Data Mining, April 21-23, Newport Beach, CA, USA, SIAM 2005

  • Bavani Arunasalam, Sanjay Chawla: CCCS: a top-down associative classifier for imbalanced class distribution. KDD 2006: 517-522

  • Eric Bae, James Bailey, Guozhu Dong: Clustering Similarity Comparison Using Density Profiles. Australian Conference on Artificial Intelligence 2006: 342-351

  • James Bailey, Thomas Manoukian, Kotagiri Ramamohanarao: Fast Algorithms for Mining Emerging Patterns. PKDD 2002: 39-50.

  • J. Bailey and T. Manoukian and K. Ramamohanarao: A Fast Algorithm for Computing Hypergraph Transversals and its Application in Mining Emerging Patterns. Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM). Pages 485-488. Florida, USA, November 2003.

  • Stephen D. Bay, Michael J. Pazzani: Detecting Change in Categorical Data: Mining Contrast Sets. KDD 1999: 302-306.

  • Stephen D. Bay, Michael J. Pazzani: Detecting Group Differences: Mining Contrast Sets. Data Min. Knowl. Discov. 5(3): 213-246 (2001)

  • Cristian Bucila, Johannes Gehrke, Daniel Kifer, Walker M. White: DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints. Data Min. Knowl. Discov. 7(3): 241-272 (2003)

  • Yandong Cai, Nick Cercone, Jiawei Han: An Attribute-Oriented Approach for Learning Classification Rules from Relational Databases. ICDE 1990: 281-288

  • Sarah Chan, Ben Kao, Chi Lap Yip, Michael Tang: Mining Emerging Substrings. DASFAA 2003.

  • Yixin Chen, Guozhu Dong, Jiawei Han, Jian Pei, Benjamin W. Wah, Jianyong Wang: Online Analytical Processing Stream Data: Is It Feasible? DMKD 2002

  • Chen Chen, Xifeng Yan, Philip S. Yu, Jiawei Han, Dong-Qing Zhang, Xiaohui Gu: Towards Graph Containment Search and Indexing. VLDB 2007: 926-937

  • Graham Cormode, S. Muthukrishnan: What's new: finding significant differences in network data streams. IEEE/ACM Trans. Netw. 13(6): 1219-1232 (2005)

  • Luc De Raedt, Albrecht Zimmermann: Constraint-Based Pattern Set Mining. SDM 2007

  • Luc De Raedt: Towards Query Evaluation in Inductive Databases Using Version Spaces. Database Support for Data Mining Applications 2004: 117-134

  • Luc De Raedt, Stefan Kramer: The Levelwise Version Space Algorithm and its Application to Molecular Fragment Finding. IJCAI 2001: 853-862

  • Guozhu Dong, Jinyan Li: Efficient Mining of Emerging Patterns: Discovering Trends and Differences. KDD 1999: 43-52.

  • Guozhu Dong, Jinyan Li: Mining border descriptions of emerging patterns from dataset pairs. Knowl. Inf. Syst. 8(2): 178-202 (2005).

  • Dong, G. and Han, J. and Lakshmanan, L.V.S. and Pei, J. and Wang, H. and Yu, P.S. Online Mining of Changes from Data Streams: Research Problems and Preliminary Results, Proceedings of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams, 2003

  • Guozhu Dong, Jiawei Han, Joyce M. W. Lam, Jian Pei, Ke Wang, Wei Zou: Mining Constrained Gradients in Large Databases. IEEE Trans. Knowl. Data Eng. 16(8): 922-938 (2004).

  • Johannes Fischer, Volker Heun, Stefan Kramer: Optimal String Mining Under Frequency Constraints. PKDD 2006: 139-150

  • Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan: A Framework for Measuring Changes in Data Characteristics. PODS 1999: 126-137

  • Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, Wei-Yin Loh: A Framework for Measuring Differences in Data Characteristics. J. Comput. Syst. Sci. 64(3): 542-578 (2002)

  • Garriga, G.C. and Kralj, P. and Lavrac, N. Closed Sets for Labeled Data, PKDD, 2006

  • Hilderman, R.J. and Peckham, T. A Statistically Sound Alternative Approach to Mining Contrast Sets, Proceedings of the 4th Australasian Data Mining Conference, 2005 (pp157-172)

  • Hui-jing Huang, Yongsong Qin, Xiaofeng Zhu, Jilian Zhang, and Shichao Zhang. Difference Detection Between Two Contrast Sets. Proceedings of the 8th International Conference on Data Warehousing and Knowledge Discovery (DaWak), 2006.

  • Imberman, S.P. and Tansel, A.U. and Pacuit, E. An Efficient Method For Finding Emerging Frequent Itemsets, 3rd International Workshop on Mining Temporal and Sequential Data, pp112--121, 2004

  • Tomasz Imielinski, Leonid Khachiyan, Amin Abdulghani: Cubegrades: Generalizing Association Rules. Data Min. Knowl. Discov. 6(3): 219-257 (2002)

  • Inakoshi, H. and Ando, T. and Sato, A. and Okamoto, S. Discovery of emerging patterns from nearest neighbors, International Conference on Machine Learning and Cybernetics, 2002.

  • Xiaonan Ji, James Bailey, Guozhu Dong: Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints. ICDM 2005: 194-201.

  • Xiaonan Ji, James Bailey, Guozhu Dong: Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints. Knowl. Inf. Syst. 11(3): 259--286 (2007).

  • Daniel Kifer, Shai Ben-David, Johannes Gehrke: Detecting Change in Data Streams. VLDB 2004: 180-191

  • P Kralj, N Lavrac, D Gamberger, A Krstacic. Contrast Set Mining for Distinguishing Between Similar Diseases. LNCS Volume 4594, 2007.

  • Sau Dan Lee, Luc De Raedt: An Efficient Algorithm for Mining String Databases Under Constraints. KDID 2004: 108-129

  • Haiquan Li, Jinyan Li, Limsoon Wong, Mengling Feng, Yap-Peng Tan: Relative risk and odds ratio: a data mining perspective. PODS 2005: 368-377

  • Jinyan Li, Guimei Liu and Limsoon Wong. Mining Statistically Important Equivalence Classes and δ-Discriminative Emerging Patterns. KDD 2007.

  • Jinyan Li, Thomas Manoukian, Guozhu Dong, Kotagiri Ramamohanarao: Incremental Maintenance on the Border of the Space of Emerging Patterns. Data Min. Knowl. Discov. 9(1): 89-116 (2004).

  • Jinyan Li, Kotagiri Ramamohanarao, Guozhu Dong. The Space of Jumping Emerging Patterns and Its Incremental Maintenance Algorithms. ICML 2000: 551-558.

  • Jinyan Li and Qiang Yang. Strong Compound-Risk Factors: Efficient Discovery through Emerging Patterns and Contrast Sets. IEEE Transactions on Information Technology in Biomedicine. To appear.

  • Lin, J. and Keogh, E. Group SAX: Extending the Notion of Contrast Sets to Time Series and Multimedia Data. Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases. Berlin, Germany, September, 2006.

  • Bing Liu, Ke Wang, Lai-Fun Mun, Xin-Zhi Qi: Using Decision Tree Induction for Discovering Holes in Data. PRICAI 1998: 182-193

  • Bing Liu, Liang-Ping Ku, Wynne Hsu: Discovering Interesting Holes in Data. IJCAI (2) 1997: 930-935

  • Bing Liu, Wynne Hsu, Yiming Ma: Discovering the set of fundamental rule changes. KDD 2001: 335-340.

  • Elsa Loekito, James Bailey: Fast Mining of High Dimensional Expressive Contrast Patterns Using Zero-suppressed Binary Decision Diagrams. KDD 2006: 307-316.

  • Yu Meng, Margaret H. Dunham: Efficient Mining of Emerging Events in a Dynamic Spatiotemporal Environment. PAKDD 2006: 750-754

  • Tom M. Mitchell: Version Spaces: A Candidate Elimination Approach to Rule Learning. IJCAI 1977: 305-310

  • Amit Satsangi, Osmar R. Zaiane, Contrasting the Contrast Sets: An Alternative Approach, Eleventh International Database Engineering and Applications Symposium (IDEAS 2007), Banff, Canada, September 6-8, 2007

  • Michele Sebag: Delaying the Choice of Bias: A Disjunctive Version Space Approach. ICML 1996: 444-452

  • Michele Sebag: Using Constraints to Building Version Spaces. ECML 1994: 257-271

  • Arnaud Soulet, Bruno Crémilleux, François Rioult: Condensed Representation of EPs and Patterns Quantified by Frequency-Based Measures. KDID 2004: 173-190

  • Pawel Terlecki, Krzysztof Walczak: On the relation between rough set reducts and jumping emerging patterns. Inf. Sci. 177(1): 74-83 (2007).

  • Roger Ming Hieng Ting, James Bailey: Mining Minimal Contrast Subgraph Patterns. SDM 2006.

  • V. S. Tseng, C. J. Chu, and Tyne Liang, An Efficient Method for Mining Temporal Emerging Itemsets From Data Streams, International Computer Symposium, Workshop on Software Engineering, Databases and Knowledge Discovery, 2006

  • J. Vreeken, M. van Leeuwen, A. Siebes: Characterising the Difference. KDD 2007.

  • Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han: Mining concept-drifting data streams using ensemble classifiers. KDD 2003: 226-235

  • Peng Wang, Haixun Wang, Xiaochen Wu, Wei Wang, Baile Shi: On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams. ICDM 2005: 474-481

  • Lusheng Wang, Hao Zhao, Guozhu Dong, Jianping Li: On the complexity of finding emerging patterns. Theor. Comput. Sci. 335(1): 15-27 (2005).

  • Ke Wang, Senqiang Zhou, Ada Wai-Chee Fu, Jeffrey Xu Yu: Mining Changes of Classification by Correspondence Tracing. SDM 2003.

  • Geoffrey I. Webb: Discovering Significant Patterns. Machine Learning 68(1): 1-33 (2007)

  • Geoffrey I. Webb, Songmao Zhang: K-Optimal Rule Discovery. Data Min. Knowl. Discov. 10(1): 39-79 (2005)

  • Geoffrey I. Webb, Shane M. Butler, Douglas A. Newlands: On detecting differences between groups. KDD 2003: 256-265.

  • Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao: Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets. KDD 2000: 310-314.

  • Lizhuang Zhao, Mohammed J. Zaki, Naren Ramakrishnan: BLOSOM: a framework for mining arbitrary boolean expressions. KDD 2006: 827-832

Emerging/Contrast Pattern Based Classification

  • Hamad Alhammady, Kotagiri Ramamohanarao: The Application of Emerging Patterns for Improving the Quality of Rare-Class Classification. PAKDD 2004: 207-211

  • Hamad Alhammady, Kotagiri Ramamohanarao: Using Emerging Patterns and Decision Trees in Rare-Class Classification. ICDM 2004: 315-318

  • Hamad Alhammady, Kotagiri Ramamohanarao: Expanding the Training Data Space Using Emerging Patterns and Genetic Methods. SDM 2005

  • Hamad Alhammady, Kotagiri Ramamohanarao: Using Emerging Patterns to Construct Weighted Decision Trees. IEEE Trans. Knowl. Data Eng. 18(7): 865-876 (2006).

  • Hamad Alhammady, Kotagiri Ramamohanarao: Mining Emerging Patterns and Classification in Data Streams. Web Intelligence 2005: 272-275

  • James Bailey, Thomas Manoukian, Kotagiri Ramamohanarao: Classification Using Constrained Emerging Patterns. WAIM 2003: 226-237

  • Guozhu Dong, Xiuzhen Zhang, Limsoon Wong, Jinyan Li: CAEP: Classification by Aggregating Emerging Patterns. Discovery Science 1999: 30-42.

  • Hongjian Fan, Kotagiri Ramamohanarao: An Efficient Single-Scan Algorithm for Mining Essential Jumping Emerging Patterns for Classification. PAKDD 2002: 456-462

  • Hongjian Fan, Kotagiri Ramamohanarao: Efficiently Mining Interesting Emerging Patterns. WAIM 2003: 189-201

  • Hongjian Fan, Kotagiri Ramamohanarao: Noise Tolerant Classification by Chi Emerging Patterns. PAKDD 2004: 201-206

  • Hongjian Fan, Ming Fan, Kotagiri Ramamohanarao, Mengxu Liu: Further Improving Emerging Pattern Based Classifiers Via Bagging. PAKDD 2006: 91-96

  • Hongjian Fan, Kotagiri Ramamohanarao: A weighting scheme based on emerging patterns for weighted support vector machines. GrC 2005: 435-440

  • Hongjian Fan, Kotagiri Ramamohanarao: Fast Discovery and the Generalization of Strong Jumping Emerging Patterns for Building Compact and Accurate Classifiers. IEEE Trans. Knowl. Data Eng. 18(6): 721-737 (2006)

  • Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao: Instance-Based Classification by Emerging Patterns. PKDD 2000: 191-200

  • Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao: Making Use of the Most Expressive Jumping Emerging Patterns for Classification. PAKDD 2000: 220-232

  • Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao: Making Use of the Most Expressive Jumping Emerging Patterns for Classification. Knowl. Inf. Syst. 3(2): 131-145 (2001)

  • Jinyan Li, Kotagiri Ramamohanarao, Guozhu Dong: Emerging Patterns and Classification. ASIAN 2000: 15-32

  • Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao, Limsoon Wong: DeEPs: A New Instance-Based Lazy Discovery and Classification System. Machine Learning 54(2): 99-124 (2004).

  • Wenmin Li, Jiawei Han, Jian Pei: CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. ICDM 2001: 369-376

  • Jinyan Li, Kotagiri Ramamohanarao, Guozhu Dong: Combining the Strength of Pattern Frequency and Distance for Classification. PAKDD 2001: 455-466

  • Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. KDD 1998: 80-86

  • Kotagiri Ramamohanarao, James Bailey: Discovery of Emerging Patterns and Their Use in Classification. Australian Conference on Artificial Intelligence 2003: 1-12

  • Ramamohanarao, K. and Bailey, J. and Fan, H. Efficient Mining of Contrast Patterns and Their Applications to Classification, Third International Conference on Intelligent Sensing and Information Processing, 2005 (39--47).

  • Ramamohanarao, K. and Fan, H. Patterns Based Classifiers, World Wide Web 10: 71-83 (2007).

  • Qun Sun, Xiuzhen Zhang, Kotagiri Ramamohanarao: Noise Tolerance of EP-Based Classifiers. Australian Conference on Artificial Intelligence 2003: 796-806

  • Xiaoxin Yin, Jiawei Han: CPAR: Classification based on Predictive Association Rules. SDM 2003

  • Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao: Information-Based Classification by Aggregating Emerging Patterns. IDEAL 2000: 48-53

  • Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao: Building Behaviour Knowledge Space to Make Classification Decision. PAKDD 2001: 488-494

  • Zhou Wang, Hongjian Fan, Kotagiri Ramamohanarao: Exploiting Maximal Emerging Patterns for Classification. Australian Conference on Artificial Intelligence 2004: 1062-1068

Other Applications of Emerging Patterns

  • Anne-Laure Boulesteix, Gerhard Tutz, Korbinian Strimmer: A CART-based approach to discover emerging patterns in microarray data. Bioinformatics 19(18): 2465-2472 (2003).

  • Lijun Chen, Guozhu Dong: Masquerader Detection Using OCLEP: One Class Classification Using Length Statistics of Emerging Patterns. Proceedings of International Workshop on INformation Processing over Evolving Networks (WINPEN), 2006.

  • Guozhu Dong, Kaustubh Deshpande: Efficient Mining of Niches and Set Routines. PAKDD 2001: 234-246

  • Grandinetti, W.M. and Chesnevar, C.I. and Falappa, M.A. Enhanced Approximation of the Emerging Pattern Space using an Incremental Approach, Proceedings of VII Workshop of Researchers in Computer Sciences, Argentina, pp263--267, 2005

  • Jinyan Li, Huiqing Liu, See-Kiong Ng, Limsoon Wong. Discovery of Significant Rules for Classifying Cancer Diagnosis Data. Bioinformatics. 19 (suppl. 2): ii93-ii102. (This paper was also presented at the 2003 European Conference on Computational Biology, Paris, France, September 26-30.)

  • Jinyan Li, Huiqing Liu, James R. Downing, Allen Eng-Juh Yeoh, Limsoon Wong. Simple Rules Underlying Gene Expression Profiles of More than Six Subtypes of Acute Lymphoblastic Leukemia (ALL) Patients. Bioinformatics. 19:71--78, 2003.

  • Jinyan Li, Limsoon Wong: Emerging patterns and gene expression data. Genome Informatics 12: 3-13 (2001).

  • Jinyan Li, Limsoon Wong: Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics 18(5): 725-734 (2002)

  • Jinyan Li, Limsoon Wong. Geography of Differences Between Two Classes of Data. Proceedings 6th European Conference on Principles of Data Mining and Knowledge Discovery, pages 325--337, Helsinki, Finland, August 2002.

  • Jinyan Li and Limsoon Wong. Structural Geography of the space of emerging patterns. Intelligent Data Analysis (IDA): An International Journal, Volume 9, pages 567-588, November 2005.

  • Jinyan Li, Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao, Qun Sun: Efficient Mining of High Confidence Association Rules without Support Thresholds. PKDD 1999: 406-411

  • Shihong Mao, Guozhu Dong: Discovery of Highly Differentiative Gene Groups from Microarray Gene Expression Data Using the Gene Club Approach. J. Bioinformatics and Computational Biology 3(6): 1263-1280 (2005).

  • Podraza, R. and Tomaszewski, K. KTDA: Emerging Patterns Based Data Analysis System, Proceedings of XXI Fall Meeting of Polish Information Processing Society, pp213--221, 2005

  • Rioult, F. Mining strong emerging patterns in wide SAGE data, Proceedings of the ECML/PKDD Discovery Challenge Workshop, Pisa, Italy, pp127--138, 2004

  • Eng-Juh Yeoh, Mary E. Ross, Sheila A. Shurtleff, W. Kent Williams, Divyen Patel, Rami Mahfouz, Fred G. Behm, Susana C. Raimondi, Mary V. Relling, Anami Patel, Cheng Cheng, Dario Campana, Dawn Wilkins, Xiaodong Zhou, Jinyan Li, Huiqing Liu, Chin-Hon Pui, William E. Evans, Clayton Naeve, Limsoon Wong, James R. Downing. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1:133--143, March 2002.

  • Yoon, H.S. and Lee, S.H. and Kim, J.H. Application of Emerging Patterns for Multi-source Bio-Data Classification and Analysis, Lecture Notes in Computer Science, Vol. 3610, 2005.

  • Yu, L.T.H. and Chung, F. and Chan, S.C.F. and Yuen, S.M.C. Using emerging pattern based projected clustering and gene expression data for cancer detection, Proceedings of the Second Conference on Asia-Pacific Bioinformatics, pp75--84, 2004.

  • Zhang, X. and Dong, G. and Wong, L. Using CAEP to predict translation initiation sites from genomic DNA sequences, TR2001/22, CSSE, Univ. of Melbourne, 2001.