Bibliography: Mining Data Streams

    Language and System

  1. Hancock: A Language for Extracting Signatures from Data Streams, by Corinna Cortes, Kathleen Fisher, Daryl Pregibon, Anne Rogers, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2000.
  2. Aurora: A Data Stream Management System (Demonstration), by D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, C. Erwin, E. Galvez, M. Hatoun, J. Hwang, A. Maskey, A. Rasin, A. Singer, M. Stonebraker, N. Tatbul, Y. Xing, R. Yan, S. Zdonik, in the ACM International Conference on Management of Data (SIGMOD) 2003.
  3. Models and issues in data stream systems, by B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom, in the ACM Symposium on Principles of Database Systems (PODS) 2002.
  4. Query Languages and Data Models for Database Sequences and Data Streams, by Yan-Nei Law, Haixun Wang, Carlo Zaniolo, in the International Conference on Very Large Data Bases (VLDB) 2004.
  5. ATLaS: A Native Extension of SQL for Data Mining, by Haixun Wang, Carlo Zaniolo, in the SIAM International Conference on Data Mining (SIAM DM) 2003.

    Change, Novelty Detection

  6. Online Novelty Detection on Temporal Sequences, by Junshui Ma, Simon Perkins, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
  7. A Framework for Diagnosing Changes in Evolving Data Streams, by Charu C. Aggarwal, in the ACM International Conference on Management of Data (SIGMOD) 2003.
  8. Efficient Elastic Burst Detection in Data Streams, by Yunyue Zhu, Dennis Shasha, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
  9. Active Mining of Data Streams, by Wei Fan, Yi-an Huang, Haixun Wang, Philip S. Yu, in the SIAM International Conference on Data Mining (SIAM DM) 2004.

    Clustering, Near-Neighbor Search

  10. TECNO-STREAMS: Tracking Evolving Clusters in Noisy Data Streams with a Scalable Immune System Learning Model, by Olfa Nasraoui, Cesar Cardona Uribe, Carlos Rojas Coronel, in the IEEE International Conference on Data Mining (ICDM) 2003.
  11. Clustering Binary Data Streams with K-means, by Carlos Ordonez, in the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD) 2003.
  12. Reverse Nearest Neighbor Aggregates Over Data Streams, by Flip Korn, S. Muthukrishnan, Divesh Srivastava, in the International Conference on Very Large Data Bases (VLDB) 2002.
  13. A Framework for Clustering Evolving Data Streams, by Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu, in the International Conference on Very Large Data Bases (VLDB) 2003.
  14. Streaming-Data Algorithms for High-Quality Clustering, by Liadan O'Callaghan, Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani, in the IEEE International Conference on Data Engineering (ICDE) 2001.

    Synopsis Maintenance

  15. Approximate Counts and Quantiles over Sliding Windows, by Arvind Arasu, Gurmeet Singh Manku, in the ACM Symposium on Principles of Database Systems (PODS) 2004.
  16. Distributed Top-K Monitoring, by Brian Babcock, Chris Olston, in the ACM International Conference on Management of Data (SIGMOD) 2003.
  17. Maintaining Stream Statistics over Sliding Windows, by Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani, in the ACM-SIAM Symposium on Discrete Algorithms (SODA) 2002.
  18. Maintaining Variance and k-Medians over Data Stream Windows, by Brian Babcock, Mayur Datar, Rajeev Motwani, Liadan O'Callaghan, in the ACM Symposium on Principles of Database Systems (PODS) 2003.
  19. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time, by Yunyue Zhu, Dennis Shasha, in the International Conference on Very Large Data Bases (VLDB) 2002.
  20. Mining A Stream of Transactions for Customer Patterns, by Diane Lambert, Jose C. Pinheiro, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2001.
  21. Approximate Medians and other Quantiles in One Pass and with Limited Memory, by Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay, in the ACM International Conference on Management of Data (SIGMOD) 1998.
  22. Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets, by Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay, in the ACM International Conference on Management of Data (SIGMOD) 1999.
  23. Synopsis Data Structures for Massive Data Sets, by Phillip B. Gibbons, Yossi Matias, in the ACM-SIAM Symposium on Discrete Algorithms (SODA) 1999.

    Frequent Pattern Mining

  24. What's Hot and What's Not: Tracking Most Frequent Items Dynamically, by Graham Cormode, S. Muthukrishnan, in the ACM Symposium on Principles of Database Systems (PODS) 2003.
  25. Dynamically Maintaining Frequent Items Over A Data Stream, by Cheqing Jin, Weining Qian, Chaofeng Sha, Jeffrey X. Yu, Aoying Zhou, in the Conference on Information and Knowledge Management (CIKM) 2003.
  26. Processing Frequent Itemset Discovery Queries by Division and Set Containment Join Operators, by Ralf Rantzau, in the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD) 2003.
  27. Approximate Frequency Counts over Data Streams, by Gurmeet Singh Manku, Rajeev Motwani, in the International Conference on Very Large Data Bases (VLDB) 2002.
  28. An Algorithm for In-Core Frequent Itemset Mining on Streaming Data, by Ruoming Jin, Gagan Agrawal, submitted for publication 2004.
  29. A Simple Algorithm for Finding Frequent Elements in Streams and Bags, by Richard M. Karp, Scott Shenker, Christos H. Papadimitriou, in the ACM Transactions on Database Systems (TODS) 2003.
  30. Bursty and Hierarchical Structure in Streams, by Jon Kleinberg, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2002.
  31. Online Algorithms for Mining Semi-structured Data Stream, by Tatsuya Asai, Hiroki Arimura, Kenji Abe, Shinji Kawasoe, Setsuo Arikawa, in the IEEE International Conference on Data Mining (ICDM) 2002.
  32. Finding Hierarchical Heavy Hitters in Data Streams, by Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava, in the International Conference on Very Large Data Bases (VLDB) 2003.
  33. Finding Recent Frequent Itemsets Adaptively over Online Data Streams, by Joong Hyuk Chang, Won Suk Lee, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.

    Classification, Regression and Other Learning Methods

  34. A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification, by W. Nick Street, YongSeog Kim, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2001.
  35. A Regression-Based Temporal Pattern Mining Scheme for Data Streams, by Wei-Guang Teng, Ming-Syan Chen, Philip S. Yu, in the International Conference on Very Large Data Bases (VLDB) 2003.
  36. Mining Concept Drifting Data Streams using Ensemble Classifiers, by Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
  37. Mining High-Speed Data Streams, by Pedro Domingos, Geoff Hulten, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2000.
  38. Accurate Decision Trees for Mining High-Speed Data Streams, by Joao Gama, Ricardo Rocha, Pedro Medas, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
  39. Mining Time-Changing Data Streams, by Geoff Hulten, Laurie Spencer, Pedro Domingos, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2001.
  40. Efficient Decision Tree Construction on Streaming Data, by Ruoming Jin, Gagan Agrawal, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
  41. Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift, by Jeremy Z. Kolter, Marcus A. Maloof, in the IEEE International Conference on Data Mining (ICDM) 2003.
  42. Distributed Web Mining using Bayesian Networks from Multiple Data Streams, by R. Chen, K. Sivakumar, H. Kargupta, in the IEEE International Conference on Data Mining (ICDM) 2001.
  43. An approach to online Bayesian learning from multiple data streams, by R. Chen, K. Sivakumar, H. Kargupta, in the European Conference on Principles of Data Mining and Knowledge Discovery (PKDD) 2001.
  44. Adaptive, Hands-Off Stream Mining, by Spiros Papadimitriou, Anthony Brockwell, Christos Faloutsos, in the International Conference on Very Large Data Bases (VLDB) 2003.
  45. Correlating Synchronous And Asynchronous Data Streams, by Sudipto Guha, D. Gunopulos, Nick Koudas, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.