Mining the internet: the eighth wonder of the world.
The architecture of complexity: the structure and the dynamics of networks, from the web to the cell.
A Bayesian network classifier with inverse tree structure for voxelwise magnetic resonance image analysis.
Rong Chen, Edward Herskovits
Variable latent semantic indexing.
Anirban Dasgupta, Ravi Kumar, Prabhakar Raghavan, Andrew Tomkins
Mining images on semantics via statistical learning.
Jianping Fan, Hangzai Luo, Mohand-Said Hacid
Rule extraction from linear support vector machines.
Glenn Fung, Sathyakama Sandilya, R. Bharat Rao
Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering.
Bin Gao, Tie-Yan Liu, Xin Zheng, QianSheng Cheng, Wei-Ying Ma
Dimension induced clustering.
Aristides Gionis, Alexander Hinneburg, Spiros Papadimitriou, Panayiotis Tsaparas
Mining tree queries in a graph.
Bart Goethals, Eveline Hoekx, Jan Van den Bussche
Non-redundant clustering with conditional ensembles.
David Gondek, Thomas Hofmann
The predictive power of online chatter.
Daniel Gruhl, Ramanathan V. Guha, Ravi Kumar, Jasmine Novak, Andrew Tomkins
Wavelet synopsis for data streams: minimizing non-euclidean error.
Sudipto Guha, Boulos Harb
Combining email models for false positive reduction.
Shlomo Hershkop, Salvatore J. Stolfo
Nomograms for visualizing support vector machines.
Aleks Jakulin, Martin Mozina, Janez Demsar, Ivan Bratko, Blaz Zupan
Fast discovery of unexpected patterns in data, relative to a Bayesian network.
Szymon Jaroszewicz, Tobias Scheffer
Local sparsity control for naive Bayes with extreme misclassification costs.
A multiple tree algorithm for the efficient association of asteroid observations.
Jeremy Kubica, Andrew W. Moore, Andrew J. Connolly, Robert Jedicke
Combining partitions by probabilistic label aggregation.
Tilman Lange, Joachim M. Buhmann
Feature bagging for outlier detection.
Aleksandar Lazarevic, Vipin Kumar
Simple and effective visual models for gene expression cancer diagnostics.
Gregor Leban, Minca Mramor, Ivan Bratko, Blaz Zupan
Graphs over time: densification laws, shrinking diameters and possible explanations.
Jure Leskovec, Jon M. Kleinberg, Christos Faloutsos
A general model for clustering binary data.
Discovering evolutionary theme patterns from text: an exploration of temporal text mining.
Qiaozhu Mei, ChengXiang Zhai
A distributed learning framework for heterogeneous data sources.
Srujana Merugu, Joydeep Ghosh
Detection of emerging space-time clusters.
Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, Kenny Daniel
On mining cross-graph quasi-cliques.
Jian Pei, Daxin Jiang, Aidong Zhang
Query chains: learning to rank from implicit feedback.
Filip Radlinski, Thorsten Joachims
Robust boosting and its relation to bagging.
On the use of linear programming for unsupervised text classification.
Sampling-based sequential subgroup mining.
Probabilistic workflow mining.
Ricardo Bezerra de Andrade e Silva, Jiji Zhang, James G. Shanahan
Finding partial orders from unordered 0-1 data.
Antti Ukkonen, Mikael Fortelius, Heikki Mannila
Web object indexing using domain knowledge.
Muyuan Wang, Zhiwei Li, Lie Lu, Wei-Ying Ma, Naiyao Zhang
Improving discriminative sequential learning with rare-but-important associations.
Xuan Hieu Phan, Minh Le Nguyen, Tu Bao Ho, Susumu Horiguchi
Summarizing itemset patterns: a profile-based approach.
Xifeng Yan, Hong Cheng, Jiawei Han, Dong Xin
Mining closed relational graphs with connectivity constraints.
Xifeng Yan, Xianghong Jasmine Zhou, Jiawei Han
Anonymity-preserving data collection.
Zhiqiang Yang, Sheng Zhong, Rebecca N. Wright
Cross-relational clustering with user's guidance.
Xiaoxin Yin, Jiawei Han, Philip S. Yu
SVM selective sampling for ranking with application to data retrieval.
Reasoning about sets using redescription mining.
Mohammed Javeed Zaki, Naren Ramakrishnan
A new scheme on privacy-preserving data classification.
Nan Zhang, Shengquan Wang, Wei Zhao
Streaming feature selection using alpha-investing.
Jing Zhou, Dean P. Foster, Robert A. Stine, Lyle H. Ungar
Finding similar files in large document repositories.
George Forman, Kave Eshghi, Stephane Chiocchetti
An approach to spacecraft anomaly detection problem using kernel feature space.
Ryohei Fujimaki, Takehisa Yairi, Kazuo Machida
Price prediction and insurance for online auctions.
Deriving marketing intelligence from online discussion.
Natalie S. Glance, Matthew Hurst, Kamal Nigam, Matthew Siegler, Robert Stockton, Takashi Tomokiyo
Making holistic schema matching robust: an ensemble approach.
Bin He, Kevin Chen-Chuan Chang
Using retrieval measures to assess similarity in mining dynamic web clickstreams.
Olfa Nasraoui, Cesar Cardona, Carlos Rojas
Using relational knowledge discovery to prevent securities fraud.
Jennifer Neville, Özgür Simsek, David D. Jensen, John Komoroske, Kelly Palmer, Henry G. Goldberg
A hit-miss model for duplicate detection in the WHO drug safety database.
G. Niklas Norén, Roland Orre, Andrew Bate
Predicting the product purchase patterns of corporate customers.
Bhavani Raskutti, Alan Herschtal
Modeling and predicting personal information dissemination behavior.
Xiaodan Song, Ching-Yung Lin, Belle L. Tseng, Ming-Ting Sun
Email data cleaning.
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
Dynamic syslog mining for network failure monitoring.
Kenji Yamanishi, Yuko Maruyama
Enhancing the lift under budget constraints: an application in the mutual fund industry.
Lian Yan, Michael Fassino, Patrick Baldasare
Learning to predict train wheel failures.
Chunsheng Yang, Sylvain Létourneau
Towards exploratory test instance specific algorithms for high dimensional classification.
Charu C. Aggarwal
Model-based overlapping clustering.
Arindam Banerjee, Chase Krumpelman, Joydeep Ghosh, Sugato Basu, Raymond J. Mooney
Integration of profile hidden Markov model output into association rule mining.
Christopher Besemann, Anne Denton
Scalable discovery of hidden emails from large folders.
Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou
Web mining from competitors' websites.
Xin Chen, Yi-fang Brook Wu
LIPED: HMM-based life profiles for adaptive event detection.
Chien Chin Chen, Meng Chang Chen, Ming-Syan Chen
Parallel mining of closed sequential patterns.
Shengnan Cong, Jiawei Han, David A. Padua
Creating social networks to improve peer-to-peer networking.
Andrew S. Fast, David D. Jensen, Brian Neil Levine
Unweaving a web of documents.
Ramanathan V. Guha, Ravi Kumar, D. Sivakumar, Ravi Sundaram
Cinda Heeren, Leonard Pitt
Application of kernels to link analysis.
Takahiko Ito, Masashi Shimbo, Taku Kudo, Yuji Matsumoto
Privacy-preserving distributed k-means clustering over arbitrarily partitioned data.
Geetha Jagannathan, Rebecca N. Wright
Simultaneous optimization of complex mining tasks with a knowledgeable cache.
Ruoming Jin, Kaushik Sinha, Gagan Agrawal
Discovering frequent topological structures from graph datasets.
Ruoming Jin, Chao Wang, Dmitrii Polshakov, Srinivasan Parthasarathy, Gagan Agrawal
A maximum entropy web recommendation system: combining collaborative and content features.
Xin Jin, Yanzan Zhou, Bamshad Mobasher
Information retrieval based on collaborative filtering with latent interest semantic map.
Noriaki Kawamae, Katsumi Takahashi
Determining an author's native language by mining a text for errors.
Moshe Koppel, Jonathan Schler, Kfir Zigdon
A fast kernel-based multilevel algorithm for graph clustering.
Inderjit S. Dhillon, Yuqiang Guan, Brian Kulis
Co-clustering by block value decomposition.
Bo Long, Zhongfei (Mark) Zhang, Philip S. Yu
Daniel Lowd, Christopher Meek
Estimating missed actual positives using independent classifiers.
Sandeep Mane, Jaideep Srivastava, San-Yih Hwang
Efficient computations via scalable sparse kernel partial least squares and boosted latent features.
Optimizing time series discretization for knowledge discovery.
Fabian Mörchen, Alfred Ultsch
Key semantics extraction by dependency tree mining.
Satoshi Morinaga, Hiroki Arimura, Takahiro Ikeda, Yosuke Sakao, Susumu Akamine
Density-based clustering of uncertain data.
Hans-Peter Kriegel, Martin Pfeifle
Evaluating similarity measures: a large-scale study in the orkut social network.
Ellen Spertus, Mehran Sahami, Orkut Buyukkokten
A hybrid unsupervised approach for document clustering.
Mihai Surdeanu, Jordi Turmo, Alicia Ageno
Mining comparable bilingual text corpora for cross-language information integration.
Tao Tao, ChengXiang Zhai
Regression error characteristic surfaces.
Formulating distance functions via the kernel trick.
Gang Wu, Edward Y. Chang, Navneet Panda
Combining proactive and reactive predictions for data streams.
Ying Yang, Xindong Wu, Xingquan Zhu
A generalized framework for mining spatio-temporal patterns in scientific data.
Hui Yang, Srinivasan Parthasarathy, Sameep Mehta
Building connected neighborhood graphs for isometric data embedding.
Pattern lattice traversal by selective jumps.
Osmar R. Zaïane, Mohammad El-Hajj
CLICKS: an effective algorithm for mining subspace clusters in categorical datasets.
Mohammed Javeed Zaki, Markus Peters, Ira Assent, Thomas Seidl
Fast window correlations over uncooperative time series.
Richard Cole, Dennis E. Shasha, Xiaojian Zhao
Failure detection and localization in component based systems by online tracking.
Haifeng Chen, Guofei Jiang, Cristian Ungureanu, Kenji Yoshihira
Generation of synthetic data sets for evaluating the accuracy of knowledge discovery systems.
Daniel R. Jeske, Behrokh Samadi, Pengyue J. Lin, Lan Ye, Sean Cox, Rui Xiao, Ted Younglove, Minh Ly, Douglas Holt, Ryan Rich
Data mining in the chemical industry.
Alex N. Kalos, Tim Rey
Mining risk patterns in medical data.
Jiuyong Li, Ada Wai-Chee Fu, Hongxing He, Jie Chen, Huidong Jin, Damien McAullay, Graham J. Williams, Ross Sparks, Chris Kelman
An integrated framework on mining logs files for computing system management.
Tao Li, Feng Liang, Sheng Ma, Wei Peng
Automated detection of frontal systems from numerical model-generated data.
Xiang Li, Rahul Ramachandran, Sara J. Graves, Sunil Movva, Bilahari Akkiraju, David Emmitt, Steven Greco, Robert Atlas, Joseph Terry, Juan-Carlos Jusem
Disease progression modeling from historical clinical databases.
Ronald K. Pearson, Robert J. Kingan, Alan Hochberg
Mining rare and frequent events in multi-camera surveillance video using self-organizing maps.
Valery A. Petrushin
Short term performance forecasting in enterprise systems.
Rob Powers, Moisés Goldszmidt, Ira Cohen
A multinomial clustering model for fast simulation of computer architecture designs.
Kaushal Sanghai, Ting Su, Jennifer G. Dy, David R. Kaeli
Pattern-based similarity search for microarray data.
Haixun Wang, Jian Pei, Philip S. Yu