Mismatched models, wrong results, and dreadful decisions: on choosing appropriate data mining tools.
David J. Hand
Mining web logs: applications and challenges.
Randomization methods in data mining.
Data mining at NASA: from theory to applications.
Ashok N. Srivastava
Network science: an introduction to recent statistical approaches.
Open standards and cloud computing: KDD-2009 panel report.
Michael Zeller, Robert Grossman, Christoph Lingenfelder, Michael R. Berthold, Erik Marcade, Rick Pechter, Mike Hoskins, Wayne Thompson, Rich Holada
Regression-based latent factor models.
Deepak Agarwal, Bee-Chung Chen
Frequent pattern mining with uncertain data.
Charu C. Aggarwal, Yan Li, Jianyong Wang, Jing Wang
Structured correspondence topic models for mining captioned figures in biological literature.
Amr Ahmed, Eric P. Xing, William W. Cohen, Robert F. Murphy
Name-ethnicity classification from open sources.
Anurag Ambekar, Charles B. Ward, Jahangir Mohammed, Swapna Male, Steven Skiena
Detection of unique temporal segments by information theoretic meta-clustering.
Shin Ando, Einoshin Suzuki
Collusion-resistant anonymous data collection method.
Mafruz Zaman Ashrafi, See-Kiong Ng
A viewpoint-based approach for interaction graph analysis.
Sitaram Asur, Srinivasan Parthasarathy
Optimizing web traffic via the media scheduling problem.
Lars Backstrom, Jon M. Kleinberg, Ravi Kumar
Improving clustering stability with combinatorial MRFs.
Ron Bekkerman, Martin Scholz, Krishnamurthy Viswanathan
Temporal mining for interactive workflow data analysis.
Michele Berlingerio, Fabio Pinelli, Mirco Nanni, Fosca Giannotti
Probabilistic frequent itemset mining in uncertain databases.
Thomas Bernecker, Hans-Peter Kriegel, Matthias Renz, Florian Verhein, Andreas Züfle
The offset tree for learning with partial labels.
Alina Beygelzimer, John Langford
New ensemble methods for evolving data streams.
Albert Bifet, Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Ricard Gavaldà
CoCo: coding cost for parameter-free outlier detection.
Christian Böhm, Katrin Haegler, Nikola S. Müller, Claudia Plant
Efficient anomaly monitoring over moving object trajectory streams.
Yingyi Bu, Lei Chen, Ada Wai-Chee Fu, Dawei Liu
Connections between the lines: augmenting social networks with text.
Jonathan Chang, Jordan L. Boyd-Graber, David M. Blei
Extracting discriminative concepts for domain adaptation in text mining.
Bo Chen, Wai Lam, Ivor W. Tsang, Tak-Lam Wong
Constrained optimization for validation-guided conditional random field learning.
Minmin Chen, Yixin Chen, Michael R. Brent, Aaron E. Tenney
Efficient influence maximization in social networks.
Wei Chen, Yajun Wang, Siyu Yang
Large-scale behavioral targeting.
Ye Chen, Dmitry Pavlov, John F. Canny
On compressing social networks.
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, Prabhakar Raghavan
Regret-based online ranking for a growing digital library.
A generalized Co-HITS algorithm and its application to bipartite graphs.
Hongbo Deng, Michael R. Lyu, Irwin King
Mining for the most certain predictions from dyadic data.
Meghana Deodhar, Joydeep Ghosh
Efficiently learning the accuracy of labeling sources for selective sampling.
Pinar Donmez, Jaime G. Carbonell, Jeff G. Schneider
Large human communication networks: patterns and a utility-driven generator.
Nan Du, Christos Faloutsos, Bai Wang, Leman Akoglu
Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology.
Murat Dundar, E. Daniel Hirleman, Arun K. Bhunia, J. Paul Robinson, Bartek Rajwa
Turning down the noise in the blogosphere.
Khalid El-Arini, Gaurav Veda, Dafna Shahaf, Carlos Guestrin
Feature shaping for linear SVM classifiers.
George Forman, Martin Scholz, Shyamsundar Rajaram
A multi-relational approach to spatial classification.
Richard Frank, Martin Ester, Arno J. Knobbe
Scalable pseudo-likelihood estimation in hybrid random fields.
Antonino Freno, Edmondo Trentin, Marco Gori
Issues in evaluation of stream learning algorithms.
João Gama, Raquel Sebastião, Pedro Pereira Rodrigues
Heterogeneous source consensus learning via decision propagation and negotiation.
Jing Gao, Wei Fan, Yizhou Sun, Jiawei Han
Multi-focal learning and its application to customer service support.
Yong Ge, Hui Xiong, Wenjun Zhou, Ramendra K. Sahoo, Xiaofeng Gao, Weili Wu
Co-clustering on manifolds.
Quanquan Gu, Jie Zhou
Analyzing patterns of user content generation in online social networks.
Lei Guo, Enhua Tan, Songqing Chen, Xiaodong Zhang, Yihong Eric Zhao
Tell me something I don't know: randomization strategies for iterative data mining.
Sami Hanhijärvi, Markus Ojala, Niko Vuokko, Kai Puolamäki, Nikolaj Tatti, Heikki Mannila
Exploiting Wikipedia as external knowledge for document clustering.
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, Xiaohua Zhou
TrustWalker: a random walk model for combining trust-based and item-based recommendation.
Mohsen Jamali, Martin Ester
Drosophila gene expression pattern annotation using sparse features and term-term interactions.
Shuiwang Ji, Lei Yuan, Ying-Xin Li, Zhi-Hua Zhou, Sudhir Kumar, Jieping Ye
Cartesian contour: a concise representation for a collection of frequent sets.
Ruoming Jin, Yang Xiang, Lin Liu
Genre-based decomposition of email class noise.
Aleksander Kolcz, Gordon V. Cormack
Characteristic relational patterns.
Arne Koopman, Arno Siebes
Collaborative filtering with temporal dynamics.
Collective annotation of Wikipedia entities in web text.
Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, Soumen Chakrabarti
Finding a team of experts in social networks.
Theodoros Lappas, Kun Liu, Evimaria Terzi
On burstiness-aware search for document sequences.
Theodoros Lappas, Benjamin Arai, Manolis Platakis, Dimitrios Kotsakos, Dimitrios Gunopulos
Improving data mining utility with projective sampling.
Meme-tracking and the dynamics of the news cycle.
Jure Leskovec, Lars Backstrom, Jon M. Kleinberg
DynaMMo: mining and summarization of coevolving sequences with missing values.
Lei Li, James McCann, Nancy S. Pollard, Christos Faloutsos
On the tradeoff between privacy and utility in data publishing.
Tiancheng Li, Ninghui Li
MetaFac: community discovery via relational hypergraph factorization.
Yu-Ru Lin, Jimeng Sun, Paul Castro, Ravi B. Konuru, Hari Sundaram, Aisling Kelliher
BBM: Bayesian browsing model from petabyte-scale data.
Chao Liu, Fan Guo, Christos Faloutsos
Large-scale sparse logistic regression.
Jun Liu, Jianhui Chen, Jieping Ye
Classification of software behaviors for failure detection: a discriminative pattern mining approach.
David Lo, Hong Cheng, Jiawei Han, Siau-Cheng Khoo, Chengnian Sun
Consensus group stable feature selection.
Steven Loscalzo, Lei Yu, Chris H. Q. Ding
Grouped graphical Granger modeling methods for temporal causal modeling.
Aurelie C. Lozano, Naoki Abe, Yan Liu, Saharon Rosset
Spatial-temporal causal modeling for climate change attribution.
Aurelie C. Lozano, Hongfei Li, Alexandru Niculescu-Mizil, Yan Liu, Claudia Perlich, Jonathan R. M. Hosking, Naoki Abe
Using graph-based metrics with empirical risk minimization to speed up active learning on networked data.
Sofus A. Macskassy
Characterizing individual communication patterns.
R. Dean Malmgren, Jake M. Hofman, Luis A. Nunes Amaral, Duncan J. Watts
Large-scale graph mining using backbone refinement classes.
Andreas Maunz, Christoph Helma, Stefan Kramer
Differentially private recommender systems: building privacy into the Netflix Prize contenders.
Frank McSherry, Ilya Mironov
WhereNext: a location predictor on trajectory pattern mining.
Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti
Correlated itemset mining in ROC space: a constraint programming approach.
Siegfried Nijssen, Tias Guns, Luc De Raedt
TANGENT: a novel, 'Surprise me', recommendation algorithm.
Kensuke Onuma, Hanghang Tong, Christos Faloutsos
Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering.
Rong Pan, Martin Scholz
An association analysis approach to biclustering.
Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers, Vipin Kumar
CP-summary: a concise representation for browsing frequent itemsets.
Ardian Kristanto Poernomo, Vivekanand Gopalkrishnan
Towards efficient mining of proportional fault-tolerant frequent itemsets.
Ardian Kristanto Poernomo, Vivekanand Gopalkrishnan
Audience selection for on-line brand advertising: privacy-friendly social network targeting.
Foster J. Provost, Brian Dalessandro, Rod Hook, Xiaohan Zhang, Alan Murray
A principled and flexible framework for finding alternative clusterings.
Zijie Qi, Ian Davidson
Learning optimal ranking with tensor factorization for tag recommendation.
Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, Lars Schmidt-Thieme
Scalable graph clustering using stochastic flows: applications to community discovery.
Venu Satuluri, Srinivasan Parthasarathy
Measuring the effects of preprocessing decisions and network forces in dynamic network analysis.
Jerry Scripps, Pang-Ning Tan, Abdol-Hossein Esfahanian
Mining discrete patterns via binary matrix factorization.
Bao-Hong Shen, Shuiwang Ji, Jieping Ye
Anomalous window discovery through scan statistics for linear intersecting paths (SSLIP).
Lei Shi, Vandana Pursnani Janeja
User grouping behavior in online forums.
Xiaolin Shi, Jun Zhu, Rui Cai, Lei Zhang
Causality quantification and its applications: structuring and modeling of multivariate time series.
Takashi Shibuya, Tatsuya Harada, Yasuo Kuniyoshi
Ranking-based clustering of heterogeneous information networks with star network schema.
Yizhou Sun, Yintao Yu, Jiawei Han
Social influence analysis in large-scale networks.
Jie Tang, Jimeng Sun, Chi Wang, Zi Yang
Relational learning via latent social dimensions.
Lei Tang, Huan Liu
Constant-factor approximation algorithms for identifying dynamic communities.
Chayant Tantipathananandh, Tanya Y. Berger-Wolf
DOULION: counting triangles in massive graphs with a coin.
Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, Christos Faloutsos
Category detection using hierarchical mean shift.
Pavan Vatturi, Weng-Keen Wong
Learning, indexing, and diagnosing network faults.
Ting Wang, Mudhakar Srivatsa, Dakshi Agrawal, Ling Liu
Mining broad latent query aspects from search sessions.
Xuanhui Wang, Deepayan Chakrabarti, Kunal Punera
Adapting the right measures for K-means clustering.
Junjie Wu, Hui Xiong, Jian Chen
A LRT framework for fast spatial anomaly detection.
Mingxi Wu, Xiuyao Song, Chris Jermaine, Sanjay Ranka, John Gums
Quantification and semi-supervised classification methods for handling changes in class distribution.
Jack Chongjie Xue, Gary M. Weiss
Fast approximate spectral clustering.
Donghui Yan, Ling Huang, Michael I. Jordan
Effective multi-label active learning for text classification.
Bishan Yang, Jian-Tao Sun, Tengjiao Wang, Zheng Chen
Combining link and content for community detection: a discriminative approach.
Tianbao Yang, Rong Jin, Yun Chi, Shenghuo Zhu
Efficient methods for topic model inference on streaming document collections.
Limin Yao, David M. Mimno, Andrew McCallum
Time series shapelets: a new primitive for data mining.
Lexiang Ye, Eamonn J. Keogh
Exploring social tagging graph for web object classification.
Zhijun Yin, Rui Li, Qiaozhu Mei, Jiawei Han
Mining social networks for personalized email prioritization.
Shinjae Yoo, Yiming Yang, Frank Lin, Il-Chul Moon
Learning patterns in the dynamics of biological networks.
Chang Hun You, Lawrence B. Holder, Diane J. Cook
Toward autonomic grids: analyzing the job flow with affinity streaming.
Xiangliang Zhang, Cyril Furtlehner, Julien Perez, Cécile Germain-Renaud, Michèle Sebag
Parallel community detection on large networks with propinquity dynamics.
Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou
Co-evolution of social and affiliation networks.
Elena Zheleva, Hossam Sharara, Lise Getoor
Information theoretic regularization for semi-supervised boosting.
Lei Zheng, Shaojun Wang, Yan Liu, Chi-Hoon Lee
Cross domain distribution adaptation via kernel mapping.
Erheng Zhong, Wei Fan, Jing Peng, Kun Zhang, Jiangtao Ren, Deepak S. Turaga, Olivier Verscheure
Mining rich session context to improve web search.
Guangyu Zhu, Gilad Mishne
Primal sparse max-margin Markov networks.
Jun Zhu, Eric P. Xing, Bo Zhang
Augmenting the generalized Hough transform to enable the mining of petroglyphs.
Qiang Zhu, Xiaoyue Wang, Eamonn J. Keogh, Sang-Hee Lee
Modeling and predicting user behavior in sponsored search.
Josh Attenberg, Sandeep Pandey, Torsten Suel
Enabling analysts in managed services for CRM analytics.
Indrajit Bhattacharya, Shantanu Godbole, Ajay Gupta, Ashish Verma, Jeff Achtermann, Kevin English
Applying syntactic similarity algorithms for enterprise information management.
Ludmila Cherkasova, Kave Eshghi, Charles B. Morrey III, Joseph Tucek, Alistair C. Veitch
A case study of behavior-driven conjoint analysis on Yahoo!: front page today module.
Wei Chu, Seung-Taek Park, Todd Beaupre, Nitin Motgi, Amit Phadke, Seinjuti Chakraborty, Joe Zachariah
Seven pitfalls to avoid when running controlled experiments on the web.
Thomas Crook, Brian Frasca, Ron Kohavi, Roger Longbotham
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data.
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joydeep Ghosh
Entity discovery and assignment for opinion mining applications.
Xiaowen Ding, Bing Liu, Lei Zhang
Migration motif: a spatial-temporal pattern mining approach for financial markets.
Xiaoxi Du, Ruoming Jin, Liang Ding, Victor E. Lee, John H. Thornton Jr.
Improving classification accuracy using automatically extracted training data.
Ariel Fuxman, Anitha Kannan, Andrew B. Goldberg, Rakesh Agrawal, Panayiotis Tsaparas, John C. Shafer
Address standardization with latent semantic association.
Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang, Zhong Su
Catching the drift: learning broad matches from clickthrough data.
Sonal Gupta, Mikhail Bilenko, Matthew Richardson
COA: finding novel patents through text analysis.
Mohammad Al Hasan, W. Scott Spangler, Thomas D. Griffin, Alfredo Alba
Network anomaly detection based on Eigen equation compression.
Shunsuke Hirose, Kenji Yamanishi, Takayuki Nakata, Ryohei Fujimaki
OpinionMiner: a novel machine learning system for web opinion mining and extraction.
Wei Jin, Hung Hay Ho, Rohini K. Srihari
Query result clustering for object-level search.
Jongwuk Lee, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen
Grocery shopping recommendations based on basket-sensitive random walk.
Ming Li, M. Benjamin Dias, Ian H. Jarman, Wael El-Deredy, Paulo J. G. Lisboa
Learning dynamic temporal graphs for oil-production equipment monitoring system.
Yan Liu, Jayant R. Kalagnanam, Oivind Johnsen
Towards combining web classification and web information extraction: a case study.
Ping Luo, Fen Lin, Yuhong Xiong, Yong Zhao, Zhongzhi Shi
Beyond blacklists: learning to detect malicious web sites from suspicious URLs.
Justin Ma, Lawrence K. Saul, Stefan Savage, Geoffrey M. Voelker
Clustering event logs using iterative partitioning.
Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios
SNARE: a link analytic system for graph labeling and risk detection.
Mary McGlohon, Stephen Bay, Markus G. Anderle, David M. Steier, Christos Faloutsos
Sentiment analysis of blogs by combining lexical knowledge with text classification.
Prem Melville, Wojciech Gryc, Richard D. Lawrence
Anonymizing healthcare data: a case study on the blood transfusion service.
Noman Mohammed, Benjamin C. M. Fung, Patrick C. K. Hung, Cheuk-kwong Lee
Towards a universal marketplace over the web: statistical multi-label classification of service provider forms with simulated annealing.
Kivanc M. Ozonat, Donald Young
Sustainable operation and management of data center chillers using temporal data mining.
Debprakash Patnaik, Manish Marwah, Ratnesh K. Sharma, Naren Ramakrishnan
BGP-lens: patterns and anomalies in internet routing updates.
B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos, Christos Faloutsos
Predicting bounce rates in sponsored search advertisements.
D. Sculley, Robert G. Malkin, Sugato Basu, Roberto J. Bayardo
Mining brain region connectivity for Alzheimer's disease study via sparse inverse covariance estimation.
Liang Sun, Rinkal Patel, Jun Liu, Kewei Chen, Teresa Wu, Jing Li, Eric Reiman, Jieping Ye
Can we learn a template-independent wrapper for news article extraction from a single training site?
Junfeng Wang, Chun Chen, Can Wang, Jian Pei, Jiajun Bu, Ziyu Guan, Wei Vivian Zhang
PSkip: estimating relevance ranking quality from web search clickthrough data.
Kuansan Wang, Toby Walker, Zijian Zheng
Named entity mining from click-through data using weakly supervised latent Dirichlet allocation.
Gu Xu, Shuang-Hong Yang, Hang Li
Incorporating site-level knowledge for incremental crawling of web forums: a list-wise strategy.
Jiang-Ming Yang, Rui Cai, Chunsong Wang, Hua Huang, Lei Zhang, Wei-Ying Ma
Intelligent file scoring system for malware detection from the gray list.
Yanfang Ye, Tao Li, Qingshan Jiang, Zhixue Han, Li Wan
OLAP on search logs: an infrastructure supporting data-driven applications in search engines.
Bin Zhou, Daxin Jiang, Jian Pei, Hang Li