SIGKDD (KDD) 2008 Paper List

Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008.

Pictor: an interactive system for importing data from a website.
A software system for buzz-based recommendations.
Morpheus: interactive exploration of subspace clustering.
CRO: a system for online review structurization.
Pattern-Miner: integrated management and mining over data mining models.
DiMaC: a disguised missing data cleaning tool.
An integrated system for automatic customer satisfaction analysis in the services industry.
Using tagflake for condensing navigable tag hierarchies from tag clouds.
Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface.
An inductive database prototype based on virtual mining views.
Social networks: looking ahead.
Learning from multi-topic web documents for contextual advertisement.
Using predictive analysis to improve invoice-to-cash collection.
Privacy-preserving cox regression for survival analysis.
Heterogeneous data fusion for alzheimer's disease study.
A visual-analytic toolkit for dynamic interaction graphs.
Experimental comparison of scalable online ad serving.
Tagmark: reliable estimations of RFID tags for business processes.
ArnetMiner: extraction and mining of academic social networks.
Identifying domain expertise of developers from source code.
Scalable and near real-time burst detection from eCommerce queries.
Temporal pattern discovery for trends and transient effects: its application to patient records.
Anticipating annotations and emerging trends in biomedical literature.
Customer targeting models using actively-selected web content.
Spotting out emerging artists using geo-aware analysis of P2P query strings.
Automated cyclone discovery and tracking using knowledge sharing in multiple heterogeneous satellite data.
Data mining using high performance data clouds: experimental studies using sector and sphere.
Text classification, business intelligence, and interactivity: automating C-Sat analysis for services industry.
Learning methods for lung tumor markerless gating in image-guided radiotherapy.
Detecting privacy leaks using corpus-based association rules.
The persuasive phase of visualization.
Context-aware query suggestion by mining click-through and session data.
Identifying authoritative actors in question-answering forums: the case of Yahoo! answers.
Land cover change detection: a case study.
Volatile correlation computation: a checkpoint view.
Identifying biologically relevant genes via multiple heterogeneous data sources.
Cuts3vm: a fast semi-supervised svm algorithm.
Fastanova: an efficient algorithm for genome-wide association study.
Categorizing and mining concept drifting data streams.
Stable feature selection via dense feature groups.
Training structural svms with kernels using sampled cuts.
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances.
Local peculiarity factor and its application in outlier detection.
Anonymizing transaction databases for publication.
Succinct summarization of transactional databases: an overlapped hyperrectangle scheme.
Asymmetric support vector machines: low false-positive learning under the user tolerance.
SAIL: summation-based incremental learning for information-theoretic clustering.
Information extraction from Wikipedia: moving down the long tail.
A unified approach for schema matching, coreference and canonicalization.
Building semantic kernels for text classification using wikipedia.
Model-based document clustering with a collapsed gibbs sampler.
Can complex network metrics predict the behavior of NBA teams?
Colibri: fast mining of large static and dynamic graphs.
Community evolution in dynamic multi-mode networks.
Hypergraph spectral learning for multi-label classification.
A bayesian mixture model with linear regression mixing proportions.
Relational learning via collective matrix factorization.
Semi-supervised approach to rapid and reliable labeling of large data sets.
Efficient computation of personal aggregate queries on blogs.
iSAX: indexing and mining terabyte sized time series.
Get another label? improving data quality and data mining using multiple, noisy labelers.
Efficient ticket routing by resolution sequence mining.
Mobile call graphs: beyond power-law and lognormal distributions.
Knowledge discovery of semantic relationships between words using nonparametric bayesian graph model.
Partial least squares regression for graph mining.
Fast collapsed gibbs sampling for latent dirichlet allocation.
Discrimination-aware data mining.
Classification with partial labels.
Joint latent topic models for text and citations.
Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering.
Weighted graphs and disconnected components: patterns and a generator.
On updates that constrain the features' connections during learning.
Multi-class cost-sensitive boosting with p-norm loss functions.
Mining multi-faceted overviews of arbitrary topics in a text collection.
Spectral domain-transfer learning.
Active learning with direct query construction.
Cut-and-stitch: efficient parallel learning of linear dynamical systems on smps.
Microscopic evolution of social networks.
Stream prediction using a generative model based on frequent episodes in event sequences.
Angle-based outlier detection in high-dimensional data.
The structure of information pathways in a social communication network.
Factorization meets the neighborhood: a multifaceted collaborative filtering model.
Constructing comprehensive summaries of large event sequences.
A sequential dual method for large scale multi-class linear svms.
Effective and efficient itemset pattern summarization: regression-based approaches.
Mining preferences from superior and inferior examples.
Extracting shared subspace for multi-label classification.
Automatic identification of quasi-experimental designs for discovering causal knowledge.
Probabilistic latent semantic visualization: topic model for visualizing documents.
Fast logistic regression for text categorization with variable-length n-grams.
Interpretable nonnegative matrix decompositions.
Bridging centrality: graph mining from element level to group level.
Simultaneous tensor subspace selection and clustering: the equivalence of high order svd and k-means clustering.
Permu-pattern: discovery of mutable permutation patterns with proximity constraint.
Unsupervised deduplication using cross-field dependencies.
Quantitative evaluation of approximate frequent pattern mining algorithms.
Banded structure in binary matrices.
Knowledge transfer via multiple model local structure mapping.
Entity categorization over large document collections.
Composition attacks and auxiliary information in data privacy.
Using ghost edges for classification in sparsely labeled networks.
SPIRAL: efficient and exact model identification for hidden Markov models.
Scaling up text classification for large file systems.
Direct mining of discriminative and essential frequent patterns via model-based search tree.
Locality sensitive hash functions based on concomitant rank order statistics.
Learning classifiers from only positive and unlabeled data.
Constraint programming for itemset mining.
Structured metric learning for high dimensional problems.
De-duping URLs via rewrite rules.
Bypass rates: reducing query abandonment using negative inferences.
Anomaly pattern detection in categorical datasets.
Feedback effects between similarity and social influence in online communities.
Automatic record linkage using seeded nearest neighbour and support vector machine classification.
Reconstructing chemical reaction networks: data mining meets system identification.
Semi-supervised learning with data calibration for long-term time series forecasting.
FAST: a roc-based feature selection metric for small samples and imbalanced data classification problems.
Combinational collaborative filtering for personalized community recommendation.
Learning subspace kernels for classification.
Partitioned logistic regression for spam filtering.
Structured learning for non-smooth ranking losses.
Generating succinct titles for web URLs.
The cost of privacy: destruction of data-mining utility in anonymized data publishing.
Unsupervised feature selection for principal components analysis.
Topical query decomposition.
Effective label acquisition for collective classification.
Mining adaptively frequent closed unlabeled rooted trees in data streams.
Structured entity identification and document categorization: two tasks with one joint model.
Efficient semi-streaming algorithms for local triangle counting in massive graphs.
Influence and correlation in social networks.
Genesis of postal address reading, current state and future prospects: thirty years of pattern recognition on duty of postal services.
The future of image search.
Regularization paths and coordinate descent.
Large scale data analysis and modelling in online services and advertising.
Internet advertising and optimal auction design.