Prioritizing Information for the Discovery of Phenomena

Authors: Paul Helman, Rebecca Gore

Abstract

We consider the problem of prioritizing a collection of discrete pieces of information, or transactions. The goal is to rank the transactions so that a user pursuing a subset of them has the best chance of discovering those generated by an interesting source. The problem is shown to differ from traditional classification in several fundamental ways. Ranking algorithms are divided into classes according to the amount of information they may utilize. We show that ranking by the least constrained algorithm class is consistent with classification, whereas ranking by a more constrained class is not. We also show that optimal ranking by the former class is “easy”, while optimal ranking by the latter class is NP-hard. Finally, we present detectors that optimally solve restricted versions of the ranking problem, including symmetric anomaly detection.

Keywords: anomaly detection, Bayesian methods, classification, computational complexity, knowledge discovery

Paper URL: https://doi.org/10.1023/A:1008628802726