Efficient Label Collection for Image Datasets via Hierarchical Clustering

作者:Maggie Wigness, Bruce A. Draper, J. Ross Beveridge

摘要

Raw visual data used to train classifiers is abundant and easy to gather, but lacks semantic labels that describe visual concepts of interest. These labels are necessary for supervised learning and can require significant human effort to collect. We discuss four labeling objectives that play an important role in the design of frameworks aimed at collecting label information for large training sets while maintaining low human effort: discovery, efficiency, exploitation and accuracy. We introduce a framework that explicitly models and balances these four labeling objectives with the use of (1) hierarchical clustering, (2) a novel interestingness measure that defines structural change within the hierarchy, and (3) an iterative group-based labeling process that exploits relationships between labeled and unlabeled data. Results on benchmark data show that our framework collects labeled training data more efficiently than existing labeling techniques and trains higher performing visual classifiers. Further, we show that our resulting framework is fast and significantly reduces human interaction time when labeling real-world multi-concept imagery depicting outdoor environments.

论文关键词:Efficient label collection, Hierarchical clustering, Image classification, Visual concept discovery

论文评审过程:

论文官网地址:https://doi.org/10.1007/s11263-017-1039-1