CVA file: an index structure for high-dimensional datasets

Authors: Jiyuan An, Hanxiong Chen, Kazutaka Furuse, Nobuo Ohbo

Abstract

Similarity search is important in information-retrieval applications, where objects are usually represented as high-dimensional vectors. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. For each data vector, the proposed technique drops the dimensions whose coordinates are less than a critical value. This flexible, datawise dimensionality reduction improves indexing mechanisms for high-dimensional datasets whose distributions are skewed in all coordinates. To apply the proposed technique to information retrieval, a CVA file (compact VA file), which is a revised version of the VA file, is developed. With a CVA file, the size of the index files is reduced further, while the index bounds are kept as tight as possible. The effectiveness of the approach is confirmed on synthetic and real data.
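
The datawise reduction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`reduce_vector`, `quantize`, `compact_approximation`), the assumption that coordinates are non-negative and normalized to [0, 1], and the uniform quantization grid are all hypothetical choices made for illustration. The sketch only shows the general idea of keeping, per vector, the coordinates at or above a critical value and storing a VA-file-style cell number for each kept coordinate.

```python
from typing import Dict, List


def reduce_vector(vec: List[float], critical: float) -> Dict[int, float]:
    """Keep only the coordinates that are at least `critical`.

    The result maps dimension index -> value, so each vector may keep
    a different set of dimensions (a per-vector, "datawise" reduction).
    """
    return {i: x for i, x in enumerate(vec) if x >= critical}


def quantize(value: float, bits: int, lo: float = 0.0, hi: float = 1.0) -> int:
    """Map a value in [lo, hi] to one of 2**bits uniform cells.

    Assumption: a uniform grid; a real index may place cell
    boundaries differently.
    """
    cells = 1 << bits
    cell = int((value - lo) / (hi - lo) * cells)
    return min(cell, cells - 1)  # clamp value == hi into the last cell


def compact_approximation(vec: List[float], critical: float, bits: int) -> Dict[int, int]:
    """Store a cell number only for the coordinates that survive the reduction."""
    return {i: quantize(x, bits) for i, x in reduce_vector(vec, critical).items()}


# Two skewed vectors: most coordinates are near zero.
data = [
    [0.90, 0.02, 0.01, 0.40],
    [0.03, 0.75, 0.00, 0.01],
]
approx = [compact_approximation(v, critical=0.1, bits=2) for v in data]
print(approx)
```

In this toy example the first vector keeps two of its four dimensions and the second keeps one, which is where the saving in index size would come from when the data are skewed. How the dropped coordinates contribute to the distance bounds is not covered by this sketch.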

Keywords: Information retrieval, High-dimensional data, Spatial index, Local dimensionality reduction, Zipf’s law, CVA file

Paper URL: https://doi.org/10.1007/s10115-004-0149-6