Fast spatial autocorrelation

作者:Anar Amgalan, LR Mujica-Parodi, Steven S. Skiena

摘要

Physical or geographic location proves to be an important feature in many data science models, because many diverse natural and social phenomenon have a spatial component. Spatial autocorrelation measures the extent to which locally adjacent observations of the same phenomenon are correlated. Although statistics like Moran’s I and Geary’s C are widely used to measure spatial autocorrelation, they are slow: All popular methods run in \(\Omega (n^2)\) time, rendering them unusable for large datasets, or long time-courses with moderate numbers of points. We propose a new \(S_A\) statistic based on the notion that the variance observed when merging pairs of nearby clusters should increase slowly for spatially autocorrelated variables. We give a linear-time algorithm to calculate \(S_A\) for a variable with an input agglomeration order (available at https://github.com/aamgalan/spatial_autocorrelation). For a typical dataset of \(n \approx 63,000\) points, our \(S_A\) autocorrelation measure can be computed in 1 second, versus 2 hours or more for Moran’s I and Geary’s C. Through simulation studies, we demonstrate that \(S_A\) identifies spatial correlations in variables generated with spatially-dependent model half an order of magnitude earlier than either Moran’s I or Geary’s C. Finally, we prove several theoretical properties of \(S_A\): namely that it behaves as a true correlation statistic and is invariant under addition or multiplication by a constant.

论文关键词:Algorithm design and analysis, Computational efficiency, Autocorrelation, Biomedical informatics, Magnetic resonance, Clustering algorithms

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-021-01640-x