Visual SLAM: Why filter?
作者:
Highlights:
•
摘要
While the most accurate solution to off-line structure from motion (SFM) problems is undoubtedly to extract as much correspondence information as possible and perform batch optimisation, sequential methods suitable for live video streams must approximate this to fit within fixed computational bounds. Two quite different approaches to real-time SFM – also called visual SLAM (simultaneous localisation and mapping) – have proven successful, but they sparsify the problem in different ways. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods retain the optimisation approach of global bundle adjustment, but computationally must select only a small number of past frames to process.In this paper we perform a rigorous analysis of the relative advantages of filtering and sparse bundle adjustment for sequential visual SLAM. In a series of Monte Carlo experiments we investigate the accuracy and cost of visual SLAM. We measure accuracy in terms of entropy reduction as well as root mean square error (RMSE), and analyse the efficiency of bundle adjustment versus filtering using combined cost/accuracy measures. In our analysis, we consider both SLAM using a stereo rig and monocular SLAM as well as various different scenes and motion patterns. For all these scenarios, we conclude that keyframe bundle adjustment outperforms filtering, since it gives the most accuracy per unit of computing time.
论文关键词:SLAM,Structure from motion,Bundle adjustment,EKF,Information filter,Monocular vision,Stereo vision
论文评审过程:Received 2 August 2011, Revised 13 December 2011, Accepted 17 February 2012, Available online 3 March 2012.
论文官网地址:https://doi.org/10.1016/j.imavis.2012.02.009