Overview of Visual Position Recognition
DOI: https://doi.org/10.31861/sisiot2025.2.02006

Keywords: computer vision, visual position recognition, object recognition, machine learning, neural networks

Abstract
One of the promising areas in the development of artificial intelligence is computer vision – technology that enables computer systems to acquire, analyze, and interpret information from photographs, videos, and other digital images. This opens up extensive opportunities for process automation across various fields, including robotics, autonomous transportation, industry, and medicine. One of the key challenges in computer vision research is the problem of visual position recognition – estimating a robot’s coordinates and orientation from video or photo data captured by its cameras. In robotic systems, precise position recognition is critical for navigation, adaptation to environmental changes, and interaction with objects. The article formulates the problem as approximating the probability density function of the robot’s states over the space of input data. In addition to theoretical aspects, the study examines a set of algorithms currently in use – both classical approaches and neural network-based models – their universality, and their integration potential with other computer vision technologies. These algorithms are interpreted from the perspective of dimensionality reduction of the input data space during localization. Furthermore, a list of relevant datasets for training and testing visual position recognition models is provided, along with key metrics for evaluating their performance. Thus, the study not only summarizes modern approaches to solving this problem but also outlines directions for further technological advancements that can enable more efficient and accurate robot localization in space.
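The abstract mentions key metrics for evaluating visual position recognition models. The most common of these in the VPR literature is recall@K: a query image counts as correctly localized if any of its K most similar database images was taken within a fixed distance of the query's ground-truth position. A minimal sketch, assuming global image descriptors compared by cosine similarity; all names, the descriptor representation, and the 25 m threshold are illustrative, not taken from the article:

```python
import numpy as np

def recall_at_k(db_desc, db_pos, q_desc, q_pos, k=1, dist_thresh=25.0):
    """Recall@K for descriptor-based visual place recognition.

    db_desc, q_desc: (N, D) and (M, D) global image descriptors.
    db_pos, q_pos:   (N, 2) and (M, 2) ground-truth positions in metres.
    A query is a hit if any of its top-K matches lies within
    dist_thresh metres of the query's true position.
    """
    # L2-normalise so the dot product equals cosine similarity.
    db = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    q = q_desc / np.linalg.norm(q_desc, axis=1, keepdims=True)
    sims = q @ db.T                          # (M, N) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of K best matches
    hits = 0
    for i, idx in enumerate(topk):
        dists = np.linalg.norm(db_pos[idx] - q_pos[i], axis=1)
        hits += bool((dists <= dist_thresh).any())
    return hits / len(q_desc)
```

The same retrieval structure underlies both classical descriptors (SIFT/SURF/ORB aggregates) and learned ones; only the way `db_desc` and `q_desc` are computed changes, which is why recall@K is reported across most of the datasets listed below.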
References
D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” [Online]. Available: https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
T. Lindeberg, “Scale invariant feature transform,” [Online]. Available: https://www.researchgate.net/publication/235355151_Scale_Invariant_Feature_Transform
H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” 2008. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1077314207001555
D. Gossow, P. Decker, and D. Paulus, “An evaluation of open source SURF implementations,” 2010. [Online]. Available: https://doi.org/10.1007/978-3-642-20217-9_15
E. Abbadi and A. Hassani, “Panoramic image stitching techniques based on SURF and singular value decomposition,” 2022. [Online]. Available: https://doi.org/10.1007/978-3-030-93417-0_5
A. Riabko and Y. Averyanova, “Comparative analysis of SIFT and SURF methods for local feature detection in satellite imagery,” 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1077314214000391
E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” 2011. [Online]. Available: https://ieeexplore.ieee.org/document/6126544
C. Campos, R. Elvira, J. J. Gómez Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An accurate open-source library for visual, visual-inertial and multi-map SLAM,” [Online]. Available: https://arxiv.org/abs/2007.11898
V. P. Lysechko, B. I. Sadovnykov, O. M. Komar, and O. S. Zhuchenko, “A research of the latest approaches to visual image recognition,” [Online]. Available: https://pdfs.semanticscholar.org/2cc6/befc9db461b20f4cae44a54707ed1257a1d3.pdf
B. Ferrarini, M. Milford, K. D. McDonald-Maier, and S. Ehsan, “Binary neural networks for memory-efficient and effective visual place recognition in changing environments,” [Online]. Available: https://arxiv.org/pdf/2010.00716
S. Dhar, “Visual place recognition. Introduction,” [Online]. Available: https://medium.com/@sd5023/visual-place-recognition-8999307ebb2f
S. Hussaini, M. Milford, and T. Fischer, “Spiking neural networks for visual place recognition via weighted neuronal assignments,” [Online]. Available: https://arxiv.org/pdf/2109.06452
F. Xue, I. Budvytis, and R. Cipolla, “PRAM: Place recognition anywhere model for efficient visual localization,” [Online]. Available: https://arxiv.org/pdf/2404.07785
S. Hussaini, M. Milford, and T. Fischer, “Applications of spiking neural networks in visual place recognition,” [Online]. Available: https://arxiv.org/pdf/2311.13186
C.-Y. Wang, I.-H. Yeh, H.-Y. M. Liao, and C. Yuan, “YOLOv9: Learning what you want to learn using programmable gradient information,” [Online]. Available: https://arxiv.org/pdf/2402.13616
“Visual place recognition – Papers with Code,” [Online]. Available: https://paperswithcode.com/task/visual-place-recognition
R. Dube, D. Dugas, E. Stumm, and J. I. Nieto, “SegMatch: Segment based place recognition in 3D point clouds,” [Online]. Available: https://www.researchgate.net/publication/318693876_SegMatch_Segment_based_place_recognition_in_3D_point_clouds
S. Arshad, “SVS-VPR: A semantic visual and spatial information-based hierarchical visual place recognition for autonomous navigation in challenging environmental conditions,” 2024. [Online]. Available: https://www.mdpi.com/1424-8220/24/3/906
K. Song, S. Zhang, Z. An, Z. Luo, T. Wang, and J. Xie, “Semantics-consistent feature search for self-supervised visual representation learning,” [Online]. Available: https://arxiv.org/pdf/2212.06486
B. Chen, X. Song, H. Shen, and T. Lu, “Hierarchical visual place recognition based on semantic-aggregation,” 2020. [Online]. Available: https://www.mdpi.com/2076-3417/11/20/9540
Oxford Robotics Institute, “Oxford RobotCar Dataset,” [Online]. Available: https://robotcar-dataset.robots.ox.ac.uk/
Meta Platforms Ireland Limited, “Mapillary Vistas Dataset,” [Online]. Available: https://www.mapillary.com/dataset/vistas
A. Geiger, P. Lenz, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” 2012. [Online]. Available: https://www.cvlibs.net/datasets/kitti/
M. Schleiss, F. Rouatbi, and D. Cremers, “VPAIR: Aerial visual place recognition and localization in large-scale outdoor environments,” 2022. [Online]. Available: https://github.com/AerVisLoc/vpair
N. Keetha, A. Mishra, J. Karhade, K. M. Jatavallabhula, S. Scherer, M. Krishna, and S. Garg, “AnyLoc: Towards universal visual place recognition,” [Online]. Available: https://arxiv.org/pdf/2308.00688
F. Xue, B. Chen, X.-D. Zhou, and D. Song, “STA-VPR: Spatio-temporal alignment for visual place recognition,” [Online]. Available: https://arxiv.org/abs/2103.13580
Z. Shi, H. Shi, K. Yang, Z. Yin, Y. Lin, and K. Wang, “PanoVPR: Towards unified perspective-to-equirectangular visual place recognition via sliding windows across the panoramic view,” [Online]. Available: https://arxiv.org/abs/2303.14095
Copyright (c) 2025 Security of Infocommunication Systems and Internet of Things

This work is licensed under a Creative Commons Attribution 4.0 International License.
