Overview of Visual Position Recognition
DOI: https://doi.org/10.31861/sisiot2025.2.02006

Keywords: computer vision, visual position recognition, object recognition, machine learning, neural networks

Abstract
One of the promising areas in the development of artificial intelligence is computer vision – technology that enables computer systems to acquire, analyze, and interpret information from photographs, videos, and other digital images. This opens up extensive opportunities for process automation across various fields, including robotics, autonomous transportation, industry, and medicine. One of the key challenges in computer vision research is the problem of visual position recognition – estimating a robot’s coordinates and orientation from video or photo data captured by its cameras. In robotic systems, precise position recognition is critical for navigation, adaptation to environmental changes, and interaction with objects. The article formulates the problem as approximating the probability density function of the robot’s states over the space of input data. In addition to theoretical aspects, the study examines a set of algorithms currently in use – both classical approaches and neural network-based models – their universality, and their integration potential with other computer vision technologies. These algorithms are interpreted from the perspective of dimensionality reduction of the input data space during localization. Furthermore, a list of relevant datasets for training and testing visual position recognition models is provided, along with key metrics for evaluating their performance. Thus, the study not only summarizes modern approaches to solving this problem but also outlines directions for further technological advancements that can enable more efficient and accurate robot localization in space.
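The abstract mentions key metrics for evaluating visual position recognition models. The most common of these in the VPR literature is recall@K: a query image counts as correctly localized if any of its K most similar database images was taken within a fixed distance of the query's ground-truth position. A minimal sketch, assuming global image descriptors compared by cosine similarity; all names, the descriptor representation, and the 25 m threshold are illustrative, not taken from the article:

```python
import numpy as np

def recall_at_k(db_desc, db_pos, q_desc, q_pos, k=1, dist_thresh=25.0):
    """Recall@K for descriptor-based visual place recognition.

    db_desc, q_desc: (N, D) and (M, D) global image descriptors.
    db_pos, q_pos:   (N, 2) and (M, 2) ground-truth positions in metres.
    A query is a hit if any of its top-K matches lies within
    dist_thresh metres of the query's true position.
    """
    # L2-normalise so the dot product equals cosine similarity.
    db = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    q = q_desc / np.linalg.norm(q_desc, axis=1, keepdims=True)
    sims = q @ db.T                          # (M, N) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of K best matches
    hits = 0
    for i, idx in enumerate(topk):
        dists = np.linalg.norm(db_pos[idx] - q_pos[i], axis=1)
        hits += bool((dists <= dist_thresh).any())
    return hits / len(q_desc)
```

The same retrieval structure underlies both classical descriptors (SIFT/SURF/ORB aggregates) and learned ones; only the way `db_desc` and `q_desc` are computed changes, which is why recall@K is reported across most of the datasets listed below.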
References
D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” [Online]. Available: https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
T. Lindeberg, “Scale invariant feature transform,” [Online]. Available: https://www.researchgate.net/publication/235355151_Scale_Invariant_Feature_Transform
H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” 2008. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1077314207001555
D. Gossow, P. Decker, and D. Paulus, “An evaluation of open source SURF implementations,” 2010. [Online]. Available: https://doi.org/10.1007/978-3-642-20217-9_15
E. Abbadi and A. Hassani, “Panoramic image stitching techniques based on SURF and singular value decomposition,” 2022. [Online]. Available: https://doi.org/10.1007/978-3-030-93417-0_5
A. Riabko and Y. Averyanova, “Comparative analysis of SIFT and SURF methods for local feature detection in satellite imagery,” 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1077314214000391
E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” 2011. [Online]. Available: https://ieeexplore.ieee.org/document/6126544
C. Campos, R. Elvira, J. J. Gómez Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An accurate open-source library for visual, visual-inertial and multi-map SLAM,” [Online]. Available: https://arxiv.org/abs/2007.11898
V. P. Lysechko, B. I. Sadovnykov, O. M. Komar, and O. S. Zhuchenko, “A research of the latest approaches to visual image recognition,” [Online]. Available: https://pdfs.semanticscholar.org/2cc6/befc9db461b20f4cae44a54707ed1257a1d3.pdf
B. Ferrarini, M. Milford, K. D. McDonald-Maier, and S. Ehsan, “Binary neural networks for memory-efficient and effective visual place recognition in changing environments,” [Online]. Available: https://arxiv.org/pdf/2010.00716
S. Dhar, “Visual place recognition. Introduction,” [Online]. Available: https://medium.com/@sd5023/visual-place-recognition-8999307ebb2f
S. Hussaini, M. Milford, and T. Fischer, “Spiking neural networks for visual place recognition via weighted neuronal assignments,” [Online]. Available: https://arxiv.org/pdf/2109.06452
F. Xue, I. Budvytis, and R. Cipolla, “PRAM: Place recognition anywhere model for efficient visual localization,” [Online]. Available: https://arxiv.org/pdf/2404.07785
S. Hussaini, M. Milford, and T. Fischer, “Applications of spiking neural networks in visual place recognition,” [Online]. Available: https://arxiv.org/pdf/2311.13186
C.-Y. Wang, I.-H. Yeh, H.-Y. M. Liao, and C. Yuan, “YOLOv9: Learning what you want to learn using programmable gradient information,” [Online]. Available: https://arxiv.org/pdf/2402.13616
“Visual place recognition – Papers with Code,” [Online]. Available: https://paperswithcode.com/task/visual-place-recognition
R. Dube, D. Dugas, E. Stumm, and J. I. Nieto, “SegMatch: Segment based place recognition in 3D point clouds,” [Online]. Available: https://www.researchgate.net/publication/318693876_SegMatch_Segment_based_place_recognition_in_3D_point_clouds
S. Arshad, “SVS-VPR: A semantic visual and spatial information-based hierarchical visual place recognition for autonomous navigation in challenging environmental conditions,” 2024. [Online]. Available: https://www.mdpi.com/1424-8220/24/3/906
K. Song, S. Zhang, Z. An, Z. Luo, T. Wang, and J. Xie, “Semantics-consistent feature search for self-supervised visual representation learning,” [Online]. Available: https://arxiv.org/pdf/2212.06486
B. Chen, X. Song, H. Shen, and T. Lu, “Hierarchical visual place recognition based on semantic-aggregation,” 2020. [Online]. Available: https://www.mdpi.com/2076-3417/11/20/9540
Oxford Robotics Institute, “Oxford RobotCar Dataset,” [Online]. Available: https://robotcar-dataset.robots.ox.ac.uk/
Meta Platforms Ireland Limited, “Mapillary Vistas Dataset,” [Online]. Available: https://www.mapillary.com/dataset/vistas
A. Geiger, P. Lenz, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” 2012. [Online]. Available: https://www.cvlibs.net/datasets/kitti/
M. Schleiss, F. Rouatbi, and D. Cremers, “VPAIR: Aerial visual place recognition and localization in large-scale outdoor environments,” 2022. [Online]. Available: https://github.com/AerVisLoc/vpair
N. Keetha, A. Mishra, J. Karhade, K. M. Jatavallabhula, S. Scherer, M. Krishna, and S. Garg, “AnyLoc: Towards universal visual place recognition,” [Online]. Available: https://arxiv.org/pdf/2308.00688
F. Xue, B. Chen, X.-D. Zhou, and D. Song, “STA-VPR: Spatio-temporal alignment for visual place recognition,” [Online]. Available: https://arxiv.org/abs/2103.13580
Z. Shi, H. Shi, K. Yang, Z. Yin, Y. Lin, and K. Wang, “PanoVPR: Towards unified perspective-to-equirectangular visual place recognition via sliding windows across the panoramic view,” [Online]. Available: https://arxiv.org/abs/2303.14095
Copyright (c) 2025 Security of Infocommunication Systems and Internet of Things

This work is licensed under a Creative Commons Attribution 4.0 International License.
