№2, 2025

SCOUT: A SOLUTION TO AUTOMATIC INSPECTION OF THE OUTDOOR ADVERTISEMENT BANNERS

Jamaladdin Hasanov, Nizami Guliyev, Toghrul Babayev, Eljan Abdullazada, Vagif Karimli

This paper presents the design and implementation of SCOUT (Street-level Capture & Overlay for Urban Track-inspection), a hybrid AI system developed for the automated detection and geospatial mapping of outdoor advertisement banners in urban environments. Targeting the operational needs of municipal authorities and marketing service agencies, SCOUT integrates rule-based heuristics, large language models (LLMs), and machine learning (ML) techniques, including a custom-trained YOLOv11 object detector. The system processes dashcam video streams to identify advertisement banners and automatically logs their locations in a spatial database using GPS metadata. Comparative experiments evaluate the performance of traditional computer vision methods, LLM-based analysis, and deep learning models, highlighting the limitations of classical and LLM-based approaches in dynamic urban settings. Results demonstrate that the hybrid system, anchored by a lightweight object detection model, offers significant improvements in detection accuracy, scalability, and real-time applicability. SCOUT provides a novel, practical solution for urban advertisement monitoring and policy enforcement, with potential for integration into larger smart city infrastructure through third-party APIs (pp. 3-14).

Keywords: Object detection, Boundary detection, Banner recognition, Transfer learning, LLM, Marketing automation.
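
The logging step described in the abstract, in which banners detected in dashcam frames are stored with a location taken from GPS metadata, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the detector output has already been reduced to (frame time, confidence) pairs, that the GPS track is a time-sorted list of fixes, and that linear interpolation between fixes is an acceptable way to place a frame; the abstract does not say how SCOUT aligns frames with GPS data. SQLite stands in for the spatial database, and the names `interpolate_position`, `log_detections`, `banners`, and the 0.5 confidence threshold are hypothetical.

```python
import sqlite3
from bisect import bisect_left


def interpolate_position(track, t):
    """Linearly interpolate (lat, lon) at time t from a time-sorted GPS track.

    track: list of (timestamp_seconds, lat, lon) tuples sorted by timestamp.
    Times outside the track are clamped to the first or last fix.
    """
    times = [fix[0] for fix in track]
    i = bisect_left(times, t)
    if i == 0:
        return track[0][1], track[0][2]
    if i == len(track):
        return track[-1][1], track[-1][2]
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    w = (t - t0) / (t1 - t0)  # t0 < t <= t1 is guaranteed by bisect_left
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)


def log_detections(conn, track, detections, min_confidence=0.5):
    """Store detections above a confidence threshold with interpolated positions.

    detections: iterable of (frame_time_seconds, confidence) pairs, assumed to
    come from an object detector run on the video frames (hypothetical format).
    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS banners ("
        "id INTEGER PRIMARY KEY, frame_time REAL, "
        "lat REAL, lon REAL, confidence REAL)"
    )
    rows = []
    for frame_time, confidence in detections:
        if confidence < min_confidence:
            continue  # skip low-confidence detections
        lat, lon = interpolate_position(track, frame_time)
        rows.append((frame_time, lat, lon, confidence))
    conn.executemany(
        "INSERT INTO banners (frame_time, lat, lon, confidence) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

A caller would open a connection with `sqlite3.connect(...)`, pass the GPS track and the detector output to `log_detections`, and query the `banners` table afterwards. A production system would more likely use a database with native geometry types and would merge repeated detections of the same banner across consecutive frames, which this sketch leaves out.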