№2, 2025

THERMAL-RGB FUSION WITH LIGHTWEIGHT CNNS FOR NIGHT-TIME DRONE SURVEILLANCE AND REAL-TIME ADAPTIVE SENSOR SELECTION

Vugar Gasimov

This paper presents a lightweight, adaptive object detection framework for night-time drone surveillance based on thermal-RGB fusion. The proposed system uses a dual-branch CNN backbone to extract modality-specific features, followed by a mid-level fusion strategy that combines the thermal and RGB representations into a unified feature map. A real-time adaptive sensor selection mechanism adjusts the contribution of each modality according to scene context, which improves robustness under varying illumination. The architecture employs YOLOv5-Nano as a compact feature extraction backbone and relies on quantization and pruning for real-time deployment. Experiments on publicly available datasets demonstrate higher accuracy and efficiency than single-modality and non-adaptive baselines, and ablation studies validate the contribution of each component. The results show that the system achieves high detection performance with low computational overhead, offering a practical and scalable solution for intelligent aerial surveillance in complex night-time environments (pp. 56-68).

Keywords: Thermal-RGB fusion, Lightweight CNN, Adaptive sensor selection, Drone surveillance, Object detection.
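
The abstract does not specify how the adaptive sensor selection is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: mean RGB brightness stands in for "scene context", the thresholds `low` and `high` are hypothetical values, and feature maps are represented as plain lists of channels. It illustrates weighting each branch before a channel-wise concatenation (one common form of mid-level fusion); it is not the paper's implementation.

```python
from statistics import fmean

def modality_weights(rgb_pixels, low=0.15, high=0.45):
    """Map mean RGB brightness (pixel values in [0, 1]) to (w_rgb, w_thermal).

    Assumption: brightness is the only scene-context cue; `low` and `high`
    are hypothetical thresholds, not values taken from the paper.
    """
    brightness = fmean(rgb_pixels)
    # Linear ramp between the thresholds, clipped to [0, 1].
    w_rgb = min(1.0, max(0.0, (brightness - low) / (high - low)))
    return w_rgb, 1.0 - w_rgb

def fuse_features(f_rgb, f_thermal, w_rgb, w_thermal):
    """Scale each branch by its weight, then concatenate along the channel axis.

    Each feature map is a list of channels; each channel is a flat list of floats.
    """
    scaled_rgb = [[w_rgb * v for v in channel] for channel in f_rgb]
    scaled_thermal = [[w_thermal * v for v in channel] for channel in f_thermal]
    return scaled_rgb + scaled_thermal

# Toy example: a very dark frame pushes all weight to the thermal branch.
dark_frame = [0.05] * 100                  # hypothetical normalized pixel values
f_rgb = [[1.0] * 4 for _ in range(2)]      # 2 channels from the RGB branch
f_thermal = [[1.0] * 4 for _ in range(2)]  # 2 channels from the thermal branch

w_rgb, w_thermal = modality_weights(dark_frame)
fused = fuse_features(f_rgb, f_thermal, w_rgb, w_thermal)
```

In this sketch the fused map keeps both sets of channels, so a downstream detection head always sees the same tensor layout; only the relative magnitude of the two modalities changes with illumination.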