Hybrid Evolutionary Reinforcement Learning for UAV Path Planning: Genetic Programming and Soft Actor Critic Integrations

Journal of Intelligent Communication

Article

Mushtaq, M. U., Venter, H., Nxasana, P., Tshivhula, F., Modise, K. J., Shafique, T., & Muhammad, O. (2025). Hybrid Evolutionary Reinforcement Learning for UAV Path Planning: Genetic Programming and Soft Actor Critic Integrations. Journal of Intelligent Communication, 4(2), 40–58. https://doi.org/10.54963/jic.v4i2.1594

Authors

  • Muhammad Umer Mushtaq

    Department of Computer Science, University of Pretoria, Pretoria 0028, South Africa
    Department of Computer Science, Bahria University Islamabad Campus, Islamabad 44230, Pakistan
  • Hein Venter

    Department of Computer Science, University of Pretoria, Pretoria 0028, South Africa
  • Pule Nxasana

    Department of Computer Science, University of Pretoria, Pretoria 0028, South Africa
  • Fhulufhelo Tshivhula

    Department of Computer Science, University of Pretoria, Pretoria 0028, South Africa
  • Katleho Junior Modise

    Department of Computer Science, University of Pretoria, Pretoria 0028, South Africa
  • Tamoor Shafique

    School of Digital Technology, Innovation and Business, University of Staffordshire, Stoke‑on‑Trent ST4 2DF, UK
  • Owais Muhammad

    Department of Electrical and Electronic Engineering, University of Johannesburg, Johannesburg 1619, South Africa

Received: 9 July 2025; Revised: 12 August 2025; Accepted: 16 August 2025; Published: 2 September 2025

Unmanned Aerial Vehicle (UAV) path planning in unknown environments remains a significant challenge, as Deep Reinforcement Learning (DRL) solutions are often hampered by slow convergence and unstable training dynamics. To address this gap, we introduce a Genetic Programming-seeded Soft Actor-Critic (GP+SAC) approach in which Genetic Programming produces high-quality trajectories that are injected into the SAC replay buffer as a "warm start", reducing wasteful early exploration. Through experiments in three benchmark grid environments, we demonstrate that GP+SAC converges significantly faster than the FA-DQN baseline, achieving higher episodic returns within fewer episodes under the same reward design. In the large environment, GP+SAC attains a mean path length of 30.55 units, comparable to FA-DQN's 28.38 units, indicating that the accelerated convergence comes at little cost in path efficiency. The results also show that, although GP+SAC obtains superior cumulative rewards, its training curves exhibit visible fluctuations, pointing to residual instability in highly constrained environments.
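The abstract describes GP-generated trajectories being injected into the SAC replay buffer as a warm start before training. The following is a rough, hypothetical sketch of that mechanism, not the authors' implementation: the buffer layout, `gp_plan_trajectory`, and `reward_fn` are assumed names. It converts GP waypoint sequences into (state, action, reward, next_state, done) transitions and pre-loads them into a simple replay buffer, after which standard SAC updates would sample from the same buffer.

```python
# Hypothetical warm-start sketch for a GP-seeded SAC replay buffer.
# Assumptions (not from the paper): grid waypoints as states, displacement
# between consecutive waypoints as the action, and a user-supplied reward_fn.
import random
from collections import deque

class ReplayBuffer:
    """FIFO buffer of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch for the SAC update step.
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))

def seed_buffer_with_gp(buffer, gp_trajectories, reward_fn):
    """Convert GP waypoint trajectories into SAC-style transitions.

    gp_trajectories: list of waypoint lists, e.g. [[(0, 0), (0, 1), ...], ...]
    reward_fn: maps (state, next_state) to the environment's reward signal.
    """
    for trajectory in gp_trajectories:
        for i in range(len(trajectory) - 1):
            state, next_state = trajectory[i], trajectory[i + 1]
            # Treat the displacement between waypoints as the action taken.
            action = (next_state[0] - state[0], next_state[1] - state[1])
            done = (i == len(trajectory) - 2)
            buffer.add(state, action, reward_fn(state, next_state), next_state, done)

# Usage sketch: seed the buffer once before the SAC update loop begins.
# buffer = ReplayBuffer()
# seed_buffer_with_gp(buffer, gp_plan_trajectory(n_runs=10), reward_fn=my_reward)
# ...then train SAC as usual, sampling mini-batches from `buffer`.
```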

Keywords:

UAV-Assisted WSNs; UAV Flight Path Scheduling; Soft Actor-Critic (SAC); Reinforcement Learning; Genetic Programming


Copyright © UK Scientific Publishing Limited.