TY - GEN
T1 - On the importance of reward design in reinforcement learning-based dynamic algorithm configuration
T2 - a case study on OneMax with (1+(λ,λ))-GA
AU - Nguyen, Tai
AU - Le, Phong
AU - Biedenkapp, André
AU - Doerr, Carola
AU - Dang, Nguyen
N1 - Funding: This work was financially supported by the European Union (ERC, “dynaBBO”, grant no. 101125586) and by the ANR project ANR-23-CE23-0035 Opt4DAC. André Biedenkapp acknowledges funding through the research network “Responsive and Scalable Learning for Robots Assisting Humans” (ReScaLe) of the University of Freiburg. The ReScaLe project is funded by the Carl Zeiss Foundation.
Tai Nguyen acknowledges funding from the St Andrews Global Doctoral Scholarship programme.
PY - 2025/3/19
Y1 - 2025/3/19
N2 - Dynamic Algorithm Configuration (DAC) has garnered significant attention in recent years, particularly given the growing prevalence of machine learning and deep learning algorithms. Numerous studies have leveraged the robustness of decision-making in Reinforcement Learning (RL) to address the optimization challenges associated with algorithm configuration. However, making an RL agent work properly is a non-trivial task, especially with respect to reward design, which requires a substantial amount of handcrafted knowledge based on domain expertise. In this work, we study the importance of reward design in the context of DAC via a case study on controlling the population size of the (1+(λ,λ))-GA optimizing OneMax. We observe that a poorly designed reward can hinder the RL agent's ability to learn an optimal policy due to a lack of exploration, leading to both scalability and learning-divergence issues. To address these challenges, we propose the application of a reward shaping mechanism to facilitate enhanced exploration of the environment by the RL agent. Our work not only demonstrates the ability of RL to dynamically configure the (1+(λ,λ))-GA, but also confirms the advantages of reward shaping for the scalability of RL agents across various OneMax problem sizes.
AB - Dynamic Algorithm Configuration (DAC) has garnered significant attention in recent years, particularly given the growing prevalence of machine learning and deep learning algorithms. Numerous studies have leveraged the robustness of decision-making in Reinforcement Learning (RL) to address the optimization challenges associated with algorithm configuration. However, making an RL agent work properly is a non-trivial task, especially with respect to reward design, which requires a substantial amount of handcrafted knowledge based on domain expertise. In this work, we study the importance of reward design in the context of DAC via a case study on controlling the population size of the (1+(λ,λ))-GA optimizing OneMax. We observe that a poorly designed reward can hinder the RL agent's ability to learn an optimal policy due to a lack of exploration, leading to both scalability and learning-divergence issues. To address these challenges, we propose the application of a reward shaping mechanism to facilitate enhanced exploration of the environment by the RL agent. Our work not only demonstrates the ability of RL to dynamically configure the (1+(λ,λ))-GA, but also confirms the advantages of reward shaping for the scalability of RL agents across various OneMax problem sizes.
KW - Automated algorithm configuration
KW - Deep reinforcement learning
UR - https://dl.acm.org/conference/gecco
U2 - 10.48550/arXiv.2502.20265
DO - 10.48550/arXiv.2502.20265
M3 - Conference contribution
SN - 979840071465
BT - Proceedings of the Genetic and Evolutionary Computation Conference 2025
PB - ACM
ER -