Neural Logic Reinforcement Learning


Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks. However, most DRL algorithms generalise poorly: even minor modifications to the training environment can substantially degrade the performance of the learned policy. In addition, the use of deep neural networks makes the learned policies hard to interpret. To address these two challenges, we propose a novel algorithm named Neural Logic Reinforcement Learning (NLRL), which represents reinforcement learning policies in first-order logic. NLRL is based on policy gradient methods and differentiable inductive logic programming that have demonstrated significant advantages in terms of interpretability and generalisability in supervised tasks. Extensive experiments on cliff-walking and blocks manipulation tasks demonstrate that NLRL can induce interpretable policies that achieve near-optimal performance while generalising well to environments with different initial states and problem sizes.

In 36th International Conference on Machine Learning
Zhengyao Jiang
PhD Student of Machine Learning

I’m Zhengyao Jiang, a machine learning PhD student at UCL, supervised by Tim Rocktäschel and Edward Grefenstette. I’m interested in improving the data efficiency and generalization of reinforcement learning in order to bring RL to real-world applications. To reach these goals, my research focuses on incorporating priors through neuro-symbolic methods and on leveraging off-policy/offline data. I previously worked on deep learning applications in finance and have experience with live algorithmic trading on the cryptocurrency market.