Reinforcement Learning with Human Feedback (RLHF) for algorithmic trading

Chris Kuo/Dr. Dataman
18 min read · Jun 30, 2023


The success of ChatGPT has brought the Reinforcement Learning with Human Feedback (RLHF) technique into the spotlight. RLHF is a machine learning approach that combines reinforcement learning (RL) with human feedback (HF) to improve the learning process. This post will give you a comprehensive understanding of RLHF. It describes RLHF applications in algorithmic trading (algo trading) and provides executable Python code examples. I will first present a code example that does not use RLHF, then add RLHF to it; I believe this is a natural way to learn a topic. Along the way, I gradually take you deeper into the components of RLHF, including the epsilon-greedy policy and the Q-learning update rule. This will equip algorithmic traders to apply RLHF.

What is reinforcement learning with human feedback?

It is helpful to explain reinforcement learning with the classic game Pac-Man. Pac-Man chases food and avoids ghosts in order to earn a higher score. The food reinforces its actions every time it makes a move. In traditional reinforcement learning (RL) terminology, Pac-Man is the “agent” that learns by trial and error, interacting with an environment and receiving reward signals.
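To make the agent-environment loop concrete before the trading examples, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy, the two components named in the introduction. The environment is a toy one I made up for illustration: a one-dimensional corridor where the agent starts at cell 0, the "food" (reward +1) sits at the last cell, and every step costs a small penalty. All names and hyperparameters here are illustrative assumptions, not from a trading library.

```python
import random

N_CELLS = 5
ACTIONS = [-1, +1]               # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

def train(episodes=500, seed=0):
    """Learn a Q-table for the toy corridor environment."""
    rng = random.Random(seed)
    # One row per cell, one column per action, initialized to zero
    Q = [[0.0, 0.0] for _ in range(N_CELLS)]
    for _ in range(episodes):
        state = 0
        while state != N_CELLS - 1:
            # Epsilon-greedy policy: usually exploit the best-known action,
            # occasionally explore a random one
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                a = 0 if Q[state][0] > Q[state][1] else 1
            next_state = min(max(state + ACTIONS[a], 0), N_CELLS - 1)
            # Reward signal: +1 for reaching the food, small cost per step
            reward = 1.0 if next_state == N_CELLS - 1 else -0.01
            # Q-learning update rule:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[state][a] += ALPHA * (
                reward + GAMMA * max(Q[next_state]) - Q[state][a]
            )
            state = next_state
    return Q

Q = train()
```

After training, the learned Q-values for "move right" should exceed those for "move left" in every non-terminal cell, i.e. the agent has learned to walk toward the food. The same loop structure (state, epsilon-greedy action, reward, Q-update) reappears in the trading examples later in the post.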
