At a conference we learned how reinforcement learning is used in DeepRacer. Our algorithm, vanilla policy gradient, assumes we have a model that we train with positive reinforcement: each time it does something right, it receives a reward.
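To make "positive reinforcement each time we do something right" concrete, here is a minimal vanilla policy gradient (REINFORCE) sketch. The setup is an assumption for illustration only: a hypothetical toy task with three steering actions, where only "straight" earns a reward. The policy is a softmax over one logit per action. Each step nudges the logits along the gradient of log pi(action), scaled by how much better the reward was than a running-average baseline.

```python
import math
import random

# Hypothetical toy task: three steering actions; only "straight" is rewarded.
ACTIONS = ["left", "straight", "right"]
REWARDED_ACTION = 1  # index of "straight"

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    # Draw an action index according to the policy's probabilities.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

def train(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]  # policy parameters, one logit per action
    baseline = 0.0            # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = sample(probs, rng)
        reward = 1.0 if action == REWARDED_ACTION else 0.0
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d logit_k = 1[k == action] - pi_k.
        for k in range(len(logits)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_pi
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)

if __name__ == "__main__":
    for name, p in zip(ACTIONS, train()):
        print(f"{name}: {p:.3f}")
```

After training, almost all probability mass ends up on the rewarded action. A real driving policy would replace the three logits with a neural network conditioned on camera input, but the update rule has the same shape.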
Well, we are not dogs or cats, but it seems that humans respond to positive rewards in much the same way animals do. We started by learning some core concepts from psychology to understand the purpose of the simulation and of training our models.
The reinforcement strategy is used only while training and building the model, not during the race itself.
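One way to picture the split between training and racing is a policy object with two modes. The `Policy` class below is a hypothetical sketch. During training, `update` applies a policy-gradient step. At race time, `race_action` only reads the learned parameters and never changes them. Choosing the single most likely action (greedy selection) is an illustrative assumption here, not necessarily what any particular racing service does.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class Policy:
    """Hypothetical softmax policy with separate training and race modes."""

    def __init__(self, logits):
        self.logits = list(logits)

    def probabilities(self):
        return softmax(self.logits)

    def update(self, action, advantage, lr=0.1):
        # Training only: one vanilla policy-gradient step driven by the reward signal.
        probs = self.probabilities()
        for k in range(len(self.logits)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            self.logits[k] += lr * advantage * grad_log_pi

    def race_action(self):
        # Race time: no reward and no update; just pick the most likely action.
        probs = self.probabilities()
        return max(range(len(probs)), key=probs.__getitem__)

if __name__ == "__main__":
    policy = Policy([0.2, 1.5, -0.3])  # hypothetical "already trained" logits
    print(policy.race_action())
```

Calling `race_action` any number of times leaves `policy.logits` unchanged, which mirrors the idea that reinforcement stops once the model is built.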
RL vs robotic racing
In the first, we collect data by observing a driver performing the driving movements. In the second, we control the movements in a simulation and extract the data, which is explored later.
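Collecting data by controlling movements in a simulation usually means rolling out episodes and recording (state, action, reward) tuples for later analysis or training. The simulator below is entirely hypothetical: a 1-D track where the state is the car's lateral offset from the center line. Steering shifts the offset, a little random noise models imperfect control, and the reward is 1 while the car stays within 0.5 units of the center. `simulate_episode` and `steer_to_center` are illustrative names, not part of any real API.

```python
import random
from dataclasses import dataclass

@dataclass
class Step:
    offset: float  # lateral distance from the track center before acting (hypothetical units)
    action: int    # steering direction: -1, 0 or +1
    reward: float  # reward observed after taking the action

def simulate_episode(policy, steps=20, seed=0):
    """Roll out one episode in a hypothetical 1-D track simulator and record it."""
    rng = random.Random(seed)
    offset = 0.0
    trajectory = []
    for _ in range(steps):
        state = offset
        action = policy(state)
        # Steering moves the car sideways; small noise models imperfect control.
        offset = state + 0.1 * action + rng.uniform(-0.05, 0.05)
        reward = 1.0 if abs(offset) < 0.5 else 0.0
        trajectory.append(Step(state, action, reward))
    return trajectory

def steer_to_center(offset):
    # Hand-written controller: steer back toward the center line.
    if offset > 0.05:
        return -1
    if offset < -0.05:
        return 1
    return 0

if __name__ == "__main__":
    for step in simulate_episode(steer_to_center)[:5]:
        print(step)
```

The recorded trajectory is exactly the kind of data a policy-gradient update consumes: each action is paired with the state it was taken in and the reward that followed.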
Using various measurements from the environment, each participant can practice with their own virtual model and explore the training before the big race.
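The measurements from the environment are typically turned into a reward through a reward function. Assuming the race is AWS DeepRacer, the reward function receives a `params` dictionary that includes keys such as `all_wheels_on_track`, `track_width` and `distance_from_center`. The sketch below follows the familiar "stay near the center line" pattern. The thresholds (10%, 25% and 50% of the track width) and reward values are illustrative choices, not required settings.

```python
def reward_function(params):
    """Center-line reward sketch, assuming DeepRacer-style input parameters."""
    # Heavily penalize leaving the track.
    if not params["all_wheels_on_track"]:
        return 1e-3

    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Illustrative bands around the center line, as fractions of the track width.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely close to going off track
    return float(reward)
```

For example, on a track 1.0 unit wide, a car 0.05 units from the center earns 1.0, while one 0.4 units away earns only 0.1. Shaping the reward this way is the positive reinforcement that the training loop tries to maximize.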