Q-learning is a form of reinforcement learning based on reward and penalty; it uses no neural networks or deep learning methods. For every state (a state corresponds to a possible location the bot can occupy) there is a Q value for each action (left, right, up, down). In any state the bot takes the action with the maximum Q value, and then updates that Q value depending on the outcome. The Q table is updated every time the bot moves.
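The idea above can be sketched as a minimal tabular Q-learning loop. This is an illustrative sketch, not the actual code from the repository: the constants ALPHA and GAMMA, the grid size, and the function names are assumptions.

```python
# Minimal tabular Q-learning sketch (illustrative; ALPHA, GAMMA and
# the number of states are assumed, not taken from the actual code).
ACTIONS = ["left", "right", "up", "down"]
ALPHA, GAMMA = 0.1, 0.9            # learning rate and discount factor

# One Q value per (state, action) pair; states are the grid cells.
Q = {(s, a): 0.0 for s in range(100) for a in ACTIONS}

def best_action(state):
    """Pick the action with the maximum Q value in this state."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning update applied after observing a move's outcome."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Each move therefore nudges the Q value for the chosen (state, action) pair toward the observed reward plus the discounted best value of the next state, which is exactly the "update depending on the outcome" described above.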
The Python code opens a GUI with a bot that uses Q-learning to reach its destination; all credit for the code goes to PhilippeMorere. I edited the code to understand it, added a comment at each step, made the canvas bigger, and made it harder for the bot to traverse. The states are numbered at their centres and visualised so that editing the GUI is easier. The Q values and rewards are printed out so the updated values can be inspected. The code for the video below can be found here. To run it, extract the zip downloaded from GitHub and run as,
Training took only a few minutes; the video shows that the bot learned to reach the green square after training. The arrow colours show the relative Q values along each direction: the more red an arrow, the lower the reward and the lower the Q value.
The simulation speed can be changed by adjusting the sleep time in Learner.py. A value of 5 (5 seconds) lets us watch the various values being printed in the terminal. It is currently set to .005 seconds for faster simulation.
# MODIFY THIS SLEEP IF THE GAME IS GOING TOO FAST.
time.sleep(.005)
Walls, penalty areas, and reward areas can be changed or added in World.py.
# black wall locations
walls = [(0, 2), ..., (3, 5), (5, 4), (8, 0), (1, 1)]
# green and red square locations
specials = [(4, 1, "red", -1), ..., (6, 1, "red", -1)]