Getting started with OpenAI Gym – Part 1, Installation and configuration

The OpenAI Gym toolkit provides easy visualisation tools for experimenting with reinforcement learning algorithms. Here I will explain the process of setting it up and the issues I faced.

Installation instructions are given on the GitHub page. When I tried in the default terminal, I ran into issues with Python dependencies and conflicting versions of packages installed on the system, so I set up Gym in a virtual environment instead. First, add the Anaconda path so a virtual environment can be created:

export PATH="/<installation path>/anaconda3/bin:$PATH"

Create and activate the virtual environment, then clone and install Gym:

conda create -n py34 python=3.4
source activate py34
git clone https://github.com/openai/gym.git
cd gym
pip install -e .

This will install Gym. If you get an error saying swig was not found, install the dependencies:

sudo apt-get install python python-setuptools python-dev python-augeas gcc swig dialog

Run a sample program to verify the installation:

python
>>> import gym
>>> env = gym.make('LunarLander-v2')
>>> env.reset()
>>> env.render()
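
Beyond rendering a single frame, a short stand-alone script is a convenient way to check that stepping and rendering both work. This is only a minimal sketch using the classic Gym step API (random actions, no learning):

# Minimal sketch: random actions on LunarLander-v2
import gym

env = gym.make('LunarLander-v2')
observation = env.reset()

for _ in range(500):
    env.render()
    action = env.action_space.sample()              # pick a random action
    observation, reward, done, info = env.step(action)
    if done:                                        # episode finished, start a new one
        observation = env.reset()

env.close()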

If everything is installed correctly, it will render this frame:

[Image: LunarLander-v2 frame rendered by Gym]

If there is an error regarding the Box2D library, install it manually:

pip3 uninstall Box2D box2d-py
git clone https://github.com/pybox2d/pybox2d
cd pybox2d/
python setup.py clean
python setup.py build
python setup.py install

OpenAI Gym needs OpenGL drivers to be configured on the machine. I had issues with the NVIDIA driver (nvidia-smi), so I switched to an older driver. This can be done through ‘Software Updater -> Additional Drivers’.

[Image: selecting an NVIDIA driver under Software Updater -> Additional Drivers]

The OpenGL driver can be tested by running glxgears in a terminal. If it is installed correctly, an animated gears window like the one below appears.

[Image: glxgears window confirming the OpenGL configuration]

To use the MuJoCo physics engine, mujoco-py needs to be installed separately; the instructions are given here. Before running MuJoCo examples, add these paths to your .bashrc:

#mujoco
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kiran/.mujoco/mjpro150/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-390
# If there is an OpenGL error
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-390/libGL.so
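
A quick way to confirm that mujoco-py and the library paths above are picked up is to create any MuJoCo-backed environment. This is just a sanity-check sketch, not part of the official instructions, and the environment name is only an example of a MuJoCo task:

# Fails at gym.make() if mujoco-py or the LD_LIBRARY_PATH entries are misconfigured
import gym

env = gym.make('HalfCheetah-v2')    # example MuJoCo-backed task
env.reset()
for _ in range(100):
    env.render()
    env.step(env.action_space.sample())
env.close()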


Experiments with Q learning – A reinforcement learning technique

Q learning is a form of reinforcement learning based on reward and penalty; it uses no neural networks or deep learning methods. For every state (a state corresponds to a possible location the bot can move to) there is a Q value for each action (left, right, up, down). In any state the bot takes the action with the maximum Q value and then updates the Q values depending on the outcome. The Q table is updated every time the bot moves.
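
The rule behind these updates is the standard Q-learning update, Q(s, a) ← Q(s, a) + α[r + γ·max Q(s', ·) − Q(s, a)]. The sketch below only illustrates that idea with made-up names and parameters; it is not the exact code from the repository discussed next:

import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1      # learning rate, discount, exploration rate

# Q[state][action], initialised to 0 for unseen states
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state):
    # mostly exploit the best known action, occasionally explore
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def update(state, action, reward, next_state):
    # move Q(s, a) towards the reward plus the discounted best future value
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])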

The Python code opens up a GUI with a bot that uses Q learning to reach its destination; all credit for the code goes to PhilippeMorere. I edited the code to understand it, added comments at each step, made the canvas bigger and made it more difficult for the bot to traverse. The states are numbered at their centres and visualised so that editing the GUI is easier, and the Q values and rewards are printed out so the updated values can be seen. The code for the video below can be found here. To run the code, extract the zip downloaded from GitHub and run:

python Learner.py

The training took only a few minutes; the video shows that the bot learned to reach the green square after training. The arrow colours show the relative Q values along each direction: more red means a lower reward and a lower Q value.

The simulation speed can be changed by adjusting the sleep time in Learner.py. A value of 5 (5 seconds) lets us see the various values being printed out in the terminal; it is currently set to .005 seconds for faster simulation.


# MODIFY THIS SLEEP IF THE GAME IS GOING TOO FAST.
time.sleep(.005)

Walls, penalty areas and reward areas can be changed or added in World.py:


#black wall locations
walls = [(0, 2)....(3, 5), (5, 4), (8, 0), (1, 1)]
#green and red square locations
specials = [(4, 1, "red", -1)... (6, 1, "red", -1)]
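
New cells can be appended in the same tuple format; the coordinates and reward value below are purely illustrative:

# add a wall cell at column 2, row 3 (illustrative coordinates)
walls.append((2, 3))
# add a green reward square worth +1 at column 7, row 2 (illustrative)
specials.append((7, 2, "green", 1))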