Getting started with OpenAI Gym – Part 1, Installation and configuration

OpenAI Gym toolkit provides easy visualisation tools to experiment with reinforcement learning algorithms. Here I will explain the process of setting it up and the issues I have faced.

Installation instructions are given in the github page. While I was trying in the default terminal I was getting issues with python dependencies and different versions of packages installed in the system. So I tried with a virtual environment to set up gym.  First, I have added the Anacaonda path to create a virtual environment.

export PATH="/<installation path>/anaconda3/bin:$PATH"

create virtual environment.

conda create -n py34 python=3.4
source activate py34
git clone
cd gym
pip install -e .

This will install gym, if you are getting error saying swig not found. Install the dependencies,

sudo apt-get install python python-setuptools python-dev python-augeas gcc swig dialog

Run the sample program.

>import gym
>env = gym.make('LunarLander-v2')

If everything is installed correctly, It will render this frame,

OpenAi gym lunar lander getting started

If there is an error regarding Box2D library, install it manually.

pip3 uninstall Box2D box2d-py
git clone
cd pybox2d/
python clean
python build
python install

OpenAI gym needs OpenGL drivers to be configured in the machine. I have got issues with nvidia driver (nvidia-smi). So I tried switching to an older driver. This can be done through ‘Software Updater->Additional Drivers’.

OpenGl driver OpenAI Gym Nvidia

OpenGl driver can be tested by running glxgears  in terminal. If installed correctly, it shows up this image with animation.

OpenAI OPENGL configuration nvidia driver issue

For using MuJoCo physics engine, mujoco-py needs to be installed separately. The instructions are given here. Before running mujoco examples, add these paths to your .bashrc

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kiran/.mujoco/mjpro150/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-390
#If there is OpenGL error 
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/



Custom image detector using Tensorflow object detection API

The aim of this tutorial to use tensorflow object detection API to detect custom objects. Here in this tutorial, we will try to train the network to recognize battery charging image (Why battery charging ? later, this trained net can be used in a robot to detect the charging point from a picture). This is basically an excerpt of sentdex tensorflow tutorial series. I have listed out the steps which I have done to train custom image for quick access.

Download files here

battery charging image detection
Image to detect

To train the model, first we need to collect training data. This can be done by collecting images from google images. I used a chrome extension ‘Fatkun Batch Download Image’ for saving bulk images. Once the images are downloaded, download and install labelImg to annotate the training data.

git clone
sudo apt-get install pyqt5-dev-tools
sudo pip3 install lxml
make qt5py3

Browse to the image folder that contains downloaded  images. The idea is to create xml label for all the images. Select the image one by one, Click create rectangle box, give the label as ‘charging sign’ and save as xml file(default). labelImg-tensorflow 

Create train and test directory. Copy 10% of images with respective xml label file to test directory and remaining 90% to train directory.

Run modified from datitran’s github  to create ‘train/test_labels.csv’. The directory structure is as follows.

Next step is to generate tfrecord for test and train data from generated csv data. Use modified for this step and generate tfrecord for test and train data.

python3 --csv_input=data/train_labels.csv  --output_path=data/train.record

python3 --csv_input=data/test_labels.csv  --output_path=data/test.record

If you are getting error saying object_detection folder does not exist, export the below path. This tutorial needs tensor flow Object detection preinstalled.   Please follow this link for more information

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Copy data, training, images and  ssd_mobilenet_v1_coco_11_06_2017 directories to tensorflow object_detection folder and start training.


python --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

ssd_mobilenet_v1_pets.config will have paths to both tf records, graph and pbtxt file which contain the classes to detect. The checkpoint files will be created inside training directory.

Next we need to create a frozen inference graph from the latest checkpoint file created. Once done, use the inference program to detect the charging sign.

python --input_type image_tensor 
--pipeline_config_path training/ssd_mobilenet_v1_pets.config 
--trained_checkpoint_prefix training/model.ckpt-9871 
--output_directory charging_spot_det__inference_graph


Since my training data set was small( less than 100) and there was only one class, the inference is buggy. It identifies almost everything as charging sign. but this can be extended with multiple classes and more training data to get accurate results.

Speech detection with Tensorflow 1.4 on Raspberry Pi 3 – Part 1: Getting audio file using PyAudio

In previous posts 1 and about speech detection using tensorflow, it is shown how to inference a 1 sec audio sample using the graph that is trained in tensorflow by running This series of posts will look into inferencing a continuous stream of audio. there is an excellent post  by Allan in which shows how to do the same but, I was not happy with the results and the code was quiet a lot to understand. It uses tensorflow audio functions to process the audio. I will be using pyAudio to process audio since it is easy to understand and later, I may move into tensorflow audio processing. The code posted is running on raspberry pi 3 but it should be able to run on any linux system without any modification.

To get the audio, you need to purchase a usb sound card as shown in the figure below, this is available in ebay/aliexpress or amazon. Connect a 2.5mm mic to it or like I did, scavenge a mic from old electronics and a 2.5mm audio jack and connect it together.

usb audio card on paspberry pi for tensorflow
USB microphone, pi noir camera and earphones for audio on raspberry pi 3

The following python code will record a 1 sec audio and save it as a .wav file. For tensorflow speech recognition we use a sampling rate of 16K (RATE), single channel (CHANNELS) and 1 sec duration (RECORD_SECONDS).

import pyaudio
import wave
FORMAT = pyaudio.paInt16
RATE = 16000
CHUNK = 512 
audio = pyaudio.PyAudio()
# start Recording
stream =, channels=CHANNELS,
                rate=RATE, input=True,
print "recording..."
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data =
print "finished recording"
# stop Recording
waveFile =, 'wb')

When you run pyaudio.PyAudio() ALSA may print out errors like the one shown below.

ALSA error raspberry pi 3 tensorflow

The errors can be removed by commenting out the corresponding devices in /usr/share/alsa/alsa.conf.

alsa conf error raspberry pi 3

Next step is to integrate this to in tensorflow branch: tensorflow/examples/speech_commands/ In the updated file;, I have added a for loop around run_graph() to record a 1 sec wav audio. A new audio sample will be recorded every time when the loop runs and the audio is rewritten with the same file name.

audio inferencing tensorflow raspberry pi

Here is the output. The input file is given as file.wav from the same directory …../speech_commands, the file will be overwritten each time when recording finishes. To start with, create a dummy file file.wav to run the script.

touch file.wav
python3 --graph=./my_frozen_graph.pb --labels=./conv_labels.txt --wav=./file.wav

audio inferencing raspberry pi tensorflow



This is by no means a great method to do speech inferencing. We need to wait for the script to record and again for the next command. But, this is the start. In the next post I will explain how to detect an audio threshold to activate the recording/inferencing. For this I have forked a google assistant speech invoking script written by jeysonmc, this will be the starting point.


Simple program to capture video using OpenCV and python for video inferencing

Note: OpenCV needs to be compiled and installed in Ubuntu to get all its full functionality. ‘pip install open-cv‘ will install python bindings but it is very limited by its use. Most of the time there are multiple issues while running. There are lot of dependencies for OpenCV build, I have installed OpenCV by following this tutorial for python 2.7 and python 3.5.  It took around two hours when compiled using 4 cores.

This program contains an infinite while loop, inside the loop each frame is captured from the video device specified and shows it frame by frame.

import cv2
#the video device number is 0
cap = cv2.VideoCapture(0)
    #reading frame
    ret, frame =
    #frame is displayed 
    #press q to break
    if cv2.waitKey(1) & 0xFF == ord('q'):

To run the program,


The earlier post is embedding this OpenCV program for video inferencing.

Experiments with Q learning – A reinforcement learning technique

Q learning is a form of reinforcement learning which is based on reward and penalty.  There are no neural networks or deep learning methods. For all the states(states corresponds to possible locations the bot can move) there will be Q values for each action(left, right, top, bottom). The bot will take action which has the max Q value in any state. It will in turn update the Q values depending upon the outcome. The Q table is updated every time the bot moves.

The python code opens up a gui with a bot which uses Q learning to reach its destination, all the credits for the code goes to PhilippeMorere. I edited the code to understand it, added comment in each step, made the canvas bigger and made it difficult for the bot to traverse. The states are numbered in centre and visualised so that editing the gui is easier. The Q values and rewards are printed out to see the updated values. The code for the below video can be found here.  To run the code, extract the zip downloaded from github and run as,


The training took only few minutes, the video shows that the bot learned to reach  green square after training. The arrow colour shows the relative values of Q along the directions. More reddish means less reward, less Q value.

The simulation time can be changed by adjusting the time in . A value of 5, 5 seconds will allow us to see the various values getting printed out in the terminal. Currently it is set at .005 seconds for faster simulation.


Walls, penalty areas, reward areas can be changed/added in

#black wall locations
walls = [(0, 2)....(3, 5), (5, 4), (8, 0), (1, 1)]
#green and red square locations
specials = [(4, 1, "red", -1)... (6, 1, "red", -1)]


Video inferencing on neural network trained using NVIDIA DIGITS with opencv

I have been playing with the inferencing code for some time. Here is a real time video inferencing using opencv to capture video and slice through the frames. The overall frame rate is low due to the system slowness. In the video, ‘frame’ is the normalised image caffe network sees after reducing mean image file . ‘frame2’ is the input image.

Caffe model is trained in NVIDIA DIGITS using goolgleNet(SGD, 100 epoch), it reached 100% accuracy by 76 epoch.
NVIDIA DIGITS goolgleNet caffe inferencing

Here is the inferencing code.

import numpy as np
import matplotlib.pyplot as plt
import caffe
import time
import cv2
cap = cv2.VideoCapture(0)
from skimage import io

MODEL_FILE = './deploy.prototxt'
PRETRAINED = './snapshot_iter_4864.caffemodel'
MEAN_IMAGE = './mean.jpg'
mean_image =
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
image_dims=(256, 256))
#OpenCv loop
    start = time.time()
    ret, frame =
    resized_image = cv2.resize(frame, (256, 256)) 
    cv2.imwrite("frame.jpg", resized_image)
    IMAGE_FILE = './frame.jpg'
    im2 =
    inferImg = im2 - mean_image
    #print ("Shape------->",inferImg.shape)
    prediction = net.predict([inferImg])
    end = time.time()
    #print ("prediction -> ",prediction[0]) 
    if pred == 0:
    #Opencv display

    if cv2.waitKey(1) & 0xFF == ord('q'):



Inferencing on the trained caffe model from NVIDIA DIGITS

With this post I will explain how to do inferencing on the trained network created with NVIDIA DIGITS through command line. link to the previous post here

In DIGITS UI, we have to upload a file into the model webpage to do inferencing. This is time consuming and not practical for real world appications. We need to deploy trained model as a standalone python application.
To achieve this we need to download the trained model from NVIDIA DIGITS model page. This will download a .tgz file to your computer. Open the .tgz file using this command

tar -xvzf filename.tgz
caffe model NVIDIA DIGITS

Save the ‘Image mean’ image file from datasets page of NVIDIA DIGITS in to your computer.

NVIDIA DIGITS inferencing

Provide path for,

'Image mean' file    -> eg:'/home/catsndogs/mean.jpg'
deploy.prototext ->eg:'/home/catsndogs/deploy.prototxt'
caffemodel ->eg:'/home/catsndogs/snapshot_iter_480.caffemodel'
input image to test ->eg:'/home/catsndogs/image_to_test.jpg'

in the below python script.

import numpy as np
import matplotlib.pyplot as plt
import caffe
import time
from PIL import Image

MODEL_FILE = '/home/catsndogs/deploy.prototxt'
PRETRAINED = '/home/catsndogs/snapshot_iter_480.caffemodel'
MEAN_IMAGE = '/home/catsndogs/mean.jpg'
# load the mean image
mean_image =
#input the image file need to be tested
IMAGE_FILE = '/home/catsndogs/image_to_test.jpg'
im1 =
# Tell Caffe to use the GPU
# Initialize the Caffe model using the model trained in DIGITS
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
image_dims=(256, 256))
# Load the input image into a numpy array and display it
# Iterate over each grid square using the model to make a class prediction
start = time.time()
inferImg = im1.resize((256, 256), Image.NEAREST)
inferImg -= mean_image
prediction = net.predict([inferImg])
end = time.time()
if pred == 0: 
# Display total time to perform inference
print 'Total inference time: ' + str(end-start) + ' seconds'

Run the file with


for inferencing.