Getting started with OpenAI Gym – Part 1, Installation and configuration

The OpenAI Gym toolkit provides easy visualisation tools for experimenting with reinforcement learning algorithms. Here I will explain the process of setting it up and the issues I faced.

Installation instructions are given on the GitHub page. When I tried installing from the default terminal, I ran into issues with Python dependencies and conflicting versions of packages installed on the system, so I set up gym in a virtual environment instead. First, I added the Anaconda path so that conda can create the virtual environment.

export PATH="/<installation path>/anaconda3/bin:$PATH"

Create the virtual environment, activate it, and install gym from source.

conda create -n py34 python=3.4
source activate py34
git clone https://github.com/openai/gym.git
cd gym
pip install -e .

This will install gym. If you get an error saying that swig was not found, install the dependencies:

sudo apt-get install python python-setuptools python-dev python-augeas gcc swig dialog

Run the sample program.

python
>>> import gym
>>> env = gym.make('LunarLander-v2')
>>> env.reset()
>>> env.render()

If everything is installed correctly, it will render a frame of the lunar lander environment.

If there is an error regarding the Box2D library, install it manually.

pip3 uninstall Box2D box2d-py
git clone https://github.com/pybox2d/pybox2d
cd pybox2d/
python setup.py clean
python setup.py build
python setup.py install

OpenAI Gym needs OpenGL drivers to be configured on the machine. I had issues with the NVIDIA driver (nvidia-smi), so I tried switching to an older driver. This can be done through ‘Software Updater -> Additional Drivers’.

The OpenGL driver can be tested by running glxgears in a terminal. If it is installed correctly, a window with animated gears shows up.
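
Before debugging further, it can help to confirm that the command-line tools mentioned above are actually on the PATH. The following is only a minimal sketch: the helper name missing_tools is made up for illustration, and it only checks that the executables can be found, not that the driver itself works.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


# swig is the tool the gym install complained about above;
# glxgears is the program used to test the OpenGL driver.
missing = missing_tools(["swig", "glxgears"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("swig and glxgears are both available.")
```

If a tool is reported as missing, install the corresponding package first and then run the check again.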

To use the MuJoCo physics engine, mujoco-py needs to be installed separately. The instructions are given here. Before running the MuJoCo examples, add these paths to your .bashrc:

#mujoco
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kiran/.mujoco/mjpro150/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-390
#If there is an OpenGL error
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-390/libGL.so

 

 

Custom image detector using Tensorflow object detection API

The aim of this tutorial is to use the TensorFlow object detection API to detect custom objects. Here we will train the network to recognize a battery charging sign. (Why battery charging? Later, this trained net can be used in a robot to detect the charging point from a picture.) This is basically an excerpt of the sentdex TensorFlow tutorial series. I have listed the steps I followed to train on a custom image, for quick access.

Download files here

battery charging image detection
Image to detect

To train the model, we first need to collect training data. This can be done by collecting images from Google Images. I used the Chrome extension ‘Fatkun Batch Download Image’ to save images in bulk. Once the images are downloaded, download and install labelImg to annotate the training data.

git clone https://github.com/tzutalin/labelImg.git
cd labelImg
sudo apt-get install pyqt5-dev-tools
sudo pip3 install lxml
make qt5py3
python3 labelImg.py

Browse to the folder that contains the downloaded images. The idea is to create an XML label for every image. Select the images one by one, click ‘Create RectBox’, draw a box around the sign, give it the label ‘charging sign’ and save it as an XML file (the default).

Create train and test directories. Copy 10% of the images, together with their XML label files, to the test directory and the remaining 90% to the train directory.
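
The copying can also be scripted. The following is a minimal sketch, assuming every image sits next to an XML file with the same base name; the function name split_dataset, the list of image suffixes and the fixed random seed are my own choices for illustration, not part of the original workflow.

```python
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def split_dataset(image_dir, train_dir, test_dir, test_fraction=0.1, seed=0):
    """Copy labelled images into train/test folders; return (n_train, n_test)."""
    image_dir, train_dir, test_dir = Path(image_dir), Path(train_dir), Path(test_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)

    # Keep only images that have a matching XML label file.
    images = [p for p in sorted(image_dir.iterdir())
              if p.suffix.lower() in IMAGE_SUFFIXES and p.with_suffix(".xml").exists()]
    random.Random(seed).shuffle(images)

    n_test = round(len(images) * test_fraction)
    for index, image in enumerate(images):
        dest = test_dir if index < n_test else train_dir
        shutil.copy2(image, dest / image.name)
        label = image.with_suffix(".xml")
        shutil.copy2(label, dest / label.name)
    return len(images) - n_test, n_test


# Example call (paths are placeholders):
# split_dataset("images/all", "images/train", "images/test")
```

Unlabelled images are skipped on purpose, so the counts returned can be lower than the number of files in the folder.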

Run the modified xml_to_csv.py from datitran’s GitHub repository to create train_labels.csv and test_labels.csv. The later steps expect both CSV files in the data directory (data/train_labels.csv and data/test_labels.csv).
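
To show what this conversion involves, here is a minimal sketch that reads the Pascal VOC style XML files written by labelImg and flattens them into one CSV row per bounding box. It is not datitran's script: the column order is an assumption modelled on that kind of script, and the function names are made up for illustration.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed column layout; check it against the script you actually use.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_path):
    """Return one row per <object> element in a labelImg XML file."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append([filename, width, height, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))])
    return rows


def write_labels_csv(xml_dir, csv_path):
    """Collect all XML files in xml_dir into one CSV; return the row count."""
    rows = []
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        rows.extend(xml_to_rows(xml_path))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)


# Example call (paths are placeholders):
# write_labels_csv("images/train", "data/train_labels.csv")
```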

The next step is to generate TFRecord files for the train and test data from the generated CSV files. Use the modified generate_tfrecord.py for this step.

python3 generate_tfrecord.py --csv_input=data/train_labels.csv  --output_path=data/train.record

python3 generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=data/test.record

If you get an error saying that the object_detection folder does not exist, export the path below. This tutorial needs the TensorFlow object detection API to be installed beforehand. Please follow this link for more information.

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Copy data, training, images and  ssd_mobilenet_v1_coco_11_06_2017 directories to tensorflow object_detection folder and start training.

wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
tar -xzf ssd_mobilenet_v1_coco_11_06_2017.tar.gz

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

ssd_mobilenet_v1_pets.config holds the paths to both TFRecord files, the pretrained graph (checkpoint) and the .pbtxt file that lists the classes to detect. The checkpoint files will be created inside the training directory.
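
The .pbtxt label map is a small text file with one item per class. The following sketch generates that text for a list of class names; the helper name label_map_text and the output file name in the comment are hypothetical. Class ids start at 1 because id 0 is reserved for the background.

```python
def label_map_text(class_names):
    """Build label map text with ids starting at 1 (0 is the background)."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


# This tutorial has a single class.
print(label_map_text(["charging sign"]))

# To save it (file name is a placeholder):
# with open("training/object-detection.pbtxt", "w") as handle:
#     handle.write(label_map_text(["charging sign"]))
```

With more classes, the number of classes set in the config file has to match the number of items in the label map.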

Next we need to create a frozen inference graph from the latest checkpoint file created. Once done, use the inference program to detect the charging sign.

python export_inference_graph.py --input_type image_tensor \
--pipeline_config_path training/ssd_mobilenet_v1_pets.config \
--trained_checkpoint_prefix training/model.ckpt-9871 \
--output_directory charging_spot_det__inference_graph

python custom_inference.py

Since my training data set was small (fewer than 100 images) and there was only one class, the inference is inaccurate: it identifies almost everything as a charging sign. This can be improved by adding more classes and more training data.

Speech detection with Tensorflow 1.4 on Raspberry Pi 3 – Part 2: Live audio inferencing using PyAudio

Here is the link to Part 1.

We now know how to loop around the inferencing function, capture a voice for a fixed time and process it. What we need next is a program that listens to the input stream and measures the audio level. This will help us decide whether we need to capture the audio data or not.

File 1: audio_intensity.py
The following code reads a CHUNK of data from the stream, measures its average intensity and prints it out, so that we know how much ambient noise there is in the background. We first need to figure out this average intensity level so that we can choose a threshold (INTENSITY) to check against.

import pyaudio
import wave
import math
import audioop
import time
 
p = pyaudio.PyAudio() 
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 512 
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "file.wav"
INTENSITY=11000
 
def audio_int(num_samples=50):
    print ('Getting intensity values from mic.')
    p = pyaudio.PyAudio()

    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
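    #----------------------checks average noise level-------------------------
    # Only one CHUNK is read here, so all num_samples values are identical.
    # The sample width of 4 is kept as in the original although the stream is
    # 16-bit (paInt16); the INTENSITY threshold is calibrated on this scale.
    cur_data = stream.read(CHUNK)
    values = [math.sqrt(abs(audioop.avg(cur_data, 4)))
                for x in range(num_samples)]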
    values = sorted(values, reverse=True)
    r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
    #---------------------prints out avg noise level--------------------------
    print (' Average audio intensity is r', r)
    time.sleep(.1)

    stream.close()
    p.terminate()
    return r


if __name__ == '__main__':
    while True:
        audio_int()  # To measure your mic levels
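
The listing above depends on the audioop module, which as far as I know was deprecated and then removed from the standard library in Python 3.13. As a hedged alternative, the sketch below measures the level of a chunk with the array module instead. Note that it computes a root-mean-square value, which is a different measure from the sqrt(abs(avg)) used above, so the INTENSITY threshold would have to be measured again. The function name chunk_rms is made up for illustration.

```python
import array
import math


def chunk_rms(chunk):
    """Root-mean-square level of signed 16-bit, native-endian PCM bytes."""
    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Stand-in for stream.read(CHUNK): four fake samples instead of mic data.
fake_chunk = array.array("h", [3, -4, 3, -4]).tobytes()
print("RMS level:", chunk_rms(fake_chunk))
```

In the real program the argument would be the bytes returned by stream.read(CHUNK) on a paInt16 stream.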

File 2: audio_intensity_trigger.py
In this program, I have added an infinite loop and a check against the INTENSITY level before printing the average audio level. If the room is silent or there is only background noise, nothing is triggered. I have kept the threshold at ‘11000’. Make sure that you change it according to the output of audio_intensity.py. If its output is, say, 8000, set the intensity to 9000 or 10000.

......
......
......
while True:
  cur_data = stream.read(CHUNK)
  values = [math.sqrt(abs(audioop.avg(cur_data, 4)))
            for x in range(num_samples)]
  values = sorted(values, reverse=True)
  r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
  #print " Finished "
  if (r > INTENSITY):
    print (' Average audio intensity is r', r)

stream.close()
......
......
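
Choosing the threshold can be written down as a small calculation: take the loudest ambient reading and add some headroom. This is only a sketch; the function name pick_threshold and the default margin of 1.2 are assumptions, chosen so that an ambient level of 8000 lands between the 9000 and 10000 suggested above.

```python
def pick_threshold(ambient_levels, margin=1.2):
    """Return a trigger threshold a bit above the loudest ambient reading."""
    if not ambient_levels:
        raise ValueError("need at least one ambient reading")
    return max(ambient_levels) * margin


# Readings as printed by audio_intensity.py in a quiet room (made-up values).
print(pick_threshold([7600, 8000, 7900]))
```

A larger margin gives fewer false triggers but may miss quiet speech.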

File 3: audio_trigger_save_wav.py
This one waits for the threshold and, once triggered, saves 1 second of audio to a file in wave format, together with the 5 previous chunks of audio. This is important: otherwise the recording will not contain the start of the word, or the word will sit in the first half of the 1 second while the remaining half is empty. The spectrogram generated by TensorFlow would then look chopped off.

......
......
......
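    # Start with empty byte strings so that b''.join(frames) also works when
    # the trigger fires before five chunks have been read.
    prev_data0=b''
    prev_data1=b''
    prev_data2=b''
    prev_data3=b''
    prev_data4=b''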
    while True:
      #reading current data
      cur_data = stream.read(CHUNK)
      values = [math.sqrt(abs(audioop.avg(cur_data, 4)))
                for x in range(num_samples)]
      values = sorted(values, reverse=True)
      r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
      if (r > INTENSITY):
        #-------------------------------------------------if triggered; file.wav = 5 previous frames + capture 1 sec of voice-------------------------------
        print (' Average audio intensity is r', r)
        frames = []
        frames.append(prev_data0)
        frames.append(prev_data1)
        frames.append(prev_data2)
        frames.append(prev_data3)
        frames.append(prev_data4)
        frames.append(cur_data)
        #---------------getting 1 second of voice data-----------------
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
          data = stream.read(CHUNK)
          frames.append(data)
        print ('finished recording')
        #-------------     ---saving wave file-------------------------
        waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        waveFile.setnchannels(CHANNELS)
        waveFile.setsampwidth(p.get_sample_size(FORMAT))
        waveFile.setframerate(RATE)
        waveFile.writeframes(b''.join(frames))
        waveFile.close()
      #------------------------------------------------------if not triggered; saving previous values to a FIFO of 5 levels----------------------------------
      prev_data0=prev_data1
      prev_data1=prev_data2
      prev_data2=prev_data3
      prev_data3=prev_data4
      prev_data4=cur_data
    stream.close()
......
......
......
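
The five prev_data variables form a small FIFO. The same idea can be sketched with collections.deque, which drops the oldest chunk automatically. This is an illustrative rewrite under stated assumptions, not the code used above: the function name, the is_loud callback and the use of a plain iterable in place of the PyAudio stream are all hypothetical.

```python
from collections import deque

PRE_ROLL_CHUNKS = 5  # number of chunks kept from before the trigger


def capture_with_pre_roll(chunks, is_loud, record_chunks):
    """Return pre-roll + trigger chunk + following chunks, or None."""
    chunks = iter(chunks)
    history = deque(maxlen=PRE_ROLL_CHUNKS)  # oldest chunk falls out first
    for chunk in chunks:
        if is_loud(chunk):
            frames = list(history) + [chunk]
            # Keep recording for a fixed number of chunks after the trigger.
            for _ in range(record_chunks):
                following = next(chunks, None)
                if following is None:
                    break
                frames.append(following)
            return b"".join(frames)
        history.append(chunk)
    return None


# Fake one-byte "chunks" 0..19; the trigger fires on chunk 8.
fake_stream = [bytes([i]) for i in range(20)]
recording = capture_with_pre_roll(fake_stream, lambda c: c[0] == 8, record_chunks=3)
print(list(recording))
```

In the real program, chunks would come from repeated stream.read(CHUNK) calls and is_loud would compare the measured level with INTENSITY.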

File 4: wav_trigger_inference.py
This is the modified TensorFlow inference file (label_wav.py). I have merged audio_trigger_save_wav.py into label_wav.py. The usage is:

cd /tensorflow/examples/speech_commands
touch file.wav  # create a dummy file for the first pass
python3 wav_trigger_inference.py --graph=./my_frozen_graph.pb --labels=./conv_labels.txt --wav=file.wav

The while loop is placed around run_graph(). If audio is detected and is above the threshold, a wave file is captured and passed on for inferencing. Once the results are printed out, the program continues listening for the next audio.

....
....
....
      waveFile.writeframes(b''.join(frames))
      waveFile.close()
      with open(wav, 'rb') as wav_file:
        wav_data = wav_file.read()
      run_graph(wav_data, labels_list, input_name, output_name, how_many_labels)
    prev_data0=prev_data1
    prev_data1=prev_data2
....
....
....
  parser.add_argument(
      '--how_many_labels',
      type=int,
      default=1,# -------------------this makes sure that only the one result with the highest probability is printed------------------------
      help='Number of results to show.')
....
....
....

Here is the result. There are some errors during processing because the graph is not accurate; I could only train it to 88% accuracy. More data augmentation is needed to improve the accuracy, and I may need to fiddle with the switches that TensorFlow provides for training. But this is good enough to build a speech-controlled device using a Raspberry Pi.

Experiments with Q learning – A reinforcement learning technique

Q learning is a form of reinforcement learning that is based on reward and penalty. There are no neural networks or deep learning methods involved. For every state (a state corresponds to a location the bot can move to) there is a Q value for each action (left, right, top, bottom). In any state, the bot takes the action that has the maximum Q value, and then updates the Q values depending on the outcome. The Q table is updated every time the bot moves.
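
The update described above can be sketched in a few lines. This is the standard tabular Q-learning rule, not a copy of the code discussed in this post; the action names, the learning rate alpha and the discount factor gamma are assumptions chosen for illustration.

```python
ACTIONS = ["up", "down", "left", "right"]  # assumed action names


def best_action(q_table, state):
    """Pick the action with the highest Q value in this state."""
    return max(ACTIONS, key=lambda action: q_table[state][action])


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Move Q(state, action) towards reward + gamma * max Q(next_state, .)."""
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Two grid cells, all Q values start at zero.
q_table = {cell: {action: 0.0 for action in ACTIONS} for cell in [(0, 0), (0, 1)]}

# The bot moves right from (0, 0) to (0, 1) and receives a reward of 1.
q_update(q_table, (0, 0), "right", 1.0, (0, 1))
print(q_table[(0, 0)], best_action(q_table, (0, 0)))
```

After this single move, "right" is the only action in (0, 0) with a positive Q value, so it becomes the preferred action there.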

The Python code opens a GUI with a bot that uses Q learning to reach its destination. All credit for the code goes to PhilippeMorere. I edited the code to understand it, added a comment at each step, made the canvas bigger and made it more difficult for the bot to traverse. The states are numbered at their centre and visualised so that editing the GUI is easier. The Q values and rewards are printed out so that the updated values can be seen. The code for the video below can be found here. To run the code, extract the zip downloaded from GitHub and run:

python Learner.py

The training took only a few minutes. The video shows that the bot learned to reach the green square after training. The arrow colour shows the relative Q values along each direction: the redder the arrow, the lower the reward and the Q value.

The simulation speed can be changed by adjusting the sleep time in Learner.py. A value of 5 (that is, 5 seconds) slows the simulation down enough to read the various values printed in the terminal. Currently it is set to .005 seconds for faster simulation.


# MODIFY THIS SLEEP IF THE GAME IS GOING TOO FAST.
time.sleep(.005)

Walls, penalty areas and reward areas can be changed or added in World.py.


#black wall locations
walls = [(0, 2)....(3, 5), (5, 4), (8, 0), (1, 1)]
#green and red square locations
specials = [(4, 1, "red", -1)... (6, 1, "red", -1)]

 

Video inferencing on neural network trained using NVIDIA DIGITS with opencv

I have been playing with the inferencing code for some time. Here is real-time video inferencing that uses OpenCV to capture video and step through the frames. The overall frame rate is low because the system is slow. In the video, ‘frame’ is the normalised image that the Caffe network sees after the mean image is subtracted, and ‘frame2’ is the input image.

The Caffe model was trained in NVIDIA DIGITS using GoogLeNet (SGD, 100 epochs); it reached 100% accuracy by epoch 76.

Here is the inferencing code.


import numpy as np
import matplotlib.pyplot as plt
import caffe
import time
import cv2
from skimage import io

# Open the default camera
cap = cv2.VideoCapture(0)

MODEL_FILE = './deploy.prototxt'
PRETRAINED = './snapshot_iter_4864.caffemodel'
MEAN_IMAGE = './mean.jpg'
#Caffe
mean_image = caffe.io.load_image(MEAN_IMAGE)
caffe.set_mode_gpu()
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(256, 256))
#OpenCv loop
while(True):
    start = time.time()
    ret, frame = cap.read()
    if not ret:
        # Stop if the camera did not deliver a frame
        break
    resized_image = cv2.resize(frame, (256, 256))
    cv2.imwrite("frame.jpg", resized_image)
    IMAGE_FILE = './frame.jpg'
    im2 = caffe.io.load_image(IMAGE_FILE)
    inferImg = im2 - mean_image
    #print ("Shape------->",inferImg.shape)
    #Inferencing
    prediction = net.predict([inferImg])
    end = time.time()
    pred=prediction[0].argmax()
    #print ("prediction -> ",prediction[0]) 
    if pred == 0:
       print("cat")
    else:
       print("dog")
    #Opencv display
    cv2.imshow('frame',inferImg)
    cv2.imshow('frame2',im2)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
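
The loop above records start and end times but never reports them. To put a number on the low frame rate, a small rolling meter could be called once per loop iteration. This is a sketch only; the class name FrameRateMeter and the window size are made up, and the timestamps in the example are fake values rather than real measurements.

```python
import time
from collections import deque


class FrameRateMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self._stamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame and return the current FPS (0.0 until 2 frames)."""
        self._stamps.append(time.perf_counter() if now is None else now)
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0


# Fake timestamps, half a second apart, instead of a live camera loop.
meter = FrameRateMeter(window=5)
for stamp in [0.0, 0.5, 1.0, 1.5]:
    fps = meter.tick(stamp)
print("frames per second:", fps)
```

Inside the video loop, calling meter.tick() without an argument after each prediction would use the real clock.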

 

 

Inferencing on the trained caffe model from NVIDIA DIGITS

In this post I will explain how to run inference from the command line on a trained network created with NVIDIA DIGITS. The link to the previous post is here.

In the DIGITS UI, we have to upload a file on the model web page to run inference. This is time consuming and not practical for real-world applications. We need to deploy the trained model as a standalone Python application.
To achieve this, download the trained model from the NVIDIA DIGITS model page. This saves a .tgz file to your computer. Extract the .tgz file using this command:

tar -xvzf filename.tgz

Save the ‘Image mean’ file from the datasets page of NVIDIA DIGITS to your computer.

Provide the paths for the following files in the Python script below:

'Image mean' file   -> e.g. '/home/catsndogs/mean.jpg'
deploy.prototxt     -> e.g. '/home/catsndogs/deploy.prototxt'
caffemodel          -> e.g. '/home/catsndogs/snapshot_iter_480.caffemodel'
input image to test -> e.g. '/home/catsndogs/image_to_test.jpg'

import numpy as np
import matplotlib.pyplot as plt
import caffe
import time
from PIL import Image

MODEL_FILE = '/home/catsndogs/deploy.prototxt'
PRETRAINED = '/home/catsndogs/snapshot_iter_480.caffemodel'
MEAN_IMAGE = '/home/catsndogs/mean.jpg'
# load the mean image
mean_image = caffe.io.load_image(MEAN_IMAGE)
#input the image file need to be tested
IMAGE_FILE = '/home/catsndogs/image_to_test.jpg'
im1 = Image.open(IMAGE_FILE)
# Tell Caffe to use the GPU
caffe.set_mode_gpu()
# Initialize the Caffe model using the model trained in DIGITS
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(256, 256))
# Display the input image (call plt.show() to open the window)
plt.imshow(im1)
# Resize the image, convert it to a float array in [0, 1] (the range that
# caffe.io.load_image returns) and subtract the mean image
start = time.time()
inferImg = np.asarray(im1.convert('RGB').resize((256, 256), Image.NEAREST),
                      dtype=np.float32) / 255.0
inferImg = inferImg - mean_image
prediction = net.predict([inferImg])
end = time.time()
pred = prediction[0].argmax()
print(pred)
if pred == 0:
  print("cat")
else:
  print("dog")
# Display total time to perform inference
print('Total inference time: ' + str(end - start) + ' seconds')

Run the file with

python catsndogs.py

for inferencing.
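
Instead of editing the paths inside the script for every run, they could be passed on the command line. The following is a minimal sketch using argparse; the option names --model, --weights and --mean and the function name build_parser are hypothetical and not part of the script above.

```python
import argparse


def build_parser():
    """Command-line options for the four paths the script needs."""
    parser = argparse.ArgumentParser(
        description="Classify one image with a Caffe model trained in DIGITS.")
    parser.add_argument("--model", required=True, help="path to deploy.prototxt")
    parser.add_argument("--weights", required=True, help="path to the .caffemodel file")
    parser.add_argument("--mean", required=True, help="path to the 'Image mean' file")
    parser.add_argument("image", help="path to the image to test")
    return parser


# Parse an explicit example list here; a real script would call parse_args()
# without arguments to read the actual command line.
args = build_parser().parse_args([
    "--model", "/home/catsndogs/deploy.prototxt",
    "--weights", "/home/catsndogs/snapshot_iter_480.caffemodel",
    "--mean", "/home/catsndogs/mean.jpg",
    "/home/catsndogs/image_to_test.jpg",
])
print(args.model, args.weights, args.mean, args.image)
```

The parsed values would then replace MODEL_FILE, PRETRAINED, MEAN_IMAGE and IMAGE_FILE in the script.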