Getting started with OpenAI Gym – Part 1, Installation and configuration

The OpenAI Gym toolkit provides easy visualisation tools for experimenting with reinforcement learning algorithms. Here I explain the process of setting it up and the issues I faced.

Installation instructions are given on the GitHub page. When I tried in the default terminal, I ran into issues with Python dependencies and conflicting versions of packages installed on the system, so I set up gym in a virtual environment instead. First, I added the Anaconda path so that conda is available to create the virtual environment.

export PATH="/<installation path>/anaconda3/bin:$PATH"

Create the virtual environment, then clone and install gym.

conda create -n py34 python=3.4
source activate py34
git clone
cd gym
pip install -e .

This will install gym. If you get an error saying swig was not found, install the dependencies:

sudo apt-get install python python-setuptools python-dev python-augeas gcc swig dialog

Run the sample program.

>import gym
>env = gym.make('LunarLander-v2')

If everything is installed correctly, it will render this frame:

[Image: OpenAI Gym LunarLander-v2 frame]

If there is an error regarding the Box2D library, install it manually.

pip3 uninstall Box2D box2d-py
git clone
cd pybox2d/
python clean
python build
python install

OpenAI Gym needs OpenGL drivers to be configured on the machine. I had issues with the NVIDIA driver (nvidia-smi), so I tried switching to an older driver. This can be done through ‘Software Updater->Additional Drivers’.

[Image: Additional Drivers dialog with the NVIDIA driver selection]

The OpenGL driver can be tested by running glxgears in a terminal. If it is installed correctly, it shows this image with animation.

[Image: glxgears window]

To use the MuJoCo physics engine, mujoco-py needs to be installed separately. The instructions are given here. Before running the MuJoCo examples, add these paths to your .bashrc:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kiran/.mujoco/mjpro150/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-390
#If there is OpenGL error 
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/



Custom image detector using Tensorflow object detection API

The aim of this tutorial is to use the TensorFlow Object Detection API to detect custom objects. Here we will train the network to recognize a battery charging sign. (Why battery charging? Later, this trained net can be used in a robot to detect the charging point from a picture.) This is basically an excerpt of sentdex’s TensorFlow tutorial series; I have listed the steps I followed to train on a custom image, for quick reference.

Download files here

[Image: battery charging sign, the image to detect]

To train the model, we first need to collect training data. This can be done by collecting images from Google Images. I used a Chrome extension, ‘Fatkun Batch Download Image’, to save images in bulk. Once the images are downloaded, download and install labelImg to annotate the training data.

git clone
sudo apt-get install pyqt5-dev-tools
sudo pip3 install lxml
make qt5py3

Browse to the folder that contains the downloaded images. The idea is to create an XML label for every image. Select the images one by one, click ‘Create RectBox’, draw the box, give the label ‘charging sign’ and save as an XML file (the default).

[Image: labelImg annotation window]

Create a train and a test directory. Copy 10% of the images, with their respective XML label files, to the test directory and the remaining 90% to the train directory.
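The 90/10 split described above can be scripted rather than done by hand. What follows is only a minimal sketch: the function name `split_dataset`, the fixed random seed and the list of image extensions are my own assumptions, and it copies only images that already have a matching labelImg XML file.

```python
import os
import random
import shutil

def split_dataset(image_dir, train_dir, test_dir, test_fraction=0.1, seed=0):
    """Copy image/XML pairs from image_dir into train_dir and test_dir."""
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    # Keep only images that have a label file with the same base name.
    images = sorted(
        name for name in os.listdir(image_dir)
        if name.lower().endswith(('.jpg', '.jpeg', '.png'))
        and os.path.exists(
            os.path.join(image_dir, os.path.splitext(name)[0] + '.xml'))
    )
    # Shuffle with a fixed seed so the split is repeatable.
    random.Random(seed).shuffle(images)
    n_test = max(1, round(len(images) * test_fraction)) if images else 0
    for index, name in enumerate(images):
        target = test_dir if index < n_test else train_dir
        xml_name = os.path.splitext(name)[0] + '.xml'
        shutil.copy(os.path.join(image_dir, name), target)
        shutil.copy(os.path.join(image_dir, xml_name), target)
    return len(images) - n_test, n_test

# Example call (paths are placeholders):
# n_train, n_test = split_dataset('images/all', 'images/train', 'images/test')
```

Copying instead of moving keeps the original download folder intact, so the split can be redone with a different fraction.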

Run the modified script from datitran’s GitHub to create ‘train_labels.csv’ and ‘test_labels.csv’. The directory structure is as follows.

The next step is to generate TFRecord files for the test and train data from the generated CSV files. Use the modified script for this step:

python3 --csv_input=data/train_labels.csv  --output_path=data/train.record

python3 --csv_input=data/test_labels.csv  --output_path=data/test.record
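For reference, the CSV files used above are a flattened form of the labelImg annotations. Here is a minimal sketch of that conversion for a single annotation, assuming labelImg's default Pascal VOC XML layout. The column names and the sample annotation are illustrative assumptions; this is not the modified script itself.

```python
import xml.etree.ElementTree as ET

# Assumed column order for the label CSV.
COLUMNS = ['filename', 'width', 'height', 'class',
           'xmin', 'ymin', 'xmax', 'ymax']

def voc_xml_to_rows(xml_text):
    """Return one row per <object> in a Pascal VOC annotation string."""
    root = ET.fromstring(xml_text)
    filename = root.findtext('filename')
    width = int(root.findtext('size/width'))
    height = int(root.findtext('size/height'))
    rows = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        rows.append([
            filename, width, height, obj.findtext('name'),
            int(box.findtext('xmin')), int(box.findtext('ymin')),
            int(box.findtext('xmax')), int(box.findtext('ymax')),
        ])
    return rows

# Hypothetical annotation, shaped like a labelImg output file.
SAMPLE_XML = """<annotation>
  <filename>charging_01.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>charging sign</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>300</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

rows = voc_xml_to_rows(SAMPLE_XML)
print(','.join(COLUMNS))
for row in rows:
    print(','.join(str(value) for value in row))
```

An image with several labelled boxes yields several rows that share the same filename.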

If you get an error saying the object_detection folder does not exist, export the path below. This tutorial needs the TensorFlow Object Detection API to be installed beforehand. Please follow this link for more information.

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Copy the data, training, images and ssd_mobilenet_v1_coco_11_06_2017 directories to the TensorFlow object_detection folder and start training.


python --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

ssd_mobilenet_v1_pets.config holds the paths to both TFRecord files, the graph, and the pbtxt file that contains the classes to detect. The checkpoint files will be created inside the training directory.

Next we need to create a frozen inference graph from the latest checkpoint file created. Once done, use the inference program to detect the charging sign.

python --input_type image_tensor \
--pipeline_config_path training/ssd_mobilenet_v1_pets.config \
--trained_checkpoint_prefix training/model.ckpt-9871 \
--output_directory charging_spot_det__inference_graph


Since my training data set was small (fewer than 100 images) and there was only one class, the inference is unreliable: it identifies almost everything as a charging sign. This can be extended with multiple classes and more training data to get accurate results.

Getting Tensorflow 1.4 working with NVIDIA DIGITS in Ubuntu 16.04

The steps to follow are here.

In this post I am not explaining all the bits and pieces of the installation; I am trying to clear up the confusion about which path to follow.

Here is the main dependency: if you need to train a TensorFlow model with NVIDIA DIGITS, you need DIGITS 6. If you have DIGITS 5 installed, it won’t detect TensorFlow. At the time of writing, unlike for DIGITS 5, NVIDIA provides no binaries for installing DIGITS 6. You need to either install using Docker or compile and install from source. I tried installing with Docker first, and figured out that unless you are already familiar with Docker, you are going to spend a hell of a lot of time trying to understand Docker itself and its usage. Then there is nvidia-docker, which is the one actually needed for NVIDIA DIGITS 6. I tried for some time and realised that I do not need it, since I own the machine and I am the only person using it. I am really not ready to spend time going through Docker pain at this point.

Even though I am not a fan of compiling and installing, it looks like that’s the only way. It is going to take some time, and you may need to fix some build failures and dependencies, with the usual Stack Overflow searching and googling. I followed the installation instructions from the DIGITS GitHub page.

Long story short, you need to:

  1. Remove DIGITS 5 (check here how to list and remove packages)
  2. Compile and install Caffe (cannot skip this; it is a dependency for DIGITS 6)
  3. Compile and install Torch (not a requirement, but let’s keep it)
  4. Compile and install tensorflow_gpu (I had this already, so I skipped it)
  5. Compile and install DIGITS 6

Make sure you add these variables to ~/.bashrc

export DIGITS_ROOT=~/digits
export CAFFE_ROOT=~/caffe
export TORCH_ROOT=~/torch

The DIGITS server can be started with ‘digits-devserver &’. By default the service will be active at http://localhost:5000/
If everything goes fine, you can see the frameworks when you create a new model in DIGITS.

[Image: TensorFlow on NVIDIA DIGITS 6]


Speech detection with Tensorflow 1.4 on Raspberry Pi 3 – Part 2: Live audio inferencing using PyAudio

Here is the link to Part 1.

Now we know how to loop around the inferencing function, capture a voice for a fixed time and process it. What we need now is a program that listens to the input stream and measures the audio level. This will help us decide whether we need to capture the audio data or not.

The following code reads a CHUNK of data from the stream, measures the average intensity and prints it out, so that we know how much ambient noise there is in the background. First we need to figure out the average intensity level (INTENSITY) so that we have a threshold number to check against.

import pyaudio
import math
import audioop  # part of the standard library up to Python 3.12

FORMAT = pyaudio.paInt16
CHANNELS = 1  # single channel
RATE = 16000
CHUNK = 512

def audio_int(num_samples=50):
    """Print and return the average audio intensity read from the mic."""
    print('Getting intensity values from mic.')
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)
    #----------------------checks average noise level-------------------------
    cur_data = stream.read(CHUNK)
    values = [math.sqrt(abs(audioop.avg(cur_data, 4)))
                for x in range(num_samples)]
    values = sorted(values, reverse=True)
    r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
    #---------------------prints out avg noise level--------------------------
    print(' Average audio intensity is r', r)
    # release the audio device before the next call
    stream.close()
    p.terminate()
    return r

if __name__ == '__main__':
    while True:
        audio_int()  # To measure your mic levels

File 2:
In this program, I have added an infinite loop and a check against the INTENSITY level before printing the average audio level. If the room is silent or there is just background noise, nothing is triggered. I have kept it at ‘11000’. Make sure that you change it according to the output of the first program: if its output is, say, 8000, set the intensity to 9000 or 10000.

while True:
  cur_data = stream.read(CHUNK)
  values = [math.sqrt(abs(audioop.avg(cur_data, 4)))
            for x in range(num_samples)]
  values = sorted(values, reverse=True)
  r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
  # print only when the level is above the threshold
  if (r > INTENSITY):
    print(' Average audio intensity is r', r)
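The threshold check can be tried without a microphone. The sketch below repeats the "average of the loudest 20%" rule on made-up numbers; the helper names `loudest_mean` and `is_triggered` and both lists of readings are illustrative assumptions, not values measured on the Pi.

```python
def loudest_mean(values, fraction=0.2):
    """Average of the loudest `fraction` of the readings (same rule as r)."""
    count = int(len(values) * fraction)
    if count == 0:
        raise ValueError('need enough readings to take the top fraction')
    return sum(sorted(values, reverse=True)[:count]) / count

def is_triggered(r, intensity):
    """True when the measured level is above the threshold."""
    return r > intensity

INTENSITY = 11000  # threshold value used in this post

# Made-up readings standing in for math.sqrt(abs(audioop.avg(...))) values.
quiet_room = [7800, 8100, 7950, 8000, 7900, 8050, 7850, 8020, 7980, 8010]
spoken_word = [7900, 15000, 21000, 18000, 8000, 8100, 16000, 7950, 8050, 19000]

for name, readings in (('quiet', quiet_room), ('speech', spoken_word)):
    r = loudest_mean(readings)
    print(name, r, is_triggered(r, INTENSITY))
```

With these numbers the quiet room stays below 11000 and the spoken word goes above it, which is the behaviour the threshold is meant to give.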


File 3:
This one waits for the threshold and, once triggered, saves 1 second of audio to a file in wave format together with the 5 previous chunks of audio. This is important; otherwise our recording will not contain the start of words, or the words will be biased towards the first half of the 1 second and the remaining half will be empty. The spectrogram generated by TensorFlow will then look chopped off.

    while True:
      # reading current data
      cur_data = stream.read(CHUNK)
      values = [math.sqrt(abs(audioop.avg(cur_data, 4)))
                for x in range(num_samples)]
      values = sorted(values, reverse=True)
      r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
      if (r > INTENSITY):
        # ---- if triggered: file.wav = 5 previous frames + 1 sec of voice ----
        print(' Average audio intensity is r', r)
        frames = []
        # ---- getting 1 second of voice data ----
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
          data = stream.read(CHUNK)
          frames.append(data)
        print('finished recording')
        # ---- saving wave file ----
        waveFile = wave.open('file.wav', 'wb')
      # ---- if not triggered: saving previous values to a FIFO of 5 levels ----
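The "5 previous frames" idea is easier to see in isolation. Below is a minimal sketch that uses a `collections.deque` as the FIFO and plain integers in place of audio chunks. The function name `capture_on_trigger`, the simulated stream, and the choice to include the triggering chunk itself in the output are my assumptions for illustration.

```python
from collections import deque

PRE_ROLL_CHUNKS = 5      # previous chunks kept
CHUNKS_PER_SECOND = 31   # int(16000 / 512 * 1)

def capture_on_trigger(chunks, is_loud):
    """Return pre-roll + triggering chunk + about 1 s of chunks, or None."""
    history = deque(maxlen=PRE_ROLL_CHUNKS)  # FIFO: the oldest chunk drops out
    stream = iter(chunks)
    for chunk in stream:
        if is_loud(chunk):
            frames = list(history) + [chunk]
            for _ in range(CHUNKS_PER_SECOND):
                following = next(stream, None)
                if following is None:
                    break  # the stream ended early
                frames.append(following)
            return frames
        history.append(chunk)
    return None

# Simulated stream: integers stand in for audio chunks; chunk 8 is "loud".
simulated = list(range(60))
frames = capture_on_trigger(simulated, is_loud=lambda c: c == 8)
print(frames[:7], len(frames))
```

Because the deque has `maxlen=5`, appending a sixth chunk silently discards the oldest one, so no manual shifting of the FIFO is needed.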

File 4:
This is the modified TensorFlow inference file (I have fused the recording program into it). The usage is:

cd /tensorflow/examples/speech_commands
touch file.wav  # create a dummy file for the first pass
python3 --graph=./my_frozen_graph.pb --labels=./conv_labels.txt --wav=file.wav

The while loop is around run_graph(). If audio is detected and is above the threshold, a wave file is captured and passed on for inferencing. Once the results are printed out, it continues listening for the next audio.

      with open(wav, 'rb') as wav_file:
        wav_data = wav_file.read()
      run_graph(wav_data, labels_list, input_name, output_name, how_many_labels)
      default=1,  # this makes sure that it prints out only the one result with max probability
      help='Number of results to show.')

Here is the result. There are some errors while processing since the graph is not accurate; I could train it only to 88% accuracy. More data augmentation is needed to improve the accuracy, and I may need to fiddle around with all the switches that TensorFlow provides for training. But this is good enough to create a speech-controlled device using a Raspberry Pi.

Speech detection with Tensorflow 1.4 on Raspberry Pi 3 – Part 1: Getting audio file using PyAudio

In previous posts about speech detection using TensorFlow, I showed how to run inference on a 1 sec audio sample using the graph trained in TensorFlow. This series of posts looks into inferencing a continuous stream of audio. There is an excellent post by Allan that shows how to do the same, but I was not happy with the results and the code was quite a lot to understand. It uses TensorFlow audio functions to process the audio. I will be using PyAudio to process audio since it is easy to understand; later, I may move to TensorFlow audio processing. The code posted runs on a Raspberry Pi 3, but it should run on any Linux system without modification.

To get the audio, you need to purchase a USB sound card as shown in the figure below; these are available on eBay, AliExpress or Amazon. Connect a 2.5mm mic to it or, like I did, scavenge a mic from old electronics and a 2.5mm audio jack and connect them together.

[Image: USB audio card on Raspberry Pi]
USB microphone, Pi NoIR camera and earphones for audio on Raspberry Pi 3

The following Python code records 1 sec of audio and saves it as a .wav file. For TensorFlow speech recognition we use a sampling rate of 16 kHz (RATE), a single channel (CHANNELS) and a duration of 1 sec (RECORD_SECONDS).

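import pyaudio
import wave

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 512
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "file.wav"

audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                rate=RATE, input=True,
                frames_per_buffer=CHUNK)
print("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("finished recording")
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
# save the recorded chunks as a wav file
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()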
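Before feeding a recording to the graph, it can help to confirm that the file really is 16 kHz, mono and 16-bit. Here is a minimal sketch using only the standard `wave` module. The helper name `describe_wav` is an assumption, and silent chunks stand in for microphone data so that it runs without PyAudio.

```python
import os
import tempfile
import wave

RATE = 16000
CHANNELS = 1
CHUNK = 512
RECORD_SECONDS = 1
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio (paInt16)

def describe_wav(path):
    """Return the basic properties of a wav file."""
    with wave.open(path, 'rb') as wav_file:
        return {
            'channels': wav_file.getnchannels(),
            'sample_width': wav_file.getsampwidth(),
            'rate': wav_file.getframerate(),
            'samples': wav_file.getnframes(),
        }

# Write silent chunks in place of microphone data, using the same loop bound
# as the recording code.
n_chunks = int(RATE / CHUNK * RECORD_SECONDS)
frames = [b'\x00' * (CHUNK * SAMPLE_WIDTH) for _ in range(n_chunks)]
path = os.path.join(tempfile.mkdtemp(), 'file.wav')
with wave.open(path, 'wb') as wav_file:
    wav_file.setnchannels(CHANNELS)
    wav_file.setsampwidth(SAMPLE_WIDTH)
    wav_file.setframerate(RATE)
    wav_file.writeframes(b''.join(frames))

info = describe_wav(path)
print(n_chunks, info)
```

Note that `int(RATE / CHUNK * RECORD_SECONDS)` rounds 31.25 down to 31 chunks, so the file holds 15872 samples, slightly less than a full second.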

When you run pyaudio.PyAudio(), ALSA may print out errors like the ones shown below.

[Image: ALSA errors on Raspberry Pi 3]

The errors can be removed by commenting out the corresponding devices in /usr/share/alsa/alsa.conf.

[Image: alsa.conf with the devices commented out]

The next step is to integrate this into the inference script in the TensorFlow tree under tensorflow/examples/speech_commands/. In the updated file, I have added a for loop around run_graph() to record 1 sec of wav audio. A new audio sample is recorded every time the loop runs, and the audio is rewritten under the same file name.

[Image: audio inferencing code in the TensorFlow script]

Here is the output. The input file is given as file.wav from the same directory, …../speech_commands; the file is overwritten each time a recording finishes. To start with, create a dummy file.wav so that the script can run.

touch file.wav
python3 --graph=./my_frozen_graph.pb --labels=./conv_labels.txt --wav=./file.wav

[Image: audio inferencing output on Raspberry Pi]



This is by no means a great method for speech inferencing: we need to wait for the script to record, and then wait again for the next command. But it is a start. In the next post I will explain how to detect an audio threshold to activate the recording/inferencing. For this I have forked a Google Assistant speech invoking script written by jeysonmc; this will be the starting point.


Getting Tensorflow 1.4 on RaspberryPi 3

There are two methods to install TensorFlow on a Raspberry Pi: installing from a pre-built binary (you can get nightly builds for all platforms), or building from source, which is hard and takes a lot of time and setup. The build usually fails multiple times unless you know exactly what you are doing.

I will explain how I installed TensorFlow 1.4 on a Raspberry Pi 3 from a pre-compiled binary.

Install pip,

# For Python 2.7
sudo apt-get install python-pip python-dev

# For Python 3.4
sudo apt-get install python3-pip python3-dev

The TensorFlow nightly build for Pi 3 in Python 3 is available here.

Choose the date of the build and copy the link for the .whl file.

[Image: TensorFlow nightly builds for Raspberry Pi 3]

Install TensorFlow; use pip3 for Python 3.4 and pip for Python 2.7.

sudo pip3 install

Or download the file first if you need a backup, and install from the file:


sudo pip3 install ./tensorflow-1.4.0-cp34-none-any.whl

Test the installation,

$ python3
>>> import tensorflow as tf

If it is installed correctly, no errors will be shown. If there are errors while running, try uninstalling and installing another nightly build binary. TensorFlow speech detection can now be run on the Raspberry Pi after copying the files to it. Use scp for copying over ssh.

scp /tmp/
scp /tmp/speech_commands_train/conv_labels.txt
scp /tmp/speech_dataset/left/a5d485dc_nohash_0.wav

Make sure to clone the TensorFlow tree from GitHub to the Raspberry Pi before running the inference script.

git clone
cd tensorflow/examples/speech_commands

python tensorflow/examples/speech_commands/ \
--graph=./my_frozen_graph.pb \
--labels=./conv_labels.txt \

[Image: running TensorFlow speech detection on Raspberry Pi 3]

Visualising and understanding MNIST dataset and solving with simple DNN using tflearn

This post is a sort of getting started guide to the digits dataset for deep learning.

The MNIST dataset is available in CSV format, and at the start it is confusing to understand how an image dataset is arranged in a CSV file. To start with, the MNIST dataset stores each image as a flat, one-dimensional array of 784 values. In 2D array form, this is a 28×28 matrix.

The training dataset has a header in the first row describing what type of data each column contains. The first column is the label for the image data and the remaining 784 columns contain the pixel values.

[Image: MNIST training dataset visualisation]

The test dataset follows the same format without the label information.

[Image: MNIST test dataset visualisation]
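The row layout described above can be checked on a tiny made-up example. The sketch below uses only the standard library. The header names (`label`, `pixel0` …) and the one-row CSV are assumptions for illustration, and `parse_row` is a hypothetical helper rather than part of the kernel.

```python
import csv
import io

SIDE = 28  # 28 x 28 = 784 pixels per image

def parse_row(row, has_label=True):
    """Split one CSV row into (label, image); image is a list of 28 rows."""
    values = [int(v) for v in row]
    label = values[0] if has_label else None
    pixels = values[1:] if has_label else values
    if len(pixels) != SIDE * SIDE:
        raise ValueError('expected 784 pixel values, got %d' % len(pixels))
    # Cut the flat list into 28 consecutive slices of 28 values each.
    image = [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]
    return label, image

# Made-up one-image "train.csv": header row, then label 7 and 784 pixels.
header = ['label'] + ['pixel%d' % i for i in range(SIDE * SIDE)]
pixels = [0] * (SIDE * SIDE)
pixels[1 * SIDE + 2] = 255  # light up row 1, column 2
fake_csv = (','.join(header) + '\n'
            + ','.join(['7'] + [str(p) for p in pixels]) + '\n')

reader = csv.reader(io.StringIO(fake_csv))
next(reader)  # skip the header row
label, image = parse_row(next(reader))
print(label, len(image), len(image[0]), image[1][2])
```

Pixel `i` in the flat row ends up at row `i // 28`, column `i % 28` of the image, which is why the lit pixel lands at `image[1][2]`.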

First, the CSV file is read into a dataframe using pandas.

import pandas as pd
train = pd.read_csv("./train.csv")

Each row of the dataframe contains the label as the first item and the pixel values in the remaining items. The following code parses the first 10 rows of data.

import numpy as np
import matplotlib.pyplot as plt

# loop through the dataframe row by row and plot the image for each row
for x in range(10):  # first 10 rows are parsed
    rowData = np.array(train[x:x+1])  # np array rowData contains all 785 values (1 label value + 784 pixel values)
    label = np.resize(rowData, (1, 1))  # np array label gets the first value from rowData
    print('label shape             ->', label.shape)  # printing the shape of np array label
    print('label                   ->', label.ravel())  # image label
    rowWithIndex = rowData.ravel()  # flat array with 785 items
    print('row with index shape    ->', rowWithIndex.shape)
    rowWithOutIndex = rowWithIndex[1:785]  # flat image data with 784 pixel values
    print('row without index shape ->', rowWithOutIndex.shape)
    Image1 = np.resize(rowWithOutIndex, (28, 28))  # 28x28 image
    print('Image shape             ->', Image1.shape)  # printing image shape
    plt.imshow(Image1, interpolation='nearest')  # plotting
    plt.show()  # display this image before moving to the next row

The data, when plotted, looks like this:

[Image: MNIST data plot]

That is all about visualisation; now back to the actual data preparation and training. For training the DNN, we need the labels and the data as two separate arrays. The labels need to be one-hot encoded before passing them to the TensorFlow training function.

# Split data into training set and labels
y_train = train.iloc[:, 0].values  # all input labels, first column (index 0) of each row in the train csv file
trainX = train.iloc[:, 1:].values  # remaining 784 values (from index 1 till the end) after the first column

# one-hot encoded form of labels
y_train_one_hot = to_categorical(y_train)

[Image: one-hot encoding of the training labels]
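To make the encoding concrete, here is a minimal sketch of what one-hot encoding does to a few digit labels. It is a plain-Python illustration with a hypothetical helper `one_hot`, not the `to_categorical` implementation.

```python
def one_hot(labels, num_classes=10):
    """Return one row per label with 1.0 at the label's index."""
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError('label %r outside 0..%d' % (label, num_classes - 1))
        row = [0.0] * num_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded

for label, row in zip([3, 0, 9], one_hot([3, 0, 9])):
    print(label, row)
```

Each label becomes a row of ten values with a single 1.0, which matches the 10-unit softmax output layer of the network.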

Once the data is prepped, creating a DNN using tflearn is as easy as defining the input layer, the hidden layers, the output layer and the regression function.

#DNN - input layer of 784 inputs, 3 hidden layers and a softmax layer at output
def build_model():
    net = tflearn.input_data([None, 784]) #input layer with 784 inputs
    net = tflearn.fully_connected(net, 128, activation='ReLU') #hidden layer1
    net = tflearn.fully_connected(net, 64, activation='ReLU') #hidden layer2
    net = tflearn.fully_connected(net, 32, activation='ReLU') #hidden layer3
    net = tflearn.fully_connected(net, 10, activation='softmax') #output layer
    net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')
    model = tflearn.DNN(net)
    return model
model = build_model()

The updated kernel is available on Kaggle and on the GitHub page.


Getting started with tensorflow speech recognition API and object detection API

A detailed tutorial for TensorFlow speech recognition is here. I am going through the initial setup steps that it does not mention, and the issues I faced.

Step 1: Download tensorflow source from git

git clone

This will download the TensorFlow source tree to the location where the command is executed.

Step 2: Training. The training script is located in tensorflow/examples/speech_commands. Pass the switch --data_url= to stop it from downloading the default speech data from TensorFlow. The path for the training data can be set in this file. TensorBoard can be opened with the command ‘tensorboard --logdir /tmp/logs’. Go to the URL that gets printed after executing the command.

python tensorflow/examples/speech_commands/ --data_url=

[Image: TensorBoard for speech recognition training]

Step 3: Create a frozen graph after the training ends. Training took 1.5 hours on a GTX 1050 Ti GPU.

python tensorflow/examples/speech_commands/ \
--start_checkpoint=/tmp/speech_commands_train/conv.ckpt-18000 \

Step 4: Inference

python tensorflow/examples/speech_commands/ \
--graph=/tmp/my_frozen_graph.pb \
--labels=/tmp/speech_commands_train/conv_labels.txt \

[Image: TensorFlow speech recognition inference output]

The short voice samples are converted to spectrogram images before processing, so a CNN can be used for training on the images. To create a spectrogram using the provided tool, go to the TensorFlow folder that contains the ‘configure’ script and run,


This will start building the TensorFlow source code. Once this is done, use the following command to create a spectrogram image for a wav file. Make sure to give absolute paths; most of the errors I encountered were caused by mismatched paths.

bazel run tensorflow/examples/wav_to_spectrogram:wav_to_spectrogram -- --input_wav=

This throws an error saying bazel was not found. Bazel is a build tool like Ant or Maven; it is used to build TensorFlow.

I had to install Bazel from this link for the above command to work. There are multiple methods to install Bazel; I installed it using the custom apt repo.

sudo apt-get install openjdk-8-jdk

echo "deb [arch=amd64] stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl | sudo apt-key add -

sudo apt-get update && sudo apt-get install bazel

sudo apt-get upgrade bazel

This is the spectrogram output.

[Image: spectrogram of a speech sample]
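To make the idea of a spectrogram concrete, here is a minimal sketch that cuts a signal into overlapping frames and takes the magnitude of a naive discrete Fourier transform of each frame. It only illustrates the concept and is not how the TensorFlow tool is implemented: there is no window function, the DFT is the slow textbook form, and the frame size, hop and test tone are arbitrary choices of mine.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_size//2) per frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT: correlate the frame with a complex sinusoid of bin k.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

RATE = 16000
TONE_HZ = 2000  # test tone, chosen to fall exactly on a frequency bin
samples = [math.sin(2 * math.pi * TONE_HZ * n / RATE) for n in range(256)]

spec = spectrogram(samples)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin, peak_bin * RATE / 64)
```

Each inner list is one column of the spectrogram image: time runs across the frames and frequency across the bins, with bin `k` corresponding to `k * RATE / frame_size` Hz.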


The github page for Tensorflow Object detection API is here.

To use Tensorflow Object detection API,

Step 1: Clone the tensorflow model tree to your PC.

git clone

Step 2: Go to the research folder, install the dependencies, compile the protobuf files and export PYTHONPATH,

or follow the detailed steps here.

cd models/research
sudo apt-get install protobuf-compiler python-pil python-lxml
sudo pip install jupyter
sudo pip install matplotlib
sudo pip install pillow
sudo pip install lxml
# From models/research/
protoc object_detection/protos/*.proto --python_out=.
# From models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Step 3: Open the default IPython notebook that comes with the Object Detection API.

cd models/research/object_detection
jupyter notebook object_detection_tutorial.ipynb


Setup and installation for machine learning with CUDA-8.0.61 and cuDNN-6 on Ubuntu 16.04 LTS- Part 2

TensorFlow installation is as simple as running a few commands if you have the correct versions of CUDA and cuDNN.

To start with, I will explain how to uninstall the previous version of CUDA/cuDNN that was installed in Part 1. It is important to know how to reconfigure the installation, since these utilities can break at any time due to version changes and frequent updates, and reinstalling the OS every time is not a solution.

To remove the NVIDIA drivers, use these commands:

sudo /usr/bin/nvidia-uninstall
sudo apt-get remove --purge nvidia-*
sudo apt-get --purge remove nvidia-cuda* 

Just to make sure, try listing out the packages,

apt list --installed | grep cuda

and uninstall each package by,

sudo apt-get remove <package>

Disable the nouveau driver (the free driver for NVIDIA cards that comes with Ubuntu) before installing the NVIDIA driver.

Edit this file,

vi /etc/modprobe.d/blacklist-nouveau.conf

with this content,

blacklist nouveau
options nouveau modeset=0

Regenerate the kernel initramfs (initramfs is used to mount the root file system / during boot) and reboot the system:

sudo update-initramfs -u
sudo reboot

Install CUDA-8.0 and the cuBLAS patch from the .deb files downloaded from the NVIDIA CUDA archives.

[Image: CUDA 8.0 download page]

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-8.0
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get upgrade cuda-8.0

Once the installation is done, install cuDNN 6. Download the .deb files from the cuDNN download page and install the runtime library, the development library and the code samples.

[Image: cuDNN 6 for CUDA 8.0 download page]

sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn6-dev_6.0.21-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn6-doc_6.0.21-1+cuda8.0_amd64.deb

Add this to .bashrc

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Copy the cuDNN samples directory to your home folder and compile the sample program.

cp -r /usr/src/cudnn_samples_v6 ~/.
cd ~/cudnn_samples_v6/mnistCUDNN
make clean

If you get this error,

cudnnGetVersion() : 6021 , CUDNN_VERSION from cudnn.h : 6021 (6.0.21)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 6 Capabilities 6.1, SmClock 1417.5 Mhz,
 MemSize (Mb) 4035, MemClock 3504.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
CUDNN failure

just run the sample as root.

[Image: cuDNN 6 sample test output]

Test passed! You now have CUDA-8.0 with cuDNN-6.
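The sample output above reports the version both as 6021 and as 6.0.21. Here is a minimal sketch of how the integer maps to the dotted form, assuming the encoding major × 1000 + minor × 100 + patch that matches the two numbers shown; `decode_cudnn_version` is a hypothetical helper name.

```python
def decode_cudnn_version(value):
    """Split a CUDNN_VERSION integer such as 6021 into (major, minor, patch)."""
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch

print('%d.%d.%d' % decode_cudnn_version(6021))
```

This is handy when a script needs to check that the installed cuDNN major version is the one a TensorFlow build expects.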

To install TensorFlow, execute these commands. For Python 2.7,

sudo apt-get install libcupti-dev

sudo apt-get install python-pip python-dev

pip install tensorflow-gpu

or for python 3.5,

sudo apt-get install libcupti-dev

sudo apt-get install python3-pip python3-dev

pip3 install tensorflow-gpu

After installation, test it by running the sample program:

# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

[Image: TensorFlow running on an NVIDIA 1050 Ti]

If everything is installed correctly, this will print out the GPU device TensorFlow is running on.



Setup and installation for machine learning with CUDA and cuDNN on Ubuntu 16.04 LTS- Part 1

Important: This post installs CUDA 9.0 with cuDNN 7, which will not work with TensorFlow 1.4 (at the time of writing). I realized this after the installation. I will go through TensorFlow 1.4 with CUDA 8 and cuDNN 6 in the next post.

Tensorflow 1.4 release notes

All our prebuilt binaries have been built with CUDA 8 and cuDNN 6. We anticipate releasing TensorFlow 1.5 with CUDA 9 and cuDNN 7.

Step 1: Download the Ubuntu .iso from ubuntu/downloads. This will download a .iso image file to your PC. In my case, the file is ubuntu-16.04.03-desktop-amd64.iso

Step 2: Create a bootable USB stick or burn a DVD from the image for installation, then install Ubuntu on the PC. I downloaded the .iso in Windows 10 and used the default DVD writer program to burn it to a disk.

Step 3: Boot into the fresh installation. Open a terminal in Ubuntu and update the installation.

sudo apt-get update

Step 4: Download NVIDIA drivers.

I updated the drivers through the Additional Drivers menu in Ubuntu.
Make sure that you have NVIDIA graphics driver 384.81 or newer for CUDA 9.

Step 5: Download and Install CUDA.

Method 1: Download the .run file (for Ubuntu 16.04).

Press ctrl+alt+f1 to switch from the X server to tty mode, then execute the command.

sudo sh

Accept the licence terms, skip installing the driver that comes with it (at least it did not work for me), install the OpenGL driver, allow permission to manage the X server configuration, and accept all default paths.

[Image: NVIDIA driver and CUDA 9 installer]

If the driver installation fails, go to /tmp, remove the X server lock files and retry the installation.

cd /tmp
rm -rf .X*

Once the installation completes, press ctrl+alt+f7 to return to the login screen.
Install the third-party libraries needed for building the CUDA samples:

sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev

Go to the samples directory, e.g. /usr/local/cuda/samples/5_Simulations/particles, and try,

sudo make

If everything goes well, it will compile and create an executable. Run it by,


This will show the demo application.

[Image: CUDA 9 particles demo]

To check device status,

/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery$ ./deviceQuery

[Image: deviceQuery output]

To uninstall CUDA if something goes wrong, go to /usr/local/cuda/bin and run the uninstall script.

The default installation path is /usr/local/cuda/

Method 2: installing from .deb

sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

Try compiling the sample program to check if CUDA is installed fine.

Add this to .bashrc

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Step 6: Install cuDNN.

Method 1: Download the .deb files from the cuDNN download page and install the runtime library, the development library and the code samples.

sudo dpkg -i libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.0.3.11-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.0.3.11-1+cuda9.0_amd64.deb

Method 2: Install from the downloaded tar file, cudnn-9.0-linux-x64-v7.tgz

If there is any error associated with running cuDNN, check that the libcudnn*.so* files are present in /usr/local/cuda/lib64 and that the cudnn.h file is present in /usr/local/cuda/include

If you are installing from a tar file, cuDNN can be installed by simply copying these files to the respective folders of the CUDA installation.

tar -xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
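The file check mentioned above (libcudnn*.so* under lib64, cudnn.h under include) can be scripted. Here is a minimal sketch, where `check_cudnn_files` is a hypothetical helper and the default root is the /usr/local/cuda path used in this post.

```python
import glob
import os

def check_cudnn_files(cuda_root='/usr/local/cuda'):
    """Return (libraries_found, header_found) for a CUDA installation root."""
    libraries = sorted(
        glob.glob(os.path.join(cuda_root, 'lib64', 'libcudnn*.so*')))
    header = os.path.isfile(os.path.join(cuda_root, 'include', 'cudnn.h'))
    return libraries, header

libraries, header = check_cudnn_files()
print('libcudnn libraries:', libraries or 'none found')
print('cudnn.h present:', header)
```

If the list is empty or the header is missing, the copy step did not land in the CUDA installation that the build is using.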

Go to cuDNN samples directory and compile the sample program.

cd /usr/src/cudnn_samples_v7/conv_sample

sudo make clean

sudo make

This compiles the code. Running the sample then shows the result, which verifies the cuDNN installation.

$ sudo ./conv_sample
Testing single precision
Testing conv
^^^^ CUDA : elapsed = 4.41074e-05 sec,
Testing half precision (math in single precision)
Testing conv
^^^^ CUDA : elapsed = 4.00543e-05 sec,

Cool !! CUDA 9.0 with cuDNN 7 is installed in your system.

Support and documentation.

CUDA developer zone

NVIDIA Linux Display driver archive

I got an error while compiling the mnist code sample. I am not sure what the issue is, so I am just pasting the error below.

/usr/src/cudnn_samples_v7/mnistCUDNN$ sudo make
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c
g++ -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
In file included from /usr/local/cuda/include/channel_descriptor.h:62:0,
                 from /usr/local/cuda/include/cuda_runtime.h:90,
                 from /usr/include/cudnn.h:64,
                 from mnistCUDNN.cpp:30:
/usr/local/cuda/include/cuda_runtime_api.h:1683:101: error: use of enum ‘cudaDeviceP2PAttr’ without previous declaration
  __cudart_builtin__ cudaError_t CUDARTAPI cudaDeviceGetP2PAttribute(int *value, enum cudaDeviceP
/usr/local/cuda/include/cuda_runtime_api.h:2930:102: error: use of enum ‘cudaFuncAttribute’ without previous declaration
 __cudart_builtin__ cudaError_t CUDARTAPI cudaFuncSetAttribute(const void *func, enum cudaFuncAtt
In file included from /usr/local/cuda/include/channel_descriptor.h:62:0,
                 from /usr/local/cuda/include/cuda_runtime.h:90,
                 from /usr/include/cudnn.h:64,
                 from mnistCUDNN.cpp:30:
/usr/local/cuda/include/cuda_runtime_api.h:5770:92: error: use of enum ‘cudaMemoryAdvise’ without previous declaration
  __host__ cudaError_t CUDARTAPI cudaMemAdvise(const void *devPtr, size_t count, enum cudaMemoryA
/usr/local/cuda/include/cuda_runtime_api.h:5827:98: error: use of enum ‘cudaMemRangeAttribute’ without previous declaration
 t__ cudaError_t CUDARTAPI cudaMemRangeGetAttribute(void *data, size_t dataSize, enum cudaMemRang
/usr/local/cuda/include/cuda_runtime_api.h:5864:102: error: use of enum ‘cudaMemRangeAttribute’ without previous declaration
 cudaError_t CUDARTAPI cudaMemRangeGetAttributes(void **data, size_t *dataSizes, enum cudaMemRang
Makefile:200: recipe for target 'mnistCUDNN.o' failed
make: *** [mnistCUDNN.o] Error 1