Getting started with tensorflow speech recognition API and object detection API

Detailed tutorial for Tensorflow speech recognition is here, I am going through the steps not mentioned for initial setup of the code and the issues faced.

Step 1: Download tensorflow source from git

git clone https://github.com/tensorflow/tensorflow.git

this will download tensorflow source tree to the location there it is executed.

Step 2:  Training, the training script is located in tensorflow/examples/speech_commands pass the switch –data_url= to stop downloading default speech data from tensorflow. The path for training data can be set in this file. Tensorboard can be opened by this command ‘tensorboard –logdir /tmp/logs’. Go to the url which will get printed after executing the command.

python tensorflow/examples/speech_commands/train.py --data_url=

tensorboard for speech recognition using tensorflow

Step 3: Create a frozen graph after the training ends. It took 1.5hrs for training with a GTX 1050Ti GPU.

python tensorflow/examples/speech_commands/freeze.py \
--start_checkpoint=/tmp/speech_commands_train/conv.ckpt-18000 \
--output_file=/tmp/my_frozen_graph.pb

Step 4: Inference


python tensorflow/examples/speech_commands/label_wav.py \
--graph=/tmp/my_frozen_graph.pb \
--labels=/tmp/speech_commands_train/conv_labels.txt \
--wav=/tmp/speech_dataset/left/a5d485dc_nohash_0.wav

tensorflow speech recognition

The short voice samples are converted to spectrogram image before processing. A CNN can be used for training on image. To create a spectrogram using the provided tool, go to tensorflow folder which contain ‘configure’ script and run,

./configure

this will start building the tensorflow source code. Once this is done use this command to create spectrogram image for a wav file. Make sure to give absolute paths, more of the time I have encountered error because of mismatched paths.

bazel run tensorflow/examples/wav_to_spectrogram:wav_to_spectrogram -- --input_wav=
/tensorflow/core/kernels/spectrogram_test_data/short_test_segment.wav
 --output_image=/tensorflow/tmp/spectrogram.png

this throws an error saying bazel not found. Bazel is a build tool like ant or maven, this is used to build tensorflow.

I had to install bazel from this link for the above command to work. there are multiple methods to install bazel. I tried installing bazel using custom apt repo.

sudo apt-get install openjdk-8-jdk

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8"
 | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

sudo apt-get update && sudo apt-get install bazel

sudo apt-get upgrade bazel

This is the spectrogram output.

spectrogram speech recognition


 

The github page for Tensorflow Object detection API is here.

To use Tensorflow Object detection API,

Step 1: Clone the tensorflow model tree to your PC.

git clone https://github.com/tensorflow/models.git

Step 2: go to research folder, install dependencies, protobuf, export PYTHONPATH.

or follow the detailed steps here

cd models/research
sudo apt-get install protobuf-compiler python-pil python-lxml
sudo pip install jupyter
sudo pip install matplotlib
sudo pip install pillow
sudo pip install lxml
sudo pip install jupyter
sudo pip install matplotlib
# From models/research/
protoc object_detection/protos/*.proto --python_out=.
# From models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Step 3: Open default ipython notebook comes with  Object detection API

cd models/research/object_detection
jupyter notebook object_detection_tutorial.ipynb

 

Simple program to capture video using OpenCV and python for video inferencing

Note: OpenCV needs to be compiled and installed in Ubuntu to get all its full functionality. ‘pip install open-cv‘ will install python bindings but it is very limited by its use. Most of the time there are multiple issues while running. There are lot of dependencies for OpenCV build, I have installed OpenCV by following this tutorial for python 2.7 and python 3.5.  It took around two hours when compiled using 4 cores.

This program contains an infinite while loop, inside the loop each frame is captured from the video device specified and shows it frame by frame.


import cv2
#the video device number is 0
cap = cv2.VideoCapture(0)
 
while(True):
    #reading frame
    ret, frame = cap.read()
    #frame is displayed 
    cv2.imshow('window1',frame)
    #press q to break
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

To run the program,


#python2.7
python opencv_test.py
#python3.5
python3 opencv_test.py

The earlier post is embedding this OpenCV program for video inferencing.