Getting started with the TensorFlow speech recognition API and Object Detection API

A detailed tutorial for TensorFlow speech recognition is here. I am going through the initial setup steps that it does not mention, along with the issues I faced.

Step 1: Download the TensorFlow source from GitHub

git clone https://github.com/tensorflow/tensorflow.git

This downloads the TensorFlow source tree into the directory where the command is executed.

Step 2: Training. The training script is located in tensorflow/examples/speech_commands. Pass the switch --data_url= (with an empty value) to stop the script from downloading the default speech data from TensorFlow. The path to the training data can be set in this file. TensorBoard can be opened with the command 'tensorboard --logdir /tmp/logs'; go to the URL that is printed after executing the command.

python tensorflow/examples/speech_commands/train.py --data_url=
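As a rough sketch of how this step could be scripted, the helper below builds the same command line from Python and optionally points it at a local dataset. The name build_train_command is made up for illustration, and --data_dir is assumed to be the flag train.py reads the dataset location from, so check the flag list in your checkout before relying on it.

```python
import subprocess
import sys

TRAIN_SCRIPT = "tensorflow/examples/speech_commands/train.py"


def build_train_command(data_dir=None, python=sys.executable):
    """Return the argument list for the training script.

    The empty --data_url= stops the script from downloading the default data.
    """
    cmd = [python, TRAIN_SCRIPT, "--data_url="]
    if data_dir is not None:
        # Assumption: --data_dir is the flag train.py uses for the dataset path.
        cmd.append("--data_dir=" + data_dir)
    return cmd


if __name__ == "__main__":
    cmd = build_train_command(data_dir="/tmp/speech_dataset")
    print(" ".join(cmd))
    # Uncomment to start training from the root of the TensorFlow checkout:
    # subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems if the dataset path contains spaces.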

[Image: TensorBoard for speech recognition using TensorFlow]

Step 3: Create a frozen graph after the training ends. Training took 1.5 hours on a GTX 1050 Ti GPU.

python tensorflow/examples/speech_commands/freeze.py \
--start_checkpoint=/tmp/speech_commands_train/conv.ckpt-18000 \
--output_file=/tmp/my_frozen_graph.pb
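The step number in conv.ckpt-18000 depends on how long training ran, so it can be handy to look up the newest checkpoint instead of typing it by hand. The sketch below is one way to do that. It assumes the training directory holds files named like conv.ckpt-<step>.index (inferred from the path in the command above), and latest_checkpoint_prefix is a hypothetical helper, not part of TensorFlow.

```python
import os
import re
import tempfile


def latest_checkpoint_prefix(train_dir, model="conv"):
    """Return the checkpoint prefix with the highest step, or None if none exist."""
    # Matches e.g. "conv.ckpt-18000.index" and captures "conv.ckpt-18000" and "18000".
    pattern = re.compile(r"^(%s\.ckpt-(\d+))\." % re.escape(model))
    best_step, best_prefix = -1, None
    for name in os.listdir(train_dir):
        match = pattern.match(name)
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            best_prefix = os.path.join(train_dir, match.group(1))
    return best_prefix


if __name__ == "__main__":
    # Demo on a throwaway directory with fake checkpoint files.
    demo_dir = tempfile.mkdtemp()
    for name in ["conv.ckpt-400.index", "conv.ckpt-18000.index", "conv_labels.txt"]:
        open(os.path.join(demo_dir, name), "w").close()
    print(latest_checkpoint_prefix(demo_dir))
```

The returned prefix is what would be passed to --start_checkpoint.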

Step 4: Inference


python tensorflow/examples/speech_commands/label_wav.py \
--graph=/tmp/my_frozen_graph.pb \
--labels=/tmp/speech_commands_train/conv_labels.txt \
--wav=/tmp/speech_dataset/left/a5d485dc_nohash_0.wav

[Image: TensorFlow speech recognition]

The short voice samples are converted to spectrogram images before processing, so a CNN can be trained on them as images. To create a spectrogram using the provided tool, go to the tensorflow folder that contains the 'configure' script and run:

./configure

This configures the TensorFlow build; the actual build happens when bazel runs. Once this is done, use the following command to create a spectrogram image for a wav file. Make sure to give absolute paths; most of the errors I encountered were caused by mismatched paths.

bazel run tensorflow/examples/wav_to_spectrogram:wav_to_spectrogram -- \
--input_wav=/tensorflow/core/kernels/spectrogram_test_data/short_test_segment.wav \
--output_image=/tensorflow/tmp/spectrogram.png

At first this threw an error saying bazel was not found. Bazel is a build tool like Ant or Maven, and it is used to build TensorFlow.

I had to install Bazel from this link for the above command to work. There are multiple methods to install Bazel; I installed it using the custom apt repository.

sudo apt-get install openjdk-8-jdk

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

sudo apt-get update && sudo apt-get install bazel

sudo apt-get upgrade bazel
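Since the two problems I ran into were a missing bazel and mismatched paths, a small pre-flight check can catch both. This is only a sketch: bazel_available and build_spectrogram_command are names I made up, and the relative file names in the demo are placeholders.

```python
import os
import shutil

TARGET = "tensorflow/examples/wav_to_spectrogram:wav_to_spectrogram"


def bazel_available():
    """Return True if a bazel executable is found on PATH."""
    return shutil.which("bazel") is not None


def build_spectrogram_command(input_wav, output_image):
    """Return the bazel argument list, with both paths made absolute."""
    return [
        "bazel", "run", TARGET, "--",
        "--input_wav=" + os.path.abspath(input_wav),
        "--output_image=" + os.path.abspath(output_image),
    ]


if __name__ == "__main__":
    if not bazel_available():
        print("bazel not found on PATH; install it before running the tool")
    # Placeholder file names; relative paths are resolved against the current directory.
    print(" ".join(build_spectrogram_command("short_test_segment.wav", "spectrogram.png")))
```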

This is the spectrogram output.

[Image: spectrogram speech recognition]
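To give an idea of what a spectrogram contains, here is a minimal pure-Python sketch of a short-time Fourier transform on a synthetic tone. It is not the code used by the wav_to_spectrogram tool; the 16 kHz sample rate, the frame and hop sizes, and the Hann window are all assumptions chosen to keep the demo small.

```python
import cmath
import math


def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len/2."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k; fine for a demo, far too slow for real audio.
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate, in Hz
    tone_hz = 2000       # synthetic test tone
    samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(1600)]
    spec = spectrogram(samples)
    peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print(len(spec), "frames,", len(spec[0]), "bins, peak near",
          peak_bin * sample_rate / 64, "Hz")
```

Each frame becomes one column of the image and each magnitude one pixel, which is why an image model such as a CNN can be trained on the result.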


 

The GitHub page for the TensorFlow Object Detection API is here.

To use the TensorFlow Object Detection API:

Step 1: Clone the TensorFlow models repository to your PC.

git clone https://github.com/tensorflow/models.git

Step 2: Go to the research folder, install the dependencies, compile the protobuf files, and export PYTHONPATH, or follow the detailed steps here.

cd models/research
sudo apt-get install protobuf-compiler python-pil python-lxml
sudo pip install jupyter
sudo pip install matplotlib
sudo pip install pillow
sudo pip install lxml
# From models/research/
protoc object_detection/protos/*.proto --python_out=.
# From models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
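The export only lasts for the current shell session. If you would rather set the paths from inside a script or notebook, something like the following sketch does the equivalent by extending sys.path. The helper names are made up, and "models/research" is assumed to be the location of your clone relative to the current directory.

```python
import os
import sys


def research_paths(research_dir):
    """Return the two directories the export line adds to PYTHONPATH."""
    research_dir = os.path.abspath(research_dir)
    return [research_dir, os.path.join(research_dir, "slim")]


def add_to_sys_path(research_dir):
    """Append the directories to sys.path if they are not already there."""
    for path in research_paths(research_dir):
        if path not in sys.path:
            sys.path.append(path)


if __name__ == "__main__":
    add_to_sys_path("models/research")  # assumed location of the clone
    try:
        import object_detection  # only succeeds inside a prepared checkout
        print("object_detection is importable")
    except ImportError:
        print("object_detection is not importable; check the paths above")
```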

Step 3: Open the default IPython notebook that comes with the Object Detection API.

# From models/research/
cd object_detection
jupyter notebook object_detection_tutorial.ipynb