Getting started with tensorflow speech recognition API and object detection API

A detailed tutorial for TensorFlow speech recognition is here; I am going through the steps that are not mentioned there for the initial setup of the code, and the issues I faced.

Step 1: Download the TensorFlow source from Git

git clone https://github.com/tensorflow/tensorflow.git

This will download the TensorFlow source tree to the location where it is executed.

Step 2: Training. The training script is located in tensorflow/examples/speech_commands. Pass the switch --data_url= (with an empty value) to stop it from downloading the default speech dataset from TensorFlow; the path to the training data can be set in this file (the --data_dir flag). TensorBoard can be opened with the command 'tensorboard --logdir /tmp/logs'. Go to the URL that gets printed after executing the command.

python tensorflow/examples/speech_commands/train.py --data_url=

[Image: TensorBoard for speech recognition using TensorFlow]
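The dataset that train.py downloads by default is just one folder per word, each holding short one-second WAV clips, and the same layout is expected if you point --data_dir at your own recordings. Below is a minimal Python sketch (assuming the example's default /tmp/speech_dataset location) to check what a data directory contains:

# Sketch: list the words and clip counts in a Speech Commands style data
# directory (one sub-folder per word, each holding short WAV files).
# /tmp/speech_dataset is the example's default --data_dir; change as needed.
import os

data_dir = "/tmp/speech_dataset"
for word in sorted(os.listdir(data_dir)):
    word_dir = os.path.join(data_dir, word)
    if os.path.isdir(word_dir):
        wavs = [f for f in os.listdir(word_dir) if f.endswith(".wav")]
        print("%s: %d clips" % (word, len(wavs)))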

Step 3: Create a frozen graph after the training ends. Training took about 1.5 hours with a GTX 1050 Ti GPU.

python tensorflow/examples/speech_commands/freeze.py \
--start_checkpoint=/tmp/speech_commands_train/conv.ckpt-18000 \
--output_file=/tmp/my_frozen_graph.pb
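To check that the graph froze correctly, a small Python sketch like this (assuming TensorFlow 1.x and the output path used above) can load the .pb file and print a few of its operation names:

# Minimal sketch (TF 1.x): load the frozen graph and list a few ops to
# confirm the file is a valid GraphDef. The path matches the freeze.py
# command above.
import tensorflow as tf

graph_path = "/tmp/my_frozen_graph.pb"

graph_def = tf.GraphDef()
with tf.gfile.GFile(graph_path, "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

for op in graph.get_operations()[:10]:
    print(op.name)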

Step 4: Inference


python tensorflow/examples/speech_commands/label_wav.py \
--graph=/tmp/my_frozen_graph.pb \
--labels=/tmp/speech_commands_train/conv_labels.txt \
--wav=/tmp/speech_dataset/left/a5d485dc_nohash_0.wav

[Image: TensorFlow speech recognition inference output]
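label_wav.py is only a thin wrapper around the frozen graph. A rough Python equivalent for a single clip (a sketch assuming TensorFlow 1.x and the default tensor names wav_data:0 and labels_softmax:0 used by the speech_commands example) looks like this:

# Rough equivalent of label_wav.py for one clip (TF 1.x).
# The tensor names below are the speech_commands defaults; adjust them if
# your graph differs. Paths match the commands above.
import tensorflow as tf

graph_path = "/tmp/my_frozen_graph.pb"
labels_path = "/tmp/speech_commands_train/conv_labels.txt"
wav_path = "/tmp/speech_dataset/left/a5d485dc_nohash_0.wav"

# Load the frozen graph into the default graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile(graph_path, "rb") as f:
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")

# Read the label list and the raw WAV bytes.
labels = [line.strip() for line in tf.gfile.GFile(labels_path)]
with open(wav_path, "rb") as f:
    wav_data = f.read()

with tf.Session() as sess:
    softmax = sess.graph.get_tensor_by_name("labels_softmax:0")
    predictions, = sess.run(softmax, {"wav_data:0": wav_data})
    # Print the top three predictions with their scores.
    for i in predictions.argsort()[-3:][::-1]:
        print("%s (score = %.5f)" % (labels[i], predictions[i]))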

The short voice samples are converted to spectrogram images before processing, so a CNN can be trained on the images. To create a spectrogram using the provided tool, go to the tensorflow folder which contains the 'configure' script and run,

./configure

This configures the TensorFlow build (the actual compilation happens when the Bazel command below runs for the first time). Once it is done, use the following command to create a spectrogram image from a WAV file. Make sure to give absolute paths; most of the errors I encountered were because of mismatched paths.

bazel run tensorflow/examples/wav_to_spectrogram:wav_to_spectrogram -- \
--input_wav=/tensorflow/core/kernels/spectrogram_test_data/short_test_segment.wav \
--output_image=/tensorflow/tmp/spectrogram.png

If Bazel is not installed, this throws an error saying bazel was not found. Bazel is a build tool like Ant or Maven, and it is used to build TensorFlow.

I had to install Bazel from this link for the above command to work. There are multiple ways to install Bazel; I installed it using the custom APT repository.

sudo apt-get install openjdk-8-jdk

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8"
 | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

sudo apt-get update && sudo apt-get install bazel

sudo apt-get upgrade bazel

This is the spectrogram output.

[Image: spectrogram generated from the test WAV file]
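If you only want to look at a spectrogram and would rather skip the Bazel build, a quick alternative is to plot one with scipy and matplotlib. This is just a sketch and not the TensorFlow tool, so the picture will be similar but not identical:

# Alternative sketch: plot a spectrogram with scipy + matplotlib instead of
# the Bazel-built wav_to_spectrogram tool. The input is one of the dataset
# clips; the output filename is arbitrary.
import matplotlib.pyplot as plt
from scipy.io import wavfile

sample_rate, samples = wavfile.read("/tmp/speech_dataset/left/a5d485dc_nohash_0.wav")
if samples.ndim > 1:
    samples = samples[:, 0]  # keep one channel if the file is stereo

plt.specgram(samples, NFFT=256, Fs=sample_rate, noverlap=128)
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.savefig("/tmp/spectrogram_matplotlib.png")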


 

The GitHub page for the TensorFlow Object Detection API is here.

To use the TensorFlow Object Detection API:

Step 1: Clone the TensorFlow models repository to your PC.

git clone https://github.com/tensorflow/models.git

Step 2: Go to the research folder, install the dependencies and the protobuf compiler, compile the protos, and export PYTHONPATH,

or follow the detailed steps here

cd models/research
sudo apt-get install protobuf-compiler python-pil python-lxml
sudo pip install jupyter
sudo pip install matplotlib
sudo pip install pillow
sudo pip install lxml
# From models/research/
protoc object_detection/protos/*.proto --python_out=.
# From models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
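Once the protos are compiled and PYTHONPATH is set, a quick sanity check (just a sketch) is to confirm that the object_detection package and a generated proto module import cleanly from models/research:

# Sanity check: these imports only work if protoc ran successfully and
# models/research plus models/research/slim are on PYTHONPATH.
from object_detection.utils import label_map_util   # part of the API
from object_detection.protos import pipeline_pb2     # generated by protoc
print("object_detection imports OK")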

Step 3: Open the default Jupyter notebook that comes with the Object Detection API.

cd models/research/object_detection
jupyter notebook object_detection_tutorial.ipynb
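The notebook downloads a model from the detection model zoo and runs it on a couple of test images. If you prefer a plain script, here is a condensed sketch of the same idea, assuming TensorFlow 1.x and an already downloaded frozen model; the model path below is a placeholder you would substitute:

# Condensed sketch of the tutorial notebook: run a frozen detection model
# on one image (TF 1.x). MODEL_PATH is a placeholder for a model from the
# detection model zoo; the test image ships with the repository.
import numpy as np
import tensorflow as tf
from PIL import Image

MODEL_PATH = "ssd_mobilenet_v1_coco/frozen_inference_graph.pb"  # placeholder
IMAGE_PATH = "object_detection/test_images/image1.jpg"

# Load the frozen detection graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile(MODEL_PATH, "rb") as f:
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")

# The model expects a uint8 batch of shape [1, height, width, 3].
image = np.expand_dims(np.array(Image.open(IMAGE_PATH)), axis=0)

with tf.Session() as sess:
    boxes, scores, classes, num = sess.run(
        ["detection_boxes:0", "detection_scores:0",
         "detection_classes:0", "num_detections:0"],
        feed_dict={"image_tensor:0": image})

# Print detections with a score above 0.5 (class IDs follow the COCO label map).
for box, score, cls in zip(boxes[0], scores[0], classes[0]):
    if score > 0.5:
        print("class %d, score %.2f, box %s" % (int(cls), score, box))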

 

2 thoughts on "Getting started with tensorflow speech recognition API and object detection API"

  1. Hi,
    I was wondering how long this step took, and how much memory is needed for this step to be accomplished:
    bazel run tensorflow/examples/wav_to_spectrogram:wav_to_spectrogram -- \
    --input_wav=/tensorflow/core/kernels/spectrogram_test_data/short_test_segment.wav \
    --output_image=/tensorflow/tmp/spectrogram.png

    I keep getting INFO messages till it runs out of memory, and I am not sure why.

    Thanks

    1. I have run this on a desktop with 8 GB memory. Since it is a Bazel command, it should be using system memory and not GPU memory. What is your system configuration? When you run the command for the first time, it starts building the TensorFlow source code to make the binary. This is why it takes time.

      Any configuration related to TensorFlow gets outdated in no time. I installed Bazel ~0.5.4 two months back, and now when I updated, it is at 0.11.1; all the dependencies broke with TensorFlow.

      So, I downloaded the recent TensorFlow source from git and configured it without CUDA. Here are the settings I used:

      Start time: 9.00PM
      ----------------------------------------------------------------------
      git clone https://github.com/tensorflow/tensorflow.git

      kiran@kiran-Z370-HD3P:~/ml/tf_19_03_2018/tensorflow$ sudo ./configure
      You have bazel 0.11.1 installed.
      Please specify the location of python. [Default is /usr/bin/python]:

      Found possible Python library paths:
      /usr/local/lib/python2.7/dist-packages
      /usr/lib/python2.7/dist-packages
      Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]

      Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
      No jemalloc as malloc support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
      No Google Cloud Platform support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
      No Hadoop File System support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
      No Amazon S3 File System support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]:
      No Apache Kafka Platform support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with XLA JIT support? [y/N]:
      No XLA JIT support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with GDR support? [y/N]:
      No GDR support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with VERBS support? [y/N]:
      No VERBS support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
      No OpenCL SYCL support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with CUDA support? [y/N]: N
      No CUDA support will be enabled for TensorFlow.

      Do you wish to build TensorFlow with MPI support? [y/N]:
      No MPI support will be enabled for TensorFlow.

      Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

      Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
      Not configuring the WORKSPACE for Android builds.

      Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
      --config=mkl # Build with MKL support.
      --config=monolithic # Config for mostly static monolithic build.
      Configuration finished

      sudo bazel run tensorflow/examples/wav_to_spectrogram:wav_to_spectrogram -- --input_wav=/home/kiran/ml/tensorflow/tensorflow/core/kernels/spectrogram_test_data/short_test_segment.wav --output_image=/home/kiran/ml/tensorflow/tmp/spectrogram.png

      End time: 9:21PM
      Build happened on 4 x CPU with 100% utilization
      [Image: CPU resource usage during the build]

      Once it is built, it hardly takes any time to create the spectrogram image the second time.

      ----------------------------------------------------------------------
