A detailed tutorial for TensorFlow speech recognition is here. This post goes through the initial setup steps that it does not mention and the issues I faced.
Step 1: Download tensorflow source from git
git clone https://github.com/tensorflow/tensorflow.git
This downloads the TensorFlow source tree into the directory where the command is executed.
Step 2: Training. The training script is located in tensorflow/examples/speech_commands. Pass the switch --data_url= (with an empty value) to stop it from downloading the default speech data from TensorFlow. The path to the training data can be set in this file. TensorBoard can be opened with the command ‘tensorboard --logdir /tmp/logs’. Go to the URL that is printed after the command runs.
python tensorflow/examples/speech_commands/train.py --data_url=
Step 3: Create a frozen graph after the training ends. Training took about 1.5 hours on a GTX 1050 Ti GPU.
python tensorflow/examples/speech_commands/freeze.py \
--start_checkpoint=/tmp/speech_commands_train/conv.ckpt-18000 \
--output_file=/tmp/my_frozen_graph.pb
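The step number in the checkpoint name (18000 above) depends on how long training ran. The following is a minimal sketch, not part of the TensorFlow tools, of how the newest checkpoint prefix could be found automatically. The helper name latest_checkpoint_prefix is hypothetical, and it assumes the training directory contains files named like conv.ckpt-18000.index.

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(train_dir, model_name="conv"):
    """Return the checkpoint prefix with the highest step, or None if none exist.

    Assumption: checkpoints are saved as <model_name>.ckpt-<step>.index.
    """
    pattern = re.compile(r"^" + re.escape(model_name) + r"\.ckpt-(\d+)\.index$")
    best_step, best_prefix = -1, None
    for path in Path(train_dir).glob(model_name + ".ckpt-*.index"):
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Drop the ".index" suffix to get the prefix passed to --start_checkpoint.
            best_prefix = str(path.with_suffix(""))
    return best_prefix
```

Calling latest_checkpoint_prefix("/tmp/speech_commands_train") would then give the value to pass to --start_checkpoint.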
Step 4: Inference
python tensorflow/examples/speech_commands/label_wav.py \
--graph=/tmp/my_frozen_graph.pb \
--labels=/tmp/speech_commands_train/conv_labels.txt \
--wav=/tmp/speech_dataset/left/a5d485dc_nohash_0.wav
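To check more than one recording, the same command can be repeated for every file in a folder. This is a rough sketch that only wraps the label_wav.py call shown above. The function names are my own, the default paths are the ones used in this post, and it assumes it is run from the root of the tensorflow source tree.

```python
import subprocess
import sys
from pathlib import Path

LABEL_WAV = "tensorflow/examples/speech_commands/label_wav.py"


def label_wav_command(wav_path,
                      graph="/tmp/my_frozen_graph.pb",
                      labels="/tmp/speech_commands_train/conv_labels.txt"):
    """Build the label_wav.py argument list for one WAV file."""
    return [sys.executable, LABEL_WAV,
            "--graph=" + graph,
            "--labels=" + labels,
            "--wav=" + str(wav_path)]


def label_directory(wav_dir):
    """Run label_wav.py on every .wav file in wav_dir, one file at a time."""
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        print(wav_path.name)
        # check=True stops at the first file that label_wav.py fails on.
        subprocess.run(label_wav_command(wav_path), check=True)
```

For example, label_directory("/tmp/speech_dataset/left") would print the predictions for each sample of the word "left".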
The short voice samples are converted to spectrogram images before processing, so a CNN can be trained on them like any other image. To create a spectrogram using the provided tool, go to the tensorflow folder which contains the ‘configure’ script and run,
./configure
This configures the TensorFlow build. Once this is done, use the command below to create a spectrogram image for a wav file. Make sure to give absolute paths; most of the errors I encountered were because of mismatched paths.
bazel run tensorflow/examples/wav_to_spectrogram:wav_to_spectrogram -- --input_wav=/tensorflow/core/kernels/spectrogram_test_data/short_test_segment.wav --output_image=/tensorflow/tmp/spectrogram.png
If Bazel is not installed, the bazel run command fails with an error saying bazel was not found. Bazel is a build tool like Ant or Maven, and it is used to build TensorFlow.
I had to install Bazel from this link for the above command to work. There are multiple methods to install Bazel; I installed it using the custom apt repo.
sudo apt-get install openjdk-8-jdk
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel
This is the spectrogram output.
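To give an idea of what a spectrogram contains, here is a small NumPy sketch of the general technique: slice the audio into overlapping windows and take the magnitude of the FFT of each one. It is not the code used by the wav_to_spectrogram tool, the window_size and stride values are arbitrary choices of mine, and a synthetic tone stands in for a real wav file.

```python
import numpy as np


def spectrogram(samples, window_size=256, stride=128):
    """Return a (frames, window_size // 2 + 1) array of magnitude spectra."""
    window = np.hanning(window_size)  # taper each frame to reduce spectral leakage
    frames = []
    for start in range(0, len(samples) - window_size + 1, stride):
        frame = samples[start:start + window_size] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)


# One second of a 440 Hz tone sampled at 16 kHz (synthetic stand-in for a voice clip).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)
# Each row is one time step, each column one frequency bin.
print(spec.shape)
```

Plotting such an array as an image, with time on one axis and frequency on the other, gives the kind of picture a CNN can be trained on.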
The GitHub page for the TensorFlow Object Detection API is here.
To use the TensorFlow Object Detection API,
Step 1: Clone the tensorflow model tree to your PC.
git clone https://github.com/tensorflow/models.git
Step 2: Go to the research folder, install the dependencies, compile the protobuf files, and export PYTHONPATH, or follow the detailed steps here.
cd models/research
sudo apt-get install protobuf-compiler python-pil python-lxml
sudo pip install jupyter
sudo pip install matplotlib
sudo pip install pillow
sudo pip install lxml
# From models/research/
protoc object_detection/protos/*.proto --python_out=.
# From models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
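The export only lasts for the current terminal session, so a notebook started from another shell will not see these folders. As a sketch of one way around this, the same two folders can be added from inside Python. The helper name add_research_paths is hypothetical and the path passed to it is an assumption about where the models repo was cloned.

```python
import os
import sys


def add_research_paths(research_dir):
    """Append models/research and models/research/slim to sys.path if missing."""
    research_dir = os.path.abspath(research_dir)
    added = []
    for path in (research_dir, os.path.join(research_dir, "slim")):
        if path not in sys.path:
            sys.path.append(path)  # mirrors appending to PYTHONPATH
            added.append(path)
    return added
```

Running add_research_paths("models/research") in the first notebook cell should have the same effect on imports as the export line above.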
Step 3: Open the default IPython notebook that comes with the Object Detection API
cd models/research/object_detection
jupyter notebook object_detection_tutorial.ipynb