Speech detection with Tensorflow 1.4 on Raspberry Pi 3 – Part 1: Getting audio file using PyAudio

In previous posts about speech detection using TensorFlow, I showed how to run inference on a 1 sec audio sample with label_wav.py, using the graph trained in TensorFlow. This series of posts looks at inferencing on a continuous stream of audio. There is an excellent post by Allan on medium.com that shows how to do the same, but I was not happy with the results, and the code was quite a lot to understand. That post uses TensorFlow audio functions to process the audio. I will use PyAudio instead because it is easy to understand; later, I may move to TensorFlow audio processing. The code posted here runs on a Raspberry Pi 3, but it should run on other Linux systems without modification.

To capture audio, you need a USB sound card like the one shown in the figure below; these are available on eBay, AliExpress or Amazon. Connect a 3.5mm mic to it or, like I did, scavenge a mic from old electronics, solder it to a 3.5mm audio jack and plug it in.

Figure: USB sound card with microphone, Pi NoIR camera and earphones for audio on Raspberry Pi 3
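Before recording, it can help to check that the USB sound card is visible to PyAudio and to find its device index. A minimal sketch using PyAudio's get_device_count() and get_device_info_by_index(); the helper name list_input_devices is hypothetical, and it accepts any object with those two methods so it can be tried without hardware.

```python
def list_input_devices(pa):
    """Return (index, name) for every device that can capture audio.

    `pa` is expected to be a pyaudio.PyAudio() instance, or any object
    exposing get_device_count() and get_device_info_by_index().
    """
    devices = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        # Output-only devices (e.g. the Pi's headphone jack) report 0 input channels
        if info.get("maxInputChannels", 0) > 0:
            devices.append((index, info.get("name")))
    return devices


# Usage on the Pi (hypothetical session):
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   print(list_input_devices(pa))
#   pa.terminate()
```

If the USB card is not the default input, its index can be passed to audio.open() through the input_device_index argument.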

The following Python code records 1 sec of audio and saves it as a .wav file. For TensorFlow speech recognition we use a sampling rate of 16 kHz (RATE), a single channel (CHANNELS) and a 1 sec duration (RECORD_SECONDS).

import pyaudio
import wave

FORMAT = pyaudio.paInt16   # 16-bit signed samples
CHANNELS = 1               # mono
RATE = 16000               # 16 kHz sampling rate
CHUNK = 512                # frames read per buffer
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "file.wav"

audio = pyaudio.PyAudio()

# start recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print("recording...")
frames = []

# read RATE / CHUNK * RECORD_SECONDS chunks (rounded down)
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    # exception_on_overflow=False (PyAudio >= 0.2.11) avoids
    # "Input overflowed" errors on slow devices such as the Pi
    data = stream.read(CHUNK, exception_on_overflow=False)
    frames.append(data)
print("finished recording")

# stop recording
stream.stop_stream()
stream.close()
audio.terminate()

# write the raw 16-bit mono frames to a .wav file
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(pyaudio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
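Note that this loop reads int(16000 / 512 * 1) = 31 chunks, i.e. 31 × 512 = 15,872 samples, slightly less than one second. If you want exactly RATE * RECORD_SECONDS samples, and a function that can be called repeatedly from a loop, the recording can be wrapped as below. A minimal sketch; the names chunk_sizes, save_wav and record_wav are hypothetical, and pyaudio is imported inside record_wav so the other helpers work without it.

```python
import wave


def chunk_sizes(total_frames, chunk):
    """Split total_frames into reads of at most `chunk` frames each."""
    sizes = []
    remaining = total_frames
    while remaining > 0:
        n = min(chunk, remaining)
        sizes.append(n)
        remaining -= n
    return sizes


def save_wav(path, frames, rate=16000, channels=1, sample_width=2):
    """Write a list of raw PCM byte strings to a .wav file."""
    wf = wave.open(path, "wb")
    try:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    finally:
        wf.close()


def record_wav(path, seconds=1, rate=16000, chunk=512):
    """Record exactly rate * seconds mono 16-bit samples into `path`."""
    import pyaudio  # imported here so the helpers above work without it

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=chunk)
    try:
        # 16000 samples at chunk=512 -> 31 full reads plus one of 128 frames
        frames = [stream.read(n, exception_on_overflow=False)
                  for n in chunk_sizes(int(rate * seconds), chunk)]
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
    save_wav(path, frames, rate=rate, channels=1,
             sample_width=pyaudio.get_sample_size(pyaudio.paInt16))
```

Calling record_wav("file.wav") then produces a clip of exactly 16,000 samples.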

When pyaudio.PyAudio() is called, ALSA may print errors like the ones shown below.

Figure: ALSA errors on Raspberry Pi 3

The errors can be removed by commenting out the corresponding devices in /usr/share/alsa/alsa.conf.

Figure: alsa.conf and ALSA errors on Raspberry Pi 3
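The editing can also be scripted. A minimal sketch, assuming the offending entries are one-line pcm aliases of the form "pcm.rear cards.pcm.rear"; which names to comment out depends on the errors ALSA prints on your system, so the device names in the test are only examples, and the helper name comment_out_pcm is hypothetical.

```python
import re


def comment_out_pcm(conf_text, device_names):
    """Prefix '#' to lines that define any of the given pcm aliases.

    A line such as 'pcm.rear cards.pcm.rear' is matched for the device
    name 'rear'; lines that are already commented out are left alone.
    """
    names = set(device_names)
    out = []
    for line in conf_text.splitlines(True):  # True keeps the line endings
        match = re.match(r"\s*pcm\.([A-Za-z0-9_]+)\s", line)
        if match and match.group(1) in names:
            line = "#" + line
        out.append(line)
    return "".join(out)
```

Back up the original first (for example sudo cp /usr/share/alsa/alsa.conf /usr/share/alsa/alsa.conf.bak), then write the returned text back to /usr/share/alsa/alsa.conf with root privileges.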

The next step is to integrate this into label_wav.py from the TensorFlow repository: tensorflow/examples/speech_commands/label_wav.py. In the updated file, mod_label_wav.py, I have added a for loop around run_graph() that records 1 sec of wav audio before each inference. A new sample is recorded on every pass through the loop and written to the same file name, overwriting the previous one.

Figure: audio inferencing with TensorFlow on Raspberry Pi
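The record-then-infer structure of that loop can be sketched as follows. This is not the author's mod_label_wav.py: record_fn and classify_fn are hypothetical stand-ins for the recording code above and for a call to run_graph() (which, in label_wav.py, is given the raw bytes read from the WAV file). They are passed in as functions so the loop can be exercised without a microphone or TensorFlow.

```python
def record_and_classify(record_fn, classify_fn, wav_path, iterations):
    """Record a fresh clip to wav_path, then classify it, `iterations` times.

    record_fn(path) must overwrite `path` with a new recording;
    classify_fn(wav_bytes) returns whatever result the model produces.
    """
    results = []
    for _ in range(iterations):
        record_fn(wav_path)  # the same file name is reused every iteration
        with open(wav_path, "rb") as wav_file:
            wav_data = wav_file.read()
        results.append(classify_fn(wav_data))
    return results
```

Because each clip is recorded before it is classified, a command has to be spoken inside the recording window of an iteration to be detected.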

The input file is file.wav in the same directory (…../speech_commands); it is overwritten each time a recording finishes. To start with, create a dummy file.wav so the script can run. The commands and the resulting output are shown below.

touch file.wav
wget -O mod_label_wav.py "https://github.com/kiranjose/python-tensorflow-speech-recognition/blob/master/mod_label_wav.py?raw=true"
python3 mod_label_wav.py --graph=./my_frozen_graph.pb --labels=./conv_labels.txt --wav=./file.wav

Figure: audio inferencing output on Raspberry Pi with TensorFlow
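touch creates an empty placeholder, which the first recording then replaces. If you would rather start from a valid WAV file in the same format the recorder produces (16 kHz, mono, 16-bit), a one-second silent clip can be written with Python's standard wave module. A minimal sketch; the function name write_silence is hypothetical.

```python
import wave


def write_silence(path, seconds=1, rate=16000):
    """Write `seconds` of 16-bit mono silence to `path`."""
    n_frames = int(rate * seconds)
    wf = wave.open(path, "wb")
    try:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples, matching pyaudio.paInt16
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * n_frames)  # zero-valued samples
    finally:
        wf.close()
```

Calling write_silence("file.wav") in the speech_commands directory can stand in for touch file.wav.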

Files:
1. pyAudio_wav.py
2. mod_label_wav.py

This is by no means a great method for speech inferencing: we have to wait for the script to record, and then wait again before giving the next command. But it is a start. In the next post I will explain how to detect an audio threshold to trigger the recording/inferencing. For this I have forked a Google Assistant speech-invoking script written by jeysonmc, which will be the starting point.
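The threshold idea can be illustrated with a simple energy check on each PyAudio chunk. A minimal sketch, not the approach of the forked script: it computes the RMS level of a chunk of 16-bit samples (native byte order, as PyAudio delivers with paInt16) using the standard array module and compares it with a threshold. The function names and the threshold value 500 are hypothetical and would need tuning for your microphone.

```python
import array
import math


def chunk_rms(data):
    """Root-mean-square level of a chunk of 16-bit PCM bytes."""
    samples = array.array("h")  # signed 16-bit, native byte order like paInt16
    samples.frombytes(data)
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_loud(data, threshold=500):
    """True when the chunk's RMS level exceeds `threshold` (hypothetical value)."""
    return chunk_rms(data) > threshold
```

In a recording loop, each stream.read(CHUNK) result would be checked with is_loud(), and recording/inferencing would start only once a chunk crosses the threshold.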

 
