Removing noise from a video file in Linux using Audacity and kdenlive

Note: This is on Ubuntu 16.04.

Removing the static noise picked up by the microphone is essential before uploading YouTube videos, since YouTube does not allow editing the audio track after the upload. It stopped allowing annotations in videos too, so the only way to remove noise after uploading is to take down the video and upload a new one. Unfortunately, that is what I had to do; here is how to do the editing using free software in Ubuntu.

To remove the noise, we need two pieces of software:

1) Audacity – install it from Ubuntu Software. We use Audacity to remove the noise.

2) kdenlive – install it from its release PPA:

sudo add-apt-repository ppa:sunab/kdenlive-release
sudo apt-get update
sudo apt-get install kdenlive
Audacity:

The first step is to extract a noise-free audio file from the video using Audacity.
Open Audacity -> File -> Open (open the video file)

[Screenshot: selecting a noise-only region in Audacity]

Select a noise-only region as shown in the figure above; Effect -> Noise Reduction: Step 1 – Get Noise Profile (this tells Audacity what the noise signature is)
[Screenshot: the Audacity Noise Reduction dialog]

Deselect the noise region (click anywhere outside the selection); Effect -> Noise Reduction: Step 2 – OK (keep the default settings, or increase the noise reduction value in dB for more attenuation)

[Screenshot: the noise-attenuated track in Audacity]

File -> Export Audio -> save as noise_free.mp3

kdenlive:

We use kdenlive to merge the noise-free audio track with the video track.

Open kdenlive,
Menu -> Project -> Add Clip -> add the original video with the noisy audio
Menu -> Project -> Add Clip -> add the noise-free audio file (noise_free.mp3)
[Screenshot: clips added to the kdenlive project]

Drag the video onto the Video 1 track, drag noise_free.mp3 onto the Audio 1 track, and mute the audio of Video 1. Go to the Project Monitor and press the play button to check the new audio.

[Screenshot: kdenlive timeline with the muted original audio and the new track]

To save the new video, click Render -> Render to File (keep the default settings, or change the format settings as needed).
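If you prefer the command line, the same merge can also be done with ffmpeg instead of kdenlive. This is a minimal sketch, assuming ffmpeg is installed; original.mp4 and clean_video.mp4 are placeholder names:

import subprocess

# replace the noisy audio track with the cleaned one, without re-encoding video
subprocess.run([
    "ffmpeg",
    "-i", "original.mp4",          # input 0: original video with noisy audio
    "-i", "noise_free.mp3",        # input 1: cleaned audio from Audacity
    "-map", "0:v", "-map", "1:a",  # take the video from input 0, the audio from input 1
    "-c:v", "copy",                # copy the video stream as-is
    "-shortest",                   # stop when the shorter stream ends
    "clean_video.mp4",
], check=True)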

See the comparisons below.

[Video: with noise]
[Video: noise removed]

Getting Tensorflow 1.4 working with NVIDIA DIGITS in Ubuntu 16.04

The steps to follow are here.

In this post I am not explaining all the bits and pieces of the installation; I am trying to clear up the confusion about what to follow.

Here is the main dependency: if you want to train a TensorFlow model with NVIDIA DIGITS, you need to get DIGITS 6. If you have DIGITS 5 installed, it won't detect TensorFlow. At the time of writing, unlike DIGITS 5, there are no binaries provided by NVIDIA for installing DIGITS 6; you need to either install it using Docker or compile and install it from source. I tried installing from Docker, but later figured out that unless you are already familiar with Docker, you are going to spend a lot of time just trying to understand Docker itself and its usage. Then there is nvidia-docker, which is the one actually needed for NVIDIA DIGITS 6. I tried for some time and realised that it is not required for me, since I own the machine and I am the only person using it, and I am really not ready to go through the Docker pain at this point in time.

Even though I am not a fan of compiling and installing, it looks like that's the only way. It is going to take some time, and you may need to fix some build failures and dependencies, with the usual Stack Overflow and Google searches along the way. I followed the installation instructions from the DIGITS GitHub page.

Long story short, you need to:

  1. Remove DIGITS 5 (check here how to list and remove packages)
  2. Compile and install Caffe (cannot skip this; it is a dependency for DIGITS 6)
  3. Compile and install Torch (not a requirement, but let's keep it)
  4. Compile and install tensorflow_gpu (I had this already, so I skipped it)
  5. Compile and install DIGITS 6

Make sure you add these variables to ~/.bashrc:

export DIGITS_ROOT=~/digits
export CAFFE_ROOT=~/caffe
export TORCH_ROOT=~/torch

The DIGITS server can be started with 'digits-devserver &'. By default the service will be available at http://localhost:5000/
If everything goes fine, you can see the frameworks when you create a new model in DIGITS.

[Screenshot: TensorFlow selectable as a framework in NVIDIA DIGITS 6]
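A quick way to check that the server is really up, using only the Python standard library (a minimal sketch; 5000 is the default port mentioned above):

import urllib.request

# expect HTTP status 200 if digits-devserver is running
response = urllib.request.urlopen("http://localhost:5000/")
print(response.status)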


Speech detection with Tensorflow 1.4 on Raspberry Pi 3 – Part 2: Live audio inferencing using PyAudio

Here is the link to Part 1.

Now we know how to loop around the inferencing function, capture voice for a fixed time and process it. What we need next is a program that listens to the input stream and measures the audio level; this will help us decide whether we need to capture the audio data or not.

File 1: audio_intensity.py
The following code reads CHUNKs of data from the stream, measures the average intensity over a number of reads, and prints it out, so that we know how much ambient noise there is in the background. First we need to figure out the average intensity level (INTENSITY), so that we have a threshold number to check against.

import pyaudio
import wave
import math
import audioop
import time
 
p = pyaudio.PyAudio() 
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 512 
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "file.wav"
INTENSITY=11000
 
def audio_int(num_samples=50):
    print ('Getting intensity values from mic.')
    p = pyaudio.PyAudio()

    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    #---------------------- checks the average noise level -------------------
    # take num_samples readings from the mic; audioop.avg() gives the mean
    # sample value of a fragment and sqrt(abs(...)) turns it into a rough level
    values = [math.sqrt(abs(audioop.avg(stream.read(CHUNK), 4)))
                for x in range(num_samples)]
    values = sorted(values, reverse=True)
    # average the loudest 20% of the readings
    r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
    #---------------------prints out avg noise level--------------------------
    print('Average audio intensity is', r)
    time.sleep(.1)

    stream.close()
    p.terminate()
    return r


if __name__ == '__main__':
    while True:
        audio_int()  # to measure your mic levels

File 2: audio_intensity_trigger.py
In this program, I have added an infinite loop and a check against the INTENSITY level before printing the average audio level. If the room is silent, or there is only background noise, nothing is triggered. I have kept it at '11000'; make sure you change it according to the output of audio_intensity.py. If its output is, say, 8000, set INTENSITY to 9000 or 10000.

......
......
......
while True:
  cur_data = stream.read(CHUNK)
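  # one chunk is read per loop pass; the comprehension repeats the same
  # reading, so r below is effectively just this chunk's loudness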
  values = [math.sqrt(abs(audioop.avg(cur_data, 4)))
            for x in range(num_samples)]
  values = sorted(values, reverse=True)
  r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
  if (r > INTENSITY):
    print('Average audio intensity is', r)

stream.close()
......
......

File 3: audio_trigger_save_wav.py
This one waits for the threshold; once triggered, it saves 1 second of audio to a file in WAV format, together with the 5 previous voice chunks. This is important: otherwise our recording would not contain the start of the word, or the word would be biased towards the first half of the 1 second with the remaining half empty, and the spectrogram generated by TensorFlow would look chopped off.

......
......
......
    # FIFO of the 5 previous chunks, so the start of a word is not lost
    prev_data0 = b''
    prev_data1 = b''
    prev_data2 = b''
    prev_data3 = b''
    prev_data4 = b''
    while True:
      #reading current data
      cur_data = stream.read(CHUNK)
      values = [math.sqrt(abs(audioop.avg(cur_data, 4)))
                for x in range(num_samples)]
      values = sorted(values, reverse=True)
      r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
      if (r > INTENSITY):
        #---------- if triggered: file.wav = 5 previous chunks + 1 sec of voice ----------
        print('Average audio intensity is', r)
        frames = []
        frames.append(prev_data0)
        frames.append(prev_data1)
        frames.append(prev_data2)
        frames.append(prev_data3)
        frames.append(prev_data4)
        frames.append(cur_data)
        #---------------getting 1 second of voice data-----------------
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
          data = stream.read(CHUNK)
          frames.append(data)
        print ('finished recording')
        #--------------------- saving the wave file ---------------------
        waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        waveFile.setnchannels(CHANNELS)
        waveFile.setsampwidth(p.get_sample_size(FORMAT))
        waveFile.setframerate(RATE)
        waveFile.writeframes(b''.join(frames))
        waveFile.close()
      #---------- whether triggered or not: shift the chunks along the 5-level FIFO ----------
      prev_data0=prev_data1
      prev_data1=prev_data2
      prev_data2=prev_data3
      prev_data3=prev_data4
      prev_data4=cur_data
    stream.close()
......
......
......

File 4: wav_trigger_inference.py
This is the modified TensorFlow inference file (label_wav.py). I have fused the program audio_trigger_save_wav.py into label_wav.py. The usage is:

cd tensorflow/examples/speech_commands
touch file.wav   # create a dummy file for the first pass
python3 wav_trigger_inference.py --graph=./my_frozen_graph.pb --labels=./conv_labels.txt --wav=file.wav

The while loop is around run_graph(). If audio is detected and is above the threshold, a wave file is captured and given for inferencing. Once the results are printed out, it continues listening for the next audio.

....
....
....
      waveFile.writeframes(b''.join(frames))
      waveFile.close()
      with open(wav, 'rb') as wav_file:
        wav_data = wav_file.read()
      run_graph(wav_data, labels_list, input_name, output_name, how_many_labels)
    prev_data0=prev_data1
    prev_data1=prev_data2
....
....
....
 parser.add_argument(
      '--how_many_labels',
      type=int,
      default=1,  # this makes sure that only the single most probable result is printed
      help='Number of results to show.')
....
....
....

Here is the result. There are some errors in the predictions, since the graph is not accurate; I could train it only up to 88% accuracy. More data augmentation is needed to improve the accuracy, and I may need to fiddle around with all the switches that are provided by TensorFlow for training. But this is good enough to create a speech-controlled device using a Raspberry Pi.

Speech detection with Tensorflow 1.4 on Raspberry Pi 3 – Part 1: Getting audio file using PyAudio

In previous posts about speech detection using TensorFlow, it was shown how to run inference on a 1 sec audio sample, using the graph trained in TensorFlow and running label_wav.py. This series of posts looks into inferencing on a continuous stream of audio. There is an excellent post by Allan on medium.com which shows how to do the same, but I was not happy with the results, and the code was quite a lot to understand; it uses TensorFlow audio functions to process the audio. I will be using PyAudio to process the audio, since it is easy to understand, and later I may move to TensorFlow audio processing. The code posted here runs on a Raspberry Pi 3, but it should run on any Linux system without modification.

To get the audio, you need to buy a USB sound card, as shown in the figure below; these are available on eBay/AliExpress or Amazon. Connect a 2.5mm mic to it or, like I did, scavenge a mic from old electronics and a 2.5mm audio jack and connect them together.

[Photo: USB sound card with microphone, Pi NoIR camera and earphones on the Raspberry Pi 3]

The following Python code will record 1 sec of audio and save it as a .wav file. For TensorFlow speech recognition we use a sampling rate of 16 kHz (RATE), a single channel (CHANNELS) and a 1 sec duration (RECORD_SECONDS).

import pyaudio
import wave
 
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 512 
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "file.wav"
 
audio = pyaudio.PyAudio()
 
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                rate=RATE, input=True,
                frames_per_buffer=CHUNK)
print("recording...")
frames = []
 
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("finished recording")
 
 
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
 
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
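To verify the recording, you can inspect the file with the wave module (a small sanity check; int(16000 / 512) = 31 chunks of 512 frames gives 15872 frames, slightly under a full second):

import wave

w = wave.open("file.wav", "rb")
# expect: 1 channel, 2-byte samples, 16000 Hz, 15872 frames
print(w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())
w.close()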

When you run pyaudio.PyAudio(), ALSA may print out errors like the ones shown below.

[Screenshot: ALSA errors when opening PyAudio on the Raspberry Pi 3]

The errors can be removed by commenting out the corresponding devices in /usr/share/alsa/alsa.conf.
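For example, if the messages complain about unknown PCM devices such as cards.pcm.rear, prefix the matching lines in /usr/share/alsa/alsa.conf with a '#' (which lines to comment depends on the exact errors you see):

#pcm.rear cards.pcm.rear
#pcm.center_lfe cards.pcm.center_lfe
#pcm.side cards.pcm.side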


The next step is to integrate this into label_wav.py from the TensorFlow tree: tensorflow/examples/speech_commands/label_wav.py. In the updated file, mod_label_wav.py, I have added a for loop around run_graph() to record a 1 sec WAV audio. A new audio sample is recorded every time the loop runs, and the audio is rewritten with the same file name.
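The structure of the change is roughly the following (a sketch, not the exact contents of mod_label_wav.py; record_one_second() stands in for the PyAudio recording code above, while run_graph() and the label variables come from label_wav.py):

# inside label_wav(), after the graph and the labels are loaded:
for _ in range(10):          # e.g. ten rounds of record-and-classify
    record_one_second(wav)   # overwrite file.wav with a fresh 1 sec recording
    with open(wav, 'rb') as wav_file:
        wav_data = wav_file.read()
    run_graph(wav_data, labels_list, input_name, output_name, how_many_labels)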


Here is the output. The input file is given as file.wav from the same directory …../speech_commands; the file is overwritten each time a recording finishes. To start with, create a dummy file.wav before running the script:

touch file.wav
wget https://raw.githubusercontent.com/kiranjose/python-tensorflow-speech-recognition/master/mod_label_wav.py
python3 mod_label_wav.py --graph=./my_frozen_graph.pb --labels=./conv_labels.txt --wav=./file.wav

[Screenshot: inference output for file.wav on the Raspberry Pi]

Files:
1. pyAudio_wav.py

2. mod_label_wav.py

This is by no means a great method for speech inferencing: we need to wait for the script to record, and then again for the next command. But it is a start. In the next post I will explain how to detect an audio threshold to activate the recording/inferencing. For this I have forked a Google Assistant speech-invoking script written by jeysonmc, which will be the starting point.


Getting Tensorflow 1.4 on Raspberry Pi 3

There are two methods to install TensorFlow on the Raspberry Pi: installing from a binary provided by ci.tensorflow.org (you can get nightly builds for all platforms), or building from source, which is hard, takes a lot of time and setup, and usually fails multiple times unless you know exactly what you are doing.

I will explain how I installed TensorFlow 1.4 on the Raspberry Pi 3 from a pre-compiled binary.

Install pip,

# For Python 2.7
sudo apt-get install python-pip python-dev

# For Python 3.4
sudo apt-get install python3-pip python3-dev

The TensorFlow nightly build for the Pi 3 with Python 3 is available here.

Choose the date of the build and copy the link to the .whl file.

[Screenshot: TensorFlow nightly builds for the Raspberry Pi 3]

Install TensorFlow; use pip3 for Python 3.4 and pip for Python 2.7.

sudo pip3 install http://ci.tensorflow.org/view/Nightly/job/nightly-pi-python3/39/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl

or, if you need a backup, download the file first and install from it:

wget http://ci.tensorflow.org/view/Nightly/job/nightly-pi-python3/39/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl

sudo pip3 install ./tensorflow-1.4.0-cp34-none-any.whl

Test the installation,

$ python3
>>> import tensorflow as tf
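>>> print(tf.__version__)   # should print 1.4.0 for this wheel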

If it is installed correctly, no errors will be shown. If there are errors while running, try uninstalling and installing another nightly build binary. The TensorFlow speech detection example can now be run on the Raspberry Pi after copying the trained files over. Use scp to copy them over ssh, from the training machine to the Pi's home directory (replace the IP with your Pi's address):

scp /tmp/my_frozen_graph.pb pi@192.168.1.xxx:~/
scp /tmp/speech_commands_train/conv_labels.txt pi@192.168.1.xxx:~/
scp /tmp/speech_dataset/left/a5d485dc_nohash_0.wav pi@192.168.1.xxx:~/

Make sure you have cloned the tensorflow tree from GitHub onto the Raspberry Pi before running label_wav.py:

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/examples/speech_commands

python label_wav.py \
--graph=$HOME/my_frozen_graph.pb \
--labels=$HOME/conv_labels.txt \
--wav=$HOME/a5d485dc_nohash_0.wav

[Screenshot: running TensorFlow speech detection on the Raspberry Pi 3]