Adding sounds!

Last week I implemented multiple image detection, so this week I moved on to a completely different issue: incorporating multiple sounds for multiple images. It all went great and I managed to do it quite easily, so let me tell you what I have achieved this week.

As I mentioned in the introduction, I have managed to implement sound playback. To handle the audio processing in my app, I’m using the ofxMaxim addon. The addon is an openFrameworks wrapper around the Maximilian digital audio library developed by Mick Grierson, which provides easy ways to synthesise, manipulate, analyse and play back audio on multiple platforms. I picked Maximilian because I worked with it during my previous year of study, and because it provides the functionality my project requires.

So far I have written a separate class called MusicManager, which deals with all the audio in my project. Here is the basic workflow of the MusicManager class:

We load the sounds in setup(). After the sounds are loaded, we take their names and store them in an array for later use. Ideally, I wanted to load all of the sounds into a vector of maxiSample objects; however, there seem to be some problems with memory allocation within the Maximilian library, so for now I load each of the samples manually and store them in separate variables, as in maxiSample sampl1, sampl2, … . To get going with the sound analysis, I have to set up a Fast Fourier Transform object provided by Maximilian. I can do that with just a few lines of code:

ofxMaxiFFT myFft;

//set up the FFT: fftSize points, a windowSize window, and a new frame every hopSize samples
fftSize = 1024;
windowSize = 512;
hopSize = 256;
myFft.setup(fftSize, windowSize, hopSize);

I will be using the FFT for most of my sound analysis, so I have created a function called analyseSound(), which is called whenever the FFT has finished processing a frame of the sample that is being played:

playingSound = samp.play();
if (myFft.process(playingSound)) {
    //a new FFT frame is ready, so run the sound analysis
    analyseSound();
}

Inside the analyseSound() function, various operations are performed on the sound using the FFT. Maximilian provides quite an extensive API for analysing sound, so the plan is to extract several values from each sound: spectral centroid, spectral flatness, pitch histogram, peak frequencies, Mel bands, RMS, MFCCs, octave averages, and the magnitudes of the FFT bins themselves. If you would like to see the implementation of each of these, check out my GitHub page at https://github.com/jasetom/VinylAR/blob/master/src/MusicManager.cpp, where I provide thoroughly commented code and explain how it all works.

Now that I have multiple sounds attached to multiple images, I had to write a few functions and figure out how users would switch between the songs in an album. I have done that by simply swapping the sound file for the specific image that has been detected. I have also built a temporary GUI to let users operate the app more efficiently.

Here is a screenshot of the temporary GUI in action:

When pressed, the round button in the lower middle starts looking for an image to match. Once an image is found, square buttons appear on each side; these let the user navigate through the different songs within the album and see different visuals. And this leads me to the final stage of development: implementing the visuals.

So my tasks for the coming weeks are to implement 3D visuals and a better GUI, and to start writing my dissertation, where I will explain in more detail how it all works.