Multiple image detection

For the past week, I have been working on multiple image detection. It has caused me a lot of pain, and I’ve spent countless hours in front of my screen, but I have managed to get it working. Here are the things I’ve learned along the way.

Not ALL images are the same

When I first started working with computer vision, I learned how it all works, and how different algorithms detect features and compute descriptors in images. I learned quite a lot about this topic, but I simply forgot one important fact when I needed it the most, and that cost me hours of debugging: some of my test images had poor features in them.

I have explained how algorithms match images in my previous posts, but to recap I will explain it again, this time showing how it all works in my implementation using openFrameworks.

First, we load an image into openFrameworks, then convert it to an OpenCV colour image. Next, we convert the colour image into an OpenCV grayscale image, as feature detectors work in grayscale (they only compare the brightness of pixels, not their colour). After that, we convert the grayscale image to an OpenCV Mat object.

An OpenCV Mat object is an instance of a class made up of two parts: a header and a pointer to the image’s pixel values. The header contains information such as the size, the storage method and other parameters of the pixel matrix, while the pointer points to the matrix of pixel values itself.
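To illustrate that split, here is a minimal stand-in (hypothetical, much simpler than the real cv::Mat) where copying the object duplicates the header but shares the underlying pixel buffer:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Illustrative only: a stripped-down analogue of how cv::Mat separates
// the header (dimensions, type info) from the shared pixel buffer.
struct TinyMat {
    int rows = 0;
    int cols = 0;                                      // header: dimensions
    std::shared_ptr<std::vector<unsigned char>> data;  // pointer to pixel values

    TinyMat(int r, int c)
        : rows(r), cols(c),
          data(std::make_shared<std::vector<unsigned char>>(r * c, 0)) {}
};
```

This mirrors why copying a cv::Mat is cheap: only the small header is copied, and both objects point at the same pixel data.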


Once we’ve got the Mat object ready, we use it with computer vision algorithms, which go through every pixel in the matrix and perform various calculations. That is how we detect features and key points in the images, which are then used for image tracking and matching.
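To make that idea concrete, here is a small sketch (plain C++, and not the actual ORB algorithm — the scoring rule is invented for illustration) of walking every pixel of a grayscale matrix and computing a per-pixel measure of local brightness change:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Illustrative sketch: scan every interior pixel of a grayscale matrix
// and count those whose local brightness change exceeds a threshold,
// the way real detectors walk the Mat computing per-pixel measures.
int countStrongPixels(const std::vector<std::vector<int>>& gray, int threshold) {
    int strong = 0;
    for (size_t y = 1; y + 1 < gray.size(); ++y) {
        for (size_t x = 1; x + 1 < gray[y].size(); ++x) {
            // crude "feature" score: horizontal + vertical brightness change
            int dx = std::abs(gray[y][x + 1] - gray[y][x - 1]);
            int dy = std::abs(gray[y + 1][x] - gray[y - 1][x]);
            if (dx + dy > threshold) ++strong;
        }
    }
    return strong;
}
```

A flat image scores zero everywhere, while an image with a sharp edge produces strong responses along it — which is exactly why low-contrast images make poor markers, as I found out later.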

Here is what the code looks like for this specific scenario:

//Load up the image
ofImage markerImg;
markerImg.load(ofToDataPath("images/1.jpg"));

// create the cvImage from the ofImage
ofxCvColorImage analyseImg;
analyseImg.setFromPixels(markerImg.getPixels().getData(), markerImg.getWidth(), markerImg.getHeight());
//convert image to grayscale
ofxCvGrayscaleImage greyImg = analyseImg;
// convert to Mat
Mat imgMat = cvarrToMat(greyImg.getCvImage());

// keypoints of the object we're analysing
vector<KeyPoint> imgKeyPoints;

//creating detector
OrbFeatureDetector detector;
detector = OrbFeatureDetector();

//use imgMat in detect function to detect keypoints (features)
detector.detect(imgMat, imgKeyPoints);

Once we do that, we end up with a vector of image key points (imgKeyPoints) detected by OrbFeatureDetector. In order to track multiple images, we have to go through the same sequence for every image and store each image’s imgKeyPoints in another vector called manyImgKeyPoints.

vector<vector<KeyPoint>> manyImgKeyPoints;

We also have to compute descriptors for each of the images and store those in a vector:

//defining variables
Mat imgDescriptors;
vector<Mat> manyImgDescriptors;

//creating descriptor extractor
OrbDescriptorExtractor extractor;
extractor = OrbDescriptorExtractor();

//calculating descriptors (feature vectors)
extractor.compute(imgMat, imgKeyPoints, imgDescriptors);

//storing multiple image descriptors in one place
manyImgDescriptors.push_back(imgDescriptors);
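Putting the pieces above together, the per-image setup can be sketched as a loop. This is a sketch assembled from the snippets in this post; the file naming scheme (1.jpg, 2.jpg, …) and the marker count are assumptions:

```cpp
// Sketch: run the detect/compute sequence for every marker image
// and store the results, assuming numbered images in data/images.
vector<vector<KeyPoint>> manyImgKeyPoints;
vector<Mat> manyImgDescriptors;

OrbFeatureDetector detector;
OrbDescriptorExtractor extractor;

int numMarkers = 2; // assumed number of marker images

for (int i = 0; i < numMarkers; i++) {
    ofImage markerImg;
    markerImg.load(ofToDataPath("images/" + ofToString(i + 1) + ".jpg"));

    ofxCvColorImage analyseImg;
    analyseImg.allocate(markerImg.getWidth(), markerImg.getHeight());
    analyseImg.setFromPixels(markerImg.getPixels().getData(), markerImg.getWidth(), markerImg.getHeight());

    ofxCvGrayscaleImage greyImg = analyseImg;
    Mat imgMat = cvarrToMat(greyImg.getCvImage());

    // detect keypoints and compute their descriptors for this image
    vector<KeyPoint> imgKeyPoints;
    Mat imgDescriptors;
    detector.detect(imgMat, imgKeyPoints);
    extractor.compute(imgMat, imgKeyPoints, imgDescriptors);

    // store this image's results alongside the others
    manyImgKeyPoints.push_back(imgKeyPoints);
    manyImgDescriptors.push_back(imgDescriptors);
}
```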

To detect the analysed images with a camera, we have to compute keypoints and descriptors for each camera frame too, and then use another object provided by OpenCV called a matcher. As the name suggests, it tries to match the features from our images with the features coming from the camera.

//create matcher object
BFMatcher matcher;
matcher = BFMatcher();

//train matcher with analysed features from many images
matcher.add(manyImgDescriptors);

Now, to do the matching, we pass cameraDescriptors into the knnMatch function, which takes the descriptor of each feature in the first set and matches it against all of the features in the second set using a distance calculation.

vector< vector< DMatch > > matches;
matcher.knnMatch(cameraDescriptors, matches, 20);

The output of this function is a vector of vectors of matches. Each match has these properties: trainIdx, queryIdx and imgIdx. This is important because we can access imgIdx and see which trained image contained this feature! In other words, we can use this number to tell which image we have detected.

If we get enough good matches, we can check their imgIdx values and use this number for further development in the app.

int detectedImageId = matches[0][0].imgIdx;

I have used this in my app with multiple images, and it sure does work!

if(detectedImageId == 0){
   cout << "wood" << endl;
}else if(detectedImageId == 1){
   cout << "pebbles" << endl;
}
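One caveat: taking matches[0][0] trusts a single match. A slightly more robust sketch (plain C++, with a hypothetical stand-in for cv::DMatch; the 0.75 ratio is the commonly used Lowe’s ratio test value, not something from my app) filters the knnMatch pairs first and then takes a majority vote over imgIdx:

```cpp
#include <map>
#include <vector>

// Hypothetical stand-in for cv::DMatch, carrying just the fields we need.
struct Match {
    int imgIdx;      // which trained image the feature came from
    float distance;  // how different the descriptors are (lower = better)
};

// Keep a match only if the best candidate is clearly better than the
// runner-up (ratio test), then vote on which image appears most often.
int pickDetectedImage(const std::vector<std::vector<Match>>& matches,
                      float ratio = 0.75f) {
    std::map<int, int> votes;
    for (const auto& candidates : matches) {
        if (candidates.size() < 2) continue;
        if (candidates[0].distance < ratio * candidates[1].distance)
            ++votes[candidates[0].imgIdx];
    }
    int bestImg = -1, bestVotes = 0;
    for (const auto& v : votes)
        if (v.second > bestVotes) { bestImg = v.first; bestVotes = v.second; }
    return bestImg; // -1 if no good matches survived
}
```

The vote makes the decision depend on many consistent matches instead of a single possibly spurious one.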
Detecting two different images:

This was very exciting, as my app was detecting two different images. The next step was to draw a homography onto these images, which means taking all of the matches and drawing a square around them so I could add more graphics later on. However, this is where I spent the next few days struggling. I had written all the required functions, and it was working for one of the images but not for the other. I tried changing a lot of things, but at the end of the day the problem was that the image simply wasn’t good enough: it did not contain enough good features for my detector and extractor, and therefore I was not able to draw a correct homography for each of the images.

Here are the comparisons of different images analysed:

(The first image has poor features, therefore there is no homography drawn)

My conclusion is that since I traded SurfFeatureDetector for the faster OrbFeatureDetector, the quality of feature detection has gone down, and that is why I was not able to get good results for the homography.

However, now that I’ve selected images with better features to work with, I can continue my journey towards completing this app. My next milestone involves working on a music player and augmented reality 3D visuals.