OpenCV - Rotation (Deskewing)

In a previous article I presented how to compute the skew angle of a digitized text document by using the Probabilistic Hough Transform. In the last article I presented how to compute a bounding box using OpenCV, this method was also used to compute the skew angle but with a reduced accuracy compared to the first method.

Test Set

We will be using the same small test set as before:

The naming convention for those images is simple, the first letter stands for the sign of the angle (p for plus, m for minus) and the following number is the value of the angle.
m8.jpg has therefore been rotated by an angle of -8 degrees.

Bounding Box

In this article I will assume we have computed the skew angle of each image with a good accuracy and we now want to rotate the text by this angle value. We therefore declare a function called deskew that takes as parameters the path to the image to process and the skew angle.

void deskew(const char* filename, double angle)
{
  cv::Mat img = cv::imread(filename, 0);
 
  cv::bitwise_not(img, img);
 
  std::vector<cv::Point> points;
  cv::Mat_<uchar>::iterator it = img.begin<uchar>();
  cv::Mat_<uchar>::iterator end = img.end<uchar>();
  for (; it != end; ++it)
    if (*it)
      points.push_back(it.pos());
 
  cv::RotatedRect box = cv::minAreaRect(cv::Mat(points));

This code is similar to the previous article: we load the image, invert black and white and compute the minimum bounding box. However this time there is no preprocessing stage because we want the bounding box of the whole text.

Rotation

We compute the rotation matrix using the corresponding OpenCV function, we specify the center of the rotation (the center of our bounding box), the rotation angle (the skew angle) and the scale factor (none here).

  cv::Mat rot_mat = cv::getRotationMatrix2D(box.center, angle, 1);

Now that we have the rotation matrix, we can apply the geometric transformation using the function warpAffine:

  cv::Mat rotated;
  cv::warpAffine(img, rotated, rot_mat, img.size(), cv::INTER_CUBIC);

The 4th argument is the interpolation method. Interpolation is important in this situation, when applying the transformation matrix, some pixels in the destination image might have no predecessor from the source image (think of scaling with a factor 2). Those pixels have no defined value, and the role of interpolation is to fill those gaps by computing a value using the local neighborhood of this pixel.
The quality of the output and the execution speed depends on the method chosen.

The simplest (and fastest) interpolation method is INTER_NEAREST, but it yields awful results:
.

There are four other interpolation methods: INTER_NEAREST, INTER_AREA, INTER_CUBIC and INTER_LANCSOZ4.
For our example those 4 methods yielded visually similar results.
The rotated image using INTER_CUBIC (bicubic interpolation):

Cropping

We should now crop the image in order to remove borders:

  cv::Size box_size = box.size;
  if (box.angle < -45.)
    std::swap(box_size.width, box_size.height);
  cv::Mat cropped;
  cv::getRectSubPix(rotated, box_size, box.center, cropped);

As mentioned in the previous article, if the skew angle is positive, the angle of the bounding box is below -45 degrees because the angle is given by taking as a reference a "vertical" rectangle, i.e. with the height greater than the width.
Therefore, if the angle is positive, we swap height and width before calling the cropping function.

Cropping is made using getRectSubPix, you must specify the input image, the size of the output image, the center of the rectangle and finally the output image.
We use the original center because the center of a rotation is invariant through this transformation.
This function works at a sub-pixel accuracy (hence its name): the center of the rectangle can be a floating point value.

The cropped image:
Cropped
To better understand the problem we have with positive angles, here what you would get without the correction:


We can immediately see that we just need to swap the height and the width of the rectangle.

Display

This is a small demo so let's display the original image, the rotated image and the cropped image:

  cv::imshow("Original", img);
  cv::imshow("Rotated", rotated);
  cv::imshow("Cropped", cropped);
  cv::waitKey(0);
}

That's it ! It's really simple to rotate an image with OpenCV !

OpenCV - Bounding Box & Skew Angle

In a previous article I presented how to compute the skew angle of a digitized text document by using the Probabilistic Hough Transform.
In this article we will present another method in order to calculate this angle , this method is less acurate than the previous one but our goal is rather to introduce two new OpenCV techniques: image scan with an iterator and computing the minimum bounding box of a set of points.

Bounding Box

The minimum bounding box of a set of 2D points set is the smallest rectangle (i.e. with smallest area) that contains all the points of the set. Given an image representing a text, like this one:

The points of our 2D set are all the white pixels forming the different letters.
If we can compute the bounding box of this set, it will be possible to compute the skew angle of our document. Given a bounding box of our set, it will also be easy to extract our text from the image and rotate it (probably in a future article).

Preprocessing

Our text is small but we have a large number of points, indeed the resolution of our image is large, we have many pixels per letter. We have several possibilities here: we can downstrongle the image in order to reduce its resolution, we can use mathematical morphology (i.e. erosion), etc. There are certainly other solutions, you will have to choose one depending on what are the previous or next stages in your processing pipeline (e.g. maybe you already have a downstrongled image).

In this article I have chosen to experiment using mathematical morphology for this problem.
We used a 5x3 rectangle shaped structuring element, that is to say we want to keep pixels that lies in a region of white pixels of height 3 and width 5.
Here is the result on the previous image:

Okay, this erosion was really aggressive, we removed most pixels and in fact only some letters "survived" the operation. Calculating the bounding box will be really fast but we may have stripped too much information, it might cause problems on certain images. However as we will see, there are still enough pixels to get decent results.

Implementation

The OpenCV program is similar to the one presented in the previous article.
We declare a compute_skew function that takes as input the path to the image to process, at the beginning of the function we load this image in grayscale, we binarize it and we invert the colors (because objects are represented as white pixels, and the background is represented by black pixels).

void compute_skew(const char* filename)
{
  // Load in grayscale.
  cv::Mat img = cv::imread(filename, 0);
 
  // Binarize
  cv::threshold(img, img, 225, 255, cv::THRESH_BINARY);
 
  // Invert colors
  cv::bitwise_not(img, img);

We can now perform our erosion, we must declare our rectangle-shaped structuring element and call the erode function:

  cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 3));
  cv::erode(img, img, element);

Now we must create our set of points before calling the function computing the bounding box. As this function cannot be called on an image, we must extract all the positions of our white pixels, this is a great opportunity to present how to scan an image using an iterator:

  std::vector<cv::Point> points;
  cv::Mat_<uchar>::iterator it = img.begin<uchar>();
  cv::Mat_<uchar>::iterator end = img.end<uchar>();
  for (; it != end; ++it)
    if (*it)
      points.push_back(it.pos());

We declare a vector of points in order to store all white pixels. Like when we iterate on a container in C++, we must declare an iterator and also get the iterator representing the end of our container. We use the Mat_ class, note the underscore at the end: it is because it is a templated class: here we must precise the type of the underlying data type. The image has only one channel of size 1 byte, the type is therefore uchar (unsigned char).

We can now use the OpenCV function in order to compute the minimal bounding box, this function is called minAreaRect, this function need a cv::Mat as input so we must convert our vector of points.

  cv::RotatedRect box = cv::minAreaRect(cv::Mat(points));

That's it, we have our minimal bounding box!
We can now have access to the angle:

  double angle = box.angle;
  if (angle < -45.)
    angle += 90.;

During testing I notice I only got negative angle and never below -90 degrees. This is because as we have no reference rectangle, there are several ways to compute the rotation angle. In our case, if the angle is less than -45 degrees, the angle was computed using a "vertical" rectangle, we must therefore correct the angle by adding 90 degrees.

Finally, we will display our bounding rectangle on our eroded image and print the angle on standard output:

  cv::Point2f vertices[4];
  box.points(vertices);
  for(int i = 0; i < 4; ++i)
    cv::line(img, vertices[i], vertices[(i + 1) % 4], cv::Scalar(255, 0, 0), 1, CV_AA);
 
  std::cout << "File " << filename << ": " << angle << std::endl;
  cv::imshow("Result", img);
  cv::waitKey(0);

Note that the line was anti-aliased using CV_AA as the last argument of cv::line.

Results

Using the same 5 images as in the previous article, we obtained the following angles:

The naming convention for those images is simple, the first letter stands for the sign of the angle (p for plus, m for minus) and the following number is the value of the angle.
The results are not as good as in the previous article, especially with a large angle. Indeed with large values, our preprocessing step (the erosion) will be less meaningful because pixels of a single letter are not vertically or horizontally aligned at all.

Note that the bounding box obtained is not the bounding box of our whole text. The erosion removed a lot of information, if we try to match the bounding box on the initial text, we will lose for example the upper part of letters with a larger height that other letters (e.g. f, t, h, d, l...). Compute the bounding box on the unprocessed text (or use a smaller structuring element) if you want the bounding box of the whole text.

C++ - Getting started with SSE

In this article I will present how to use SSE instructions in C++ (or C).
My goal is not to show how to write the fastest possible program using SSE but rather to introduce to its usage.

What is SSE ?

SSE stands for Streaming SIMD Extensions. It is a set of CPU instructions dedicated to applications like signal processing, scientific computation or 3D graphics.

SIMD is an acronym itself: Single Instruction, Multiple Data. A CPU instruction is said to be SIMD when the same operation is applied on multiple data at the same time.

SSE was first introduced in the Pentium III in 1999. Over the time, this instruction set was improved by adding more sophisticated operations. Eight 128-bits registers were added to the CPU: xmm0 through xmm7.

Initially those registers could only be used for single-precision computations (i.e. for the type float). But since SSE2, those registers can be used for any primitive data type.
Given a standard 32-bit machine we can therefore store and process in parallel:
- 2 double
- 2 long
- 4 float
- 4 int
- 8 short
- 16 char
Note that integer types can be signed or unsigned, but you will have to use (sometimes) different instructions.

For instance if you have to compute the sum of two int arrays, using SSE you can compute 4 additions in a single assembly instruction.

Simple program

It is not easy getting started with SSE, fortunately the MSDN documentation is complete and well-written.
If you take a look at the list of arithmetic operations you can notice there is always a corresponding assembly instruction. Some operations on the other side (such as set operations) are composite operations.
Using SSE in C++ is indeed a very low-level construct: we will manipulate directly the 128-bits registers through variables of type __m128 for float, __m128d for double and __m128i for int, short, char.

But the good news is that you don't need to declare arrays of type __m128 in order to use SSE: for instance if you want to calculate the square root of a large array of float, you can simply cast your pointer to the type __m128* and then use SSE operations.

There is a tweak however, most SSE operations requires the data to be 16-bytes aligned, here we will use another variable attributes of gcc.
We use the align attribute:

aligned (alignment)
This attribute specifies a minimum alignment for the variable or structure field, measured in bytes.

Here is a simple code on how to use SSE in order to compute the square root of 4 float in a single operation using the _mm_sqrt_ps function.

  float a[] __attribute__ ((aligned (16))) = { 41982.,  81.5091, 3.14, 42.666 };                                                                                                                                 
  __m128* ptr = (__m128*)a;                                                                                                                                                                                      
  __m128 t = _mm_sqrt_ps(*ptr);

The corresponding assembly instruction is SQRTPS, if we generate the assembly code of our program (using gcc -S) we can notice that this instruction is indeed used on a SSE register:

sqrtps  %xmm0, %xmm0

Don't forget the include !

  #include <emmintrin.h>

First benchmark

In the previous program we computed the square root of 4 float simultaneously but the result was not stored in the array. For this purpose we use _mm_store_ps.
In the following program we compute the square root of a very large array on float. We use the timer class presented here in order to measure the execution time of the SSE version against the standard version.

void normal(float* a, int N)                                                                                                                                                                                     
{                                                                                                                                                                                                                
  for (int i = 0; i < N; ++i)                                                                                                                                                                                    
    a[i] = sqrt(a[i]);                                                                                                                                                                                           
}                                                                                                                                                                                                                
 
void sse(float* a, int N)                                                                                                                                                                                        
{                      
  // We assume N % 4 == 0.                                                                                                                                                                                        
  int nb_iters = N / 4;                                                                                                                                                                                         
  __m128* ptr = (__m128*)a;                                                                                                                                                                                      
 
  for (int i = 0; i < nb_iters; ++i, ++ptr, a += 4)                                                                                                                                                              
    _mm_store_ps(a, _mm_sqrt_ps(*ptr));                                                                                                                                                                          
}                                                                                                                                                                                                                
 
int main(int argc, char** argv)                                                                                                                                                                                  
{                                                                                                                                                                                                                
  if (argc != 2)                                                                                                                                                                                                 
    return 1;                                                                                                                                                                                                    
  int N = atoi(argv[1]);                                                                                                                                                                                         
 
  float* a;                                                                                                                                                                                                      
  posix_memalign((void**)&a, 16,  N * sizeof(float));                                                                                                                                                            
 
  for (int i = 0; i < N; ++i)                                                                                                                                                                                    
    a[i] = 3141592.65358;                                                                                                                                                                                        
 
  {                                                                                                                                                                                                              
    TIMER("normal");                                                                                                                                                                                             
    normal(a, N);                                                                                                                                                                                                
  }                                                                                                                                                                                                              
 
  for (int i = 0; i < N; ++i)                                                                                                                                                                                    
    a[i] = 3141592.65358;                                                                                                                                                                                        
 
  {                                                                                                                                                                                                              
    TIMER("SSE");                                                                                                                                                                                                
    sse(a, N);                                                                                                                                                                                                   
  }
}

In the SSE function we use two pointers containing the same address but of different types, it is of course not necessary but avoid casting. It is also interesting to notice that we must increment the __m128 pointer by only 1 (128 bits) whereas we must increment the float pointer by 4 (32 bits).

Another interesting function here is the posix_memalign function instead of the align attribute. The first function allocates aligned data on the heap whereas the gcc attribute allocates on the stack.

The benchmark was compiled with llvm-g++ 4.2 (flags: -O3 -msse2) and ran on a Intel Core2 Duo P7350 (2GHz):

$ ./sqrt 64000000
normal: 392ms
SSE: 145ms

Pretty fast indeed !

Second Benchmark

And now how to add two arrays of char together:

void sse(char* a, const char* b, int N)                                                                                                                                                                          
{                                                                                                                                                                                                                
  int nb_iters = N / 16;                                                                                                                                                                                         
 
  __m128i* l = (__m128i*)a;                                                                                                                                                                                      
  __m128i* r = (__m128i*)b;                                                                                                                                                                                      
 
  for (int i = 0; i < nb_iters; ++i, ++l, ++r)                                                                                                                                                                   
    _mm_store_si128(l, _mm_add_epi8(*l, *r));                                                                                                                                                                    
}

And the benchmark with full optimizations turned on:

$ ./add 64000000
normal: 98ms
SSE: 42ms

Performance considerations

You may ask why we don't have a 4-fold speedup when we compute the square root of 4 float in a single instruction and why we only have a 2-fold speedup when we add 16 char at the same time.

The answer is that your compiler is smart and performs a lot of optimizations, especially at the O3 level. In fact, if you take a look at the generated assembly code for the "normal" version of the sort and add functions you will probably notice that your compiler used SSE instructions behind your back. The compiler detected a loop pattern suitable for SSE and therefore decided to use dedicated instructions, however using directly the SSE instructions the generated code was even faster.

Depending on your compiler version, for such simple loops it is possible that you will notice no difference in execution time, but as mentioned, my goal was not pure performances.

C++ - Resource Acquisition Is Initialization

Resource Acquisition Is Initialization

In the previous article we presented a simple timer class that initializes its internal clock in the constructor and computes the elapsed time in the destructor.

In fact, initializing a resource in a constructor and discarding it in a destructor is a common idiom in C++ called Resource Acquisition Is Initialization (RAII).
This is a great technique when dealing with exception handling: for example let's say at the beginning of a function you allocate 10 blocks of memory, open 2 files and acquire a lock. Of course you are really tidy so you properly free the allocated memory, close the files and release the lock at the end of the function.

However if an exception is raised in the middle of your code, you are in deep trouble: your program leaks memory, file descriptors are still opened and the lock is not released...
When leaving a block (even when an exception is raised), all variables allocated on the stack (i.e. local variables) are destroyed, this process is called stack unwinding.

Even if you are not concerned with exceptions, by using the RAII idiom you can write simpler code and there is no risk you accidentally forget to release a lock !
For example, using RAII with a lock:

// Before critical section
{ // Critical section !
  Lock lock(mutex); // Construct Lock object and acquire mutex.
  // Do some super critical stuff
  ...
  // End of scope: variable lock is destroyed and the mutex is released automatically.
}
// After critical section

auto_ptr

You can also use RAII to handle files or you can use it with raw pointers using the C++ std::auto_ptr class. When the variable reaches end of scope, the destructor automatically calls delete on the pointer. Using operator * we can manipulate an auto_ptr just like a pointer of the underlying type:

{                                                                                                                                                                                                                
  std::auto_ptr<int> p(new int);                                                                                                                                                                                 
  *p = 42;                                                                                                                                                                                                       
  std::cout << *p << std::endl;
  // Allocated memory is released.                                                                                                                                                                             
}

RAII in C

Using gcc variable attributes you can use the RAII idiom in C!
We use the cleanup attribute:

cleanup (cleanup_function)
The cleanup attribute runs a function when the variable goes out of scope. This attribute can only be applied to auto function scope variables; it may not be applied to parameters or variables with static storage duration. The function must take one parameter, a pointer to a type compatible with the variable. The return value of the function (if any) is ignored.

Wikipedia presents an example on how to automatically call fclose() on a file descriptor using the cleanup attribute:

#define RAII_VARIABLE(vartype,varname,initval,dtor) \
    void _dtor_ ## varname (vartype * v) { dtor(*v); } \
    vartype varname __attribute__((cleanup(_dtor_ ## varname))) = (initval)
 
void example_usage() {
  RAII_VARIABLE(FILE*, logfile, fopen("logfile.txt", "w+"), fclose);
  fputs("hello logfile!", logfile);
}

C++ - Timer Class

Timer Class

In C++ we often need to measure execution time of a small block or function call.
There are several ways to do this in C++, and it is a real pain to come up with both a portable and accurate version.
Using standard C++ only we can write a simple timer class in a few lines only using std::clock(). This function returns the number of clock ticks elapsed since the program was launched. Unfortunately, on some systems the precision of this function is terrible.

class Timer
{
public:
  Timer(const std::string& name)
    : name_ (name),
      start_ (std::clock())
    {
    }
  ~Timer()
    {
      double elapsed = (double(std::clock() - start_) / double(CLOCKS_PER_SEC));
      std::cout << name_ << ": " << int(elapsed * 1000) << "ms" << std::endl;
    }
private:
  std::string name_;
  std::clock_t start_;
};                                                                                                                                                                                                               
 
#define TIMER(name) Timer timer__(name);

An instance of the Timer class is created by providing a name. The constructor initializes the timer by using std::clock(). When the object is destroyed we compute the elapsed time and print it on standard output. We also define a macro TIMER(name) for convenience.

Usage

As the timer is initialized in the constructor and terminated in the destructor, we need to isolate the part of the code we want to measure in a block. When the end of the block is reached, we exit the scope of the timer variable and the destructor is called.
For example:

  int i = ...;
  for (...)
    ...
 
  {
    TIMER("Huge Loop"); // initialize timer
    for (...)
      ...
    // timer variable is destroyed: print elapsed time.
  }

The output can be something like this: Huge Loop: 352ms

User time VS Wall time

Good to know: by using std::clock() (instead of date-based methods) we measure user time, that is to say time actually spent by the CPU for the execution of the current program. If the process is put to sleep for a long time, we won't notice any change. For example the following code:

{
  TIMER("sleep");
  sleep(2); // go to sleep for 2s
}

should print this: sleep: 0ms.
If wall clock time was measured we would have approximately 2000ms here.

OpenCV - Morphological Skeleton

Skeleton

In many computer vision applications we often have to deal with huge amounts of data: processing can therefore be slow and requires a lot of memory.
In order to achieve faster processing and a smaller memory footprint, we sometimes use a more compact representation called a skeleton.
A skeleton must preserve the structure of the shape but all redundant pixels should be removed.
Here is a skeleton of the letter "B":

In this article we will present how to compute a morphological skeleton with the library OpenCV.
The skeleton obtained is far from perfect but it is a really simple method compared to other existing algorithms.

Pseudocode

As described on Wikipedia, a morphological skeleton can be computed using only the two basic morphological operations: dilate and erode.
In pseudo code, the algorithm works as follow:

img = ...;
while (not_empty(img))
{
    skel = skel | (img & !open(img));
    img = erosion(img);
}

At each iteration the image is eroded again and the skeleton is refined by computing the union of the current erosion less the opening of this erosion. An opening is simply an erosion followed by a dilation.

Implementation

It's really straightforward, first load the image to process in grayscale and transform it to a binary image using thresholding:

cv::Mat img = cv::imread("O.png", 0);
cv::threshold(img, img, 127, 255, cv::THRESH_BINARY);

We now need an image to store the skeleton and also a temporary image in order to store intermediate computations in the loop. The skeleton image is filled with black at the beginning.

cv::Mat skel(img.size(), CV_8UC1, cv::Scalar(0));
cv::Mat temp(img.size(), CV_8UC1);

We have to declare the structuring element we will use for our morphological operations, here we use a 3x3 cross-shaped structure element (i.e. we use 4-connexity).

cv::Mat element = cv::getStructuringElement(cv::MORPH_CROSS, cv::Size(3, 3));

And now the core of the algorithm, the main loop. We need a boolean variable in order to check if there is at least one pixel remaining. Operations are done in-place when possible.

bool done;
do
{
  cv::morphologyEx(img, temp, cv::MORPH_OPEN, element);
  cv::bitwise_not(temp, temp);
  cv::bitwise_and(img, temp, temp);
  cv::bitwise_or(skel, temp, skel);
  cv::erode(img, img, element);
 
  double max;
  cv::minMaxLoc(img, 0, &max);
  done = (max == 0);
} while (!done);

The use of the minMaxLoc function deserves an explanation. We want to check if there is still at least one pixel in the image, unfortunately I have not found a function for this task in OpenCV, therefore I just check if the maximum value is 0. minMaxLoc stores the minimum value in the second parameter (ignored if NULL pointer) and the maximum in the third parameter. A short-circuit OR function would be nice for this task.

The loop is over, we have our skeleton, let's display it!

cv::imshow("Skeleton", skel);
cv::waitKey(0);

Results

On a big "O":

On the word "OpenCV":

Optimization

As discussed with Arthur Kalverboer in the comments below, it is possible to optimize the computation in several ways.

First of all we can notice we perform the open operation and just after we perform an erosion on the same image, but an opening is just an erosion followed by a dilation, so we can perform the erosion and save it to a new image eroded, and at the end of the loop we copy eroded to img.

The second optimization concerns the use of cv::minMaxLoc in order to check if an image still has white pixels, computing the norm (cv::norm) of the image is faster.
EDIT2: Abid Rahman told me the function 'cv::countNonZero' is even faster, I didn't know this function existed, thanks!

Finally the last optimization is to replace the and and not operations by a simple set difference operation (cv::subtract). This works because we only manipulate binary images.

Here is the updated code:

cv::threshold(img, img, 127, 255, cv::THRESH_BINARY); 
cv::Mat skel(img.size(), CV_8UC1, cv::Scalar(0));
cv::Mat temp;
cv::Mat eroded;
 
cv::Mat element = cv::getStructuringElement(cv::MORPH_CROSS, cv::Size(3, 3));
 
bool done;		
do
{
  cv::erode(img, eroded, element);
  cv::dilate(eroded, temp, element); // temp = open(img)
  cv::subtract(img, temp, temp);
  cv::bitwise_or(skel, temp, skel);
  eroded.copyTo(img);
 
  done = (cv::countNonZero(img) == 0);
} while (!done);

Also, don't forget to crop your images before processing. The two images I gave as examples are not cropped, cropping them (manually or using OpenCV) also improves execution time.