
OpenCV - Detect skew angle

OpenCV

Lately I have been interested in the OpenCV library. This library, initially developed by Intel, is dedicated to computer vision applications and is known to be one of the fastest (and maybe the fastest?) libraries available for real-time computer vision.

What's more, if you think the library is not fast enough, you can install Intel Integrated Performance Primitives (IPP) on your system and benefit from heavily optimized routines. If you have several cores available on your computer, you can also install Intel Threading Building Blocks (TBB). And if it's still not fast enough, you can run some operations on your GPU using CUDA!

All these features looked really exciting, so I decided to experiment a little bit with OpenCV in order to test its interface and expressive power. In this article I present a simple and very short project that detects the skew angle of a digitized document. My goal is not to present an error-proof program but rather to show the ease of use of OpenCV.

Skew angle

The detection of the skew angle of a document is a common preprocessing step in document analysis.

The document might have been slightly rotated during its digitization. It is therefore necessary to compute the skew angle and to rotate the text before going further in the processing pipeline (i.e. words/letters segmentation and later OCR).

We will work on 5 different images of the same short (but famous) text.

The naming convention for these images is simple: the first letter stands for the sign of the angle (p for plus, m for minus) and the following number is the absolute value of the angle.
m8.jpg has therefore been rotated by an angle of -8 degrees.

We assume here that the noise introduced by digitization has been removed by a previous preprocessing stage. We also assume that the text has been isolated: no images, horizontal or vertical separators, etc.

Implementation with OpenCV

First, let's declare a function compute_skew; it takes the path to an image as input and outputs the detected angle on standard output.
After the necessary includes, we load the image and store its size in a variable, very simple.

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

void compute_skew(const char* filename)
{
   // Load the image directly in grayscale.
   cv::Mat src = cv::imread(filename, 0);
   cv::Size size = src.size();

In image processing, objects are conventionally white and the background black. Here we have the opposite, so we need to invert the colors of the image:

   cv::bitwise_not(src, src);

And here is the result:

In order to compute the skew we must find straight lines in the text. In a line of text, several letters sit side by side, so text lines should appear as long runs of white pixels in the image. Here is an example below:

Of course, as characters have a height, we find several lines for each actual line of the text. By fine-tuning the parameters used later, or with some preprocessing, we can decrease the number of detected lines.

So, how do we find lines in the image? We use a powerful mathematical tool called the Hough transform. I won't dig into the mathematical details; the main idea of the Hough transform is to use a 2D accumulator to count how many times a given line has been found in the image: the whole image is scanned, and a voting system identifies the "best" lines.
We use a more efficient variant of the Standard Hough Transform (SHT) called the Probabilistic Hough Transform (PHT). In OpenCV, the PHT is implemented under the name HoughLinesP.
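To make the voting idea a bit more concrete, here is a naive sketch of the accumulator used by the Standard Hough Transform on a binary image. This is not OpenCV's actual implementation, and the function name and parameters are made up for illustration:

    #include <opencv2/opencv.hpp>
    #include <cmath>
    #include <vector>

    // Naive SHT accumulator: every white pixel votes for all (rho, theta)
    // lines passing through it; peaks in the accumulator are the "best" lines.
    std::vector<int> hough_accumulator(const cv::Mat& bin, int n_thetas)
    {
        // rho ranges over (-diag, diag), so shift it by diag to index the array.
        const int diag = (int)std::hypot(bin.rows, bin.cols);
        std::vector<int> acc(n_thetas * (2 * diag + 1), 0);
        for (int y = 0; y < bin.rows; ++y)
            for (int x = 0; x < bin.cols; ++x)
                if (bin.at<uchar>(y, x) == 255)
                    for (int t = 0; t < n_thetas; ++t)
                    {
                        double theta = t * CV_PI / n_thetas;
                        int rho = (int)std::round(x * std::cos(theta) + y * std::sin(theta));
                        ++acc[t * (2 * diag + 1) + (rho + diag)];  // one vote
                    }
        return acc;
    }

Roughly speaking, the PHT avoids scanning every pixel by sampling a random subset of them, and it returns line segments (with their end points) rather than infinite lines.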

Compared to the standard Hough transform, we have two additional parameters:

  • minLineLength – the minimum line length; line segments shorter than this are rejected. This is a great tool to prune out small residual lines.
  • maxLineGap – the maximum allowed gap between points on the same line to link them.
    This could be interesting for multi-column text; for example, we could choose not to link lines from different text columns.

Back to C++ now: in OpenCV, the PHT stores the end points of each line, whereas the SHT stores lines in polar coordinates (relative to the origin). We need a vector to store all the end points:

    std::vector<cv::Vec4i> lines;

We are ready for the Hough transform now:

    cv::HoughLinesP(src, lines, 1, CV_PI/180, 100, size.width / 2.f, 20);

We use a step size of 1 pixel for $\rho$ and 1 degree ($\pi/180$) for $\theta$; the threshold (the minimum number of votes) is 100.
minLineLength is width/2, which is not an unreasonable assumption if the text is well isolated.
maxLineGap is 20, which seems a sound value for a gap.

In the remainder of the code we simply calculate the angle between each line and the horizontal using the atan2 math function, and we compute the mean angle over all the lines.
For debugging purposes we also draw all the lines in a new image called disp_lines and display this image in a new window.

    // Black image on which the detected lines will be drawn.
    cv::Mat disp_lines(size, CV_8UC1, cv::Scalar(0, 0, 0));
    double angle = 0.;
    unsigned nb_lines = lines.size();
    for (unsigned i = 0; i < nb_lines; ++i)
    {
        cv::line(disp_lines, cv::Point(lines[i][0], lines[i][1]),
                 cv::Point(lines[i][2], lines[i][3]), cv::Scalar(255, 0, 0));
        // Angle of the segment with the horizontal, in radians.
        angle += atan2((double)lines[i][3] - lines[i][1],
                       (double)lines[i][2] - lines[i][0]);
    }
    angle /= nb_lines; // mean angle, in radians.
 
    std::cout << "File " << filename << ": " << angle * 180 / CV_PI << std::endl;
 
    cv::imshow(filename, disp_lines);
    cv::waitKey(0);
    cv::destroyWindow(filename);
}

We just need a main function in order to call compute_skew on several images:

const char* files[] = { "m8.jpg", "m20.jpg", "p3.jpg", "p16.jpg", "p24.jpg"};
 
int main()
{
	unsigned nb_files = sizeof(files) / sizeof(const char*);
	for (unsigned i = 0; i < nb_files; ++i)
		compute_skew(files[i]);
}

That's all! Here are the skew angles we obtain for each image; the accuracy is pretty good:
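As stated in the introduction, the next step of a real pipeline would be to rotate the document by the detected angle. This was not part of this little program, but a minimal sketch (assuming the angle is in degrees, as printed by compute_skew) could look like this:

    cv::Mat deskew(const cv::Mat& src, double angle_deg)
    {
        // Rotate around the image center; depending on how the skew was
        // measured, the sign of the angle may need to be flipped.
        cv::Point2f center(src.cols / 2.f, src.rows / 2.f);
        cv::Mat rot = cv::getRotationMatrix2D(center, angle_deg, 1.0);
        cv::Mat dst;
        cv::warpAffine(src, dst, rot, src.size());
        return dst;
    }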

4D Julia Set

The project

During the last school year we had numerous maths courses: theory of distributions, model theory, Lie algebra... For this last course, our teacher gave us several project ideas; we chose to make a 3D representation of the special linear group and of the Julia set.
An online representation of the Julia set can be found here.
You can try some preset values that yield nice figures, or you can mess around with the different parameters (explained in the following section).

Warning:
I recommend using a top-notch web browser; if everything is black or you only see a 3D axis, try again with another browser.

This project was my first experiment with JavaScript and WebGL, using ThreeJS.
Using JS was not a requirement for the project, but we thought it would be a good idea to have an online demo!

Explanations

Julia sets are fractals defined by the recurrence relation $z_{n+1} = z_n^2 + c$, with $c$ being a constant.
If, for a given $z_0$, the sequence $(z_n)$ converges, then $z_0$ is in the Julia set.

Julia sets can be represented in 2D (as shown on Wikipedia) with $z \in \mathbb{C}$, or in 4D with $z \in \mathbb{H}$.
$\mathbb{H}$ is the set of quaternions; quaternions are hypercomplex numbers of the following form:

$q = a + bi + cj + dk$

with $i$, $j$, $k$ satisfying $i^2 = j^2 = k^2 = ijk = -1$.
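Since the recurrence only squares and adds, the only quaternion product we actually need is the square. Expanding $q^2$ with the relations above (the cross terms cancel because $ij = -ji$, $ik = -ki$ and $jk = -kj$) gives:

$q^2 = (a^2 - b^2 - c^2 - d^2) + 2ab\,i + 2ac\,j + 2ad\,k$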

To obtain a representation in a 4D space, each point of this space is represented by a quaternion.
Given a fixed value of the constant $c$, we can compute the corresponding Julia set by testing the convergence of the sequence for every point $z_0$ of this 4D space.
In our online demo, the 4 coefficients of the quaternion constant $c$ can be modified through four sliders.

We cannot visualize 4D objects, so we need a 3D representation of the Julia set. We have to take a slice of the Julia set; in other words, we compute the intersection of the set with a hyperplane.
What this means for our demo is that we assign a constant value to the same coefficient of each quaternion (for example the coefficient of $k$).
The three remaining coefficients form the three axes of our 3D representation ($x$, $y$ and $z$).
In our online demo we can choose which coefficient should be fixed and modify its value with the slider Hyperplane.

Once we have fixed a coefficient for all quaternions, we iterate through all possible values of $x$, $y$ and $z$.
For instance, if we have fixed the coefficient of $k$ to a value $h$, the initial value of the sequence is therefore:

$z_0 = x + yi + zj + hk$
A point $z_0$ is in the Julia set (and hence displayed as a white dot) if, after a given number of iterations, the magnitude of the sequence is still below a threshold. In our demo we compute 10 iterations.
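The demo itself is written in JavaScript, but the membership test is easy to express in a few lines. Here is a minimal C++ sketch of the same logic, assuming an escape threshold of 4 on the quaternion magnitude (the exact threshold used by the demo may differ):

    #include <cmath>

    // A quaternion q = a + b*i + c*j + d*k.
    struct Quat { double a, b, c, d; };

    // q^2 = (a^2 - b^2 - c^2 - d^2) + 2ab*i + 2ac*j + 2ad*k (see above).
    Quat square(const Quat& q)
    {
        return { q.a*q.a - q.b*q.b - q.c*q.c - q.d*q.d,
                 2*q.a*q.b, 2*q.a*q.c, 2*q.a*q.d };
    }

    double magnitude(const Quat& q)
    {
        return std::sqrt(q.a*q.a + q.b*q.b + q.c*q.c + q.d*q.d);
    }

    // z0 is kept (displayed as a white dot) if the sequence z_{n+1} = z_n^2 + c
    // stays below the threshold for the whole run of iterations.
    bool in_julia_set(Quat z, const Quat& c, int iterations = 10, double threshold = 4.0)
    {
        for (int i = 0; i < iterations; ++i)
        {
            z = square(z);
            z.a += c.a; z.b += c.b; z.c += c.c; z.d += c.d;
            if (magnitude(z) > threshold)
                return false;
        }
        return true;
    }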

Project - Optical Character Recognition


Presentation

During my second year at EPITA, the Fooo team undertook an ambitious project: a piece of software for Optical Music Recognition (OMR, also known as Music OCR).
The project was written mainly in C and OCaml.

Contributions

I was in charge of developing a neural network for character recognition.

As a first attempt I developed a classical multilayer perceptron (MLP).
However, the results obtained on the MNIST database of handwritten digits were not good enough. The first major problem of an MLP is its low resilience to shifting, scaling and other forms of distortion. The second major problem is that the geometry of the input image is not taken into account. To solve this problem, the input of the MLP is not the raw image but rather the output of a feature extractor.

After doing some research on other classification methods and image characterization techniques, I decided to implement a simple but efficient variant of the MLP: the Convolutional Neural Network (CNN), introduced by Yann LeCun and Yoshua Bengio in this paper.
The image is divided into several receptive fields, for instance overlapping regions of 5x5 pixels. The second layer is obtained by applying several convolutions to the input image; the values of each convolution kernel are computed during the learning phase of the neural network. The application of a convolution to the input image forms a feature map in the subsequent layer. Generally, 4 or 6 feature maps are used in the first hidden layer.
The second hidden layer is a subsampling layer: a local average is computed over each region of the previous feature map. Using this technique, the classifier achieves some form of translation invariance.

This convolution/subsampling alternation is used twice before the output layer.
We therefore have this order: input-convolution-subsampling-convolution-subsampling-output.
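To illustrate the two building blocks, here is a minimal C++ sketch (our actual implementation was in C and OCaml, and these helper names are made up for illustration): a single convolution producing one feature map, and a 2x2 average subsampling.

    #include <vector>

    // Convolve a w x h image (row-major) with a k x k kernel, producing a
    // (w - k + 1) x (h - k + 1) feature map. The kernel values are the
    // weights learned during training.
    std::vector<float> convolve(const std::vector<float>& img, int w, int h,
                                const std::vector<float>& kernel, int k)
    {
        int ow = w - k + 1, oh = h - k + 1;
        std::vector<float> out(ow * oh, 0.f);
        for (int y = 0; y < oh; ++y)
            for (int x = 0; x < ow; ++x)
                for (int ky = 0; ky < k; ++ky)
                    for (int kx = 0; kx < k; ++kx)
                        out[y * ow + x] += img[(y + ky) * w + (x + kx)]
                                           * kernel[ky * k + kx];
        return out;
    }

    // Subsample a feature map by averaging each 2x2 block, halving both
    // dimensions; this is what gives the network some translation invariance.
    std::vector<float> subsample(const std::vector<float>& map, int w, int h)
    {
        int ow = w / 2, oh = h / 2;
        std::vector<float> out(ow * oh);
        for (int y = 0; y < oh; ++y)
            for (int x = 0; x < ow; ++x)
                out[y * ow + x] = (map[(2*y) * w + 2*x] + map[(2*y) * w + 2*x + 1]
                                 + map[(2*y+1) * w + 2*x] + map[(2*y+1) * w + 2*x + 1]) / 4.f;
        return out;
    }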

Using Convolutional Neural Networks we achieved good classification results on the MNIST database. After preprocessing of the input data (for example with distortions), we achieved very good results: around 97% on the test set.

The final report for the project can be found here (in French).
A good explanation of CNNs can be found here.

Project - Fooo

Presentation

Fooo!

Fooo was my very first computer science project. It was developed over 8 months during my first year at EPITA, a French graduate school of computer science and advanced technologies, and finished as top project of the year.
In a team of 4 students (with Christopher Chedeau, Vladimir Nachbaur and Alban Perillat-Merceroz), we developed a Real Time Strategy (RTS) game largely inspired by Warcraft III. However, textures, models and icons were not taken from the MDX/MDL/BLP files of that game; instead we used non-copyrighted fan-made art, with the approval of the authors.

The core of the game was developed in Delphi, the 3D engine was built with OpenGL, and the interface was written in Lua.
Source files, reports (in French) and screenshots can be found on our website.

Video

We made a video presenting all the features of our project, check it out!

Contribution

As mentioned, Fooo was my first experience with a large project. It started shortly after we had been introduced to programming through the functional language OCaml and later Delphi (a language inspired by Object Pascal).

Developing a video game from scratch during the first year might seem rather daunting. Our school's policy for helping us with the project was simple: there was no such policy. Of course we had general algorithms courses and smaller Delphi projects in the meantime, but when it came to project design, pathfinding, the 3D engine, networking... we had to figure most of it out by ourselves by reading books, forums and tutorials.

I was mostly in charge of the core of the game: movement, orders (e.g. patrol, attack), spells, resource generation and usage, unit creation, unit death, building creation, etc.
I also contributed to the interface, associating each user action (like a mouse selection or a button click) with an action in the game engine.