Project - Optical Character Recognition


Presentation

During my second year at EPITA, the Fooo team undertook an ambitious project: a piece of software for Optical Music Recognition (OMR, also known as music OCR).
The project was written mainly in C and OCaml.

Contributions

I was in charge of developing a neural network for character recognition.

As a first attempt, I developed a classical multilayer perceptron (MLP).
However, the results obtained on the MNIST database of handwritten digits were not good enough. The first major problem of an MLP is its low resilience to shifting, scaling and other forms of distortion. The second major problem is that the geometry of the input image is not taken into account: the pixels are presented as a flat vector, so the network has no notion of which pixels are neighbors. To work around these problems, the MLP is usually fed not the raw image but the output of a feature extractor.
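To make the shifting problem concrete, here is a minimal sketch in plain Python (not the project's actual C/OCaml code). The helper names `flatten`, `shift_right` and `dot` are hypothetical, and the 4x4 image is a toy stand-in for a digit. It shows that a neuron tuned to a pattern in the flattened image stops responding when the same pattern moves by a single pixel.

```python
# Hypothetical helpers illustrating why raw pixels are a fragile MLP input.

def flatten(image):
    """Turn a 2D image (list of rows) into the 1D vector an MLP consumes."""
    return [pixel for row in image for pixel in row]

def shift_right(image, k=1):
    """Shift every row k pixels to the right, padding with zeros on the left."""
    return [[0] * k + row[:len(row) - k] for row in image]

def dot(a, b):
    """Weighted sum computed by a single neuron before its activation."""
    return sum(x * y for x, y in zip(a, b))

# A 4x4 toy image containing a vertical bar in column 1.
bar = [[0, 1, 0, 0] for _ in range(4)]
shifted = shift_right(bar)  # the same bar, now in column 2

# A neuron whose weights match the original bar exactly.
weights = flatten(bar)

response_original = dot(weights, flatten(bar))     # 4
response_shifted = dot(weights, flatten(shifted))  # 0
```

The shape is identical in both images, yet the two flattened vectors share no active position, so the neuron's response drops from 4 to 0.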

After doing some research on other classification methods and image characterization techniques, I decided to implement a simple but efficient variant of MLPs: Convolutional Neural Networks (CNNs), introduced by Yann LeCun and Yoshua Bengio in this paper.
The image is divided into several receptive fields, for instance overlapping regions of 5x5 pixels. The first hidden layer is obtained by applying several convolutions to the input image. The values of each convolution kernel are learned during the training phase of the neural network. Applying one convolution to the input image produces one feature map in the subsequent layer. Generally, 4 or 6 feature maps are used in the first hidden layer.
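The convolution step can be sketched as follows in plain Python. This is only an illustration: `convolve_valid` is a hypothetical name, the 8x8 image and the all-ones kernel are made-up values, and a trained network would use learned kernel values, usually followed by a bias and a non-linearity. As is common in CNN code, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def convolve_valid(image, kernel):
    """Slide a square kernel over every position where it fits entirely
    inside the image and return the resulting feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum over one k x k receptive field.
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Toy 8x8 image: left half dark (0), right half bright (1).
image = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
# Toy 5x5 kernel of ones (an assumption; real kernels are learned).
kernel = [[1] * 5 for _ in range(5)]

fmap = convolve_valid(image, kernel)
# The map is 4x4 because 8 - 5 + 1 = 4; each row is [5, 10, 15, 20].
```

Neighboring receptive fields overlap by four columns here, which is why the values grow gradually from left to right as the window moves onto the bright half.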
The second hidden layer is a subsampling layer: each of its values is a local average computed over a small neighborhood of the previous feature map. Using this technique, the classifier achieves some degree of translation invariance.
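A minimal sketch of such a subsampling step, assuming non-overlapping 2x2 blocks (the block size and the name `subsample` are assumptions, not details taken from the project):

```python
def subsample(feature_map, size=2):
    """Average each non-overlapping size x size block of the feature map."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            row.append(sum(block) / (size * size))
        out.append(row)
    return out

fmap = [
    [0, 4, 0, 0],
    [0, 4, 0, 0],
    [0, 0, 0, 8],
    [0, 0, 0, 8],
]
# Same activations, but the top-left feature has moved one pixel to the left.
moved = [
    [4, 0, 0, 0],
    [4, 0, 0, 0],
    [0, 0, 0, 8],
    [0, 0, 0, 8],
]

pooled = subsample(fmap)         # [[2.0, 0.0], [0.0, 4.0]]
pooled_moved = subsample(moved)  # identical to pooled
```

The one-pixel shift stays inside the same 2x2 block, so both inputs give the same subsampled map. A shift that crosses a block boundary would still change the output, which is why the invariance is only partial.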

This convolution/subsampling alternation is used twice before the output layer.
We therefore have this order: input-convolution-subsampling-convolution-subsampling-output.
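To see how this ordering shrinks the image, the sketch below traces the side length of the feature maps through the two convolution/subsampling pairs. The 5x5 kernels come from the example above; the 2x2 subsampling and the input sizes are assumptions, and `trace_sizes` is a hypothetical helper.

```python
def trace_sizes(input_size, kernel=5, pool=2):
    """Return the side length after each stage of
    input -> convolution -> subsampling -> convolution -> subsampling."""
    sizes = [input_size]
    size = input_size
    for _ in range(2):
        size = size - kernel + 1  # convolution without padding
        sizes.append(size)
        size = size // pool       # non-overlapping subsampling
        sizes.append(size)
    return sizes

sizes_28 = trace_sizes(28)  # [28, 24, 12, 8, 4]
sizes_32 = trace_sizes(32)  # [32, 28, 14, 10, 5]
```

With a 28x28 input, the size of an MNIST digit, the last subsampling layer holds 4x4 maps. Padding the input to 32x32 yields 5x5 maps instead.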

Using Convolutional Neural Networks, we achieved good classification results on the MNIST database. After preprocessing the input data (for example, by applying distortions), we achieved very good results: around 97% on the test set.

The final report for the project can be found here (in French).
A good explanation of CNNs can be found here.
