Author Archives: Félix - Page 2

TBB - Tasks: spawning, dependencies and recycling

The project

Last year we had to implement a spell-checker as a Text Mining project. The challenge was to be as fast as possible, and if possible faster than our teacher !
We implemented a trie and the Damerau-Levenshtein distance for fast lookup of similar words.

But it was still not fast enough, we decided it was a good opportunity to use Intel Threading Building Blocks (TBB) for parallelization. Our program had two modes: an interactive mode and a batch mode.

In interactive mode, the user writes a query composed of a word w and a maximum distance d. The programs then outputs all the words with a Damerau-Levenshtein less than d compared to w.

In batch mode, we have a long list of queries in a file. This is the mode used by our teacher for benchmarking our programs.

Parallelization

We can obviously store all queries in a vector and then apply a simple parallelization algorithm such as tbb::parallel_for. But we would need to store the results of each query and print them after all queries were processed (indeed we can process queries in parallel but we still need to print them in the correct order).

The program would be faster but would use a lot more RAM than the serial version: this is not acceptable. We would like to be able to use all our cores while still printing and flushing the results if possible.

In order to solve this problem I had to manipulate directly tbb::tasks, my solution is maybe not the most elegant, but it was a good opportunity for me to dig into the task system of TBB.

How it works

We need to decompose our problem into small tasks and express their dependencies.
To each query we associate two tasks: a Print task, and an Execute task.
The dependency between tasks is simple: the Print task can only be executed after the corresponding Execute task AND after the previous Print task.
This is pretty much all, now we need to express these dependencies with the task system of TBB.

Implementation

Dependencies in TBB can handled in two ways: by spawning child tasks or by manipulating the reference counter.
When a task A increments the reference counter of another task B, or if A is a child task of B, it means that A must be executed before B. Once A has finished executing, it must decrease the reference counter of B, if the reference counter of a task reaches 0, the task is ready to be executed.

With TBB we declare a task by declaring a class deriving from tbb::task, this class must implement an execute method. This method is called when the task is fired by a thread.
The return value of this method is of type tbb::task*, if the return value is different from 0, it means we specify to the scheduler what task should be executed next (this is some kind of recycling, but be aware that recycling has a different meaning within the TBB library, for example recycle_as_continuation).

The implementation is now quite straightforward: specify dependencies by spawning child tasks and incrementing reference counters.

In order to simplify the code, the dependencies to the rest of the project were removed. Therefore, instead of executing a query, we sleep a random amount of time.

The Print Class

/*! \class Print                                                                                                                                                                                                 
** \brief A tbb::task printing the results of an execution.                                                                                                                                                      
**                                                                                                                                                                                                               
** A Print task references the Print task corresponding to the next query.                                                                                                                                       
*/                                                                                                                                                                                                               
class Print: public tbb::task                                                                                                                                                                                    
{                                                                                                                                                                                                                
public:                                                                                                                                                                                                          
  Print(unsigned idx)                                                                                                                                                                                            
    : next_(0),                                                                                                                                                                                                  
      idx_(idx)                                                                                                                                                                                                  
    {                                                                                                                                                                                                            
      this->increment_ref_count();                                                                                                                                                                               
    }                                                                                                                                                                                                            
 
/*!                                                                                                                                                                                                              
** \brief Set the reference to the next Print task and increment its                                                                                                                                             
** ref count.                                                                                                                                                                                                    
*/                                                                                                                                                                                                               
  void set_next(Print* next)                                                                                                                                                                                     
    {                                                                                                                                                                                                            
      next_ = next;                                                                                                                                                                                              
      next_->increment_ref_count();                                                                                                                                                                              
    }                                                                                                                                                                                                            
 
/*!                                                                                                                                                                                                              
** \brief Print the results to a query.                                                                                                                                                                          
**                                                                                                                                                                                                               
** \return The Print task corresponding to the next query if it can be                                                                                                                                           
** executed, 0 otherwise.                                                                                                                                                                                        
**                                                                                                                                                                                                               
** After executing, a Print task decrements the reference count of the                                                                                                                                           
** next Print task. If this reference count is 0, this query was                                                                                                                                                 
** processed and therefore the result can be printed too.                                                                                                                                                        
*/
  tbb::task* execute()                                                                                                                                                                                           
    {                                                                                                                                                                                                            
      std::cout << "print task " << idx_ << std::endl;                                                                                                                                                           
 
      if (next_ && next_->decrement_ref_count() == 0)                                                                                                                                                            
        return next_; // Spawn next print.                                                                                                                                                                       
      return 0;                                                                                                                                                                                                  
    }                                                                                                                                                                                                            
private:                                                                                                                                                                                                         
  Print* next_; /*!< The Print task corresponding to the next query */                                                                                                                                           
  unsigned idx_;                                                                                                                                                                                                 
};

The Execute Class

/*! \class Execute                                                                                                                                                                                               
** \brief A tbb::task processing a query.                                                                                                                                                                        
*/                                                                                                                                                                                                               
class Execute: public tbb::task                                                                                                                                                                                  
{                                                                                                                                                                                                                
public:                                                                                                                                                                                                          
/*!                                                                                                                                                                                                              
** \brief Constructor                                                                                                                                                                                            
**                                                                                                                                                                                                               
*/                                                                                                                                                                                                               
  Execute(unsigned idx)                                                                                                                                                                                          
    : idx_(idx)                                                                                                                                                                                                  
    {                                                                                                                                                                                                            
    }                                                                                                                                                                                                            
 
/*!                                                                                                                                                                                                              
** \brief Process the current query.                                                                                                                                                                             
**                                                                                                                                                                                                               
** \return 0                                                                                                                                                                                       
**                                                                                                                                                                                                                                                                                                                                                                                                                       
*/                                                                                                                                                                                                               
  tbb::task* execute()                                                                                                                                                                                           
    {                                                                                                                                                                                                            
      std::cout << "execute task " << idx_ << std::endl;                                                                                                                                                         
      usleep(rand() % 500000);                                                                                                                                                                                 
      return 0;                                                                                                                                                                                                  
    }                                                                                                                                                                                                            
private:                                                                                                                                                                                                         
  unsigned idx_;                                                                                                                                                                                                 
};

The RangeSpawner Class

We need a last class in order to create all the tasks and spawn them. The RangeSpawner task emulates tbb::parallel_do.

/*! \class RangeSpawner                                                                                                                                                                                          
** \brief A tbb::task emulating tbb::parallel_do.                                                                                                                                                                
**                                                                                                                                                                                                               
** Spawn an Execute and a Print task for each query and properly set the                                                                                                                                         
** dependencies.                                                                                                                                                                                                 
*/                                                                                                                                                                                                               
class RangeSpawner: public tbb::task                                                                                                                                                                             
{                                                                                                                                                                                                                
public:                                                                                                                                                                                                          
/*!                                                                                                                                                                                                              
** \brief Constructor                                                                                                                                                                                            
**                                                                                                                                                                                                               
*/                                                                                                                                                                                                               
  RangeSpawner(std::vector<int>& queries)                                                                                                                                                                        
    : queries_(queries)                                                                                                                                                                                          
    {                                                                                                                                                                                                            
    }                                                                                                                                                                                                            
 
/*!                                                                                                                                                                                                              
** \brief Spawn two tasks for each query.                                                                                                                                                                        
**                                                                                                                                                                                                               
** Emulates tbb::parallel_do.                                                                                                                                                                                    
**                                                                                                                                                                                                               
*/                                                                                                                                                                                                               
  tbb::task* execute()                                                                                                                                                                                           
    {                                                                                                                                                                                                            
      unsigned size = queries_.size();                                                                                                                                                                           
      // 1 child for the wait and 1 child for each query (Print)                                                                                                                                                                                                
      this->set_ref_count(size + 1);                                                                                                                                                                             
 
      // Allocate print tasks                                                                                                                                                                                    
      std::vector<Print*> print_tasks;                                                                                                                                                                           
      print_tasks.reserve(size);                                                                                                                                                                                 
      for (unsigned i = 0; i < size; ++i)                                                                                                                                                                        
      {                                                                                                                                                                                                          
        Print* p = new(this->allocate_child()) Print(i);                                                                                                                                                         
        print_tasks.push_back(p);                                                                                                                                                                                
      }                                                                                                                                                                                                                                    
      // Set dependencies between print tasks: query i + 1 must be                                                                                                                                               
      // printed after query i.                                                                                                                                                                                  
      for (unsigned i = 0; i < size - 1; ++i)                                                                                                                                                                    
        print_tasks[i]->set_next(print_tasks[i + 1]);                                                                                                                                                            
 
      // Spawn a new Execution task for each query.                                                                                                                                                              
      for (unsigned i = 0; i < size; ++i)                                                                                                                                                                        
      {                                                                                                                                                                                                          
        Execute* e = new(print_tasks[i]->allocate_child()) Execute(i);                                                                                                                                           
        spawn(*e);                                                                                                                                                                                               
      }                                                                                                                                                                                                          
      // Wait the end of all executions.                                                                                                                                                                         
      this->wait_for_all();                                                                                                                                                                                      
      return 0;                                                                                                                                                                                                  
    }                                                                                                                                                                                                            
private:                                                                                                                                                                                                         
  std::vector<int>& queries_;/*!< The list of all queries */                                                                                                                                                     
};

Here the list of queries is just a list of integers (the ID of each request).
Note that tasks are allocated using placement new, therefore we don't need to delete them.
Another interesting point is that we must also set the reference counter of the RangeSpawner task, otherwise the method would exit before the execution of all queries.

Main function

The main function generates 500 queries and execute them in parallel while printing when possible.

int main()                                                                                                                                                                                                       
{                                                                                                                                                                                                                
  std::vector<int> queries;                                                                                                                                                                                      
  unsigned count = 500;                                                                                                                                                                                          
 
  for (unsigned i = 0; i < count; ++i)                                                                                                                                                                           
    queries.push_back(i);                                                                                                                                                                                        
 
  RangeSpawner* root = new(tbb::task::allocate_root()) RangeSpawner(queries);                                                                                                                                    
  tbb::task::spawn_root_and_wait(*root);                                                                                                                                                                         
}

If everything is working well, print task i should always be preceded by execute task i.

Criticism

The main problem with this implementation is that it is highly optimistic, we hope at least one thread will start at the beginning of our vector of queries, if this is not the case, we end up storing a lot of results, waiting for previous queries to be executed and printed.
Another possibility would be to set a dependency between Execute tasks too. Task i can only be executed if, for example, task i - 32 has been executed. But it could affect scalability.

Project - Barcode segmentation and decoding

In the 3 previous articles I explained how to develop a really simple barcode decoding algorithm. This program was really naive and only worked with really clean barcodes.
As an assignment for a school project, a more advanced version was developed. The program can now detect and decode EAN13 barcodes in large images with various objects and different backgrounds.

Demonstration

Here are the output images of the program. Red rectangles indicates that the segmentation phase detected a potential barcode but the decoding phase rejected it. Green rectangles indicates that a barcode was found and correctly decoded (using the check digit equation, we can detect most decoding errors).

Click on the images for a better view of the output.

As you can see, there were some vicious images :), especially the one with the wooden floor that totally messed up our segmentation, fortunately, the decoding process rejected all those potential barcodes.

The barcodes themselves are sometimes really noisy. If you zoom on some of them you will notice there are large changes in illumination, a low resolution, low contrasts... These problems required an approach much more robust than the one described in the previous articles.

OpenCV - UPC Barcode Reader Part 3

After explaining how the UPC system works in Part 1, I have shown how to decode a digit from an image in Part 2 of this serie. In this article I will now demonstrate how to read the entire barcode.

Helper functions

We will need other helper functions for this last task.

We need a function to skip the quiet zone, skip_value advances the scan pointer to the guard pattern

void skip_quiet_zone(const Mat& img, cv::Point& cur)
{
  while (img(cur) == SPACE)
    ++cur.x;
}

We need a function to read the left guard and return the smallest width.

unsigned read_lguard(const Mat& img, cv::Point& cur)
{
  int widths[3] = { 0, 0, 0 };
  int pattern[3] = { BAR, SPACE, BAR };
  for (int i = 0; i < 3; ++i)
    while (img(cur) == pattern[i])
    {
      ++cur.x;
      ++widths[i];
    }
  return widths[0];
}

Here I only return the first width but the best would be to compute the mean. Even better would be to scan the right guard and compute the mean/median of all values. This would be much better for noisy images.

We also need a function to skip the middle guard, note that we don't care about the width right now.

void skip_mguard(const Mat& img, cv::Point& cur)
{
  int pattern[5] = { SPACE, BAR, SPACE, BAR, SPACE };
  for (int i = 0; i < 5; ++i)
    while (img(cur) == pattern[i])
      ++cur.x;
}

Reading the barcode

We have everything we need, the rest is simple: we load the image, we invert it, threshold it, we compute the smallest width with the left guard, we read the 6 left digits, the middle guard, the 6 right digits and that's all ! At the end we print the decoded sequence.

void read_barcode(const std::string& filename)
{
  Mat img = cv::imread(filename, 0);
  cv::Size size = img.size();
  cv::Point cur(0, size.height / 2);
 
  cv::bitwise_not(img, img);
  cv::threshold(img, img, 128, 255, cv::THRESH_BINARY);
  skip_quiet_zone(img, cur);
 
  pattern_map table;
  setup_map(table); // presented in part 2.
 
  int unit_width = read_lguard(img, cur);
  display(img, cur);
 
  std::vector<int> digits;
  for (int i = 0; i < 6; ++i)
  {
    int d = read_digit(img, cur, unit_width, table, LEFT);
    digits.push_back(d);
  }
 
  skip_mguard(img, cur);
 
  for (int i = 0; i < 6; ++i)
  {
    int d = read_digit(img, cur, unit_width, table, RIGHT);
    digits.push_back(d);
  }
 
  for (int i = 0; i < 12; ++i)
    std::cout << digits[i];
  std::cout << std::endl;
  cv::waitKey();
}

Going further

This reader is simple and far from perfect. There are numerous ways to improve it:
- Use several scan lines. For each bit, compute the median of the intensity found by each scan line.
- We assume we always have a match in our decoding table, we should rather compute the distance to each binary sequence and we should choose the digit with the lowest distance. If we have no digit with a low distance, we can try several digits and see which one satisfies the check digit equation.
- ...
But without doubts, the toughest problem with barcode reading will be concerning preprocessing. The input can be noisy, blurry, skewed, scratched...

OpenCV - UPC Barcode Reader Part 2

In the previous article I explained the basics about the UPC barcode system. In this second part I will show how to read an encoded digit.

UPC Reader

Principle

How are going to decode our image ? That's pretty simple, we perform an horizontal scan of our image.
First of all we read the guard in order to find the smallest width and after we read the 6 left digits, the middle guard again and the 6 right digits. To read a digit we read 7 bits and lookup the corresponding digit in a lookup table. Simple !

Digits encoding

First of all we need to store the left and right encoding of each digit, for this purpose we need a std::map (just one because right-encoding is the bitwise complement of left-encoding). The map associates a binary encoding to the value of the digit.
In C++ you can't directly write a binary value (but you can write an hexadecimal value), so I could have just converted the binary sequence in decimal, but for readability I prefer to express the value directly in binary. After a bit of searching I stumbled upon this handy macro:

#define Ob(x)  ((unsigned)Ob_(0 ## x ## uL))
#define Ob_(x) (x & 1 | x >> 2 & 2 | x >> 4 & 4 | x >> 6 & 8 |		\
	x >> 8 & 16 | x >> 10 & 32 | x >> 12 & 64 | x >> 14 & 128)

To convert the binary sequence 0001101 just write Ob(0001101) !
Here is the code to setup the table:

typedef std::map<unsigned, char> pattern_map;
 
void setup_map(pattern_map& table)
{
  table.insert(std::make_pair(Ob(0001101), 0));
  table.insert(std::make_pair(Ob(0011001), 1));
  table.insert(std::make_pair(Ob(0010011), 2));
  table.insert(std::make_pair(Ob(0111101), 3));
  table.insert(std::make_pair(Ob(0100011), 4));
  table.insert(std::make_pair(Ob(0110001), 5));
  table.insert(std::make_pair(Ob(0101111), 6));
  table.insert(std::make_pair(Ob(0111011), 7));
  table.insert(std::make_pair(Ob(0110111), 8));
  table.insert(std::make_pair(Ob(0001011), 9));
}

Now we need an...

Helper function

We will need a small helper function in order to assist us while reading a digit.
First of all we will use the following typedef, because we will only be using grayscale/B&W images:

typedef cv::Mat_<uchar> Mat;

And the two following define (remember we always invert the colors so bars are white !)

#define SPACE 0
#define BAR 255

In all following functions, the cur variable is the position of our horizontal scan pointer.

The following function is really helpful in order to prevent error from accumulating after reading a bit.

For example left-digits always end with a '1' bit and we always begin with a '0' bit, therefore if after reading a digit we are still on a bar, we need to advance our scan pointer until we reach the beginning of the next digit. Also, if the previous bit is '0', we have gone too far and we need to decrease our scan pointer in order to be perfectly on the boundary between the two digits.
The situation is reversed for right-digits.

void align_boundary(const Mat& img, cv::Point& cur, int begin, int end)
{
  if (img(cur) == end)
  {
    while (img(cur) == end)
      ++cur.x;
  }
  else
  {
    while (img(cur.y, cur.x - 1) == begin)
      --cur.x;
  }
}

Reading a digit

We now have almost everything we need in order to read a digit, we just need an enum indicating if we are reading a left or a right-encoded digit.

enum position
{
  LEFT,
  RIGHT
};
 
int read_digit(const Mat& img, cv::Point& cur, int unit_width,
	       pattern_map& table, int position)
{
  // Read the 7 consecutive bits.
  int pattern[7] = {0, 0, 0, 0, 0, 0, 0};
  for (int i = 0; i < 7; ++i)
  {
    for (int j = 0; j < unit_width; ++j)
    {
      if (img(cur) == 255)
        ++pattern[i];
      ++cur.x;
    }
    // See below for explanation.
    if (pattern[i] == 1 && img(cur) == BAR
	|| pattern[i] == unit_width - 1 && img(cur) == SPACE)
      --cur.x;
  }
 
  // Convert to binary, consider that a bit is set if the number of
  // bars encountered is greater than a threshold.
  int threshold = unit_width / 2;
  unsigned v = 0;
  for (int i = 0; i < 7; ++i)
    v = (v << 1) + (pattern[i] >= threshold);
 
  // Lookup digit value.
  char digit;
  if (position == LEFT)
  {
    digit = table[v];
    align_boundary(img, cur, SPACE, BAR);
  }
  else
  {
    // Bitwise complement (only on the first 7 bits).
    digit = table[~v & Ob(1111111)];
    align_boundary(img, cur, BAR, SPACE);
  }
 
  display(img, cur);
  return digit;
}

After reading a bit we are using a little trick in order to increase accuracy:
- if we have read only 1 bar and it was the last pixel, we are probably reading the following bit, so move backward by one pixel.
- if we have read only bars except the last pixel, we are probably reading the following bit, so move backward by one pixel.
Warning: I assume here that the smallest width is not 1. This trick should scale with the value of the smallest width.

And now read part 3 for complete decoding !

OpenCV - UPC Barcode Reader Part 1

As a school project we are assigned to find and decode barcodes in an image. For this project I will be teaming with Julien Marquegnies. This is the first article of a (perhaps) long serie.
As a first draft, this morning I quickly developed a program to decode UPC barcodes. I used OpenCV for loading and displaying but I don't use any advanced function from the library so the code can be quickly adapted to other use cases.

Universal Product Code

UPC is a standardized type of barcode used for scanning goods in stores. Here is an example of a UPC barcode:


First of all, we need to explain how UPC works.

Quiet Zone

To the left and to the right of the barcode we have a large area of white called the quiet zone. This zone is required to be at least 10 times the width of the smallest bar and is here to facilitate barcode detection, it has no interest for decoding.

Guard patterns

We can notice that the same pattern at the left, at the middle and at the right of our barcode:


This patterns is called the guard. The three occurences of this pattern are called (from left to right) S, M, and E (Start, Middle and End). S and E are encoded with 3 bits and the pattern is bar-space-bar. M is encoded with 5 bits with the pattern space-bar-space-bar-space.

Left and right digits

Between S and M we have the left digits and between M and E we have the right digits. Each digit is encoded on 7 bits, giving a total of 84 bits encoding digits, and 11 bits encoding guards.

Left and right digits are encoded differently: the right encoding scheme is the bitwise complement of the left encoding scheme.

The concept of bits for a visual pattern might be unclear, so let me explain: by reading the guard we have determined the width of the thinnest bar, in order to read a digit we read approximatively 7 * width pixels, at each time, if we encountered a bar the current bit will be 1 and 0 otherwise. The encoding of each digit is presented in the following table:

Digit L-code R-code
0 0001101 1110010
1 0011001 1100110
2 0010011 1101100
3 0111101 1000010
4 0100011 1011100
5 0110001 1001110
6 0101111 1010000
7 0111011 1000100
8 0110111 1001000
9 0001011 1110100

Let's zoom on the beginning of our example code, at the left we have our guard pattern and just after we have the first encoded digit.


After reading the guard, we have the width of a single bit, we can now read the next 7 bits and get their value.
We obtain here space-space-space-bar-bar-space-bar, that is to say we have 0001101, and according to our table it is the value 0.

Here are two more barcode images I used:



I tried with other images as well but you can generate your own barcodes online if you want.

Now read part 2 for the first part of the implementation.

SSE - Saturation Arithmetic

In a previous article I presented the SSE instruction set and how to use it in C/C++ with simple examples.
In this article I will introduce other instructions that have a really nice property: they allow you to use saturation arithmetic.

Saturation Arithmetic

With saturation arithmetic the result of an operation is bounded in a range between a minimum and a maximum value. For example, with saturation arithmetic in the range [0, 255], we have: 200 + 70 = 255 and 20 - 70 = 0.

This property would be great in C/C++ because when performing arithmetic the regular way, we use modular arithmetic, for example with an unsigned char we have : 200 + 70 = 14 and 20 - 70 = 206, this phenomenon is called overflow and at CPU level, it can be detected using the carry/overflow flags.

Saturation with SSE

As mentioned in the previous article, SSE was initially designed for signal processing and graphics processing. For example imagine we want to add/subtract two grayscale images together, we would be losing a lot of time if we had to clip the result by hand after the operation.
Fortunately, SSE provides special instructions for saturation arithmetic, with a single assembly instruction you can add several values at the same time and clip all the results.

Example

In the last article we used the _mm_add_epi8 function in order to add 16 char at the same time. In order to perform the same operation but with saturation, we simply use _mm_adds_epi8 (notice the additional 's').

In the following example we add two unsigned char values and print the result (but remember you can add 16 values at the same time).

  unsigned char a[16] __attribute__ ((aligned (16))) = { 200 };                                                                                                                                                  
  unsigned char b[16] __attribute__ ((aligned (16))) = { 70 };                                                                                                                                                   
  __m128i l = _mm_load_si128((__m128i*)a);                                                                                                                                                                                      
  __m128i r = _mm_load_si128((__m128i*)b);  
 
  _mm_store_si128((__m128i*)a, _mm_adds_epu8(l, r));
  std::cout << (int)a[0] << std::endl;

This small program should print 255, instead of 14 with modular arithmetic.

We can also subtract unsigned bytes using _mm_subs_epi8:

  unsigned char a[16] __attribute__ ((aligned (16))) = { 20 };                                                                                                                                                  
  unsigned char b[16] __attribute__ ((aligned (16))) = { 70 };                                                                                                                                                   
  __m128i l = _mm_load_si128((__m128i*)a);                                                                                                                                                                                      
  __m128i r = _mm_load_si128((__m128i*)b);  
 
  _mm_store_si128((__m128i*)a, _mm_subs_epu8(l, r));
  std::cout << (int)a[0] << std::endl;

This program should output 0 (206 with modular arithmetic).

Limitations

With SSE, saturation arithmetic is limited to signed/unsigned 8 and 16 bytes operands: epi8, epu8, epi16, epu16. The available arithmetic operations are adds and subs.