In a previous article I presented the SSE instruction set and how to use it in C/C++ with simple examples.

In this article I will introduce other instructions that have a really nice property: they allow you to use saturation arithmetic.

### Saturation Arithmetic

With saturation arithmetic the result of an operation is bounded in a range between a minimum and a maximum value. For example, with saturation arithmetic in the range [0, 255], we have: 200 + 70 = 255 and 20 - 70 = 0.
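To make the definition concrete, here is a minimal scalar sketch of saturating addition and subtraction in the range [0, 255]. The function names `saturating_add_u8` and `saturating_sub_u8` are hypothetical helpers chosen for this illustration; they simply compute in a wider type and clamp.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: saturating add in [0, 255].
// The sum is computed as an int (no overflow possible), then clamped.
inline std::uint8_t saturating_add_u8(std::uint8_t x, std::uint8_t y)
{
    int sum = int(x) + int(y);
    return static_cast<std::uint8_t>(std::min(sum, 255));
}

// Hypothetical helper: saturating subtract in [0, 255].
// A negative difference is clamped to 0.
inline std::uint8_t saturating_sub_u8(std::uint8_t x, std::uint8_t y)
{
    int diff = int(x) - int(y);
    return static_cast<std::uint8_t>(std::max(diff, 0));
}
```

With these helpers, `saturating_add_u8(200, 70)` gives 255 and `saturating_sub_u8(20, 70)` gives 0, matching the values above.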

This property would be great in C/C++, because regular integer arithmetic there is modular. For example, when the result is stored in an **unsigned char** we have 200 + 70 = 14 and 20 - 70 = 206. This phenomenon is called **overflow** (or wrap-around), and at the CPU level it can be detected using the carry/overflow flags.
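The wrap-around can be reproduced with a short sketch. The names `wrapping_add_u8` and `wrapping_sub_u8` are hypothetical and only serve this illustration; the operands are promoted to `int`, and the result is reduced modulo 256 when converted back to an 8-bit unsigned type.

```cpp
#include <cstdint>

// Hypothetical helper: regular (modular) addition on 8-bit unsigned values.
// x + y is computed as an int; the cast keeps only the low 8 bits.
inline std::uint8_t wrapping_add_u8(std::uint8_t x, std::uint8_t y)
{
    return static_cast<std::uint8_t>(x + y);
}

// Hypothetical helper: regular (modular) subtraction.
// A negative int converts to the value modulo 256.
inline std::uint8_t wrapping_sub_u8(std::uint8_t x, std::uint8_t y)
{
    return static_cast<std::uint8_t>(x - y);
}
```

Here `wrapping_add_u8(200, 70)` gives 14 (270 modulo 256) and `wrapping_sub_u8(20, 70)` gives 206 (-50 modulo 256).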

### Saturation with SSE

As mentioned in the previous article, SSE was initially designed for signal processing and graphics processing. For example, imagine we want to add or subtract two grayscale images: we would lose a lot of time if we had to clip each result by hand after the operation.

Fortunately, SSE provides special instructions for saturation arithmetic: with a single assembly instruction you can add several values at the same time and clip all the results.
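As a rough sketch of the grayscale scenario, the function below adds two 8-bit pixel buffers with saturation, 16 pixels per iteration. The name `add_images_saturated` and the flat buffer layout are assumptions made for this example. It uses the unaligned load/store intrinsics so that the buffers do not need 16-byte alignment, and it falls back to a scalar loop with manual clipping for the last few pixels.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 integer intrinsics

// Hypothetical helper: dst[i] = min(a[i] + b[i], 255) for n pixels.
inline void add_images_saturated(const std::uint8_t* a,
                                 const std::uint8_t* b,
                                 std::uint8_t* dst,
                                 std::size_t n)
{
    std::size_t i = 0;

    // Vector part: 16 unsigned bytes are added and clipped per instruction.
    for (; i + 16 <= n; i += 16)
    {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm_adds_epu8(va, vb));
    }

    // Scalar tail: the remaining (n % 16) pixels are clipped by hand.
    for (; i < n; ++i)
    {
        int sum = a[i] + b[i];
        dst[i] = static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
    }
}
```

The scalar tail is exactly the "clip by hand" work that the vector loop avoids for the bulk of the image.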

### Example

In the last article we used the **_mm_add_epi8** function in order to add 16 **char** at the same time. In order to perform the same operation but with saturation, we simply use **_mm_adds_epi8** for signed bytes or **_mm_adds_epu8** for unsigned bytes (notice the additional 's').

In the following example we add two **unsigned char** values and print the result (but remember you can add 16 values at the same time). Only the first element of each array is set explicitly; the remaining 15 are zero-initialized.

    unsigned char a[16] __attribute__ ((aligned (16))) = { 200 };
    unsigned char b[16] __attribute__ ((aligned (16))) = { 70 };
    __m128i l = _mm_load_si128((__m128i*)a);
    __m128i r = _mm_load_si128((__m128i*)b);
    _mm_store_si128((__m128i*)a, _mm_adds_epu8(l, r));
    std::cout << (int)a[0] << std::endl;

This small program should print **255**, instead of **14** with modular arithmetic.

We can also subtract unsigned bytes using **_mm_subs_epu8**:

    unsigned char a[16] __attribute__ ((aligned (16))) = { 20 };
    unsigned char b[16] __attribute__ ((aligned (16))) = { 70 };
    __m128i l = _mm_load_si128((__m128i*)a);
    __m128i r = _mm_load_si128((__m128i*)b);
    _mm_store_si128((__m128i*)a, _mm_subs_epu8(l, r));
    std::cout << (int)a[0] << std::endl;

This program should output **0** (**206** with modular arithmetic).

### Limitations

With SSE, saturation arithmetic is limited to signed and unsigned 8-bit and 16-bit operands: **epi8**, **epu8**, **epi16**, **epu16**. The available arithmetic operations are **adds** and **subs**.
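The 16-bit variants work the same way on 8 lanes instead of 16. The sketch below wraps **_mm_adds_epi16** (signed, clamped to [-32768, 32767]) and **_mm_subs_epu16** (unsigned, clamped to [0, 65535]); the wrapper names `adds_i16x8` and `subs_u16x8` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2 integer intrinsics

// Hypothetical wrapper: out[i] = clamp(x[i] + y[i], -32768, 32767) for 8 lanes.
inline void adds_i16x8(const std::int16_t* x, const std::int16_t* y,
                       std::int16_t* out)
{
    __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x));
    __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_adds_epi16(vx, vy));
}

// Hypothetical wrapper: out[i] = max(x[i] - y[i], 0) for 8 unsigned lanes.
inline void subs_u16x8(const std::uint16_t* x, const std::uint16_t* y,
                       std::uint16_t* out)
{
    __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x));
    __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_subs_epu16(vx, vy));
}
```

For instance, a signed lane holding 30000 + 10000 saturates to 32767, a lane holding -30000 + (-10000) saturates to -32768, and an unsigned lane holding 10 - 20 saturates to 0.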
