Long before GPUs became the focus of attention for parallel computing, most CPUs acquired a set of low-level instructions that allow multiple items of data to be operated on by a single instruction. SIMD, or Single Instruction Multiple Data, is one of the simplest parallel processing mechanisms. Essentially, it comes down to packing multiple values into a single register and then performing the operation as if the register held a single value. There is some overhead, because you have to pack and unpack the data, but in most cases it can be minimized.
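To make the packing idea concrete, here is a minimal sketch that imitates it in software: four 8-bit values are packed into one 32-bit integer, and two such packed words are added with a handful of ordinary integer operations. This is only an illustration of the principle, not how real SIMD instructions are invoked; actual hardware uses dedicated wide registers and keeps the lanes separate for you. The names `pack`, `unpack` and `packed_add` are hypothetical helpers invented for this example.

```python
# Software imitation of SIMD: four 8-bit lanes inside one 32-bit word.
# All names here are illustrative; real SIMD uses CPU registers and
# dedicated instructions rather than Python integers.

LANES = 4
LANE_BITS = 8
LOW7 = 0x7F7F7F7F   # low 7 bits of every lane
HIGH1 = 0x80808080  # top bit of every lane


def pack(values):
    """Pack four 8-bit values into one word, lane 0 in the low byte."""
    assert len(values) == LANES
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a packed word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def packed_add(a, b):
    """Add all four lanes at once, wrapping modulo 256 in each lane.

    The low 7 bits of each lane are added normally, so a carry can
    reach the lane's top bit but never spill into the next lane.
    The top bits are then combined separately with XOR.
    """
    low = (a & LOW7) + (b & LOW7)
    return low ^ ((a ^ b) & HIGH1)


a = pack([1, 2, 3, 250])
b = pack([10, 20, 30, 10])
result = unpack(packed_add(a, b))
print(result)  # [11, 22, 33, 4]  (250 + 10 wraps around to 4)
```

The `pack` and `unpack` calls are the overhead mentioned above: the saving comes from doing the work on all four lanes at once, so it pays off most when many operations are performed on the data while it stays packed.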