Professionally, pixel-by-pixel image processing is often done with low-level assembler code that directly accesses the memory where the pixels are stored. That greatly speeds up the work. You don't have to be an assembler guru; you only need to know how to write nested loops and how to access the stack, the memory and the registers. The Microsoft-specific _asm keyword (also spelled __asm) lets you insert assembler code directly into a C/C++ program.

You will need to write the same algorithm several times, once for each color depth you want to support. The versions differ because a 32-bit-per-pixel bitmap uses four bytes for each pixel, a 24-bit-per-pixel bitmap uses three bytes for each pixel, and so on.
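To make the bytes-per-pixel difference concrete, here is a rough sketch in plain C (not assembler) of a loop that inverts every pixel of a raw bitmap buffer. The names dib_stride and invert_pixels are made up for this illustration. It assumes the buffer uses the DIB layout: blue, green, red byte order and rows padded to a multiple of four bytes. Only 24 and 32 bits per pixel are handled here; other depths would need their own variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per scanline of a DIB: each row is padded to a multiple of 4 bytes. */
static size_t dib_stride(int width, int bits_per_pixel)
{
    return (((size_t)width * (size_t)bits_per_pixel + 31u) / 32u) * 4u;
}

/* Hypothetical helper: invert the color channels of every pixel in place.
   Supports 24 and 32 bits per pixel. Returns 0 on success, -1 otherwise. */
static int invert_pixels(uint8_t *bits, int width, int height, int bits_per_pixel)
{
    if (bits == NULL || width <= 0 || height <= 0)
        return -1;
    if (bits_per_pixel != 24 && bits_per_pixel != 32)
        return -1;

    size_t bytes_per_pixel = (size_t)bits_per_pixel / 8u;  /* 3 or 4 */
    size_t stride = dib_stride(width, bits_per_pixel);

    for (int y = 0; y < height; ++y) {
        uint8_t *p = bits + (size_t)y * stride;  /* first byte of row y */
        for (int x = 0; x < width; ++x) {
            p[0] = (uint8_t)(255 - p[0]);  /* blue  */
            p[1] = (uint8_t)(255 - p[1]);  /* green */
            p[2] = (uint8_t)(255 - p[2]);  /* red   */
            /* At 32 bpp the fourth byte (alpha or unused) is left untouched. */
            p += bytes_per_pixel;          /* step to the next pixel */
        }
    }
    return 0;
}
```

The only thing that changes between the two depths in this sketch is how far the pointer advances per pixel and how long a row is. An assembler version would follow the same two nested loops.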

Also, you'll need to create your bitmap with the CreateDIBSection function, because it gives you a pointer to the memory where the pixels are stored.