Each row's byte count must be DWORD-aligned, i.e. a multiple of 4 bytes.

Instead of padding the byte count:
#define TJPAD(p) (((p)+3)&(~3))
int paddedRow = TJPAD(bitmap.bmWidth*bitmap.bmBitsPixel/8);
pad the bit count, so that a row whose total bits is not a multiple of 8 is not truncated by the /8:
#define TJPAD(bits) ((((bits) + 31) & ~31) >> 3)
int...
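
To see where the two macros disagree, here is a small sketch. The names PAD_BYTES, PAD_BITS, row_bytes_old and row_bytes_new are made up for this illustration (they are not from the original code), and it assumes the new macro is fed the row size in bits, as its parameter name suggests.

```c
#include <assert.h>

/* Original form: rounds a BYTE count up to a multiple of 4. */
#define PAD_BYTES(p) (((p) + 3) & ~3)

/* Suggested form: rounds a BIT count up to a multiple of 32,
   then converts to bytes with >> 3. */
#define PAD_BITS(bits) ((((bits) + 31) & ~31) >> 3)

/* Hypothetical helper: row size using the original macro.
   The integer division by 8 drops any leftover bits before padding. */
static int row_bytes_old(int width_px, int bits_per_pixel)
{
    assert(width_px > 0 && bits_per_pixel > 0);
    return PAD_BYTES(width_px * bits_per_pixel / 8);
}

/* Hypothetical helper: row size using the suggested macro.
   Padding happens on the bit count, so no bits are lost. */
static int row_bytes_new(int width_px, int bits_per_pixel)
{
    assert(width_px > 0 && bits_per_pixel > 0);
    return PAD_BITS(width_px * bits_per_pixel);
}
```

For 24 bpp both give the same answer (a 3-pixel row is 12 bytes either way). For a 33-pixel row at 1 bpp the old form gives 4 bytes, which cannot hold 33 bits, while the new form gives 8.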