Two questions in this one:

Do some platforms' memcpy implementations copy in integer-sized units, or does memcpy always copy one char at a time? I have to do a large batch copy of data that I know is aligned on either 4- or 8-byte boundaries, and copying integer to integer should be faster than copying char to char.

Also, on 64-bit systems, is it faster to use the native type (a 64-bit integer) for parameters than 32-bit integers, or does it not matter?

The second question is related to the first: the data being copied is technically aligned on 8-byte boundaries, so each copy would be only two 64-bit integers, or four 32-bit integers, or 16 bytes.
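To make the comparison concrete, here is a rough sketch of the three variants I am weighing for the 16-byte case. The `block16` union and the function names are made up for illustration; the union is only there so the same 8-byte-aligned storage can be viewed as bytes, 32-bit words, or 64-bit words in plain C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte block. The uint64_t member gives the union
   the alignment of uint64_t (8 bytes on typical platforms). */
typedef union {
    unsigned char b[16];
    uint32_t      w32[4];
    uint64_t      w64[2];
} block16;

/* Variant 1: hand the 16 bytes to the library and let it decide. */
static void copy_memcpy(block16 *dst, const block16 *src) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Variant 2: four 32-bit copies. */
static void copy_u32(block16 *dst, const block16 *src) {
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 4; ++i) {
        dst->w32[i] = src->w32[i];
    }
}

/* Variant 3: two 64-bit copies. */
static void copy_u64(block16 *dst, const block16 *src) {
    assert(dst != NULL && src != NULL);
    dst->w64[0] = src->w64[0];
    dst->w64[1] = src->w64[1];
}
```

All three leave the same bytes in the destination; what I want to know is whether any of them is actually faster than the others.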

This is for copying matrices, by the way. If you know of a faster way to copy 4x4 matrices, I would love to hear it. I need to do batch copies from one system to another as fast as possible to free up semaphores for multiprocessing, so the faster this part is, the better.
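For reference, the batch copy looks roughly like the sketch below. The `mat4` type, the `float` element type, and the function names are placeholders for illustration, and it assumes the source and destination arrays are contiguous and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder matrix type; float elements are an assumption. */
typedef struct {
    float m[4][4];
} mat4;

/* One memcpy over the whole contiguous range of `count` matrices. */
static void mat4_copy_batch(mat4 *dst, const mat4 *src, size_t count) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, count * sizeof *dst);
}

/* Alternative: copy one matrix at a time with struct assignment,
   leaving the choice of move instructions to the compiler. */
static void mat4_copy_each(mat4 *dst, const mat4 *src, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

Either version would run while the semaphore is held, which is why the cost of this step matters so much.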

Thank you!