It seems strange to me that you are using an array of ints and then casting to __m128i*. Why aren't you using an array of __m128i?
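
To show what I mean, here is a rough sketch, not your actual code: sum_blocks and its parameters are names I made up for illustration. With an array of __m128i declared on the stack or statically, the compiler takes care of the 16-byte alignment, so there is no cast and no alignment check. (Heap memory would still need an aligned allocator such as _mm_malloc.)

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical example: sums every 32-bit lane of every block.
   Indexing a __m128i array reads each element with an aligned load. */
static int sum_blocks(const __m128i* blocks, int count)
{
    __m128i acc = _mm_setzero_si128();
    for (int i = 0; i < count; ++i)
        acc = _mm_add_epi32(acc, blocks[i]);

    /* Spill the four lanes to plain ints and add them up. */
    int lanes[4];
    _mm_storeu_si128((__m128i*)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

A caller would declare something like `__m128i data[2];`, fill it with _mm_set_epi32, and pass it straight in.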

But if you always need to load 4 ints and can't be guaranteed alignment, then I'd make an inline function...

Code:
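#include <emmintrin.h>  // SSE2: __m128i, _mm_load_si128, _mm_loadu_si128
#include <stdint.h>     // uintptr_t (pointer-sized, unlike DWORD on 64-bit)

static inline __m128i myLoadmm128(const __m128i* p)
{
     // Low 4 address bits set means not 16-byte aligned: use the unaligned load.
     if ((uintptr_t)p & 0xF)
          return _mm_loadu_si128(p);
     else
          return _mm_load_si128(p);
}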
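
If the data really lives in an int array, a variant that takes the int pointer directly keeps the cast in one place. This is only a sketch under that assumption: load4i and sum4 are hypothetical names, not anything from your code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>     /* uintptr_t */

/* Hypothetical helper: load 4 ints from any address, aligned or not. */
static inline __m128i load4i(const int* p)
{
    if ((uintptr_t)p & 0xF)
        return _mm_loadu_si128((const __m128i*)p);  /* not 16-byte aligned */
    return _mm_load_si128((const __m128i*)p);       /* 16-byte aligned */
}

/* Hypothetical usage: load src[offset..offset+3] and add the four values. */
static int sum4(const int* src, int offset)
{
    __m128i v = load4i(src + offset);
    int lanes[4];
    _mm_storeu_si128((__m128i*)lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

The result is the same whichever branch is taken, so an odd offset such as sum4(buf, 1) is safe even though src + 1 cannot be 16-byte aligned when buf is.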