It seems strange to me that you are using an array of ints and then casting to __m128i*. Why aren't you using an array of __m128i's?
But if you always need to load 4 ints and can't be guaranteed alignment, then I'd make an inline function...
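Code:
#include <emmintrin.h>
#include <stdint.h>
// Use the aligned load only when p is on a 16-byte boundary. uintptr_t keeps the cast safe on 64-bit.
inline __m128i myLoadmm128(const __m128i* p) { return ((uintptr_t)p & 0xF) ? _mm_loadu_si128(p) : _mm_load_si128(p); }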
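
To put the two options side by side, here is a rough sketch rather than a definitive implementation. The names isAligned16, sumAligned, load4 and lane0 are made up for illustration, and I'm assuming an x86 compiler with SSE2 enabled. The first option stores the data as __m128i so the aligned load is always safe; the second keeps plain ints and checks the address before picking a load.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_load_si128, _mm_loadu_si128
#include <cassert>      // assert
#include <cstdint>      // std::uintptr_t

// True when p sits on a 16-byte boundary, which _mm_load_si128 requires.
inline bool isAligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
}

// Option 1 (hypothetical helper): the data is stored as __m128i, so every
// element is 16-byte aligned and the aligned load can be used directly.
// Adds up n vectors lane by lane.
inline __m128i sumAligned(const __m128i* v, int n) {
    assert(isAligned16(v));  // debug-only sanity check
    __m128i acc = _mm_setzero_si128();
    for (int i = 0; i < n; ++i)
        acc = _mm_add_epi32(acc, _mm_load_si128(&v[i]));
    return acc;
}

// Option 2 (hypothetical helper): the data is plain ints at an arbitrary
// address, so choose the load that is safe for this particular pointer.
inline __m128i load4(const int* p) {
    const __m128i* q = reinterpret_cast<const __m128i*>(p);
    return isAligned16(p) ? _mm_load_si128(q) : _mm_loadu_si128(q);
}

// Returns the lowest 32-bit lane, handy for checking results.
inline int lane0(__m128i v) {
    return _mm_cvtsi128_si32(v);
}
```

Note that _mm_loadu_si128 is also valid on aligned addresses, so the branch in load4 is optional. Whether it is worth keeping depends on the target CPU, so measure before deciding.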



