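1. 8-bit addition (dereferencing the __m128i* casts directly assumes all three pointers are 16-byte aligned):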
*(__m128i*)(dest + i * 16) = _mm_add_epi8(*(__m128i*)(srcA + i * 16), *(__m128i*)(srcB + i * 16));
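The one-liner above processes a single 16-byte block. As a minimal sketch of how it might be wrapped into a full loop, the following assumes the buffers may be unaligned and the length may not be a multiple of 16. The function name add_bytes_sse2 and its parameter n are hypothetical and not from the original post; _mm_loadu_si128 and _mm_storeu_si128 are swapped in for the direct pointer dereference so that no alignment is required.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dest[i] = srcA[i] + srcB[i], wrapping modulo 256. */
static void add_bytes_sse2(uint8_t *dest, const uint8_t *srcA,
                           const uint8_t *srcB, size_t n)
{
    assert(dest != NULL && srcA != NULL && srcB != NULL);

    size_t i = 0;
    /* Main loop: 16 bytes per iteration, unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(srcA + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(srcB + i));
        _mm_storeu_si128((__m128i *)(dest + i), _mm_add_epi8(a, b));
    }
    /* Scalar tail for the remaining n % 16 bytes. */
    for (; i < n; ++i)
        dest[i] = (uint8_t)(srcA[i] + srcB[i]);
}
```

Note that _mm_add_epi8 wraps on overflow; if saturation is wanted instead, _mm_adds_epu8 (unsigned) or _mm_adds_epi8 (signed) would be the likely substitutes.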
2. Load 128 bits of data (unaligned load)
__m128i Src1 = _mm_loadu_si128((__m128i *)(LinePS + 0));
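To make the load above concrete, here is a small sketch that reads 16 bytes from an arbitrary (possibly unaligned) address and writes them back out. The helper name copy16_unaligned is hypothetical; it only illustrates that _mm_loadu_si128 has no alignment requirement, whereas _mm_load_si128 expects a 16-byte-aligned address.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: copy 16 bytes from src to dst through an XMM register. */
static void copy16_unaligned(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);

    /* Unaligned 128-bit load; src may point anywhere in a byte buffer. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    /* Unaligned 128-bit store of the same register. */
    _mm_storeu_si128((__m128i *)dst, v);
}
```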
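3. Pack sixteen 8-bit values into dst (the "r" variant takes them in memory order: the first argument becomes the lowest byte)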
__m128i _mm_setr_epi8 (char e15, char e14, char e13, char e12, char e11, char e10, char e9, char e8, char e7, char e6, char e5, char e4, char e3, char e2, char e1, char e0)
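The argument order of _mm_setr_epi8 is easy to get wrong, so a short sketch may help. The helper store_ramp is hypothetical; it builds the vector 0, 1, ..., 15 and stores it to memory, where the first argument ends up at the lowest address.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: fill out[0..15] with 0, 1, ..., 15. */
static void store_ramp(uint8_t out[16])
{
    assert(out != NULL);

    /* "setr" = set in reversed order: the first argument goes to byte 0. */
    __m128i v = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                              8, 9, 10, 11, 12, 13, 14, 15);
    _mm_storeu_si128((__m128i *)out, v);
}
```

With _mm_set_epi8 and the same argument list, the bytes would land in memory in the opposite order (15 at the lowest address).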
Original post: https://www.cnblogs.com/luoyinjie/p/9083704.html