Introduce a true memset kernel; it currently operates on 16 bytes per item