Converted _a3 kernels; use SIMD for CPU and GPU