amd64: tidy up memset to have rax set earlier for small sizes
amd64: finish the tail in memset with an overlapping store
amd64: align memset buffers to 16 bytes before using rep stos
amd64: convert libc bzero to a C func to avoid future bloat
amd64: sync up libc memset with the kernel version
amd64: handle small memset buffers with overlapping stores
Fix -DNO_CLEAN amd64 build after r340463