There was a missing trick expanding the passed pattern to a full word
by multiplication. As a side effect non-zero patterns would be
incorrectly laid down.
This stems from the use of rep stosq which is word-sized, while the passed
argument is byte-sized.
I initially repurposed memcpy into memset without taking this into account.
All but non-bzero testing was performed with a variant utilizing ERMS, i.e.
using only stosb which happens to not into the problem whatsoever. So my bad
twice.
Thanks to Oliver Pinter for noting the problem and providing a testcase.