MFC r271604, r271616:
Add a couple of memory barriers to order tdq_cpu_idle and tdq_load accesses.
This change fixes transient performance drops in some of my benchmarks,
which vanished as soon as I tried to collect any stats from the scheduler.
It looks like reordered accesses to those variables sometimes caused the
loss of IPI_PREEMPT, which delayed thread execution until some later interrupt.
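
The ordering requirement can be sketched in portable C11 as below. This is
an illustration only, not the committed diff: struct tdq_sketch and both
function names are hypothetical, C11 fences stand in for the kernel's
barrier primitives, and the assumption is that the enqueuing CPU skips
IPI_PREEMPT whenever it reads tdq_cpu_idle as zero. Each side stores one
variable and then loads the other, so without a full barrier between the
store and the load both sides may read stale values and the wakeup is lost.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two per-CPU queue fields named in the log. */
struct tdq_sketch {
	atomic_int tdq_load;		/* runnable threads queued on this CPU */
	atomic_int tdq_cpu_idle;	/* nonzero while the CPU may be sleeping */
};

/*
 * Enqueue side: publish the new load, then check whether the target sleeps.
 * Returns true when the caller would have to send IPI_PREEMPT (assumed rule).
 */
static bool
notify_sketch(struct tdq_sketch *tdq)
{
	atomic_fetch_add_explicit(&tdq->tdq_load, 1, memory_order_relaxed);
	/* Full barrier: tdq_load must be visible before tdq_cpu_idle is read. */
	atomic_thread_fence(memory_order_seq_cst);
	return (atomic_load_explicit(&tdq->tdq_cpu_idle,
	    memory_order_relaxed) != 0);
}

/*
 * Idle side: announce the intent to sleep, then re-check for queued work.
 * Returns true when no work was seen and sleeping is safe.
 */
static bool
idle_may_sleep_sketch(struct tdq_sketch *tdq)
{
	atomic_store_explicit(&tdq->tdq_cpu_idle, 1, memory_order_relaxed);
	/* Full barrier: tdq_cpu_idle must be visible before tdq_load is read. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&tdq->tdq_load, memory_order_relaxed) != 0) {
		/* Work arrived: clear the flag and keep running. */
		atomic_store_explicit(&tdq->tdq_cpu_idle, 0,
		    memory_order_relaxed);
		return (false);
	}
	return (true);
}
```

With both fences in place, at least one side observes the other's store:
either the idle side sees the new load and stays awake, or the enqueue side
sees the idle flag and sends the interrupt.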
Approved by: re (marius)
git-svn-id: svn://svn.freebsd.org/base/stable/10@271707 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f