CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

Add ARM64TODO comments to ACPI PCI stubs

This will make searching for missing functionalities easier.

* Address review (and add a bit myself).
- Tweek man page.
- Remove all mention of RANDOM_FORTUNA. If the system owner wants YARROW or DUMMY, they ask for it, otherwise they get FORTUNA.
- Tidy up headers a bit.
- Tidy up declarations a bit.
- Make static in a couple of places where needed.
- Move Yarrow/Fortuna SYSINIT/SYSUNINIT to randomdev.c, moving us towards a single file where the algorithm context is used.
- Get rid of random_*_process_buffer() functions. They were only used in one place each, and are better subsumed into those places.
- Remove *_post_read() functions as they are stubs everywhere.
- Assert against buffer size illegalities.
- Clean up some silly code in the randomdev_read() routine.
- Make the harvesting more consistent.
- Make some requested argument name changes.
- Tidy up and clarify a few comments.
- Make some requested comment changes.
- Make some requested macro changes.

* NOTE: the thing calling itself a 'unit test' is not yet a proper
  unit test, but it helps me ensure things work. It may be a proper
  unit test at some time in the future, but for now please don't make
  any assumptions or hold any expectations.

Differential Revision: https://reviews.freebsd.org/D2025
Approved by: so (/dev/random blanket)

Implement stubs for ACPI PCI routines

ACPI driver requires special functions to be provided by machdep code.
Add temporary stubs to satisfy the compiler when both "pci" and "acpi"
are enabled in the kernel configuration file.

Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3028

Run a shell in the jail when no command is specified.
Add a new flag, -l, for a clean environment, same as jail(8) exec.clean.
Change the GET_USER_INFO macro into a function.

PR: 201300
Submitted by: Willem Jan Withagen
MFC after: 3 days

Add minimum regression tests for pw -R

Add new include path for sha256.h

This fixes the bootstrap build on FreeBSD 10.

Submitted by: andrew

Try to unbreak the build after r285390 removing the obsolete static
declaration.

Make getarg return NULL if args is NULL

Fix regression: ensure when try to create the group and the user with the same
id if possible and nothing in particular was specified

Remove now unused variable

Replace custom string array with stringlist(3)

Rework groupmod modification:

Use gr_add(3) when possible to avoid code duplication.
Use a simpler logic to delete members of a group

Remove unused argument from pm_passwd

check the gecos format early: at the moment the -c option is parsed

Return the FDT node of the GPIO controller to gpiobus. It is used by the
children of gpiobus.

Remove useless use of goto

Isolate pw lock/unlock into a separate function

Implement normal and abnormal process termination.

CloudABI does not provide an explicit kill() system call, for the reason
that there is no access to the global process namespace. Instead, it
offers a raise() system call that can at least be used to terminate the
process abnormally.

CloudABI does not support installing signal handlers. CloudABI's raise()
system call should behave as if the default policy is set up. Call into
kern_sigaction(SIG_DFL) before calling sys_kill() to force this.

Obtained from: https://github.com/NuxiNL/freebsd

homedir can only be populate during useradd

Make a separate groupdel/userdel from the main function

Use FDDUP_NORMAL instead of hardcoding value 0.

Proposed by: mjg

Add missing function parameter.

A function parameter got added in r285356, meaning that the call to
kern_dup() needs to be patched up.

Make separate functions to show users and groups

cpu_number and cpu_swapout are never used, and only defined in powerpc.

Move the quiet flag into the configuration structure

Separate usernext/groupnext from the main functions

linprocfs: vref the vnode passed to vn_fullpath

vfs: always clear VI_OWEINACT in consumers bumping v_usecount

Previously vputx would detect the condition and clear the flag.

With this change it is invalid to have both v_usecount > 0 and the flag
set. Assert the condition is met in all revlevant places.

Reviewed by: kib

vfs: move si_usecount manipulation to dedicated functions

Reviewed by: kib

Create a dedicated function for ensuring that cdir and rdir are populated.

Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.

Reviewed by: kib

Move chdir/chroot-related fdp manipulation to kern_descrip.c

Prefix exported functions with pwd_.

Deduplicate some code by adding a helper for setting fd_cdir.

Reviewed by: kib

Always send a SIGSEGV on a map failure. Use the code to tell the reason
for the signal.

Sponsored by: ABT Systems Ltd

Regenerate syscalls.

Add an initial NUMA affinity/policy configuration for threads and processes.

This is based on work done by jeff@ and jhb@, as well as the numa.diff
patch that has been circulating when someone asks for first-touch NUMA
on -10 or -11.

* Introduce a simple set of VM policy and iterator types.
* tie the policy types into the vm_phys path for now, mirroring how
  the initial first-touch allocation work was enabled.
* add syscalls to control changing thread and process defaults.
* add a global NUMA VM domain policy.
* implement a simple cascade policy order - if a thread policy exists, use it;
  if a process policy exists, use it; use the default policy.
* processes inherit policies from their parent processes, threads inherit
  policies from their parent threads.
* add a simple tool (numactl) to query and modify default thread/process
  policities.
* add documentation for the new syscalls, for numa and for numactl.
* re-enable first touch NUMA again by default, as now policies can be
  set in a variety of methods.

This is only relevant for very specific workloads.

This doesn't pretend to be a final NUMA solution.

The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
'sysctl vm.default_policy=rr'.

This is only relevant if MAXMEMDOM is set to something other than 1.
Ie, if you're using GENERIC or a modified kernel with non-NUMA, then
this is a glorified no-op for you.

Thank you to Norse Corp for giving me access to rather large
(for FreeBSD!) NUMA machines in order to develop and verify this.

Thank you to Dell for providing me with dual socket sandybridge
and westmere v3 hardware to do NUMA development with.

Thank you to Scott Long at Netflix for providing me with access
to the two-socket, four-domain haswell v3 hardware.

Thank you to Peter Holm for running the stress testing suite
against the NUMA branch during various stages of development!

Tested:

* MIPS (regression testing; non-NUMA)
* i386 (regression testing; non-NUMA GENERIC)
* amd64 (regression testing; non-NUMA GENERIC)
* westmere, 2 socket (thankyou norse!)
* sandy bridge, 2 socket (thankyou dell!)
* ivy bridge, 2 socket (thankyou norse!)
* westmere-EX, 4 socket / 1TB RAM (thankyou norse!)
* haswell, 2 socket (thankyou norse!)
* haswell v3, 2 socket (thankyou dell)
* haswell v3, 2x18 core (thankyou scott long / netflix!)

* Peter Holm ran a stress test suite on this work and found one
  issue, but has not been able to verify it (it doesn't look NUMA
  related, and he only saw it once over many testing runs.)

* I've tested bhyve instances running in fixed NUMA domains and cpusets;
  all seems to work correctly.

Verified:

* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
  NUMA policies for processes under test.

Review:

This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
as well as privately and via emails to freebsd-arch@.  The git history
with specific attributes is available at https://github.com/erikarn/freebsd/
in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).

This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
wblock) but not achieved a clear consensus.  My hope is that with further
exposure and testing more functionality can be implemented and evaluated.

Notes:

* The VM doesn't handle unbalanced domains very well, and if you have an overly
  unbalanced memory setup whilst under high memory pressure, VM page allocation
  may fail leading to a kernel panic.  This was a problem in the past, but it's
  much more easily triggered now with these tools.

* This work only controls the path through vm_phys; it doesn't yet strongly/predictably
  affect contigmalloc, KVA placement, UMA, etc.  So, driver placement of memory
  isn't really guaranteed in any way.  That's next on my plate.

Sponsored by: Norse Corp, Inc.; Dell

Since sh(1) now supports mulitbyte (only UTF-8) clarify the related BUGS
section in wordexp(3) manual page

Discussed with: jilles

sh(1): libedit has supported multibyte encodings for a while.

Do not allow creation of the dirty buffers for the dead buffer
objects, i.e. for buffer objects which vnode was reclaimed.  Buffer
cache cannot write such buffers.  Return the error and discard the
buffer immediately on write attempt.

BO_DIRTY now always set during vnode reclamation, since it is used not
only for the INVARIANTS checks.  Do allow placement of the clean
buffers on dead bufobj list, otherwise filesystems cannot use bufcache
at all after the devvp reclaim.

Reported and tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks

some additional improvements to the documentation...

Sponsored by: Netflix, Inc.

Complete the move that was started w/ r263218.. For some reason I
didn't delete the files, so that means we need to bring the changes in
r282726 to the correct files..

make tinderbox completed with this patch...

Spoil even can happen for some time now even on providers opened exclusively
(on the media change event). Update GELI to handle that situation.

PR: 201185
Submitted by: Matthew D. Fuller

assorted algorithmic fixes from Paolo Valente (one of my qfq coauthors):
- use 1ULL to avoid shift truncations
- recompute the sum of weight dynamically to provide better fairness
- fix an erroneous constant in the computation of the slot
- preserve timestamp correctness when the old timestamp is stale.

one more warning suppression when compiling the test code in userspace.

add code to compute fairness indexes;
cleanups to remove compile warnings.

staticize functions only used in netmap.c
(detected by jenkins run with gcc 4.9)

Update documentation on the use of netmap_priv_d,
rename the refcount and use the same structure in
FreeBSD and linux

No functional changes.

Add missing const keyword to kern_sigaction()'s 'act' parameter.

This structure is not modified by the function. Also add const to
sigact_flag_test(), as it is called by kern_sigaction().

fd: further cleanup of kern_dup

- make mode enum start from 0 so that the assertion covers all cases [1]
- rename prefix _CLOEXEC flag with _FLAG
- postpone fhold on the old file descriptor, which eliminates the need to fdrop
in error cases.
- fixup FDDUP_FCNTL check missed in the previous commit

This removes 'fp == oldfde->fde_file' assertion which had little value. kern_dup
only calls fd-related functions which cannot drop the lock or a whole lot of
races would be introduced.

Noted by: kib [1]

fd: split kern_dup flags argument into actual flags and a mode

Tidy up the code inside to switch on the mode.

Convert between abridged (from FXSAVE) and unabridged (from FSAVE)
versions of the x87 tags. The conversion is naive, used abridged tag
is converted to valid unabridged, without additional checks for zero
and special values.

Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Duplicate the copyright from the i386/i386/machdep.c into
i386/include/frame.h after a code was moved from machdep.c to frame.h
in r284925.

Use include guards style similar to other guards.

Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Change the mb() use in the sched_ult tdq_notify() and sched_idletd()
to more C11-ish atomic_thread_fence_seq_cst().

Note that on PowerPC, which currently uses lwsync for mb(), the change
actually fixes the missed store/load barrier, intended by r271604 [*].

Reviewed by: alc
Noted by: alc [*]
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks

Add support for makecontext. This supports up to 8 arguments as this
simplifies the code, these can be passed in registers.

Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation

add netmap dependency when compiled as a module

Let listen() return EDESTADDRREQ when not bound.

We currently return EINVAL when calling listen() on a UNIX socket that
has not been bound to a pathname. If my interpretation of POSIX is
correct, we should return EDESTADDRREQ: "The socket is not bound to a
local address, and the protocol does not support listening on an unbound
socket."

Return EDESTADDRREQ instead when not bound and not connected.

Differential Revision: https://reviews.freebsd.org/D3038
Reviewed by: gnn, network

Sync netmap sources with the version in our private tree.
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:

- fix zerocopy monitor ports and introduce copying monitor ports
  (the latter are lower performance but give access to all traffic
  in parallel with the application)

- exclusive open mode, useful to implement solutions that recover
  from crashes of the main netmap client (suggested by Patrick Kelsey)

- revised memory allocator in preparation for the 'passthrough mode'
  (ptnetmap) recently presented at bsdcan. ptnetmap is described in
        S. Garzarella, G. Lettieri, L. Rizzo;
        Virtual device passthrough for high speed VM networking,
        ACM/IEEE ANCS 2015, Oakland (CA) May 2015
        http://info.iet.unipi.it/~luigi/research.html

- fix rx CRC handing on ixl

- add module dependencies for netmap when building drivers as modules

- minor simplifications to device-specific routines (*txsync, *rxsync)

- general code cleanup (remove unused variables, introduce macros
  to access rings and remove duplicate code,

Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).

Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.

MFC after: 1 month
Sponsored by: (partly) Verisign, Cisco

rev.284898 removed _SHLIBDIRPREFIX so we need to reconstruct its value
to properly locate libraries created in the buildworld phase.

Summary: Fix LINT build. The names of the new AES modes were not
correctly used under the REGRESSION kernel option.

Fix swapped copyin(9) arguments in cxgb's iwch_arm_cq() function.
Detected by clang 3.7.0 with the warning:

sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c:309:18: error: variable
'rptr' is uninitialized when used here [-Werror,-Wuninitialized]
chp->cq.rptr = rptr;
^~~~

MFC after: 1 week

Rename zfs nvpair files to not colidate with our nvlist.

PR: 201356
Approved by: pjd (mentor)

Remove checks for __ARM_EABI__, we only build for EABI now.

Sponsored by: ABT Systems Ltd

Add support for __aeabi_memclr4, clang 3.7 calls it.

Sponsored by: ABT Systems Ltd

Add support for AES modes to IPSec. These modes work both in software only
mode and with hardware support on systems that have AESNI instructions.

Differential Revision: D2936
Reviewed by: jmg, eri, cognet
Sponsored by: Rubicon Communications (Netgate)

Clear the carry bit on the saved program state register when asked to
clear the return value, it's used to indicate an error.

Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation

Document r285329, OpenSSL update to 1.0.1p.

Sponsored by: The FreeBSD Foundation

vfs: cosmetic changes to namei and namei_handle_root

- don't initialize cnp during declaration
- don't test error/!error, compare to 0 instead

Merge OpenSSL 1.0.1p.

Import OpenSSL 1.0.1p.

vfs: simplify error handling in namei

The logic is reorganised so that there is one exit point prior to the
lookup loop. This is an intermediate step to making audit logging
functions use found vnode instead of translating ni_dirfd on their own.

ni_startdir validation is removed. The only in-tree consumer is nfs
which already makes sure it is a directory.

Reviewed by: kib

Correct issue presented in r285051,
apparently neither clang nor gcc complain about this.
But clang intis the var to NULL correctly while gcc on at least mips does not.
Correct the undefined behavior by initializing the variable properly.

PR: 201371
Differential Revision: https://reviews.freebsd.org/D3036
Reviewed by: gnn
Approved by: gnn(mentor)

increase buffer size to significantly increase performance...

see:
https://docs.freebsd.org/cgi/mid.cgi?20150513080342.GE37063@funkthat.com

for benchmarks...

Add implementations for some of the CloudABI file descriptor system calls.

All of the CloudABI system calls that operate on file descriptors of an
arbitrary type are prefixed with fd_. This change adds wrappers for
most of these system calls around their FreeBSD equivalents.

The dup2() system call present on CloudABI deviates from POSIX, in the
sense that it can only be used to replace existing file descriptor. It
cannot be used to create new ones. The reason for this is that this is
inherently thread-unsafe. Furthermore, there is no need on CloudABI to
use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no
special meaning.

This change exposes the kern_dup() through <sys/syscallsubr.h> and puts
the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag,
FDDUP_MUSTREPLACE to force that file descriptors are replaced -- not
allocated.

Differential Revision: https://reviews.freebsd.org/D3035
Reviewed by: mjg

fd: prepare do_dup for being exported

- rename it to kern_dup.
- prefix flags with FD
- assert that correct flags were passed

vfs: avoid spurious vref/vrele for absolute lookups

namei used to vref fd_cdir, which was immediatley vrele'd on entry to
the loop.

Check for absolute lookup and vref the right vnode the first time.

Reviewed by: kib

vfs: plug a use-after-free of fd_rdir in namei

fd_rdir vnode was stored in ni_rootdir without refing it in any way,
after which the filedsc lock was being dropped.

The vnode could have been freed by mountcheckdirs or another thread doing
chroot.

VREF the vnode while the lock is held.

Reviewed by: kib
MFC after: 1 week

Do not try to set password on group if the group is added as a consequence of
of creating a user (regression from r285136)

Reported by: Fabian Keil <fk@fabiankeil.de>

Add support for SMP. This uses the FDT data to find the CPUs to start on,
and psci to start them. I expect ACPI support to be added later.

This has been tested on qemu with 2 cpus as that is the current value of
MAXCPUS. This is expected to be increased in the future as FreeBSD has
been tested on 48 cores on the Cavium ThunderX hardware.

Partially based on a patch from Robin Randhawa from ARM.

Approved by: ABT Systems Ltd
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3024

Add logging of synchronous exceptions.

Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation

Add the definition of the shareable bits in the pagetables

Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation

Clean up the types used in <machine/ucontext.h> on arm64. As some ports
include this file without first including the headers needed for uint32_t
and the like use the __foo type.

Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation

Don't clobber td->td_retval[0] in proc_reap().

While writing tests for CloudABI, I noticed that close() on process
descriptors returns the process ID of the child process. This is
interesting, as close() is only allowed to return 0 or -1. It turns out
that we clobber td->td_retval[0] in proc_reap(), so that wait*()
properly returns the process ID.

Change proc_reap() to leave td->td_retval[0] alone. Set the return value
in kern_wait6() instead, by keeping track of the PID before we
(potentially) reap the process.

Differential Revision: https://reviews.freebsd.org/D3032
Reviewed by: kib

Rework CPU identification on ARM64

This commit reworks the code responsible for identification of
the CPUs during runtime.
It is necessary to provide a way for workarounds and erratums
to be applied only for certain HW versions.

The copy of MIDR is now stored in pcpu to provide a fast and
convenient way for assambly code to read it (pcpu is used quite often
so there is a chance it's inside the cache).
The MIDR is also better way of identification than using user-friendly
cpu_desc structure, because it can be compiled into comparision of
single u32 with only one access to the memory - this is crucial
for some erratums which are called from performance-critical
places.

Changes in cpu_identify makes this function safe to be called
on non-boot CPUs.

New function CPU_MATCH was implemented which returns boolean
value based on mathing masked MIDR with chip identification.
Example of usage:

printf("is thunder: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
        CPU_IMPL_CAVIUM, CPU_PART_THUNDER, 0, 0));
printf("is generic: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
        CPU_IMPL_ARM, CPU_PART_FOUNDATION, 0, 0));

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3030

Cover a race between doselwakeup() and selfdfree().  If doselwakeup()
loop finds the selfd entry and clears its sf_si pointer, which is
handled by selfdfree() in parallel, NULL sf_si makes selfdfree() free
the memory.  The result is the race and accesses to the freed memory.

Refcount the selfd ownership.  One reference is for the sf_link
linkage, which is unconditionally dereferenced by selfdfree().
Another reference is for sf_threads, both selfdfree() and
doselwakeup() race to deref it, the winner unlinks and than frees the
selfd entry.

Reported by: Larry Rosenman <ler@lerctr.org>
Tested by: Larry Rosenman <ler@lerctr.org>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks

Add forward declaration of struct thread.

This structure is used in some of the functions in this header, but we
don't depend on any header that pulls it i.

Generate CloudABI system call table with proper $FreeBSD$ tags.

Import the CloudABI datatypes and create a system call table.

CloudABI is a pure capability-based runtime environment for UNIX. It
works similar to Capsicum, except that processes already run in
capabilities mode on startup. All functionality that conflicts with this
model has been omitted, making it a compact binary interface that can be
supported by other operating systems without too much effort.

CloudABI is 'secure by default'; the idea is that it should be safe to
run arbitrary third-party binaries without requiring any explicit
hardware virtualization (Bhyve) or namespace virtualization (Jails). The
rights of an application are purely determined by the set of file
descriptors that you grant it on startup.

The datatypes and constants used by CloudABI's C library (cloudlibc) are
defined in separate files called syscalldefs_mi.h (pointer size
independent) and syscalldefs_md.h (pointer size dependent). We import
these files in sys/contrib/cloudabi and wrap around them in
cloudabi*_syscalldefs.h.

We then add stubs for all of the system calls in sys/compat/cloudabi or
sys/compat/cloudabi64, depending on whether the system call depends on
the pointer size. We only have nine system calls that depend on the
pointer size. If we ever want to support 32-bit binaries, we can simply
add sys/compat/cloudabi32 and implement these nine system calls again.

The next step is to send in code reviews for the individual system call
implementations, but also add a sysentvec, to allow CloudABI executabled
to be started through execve().

More information about CloudABI:
- GitHub: https://github.com/NuxiNL/cloudlibc
- Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA

Differential Revision: https://reviews.freebsd.org/D2848
Reviewed by: emaste, brooks
Obtained from: https://github.com/NuxiNL/freebsd

MFV r285292:

Merge upstream fix to eliminate build-breaking gcc warnings of no
importance.

commit: cab33b7a0acba7d2268a23c4383be6167106e549

Update ND_TTEST2 to fix issue 443

Add IS_NOT_NEGATIVE macro.
Avoid these warnings:
- comparison of unsigned expression >= 0 is always true [-Wtype-limits],
- comparison is always true due to limited range of data type [-Wtype-limits].

Reviewed by: adrian
Approved by: jmallett (mentor)
MFC after: 1 month

upon further examination, it turns out that _unregister_all already
provides the guarantee that no threads will be in the _newsession code..
This is provided by the CRYPTODRIVER lock... This makes the pause
unneeded...

yet more documentation improvements... Many changes were made to the
OCF w/o documentation...

Document the new (8+ year old) device_t way of handling things, that
_unregister_all will leave no threads in newsession, the _SYNC flag,
the requirement that a flag be specified...

Other minor changes like breaking up a wall of text into paragraphs...

Fix typo which breaks build of manpages when WITHOUT_MANCOMPRESS is set

PR: 201153
Reported by: Andriy Voskoboinyk <s3erios@gmail.com>

seq: use seq_consistent_nomb in seq_consistent

Constify seqp argument for seq_consistent_nomb.

No functional changes.

Style cleanups after r285270

There should be no semicolons in added macro definitions.
Define empty macro as "do {} while (0)".

Pointed out by: jmg

Now that aesni won't reuse fpu contexts (D3016), add seatbelts to the
fpu code to prevent other reuse of the contexts in the future...

Differential Revision: https://reviews.freebsd.org/D3015
Reviewed by: kib, gnn

address an issue where consumers, like IPsec, can reuse the same
session in multiple threads w/o locking..  There was a single fpu
context shared per session, if multiple threads were using the session,
and both migrated away, they could corrupt each other's fpu context...

This patch adds a per cpu context and a lock to protect it...

It also tries to better address unloading of the aesni module...
The pause will be removed once the OpenCrypto Framework provides a
better method for draining callers into _newsession...

I first discovered the fpu context sharing issue w/ a flood ping over
an IPsec tunnel between two bhyve machines...  The patch in D3015
was used to verify that this fix does fix the issue...

Reviewed by: gnn, kib (both earlier versions)
Differential Revision:        https://reviews.freebsd.org/D3016

Address review.

Differential Revision: https://reviews.freebsd.org/D2924

Reimplement the ordering requirements for the timehands updates, and
for timehands consumers, by using fences.

Ensure that the timehands->th_generation reset to zero is visible
before the data update is visible [*]. tc_setget() allowed data update
writes to become visible before generation (but not on TSO
architectures).

Remove tc_setgen(), tc_getgen() helpers, use atomics inline [**].

Noted by: alc [*]
Requested by: bde [**]
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks

Use atomic_fence_fence_rel() to ensure ordering in the
seq_write_begin(), instead of the load_rmb/rbm_load functions. The
update does not need to be atomic due to the write lock owned.

Similarly, in seq_write_end(), update of *seqp needs not be atomic.
Only store must be atomic with release.

For seq_read(), the natural operation is the load acquire of the
sequence value, express this directly with atomic_load_acq_int()
instead of using custom partial fence implementation
atomic_load_rmb_int().

In seq_consistent, use atomic_thread_fence_acq() which provides the
desired semantic of ordering reads before fence before the re-reading
of *seqp, instead of custom atomic_rmb_load_int().

Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks

only enable immintrin when clang is used. The base gcc does not support it.

Reviewed by: delphij

Add the atomic_thread_fence() family of functions with intent to
provide a semantic defined by the C11 fences with corresponding
memory_order.

atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.

atomic_thread_fence_rel() is r, w | w.

atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.

atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.

Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks

The intention of r254304 was to scan the active queue continuously.
However, I've observed the active queue scan stopping when there are
frequent free page shortages and the inactive queue is steadily refilled
by other mechanisms, such as the sequential access heuristic in vm_fault()
or madvise(2). To remedy this problem, record the time of the last active
queue scan, and always scan a number of pages proportional to the time
since the last scan, regardless of whether that last scan was a
timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass >
0") scan.

Also, on a timeout-triggered scan, allow a full scan of the active queue
when the system is short of inactive pages.

Reviewed by: kib
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division

add an extra tty for picobsd builds

trap some errors when building picobsd