melifaro [Fri, 28 Aug 2020 21:59:10 +0000 (21:59 +0000)]
Further split nhop creation and rtable operations.
As nexthops are immutable, some operations such as route attribute changes
require nexthop fetching, forking, modification and route switching.
These operations are not atomic, so they may need to be retried multiple
times in presence of multiple speakers changing the same route.
This change introduces "synchronisation" primitive: route_update_conditional(),
simplifying logic for route changes and upcoming multipath operations.
vmaffione [Fri, 28 Aug 2020 20:03:54 +0000 (20:03 +0000)]
lib: add libnetmap
This changeset introduces the new libnetmap library for writing
netmap applications.
Before libnetmap, applications could either use the kernel API
directly (e.g. NIOCREGIF/NIOCCTRL) or the simple header-only-library
netmap_user.h (e.g. nm_open(), nm_close(), nm_mmap() etc.)
The new library offers more functionalities than netmap_user.h:
- Support for complex netmap options, such as external memory
allocators or per-buffer offsets. This opens the way to future
extensions.
- More flexibility in the netmap port bind options, such as
non-numeric names for pipes, or the ability to specify the netmap
allocator that must be used for a given port.
- Automatic tracking of the netmap memory regions in use across the
open ports.
At the moment there is no man page, but the libnetmap.h header file
has in-depth documentation.
vangyzen [Fri, 28 Aug 2020 19:50:40 +0000 (19:50 +0000)]
memstat_kvm_uma: fix reading of uma_zone_domain structures
Coverity flagged the scaling by sizeof(uzd). That is the type
of the pointer, so the scaling was already done by pointer arithmetic.
However, this was also passing a stack frame pointer to kvm_read,
so it was doubly wrong.
Move ZDOM_GET into the !_KERNEL section and use it in libmemstat.
manu [Fri, 28 Aug 2020 18:25:45 +0000 (18:25 +0000)]
arm: allwinner: clk: Add printfs when we cannot set the correct freq
For some unknown reason this seems to fix this function when we printf
the best variable. This isn't a delay problem as doing a printf without
it doesn't solve this problem.
This is way above my pay grade so add some printf that shouldn't be printed
in 99% of the case anyway.
Fix booting on most Allwinner boards as the mmc IP uses a NM clock.
Reported by: Alexander Mishin <mishin@mh.net.ru>
MFC after: 3 days
X-MFC-With: 363887
imp [Fri, 28 Aug 2020 17:55:54 +0000 (17:55 +0000)]
Treat the boot loader as the same as the kernel for what's visible
The boot loader will be growing some (limited) support for some kernel
interfaces for some of the timekeeping routines to support zstd code.
Allow the declarations for them to be visible when compiling for the
boot loader, rather than treating it like a user-space environment
(which stand.h already provides to a limited degree).
imp [Fri, 28 Aug 2020 17:49:56 +0000 (17:49 +0000)]
Allow the pseudo-errnos to be returned as well in boot loader
Expose the pseudo-errno values in _STANDALONE is defined so that code
in the boot loader can make use of them. Nothing uses them today, but
the zstd support that's coming will need them.
imp [Fri, 28 Aug 2020 17:36:14 +0000 (17:36 +0000)]
Create CFLAGS_EARLY.file for boot loader.
Some external code requires a specific set of include paths to work
properly since it emulates the typical environment the code is used
in. Enable this by creating a CFLAGS_EARLY.file variable that can be
used to build this stack. Otherwise the include stack we build for
stand programs may get in the way. Code that uses this feature has to
tolerate the normal stack of inclues being last on the list (and
presumably unused), though.
Generally, it it should only be used for the specific include
directories. Defines and that sort of thing should be done in the
normal CFLAGS variable. There is a global CFLAGS_EARY hook as well for
everything in a Makefile.
imp [Fri, 28 Aug 2020 16:40:33 +0000 (16:40 +0000)]
Remove splclock(). It's not useful to keep.
splclock is used in one driver (spkr) to control access to
timer_spkr_* routines. However, nothing else does. So it shows no
useful locking info to someone that would want to lock spkr.
NOTE: I think there's races with timer_spkr_{acquire,release} since
there's no interlock in those routines, despite there being a spin
lock to protect the clock. Current other users appear to use no extra
locking protocol, though they themselves appear to be at least
attempting to make sure that only a single thread calls these
routines. I suspect the right answer is to update these routines to
take/release the clock spin lock since they are short and to the
point, but that's beyond the scope of this commit.
jilles [Fri, 28 Aug 2020 15:35:45 +0000 (15:35 +0000)]
sh: Keep ignored SIGINT/SIGQUIT after set in a background job
If job control is not enabled, a background job (... &) ignores SIGINT and
SIGQUIT, but this can be reverted using the trap builtin in the same shell
environment.
Using the set builtin to change options would also revert SIGINT and SIGQUIT
to their previous dispositions.
This broke due to r317298. Calling setsignal() reverts the effect of
ignoresig().
imp [Fri, 28 Aug 2020 15:09:43 +0000 (15:09 +0000)]
remove splbio and splcam
splbio and splcan have been completely removed from the tree. We can
now remove their definitions here. They've been nops for a long time
and were only preserved to give hints on how to lock drivers. All
drivers have been deleted or converted, so they can be deleted now.
rmacklem [Thu, 27 Aug 2020 23:57:30 +0000 (23:57 +0000)]
Add flags to enable NFS over TLS to the NFS client and server.
An Internet Draft titled "Towards Remote Procedure Call Encryption By Default"
(soon to be an RFC I think) describes how Sun RPC is to use TLS with NFS
as a specific application case.
Various commits prepared the NFS code to use KERN_TLS, mainly enabling use
of ext_pgs mbufs for large RPC messages.
r364475 added TLS support to the kernel RPC.
This commit (which is the final one for kernel changes required to do
NFS over TLS) adds support for three export flags:
MNT_EXTLS - Requires a TLS connection.
MNT_EXTLSCERT - Requires a TLS connection where the client presents a valid
X.509 certificate during TLS handshake.
MNT_EXTLSCERTUSER - Requires a TLS connection where the client presents a
valid X.509 certificate with "user@domain" in the otherName
field of the SubjectAltName during TLS handshake.
Without these export options, clients are permitted, but not required, to
use TLS.
For the client, a new nmount(2) option called "tls" makes the client do
a STARTTLS Null RPC and TLS handshake for all TCP connections used for the
mount. The CLSET_TLS client control option is used to indicate to the kernel RPC
that this should be done.
Unless the above export flags or "tls" option is used, semantics should
not change for the NFS client nor server.
For NFS over TLS to work, the userspace daemons rpctlscd(8) { for client }
or rpctlssd(8) daemon { for server } must be running.
markj [Thu, 27 Aug 2020 17:36:06 +0000 (17:36 +0000)]
Fix writing of the final block of encrypted, compressed kernel dumps.
Previously any residual data in the final block of a compressed kernel
dump would be written unencrypted. Note, such a configuration already
does not work properly when using AES-CBC since the compressed data is
typically not a multiple of the AES block length in size and EKCD does
not implement any padding scheme. However, EKCD more recently gained
support for using the ChaCha20 cipher, which being a stream cipher does
not have this problem.
jamie [Thu, 27 Aug 2020 17:04:55 +0000 (17:04 +0000)]
Disregard jails in jail.conf that have bad parameters (parameter/variable
clash, or redefining name/jid). The current behvaior, of merely warning
and moving on, can lead to unexpected behavior when a jail is created
without the offending parameter defined at all.
cy [Thu, 27 Aug 2020 14:33:46 +0000 (14:33 +0000)]
/etc/zfs/zpool.cache is the preferred (and new) location of zpool.cache.
Check for it first. Only use /boot/zfs/zpool.cache if the /etc/zfs
version is not found and good.
manu [Thu, 27 Aug 2020 08:08:49 +0000 (08:08 +0000)]
arm: ti: Fix Beaglebone black MMC after DTS update
After DTS sync with Linux kernel 5.8 this patch was included:
"ARM: dts: Move am33xx and am43xx mmc nodes to sdhci-omap driver"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm/boot/dts/am33xx-l4.dtsi?h=v5.9-rc2&id=0b4edf111870b83ea77b1d7e16b8ceac29f9f388
Current will not load any driver for MMC and not mount the rootfs.
Simple patch add "ti,am335-sdhci" to compability strings in ti_sdhci.c
imp [Thu, 27 Aug 2020 05:11:15 +0000 (05:11 +0000)]
Implement FLUSHO
Turn FLUSHO on/off with ^O (or whatever VDISCARD is). Honor that to
throw away output quickly. This tries to remain true to 4.4BSD
behavior (since that was the origin of this feature), with any
corrections NetBSD has done. Since the implemenations are a little
different, though, some edge conditions may be handled differently.
jamie [Thu, 27 Aug 2020 00:17:17 +0000 (00:17 +0000)]
Don't allow jail.conf variables to have the same names as jail parameters.
It was already not allowed in many cases, but crashed instead of giving an
error.
rmacklem [Wed, 26 Aug 2020 21:49:43 +0000 (21:49 +0000)]
Fix a "v_seqc_users == 0 not met" panic when VFS_STATFS() fails during mount.
r363210 introduced v_seqc_users to the vnodes. This change requires
a vn_seqc_write_end() to match the vn_seqc_write_begin() in
vfs_cache_root_clear().
mjg@ provided this patch which seems to fix the panic.
Tested for an NFS mount where the VFS_STATFS() call will fail.
jhb [Wed, 26 Aug 2020 21:17:18 +0000 (21:17 +0000)]
Simplify compat shims for /dev/crypto.
- Make session handling always use the CIOGSESSION2 structure.
CIOGSESSION requests use a thunk similar to COMPAT_FREEBSD32 session
requests. This permits the ioctl handler to use the 'crid' field
unconditionally.
- Move COMPAT_FREEBSD32 handling out of the main ioctl handler body
and instead do conversions in/out of thunk structures in dedicated
blocks at the start and end of the ioctl function.
imp [Wed, 26 Aug 2020 19:32:28 +0000 (19:32 +0000)]
Each entry in UPDATING needs a date
It's rare for there to be two updating entries on the same day (once a
decade or so), but we have that here. Add the date to the second one
since devd and zfs are unrelated.
cperciva [Wed, 26 Aug 2020 19:26:48 +0000 (19:26 +0000)]
Add -w option to lockf(1).
By default, lockf(1) opens its lock file O_RDONLY|O_EXLOCK. On NFS, if the
file already exists, this is split into opening the file read-only and then
requesting an exclusive lock -- and the second step fails because NFS does
not permit exclusive locking on files which are opened read-only.
The new -w option changes the open flags to O_WRONLY|O_EXLOCK, allowing it
to work on NFS -- at the cost of not working if the file cannot be opened
for writing.
(Whether the traditional BSD behaviour of allowing exclusive locks to be
obtained on a file which cannot be opened for writing is a good idea is
perhaps questionable since it may allow less-privileged users to perform
a local denial of service; however this behaviour has been present for a
long time and changing it now seems like it would cause problems.)
markj [Wed, 26 Aug 2020 14:31:48 +0000 (14:31 +0000)]
Use a large kmem arena import size on NUMA systems.
This helps minimize internal fragmentation that occurs when 2MB imports
are interleaved across NUMA domains. Virtually all KVA allocations on
direct map platforms consume more than one page, so the fragmentation
manifests as runs of 511 4KB page mappings in the kernel.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26050
markj [Wed, 26 Aug 2020 14:31:35 +0000 (14:31 +0000)]
vmem: Avoid allocating span tags when segments are never released.
vmem uses span tags to delimit imported segments, so that they can be
released if the segment becomes free in the future. However, the
per-domain kernel KVA arenas never release resources, so the span tags
between imported ranges are unused when the ranges are contiguous.
Furthermore, such span tags prevent coalescing of free segments across
KVA_QUANTUM boundaries, resulting in internal fragmentation which
inhibits superpage promotion in the kernel map.
Stop allocating span tags in arenas that never release resources. This
saves a small amount of memory and allows free segements to coalesce
across import boundaries. This manifests as improved kernel superpage
usage during poudriere runs, which also helps to reduce physical memory
fragmentation by reducing the number of broken partially populated
reservations.
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24548
cy [Wed, 26 Aug 2020 13:13:57 +0000 (13:13 +0000)]
As of r364746 (OpenZFS import) existing ZPOOLs are not imported
prior to zvol and mountcritlocal resulting in ZVOLs (swap and
virtual machine UFS filesystems) being unavailable, leading to
boot failures.
We move the zpool import from zfs to a new zpool script, with the
-N option to avoid mounting datasets while making the ZPOOL's
datasets available for "legacy" mount (mountpoint=legacy) and ZVOLs
available for subsequent use for swap (in the zvol rc sript) or
for UFS or other filesystems in fstab(5), mounted by mountcritlocal.
arichardson [Wed, 26 Aug 2020 10:21:38 +0000 (10:21 +0000)]
Avoid recomputing COMPILER_/LINKER_ variables when set explicitly
I noticed that when we build libraries for a different ABI (in CheriBSD) we
were calling ${XCC}/${LD} --version for every directory. It turns out that
this was caused by bsd.compat.mk explicitly setting (X_)COMPILER variables
for that build stage and this stops the _can_export logic from working.
To fix this, we change the check to only set _can_export=no if the variable
is set and it is set to a different value than the cached value.
This noticeably speeds up the tree walk while building compat libraries.
During an upstream amd64 buildworld this also removes 8 --version calls.
arichardson [Wed, 26 Aug 2020 09:19:49 +0000 (09:19 +0000)]
Move libsqlite3 to the top of the SUBDIR list
In parallel builds, this should allow sqlite to start building earlier and
increase parallelism when building lib/. Looking at htop output during
buildworld/tinderbox, there are long phases where only one CPU is active
optimizing the massive sqlite3.c file since the build of libsqlite3 is
started quite late.
arichardson [Wed, 26 Aug 2020 09:19:44 +0000 (09:19 +0000)]
Fix builds that set LD=ld.lld after r364761
When using relative paths for the linker we have to transform the name
since clang does not like -fuse-ld=ld.lld and instead requires -fuse-ld=lld
(the same also applies for ld.bfd).
emaste [Wed, 26 Aug 2020 04:01:06 +0000 (04:01 +0000)]
Apply a big hammer for stale pre-OpenZFS files
-DNO_CLEAN builds have had trouble across the OpenZFS import. It's not
worth the effort to try to address this with any granularity; instead,
just trigger on a .depend file indicating a tree from before the import,
and remove the whole cddl object tree.
asomers [Wed, 26 Aug 2020 02:44:35 +0000 (02:44 +0000)]
geli: use unmapped I/O
Use unmapped I/O for geli. Unlike most geom providers, geli needs to
manipulate data on every read or write. Previously it would always map bios.
On my 16-core, dual socket server using geli atop md(4) devices, with 512B
sectors, this change increases geli IOPs by about 3x.
Note that geli still can't use unmapped I/O when data integrity verification
is enabled (but it could, with a little more work). And it can't use
unmapped I/O in combination with ZFS, because ZFS uses mapped bios.
asomers [Wed, 26 Aug 2020 02:37:42 +0000 (02:37 +0000)]
crypto(9): add CRYPTO_BUF_VMPAGE
crypto(9) functions can now be used on buffers composed of an array of
vm_page_t structures, such as those stored in an unmapped struct bio. It
requires the running to kernel to support the direct memory map, so not all
architectures can use it.
scottph [Wed, 26 Aug 2020 02:05:58 +0000 (02:05 +0000)]
efibootmgr: Add option to request booting to the firmware user interface
The OsIndications UEFI variable can request the firware to stop at
its UI instead of continuing with boot. Add flags for setting and
clearing this request.
scottph [Wed, 26 Aug 2020 02:04:04 +0000 (02:04 +0000)]
arm64: Make local stores observable before sending IPIs
Add a synchronizing instruction to flush and wait until the local
CPU's writes are observable to other CPUs before sending IPIs.
This fixes an issue where recipient CPUs doing a rendezvous could
enter the rendezvous handling code before the initiator's writes
to the smp_rv_* variables were visible. This manifested as a
system hang, where a single CPU's increment of smp_rv_waiters[0]
actually happened "before" the initiator's zeroing of that field,
so all CPUs were stuck with the field appearing to be at
ncpus - 1.
mmacy [Tue, 25 Aug 2020 23:26:52 +0000 (23:26 +0000)]
ZFS: whitelist zstd and encryption in the loader
Please note that neither zstd nor encryption is
supported by the loader at this instant. This
change makes it safe to use those features in
one's root pool, but not in one's root dataset.
cem [Tue, 25 Aug 2020 21:36:56 +0000 (21:36 +0000)]
vm_pageout: Scale worker threads with CPUs
Autoscale vm_pageout worker threads from r364129 with CPU count. The
default is arbitrarily chosen to be 16 CPUs per worker thread, but can
be adjusted with the vm.pageout_cpus_per_thread tunable.
There will never be less than 1 thread per populated NUMA domain, and
the previous arbitrary upper limit (at most ncpus/2 threads per NUMA
domain) is preserved.
Care is taken to gracefully handle asymmetric NUMA nodes, such as empty
node systems (e.g., AMD 2990WX) and systems with nodes of varying size
(e.g., some larger >20 core Intel Haswell/Broadwell Xeon).
dim [Tue, 25 Aug 2020 20:07:11 +0000 (20:07 +0000)]
After r364423, which ensures the callbacks that dl_iterate_phdr(3)
performs are protected by an exclusive lock, even for statically linked
programs, it is safe to re-enable libunwind's FrameHeaderCache, which I
temporarily disabled in r364263.
Meanwhile upstream has also used the _LIBUNWIND_USE_FRAME_HEADER_CACHE
for this purpose, so the only thing needed is to add this as a
compile-time command line flag.
While here, reformat the CFLAGS lines a little bit.
dim [Tue, 25 Aug 2020 19:57:11 +0000 (19:57 +0000)]
After r364753, there should be no need to suppress -Watomic-alignment
warnings anymore for compiler-rt's atomic.c. This occurred because the
IS_LOCK_FREE_8 macro was not correctly defined to 0 for mips, and this
caused the compiler to emit a runtime call to __atomic_is_lock_free(),
and that triggers the warning.