Jamie Gritton [Thu, 27 Aug 2020 17:04:55 +0000 (17:04 +0000)]
Disregard jails in jail.conf that have bad parameters (parameter/variable
clash, or redefining name/jid). The current behvaior, of merely warning
and moving on, can lead to unexpected behavior when a jail is created
without the offending parameter defined at all.
Cy Schubert [Thu, 27 Aug 2020 14:33:46 +0000 (14:33 +0000)]
/etc/zfs/zpool.cache is the preferred (and new) location of zpool.cache.
Check for it first. Only use /boot/zfs/zpool.cache if the /etc/zfs
version is not found and good.
Emmanuel Vadot [Thu, 27 Aug 2020 08:08:49 +0000 (08:08 +0000)]
arm: ti: Fix Beaglebone black MMC after DTS update
After DTS sync with Linux kernel 5.8 this patch was included:
"ARM: dts: Move am33xx and am43xx mmc nodes to sdhci-omap driver"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm/boot/dts/am33xx-l4.dtsi?h=v5.9-rc2&id=0b4edf111870b83ea77b1d7e16b8ceac29f9f388
Current will not load any driver for MMC and not mount the rootfs.
Simple patch add "ti,am335-sdhci" to compability strings in ti_sdhci.c
Warner Losh [Thu, 27 Aug 2020 05:11:15 +0000 (05:11 +0000)]
Implement FLUSHO
Turn FLUSHO on/off with ^O (or whatever VDISCARD is). Honor that to
throw away output quickly. This tries to remain true to 4.4BSD
behavior (since that was the origin of this feature), with any
corrections NetBSD has done. Since the implemenations are a little
different, though, some edge conditions may be handled differently.
Jamie Gritton [Thu, 27 Aug 2020 00:17:17 +0000 (00:17 +0000)]
Don't allow jail.conf variables to have the same names as jail parameters.
It was already not allowed in many cases, but crashed instead of giving an
error.
Rick Macklem [Wed, 26 Aug 2020 21:49:43 +0000 (21:49 +0000)]
Fix a "v_seqc_users == 0 not met" panic when VFS_STATFS() fails during mount.
r363210 introduced v_seqc_users to the vnodes. This change requires
a vn_seqc_write_end() to match the vn_seqc_write_begin() in
vfs_cache_root_clear().
mjg@ provided this patch which seems to fix the panic.
Tested for an NFS mount where the VFS_STATFS() call will fail.
John Baldwin [Wed, 26 Aug 2020 21:17:18 +0000 (21:17 +0000)]
Simplify compat shims for /dev/crypto.
- Make session handling always use the CIOGSESSION2 structure.
CIOGSESSION requests use a thunk similar to COMPAT_FREEBSD32 session
requests. This permits the ioctl handler to use the 'crid' field
unconditionally.
- Move COMPAT_FREEBSD32 handling out of the main ioctl handler body
and instead do conversions in/out of thunk structures in dedicated
blocks at the start and end of the ioctl function.
Warner Losh [Wed, 26 Aug 2020 19:32:28 +0000 (19:32 +0000)]
Each entry in UPDATING needs a date
It's rare for there to be two updating entries on the same day (once a
decade or so), but we have that here. Add the date to the second one
since devd and zfs are unrelated.
Colin Percival [Wed, 26 Aug 2020 19:26:48 +0000 (19:26 +0000)]
Add -w option to lockf(1).
By default, lockf(1) opens its lock file O_RDONLY|O_EXLOCK. On NFS, if the
file already exists, this is split into opening the file read-only and then
requesting an exclusive lock -- and the second step fails because NFS does
not permit exclusive locking on files which are opened read-only.
The new -w option changes the open flags to O_WRONLY|O_EXLOCK, allowing it
to work on NFS -- at the cost of not working if the file cannot be opened
for writing.
(Whether the traditional BSD behaviour of allowing exclusive locks to be
obtained on a file which cannot be opened for writing is a good idea is
perhaps questionable since it may allow less-privileged users to perform
a local denial of service; however this behaviour has been present for a
long time and changing it now seems like it would cause problems.)
Mark Johnston [Wed, 26 Aug 2020 14:31:48 +0000 (14:31 +0000)]
Use a large kmem arena import size on NUMA systems.
This helps minimize internal fragmentation that occurs when 2MB imports
are interleaved across NUMA domains. Virtually all KVA allocations on
direct map platforms consume more than one page, so the fragmentation
manifests as runs of 511 4KB page mappings in the kernel.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26050
Mark Johnston [Wed, 26 Aug 2020 14:31:35 +0000 (14:31 +0000)]
vmem: Avoid allocating span tags when segments are never released.
vmem uses span tags to delimit imported segments, so that they can be
released if the segment becomes free in the future. However, the
per-domain kernel KVA arenas never release resources, so the span tags
between imported ranges are unused when the ranges are contiguous.
Furthermore, such span tags prevent coalescing of free segments across
KVA_QUANTUM boundaries, resulting in internal fragmentation which
inhibits superpage promotion in the kernel map.
Stop allocating span tags in arenas that never release resources. This
saves a small amount of memory and allows free segements to coalesce
across import boundaries. This manifests as improved kernel superpage
usage during poudriere runs, which also helps to reduce physical memory
fragmentation by reducing the number of broken partially populated
reservations.
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24548
Cy Schubert [Wed, 26 Aug 2020 13:13:57 +0000 (13:13 +0000)]
As of r364746 (OpenZFS import) existing ZPOOLs are not imported
prior to zvol and mountcritlocal resulting in ZVOLs (swap and
virtual machine UFS filesystems) being unavailable, leading to
boot failures.
We move the zpool import from zfs to a new zpool script, with the
-N option to avoid mounting datasets while making the ZPOOL's
datasets available for "legacy" mount (mountpoint=legacy) and ZVOLs
available for subsequent use for swap (in the zvol rc sript) or
for UFS or other filesystems in fstab(5), mounted by mountcritlocal.
Alex Richardson [Wed, 26 Aug 2020 10:21:38 +0000 (10:21 +0000)]
Avoid recomputing COMPILER_/LINKER_ variables when set explicitly
I noticed that when we build libraries for a different ABI (in CheriBSD) we
were calling ${XCC}/${LD} --version for every directory. It turns out that
this was caused by bsd.compat.mk explicitly setting (X_)COMPILER variables
for that build stage and this stops the _can_export logic from working.
To fix this, we change the check to only set _can_export=no if the variable
is set and it is set to a different value than the cached value.
This noticeably speeds up the tree walk while building compat libraries.
During an upstream amd64 buildworld this also removes 8 --version calls.
Alex Richardson [Wed, 26 Aug 2020 09:19:49 +0000 (09:19 +0000)]
Move libsqlite3 to the top of the SUBDIR list
In parallel builds, this should allow sqlite to start building earlier and
increase parallelism when building lib/. Looking at htop output during
buildworld/tinderbox, there are long phases where only one CPU is active
optimizing the massive sqlite3.c file since the build of libsqlite3 is
started quite late.
Alex Richardson [Wed, 26 Aug 2020 09:19:44 +0000 (09:19 +0000)]
Fix builds that set LD=ld.lld after r364761
When using relative paths for the linker we have to transform the name
since clang does not like -fuse-ld=ld.lld and instead requires -fuse-ld=lld
(the same also applies for ld.bfd).
Ed Maste [Wed, 26 Aug 2020 04:01:06 +0000 (04:01 +0000)]
Apply a big hammer for stale pre-OpenZFS files
-DNO_CLEAN builds have had trouble across the OpenZFS import. It's not
worth the effort to try to address this with any granularity; instead,
just trigger on a .depend file indicating a tree from before the import,
and remove the whole cddl object tree.
Alan Somers [Wed, 26 Aug 2020 02:44:35 +0000 (02:44 +0000)]
geli: use unmapped I/O
Use unmapped I/O for geli. Unlike most geom providers, geli needs to
manipulate data on every read or write. Previously it would always map bios.
On my 16-core, dual socket server using geli atop md(4) devices, with 512B
sectors, this change increases geli IOPs by about 3x.
Note that geli still can't use unmapped I/O when data integrity verification
is enabled (but it could, with a little more work). And it can't use
unmapped I/O in combination with ZFS, because ZFS uses mapped bios.
Alan Somers [Wed, 26 Aug 2020 02:37:42 +0000 (02:37 +0000)]
crypto(9): add CRYPTO_BUF_VMPAGE
crypto(9) functions can now be used on buffers composed of an array of
vm_page_t structures, such as those stored in an unmapped struct bio. It
requires the running to kernel to support the direct memory map, so not all
architectures can use it.
D Scott Phillips [Wed, 26 Aug 2020 02:05:58 +0000 (02:05 +0000)]
efibootmgr: Add option to request booting to the firmware user interface
The OsIndications UEFI variable can request the firware to stop at
its UI instead of continuing with boot. Add flags for setting and
clearing this request.
D Scott Phillips [Wed, 26 Aug 2020 02:04:04 +0000 (02:04 +0000)]
arm64: Make local stores observable before sending IPIs
Add a synchronizing instruction to flush and wait until the local
CPU's writes are observable to other CPUs before sending IPIs.
This fixes an issue where recipient CPUs doing a rendezvous could
enter the rendezvous handling code before the initiator's writes
to the smp_rv_* variables were visible. This manifested as a
system hang, where a single CPU's increment of smp_rv_waiters[0]
actually happened "before" the initiator's zeroing of that field,
so all CPUs were stuck with the field appearing to be at
ncpus - 1.
Matt Macy [Tue, 25 Aug 2020 23:26:52 +0000 (23:26 +0000)]
ZFS: whitelist zstd and encryption in the loader
Please note that neither zstd nor encryption is
supported by the loader at this instant. This
change makes it safe to use those features in
one's root pool, but not in one's root dataset.
Conrad Meyer [Tue, 25 Aug 2020 21:36:56 +0000 (21:36 +0000)]
vm_pageout: Scale worker threads with CPUs
Autoscale vm_pageout worker threads from r364129 with CPU count. The
default is arbitrarily chosen to be 16 CPUs per worker thread, but can
be adjusted with the vm.pageout_cpus_per_thread tunable.
There will never be less than 1 thread per populated NUMA domain, and
the previous arbitrary upper limit (at most ncpus/2 threads per NUMA
domain) is preserved.
Care is taken to gracefully handle asymmetric NUMA nodes, such as empty
node systems (e.g., AMD 2990WX) and systems with nodes of varying size
(e.g., some larger >20 core Intel Haswell/Broadwell Xeon).
Dimitry Andric [Tue, 25 Aug 2020 20:07:11 +0000 (20:07 +0000)]
After r364423, which ensures the callbacks that dl_iterate_phdr(3)
performs are protected by an exclusive lock, even for statically linked
programs, it is safe to re-enable libunwind's FrameHeaderCache, which I
temporarily disabled in r364263.
Meanwhile upstream has also used the _LIBUNWIND_USE_FRAME_HEADER_CACHE
for this purpose, so the only thing needed is to add this as a
compile-time command line flag.
While here, reformat the CFLAGS lines a little bit.
Dimitry Andric [Tue, 25 Aug 2020 19:57:11 +0000 (19:57 +0000)]
After r364753, there should be no need to suppress -Watomic-alignment
warnings anymore for compiler-rt's atomic.c. This occurred because the
IS_LOCK_FREE_8 macro was not correctly defined to 0 for mips, and this
caused the compiler to emit a runtime call to __atomic_is_lock_free(),
and that triggers the warning.
Brandon Bergren [Tue, 25 Aug 2020 19:04:54 +0000 (19:04 +0000)]
[PowerPC] More preemptive powerpcspe ZFS build fixes
I went through the merge and found the rest of the instances where
${MACHINE_ARCH} == "powerpc" was being used to detect 32-bit and adjusted
the rest of the instances to also check for powerpcspe.
Ryan Moeller [Tue, 25 Aug 2020 18:22:30 +0000 (18:22 +0000)]
Fix zstd in OpenZFS module with CPUTYPE?=<something with BMI>
The build breaks when something adds -march=<something with BMI> to the
compiler flags, for example CPUTYPE?=native. When the arch supports BMI,
__BMI__ is defined and zstd.c tries to include immintrin.h, which is not
present when building the kernel.
Disable experimental BMI intrinsics in zstd in the OpenZFS kernel module
by explicitly undefining __BMI__ for zstd.c.
A similar fix was needed for the original zstd import, done in r327738.
Reported by: Jakob Alvermark
Discussed with: mmacy
Sponsored by: iXsystems, Inc.
Kyle Evans [Tue, 25 Aug 2020 18:16:40 +0000 (18:16 +0000)]
libbe: lift the WARNS post-OpenZFS merge
sys/ccompile.h no longer uses #pragma ident, so we no longer need to worry
about unknown pragmas.
I fixed one WARNS issue in r363409 by annotating be_is_auto_snapshot_name's
lbh parameter __unused, then upstreamed the following changes to OpenZFS
that rode in with the merge:
- zfs_path_to_zhandle now takes a const char *path rather than a char *path,
since it won't be mutating the string it receives and I had no reason to
believe it will need to in the future. [OpenZFS PR #10605]
- Annotated some unused parameters on definitions inlined into headers as
such. [OpenZFS PR #10606]
Bjoern A. Zeeb [Tue, 25 Aug 2020 16:09:23 +0000 (16:09 +0000)]
rtsol(d): add script for "M bit"
While we do support the "O bit" running a script (usually to start a
dhcpv6 client) we have no options for setups which set the "M bit" for,
e.g., static address assignment as in EC2.
Duplicate most of the "O bit" logic to also start a script for the
"M bit" with the one difference: if the "M bit" is set we will not
start the script for the "O bit" as well (per RFC 4861, Section 4.2).
At initialization time, the netmap RX refill function used to
prepare the NIC RX ring with N-1 buffers rather than N (with
N equal to the number of descriptors in the NIC RX ring).
This is not how netmap is supposed to work, as it would keep
kring->nr_hwcur not in sync with the NIC "next index to refill"
(i.e., fl->ifl_pidx). Instead we prepare N buffers, although we
still publish (with isc_rxd_flush()) only the first N-1 buffers,
to avoid the NIC producer pointer to overrun the NIC consumer
pointer (for NICs where this is a real issue, e.g. Intel ones).
Mark Johnston [Tue, 25 Aug 2020 13:45:06 +0000 (13:45 +0000)]
Permit vm_page_wire() to be called on pages not belonging to an object.
For such pages ref_count is effectively a consumer-managed field, but
there is no harm in calling vm_page_wire() on them.
vm_page_unwire_noq() handles them as well. Relax the vm_page_wire()
assertions to permit this case which is triggered by some out-of-tree
code. [1]
Also guard a conditional assertion with INVARIANTS. Otherwise the
conditions are evaluated even though the result is unused. [2]
Reported by: bz, cem [1], kib [2]
Reviewed by: dougm, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26173
Alex Richardson [Tue, 25 Aug 2020 13:30:34 +0000 (13:30 +0000)]
Fix -Wundef warnings when building liblua
We need to define the LUA_FLOAT_INT64 macro even if we don't use it (copied
from stand/luaconf.h). While touching luaconf.h.dist also sync it with the
the 5.3.5 release version (matches the one in lib/liblua).
Alex Richardson [Tue, 25 Aug 2020 13:30:19 +0000 (13:30 +0000)]
Fix typo in r364325 that broke tinderbox with -DBUILD_WITH_STRICT_TMPPATH
${TARGET_ARCH} is empty here which results in empy MAKE_PARAMS being
passed to the buildkernel phase. This breaks the build when using the
strict TMPPATH since cc will not be included in $PATH.
Alex Richardson [Tue, 25 Aug 2020 13:30:03 +0000 (13:30 +0000)]
Pass -fuse-ld=/path/to/ld if ${LD} != "ld"
This is needed so that setting LD/XLD is not ignored when linking with $CC
instead of directly using $LD. Currently only clang accepts an absolute
path for -fuse-ld= (Clang 12+ will add a new --ld-path flag), so we now
warn when building with GCC and $LD != "ld" since that might result in the
wrong linker being used.
We have been setting XLD=/path/to/cheri/ld.lld in CheriBSD for a long time and
used a similar version of this patch to avoid linking with /usr/bin/ld.
This change is also required when building FreeBSD on an Ubuntu with Clang:
In that case we set XCC=/usr/lib/llvm-10/bin/clang and since
/usr/lib/llvm-10/bin/ does not contain a "ld" binary the build fails with
`clang: error: unable to execute command: Executable "ld" doesn't exist!`
unless we pass -fuse-ld=/usr/lib/llvm-10/bin/ld.lld.
This change passes -fuse-ld instead of copying ${XLD} to WOLRDTMP/bin/ld
since then we would have to ensure that this file does not exist while
building the bootstrap tools. The cross-linker might not be compatible with
the host linker (e.g. when building on macos: host-linker= Mach-O /usr/bin/ld,
cross-linker=LLVM ld.lld).
Alex Richardson [Tue, 25 Aug 2020 13:29:57 +0000 (13:29 +0000)]
Add necessary Makefile.inc1 infrastructure for building on non-FreeBSD
The most awkward bit in this patch is the bootstrapping of m4:
We can't simply use the host version of m4 since that is not compatible
with the flags passed by lex (at least on macOS, possibly also on Linux).
Therefore we need to bootstrap m4, but lex needs m4 to build and m4 also
depends on lex (which needs m4 to generate any files). To work around this
cyclic dependency we can build a bootstrap version of m4 (with pre-generated
files) then use that to build the real m4.
This patch also changes the xz/unxz/dd tools to always use the host version
since the version in the source tree cannot easily be bootstrapped on macOS
or Linux.
Alex Richardson [Tue, 25 Aug 2020 13:23:31 +0000 (13:23 +0000)]
Add missing FreeBSD functions to -legacy when building on macOS/Linux
In most cases this simply builds the file from lib/libc for missing
functions (e.g. strlcpy on Linux etc.). In cases where this is not possible
I've added an implementation to tools/build/cross-build.
The fgetln.c/fgetwln.c/closefrom.c compatibility code was obtained from
https://gitlab.freedesktop.org/libbsd/libbsd, but I'm not sure it makes
sense to import it into to contrib just for these three bootstrap files.
Alex Richardson [Tue, 25 Aug 2020 13:18:53 +0000 (13:18 +0000)]
Add Linux/macOS compatibility system headers to tools/build/cross-build
These headers are required in order to build the bootstrap tools on macOS
and Linux. A follow-up commit will add implementations of functions that
don't exist on those operating systems to -legacy when bootstrapping.
Michael Tuexen [Tue, 25 Aug 2020 09:42:03 +0000 (09:42 +0000)]
RFC 3465 defines a limit L used in TCP slow start for limiting the number
of acked bytes as described in Section 2.2 of that document.
This patch ensures that this limit is not also applied in congestion
avoidance. Applying this limit also in congestion avoidance can result in
using less bandwidth than allowed.
Reported by: l.tian.email@gmail.com
Reviewed by: rrs, rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D26120
Dimitry Andric [Tue, 25 Aug 2020 06:49:10 +0000 (06:49 +0000)]
Add atomic and bswap functions to libcompiler_rt
There have been several mentions on our mailing lists about missing
atomic functions in our system libraries (e.g. __atomic_load_8 and
friends), and recently I saw __bswapdi2 and __bswapsi2 mentioned too.
To address this, add implementations for the functions from compiler-rt
to the system compiler support libraries, e.g. libcompiler_rt.a and and
libgcc_s.so.
This also needs a small fixup in compiler-rt's atomic.c, to ensure that
32-bit mips can build correctly.
Bump __FreeBSD_version to make it easier for port maintainers to detect
when these functions were added.
Matt Macy [Tue, 25 Aug 2020 02:21:27 +0000 (02:21 +0000)]
Merge OpenZFS support in to HEAD.
The primary benefit is maintaining a completely shared
code base with the community allowing FreeBSD to receive
new features sooner and with less effort.
I would advise against doing 'zpool upgrade'
or creating indispensable pools using new
features until this change has had a month+
to soak.
Work on merging FreeBSD support in to what was
at the time "ZFS on Linux" began in August 2018.
I first publicly proposed transitioning FreeBSD
to (new) OpenZFS on December 18th, 2018. FreeBSD
support in OpenZFS was finally completed in December
2019. A CFT for downstreaming OpenZFS support in
to FreeBSD was first issued on July 8th. All issues
that were reported have been addressed or, for
a couple of less critical matters there are
pull requests in progress with OpenZFS. iXsystems
has tested and dogfooded extensively internally.
The TrueNAS 12 release is based on OpenZFS with
some additional features that have not yet made
it upstream.
Improvements include:
project quotas, encrypted datasets,
allocation classes, vectorized raidz,
vectorized checksums, various command line
improvements, zstd compression.
Thanks to those who have helped along the way:
Ryan Moeller, Allan Jude, Zack Welch, and many
others.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25872
Rick Macklem [Tue, 25 Aug 2020 00:58:14 +0000 (00:58 +0000)]
Fix hangs with processes stuck sleeping on btalloc on i386.
r358097 introduced a problem for i386, where kernel builds will intermittently
get hung, typically with many processes sleeping on "btalloc".
I know nothing about VM, but received assistance from rlibby@ and markj@.
rlibby@ stated the following:
It looks like the problem is that
for systems that do not have UMA_MD_SMALL_ALLOC, we do
uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
but we haven't set an appropriate free function. This is probably why
UMA_ZONE_NOFREE was originally there. When NOFREE was removed, it was
appropriate for systems with uma_small_alloc.
So by default we get page_free as our free function. That calls
kmem_free, which calls vmem_free ... but we do our allocs with
vmem_xalloc. I'm not positive, but I think the problem is that in
effect we vmem_xalloc -> vmem_free, not vmem_xfree.
Three possible fixes:
1: The one you tested, but this is not best for systems with
uma_small_alloc.
2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
3: Actually provide an appropriate vmem_bt_free function.
I think we should just do option 2 with a comment, it's simple and it's
what we used to do. I'm not sure how much benefit we would see from
option 3, but it's more work.
This patch implements #2. I haven't done a comment, since I don't know
what the problem is.
markj@ noted the following:
I think the suggested patch is ok, but not for the reason stated.
On platforms without a direct map the problem is:
to allocate btags we need a slab,
and to allocate a slab we need to map a page, and to map a page we need
to allocate btags.
We handle this recursion using a custom slab allocator which specifies
M_USE_RESERVE, allowing it to dip into a reserve of free btags.
Because the returned slab can be used to keep the reserve populated,
this ensures that there are always enough free btags available to
handle the recursion.
UMA_ZONE_NOFREE ensures that we never reclaim free slabs from the zone.
However, when it was removed, an apparent bug in UMA was exposed:
keg_drain() ignores the reservation set by uma_zone_reserve()
in vmem_startup().
So under memory pressure we reclaim the free btags that are needed to
break the recursion.
That's why adding _NOFREE back fixes the problem: it disables the
reclamation.
We could perhaps fix it more cleverly, by modifying keg_drain() to always
leave uk_reserve slabs available.
markj@'s initial patch failed testing, so committing this patch was agreed
upon as the interim solution.
Either rlibby@ or markj@ might choose to add a comment to it.
Niclas Zeising [Mon, 24 Aug 2020 22:53:23 +0000 (22:53 +0000)]
drm2: Update deprecation message
Update the deprecation message in the drm2 (aka legacy drm) drivers to point
towards the graphics/drm-kmod ports for all architectures, not just amd64.
drm-kmod has support for more architectures these days, and the
graphics/drm-legacy-kmod port is being deprecated.