stats(3): Improve t-digest merging of samples which result in mu adjustment underflow.
Allow the calculation of the mu adjustment factor to underflow instead of
rejecting the VOI sample from the digest and logging an error. This trades off
some (currently unquantified) additional centroid error in exchange for better
fidelity of the distribution's density, which is the right trade off at the
moment until follow up work to better handle and track accumulated error can be
undertaken.
Rick Macklem [Fri, 19 Mar 2021 21:09:33 +0000 (14:09 -0700)]
nfsv4 client: fix forced dismount when sleeping on nfsv4lck
During a recent NFSv4 testing event a test server caused a hang
where "umount -N" failed. The renew thread was sleeping on "nfsv4lck"
and the "umount" was sleeping, waiting for the renew thread to
terminate.
This is the first of two patches that is hoped to fix the renew thread
so that it will terminate when "umount -N" is done on the mount.
nfsv4_lock() checks for forced dismount, but only after it wakes up
from msleep(). Without this patch, a wakeup() call was required.
This patch adds a 1second timeout on the msleep(), so that it will
wake up and see the forced dismount flag. Normally a wakeup()
will occur in less than 1second, but if a premature return from
msleep() does occur, it will simply loop around and msleep() again.
While here, replace the nfsmsleep() wrapper that was used for portability
with the actual msleep() call and make the same change for nfsv4_getref().
Kyle Evans [Wed, 24 Mar 2021 04:31:02 +0000 (23:31 -0500)]
libevent1: fix layout of duplicated RB_ENTRY() definition
3a509754ded1 removed the color field from our definition, but libevent1
has a copy of it off to the side to prevent event.h consumers from
*needing* to pull in sys/queue.h and sys/tree.h.
Update the event.h definition so that we don't accidentally end up with
two different views of struct event.
This appears to have no functional effect on anything in tree, but this
came up in a local patch to port if_switch(4) and related components
from OpenBSD.
Kyle Evans [Wed, 3 Mar 2021 03:38:37 +0000 (21:38 -0600)]
init: use explicit_bzero() for clearing passwords
This is a nop in practice, because it cannot be proven that this
particular bzero() is not significant. Make it explicit anyways, rather
than relying on an implementation detail of how the password is
collected.
Discussed with: Andrew Gierth <andrew tao146 riddles org uk>
Rick Macklem [Thu, 18 Mar 2021 19:20:25 +0000 (12:20 -0700)]
nfsv4 pnfs client: fix updating of the layout stateid.seqid
During a recent NFSv4 testing event a test server was replying
NFSERR_OLDSTATEID for layout stateids presented to the server
for LayoutReturn operations. Upon rereading RFC5661, it was
apparent that the FreeBSD NFSv4.1/4.2 pNFS client did not
maintain the seqid field of the layout stateid correctly.
This patch is believed to correct the problem. Tested against
a FreeBSD pNFS server with diagnostics added to check the stateid's
seqid did not indicate problems. Unfortunately, testing aginst
this server will not happen in the near future, so the fix may
not be correct yet.
ipf_proxy_check() returns -1 for an error and 0 or 1 for success.
ipf_proxy_check()'s callers check for error and if the return code
is 0, they change it to 1 prior to returning to their callers. Simply
by returning -1 or 1 we reduce complexity and cycles burned changing
0 to 1.
The computed IPOIB_CM_RX_SG is too small. It doesn't account for fallback
to mbuf clusters when jumbo frames are not available and it also doesn't
account for the packet header and trailer mbuf.
This causes a memory overwrite situation when IPOIB_CM is configured.
While at it add a kernel assert to ensure the mapping array is not overwritten.
Kristof Provost [Mon, 15 Mar 2021 13:10:55 +0000 (14:10 +0100)]
pf tests: pfsync bulk update test
Test that pfsync works as expected with bulk updates. That is, create
some state before setting up the second firewall. Let that firewall
request a bulk update so it can catch up, and check that it got the
state which was created before it enable pfsync.
Fetch the sigfastblock value in syscalls that wait for signals
We have seen several cases of processes which have become "stuck" in
kern_sigsuspend(). When this occurs, the kernel's td_sigblock_val
is set to 0x10 (one block outstanding) and the userspace copy of the
word is set to 0 (unblocked). Because the kernel's cached value
shows that signals are blocked, kern_sigsuspend() blocks almost all
signals, which means the process hangs indefinitely in sigsuspend().
It is not entirely clear what is causing this condition to occur.
However, it seems to make sense to add some protection against this
case by fetching the latest sigfastblock value from userspace for
syscalls which will sleep waiting for signals. Here, the change is
applied to kern_sigsuspend() and kern_sigtimedwait().
John Baldwin [Fri, 12 Mar 2021 17:47:31 +0000 (09:47 -0800)]
amd64: Cleanups to setting TLS registers for Linux binaries.
- Use update_pcb_bases() when updating FS or GS base addresses to
permit use of FSBASE and GSBASE in Linux processes. This also sets
PCB_FULL_IRET. linux32 was setting PCB_32BIT which should be a
no-op (exec sets it).
- Remove write-only variables to construct unused segment descriptors
for linux32.
John Baldwin [Fri, 12 Mar 2021 17:45:18 +0000 (09:45 -0800)]
amd64: Only update fsbase/gsbase in pcb for curthread.
Before the pcb is copied to the new thread during cpu_fork() and
cpu_copy_thread(), the kernel re-reads the current register values in
case they are stale. This is done by setting PCB_FULL_IRET in
pcb_flags.
This works fine for user threads, but the creation of kernel processes
and kernel threads do not follow the normal synchronization rules for
pcb_flags. Specifically, new kernel processes are always forked from
thread0, not from curthread, so adjusting pcb_flags via a simple
instruction without the LOCK prefix can race with thread0 running on
another CPU. Similarly, kthread_add() clones from the first thread in
the relevant kernel process, not from curthread. In practice, Netflix
encountered a panic where the pcb_flags in the first kthread of the
KTLS process were trashed due to update_pcb_bases() in
cpu_copy_thread() running from thread0 to create one of the other KTLS
threads racing with the first KTLS kthread calling fpu_kern_thread()
on another CPU. In the panicking case, the write to update pcb_flags
in fpu_kern_thread() was lost triggering an "Unregistered use of FPU
in kernel" panic when the first KTLS kthread later tried to use the
FPU.
Caleb St. John [Fri, 26 Mar 2021 18:00:14 +0000 (14:00 -0400)]
rpc.lockd: Unconditionally close fds as daemon
When lockd is configured with a debug level of > 0 and foreground == 0,
the process is daemonized with a truth noclose argument to daemon().
This doesn't seem to be the desired behavior because that prevents
stdout and stderr from being closed, however, stdout and stderr aren't
used anywhere else. Furthermore, the man pages state that with a higher
debug level it will use the syslog facilities to do so.
Submitted by: Caleb St. John
Discussed with: rmacklem
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D29415
Caleb St. John [Wed, 24 Mar 2021 20:33:41 +0000 (16:33 -0400)]
align nfsdumpstate column output
There are scenarios where an NFS client will mount an NFSv4 export
without specifying a callback address.
When running nfsdumpstate under this circumstance, the column output is
shifted incorrectly which places the "ClientID" value underneath the
"Clientaddr" column.
This diff is a small cosmetic change that prints a blank in the
"Clientaddr" column and ensures the data for the columns are aligned
appropriately.
Submitted by: Caleb St. John
Reviewed by: sef (previous version)
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D18958
Wei Hu [Fri, 12 Mar 2021 04:35:16 +0000 (04:35 +0000)]
Hyper-V: hn: Enable vSwitch RSC support in hn netvsc driver
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffics are among different guests on same host.
This patch adds netvsc driver support for this feature.
The patch also updates NVS version to 6.1 as needed for RSC
enablement.
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D29075
Wei Hu [Wed, 24 Feb 2021 05:07:46 +0000 (05:07 +0000)]
Hyper-V: hn: Store host hash value in flowid
When rx packet contains hash value sent from host, store it in
the mbuf's flowid field so when the same mbuf is on the tx path,
the hash value can be used by the host to determine the outgoing
network queue.
Zero `struct weightened_nhop` fields in nhgrp_get_addition_group().
`struct weightened_nhop` has spare 32bit between the fields due to
the alignment (on amd64).
Not zeroing these spare bits results in duplicating nhop groups
in the kernel due to the way how comparison works.
Mark Johnston [Thu, 25 Mar 2021 21:55:20 +0000 (17:55 -0400)]
accept_filter: Fix filter parameter handling
For filters which implement accf_create, the setsockopt(2) handler
caches the filter name in the socket, but it also incorrectly frees the
buffer containing the copy, leaving a dangling pointer. Note that no
accept filters provided in the base system are susceptible to this, as
they don't implement accf_create.
Reported by: Alexey Kulaev <alex.qart@gmail.com>
Discussed with: emaste
Security: kernel use-after-free
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 23 Mar 2021 13:38:59 +0000 (09:38 -0400)]
pf: Handle unmapped mbufs when computing checksums
PR: 254419
Reviewed by: gallatin, kp
Tested by: Igor A. Valkov <viaprog@gmail.com>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29378
Rick Macklem [Tue, 9 Mar 2021 00:08:02 +0000 (16:08 -0800)]
mountd(8): generate a syslog message when the "V4:" line is missing
Daniel reported that NFSv4 mounts were not working despite having
set "nfsv4_server_enable=YES" in /etc/rc.conf. Mountd was logging a
message that there was no /etc/exports file.
He noted that creating a /etc/exports file with a "V4:" line in it
was needed make NFSv4 mounts work.
At least one "V4:" line in one of the exports(5) file(s) is needed to
make NFSv4 mounts work. This patch fixes mountd.c so that it logs a
message indicting that there is no "V4:" line in any exports(5)
file when NFSv4 mounts are enabled.
To avoid this message being generated erroneously, /etc/rc.d/mountd
is updated to make sure vfs.nfsd.server_max_nfsvers is properly set
before mountd(8) is started.
Emmanuel Vadot [Sat, 27 Mar 2021 11:04:51 +0000 (12:04 +0100)]
release: amd64: Fix ISO/USB hybrid image
Recent mkimg changes forces to have partitions given in explicit order.
This is so we can have the first partition starting at a specific offset
and the next ones starting after without having to specify an offset.
Switch the partition in the mkisoimage.sh script so the first one created
is the isoboot one.
PR: 254490
Reported by: Michael Dexter <editor@callfortesting.org
Tested by: Vincent Milum Jr <freebsd@darkain.com>
MFC after: Right now
Jessica Clarke [Sat, 20 Mar 2021 17:58:10 +0000 (17:58 +0000)]
elftoolchain: Support building on Arm-based Macs
Currently macOS and DragonFlyBSD get their own special case and only
handle x86. Since all the FreeBSD cases should be general enough for
macOS and DragonFlyBSD (and the x86 ones are identical to the existing
ones) we can just delete the special cases and reuse the FreeBSD ones.
Note that upstream has since removed all the architecture-specific
checks in this file, with the only code relevant to us being an
endianness check that uses the generic compiler-provided macros. Thus
this patch will not be upstreamed, and will be dropped in a future
vendor import.
Jessica Clarke [Sat, 20 Mar 2021 13:00:34 +0000 (13:00 +0000)]
tools/build: Improve host-symlinks failure mode
Since set -e is enabled by sys.mk, if the tool cannot be found in PATH
then the entire shell command line fails, causing us to not print the
error message below and instead silently (due to the @) fail, only
getting the usual "Error code 1" print from bmake. Thus, provide a dummy
default that will never exist (the same as is used by meta2deps.sh) if
which fails so that we get the error message as intended.
D Scott Phillips [Thu, 18 Mar 2021 16:08:52 +0000 (00:08 +0800)]
bhyve: support relocating fbuf and passthru data BARs
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.
We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.
[1]: https://github.com/freebsd/uefi-edk2/pull/9/
Originally by: scottph
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066
Plug nexthop group refcount leak.
In case with batch route delete via rib_walk_del(), when
some paths from the multipath route gets deleted, old
multipath group were not freed.
Robert Watson [Sun, 21 Mar 2021 00:01:54 +0000 (00:01 +0000)]
Tune DTrace 'aframes' for the FBT and profile providers on arm64.
In both cases, too few frames were trimmed, leading to exception handling
or DTrace internals being exposed in stack traces exposed by D's stack()
primitive.
Reviewed by: emaste, andrew
Differential Revision: https://reviews.freebsd.org/D29356
Lawrence Stewart [Wed, 24 Mar 2021 04:25:49 +0000 (15:25 +1100)]
random(9): Restore historical [0,2^31-1] output range and related man documention.
Commit SVN r364219 / Git 8a0edc914ffd changed random(9) to be a shim around
prng32(9) and inadvertently caused random(9) to begin returning numbers in the
range [0,2^32-1] instead of [0,2^31-1], where the latter has been the documented
range for decades.
The increased output range has been identified as the source of numerous bugs in
code written against the historical output range e.g. ipfw "prob" rules and
stats(3) are known to be affected, and a non-exhaustive audit of the tree
identified other random(9) consumers which are also likely affected.
As random(9) is deprecated and slated for eventual removal in 14.0, consumers
should gradually be audited and migrated to prng(9).
On FreeBSD/arm fill_fpregs, fill_dbregs are stubs that zero the reg
struct and return success. set_fpregs and set_dbregs do nothing and
return success.
Provide the same implementation for arm64 COMPAT_FREEBSD32.
Reviewed by: andrew
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29314
Mark Johnston [Sun, 21 Mar 2021 18:18:10 +0000 (14:18 -0400)]
rtsold: Fix validation of RDNSS options
The header specifies the size of the option in multiples of eight bytes.
The option consists of an eight-byte header followed by one or more IPv6
addresses, so the option is invalid if the size is not equal to 1+2n for
some n>0. Check this.
The bug can cause random stack data to be formatted as an IPv6 address
and passed to resolvconf(8), but a host able to trigger the bug may also
specify arbitrary addresses this way.
Reported by: Q C <cq674350529@gmail.com>
Sponsored by: The FreeBSD Foundation
wpa: import fix for P2P provision discovery processing vulnerability
Latest version available from: https://w1.fi/security/2021-1/
Vulnerability
A vulnerability was discovered in how wpa_supplicant processes P2P
(Wi-Fi Direct) provision discovery requests. Under a corner case
condition, an invalid Provision Discovery Request frame could end up
reaching a state where the oldest peer entry needs to be removed. With
a suitably constructed invalid frame, this could result in use
(read+write) of freed memory. This can result in an attacker within
radio range of the device running P2P discovery being able to cause
unexpected behavior, including termination of the wpa_supplicant process
and potentially code execution.
Vulnerable versions/configurations
wpa_supplicant v1.0-v2.9 with CONFIG_P2P build option enabled
An attacker (or a system controlled by the attacker) needs to be within
radio range of the vulnerable system to send a set of suitably
constructed management frames that trigger the corner case to be reached
in the management of the P2P peer table.
Alexander Motin [Wed, 17 Mar 2021 14:30:40 +0000 (10:30 -0400)]
nvme: Replace potentially long DELAY() with pause().
In some cases like broken hardware nvme(4) may wait minutes for
controller response before timeout. Doing so in a tight spin loop
made whole system unresponsive.
- Call vm_object_reference() before vm_map_lookup_done().
- Use vm_mmap_to_errno() to convert vm_map_* return values to errno.
- Fix memory leak of e->obj.
Nathan Whitehorn [Tue, 23 Mar 2021 13:19:42 +0000 (09:19 -0400)]
Fix scripted installs on EFI systems after default mounting of the ESP.
Because the ESP mount point (/boot/efi) is in mtree, tar will attempt to
extract a directory at that point post-mount when the system is installed.
Normally, this is fine, since tar can happily set whatever properties it
wants. For FAT32 file systems, however, like the ESP, tar will attempt to
set mtime on the root directory, which FAT does not support, and tar will
interpret this as a fatal error, breaking the install (see
https://github.com/libarchive/libarchive/issues/1516). This issue would
also break scripted installs on bare-metal POWER8, POWER9, and PS3
systems, as well as some ARM systems.
This patch solves the problem in two ways:
- If stdout is a TTY, use the distextract stage instead of tar, as in
interactive installs. distextract solves this problem internally and
provides a nicer UI to boot, but requires a TTY.
- If stdout is not a TTY, use tar but, as a stopgap for 13.0, exclude
boot/efi from tarball extraction and then add it by hand. This is a
hack, and better solutions (as in the libarchive ticket above) will
obsolete it, but it solves the most common case, leaving only
unattended TTY-less installs on a few tier-2 platforms broken.
In addition, fix a bug with fstab generation uncovered once the tar issue
is fixed that umount(8) can depend on the ordering of lines in fstab in a
way that mount(8) does not. The partition editor now writes out fstab in
mount order, making sure umount (run at the end of scripted, but not
interactive, installs) succeeds.
PR: 254395
Approved by: re (gjb)
Reviewed by: gjb, imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29380
Kristof Provost [Thu, 11 Mar 2021 10:37:05 +0000 (11:37 +0100)]
pf: pool/kpool conversion code
stuct pf_pool and struct pf_kpool are different. We should not simply
bcopy() them.
Happily it turns out that their differences were all pointers, and the
userspace provided pointers were overwritten by the kernel, so this did
actually work correctly, but we should fix it anyway.
MFC 6eb60f5b7f7d:
Use the word "LinuxKPI" instead of "Linux compatibility", to not confuse with
user-space Linux compatibility support. No functional change.
John Baldwin [Thu, 11 Feb 2021 21:51:20 +0000 (13:51 -0800)]
iscsi: Mark iSCSI CAM sims as non-pollable.
Previously, iscsi_poll() just panicked. This meant if you got a panic
on a box when using the iSCSI initiator, the attempt to shutdown would
trigger a nested panic and never write out a core. Now, CCB's sent to
iSCSI devices (such as the sychronize-cache request in dashutdown())
just fail with a timeout during a panic shutdown.
John Baldwin [Thu, 11 Feb 2021 21:51:01 +0000 (13:51 -0800)]
cam: Don't permit crashdumps on non-pollable devices.
If a disk's SIM doesn't support polling, then it can't be used to
store crashdumps. Leave d_dump NULL in that case so that dumpon(8)
fails gracefully rather than having dumps fail at crash time.
John Baldwin [Thu, 11 Feb 2021 21:49:43 +0000 (13:49 -0800)]
cam: Permit non-pollable sims.
Some CAM sim drivers do not support polling (notably iscsi(4)).
Rather than using a no-op poll routine that always times out requests,
permit a SIM to set a NULL poll callback. cam_periph_runccb() will
fail polled requests non-pollable sims immediately as if they had
timed out.
Mitchell Horne [Mon, 15 Mar 2021 13:46:03 +0000 (10:46 -0300)]
armv8crypto: note derivation in armv8_crypto_wrap.c
This file inherits some boilerplate and structure from the analogous
file in aesni(4), aesni_wrap.c. Note the derivation and the copyright
holders of that file.
For example, the AES-XTS bits added in 4979620ece984 were ported from
aesni(4).
Mark Johnston [Mon, 8 Mar 2021 17:39:06 +0000 (12:39 -0500)]
iflib: Make if_shared_ctx_t a pointer to const
This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface. This is especially important for
software-only drivers like wg.
DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.
Reviewed by: gallatin, erj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29102
Leandro Lupori [Tue, 9 Mar 2021 15:11:58 +0000 (12:11 -0300)]
ofwfb: fix boot on LE
Some framebuffer properties obtained from the device tree were not being
properly converted to host endian.
Replace OF_getprop calls by OF_getencprop where needed to fix this.
This fixes boot on PowerPC64 LE, when using ofwfb as the system console.
Reviewed by: bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27475
Mike Karels [Thu, 18 Mar 2021 00:19:24 +0000 (19:19 -0500)]
genet: Fix problem with forwarding some TCP/IPv6 packets
TCP/IPv6 packets to be forwarded can be laid out with only the Ethernet
header in the first mbuf, and these packets are lost. There was a
previous hack to pullup ICMPv6 packets with such a layout for the
same reason. Generalize, and pullup any IPv6 packets with only the
Ethernet header in the first mbuf. Possibly this should also include
IPv4, but that situation has not been observed to fail.
PR: 254060
Reported by: denis at h3q.com
MFC after: 3 days
Alan Somers [Sat, 27 Feb 2021 15:59:40 +0000 (08:59 -0700)]
Speed up geom_stats_resync in the presence of many devices
The old code had a O(n) loop, where n is the size of /dev/devstat.
Multiply that by another O(n) loop in devstat_mmap for a total of
O(n^2).
This change adds DIOCGMEDIASIZE support to /dev/devstat so userland can
quickly determine the right amount of memory to map, eliminating the
O(n) loop in userland.
This change decreases the time to run "gstat -bI0.001" with 16,384 md
devices from 29.7s to 4.2s.
Also, fix a memory leak first reported as PR 203097.
Gordon Bergling [Sat, 13 Mar 2021 18:28:26 +0000 (19:28 +0100)]
find(1): Refine the HISTORY within the manual page.
A simple find command appeared in Version 1 AT&T UNIX and was removed in
Version 3 AT&T UNIX. It was rewritten for Version 5 AT&T UNIX and later
be enhanced for the Programmer's Workbench (PWB). These changes were
later incorporated in AT&T UNIX v7.