From the (substantially larger) upstream commit:
+ call delay_output_sp to handle BSD-style padding when tputs_sp is
called, whether directly or internally, to ensure that the SCREEN
pointer is passed correctly (reports by Henric Jungheim, Juraj
Lutter).
This fixes bison segfaults observed when colourized output is enabled.
Thanks to jrtc27@ for identifying the upstream fix.
Add check that ifp supports IPv6 multicasts in in6_getmulti.
This fixes panic when user application tries to join into multicast
group on an interface that doesn't support IPv6 multicasts, like
IFT_PFLOG interfaces.
Alexander Motin [Fri, 30 Jul 2021 03:16:22 +0000 (23:16 -0400)]
coretemp(4): Switch to smp_rendezvous_cpus().
Use of smp_rendezvous_cpus() instead of sched_bind() allows to not
block indefinitely if target CPU is running some thread with higher
priority, while all we need is single rdmsr/wrmsr instruction call.
I guess it should also be much cheaper than full thread migration.
Alexander Motin [Fri, 30 Jul 2021 03:39:04 +0000 (23:39 -0400)]
ipmi(4): Add more watchdog error checks.
Add request submission status checks before checking req->ir_compcode,
otherwise it may be zero just because of initialization.
Add checks for req->ir_compcode errors in ipmi_reset_watchdog() and
ipmi_set_watchdog(). In first case explicitly check for 0x80, which
means timer was not previously set, that I found happening after BMC
cold reset. This change makes watchdog timer to recover instead of
permanently ignoring reset errors after BMC reset or upgraded.
John Baldwin [Sat, 12 Jun 2021 00:59:46 +0000 (17:59 -0700)]
bhyve vtblk: Inform guests of disk resize events.
Register a resize callback with the blockif interface. When the
callback fires, update the size of the disk and notify the guest via a
configuration change interrupt.
John Baldwin [Sat, 12 Jun 2021 00:59:25 +0000 (17:59 -0700)]
bhyve: Add support for handling disk resize events to block_if.
Allow clients of blockif to register a resize callback handler. When
a callback is registered, register an EVFILT_VNODE kevent watching the
backing store for a change in the file's attributes. If the size has
changed when the kevent fires, invoke the clients' callback.
Currently resize detection is limited to backing stores that support
EVFILT_VNODE kevents such as regular files.
John Baldwin [Sat, 12 Jun 2021 00:59:13 +0000 (17:59 -0700)]
bhyve: Add support for EVFILT_VNODE mevents.
This allows registering an event to watch for changes to a file's
attributes. This is a bit imperfect as it would be nice to have a way
to determine if an fd can use EVFILT_VNODE successfully. mevent's
current structure does not permit that and a failure to register a
single kevent impacts several other kevents.
John Baldwin [Sat, 12 Jun 2021 00:58:54 +0000 (17:58 -0700)]
bhyve: Register new kevents synchronously.
Change mevent_add*() to synchronously add the new kevent. This
permits reporting event registration failures to the caller and avoids
failing the registration of other, unrelated events queued up in the
same batch.
Mark Johnston [Thu, 29 Jul 2021 13:46:25 +0000 (09:46 -0400)]
link_elf_obj: Invoke fini callbacks
This is required for KASAN: when a module is unloaded, poisoned regions
(e.g., pad areas between global variables) are left as such, so if they
are reused as KLDs are loaded, false positives can arise.
Reported by: pho, Jenkins
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Mark Johnston [Thu, 29 Jul 2021 14:22:37 +0000 (10:22 -0400)]
amd64: Set GS.base before calling init_secondary() on APs
KMSAN instrumentation requires thread-local storage to track
initialization state for function parameters and return values. This
buffer is accessed as part of each function prologue. It is provided by
the KMSAN runtime, which looks up a pointer in the current thread's
structure.
When KMSAN is configured, init_secondary() is instrumented, but this
means that GS.base must be initialized first, otherwise the runtime
cannot safely access curthread. Work around this by loading GS.base
before calling init_secondary(), so that the runtime can at least check
curthread == NULL and return a pointer to some dummy storage. Note that
init_secondary() still must reload GS.base after calling lgdt(), which
loads a selector into %gs, which in turn clears the base register.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Ka Ho Ng [Tue, 30 Mar 2021 08:43:24 +0000 (16:43 +0800)]
bhyve: change vq_getchain to return iovecs in both directions
The old prototype requires callers to inspect flags of each descriptors
to get the starting position of host-writable iovecs.
vq_getchain() is changed to return a virtio request with the number of
host-readable iovecs and host-writable iovecs instead. Callers can avoid
boilerplate code of getting the start offset of host-writable iovecs.
Sponsored by: The FreeBSD Foundation
Reviewed by: afedorov
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29433
John Baldwin [Mon, 12 Apr 2021 18:43:34 +0000 (11:43 -0700)]
bhyve: Move the gdb_active check to gdb_cpu_suspend().
The check needs to be in the public routine (gdb_cpu_suspend()), not
in the internal routine called from various places
(_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend()
already check gdb_active, and this breaks the use of snapshots when
the debug server is not enabled since gdb_cpu_suspend() tries to lock
an uninitialized mutex.
Reported by: Darius Mihai, Elena Mihailescu
Reviewed by: elenamihailescu22_gmail.com
Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb
Differential Revision: https://reviews.freebsd.org/D29538
bhyve: fix regression in legacy virtio-9p config parsing
Commit 621b5090487de9fed1b503769702a9a2a27cc7bb introduced a regression
in legacy virtio-9p config parsing by not initializing *sharename to
NULL. As a result, "sharename != NULL" check in the first iteration fails
and bhyve exits with "virtio-9p: more than one share name given".
John Baldwin [Mon, 29 Mar 2021 17:25:45 +0000 (10:25 -0700)]
bhyve: Enable virtio-scsi legacy config parsing.
The previous commit added the handler to parse the command line
options for virtio-scsi devices but forgot to set the correct function
pointer to point to the handler.
John Baldwin [Wed, 24 Mar 2021 16:29:15 +0000 (09:29 -0700)]
bhyve hostbridge: Rename "device" property to "devid".
"device" is already used as the generic PCI-level name of the device
model to use (e.g. "hostbridge"). The result was that parsing
"hostbridge" as an integer failed and the host bridge used a device ID
of 0. The EFI ROM asserts that the device ID of the hostbridge is not
0, so booting with the current EFI ROM was failing during the ROM
boot.
John Baldwin [Wed, 26 Jun 2019 20:30:41 +0000 (13:30 -0700)]
Refactor configuration management in bhyve.
Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.
The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.
All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.
A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.
A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.
In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.
A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.
Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:
- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).
- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.
After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.
The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.
- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.
Mitchell Horne [Wed, 4 Aug 2021 17:31:36 +0000 (14:31 -0300)]
hwpmc: disable uncore class on Sandy Bridge and newer
It was written for Nehalem and Westmere, with minor but incomplete
updates for Sandy Bridge in 78d763a29b15. The uncore architecture
changed significantly with this generation, bringing new layouts and
locations for some MSRs.
Misprogramming these MSRs in ucp_start_pmc() may panic the system, and
this is trivially reproducible via pmcstat(8) on at least Broadwell and
Haswell. Disable the class on these CPUs until it can be updated more
completely and leave a TODO comment detailing some of the work required.
Note that the nclasses value for Broadwell was already incorrect and
doesn't need changing.
The result is that any uncore events listed by pmcstat -L will no longer
be allocatable, but this is already the case for newer generations of
Intel CPUs.
Numerous counters got migrated from straight uint64_t to the counter(9)
API. Unfortunately the implementation comes with a significiant
performance hit on some platforms and cannot be easily fixed.
Work around the problem by implementing a pf-specific variant.
Roy Marples [Wed, 28 Jul 2021 15:46:59 +0000 (08:46 -0700)]
socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.
This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.
Mark Johnston [Wed, 28 Jul 2021 14:16:42 +0000 (10:16 -0400)]
pf: Validate user string nul-termination before copying
Some pf ioctl handlers use strlcpy() to copy strings when converting
from user structures to their in-kernel representations. strlcpy()
ensures that the destination will be nul-terminated, but it assumes that
the source is nul-terminated. In particular, it returns the full length
of the source string, so if the source is not nul-terminated, strlcpy()
will keep scanning until it finds a nul byte, and it may encounter an
unmapped page first. Add a helper to validate user strings before
copying.
There are also places where we look up a ruleset using a user-provided
anchor string. In some ioctl handlers we were already nul-terminating
the string, avoiding the same problem, but in other places we were not.
Fix those by nul-terminating as well. Aside from being consistent,
anchors have a maximum length of MAXPATHLEN - 1 so calling strnlen()
might not be so desirable.
Reported by: syzbot+35a1549b4663e9483dd1@syzkaller.appspotmail.com
Reviewed by: kp
Sponsored by: The FreeBSD Foundation
Mark Johnston [Wed, 28 Jul 2021 14:16:25 +0000 (10:16 -0400)]
pf: Initialize arrays before copying out to userland
A number of pf ioctls populate an array of structures and copy it out.
They have the following structures:
- caller specifies the size of its output buffer
- ioctl handler allocates a kernel buffer of the same size
- ioctl handler populates the buffer, possibly leaving some items
initialized if the caller provided more space than needed
- ioctl handler copies the entire buffer out to userland
Thus, if more space was provided than is required, we end up copying out
uninitialized kernel memory. Simply zero the buffer at allocation time
to prevent this.
Reported by: KMSAN
Reviewed by: kp
Sponsored by: The FreeBSD Foundation
Alexander Motin [Wed, 28 Jul 2021 20:15:43 +0000 (16:15 -0400)]
Do not expose to scheduler caches of single CPU.
Before this change my dual-Xeon(R) Gold 6242R always reported 3 levels
or topology (root, package/L3 and core/L2). But with SMT disabled
core/L2 matches thread, so additional topology level only causes more
traversal work. With this change SMT case is reported same as before,
while non-SMT is reported with only 2 much more simple levels.
Kristof Provost [Mon, 2 Aug 2021 07:46:33 +0000 (09:46 +0200)]
pf: bound DIOCGETSTATES memory use
Similar to what we did earlier for DIOCGETSTATESV2 we only allocate
enough memory for a handful of states and copy those out, bit by bit,
rather than allocating memory for all states in one go.
pf: locally originating connections with 'route-to' fail
Similar to the REPLY_TO shortcut (6d786845cf) we also can't shortcut
ROUTE_TO. If we do we will fail to apply transformations or update the
state, which can lead to premature termination of the connections.
While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.
The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.
Andrew Turner [Thu, 8 Jul 2021 12:14:56 +0000 (12:14 +0000)]
Update the SCTLR_EL1 register definitions
They are valid as of the ARMv8.7 XML.
While here remove SCTLR_RES0 as it's unused and depends on which CPU
the kernel is running on and switch to shifted values as they are
easier to compare with the documentation.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31120
Andrew Turner [Mon, 14 Jun 2021 11:01:46 +0000 (11:01 +0000)]
Use the correct length when copying arm64 vfp registers
We passed the wrong length into memcpy in the arm64 get_fpcontext and
set_fpcontext. This caused us to copy two status registers we didn't
expect to copy.
These are safe as they exist in both the source and destination, although
in a different order, and we copy the correct values after the memcpy.
virtio: enable VTNET_LEGACY_TX when ALTQ is enabled.
ALTQ only works on network drivers which use if_start (rather than
if_transmit). vtnet uses if_start if built with VTNET_LEGACY_TX. Default
to that the kernel is built with ALTQ enabled, to reduce user surprise.
Alex Richardson [Mon, 2 Aug 2021 13:36:03 +0000 (14:36 +0100)]
Allow bootstrapping llvm-tblgen on macOS and Linux
This is needed in order to build various LLVM binutils (e.g. addr2line)
as well as clang/lld/lldb.
Co-authored-by: Jessica Clarke <jrtc27@FreeBSD.org>
Test Plan: Compiles on ubuntu 18.04 and macOS 11.4
Reviewed By: dim
Differential Revision: https://reviews.freebsd.org/D31057
Alex Richardson [Mon, 2 Aug 2021 08:45:05 +0000 (09:45 +0100)]
tools/build: Don't redefine open() for the linux bootstrap
This is needed to bootstrap llvm-tblgen on Linux since LLVM calls
`::open(...)` which does not work if open is a statement macro.
Also stop defining O_SHLOCK/O_EXLOCK and update the only bootstrap tools
user of those flags to deal with missing definitions.
Alex Richardson [Tue, 20 Jul 2021 15:04:56 +0000 (16:04 +0100)]
bsd.linker.mk: Detect LLD when built with PACKAGE_VENDOR
Recent versions of homebrew's LLD are built with PACKAGE_VENDOR (since
https://github.com/Homebrew/homebrew-core/commit/e7c972b6062af753e564104e58d1fa20c0d1ad7a).
This means that the -v output is now
`Homebrew LLD 12.0.1 (compatible with GNU linkers)` and bsd.linker.mk no
longer detects it as LLD since it only checks whether the first word is
LLD. This change allow me to build on macOS again and should unbreak the
GitHub actions CI.
Alex Richardson [Mon, 19 Jul 2021 14:04:19 +0000 (15:04 +0100)]
Allow building usr.bin/vi with MK_ASAN
We have to namespace the regex functions to avoid duplicate symbol errors.
This also ensures that vi doesn't define the libc reg* functions with
mismatched signatures.
ld: error: duplicate symbol: regcomp
>>> defined at sanitizer_common_interceptors.inc:7519 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:7519)
>>> asan_interceptors.o:(__interceptor_regcomp) in archive /usr/lib/clang/10.0.1/lib/freebsd/libclang_rt.asan-x86_64.a
>>> defined at regcomp.c
>>> .../regex/regcomp.c.o:(.text+0x0)
ld: error: duplicate symbol: regerror
>>> defined at sanitizer_common_interceptors.inc:7543 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:7543)
>>> asan_interceptors.o:(__interceptor_regerror) in archive /usr/lib/clang/10.0.1/lib/freebsd/libclang_rt.asan-x86_64.a
>>> defined at regerror.c
>>> .../regex/regerror.c.o:(.text+0x0)
ld: error: duplicate symbol: regexec
>>> defined at sanitizer_common_interceptors.inc:7530 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:7530)
>>> asan_interceptors.o:(__interceptor_regexec) in archive /usr/lib/clang/10.0.1/lib/freebsd/libclang_rt.asan-x86_64.a
>>> defined at regexec.c
>>> .../regex/regexec.c.o:(.text+0x0)
ld: error: duplicate symbol: regfree
>>> defined at sanitizer_common_interceptors.inc:7553 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:7553)
>>> asan_interceptors.o:(__interceptor_regfree) in archive /usr/lib/clang/10.0.1/lib/freebsd/libclang_rt.asan-x86_64.a
>>> defined at regfree.c
>>> .../regex/regfree.c.o:(.text+0x0)
Committed upstream as https://github.com/lichray/nvi2/pull/92
Alex Richardson [Tue, 6 Jul 2021 11:18:29 +0000 (12:18 +0100)]
Fix building rescue/rescue when sanitizers are enabled
We have to ensure that we don't link any instrumented object files
into rescue as it is a static executable and static binaries can't
use the sanitizer runtime.
Alex Richardson [Tue, 6 Jul 2021 11:16:40 +0000 (12:16 +0100)]
usr.bin/diff: fix UBSan error in readhash
UBSan complains about the `sum = sum * 127 + chrtran(t);` line below since
that can overflow an `int`. Use `unsigned int` instead to ensure that
overflow is well-defined.
Alex Richardson [Tue, 6 Jul 2021 09:50:05 +0000 (10:50 +0100)]
usr.bin/login: send errors to console if syslog isn't running
I was debugging why login(1) wasn't working as expected on a minimal
MFS_ROOT disk image. This image doesn't have syslogd running so the
warnings were lost and I had to use GDB to find out why login(1) was
failing (missing PAM libraries) instead of being able to see it in
the console output.
Alex Richardson [Mon, 5 Jul 2021 13:32:48 +0000 (14:32 +0100)]
usr.bin/sort: Avoid UBSan errors
UBSan complains about out-of-bounds accesses for zero-length arrays. To
avoid this we can use flexible array members. However, the C standard does
not allow for structures that only contain flexible array members, so we
move the length parameters into that structure too.
Alex Richardson [Fri, 2 Jul 2021 08:21:04 +0000 (09:21 +0100)]
Simplify and speed up the kyua build
Instead of having multiple kyua libraries, just include the files as part
of usr.bin/kyua. Previously, we would build each kyua source up to four
times: once as a .o file and once as a .pieo. Additionally, the kyua
libraries might be built again for compat32. As all the kyua libraries
amount to 102 C++ sources the build time is significant (especially when
using an assertions enabled compiler). This change ensures that we build
306 fewer .cpp source files as part of buildworld.
Rick Macklem [Tue, 20 Jul 2021 00:35:39 +0000 (17:35 -0700)]
nfscl: Send stateid.seqid of 0 for NFSv4.1/4.2 mounts
For NFSv4.1/4.2, the client may set the "seqid" field of the
stateid to 0 in RPC requests. This indicates to the server that
it should not check the "seqid" or return NFSERR_OLDSTATEID if the
"seqid" value is not up to date w.r.t. Open/Lock operations
on the stateid. This "seqid" is incremented by the NFSv4 server
for each Open/OpenDowngrade/Lock/Locku operation done on the stateid.
Since a failure return of NFSERR_OLDSTATEID is of no use to
the client for I/O operations, it makes sense to set "seqid"
to 0 for the stateid argument for I/O operations.
This avoids server failure replies of NFSERR_OLDSTATEID,
although I am not aware of any case where this failure occurs.
This makes the FreeBSD NFSv4.1/4.2 client compatible with the
Linux NFSv4.1/4.2 client.
Ed Maste [Tue, 27 Jul 2021 21:18:41 +0000 (17:18 -0400)]
Update WITHOUT_KERNEL_SYMBOLS description
We have installed kernel debug data under /usr/lib/debug/ for some time
now, so the suggestion to set WITHOUT_KERNEL_SYMBOLS for small root
partitions is no longer valid.
Also call them "debug symbol files" rather than just "symbol files",
since they contain much more than just symbols. The kernel also
includes (some) symbols, regardless of the setting of this knob.