Andrew Gallatin [Tue, 24 Jan 2023 01:27:17 +0000 (20:27 -0500)]
dtrace: conditionally load the systrace_linux klds when loading dtrace.
When dtrace starts, it tries to detect if the dtrace klds are loaded,
and if not, it loads them by loading the dtraceall kld. This module
depends on most dtrace modules, including systrace for the native
freebsd and freebsd32 ABIs. However, it does not depend on the
systrace_linux klds, as they in turn depend on the linux ABI klds, and
we don't want to load an ABI module that the user has not explicitly
requested. This can leave a naive user in a state where they think all
syscall providers have been loaded, yet linux ABI syscalls are
"invisible" to dtrace.
To fix this, check to see if the linux ABI modules are loaded. If they
are, then load their systrace klds.
libcxx: Implement atomic::wait/notify using _umtx_op(2) for 64bit arches
Only 64bit architectures can be supported this way, because libcxx
defines __cxx_contention_t to be int64_t for FreeBSD, and 32bit arches
do not have a kind of UMTX_OP_WAIT_INT64_PRIVATE operation.
Mark Johnston [Mon, 23 Jan 2023 19:42:41 +0000 (14:42 -0500)]
netmap: Try to count packet drops in emulated mode
Right now we have little visibility into packet drops within netmap.
Start trying to make packet loss issues more visible by counting queue
drops in the transmit path, and in the input path for interfaces running
in emulated mode, where we place received packets in a bounded software
queue that is processed by rxsync.
Mark Johnston [Mon, 23 Jan 2023 19:41:05 +0000 (14:41 -0500)]
netmap: Tell the compiler to avoid reloading ring indices
Per the removed comments these fields should be loaded only once, since
they can in principle be modified concurrently, though this would be a
violation of the userspace contract with netmap.
Mitchell Horne [Mon, 9 Jan 2023 17:14:19 +0000 (13:14 -0400)]
vtblk: secondary fix for dumping
The code paths while dumping do not got through busdma. As such,
safeguard against calling bus_dmamap_sync() with a NULL map. The x86
implementation tolerates this but others do not, resulting in a
NULL dereference panic when dumping to a vtblk device on arm64, riscv,
etc.
Fixes: 782105f7c898 ("vtblk: Use busdma")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37990
Mitchell Horne [Sat, 7 Jan 2023 18:09:28 +0000 (14:09 -0400)]
ddb: have 'reset' command use normal reboot path
This conditionally gives all registered shutdown handlers a chance to
perform the reboot, with cpu_reset() being the fallback. The '\s'
modifier can be used with the command to get the previous behaviour.
The motivation is that some platforms may not be able do anything
meaningful via cpu_reset(), due to a lack of standardized reset
mechanism and/or firmware shortcomings. However, they may have a
separate device driver attached that normally performs the reboot. Such
is the case for some versions of the Raspberry Pi, where reset via PSCI
fails, but the BCM2835 watchdog driver has a shutdown hook.
Currently shutdown_reset() is registered as the final entry of the
shutdown_final event handler. However, if a panic occurs early in boot
before the event is registered (SI_SUB_INTRINSIC), we may end up
spinning in the subsequent infinite for loop and failing to reset
altogether. Instead we can simply call this function unconditionally.
Jiajie Chen [Mon, 23 Jan 2023 16:36:59 +0000 (00:36 +0800)]
Add kf_file_nlink field to kf_file and populate it
This will allow user-space programs (e.g. lsof) to locate deleted files
whose nlink equals zero. Prior to this commit, programs has to use
stat(kf_path) to get nlink, but that will fail if the file is deleted.
netinet6: honor blackhole/unreach routes in the non-fastforwading code.
Currently, under the conditions specified below, IPv6 ingress packet
processing can ignore blackhole/reject flag on the prefix. The packet
will instead be looped locally till TTL expiration and a single ICMPv6
unreachable message will be send to the source even in case of
RTF_BLACKHOLE.
The following conditions needs hold to make the scenario happen:
* IPv6 forwarding is enabled
* Packet is not fast-forwarded
* Destination prefix has either RTF_BLACKHOLE or RTF_REJECT flag
Fix this behavior by checking for the blackhole/reject flags in
ip6_forward().
* Automatically use IPv6 when IPv6 addresses are used, --ip6 is not needed.
* Building of ping requests and parsing of ping replies is done layer by
layer. This way most arguments are available both for IPv6 and IPv4,
for ICMP and TCP.
* Use argument groups for improved readability.
* Change ToS and TTL argument name to TC and HL to reflect the modern
IPv6 nomenclature. The argument still set related IPv4 header fields
properly.
* Instead of sniffing for the very specific case of duplicated packets,
allow for sniffing on multiple interfaces.
* Report which sniffer has failed by setting bits of error code.
* Raise meaningful exceptions when irrecoverable errors happen.
* Make IPv4 fragmentation flags configurable.
* Make IPv6 HL / IPv4 TTL configurable.
* Make TCP MSS configurable.
* Make TCP sequence number configurable.
* Make ICMP payload size configurable.
* Add debug output.
* Move command line argument parsing out of network functions.
* Make the code somehow PEP-8 compliant.
* Remove ambiguity of configuring recvif, it must be now explicitly specified.
* Don't catch exceptions around creating the sniffer, let it properly
fail and display the whole stack trace.
* Count correct packets so that duplicates can be found.
Andrew Gallatin [Sat, 21 Jan 2023 19:26:25 +0000 (14:26 -0500)]
vm: centralize VM_BATCHQUEUE_SIZE definition
Remove the platform-specific definitions of VM_BATCHQUEUE_SIZE
for amd64 and powerpc64, and instead treat all 64-bit platforms
identically. This has the effect of increasing the arm64
and riscv VM_BATCHQUEUE_SIZE to match that of other platforms.
Some existing applications setup Netlink socket with
SOCK_DGRAM instead of SOCK_RAW. Update the manpage to clarify
that the default way of creating the socket should be with
SOCK_RAW. Update the code to support both SOCK_RAW and SOCK_DGRAM.
Warner Losh [Sat, 21 Jan 2023 05:11:00 +0000 (22:11 -0700)]
elf: EF_ARM_EABI_VERSION_UNKNOWN is no longer used, retire
EF_ARM_EABI_VERSION_UNKNOWN was used when we could run either eabi or
oabi. That was armv[45] only, and went away when we migrated to eabi
many major releases ago. No standard defines this symbol, so retire it.
Warner Losh [Sat, 21 Jan 2023 02:15:52 +0000 (19:15 -0700)]
elf: Catch up with defining EF_ARM_EABI_VERSION in elf_common.h
FreeBSD defines EF_ARM_EABI_VERSION in a non-standard way (at least
differently than everybody else). We use this only in elf*machdep.c to
make sure the image is new enough. Switch to the more standard way of
defining this and adjust other constants to match.
Warner Losh [Fri, 20 Jan 2023 23:46:05 +0000 (16:46 -0700)]
elf_common.h: Add parens to EF_ARM_EABI_VERSION macro
Due to my haste, I missed John's suggestion in the review that I add ()
here. Belatedly add them. It didn't matter for my test case, but there's
some pathological uses where it might matter.
Warner Losh [Fri, 20 Jan 2023 23:33:37 +0000 (16:33 -0700)]
byteswap.h: Add a glibc/linux compatible byteswap.h
For endian.h to work instead of sys/endian.h, some software needs
byteswap.h available. It must define {__,}byteswap_{16,32,64}.
Included sys/_endian.h to get an appropriate __byteswap16, etc
and defines the new macros in terms of them. Enhance _endian.h
to allow it to be included from here too.
Warner Losh [Fri, 20 Jan 2023 23:32:45 +0000 (16:32 -0700)]
linux: For better compatibility, provide compatible endian.h
Add endian.h. This includes sys/endian.h and then adds extra defines
that glibc defines with double underscores for our
_{BIG,BYTE,LITTLE,PDP}_ENDIAN macros. We also define __FLOAT_WORD_ORDER
to be the same as _BYTE_ENDIAN since FreeBSD doesn't currently define
this, and the default with glibc is exactly this for our platforms.
Move common parts of endian.h and sys/endian.h into sys/_endian.h
to limit namespace pollution from endian.h
All this gives us good compatibility with Linux. There may be one or two
upstreams that haven't integrated the patches I tried to send up.
There are some minor differences:
o The extra glibc macros are not defined. These are all
controlled with either __ at the start, or only defined
when glibc is being built. We also don't define macros
that are used internally in glibc that would pollute
the namespace.
o For complete compatibility, this change must also be
paired with providing a glibc-compatible byteswap.h.
Warner Losh [Fri, 20 Jan 2023 23:17:45 +0000 (16:17 -0700)]
UPDATING: Remove old entries
Belatedly remove the entries older than the stable/11 branch point after
stable/13 was created. This should be done shortly after the branch, not
well after the branch point.
Document this policy in UPDATING, though other checklists should likely
be updated as well.
Warner Losh [Fri, 20 Jan 2023 23:15:04 +0000 (16:15 -0700)]
elf_common.h: define EF_ARM_EABI_VERSION
The Linux distributions have had the EF_ARM_EABI_VERSION macro for a
while now. The kernel of arm* targets build depends on it being in the
host's elf.h environment. Add it here so it gets pulled in so there's
one fewer hack required to build a Linux kernel on a FreeBSD host.
Warner Losh [Fri, 20 Jan 2023 23:13:54 +0000 (16:13 -0700)]
UPDATING: update notes on EFI booting
The notes on EFI booting and updating for ZFS had become dated and only
partially updated. Expand the notes with a few more details and a
pointer to laoder.efi(8) and uefi(8).
Robert Wing [Fri, 20 Jan 2023 11:10:53 +0000 (11:10 +0000)]
vmm: take exclusive mem_segs_lock in vm_cleanup()
The consumers of vm_cleanup() are vm_reinit() and vm_destroy().
The vm_reinit() call path is, here vmmdev_ioctl() takes mem_segs_lock:
vmmdev_ioctl()
vm_reinit()
vm_cleanup(destroy=false)
The call path for vm_destroy() is (mem_segs_lock not taken):
sysctl_vmm_destroy()
vmmdev_destroy()
vm_destroy()
vm_cleanup(destroy=true)
Fix this by taking mem_segs_lock in vm_cleanup() when destroy == true.
Reviewed by: corvink, markj, jhb Fixes: 67b69e76e8ee ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D38071
John Baldwin [Fri, 20 Jan 2023 17:58:38 +0000 (09:58 -0800)]
bhyve: Fix a buffer overread in the PCI hda device model.
The sc->codecs array contains HDA_CODEC_MAX (15) entries. The
guest-supplied cad field in the verb provided to hda_send_command is a
4-bit field that was used as an index into sc->codecs without any
bounds checking. The highest value (15) would overflow the array.
Other uses of sc->codecs in the device model used sc->codecs_no to
determine which array indices have been initialized, so use a similar
check to reject requests for uninitialized or invalid cad indices in
hda_send_command.
PR: 264582
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: corvink, markj, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38128
John Baldwin [Fri, 20 Jan 2023 17:57:45 +0000 (09:57 -0800)]
bhyve: Fix a global buffer overread in the PCI hda device model.
hda_write did not validate the relative register offset before using
it as an index into the hda_set_reg_table array to lookup a function
pointer to execute after updating the register's value.
PR: 264435
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: corvink, markj, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38127
Alex Richardson [Fri, 20 Jan 2023 10:47:17 +0000 (10:47 +0000)]
Shell-escape assignments to PATH in the top-level makefiles
Since 16fbf0191243e7c9dff6615b1424b5d39186b36c PATH is no longer set
to a hardcoded value on non-FreeBSD build hosts, so we can end up with
spaces in $PATH. Instead of only escaping PATH I updated all `env PATH=`
uses in the toplevel makefile. While many of these currently can't
contain any special characters (since the build would have failed
already), in theory this gets us closer to allowing build/source
directory to contain e.g. spaces.
Elliott Mitchell [Sat, 26 Nov 2022 16:21:33 +0000 (09:21 -0700)]
vmstat: fix overflow of interrupt name buffer
sysctl() provides a count of number of bytes in the buffer. That is the
actual buffer length. Whereas looking for an interrupt entry with an
empty name could terminate too early, or overflow the end of the buffer.
The overflow will occur if the table of interrupt names is full.
These appear to simply be the style of arguments are left untouched and
only local variables are modified. While some may prefer that style
this simply serves to complicate things as they're perfectly writeable.
John Baldwin [Thu, 19 Jan 2023 22:48:52 +0000 (14:48 -0800)]
stdlib.h: Fix qsort_r compatibility with GCC 12.
GCC 12 (unlike GCC 9) does not match a function argument passed to the
old qsort_r() API (as is used in the qsort_r_compat test) to a
function pointer type via __generic. It treats the function type as a
distinct type from a function pointer. As a workaround, add a second
definition of qsort_r for GCC 12 which uses the bare function type.
Jake Freeland [Thu, 19 Jan 2023 22:24:44 +0000 (22:24 +0000)]
Makefile: Avoid sanitizing PATH on non-FreeBSD systems
Allow the build process to find host binaries during the host-symlinks target when
cross-building on non-FreeBSD systems. Whilst most non-FreeBSD systems have all
the needed tools in /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin (the final
path added by host-symlinks itself), Homebrew for macOS on Arm defaults to
/opt/homebrew/bin, other more niche systems may also deviate and users may
expect tools in a customised PATH to be picked up, unlike on FreeBSD where we
want to ensure everything comes from base. In particular, (un)xz are needed
from Homebrew on macOS, and thus cannot be found on Arm without this.
Note that non-FreeBSD builds enforce BUILD_WITH_STRICT_TMPPATH, and so the
actual main build steps will still use a sanitised PATH.
Commit d3f96f661050e9bd21fe29931992a8b9e67ff189 removed <sys/queue.h>
and replaced it with the very broad <sys/systm.h>. However, none of
the changes to sysctl.h in that commit require anything defined in
<sys/systm.h>. On the other hand, <sys/sysctl.h> does still make use
of queue macros. Drop the include of <sys/systm.h> and re-add
<sys/queue.h>.
John Baldwin [Thu, 19 Jan 2023 18:30:18 +0000 (10:30 -0800)]
bhyve: Remove vmctx argument from PCI device model methods.
Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.
XHCI port and slot numbers are 1-based rather than 0-based. To handle
this, bhyve was subtracting one item from the pointers saved in the
softc so that index 1 accessed index 0 of the allocated array.
However, this is UB and confused GCC 12. The compiler noticed that
the calls to free() were using an offset and emitted a warning.
Rather than storing UB pointers in the softc, push the decrement
operation into the existing macros that wrap accesses to the relevant
arrays.
Pau Amma [Sat, 28 May 2022 18:49:17 +0000 (18:49 +0000)]
Refresh CPU types and classes from sys/sys/pmc.h.
While here, fix a few nits.
Inspired by reviewing D35342.
Sources for trademark info
- https://www.arm.com/company/policies/trademarks (no Arm8, curiously)
- https://www.ibm.com/legal/copytrade?mhsrc=ibmsearch_a&mhq=trademark
Some devices have CDC_CM descriptors that would point us to
the wrong interfaces. Add a quirk to ignore those (prefering the
CDC_UNION descriptor effectively)
Reviewed by: manu
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37942
Andrew Turner [Wed, 18 Jan 2023 09:30:32 +0000 (09:30 +0000)]
Always store the arm64 VFP context
If a thread enters a kernel FP context the PCB_FP_STARTED may be
unset when calling get_fpcontext even if the VFP unit has been used
by the current thread.
Reduce the use of this flag to just decide when to store the VFP state.
While here add an assert to check the assumption that the passed in
thread is the current thread and remove the unneeded critical section.
The latter is unneeded as the only place we would need it is in
vfp_save_state and this already has a critical section when needed.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D37998
Andrew Turner [Wed, 18 Jan 2023 09:30:20 +0000 (09:30 +0000)]
Always read the VFP regs in the arm64 fill_fpregs
The PCB_FP_STARTED is used to indicate that the current VFP context
has been used since either 1. the start of the thread, or 2. exiting
a kernel FP context.
When case 2 was added to the kernel this could cause incorrect results
to be returned when a thread exits the kernel FP context and fill_fpregs
is called before it has restored the VFP state, e.g. by trappin on a
userspace VFP instruction.
In both of the cases the base save area is still valid so reduce the
use of the PCB_FP_STARTED flag check to help decide if we need to
store the current threads VFP state.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D37994
Brooks Davis [Tue, 17 Jan 2023 16:36:15 +0000 (16:36 +0000)]
riscv: Fix thread0.td_kstack_pages init
Commit 0ef3ca7ae37c70e9dc83475dc2e68e98e1c2a418 initialized
thread0.td_kstack_pages to KSTACK_PAGES. Due to the lack of an
include of opt_kstack_pages.h it used the fallback value of 4 from
machine/param.h. This meant that increasing KSTACK_PAGES in the kernel
config resulted in a panic in _epoch_enter_preempt as the following
assertion was false during network stack setup:
Brooks Davis [Tue, 17 Jan 2023 16:35:08 +0000 (16:35 +0000)]
arm64: Fix thread0.td_kstack_pages init
Commit 86a994d6537d7b5e1efb1019e466d86a688fd570 initialized
thread0.td_kstack_pages to KSTACK_PAGES. Due to the lack of an
include of opt_kstack_pages.h it used the fallback value of 4 from
machine/param.h. This meant that increasing KSTACK_PAGES in the kernel
config resulted in a panic in _epoch_enter_preempt as the following
assertion was false during network stack setup:
Mark Johnston [Tue, 17 Jan 2023 14:36:54 +0000 (09:36 -0500)]
netlink: Zero-initialize mbuf messages
Some users of nlmsg_reserve_object() and nlmsg_reserve_data() are not
careful to fully initialize pad and reserved fields, allowing
uninitialized bytes to leak to userspace. For example, dump_nhgrp()
doesn't set nhm->resvd = 0.
Meanwhile, nlmsg_get_ns_buf() and nlmsg_get_ns_lbuf() zero-initialize
the buffer, so nlmsg_get_ns_mbuf() is inconsistent. Let's just make
them all behave the same here.
Reported by: KMSAN
Reviewed by: melifaro
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38098