Mitchell Horne [Thu, 2 Jun 2022 13:14:41 +0000 (10:14 -0300)]
Use KERNEL_PANICKED() in more places
This is slightly more optimized than checking panicstr directly. For
most of these instances performance doesn't matter, but let's make
KERNEL_PANICKED() the common idiom.
Reviewed by: mjg
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D35373
Dimitry Andric [Wed, 1 Jun 2022 18:29:15 +0000 (20:29 +0200)]
Apply compiler-rt fix for building with earlier versions of clang
Merge commit 18efa420da5f from llvm git (by Brooks Davis)
compiler-rt: Allow build without __c11_atomic_fetch_nand
Don't build atomic fetch nand libcall functions when the required
compiler builtin isn't available. Without this compiler-rt can't be
built with LLVM 13 or earlier.
Not building the libcall functions isn't optimal, but aligns with the
usecase in FreeBSD where compiler-rt from LLVM 14 is built with an LLVM
13 clang and no LLVM 14 clang is built.
Kristof Provost [Tue, 31 May 2022 19:11:14 +0000 (21:11 +0200)]
netinet6: fix panic on kldunload pfsync
Commit d6cd20cc5 ("netinet6: fix ndp proxying") caused us to panic when
unloading pfsync:
Fatal trap 12: page fault while in kernel mode
cpuid = 19; apic id = 38
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80dfe7f4
stack pointer = 0x28:0xfffffe015d4f8ac0
frame pointer = 0x28:0xfffffe015d4f8ae0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 5477 (kldunload)
trap number = 12
panic: page fault
cpuid = 19
time = 1654023100
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d4f8880
vpanic() at vpanic+0x17f/frame 0xfffffe015d4f88d0
panic() at panic+0x43/frame 0xfffffe015d4f8930
trap_fatal() at trap_fatal+0x387/frame 0xfffffe015d4f8990
trap_pfault() at trap_pfault+0xab/frame 0xfffffe015d4f89f0
calltrap() at calltrap+0x8/frame 0xfffffe015d4f89f0
--- trap 0xc, rip = 0xffffffff80dfe7f4, rsp = 0xfffffe015d4f8ac0, rbp = 0xfffffe015d4f8ae0 ---
in6_purge_proxy_ndp() at in6_purge_proxy_ndp+0x14/frame 0xfffffe015d4f8ae0
if_purgeaddrs() at if_purgeaddrs+0x24/frame 0xfffffe015d4f8b90
if_detach_internal() at if_detach_internal+0x1c2/frame 0xfffffe015d4f8bf0
if_detach() at if_detach+0x71/frame 0xfffffe015d4f8c20
pfsync_clone_destroy() at pfsync_clone_destroy+0x1dd/frame 0xfffffe015d4f8c70
if_clone_destroyif() at if_clone_destroyif+0x239/frame 0xfffffe015d4f8cc0
if_clone_detach() at if_clone_detach+0xc8/frame 0xfffffe015d4f8cf0
vnet_pfsync_uninit() at vnet_pfsync_uninit+0xda/frame 0xfffffe015d4f8d10
vnet_deregister_sysuninit() at vnet_deregister_sysuninit+0x85/frame 0xfffffe015d4f8d40
linker_file_sysuninit() at linker_file_sysuninit+0x147/frame 0xfffffe015d4f8d70
linker_file_unload() at linker_file_unload+0x269/frame 0xfffffe015d4f8db0
kern_kldunload() at kern_kldunload+0x18d/frame 0xfffffe015d4f8e00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe015d4f8f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015d4f8f30
--- syscall (444, FreeBSD ELF64, sys_kldunloadf), rip = 0x1601eab28cba, rsp = 0x1601e9c363f8, rbp = 0x1601e9c36c50 ---
This happens because ifp->if_afdata[AF_INET6] is NULL. Check for this,
just as we already do in a few other places.
See also c139b3c19b52a ("arp/nd: Cope with late calls to
iflladdr_event").
Kirk McKusick [Wed, 1 Jun 2022 02:55:54 +0000 (19:55 -0700)]
Two bug fixes to UFS/FFS superblock integrity checks when reading a superblock.
Two bugs have been reported with the UFS/FFS superblock integrity
checks that were added in commit 076002f24d35.
The code checked that fs_sblockactualloc was properly set to the
location of the superblock. The fs_sblockactualloc field was an
addition to the superblock in commit dffce2150eea on Jan 26 2018
and used a field that was zero in filesystems created before it
was added. The integrity check had to be expanded to accept the
fs_sblockactualloc field being zero so as not to reject filesystems
created before Jan 26 2018.
The integrity check set an upper bound on the value of fs_maxcontig
based on the maximum transfer size supported by the kernel. It
required that fs->fs_maxcontig <= maxphys / fs->fs_bsize. The kernel
variable maxphys defines the maximum transfer size permitted by the
controllers and/or buffering. The fs_maxcontig parameter controls the
maximum number of blocks that the filesystem will read or write in
a single transfer. It is calculated when the filesystem is created
as maxphys / fs_bsize. The bug appeared in the loader because it
uses a maxphys of 128K even when running on a system that supports
larger values. If the filesystem was built on a system that supports
a larger maxphys (1M is typical) it will have configured fs_maxcontig
for that larger system so would fail the test when run with the smaller
maxphys used by the loader. So we bound the upper allowable limit
for fs_maxconfig to be able to at least work with a 1M maxphys on the
smallest block size filesystem: 1M / 4096 == 256. We then use the
limit for fs_maxcontig as fs_maxcontig <= MAX(256, maxphys / fs_bsize).
There is no harm in allowing the mounting of filesystems that make larger
than maxphys I/O requests because those (mostly 32-bit machines) can
(very slowly) handle I/O requests that exceed maxphys.
Thanks to everyone who helped sort out the problems and the fixes.
Reported by: Cy Schubert, David Wolfskill
Diagnosis by: Mark Johnston, John Baldwin
Reviewed by: Warner Losh
Tested by: Cy Schubert, David Wolfskill
MFC after: 1 month (with 076002f24d35)
Differential Revision: https://reviews.freebsd.org/D35219
Arseny Smalyuk [Tue, 31 May 2022 20:04:51 +0000 (20:04 +0000)]
netinet6: Fix mbuf leak in NDP
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() rely on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.
Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.
nfs: skip bootpc when vfs.root.mountfrom is other than nfs
If "vfs.root.mountfrom" is set and the value is something other
than "nfs:*", it means the user doesn't want to mount root via nfs,
there's no reason to continue with bootpc
This fixes the powerpcspe kernel (MPC85XXSPE) that's compiled with
BOOTP_NFSROOT by default and gets stuck on bootpc/dhcp request loop
when no DHCP server is available on the network, even when user
specifies a local disk via "vfs.root.mountfrom" kernel parameter.
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D35098
Rick Macklem [Tue, 31 May 2022 18:59:39 +0000 (11:59 -0700)]
nfscl: Enable support for the Lookup+Open RPC
Commits 3ad1e1c1ce20 and 57014f21e754 added a Lookup+Open
RPC for NFSv4.1/4.2, which can reduce the RPC count by
10-20% for some loads. This has now received a fair amount
of testing, so I think it is ok to enable it.
Note that the Lookup+Open RPC is only used when the
"oneopenown" mount option is specified. As such, this
change won't affect most NFSv4.1/4.2 mounts.
Eric van Gyzen [Tue, 17 May 2022 17:46:59 +0000 (12:46 -0500)]
mandoc: workaround lack of macro parsing in list -width
GNU tools parse macros in the -width argument of lists. mandoc does not,
so it calculates an excessive width. This often squeezes the text into
a very narrow column, especially in nested lists.
Implement the easy workaround suggested in the TODO list. When there is
only one macro, at the beginning of the -width argument, this fixes the
formatting as well as a complete solution.
Alexander Motin [Tue, 31 May 2022 03:17:37 +0000 (23:17 -0400)]
hwpmc: Add basic Intel Alderlake CPUs support.
The PMC subsystem is not designed for non-uniform CPU capabilities
(P/E-cores are different), but at least several working architectural
events like cpu_clk_unhalted.thread_p should be better than nothing.
Kyle Evans [Fri, 20 May 2022 20:38:03 +0000 (15:38 -0500)]
zdiff: avoid non-conformant features
`setvar` is a non-conformant feature that looks slightly neater but is
not portable to other /bin/sh implementations. Making the script
portable is straightforward, so let's do it.
Tests are added to make sure that I didn't break anything major in the
process.
ibcore: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
The algorithm pre-allocates a cm_id since allocation cannot be done while
holding the cm.lock spinlock, however it doesn't free it on one error
path, leading to a memory leak.
mlx4core: Fix a memory leak when deleting slave's resources
mlx4_delete_all_resources_for_slave() in the resource tracker should free
all memory allocated for a slave. While releasing memory of fs_rule,
it misses releasing memory of fs_rule->mirr_mbox.
Dmitry Chagin [Mon, 30 May 2022 16:53:52 +0000 (19:53 +0300)]
linux(4): Fix the type of a constant in the signal mask macro
Since l_sigset_t is 64-bit unsigned on all Linuxulators, fix the type
of a constant in the signal mask manipulation macro.
The suffix L indicates type long which is 32-bit on i386, therefore,
bitwise operations between a 32-bit constant and 64-bit signal mask
lead to the wrong result.
Dmitry Chagin [Mon, 30 May 2022 16:49:45 +0000 (19:49 +0300)]
linux(4): Microoptimize rt_sendsig(), convert signal mask once
On amd64 Linux saves the thread signal mask in both contexts, in the machine
dependent and in the machine independent. Both contexts are user accessible.
Convert the mask once, then copy it.
Mark Johnston [Mon, 30 May 2022 14:43:44 +0000 (10:43 -0400)]
rc: Add a zpoolreguid rc.d script
If one boots up multiple copies of a template VM image containing a
zpool, the pool GUIDs will be identical, making it impossible to, e.g.,
share datasets between them.
This diff introduces a simple workaround for the problem: one can use
the script to, upon first boot, assign a new GUID to one or more zpools.
This will be useful when building ZFS-based VM images from release(7).
Reviewed by: mav, allanjude, asomers
Reviewed by: Pau Amma (docs)
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35336
Doug Rabson [Mon, 30 May 2022 13:22:08 +0000 (14:22 +0100)]
pkgbase: Move pw to the runtime package
This allows building a container image with enough functionality for
downloading and installing packages without having to include the
utilities package.
Alexander Motin [Mon, 30 May 2022 13:07:30 +0000 (09:07 -0400)]
hwpmc: Use hardware PMCs freezing on PMI on Intel v2+.
Since version 2 Intel CPUs can freeze PMCs when intering PMI to reduce
PMI effects on collected statistics. Since version 4 hardware supports
"streamlined" mechanism, not requiring IA_GLOBAL_CTRL MSR access.
We could insert proxy NDP entries by the ndp command, but the host
with proxy ndp entries had not responded to Neighbor Solicitations.
Change the following points for proxy NDP to work as expected:
* join solicited-node multicast addresses for proxy NDP entries
in order to receive Neighbor Solicitations.
* look up proxy NDP entries not on the routing table but on the
link-level address table when receiving Neighbor Solicitations.
In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.
Corvin Köhne [Mon, 30 May 2022 09:19:14 +0000 (11:19 +0200)]
x86/mp: don't create empty cpu groups
When some APICs are disabled by tunables, some cpu groups could end up
empty. An empty cpu group causes the system to panic because not all
functions handle them correctly. Additionally, it's wasted time to
handle and inspect empty cpu groups. Therefore, just don't create them.
Reviewed by: kib, avg, cem
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24927
Corvin Köhne [Mon, 30 May 2022 08:02:52 +0000 (10:02 +0200)]
vmm: add tunable to trap WBINVD
x86 is cache coherent. However, there are special cases where cache
coherency isn't ensured (e.g. when switching the caching mode). In these
cases, WBINVD can be used. WBINVD writes all cache lines back into main
memory and invalidates the whole cache.
Due to the invalidation of the whole cache, WBINVD is a very heavy
instruction and degrades the performance on all cores. So, we should
minimize the use of WBINVD as much as possible.
In a virtual environment, the WBINVD call is mostly useless. The guest
isn't able to break cache coherency because he can't switch the physical
cache mode. When using pci passthrough WBINVD might be useful.
Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For
that reason, we implement it as tunable.
Corvin Köhne [Mon, 30 May 2022 08:01:36 +0000 (10:01 +0200)]
bhyve: use bhyve_config for SMBIOS strings
Some software uses SMBIOS entries to identify the system on which it's
running. In order to make it possible to use such software inside a VM,
SMBIOS entries should be configurable. Therefore, bhyve_config can be
used. While only a few SMBIOS entries might be of interest, it makes
sense that all SMBIOS entries are configurable. This way all SMBIOS
tables are build the same way and there's no special handling for some
tables.
Rick Macklem [Sat, 28 May 2022 23:27:02 +0000 (16:27 -0700)]
nfsstat: Add an entry to output NFSPROC_APPENDWRITE count
Commit 5218d82c81f9 added a new NFSv4.1/4.2 procedure called
AppendWrite that uses a Verify to avoid a separate Getattr RPC
for the common case where the client knows the correct file
size for O_APPEND writes.
This patch modifies nfsstat so that it displays a count of
these new RPCs for the "-E -c" option.
Rick Macklem [Sat, 28 May 2022 22:48:40 +0000 (15:48 -0700)]
mount_nfs: Only create a mounttab file entry is nmount(2) succeeds
mount_nfs creates entries in the mounttab file and umount removes
them. Entries in the mounttab file ae used by rpc.umntall to
notify the NFS server that NFSv3 entries need to be removed when
they have not been removed by umount.
Without this patch, an enty will be created in the mounttab file,
even if the nmount(2) syscall fails for the mount. This patch
modifies the code so that the mounttab entry is only created
after nmount(2) succeeds.
This change only affects NFSv3 and only affects how showmount
displays NFSv3 mounts.
Dimitry Andric [Sat, 28 May 2022 21:26:37 +0000 (23:26 +0200)]
Apply clang fix for assertion failure building putty 0.77 on i386
Merge commit 45084eab5e63 from llvm git (by Arthur Eubanks):
[clang] Fix some clang->llvm type cache invalidation issues
Take the following as an example
struct z {
z (*p)();
};
z f();
When we attempt to get the LLVM type of f, we recurse into z. z itself
has a function pointer with the same type as f. Given the recursion,
Clang simply treats z::p as a pointer to an empty struct `{}*`. The
LLVM type of f is as expected. So we have two different potential
LLVM types for a given Clang type. If we store one of those into the
cache, when we access the cache with a different context (e.g. we
are/aren't recursing on z) we may get an incorrect result. There is some
attempt to clear the cache in these cases, but it doesn't seem to handle
all cases.
This change makes it so we only use the cache when we are not in any
sort of function context, i.e. `noRecordsBeingLaidOut() &&
FunctionsBeingProcessed.empty()`, which are the cases where we may
decide to choose a different LLVM type for a given Clang type. LLVM
types for builtin types are never recursive so they're always ok.
This allows us to clear the type cache less often (as seen with the
removal of one of the calls to `TypeCache.clear()`). We
still need to clear it when we use a placeholder type then replace it
later with the final type and other dependent types need to be
recalculated.
I've added a check that the cached type matches what we compute. It
triggered in this test case without the fix. It's currently not
check-clang clean so it's not on by default for something like expensive
checks builds.
This change uncovered another issue where the LLVM types for an argument
and its local temporary don't match. For example in type-cache-3, when
expanding z::dc's argument into a temporary alloca, we ConvertType() the
type of z::p which is `void ({}*)*`, which doesn't match the alloca GEP
type of `{}*`.
Dmitry Chagin [Sat, 28 May 2022 20:46:05 +0000 (23:46 +0300)]
linux(4): Handle SO_TIMESTAMPNS socket option
The SO_TIMESTAMPNS enables or disables the receiving of the SCM_TIMESTAMPNS
control message. The cmsg_data field is a struct timespec.
To distinguish between SO_TIMESTAMP and SO_TIMESTAMPNS in the recvmsg()
map the last one to the SO_BINTIME and convert bintime to the timespec.
In the rest, implementation is identical to the SO_TIMESTAMP.
Dmitry Chagin [Sat, 28 May 2022 20:45:39 +0000 (23:45 +0300)]
linux(4): Handle 64-bit SO_TIMESTAMP for 32-bit binaries
To solve y2k38 problem in the recvmsg syscall the new SO_TIMESTAMP
constant were added on v5.1 Linux kernel. So, old 32-bit binaries
that knows only 32-bit time_t uses the old value of the constant,
and binaries that knows 64-bit time_t uses the new constant.
To determine what size of time_t type is expected by the user-space,
store requested value (SO_TIMESTAMP) in the process emuldata structure.
Dmitry Chagin [Sat, 28 May 2022 20:30:22 +0000 (23:30 +0300)]
linux(4): Add a helper to copyout getsockopt value
For getsockopt(), optlen is a value-result argument, which is modified
on return to indicate the actual size of the value returned.
For some cases this was missed, fixed.
Dmitry Chagin [Sat, 28 May 2022 20:29:12 +0000 (23:29 +0300)]
linux(4): Check the socket before any others sanity checks
Strictly speaking, this check is performed by the kern_recvit(), but in
the Linux emulation layer before calling the kernel we do other sanity
checks and conversions from Linux types to the native types. This changes
an order of the error returning that is critical for some buggy Linux
applications.
For recvmmsg() syscall this fixes a panic in case when the user-supplied
vlen value is 0, then error is not initialized and garbage passed to the
bsd_to_linux_errno().
Split cpuset_getaffinity() into a two counterparts, where the
user_cpuset_getaffinity() is intended to operate on the cpuset_t from
user va, while kern_cpuset_getaffinity() expects the cpuset from kernel
va.
Accordingly, the code that clears the high bits is moved to the
user_cpuset_getaffinity(). Linux sched_getaffinity() syscall returns
the size of set copied to the user-space and then glibc wrapper clears
the high bits.
Michael Tuexen [Sat, 28 May 2022 15:40:17 +0000 (17:40 +0200)]
sctp: improve handling of send() when association is shutdown
Accept send() calls only when the association is not being
shut down or the expicit message EOR mode is used and the
application provides follow-up data.
Reported by: syzbot+341e9ebd9d24ca7dc62a@syzkaller.appspotmail.com
MFC after: 3 days
Dimitry Andric [Fri, 27 May 2022 18:23:37 +0000 (20:23 +0200)]
Add several sanitizer ignore lists under /usr/lib/clang
Some of the sanitizers from compiler-rt can use ignore lists, which are
loosely modeled on valgrind's example. Upstream provides default lists
for AddressSanitizer, CFI, and MemorySanitizer, so install these in the
expected location, /usr/lib/clang/14.0.3/share.
Rick Macklem [Fri, 27 May 2022 21:32:46 +0000 (14:32 -0700)]
nfscl: Add a diagnostic printf() for a "should never happen" case
When a NFSv4.1/4.2 session to the NFS server (not a pNFS DS) is
replaced, the old session should always be marked defunct by
nfsess_defunct being set non-zero.
However, the hang reported by the PR suggests that this might
be the case.
This patch adds a printf() to indicate this has somehow happened.
Rick Macklem [Fri, 27 May 2022 21:20:31 +0000 (14:20 -0700)]
nfscl: Do not handle NFSERR_BADSESSION in operation code
The NFSERR_BADSESSION reply from a NFSv4.1/4.2 server
is handled by newnfs_request(). It should not be handled
separately after newnfs_request() has returned.
These two cases were spotted during code inspection.
One of them should only redo what newnfs_request() already
did by the same "nfscl" thread. The other might have
resulted in recovery being done twice, but the code is
only used for "pnfs" mounts, so that would be rare.
Also, since NFSERR_BADSESSION should only be replied by
a server after the server reboots, this would be extremely
rare.
Cy Schubert [Fri, 27 May 2022 14:19:53 +0000 (07:19 -0700)]
tests: Fix i386 and powerpc build
Fix:
tests/sys/kern/unix_passfd_test.c:414:24: error: comparison of integers
of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
ATF_REQUIRE(getnfds() == nfds + MAXFDS);
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
powerpc.powerpc/tmp/usr/include/atf-c/macros.h:144:15: note: expanded
from macro 'ATF_REQUIRE'
if (!(expression)) \
^~~~~~~~~~
1 error generated.
--- unix_passfd_test.o ---
Kirk McKusick [Fri, 27 May 2022 19:21:11 +0000 (12:21 -0700)]
Do comprehensive UFS/FFS superblock integrity checks when reading a superblock.
Historically only minimal checks were made of a superblock when it
was read in as it was assumed that fsck would have been run to
correct any errors before attempting to use the filesystem. Recently
several bug reports have been submitted reporting kernel panics
that can be triggered by deliberately corrupting filesystem superblocks,
see Bug 263979 - [meta] UFS / FFS / GEOM crash (panic) tracking
which is tracking the reported corruption bugs.
This change upgrades the checks that are performed. These additional
checks should prevent panics from a corrupted superblock. Although
it appears in only one place, the new code will apply to the kernel
modules and (through libufs) user applications that read in superblocks.
Reported by: Robert Morris and Neeraj
Reviewed by: kib
Tested by: Peter Holm
PR: 263979
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D35219
Dimitry Andric [Fri, 27 May 2022 17:43:39 +0000 (19:43 +0200)]
Apply clang fix for assertion failure building webkit2-gtk
Merge commit 30baa5d2a450 from llvm git (by Richard Smith):
PR45879: Fix assert when constant evaluating union assignment.
Consider the form of the first operand of a class assignment not the
second operand when implicitly starting the lifetimes of union members.
Also add a missing check that the assignment call actually came from a
syntactic assignment, not from a direct call to `operator=`.
Gleb Smirnoff [Fri, 27 May 2022 15:19:28 +0000 (08:19 -0700)]
sockbuf: remove unused mbuf counter and cluster counter
With M_EXTPG mbufs these two counters already do not represent the
reality. As we are moving towards protocol independent socket buffers,
which may not even use mbufs at all, the counters become less and less
relevant. The only userland seeing them was 'netstat -x'.