Alex Richardson [Mon, 11 Oct 2021 10:46:30 +0000 (11:46 +0100)]
Update OptionalObsoleteFiles.inc after 021385aba562
I forgot to update this file so make delete-old would incorrectly remove
the newly-installed LLVM binutils. While touching the file also update
for 8e1c989abbd1 since ObsoleteFiles.inc now includes the tablegen binaries.
Reported by: Herbert J. Skuhra <herbert@gojira.at>
Reviewed By: emaste, imp
Alex Richardson [Mon, 6 Sep 2021 08:31:58 +0000 (09:31 +0100)]
Don't build and install {llvm,clang,lldb}-tblgen for the target
The tablegen binaries are only needed to build software that uses
LLVM's infrastructure for command line options,
disassembler tables, etc. They are not user-facing binaries and
should therefore not be installed by default.
Alex Richardson [Mon, 6 Sep 2021 08:49:49 +0000 (09:49 +0100)]
Add WITH_LLVM_BINUTILS to install LLVM binutils instead of Elftoolchain
When WITH_LLVM_BINUTILS is set, we will install the LLVM binutils as
ar/ranlib/nm/objcopy/etc. instead of the elftoolchain ones.
Having the LLVM binutils instead of the elftoolchain ones allows us to use
features such as LTO that depend on binutils that understand LLVM IR.
Another benefit will be an improved user experience when compiling with
AddressSanitizer, since ASAN does not symbolize backtraces correctly if
addr2line is elftoolchain addr2line instead of llvm-symbolizer.
See https://lists.freebsd.org/archives/freebsd-toolchain/2021-July/000062.html
for more details.
This is currently off by default but will be turned on by default at some
point in the near future.
Mark Johnston [Sat, 20 Nov 2021 16:21:25 +0000 (11:21 -0500)]
zfs: Fix a deadlock between page busy and the teardown lock
When rolling back a dataset, ZFS has to purge file data resident in the
system page cache. To do this, it loops over all vnodes for the
mountpoint and calls vn_pages_remove() to purge pages associated with
the vnode's VM object. Each page is thus exclusively busied while the
dataset's teardown write lock is held.
When handling a page fault on a mapped ZFS file, FreeBSD's page fault
handler busies newly allocated pages and then uses VOP_GETPAGES to fill
them. The ZFS getpages VOP acquires the teardown read lock with vnode
pages already busied. This represents a lock order reversal which can
lead to deadlock.
To break the deadlock, observe that zfs_rezget() need only purge those
pages marked valid, and that pages busied by the page fault handler are,
by definition, invalid. Furthermore, ZFS pages always transition from
invalid to valid with the teardown lock held, and ZFS never creates
partially valid pages. Thus, zfs_rezget() can use the new
vn_pages_remove_valid() to skip over pages busied by the fault handler.
PR: 258208
Tested by: pho
Reviewed by: avg, sef, kib
Sponsored by: The FreeBSD Foundation
Colin Percival [Mon, 22 Nov 2021 21:51:43 +0000 (13:51 -0800)]
etc/defaults/rc.conf: Add -i flag to rtsol/rtsold
This disables the random (between zero and one seconds) delay before
rtsol and rtsold send a Router Solicitation packet. This delay is
specified as a SHOULD by RFC 4861 for avoidance of network congestion,
but network speeds have increased enough in the 25 years since this
first appeared (in RFC 1970) that it seems unnecessary as a default
at this point.
This speeds up the FreeBSD boot process by an average of 500 ms.
Colin Percival [Sat, 13 Nov 2021 16:38:09 +0000 (08:38 -0800)]
randomdev: Remove 100 ms sleep from write routine
This was introduced in 2014 along with the comment (which has since
been deleted):
/* Introduce an annoying delay to stop swamping */
Modern cryptographic random number generators can ingest arbitrarily
large amounts of non-random (or even maliciously selected) input
without losing their security.
Depending on the number of "boot entropy files" present on the system,
this can speed up the boot process by up to 1 second.
Reviewed by: cem
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32984
Colin Percival [Thu, 21 Oct 2021 20:15:57 +0000 (13:15 -0700)]
uefi(8): loader.efi does not search for loader.efi
This man page formerly referred to boot1.efi searching for loader.efi;
when boot1.efi was obsoleted in favour of having loader.efi launched
directly, this was left claiming that loader.efi searched for
loader.efi.
os-release: Quote variables as documented in the manual
Variables must be quoted if they contain non-alphanumeric characters.
Warner noted in the review that the lack of quoting causes problems only
in edge cases. I believe that it's worth adding the quotes
here anyway because this is what the specification says and there is no
good reason not to follow it.
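As a rough illustration of the quoting rule stated above, the check can be
sketched in C. This is not the code from the change itself; the function
name needs_quotes() is hypothetical, and the sketch assumes the C locale so
that isalnum() matches only ASCII letters and digits.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper: report whether an os-release value has to be
 * quoted, i.e. whether it contains any non-alphanumeric character.
 * Assumes the C locale.
 */
static bool
needs_quotes(const char *value)
{
	for (; *value != '\0'; value++) {
		if (!isalnum((unsigned char)*value))
			return (true);
	}
	return (false);
}
```

Under this rule a value such as FreeBSD can stay bare, while anything
containing a dot, dash, or space has to be quoted.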
routing: fix source address selection rules for IPv4 over IPv6.
Current logic always selects an IFA of the same family from the
outgoing interfaces. In an IPv4 over IPv6 setup there can be just a
single non-127.0.0.1 ifa, attached to the loopback interface.
Create a separate rt_getifa_family() to handle the entire ifa selection
for IPv4 over IPv6.
lltable: do not require prefix lookup when checking lle allocation rules.
With the new FIB_ALGO infrastructure, nearly all subsystems use
fib[46]_lookup() functions, which provide lockless lookups.
A number of places remain that use old-style lookup functions, which
still require the RIB read lock to return the result. One such place
is the ARP processing code.
FIB_ALGO implementation makes some tradeoffs, resulting in (relatively)
prolonged periods of holding RIB_WLOCK. If the lock is held and datapath
competes for it, the RX ring may get blocked, ending in traffic delays and losses.
As currently arp processing is performed directly in the interrupt handler,
handling ARP replies triggers the problem described above when the amount of
ARP replies is high.
To be more specific, prior to creating a new ARP entry, a routing lookup for the entry
address in the interface fib is executed. The following conditions are then verified:
1. If the lookup returns an empty result, or the resulting prefix is non-directly-reachable,
failure is returned. The only exceptions are host routes w/ gateway==address.
2. If the routing lookup returns a different interface and a non-host route,
we want to support the use case of having multiple interfaces with the same prefix.
In fact, the current code just checks if the returned prefix covers the target address
(always true) and effectively allows allocating ARP entries for any directly-reachable prefix,
regardless of its interface.
Change the code to perform the following:
1) Use fib4_lookup() to get the nexthop, instead of requesting the exact prefix.
2) Rewrite the first condition check using nexthop flags (1:1 match).
3) Rewrite the second condition to check for interface addresses matching the target address on
the input interface.
Ed Maste [Mon, 5 Apr 2021 17:16:01 +0000 (13:16 -0400)]
release: move installworld before installkernel
This is to support -DNO_ROOT work. The top-level installworld target creates a
new METALOG starting with `#mtree 2.0` so it needs to be first, to avoid
overwriting installkernel METALOG entries.
Reviewed by: gjb
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29582
Ed Maste [Fri, 26 Mar 2021 15:26:22 +0000 (11:26 -0400)]
gvinum: add deprecation notice
Vinum is a Logical Volume Manager that was introduced in FreeBSD 3.0,
and for FreeBSD 5 was ported to geom(4) as gvinum. gvinum has had no
specific development at least as far back as 2010, and has a number of
known bugs which are unlikely to be resolved.
Add a deprecation notice to raise awareness but state that vinum "may
not be" available in FreeBSD 14. Either it will be removed and the
notice will be updated to "is not" available, or someone will step up
to fix issues and maintain it and we will remove the notice.
Reviewed by: imp (earlier version)
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29424
Mitchell Horne [Wed, 17 Nov 2021 15:35:59 +0000 (11:35 -0400)]
Allow minidumps to be performed on the live system
Add a boolean parameter to minidumpsys(), to indicate a live dump. When
requested, take a snapshot of important global state, and pass this to
the machine-dependent minidump function. For now this includes the
kernel message buffer, and the bitset of pages to be dumped. Beyond
this, we don't take much action to protect the integrity of the dump
from changes in the running system.
A new function msgbuf_duplicate() is added for snapshotting the message
buffer. msgbuf_copy() is insufficient for this purpose since it marks
any new characters it finds as read.
For now, nothing can actually trigger a live minidump. A future patch
will add the mechanism for this. For simplicity and safety, live dumps
are disallowed for mips.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31993
Mitchell Horne [Wed, 17 Nov 2021 15:35:18 +0000 (11:35 -0400)]
minidump: Use the provided dump bitset
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Mitchell Horne [Wed, 17 Nov 2021 15:30:43 +0000 (11:30 -0400)]
minidump: reduce the amount of direct accesses to page tables
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us, for example
after checking the valid bit but before reading the physical address.
Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.
Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31990
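The "load each PTE only once" rule from the change above can be sketched in
portable C11. This is an illustration only, not the machine-dependent
minidump code: the bit layout (PTE_VALID, PTE_ADDR_MASK), the function name
pte_snapshot_pa(), and the max_pa bound are all assumptions made for the
example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; real PTE formats are machine-dependent. */
#define	PTE_VALID	UINT64_C(0x1)
#define	PTE_ADDR_MASK	UINT64_C(0x000ffffffffff000)

/*
 * Hypothetical helper: read the entry exactly once, then derive both the
 * valid bit and the physical address from that single snapshot. A
 * concurrent update can no longer slip in between the two checks.
 */
static bool
pte_snapshot_pa(_Atomic uint64_t *ptep, uint64_t max_pa, uint64_t *pap)
{
	uint64_t pa, pte;

	pte = atomic_load_explicit(ptep, memory_order_relaxed);
	if ((pte & PTE_VALID) == 0)
		return (false);
	pa = pte & PTE_ADDR_MASK;
	/* Assumed sanity check: refuse addresses beyond physical memory. */
	if (pa >= max_pa)
		return (false);
	*pap = pa;
	return (true);
}
```

The point of the single atomic load is that every later decision is made
on the same value, even if the page table entry is rewritten while the dump
is in progress.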
Mitchell Horne [Wed, 17 Nov 2021 15:26:59 +0000 (11:26 -0400)]
minidump: Parameterize minidumpsys()
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Andriy Gapon [Fri, 26 Nov 2021 06:52:56 +0000 (08:52 +0200)]
twsi: compile in support for debug messages, disabled by default
Debug messages can now be enabled per driver instance via a new sysctl.
Also, debug messages in TWSI_READ and TWSI_WRITE require debug level
greater than 1 as they are mostly redundant because callers of those
functions already log most interesting results.
NB: the twsi drivers call their device iichb, so the new sysctl will
appear under dev.iichb.N.
Rick Macklem [Thu, 18 Nov 2021 21:35:25 +0000 (13:35 -0800)]
mountd: Fix handling of usernames that start with a digit
yocalebo_gmail.com submitted a patch for mountd.c that
fixes the case where a username starts with a digit.
Without this patch, the username that starts with a
digit is misinterpreted as a numeric uid.
With this patch, any string that does not entirely
convert to a decimal number via strtoul() is considered
a user/group name.
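The rule described in the mountd change above can be sketched as follows.
This is a minimal illustration, not the code from mountd.c; the helper name
parse_numeric_id() is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return true and store the value in *idp only if
 * the whole string converts to a decimal number. Anything else, such as
 * a name with a leading digit, is left for a user/group name lookup.
 * Note that strtoul() itself also accepts leading whitespace and a sign.
 */
static bool
parse_numeric_id(const char *s, unsigned long *idp)
{
	char *end;
	unsigned long val;

	if (*s == '\0')
		return (false);
	errno = 0;
	val = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0')
		return (false);
	*idp = val;
	return (true);
}
```

With this check, "1000" is treated as a numeric id, while a string such as
"4ever" stops converting at the 'e' and is therefore handled as a name.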
Rick Macklem [Wed, 17 Nov 2021 00:02:53 +0000 (16:02 -0800)]
nfsd: Add a new rc variable nfs_server_maxio
Since vfs.nfsd.srvmaxio can only be set when nfsd.ko
is loaded, but nfsd is not running, setting it in
/etc/sysctl.conf is not feasible when "options NFSD"
was not specified for the kernel.
This patch adds a new rc variable nfs_server_maxio,
which sets vfs.nfsd.srvmaxio at the correct time.
Mark Johnston [Wed, 24 Nov 2021 18:19:54 +0000 (13:19 -0500)]
netinet: Implement in_cksum_skip() using m_apply()
This allows it to work with unmapped mbufs. In particular,
in_cksum_skip() calls no longer need to be preceded by calls to
mb_unmapped_to_ext() to avoid a page fault.
PR: 259645
Reviewed by: gallatin, glebius, jhb
Sponsored by: The FreeBSD Foundation
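To illustrate the callback-per-segment approach used in the change above,
here is a userland sketch of an Internet checksum accumulated one buffer
segment at a time. It is not the kernel implementation: struct cksum_state,
cksum_segment() and cksum_finish() are hypothetical names, the callback
merely mirrors the (arg, data, len) shape of an m_apply() callback, and the
byte-at-a-time loop favours clarity over speed.

```c
#include <stddef.h>
#include <stdint.h>

struct cksum_state {
	uint64_t sum;	/* running sum of 16-bit big-endian words */
	size_t off;	/* bytes consumed so far; tracks odd/even position */
};

/*
 * Called once per contiguous segment. Segments may have odd lengths, so
 * the byte position is carried across calls in the state.
 */
static int
cksum_segment(void *arg, void *data, unsigned int len)
{
	struct cksum_state *st = arg;
	const uint8_t *p = data;
	unsigned int i;

	for (i = 0; i < len; i++, st->off++) {
		if ((st->off & 1) == 0)
			st->sum += (uint64_t)p[i] << 8;	/* high byte */
		else
			st->sum += p[i];		/* low byte */
	}
	return (0);
}

/* Fold the carries back in and return the ones-complement result. */
static uint16_t
cksum_finish(const struct cksum_state *st)
{
	uint64_t sum = st->sum;

	while ((sum >> 16) != 0)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

Because the state remembers whether the next byte falls on an odd or even
offset, the result does not depend on where the data is split into
segments, which is what lets a per-segment callback replace a single pass
over contiguous memory.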
Mark Johnston [Wed, 24 Nov 2021 18:19:44 +0000 (13:19 -0500)]
netinet: Deduplicate most in_cksum() implementations
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions. Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.
Deduplicate the implementations for the rest of the platforms. This
will make it easier to implement in_cksum() for unmapped mbufs. On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.
No functional change intended.
Reviewed by: kp, glebius
Sponsored by: The FreeBSD Foundation
Wei Hu [Sat, 27 Nov 2021 06:42:34 +0000 (06:42 +0000)]
Hyper-V: vPCI: Prepopulate device bars
In recent Hyper-V releases on Windows Server 2022, vPCI code does not
initialize the last 4 bits of device bar registers. This behavior change
could result in weird problems causing PCI code failure when configuring
bars.
Just write all 1's to those bars whose probed values are not the same
as current read ones. This seems to make Hyper-V vPCI and
pci_write_bar() cooperate correctly on these releases.
Mark Johnston [Tue, 16 Nov 2021 18:36:30 +0000 (13:36 -0500)]
sctp: Use m_apply() to calculate a checksum for an mbuf chain
m_apply() works on unmapped mbufs, so this will let us elide
mb_unmapped_to_ext() calls preceding sctp_calculate_cksum() calls in
the network stack.
Modify sctp_calculate_cksum() to assume it's passed an mbuf header.
This assumption appears to be true in practice, and we need to know the
full length of the chain.
No functional change intended.
Reviewed by: tuexen, jhb
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 16 Nov 2021 18:31:04 +0000 (13:31 -0500)]
mbuf: Only allow extpg mbufs if the system has a direct map
Some upcoming changes will modify software checksum routines like
in_cksum() to operate using m_apply(), which uses the direct map to
access packet data for unmapped mbufs. This approach of course does not
work on platforms without a direct map, so we have to disallow the use
of unmapped mbufs on such platforms.
I believe this is the right tradeoff: we only configure KTLS on amd64
and arm64 today (and one KTLS consumer, NFS TLS, requires a direct map
already), and the use of unmapped mbufs with plain sendfile is a recent
optimization. If need be, m_apply() could be modified to create
CPU-private mappings of extpg mbuf pages as a fallback.
So, change mb_use_ext_pgs to be hard-wired to zero on systems without a
direct map. Note that PMAP_HAS_DMAP is not a compile-time constant on
some systems, so the default value of mb_use_ext_pgs has to be
determined during boot.
Reviewed by: jhb
Discussed with: gallatin
Sponsored by: The FreeBSD Foundation