Andrew Turner [Thu, 14 Mar 2024 14:02:56 +0000 (14:02 +0000)]
arm64: Mask non-debug exceptions when single stepping
When an exception is pending when single stepping we may execute the
handler for that exception rather than the single step handler. This
could cause the scheduler to fire to run a new thread. This will mean
we single step to a new thread causing unexpected results.
Handle this by masking non-debug exceptions. This will cause issues
when stepping over instructions that access the DAIF values so future
work is needed to handle these cases, but for most code this now works
as expected.
Reviewed by: jhb
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D44350
Andrew Turner [Thu, 21 Mar 2024 10:13:16 +0000 (10:13 +0000)]
arm64: Support passing more registers to signals
To support recent extensions to the Arm architecture we may need to
store more or larger registers when sending a signal.
To support this create a list of these extra registers. Userspace that
needs to access a register in the signal handler can then walk the list
to find the correct register struct and read/write its contents.
Mark Johnston [Thu, 21 Mar 2024 04:20:37 +0000 (00:20 -0400)]
bhyve: Move device model-independent UART code into a separate file
Currently bhyve implements a ns16550-compatible UART in uart_emul.c.
This file also contains generic code to manage RX FIFOs and to handle
reading from and writing to a TTY. bhyve instantiates UARTs to
implement COM devices (via pci_lpc.c) and PCI UART devices.
The arm64 port will bring with it a PL011 device model which is used as
the default console (i.e., no COM ports). To simplify its integration,
add a UART "backend" layer which lets UART device models allocate an RX
FIFO and interact with TTYs without duplicating code. In particular,
code in uart_backend.* is to be shared among device models, and the
namespace for uart_emul.* is changed to uart_ns16550_*.
This is based on andrew@'s work in
https://github.com/zxombie/freebsd/tree/bhyvearm64 but I've made a
number of changes, particularly with respect to naming and source code
organization.
This should fix 'Assertion failed: (isa<To>(Val) && "cast<Ty>() argument
of incompatible type!")' errors when building devel/boost-libs,
specifically libs/url/src/segments_view.cpp.
Bump __FreeBSD_version so this fix can easily be detected from
devel/boost-all/compiled.mk.
John Baldwin [Wed, 20 Mar 2024 22:30:09 +0000 (15:30 -0700)]
cxgbe tom: Handle a race condition when enabling TLS offload
Use a separate state for when a request to set RX_QUIESCE has been
sent but the resulting TCB reply has not been received. In
particular, this correctly handles the case where data has been
received and queued in the receive queue before the quiesce request
takes effect.
John Baldwin [Wed, 20 Mar 2024 22:29:51 +0000 (15:29 -0700)]
NFS: Request use of TCP_USE_DDP for in-kernel TCP sockets
Since this is an optimization, ignore failures to enable the option.
For the server side, defer enabling DDP until the first non-NULLPROC
RPC is received. This allows TLS handling (which uses NULLPROC RPCs)
to enable TLS offload first.
John Baldwin [Wed, 20 Mar 2024 22:29:28 +0000 (15:29 -0700)]
cxgbe: Support TCP_USE_DDP on offloaded TOE connections
When this socket option is enabled, relatively large contiguous
buffers are allocated and used to receive data from the remote
connection. When data is received a wrapper M_EXT mbuf is queued to
the socket's receive buffer. This reduces the length of the linked
list of received mbufs and allows consumers to consume receive data in
larger chunks.
To minimize reprogramming the page pods in the adapter, receive
buffers for a given connection are recycled. When a buffer has been
fully consumed by the receiver and freed, the buffer is placed on a
per-connection free buffers list.
The size of the receive buffers defaults to 256k and can be set via
the hw.cxgbe.toe.ddp_rcvbuf_len sysctl. The
hw.cxgbe.toe.ddp_rcvbuf_cache sysctl (defaults to 4) determines the
maximum number of free buffers cached per connection. Note that this
limit does not apply to "in-flight" receive buffers that are
associated with mbufs in the socket's receive buffer.
John Baldwin [Wed, 20 Mar 2024 22:29:02 +0000 (15:29 -0700)]
tcp: Add a new kernel-only TCP_USE_DDP socket option
This socket option can be used by in-kernel consumers (like NFS) to
request a NIC to use optimized receive of large buffers for a
connection. The current use case is to support DDP by the TOE on
Chelsio NICs.
Andrew Gallatin [Wed, 20 Mar 2024 19:46:01 +0000 (15:46 -0400)]
ip6_output: Reduce cache misses on pktopts
When profiling an IP6 heavy workload, I noticed that we were
getting a lot of cache misses in ip6_output() around
ip6_pktopts. This was happening because the TCP stack passes
inp->in6p_outputopts even if all options are unused. So in the
common case of no options present, pkt_opts is not null, and is
checked repeatedly for different options. Since ip6_pktopts is
large (4 cachelines), and every field is checked, we take 4
cache misses (2 of which tend to be hidden by the adjacent line
prefetcher).
To fix this common case, I introduced a new flag in ip6_pktopts
(ip6po_valid) which tracks which options have been set. In the
common case where nothing is set, this causes just a single
cache miss to load. It also eliminates a test for some options
(if (opt != NULL && opt->val >= const) vs if ((optvalid & flag) !=0 )
To keep the struct the same size in 64-bit kernels, and to keep
the integer values (like ip6po_hlim, ip6po_tclass, etc) on the
same cacheline, I moved them to the top.
As suggested by zlei, the null check in MAKE_EXTHDR() becomes
redundant, and can be removed.
For our web server workload (with the ip6po_tclass option set),
this drops the CPI from 2.9 to 2.4 for ip6_output
Differential Revision: https://reviews.freebsd.org/D44204
Reviewed by: bz, glebius, zlei
No Objection from: melifaro
Sponsored by: Netflix Inc.
pkgbase: add a mechanism to be able to force a give ucl include
This is made in order to be able to find add the post-install scripts
for the kernel, where PKGNAME varies for each KERNCONF but we don't want
to dynamically duplicated the kernel.ucl file.
At the same time we don't want the *-dbg* packages to actually include
those post-install scripts
Jose Luis Duran [Wed, 20 Mar 2024 04:54:18 +0000 (00:54 -0400)]
rc.initdiskless: Disable soft-updates in mdmfs (again)
Re-apply the -S switch to disable soft-updates in memory disks (commit 8b1292ac5219). This might be beneficial when tmpfs(5) is not present in
the kernel, as this can cause mdmfs(8)'s auto keyword to fallback to
using md(4).
Brooks Davis [Tue, 19 Mar 2024 21:52:39 +0000 (21:52 +0000)]
syscalls.master: make __sys_fcntl take an intptr_t
The (optional) third argument of fcntl is sometimes a pointer so change
the type to intptr_t. Update the libc-internal defintion (actually used
by libthr) to take a fixed intptr_t argument rather than pretending it's
a variadic function. (That worked because all supported architectures
pass variadic arguments as though the function was declared with those
types. In CheriBSD that changes because variadic arguments are passed
via a bounded array.)
Brooks Davis [Mon, 18 Mar 2024 21:37:39 +0000 (21:37 +0000)]
freebsd32: struct siginfo32 -> struct __siginfo32
In the next commit I will update syscalls.master to use struct __siginfo
(which actually exists) so this update will be needed to make
generated files (from make sysent) align.
Brooks Davis [Tue, 19 Mar 2024 21:51:40 +0000 (21:51 +0000)]
syscalls.master: align with sigfastblock declaration
sigfastblock is declared to take a void * argument in the manpage in
headers so declare it that way and use SAL annotations to say it
interacts with a 32-bit word.
In 469cfa3c30ee cperciva added TSLOG profiling to link_elf_ireloc. This
requires curthread to be read when the kernel linker is invoked, but it
hadn't yet been initialized. On amd64 this was harmless since [gs:0] was
readable; but on arm64 this broke since [x18] was not readable.
Move the curthread (and associated PCPU) setup earlier on arm64 in order
to allow TSLOG to work there.
Fixes: 469cfa3c30ee ("tslog: Annotate some early boot functions")
Differential Revision: https://reviews.freebsd.org/D44317
Gleb Smirnoff [Tue, 19 Mar 2024 18:48:59 +0000 (11:48 -0700)]
carp: check CARP status in in_localip_fib(), in6_localip_fib()
Don't report a BACKUP CARP address as local. These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation. For this purpose we definitely
want to treat BACKUP addresses as non local.
This change is conservative and doesn't modify compat in_localip() and
in6_localip(). They are used more widely than the FIB-aware versions.
The change would modify the notion of ipfw(4) 'me' keyword. There might
be other consequences as in_localip() is used by various tunneling
protocols.
Kristof Provost [Tue, 12 Mar 2024 12:29:08 +0000 (13:29 +0100)]
pf: fix dummynet + route-to
Ensure that we pick the correct dummynet pipe (i.e. forward vs. reverse
direction) when applying route-to.
We mark the processing as outbound so that dummynet will re-inject in
the correct phase of processing after it's done with the packet, but
that will cause us to pick the wrong pipe number. Reverse them so that
the incorrect decision ends up picking the correct pipe.
Kristof Provost [Mon, 11 Mar 2024 13:44:17 +0000 (14:44 +0100)]
pf: avoid passing through dummynet multiple times
In some setups we end up with multiple states created for a single
packet, which in turn can mean we run the packet through dummynet
multiple times. That's not expected or intended. Mark each packet when
it goes through dummynet, and do not pass packet through dummynet if
they're marked as having already passed through.
pkgbase: rework certctl package to only run rehash on the main package
Rework how ucl manifest are generated leveraging ucl features and flua
now the ucl generation is done via a lua script which uses libucl to
ingest the template and use variables as defined in its command line.
the template will include only if it exist a ucl file named after the
package name which will complement the template or overwrite what was
defined in the template if defined in this specific ucl file
this allows to overwrite license, but add script only to the packages
who actually needs them.
As a results the post install scripts are now only added to the right
package and not also added to the subpackages like -man or -dev
John Baldwin [Tue, 19 Mar 2024 00:01:23 +0000 (17:01 -0700)]
kldxref: Properly handle reading strings near the end of an ELF file
If a string is at or near the end of an input file and the amount of
remaining data in the file is smaller than the maximum string size,
the pread(2) system call would return a short read which is treated as
an error. Instead, add a new helper function for reading a string
which permits short reads so long as the data read from the file
contains a terminated string.
Reported by: jrtc27
Reviewed by: jrtc27
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D44419
Rick Macklem [Mon, 18 Mar 2024 22:40:41 +0000 (15:40 -0700)]
nfsd.8: Document ways to minimize Copy operation times
For NFSv4.2, a Copy operation can take a long time to complete.
If there is a concurrent ExchangeID or DelegReturn operation
which requires the exclusive lock on all NFSv4 state, this can
result in a stall of the nfsd server.
livedump_start_vnode(9) is introduced such that the live minidump on the
system could take a vnode. This interface could be used to extend support
for the existing framework in downstream.
Bump __FreeBSD_version for introducing livedump_start_vnode(9).
Gleb Smirnoff [Mon, 18 Mar 2024 20:57:00 +0000 (13:57 -0700)]
tcp: clear all TCP timers in tcp_timer_stop() when in callout
When a TCP callout decides to disable self, e.g. tcp_timer_2msl() calling
tcp_close(), we must also clear all other possible timers. Otherwise,
upon return, the callout would be scheduled again in tcp_timer_enter().
Revert 57e27ff07aff, which was a temporary partial revert of otherwise
correct 62d47d73b7eb, that exposed the problem being fixed now. Add an
extra assertion in tcp_timer_enter() to check we aren't arming callout for
a closed connection.
Andrew Turner [Thu, 14 Mar 2024 17:31:39 +0000 (17:31 +0000)]
arm64: Return all registers to gdb when able
When the kdb thread is the current thread we read the registers from
the trap frame. As this contains all general purpose registers we can
use it to read these in the gdb stub. This allows us to include the
non-callee saved registers, e.g. function arguments.
Reviewed by: imp
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D44360
Andrew Turner [Tue, 12 Mar 2024 18:06:18 +0000 (18:06 +0000)]
uart: Split out initilisation of the acpi devinfo
Split out the common parts of building the uart devinfo from ACPI
tables from the SPCR parser. This will be used when we support the DBG2
table to find the debug uart to be used by the kernel gdb stub.
Reviewed by: imp
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D44357
Gleb Smirnoff [Mon, 18 Mar 2024 15:56:17 +0000 (08:56 -0700)]
tcp: remove IS_FASTOPEN() macro
The macro is more obfuscating than helping as it just checks a single flag
of t_flags. All other t_flags bits are checked without a macro.
A bigger problem was that declaration of the macro in tcp_var.h depended
on a kernel option. It is a bad practice to create such definitions in
installable headers.
Gleb Smirnoff [Mon, 18 Mar 2024 15:50:30 +0000 (08:50 -0700)]
sockets: remove unused KPIs to manipulate sockets
These KPIs were added in dd0e6c383a9f0 and through 15 years had zero use.
They slightly remind what IfAPI does for struct ifnet. But IfAPI does
that for the sake of large collection of NIC drivers not being aware of
struct ifnet. For the sockets it is unclear what could be a large
collection of externally written kernel modules that need extensively use
sockets and not be aware of their internals at the same time. This
isolation of a structure knowledge requires a lot of work, and just
throwing in a few KPIs isn't helpful.
Gleb Smirnoff [Mon, 18 Mar 2024 15:49:39 +0000 (08:49 -0700)]
inpcb: remove unused KPIs to manipulate inpcbs
These KPIs were added in 9d29c635daa69 and through 15 years had zero use.
They slightly remind what IfAPI does for struct ifnet. But IfAPI does
that for the sake of large collection of NIC drivers not being aware of
struct ifnet. For the inpcb it is unclear what could be a large
collection of externally written kernel modules that need extensively use
inpcb and not be aware of its internals at the same time. This isolation
of a structure knowledge requires a lot of work, and just throwing in a
few KPIs isn't helpful.
Ed Maste [Mon, 18 Mar 2024 14:15:27 +0000 (10:15 -0400)]
ssh: remove deprecated client VersionAddendum
Support for a client VersionAddendum was removed in bffe60ead024, but
the option was retained (as oDeprecated) as a transition aid.
Sufficient time has passed that it can be removed.
Ed Maste [Mon, 18 Mar 2024 14:00:57 +0000 (10:00 -0400)]
ssh: Update to OpenSSH 9.7p1
This release contains mostly bugfixes.
It also makes support for the DSA signature algorithm a compile-time
option, with plans to disable it upstream later this year and remove
support entirely in 2025.
Full release notes at https://www.openssh.com/txt/release-9.7
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Michael Osipov [Thu, 14 Mar 2024 16:39:47 +0000 (17:39 +0100)]
freebsd-update: mark "cron" as fetched as "fetch" itself
The change in 33bd05c3187d7b49c80cf1b0132b405c105d0833 was incomplete
because it did not mark "cron" as ISFETCHED=1 although it performs the
same operations as "install", but less output and does not perform a
hard exit. Mark result as such and make "install" know that updates have
been fetched.
Provide both zfs and ufs images which a 1MB partition reserved for the
config drive wearing a GPT Label "config-drive" to allow consumer to
know where they should push the config drive on the provided image.
2 formats available: qcow2 and raw
This has been tested on OVHCloud baremetal via "bring your own image"
Also tested on openstack
Mark Murray [Fri, 1 Mar 2024 15:53:58 +0000 (15:53 +0000)]
lib/msun: Fix tgammal(3) on IEEE 128-bit platforms
Undo the 80-bit "stub" implementation of the 128-bit long double
tgammal(3) function. The latest (as of Feb 2024) version of the
src/contrib/arm-optimised-routines library includes a standalone,
full 128-bit replacement. This needs a small bit of wrapping to
fit it in, but is otherwise a drop-in replacement.
Testing this is hard, as most maths packages blow up as soon as
their 80-bit floating-point capability is exceeded. With 128-bit
tgammal(), this is easy to do, and this is the range that needs to
be checked the most carefully. Using my copy of Maple, I was able
to check that the output was within a few ULP of the correct answer,
right up to the point of 128-bit over- and underflow. Additionally,
the results are no worse, and indeed better than the 80-bit version.
Steve Kargl sent me his libm testing code, which I used to verify
that the excpetions for certain key values were correct. Tested in
this case were +-Inf, +-NaN, +-1 and +-0.
Michael Osipov [Tue, 30 Jan 2024 16:24:45 +0000 (17:24 +0100)]
freebsd-update: Don't provide copiable commands in output
Previously, freebsd-update provided ready-to-go commands for copying and
pasting into the terminal. This causes problems as soon as options are
used and not supplied again by the user, e.g., '-b' or '-d'.
Stop making them copiable and force the user to construct a valid command
line by himself to avoid failures.
Jessica Clarke [Sat, 16 Mar 2024 01:50:21 +0000 (01:50 +0000)]
kldxref: Fix bootstrapping on macOS with Clang 16 / Apple Clang 15
macOS, like Linux, does not include an outer const qualifier for its
fts_open callback arguments, so -Wincompatible-function-pointer-types
also picks this up and breaks the build now Clang 16 makes it an error
by default. Extend the existing Linux support to fix this.
Jessica Clarke [Sat, 16 Mar 2024 01:50:20 +0000 (01:50 +0000)]
jevents: Fix bootstrapping on macOS with Clang 16 / Apple Clang 15
macOS, like Linux, does not include an outer const qualifier for its
fts_open callback arguments, so -Wincompatible-function-pointer-types
also picks this up and breaks the build now Clang 16 makes it an error
by default. Extend the existing Linux support to fix this.
Jessica Clarke [Sat, 16 Mar 2024 01:50:20 +0000 (01:50 +0000)]
mandoc: Fix bootstrapping on macOS with Clang 16 / Apple Clang 15
macOS, like Linux, does not include an outer const qualifier for its
fts_open callback arguments, so -Wincompatible-function-pointer-types
also picks this up and breaks the build now Clang 16 makes it an error
by default. Extend the existing Linux support to fix this.
Rick Macklem [Sat, 16 Mar 2024 01:04:37 +0000 (18:04 -0700)]
nfsd: Add a sysctl to limit NFSv4.2 Copy RPC size
NFSv4.2 supports a Copy operation, which avoids file data being
read to the client and then written back to the server, if both
input and output files are on the same NFSv4.2 mount for
copy_file_range(2).
Unfortunately, this Copy operation can take a long time under
certain circumstances. If this occurs concurrently with a RPC
that requires an exclusive lock on the nfsd such as ExchangeID
done for a new mount, the result can be an nfsd "stall" until
the Copy completes.
This patch adds a sysctl that can be set to limit the size of
a Copy operation or, if set to 0, disable Copy operations.
The use of this sysctl and other ways to avoid Copy operations
taking too long will be documented in the nfsd.4 man page by
a separate commit.