Gleb Smirnoff [Sat, 23 Mar 2024 05:44:16 +0000 (22:44 -0700)]
tests/netinet: add UDP socket I/O tests
Start a file that would collect tests for I/O functionality of a UDP
socket, targeted on how a socket interacts with userland rather than with
wire side of the protocol.
First version tests that MSG_TRUNC and MSG_PEEK are working correctly.
Gleb Smirnoff [Sat, 23 Mar 2024 02:50:33 +0000 (19:50 -0700)]
tests/netgraph: mark all tests as required_user="root"
Any netgraph operation requires root priveleges. Some tests in the
directory already mark themselves with 'atf_tc_set_md_var(conf,
"require.user", "root");' which creates a lot of pasted code. Some tests
don't mark self. For this particular directory a blanket metadata setting
in the Makefile is acceptable, imho.
John Baldwin [Sat, 23 Mar 2024 00:25:07 +0000 (17:25 -0700)]
nvmecontrol: Display additional Fabrics-related fields for cdata
Some of these fields are specific to Fabrics controllers (such as the
size of capsules) while other fields are shared with PCI-e
controllers, but are more relevant for Fabrics controllers (such as
KeepAlive timer properties).
John Baldwin [Sat, 23 Mar 2024 00:23:09 +0000 (17:23 -0700)]
nvme: Add SGL structure and constants for use in NVMe commands
Fabrics capsules use an SGL structure instead of prp1/2 addresses to
describe the data buffer used for a command. The SGL structure is
added to a union with the existing prp1/2 fields.
Michael Tuexen [Fri, 22 Mar 2024 13:50:25 +0000 (14:50 +0100)]
rtld: fix check for endianess of elf hints file
Don't check if the elf hints file is in host byte order, but check
if it is in little endian by looking at the magic number.
This fixes rtld on big endian platforms.
Reviewed by: se, kib (prior version of the patch)
Fixes: 7b77d37a561b ("rtld-elf: support either byte-order of hints")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D44472
build: add -Wswitch to clang for more consistency with gcc
gcc12 and gcc13 appear to include Wswitch with Wall, while
clang doesn't. For switch() statements on enum, this forces
the use of at least a default: clause, in adherance with style(9).
Reviewed By: emaste
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D44092
Kristof Provost [Thu, 21 Mar 2024 07:38:45 +0000 (08:38 +0100)]
if_ovpn tests: test large packets in IPv6 tunnel
There's a report of MTU issues over IPv6 DCO tunnels.
Extend the 4in6 test to send a series of pings with different sizes, as
well as transfer a large file.
No issues were found, but we may as well extend the test case.
Mark Johnston [Fri, 22 Mar 2024 06:11:03 +0000 (02:11 -0400)]
ddb: Fix format string errors in db_pprint.c
For some reason, db_expr_t is defined as "long" on 64-bit platforms and
"int" on others. When printing values of this type, simply cast them to
long to suppress compilation errors on 32-bit systems.
Bojan Novković [Fri, 22 Mar 2024 03:01:34 +0000 (04:01 +0100)]
ddb: Add CTF-based pretty printing
Add basic CTF support and a CTF-powered pretty-printer to ddb.
The db_ctf.* files expose a basic interface for fetching type
data for ELF symbols, interacting with the CTF string table,
and translating type identifiers to type data.
The db_pprint.c file uses those interfaces to implement
a pretty-printer for all kernel ELF symbols.
The pretty-printer works with symbol names and arbitrary addresses:
pprint struct thread 0xffffffff8194ad90
Pretty-printing currently only works after the root filesystem
gets mounted because the CTF info is not available during
early boot.
Dimitry Andric [Thu, 21 Mar 2024 20:44:46 +0000 (21:44 +0100)]
Slightly reorganize libclang_rt Makefile again
Make a separate .elif section for MACHINE_ARCH==powerpc, and subdivide
the MACHINE_CPUARCH values under it. If at some point more sanitizer
libraries become available for powerpc CPU architectures, they can be
added before the "nothing for other powerpc yet" case. Similar for the
MACHINE_ARCH==arm case.
Dimitry Andric [Thu, 21 Mar 2024 13:53:36 +0000 (14:53 +0100)]
Fix building of several libclang_rt libraries for powerpc64 and powerp64le
I reorganized the libclang_rt Makefile in e77a1bb27574 to make it more
readable and maintainable, but the check for 32-bit powerpc was wrong.
This caused almost no libclang_rt libraries to be built for powerpc64
and powerpc64le.
Stefan Eßer [Thu, 21 Mar 2024 15:31:49 +0000 (16:31 +0100)]
rtld-elf: add some debug print statements
The byte-order independent code has been reported to fail on powerpc64.
Add some more debug statements to help identify the parametrs used and
to verify the correct operation of the byte-swap macros used..
Mitchell Horne [Thu, 21 Mar 2024 15:21:41 +0000 (12:21 -0300)]
kassert.h: update MPASS definition commentary
We now have a detailed man page describing both MPASS and KASSERT. Give
a warning that careless use of MPASS can result in inadequate assertion
messages, and point to the MPASS(9) page which describes this.
While here add a comment above the KASSERT definitions pointing to the
man page.
Suggested by: bz
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D44438
Andrew Turner [Thu, 14 Mar 2024 14:02:56 +0000 (14:02 +0000)]
arm64: Mask non-debug exceptions when single stepping
When an exception is pending when single stepping we may execute the
handler for that exception rather than the single step handler. This
could cause the scheduler to fire to run a new thread. This will mean
we single step to a new thread causing unexpected results.
Handle this by masking non-debug exceptions. This will cause issues
when stepping over instructions that access the DAIF values so future
work is needed to handle these cases, but for most code this now works
as expected.
Reviewed by: jhb
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D44350
Andrew Turner [Thu, 21 Mar 2024 10:13:16 +0000 (10:13 +0000)]
arm64: Support passing more registers to signals
To support recent extensions to the Arm architecture we may need to
store more or larger registers when sending a signal.
To support this create a list of these extra registers. Userspace that
needs to access a register in the signal handler can then walk the list
to find the correct register struct and read/write its contents.
Mark Johnston [Thu, 21 Mar 2024 04:20:37 +0000 (00:20 -0400)]
bhyve: Move device model-independent UART code into a separate file
Currently bhyve implements a ns16550-compatible UART in uart_emul.c.
This file also contains generic code to manage RX FIFOs and to handle
reading from and writing to a TTY. bhyve instantiates UARTs to
implement COM devices (via pci_lpc.c) and PCI UART devices.
The arm64 port will bring with it a PL011 device model which is used as
the default console (i.e., no COM ports). To simplify its integration,
add a UART "backend" layer which lets UART device models allocate an RX
FIFO and interact with TTYs without duplicating code. In particular,
code in uart_backend.* is to be shared among device models, and the
namespace for uart_emul.* is changed to uart_ns16550_*.
This is based on andrew@'s work in
https://github.com/zxombie/freebsd/tree/bhyvearm64 but I've made a
number of changes, particularly with respect to naming and source code
organization.
This should fix 'Assertion failed: (isa<To>(Val) && "cast<Ty>() argument
of incompatible type!")' errors when building devel/boost-libs,
specifically libs/url/src/segments_view.cpp.
Bump __FreeBSD_version so this fix can easily be detected from
devel/boost-all/compiled.mk.
John Baldwin [Wed, 20 Mar 2024 22:30:09 +0000 (15:30 -0700)]
cxgbe tom: Handle a race condition when enabling TLS offload
Use a separate state for when a request to set RX_QUIESCE has been
sent but the resulting TCB reply has not been received. In
particular, this correctly handles the case where data has been
received and queued in the receive queue before the quiesce request
takes effect.
John Baldwin [Wed, 20 Mar 2024 22:29:51 +0000 (15:29 -0700)]
NFS: Request use of TCP_USE_DDP for in-kernel TCP sockets
Since this is an optimization, ignore failures to enable the option.
For the server side, defer enabling DDP until the first non-NULLPROC
RPC is received. This allows TLS handling (which uses NULLPROC RPCs)
to enable TLS offload first.
John Baldwin [Wed, 20 Mar 2024 22:29:28 +0000 (15:29 -0700)]
cxgbe: Support TCP_USE_DDP on offloaded TOE connections
When this socket option is enabled, relatively large contiguous
buffers are allocated and used to receive data from the remote
connection. When data is received a wrapper M_EXT mbuf is queued to
the socket's receive buffer. This reduces the length of the linked
list of received mbufs and allows consumers to consume receive data in
larger chunks.
To minimize reprogramming the page pods in the adapter, receive
buffers for a given connection are recycled. When a buffer has been
fully consumed by the receiver and freed, the buffer is placed on a
per-connection free buffers list.
The size of the receive buffers defaults to 256k and can be set via
the hw.cxgbe.toe.ddp_rcvbuf_len sysctl. The
hw.cxgbe.toe.ddp_rcvbuf_cache sysctl (defaults to 4) determines the
maximum number of free buffers cached per connection. Note that this
limit does not apply to "in-flight" receive buffers that are
associated with mbufs in the socket's receive buffer.
John Baldwin [Wed, 20 Mar 2024 22:29:02 +0000 (15:29 -0700)]
tcp: Add a new kernel-only TCP_USE_DDP socket option
This socket option can be used by in-kernel consumers (like NFS) to
request a NIC to use optimized receive of large buffers for a
connection. The current use case is to support DDP by the TOE on
Chelsio NICs.
Andrew Gallatin [Wed, 20 Mar 2024 19:46:01 +0000 (15:46 -0400)]
ip6_output: Reduce cache misses on pktopts
When profiling an IP6 heavy workload, I noticed that we were
getting a lot of cache misses in ip6_output() around
ip6_pktopts. This was happening because the TCP stack passes
inp->in6p_outputopts even if all options are unused. So in the
common case of no options present, pkt_opts is not null, and is
checked repeatedly for different options. Since ip6_pktopts is
large (4 cachelines), and every field is checked, we take 4
cache misses (2 of which tend to be hidden by the adjacent line
prefetcher).
To fix this common case, I introduced a new flag in ip6_pktopts
(ip6po_valid) which tracks which options have been set. In the
common case where nothing is set, this causes just a single
cache miss to load. It also eliminates a test for some options
(if (opt != NULL && opt->val >= const) vs if ((optvalid & flag) !=0 )
To keep the struct the same size in 64-bit kernels, and to keep
the integer values (like ip6po_hlim, ip6po_tclass, etc) on the
same cacheline, I moved them to the top.
As suggested by zlei, the null check in MAKE_EXTHDR() becomes
redundant, and can be removed.
For our web server workload (with the ip6po_tclass option set),
this drops the CPI from 2.9 to 2.4 for ip6_output
Differential Revision: https://reviews.freebsd.org/D44204
Reviewed by: bz, glebius, zlei
No Objection from: melifaro
Sponsored by: Netflix Inc.
pkgbase: add a mechanism to be able to force a give ucl include
This is made in order to be able to find add the post-install scripts
for the kernel, where PKGNAME varies for each KERNCONF but we don't want
to dynamically duplicated the kernel.ucl file.
At the same time we don't want the *-dbg* packages to actually include
those post-install scripts
Jose Luis Duran [Wed, 20 Mar 2024 04:54:18 +0000 (00:54 -0400)]
rc.initdiskless: Disable soft-updates in mdmfs (again)
Re-apply the -S switch to disable soft-updates in memory disks (commit 8b1292ac5219). This might be beneficial when tmpfs(5) is not present in
the kernel, as this can cause mdmfs(8)'s auto keyword to fallback to
using md(4).
Brooks Davis [Tue, 19 Mar 2024 21:52:39 +0000 (21:52 +0000)]
syscalls.master: make __sys_fcntl take an intptr_t
The (optional) third argument of fcntl is sometimes a pointer so change
the type to intptr_t. Update the libc-internal defintion (actually used
by libthr) to take a fixed intptr_t argument rather than pretending it's
a variadic function. (That worked because all supported architectures
pass variadic arguments as though the function was declared with those
types. In CheriBSD that changes because variadic arguments are passed
via a bounded array.)
Brooks Davis [Mon, 18 Mar 2024 21:37:39 +0000 (21:37 +0000)]
freebsd32: struct siginfo32 -> struct __siginfo32
In the next commit I will update syscalls.master to use struct __siginfo
(which actually exists) so this update will be needed to make
generated files (from make sysent) align.
Brooks Davis [Tue, 19 Mar 2024 21:51:40 +0000 (21:51 +0000)]
syscalls.master: align with sigfastblock declaration
sigfastblock is declared to take a void * argument in the manpage in
headers so declare it that way and use SAL annotations to say it
interacts with a 32-bit word.
In 469cfa3c30ee cperciva added TSLOG profiling to link_elf_ireloc. This
requires curthread to be read when the kernel linker is invoked, but it
hadn't yet been initialized. On amd64 this was harmless since [gs:0] was
readable; but on arm64 this broke since [x18] was not readable.
Move the curthread (and associated PCPU) setup earlier on arm64 in order
to allow TSLOG to work there.
Fixes: 469cfa3c30ee ("tslog: Annotate some early boot functions")
Differential Revision: https://reviews.freebsd.org/D44317
Gleb Smirnoff [Tue, 19 Mar 2024 18:48:59 +0000 (11:48 -0700)]
carp: check CARP status in in_localip_fib(), in6_localip_fib()
Don't report a BACKUP CARP address as local. These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation. For this purpose we definitely
want to treat BACKUP addresses as non local.
This change is conservative and doesn't modify compat in_localip() and
in6_localip(). They are used more widely than the FIB-aware versions.
The change would modify the notion of ipfw(4) 'me' keyword. There might
be other consequences as in_localip() is used by various tunneling
protocols.
Kristof Provost [Tue, 12 Mar 2024 12:29:08 +0000 (13:29 +0100)]
pf: fix dummynet + route-to
Ensure that we pick the correct dummynet pipe (i.e. forward vs. reverse
direction) when applying route-to.
We mark the processing as outbound so that dummynet will re-inject in
the correct phase of processing after it's done with the packet, but
that will cause us to pick the wrong pipe number. Reverse them so that
the incorrect decision ends up picking the correct pipe.
Kristof Provost [Mon, 11 Mar 2024 13:44:17 +0000 (14:44 +0100)]
pf: avoid passing through dummynet multiple times
In some setups we end up with multiple states created for a single
packet, which in turn can mean we run the packet through dummynet
multiple times. That's not expected or intended. Mark each packet when
it goes through dummynet, and do not pass packet through dummynet if
they're marked as having already passed through.
pkgbase: rework certctl package to only run rehash on the main package
Rework how ucl manifest are generated leveraging ucl features and flua
now the ucl generation is done via a lua script which uses libucl to
ingest the template and use variables as defined in its command line.
the template will include only if it exist a ucl file named after the
package name which will complement the template or overwrite what was
defined in the template if defined in this specific ucl file
this allows to overwrite license, but add script only to the packages
who actually needs them.
As a results the post install scripts are now only added to the right
package and not also added to the subpackages like -man or -dev
John Baldwin [Tue, 19 Mar 2024 00:01:23 +0000 (17:01 -0700)]
kldxref: Properly handle reading strings near the end of an ELF file
If a string is at or near the end of an input file and the amount of
remaining data in the file is smaller than the maximum string size,
the pread(2) system call would return a short read which is treated as
an error. Instead, add a new helper function for reading a string
which permits short reads so long as the data read from the file
contains a terminated string.
Reported by: jrtc27
Reviewed by: jrtc27
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D44419
Rick Macklem [Mon, 18 Mar 2024 22:40:41 +0000 (15:40 -0700)]
nfsd.8: Document ways to minimize Copy operation times
For NFSv4.2, a Copy operation can take a long time to complete.
If there is a concurrent ExchangeID or DelegReturn operation
which requires the exclusive lock on all NFSv4 state, this can
result in a stall of the nfsd server.
livedump_start_vnode(9) is introduced such that the live minidump on the
system could take a vnode. This interface could be used to extend support
for the existing framework in downstream.
Bump __FreeBSD_version for introducing livedump_start_vnode(9).
Gleb Smirnoff [Mon, 18 Mar 2024 20:57:00 +0000 (13:57 -0700)]
tcp: clear all TCP timers in tcp_timer_stop() when in callout
When a TCP callout decides to disable self, e.g. tcp_timer_2msl() calling
tcp_close(), we must also clear all other possible timers. Otherwise,
upon return, the callout would be scheduled again in tcp_timer_enter().
Revert 57e27ff07aff, which was a temporary partial revert of otherwise
correct 62d47d73b7eb, that exposed the problem being fixed now. Add an
extra assertion in tcp_timer_enter() to check we aren't arming callout for
a closed connection.