Ed Maste [Tue, 23 Jan 2024 18:04:43 +0000 (13:04 -0500)]
bsdlabel: add deprecation notice
gpart is the preferred tool for managing partitions of all types,
including BSD disklabels.
Note that this is only about bsdlabel/disklabel, the tool -- there is no
current plan to remove support for MBR or BSD disk labels from the
kernel or from gpart.
Reviewed by: imp, olce
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43563
Mark Peek [Mon, 25 Mar 2024 15:58:46 +0000 (16:58 +0100)]
certctl: Revert to symlinks.
Unfortunately tar will not be able to extract base.txz to a system where
/etc and /usr are not on the same filesystem if the certificates are
hard links.
* Add a dummy getopt(3) loop to handle `--`.
* Move interval parsing out into a separate function.
* Print a diagnostic for every invalid interval.
* Check for NaN and infinity.
* Improve bounds checks.
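A minimal sketch of the kind of interval parsing the list above describes, assuming the interval is a non-negative number of seconds given as a string. The name parse_interval() and its exact rules are hypothetical and are not taken from the commit.

```c
#include <errno.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a non-negative interval in seconds.
 * Returns true and stores the value in *out on success.
 */
static bool
parse_interval(const char *arg, double *out)
{
	char *end;
	double val;

	errno = 0;
	val = strtod(arg, &end);
	/* Reject empty input, trailing garbage, and range errors. */
	if (end == arg || *end != '\0' || errno == ERANGE)
		return (false);
	/* strtod() accepts "nan" and "inf"; neither is a usable interval. */
	if (isnan(val) || isinf(val))
		return (false);
	/* Bounds check: a negative interval makes no sense. */
	if (val < 0.0)
		return (false);
	*out = val;
	return (true);
}
```

A caller would print its diagnostic whenever parse_interval() returns false, so that every invalid argument is reported.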
Manual page:
* Miscellaneous markup fixes.
* Reword DESCRIPTION section.
* Move text about GNU compatibility to STANDARDS section.
* Convert examples from csh to sh.
Sponsored by: Klara, Inc.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D44471
Kristof Provost [Sun, 24 Mar 2024 08:46:31 +0000 (09:46 +0100)]
pfsync: fix use of invalidated stack variable
Calls to pfsync_send_plus() pass pointers to stack variables.
If pfsync_sendout() then fails it retains the pointer to these stack
variables, accessing them later.
Allocate a buffer and copy the data instead, so that we can retain the
pointer safely.
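The copy-instead-of-retain idea can be sketched as follows. struct pending and both functions are hypothetical stand-ins, not the real pfsync data structures, and the sketch assumes len is greater than zero.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical deferred-send slot; not the real pfsync structures. */
struct pending {
	void	*data;
	size_t	 len;
};

/*
 * Keep a private heap copy of the caller's data so that the retained
 * pointer stays valid after the caller's stack frame is gone.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int
pending_store(struct pending *p, const void *src, size_t len)
{
	void *copy;

	copy = malloc(len);
	if (copy == NULL)
		return (-1);
	memcpy(copy, src, len);
	free(p->data);		/* drop any previously retained copy */
	p->data = copy;
	p->len = len;
	return (0);
}

/* Release the retained copy once the deferred send has completed. */
static void
pending_release(struct pending *p)
{
	free(p->data);
	p->data = NULL;
	p->len = 0;
}
```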
Kristof Provost [Sat, 23 Mar 2024 16:02:50 +0000 (17:02 +0100)]
pf: fix use-after-free
If we fragment the packet in pf_route() the first transmitted packet
will free the pf_mtag we have stored in pf_pdesc (pd). Ensure we
update that pointer for every packet to avoid using a freed pointer in
pf_dummynet_route().
Eliot Solomon [Sat, 18 Nov 2023 21:13:21 +0000 (15:13 -0600)]
arm64: fix free queue and reservation configuration for 16KB pages
Correctly configure the free page queues and the reservation size when
the base page size is 16KB. In particular, the reservation size was
less than the L2 Block size, making L2 promotions and mappings all but
impossible.
Gleb Smirnoff [Sun, 24 Mar 2024 16:13:23 +0000 (09:13 -0700)]
icmp6: bring rate limiting on a par with IPv4
Use counter_ratecheck() instead of racy and slow ppsratecheck. Use a
separate counter for every currently known type of ICMPv6. Provide logging
of ratelimit events. Provide jitter to counter open UDP port detection.
Gleb Smirnoff [Sun, 24 Mar 2024 16:13:23 +0000 (09:13 -0700)]
icmp6: move ICMPv6 related tunables to the files where they are used
Most of them can be declared as static after the move out of in6_proto.c.
Keeping sysctl(9) declarations with their text descriptions next to the
variable declaration creates self-documenting code. There should be no
functional changes.
Gleb Smirnoff [Sun, 24 Mar 2024 16:13:23 +0000 (09:13 -0700)]
icmp: improve ICMP limit jitter
Instead of fixing up invalid values set by a user in badport_bandlim()
which is a fast path function, provide a sysctl handler
sysctl_icmplim_and_jitter(), that will check that jitter is less than the
limit.
Provide a jitter initialization function, icmplim_new_jitter(), used at
boot, in the sysctl handler, and when we actually hit the limit. This also
fixes the lack of jitter on a freshly booted system until the first limit hit.
Instead of the CVE number, provide a link to the actual paper that explains
what we are doing here and why. The CVE number isn't very informative; it
will just tell you what RedHat version you need to upgrade to.
Gleb Smirnoff [Sun, 24 Mar 2024 16:13:23 +0000 (09:13 -0700)]
icmp: when logging ICMP ratelimiting message use correct jitter value
The limiting of the very last second has been done using certain jitter
value. We update the jitter for the next second. But the logging should
report the jitter before the change.
Gleb Smirnoff [Sun, 24 Mar 2024 16:13:23 +0000 (09:13 -0700)]
icmp: do not store per-VNET identical array of strings
We need per-VNET struct counter_rate, but we don't need per-VNET set of
const char *. Also, identical word "response" can go into the format
string instead of being stored 7 times.
Gordon Bergling [Sun, 24 Mar 2024 05:10:39 +0000 (06:10 +0100)]
mem.4: Correct the HISTORY section
The history section (added in CSRG) claimed both first appeared in v6.
Looking at the manuals in the TUHS archive, /dev/mem was in v1
and /dev/kmem was introduced in v5.
unionfs: implement VOP_UNP_* and remove special VSOCK vnode handling
unionfs has a bunch of clunky special-case code to avoid creating
unionfs wrapper vnodes for AF_UNIX sockets. This was added in 2008
to address PR 118346, but in the intervening years the VOP_UNP_*
operations have been added to provide a clean interface to allow
sockets to work in the presence of stacked filesystems.
Gleb Smirnoff [Sat, 23 Mar 2024 05:44:16 +0000 (22:44 -0700)]
tests/netgraph: start ng_ksocket(4) tests
The ng_ksocket(4) functionality is very fragile as it interfaces with
kernel socket code in an unusual way. It definitely needs a test suite.
Start one with a test that tests UDP over IPv4.
Gleb Smirnoff [Sat, 23 Mar 2024 05:44:16 +0000 (22:44 -0700)]
tests/netinet: add UDP socket I/O tests
Start a file that would collect tests for I/O functionality of a UDP
socket, targeted on how a socket interacts with userland rather than with
wire side of the protocol.
First version tests that MSG_TRUNC and MSG_PEEK are working correctly.
Gleb Smirnoff [Sat, 23 Mar 2024 02:50:33 +0000 (19:50 -0700)]
tests/netgraph: mark all tests as required_user="root"
Any netgraph operation requires root privileges. Some tests in the
directory already mark themselves with 'atf_tc_set_md_var(conf,
"require.user", "root");' which creates a lot of pasted code. Some tests
don't mark themselves. For this particular directory a blanket metadata
setting in the Makefile is acceptable, imho.
John Baldwin [Sat, 23 Mar 2024 00:25:07 +0000 (17:25 -0700)]
nvmecontrol: Display additional Fabrics-related fields for cdata
Some of these fields are specific to Fabrics controllers (such as the
size of capsules) while other fields are shared with PCI-e
controllers, but are more relevant for Fabrics controllers (such as
KeepAlive timer properties).
John Baldwin [Sat, 23 Mar 2024 00:23:09 +0000 (17:23 -0700)]
nvme: Add SGL structure and constants for use in NVMe commands
Fabrics capsules use an SGL structure instead of prp1/2 addresses to
describe the data buffer used for a command. The SGL structure is
added to a union with the existing prp1/2 fields.
Michael Tuexen [Fri, 22 Mar 2024 13:50:25 +0000 (14:50 +0100)]
rtld: fix check for endianness of elf hints file
Don't check if the elf hints file is in host byte order, but check
if it is in little endian by looking at the magic number.
This fixes rtld on big endian platforms.
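A rough illustration of detecting the file's byte order from its magic number rather than comparing against the host's byte order. The magic value 0x746e6845 is assumed here, and read_le32(), read_be32() and hints_byte_order() are hypothetical names, not rtld functions.

```c
#include <stdint.h>

/* Assumed magic value: bytes 'E','h','n','t' when stored little endian. */
#define	HINTS_MAGIC	0x746e6845U

/* Assemble four bytes as a little-endian 32-bit word. */
static uint32_t
read_le32(const unsigned char *b)
{
	return ((uint32_t)b[0] | (uint32_t)b[1] << 8 |
	    (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24);
}

/* Assemble the same four bytes as a big-endian 32-bit word. */
static uint32_t
read_be32(const unsigned char *b)
{
	return ((uint32_t)b[3] | (uint32_t)b[2] << 8 |
	    (uint32_t)b[1] << 16 | (uint32_t)b[0] << 24);
}

/*
 * Returns 1 if the header is little endian, 0 if it is big endian,
 * and -1 if the magic number does not match at all.  The answer does
 * not depend on the byte order of the machine running the check.
 */
static int
hints_byte_order(const unsigned char *hdr)
{
	if (read_le32(hdr) == HINTS_MAGIC)
		return (1);
	if (read_be32(hdr) == HINTS_MAGIC)
		return (0);
	return (-1);
}
```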
Reviewed by: se, kib (prior version of the patch)
Fixes: 7b77d37a561b ("rtld-elf: support either byte-order of hints")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D44472
build: add -Wswitch to clang for more consistency with gcc
gcc12 and gcc13 appear to include Wswitch with Wall, while
clang doesn't. For switch() statements on enum, this forces
the use of at least a default: clause, in adherence with style(9).
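As an illustration only, with a made-up enum: if the default: clause below were removed, a compiler run with -Wswitch would be expected to warn that BLUE is not handled in the switch.

```c
/* Hypothetical enum used only for this example. */
enum color { RED, GREEN, BLUE };

static const char *
color_name(enum color c)
{
	switch (c) {
	case RED:
		return ("red");
	case GREEN:
		return ("green");
	default:
		/* Covers BLUE and any enumerator added later. */
		return ("other");
	}
}
```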
Reviewed By: emaste
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D44092
Kristof Provost [Thu, 21 Mar 2024 07:38:45 +0000 (08:38 +0100)]
if_ovpn tests: test large packets in IPv6 tunnel
There's a report of MTU issues over IPv6 DCO tunnels.
Extend the 4in6 test to send a series of pings with different sizes, as
well as transfer a large file.
No issues were found, but we may as well extend the test case.
Mark Johnston [Fri, 22 Mar 2024 06:11:03 +0000 (02:11 -0400)]
ddb: Fix format string errors in db_pprint.c
For some reason, db_expr_t is defined as "long" on 64-bit platforms and
"int" on others. When printing values of this type, simply cast them to
long to suppress compilation errors on 32-bit systems.
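A small sketch of the cast described above. expr_t is a stand-in for db_expr_t, and format_expr() is a hypothetical helper built on snprintf() rather than the ddb print routines.

```c
#include <stdio.h>

/* Stand-in for db_expr_t: long on 64-bit platforms, int elsewhere. */
#ifdef __LP64__
typedef long expr_t;
#else
typedef int expr_t;
#endif

/*
 * Format a value of the platform-dependent type.  The cast makes the
 * argument match the %ld conversion whichever typedef is in effect.
 */
static int
format_expr(char *buf, size_t len, expr_t val)
{
	return (snprintf(buf, len, "%ld", (long)val));
}
```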
Bojan Novković [Fri, 22 Mar 2024 03:01:34 +0000 (04:01 +0100)]
ddb: Add CTF-based pretty printing
Add basic CTF support and a CTF-powered pretty-printer to ddb.
The db_ctf.* files expose a basic interface for fetching type
data for ELF symbols, interacting with the CTF string table,
and translating type identifiers to type data.
The db_pprint.c file uses those interfaces to implement
a pretty-printer for all kernel ELF symbols.
The pretty-printer works with symbol names and arbitrary addresses:
pprint struct thread 0xffffffff8194ad90
Pretty-printing currently only works after the root filesystem
gets mounted because the CTF info is not available during
early boot.
Dimitry Andric [Thu, 21 Mar 2024 20:44:46 +0000 (21:44 +0100)]
Slightly reorganize libclang_rt Makefile again
Make a separate .elif section for MACHINE_ARCH==powerpc, and subdivide
the MACHINE_CPUARCH values under it. If at some point more sanitizer
libraries become available for powerpc CPU architectures, they can be
added before the "nothing for other powerpc yet" case. Similar for the
MACHINE_ARCH==arm case.
Dimitry Andric [Thu, 21 Mar 2024 13:53:36 +0000 (14:53 +0100)]
Fix building of several libclang_rt libraries for powerpc64 and powerpc64le
I reorganized the libclang_rt Makefile in e77a1bb27574 to make it more
readable and maintainable, but the check for 32-bit powerpc was wrong.
This caused almost no libclang_rt libraries to be built for powerpc64
and powerpc64le.
Stefan Eßer [Thu, 21 Mar 2024 15:31:49 +0000 (16:31 +0100)]
rtld-elf: add some debug print statements
The byte-order independent code has been reported to fail on powerpc64.
Add some more debug statements to help identify the parameters used and
to verify the correct operation of the byte-swap macros used.
Mitchell Horne [Thu, 21 Mar 2024 15:21:41 +0000 (12:21 -0300)]
kassert.h: update MPASS definition commentary
We now have a detailed man page describing both MPASS and KASSERT. Give
a warning that careless use of MPASS can result in inadequate assertion
messages, and point to the MPASS(9) page which describes this.
While here add a comment above the KASSERT definitions pointing to the
man page.
Suggested by: bz
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D44438
Andrew Turner [Thu, 14 Mar 2024 14:02:56 +0000 (14:02 +0000)]
arm64: Mask non-debug exceptions when single stepping
When an exception is pending when single stepping we may execute the
handler for that exception rather than the single step handler. This
could cause the scheduler to fire to run a new thread. This will mean
we single step to a new thread causing unexpected results.
Handle this by masking non-debug exceptions. This will cause issues
when stepping over instructions that access the DAIF values so future
work is needed to handle these cases, but for most code this now works
as expected.
Reviewed by: jhb
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D44350
Andrew Turner [Thu, 21 Mar 2024 10:13:16 +0000 (10:13 +0000)]
arm64: Support passing more registers to signals
To support recent extensions to the Arm architecture we may need to
store more or larger registers when sending a signal.
To support this create a list of these extra registers. Userspace that
needs to access a register in the signal handler can then walk the list
to find the correct register struct and read/write its contents.
Mark Johnston [Thu, 21 Mar 2024 04:20:37 +0000 (00:20 -0400)]
bhyve: Move device model-independent UART code into a separate file
Currently bhyve implements a ns16550-compatible UART in uart_emul.c.
This file also contains generic code to manage RX FIFOs and to handle
reading from and writing to a TTY. bhyve instantiates UARTs to
implement COM devices (via pci_lpc.c) and PCI UART devices.
The arm64 port will bring with it a PL011 device model which is used as
the default console (i.e., no COM ports). To simplify its integration,
add a UART "backend" layer which lets UART device models allocate an RX
FIFO and interact with TTYs without duplicating code. In particular,
code in uart_backend.* is to be shared among device models, and the
namespace for uart_emul.* is changed to uart_ns16550_*.
This is based on andrew@'s work in
https://github.com/zxombie/freebsd/tree/bhyvearm64 but I've made a
number of changes, particularly with respect to naming and source code
organization.
This should fix 'Assertion failed: (isa<To>(Val) && "cast<Ty>() argument
of incompatible type!")' errors when building devel/boost-libs,
specifically libs/url/src/segments_view.cpp.
Bump __FreeBSD_version so this fix can easily be detected from
devel/boost-all/compiled.mk.
John Baldwin [Wed, 20 Mar 2024 22:30:09 +0000 (15:30 -0700)]
cxgbe tom: Handle a race condition when enabling TLS offload
Use a separate state for when a request to set RX_QUIESCE has been
sent but the resulting TCB reply has not been received. In
particular, this correctly handles the case where data has been
received and queued in the receive queue before the quiesce request
takes effect.
John Baldwin [Wed, 20 Mar 2024 22:29:51 +0000 (15:29 -0700)]
NFS: Request use of TCP_USE_DDP for in-kernel TCP sockets
Since this is an optimization, ignore failures to enable the option.
For the server side, defer enabling DDP until the first non-NULLPROC
RPC is received. This allows TLS handling (which uses NULLPROC RPCs)
to enable TLS offload first.
John Baldwin [Wed, 20 Mar 2024 22:29:28 +0000 (15:29 -0700)]
cxgbe: Support TCP_USE_DDP on offloaded TOE connections
When this socket option is enabled, relatively large contiguous
buffers are allocated and used to receive data from the remote
connection. When data is received a wrapper M_EXT mbuf is queued to
the socket's receive buffer. This reduces the length of the linked
list of received mbufs and allows consumers to consume receive data in
larger chunks.
To minimize reprogramming the page pods in the adapter, receive
buffers for a given connection are recycled. When a buffer has been
fully consumed by the receiver and freed, the buffer is placed on a
per-connection free buffers list.
The size of the receive buffers defaults to 256k and can be set via
the hw.cxgbe.toe.ddp_rcvbuf_len sysctl. The
hw.cxgbe.toe.ddp_rcvbuf_cache sysctl (defaults to 4) determines the
maximum number of free buffers cached per connection. Note that this
limit does not apply to "in-flight" receive buffers that are
associated with mbufs in the socket's receive buffer.
John Baldwin [Wed, 20 Mar 2024 22:29:02 +0000 (15:29 -0700)]
tcp: Add a new kernel-only TCP_USE_DDP socket option
This socket option can be used by in-kernel consumers (like NFS) to
request a NIC to use optimized receive of large buffers for a
connection. The current use case is to support DDP by the TOE on
Chelsio NICs.
Andrew Gallatin [Wed, 20 Mar 2024 19:46:01 +0000 (15:46 -0400)]
ip6_output: Reduce cache misses on pktopts
When profiling an IP6 heavy workload, I noticed that we were
getting a lot of cache misses in ip6_output() around
ip6_pktopts. This was happening because the TCP stack passes
inp->in6p_outputopts even if all options are unused. So in the
common case of no options present, pkt_opts is not null, and is
checked repeatedly for different options. Since ip6_pktopts is
large (4 cachelines), and every field is checked, we take 4
cache misses (2 of which tend to be hidden by the adjacent line
prefetcher).
To fix this common case, I introduced a new flag in ip6_pktopts
(ip6po_valid) which tracks which options have been set. In the
common case where nothing is set, this causes just a single
cache miss to load. It also eliminates a test for some options
(if (opt != NULL && opt->val >= const) vs. if ((optvalid & flag) != 0)).
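The valid-bitmask idea can be sketched roughly as below. The struct, flag names and helpers are hypothetical and much smaller than the real struct ip6_pktopts; they only show how one word can summarize which options are set.

```c
#include <stdint.h>

/* Hypothetical flag names and layout; not the real struct ip6_pktopts. */
#define	OPT_VALID_HLIM		0x01
#define	OPT_VALID_TCLASS	0x02

struct pktopts {
	uint32_t valid;		/* one bit per option that has been set */
	int	 hlim;
	int	 tclass;
};

/* Setting an option also records it in the valid mask. */
static void
pktopts_set_tclass(struct pktopts *o, int tclass)
{
	o->tclass = tclass;
	o->valid |= OPT_VALID_TCLASS;
}

/* Return the traffic class if it was set, else the supplied default. */
static int
pktopts_tclass(const struct pktopts *o, int def)
{
	if ((o->valid & OPT_VALID_TCLASS) != 0)
		return (o->tclass);
	return (def);
}

/* Fast path: a single load shows that no option needs checking. */
static int
pktopts_empty(const struct pktopts *o)
{
	return (o->valid == 0);
}
```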
To keep the struct the same size in 64-bit kernels, and to keep
the integer values (like ip6po_hlim, ip6po_tclass, etc) on the
same cacheline, I moved them to the top.
As suggested by zlei, the null check in MAKE_EXTHDR() becomes
redundant, and can be removed.
For our web server workload (with the ip6po_tclass option set),
this drops the CPI from 2.9 to 2.4 for ip6_output.
Differential Revision: https://reviews.freebsd.org/D44204
Reviewed by: bz, glebius, zlei
No Objection from: melifaro
Sponsored by: Netflix Inc.