Jan Bramkamp [Mon, 4 Sep 2023 08:38:25 +0000 (10:38 +0200)]
bhyve: Use VMIO_SIOCSIFFLAGS instead of SIOCGIFFLAGS
Creating an IP socket to invoke the SIOCGIFFLAGS ioctl on is the only
thing preventing bhyve from working inside a bhyve jail with IPv4 and
IPv6 disabled restricting the jailed bhyve process to only access the
host network via a tap/vmnet device node.
Mark Johnston [Mon, 16 Oct 2023 21:35:07 +0000 (17:35 -0400)]
socket tests: Clean up the MSG_TRUNC regression tests a bit
- Fix style.
- Move test case-specific code out of the shared function and into the
individual test cases.
- Remove unneeded setting of SO_REUSEPORT.
- Avoid unnecessary copying.
- Use ATF_REQUIRE* instead of ATF_CHECK*. The former cause test
execution to stop after a failed assertion, which is what we want.
- Add a test case for AF_LOCAL/SOCK_SEQPACKET sockets.
The current xo_format string is incorrect. This restores the display
format prior to libxo-ification work while also explicitly marking
tv_sec and tv_usec as encoded output only.
Zhenlei Huang [Tue, 17 Oct 2023 07:05:25 +0000 (15:05 +0800)]
x86: Prefer consistent naming for loader tunables
The following loader tunables do have corresponding sysctl MIBs but
with inconsistent naming. That may be historical reason. Let's prefer
consistent naming for them so that it will be easier to maintain.
Zhenlei Huang [Fri, 20 Oct 2023 07:31:44 +0000 (15:31 +0800)]
amd64 pmap: Prefer consistent naming for loader tunable
The sysctl knob 'vm.pmap.allow_2m_x_ept' is loader tunable and have
public document entry in security(7) but is fetched from kernel
environment 'hw.allow_2m_x_ept'. That is inconsistent and obscure.
As there is public security advisory FreeBSD-SA-19:25.mcepsc [1],
people may refer to it and use 'hw.allow_2m_x_ept', let's keep old
name for compatibility.
Zhenlei Huang [Thu, 19 Oct 2023 17:18:25 +0000 (01:18 +0800)]
vmx: Prefer consistent naming for loader tunables
The following loader tunables do have corresponding sysctl MIBs but
with different names. That may be historical reason. Let's prefer
consistent naming for them so that it will be easier to read and
maintain.
Kyle Evans [Thu, 12 Oct 2023 02:51:07 +0000 (21:51 -0500)]
freebsd-update: create deep BEs by default
The -r flag to bectl needs to go away, and we need to just do the right
thing. In the meantime, we can apply an -r in freebsd-update as a
minimal fix to stop creating partial backups in these (non-default) deep
BE setups.
Zhenlei Huang [Thu, 19 Oct 2023 17:00:31 +0000 (01:00 +0800)]
pmap: Prefer consistent naming for loader tunable
The sysctl knob 'vm.pmap.pv_entry_max' becomes a loader tunable since 7ff48af7040f (Allow a specific setting for pv entries) but is fetched
from system environment 'vm.pmap.pv_entries'. That is inconsistent and
obscure.
This reverts 36e1b9702e21 (Correct the tunable name in the message).
Zhenlei Huang [Thu, 19 Oct 2023 15:23:33 +0000 (23:23 +0800)]
amd64: Fix two typos of loader tunables
To match the sysctl MIBs and document entries in security(7).
Fixes: 2dec2b4a34b4 amd64: flush L1 data cache on syscall return with an error
Fixes: 17edf152e556 Control for Special Register Buffer Data Sampling mitigation
Reviewed by: kib
MFC after: 1 day
Differential Revision: https://reviews.freebsd.org/D42249
Bojan Novković [Fri, 13 Oct 2023 05:14:36 +0000 (08:14 +0300)]
tty/teken: fix UTF8 sequence validation logic
This patch fixes UTF-8 sequence validation logic in
teken_utf8_bytes_to_codepoint() and fixes fallback behaviour in
ttydisc_rubchar() when an invalid UTF8 sequence is encountered. The code
previously used __bitcount() to extract sequence length information from
the leading byte. However, this assumption breaks for certain code
points that have additional bits set in the first half of the leading
byte (e.g. Cyrillic characters). This lead to incorrect behaviour when
deleting those characters using backspaces. The code now checks the
number of consecutive set bits in the leading byte starting from the
MSB, as per RFC 3629.
The use of bitcount() triggered a build error because it couldn't be
located. __bitcount() on the other hand is defined in sys/types.h, which
is included in teken/teken.h.
Bojan Novković [Sat, 7 Oct 2023 18:00:11 +0000 (21:00 +0300)]
tty: fix improper backspace behaviour for UTF8 characters when in canonical mode
This patch adds additional logic in ttydisc_rubchar() to properly handle
backspace behaviour for UTF-8 characters.
Currently, typing in a backspace after a UTF8 character will delete only
one byte from the byte sequence, leaving garbled output in the tty's
output queue. With this change all of the character's bytes are deleted.
This change is only active when the IUTF8 flag is set (see 19054eb6053189144aa962b2ecc1bf5087758a3e "(s)tty: add support for IUTF8
input flag")
The code uses the teken_wcwidth() function to properly handle character
column widths for different code points, and adds the
teken_utf8_bytes_to_codepoint() function that converts a UTF-8 byte
sequence to a codepoint, as specified in RFC3629.
Bojan Novković [Sat, 7 Oct 2023 17:59:57 +0000 (20:59 +0300)]
(s)tty: add support for IUTF8 input flag
This patch adds the necessary kernel and stty code to support setting
the IUTF8 flag for ttys. It is the first of two patches that fix
backspace behaviour for UTF-8 encoded characters when in canonical mode.
Mark Johnston [Mon, 16 Oct 2023 20:11:55 +0000 (16:11 -0400)]
ktrace: Handle uio_resid underflow via MSG_TRUNC
When recvmsg(2) is used with MSG_TRUNC on an atomic socket type (DGRAM
or SEQPACKET), soreceive_generic() and uipc_peek_dgram() may
intentionally underflow uio_resid so that userspace can find out how
many bytes it should have asked for.
If this happens, and KTR_GENIO is enabled, ktrgenio() will attempt to
copy in beyond the end of the output buffer's iovec. In general this
will silently cause the ktrace operation to fail since it'll result in
EFAULT from uiomove(). Let's be more careful and make sure not to try
and copy more bytes than we have.
Fixes: be1f485d7d6b ("sockets: add MSG_TRUNC flag handling for recvfrom()/recvmsg().")
Reported by: syzbot+30b4bb0c0bc0f53ac198@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42099
Wei Hu [Fri, 20 Oct 2023 08:58:20 +0000 (08:58 +0000)]
Hyper-V: vmbus: check if signaling host is needed in vmbus_rxbr_read
It is observed that netvsc's send rings could stall on the latest
Azure Boost platforms. This is due to vmbus_rxbr_read() routine
doesn't check if host is waiting for more room to put data, which
leads to host side sleeping forever on this vmbus channel. The
problem was only observed on the latest platform because the host
requests larger buffer ring room to be available, which causes
the issue to happen much more easily.
Fix this by adding check in the vmbus_rxbr_read call and signaling
the host in the callers if check returns positively.
Reported by: NetApp
Tested by: whu
Sponsored by: Microsoft
Zhenlei Huang [Thu, 12 Oct 2023 10:14:49 +0000 (18:14 +0800)]
vm_phys: Add corresponding sysctl knob for loader tunable
The loader tunable 'vm.numa.disabled' does not have corresponding sysctl
MIB entry. Add it so that it can be retrieved, and `sysctl -T` will also
report it correctly.
Zhenlei Huang [Thu, 12 Oct 2023 10:14:49 +0000 (18:14 +0800)]
vm_page: Add corresponding sysctl knob for loader tunable
The loader tunable 'vm.pgcache_zone_max_pcpu' does not have corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.
Zhenlei Huang [Thu, 12 Oct 2023 10:14:48 +0000 (18:14 +0800)]
kasan: Add corresponding sysctl knob for loader tunable
The loader tunable 'debug.kasan.disabled' does not have corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.
MFC: Remove confDH_PARAMETERS settings in favor of using sendmail's
built-in default which was added in sendmail 8.15.2 (the config
line predates that 8.15.2 feature). This also alleviates the need
for admins to create the DH parameters file if they opt to use
Diffie-Hellman.
POSIX has accepted a proposal[1] to add glibc-compatible ptsname_r. It
indicates an error by returning the error number, rather than returning
-1 and setting errno. Update RETURN VALUES in ptsname_r's man page now
to encourage folks to test that the return value != 0 rather than == -1.
The loader tunable 'net.inet.sctp.tcbhashsize' and 'net.inet.sctp.chunkscale'
are only used during vnet initializing, thus it make no senses to make them
writable tunable.
Validate the values of loader tunables on vnet initialize, reset them to
theirs defaults if invalid to prevent potential kernel panics.
Alan Somers [Wed, 4 Oct 2023 18:48:01 +0000 (12:48 -0600)]
fusefs: sanitize FUSE_READLINK results for embedded NULs
If VOP_READLINK returns a path that contains a NUL, it will trigger an
assertion in vfs_lookup. Sanitize such paths in fusefs, rejecting any
and warning the user about the misbehaving server.
Mateusz Guzik [Wed, 11 Oct 2023 09:42:12 +0000 (09:42 +0000)]
vfs: further speed up continuous free vnode recycle
The primary bottleneck *was* vnode_list mtx, which got artificially
worsened due to the following work done with the lock held:
1. the global heavily modified numvnodes counter was being read,
inducing massive cache line ping pong
2. should the value fit limits (which it normally did) there would be an
avoidable write to vn_alloc_cyclecount, which is being read outside
of the lock, once more inducing traffic
But if vn_alloc_cyclecount is 0, which it normally is even when facing
vnode shortage, there is no need to check numvnodes nor set it to 0 again.
Another problem was numvnodes adjustment (which made the locked read
much worse). While it fundamentally does not scale as it is not
distributed in any fashion, it was avoidably slow. When bumping over the
vnode limit, it would be modified with atomics 3 times: inc + dec to
backpedal in vn_alloc, then final inc in vn_alloc_hard.
One can let some slop persist over calls to vnlru_free instead.
In principle each thread in the system could get here and bump it, so a
limit is put in place to keep things sane.
Bench setup same as in prior commits: zfs, 20 separate directory trees
each with 1 million files in total and 20 find(1) processes stating them
in parallel (one per each tree).
Total run time (in seconds) goes down as follows:
vnode limit 8388608 400000
before ~20 ~35
after ~8 ~15
With this in place the primary bottleneck is now ZFS.
vfs: prefix regular vnlru with a special case for free vnodes
Works around severe performance problems in certain corner cases, see
the commentary added.
Modifying vnlru logic has proven rather error prone in the past and a
release is near, thus take the easy way out and fix it without having to
dig into the current machinery.
Mateusz Guzik [Tue, 10 Oct 2023 16:19:53 +0000 (16:19 +0000)]
vfs: consult freevnodes in vnlru_kick_cond
If the count is high enough there is no point trying to produce more.
Not going there reduces traffic on the vnode_list mtx.
This further shaves total real time in a test mentioned in: 74be676d87745eb7 ("vfs: drop one vnode list lock trip during vnlru free
recycle") -- 20 instances of find each creating 1 million vnodes, while
total limit is set to 400k.
Ed Maste [Fri, 29 Sep 2023 15:47:41 +0000 (11:47 -0400)]
freebsd-update: add a note about when files may be deleted
Files under /var/db/freebsd-update are required during the upgrade
process, and to support rollback. They may be deleted if no upgrade is
in progress and rollback will not be required.
PR: 273601
Reviewed by: bcr
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42022
Kristof Provost [Thu, 5 Oct 2023 14:57:50 +0000 (16:57 +0200)]
pf: fix SCTP SDT probe
We want the return value of pf_test_rule(), i.e. the result of the
evaluation of the new state, not the result of the evaluation of the
original packet/state.
MFC after: 1 week
Sponsored by: Orange Business Services
Zhenlei Huang [Mon, 9 Oct 2023 10:30:22 +0000 (18:30 +0800)]
buf: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'vfs.unmapped_buf_allowed' is actually a loader
tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will
report it correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:22 +0000 (18:30 +0800)]
sockets: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.ipc.maxsockets' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:21 +0000 (18:30 +0800)]
ddb: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'debug.ddb.capture.bufsize' is actually a loader
tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will
report it correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:21 +0000 (18:30 +0800)]
cam/scsi: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.cam.scsi_delay' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
John Baldwin [Fri, 8 Sep 2023 23:30:35 +0000 (16:30 -0700)]
cxgbe tom: Call t4_rcvd_locked from do_rx_data to return RX credits
In particular, the kernel RPC layer used by the NFS client never
invokes pru_rcvd since it always reads data from the socket upcall
via MSG_SOCALLBCK which avoids calling pru_rcvd. As a result, on an
NFS client connection managed by t4_tom, RX credits were never
returned to the TOE connection to open the TCP window resulting in
connection hangs.
To fix, expand the set of conditions in do_rx_data where RX credits
are returned to match those in t4_rcvd_locked by calling the function
directly.
John Hay [Thu, 28 Sep 2023 21:08:08 +0000 (14:08 -0700)]
x86: Properly align interrupt vectors for MSI
MSI (not MSI-X) interrupt vectors must be allocated in groups that are
powers of 2, and the block of IDT vectors must be aligned to the size
of the request.
The code in native_apic_alloc_vectors() does an alignment check in the loop:
if ((vector & (align - 1)) != 0)
continue;
first = vector;
But it adds APIC_IO_INTS to the value it returns:
return (first + APIC_IO_INTS);
The problem is that APIC_IO_INTS is not a multiple of 32. It is 48:
As a result, a request for 32 vectors (the max supported by MSI), was
not always aligned. To fix, check the alignment of
'vector + APIC_IO_INTS' in the loop.
The AX88179A has two firmware modes, one of which is backward
compatible with existing AX88178A/179 driver. The active firmware mode
can be controlled through a register.
Update axge(4) man page to mention 179A support and ensure that, when
bound to a AX88179A, the driver activates the compatible firmware mode.
Ed Maste [Tue, 27 Jul 2021 16:44:45 +0000 (12:44 -0400)]
pkgbase: accommodate pkg < 1.17
6cafdee71d2b adapted the pkgbase build for 1.17, but broke Cirrus-CI's
use of PKG_FORMAT=tar (the quarterly package set still has pkg 1.16).
Because of this I disabled the pkgbase build and test in 2bfba2a04b05.
Now, check `pkg --version` and use the old logic for < 1.17.
To be reverted once we no longer encounter pkg 1.16 in Cirrus-CI (i.e.,
via GCP cloud images) to avoid keeping this extra complexity around.
PR: 257422
Reviewed by: manu
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31324
Ed Maste [Fri, 1 Apr 2022 22:58:00 +0000 (18:58 -0400)]
Speed up *-old-* make targets by using sed instead of xargs
Targets like 'list-old-files' used "xargs -n1" to produce a list with
one file per line. Using xargs resulted in one fork+exec for each
Argument, resulting in rather long runtime. Instead, use sed to split
the list. On one machine `make list-old-files` took 30s wall clock time
with xargs and less than 1s with sed.
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34741