FreeBSD/FreeBSD.git
8 months ago: Ensure 'struct thread' is aligned to a cache line
Olivier Certner [Fri, 13 Oct 2023 08:52:31 +0000 (10:52 +0200)]
Ensure 'struct thread' is aligned to a cache line

Use the new UMA_ALIGN_CACHE_AND_MASK() facility, which guarantees cache
line alignment and, simultaneously, a minimum of 32 bytes of alignment
(the 5 lower bits are always 0).

For the record, to this day, here's a (possibly non-exhaustive) list of
synchronization primitives using lower bits to store flags in pointers
to thread structures:
- lockmgr, rwlock and sx all use the 5 bits directly.
- rmlock indirectly relies on sx, so can use the 5 bits.
- mtx (non-spin) relies on the 3 lower bits.
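
The reliance on zeroed low bits can be illustrated with a minimal sketch.
It is not the kernel's lock code: the names TD_ALIGN, TD_FLAGMASK and the
lockword_*() helpers are hypothetical, and the only assumption taken from
the message above is that a 32-byte-aligned pointer leaves 5 low bits free
for flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: 32-byte alignment leaves the 5 low bits at 0. */
#define	TD_ALIGN	32u
#define	TD_FLAGMASK	((uintptr_t)TD_ALIGN - 1)	/* 0x1f */

/* Combine an aligned pointer and up to 5 flag bits into one word. */
static uintptr_t
lockword_pack(const void *td, unsigned flags)
{
	uintptr_t p = (uintptr_t)td;

	assert((p & TD_FLAGMASK) == 0);		/* pointer must be aligned */
	assert((flags & ~TD_FLAGMASK) == 0);	/* flags must fit in 5 bits */
	return (p | flags);
}

/* Recover the pointer by masking the flag bits off. */
static void *
lockword_owner(uintptr_t word)
{
	return ((void *)(word & ~TD_FLAGMASK));
}

/* Recover the flags by keeping only the low bits. */
static unsigned
lockword_flags(uintptr_t word)
{
	return ((unsigned)(word & TD_FLAGMASK));
}
```

If the pointer were only 8-byte aligned, bits 3 and 4 could be part of the
address, and a primitive using all 5 bits would corrupt it; that is the
case the alignment guarantee rules out.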

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42266

8 months ago: uma: Permit specifying max of cache line and some custom alignment
Olivier Certner [Fri, 13 Oct 2023 15:05:34 +0000 (17:05 +0200)]
uma: Permit specifying max of cache line and some custom alignment

To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).

First candidate consumer that comes to mind is 'struct thread', see next
commit.
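
A rough sketch of how the maximum of two alignments can be taken when both
are expressed as masks (alignment minus one). The helper names are
hypothetical and this is not the UMA macro itself; it only relies on the
fact that, for masks of the form 2^n - 1, a bitwise OR yields the larger
mask.

```c
#include <assert.h>

/* True if mask + 1 is a power of two, i.e. mask is 0, 1, 3, 7, ... */
static int
is_align_mask(unsigned mask)
{
	return ((mask & (mask + 1)) == 0);
}

/*
 * Hypothetical helper: return the stricter of the cache line mask and a
 * custom mask.  Both masks are contiguous runs of low bits, so the OR
 * equals the larger of the two.
 */
static unsigned
align_cache_and_mask(unsigned cache_mask, unsigned mask)
{
	assert(is_align_mask(cache_mask) && is_align_mask(mask));
	return (cache_mask | mask);
}
```

With a 64-byte cache line (mask 63) and a requested mask of 31, the result
is 63; with a hypothetical 16-byte cache line (mask 15), the requested 31
wins.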

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42265

8 months ago: linuxkpi: dma_get_cache_alignment(): Fix off-by-one result
Olivier Certner [Fri, 13 Oct 2023 15:13:28 +0000 (17:13 +0200)]
linuxkpi: dma_get_cache_alignment(): Fix off-by-one result

Replacing 'uma_align_cache' with the appropriately named accessor
uma_get_cache_align_mask() made it apparent that
dma_get_cache_alignment() was off by one, since it was defined as the
mask derived from the alignment value rather than the alignment itself.
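
The off-by-one comes from confusing a mask with the alignment it encodes.
A small sketch with hypothetical helper names (not the linuxkpi code):

```c
#include <assert.h>

/* The alignment is the power of two that is the mask plus one. */
static unsigned
align_from_mask(unsigned mask)
{
	return (mask + 1);
}

/* The mask is the alignment minus one; the alignment must be 2^n. */
static unsigned
mask_from_align(unsigned align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (align - 1);
}
```

For a 64-byte cache line the mask is 63, so a function documented to
return the alignment must return 64, not the mask.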

Reviewed by:            markj, bz
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42264

8 months ago: uma: New check_align_mask(): Validate alignments (INVARIANTS)
Olivier Certner [Fri, 13 Oct 2023 14:09:51 +0000 (16:09 +0200)]
uma: New check_align_mask(): Validate alignments (INVARIANTS)

New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.

Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (that was checking only for a power of
2).
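
The two conditions described above can be sketched as a plain predicate.
This is an illustration only: the name align_mask_is_valid() is
hypothetical, and the real function asserts under INVARIANTS rather than
returning a boolean.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical predicate mirroring the two checks: the mask must fit in
 * a signed int, and mask + 1 must be a power of two.
 */
static bool
align_mask_is_valid(unsigned mask)
{
	if (mask > (unsigned)INT_MAX)	/* highest bit set: does not fit */
		return (false);
	return ((mask & (mask + 1)) == 0);	/* mask + 1 is 2^n */
}
```

A mask of 63 passes (alignment 64), while 62 fails because 63 is not a
power of two.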

Reviewed by:            kib, markj
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42263

8 months ago: uma: Make the cache alignment mask unsigned
Olivier Certner [Fri, 13 Oct 2023 12:49:11 +0000 (14:49 +0200)]
uma: Make the cache alignment mask unsigned

In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignment
is internally stored as signed integers.  Such big values do not make
sense anyway and indicate some programming error.  A future commit will
introduce checks for this case and other ones.

Reviewed by:            kib, markj
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42262

8 months ago: arm: Simplify get_cachetype_cp15()
Olivier Certner [Fri, 13 Oct 2023 12:22:14 +0000 (14:22 +0200)]
arm: Simplify get_cachetype_cp15()

There's no point in setting 'arm_dcache_align_mask' before the
function's end.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42261

8 months ago: uma: UMA_ALIGN_CACHE: Resolve the proper value at use point
Olivier Certner [Fri, 13 Oct 2023 12:13:30 +0000 (14:13 +0200)]
uma: UMA_ALIGN_CACHE: Resolve the proper value at use point

Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask.  So suppress it and replace its use with a call to
uma_get_cache_align_mask().  The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42259

8 months ago: uma: Hide 'uma_align_cache'; Create/rename accessors
Olivier Certner [Fri, 13 Oct 2023 09:52:28 +0000 (11:52 +0200)]
uma: Hide 'uma_align_cache'; Create/rename accessors

Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.

Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one.  Rename the stem to something more explicit.  Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.

Hide 'uma_align_cache' to ensure that it cannot be set in any other way
than by a call to uma_set_cache_align_mask(), which will perform sanity
checks in a further commit.  While here, rename it to
'uma_cache_align_mask'.

This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42258

8 months ago: Ensure "init" (PID 1) also executes userret() initially
Olivier Certner [Tue, 10 Oct 2023 17:36:20 +0000 (19:36 +0200)]
Ensure "init" (PID 1) also executes userret() initially

Calling userret() from fork_return() misses the first return to
userspace of the "init" (PID 1) process.  The latter is indeed created
by fork1() followed by a call to cpu_fork_kthread_handler() that
replaces fork_return() with start_init() as the function to execute
after fork.

A new process' initial return to userspace in the end always happens
through returning from fork_exit(), so move userret() there instead to
fix the omission.

This problem was discovered as part of a revamp of scheduling priorities
that led to experimenting with asserting and sometimes resetting
priorities in sched_userret(), in the course of which the author
stumbled on panics being triggered only in init() or only in other
processes, depending on the modifications to sched_userret().  This
change currently has no practical effect but will have some in the near
future.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42257

8 months ago: pdinit(): Fix comment
Olivier Certner [Tue, 26 Sep 2023 10:26:46 +0000 (12:26 +0200)]
pdinit(): Fix comment

Reviewed by:            markj, kib
MFC after:              1 week
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42256

8 months ago: Open-code proc_set_cred_init()
Olivier Certner [Mon, 25 Sep 2023 08:48:49 +0000 (10:48 +0200)]
Open-code proc_set_cred_init()

This function is to be called only when initializing a new process (so,
'proc0' and at fork), and not in any other circumstances.  Setting the
process' 'p_ucred' field to the result of crcowget() on the original
credentials is the only thing it does, hiding the fact that the process'
'p_ucred' field is crushed by the call.  Moreover, most of the code it
executes is already encapsulated in crcowget().

To prevent misuse and improve code readability, just remove this
function and replace it with a direct assignment to 'p_ucred'.

Reviewed by:            markj (earlier version), kib
MFC after:              1 week
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42255

8 months ago: Hyper-V: vmbus: Add NULL check for vmbus_res
Zhenlei Huang [Thu, 2 Nov 2023 09:07:11 +0000 (17:07 +0800)]
Hyper-V: vmbus: Add NULL check for vmbus_res

QEMU emulates Hyper-V [1] but lacks the emulation for vmbus_res, thus no
coherence information is available. Add a NULL check for it and fall
back to no coherence. This will prevent FreeBSD guests from panicking
on QEMU with the Hyper-V enlightenment hv-synic enabled.

For real Hyper-V, both gen1 and gen2 have vmbus_res, so they are not
affected by this change.

1. https://www.qemu.org/docs/master/system/i386/hyperv.html

PR: 274810
Reviewed by: mhorne, emaste, delphij, whu
Diagnosed by: mhorne
Fixes: e7a9817b8d32 Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus
Insta-MFC approved by: re (delphij) for 14.0-RC4
Differential Revision: https://reviews.freebsd.org/D42414

8 months ago: hmt(4): Do not require input report HID usages to be a member of TLC
Vladimir Kondratyev [Thu, 2 Nov 2023 06:20:20 +0000 (09:20 +0300)]
hmt(4): Do not require input report HID usages to be a member of TLC

Some touchpads place button usages (in the HID report descriptor) into
the second-level collection rather than into the top-level one. That
confuses the current code. Remove the collection level check in the HID
report descriptor parser to fix device detection.

Reported by: Peter Much <pmc@citylink.dinoex.sub.org>
PR: 267094
MFC after: 1 week

8 months ago: evdev: Sync event codes with Linux kernel 6.5
Vladimir Kondratyev [Thu, 2 Nov 2023 06:20:20 +0000 (09:20 +0300)]
evdev: Sync event codes with Linux kernel 6.5

MFC after: 1 week

8 months ago: cam/ata: Postpone removal of two compat sysctls until 15
Zhenlei Huang [Thu, 2 Nov 2023 05:14:40 +0000 (13:14 +0800)]
cam/ata: Postpone removal of two compat sysctls until 15

Prefer UNMAPPEDIO and ROTATING from flags sysctl. See
 1. aeab0812e68c (Add flags sysctl to ada)
 2. cf3ff63e55e4 (Convert unmappedio over to a flag)
 3. 96eb32bf0f5a (Convert rotating to a flag bit)

Reviewed by: imp, ken, #cam
MFC after: immediately (we want this in 14.0)
Differential Revision: https://reviews.freebsd.org/D42402

8 months ago: libc: Purge unneeded cdefs.h
Warner Losh [Wed, 1 Nov 2023 22:43:37 +0000 (16:43 -0600)]
libc: Purge unneeded cdefs.h

These sys/cdefs.h includes are not needed. Purge them. They are mostly left-over
from the $FreeBSD$ removal. A few in libc are still required for macros
that cdefs.h defines. Keep those.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42385

8 months ago: ip_var.h: align comment style
Igor Ostapenko [Wed, 1 Nov 2023 12:21:16 +0000 (14:21 +0200)]
ip_var.h: align comment style

MFC after: 2 weeks
Reviewed by: kp
Pull Request: https://github.com/freebsd/freebsd-src/pull/883

8 months ago: Remove MOVED_LIBS handling from list-old-libs
Ed Maste [Mon, 16 Oct 2023 12:46:31 +0000 (08:46 -0400)]
Remove MOVED_LIBS handling from list-old-libs

In 922337e8d398 I added MOVED_LIBS into list-old-files, so that
delete-old-files would remove the old /usr/lib/libc++.so.1 as soon as
possible (after the library moved to /lib).

I left it in list-old-libs in case a user updated their src tree between
delete-old-files and delete-old-libs.  Now that some time has passed,
remove the redundant MOVED_LIBS entry.

PR: 272642
Sponsored by: The FreeBSD Foundation

8 months ago: udplite: fix checksum computation on the sender side
Michael Tuexen [Wed, 1 Nov 2023 09:24:56 +0000 (10:24 +0100)]
udplite: fix checksum computation on the sender side

Don't fill the fields of the UDP/IP header not used for the
checksum computation before performing the checksum computation.

Reviewed by: glebius
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42275

8 months ago: zfs: merge openzfs/zfs@41e55b476
Martin Matuska [Wed, 1 Nov 2023 09:12:47 +0000 (10:12 +0100)]
zfs: merge openzfs/zfs@41e55b476

Notable upstream pull request merges:
 #15366 c3773de1 ZIL: Cleanup sync and commit handling
 #15409 dbe839a9 zvol: Cleanup set property
 #15409 60387fac zvol: Implement zvol threading as a Property
 #15409 9ccdb8be zvol: fix delayed update to block device ro entry
 #15448 05a7348a RAIDZ: Use cache blocking during parity math
 #15452 514d661c Tune zio buffer caches and their alignments
 #15456 799e09f7 Unify arc_prune_async() code
 #15465 763ca47f Fix block cloning between unencrypted and encrypted
                 datasets

To make the module version easier to compare, the module version number
now includes the commit count since the last tag.

Obtained from: OpenZFS
OpenZFS commit: 41e55b476bcfc90f1ad81c02c5375367fdace9e9

8 months ago: vfs: remove majority of stale commentary about free list
Mateusz Guzik [Wed, 1 Nov 2023 08:28:28 +0000 (08:28 +0000)]
vfs: remove majority of stale commentary about free list

There has been no "free list" for a long time now.

While here slightly tidy up affected comments in other ways.

Note that the "free vnode" term is a misnomer at best and will also need
to get sorted out.

8 months ago: vfs: fix a typo introduced in previous
Mateusz Guzik [Wed, 1 Nov 2023 08:29:29 +0000 (08:29 +0000)]
vfs: fix a typo introduced in previous

Reported by: pstef

8 months ago: vfs: bring getnewvnode manpage closer to reality
Mateusz Guzik [Wed, 1 Nov 2023 08:20:12 +0000 (08:20 +0000)]
vfs: bring getnewvnode manpage closer to reality

8 months ago: param.h: FreeBSD_version 1500003: ino64 forward compat removal
Warner Losh [Wed, 1 Nov 2023 04:22:33 +0000 (22:22 -0600)]
param.h: FreeBSD_version 1500003: ino64 forward compat removal

Bump FreeBSD_version to 1500003 to mark the removal of the forward
compat code for the inode64 conversion. This removal should be a nop.

Sponsored by: Netflix

8 months ago: ino64: Remove 'forward compat' code for this
Warner Losh [Tue, 31 Oct 2023 20:55:58 +0000 (14:55 -0600)]
ino64: Remove 'forward compat' code for this

Forward compatibility code was added for running newer ino64 binaries on
older kernels as a transition aid. Now that ino64 has been in the tree
for 6 years, this code is no longer useful and should have been removed long
ago.  Remove it now. Should be no user-visible changes at this point as
all the 'upgrade' scenarios it was intended for are long since past.

Also need to remove this stuff from rtld since the _foo versions
no longer exist.

Sponsored by: Netflix
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D42382

8 months ago: Fix nfs_truncate_shares without /etc/exports.d
siv0 [Tue, 31 Oct 2023 20:57:54 +0000 (21:57 +0100)]
Fix nfs_truncate_shares without /etc/exports.d

Calling nfs_reset_shares on Linux prints a warning:
`failed to lock /etc/exports.d/zfs.exports.lock: No such file or
directory`
when /etc/exports.d does not exist. The directory gets created, when a
filesystem is actually exported through nfs_toggle_share and
nfs_init_share. The truncation of /etc/exports.d/zfs.exports happens
unconditionally when calling `zfs mount -a` (via zfs_do_mount and
share_mount in `cmd/zfs/zfs_main.c`).

Fixing the issue only in the Linux part, since the exports file on
FreeBSD is in `/etc/zfs/`, which seems present on 2 FreeBSD systems I
have access to (through `/etc/zfs/compatibility.d/`), while a Debian
box does not have the directory even if `/usr/sbin/exportfs` is
present through the `nfs-kernel-server` package.

The code for exports_available is copied from nfs_available above.

Fixes: ede037cda73675f42b1452187e8dd3438fafc220
("Make zfs-share service resilient to stale exports")

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15369
Closes #15468

8 months ago: bhyve: ps2 implement command 0xf6
Warner Losh [Tue, 31 Oct 2023 20:03:32 +0000 (14:03 -0600)]
bhyve: ps2 implement command 0xf6

Implement PS2 Keyboard command 0xf6, which is "SET DEFAULTS". This is
the same as 0xf5 (DISABLE KEYBOARD), but without disabling the keyboard
(since that resets all the defaults as a side effect). Normally, we
clear the fifo when we re-enable the keyboard. However, since this
leaves the keyboard enabled, clear the fifo as part of this command and
send an ack.

Linux's keyboard driver sends this command on reboot. Other commands
enable / reset the keyboard, so it doesn't matter too much that this
isn't implemented for booting, e.g., Ubuntu.

Sponsored by: Netflix
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D42384

8 months ago: Fix block cloning between unencrypted and encrypted datasets
Martin Matuška [Tue, 31 Oct 2023 20:49:41 +0000 (21:49 +0100)]
Fix block cloning between unencrypted and encrypted datasets

Block cloning from an encrypted dataset into an unencrypted dataset
and vice versa is not possible. The current code did allow cloning
unencrypted files into an encrypted dataset causing a panic when
these were accessed. Block cloning between encrypted and encrypted
is currently supported on the same filesystem only.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15464
Closes #15465

8 months ago: Add IBM TS1170 density codes and specs.
Kenneth D. Merry [Tue, 31 Oct 2023 19:20:36 +0000 (15:20 -0400)]
Add IBM TS1170 density codes and specs.

These were obtained from a drive, but they agree with the IBM
documentation.

The bpi/bpmm values are the same as TS1160, but the number of
tracks is much larger (18944 tracks vs 8704 for TS1160).  The tapes
are also longer, 1337m total.  (According to the MAM on a sample JF
tape.  I don't have a JE tape handy to compare.)  The end result
is a 50TB raw capacity (150TB compressed) for TS1170 with a JF
cartridge vs 20TB raw capacity (60TB compressed) for TS1160 with
a JE cartridge.

lib/libmt/mtlib.c:
Add the TS1170 density codes to the density table in libmt.

usr.bin/mt/mt.1:
Add the TS1170 density codes and specs to the density table
in the mt(1) man page.  As usual for TS drives, there is an
encrypted and non-encrypted density code (0x79 and 0x59
respectively).

MFC after: 3 days
Sponsored by: Spectra Logic

8 months ago: Add all read-only compatible zpool features to grub2 compatibility
Umer Saleem [Tue, 31 Oct 2023 16:51:54 +0000 (21:51 +0500)]
Add all read-only compatible zpool features to grub2 compatibility

GRUB opens the boot pool in read-only mode. All read-only
compatible features for zpool can be enabled and added to
grub2 compatibility, as GRUB does not open the boot-pool
for write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15459

8 months ago: zvol: fix delayed update to block device ro entry
Ameer Hamza [Tue, 17 Oct 2023 19:19:58 +0000 (00:19 +0500)]
zvol: fix delayed update to block device ro entry

The change in the zvol readonly property does not update the block
device readonly entry until the first IO to the ZVOL. This patch
addresses the issue by updating the block device readonly property
from the set property IOCTL call.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409

8 months ago: zvol: Implement zvol threading as a Property
Ameer Hamza [Tue, 24 Oct 2023 21:53:27 +0000 (02:53 +0500)]
zvol: Implement zvol threading as a Property

Currently, zvol threading can be switched through the zvol_request_sync
module parameter system-wide. By making it a zvol property, zvol
threading can be switched per zvol.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409

8 months ago: zvol: Cleanup set property
Ameer Hamza [Wed, 11 Oct 2023 22:31:11 +0000 (03:31 +0500)]
zvol: Cleanup set property

zvol_set_volmode() and zvol_set_snapdev() share a common code path.
Merge this shared code path into zvol_set_common().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409

8 months ago: pf: support SCTP-specific timeouts
Kristof Provost [Fri, 27 Oct 2023 14:45:07 +0000 (16:45 +0200)]
pf: support SCTP-specific timeouts

Allow SCTP state timeouts to be configured independently from TCP state
timeouts.

Reviewed by: tuexen
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42393

8 months ago: libpfctl: be more tolerant of kernel extensions
Kristof Provost [Fri, 27 Oct 2023 12:13:57 +0000 (14:13 +0200)]
libpfctl: be more tolerant of kernel extensions

Allow the kernel to supply more array elements than expected, but cut
off when we hit what we think the maximum is. This will improve forward
compatibility (i.e. old userspace with newer kernel).
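
The tolerant behaviour can be sketched as a bounded copy. This is not the
libpfctl code: the function name and the use of plain uint64_t arrays are
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept src_count elements from a newer kernel,
 * but keep only the first dst_max that this userspace knows about.
 * Returns the number of elements actually copied.
 */
static size_t
copy_counters(uint64_t *dst, size_t dst_max, const uint64_t *src,
    size_t src_count)
{
	size_t n = src_count < dst_max ? src_count : dst_max;

	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
	return (n);
}
```

The alternative, failing when src_count exceeds dst_max, is what breaks an
old userspace on a newer kernel; truncating silently drops only the
elements the old code could not have interpreted anyway.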

Reviewed by: zlei
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42392

8 months ago: pf tests: ensure that we generate all permutations for SCTP multihome
Kristof Provost [Tue, 10 Oct 2023 09:56:15 +0000 (11:56 +0200)]
pf tests: ensure that we generate all permutations for SCTP multihome

The initial multihome implementation was a little simplistic, and failed
to create all of the required states. Given a client with IP 1 and 2 and
a server with IP 3 and 4 we end up creating states for 1 - 3 and 2 - 3,
as well as 3 - 1 and 4 - 1, but not for 2 - 4.

Check for this.

MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42362

8 months ago: pf: fix missing SCTP multihomed states
Kristof Provost [Tue, 17 Oct 2023 16:10:39 +0000 (18:10 +0200)]
pf: fix missing SCTP multihomed states

The existing code to create extra states when SCTP endpoints supplied
extra addresses missed a case. As a result we failed to generate all of
the required states.

Briefly, if host A has addresses 1 and 2 and host B has addresses 3 and 4, we
generated 1 - 3 and 2 - 3, as well as 1 - 4, but not 2 - 4.

Store the list of endpoints supplied by each host and use those to
generate all of the connection permutations.
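
A minimal sketch of generating every address pairing from two stored
endpoint lists. The types and names are hypothetical, addresses are
reduced to small integers as in the example above, and pf's actual state
creation is not modelled.

```c
#include <stddef.h>

/* Hypothetical pair of endpoint addresses, one from each host. */
struct addr_pair {
	int	src;
	int	dst;
};

/*
 * Write every (a[i], b[j]) combination into out, up to out_max entries.
 * Returns the number of pairs required, which is na * nb.
 */
static size_t
endpoint_pairs(const int *a, size_t na, const int *b, size_t nb,
    struct addr_pair *out, size_t out_max)
{
	size_t n = 0;

	for (size_t i = 0; i < na; i++) {
		for (size_t j = 0; j < nb; j++) {
			if (n < out_max) {
				out[n].src = a[i];
				out[n].dst = b[j];
			}
			n++;
		}
	}
	return (n);
}
```

For hosts with addresses {1, 2} and {3, 4} this yields 1 - 3, 1 - 4,
2 - 3 and 2 - 4, including the 2 - 4 pairing that was previously missed.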

MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42361

8 months ago: Giant: Postpone removal of Giant-locked drivers until 15
Zhenlei Huang [Tue, 31 Oct 2023 12:45:14 +0000 (20:45 +0800)]
Giant: Postpone removal of Giant-locked drivers until 15

Reviewed by: imp
MFC after: 1 day
Differential Revision: https://reviews.freebsd.org/D42401

8 months ago: setkey(8): make the policy specification more readable
Konstantin Belousov [Tue, 31 Oct 2023 04:07:10 +0000 (06:07 +0200)]
setkey(8): make the policy specification more readable

by applying markup and highlighting the semantical blocks.

Sponsored by: NVidia networking
MFC after: 1 week

8 months ago: Unify arc_prune_async() code
Alexander Motin [Mon, 30 Oct 2023 23:56:04 +0000 (19:56 -0400)]
Unify arc_prune_async() code

It makes no sense to have separate implementations for FreeBSD and
Linux.  Make the Linux code shared, as it is more functional, and just
register a FreeBSD-specific prune callback with the
arc_add_prune_callback() API.

Aside from code cleanup, this should fix excessive pruning on FreeBSD:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274698

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15456

8 months ago: tzsetup: make UTC the first (default) region
Brooks Davis [Mon, 30 Oct 2023 23:33:14 +0000 (23:33 +0000)]
tzsetup: make UTC the first (default) region

Many sysadmins prefer to configure their systems to UTC and it's a
reasonable default when installing, making it easier to get a usable
system by just hitting <return> repeatedly.

Renumber UTC to 0 to preserve the finger memory of those selecting a
region by shortcut.

Reviewed by: jrtc27, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D42383

8 months ago: Tune zio buffer caches and their alignments
Alexander Motin [Mon, 30 Oct 2023 21:55:32 +0000 (17:55 -0400)]
Tune zio buffer caches and their alignments

We should not always use PAGESIZE alignment for caches bigger than
it and SPA_MINBLOCKSIZE otherwise.  With that scheme, caches for 5, 6,
7, 10 and 14KB are rounded up to 8, 12 and 16KB respectively, which
makes no sense.  Instead, specify as alignment the biggest power-of-2
divisor.  This way 2KB and 6KB caches are both aligned to 2KB, while
4KB and 8KB are aligned to 4KB.
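
The alignment rule can be sketched as follows. The function names are
hypothetical, and the cap at the page size is an assumption inferred from
the statement that an 8KB cache is aligned to 4KB rather than 8KB.

```c
#include <stddef.h>

/* Biggest power of two dividing size: isolate its lowest set bit. */
static size_t
pow2_divisor(size_t size)
{
	return (size & (~size + 1));
}

/*
 * Hypothetical helper: alignment for a buffer cache of the given size,
 * capped at page_size (assumed, see the lead-in).
 */
static size_t
zio_buf_align(size_t size, size_t page_size)
{
	size_t align = pow2_divisor(size);

	return (align < page_size ? align : page_size);
}
```

With a 4096-byte page, sizes 2048 and 6144 both get 2048-byte alignment,
while 4096 and 8192 both get 4096.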

Reduce the number of caches to half-power of 2 instead of quarter-power
of 2.  This removes caches that are difficult for underlying allocators
to fit into page-granular slabs, such as: 2.5, 3.5, 5, 7, 10KB, etc.
Since these caches are mostly used for transient allocations like
ZIOs and small DBUF cache, it is not worth being too aggressive.
Due to the above alignment issue some of those caches were not
working properly anyway.  The 6KB cache now finally has a chance to
work right, placing 2 buffers into 3 pages, which makes sense.

Remove explicit alignment in Linux user-space case.  I don't think
it should be needed any more with the above fixes.

As a result, on FreeBSD, instead of these numbers of pages per slab:

vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_14336.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_10240.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_7168.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 2   <= Broken
vm.uma.zio_buf_comb_5120.keg.ppera: 2
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3584.keg.ppera: 7   <= Hard to free
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2560.keg.ppera: 2
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1

I am now getting:

vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 3   <= Fixed, 2 in 3 pages
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15452

8 months ago: RAIDZ: Use cache blocking during parity math
Alexander Motin [Mon, 30 Oct 2023 21:54:27 +0000 (17:54 -0400)]
RAIDZ: Use cache blocking during parity math

RAIDZ parity is calculated by adding data one column at a time.  This
works OK for small blocks, but for large blocks the results of a
previous addition may already have been evicted from CPU caches to main
memory, so that, in addition to the extra memory write, an extra read is
required to get them back.

This patch splits large parity operations into 64KB chunks, which
should in most cases fit into the CPU L2 caches of the last decade.
I haven't touched the more complicated cases of data reconstruction, so
as not to overcomplicate the code.  Those should be relatively rare.
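
The idea of cache blocking can be sketched with plain XOR parity. This is
a simplified illustration, not the RAIDZ code: the function names are
hypothetical, only single (XOR) parity is modelled, and the 64KB chunk
size is the one named above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	PARITY_CHUNK	(64u * 1024u)	/* chunk size named in the message */

/* XOR len bytes of d into p. */
static void
xor_into(uint8_t *p, const uint8_t *d, size_t len)
{
	for (size_t i = 0; i < len; i++)
		p[i] ^= d[i];
}

/* Column at a time: the whole parity buffer is revisited per column. */
static void
parity_by_column(uint8_t *parity, const uint8_t *const *cols, size_t ncols,
    size_t len)
{
	memset(parity, 0, len);
	for (size_t c = 0; c < ncols; c++)
		xor_into(parity, cols[c], len);
}

/* Chunk at a time: each parity chunk is finished while still cached. */
static void
parity_by_chunk(uint8_t *parity, const uint8_t *const *cols, size_t ncols,
    size_t len)
{
	for (size_t off = 0; off < len; off += PARITY_CHUNK) {
		size_t n = len - off < PARITY_CHUNK ? len - off : PARITY_CHUNK;

		memset(parity + off, 0, n);
		for (size_t c = 0; c < ncols; c++)
			xor_into(parity + off, cols[c] + off, n);
	}
}
```

Both functions produce the same parity; the chunked variant only changes
the order of memory accesses so that each parity chunk is completed across
all columns before moving on.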

My tests on Xeon Gold 6242R CPU with 1MB of L2 cache per core show
up to 10/20% memory traffic reduction when writing to 4-wide RAIDZ/
RAIDZ2 blocks of ~4MB and up.  Older CPUs with 256KB of L2 cache
should see the effect even on smaller blocks.  Wider vdevs may need
bigger blocks to be affected.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15448

8 months ago: ZIL: Cleanup sync and commit handling
Alexander Motin [Mon, 30 Oct 2023 21:51:56 +0000 (17:51 -0400)]
ZIL: Cleanup sync and commit handling

ZVOL:
 - Mark all ZVOL ZIL transactions as sync.  Since ZVOLs have only
one object, it makes no sense to maintain an async queue and merge it
into the sync queue on each commit.  A single sync queue is just
cheaper, while it changes nothing until an actual commit request
arrives.
 - Remove zsd_sync_cnt and the zil_async_to_sync() calls since we
are no longer switching between sync and async queues.

ZFS:
 - Mark write transactions as sync based only on the number of sync
opens (z_sync_cnt).  We cannot randomly jump between sync and
async unless we want data corruption due to write reordering.
 - When a file is first opened with O_SYNC (z_sync_cnt incremented
to 1), call zil_async_to_sync() for it to preserve correct ordering
between past and future writes.
 - Drop zfs_fsyncer_key logic.  It looks like it was an optimization
for workloads heavily intermixing async writes with tons of fsyncs.
But first, it was broken 8 years ago due to the Linux tsd
implementation not allowing data storage between syscalls, and second,
I doubt it is safe to switch from async to sync so often and without
calling zil_async_to_sync().

 - Rename the sync argument of *_log_write() to commit, now only
signalling the caller's intent to call zil_commit() soon after.  It
allows WR_COPIED optimizations without carrying any other meaning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15366

8 months ago: zfs: merge openzfs/zfs@043c6ee3b
Martin Matuska [Mon, 30 Oct 2023 20:28:03 +0000 (21:28 +0100)]
zfs: merge openzfs/zfs@043c6ee3b

Notable upstream pull request merges:
 #15360 97a0b5be Add mutex_enter_interruptible() for interruptible sleeping
                 IOCTLs
 #15381 252f46be ZIL: Detect single-threaded workloads
 #15398 3afdc97d ZIO: Remove READY pipeline stage from root ZIOs
 #15428 e007908a ABD: Be more assertive in iterators
 #15436 07345ac2 Add prefetch property
 #15438 e9725abd Revert "Do not persist user/group/project quota zap objects
                 when unneeded"
 #15451 043c6ee3 Read prefetched buffers from L2ARC

Obtained from: OpenZFS
OpenZFS commit: 043c6ee3b6bfb55f8d36e1f048ff13128c279fb8

8 months ago: libpfctl: remove unused field from struct pfctl_states
Kristof Provost [Mon, 30 Oct 2023 18:04:12 +0000 (19:04 +0100)]
libpfctl: remove unused field from struct pfctl_states

We never populate this, or use it, so remove it.

MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")

8 months ago: libpfctl: add missing pfctl_status_lcounter() function
Kristof Provost [Mon, 30 Oct 2023 18:02:29 +0000 (19:02 +0100)]
libpfctl: add missing pfctl_status_lcounter() function

We already had accessors for the other types of counters, but not this
one.

MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")

8 months ago: arm64: Add a BTI landing pad to .mcount
Andrew Turner [Mon, 16 Oct 2023 14:34:19 +0000 (15:34 +0100)]
arm64: Add a BTI landing pad to .mcount

The .mcount function needs a BTI branch target. As we can't rely on
asm.h being included use the hint version of a "bti c" instruction.
This is a nop when BTI is not supported or not used.

Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42230

8 months ago: rtld: Teach rtld about the BTI elf note
Andrew Turner [Thu, 12 Oct 2023 16:29:46 +0000 (17:29 +0100)]
rtld: Teach rtld about the BTI elf note

Add the Branch Target Identification (BTI) note to libc assembly
sources. As all object files need the note for rtld to have it we need
to insert it in all asm files.

Reviewed by: markj, emaste
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42228

8 months agocsu: Teach csu about PAC and BTI
Andrew Turner [Thu, 12 Oct 2023 10:03:37 +0000 (11:03 +0100)]
csu: Teach csu about PAC and BTI

Add the Branch Target Identification (BTI) note to libc assembly
sources and Pointer Authentication Code (PAC) instructions to _init and
_fini.

_init and _fini may be called indirectly so need a BTI landing pad. As
they are non-leaf functions use the appropriate PAC instruction that
also guards against changing the link register.

As all object files need the note for any binary using these object files
we need to insert it in all asm files.

Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42227

8 months agortld: introduce STATIC_TLS_EXTRA
Stephen J. Kiernan [Sun, 29 Oct 2023 21:13:10 +0000 (17:13 -0400)]
rtld: introduce STATIC_TLS_EXTRA

The new STATIC_TLS_EXTRA variable provides a means for applications
to increase the size of the extra static TLS space allocated by
rtld beyond the default of '128'. This extra static TLS space is used
for objects loaded with dlopen.

The value specified in the variable must be no less than the default
value and no greater than the maximum allowed value for size_t type.

If an invalid value is specified, rtld will ignore it and just use
the default value.

The rtld(1) man page is updated to document this new option.
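
The validation rule above can be sketched as follows. This is an
illustrative helper, not rtld's code: the name parse_static_tls_extra()
and its string-in, size-out interface are assumptions. Only the default
of 128, the two bounds, and the fall-back-to-default behaviour are taken
from this commit message.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define	STATIC_TLS_EXTRA_DEFAULT	128	/* default quoted above */

/*
 * Hypothetical helper: parse a user-supplied value and fall back to the
 * default when it is malformed, below the default, or too large for
 * size_t.
 */
static size_t
parse_static_tls_extra(const char *value)
{
	char *end;
	unsigned long long n;

	/* Reject NULL, empty strings, signs and leading whitespace. */
	if (value == NULL || !isdigit((unsigned char)value[0]))
		return (STATIC_TLS_EXTRA_DEFAULT);
	errno = 0;
	n = strtoull(value, &end, 10);
	if (errno != 0 || *end != '\0')
		return (STATIC_TLS_EXTRA_DEFAULT);
	if (n < STATIC_TLS_EXTRA_DEFAULT || n > SIZE_MAX)
		return (STATIC_TLS_EXTRA_DEFAULT);
	return ((size_t)n);
}
```

With this sketch, "4096" yields 4096, while "64", "-1" and "abc" all
yield the default of 128.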

Obtained from:  Juniper Networks, Inc.
Differential Revision:  https://reviews.freebsd.org/D42025

8 months agoAdd some Intel ICH10 PCI IDs.
Dmitry Luhtionov [Mon, 30 Oct 2023 14:37:36 +0000 (10:37 -0400)]
Add some Intel ICH10 PCI IDs.

8 months agoAdd IDs for Intel BayTrail SATA.
Dmitry Luhtionov [Mon, 30 Oct 2023 13:57:19 +0000 (09:57 -0400)]
Add IDs for Intel BayTrail SATA.

8 months agoRevert "pf: remove COMPAT_FREEBSD14 #ifdef from pfvar.h"
Kristof Provost [Mon, 30 Oct 2023 08:09:56 +0000 (09:09 +0100)]
Revert "pf: remove COMPAT_FREEBSD14 #ifdef from pfvar.h"

This reverts commit 9eff6390718d0fa67dffc6cd830b0bc6b815e8c4.

The libpfctl port has been fixed (to avoid using DIOCGETSTATESV2), so we
can now safely revert this.

Sponsored by: Rubicon Communications, LLC ("Netgate")

8 months agosrc.conf.5: regen after addition of KERNEL_BIN
Ed Maste [Sun, 29 Oct 2023 22:55:56 +0000 (18:55 -0400)]
src.conf.5: regen after addition of KERNEL_BIN

Fixes: 34632ed1a495 ("arm: Introduce MK_KERNEL_BIN to control gener...")
Sponsored by: The FreeBSD Foundation

8 months agodirdeps: Add missing dependency files
Stephen J. Kiernan [Sun, 29 Oct 2023 21:08:29 +0000 (17:08 -0400)]
dirdeps: Add missing dependency files

Some leaf directories were missing Makefile.depend files or needed
architecture-specific Makefile.depend.* files.

8 months agodirdeps: Update Makefile.depend* files with empty contents
Stephen J. Kiernan [Sun, 29 Oct 2023 21:01:04 +0000 (17:01 -0400)]
dirdeps: Update Makefile.depend* files with empty contents

Some Makefile.depend* files were committed with no contents or empty
DIRDEPS list, but they should have DIRDEPS with some contents.

8 months agoStop adding $FreeBSD$ to Makefile.depend
Simon J. Gerraty [Sun, 29 Oct 2023 18:40:03 +0000 (11:40 -0700)]
Stop adding $FreeBSD$ to Makefile.depend

Reviewed by: stevek

8 months agoibcore: Introduce enum ib_raw_packet_caps from Linux 4.11
Ka Ho Ng [Sat, 28 Oct 2023 20:57:49 +0000 (16:57 -0400)]
ibcore: Introduce enum ib_raw_packet_caps from Linux 4.11

This enum also exists as enum ibv_raw_packet_caps in libibverbs/verbs.h.

[khng: cherry-picked from Linux
ebaaee253ad3a3c573ab7d3d77e849056bdfa9ea]

Sponsored by: Juniper Networks, Inc.
MFC after: 7 days
Reviewed by: kib, zlei
Differential Revision: https://reviews.freebsd.org/D42177

8 months agomlx5ib: Fix RSS Toeplitz setup to be aligned with the HW specification
Yishai Hadas [Sat, 28 Oct 2023 20:55:47 +0000 (16:55 -0400)]
mlx5ib: Fix RSS Toeplitz setup to be aligned with the HW specification

The specification for the Toeplitz function doesn't require to set the key
explicitly to be symmetric. In case a symmetric functionality is required
a symmetric key can be simply used.

Wrongly forcing the algorithm to symmetric causes the wrong packet
distribution and a performance degradation.
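
The point about symmetric keys can be illustrated with a textbook
Toeplitz hash. This is a sketch, not the mlx5 driver or firmware code:
toeplitz_hash(), hash_flow() and the tuple layout are illustrative
names, and the key (the 16-bit pattern 0x6d5a repeated) is an assumed
example of a symmetric key. Because that key repeats every 16 bits,
swapping the 32-bit addresses and the 16-bit ports leaves the hash
unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed example key: 0x6d5a repeated, i.e. periodic in 16 bits. */
static const uint8_t symmetric_key[16] = {
	0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
	0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
};

/*
 * Textbook Toeplitz hash: for every set bit i of the input, XOR in the
 * 32-bit window of the key that starts at key bit i. The key must be at
 * least len + 4 bytes long.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one bit further along the key. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}
	return (hash);
}

/* Hash an IPv4 4-tuple laid out as src addr, dst addr, src port, dst port. */
static uint32_t
hash_flow(const uint8_t *key, uint32_t src, uint32_t dst, uint16_t sport,
    uint16_t dport)
{
	const uint8_t tuple[12] = {
		(uint8_t)(src >> 24), (uint8_t)(src >> 16),
		(uint8_t)(src >> 8), (uint8_t)src,
		(uint8_t)(dst >> 24), (uint8_t)(dst >> 16),
		(uint8_t)(dst >> 8), (uint8_t)dst,
		(uint8_t)(sport >> 8), (uint8_t)sport,
		(uint8_t)(dport >> 8), (uint8_t)dport,
	};

	return (toeplitz_hash(key, tuple, sizeof(tuple)));
}
```

Calling hash_flow() with symmetric_key for a flow and for its reverse
direction gives the same value, so symmetry is a property of the key
chosen by the consumer rather than something the hardware setup has to
force.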

Link: https://lore.kernel.org/r/20190723065733.4899-7-leon@kernel.org
Fixes: 28d6137008b2 ("IB/mlx5: Add RSS QP support")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Alex Vainman <alexv@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
[khng: cherry-picked from Linux
b7165bd0d6cbb93732559be6ea8774653b204480]

Sponsored by: Juniper Networks, Inc.
MFC after:      7 days
Reviewed by: kib, zlei
Differential Revision: https://reviews.freebsd.org/D42178

8 months agomlx5ib: Fix ethertype to be ETH_P_IPV6
Ka Ho Ng [Sat, 28 Oct 2023 20:54:32 +0000 (16:54 -0400)]
mlx5ib: Fix ethertype to be ETH_P_IPV6

Sponsored by: Juniper Networks, Inc.
MFC after: 7 days
Reviewed by: ae, kib, zlei
Differential Revision: https://reviews.freebsd.org/D42184

8 months agoisa: Postpone removal of the non-PNP driver until 15
Zhenlei Huang [Sat, 28 Oct 2023 20:31:11 +0000 (04:31 +0800)]
isa: Postpone removal of the non-PNP driver until 15

Reviewed by: imp
MFC after: 1 day
Differential Revision: https://reviews.freebsd.org/D42387

8 months agolib/libcrypt: another trivial style change
Enji Cooper [Sat, 28 Oct 2023 01:56:41 +0000 (18:56 -0700)]
lib/libcrypt: another trivial style change

Normalize on hard tabs.

I didn't catch this before pushing the previous commit.

No functional changes intended.

MFC after: 2 weeks
MFC with: 8ef8da882ff475e3da3bde57d97593a68f7d97b2

8 months agolibc: Strip plentiful trailing whitespace from aarch64+arm makecontext.c
Jessica Clarke [Sat, 28 Oct 2023 01:45:06 +0000 (02:45 +0100)]
libc: Strip plentiful trailing whitespace from aarch64+arm makecontext.c

8 months agolib/libcrypt: remove trailing whitespace
Enji Cooper [Sat, 28 Oct 2023 01:10:39 +0000 (18:10 -0700)]
lib/libcrypt: remove trailing whitespace

No functional change intended.

MFC after: 2 weeks

8 months agoFix build with gcc12.
Navdeep Parhar [Fri, 27 Oct 2023 23:39:12 +0000 (16:39 -0700)]
Fix build with gcc12.

8 months agodevd: Improve devmatch support
Warner Losh [Fri, 27 Oct 2023 21:23:47 +0000 (15:23 -0600)]
devd: Improve devmatch support

We know that calling devmatch will be futile if there's no plug and play
information for it to match on. Avoid this generically when we see
"? at +on"
which happens only when the location and pnpinfo aren't provided. Don't
call "service devmatch quietstart" here.

We also ignore ACPI devices with a _HID of none. These also will never
load a new driver, so avoid calling "service devmatch quietstart" here too.

Use the more compact "$*" instead of "'?'$_" when calling "service
devmatch quietstart" since it will evaluate to the same thing.

On my laptop, this eliminates 45% of the calls to devmatch. While it
would be even better to integrate devmatch into devd (so we only parse
linker.hints once), that will have to wait for another day as it's a bit
more complex to arrange than avoiding easy-to-avoid calls.

Sponsored by: Netflix
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D42326

8 months agodevd: Remove obsolete / wrong nomatch examples
Warner Losh [Fri, 27 Oct 2023 21:23:40 +0000 (15:23 -0600)]
devd: Remove obsolete / wrong nomatch examples

These examples are wrong, and with devmatch, nobody would ever see them
(since it's a higher priority).

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42325

8 months agoudbp: Remove stale splnet comment
Warner Losh [Fri, 27 Oct 2023 20:12:04 +0000 (14:12 -0600)]
udbp: Remove stale splnet comment

netgraph no longer needs splnet. Document that we're forcing queueing.

Sponsored by: Netflix

8 months agomwl: Remove stale reference to splnet/splvm.
Warner Losh [Fri, 27 Oct 2023 20:10:46 +0000 (14:10 -0600)]
mwl: Remove stale reference to splnet/splvm.

Sponsored by: Netflix

8 months agootus: splnet isn't a thing, remove place holders
Warner Losh [Fri, 27 Oct 2023 20:03:51 +0000 (14:03 -0600)]
otus: splnet isn't a thing, remove place holders

Even though it's if 0'd code, remove splnet/splx. This code can't
possibly run on FreeBSD, but having it here as a marker isn't especially
helpful. It causes a false positive on grepping otherwise.

Sponsored by: Netflix

8 months agonetgraph: Fix obsolete comment
Warner Losh [Fri, 27 Oct 2023 20:00:37 +0000 (14:00 -0600)]
netgraph: Fix obsolete comment

splnet is no more, adjust the comment.

Sponsored by: Netflix

8 months agoclock_gettime: Minor clarification
Warner Losh [Wed, 3 Aug 2022 16:24:47 +0000 (10:24 -0600)]
clock_gettime: Minor clarification

Add a note saying that the CLOCK_BOOTTIME is unrelated to FreeBSD's
kern.boottime sysctl. Make a minor tweak to markup.

Feedback from: pauammu
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D36037

8 months agostrlcpy/strlcat: Remove references to snprintf
Warner Losh [Fri, 27 Oct 2023 16:11:29 +0000 (10:11 -0600)]
strlcpy/strlcat: Remove references to snprintf

While strlcpy and snprintf are somewhat similar, there are big differences
between strlcat and snprintf which lead to confusion. Remove the
comparison, since it's ultimately not that useful: the snprintf man page
has similar language to strlcpy, so it doesn't provide a better
reference. The two implementations are otherwise unrelated.

Reviewed by: bcr
Sponsored by: Netflix
Differential Revision:  https://reviews.freebsd.org/D27228

8 months agonetinet: The tailq_hash code doesn't reference tcpoutflags
Warner Losh [Fri, 27 Oct 2023 14:39:31 +0000 (08:39 -0600)]
netinet: The tailq_hash code doesn't reference tcpoutflags

Don't define TCPOUTFLAGS to get the static definition from tcp_fsm.h.
tailq_hash.c doesn't reference tcpoutflags. Only files that reference this
should define TCPOUTFLAGS. clang is fine with it, but gcc12 complained.

Sponsored by: Netflix

8 months agopkgbase: compress packages with zstandard
Baptiste Daroussin [Thu, 26 Oct 2023 20:34:00 +0000 (22:34 +0200)]
pkgbase: compress packages with zstandard

MFC After: 3 days
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D42375

8 months agoarm64: Use the Linux sigframe to restore registers
Andrew Turner [Wed, 25 Oct 2023 09:50:11 +0000 (10:50 +0100)]
arm64: Use the Linux sigframe to restore registers

When returning from a Linux signal use the Linux sigframe to find the
register values to restore.

Remove the FreeBSD ucontext from the stack as it's now unneeded.

Reviewed by: dchagin, emaste
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42360

8 months agoUPDATING: Document branch creation
Warner Losh [Fri, 27 Oct 2023 04:14:23 +0000 (22:14 -0600)]
UPDATING: Document branch creation

Document when stable/12, stable/13 and stable/14 were created. Once we
release 14.0, I'll trim the stable/11 branchpoint through stable/12
branchpoint. Documenting all of these will make it easier in the future.

Sponsored by: Netflix

8 months agoefibootmgr: -C isn't implemented
Warner Losh [Tue, 17 May 2022 16:47:50 +0000 (10:47 -0600)]
efibootmgr: -C isn't implemented

-C isn't implemented, so just errx out until it is. It's not listed in
the man page, but is parsed for compatibility with the Linux
efibootmgr(8) command.

Sponsored by: Netflix

8 months agoefibootmgr: support '-b bootXXXX' as an alias for '-b XXXX'
Warner Losh [Tue, 17 May 2022 16:47:03 +0000 (10:47 -0600)]
efibootmgr: support '-b bootXXXX' as an alias for '-b XXXX'
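
A minimal sketch of such an alias, assuming the argument is an optional
"boot" prefix followed by a hexadecimal boot number. This is not the
efibootmgr source: parse_bootnum() is a hypothetical name, and both the
case-insensitive prefix match and the 0xffff upper bound are
assumptions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <strings.h>

/*
 * Hypothetical helper: accept "XXXX" or "bootXXXX" (hex digits) and
 * return the boot number, or -1 if the argument is malformed.
 */
static long
parse_bootnum(const char *arg)
{
	char *end;
	unsigned long n;

	/* Assumption: the prefix is matched without regard to case. */
	if (strncasecmp(arg, "boot", 4) == 0)
		arg += 4;
	if (!isxdigit((unsigned char)*arg))
		return (-1);
	n = strtoul(arg, &end, 16);
	if (*end != '\0' || n > 0xffff)
		return (-1);
	return ((long)n);
}
```

Both "001A" and "boot001A" map to the same number here, while
"bootxyz" and a bare "boot" are rejected.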

Sponsored by: Netflix

8 months agoarm: prune imx5 support from the tree
Kyle Evans [Fri, 27 Oct 2023 03:55:17 +0000 (22:55 -0500)]
arm: prune imx5 support from the tree

The IMX5 configs were removed in advance of FreeBSD 14.0 in
cdb0c2a73df ("arm: Remove IMX5 specific kernel configs").  This code
isn't built with GENERIC and doesn't actually build today as-is, so
let's remove it to avoid needless maintenance work to it that won't be
tested.  As usual, revival is welcome with a committed user and work to
maintain it with upstream DTS and, ideally, in GENERIC.

I note that vt_early_fb is now effectively orphaned as nothing else will
use it, but I haven't yet removed it since I have not done anything to
ascertain if it could be integrated easily enough for other SoC.  It is
among the files that don't actually build with today's clang, though.

Reviewed by: imp, manu
Differential Revision: https://reviews.freebsd.org/D41836

8 months agoarm: Introduce MK_KERNEL_BIN to control generation of kernel.bin
Warner Losh [Fri, 27 Oct 2023 03:10:36 +0000 (21:10 -0600)]
arm: Introduce MK_KERNEL_BIN to control generation of kernel.bin

It's sometimes desirable to generate kernel.bin and install it. While
the mainstream has moved on to UEFI booting on arm, some specialized
gear can't support it. For that gear, we unconditionally generate
kernel.bin. Add a knob so that WITH_KERNEL_BIN or WITHOUT_KERNEL_BIN
control its generation and installation. config files should add
'makeoptions WITH_KERNEL_BIN=t' to enable it. Since its use is
specialized, it is off by default now since the arm world has largely
moved on to UEFI.

It only affects arm and arm64 (since those are the only two that support
it).

Sponsored by: Netflix
Reviewed by: mmel
Differential Revision: https://reviews.freebsd.org/D39013

8 months agopthread_mutexattr(3), _condattr(3): reference libthr(3)
Konstantin Belousov [Mon, 23 Oct 2023 23:03:42 +0000 (02:03 +0300)]
pthread_mutexattr(3), _condattr(3): reference libthr(3)

Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D42344

8 months agopthread_mutexattr_init(3): describe pthread_mutexattr_{set,get}pshared
Konstantin Belousov [Mon, 23 Oct 2023 22:54:54 +0000 (01:54 +0300)]
pthread_mutexattr_init(3): describe pthread_mutexattr_{set,get}pshared

PR: 274678
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D42344

8 months agottys: bump date
Warner Losh [Thu, 26 Oct 2023 18:37:51 +0000 (12:37 -0600)]
ttys: bump date

8 months agoauxv: make AT_BSDFLAGS unsigned
Brooks Davis [Thu, 26 Oct 2023 17:38:35 +0000 (18:38 +0100)]
auxv: make AT_BSDFLAGS unsigned

AT_BSDFLAGS shouldn't be sign extended on 64-bit systems so use a
uint32_t instead of an int.
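
The difference is easy to see in isolation. The two helpers below are
hypothetical and only model widening a 32-bit flags word into a 64-bit
slot; they are not the kernel's auxv code, and the flag value with bit
31 set is an assumed example.

```c
#include <stdint.h>

/* Widening a signed 32-bit value copies bit 31 into the upper half. */
static uint64_t
widen_signed(int32_t flags)
{
	return ((uint64_t)(int64_t)flags);
}

/* Widening an unsigned 32-bit value leaves the upper half zero. */
static uint64_t
widen_unsigned(uint32_t flags)
{
	return ((uint64_t)flags);
}
```

For a value with only bit 31 set, widen_signed() produces
0xffffffff80000000 while widen_unsigned() produces 0x80000000, which is
why an unsigned type is the safer choice for a flags word.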

Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D42365

8 months agoprocctl.2: improve phrasing for ASLR disable
Brooks Davis [Thu, 26 Oct 2023 17:38:14 +0000 (18:38 +0100)]
procctl.2: improve phrasing for ASLR disable

Reported by: jrtc27
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D42364

8 months agoRead prefetched buffers from L2ARC
shodanshok [Thu, 26 Oct 2023 16:40:21 +0000 (18:40 +0200)]
Read prefetched buffers from L2ARC

Prefetched buffers are currently read from L2ARC if, and only if,
l2arc_noprefetch is set to non-default value of 0. This means that
a streaming read which can be served from L2ARC will instead engage
the main pool.

For example, consider what happens when a file is sequentially read:
- application requests contiguous data, engaging the prefetcher;
- ARC buffers are initially marked as prefetched but, as the calling
application consumes data, the prefetch tag is cleared;
- these "normal" buffers become eligible for L2ARC and are copied to it;
- re-reading the same file will *not* engage L2ARC even if it contains
the required buffers;
- main pool has to suffer another sequential read load, which (due to
most NCQ-enabled HDDs preferring sequential loads) can dramatically
increase latency for uncached random reads.

In other words, current behavior is to write data to L2ARC (wearing it)
without using the very same cache when reading back the same data. This
was probably useful many years ago to preserve L2ARC read bandwidth but,
with current SSD speed/size/price, it is vastly sub-optimal.

Setting l2arc_noprefetch=0, while enabling L2ARC to serve these reads,
means that even prefetched but unused buffers will be copied into L2ARC,
further increasing wear and load for potentially not-useful data.

This patch enables prefetched buffers to be read from L2ARC even when
l2arc_noprefetch=1 (default), increasing sequential read speed and
reducing load on the main pool without polluting L2ARC with not-useful
(i.e., unused) prefetched data. Moreover, it clears up user confusion about
L2ARC size increasing but not serving any IO when doing sequential
reads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15451

8 months agoAdd mutex_enter_interruptible() for interruptible sleeping IOCTLs
Thomas Bertschinger [Thu, 26 Oct 2023 16:17:40 +0000 (10:17 -0600)]
Add mutex_enter_interruptible() for interruptible sleeping IOCTLs

Many long-running ZFS ioctls lock the spa_namespace_lock, forcing
concurrent ioctls to sleep for the mutex. Previously, the only
option was to call mutex_enter(), which sleeps uninterruptibly. This
is a usability issue for sysadmins: for example, if the admin runs
`zpool status` while a slow `zpool import` is ongoing, the admin's
shell will be locked in uninterruptible sleep for a long time.

This patch resolves this admin usability issue by introducing
mutex_enter_interruptible() which sleeps interruptibly while waiting
to acquire a lock. It is implemented for both Linux and FreeBSD.

The ZFS_IOC_POOL_CONFIGS ioctl, used by `zpool status`, is changed to
use this new macro so that the command can be interrupted if it is
issued during a concurrent `zpool import` (or other long-running
operation).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Closes #15360

8 months agogpart: Be less picky about GPT Tables in some cases
Warner Losh [Thu, 26 Oct 2023 16:14:54 +0000 (10:14 -0600)]
gpart: Be less picky about GPT Tables in some cases

When we're recovering a damaged GPT, or when we're restoring a backed-up
partition table, don't enforce the 4k alignment for start/end LBAs.
This alignment is useful for 512e/4kn drives when we're creating a new partition
table or partition. However, when we're trying to fix / restore an old
partition, we shouldn't force this alignment, since in that case it's
more important to use the partition table as is than to optimize
performance by rounding (which isn't required by the standard).
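
The distinction can be sketched as a rounding helper with an opt-out.
This is not the gpart or GEOM code: align_start_lba() and its
enforce_4k flag are hypothetical, and rounding a start LBA up to the
next 4096-byte boundary is an assumption about what "4k alignment"
means here.

```c
#include <stdbool.h>
#include <stdint.h>

#define	ALIGN_BYTES	4096u

/*
 * Hypothetical helper: round a start LBA up to a 4k boundary when
 * creating new tables or partitions, but return it untouched when
 * recovering or restoring an existing table.
 */
static uint64_t
align_start_lba(uint64_t lba, uint32_t sector_size, bool enforce_4k)
{
	uint64_t per_4k;

	/* Nothing to do when skipping alignment or for 4k-native sectors. */
	if (!enforce_4k || sector_size == 0 || sector_size >= ALIGN_BYTES)
		return (lba);
	per_4k = ALIGN_BYTES / sector_size;
	return ((lba + per_4k - 1) / per_4k * per_4k);
}
```

With 512-byte sectors, LBA 34 becomes 40 when alignment is enforced and
stays 34 when the table is being recovered or restored as is.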

MFC After: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42359

8 months agoarc_default_max on Linux should match FreeBSD
ednadolski-ix [Thu, 26 Oct 2023 16:13:01 +0000 (10:13 -0600)]
arc_default_max on Linux should match FreeBSD

Commits 518b487 and 23bdb07 changed the default ARC size limit on
Linux systems to 1/2 of physical memory, which has become too
strict for modern systems with large amounts of RAM. This patch
changes the default limit to match that of FreeBSD, so ZFS may
have a unified value on both platforms.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15437

8 months agottys: Document insecure flag
Warner Losh [Thu, 26 Oct 2023 16:09:16 +0000 (10:09 -0600)]
ttys: Document insecure flag

Both the secure and insecure flags are documented in init(8). The secure
flag is documented here; however, the insecure flag is not. Nor is the
nuance that a line missing the 'secure' flag is also considered
insecure. Document both here.

Sponsored by: Netflix

8 months agodevd: Restore WARNS=6
Warner Losh [Thu, 26 Oct 2023 04:35:45 +0000 (22:35 -0600)]
devd: Restore WARNS=6

We compile correctly on all platforms with clang and WARNS=6. We build
on amd64 with gcc12 and WARNS=6. Restore WARNS=6. This reverts
3741a56c310d, since that's no longer relevant.

Sponsored by: Netflix

8 months agopf: Fix packet reassembly
Kajetan Staszkiewicz [Thu, 26 Oct 2023 12:26:33 +0000 (14:26 +0200)]
pf: Fix packet reassembly

Don't drop fragmented packets when reassembly is disabled, they can be
matched by rules with "fragment" keyword. Ensure that presence of scrub
rules forces old behaviour.

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D42355

8 months agopf tests: Add option to send fragmented packets
Kajetan Staszkiewicz [Thu, 26 Oct 2023 09:14:14 +0000 (11:14 +0200)]
pf tests: Add option to send fragmented packets

Add option to send fragmented packets and to properly sniff them by
reassembling them by the sniffer itself.

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D42354

8 months agobhyve: fix arguments to ioctl(VMIO_SIOCSIFFLAGS)
Gleb Smirnoff [Thu, 26 Oct 2023 09:59:21 +0000 (02:59 -0700)]
bhyve: fix arguments to ioctl(VMIO_SIOCSIFFLAGS)

ioctl(2) requests that take an integer argument shall pass it by value,
not by pointer.  The ioctl(2) manual page is not very clear about that.
See sys/kern/sys_generic.c:sys_ioctl() near IOC_VOID.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D42366
Fixes: fd8b9c73a5a63a7aa438a73951d7a535b4f25d9a

8 months agoDelete snapshot after opening it when running fsck_ffs(8) in background.
Kirk McKusick [Wed, 25 Oct 2023 22:36:45 +0000 (15:36 -0700)]
Delete snapshot after opening it when running fsck_ffs(8) in background.

When fsck_ffs(8) runs in background, it creates a snapshot named
fsck_snapshot in the filesystem's .snap directory. The fsck_snapshot
file was removed when the background fsck finished. If the system
crashed or the fsck exited unexpectedly, the fsck_snapshot file
would remain. The snapshot would consume ever more space as the
filesystem changed over time until it was removed by a system
administrator or a future run of background fsck removed it to
create a new snapshot file.

This commit unlinks the .snap/fsck_snapshot file immediately after
opening it so that it will be reclaimed when fsck closes it at the
conclusion of its run. After a system crash, it will be removed as
part of the filesystem cleanup because of its zero reference count.
As only a few milliseconds pass between its creation and unlinking,
there is far less opportunity for it to be accidentally left behind.
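
The unlink-after-open idiom described above can be shown with an
ordinary file. This is a userland sketch, not the fsck_ffs code: it
uses a scratch file created by mkstemp(3) under /tmp rather than a UFS
snapshot, and unlink_after_open_demo() is a hypothetical name.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a scratch file, unlink it right away, and keep using the open
 * descriptor. Returns 0 if the file stayed usable through the
 * descriptor while its name was already gone, -1 otherwise.
 */
static int
unlink_after_open_demo(void)
{
	char path[] = "/tmp/unlink_demo.XXXXXX";
	char buf[4];
	struct stat sb;
	int fd, ok;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	/* Remove the name; the open descriptor keeps the file alive. */
	if (unlink(path) == -1) {
		close(fd);
		return (-1);
	}
	ok = write(fd, "data", 4) == 4 &&
	    lseek(fd, 0, SEEK_SET) == 0 &&
	    read(fd, buf, 4) == 4 &&
	    memcmp(buf, "data", 4) == 0 &&
	    stat(path, &sb) == -1;
	/* Closing the last descriptor lets the system reclaim the space. */
	close(fd);
	return (ok ? 0 : -1);
}
```

If the process dies at any point after the unlink(2) call, no name is
left behind for an administrator to clean up.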

PR:           106107
MFC-after:    1 week

8 months agoZIO: Remove READY pipeline stage from root ZIOs
Alexander Motin [Wed, 25 Oct 2023 22:22:25 +0000 (18:22 -0400)]
ZIO: Remove READY pipeline stage from root ZIOs

zio_root() has no arguments for ready callback or parent ZIO. Except for
one recent case in ZIL code, if root ZIOs ever have a parent, it is
also a root ZIO.  It means we do not need READY pipeline stage for
them, which takes some time to process, but even more time to wait
for the children and be woken by them, and both for no good reason.

The most visible effect of this change is that it avoids one taskq
wakeup per ZIL block written, previously used to run zio_ready()
for lwb_root_zio and skipped now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15398

8 months agoOpenSSL: regenerate asm files for 3.0.12
Ed Maste [Wed, 25 Oct 2023 17:28:47 +0000 (13:28 -0400)]
OpenSSL: regenerate asm files for 3.0.12

Fixes: ad991e4c142e ("OpenSSL: update to 3.0.12")
Sponsored by: The FreeBSD Foundation

8 months agotcp: Silence a -Wunused-function warning in tcp_ratelimit.h
Mark Johnston [Wed, 25 Oct 2023 14:03:58 +0000 (10:03 -0400)]
tcp: Silence a -Wunused-function warning in tcp_ratelimit.h

No functional change intended.