Ed Maste [Tue, 11 Oct 2022 19:27:51 +0000 (15:27 -0400)]
Track upstream project rename in contrib/blocklistd
Upstream is now https://github.com/zoulasc/blocklist/. Rename the
contrib directory and update Makefiles to match, in advance of the next
vendor branch update.
Shawn Anastasio [Fri, 3 Nov 2023 17:40:18 +0000 (14:40 -0300)]
powerpc: Fix inconsistent Altivec handling in set_mcontext
When support for fpu_kern_enter/fpu_kern_leave was added to powerpc,
set_mcontext was updated to handle Altivec state restoration in the same
way that the FPU state by lazily restoring the context on the first
trap. However the function was not correctly updated to unconditionally
clear the PCB_VEC and PSL_VEC bits from the pcb's flags and srr1
respectively which can sometimes result in a mismatch between a
process's MSR[VEC] state and its pcb_flags.
Fix this by simply clearing the VEC flags unconditionally in
set_mcontext, which is already done for FPU/VSX.
Warner Losh [Thu, 2 Nov 2023 20:41:09 +0000 (14:41 -0600)]
cam: Make cam_debug macros atomic
The CAM_DEBUG* macros use multiple printfs to dump the data. This is
suboptimal when tracing things that produce even a moderate amount since
it gets intertwingled. I can't even turn on tracing with a 24-disk HBA
on boot without it getting messed up. Add helper routines to work around
clang's over-use of the stack: that way we only pay the stack penalty
when a trace hits.
This is used to the package annotation helping pkg to know about
backward compatibility is set to the version of the packages not
the version of the host building the packages
pkgbase: set a default set of kernel for when PACKAGE_BUILDING=1
PACKAGE_BUILDING is already known in the ports tree as a variable
use to defined when the packages is being actually built in an
automation process, reuse that variable to define the default set
of kernel we plan to build for the default pkgbase.
Rick Macklem [Thu, 2 Nov 2023 21:07:01 +0000 (14:07 -0700)]
krpc: Display stats of TLS usage
This patch adds some sysctls:
kern.rpc.unenc.tx_msgcnt
kern.rpc.unenc.tx_msgbytes
kern.rpc.unenc.rx_msgcnt
kern.rpc.unenc.rx_msgbytes
kern.rpc.tls.tx_msgcnt
kern.rpc.tls.tx_msgbytes
kern.rpc.tls.rx_msgcnt
kern.rpc.tls.rx_msgbytes
kern.rpc.tls.handshake_success
kern.rpc.tls.handshake_failed
kern.rpc.tls.alerts
which allow a NFS server sysadmin to determine how much
NFS-over-TLS is being used. A large number of failed
handshakes might also indicate an NFS confirguration
problem.
This patch moves the definition of "kern.rpc" from the
kgssapi module to the krpc module. As such, both modules
need to be rebuilt from sources. Since __FreeBSD_version
was bumped yesterday, I will not bump it again.
Suggested by: gwollman
Discussed on: freebsd-current
MFC after: 1 month
Mark Johnston [Thu, 2 Nov 2023 18:34:26 +0000 (14:34 -0400)]
riscv: Remove unnecessary invalidations in pmap_enter_quick_locked()
This function always overwrites an invalid PTE, so if
pmap_try_insert_pv_entry() fails it is certainly not necessary to
invalidate anything, because the PTE has not yet been written by that
point.
It should also not be necessary to invalidate TLBs after overwriting an
invalid entry. In principle the TLB could cache negative entries, but
then the worst case scenario is a spurious fault. Since pmap_enter()
does not bother issuing an sfence.vma, pmap_enter_quick_locked() should
behave similarly.
Mark Johnston [Thu, 2 Nov 2023 18:33:55 +0000 (14:33 -0400)]
riscv: Port improvements from arm64/amd64 pmaps, part 2
- Give pmap_promote_l2() a return value indicating whether or not
promotion succeeded.
- Check pmap_ps_enabled() in pmap_promote_l2() rather than making
callers do it.
- Annotate superpages_enabled with __read_frequently.
Mark Johnston [Thu, 2 Nov 2023 18:33:37 +0000 (14:33 -0400)]
riscv: Port improvements from arm64/amd64 pmaps, part 1
- When promoting, do not require that all PTEs all have PTE_A set.
Instead, record whether they did and store this information in the
PTP's valid bits.
- Synchronize some comments in pmap_promote_l2().
- Make pmap_promote_l2() scan starting from the end of the 2MB range
instead of the beginning. See the commit log for 9d1b7fa31f510 for
justification of this, which I believe applies here as well.
Mark Johnston [Thu, 2 Nov 2023 18:30:10 +0000 (14:30 -0400)]
amd64: Remove PMAP_INLINE
With clang it expands to "inline"; clang in practice may inline
externally visible functions even without the hint. So just remove the
hints and let the compiler decide.
No functional change intended. pmap.o is identical before and after
this patch.
Olivier Certner [Fri, 13 Oct 2023 08:52:31 +0000 (10:52 +0200)]
Ensure 'struct thread' is aligned to a cache line
Using the new UMA_ALIGN_CACHE_AND_MASK() facility, which allows to
simultaneously guarantee a minimum of 32 bytes of alignment (the 5 lower
bits are always 0).
For the record, to this day, here's a (possibly non-exhaustive) list of
synchronization primitives using lower bits to store flags in pointers
to thread structures:
- lockmgr, rwlock and sx all use the 5 bits directly.
- rmlock indirectly relies on sx, so can use the 5 bits.
- mtx (non-spin) relies on the 3 lower bits.
Reviewed by: markj, kib
MFC after: 2 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42266
Olivier Certner [Fri, 13 Oct 2023 15:05:34 +0000 (17:05 +0200)]
uma: Permit specifying max of cache line and some custom alignment
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).
First candidate consumer that comes to mind is 'struct thread', see next
commit.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42265
Olivier Certner [Fri, 13 Oct 2023 15:13:28 +0000 (17:13 +0200)]
linuxkpi: dma_get_cache_alignment(): Fix off-by-one result
Substituting 'uma_align_cache' by the appropriately named accessor
uma_get_cache_align_mask() made apparent that dma_get_cache_alignment()
was off by one, since it was defined to be the mask derived from the
alignment value.
Reviewed by: markj, bz
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42264
Olivier Certner [Fri, 13 Oct 2023 14:09:51 +0000 (16:09 +0200)]
uma: New check_align_mask(): Validate alignments (INVARIANTS)
New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.
Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (that was checking only for a power of
2).
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42263
Olivier Certner [Fri, 13 Oct 2023 12:49:11 +0000 (14:49 +0200)]
uma: Make the cache alignment mask unsigned
In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignment
is internally stored as signed integers. Such big values do not make
sense anyway and indicate some programming error. A future commit will
introduce checks for this case and other ones.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42262
Olivier Certner [Fri, 13 Oct 2023 12:13:30 +0000 (14:13 +0200)]
uma: UMA_ALIGN_CACHE: Resolve the proper value at use point
Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask. So suppress it and replace its use with a call to
uma_get_align_mask(). The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42259
Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.
Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one. Rename the stem to something more explicit. Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.
Hide 'uma_align_cache' to ensure that it cannot be set in any other way
then by a call to uma_set_cache_align_mask(), which will perform sanity
checks in a further commit. While here, rename it to
'uma_cache_align_mask'.
This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42258
Olivier Certner [Tue, 10 Oct 2023 17:36:20 +0000 (19:36 +0200)]
Ensure "init" (PID 1) also executes userret() initially
Calling userret() from fork_return() misses the first return to
userspace of the "init" (PID 1) process. The latter is indeed created
by fork1() followed by a call to cpu_fork_kthread_handler() call that
replaces fork_return() by start_init() as the function to execute after
fork.
A new process' initial return to userspace in the end always happens
through returning from fork_exit(), so move userret() there instead to
fix the omission.
This problem was discovered as part of a revamp of scheduling priorities
that lead to experimenting with asserting and sometimes resetting
priorities in sched_userret(), in the course of which the author
stumbled on panics being triggered only in init() or only in other
processes, depending on the modifications to sched_userret(). This
change currently has no practical effect but will have some in the near
future.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42257
This function is to be called only when initializing a new process (so,
'proc0' and at fork), and not in any other circumstances. Setting the
process' 'p_ucred' field to the result of crcowget() on the original
credentials is the only thing it does, hiding the fact that the process'
'p_ucred' field is crushed by the call. Moreover, most of the code it
executes is already encapsulated in crcowget().
To prevent misuse and improve code readability, just remove this
function and replace it with a direct assignment to 'p_ucred'.
Reviewed by: markj (earlier version), kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42255
Zhenlei Huang [Thu, 2 Nov 2023 09:07:11 +0000 (17:07 +0800)]
Hyper-V: vmbus: Add NULL check for vmbus_res
QEMU emulates Hyper-V [1] but lacks the emulation for vmbus_res, thus no
coherence information is available. Add NULL check for it and fallback
to no coherence. This will prevent FreeBSD guests from panic on QEMU
with the Hyper-V enlightenment hv-synic enabled.
For real Hyper-V, both gen1 and gen2 have vmbus_res then they are not
affected by this change.
hmt(4): Do not require input report HID usages to be a member of TLC
Some touchpads places button usages (in HID report descriptor) in to
the 2-nd level collection rather than in to the top level one. That
confuses current code. Remove collection level check in HID report
descriptor parser to fix device detection.
Reported by: Peter Much <pmc@citylink.dinoex.sub.org>
PR: 267094
MFC after: 1 week
Zhenlei Huang [Thu, 2 Nov 2023 05:14:40 +0000 (13:14 +0800)]
cam/ata: Postpone removal of two compat sysctls until 15
Prefer UNMAPPEDIO and ROTATING from flags sysctl. See
1. aeab0812e68c (Add flags sysctl to ada)
2. cf3ff63e55e4 (Convert unmappedio over to a flag)
3. 96eb32bf0f5a (Convert rotating to a flag bit)
Reviewed by: imp, ken, #cam
MFC after: immediately (we want this in 14.0)
Differential Revision: https://reviews.freebsd.org/D42402
Warner Losh [Wed, 1 Nov 2023 22:43:37 +0000 (16:43 -0600)]
libc: Purge unneeded cdefs.h
These sys/cdefs.h are not needed. Purge them. They are mostly left-over
from the $FreeBSD$ removal. A few in libc are still required for macros
that cdefs.h defines. Keep those.
Ed Maste [Mon, 16 Oct 2023 12:46:31 +0000 (08:46 -0400)]
Remove MOVED_LIBS handling from list-old-libs
In 922337e8d398 I added MOVED_LIBS into list-old-files, so that
delete-old-files would remove the old /usr/lib/libc++.so.1 as soon as
possible (after the library moved to /lib).
I left it in list-old-libs in case a user updated their src tree between
delete-old-files and delete-old-libs. Now that some time has passed,
tremove the redundant MOVED_LIBS entry.
Warner Losh [Tue, 31 Oct 2023 20:55:58 +0000 (14:55 -0600)]
ino64: Remove 'forward compat' code for this
Forward compatibility code was added for running newer ino64 binaries on
older kernels as a transition aide. Now that ino64 has been in the tree
6 years, this code is no longer useful and should have been removed long
ago. Remove it now. Should be no user-visible changes at this point as
all the 'upgrade' scenarios it was intended for are long since past.
Also need to remove this stuff from rtld since the _foo versions
no longer exist.
siv0 [Tue, 31 Oct 2023 20:57:54 +0000 (21:57 +0100)]
Fix nfs_truncate_shares without /etc/exports.d
Calling nfs_reset_shares on Linux prints a warning:
`failed to lock /etc/exports.d/zfs.exports.lock: No such file or
directory`
when /etc/exports.d does not exist. The directory gets created, when a
filesystem is actually exported through nfs_toggle_share and
nfs_init_share. The truncation of /etc/exports.d/zfs.exports happens
unconditionally when calling `zfs mount -a` (via zfs_do_mount and
share_mount in `cmd/zfs/zfs_main.c`).
Fixing the issue only in the Linux part, since the exports file on
freebsd is in `/etc/zfs/`, which seems present on 2 FreeBSD systems I
have access to (through `/etc/zfs/compatibility.d/`), while a Debian
box does not have the directory even if `/usr/sbin/exportfs` is
present through the `nfs-kernel-server` package.
The code for exports_available is copied from nfs_available above.
Fixes: ede037cda73675f42b1452187e8dd3438fafc220
("Make zfs-share service resilient to stale exports")
Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15369
Closes #15468
Warner Losh [Tue, 31 Oct 2023 20:03:32 +0000 (14:03 -0600)]
bhyve: ps2 implement command 0xf6
Implement PS2 Keyboard command 0xf6, which is "SET DEFAULTS". This is
the same as 0xf5 (DISABLE KEYBOARD), but without disabling the keyboard
(since that resets all the defaults as a side effect). Normally, we
clear the fifo when we re-enable the keyboard. However, since this
leaves the keyboard enabled, clear the fifo as part of this command and
send an ack.
Linux's keyboard driver sends this command on reboot. Other commands
enable / reset the kebyoard, so it doesn't matter too much this isn't
implemented for booting, eg ubuntu.
Martin Matuška [Tue, 31 Oct 2023 20:49:41 +0000 (21:49 +0100)]
Fix block cloning between unencrypted and encrypted datasets
Block cloning from an encrypted dataset into an unencrypted dataset
and vice versa is not possible. The current code did allow cloning
unencrypted files into an encrypted dataset causing a panic when
these were accessed. Block cloning between encrypted and encrypted
is currently supported on the same filesystem only.
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Kay Pedersen <mail@mkwg.de> Reviewed-by: Rob N <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15464
Closes #15465
Kenneth D. Merry [Tue, 31 Oct 2023 19:20:36 +0000 (15:20 -0400)]
Add IBM TS1170 density codes and specs.
These were obtained from a drive, but they agree with the IBM
documentation.
The bpi/bpmm values are the same as TS1160, but the number of
tracks is much larger (18944 tracks vs 8704 for TS1160). The tapes
are also longer, 1337m total. (According to the MAM on a sample JF
tape. I don't have a JE tape handy to compare.) The end result
is a 50TB raw capacity (150TB compressed) for TS1170 with a JF
cartridge vs 20TB raw capacity (60TB compressed) for TS1160 with
a JE cartridge.
lib/libmt/mtlib.c:
Add the TS1170 density codes to the denstiy table in libmt.
usr.bin/mt/mt.1:
Add the TS1170 density codes and specs to the density table
in the mt(1) man page. As usual for TS drives, there is an
encrypted and non-encrypted density code (0x79 and 0x59
respectively).
Umer Saleem [Tue, 31 Oct 2023 16:51:54 +0000 (21:51 +0500)]
Add all read-only compatible zpool features to grub2 compatibility
GRUB opens the boot pool in read-only mode. All read-only
compatible features for zpool can be enabled and added to
grub2 compatibility, as GRUB does not open the boot-pool
for write.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15459
Ameer Hamza [Tue, 17 Oct 2023 19:19:58 +0000 (00:19 +0500)]
zvol: fix delayed update to block device ro entry
The change in the zvol readonly property does not update the block
device readonly entry until the first IO to the ZVOL. This patch
addresses the issue by updating the block device readonly property
from the set property IOCTL call.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409
Ameer Hamza [Tue, 24 Oct 2023 21:53:27 +0000 (02:53 +0500)]
zvol: Implement zvol threading as a Property
Currently, zvol threading can be switched through the zvol_request_sync
module parameter system-wide. By making it a zvol property, zvol
threading can be switched per zvol.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409
Ameer Hamza [Wed, 11 Oct 2023 22:31:11 +0000 (03:31 +0500)]
zvol: Cleanup set property
zvol_set_volmode() and zvol_set_snapdev() share a common code path.
Merge this shared code path into zvol_set_common().
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409
Kristof Provost [Fri, 27 Oct 2023 12:13:57 +0000 (14:13 +0200)]
libpfctl: be more tolerant of kernel extensions
Allow the kernel to supply more array elements than expected, but cut
off when we hit what we think the maximum is. This will improve forward
compatibility (i.e. old userspace with newer kernel).
Kristof Provost [Tue, 10 Oct 2023 09:56:15 +0000 (11:56 +0200)]
pf tests: ensure that we generate all permutations for SCTP multihome
The initial multihome implementation was a little simplistic, and failed
to create all of the required states. Given a client with IP 1 and 2 and
a server with IP 3 and 4 we end up creating states for 1 - 3 and 2 - 3,
as well as 3 - 1 and 4 - 1, but not for 2 - 4.
Kristof Provost [Tue, 17 Oct 2023 16:10:39 +0000 (18:10 +0200)]
pf: fix missing SCTP multihomed states
The existing code to create extra states when SCTP endpoints supplied
extra addresses missed a case. As a result we failed to generate all of
the required states.
Briefly, if host A has address 1 and 2 and host B has addres 3 and 4 we
generated 1 - 3 and 2 - 3, as well as 1 - 4, but not 2 - 4.
Store the list of endpoints supplied by each host and use those to
generate all of the connection permutations.
Alexander Motin [Mon, 30 Oct 2023 23:56:04 +0000 (19:56 -0400)]
Unify arc_prune_async() code
There is no sense to have separate implementations for FreeBSD and
Linux. Make Linux code shared as more functional and just register
FreeBSD-specific prune callback with arc_add_prune_callback() API.
Aside of code cleanup this should fix excessive pruning on FreeBSD:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274698
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15456
Brooks Davis [Mon, 30 Oct 2023 23:33:14 +0000 (23:33 +0000)]
tzsetup: make UTC the first (default) region
Many sysadmins prefer to configure their systems to UTC and it's a
reasonable default when installing, making it easier to get a usable
system by just hitting <return> repeatidly.
Renumber UTC to 0 to preserve the finger memory of those selecting a
region by shortcut.
Alexander Motin [Mon, 30 Oct 2023 21:55:32 +0000 (17:55 -0400)]
Tune zio buffer caches and their alignments
We should not always use PAGESIZE alignment for caches bigger than
it and SPA_MINBLOCKSIZE otherwise. Doing that caches for 5, 6, 7,
10 and 14KB rounded up to 8, 12 and 16KB respectively make no sense.
Instead specify as alignment the biggest power-of-2 divisor. This
way 2KB and 6KB caches are both aligned to 2KB, while 4KB and 8KB
are aligned to 4KB.
Reduce number of caches to half-power of 2 instead of quarter-power
of 2. This removes caches difficult for underlying allocators to
fit into page-granular slabs, such as: 2.5, 3.5, 5, 7, 10KB, etc.
Since these caches are mostly used for transient allocations like
ZIOs and small DBUF cache it does not worth being too aggressive.
Due to the above alignment issue some of those caches were not
working properly any way. 6KB cache now finally has a chance to
work right, placing 2 buffers into 3 pages, that makes sense.
Remove explicit alignment in Linux user-space case. I don't think
it should be needed any more with the above fixes.
As result on FreeBSD instead of such numbers of pages per slab:
Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15452
Alexander Motin [Mon, 30 Oct 2023 21:54:27 +0000 (17:54 -0400)]
RAIDZ: Use cache blocking during parity math
RAIDZ parity is calculated by adding data one column at a time. It
works OK for small blocks, but for large blocks results of previous
addition may already be evicted from CPU caches to main memory, and
in addition to extra memory write require extra read to get it back.
This patch splits large parity operations into 64KB chunks, that
should in most cases fit into CPU L2 caches from the last decade.
I haven't touched more complicated cases of data reconstruction to
not over complicate the code. Those should be relatively rare.
My tests on Xeon Gold 6242R CPU with 1MB of L2 cache per core show
up to 10/20% memory traffic reduction when writing to 4-wide RAIDZ/
RAIDZ2 blocks of ~4MB and up. Older CPUs with 256KB of L2 cache
should see the effect even on smaller blocks. Wider vdevs may need
bigger blocks to be affected.
Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15448
Alexander Motin [Mon, 30 Oct 2023 21:51:56 +0000 (17:51 -0400)]
ZIL: Cleanup sync and commit handling
ZVOL:
- Mark all ZVOL ZIL transactions as sync. Since ZVOLs have only
one object, it makes no sense to maintain async queue and on each
commit merge it into sync. Single sync queue is just cheaper, while
it changes nothing until actual commit request arrives.
- Remove zsd_sync_cnt and the zil_async_to_sync() calls since we
are no longer switching between sync and async queues.
ZFS:
- Mark write transactions as sync based only on number of sync
opens (z_sync_cnt). We can not randomly jump between sync and
async unless we want data corruptions due to writes reordering.
- When file first opened with O_SYNC (z_sync_cnt incremented to 1)
call zil_async_to_sync() for it to preserve correct ordering between
past and future writes.
- Drop zfs_fsyncer_key logic. Looks like it was an optimization
for workloads heavily intermixing async writes with tons of fsyncs.
But first it was broken 8 years ago due to Linux tsd implementation
not allowing data storage between syscalls, and second, I doubt it
is safe to switch from async to sync so often and without calling
zil_async_to_sync().
- Rename sync argument of *_log_write() into commit, now only
signalling caller's intent to call zil_commit() soon after. It
allows WR_COPIED optimizations without extra other meanings.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15366
Andrew Turner [Mon, 16 Oct 2023 14:34:19 +0000 (15:34 +0100)]
arm64: Add a BTI landing pad to .mcount
The .mcount function needs a BTI branch target. As we can't rely on
asm.h being included use the hint version of a "bti c" instruction.
This is a nop when BTI is not supported or not used.
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42230
Andrew Turner [Thu, 12 Oct 2023 16:29:46 +0000 (17:29 +0100)]
rtld: Teach rtld about the BTI elf note
Add the Branch Target Identification (BTI) note to libc assembly
sources. As all obect files need the note for rtld to have it we need
to insert it in all asm files.
Andrew Turner [Thu, 12 Oct 2023 10:03:37 +0000 (11:03 +0100)]
csu: Teach csu about PAC and BTI
Add the Branch Target Identification (BTI) note to libc assembly
sources and Pointer Authentication Code (PAC) instructions to _init and
_fini.
_init and _fini may be called indirectly so need a BTI landing pad. As
they are non-leaf functions use the appropriate PAC instruction that
also guards against changing the link register.
As all object files need the note for any binary using these object files
we need to insert it in all asm files.
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42227
The new STATIC_TLS_EXTRA variable provides a means for applications
to increases the size of the extra static TLS space allocated by
rtld beyond the default of '128'. This extra static TLS space is used
for objects loaded with dlopen.
The value specified in the variable must be no less than the default
value and no greater than the maximum allowed value for size_t type.
If an invalid value is specified, rtld will ignore it and just use
the default value.
The rtld(1) man page is updated to document this new option.
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D42025
Yishai Hadas [Sat, 28 Oct 2023 20:55:47 +0000 (16:55 -0400)]
mlx5ib: Fix RSS Toeplitz setup to be aligned with the HW specification
The specification for the Toeplitz function doesn't require to set the key
explicitly to be symmetric. In case a symmetric functionality is required
a symmetric key can be simply used.
Wrongly forcing the algorithm to symmetric causes the wrong packet
distribution and a performance degradation.
Warner Losh [Fri, 27 Oct 2023 21:23:47 +0000 (15:23 -0600)]
devd: Improve devmatch support
We know that calling devmatch will be futile if there's no plug and play
information for it to match on. Avoid this generically when we see
"? at +on"
which happens only when the location and pnpinfo aren't provided. Don't
call "service devmatch quietstart" here.
We also ignore ACPI devices with a _HID of none. These also will never
load a new driver, so avoid calling "service devmatch quietstart" here too.
Use the more compatct "$*" instead of "'?'$_" when calling "service
devmatch quietstart" since it will evaluate to the same thing.
On my laptop, this eliminates 45% of the calls to devmatch. While it
would be even better to integrate devmatch into devd (so we only parse
linker.hints once), that will have to wait for another day as it's a bit
more complex to arrange that avoiding easy to avoid calls.
Warner Losh [Fri, 27 Oct 2023 20:03:51 +0000 (14:03 -0600)]
otus: splnet isn't a thing, remove place holders
Even though it's if 0'd code, remove splnet/splx. This code can't
possibly run on FreeBSD, but having it here as a marker isn't especially
helpful. It causes a false positive on grepping otherwise.
Warner Losh [Fri, 27 Oct 2023 16:11:29 +0000 (10:11 -0600)]
strlcpy/strlcat: Remove references to snprintf
While strlcpy and snprintf are somewhat similar, there's big differences
between strlcat and snprintf which leads to confusion. Remove the
comparison, since it's ultimately not that useful: the snprintf man page
has similar language to strlcpy, so it doesn't provide a better
reference. The two implementations are otherwise unrelated.
Warner Losh [Fri, 27 Oct 2023 14:39:31 +0000 (08:39 -0600)]
netinet: The tailq_hash code doesn't reference tcpoutflags
Don't define TCPOUTFLAGS to get the static definition from tcp_fsm.h.
tailq_hash.c doesn't refernce tcpoutflag. Only files that reference this
should define TCPOUTFLAGS. clang is fine with it, but gcc12 complained.