Daniel Berlin [Tue, 10 Oct 2023 18:04:32 +0000 (14:04 -0400)]
Ensure we call fput when cloning fails due to different devices.
Right now, zpl_ioctl_ficlone and zpl_ioctl_ficlonerange do not call
put on the src fd if the source and destination are on two different
devices. This leaves the source file held open in this case.
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Closes #15386
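The leak described above is the usual "early return skips the release" pattern. A minimal Python sketch of the idea follows; FakeFile, fget, fput and clone here are hypothetical stand-ins that only model a reference count, not the kernel functions touched by the commit.

```python
import errno


class FakeFile:
    """Hypothetical stand-in for a file object with a reference count."""

    def __init__(self, device):
        self.device = device
        self.refs = 0


def fget(f):
    # Take a reference on the file.
    f.refs += 1
    return f


def fput(f):
    # Drop a reference on the file.
    f.refs -= 1


def clone(src, dst):
    """Return 0 or a negative errno; the src reference is dropped on every path."""
    held = fget(src)
    try:
        if held.device != dst.device:
            # Early error return: the finally clause still calls fput().
            return -errno.EXDEV
        return 0  # the actual clone work would happen here
    finally:
        fput(held)


src, dst = FakeFile("dev-a"), FakeFile("dev-b")
result = clone(src, dst)
```

After the failed cross-device call, src.refs is back to zero, which is the property the fix restores.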
Mateusz Guzik [Tue, 10 Oct 2023 16:19:53 +0000 (16:19 +0000)]
vfs: consult freevnodes in vnlru_kick_cond
If the count is high enough there is no point trying to produce more.
Not going there reduces traffic on the vnode_list mtx.
This further shaves total real time in a test mentioned in: 74be676d87745eb7 ("vfs: drop one vnode list lock trip during vnlru free
recycle") -- 20 instances of find each creating 1 million vnodes, while
total limit is set to 400k.
Tony Hutter [Tue, 10 Oct 2023 15:57:48 +0000 (08:57 -0700)]
zvol: Temporarily disable blk-mq
There was a report of zvol data loss (#15351) after enabling blk-mq on a
zvol backed with 16k physical block sized disks. Out of an abundance of
caution, do not allow the user to enable blk-mq until we can look into
the issue.
Note that blk-mq was not enabled by default on zvols. It was always
opt-in via the zvol_use_blk_mq module parameter.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Addresses: #15351
Closes #15378
Rob Norris [Sat, 5 Aug 2023 16:11:19 +0000 (02:11 +1000)]
AUTHORS: update with missing names
This is generated by scripts/update_authors.pl. I've looked over the
results fairly closely and while I don't think they're bad, they could
be improved somewhat, but also, I don't know if it's good form to just
update this without explicit consent from those named.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15374
Rob Norris [Sat, 5 Aug 2023 15:58:45 +0000 (01:58 +1000)]
mailmap: initial, trying to tidy up a lot of the commit history
This comes from the observation that a huge number of commit author
fields look quite strange (to my eyes), but quite often the Signed-off-by: trailer has the correct name. For these I have updated
the name where it was obvious how to do so, however, I have not created
a mapping for the commit email to the Signed-off-by email, as whatever I
choose for email will become the prime candidate for inclusion in the
AUTHORS file, and care needs to be taken when acting without explicit
consent.
There's a small handful of commits that look like they were done on
local machines, or CI hosts, or similar, where the git authorship config
wasn't set up properly. It's obvious what this should look like, so I've
just done them.
The remainder is mapping Github noreply emails to either an
obviously-correct Signed-off-by trailer, or to an author from another
commit. This was mostly done by hand, so there may be errors, but I
think it's close. I do not understand where these come from - I know that
they're what commits made via Github web look like when there's no real
address set on the account, but I find it hard to believe that so many
of these came through the web, especially given the complexity of most
of the changes. I suspect there's some kind of merge helper tool in play
here. Regardless, the history is set now, and this tries to get it back
on track.
Obviously, all of this helps the history look tidy, but this also feeds
into the AUTHORS update script. See next commit.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15374
Andrew Turner [Mon, 2 Oct 2023 15:55:31 +0000 (16:55 +0100)]
arm64: Enable kernel branch protection
Add the build flags to enable branch protection on arm64. This enables
the use of PAC and BTI in the kernel.
For PAC we already install the kernel keys when entering the kernel
from userspace so this will start using these to sign the stack.
For BTI we need to mark the kernel page tables with a new guarded page
field. As this will require all code that could be reached through a
function pointer to start with an appropriate branch target instruction,
we are enabling this before setting the field.
As the pointer authentication support shouldn't be reached via a
function pointer it is safe to not enable the use of BTI there.
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42079
Andrew Turner [Tue, 3 Oct 2023 08:52:02 +0000 (09:52 +0100)]
arm64: Add BTI landing pads to assembly functions
When we enable BTI, both the first instruction in a function that could
be called indirectly and the target of a branch within a function need
a valid landing pad instruction.
There are three options for these instructions:
1. A breakpoint instruction
2. A pointer authentication PACIASP/PACIBSP
3. A BTI instruction
Option 1 will raise a breakpoint exception so isn't usable in either
case. Option 2 could be used in some function entry cases, but needs
to be paired with an authentication instruction, and as it is normally
only used in non-leaf functions we can't use it in this case. This
leaves option 3.
There are four variants of the instruction, the C variant is used on
function entry and the J variant is for jumping within a function.
There is also a JC that works with both and one with no target that
works with neither.
Reviewed by: markj
Sponsored by: Arm Ltd
Sponsored by: The FreeBSD Foundation (earlier version)
Differential Revision: https://reviews.freebsd.org/D42078
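The four variants described above can be summarised as a small lookup table. This is only an illustrative Python sketch; the "call" and "jump" labels and the helper name are made up here for the example and are not taken from the commit.

```python
# Which kind of branch each BTI variant accepts as a landing pad,
# following the description in the commit message above.
BTI_VARIANTS = {
    "bti c": {"call"},           # function entry
    "bti j": {"jump"},           # jump within a function
    "bti jc": {"call", "jump"},  # works with both
    "bti": set(),                # no target: works with neither
}


def valid_landing_pads(branch_kind):
    """Return the variants usable as a landing pad for the given branch kind."""
    return sorted(v for v, kinds in BTI_VARIANTS.items() if branch_kind in kinds)
```

For example, valid_landing_pads("call") lists the C and JC variants, which matches the choice of the C variant for function entry points.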
Kristof Provost [Wed, 4 Oct 2023 10:27:54 +0000 (12:27 +0200)]
pf: add a way to list creator ids
Allow userspace to retrieve a list of distinct creator ids for the
current states.
This is used by pfSense, and used to require dumping all states to
userspace. It's rather inefficient to export a (potentially extremely
large) state table to obtain a handful (typically 2) of 32-bit integers.
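To illustrate why exporting the whole table was wasteful, here is a hedged Python sketch of what a consumer previously had to do in userspace: walk every state just to collect a few distinct 32-bit ids. The dictionary layout and the "creatorid" key are assumptions made for the example, not the real state structure.

```python
def distinct_creator_ids(states):
    """Collect the distinct 32-bit creator ids from an iterable of states."""
    # The "creatorid" key is a hypothetical field name used for this sketch.
    return sorted({state["creatorid"] & 0xFFFFFFFF for state in states})


# A large table typically collapses to a handful of ids.
table = [{"creatorid": 0xDEADBEEF if i % 2 else 0x0BADF00D} for i in range(1000)]
ids = distinct_creator_ids(table)
```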
Kristof Provost [Mon, 2 Oct 2023 13:48:18 +0000 (15:48 +0200)]
libpfctl: introduce state iterator
Allow consumers to start processing states as the kernel supplies them,
rather than having to build a full list and only then start processing.
Especially for very large state tables this can significantly reduce
memory use.
Without this change when retrieving 1M states time -l reports:
real 3.55
user 1.95
sys 1.05
318832 maximum resident set size
194 average shared memory size
15 average unshared data size
127 average unshared stack size
79041 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
15096 messages sent
250001 messages received
0 signals received
22 voluntary context switches
34 involuntary context switches
With it it reported:
real 3.32
user 1.88
sys 0.86
3220 maximum resident set size
195 average shared memory size
11 average unshared data size
128 average unshared stack size
260 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
15096 messages sent
250001 messages received
0 signals received
21 voluntary context switches
31 involuntary context switches
The primary motivation is to improve how we deal with very large state
tables. With the previous implementation we had to build the entire
list (both in the kernel and in userspace) before we could start
processing. With netlink we start to get data in userspace while the
kernel is still generating more. This reduces peak memory consumption
(which can get to the GB range once we hit millions of states).
Netlink also makes future extension easier, in that we can easily add
fields to the state export without breaking userspace. In that regard
it's similar to an nvlist-based approach, except that it also deals
with transport to userspace and that it performs significantly better
than nvlists. Testing has failed to measure a performance difference
between the previous struct-copy based ioctl and the netlink approach.
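The difference between building the full list and processing states as they arrive can be sketched with a Python generator. This is an analogy only: kernel_states is a hypothetical stand-in for the kernel supplying states, and none of these names come from libpfctl.

```python
def kernel_states(count):
    """Hypothetical stand-in for the kernel emitting states one at a time."""
    for i in range(count):
        yield {"id": i}


def process_after_full_list(count, handle):
    # Previous approach: build the entire list first, so peak memory
    # grows with the number of states.
    states = list(kernel_states(count))
    for state in states:
        handle(state)
    return len(states)


def process_as_supplied(count, handle):
    # Iterator approach: handle each state as it arrives, so only one
    # state needs to be held at a time.
    seen = 0
    for state in kernel_states(count):
        handle(state)
        seen += 1
    return seen


ids = []
total = process_as_supplied(5, lambda state: ids.append(state["id"]))
```

Both functions visit the same states in the same order; only the peak memory held by the consumer differs, which is consistent with the drop in maximum resident set size reported above.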
Rework the packages TUI, so that the index caching is now done with
dialog --gauge (tested with cdialog and bsddialog).
With pkg we can know in advance the number of packages, making it
possible to have a real gauge.
The cache of the index is now a file that can be sourced, meaning it
is no longer an index-like file, but a post-processed one, simplifying
the code.
Each menu is now built by calling pkg rquery directly with just the
information required to build the menu instead of parsing an index file.
Install all the awk index processing into a separate file to ease
reading and debugging.
Ed Maste [Fri, 6 Oct 2023 18:00:30 +0000 (14:00 -0400)]
sysctl: emit a newline after NULL node descriptions
Previously when printing the sysctl description (via the -d flag) we
omitted the newline if the node provided no description (i.e., NULL).
This could be observed via e.g. `sysctl -d dev`.
PR: 44034
Reviewed by: zlei
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42112
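The behaviour being fixed can be sketched as a formatting function that always terminates the line, whether or not a description exists. This Python sketch is an illustration only; the function name and the exact "name: description" layout are assumptions, not the sysctl(8) source.

```python
def format_description(name, description):
    """Format one description line; None models a NULL node description."""
    text = "" if description is None else description
    # The newline is emitted unconditionally, including for None.
    return f"{name}: {text}\n"


lines = format_description("dev", None) + format_description("kern", "High kernel")
```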
Olivier Certner [Tue, 30 May 2023 16:35:08 +0000 (18:35 +0200)]
setusercontext(): Apply personal settings only on matching effective UID
Commit 35305a8dc114 (r211393) added a check on whether 'uid' was equal
to getuid() before calling setlogincontext(). Doing so still allows
a setuid program to apply resource limits and priorities specified in
a user-controlled configuration file ('~/.login_conf') where
a non-setuid program could not. Plug the hole by checking instead that
the process' effective UID is the target one (which is likely what was
meant in the initial commit).
PR: 271750
Reviewed by: kib, des
MFC after: 2 weeks
Sponsored by: Kumacom SAS
Differential Revision: https://reviews.freebsd.org/D40351
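The difference between the two checks can be seen with a setuid-root program started by an ordinary user. The Python sketch below takes the ids as plain arguments so it can be run anywhere; the function names are hypothetical and this is not the libutil code.

```python
def old_check(target_uid, real_uid, effective_uid):
    # Check described for commit 35305a8dc114: compare against the real UID.
    return target_uid == real_uid


def new_check(target_uid, real_uid, effective_uid):
    # Check described above: compare against the effective UID.
    return target_uid == effective_uid


# A setuid-root program run by user 1001, asked to set up a context for 1001:
# real UID is 1001, effective UID is 0.
applies_old = old_check(1001, real_uid=1001, effective_uid=0)
applies_new = new_check(1001, real_uid=1001, effective_uid=0)
```

With the old check the user-controlled settings would be applied while the process still runs with root's effective UID; with the new check they are not.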
Ed Maste [Mon, 7 Aug 2023 20:59:52 +0000 (16:59 -0400)]
x86: make EARLY_AP_STARTUP mandatory
When early AP startup was introduced in 2016 it was put behind a kernel
option EARLY_AP_STARTUP as a transition aid, so that it could be turned
off if necessary. For x86 the non-EARLY_AP_STARTUP case is no longer
functional, so disallow it.
Other archs are still incompatible with EARLY_AP_STARTUP, so the option
cannot yet be removed entirely.
Reported by: wollman
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41351
We should probably consider using/honouring the standard --with-bashcompletiondir
autoconf option as well, but that's something to do later.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15372
Bjoern A. Zeeb [Mon, 9 Oct 2023 19:47:39 +0000 (19:47 +0000)]
iwlwifi: re-enable "Invalid TXQ id" logging
Various reports recently hit the "Invalid TXQ id" in iwlwifi again.
Unconditionally enable logging and add a note to report to a specific
PR in the log message for now.
Along with 018d93ece16b this will hopefully help us to understand what
is going on.
Multiple reports have shown missed state transitions in net80211 without
an obvious major cause (or with a txq warning in iwlwifi).
In order to better track down potential problems add unconditional
ic_printf calls to any case in the lkpi state machine compat code which
would let us return with an error in the hope that it helps us to catch
the actual problems.
Also remove the debug conditions from ieee80211_{beacon,connection}_loss,
which can also cause state transitions, so that the ic_printf happens all
the time there too.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Jose Luis Duran [Fri, 6 Oct 2023 12:13:26 +0000 (12:13 +0000)]
Cirrus CI: Trigger on pull requests or downstream repos
Since Cirrus Labs is limiting their free usage tier [1], limit CI runs
to pull requests only. Otherwise, we might deplete our monthly quota
within a few days.
Adapt the task amd64-llvm16 to execute on downstream repos or on pull
requests only.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:22 +0000 (18:30 +0800)]
proc: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.kstack_pages' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
No functional change intended.
Note that on arm64 the thread0 stack size cannot be controlled with it;
kib@ suggested that arm64 maintainers can fix it eventually, so let's
enable it also on arm64 right now.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:22 +0000 (18:30 +0800)]
buf: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'vfs.unmapped_buf_allowed' is actually a loader
tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will
report it correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:22 +0000 (18:30 +0800)]
sockets: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.ipc.maxsockets' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:21 +0000 (18:30 +0800)]
ddb: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'debug.ddb.capture.bufsize' is actually a loader
tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will
report it correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:21 +0000 (18:30 +0800)]
cam/scsi: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.cam.scsi_delay' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
The loader tunables 'net.inet.sctp.tcbhashsize' and 'net.inet.sctp.chunkscale'
are only used during vnet initialization, thus it makes no sense to make them
writable tunables.
Validate the values of the loader tunables on vnet initialization, and reset
them to their defaults if invalid to prevent potential kernel panics.
Bojan Novković [Mon, 9 Oct 2023 00:38:08 +0000 (20:38 -0400)]
arm64: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping
Let pmap_enter_l2() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is a step towards reverting
commit 64087fd7f372.
Bojan Novković [Mon, 9 Oct 2023 00:32:35 +0000 (20:32 -0400)]
i386: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping
Let pmap_enter_pde() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is a step towards reverting
commit 64087fd7f372.
mount_nfs(8): Indicate that the -t option is deprecated
In mount_nfs.c the -t option is deprecated and the code advises using
timeout=<N> instead. However, since that refers to NFS over UDP, which
is not used nowadays, mark this option as deprecated in the man page.
Notable upstream pull request merges:
#15290 54b1b1d89 import: require force when cachefile hostid doesn't
match on-disk
#15319 342357cd9 Reduce number of metaslab preload taskq threads
#15340 2a6c62109 ARC: Remove b_cv from struct l1arc_buf_hdr
#15347 75a2eb7fa ARC: Drop different size headers for crypto
#15350 96b9cf42e ARC: Remove b_bufcnt/b_ebufcnt from ARC headers
#15353 66b81b349 ZIL: Reduce maximum size of WR_COPIED to 7.5K
#15362 5b8688e62 zfsconcepts: add description of block cloning
The use of bitcount() triggered a build error because it couldn't be
located. __bitcount() on the other hand is defined in sys/types.h, which
is included in teken/teken.h.
Bojan Novković [Sat, 7 Oct 2023 18:00:11 +0000 (21:00 +0300)]
tty: fix improper backspace behaviour for UTF8 characters when in canonical mode
This patch adds additional logic in ttydisc_rubchar() to properly handle
backspace behaviour for UTF-8 characters.
Currently, typing in a backspace after a UTF8 character will delete only
one byte from the byte sequence, leaving garbled output in the tty's
output queue. With this change all of the character's bytes are deleted.
This change is only active when the IUTF8 flag is set (see 19054eb6053189144aa962b2ecc1bf5087758a3e "(s)tty: add support for IUTF8
input flag")
The code uses the teken_wcwidth() function to properly handle character
column widths for different code points, and adds the
teken_utf8_bytes_to_codepoint() function that converts a UTF-8 byte
sequence to a codepoint, as specified in RFC3629.
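The two pieces described above, stepping back over a whole UTF-8 sequence and turning it into a code point, can be sketched in Python as follows. This is a simplified illustration of the byte layout from RFC 3629 and does not reject overlong encodings or surrogates; the function names are made up for the example and it is not the ttydisc_rubchar() or teken code.

```python
def utf8_bytes_to_codepoint(seq):
    """Decode one UTF-8 sequence to a code point, or return None if malformed."""
    if not seq:
        return None
    first = seq[0]
    if first < 0x80:
        cp, need = first, 0
    elif first >> 5 == 0b110:
        cp, need = first & 0x1F, 1
    elif first >> 4 == 0b1110:
        cp, need = first & 0x0F, 2
    elif first >> 3 == 0b11110:
        cp, need = first & 0x07, 3
    else:
        return None
    if len(seq) != need + 1:
        return None
    for byte in seq[1:]:
        if byte >> 6 != 0b10:  # every trailing byte must be 10xxxxxx
            return None
        cp = (cp << 6) | (byte & 0x3F)
    return cp


def rub_last_char(buf):
    """Remove the last character from a bytearray and return its bytes."""
    i = len(buf) - 1
    # Walk back over at most three continuation bytes to the lead byte.
    while i > 0 and (buf[i] >> 6) == 0b10 and len(buf) - i < 4:
        i -= 1
    removed = bytes(buf[i:])
    del buf[i:]
    return removed


line = bytearray("a\u00e9\u20ac".encode("utf-8"))
erased = rub_last_char(line)
```

Erasing from the example line removes all three bytes of the euro sign at once, rather than a single trailing byte.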
Bojan Novković [Sat, 7 Oct 2023 17:59:57 +0000 (20:59 +0300)]
(s)tty: add support for IUTF8 input flag
This patch adds the necessary kernel and stty code to support setting
the IUTF8 flag for ttys. It is the first of two patches that fix
backspace behaviour for UTF-8 encoded characters when in canonical mode.
Alan Somers [Wed, 4 Oct 2023 18:48:01 +0000 (12:48 -0600)]
fusefs: sanitize FUSE_READLINK results for embedded NULs
If VOP_READLINK returns a path that contains a NUL, it will trigger an
assertion in vfs_lookup. Sanitize such paths in fusefs, rejecting any
and warning the user about the misbehaving server.
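The check itself is simple to sketch: a link target containing a NUL byte is rejected before it can reach path lookup. The Python below is an illustration only; the function name and the choice of EIO as the error are assumptions, not taken from the fusefs change.

```python
import errno


def sanitize_readlink(target):
    """Return the link target unchanged, or raise if it embeds a NUL byte."""
    if b"\0" in target:
        # The errno used here is an assumption for the sketch.
        raise OSError(errno.EIO, "readlink result contains an embedded NUL")
    return target
```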
Michael Tuexen [Sat, 7 Oct 2023 13:56:00 +0000 (15:56 +0200)]
udp: fix sending of IPv4-mapped addresses
The inp_vflags field must be adjusted during the call of
in_pcbbind_setup(). This is consistent with the other places in the
code, but not elegant at all.
Alan Somers [Fri, 6 Oct 2023 21:05:41 +0000 (15:05 -0600)]
Fix intermittency in the sys.fs.fusefs.symlink.main test
This change is identical to 86885b18689 but for symlink instead of
mknod. The kernel sends a FUSE_FORGET asynchronously with the final
syscall. The lack of an expectation caused this test to occasionally
fail.
Also, remove a sleep that accidentally snuck into a different test.
Alan Somers [Fri, 6 Oct 2023 19:46:42 +0000 (13:46 -0600)]
Fix intermittency in the sys.fs.fusefs.mknod.main test
In the Mknod.parent_inode test case, the kernel sends an extra
FUSE_FORGET message. But because it gets sent asynchronously with the
failing syscall, it doesn't always get received before the test ends.
So we never set up an expectation for it. And 90+% of the time the test
would exit successfully.
Fix the intermittency by always waiting to receive the FUSE_FORGET
message.
Alexander Motin [Fri, 6 Oct 2023 17:09:27 +0000 (13:09 -0400)]
ZIL: Reduce maximum size of WR_COPIED to 7.5K
Benchmarks show that at certain write sizes range lock/unlock takes
less time than an extra memory copy. The exact threshold is not
obvious due to other overheads, but it is definitely lower than the
~63KB used before. Make it configurable, defaulting to 7.5KB,
that is the nearest malloc() size of 8KB minus the itx and lr structs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15353
Emmanuel Vadot [Thu, 5 Oct 2023 17:10:00 +0000 (19:10 +0200)]
dwc: Rewrite part of the descriptors setup functions
- Add a txdesc_clear which clears the tx desc instead of doing that in
dwc_setup_txdesc based on arguments.
- Remove dwc_set_owner, in the end we always set the owner of the desc
as we do it for id > 0 and then for the first one.
- Remove dwc_ prefix
siv0 [Fri, 6 Oct 2023 16:53:23 +0000 (18:53 +0200)]
rpm: Fix `make rpm` on Debian/Ubuntu
The recent patch to change the bash completion install location based
on the Distribution, ignored that it should still be possible to
create RPMs on Debian derived systems. Additionally `make deb` itself
creates RPMs and converts them via `alien`.
This patch adds the bashcompletiondir variable to the rpm defines and
uses this for the location, where to get the bash completion file.
It still changes the location on Debian/Ubuntu systems in the final
packages from /etc/bash_completion.d to
/usr/share/bash-completion/completions
The daemon utility already does its own buffering and retransmits its
child's output line by line. There's no need for stdio to add its own
buffering on top of this.
Rob Norris [Sat, 16 Sep 2023 07:02:02 +0000 (17:02 +1000)]
import: require force when cachefile hostid doesn't match on-disk
Previously, if a cachefile is passed to zpool import, the cached config
is mostly offered as-is to ZFS_IOC_POOL_TRYIMPORT->spa_tryimport(), and
the results are taken as the canonical pool config and handed back to
ZFS_IOC_POOL_IMPORT.
In the course of its operation, spa_load() will inspect the pool and
build a new config from what it finds on disk. However, it then
regenerates a new config ready to import, and so rightly sets the hostid
and hostname for the local host in the config it returns.
Because of this, the "require force" checks always decide the pool is
exported and last touched by the local host, even if this is not true,
which is possible in a HA environment when MMP is not enabled. The pool
may be imported on another head, but the import checks still pass here,
so the pool ends up imported on both.
(This doesn't happen when a cachefile isn't used, because the pool
config is discovered in userspace in zpool_find_import(), and that does
find the on-disk hostid and hostname correctly).
Since the systemd zfs-import-cache.service unit uses cachefile imports,
this can lead to a system returning after a crash with a "valid"
cachefile on disk and automatically, quietly, importing a pool that has
already been taken up by a secondary head.
This commit causes the on-disk hostid and hostname to be included in the
ZPOOL_CONFIG_LOAD_INFO item in the returned config, and then changes the
"force" checks for zpool import to use them if present.
This method should give no change in behaviour for old userspace on new
kernels (they won't know to look for the new config items) and for new
userspace on old kernels (they won't find the new config items).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15290
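The compatibility argument above (use the on-disk values if present, otherwise behave as before) can be sketched in Python with plain dictionaries. The "load_info" and "hostid" keys and both function names are hypothetical simplifications for the example; the real config is an nvlist and the real checks consider more than the hostid.

```python
def effective_hostid(config):
    """Prefer the on-disk hostid from the load info, if the kernel supplied it."""
    load_info = config.get("load_info", {})
    return load_info.get("hostid", config.get("hostid"))


def last_touched_by_other_host(config, local_hostid):
    hostid = effective_hostid(config)
    return hostid is not None and hostid != local_hostid


LOCAL = 0x1111
# New kernel: top-level hostid was regenerated for the local host, but the
# load info carries what was actually found on disk.
new_kernel = {"hostid": LOCAL, "load_info": {"hostid": 0x2222}}
# Old kernel: no load info item, so the decision is unchanged.
old_kernel = {"hostid": LOCAL}
```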
Rob Norris [Mon, 18 Sep 2023 01:07:32 +0000 (11:07 +1000)]
tests: add tests for zpool import behaviour when hostid changes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15290
Kristof Provost [Fri, 6 Oct 2023 12:20:17 +0000 (14:20 +0200)]
pfctl: fix incorrect mask on dynamic address
A PF rule using an IPv4 address followed by an IPv6 address and then a
dynamic address, e.g. "pass from {192.0.2.1 2001:db8::1} to (pppoe0)",
will have an incorrect /32 mask applied to the dynamic address.
MFC after: 3 weeks
Obtained from: OpenBSD
See also: https://ftp.openbsd.org/pub/OpenBSD/patches/5.6/common/007_pfctl.patch.sig
Sponsored by: Rubicon Communications, LLC ("Netgate")
Event: Oslo Hackathon at Modirum
Rob N [Fri, 6 Oct 2023 16:06:29 +0000 (03:06 +1100)]
zfsconcepts: add description of block cloning
Here I'm trying to succinctly introduce the concept, the basics of its
construction, how it's different to dedup, how to use it, and where its
limitations lie, in four paragraphs and with enough searchable terms to
help the reader find more information both within OpenZFS and elsewhere.
Phew.
Sponsored-By: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15362
Alexander Motin [Fri, 6 Oct 2023 16:04:00 +0000 (12:04 -0400)]
Reduce number of metaslab preload taskq threads.
Before this change ZFS created threads for 50% of CPUs for each top-
level vdev. Plus it created the same number of threads for embedded
log groups (that have only one metaslab and don't need any preload).
As a result, on a system with 80 CPUs and a pool of 60 vdevs this resulted
in 4800 metaslab preload threads, which is absolutely insane.
This patch changes the preload threads to 50% of CPUs in one taskq
per pool, so on the mentioned system it will be only 40 threads.
Among other things this fixes zdb on the mentioned system and pool
on FreeBSD, that failed to create so many threads in one process.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15319
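The thread counts quoted above follow from simple arithmetic, sketched here in Python. The function names are made up for the example, and integer halving is an assumption about how "50% of CPUs" is rounded.

```python
def preload_threads_before(ncpus, top_level_vdevs):
    # One taskq for the normal metaslab group and one for the embedded log
    # group per top-level vdev, each sized at 50% of the CPUs.
    per_taskq = ncpus // 2
    return per_taskq * top_level_vdevs * 2


def preload_threads_after(ncpus):
    # A single taskq per pool, sized at 50% of the CPUs.
    return ncpus // 2


before = preload_threads_before(80, 60)
after = preload_threads_after(80)
```

For the system in the commit message this gives 4800 threads before the change and 40 after.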
Alexander Motin [Tue, 3 Oct 2023 15:57:48 +0000 (11:57 -0400)]
ARC: Drop different size headers for crypto
To reduce memory usage, ZFS crypto allocated ARC headers bigger by
56 bytes only when a specific block was encrypted on disk. It was a
nice optimization, except in some cases the code reallocated them
on the fly, which invalidated header pointers from the buffers. Since
the buffers use different locking, it created a number of races that
were originally covered (at least partially) by b_evict_lock, which was
also used to protect evictions. But it has gone as part of #14340.
As a result, as was found in #15293, arc_hdr_realloc_crypt() ended
up unprotected and causing use-after-free.
Instead of introducing some even more elaborate locking, this patch
just drops the difference between normal and protected headers. It
costs us an additional 56 bytes per header, but with a couple of patches
saving 24 bytes, the net growth is only 32 bytes with a total header
size of 232 bytes on FreeBSD, which IMHO is an acceptable price for
simplicity. Additional locking would also end up consuming space,
time or both.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15293
Closes #15347
Alexander Motin [Fri, 6 Oct 2023 15:56:17 +0000 (11:56 -0400)]
ARC: Remove b_bufcnt/b_ebufcnt from ARC headers
In most cases we do not care about the exact number of buffers linked
to the header; we just need to know if it is zero, non-zero or one.
That can easily be checked just by looking at the b_buf pointer or in
some cases dereferencing it.
b_ebufcnt is read only once, and in that case we already traverse
the list as part of arc_buf_remove(), so a second traversal should not
be expensive.
This reduces L1 ARC header size by 8 bytes and full crypto header by
16 bytes, down to 176 and 232 bytes on FreeBSD respectively.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15350
Rob N [Fri, 6 Oct 2023 15:39:20 +0000 (02:39 +1100)]
tests/block_cloning: sync before write in fallback test
We're still seeing this test fail intermittently (that is, the clone
happens), which must mean the write and the clone can still be happening
on different txgs.
It might be that there's still activity after the pool is created. So
here we force a sync before starting the write.
Sponsored-By: Klara Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15359