When link_active_on_if_down flag is disabled and link is brought down
with ifconfig, FW reports a false positive link event about an
unqualified transceiver. The condition used in the driver to filter out
those false positive events was incorrect and also suppressed the
unqualified module event when that event was valid.
Change the condition to rely on the IFF_UP flag instead of
link_active_on_if_down and bump the driver version to 2.3.1-k.
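The corrected filter can be sketched in C, assuming the interface flags are the only input to the decision. `sketch_report_unqualified()` and `SKETCH_IFF_UP` are hypothetical names, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IFF_UP from FreeBSD <net/if.h>, so the sketch is self-contained. */
#define SKETCH_IFF_UP 0x1

/*
 * Hypothetical helper: report an "unqualified module" event only while the
 * interface is administratively up. The false positive appears after the
 * link is brought down with ifconfig, so IFF_UP is the signal to filter on,
 * rather than the link_active_on_if_down setting.
 */
static bool
sketch_report_unqualified(int if_flags)
{
	return ((if_flags & SKETCH_IFF_UP) != 0);
}
```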
ixl(4): Add tunable to override Flow Control settings
Add flow_control to the hw.ixl tunables tree to allow overriding the
initial flow control configuration for all interfaces.
Keep using configuration set by NVM by default.
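The precedence rule (tunable if set, NVM otherwise) can be sketched as follows. The "unset" sentinel of -1 and the accepted range 0 to 3 are assumptions for illustration, and `sketch_initial_fc_mode()` is a hypothetical name, not the driver's code.

```c
#include <assert.h>

/* Hypothetical sentinel meaning "hw.ixl.flow_control was not set". */
#define FC_TUNABLE_UNSET (-1)

/*
 * Choose the initial flow control mode for one interface.
 * nvm_mode: the mode configured by the NVM.
 * tunable:  the hw.ixl.flow_control value (assumed valid range 0..3).
 */
static int
sketch_initial_fc_mode(int tunable, int nvm_mode)
{
	if (tunable < 0 || tunable > 3)
		return (nvm_mode);	/* unset or out of range: keep NVM setting */
	return (tunable);		/* override applies to every interface */
}
```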
Ed Maste [Sun, 12 Sep 2021 23:04:31 +0000 (19:04 -0400)]
libprocstat: extend zfs_defs hack for .pieo
By default _pie.a archives are built only for INTERNALLIBs, so there is
usually no need for zfs_defs.pieo to exist. However, some experimental
work builds _pie.a archives for everything. Extend the existing set of
zfs_defs hacks to build zfs_defs.pieo as well.
Reviewed by: arichardson
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31924
Ed Maste [Sun, 12 Sep 2021 16:45:50 +0000 (12:45 -0400)]
bsd.lib.mk: add conditions for building _pie.a archives
As with other .a targets, build _pie.a archives only if LIB is set.
At present we build _pie.a only for INTERNALLIBs, and none of them
include bsd.lib.mk without setting LIB. However, we might want to build
_pie.a for non-INTERNALLIBs in the future.
Reviewed by: arichardson
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31920
Dimitry Andric [Thu, 26 Aug 2021 15:36:03 +0000 (17:36 +0200)]
Add -Wno-error=unused-but-set-variable when building with Clang 13+
This warning triggers many times while building world. Downgrade it to a
warning until all occurrences have been fixed. Once the Clang warnings
have been fixed we should be able to turn it on for GCC as well. See
also f4fed768bba45a406f73ed1491d7e52fd1a8711d which did the same for the
kernel builds.
Martin Matuska [Sat, 18 Sep 2021 18:30:40 +0000 (20:30 +0200)]
zfs: merge openzfs/zfs@71c609852 (zfs-2.1-release) into stable/13
OpenZFS release 2.1.1
Notable upstream pull request merges:
#11997 FreeBSD: Don't force xattr mount option
#11997 FreeBSD: Implement xattr=sa
#11997 FreeBSD: Use SET_ERROR to trace xattr name errors
#12022 Fix endianness issues with zstd
#12161 Restore FreeBSD sysctl processing for arc.min and arc.max
#12183 Optimize small random numbers generation
#12246 arc: Drop an incorrect assert
#12271 Tinker with slop space accounting with dedup
#12279 Fix ARC ghost states eviction accounting
#12281 Move gethrtime() calls out of vdev queue lock
#12289 Compact dbuf/buf hashes and lock arrays
#12294 Upstream: dmu_zfetch_stream_fini leaks refcount
#12295 Fix abd leak, kmem_free correct size of abd_t
#12297 Avoid vq_lock drop in vdev_queue_aggregate()
#12299 file reference counts can get corrupted
#12300 Introduce dsl_dir_diduse_transfer_space()
#12314 Optimize allocation throttling
#12320 FreeBSD: Use unmapped I/O for scattered/gang ABD buffers
#12328 FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE
#12339 Read past end of argv array in zpool_do_import()
#12348 Minor ARC optimizations
#12365 Fixes in persistent L2ARC
#12375 FreeBSD: Ignore make_dev_s() errors
#12378 FreeBSD: Switch from MAXPHYS to maxphys on FreeBSD 13+
#12383 Fixes for KMSAN reports
#12397 Run arc_evict thread at higher priority
#12398 Remove b_pabd/b_rabd allocation from arc_hdr_alloc()
#12422 Fix/improve dbuf hits accounting
#12428 Fix unfortunate NULL in spa_update_dspace
#12443 Fixed data integrity issue when underlying disk returns error
#12446 Allow disabling of unmapped I/O on FreeBSD
#12473 Initialize parity blocks before RAID-Z reconstruction benchmarking
#12511 Make 'zpool labelclear -f' work on offlined disks
#12514 FreeBSD: Don't remove SA xattr if not SA znode
#12522 Compressed receive with different ashift can result in incorrect PSIZE on disk
#12535 Verify embedded blkptr's in arc_read()
#12541 Allow sending corrupt snapshots even if metadata is corrupted
Manually included upstream 2.1 backport pull request #12573:
#12282 FreeBSD: fix compilation of FreeBSD world after 29274c9
tag2name() returns a uint16_t, so we don't need to use uint32_t for the
qid (or pqid). This reduces the size of struct pf_kstate slightly. That
in turn buys us space to add extra fields for dummynet later.
Happily these fields are not exposed to user space (there are user space
versions of them, but they can just stay uint32_t), so there's no ABI
breakage in modifying this.
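The space saving can be illustrated with two hypothetical structure excerpts. They are not the real struct pf_kstate, which has many more fields; in the real structure the net gain also depends on the padding around these members.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical excerpt before the change: two 32-bit queue IDs. */
struct state_before {
	uint32_t qid;
	uint32_t pqid;
};

/* Hypothetical excerpt after the change: tag2name() values fit in 16 bits. */
struct state_after {
	uint16_t qid;
	uint16_t pqid;
};

_Static_assert(sizeof(struct state_after) < sizeof(struct state_before),
    "narrowing the queue IDs shrinks the structure");
```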
When we're synproxy-ing a connection that's going to us (as opposed to a
forwarded one) we wound up trying to send out the pf-generated tcp
packets through pf_intr(), which called ip(6)_output(). That doesn't
work all that well for packets that are destined for us, so in that case
we must call ip(6)_input() instead.
Mark Johnston [Fri, 10 Sep 2021 13:07:40 +0000 (09:07 -0400)]
net: Enter a net epoch around protocol if_up/down notifications
When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.
Reported by: syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by: kp, melifaro
Sponsored by: The FreeBSD Foundation
Alexander Motin [Fri, 3 Sep 2021 01:16:46 +0000 (21:16 -0400)]
callout(9): Allow spin locks use with callout_init_mtx().
Implement lock_spin()/unlock_spin() lock class methods, moving the
assertion to _sleep() instead. Change assertions in callout(9) to
allow spin locks for both regular and C_DIRECT_EXEC cases. In fact, for
C_DIRECT_EXEC callouts spin locks are the only locks allowed.
As the first use case, allow taskqueue_enqueue_timeout() to be used on
fast task queues. This actually becomes more efficient, since
C_DIRECT_EXEC avoids extra context switches in callout(9).
Alexander Motin [Wed, 18 Aug 2021 21:11:03 +0000 (17:11 -0400)]
geli(8): Do not report error on resize to the same size.
Just validate the old metadata and exit. Originally the check was
added to avoid trashing the only copy of metadata, but we can achieve the
same just by skipping the writing/trashing. The metadata validation
should protect the user from wrongly specifying the new size instead of
the old one.
- Some configurations, e.g. HP EliteBook 840 G3, come with a dummy card
in the card slot which is detected as a valid SD card. This added a long
timeout at boot. To alleviate the problem, the default timeout is
reduced to one second during the setup phase. [1]
- Some configurations crash at boot if rtsx(4) is defined in the kernel
config. At boot time, without a card inserted, the driver finds that
a card is present and just after that a "spontaneous" interrupt is
generated showing that no card is present. To solve this problem,
a DELAY(9) of one quarter of a second is added before checking card
presence during driver attach.
- As advised by adrian, taskqueue and DMA are set up earlier during
driver attach. A heuristic was added to try to detect configurations
that need inversion.
Mark Johnston [Thu, 9 Sep 2021 13:50:27 +0000 (09:50 -0400)]
osd: Fix racy assertions
osd_register(9) may reallocate and expand the destructor array for a
given object type if no space is available for a new key. This happens
with the object lock held. Thus, when verifying that a given slot in
the array is occupied, we need to hold the object lock to avoid racing
with a reallocation.
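The locking rule can be modeled in userland with POSIX threads. `struct osd_registry`, `registry_grow()` and `registry_slot_occupied()` are hypothetical names. The kernel uses osd's own object lock rather than a pthread mutex, so treat this as a sketch of the pattern only: the array is reallocated under the lock, so a reader must hold the same lock while it inspects a slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef void (*osd_destructor_t)(void *);

/* Hypothetical per-type registry whose slot array may be reallocated. */
struct osd_registry {
	pthread_mutex_t lock;
	osd_destructor_t *destructors;
	size_t nslots;
};

/* Grow the destructor array under the lock (simplified). */
static int
registry_grow(struct osd_registry *r, size_t newslots)
{
	assert(newslots >= r->nslots);
	pthread_mutex_lock(&r->lock);
	osd_destructor_t *n = realloc(r->destructors, newslots * sizeof(*n));
	if (n == NULL) {
		pthread_mutex_unlock(&r->lock);
		return (-1);
	}
	/* Clear the new slots so they read as unoccupied. */
	memset(n + r->nslots, 0, (newslots - r->nslots) * sizeof(*n));
	r->destructors = n;
	r->nslots = newslots;
	pthread_mutex_unlock(&r->lock);
	return (0);
}

/* Check a slot while holding the lock, so a concurrent realloc cannot
   free the array out from under the check. */
static int
registry_slot_occupied(struct osd_registry *r, size_t slot)
{
	int occupied;

	pthread_mutex_lock(&r->lock);
	occupied = slot < r->nslots && r->destructors[slot] != NULL;
	pthread_mutex_unlock(&r->lock);
	return (occupied);
}
```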
Reported by: syzbot+69ce54c7d7d813315dd3@syzkaller.appspotmail.com
Sponsored by: The FreeBSD Foundation
Alexander Motin [Thu, 2 Sep 2021 22:11:58 +0000 (18:11 -0400)]
bnxt(4): Fix bugs in WOL support.
Before this change the driver reported IFCAP_WOL_MAGIC as enabled but
not supported, which caused errors on some SIOCSIFCAP calls. Instead,
report support if the hardware supports WOL, and report it as enabled
if such a filter is installed at boot.
Also, bnxt_wol_config() should check the WOL status in if_getcapenable(),
not in if_getcapabilities(), to get the current setting.
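The distinction between "supported" (capabilities) and "enabled" (capenable) can be sketched with a plain bitmask. `struct sketch_ifnet`, `CAP_WOL_MAGIC` and both functions are hypothetical stand-ins for the ifnet accessors and IFCAP_WOL_MAGIC; the bit value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary stand-in bit for IFCAP_WOL_MAGIC. */
#define CAP_WOL_MAGIC 0x1

struct sketch_ifnet {
	int capabilities;	/* what the hardware can do */
	int capenable;		/* what is currently turned on */
};

/* Advertise WOL only if the hardware supports it, and mark it enabled
   only if a WOL filter was found installed at boot. */
static void
sketch_wol_init(struct sketch_ifnet *ifp, bool hw_supports_wol,
    bool filter_installed)
{
	if (hw_supports_wol) {
		ifp->capabilities |= CAP_WOL_MAGIC;
		if (filter_installed)
			ifp->capenable |= CAP_WOL_MAGIC;
	}
}

/* The configuration path consults the enabled set, not the supported set. */
static bool
sketch_wol_wanted(const struct sketch_ifnet *ifp)
{
	return ((ifp->capenable & CAP_WOL_MAGIC) != 0);
}
```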
Rick Macklem [Sun, 29 Aug 2021 23:46:27 +0000 (16:46 -0700)]
nfsd: Make loop calling VOP_ALLOCATE() iterate until done
The NFSv4.2 Deallocate operation loops on VOP_DEALLOCATE()
while progress is being made (remaining length decreasing).
This patch changes the loop on VOP_ALLOCATE() for the NFSv4.2
Allocate operation to do the same, instead of stopping after
an arbitrary 20 iterations.
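The "iterate while progress is made" loop can be sketched as follows. `fake_allocate_step()` is a hypothetical stand-in that, like VOP_ALLOCATE(), may handle only part of the request and update the offset and remaining length. The loop stops when the length reaches zero or stops shrinking, not after a fixed count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical partial-progress step: handles at most 'chunk' bytes and
   advances *offset / shrinks *len by the amount done. */
static int
fake_allocate_step(uint64_t *offset, uint64_t *len, uint64_t chunk)
{
	uint64_t done = *len < chunk ? *len : chunk;

	*offset += done;
	*len -= done;
	return (0);
}

/* Call the step until done or until no progress is made.
   Returns the number of iterations used, or -1 on error. */
static int
allocate_until_done(uint64_t offset, uint64_t len, uint64_t chunk)
{
	int iters = 0;

	while (len > 0) {
		uint64_t before = len;

		if (fake_allocate_step(&offset, &len, chunk) != 0)
			return (-1);
		iters++;
		if (len >= before)	/* no progress: stop looping */
			break;
	}
	return (iters);
}
```

With a 1-byte step, a 100-byte request needs 100 iterations, which a fixed limit of 20 would have cut short.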
Brian Behlendorf [Wed, 15 Sep 2021 20:19:12 +0000 (13:19 -0700)]
Linux 5.14 compat: META
Increase the Linux-Maximum version in the META file to 5.14.
All of the required compatibility patches have been merged
and the 5.14 kernel has been officially released.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12565
Because lld 13 and higher default to garbage collecting start/stop
symbols when using --gc-sections, the linker sets used in the i386 boot
loaders will disappear. This leads to the loaders not recognizing any
commands, and failure to boot.
Until we have a good set of linker scripts for the loaders, work around
it by disabling the start-stop-gc feature.
Colin Percival [Wed, 15 Sep 2021 02:42:14 +0000 (19:42 -0700)]
Turn off acpi_timer_test on !i386 by default
The ACPI timer test was introduced in 2002 to detect an erratum in
chipsets used with Pentium II and Pentium III processors. No other
hardware is known to be affected, so on non-i386 systems it should
be safe to skip the test.
Turning off this test speeds up the FreeBSD boot process by roughly
140 ms on an EC2 c5.xlarge instance.
The previous behaviour can be restored by setting
hw.acpi.timer_test_enabled=1
in /boot/loader.conf.
Colin Percival [Tue, 7 Sep 2021 23:58:18 +0000 (16:58 -0700)]
Hide acpi_timer_test behind a tunable
When hw.acpi.timer_test_enabled is set to 0, this makes acpi_timer_test
return 1 without actually testing the ACPI timer; this results in the
ACPI-fast timecounter always being used rather than potentially using
ACPI-safe.
The ACPI timer testing was introduced in 2002 as a workaround for
errata in Pentium II and Pentium III chipsets, and is unlikely to be
needed in 2021.
While I'm here, add TSENTER/TSEXIT to make it easier to see the time
spent on the test (if it is enabled).
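The tunable's effect, together with the later default of "off except on i386", can be sketched as below. `timer_test_enabled` mirrors hw.acpi.timer_test_enabled, and `run_hardware_timer_test()` is a hypothetical placeholder for the real erratum check, which is not shown in this log.

```c
#include <assert.h>

/* Mirrors hw.acpi.timer_test_enabled: on by default only on i386. */
#if defined(__i386__)
static int timer_test_enabled = 1;
#else
static int timer_test_enabled = 0;
#endif

/* Hypothetical placeholder for the real hardware test (0 = unreliable). */
static int
run_hardware_timer_test(void)
{
	return (0);
}

/* Returns 1 if the ACPI timer should be treated as reliable (ACPI-fast). */
static int
sketch_acpi_timer_test(void)
{
	if (!timer_test_enabled)
		return (1);	/* skip the test and assume ACPI-fast */
	return (run_hardware_timer_test());
}
```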
Ed Maste [Tue, 31 Aug 2021 19:30:50 +0000 (15:30 -0400)]
openssh: simplify login class restrictions
Login class-based restrictions were introduced in 5b400a39b8ad. The
code was adapted for sshd's Capsicum sandbox and received many changes
over time, including at least fc3c19a9fcee, bd393de91cc3, and e8c56fba2926.
During an attempt to upstream the work a much simpler approach was
suggested. Adopt it now in the in-tree OpenSSH to reduce conflicts with
future updates.
Fixed data integrity issue when underlying disk returns error
Errors in zil_lwb_write_done() are not propagated to
zil_lwb_flush_vdevs_done() which can result in zil_commit_impl()
not returning an error to applications even when zfs was not able
to write data to the disk.
Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to
allow errors to propagate and consolidate the error handling for
flush and write errors to a single location (rather than having
error handling split between the "write done" and "flush done"
handlers).
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Arun KV <arun.kv@datacore.com>
Closes #12391
Closes #12443
Brian Behlendorf [Mon, 13 Sep 2021 19:18:01 +0000 (12:18 -0700)]
ZTS: Waiting for zvols to be available
This is a follow-up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable and should explicitly
wait for the required zvols to be available.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12553
Brian Behlendorf [Fri, 10 Sep 2021 01:02:07 +0000 (18:02 -0700)]
Verify embedded blkptr's in arc_read()
The block pointer verification check in arc_read() should also
cover embedded block pointers. While highly unlikely, accessing
a damaged block pointer can result in panic. To further harden
the code extend the existing check to include embedded block
pointers and add a comment explaining the rationale for this
sanity check. Lastly, correct a flaw in zfs_blkptr_verify()
so the error count is checked even when checking an untrusted
config to verify the non-pool-specific portions of a block
pointer.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12535
Added compatibility code to detect the new ->get_acl() interface
and correctly handle the case where the new rcu argument is set.
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12548
Allan Jude [Thu, 9 Sep 2021 14:17:31 +0000 (10:17 -0400)]
Allow sending corrupt snapshots even if metadata is corrupted
When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a block pointer
cannot be read, but rather will continue and send the objects it can.
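The effect of a "hard" traversal flag can be sketched over a simulated list of block reads. `sketch_traverse()` and `SKETCH_ECKSUM` are hypothetical; the real traversal walks a block-pointer tree rather than an array. Without the flag the first unreadable block aborts the walk; with it, unreadable blocks are skipped and the rest are still sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error code standing in for ZFS's ECKSUM. */
#define SKETCH_ECKSUM 1000

/*
 * Walk n blocks; read_err[i] is the simulated read result for block i.
 * Returns the number of blocks sent, or the negated error if a read
 * failure aborts the walk (only when 'hard' is false).
 */
static int
sketch_traverse(const int *read_err, size_t n, bool hard)
{
	int sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (read_err[i] != 0) {
			if (!hard)
				return (-read_err[i]);
			continue;	/* skip what cannot be read */
		}
		sent++;
	}
	return (sent);
}
```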
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes #12541
Unfortunately, there was an overzealous assertion that was (in pretty
specific circumstances) false, causing failure. This assertion was
added in error, so we're removing it.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #9897
Closes #12020
Closes #12246
Paul Dagnelie [Wed, 8 Sep 2021 20:52:28 +0000 (13:52 -0700)]
Compressed receive with different ashift can result in incorrect PSIZE on disk
We round up the psize to the nearest multiple of the asize or to the
lsize, whichever is smaller. Once that's done, we allocate a new
buffer of the appropriate size, zero the tail, and copy the data
into it. This adds a small performance cost to these kinds of writes,
but fixes the bookkeeping problems.
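The size arithmetic can be sketched as below. This assumes "asize" here means the vdev's allocation unit (1 << ashift); the helper names are hypothetical and the real code lives in the receive/write path, which is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round x up to a multiple of align (align must be nonzero). */
static uint64_t
roundup_u64(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return ((x + align - 1) / align * align);
}

/* New physical size: psize rounded up to the allocation unit,
   but never larger than the logical size. */
static uint64_t
adjusted_psize(uint64_t psize, uint64_t asize_unit, uint64_t lsize)
{
	uint64_t rounded = roundup_u64(psize, asize_unit);

	return (rounded < lsize ? rounded : lsize);
}

/* Copy the data into a buffer of the new size and zero the tail.
   The caller frees the result. */
static void *
pad_buffer(const void *data, uint64_t psize, uint64_t new_psize)
{
	assert(new_psize >= psize);
	unsigned char *buf = malloc(new_psize);

	if (buf == NULL)
		return (NULL);
	memcpy(buf, data, psize);
	memset(buf + psize, 0, new_psize - psize);
	return (buf);
}
```

For example, a 5000-byte compressed block on a 4K-sector vdev becomes 8192 bytes, unless the logical size is smaller (6000 bytes, say), in which case the logical size wins.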
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12522
Closes #8462
Alexander [Wed, 8 Sep 2021 19:59:43 +0000 (21:59 +0200)]
Linux 5.15 compat: standalone <linux/stdarg.h>
Kernel commits
39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers")
c0891ac15f04 ("isystem: ship and use stdarg.h")
564f963eabd1 ("isystem: delete global -isystem compile option")
(for now these can be found in the linux-next.git tree and will land in
Linus' tree during the ongoing 5.15 cycle with one of the akpm merges)
removed the -isystem flag and disallowed the inclusion of any
compiler header files. They also introduced a minimal
<linux/stdarg.h> as a replacement for <stdarg.h>.
include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes
<stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h>
and include it instead of the compiler's one to prevent module
build breakage.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12531
The 5.15 kernel moved the backing_dev_info structure out of
the request queue structure which causes a build failure.
Rather than look in the new location for the BDI we instead
detect this upstream refactoring by the existence of either
the blk_queue_update_readahead() or disk_update_readahead()
functions. In either case, there's no longer any reason to
manually set the ra_pages value since it will be overridden
with a reasonable default (2x the block size) when
blk_queue_io_opt() is called.
Therefore, we update the compatibility wrapper to do nothing
for 5.9 and newer kernels. While it's tempting to do the
same for older kernels we want to keep the compatibility
code to preserve the existing behavior. Removing it would
effectively increase the default readahead to 128k.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12532
George Melikov [Tue, 31 Aug 2021 20:56:45 +0000 (23:56 +0300)]
CI: don't install abigail-tools
We use a docker image instead.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
George Melikov [Tue, 31 Aug 2021 19:26:30 +0000 (22:26 +0300)]
Update ABI files via new libabigail version
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
George Melikov [Tue, 31 Aug 2021 18:52:05 +0000 (21:52 +0300)]
Libabigail: make .abi files more consistent
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
George Melikov [Tue, 31 Aug 2021 17:53:12 +0000 (20:53 +0300)]
CI: use fresh libabigail via docker image
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
George Melikov [Tue, 31 Aug 2021 17:49:29 +0000 (20:49 +0300)]
Check for libabigail version
We need version 1.8.0 or later; older versions
may segfault and give inconsistent results.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
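The log does not show how the project performs this check; it is most likely done in the build scripts. As an illustration only, a "major.minor.patch at least X.Y.Z" comparison looks like this in C (`version_at_least()` is a hypothetical helper). Missing components count as 0, and unparsable strings are rejected.

```c
#include <assert.h>
#include <stdio.h>

/* Return nonzero if version string v is at least rmaj.rmin.rpat. */
static int
version_at_least(const char *v, int rmaj, int rmin, int rpat)
{
	int maj = 0, min = 0, pat = 0;

	if (sscanf(v, "%d.%d.%d", &maj, &min, &pat) < 1)
		return (0);	/* not a version string */
	if (maj != rmaj)
		return (maj > rmaj);
	if (min != rmin)
		return (min > rmin);
	return (pat >= rpat);
}
```

Comparing numerically avoids the classic string-compare trap where "1.10" sorts before "1.8".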
Closes #12529
Ryan Moeller [Wed, 1 Sep 2021 20:20:00 +0000 (16:20 -0400)]
ZTS: Remove exceptions for flaky zhack on FreeBSD
Issue #11854 has been resolved, so we can remove the exceptions for it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12527
Ryan Moeller [Mon, 30 Aug 2021 23:01:09 +0000 (19:01 -0400)]
FreeBSD: Don't remove SA xattr if not SA znode
We attempt to remove an existing SA xattr when setting a dir xattr, but
this only makes sense if the znode has been upgraded to the SA format.
Otherwise, we will hit an assert in zfs_sa_get_xattr.
Make sure this is an SA znode before attempting to remove the SA xattr.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12514
Rich Ercolani [Mon, 30 Aug 2021 21:13:46 +0000 (17:13 -0400)]
Fix cross-endian interoperability of zstd
It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.
Switch everyone over to the LE layout, and add compatibility code
to read both.
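The portability idea can be sketched with a hypothetical 32-bit header word holding a 24-bit version and an 8-bit level. The real zstd header fields are not shown in this log, and the compatibility path for reading old big-endian layouts is omitted. The point is that a C bitfield union leaves bit placement to the ABI, while explicit shifts and a fixed little-endian byte order produce identical bytes on every host.

```c
#include <assert.h>
#include <stdint.h>

/* Encode version (low 24 bits) and level (high 8 bits) as LE bytes. */
static void
encode_hdr_le(uint8_t out[4], uint32_t version, uint8_t level)
{
	uint32_t word = (version & 0xFFFFFFu) | ((uint32_t)level << 24);

	out[0] = (uint8_t)(word & 0xFF);
	out[1] = (uint8_t)((word >> 8) & 0xFF);
	out[2] = (uint8_t)((word >> 16) & 0xFF);
	out[3] = (uint8_t)((word >> 24) & 0xFF);
}

/* Decode the same layout regardless of host byte order. */
static void
decode_hdr_le(const uint8_t in[4], uint32_t *version, uint8_t *level)
{
	uint32_t word = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	    ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);

	*version = word & 0xFFFFFFu;
	*level = (uint8_t)(word >> 24);
}
```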
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12008
Closes #12022
Brian Behlendorf [Sun, 29 Aug 2021 15:56:58 +0000 (08:56 -0700)]
ZTS: Waiting for zvols to be available
The ZTS block_device_wait helper function should use -e when waiting
for a file to appear since it will be either a block special device
or a symlink. This didn't cause any failures but when a device path
was specified the function would wait longer than needed.
Additionally update the most flakey test cases to pass the file path
to block_device_wait to try and improve the test reliability. The
udev behavior on Fedora in particular can result in frequent false
positives.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12515
Ryan Moeller [Fri, 27 Aug 2021 17:02:54 +0000 (13:02 -0400)]
Correct checking bdev_check_media_change message
We're not looking for bdev_disk_changed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12492
Tony Hutter [Fri, 27 Aug 2021 16:26:49 +0000 (09:26 -0700)]
Make 'zpool labelclear -f' work on offlined disks
This patch allows you to clear the label on offlined disks in an active
pool with `-f`. Previously, labelclear wouldn't let you do that.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12511
Anton Gubarkov [Wed, 25 Aug 2021 20:01:26 +0000 (23:01 +0300)]
vdev_id: Return an error if config file is not found
Signed-off-by: Anton Gubarkov <anton.gubarkov@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Mark Johnston [Mon, 23 Aug 2021 18:10:17 +0000 (14:10 -0400)]
Initialize parity blocks before RAID-Z reconstruction benchmarking
benchmark_raidz() allocates a row to benchmark parity calculation and
reconstruction. In the latter case, the parity blocks are left
uninitialized, leading to reports from KMSAN.
Initialize parity blocks to 0xAA as we do for the data earlier in the
function. This does not affect the selected RAID-Z implementation on
any of several systems tested.
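The fix amounts to filling the parity columns with the same 0xAA pattern the data already receives. In this sketch, `struct bench_row` is a hypothetical simplification of the real RAID-Z row, and it assumes the first `nparity` columns hold parity. The data columns are left untouched because they were filled earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical benchmark row: ncols columns of colsize bytes; the first
   nparity columns are parity, the rest are data. */
struct bench_row {
	size_t ncols, nparity, colsize;
	unsigned char **col;
};

/* Initialize only the parity columns; data columns were filled earlier,
   so nothing the benchmark reads is left uninitialized. */
static void
bench_row_init_parity(struct bench_row *rr)
{
	for (size_t c = 0; c < rr->nparity && c < rr->ncols; c++)
		memset(rr->col[c], 0xAA, rr->colsize);
}
```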
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12473
Ryan Moeller [Mon, 26 Jul 2021 20:08:52 +0000 (16:08 -0400)]
ZTS: Add tests for creation time
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12432
Richard Yao [Sun, 17 Mar 2019 00:43:13 +0000 (20:43 -0400)]
Linux 4.11 compat: statx support
Linux 4.11 added a new statx system call that allows us to expose crtime
as btime. We do this by caching crtime in the znode to match how atime,
ctime and mtime are cached in the inode.
statx also introduced a new way of reporting whether the immutable,
append and nodump bits have been set. It adds support for reporting
compression and encryption, but the semantics on other filesystems are
not just to report compression/encryption, but to allow it to be turned
on/off at the file level. We do not support that.
We could implement semantics where we refuse to allow user modification
of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc()
to find out encryption/compression information. That would introduce
locking that will have a minor (although unmeasured) performance cost.
It also would be inferior to zdb, which reports far more detailed
information. We therefore omit reporting of encryption/compression
through statx in favor of recommending that users interested in such
information use zdb.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #8507
Gordon Bergling [Tue, 17 Aug 2021 17:01:07 +0000 (19:01 +0200)]
zfs.4: Fix typo s/compatiblity/compatibility/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Gordon Bergling <gbergling@googlemail.com>
Closes #12464
Alexander Motin [Tue, 17 Aug 2021 16:15:54 +0000 (12:15 -0400)]
Remove b_pabd/b_rabd allocation from arc_hdr_alloc()
When a header is allocated for full overwrite it is a waste of time
to allocate b_pabd/b_rabd for it, since arc_write() will free them
without their ever being touched. If it is a read or a partial overwrite
then arc_read() and arc_hdr_decrypt() allocate them explicitly.
Reduced memory allocation in user threads also reduces ARC eviction
throttling there, proportionally increasing it in ZIO threads, which
is not good. To minimize or even avoid this, introduce an ARC allocation
reserve, allowing certain arc_get_data_abd() callers to keep allocating
a bit longer in situations where user threads will already throttle.
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12398
Alexander Motin [Tue, 17 Aug 2021 15:55:34 +0000 (11:55 -0400)]
Optimize arc_l2c_only lists assertions
It is very expensive and not informative to call multilist_is_empty()
for each arc_change_state() on debug builds to check for an impossible
condition. Instead, implement a special index function for the
arc_l2c_only->arcs_list multilists that panics on any attempt to use it.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12421
Alexander Motin [Tue, 17 Aug 2021 15:50:31 +0000 (11:50 -0400)]
Fix/improve dbuf hits accounting
Instead of clearing stats inside arc_buf_alloc_impl() do it inside
arc_hdr_alloc() and arc_release(). It fixes statistics being wiped
every time a new dbuf is filled from the ARC.
Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits.
Since the hits are accounted under hash lock, replace atomics with
simple increments.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12422
Alexander Motin [Tue, 17 Aug 2021 15:47:00 +0000 (11:47 -0400)]
Avoid vq_lock drop in vdev_queue_aggregate()
vq_lock is already too congested for two more operations per I/O.
Instead of dropping and reacquiring it inside vdev_queue_aggregate()
delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
I/Os to callers, which drop the lock anyway to execute the new I/O.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12297
Alexander Motin [Tue, 17 Aug 2021 15:44:34 +0000 (11:44 -0400)]
Use more atomics in refcounts
Use atomic_load_64() for zfs_refcount_count() to prevent torn reads
on 32-bit platforms. On 64-bit ones it should not change anything.
When built with ZFS_DEBUG but running without tracking enabled, use
atomics instead of mutexes, as is done for builds without ZFS_DEBUG.
Since rc_tracked can't change live we can check it without lock.
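The torn-read problem can be illustrated with C11 atomics, used here in place of ZFS's own atomic_load_64(). `struct sketch_refcount` and its functions are hypothetical. On a 32-bit CPU a plain 64-bit read may be split into two 32-bit loads and observe a half-updated counter; an atomic load cannot be torn.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical refcount with a 64-bit counter. */
struct sketch_refcount {
	_Atomic uint64_t rc_count;
};

static void
sketch_refcount_add(struct sketch_refcount *rc, uint64_t n)
{
	atomic_fetch_add(&rc->rc_count, n);
}

/* Read the whole 64-bit value in one indivisible operation. */
static uint64_t
sketch_refcount_count(struct sketch_refcount *rc)
{
	return (atomic_load(&rc->rc_count));
}
```

The test below crosses the 32-bit boundary, which is exactly where a torn read would return a wrong value.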
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12420
Ryan Moeller [Mon, 16 Aug 2021 23:38:34 +0000 (19:38 -0400)]
ZTS: Avoid unset $tmpdir in redacted_panic
The redacted_send tests make use of a $tmpdir variable, except in
redacted_send/redacted_panic the variable is never defined.
Use $TEST_BASE_DIR instead.
Clean up the stream file after the test.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12455
Allan Jude [Mon, 16 Aug 2021 15:35:19 +0000 (11:35 -0400)]
Restore FreeBSD sysctl processing for arc.min and arc.max
Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max
to a disallowed value would return an error.
Since the switch, it instead only generates WARN_IF_TUNING_IGNORED.
Keep the ability to set the sysctls specifically to 0, even though
that is less than the minimum, because some tests depend on this.
Also lost was the ability to set vfs.zfs.arc_max to a value less
than the default vfs.zfs.arc_min at boot time. Restore this as well.
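The restored behavior (reject disallowed values with an error, but always accept 0) can be sketched as a validation function. `sketch_validate_arc_max()` is hypothetical, and the single lower bound is illustrative; the real handler's limits and its boot-time special case are not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Validate a new arc_max value at runtime. 0 is always accepted, since
   some tests rely on it; other values below arc_min are rejected. */
static int
sketch_validate_arc_max(uint64_t val, uint64_t arc_min)
{
	if (val == 0)
		return (0);
	if (val < arc_min)
		return (EINVAL);	/* report an error instead of just warning */
	return (0);
}
```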
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12161
Ryan Moeller [Fri, 13 Aug 2021 20:42:45 +0000 (16:42 -0400)]
zfs: add missed dependency of zfs module on zlib
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Co-authored-by: Konstantin Belousov <kib@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
External-issue: https://reviews.freebsd.org/D31207
Closes #12442
Ryan Moeller [Fri, 13 Aug 2021 20:37:46 +0000 (16:37 -0400)]
Add zfs.sh -r flag to reload modules
zfs.sh already can load and unload, so why not both?
This is convenient when developing changes to the module and you want
to rapidly make some changes, rebuild the module, reload the module,
and test the changes.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12450
Ryan Moeller [Fri, 13 Aug 2021 20:13:57 +0000 (16:13 -0400)]
Fix usage of find in tests/Makefile.am
The path is not optional on FreeBSD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12453
Tony Nguyen [Tue, 10 Aug 2021 17:36:26 +0000 (11:36 -0600)]
Run arc_evict thread at higher priority
Run the arc_evict thread at a higher priority, nice=0, to give it more
CPU time, which can improve performance for workloads with heavy ARC
eviction activity.
On mixed read/write and sequential read workloads, I've seen between
10-40% better performance.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #12397
George Melikov [Thu, 5 Aug 2021 21:30:28 +0000 (00:30 +0300)]
Man zpool-scrub.8: describe sequential scrub
Describe sequential scrub and add examples of scrub status.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12429
hedongzhang [Tue, 3 Aug 2021 17:46:33 +0000 (01:46 +0800)]
Modify checksum obtain method of QAT
The CpaDcGeneratefooter function, which obtains the checksum, does
not support CPA_DC_STATELESS mode, so we get the adler32 checksum
at the end of the zlib stream from dc_results.
Mark Johnston [Mon, 2 Aug 2021 19:18:24 +0000 (15:18 -0400)]
Allow disabling of unmapped I/O on FreeBSD
We have a tunable which permits one to disable the use of unmapped I/O
for the buffer cache. Respect it in ZFS as well. This is useful for
KMSAN, which cannot easily maintain shadow state for unmapped pages.
No functional change intended, as unmapped I/O is permitted by default
and there's no real reason to disable it in practice except for
debugging.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12446
- Bail out early if we're running the perf tests and forget to
specify disks.
- Allow perf tests to run with any number of disks.
- Remove weekly vs. nightly settings
- Move variables with common values to perf.shlib
- Use zinject to clear the ARC over export/import
- Fix dbuf cache size calculation
When the meaning of `dbuf_cache_max_bytes` changed, the performance
test that covers the dbuf cache started to fail. The test would try to
write files for the test using the max possible size of the cache,
inevitably filling the pool and failing. This change uses
`dbuf_cache_shift` to correctly calculate the dbuf cache size.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12408
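The size calculation the fixed test relies on can be sketched as follows, assuming the effective dbuf cache size is the smaller of dbuf_cache_max_bytes and the ARC target size shifted right by dbuf_cache_shift. The defaults below (UINT64_MAX meaning "no explicit cap", and a shift of 5, i.e. 1/32 of the ARC) are assumptions about OpenZFS, and the function is an illustration, not the test's shell code.

```c
#include <stdint.h>

/* Assumed defaults: no explicit byte cap, cache is 1/32 of the ARC. */
static uint64_t dbuf_cache_max_bytes = UINT64_MAX;
static int dbuf_cache_shift = 5;

/*
 * Effective dbuf cache size for a given ARC target size: the shifted
 * fraction of the ARC, clamped by any explicit byte limit.
 */
static uint64_t
dbuf_cache_target_bytes(uint64_t arc_target_bytes)
{
	uint64_t by_shift = arc_target_bytes >> dbuf_cache_shift;

	return (dbuf_cache_max_bytes < by_shift ?
	    dbuf_cache_max_bytes : by_shift);
}
```

With a 32 GiB ARC target and the defaults, this yields 1 GiB, whereas sizing test files from the raw dbuf_cache_max_bytes value would try to write far more than the pool can hold.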
Matthew Ahrens [Mon, 26 Jul 2021 19:51:39 +0000 (12:51 -0700)]
Read past end of argv array in zpool_do_import()
`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and
`pool_specified` to `import_pools()`. If `pool_specified==FALSE`, the
`argv[]` arguments are not used. However, these values may be off the
end of the `argv[]` array, so loading them could dereference unmapped
memory. This error is reported by the asan build:
```
=================================================================
==6003==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 8 at 0x6030000004a8 thread T0
#0 0x562a078b50eb in zpool_do_import zpool_main.c:3796
#1 0x562a078858c5 in main zpool_main.c:10709
#2 0x7f5115231bf6 in __libc_start_main
#3 0x562a07885eb9 in _start
0x6030000004a8 is located 0 bytes to the right of 24-byte region
allocated by thread T0 here:
#0 0x7f5116ac6b40 in __interceptor_malloc
#1 0x562a07885770 in main zpool_main.c:10699
#2 0x7f5115231bf6 in __libc_start_main
```
This commit passes NULL for these arguments if they are off the end
of the `argv[]` array.
Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #12339
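The bounded read can be sketched as below. import_args() is a hypothetical helper, not the code in zpool_main.c. In the ASan report the read begins exactly at the end of a 24-byte heap region, so there is no terminating NULL to rely on; the bound therefore has to come from argc.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the fix: read argv[0] and argv[1]
 * only when argc says they exist, and hand back NULL otherwise.
 */
static void
import_args(int argc, char **argv, const char **first, const char **second)
{
	*first = (argc > 0) ? argv[0] : NULL;
	*second = (argc > 1) ? argv[1] : NULL;
}
```

The callee still ignores these values when pool_specified is false; the point is that the caller never loads them from past the end of the array.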
George Amanakis [Mon, 26 Jul 2021 19:30:24 +0000 (21:30 +0200)]
Fixes in persistent L2ARC
In l2arc_add_vdev() first decide whether the device is eligible for
L2ARC rebuild or whole device trim and then add it to the list of cache
devices. Otherwise l2arc_feed_thread() might already start writing to
the device, invalidating previous content since l2ad_hand = l2ad_start.
However, l2arc_rebuild_vdev() needs the device to be present in the
cache device list to find its l2arc_dev_t. Fix this by moving most of
l2arc_rebuild_vdev() into a new function, l2arc_rebuild_dev(), which
does not need to search the cache device list.
In contrast to l2arc_add_vdev(), we do not have to worry about
l2arc_feed_thread() invalidating previous content when onlining a
cache device. The device parameters (l2ad*) are not cleared when
offlining the device, and writing new buffers will not invalidate
all previous content. In the worst case, only buffers whose log block
has not yet been written to the device will be lost.
Retire persist_l2arc_00{4,5,8} tests since they cover code already
covered by the remaining ones. Test persist_l2arc_006 is renamed to
persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005.
Fix a typo in persist_l2arc_004, and remove an assertion that is not
always true from l2arc_arcstats_pos. Also update an assertion in
persist_l2arc_005 and explain why in a comment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12365
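The ordering fix in l2arc_add_vdev() follows a general rule: record every decision about a new object before publishing it to a list that another thread scans. The sketch below shows that rule with a hypothetical, heavily simplified device record; the eligibility inputs are invented for illustration, and real code would publish under a lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cache-device record. */
struct cache_dev {
	bool rebuild;           /* restore contents from on-disk log blocks */
	bool trim_all;          /* trim the whole device before first use */
	struct cache_dev *next;
};

/* Hypothetical list that a feed thread would scan for writable devices. */
static struct cache_dev *cache_dev_list;

/*
 * Decide rebuild versus whole-device trim first, then link the device
 * in. Once it is on the list a writer may start at the device start,
 * so the decision must already be recorded by then.
 */
static void
cache_dev_add(struct cache_dev *dev, bool has_valid_header, bool want_trim)
{
	dev->rebuild = has_valid_header;
	dev->trim_all = !has_valid_header && want_trim;

	/* Publish last (a real kernel would hold the list lock here). */
	dev->next = cache_dev_list;
	cache_dev_list = dev;
}
```

Passing the device pointer directly to any later rebuild step, as the new l2arc_rebuild_dev() does, removes the need to look the device up in the list before it has been published.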
Mark Johnston [Fri, 16 Jul 2021 14:12:47 +0000 (10:12 -0400)]
Initialize dn_next_type[] in the dnode constructor
It seems nothing ensures that this array is zeroed when a dnode is
freshly allocated, so in principle it retains the values from the
previous allocation. In practice the fields seem to end up zeroed, but
we can zero them anyway for consistency.
This was found using KMSAN.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
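The fix amounts to setting the array explicitly in the object-cache constructor instead of assuming fresh memory is zero. A minimal sketch with a hypothetical, trimmed-down dnode; TXG_SIZE = 4 and the kmem-style constructor signature are assumptions, not the actual dnode.c code.

```c
#include <stdint.h>
#include <string.h>

#define TXG_SIZE 4  /* assumed number of in-flight txg slots */

/* Hypothetical, trimmed-down dnode with a per-txg pending-type array. */
struct dnode_sketch {
	uint8_t dn_next_type[TXG_SIZE];
	uint64_t dn_other_state;
};

/*
 * Object-cache constructor: set every field explicitly, because the
 * memory handed over by the allocator may still hold stale bytes from
 * a previous use.
 */
static int
dnode_sketch_cons(void *buf, void *arg, int flags)
{
	struct dnode_sketch *dn = buf;

	(void)arg;
	(void)flags;
	memset(dn->dn_next_type, 0, sizeof (dn->dn_next_type));
	dn->dn_other_state = 0;
	return (0);
}
```

A tool such as KMSAN reports reads of fields the constructor never wrote, which is how an uninitialized array like this gets noticed even when it happens to contain zeros.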
Test prioritisation and dummynet queues.
We need to give the pipe sufficient bandwidth for dummynet to work.
We can't rely on the TCP connection failing altogether, but we can
measure the effect of dummynet by imposing a time limit on a larger
data transfer.
If TCP is prioritised it'll get most of the pipe bandwidth and easily
manage to transfer the data in 3 seconds or less. When not prioritised
this will not succeed.
Kristof Provost [Tue, 25 May 2021 14:54:32 +0000 (16:54 +0200)]
ipfw: Introduce dnctl
Introduce a link to the ipfw command, dnctl, for dummynet configuration.
dnctl only handles dummynet configuration, and is part of the effort to
support dummynet in pf.
/sbin/ipfw continues to accept pipe, queue and sched commands, but these can
now also be issued via the new dnctl command.
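Shipping one binary under two names usually means the program inspects argv[0] to decide how to behave. The sketch below shows that pattern, assuming dnctl is a hard link to ipfw that accepts only the dummynet commands (pipe, queue, sched) while ipfw accepts everything. We have not checked how ipfw actually draws this distinction, and prog_basename() and command_allowed() are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return the last path component of a program name, e.g.
 * "/sbin/dnctl" -> "dnctl". Unlike basename(3), it never modifies
 * its argument.
 */
static const char *
prog_basename(const char *path)
{
	const char *slash = strrchr(path, '/');

	return (slash != NULL ? slash + 1 : path);
}

/*
 * Hypothetical policy for a two-name binary: as dnctl, allow only
 * dummynet configuration commands; as ipfw, allow those too, along
 * with every other command.
 */
static bool
command_allowed(const char *argv0, const char *cmd)
{
	bool is_dummynet_cmd = strcmp(cmd, "pipe") == 0 ||
	    strcmp(cmd, "queue") == 0 || strcmp(cmd, "sched") == 0;

	if (strcmp(prog_basename(argv0), "dnctl") == 0)
		return (is_dummynet_cmd);
	return (true);
}
```

Keying on the basename rather than the full path lets the check work whether the tool is run as /sbin/dnctl, ./dnctl, or found via PATH.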