Brian Behlendorf [Fri, 17 Dec 2021 20:40:34 +0000 (12:40 -0800)]
ZTS: alloc_class.ksh must wait for the process to exit
The alloc_class_* tests may fail on Linux with an EBUSY error if
`zfs destroy` is run before the `dd` process has had a chance to
terminate. Wait on the pid after the `kill -9` to make sure.
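The kill-then-wait pattern described above can be sketched as follows. This is an illustration only, not the test's actual code: the `sleep` job is a hypothetical stand-in for the background `dd` writer.

```shell
# Hypothetical stand-in for the test's background dd writer.
sleep 300 &
pid=$!

kill -9 "$pid"

# Block until the process has really exited. wait reports a non-zero
# status for a killed job, so capture it without tripping `set -e`.
status=0
wait "$pid" || status=$?

# Only now is it safe to run cleanup that needs the writer gone.
echo "writer exited with status $status"
```

Without the `wait`, `kill -9` only queues the signal, so the next command can still race with a process that has not finished exiting.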
When testing I didn't observe any failures for the alloc_class
tests. Remove them from the exceptions list; the CI was used to
verify the tests pass on all platforms.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12873
Rich Ercolani [Fri, 17 Dec 2021 20:39:10 +0000 (15:39 -0500)]
ZTS: Avoid piping send directly to /dev/null
Unfortunately, #11445 means while we fail gracefully now, we still
fail, unless people want to implement a complex workaround just to
support /dev/null.
So let's just use the cheap workaround in a test for now.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12872
Tony Hutter [Fri, 17 Dec 2021 20:37:21 +0000 (12:37 -0800)]
ZTS: Fix zpool_reopen_[1-5] on Fedora 35
The zpool_reopen_[1-5] tests are failing on Fedora 35 with:
zpool_reopen_001_pos.ksh[64]: log_must[67]: log_pos[270]:
wait_for_resilver_end[98]: wait_for_action: line 71: func: is read only
Renaming 'func' -> 'funct' fixes the issue.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12871
Brian Behlendorf [Fri, 17 Dec 2021 17:52:13 +0000 (09:52 -0800)]
Fix zvol_open() lock inversion
When restructuring the zvol_open() logic for the Linux 5.13 kernel
a lock inversion was accidentally introduced. In the updated code
the spa_namespace_lock is now taken before the zv_suspend_lock
allowing the following scenario to occur:
This commit resolves the issue by moving the acquisition of the
spa_namespace_lock back to after the zv_suspend_lock which restores
the original ordering.
Additionally, as part of this change the error exit paths were
simplified where possible.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12863
Alan Somers [Fri, 17 Dec 2021 17:50:12 +0000 (10:50 -0700)]
FreeBSD: Update argument types for VOP_READDIR
A recent commit to FreeBSD changed the type of
vop_readdir_args.a_cookies to a uint64_t**. There is no functional
impact to ZFS because ZFS only uses 32-bit cookies, which will be
zero-extended to 64-bits by the existing code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #12874
наб [Fri, 17 Dec 2021 00:43:10 +0000 (01:43 +0100)]
zcommon: pre-iterate over sysfs instead of statting every feature
If sufficient memory (<2K, realistically) is available, libzfs_init()
can be significantly shortened by iterating over the correct sysfs
directory before registrations: we can turn 168 stats into 15/18
syscalls (3 opens (6 if built in), 3 fstats, 6 getdentses, and 3
closes), a tenfoldish reduction; this is probably a bit faster, too.
The list is always optional, and registration functions (and one-off
users) can simply pass NULL, which will fall back to the previous
mechanism.
Also, don't allocate in zfs_mod_supported_impl, and use access()
instead of stat(), since existence is really what we care about.
Also, fix pre-prop-checking compat in fallback for built-in ZFS.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12089
Ryan Moeller [Thu, 16 Dec 2021 21:22:15 +0000 (16:22 -0500)]
FreeBSD: Provide correct file generation number
va_seq was actually a thin veil over va_gen, so z_gen is a more
appropriate value than z_seq to populate the field with.
Drop the unnecessary compat obfuscation and provide the correct
file generation number.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <freqlabs@freebsd.org>
Closes #12851
Allan Jude [Thu, 16 Dec 2021 19:56:22 +0000 (14:56 -0500)]
zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
If the fields to be listed and sorted by are constrained
to those populated by dsl_dataset_fast_stat(), then
zfs list is much faster, as it does not need to open each
objset and read its properties.
A previous optimization by Pawel Dawidek
(0cee24064a79f9c01fc4521543c37acea538405f) took advantage
of this to make listing snapshot names sorted only by name
much faster.
However, it was limited to `-o name -s name`; this work
extends this optimization to work with:
- name
- guid
- createtxg
- numclones
- inconsistent
- redacted
- origin
and could be further extended to any other properties
supported by dsl_dataset_fast_stat() or similar, that do
not require extra locking or reading from disk.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net> Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11080
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
Mark Johnston [Sat, 20 Nov 2021 16:21:25 +0000 (11:21 -0500)]
zfs: Fix a deadlock between page busy and the teardown lock
When rolling back a dataset, ZFS has to purge file data resident in the
system page cache. To do this, it loops over all vnodes for the
mountpoint and calls vn_pages_remove() to purge pages associated with
the vnode's VM object. Each page is thus exclusively busied while the
dataset's teardown write lock is held.
When handling a page fault on a mapped ZFS file, FreeBSD's page fault
handler busies newly allocated pages and then uses VOP_GETPAGES to fill
them. The ZFS getpages VOP acquires the teardown read lock with vnode
pages already busied. This represents a lock order reversal which can
lead to deadlock.
To break the deadlock, observe that zfs_rezget() need only purge those
pages marked valid, and that pages busied by the page fault handler are,
by definition, invalid. Furthermore, ZFS pages always transition from
invalid to valid with the teardown lock held, and ZFS never creates
partially valid pages. Thus, zfs_rezget() can use the new
vn_pages_remove_valid() to skip over pages busied by the fault handler.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
наб [Tue, 7 Dec 2021 20:30:10 +0000 (21:30 +0100)]
contrib/bash_completion.d: fix error spew from __zfs_match_snapshot()
Given:
/sbin/zfs list filling/a-zvol<TAB> -o space,refratio
The rest of the cmdline gets vored by:
/sbin/zfs list filling/a-zvolcannot open 'filling/a-zvol':
operation not applicable to datasets of this type
With -x (fragment):
+ COMPREPLY=($(compgen -W "$(__zfs_match_snapshot)" -- "$cur"))
+++ __zfs_match_snapshot
+++ local base_dataset=filling/dziadtop-nowe-duchy
+++ [[ filling/dziadtop-nowe-duchy != filling/dziadtop-nowe-duchy ]]
+++ [[ filling/dziadtop-nowe-duchy != '' ]]
+++ __zfs_list_datasets filling/dziadtop-nowe-duchy
+++ /sbin/zfs list -H -o name -s name -t filesystem
-r filling/dziadtop-nowe-duchy
+++ tail -n +2
cannot open 'filling/dziadtop-nowe-duchy':
operation not applicable to datasets of this type
+++ echo filling/dziadtop-nowe-duchy
+++ echo filling/dziadtop-nowe-duchy@
++ compgen -W 'filling/dziadtop-nowe-duchy
This properly completes with:
$ /sbin/zfs list filling/a-zvol<TAB> -o space,refratio
filling/a-zvol filling/a-zvol@
$ /sbin/zfs list filling/a-zvol<cursor> -o space,refratio
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12820
Coleman Kane [Sun, 5 Dec 2021 20:18:46 +0000 (15:18 -0500)]
Linux 5.16: Resolve ZSTD_isError symbol collision in Linux kernel
Newer zstd code introduced in the main kernel tree now creates a symbol
collision with ZSTD_isError in our ZSTD code. This change relabels our
implementation with a ZFS-specific symbol name, and undoes some
macro-based micro-optimizations that conflict with the attempt to rename
our internal-use version.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
Coleman Kane [Sat, 4 Dec 2021 03:00:10 +0000 (22:00 -0500)]
Linux 5.16: The blk-cgroup.h header is where struct blkcg_gq is defined
The definition of struct blkcg_gq was moved into blk-cgroup.h, which is
a header that's been in Linux since 2015. This is used by
vdev_blkg_tryget() in module/os/linux/zfs/vdev_disk.c. Since the kernel
for CentOS 7 and similar-generation releases doesn't have this header,
its inclusion is guarded by a configure test.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
Coleman Kane [Sat, 4 Dec 2021 02:45:28 +0000 (21:45 -0500)]
Linux 5.16: bio_set_dev is no longer a helper macro
This change adds a configure check to determine if bio_set_dev is a
helper macro or not. If not, then the attempt to override its internal
call to bio_associate_blkg(), with a macro definition to our own
version, is no longer possible, as the compiler won't use it when
compiling the new inline function replacement implemented in the header.
This change also creates a new vdev_bio_set_dev() function that performs
the same work, and also performs the work implemented in
vdev_bio_associate_blkg(), as it is the only thing calling that function
in our code. Our custom vdev_bio_associate_blkg() is now only compiled
if the bio_set_dev() is a macro in the Linux headers.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
Coleman Kane [Fri, 3 Dec 2021 04:25:08 +0000 (23:25 -0500)]
Linux 5.16: type member of iov_iter renamed iter_type
The iov_iter->type member was renamed iov_iter->iter_type. However,
while looking into this, I realized that in 2018 an iov_iter_type(*iov)
accessor function was introduced. So if that is present, use it,
otherwise fall back to trying the existing behavior of directly
accessing type from iov_iter.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
Coleman Kane [Fri, 3 Dec 2021 03:54:05 +0000 (22:54 -0500)]
Linux 5.16: block_device_operations->submit_bio now returns void
The return type for the submit_bio member of struct
block_device_operations was changed to no longer return a value.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
Paul Dagnelie [Tue, 7 Dec 2021 01:19:13 +0000 (17:19 -0800)]
Add `const` to nvlist functions to properly expose their real behavior
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12728
The import_rewind_device_replaced.ksh test was never entirely reliable
because it depends on MOS data not being overwritten. The MOS data is
not protected by the snapshot so occasional failures were always
expected. However, this test is now failing reliably on all platforms
indicating something has changed in the code since the test was marked
"maybe". Convert the test to a "known" failure until the root cause
is identified and resolved.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12821
The upload artifact functionality in github can't handle colons in
filenames. The current code handles this for files under the most
recent set of results. With the ability to rerun failed tests, now
there can be multiple sets of results, and they all need to be
processed in the same way.
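The handling described above can be sketched as follows. This is an illustration only, not the workflow's actual script: the function name `sanitize_colons`, the replacement character `_`, and the directory and file names are all assumptions, and paths containing newlines are not handled.

```shell
# Replace ':' with '_' in every file and directory name under a results
# tree, covering every set of results rather than only the latest one.
sanitize_colons() {
    # -depth lists a directory's contents before the directory itself,
    # so children are renamed before their parent's path changes.
    find "$1" -depth -name '*:*' | while IFS= read -r path; do
        dir=${path%/*}
        base=${path##*/}
        mv -- "$path" "$dir/$(printf '%s' "$base" | tr ':' '_')"
    done
}

# Hypothetical layout: two result sets, as after a rerun.
results=$(mktemp -d)
mkdir -p "$results/run:1" "$results/run:2"
: > "$results/run:1/log:setup"
: > "$results/run:2/log:setup"

sanitize_colons "$results"
```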
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12815
Linux 5.13 compat: retry zvol_open() when contended
Due to a possible lock inversion the zvol open call path on Linux
needs to be able to retry in the case where the spa_namespace_lock
cannot be acquired.
For Linux 5.12 and older kernels this was accomplished by returning
-ERESTARTSYS from zvol_open() to request that blkdev_get() drop
the bdev->bd_mutex lock, reacquire it, then call the open callback
again. However, as of the 5.13 kernel this behavior was removed.
Therefore, for 5.12 and older kernels we preserved the existing
retry logic, but for 5.13 and newer kernels we retry internally in
zvol_open(). This should always succeed except in the case where
a pool's vdevs are layered on zvols, in which case it may fail. To
handle this case vdev_disk_open() has been updated to retry when
opening a device when -ERESTARTSYS is returned.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes #12759
With the addition of functionality to rerun failing tests, some
tests that fail only sometimes still fail often enough to degrade
the reliability of the sanity runs. Remove them from the runfile
until they reliably pass.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12814
Paul Dagnelie [Wed, 1 Dec 2021 17:38:53 +0000 (09:38 -0800)]
Add zfs-test facility to automatically rerun failing tests
This was a project proposed as part of the Quality theme for the
hackathon for the 2021 OpenZFS Developer Summit. The idea is to improve
the usability of the automated tests that get run when a PR is created
by having failing tests automatically rerun in order to make flaky
tests less impactful.
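The rerun idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the zfs-test implementation: `run_with_rerun` and the three toy tests are hypothetical names, and each failure is retried exactly once.

```shell
# Run each named test once, then rerun only the ones that failed.
# Sets $still_failed to the tests that failed both times.
run_with_rerun() {
    failed=""
    for t in "$@"; do
        "$t" || failed="$failed $t"
    done
    still_failed=""
    for t in $failed; do
        "$t" || still_failed="$still_failed $t"
    done
}

tmpdir=$(mktemp -d)
marker=$tmpdir/ran_once

always_pass() { return 0; }
always_fail() { return 1; }
flaky_test() {
    # Fails on the first call, passes on every later call.
    if [ -e "$marker" ]; then return 0; fi
    : > "$marker"
    return 1
}

run_with_rerun always_pass flaky_test always_fail
echo "failed twice:$still_failed"
```

Only tests that fail on both attempts end up in the final list, which is what keeps a flaky test from failing the whole run.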
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12740
Attila Fülöp [Sun, 14 Nov 2021 17:50:49 +0000 (18:50 +0100)]
get_key_material: fix style
Reviewed-by: Felix Dörre <felix@dogcraft.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
Harald van Dijk [Tue, 19 Oct 2021 23:32:28 +0000 (00:32 +0100)]
get_key_material: skip passphrase validation when loading keys
The restriction that an encryption key must be at least
MIN_PASSPHRASE_LEN characters long makes sense when changing the
encryption key, but not when loading: as this restriction is not
enforced in the libraries, it is possible to bypass zfs change-key's
restrictions and end up with a key that becomes impossible to load with
zfs load-key, for example through pam_zfs_key.
Reviewed-by: Felix Dörre <felix@dogcraft.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #12765
Attila Fülöp [Sun, 14 Nov 2021 17:08:45 +0000 (18:08 +0100)]
pam_zfs_key: tests: check if zfs load-key works on short passphrases
The pam_zfs_key pam module does not enforce a minimum password
length while changing the user password and thus the user's home
dataset passphrase. To not end up with a dataset `zfs load-key`
can't load the key for, `zfs load-key` should not enforce a minimum
passphrase length. This adds a test for that.
Reviewed-by: Felix Dörre <felix@dogcraft.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
Closes #12651
Closes #12656
Attila Fülöp [Sun, 14 Nov 2021 16:36:12 +0000 (17:36 +0100)]
pam_zfs_key: tests: clean up the generated pam service config file
Remove the generated pam service config file
`/etc/pam.d/pam_zfs_key_test` on test cleanup, since the tests
shouldn't alter system state.
While here, move the pam service config file name into a variable.
Reviewed-by: Felix Dörre <felix@dogcraft.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
Brian Behlendorf [Tue, 30 Nov 2021 18:38:09 +0000 (10:38 -0800)]
Default to zfs_dmu_offset_next_sync=1
Strict hole reporting was previously disabled by default as a
performance optimization. However, this has led to confusion
over the expected behavior and a variety of workarounds being
adopted by consumers of ZFS. Change the default behavior to
always report holes and force the TXG sync.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12746
- Allocate ve_search on the stack, so we avoid allocating memory for
every I/O even if the VDEV cache is disabled.
- Reduce lock scope.
- Avoid locking in vdev_cache_read() when the VDEV cache is disabled.
- Sort file names properly.
- Correct comment.
Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12749
maxz [Tue, 30 Nov 2021 18:28:57 +0000 (19:28 +0100)]
Replace wrong occurrences of `affect` by `effect` in the man pages
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Max Zettlmeißl <max@zettlmeissl.de>
Closes #12784
The wins for a relatively normal workload are rather slim:
real 0.02119s/0.00985s=2.15029x
user 0.02130s/0.00346s=6.15560x
sys 0.03858s/0.00643s=6.00062x
But this is a big win on machines with a lot of datasets and expensive
forks.
For example, on a VM on my work laptop with 900+ legacy-mount
Docker datasets, the original gains from the C rewrite were
only five-fold:
real 0.516s/0.102s=5.05882x
user 0.237s/0.143s=1.65734x
sys 0.287s/0.100s=2.87x
And this serial variant gains this back there as well:
real 0.102s/0.008s=12.75x
user 0.143s/0.007s=20.42857x
sys 0.100s/0.001s=100x
Allan Jude [Tue, 30 Nov 2021 14:46:25 +0000 (09:46 -0500)]
Vdev Properties Feature
Add properties, similar to pool properties, to each vdev.
This makes use of the existing per-vdev ZAP that was added as
part of device evacuation/removal.
A large number of read-only properties are exposed,
many of the members of struct vdev_t, that provide useful
statistics.
Adds support for read-only "removing" vdev property.
Adds the "allocating" property that defaults to "on" and
can be set to "off" to prevent future allocations from that
top-level vdev.
Supports user-defined vdev properties.
Includes support for properties.vdev in SYSFS.
Co-authored-by: Allan Jude <allan@klarasystems.com> Co-authored-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11711
pstef [Mon, 29 Nov 2021 18:52:42 +0000 (19:52 +0100)]
Fix typo in zpool.8
Update zpool.8 to avoid parseltongue.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Piotr P. Stefaniak <pstef@freebsd.org>
Closes #12763
Coleman Kane [Tue, 16 Nov 2021 04:23:30 +0000 (23:23 -0500)]
Linux 5.16 compat: asm/fpu/xcr.h is new location for xgetbv/xsetbv
Linux 5.16 moved these functions into this new header in commit 1b4fb8545f2b00f2844c4b7619d64d98440a477c. This change adds code to look
for the presence of this header, and include it so that the code using
xgetbv & xsetbv will compile again.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
Coleman Kane [Tue, 16 Nov 2021 05:10:35 +0000 (00:10 -0500)]
Linux 5.16: wait_on_page_bit() no longer available to modules
Instead, linux/pagemap.h offers a number of folio-specific functions to
be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c
wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced
with folio_wait_bit(page_folio(pp), PG_writeback). This change modifies
the code to conditionally compile that if configure identifies the
presence of the folio_wait_bit() function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
Mark Johnston [Fri, 19 Nov 2021 22:26:39 +0000 (17:26 -0500)]
Fix several bugs in the FreeBSD rename VOP implementation
- To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the
teardown lock is acquired with ZFS_ENTER().
- Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_()
when the ZFS_ENTER() macro forces an early return.
Refactor the rename implementation so that ZFS_ENTER() can be used
safely. As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead
of open-coding its implementation.
Reported-by: Peter Holm <pho@FreeBSD.org> Tested-by: Peter Holm <pho@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Sponsored-by: The FreeBSD Foundation
Closes #12717
Paul Dagnelie [Fri, 19 Nov 2021 17:02:45 +0000 (09:02 -0800)]
Add notes to system_taskq
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12771
The inline function vn_flush_cached_data() in vnode.h
must not be compiled when building BASE.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12743
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12748
George Amanakis [Thu, 11 Nov 2021 20:52:16 +0000 (21:52 +0100)]
Introduce a tunable to exclude special class buffers from L2ARC
Special allocation class or dedup vdevs may have roughly the same
performance as L2ARC vdevs. Introduce a new tunable to exclude those
buffers from being cacheable on L2ARC.
Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11761
Closes #12285
наб [Thu, 11 Nov 2021 20:27:37 +0000 (21:27 +0100)]
Remove basename(1). Clean up/shorten some coreutils pipelines
Basenames that remain, in cmd/zed/zed.d/statechange-led.sh:
dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
vdev=$(basename "$ZEVENT_VDEV_PATH")
I don't wanna interfere with #11988
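For call sites where basename(1) is dropped, a common replacement is POSIX parameter expansion. The commit message does not say which technique was used, so the sketch below is an assumption, and the device path is hypothetical.

```shell
path=/dev/disk/by-id/example-disk   # hypothetical device path

# Strip the longest prefix ending in '/', leaving the last component.
# This avoids forking an external process.
name=${path##*/}

echo "$name"
```

Unlike basename(1), this expansion yields an empty string when the path ends in a slash, so it is only a drop-in replacement for paths without a trailing `/`.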
Fedor Uporov [Thu, 11 Nov 2021 19:54:15 +0000 (11:54 -0800)]
Check l2cache vdevs pending list inside the vdev_inuse()
The l2cache device could be added twice because vdev_inuse() does not
check spa_l2cache for added devices. Make the l2cache vdev in-use
checking logic closer to that used for spare vdevs.
Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9153
Closes #12689
Fedor Uporov [Thu, 11 Nov 2021 19:26:18 +0000 (11:26 -0800)]
zhack: Add repair label option
If all label checksums on any vdev are invalid, the pool becomes
unimportable. zhack, with the newly added CLI options, can be used
to restore the label checksums and make the pool importable again.
Palash Gandhi [Thu, 11 Nov 2021 15:46:44 +0000 (07:46 -0800)]
ZTS: zfs_list_004_neg should not check paths that belong to ZFS
When ZFS is on root, /tmp is on ZFS. This causes zfs_list_004_neg to
fail since `zfs list` on /tmp passes when the test expects it not to.
The fix is to exclude paths that belong to ZFS.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Palash Gandhi <pbg4930@rit.edu>
Closes #12744
Brian Behlendorf [Thu, 11 Nov 2021 00:14:32 +0000 (16:14 -0800)]
Restore dirty dnode detection logic
In addition to flushing memory mapped regions when checking holes,
commit de198f2d95 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link. Relying
on the dirty record has not been reliable; switch back to the previous
method.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12745
Fedor Uporov [Wed, 10 Nov 2021 19:22:00 +0000 (11:22 -0800)]
zdb: Report bad label checksum
If all label checksums on any vdev are invalid, the pool becomes
unimportable. However, zdb with the -l option does not provide any
useful information about why this happened. Add notifications
about corrupted label checksums.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #2509
Closes #12685
Fedor Uporov [Tue, 9 Nov 2021 20:50:39 +0000 (12:50 -0800)]
Skip spacemaps reading in case of pool readonly import
Only the zdb utility needs to read metaslab-related data during a
read-only pool import, because of spacemap validation. Add a global
variable that allows zdb to read spacemaps in read-only import
mode.
Brian Atkinson [Tue, 9 Nov 2021 19:51:33 +0000 (12:51 -0700)]
Single IO issue for raidz writes with skip sector
In order to reduce contention on the vq_lock, optional skip sectors
for Raidz writes can be placed into a single IO request. This is done by
padding out the linear ABD for a parity column to contain the skip
sector and by creating a gang ABD to contain the data and skip sector for
data columns.
The vdev_raidz_map_alloc() function now contains specific functions for
both reads and writes to allocate the ABDs that will be issued down to
the VDEV children.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-By: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #12333
The submit_bio() prototype has changed again. The version in 5.16
still only expects a single argument but the return type has changed
to void. Since we never used the returned value before, update the
configure check to detect both single arg versions.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
Commit https://github.com/torvalds/linux/commit/2e9bc346 moved
the elevator.h header under the block/ directory as part of some
refactoring. This turns out not to be a problem since there's
no longer anything we need from the header. This has been the
case for some time; this change removes the elevator.h include
and replaces it with a major.h include.
Reviewed-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
When using lseek(2) to report data/holes, memory mapped regions of
the file were ignored. This could produce incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.
Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing. This ensures that a clean dnode can't be dirtied before
the data/hole is located. The range lock is now also taken to
ensure the call cannot race with zfs_write().
Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12724
Note that Dropbear supports ed25519 keys since version 2020.79.
See https://github.com/mkj/dropbear/pull/91
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Michael Franzl <michael@franzl.name>
Closes #12715
It turns out that short-circuiting the EFAULT behavior on a short read
breaks things on FreeBSD. So until there's a nicer solution, let's
just revert the behavior for not-Linux.
Rich Ercolani [Wed, 3 Nov 2021 15:00:08 +0000 (11:00 -0400)]
Workaround issue cleaning up automounted snapshots on Linux
On Linux, sometimes, when ZFS goes to unmount an automounted snap,
it fails a VERIFY check on debug builds, because taskq_cancel_id
returned ENOENT after not finding the taskq it was trying to cancel.
This presumably happens when it already died for some reason; in this
case, we don't really mind it already being dead, since we're just
going to dispatch a new task to unmount it right after.
So we just ignore it if we get back ENOENT trying to cancel here,
retry a couple times if we get back the only other possible condition
(EBUSY), and log to dbgmsg if we got anything but ENOENT or success.
(We also add some locking around taskqid, to avoid one or two cases
of two instances of trying to cancel something at once.)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11632
Closes #12670
Rich Ercolani [Tue, 2 Nov 2021 21:45:20 +0000 (17:45 -0400)]
Add more explicit warning about dedup being dropped
"has unsupported feature: [number]" seems reasonable when we can't
know what the problem was, but with the send -D removal, we know
what it was, and can explicitly tell people "don't do that; try
this if you must".
So let's.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12708
Paul Dagnelie [Tue, 2 Nov 2021 16:23:48 +0000 (09:23 -0700)]
Fix cpu hotplug atomic sleep issue
We move the spinlock unlock before the thread creation. This should be
safe because the thread creation code doesn't actually manipulate any
taskq data structures; that's done by the thread once it's created.
We also remove the assertion that the maxthreads is the current threads
plus one; that assertion could fail if multiple hotplug events come in
quick succession, and the first new taskq thread hasn't had a chance to
start processing yet.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12714
Mike Swanson [Fri, 29 Oct 2021 23:59:18 +0000 (16:59 -0700)]
Disable normalization implicitly when setting "utf8only=off"
When a parent dataset has normalization set to any value other than
"none", and a file system is created with the property "utf8only=off",
implicitly also set "normalization=none" instead of overriding the
desire for a non-UTF8 enforcing file system.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11892
Closes #12038
Mark Johnston [Thu, 28 Oct 2021 17:25:26 +0000 (13:25 -0400)]
Exit the teardown section later in rename on FreeBSD
We have to hold the teardown lock while dereferencing zfsvfs->z_os and,
I believe, when committing to the ZIL.
Note that when jumping to the "out" label, "error" is always non-zero.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
Mark Johnston [Thu, 28 Oct 2021 15:58:57 +0000 (11:58 -0400)]
Fix potential use-after-frees in FreeBSD getpages and setattr VOPs
The objset object is reallocated during certain dataset operations, such
as rollbacks, so the objset pointer must be loaded after acquiring the
teardown lock.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
D. Ebdrup [Fri, 29 Oct 2021 23:30:44 +0000 (01:30 +0200)]
zfsprops.7: Add note about comma-separation
This change primarily seeks to make implicit documentation explicit, as
it is not outright stated that options should be comma-separated, nor is
there a reason given for it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #12579
Fedor Uporov [Fri, 29 Oct 2021 23:18:13 +0000 (16:18 -0700)]
Do not print UINT64_MAX value for some of zfs properties
The values of the following properties: filesystem_limit, filesystem_count,
snapshot_limit, and snapshot_count were returned to the user as UINT64_MAX
integers when the -p CLI option was used. Return 'none' instead.
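A minimal sketch of the mapping described above, written as a shell helper.
The name format_limit is hypothetical and is not part of the zfs CLI; the
sketch only assumes that the sentinel is UINT64_MAX (18446744073709551615)
and that every other value is passed through unchanged.

```shell
# Hypothetical helper, for illustration only: print 'none' for the
# UINT64_MAX sentinel and echo any other value as-is.
format_limit() {
    # Compare as a string; UINT64_MAX may overflow shell integer arithmetic.
    if [ "$1" = "18446744073709551615" ]; then
        echo "none"
    else
        echo "$1"
    fi
}

format_limit 18446744073709551615
format_limit 42
```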
Rich Ercolani [Fri, 29 Oct 2021 22:55:22 +0000 (18:55 -0400)]
Add explicit error for device_rebuild being disabled
Currently, you get back "can only attach to mirrors and top-level disks"
unconditionally if zpool attach returns ENOTSUP, but that also happens
if, say, feature@device_rebuild=disabled and you tried attach -s.
So let's print an error for that case, lest people go down a rabbit hole
looking into what they did wrong.
Tony Hutter [Fri, 29 Oct 2021 22:33:34 +0000 (15:33 -0700)]
vdev_id: Fix PHY sorting
One of our developers noticed a bug in vdev_id where we were incorrectly
sorting PHYs using alphabetical sorting (which usually works) instead
of natural sorting (-v). For example:
[port-0:0]# ls -d phy*
phy-0:10 phy-0:11 phy-0:8 phy-0:9
[port-0:0]# ls -vd phy*
phy-0:8 phy-0:9 phy-0:10 phy-0:11
This fixes the issue.
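The difference between the two orderings in the listing above can be
reproduced without any PHY devices. This is only a sketch: the function
names are hypothetical, and it assumes a sort(1) that supports -V (version
sort, as in GNU coreutils) rather than the `ls -v` shown in the listing.

```shell
# Hypothetical helpers, for illustration only.

# Plain byte-wise ordering: "10" sorts before "8" because '1' < '8'.
sort_phys_lexical() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Natural (version) ordering: digit runs are compared as numbers.
# Assumes sort supports -V.
sort_phys_natural() {
    printf '%s\n' "$@" | sort -V
}

sort_phys_lexical phy-0:10 phy-0:11 phy-0:8 phy-0:9
sort_phys_natural phy-0:10 phy-0:11 phy-0:8 phy-0:9
```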
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12699
Allan Jude [Tue, 26 Oct 2021 23:15:38 +0000 (19:15 -0400)]
spa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)
The fnvlist versions of the functions are fatal if they fail,
saving each call from having to include checking the result.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Allan Jude <allan@klarasystems.com>
Brian Behlendorf [Mon, 25 Oct 2021 21:13:50 +0000 (14:13 -0700)]
ZTS: Standardize use of destroy_dataset in cleanup
When cleaning up a test case standardize on using the convention:
datasetexists $ds && destroy_dataset $ds <flags>
By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensure tests are fully cleaned up and prevents false
positive test failures on Linux.
Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes. This was done to
clearly establish the expected convention.
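The retry behaviour mentioned above can be sketched as a generic wrapper.
This is not the ZTS implementation of destroy_dataset; retry_cmd and its
arguments are hypothetical, and the attempt count and delay are arbitrary.

```shell
# Hypothetical sketch: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 otherwise.
retry_cmd() {
    _attempts=$1
    _delay=$2
    shift 2
    _i=1
    while [ "$_i" -le "$_attempts" ]; do
        if "$@"; then
            return 0
        fi
        _i=$((_i + 1))
        # Only pause if another attempt is coming.
        if [ "$_i" -le "$_attempts" ]; then
            sleep "$_delay"
        fi
    done
    return 1
}

# In a cleanup function this might be used along the lines of:
#   datasetexists "$ds" && retry_cmd 5 1 zfs destroy "$ds"
```

A wrapper like this keeps a transiently busy volume from failing the cleanup
on the first attempt, while still reporting a failure if it never succeeds.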
Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12663
Rich Ercolani [Mon, 25 Oct 2021 17:27:05 +0000 (13:27 -0400)]
Workaround cloud-init hotplug issue
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.
So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12644
Closes #12669
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Alan Somers <asomers@gmail.com> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12668
Attila Fülöp [Thu, 21 Oct 2021 10:17:47 +0000 (12:17 +0200)]
pam_zfs_key: malloc and mlock/munlock won't match
mlock(2) and munlock(2) operate on memory pages whereas malloc(3)
does not. So if you munlock(2) a malloced memory region, the whole
page containing it is unlocked. Since this page may contain another
malloced and mlocked memory region, used as a password buffer by a
concurrently running instance of pam_zfs_key, there is a slight chance
of leaking passwords. By using mmap(2) we avoid such problems since
it will return whole pages on page aligned addresses.
Although the above concern may be mostly academic, it is still
better to use mmap(2) for allocating memory since the FreeBSD
documentation suggests calling mlock(2) and munlock(2) on page
aligned addresses, and other implementations even require it.
While here, remove duplicate code in alloc_pw_string() by calling
alloc_pw_size().
Reviewed-by: Felix Dörre <felix@dogcraft.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665