FreeBSD/FreeBSD.git
3 years agoLinux 5.12 compat: idmapped mounts
Coleman Kane [Sat, 20 Mar 2021 04:00:59 +0000 (00:00 -0400)]
Linux 5.12 compat: idmapped mounts

In Linux 5.12, the filesystem API was modified to support idmapped
mounts by adding a "struct user_namespace *" parameter to a number
of functions and VFS handlers. This change adds the needed autoconf
macros to detect the new interfaces and updates the code appropriately.
This change does not add support for idmapped mounts; instead it
preserves the existing behavior by passing the initial user namespace
where needed.  A subsequent commit will be required to add support
for idmapped mounts.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11712

3 years agoClean up RAIDZ/DRAID ereport code
Matthew Ahrens [Fri, 19 Mar 2021 23:22:10 +0000 (16:22 -0700)]
Clean up RAIDZ/DRAID ereport code

The RAIDZ and DRAID code is responsible for reporting checksum errors on
their child vdevs.  Checksum errors represent events where a disk
returned data or parity that should have been correct, but was not.  In
other words, these are instances of silent data corruption.  The
checksum errors show up in the vdev stats (and thus `zpool status`'s
CKSUM column), and in the event log (`zpool events`).

Note, this is in contrast with the more common "noisy" errors where a
disk goes offline, in which case ZFS knows that the disk is bad and
doesn't try to read it, or the device returns an error on the requested
read or write operation.

RAIDZ/DRAID generate checksum errors via three code paths:

1. When RAIDZ/DRAID reconstructs a damaged block, checksum errors are
reported on any children whose data was not used during the
reconstruction.  This is handled in `raidz_reconstruct()`.  This is the
most common type of RAIDZ/DRAID checksum error.

2. When RAIDZ/DRAID is not able to reconstruct a damaged block, that
means that the data has been lost.  The zio fails and an error is
returned to the consumer (e.g. the read(2) system call).  This would
happen if, for example, three different disks in a RAIDZ2 group are
silently damaged.  Since the damage is silent, it isn't possible to know
which three disks are damaged, so a checksum error is reported against
every child that returned data or parity for this read.  (For DRAID,
typically only one "group" of children is involved in each io.)  This
case is handled in `vdev_raidz_cksum_finish()`. This is the next most
common type of RAIDZ/DRAID checksum error.

3. If RAIDZ/DRAID is not able to reconstruct a damaged block (like in
case 2), but there happen to be additional copies of this block due to
"ditto blocks" (i.e. multiple DVAs in this blkptr_t), and one of those
copies is good, then RAIDZ/DRAID compares each sector of the data or
parity that it retrieved with the good data from the other DVA, and if
they differ then it reports a checksum error on this child.  This
differs from case 2 in that the checksum error is reported on only the
subset of children that actually have bad data or parity.  This case
happens very rarely, since normally only metadata has ditto blocks.  If
the silent damage is extensive, there will be many instances of case 2,
and the pool will likely be unrecoverable.

The code for handling case 3 is considerably more complicated than the
other cases, for two reasons:

1. It needs to run after the main raidz read logic has completed.  The
data RAIDZ read needs to be preserved until after the alternate DVA has
been read, which necessitates refcounts and callbacks managed by the
non-raidz-specific zio layer.

2. It's nontrivial to map the sections of data read by RAIDZ to the
correct data.  For example, the correct data does not include the parity
information, so the parity must be recalculated based on the correct
data, and then compared to the parity that was read from the RAIDZ
children.

Due to the complexity of case 3, the rareness of hitting it, and the
minimal benefit it provides above case 2, this commit removes the code
for case 3.  These types of errors will now be handled the same as case
2, i.e. the checksum error will be reported against all children that
returned data or parity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11735

3 years agoFreeBSD: make seqc asserts conditional on replay
Mateusz Guzik [Thu, 18 Mar 2021 05:09:45 +0000 (06:09 +0100)]
FreeBSD: make seqc asserts conditional on replay

Avoids tripping on asserts when doing pool recovery.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11739

3 years agoRemove unused rr_code
Matthew Ahrens [Thu, 18 Mar 2021 04:57:09 +0000 (21:57 -0700)]
Remove unused rr_code

The `rr_code` field in `raidz_row_t` is unused.

This commit removes the field, as well as the code that's used to set
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11736

3 years agoFreeBSD: Fix memory leaks in kstats
Ryan Moeller [Thu, 18 Mar 2021 04:55:18 +0000 (00:55 -0400)]
FreeBSD: Fix memory leaks in kstats

Don't (incorrectly) handle kmem_zalloc() failure.  With KM_SLEEP, it
will never return NULL.

Free the data allocated for non-virtual kstats when deleting the object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11767

3 years agoLinux: always check or verify return of igrab()
Adam D. Moss [Tue, 16 Mar 2021 23:33:34 +0000 (16:33 -0700)]
Linux: always check or verify return of igrab()

zhold() wraps igrab() on Linux, and igrab() may fail when the inode
is in the process of being deleted.  This means zhold() must only be
called when a reference exists and therefore it cannot be deleted.
This is the case for all existing consumers so add a VERIFY and a
comment explaining this requirement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11704

3 years agoUpdate FreeBSD versions
Dries Michiels [Tue, 16 Mar 2021 22:03:28 +0000 (23:03 +0100)]
Update FreeBSD versions

Update supported FreeBSD versions in documentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Dries Michiels <driesm.michiels@gmail.com>
Closes #11718

3 years agoHold and release permissions exist
gldisater [Tue, 16 Mar 2021 22:01:21 +0000 (18:01 -0400)]
Hold and release permissions exist

The man page was missing these two permissions.
Add the missing permissions to the man page.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Faulkner <gldisater@gldis.ca>
Closes #11727

3 years agoZTS: Add tests for DOS mode attributes
Ryan Moeller [Tue, 16 Mar 2021 22:00:14 +0000 (18:00 -0400)]
ZTS: Add tests for DOS mode attributes

Create a new section of tests to run with acltype=off.

For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11734

3 years agoReference_tracking_enable should be a module param
Don Brady [Tue, 16 Mar 2021 21:56:17 +0000 (15:56 -0600)]
Reference_tracking_enable should be a module param

To make use of zfs_refcount_held tunable it should be a module
parameter in open-zfs.  Also, since the macros will auto-generate OS
specific tunables, removed the existing zfs_refcount_held reference
in module/os/freebsd/zfs/sysctl_os.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11753

3 years agoZTS: Fix incorrect use of libtest in user_run by xattr_003_neg
Ryan Moeller [Mon, 9 Nov 2020 22:57:00 +0000 (17:57 -0500)]
ZTS: Fix incorrect use of libtest in user_run by xattr_003_neg

You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.

Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185

3 years agoZTS: Use ksh and current environment for user_run
Ryan Moeller [Thu, 11 Mar 2021 20:01:58 +0000 (15:01 -0500)]
ZTS: Use ksh and current environment for user_run

The current user_run often does not work as expected.  Commands are run
in a different shell, with a different environment, and all output is
discarded.

Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh.  Enhance the logging for
user_run so we can see out and err.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185

3 years agoFreeBSD: bring back possibility to rewind the checkpoint from bootloader
Mariusz Zaborski [Sat, 13 Mar 2021 00:12:14 +0000 (01:12 +0100)]
FreeBSD: bring back possibility to rewind the checkpoint from bootloader

Add parsing of the rewind options.

When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewound. When the FreeBSD repo
synced with OpenZFS, this part of the code was removed.

[1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
[2] OpenZFS repo: f2c027bd6a003ec5793f8716e6189c389c60f47a

External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11730

3 years agoFreeBSD: Clean up zfsdev_close to match Linux
Ryan Moeller [Sat, 13 Mar 2021 00:09:15 +0000 (19:09 -0500)]
FreeBSD: Clean up zfsdev_close to match Linux

Resolve some oddities in zfsdev_close() which could result in a
panic and were not present in the equivalent function for Linux.

- Remove unused definition ZFS_MIN_MINOR
- FreeBSD: Simplify zfsdev state destruction
- Assert zs_minor is valid in zfsdev_close
- Make locking around zfsdev state match Linux

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11720

3 years agoFreeBSD: switch teardown lock to rms
Mateusz Guzik [Wed, 4 Nov 2020 22:28:56 +0000 (17:28 -0500)]
FreeBSD: switch teardown lock to rms

This deserializes otherwise non-contending operations.

The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly. Check the pull request for sample results.
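
As a rough back-of-the-envelope illustration of why 17 hashed locks
collide quickly, the sketch below computes the chance that at least two
threads land on the same lock.  It assumes a uniform, independent hash,
which is an idealisation and not a measurement of the real hash on
curthread; `collision_probability` is a made-up helper name.

```python
from math import prod

def collision_probability(threads, locks=17):
    """Chance that at least two of `threads` share a lock.

    Assumes each thread is hashed uniformly and independently onto one
    of `locks` locks (an idealised model, not the kernel's actual hash).
    """
    if threads > locks:
        return 1.0  # pigeonhole: more threads than locks must collide
    # Probability that all threads land on distinct locks.
    all_distinct = prod((locks - i) / locks for i in range(threads))
    return 1.0 - all_distinct

for n in (2, 4, 6, 8, 12):
    print(f"{n:2d} threads: {collision_probability(n):.2f}")
```

Under this model the collision chance already exceeds one half with six
concurrent threads, which is in line with the claim that conflicts show
up very quickly.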

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoMacroify teardown lock handling
Mateusz Guzik [Wed, 4 Nov 2020 22:23:48 +0000 (17:23 -0500)]
Macroify teardown lock handling

This will allow platforms to implement it as they see fit, in particular
in a different manner than rrm locks.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoFreeBSD: rename teardown inactive macros to mimic rrm convention
Mateusz Guzik [Wed, 4 Nov 2020 22:19:35 +0000 (17:19 -0500)]
FreeBSD: rename teardown inactive macros to mimic rrm convention

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoFreeBSD: remove 2 assertions that teardown lock is not held
Mateusz Guzik [Thu, 12 Nov 2020 22:33:14 +0000 (17:33 -0500)]
FreeBSD: remove 2 assertions that teardown lock is not held

They are not very useful and hard to implement in the rms routine
the code is about to start using.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoFreeBSD: rework asserts in zfs_dd_lookup
Mateusz Guzik [Mon, 12 Oct 2020 21:27:59 +0000 (21:27 +0000)]
FreeBSD: rework asserts in zfs_dd_lookup

1. even up ifdefs
2. drop the arguably useless teardown lock asserts -- nothing else
   checks for it

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoAdd branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros
Mateusz Guzik [Thu, 15 Oct 2020 05:45:28 +0000 (05:45 +0000)]
Add branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros

They are expected to fail only in corner cases.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agozpool import cachefile improvements
George Wilson [Fri, 12 Mar 2021 23:42:27 +0000 (17:42 -0600)]
zpool import cachefile improvements

Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.

The goal of this change is to make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fall back to reading the
labels of the devices similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.
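
The fallback described above can be sketched roughly as follows.  This
is not the libzutil implementation: `import_from_cachefile`,
`try_import`, `read_labels` and the dict layout are hypothetical
stand-ins chosen only to show the control flow.

```python
import os

def import_from_cachefile(cached, try_import, read_labels):
    """Sketch of a cachefile import with a label-scan fallback.

    cached:      {"guid": int, "paths": [str, ...]} as read from the cachefile
    try_import:  callable(config) -> bool, stand-in for the real import
    read_labels: callable(directory) -> list of configs found on device labels
    """
    if try_import(cached):
        return cached
    # Fallback: look only in the directories the cachefile originally used.
    directories = sorted({os.path.dirname(p) for p in cached["paths"]})
    for directory in directories:
        for config in read_labels(directory):
            # Match by guid so that only pools from the cachefile are imported.
            if config["guid"] == cached["guid"] and try_import(config):
                return config
    return None
```

Restricting the scan to the cached directories and matching on guid is
what keeps the fallback narrower than a normal import.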

External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #11716

3 years agoFix whitespace introduced in ecc277cff
Martin Matuška [Fri, 12 Mar 2021 03:42:04 +0000 (04:42 +0100)]
Fix whitespace introduced in ecc277cff

The manual page change in ecc277c has introduced whitespace on
line ends.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11722

3 years agoFreeBSD: Fix scope of deadman tunables
Ryan Moeller [Fri, 12 Mar 2021 03:23:24 +0000 (22:23 -0500)]
FreeBSD: Fix scope of deadman tunables

A few deadman tunables ended up in the wrong sysctl node.

Move them to vfs.zfs.deadman.*

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11715

3 years agoMicrooptimizations for VERIFY() and friends
Adam D. Moss [Fri, 12 Mar 2021 01:16:09 +0000 (17:16 -0800)]
Microoptimizations for VERIFY() and friends

Add branch hints and constify the intermediate evaluations of
left/right params in VERIFY3*().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11708

3 years agoAdd missing files to Makefile
Allan Jude [Fri, 12 Mar 2021 01:13:34 +0000 (20:13 -0500)]
Add missing files to Makefile

Some .h files that were added were missed in this Makefile. Since
they are .h files, their absence only resulted in them
disappearing from the dist archive.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11705

3 years agoCI checkstyle: pin ubuntu version
George Melikov [Fri, 12 Mar 2021 01:11:31 +0000 (04:11 +0300)]
CI checkstyle: pin ubuntu version

Our checkstyle doesn't work well on Ubuntu 20.04;
temporarily pin it to 18.04.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11713

3 years agoReturn finer grain errors in libzfs unmount_one
Don Brady [Mon, 8 Mar 2021 16:46:45 +0000 (09:46 -0700)]
Return finer grain errors in libzfs unmount_one

Added errno mappings to unmount_one() in libzfs.  Changed do_unmount()
implementation to return errno errors directly, as is done for
do_mount() and others.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11681

3 years agovdev_id: Create symlinks even if no /dev/mapper/
Tony Hutter [Mon, 8 Mar 2021 16:43:30 +0000 (08:43 -0800)]
vdev_id: Create symlinks even if no /dev/mapper/

vdev_id uses the /dev/mapper/ symlinks to resolve a UUID to a dm name
(like dm-1).  However on some multipath setups, there is no /dev/mapper/
entry for the UUID at the time vdev_id is called by udev.  However,
this isn't necessarily needed, as we may be able to resolve the dm
name from the $DEVNAME that udev passes us (like DEVNAME="/dev/dm-1").

This patch tries to resolve the dm name from $DEVNAME first, before
falling back to looking in /dev/mapper/.  This fixed an issue where the
by-vdev names weren't reliably showing up on one of our nodes.
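
The lookup order can be sketched as below.  vdev_id itself is a shell
script; this Python version only illustrates the ordering, and
`resolve_dm_name` and its parameters are hypothetical names.

```python
import os

def resolve_dm_name(devname, uuid, mapper_dir="/dev/mapper"):
    """Return a dm name such as 'dm-1', or None if it cannot be resolved.

    devname: value of $DEVNAME as passed by udev, e.g. "/dev/dm-1"
    uuid:    name of the entry expected under mapper_dir
    """
    # First choice: take the dm name straight from $DEVNAME.
    base = os.path.basename(devname or "")
    if base.startswith("dm-"):
        return base
    # Fallback: follow the /dev/mapper/ symlink, if it exists yet.
    link = os.path.join(mapper_dir, uuid)
    if os.path.islink(link):
        return os.path.basename(os.path.realpath(link))
    return None

print(resolve_dm_name("/dev/dm-1", "example-uuid"))
```

Trying $DEVNAME first means the result no longer depends on whether the
/dev/mapper/ entry has been created by the time udev runs the script.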

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11698

3 years agoZTS events_002: Improve speed and reliability
Antonio Russo [Mon, 8 Mar 2021 16:42:45 +0000 (09:42 -0700)]
ZTS events_002: Improve speed and reliability

events_002 exercises the ZED, ensuring that it neither misses events
nor reports events twice.

On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle.  Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.

Instead of using a fixed timeout, wait for the expected final event
before returning.  Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.
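
The idea of waiting on a condition instead of sleeping for a fixed time
can be sketched like this.  The real test is written in ksh;
`wait_for_event` and `poll` are hypothetical names used only for
illustration.

```python
import time

def wait_for_event(poll, expected, timeout=60.0, interval=0.1):
    """Poll until `expected` shows up or `timeout` seconds have passed.

    poll: callable returning the events seen so far (any container).
    Returns True as soon as the event is seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if expected in poll():
            return True   # fast hardware: return as soon as it appears
        if time.monotonic() >= deadline:
            return False  # slow hardware: the timeout is only an upper bound
        time.sleep(interval)
```

The same helper covers both halves of the change: waiting for the
expected final event returns early on success, while waiting for an
unexpected event is expected to run into the timeout.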

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11703

3 years agozvol: call zil_replaying() during replay
Christian Schwarz [Sun, 7 Mar 2021 17:49:58 +0000 (18:49 +0100)]
zvol: call zil_replaying() during replay

zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx.  The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.

ZPL log entries are not idempotent and logically dependent and thus
calling zil_replaying() is necessary for correctness.

For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.

Thus, at a first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset O.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".

If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".

If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.
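
The scenario can be checked with a small simulation.  This is a toy
model under the assumptions stated above (one record per LWB, an
invalid checksum ends the chain); `replay`, the tuple layout and the
dict-based zvol are illustrative and do not mirror real ZIL structures.

```python
# Each log record is (number, offset, data, checksum_valid).

def replay(zvol, chain, start):
    """Replay records numbered >= start; stop at the first bad checksum."""
    for number, offset, data, valid in chain:
        if number < start:
            continue
        if not valid:
            break  # an invalid checksum is taken as the end of the chain
        zvol[offset] = data
    return zvol

def content(zvol):
    return "".join(zvol[offset] for offset in sorted(zvol))

chain = [(1, 1, "A", True), (2, 1, "B", True),
         (3, 2, "X", True), (4, 3, "Z", True)]

# Replay 1, 2 and 3 in one txg, sync that txg, then crash.
synced = replay({}, chain[:3], start=1)

# Bit flips corrupt records 2, 3 and 4.
corrupted = [(n, o, d, n == 1) for n, o, d, _ in chain]

# Without zil_replaying(): the header still points at 1, replay restarts there.
without_progress = replay(dict(synced), corrupted, start=1)

# With zil_replaying(): the header points at 3, which is corrupt.
with_progress = replay(dict(synced), corrupted, start=3)

print(content(without_progress))  # AX
print(content(with_progress))     # BX
```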

This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11667

3 years agoZTS: Improve cleanup in zpool tests
Ryan Moeller [Sun, 7 Mar 2021 17:41:01 +0000 (12:41 -0500)]
ZTS: Improve cleanup in zpool tests

* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11694

3 years agoClarify compressed zfs send/recv behavior
manfromafar [Sun, 7 Mar 2021 17:39:16 +0000 (10:39 -0700)]
Clarify compressed zfs send/recv behavior

Docs for send and receive do not explain behavior when sending a
compressed stream then receiving on a host that overrides compression
with -o compress=value.

The data from the send stream is written as it was sent, in its
compressed form, but the compression algorithm set on the receiver
is the overridden value, which causes some confusion as to what
algorithm was actually used.

Updated man docs to clarify behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed By: Allan Jude <allanjude@freebsd.org>
Signed-off-by: manfromafar <manfromafar@outlook.com>
Closes #11690

3 years agoIntentionally allow ZFS_READONLY in zfs_write
Ryan Moeller [Sun, 7 Mar 2021 17:31:52 +0000 (12:31 -0500)]
Intentionally allow ZFS_READONLY in zfs_write

ZFS_READONLY represents the "DOS R/O" attribute.
When that flag is set, we should behave as if write access
were not granted by anything in the ACL.  In particular:
We _must_ allow writes after opening the file r/w, then
setting the DOS R/O attribute, and writing some more.
(Similar to how you can write after fchmod(fd, 0444).)

Restore these semantics which were lost on FreeBSD when refactoring
zfs_write.  To my knowledge Linux does not actually expose this flag,
but we'll need it to eventually so I've added the supporting checks.
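
The fchmod() comparison above can be tried on a POSIX system with the
sketch below.  It demonstrates only the generic POSIX behaviour the
message refers to, not the ZFS_READONLY flag itself; the file name is
arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.write(fd, b"before ")
        os.fchmod(fd, 0o444)      # drop write permission on the file
        os.write(fd, b"after")    # still allowed: access was checked at open()
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)
    finally:
        os.close(fd)

print(data)  # b'before after'
```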

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11693

3 years agoSuppress cppcheck invalidSyntax warnings
Brian Behlendorf [Sat, 6 Mar 2021 01:56:35 +0000 (17:56 -0800)]
Suppress cppcheck invalidSyntax warnings

For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source.  The same
warning is not reported by cppcheck 2.3, nor is there any evident
problem with the expanded macro.  This appears to be an issue with
this version of cppcheck.  This commit annotates the source to suppress
the warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11700

3 years agoInitialize ZIL buffers
Brian Behlendorf [Fri, 5 Mar 2021 22:45:13 +0000 (14:45 -0800)]
Initialize ZIL buffers

When populating a ZIL destination buffer ensure it is always
zeroed before its contents are constructed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11687

3 years agoFix abd_get_offset_struct() may allocate new abd
Jorgen Lundman [Fri, 5 Mar 2021 20:22:57 +0000 (05:22 +0900)]
Fix abd_get_offset_struct() may allocate new abd

Even when supplied with an abd to abd_get_offset_struct(), the call
to abd_get_offset_impl() can allocate a different abd. Be sure to
call abd_fini_struct() on the abd that is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #11683

3 years agoFreeBSD module --enable-debug --enable-invariants
Ryan Moeller [Fri, 5 Mar 2021 20:16:41 +0000 (15:16 -0500)]
FreeBSD module --enable-debug --enable-invariants

Wire up the --enable-debug flag for configure to the FreeBSD module
build.  Add --enable-invariants.

The running FreeBSD kernel config is used to detect whether to enable
INVARIANTS if not explicitly specified with --enable-invariants or
--disable-invariants.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11678

3 years agozpool: use tab to indent continuation from removal status
Thomas Lamprecht [Fri, 5 Mar 2021 20:15:35 +0000 (21:15 +0100)]
zpool: use tab to indent continuation from removal status

Bring the output of the removal status in line with the other
"fields" that zpool status outputs, and thus allow a parser to
more easily detect this as a continuation of the 'remove:' output.

Before:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
    776K memory used for removed device mappings

Now:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
	776K memory used for removed device mappings

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Closes #11674

3 years agoDon't bomb out when using keylocation=file://
James Wah [Wed, 3 Mar 2021 16:28:49 +0000 (03:28 +1100)]
Don't bomb out when using keylocation=file://

Avoid following the error path when the operation in fact succeeded.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Wah <james@laird-wah.net>
Closes #11651

3 years agolinux: zvol: avoid heap allocation for zvol_request_sync=1
Christian Schwarz [Wed, 3 Mar 2021 16:15:28 +0000 (17:15 +0100)]
linux: zvol: avoid heap allocation for zvol_request_sync=1

The spl_kmem_alloc showed up in some flamegraphs in a single-threaded
4k sync write workload at 85k IOPS on an
Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz.
Certainly not a huge win but I believe the change is clean and
easy to maintain down the road.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11666

3 years agoAdd "zstd-fast" to help options for "compression" property
Jake Howard [Wed, 3 Mar 2021 16:14:19 +0000 (16:14 +0000)]
Add "zstd-fast" to help options for "compression" property

This value does work as expected, and is documented in the manpage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jake Howard <git@theorangeone.net>
Closes #11670

3 years agoCancel TRIM / initialize on FAULTED non-writeable vdevs
nssrikanth [Tue, 2 Mar 2021 18:27:27 +0000 (23:57 +0530)]
Cancel TRIM / initialize on FAULTED non-writeable vdevs

When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization.  When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11588

3 years agoFix assert in FreeBSD-specific dmu_read_pages
Andriy Gapon [Sun, 28 Feb 2021 01:23:09 +0000 (03:23 +0200)]
Fix assert in FreeBSD-specific dmu_read_pages

The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages.  All three pieces had an
assert to ensure that the page is not mapped.  Later the assert was
relaxed to require that the page is not mapped for writing.  But that
was done in two places out of three.  This change fixes the third piece,
read-ahead.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11654

3 years agoZTS: zpool_trim_start_and_cancel_pos.ksh
Brian Behlendorf [Sun, 28 Feb 2021 01:19:50 +0000 (17:19 -0800)]
ZTS: zpool_trim_start_and_cancel_pos.ksh

Several of the TRIM tests were based off the initialize tests and
then adapted for TRIM.  The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted.  Update it accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11649

3 years agoAdd missing checks for unsupported features
Martin Matuška [Sun, 28 Feb 2021 01:16:02 +0000 (02:16 +0100)]
Add missing checks for unsupported features

After 35ec517 it has become possible to import ZFS pools with an
active org.illumos:edonr feature on FreeBSD, leading to a panic.

In addition, "zpool status" reported all pools without edonr
as upgradable and "zpool upgrade -v" reported edonr in the list
of upgradable features.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11653

3 years agoLinux 5.12 compat: replace bio_*_io_acct with disk_*_io_acct
Coleman Kane [Tue, 23 Feb 2021 02:18:41 +0000 (21:18 -0500)]
Linux 5.12 compat: replace bio_*_io_acct with disk_*_io_acct

The bio_*_io_acct functions became GPL exports, which causes the
kernel modules to refuse to compile. This replaces code with
alternate function calls to the disk_*_io_acct interfaces, which
are not GPL exports. This change was added in kernel commit
99dfc43ecbf67f12a06512918aaba61d55863efc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639

3 years agoLinux 5.12 compat: bio->bi_disk member moved
Coleman Kane [Tue, 23 Feb 2021 02:07:51 +0000 (21:07 -0500)]
Linux 5.12 compat: bio->bi_disk member moved

The struct bio member bi_disk was moved underneath a new member named
bi_bdev. So all attempts to reference bio->bi_disk need to now become
bio->bi_bdev->bd_disk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639

3 years agoFix vdev_rebuild_thread deadlock
Brian Behlendorf [Wed, 24 Feb 2021 18:01:00 +0000 (10:01 -0800)]
Fix vdev_rebuild_thread deadlock

The metaslab_disable() call may block waiting for a txg sync.
Therefore it's important that vdev_rebuild_thread release the
SCL_CONFIG read lock it is holding before this call.  Failure
to do so can result in the txg_sync thread getting blocked
waiting for this lock which results in a deadlock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11647

3 years agoFix overly broad locking in spa_vdev_config_exit()
Brian Behlendorf [Wed, 24 Feb 2021 18:00:21 +0000 (10:00 -0800)]
Fix overly broad locking in spa_vdev_config_exit()

Calling vdev_free() only requires that we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks.  In particular, we need
to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock.  The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reader when it detects there's a pending writer.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11585

3 years agovdev_id: Fix partition regular expression
Tony Hutter [Wed, 24 Feb 2021 17:58:46 +0000 (09:58 -0800)]
vdev_id: Fix partition regular expression

Given a DM device name, the old vdev_id script would extract any text
after a 'p' as the partition number.  It then appends "-part" + the
partition number to the name, giving a by-vdev name like "L0-part5".

This works fine if the DM name is like 'dm-2p5', but doesn't work if
the DM name is a multipath name like "mpatha".  In those cases it
incorrectly matches the 'p' in "mpatha", giving by-vdev names like
"L0-partatha".

This patch fixes the issue by making the partition regex match stricter.
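
To illustrate what a stricter match can look like, here is a sketch
using POSIX regex in C.  The vdev_id script itself is shell, and the
pattern and the dm_part_suffix() name below are assumptions made for
the example, not the expression used by the patch:

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "-partN" into out when name ends in p<digits>; return 1 on a
 * match and 0 otherwise.  The pattern is illustrative only.
 */
static int
dm_part_suffix(const char *name, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m[2];
	int matched = 0;

	/* Require a non-letter before 'p' and only digits after it. */
	if (regcomp(&re, "[^[:alpha:]]p([0-9]+)$", REG_EXTENDED) != 0)
		return (0);
	if (regexec(&re, name, 2, m, 0) == 0) {
		snprintf(out, outlen, "-part%.*s",
		    (int)(m[1].rm_eo - m[1].rm_so), name + m[1].rm_so);
		matched = 1;
	}
	regfree(&re);
	return (matched);
}
```

With this pattern 'dm-2p5' yields "-part5" while "mpatha" yields no
partition suffix at all.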

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11637

3 years agoLinux: increase max nvlist_src size
Brian Behlendorf [Wed, 24 Feb 2021 17:57:18 +0000 (09:57 -0800)]
Linux: increase max nvlist_src size

On Linux increase the maximum allowed size of the src nvlist which
can be passed to the /dev/zfs ioctl.  Originally, this was set
to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd.
Since that time it's been converted to a vmalloc so that's no
longer a hard limit, and it's desirable for `zfs send/recv` to
allow larger nvlists so more snapshots can be sent at once.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6572
Closes #11638

3 years agoAdd upper bound for slop space calculation
Prakash Surya [Wed, 24 Feb 2021 17:52:43 +0000 (09:52 -0800)]
Add upper bound for slop space calculation

This change modifies the behavior of how we determine how much slop
space to use in the pool, such that now it has an upper limit. The
default upper limit is 128G, but is configurable via a tunable.
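
A minimal sketch of a clamped slop calculation.  Only the 128G ceiling
comes from the description above; the 1/32 shift, the 128M floor, and
the names SLOP_SHIFT, SLOP_MIN, SLOP_MAX and slop_space() are
assumptions chosen for the example:

```c
#include <stdint.h>

#define SLOP_SHIFT	5			/* assumed: 1/32 of the pool */
#define SLOP_MIN	(128ULL << 20)		/* assumed floor: 128M */
#define SLOP_MAX	(128ULL << 30)		/* default ceiling: 128G */

static uint64_t
slop_space(uint64_t space)
{
	uint64_t slop = space >> SLOP_SHIFT;
	/* The floor itself never exceeds half of the pool. */
	uint64_t lo = (space >> 1) < SLOP_MIN ? (space >> 1) : SLOP_MIN;

	if (slop < lo)
		slop = lo;		/* never below the floor... */
	if (slop > SLOP_MAX)
		slop = SLOP_MAX;	/* ...and now never above the cap */
	return (slop);
}
```

Under these assumptions a 1T pool still reserves 32G, while a 100T
pool reserves 128G instead of roughly 3T.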

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #11023

3 years agoWrap bare EINVAL returns with SET_ERROR
Ryan Moeller [Wed, 24 Feb 2021 17:51:10 +0000 (12:51 -0500)]
Wrap bare EINVAL returns with SET_ERROR

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11636

3 years agoForce symlink creation for zpool.d compat links
Ryan Moeller [Wed, 24 Feb 2021 17:49:59 +0000 (12:49 -0500)]
Force symlink creation for zpool.d compat links

gmake install fails when zpool.d compat links already exist.

Force the symlinks to be recreated if already present.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11633

3 years agosend_iterate_snap : doall send without fromsnap
Cedric Maunoury [Wed, 24 Feb 2021 17:48:58 +0000 (18:48 +0100)]
send_iterate_snap : doall send without fromsnap

The behavior of a NULL fromsnap was inadvertently changed for a doall
send when the send/recv logic in libzfs was updated.  Restore the
previous behavior by correcting send_iterate_snap() to include all
the snapshots in the nvlist for this case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com>
Closes #11608

3 years agoFix error message when zfs module are already unloaded
Adam D. Moss [Sun, 21 Feb 2021 04:23:10 +0000 (20:23 -0800)]
Fix error message when zfs module are already unloaded

Using zfs-sh -u on Linux will fail with an inaccurate message when the
zfs modules are already unloaded.  Deal with the case where a module
is already unloaded; its USE_COUNT will be the empty string.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11627

3 years agovdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL
fbynite [Sun, 21 Feb 2021 04:19:20 +0000 (19:19 -0900)]
vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL

This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.

When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.
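
The guard can be sketched as follows.  The types are a mock subset of
the real vdev structures, and disk_ops, hole_ops, vdev_hold_one() and
vdev_rele_one() are hypothetical names for the example:

```c
#include <stddef.h>

/* Mock subset of a vdev ops table; field names follow the text above. */
typedef struct vdev vdev_t;
typedef struct vdev_ops {
	void (*vdev_op_hold)(vdev_t *);
	void (*vdev_op_rele)(vdev_t *);
} vdev_ops_t;

struct vdev {
	const vdev_ops_t *vdev_ops;
	int holds;
};

static void disk_hold(vdev_t *vd) { vd->holds++; }
static void disk_rele(vdev_t *vd) { vd->holds--; }

static const vdev_ops_t disk_ops = { disk_hold, disk_rele };
static const vdev_ops_t hole_ops = { NULL, NULL };	/* like vdev_hole_ops */

/* Only call the handler when the ops table actually provides one. */
static void
vdev_hold_one(vdev_t *vd)
{
	if (vd->vdev_ops->vdev_op_hold != NULL)
		vd->vdev_ops->vdev_op_hold(vd);
}

static void
vdev_rele_one(vdev_t *vd)
{
	if (vd->vdev_ops->vdev_op_rele != NULL)
		vd->vdev_ops->vdev_op_rele(vd);
}
```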

This bug has been reported in illumos and FreeBSD, although the
FreeBSD report has a different trigger.

Credit for this patch goes to Patrick Mooney <pmooney@pfmooney.com>

Obtained from: illumos-gate commit: c65bd18728f34725
External-issue: https://www.illumos.org/issues/12981
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes #11623

3 years agoBetter zfs_get_enclosure_sysfs_path() enclosure support
Tony Hutter [Sun, 21 Feb 2021 04:17:45 +0000 (20:17 -0800)]
Better zfs_get_enclosure_sysfs_path() enclosure support

A multipathed disk will have several 'underlying' paths to the disk.  For
example, multipath disk 'dm-0' may be made up of paths:
/dev/{sda,sdb,sdc,sdd}.  On many enclosures those underlying sysfs
paths will have a symlink back to their enclosure device entry
(like 'enclosure_device0/slot1').  This is used by the
statechange-led.sh script to set/clear the fault LED for a disk, and
by 'zpool status -c'.

However, on some enclosures, those underlying paths may not all have
symlinks back to the enclosure device.  Maybe only two out of four
of them might.

This patch updates zfs_get_enclosure_sysfs_path() to favor returning
paths that have symlinks back to their enclosure devices, rather
than just returning the first path.
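
The selection rule can be sketched like this.  The struct and
pick_enclosure_path() are hypothetical, and falling back to the first
path when no path has a symlink is an assumption of the example:

```c
#include <stddef.h>

/* Mock record for one underlying path of a multipath device. */
struct under_path {
	const char *sysfs_path;
	int has_enclosure_link;		/* nonzero if the symlink exists */
};

/*
 * Return the first path that links back to an enclosure; fall back to
 * the first path overall when none do (NULL for an empty list).
 */
static const char *
pick_enclosure_path(const struct under_path *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (p[i].has_enclosure_link)
			return (p[i].sysfs_path);
	}
	return (n > 0 ? p[0].sysfs_path : NULL);
}
```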

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11617

3 years agoCleaning up uio headers
Brian Atkinson [Sun, 21 Feb 2021 04:16:50 +0000 (21:16 -0700)]
Cleaning up uio headers

Making uio_impl.h the common header interface between Linux and FreeBSD
so both OS's can share a common header file. This also helps reduce code
duplication for zfs_uio_t for each OS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11622

3 years agoztest: propagate -o to the zdb child process
Christian Schwarz [Thu, 18 Feb 2021 11:20:09 +0000 (12:20 +0100)]
ztest: propagate -o to the zdb child process

I think this is the behavior that most users expect.

Future work: have a separate flag, e.g., -O, to specify separate
set_global_vars for the zdb child than for the ztest children.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602

3 years agoztest: fix -o by calling set_global_var in child processes
Christian Schwarz [Tue, 16 Feb 2021 10:14:44 +0000 (11:14 +0100)]
ztest: fix -o by calling set_global_var in child processes

Without set_global_var() in the child processes the -o option is of
little use.

Before this change set_global_var() was called as a side-effect of
getopt processing which only happens for the parent ztest process.

This change limits the set of options that can be set and makes them
available to the child through ztest_shared_opts_t.

Future work: support arbitrary option count and length.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602

3 years agolibzpool: set_global_var: refactor to not modify 'arg'
Christian Schwarz [Tue, 16 Feb 2021 11:27:48 +0000 (12:27 +0100)]
libzpool: set_global_var: refactor to not modify 'arg'

Also fixes leak of the dlopen handle in the error case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602

3 years agolibzpool: set_global_var: fix endianness handling (fixes zdb -o )
Christian Schwarz [Mon, 15 Feb 2021 12:02:32 +0000 (13:02 +0100)]
libzpool: set_global_var: fix endianness handling (fixes zdb -o )

Without this patch I get the error

  Setting global variables is only supported on little-endian systems

when using `zdb -o` on my amd64 machine.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602

3 years agoRestore FreeBSD resource usage accounting
Ryan Moeller [Sat, 20 Feb 2021 06:34:33 +0000 (01:34 -0500)]
Restore FreeBSD resource usage accounting

Add zfs_racct_* interfaces for platform-dependent read/write accounting.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11613

3 years agoChecksum errors may not be counted
Don Brady [Sat, 20 Feb 2021 06:33:15 +0000 (23:33 -0700)]
Checksum errors may not be counted

Fix regression seen in issue #11545 where checksum errors
were not being counted or showing up in a zpool event.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11609

3 years agoFreeBSD: disable the use of hardware crypto offload drivers for now
Mark Johnston [Thu, 18 Feb 2021 23:51:20 +0000 (18:51 -0500)]
FreeBSD: disable the use of hardware crypto offload drivers for now

First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed.  This
is only a problem for asynchronous drivers.  Second, some hardware
drivers have input constraints which ZFS does not satisfy.  For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes.  FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.

The plan is to implement such a fallback mechanism, but with FreeBSD
13.0 approaching we should simply disable the use of hardware drivers for
now.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #11612

3 years agoFix report_mount_progress never calling set_progress_header
Andriy Gapon [Thu, 18 Feb 2021 21:53:05 +0000 (23:53 +0200)]
Fix report_mount_progress never calling set_progress_header

That happens because of an off-by-one mistake.
share_mount_one_cb() calls report_mount_progress(current=sm_done) after
having incremented sm_done by one.  Then report_mount_progress()
increments the parameter again.  It appears that that logic became
obsolete after commit a10d50f999511, parallel zfs mount.

On FreeBSD I observe that zfs mount -a -v prints, for example,
    (null): (201/248)
That happens because set_progress_header() is never called.

With this change the output becomes correct:
    Mounting ZFS filesystems: (209/248)
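
A simplified model of the off-by-one, not the libzfs source.  The
enum and the report_old()/report_new() names are made up for the
example; each function returns which progress action would be taken
for a given 1-based count:

```c
enum progress_action { PROGRESS_HEADER, PROGRESS_UPDATE, PROGRESS_FINISH };

/* Old logic: bumps a counter the caller has already incremented. */
static enum progress_action
report_old(int current, int total)
{
	++current;	/* caller passes 1..n, so this yields 2..n+1 */
	if (current == 1)
		return (PROGRESS_HEADER);	/* unreachable in practice */
	if (current == total)
		return (PROGRESS_FINISH);
	return (PROGRESS_UPDATE);
}

/* Fixed logic: treat the argument as the 1-based count it already is. */
static enum progress_action
report_new(int current, int total)
{
	if (current == 1)
		return (PROGRESS_HEADER);
	if (current == total)
		return (PROGRESS_FINISH);
	return (PROGRESS_UPDATE);
}
```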

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11607

3 years agoRemove unused abd_alloc_scatter_offset_chunkcnt
Ryan Libby [Thu, 18 Feb 2021 05:39:13 +0000 (21:39 -0800)]
Remove unused abd_alloc_scatter_offset_chunkcnt

Remove function that became unused after refactoring in
e2af2acce3436acdb2b35fdc7c9de1a30ea85514.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11614

3 years agoAdd "compatibility" property for zpool feature sets
Colm [Thu, 18 Feb 2021 05:30:45 +0000 (05:30 +0000)]
Add "compatibility" property for zpool feature sets

Property to allow sets of features to be specified; for compatibility
with specific versions / releases / external systems. Influences
the behavior of 'zpool upgrade' and 'zpool create'. Initial man
page changes and test cases included.

Brief synopsis:

zpool create -o compatibility=off|legacy|file[,file...] pool vdev...

compatibility = off : disable compatibility mode (enable all features)
compatibility = legacy : request that no features be enabled
compatibility = file[,file...] : read features from specified files.
Only features present in *all* files will be enabled on the
resulting pool. Filenames may be absolute, or relative to
/etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc
checked first).
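
The "present in *all* files" rule is a set intersection.  A sketch,
assuming each file has already been parsed into a list of feature
names; struct feature_set, set_has() and feature_enabled() are
hypothetical and are not the libzfs interface listed below:

```c
#include <stddef.h>
#include <string.h>

/* A feature set as parsed from one (hypothetical) compatibility file. */
struct feature_set {
	const char **names;
	size_t count;
};

static int
set_has(const struct feature_set *s, const char *name)
{
	for (size_t i = 0; i < s->count; i++) {
		if (strcmp(s->names[i], name) == 0)
			return (1);
	}
	return (0);
}

/* A feature is enabled only if every listed file contains it. */
static int
feature_enabled(const struct feature_set *sets, size_t nsets,
    const char *name)
{
	for (size_t i = 0; i < nsets; i++) {
		if (!set_has(&sets[i], name))
			return (0);
	}
	return (1);
}
```

For example, if one file lists lz4_compress and edonr but a second
file lists only lz4_compress, then only lz4_compress is enabled.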

Only affects zpool create, zpool upgrade and zpool status.

ABI changes in libzfs:

* New function "zpool_load_compat" to load and parse compat sets.
* Add "zpool_compat_status_t" typedef for compatibility parse status.
* Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum
* Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum

An initial set of base compatibility sets are included in
cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is
modified to install these in $pkgdatadir/compatibility.d and to
create symbolic links to a reasonable set of aliases.

Reviewed-by: ericloewe
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11468

3 years agoFreeBSD: disable edonr in zfs_mod_supported_feature()
Brian Behlendorf [Wed, 17 Feb 2021 16:14:51 +0000 (08:14 -0800)]
FreeBSD: disable edonr in zfs_mod_supported_feature()

Rather than conditionally compiling out the edonr code for FreeBSD
update zfs_mod_supported_feature() to indicate this feature is
unsupported.  This ensures that all spa features are defined on
every platform, even if they are not supported.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11605
Issue #11468

3 years agoSupport uClibc for the tests compilations
José Luis Salvador Rufo [Wed, 17 Feb 2021 05:51:46 +0000 (06:51 +0100)]
Support uClibc for the tests compilations

Two issues prevent ZFS from being compiled with uClibc:
`backtrace()`, and `program_invocation_short_name` as a `const`.
This patch adds uClibc to the conditionals in the same way as is
already done for Glibc for `backtrace()`, and removes the external
param `program_invocation_short_name` because it's only used here in
the whole project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #11600

3 years agoMake inline ABD predicates compatible with C++
Ryan Moeller [Mon, 15 Feb 2021 18:15:50 +0000 (13:15 -0500)]
Make inline ABD predicates compatible with C++

FreeBSD's zfsd fails to build after e2af2acce3 due to strict type
checking errors from the implicit conversion between bool and boolean_t
in the inline predicate definitions in abd.h.

Use conditionals to return the correct value type from these functions.
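
A sketch of the pattern, using mock definitions.  The boolean_t enum,
the flag value and the abd_t layout here are simplified stand-ins for
the real headers:

```c
/* Mock definitions; the real ones live in the ZFS headers. */
typedef enum { B_FALSE = 0, B_TRUE = 1 } boolean_t;

#define ABD_FLAG_LINEAR	(1 << 0)	/* illustrative flag value */

typedef struct abd { unsigned int abd_flags; } abd_t;

/*
 * Returning the bare (flags & X) != 0 expression yields a bool/int,
 * which a C++ compiler will not implicitly convert to the boolean_t
 * enum.  Selecting an enumerator explicitly is valid in C and C++.
 */
static inline boolean_t
abd_is_linear(const abd_t *abd)
{
	return ((abd->abd_flags & ABD_FLAG_LINEAR) ? B_TRUE : B_FALSE);
}
```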

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11592

3 years agoLinux 5.11 compat: META
Brian Behlendorf [Wed, 10 Feb 2021 18:11:21 +0000 (10:11 -0800)]
Linux 5.11 compat: META

Increase the Linux-Maximum version in the META file to 5.11.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11586

3 years agovdev_id: Support daisy-chained JBODs in multipath mode
Arshad Hussain [Tue, 9 Feb 2021 21:04:09 +0000 (02:34 +0530)]
vdev_id: Support daisy-chained JBODs in multipath mode

Within function sas_handler() userspace commands like
'/usr/sbin/multipath' have been replaced with sourcing
device details from within sysfs which reduced a
significant amount of overhead and processing time.
Multiple JBOD enclosures and their order are sourced
from the bsg driver (/sys/class/enclosure) to isolate
chassis top-level expanders, which are then dynamically
indexed based on host channel of the multipath subordinate
disk member device being processed. Additionally added a
"mixed" mode for slot identification for environments where
a ZFS server system may contain SAS disk slots where there
is no expander (direct connect to HBA) while an attached
external JBOD with an expander have different slot identifier
methods.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~

Testing was performed on an AMD EPYC based dual-server
high-availability multipath environment with multiple
HBAs per ZFS server and four SAS JBODs. The two primary
JBODs were multipath/cross-connected between the two
ZFS-HA servers. The secondary JBODs were daisy-chained
off of the primary JBODs using aligned SAS expander
channels (JBOD-0 expanderA--->JBOD-1 expanderA,
          JBOD-0 expanderB--->JBOD-1 expanderB, etc).
Pools were created, exported and re-imported, imported
globally with 'zpool import -a -d /dev/disk/by-vdev'.
Low level udev debug outputs were traced to isolate
and resolve errors.

Result:
~~~~~~~

Initial testing of a previous version of this change
showed how reliance on userspace utilities like
'/usr/sbin/multipath' and '/usr/bin/lsscsi' were
exacerbated by increasing numbers of disks and JBODs.
With four 60-disk SAS JBODs and 240 disks the time to
process a udevadm trigger was 3 minutes 30 seconds
during which nearly all CPU cores were above 80%
utilization. By switching reliance on userspace
utilities to sysfs in this version, the udevadm
trigger processing time was reduced to 12.2 seconds
and negligible CPU load.

This patch also fixes a few shellcheck complaints.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11526

3 years agoRename zfs_inode_update to zfs_znode_update_vfs
khng300 [Tue, 9 Feb 2021 19:17:29 +0000 (03:17 +0800)]
Rename zfs_inode_update to zfs_znode_update_vfs

zfs_znode_update_vfs is a more platform-agnostic name than
zfs_inode_update. Besides that, the function's prototype is moved to
include/sys/zfs_znode.h as the function is also used in common code.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ka Ho Ng <khng300@gmail.com>
Sponsored by: The FreeBSD Foundation
Closes #11580

3 years agoAdd an assert to clarify code
Kleber Tarcísio [Tue, 9 Feb 2021 19:14:59 +0000 (16:14 -0300)]
Add an assert to clarify code

The first time through the loop prevdb and prevhdl are NULL.  They
are then both set, but only prevdb is checked.  Add an ASSERT to
make it clear that prevhdl must be set when prevdb is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kleber <klebertarcisio@yahoo.com.br>
Closes #10754
Closes #11575

3 years agoSet file mode during zfs_write
Antonio Russo [Mon, 8 Feb 2021 17:15:05 +0000 (10:15 -0700)]
Set file mode during zfs_write

3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD.  After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode.  On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.

3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).

The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11474
Closes #11576

3 years agozfs-import-{cache,scan}: change condition to FileNotEmpty
наб [Fri, 5 Feb 2021 19:25:22 +0000 (20:25 +0100)]
zfs-import-{cache,scan}: change condition to FileNotEmpty

When all pools are exported ZFS will generate an empty cache file.
This will cause the import service to fail, which is sub-optimal,
since this means that dracut fails, and it is necessary to run
`zpool import -a` to boot, delete the file, and regenerate+reinstall
the initrd.

This resolves the issue by treating a zero-length cache file the
same as a missing cache file.  This aligns the behavior with that
of the `zpool` command itself.
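
The change itself is a condition in the systemd units; the C sketch
below only illustrates the "empty means missing" semantics, and
cache_file_usable() is a hypothetical name:

```c
#include <sys/stat.h>

/*
 * Return 1 only when path exists and is non-empty; a zero-length file
 * is reported the same way as a missing one.
 */
static int
cache_file_usable(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return (0);		/* missing (or unreadable) */
	return (st.st_size > 0);	/* present but empty counts as missing */
}
```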

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11568

3 years agoFixed issue with processing of EC_dev_remove event
nssrikanth [Fri, 5 Feb 2021 16:30:50 +0000 (22:00 +0530)]
Fixed issue with processing of EC_dev_remove event

The pool guid and vdev guid received by zfs_agent_post_event(),
which calls zfs_retire_recv(), are normally non-zero.  However,
later in this same method they may be unconditionally reset to
zero by the code which is intended to handle multipath, spare
and l2arc vdevs.  This will result in the EC_dev_remove not
being handled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11564

3 years agozfs-list.8: clarify listing snapshots
Brian Behlendorf [Thu, 4 Feb 2021 17:56:28 +0000 (09:56 -0800)]
zfs-list.8: clarify listing snapshots

Clarify how to include snapshots in the `zfs list` output by
referencing the full name of the `listsnapshots` pool property,
and the `zfs list -t snapshot` option.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11562
Closes #11565

3 years agoDocument monotonicity of dmu_tx_assign() and txg_hold_open()
Christian Schwarz [Mon, 25 Jan 2021 12:13:45 +0000 (13:13 +0100)]
Document monotonicity of dmu_tx_assign() and txg_hold_open()

Expand the comments to make it clear exactly what is guaranteed
by dmu_tx_assign() and txg_hold_open().  Additionally, update
the comment which refers to txg_exit() when it should reference
txg_rele_to_sync().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11521

3 years agozts-report.py: ignore some skipped tests in Github CI
George Melikov [Wed, 27 Jan 2021 12:18:01 +0000 (15:18 +0300)]
zts-report.py: ignore some skipped tests in Github CI

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554

3 years agoCI: add ubuntu-* functional tests runner
George Melikov [Tue, 26 Jan 2021 12:01:44 +0000 (15:01 +0300)]
CI: add ubuntu-* functional tests runner

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554

3 years agoCI: rename zfs-tests workflow
George Melikov [Tue, 26 Jan 2021 12:01:19 +0000 (15:01 +0300)]
CI: rename zfs-tests workflow

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554

3 years agoRemove unused iov_iter_init_compat() wrapper
Brian Behlendorf [Sat, 30 Jan 2021 18:06:14 +0000 (10:06 -0800)]
Remove unused iov_iter_init_compat() wrapper

This compatibility code is no longer needed.  For a while
iov_iter_init_compat() was used by zfs_uio_prefaultpages() but
this code should have been dropped as part of commit 83b91ae1.
Take care of that oversight and remove it.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11543

3 years agoThe abd child/parent relationship does not need to be tracked
Matthew Ahrens [Sat, 30 Jan 2021 18:04:42 +0000 (10:04 -0800)]
The abd child/parent relationship does not need to be tracked

ABD's currently track their parent/child relationship.  This applies to
`abd_get_offset()` and `abd_borrow_buf()`.  However, nothing depends on
knowing this relationship, it's only used for consistency checks to
verify that we are not destroying an ABD that's still in use.  When we
are creating/destroying ABD's frequently, the performance impact of
maintaining these data structures (in particular the atomic
increment/decrement operations) can be measurable.

This commit removes this verification code on production builds, but
keeps it when ZFS_DEBUG is set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11535

3 years agoAdded extra check to replace Faulted VDEV with Distributed Spare
nssrikanth [Fri, 29 Jan 2021 01:00:26 +0000 (06:30 +0530)]
Added extra check to replace Faulted VDEV with Distributed Spare

Add a check to the ZED zfs_retire agent so that distributed spare
replacement is also handled for a faulted vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Mark Maybee <mark.maybee@hpe.com>
Closes #11354
Closes #11355

3 years agoFixing gang ABD when adding another gang
Brian Atkinson [Fri, 29 Jan 2021 00:54:12 +0000 (17:54 -0700)]
Fixing gang ABD when adding another gang

I originally applied a fix in #11539 to fix a parent's child references
when a gang ABD is free'd. However, I did not take into account
abd_gang_add_gang(). We still need to make sure to update the child
references in this function as well. In order to resolve this I removed
decreasing the gang ABD's size in abd_free_gang() as well as moved back
the original placement of zfs_refcount_remove_many() in abd_free().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11542

3 years agoZTS: add userspace_send_encrypted.ksh to Makefile
George Melikov [Thu, 28 Jan 2021 21:39:38 +0000 (00:39 +0300)]
ZTS: add userspace_send_encrypted.ksh to Makefile

All tests need to be included in the Makefiles.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11541

3 years agofix abd_nr_pages_off for gang abd
Matthew Ahrens [Thu, 28 Jan 2021 17:28:20 +0000 (09:28 -0800)]
fix abd_nr_pages_off for gang abd

`__vdev_disk_physio()` uses `abd_nr_pages_off()` to allocate a bio with
a sufficient number of iovec's to process this zio (i.e.
`nr_iovecs`/`bi_max_vecs`).  If there are not enough iovec's in the bio,
then additional bio's will be allocated.  However, this is a sub-optimal
code path.  In particular, it requires several abd calls (to
`abd_nr_pages_off()` and `abd_bio_map_off()`) which will have to walk
the constituents of the ABD (the pages or the gang children) because
they are looking for offsets > 0.

For gang ABD's, `abd_nr_pages_off()` returns the number of iovec's
needed for the first constituent, rather than the sum of all
constituents (within the requested range).  This always under-estimates
the required number of iovec's, which causes us to always need several
bio's.  The end result is that `__vdev_disk_physio()` is usually O(n^2)
for gang ABD's (and occasionally O(n^3), when more than 16 bio's are
needed).

This commit fixes `abd_nr_pages_off()`'s handling of gang ABD's, to
correctly determine how many iovec's are needed, by adding up the number
of iovec's for each of the gang children in the requested range.
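
A sketch of the summation, under simplifying assumptions: the gang is
modeled as an array of child sizes, every child buffer starts on a
page boundary, and pages are 4K.  pages_spanned() and
gang_nr_pages_off() are hypothetical names, not the ABD code:

```c
#include <stddef.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	((size_t)1 << PAGE_SHIFT)

/* Pages touched by size bytes starting off bytes into an aligned buffer. */
static size_t
pages_spanned(size_t off, size_t size)
{
	if (size == 0)
		return (0);
	return (((off + size + PAGE_SIZE - 1) >> PAGE_SHIFT) -
	    (off >> PAGE_SHIFT));
}

/* Sum the pages of every child overlapping [off, off + size). */
static size_t
gang_nr_pages_off(const size_t *child_size, size_t nchildren,
    size_t off, size_t size)
{
	size_t count = 0;

	for (size_t i = 0; i < nchildren && size > 0; i++) {
		if (off >= child_size[i]) {
			off -= child_size[i];	/* range starts past child */
			continue;
		}
		size_t len = child_size[i] - off;
		if (len > size)
			len = size;
		count += pages_spanned(off, len);
		size -= len;
		off = 0;	/* later children are read from their start */
	}
	return (count);
}
```

Counting only the first child, as the old behavior did, would report
a single page for a three-child gang that actually spans four.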

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11536

3 years agoAvoid updating the L2ARC device header unnecessarily
George Amanakis [Thu, 28 Jan 2021 17:20:03 +0000 (18:20 +0100)]
Avoid updating the L2ARC device header unnecessarily

If we do not write any buffers to the cache device and the evict hand
has not advanced do not update the cache device header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11522
Closes #11537

3 years agoRemoving ABD Parent Child Reference Before Freeing ABD
Brian Atkinson [Thu, 28 Jan 2021 17:15:17 +0000 (10:15 -0700)]
Removing ABD Parent Child Reference Before Freeing ABD

Moving the call to zfs_refcount_remove_many() in abd_free() to be called
before any of the ABD free variants are called. This is necessary
because abd_free_gang() adjusts the abd_size for the gang ABD. If the
parent's child references are removed after free'ing the gang ABD the
refcount is not adjusted correctly for the parent's children.

I also removed some stray abd_put() in comments and changed
abd_free_gang_abd() -> abd_free_gang().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11539

3 years agoAdd zdb -r <dataset> <object-id | file> <output>
Allan Jude [Thu, 28 Jan 2021 05:36:01 +0000 (00:36 -0500)]
Add zdb -r <dataset> <object-id | file> <output>

While you can use zdb -R poolname vdev:offset:[<lsize>/]<psize>[:flags]
to extract individual DVAs from a vdev, it would be handy to be able
to copy an entire file out of the pool.

Given a file or object number, add support to copy the contents to a
file. Useful for debugging and recovery.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11027

3 years agoRevert special case code from pre-hashtable nvlist era
Mark Maybee [Thu, 28 Jan 2021 05:31:51 +0000 (22:31 -0700)]
Revert special case code from pre-hashtable nvlist era

Before a hash table was added on top of the nvlist code, there were
cases where the nvlist allocation was changed from fnvlist_alloc()
to nvlist_alloc() to avoid expensive NV_UNIQUE_NAME checks. Now
this is no longer necessary. These changes should be reverted to be
consistent with other code. There are some cases where this change
will also reduce the number of iterations.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Maybee <mark.maybee@delphix.com>
Closes #11464

3 years agoFix zrele race in zrele_async that can cause hang
Paul Dagnelie [Thu, 28 Jan 2021 05:29:58 +0000 (21:29 -0800)]
Fix zrele race in zrele_async that can cause hang

There is a race condition in zfs_zrele_async when we are checking if
we would be the one to evict an inode. This can lead to a txg sync
deadlock.

Instead of calling into iput directly, we attempt to perform the atomic
decrement ourselves, unless that would set the i_count value to zero.
In that case, we dispatch a call to iput to run later, to prevent a
deadlock from occurring.
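
The "decrement unless it would reach zero" step can be sketched with
C11 atomics.  This is a userspace illustration similar in spirit to
the kernel's atomic_add_unless(); dec_unless_last() is a hypothetical
name and is not the function used in the patch:

```c
#include <stdatomic.h>

/*
 * Decrement *count unless that would drop it to zero.  Returns 1 if
 * the decrement happened, 0 if the caller holds the last reference
 * and must hand the final release to a deferred context instead.
 */
static int
dec_unless_last(atomic_int *count)
{
	int old = atomic_load(count);

	while (old > 1) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (1);
	}
	return (0);
}
```

Because the check and the decrement are a single compare-and-swap, no
other thread can slip in between deciding "not the last reference" and
dropping it.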

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11527
Closes #11530

3 years ago ZTS: pool_state test check for pool existence in cleanup
George Melikov [Thu, 28 Jan 2021 01:33:30 +0000 (04:33 +0300)]
ZTS: pool_state test check for pool existence in cleanup

If the scsi_debug module is not available, this test must be
skipped. In that case, the cleanup routine should be prepared
for an absent pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11534

3 years ago Fix a resource leak in uu_avl_pool_destroy
Alan Somers [Wed, 27 Jan 2021 03:39:28 +0000 (20:39 -0700)]
Fix a resource leak in uu_avl_pool_destroy

Need to destroy the pthread mutex created in uu_avl_pool_create.
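
The general pattern, pairing every pthread_mutex_init() with a
pthread_mutex_destroy() before the owning structure is freed, might
look like the sketch below. demo_pool_t and its functions are
hypothetical and are not the libuutil code touched by this commit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical pool object that owns a mutex. */
typedef struct demo_pool {
    pthread_mutex_t dp_lock;
} demo_pool_t;

/* Allocate the pool and initialize its mutex; NULL on failure. */
static demo_pool_t *
demo_pool_create(void)
{
    demo_pool_t *pp = calloc(1, sizeof (*pp));

    if (pp == NULL)
        return (NULL);
    if (pthread_mutex_init(&pp->dp_lock, NULL) != 0) {
        free(pp);
        return (NULL);
    }
    return (pp);
}

/*
 * Tear the pool down. The mutex is destroyed before the memory is
 * freed; skipping this step can leak whatever resources the
 * implementation attached to the mutex at init time.
 */
static int
demo_pool_destroy(demo_pool_t *pp)
{
    int err;

    assert(pp != NULL);
    err = pthread_mutex_destroy(&pp->dp_lock);
    free(pp);
    return (err);
}
```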

https://svnweb.freebsd.org/base?view=revision&revision=262912

Obtained from: FreeBSD
Sponsored by: Spectra Logic Corporation
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11528

3 years ago Parallelize vdev_validate
Alan Somers [Tue, 12 Jan 2021 22:25:52 +0000 (15:25 -0700)]
Parallelize vdev_validate

The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel using a taskq.
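
As a rough illustration of the fan-out/join idea, the sketch below
starts one worker per device and waits for all of them. It uses plain
POSIX threads rather than a ZFS taskq, and demo_dev_t,
demo_validate() and demo_validate_all() are hypothetical names; the
"validation" is a trivial placeholder for the I/O-bound label read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-device work item; not a ZFS structure. */
typedef struct demo_dev {
    int dd_id;
    int dd_error;    /* result written by the worker */
} demo_dev_t;

/* Placeholder for a slow, I/O-bound validation step. */
static void *
demo_validate(void *arg)
{
    demo_dev_t *dd = arg;

    dd->dd_error = (dd->dd_id < 0) ? -1 : 0;
    return (NULL);
}

/*
 * Validate all devices concurrently. Returns the number of devices
 * that failed validation, or -1 if the workers could not be started.
 */
static int
demo_validate_all(demo_dev_t *devs, size_t n)
{
    pthread_t *tids;
    size_t started = 0;
    int failures = 0;

    if (n == 0)
        return (0);
    assert(devs != NULL);

    tids = calloc(n, sizeof (*tids));
    if (tids == NULL)
        return (-1);

    /* Fan out: each worker touches only its own array element. */
    for (; started < n; started++) {
        if (pthread_create(&tids[started], NULL, demo_validate,
            &devs[started]) != 0)
            break;
    }

    /* Join every worker that was started before reading results. */
    for (size_t i = 0; i < started; i++)
        (void) pthread_join(tids[i], NULL);
    free(tids);

    if (started < n)
        return (-1);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].dd_error != 0)
            failures++;
    }
    return (failures);
}
```

Since each worker writes only to its own element, no locking is needed
here; the join acts as the synchronization point before the results
are collected.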

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470

3 years ago Read all disk labels concurrently in vdev_label_read_config
Alan Somers [Tue, 12 Jan 2021 21:59:56 +0000 (14:59 -0700)]
Read all disk labels concurrently in vdev_label_read_config

This is similar to what we already do in vdev_geom_read_config.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470

3 years ago Parallelize vdev_load
Alan Somers [Tue, 12 Jan 2021 00:00:19 +0000 (17:00 -0700)]
Parallelize vdev_load

metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

This also requires adding mutex protection to
metaslab_class_t.mc_histogram.  The mc_histogram fields were
unprotected when that code was first written in "Illumos 4976-4984 -
metaslab improvements" (OpenZFS
f3a7f6610f2df0217ba3b99099019417a954b673).  The lock wasn't added until
3dfb57a35e8cbaa7c424611235d669f3c575ada1, though it's unclear exactly
which fields it's supposed to protect.  In any case, it wasn't until
vdev_load was parallelized that any code attempted concurrent access to
those fields.
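
Why shared histogram counters need a lock once the loaders run
concurrently can be sketched as follows. This is not the metaslab
code: demo_class_t, dc_lock and dc_histogram are hypothetical
stand-ins, and POSIX threads are used in place of a taskq.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BUCKETS 4
#define DEMO_WORKERS 8
#define DEMO_ADDS    1000

/* Hypothetical class-wide state shared by all workers. */
typedef struct demo_class {
    pthread_mutex_t dc_lock;    /* protects dc_histogram */
    uint64_t dc_histogram[DEMO_BUCKETS];
} demo_class_t;

/* Add n to one bucket; the read-modify-write is done under the lock. */
static void
demo_histogram_add(demo_class_t *dc, int bucket, uint64_t n)
{
    assert(bucket >= 0 && bucket < DEMO_BUCKETS);
    pthread_mutex_lock(&dc->dc_lock);
    dc->dc_histogram[bucket] += n;
    pthread_mutex_unlock(&dc->dc_lock);
}

/* Each worker plays the role of one top-level device being loaded. */
static void *
demo_worker(void *arg)
{
    demo_class_t *dc = arg;

    for (int i = 0; i < DEMO_ADDS; i++)
        demo_histogram_add(dc, i % DEMO_BUCKETS, 1);
    return (NULL);
}

/* Run the workers and return the sum over all buckets. */
static uint64_t
demo_run(void)
{
    demo_class_t dc;
    pthread_t tids[DEMO_WORKERS];
    uint64_t total = 0;
    int started = 0;

    memset(&dc, 0, sizeof (dc));
    if (pthread_mutex_init(&dc.dc_lock, NULL) != 0)
        return (0);

    for (; started < DEMO_WORKERS; started++) {
        if (pthread_create(&tids[started], NULL, demo_worker,
            &dc) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        (void) pthread_join(tids[i], NULL);

    for (int b = 0; b < DEMO_BUCKETS; b++)
        total += dc.dc_histogram[b];
    (void) pthread_mutex_destroy(&dc.dc_lock);
    return (total);
}
```

Without the lock, the unsynchronized "+=" on a shared bucket is a data
race and updates from different workers can be lost, which is the kind
of hazard that only appears once the callers stop being serialized.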

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470