]> CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log
FreeBSD/FreeBSD.git
2 years agoLinux 5.16 compat: asm/fpu/xcr.h is new location for xgetbv/xsetbv
Coleman Kane [Tue, 16 Nov 2021 04:23:30 +0000 (23:23 -0500)]
Linux 5.16 compat: asm/fpu/xcr.h is new location for xgetbv/xsetbv

Linux 5.16 moved these functions into this new header in commit
1b4fb8545f2b00f2844c4b7619d64d98440a477c. This change adds code to look
for the presence of this header, and include it so that the code using
xgetbv & xsetbv will compile again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800

2 years agoLinux 5.16: wait_on_page_bit() no longer available to modules
Coleman Kane [Tue, 16 Nov 2021 05:10:35 +0000 (00:10 -0500)]
Linux 5.16: wait_on_page_bit() no longer available to modules

Instead, linux/pagemap.h offers a number of folio-specific functions to
be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c
wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced
with folio_wait_bit(folio_page(pp), PG_writeback). This change modifies
the code to conditionally compile that if configure identifies th
presence of the folio_wait_bit() function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800

2 years agoFix several bugs in the FreeBSD rename VOP implementation
Mark Johnston [Fri, 19 Nov 2021 22:26:39 +0000 (17:26 -0500)]
Fix several bugs in the FreeBSD rename VOP implementation

- To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the
  teardown lock is acquired with ZFS_ENTER().
- Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_()
  when the ZFS_ENTER() macros forces an early return.

Refactor the rename implementation so that ZFS_ENTER() can be used
safely.  As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead
of open-coding its implementation.

Reported-by: Peter Holm <pho@FreeBSD.org>
Tested-by: Peter Holm <pho@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12717

2 years agoAdd notes to system_taskq
Paul Dagnelie [Fri, 19 Nov 2021 17:02:45 +0000 (09:02 -0800)]
Add notes to system_taskq

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12771

2 years agoEnable edonr in FreeBSD
Rich Ercolani [Tue, 16 Nov 2021 19:40:10 +0000 (14:40 -0500)]
Enable edonr in FreeBSD

The code is integrated, builds fine, runs fine, there's not really
any reason not to.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12735

2 years agoFreeBSD: fix world build after de198f2d9
Martin Matuška [Mon, 15 Nov 2021 16:07:39 +0000 (17:07 +0100)]
FreeBSD: fix world build after de198f2d9

The inline function vn_flush_cached_data() in vnode.h
must not be compiled when building BASE.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12743

2 years agoFix `zfs:AUTO` autodetection in initramfs scripts
Damian Szuberski [Sat, 13 Nov 2021 15:02:50 +0000 (16:02 +0100)]
Fix `zfs:AUTO` autodetection in initramfs scripts

Don't exit early in find_rootfs() when zpool.bootfs
is set to `zfs:AUTO`.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12658

2 years agoRemove (now unused) td argument from zfs_lookup()
Pawel Jakub Dawidek [Sat, 13 Nov 2021 01:06:44 +0000 (17:06 -0800)]
Remove (now unused) td argument from zfs_lookup()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12748

2 years agoIntroduce a tunable to exclude special class buffers from L2ARC
George Amanakis [Thu, 11 Nov 2021 20:52:16 +0000 (21:52 +0100)]
Introduce a tunable to exclude special class buffers from L2ARC

Special allocation class or dedup vdevs may have roughly the same
performance as L2ARC vdevs. Introduce a new tunable to exclude those
buffers from being cacheable on L2ARC.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11761
Closes #12285

2 years agoRemove basename(1). Clean up/shorten some coreutils pipelines
наб [Thu, 11 Nov 2021 20:27:37 +0000 (21:27 +0100)]
Remove basename(1). Clean up/shorten some coreutils pipelines

Basenames that remain, in cmd/zed/zed.d/statechange-led.sh:
dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
vdev=$(basename "$ZEVENT_VDEV_PATH")
I don't wanna interfere with #11988

scripts/zfs-tests.sh:
SINGLETESTFILE=$(basename "$SINGLETEST")
tests/zfs-tests/tests/functional/cli_user/zfs_list/zfs_list.kshlib:
ACTUAL=$(basename $dataset)
ACTUAL=$(basename $dataset)
tests/zfs-tests/tests/functional/cli_user/zpool_iostat/
zpool_iostat_-c_homedir.ksh:
typeset USER_SCRIPT=$(basename "$USER_SCRIPT_FULL")
tests/zfs-tests/tests/functional/cli_user/zpool_iostat/
zpool_iostat_-c_searchpath.ksh:
typeset CMD_1=$(basename "$SCRIPT_1")
typeset CMD_2=$(basename "$SCRIPT_2")
tests/zfs-tests/tests/functional/cli_user/zpool_status/
zpool_status_-c_homedir.ksh:
typeset USER_SCRIPT=$(basename "$USER_SCRIPT_FULL")
tests/zfs-tests/tests/functional/cli_user/zpool_status/
zpool_status_-c_searchpath.ksh
typeset CMD_1=$(basename "$SCRIPT_1")
typeset CMD_2=$(basename "$SCRIPT_2")
tests/zfs-tests/tests/functional/migration/migration.cfg:
export BNAME=`basename $TESTFILE`
tests/zfs-tests/tests/perf/perf.shlib:
typeset logbase="$(get_perf_output_dir)/$(basename \
tests/zfs-tests/tests/perf/perf.shlib:
typeset logbase="$(get_perf_output_dir)/$(basename \

These are potentially Of Directories, where basename is actually
useful

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12652

2 years agoCheck l2cache vdevs pending list inside the vdev_inuse()
Fedor Uporov [Thu, 11 Nov 2021 19:54:15 +0000 (11:54 -0800)]
Check l2cache vdevs pending list inside the vdev_inuse()

The l2cache device could be added twice because vdev_inuse() does not
check spa_l2cache for added devices. Make l2cache vdevs inuse checking
logic more closer to spare vdevs.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9153
Closes #12689

2 years agozhack: Add repair label option
Fedor Uporov [Thu, 11 Nov 2021 19:26:18 +0000 (11:26 -0800)]
zhack: Add repair label option

In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. The zhack with newly added cli options could
be used to restore label checksums and make pool importable again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #2510
Closes #12686

2 years agoZTS: zfs_list_004_neg should not check paths that belong to ZFS
Palash Gandhi [Thu, 11 Nov 2021 15:46:44 +0000 (07:46 -0800)]
ZTS: zfs_list_004_neg should not check paths that belong to ZFS

When ZFS is on root, /tmp is a ZFS. This causes zfs_list_004_neg to
fail since `zfs list` on /tmp passes when the test expects it not to.
The fix is to exclude paths that belong to ZFS.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Palash Gandhi <pbg4930@rit.edu>
Closes #12744

2 years agoRestore dirty dnode detection logic
Brian Behlendorf [Thu, 11 Nov 2021 00:14:32 +0000 (16:14 -0800)]
Restore dirty dnode detection logic

In addition to flushing memory mapped regions when checking holes,
commit de198f2d95 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link.  Relying
on the dirty record has not be reliable, switch back to the previous
method.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12745

2 years agoExclude zfs_copies_003_pos on Linux
Brian Behlendorf [Wed, 10 Nov 2021 20:56:01 +0000 (12:56 -0800)]
Exclude zfs_copies_003_pos on Linux

This test case may fail on 5.13 and newer Linux kernels if the
/dev/zvol/ device is not created by udev.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes  #12738

2 years agozdb: Report bad label checksum
Fedor Uporov [Wed, 10 Nov 2021 19:22:00 +0000 (11:22 -0800)]
zdb: Report bad label checksum

In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. From other side zdb with -l option will not
provide any useful information why it happened. Add notifications
about corrupted label checksums.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #2509
Closes #12685

2 years agoUpgrade to libabigail 2.0.0
Dimitri John Ledkov [Mon, 8 Nov 2021 15:44:04 +0000 (15:44 +0000)]
Upgrade to libabigail 2.0.0

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Closes #12722
Closes #12739

2 years agozed: Control NVMe fault LEDs
Tony Hutter [Wed, 10 Nov 2021 00:50:18 +0000 (16:50 -0800)]
zed: Control NVMe fault LEDs

The ZED code currently can only turn on the fault LED for
a faulted disk in a JBOD enclosure.  This extends support
for faulted NVMe disks as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12648
Closes #12695

2 years agoSkip spacemaps reading in case of pool readonly import
Fedor Uporov [Tue, 9 Nov 2021 20:50:39 +0000 (12:50 -0800)]
Skip spacemaps reading in case of pool readonly import

The only zdb utility require to read metaslab-related data during
read-only pool import because of spacemaps validation. Add global
variable which will allow zdb read spacemaps in case of readonly
import mode.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9095
Closes #12687

2 years agoSingle IO issue for raidz writes with skip sector
Brian Atkinson [Tue, 9 Nov 2021 19:51:33 +0000 (12:51 -0700)]
Single IO issue for raidz writes with skip sector

In order to reduce contention on the vq_lock, optional skip sectors
for Raidz writes can be placed into a single IO request. This is done by
padding out the linear ABD for a parity column to contain the skip
sector and by creating gang ABD to contain the data and skip sector for
data columns.

The vdev_raidz_map_alloc() function now contains specific functions for
both reads and write to allocate the ABD's that will be issued down to
the VDEV chldren.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-By: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #12333

2 years agoLinux 5.16 compat: submit_bio()
Brian Behlendorf [Sat, 6 Nov 2021 00:17:03 +0000 (17:17 -0700)]
Linux 5.16 compat: submit_bio()

The submit_bio() prototype has changed again.  The version is 5.16
still only expects a single argument but the return type has changed
to void.  Since we never used the returned value before update the
configure check to detect both single arg versions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725

2 years agoLinux 5.16 compat: linux/elevator.h
Brian Behlendorf [Fri, 5 Nov 2021 00:03:50 +0000 (17:03 -0700)]
Linux 5.16 compat: linux/elevator.h

Commit https://github.com/torvalds/linux/commit/2e9bc346 moved
the elevator.h header under the block/ directory as part of some
refactoring.  This turns out not to be a problem since there's
no longer anything we need from the header.  This has been the
case for some time, this change removes the elevator.h include
and replaces it with a major.h include.

Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725

2 years agoExclude zvol_misc_volmode for now
Rich Ercolani [Tue, 9 Nov 2021 02:01:19 +0000 (21:01 -0500)]
Exclude zvol_misc_volmode for now

It keeps failing, on changes which aren't related at all.

So until someone runs down why, I'd like it to stop being the
sole reason for CI failures.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12733

2 years agoFix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
Brian Behlendorf [Sun, 7 Nov 2021 21:27:44 +0000 (13:27 -0800)]
Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency

When using lseek(2) to report data/holes memory mapped regions of
the file were ignored.  This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.

Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing.  This ensures that a clean dnode can't be dirtied before
the data/hole is located.  The range lock is now also taken to
ensure the call cannot race with zfs_write().

Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12724

2 years agoUpdate contrib/initramfs/README.initramfs.markdown
Michael Franzl [Thu, 4 Nov 2021 15:23:50 +0000 (16:23 +0100)]
Update contrib/initramfs/README.initramfs.markdown

Note that Dropbear supports ed25519 keys since version 2020.79.

See https://github.com/mkj/dropbear/pull/91

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Michael Franzl <michael@franzl.name>
Closes #12715

2 years agoRevert behavior of 59eab109 on not-Linux
Rich Ercolani [Thu, 4 Nov 2021 13:49:40 +0000 (09:49 -0400)]
Revert behavior of 59eab109 on not-Linux

It turns out that short-circuiting the EFAULT behavior on a short read
breaks things on FreeBSD. So until there's a nicer solution, let's
just revert the behavior for not-Linux.

Reference:
https://reviews.freebsd.org/R10:70f51f0e474ffe1fb74cb427423a2fba3637544d

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12698

2 years agoWorkaround issue cleaning up automounted snapshots on Linux
Rich Ercolani [Wed, 3 Nov 2021 15:00:08 +0000 (11:00 -0400)]
Workaround issue cleaning up automounted snapshots on Linux

On Linux, sometimes, when ZFS goes to unmount an automounted snap,
it fails a VERIFY check on debug builds, because taskq_cancel_id
returned ENOENT after not finding the taskq it was trying to cancel.

This presumably happens when it already died for some reason; in this
case, we don't really mind it already being dead, since we're just
going to dispatch a new task to unmount it right after.

So we just ignore it if we get back ENOENT trying to cancel here,
retry a couple times if we get back the only other possible condition
(EBUSY), and log to dbgmsg if we got anything but ENOENT or success.

(We also add some locking around taskqid, to avoid one or two cases
of two instances of trying to cancel something at once.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11632
Closes #12670

2 years agoAdd more explicit warning about dedup being dropped
Rich Ercolani [Tue, 2 Nov 2021 21:45:20 +0000 (17:45 -0400)]
Add more explicit warning about dedup being dropped

"has unsupported feature: [number]" seems reasonable when we can't
know what the problem was, but with the send -D removal, we know
what it was, and can explicitly tell people "don't do that; try
this if you must".

So let's.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12708

2 years agoUpdate `checkstyle` workflow env to ubuntu-20.04
Damian Szuberski [Tue, 2 Nov 2021 20:02:57 +0000 (21:02 +0100)]
Update `checkstyle` workflow env to ubuntu-20.04

- `checkstyle` workflow uses ubuntu-20.04 environment
- improved `mancheck.sh` readability

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12713

2 years agoFix cpu hotplug atomic sleep issue
Paul Dagnelie [Tue, 2 Nov 2021 16:23:48 +0000 (09:23 -0700)]
Fix cpu hotplug atomic sleep issue

We move the spinlock unlock before the thread creation. This should be
safe because the thread creation code doesn't actually manipulate any
taskq data structures; that's done by the thread once it's created.

We also remove the assertion that the maxthreads is the current threads
plus one; that assertion could fail if multiple hotplug events come in
quick succession, and the first new taskq thread hasn't had a chance to
start processing yet.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
eviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12714

2 years agoDisable normalization implicitly when setting "utf8only=off"
Mike Swanson [Fri, 29 Oct 2021 23:59:18 +0000 (16:59 -0700)]
Disable normalization implicitly when setting "utf8only=off"

When a parent dataset has normalization set to any value other than
"none", and a file system is created with the property "utf8only=off",
implicitly also set "normalization=none" instead of overriding the
desire for a non-UTF8 enforcing file system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11892
Closes #12038

2 years agoExit the teardown section later in rename on FreeBSD
Mark Johnston [Thu, 28 Oct 2021 17:25:26 +0000 (13:25 -0400)]
Exit the teardown section later in rename on FreeBSD

We have to hold the teardown lock while dereferencing zfsvfs->z_os and,
I believe, when committing to the ZIL.

Note that jumping to the "out" label, "error" is always non-zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704

2 years agoFix potential use-after-frees in FreeBSD getpages and setattr VOPs
Mark Johnston [Thu, 28 Oct 2021 15:58:57 +0000 (11:58 -0400)]
Fix potential use-after-frees in FreeBSD getpages and setattr VOPs

The objset object is reallocated during certain dataset operations, such
as rollbacks, so the objset pointer must be loaded after acquiring the
teardown lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704

2 years agozfsprops.7: Add note about comma-separation
D. Ebdrup [Fri, 29 Oct 2021 23:30:44 +0000 (01:30 +0200)]
zfsprops.7: Add note about comma-separation

This change primarily seeks to make implicit documentation explicit, as
it is not outright stated that options should be comma-separated, nor is
there a reason given for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #12579

2 years agoDo not print UINT64_MAX value for some of zfs properties
Fedor Uporov [Fri, 29 Oct 2021 23:18:13 +0000 (16:18 -0700)]
Do not print UINT64_MAX value for some of zfs properties

The values of next properties: filesystem_limit, filesystem_count,
snapshot_limit, snapshot_count were returned to user as UINT64_MAX
integers in case if -p cli option is used, return 'none' value instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9306
Closes #12690

2 years agoAdd explicit error for device_rebuild being disabled
Rich Ercolani [Fri, 29 Oct 2021 22:55:22 +0000 (18:55 -0400)]
Add explicit error for device_rebuild being disabled

Currently, you get back "can only attach to mirrors and top-level disks"
unconditionally if zpool attach returns ENOTSUP, but that also happens
if, say, feature@device_rebuild=disabled and you tried attach -s.

So let's print an error for that case, lest people go down a rabbit hole
looking into what they did wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11414
Closes #12680

2 years agoNormalize property names for zfs receive
Rich Ercolani [Fri, 29 Oct 2021 22:38:10 +0000 (18:38 -0400)]
Normalize property names for zfs receive

It turns out, userland is much more happy with aliased property
names than the kernel is.

So let's normalize those to the expected names before we pass
them off.

Added a test case hacked up from the other recv -o/-x test that fails
on unpatched git and passes here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12607
Closes #12609

2 years agoPython 3.10 fixes, part 2
Rich Ercolani [Fri, 29 Oct 2021 22:36:01 +0000 (18:36 -0400)]
Python 3.10 fixes, part 2

There was a fallback case I overlooked in the initial patch, with
a similarly imperfect version extractor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12045
Closes #12673

2 years agoSet DEFAULT_INIT_SHELL to /sbin/openrc-run for Gentoo and Alpine
Peter Levine [Fri, 29 Oct 2021 22:34:37 +0000 (18:34 -0400)]
Set DEFAULT_INIT_SHELL to /sbin/openrc-run for Gentoo and Alpine

Gentoo and Alpine always set the rc init scripts' shebang to
#!/sbin/openrc-run, whether or not openrc is installed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #12683
Closes #12692

2 years agovdev_id: Fix PHY sorting
Tony Hutter [Fri, 29 Oct 2021 22:33:34 +0000 (15:33 -0700)]
vdev_id: Fix PHY sorting

One of our developers noticed a bug in vdev_id where we were incorrectly
sorting PHYs using alphabetical sorting (which usually works) instead
of natural sorting (-v).  For example:

[port-0:0]# ls -d phy*
phy-0:10  phy-0:11  phy-0:8  phy-0:9

[port-0:0]# ls -vd phy*
phy-0:8  phy-0:9  phy-0:10  phy-0:11

This fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12699

2 years agoRemove unused function zvol_set_volblocksize()
Fedor Uporov [Wed, 27 Oct 2021 00:07:53 +0000 (17:07 -0700)]
Remove unused function zvol_set_volblocksize()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #12688

2 years agoMake dsl_scan print the pool name in dbgmsg
Rich Ercolani [Tue, 26 Oct 2021 23:24:14 +0000 (19:24 -0400)]
Make dsl_scan print the pool name in dbgmsg

If you've got multiple scrubs/resilvers going, it's rather helpful
to know which pool each scan line refers to.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12674
2 years agospa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)
Allan Jude [Tue, 26 Oct 2021 23:15:38 +0000 (19:15 -0400)]
spa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)

The fnvlist versions of the functions are fatal if they fail,
saving each call from having to include checking the result.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Allan Jude <allan@klarasystems.com>
2 years agoZTS: Standardize use of destroy_dataset in cleanup
Brian Behlendorf [Mon, 25 Oct 2021 21:13:50 +0000 (14:13 -0700)]
ZTS: Standardize use of destroy_dataset in cleanup

When cleaning up a test case standardize on using the convention:

    datasetexists $ds && destroy_dataset $ds <flags>

By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensures ensure tests are fully cleaned up and prevents false
positive test failures on Linux.

Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes.  This was done to
clearly establish the expected convention.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12663

2 years agoWorkaround cloud-init hotplug issue
Rich Ercolani [Mon, 25 Oct 2021 17:27:05 +0000 (13:27 -0400)]
Workaround cloud-init hotplug issue

cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12644
Closes #12669

2 years agoFreeBSD: Catch up with recent VFS changes
Ryan Moeller [Mon, 25 Oct 2021 16:46:28 +0000 (12:46 -0400)]
FreeBSD: Catch up with recent VFS changes

cn_thread is always curthread.

https://cgit.freebsd.org/src/commit?id=b4a58fbf640409a1e507d9f7b411c83a3f83a2f3
https://cgit.freebsd.org/src/commit?id=2b68eb8e1dbbdaf6a0df1c83b26f5403ca52d4c3

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12668

2 years agopam_zfs_key: malloc and mlock/munlock won't match
Attila Fülöp [Thu, 21 Oct 2021 10:17:47 +0000 (12:17 +0200)]
pam_zfs_key: malloc and mlock/munlock won't match

mlock(2) and munlock(2) operate on memory pages whereas malloc(3)
does not. So if you munlock(2) a malloced memory region, the whole
page containing it is freed. Since this page may contain another
malloced and mlocked memory region, used as a password buffer by a
concurrent running instance of pam_zfs_key, there is a slight chance
of leaking passwords. By using mmap(2) we avoid such problems since
it will return whole pages on page aligned addresses.

Although the above concern may be mostly academical, it is still
better to use mmap(2) for allocating memory since the FreeBSD
documentation suggests to call mlock(2) and munlock(2) on page
aligned addresses, and other implementations even require it.

While here, remove duplicate code in alloc_pw_string() by calling
alloc_pw_size().

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665

2 years agopam_zfs_key: mlock(2) and munlock(2) can fail
Attila Fülöp [Thu, 21 Oct 2021 04:06:55 +0000 (06:06 +0200)]
pam_zfs_key: mlock(2) and munlock(2) can fail

Since both syscalls can fail, add error handling, including EAGAIN.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665

2 years agopam_zfs_key: change test user name to conform to standards
Attila Fülöp [Thu, 21 Oct 2021 03:44:36 +0000 (05:44 +0200)]
pam_zfs_key: change test user name to conform to standards

The useradd(8) command on my system won't accept login names with
uppercase letters in them, so adjust for that.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665

2 years agoSkip snapshot in zfs_iter_mounted()
youzhongyang [Wed, 20 Oct 2021 23:07:19 +0000 (19:07 -0400)]
Skip snapshot in zfs_iter_mounted()

The intention of the zfs_iter_mounted() is to traverse the dataset
and its descendants, not the snapshots. The current code can cause
a mounted snapshot to be included and thus zfs_open() on the snapshot
with ZFS_TYPE_FILESYSTEM would print confusing message such as "cannot
open 'rpool/fs@snap': snapshot delimiter '@' is not expected here".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #12447
Closes #12448

2 years agovdev_id: Fix enclosure_symlinks feature
Tony Hutter [Wed, 20 Oct 2021 22:48:04 +0000 (15:48 -0700)]
vdev_id: Fix enclosure_symlinks feature

The vdev_id.conf "enclosure_symlinks" option persistently creates
and maps /dev/by-enclosure symlinks to dynamic /dev/sg* devices.

This patch fixes two issues:

1. The enclosure_symlinks feature was accidentally broken in:

   vdev_id: Support daisy-chained JBODs in multipath mode

2. Even when working, the feature numbered the enclosure
   sequentially rather than by HBA port number.  That meant that
   if a port was down or didn't appear in sysfs, then the
   enclosure_sumlinks numbers would be numbered wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12660

2 years agolibshare: nfs: pass through ipv6 addresses in bracket notation
felixdoerre [Wed, 20 Oct 2021 17:40:00 +0000 (19:40 +0200)]
libshare: nfs: pass through ipv6 addresses in bracket notation

Recognize when the host part of a sharenfs attribute is an ipv6
Literal and pass that through without modification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
Closes: #11171
Closes #11939
Closes: #1894
2 years agozpool should call zfs_nicestrtonum() with non-NULL handle
Toomas Soome [Wed, 20 Oct 2021 17:34:34 +0000 (20:34 +0300)]
zpool should call zfs_nicestrtonum() with non-NULL handle

When zfs_nicestrtonum() is called and there will be an error,
the message is left in libzfs handle, if provided. We can use
this message, to provide better feedback for user.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #12650

2 years agoRemove code duplication
Pawel Jakub Dawidek [Mon, 18 Oct 2021 23:50:33 +0000 (16:50 -0700)]
Remove code duplication

Remove code duplication by moving code responsible for partial block
zeroing to a separate function: dnode_partial_zero().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12627

2 years agoNotify on UNAVAIL statechange
Francesco Mazzoli [Fri, 15 Oct 2021 22:57:23 +0000 (00:57 +0200)]
Notify on UNAVAIL statechange

`UNAVAIL` is maybe not quite as concerning as `DEGRADED`, but still an
event of notice, in my opinion. For example it is triggered when a
drive goes missing.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Francesco Mazzoli <f@mazzo.li>
Closes #12629
Closes #12630

2 years agozdb: fix overflow of time estimation
Teodor Spæren [Fri, 15 Oct 2021 22:55:34 +0000 (00:55 +0200)]
zdb: fix overflow of time estimation

The calculation of estimated time remaining in zdb -cc could overflow,
as reported in #10666. This patch fixes this, by using uint64_t instead
of ints in the calculations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #10666
Closes #12610

2 years agoRemove FreeBSD's local copy of the dmu_buf_hold_array() function
Pawel Jakub Dawidek [Wed, 13 Oct 2021 18:01:01 +0000 (11:01 -0700)]
Remove FreeBSD's local copy of the dmu_buf_hold_array() function

Make the main dmu_buf_hold_array() function non-static.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12628

2 years agozio: use unsigned values for enum
Teodor Spæren [Mon, 11 Oct 2021 17:58:06 +0000 (19:58 +0200)]
zio: use unsigned values for enum

cppcheck complains about the use of 1 << 31, because enums are signed
ints which cannot represent this. As discussed in issue #12611, it
appears that with C99, we can use an unsiged int for the enum, on most
platforms.

I've crafted this commit for just the include/sys/zio.h header, as it's
the only one with a shift of 31. If this is something we want to adopt
in the rest of the project, I will go through and apply it to the rest
of the project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #12611
Closes #12615

2 years agoExport minimal zfs_refcount interfaces
Brian Behlendorf [Mon, 11 Oct 2021 17:54:39 +0000 (10:54 -0700)]
Export minimal zfs_refcount interfaces

Lustre makes light use of the zfs_refcount interfaces which
isn't a problem when using a non-debug build of OpenZFS. However,
when debugging is enabled the required symbols are not exported.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12613

2 years agoZTS: Add known exceptions
Brian Behlendorf [Mon, 11 Oct 2021 17:52:32 +0000 (10:52 -0700)]
ZTS: Add known exceptions

Add the following test failures to the exception list for FreeBSD
to ensure we notice new unexpected failures.

   pool_checkpoint/checkpoint_big_rewind
   pool_checkpoint/checkpoint_indirect

And the following for Linux.

   zvol/zvol_misc/zvol_misc_snapdev

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12621
Issue #12622
Issue #12623
Closes #12624

2 years agoZTS: deadman_sync fix
Brian Behlendorf [Mon, 11 Oct 2021 17:49:13 +0000 (10:49 -0700)]
ZTS: deadman_sync fix

In the CI environment it's possible for events to be slightly
delayed resulting in 4, instead of 5, events appearing in the
log file.  This isn't a problem and should be considered a
success to avoid false positive test results.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12625

2 years agoinitramfs: use correct dataset for rootfs on rollback=1
nachtgeist [Fri, 8 Oct 2021 18:16:31 +0000 (18:16 +0000)]
initramfs: use correct dataset for rootfs on rollback=1

When booting with root=zfs:rpool/myrootfs@foosnapshot rollback=1,
myrootfs and its descendants get rolled back to foosnapshot, however
ZFS_BOOTFS still contains myrootfs@foosnapshot instead of the
actually desired value of myrootfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Daniel Reichelt <hacking@nachtgeist.net>
Closes #12585
Closes #12586

2 years agoFail invalid incremental recursive send gracefully
Ryan Moeller [Fri, 8 Oct 2021 18:14:26 +0000 (14:14 -0400)]
Fail invalid incremental recursive send gracefully

zfs send -R -i snap1 pool/ds@snap1 is an invalid invocation of zfs send
because the incremental source and target snapshots are the same.  We
have an error message for this condition, but we don't make it there
because of a failed assert while iterating through the dataset's
snapshots.

Check for NULL to avoid the assert so we can make it to the error
message.

Test this form of invalid send invocation in rsend tests.  Fix the
rsend_016_neg test while here: log_neg itself doesn't fail the test,
and writing to /dev/null is not supported on all Linux kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11121
Closes #12533

2 years agoCorrect refcount_add in dmu_zfetch
Rich Ercolani [Fri, 8 Oct 2021 18:10:34 +0000 (14:10 -0400)]
Correct refcount_add in dmu_zfetch

refcount_add_many(foo,N) is not the same as
for (i=0; i < N; i++) { refcount_add(foo); }

Unfortunately, this is only actually true with debug kernels and
reference_tracking_enable=1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12589
Closes #12602

2 years agoarcstat: Fix integer division with python3
Valmiky Arquissandas [Fri, 8 Oct 2021 15:32:27 +0000 (16:32 +0100)]
arcstat: Fix integer division with python3

The arcstat script requests compatibility with python2 and python3, but
PEP 238 modified the / operator and results in erroneous output when
run under python3.

This commit replaces instances of / with //, yielding the expected
result in both versions of Python.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Valmiky Arquissandas <foss@kayvlim.com>
Closes #12603

2 years agoSimplify and document OpenZFS library dependencies
Brian Behlendorf [Thu, 7 Oct 2021 17:31:26 +0000 (10:31 -0700)]
Simplify and document OpenZFS library dependencies

For those not already familiar with the code base it can be a
challenge to understand how the libraries are laid out.  This
has sometimes resulted in functionality being added in the
wrong place.  To help avoid that in the future this commit
documents the high-level dependencies for easy reference in
lib/Makefile.am.  It also simplifies a few things.

- Switched libzpool dependency on libzfs_core to libzutil.
  This change makes it clear libzpool should never depend
  on the ioctl() functionality provided by libzfs_core.

- Moved zfs_ioctl_fd() from libzutil to libzfs_core and
  renamed it lzc_ioctl_fd().  Normal access to the kmods
  should all be funneled through the libzfs_core library.
  The sole exception is the pool_active() which was updated
  to not use lzc_ioctl_fd() to remove the libzfs_core
  dependency.

- Removed libzfs_core dependency on libzutil.

- Removed the lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
  source file which was all dead code.

- Removed libzfs_core dependency from mkbusy and ctime
  test utilities.  It was only needed for some trivial
  wrapper functions and that code is easy to replicate
  to shed the unneeded dependency.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12602

2 years agoRemove zdb and libzpool from initramfs image
Serapheim Dimitropoulos [Thu, 7 Oct 2021 16:52:03 +0000 (09:52 -0700)]
Remove zdb and libzpool from initramfs image

= Motivation

At Delphix we are heavy users of kernel crash dumps that are captured
through a crash kernel that is spawned whenever the main kernel panics.
The way that this works internally is that a certain amount of memory is
reserved while the main system is running so the initramfs of the crash
kernel can be loaded when a panic occurs.

In order to keep reserved memory at minimum we've been historically
trying to identify the binaries that are part of the kernel's initramfs
that are big and finding ways of either making them smaller or do not
include them in the initramfs image. An example is always stripping the
DWARF info of the ZFS kernel module copy that is included in the
initramfs image of both our running and our crash kernel (the difference
in size there is 76MB vs 4MB).

We've recently identified that libzpool has been the largest binary in
our initramfs images - currently sized around 17MB.

= This Patch

The ZFS scripts do not explicitly copy libzpool to initramfs. They copy
zdb which pulls in libzpool as a dependency. Given that both zdb and
libzpool are not really essential for initramfs (e.g. we'll still have
access to the once the root filesystem is unpacked) this patch removes
them from initramfs.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #12616

2 years agoDocument additional -c caveat
Rich Ercolani [Tue, 5 Oct 2021 18:48:17 +0000 (14:48 -0400)]
Document additional -c caveat

One might expect "send data as it is on disk, and cannot trigger
compression changes" to imply "does not attempt to compress data
that was not compressed on the sender."

One would be mistaken.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12570

2 years agoRescan enclosure sysfs path on import
Tony Hutter [Mon, 4 Oct 2021 19:32:16 +0000 (12:32 -0700)]
Rescan enclosure sysfs path on import

When you create a pool, zfs writes vd->vdev_enc_sysfs_path with the
enclosure sysfs path to the fault LEDs, like:

    vdev_enc_sysfs_path = /sys/class/enclosure/0:0:1:0/SLOT8

However, this enclosure path doesn't get updated on successive imports
even if enclosure path to the disk changes.  This patch fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11950
Closes #12095

2 years agoReject zfs send -RI with nonexistent fromsnap
Rich Ercolani [Mon, 4 Oct 2021 17:17:30 +0000 (13:17 -0400)]
Reject zfs send -RI with nonexistent fromsnap

Right now, zfs send -I dataset@nonexistent dataset@existent fails, but
zfs send -RI dataset@nonexistent dataset@existent does not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12574
Closes #12575

2 years agoZFS: Remove a redundant if condition (#12598)
Attila Fülöp [Sat, 2 Oct 2021 18:50:57 +0000 (20:50 +0200)]
ZFS: Remove a redundant if condition (#12598)

Commit 0c03d21ac99ebdbe left in a redundant if condition while
removing some code. Just remove it.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12598

2 years agoProper support for DESTDIR and INSTALL_MOD_PATH
José Luis Salvador Rufo [Fri, 1 Oct 2021 17:44:34 +0000 (19:44 +0200)]
Proper support for DESTDIR and INSTALL_MOD_PATH

The environment variables DESTDIR and INSTALL_MOD_PATH must
be mutually exclusive.

https://www.gnu.org/prep/standards/html_node/DESTDIR.html
https://www.kernel.org/doc/Documentation/kbuild/modules.txt

This issue was discussed in this Buildroot thread:
https://lists.buildroot.org/pipermail/buildroot/2021-August/621350.html

I saw this behavior in other different projects, as:

- Yocto Project:
  https://www.yoctoproject.org/pipermail/meta-freescale/2013-August/004307.html

- Google IA Coral:
  https://coral.googlesource.com/linux-imx-debian/+/refs/heads/master/debian/rules

For the above reasons, INSTALL_MOD_PATH will be set as DESTDIR
by default.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Signed-off-by: Romain Naour <romain.naour@gmail.com>
Closes #12577

2 years agoZTS: Minimize udev_wait in zvol_misc tests
Ryan Moeller [Fri, 1 Oct 2021 15:36:02 +0000 (11:36 -0400)]
ZTS: Minimize udev_wait in zvol_misc tests

The zvol_misc tests, in particular zvol_misc_volmode, make use of a
common udev_wait function to wait for zvol devices in /dev to quiesce
on Linux.  On other platforms this function currently only sleeps for
one second before returning.  This is insufficient, and
zvol_misc_volmode has been flaky on FreeBSD as a result.

Replace udev_wait with block_device_wait, passing through the optional
device parameter where possible.  Rearrange a few checks to strengthen
the verifications we are making and avoid unnecessarily sleeping.  We
must keep udev_wait in a couple places to pass in Github CI workflows.
Remove zvol_misc_volmode from the maybe failing tests on FreeBSD in
zts-report.py.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12583

2 years agolibspl: fix warning about missing spl_pagesize declaration
Érico Nogueira Rolim [Fri, 24 Sep 2021 22:59:26 +0000 (19:59 -0300)]
libspl: fix warning about missing spl_pagesize declaration

Though it's unlikely anyone will alter its signature, avoids any
possible type mismatch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Érico Nogueira <erico.erc@gmail.com>
Closes #12567

2 years agoAssorted parameter changes for performance tests
John Wren Kennedy [Tue, 21 Sep 2021 22:17:36 +0000 (16:17 -0600)]
Assorted parameter changes for performance tests

* Add async runs for sequential_writes, random_readwrite_fixed and
  random_writes
* Remove some larger block sizes that give similar results to others
* Remove nthreads == 4 from random_writes_zil test

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12576

2 years agoHandle partial reads in zfs_read
Rich Ercolani [Mon, 20 Sep 2021 17:30:50 +0000 (13:30 -0400)]
Handle partial reads in zfs_read

Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller which breaks
on older Linux kernels by happily passing along a 1M read request and a
64k iovec just requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12370
Closes #12509
Closes #12516

2 years agoUpstream: unmount snapshots before destroying them on macOS
Jorgen Lundman [Mon, 20 Sep 2021 15:29:59 +0000 (00:29 +0900)]
Upstream: unmount snapshots before destroying them on macOS

Add function zfs_destroy_snaps_nvl_os() call. The main issue is that
macOS needs to unmount any mounted snapshots before they can be
destroyed. Other platforms can handle this in the kernel, but sending
a storm of zed events to unmount seems undesirable when we can do it
in userland to start with.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: ilovezfs <ilovezfs@icloud.com>
Closes #12550

2 years agoAdded test for being able to read various variants of zstd
Rich Ercolani [Mon, 20 Sep 2021 15:08:20 +0000 (11:08 -0400)]
Added test for being able to read various variants of zstd

As detailed in #12022 and #12008, it turns out the current zstd
implementation is quite nonportable, and results in various
configurations of ondisk header that only each platform can read.

So I've added a test which contains a dataset with a file written by
Linux/x86_64 and one written by FBSD/ppc64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12030

2 years agoReally zero the zero page
Alexander Motin [Fri, 17 Sep 2021 17:17:18 +0000 (13:17 -0400)]
Really zero the zero page

While switching abd_zero_buf allocation KPI I've missed the fact
that kmem_zalloc() zeroed the allocation, while kmem_cache_alloc()
does not.  Add explicit bzero() after it.

I don't think it should have caused real problems, but leaking one
memory page content all over the pool is not good.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12569

2 years agoAvoid panic in case of pool errors and missing L2ARC
George Amanakis [Thu, 16 Sep 2021 16:40:15 +0000 (18:40 +0200)]
Avoid panic in case of pool errors and missing L2ARC

In case an ARC buffer is allocated only on L2ARC, and there are
underlying errors in a pool with the cache device in faulty state, a
panic can occur in arc_read_done()->arc_hdr_destroy()->
arc_hdr_l2arc_destroy()->arc_hdr_clear_flags() when trying to free
the ARC buffer.

Fix this by discarding the buffer's identity in arc_hdr_destroy(), in
case the buffer is not empty, before calling arc_hdr_l2hdr_destroy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12392

2 years agoLinux 5.14 compat: META
Brian Behlendorf [Wed, 15 Sep 2021 21:09:31 +0000 (14:09 -0700)]
Linux 5.14 compat: META

Increase the Linux-Maximum version in the META file to 5.14.
All of the required compatibility patches have been merged
and the 5.14 kernel has been officially released.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12565

2 years agoTemporarily use root credentials to mount snapshots in .zfs
Allan Jude [Tue, 14 Sep 2021 23:10:00 +0000 (19:10 -0400)]
Temporarily use root credentials to mount snapshots in .zfs

When mounting a snapshot in the .zfs/snapshots control directory,
temporarily assume roots credentials to perform the VFS_MOUNT().

This allows regular users and users inside jails to access these
snapshots.

The regular usermount code is not helpful here, since it requires
that the user performing the mount own the mountpoint, which won't
be the case for .zfs/snapshot/<snapname>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Modirum MDPay
Sponsored-By: Klara Inc.
Closes #11312

2 years agoUse fallthrough macro
Brian Behlendorf [Tue, 14 Sep 2021 16:17:54 +0000 (09:17 -0700)]
Use fallthrough macro

As of the Linux 5.9 kernel a fallthrough macro has been added which
should be used to anotate all intentional fallthrough paths.  Once
all of the kernel code paths have been updated to use fallthrough
the -Wimplicit-fallthrough option will because the default.  To
avoid warnings in the OpenZFS code base when this happens apply
the fallthrough macro.

Additional reading: https://lwn.net/Articles/794944/

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12441

2 years agoIterate encrypted clones at zvol_create_minor
Jorgen Lundman [Mon, 13 Sep 2021 20:27:07 +0000 (05:27 +0900)]
Iterate encrypted clones at zvol_create_minor

Userland figures out which encryption-root keys are required to load,
and issues ZFS_IOC_LOAD_KEY.
The tail section of spa_keystore_load_wkey() will call
zvol_create_minors() on the encryption-root object.

Any clones of the encrypted zvol will not be plumbed. This commits
adds additional logic to detect if zvol has clones, and is encrypted,
then adds these to the list of zvols to call zvol_create_minors() on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12471

2 years agoFixed data integrity issue when underlying disk returns error
Arun KV [Mon, 13 Sep 2021 20:02:39 +0000 (01:32 +0530)]
Fixed data integrity issue when underlying disk returns error

Errors in zil_lwb_write_done() are not propagated to
zil_lwb_flush_vdevs_done() which can result in zil_commit_impl()
not returning an error to applications even when zfs was not able
to write data to the disk.

Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to
allow errors to propagate and consolidate the error handling for
flush and write errors to a single location (rather than having
error handling split between the "write done" and "flush done"
handlers).

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Arun KV <arun.kv@datacore.com>
Closes #12391
Closes #12443

2 years agoZTS: Waiting for zvols to be available
Brian Behlendorf [Mon, 13 Sep 2021 19:18:01 +0000 (12:18 -0700)]
ZTS: Waiting for zvols to be available

This is a follow up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable are should explicitly
wait for the required zvols to be available.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12553

2 years agoVerify embedded blkptr's in arc_read()
Brian Behlendorf [Fri, 10 Sep 2021 01:02:07 +0000 (18:02 -0700)]
Verify embedded blkptr's in arc_read()

The block pointer verification check in arc_read() should also
cover embedded block pointers.  While highly unlikely, accessing
a damaged block pointer can result in panic.  To further harden
the code extend the existing check to include embedded block
pointers and add a comment explaining the rational for this
sanity check.  Lastly, correct a flaw in zfs_blkptr_verify()
so the error count is checked even when checking a untrusted
config to verify the non-pool-specific portions of a block
pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12535

2 years agoUpstream: Add snapshot and zvol events
Jorgen Lundman [Thu, 9 Sep 2021 17:44:21 +0000 (02:44 +0900)]
Upstream: Add snapshot and zvol events

For kernel to send snapshot mount/unmount events to zed.

For kernel to send symlink creates/removes on zvol plumbing.
(/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX)

If zed misses the ENODEV, all errors after are EINVAL. Treat any error
as kernel module failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12416

2 years agoLinux 5.15 compat: get_acl()
Brian Behlendorf [Thu, 9 Sep 2021 16:38:35 +0000 (09:38 -0700)]
Linux 5.15 compat: get_acl()

Kernel commits

332f606b32b6 ovl: enable RCU'd ->get_acl()
0cad6246621b vfs: add rcu argument to ->get_acl() callback

Added compatibility code to detect the new ->get_acl() interface
and correctly handle the case where the new rcu argument is set.

Reviewed-by: Coleman Kane <ckane@colemankane.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12548

2 years agoAllow sending corrupt snapshots even if metadata is corrupted
Allan Jude [Thu, 9 Sep 2021 14:17:31 +0000 (10:17 -0400)]
Allow sending corrupt snapshots even if metadata is corrupted

When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a blockpointer
cannot be read, but rather will continue and send the objects it can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes #12541

2 years agoarc: Drop an incorrect assert
Rich Ercolani [Wed, 8 Sep 2021 21:00:03 +0000 (17:00 -0400)]
arc: Drop an incorrect assert

Unfortunately, there was an overzealous assertion that was (in pretty
specific circumstances) false, causing failure.  This assertion was
added in error, so we're removing it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #9897
Closes #12020
Closes #12246

2 years agoUpstream: zdb inode mapping
Jorgen Lundman [Wed, 8 Sep 2021 20:56:04 +0000 (05:56 +0900)]
Upstream: zdb inode mapping

Unfortunately macOS reserves inode ID numbers 0-15, and we can
not used them. In macOS port we simply map them really high IDs.

Normally this is hidden inside the _os implementation, but this is
the one place in the common source files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12530

2 years agoCompressed receive with different ashift can result in incorrect PSIZE on disk
Paul Dagnelie [Wed, 8 Sep 2021 20:52:28 +0000 (13:52 -0700)]
Compressed receive with different ashift can result in incorrect PSIZE on disk

We round up the psize to the nearest multiple of the asize or to the
lsize, whichever is smaller. Once that's done, we allocate a new
buffer of the appropriate size, zero the tail, and copy the data
into it. This adds a small performance cost to these kinds of writes,
but fixes the bookkeeping problems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12522
Closes #8462

2 years agoLinux 5.15 compat: standalone <linux/stdarg.h>
Alexander [Wed, 8 Sep 2021 19:59:43 +0000 (21:59 +0200)]
Linux 5.15 compat: standalone <linux/stdarg.h>

Kernel commits

39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers")
c0891ac15f04 ("isystem: ship and use stdarg.h")
564f963eabd1 ("isystem: delete global -isystem compile option")

(for now can be found in linux-next.git tree, will land into the
 Linus' tree during the ongoing 5.15 cycle with one of akpm merges)

removed the -isystem flag and disallowed the inclusion of any
compiler header files. They also introduced a minimal
<linux/stdarg.h> as a replacement for <stdarg.h>.
include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes
<stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h>
and include it instead of the compiler's one to prevent module
build breakage.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12531

2 years agoLinux 5.15 compat: block device readahead
Brian Behlendorf [Wed, 8 Sep 2021 15:03:13 +0000 (08:03 -0700)]
Linux 5.15 compat: block device readahead

The 5.15 kernel moved the backing_dev_info structure out of
the request queue structure which causes a build failure.

Rather than look in the new location for the BDI we instead
detect this upstream refactoring by the existance of either
the blk_queue_update_readahead() or disk_update_readahead()
functions.  In either case, there's no longer any reason to
manually set the ra_pages value since it will be overridden
with a reasonable default (2x the block size) when
blk_queue_io_opt() is called.

Therefore, we update the compatibility wrapper to do nothing
for 5.9 and newer kernels.  While it's tempting to do the
same for older kernels we want to keep the compatibility
code to preserve the existing behavior.  Removing it would
effectively increase the default readahead to 128k.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12532

2 years agoDetect iSCSI in the zpool cmd vdev media script
Don Brady [Thu, 2 Sep 2021 23:11:53 +0000 (17:11 -0600)]
Detect iSCSI in the zpool cmd vdev media script

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #12206

2 years agoCI: don't install abigail-tools
George Melikov [Tue, 31 Aug 2021 20:56:45 +0000 (23:56 +0300)]
CI: don't install abigail-tools

We use docker image instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoUpdate ABI files via new libabigail version
George Melikov [Tue, 31 Aug 2021 19:26:30 +0000 (22:26 +0300)]
Update ABI files via new libabigail version

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoLibabigail: make .abi files more consistent
George Melikov [Tue, 31 Aug 2021 18:52:05 +0000 (21:52 +0300)]
Libabigail: make .abi files more consistent

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoCI: use fresh libabigail via docker image
George Melikov [Tue, 31 Aug 2021 17:53:12 +0000 (20:53 +0300)]
CI: use fresh libabigail via docker image

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529