FreeBSD/FreeBSD.git
наб [Wed, 7 Apr 2021 12:52:58 +0000 (14:52 +0200)]
zed: print out licence string as one big chunk

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860

наб [Sat, 3 Apr 2021 10:09:24 +0000 (12:09 +0200)]
zed: only go up to current limit in close_from() fallback

Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.
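Below is a minimal sketch of a fallback bounded this way. It assumes the soft limit has not been lowered after higher descriptors were opened, because descriptors opened earlier above a since-lowered limit would be missed. The name `close_from_fallback()` is hypothetical, not ZED's actual function. The 1024 default for an unknown limit is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical fallback for close_from(): close every fd >= lowfd.
 * The kernel only hands out descriptors below the soft limit (rlim_cur),
 * so looping up to rlim_max (possibly over a million) is wasted work.
 * Assumes rlim_cur was not lowered after higher fds had been opened.
 */
static void close_from_fallback(int lowfd)
{
	struct rlimit rl;
	rlim_t limit = 1024;	/* assumed default if the limit is unknown */

	assert(lowfd >= 0);
	if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
		limit = rl.rlim_cur;

	for (rlim_t fd = (rlim_t)lowfd; fd < limit; fd++)
		(void) close((int)fd);	/* EBADF on unused slots is harmless */
}
```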

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 19:37:53 +0000 (21:37 +0200)]
zed.8: the Diagnosis Engine is implemented

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 19:31:23 +0000 (21:31 +0200)]
zed: replace zed_file_write_n() with write(2), purge it

We set SA_RESTART early on, which prevents EINTR (so effectively that
it has to be cleared in the reaper, since it interferes with pause(2)).
EINTR was the only error zed_file_write_n() actually handled, and the
pid write is no bigger than 12 bytes anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 18:47:00 +0000 (20:47 +0200)]
zed: merge all _NOT_IMPLEMENTED_ events

These events should currently never be generated.

Also remove the tag marking _zed_event_add_nvpair() for merging with
zpool_do_events_nvprint(): they serve different purposes (machine,
usually script, consumption vs. human consumption) and, as it stands,
format the output differently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 15:32:51 +0000 (17:32 +0200)]
zed: remove unused zed_file_read_n()

Same deal as zed_file_close_on_exec()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 15:14:31 +0000 (17:14 +0200)]
zed: bump zfs_zevent_len_max if we miss any events

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 14:40:48 +0000 (16:40 +0200)]
zed.8: don't pretend an unprivileged user could change the script owner

And add a note on /why/ ZEDLETs need to be owned by root

Quoth chown(2), Linux man-pages project:
  Only a privileged process (Linux: one with the CAP_CHOWN capability)
  may change the owner of a file.

Quoth chown(2), FreeBSD:
     [EPERM]  The operation would change the ownership,
              but the effective user ID is not the super-user.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 13:57:23 +0000 (15:57 +0200)]
zed: purge all mentions of a configuration file

There simply isn't a need for one, since the flags the daemon takes
are all short (mostly just toggles) and administrative in nature,
and are therefore better served by the age-old tradition of sourcing an
environment file and preparing the cmdline in the init-specific handler
itself, if needed at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 13:10:34 +0000 (15:10 +0200)]
zed: implement close_from() in terms of /proc/self/fd, if available

(/dev/fd is used on Darwin.)

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces ZED start-up time from "at least a second" to
"instantaneous" and, under strace, from "don't even try" to "usable",
simply by doing five syscalls instead of over a million;
in most cases the main loop does nothing.
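The directory-listing approach can be sketched as follows. The sketch assumes that /proc/self/fd (or /dev/fd on Darwin) lists exactly the open descriptors. `close_from_sketch()` is a hypothetical name. Its rlimit-bounded fallback for when the directory is unavailable is also an assumption. The descriptor backing the listing itself is skipped so the iteration is not cut short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical close_from(): close every fd >= lowfd by listing the
 * descriptors that are actually open, instead of probing every number.
 */
static void close_from_sketch(int lowfd)
{
#ifdef __APPLE__
	DIR *d = opendir("/dev/fd");
#else
	DIR *d = opendir("/proc/self/fd");
#endif

	assert(lowfd >= 0);
	if (d != NULL) {
		int dfd = dirfd(d);	/* the fd backing this listing */
		struct dirent *e;

		while ((e = readdir(d)) != NULL) {
			char *end;
			long fd = strtol(e->d_name, &end, 10);

			/* Skip "." and ".." (no digits) and any non-numeric name. */
			if (end == e->d_name || *end != '\0')
				continue;
			if (fd >= lowfd && fd != dfd)
				(void) close((int)fd);
		}
		(void) closedir(d);
		return;
	}

	/* Assumed fallback: probe only up to the soft limit. */
	struct rlimit rl;
	rlim_t limit = 1024;	/* assumed default if the limit is unknown */

	if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
		limit = rl.rlim_cur;
	for (rlim_t fd = (rlim_t)lowfd; fd < limit; fd++)
		(void) close((int)fd);
}
```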

Recent Linuxes (5.9+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet).

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Fri, 2 Apr 2021 12:10:31 +0000 (14:10 +0200)]
zed: print combined system/user time after ZEDLET death

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

наб [Mon, 29 Mar 2021 13:21:54 +0000 (15:21 +0200)]
zed: allow limiting concurrent jobs

A 200 ms time-out is relatively long, but if we have already hit the
cap, we'll likely be able to spawn multiple new jobs when we wake up.
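Below is a single-threaded sketch of such a cap. It replaces ZED's separate reaper thread with non-blocking `waitpid(..., WNOHANG)` polling. `wait_for_job_slot()` is a hypothetical name. Only the 200 ms back-off comes from the text above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: return once fewer than max_jobs children are
 * running. *running counts children spawned but not yet reaped.
 */
static void wait_for_job_slot(int *running, int max_jobs)
{
	/* 200 ms back-off, as in the commit message. */
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 200000000L };

	assert(max_jobs > 0);
	for (;;) {
		pid_t pid;

		/* Reap every child that has already exited, without blocking. */
		while ((pid = waitpid(-1, NULL, WNOHANG)) > 0)
			(*running)--;
		if (pid < 0 && errno == ECHILD)
			*running = 0;	/* no children are left at all */

		if (*running < max_jobs)
			return;
		/* At the cap: sleep, after which several slots may be free. */
		(void) nanosleep(&tick, NULL);
	}
}
```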

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807

наб [Sat, 27 Mar 2021 13:18:27 +0000 (14:18 +0100)]
zed: remove unused zed_file_close_on_exec()

The FIXME comment has been there since the initial implementation in
2014, and there are no users.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807

наб [Fri, 26 Mar 2021 13:41:38 +0000 (14:41 +0100)]
zed: use separate reaper thread and collect ZEDLETs asynchronously

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807

наб [Fri, 26 Mar 2021 20:18:18 +0000 (21:18 +0100)]
zed: set names for all threads

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807

Brian Behlendorf [Wed, 7 Apr 2021 20:30:30 +0000 (13:30 -0700)]
Tag 2.1.0-rc2

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Olaf Faaland [Wed, 7 Apr 2021 17:10:34 +0000 (10:10 -0700)]
fix misplaced quotes in kmod-preamble

rpm/redhat/zfs-kmod.spec.in has a typo in the shell code that
creates the kmod-preamble file.  This typo results in the
preamble file having the wrong name,

./SOURCES/kmod-preamblenObsoletes

and missing the Obsoletes clause that has become part of the name.

Because the filename is incorrect, the built package does not have
"obsoletes" or "conflicts" set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11851

Brian Behlendorf [Wed, 7 Apr 2021 17:09:21 +0000 (10:09 -0700)]
Obsolete earlier packages due to version bump

In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump, the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11844
Closes #11847

наб [Sat, 3 Apr 2021 22:53:40 +0000 (00:53 +0200)]
i-t: don't brokenly set the scheduler for root pool vdev's disks

This effectively reverts
  4fc411f7a3ecee8a70fc8d6c687fae9a1cf20b31 (part of #6807) and
  f6fbe25664629d1ae6a3b186f14ec69dbe6c6232 (#9042).
The code itself and the latter PR cite symmetry with whole-disk vdev
behaviour (presumably because rootfs vdevs are rarely whole disks),
but the code is broken for NVMe devices: it would strip the controller
number instead of the (potential) partition number, turning
"nvme0n1p1" into "nvmen1p1", which would then fail the sysfs
existence check. It could rather easily be fixed to handle those (and
any others) by dereferencing /sys/class/block/$devname, but this
isn't the place to set the scheduler. As noted in the commit that
removed setting the scheduler by default
(9e17e6f2541c69a7a5e0ed814a7f5e71cbf8b90a), use a udev rule instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11838

наб [Sat, 3 Apr 2021 16:18:39 +0000 (18:18 +0200)]
i-t: fix root=zfs:AUTO

Setting IFS= would break the loops in import_pool(), which would make
any automatic import fail.

Additionally, $ZFS_BOOTFS from the kernel command line would interfere
with find_rootfs().

If multiple pools were present, the same thing could happen across
multiple find_rootfs() runs, so bail out early and clean up in the
error path.

Suggested-by: @nachtgeist
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11278
Closes #11838

matt-fidd [Tue, 6 Apr 2021 23:05:54 +0000 (00:05 +0100)]
zfs get -p only outputs 3 columns if "clones" property is empty

get_clones_string currently returns an empty string for filesystem
snapshots which have no clones. This breaks parsable `zfs get` output,
as only three columns are output instead of four.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk>
Co-authored-by: matt <matt@fiddaman.net>
Closes #11837

наб [Tue, 6 Apr 2021 19:39:54 +0000 (21:39 +0200)]
zpool-features.5: remove "booting not possible with this feature"s

The exact limitations on what features are supported when booting
vary considerably depending on the environment.  In order to minimize
confusion, avoid categorical statements which assume GRUB2 is being
used.  The supported GRUB2 features are covered earlier in this man
page for easy reference.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11842

George Melikov [Tue, 6 Apr 2021 19:27:40 +0000 (22:27 +0300)]
man: fix wrong .Xr macros usages

In addition, the HTML documentation will have working hyperlinks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11845

наб [Tue, 6 Apr 2021 19:25:53 +0000 (21:25 +0200)]
libzutil: zfs_isnumber(): return false if input empty

zpool list, which is the only user, would mistakenly try to parse the
empty string as the interval in this case:

  $ zpool list "a"
  cannot open 'a': no such pool
  $ zpool list ""
  interval cannot be zero
  usage: <usage string follows>
which is now symmetric with zpool get:
  $ zpool list ""
  cannot open '': name must begin with a letter

Avoid breaking the "interval cannot be zero" string.
There simply isn't a need for this, and it's user-facing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11841
Closes #11843

Brian Behlendorf [Sat, 3 Apr 2021 15:33:22 +0000 (08:33 -0700)]
ZTS: pool_checkpoint improvements

The pool_checkpoint tests may incorrectly fail because several of
them invoke zdb for an imported pool.  In this scenario it's not
unexpected for zdb to fail if the pool is modified.  To resolve
this, these zdb checks are now done after the pool has been exported.

Additionally, the default cleanup functions assumed the pool would
be imported when they were run.  If this was not the case, they would
exit early and fail to clean up all of the test state, causing
subsequent tests to fail.  Add a check to only destroy the pool
when it is imported.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11832

Andrea Gelmini [Sat, 3 Apr 2021 01:38:53 +0000 (18:38 -0700)]
Fix various typos

Correct an assortment of typos throughout the code base.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11774

наб [Fri, 2 Apr 2021 23:34:58 +0000 (01:34 +0200)]
bash_completion.d: always call zfs/zpool binaries directly

/dev/zfs is 0:0 666 on most systems, so the [ -w /dev/zfs ] check always
succeeds, but if zfs isn't in $PATH (e.g. when completing from
"/sbin/zfs list" on a regular account) this can lead to error spew like

  nabijaczleweli@szarotka:~$ /sbin/zfs list bash: zfs: command not found
  @ bash: zfs: command not found

We only do read-only commands, and quite general ones at that,
so there's no need to elevate one way or another.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11828

Brian Behlendorf [Fri, 2 Apr 2021 23:33:40 +0000 (16:33 -0700)]
Add RELEASES.md file

Document the project's policy regarding publishing and maintaining
official OpenZFS releases.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11821

Ryan Moeller [Fri, 2 Apr 2021 18:11:52 +0000 (14:11 -0400)]
ZTS: inheritance/inherit_001_pos is flaky

Add inheritance/inherit_001_pos to the "maybe fails on FreeBSD" list.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11830

Ryan Moeller [Fri, 2 Apr 2021 18:06:44 +0000 (14:06 -0400)]
FreeBSD: Fix stable/12 after AT_BENEATH removal

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11827

Brian Behlendorf [Thu, 1 Apr 2021 23:53:05 +0000 (16:53 -0700)]
Bump libzfs.so and libzpool.so versions

Bump the library versions as advised by the libtool guidelines.

https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html

Two new functions were added but no existing functions were changed,
so we increment current and age (current:revision:age).

Added functions (2):
- boolean_t zpool_is_draid_spare(const char *);
- zpool_compat_status_t zpool_load_compat(const char *,
      boolean_t *, char *, char *);
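Below is a worked sketch of the libtool update rules linked above. It uses hypothetical starting numbers, not libzfs's real ones. When interfaces are only added, current and age both increase. On Linux, the SONAME major number (current minus age) therefore stays the same.

```c
#include <assert.h>
#include <stdbool.h>

/* libtool version triple, written current:revision:age */
struct lt_version {
	int current, revision, age;
};

/*
 * One release's worth of libtool's update rules: a source change bumps
 * revision; any interface change bumps current and resets revision;
 * added interfaces bump age; removed or changed ones reset age to 0.
 */
static struct lt_version lt_bump(struct lt_version v, bool source_changed,
    bool iface_added, bool iface_removed_or_changed)
{
	if (source_changed)
		v.revision++;
	if (iface_added || iface_removed_or_changed) {
		v.current++;
		v.revision = 0;
	}
	if (iface_added)
		v.age++;
	if (iface_removed_or_changed)
		v.age = 0;
	return (v);
}

/* On Linux, libtool names the library lib<name>.so.<current - age>. */
static int lt_soname_major(struct lt_version v)
{
	assert(v.current >= v.age);
	return (v.current - v.age);
}
```

For example, a hypothetical 4:1:0 library that gains two functions becomes 5:0:1 and keeps SONAME major 4.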

Additionally bump the libzpool.so version information.  This library
is for internal use but we still want to update the version to track
major changes to the interfaces.

The libzfsbootenv, libuutil, libnvpair and libzfs_core libraries
have not been updated.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11817

Ryan Moeller [Thu, 1 Apr 2021 15:49:41 +0000 (11:49 -0400)]
Allow pool names that look like Solaris disk names

Nothing bad happens if a prefix of your pool name matches a disk name.
This is a bit of a silly restriction at this point.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11781
Closes #11813

Ryan Moeller [Wed, 31 Mar 2021 17:56:37 +0000 (13:56 -0400)]
Don't scale zfs_zevent_len_max by CPU count

The lower bound for this scaling is too low and the upper bound is too
high.  Use a fixed default length of 512 instead, which is a reasonable
value on any system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822

Ryan Moeller [Mon, 29 Mar 2021 19:44:27 +0000 (15:44 -0400)]
Atomically check and set dropped zevent count

ratelimit_dropped isn't protected by a lock and is expected to
be updated atomically.
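Below is a minimal C11 sketch of a lock-free check-and-reset for such a counter. The names are hypothetical. The kernel module uses its own atomic primitives rather than `<stdatomic.h>`. Because the read and the clear happen in a single `atomic_exchange()`, no concurrent increment can slip in between them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the ratelimit_dropped counter. */
static _Atomic uint64_t ratelimit_dropped;

/* Producer side: count one event that was dropped by the rate limit. */
static void note_dropped(void)
{
	atomic_fetch_add(&ratelimit_dropped, 1);
}

/*
 * Consumer side: read and clear the count in one atomic step.
 * A separate load followed by a store of 0 would lose any drop that
 * another thread recorded between the two operations.
 */
static uint64_t take_dropped(void)
{
	return (atomic_exchange(&ratelimit_dropped, 0));
}
```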

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822

Brian Behlendorf [Thu, 1 Apr 2021 15:39:27 +0000 (08:39 -0700)]
CI: Increase free space in workflow

Recently we've been running out of free space in the Ubuntu 20.04
environment, resulting in test failures.  This appears to be caused
by a change in the default available free space and not by
any change in OpenZFS. Try to avoid this failure by applying a
suggested workaround which removes some unnecessary files.

https://github.com/actions/virtual-environments/issues/2840

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11826

Brian Atkinson [Thu, 1 Apr 2021 15:37:41 +0000 (09:37 -0600)]
Fixing m4 iops rename check

The configure check for iops->rename wanting flags was missing the
AC_MSG_CHECKING() so it would just print yes without saying what was
being checked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11825

наб [Wed, 31 Mar 2021 17:49:56 +0000 (19:49 +0200)]
fsck.zfs: implement 4/8 exit codes as suggested in manpage

Update the fsck.zfs helper to bubble up some already-known-about
errors if they are detected in the pool.

health=degraded => 4/"Filesystem errors left uncorrected"
health=faulted && dataset in /etc/fstab => 8/"Operational error"
pool not found => 8/"Operational error"
everything else => 0
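The table above can be rendered as a hypothetical C function. The real fsck.zfs is a shell script. The sketch assumes that health values are spelled as `zpool` prints them (`DEGRADED`, `FAULTED`) and that a NULL health signals a missing pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* fsck(8) exit codes used by the mapping above. */
enum {
	FSCK_OK = 0,
	FSCK_ERRORS_UNCORRECTED = 4,
	FSCK_OPERATIONAL_ERROR = 8,
};

/*
 * Hypothetical rendering of fsck.zfs's decision table. health is the
 * pool's health property, or NULL if the pool was not found.
 */
static int fsck_zfs_exit_code(const char *health, bool dataset_in_fstab)
{
	if (health == NULL)
		return (FSCK_OPERATIONAL_ERROR);	/* pool not found */
	if (strcmp(health, "DEGRADED") == 0)
		return (FSCK_ERRORS_UNCORRECTED);
	if (strcmp(health, "FAULTED") == 0 && dataset_in_fstab)
		return (FSCK_OPERATIONAL_ERROR);
	return (FSCK_OK);
}
```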

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11806

Mike Swanson [Wed, 31 Mar 2021 16:40:25 +0000 (09:40 -0700)]
Add compatibility file sets (ZoL 0.6.1, 0.6.4, OpenZFS 2.1)

ZoL 0.6.1 introduced feature flags with the three features that all
implementations at the time were guaranteed to have.  0.6.4 introduced
a few more, and 0.6.5 added two after that.  OpenZFS 2.1 added the
dRAID feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11818

Brian Behlendorf [Mon, 29 Mar 2021 23:31:29 +0000 (16:31 -0700)]
Tag 2.1.0-rc1

New features:
- Distributed Spare (dRAID) Feature
- Added "compatibility" property for zpool feature sets
- Added zpool_influxdb command to collect zpool statistics

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

наб [Fri, 26 Mar 2021 21:21:00 +0000 (22:21 +0100)]
zed: reap child after killing on time-out

When a child process is killed, waitpid() must be called on its
pid to reap the zombie process.

Update the BUGS section to reflect reality by replacing "zedlets
aren't time limited" with "zedlets can be interrupted".
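Below is a sketch of kill-then-reap, assuming the parent owns the child's pid. `kill_and_reap()` is a hypothetical helper, not ZED's code. Without the `waitpid()` call, the killed child lingers as a zombie.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: kill a timed-out child and reap it, so it does
 * not linger as a zombie. Returns the wait status, or -1 on error.
 */
static int kill_and_reap(pid_t pid)
{
	int status;

	assert(pid > 0);
	if (kill(pid, SIGKILL) != 0)
		return (-1);
	/* SIGKILL cannot be caught or ignored; wait for the child to die. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return (-1);
	}
	return (status);
}
```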

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11769
Closes #11798

Matthew Ahrens [Fri, 26 Mar 2021 18:19:35 +0000 (11:19 -0700)]
Use a helper function to clarify gang block size

For gang blocks, `DVA_GET_ASIZE()` is the total space allocated for the
gang DVA including its child BPs.  The space allocated at each DVA's
vdev/offset is `vdev_psize_to_asize(vd, SPA_GANGBLOCKSIZE)`.

This commit makes this relationship more clear by using a helper
function, `vdev_gang_header_asize()`, for the space allocated at the
gang block's vdev/offset.
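Below is a simplified sketch of this relationship. It assumes a plain leaf vdev whose allocation size is the physical size rounded up to a whole sector (1 << ashift). The `vdev_t` here is a hypothetical one-field stand-in. This `vdev_psize_to_asize()` ignores the per-vdev-type dispatch the real one performs. `SPA_GANGBLOCKSIZE` is taken to be 512 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SPA_MINBLOCKSHIFT	9
#define SPA_GANGBLOCKSIZE	(1ULL << SPA_MINBLOCKSHIFT)	/* 512 bytes */

/* Hypothetical, heavily simplified vdev: only the sector shift matters. */
typedef struct vdev {
	uint64_t vdev_ashift;
} vdev_t;

/*
 * Simplified stand-in for vdev_psize_to_asize() on a plain leaf vdev:
 * round the physical size up to whole sectors. The real function
 * dispatches through the vdev ops (RAIDZ, for example, adds parity).
 */
static uint64_t vdev_psize_to_asize(const vdev_t *vd, uint64_t psize)
{
	uint64_t sector = 1ULL << vd->vdev_ashift;

	return ((psize + sector - 1) & ~(sector - 1));
}

/* Space allocated at the gang DVA's own vdev/offset for its header. */
static uint64_t vdev_gang_header_asize(const vdev_t *vd)
{
	return (vdev_psize_to_asize(vd, SPA_GANGBLOCKSIZE));
}
```

On a 4K-sector vdev (ashift 12), the 512-byte gang header therefore occupies 4096 bytes. `DVA_GET_ASIZE()` of the gang DVA also counts the children.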

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11744

Matthew Ahrens [Fri, 26 Mar 2021 18:12:22 +0000 (11:12 -0700)]
When specifying raidz vdev name, parity count should match

When specifying the name of a RAIDZ vdev on the command line, it can be
specified as raidz-<vdevID> or raidzP-<vdevID>.
e.g. `zpool clear poolname raidz-0` or `zpool clear poolname raidz2-0`

If the parity is specified in the vdev name, it should match the actual
parity of that RAIDZ vdev, otherwise the command should fail.  This
commit makes it so.
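Below is a sketch of the stricter name check, assuming names of exactly the two forms above. `raidz_name_matches()` is hypothetical and does not reflect how the zpool command actually resolves vdev names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical check of a RAIDZ vdev name against the vdev's real
 * parity and id: "raidz-<id>" leaves parity unspecified, while
 * "raidzP-<id>" must name the vdev's actual parity.
 */
static bool raidz_name_matches(const char *name, unsigned long actual_parity,
    unsigned long actual_id)
{
	const char *p;
	char *end;

	if (strncmp(name, "raidz", 5) != 0)
		return (false);
	p = name + 5;

	if (*p != '-') {
		/* "raidzP-<id>": the explicit parity must match the vdev. */
		if (!isdigit((unsigned char)*p))
			return (false);
		unsigned long parity = strtoul(p, &end, 10);
		if (*end != '-' || parity != actual_parity)
			return (false);
		p = end;
	}

	/* p points at the '-' separator; the vdev id must follow it. */
	if (!isdigit((unsigned char)p[1]))
		return (false);
	unsigned long id = strtoul(p + 1, &end, 10);
	return (*end == '\0' && id == actual_id);
}
```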

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Stuart Maybee <stuart.maybee@comcast.net>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11742

Luis Henriques [Fri, 26 Mar 2021 17:46:45 +0000 (17:46 +0000)]
Fix error code on __zpl_ioctl_setflags()

Other (all?) Linux filesystems seem to return -EPERM instead of -EACCES
when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the
CAP_LINUX_IMMUTABLE capability.  This was detected by the generic/545
test in the fstests suite.
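The rule can be sketched as a pure function. `check_setflags()` and its boolean capability argument are hypothetical stand-ins for the kernel's capability check. The flag values match those in Linux's `<linux/fs.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag values as defined in Linux's <linux/fs.h>. */
#define FS_IMMUTABLE_FL	0x00000010
#define FS_APPEND_FL	0x00000020

/*
 * Hypothetical permission check: toggling append-only or immutable
 * without CAP_LINUX_IMMUTABLE fails with -EPERM, like other filesystems.
 */
static int check_setflags(unsigned int oldflags, unsigned int newflags,
    bool has_cap_linux_immutable)
{
	unsigned int changed = oldflags ^ newflags;

	if ((changed & (FS_APPEND_FL | FS_IMMUTABLE_FL)) != 0 &&
	    !has_cap_linux_immutable)
		return (-EPERM);	/* ZFS previously returned -EACCES */
	return (0);
}
```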

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luis Henriques <henrix@camandro.org>
Closes #11791

Jessica Clarke [Fri, 26 Mar 2021 17:45:12 +0000 (17:45 +0000)]
Support running FreeBSD buildworld on Arm-based macOS hosts

Like FreeBSD, Arm-based Macs provide a full 64-bit stat from the
start, so they have no stat64 variants. Thus, define stat64 and fstat64
as aliases for the normal versions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Closes #11771

Andrea Gelmini [Mon, 22 Mar 2021 19:34:58 +0000 (20:34 +0100)]
Removed duplicated includes

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11775

Andrea Gelmini [Mon, 22 Mar 2021 19:32:38 +0000 (20:32 +0100)]
Fix typo in Python method name

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11776

Alexander Motin [Sat, 20 Mar 2021 05:56:11 +0000 (01:56 -0400)]
Split dmu_zfetch() speculation and execution parts

To make better predictions on parallel workloads, dmu_zfetch() should
be called as early as possible to reduce possible request reordering.
In particular, it should be called before dmu_buf_hold_array_by_dnode()
calls dbuf_hold(), which may sleep waiting for indirect blocks and wake
up multiple threads at the same time on completion; that can
significantly reorder the requests, making the stream look random.  But
we should not issue prefetch requests before the on-demand ones, since
they may reach the disks first despite the I/O scheduler, increasing
on-demand request latency.

This patch splits dmu_zfetch() into two functions: dmu_zfetch_prepare()
and dmu_zfetch_run().  The first can be executed as early as needed.
It only updates statistics and makes predictions without issuing any
I/Os.  The I/O issuance is handled by dmu_zfetch_run(), which can be
called later when all on-demand I/Os are already issued.  It even
tracks the activity of other concurrent threads, issuing the prefetch
only when _all_ on-demand requests are issued.

For many years this was a big problem for storage servers handling
deeper request queues from their clients: they had to either serialize
consecutive reads to make the ZFS prefetcher usable, or execute the
incoming requests as-is and get almost no prefetch from ZFS, relying
only on deep enough prefetch by the clients.  The benefits of those
approaches varied, but neither was perfect.  With this patch, deeper
queue sequential read benchmarks with CrystalDiskMark from Windows via
iSCSI to a FreeBSD target show me much better throughput, with an almost
100% prefetcher hit rate compared to almost zero before.

While there, I also removed the per-stream zs_lock as useless, since it
is completely covered by the parent zf_lock.  I also reused the zs_blocks
refcount to track the stream's zf_stream linkage, since I believe the
previous zs_fetch == NULL check in dmu_zfetch_stream_done() was racy.

Delete prefetch streams when they reach ends of files.  It saves up
to 1KB of RAM per file, plus reduces searches through the stream list.

Block data prefetch (speculation and indirect block prefetch are still
done, since they are cheaper) if all dbufs of the stream are already
in the DMU cache.  The first cache miss immediately fires all the
prefetch that would have been done for the stream by that time.  This
saves some CPU time if the same files within DMU cache capacity are
read over and over.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11652

Chunwei Chen [Sat, 20 Mar 2021 05:53:31 +0000 (22:53 -0700)]
Fix zfs_get_data access to files with wrong generation

If a TX_WRITE is created on a file, and the file is later deleted and a
new directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in a panic as it tries to take a range lock.

This patch fixes this issue by recording the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.
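Below is a sketch of the generation check. It uses hypothetical, simplified `object_t` and `write_record_t` types, and it uses ENOENT as the rejection code. Both are assumptions. The log record remembers which incarnation of the object it refers to, so a reused object id is detected at commit time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified object and intent-log write record. */
typedef struct object {
	uint64_t id;
	uint64_t gen;	/* generation: differs when the id is reused */
} object_t;

typedef struct write_record {
	uint64_t obj_id;
	uint64_t obj_gen;	/* captured at log time */
} write_record_t;

/* At log time, remember which incarnation of the object was written. */
static write_record_t log_write(const object_t *obj)
{
	write_record_t lr = { .obj_id = obj->id, .obj_gen = obj->gen };

	return (lr);
}

/*
 * At commit time the id may name a different object (e.g. the file was
 * deleted and a directory reused the id), so refuse to touch it.
 */
static int get_data(const write_record_t *lr, const object_t *cur)
{
	if (cur == NULL || cur->id != lr->obj_id)
		return (ENOENT);
	if (cur->gen != lr->obj_gen)
		return (ENOENT);	/* stale record: same id, new object */
	return (0);
}
```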

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #10593
Closes #11682

Andrew [Sat, 20 Mar 2021 05:50:46 +0000 (01:50 -0400)]
Fix regression in POSIX mode behavior

Commit 235a85657 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSIX mode is 007. When write_implies_delete_child is set,
ACE_WRITE_DATA is added to `wanted_dirperms` prior to calling
zfs_zaccess_common(). This occurs in zfs_zaccess_delete().

Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied, and since zfs_groupmember() always returns B_TRUE on Linux,
this check fails, ultimately resulting in EPERM being returned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #11760

Palash Gandhi [Sat, 20 Mar 2021 05:47:50 +0000 (22:47 -0700)]
ZTS: New test for kernel panic induced by redacted send

This change adds a new test that covers a bug fix in the binary search
in the redacted send resume logic that caused a kernel panic.
The bug was fixed in https://github.com/openzfs/zfs/pull/11297.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Palash Gandhi <palash.gandhi@delphix.com>
Closes #11764

Martin Matuška [Sat, 20 Mar 2021 05:46:43 +0000 (06:46 +0100)]
Allow setting bootfs property on pools with indirect vdevs

The FreeBSD boot loader relies on the bootfs property and is capable
of booting from removed (indirect) vdevs.

Reviewed-by: Eric van Gyzen
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11763

Ryan Moeller [Sat, 20 Mar 2021 05:39:42 +0000 (01:39 -0400)]
Fix typo in zgenhostid.8

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11770

Brian Atkinson [Sat, 20 Mar 2021 05:38:44 +0000 (23:38 -0600)]
Removing old code for k(un)map_atomic

It used to be required to pass an enum km_type to kmap_atomic() and
kunmap_atomic(), however this is no longer necessary and the wrappers
zfs_k(un)map_atomic removed these. This is confusing in the ABD code as
the struct abd_iter member iter_km no longer exists and the wrapper
macros simply compile them out.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11768

Serapheim Dimitropoulos [Sat, 20 Mar 2021 05:36:02 +0000 (22:36 -0700)]
Initialize metaslab range trees in metaslab_init

= Motivation

We've noticed several zloop crashes within Delphix caused by the
following sequence of events:

- A device gets expanded and new metaslabs are allocated for
  it. These metaslabs go through `metaslab_init()` but haven't
  gone through `metaslab_sync_done()` yet. This means that the
  only range tree that's actually set is `ms_allocatable`.
  All the others are NULL.

- A vdev initialization is issued and `vdev_initialize_thread`
  starts processing one of these new metaslabs of the expanded
  vdev.

- As part of `vdev_initialize_calculate_progress()` we call
  into `metaslab_load()` and `metaslab_load_impl()`, which
  in turn try to dereference the metaslab's trees that
  are still NULL, and therefore we crash.

The same failure can come up from the `vdev_trim` code paths.

= This Patch

We considered the following solutions to deal with this issue:

[A] Add logic to `vdev_initialize/trim` to skip those new
    metaslabs. We decided against this as it would be good
    to avoid exposing this lower-level detail to higher-level
    operations.

[B] Have `metaslab_load_impl()` return early for new metaslabs
    and thus never touch those range_trees that are NULL at
    that time. This seemed more of a work-around for the bug
    and not a clear-cut solution.

[C] Refactor our logic so all metaslabs have their range_trees
    created when the metaslab itself is created in `metaslab_init()`.

In this patch we decided to go with [C] because:

(1) It doesn't expose more metaslab details to higher level
    operations such as vdev initialize and trim.

(2) The current behavior of creating the range trees lazily
    in `metaslab_sync_done()` is unnecessarily complicated.

(3) Always initializing the metaslab range_trees makes other
    parts of the codebase cleaner. For example, we used to
    use `ms_freed` as the reference value for knowing whether
    all the range_trees have been initialized. Now we no
    longer need to do that check in most places (and in the
    few that we do we use the `ms_new` boolean field now
    which is more readable).
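A simplified sketch of option [C], with hypothetical stand-ins for `range_tree_t` and a handful of `metaslab_t` fields: all trees are created up front, and `ms_new` records that the metaslab has not been through a sync pass yet, so a load path never sees a NULL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for range_tree_t: only tracks total space. */
typedef struct range_tree {
	uint64_t rt_space;
} range_tree_t;

/* Hypothetical subset of metaslab_t, field names modeled on the commit. */
typedef struct metaslab {
	range_tree_t *ms_allocatable;
	range_tree_t *ms_freeing;
	range_tree_t *ms_freed;
	range_tree_t *ms_checkpointing;
	bool ms_new;	/* not yet through metaslab_sync_done() */
} metaslab_t;

static range_tree_t *
rt_create(void)
{
	range_tree_t *rt = calloc(1, sizeof (*rt));

	assert(rt != NULL);
	return (rt);
}

/* Eager initialization: every tree exists as soon as the metaslab does. */
static void
ms_init(metaslab_t *ms)
{
	ms->ms_allocatable = rt_create();
	ms->ms_freeing = rt_create();
	ms->ms_freed = rt_create();
	ms->ms_checkpointing = rt_create();
	ms->ms_new = true;
}

/* A load path can touch any tree without first checking for NULL. */
static uint64_t
ms_load(const metaslab_t *ms)
{
	return (ms->ms_allocatable->rt_space + ms->ms_freeing->rt_space +
	    ms->ms_freed->rt_space + ms->ms_checkpointing->rt_space);
}

static void
ms_fini(metaslab_t *ms)
{
	free(ms->ms_allocatable);
	free(ms->ms_freeing);
	free(ms->ms_freed);
	free(ms->ms_checkpointing);
}
```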

= Side Changes

Probably due to a mismerge we set `ms_loaded` to `B_TRUE` twice
in `metaslab_load_impl()`. In this patch we remove the extraneous
assignment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11737

3 years agoLinux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES
Coleman Kane [Sat, 20 Mar 2021 05:33:42 +0000 (01:33 -0400)]
Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES

The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs()
function that implements the typical MIN(x,y) logic used throughout the
kernel for bounding the allocation, and also the new implementation is
intended to be signed-safe (which the former was not).
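A rough userspace sketch of the new helper, assuming Linux 5.12's `BIO_MAX_VECS` value of 256; `bios_needed()` is a hypothetical caller added only to show the bound in use. Both operands of the clamp are unsigned, so the comparison never mixes signed and unsigned types.

```c
#include <assert.h>

/* Assumed to mirror the kernel's BIO_MAX_VECS in Linux 5.12. */
#define BIO_MAX_VECS	256U
#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Sketch of bio_max_segs(): bound an unsigned segment count. */
static inline unsigned int
bio_max_segs(unsigned int nr_segs)
{
	return (MIN(nr_segs, BIO_MAX_VECS));
}

/* Hypothetical caller: how many bios a request of nr_pages would need. */
static unsigned int
bios_needed(unsigned int nr_pages)
{
	unsigned int bios = 0;

	while (nr_pages > 0) {
		unsigned int segs = bio_max_segs(nr_pages);

		assert(segs > 0 && segs <= BIO_MAX_VECS);
		nr_pages -= segs;
		bios++;
	}
	return (bios);
}
```

For example, a 1000-page request needs four bios: three of 256 segments and one of 232.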

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11765

3 years agoLinux 5.12 compat: idmapped mounts
Coleman Kane [Sat, 20 Mar 2021 04:00:59 +0000 (00:00 -0400)]
Linux 5.12 compat: idmapped mounts

In Linux 5.12, the filesystem API was modified to support idmapped
mounts by adding a "struct user_namespace *" parameter to a number of
functions and VFS handlers. This change adds the needed autoconf
macros to detect the new interfaces and updates the code appropriately.
This change does not add support for idmapped mounts; instead it
preserves the existing behavior by passing the initial user namespace
where needed.  A subsequent commit will be required to add support
for idmapped mounts.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11712

3 years agoClean up RAIDZ/DRAID ereport code
Matthew Ahrens [Fri, 19 Mar 2021 23:22:10 +0000 (16:22 -0700)]
Clean up RAIDZ/DRAID ereport code

The RAIDZ and DRAID code is responsible for reporting checksum errors on
their child vdevs.  Checksum errors represent events where a disk
returned data or parity that should have been correct, but was not.  In
other words, these are instances of silent data corruption.  The
checksum errors show up in the vdev stats (and thus `zpool status`'s
CKSUM column), and in the event log (`zpool events`).

Note, this is in contrast with the more common "noisy" errors where a
disk goes offline, in which case ZFS knows that the disk is bad and
doesn't try to read it, or the device returns an error on the requested
read or write operation.

RAIDZ/DRAID generate checksum errors via three code paths:

1. When RAIDZ/DRAID reconstructs a damaged block, checksum errors are
reported on any children whose data was not used during the
reconstruction.  This is handled in `raidz_reconstruct()`.  This is the
most common type of RAIDZ/DRAID checksum error.

2. When RAIDZ/DRAID is not able to reconstruct a damaged block, that
means that the data has been lost.  The zio fails and an error is
returned to the consumer (e.g. the read(2) system call).  This would
happen if, for example, three different disks in a RAIDZ2 group are
silently damaged.  Since the damage is silent, it isn't possible to know
which three disks are damaged, so a checksum error is reported against
every child that returned data or parity for this read.  (For DRAID,
typically only one "group" of children is involved in each io.)  This
case is handled in `vdev_raidz_cksum_finish()`. This is the next most
common type of RAIDZ/DRAID checksum error.

3. If RAIDZ/DRAID is not able to reconstruct a damaged block (like in
case 2), but there happen to be additional copies of this block due to
"ditto blocks" (i.e. multiple DVAs in this blkptr_t), and one of those
copies is good, then RAIDZ/DRAID compares each sector of the data or
parity that it retrieved with the good data from the other DVA, and if
they differ then it reports a checksum error on this child.  This
differs from case 2 in that the checksum error is reported on only the
subset of children that actually have bad data or parity.  This case
happens very rarely, since normally only metadata has ditto blocks.  If
the silent damage is extensive, there will be many instances of case 2,
and the pool will likely be unrecoverable.

The code for handling case 3 is considerably more complicated than the
other cases, for two reasons:

1. It needs to run after the main raidz read logic has completed.  The
data that RAIDZ read needs to be preserved until after the alternate DVA has
been read, which necessitates refcounts and callbacks managed by the
non-raidz-specific zio layer.

2. It's nontrivial to map the sections of data read by RAIDZ to the
correct data.  For example, the correct data does not include the parity
information, so the parity must be recalculated based on the correct
data, and then compared to the parity that was read from the RAIDZ
children.

Due to the complexity of case 3, the rareness of hitting it, and the
minimal benefit it provides above case 2, this commit removes the code
for case 3.  These types of errors will now be handled the same as case
2, i.e. the checksum error will be reported against all children that
returned data or parity.
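A minimal sketch of the surviving behavior, using a hypothetical `child_t` rather than the real RAIDZ column structures: when a block cannot be reconstructed, every child that returned data or parity is charged a checksum error, whether or not a ditto copy later turns out to be good.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-child state for one RAIDZ/DRAID read. */
typedef struct child {
	bool returned_io;	/* child returned data or parity */
	unsigned cksum_errors;	/* stands in for the vdev CKSUM counter */
} child_t;

/*
 * The silently damaged children cannot be identified, so charge a
 * checksum error to every child that took part in the read.
 */
static void
report_unrecoverable(child_t *children, size_t n)
{
	assert(children != NULL || n == 0);
	for (size_t c = 0; c < n; c++) {
		if (children[c].returned_io)
			children[c].cksum_errors++;
	}
}
```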

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11735

3 years agoFreeBSD: make seqc asserts conditional on replay
Mateusz Guzik [Thu, 18 Mar 2021 05:09:45 +0000 (06:09 +0100)]
FreeBSD: make seqc asserts conditional on replay

Avoids tripping on asserts when doing pool recovery.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11739

3 years agoRemove unused rr_code
Matthew Ahrens [Thu, 18 Mar 2021 04:57:09 +0000 (21:57 -0700)]
Remove unused rr_code

The `rr_code` field in `raidz_row_t` is unused.

This commit removes the field, as well as the code that's used to set
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11736

3 years agoFreeBSD: Fix memory leaks in kstats
Ryan Moeller [Thu, 18 Mar 2021 04:55:18 +0000 (00:55 -0400)]
FreeBSD: Fix memory leaks in kstats

Don't (incorrectly) handle kmem_zalloc() failure.  With KM_SLEEP, it
will never return NULL.

Free the data allocated for non-virtual kstats when deleting the object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11767

3 years agoLinux: always check or verify return of igrab()
Adam D. Moss [Tue, 16 Mar 2021 23:33:34 +0000 (16:33 -0700)]
Linux: always check or verify return of igrab()

zhold() wraps igrab() on Linux, and igrab() may fail when the inode
is in the process of being deleted.  This means zhold() must only be
called when a reference exists and therefore it cannot be deleted.
This is the case for all existing consumers so add a VERIFY and a
comment explaining this requirement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11704

3 years agoUpdate FreeBSD versions
Dries Michiels [Tue, 16 Mar 2021 22:03:28 +0000 (23:03 +0100)]
Update FreeBSD versions

Update supported FreeBSD versions in documentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Dries Michiels <driesm.michiels@gmail.com>
Closes #11718

3 years agoHold and release permissions exist
gldisater [Tue, 16 Mar 2021 22:01:21 +0000 (18:01 -0400)]
Hold and release permissions exist

The man page was missing these two permissions.
Add the missing permissions to the man page.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Faulkner <gldisater@gldis.ca>
Closes #11727

3 years agoZTS: Add tests for DOS mode attributes
Ryan Moeller [Tue, 16 Mar 2021 22:00:14 +0000 (18:00 -0400)]
ZTS: Add tests for DOS mode attributes

Create a new section of tests to run with acltype=off.

For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11734

3 years agoReference_tracking_enable should be a module param
Don Brady [Tue, 16 Mar 2021 21:56:17 +0000 (15:56 -0600)]
Reference_tracking_enable should be a module param

To make use of the zfs_refcount_held tunable, it should be a module
parameter in OpenZFS.  Also, since the macros will auto-generate
OS-specific tunables, removed the existing zfs_refcount_held reference
in module/os/freebsd/zfs/sysctl_os.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11753

3 years agoZTS: Fix incorrect use of libtest in user_run by xattr_003_neg
Ryan Moeller [Mon, 9 Nov 2020 22:57:00 +0000 (17:57 -0500)]
ZTS: Fix incorrect use of libtest in user_run by xattr_003_neg

You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.

Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185

3 years agoZTS: Use ksh and current environment for user_run
Ryan Moeller [Thu, 11 Mar 2021 20:01:58 +0000 (15:01 -0500)]
ZTS: Use ksh and current environment for user_run

The current user_run often does not work as expected.  Commands are run
in a different shell, with a different environment, and all output is
discarded.

Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh.  Enhance the logging for
user_run so we can see out and err.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185

3 years agoFreeBSD: bring back possibility to rewind the checkpoint from bootloader
Mariusz Zaborski [Sat, 13 Mar 2021 00:12:14 +0000 (01:12 +0100)]
FreeBSD: bring back possibility to rewind the checkpoint from bootloader

Add parsing of the rewind options.

When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewound. When the FreeBSD repo was
synced with OpenZFS, this part of the code was removed.

[1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
[2] OpenZFS repo: f2c027bd6a003ec5793f8716e6189c389c60f47a

External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11730

3 years agoFreeBSD: Clean up zfsdev_close to match Linux
Ryan Moeller [Sat, 13 Mar 2021 00:09:15 +0000 (19:09 -0500)]
FreeBSD: Clean up zfsdev_close to match Linux

Resolve some oddities in zfsdev_close() which could result in a
panic and were not present in the equivalent function for Linux.

- Remove unused definition ZFS_MIN_MINOR
- FreeBSD: Simplify zfsdev state destruction
- Assert zs_minor is valid in zfsdev_close
- Make locking around zfsdev state match Linux

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11720

3 years agoFreeBSD: switch teardown lock to rms
Mateusz Guzik [Wed, 4 Nov 2020 22:28:56 +0000 (17:28 -0500)]
FreeBSD: switch teardown lock to rms

This deserializes otherwise non-contending operations.

The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly. Check the pull request for sample results.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoMacroify teardown lock handling
Mateusz Guzik [Wed, 4 Nov 2020 22:23:48 +0000 (17:23 -0500)]
Macroify teardown lock handling

This will allow platforms to implement it as they see fit, in particular
in a different manner than rrm locks.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoFreeBSD: rename teardown inactive macros to mimic rrm convention
Mateusz Guzik [Wed, 4 Nov 2020 22:19:35 +0000 (17:19 -0500)]
FreeBSD: rename teardown inactive macros to mimic rrm convention

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoFreeBSD: remove 2 assertions that teardown lock is not held
Mateusz Guzik [Thu, 12 Nov 2020 22:33:14 +0000 (17:33 -0500)]
FreeBSD: remove 2 assertions that teardown lock is not held

They are not very useful and hard to implement in the rms routine
the code is about to start using.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoFreeBSD: rework asserts in zfs_dd_lookup
Mateusz Guzik [Mon, 12 Oct 2020 21:27:59 +0000 (21:27 +0000)]
FreeBSD: rework asserts in zfs_dd_lookup

1. even up ifdefs
2. drop the arguably useless teardown lock asserts -- nothing else
   checks for it

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agoAdd branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros
Mateusz Guzik [Thu, 15 Oct 2020 05:45:28 +0000 (05:45 +0000)]
Add branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros

They are expected to fail only in corner cases.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153

3 years agozpool import cachefile improvements
George Wilson [Fri, 12 Mar 2021 23:42:27 +0000 (17:42 -0600)]
zpool import cachefile improvements

Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.

The goal of this change is to make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fall back to reading the
labels of the devices, similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.
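The fallback flow can be sketched as follows; `cache_entry_t`, `label_scan_fn`, and `toy_scan()` are hypothetical and exist only to make the control flow concrete, and a real import reads considerably more state than a single guid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cachefile entry: pool guid plus its original device dirs. */
typedef struct cache_entry {
	uint64_t guid;
	const char *const *devdirs;
	size_t ndirs;
	bool config_valid;	/* do the cached device paths still match? */
} cache_entry_t;

/* Hypothetical label scanner: true if a label with this guid is in dir. */
typedef bool (*label_scan_fn)(const char *dir, uint64_t guid);

typedef enum {
	IMPORT_FAILED,
	IMPORT_FROM_CACHE,
	IMPORT_FROM_LABELS
} import_how_t;

static import_how_t
import_from_cachefile(const cache_entry_t *ce, label_scan_fn scan)
{
	assert(ce != NULL && scan != NULL);

	/* Fast path: the cached configuration still describes the devices. */
	if (ce->config_valid)
		return (IMPORT_FROM_CACHE);

	/*
	 * Fallback: read labels like a normal import, but only in the
	 * directories recorded in the cachefile, and match strictly by
	 * guid so pools not listed in the cachefile are never imported.
	 */
	for (size_t i = 0; i < ce->ndirs; i++) {
		if (scan(ce->devdirs[i], ce->guid))
			return (IMPORT_FROM_LABELS);
	}
	return (IMPORT_FAILED);
}

/* Toy scanner for illustration: guid 42 lives under /dev/disk/by-id. */
static bool
toy_scan(const char *dir, uint64_t guid)
{
	return (strcmp(dir, "/dev/disk/by-id") == 0 && guid == 42);
}
```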

External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #11716

3 years agoFix whitespace introduced in ecc277cff
Martin Matuška [Fri, 12 Mar 2021 03:42:04 +0000 (04:42 +0100)]
Fix whitespace introduced in ecc277cff

The manual page change in ecc277c introduced trailing whitespace.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11722

3 years agoFreeBSD: Fix scope of deadman tunables
Ryan Moeller [Fri, 12 Mar 2021 03:23:24 +0000 (22:23 -0500)]
FreeBSD: Fix scope of deadman tunables

A few deadman tunables ended up in the wrong sysctl node.

Move them to vfs.zfs.deadman.*

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11715

3 years agoMicrooptimizations for VERIFY() and friends
Adam D. Moss [Fri, 12 Mar 2021 01:16:09 +0000 (17:16 -0800)]
Microoptimizations for VERIFY() and friends

Add branch hints and constify the intermediate evaluations of
left/right params in VERIFY3*().
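A sketch of the two changes in a `VERIFY3U`-style macro, assuming GCC or Clang: each operand is evaluated exactly once into a `const` local, and the failure branch is hinted as unlikely. `verify3_fail()` is a hypothetical stand-in for the real panic path.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical failure hook; prints the operands and aborts. */
static void
verify3_fail(const char *expr, uint64_t l, const char *op, uint64_t r)
{
	fprintf(stderr, "VERIFY3(%s) failed (%" PRIu64 " %s %" PRIu64 ")\n",
	    expr, l, op, r);
	abort();
}

/* Operands are captured once in const locals; failure is the cold path. */
#define VERIFY3U_SKETCH(LEFT, OP, RIGHT)				\
	do {								\
		const uint64_t _v3_left = (uint64_t)(LEFT);		\
		const uint64_t _v3_right = (uint64_t)(RIGHT);		\
		if (unlikely(!(_v3_left OP _v3_right)))			\
			verify3_fail(#LEFT " " #OP " " #RIGHT,		\
			    _v3_left, #OP, _v3_right);			\
	} while (0)
```

Because the operands are captured once, an argument with side effects such as `++n` is evaluated only a single time.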

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11708

3 years agoAdd missing files to Makefile
Allan Jude [Fri, 12 Mar 2021 01:13:34 +0000 (20:13 -0500)]
Add missing files to Makefile

Some .h files that were added were missed in this Makefile. Since
they are .h files, their omission only resulted in them being
missing from the dist archive.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11705

3 years agoCI checkstyle: pin ubuntu version
George Melikov [Fri, 12 Mar 2021 01:11:31 +0000 (04:11 +0300)]
CI checkstyle: pin ubuntu version

Our checkstyle doesn't work well on Ubuntu 20.04;
temporarily pin it to 18.04.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11713

3 years agoReturn finer grain errors in libzfs unmount_one
Don Brady [Mon, 8 Mar 2021 16:46:45 +0000 (09:46 -0700)]
Return finer grain errors in libzfs unmount_one

Added errno mappings to unmount_one() in libzfs.  Changed the do_unmount()
implementation to return errno errors directly, as is done for
do_mount() and others.
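A sketch of the shape of the change, with a hypothetical error enum rather than libzfs's actual `EZFS_*` codes and mapping: `do_unmount()` hands back a plain errno value, and the caller translates it into a more specific error instead of one generic failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical error codes, for illustration only. */
typedef enum {
	SKETCH_OK = 0,
	SKETCH_BUSY,
	SKETCH_PERM,
	SKETCH_NOENT,
	SKETCH_UNMOUNTFAILED
} unmount_err_t;

/* Translate an errno value (0 on success) into a finer-grained error. */
static unmount_err_t
unmount_one_sketch(int error)
{
	assert(error >= 0);
	switch (error) {
	case 0:
		return (SKETCH_OK);
	case EBUSY:
		return (SKETCH_BUSY);
	case EPERM:
	case EACCES:
		return (SKETCH_PERM);
	case ENOENT:
		return (SKETCH_NOENT);
	default:
		return (SKETCH_UNMOUNTFAILED);
	}
}
```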

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11681

3 years agovdev_id: Create symlinks even if no /dev/mapper/
Tony Hutter [Mon, 8 Mar 2021 16:43:30 +0000 (08:43 -0800)]
vdev_id: Create symlinks even if no /dev/mapper/

vdev_id uses the /dev/mapper/ symlinks to resolve a UUID to a dm name
(like dm-1).  However, on some multipath setups, there is no /dev/mapper/
entry for the UUID at the time vdev_id is called by udev.  This lookup
isn't strictly needed, though, as we may be able to resolve the dm
name from the $DEVNAME that udev passes us (like DEVNAME="/dev/dm-1").

This patch tries to resolve the dm name from $DEVNAME first, before
falling back to looking in /dev/mapper/.  This fixed an issue where the
by-vdev names weren't reliably showing up on one of our nodes.
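The lookup order can be sketched in C, although `vdev_id` itself is a shell script; `dm_name_from_devname()` is a hypothetical helper that returns NULL when the caller should fall back to scanning /dev/mapper/.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Derive "dm-N" from a udev DEVNAME such as "/dev/dm-1", if possible. */
static const char *
dm_name_from_devname(const char *devname)
{
	assert(devname != NULL);
	const char *base = strrchr(devname, '/');

	base = (base != NULL) ? base + 1 : devname;
	if (strncmp(base, "dm-", 3) != 0 || base[3] == '\0')
		return (NULL);	/* not a dm node: fall back to /dev/mapper/ */
	return (base);
}
```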

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11698

3 years agoZTS events_002: Improve speed and reliability
Antonio Russo [Mon, 8 Mar 2021 16:42:45 +0000 (09:42 -0700)]
ZTS events_002: Improve speed and reliability

events_002 exercises the ZED, ensuring that it neither misses events
nor reports events twice.

On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle.  Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.

Instead of using a fixed timeout, wait for the expected final event
before returning.  Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.
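A minimal sketch of waiting on an event with a deadline rather than a fixed sleep, assuming a POSIX system; the event source, `toy_source_t`, and the 10 ms poll interval are illustrative and not taken from the test itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical non-blocking source: next event name, or NULL if none yet. */
typedef const char *(*next_event_fn)(void *ctx);

static double
now_seconds(void)
{
	struct timespec ts;

	(void) clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((double)ts.tv_sec + (double)ts.tv_nsec / 1e9);
}

/*
 * Poll until the expected final event shows up or the deadline passes.
 * Fast machines return as soon as the event arrives; slow machines get
 * the whole budget instead of a fixed, possibly too short, sleep.
 */
static bool
wait_for_event(next_event_fn next, void *ctx, const char *want,
    double timeout_s)
{
	const struct timespec interval = { 0, 10L * 1000 * 1000 };
	const double deadline = now_seconds() + timeout_s;

	assert(next != NULL && want != NULL);
	while (now_seconds() < deadline) {
		const char *ev = next(ctx);

		if (ev != NULL && strcmp(ev, want) == 0)
			return (true);
		if (ev == NULL)
			(void) nanosleep(&interval, NULL);
	}
	return (false);
}

/* Toy source for illustration: hands out a fixed list of event names. */
typedef struct toy_source {
	const char **events;
	size_t n;
	size_t pos;
} toy_source_t;

static const char *
toy_next(void *ctx)
{
	toy_source_t *s = ctx;

	return (s->pos < s->n ? s->events[s->pos++] : NULL);
}
```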

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11703

3 years agozvol: call zil_replaying() during replay
Christian Schwarz [Sun, 7 Mar 2021 17:49:58 +0000 (18:49 +0100)]
zvol: call zil_replaying() during replay

zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx.  The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.

ZPL log entries are not idempotent and are logically dependent, and thus
calling zil_replaying() is necessary for correctness.

For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.

Thus, at first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

where N:W(O, C) represents log entry number N, which is a TX_WRITE of C
to offset O.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".

If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".

If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.
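The scenario can be replayed with a toy model, assuming the header stores the sequence number of the last replayed record and that replay skips records at or below it; `rec_t`, `replay()`, and `scenario()` are hypothetical and ignore txgs, LWBs, and checksums beyond a validity flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy record "N:W(O, C)": sequence number N writes byte C at offset O. */
typedef struct rec {
	int seq;
	int off;
	char c;
	bool valid;	/* false once bit flips break the checksum */
} rec_t;

/*
 * Walk the chain in order, stop at the first invalid record (it looks
 * like the end of the chain), and skip records at or below the replay
 * progress stored in the header. Returns the highest seq replayed.
 */
static int
replay(const rec_t *chain, size_t n, char *vol, int replayed_seq)
{
	int last = replayed_seq;

	assert(vol != NULL);
	for (size_t i = 0; i < n && chain[i].valid; i++) {
		if (chain[i].seq <= replayed_seq)
			continue;
		vol[chain[i].off] = chain[i].c;
		last = chain[i].seq;
	}
	return (last);
}

/* track_progress models calling zil_replaying() during replay. */
static void
scenario(bool track_progress, char out[3])
{
	rec_t chain[] = {
		{ 1, 1, 'A', true }, { 2, 1, 'B', true },
		{ 3, 2, 'X', true }, { 4, 3, 'Z', true },
	};
	char vol[4] = { '.', '.', '.', '.' };

	/* Replay 1, 2 and 3 in one txg, sync it, then crash. */
	int done = replay(chain, 3, vol, 0);
	int hdr_seq = track_progress ? done : 0;

	/* Bit flips corrupt 2, 3 and 4. */
	chain[1].valid = chain[2].valid = chain[3].valid = false;

	/* After reboot, replay again from the header's recorded progress. */
	(void) replay(chain, 4, vol, hdr_seq);

	out[0] = vol[1];
	out[1] = vol[2];
	out[2] = '\0';
}
```

Calling `scenario(false, out)` leaves "AX" in `out`, and `scenario(true, out)` leaves "BX", matching the two outcomes described above.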

This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying(), we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11667

3 years agoZTS: Improve cleanup in zpool tests
Ryan Moeller [Sun, 7 Mar 2021 17:41:01 +0000 (12:41 -0500)]
ZTS: Improve cleanup in zpool tests

* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11694

3 years agoClarify compressed zfs send/recv behavior
manfromafar [Sun, 7 Mar 2021 17:39:16 +0000 (10:39 -0700)]
Clarify compressed zfs send/recv behavior

Docs for send and receive do not explain behavior when sending a
compressed stream then receiving on a host that overrides compression
with -o compress=value.

The data from the send stream is written as it arrives, i.e. in its
compressed form, but the compression property on the receiver shows
the overridden algorithm, which causes some confusion as to which
algorithm was actually used.

Updated man docs to clarify behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed By: Allan Jude <allanjude@freebsd.org>
Signed-off-by: manfromafar <manfromafar@outlook.com>
Closes #11690

3 years agoIntentionally allow ZFS_READONLY in zfs_write
Ryan Moeller [Sun, 7 Mar 2021 17:31:52 +0000 (12:31 -0500)]
Intentionally allow ZFS_READONLY in zfs_write

ZFS_READONLY represents the "DOS R/O" attribute.
When that flag is set, we should behave as if write access
were not granted by anything in the ACL.  In particular:
We _must_ allow writes after opening the file r/w, then
setting the DOS R/O attribute, and writing some more.
(Similar to how you can write after fchmod(fd, 0444).)

Restore these semantics which were lost on FreeBSD when refactoring
zfs_write.  To my knowledge Linux does not actually expose this flag,
but we'll need it eventually, so I've added the supporting checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11693

3 years agoSuppress cppcheck invalidSyntax warnings
Brian Behlendorf [Sat, 6 Mar 2021 01:56:35 +0000 (17:56 -0800)]
Suppress cppcheck invalidSyntax warnings

For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source.  The same
warning is not reported by cppcheck 2.3, nor is there any evident
problem with the expanded macro.  This appears to be an issue with
this version of cppcheck.  This commit annotates the source to suppress
the warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11700

3 years agoInitialize ZIL buffers
Brian Behlendorf [Fri, 5 Mar 2021 22:45:13 +0000 (14:45 -0800)]
Initialize ZIL buffers

When populating a ZIL destination buffer ensure it is always
zeroed before its contents are constructed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11687

3 years agoFix abd_get_offset_struct() may allocate new abd
Jorgen Lundman [Fri, 5 Mar 2021 20:22:57 +0000 (05:22 +0900)]
Fix abd_get_offset_struct() may allocate new abd

Even when supplied with an abd to abd_get_offset_struct(), the call
to abd_get_offset_impl() can allocate a different abd. Make sure to
call abd_fini_struct() on the abd that is not used.
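A sketch of the fixed pattern with a hypothetical `abd_t` stand-in; `must_allocate` models whatever condition leads the implementation to return a different abd than the one supplied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for abd_t. */
typedef struct abd {
	bool initialized;
	bool heap;	/* allocated by the impl rather than supplied */
} abd_t;

/* Like abd_get_offset_impl(): may reuse the supplied struct or allocate. */
static abd_t *
get_offset_impl(abd_t *supplied, bool must_allocate)
{
	if (!must_allocate)
		return (supplied);
	abd_t *fresh = calloc(1, sizeof (*fresh));
	assert(fresh != NULL);
	fresh->initialized = true;
	fresh->heap = true;
	return (fresh);
}

static void
fini_struct(abd_t *abd)
{
	abd->initialized = false;
}

/* The fix: if a different abd came back, finalize the unused one. */
static abd_t *
get_offset_struct(abd_t *supplied, bool must_allocate)
{
	supplied->initialized = true;	/* models abd_init_struct() */
	abd_t *result = get_offset_impl(supplied, must_allocate);
	if (result != supplied)
		fini_struct(supplied);
	return (result);
}
```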

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #11683

3 years agoFreeBSD module --enable-debug --enable-invariants
Ryan Moeller [Fri, 5 Mar 2021 20:16:41 +0000 (15:16 -0500)]
FreeBSD module --enable-debug --enable-invariants

Wire up the --enable-debug flag for configure to the FreeBSD module
build.  Add --enable-invariants.

The running FreeBSD kernel config is used to detect whether to enable
INVARIANTS if not explicitly specified with --enable-invariants or
--disable-invariants.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11678

3 years agozpool: use tab to indent continuation from removal status
Thomas Lamprecht [Fri, 5 Mar 2021 20:15:35 +0000 (21:15 +0100)]
zpool: use tab to indent continuation from removal status

Bring the output of the removal status in line with the other
"fields" that zpool status outputs, and thus allow a parser to
more easily detect this as a continuation of the 'remove:' output.

Before:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
    776K memory used for removed device mappings

Now:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
	776K memory used for removed device mappings

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Closes #11674

3 years agoDon't bomb out when using keylocation=file://
James Wah [Wed, 3 Mar 2021 16:28:49 +0000 (03:28 +1100)]
Don't bomb out when using keylocation=file://

Avoid following the error path when the operation in fact succeeded.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Wah <james@laird-wah.net>
Closes #11651

3 years agolinux: zvol: avoid heap allocation for zvol_request_sync=1
Christian Schwarz [Wed, 3 Mar 2021 16:15:28 +0000 (17:15 +0100)]
linux: zvol: avoid heap allocation for zvol_request_sync=1

The spl_kmem_alloc showed up in some flamegraphs in a single-threaded
4k sync write workload at 85k IOPS on an
Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz.
Certainly not a huge win but I believe the change is clean and
easy to maintain down the road.
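A sketch of the pattern, with a hypothetical `zv_req_t`: when requests are handled synchronously, their state can live on the caller's stack, and only the asynchronous path needs a heap allocation. For simplicity, the asynchronous branch below processes the request inline instead of dispatching it to a task queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-request state. */
typedef struct zv_req {
	unsigned long offset;
	unsigned long size;
} zv_req_t;

static void
process_request(zv_req_t *req, unsigned long *done)
{
	*done += req->size;
}

static void
submit(unsigned long off, unsigned long size, bool request_sync,
    unsigned long *done)
{
	if (request_sync) {
		/* Finishes before we return, so the stack is enough. */
		zv_req_t req = { off, size };
		process_request(&req, done);
	} else {
		/* Would outlive this frame once handed to another thread. */
		zv_req_t *req = malloc(sizeof (*req));
		assert(req != NULL);
		req->offset = off;
		req->size = size;
		process_request(req, done);
		free(req);
	}
}
```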

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11666

3 years agoAdd "zstd-fast" to help options for "compression" property
Jake Howard [Wed, 3 Mar 2021 16:14:19 +0000 (16:14 +0000)]
Add "zstd-fast" to help options for "compression" property

This value does work as expected, and is documented in the manpage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jake Howard <git@theorangeone.net>
Closes #11670

3 years agoCancel TRIM / initialize on FAULTED non-writeable vdevs
nssrikanth [Tue, 2 Mar 2021 18:27:27 +0000 (23:57 +0530)]
Cancel TRIM / initialize on FAULTED non-writeable vdevs

When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization.  When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.
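The policy can be sketched as a small state transition; the enum names are hypothetical, and the real code tracks TRIM and initialize state separately per vdev.

```c
#include <assert.h>

typedef enum { VS_HEALTHY, VS_OFFLINE, VS_FAULTED } vdev_state_sketch_t;
typedef enum { OP_ACTIVE, OP_SUSPENDED, OP_CANCELED } op_state_t;

/*
 * FAULTED (no longer writeable) cancels an active TRIM/initialize;
 * an administrative offline only stops it, so it can resume once the
 * device is back online.
 */
static op_state_t
on_vdev_state_change(op_state_t op, vdev_state_sketch_t vs)
{
	assert(op == OP_ACTIVE || op == OP_SUSPENDED || op == OP_CANCELED);
	if (op == OP_CANCELED)
		return (op);
	switch (vs) {
	case VS_FAULTED:
		return (OP_CANCELED);
	case VS_OFFLINE:
		return (OP_SUSPENDED);
	case VS_HEALTHY:
	default:
		return (op == OP_SUSPENDED ? OP_ACTIVE : op);
	}
}
```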

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11588

3 years agoFix assert in FreeBSD-specific dmu_read_pages
Andriy Gapon [Sun, 28 Feb 2021 01:23:09 +0000 (03:23 +0200)]
Fix assert in FreeBSD-specific dmu_read_pages

The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages.  All three pieces had an
assert to ensure that the page is not mapped.  Later the assert was
relaxed to require that the page is not mapped for writing.  But that
was done in two places out of three.  This change fixes the third piece,
read-ahead.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11654

3 years agoZTS: zpool_trim_start_and_cancel_pos.ksh
Brian Behlendorf [Sun, 28 Feb 2021 01:19:50 +0000 (17:19 -0800)]
ZTS: zpool_trim_start_and_cancel_pos.ksh

Several of the TRIM tests were based on the initialize tests and
then adapted for TRIM.  The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted.  Update it accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11649

3 years agoAdd missing checks for unsupported features
Martin Matuška [Sun, 28 Feb 2021 01:16:02 +0000 (02:16 +0100)]
Add missing checks for unsupported features

After 35ec517 it has become possible to import ZFS pools with an
active org.illumos:edonr feature on FreeBSD, leading to a panic.

In addition, "zpool status" reported all pools without edonr
as upgradable and "zpool upgrade -v" reported edonr in the list
of upgradable features.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11653