FreeBSD/FreeBSD.git
4 years ago FreeBSD: Create taskq threads in appropriate proc
Ryan Moeller [Mon, 17 Aug 2020 18:01:19 +0000 (14:01 -0400)]
FreeBSD: Create taskq threads in appropriate proc

Stepping stone toward re-enabling spa_thread on FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10715

4 years ago Fix L2ARC reads when compressed ARC disabled
Allan Jude [Fri, 14 Aug 2020 06:31:20 +0000 (02:31 -0400)]
Fix L2ARC reads when compressed ARC disabled

When reading compressed blocks from the L2ARC, with
compressed ARC disabled, arc_hdr_size() returns
LSIZE rather than PSIZE, but the actual read is PSIZE.
This causes l2arc_read_done() to compare the checksum
against the wrong size, resulting in checksum failure.

This manifests as an increase in the kstat l2_cksum_bad
and the read being retried from the main pool, making the
L2ARC ineffective.
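
The size mismatch can be illustrated with a small sketch. It is not the ZFS code: SHA-256 and zlib stand in for the pool's checksum and compression, and the zero-padded LSIZE buffer is an assumption made only for illustration.

```python
import hashlib
import zlib


def checksum(buf: bytes, size: int) -> str:
    """Checksum the first `size` bytes of `buf` (SHA-256 as a stand-in)."""
    return hashlib.sha256(buf[:size]).hexdigest()


def make_block(logical: bytes):
    """Return (on-disk bytes, LSIZE, PSIZE, checksum kept in the block pointer)."""
    physical = zlib.compress(logical)
    return physical, len(logical), len(physical), checksum(physical, len(physical))


physical, lsize, psize, bp_cksum = make_block(b"zfs " * 1024)

# Hypothetical read buffer sized at LSIZE; only PSIZE bytes were actually read.
read_buf = physical.ljust(lsize, b"\0")

# Verifying over PSIZE matches the block pointer; verifying over LSIZE does not.
ok_with_psize = checksum(read_buf, psize) == bp_cksum
ok_with_lsize = checksum(read_buf, lsize) == bp_cksum
```

In this model the data is intact in both cases; only the length passed to the checksum differs, which is why the failure shows up as a spurious checksum error rather than as corruption.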

Add new L2ARC tests with Compressed ARC enabled/disabled

Blocks are handled differently depending on the state of the
zfs_compressed_arc_enabled tunable.

If a block is compressed on-disk, and compressed_arc is enabled:
- the block is read from disk
- It is NOT decompressed
- It is added to the ARC in its compressed form
- l2arc_write_buffers() may write it to the L2ARC (as is)
- l2arc_read_done() compares the checksum to the BP (compressed)

However, if compressed_arc is disabled:
- the block is read from disk
- It is decompressed
- It is added to the ARC (uncompressed)
- l2arc_write_buffers() will use l2arc_apply_transforms() to
  recompress the block, before writing it to the L2ARC
- l2arc_read_done() compares the checksum to the BP (compressed)
- l2arc_read_done() will use l2arc_untransform() to uncompress it

This test writes out a test file to a pool consisting of one disk
and one cache device, then randomly reads from it. Since the arc_max
in the tests is low, this will feed the L2ARC, and result in reads
from the L2ARC.

We compare the value of the kstat l2_cksum_bad before and after
to determine if any blocks failed to survive the trip through the
L2ARC.
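
The before/after comparison might look like the following sketch. The real test is a ksh script using the test suite's own helpers; here the three-column "name type data" layout of the Linux arcstats kstat is assumed, and both function names are hypothetical.

```python
def read_kstat(text: str, name: str) -> int:
    """Return the value of counter `name` from 'name type data' lines."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == name:
            return int(fields[2])
    raise KeyError(name)


def l2arc_survived(before: str, after: str) -> bool:
    """True if l2_cksum_bad did not change between the two snapshots."""
    return read_kstat(after, "l2_cksum_bad") == read_kstat(before, "l2_cksum_bad")
```

The two arguments are snapshots of the kstat text taken before and after the random reads.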

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10693

4 years ago Release onexit/events with any missed zfsdev_state
Jorgen Lundman [Thu, 13 Aug 2020 22:03:23 +0000 (07:03 +0900)]
Release onexit/events with any missed zfsdev_state

Linux and FreeBSD will most likely never see this issue.
On macOS, when the kext is unloaded but zed is still connected, zed
will be issued ENODEV. As the cdevsw is released, the kernel
will not call zfsdev_release() to release minor/onexit/events,
and it "leaks". This ensures it is cleaned up before unload.

Changed the for loop from zsprev, to zsnext style, for less
code duplication.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10700

4 years ago Github workflow: checkstyle
George Melikov [Wed, 12 Aug 2020 17:46:26 +0000 (20:46 +0300)]
Github workflow: checkstyle

Use github workflow to run checkstyle
- uses resources that are free for open-source projects
- runs for every commit and branch
- works on forks, so contributors can use it
  before creating PRs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10705

4 years ago cstyle.pl: echo commands for github workflow
George Melikov [Wed, 12 Aug 2020 17:45:50 +0000 (20:45 +0300)]
cstyle.pl: echo commands for github workflow

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10705

4 years ago Remove stale .travis.yml
George Melikov [Thu, 13 Aug 2020 21:55:45 +0000 (00:55 +0300)]
Remove stale .travis.yml

- It doesn't work now.
- It has to be manually edited on test changes
  (even on test runtime changes!).
- Travis allows too little run time to be useful.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10704

4 years ago Use zfs_dbgmsg to log metaslab_load/unload
Matthew Ahrens [Wed, 12 Aug 2020 17:10:50 +0000 (10:10 -0700)]
Use zfs_dbgmsg to log metaslab_load/unload

Metaslabs are now (usually) loaded and unloaded infrequently, but when
that is not the case, it is useful to have a log of when and why these
events happened.

This commit enables the zfs_dbgmsg() in metaslab_load(), and adds a
zfs_dbgmsg() in metaslab_unload().

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10683

4 years ago Restore ARC MFU/MRU pressure
Matthew Macy [Wed, 12 Aug 2020 17:03:24 +0000 (10:03 -0700)]
Restore ARC MFU/MRU pressure

The arc_adapt() function tunes the MRU/MFU balance according to 4 types
of cache hit (passed as the state argument): ghost MRU, MRU, MFU, and
ghost MFU. If this function is called with the wrong cache hit (state),
adaptation will be sub-optimal and performance will suffer.

Some time ago upstream received the commit "6950 ARC should cache
compressed data", which changed the sequence of calls made in arc_read()
on access to a ghost buffer.

Before this commit, a hit on any ghost list was passed to arc_adapt()
before the call to arc_access(), which revives the element in the cache
and changes its state from ghost to a real hit.

After this commit, the order of the calls was reversed and arc_adapt() is
now called only with «real» hits, even if the hit was in one of the two
ghost lists, which renders the ghost lists useless and breaks the ARC
algorithm.

FreeBSD fixed this problem locally in Change D19094 / Commit r348772.

This change is an adaptation of the above commit to the current arc
code.
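
Why the call order matters can be shown with a toy model. This is only a sketch: `ToyArc`, its `p` target, and the unit step are hypothetical simplifications, not the real arc_adapt() or arc_access() logic.

```python
class ToyArc:
    """Toy cache that only tracks the MRU target size `p`."""

    def __init__(self, p: int = 8, step: int = 1):
        self.p = p
        self.step = step

    def adapt(self, state: str) -> None:
        # Only ghost hits carry information about which list is too small.
        if state == "mru_ghost":
            self.p += self.step
        elif state == "mfu_ghost":
            self.p -= self.step

    def access(self, state: str) -> str:
        # Reviving a ghost entry turns it into a real one (simplified).
        return "mfu" if state.endswith("_ghost") else state


def hit_adapt_first(arc: ToyArc, state: str) -> str:
    """Intended order: adapt sees the ghost state, then the entry is revived."""
    arc.adapt(state)
    return arc.access(state)


def hit_access_first(arc: ToyArc, state: str) -> str:
    """Broken order: the entry is revived first, so adapt never sees a ghost."""
    state = arc.access(state)
    arc.adapt(state)
    return state
```

With the broken order the target never moves on a ghost hit, which is the sense in which the ghost lists become useless.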

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10548
Closes #10618

4 years ago 'zfs share -a' should handle 'canmount=noauto'
George Wilson [Tue, 11 Aug 2020 20:55:04 +0000 (14:55 -0600)]
'zfs share -a' should handle 'canmount=noauto'

'zfs share -a' currently skips any filesystems which
have 'canmount=noauto' set. This behavior is unexpected since
one would expect 'zfs share -a' to share any mounted filesystem
that has the 'sharenfs' property already set.

This changes the behavior of 'zfs share -a' to allow the sharing
of 'canmount=noauto' datasets if they are mounted.
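
The selection rule can be sketched as a predicate. The dictionaries below are a hypothetical stand-in for dataset properties, not the libzfs data structures, and treating an unset 'sharenfs' as "off" is an assumption.

```python
def shareable_old(ds: dict) -> bool:
    """Previous behavior: 'canmount=noauto' datasets were always skipped."""
    return shareable_new(ds) and ds.get("canmount") != "noauto"


def shareable_new(ds: dict) -> bool:
    """New behavior: any mounted dataset with 'sharenfs' set is shared."""
    return bool(ds.get("mounted")) and ds.get("sharenfs", "off") != "off"
```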

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-71313
Closes #10688

4 years ago FreeBSD: Fix module autoloading when built in base
Matthew Macy [Tue, 11 Aug 2020 20:49:50 +0000 (13:49 -0700)]
FreeBSD: Fix module autoloading when built in base

The KMOD name is "zfs" instead of "openzfs" when building in FreeBSD.

Define a ZFS_KMOD symbol as "zfs" when IN_BASE is defined, otherwise
"openzfs".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10699

4 years ago Linux 5.9 compat: make_request_fn replaced with submit_bio interface
Coleman Kane [Sun, 9 Aug 2020 16:12:25 +0000 (12:12 -0400)]
Linux 5.9 compat: make_request_fn replaced with submit_bio interface

The make_request_fn and associated API were replaced recently in a
Linux 5.9 merge, in favor of a new submit_bio member in
struct block_device_operations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696

4 years ago Linux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_B
Coleman Kane [Sun, 9 Aug 2020 16:07:49 +0000 (12:07 -0400)]
Linux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_B

This change appears to primarily be a name change for the enum. Had
to update the test logic so that it works so long as either one of
these is present (favoring the newer one). Additionally, as this is
newer, it only shows up in node_page_item, so this commit doesn't
test zone_page_item for the same enum.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696

4 years ago Linux 5.9 compat: add linux/blkdev.h include
Coleman Kane [Sun, 9 Aug 2020 16:03:03 +0000 (12:03 -0400)]
Linux 5.9 compat: add linux/blkdev.h include

Many of the block device operations (often functions with bdev in
the name) were moved into linux/blkdev.h from linux/fs.h. It seems
that this header is already included where needed in the code, but
it was missing from the autoconf tests, causing false negatives. This
commit has those tests include linux/fs.h (old location) and now
also linux/blkdev.h (new location).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696

4 years ago Fix typo
Allan Jude [Tue, 11 Aug 2020 20:16:57 +0000 (16:16 -0400)]
Fix typo

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10694

4 years ago Move ZVOL_DIR back to zfs.h
Ryan Moeller [Tue, 11 Aug 2020 20:12:12 +0000 (16:12 -0400)]
Move ZVOL_DIR back to zfs.h

This was previously moved because nothing else in-tree uses it, but
evidently DilOS uses it out of tree.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@freebsd.org>
Closes #10361
Closes #10685

4 years ago FreeBSD: update vaccess signature on most recent HEAD
Matthew Macy [Fri, 7 Aug 2020 21:16:01 +0000 (14:16 -0700)]
FreeBSD: update vaccess signature on most recent HEAD

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10682

4 years ago Clarify error message when a range-tree double-add occurs
Paul Dagnelie [Fri, 7 Aug 2020 21:13:13 +0000 (14:13 -0700)]
Clarify error message when a range-tree double-add occurs

Bugs in various other pieces of logic have resulted in situations where
we double-free space in ZFS. This in turn results in a double-add
to the range trees. These issues have been much more difficult to
diagnose than they should have been, because the error handling
around this case is much weaker than around the double remove case.
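
As an illustration of what a clearer double-add diagnostic can look like, here is a minimal sketch. `RangeSet` is a hypothetical sorted list of segments, not the ZFS range tree; it does not merge adjacent segments, and the message format is invented for the example.

```python
import bisect


class RangeSet:
    """Sorted, non-overlapping (start, end) segments; end is exclusive."""

    def __init__(self):
        self._segs = []

    def add(self, start: int, size: int) -> None:
        end = start + size
        i = bisect.bisect_left(self._segs, (start, end))
        # Only the neighbors on either side of the insertion point can overlap.
        for s, e in self._segs[max(i - 1, 0):i + 1]:
            if s < end and start < e:
                raise ValueError(
                    f"double add: segment [{start:#x}, {end:#x}) "
                    f"overlaps existing [{s:#x}, {e:#x})"
                )
        self._segs.insert(i, (start, end))
```

Reporting both the offending segment and the one it collides with gives the caller something to search for, instead of a bare assertion failure.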

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10654

4 years ago ZTS: Remove bashisms from zfs-tests.sh
Ryan Moeller [Fri, 7 Aug 2020 21:10:48 +0000 (17:10 -0400)]
ZTS: Remove bashisms from zfs-tests.sh

Bring zfs-tests.sh into compliance with the other scripts
by converting it to /bin/sh, avoiding a dependency on bash.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10640

4 years ago Remove commented-out code
Matthew Ahrens [Tue, 4 Aug 2020 18:36:53 +0000 (11:36 -0700)]
Remove commented-out code

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Remove KM_NODEBUG
Matthew Ahrens [Thu, 30 Jul 2020 20:59:07 +0000 (13:59 -0700)]
Remove KM_NODEBUG

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Remove KMC_NOMAGAZINE
Matthew Ahrens [Thu, 30 Jul 2020 20:56:00 +0000 (13:56 -0700)]
Remove KMC_NOMAGAZINE

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Remove KMC_QCACHE
Matthew Ahrens [Thu, 30 Jul 2020 20:51:31 +0000 (13:51 -0700)]
Remove KMC_QCACHE

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Remove KMC_NOHASH
Matthew Ahrens [Thu, 30 Jul 2020 20:46:32 +0000 (13:46 -0700)]
Remove KMC_NOHASH

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Remove KMC_NOTOUCH
Matthew Ahrens [Thu, 30 Jul 2020 20:43:18 +0000 (13:43 -0700)]
Remove KMC_NOTOUCH

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Remove KMC_OFFSLAB
Matthew Ahrens [Thu, 30 Jul 2020 05:03:23 +0000 (22:03 -0700)]
Remove KMC_OFFSLAB

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Fix i/o error handling of livelists and zap iteration
Matthew Ahrens [Wed, 5 Aug 2020 17:22:09 +0000 (10:22 -0700)]
Fix i/o error handling of livelists and zap iteration

Pool-wide metadata is stored in the MOS (Meta Object Set).  This
metadata is stored in triplicate, in addition to any pool-level
redundancy (e.g. RAIDZ).  However, if all 3+ copies of this metadata are
not available, we can still get EIO/ECKSUM when reading from the MOS.
If we encounter such an error in syncing context, we have typically
already committed to making a change that we now can't do because of the
corrupt/missing metadata.  We typically "handle" this with a `VERIFY()`
or `zfs_panic_recover()`.  This prevents the system from continuing on
in an undefined state, while minimizing the amount of error-handling
code.

However, there are some code paths that ignore these i/o errors, or
`ASSERT()` that they don't happen.  Since assertions are disabled on
non-debug builds, they effectively ignore them as well.  This can lead
to ZFS continuing on in an incorrect state, potentially leading to
on-disk inconsistencies.
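
Python offers a close analogy, since its `assert` statement is stripped when the interpreter runs with `-O`, much as `ASSERT()` disappears from non-debug builds. The sketch below is only that analogy: `verify()` and `remove_entry()` are hypothetical helpers, and the real `VERIFY()` panics rather than raising an exception.

```python
import errno


def verify(condition: bool, message: str) -> None:
    """Always-on check; unlike `assert`, it survives `python -O`."""
    if not condition:
        raise RuntimeError(f"VERIFY failed: {message}")


def remove_entry(table: dict, key: str) -> None:
    """Remove `key`, refusing to continue if the removal reports an error."""
    err = 0 if table.pop(key, None) is not None else errno.ENOENT
    # An `assert err == 0` here would silently vanish under -O.
    verify(err == 0, f"remove({key!r}) returned {err}")
```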

This commit adds handling for these i/o errors on MOS metadata,
typically with a `VERIFY()`:

* Handle error return from `zap_cursor_retrieve()` in 4 places in
`dsl_deadlist.c`.

* Handle error return from `zap_contains()` in `dsl_dir_hold_obj()`.
Turns out this call isn't necessary because we can always call
`zap_lookup()`.

* Handle error return from `zap_lookup()` in `dsl_fs_ss_limit_check()`.

* Handle error return from `zap_remove()` in `dsl_dir_rename_sync()`.

* Handle error return from `zap_lookup()` in
`dsl_dir_remove_livelist()`.

* Handle error return from `dsl_process_sub_livelist()` in
`spa_livelist_delete_cb()`.

Additionally:

* Augment the internal history log message for `zfs destroy` to note
which method is used (e.g. bptree, livelist, or synchronous) and the
mintxg.

* Correct a comment in `dbuf_init()`.

* Correct indentation in `dsl_dir_remove_livelist()`.

Reviewed-by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10643

4 years ago FreeBSD: Add support for lockless lookup
Matthew Macy [Wed, 5 Aug 2020 17:19:51 +0000 (10:19 -0700)]
FreeBSD: Add support for lockless lookup

Authored-by: mjg <mjg@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10657

4 years ago Add missed thread_exit() to vdev_{autotrim,rebuild}_thread
Matthew Macy [Wed, 5 Aug 2020 17:17:07 +0000 (10:17 -0700)]
Add missed thread_exit() to vdev_{autotrim,rebuild}_thread

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10668

4 years ago Fix arc__wait__for__eviction tracepoint
Pavel Snajdr [Tue, 4 Aug 2020 17:04:00 +0000 (19:04 +0200)]
Fix arc__wait__for__eviction tracepoint

3442c2a02d added a new `arc_wait_for_eviction` tracepoint, which fails to
compile when tracepoints are enabled.

The tracepoint definition begins with `DEFINE_ARC_WAIT_FOR_EVICTION_EVENT`
and is a multi-line definition, so this fixes the backslash
and parenthesis accordingly.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10669

4 years ago Verify zfs module loaded before starting services
Jonathon [Sun, 2 Aug 2020 00:13:15 +0000 (00:13 +0000)]
Verify zfs module loaded before starting services

This is a minor change to the systemd service templates that verifies
the zfs kernel module is loaded by the kernel prior to attempting to
import any zpool.

The services check for the presence of /sys/module/zfs which indicates
the zfs module is loaded. This uses the systemd built-in check
ConditionPathIsDirectory.
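
The same test can be expressed outside systemd. This is a minimal sketch of an equivalent check, with a hypothetical function name; the services themselves rely on the unit-file condition, not on code like this.

```python
from pathlib import Path


def zfs_module_loaded(sysfs_dir: str = "/sys/module/zfs") -> bool:
    """True if the module's sysfs directory exists, as the unit condition tests."""
    return Path(sysfs_dir).is_dir()
```

The path is a parameter only so that the check can be exercised against a scratch directory.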

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Jonathon Fernyhough <jonathon.fernyhough@york.ac.uk>
Closes #10663

4 years ago Fix logging in l2arc_rebuild()
George Amanakis [Sat, 1 Aug 2020 18:17:18 +0000 (14:17 -0400)]
Fix logging in l2arc_rebuild()

In case the L2ARC rebuild was canceled, do not log to spa history
log as the pool may be in the process of being removed and a panic
may occur:

BUG: kernel NULL pointer dereference, address: 0000000000000018
RIP: 0010:spa_history_log_internal+0xb1/0x120 [zfs]
Call Trace:
 l2arc_rebuild+0x464/0x7c0 [zfs]
 l2arc_dev_rebuild_start+0x2d/0x130 [zfs]
 ? l2arc_rebuild+0x7c0/0x7c0 [zfs]
 thread_generic_wrapper+0x78/0xb0 [spl]
 kthread+0xfb/0x130
 ? IS_ERR+0x10/0x10 [spl]
 ? kthread_park+0x90/0x90
 ret_from_fork+0x35/0x40

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10659

4 years ago FreeBSD: Fix `zfs jail` and add a test
Ryan Moeller [Sat, 1 Aug 2020 15:44:54 +0000 (11:44 -0400)]
FreeBSD: Fix `zfs jail` and add a test

zfs_jail was not using zfs_ioctl so failed to map the IOC number
correctly.  Use zfs_ioctl to perform the jail ioctl and add a test
case for FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10658

4 years ago Fix page fault in zfsctl_snapdir_getattr
Matthew Macy [Sat, 1 Aug 2020 15:42:55 +0000 (08:42 -0700)]
Fix page fault in zfsctl_snapdir_getattr

Must acquire the z_teardown_lock before accessing the zfsvfs_t object.
I can't reproduce this panic on demand, but this looks like the
correct solution.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: asomers <asomers@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10656

4 years ago Change the error handling for invalid property values
Allan Jude [Sat, 1 Aug 2020 15:41:31 +0000 (11:41 -0400)]
Change the error handling for invalid property values

ZFS recv should return a useful error message when an invalid index
property value is provided in the send stream properties nvlist.

With a compression= property outside of the understood range:

Before:
```
receiving full stream of zof/zstd_send@send2 into testpool/recv@send2
internal error: Invalid argument
Aborted (core dumped)
```
Note: the recv completes successfully, the abort() is likely just to
make it easier to track the unexpected error code.

After:
```
receiving full stream of zof/zstd_send@send2 into testpool/recv@send2
cannot receive compression property on testpool/recv: invalid property value
received 28.9M stream in 1 seconds (28.9M/sec)
```
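
The change in behavior can be sketched as follows. This is not the libzfs code: `apply_received_props()` and the index table are hypothetical, and only the wording of the message is taken from the output above.

```python
# Hypothetical table of valid values for index properties.
VALID_INDEX_VALUES = {"compression": {0, 1, 2, 3}}


def apply_received_props(dataset: str, props: dict, tables=VALID_INDEX_VALUES):
    """Apply valid properties; collect a message for each invalid index value."""
    applied, errors = {}, []
    for name, value in props.items():
        valid = tables.get(name)
        if valid is not None and value not in valid:
            # Report and keep going, instead of aborting the whole receive.
            errors.append(
                f"cannot receive {name} property on {dataset}: "
                "invalid property value"
            )
            continue
        applied[name] = value
    return applied, errors
```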

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10631

4 years ago Changes to make openzfs build within FreeBSD buildworld
Matthew Macy [Sat, 1 Aug 2020 04:30:31 +0000 (21:30 -0700)]
Changes to make openzfs build within FreeBSD buildworld

A collection of header changes to enable FreeBSD to build
with vendored OpenZFS.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10635

4 years ago Convert Linux-isms to FreeBSD-isms in platform zfs_debug.c
Ryan Moeller [Sat, 1 Aug 2020 04:25:35 +0000 (00:25 -0400)]
Convert Linux-isms to FreeBSD-isms in platform zfs_debug.c

Change some comments copied from the Linux code to describe
the appropriate methods on FreeBSD.

Convert some tunables to ZFS_MODULE_PARAM so they get created
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10647

4 years ago ZTS: FreeBSD does have a l2arc.trim_ahead tunable
Ryan Moeller [Sat, 1 Aug 2020 04:18:32 +0000 (00:18 -0400)]
ZTS: FreeBSD does have a l2arc.trim_ahead tunable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10633

4 years ago Revise ARC shrinker algorithm
Matthew Ahrens [Sat, 1 Aug 2020 04:10:52 +0000 (21:10 -0700)]
Revise ARC shrinker algorithm

The ARC shrinker callback `arc_shrinker_count/_scan()` is invoked by the
kernel's shrinker mechanism when the system is running low on free
pages.  This happens via 2 code paths:

1. "direct reclaim": The system is attempting to allocate a page, but we
are low on memory.  The ARC shrinker callback is invoked from the
page-allocation code path.

2. "indirect reclaim": kswapd notices that there aren't many free pages,
so it invokes the ARC shrinker callback.

In both cases, the kernel's shrinker code requests that the ARC shrinker
callback release some of its cache, and then it measures how many pages
were released.  However, its measurement of released pages does not
include pages that are freed via `__free_pages()`, which is how the ARC
releases memory (via `abd_free_chunks()`).  Rather, the kernel shrinker
code is looking for pages to be placed on the lists of reclaimable pages
(which is separate from actually-free pages).

Because the kernel shrinker code doesn't detect that the ARC has
released pages, it may call the ARC shrinker callback many times,
resulting in the ARC "collapsing" down to `arc_c_min`.  This has several
negative impacts:

1. ZFS doesn't use RAM to cache data effectively.

2. In the direct reclaim case, a single page allocation may wait a long
time (e.g. more than a minute) while we evict the entire ARC.

3. Even with the improvements made in 67c0f0dedc5 ("ARC shrinking blocks
reads/writes"), occasionally `arc_size` may stay above `arc_c` for the
entire time of the ARC collapse, thus blocking ZFS read/write operations
in `arc_get_data_impl()`.

To address these issues, this commit limits the ways that the ARC
shrinker callback can be used by the kernel shrinker code, and mitigates
the impact of arc_is_overflowing() on ZFS read/write operations.

With this commit:

1. We limit the amount of data that can be reclaimed from the ARC via
the "direct reclaim" shrinker.  This limits the amount of time it takes
to allocate a single page.

2. We do not allow the ARC to shrink via kswapd (indirect reclaim).
Instead we rely on `arc_evict_zthr` to monitor free memory and reduce
the ARC target size to keep sufficient free memory in the system.  Note
that we can't simply rely on limiting the amount that we reclaim at once
(as for the direct reclaim case), because kswapd's "boosted" logic can
invoke the callback an unlimited number of times (see
`balance_pgdat()`).

3. When `arc_is_overflowing()` and we want to allocate memory,
`arc_get_data_impl()` will wait only for a multiple of the requested
amount of data to be evicted, rather than waiting for the ARC to no
longer be overflowing.  This allows ZFS reads/writes to make progress
even while the ARC is overflowing, while also ensuring that the eviction
thread makes progress towards reducing the total amount of memory used
by the ARC.

4. The amount of memory that the ARC always tries to keep free for the
rest of the system, `arc_sys_free` is increased.

5. Now that the shrinker callback is able to provide feedback to the
kernel's shrinker code about our progress, we can safely enable
the kswapd hook. This will allow the arc to receive notifications
when memory pressure is first detected by the kernel. We also
re-enable the appropriate kstats to track these callbacks.
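
Points 1 and 3 above can be sketched numerically. All names, and the multiple of 2, are hypothetical; the commit text does not give the limit or the multiple that the real code uses.

```python
def direct_reclaim_scan(nr_to_scan: int, limit_pages: int) -> int:
    """Point 1: cap how many pages one direct-reclaim call may evict."""
    return min(nr_to_scan, limit_pages)


def eviction_wait_target(evicted_so_far: int, request_bytes: int,
                         multiple: int = 2) -> int:
    """Point 3: wait for a multiple of the request, not for the overflow to end."""
    return evicted_so_far + multiple * request_bytes


def can_proceed(evicted_now: int, target: int) -> bool:
    """The allocating thread resumes once eviction has passed its target."""
    return evicted_now >= target
```

Because the target is relative to the amount already evicted, each waiter is released after a bounded amount of eviction work even if the ARC as a whole is still overflowing.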

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10600

4 years ago ZTS: zvol_misc_volmode is flaky on FreeBSD
Ryan Moeller [Sat, 1 Aug 2020 04:05:55 +0000 (00:05 -0400)]
ZTS: zvol_misc_volmode is flaky on FreeBSD

Mark this as a known issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10655

4 years ago ZTS: Use POSIX-compatible space character class
Ryan Moeller [Sat, 1 Aug 2020 01:11:21 +0000 (21:11 -0400)]
ZTS: Use POSIX-compatible space character class

FreeBSD recently integrated a change which causes \s in a regex to
throw an error instead of silently being misinterpreted as an s.

Change the regex in zpool_colors.ksh to use [[:space:]].

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10651

4 years ago lua: Increase reserved stack space for FreeBSD in debug config
Ryan Moeller [Fri, 31 Jul 2020 16:17:37 +0000 (12:17 -0400)]
lua: Increase reserved stack space for FreeBSD in debug config

FreeBSD uses more stack space in debug configurations and can overflow
the stack while formatting the error message when the call depth limit
of 20 frames is reached.  This is readily reproduced by running the
gsub recursion test with increased kstack size.  I hit the panic with
16 pages per kstack, and noticed it go away when bumped to 17.

Reserve an additional 64 bytes on the stack when building for FreeBSD.
This is enough to avoid the panic with a deep stack while not wasting
too much space when the default stack size is used.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10634

4 years ago When encountering EZFS_UNKNOWN, print the error text buffer anyway
Allan Jude [Fri, 31 Jul 2020 16:07:37 +0000 (12:07 -0400)]
When encountering EZFS_UNKNOWN, print the error text buffer anyway

Rather than just saying there was an internal error, provide any
context we might have to the user to help them understand the issue.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10632

4 years ago Remove duplicate include of sys/zfeature.h in dmu_objset.c
Allan Jude [Fri, 31 Jul 2020 16:04:45 +0000 (12:04 -0400)]
Remove duplicate include of sys/zfeature.h in dmu_objset.c

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10636

4 years ago pyzfs: Add missing entry to zfs_errno
Allan Jude [Fri, 31 Jul 2020 16:01:41 +0000 (12:01 -0400)]
pyzfs: Add missing entry to zfs_errno

This was causing all later errno's to have the incorrect value.
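
The shift is easy to reproduce with a sequentially numbered enumeration. The entry names and the base value below are hypothetical placeholders, not the real zfs_errno contents.

```python
from enum import IntEnum


def make_errno(names, base=1024):
    """Number `names` sequentially from `base`, like an implicit C enum."""
    return IntEnum("zfs_errno", {n: base + i for i, n in enumerate(names)})


# Hypothetical C-side list, and a Python-side copy with ERR_B left out.
C_SIDE = make_errno(["ERR_A", "ERR_B", "ERR_C", "ERR_D"])
PY_SIDE_BUGGY = make_errno(["ERR_A", "ERR_C", "ERR_D"])
```

Everything before the missing entry still matches, while every entry after it is off by one, so an error reported by the kernel is decoded as its neighbor.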

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10649

4 years ago zfs promote does not delete livelist of origin
Matthew Ahrens [Fri, 31 Jul 2020 15:59:00 +0000 (08:59 -0700)]
zfs promote does not delete livelist of origin

When a clone is promoted, its livelist is no longer accurate, so it is
discarded.  If the clone's origin is also a clone (i.e. we are promoting
a clone of a clone), then the origin's livelist is also no longer
accurate, so it should be discarded, but the code doesn't actually do
that.

Consider a pool with:
* Filesystem A
* Clone B, a clone of A
* Clone C, a clone of B

If we promote C, it discards C's livelist.  It should discard B's
livelist, but that is not happening.  The impact is that when B is
destroyed, we use the livelist to find the blocks to free, but the
livelist is no longer correct so we end up freeing blocks that are still
in use by C.  The incorrectly-freed blocks can be reallocated causing
checksum errors.  And when C is destroyed it can double-free the
incorrectly-freed blocks.

The problem is that we remove the livelist of `origin_ds->ds_dir`, but
the origin snapshot has already been moved to the promoted dsl_dir.  So
this is actually trying to remove the livelist of the promoted dsl_dir,
which was already removed.  As explained in a comment in the beginning
of `dsl_dataset_promote_sync()`, we need to use the saved `odd` for the
origin's dsl_dir.
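
The stale-reference mistake can be modeled in a few lines. The classes and the `promote()` function are hypothetical stand-ins; only the shape of the bug (dereferencing `ds_dir` after the snapshot has moved, instead of using the saved `odd`) follows the description above.

```python
class DslDir:
    def __init__(self, name: str, has_livelist: bool):
        self.name = name
        self.has_livelist = has_livelist


class Snapshot:
    def __init__(self, ds_dir: DslDir):
        self.ds_dir = ds_dir


def promote(origin_snap: Snapshot, promoted_dir: DslDir, use_saved_odd: bool) -> None:
    odd = origin_snap.ds_dir            # origin's dir, saved before the move
    promoted_dir.has_livelist = False   # the promoted clone's livelist goes first
    origin_snap.ds_dir = promoted_dir   # origin snapshot moves to the promoted dir
    # Buggy path follows the moved snapshot; fixed path uses the saved `odd`.
    target = odd if use_saved_odd else origin_snap.ds_dir
    target.has_livelist = False
```

In the buggy path the second removal hits the promoted dir again, so the origin's (B's) livelist survives.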

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10652

4 years ago ZTS: minor improvements to alloc_class_009_pos functional test
Don Brady [Thu, 30 Jul 2020 16:11:05 +0000 (10:11 -0600)]
ZTS: minor improvements to alloc_class_009_pos functional test

* Fixed a typo that caused one of the variations to be a no-op
* Added additional coverage for adding special vdev after pool create
* Added additional coverage for using 4K sector size

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10641

4 years ago Use correct prefix for share/pam-configs
Ryan Moeller [Thu, 30 Jul 2020 16:09:46 +0000 (12:09 -0400)]
Use correct prefix for share/pam-configs

Respect the configured install prefix.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10604

4 years ago Fix error handling of vdev_top_zap
Matthew Ahrens [Thu, 30 Jul 2020 00:04:34 +0000 (17:04 -0700)]
Fix error handling of vdev_top_zap

In `vdev_load()`, we look up several entries in the `vdev_top_zap`
object.  In most cases, if we encounter an i/o error, it will be
returned to the caller.  However, when handling
`VDEV_TOP_ZAP_ALLOCATION_BIAS`, if we get an i/o error, we may continue
on, which in theory could cause us to not realize that a vdev should be
used only for `special` allocations.

In practice, if we encountered an i/o error while looking for
`VDEV_TOP_ZAP_ALLOCATION_BIAS` in the `vdev_top_zap`, we'd also get an
i/o error while looking for other entries in the same object, and thus
the zpool open/import would fail.  Therefore the impact of this problem
is negligible.

This commit adds error handling for i/o errors while accessing the
`vdev_top_zap`, so that we aren't relying on unrelated code to fail for
us.
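
A sketch of the pattern, under stated assumptions: `lookup` is a hypothetical callable returning an `(err, value)` pair, and treating ENOENT as "no bias recorded" is an assumption of this example rather than something the commit text states.

```python
import errno


def load_allocation_bias(lookup):
    """Return (err, bias); any error other than 'entry absent' is propagated."""
    err, value = lookup("VDEV_TOP_ZAP_ALLOCATION_BIAS")
    if err == 0:
        return 0, value
    if err == errno.ENOENT:
        # Assumed: a missing entry simply means no bias was recorded.
        return 0, None
    # An i/o error is returned to the caller instead of being ignored.
    return err, None
```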

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10637

4 years ago Verify zfs module loaded before starting services
Jonathon [Wed, 29 Jul 2020 23:52:18 +0000 (23:52 +0000)]
Verify zfs module loaded before starting services

This is a minor change to the systemd service templates that verifies the zfs
kernel module is loaded by the kernel prior to attempting to import any zpool.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jonathon Fernyhough <jonathon.fernyhough@york.ac.uk>
Closes #10627

4 years agoRename refcount.h to zfs_refcount.h
Matthew Macy [Wed, 29 Jul 2020 23:35:33 +0000 (16:35 -0700)]
Rename refcount.h to zfs_refcount.h

Renamed to avoid conflicting with refcount.h when a different
implementation is already provided by the platform.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10620

4 years agoIntroduce names for ZTHRs
Serapheim Dimitropoulos [Wed, 29 Jul 2020 16:43:33 +0000 (09:43 -0700)]
Introduce names for ZTHRs

When debugging issues or generally analyzing the runtime of
a system it would be nice to be able to tell the different
ZTHRs running by name rather than having to analyze their
stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10630

4 years agoPrefix zfs internal endian checks with _ZFS
Matthew Macy [Tue, 28 Jul 2020 20:02:49 +0000 (13:02 -0700)]
Prefix zfs internal endian checks with _ZFS

FreeBSD defines _BIG_ENDIAN, BIG_ENDIAN, _LITTLE_ENDIAN, and
LITTLE_ENDIAN on every architecture. Trying to do
cross builds whilst hiding this from ZFS has proven
extremely cumbersome.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10621

4 years agoFix lua stack overflow on recursive call to gsub()
Matthew Ahrens [Mon, 27 Jul 2020 23:11:47 +0000 (16:11 -0700)]
Fix lua stack overflow on recursive call to gsub()

The `zfs program` subcommand invokes a LUA interpreter to run ZFS
"channel programs".  This interpreter runs in a constrained environment,
with defined memory limits.  The LUA stack (used for LUA functions that
call each other) is allocated in the kernel's heap, and is limited by
the `-m MEMORY-LIMIT` flag and the `zfs_lua_max_memlimit` module
parameter.  The C stack is used by certain LUA features that are
implemented in C.  The C stack is limited by `LUAI_MAXCCALLS=20`, which
limits call depth.

Some LUA C calls use more stack space than others, and `gsub()` uses an
unusually large amount.  With a programming trick, it can be invoked
recursively using the C stack (rather than the LUA stack).  This
overflows the 16KB Linux kernel stack after about 11 iterations, less
than the limit of 20.

One solution would be to decrease `LUAI_MAXCCALLS`.  This could be made
to work, but it has a few drawbacks:

1. The existing test suite does not pass with `LUAI_MAXCCALLS=10`.

2. There may be other LUA functions that use a lot of stack space, and
the stack space may change depending on compiler version and options.

This commit addresses the problem by adding a new limit on the amount of
free space (in bytes) remaining on the C stack while running the LUA
interpreter: `LUAI_MINCSTACK=4096`.  If there is less than this amount
of stack space remaining, a LUA runtime error is generated.
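
A toy Python model of the two limits may help show why a byte-based check catches what a depth-based check misses. The constants 16KB, 20, and 4096 come from the text above. `FRAME_BYTES` is a hypothetical per-call stack cost chosen for illustration, and the order of the two checks is an assumption.

```python
STACK_SIZE = 16 * 1024    # Linux kernel stack size cited above
LUAI_MAXCCALLS = 20       # existing limit on C call depth
LUAI_MINCSTACK = 4096     # new limit: minimum free bytes on the C stack
FRAME_BYTES = 1500        # hypothetical cost of one nested gsub() call


class LuaRuntimeError(Exception):
    pass


def nested_c_calls(depth_requested, frame_bytes=FRAME_BYTES):
    """Simulate nested C calls; return the stack bytes consumed."""
    used = 0
    for depth in range(1, depth_requested + 1):
        if depth > LUAI_MAXCCALLS:
            raise LuaRuntimeError("C stack overflow (call depth)")
        if STACK_SIZE - used < LUAI_MINCSTACK:
            raise LuaRuntimeError("C stack overflow (free space)")
        used += frame_bytes
    return used
```

With the hypothetical 1500-byte frame, the free-space check fires at depth 10, well before the depth limit of 20 would. With small frames, the depth limit still applies.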

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10611
Closes #10613

4 years agoRefactor ccompile.h to not include system headers
Matthew Macy [Sun, 26 Jul 2020 03:09:50 +0000 (20:09 -0700)]
Refactor ccompile.h to not include system headers

This is a step toward being able to vendor the OpenZFS code in FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10625

4 years agoMake use of ZFS_DEBUG consistent within kmod sources
Matthew Macy [Sun, 26 Jul 2020 03:07:44 +0000 (20:07 -0700)]
Make use of ZFS_DEBUG consistent within kmod sources

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10623

4 years agoFreeBSD: Fixes required to build ZFS on PowerPC
Matthew Macy [Sat, 25 Jul 2020 18:00:23 +0000 (11:00 -0700)]
FreeBSD: Fixes required to build ZFS on PowerPC

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10622

4 years agoFreeBSD: Remove accidental ARC size limiter
Ryan Moeller [Sat, 25 Jul 2020 17:49:49 +0000 (13:49 -0400)]
FreeBSD: Remove accidental ARC size limiter

i386 has some additional memory reservation logic that limits the size
of the reported available memory.  This was accidentally being used on
all arches due to a missing header.

Include machine/vmparam.h in freebsd/zfs/arc_os.c to pull in the
missing UMA_MD_SMALL_ALLOC definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10616

4 years agoFreeBSD: Implement arc_free_memory
Ryan Moeller [Sat, 25 Jul 2020 17:47:18 +0000 (13:47 -0400)]
FreeBSD: Implement arc_free_memory

This is only used for the kstat, but something other than 0 is nice.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10626

4 years agoAdd gang ABD child to parent gang ABD
Brian Atkinson [Sat, 25 Jul 2020 04:09:20 +0000 (22:09 -0600)]
Add gang ABD child to parent gang ABD

By design a gang ABD can not have another gang ABD as a child. This is
to make sure the logical offset in a gang ABD is consistent with the
individual ABDs it contains as children. If a gang ABD is added as a
child of a gang ABD we will add the individual children of the gang ABD
to the parent gang ABD. This allows for a consistent view of offsets
within the parent gang ABD.
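
The flattening rule can be modeled in a few lines of Python. The class and method names here are hypothetical and do not mirror the ABD API; the sketch only shows that a gang never holds another gang, so child offsets stay a simple running sum.

```python
class Abd:
    """A plain (non-gang) buffer of a given size."""

    def __init__(self, size):
        self.size = size


class GangAbd:
    """A gang whose children are always plain ABDs."""

    def __init__(self):
        self.children = []

    def add(self, child):
        if isinstance(child, GangAbd):
            # Adding a gang adds its individual children instead.
            for grandchild in child.children:
                self.add(grandchild)
        else:
            self.children.append(child)

    def offsets(self):
        """Logical offset of each child within this gang."""
        result, offset = [], 0
        for child in self.children:
            result.append(offset)
            offset += child.size
        return result


inner = GangAbd()
inner.add(Abd(512))
inner.add(Abd(1024))
outer = GangAbd()
outer.add(Abd(4096))
outer.add(inner)
```

After the last call, `outer` holds three plain children and no nested gang.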

Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10430

4 years agoLimit dbuf cache sizes based only on ARC target size by default
Ryan Moeller [Sat, 25 Jul 2020 03:38:48 +0000 (23:38 -0400)]
Limit dbuf cache sizes based only on ARC target size by default

Set the initial max sizes to ULONG_MAX to allow the caches to grow
with the ARC.

Recalculate the metadata cache size on demand so it can adapt, too.

Update descriptions in zfs-module-parameters(5).
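
As a rough illustration of "grow with the ARC": if the effective limit is the smaller of the configured maximum and a fraction of the ARC target, then a maximum of ULONG_MAX never binds. The function name and the default shift of 5 are assumptions made for this sketch, not values taken from this commit.

```python
ULONG_MAX = 2**64 - 1  # assumes a 64-bit unsigned long


def cache_limit(arc_target_bytes, max_bytes=ULONG_MAX, shift=5):
    """Effective cache limit: min(configured max, ARC target >> shift)."""
    return min(max_bytes, arc_target_bytes >> shift)
```

With the default maximum, the limit tracks the ARC target. Setting a smaller `max_bytes` caps it explicitly.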

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10563
Closes #10610

4 years agoremove kmem_cache module parameter KMC_EXPIRE_AGE
Matthew Ahrens [Fri, 24 Jul 2020 16:39:26 +0000 (09:39 -0700)]
remove kmem_cache module parameter KMC_EXPIRE_AGE

By default, `spl_kmem_cache_expire` is `KMC_EXPIRE_MEM`, meaning that
objects will be removed from kmem cache magazines by
`spl_kmem_cache_reap_now()`.

There is also a module parameter to change this to `KMC_EXPIRE_AGE`,
which establishes a maximum lifetime for objects to stay in the
magazine.  This setting has rarely, if ever, been used, and is not
regularly tested.

This commit removes the code for `KMC_EXPIRE_AGE`, and associated module
parameters.

Additionally, the unused module parameter
`spl_kmem_cache_obj_per_slab_min` is removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10608

4 years agoAdd support to decode a resume token
tony-zfs [Fri, 24 Jul 2020 00:44:03 +0000 (20:44 -0400)]
Add support to decode a resume token

Add a new subcommand to zstream called token. This
allows users to decode a resume token to retrieve the toname
field. This can be useful for tools that need this information.
The syntax is: zstream token <resume_token>.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #10558

4 years agoAnnotate unused parameters on inline definitions as such
Kyle Evans [Fri, 24 Jul 2020 00:41:48 +0000 (19:41 -0500)]
Annotate unused parameters on inline definitions as such

* libspl: umem: These are obviously and intentionally unused; annotate
  them as such to appease -Wunused-parameter builds that include this
  header.

* sys/dmu.h: In this case, clear_on_evict_dbufp is only used for
  ZFS_DEBUG builds, so annotate it as __maybe_unused to appease
  -Wunused-parameter.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10606

4 years agoFreeBSD: Remove some code duplication in sysctl_os.c
Ryan Moeller [Fri, 24 Jul 2020 00:35:34 +0000 (20:35 -0400)]
FreeBSD: Remove some code duplication in sysctl_os.c

Drop unnecessary redefinitions of several arcstat values.
Put missing extern declaration of arc_no_grow_shift in arc_impl.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10609

4 years agolibzfs: const'ify path argument to zfs_path_to_zhandle
Kyle Evans [Wed, 22 Jul 2020 18:14:20 +0000 (13:14 -0500)]
libzfs: const'ify path argument to zfs_path_to_zhandle

zfs_path_to_zhandle has no need to mutate the path argument,
most notably:

- zfs_open takes path as const
- getextmntent takes path as const
- fprintf most clearly doesn't need to mutate it

It's hard to foresee any reason that libzfs could conceivably
want to mutate it in the future, either, so const'ify it.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10605

4 years agoOpenZFSify CONTRIBUTING.md
Ryan Moeller [Wed, 22 Jul 2020 18:09:04 +0000 (14:09 -0400)]
OpenZFSify CONTRIBUTING.md

Update stale references to "ZFS on Linux" to "OpenZFS" in
CONTRIBUTING.md.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10602

4 years agoZTS: Fix devname2devid build on FreeBSD with libudev
Ryan Moeller [Wed, 22 Jul 2020 17:49:22 +0000 (13:49 -0400)]
ZTS: Fix devname2devid build on FreeBSD with libudev

When libudev is installed on FreeBSD, configure finds it and sets
WANT_DEVNAME2DEVID, but it isn't found by the linker because we
didn't specify where it is.

Use LIBUDEV_LIBS so the location of the library gets added to the
linker flags for devname2devid.
Also use LIBUDEV_CFLAGS here in case some other platform needs it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10590

4 years agoAdd zfs_gitrev.h to the distributed sources
Arvind Sankar [Sun, 19 Jul 2020 01:24:48 +0000 (21:24 -0400)]
Add zfs_gitrev.h to the distributed sources

Commit 109d2c931020 ("Move zfs_gitrev.h to build directory") stopped
distributing zfs_gitrev.h, as it is a generated file. Add it back, with
some changes in behavior.

Change the logic for gitrev as follows:
- if the source tree is a git repository, the behavior for build is
  unchanged. For make dist, append -dist to the git tag in the
  distributed version of zfs_gitrev.h.
- otherwise, check if the source tree contains zfs_gitrev.h, and use it
  if so, falling back to "unknown" if it doesn't exist.
- clean it only in make maintainer-clean, so we don't remove it from the
  source tree on make clean or make distclean.

This allows disted sources to track what git tag they originally came
from, with the -dist suffix indicating that the code wasn't built
directly from git and so might contain additional changes beyond the git
tag.
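
The decision logic listed above can be summarized as a small Python function. This is a sketch of the described behavior only; the real logic lives in the build system, and the parameter names are hypothetical.

```python
def gitrev(is_git_repo, git_describe=None, existing_header_rev=None,
           making_dist=False):
    """Pick the revision string recorded in zfs_gitrev.h."""
    if is_git_repo:
        # Building from git: unchanged for a normal build,
        # "-dist" appended for the copy shipped by make dist.
        return git_describe + "-dist" if making_dist else git_describe
    if existing_header_rev is not None:
        # Not a git tree: reuse the distributed zfs_gitrev.h.
        return existing_header_rev
    return "unknown"
```

A tarball built this way keeps reporting the tag it was cut from, with the suffix marking that it was not built directly from git.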

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10595

4 years agoRestore scripts/make_gitrev.sh
Arvind Sankar [Fri, 17 Jul 2020 21:30:51 +0000 (17:30 -0400)]
Restore scripts/make_gitrev.sh

Commit 109d2c931020 ("Move zfs_gitrev.h to build directory") removed
scripts/make_gitrev.sh, putting the logic into the Makefile itself.

However, at least the Arch Linux packager wants the script so that the
file can be generated without having to run configure first, for
DKMS packaging purposes.

So move the make recipe back into the script.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10595

4 years agoAdjust ARC terminology
Matthew Ahrens [Wed, 22 Jul 2020 16:51:47 +0000 (09:51 -0700)]
Adjust ARC terminology

The process of evicting data from the ARC is referred to as
`arc_adjust`.

This commit changes the term to `arc_evict`, which is more specific.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10592

4 years agoUpdated CONTRIBUTING doc links for new issue labels
Evan Harris [Sun, 19 Jul 2020 17:22:44 +0000 (12:22 -0500)]
Updated CONTRIBUTING doc links for new issue labels

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Evan Harris <eharris@puremagic.com>
Closes #10584

4 years agoDisable shebang mangling on input files
Arvind Sankar [Sun, 19 Jul 2020 17:19:08 +0000 (13:19 -0400)]
Disable shebang mangling on input files

The DKMS module installs the entire source tree, including the .in files
that will later be substituted when building. This makes
brp_mangle_shebangs complain about shebang lines in the .in files.

Exclude everything under /usr/src from shebang mangling in the DKMS
package.

The KMOD package doesn't contain any of the files it excludes from
mangling, so just drop the exclusion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10581
Closes #10582

4 years agoFreeBSD: Add legacy arc_min and arc_max
Ryan Moeller [Sun, 19 Jul 2020 17:15:34 +0000 (13:15 -0400)]
FreeBSD: Add legacy arc_min and arc_max

These tunables were renamed from vfs.zfs.arc_min and
vfs.zfs.arc_max to vfs.zfs.arc.min and vfs.zfs.arc.max.
Add legacy compat tunables for the old names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10579

4 years agoMake unloading the key more robust
Jean-Baptiste Lallement [Thu, 18 Jun 2020 17:15:10 +0000 (19:15 +0200)]
Make unloading the key more robust

The unit was failing instead of stopping if someone manually unloaded
the key before stopping the unit (zfs unload-key fails on an
unavailable key).
Follow logic similar to that used for loading the key, checking the key
status before unloading it.
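
The check-then-unload pattern looks roughly like this in Python. The generated unit does this in shell; this sketch is not that code. `run` is a hypothetical callable that executes an argv list and returns its stdout, injected so the logic can be exercised without a pool.

```python
def unload_key_if_loaded(dataset, run):
    """Unload the key only when it is currently loaded.

    Returns True if an unload was attempted, False if it was skipped.
    """
    status = run(
        ["zfs", "get", "-H", "-o", "value", "keystatus", dataset]
    ).strip()
    if status == "available":
        run(["zfs", "unload-key", dataset])
        return True
    # Key already unloaded: nothing to do, and no failure to report.
    return False
```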

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477

4 years agoBindsTo dataset keyload unit to mount associate unit
Jean-Baptiste Lallement [Thu, 18 Jun 2020 17:00:04 +0000 (19:00 +0200)]
BindsTo dataset keyload unit to mount associate unit

We need a stronger dependency between the mount unit and its keyload unit
when we know that the dataset is encrypted.
If the keyload unit fails, Wants= will still try to mount the dataset,
which will then fail.
It’s better to show that the failure is due to a failing dependency, the
keyload unit, by tightening the dependency. We can do this because we know
that we generate both units in the generator, so it’s not an
optional dependency.
BindsTo= also ensures that if the keyload unit fails at any point, the
associated mountpoint will then be unmounted.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477

4 years agoEnsure mount unit pilots when its ZFS key is loaded
Jean-Baptiste Lallement [Thu, 18 Jun 2020 16:47:27 +0000 (18:47 +0200)]
Ensure mount unit pilots when its ZFS key is loaded

Drop the explicit Before=zfs.mount dependency on the generated key-load
.service unit.
Indeed, the associated mount unit is After=<dataset-key-load>.service.
It is thus the mount unit that controls at what point it wants to be
mounted (Before=zfs-mount.service in the stock generator), but this can be
an automount point, or triggered by another service.
This additional dependency from the key load service is thus not needed.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477

4 years agoRemove skc_reclaim, hdr_recl, kmem_cache shrinker
Matthew Ahrens [Sun, 19 Jul 2020 16:58:30 +0000 (09:58 -0700)]
Remove skc_reclaim, hdr_recl, kmem_cache shrinker

The SPL kmem_cache implementation provides a mechanism, `skc_reclaim`,
whereby individual caches can register a callback to be invoked when
there is memory pressure.  This mechanism is used in only one place: the
ARC registers the `hdr_recl()` reclaim function.  This function wakes up
the `arc_reap_zthr`, whose job is to call `kmem_cache_reap()` and
`arc_reduce_target_size()`.

The `skc_reclaim` callbacks are invoked only by shrinker callbacks and
`arc_reap_zthr`, and the only callback merely wakes up `arc_reap_zthr`.  When
called from `arc_reap_zthr`, waking `arc_reap_zthr` is a no-op.  When
called from shrinker callbacks, we are already aware of memory pressure
and responding to it.  Therefore there is little benefit to ever calling
the `hdr_recl()` `skc_reclaim` callback.

The `arc_reap_zthr` also wakes once a second, and if memory is low when
allocating an ARC buffer.  Therefore, additionally waking it from the
shrinker callbacks has little benefit.

The shrinker callbacks can be invoked very frequently, e.g. 10,000 times
per second.  Additionally, for each invocation of the shrinker callback,
skc_reclaim is invoked many times.  Therefore, this mechanism consumes
significant amounts of CPU time.

The kmem_cache shrinker calls `spl_kmem_cache_reap_now()`, which,
in addition to invoking `skc_reclaim()`, does two things to attempt to
free pages for use by the system:
 1. Return free objects from the magazine layer to the slab layer
 2. Return entirely-free slabs to the page layer (i.e. free pages)

These actions apply only to caches implemented by the SPL, not those
that use the underlying kernel SLAB/SLUB caches.  The SPL caches are
used for objects >=32KB, which are primarily linear ABDs cached in the
DBUF cache.

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a `kmem_cache_free()` call
finds a full magazine.  So there would typically be zero entirely-free
slabs, and the number of objects in magazines is limited (typically no
more than 64 objects per magazine, and there's one magazine per CPU).
Therefore the benefit of `spl_kmem_cache_reap_now()`, while nonzero, is
modest.

We also call `spl_kmem_cache_reap_now()` from the `arc_reap_zthr`, when
memory pressure is detected.  Therefore, calling
`spl_kmem_cache_reap_now()` from the kmem_cache shrinker is not needed.

This commit removes the `skc_reclaim` mechanism, its only callback
`hdr_recl()`, and the kmem_cache shrinker callback.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10576

4 years agoLinux 4.10 compat: has_capability()
Brian Behlendorf [Sun, 19 Jul 2020 16:56:21 +0000 (09:56 -0700)]
Linux 4.10 compat: has_capability()

Stock kernels older than 4.10 do not export the has_capability()
function which is required by commit e59a377.  To avoid breaking
the build on older kernels revert to the safe legacy behavior and
return EACCES when privileges cannot be checked.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10565
Closes #10573

4 years agoanon_pages are not free/evictable
Matthew Ahrens [Thu, 16 Jul 2020 17:11:26 +0000 (10:11 -0700)]
anon_pages are not free/evictable

`arc_free_memory()` returns the amount of memory that the ARC considers
to be free.  This includes pages that are not actually free, but can be
evicted with essentially zero cost (without doing any i/o), for example
the page cache.  The ARC can "squeeze out" any pages included in this
calculation, leaving only `arc_sys_free` (1/64th of RAM) for these
free/evictable pages.

Included in the count of free/evictable pages is
`nr_inactive_anon_pages()`, which is described as "Anonymous memory that
has not been used recently and can be swapped out".  These pages would
have to be written out to disk (swap) in order to evict them, and they
are not included in `/proc/meminfo`'s `MemAvailable`.

Therefore it is not appropriate for `nr_inactive_anon_pages()` to be
included in the free/evictable memory returned by `arc_free_memory()`,
because the ARC shouldn't (intentionally) make the system swap.

This commit removes `nr_inactive_anon_pages()` from the memory returned
by `arc_free_memory()`.  This is a step towards enabling the ARC to
manage free memory by monitoring it and reducing the ARC size as we
notice that there is insufficient free memory (in the `arc_reap_zthr`),
rather than the current method of relying on the `arc_shrinker`
callback.
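
The effect on the calculation can be shown with a simplified Python model. The counter names are hypothetical dictionary keys, and the real function may sum other terms as well. Only the removal of the inactive-anon term is taken from the text above.

```python
def arc_free_pages(counters, include_inactive_anon=False):
    """Pages the ARC treats as free or cheaply evictable (simplified)."""
    total = counters["free"] + counters["inactive_file"]
    if include_inactive_anon:
        # Old behavior: counted pages that need swap i/o to evict.
        total += counters["inactive_anon"]
    return total


counters = {"free": 100, "inactive_file": 50, "inactive_anon": 30}
before = arc_free_pages(counters, include_inactive_anon=True)
after = arc_free_pages(counters)
```

In this model the ARC sees less "free" memory after the change, so it is less likely to squeeze the system into swapping.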

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10575

4 years agoFreeBSD: zfs commands backward compatibility
Matthew Macy [Thu, 16 Jul 2020 04:32:50 +0000 (21:32 -0700)]
FreeBSD: zfs commands backward compatibility

Update the zfs commands such that they're backwards compatible with
the version of ZFS in the base FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10542

4 years agoUpdate zts-report.py with additional tests
Brian Behlendorf [Thu, 16 Jul 2020 04:28:18 +0000 (21:28 -0700)]
Update zts-report.py with additional tests

The following test cases have been observed to fail frequently
enough to be a problem when reporting CI results.  Until they can
be updated to be entirely reliable add them to the zts-report.py
script.

    alloc_class/alloc_class_011_neg
    cli_root/zpool_import/zpool_import_012_pos
    mmp/mmp_on_uberblocks
    rsend/send_partial_dataset

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10578

4 years agoZTS: Fix nonportable use of stat in list_file_blocks
Ryan Moeller [Thu, 16 Jul 2020 04:26:39 +0000 (00:26 -0400)]
ZTS: Fix nonportable use of stat in list_file_blocks

FreeBSD stat uses -f to specify the format string rather than -c.
list_file_blocks in blkdev.shlib uses stat -c %i to get a file's
object ID for zdb.  We already have a library function to do this
portably.

Use get_objnum to get the file's object ID.

Take log_must off of the call to list_file_blocks in
corrupt_blocks_at_level, which had masked the error.  It was not good
to pipe the output of log_must into the while-loop, anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10572

4 years agoFix early include of <linux/percpu_compat.h>
Romain Dolbeau [Wed, 15 Jul 2020 22:58:15 +0000 (00:58 +0200)]
Fix early include of <linux/percpu_compat.h>

Move/add include of <linux/percpu_compat.h> to satisfy missing
requirements.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes #10568
Closes #10569

4 years agoExtend zdb to print inconsistencies in livelists and metaslabs
Matthew Ahrens [Wed, 15 Jul 2020 00:51:05 +0000 (17:51 -0700)]
Extend zdb to print inconsistencies in livelists and metaslabs

Livelists and spacemaps are data structures that are logs of allocations
and frees.  Livelist entries are block pointers (blkptr_t).  Spacemap
entries are ranges of numbers, most often used to track
allocated/freed regions of metaslabs/vdevs.

These data structures can become self-inconsistent, for example if a
block or range is "double allocated" (two allocation records without
an intervening free) or "double freed" (two free records without an
intervening allocation).
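
A minimal Python sketch of detecting both conditions in a log of records follows. It models entries as `(op, block)` pairs, which is a simplification of the on-disk formats. It also reports any FREE of a block that is not currently allocated as a "double FREE", which is an assumption of this model. Unlike a check that halts at the first error, it collects every problem.

```python
def find_inconsistencies(log):
    """Replay ALLOC/FREE records; return (problems, leftover_allocs)."""
    allocated = set()
    problems = []
    for index, (op, block) in enumerate(log):
        if op == "ALLOC":
            if block in allocated:
                problems.append((index, "double ALLOC", block))
            allocated.add(block)
        elif op == "FREE":
            if block not in allocated:
                problems.append((index, "double FREE", block))
            allocated.discard(block)
        else:
            raise ValueError("unknown record type: %r" % (op,))
    return problems, allocated


log = [("ALLOC", 1), ("ALLOC", 1), ("FREE", 1), ("FREE", 1), ("ALLOC", 2)]
problems, leftover = find_inconsistencies(log)
```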

ZDB (as well as zfs running in the kernel) can detect these
inconsistencies when loading livelists and metaslabs.  However, it
generally halts processing when the error is detected.

When analyzing an on-disk problem, we often want to know the entire set
of inconsistencies, which is not possible with the current behavior.
This commit adds a new flag, `zdb -y`, which analyzes the livelist and
metaslab data structures and displays all of their inconsistencies.
Note that this is different from the leak detection performed by
`zdb -b`, which checks for inconsistencies between the spacemaps and the
tree of block pointers, but assumes the spacemaps are self-consistent.

The specific checks added are:

Verify livelists by iterating through each sublivelist and:
- report leftover FREEs
- report double ALLOCs and double FREEs
- record leftover ALLOCs together with their TXG [see Cross Check]

Verify spacemaps by iterating over each metaslab and:
- iterate over spacemap and then the metaslab's entries in the
  spacemap log, then report any double FREEs and double ALLOCs

Verify that livelists are consistent with spacemaps.  The space
referenced by livelists (after using the FREE's to cancel out
corresponding ALLOCs) should be allocated, according to the spacemaps.
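
The cross check amounts to a containment test, sketched here in Python. The representation is hypothetical: leftover ALLOCs are `(offset, size)` pairs, and the spacemap state is a list of half-open `(start, end)` allocated ranges assumed to be already coalesced.

```python
def unbacked_allocs(leftover_allocs, allocated_ranges):
    """Return leftover ALLOCs not fully covered by an allocated range.

    Assumes allocated_ranges is coalesced, so a block spanning two
    adjacent ranges cannot occur.
    """
    missing = []
    for offset, size in leftover_allocs:
        end = offset + size
        covered = any(start <= offset and end <= stop
                      for start, stop in allocated_ranges)
        if not covered:
            missing.append((offset, size))
    return missing


ranges = [(0, 100), (200, 300)]
missing = unbacked_allocs([(10, 20), (90, 20), (250, 50)], ranges)
```

Here the block at offset 90 runs past the end of the first range, so it is the only one reported.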

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-66031
Closes #10515

4 years agoCentralize variable substitution
Arvind Sankar [Sat, 11 Jul 2020 23:35:58 +0000 (19:35 -0400)]
Centralize variable substitution

A bunch of places need to edit files to incorporate the configured paths
i.e. bindir, sbindir etc. Move this logic into a common file.

Create arc_summary by copying arc_summary[23] as appropriate at build
time instead of install time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10559

4 years agoMake RPM_DEFINE_KMOD conditional on CONFIG_KERNEL
Arvind Sankar [Mon, 13 Jul 2020 23:20:27 +0000 (19:20 -0400)]
Make RPM_DEFINE_KMOD conditional on CONFIG_KERNEL

The configure variables won't be defined if CONFIG_KERNEL is disabled
and defining empty macros causes errors. The spec files do provide some
defaults if the macros are undefined.

Remove config conditionals in the tgz Makefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10564

4 years agoFix parallel make srpm
Arvind Sankar [Mon, 13 Jul 2020 21:24:07 +0000 (17:24 -0400)]
Fix parallel make srpm

When building srpm using make -j, each of the recursive makes invoked to
build srpm-{dkms,kmod,utils} will build the dist target. This is both
unnecessary, and also has a very good chance of breaking when they race
trying to build gitrev.

Fix this by making dist a prerequisite of srpm-{dkms,kmod,utils} instead
of srpm-common, so that it will be done once before invoking the
recursive makes.

Also, gitrev is not really required for make dist, so instead of adding
it to BUILT_SOURCES, just add it as a prerequisite of the all target.

Mark the individual package targets as PHONY.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10564

4 years agoFix LOR between dp_config_rwlock and spa_props_lock
Alexander Motin [Tue, 14 Jul 2020 19:21:57 +0000 (15:21 -0400)]
Fix LOR between dp_config_rwlock and spa_props_lock

Our QE team hit a deadlock in ZFS during automated API testing, caused
by a lock order reversal.  On one side, dsl_sync_task_sync() locks
dp_config_rwlock as writer and calls spa_sync_props(), which waits
for spa_props_lock.  On the other, spa_prop_get() locks spa_props_lock
and then calls dsl_pool_config_enter(), trying to lock dp_config_rwlock
as reader.

This patch makes spa_prop_get() lock dp_config_rwlock before
spa_props_lock, making the order consistent.
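
The fixed ordering can be illustrated with two Python threads. This is a model, not the kernel code: plain mutexes stand in for the reader/writer lock, and the function names only echo the ones above. Because both paths take the config lock first, neither can hold one lock while waiting for the other in the opposite order.

```python
import threading

dp_config_lock = threading.Lock()   # stands in for dp_config_rwlock
spa_props_lock = threading.Lock()   # stands in for spa_props_lock


def sync_props(results):
    with dp_config_lock:            # config lock first
        with spa_props_lock:        # then props lock
            results.append("sync")


def prop_get(results):
    with dp_config_lock:            # same order as sync_props (the fix)
        with spa_props_lock:
            results.append("get")


def run_both(rounds=100):
    """Run both paths concurrently; returns once both have finished."""
    results = []

    def loop(fn):
        for _ in range(rounds):
            fn(results)

    threads = [threading.Thread(target=loop, args=(fn,))
               for fn in (sync_props, prop_get)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Swapping the two `with` statements in `prop_get` would reintroduce the reversal described above and allow the threads to deadlock.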

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10553

4 years agoDisable -Wl,-z,defs for ASAN builds
Joao Carlos Mendes Luis [Tue, 14 Jul 2020 19:17:44 +0000 (16:17 -0300)]
Disable -Wl,-z,defs for ASAN builds

Commit af65916 added -Wl,-z,defs for the shared libraries. This
apparently does not work in some cases with --enable-asan, so only add
it for non-ASAN builds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: João Carlos Mendes Luis <jonny@jonny.eng.br>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10557
Closes #10560

4 years agoFixing gang ABD child removal race condition
Brian Atkinson [Tue, 14 Jul 2020 18:04:35 +0000 (12:04 -0600)]
Fixing gang ABD child removal race condition

On Linux, the list debug code has been setting off a failure when
checking that the node->next->prev value is pointing back at the node.
At times this check evaluates to 0xdead. When removing a child from a
gang ABD we must acquire the child's abd_mtx to make sure that the
same ABD is not being added to another gang ABD while it is being
removed from a gang ABD. This fixes a race condition when checking
if an ABD's link is already active and part of another gang ABD before
adding it to a gang.

Added additional debug code for the gang ABD in abd_verify() to make
sure each child ABD has active links. Also check to make sure another
gang ABD is not added to a gang ABD.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10511

4 years agoRemove dependency on sharetab file and refactor sharing logic
George Wilson [Mon, 13 Jul 2020 16:19:18 +0000 (11:19 -0500)]
Remove dependency on sharetab file and refactor sharing logic

== Motivation and Context

The current implementation of 'sharenfs' and 'sharesmb' relies on
the use of the sharetab file. The use of this file is os-specific
and not required by Linux or FreeBSD. Currently the code must
maintain updates to this file, which adds complexity and imposes
a significant performance cost when sharing many datasets. In
addition, concurrently running 'zfs sharenfs' commands results in
missing entries in the sharetab file, leading to unexpected failures.

== Description

This change removes the sharetab logic from the Linux and FreeBSD
implementation of 'sharenfs' and 'sharesmb'. It still preserves an
OS-specific library which contains the logic required for sharing
NFS or SMB. The following entry points exist in the vastly simplified
libshare library:

- sa_enable_share -- shares a dataset but may not commit the change
- sa_disable_share -- unshares a dataset but may not commit the change
- sa_is_shared -- determine if a dataset is shared
- sa_commit_share -- notify NFS/SMB subsystem to commit the shares
- sa_validate_shareopts -- determine if sharing options are valid

The sa_commit_share entry point is provided as a performance enhancement
and is not required. The sa_enable_share/sa_disable_share may commit
the share as part of the implementation. Libshare provides a framework
for both NFS and SMB but some operating systems may not fully support
these protocols or all features of the protocol.
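
The enable-then-commit split can be illustrated with a toy model. This
is only a sketch: the demo_* names, the struct and all signatures are
hypothetical and do not reproduce the real libshare prototypes. It shows
the idea that many shares can be recorded cheaply and made visible with
a single commit notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	DEMO_MAX_SHARES	32

/* Toy share table. All names and signatures here are hypothetical. */
struct demo_shares {
	const char *name[DEMO_MAX_SHARES];
	bool committed[DEMO_MAX_SHARES];
	size_t count;
	unsigned commits;	/* how many times the subsystem was notified */
};

void
demo_init(struct demo_shares *s)
{
	assert(s != NULL);
	memset(s, 0, sizeof (*s));
}

/* Record the share but leave it uncommitted. Returns 0, or -1 if full. */
int
demo_enable_share(struct demo_shares *s, const char *dataset)
{
	assert(s != NULL && dataset != NULL);
	if (s->count == DEMO_MAX_SHARES)
		return (-1);
	s->name[s->count] = dataset;
	s->committed[s->count] = false;
	s->count++;
	return (0);
}

/* One notification makes every pending share visible. */
void
demo_commit_share(struct demo_shares *s)
{
	assert(s != NULL);
	for (size_t i = 0; i < s->count; i++)
		s->committed[i] = true;
	s->commits++;
}

/* A dataset counts as shared only once it has been committed. */
bool
demo_is_shared(const struct demo_shares *s, const char *dataset)
{
	assert(s != NULL && dataset != NULL);
	for (size_t i = 0; i < s->count; i++) {
		if (s->committed[i] && strcmp(s->name[i], dataset) == 0)
			return (true);
	}
	return (false);
}
```

A caller sharing many datasets would call demo_enable_share() once per
dataset and demo_commit_share() once at the end, which is the batching
the commit entry point is meant to allow.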

NFS Operation:
For Linux, libshare updates /etc/exports.d/zfs.exports to add
and remove shares and then commits the changes by invoking
'exportfs -r'. This file is automatically read by the kernel NFS
implementation which makes for better integration with the NFS systemd
service. For FreeBSD, libshare updates /etc/zfs/exports to add and
remove shares and then commits the changes by sending a SIGHUP to
mountd.

SMB Operation:
For Linux, libshare adds and removes files in /var/lib/samba/usershares
by calling the 'net' command directly. There is no need to commit the
changes. FreeBSD does not support SMB.

== Performance Results

To test sharing performance we created a pool with an increasing number
of datasets and invoked various zfs actions that would enable and
disable sharing. The performance testing was limited to NFS sharing.
The following tests were performed on an 8 vCPU system with 128GB of
memory and a pool composed of four 50GB SSDs:

Scale testing:
- Share all filesystems in parallel -- zfs sharenfs=on <dataset> &
- Unshare all filesystems in parallel -- zfs sharenfs=off <dataset> &

Functional testing:
- share each filesystem serially -- zfs share -a
- unshare each filesystem serially -- zfs unshare -a
- reset sharenfs property and unshare -- zfs inherit -r sharenfs <pool>

For 'zfs sharenfs=on' scale testing we saw an average reduction in time
of 89.43% and for 'zfs sharenfs=off' we saw an average reduction in time
of 83.36%.

Functional testing also shows a huge improvement:
- zfs share -- 97.97% reduction in time
- zfs unshare -- 96.47% reduction in time
- zfs inherit -r sharenfs -- 99.01% reduction in time
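
For readers converting these percentages, the arithmetic is sketched
below. The function names are hypothetical and no raw timings are given
in this message, so any inputs are illustrative. As a worked case, a
97.97% reduction leaves 2.03% of the original time, i.e. roughly a 49x
speedup.

```c
#include <assert.h>

/* Percentage reduction in elapsed time: 100 * (before - after) / before. */
double
pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (100.0 * (before - after) / before);
}

/* Speedup factor implied by a percentage reduction: 100 / (100 - pct). */
double
speedup_from_pct(double pct)
{
	assert(pct < 100.0);
	return (100.0 / (100.0 - pct));
}
```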

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Bryant G. Ly <bryangly@gmail.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-Issue: DLPX-68690
Closes #1603
Closes #7692
Closes #7943
Closes #10300

4 years agofilesystem_limit/snapshot_limit is incorrectly enforced against root
Matthew Ahrens [Sun, 12 Jul 2020 00:18:02 +0000 (17:18 -0700)]
filesystem_limit/snapshot_limit is incorrectly enforced against root

The filesystem_limit and snapshot_limit properties limit the number of
filesystems or snapshots that can be created below this dataset.
According to the manpage, "The limit is not enforced if the user is
allowed to change the limit."  Two types of users are allowed to change
the limit:

1. Those that have been delegated the `filesystem_limit` or
`snapshot_limit` permission, e.g. with
`zfs allow USER filesystem_limit DATASET`.  This works properly.

2. A user with elevated system privileges (e.g. root).  This does not
work - the root user will incorrectly get an error when trying to create
a snapshot/filesystem, if it exceeds the `_limit` property.

The problem is that `priv_policy_ns()` does not work if the `cred_t` is
not that of the current process.  This happens when
`dsl_enforce_ds_ss_limits()` is called in syncing context (as part of a
sync task's check func) to determine the permissions of the
corresponding user process.

This commit fixes the issue by passing the `task_struct` (typedef'ed as
a `proc_t`) to syncing context, and then using `has_capability()` to
determine if that process is privileged.  Note that we still need to
pass the `cred_t` to syncing context so that we can check if the user
was delegated this permission with `zfs allow`.
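
The resulting decision can be sketched as follows. This is not the ZFS
code: the struct, its field names and `limit_exceeded()` are hypothetical,
and the two booleans stand for the results of the capability check on
the originating process and of the `zfs allow` delegation check, both
assumed to have been evaluated already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs to the limit check. */
struct limit_check {
	bool proc_privileged;	/* originating process holds the capability */
	bool delegated;		/* user was delegated the *_limit permission */
	uint64_t count;		/* current number of filesystems/snapshots */
	uint64_t limit;		/* value of the *_limit property */
};

/* Returns true when creating `delta` more objects should be refused. */
bool
limit_exceeded(const struct limit_check *lc, uint64_t delta)
{
	assert(lc != NULL);

	/* Users allowed to change the limit are not subject to it. */
	if (lc->proc_privileged || lc->delegated)
		return (false);

	/* Written to avoid overflow in count + delta. */
	if (lc->count > lc->limit)
		return (true);
	return (delta > lc->limit - lc->count);
}
```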

This problem only impacts Linux.  Wrappers are added to FreeBSD but it
continues to use `priv_check_cred()`, which works on arbitrary `cred_t`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8226
Closes #10545

4 years agolibzfs: Add error message for why creating mountpoint failed
Ryan Moeller [Sun, 12 Jul 2020 00:16:13 +0000 (20:16 -0400)]
libzfs: Add error message for why creating mountpoint failed

When zfs_mount_at() fails to stat the mountpoint and can't create the
directory, we return an error with a message "failed to create
mountpoint" but there is no indication why it failed.

Add the error string from the syscall to the error aux message.

Update do_mount for Linux to return the errno instead of -1.
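
A rough illustration of the pattern, not the libzfs code itself:
`make_mountpoint()` is a hypothetical helper that uses plain stat(2) and
mkdir(2), appends strerror() of the failing call to the message, and
returns the errno value rather than -1.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: make sure `path` exists. Returns 0 on success,
 * otherwise the errno of the failed mkdir(), with the reason appended
 * to the message in `msg`.
 */
int
make_mountpoint(const char *path, char *msg, size_t msglen)
{
	struct stat sb;

	assert(path != NULL && msg != NULL && msglen > 0);
	msg[0] = '\0';

	if (stat(path, &sb) == 0)
		return (0);		/* already present */

	if (mkdir(path, 0755) != 0) {
		int err = errno;	/* save before any other call */

		(void) snprintf(msg, msglen,
		    "failed to create mountpoint: %s", strerror(err));
		return (err);
	}
	return (0);
}
```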

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10550

4 years agoFreeBSD: Use a hash table for taskqid lookups
Matthew Macy [Sun, 12 Jul 2020 00:13:45 +0000 (17:13 -0700)]
FreeBSD: Use a hash table for taskqid lookups

Previously a tqent could be recycled prematurely. Update the
code to use a hash table for lookups to resolve this.
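
As a generic illustration of id-keyed lookup (not the FreeBSD taskq
code), the sketch below uses a small chained hash table. All names are
hypothetical and locking is omitted. Once an entry is removed, looking
up its id returns NULL, so a stale id cannot resolve to an entry that
has since been reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	ID_BUCKETS	64

/* Hypothetical entry; a real tqent carries much more state. */
struct id_ent {
	uint64_t id;
	struct id_ent *next;	/* chain within one bucket */
};

struct id_table {
	struct id_ent *bucket[ID_BUCKETS];
};

void
id_table_init(struct id_table *t)
{
	assert(t != NULL);
	for (size_t i = 0; i < ID_BUCKETS; i++)
		t->bucket[i] = NULL;
}

/* Register e under e->id at the head of its bucket chain. */
void
id_table_insert(struct id_table *t, struct id_ent *e)
{
	assert(t != NULL && e != NULL);
	size_t b = (size_t)(e->id % ID_BUCKETS);

	e->next = t->bucket[b];
	t->bucket[b] = e;
}

/* Return the entry registered under id, or NULL if there is none. */
struct id_ent *
id_table_lookup(const struct id_table *t, uint64_t id)
{
	struct id_ent *e;

	for (e = t->bucket[id % ID_BUCKETS]; e != NULL; e = e->next) {
		if (e->id == id)
			return (e);
	}
	return (NULL);
}

/* Unlink and return the entry for id, or NULL if there is none. */
struct id_ent *
id_table_remove(struct id_table *t, uint64_t id)
{
	struct id_ent **pp = &t->bucket[id % ID_BUCKETS];

	for (; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct id_ent *e = *pp;

			*pp = e->next;
			e->next = NULL;
			return (e);
		}
	}
	return (NULL);
}
```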

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10529

4 years agoUnconditionally enable debugging for libzpool
Serapheim Dimitropoulos [Fri, 10 Jul 2020 22:30:31 +0000 (15:30 -0700)]
Unconditionally enable debugging for libzpool

We already enable -DDEBUG unconditionally (meaning regardless
of whether this is a debug build or a performance build) for zdb
and ztest as they are mostly used for development and debugging.

This patch enables -DDEBUG for libzpool extending the debugging
checks for zdb, ztest, and a couple of other test utilities.

In addition to passing -DDEBUG we also enable -DZFS_DEBUG so
all assertion checks work as expected. We do so not only in
libzpool but in every utility that links to it, even if the
utility doesn't directly use any functionality wrapped in
ZFS_DEBUG macro definitions. The reason is that these utilities
may still include headers that contain structs that have more
fields when ZFS_DEBUG is defined. This can be a problem as
enabling that flag for libzpool but not for zdb can lead to
random problems (e.g. segmentation faults) as zdb may have
an incorrect view of a struct passed to it by libzpool.
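
The layout hazard can be shown with two hand-written structs standing in
for one struct compiled with and without the macro. The struct and field
names are hypothetical; in a real build the two layouts would come from
the same header under different -D flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout seen by code built WITHOUT the debug macro (hypothetical). */
struct rec_plain {
	uint64_t id;
	uint64_t size;
};

/* Layout seen by code built WITH the debug macro: one extra field. */
struct rec_debug {
	uint64_t id;
	uint64_t debug_cookie;
	uint64_t size;
};

/*
 * Simulate a consumer built without the macro reading a record that a
 * library built with the macro produced: it interprets the leading
 * bytes using the smaller layout.
 */
uint64_t
size_as_seen_by_plain_consumer(const struct rec_debug *r)
{
	struct rec_plain view;

	assert(r != NULL);
	memcpy(&view, r, sizeof (view));
	return (view.size);
}
```

With a record whose debug_cookie and size differ, the function returns
the cookie value rather than the size, because `size` sits at a different
offset in the two layouts.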

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10549

4 years agoFix up FIND_SYSTEM_LIBRARY to work with cross-compiling
Arvind Sankar [Thu, 9 Jul 2020 20:31:41 +0000 (16:31 -0400)]
Fix up FIND_SYSTEM_LIBRARY to work with cross-compiling

Make FIND_SYSTEM_LIBRARY respect a configured sysroot, otherwise it
might find headers from the build machine and assume the library is
available on the host/target.

Tighten up error checking: if pkg-config or the user specified _CFLAGS
or _LIBS but we can't find the header/library, issue a fatal error.

Fix the -L flag to /usr/local/lib instead of just /usr/local.

Clean out the _CFLAGS and _LIBS if we located something that we later
find doesn't work.

Rename FIND_SYSTEM_LIBRARY into the ZFS_AC_ scope.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538

4 years agoUse abs_top_builddir when referencing libraries
Arvind Sankar [Mon, 6 Jul 2020 20:01:29 +0000 (16:01 -0400)]
Use abs_top_builddir when referencing libraries

libtool stores absolute paths in the dependency_libs component of the
.la files. If the Makefile for a dependent library refers to the
libraries by relative path, some libraries end up duplicated on the link
command line.

As an example, libzfs specifies libzfs_core, libnvpair and libuutil as
dependencies to be linked in. The .la file for libzfs_core also
specifies libnvpair, but using an absolute path, with the result that
libnvpair is present twice in the linker command line for producing
libzfs.

While the only thing this causes is to slightly slow down the linking,
we can avoid it by using absolute paths everywhere, including for
convenience libraries just for consistency.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538

4 years agoAdd -z defs to LDFLAGS
Arvind Sankar [Mon, 6 Jul 2020 02:58:59 +0000 (22:58 -0400)]
Add -z defs to LDFLAGS

This will make sure the installed libraries are linked with everything
they require.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538

4 years agoAdd config.rpath for AM_GNU_GETTEXT
Arvind Sankar [Mon, 6 Jul 2020 01:08:40 +0000 (21:08 -0400)]
Add config.rpath for AM_GNU_GETTEXT

Commit e8864b1b28c2 ("config: libintl/libiconv for gettext() detection")
added an empty config.rpath with a comment that the real one doesn't
work with libtool.

However, an empty config.rpath doesn't really work: e.g. on FreeBSD,
where libintl is in /usr/local/lib, configure thinks that gettext
doesn't exist and NLS should be disabled, which currently isn't
supported in the source, and hence requires a manual workaround to
directly link -lintl without relying on configure. config.rpath is
essential to let it be detected either in --prefix or using
--with-libintl-prefix.

I also don't see the mentioned issue with libtool flags applied to
compilation; it seems to work fine to pass LTLIBINTL to libtool. It's
unnecessary to include LTLIBICONV as the configure test will
automatically append that to LTLIBINTL if it is necessary to link with
libiconv.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538

4 years agoClean up lib dependencies
Arvind Sankar [Tue, 30 Jun 2020 17:10:41 +0000 (13:10 -0400)]
Clean up lib dependencies

libzutil is currently statically linked into libzfs, libzfs_core and
libzpool. Avoid the unnecessary duplication by removing it from libzfs
and libzpool, and adding libzfs_core to libzpool.

Remove a few unnecessary dependencies:
- libuutil from libzfs_core
- libtirpc from libspl
- keep only libcrypto in libzfs, as we don't use any functions from
  libssl
- librt is only used for clock_gettime, however on modern systems that's
  in libc rather than librt. Add a configure check to see if we actually
  need librt
- libdl from raidz_test
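
The librt question comes down to whether a call to clock_gettime() links
without -lrt. The probe below is only an illustration of that idea; the
function name is hypothetical and the actual configure check added by
this commit is not shown here and may differ.

```c
#define	_POSIX_C_SOURCE	199309L

#include <assert.h>
#include <time.h>

/*
 * Probe: returns 0 when clock_gettime(CLOCK_MONOTONIC) works. If a
 * program using this links without -lrt, the C library provides the
 * symbol itself and librt is not needed for it.
 */
int
probe_clock_gettime(struct timespec *ts)
{
	assert(ts != NULL);
	return (clock_gettime(CLOCK_MONOTONIC, ts));
}
```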

Add a few missing dependencies:
- zlib to libefi and libzfs
- libuuid to zpool, and libuuid and libudev to zed
- libnvpair uses assertions, so add assert.c to provide aok and
  libspl_assertf

Sort the LDADD for programs so that libraries that satisfy dependencies
come at the end rather than the beginning of the linker command line.

Revamp the configure tests for libraries to use FIND_SYSTEM_LIBRARY
instead. This can take advantage of pkg-config, and it also avoids
polluting LIBS.

List all the required dependencies in the pkgconfig files, and move the
one for libzfs_core into the latter's directory. Install pkgconfig files
in $(libdir)/pkgconfig on Linux and $(prefix)/libdata/pkgconfig on
FreeBSD, instead of /usr/share/pkgconfig, as the more correct location
for library .pc files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538