FreeBSD/FreeBSD.git
3 years ago FreeBSD: Simplify INGLOBALZONE
Ryan Moeller [Sat, 29 Aug 2020 18:43:26 +0000 (18:43 +0000)]
FreeBSD: Simplify INGLOBALZONE

FreeBSD's previous ZFS implemented INGLOBALZONE(thread) as
(!jailed((thread)->td_ucred)) and passed curthread to INGLOBALZONE.

We pass curproc instead of curthread, so we can achieve the same effect
with (!jailed((proc)->p_ucred)).  The implementation is trivial enough
to fit on a single line in a define.  We don't really need a whole
separate function for something that's already macros all the way down.

Eliminate in_globalzone.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851

3 years ago FreeBSD: Define crgetzoneid appropriately
Ryan Moeller [Sat, 29 Aug 2020 18:25:56 +0000 (18:25 +0000)]
FreeBSD: Define crgetzoneid appropriately

The previous ZFS implementation on FreeBSD had ifdefs to use jailed()
instead of crgetzoneid() in dsl_dir.c, however we can simply provide an
appropriate definition of crgetzoneid for the same effect.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851

3 years ago zio_ereport_post() and zio_ereport_start() return values are ignored
Toomas Soome [Tue, 1 Sep 2020 02:35:11 +0000 (05:35 +0300)]
zio_ereport_post() and zio_ereport_start() return values are ignored

Use (void) to silence analyzers.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10857

3 years ago Typo Correction
Spencer Kinny [Sun, 30 Aug 2020 21:14:32 +0000 (02:44 +0530)]
Typo Correction

Corrected the typo on line 404 of zfs/cmd/zfs/zfs_main.c:
pbkfd2iters becomes pbkdf2iters.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #10850

3 years ago Move spa_stats.c to common code
Matthew Macy [Sun, 30 Aug 2020 21:12:46 +0000 (14:12 -0700)]
Move spa_stats.c to common code

Initially it was considered simplest to stub out all
of the functions on FreeBSD. Now that FreeBSD supports
KSTAT_TYPE_RAW at least some of the functionality should
be made available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10842

3 years ago FreeBSD: Fix spurious failure in zvol_geom_open
Matthew Macy [Sun, 30 Aug 2020 21:11:33 +0000 (14:11 -0700)]
FreeBSD: Fix spurious failure in zvol_geom_open

In zvol_geom_open on first open we need to guarantee
that the namespace lock is held to avoid spurious
failures in zvol_first_open.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10841

3 years ago Auto close "Status: Feedback requested" after a month
Kjeld Schouten-Lebbing [Sun, 30 Aug 2020 21:09:54 +0000 (23:09 +0200)]
Auto close "Status: Feedback requested" after a month

This commit closes issues labeled
"Status: Feedback requested" after 1 month if the
label has not been removed or the author has not responded.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10807
Closes #10808

3 years ago FreeBSD: add support for KSTAT_TYPE_RAW
Matthew Macy [Sun, 30 Aug 2020 03:59:50 +0000 (20:59 -0700)]
FreeBSD: add support for KSTAT_TYPE_RAW

A few kstats use KSTAT_TYPE_RAW to provide a string generated on
demand.  Implementing these as sysctls was punted until now.

Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10836

3 years ago Linux 5.9 compat: NR_SLAB_RECLAIMABLE
Brian Behlendorf [Sun, 30 Aug 2020 03:57:45 +0000 (20:57 -0700)]
Linux 5.9 compat: NR_SLAB_RECLAIMABLE

Commit dcdc12e added compatibility code to treat NR_SLAB_RECLAIMABLE_B
as if it were the same as NR_SLAB_RECLAIMABLE.  However, the new value
is in bytes while the old value was in pages which means they are not
interchangeable.

The only place the reclaimable slab size is used is as a component of
the calculation done by arc_free_memory().  This function returns the
amount of memory the ARC considers to be free or reclaimable at little
cost.  Rather than switch to a new interface to get this value, it has
been removed from the calculation.  It is normally a minor component
compared to the number of inactive or free pages, and removing it
aligns the behavior with the FreeBSD version of arc_free_memory().
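
The unit mismatch described above can be illustrated with a small Python model. This is only a sketch: the 4096-byte page size and the helper names are assumptions for illustration, not the kernel interface or the actual arc_free_memory() code.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


def pages_to_bytes(pages: int, page_size: int = PAGE_SIZE) -> int:
    """Convert a page count into a byte count."""
    return pages * page_size


def free_memory_estimate(free_pages: int, inactive_pages: int) -> int:
    """Hypothetical model of the estimate after the change: only free and
    inactive pages are counted; reclaimable slab is no longer a term."""
    return pages_to_bytes(free_pages + inactive_pages)


# A byte counter misread as a page counter is inflated by the page size.
slab_bytes = 8 * PAGE_SIZE            # 8 pages of reclaimable slab, in bytes
misread = pages_to_bytes(slab_bytes)  # wrongly treated as a page count
```

Here `misread` is larger than the true value by a factor of `PAGE_SIZE`, which is why the two counters are not interchangeable.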

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10834

3 years ago Fix another dependency loop
Richard Laager [Sun, 31 May 2020 01:39:31 +0000 (20:39 -0500)]
Fix another dependency loop

zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10356
Closes #10388

3 years ago Fix a dependency loop
Richard Laager [Sat, 30 May 2020 23:40:45 +0000 (18:40 -0500)]
Fix a dependency loop

When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service
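
The loop listed above can be checked mechanically. The following Python sketch models the "After" edges as a dictionary and searches for a cycle. It is an illustration only, not how systemd resolves ordering; zfs-import-cache.service stands in for the {cache,scan} pair.

```python
def find_cycle(after):
    """Return one ordering cycle as a list of unit names, or None.

    `after` maps a unit to the units it is ordered after.
    """
    visiting, done = [], set()

    def visit(unit):
        if unit in done:
            return None
        if unit in visiting:
            # Back edge found: report the loop, repeating its first unit.
            return visiting[visiting.index(unit):] + [unit]
        visiting.append(unit)
        for dep in after.get(unit, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(unit)
        return None

    for unit in list(after):
        cycle = visit(unit)
        if cycle:
            return cycle
    return None


# Edges copied from the chain in the commit message.
after = {
    "systemd-random-seed.service": ["var-lib.mount"],
    "var-lib.mount": ["zfs-import.target"],
    "zfs-import.target": ["zfs-import-cache.service"],
    "zfs-import-cache.service": ["cryptsetup.service"],
    "cryptsetup.service": ["systemd-random-seed.service"],
}
loop = find_cycle(after)

# With the pool already imported, the generated mount unit is no longer
# ordered after zfs-import.target, which removes one edge of the loop.
fixed = {**after, "var-lib.mount": []}
```

With the original edges `find_cycle` reports the loop; with the edge from var-lib.mount removed it returns `None`.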

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10388

3 years ago config/zfs-build.m4: add --with-vendor flag
Georgy Yakovlev [Fri, 28 Aug 2020 16:43:44 +0000 (09:43 -0700)]
config/zfs-build.m4: add --with-vendor flag

This will allow an override of auto-detection of distribution, which
is based on checking presence of /etc/*-release files.

The build system makes many file location assumptions based on the
detected distribution.

Some distributions (like Gentoo) may prefer explicitly
setting --with-vendor=gentoo to avoid auto-detection.

Since auto-detection checks all files in order, the current script may
misdetect the distribution even on a Gentoo system if an
/etc/redhat-release file is present.

The default behavior is unchanged; the default is --with-vendor=check.
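
A minimal Python sketch of the detection order and the override follows. The two-entry probe list and the function name are hypothetical; the real logic is m4/shell in config/zfs-build.m4 and checks more files.

```python
import os

# Hypothetical, shortened probe order; the real list lives in
# config/zfs-build.m4.
RELEASE_FILES = [
    ("redhat", "/etc/redhat-release"),
    ("gentoo", "/etc/gentoo-release"),
]


def detect_vendor(with_vendor="check", exists=os.path.exists):
    """Return the vendor name, honouring an explicit --with-vendor value."""
    if with_vendor != "check":
        return with_vendor        # explicit override skips detection
    for vendor, path in RELEASE_FILES:
        if exists(path):
            return vendor         # first match wins
    return "unknown"
```

On a system where both release files exist, the default `check` mode reports the first match, while passing `"gentoo"` returns it unchanged.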

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10835

3 years ago Fix definition of BLKGETSIZE64 on FreeBSD
Alexander Richardson [Thu, 27 Aug 2020 23:09:26 +0000 (00:09 +0100)]
Fix definition of BLKGETSIZE64 on FreeBSD

The matching ioctl is DIOCGMEDIASIZE.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10818

3 years ago module/zstd: pass -U__BMI__
Georgy Yakovlev [Thu, 27 Aug 2020 22:50:13 +0000 (15:50 -0700)]
module/zstd: pass -U__BMI__

If the kernel is compiled with -march=znver1 or -march=znver2, zstd module
compilation fails due to an SSE register return with SSE disabled.
Interestingly, -march=skylake also implies -mbmi, which
defines __BMI__, but compilation succeeds.  This is probably due to
different BMI implementations on AMD and Intel processors and the
way the compiler uses the instructions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10758
Closes #10829

3 years ago Add the Xr's to the SEE ALSO as well
John-Mark Gurney [Thu, 27 Aug 2020 05:29:00 +0000 (22:29 -0700)]
Add the Xr's to the SEE ALSO as well

There are a ton of zfs-* and zpool-* man pages. This adds them to
the SEE ALSO section so that people can more quickly look through
what all the options are, now that the pages have been split.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: John-Mark Gurney <jmg@funkthat.com>
Closes #10589

3 years ago dnode_sync is careless with range tree
Patrick Mooney [Thu, 27 Aug 2020 04:48:29 +0000 (23:48 -0500)]
dnode_sync is careless with range tree

Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes #10708
Closes #10823

3 years ago Fix NEWS file
Cédric Berger [Thu, 27 Aug 2020 04:44:41 +0000 (06:44 +0200)]
Fix NEWS file

Points to https://github.com/openzfs/zfs/releases

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Cédric Berger <cedric@precidata.com>
Closes #10824

3 years ago zpool: Change base URL for ZFS messages to openzfs-docs
Ryan Moeller [Thu, 27 Aug 2020 04:43:06 +0000 (00:43 -0400)]
zpool: Change base URL for ZFS messages to openzfs-docs

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10820

3 years ago Remove duplicate dnode.h include
Brian Behlendorf [Thu, 27 Aug 2020 04:41:09 +0000 (21:41 -0700)]
Remove duplicate dnode.h include

The zfs/sa.c source file accidentally includes sys/dnode.h twice.
Remove the second occurrence.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10816
Closes #10819

3 years ago Always track temporary fses and snapshots for accounting
Paul Dagnelie [Thu, 27 Aug 2020 04:38:27 +0000 (21:38 -0700)]
Always track temporary fses and snapshots for accounting

The root cause of the issue is that we only occasionally do as the
comments in the code suggest and actually ignore the %recv dataset when
it comes to filesystem limit tracking. Specifically, the only time we
ignore it is when initializing the filesystem and snapshot limit values;
when creating a new %recv dataset or deleting one, we always update
the bookkeeping. This causes a problem if you init the fs count on a
filesystem that already has a %recv dataset, since the bookkeeping
will be decremented but not incremented. This is resolved in this
patch by simply always tracking the %recv dataset as a child.
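
The off-by-one described above can be reproduced with a toy model in Python. The function names and the list-of-children representation are hypothetical; the sketch only mirrors the counting rule, not the dsl_dir code.

```python
def init_fs_count(children, skip_recv):
    """Initial filesystem count; skip_recv models the old behaviour of
    ignoring the temporary %recv child only at initialization."""
    return sum(1 for c in children if not (skip_recv and c == "%recv"))


def count_after_recv_destroyed(children, skip_recv):
    """Count after the %recv child is destroyed, which always decrements."""
    count = init_fs_count(children, skip_recv)
    if "%recv" in children:
        count -= 1
    return count
```

For children `["a", "b", "%recv"]`, skipping %recv at initialization leaves a count of 1 after the destroy even though two filesystems remain; always tracking it yields 2.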

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10791

3 years ago Fix broken bug report form
Kjeld Schouten-Lebbing [Wed, 26 Aug 2020 17:49:51 +0000 (19:49 +0200)]
Fix broken bug report form

The previous PR accidentally broke the bug report form.
This commit fixes it
(and has actually been tested to work).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10821

3 years ago Remove pragma ident lines
Toomas Soome [Wed, 26 Aug 2020 17:35:50 +0000 (20:35 +0300)]
Remove pragma ident lines

The #pragma ident is a historical relic and is no longer needed.  This
pragma is unknown to common compilers and only causes
trouble.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10810

3 years ago FreeBSD: disable neon usage
Matthew Macy [Wed, 26 Aug 2020 16:54:37 +0000 (09:54 -0700)]
FreeBSD: disable neon usage

The neon support code does not build on FreeBSD.
Ifdef out references to it to fix linker issues on arm64.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10809

3 years ago Github CI: Enable checkbashism
George Melikov [Wed, 26 Aug 2020 16:52:28 +0000 (19:52 +0300)]
Github CI: Enable checkbashism

Run checkbashisms on checkstyle too.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10811

3 years ago StaleBot Tweaks
Kjeld Schouten-Lebbing [Wed, 26 Aug 2020 16:49:58 +0000 (18:49 +0200)]
StaleBot Tweaks

- Add Status: Triage Needed to bug reports

Currently "Type: Defect" is auto added.
Adding a triage tag makes sure all issues are reviewed by a maintainer.
It also opens up some options to prioritize defects in the near future.

- Prevent future StaleBot Spam

StaleBot will limit itself to 6 actions per hour.
This should prevent future floods of StaleBot activity
(aka Spam).

- StaleBot: Ignore issues that are being worked on

Ignore the following Issues:
- tagged: "Status: Work in Progress"
- Having a maintainer assigned
- Being part of a project
- Having a milestone tag

- Rename Ignore "Type: Understood" to "Bot: Not Stale"

This commit changes the general ignore tag for StaleBot from:
"Type: Understood"
to
"Bot: Not Stale"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10813

3 years ago Introduce limit on size of L2ARC headers
Alexander Motin [Tue, 25 Aug 2020 21:33:36 +0000 (17:33 -0400)]
Introduce limit on size of L2ARC headers

Since L2ARC buffers are not evicted on memory pressure, too many
headers on a system with an irrationally large L2ARC can render
it slow or even unusable.  This change limits L2ARC writes and
rebuild if unevictable L2ARC-only headers reach a dangerous level.

While there, call arc_adapt() on L2ARC rebuild, so that it could
properly grow arc_c, reflecting potentially significant ARC size
increase and avoiding slow growth with hopeless eviction attempts
later when "overflow" is detected.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10765

3 years ago Tag 2.0.0-rc1
Brian Behlendorf [Tue, 25 Aug 2020 18:48:28 +0000 (11:48 -0700)]
Tag 2.0.0-rc1

New features:
- Unified code base for Linux and FreeBSD
- Redacted 'zfs send/recv'
- Persistent L2ARC
- Sequential resilvering
- ZSTD Compression
- Log spacemaps
- Fast clone deletion
- Sectional zfs/zpool man pages
- Added 'zpool wait' subcommand
- Improved 'zfs share' scalability
- Improved AES-GCM encryption performance

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

3 years ago Don't assert on nvlists larger than SPA_MAXBLOCKSIZE
Allan Jude [Tue, 25 Aug 2020 18:04:20 +0000 (14:04 -0400)]
Don't assert on nvlists larger than SPA_MAXBLOCKSIZE

Originally we asserted that all reads are less than SPA_MAXBLOCKSIZE.
However, nvlists are not ZFS records, and are not limited to
SPA_MAXBLOCKSIZE.

Add a new environment variable, ZFS_SENDRECV_MAX_NVLIST, to allow the
user to specify the maximum size of the nvlist that can be sent or
received.
Default value: 4 * SPA_MAXBLOCKSIZE (64 MB)

Modify libzfs send routines to return a useful error if the send stream
will generate an nvlist that is beyond the maximum size.

Modify libzfs recv routines to add an explicit error message if the
nvlist is too large, rather than abort()ing.

Change the assert() so that it only triggers on data records.
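
The limit lookup can be sketched in Python as follows. The variable name and the default come from the text above and SPA_MAXBLOCKSIZE is 16 MiB; how libzfs actually parses the value is not stated here, so the parsing rule (a plain positive byte count) and both function names are assumptions.

```python
import os

SPA_MAXBLOCKSIZE = 1 << 24                 # 16 MiB
DEFAULT_MAX_NVLIST = 4 * SPA_MAXBLOCKSIZE  # 64 MiB, the default stated above


def max_nvlist_size(environ=os.environ):
    """Return the nvlist size limit, falling back to the default.

    Assumption: the variable holds a plain positive byte count; anything
    else is ignored in this sketch.
    """
    raw = environ.get("ZFS_SENDRECV_MAX_NVLIST", "")
    if raw.isdecimal() and int(raw) > 0:
        return int(raw)
    return DEFAULT_MAX_NVLIST


def check_nvlist(size, environ=os.environ):
    """Raise a descriptive error instead of aborting on oversized nvlists."""
    limit = max_nvlist_size(environ)
    if size > limit:
        raise ValueError(f"nvlist of {size} bytes exceeds limit of {limit} bytes")
    return size
```

Raising an error with both sizes in the message mirrors the intent of replacing an abort() with an explicit diagnostic.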

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #9616

3 years ago Mark lua setjmp/longjmp for powerpc weak
sterlingjensen [Tue, 25 Aug 2020 17:32:49 +0000 (12:32 -0500)]
Mark lua setjmp/longjmp for powerpc weak

Linux already defines setjmp/longjmp for powerpc, which leads to
duplicate symbols in a statically linked build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterlng Jensen <sterlingjensen@users.noreply.github.com>
Closes #10795

3 years ago Export dmu_offset_next() symbol
Brian Behlendorf [Tue, 25 Aug 2020 15:34:41 +0000 (08:34 -0700)]
Export dmu_offset_next() symbol

Export the dmu_offset_next() symbol for use by Lustre.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10796

3 years ago man: Canonicalize .TH usage
Ryan Moeller [Tue, 25 Aug 2020 04:25:28 +0000 (00:25 -0400)]
man: Canonicalize .TH usage

* Use all caps for document title.
* Remove section name as it can be inferred from the section number.
* Name "OpenZFS" as the document source.
* Bump modification date.

While here, fixed trailing whitespace reported by igor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10792

3 years ago Fix inability to destroy snapshot used over NFS
youzhongyang [Tue, 25 Aug 2020 00:33:02 +0000 (20:33 -0400)]
Fix inability to destroy snapshot used over NFS

The cache of struct svc_export and struct svc_expkey by nfsd and
rpc.mountd for the snapshot holds references to the mount point.
We need to flush them out before unmounting, otherwise umount
would fail with EBUSY.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #6000
Closes #10783

3 years ago Avoid symbol collision with in-kernel zstdlib
Sebastian Gottschall [Mon, 24 Aug 2020 19:20:41 +0000 (21:20 +0200)]
Avoid symbol collision with in-kernel zstdlib

For Linux, when ZFS is compiled statically into the kernel
and the in-kernel zstd library is also compiled statically into the kernel,
a symbol collision will occur.  This wrapper header renames all
of the relevant zstd functions to avoid this problem.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #10775

3 years ago Add Stale-bot
Kjeld Schouten-Lebbing [Mon, 24 Aug 2020 19:04:38 +0000 (21:04 +0200)]
Add Stale-bot

This file configures the following stale-bot:
https://github.com/apps/stale

It is set to mark issues as "Stale" after 365 days.
It is also set to auto-close the issue 90 days after that.

Please be aware that this also requires
the listed stale-bot to be added to the repo.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10778

3 years ago Appease GCC sprintf warnings found on Fedora 32/GCC 10.0.1
Chris McDonough [Mon, 24 Aug 2020 17:32:59 +0000 (13:32 -0400)]
Appease GCC sprintf warnings found on Fedora 32/GCC 10.0.1

Increase the size of DDT_NAMELEN and MNT_LINE_MAX to appease GCC
snprintf truncation warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #10712
Closes #10766

4 years ago ZTS: Improve block_device_wait on FreeBSD
Ryan Moeller [Mon, 24 Aug 2020 15:50:15 +0000 (11:50 -0400)]
ZTS: Improve block_device_wait on FreeBSD

FreeBSD doesn't have an equivalent to udevadm settle, so we have been
resorting to a three second sleep to wait for device changes to take
effect.  This is far from ideal.

We are mainly waiting for volmode=geom zvols to appear in /dev, so as
a hack, reading the geom config will have the desired effect of
quiescing the geom state.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10768

4 years ago Improve documentation of zpool import -d/-c vs -s
Chris McDonough [Mon, 24 Aug 2020 04:18:30 +0000 (00:18 -0400)]
Improve documentation of zpool import -d/-c vs -s

Specify that, by default, zpool import uses the libblkid
cache on Linux and geom on FreeBSD, and only scans when
-d/-s is provided.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #7656
Closes #10771

4 years ago CI checkstyle: add linter + rename job + install latest flake8
George Melikov [Mon, 24 Aug 2020 04:15:25 +0000 (07:15 +0300)]
CI checkstyle: add linter + rename job + install latest flake8

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10784

4 years ago ZFS performance tests should clean up NFS mount
Tony Nguyen [Sun, 23 Aug 2020 22:14:22 +0000 (16:14 -0600)]
ZFS performance tests should clean up NFS mount

This change unmounts the client's NFS mount after each test so we can avoid
two sporadic issues:
1) a stale NFS mount on the client and
2) zpool export and zpool destroy failing because the dataset is busy

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #10767

4 years ago Add separate issue for questions
Kjeld Schouten-Lebbing [Sun, 23 Aug 2020 22:13:07 +0000 (00:13 +0200)]
Add separate issue for questions

A big portion of issues are of "Type: Question".

This PR adds a separate issue template for those.
It also automatically adds the "Type: Question" tag.

In addition, it adds "Type: Defect" to all bug reports by default.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10779

4 years ago libzstd: Don't warn about stack frame size in userspace
Ryan Moeller [Sun, 23 Aug 2020 18:13:34 +0000 (14:13 -0400)]
libzstd: Don't warn about stack frame size in userspace

With the current way CFLAGS are modified in libzstd, CFLAGS passed on
the make command line will cause the CFLAGS in the Makefile for zstd.c
to be discarded, but not AM_CFLAGS.  This causes a smaller frame size
limit to be used, and the build fails.

We don't need to worry about stack frame sizes in userspace.  Drop the
extra flags.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10773

4 years ago Prevent zfs_acl_chmod() if aclmode restricted and ACL inherited
Andrew [Sun, 23 Aug 2020 04:49:25 +0000 (00:49 -0400)]
Prevent zfs_acl_chmod() if aclmode restricted and ACL inherited

In the absence of an inheriting entry for owner@, group@, or everyone@,
zfs_acl_chmod() is called to set these. This can cause confusion for Samba
admins who do not expect these entries to appear on newly created files and
directories once they have been stripped from the parent directory.

When aclmode is set to "restricted", chmod is prevented on non-trivial ACLs.
It is not a stretch to assume that in this case the administrator does not want
ZFS to add the missing special entries. Add a check for this aclmode and, if an
inherited entry is present, skip zfs_acl_chmod().
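
The added condition reduces to a single predicate. The Python sketch below models only that decision; the function name and arguments are hypothetical and do not correspond to the C signature of zfs_acl_chmod() or its caller.

```python
def should_call_acl_chmod(aclmode, has_inherited_entries):
    """Decide whether the special entries get added on create.

    Returns False only in the case the commit targets: aclmode is
    "restricted" and the new object inherited at least one ACL entry.
    """
    return not (aclmode == "restricted" and has_inherited_entries)
```

Every other combination keeps the previous behaviour, so datasets that do not use aclmode=restricted are unaffected in this model.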

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #10748

4 years ago Update issue template
Kjeld Schouten-Lebbing [Sun, 23 Aug 2020 04:41:01 +0000 (06:41 +0200)]
Update issue template

Github has started using a new issue templating structure.
This commit moves the current template and adds one additional one.

- Moves issue template to new issue-template folder
- Adds feature request template
- Removes the warning shown when viewing the issue template

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10759

4 years ago ZTS: Remove leftover variable names
Ryan Moeller [Sat, 22 Aug 2020 18:05:59 +0000 (14:05 -0400)]
ZTS: Remove leftover variable names

These were overlooked when use of `local` was removed to satisfy
checkbashisms.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10762

4 years ago Remove vestigial settings related to initramfs
Chris McDonough [Sat, 22 Aug 2020 18:04:49 +0000 (14:04 -0400)]
Remove vestigial settings related to initramfs

Remove ZFS_POOL_IMPORT, ZFS_INITRD_PRE_MOUNTROOT_SLEEP,
ZFS_INITRD_POST_MODPROBE_SLEEP, and ZFS_INITRD_ADDITIONAL_DATASETS
features from etc/defaults/zfs.in.  These features no longer work.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #9126
Closes #10757

4 years ago Make formatting of dedup values string consistent
Clint Armstrong [Sat, 22 Aug 2020 17:58:07 +0000 (13:58 -0400)]
Make formatting of dedup values string consistent

All other prop values return options separated by ` | `.
The dedup values do not; they are separated by `, `. This change
makes the dedup value formatting consistent with other properties.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Clint Armstrong <clint@clintarmstrong.net>
Closes #10761

4 years ago Import vdev ashift optimization from FreeBSD
Ryan Moeller [Fri, 21 Aug 2020 19:53:17 +0000 (15:53 -0400)]
Import vdev ashift optimization from FreeBSD

Many modern devices use physical allocation units that are much
larger than the minimum logical allocation size accessible by
external commands. Two prevalent examples of this are 512e disk
drives (512b logical sector, 4K physical sector) and flash devices
(512b logical sector, 4K or larger allocation block size, and 128k
or larger erase block size). Operations that modify less than the
physical sector size result in a costly read-modify-write or garbage
collection sequence on these devices.

Simply exporting the true physical sector of the device to ZFS would
yield optimal performance, but has two serious drawbacks:

 1. Existing pools created with devices that have different logical
    and physical block sizes, but were configured to use the logical
    block size (e.g. because the OS version used for pool construction
    reported the logical block size instead of the physical block
    size) will suddenly find that the vdev allocation size has
    increased. This can be easily tolerated for active members of
    the array, but ZFS would prevent replacement of a vdev with
    another identical device because it now appears that the smaller
    allocation size required by the pool is not supported by the new
    device.

 2. The device's physical block size may be too large to be supported
    by ZFS. The optimal allocation size for the vdev may be quite
    large. For example, a RAID controller may export a vdev that
    requires read-modify-write cycles unless accessed using 64k
    aligned/sized requests. ZFS currently has an 8k minimum block
    size limit.

Reporting both the logical and physical allocation sizes for vdevs
solves these problems. A device may be used so long as the logical
block size is compatible with the configuration. By comparing the
logical and physical block sizes, new configurations can be optimized
and administrators can be notified of any existing pools that are
sub-optimal.
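
The comparison of logical and physical block sizes can be sketched in Python. This is a model under stated assumptions: the function names are hypothetical, and the cap of 13 (8k) simply mirrors the limit mentioned above rather than a constant taken from the code.

```python
ASHIFT_CAP = 13  # hypothetical upper bound (8k) used only in this sketch


def ashift(block_size):
    """Return log2 of a power-of-two block size, e.g. 512 -> 9."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of two")
    return block_size.bit_length() - 1


def choose_ashift(logical, physical, cap=ASHIFT_CAP):
    """Prefer the physical size for a new vdev, but never go below the
    logical size and never above the cap unless the logical size needs it."""
    return max(ashift(logical), min(ashift(physical), cap))


def can_join(pool_ashift, logical):
    """A device is usable if its logical block size fits the pool's ashift."""
    return ashift(logical) <= pool_ashift
```

A 512e drive (512-byte logical, 4K physical) gets ashift 12 in a new configuration, yet it can still replace a device in an existing ashift=9 pool because only its logical size is checked.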

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Macy <mmacy@freebsd.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10619

4 years ago Remove hard coded "Linux" OS from manpages
Ryan Moeller [Fri, 21 Aug 2020 18:55:47 +0000 (14:55 -0400)]
Remove hard coded "Linux" OS from manpages

The recommended practice for `.Os` on FreeBSD is to not specify any
arguments.  The correct OS name is used automatically.

Oddly enough, on the Linux distro I tested this on (CentOS 7), the man
pager defaulted to displaying "BSD" as the OS rather than "Linux".  To
accommodate this, tack " Linux" back on in an install hook on Linux.
This is much simpler than removing it for FreeBSD when vendored in the
base system.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10760

4 years ago Silence 'make checkbashisms'
Brian Behlendorf [Thu, 20 Aug 2020 20:45:47 +0000 (13:45 -0700)]
Silence 'make checkbashisms'

Commit d2bce6d03 added the 'make checkbashisms' target but did not
resolve all of the bashisms in the scripts.  This commit doesn't
resolve them all either but it does fix up a few, and it excludes
the others so 'make checkstyle' no longer prints warnings.  It's
a small step in the right direction.

* Dracut is Linux specific and itself depends on bash.  Therefore
  all dracut support scripts can be bash specific, update their
  shebang accordingly.

* zed-functions.sh, zfs-import, zfs-mount, zfs-zed, smart,
  paxcheck.sh, make_gitrev.sh - these scripts were excluded from
  the check until they can be updated and properly tested.

* zfsunlock - only whole values for sleep are allowed.

* vdev_id - removed unneeded locals; use && instead of -a.

* dkms.mkconf, dkms.postbuild - use || instead of -o.

Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10755

4 years ago 'zfs share -a' should clean noauto exports
Don Brady [Thu, 20 Aug 2020 20:12:12 +0000 (14:12 -0600)]
'zfs share -a' should clean noauto exports

This is a follow-on to PR #10688 where `zfs share -a` allows the
sharing of canmount=noauto datasets if they are mounted.  However,
when a dataset with canmount=noauto is not mounted, the command
should also purge any existing entries from the exports file.
Otherwise, after a reboot, the nfs server attempts to export the
underlying mountpath, not the dataset. This can lead to a hard hang
for existing client mounts.

Instead of just skipping the adding of an export if not mounted
and canmount=noauto, have it also remove an existing export of the
dataset so that, after a reboot, we don't export an unmounted dataset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10747

4 years agoFix indentation in dnode_free_range()
Matthew Ahrens [Thu, 20 Aug 2020 18:45:20 +0000 (11:45 -0700)]
Fix indentation in dnode_free_range()

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10744

4 years agoFreeBSD: 11.x arc_stats compatibility
Matthew Macy [Thu, 20 Aug 2020 17:55:02 +0000 (10:55 -0700)]
FreeBSD: 11.x arc_stats compatibility

Removing other_size from arc_stats breaks top in 11.x jails
running on HEAD.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10745

4 years agoAdd zstd support to zfs
Michael Niewöhner [Tue, 18 Aug 2020 17:10:17 +0000 (19:10 +0200)]
Add zstd support to zfs

This PR adds two new compression types, based on ZStandard:

- zstd: A basic ZStandard compression algorithm. Available compression
  levels for zstd are zstd-1 through zstd-19, where the compression
  increases with every level, but speed decreases.

- zstd-fast: A faster version of the ZStandard compression algorithm.
  zstd-fast is basically a "negative" level of zstd. The compression
  decreases with every level, but speed increases.

  Available compression levels for zstd-fast:
   - zstd-fast-1 through zstd-fast-10
   - zstd-fast-20 through zstd-fast-100 (in increments of 10)
   - zstd-fast-500 and zstd-fast-1000
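
The level list above can be enumerated with a short sketch. The helper name
`zstd_fast_levels` is made up for this example; only the level ranges come
from the text.

```python
def zstd_fast_levels():
    """Return the numeric zstd-fast levels listed in the commit message."""
    levels = list(range(1, 11))          # zstd-fast-1 through zstd-fast-10
    levels += list(range(20, 101, 10))   # zstd-fast-20 through zstd-fast-100
    levels += [500, 1000]                # the two largest levels
    return levels


# Property values as they would be spelled on the command line.
names = ["zstd-fast-%d" % n for n in zstd_fast_levels()]
```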

For more information check the man page.

Implementation details:

Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.

The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers).  The upper bits are used to store the compression level.
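
A minimal sketch of that bit layout, assuming the level sits directly above
the 7 algorithm bits. The helper names and the algorithm number used in the
usage note are placeholders, not values taken from the ZFS headers.

```python
ALGO_BITS = 7                       # low bits hold the algorithm (from the text)
ALGO_MASK = (1 << ALGO_BITS) - 1    # 0x7f


def pack_compress_prop(algo, level):
    """Combine algorithm and level into one property value (hypothetical helper)."""
    return (level << ALGO_BITS) | (algo & ALGO_MASK)


def unpack_compress_prop(value):
    """Split a combined property value back into (algorithm, level)."""
    return value & ALGO_MASK, value >> ALGO_BITS
```

For example, with a placeholder algorithm number of 16 and level 3,
`pack_compress_prop(16, 3)` yields 400, and `unpack_compress_prop(400)`
returns the original pair.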

It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc.), the
same parameters will be used to result in the matching checksum.

All of the internal ZFS code (`arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables.  Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).

The userspace tools all use the combined/bit-shifted value.

Additional notes:

zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.

ZSTD is included with all current tests and new tests are added
as-needed.

Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born.  This is currently only used by zstd but can be
extended as needed.

Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278

4 years agoImport ZStandard v1.4.5
Michael Niewöhner [Tue, 18 Aug 2020 17:10:10 +0000 (19:10 +0200)]
Import ZStandard v1.4.5

ZStandard is a modern, high performance, general compression algorithm.
It provides similar or better compression levels to GZIP, but with much
better performance. ZStandard provides a large selection of compression
levels to allow a storage administrator to select the preferred
performance/compression trade-off.

This commit imports the unmodified ZStandard single-file library which
will be used by ZFS.

The implementation of this new library is done with future updates of
zstd in mind. For this reason we integrated the code in a way that does
not require modifications to the library. For more details, see
`module/zstd/README.md`.

The library is excluded from codecov calculation and cppcheck as
unaltered dependencies do not need full codecov or cppcheck.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

4 years agoLinux 5.7 compat: Include linux/sched.h in spl/sys/mutex.h
Pavel Snajdr [Thu, 20 Aug 2020 04:37:38 +0000 (06:37 +0200)]
Linux 5.7 compat: Include linux/sched.h in spl/sys/mutex.h

struct task_struct is needed for lockdep_off() in mutex.h

This has popped up after e616cb8daadf (in linux-5.7-rc7).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10741

4 years agoFreeBSD: Add option to rewind checkpoint while importing root pool
Mariusz Zaborski [Thu, 20 Aug 2020 00:19:42 +0000 (02:19 +0200)]
FreeBSD: Add option to rewind checkpoint while importing root pool

This option is used by FreeBSD boot loader.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #10738

4 years agoZED: Do not offline a missing device if no spare is available
Brian Behlendorf [Wed, 19 Aug 2020 05:13:17 +0000 (22:13 -0700)]
ZED: Do not offline a missing device if no spare is available

Due to commit d48091d a removed device is now explicitly offlined by
the ZED if no spare is available, rather than letting ZFS detect
it as UNAVAIL. This broke auto-replacing of whole-disk devices, as
described in issue #10577.  In short, when a new device is reinserted
in the same slot, the ZED will try to ONLINE it without letting ZFS
recreate the necessary partition table.

This change simply avoids setting the device OFFLINE when removed if
no spare is available (or if spare_on_remove is false).  This change
has been left minimal to allow it to be backported to the 0.8.x release.
The auto_offline_001_pos ZTS test has been updated accordingly.

Some follow up work is planned to update the ZED so it transitions
the vdev to a REMOVED state.  This is a state which has always
existed but there is no current interface the ZED can use to
accomplish this.  Therefore it's being left to a follow up PR.

Reviewed-by: Gionatan Danti <g.danti@assyoma.it>
Co-authored-by: Gionatan Danti <g.danti@assyoma.it>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10577
Closes #10730

4 years agoFix ARC aggsum access after arc_state_fini()
Brian Behlendorf [Wed, 19 Aug 2020 05:11:34 +0000 (22:11 -0700)]
Fix ARC aggsum access after arc_state_fini()

Commit 85ec5cbae updated abd_update_scatter_stats() such that it
calls arc_space_consume() and arc_space_return() when updating the
scatter stats.  This requires that the global aggsum value for the
ARC be initialized.  Normally this is not an issue, however during
module unload the l2arc_do_free_on_write() function was called in
l2arc_cleanup() after arc_state_fini() destroyed the aggsum values.
We can resolve this issue by performing l2arc_do_free_on_write()
slightly earlier in arc_fini().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10739

4 years agolibzfs_core: Initialize fail_ioc_cmd to ZFS_IOC_LAST
Ryan Moeller [Wed, 19 Aug 2020 01:07:43 +0000 (21:07 -0400)]
libzfs_core: Initialize fail_ioc_cmd to ZFS_IOC_LAST

FreeBSD numbers `ZFS_IOC_*` starting at 0, so pick a different
sentinel value to avoid unintentionally messing with
`ZFS_IOC_POOL_CREATE` ioctls.
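
To see why 0 is a poor sentinel here, consider this sketch. The enum is a
made-up three-entry subset numbered from 0, and `should_fail` is a
hypothetical stand-in for the fault-injection check.

```python
from enum import IntEnum


class Ioc(IntEnum):
    """Hypothetical subset of ioctl numbers, starting at 0 as on FreeBSD."""
    POOL_CREATE = 0
    POOL_DESTROY = 1
    LAST = 2          # one past the final real command


def should_fail(cmd, fail_ioc_cmd):
    """Return True when the injected failure targets this command."""
    return cmd == fail_ioc_cmd


# A zero-initialized sentinel collides with the first real command.
collides = should_fail(Ioc.POOL_CREATE, 0)
# Using LAST as the sentinel matches no real command.
safe = not any(should_fail(c, Ioc.LAST) for c in (Ioc.POOL_CREATE, Ioc.POOL_DESTROY))
```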

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10729

4 years agoFreeBSD: Fix UNIX permissions checking
Matthew Macy [Tue, 18 Aug 2020 16:57:07 +0000 (09:57 -0700)]
FreeBSD: Fix UNIX permissions checking

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10727

4 years agoAdd define to enable autotrim to default to on
Matthew Macy [Tue, 18 Aug 2020 16:52:30 +0000 (09:52 -0700)]
Add define to enable autotrim to default to on

In FreeBSD trim has defaulted to on for several
years. In order to minimize POLA violations on
import it's important to maintain this default
when importing vendored openzfs into FreeBSD
base.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10719

4 years agoMake zc_nvlist_src_size limit tunable
Ryan Moeller [Tue, 18 Aug 2020 16:33:55 +0000 (12:33 -0400)]
Make zc_nvlist_src_size limit tunable

We limit the size of nvlists passed to the kernel so a user cannot make
the kernel do an unreasonably large allocation.  On FreeBSD this limit
was 128 kiB, which turns out to be a bit too small when doing some
operations involving a large number of datasets or snapshots, for
example replication.

Make this limit tunable, with a platform-specific auto default.
Linux keeps its limit at KMALLOC_MAX_SIZE. FreeBSD uses 1/4 of the
system limit on user wired memory, which allows it to scale depending
on system configuration.
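
The platform defaults described above amount to the following sketch. The
function and parameter names are assumptions for illustration; the real
values come from the kernel, not from arguments.

```python
def default_nvlist_src_size_limit(platform, kmalloc_max_size, user_wired_limit):
    """Pick the automatic limit for zc_nvlist_src_size (hypothetical helper).

    platform          -- "linux" or "freebsd"
    kmalloc_max_size  -- stand-in for Linux's KMALLOC_MAX_SIZE, in bytes
    user_wired_limit  -- stand-in for FreeBSD's user wired memory limit, in bytes
    """
    if platform == "linux":
        return kmalloc_max_size
    if platform == "freebsd":
        # One quarter of the wired-memory limit, so it scales with the system.
        return user_wired_limit // 4
    raise ValueError("unknown platform: %r" % platform)
```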

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Issue #6572
Closes #10706

4 years agoRemove unused `zpool_is_bootable`
George Melikov [Tue, 18 Aug 2020 16:30:12 +0000 (19:30 +0300)]
Remove unused `zpool_is_bootable`

Otherwise the compiler errors with:

```
libzfs_pool.c:449:1: error: 'zpool_is_bootable'
 defined but not used [-Werror=unused-function]
```

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10734

4 years agoRemove GRUB restrictions
Richard Laager [Tue, 18 Aug 2020 06:12:39 +0000 (01:12 -0500)]
Remove GRUB restrictions

The GRUB restrictions are based around the pool's bootfs property.
Given the current situation where GRUB is not staying current with
OpenZFS pool features, having either a non-ZFS /boot or a separate
pool with limited features are pretty much the only long-term answers
for GRUB support.  Only the second case matters in this context.  For
the restrictions to be useful, the bootfs property would have to be set
on the boot pool, because that is where we need the restrictions, as
that is the pool that GRUB reads from. The documentation for bootfs
describes it as pointing to the root pool. That's also how it's used in
the initramfs. ZFS does not allow setting bootfs to point to a dataset
in another pool. (If it did, it'd be difficult-to-impossible to enforce
these restrictions cross-pool). Accordingly, bootfs is pretty much
useless for GRUB scenarios moving forward.

Even for users who have only one pool, the existing restrictions for
GRUB are incomplete. They don't prevent you from enabling the
unsupported checksums, for example. For that reason, I have ripped out
all the GRUB restrictions.

A little longer-term, I think extending the proposed features=portable
system to define a features=grub is a much more useful approach. The
user could set that on the boot pool at creation, and things would
Just Work.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #8627

4 years agoZTS: ztest may cause mmp tests failures
Brian Behlendorf [Tue, 18 Aug 2020 05:31:18 +0000 (22:31 -0700)]
ZTS: ztest may cause mmp tests failures

The mmp_exported_import and mmp_inactive_import tests depend on
ztest simulating an active pool.  If ztest unexpectedly terminates
due to an unrelated issue the test case will fail.  Since ztest is
not yet 100% reliable I've added these tests to the maybe exception
list.  They can be removed when the issues with ztest are resolved
or if the test cases are updated to handle these unexpected failures.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10726

4 years agoInclude scatter_chunk_waste in arc_size
Matthew Ahrens [Tue, 18 Aug 2020 03:04:04 +0000 (20:04 -0700)]
Include scatter_chunk_waste in arc_size

The ARC caches data in scatter ABD's, which are collections of pages,
which are typically 4K.  Therefore, the space used to cache each block
is rounded up to a multiple of 4K.  The ABD subsystem tracks this wasted
memory in the `scatter_chunk_waste` kstat.  However, the ARC's `size` is
not aware of the memory used by this round-up, it only accounts for the
size that it requested from the ABD subsystem.

Therefore, the ARC is effectively using more memory than it is aware of,
due to the `scatter_chunk_waste`.  This impacts observability, e.g.
`arcstat` will show that the ARC is using less memory than it
effectively is.  It also impacts how the ARC responds to memory
pressure.  As the amount of `scatter_chunk_waste` changes, it appears to
the ARC as memory pressure, so it needs to resize `arc_c`.

If the sector size (`1<<ashift`) is the same as the page size (or
larger), there won't be any waste.  If the (compressed) block size is
relatively large compared to the page size, the amount of
`scatter_chunk_waste` will be small, so the problematic effects are
minimal.

However, if using 512B sectors (`ashift=9`), and the (compressed) block
size is small (e.g. `compression=on` with the default `volblocksize=8k`
or a decreased `recordsize`), the amount of `scatter_chunk_waste` can be
very large.  On a production system, with `arc_size` at a constant 50%
of memory, `scatter_chunk_waste` has been observed to be 10-30% of
memory.
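
A simplified model of the round-up, assuming 4K pages and assuming that the
size the ARC requests is the block size rounded up to the sector size. The
helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: typical 4K pages


def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple


def scatter_chunk_waste(psize, ashift):
    """Bytes held by the ABD beyond what the ARC asked for (simplified)."""
    requested = round_up(psize, 1 << ashift)   # what the ARC accounts for
    backing = round_up(requested, PAGE_SIZE)   # whole pages actually used
    return backing - requested
```

Under this model, a block that compresses to 2560 bytes wastes 1536 bytes
with `ashift=9` and nothing with `ashift=12`.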

This commit adds `scatter_chunk_waste` to `arc_size`, and adds a new
`waste` field to `arcstat`.  As a result, the ARC's memory usage is more
observable, and `arc_c` does not need to be adjusted as frequently.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10701

4 years agoRemove KMC_KMEM and KMC_VMEM
Matthew Ahrens [Mon, 17 Aug 2020 23:04:28 +0000 (16:04 -0700)]
Remove KMC_KMEM and KMC_VMEM

`KMC_KMEM` and `KMC_VMEM` are now unused since all SPL-implemented
caches are `KMC_KVMEM`.

KMC_KMEM: Given the default value of `spl_kmem_cache_kmem_limit`, we
don't use kmalloc to back the SPL caches, instead we use kvmalloc
(KMC_KVMEM).  The flag, module parameter, /proc entries, and associated
code are removed.

KMC_VMEM: This flag is not used, and kvmalloc() is always preferable to
vmalloc().  The flag, /proc entries, and associated code are removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10673

4 years agoFreeBSD: fix the build with Clang 11
Ryan Moeller [Mon, 17 Aug 2020 22:40:17 +0000 (18:40 -0400)]
FreeBSD: fix the build with Clang 11

* Cast void * to uintptr_t before casting to boolean_t.

* Avoid clashing definition of __asm when not on Linux to
  prevent duplicate __volatile__. This was already done in
  some places but not all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10723

4 years agoFreeBSD: fix merge error in zfs_acl_ids_create
Matthew Macy [Mon, 17 Aug 2020 22:28:03 +0000 (15:28 -0700)]
FreeBSD: fix merge error in zfs_acl_ids_create

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10721

4 years agoFix typo in btree.c
Serapheim Dimitropoulos [Mon, 17 Aug 2020 22:25:37 +0000 (15:25 -0700)]
Fix typo in btree.c

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10725

4 years agoFreeBSD: fallback to /boot/ to look for zpool.cache
Matthew Macy [Mon, 17 Aug 2020 21:43:47 +0000 (14:43 -0700)]
FreeBSD: fallback to /boot/ to look for zpool.cache

Up until now zpool.cache has always lived in /boot on FreeBSD.
For the sake of compatibility fall back to /boot if zpool.cache
isn't found in /etc/zfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10720

4 years agoFix reporting of L2ARC writes in arc_summary3
George Amanakis [Mon, 17 Aug 2020 18:04:06 +0000 (14:04 -0400)]
Fix reporting of L2ARC writes in arc_summary3

arc_summary3 reports L2ARC writes in bytes. However, the related
arc_stat is reported as hits. arc_summary2 reports this correctly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10717

4 years agoFix l2arc_dev_rebuild_start thread name
Ryan Moeller [Mon, 17 Aug 2020 18:02:32 +0000 (14:02 -0400)]
Fix l2arc_dev_rebuild_start thread name

`thread_create` on FreeBSD stringifies the argument passed as the
thread function to create a name for the thread. The thread name for
`l2arc_dev_rebuild_start` ended up with `(void (*)(void *))` in it.

Change the type signature so the function does not need to be cast
when creating the thread.  Rename the function to
`l2arc_dev_rebuild_thread` for clarity and consistency, as well.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10716

4 years agoFreeBSD: Create taskq threads in appropriate proc
Ryan Moeller [Mon, 17 Aug 2020 18:01:19 +0000 (14:01 -0400)]
FreeBSD: Create taskq threads in appropriate proc

Stepping stone toward re-enabling spa_thread on FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10715

4 years agoFix L2ARC reads when compressed ARC disabled
Allan Jude [Fri, 14 Aug 2020 06:31:20 +0000 (02:31 -0400)]
Fix L2ARC reads when compressed ARC disabled

When reading compressed blocks from the L2ARC, with
compressed ARC disabled, arc_hdr_size() returns
LSIZE rather than PSIZE, but the actual read is PSIZE.
This causes l2arc_read_done() to compare the checksum
against the wrong size, resulting in checksum failure.

This manifests as an increase in the kstat l2_cksum_bad
and the read being retried from the main pool, making the
L2ARC ineffective.

Add new L2ARC tests with Compressed ARC enabled/disabled

Blocks are handled differently depending on the state of the
zfs_compressed_arc_enabled tunable.

If a block is compressed on-disk, and compressed_arc is enabled:
- the block is read from disk
- It is NOT decompressed
- It is added to the ARC in its compressed form
- l2arc_write_buffers() may write it to the L2ARC (as is)
- l2arc_read_done() compares the checksum to the BP (compressed)

However, if compressed_arc is disabled:
- the block is read from disk
- It is decompressed
- It is added to the ARC (uncompressed)
- l2arc_write_buffers() will use l2arc_apply_transforms() to
  recompress the block, before writing it to the L2ARC
- l2arc_read_done() compares the checksum to the BP (compressed)
- l2arc_read_done() will use l2arc_untransform() to uncompress it
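
The size mismatch behind the checksum failures can be reproduced in
miniature. This is a sketch under stated assumptions: `zlib` and SHA-256
stand in for the ZFS compression and checksum functions, and
`l2arc_read_ok` is a hypothetical helper, not the real l2arc_read_done().

```python
import hashlib
import zlib


def checksum(buf):
    """Stand-in checksum (SHA-256 here; ZFS uses its own algorithms)."""
    return hashlib.sha256(buf).hexdigest()


logical = b"zfs" * 1000                 # uncompressed block contents
physical = zlib.compress(logical)       # what is stored on the cache device
lsize, psize = len(logical), len(physical)
bp_checksum = checksum(physical)        # the BP checksum covers PSIZE bytes

# Buffer sized for LSIZE, but only PSIZE bytes were actually read into it.
read_buf = physical.ljust(lsize, b"\0")


def l2arc_read_ok(buf, verify_size):
    """Verify the first verify_size bytes against the BP checksum."""
    return checksum(buf[:verify_size]) == bp_checksum
```

Verifying over `psize` succeeds, while verifying over `lsize` (the size
reported when compressed ARC is disabled) fails even though the data is
intact.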

This test writes out a test file to a pool consisting of one disk
and one cache device, then randomly reads from it. Since the arc_max
in the tests is low, this will feed the L2ARC, and result in reads
from the L2ARC.

We compare the value of the kstat l2_cksum_bad before and after
to determine if any blocks failed to survive the trip through the
L2ARC.

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10693

4 years agoRelease onexit/events with any missed zfsdev_state
Jorgen Lundman [Thu, 13 Aug 2020 22:03:23 +0000 (07:03 +0900)]
Release onexit/events with any missed zfsdev_state

Linux and FreeBSD will most likely never see this issue.
On macOS when kext is unloaded, but zed is still connected, zed
will be issued ENODEV. As the cdevsw is released, the kernel
will not have zfsdev_release() called to release minor/onexit/events,
and it "leaks". This ensures it is cleaned up before unload.

Changed the for loop from zsprev, to zsnext style, for less
code duplication.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10700

4 years agoGithub workflow: checkstyle
George Melikov [Wed, 12 Aug 2020 17:46:26 +0000 (20:46 +0300)]
Github workflow: checkstyle

Use github workflow to run checkstyle
- uses free (for OS projects) resources
- starts for every commit and branch
- works on forks, so contributors may use it
  before creating PRs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10705

4 years agocstyle.pl: echo commands for github workflow
George Melikov [Wed, 12 Aug 2020 17:45:50 +0000 (20:45 +0300)]
cstyle.pl: echo commands for github workflow

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10705

4 years agoRemove stale .travis.yml
George Melikov [Thu, 13 Aug 2020 21:55:45 +0000 (00:55 +0300)]
Remove stale .travis.yml

- It doesn't work now.
- It has to be manually edited on tests changes.
  (even on test runtime changes!)
- Travis allows too little run time to be useful.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10704

4 years agoUse zfs_dbgmsg to log metaslab_load/unload
Matthew Ahrens [Wed, 12 Aug 2020 17:10:50 +0000 (10:10 -0700)]
Use zfs_dbgmsg to log metaslab_load/unload

Metaslabs are now (usually) loaded and unloaded infrequently, but when
that is not the case, it is useful to have a log of when and why these
events happened.

This commit enables the zfs_dbgmsg() in metaslab_load(), and adds a
zfs_dbgmsg() in metaslab_unload().

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10683

4 years agoRestore ARC MFU/MRU pressure
Matthew Macy [Wed, 12 Aug 2020 17:03:24 +0000 (10:03 -0700)]
Restore ARC MFU/MRU pressure

The arc_adapt() function tunes the MRU/MFU balance according to 4 types of
cache hits (which are passed as the state argument): ghost MRU, MRU, MFU,
ghost MFU. If this function is called with the wrong cache hit (state),
adaptation will be sub-optimal and performance will suffer.

Some time ago upstream received this commit:

6950 ARC should cache compressed data) in arc_read() do next
sequence (access to ghost buffer)

Before this commit, a hit to any ghost list was passed to arc_adapt()
before the call to arc_access(), which revives the element in the cache
and changes its state from ghost to real hit.

After this commit, the order of calls was reversed and arc_adapt() is
now called only with «real» hits even if hit was in one of two ghost
lists, which renders ghost lists useless and breaks the ARC algorithm.
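
The effect of the call order can be shown with a toy model. Everything here
is a simplification for illustration: `Hdr` is a hypothetical stand-in for
the buffer header, and the list passed as `adapt_log` records the state that
arc_adapt() would be given.

```python
class Hdr:
    """Minimal stand-in for an ARC buffer header (hypothetical)."""

    def __init__(self, state):
        self.state = state


def arc_access(hdr):
    # Simplified: a hit on a ghost list revives the buffer into a real list
    # (modeled here as always moving to "mfu").
    if hdr.state.endswith("_ghost"):
        hdr.state = "mfu"


def read_adapt_first(hdr, adapt_log):
    """Order restored by this change: adapt, then access."""
    adapt_log.append(hdr.state)   # arc_adapt() sees the ghost state
    arc_access(hdr)


def read_access_first(hdr, adapt_log):
    """Problematic order: access, then adapt."""
    arc_access(hdr)
    adapt_log.append(hdr.state)   # arc_adapt() only ever sees a real state
```

With a header starting in `"mru_ghost"`, the first function logs the ghost
state while the second never does, which is why the ghost lists stopped
influencing adaptation.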

FreeBSD fixed this problem locally in Change D19094 / Commit r348772.

This change is an adaptation of the above commit to the current arc
code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10548
Closes #10618

4 years ago'zfs share -a' should handle 'canmount=noauto'
George Wilson [Tue, 11 Aug 2020 20:55:04 +0000 (14:55 -0600)]
'zfs share -a' should handle 'canmount=noauto'

'zfs share -a' currently skips any filesystems which
have 'canmount=noauto' set. This behavior is unexpected since
one would expect 'zfs share -a' to share any mounted filesystem
that has the 'sharenfs' property already set.

This changes the behavior of 'zfs share -a' to allow the sharing
of 'canmount=noauto' datasets if they are mounted.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-71313
Closes #10688

4 years agoFreeBSD: Fix module autoloading when built in base
Matthew Macy [Tue, 11 Aug 2020 20:49:50 +0000 (13:49 -0700)]
FreeBSD: Fix module autoloading when built in base

The KMOD name is "zfs" instead of "openzfs" when building in FreeBSD.

Define a ZFS_KMOD symbol as "zfs" when IN_BASE is defined, otherwise
"openzfs".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10699

4 years agoLinux 5.9 compat: make_request_fn replaced with submit_bio interface
Coleman Kane [Sun, 9 Aug 2020 16:12:25 +0000 (12:12 -0400)]
Linux 5.9 compat: make_request_fn replaced with submit_bio interface

The make_request_fn and associated API was replaced recently in a
Linux 5.9 merge, to replace its functionality with a new submit_bio
member in struct block_device_operations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696

4 years agoLinux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_B
Coleman Kane [Sun, 9 Aug 2020 16:07:49 +0000 (12:07 -0400)]
Linux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_B

This change appears to primarily be a name change for the enum. Had
to update the test logic so that it works so long as either one of
these is present (favoring the newer one). Additionally, as this is
newer, it only shows up in node_page_item, so this commit doesn't
test zone_page_item for the same enum.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696

4 years agoLinux 5.9 compat: add linux/blkdev.h include
Coleman Kane [Sun, 9 Aug 2020 16:03:03 +0000 (12:03 -0400)]
Linux 5.9 compat: add linux/blkdev.h include

Many of the block device operations (often functions with bdev in
the name) were moved into linux/blkdev.h from linux/fs.h. Seems
that this header is already included where needed in the code, but
in the autoconf tests it was missing causing false negatives. This
commit has those tests include linux/fs.h (old location) and now
also linux/blkdev.h (new locations).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696

4 years agoFix typo
Allan Jude [Tue, 11 Aug 2020 20:16:57 +0000 (16:16 -0400)]
Fix typo

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10694

4 years agoMove ZVOL_DIR back to zfs.h
Ryan Moeller [Tue, 11 Aug 2020 20:12:12 +0000 (16:12 -0400)]
Move ZVOL_DIR back to zfs.h

This was previously moved because nothing else in-tree uses it, but
evidently DilOS uses it out of tree.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@freebsd.org>
Closes #10361
Closes #10685

4 years agoFreeBSD: update vaccess signature on most recent HEAD
Matthew Macy [Fri, 7 Aug 2020 21:16:01 +0000 (14:16 -0700)]
FreeBSD: update vaccess signature on most recent HEAD

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10682

4 years agoClarify error message when a range-tree double-add occurs
Paul Dagnelie [Fri, 7 Aug 2020 21:13:13 +0000 (14:13 -0700)]
Clarify error message when a range-tree double-add occurs

Various other pieces of logic have resulted in situations where
we double-free space in ZFS. This in turn results in a double-add
to the range trees. These issues have been much more difficult to
diagnose than they should have been, because the error handling
around this case is much weaker than around the double remove case.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10654

4 years agoZTS: Remove bashisms from zfs-tests.sh
Ryan Moeller [Fri, 7 Aug 2020 21:10:48 +0000 (17:10 -0400)]
ZTS: Remove bashisms from zfs-tests.sh

Bring zfs-tests.sh into compliance with the other scripts
by converting it to /bin/sh to avoid a dependency on bash.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10640

4 years agoRemove commented-out code
Matthew Ahrens [Tue, 4 Aug 2020 18:36:53 +0000 (11:36 -0700)]
Remove commented-out code

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years agoRemove KM_NODEBUG
Matthew Ahrens [Thu, 30 Jul 2020 20:59:07 +0000 (13:59 -0700)]
Remove KM_NODEBUG

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years agoRemove KMC_NOMAGAZINE
Matthew Ahrens [Thu, 30 Jul 2020 20:56:00 +0000 (13:56 -0700)]
Remove KMC_NOMAGAZINE

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years agoRemove KMC_QCACHE
Matthew Ahrens [Thu, 30 Jul 2020 20:51:31 +0000 (13:51 -0700)]
Remove KMC_QCACHE

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years agoRemove KMC_NOHASH
Matthew Ahrens [Thu, 30 Jul 2020 20:46:32 +0000 (13:46 -0700)]
Remove KMC_NOHASH

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Remove KMC_NOTOUCH
Matthew Ahrens [Thu, 30 Jul 2020 20:43:18 +0000 (13:43 -0700)]
Remove KMC_NOTOUCH

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Remove KMC_OFFSLAB
Matthew Ahrens [Thu, 30 Jul 2020 05:03:23 +0000 (22:03 -0700)]
Remove KMC_OFFSLAB

Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650

4 years ago Fix i/o error handling of livelists and zap iteration
Matthew Ahrens [Wed, 5 Aug 2020 17:22:09 +0000 (10:22 -0700)]
Fix i/o error handling of livelists and zap iteration

Pool-wide metadata is stored in the MOS (Meta Object Set).  This
metadata is stored in triplicate, in addition to any pool-level
redundancy (e.g. RAIDZ).  However, if all 3+ copies of this metadata are
not available, we can still get EIO/ECKSUM when reading from the MOS.
If we encounter such an error in syncing context, we have typically
already committed to making a change that we now can't do because of the
corrupt/missing metadata.  We typically "handle" this with a `VERIFY()`
or `zfs_panic_recover()`.  This prevents the system from continuing on
in an undefined state, while minimizing the amount of error-handling
code.

However, there are some code paths that ignore these i/o errors, or
`ASSERT()` that they don't happen.  Since assertions are disabled on
non-debug builds, they effectively ignore them as well.  This can lead
to ZFS continuing on in an incorrect state, potentially leading to
on-disk inconsistencies.
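
The difference between the two kinds of check can be illustrated with a
minimal sketch.  The macros below are simplified stand-ins, not the real
ZFS definitions: `SKETCH_VERIFY()` is always evaluated, while
`SKETCH_ASSERT()` is compiled out unless the hypothetical `SKETCH_DEBUG`
symbol is defined.  To keep the example testable, a failed check bumps a
counter instead of panicking, and `read_metadata()` is an invented helper
that simulates an EIO from the MOS.

```c
#include <errno.h>
#include <stdio.h>

/* Counts failed checks.  Assumption for the demo: real code would panic. */
static int check_failures = 0;

static void
check_failed(const char *expr)
{
	check_failures++;
	fprintf(stderr, "check failed: %s\n", expr);
}

/* Simplified stand-in for VERIFY(): evaluated in every build. */
#define	SKETCH_VERIFY(cond) \
	do { if (!(cond)) check_failed(#cond); } while (0)

/* Simplified stand-in for ASSERT(): a no-op unless SKETCH_DEBUG is set. */
#ifdef SKETCH_DEBUG
#define	SKETCH_ASSERT(cond)	SKETCH_VERIFY(cond)
#else
#define	SKETCH_ASSERT(cond)	((void)0)
#endif

/* Hypothetical metadata read; returns EIO when asked to fail. */
static int
read_metadata(int fail)
{
	return (fail ? EIO : 0);
}

/* Returns how many checks fired; 0 in a non-debug build even on EIO. */
static int
sync_with_assert(int fail)
{
	int before = check_failures;
	int err = read_metadata(fail);

	SKETCH_ASSERT(err == 0);
	(void) err;	/* unused once the assertion is compiled out */
	return (check_failures - before);
}

/* Returns how many checks fired; 1 on EIO regardless of build type. */
static int
sync_with_verify(int fail)
{
	int before = check_failures;
	int err = read_metadata(fail);

	SKETCH_VERIFY(err == 0);
	return (check_failures - before);
}
```

In a build without `SKETCH_DEBUG`, `sync_with_assert(1)` reports no failed
check even though the read failed, whereas `sync_with_verify(1)` reports
one.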

This commit adds handling for these i/o errors on MOS metadata,
typically with a `VERIFY()`:

* Handle error return from `zap_cursor_retrieve()` in 4 places in
`dsl_deadlist.c`.

* Handle error return from `zap_contains()` in `dsl_dir_hold_obj()`.
Turns out this call isn't necessary because we can always call
`zap_lookup()`.

* Handle error return from `zap_lookup()` in `dsl_fs_ss_limit_check()`.

* Handle error return from `zap_remove()` in `dsl_dir_rename_sync()`.

* Handle error return from `zap_lookup()` in
`dsl_dir_remove_livelist()`.

* Handle error return from `dsl_process_sub_livelist()` in
`spa_livelist_delete_cb()`.
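
As a rough illustration of the cursor-iteration case, the sketch below
uses a hypothetical in-memory cursor (`sketch_cursor_t`, not the real
`zap_cursor_t`) whose retrieve function returns 0 for an entry, ENOENT at
the end of the list, and EIO on a simulated read failure.  A loop that
only tests for `== 0` cannot tell EIO from a normal end of iteration; the
checked variant inspects the terminating error.  It returns the error
rather than calling `VERIFY()` purely so that it can be exercised in a
test; that choice is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical cursor over an int array; NOT the real zap_cursor_t. */
typedef struct {
	const int *items;
	size_t count;
	size_t pos;
	size_t fail_at;	/* index that fails with EIO; >= count: never */
} sketch_cursor_t;

/* Returns 0 and sets *out, ENOENT at end of list, or EIO on failure. */
static int
sketch_cursor_retrieve(sketch_cursor_t *c, int *out)
{
	if (c->pos >= c->count)
		return (ENOENT);
	if (c->pos == c->fail_at)
		return (EIO);
	*out = c->items[c->pos];
	return (0);
}

/* Old pattern: any nonzero return silently ends the loop. */
static int
sum_unchecked(sketch_cursor_t *c, int *sum)
{
	int v;

	*sum = 0;
	for (; sketch_cursor_retrieve(c, &v) == 0; c->pos++)
		*sum += v;
	return (0);	/* an EIO looks exactly like end-of-list */
}

/* Checked pattern: only ENOENT counts as a clean end of iteration. */
static int
sum_checked(sketch_cursor_t *c, int *sum)
{
	int v, err;

	*sum = 0;
	while ((err = sketch_cursor_retrieve(c, &v)) == 0) {
		*sum += v;
		c->pos++;
	}
	return (err == ENOENT ? 0 : err);
}
```

With three entries and a failure injected at index 1, `sum_unchecked()`
reports success with a partial sum, while `sum_checked()` surfaces EIO.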

Additionally:

* Augment the internal history log message for `zfs destroy` to note
which method is used (e.g. bptree, livelist, or synchronous) and the
mintxg.

* Correct a comment in `dbuf_init()`.

* Correct indentation in `dsl_dir_remove_livelist()`.

Reviewed-by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10643

4 years ago FreeBSD: Add support for lockless lookup
Matthew Macy [Wed, 5 Aug 2020 17:19:51 +0000 (10:19 -0700)]
FreeBSD: Add support for lockless lookup

Authored-by: mjg <mjg@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10657