]> CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log
FreeBSD/FreeBSD.git
20 months agoFix use-after-free in btree code
Richard Yao [Mon, 12 Sep 2022 18:22:15 +0000 (14:22 -0400)]
Fix use-after-free in btree code

Coverty static analysis found these.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #10989
Closes #13861

21 months agocontrib: dracut: zfs-snapshot-bootfs: exit status fix
gregory-lee-bartholomew [Fri, 12 Aug 2022 21:28:15 +0000 (16:28 -0500)]
contrib: dracut: zfs-snapshot-bootfs: exit status fix

When the zfs-snapshot-bootfs service attempts to create a snapshot
that already exists, the exit status of the command is non-zero and
the service reports failed to the systemd service manager. This is a
common occurrence if bootfs.snapshot is left set on the kernel command
line and it should not be considered a failure.

This service was originally set to ignore this error by prefixing
the command with - on the ExecStart line, but the leading - appears
to have been dropped in #13359.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13769

21 months agoarcstat: fix -p option
r-ricci [Fri, 12 Aug 2022 21:21:52 +0000 (22:21 +0100)]
arcstat: fix -p option

When the -p option is used, a list of floats is passed to sep.join(),
which expects strings. Fix this by converting each value to a string.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Roberto Ricci <ricci@disroot.org>
Closes #12916
Closes #13767

21 months agoFix problem with zdb_objset_id test.
Paul Zuchowski [Tue, 9 Aug 2022 13:54:17 +0000 (13:54 +0000)]
Fix problem with zdb_objset_id test.

Use large numbers for datasets with
numeric names to avoid name and id
collisions.

Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
21 months agoLinux 6.0 compat: register_shrinker() now var-arg
Coleman Kane [Mon, 8 Aug 2022 23:18:30 +0000 (19:18 -0400)]
Linux 6.0 compat: register_shrinker() now var-arg

The 6.0 kernel added a printf-style var-arg for args > 0 to the
register_shrinker function, in order to add names to shrinkers, in
commit e33c267ab70de4249d22d7eab1cc7d68a889bac2. This enables the
shrinkers to have friendly names exposed in /sys/kernel/debug/shrinker/.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13748

21 months agoLinux 5.20 compat: blk_cleanup_disk()
Brian Behlendorf [Thu, 4 Aug 2022 00:37:52 +0000 (17:37 -0700)]
Linux 5.20 compat: blk_cleanup_disk()

As of the Linux 5.20 kernel blk_cleanup_disk() has been removed,
all callers should use put_disk().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728

21 months agoLinux 5.20 compat: bdevname()
Brian Behlendorf [Wed, 3 Aug 2022 18:35:47 +0000 (11:35 -0700)]
Linux 5.20 compat: bdevname()

As of the Linux 5.20 kernel bdevname() has been removed, all
callers should use snprintf() and the "%pg" format specifier.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728

21 months agoLinux 5.19 compat: META
Brian Behlendorf [Tue, 2 Aug 2022 17:04:38 +0000 (10:04 -0700)]
Linux 5.19 compat: META

Update the META file to reflect compatibility with the 5.19 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13715

21 months agoFix problem with zdb -d
Paul Zuchowski [Wed, 3 Aug 2022 17:12:23 +0000 (17:12 +0000)]
Fix problem with zdb -d

zdb -d <pool>/<objset ID> does not work when
other command line arguments are included i.e.
zdb -U <cachefile> -d <pool>/<objset ID>
This change fixes the command line parsing
to handle this situation.  Also fix issue
where zdb -r <dataset> <file> does not handle
the root <dataset> of the pool. Introduce -N
option to force <objset ID> to be interpreted
as a numeric objsetID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #12845
Closes #12944

21 months agoFix checkstyle warning: E275 missing whitespace after keyword
Tino Reichardt [Mon, 1 Aug 2022 16:49:35 +0000 (18:49 +0200)]
Fix checkstyle warning: E275 missing whitespace after keyword

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13710

21 months agoRevert behavior of 59eab109 on not-Linux
Rich Ercolani [Thu, 4 Nov 2021 13:49:40 +0000 (09:49 -0400)]
Revert behavior of 59eab109 on not-Linux

It turns out that short-circuiting the EFAULT behavior on a short read
breaks things on FreeBSD. So until there's a nicer solution, let's
just revert the behavior for not-Linux.

Reference:
https://reviews.freebsd.org/R10:70f51f0e474ffe1fb74cb427423a2fba3637544d

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12698

21 months agoHandle partial reads in zfs_read
Rich Ercolani [Mon, 20 Sep 2021 17:30:50 +0000 (13:30 -0400)]
Handle partial reads in zfs_read

Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller which breaks
on older Linux kernels by happily passing along a 1M read request and a
64k iovec just requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12370
Closes #12509
Closes #12516

22 months agomodule: lua: ldo: fix pragma name
наб [Tue, 28 Jun 2022 21:31:55 +0000 (23:31 +0200)]
module: lua: ldo: fix pragma name

/home/nabijaczleweli/store/code/zfs/module/lua/ldo.c:175:32: warning:
unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
  175 | #pragma GCC diagnostic ignored "-Winfinite-recursion"a
      |                                ^~~~~~~~~~~~~~~~~~~~~~

Fixes: a6e8113fed8a508ffda13cf1c4d8da99a4e8133a ("Silence
-Winfinite-recursion warning in luaD_throw()")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348

22 months agoZTS: Fix io_uring support check
Brian Behlendorf [Tue, 26 Jul 2022 21:39:23 +0000 (14:39 -0700)]
ZTS: Fix io_uring support check

Not all Linux distribution kernels enable io_uring support by
default.  Update the run time check to verify that the booted
kernel was built with CONFIG_IO_URING=y.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13648
Closes #13685

22 months agoFix objtool: missing int3 after ret warning
Brian Behlendorf [Mon, 20 Jun 2022 23:36:21 +0000 (23:36 +0000)]
Fix objtool: missing int3 after ret warning

Resolve straight-line speculation warnings reported by objtool
for x86_64 assembly on Linux when CONFIG_SLS is set.  See the
following LWN article for the complete details.

https://lwn.net/Articles/877845/

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoICP: Add missing stack frame info to SHA asm files
Attila Fülöp [Fri, 16 Apr 2021 22:11:26 +0000 (00:11 +0200)]
ICP: Add missing stack frame info to SHA asm files

Since the assembly routines calculating SHA checksums don't use
a standard stack layout, CFI directives are needed to unroll the
stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11733

22 months agoFix -Wformat-overflow warning in zfs_project_handle_dir()
Brian Behlendorf [Mon, 20 Jun 2022 22:27:55 +0000 (22:27 +0000)]
Fix -Wformat-overflow warning in zfs_project_handle_dir()

Switch to using asprintf() to satisfy the compiler and resolve the
potential format-overflow warning.  Not the conditional before the
sprintf() would have prevented this regardless.

    cmd/zfs/zfs_project.c: In function ‘zfs_project_handle_dir’:
    cmd/zfs/zfs_project.c:241:38: error: ‘/’ directive writing
    1 byte into a region of size between 0 and 4352
    [-Werror=format-overflow=]
    cmd/zfs/zfs_project.c:241:17: note: ‘sprintf’ output between
    2 and 4609 bytes into a destination of size 4352

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoFix -Wformat-truncation warning in upgrade_set_callback()
Brian Behlendorf [Mon, 20 Jun 2022 21:54:42 +0000 (21:54 +0000)]
Fix -Wformat-truncation warning in upgrade_set_callback()

Extend the buffer slightly resolve the warning.

    cmd/zfs/zfs_main.c: In function ‘upgrade_set_callback’:
    cmd/zfs/zfs_main.c:2446:22: error: ‘%llu’ directive output
    may be truncated writing between 1 and 20 bytes into a
    region of size 16 [-Werror=format-truncation=]
    cmd/zfs/zfs_main.c:2445:24: note: ‘snprintf’ output between
    2 and 21 bytes into a destination of size 16

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoFix -Wuse-after-free warning in dbuf_destroy()
Brian Behlendorf [Mon, 20 Jun 2022 21:35:38 +0000 (21:35 +0000)]
Fix -Wuse-after-free warning in dbuf_destroy()

Move the use of the db pointer after it is freed.  It's only used as
a tag so a dereference would never occur, but there's no reason we
can't invert the order to resolve the warning.

    module/zfs/dbuf.c: In function 'dbuf_destroy':
    module/zfs/dbuf.c:2953:17: error:
    pointer 'db' may be used after 'free' [-Werror=use-after-free]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoFix -Wuse-after-free warning in dbuf_issue_final_prefetch_done()
Brian Behlendorf [Mon, 20 Jun 2022 21:32:03 +0000 (21:32 +0000)]
Fix -Wuse-after-free warning in dbuf_issue_final_prefetch_done()

Move the use of the private pointer after it is freed.  It's only
used as a tag so a dereference would never occur, but there's no
harm in inverting the order to resolve the warning.

    module/zfs/dbuf.c: In function 'dbuf_issue_final_prefetch_done':
    module/zfs/dbuf.c:3204:17: error:
    pointer 'private' may be used after 'free' [-Werror=use-after-free]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoFix -Wattribute-warning in dsl layer
Brian Behlendorf [Mon, 20 Jun 2022 21:13:26 +0000 (21:13 +0000)]
Fix -Wattribute-warning in dsl layer

The memcpy(), memmove(), and memset() functions have been annotated
to perform bounds checking when using FORTIFY_SOURCE.  A warning is
now generted when writing beyond the end of the specified field.

Alternately, the new struct_group() macro could be used to create
an anonymous union member for use by memcpy().  However, since this
is the only place the macro would be helpful it's preferable to
restructure the code slights to avoid the need for additional
compatibility code when the macro does not exist.

https://lore.kernel.org/lkml/20211118183807.1283332-1-keescook@chromium.org/T/

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoFix -Wattribute-warning in edonr
Brian Behlendorf [Mon, 20 Jun 2022 19:37:38 +0000 (19:37 +0000)]
Fix -Wattribute-warning in edonr

The wrong union memory was being accessed in EdonRInit resulting in
a write beyond size of field compiler warning.  Reference the correct
member to resolve the warning.  The warning was correct and this in
case the mistake was harmless.

    In function ‘fortify_memcpy_chk’,
    inlined from ‘EdonRInit’ at zfs/module/icp/algs/edonr/edonr.c:494:3:
    ./include/linux/fortify-string.h:344:25: error: call to
    ‘__write_overflow_field’ declared with attribute warning:
    detected write beyond size of field (1st parameter);
    maybe use struct_group()? [-Werror=attribute-warning]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoFix -Wattribute-warning in zfs_log_xvattr()
Brian Behlendorf [Mon, 20 Jun 2022 22:36:38 +0000 (22:36 +0000)]
Fix -Wattribute-warning in zfs_log_xvattr()

Restructure the code in zfs_log_xvattr() to use a lr_attr_end
structure when accessing lr_attr_t elements located after the
variable sized array.  This makes the code more understandable
and resolves the accessing beyond the end of the field warnings.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoSilence -Winfinite-recursion warning in luaD_throw()
Brian Behlendorf [Mon, 20 Jun 2022 19:53:58 +0000 (19:53 +0000)]
Silence -Winfinite-recursion warning in luaD_throw()

This code should be kept inline with the upstream lua version as much
as possible.  Therefore, we simply want to silence the warning.  This
check was enabled by default as part of -Wall in gcc 12.1.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

22 months agoconfig: prune unused -Wno-bool-compare checks
наб [Wed, 16 Feb 2022 15:08:20 +0000 (16:08 +0100)]
config: prune unused -Wno-bool-compare checks

Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110

22 months agolibtpool: -Wno-clobbered
наб [Wed, 16 Feb 2022 15:06:08 +0000 (16:06 +0100)]
libtpool: -Wno-clobbered

Also remove -Wno-unused-but-set-variable

Upstream-bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61118
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110

22 months agoRemove sha1 hashing from OpenZFS, it's not used anywhere.
Tino Reichardt [Mon, 3 Jan 2022 16:43:11 +0000 (17:43 +0100)]
Remove sha1 hashing from OpenZFS, it's not used anywhere.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12895
Closes #12902
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
22 months agoFix scrub resume from newly created hole.
Alexander Motin [Fri, 8 Jul 2022 19:53:10 +0000 (15:53 -0400)]
Fix scrub resume from newly created hole.

It may happen that scan bookmark points to a block that was turned
into a part of a big hole.  In such case dsl_scan_visitbp() may skip
it and dsl_scan_check_resume() will not be called for it.  As result
new scan suspend won't be possible until the end of the object, that
may take hours if the object is a multi-terabyte ZVOL on a slow HDD
pool, stretching TXG to all that time, creating all sorts of problems.

This patch changes the resume condition to any greater or equal block,
so even if we miss the bookmarked block, the next one we find will
delete the bookmark, allowing new suspend.

Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
22 months agoAvoid memory copy when verifying raidz/draid parity
Alexander Motin [Tue, 5 Jul 2022 23:27:29 +0000 (19:27 -0400)]
Avoid memory copy when verifying raidz/draid parity

Before this change for every valid parity column raidz_parity_verify()
allocated new buffer and copied there existing data, then recalculated
the parity and compared the result with the copy.  This patch removes
the memory copy, simply swapping original buffer pointers with newly
allocated empty ones for parity recalculation and comparison. Original
buffers with potentially incorrect parity data are then just freed,
while new recalculated ones are used for repair.

On a pool of 12 4-wide raidz vdevs, storing 1.5TB of 16MB blocks, this
change reduces memory traffic during scrub by 17% and total unhalted
CPU time by 25%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13613

22 months agoAvoid memory copies during mirror scrub
Alexander Motin [Tue, 5 Jul 2022 23:26:20 +0000 (19:26 -0400)]
Avoid memory copies during mirror scrub

Issuing several scrub reads for a block we may use the parent ZIO
buffer for one of child ZIOs.  If that read complete successfully,
then we won't need to copy the data explicitly.  If block has only
one copy (typical for root vdev, which is also a mirror inside),
then we never need to copy -- succeed or fail as-is.  Previous
code also copied data from buffer of every successfully completed
child ZIO, but that just does not make any sense.

On healthy N-wide mirror this saves all N+1 (or even more in case
of ditto blocks) memory copies for each scrubbed block, allowing
CPU to focus mostly on check-summing.  For other vdev types it
should save one memory copy per block copy at root vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13606

22 months agoFix and disable blocks statistics during scrub
Alexander Motin [Tue, 28 Jun 2022 18:23:31 +0000 (14:23 -0400)]
Fix and disable blocks statistics during scrub

Block statistics calculation during scrub I/O issue in case of sorted
scrub accounted ditto blocks several times.  Embedded blocks on other
side were not accounted at all.  This change moves the accounting from
issue to scan stage, that fixes both problems and also allows to avoid
pool-wide locking and the lock contention it created.

Since this statistics is quite specific and is not even exposed now
anywhere, disable its calculation by default to not waste CPU time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13579

22 months agoAvoid two 64-bit divisions per scanned block
Alexander Motin [Mon, 27 Jun 2022 18:08:21 +0000 (14:08 -0400)]
Avoid two 64-bit divisions per scanned block

Change math to make it like the ARC, using multiplications instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13591

22 months agoSeveral B-tree optimizations
Alexander Motin [Fri, 24 Jun 2022 20:55:58 +0000 (16:55 -0400)]
Several B-tree optimizations

- Introduce first element offset within a leaf.  It allows to reduce
by ~50% average memmove() size when adding/removing elements.  If the
added/removed element is in the first half of the leaf, we may shift
elements before it and adjust the bth_first instead of moving more
elements after it.
 - Use memcpy() instead of memmove() when we know there is no overlap.
 - Switch from uint64_t to uint32_t.  It does not limit anything,
but 32-bit arches should appreciate it greatly in hot paths.
 - Store leaf capacity in struct btree to avoid 64-bit divisions.
 - Adjust zfs_btree_insert_into_leaf() to always result in balanced
leaves after splitting, no matter where the new element was inserted.
Not that we care about it much, but it should also allow B-trees with
as little as two elements per leaf instead of 4 previously.

When scrubbing pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks this
reduces amount of time spent in memmove() inside the scan thread from
13.7% to 5.7% and total scrub time by ~15 seconds out of 9 minutes.
It should also reduce spacemaps load time, but I haven't measured it.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13582

22 months agoSeveral sorted scrub optimizations
Alexander Motin [Fri, 24 Jun 2022 16:50:37 +0000 (12:50 -0400)]
Several sorted scrub optimizations

- Reduce size and comparison complexity of q_exts_by_size B-tree.
Previous code used two 64-bit divisions and many other operations to
compare two B-tree elements.  It created enormous overhead.  This
implementation moves the math to the upper level and stores the score
in the B-tree elements themselves.  Since all that we need to store in
that B-tree is the extent score and offset, those can fit into single
8 byte value instead of 24 bytes of q_exts_by_addr element and can be
compared with single operation.
 - Better decouple secondary tree logic from main range_tree by moving
rt_btree_ops and related functions into dsl_scan.c as ext_size_ops.
Those functions are very small to worry about the code duplication and
range_tree does not need to know details such as rt_btree_compare.
 - Instead of accounting number of pending bytes per pool, that needs
atomic on global variable per block, account the number of non-empty
per-vdev queues, that change much more rarely.
 - When extent scan is interrupted by TXG end, continue it in the next
TXG instead of selecting next best extent.  It allows to avoid leaving
one truncated (and so likely not the best any more) extent each TXG.

On top of some other optimizations this saves about 1.5 minutes out of
10 to scrub pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13576

22 months agoFreeBSD: Improve crypto_dispatch() handling
Alexander Motin [Fri, 17 Jun 2022 22:38:51 +0000 (18:38 -0400)]
FreeBSD: Improve crypto_dispatch() handling

Handle crypto_dispatch() return values same as crp->crp_etype errors.
On FreeBSD 12 many drivers returned same errors both ways, and lack
of proper handling for the first ended up in assertion panic later.
It was changed in FreeBSD 13, but there is no reason to not be safe.

While there, skip waiting for completion, including locking and
wakeup() call, for sessions on synchronous crypto drivers, such as
typical aesni and software.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13563

22 months agoReduce ZIO io_lock contention on sorted scrub
Alexander Motin [Wed, 15 Jun 2022 21:25:08 +0000 (17:25 -0400)]
Reduce ZIO io_lock contention on sorted scrub

During sorted scrub multiple threads (one per vdev) are issuing many
ZIOs same time, all using the same scn->scn_zio_root ZIO as parent.
It causes huge lock contention on the single global lock on that ZIO.
Improve it by introducing per-queue null ZIOs, children to that one,
and using them instead as proxy.

For 12 SSD pool storing 1.5TB of 4KB blocks on 80-core system this
dramatically reduces lock contention and reduces scrub time from 21
minutes down to 12.5, while actual read stages (not scan) are about
3x faster, reaching 100K blocks per second per vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13553

22 months agoAVL: Remove obsolete branching optimizations
Alexander Motin [Thu, 9 Jun 2022 22:27:36 +0000 (18:27 -0400)]
AVL: Remove obsolete branching optimizations

Modern Clang and GCC can successfully implement simple conditions
without branching with math and flag operations.  Use of arrays for
translation no longer helps as much as it was 14+ years ago.

Disassemble of the code generated by Clang 13.0.0 on FreeBSD 13.1,
Clang 14.0.4 on FreeBSD 14 and GCC 10.2.1 on Debian 11 with this
change still shows no branching instructions.

Profiling of CPU-bound scan stage of sorted scrub shows reproducible
reduction of time spent inside avl_find() from 6.52% to 4.58%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13540

22 months agoMore speculative prefetcher improvements
Alexander Motin [Wed, 25 May 2022 17:12:52 +0000 (13:12 -0400)]
More speculative prefetcher improvements

- Make prefetch distance adaptive: up to 4MB prefetch doubles for
every, hit same as before, but after that it grows by 1/8 every time
the prefetch read does not complete in time to satisfy the demand.
My tests show that 4MB is sufficient for wide NVMe pool to saturate
single reader thread at 2.5GB/s, while new 64MB maximum allows the
same thread to reach 1.5GB/s on wide HDD pool.  Further distance
increase may increase speed even more, but less dramatic and with
higher latency.

 - Allow early reuse of inactive prefetch streams: streams that never
saw hits can be reused immediately if there is a demand, while others
can be reused after 1s of inactivity, starting with the oldest.  After
2s of inactivity streams are deleted to free resources same as before.
This allows by several times increase strided read performance on HDD
pool in presence of simultaneous random reads, previously filling the
zfetch_max_streams limit for seconds and so blocking most of prefetch.

 - Always issue intermediate indirect block reads with SYNC priority.
Each of those reads if delayed for longer may delay up to 1024 other
block prefetches, that may be not good for wide pools.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13452

22 months agoImprove mg_aliquot math
Alexander Motin [Wed, 4 May 2022 18:33:42 +0000 (14:33 -0400)]
Improve mg_aliquot math

When calculating mg_aliquot alike to #12046 use number of unique data
disks in the vdev, not the total number of children vdev.  Increase
default value of the tunable from 512KB to 1MB to compensate.

Before this change each disk in striped pool was getting 512KB of
sequential data, in 2-wide mirror -- 1MB, in 3-wide RAIDZ1 -- 768KB.
After this change in all the cases each disk should get 1MB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13388

22 months agoImprove log spacemap load time
Alexander Motin [Tue, 26 Apr 2022 17:44:21 +0000 (13:44 -0400)]
Improve log spacemap load time

Previous flushing algorithm limited only total number of log blocks to
the minimum of 256K and 4x number of metaslabs in the pool.  As result,
system with 1500 disks with 1000 metaslabs each, touching several new
metaslabs each TXG could grow spacemap log to huge size without much
benefits.  We've observed one of such systems importing pool for about
45 minutes.

This patch improves the situation from five sides:
 - By limiting maximum period for each metaslab to be flushed to 1000
TXGs, that effectively limits maximum number of per-TXG spacemap logs
to load to the same number.
 - By making flushing more smooth via accounting number of metaslabs
that were touched after the last flush and actually need another flush,
not just ms_unflushed_txg bump.
 - By applying zfs_unflushed_log_block_pct to the number of metaslabs
that were touched after the last flush, not all metaslabs in the pool.
 - By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in
advance, making log spacemap load process for wide HDD pool CPU-bound,
accelerating it by many times.
 - By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing
single-threaded by nature log processing time from ~10 to ~5 minutes.

As further optimization we could skip bumping ms_unflushed_txg for
metaslabs not touched since the last flush, but that would be an
incompatible change, requiring new pool feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12789

22 months agoAdd more control/visibility to spa_load_verify().
Alexander Motin [Fri, 4 Feb 2022 21:06:38 +0000 (16:06 -0500)]
Add more control/visibility to spa_load_verify().

Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13022

22 months agospa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)
Allan Jude [Tue, 26 Oct 2021 23:15:38 +0000 (19:15 -0400)]
spa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)

The fnvlist versions of the functions are fatal if they fail,
saving each call from having to include checking the result.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Allan Jude <allan@klarasystems.com>
22 months agoAvoid small buffer copying on write
Alexander Motin [Tue, 27 Jul 2021 23:05:47 +0000 (19:05 -0400)]
Avoid small buffer copying on write

It is wrong for arc_write_ready() to use zfs_abd_scatter_enabled to
decide whether to reallocate/copy the buffer, because the answer is
OS-specific and depends on the buffer size.  Instead of that use
abd_size_alloc_linear(), moved into public header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12425

22 months agoRemove refcount from spa_config_*()
Alexander Motin [Thu, 1 Jul 2021 15:16:54 +0000 (11:16 -0400)]
Remove refcount from spa_config_*()

The only reason for spa_config_*() to use refcount instead of simple
non-atomic (thanks to scl_lock) variable for scl_count is tracking,
hard disabled for the last 8 years.  Switch to simple int scl_count
reduces the lock hold time by avoiding atomic, plus makes structure
fit into single cache line, reducing the locks contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12287

22 months agoScrub mirror children without BPs
Brian Behlendorf [Thu, 14 Jul 2022 17:21:29 +0000 (10:21 -0700)]
Scrub mirror children without BPs

When scrubbing a raidz/draid pool, which contains a replacing or
sparing mirror with multiple online children, only one child will
be read.  This is not normally a serious concern because the DTL
records are used to determine where a good copy of the data is.
As long as the data can be read from one child the mirror vdev
will use it to repair gaps in any of its children.  Furthermore,
even if the data which was read is corrupt the raidz code will
detect this and issue its own repair I/O to correct the damage
in the mirror vdev.

However, in the scenario where the DTL is wrong due to silent
data corruption (say due to overwriting one child) and the scrub
happens to read from a child with good data, then the other damaged
mirror child will not be detected nor repaired.

While this is possible for both raidz and draid vdevs, it's most
pronounced when using draid.  This is because by default the zed
will sequentially rebuild a draid pool to a distributed spare,
and the distributed spare half of the mirror is always preferred
since it delivers better performance.  This means the damaged
half of the mirror will go undetected even after scrubbing.

For system administrations this behavior is non-intuitive and in
a worst case scenario could result in the only good copy of the
data being unknowingly detached from the mirror.

This change resolves the issue by reading all replacing/sparing
mirror children when scrubbing.  When the BP isn't available for
verification, then compare the data buffers from each child.  They
must all be identical, if not there's silent damage and an error
is returned to prompt the top-level vdev to issue a repair I/O to
rewrite the data on all of the mirror children.  Since we can't
tell which child was wrong a checksum error is logged against the
replacing or sparing mirror vdev.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13555

23 months agoTag zfs-2.1.5
Tony Hutter [Tue, 21 Jun 2022 17:55:02 +0000 (10:55 -0700)]
Tag zfs-2.1.5

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
23 months agoRemove install of zfs-load-module.service for dracut
Matthew Thode [Tue, 21 Jun 2022 17:37:20 +0000 (12:37 -0500)]
Remove install of zfs-load-module.service for dracut

The zfs-load-module.service service is not currently provided by
the OpenZFS repository so we cannot safely assume it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes #13574

23 months agoFreeBSD: Silence clang unused-but-set-variable
Ryan Moeller [Wed, 15 Jun 2022 15:18:41 +0000 (15:18 +0000)]
FreeBSD: Silence clang unused-but-set-variable

Quick and dirty build fix for warnings being treated as errors.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
23 months agoImprove sorted scan memory accounting
Alexander Motin [Fri, 10 Jun 2022 17:01:46 +0000 (13:01 -0400)]
Improve sorted scan memory accounting

Since we use two B-trees q_exts_by_size and q_exts_by_addr, we should
count 2x sizeof (range_seg_gap_t) per node.  And since average B-tree
memory efficiency is about 75%, we should increase it to 3x.

Previous code under-counted up to 30% of the memory usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13537

23 months agoCorrected edge case in uncompressed ARC->L2ARC handling
Rich Ercolani [Wed, 4 May 2022 18:59:30 +0000 (14:59 -0400)]
Corrected edge case in uncompressed ARC->L2ARC handling

I genuinely don't know why this didn't come up before,
but adding the LZ4 early abort pointed out this flaw,
in which we're allocating a buffer of one size, and
then telling the compressor that we're handing it buffers
of a different size, which may be Very Different - say,
allocating 512b and then telling it the inputs are 128k.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13375

23 months agoRemove wrong assertion in log spacemap
Alexander Motin [Wed, 1 Jun 2022 16:54:35 +0000 (12:54 -0400)]
Remove wrong assertion in log spacemap

It is typical, but not generally true that if log summary has more
blocks it must also have unflushed metaslabs.  Normally with metaslabs
flushed in order it works, but there are known exceptions, such as
device removal or metaslab being loaded during its flush attempt.

Before 600a02b8844 if spa_flush_metaslabs() hit loading metaslab it
usually stopped (unless memlimit is also exceeded), but now it may
flush more metaslabs, just skipping that particular one.  This
increased chances of assertion to fire when the skipped metaslab is
flushed on next iteration if all other metaslabs in that summary
entry are already flushed out of order.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13486
Closes #13513

23 months agolibzfs: Fail making a dataset handle gracefully
Ryan Moeller [Fri, 18 Feb 2022 21:09:03 +0000 (16:09 -0500)]
libzfs: Fail making a dataset handle gracefully

When a dataset is in the process of being received it gets marked as
inconsistent and should not be used.  We should check for this when
opening a dataset handle in libzfs and return with an appropriate error
set, rather than hitting an abort because of the incomplete data.

zfs_open() passes errno to zfs_standard_error() after observing
make_dataset_handle() fail, which ends up aborting if errno is 0.
Set errno before returning where we know it has not been set already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13077

23 months agolibzfs: mount: don't leak mnt_param_t if mnt_func fails
наб [Wed, 12 Jan 2022 23:50:15 +0000 (00:50 +0100)]
libzfs: mount: don't leak mnt_param_t if mnt_func fails

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968

23 months agoReject zfs send -RI with nonexistent fromsnap
Rich Ercolani [Mon, 4 Oct 2021 17:17:30 +0000 (13:17 -0400)]
Reject zfs send -RI with nonexistent fromsnap

Right now, zfs send -I dataset@nonexistent dataset@existent fails, but
zfs send -RI dataset@nonexistent dataset@existent does not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12574
Closes #12575

23 months agoLinux 5.18 compat: META
Brian Behlendorf [Tue, 31 May 2022 21:38:00 +0000 (14:38 -0700)]
Linux 5.18 compat: META

Update the META file to reflect compatibility with the 5.18 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13527

23 months agoautoconf: AC_MSG_CHECKING consistency
Brian Behlendorf [Tue, 31 May 2022 23:42:49 +0000 (16:42 -0700)]
autoconf: AC_MSG_CHECKING consistency

Make the wording more consistent for the kernel AC_MSG_CHECKING
output (e.g. "checking whether ...".).  Additionally, group some
of the VFS interface checks with the others.  No functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529

23 months agoLinux 5.19 compat: asm/fpu/internal.h
Brian Behlendorf [Wed, 1 Jun 2022 00:26:47 +0000 (17:26 -0700)]
Linux 5.19 compat: asm/fpu/internal.h

As of the Linux 5.19 kernel the asm/fpu/internal.h header was
entirely removed.  It has been effectively empty since the 5.16
kernel and provides no required functionality.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529

23 months agoLinux 5.19 compat: zap_flags_t conflict
Brian Behlendorf [Fri, 27 May 2022 22:56:05 +0000 (15:56 -0700)]
Linux 5.19 compat: zap_flags_t conflict

As of the Linux 5.19 kernel an identically named zap_flags_t typedef
is declared in the include/linux/mm_types.h linux header.  Sadly,
the inclusion of this header cannot be easily avoided.  To resolve
the conflict a #define is used to remap the name in the OpenZFS
sources when building against the Linux kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

23 months agoLinux 5.19 compat: bdev_start_io_acct() / bdev_end_io_acct()
Brian Behlendorf [Fri, 27 May 2022 21:31:03 +0000 (21:31 +0000)]
Linux 5.19 compat: bdev_start_io_acct() / bdev_end_io_acct()

As of the Linux 5.19 kernel the disk_*_io_acct() helper functions
have been replaced by the bdev_*_io_acct() functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

23 months agoLinux 5.19 compat: aops->read_folio()
Brian Behlendorf [Fri, 27 May 2022 20:44:43 +0000 (20:44 +0000)]
Linux 5.19 compat: aops->read_folio()

As of the Linux 5.19 kernel the readpage() address space operation
has been replaced by read_folio().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

23 months agoLinux 5.19 compat: blkdev_issue_secure_erase()
Brian Behlendorf [Fri, 27 May 2022 19:40:22 +0000 (19:40 +0000)]
Linux 5.19 compat: blkdev_issue_secure_erase()

Linux 5.19 commit torvalds/linux@44abff2c0 splits the secure
erase functionality from the blkdev_issue_discard() function.
The blkdev_issue_secure_erase() must now be issued to issue
a secure erase.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

23 months agoLinux 5.19 compat: bdev_max_secure_erase_sectors()
Brian Behlendorf [Fri, 27 May 2022 18:20:04 +0000 (18:20 +0000)]
Linux 5.19 compat: bdev_max_secure_erase_sectors()

Linux 5.19 commit torvalds/linux@44abff2c0 removed the
blk_queue_secure_erase() helper function.  The preferred
interface is to now use the bdev_max_secure_erase_sectors()
function to check for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

23 months agoLinux 5.19 compat: bdev_max_discard_sectors()
Brian Behlendorf [Fri, 27 May 2022 17:51:55 +0000 (17:51 +0000)]
Linux 5.19 compat: bdev_max_discard_sectors()

Linux 5.19 commit torvalds/linux@70200574cc removed the
blk_queue_discard() helper function.  The preferred interface
is to now use the bdev_max_discard_sectors() function to check
for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

23 months agoLinux 5.18 compat: bio_alloc()
Brian Behlendorf [Fri, 27 May 2022 20:28:51 +0000 (20:28 +0000)]
Linux 5.18 compat: bio_alloc()

As for the Linux 5.18 kernel bio_alloc() expects a block_device struct
as an argument.  This removes the need for the bio_set_dev() compatibility
code for 5.18 and newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoSilence unused-but-set-variable warning
Ryan Moeller [Thu, 26 May 2022 00:26:59 +0000 (20:26 -0400)]
Silence unused-but-set-variable warning

This was breaking the kmod port build on FreeBSD with Clang 13.

Use the same trick as we do for ASSERT() to make DNODE_VERIFY() use
its parameter at compile time without actually using it at run time
in non-debug builds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13507

2 years agozed: support subject as header in zed_notify_email()
heeplr [Wed, 18 May 2022 17:27:53 +0000 (19:27 +0200)]
zed: support subject as header in zed_notify_email()

Some minimal MUAs don't support passing the subjects as cmdline option.
This commit checks if "@SUBJECT@" is missing in ZED_EMAIL_OPTS and then
prepends a subject header to the notification message.
Also set a default for ${subject}.

Reviewed-by: Ahelenia Ziemia<C5><84>ska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Daniel Hiepler <d-git@coderdu.de>
Closes #13440

2 years agorpm: Keep debug symbols if configured with '--enable-debuginfo'
Umer Saleem [Wed, 25 May 2022 16:22:11 +0000 (21:22 +0500)]
rpm: Keep debug symbols if configured with '--enable-debuginfo'

Do not strip debug information from packages if '--enable-debuginfo' is
configured.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13500

2 years agoFreeBSD: libspl: Add locking around statfs globals
Ryan Moeller [Tue, 24 May 2022 16:40:20 +0000 (12:40 -0400)]
FreeBSD: libspl: Add locking around statfs globals

Makes getmntent and getmntany thread-safe for external consumers of
libzfs zpool_disable_datasets, zfs_iter_mounted, libzfs_mnttab_update,
libzfs_mnttab_find.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13484

2 years agoStandardize RHEL version check in packages
Brian Behlendorf [Wed, 25 May 2022 16:20:17 +0000 (09:20 -0700)]
Standardize RHEL version check in packages

This is a follow up to 3c356622994 which standardizes how the RHEL
version check is done.  This simpler "0%{?rhel}" check is used
elsewhere in the packages so we do the same here.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13501

2 years agoModified ncompress requirement in RPM to exclude RHEL9
Rich Ercolani [Tue, 24 May 2022 16:39:32 +0000 (12:39 -0400)]
Modified ncompress requirement in RPM to exclude RHEL9

The bug this was working around is no longer present.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13480
Closes #13490

2 years agozed: Take no action on scrub/resilver checksum errors
Brian Behlendorf [Tue, 24 May 2022 16:36:07 +0000 (09:36 -0700)]
zed: Take no action on scrub/resilver checksum errors

When scrubbing/resilvering a pool it can be counter productive to
cancel the scan and kick of a replace operation to a hot spare
when encountering checksum errors.  In this case, the best course
of action is to allow the scrub/resilver to complete as quickly
as possible and to keep the vdevs fully online if possible.

Realistically, this is less of an issue for a RAIDZ since a
traditional resilver must be used and checksums will be verified.
However, this is not the case for a mirror or dRAID pool which is
sequentially resilvered and checksum verification is deferred
until after the replace operation completes.

Regardless, we apply this policy to all pool types since it's
a good idea for all vdevs.  Degrading additional vdevs has the
potential to make a bad situation worse.  Note the checksum
errors will still be reported as both an event and by
`zpool status`.  This change only prevents the ZED from
proactively taking any action.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13499

2 years agozdb: Fix handling of nul termination in symlink targets
Mark Johnston [Fri, 20 May 2022 17:32:49 +0000 (13:32 -0400)]
zdb: Fix handling of nul termination in symlink targets

The SA attribute containing the symlink target does not include a nul
terminator, so when printing the target zdb would sometimes include
garbage at the end of the string.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13482

2 years agoautomake: don't install /e/d/zfs or /e/z/zfs-functions +x
наб [Tue, 24 May 2022 23:25:54 +0000 (01:25 +0200)]
automake: don't install /e/d/zfs or /e/z/zfs-functions +x

Closes #13496
Backport-of: #13503
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
2 years agoMultiple dracut module install script cleanups
Savyasachee Jha [Mon, 14 Feb 2022 16:58:37 +0000 (22:28 +0530)]
Multiple dracut module install script cleanups

- Replaced intances of `dracut_install` with `inst_simple`
- Removed calls to `test -x mark_hostonly` because the function is an
inbuilt dracut function
- Removed redundant installation of `systemd-ask-password` and
`systemd-tty-ask-password-agent` because they are already installed by
the systemd module. There is no need to install them again
- Removed multiple calls to the `mark_hostonly` function because the
`inst_simple` has a command-line switch for it
- Cleaned up the installation of the `zpool.cache`, `vdev_id.conf` and
`hostid` files to make the logic easier to follow
- Cleaned up and simplified the systemd service installation logic by
invoking systemctl instead of creating symlinks manually
- Replaced various hard-coded paths with dracut equivalents to better
conform with expected dracut behaviour
- Removed redundant call to `mkdir` (`inst_simple` creates the parent
directory if it does not exist on the destination initrd)

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010

2 years agoRemove absolute paths to udev rules and binaries for dracut
Savyasachee Jha [Mon, 14 Feb 2022 12:45:16 +0000 (18:15 +0530)]
Remove absolute paths to udev rules and binaries for dracut

Since dracut functions can locate both udev rules and binaries, there is
no point in keeping absolute paths in the module setup script. It also
breaks the --sysroot option in dracut. This commit removes mentions to
absolute paths for binaries and udev rules.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010

2 years agoMake dracut fail if essential files cannot be installed
Savyasachee Jha [Sun, 6 Feb 2022 04:11:23 +0000 (09:41 +0530)]
Make dracut fail if essential files cannot be installed

Dracut will now fail in initramfs generation if essential files cannot
be installed.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010

2 years agoMake better use of dracut functions when building initramfs
Savyasachee Jha [Tue, 25 Jan 2022 03:22:48 +0000 (03:22 +0000)]
Make better use of dracut functions when building initramfs

Setting up the module involves multiple redundant calls to a bunch of
dracut functions wheich can be combined into one. Additionally, the mass
of code required to load libgcc_s.so* can be replaced with one dracut
function. This has the additional effect of removing errors involving
the non-installation of libgcc_s.so* which are seen on debian bullseye
when using version 2.1.2-1~bpo11+1 from the backports repository.

The systemd binaries are separated out into their own `dracut_install`
function call so they do not get pulled in when dracut does not load the
systemd module.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010

2 years agoFix compiler warnings about zero-length arrays in inline bitops
Coleman Kane [Tue, 17 May 2022 20:07:39 +0000 (16:07 -0400)]
Fix compiler warnings about zero-length arrays in inline bitops

The compiler appears to be expanding the unused NULL pointer into a
zero-length array via the inline bitops code. When -Werror=array-bounds
is used, this causes a build failure. Recommended solution is allocate
temporary structures, fill with zeros (to avoid uninitialized data use
warnings), and pass the pointer to those to the inline calls.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13463
Closes #13465

2 years agoAdd missing AC_MSG_RESULT(no) to configure
Brian Behlendorf [Thu, 12 May 2022 16:12:32 +0000 (09:12 -0700)]
Add missing AC_MSG_RESULT(no) to configure

When the HAVE_IOPS_MKDIR_USERNS check fails output result
as required.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13454

2 years agoabd_os: remove redundant refcount creation for abd_children
hping [Mon, 9 May 2022 23:30:16 +0000 (07:30 +0800)]
abd_os: remove redundant refcount creation for abd_children

Refcount creation for abd_zero_scatter->abd_children is redundant in
abd_alloc_zero_scatter, as it has been done in abd_init_struct.

In addition, abd_children is undefined when ZFS_DEBUG is disabled, the
reference of abd_children in abd_alloc_zero_scatter breaks build of
libzpool when ZFS_DEBUG is disabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13429

2 years agoFix functions without a prototype
Aidan Harris [Fri, 6 May 2022 18:57:37 +0000 (18:57 +0000)]
Fix functions without a prototype

clang-15 emits the following error message for functions without
a prototype:

fs/zfs/os/linux/spl/spl-kmem-cache.c:1423:27: error:
  a function declaration without a prototype is deprecated
  in all versions of C [-Werror,-Wstrict-prototypes]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aidan Harris <me@aidanharr.is>
Closes #13421

2 years agoFreeBSD: use zero_region instead of allocating a dedicated page
Mateusz Guzik [Wed, 4 May 2022 18:46:37 +0000 (20:46 +0200)]
FreeBSD: use zero_region instead of allocating a dedicated page

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13406

2 years agoautoconf: Fail when __copy_from_user_inatomic is a non-GPL symbol
szubersk [Sat, 7 May 2022 00:53:42 +0000 (00:53 +0000)]
autoconf: Fail when __copy_from_user_inatomic is a non-GPL symbol

A followup to 849c14e04844a2f0e1f7e42886c2cef083563f35
Fix https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009242

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13389

2 years agoPPC get_user workaround
Damian Szuberski [Tue, 26 Apr 2022 17:52:40 +0000 (03:52 +1000)]
PPC get_user workaround

Linux 5.12 PPC 5.12 get_user() and __copy_from_user_inatomic()
inline helpers very indirectly include a reference to the GPL'd
array mmu_feature_keys[] and fails to build. Workaround this by
using copy_from_user() and throwing EFAULT for any calls to
__copy_from_user_inatomic(). This is a workaround until a fix
for Linux commit 7613f5a66becfd0e43a0f34de8518695888f5458
"powerpc/64s/kuap: Use mmu_has_feature()" is fully addressed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11958
Closes #12590
Closes #13367

2 years agoAdding ZERO_PAGE detection
Brian Atkinson [Mon, 14 Mar 2022 19:37:39 +0000 (13:37 -0600)]
Adding ZERO_PAGE detection

On some architectures ZERO_PAGE is unavailable because it references
a GPL exported symbol of empty_zero_page. Originally e08b993 removed
the call to PAGE_ZERO(0) for assignment to the abd_zero_page. However,
a simple check can be done to avoid a kernel allocation and free for
the abd_zero_page if ZERO_PAGE is available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13199

2 years agoautoconf: Pretend `CONFIG_MODULES` is always on
Damian Szuberski [Tue, 26 Apr 2022 17:47:09 +0000 (03:47 +1000)]
autoconf: Pretend `CONFIG_MODULES` is always on

- Unconditionally inject `CONFIG_MODULES` make variable
  and `#define CONFIG_MODULES` to Kbuild in `ZFS_LINUX_COMPILE`
  autoconf function to emulate loadable kernel modules support.
  This allows OpenZFS to perform Linux checks despite
  `CONFIG_MODULES=n` in the actual Linux config.

- Add `ZFS_AC_KERNEL_CONFIG_MODULES` check which encompasses
  the logic from `ZFS_AC_KERNEL_TEST_MODULE` with additional
  diagnostic messages to the user

- Removed `ZFS_AC_KERNEL_TEST_MODULE` as it merely duplicates
  every check in `ZFS_AC_KERNEL_CONFIG_DEFINED`

- Moved `ZFS_AC_MODULE_SYMVERS` after `ZFS_AC_KERNEL_CONFIG_DEFINED`
  so the user has a chance to see the proper diagnostic from the
  steps before.

A workaround for Linux's

```
commit 3e3005df73b535cb849cf4ec8075d6aa3c460f68
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Wed Mar 31 22:38:03 2021 +0900

kbuild: unify modules(_install) for in-tree and external modules

If you attempt to build or install modules ('make modules(_install)'
with CONFIG_MODULES disabled, you will get a clear error message, but
nothing for external module builds.

Factor out the modules and modules_install rules into the common part,
so you will get the same error message when you try to build external
modules with CONFIG_MODULES=n.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #10832
Closes #13361

2 years agoStrengthen Linux kernel capabilities detection
Damian Szuberski [Thu, 21 Apr 2022 16:37:11 +0000 (18:37 +0200)]
Strengthen Linux kernel capabilities detection

- Add `CONFIG_BLOCK` Linux config requirement to
  `ZFS_AC_KERNEL_CONFIG_DEFINED`. OpenZFS won't compile without
  that block device support due to large amount of functional
  dependencies on it.

- Remove dependency on `groups_alloc()` in
  `ZFS_AC_KERNEL_SRC_GROUP_INFO_GID` to circumvent the missing stub
  in Linux 4.X kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13351

2 years agozvol_wait: Ignore locked zvols
Richard Laager [Fri, 22 Apr 2022 22:37:03 +0000 (17:37 -0500)]
zvol_wait: Ignore locked zvols

"When an encrypted zvol is locked the zfs-volume-wait service does not
start.  The /sbin/zvol_wait should not wait for links when the volume
has property keystatus=unavailable."
-- https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1888405

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Thanks: James Dingwall <james-launchpad@dingwall.me.uk>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10662

2 years agoFreeBSD: Implement hole-punching support
Ka Ho Ng [Wed, 4 Aug 2021 15:57:48 +0000 (23:57 +0800)]
FreeBSD: Implement hole-punching support

This adds supports for hole-punching facilities in the FreeBSD kernel
starting from __FreeBSD_version 1400032.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458

2 years agomodule: zstd: check we don't leak symbols; regenerate symbol map
наб [Tue, 15 Mar 2022 23:10:10 +0000 (00:10 +0100)]
module: zstd: check we don't leak symbols; regenerate symbol map

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12988
Closes #13209
(cherry picked from commit 6ef00196db1cc6bd189eeb72df26d494a2aee889)

2 years agoman: zpool-import.8: -d -or -c
наб [Tue, 10 May 2022 20:36:37 +0000 (22:36 +0200)]
man: zpool-import.8: -d -or -c

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13437

2 years agoReduce dbuf_find() lock contention
Brian Behlendorf [Wed, 4 May 2022 18:17:29 +0000 (11:17 -0700)]
Reduce dbuf_find() lock contention

Holding a dbuf is a common operation which can become highly contended
in dbuf_find() when acquiring the dbuf hash mutex.  This is particularly
true on Linux when reading/writing volumes since by default up to 32
threads from the zvol_taskq may be taking a hold of the same dbuf.
This should also be observable on FreeBSD as long as there are enough
processes accessing the volume concurrently.

This is further aggregrated by the fact that only the block id will
be unique when calculating the dbuf hash for a single volume.  The
objset id, object id, and level will be the same for data blocks.
This has been observed to result in a somehwat less than uniform hash
distribution and a longer than expected max hash chain depth (~20)
on a large memory system (256 GB) using volumes.

This commit improves the siutation by switching the hash mutex to
an rwlock to allow concurrent lookups, and increasing DBUF_RWLOCKS
from 2048 to 8192 to further reduce the odds of a hash collision.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13405

2 years agocontrib: dracut: remove getargbool polyfill
наб [Tue, 5 Apr 2022 00:56:57 +0000 (02:56 +0200)]
contrib: dracut: remove getargbool polyfill

It was originally released in dracut 008 in February 2011;
we can probably drop it now

Upstream-commit: 47a02e39721bd226646dbdfe3063563a4a5e9749
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agoAdd dracut.zfs.7
наб [Mon, 4 Apr 2022 23:14:49 +0000 (01:14 +0200)]
Add dracut.zfs.7

Thorough documentation with a dracut.bootup(7)-style flowchart,
dracut.cmdline(7)-style cmdline listing,
and per-file docs like the old README

Upstream-commit: e3fc330d6cc9445821fa44c699b7e66e2f6f209d
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: zfs-needshutdown: don't list
наб [Mon, 4 Apr 2022 23:09:11 +0000 (01:09 +0200)]
contrib: dracut: zfs-needshutdown: don't list

Upstream-commit: 1cc9cc2f89eb67722109bbb5a2d9814b5e77d5db
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: zfs-{rollback,snapshot}-bootfs: order after key loading
наб [Mon, 4 Apr 2022 22:19:38 +0000 (00:19 +0200)]
contrib: dracut: zfs-{rollback,snapshot}-bootfs: order after key loading

This fixes at least one race I got with an encrypted root

Upstream-commit: 6ebdb0b20de444f69404c265352a27e1b4e49b93
Upstream-commit: b8d9679f3644c813f7aa6db5b9c978b88758049a
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: don't require essentials to be under the same encroot
наб [Mon, 4 Apr 2022 21:39:18 +0000 (23:39 +0200)]
contrib: dracut: don't require essentials to be under the same encroot

Upstream-commit: 30c6dce7f7d3808b544ac1c3dbcb4d32c9831c60
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: inline single-use import_pool, move single-use ask_for_password
наб [Mon, 4 Apr 2022 21:24:29 +0000 (23:24 +0200)]
contrib: dracut: inline single-use import_pool, move single-use ask_for_password

Also don't set ROOTFS_MOUNTED; the final mention was removed in dracut
011 from July 2011

Upstream-commit: eaf1e060453ad6a335b708c5e724092741d6d1d3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: zfs-lib: remove find_bootfs
наб [Mon, 4 Apr 2022 21:16:59 +0000 (23:16 +0200)]
contrib: dracut: zfs-lib: remove find_bootfs

Upstream-commit: dac0b0785a795efe3bbb5367af35e2c4fbe32040
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: zfs-lib: simplify ask_for_password
наб [Mon, 4 Apr 2022 20:57:35 +0000 (22:57 +0200)]
contrib: dracut: zfs-lib: simplify ask_for_password

The only user is mount-zfs.sh (non-systemd systems),
so reduce it to what it needs

Upstream-commit: 5d31169d7c8810baf87f1dc778c97a7024c8c205
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291