]> CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log
FreeBSD/FreeBSD.git
23 months agoSilence -Winfinite-recursion warning in luaD_throw()
Brian Behlendorf [Mon, 20 Jun 2022 19:53:58 +0000 (19:53 +0000)]
Silence -Winfinite-recursion warning in luaD_throw()

This code should be kept inline with the upstream lua version as much
as possible.  Therefore, we simply want to silence the warning.  This
check was enabled by default as part of -Wall in gcc 12.1.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

23 months agoconfig: prune unused -Wno-bool-compare checks
наб [Wed, 16 Feb 2022 15:08:20 +0000 (16:08 +0100)]
config: prune unused -Wno-bool-compare checks

Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110

23 months agolibtpool: -Wno-clobbered
наб [Wed, 16 Feb 2022 15:06:08 +0000 (16:06 +0100)]
libtpool: -Wno-clobbered

Also remove -Wno-unused-but-set-variable

Upstream-bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61118
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110

23 months agoRemove sha1 hashing from OpenZFS, it's not used anywhere.
Tino Reichardt [Mon, 3 Jan 2022 16:43:11 +0000 (17:43 +0100)]
Remove sha1 hashing from OpenZFS, it's not used anywhere.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12895
Closes #12902
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
23 months agoFix scrub resume from newly created hole.
Alexander Motin [Fri, 8 Jul 2022 19:53:10 +0000 (15:53 -0400)]
Fix scrub resume from newly created hole.

It may happen that scan bookmark points to a block that was turned
into a part of a big hole.  In such case dsl_scan_visitbp() may skip
it and dsl_scan_check_resume() will not be called for it.  As result
new scan suspend won't be possible until the end of the object, that
may take hours if the object is a multi-terabyte ZVOL on a slow HDD
pool, stretching TXG to all that time, creating all sorts of problems.

This patch changes the resume condition to any greater or equal block,
so even if we miss the bookmarked block, the next one we find will
delete the bookmark, allowing new suspend.

Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
23 months agoAvoid memory copy when verifying raidz/draid parity
Alexander Motin [Tue, 5 Jul 2022 23:27:29 +0000 (19:27 -0400)]
Avoid memory copy when verifying raidz/draid parity

Before this change for every valid parity column raidz_parity_verify()
allocated new buffer and copied there existing data, then recalculated
the parity and compared the result with the copy.  This patch removes
the memory copy, simply swapping original buffer pointers with newly
allocated empty ones for parity recalculation and comparison. Original
buffers with potentially incorrect parity data are then just freed,
while new recalculated ones are used for repair.

On a pool of 12 4-wide raidz vdevs, storing 1.5TB of 16MB blocks, this
change reduces memory traffic during scrub by 17% and total unhalted
CPU time by 25%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13613

23 months agoAvoid memory copies during mirror scrub
Alexander Motin [Tue, 5 Jul 2022 23:26:20 +0000 (19:26 -0400)]
Avoid memory copies during mirror scrub

Issuing several scrub reads for a block we may use the parent ZIO
buffer for one of child ZIOs.  If that read complete successfully,
then we won't need to copy the data explicitly.  If block has only
one copy (typical for root vdev, which is also a mirror inside),
then we never need to copy -- succeed or fail as-is.  Previous
code also copied data from buffer of every successfully completed
child ZIO, but that just does not make any sense.

On healthy N-wide mirror this saves all N+1 (or even more in case
of ditto blocks) memory copies for each scrubbed block, allowing
CPU to focus mostly on check-summing.  For other vdev types it
should save one memory copy per block copy at root vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13606

23 months agoFix and disable blocks statistics during scrub
Alexander Motin [Tue, 28 Jun 2022 18:23:31 +0000 (14:23 -0400)]
Fix and disable blocks statistics during scrub

Block statistics calculation during scrub I/O issue in case of sorted
scrub accounted ditto blocks several times.  Embedded blocks on other
side were not accounted at all.  This change moves the accounting from
issue to scan stage, that fixes both problems and also allows to avoid
pool-wide locking and the lock contention it created.

Since this statistics is quite specific and is not even exposed now
anywhere, disable its calculation by default to not waste CPU time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13579

23 months agoAvoid two 64-bit divisions per scanned block
Alexander Motin [Mon, 27 Jun 2022 18:08:21 +0000 (14:08 -0400)]
Avoid two 64-bit divisions per scanned block

Change math to make it like the ARC, using multiplications instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13591

23 months agoSeveral B-tree optimizations
Alexander Motin [Fri, 24 Jun 2022 20:55:58 +0000 (16:55 -0400)]
Several B-tree optimizations

- Introduce first element offset within a leaf.  It allows to reduce
by ~50% average memmove() size when adding/removing elements.  If the
added/removed element is in the first half of the leaf, we may shift
elements before it and adjust the bth_first instead of moving more
elements after it.
 - Use memcpy() instead of memmove() when we know there is no overlap.
 - Switch from uint64_t to uint32_t.  It does not limit anything,
but 32-bit arches should appreciate it greatly in hot paths.
 - Store leaf capacity in struct btree to avoid 64-bit divisions.
 - Adjust zfs_btree_insert_into_leaf() to always result in balanced
leaves after splitting, no matter where the new element was inserted.
Not that we care about it much, but it should also allow B-trees with
as little as two elements per leaf instead of 4 previously.

When scrubbing pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks this
reduces amount of time spent in memmove() inside the scan thread from
13.7% to 5.7% and total scrub time by ~15 seconds out of 9 minutes.
It should also reduce spacemaps load time, but I haven't measured it.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13582

23 months agoSeveral sorted scrub optimizations
Alexander Motin [Fri, 24 Jun 2022 16:50:37 +0000 (12:50 -0400)]
Several sorted scrub optimizations

- Reduce size and comparison complexity of q_exts_by_size B-tree.
Previous code used two 64-bit divisions and many other operations to
compare two B-tree elements.  It created enormous overhead.  This
implementation moves the math to the upper level and stores the score
in the B-tree elements themselves.  Since all that we need to store in
that B-tree is the extent score and offset, those can fit into single
8 byte value instead of 24 bytes of q_exts_by_addr element and can be
compared with single operation.
 - Better decouple secondary tree logic from main range_tree by moving
rt_btree_ops and related functions into dsl_scan.c as ext_size_ops.
Those functions are very small to worry about the code duplication and
range_tree does not need to know details such as rt_btree_compare.
 - Instead of accounting number of pending bytes per pool, that needs
atomic on global variable per block, account the number of non-empty
per-vdev queues, that change much more rarely.
 - When extent scan is interrupted by TXG end, continue it in the next
TXG instead of selecting next best extent.  It allows to avoid leaving
one truncated (and so likely not the best any more) extent each TXG.

On top of some other optimizations this saves about 1.5 minutes out of
10 to scrub pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13576

23 months agoFreeBSD: Improve crypto_dispatch() handling
Alexander Motin [Fri, 17 Jun 2022 22:38:51 +0000 (18:38 -0400)]
FreeBSD: Improve crypto_dispatch() handling

Handle crypto_dispatch() return values same as crp->crp_etype errors.
On FreeBSD 12 many drivers returned same errors both ways, and lack
of proper handling for the first ended up in assertion panic later.
It was changed in FreeBSD 13, but there is no reason to not be safe.

While there, skip waiting for completion, including locking and
wakeup() call, for sessions on synchronous crypto drivers, such as
typical aesni and software.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13563

23 months agoReduce ZIO io_lock contention on sorted scrub
Alexander Motin [Wed, 15 Jun 2022 21:25:08 +0000 (17:25 -0400)]
Reduce ZIO io_lock contention on sorted scrub

During sorted scrub multiple threads (one per vdev) are issuing many
ZIOs same time, all using the same scn->scn_zio_root ZIO as parent.
It causes huge lock contention on the single global lock on that ZIO.
Improve it by introducing per-queue null ZIOs, children to that one,
and using them instead as proxy.

For 12 SSD pool storing 1.5TB of 4KB blocks on 80-core system this
dramatically reduces lock contention and reduces scrub time from 21
minutes down to 12.5, while actual read stages (not scan) are about
3x faster, reaching 100K blocks per second per vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13553

23 months agoAVL: Remove obsolete branching optimizations
Alexander Motin [Thu, 9 Jun 2022 22:27:36 +0000 (18:27 -0400)]
AVL: Remove obsolete branching optimizations

Modern Clang and GCC can successfully implement simple conditions
without branching with math and flag operations.  Use of arrays for
translation no longer helps as much as it was 14+ years ago.

Disassemble of the code generated by Clang 13.0.0 on FreeBSD 13.1,
Clang 14.0.4 on FreeBSD 14 and GCC 10.2.1 on Debian 11 with this
change still shows no branching instructions.

Profiling of CPU-bound scan stage of sorted scrub shows reproducible
reduction of time spent inside avl_find() from 6.52% to 4.58%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13540

23 months agoMore speculative prefetcher improvements
Alexander Motin [Wed, 25 May 2022 17:12:52 +0000 (13:12 -0400)]
More speculative prefetcher improvements

- Make prefetch distance adaptive: up to 4MB prefetch doubles for
every, hit same as before, but after that it grows by 1/8 every time
the prefetch read does not complete in time to satisfy the demand.
My tests show that 4MB is sufficient for wide NVMe pool to saturate
single reader thread at 2.5GB/s, while new 64MB maximum allows the
same thread to reach 1.5GB/s on wide HDD pool.  Further distance
increase may increase speed even more, but less dramatic and with
higher latency.

 - Allow early reuse of inactive prefetch streams: streams that never
saw hits can be reused immediately if there is a demand, while others
can be reused after 1s of inactivity, starting with the oldest.  After
2s of inactivity streams are deleted to free resources same as before.
This allows by several times increase strided read performance on HDD
pool in presence of simultaneous random reads, previously filling the
zfetch_max_streams limit for seconds and so blocking most of prefetch.

 - Always issue intermediate indirect block reads with SYNC priority.
Each of those reads if delayed for longer may delay up to 1024 other
block prefetches, that may be not good for wide pools.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13452

23 months agoImprove mg_aliquot math
Alexander Motin [Wed, 4 May 2022 18:33:42 +0000 (14:33 -0400)]
Improve mg_aliquot math

When calculating mg_aliquot alike to #12046 use number of unique data
disks in the vdev, not the total number of children vdev.  Increase
default value of the tunable from 512KB to 1MB to compensate.

Before this change each disk in striped pool was getting 512KB of
sequential data, in 2-wide mirror -- 1MB, in 3-wide RAIDZ1 -- 768KB.
After this change in all the cases each disk should get 1MB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13388

23 months agoImprove log spacemap load time
Alexander Motin [Tue, 26 Apr 2022 17:44:21 +0000 (13:44 -0400)]
Improve log spacemap load time

Previous flushing algorithm limited only total number of log blocks to
the minimum of 256K and 4x number of metaslabs in the pool.  As result,
system with 1500 disks with 1000 metaslabs each, touching several new
metaslabs each TXG could grow spacemap log to huge size without much
benefits.  We've observed one of such systems importing pool for about
45 minutes.

This patch improves the situation from five sides:
 - By limiting maximum period for each metaslab to be flushed to 1000
TXGs, that effectively limits maximum number of per-TXG spacemap logs
to load to the same number.
 - By making flushing more smooth via accounting number of metaslabs
that were touched after the last flush and actually need another flush,
not just ms_unflushed_txg bump.
 - By applying zfs_unflushed_log_block_pct to the number of metaslabs
that were touched after the last flush, not all metaslabs in the pool.
 - By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in
advance, making log spacemap load process for wide HDD pool CPU-bound,
accelerating it by many times.
 - By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing
single-threaded by nature log processing time from ~10 to ~5 minutes.

As further optimization we could skip bumping ms_unflushed_txg for
metaslabs not touched since the last flush, but that would be an
incompatible change, requiring new pool feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12789

23 months agoAdd more control/visibility to spa_load_verify().
Alexander Motin [Fri, 4 Feb 2022 21:06:38 +0000 (16:06 -0500)]
Add more control/visibility to spa_load_verify().

Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13022

23 months agospa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)
Allan Jude [Tue, 26 Oct 2021 23:15:38 +0000 (19:15 -0400)]
spa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)

The fnvlist versions of the functions are fatal if they fail,
saving each call from having to include checking the result.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Allan Jude <allan@klarasystems.com>
23 months agoAvoid small buffer copying on write
Alexander Motin [Tue, 27 Jul 2021 23:05:47 +0000 (19:05 -0400)]
Avoid small buffer copying on write

It is wrong for arc_write_ready() to use zfs_abd_scatter_enabled to
decide whether to reallocate/copy the buffer, because the answer is
OS-specific and depends on the buffer size.  Instead of that use
abd_size_alloc_linear(), moved into public header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12425

23 months agoRemove refcount from spa_config_*()
Alexander Motin [Thu, 1 Jul 2021 15:16:54 +0000 (11:16 -0400)]
Remove refcount from spa_config_*()

The only reason for spa_config_*() to use refcount instead of simple
non-atomic (thanks to scl_lock) variable for scl_count is tracking,
hard disabled for the last 8 years.  Switch to simple int scl_count
reduces the lock hold time by avoiding atomic, plus makes structure
fit into single cache line, reducing the locks contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12287

23 months agoScrub mirror children without BPs
Brian Behlendorf [Thu, 14 Jul 2022 17:21:29 +0000 (10:21 -0700)]
Scrub mirror children without BPs

When scrubbing a raidz/draid pool, which contains a replacing or
sparing mirror with multiple online children, only one child will
be read.  This is not normally a serious concern because the DTL
records are used to determine where a good copy of the data is.
As long as the data can be read from one child the mirror vdev
will use it to repair gaps in any of its children.  Furthermore,
even if the data which was read is corrupt the raidz code will
detect this and issue its own repair I/O to correct the damage
in the mirror vdev.

However, in the scenario where the DTL is wrong due to silent
data corruption (say due to overwriting one child) and the scrub
happens to read from a child with good data, then the other damaged
mirror child will not be detected nor repaired.

While this is possible for both raidz and draid vdevs, it's most
pronounced when using draid.  This is because by default the zed
will sequentially rebuild a draid pool to a distributed spare,
and the distributed spare half of the mirror is always preferred
since it delivers better performance.  This means the damaged
half of the mirror will go undetected even after scrubbing.

For system administrations this behavior is non-intuitive and in
a worst case scenario could result in the only good copy of the
data being unknowingly detached from the mirror.

This change resolves the issue by reading all replacing/sparing
mirror children when scrubbing.  When the BP isn't available for
verification, then compare the data buffers from each child.  They
must all be identical, if not there's silent damage and an error
is returned to prompt the top-level vdev to issue a repair I/O to
rewrite the data on all of the mirror children.  Since we can't
tell which child was wrong a checksum error is logged against the
replacing or sparing mirror vdev.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13555

2 years agoTag zfs-2.1.5
Tony Hutter [Tue, 21 Jun 2022 17:55:02 +0000 (10:55 -0700)]
Tag zfs-2.1.5

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2 years agoRemove install of zfs-load-module.service for dracut
Matthew Thode [Tue, 21 Jun 2022 17:37:20 +0000 (12:37 -0500)]
Remove install of zfs-load-module.service for dracut

The zfs-load-module.service service is not currently provided by
the OpenZFS repository so we cannot safely assume it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes #13574

2 years agoFreeBSD: Silence clang unused-but-set-variable
Ryan Moeller [Wed, 15 Jun 2022 15:18:41 +0000 (15:18 +0000)]
FreeBSD: Silence clang unused-but-set-variable

Quick and dirty build fix for warnings being treated as errors.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
2 years agoImprove sorted scan memory accounting
Alexander Motin [Fri, 10 Jun 2022 17:01:46 +0000 (13:01 -0400)]
Improve sorted scan memory accounting

Since we use two B-trees q_exts_by_size and q_exts_by_addr, we should
count 2x sizeof (range_seg_gap_t) per node.  And since average B-tree
memory efficiency is about 75%, we should increase it to 3x.

Previous code under-counted up to 30% of the memory usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13537

2 years agoCorrected edge case in uncompressed ARC->L2ARC handling
Rich Ercolani [Wed, 4 May 2022 18:59:30 +0000 (14:59 -0400)]
Corrected edge case in uncompressed ARC->L2ARC handling

I genuinely don't know why this didn't come up before,
but adding the LZ4 early abort pointed out this flaw,
in which we're allocating a buffer of one size, and
then telling the compressor that we're handing it buffers
of a different size, which may be Very Different - say,
allocating 512b and then telling it the inputs are 128k.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13375

2 years agoRemove wrong assertion in log spacemap
Alexander Motin [Wed, 1 Jun 2022 16:54:35 +0000 (12:54 -0400)]
Remove wrong assertion in log spacemap

It is typical, but not generally true that if log summary has more
blocks it must also have unflushed metaslabs.  Normally with metaslabs
flushed in order it works, but there are known exceptions, such as
device removal or metaslab being loaded during its flush attempt.

Before 600a02b8844 if spa_flush_metaslabs() hit loading metaslab it
usually stopped (unless memlimit is also exceeded), but now it may
flush more metaslabs, just skipping that particular one.  This
increased chances of assertion to fire when the skipped metaslab is
flushed on next iteration if all other metaslabs in that summary
entry are already flushed out of order.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13486
Closes #13513

2 years agolibzfs: Fail making a dataset handle gracefully
Ryan Moeller [Fri, 18 Feb 2022 21:09:03 +0000 (16:09 -0500)]
libzfs: Fail making a dataset handle gracefully

When a dataset is in the process of being received it gets marked as
inconsistent and should not be used.  We should check for this when
opening a dataset handle in libzfs and return with an appropriate error
set, rather than hitting an abort because of the incomplete data.

zfs_open() passes errno to zfs_standard_error() after observing
make_dataset_handle() fail, which ends up aborting if errno is 0.
Set errno before returning where we know it has not been set already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13077

2 years agolibzfs: mount: don't leak mnt_param_t if mnt_func fails
наб [Wed, 12 Jan 2022 23:50:15 +0000 (00:50 +0100)]
libzfs: mount: don't leak mnt_param_t if mnt_func fails

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968

2 years agoReject zfs send -RI with nonexistent fromsnap
Rich Ercolani [Mon, 4 Oct 2021 17:17:30 +0000 (13:17 -0400)]
Reject zfs send -RI with nonexistent fromsnap

Right now, zfs send -I dataset@nonexistent dataset@existent fails, but
zfs send -RI dataset@nonexistent dataset@existent does not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12574
Closes #12575

2 years agoLinux 5.18 compat: META
Brian Behlendorf [Tue, 31 May 2022 21:38:00 +0000 (14:38 -0700)]
Linux 5.18 compat: META

Update the META file to reflect compatibility with the 5.18 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13527

2 years agoautoconf: AC_MSG_CHECKING consistency
Brian Behlendorf [Tue, 31 May 2022 23:42:49 +0000 (16:42 -0700)]
autoconf: AC_MSG_CHECKING consistency

Make the wording more consistent for the kernel AC_MSG_CHECKING
output (e.g. "checking whether ...".).  Additionally, group some
of the VFS interface checks with the others.  No functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529

2 years agoLinux 5.19 compat: asm/fpu/internal.h
Brian Behlendorf [Wed, 1 Jun 2022 00:26:47 +0000 (17:26 -0700)]
Linux 5.19 compat: asm/fpu/internal.h

As of the Linux 5.19 kernel the asm/fpu/internal.h header was
entirely removed.  It has been effectively empty since the 5.16
kernel and provides no required functionality.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529

2 years agoLinux 5.19 compat: zap_flags_t conflict
Brian Behlendorf [Fri, 27 May 2022 22:56:05 +0000 (15:56 -0700)]
Linux 5.19 compat: zap_flags_t conflict

As of the Linux 5.19 kernel an identically named zap_flags_t typedef
is declared in the include/linux/mm_types.h linux header.  Sadly,
the inclusion of this header cannot be easily avoided.  To resolve
the conflict a #define is used to remap the name in the OpenZFS
sources when building against the Linux kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoLinux 5.19 compat: bdev_start_io_acct() / bdev_end_io_acct()
Brian Behlendorf [Fri, 27 May 2022 21:31:03 +0000 (21:31 +0000)]
Linux 5.19 compat: bdev_start_io_acct() / bdev_end_io_acct()

As of the Linux 5.19 kernel the disk_*_io_acct() helper functions
have been replaced by the bdev_*_io_acct() functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoLinux 5.19 compat: aops->read_folio()
Brian Behlendorf [Fri, 27 May 2022 20:44:43 +0000 (20:44 +0000)]
Linux 5.19 compat: aops->read_folio()

As of the Linux 5.19 kernel the readpage() address space operation
has been replaced by read_folio().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoLinux 5.19 compat: blkdev_issue_secure_erase()
Brian Behlendorf [Fri, 27 May 2022 19:40:22 +0000 (19:40 +0000)]
Linux 5.19 compat: blkdev_issue_secure_erase()

Linux 5.19 commit torvalds/linux@44abff2c0 splits the secure
erase functionality from the blkdev_issue_discard() function.
The blkdev_issue_secure_erase() must now be issued to issue
a secure erase.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoLinux 5.19 compat: bdev_max_secure_erase_sectors()
Brian Behlendorf [Fri, 27 May 2022 18:20:04 +0000 (18:20 +0000)]
Linux 5.19 compat: bdev_max_secure_erase_sectors()

Linux 5.19 commit torvalds/linux@44abff2c0 removed the
blk_queue_secure_erase() helper function.  The preferred
interface is to now use the bdev_max_secure_erase_sectors()
function to check for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoLinux 5.19 compat: bdev_max_discard_sectors()
Brian Behlendorf [Fri, 27 May 2022 17:51:55 +0000 (17:51 +0000)]
Linux 5.19 compat: bdev_max_discard_sectors()

Linux 5.19 commit torvalds/linux@70200574cc removed the
blk_queue_discard() helper function.  The preferred interface
is to now use the bdev_max_discard_sectors() function to check
for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoLinux 5.18 compat: bio_alloc()
Brian Behlendorf [Fri, 27 May 2022 20:28:51 +0000 (20:28 +0000)]
Linux 5.18 compat: bio_alloc()

As for the Linux 5.18 kernel bio_alloc() expects a block_device struct
as an argument.  This removes the need for the bio_set_dev() compatibility
code for 5.18 and newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoSilence unused-but-set-variable warning
Ryan Moeller [Thu, 26 May 2022 00:26:59 +0000 (20:26 -0400)]
Silence unused-but-set-variable warning

This was breaking the kmod port build on FreeBSD with Clang 13.

Use the same trick as we do for ASSERT() to make DNODE_VERIFY() use
its parameter at compile time without actually using it at run time
in non-debug builds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13507

2 years agozed: support subject as header in zed_notify_email()
heeplr [Wed, 18 May 2022 17:27:53 +0000 (19:27 +0200)]
zed: support subject as header in zed_notify_email()

Some minimal MUAs don't support passing the subjects as cmdline option.
This commit checks if "@SUBJECT@" is missing in ZED_EMAIL_OPTS and then
prepends a subject header to the notification message.
Also set a default for ${subject}.

Reviewed-by: Ahelenia Ziemia<C5><84>ska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Daniel Hiepler <d-git@coderdu.de>
Closes #13440

2 years agorpm: Keep debug symbols if configured with '--enable-debuginfo'
Umer Saleem [Wed, 25 May 2022 16:22:11 +0000 (21:22 +0500)]
rpm: Keep debug symbols if configured with '--enable-debuginfo'

Do not strip debug information from packages if '--enable-debuginfo' is
configured.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13500

2 years agoFreeBSD: libspl: Add locking around statfs globals
Ryan Moeller [Tue, 24 May 2022 16:40:20 +0000 (12:40 -0400)]
FreeBSD: libspl: Add locking around statfs globals

Makes getmntent and getmntany thread-safe for external consumers of
libzfs zpool_disable_datasets, zfs_iter_mounted, libzfs_mnttab_update,
libzfs_mnttab_find.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13484

2 years agoStandardize RHEL version check in packages
Brian Behlendorf [Wed, 25 May 2022 16:20:17 +0000 (09:20 -0700)]
Standardize RHEL version check in packages

This is a follow up to 3c356622994 which standardizes how the RHEL
version check is done.  This simpler "0%{?rhel}" check is used
elsewhere in the packages so we do the same here.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13501

2 years agoModified ncompress requirement in RPM to exclude RHEL9
Rich Ercolani [Tue, 24 May 2022 16:39:32 +0000 (12:39 -0400)]
Modified ncompress requirement in RPM to exclude RHEL9

The bug this was working around is no longer present.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13480
Closes #13490

2 years agozed: Take no action on scrub/resilver checksum errors
Brian Behlendorf [Tue, 24 May 2022 16:36:07 +0000 (09:36 -0700)]
zed: Take no action on scrub/resilver checksum errors

When scrubbing/resilvering a pool it can be counter productive to
cancel the scan and kick of a replace operation to a hot spare
when encountering checksum errors.  In this case, the best course
of action is to allow the scrub/resilver to complete as quickly
as possible and to keep the vdevs fully online if possible.

Realistically, this is less of an issue for a RAIDZ since a
traditional resilver must be used and checksums will be verified.
However, this is not the case for a mirror or dRAID pool which is
sequentially resilvered and checksum verification is deferred
until after the replace operation completes.

Regardless, we apply this policy to all pool types since it's
a good idea for all vdevs.  Degrading additional vdevs has the
potential to make a bad situation worse.  Note the checksum
errors will still be reported as both an event and by
`zpool status`.  This change only prevents the ZED from
proactively taking any action.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13499

2 years agozdb: Fix handling of nul termination in symlink targets
Mark Johnston [Fri, 20 May 2022 17:32:49 +0000 (13:32 -0400)]
zdb: Fix handling of nul termination in symlink targets

The SA attribute containing the symlink target does not include a nul
terminator, so when printing the target zdb would sometimes include
garbage at the end of the string.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13482

2 years agoautomake: don't install /e/d/zfs or /e/z/zfs-functions +x
наб [Tue, 24 May 2022 23:25:54 +0000 (01:25 +0200)]
automake: don't install /e/d/zfs or /e/z/zfs-functions +x

Closes #13496
Backport-of: #13503
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
2 years agoMultiple dracut module install script cleanups
Savyasachee Jha [Mon, 14 Feb 2022 16:58:37 +0000 (22:28 +0530)]
Multiple dracut module install script cleanups

- Replaced intances of `dracut_install` with `inst_simple`
- Removed calls to `test -x mark_hostonly` because the function is an
inbuilt dracut function
- Removed redundant installation of `systemd-ask-password` and
`systemd-tty-ask-password-agent` because they are already installed by
the systemd module. There is no need to install them again
- Removed multiple calls to the `mark_hostonly` function because the
`inst_simple` has a command-line switch for it
- Cleaned up the installation of the `zpool.cache`, `vdev_id.conf` and
`hostid` files to make the logic easier to follow
- Cleaned up and simplified the systemd service installation logic by
invoking systemctl instead of creating symlinks manually
- Replaced various hard-coded paths with dracut equivalents to better
conform with expected dracut behaviour
- Removed redundant call to `mkdir` (`inst_simple` creates the parent
directory if it does not exist on the destination initrd)

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010

2 years agoRemove absolute paths to udev rules and binaries for dracut
Savyasachee Jha [Mon, 14 Feb 2022 12:45:16 +0000 (18:15 +0530)]
Remove absolute paths to udev rules and binaries for dracut

Since dracut functions can locate both udev rules and binaries, there is
no point in keeping absolute paths in the module setup script. It also
breaks the --sysroot option in dracut. This commit removes mentions to
absolute paths for binaries and udev rules.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010

2 years agoMake dracut fail if essential files cannot be installed
Savyasachee Jha [Sun, 6 Feb 2022 04:11:23 +0000 (09:41 +0530)]
Make dracut fail if essential files cannot be installed

Dracut will now fail in initramfs generation if essential files cannot
be installed.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010

2 years agoMake better use of dracut functions when building initramfs
Savyasachee Jha [Tue, 25 Jan 2022 03:22:48 +0000 (03:22 +0000)]
Make better use of dracut functions when building initramfs

Setting up the module involves multiple redundant calls to a bunch of
dracut functions wheich can be combined into one. Additionally, the mass
of code required to load libgcc_s.so* can be replaced with one dracut
function. This has the additional effect of removing errors involving
the non-installation of libgcc_s.so* which are seen on debian bullseye
when using version 2.1.2-1~bpo11+1 from the backports repository.

The systemd binaries are separated out into their own `dracut_install`
function call so they do not get pulled in when dracut does not load the
systemd module.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010

2 years agoFix compiler warnings about zero-length arrays in inline bitops
Coleman Kane [Tue, 17 May 2022 20:07:39 +0000 (16:07 -0400)]
Fix compiler warnings about zero-length arrays in inline bitops

The compiler appears to be expanding the unused NULL pointer into a
zero-length array via the inline bitops code. When -Werror=array-bounds
is used, this causes a build failure. Recommended solution is allocate
temporary structures, fill with zeros (to avoid uninitialized data use
warnings), and pass the pointer to those to the inline calls.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13463
Closes #13465

2 years agoAdd missing AC_MSG_RESULT(no) to configure
Brian Behlendorf [Thu, 12 May 2022 16:12:32 +0000 (09:12 -0700)]
Add missing AC_MSG_RESULT(no) to configure

When the HAVE_IOPS_MKDIR_USERNS check fails output result
as required.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13454

2 years agoabd_os: remove redundant refcount creation for abd_children
hping [Mon, 9 May 2022 23:30:16 +0000 (07:30 +0800)]
abd_os: remove redundant refcount creation for abd_children

Refcount creation for abd_zero_scatter->abd_children is redundant in
abd_alloc_zero_scatter, as it has been done in abd_init_struct.

In addition, abd_children is undefined when ZFS_DEBUG is disabled, the
reference of abd_children in abd_alloc_zero_scatter breaks build of
libzpool when ZFS_DEBUG is disabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13429

2 years agoFix functions without a prototype
Aidan Harris [Fri, 6 May 2022 18:57:37 +0000 (18:57 +0000)]
Fix functions without a prototype

clang-15 emits the following error message for functions without
a prototype:

fs/zfs/os/linux/spl/spl-kmem-cache.c:1423:27: error:
  a function declaration without a prototype is deprecated
  in all versions of C [-Werror,-Wstrict-prototypes]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aidan Harris <me@aidanharr.is>
Closes #13421

2 years agoFreeBSD: use zero_region instead of allocating a dedicated page
Mateusz Guzik [Wed, 4 May 2022 18:46:37 +0000 (20:46 +0200)]
FreeBSD: use zero_region instead of allocating a dedicated page

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13406

2 years agoautoconf: Fail when __copy_from_user_inatomic is a non-GPL symbol
szubersk [Sat, 7 May 2022 00:53:42 +0000 (00:53 +0000)]
autoconf: Fail when __copy_from_user_inatomic is a non-GPL symbol

A followup to 849c14e04844a2f0e1f7e42886c2cef083563f35
Fix https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009242

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13389

2 years agoPPC get_user workaround
Damian Szuberski [Tue, 26 Apr 2022 17:52:40 +0000 (03:52 +1000)]
PPC get_user workaround

Linux 5.12 PPC 5.12 get_user() and __copy_from_user_inatomic()
inline helpers very indirectly include a reference to the GPL'd
array mmu_feature_keys[] and fails to build. Workaround this by
using copy_from_user() and throwing EFAULT for any calls to
__copy_from_user_inatomic(). This is a workaround until a fix
for Linux commit 7613f5a66becfd0e43a0f34de8518695888f5458
"powerpc/64s/kuap: Use mmu_has_feature()" is fully addressed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11958
Closes #12590
Closes #13367

2 years agoAdding ZERO_PAGE detection
Brian Atkinson [Mon, 14 Mar 2022 19:37:39 +0000 (13:37 -0600)]
Adding ZERO_PAGE detection

On some architectures ZERO_PAGE is unavailable because it references
a GPL exported symbol of empty_zero_page. Originally e08b993 removed
the call to PAGE_ZERO(0) for assignment to the abd_zero_page. However,
a simple check can be done to avoid a kernel allocation and free for
the abd_zero_page if ZERO_PAGE is available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13199

2 years agoautoconf: Pretend `CONFIG_MODULES` is always on
Damian Szuberski [Tue, 26 Apr 2022 17:47:09 +0000 (03:47 +1000)]
autoconf: Pretend `CONFIG_MODULES` is always on

- Unconditionally inject `CONFIG_MODULES` make variable
  and `#define CONFIG_MODULES` to Kbuild in `ZFS_LINUX_COMPILE`
  autoconf function to emulate loadable kernel modules support.
  This allows OpenZFS to perform Linux checks despite
  `CONFIG_MODULES=n` in the actual Linux config.

- Add `ZFS_AC_KERNEL_CONFIG_MODULES` check which encompasses
  the logic from `ZFS_AC_KERNEL_TEST_MODULE` with additional
  diagnostic messages to the user

- Removed `ZFS_AC_KERNEL_TEST_MODULE` as it merely duplicates
  every check in `ZFS_AC_KERNEL_CONFIG_DEFINED`

- Moved `ZFS_AC_MODULE_SYMVERS` after `ZFS_AC_KERNEL_CONFIG_DEFINED`
  so the user has a chance to see the proper diagnostic from the
  steps before.

A workaround for Linux's

```
commit 3e3005df73b535cb849cf4ec8075d6aa3c460f68
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Wed Mar 31 22:38:03 2021 +0900

kbuild: unify modules(_install) for in-tree and external modules

If you attempt to build or install modules ('make modules(_install)'
with CONFIG_MODULES disabled, you will get a clear error message, but
nothing for external module builds.

Factor out the modules and modules_install rules into the common part,
so you will get the same error message when you try to build external
modules with CONFIG_MODULES=n.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #10832
Closes #13361

2 years agoStrengthen Linux kernel capabilities detection
Damian Szuberski [Thu, 21 Apr 2022 16:37:11 +0000 (18:37 +0200)]
Strengthen Linux kernel capabilities detection

- Add `CONFIG_BLOCK` Linux config requirement to
  `ZFS_AC_KERNEL_CONFIG_DEFINED`. OpenZFS won't compile without
  that block device support due to large amount of functional
  dependencies on it.

- Remove dependency on `groups_alloc()` in
  `ZFS_AC_KERNEL_SRC_GROUP_INFO_GID` to circumvent the missing stub
  in Linux 4.X kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13351

2 years agozvol_wait: Ignore locked zvols
Richard Laager [Fri, 22 Apr 2022 22:37:03 +0000 (17:37 -0500)]
zvol_wait: Ignore locked zvols

"When an encrypted zvol is locked the zfs-volume-wait service does not
start.  The /sbin/zvol_wait should not wait for links when the volume
has property keystatus=unavailable."
-- https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1888405

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Thanks: James Dingwall <james-launchpad@dingwall.me.uk>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10662

2 years agoFreeBSD: Implement hole-punching support
Ka Ho Ng [Wed, 4 Aug 2021 15:57:48 +0000 (23:57 +0800)]
FreeBSD: Implement hole-punching support

This adds supports for hole-punching facilities in the FreeBSD kernel
starting from __FreeBSD_version 1400032.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458

2 years agomodule: zstd: check we don't leak symbols; regenerate symbol map
наб [Tue, 15 Mar 2022 23:10:10 +0000 (00:10 +0100)]
module: zstd: check we don't leak symbols; regenerate symbol map

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12988
Closes #13209
(cherry picked from commit 6ef00196db1cc6bd189eeb72df26d494a2aee889)

2 years agoman: zpool-import.8: -d -or -c
наб [Tue, 10 May 2022 20:36:37 +0000 (22:36 +0200)]
man: zpool-import.8: -d -or -c

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13437

2 years agoReduce dbuf_find() lock contention
Brian Behlendorf [Wed, 4 May 2022 18:17:29 +0000 (11:17 -0700)]
Reduce dbuf_find() lock contention

Holding a dbuf is a common operation which can become highly contended
in dbuf_find() when acquiring the dbuf hash mutex.  This is particularly
true on Linux when reading/writing volumes since by default up to 32
threads from the zvol_taskq may be taking a hold of the same dbuf.
This should also be observable on FreeBSD as long as there are enough
processes accessing the volume concurrently.

This is further aggregrated by the fact that only the block id will
be unique when calculating the dbuf hash for a single volume.  The
objset id, object id, and level will be the same for data blocks.
This has been observed to result in a somehwat less than uniform hash
distribution and a longer than expected max hash chain depth (~20)
on a large memory system (256 GB) using volumes.

This commit improves the siutation by switching the hash mutex to
an rwlock to allow concurrent lookups, and increasing DBUF_RWLOCKS
from 2048 to 8192 to further reduce the odds of a hash collision.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13405

2 years agocontrib: dracut: remove getargbool polyfill
наб [Tue, 5 Apr 2022 00:56:57 +0000 (02:56 +0200)]
contrib: dracut: remove getargbool polyfill

It was originally released in dracut 008 in February 2011;
we can probably drop it now

Upstream-commit: 47a02e39721bd226646dbdfe3063563a4a5e9749
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agoAdd dracut.zfs.7
наб [Mon, 4 Apr 2022 23:14:49 +0000 (01:14 +0200)]
Add dracut.zfs.7

Thorough documentation with a dracut.bootup(7)-style flowchart,
dracut.cmdline(7)-style cmdline listing,
and per-file docs like the old README

Upstream-commit: e3fc330d6cc9445821fa44c699b7e66e2f6f209d
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: zfs-needshutdown: don't list
наб [Mon, 4 Apr 2022 23:09:11 +0000 (01:09 +0200)]
contrib: dracut: zfs-needshutdown: don't list

Upstream-commit: 1cc9cc2f89eb67722109bbb5a2d9814b5e77d5db
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: zfs-{rollback,snapshot}-bootfs: order after key loading
наб [Mon, 4 Apr 2022 22:19:38 +0000 (00:19 +0200)]
contrib: dracut: zfs-{rollback,snapshot}-bootfs: order after key loading

This fixes at least one race I got with an encrypted root

Upstream-commit: 6ebdb0b20de444f69404c265352a27e1b4e49b93
Upstream-commit: b8d9679f3644c813f7aa6db5b9c978b88758049a
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: don't require essentials to be under the same encroot
наб [Mon, 4 Apr 2022 21:39:18 +0000 (23:39 +0200)]
contrib: dracut: don't require essentials to be under the same encroot

Upstream-commit: 30c6dce7f7d3808b544ac1c3dbcb4d32c9831c60
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: inline single-use import_pool, move single-use ask_for_password
наб [Mon, 4 Apr 2022 21:24:29 +0000 (23:24 +0200)]
contrib: dracut: inline single-use import_pool, move single-use ask_for_password

Also don't set ROOTFS_MOUNTED; the final mention was removed in dracut
011 from July 2011

Upstream-commit: eaf1e060453ad6a335b708c5e724092741d6d1d3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: zfs-lib: remove find_bootfs
наб [Mon, 4 Apr 2022 21:16:59 +0000 (23:16 +0200)]
contrib: dracut: zfs-lib: remove find_bootfs

Upstream-commit: dac0b0785a795efe3bbb5367af35e2c4fbe32040
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: zfs-lib: simplify ask_for_password
наб [Mon, 4 Apr 2022 20:57:35 +0000 (22:57 +0200)]
contrib: dracut: zfs-lib: simplify ask_for_password

The only user is mount-zfs.sh (non-systemd systems),
so reduce it to what it needs

Upstream-commit: 5d31169d7c8810baf87f1dc778c97a7024c8c205
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib; dracut: flatten zfs-load-key, simplify zfs-env-bootfs
наб [Mon, 4 Apr 2022 20:52:43 +0000 (22:52 +0200)]
contrib; dracut: flatten zfs-load-key, simplify zfs-env-bootfs

Upstream-commit: fec2c613a49192942200d26bd20bf434649687b7
Upstream-change: drop 90zfs/module-setup.sh.in cleanups that don't apply
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib; dracut: centralise root= parsing, actually support root=s
наб [Mon, 4 Apr 2022 20:45:58 +0000 (22:45 +0200)]
contrib; dracut: centralise root= parsing, actually support root=s

So far, everything parsed root= manually, which meant that while
zfs-parse.sh was updated, and supposedly supported + -> ' ' conversion,
it meant nothing

Instead, centralise parsing, and allow:
  root=
  root=zfs
  root=zfs:
  root=zfs:AUTO

  root=ZFS=data/set
  root=zfs:data/set
  root=zfs:ZFS=data/set (as a side-effect; allowed but undocumented)

  rootfstype=zfs AND root=data/set <=> root=data/set
  rootfstype=zfs AND root=         <=> root=zfs:AUTO

So rootfstype=zfs /also/ behaves as expected, and + decoding works

Upstream-commit: 245529d85fb807bfc4525b3b1858896d2860995b
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: parse-zfs: stop pretending we support FILESYSTEM=
наб [Mon, 4 Apr 2022 18:59:53 +0000 (20:59 +0200)]
contrib: dracut: parse-zfs: stop pretending we support FILESYSTEM=

It was added in the original ae26d0465a ("Add dracut support") commit
in 2011, and was then broken a bit later with the advent of
dracut-zfs-generator, or maybe earlier as part of other churn

Either way, it's broken, and has been in 2.0+ as well, and no-one
complained. Stop pretending we support it at all

Upstream-commit: 2c74617bcf76b305937097278b614443d47681a0
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib: dracut: parse-zfs: drop initqueue-finished for i/f
наб [Mon, 4 Apr 2022 16:38:07 +0000 (18:38 +0200)]
contrib: dracut: parse-zfs: drop initqueue-finished for i/f

The switch was released in dracut 009 in March 2011,
we can safely get rid of the compatibility hook

Upstream-commit: 47636f5661abf202c725f1975939878e7156d3c8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291

2 years agocontrib/dracut: zfs-lib: export_all: replace with inline zpool export -a
наб [Sat, 12 Feb 2022 13:08:02 +0000 (14:08 +0100)]
contrib/dracut: zfs-lib: export_all: replace with inline zpool export -a

07a3312f170ac56cb480b0df9fdf4c83f116b59b, which introduced this in
October of 2014, didn't have zpool export -a available; we do

Upstream-commit: 6a41310c7099ca4532f2d8134bba37261f72410e
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093

2 years agoRemove REMAKE_INITRD
jokersus [Tue, 30 Nov 2021 19:09:15 +0000 (22:09 +0300)]
Remove REMAKE_INITRD

The option has been deprecated in dkms and will break packaging in
future versions. See https://github.com/dell/dkms/commit/7114c62

Upstream-commit: b5c16861e9fa91871540a101e92013e05182fc2f
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: jokersus <jokersus.cava@gmail.com>
Closes #12781

2 years agoPython 3.10 fixes, part 2
Rich Ercolani [Fri, 29 Oct 2021 22:36:01 +0000 (18:36 -0400)]
Python 3.10 fixes, part 2

There was a fallback case I overlooked in the initial patch, with
a similarly imperfect version extractor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12045
Closes #12673

2 years agoSilence unused-but-set-variable warnings
Brian Behlendorf [Fri, 29 Apr 2022 17:11:20 +0000 (10:11 -0700)]
Silence unused-but-set-variable warnings

Clang 13.0.0 added support for `Wunused-but-set-parameter` and
`-Wunused-but-set-variable` which correctly detects two unused
variables in zstd resulting in a build failure.  This commit
annotates these instances accordingly.

  https://releases.llvm.org/13.0.1/tools/clang/docs/ReleaseNotes.html#id6

In FSE_createCTable(), malloc() is intentionally defined as NULL when
compiled in the kernel so the variable is unused.

  zstd/lib/compress/fse_compress.c:307:12: error: variable 'size'
  set but not used [-Werror,-Wunused-but-set-variable]

Additionally, in ZSTD_seqDecompressedSize() the assert is compiled
out similarly resulting in an unused variable.

  zstd/lib/compress/zstd_compress_superblock.c:412:12: error: variable
  'litLengthSum' set but not used [-Werror,-Wunused-but-set-variable]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2 years agomodule: zfs: freebsd: fix unused, remove argsused
наб [Wed, 22 Dec 2021 17:35:13 +0000 (18:35 +0100)]
module: zfs: freebsd: fix unused, remove argsused

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844

2 years agoFreeBSD: remove unused variable
наб [Thu, 23 Dec 2021 20:47:43 +0000 (21:47 +0100)]
FreeBSD: remove unused variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12899

2 years agozvol: remove unused variable
наб [Mon, 27 Dec 2021 18:33:58 +0000 (19:33 +0100)]
zvol: remove unused variable

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917

2 years agofm: remove unused variables
наб [Mon, 27 Dec 2021 18:33:38 +0000 (19:33 +0100)]
fm: remove unused variables

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917

2 years agozvol: remove unused variable
наб [Mon, 27 Dec 2021 18:33:16 +0000 (19:33 +0100)]
zvol: remove unused variable

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917

2 years agomodule/zfs: vdev_removal: spa_vdev_remove_thread: remove unused variable
наб [Thu, 3 Jun 2021 16:33:01 +0000 (18:33 +0200)]
module/zfs: vdev_removal: spa_vdev_remove_thread: remove unused variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187

2 years agomodule/zfs: vdev_indirect: vdev_indirect_repair: remove unused variable
наб [Thu, 3 Jun 2021 16:32:38 +0000 (18:32 +0200)]
module/zfs: vdev_indirect: vdev_indirect_repair: remove unused variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187

2 years agomodule/zfs: dbuf: dbuf_read_impl: remove unused variable
наб [Thu, 3 Jun 2021 16:31:58 +0000 (18:31 +0200)]
module/zfs: dbuf: dbuf_read_impl: remove unused variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187

2 years agomodule/zfs: arc: arc_hdr_realloc_crypt: remove unused variables
наб [Thu, 3 Jun 2021 16:31:21 +0000 (18:31 +0200)]
module/zfs: arc: arc_hdr_realloc_crypt: remove unused variables

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187

2 years agolibzfs: zfs_send: remove unused variable
наб [Thu, 3 Jun 2021 16:30:23 +0000 (18:30 +0200)]
libzfs: zfs_send: remove unused variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187

2 years agolibzutil: zpool_find_config: remove unused variable
наб [Thu, 3 Jun 2021 15:10:41 +0000 (17:10 +0200)]
libzutil: zpool_find_config: remove unused variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187

2 years agoSkip spacemaps reading in case of pool readonly import
Brian Behlendorf [Thu, 28 Apr 2022 23:47:12 +0000 (16:47 -0700)]
Skip spacemaps reading in case of pool readonly import

The only zdb utility require to read metaslab-related data during
read-only pool import because of spacemaps validation. Add global
variable which will allow zdb read spacemaps in case of readonly
import mode.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9095
Closes #12687

2 years agozfs: holds: dequadratify
наб [Thu, 28 Apr 2022 22:18:47 +0000 (00:18 +0200)]
zfs: holds: dequadratify

Before:
  15  0m0.177s
  30  0m0.653s
  45  0m1.289s
  60  0m2.129s
  75  0m3.264s
  90  0m4.397s
  100 0m5.996s
  117 0m8.552s

After:
  30  0m0.053s
  117 0m0.125s

Upstream-commit: 2a70a090729dade59967bbfe6a4984c18613e0ee
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13372
Closes #13373

2 years agoLinux 5.18 compat: replace __set_page_dirty_nobuffers
Brian Behlendorf [Thu, 28 Apr 2022 22:17:38 +0000 (15:17 -0700)]
Linux 5.18 compat: replace __set_page_dirty_nobuffers

Replace __set_page_dirty_nobuffers with filemap_dirty_folio.

Upstream-commit: 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a
("Merge tag 'folio-5.18b' of
git://git.infradead.org/users/willy/pagecache ")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Authored-by: Satadru Pramanik <satadru@gmail.com>
Signed-off-by: Satadru Pramanik <satadru@gmail.com>
Closes #13325
Closes #13380

2 years agoFix O_APPEND for Linux 3.15 and older kernels
Brian Behlendorf [Thu, 28 Apr 2022 22:15:28 +0000 (15:15 -0700)]
Fix O_APPEND for Linux 3.15 and older kernels

When using a Linux kernel which predates the iov_iter interface the
O_APPEND flag should be applied in zpl_aio_write() via the call to
generic_write_checks().  The updated pos variable  was incorrectly
ignored resulting in the current offset being used.

This issue should only realistically impact the RHEL/CentOS 7.x
kernels which are based on Linux 3.10.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13370
Closes #13377