]> CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log
FreeBSD/FreeBSD.git
22 months agoFreeBSD: Cleanup dead code from VFS
Richard Yao [Fri, 2 Sep 2022 20:20:10 +0000 (16:20 -0400)]
FreeBSD: Cleanup dead code from VFS

The vfs_*_feature() macros turn anything that uses them into dead code,
so we can delete all of it.

As a side effect, zfs_set_fuid_feature() is now identical in
module/os/freebsd/zfs/zfs_vnops_os.c and
module/os/linux/zfs/zfs_vnops_os.c. A few other functions are identical
too. Future cleanup could move these into a common file.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13832

22 months agoAlloc zdb_cd_t to fix stack issue
Andrew Innes [Fri, 2 Sep 2022 20:15:18 +0000 (04:15 +0800)]
Alloc zdb_cd_t to fix stack issue

Alloc zdb_cd_t since it is too large for the stack on windows
which results in `zdb` crashing immediately.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Closes #13807

22 months agoImporting from cachefile can trip assertion
George Wilson [Fri, 26 Aug 2022 21:04:27 +0000 (16:04 -0500)]
Importing from cachefile can trip assertion

When importing from cachefile, it is possible that the builtin retry
logic will trip an assertion because it also fails to find the pool.
This fix addresses that case and returns the correct error message to
the user.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #13781

22 months agoZTS: zvol_stress: fix race condition with zinject usage
Christian Schwarz [Thu, 25 Aug 2022 21:22:10 +0000 (23:22 +0200)]
ZTS: zvol_stress: fix race condition with zinject usage

In automated ZTS runs, I'd occasionally hit

    log_fail "Expected to see some write errors"

because there weren't any write errors.

The reason is that we're not syncing the zpool before `zinject -c`.
If the writes by `dd` aren't synced out at the time `zinject -c` runs,
they will not hit an error and we'll hit the log_fail above.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13793

22 months agoRevert "Avoid panic with recordsize > 128k, raw sending and no large_blocks"
Brian Behlendorf [Thu, 25 Aug 2022 20:33:32 +0000 (13:33 -0700)]
Revert "Avoid panic with recordsize > 128k, raw sending and no large_blocks"

This reverts commit 80a650b7bb04bce3aef5e4cfd1d966e3599dafd4.  This change
inadvertently introduced a regression in ztest where one of the new ASSERTs
is triggered in dsl_scan_visitbp().

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12275
Closes #13799

22 months agoUpdates for snapshots_changed property
Umer Saleem [Wed, 24 Aug 2022 21:20:43 +0000 (02:20 +0500)]
Updates for snapshots_changed property

Currently, snapshots_changed property is stored in dd_props_zapobj, due
to which the property is assumed to be local. This causes a difference
in behavior with respect to other readonly properties.

This commit stores the snapshots_changed property in dd_object. Source
is not set to local in this case, which makes it consistent with other
readonly properties.

This commit also updates the date string format to include seconds.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13785

22 months agoFix zpool status in case of unloaded keys
George Amanakis [Tue, 23 Aug 2022 00:42:01 +0000 (02:42 +0200)]
Fix zpool status in case of unloaded keys

When scrubbing an encrypted filesystem with unloaded key still report an
error in zpool status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13675
Closes #13717

22 months agoPrevent zevent list from consuming all of kernel memory
Paul Dagnelie [Mon, 22 Aug 2022 19:36:22 +0000 (12:36 -0700)]
Prevent zevent list from consuming all of kernel memory

There are a couple changes included here. The first is to introduce
a cap on the size the ZED will grow the zevent list to. One million
entries is more than enough for most use cases, and if you are
overflowing that value, the problem needs to be addressed another
way. The value is also tunable, for those who want the limit to be
higher or lower.

The other change is to add a kernel module parameter that allows
snapshot creation/deletion to be exempted from the history logging;
for most workloads, having these things logged is valuable, but for
some workloads it produces large quantities of log spam and isn't
especially helpful.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #13374
Closes #13753

22 months agocontrib: dracut: zfs-snapshot-bootfs: exit status fix
gregory-lee-bartholomew [Fri, 12 Aug 2022 21:28:15 +0000 (16:28 -0500)]
contrib: dracut: zfs-snapshot-bootfs: exit status fix

When the zfs-snapshot-bootfs service attempts to create a snapshot
that already exists, the exit status of the command is non-zero and
the service reports failed to the systemd service manager. This is a
common occurrence if bootfs.snapshot is left set on the kernel command
line and it should not be considered a failure.

This service was originally set to ignore this error by prefixing
the command with - on the ExecStart line, but the leading - appears
to have been dropped in #13359.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13769

22 months agoarcstat: fix -p option
r-ricci [Fri, 12 Aug 2022 21:21:52 +0000 (22:21 +0100)]
arcstat: fix -p option

When the -p option is used, a list of floats is passed to sep.join(),
which expects strings. Fix this by converting each value to a string.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Roberto Ricci <ricci@disroot.org>
Closes #12916
Closes #13767

22 months agoEnable relatime by default
George Melikov [Fri, 12 Aug 2022 21:20:25 +0000 (00:20 +0300)]
Enable relatime by default

Linux sets relatime on mount by default for any file system,
but relatime=off in ZFS disables it explicitly.

Let's be consistent with other file systems on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13614

22 months agoZTS: Fix zpool_expand_001_pos
Tony Hutter [Tue, 9 Aug 2022 20:26:46 +0000 (13:26 -0700)]
ZTS: Fix zpool_expand_001_pos

`zpool_expand_001_pos` was often failing due to not seeing autoexpand
commands in the `zpool history`.  During testing, I found this to be
unreliable (sometimes the "online" wouldn't appear in `zpool history`)
and unnecessary, as we could simply check that the pool increased in
size.

This commit revamps the test to check for the expanded pool size
and corresponding new free space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13743

23 months agoAdd comment on acb_zio_dummy
Christian Schwarz [Mon, 8 Aug 2022 23:55:13 +0000 (01:55 +0200)]
Add comment on acb_zio_dummy

Thanks to George Wilson for clarifying this on Slack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13698

23 months agoLinux 6.0 compat: register_shrinker() now var-arg
Coleman Kane [Mon, 8 Aug 2022 23:18:30 +0000 (19:18 -0400)]
Linux 6.0 compat: register_shrinker() now var-arg

The 6.0 kernel added a printf-style var-arg for args > 0 to the
register_shrinker function, in order to add names to shrinkers, in
commit e33c267ab70de4249d22d7eab1cc7d68a889bac2. This enables the
shrinkers to have friendly names exposed in /sys/kernel/debug/shrinker/.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13748

23 months agolibzfs: Remove unused zpool_get_physpath()
Ryan Moeller [Fri, 5 Aug 2022 00:04:09 +0000 (20:04 -0400)]
libzfs: Remove unused zpool_get_physpath()

This is an oddly specific function that has never had any consumers in
the history of this repo.  Get rid of it and the pile of helper
functions that exist for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13724

23 months agozpool: fix redundancy check after vdev removal
Stéphane Lesimple [Fri, 5 Aug 2022 00:02:57 +0000 (03:02 +0300)]
zpool: fix redundancy check after vdev removal

The presence of indirect vdevs was confusing get_redundancy(), which
considered a pool with e.g. only mirror top-level vdevs and at least
one indirect vdev (due to the removal of a previous vdev) as already
having a broken redundancy, which is not the case. This lead to the
possibility of compromising the redundancy of a pool by adding
mismatched vdevs without requiring the use of `-f`, and with no
visible notice or warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stéphane Lesimple <speed47_github@speed47.net>
Closes #13705
Closes #13711

23 months agoLinux 5.20 compat: blk_cleanup_disk()
Brian Behlendorf [Thu, 4 Aug 2022 00:37:52 +0000 (17:37 -0700)]
Linux 5.20 compat: blk_cleanup_disk()

As of the Linux 5.20 kernel blk_cleanup_disk() has been removed,
all callers should use put_disk().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728

23 months agoLinux 5.20 compat: bdevname()
Brian Behlendorf [Wed, 3 Aug 2022 18:35:47 +0000 (11:35 -0700)]
Linux 5.20 compat: bdevname()

As of the Linux 5.20 kernel bdevname() has been removed, all
callers should use snprintf() and the "%pg" format specifier.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728

23 months agoDon't double-zero buffers in fault management nvlists
Paul Dagnelie [Thu, 4 Aug 2022 23:53:47 +0000 (16:53 -0700)]
Don't double-zero buffers in fault management nvlists

This is a small cleanup for a trivial problem which happened to
be noticed while another issue was being investigated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13730

23 months agoAdd snapshots_changed as property
Umer Saleem [Tue, 2 Aug 2022 23:45:30 +0000 (04:45 +0500)]
Add snapshots_changed as property

Make dd_snap_cmtime property persistent across mount and unmount
operations by storing in ZAP and restore the value from ZAP on hold
into dd_snap_cmtime instead of updating it.

Expose dd_snap_cmtime as 'snapshots_changed' property that provides a
mechanism to quickly determine whether snapshot list for dataset has
changed without having to mount a dataset or iterate the snapshot list.

It specifies the time at which a snapshot for a dataset was last
created or deleted. This allows us to be more efficient how often we
query snapshots.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13635

23 months agoFreeBSD: Ignore symlink to i386 includes
Ryan Moeller [Tue, 2 Aug 2022 23:34:23 +0000 (19:34 -0400)]
FreeBSD: Ignore symlink to i386 includes

A symlink to i386 includes is created in the build dir on amd64 since
freebsd/freebsd-src@d07600c563039f252becc29ac7d9a454b6b0600d

Tell git to ignore it like the other include links.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13719

23 months agoLinux 5.19 compat: META
Brian Behlendorf [Tue, 2 Aug 2022 17:04:38 +0000 (10:04 -0700)]
Linux 5.19 compat: META

Update the META file to reflect compatibility with the 5.19 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13715

23 months agoSkip checksum benchmarks on systems with slow cpu
Tino Reichardt [Mon, 1 Aug 2022 16:51:45 +0000 (18:51 +0200)]
Skip checksum benchmarks on systems with slow cpu

The checksum benchmarking on module load may take a really long time
on embedded systems with a slow cpu. Avoid all benchmarks >= 1MiB on
systems, where EdonR is slower then 300 MiB/s.

This limit is currently hardcoded via the define LIMIT_PERF_MBS.

This is the new benchmark output of a slow Intel Atom:

```
 implementation    1k    4k   16k   64k  256k    1m    4m   16m
 edonr-generic    209   257   268   259   262     0     0     0
 skein-generic    129   150   151   150   150     0     0     0
 sha256-generic    50    55    56    56    56     0     0     0
 sha512-generic    76    86    88    89    88     0     0     0
 blake3-generic    63    62    62    62    61     0     0     0
 blake3-sse2      114   292   301   307   309     0     0     0
```

Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13695

23 months agoFix checkstyle warning: E275 missing whitespace after keyword
Tino Reichardt [Mon, 1 Aug 2022 16:49:35 +0000 (18:49 +0200)]
Fix checkstyle warning: E275 missing whitespace after keyword

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13710

23 months agoImplement a new type of zfs receive: corrective receive (-c)
Alek P [Thu, 28 Jul 2022 22:52:46 +0000 (18:52 -0400)]
Implement a new type of zfs receive: corrective receive (-c)

This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #9372

23 months agoFreeBSD compile fix
Tino Reichardt [Thu, 28 Jul 2022 21:19:41 +0000 (23:19 +0200)]
FreeBSD compile fix

The file module/os/freebsd/zfs/zfs_ioctl_compat.c fails compiling
because of this error: 'static' is not at beginning of declaration

This commit fixes the three places within that file.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13702

23 months agoZTS: Fix io_uring support check
Brian Behlendorf [Tue, 26 Jul 2022 21:39:23 +0000 (14:39 -0700)]
ZTS: Fix io_uring support check

Not all Linux distribution kernels enable io_uring support by
default.  Update the run time check to verify that the booted
kernel was built with CONFIG_IO_URING=y.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13648
Closes #13685

23 months agoAdd createtxg sort support for simple snapshot iterator
Ameer Hamza [Mon, 25 Jul 2022 21:04:46 +0000 (02:04 +0500)]
Add createtxg sort support for simple snapshot iterator

- When iterating snapshots with name only, e.g., "-o name -s name",
libzfs uses simple snapshot iterator and results are displayed
in alphabetic order. This PR adds support for faster version of
createtxg sort by avoiding nvlist parsing for properties. Flags
"-o name -s createtxg" will enable createtxg sort while using
simple snapshot iterator.
- Added support to read createtxg property directly from zfs handle
for filesystem, volume and snapshot types instead of parsing nvlist.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13577

23 months agoZTS: Fix occasional inherit_001_pos.ksh failure
Brian Behlendorf [Mon, 25 Jul 2022 16:52:42 +0000 (09:52 -0700)]
ZTS: Fix occasional inherit_001_pos.ksh failure

The mountpoint may still be busy when the `zfs unmount -a` command
is run causing an unexpected failure.  Retry the unmount a couple
of times since it should not remain busy for long.

    19:10:50.29 NOTE: Reading state from .../inheritance/state021.cfg
    19:10:50.32 cannot unmount '/TESTPOOL': pool or dataset is busy
    19:10:50.32 ERROR: zfs unmount -a exited 1

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13686

23 months agozdb: dump spill block pointer if present
Christian Schwarz [Thu, 21 Jul 2022 00:16:29 +0000 (02:16 +0200)]
zdb: dump spill block pointer if present

Output will look like so:

  $ sudo zdb -dddd -vv testpool/fs 2
  Dataset testpool/fs [ZPL], ID 260, cr_txg 8, 25K, 7 objects, rootbp DVA[0]=<0:1800be00:200> DVA[1]=<0:1c00be00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=1000L/200P birth=16L/16P fill=7 cksum=d03b396cd:489ca835517:d4b04a4d0a62:1b413aac454d53

      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
           2    1   128K    512     1K     512    512    0.00  ZFS plain file (K=inherit) (Z=inherit=lz4)
                                                 192   bonus  System attributes
      dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
      dnode maxblkid: 0
      path    /testfile
      uid     0
      gid     0
      atime   Fri Jul 15 12:36:35 2022
      mtime   Fri Jul 15 12:36:35 2022
      ctime   Fri Jul 15 12:36:51 2022
      crtime  Fri Jul 15 12:36:35 2022
      gen 10
      mode    100600
      size    0
      parent  34
      links   1
      pflags  840800000004
      SA xattrs: 248 bytes, 2 entries

          security.selinux = nutanix_u:object_r:unlabeled_t:s0\000
          user.foo = xbLQJjyVvEVPGGuRHV/gjkFFO1MdehKnLjjd36ZaoMVaUqtqFoMMYT5Ya9yywHApJNoK/1hNJfO3\012XCJWv9/QUTKamoWW9xVDE7yi8zn166RNw5QUhf84cZ3JNLnw6oN

Spill block: 0:10005c00:200 0:14005c00:200 200L/200P F=1 B=16/16 cksum=1cdfac47a4:910c5caa557:195d0493dfe5a:332b6fde6ad547
Indirect blocks:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13640

23 months agoAdd support for per dataset zil stats and use wmsum counters
ixhamza [Thu, 21 Jul 2022 00:14:06 +0000 (05:14 +0500)]
Add support for per dataset zil stats and use wmsum counters

ZIL kstats are reported in an inclusive way, i.e., same counters are
shared to capture all the activities happening in zil. Added support
to report zil stats for every datset individually by combining them
with already exposed dataset kstats.

Wmsum uses per cpu counters and provide less overhead as compared
to atomic operations. Updated zil kstats to replace wmsum counters
to avoid atomic operations.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13636

23 months agoFix scrub resume from newly created hole
Alexander Motin [Thu, 21 Jul 2022 00:02:36 +0000 (20:02 -0400)]
Fix scrub resume from newly created hole

It may happen that scan bookmark points to a block that was turned
into a part of a big hole.  In such case dsl_scan_visitbp() may skip
it and dsl_scan_check_resume() will not be called for it.  As result
new scan suspend won't be possible until the end of the object, that
may take hours if the object is a multi-terabyte ZVOL on a slow HDD
pool, stretching TXG to all that time, creating all sorts of problems.

This patch changes the resume condition to any greater or equal block,
so even if we miss the bookmarked block, the next one we find will
delete the bookmark, allowing new suspend.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13643

23 months agoFix memory allocation for the checksum benchmark
Tino Reichardt [Thu, 21 Jul 2022 00:01:32 +0000 (02:01 +0200)]
Fix memory allocation for the checksum benchmark

Allocation via kmem_cache_alloc() is limited to less then 4m for
some architectures.

This commit limits the benchmarks with the linear abd cache to 1m
on all architectures and adds 4m + 16m benchmarks via non-linear
abd_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13669
Closes #13670

23 months agoExpose ZFS dataset case sensitivity setting via sb_opts
ixhamza [Thu, 14 Jul 2022 17:38:16 +0000 (22:38 +0500)]
Expose ZFS dataset case sensitivity setting via sb_opts

Makes the case sensitivity setting visible on Linux in /proc/mounts.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13607

23 months agozed: Look for NVMe DEVPATH if no ID_BUS
Tony Hutter [Thu, 14 Jul 2022 17:19:37 +0000 (10:19 -0700)]
zed: Look for NVMe DEVPATH if no ID_BUS

We tried replacing an NVMe drive using autoreplace, only
to see zed reject it with:

zed[27955]: zed_udev_monitor: /dev/nvme5n1 no devid source

This happened because ZED saw that ID_BUS was not set by udev
for the NVMe drive, and thus didn't think it was "real drive".
This commit allows NVMe drives to be autoreplaced even if
ID_BUS is not set.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13512
Closes #13646

23 months agoReplace dead opensolaris.org license link
Tino Reichardt [Mon, 11 Jul 2022 21:16:13 +0000 (23:16 +0200)]
Replace dead opensolaris.org license link

The commit replaces all findings of the link:
http://www.opensolaris.org/os/licensing with this one:
https://opensource.org/licenses/CDDL-1.0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13619

23 months agozed: Ignore false 'atari' partitions in autoreplace
Tony Hutter [Mon, 11 Jul 2022 20:35:19 +0000 (13:35 -0700)]
zed: Ignore false 'atari' partitions in autoreplace

libudev will sometimes falsely identify an 'atari' partition on a
blank disk, preventing it from being used in an autoreplace.  This
seems to be a known issue.  The workaround is to just ignore the
fake partition and continue with the autoreplace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13497
Closes #13632

23 months agorpm: Silence "unversioned Obsoletes" warnings on EL 9
Tony Hutter [Mon, 11 Jul 2022 18:35:01 +0000 (11:35 -0700)]
rpm: Silence "unversioned Obsoletes" warnings on EL 9

Get rid of RPM warnings on AlmaLinux 9:

"It's not recommended to have unversioned Obsoletes"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13584
Closes #13638

23 months agoLinux: Align MODULE_LICENSE macro text
Brian Behlendorf [Mon, 11 Jul 2022 18:29:12 +0000 (11:29 -0700)]
Linux: Align MODULE_LICENSE macro text

Specify the lua and zstd license text in the manor in which the
kernel MODULE_LICENSE macro requires it.  The now duplicate entries
were merged and a comment added to make it clear what they apply to.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13641

2 years agoCall nvlist_free before return
Finix1979 [Thu, 7 Jul 2022 18:43:58 +0000 (02:43 +0800)]
Call nvlist_free before return

Fixes a small kernel memory leak which would occur if a pool failed
to import because the `DMU_POOL_VDEV_ZAP_MAP` key can't be read from
a presumably damaged MOS config.  In the case of a missing key there
was no leak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Finix1979 <yancw@info2soft.com>
Closes #13629

2 years agoAvoid memory copy when verifying raidz/draid parity
Alexander Motin [Tue, 5 Jul 2022 23:27:29 +0000 (19:27 -0400)]
Avoid memory copy when verifying raidz/draid parity

Before this change for every valid parity column raidz_parity_verify()
allocated new buffer and copied there existing data, then recalculated
the parity and compared the result with the copy.  This patch removes
the memory copy, simply swapping original buffer pointers with newly
allocated empty ones for parity recalculation and comparison. Original
buffers with potentially incorrect parity data are then just freed,
while new recalculated ones are used for repair.

On a pool of 12 4-wide raidz vdevs, storing 1.5TB of 16MB blocks, this
change reduces memory traffic during scrub by 17% and total unhalted
CPU time by 25%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13613

2 years agoAvoid memory copies during mirror scrub
Alexander Motin [Tue, 5 Jul 2022 23:26:20 +0000 (19:26 -0400)]
Avoid memory copies during mirror scrub

Issuing several scrub reads for a block we may use the parent ZIO
buffer for one of child ZIOs.  If that read complete successfully,
then we won't need to copy the data explicitly.  If block has only
one copy (typical for root vdev, which is also a mirror inside),
then we never need to copy -- succeed or fail as-is.  Previous
code also copied data from buffer of every successfully completed
child ZIO, but that just does not make any sense.

On healthy N-wide mirror this saves all N+1 (or even more in case
of ditto blocks) memory copies for each scrubbed block, allowing
CPU to focus mostly on check-summing.  For other vdev types it
should save one memory copy per block copy at root vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13606

2 years agoRe-fix -Wwrite-strings on FreeBSD
наб [Thu, 30 Jun 2022 18:31:09 +0000 (20:31 +0200)]
Re-fix -Wwrite-strings on FreeBSD

Follow up fix for a926aab902ac5c680f4766568d19674b80fb58bb.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
Closes #13610

2 years agodracut: fix boot on non-zfs-root systems
Toyam Cox [Thu, 30 Jun 2022 17:47:58 +0000 (13:47 -0400)]
dracut: fix boot on non-zfs-root systems

Simply prevent overwriting root until it needs to be overwritten.

Dracut could change this value before this module is called, but won't
change the kernel command line.

Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Toyam Cox <vaelatern@voidlinux.org>
Closes #13592

2 years agocontrib: dracut: README.md
gregory-lee-bartholomew [Thu, 30 Jun 2022 17:43:27 +0000 (12:43 -0500)]
contrib: dracut: README.md

Change zfs-snapshot-bootfs.service to zfs-rollback-bootfs.service in
cmdline point 4 of README.md.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13609

2 years agoFix dnode byteswapping
George Amanakis [Thu, 30 Jun 2022 00:06:16 +0000 (02:06 +0200)]
Fix dnode byteswapping

If a dnode has a spill pointer, and we use DN_SLOTS_TO_BONUSLEN() then
we will possibly include the spill pointer in the len calculation and it
will be byteswapped. Then dnode_byteswap() will carry on and swap the
spill pointer again. Fix this by using DN_MAX_BONUS_LEN() instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13002
Closes #13015

2 years agocontrib: dracut: zfs-{rollback,snapshot}-bootfs: explicit snapname fix
gregory-lee-bartholomew [Wed, 29 Jun 2022 23:56:04 +0000 (18:56 -0500)]
contrib: dracut: zfs-{rollback,snapshot}-bootfs: explicit snapname fix

Due to a missing semicolon on the ExecStart line, it wasn't possible
to specify the snapshot name on the bootfs.{rollback,snapshot}
kernel parameters if the boot dataset name was obtained from the
root=zfs:... kernel parameter.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13585

2 years agodracut: fix typo in mount-zfs.sh.in
Brian Behlendorf [Wed, 29 Jun 2022 22:33:38 +0000 (15:33 -0700)]
dracut: fix typo in mount-zfs.sh.in

Format the `zpool get` command correctly.  The -o option must
be followed by "all" or the requested field name.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13602

2 years agomodule: lua: ldo: fix pragma name
наб [Tue, 28 Jun 2022 21:31:55 +0000 (23:31 +0200)]
module: lua: ldo: fix pragma name

/home/nabijaczleweli/store/code/zfs/module/lua/ldo.c:175:32: warning:
unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
  175 | #pragma GCC diagnostic ignored "-Winfinite-recursion"a
      |                                ^~~~~~~~~~~~~~~~~~~~~~

Fixes: a6e8113fed8a508ffda13cf1c4d8da99a4e8133a ("Silence
-Winfinite-recursion warning in luaD_throw()")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348

2 years agolinux: libzfs: util: don't fallthrough to to end-of-switch
наб [Tue, 28 Jun 2022 21:27:44 +0000 (23:27 +0200)]
linux: libzfs: util: don't fallthrough to to end-of-switch

lib/libzfs/os/linux/libzfs_util_os.c:262:3: error: fallthrough
annotation does not directly precede switch label
                zfs_fallthrough;
                ^
./lib/libspl/include/sys/feature_tests.h:34:26: note: expanded from
macro 'zfs_fallthrough'

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348

2 years agotests: modernise zdb_decompress
наб [Sat, 28 May 2022 13:19:05 +0000 (15:19 +0200)]
tests: modernise zdb_decompress

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348

2 years agoRemaining {=> const} char|void *tag
наб [Tue, 19 Apr 2022 18:49:30 +0000 (20:49 +0200)]
Remaining {=> const} char|void *tag

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348

2 years agoEnable -Wwrite-strings
наб [Tue, 19 Apr 2022 18:38:30 +0000 (20:38 +0200)]
Enable -Wwrite-strings

Also, fix leak from ztest_global_vars_to_zdb_args()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348

2 years agoFix znode group permission different from acl mask
gaoyanping [Wed, 29 Jun 2022 20:38:46 +0000 (04:38 +0800)]
Fix znode group permission different from acl mask

Zp->z_mode is set at the same time inode->i_mode
is being changed. This has the effect of keeping both
in sync without relying on zfs_znode_update_vfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yanping.gao <yanping.gao@xtaotech.com>
Closes #13581

2 years agoFreeBSD: only define B_FALSE/B_TRUE if NEED_SOLARIS_BOOLEAN is not set
Kristof Provost [Tue, 28 Jun 2022 21:11:38 +0000 (23:11 +0200)]
FreeBSD: only define B_FALSE/B_TRUE if NEED_SOLARIS_BOOLEAN is not set

If NEED_SOLARIS_BOOLEAN is defined we define an enum boolean_t, which
defines B_TRUE/B_FALSE as well. If we have both the define and the enum
things don't build (because that translates to
'enum { 0, 1 }     boolean_t').

While here also remove an incorrect '#else'. With it in place we only
parse a section if the include guard is triggered. So we'd only use that
code if this file is included twice. This is clearly unintended, and
also means we don't get the 'boolean_t' definition. Fix this.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kristof Provost <kprovost@netgate.com>
Sponsored-By: Rubicon Communications, LLC ("Netgate")
Closes #13596

2 years agoFix and disable blocks statistics during scrub
Alexander Motin [Tue, 28 Jun 2022 18:23:31 +0000 (14:23 -0400)]
Fix and disable blocks statistics during scrub

Block statistics calculation during scrub I/O issue in case of sorted
scrub accounted ditto blocks several times.  Embedded blocks on other
side were not accounted at all.  This change moves the accounting from
issue to scan stage, that fixes both problems and also allows to avoid
pool-wide locking and the lock contention it created.

Since this statistics is quite specific and is not even exposed now
anywhere, disable its calculation by default to not waste CPU time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13579

2 years agoFix objtool: missing int3 after ret warning
Brian Behlendorf [Mon, 20 Jun 2022 23:36:21 +0000 (23:36 +0000)]
Fix objtool: missing int3 after ret warning

Resolve straight-line speculation warnings reported by objtool
for x86_64 assembly on Linux when CONFIG_SLS is set.  See the
following LWN article for the complete details.

https://lwn.net/Articles/877845/

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoFix -Wformat-overflow warning in zfs_project_handle_dir()
Brian Behlendorf [Mon, 20 Jun 2022 22:27:55 +0000 (22:27 +0000)]
Fix -Wformat-overflow warning in zfs_project_handle_dir()

Switch to using asprintf() to satisfy the compiler and resolve the
potential format-overflow warning.  Not the conditional before the
sprintf() would have prevented this regardless.

    cmd/zfs/zfs_project.c: In function ‘zfs_project_handle_dir’:
    cmd/zfs/zfs_project.c:241:38: error: ‘/’ directive writing
    1 byte into a region of size between 0 and 4352
    [-Werror=format-overflow=]
    cmd/zfs/zfs_project.c:241:17: note: ‘sprintf’ output between
    2 and 4609 bytes into a destination of size 4352

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoFix -Wformat-truncation warning in upgrade_set_callback()
Brian Behlendorf [Mon, 20 Jun 2022 21:54:42 +0000 (21:54 +0000)]
Fix -Wformat-truncation warning in upgrade_set_callback()

Extend the buffer slightly resolve the warning.

    cmd/zfs/zfs_main.c: In function ‘upgrade_set_callback’:
    cmd/zfs/zfs_main.c:2446:22: error: ‘%llu’ directive output
    may be truncated writing between 1 and 20 bytes into a
    region of size 16 [-Werror=format-truncation=]
    cmd/zfs/zfs_main.c:2445:24: note: ‘snprintf’ output between
    2 and 21 bytes into a destination of size 16

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoFix -Wuse-after-free warning in dbuf_destroy()
Brian Behlendorf [Mon, 20 Jun 2022 21:35:38 +0000 (21:35 +0000)]
Fix -Wuse-after-free warning in dbuf_destroy()

Move the use of the db pointer after it is freed.  It's only used as
a tag so a dereference would never occur, but there's no reason we
can't invert the order to resolve the warning.

    module/zfs/dbuf.c: In function 'dbuf_destroy':
    module/zfs/dbuf.c:2953:17: error:
    pointer 'db' may be used after 'free' [-Werror=use-after-free]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoFix -Wuse-after-free warning in dbuf_issue_final_prefetch_done()
Brian Behlendorf [Mon, 20 Jun 2022 21:32:03 +0000 (21:32 +0000)]
Fix -Wuse-after-free warning in dbuf_issue_final_prefetch_done()

Move the use of the private pointer after it is freed.  It's only
used as a tag so a dereference would never occur, but there's no
harm in inverting the order to resolve the warning.

    module/zfs/dbuf.c: In function 'dbuf_issue_final_prefetch_done':
    module/zfs/dbuf.c:3204:17: error:
    pointer 'private' may be used after 'free' [-Werror=use-after-free]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoFix -Wattribute-warning in dsl layer
Brian Behlendorf [Mon, 20 Jun 2022 21:13:26 +0000 (21:13 +0000)]
Fix -Wattribute-warning in dsl layer

The memcpy(), memmove(), and memset() functions have been annotated
to perform bounds checking when using FORTIFY_SOURCE.  A warning is
now generted when writing beyond the end of the specified field.

Alternately, the new struct_group() macro could be used to create
an anonymous union member for use by memcpy().  However, since this
is the only place the macro would be helpful it's preferable to
restructure the code slights to avoid the need for additional
compatibility code when the macro does not exist.

https://lore.kernel.org/lkml/20211118183807.1283332-1-keescook@chromium.org/T/

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoFix -Wattribute-warning in edonr
Brian Behlendorf [Mon, 20 Jun 2022 19:37:38 +0000 (19:37 +0000)]
Fix -Wattribute-warning in edonr

The wrong union memory was being accessed in EdonRInit resulting in
a write beyond size of field compiler warning.  Reference the correct
member to resolve the warning.  The warning was correct and this in
case the mistake was harmless.

    In function ‘fortify_memcpy_chk’,
    inlined from ‘EdonRInit’ at zfs/module/icp/algs/edonr/edonr.c:494:3:
    ./include/linux/fortify-string.h:344:25: error: call to
    ‘__write_overflow_field’ declared with attribute warning:
    detected write beyond size of field (1st parameter);
    maybe use struct_group()? [-Werror=attribute-warning]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoFix -Wattribute-warning in zfs_log_xvattr()
Brian Behlendorf [Mon, 20 Jun 2022 22:36:38 +0000 (22:36 +0000)]
Fix -Wattribute-warning in zfs_log_xvattr()

Restructure the code in zfs_log_xvattr() to use a lr_attr_end
structure when accessing lr_attr_t elements located after the
variable sized array.  This makes the code more understandable
and resolves the accessing beyond the end of the field warnings.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoSilence -Winfinite-recursion warning in luaD_throw()
Brian Behlendorf [Mon, 20 Jun 2022 19:53:58 +0000 (19:53 +0000)]
Silence -Winfinite-recursion warning in luaD_throw()

This code should be kept inline with the upstream lua version as much
as possible.  Therefore, we simply want to silence the warning.  This
check was enabled by default as part of -Wall in gcc 12.1.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575

2 years agoAvoid panic with recordsize > 128k, raw sending and no large_blocks
George Amanakis [Mon, 27 Jun 2022 21:17:25 +0000 (23:17 +0200)]
Avoid panic with recordsize > 128k, raw sending and no large_blocks

The current codebase does not support raw sending buffers with block
size > 128kB when large_blocks is not active. This can happen in the
codepath dsl_dataset_sync()->dmu_objset_sync()->zio_nowait() which
calls back dmu_objset_write_done()->dsl_dataset_block_born(). If
dsl_dataset_sync() completes its run before dsl_dataset_block_born() is
called, we will end up not activating some of the necessary flags, while
having blocks based on those flags written in the filesystem. A
subsequent send will then panic.

Fix this by directly deciding in dmu_objset_sync() whether these flags
need to be activated later by dsl_dataset_sync(). Instead of panicking
due to a NULL pointer dereference in dmu_dump_write() in case of a send,
print out an error message. Also during scrub verify there are no
contradicting filesystem flags.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12275
Closes #12438

2 years agoAvoid two 64-bit divisions per scanned block
Alexander Motin [Mon, 27 Jun 2022 18:08:21 +0000 (14:08 -0400)]
Avoid two 64-bit divisions per scanned block

Change math to make it like the ARC, using multiplications instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13591

2 years agoSeveral B-tree optimizations
Alexander Motin [Fri, 24 Jun 2022 20:55:58 +0000 (16:55 -0400)]
Several B-tree optimizations

- Introduce first element offset within a leaf.  It allows to reduce
by ~50% average memmove() size when adding/removing elements.  If the
added/removed element is in the first half of the leaf, we may shift
elements before it and adjust the bth_first instead of moving more
elements after it.
 - Use memcpy() instead of memmove() when we know there is no overlap.
 - Switch from uint64_t to uint32_t.  It does not limit anything,
but 32-bit arches should appreciate it greatly in hot paths.
 - Store leaf capacity in struct btree to avoid 64-bit divisions.
 - Adjust zfs_btree_insert_into_leaf() to always result in balanced
leaves after splitting, no matter where the new element was inserted.
Not that we care about it much, but it should also allow B-trees with
as little as two elements per leaf instead of 4 previously.

When scrubbing pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks this
reduces amount of time spent in memmove() inside the scan thread from
13.7% to 5.7% and total scrub time by ~15 seconds out of 9 minutes.
It should also reduce spacemaps load time, but I haven't measured it.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13582

2 years agoAdd a "zstream decompress" subcommand
Alan Somers [Fri, 24 Jun 2022 20:28:42 +0000 (14:28 -0600)]
Add a "zstream decompress" subcommand

It can be used to repair a ZFS file system corrupted by ZFS bug #12762.
Use it like this:

zfs send -c <DS> | \
zstream decompress <OBJECT>,<OFFSET>[,<COMPRESSION_ALGO>] ... | \
zfs recv <DST_DS>

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Workaround for #12762
Closes #13256

2 years agoSeveral sorted scrub optimizations
Alexander Motin [Fri, 24 Jun 2022 16:50:37 +0000 (12:50 -0400)]
Several sorted scrub optimizations

- Reduce size and comparison complexity of q_exts_by_size B-tree.
Previous code used two 64-bit divisions and many other operations to
compare two B-tree elements.  It created enormous overhead.  This
implementation moves the math to the upper level and stores the score
in the B-tree elements themselves.  Since all that we need to store in
that B-tree is the extent score and offset, those can fit into single
8 byte value instead of 24 bytes of q_exts_by_addr element and can be
compared with single operation.
 - Better decouple secondary tree logic from main range_tree by moving
rt_btree_ops and related functions into dsl_scan.c as ext_size_ops.
Those functions are very small to worry about the code duplication and
range_tree does not need to know details such as rt_btree_compare.
 - Instead of accounting number of pending bytes per pool, that needs
atomic on global variable per block, account the number of non-empty
per-vdev queues, that change much more rarely.
 - When extent scan is interrupted by TXG end, continue it in the next
TXG instead of selecting next best extent.  It allows to avoid leaving
one truncated (and so likely not the best any more) extent each TXG.

On top of some other optimizations this saves about 1.5 minutes out of
10 to scrub pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13576

2 years agoUse macros for quotes and such
Toomas Soome [Fri, 24 Jun 2022 16:48:10 +0000 (19:48 +0300)]
Use macros for quotes and such

Use Dq,Pq/Po/Pc macros. illumos dumpadm is now in section 8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #13586

2 years agoScrub mirror children without BPs
Brian Behlendorf [Thu, 23 Jun 2022 17:36:28 +0000 (10:36 -0700)]
Scrub mirror children without BPs

When scrubbing a raidz/draid pool, which contains a replacing or
sparing mirror with multiple online children, only one child will
be read.  This is not normally a serious concern because the DTL
records are used to determine where a good copy of the data is.
As long as the data can be read from one child the mirror vdev
will use it to repair gaps in any of its children.  Furthermore,
even if the data which was read is corrupt the raidz code will
detect this and issue its own repair I/O to correct the damage
in the mirror vdev.

However, in the scenario where the DTL is wrong due to silent
data corruption (say due to overwriting one child) and the scrub
happens to read from a child with good data, then the other damaged
mirror child will not be detected nor repaired.

While this is possible for both raidz and draid vdevs, it's most
pronounced when using draid.  This is because by default the zed
will sequentially rebuild a draid pool to a distributed spare,
and the distributed spare half of the mirror is always preferred
since it delivers better performance.  This means the damaged
half of the mirror will go undetected even after scrubbing.

For system administrations this behavior is non-intuitive and in
a worst case scenario could result in the only good copy of the
data being unknowingly detached from the mirror.

This change resolves the issue by reading all replacing/sparing
mirror children when scrubbing.  When the BP isn't available for
verification, then compare the data buffers from each child.  They
must all be identical, if not there's silent damage and an error
is returned to prompt the top-level vdev to issue a repair I/O to
rewrite the data on all of the mirror children.  Since we can't
tell which child was wrong a checksum error is logged against the
replacing or sparing mirror vdev.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13555

2 years agoFix memory allocation issue for BLAKE3 context
Tino Reichardt [Tue, 21 Jun 2022 21:32:09 +0000 (23:32 +0200)]
Fix memory allocation issue for BLAKE3 context

The kmem_alloc(sizeof (*ctx), KM_NOSLEEP) call on FreeBSD can't be
used in this code segment. Work around this by pre-allocating a percpu
context array for later use.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13568

2 years agoRemove install of zfs-load-module.service for dracut
Matthew Thode [Tue, 21 Jun 2022 17:37:20 +0000 (12:37 -0500)]
Remove install of zfs-load-module.service for dracut

The zfs-load-module.service service is not currently provided by
the OpenZFS repository so we cannot safely assume it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes #13574

2 years agoFreeBSD: Improve crypto_dispatch() handling
Alexander Motin [Fri, 17 Jun 2022 22:38:51 +0000 (18:38 -0400)]
FreeBSD: Improve crypto_dispatch() handling

Handle crypto_dispatch() return values same as crp->crp_etype errors.
On FreeBSD 12 many drivers returned same errors both ways, and lack
of proper handling for the first ended up in assertion panic later.
It was changed in FreeBSD 13, but there is no reason to not be safe.

While there, skip waiting for completion, including locking and
wakeup() call, for sessions on synchronous crypto drivers, such as
typical aesni and software.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13563

2 years agoexpose snapshot count via stat(2) of .zfs/snapshot (#13559)
Andrew [Fri, 17 Jun 2022 18:44:49 +0000 (13:44 -0500)]
expose snapshot count via stat(2) of .zfs/snapshot (#13559)

Increase nlinks in stat results of ./zfs/snapshot based on snapshot
count. This provides quick and efficient method for administrators to
get snapshot counts without having to use libzfs or list the snapdir
contents.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #13559

2 years agolibzfs: Prevent overridding of error code
ixhamza [Wed, 15 Jun 2022 21:26:12 +0000 (02:26 +0500)]
libzfs: Prevent overridding of error code

zfs_send_cb_impl fails to report error for some flags.

Use second error variable for send_conclusion_record.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13558

2 years agoReduce ZIO io_lock contention on sorted scrub
Alexander Motin [Wed, 15 Jun 2022 21:25:08 +0000 (17:25 -0400)]
Reduce ZIO io_lock contention on sorted scrub

During sorted scrub multiple threads (one per vdev) are issuing many
ZIOs same time, all using the same scn->scn_zio_root ZIO as parent.
It causes huge lock contention on the single global lock on that ZIO.
Improve it by introducing per-queue null ZIOs, children to that one,
and using them instead as proxy.

For 12 SSD pool storing 1.5TB of 4KB blocks on 80-core system this
dramatically reduces lock contention and reduces scrub time from 21
minutes down to 12.5, while actual read stages (not scan) are about
3x faster, reaching 100K blocks per second per vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13553

2 years agoAdd support for ARCH=um for x86 sub-architectures
crass [Wed, 15 Jun 2022 21:22:52 +0000 (16:22 -0500)]
Add support for ARCH=um for x86 sub-architectures

When building modules (as well as the kernel) with ARCH=um, the options
-Dsetjmp=kernel_setjmp and -Dlongjmp=kernel_longjmp are passed to the C
preprocessor for C files. This causes the setjmp and longjmp used in
module/lua/ldo.c to be kernel_setjmp and kernel_longjmp respectively in
the object file. However, the setjmp and longjmp that is intended to be
called is defined in an architecture dependent assembly file under the
directory module/lua/setjmp. Since it is an assembly and not a C file,
the preprocessor define is not given and the names do not change. This
becomes an issue when modpost is trying to create the Module.symvers
and sees no defined symbol for kernel_setjmp and kernel_longjmp. To fix
this, if the macro CONFIG_UML is defined, then setjmp and longjmp
macros are undefined.

When building with ARCH=um for x86 sub-architectures, CONFIG_X86 is not
defined. Instead, CONFIG_UML_X86 is defined. Despite this, the UML x86
sub-architecture can use the same object files as the x86 architectures
because the x86 sub-architecture UML kernel is running with the same
instruction set as CONFIG_X86. So the modules/Kbuild build file is
updated to add the same object files that CONFIG_X86 would add when
CONFIG_UML_X86 is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Glenn Washburn <development@efficientek.com>
Closes #13547

2 years agoFix clang 13 compilation errors
Damian Szuberski [Wed, 15 Jun 2022 21:20:28 +0000 (07:20 +1000)]
Fix clang 13 compilation errors

```
os/linux/zfs/zvol_os.c:1111:3: error: ignoring return value of function
  declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
                add_disk(zv->zv_zso->zvo_disk);
                ^~~~~~~~ ~~~~~~~~~~~~~~~~~~~~

zpl_xattr.c:1579:1: warning: no previous prototype for function
  'zpl_posix_acl_release_impl' [-Wmissing-prototypes]
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13551

2 years agoReplace ZPROP_INVAL with ZPROP_USERPROP where it means a user property
Allan Jude [Tue, 14 Jun 2022 18:27:53 +0000 (14:27 -0400)]
Replace ZPROP_INVAL with ZPROP_USERPROP where it means a user property

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara Inc.
Closes #12676

2 years agospl: Use a clearer name for the user namespace fd
Ryan Moeller [Mon, 13 Jun 2022 20:30:34 +0000 (20:30 +0000)]
spl: Use a clearer name for the user namespace fd

This fd has nothing to do with cleanup, that's just the name of the
field in zfs_cmd_t that was used to pass it to the kernel.

Call it what it is, an fd for a user namespace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13554

2 years agolibzfs: zfs_userns: Don't leak the namespace fd
Ryan Moeller [Mon, 13 Jun 2022 20:24:23 +0000 (20:24 +0000)]
libzfs: zfs_userns: Don't leak the namespace fd

zfs_userns opens a file descriptor for the kernel to look up a
namespace, but does not close it.

Close the fd when we're done with it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13554

2 years agoAdd weekly and monthly systemd timers for trimming
Julian Brunner [Sat, 11 Jun 2022 01:22:14 +0000 (03:22 +0200)]
Add weekly and monthly systemd timers for trimming

On machines using systemd, trim timers can be enabled on a per-pool
basis. Weekly and monthly timer units are provided. Timers can be
enabled as follows:

systemctl enable zfs-trim-weekly@rpool.timer --now
systemctl enable zfs-trim-monthly@datapool.timer --now

Each timer will pull in zfs-trim@${poolname}.service, which is not
schedule-specific.

The manpage zpool-trim has been updated accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Julian Brunner <julian.brunner@gmail.com>
Closes #13544

2 years agoImprove sorted scan memory accounting
Alexander Motin [Fri, 10 Jun 2022 17:01:46 +0000 (13:01 -0400)]
Improve sorted scan memory accounting

Since we use two B-trees q_exts_by_size and q_exts_by_addr, we should
count 2x sizeof (range_seg_gap_t) per node.  And since average B-tree
memory efficiency is about 75%, we should increase it to 3x.

Previous code under-counted up to 30% of the memory usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13537

2 years agoAdd Linux namespace delegation support
Will Andrews [Sun, 21 Feb 2021 16:19:43 +0000 (10:19 -0600)]
Add Linux namespace delegation support

This allows ZFS datasets to be delegated to a user/mount namespace
Within that namespace, only the delegated datasets are visible
Works very similarly to Zones/Jailes on other ZFS OSes

As a user:
```
 $ unshare -Um
 $ zfs list
no datasets available
 $ echo $$
1234
```

As root:
```
 # zfs list
NAME                            ZONED  MOUNTPOINT
containers                      off    /containers
containers/host                 off    /containers/host
containers/host/child           off    /containers/host/child
containers/host/child/gchild    off    /containers/host/child/gchild
containers/unpriv               on     /unpriv
containers/unpriv/child         on     /unpriv/child
containers/unpriv/child/gchild  on     /unpriv/child/gchild

 # zfs zone /proc/1234/ns/user containers/unpriv
```

Back to the user namespace:
```
 $ zfs list
NAME                             USED  AVAIL     REFER  MOUNTPOINT
containers                       129M  47.8G       24K  /containers
containers/unpriv                128M  47.8G       24K  /unpriv
containers/unpriv/child          128M  47.8G      128M  /unpriv/child
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will.andrews@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Buddy <https://buddy.works>
Closes #12263

2 years agoRevert parts of 938cfeb0f27303721081223816d4f251ffeb1767
Allan Jude [Fri, 2 Jul 2021 19:16:58 +0000 (19:16 +0000)]
Revert parts of 938cfeb0f27303721081223816d4f251ffeb1767

When read and writing the UID/GID, we always want the value
relative to the root user namespace, the kernel will take care
of remapping this to the user namespace for us.

Calling from_kuid(user_ns, uid) with a unmapped uid will return -1
as that uid is outside of the scope of that namespace, and will result
in the files inside the namespace all being owned by 'nobody' and not
being allowed to call chmod or chown on them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12263

2 years agoAVL: Remove obsolete branching optimizations
Alexander Motin [Thu, 9 Jun 2022 22:27:36 +0000 (18:27 -0400)]
AVL: Remove obsolete branching optimizations

Modern Clang and GCC can successfully implement simple conditions
without branching with math and flag operations.  Use of arrays for
translation no longer helps as much as it was 14+ years ago.

Disassemble of the code generated by Clang 13.0.0 on FreeBSD 13.1,
Clang 14.0.4 on FreeBSD 14 and GCC 10.2.1 on Debian 11 with this
change still shows no branching instructions.

Profiling of CPU-bound scan stage of sorted scrub shows reproducible
reduction of time spent inside avl_find() from 6.52% to 4.58%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13540

2 years agolibzfs: Rename msg bufs to errbuf for consistency
Ryan Moeller [Wed, 8 Jun 2022 16:32:38 +0000 (16:32 +0000)]
libzfs: Rename msg bufs to errbuf for consistency

`libzfs_pool.c` uses the name `msg` where everywhere else in libzfs uses
`errbuf` for the error message buffer.

Use the name consistent with the rest of libzfs and use ERRBUFLEN
instead of 1024.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13539

2 years agolibzfs: Define the defecto standard errbuf size
Ryan Moeller [Wed, 8 Jun 2022 13:08:10 +0000 (13:08 +0000)]
libzfs: Define the defecto standard errbuf size

Every errbuf array in libzfs is 1024 chars.

Define ERRBUFLEN in a shared header, and use it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13539

2 years agozvol: Support blk-mq for better performance
Tony Hutter [Thu, 9 Jun 2022 14:10:38 +0000 (07:10 -0700)]
zvol: Support blk-mq for better performance

Add support for the kernel's block multiqueue (blk-mq) interface in
the zvol block driver.  blk-mq creates multiple request queues on
different CPUs rather than having a single request queue.  This can
improve zvol performance with multithreaded reads/writes.

This implementation uses the blk-mq interfaces on 4.13 or newer
kernels.  Building against older kernels will fall back to the
older BIO interfaces.

Note that you must set the `zvol_use_blk_mq` module param to
enable the blk-mq API.  It is disabled by default.

In addition, this commit lets the zvol blk-mq layer process whole
`struct request` IOs at a time, rather than breaking them down
into their individual BIOs.  This reduces dbuf lock contention
and overhead versus the legacy zvol submit_bio() codepath.

sequential dd to one zvol, 8k volblocksize, no O_DIRECT:

legacy submit_bio()     292MB/s write  453MB/s read
this commit             453MB/s write  885MB/s read

It also introduces a new `zvol_blk_mq_chunks_per_thread` module
parameter. This parameter represents how many volblocksize'd chunks
to process per each zvol thread.  It can be used to tune your zvols
for better read vs write performance (higher values favor write,
lower favor read).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13148
Issue #12483

2 years agoIntroduce BLAKE3 checksums as an OpenZFS feature
Tino Reichardt [Wed, 8 Jun 2022 22:55:57 +0000 (00:55 +0200)]
Introduce BLAKE3 checksums as an OpenZFS feature

This commit adds BLAKE3 checksums to OpenZFS, it has similar
performance to Edon-R, but without the caveats around the latter.

Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3
Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3

Short description of Wikipedia:

  BLAKE3 is a cryptographic hash function based on Bao and BLAKE2,
  created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and
  Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real
  World Crypto. BLAKE3 is a single algorithm with many desirable
  features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE
  and BLAKE2, which are algorithm families with multiple variants.
  BLAKE3 has a binary tree structure, so it supports a practically
  unlimited degree of parallelism (both SIMD and multithreading) given
  enough input. The official Rust and C implementations are
  dual-licensed as public domain (CC0) and the Apache License.

Along with adding the BLAKE3 hash into the OpenZFS infrastructure a
new benchmarking file called chksum_bench was introduced.  When read
it reports the speed of the available checksum functions.

On Linux: cat /proc/spl/kstat/zfs/chksum_bench
On FreeBSD: sysctl kstat.zfs.misc.chksum_bench

This is an example output of an i3-1005G1 test system with Debian 11:

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1196    1602    1761    1749    1762    1759    1751
skein-generic      546     591     608     615     619     612     616
sha256-generic     240     300     316     314     304     285     276
sha512-generic     353     441     467     476     472     467     426
blake3-generic     308     313     313     313     312     313     312
blake3-sse2        402    1289    1423    1446    1432    1458    1413
blake3-sse41       427    1470    1625    1704    1679    1607    1629
blake3-avx2        428    1920    3095    3343    3356    3318    3204
blake3-avx512      473    2687    4905    5836    5844    5643    5374

Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1840    2458    2665    2719    2711    2723    2693
skein-generic      870     966     996     992    1003    1005    1009
sha256-generic     415     442     453     455     457     457     457
sha512-generic     608     690     711     718     719     720     721
blake3-generic     301     313     311     309     309     310     310
blake3-sse2        343    1865    2124    2188    2180    2181    2186
blake3-sse41       364    2091    2396    2509    2463    2482    2488
blake3-avx2        365    2590    4399    4971    4915    4802    4764

Output on Debian 5.10.0-9-powerpc64le system: (POWER 9)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1213    1703    1889    1918    1957    1902    1907
skein-generic      434     492     520     522     511     525     525
sha256-generic     167     183     187     188     188     187     188
sha512-generic     186     216     222     221     225     224     224
blake3-generic     153     152     154     153     151     153     153
blake3-sse2        391    1170    1366    1406    1428    1426    1414
blake3-sse41       352    1049    1212    1174    1262    1258    1259

Output on Debian 5.10.0-11-arm64 system: (Pi400)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic      487     603     629     639     643     641     641
skein-generic      271     299     303     308     309     309     307
sha256-generic     117     127     128     130     130     129     130
sha512-generic     145     165     170     172     173     174     175
blake3-generic      81      29      71      89      89      89      89
blake3-sse2        112     323     368     379     380     371     374
blake3-sse41       101     315     357     368     369     364     360

Structurally, the new code is mainly split into these parts:
- 1x cross platform generic c variant: blake3_generic.c
- 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512)
- 2x assembly for ARMv8 (NEON converted from SSE2)
- 2x assembly for PPC64-LE (POWER8 converted from SSE2)
- one file for switching between the implementations

Note the PPC64 assembly requires the VSX instruction set and the
kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Closes #10058
Closes #12918

2 years agoautoconf: AC_MSG_CHECKING consistency
Brian Behlendorf [Tue, 31 May 2022 23:42:49 +0000 (16:42 -0700)]
autoconf: AC_MSG_CHECKING consistency

Make the wording more consistent for the kernel AC_MSG_CHECKING
output (e.g. "checking whether ...".).  Additionally, group some
of the VFS interface checks with the others.  No functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529

2 years agoLinux 5.19 compat: asm/fpu/internal.h
Brian Behlendorf [Tue, 31 May 2022 23:30:59 +0000 (16:30 -0700)]
Linux 5.19 compat: asm/fpu/internal.h

As of the Linux 5.19 kernel the asm/fpu/internal.h header was
entirely removed.  It has been effectively empty since the 5.16
kernel and provides no required functionality.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529

2 years agoRemove wrong assertion in log spacemap
Alexander Motin [Wed, 1 Jun 2022 16:54:35 +0000 (12:54 -0400)]
Remove wrong assertion in log spacemap

It is typical, but not generally true that if log summary has more
blocks it must also have unflushed metaslabs.  Normally with metaslabs
flushed in order it works, but there are known exceptions, such as
device removal or metaslab being loaded during its flush attempt.

Before 600a02b8844 if spa_flush_metaslabs() hit loading metaslab it
usually stopped (unless memlimit is also exceeded), but now it may
flush more metaslabs, just skipping that particular one.  This
increased chances of assertion to fire when the skipped metaslab is
flushed on next iteration if all other metaslabs in that summary
entry are already flushed out of order.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13486
Closes #13513

2 years agoCorrected parameters for zstd early abort
Rich Ercolani [Tue, 31 May 2022 22:41:33 +0000 (18:41 -0400)]
Corrected parameters for zstd early abort

That'll teach me to try and recall them from the definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13519

2 years agoFix typo in zil_commit() comment block
Allan Jude [Tue, 31 May 2022 22:37:46 +0000 (18:37 -0400)]
Fix typo in zil_commit() comment block

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #13518

2 years agoLinux 5.18 compat: META
Brian Behlendorf [Tue, 31 May 2022 21:38:00 +0000 (14:38 -0700)]
Linux 5.18 compat: META

Update the META file to reflect compatibility with the 5.18 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13527

2 years agoLinux 5.19 compat: zap_flags_t conflict
Brian Behlendorf [Fri, 27 May 2022 22:56:05 +0000 (15:56 -0700)]
Linux 5.19 compat: zap_flags_t conflict

As of the Linux 5.19 kernel an identically named zap_flags_t typedef
is declared in the include/linux/mm_types.h linux header.  Sadly,
the inclusion of this header cannot be easily avoided.  To resolve
the conflict a #define is used to remap the name in the OpenZFS
sources when building against the Linux kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515

2 years agoLinux 5.19 compat: bdev_start_io_acct() / bdev_end_io_acct()
Brian Behlendorf [Fri, 27 May 2022 21:31:03 +0000 (21:31 +0000)]
Linux 5.19 compat: bdev_start_io_acct() / bdev_end_io_acct()

As of the Linux 5.19 kernel the disk_*_io_acct() helper functions
have been replaced by the bdev_*_io_acct() functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515