]> CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log
FreeBSD/FreeBSD.git
18 months agodracut: skip zfsexpandknoweldge when zfs_devs is present in dracut
Savyasachee Jha [Mon, 28 Feb 2022 22:35:25 +0000 (04:05 +0530)]
dracut: skip zfsexpandknoweldge when zfs_devs is present in dracut

PR 1711 (https://github.com/dracutdevs/dracut/pull/1711) adds a zfs_devs
function to dracut to detect the physical devices backing zfs pools. If
this function exists in the version of dracut this module is being
called from, then it does not need to run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13121

18 months agoTag zfs-2.1.7
Tony Hutter [Wed, 30 Nov 2022 21:09:04 +0000 (13:09 -0800)]
Tag zfs-2.1.7

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
18 months agozfs-2.1.7: Use ubuntu-20.04 for zloop and sanity builders
Tony Hutter [Thu, 1 Dec 2022 18:21:12 +0000 (10:21 -0800)]
zfs-2.1.7: Use ubuntu-20.04 for zloop and sanity builders

The zfs-2.1.7 branch is still using the older 'python-dev'
package names rather than the newer 'python3-dev' packages that
are required for 'ubuntu-latest'.  Use 'ubuntu-20.04' instead of
'ubuntu-latest' to get around this.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
18 months agoFix setting the large_block feature after receiving a snapshot
George Amanakis [Fri, 18 Nov 2022 19:38:37 +0000 (20:38 +0100)]
Fix setting the large_block feature after receiving a snapshot

We are not allowed to dirty a filesystem when done receiving
a snapshot. In this case the flag SPA_FEATURE_LARGE_BLOCKS will
not be set on that filesystem since the filesystem is not on
dp_dirty_datasets, and a subsequent encrypted raw send will fail.
Fix this by checking in dsl_dataset_snapshot_sync_impl() if the feature
needs to be activated and do so if appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13699
Closes #13782

18 months agoMake autodetection disable pyzfs for kernel/srpm configurations
Damian Szuberski [Wed, 16 Nov 2022 17:27:53 +0000 (03:27 +1000)]
Make autodetection disable pyzfs for kernel/srpm configurations

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13394
Closes #14178

18 months agoDon't leak packed recieved proprties
Brooks Davis [Tue, 29 Nov 2022 17:51:35 +0000 (09:51 -0800)]
Don't leak packed recieved proprties

When local properties (e.g., from -o and -x) are provided, don't leak
the packed representation of the received properties due to variable
reuse.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14197

18 months agoFix NULL pointer dereference in dbuf_prefetch_indirect_done()
Richard Yao [Sun, 20 Nov 2022 23:04:08 +0000 (18:04 -0500)]
Fix NULL pointer dereference in dbuf_prefetch_indirect_done()

When ZFS is built with assertions, a prefetch is done on a redacted
blkptr and `dpa->dpa_dnode` is NULL, we will have a NULL pointer
dereference in `dbuf_prefetch_indirect_done()`.

Both Coverity and Clang's Static Analyzer caught this.

Reported-by: Coverity (CID 1524671)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210

18 months agoLua: Fix bad bitshift in lua_strx2number()
Richard Yao [Tue, 29 Nov 2022 17:53:33 +0000 (12:53 -0500)]
Lua: Fix bad bitshift in lua_strx2number()

The port of lua to OpenZFS modified lua to use int64_t for numbers
instead of double. As part of this, a function for calculating
exponentiation was replaced with a bit shift. Unfortunately, it did not
handle negative values. Also, it only supported exponents numbers with
7 digits before before overflow. This supports exponents up to 15 digits
before overflow.

Clang's static analyzer reported this as "Result of operation is garbage
or undefined" because the exponent was negative.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14204

18 months agoFix clang 13 compilation errors
Damian Szuberski [Wed, 15 Jun 2022 21:20:28 +0000 (07:20 +1000)]
Fix clang 13 compilation errors

```
os/linux/zfs/zvol_os.c:1111:3: error: ignoring return value of function
  declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
                add_disk(zv->zv_zso->zvo_disk);
                ^~~~~~~~ ~~~~~~~~~~~~~~~~~~~~

zpl_xattr.c:1579:1: warning: no previous prototype for function
  'zpl_posix_acl_release_impl' [-Wmissing-prototypes]
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13551
(cherry picked from commit 988431966639d791ac269011d136e85f3602df75)

18 months agoRemove final K&R definitions
наб [Tue, 10 May 2022 21:28:02 +0000 (23:28 +0200)]
Remove final K&R definitions

Clang trunk now warns -Wstrict-prototypes on this, and they're removed
in C2x

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447

18 months agomodule: zfs: vdev_removal: remove unused num_indirect
наб [Thu, 7 Apr 2022 02:26:30 +0000 (04:26 +0200)]
module: zfs: vdev_removal: remove unused num_indirect

Found with -Wunused-but-set-variable on Clang trunk

Fixes: a1d477c24c ("OpenZFS 7614, 9064 - zfs device evacuation/removal")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304

18 months agotests: cmd: draid: remove unused and undocumented -v
наб [Thu, 7 Apr 2022 02:25:05 +0000 (04:25 +0200)]
tests: cmd: draid: remove unused and undocumented -v

Found with -Wunused-but-set-variable on Clang trunk

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304

18 months agolinux: libspl: zone: () -> (void)
наб [Wed, 12 Jan 2022 16:37:40 +0000 (17:37 +0100)]
linux: libspl: zone: () -> (void)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968

18 months agoCorrect multipathd.target to .service
Laura Hild [Fri, 18 Nov 2022 19:36:19 +0000 (14:36 -0500)]
Correct multipathd.target to .service

https://github.com/openzfs/zfs/pull/9863 says it "orders
zfs-import-cache.service and zfs-import-scan.service after
multipathd.service" but the commit (79add96) actually
ordered them after .target.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laura Hild <lsh@jlab.org>
Closes #12709
Closes #14171

18 months agoHandle and detect #13709's unlock regression (#14161)
Rich Ercolani [Tue, 15 Nov 2022 22:44:12 +0000 (17:44 -0500)]
Handle and detect #13709's unlock regression (#14161)

In #13709, as in #11294 before it, it turns out that 63a26454 still had
the same failure mode as when it was first landed as d1d47691, and
fails to unlock certain datasets that formerly worked.

Rather than reverting it again, let's add handling to just throw out
the accounting metadata that failed to unlock when that happens, as
well as a test with a pre-broken pool image to ensure that we never get
bitten by this again.

Fixes: #13709
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
18 months agoFix arc_p aggressive increase
shodanshok [Fri, 11 Nov 2022 18:41:36 +0000 (19:41 +0100)]
Fix arc_p aggressive increase

The original ARC paper called for an initial 50/50 MRU/MFU split
and this is accounted in various places where arc_p = arc_c >> 1,
with further adjustment based on ghost lists size/hit. However, in
current code both arc_adapt() and arc_get_data_impl() aggressively
grow arc_p until arc_c is reached, causing unneeded pressure on
MFU and greatly reducing its scan-resistance until ghost list
adjustments kick in.

This patch restores the original behavior of initially having arc_p
as 1/2 of total ARC, without preventing MRU to use up to 100% total
ARC when MFU is empty.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14137
Closes #14120

18 months agoFreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()
Richard Yao [Fri, 4 Nov 2022 18:06:14 +0000 (14:06 -0400)]
FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()

There is an off by 1 error in the check. Fortunately, this function does
not appear to be used in kernel space, despite being compiled as part of
the kernel module. However, it is used in userspace. Callers of
lzc_ioctl_fd() likely will crash if they attempt to use the
unimplemented request number.

This was reported by FreeBSD's coverity scan.

Reported-by: Coverity (CID 1432059)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14135

18 months agoExpose zfs_vdev_open_timeout_ms as a tunable
Serapheim Dimitropoulos [Thu, 3 Nov 2022 22:02:46 +0000 (15:02 -0700)]
Expose zfs_vdev_open_timeout_ms as a tunable

Some of our customers have been occasionally hitting zfs import failures
in Linux because udevd doesn't create the by-id symbolic links in time
for zpool import to use them. The main issue is that the
systemd-udev-settle.service that zfs-import-cache.service and other
services depend on is racy. There is also an openzfs issue filed (see
https://github.com/openzfs/zfs/issues/10891) outlining the problem and
potential solutions.

With the proper solutions being significant in terms of complexity and
the priority of the issue being low for the time being, this patch
exposes `zfs_vdev_open_timeout_ms` as a tunable so people that are
experiencing this issue often can increase it as a workaround.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14133

18 months agoRemove an unused variable
Brooks Davis [Tue, 1 Nov 2022 20:45:36 +0000 (20:45 +0000)]
Remove an unused variable

Clang-16 detects this set-but-unused variable which is assigned and
incremented, but never referenced otherwise.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125

18 months agoMake 1-bit bitfields unsigned
Brooks Davis [Tue, 1 Nov 2022 20:43:32 +0000 (20:43 +0000)]
Make 1-bit bitfields unsigned

This fixes -Wsingle-bit-bitfield-constant-conversion warning from
clang-16 like:

lib/libzfs/libzfs_dataset.c:4529:19: error: implicit truncation
  from 'int' to a one-bit wide bit-field changes value from
  1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
                flags.nounmount = B_TRUE;
^ ~~~~~~

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125

18 months agoAddress warnings about possible division by zero from clangsa
Richard Yao [Thu, 3 Nov 2022 16:58:14 +0000 (12:58 -0400)]
Address warnings about possible division by zero from clangsa

* The complaint in ztest_replay_write() is only possible if something
   went horribly wrong. An assertion will silence this and if it goes
   off, we will know that something is wrong.
 * The complaint in spa_estimate_metaslabs_to_flush() is not impossible,
   but seems very unlikely. We resolve this by passing the value from
   the `MIN()` that does not go to infinity when the variable is zero.

There was a third report from Clang's scan-build, but that was a
definite false positive and disappeared when checked again through
Clang's static analyzer with Z3 refution via CodeChecker.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14124

18 months agoAvoid null pointer dereference in dsl_fs_ss_limit_check()
Allan Jude [Sat, 29 Oct 2022 20:08:54 +0000 (16:08 -0400)]
Avoid null pointer dereference in dsl_fs_ss_limit_check()

Check for cr == NULL before dereferencing it in
dsl_enforce_ds_ss_limits() to lookup the zone/jail ID.

Reported-by: Coverity (CID 1210459)
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14103

18 months agoFix too few arguments to formatting function
Richard Yao [Thu, 27 Oct 2022 16:45:26 +0000 (12:45 -0400)]
Fix too few arguments to formatting function

CodeQL reported that when the VERIFY3U condition is false, we do not
pass enough arguments to `spl_panic()`. This is because the format
string from `snprintf()` was concatenated into the format string for
`spl_panic()`, which causes us to have an unexpected format specifier.

A CodeQL developer suggested fixing the macro to have a `%s` format
string that takes a stringified RIGHT argument, which would fix this.
However, upon inspection, the VERIFY3U check was never necessary in the
first place, so we remove it in favor of just calling `snprintf()`.

Lastly, it is interesting that every other static analyzer run on the
codebase did not catch this, including some that made an effort to catch
such things. Presumably, all of them relied on header annotations, which
we have not yet done on `spl_panic()`. CodeQL apparently is able to
track the flow of arguments on their way to annotated functions, which
llowed it to catch this when others did not. A future patch that I have
in development should annotate `spl_panic()`, so the others will catch
this too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098

18 months agoRemove zpl_revalidate: fix snapshot rollback
Pavel Snajdr [Thu, 5 Dec 2019 00:52:27 +0000 (01:52 +0100)]
Remove zpl_revalidate: fix snapshot rollback

Open files, which aren't present in the snapshot, which is being
roll-backed to, need to disappear from the visible VFS image of
the dataset.

Kernel provides d_drop function to drop invalid entry from
the dcache, but inode can be referenced by dentry multiple dentries.

The introduced zpl_d_drop_aliases function walks and invalidates
all aliases of an inode.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #9600
Closes #14070

18 months agoFix theoretical use of uninitialized values
Richard Yao [Sat, 15 Oct 2022 22:27:03 +0000 (18:27 -0400)]
Fix theoretical use of uninitialized values

Clang's static analyzer complains about this.

In get_configs(), if we have an invalid configuration that has no top
level vdevs, we can read a couple of uninitialized variables. Aborting
upon seeing this would break the userland tools for healthy pools, so we
instead initialize the two variables to 0 to allow the userland tools to
continue functioning for the pools with valid configurations.

In zfs_do_wait(), if no wait activities are enabled, we read an
uninitialized error variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043

18 months agoFix memory leaks in dmu_send()/dmu_send_obj()
Richard Yao [Tue, 18 Oct 2022 23:03:33 +0000 (19:03 -0400)]
Fix memory leaks in dmu_send()/dmu_send_obj()

If we encounter an EXDEV error when using the redacted snapshots
feature, the memory used by dspp.fromredactsnaps is leaked.

Clang's static analyzer caught this during an experiment in which I had
annotated various headers in an attempt to improve the results of static
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13973

18 months agoFix possible NULL pointer dereference in sha2_mac_init()
Richard Yao [Mon, 17 Oct 2022 06:06:40 +0000 (02:06 -0400)]
Fix possible NULL pointer dereference in sha2_mac_init()

If mechanism->cm_param is NULL, passing mechanism to
PROV_SHA2_GET_DIGEST_LEN() will dereference a NULL pointer.

Coverity reported this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

18 months agoset_global_var() should not pass NULL pointers to dlclose()
Richard Yao [Sun, 16 Oct 2022 04:56:55 +0000 (00:56 -0400)]
set_global_var() should not pass NULL pointers to dlclose()

Both Coverity and Clang's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

18 months agoFix NULL pointer dereference in spa_open_common()
Richard Yao [Sun, 16 Oct 2022 04:19:13 +0000 (00:19 -0400)]
Fix NULL pointer dereference in spa_open_common()

Calling spa_open() will pass a NULL pointer to spa_open_common()'s
config parameter. Under the right circumstances, we will dereference the
config parameter without doing a NULL check.

Clang's static analyzer found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

18 months agoFix NULL pointer passed to strlcpy from zap_lookup_impl()
Richard Yao [Sat, 15 Oct 2022 02:55:48 +0000 (22:55 -0400)]
Fix NULL pointer passed to strlcpy from zap_lookup_impl()

Clang's static analyzer pointed out that whenever zap_lookup_by_dnode()
is called, we have the following stack where strlcpy() is passed a NULL
pointer for realname from zap_lookup_by_dnode():

strlcpy()
zap_lookup_impl()
zap_lookup_norm_by_dnode()
zap_lookup_by_dnode()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

18 months agofm_fmri_hc_create() must call va_end() before returning
Richard Yao [Sat, 15 Oct 2022 02:46:43 +0000 (22:46 -0400)]
fm_fmri_hc_create() must call va_end() before returning

clang-tidy caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

18 months agoFix NULL pointer dereference in zdb
Richard Yao [Sat, 15 Oct 2022 02:45:13 +0000 (22:45 -0400)]
Fix NULL pointer dereference in zdb

Clang's static analyzer complained that we dereference a NULL pointer in
dump_path() if we return 0 when there is an error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

18 months agoZED: Fix uninitialized value reads
Richard Yao [Tue, 18 Oct 2022 19:42:14 +0000 (15:42 -0400)]
ZED: Fix uninitialized value reads

Coverity complained about a couple of uninitialized value reads in ZED.

 * zfs_deliver_dle() can pass an uninitialized string to zed_log_msg()
 * An uninitialized sev.sigev_signo is passed to timer_create()

The former would log garbage while the latter is not a real issue, but
we might as well suppress it by initializing the field to 0 for
consistency's sake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14047

18 months agoFix theoretical array overflow in lua_typename()
Richard Yao [Fri, 14 Oct 2022 20:41:56 +0000 (16:41 -0400)]
Fix theoretical array overflow in lua_typename()

Out of the 12 defects in lua that coverity reports, 5 of them involve
`lua_typename()` and out of the dozens of defects in ZFS that lua
reports, 3 of them involve `lua_typename()` due to the ZCP code. Given
all of the uses of `lua_typename()` in the ZCP code, I was surprised
that there were not more. It appears that only 2 were reported because
only 3 called `lua_type()`, which does a defective sanity check that
allows invalid types to be passed.

lua/lua@d4fb848be77f4b0209acaf37a5b5e1cee741ddce addressed this in
upstream lua 5.3. Unfortunately, we did not get that fix since we use
lua 5.2 and we do not have assertions enabled in lua, so the upstream
solution would not do anything.

While we could adopt the upstream solution and enable assertions, a
simpler solution is to fix the issue by making `lua_typename()` return
`internal_type_error` whenever it is called with an invalid type. This
avoids the array overflow and if we ever see it appear somewhere, we
will know there is a problem with the lua interpreter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13947

18 months agoFix potential NULL pointer dereference in lzc_ioctl()
Richard Yao [Fri, 14 Oct 2022 20:33:22 +0000 (16:33 -0400)]
Fix potential NULL pointer dereference in lzc_ioctl()

Users are allowed to pass NULL to resultp, but we unconditionally assume
that they never do. When an external user does pass NULL to resultp, we
dereference a NULL pointer.

Clang's static analyzer complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14008

18 months agoscripts/enum-extract.pl should not hard code perl path
Richard Yao [Tue, 11 Oct 2022 19:32:07 +0000 (15:32 -0400)]
scripts/enum-extract.pl should not hard code perl path

This is a portability issue. The issue had already been fixed for
scripts/cstyle.pl by 2dbf1bf8296f66f24d5e404505c991bfbeec7808.
scripts/enum-extract.pl was added to the repository the following year
without this portability fix.

Michael Bishop informed me that this broke his attempt to build ZFS
2.1.6 on NixOS, since he was building manually outside of their package
manager (that usually rewrites the shebangs to NixOS' unusual paths).
NixOS puts all of the paths into $PATH, so scripts that portably rely
on env to find the interpreter still work.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14012

18 months agoPAM: Fix unchecked return value from zfs_key_config_load()
Richard Yao [Thu, 6 Oct 2022 00:09:24 +0000 (20:09 -0400)]
PAM: Fix unchecked return value from zfs_key_config_load()

9a49c6b782443ba6e627f2261c45f082ad843094 was intended to fix this issue,
but I had missed the case in pam_sm_open_session(). Clang's static
analyzer had not reported it and I forgot to look for other cases.

Interestingly, GCC gcc-12.1.1_p20220625's static analyzer had caught
this as multiple double-free bugs, since another failure after the
failure in zfs_key_config_load() will cause us to attempt to free the
memory that zfs_key_config_load() was supposed to allocate, but had
cleaned up upon failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13978

18 months agoFix potential NULL pointer dereference in dsl_dataset_promote_check()
Richard Yao [Fri, 30 Sep 2022 23:59:51 +0000 (19:59 -0400)]
Fix potential NULL pointer dereference in dsl_dataset_promote_check()

If the `list_head()` returns NULL, we dereference it, right before we
check to see if it returned NULL.

We have defined two different pointers that both point to the same
thing, which are `origin_head` and `origin_ds`. Almost everything uses
`origin_ds`, so we switch them to use `origin_ds`.

We also promote `origin_ds` to a const pointer so that the compiler
verifies that nothing modifies it.

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13967

18 months agoFix unreachable code in zstreamdump
Richard Yao [Thu, 29 Sep 2022 17:16:37 +0000 (13:16 -0400)]
Fix unreachable code in zstreamdump

82226e4f44baa3f7c3101caaaf941927aa318e46 was intended to prevent a
warning from being printed in situations where it was inappropriate, but
accidentally disabled it entirely by setting featureflags in the wrong
case statement.

Coverity reported this as dead code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13946

18 months agoPAM: Fix uninitialized value read
Richard Yao [Tue, 27 Sep 2022 23:48:35 +0000 (19:48 -0400)]
PAM: Fix uninitialized value read

Clang's static analyzer found that config.uid is uninitialized when
zfs_key_config_load() returns an error.

Oddly, this was not included in the unchecked return values that
Coverity found.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13957

18 months agoset_global_var_parse_kv() should pass the pointer from strdup()
Richard Yao [Fri, 23 Sep 2022 17:51:14 +0000 (13:51 -0400)]
set_global_var_parse_kv() should pass the pointer from strdup()

A comment says that the caller should free k_out, but the pointer passed
via k_out is not the same pointer we received from strdup(). Instead,
it is a pointer into the region we received from strdup(). The free
function should always be called with the original pointer, so this is
likely a bug.

We solve this by calling `strdup()` a second time and then freeing the
original pointer.

Coverity reported this as a memory leak.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13867

18 months agoCall va_end() before return in zpool_standard_error_fmt()
Richard Yao [Tue, 20 Sep 2022 22:20:56 +0000 (18:20 -0400)]
Call va_end() before return in zpool_standard_error_fmt()

Commit ecd6cf800b63704be73fb264c3f5b6e0dafc068d by marks in OpenSolaris
at Tue Jun 26 07:44:24 2007 -0700 introduced a bug where we fail to call
`va_end()` before returning.

The man page for va_start() says:

"Each invocation of va_start() must be matched by a corresponding
invocation of va_end() in the same function."

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13904

18 months agoFix potential NULL pointer dereference in zfsdle_vdev_online()
Richard Yao [Tue, 20 Sep 2022 22:20:04 +0000 (18:20 -0400)]
Fix potential NULL pointer dereference in zfsdle_vdev_online()

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13903

18 months agoFreeBSD: Fix uninitialized pointer read in spa_import_rootpool()
Richard Yao [Tue, 20 Sep 2022 21:43:03 +0000 (17:43 -0400)]
FreeBSD: Fix uninitialized pointer read in spa_import_rootpool()

The FreeBSD project's coverity scans found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13923

18 months agoLinux: Fix use-after-free in zfsvfs_create()
Richard Yao [Tue, 20 Sep 2022 00:30:58 +0000 (20:30 -0400)]
Linux: Fix use-after-free in zfsvfs_create()

Coverity reported that we pass a pointer to zfsvfs to
`dmu_objset_disown()` after freeing zfsvfs in zfsvfs_create_impl() after
a failure in zfsvfs_init().

We have nearly identical duplicate versions of this code for FreeBSD and
Linux, but interestingly, the FreeBSD version of this code differs in
such a way that it does not suffer from this bug. We remove the
difference from the FreeBSD version to fix this bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13883

18 months agoFix null pointer dereferences in PAM
Richard Yao [Fri, 16 Sep 2022 21:02:54 +0000 (17:02 -0400)]
Fix null pointer dereferences in PAM

Coverity caught these.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13889

18 months agoHandle ECKSUM as new EZFS_CKSUM ‒ "insufficient replicas"
наб [Fri, 16 Sep 2022 20:59:25 +0000 (22:59 +0200)]
Handle ECKSUM as new EZFS_CKSUM ‒ "insufficient replicas"

Add a meaningful error message for ECKSUM to common error messages.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #6805
Closes #13808
Closes #13898

18 months agoFix use-after-free bugs in icp code
Richard Yao [Thu, 15 Sep 2022 18:46:42 +0000 (14:46 -0400)]
Fix use-after-free bugs in icp code

These were reported by Coverity as "Read from pointer after free" bugs.
Presumably, it did not report it as a use-after-free bug because it does
not understand the inline assembly that implements the atomic
instruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13881

18 months agoRemove incorrect free() in zfs_get_pci_slots_sys_path()
Richard Yao [Wed, 14 Sep 2022 00:00:53 +0000 (20:00 -0400)]
Remove incorrect free() in zfs_get_pci_slots_sys_path()

Coverity found this. We attempted to free tmp, which is a pointer to a
string that should be freed by the caller.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13864

18 months agoCleanup: Make memory barrier definitions consistent across kernels
Richard Yao [Tue, 13 Sep 2022 23:59:33 +0000 (19:59 -0400)]
Cleanup: Make memory barrier definitions consistent across kernels

We inherited membar_consumer() and membar_producer() from OpenSolaris,
but we had replaced membar_consumer() with Linux's smp_rmb() in
zfs_ioctl.c. The FreeBSD SPL consequently implemented a shim for the
Linux-only smp_rmb().

We reinstate membar_consumer() in platform independent code and fix the
FreeBSD SPL to implement membar_consumer() in a way analogous to Linux.

Reviewed-by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13843

18 months agozpool_load_compat() should create strings of length ZFS_MAXPROPLEN
Richard Yao [Mon, 12 Sep 2022 19:54:43 +0000 (15:54 -0400)]
zpool_load_compat() should create strings of length ZFS_MAXPROPLEN

Otherwise, `strlcat()` can overflow them.

Coverity found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13866

18 months agoicp: fix all !ENDBR objtool warnings in x86 Asm code
Alexander Lobakin [Sun, 16 Oct 2022 21:41:39 +0000 (23:41 +0200)]
icp: fix all !ENDBR objtool warnings in x86 Asm code

Currently, only Blake3 x86 Asm code has signs of being ENDBR-aware.
At least, under certain conditions it includes some header file and
uses some custom macro from there.
Linux has its own NOENDBR since several releases ago. It's defined
in the same <asm/linkage.h>, so currently <sys/asm_linkage.h>
already is provided with it.

Let's unify those two into one %ENDBR macro. At first, check if it's
present already. If so -- use Linux kernel version. Otherwise, try
to go that second way and use %_CET_ENDBR from <cet.h> if available.
If no, fall back to just empty definition.
This fixes a couple more 'relocations to !ENDBR' across the module.
And now that we always have the latest/actual ENDBR definition, use
it at the entrance of the few corresponding functions that objtool
still complains about. This matches the way how it's used in the
upstream x86 core Asm code.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035

18 months agoicp: fix rodata being marked as text in x86 Asm code
Alexander Lobakin [Sun, 16 Oct 2022 21:23:44 +0000 (23:23 +0200)]
icp: fix rodata being marked as text in x86 Asm code

objtool properly complains that it can't decode some of the
instructions from ICP x86 Asm code. As mentioned in the Makefile,
where those object files were excluded from objtool check (but they
can still be visible under IBT and LTO), those are just constants,
not code.
In that case, they must be placed in .rodata, so they won't be
marked as "allocatable, executable" (ax) in EFL headers and this
effectively prevents objtool from trying to decode this data. That
reveals a whole bunch of other issues in ICP Asm code, as previously
objtool was bailing out after that warning message.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035

Conflicts:
module/Kbuild.in

18 months agoicp: properly fix all RETs in x86_64 Asm code
Alexander Lobakin [Sun, 16 Oct 2022 14:53:22 +0000 (16:53 +0200)]
icp: properly fix all RETs in x86_64 Asm code

Commit 43569ee37420 ("Fix objtool: missing int3 after ret warning")
addressed replacing all `ret`s in x86 asm code to a macro in the
Linux kernel in order to enable SLS. That was done by copying the
upstream macro definitions and fixed objtool complaints.
Since then, several more mitigations were introduced, including
Rethunk. It requires to have a jump to one of the thunks in order
to work, so the RET macro was changed again. And, as ZFS code
didn't use the mainline defition, but copied it, this is currently
missing.

Objtool reminds about it time to time (Clang 16, CONFIG_RETHUNK=y):

fs/zfs/lua/zlua.o: warning: objtool: setjmp+0x25: 'naked' return
 found in RETHUNK build
fs/zfs/lua/zlua.o: warning: objtool: longjmp+0x27: 'naked' return
 found in RETHUNK build

Do it the following way:
* if we're building under Linux, unconditionally include
  <linux/linkage.h> in the related files. It is available in x86
  sources since even pre-2.6 times, so doesn't need any conftests;
* then, if RET macro is available, it will be used directly, so that
  we will always have the version actual to the kernel we build;
* if there's no such macro, we define it as a simple `ret`, as it
  was on pre-SLS times.

This ensures we always have the up-to-date definition with no need
to update it manually, and at the same time is safe for the whole
variety of kernels ZFS module supports.
Then, there's a couple more "naked" rets left in the code, they're
just defined as:

.byte 0xf3,0xc3

In fact, this is just:

rep ret

`rep ret` instead of just `ret` seems to mitigate performance issues
on some old AMD processors and most likely makes no sense as of
today.
Anyways, address those rets, so that they will be protected with
Rethunk and SLS. Include <sys/asm_linkage.h> here which now always
has RET definition and replace those constructs with just RET.
This wipes the last couple of places with unpatched rets objtool's
been complaining about.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035

18 months agolibzfs recv: Check if user prop before inheritable
Ryan Moeller [Wed, 9 Nov 2022 21:24:06 +0000 (21:24 +0000)]
libzfs recv: Check if user prop before inheritable

User props trigger an assert in zfs_prop_inheritable(), we must check
if the prop is a user prop first.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Backported as snippit from:
63652e1 Add --enable-asan and --enable-ubsan switches

19 months agodsl_prop_known_index(): check for invalid prop
Damian Szuberski [Tue, 8 Nov 2022 18:16:01 +0000 (04:16 +1000)]
dsl_prop_known_index(): check for invalid prop

Resolve UBSAN array-index-out-of-bounds error in zprop_desc_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14142
Closes #14147

19 months agozed: Avoid core dump if wholedisk property does not exist
Ameer Hamza [Fri, 21 Oct 2022 17:46:38 +0000 (22:46 +0500)]
zed: Avoid core dump if wholedisk property does not exist

zed aborts and dumps core in vdev_whole_disk_from_config() if
wholedisk property does not exist. make_leaf_vdev() adds the
property but there may be already pools that don't have the
wholedisk in the label.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14062

19 months agozed: Prevent special vdev to be replaced by hot spare
Ameer Hamza [Fri, 4 Nov 2022 18:33:47 +0000 (23:33 +0500)]
zed: Prevent special vdev to be replaced by hot spare

Special vdevs should not be replaced by a hot spare.
Log vdevs already support this, extending the
functionality for special vdevs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14129

19 months agoDeny receiving into encrypted datasets if the keys are not loaded (#14139)
Attila Fülöp [Fri, 4 Nov 2022 18:07:29 +0000 (19:07 +0100)]
Deny receiving into encrypted datasets if the keys are not loaded (#14139)

Commit 68ddc06b611854560fefa377437eb3c9480e084b introduced support
for receiving unencrypted datasets as children of encrypted ones but
unfortunately got the logic upside down. This resulted in failing to
deny receives of incremental sends into encrypted datasets without
their keys loaded. If receiving a filesystem, the receive was done
into a newly created unencrypted child dataset of the target. In
case of volumes the receive made the target volume undeletable since
a dataset was created below it, which we obviously can't handle.
Incremental streams with embedded blocks are affected as well.

We fix the broken logic to properly deny receives in such cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13598
Closes #14055
Closes #14119

19 months agozil: Relax assertion in zil_parse
Ryan Moeller [Tue, 1 Nov 2022 19:19:32 +0000 (15:19 -0400)]
zil: Relax assertion in zil_parse

Rather than panic debug builds when we fail to parse a whole ZIL, let's
instead improve the logging of errors and continue like in a release
build.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14116

19 months agoquota: extend quota for dataset
Mariusz Zaborski [Fri, 28 Oct 2022 18:44:18 +0000 (20:44 +0200)]
quota: extend quota for dataset

This patch relax the quota limitation for dataset by around 3%.
What this means is that user can write more data then the quota is
set to. However thanks to that we can get more stable bandwidth, in
case when we are overwriting data in-place, and not consuming any
additional space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13839

19 months agoFix ARC target collapse when zfs_arc_meta_limit_percent=100
shodanshok [Fri, 28 Oct 2022 17:21:54 +0000 (19:21 +0200)]
Fix ARC target collapse when zfs_arc_meta_limit_percent=100

Reclaim metadata when arc_available_memory < 0 even if
meta_used is not bigger than arc_meta_limit.

As described in https://github.com/openzfs/zfs/issues/14054 if
zfs_arc_meta_limit_percent=100 then ARC target can collapse to
arc_min due to arc_purge not freeing any metadata.

This patch lets arc_prune to do its work when arc_available_memory
is negative even if meta_used is not bigger than arc_meta_limit,
avoiding ARC target collapse.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14054
Closes #14093

19 months agoPropagate extent_bytes change to autotrim thread
vaclavskala [Fri, 28 Oct 2022 17:16:31 +0000 (19:16 +0200)]
Propagate extent_bytes change to autotrim thread

The autotrim thread only reads zfs_trim_extent_bytes_min and
zfs_trim_extent_bytes_max variable only on thread start.  We
should check for parameter changes during thread execution to
allow parameter changes take effect without needing to disable
then restart the autotrim.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #14077

19 months agoLinux 6.1 compat: change order of sys/mutex.h includes
Coleman Kane [Tue, 18 Oct 2022 19:29:44 +0000 (15:29 -0400)]
Linux 6.1 compat: change order of sys/mutex.h includes

After Linux 6.1-rc1 came out, the build started failing to build a
couple of the files in the linux spl code due to the mutex_init
redefinition. Moving the sys/mutex.h include to a lower position within
these two files appears to fix the problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14040

19 months agoLinux 6.0 compat: META
Brian Behlendorf [Wed, 26 Oct 2022 21:55:12 +0000 (14:55 -0700)]
Linux 6.0 compat: META

Update the META file to reflect compatibility with the 6.0 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14091

19 months agoLinux compat: fix DECLARE_EVENT_CLASS() test when ZFS is built-in
Alexander [Mon, 17 Oct 2022 18:08:36 +0000 (20:08 +0200)]
Linux compat: fix DECLARE_EVENT_CLASS() test when ZFS is built-in

ZFS_LINUX_TRY_COMPILE_HEADER macro doesn't take CONFIG_ZFS=y into
account. As a result, on several latest Linux versions, configure
script marks DECLARE_EVENT_CLASS() available for non-GPL when ZFS
is being built as a module, but marks it unavailable when ZFS is
built-in.
Follow the logic of the neighbor macros and adjust
ZFS_LINUX_TRY_COMPILE_HEADER accordingly, so that it doesn't try
to look for a .ko when ZFS is built-in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14006

19 months agozfs_domount: fix double-disown of dataset / double-free of zfsvfs_t
Christian Schwarz [Fri, 14 Oct 2022 18:46:47 +0000 (20:46 +0200)]
zfs_domount: fix double-disown of dataset / double-free of zfsvfs_t

Before this patch, in zfs_domount, if zfs_root or d_make_root fails, we
leave zfsvfs != NULL. This will lead to execution of the error handling
`if` statement at the `out` label, and hence to a call to
dmu_objset_disown and zfsvfs_free.

However, zfs_umount, which we call upon failure of zfs_root and
d_make_root already does dmu_objset_disown and zfsvfs_free.

I suppose this patch rather adds to the brittleness of this part of the
code base, but I don't want to invest more time in this right now.
To add a regression test, we'd need some kind of fault injection
facility for zfs_root or d_make_root, which doesn't exist right now.
And even then, I think that regression test would be too closely tied
to the implementation.

To repro the double-disown / double-free, do the following:
1. patch zfs_root to always return an error
2. mount a ZFS filesystem

Here's the stack trace you would see then:

  VERIFY3(ds->ds_owner == tag) failed (0000000000000000 == ffff9142361e8000)
  PANIC at dsl_dataset.c:1003:dsl_dataset_disown()
  Showing stack for process 28332
  CPU: 2 PID: 28332 Comm: zpool Tainted: G           O      5.10.103-1.nutanix.el7.x86_64 #1
  Call Trace:
   dump_stack+0x74/0x92
   spl_dumpstack+0x29/0x2b [spl]
   spl_panic+0xd4/0xfc [spl]
   dsl_dataset_disown+0xe9/0x150 [zfs]
   dmu_objset_disown+0xd6/0x150 [zfs]
   zfs_domount+0x17b/0x4b0 [zfs]
   zpl_mount+0x174/0x220 [zfs]
   legacy_get_tree+0x2b/0x50
   vfs_get_tree+0x2a/0xc0
   path_mount+0x2fa/0xa70
   do_mount+0x7c/0xa0
   __x64_sys_mount+0x8b/0xe0
   do_syscall_64+0x38/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Christian Schwarz <christian.schwarz@nutanix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #14025

19 months agoLinux: Remove ZFS_AC_KERNEL_SRC_MODULE_PARAM_CALL_CONST autotools check
Richard Yao [Sun, 9 Oct 2022 01:29:54 +0000 (21:29 -0400)]
Linux: Remove ZFS_AC_KERNEL_SRC_MODULE_PARAM_CALL_CONST autotools check

On older kernels, the definition for `module_param_call()` typecasts
function pointers to `(void *)`, which triggers -Werror, causing the
check to return false when it should return true.

Fixing this breaks the build process on some older kernels because they
define a `__check_old_set_param()` function in their headers that checks
for a non-constified `->set()`. We workaround that through the c
preprocessor by defining `__check_old_set_param(set)` to `(set)`, which
prevents the build failures.

However, it is now apparent that all kernels that we support have
adopted the GRSecurity change, so there is no need to have an explicit
autotools check for it anymore. We therefore remove the autotools check,
while adding the workaround to our headers for the build time
non-constified `->set()` check done by older kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004

19 months agoCI: bump actions/upload-artifact to v3
George Melikov [Wed, 12 Oct 2022 06:33:38 +0000 (09:33 +0300)]
CI: bump actions/upload-artifact to v3

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14018

19 months agoCI: bump actions/checkout to v3
George Melikov [Tue, 11 Oct 2022 19:59:06 +0000 (22:59 +0300)]
CI: bump actions/checkout to v3

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14018

19 months agoStop ganging due to past vdev write errors
Serapheim Dimitropoulos [Tue, 11 Oct 2022 19:27:41 +0000 (12:27 -0700)]
Stop ganging due to past vdev write errors

= Problem

While examining a customer's system we noticed unreasonable space
usage from a few snapshots due to gang blocks. Under some further
analysis we discovered that the pool would create gang blocks because
all its disks had non-zero write error counts and they'd be skipped
for normal metaslab allocations due to the following if-clause in
`metaslab_alloc_dva()`:
```
/*
 * Avoid writing single-copy data to a failing,
 * non-redundant vdev, unless we've already tried all
 * other vdevs.
 */
if ((vd->vdev_stat.vs_write_errors > 0 ||
    vd->vdev_state < VDEV_STATE_HEALTHY) &&
    d == 0 && !try_hard && vd->vdev_children == 0) {
metaslab_trace_add(zal, mg, NULL, psize, d,
    TRACE_VDEV_ERROR, allocator);
goto next;
}
```

= Proposed Solution

Get rid of the predicate in the if-clause that checks the past
write errors of the selected vdev. We still try to allocate from
HEALTHY vdevs anyway by checking vdev_state so the past write
errors doesn't seem to help us (quite the opposite - it can cause
issues in long-lived pools like the one from our customer).

= Testing

I first created a pool with 3 vdevs:
```
$ zpool list -v volpool
NAME        SIZE  ALLOC   FREE
volpool    22.5G   117M  22.4G
  xvdb     7.99G  40.2M  7.46G
  xvdc     7.99G  39.1M  7.46G
  xvdd     7.99G  37.8M  7.46G
```

And used `zinject` like so with each one of them:
```
$ sudo zinject -d xvdb -e io -T write -f 0.1 volpool
```

And got the vdevs to the following state:
```
$ zpool status volpool
  pool: volpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.
...<cropped>..
action: Determine if the device needs to be replaced, and clear the
...<cropped>..
config:

NAME        STATE     READ WRITE CKSUM
volpool     ONLINE       0     0     0
  xvdb      ONLINE       0     1     0
  xvdc      ONLINE       0     1     0
  xvdd      ONLINE       0     4     0

```

I also double-checked their write error counters with sdb:
```
sdb> spa volpool | vdev | member vdev_stat.vs_write_errors
(uint64_t)0  # <---- this is the root vdev
(uint64_t)2
(uint64_t)1
(uint64_t)1
```

Then I checked that I the problem was reproduced in my VM as I the
gang count was growing in zdb as I was writting more data:
```
$ sudo zdb volpool | grep gang
        ganged count:              1384

$ sudo zdb volpool | grep gang
        ganged count:              1393

$ sudo zdb volpool | grep gang
        ganged count:              1402

$ sudo zdb volpool | grep gang
        ganged count:              1414
```

Then I updated my bits with this patch and the gang count stayed the
same.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14003

19 months agozvol_wait logic may terminate prematurely
Serapheim Dimitropoulos [Tue, 11 Oct 2022 19:12:04 +0000 (12:12 -0700)]
zvol_wait logic may terminate prematurely

Setups that have a lot of zvols may see zvol_wait terminate prematurely
even though the script is still making progress.  For example, we have a
customer that called zvol_wait for ~7100 zvols and by the last iteration
of that script it was still waiting on ~2900. Similarly another one
called zvol_wait for 2200 and by the time the script terminated there
were only 50 left.

This patch adjusts the logic to stay within the outer loop of the script
if we are making any progress whatsoever.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #13998

19 months agoRemove ambiguity on demand vs prefetch stats reported by arc_summary
shodanshok [Tue, 4 Oct 2022 18:00:02 +0000 (20:00 +0200)]
Remove ambiguity on demand vs prefetch stats reported by arc_summary

arc_summary currently list prefetch stats as "demand prefetch"
However, a hit/miss can be due to demand or prefetch, not both.
To remove any confusion, this patch removes the "Demand" word
from the affected lines.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #13985

19 months agoFix panic in dsl_process_sub_livelist for EINTR
Serapheim Dimitropoulos [Thu, 29 Sep 2022 16:39:48 +0000 (09:39 -0700)]
Fix panic in dsl_process_sub_livelist for EINTR

= Issue

Recently we hit an assertion panic in `dsl_process_sub_livelist` while
exporting the spa and interrupting `bpobj_iterate_nofree`. In that case
`bpobj_iterate_nofree` stops mid-way returning an EINTR without clearing
the intermediate AVL tree that keeps track of the livelist entries it
has encountered so far. At that point the code has a VERIFY for the
number of elements of the AVL expecting it to be zero (which is not the
case for EINTR).

= Fix

Cleanup any intermediate state before destroying the AVL when
encountering EINTR. Also added a comment documenting the scenario where
the EINTR comes up. There is no need to do anything else for the calles
of `dsl_process_sub_livelist` as they already handle the EINTR case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #13939

19 months agoBring per_txg_dirty_frees_percent back to 30
Mateusz Guzik [Wed, 28 Sep 2022 00:38:03 +0000 (02:38 +0200)]
Bring per_txg_dirty_frees_percent back to 30

The current value causes significant artificial slowdown during mass
parallel file removal, which can be observed both on FreeBSD and Linux
when running real workloads.

Sample results from Linux doing make -j 96 clean after an allyesconfig
modules build:

before: 4.14s user 6.79s system 48% cpu 22.631 total
after: 4.17s user 6.44s system 153% cpu 6.927 total

FreeBSD results in the ticket.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13932
Closes #13938

19 months agoAdd options to zfs redundant_metadata property
Akash B [Thu, 20 Oct 2022 00:07:51 +0000 (05:37 +0530)]
Add options to zfs redundant_metadata property

Currently, additional/extra copies are created for metadata in
addition to the redundancy provided by the pool(mirror/raidz/draid),
due to this 2 times more space is utilized per inode and this decreases
the total number of inodes that can be created in the filesystem. By
setting redundant_metadata to none, no additional copies of metadata
are created, hence can reduce the space consumed by the additional
metadata copies and increase the total number of inodes that can be
created in the filesystem.  Additionally, this can improve file create
performance due to the reduced amount of metadata which needs
to be written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #13680

19 months agoFreeBSD: vn_flush_cached_data: observe vnode locking contract
Andriy Gapon [Wed, 26 Oct 2022 22:00:58 +0000 (01:00 +0300)]
FreeBSD: vn_flush_cached_data: observe vnode locking contract

vm_object_page_clean() expects that the associated vnode is locked
as VOP_PUTPAGES() may get called on the vnode.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #14079
(cherry picked from commit 41133c97949af43daadee0503a9842a8dce8f0fd)

19 months agoFreeBSD: Fix a pair of bugs in zfs_fhtovp()
Mark Johnston [Tue, 11 Oct 2022 19:29:55 +0000 (15:29 -0400)]
FreeBSD: Fix a pair of bugs in zfs_fhtovp()

- Add a zfs_exit() call in an error path, otherwise a lock is leaked.
- Remove the fid_gen > 1 check.  That appears to be Linux-specific:
  zfsctl_snapdir_fid() sets fid_gen to 0 or 1 depending on whether the
  snapshot directory is mounted.  On FreeBSD it fails, making snapshot
  dirs inaccessible via NFS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Fixes: 43dbf8817808 ("FreeBSD: vfsops: use setgen for error case")
Closes #14001
Closes #13974
(cherry picked from commit ed566bf1cd0bdbf85e8c63c1c119e3d2ef5db1f6)

20 months agoFix sequential resilver drive failure race condition
samwyc [Wed, 19 Oct 2022 22:48:13 +0000 (04:18 +0530)]
Fix sequential resilver drive failure race condition

This patch handles the race condition on simultaneous failure of
2 drives, which misses the vdev_rebuild_reset_wanted signal in
vdev_rebuild_thread. We retry to catch this inside the
vdev_rebuild_complete_sync function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe J <samwyc@hpe.com>
Closes #14041
Closes #14050

20 months agocontrib: dracut: zfs-snapshot-bootfs: exit status fix
Brian Behlendorf [Thu, 20 Oct 2022 18:28:48 +0000 (11:28 -0700)]
contrib: dracut: zfs-snapshot-bootfs: exit status fix

Correct misplaced `-` is the original backport of #13769.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #13769

20 months agocontrib: dracut: zfs-{rollback,snapshot}-bootfs: explicit snapname fix
gregory-lee-bartholomew [Wed, 29 Jun 2022 23:56:04 +0000 (18:56 -0500)]
contrib: dracut: zfs-{rollback,snapshot}-bootfs: explicit snapname fix

Due to a missing semicolon on the ExecStart line, it wasn't possible
to specify the snapshot name on the bootfs.{rollback,snapshot}
kernel parameters if the boot dataset name was obtained from the
root=zfs:... kernel parameter.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13585

20 months agokcfpool_alloc() should have its argument list marked void
Richard Yao [Wed, 12 Oct 2022 22:47:39 +0000 (18:47 -0400)]
kcfpool_alloc() should have its argument list marked void

This error occurred when building on Gentoo with debugging enabled:

zfs-kmod-2.1.6/work/zfs-2.1.6/module/icp/core/kcf_sched.c:1277:14:
error: a function declaration without a prototype is deprecated
in all versions of C [-Werror,-Wstrict-prototypes]
  kcfpool_alloc()
               ^
               void
1 error generated.

This function is not present in master.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14023

20 months agoetc: mask zfs-load-key.service
наб [Wed, 12 Oct 2022 22:29:21 +0000 (00:29 +0200)]
etc: mask zfs-load-key.service

Otherwise, systemd-sysv-generator will generate a service equivalent
that breaks the boot: under systemd this is covered by
zfs-mount-generator

We already do this for zfs-import.service, and other init scripts are
suppressed automatically by the "actual" .service files

Fixes: commit f04b97620059d08b37d9e80ada397e742bb2f311 ("Add init script
 to load keys")
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14010
Closes #14019

20 months agoinitramfs: use `mount.zfs` instead of `mount`
Damian Szuberski [Mon, 11 Apr 2022 22:51:23 +0000 (00:51 +0200)]
initramfs: use `mount.zfs` instead of `mount`

A followup to d7a67402a85252e163aa8a9b69e7eda499db8c61

For `mount -t zfs -o opts ds mp` command line
some implementations of `mount(8)`, e. g. Busybox in Debian
work as follows:

```
newfstatat(AT_FDCWD, "ds", 0x7fff826f4ab0, 0) = -1
mount("ds", "mp", "zfs", MS_SILENT, NULL) = 0
```

The logic above skips completely `mount.zfs` and prevents us
from reading filesystem properties and applying mount options.

For comparison, the coreutils `mount(8)` implementation does:

```
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3
// figure out that zfs is a `nodev` filesystem and look for a helper
newfstatat(AT_FDCWD, "/sbin/mount.zfs" ...) = 0
execve("/sbin/mount.zfs" ...) = 0
```

Using `mount.zfs` in initramfs would help circumvent deficiencies
of some of `mount(8)` implementations. `mount -t zfs` translates
to `mount.zfs` invocation, except for cases when explicitly disabled
by `-i`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13305
(cherry picked from commit 35d81a75a8c13e011e19fd12cf553d9c5849386e)

20 months agoTag zfs-2.1.6
Tony Hutter [Mon, 19 Sep 2022 16:50:46 +0000 (09:50 -0700)]
Tag zfs-2.1.6

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
20 months agoFix bad free in skein code
Richard Yao [Tue, 27 Sep 2022 19:36:58 +0000 (15:36 -0400)]
Fix bad free in skein code

Clang's static analyzer found a bad free caused by skein_mac_atomic().
It will allocate a context on the stack and then pass it to
skein_final(), which attempts to free it. Upon inspection,
skein_digest_atomic() also has the same problem.

These functions were created to match the OpenSolaris ICP API, so I was
curious how we avoided this in other providers and looked at the SHA2
code. It appears that SHA2 has a SHA2Final() helper function that is
called by the exported sha2_mac_final()/sha2_digest_final() as well as
the sha2_mac_atomic() and sha2_digest_atomic() functions. The real work
is done in SHA2Final() while some checks and the free are done in
sha2_mac_final()/sha2_digest_final().

We fix the use after free in the skein code by taking inspiration from
the SHA2 code. We introduce a skein_final_nofree() that does most of the
work, and make skein_final() into a function that calls it and then
frees the memory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13954

20 months agozpool: Don't print "repairing" on force faulted drives
Tony Hutter [Fri, 23 Sep 2022 17:24:19 +0000 (10:24 -0700)]
zpool: Don't print "repairing" on force faulted drives

If you force fault a drive that's resilvering, it's scan stats can get
frozen in time, giving the false impression that it's being resilvered.
This commit checks the vdev state to see if the vdev is healthy before
reporting "resilvering" or "repairing" in zpool status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13927
Closes #13930

20 months agoFreeBSD: handle V_PCATCH
Mateusz Guzik [Tue, 20 Sep 2022 22:22:32 +0000 (00:22 +0200)]
FreeBSD: handle V_PCATCH

See https://cgit.FreeBSD.org/src/commit/?id=a75d1ddd74312f5dd79bc1e965f7077679659f2e

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13910

20 months agoFreeBSD: catch up to 1400068
Mateusz Guzik [Tue, 20 Sep 2022 22:21:30 +0000 (00:21 +0200)]
FreeBSD: catch up to 1400068

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13909

20 months agoFreeBSD: stop passing LK_INTERLOCK to VOP_LOCK
Mateusz Guzik [Tue, 20 Sep 2022 00:17:27 +0000 (02:17 +0200)]
FreeBSD: stop passing LK_INTERLOCK to VOP_LOCK

There is an ongoing effort to eliminate this feature.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13908

20 months agoFreeBSD: Fix integer conversion for vnlru_free{,_vfsops}()
Richard Yao [Wed, 14 Sep 2022 19:51:55 +0000 (15:51 -0400)]
FreeBSD: Fix integer conversion for vnlru_free{,_vfsops}()

When reviewing #13875, I noticed that our FreeBSD code has an issue
where it converts from `int64_t` to `int` when calling
`vnlru_free{,_vfsops}()`. The result is that if the int64_t is `1 <<
36`, the int will be 0, since the low bits are 0. Even when some low
bits are set, a value such as `((1 << 36) + 1)` would truncate to 1,
which is wrong.

There is protection against this on 32-bit platforms, but on 64-bit
platforms, there is no check to protect us, so we add a check.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13882

20 months agoFreeBSD: Ignore symlink to i386 includes
Ryan Moeller [Tue, 2 Aug 2022 23:34:23 +0000 (19:34 -0400)]
FreeBSD: Ignore symlink to i386 includes

A symlink to i386 includes is created in the build dir on amd64 since
freebsd/freebsd-src@d07600c563039f252becc29ac7d9a454b6b0600d

Tell git to ignore it like the other include links.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13719

20 months agoLUA: Fix CVE-2014-5461
Richard Yao [Tue, 27 Sep 2022 23:44:13 +0000 (19:44 -0400)]
LUA: Fix CVE-2014-5461

Apply the fix from upstream.

http://www.lua.org/bugs.html#5.2.2-1
https://www.opencve.io/cve/CVE-2014-5461

It should be noted that exploiting this requires the `SYS_CONFIG`
privilege, and anyone with that privilege likely has other opportunities
to do exploits, so it is unlikely that bad actors could exploit this
unless system administrators are executing untrusted ZFS Channel
Programs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13949

20 months agoLinux: Fix uninitialized variable usage in zio_do_crypt_data()
Richard Yao [Mon, 26 Sep 2022 23:44:22 +0000 (19:44 -0400)]
Linux: Fix uninitialized variable usage in zio_do_crypt_data()

Coverity complained about this. An error from `hkdf_sha512()` before uio
initialization will cause pointers to uninitialized memory to be passed
to `zio_crypt_destroy_uio()`. This is a regression that was introduced
by cf63739191b6cac629d053930a4aea592bca3819. Interestingly, this never
affected FreeBSD, since the FreeBSD version never had that patch ported.
Since moving uio initialization to the top of this function would slow
down the qat_crypt() path, we only move the `memset()` calls to the top
of the function. This is sufficient to fix this problem.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13944

20 months agoRefactor Log Size Limit
Alexander Motin [Tue, 24 May 2022 16:46:35 +0000 (12:46 -0400)]
Refactor Log Size Limit

Original Log Size Limit implementation blocked all writes in case of
limit reached until the TXG is committed and the log is freed.  It
caused huge delays and following speed spikes in application writes.

This implementation instead smoothly throttles writes, using exactly
the same mechanism as used for dirty data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: jxdking <lostking2008@hotmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Issue #12284
Closes #13476

20 months agoRevert "Reduce dbuf_find() lock contention"
Brian Behlendorf [Mon, 19 Sep 2022 18:07:15 +0000 (11:07 -0700)]
Revert "Reduce dbuf_find() lock contention"

This reverts commit 34dbc618f50cfcd392f90af80c140398c38cbcd1.  While this
change resolved the lock contention observed for certain workloads, it
inadventantly reduced the maximum hash inserts/removes per second.  This
appears to be due to the slightly higher acquisition cost of a rwlock vs
a mutex.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
20 months agoAdd zfs_btree_verify_intensity kernel module parameter
Richard Yao [Thu, 15 Sep 2022 23:22:33 +0000 (19:22 -0400)]
Add zfs_btree_verify_intensity kernel module parameter

I see a few issues in the issue tracker that might be aided by being
able to turn this on. We have no module parameter for it, so I would
like to add one.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13874

20 months agoFix incorrect size given to bqueue_enqueue() call in dmu_redact.c
Richard Yao [Thu, 15 Sep 2022 23:21:21 +0000 (19:21 -0400)]
Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c

We pass sizeof (struct redact_record *) rather than sizeof (struct
redact_record). Passing the pointer size is wrong.

Coverity caught this in two places.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13885

20 months agoDelay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive
Ameer Hamza [Tue, 20 Sep 2022 22:19:05 +0000 (03:19 +0500)]
Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive

For encrypted raw receive, objset creation is delayed until a call to
dmu_recv_stream(). ZFS_PROP_SHARESMB property requires objset to be
populated when calling zpl_earlier_version(). To correctly handle the
ZFS_PROP_SHARESMB property for encrypted raw receive, this change
delays setting the property.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13878

20 months agozfs recv hangs if max recordsize is less than received recordsize
Ameer Hamza [Fri, 16 Sep 2022 20:52:25 +0000 (01:52 +0500)]
zfs recv hangs if max recordsize is less than received recordsize

- Some optimizations for bqueue enqueue/dequeue.
- Added a fix to prevent deadlock when both bqueue_enqueue_impl()
and bqueue_dequeue() waits for signal to be triggered.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13855