]> CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log
FreeBSD/FreeBSD.git
20 months agoFix arc_p aggressive increase
shodanshok [Fri, 11 Nov 2022 18:41:36 +0000 (19:41 +0100)]
Fix arc_p aggressive increase

The original ARC paper called for an initial 50/50 MRU/MFU split
and this is accounted in various places where arc_p = arc_c >> 1,
with further adjustment based on ghost lists size/hit. However, in
current code both arc_adapt() and arc_get_data_impl() aggressively
grow arc_p until arc_c is reached, causing unneeded pressure on
MFU and greatly reducing its scan-resistance until ghost list
adjustments kick in.

This patch restores the original behavior of initially having arc_p
as 1/2 of total ARC, without preventing MRU to use up to 100% total
ARC when MFU is empty.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14137
Closes #14120

20 months agoAdd ability to recompress send streams with new compression algorithm
Paul Dagnelie [Thu, 10 Nov 2022 23:23:46 +0000 (15:23 -0800)]
Add ability to recompress send streams with new compression algorithm

As new compression algorithms are added to ZFS, it could be useful for
people to recompress data with new algorithms. There is currently no
mechanism to do this aside from copying the data manually into a new
filesystem with the new algorithm enabled. This tool allows the
transformation to happen through zfs send, allowing it to be done
efficiently to remote systems and in an incremental fashion.

A new zstream command is added that decompresses WRITE records and
then recompresses them with a provided algorithm, and then re-emits
the modified send stream. It may also be possible to re-compress
embedded block pointers, but that was not attempted for the initial
version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14106

20 months agoZTS: random_readwrite test doesn't run correctly
John Wren Kennedy [Thu, 10 Nov 2022 22:00:04 +0000 (15:00 -0700)]
ZTS: random_readwrite test doesn't run correctly

This test uses fio's bssplit mechanism to choose io sizes for the test,
leaving the PERF_IOSIZES variable empty. Because that variable is
empty, the innermost loop in do_fio_run_impl is never executed, and as
a result, this test does the setup but collects no data. Setting the
variable to "bssplit" allows performance data to be gathered.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14163

20 months agoCleanup: Suppress Coverity dereference before/after NULL check reports
Richard Yao [Thu, 10 Nov 2022 14:09:35 +0000 (09:09 -0500)]
Cleanup: Suppress Coverity dereference before/after NULL check reports

f224eddf922a33ca4b86d83148e9e6fa155fc290 began dereferencing a NULL
checked pointer in zpl_vap_init(), which made Coverity complain because
either the dereference is unsafe or the NULL check is unnecessary. Upon
inspection, this pointer is guaranteed to never be NULL because it is
from the Linux kernel VFS. The calls into ZFS simply would not make
sense if this pointer were NULL, so the NULL check is unnecessary.

Reported-by: Coverity (CID 1527260)
Reported-by: Coverity (CID 1527262)
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14170

20 months agoFix potential NULL pointer dereference regression
Richard Yao [Thu, 10 Nov 2022 14:01:58 +0000 (09:01 -0500)]
Fix potential NULL pointer dereference regression

945b407486a0072ec7dd117a0bde2f72d52eb445 neglected to `NULL` check
`tx->tx_objset`, which is already done in the function. This upset
Coverity, which complained about a "dereference after null check".

Upon inspection, it was found that whenever `dmu_tx_create_dd()` is
called followed by `dmu_tx_assign()`, such as in
`dsl_sync_task_common()`, `tx->tx_objset` will be `NULL`.

Reported-by: Coverity (CID 1527261)
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14170

20 months agoAllow to control failfast
Mariusz Zaborski [Thu, 10 Nov 2022 21:37:12 +0000 (22:37 +0100)]
Allow to control failfast

Linux defaults to setting "failfast" on BIOs, so that the OS will not
retry IOs that fail, and instead report the error to ZFS.

In some cases, such as errors reported by the HBA driver, not
the device itself, we would wish to retry rather than generating
vdev errors in ZFS. This new property allows that.

This introduces a per vdev option to disable the failfast option.
This also introduces a global module parameter to define the failfast
mask value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Seagate Technology LLC
Submitted-by: Klara, Inc.
Closes #14056

20 months agoquota: disable quota check for ZVOL
Mariusz Zaborski [Tue, 8 Nov 2022 20:40:22 +0000 (21:40 +0100)]
quota: disable quota check for ZVOL

The quota for ZVOLs is set to the size of the volume. When the quota
reaches the maximum, there isn't an excellent way to check if the new
writers are overwriting the data or if they are inserting a new one.
Because of that, when we reach the maximum quota, we wait till txg is
flushed. This is causing a significant fluctuation in bandwidth.

In the case of ZVOL, the quota is enforced by the volsize, so we
can omit it.

This commit adds a sysctl thats allow to control if the quota mechanism
should be enforced or not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13838

20 months agoOptionally skip zil_close during zvol_create_minor_impl
Alan Somers [Tue, 8 Nov 2022 20:38:08 +0000 (13:38 -0700)]
Optionally skip zil_close during zvol_create_minor_impl

If there were no zil entries to replay, skip zil_close.  zil_close waits
for a transaction to sync.  That can take several seconds, for example
during pool import of a resilvering pool.  Skipping zil_close can cut
the time for "zpool import" from 2 hours to 45 seconds on a resilvering
pool with a thousand zvols.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Sponsored-by: Axcient
Closes #13999
Closes #14015

20 months agoSupport idmapped mount in user namespace
youzhongyang [Tue, 8 Nov 2022 18:28:56 +0000 (13:28 -0500)]
Support idmapped mount in user namespace

Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping
infrastructure to support idmapped mounts of filesystems mounted
with an idmapping". Update the OpenZFS accordingly to improve the
idmapped mount support.

This pull request contains the following changes:

- xattr setter functions are fixed to take mnt_ns argument. Without
  this, cp -p would fail for an idmapped mount in a user namespace.
- idmap_util is enhanced/fixed for its use in a user ns context.
- One test case added to test idmapped mount in a user ns.

Reviewed-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14097

20 months agodsl_prop_known_index(): check for invalid prop
Damian Szuberski [Tue, 8 Nov 2022 18:16:01 +0000 (04:16 +1000)]
dsl_prop_known_index(): check for invalid prop

Resolve UBSAN array-index-out-of-bounds error in zprop_desc_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14142
Closes #14147

20 months agoAdds the `-p` option to `zfs holds`
Mohamed Tawfik [Tue, 8 Nov 2022 18:08:21 +0000 (20:08 +0200)]
Adds the `-p` option to `zfs holds`

This allows for printing a machine-readable, accurate to the second,
hold creation time in the form of a unix epoch timestamp.

Additionally, updates relevant documentation and man pages accordingly.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mohamed Tawfik <m_tawfik@aucegypt.edu>
Closes #13690
Closes #14152

20 months agofreebsd: simplify MD isa_defs.h
Brooks Davis [Thu, 27 Oct 2022 23:55:45 +0000 (00:55 +0100)]
freebsd: simplify MD isa_defs.h

Most of this file was a pile of defines, apparently from Solaris that
controlled nothing in the source tree.  A few things controlled the
definition of unused types or macros which I have removed.

Considerable further cleanup is possible including removal of
architectures FreeBSD never supported.  This file should likely converge
with the Linux version to the extent possible.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127

20 months agofreebsd: trim dkio.h to the minimum
Brooks Davis [Thu, 27 Oct 2022 23:41:53 +0000 (00:41 +0100)]
freebsd: trim dkio.h to the minimum

Only DKIOCFLUSHWRITECACHE is required.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127

20 months agofreebsd: add ifdefs around legacy ioctl support
Brooks Davis [Thu, 27 Oct 2022 21:45:44 +0000 (22:45 +0100)]
freebsd: add ifdefs around legacy ioctl support

Require that ZFS_LEGACY_SUPPORT be defined for legacy ioctl support to
be built.  For now, define it in zfs_ioctl_compat.h so support is always
built.  This will allow systems that need never support pre-openzfs
tools a mechanism to remove support at build time.  This code should
be removed once the need for tool compatability is gone.

No functional change at this time.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127

20 months agofreebsd: remove no-op vn_renamepath()
Brooks Davis [Thu, 27 Oct 2022 21:28:55 +0000 (22:28 +0100)]
freebsd: remove no-op vn_renamepath()

vn_renamepath() is a Solaris-ism that was defined away in the FreeBSD
port.  Now that the only use is in the FreeBSD zfs_vnops_os.c, drop it
entierly.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127

20 months agofreebsd: remove unused vn_rename()
Brooks Davis [Thu, 27 Oct 2022 21:24:42 +0000 (22:24 +0100)]
freebsd: remove unused vn_rename()

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127

20 months agozed: Prevent special vdev to be replaced by hot spare
Ameer Hamza [Fri, 4 Nov 2022 18:33:47 +0000 (23:33 +0500)]
zed: Prevent special vdev to be replaced by hot spare

Special vdevs should not be replaced by a hot spare.
Log vdevs already support this, extending the
functionality for special vdevs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14129

20 months agoicp: fix all !ENDBR objtool warnings in x86 Asm code
Alexander Lobakin [Sun, 16 Oct 2022 21:41:39 +0000 (23:41 +0200)]
icp: fix all !ENDBR objtool warnings in x86 Asm code

Currently, only Blake3 x86 Asm code has signs of being ENDBR-aware.
At least, under certain conditions it includes some header file and
uses some custom macro from there.
Linux has its own NOENDBR since several releases ago. It's defined
in the same <asm/linkage.h>, so currently <sys/asm_linkage.h>
already is provided with it.

Let's unify those two into one %ENDBR macro. At first, check if it's
present already. If so -- use Linux kernel version. Otherwise, try
to go that second way and use %_CET_ENDBR from <cet.h> if available.
If no, fall back to just empty definition.
This fixes a couple more 'relocations to !ENDBR' across the module.
And now that we always have the latest/actual ENDBR definition, use
it at the entrance of the few corresponding functions that objtool
still complains about. This matches the way how it's used in the
upstream x86 core Asm code.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035

20 months agoicp: fix rodata being marked as text in x86 Asm code
Alexander Lobakin [Sun, 16 Oct 2022 21:23:44 +0000 (23:23 +0200)]
icp: fix rodata being marked as text in x86 Asm code

objtool properly complains that it can't decode some of the
instructions from ICP x86 Asm code. As mentioned in the Makefile,
where those object files were excluded from objtool check (but they
can still be visible under IBT and LTO), those are just constants,
not code.
In that case, they must be placed in .rodata, so they won't be
marked as "allocatable, executable" (ax) in EFL headers and this
effectively prevents objtool from trying to decode this data. That
reveals a whole bunch of other issues in ICP Asm code, as previously
objtool was bailing out after that warning message.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035

20 months agoicp: properly fix all RETs in x86_64 Asm code
Alexander Lobakin [Sun, 16 Oct 2022 14:53:22 +0000 (16:53 +0200)]
icp: properly fix all RETs in x86_64 Asm code

Commit 43569ee37420 ("Fix objtool: missing int3 after ret warning")
addressed replacing all `ret`s in x86 asm code to a macro in the
Linux kernel in order to enable SLS. That was done by copying the
upstream macro definitions and fixed objtool complaints.
Since then, several more mitigations were introduced, including
Rethunk. It requires to have a jump to one of the thunks in order
to work, so the RET macro was changed again. And, as ZFS code
didn't use the mainline defition, but copied it, this is currently
missing.

Objtool reminds about it time to time (Clang 16, CONFIG_RETHUNK=y):

fs/zfs/lua/zlua.o: warning: objtool: setjmp+0x25: 'naked' return
 found in RETHUNK build
fs/zfs/lua/zlua.o: warning: objtool: longjmp+0x27: 'naked' return
 found in RETHUNK build

Do it the following way:
* if we're building under Linux, unconditionally include
  <linux/linkage.h> in the related files. It is available in x86
  sources since even pre-2.6 times, so doesn't need any conftests;
* then, if RET macro is available, it will be used directly, so that
  we will always have the version actual to the kernel we build;
* if there's no such macro, we define it as a simple `ret`, as it
  was on pre-SLS times.

This ensures we always have the up-to-date definition with no need
to update it manually, and at the same time is safe for the whole
variety of kernels ZFS module supports.
Then, there's a couple more "naked" rets left in the code, they're
just defined as:

.byte 0xf3,0xc3

In fact, this is just:

rep ret

`rep ret` instead of just `ret` seems to mitigate performance issues
on some old AMD processors and most likely makes no sense as of
today.
Anyways, address those rets, so that they will be protected with
Rethunk and SLS. Include <sys/asm_linkage.h> here which now always
has RET definition and replace those constructs with just RET.
This wipes the last couple of places with unpatched rets objtool's
been complaining about.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035

20 months agoFreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()
Richard Yao [Fri, 4 Nov 2022 18:06:14 +0000 (14:06 -0400)]
FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()

There is an off by 1 error in the check. Fortunately, this function does
not appear to be used in kernel space, despite being compiled as part of
the kernel module. However, it is used in userspace. Callers of
lzc_ioctl_fd() likely will crash if they attempt to use the
unimplemented request number.

This was reported by FreeBSD's coverity scan.

Reported-by: Coverity (CID 1432059)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14135

20 months agoExpose zfs_vdev_open_timeout_ms as a tunable
Serapheim Dimitropoulos [Thu, 3 Nov 2022 22:02:46 +0000 (15:02 -0700)]
Expose zfs_vdev_open_timeout_ms as a tunable

Some of our customers have been occasionally hitting zfs import failures
in Linux because udevd doesn't create the by-id symbolic links in time
for zpool import to use them. The main issue is that the
systemd-udev-settle.service that zfs-import-cache.service and other
services depend on is racy. There is also an openzfs issue filed (see
https://github.com/openzfs/zfs/issues/10891) outlining the problem and
potential solutions.

With the proper solutions being significant in terms of complexity and
the priority of the issue being low for the time being, this patch
exposes `zfs_vdev_open_timeout_ms` as a tunable so people that are
experiencing this issue often can increase it as a workaround.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14133

20 months agoAllow mounting snapshots in .zfs/snapshot as a regular user
Allan Jude [Thu, 3 Nov 2022 18:53:24 +0000 (14:53 -0400)]
Allow mounting snapshots in .zfs/snapshot as a regular user

Rather than doing a terrible credential swapping hack, we just
check that the thing being mounted is a snapshot, and the mountpoint
is the zfsctl directory, then we allow it.

If the mount attempt is from inside a jail, on an unjailed dataset
(mounted from the host, not by the jail), the ability to mount the
snapshot is controlled by a new per-jail parameter: zfs.mount_snapshot

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Modirum MDPay
Sponsored-by: Klara Inc.
Closes #13758

20 months agoCleanup: Remove branches that always evaluate the same way
Richard Yao [Thu, 3 Nov 2022 17:47:48 +0000 (13:47 -0400)]
Cleanup: Remove branches that always evaluate the same way

Coverity reported that the ASSERT in taskq_create() is always true and
the `*offp > MAXOFFSET_T` check in zfs_file_seek() is always false.

We delete them as cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14130

20 months agoRemove an unused variable
Brooks Davis [Tue, 1 Nov 2022 20:45:36 +0000 (20:45 +0000)]
Remove an unused variable

Clang-16 detects this set-but-unused variable which is assigned and
incremented, but never referenced otherwise.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125

20 months agoMake 1-bit bitfields unsigned
Brooks Davis [Tue, 1 Nov 2022 20:43:32 +0000 (20:43 +0000)]
Make 1-bit bitfields unsigned

This fixes -Wsingle-bit-bitfield-constant-conversion warning from
clang-16 like:

lib/libzfs/libzfs_dataset.c:4529:19: error: implicit truncation
  from 'int' to a one-bit wide bit-field changes value from
  1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
                flags.nounmount = B_TRUE;
^ ~~~~~~

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125

20 months agoAddress warnings about possible division by zero from clangsa
Richard Yao [Thu, 3 Nov 2022 16:58:14 +0000 (12:58 -0400)]
Address warnings about possible division by zero from clangsa

* The complaint in ztest_replay_write() is only possible if something
   went horribly wrong. An assertion will silence this and if it goes
   off, we will know that something is wrong.
 * The complaint in spa_estimate_metaslabs_to_flush() is not impossible,
   but seems very unlikely. We resolve this by passing the value from
   the `MIN()` that does not go to infinity when the variable is zero.

There was a third report from Clang's scan-build, but that was a
definite false positive and disappeared when checked again through
Clang's static analyzer with Z3 refution via CodeChecker.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14124

20 months agolibuutil: deobfuscate internal pointers
Brooks Davis [Thu, 3 Nov 2022 16:57:05 +0000 (09:57 -0700)]
libuutil: deobfuscate internal pointers

uu_avl and uu_list stored internal next/prev pointers and parent
pointers (unused) obfuscated (byte swapped) to hide them from a long
forgotten leak checker (No one at the 2022 OpenZFS developers meeting
could recall the history.)  This would break on CHERI systems and adds
no obvious value.  Rename the members, use proper types rather than
uintptr_t, and eliminate the related macros.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14126

20 months agoDeny receiving into encrypted datasets if the keys are not loaded
Attila Fülöp [Thu, 3 Nov 2022 16:55:13 +0000 (17:55 +0100)]
Deny receiving into encrypted datasets if the keys are not loaded

Commit 68ddc06b611854560fefa377437eb3c9480e084b introduced support
for receiving unencrypted datasets as children of encrypted ones but
unfortunately got the logic upside down. This resulted in failing to
deny receives of incremental sends into encrypted datasets without
their keys loaded. If receiving a filesystem, the receive was done
into a newly created unencrypted child dataset of the target. In
case of volumes the receive made the target volume undeletable since
a dataset was created below it, which we obviously can't handle.
Incremental streams with embedded blocks are affected as well.

We fix the broken logic to properly deny receives in such cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13598
Closes #14055
Closes #14119

20 months agolua: cast through uintptr_t when return a pointer
Brooks Davis [Thu, 27 Oct 2022 22:39:06 +0000 (23:39 +0100)]
lua: cast through uintptr_t when return a pointer

Don't assume size_t can carry pointer provenance and use uintptr_t
(identialy on all current platforms) instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131

20 months agoUse intptr_t when storing an integer in a pointer
Brooks Davis [Thu, 27 Oct 2022 22:28:03 +0000 (23:28 +0100)]
Use intptr_t when storing an integer in a pointer

Cast the integer type to (u)intptr_t before casting to "void *".  In
CHERI C/C++ we warn on bare casts from integers to pointers to catch
attempts to create pointers our of thin air.  We allow the warning to be
supressed with a suitable cast through (u)intptr_t.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131

20 months agorecvd_props_mode: use a uintptr_t to stash nvlists
Brooks Davis [Thu, 27 Oct 2022 22:25:42 +0000 (23:25 +0100)]
recvd_props_mode: use a uintptr_t to stash nvlists

Avoid assuming than a uint64_t can hold a pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131

20 months agozfs_onexit_add_cb: make action_handle point to a uintptr_t
Brooks Davis [Thu, 27 Oct 2022 22:20:05 +0000 (23:20 +0100)]
zfs_onexit_add_cb: make action_handle point to a uintptr_t

Avoid assuming than a uint64_t can hold a pointer and reduce the
number of casts in the process.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131

20 months agoacl: use uintptr_t for ace walker cookies
Brooks Davis [Thu, 27 Oct 2022 22:04:17 +0000 (23:04 +0100)]
acl: use uintptr_t for ace walker cookies

Avoid assuming that a pointer can fit in a uint64_t and use uintptr_t
instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131

20 months agolinux isa_defs.h: Don't define _ALIGNMENT_REQUIRED
Brooks Davis [Fri, 28 Oct 2022 16:36:43 +0000 (17:36 +0100)]
linux isa_defs.h: Don't define _ALIGNMENT_REQUIRED

Nothing consumes this definition so stop defining it.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14128

20 months agoImprove RISC-V support
Brooks Davis [Thu, 27 Oct 2022 23:58:41 +0000 (00:58 +0100)]
Improve RISC-V support

Check __riscv_xlen == 64 rather than _LP64 and define _LP64 if missing.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14128

20 months agoFreeBSD: Fix regression from kmem_scnprintf() in libzfs
Richard Yao [Tue, 1 Nov 2022 20:58:17 +0000 (16:58 -0400)]
FreeBSD: Fix regression from kmem_scnprintf() in libzfs

kmem_scnprintf() is only available in libzpool. Recent buildbot issues
with showing FreeBSD results kept us from seeing this before
97143b9d314d54409244f3995576d8cc8c1ebf0a was merged.

The code has been changed to sanitize the output from `kmem_scnprintf()`.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14111

20 months agoinclude overrides for zfs snapshot/rollback bootfs.service
Vince van Oosten [Sun, 23 Oct 2022 09:11:58 +0000 (11:11 +0200)]
include overrides for zfs snapshot/rollback bootfs.service

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076

20 months agoinclude overrides for zfs-import.target
Vince van Oosten [Sun, 23 Oct 2022 09:11:18 +0000 (11:11 +0200)]
include overrides for zfs-import.target

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076

20 months agoinclude systemd overrides to zfs-dracut module
Vince van Oosten [Sun, 23 Oct 2022 08:55:46 +0000 (10:55 +0200)]
include systemd overrides to zfs-dracut module

If a user that uses systemd and dracut wants to overide certain
settings, they typically use `systemctl edit [unit]` or place a file in
`/etc/systemd/system/[unit].d/override.conf` directly.

The zfs-dracut module did not include those overrides however, so this
did not have any effect at boot time.

For zfs-import-scan.service and zfs-import-cache.service, overrides are
now included in the dracut initramfs image.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076

20 months agozil: Relax assertion in zil_parse
Ryan Moeller [Tue, 1 Nov 2022 19:19:32 +0000 (15:19 -0400)]
zil: Relax assertion in zil_parse

Rather than panic debug builds when we fail to parse a whole ZIL, let's
instead improve the logging of errors and continue like in a release
build.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14116

20 months agoZTS: rsend_009_pos.ksh is destructive on zfs-on-root system
youzhongyang [Tue, 1 Nov 2022 19:08:37 +0000 (15:08 -0400)]
ZTS: rsend_009_pos.ksh is destructive on zfs-on-root system

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14113

20 months agoFix oversights from 4170ae4e
Richard Yao [Mon, 31 Oct 2022 17:01:04 +0000 (13:01 -0400)]
Fix oversights from 4170ae4e

4170ae4ea600fea6ac9daa8b145960c9de3915fc was intended to tackle TOCTOU
race conditions reported by CodeQL, but as an oversight, a file
descriptor was not closed and some comments were not updated.
Interestingly, CodeQL did not complain about the file descriptor leak,
so there is room for improvement in how we configure it to try to detect
this issue so that we get early warning about this.

In addition, an optimization opportunity was missed by mistake in
lib/libshare/os/linux/smb.c, which prevented us from truly closing the
TOCTOU race. This was also caught by Coverity.

Reported-by: Coverity (CID 1524424)
Reported-by: Coverity (CID 1526804)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14109

20 months agoAvoid null pointer dereference in dsl_fs_ss_limit_check()
Allan Jude [Sat, 29 Oct 2022 20:08:54 +0000 (16:08 -0400)]
Avoid null pointer dereference in dsl_fs_ss_limit_check()

Check for cr == NULL before dereferencing it in
dsl_enforce_ds_ss_limits() to lookup the zone/jail ID.

Reported-by: Coverity (CID 1210459)
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14103

20 months agoIntroduce kmem_scnprintf()
Richard Yao [Thu, 27 Oct 2022 18:16:04 +0000 (14:16 -0400)]
Introduce kmem_scnprintf()

`snprintf()` is meant to protect against buffer overflows, but operating
on the buffer using its return value, possibly by calling it again, can
cause a buffer overflow, because it will return how many characters it
would have written if it had enough space even when it did not. In a
number of places, we repeatedly call snprintf() by successively
incrementing a buffer offset and decrementing a buffer length, by its
return value. This is a potentially unsafe usage of `snprintf()`
whenever the buffer length is reached. CodeQL complained about this.

To fix this, we introduce `kmem_scnprintf()`, which will return 0 when
the buffer is zero or the number of written characters, minus 1 to
exclude the NULL character, when the buffer was too small. In all other
cases, it behaves like snprintf(). The name is inspired by the Linux and
XNU kernels' `scnprintf()`. The implementation was written before I
thought to look at `scnprintf()` and had a good name for it, but it
turned out to have identical semantics to the Linux kernel version.
That lead to the name, `kmem_scnprintf()`.

CodeQL only catches this issue in loops, so repeated use of snprintf()
outside of a loop was not caught. As a result, a thorough audit of the
codebase was done to examine all instances of `snprintf()` usage for
potential problems and a few were caught. Fixes for them are included in
this patch.

Unfortunately, ZED is one of the places where `snprintf()` is
potentially used incorrectly. Since using `kmem_scnprintf()` in it would
require changing how it is linked, we modify its usage to make it safe,
no matter what buffer length is used. In addition, there was a bug in
the use of the return value where the NULL format character was not
being written by pwrite(). That has been fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098

20 months agoCleanup dump_bookmarks()
Richard Yao [Thu, 27 Oct 2022 19:41:39 +0000 (15:41 -0400)]
Cleanup dump_bookmarks()

Assertions are meant to check assumptions, but the way that this
assertion is written does not check an assumption, since it is provably
always true. Removing the assertion will cause a compiler warning (made
into an error by -Werror) about printing up to 512 bytes to a 256-byte
buffer, so instead, we change the assertion to verify the assumption
that we never do a snprintf() that is truncated to avoid overrunning the
256-byte buffer.

This was caught by an audit of the codebase to look for misuse of
`snprintf()` after CodeQL reported that we had misused `snprintf()`. An
explanation of how snprintf() can be misused is here:

https://www.redhat.com/en/blog/trouble-snprintf

This particular instance did not misuse `snprintf()`, but it was caught
by the audit anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098

20 months agoFix too few arguments to formatting function
Richard Yao [Thu, 27 Oct 2022 16:45:26 +0000 (12:45 -0400)]
Fix too few arguments to formatting function

CodeQL reported that when the VERIFY3U condition is false, we do not
pass enough arguments to `spl_panic()`. This is because the format
string from `snprintf()` was concatenated into the format string for
`spl_panic()`, which causes us to have an unexpected format specifier.

A CodeQL developer suggested fixing the macro to have a `%s` format
string that takes a stringified RIGHT argument, which would fix this.
However, upon inspection, the VERIFY3U check was never necessary in the
first place, so we remove it in favor of just calling `snprintf()`.

Lastly, it is interesting that every other static analyzer run on the
codebase did not catch this, including some that made an effort to catch
such things. Presumably, all of them relied on header annotations, which
we have not yet done on `spl_panic()`. CodeQL apparently is able to
track the flow of arguments on their way to annotated functions, which
llowed it to catch this when others did not. A future patch that I have
in development should annotate `spl_panic()`, so the others will catch
this too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098

20 months agoFix TOCTOU race conditions reported by CodeQL and Coverity
Richard Yao [Thu, 27 Oct 2022 15:03:48 +0000 (11:03 -0400)]
Fix TOCTOU race conditions reported by CodeQL and Coverity

CodeQL and Coverity both complained about:

 * lib/libshare/os/linux/smb.c
 * tests/zfs-tests/cmd/mmapwrite.c
  * twice
 * tests/zfs-tests/tests/functional/tmpfile/tmpfile_002_pos.c
 * tests/zfs-tests/tests/functional/tmpfile/tmpfile_stat_mode.c
* coverity had a second complaint that CodeQL did not have
 * tests/zfs-tests/cmd/suid_write_to_file.c
* Coverity had two complaints and CodeQL had one complaint, both
  differed. The CodeQL complaint is about the main point of the
  test, so it is not fixable without a hack involving `fork()`.

The issues reported by CodeQL are fixed, with the exception of the last
one, which is deemed to be a false positive that is too much trouble to
wrokaround. The issues reported by Coverity were only fixed if CodeQL
complained about them.

There were issues reported by Coverity in a number of other files that
were not reported by CodeQL, but fixing the CodeQL complaints is
considered a priority since we want to integrate it into a github
workflow, so the remaining Coverity complaints are left for future work.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098

20 months agoRevert "Cleanup: Delete dead code from send_merge_thread()"
Brian Behlendorf [Fri, 28 Oct 2022 20:25:37 +0000 (13:25 -0700)]
Revert "Cleanup: Delete dead code from send_merge_thread()"

This reverts commit fb823de9f due to a regression.  It is in fact possible
for the range->eos_marker to be false on error.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14042
Closes #14104

20 months agodebug: fix output from VERIFY0 assertion
Rob N ★ [Fri, 28 Oct 2022 18:46:44 +0000 (05:46 +1100)]
debug: fix output from VERIFY0 assertion

The previous version reported all the right info, but the VERIFY3 name
made a little more confusing when looking for the matching location in
the source code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob N ★ <robn@despairlabs.com>
Closes #14099

20 months agoquota: extend quota for dataset
Mariusz Zaborski [Fri, 28 Oct 2022 18:44:18 +0000 (20:44 +0200)]
quota: extend quota for dataset

This patch relax the quota limitation for dataset by around 3%.
What this means is that user can write more data then the quota is
set to. However thanks to that we can get more stable bandwidth, in
case when we are overwriting data in-place, and not consuming any
additional space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13839

20 months agoFix ARC target collapse when zfs_arc_meta_limit_percent=100
shodanshok [Fri, 28 Oct 2022 17:21:54 +0000 (19:21 +0200)]
Fix ARC target collapse when zfs_arc_meta_limit_percent=100

Reclaim metadata when arc_available_memory < 0 even if
meta_used is not bigger than arc_meta_limit.

As described in https://github.com/openzfs/zfs/issues/14054 if
zfs_arc_meta_limit_percent=100 then ARC target can collapse to
arc_min due to arc_purge not freeing any metadata.

This patch lets arc_prune to do its work when arc_available_memory
is negative even if meta_used is not bigger than arc_meta_limit,
avoiding ARC target collapse.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14054
Closes #14093

20 months agoPropagate extent_bytes change to autotrim thread
vaclavskala [Fri, 28 Oct 2022 17:16:31 +0000 (19:16 +0200)]
Propagate extent_bytes change to autotrim thread

The autotrim thread only reads zfs_trim_extent_bytes_min and
zfs_trim_extent_bytes_max variable only on thread start.  We
should check for parameter changes during thread execution to
allow parameter changes take effect without needing to disable
then restart the autotrim.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #14077

20 months agozfs_rename: support RENAME_* flags
Aleksa Sarai [Sat, 22 Jun 2019 00:35:11 +0000 (10:35 +1000)]
zfs_rename: support RENAME_* flags

Implement support for Linux's RENAME_* flags (for renameat2). Aside from
being quite useful for userspace (providing race-free ways to exchange
paths and implement mv --no-clobber), they are used by overlayfs and are
thus required in order to use overlayfs-on-ZFS.

In order for us to represent the new renameat2(2) flags in the ZIL, we
create two new transaction types for the two flags which need
transactional-level support (RENAME_EXCHANGE and RENAME_WHITEOUT).
RENAME_NOREPLACE does not need any ZIL support because we know that if
the operation succeeded before creating the ZIL entry, there was no file
to be clobbered and thus it can be treated as a regular TX_RENAME.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12209
Closes #14070

20 months agozfs_rename: restructure to have cleaner fallbacks
Aleksa Sarai [Fri, 26 Apr 2019 13:23:07 +0000 (23:23 +1000)]
zfs_rename: restructure to have cleaner fallbacks

This is in preparation for RENAME_EXCHANGE and RENAME_WHITEOUT support
for ZoL, but the changes here allow for far nicer fallbacks than the
previous implementation (the source and target are re-linked in case of
the final link failing).

In addition, a small cleanup was done for the "target exists but is a
different type" codepath so that it's more understandable.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12209
Closes #14070

20 months agodebug: add VERIFY_{IMPLY,EQUIV} variants
Aleksa Sarai [Wed, 18 May 2022 10:29:33 +0000 (20:29 +1000)]
debug: add VERIFY_{IMPLY,EQUIV} variants

This allows for much cleaner VERIFY-level assertions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #14070

20 months agoRemove zpl_revalidate: fix snapshot rollback
Pavel Snajdr [Thu, 5 Dec 2019 00:52:27 +0000 (01:52 +0100)]
Remove zpl_revalidate: fix snapshot rollback

Open files, which aren't present in the snapshot, which is being
roll-backed to, need to disappear from the visible VFS image of
the dataset.

Kernel provides d_drop function to drop invalid entry from
the dcache, but inode can be referenced by dentry multiple dentries.

The introduced zpl_d_drop_aliases function walks and invalidates
all aliases of an inode.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #9600
Closes #14070

20 months agoFix multiplication converted to larger type
Andrew Innes [Fri, 28 Oct 2022 16:30:37 +0000 (00:30 +0800)]
Fix multiplication converted to larger type

This fixes the instances of the "Multiplication result converted to
larger type" alert that codeQL scanning found.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #14094

20 months agoFix zio_flag_t print format
youzhongyang [Fri, 28 Oct 2022 16:08:12 +0000 (12:08 -0400)]
Fix zio_flag_t print format

Follow up for 4938d01d which changed zio_flag from enum to uint64_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14100

20 months agoProcess `script` directory for all configs
Damian Szuberski [Thu, 27 Oct 2022 23:45:14 +0000 (09:45 +1000)]
Process `script` directory for all configs

Even when only building kmods process the scripts directory.  This
way the common.sh script will be generated and the zfs.sh script
can be used to load/unload the in-tree kernel modules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14027
Closes #14051

20 months agoAdd native Debian Packaging for Linux
Umer Saleem [Thu, 27 Oct 2022 22:38:45 +0000 (03:38 +0500)]
Add native Debian Packaging for Linux

Currently, the Debian packages are generated from ALIEN that converts
RPMs to Debian packages. This commit adds native Debian packaging for
Debian based systems.

This packaging is a fork of Debian zfs-linux 2.1.6-2 release.
(source: https://salsa.debian.org/zfsonlinux-team/zfs)

Some updates have been made to keep the footprint minimal that
include removing the tests, translation files, patches directory etc.
All credits go to Debian ZFS on Linux Packaging Team.

For copyright information, please refer to contrib/debian/copyright.

scripts/debian-packaging.sh can be used to invoke the build.

Reviewed-by: Mo Zhou <cdluminate@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13451

20 months agoConvert enum zio_flag to uint64_t
Richard Yao [Thu, 27 Oct 2022 16:54:54 +0000 (09:54 -0700)]
Convert enum zio_flag to uint64_t

We ran out of space in enum zio_flag for additional flags. Rather than
introduce enum zio_flag2 and then modify a bunch of functions to take a
second flags variable, we expand the type to 64 bits via `typedef
uint64_t zio_flag_t`.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14086

20 months agoAdd CodeQL workflow
Richard Yao [Thu, 27 Oct 2022 16:36:17 +0000 (09:36 -0700)]
Add CodeQL workflow

CodeQL is a static analyzer from github with a very low false positive
rate. We have long wanted to have static analysis runs done on every
pull request and using CodeQL, we can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14087

20 months agoAligned free for aligned alloc
Andrew Innes [Wed, 26 Oct 2022 22:08:31 +0000 (06:08 +0800)]
Aligned free for aligned alloc

Windows port frees memory that was alloc'd aligned in a different way
then alloc'd memory.  So changing frees to be specific.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Co-Authored-By: Jorgen Lundman <lundman@lundman.net>
Closes #14059

20 months agoFreeBSD: vn_flush_cached_data: observe vnode locking contract
Andriy Gapon [Wed, 26 Oct 2022 22:00:58 +0000 (01:00 +0300)]
FreeBSD: vn_flush_cached_data: observe vnode locking contract

vm_object_page_clean() expects that the associated vnode is locked
as VOP_PUTPAGES() may get called on the vnode.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #14079

20 months agoSilence objtool warnings from 55d7afa4
Richard Yao [Wed, 26 Oct 2022 21:57:37 +0000 (14:57 -0700)]
Silence objtool warnings from 55d7afa4

The use of __noreturn__ in 55d7afa4adbb4ca569e9c4477a7d121f4dc0bfbd on
spl_panic() caused objtool warnings on Linux when the kernel is built
with CONFIG_STACK_VALIDATION=y. This patch works around that by
restricting the application of __noreturn__ to builds for static
analyzers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14068

20 months agoLinux 6.0 compat: META
Brian Behlendorf [Wed, 26 Oct 2022 21:55:12 +0000 (14:55 -0700)]
Linux 6.0 compat: META

Update the META file to reflect compatibility with the 6.0 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14091

20 months agozed: Avoid core dump if wholedisk property does not exist
Ameer Hamza [Fri, 21 Oct 2022 17:46:38 +0000 (22:46 +0500)]
zed: Avoid core dump if wholedisk property does not exist

zed aborts and dumps core in vdev_whole_disk_from_config() if
wholedisk property does not exist. make_leaf_vdev() adds the
property but there may be already pools that don't have the
wholedisk in the label.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14062

20 months agoAdd delay between zpool add zvol and zpool destroy
youzhongyang [Fri, 21 Oct 2022 17:05:13 +0000 (13:05 -0400)]
Add delay between zpool add zvol and zpool destroy

As investigated by #14026, the zpool_add_004_pos can reliably hang if
the timing is not right. This is caused by a race condition between
zed doing zpool reopen (due to the zvol being added to the zpool),
and the command zpool destroy.

This change adds a delay between zpool add zvol and zpool destroy to
avoid these issue, but does not address the underlying problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Issue #14026
Closes #14052

20 months agoLinux: Fix big endian and partial read bugs in get_system_hostid()
Richard Yao [Thu, 20 Oct 2022 21:52:35 +0000 (17:52 -0400)]
Linux: Fix big endian and partial read bugs in get_system_hostid()

Coverity made two complaints about this function. The first is that we
ignore the number of bytes read. The second is that we have a sizeof
mismatch.

On 64-bit systems, long is a 64-bit type. Paradoxically, the standard
says that hostid is 32-bit, yet is also a long type. On 64-bit big
endian systems, reading into the long would cause us to return 0 as our
hostid after the mask. This is wrong.

Also, if a partial read were to happen (it should not), we would return
a partial hostid, which is also wrong.

We introduce a uint32_t system_hostid stack variable and ensure that the
read is done into it and check the read's return value. Then we set the
value based on whether the read was successful. This should fix both of
coverity's complaints.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13968

20 months agoSilence new static analyzer defect reports from idmap_util.c
Richard Yao [Thu, 20 Oct 2022 21:46:12 +0000 (17:46 -0400)]
Silence new static analyzer defect reports from idmap_util.c

2a068a1394d179dda4becf350e3afb4e8819675e introduced 2 new defect
reports from Coverity and 1 from Clang's static analyzer.

Coverity complained about a potential resource leak from only calling
`close(fd)` when `fd > 0` because `fd` might be `0`. This is a false
positive, but rather than dismiss it as such, we can change the
comparison to ensure that this never appears again from any static
analyzer. Upon inspection, 6 more instances of this were found in the
file, so those were changed too. Unfortunately, since the file
descriptor has been put into an unsigned variable in `attr.userns_fd`,
we cannot do a non-negative check on it to see if it has not been
allocated, so we instead restructure the error handling to avoid the
need for a check. This also means that errors had not been handled
correctly here, so the static analyzer found a bug (although practically
by accident).

Coverity also complained about a dereference before a NULL check in
`do_idmap_mount()` on `source`. Upon inspection, it appears that the
pointer is never NULL, so we delete the NULL check as cleanup.

Clang's static analyzer complained that the return value of
`write_pid_idmaps()` can be uninitialized if we have no idmaps to write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14061

20 months agoLinux: Upgrade random_get_pseudo_bytes() to xoshiro256++ 1.0
Richard Yao [Thu, 20 Oct 2022 21:14:42 +0000 (17:14 -0400)]
Linux: Upgrade random_get_pseudo_bytes() to xoshiro256++ 1.0

The motivation for upgrading our PRNG is the recent buildbot failures in
the ZTS' tests/functional/fault/decompress_fault test. The probability
of a failure in that test is 0.8^256, which is ~1.6e-25 out of 1, yet we
have observed multiple test failures in it. This suggests a problem with
our random number generation.

The xorshift128+ generator that we were using has been replaced by newer
generators that have "better statistical properties". After doing some
reading, it turns out that these generators have "low linear complexity
of the lowest bits", which could explain the ZTS test failures.

We do two things to try to fix this:

1. We upgrade from xorshift128+ to xoshiro256++ 1.0.

2. We tweak random_get_pseudo_bytes() to copy the higher order
   bytes first.

It is hoped that this will fix the test failures in
tests/functional/fault/decompress_fault, although I have not done
simulations. I am skeptical that any simulations I do on a PRNG with a
period of 2^256 - 1 would be meaningful.

Since we have raised the minimum kernel version to 3.10 since this was
first implemented, we have the option of using the Linux kernel's
get_random_int(). However, I am not currently prepared to do performance
tests to ensure that this would not be a regression (for the time
being), so we opt for upgrading our PRNG to a newer one from Sebastiano
Vigna.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13983

20 months agoOptimize microzaps
Alexander Motin [Thu, 20 Oct 2022 18:57:15 +0000 (14:57 -0400)]
Optimize microzaps

Microzap on-disk format does not include a hash tree, expecting one to
be built in RAM during mzap_open().  The built tree is linked to DMU
user buffer, freed when original DMU buffer is dropped from cache. I've
found that workloads accessing many large directories and having active
eviction from DMU cache spend significant amount of time building and
then destroying the trees.  I've also found that for each 64 byte mzap
element additional 64 byte tree element is allocated, that is a waste
of memory and CPU caches.

Improve memory efficiency of the hash tree by switching from AVL-tree
to B-tree.  It allows to save 24 bytes per element just on pointers.
Save 32 bits on mze_hash by storing only upper 32 bits since lower 32
bits are always zero for microzaps.  Save 16 bits on mze_chunkid, since
microzap can never have so many elements.  Respectively with the 16 bits
there can be no more than 16 bits of collision differentiators.  As
result, struct mzap_ent now drops from 48 (rounded to 64) to 8 bytes.

Tune B-trees for small data.  Reduce BTREE_CORE_ELEMS from 128 to 126
to allow struct zfs_btree_core in case of 8 byte elements to pack into
2KB instead of 4KB.  Aside of the microzaps it should also help 32bit
range trees.  Allow custom B-tree leaf size to reduce memmove() time.

Split zap_name_alloc() into zap_name_alloc() and zap_name_init_str().
It allows to not waste time allocating/freeing memory when processing
multiple names in a loop during mzap_open().

Together on a pool with 10K directories of 1800 files each and DMU
cache limited to 128MB this reduces time of `find . -name zzz` by 41%
from 7.63s to 4.47s, and saves additional ~30% of CPU time on the DMU
cache reclamation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14039

20 months agoFix multiple definitions of struct mount_attr on recent glibc versions
Richard Yao [Thu, 20 Oct 2022 16:12:21 +0000 (12:12 -0400)]
Fix multiple definitions of struct mount_attr on recent glibc versions

The ifdef used would never work because the CPP is not aware of C
structure definitions. Rather than use an autotools check, we can just
use a nameless structure that we typedef to mount_attr_t. This is a
Linux kernel interface, which means that it is stable and this is fine
to do.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14057
Closes #14058

20 months agoAdd defensive assertion to vdev_queue_aggregate()
Richard Yao [Mon, 17 Oct 2022 16:37:31 +0000 (12:37 -0400)]
Add defensive assertion to vdev_queue_aggregate()

a6ccb36b948efed0eeee4fcf99fe4b5fb81ae1d5 had been intended to include
this to silence Coverity reports, but this one was missed by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043

20 months agoabd_return_buf() should call zfs_refcount_remove_many() early
Richard Yao [Sun, 16 Oct 2022 04:02:47 +0000 (00:02 -0400)]
abd_return_buf() should call zfs_refcount_remove_many() early

Calling zfs_refcount_remove_many() after freeing memory means we pass a
reference to freed memory as the holder. This is not believed to be able
to cause a problem, but there is a bit of a tradition of fixing these
issues when they appear so that they do not obscure more serious issues
in static analyzer output, so we fix this one too.

Clang's static analyzer found this with the help of CodeChecker's CTU
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043

20 months agocrypto_get_ptrs() should always write to *out_data_2
Richard Yao [Sun, 16 Oct 2022 03:35:56 +0000 (23:35 -0400)]
crypto_get_ptrs() should always write to *out_data_2

Callers will check if it has been set to NULL before trying to access
it, but never initialize it themselves. Whenever "one block spans two
iovecs", `crypto_get_ptrs()` will return, without ever setting
`*out_data_2 = NULL`. The caller will then do a NULL check against the
uninitailized pointer and if it is not zero, pass it to `memcpy()`.

The only reason this has not caused horrible runtime issues is because
`memcpy()` should be told to copy zero bytes when this happens. That
said, this is technically undefined behavior, so we should correct it so
that future changes to the code cannot trigger it.

Clang's static analyzer found this with the help of CodeChecker's CTU
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043

20 months agoSilence static analyzer warnings about spa_sync_props()
Richard Yao [Sat, 15 Oct 2022 22:34:31 +0000 (18:34 -0400)]
Silence static analyzer warnings about spa_sync_props()

Both Coverity and Clang's static analyzer complain about reading an
uninitialized intval if the property is not passed as DATA_TYPE_UINT64
in the nvlist. This is impossible becuase spa_prop_validate() already
checked this, but they are unlikely to be the last static analyzers to
complain about this, so lets just refactor the code to suppress the
warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043

20 months agoFix theoretical use of uninitialized values
Richard Yao [Sat, 15 Oct 2022 22:27:03 +0000 (18:27 -0400)]
Fix theoretical use of uninitialized values

Clang's static analyzer complains about this.

In get_configs(), if we have an invalid configuration that has no top
level vdevs, we can read a couple of uninitialized variables. Aborting
upon seeing this would break the userland tools for healthy pools, so we
instead initialize the two variables to 0 to allow the userland tools to
continue functioning for the pools with valid configurations.

In zfs_do_wait(), if no wait activities are enabled, we read an
uninitialized error variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043

20 months agoFix userland memory leak in zfs_do_send()
Richard Yao [Thu, 20 Oct 2022 00:08:33 +0000 (20:08 -0400)]
Fix userland memory leak in zfs_do_send()

Clang 15's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14045

20 months agoAdd options to zfs redundant_metadata property
Akash B [Thu, 20 Oct 2022 00:07:51 +0000 (05:37 +0530)]
Add options to zfs redundant_metadata property

Currently, additional/extra copies are created for metadata in
addition to the redundancy provided by the pool(mirror/raidz/draid),
due to this 2 times more space is utilized per inode and this decreases
the total number of inodes that can be created in the filesystem. By
setting redundant_metadata to none, no additional copies of metadata
are created, hence can reduce the space consumed by the additional
metadata copies and increase the total number of inodes that can be
created in the filesystem.  Additionally, this can improve file create
performance due to the reduced amount of metadata which needs
to be written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #13680

20 months agoFix sequential resilver drive failure race condition
samwyc [Wed, 19 Oct 2022 22:48:13 +0000 (04:18 +0530)]
Fix sequential resilver drive failure race condition

This patch handles the race condition on simultaneous failure of
2 drives, which misses the vdev_rebuild_reset_wanted signal in
vdev_rebuild_thread. We retry to catch this inside the
vdev_rebuild_complete_sync function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe J <samwyc@hpe.com>
Closes #14041
Closes #14050

20 months agoSupport idmapped mount
youzhongyang [Wed, 19 Oct 2022 18:17:09 +0000 (14:17 -0400)]
Support idmapped mount

Adds support for idmapped mounts.  Supported as of Linux 5.12 this
functionality allows user and group IDs to be remapped without changing
their state on disk.  This can be useful for portable home directories
and a variety of container related use cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #12923
Closes #13671

20 months agoFix memory leaks in dmu_send()/dmu_send_obj()
Richard Yao [Tue, 18 Oct 2022 23:03:33 +0000 (19:03 -0400)]
Fix memory leaks in dmu_send()/dmu_send_obj()

If we encounter an EXDEV error when using the redacted snapshots
feature, the memory used by dspp.fromredactsnaps is leaked.

Clang's static analyzer caught this during an experiment in which I had
annotated various headers in an attempt to improve the results of static
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13973

20 months agoCleanup: Remove NULL pointer check from dmu_send_impl()
Richard Yao [Mon, 17 Oct 2022 21:57:59 +0000 (17:57 -0400)]
Cleanup: Remove NULL pointer check from dmu_send_impl()

The pointer is to a structure member, so it is never NULL.

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042

20 months agoCleanup: Delete dead code from send_merge_thread()
Richard Yao [Mon, 17 Oct 2022 06:20:21 +0000 (02:20 -0400)]
Cleanup: Delete dead code from send_merge_thread()

range is always deferenced before it reaches this check, such that the
kmem_zalloc() call is never executed.

There is also no need to set `range->eos_marker = B_TRUE` because it is
already set.

Coverity incorrectly complained about a potential NULL pointer
dereference because of this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042

20 months agoCleanup: zvol_add_clones() should not NULL check dp
Richard Yao [Mon, 17 Oct 2022 06:18:09 +0000 (02:18 -0400)]
Cleanup: zvol_add_clones() should not NULL check dp

It is never NULL because we return early if dsl_pool_hold() fails.

This caused Coverity to complain.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042

20 months agoCleanup: metaslab_alloc_dva() should not NULL check mg->mg_next
Richard Yao [Mon, 17 Oct 2022 06:02:12 +0000 (02:02 -0400)]
Cleanup: metaslab_alloc_dva() should not NULL check mg->mg_next

This is a circularly linked list. mg->mg_next can never be NULL.

This caused 3 defect reports in Coverity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042

20 months agoCleanup: Delete unnecessary pointer check from vdev_to_nvlist_iter()
Richard Yao [Sun, 16 Oct 2022 04:42:03 +0000 (00:42 -0400)]
Cleanup: Delete unnecessary pointer check from vdev_to_nvlist_iter()

This confused Clang's static analyzer, making it think there was a
possible NULL pointer dereference. There is no NULL pointer dereference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042

20 months agoCleanup: Simplify userspace abd_free_chunks()
Richard Yao [Sun, 16 Oct 2022 02:54:57 +0000 (22:54 -0400)]
Cleanup: Simplify userspace abd_free_chunks()

Clang's static analyzer complained that we could use after free here if
the inner loop ever iterated. That is a false positive, but upon
inspection, the userland abd_alloc_chunks() function never will put
multiple consecutive pages into a `struct scatterlist`, so there is no
need to loop. We delete the inner loop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042

20 months agoFix possible NULL pointer dereference in sha2_mac_init()
Richard Yao [Mon, 17 Oct 2022 06:06:40 +0000 (02:06 -0400)]
Fix possible NULL pointer dereference in sha2_mac_init()

If mechanism->cm_param is NULL, passing mechanism to
PROV_SHA2_GET_DIGEST_LEN() will dereference a NULL pointer.

Coverity reported this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

20 months agoset_global_var() should not pass NULL pointers to dlclose()
Richard Yao [Sun, 16 Oct 2022 04:56:55 +0000 (00:56 -0400)]
set_global_var() should not pass NULL pointers to dlclose()

Both Coverity and Clang's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

20 months agoFix NULL pointer dereference in spa_open_common()
Richard Yao [Sun, 16 Oct 2022 04:19:13 +0000 (00:19 -0400)]
Fix NULL pointer dereference in spa_open_common()

Calling spa_open() will pass a NULL pointer to spa_open_common()'s
config parameter. Under the right circumstances, we will dereference the
config parameter without doing a NULL check.

Clang's static analyzer found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

20 months agoFix NULL pointer passed to strlcpy from zap_lookup_impl()
Richard Yao [Sat, 15 Oct 2022 02:55:48 +0000 (22:55 -0400)]
Fix NULL pointer passed to strlcpy from zap_lookup_impl()

Clang's static analyzer pointed out that whenever zap_lookup_by_dnode()
is called, we have the following stack where strlcpy() is passed a NULL
pointer for realname from zap_lookup_by_dnode():

strlcpy()
zap_lookup_impl()
zap_lookup_norm_by_dnode()
zap_lookup_by_dnode()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

20 months agofm_fmri_hc_create() must call va_end() before returning
Richard Yao [Sat, 15 Oct 2022 02:46:43 +0000 (22:46 -0400)]
fm_fmri_hc_create() must call va_end() before returning

clang-tidy caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

20 months agoFix NULL pointer dereference in zdb
Richard Yao [Sat, 15 Oct 2022 02:45:13 +0000 (22:45 -0400)]
Fix NULL pointer dereference in zdb

Clang's static analyzer complained that we dereference a NULL pointer in
dump_path() if we return 0 when there is an error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044

20 months agoZED: Fix uninitialized value reads
Richard Yao [Tue, 18 Oct 2022 19:42:14 +0000 (15:42 -0400)]
ZED: Fix uninitialized value reads

Coverity complained about a couple of uninitialized value reads in ZED.

 * zfs_deliver_dle() can pass an uninitialized string to zed_log_msg()
 * An uninitialized sev.sigev_signo is passed to timer_create()

The former would log garbage while the latter is not a real issue, but
we might as well suppress it by initializing the field to 0 for
consistency's sake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14047

20 months agoLinux 6.1 compat: change order of sys/mutex.h includes
Coleman Kane [Tue, 18 Oct 2022 19:29:44 +0000 (15:29 -0400)]
Linux 6.1 compat: change order of sys/mutex.h includes

After Linux 6.1-rc1 came out, the build started failing to build a
couple of the files in the linux spl code due to the mutex_init
redefinition. Moving the sys/mutex.h include to a lower position within
these two files appears to fix the problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14040

20 months agoCoverity model file update
Richard Yao [Tue, 18 Oct 2022 18:06:35 +0000 (14:06 -0400)]
Coverity model file update

Upon review, it was found that the model for malloc() was incorrect.

In addition, several general purpose memory allocation functions were
missing models:

 * kmem_vasprintf()
 * kmem_asprintf()
 * kmem_strdup()
 * kmem_strfree()
 * spl_vmem_alloc()
 * spl_vmem_zalloc()
 * spl_vmem_free()
 * calloc()

As an experiment to try to find more bugs, some less than general
purpose memory allocation functions were also given models:

 * zfsvfs_create()
 * zfsvfs_free()
 * nvlist_alloc()
 * nvlist_dup()
 * nvlist_free()
 * nvlist_pack()
 * nvlist_unpack()

Finally, the models were improved using additional coverity primitives:

 * __coverity_negative_sink__()
 * __coverity_writeall0__()
 * __coverity_mark_as_uninitialized_buffer__()
 * __coverity_mark_as_afm_allocated__()

In addition, an attempt to inform coverity that certain modelled
functions read entire buffers was used by adding the following to
certain models:

int first = buf[0];
int last = buf[buflen-1];

It was inspired by the QEMU model file.

No additional false positives were found by this, but it is believed
that the more accurate model file will help to catch false positives in
the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14048

20 months agoFix declarations of non-global variables
Tino Reichardt [Tue, 18 Oct 2022 18:05:32 +0000 (20:05 +0200)]
Fix declarations of non-global variables

This patch inserts the `static` keyword to non-global variables,
which where found by the analysis tool smatch.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13970