Notable upstream pull request merges:
#13805 Configure zed's diagnosis engine with vdev properties
#14110 zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
#14121 Batch enqueue/dequeue for bqueue
#14123 arc_read()/arc_access() refactoring and cleanup
#14159 Bypass metaslab throttle for removal allocations
#14243 Implement uncached prefetch
#14251 Cache dbuf_hash() calculation
#14253 Allow reciever to override encryption property in case of replication
#14254 Restrict visibility of per-dataset kstats inside FreeBSD jails
#14255 Zero end of embedded block buffer in dump_write_embedded()
#14263 Cleanups identified by CodeQL and Coverity
#14264 Miscellaneous fixes
#14272 Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names
#14287 FreeBSD: Remove stray debug printf
#14288 Colorize zfs diff output
#14289 deadlock between spa_errlog_lock and dp_config_rwlock
#14291 FreeBSD: Fix potential boot panic with bad label
#14292 Add tunable to allow changing micro ZAP's max size
#14293 Turn default_bs and default_ibs into ZFS_MODULE_PARAMs
#14295 zed: add hotplug support for spare vdevs
#14304 Activate filesystem features only in syncing context
#14311 zpool: do guid-based comparison in is_vdev_cb()
#14317 Pack zrlock_t by 8 bytes
#14320 Update arc_summary and arcstat outputs
#14328 FreeBSD: catch up to 1400077
#14376 Use setproctitle to report progress of zfs send
#14340 Remove some dead ARC code
#14358 Wait for txg sync if the last DRR_FREEOBJECTS might result in a hole
#14360 libzpool: fix ddi_strtoull to update nptr
#14364 Fix unprotected zfs_znode_dmu_fini
#14379 zfs_receive_one: Check for the more likely error first
#14380 Cleanup of dead code suggested by Clang Static Analyzer
#14397 Avoid passing an uninitialized index to dsl_prop_known_index
#14404 Fix reading uninitialized variable in receive_read
#14407 free_blocks(): Fix reports from 2016 PVS Studio FreeBSD report
#14418 Introduce minimal ZIL block commit delay
#14422 x86 assembly: fix .size placement and replace .align with .balign
Previously, zic and tzsetup were both listed as install tools and basic
bootstrap tools. Actually, tzsetup is an install tool while zic is a
non-basic bootstrap tool.
These aren't just needed for compatibility with i386 binaries (which need
the 32-bit section), but potentially also for compatibility with older
binaries on all platforms.
Sponsored by: Klara, Inc.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D38194
Emmanuel Vadot [Wed, 25 Jan 2023 07:28:22 +0000 (08:28 +0100)]
arm64: Move device scmi to std.arm
The scmi driver in its current form requires the arm_doorbell
driver to communicate with the firmware.
The arm_doorbell is only found in ARM Juno reference board (and
apparently on Morello too).
If we want to use scmi on other platform (like some rockchip or imx
soc), the driver needs to be updated to support svc/shmem communication
with the firmware.
For now since it can be only used with arm_doorbell move the device to
std.arm otherwise kernel configs like ALLWINNER or ROCKCHIP fails to build.
Ever since gtaskqueue_drain() was added to iflib_stop(), a kernel panic
occurs when the ice(4) driver is in recovery mode. Queues are not
initialized in this mode, so gt_taskqueue is not initialized, and
gtaskqueue_drain() will panic.
Fix this by only doing a drain if an RX queue's gt_taskqueue is
initialized.
Justin Hibbits [Fri, 13 Jan 2023 16:30:58 +0000 (11:30 -0500)]
debugnet: Add ifnet accessor to set debugnet methods
As part of the effort to hide the internals of the ifnet struct, convert
the DEBUGNET_SET() macro to use an accessor instead of directly touching
the methods member.
Justin Hibbits [Fri, 13 Jan 2023 15:43:51 +0000 (10:43 -0500)]
ifnet/API: Move struct ifnet definition to a <net/if_private.h>
Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail. Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.
Justin Hibbits [Thu, 12 Jan 2023 18:33:30 +0000 (13:33 -0500)]
ifnet API: Change if_init() to take context argument
Some drivers, like iflib drivers, take a 'context' argument instead of a
ifnet argument, as a single interface may have multiple contexts.
Follow this scheme by passing the context argument down. Most drivers
will likely pass 'ifp' as the context.
Coleman Kane [Tue, 24 Jan 2023 19:20:50 +0000 (14:20 -0500)]
linux 6.2 compat: zpl_set_acl arg2 is now struct dentry
Linux 6.2 changes the second argument of the set_acl operation to be a
"struct dentry *" rather than a "struct inode *". The inode* parameter
is still available as dentry->d_inode, so adjust the call to the _impl
function call to dereference and pass that pointer to it.
Also document that the get_acl -> get_inode_acl member name change from
commit 884a693 was an API change also introduced in Linux 6.2.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14415
Alexander Motin [Tue, 24 Jan 2023 17:20:32 +0000 (12:20 -0500)]
Introduce minimal ZIL block commit delay
Despite all optimizations, tests on actual hardware show that FreeBSD
kernel can't sleep for less then ~2us. Similar tests on Linux show
~50us delay at least from nanosleep() (haven't tested inside kernel).
It means that on very fast log device ZIL may not be able to satisfy
zfs_commit_timeout_pct block commit timeout, increasing log latency
more than desired.
Handle that by introduction of zil_min_commit_timeout parameter,
specifying minimal timeout value where additional delays to aggregate
writes may be skipped. Also skip delays if the LWB is more than 7/8
full, that often happens if I/O sizes are constant and match one of
LWB sizes. Both things are applied only if there were no already
outstanding log blocks, that may indicate single-threaded workload,
that by definition can not benefit from the commit delays.
While there, add short time moving average to zl_last_lwb_latency to
make it more stable.
Tests of single-threaded 4KB writes to NVDIMM SLOG on FreeBSD show IOPS
increase by 9% instead of expected 5%. For zfs_commit_timeout_pct of
1 there IOPS increase by 5.5% instead of expected 1%.
Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14418
Attila Fülöp [Mon, 23 Jan 2023 19:25:21 +0000 (20:25 +0100)]
x86 asm: Replace .align with .balign
The .align directive used to align storage locations is
ambiguous. On some platforms and assemblers it takes a byte count,
on others the argument is interpreted as a shift value. The current
usage expects the first interpretation.
Replace it with the unambiguous .balign directive which always
expects a byte count, regardless of platform and assembler.
Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14422
Attila Fülöp [Mon, 23 Jan 2023 18:48:39 +0000 (19:48 +0100)]
IPC: blake3 x86 asm: fix placement of .size directives
The .size directive used by the SET_SIZE C macro uses the special
dot symbol to calculate the size of a function. The dot symbol
refers to the current address, so for the calculation to be
meaningful the SET_SIZE macro must be placed immediately after the
end of the function the size is calculated for.
Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14422
Michal Gulbicki [Tue, 24 Jan 2023 14:31:38 +0000 (09:31 -0500)]
qat: Add Intel® 4xxx Series platform support
Overview:
Intel(R) QuickAssist Technology (Intel(R) QAT) provides hardware
acceleration for offloading security, authentication and compression
services from the CPU, thus significantly increasing the performance and
efficiency of standard platform solutions.
This commit introduces:
- Intel® 4xxx Series platform support.
- QuickAssist kernel API implementation update for Generation 4 device.
Enabled services: symmetric cryptography and data compression.
- Increased default number of crypto instances in static configuration
for performance purposes.
OCF backend changes:
- changed GCM/CCM MAC validation policy to generate MAC by HW
and validate by SW due to the QAT HW limitations.
Patch co-authored by: Krzysztof Zdziarski <krzysztofx.zdziarski@intel.com>
Patch co-authored by: Michal Jaraczewski <michalx.jaraczewski@intel.com>
Patch co-authored by: Michal Gulbicki <michalx.gulbicki@intel.com>
Patch co-authored by: Julian Grajkowski <julianx.grajkowski@intel.com>
Patch co-authored by: Piotr Kasierski <piotrx.kasierski@intel.com>
Patch co-authored by: Adam Czupryna <adamx.czupryna@intel.com>
Patch co-authored by: Konrad Zelazny <konradx.zelazny@intel.com>
Patch co-authored by: Katarzyna Rucinska <katarzynax.kargol@intel.com>
Patch co-authored by: Lukasz Kolodzinski <lukaszx.kolodzinski@intel.com>
Patch co-authored by: Zbigniew Jedlinski <zbigniewx.jedlinski@intel.com>
Andrew Gallatin [Tue, 24 Jan 2023 01:27:17 +0000 (20:27 -0500)]
dtrace: conditionally load the systrace_linux klds when loading dtrace.
When dtrace starts, it tries to detect if the dtrace klds are loaded,
and if not, it loads them by loading the dtraceall kld. This module
depends on most dtrace modules, including systrace for the native
freebsd and freebsd32 ABIs. However, it does not depend on the
systrace_linux klds, as they in turn depend on the linux ABI klds, and
we don't want to load an ABI module that the user has not explicitly
requested. This can leave a naive user in a state where they think all
syscall providers have been loaded, yet linux ABI syscalls are
"invisible" to dtrace.
To fix this, check to see if the linux ABI modules are loaded. If they
are, then load their systrace klds.
libcxx: Implement atomic::wait/notify using _umtx_op(2) for 64bit arches
Only 64bit architectures can be supported this way, because libcxx
defines __cxx_contention_t to be int64_t for FreeBSD, and 32bit arches
do not have a kind of UMTX_OP_WAIT_INT64_PRIVATE operation.
David Hedberg [Mon, 23 Jan 2023 21:19:43 +0000 (22:19 +0100)]
Wait for txg sync if the last DRR_FREEOBJECTS might result in a hole
If we receive a DRR_FREEOBJECTS as the first entry in an object range,
this might end up producing a hole if the freed objects were the
only existing objects in the block.
If the txg starts syncing before we've processed any following
DRR_OBJECT records, this leads to a possible race where the backing
arc_buf_t gets its psize set to 0 in the arc_write_ready() callback
while still being referenced from a dirty record in the open txg.
To prevent this, we insert a txg_wait_synced call if the first
record in the range was a DRR_FREEOBJECTS that actually
resulted in one or more freed objects.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: David Hedberg <david.hedberg@findity.com>
Sponsored by: Findity AB
Closes #11893
Closes #14358
Richard Yao [Mon, 23 Jan 2023 21:16:22 +0000 (16:16 -0500)]
Reject streams that set ->drr_payloadlen to unreasonably large values
In the zstream code, Coverity reported:
"The argument could be controlled by an attacker, who could invoke the
function with arbitrary values (for example, a very high or negative
buffer size)."
It did not report this in the kernel. This is likely because the
userspace code stored this in an int before passing it into the
allocator, while the kernel code stored it in a uint32_t.
However, this did reveal a potentially real problem. On 32-bit systems
and systems with only 4GB of physical memory or less in general, it is
possible to pass a large enough value that the system will hang. Even
worse, on Linux systems, the kernel memory allocator is not able to
support allocations up to the maximum 4GB allocation size that this
allows.
This had already been limited in userspace to 64MB by
`ZFS_SENDRECV_MAX_NVLIST`, but we need a hard limit in the kernel to
protect systems. After some discussion, we settle on 256MB as a hard
upper limit. Attempting to receive a stream that requires more memory
than that will result in E2BIG being returned to user space.
Reported-by: Coverity (CID-1529836) Reported-by: Coverity (CID-1529837) Reported-by: Coverity (CID-1529838) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14285
rob-wing [Mon, 23 Jan 2023 21:14:25 +0000 (12:14 -0900)]
Configure zed's diagnosis engine with vdev properties
Introduce four new vdev properties:
checksum_n
checksum_t
io_n
io_t
These properties can be used for configuring the thresholds of zed's
diagnosis engine and are interpeted as <N> events in T <seconds>.
When this property is set to a non-default value on a top-level vdev,
those thresholds will also apply to its leaf vdevs. This behavior can be
overridden by explicitly setting the property on the leaf vdev.
Note that, these properties do not persist across vdev replacement. For
this reason, it is advisable to set the property on the top-level vdev
instead of the leaf vdev.
The default values for zed's diagnosis engine (10 events, 600 seconds)
remains unchanged.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology LLC
Closes #13805
Richard Yao [Mon, 23 Jan 2023 21:12:37 +0000 (16:12 -0500)]
free_blocks(): Fix reports from 2016 PVS Studio FreeBSD report
In 2016, the authors of PVS Studio ran it on the FreeBSD kernel, which
identified a number of bugs / cleanup opportunities in the FreeBSD ZFS kernel
code. A few of them persist to the present day:
In particular, we have the following in free_blocks():
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (174): error V547: Expression '__left >= __right' is always true. Unsigned type value is always >= 0.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (171): error V634: The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (175): error V547: Expression '__left >= __right' is always true. Unsigned type value is always >= 0.
A couple of assertions accidentally typecast the arguments they check to
unsigned in such a way that the result is always true. Also, parentheses
are missing around `1<<epbs` in `(db->db_blkid * 1<<epbs)`. This works
out to be okay due to multiplication not caring what order of operations
we use, but it is better to fix it to be `(db->db_blkid << epbs)`.
A few of the function local variables probably never should have been
32-bit in the first place, so we make them 64-bit. We also replace the
existing assertions with additional assertions to ensure that 64-bit
unsigned arithmetic is safe.
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14407
Mark Johnston [Mon, 23 Jan 2023 19:42:41 +0000 (14:42 -0500)]
netmap: Try to count packet drops in emulated mode
Right now we have little visibility into packet drops within netmap.
Start trying to make packet loss issues more visible by counting queue
drops in the transmit path, and in the input path for interfaces running
in emulated mode, where we place received packets in a bounded software
queue that is processed by rxsync.
Mark Johnston [Mon, 23 Jan 2023 19:41:05 +0000 (14:41 -0500)]
netmap: Tell the compiler to avoid reloading ring indices
Per the removed comments these fields should be loaded only once, since
they can in principle be modified concurrently, though this would be a
violation of the userspace contract with netmap.
Mitchell Horne [Mon, 9 Jan 2023 17:14:19 +0000 (13:14 -0400)]
vtblk: secondary fix for dumping
The code paths while dumping do not got through busdma. As such,
safeguard against calling bus_dmamap_sync() with a NULL map. The x86
implementation tolerates this but others do not, resulting in a
NULL dereference panic when dumping to a vtblk device on arm64, riscv,
etc.
Fixes: 782105f7c898 ("vtblk: Use busdma")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37990
Mitchell Horne [Sat, 7 Jan 2023 18:09:28 +0000 (14:09 -0400)]
ddb: have 'reset' command use normal reboot path
This conditionally gives all registered shutdown handlers a chance to
perform the reboot, with cpu_reset() being the fallback. The '\s'
modifier can be used with the command to get the previous behaviour.
The motivation is that some platforms may not be able do anything
meaningful via cpu_reset(), due to a lack of standardized reset
mechanism and/or firmware shortcomings. However, they may have a
separate device driver attached that normally performs the reboot. Such
is the case for some versions of the Raspberry Pi, where reset via PSCI
fails, but the BCM2835 watchdog driver has a shutdown hook.
Currently shutdown_reset() is registered as the final entry of the
shutdown_final event handler. However, if a panic occurs early in boot
before the event is registered (SI_SUB_INTRINSIC), we may end up
spinning in the subsequent infinite for loop and failing to reset
altogether. Instead we can simply call this function unconditionally.
Jiajie Chen [Mon, 23 Jan 2023 16:36:59 +0000 (00:36 +0800)]
Add kf_file_nlink field to kf_file and populate it
This will allow user-space programs (e.g. lsof) to locate deleted files
whose nlink equals zero. Prior to this commit, programs has to use
stat(kf_path) to get nlink, but that will fail if the file is deleted.
netinet6: honor blackhole/unreach routes in the non-fastforwading code.
Currently, under the conditions specified below, IPv6 ingress packet
processing can ignore blackhole/reject flag on the prefix. The packet
will instead be looped locally till TTL expiration and a single ICMPv6
unreachable message will be send to the source even in case of
RTF_BLACKHOLE.
The following conditions needs hold to make the scenario happen:
* IPv6 forwarding is enabled
* Packet is not fast-forwarded
* Destination prefix has either RTF_BLACKHOLE or RTF_REJECT flag
Fix this behavior by checking for the blackhole/reject flags in
ip6_forward().
* Automatically use IPv6 when IPv6 addresses are used, --ip6 is not needed.
* Building of ping requests and parsing of ping replies is done layer by
layer. This way most arguments are available both for IPv6 and IPv4,
for ICMP and TCP.
* Use argument groups for improved readability.
* Change ToS and TTL argument name to TC and HL to reflect the modern
IPv6 nomenclature. The argument still set related IPv4 header fields
properly.
* Instead of sniffing for the very specific case of duplicated packets,
allow for sniffing on multiple interfaces.
* Report which sniffer has failed by setting bits of error code.
* Raise meaningful exceptions when irrecoverable errors happen.
* Make IPv4 fragmentation flags configurable.
* Make IPv6 HL / IPv4 TTL configurable.
* Make TCP MSS configurable.
* Make TCP sequence number configurable.
* Make ICMP payload size configurable.
* Add debug output.
* Move command line argument parsing out of network functions.
* Make the code somehow PEP-8 compliant.
* Remove ambiguity of configuring recvif, it must be now explicitly specified.
* Don't catch exceptions around creating the sniffer, let it properly
fail and display the whole stack trace.
* Count correct packets so that duplicates can be found.
Andrew Gallatin [Sat, 21 Jan 2023 19:26:25 +0000 (14:26 -0500)]
vm: centralize VM_BATCHQUEUE_SIZE definition
Remove the platform-specific definitions of VM_BATCHQUEUE_SIZE
for amd64 and powerpc64, and instead treat all 64-bit platforms
identically. This has the effect of increasing the arm64
and riscv VM_BATCHQUEUE_SIZE to match that of other platforms.
Some existing applications setup Netlink socket with
SOCK_DGRAM instead of SOCK_RAW. Update the manpage to clarify
that the default way of creating the socket should be with
SOCK_RAW. Update the code to support both SOCK_RAW and SOCK_DGRAM.
Warner Losh [Sat, 21 Jan 2023 05:11:00 +0000 (22:11 -0700)]
elf: EF_ARM_EABI_VERSION_UNKNOWN is no longer used, retire
EF_ARM_EABI_VERSION_UNKNOWN was used when we could run either eabi or
oabi. That was armv[45] only, and went away when we migrated to eabi
many major releases ago. No standard defines this symbol, so retire it.
Warner Losh [Sat, 21 Jan 2023 02:15:52 +0000 (19:15 -0700)]
elf: Catch up with defining EF_ARM_EABI_VERSION in elf_common.h
FreeBSD defines EF_ARM_EABI_VERSION in a non-standard way (at least
differently than everybody else). We use this only in elf*machdep.c to
make sure the image is new enough. Switch to the more standard way of
defining this and adjust other constants to match.
Warner Losh [Fri, 20 Jan 2023 23:46:05 +0000 (16:46 -0700)]
elf_common.h: Add parens to EF_ARM_EABI_VERSION macro
Due to my haste, I missed John's suggestion in the review that I add ()
here. Belatedly add them. It didn't matter for my test case, but there's
some pathological uses where it might matter.
Warner Losh [Fri, 20 Jan 2023 23:33:37 +0000 (16:33 -0700)]
byteswap.h: Add a glibc/linux compatible byteswap.h
For endian.h to work instead of sys/endian.h, some software needs
byteswap.h available. It must define {__,}byteswap_{16,32,64}.
Included sys/_endian.h to get an appropriate __byteswap16, etc
and defines the new macros in terms of them. Enhance _endian.h
to allow it to be included from here too.
Warner Losh [Fri, 20 Jan 2023 23:32:45 +0000 (16:32 -0700)]
linux: For better compatibility, provide compatible endian.h
Add endian.h. This includes sys/endian.h and then adds extra defines
that glibc defines with double underscores for our
_{BIG,BYTE,LITTLE,PDP}_ENDIAN macros. We also define __FLOAT_WORD_ORDER
to be the same as _BYTE_ENDIAN since FreeBSD doesn't currently define
this, and the default with glibc is exactly this for our platforms.
Move common parts of endian.h and sys/endian.h into sys/_endian.h
to limit namespace pollution from endian.h
All this gives us good compatibility with Linux. There may be one or two
upstreams that haven't integrated the patches I tried to send up.
There are some minor differences:
o The extra glibc macros are not defined. These are all
controlled with either __ at the start, or only defined
when glibc is being built. We also don't define macros
that are used internally in glibc that would pollute
the namespace.
o For complete compatibility, this change must also be
paired with providing a glibc-compatible byteswap.h.
Warner Losh [Fri, 20 Jan 2023 23:17:45 +0000 (16:17 -0700)]
UPDATING: Remove old entries
Belatedly remove the entries older than the stable/11 branch point after
stable/13 was created. This should be done shortly after the branch, not
well after the branch point.
Document this policy in UPDATING, though other checklists should likely
be updated as well.
Warner Losh [Fri, 20 Jan 2023 23:15:04 +0000 (16:15 -0700)]
elf_common.h: define EF_ARM_EABI_VERSION
The Linux distributions have had the EF_ARM_EABI_VERSION macro for a
while now. The kernel of arm* targets build depends on it being in the
host's elf.h environment. Add it here so it gets pulled in so there's
one fewer hack required to build a Linux kernel on a FreeBSD host.
Warner Losh [Fri, 20 Jan 2023 23:13:54 +0000 (16:13 -0700)]
UPDATING: update notes on EFI booting
The notes on EFI booting and updating for ZFS had become dated and only
partially updated. Expand the notes with a few more details and a
pointer to laoder.efi(8) and uefi(8).
Chunwei Chen [Fri, 20 Jan 2023 19:49:56 +0000 (11:49 -0800)]
Fix reading uninitialized variable in receive_read
When zfs_file_read returns error, resid may be uninitialized.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #14404
Allan Jude [Fri, 20 Jan 2023 19:11:54 +0000 (14:11 -0500)]
zfs_receive_one: Check for the more likely error first
If zfs_receive_one() gets back EINVAL, check for the more likely case,
embedded block pointers + encryption and return that error, before
falling back to the less likely case, a resumable stream when the
kernel has not been upgraded to support resume.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Sponsored-by: rsync.net Sponsored-by: Klara Inc.
Closes #14379
Richard Yao [Fri, 20 Jan 2023 19:10:15 +0000 (14:10 -0500)]
Cleanup ->dd_space_towrite should be unsigned
This is only ever used with unsigned data, so the type itself should be
unsigned. Also, PVS Studio's 2016 FreeBSD kernel report correctly
identified the following assertion as always being true, so we can drop
it:
The reason it was always true is because it would do casts to give us
unsigned comparisons. This could have been fixed by switching to
`ASSERT3S()`, but upon inspection, it turned out that this variable
never should have been allowed to be signed in the first place.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14408
Mark Johnston [Mon, 16 Jan 2023 12:53:16 +0000 (07:53 -0500)]
Micro-optimize dsl_prop_get_dd()
Use the saved property index instead of looking it up once per DSL
directory when traversing up towards the root.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Sponsored-by: The FreeBSD Foundation
Closes #14397
Mark Johnston [Sun, 15 Jan 2023 22:05:19 +0000 (17:05 -0500)]
Avoid passing an uninitialized index to dsl_prop_known_index
Reported-by: KMSAN Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Sponsored-by: The FreeBSD Foundation
Closes #14397
Robert Wing [Fri, 20 Jan 2023 11:10:53 +0000 (11:10 +0000)]
vmm: take exclusive mem_segs_lock in vm_cleanup()
The consumers of vm_cleanup() are vm_reinit() and vm_destroy().
The vm_reinit() call path is, here vmmdev_ioctl() takes mem_segs_lock:
vmmdev_ioctl()
vm_reinit()
vm_cleanup(destroy=false)
The call path for vm_destroy() is (mem_segs_lock not taken):
sysctl_vmm_destroy()
vmmdev_destroy()
vm_destroy()
vm_cleanup(destroy=true)
Fix this by taking mem_segs_lock in vm_cleanup() when destroy == true.
Reviewed by: corvink, markj, jhb Fixes: 67b69e76e8ee ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D38071
John Baldwin [Fri, 20 Jan 2023 17:58:38 +0000 (09:58 -0800)]
bhyve: Fix a buffer overread in the PCI hda device model.
The sc->codecs array contains HDA_CODEC_MAX (15) entries. The
guest-supplied cad field in the verb provided to hda_send_command is a
4-bit field that was used as an index into sc->codecs without any
bounds checking. The highest value (15) would overflow the array.
Other uses of sc->codecs in the device model used sc->codecs_no to
determine which array indices have been initialized, so use a similar
check to reject requests for uninitialized or invalid cad indices in
hda_send_command.
PR: 264582
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: corvink, markj, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38128
John Baldwin [Fri, 20 Jan 2023 17:57:45 +0000 (09:57 -0800)]
bhyve: Fix a global buffer overread in the PCI hda device model.
hda_write did not validate the relative register offset before using
it as an index into the hda_set_reg_table array to lookup a function
pointer to execute after updating the register's value.
PR: 264435
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: corvink, markj, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38127