The number of events we track can vary over time, but we only allocate
enough space for the exact number of events we are tracking when we
first begin, resulting in a trivially reproducible heap overflow. Fix
this by allocating enough space for the greatest possible number of
events (two per file) and clean up the code a bit.
Also add a test case which triggers the aforementioned heap overflow,
although we don't currently have a way to detect it.
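The sizing rule can be illustrated with a toy model in Python. This is only a sketch, not the C code from the commit: the names event_capacity, fits and MAX_EVENTS_PER_FILE are hypothetical, and the only fact taken from the message is the bound of two events per file.

```python
# Toy model of the sizing rule; not the actual C code from this commit.
MAX_EVENTS_PER_FILE = 2  # the commit message gives two events per file as the maximum


def event_capacity(nfiles):
    """Worst-case number of event slots needed for nfiles files."""
    return MAX_EVENTS_PER_FILE * nfiles


def fits(capacity, tracked_events):
    """True if the currently tracked events fit in the allocated slots."""
    return tracked_events <= capacity


nfiles = 3
initial_events = 3   # assumed: what happened to be tracked at startup
later_events = 5     # assumed: the count grows while running

# Old sizing: exactly the startup count, so later growth does not fit.
old_ok = fits(initial_events, later_events)
# New sizing: worst case up front, so any legal count fits.
new_ok = fits(event_capacity(nfiles), later_events)
```

Sizing for the worst case costs a few extra slots per file but removes the need to reallocate when the event count changes.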
Bjoern A. Zeeb [Wed, 29 Nov 2023 21:33:23 +0000 (21:33 +0000)]
iwlwififw: add firmware for the Bz/B200 chipset
The iwlwifi driver already supports the chipset as "Bz TBD"
(also in 14.0). Add the firmware for it. Successfully tested
for 0x8086/0x272b/0x8086/0x00f4 on arm64 thanks to donated
hardware [1].
vt(4): Call post-switch callback after replacing the backend
[Why]
For instance, it gives a chance to the new backend to refresh the
screen. This is needed by the vt_drmfb backend and `drm_fb_helper`.
This change was lost when I posted changes to reviews.freebsd.org and it
broke the amdgpu driver... Thanks to manu@ for reporting the problem
and wulf@ for finding the missing change!
Tested by: manu
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D42834
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay. Later zil_free_clone_range() drops them after replay or on
dataset destroy. The total balance is neutral. It means on actual
replay we must take additional references, which would stay in BRT.
Without this blocks could be freed prematurely when either original
file or its clone are destroyed. I've observed BRT being emptied
and the feature being deactivated after ZIL replay completion, which
should not have happened. With the patch I see expected stats.
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15603
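The reference balance described above can be checked with a toy ledger in Python. This is a sketch under simplifying assumptions: one cloned block, one claim, one replay and one free, with the BRT reduced to a counter. The function run_replay and its flag are hypothetical; only the claim/replay/free ordering comes from the commit message.

```python
from collections import Counter


def run_replay(take_ref_on_replay):
    """Return the toy BRT reference count of one cloned block after replay."""
    brt = Counter()
    block = "blk0"
    brt[block] += 1          # claim step: reference taken before ZIL replay
    if take_ref_on_replay:
        brt[block] += 1      # replay step: extra reference meant to stay in BRT
    brt[block] -= 1          # free step: claim reference dropped after replay
    return brt[block]


without_fix = run_replay(take_ref_on_replay=False)
with_fix = run_replay(take_ref_on_replay=True)
```

In this model the count ends at zero without the extra reference, which matches the observation of the BRT being emptied after replay, and at one with it.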
John Baldwin [Wed, 29 Nov 2023 18:31:47 +0000 (10:31 -0800)]
pci_cfgreg: Add a PCI domain argument to the low-level register API
This commit changes the API of pci_cfgreg(read|write) to add a domain
argument (referred to as a segment in ACPI parlance) (note that this
is not the same as a NUMA domain, but something PCI-specific). This
does not yet enable access to domains other than 0, but updates the
API to support domains.
Places that use hard-coded bus/slot/function addresses have been
updated to hardcode a domain of 0. A few places that have the PCI
domain (segment) available such as the acpi_pcib_acpi.c Host-PCI
bridge driver pass the PCI domain.
The hpt27xx(4) and hptnr(4) drivers fail to attach to a device not on
domain 0 since they provide APIs to their binary blobs that only
permit bus/slot/function addressing.
The x86 non-ACPI PCI bus drivers all hardcode a domain of 0 as they do
not support multiple domains.
Wraithh [Wed, 29 Nov 2023 17:55:17 +0000 (19:55 +0200)]
Fix zoneid when USER_NS is disabled
getzoneid() should return GLOBAL_ZONEID instead of 0 when USER_NS is disabled.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ilkka Sovanto <github@ilkka.kapsi.fi>
Closes #15560
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15606
Igor Ostapenko [Wed, 29 Nov 2023 12:35:41 +0000 (13:35 +0100)]
pf: fix mem leaks upon vnet destroy
Add missing cleanup actions:
- remove user defined anchor rulesets
- remove user defined ether anchor rulesets
- remove tables linked to user defined anchors
- deal with wildcard anchor peculiarities to get them removed correctly
Mark Johnston [Wed, 29 Nov 2023 17:51:55 +0000 (12:51 -0500)]
ossl: Keep mutable AES-GCM state on the stack
ossl(4)'s AES-GCM implementation keeps mutable state in the session
structure, together with the key schedule. This was done for
convenience, as both are initialized together. However, some OCF
consumers, particularly ZFS, assume that requests may be dispatched to
the same session in parallel. Without serialization, this results in
incorrect output.
Fix the problem by explicitly copying per-session state onto the stack
at the beginning of each operation.
PR: 275306
Reviewed by: jhb
Fixes: 9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42783
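The difference between keeping mutable state in the session and copying it per operation can be sketched in Python. This is a toy model, not ossl(4) code: Session, process_shared and process_local are hypothetical names, and a plain integer counter stands in for the mutable AES-GCM state.

```python
import copy
from dataclasses import dataclass


@dataclass
class Session:
    key_schedule: bytes   # read-only after setup
    counter: int = 0      # stands in for the mutable GCM state


def process_shared(session, nblocks):
    """Old pattern: mutate the state stored in the session itself."""
    for _ in range(nblocks):
        session.counter += 1
    return session.counter


def process_local(session, nblocks):
    """New pattern: work on a private copy and leave the session untouched."""
    ctx = copy.copy(session)   # per-operation copy, like a stack variable
    for _ in range(nblocks):
        ctx.counter += 1
    return ctx.counter
```

With process_local, two requests on the same session each start from the same initial state regardless of ordering. With process_shared, the second request starts from whatever the first one left behind, which is the kind of interference that produces incorrect output when requests run in parallel.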
Warner Losh [Wed, 29 Nov 2023 15:26:29 +0000 (08:26 -0700)]
openzfs: unbreak 32-bit builds.
32-bit builds are broken. Fix that by using PRIu64 instead of a
bare '%lu'.
Feel free to revert when upstream has this fixed. I'm agnostic as to the
proper fix, but don't have the time to fight upstreaming this on top of
everything else.
Alan Somers [Wed, 12 Jul 2023 20:46:27 +0000 (14:46 -0600)]
zfsd: fault disks that generate too many I/O delay events
If ZFS reports that a disk had at least 8 I/O operations over 60s that
were each delayed by at least 30s (implying a queue depth > 4 or I/O
aggregation, obviously), fault that disk. Disks that respond this
slowly can degrade the entire system's performance.
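The threshold described above amounts to a sliding-window counter. The following Python sketch shows one way to express it. It is not zfsd's implementation: the class DelayTracker is hypothetical, only the numbers 8 and 60s come from the commit message, and the treatment of the window boundary is an assumption.

```python
from collections import deque

THRESHOLD = 8    # delay events needed before faulting (from the commit message)
WINDOW = 60.0    # window length in seconds (from the commit message)


class DelayTracker:
    """Counts I/O delay events for one disk inside a sliding time window."""

    def __init__(self, threshold=THRESHOLD, window=WINDOW):
        self.threshold = threshold
        self.window = window
        self.events = deque()

    def record(self, now):
        """Record a delay event at time `now`; return True if the disk should be faulted."""
        self.events.append(now)
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold


tracker = DelayTracker()
burst = [tracker.record(float(t)) for t in range(8)]   # one event per second
```

A burst of eight events within a minute trips the threshold on the eighth event, while a disk that reports one delay every ten seconds never accumulates eight events in any window.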
Alexander Motin [Wed, 29 Nov 2023 01:50:30 +0000 (18:50 -0700)]
mpi3mr: Make these bus_dmamap_load calls synchronous
These calls "should" all be synchronous. There's no bouncing that's
needed for them (at least in the typical case that we have a sane card
that has more bits of dma addresses decoded than we have memory), so
there are no errors possible. Ensure these calls are really synchronous
with BUS_DMA_NOWAIT flags (which should never fail now that the
bus_dmamem_alloc() has succeeded).
Warner Losh [Wed, 29 Nov 2023 01:49:49 +0000 (18:49 -0700)]
mpi3mr: Honor the dma mask from IOCFacts
The number of significant bits that are decoded is returned in the flags
field of the IOCFacts structure from the device. Rather than assume the
worst with a pessimal 32-bit maximum, look at this value and pass it
along to all the dma map creation requests.
A lot of those creations are repetitive and could just inherit from the
base tag if we moved to the templated interface. This is called out as
desirable future work not done at this time.
In addition, due to a chicken and an egg problem, we have to allocate
some of the maps with a 32-bit loaddr. These are the ones we need to
read iocfacts. And they are fine to be so restricted: they are little
used after startup, and when they are used, bouncing is fine.
Warner Losh [Wed, 29 Nov 2023 01:49:39 +0000 (18:49 -0700)]
mpi3mr: Fix EINPROGRESS errors hanging the card
Move enqueueing of commands to bus_dmamap_load_ccb callback
Fix fundamental difference between FreeBSD and Linux. On Linux, your dma
load callback always happens before the load call returns, so drivers are
written to load the map, then submit to hardware. On FreeBSD, the callback
may be deferred and the load may return EINPROGRESS. This means the callback
is responsible for queueing the request to the hardware after the
SGL list is created. Make a number of interrelated changes:
At the end of mpi3mr_prepare_sgls, add a call to mpi3mr_enqueue_request.
Split the hardware submission out from the end of mpi3mr_action_scsiio
and move it into a new routine mpi3mr_enqueue_request.
Move all error completion from the end of mpi3mr_action_scsiio to where
the error is detected. We cannot pass errors back from the
mpi3mr_enqueue_request to do this on a 'failed' mpi3mr in a centralized
place (since it has to be fire and forget).
Add comments about zero length SGLs never making it into
mpi3mr_prepare_sgls. Keep the code there for the moment, but we only set
cm->data to non-NULL when scsiio_req->DataLength is not zero. So the
datalength can't be zero and we can't send the zero SGLs.
Add comments about other "impossible" tests in mpi3mr_prepare_sgls that
really should be simple asserts of some flavor.
Eliminate cm->error_code, since we can't pass data back from the
mpi3mr_prepare_sgl callback anymore.
In mpi3mr_map_request, call mpi3mr_enqueue_request for the no data case.
This seems to work even though we've not done the special zero length
handling that was in mpi3mr_prepare_sgls, giving further evidence to it
not actually being needed. This is needed for SCSI CDBs that have no
data to pass to the drive like TEST UNIT READY.
With this change, and the prior ones, we're now able to run with mpi3mr
on 128GB systems and very heavy disk load (so many buffers land > 4GB:
the driver instructs busdma to never use memory above 4GB, which may be
too conservative, but an issue for another time).
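The ordering problem can be reproduced with a toy model in Python. This is a sketch, not driver code: ToyBusdma, submit_after_load and submit_in_callback are hypothetical names, and the model only mimics the fact that a load callback may run later than the load call itself.

```python
import errno


class ToyBusdma:
    """Runs load callbacks immediately, or defers them when `defer` is set."""

    def __init__(self, defer):
        self.defer = defer
        self.pending = []

    def load(self, callback):
        if self.defer:
            self.pending.append(callback)   # callback will run later
            return errno.EINPROGRESS
        callback()
        return 0

    def run_deferred(self):
        while self.pending:
            self.pending.pop(0)()


def submit_after_load(dma, hw_queue):
    """Load, then submit: wrong when the callback is deferred."""
    cmd = {"sgl": None}
    dma.load(lambda: cmd.update(sgl=["seg0"]))
    hw_queue.append(cmd["sgl"])             # may run before the SGL exists


def submit_in_callback(dma, hw_queue):
    """Ordering after the fix: the callback builds the SGL, then enqueues."""
    cmd = {"sgl": None}

    def callback():
        cmd["sgl"] = ["seg0"]
        hw_queue.append(cmd["sgl"])

    dma.load(callback)
```

With a deferring ToyBusdma, submit_after_load queues a command whose SGL is still None, while submit_in_callback queues nothing until run_deferred() fires the callback. Since nothing can be returned to the original caller at that point, errors have to be completed where they are detected, as the commit message notes.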
Warner Losh [Wed, 29 Nov 2023 01:49:30 +0000 (18:49 -0700)]
mpi3mr: Cleanup setting of status in processing scsiio requests
More uniformly use mpi3mr_set_ccbstatus in mpi3mr_action_scsiio. The
routine mostly used it, but also has setting of status by hand. In those
cases where we want to error out the request, use this routine.
As part of this, move setting CAM_SIM_QUEUED later in the function to
when we're sure it's been queued. Remove the places we clear it before
this.
Warner Losh [Wed, 29 Nov 2023 01:49:24 +0000 (18:49 -0700)]
mpi3mr: Only set callout_owned when we create a timeout
Since we assume there's a timeout to cancel when this is true, only set
it true when we set the timeout. Otherwise we may try to cancel a timeout
when there's been an error in submission.
Warner Losh [Wed, 29 Nov 2023 01:49:08 +0000 (18:49 -0700)]
mpi3mr: Reduce the scope of the reset_mutext
Reduce the scope of reset_mutext to protect the msleep in the watch dog
thread as well as the MPI3MR_FLAGS_SHUTDOWN bit. Use it to protect the
wakeup in mpi3mr_detach so this thread can exit sooner when we're trying
to do an orderly shutdown. Optimize the flow to check the sleep and
other conditions before going to sleep.
It's an open question if this should protect sc->unrecoverable, and if
we should wakeup the watchdog thread when we set it. We might also want
to move to booleans for the three flags that we have now in
mpi3mr_flags. There are a number of U8s that should really be bools and
we might want to also group them together to pack softc better.
Warner Losh [Wed, 29 Nov 2023 01:49:01 +0000 (18:49 -0700)]
mpi3mr: Remove unused fields in struct mpi3mr_cmd
All of these fields are either unused, or just initialized. Remove
them. This saves about 1MB of memory for the cards that I have which can
do 8k transactions at once.
Warner Losh [Wed, 29 Nov 2023 01:48:48 +0000 (18:48 -0700)]
mpi3mr: Don't hold fwevt_lock over call to taskqueue_drain
Holding fwevt_lock when we call taskqueue_drain can lead to deadlock
because the queue it's draining needs fwevt_lock to do work, so that other
thread will try to take out the lock and block, making the thread never
finish and taskqueue_drain never complete. There's a witness
warning/error for this which was exposed when the lock was converted to
a MTX_DEF lock from a MTX_SPIN prior to committing to the FreeBSD tree.
The lock appears to be to protect against additional items being added
to the event list while we're doing a reset. Since the taskqueue is
blocked, items can get added to the list, but won't be processed during
the reset, but there is still a (likely small) race between the
taskqueue_drain and the taskqueue_block calls where an interrupt could
fire on another CPU, resulting in a task being enqueued and started
before the block can take effect. The only way to fix that race is to
turn off interrupt processing during a reset. So we replace a deadlock
with a smaller race.
Shengqi Chen [Wed, 22 Nov 2023 13:58:47 +0000 (21:58 +0800)]
module/icp/asm-arm/sha2: auto detect __ARM_ARCH
This patch uses __ARM_ARCH set by compiler (both
GCC and Clang have this) whenever possible instead
of hardcoding it to 7. This change allows code to
compile on earlier ARM architectures such as armv5te.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15557
"route add <host> -iface <netif>" for a netif without an IPv4/IPv6
address fails with EINVAL. Need to use a link-level ifaddr for gw if
an ifaddr for dst is not found as the rtsock-based implementation does.
Notable upstream pull request merges:
#15532 c1a47de86 zdb: Fix zdb '-O|-r' options with -e/exported zpool
#15535 cf3316633 ZVOL: Minor code cleanup
#15541 803a9c12c brt: lift internal definitions into _impl header
#15541 213d68296 zdb: show BRT statistics and dump its contents
#15543 a49087510 ZIL: Refactor TX_WRITE encryption similar to
TX_CLONE_RANGE
#15543 27d8c23c5 ZIL: Do not encrypt block pointers in lr_clone_range_t
#15549 67894a597 unnecessary alloc/free in dsl_scan_visitbp()
#15551 126efb588 FreeBSD: Fix the build on FreeBSD 12
#15563 acb33ee1c FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are
NFS visible
#15564 7bbd42ef4 Don't allow attach to a raidz child vdev
#15566 688514e47 dmu_buf_will_clone: fix race in transition back to NOFILL
#15571 30d581121 dnode_is_dirty: check dnode and its data for dirtiness
Mike Karels [Tue, 28 Nov 2023 19:47:37 +0000 (13:47 -0600)]
ifconfig: add -D option to print driver name for interface
Add -D option to add the drivername and unit number to ifconfig output
for normal display, including -a. Use ifconfig_get_orig_name() from
libifconfig to fetch the name. Note that this is the original name
for many drivers, but not for some exceptions like epair (which appends
'a' or 'b' to the unit number). epair interface pairs both display
as "epair0", etc. Make -v imply -D; might as well be fully verbose.
Mark Johnston [Tue, 28 Nov 2023 19:35:49 +0000 (14:35 -0500)]
ossl: Fix handling of separate AAD buffers in ossl_aes_gcm()
Consumers may optionally provide a reference to a separate buffer
containing AAD, but ossl_aes_gcm() didn't handle this and would thus
compute an incorrect digest.
Fixes: 9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
Reviewed by: jhb
MFC after: 3 days
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D42736
Linux 6.6 compat: fix configure error with clang (#15558)
With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.
Signed-off-by: Jaron Kent-Dobias <jaron@kent-dobias.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Rob N [Tue, 28 Nov 2023 17:53:04 +0000 (04:53 +1100)]
dmu_buf_will_clone: fix race in transition back to NOFILL
Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.
This is eventually corrected when the dbuf state is made NOFILL, and
dbuf_noread() called (which clears out the old data), but at this point
it's too late, because the lock was already dropped with that invalid
state.
Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.
Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.
This commit fixes all this by properly cleaning up the previous state
and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the
lock is down.
Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15566
Closes #15526
It is similar to VOP_GETWRITEMOUNT(), and for given vnode vp should
return the lower vnode which would actually handle write to vp.
Flags allow to specify FREAD or FWRITE for benefit of possible unionfs
implementation.
Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603
EVFILT_TIMER: initialize stop timer list in type-stable proc init, instead of fork
Since kqueue timer may exist after the process that created it exited
(same scenario with rfork(2) as in PR 275286), make the tailq
p_kqtim_stop accessed by filt_timerdetach() type-stable.
Noted and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42777
Revert "kqueue: on process exit, force-clear its registered signal events"
This reverts commit 393ac29f0b8be068c8e46f76c2eeee07d20ea4df. A
different fix is following, which preserves the semantics required by the
sys.kqueue.proc3_test.proc3 test.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
PR: 275286
Differential revision: https://reviews.freebsd.org/D42777
Matthew Ahrens [Tue, 28 Nov 2023 17:20:48 +0000 (09:20 -0800)]
unnecessary alloc/free in dsl_scan_visitbp()
Clean up code in dsl_scan_visitbp() by removing an unnecessary
alloc/free and `goto`. This has the side benefit of reducing CPU usage,
which is only really noticeable if we are not doing i/o for the leaf
blocks, like when `zfs_no_scrub_io` is set.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #15549
Brooks Davis [Mon, 27 Nov 2023 17:07:06 +0000 (17:07 +0000)]
memfd_create: don't allocate heap memory
Rather than calling calloc() to allocate space for a page size array to
pass to getpagesizes(), just follow the getpagesizes() implementation
and allocate MAXPAGESIZES elements on the stack. This avoids the need
for the allocation.
While this does mean that a new libc is required to take advantage of a
new huge page size, that was already true due to getpagesizes() using a
static buffer of MAXPAGESIZES elements.
Brooks Davis [Mon, 27 Nov 2023 17:06:33 +0000 (17:06 +0000)]
memfd_create: move implementation to libc/gen
Due to memfd_create(3)'s construction of a path to pass to shm_open2(2),
it has a much larger than typical dependency footprint for a system
call wrapper (the list currently includes calloc, memset, sprintf, and
strlen). As such, split it off into its own file under libc/gen to
lighten libc/sys's dependency list.
Brooks Davis [Mon, 27 Nov 2023 17:06:25 +0000 (17:06 +0000)]
getpagesize(3): drop support for non-ELF kernels
AT_PAGESZ was introduced with ELF support in 1996 (commit e1743d02cd14069f69a50bb8a6c626c1c6f47ddd) so we can safely count on
being able to use it to get our page size via elf_aux_info(). As such
we don't need a fallback sysctl query.
Save a few bytes of bss by dropping caching as elf_aux_info() runs
in constant time for a given query.
Brooks Davis [Mon, 27 Nov 2023 17:06:01 +0000 (17:06 +0000)]
getpagesizes(3): drop support for kernels before 9.0
AT_PAGESIZES and elf_aux_info were added prior to FreeBSD 9.0 in commit ee235befcb8253fab9beea27b916f1bc46b33147. It's safe to say that a
FreeBSD 15 libc won't work on an 8.x kernel so drop sysctl fallback.
Rob N [Tue, 28 Nov 2023 17:07:57 +0000 (04:07 +1100)]
dnode_is_dirty: check dnode and its data for dirtiness
Over its history the dirty dnode test has been changed between
checking for a dnode being on `os_dirty_dnodes` (`dn_dirty_link`) and
`dn_dirty_record`.
de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
2531ce372 Revert "Report holes when there are only metadata changes"
ec4f9b8f3 Report holes when there are only metadata changes
454365bba Fix dirty check in dmu_offset_next()
66aca2473 SEEK_HOLE should not block on txg_wait_synced()
In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added. Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.
The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.
Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.
This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.
This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage. It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15571
Closes #15526
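The window described in the dnode_is_dirty commit can be modelled in a few lines of Python. This is a toy sketch, not OpenZFS code: ToyDnode and the two predicate functions are hypothetical, and the two fields merely stand in for the `dn_dirty_link` and `dn_dirty_record` indicators named in the message.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDnode:
    on_dirty_list: bool = False                          # stands in for dn_dirty_link
    dirty_records: list = field(default_factory=list)    # stands in for dn_dirty_record


def is_dirty_link_only(dn):
    """Earlier style of check: looks at a single dirty indicator."""
    return dn.on_dirty_list


def is_dirty_both(dn):
    """Check after the fix: anything dirty at all counts."""
    return dn.on_dirty_list or bool(dn.dirty_records)


# The problematic window: the dnode part is undirtied, records are still dirty.
dn = ToyDnode(on_dirty_list=False, dirty_records=["append to block 0"])
```

In that state is_dirty_link_only(dn) reports a clean dnode, which in the real code leads to the `ENXIO` result from `lseek()`, while is_dirty_both(dn) reports it as dirty so the caller waits for the sync.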
The vm_guest variable readily covers all uses of xen_domain_type, so
merge them together. Since support for PV domains has been removed
hard-code xen_pv_domain() to return false.
Elliott Mitchell [Wed, 10 Nov 2021 01:18:37 +0000 (17:18 -0800)]
xen/intr: remove xenpci headers
These were needed in the past, but since then the interrupt code has
been successfully isolated from the Xen/PCI code. As such, this is a bit of
straightforward cleanup.
While otherwise a handy potential approach, getting the trapframe via the
argument isn't documented and isn't supposed to be used. While
ipi_bitmap_handler() and ipi_swi_handler() need to be passed the
trapframe as their arguments, the Xen functions can retrieve it from
curthread->td_intr_frame, which is the proper way.
A precursor to merging them. The spacing differs quite a bit between
the i386 and amd64 hypercall headers, despite very similar content.
Consistently use tabs instead of spaces.
Gleb Smirnoff [Tue, 28 Nov 2023 04:10:52 +0000 (20:10 -0800)]
netgraph: increase size of sockaddr_ng to match maximum node name
The ng_socket(4) node already writes more than declared size of the
struct at least in ng_getsockaddr(). Make size match size of
a node name. The value is pasted instead of including ng_message.h
into ng_socket.h. This is external API and we want to keep it stable
even if NG_NODESIZ is redefined in a kernel build.
rmacklem [Tue, 28 Nov 2023 00:31:03 +0000 (16:31 -0800)]
FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible
Call vfs_exjail_clone() for mounts created under .zfs/snapshot
to fill in the mnt_exjail field for the mount. If this is not
done, the snapshots under .zfs/snapshot will not be accessible
over NFS.
This version has the argument name in vfs.h fixed to match that
of the name in spl_vfs.c, although it really does not matter.
External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #15563
Warner Losh [Mon, 27 Nov 2023 22:40:40 +0000 (15:40 -0700)]
pmbr: Only load the first 545k rather than error out
It would be nice to have larger boot partitions for ESPs to live in one
day. It's trivial to carve out 5M 10M or 200M when provisioning, but
logistical issues may make it hard to do it after the fact. So only warn
when the partition is > 545k. If we ever grow the boot loader larger
than that, then it will be responsible for loading the rest anyway.
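The new behaviour is a clamp plus a warning, sketched here in Python. The real pmbr is boot code, so this is only a model: bytes_to_load and LOAD_LIMIT are hypothetical names, and reading "545k" as 545 KiB is an assumption.

```python
LOAD_LIMIT = 545 * 1024   # assumes "545k" means 545 KiB


def bytes_to_load(partition_size):
    """Return (number of bytes to read, whether a warning is due)."""
    if partition_size > LOAD_LIMIT:
        return LOAD_LIMIT, True    # load only the first part and warn
    return partition_size, False   # small partition: load all of it


small = bytes_to_load(100 * 1024)
large = bytes_to_load(200 * 1024 * 1024)
```

A 200M partition therefore no longer causes an error: the first LOAD_LIMIT bytes are read and the loader is expected to fetch anything beyond that itself.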
Warner Losh [Mon, 27 Nov 2023 18:48:39 +0000 (11:48 -0700)]
cdefs: Remove __func__ stub.
Redo 17a238a15fbe. Remove the __func__ crutch for gcc 2.95 and earlier.
We don't need it today to build the tree (since gcc < 12 is unlikely to
work). And it's not used in any system header that's part of the
standard interfaces today (so we don't need it for compatibility). And
we have other issues that make gcc < 4.2 unlikely to work today with
system headers.
Akash B [Mon, 27 Nov 2023 21:41:58 +0000 (03:11 +0530)]
zdb: Fix zdb '-O|-r' options with -e/exported zpool
zdb with '-e' or exported zpool doesn't work along with
'-O' and '-r' options as we process them before '-e' has
been processed.
Below errors are seen:
~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory
~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p.
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory
We need to make sure to process '-O|-r' options after the
'-e' option has been processed, which imports the pool to
the namespace if it's not in the cachefile.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15532
Colin Percival [Mon, 27 Nov 2023 21:29:05 +0000 (13:29 -0800)]
Makefile.vm: Fix duplicate rc.conf files
Two bugs in Makefile.vm resulted in disk images being "built" multiple
times, resulting in lines added to /etc/rc.conf being duplicated:
1. The vm-image target reused the same "staging tree" directory for all
of its builds (multiple disk image types and multiple filesystem types).
2. The cw-type-flavour-fs target depends on emulator-portinstall, which
did not have a 'touch ${.TARGET}' and thus re-ran every time -- and
caused the cw-type-flavour-fs target to be re-run. This was triggered
by release builds running `make cloudware-release` (creating the disk
images) followed by `make ec2amis` (which re-created the disk images
prior to uploading them).
Rob Norris [Sat, 18 Nov 2023 10:33:45 +0000 (21:33 +1100)]
zdb: show BRT statistics and dump its contents
Same idea as the dedup stats, but for block cloning.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
Rob Norris [Sat, 18 Nov 2023 10:32:16 +0000 (21:32 +1100)]
brt: lift internal definitions into _impl header
So that zdb (and others!) can get at the BRT on-disk structures.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
Tony Hutter [Mon, 27 Nov 2023 21:24:37 +0000 (13:24 -0800)]
ZTS: Fix zfs_load-key failures on F39
The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function. This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15534
Closes #15550
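For reference, the general shape of the ssl.wrap_socket to ssl.SSLContext migration looks roughly like the following. This is a generic sketch, not the code from the ZTS test: the helper names make_server_context and wrap_server_socket and their parameters are hypothetical. ssl.wrap_socket() has been deprecated since Python 3.7 and was removed in Python 3.12.

```python
import ssl


def make_server_context(certfile=None, keyfile=None):
    """Build a TLS server context; load a certificate chain if one is given."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if certfile is not None:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


def wrap_server_socket(sock, certfile, keyfile=None):
    """Stand-in for the old ssl.wrap_socket(sock, certfile=..., server_side=True)."""
    ctx = make_server_context(certfile, keyfile)
    return ctx.wrap_socket(sock, server_side=True)
```

A caller that previously passed an HTTP server's listening socket to ssl.wrap_socket() would pass it to wrap_server_socket() instead, with the same certificate file.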
AllKind [Mon, 27 Nov 2023 21:17:48 +0000 (22:17 +0100)]
zfs-dkms: fix shell-init error message
If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs no longer exists.
Resolve this by leaving the directory earlier on.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15576
Alexander Motin [Mon, 27 Nov 2023 21:16:59 +0000 (16:16 -0500)]
ZVOL: Minor code cleanup
- Remove zsda_tx field, it is used only once.
- Remove unneeded string lengths checks, all names are terminated.
- Replace few explicit MAXNAMELEN usages with sizeof().
- Change dsname from MAXNAMELEN to ZFS_MAX_DATASET_NAME_LEN, as
expected by dsl_dataset_name(). Both are 256 bytes now, but it is
better to be safe.
This should have no functional difference.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15535
Gleb Smirnoff [Mon, 27 Nov 2023 21:15:58 +0000 (13:15 -0800)]
tests: don't run atf_* in a subshell
Shell limitation is that a classic function call via $() is a subshell
and atf-sh(3) commands won't work as expected there. Consequently,
atf_skip inside a function won't skip a test. The test will fail later.
A working approach is to pass the desired variable name as an argument to
a function and avoid running a subshell.
Alan Somers [Mon, 27 Nov 2023 20:58:03 +0000 (13:58 -0700)]
FreeBSD: Fix the build on FreeBSD 12
It was broken for several reasons:
* VOP_UNLOCK lost an argument in 13.0. So OpenZFS should be using
VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in.
* The location of the zlib header moved in 13.0 and 12.1. We can drop
support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0. OpenZFS change 9d0887402ba
assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b037915
assumed 13.0 or later.
Sponsored-by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #15551
Luiz Amaral [Mon, 27 Nov 2023 15:53:27 +0000 (16:53 +0100)]
pfctl: Fix recursive printing of anchor labels
We recently noticed that the recursive printing of labels wasn't working
like the recursive printing of rules.
When running pfctl -sr -a* we get a listing of all rules, including the
ones inside anchors. On the other hand, when running pfctl -sl -a*, it
would only print the labels in the root level, just like without the
-a* argument.
As in our use-case we are interested in labels only and our labels are
unique even between anchors, we didn't add indentation or hierarchy to
the printing.
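The recursive, flat listing can be pictured with a small Python model. This is a sketch, not pfctl code: the nested-dictionary layout, the label strings and the function collect_labels are all hypothetical.

```python
def collect_labels(anchor):
    """Return labels of this anchor followed by those of all nested anchors."""
    labels = list(anchor.get("labels", []))
    for child in anchor.get("anchors", {}).values():
        labels.extend(collect_labels(child))   # descend into sub-anchors
    return labels


# Hypothetical ruleset: a root level with one anchor nested two levels deep.
ruleset = {
    "labels": ["root-in"],
    "anchors": {
        "a": {
            "labels": ["a-web"],
            "anchors": {"b": {"labels": ["b-ssh"]}},
        },
    },
}

all_labels = collect_labels(ruleset)
```

The result is a single flat list with no indentation or hierarchy, which mirrors the choice described in the commit message for labels that are unique across anchors.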
Alexander Motin [Wed, 22 Nov 2023 18:15:32 +0000 (13:15 -0500)]
ZIL: Refactor TX_WRITE encryption similar to TX_CLONE_RANGE
It should be a purely textual change to make the code more readable.
Should cause no functional difference.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513
Alexander Motin [Sun, 19 Nov 2023 01:01:03 +0000 (20:01 -0500)]
ZIL: Do not encrypt block pointers in lr_clone_range_t
In case of crash cloned blocks need to be claimed on pool import.
It is only possible if they (lr_bps) and their count (lr_nbps) are
not encrypted but only authenticated, similar to block pointer in
lr_write_t. A few other fields can be and are still encrypted.
This should fix panic on ZIL claim after crash when block cloning
is actively used.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513
Don Brady [Mon, 27 Nov 2023 17:46:38 +0000 (10:46 -0700)]
Don't allow attach to a raidz child vdev
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15536
Closes #15564