FreeBSD/FreeBSD.git
6 months ago zfs: merge openzfs/zfs@a03ebd9be
Martin Matuska [Wed, 29 Nov 2023 22:07:33 +0000 (23:07 +0100)]
zfs: merge openzfs/zfs@a03ebd9be

Notable upstream pull request merges:
 #15517 2a27fd411 ZIL: Assert record sizes in different places
 #15557 b94ce4e17 module/icp/asm-arm/sha2: fix compiling on armv5/6
 #15557 4340f69be module/icp/asm-arm/sha2: auto detect __ARM_ARCH
 #15603 a03ebd9be ZIL: Call brt_pending_add() replaying TX_CLONE_RANGE
 #15606 1c38cdfe9 zdb: fix printf() length for uint64_t devid

Obtained from: OpenZFS
OpenZFS commit: a03ebd9beec6243682557fa692c12b1061fc58bd

6 months ago tail: Clean up error messages.
Dag-Erling Smørgrav [Wed, 29 Nov 2023 21:48:57 +0000 (22:48 +0100)]
tail: Clean up error messages.

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D42842

6 months ago tail: Fix heap overflow in -F case.
Dag-Erling Smørgrav [Wed, 29 Nov 2023 21:48:50 +0000 (22:48 +0100)]
tail: Fix heap overflow in -F case.

The number of events we track can vary over time, but we only allocate
enough space for the exact number of events we are tracking when we
first begin, resulting in a trivially reproducible heap overflow.  Fix
this by allocating enough space for the greatest possible number of
events (two per file) and clean up the code a bit.

Also add a test case which triggers the aforementioned heap overflow,
although we don't currently have a way to detect it.
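The sizing rule described above can be sketched as follows. This is a minimal illustration, not the actual tail(1) code: `struct sketch_event` and `sketch_alloc_events()` are hypothetical names, and the only fact taken from the commit message is the worst case of two events per file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-event record that tail -F tracks. */
struct sketch_event {
	int fd;
	int kind;
};

/*
 * Allocate room for the worst case (two events per file) up front,
 * instead of only the number of events active at startup.
 * Returns NULL on overflow or allocation failure.
 */
struct sketch_event *
sketch_alloc_events(size_t nfiles, size_t *capacity)
{
	if (nfiles > SIZE_MAX / 2)
		return (NULL);
	*capacity = 2 * nfiles;
	return (calloc(*capacity, sizeof(struct sketch_event)));
}
```

Sizing for the maximum once means the array never has to grow when the set of active events changes later.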

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: allanjude, markj
Differential Revision: https://reviews.freebsd.org/D42839

6 months ago iwlwififw: add firmware for the Bz/B200 chipset
Bjoern A. Zeeb [Wed, 29 Nov 2023 21:33:23 +0000 (21:33 +0000)]
iwlwififw: add firmware for the Bz/B200 chipset

The iwlwifi driver already supports the chipset as "Bz TBD"
(also in 14.0).  Add the firmware for it.  Successfully tested
for 0x8086/0x272b/0x8086/0x00f4 on arm64 thanks to donated
hardware [1].

Firmware was obtained from linux-firmware at
9552083a783e5e48b90de674d4e3bf23bb855ab0.

Sponsored by: The FreeBSD Foundation
Sponsored by: Martin Hoehne / minipci.biz (B200 card) [1]
MFC after: 3 days

6 months ago linuxkpi: Include <linux/rbtree.h> from <linux/hrtimer.h> and <linux/mm_types.h>
Jean-Sébastien Pédron [Wed, 29 Nov 2023 18:38:54 +0000 (19:38 +0100)]
linuxkpi: Include <linux/rbtree.h> from <linux/hrtimer.h> and <linux/mm_types.h>

[Why]
Some files in DRM rely on this indirect include to use `struct rb_*`.

Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D42835

6 months ago vt(4): Call post-switch callback after replacing the backend
Jean-Sébastien Pédron [Wed, 29 Nov 2023 18:34:48 +0000 (19:34 +0100)]
vt(4): Call post-switch callback after replacing the backend

[Why]
For instance, it gives a chance to the new backend to refresh the
screen. This is needed by the vt_drmfb backend and `drm_fb_helper`.

This change was lost when I posted changes to reviews.freebsd.org and it
broke the amdgpu driver. Thanks to manu@ for reporting the problem
and to wulf@ for finding the missing change!

Tested by: manu
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D42834

6 months ago ZIL: Call brt_pending_add() replaying TX_CLONE_RANGE
Alexander Motin [Wed, 29 Nov 2023 18:51:34 +0000 (13:51 -0500)]
ZIL: Call brt_pending_add() replaying TX_CLONE_RANGE

zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means on actual
replay we must take additional references, which would stay in BRT.

Without this, blocks could be freed prematurely when either the original
file or its clone is destroyed.  I've observed BRT being emptied
and the feature being deactivated after ZIL replay completion, which
should not have happened.  With the patch I see expected stats.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15603

6 months ago x86: Support multiple PCI MCFG regions
John Baldwin [Wed, 29 Nov 2023 18:32:39 +0000 (10:32 -0800)]
x86: Support multiple PCI MCFG regions

In particular, this enables support for PCI config access for domains
(segments) other than 0.
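As a rough illustration of what per-domain config access involves, the sketch below looks up an MCFG region by (domain, bus) and computes a memory-mapped config address using the standard ECAM layout (1 MiB per bus, 32 KiB per slot, 4 KiB per function). It is not the kernel's implementation: the struct, the function name, and the convention that `base` is the address where the region's first bus begins are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one MCFG region; not the kernel's struct. */
struct sketch_mcfg_region {
	uint16_t domain;	/* PCI segment */
	uint8_t	 minbus;
	uint8_t	 maxbus;
	uint64_t base;		/* assumed: address where minbus starts */
};

/*
 * Find the region covering (domain, bus) and compute the ECAM address of
 * a config register.  Returns 0 if no region matches.
 */
uint64_t
sketch_ecam_addr(const struct sketch_mcfg_region *r, size_t n,
    uint16_t domain, uint8_t bus, uint8_t slot, uint8_t func, uint16_t reg)
{
	for (size_t i = 0; i < n; i++) {
		if (r[i].domain != domain || bus < r[i].minbus ||
		    bus > r[i].maxbus)
			continue;
		return (r[i].base +
		    ((uint64_t)(bus - r[i].minbus) << 20) +
		    ((uint64_t)(slot & 0x1f) << 15) +
		    ((uint64_t)(func & 0x7) << 12) +
		    (uint64_t)(reg & 0xfff));
	}
	return (0);
}
```

With a single region, only domain 0 can ever match; supporting several regions is what makes lookups for other domains succeed.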

Reported by: cperciva
Tested by: cperciva (m7i.metal-48xl AWS instance)
Reviewed by: imp
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D42828

6 months ago x86: Refactor pcie_cfgregopen
John Baldwin [Wed, 29 Nov 2023 18:32:16 +0000 (10:32 -0800)]
x86: Refactor pcie_cfgregopen

Split out some bits of pcie_cfgregopen that only need to be executed
once into helper functions in preparation for supporting multiple MCFG
entries.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D42829

6 months ago pci_cfgreg: Add a PCI domain argument to the low-level register API
John Baldwin [Wed, 29 Nov 2023 18:31:47 +0000 (10:31 -0800)]
pci_cfgreg: Add a PCI domain argument to the low-level register API

This commit changes the API of pci_cfgreg(read|write) to add a domain
argument (referred to as a segment in ACPI parlance) (note that this
is not the same as a NUMA domain, but something PCI-specific).  This
does not yet enable access to domains other than 0, but updates the
API to support domains.

Places that use hard-coded bus/slot/function addresses have been
updated to hardcode a domain of 0.  A few places that have the PCI
domain (segment) available such as the acpi_pcib_acpi.c Host-PCI
bridge driver pass the PCI domain.

The hpt27xx(4) and hptnr(4) drivers fail to attach to a device not on
domain 0 since they provide APIs to their binary blobs that only
permit bus/slot/function addressing.

The x86 non-ACPI PCI bus drivers all hardcode a domain of 0 as they do
not support multiple domains.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D42827

6 months ago agp_amd64: Use <machine/pci_cfgreg.h> rather than bare prototypes
John Baldwin [Wed, 29 Nov 2023 18:31:16 +0000 (10:31 -0800)]
agp_amd64: Use <machine/pci_cfgreg.h> rather than bare prototypes

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D42826

6 months ago Fix zoneid when USER_NS is disabled
Wraithh [Wed, 29 Nov 2023 17:55:17 +0000 (19:55 +0200)]
Fix zoneid when USER_NS is disabled

getzoneid() should return GLOBAL_ZONEID instead of 0 when USER_NS is disabled.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ilkka Sovanto <github@ilkka.kapsi.fi>
Closes #15560

6 months ago ZTS: get_persistent_disk_name can return truncated names
VaibhavB [Wed, 29 Nov 2023 17:34:29 +0000 (23:04 +0530)]
ZTS: get_persistent_disk_name can return truncated names

Instead of using only the 3rd element return the entire string after
the split to handle device names with dashes.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com>
Closes #15567

6 months ago zdb: fix printf() length for uint64_t devid
Martin Matuška [Wed, 29 Nov 2023 17:18:30 +0000 (18:18 +0100)]
zdb: fix printf() length for uint64_t devid

Bug introduced in 213d6829673.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15606

6 months ago pf: fix mem leaks upon vnet destroy
Igor Ostapenko [Wed, 29 Nov 2023 12:35:41 +0000 (13:35 +0100)]
pf: fix mem leaks upon vnet destroy

Add missing cleanup actions:
- remove user defined anchor rulesets
- remove user defined ether anchor rulesets
- remove tables linked to user defined anchors
- deal with wildcard anchor peculiarities to get them removed correctly

PR: 274310
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42747

6 months ago ossl: Keep mutable AES-GCM state on the stack
Mark Johnston [Wed, 29 Nov 2023 17:51:55 +0000 (12:51 -0500)]
ossl: Keep mutable AES-GCM state on the stack

ossl(4)'s AES-GCM implementation keeps mutable state in the session
structure, together with the key schedule.  This was done for
convenience, as both are initialized together.  However, some OCF
consumers, particularly ZFS, assume that requests may be dispatched to
the same session in parallel.  Without serialization, this results in
incorrect output.

Fix the problem by explicitly copying per-session state onto the stack
at the beginning of each operation.
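The pattern described above can be sketched in a few lines. This is an illustration only: the structures and `sketch_process()` are hypothetical, and the "cipher" is a toy XOR rather than AES-GCM. What it shows is that the session stays read-only while each request mutates a private copy on its own stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-session data: read-only once the session is set up. */
struct sketch_session {
	uint8_t	 key_schedule[16];
	uint32_t initial_counter;
};

/* Hypothetical mutable per-request state. */
struct sketch_gcm_state {
	uint32_t counter;
	uint8_t	 tag[4];
};

/*
 * Each request works on its own stack copy, so parallel requests on the
 * same session never write to shared memory.  The cipher is a toy XOR.
 */
void
sketch_process(const struct sketch_session *s, const uint8_t *in,
    uint8_t *out, size_t len, uint8_t tag[4])
{
	struct sketch_gcm_state st;	/* lives on this caller's stack */

	st.counter = s->initial_counter;
	memset(st.tag, 0, sizeof(st.tag));
	for (size_t i = 0; i < len; i++) {
		out[i] = in[i] ^ s->key_schedule[st.counter % 16];
		st.tag[i % 4] ^= out[i];
		st.counter++;
	}
	memcpy(tag, st.tag, sizeof(st.tag));
}
```

Because the session pointer is `const`, two calls with the same input produce the same output regardless of ordering, which is the property the parallel consumers rely on.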

PR: 275306
Reviewed by: jhb
Fixes: 9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42783

6 months ago openzfs: unbreak 32-bit builds.
Warner Losh [Wed, 29 Nov 2023 15:26:29 +0000 (08:26 -0700)]
openzfs: unbreak 32-bit builds.

32-bit builds are broken. Fix that by using PRIu64 instead of a
bare '%lu'.

Feel free to revert when upstream has this fixed. I'm agnostic as to the
proper fix, but don't have the time to fight upstreaming this on top of
everything else.
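For readers unfamiliar with the idiom, here is a small sketch of the portable form. The function name and the "devid=" prefix are illustrative, not taken from zdb. "%lu" only matches uint64_t on ABIs where long is 64 bits wide, whereas PRIu64 from <inttypes.h> expands to the correct conversion everywhere.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit id portably.  PRIu64 is a string literal, so it is
 * concatenated with the rest of the format at compile time.
 */
int
sketch_format_devid(char *buf, size_t len, uint64_t devid)
{
	return (snprintf(buf, len, "devid=%" PRIu64, devid));
}
```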

6 months ago zfsd: fault disks that generate too many I/O delay events
Alan Somers [Wed, 12 Jul 2023 20:46:27 +0000 (14:46 -0600)]
zfsd: fault disks that generate too many I/O delay events

If ZFS reports that a disk had at least 8 I/O operations over 60s that
were each delayed by at least 30s (implying a queue depth > 4 or I/O
aggregation, obviously), fault that disk.  Disks that respond this
slowly can degrade the entire system's performance.
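One way to express the "at least 8 events within 60s" rule is a small ring buffer of timestamps, sketched below. This is not zfsd's actual logic: the names are hypothetical, timestamps are assumed to be non-decreasing, and treating a span of exactly 60 seconds as inside the window is a choice made by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_MAX_EVENTS	8	/* threshold from the commit message */
#define	SKETCH_WINDOW_SECS	60	/* window from the commit message */

/* Ring buffer of the most recent delay-event timestamps (seconds). */
struct sketch_delay_tracker {
	int64_t	 stamps[SKETCH_MAX_EVENTS];
	unsigned count;			/* total events recorded so far */
};

/*
 * Record one delay event at time `now` and report whether the last
 * SKETCH_MAX_EVENTS events all fall within SKETCH_WINDOW_SECS.
 */
bool
sketch_record_delay(struct sketch_delay_tracker *t, int64_t now)
{
	int64_t oldest;

	t->stamps[t->count % SKETCH_MAX_EVENTS] = now;
	t->count++;
	if (t->count < SKETCH_MAX_EVENTS)
		return (false);
	/* The next slot to be overwritten holds the oldest retained stamp. */
	oldest = t->stamps[t->count % SKETCH_MAX_EVENTS];
	return (now - oldest <= SKETCH_WINDOW_SECS);
}
```

A caller would fault the disk the first time this returns true.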

MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: delphij
Differential Revision: https://reviews.freebsd.org/D42825

6 months ago mpi3mr: Minor tweak to task queue pausing
Warner Losh [Wed, 29 Nov 2023 01:50:57 +0000 (18:50 -0700)]
mpi3mr: Minor tweak to task queue pausing

Use a while loop with cancel / drain to make sure that all tasks have
completed before proceeding to reset.

Suggested by: jhb
Sponsored by: Netflix

6 months ago mpi3mr: Assume dma_hiaddr is BUS_SPACE_MAXADDR
Warner Losh [Wed, 29 Nov 2023 01:50:52 +0000 (18:50 -0700)]
mpi3mr: Assume dma_hiaddr is BUS_SPACE_MAXADDR

No sense having a variable for this. So use BUS_SPACE_MAXADDR and remove
dma_hiaddr from softc.

Suggested by: jhb
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42808

6 months ago mpi3mr: Replace can't happen DataLength == 0 with an assert
Warner Losh [Wed, 29 Nov 2023 01:50:47 +0000 (18:50 -0700)]
mpi3mr: Replace can't happen DataLength == 0 with an assert

Replace the test for DataLength == 0 with an assert. It can't happen,
but an assert doesn't hurt. Emacs removed some trailing white space too.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42807

6 months ago mpi3mr: Use template for main busdma tag.
Alexander Motin [Wed, 29 Nov 2023 01:50:39 +0000 (18:50 -0700)]
mpi3mr: Use template for main busdma tag.

Use the simpler template code for the parent busdma tag for all I/O to
this card.

Reviewed by: mav, jhb, imp
Differential Revision: https://reviews.freebsd.org/D42607

6 months ago mpi3mr: Make these bus_dmamap_load calls synchronous
Alexander Motin [Wed, 29 Nov 2023 01:50:30 +0000 (18:50 -0700)]
mpi3mr: Make these bus_dmamap_load calls synchronous

These calls "should" all be synchronous. No bouncing is
needed for them (at least in the typical case that we have a sane card
that has more bits of dma addresses decoded than we have memory), so
no errors are possible. Ensure these calls are really synchronous
with BUS_DMA_NOWAIT flags (which should never fail now that the
bus_dmamem_alloc() has succeeded).

Reviewed by: mav, jhb, imp
Differential Revision: https://reviews.freebsd.org/D42606

6 months ago mpi3mr: Fix MAXPHYS usage
Alexander Motin [Wed, 29 Nov 2023 01:50:24 +0000 (18:50 -0700)]
mpi3mr: Fix MAXPHYS usage

This usage is obsolete. Replace with maximum bus space size. maxphys
will sort itself out at higher levels.

Reviewed by: mav, jhb, imp
Differential Revision: https://reviews.freebsd.org/D42605

6 months ago mpi3mr: Add firmware version
Warner Losh [Wed, 29 Nov 2023 01:50:10 +0000 (18:50 -0700)]
mpi3mr: Add firmware version

Publish the firmware version on the card like we do for mps/mpr.

Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D42588

6 months ago mpi3mr: Trivial trailing white space reduction
Warner Losh [Wed, 29 Nov 2023 01:49:56 +0000 (18:49 -0700)]
mpi3mr: Trivial trailing white space reduction

Sponsored by: Netflix

6 months ago mpi3mr: Honor the dma mask from IOCFacts
Warner Losh [Wed, 29 Nov 2023 01:49:49 +0000 (18:49 -0700)]
mpi3mr: Honor the dma mask from IOCFacts

The number of significant bits that are decoded is returned in the flags
field of the IOCFacts structure from the device.  Rather than assume the
worst with a pessimal 32-bit maximum, look at this value and pass it
along to all the dma map creation requests.

A lot of those creations are repetitive and could just inherit from the
base tag if we moved to the templated interface.  This is called out as
desirable future work not done at this time.

In addition, due to a chicken and an egg problem, we have to allocate
some of the maps with a 32-bit loaddr.  These are the ones we need to
read iocfacts.  And they are fine to be so restricted: they are little
used after startup, and when they are used, bouncing is fine.
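The conversion from a count of decoded address bits to a highest usable DMA address can be sketched as below. The function name is hypothetical, and how the bit count is actually encoded in the IOCFacts flags field is not shown here. Falling back to a 32-bit limit for out-of-range counts is a choice made by this sketch, mirroring the pessimal default the commit message mentions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a count of decoded address bits into the highest address the
 * device can reach.  Counts below 32 fall back to the 32-bit limit.
 */
uint64_t
sketch_dma_highaddr(unsigned bits)
{
	if (bits >= 64)
		return (UINT64_MAX);	/* avoid an undefined 64-bit shift */
	if (bits < 32)
		return (UINT32_MAX);
	return ((UINT64_C(1) << bits) - 1);
}
```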

Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D42559

6 months ago mpi3mr: Fix EINPROGRESS errors hanging the card
Warner Losh [Wed, 29 Nov 2023 01:49:39 +0000 (18:49 -0700)]
mpi3mr: Fix EINPROGRESS errors hanging the card

Move enqueueing of commands to bus_dmamap_load_ccb callback

Fix fundamental difference between FreeBSD and Linux. On Linux, your dma
load callback always happens before the load returns, so drivers are
written to load the map, then submit to hardware. On FreeBSD, the callback
may be deferred and the load may return EINPROGRESS. This means the
callback must be responsible for queueing the request to the hardware,
after the SGL list is created. Make a number of interrelated changes:

At the end of mpi3mr_prepare_sgls, add a call to mpi3mr_enqueue_request.

Split the hardware submission out from the end of mpi3mr_action_scsiio
and move it into a new routine mpi3mr_enqueue_request.

Move all error completion from the end of mpi3mr_action_scsiio to where
the error is detected. We cannot pass errors back from the
mpi3mr_enqueue_request to do this on a 'failed' mpi3mr in a centralized
place (since it has to be fire and forget).

Add comments about zero length SGLs never making it into
mpi3mr_prepare_sgls. Keep the code there for the moment, but we only set
cm->data to non-NULL when scsiio_req->DataLength is not zero. So the
datalength can't be zero and we can't send the zero SGLs.

Add comments about other "impossible" tests in mpi3mr_prepare_sgls that
really should be simple asserts of some flavor.

Eliminate cm->error_code, since we can't pass data back from the
mpi3mr_prepare_sgl callback anymore.

In mpi3mr_map_request, call mpi3mr_enqueue_request for the no data case.
This seems to work even though we've not done the special zero length
handling that was in mpi3mr_prepare_sgls, giving further evidence to it
not actually being needed. This is needed for SCSI CDBs that have no
data to pass to the drive like TEST UNIT READY.

With this change, and the prior ones, we're now able to run with mpi3mr
on 128GB systems and very heavy disk load (so many buffers land > 4GB:
the driver instructs busdma to never use memory above 4GB, which may be
too conservative, but an issue for another time).
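The control-flow change described in this commit can be modeled in userspace. The sketch below is not driver code and uses no busdma API: `sketch_dmamap_load()` is a toy loader that either runs the callback immediately or parks it and returns EINPROGRESS, and all names are hypothetical. The point is that submission happens inside the callback, so it works in both the immediate and the deferred case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_cmd {
	bool sgl_ready;
	bool enqueued;
};

typedef void sketch_load_cb(struct sketch_cmd *);

/* One pending deferred load, standing in for a deferral queue. */
static struct sketch_cmd *sketch_pending_cmd;
static sketch_load_cb *sketch_pending_cb;

/* Toy loader: run the callback now, or park it and return EINPROGRESS. */
int
sketch_dmamap_load(struct sketch_cmd *cm, sketch_load_cb *cb, bool defer)
{
	if (defer) {
		sketch_pending_cmd = cm;
		sketch_pending_cb = cb;
		return (EINPROGRESS);
	}
	cb(cm);
	return (0);
}

/* Run the parked callback, as would happen once resources free up. */
void
sketch_run_deferred(void)
{
	if (sketch_pending_cb != NULL) {
		sketch_pending_cb(sketch_pending_cmd);
		sketch_pending_cb = NULL;
	}
}

/* The callback builds the SGL and then submits; the caller never does. */
void
sketch_prepare_sgls(struct sketch_cmd *cm)
{
	cm->sgl_ready = true;
	cm->enqueued = true;	/* stands in for the enqueue call */
}
```

If the caller submitted after the load returned instead, a deferred load would hand the hardware a command whose SGL had not been built yet.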

Sponsored by: Netflix
Reviewed by: sumit.saxena_broadcom.com, mav, jhb
Differential Revision: https://reviews.freebsd.org/D42543

6 months ago mpi3mr: Clean up setting of status in processing scsiio requests
Warner Losh [Wed, 29 Nov 2023 01:49:30 +0000 (18:49 -0700)]
mpi3mr: Clean up setting of status in processing scsiio requests

More uniformly use mpi3mr_set_ccbstatus in mpi3mr_action_scsiio.  The
routine mostly used it, but also has setting of status by hand. In those
cases where we want to error out the request, use this routine.

As part of this, move setting CAM_SIM_QUEUED later in the function to
when we're sure it's been queued. Remove the places we clear it before
this.

Sponsored by: Netflix
Reviewed by: mav, jhb
Differential Revision: https://reviews.freebsd.org/D42542

6 months ago mpi3mr: Only set callout_owned when we create a timeout
Warner Losh [Wed, 29 Nov 2023 01:49:24 +0000 (18:49 -0700)]
mpi3mr: Only set callout_owned when we create a timeout

Since we assume there's a timeout to cancel when this is true, only set
it true when we set the timeout. Otherwise we may try to cancel a timeout
when there's been an error in submission.

Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D42541

6 months ago mpi3mr: Minor style fix
Warner Losh [Wed, 29 Nov 2023 01:49:16 +0000 (18:49 -0700)]
mpi3mr: Minor style fix

Fold two lines to make this more readable.

Sponsored by: Netflix
Reviewed by: mav, jhb
Differential Revision: https://reviews.freebsd.org/D42540

6 months ago mpi3mr: Reduce the scope of the reset_mutext
Warner Losh [Wed, 29 Nov 2023 01:49:08 +0000 (18:49 -0700)]
mpi3mr: Reduce the scope of the reset_mutext

Reduce the scope of reset_mutext to protect the msleep in the watch dog
thread as well as the MPI3MR_FLAGS_SHUTDOWN bit. Use it to protect the
wakeup in mpi3mr_detach so this thread can exit sooner when we're trying
to do an orderly shutdown. Optimize the flow to check the sleep and
other conditions before going to sleep.

It's an open question if this should protect sc->unrecoverable, and if
we should wakeup the watchdog thread when we set it. We might also want
to move to booleans for the three flags that we have now in
mpi3mr_flags. There are a number of U8s that should really be bools and
we might want to also group them together to pack softc better.

Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D42539

6 months ago mpi3mr: Remove unused fields in struct mpi3mr_cmd
Warner Losh [Wed, 29 Nov 2023 01:49:01 +0000 (18:49 -0700)]
mpi3mr: Remove unused fields in struct mpi3mr_cmd

All of these fields are either unused, or just initialized. Remove
them. This saves about 1MB of memory for the cards that I have which can
do 8k transactions at once.

Sponsored by: Netflix
Reviewed by: mav, jhb
Differential Revision: https://reviews.freebsd.org/D42538

6 months ago mpi3mr: Don't hold fwevt_lock over call to taskqueue_drain
Warner Losh [Wed, 29 Nov 2023 01:48:48 +0000 (18:48 -0700)]
mpi3mr: Don't hold fwevt_lock over call to taskqueue_drain

Holding fwevt_lock when we call taskqueue_drain can lead to deadlock
because the queue being drained needs fwevt_lock to do its work: the task
thread will try to take the lock and block, so that thread never
finishes and taskqueue_drain never completes. There's a witness
warning/error for this which was exposed when the lock was converted to
a MTX_DEF lock from a MTX_SPIN prior to committing to the FreeBSD tree.

The lock appears to be to protect against additional items being added
to the event list while we're doing a reset. Since the taskqueue is
blocked, items can get added to the list, but won't be processed during
the reset, but there is still a (likely small) race between the
taskqueue_drain and the taskqueue_block calls where an interrupt could
fire on another CPU, resulting in a task being enqueued and started
before the block can take effect. The only way to fix that race is to
turn off interrupt processing during a reset. So we replace a deadlock
with a smaller race.

Sponsored by: Netflix
Reviewed by: sumit.saxena_broadcom.com, mav, jhb
Differential Revision: https://reviews.freebsd.org/D42537

6 months ago sys/sys: Remove some more vestiges of the $FreeBSD$
Konstantin Belousov [Tue, 28 Nov 2023 22:56:03 +0000 (00:56 +0200)]
sys/sys: Remove some more vestiges of the $FreeBSD$

Reviewed by: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42824

6 months ago netlink: Add tests when adding an interface route
Jose Luis Duran [Tue, 28 Nov 2023 19:58:03 +0000 (14:58 -0500)]
netlink: Add tests when adding an interface route

Add tests for adding a route using an interface only (without an IP
address).

Reviewed by: rcm
Approved by: kp (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41436

6 months ago ZIL: Assert record sizes in different places
Alexander Motin [Tue, 28 Nov 2023 21:35:14 +0000 (16:35 -0500)]
ZIL: Assert record sizes in different places

This should make sure we have log written without overflows.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15517

6 months ago module/icp/asm-arm/sha2: fix compiling on armv5/6
Shengqi Chen [Wed, 22 Nov 2023 14:27:24 +0000 (22:27 +0800)]
module/icp/asm-arm/sha2: fix compiling on armv5/6

The `adr` insn in the neon kernel generates a compile
error on armv5/6 targets. Fix that by using `ldr`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15557

6 months ago module/icp/asm-arm/sha2: auto detect __ARM_ARCH
Shengqi Chen [Wed, 22 Nov 2023 13:58:47 +0000 (21:58 +0800)]
module/icp/asm-arm/sha2: auto detect __ARM_ARCH

This patch uses __ARM_ARCH set by compiler (both
GCC and Clang have this) whenever possible instead
of hardcoding it to 7. This change allows code to
compile on earlier ARM architectures such as armv5te.
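The detection idea can be sketched with the preprocessor. The macro and function names below are hypothetical, and using the result to gate a NEON path is an assumption made for illustration. The fallback value of 7 is the previously hardcoded value mentioned above.

```c
#include <assert.h>

/*
 * Prefer the architecture level the compiler reports; fall back to 7
 * when the macro is not defined (for example on a non-ARM compiler).
 */
#if defined(__ARM_ARCH)
#define	SKETCH_ARM_ARCH	__ARM_ARCH
#else
#define	SKETCH_ARM_ARCH	7
#endif

/* Report the architecture level chosen at compile time. */
int
sketch_arm_arch(void)
{
	return (SKETCH_ARM_ARCH);
}

/* Whether this sketch would select a NEON-capable code path. */
int
sketch_can_use_neon(void)
{
	return (SKETCH_ARM_ARCH >= 7);
}
```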

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15557

6 months ago route: introduce add interface route test cases
R. Christian McDonald [Tue, 28 Nov 2023 18:18:15 +0000 (13:18 -0500)]
route: introduce add interface route test cases

As a followup to D41330 and D41436, this patch introduces two new tests
for sbin/route: interface_route_v[46].

These tests fail without D41330.

Reviewed by: kp
Approved by: kp (mentor)
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")

6 months ago netlink: fix adding an interface route
KUROSAWA Takahiro [Tue, 28 Nov 2023 18:14:50 +0000 (13:14 -0500)]
netlink: fix adding an interface route

"route add <host> -iface <netif>" for a netif without an IPv4/IPv6
address fails with EINVAL. Need to use a link-level ifaddr for gw if
an ifaddr for dst is not found as the rtsock-based implementation does.

PR: 275341
Reported by: Sean Cody <sean@tinfoilhat.ca>
Reviewed by: rcm
Tested by: rcm
Approved by: kp (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41330

6 months ago zfs: merge openzfs/zfs@688514e47
Martin Matuska [Tue, 28 Nov 2023 20:35:02 +0000 (21:35 +0100)]
zfs: merge openzfs/zfs@688514e47

Notable upstream pull request merges:
 #15532 c1a47de86 zdb: Fix zdb '-O|-r' options with -e/exported zpool
 #15535 cf3316633 ZVOL: Minor code cleanup
 #15541 803a9c12c brt: lift internal definitions into _impl header
 #15541 213d68296 zdb: show BRT statistics and dump its contents
 #15543 a49087510 ZIL: Refactor TX_WRITE encryption similar to
                  TX_CLONE_RANGE
 #15543 27d8c23c5 ZIL: Do not encrypt block pointers in lr_clone_range_t
 #15549 67894a597 unnecessary alloc/free in dsl_scan_visitbp()
 #15551 126efb588 FreeBSD: Fix the build on FreeBSD 12
 #15563 acb33ee1c FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are
                  NFS visible
 #15564 7bbd42ef4 Don't allow attach to a raidz child vdev
 #15566 688514e47 dmu_buf_will_clone: fix race in transition back to NOFILL
 #15571 30d581121 dnode_is_dirty: check dnode and its data for dirtiness

Obtained from: OpenZFS
OpenZFS commit: 688514e4704bdee4551d25960febd322ac26f297

6 months ago ifconfig: add -D option to print driver name for interface
Mike Karels [Tue, 28 Nov 2023 19:47:37 +0000 (13:47 -0600)]
ifconfig: add -D option to print driver name for interface

Add -D option to add the drivername and unit number to ifconfig output
for normal display, including -a.  Use ifconfig_get_orig_name() from
libifconfig to fetch the name.  Note that this is the original name
for many drivers, but not for some exceptions like epair (which appends
'a' or 'b' to the unit number).  epair interface pairs both display
as "epair0", etc.  Make -v imply -D; might as well be fully verbose.

MFC after: 1 week
Reviewed by: zlei, kp
Differential Revision: https://reviews.freebsd.org/D42721

6 months ago ossl: Fix handling of separate AAD buffers in ossl_aes_gcm()
Mark Johnston [Tue, 28 Nov 2023 19:35:49 +0000 (14:35 -0500)]
ossl: Fix handling of separate AAD buffers in ossl_aes_gcm()

Consumers may optionally provide a reference to a separate buffer
containing AAD, but ossl_aes_gcm() didn't handle this and would thus
compute an incorrect digest.

Fixes: 9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
Reviewed by: jhb
MFC after: 3 days
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D42736

6 months ago Linux 6.6 compat: fix configure error with clang (#15558)
Jaron Kent-Dobias [Tue, 28 Nov 2023 19:34:40 +0000 (20:34 +0100)]
Linux 6.6 compat: fix configure error with clang (#15558)

With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.

Signed-off-by: Jaron Kent-Dobias <jaron@kent-dobias.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>

6 months ago compiler-rt: remove unnecessary include
Dimitry Andric [Tue, 28 Nov 2023 18:17:36 +0000 (19:17 +0100)]
compiler-rt: remove unnecessary include

This is to sync the code with upstream, see:
https://github.com/llvm/llvm-project/pull/73439#discussion_r1406644942

Fixes: 4c9a0adad182
MFC after: 3 days

6 months ago dmu_buf_will_clone: fix race in transition back to NOFILL
Rob N [Tue, 28 Nov 2023 17:53:04 +0000 (04:53 +1100)]
dmu_buf_will_clone: fix race in transition back to NOFILL

Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.

This is eventually corrected when the dbuf state is made NOFILL, and
dbuf_noread() called (which clears out the old data), but at this point
it's too late, because the lock was already dropped with that invalid
state.

Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.

Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.

This commit fixes all this by properly cleaning up the previous state
and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the
lock is down.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15566
Closes #15526

6 months ago nullfs: do not allow bypass on copy_file_range()
Konstantin Belousov [Sat, 18 Nov 2023 09:23:22 +0000 (11:23 +0200)]
nullfs: do not allow bypass on copy_file_range()

There must be no callers of VOP_COPY_FILE_RANGE() except
vn_copy_file_range(), which does enough to find the write-vnodes where
to call the VOP.

Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603

6 months ago vn_copy_file_range(): provide ENOSYS fallback to vn_generic_copy_file_range()
Konstantin Belousov [Sat, 18 Nov 2023 08:59:19 +0000 (10:59 +0200)]
vn_copy_file_range(): provide ENOSYS fallback to vn_generic_copy_file_range()

Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603

6 months ago vn_copy_file_range(): find write vnodes on which to call the VOP
Konstantin Belousov [Sat, 18 Nov 2023 08:57:44 +0000 (10:57 +0200)]
vn_copy_file_range(): find write vnodes on which to call the VOP

Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603

6 months ago VFS: add VOP_GETLOWVNODE()
Konstantin Belousov [Sat, 18 Nov 2023 08:55:48 +0000 (10:55 +0200)]
VFS: add VOP_GETLOWVNODE()

It is similar to VOP_GETWRITEMOUNT(), and for a given vnode vp should
return the lower vnode which would actually handle writes to vp.
Flags allow specifying FREAD or FWRITE for the benefit of a possible
unionfs implementation.

Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603

6 months ago EVFILT_TIMER: initialize stop timer list in type-stable proc init, instead of fork
Konstantin Belousov [Tue, 28 Nov 2023 15:42:49 +0000 (17:42 +0200)]
EVFILT_TIMER: initialize stop timer list in type-stable proc init, instead of fork

Since a kqueue timer may exist after the process that created it has exited
(same scenario with rfork(2) as in PR 275286), make the tailq
p_kqtim_stop accessed by filt_timerdetach() type-stable.

Noted and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42777

6 months ago EVFILT_SIGNAL: do not use target process pointer on detach
Konstantin Belousov [Tue, 28 Nov 2023 12:51:54 +0000 (14:51 +0200)]
EVFILT_SIGNAL: do not use target process pointer on detach

It is enough to know knlist to remove from it, and the list is
autodestroyed on last removal.

PR: 275286
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42777

6 months ago Revert "kqueue: on process exit, force-clear its registered signal events"
Konstantin Belousov [Tue, 28 Nov 2023 12:32:24 +0000 (14:32 +0200)]
Revert "kqueue: on process exit, force-clear its registered signal events"

This reverts commit 393ac29f0b8be068c8e46f76c2eeee07d20ea4df.  A
different fix is following, which preserves the semantics required by the
sys.kqueue.proc3_test.proc3 test.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
PR: 275286
Differential revision: https://reviews.freebsd.org/D42777

6 months ago unnecessary alloc/free in dsl_scan_visitbp()
Matthew Ahrens [Tue, 28 Nov 2023 17:20:48 +0000 (09:20 -0800)]
unnecessary alloc/free in dsl_scan_visitbp()

Clean up code in dsl_scan_visitbp() by removing an unnecessary
alloc/free and `goto`.  This has the side benefit of reducing CPU usage,
which is only really noticeable if we are not doing i/o for the leaf
blocks, like when `zfs_no_scrub_io` is set.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #15549

6 months ago pst-raid: De-pessimize the building of i386 kernels
Warner Losh [Tue, 28 Nov 2023 17:14:04 +0000 (10:14 -0700)]
pst-raid: De-pessimize the building of i386 kernels

Add include of sys/proc.h

Fixes: c4dacfa7f4b8

6 months agomemfd_create: don't allocate heap memory
Brooks Davis [Mon, 27 Nov 2023 17:07:06 +0000 (17:07 +0000)]
memfd_create: don't allocate heap memory

Rather than calling calloc() to allocate space for a page size array to
pass to getpagesizes(), just follow the getpagesizes() implementation
and allocate MAXPAGESIZES elements on the stack.  This avoids the need
for the allocation.

While this does mean that a new libc is required to take advantage of a
new huge page size, that was already true due to getpagesizes() using a
static buffer of MAXPAGESIZES elements.
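
As a rough sketch of the pattern described above (not the actual libc code), the fixed-size stack array can look like this. EXAMPLE_MAXPAGESIZES and example_getpagesizes() are hypothetical stand-ins for MAXPAGESIZES and getpagesizes(3), and the page sizes are made-up sample values.

```c
#include <stddef.h>

/* Hypothetical stand-in for the system's MAXPAGESIZES constant. */
#define EXAMPLE_MAXPAGESIZES 4

/*
 * Hypothetical stand-in for getpagesizes(3): fills at most nelem entries
 * and returns how many were written.
 */
int
example_getpagesizes(size_t sizes[], int nelem)
{
	static const size_t supported[] = { 4096, 2097152, 1073741824 };
	int n = (int)(sizeof(supported) / sizeof(supported[0]));

	if (nelem < n)
		n = nelem;
	for (int i = 0; i < n; i++)
		sizes[i] = supported[i];
	return (n);
}

/*
 * Check whether a page size is supported.  The array lives on the stack,
 * so there is no calloc()/free() pair and no allocation failure path.
 */
int
example_pagesize_supported(size_t want)
{
	size_t sizes[EXAMPLE_MAXPAGESIZES];
	int n;

	n = example_getpagesizes(sizes, EXAMPLE_MAXPAGESIZES);
	for (int i = 0; i < n; i++)
		if (sizes[i] == want)
			return (1);
	return (0);
}
```

The trade-off is the one the message names: the array bound is fixed at compile time, so a larger set of page sizes needs a rebuilt library.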

Reviewed by: kevans, imp, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D42710

6 months agomemfd_create: move implementation to libc/gen
Brooks Davis [Mon, 27 Nov 2023 17:06:33 +0000 (17:06 +0000)]
memfd_create: move implementation to libc/gen

Due to memfd_create(3)'s construction of a path to pass to shm_open2(2),
it has a much larger than typical dependency footprint for a system
call wrapper (the list currently includes calloc, memset, sprintf, and
strlen).  As such, split it off into its own file under libc/gen to
lighten libc/sys's dependency list.

Reviewed by: kevans, imp, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D42709

6 months agogetpagesize(3): drop support for non-ELF kernels
Brooks Davis [Mon, 27 Nov 2023 17:06:25 +0000 (17:06 +0000)]
getpagesize(3): drop support for non-ELF kernels

AT_PAGESZ was introduced with ELF support in 1996 (commit
e1743d02cd14069f69a50bb8a6c626c1c6f47ddd) so we can safely count on
being able to use it to get our page size via elf_aux_info().  As such
we don't need a fallback sysctl query.

Save a few bytes of bss by dropping caching as elf_aux_info() runs
in constant time for a given query.
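
A minimal sketch of the lookup described above, assuming the elf_aux_info(3) interface from <sys/auxv.h>. example_pagesize() is a hypothetical name, and the sysconf(3) branch exists only so the sketch compiles on other systems; it is not part of the change.

```c
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/auxv.h>	/* elf_aux_info(3), AT_PAGESZ */
#endif

/*
 * Illustrative helper, not the libc implementation.  On FreeBSD it asks
 * the ELF auxiliary vector for AT_PAGESZ, with no cached copy and no
 * sysctl fallback.
 */
int
example_pagesize(void)
{
#ifdef __FreeBSD__
	int value;

	if (elf_aux_info(AT_PAGESZ, &value, sizeof(value)) == 0)
		return (value);
	return (-1);
#else
	/* Non-FreeBSD placeholder so the sketch builds elsewhere. */
	return ((int)sysconf(_SC_PAGESIZE));
#endif
}
```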

Reviewed by: kevans, imp, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D42708

6 months agogetpagesizes(3): drop support for kernels before 9.0
Brooks Davis [Mon, 27 Nov 2023 17:06:01 +0000 (17:06 +0000)]
getpagesizes(3): drop support for kernels before 9.0

AT_PAGESIZES and elf_aux_info were added prior to FreeBSD 9.0 in commit
ee235befcb8253fab9beea27b916f1bc46b33147.  It's safe to say that a
FreeBSD 15 libc won't work on an 8.x kernel, so drop the sysctl fallback.

Reviewed by: kevans, imp, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D42707

6 months agodnode_is_dirty: check dnode and its data for dirtiness
Rob N [Tue, 28 Nov 2023 17:07:57 +0000 (04:07 +1100)]
dnode_is_dirty: check dnode and its data for dirtiness

Over its history the dirty dnode test has been changed between
checking for a dnode being on `os_dirty_dnodes` (`dn_dirty_link`) and
`dn_dirty_record`.

  de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
  2531ce372 Revert "Report holes when there are only metadata changes"
  ec4f9b8f3 Report holes when there are only metadata changes
  454365bba Fix dirty check in dmu_offset_next()
  66aca2473 SEEK_HOLE should not block on txg_wait_synced()

Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9

It turns out both are actually required.

In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added.  Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.

The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.

Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.

This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.

This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage. It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.
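
The shape of the combined check can be sketched as below. This is a simplified illustration, not the OpenZFS code: struct example_dnode and both functions are hypothetical, and the real check works on per-txg fields.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a dnode's dirty state. */
struct example_dnode {
	bool on_dirty_list;	/* dnode itself is linked on the dirty list */
	int dirty_records;	/* outstanding dirty data records */
};

/* Single-indicator check: misses the window where only records are dirty. */
bool
example_is_dirty_link_only(const struct example_dnode *dn)
{
	return (dn->on_dirty_list);
}

/* Combined check: dirty if either indicator says so. */
bool
example_is_dirty(const struct example_dnode *dn)
{
	return (dn->on_dirty_list || dn->dirty_records > 0);
}
```

With on_dirty_list false and dirty_records at 1, the single-indicator check reports clean while the combined check reports dirty, which is the window described above.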

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15571
Closes #15526

6 months agotail.1: Add an example for +n 1
Mateusz Piotrowski [Tue, 28 Nov 2023 16:52:11 +0000 (17:52 +0100)]
tail.1: Add an example for +n 1

MFC after: 3 days
Sponsored by: Klara, Inc.

6 months agotail.1: Lint with mandoc(1)
Mateusz Piotrowski [Tue, 28 Nov 2023 16:10:12 +0000 (17:10 +0100)]
tail.1: Lint with mandoc(1)

MFC after: 3 days
Sponsored by: Klara, Inc.

6 months agoAdd DEBUG_POISON_POINTER
Mateusz Guzik [Tue, 28 Nov 2023 15:23:25 +0000 (15:23 +0000)]
Add DEBUG_POISON_POINTER

If you have a pointer which you know points to stale data, you can
fill it with junk so that a later dereference will trap.
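
A userland sketch of the idea, under stated assumptions: EXAMPLE_POISON_POINTER and EXAMPLE_POISON_VALUE are hypothetical and are not the kernel's definitions, and whether a dereference actually traps depends on the junk address being unmapped.

```c
#include <stdint.h>

/* Hypothetical junk value; the commit message does not give the real one. */
#define EXAMPLE_POISON_VALUE	((uintptr_t)0xdeadc0de)

/* Overwrite a pointer so that stale use stands out instead of half-working. */
#define EXAMPLE_POISON_POINTER(p) do {				\
	(p) = (void *)EXAMPLE_POISON_VALUE;			\
} while (0)

struct example_session {
	int *counter;
};

void
example_session_release(struct example_session *s)
{
	/* The counter's storage is going away; poison the stale pointer. */
	EXAMPLE_POISON_POINTER(s->counter);
}
```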

Reviewed by: kib
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40946

6 months agoxen: remove xen_domain_type enum/variable
Elliott Mitchell [Sat, 6 Aug 2022 16:02:03 +0000 (09:02 -0700)]
xen: remove xen_domain_type enum/variable

The vm_guest variable readily covers all uses of xen_domain_type, so
merge them together.  Since support for PV domains has been removed,
hard-code xen_pv_domain() to return false.

Reviewed by: royger

6 months agoxen/dev: remove __unused from driver argument of identify functions
Elliott Mitchell [Wed, 27 Sep 2023 05:39:22 +0000 (22:39 -0700)]
xen/dev: remove __unused from driver argument of identify functions

The driver argument is most certainly now used by these functions.  When
originally implemented it might have been unused, but not now.

Reviewed by: royger

6 months agoxen/dev: switch to DEVMETHOD_END
Elliott Mitchell [Mon, 11 Sep 2023 21:37:20 +0000 (14:37 -0700)]
xen/dev: switch to DEVMETHOD_END

Switch to the preferred end of the device method table.  These hadn't
been updated previously.

Reviewed by: royger

6 months agoxen/x86: move x86-only variable out of common
Elliott Mitchell [Sun, 14 Feb 2021 06:46:09 +0000 (22:46 -0800)]
xen/x86: move x86-only variable out of common

Commit 27c36a12f15 is an x86-only feature.  As such xen_evtchn_needs_ack
should only exist on x86.

Differential Revision: https://reviews.freebsd.org/D29913
Reviewed by: royger
[royger]: adjust comment.

6 months agoxen/intr: remove xenpci headers
Elliott Mitchell [Wed, 10 Nov 2021 01:18:37 +0000 (17:18 -0800)]
xen/intr: remove xenpci headers

These were needed in the past; since that time the interrupt code has
been successfully isolated from the Xen/PCI code.  As such a bit of
straightforward cleanup.

Differential Revision: https://reviews.freebsd.org/D32923
Reviewed by: royger

6 months agoxen: remove declaration of evtchn_device_upcall()
Elliott Mitchell [Tue, 22 Jun 2021 04:27:55 +0000 (21:27 -0700)]
xen: remove declaration of evtchn_device_upcall()

This function was removed at 5779d8ad577.  Long past time to remove the
declaration to ensure people aren't confused.

Differential Revision: https://reviews.freebsd.org/D30865
Reviewed by: royger

6 months agoxen/apic: remove passing trapframe as argument
Elliott Mitchell [Fri, 8 Oct 2021 21:43:26 +0000 (14:43 -0700)]
xen/apic: remove passing trapframe as argument

While otherwise a handy potential approach, getting the trapframe via the
argument isn't documented and isn't supposed to be used.  While
ipi_bitmap_handler() and ipi_swi_handler() need to be passed the
trapframe as their arguments, the Xen functions can retrieve it from
curthread->td_intr_frame, which is the proper way.

Reviewed by: royger

6 months agoxen/intr: correct misuses of Xen handle pointer type
Elliott Mitchell [Sat, 23 Jul 2022 22:30:45 +0000 (15:30 -0700)]
xen/intr: correct misuses of Xen handle pointer type

Fix a few spots where handle pointers were incorrectly used.  Luckily
these appear rarely triggered given how long they've been lurking.

Fixes: 76acc41fb7c7 ("Implement vector callback for PVHVM and unify event channel implementations")
Fixes: 9f40021f288c ("Introduce a new, HVM compatible, paravirtualized timer driver for Xen.")
MFC after: 2 weeks
Reviewed by: royger

6 months agoxen: correct spacing in hypercall.h headers
Elliott Mitchell [Mon, 7 Mar 2022 23:32:17 +0000 (15:32 -0800)]
xen: correct spacing in hypercall.h headers

A precursor to merging them.  The spacing differs quite a bit between
the i386 and amd64 hypercall headers, despite very similar content.
Consistently use tabs instead of spaces.

Reviewed by: royger

6 months agoRevert "sys/mutex.h: Include sys/lock.h instead of sys/_lock.h"
Emmanuel Vadot [Tue, 28 Nov 2023 08:39:59 +0000 (09:39 +0100)]
Revert "sys/mutex.h: Include sys/lock.h instead of sys/_lock.h"

This reverts commit 2a35f3cdf63d1f9b1ea5ab0174adabb631757210.

Doesn't appear to be needed anymore, and if it is at some point I'll
fix the driver.

6 months agoRevert "sys/mutex.h: Reorder includes"
Emmanuel Vadot [Tue, 28 Nov 2023 08:39:53 +0000 (09:39 +0100)]
Revert "sys/mutex.h: Reorder includes"

This reverts commit 50335b1ae4e48712f831e85ddfa7b00da0af382c.

6 months agong_socket: with getsockname() return node ID for unnamed node
Gleb Smirnoff [Tue, 28 Nov 2023 04:11:38 +0000 (20:11 -0800)]
ng_socket: with getsockname() return node ID for unnamed node

Reviewed by: afedorov
Differential Revision: https://reviews.freebsd.org/D42691

6 months agonetgraph: increase size of sockaddr_ng to match maximum node name
Gleb Smirnoff [Tue, 28 Nov 2023 04:10:52 +0000 (20:10 -0800)]
netgraph: increase size of sockaddr_ng to match maximum node name

The ng_socket(4) node already writes more than the declared size of the
struct, at least in ng_getsockaddr().  Make size match size of
a node name.  The value is pasted instead of including ng_message.h
into ng_socket.h.  This is external API and we want to keep it stable
even if NG_NODESIZ is redefined in a kernel build.

Reviewed by: afedorov
Differential Revision: https://reviews.freebsd.org/D42690

6 months agoFreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible
rmacklem [Tue, 28 Nov 2023 00:31:03 +0000 (16:31 -0800)]
FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible

Call vfs_exjail_clone() for mounts created under .zfs/snapshot
to fill in the mnt_exjail field for the mount.  If this is not
done, the snapshots under .zfs/snapshot will not be accessible
over NFS.

This version has the argument name in vfs.h fixed to match that
of the name in spl_vfs.c, although it really does not matter.

External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #15563

6 months agopmbr: Only load the first 545k rather than error out
Warner Losh [Mon, 27 Nov 2023 22:40:40 +0000 (15:40 -0700)]
pmbr: Only load the first 545k rather than error out

It would be nice to have larger boot partitions for ESPs to live in one
day. It's trivial to carve out 5M, 10M or 200M when provisioning, but
logistical issues may make it hard to do it after the fact. So only warn
when the partition is > 545k. If we ever grow the boot loader larger
than that, then it will be responsible for loading the rest anyway.

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D42774

6 months agocdefs: Remove __func__ stub.
Warner Losh [Mon, 27 Nov 2023 18:48:39 +0000 (11:48 -0700)]
cdefs: Remove __func__ stub.

Redo 17a238a15fbe. Remove the __func__ crutch for gcc 2.95 and earlier.
We don't need it today to build the tree (since gcc < 12 is unlikely to
work). And it's not used in any system header that's part of the
standard interfaces today (so we don't need it for compatibility). And
we have other issues that make gcc < 4.2 unlikely to work today with
system headers.

Sponsored by: Netflix

6 months agoRevert "cdefs: Remove __func__ define"
Warner Losh [Mon, 27 Nov 2023 18:47:21 +0000 (11:47 -0700)]
Revert "cdefs: Remove __func__ define"

This reverts commit 17a238a15fbed01477fbc54744d35cbccdb65871. There were
too many other changes accidentally mixed in.

Sponsored by: Netflix

6 months agozdb: Fix zdb '-O|-r' options with -e/exported zpool
Akash B [Mon, 27 Nov 2023 21:41:58 +0000 (03:11 +0530)]
zdb: Fix zdb '-O|-r' options with -e/exported zpool

zdb with '-e' or exported zpool doesn't work along with
'-O' and '-r' options as we process them before '-e' has
been processed.

Below errors are seen:

~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory

~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p.
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory

We need to make sure to process '-O|-r' options after the
'-e' option has been processed, which imports the pool to
the namespace if it's not in the cachefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15532

6 months agoMakefile.vm: Fix duplicate rc.conf files
Colin Percival [Mon, 27 Nov 2023 21:29:05 +0000 (13:29 -0800)]
Makefile.vm: Fix duplicate rc.conf files

Two bugs in Makefile.vm resulted in disk images being "built" multiple
times, resulting in lines added to /etc/rc.conf being duplicated:

1. The vm-image target reused the same "staging tree" directory for all
of its builds (multiple disk image types and multiple filesystem types).

2. The cw-type-flavour-fs target depends on emulator-portinstall, which
did not have a 'touch ${.TARGET}' and thus re-ran every time -- and
caused the cw-type-flavour-fs target to be re-run.  This was triggered
by release builds running `make cloudware-release` (creating the disk
images) followed by `make ec2amis` (which re-created the disk images
prior to uploading them).

MFC After: 1 week
Sponsored by: https://www.patreon.com/cperciva

6 months agozdb: show BRT statistics and dump its contents
Rob Norris [Sat, 18 Nov 2023 10:33:45 +0000 (21:33 +1100)]
zdb: show BRT statistics and dump its contents

Same idea as the dedup stats, but for block cloning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541

6 months agobrt: lift internal definitions into _impl header
Rob Norris [Sat, 18 Nov 2023 10:32:16 +0000 (21:32 +1100)]
brt: lift internal definitions into _impl header

So that zdb (and others!) can get at the BRT on-disk structures.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541

6 months agoZTS: Fix zfs_load-key failures on F39
Tony Hutter [Mon, 27 Nov 2023 21:24:37 +0000 (13:24 -0800)]
ZTS: Fix zfs_load-key failures on F39

The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function.  This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15534
Closes #15550

6 months agozfs-dkms: fix shell-init error message
AllKind [Mon, 27 Nov 2023 21:17:48 +0000 (22:17 +0100)]
zfs-dkms: fix shell-init error message

If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs no longer exists.
Resolve this by leaving the directory earlier on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15576

6 months agoZVOL: Minor code cleanup
Alexander Motin [Mon, 27 Nov 2023 21:16:59 +0000 (16:16 -0500)]
ZVOL: Minor code cleanup

 - Remove zsda_tx field, it is used only once.
 - Remove unneeded string length checks, all names are terminated.
 - Replace a few explicit MAXNAMELEN usages with sizeof().
 - Change dsname from MAXNAMELEN to ZFS_MAX_DATASET_NAME_LEN, as
   expected by dsl_dataset_name().  Both are 256 bytes now, but it is
   better to be safe.

This should have no functional difference.
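
The sizeof() item can be illustrated with a small sketch. The names here (EXAMPLE_NAME_LEN, struct example_vol, example_set_name) are hypothetical and not taken from the ZVOL code.

```c
#include <stdio.h>

/* Hypothetical buffer size; 256 mirrors the figure mentioned above. */
#define EXAMPLE_NAME_LEN 256

struct example_vol {
	char name[EXAMPLE_NAME_LEN];
};

void
example_set_name(struct example_vol *v, const char *src)
{
	/*
	 * sizeof(v->name) follows the array's declaration, so the bound
	 * stays correct if the field is later resized or switched to a
	 * different length constant.
	 */
	(void)snprintf(v->name, sizeof(v->name), "%s", src);
}
```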

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15535

6 months agotests: don't run atf_* in a subshell
Gleb Smirnoff [Mon, 27 Nov 2023 21:15:58 +0000 (13:15 -0800)]
tests: don't run atf_* in a subshell

A shell limitation is that a classic function call via $() runs in a
subshell, and atf-sh(3) commands won't work as expected there.  Consequently,
atf_skip inside such a function won't skip a test.  The test will fail later.

A working approach is to pass the desired variable name as an argument to
a function and not run a subshell.

Reviewed by: ngie
Differential Revision: https://reviews.freebsd.org/D42646
Fixes: ea82362219ee715cfbb195b2114e73fdc8599fa5

6 months agoFreeBSD: Fix the build on FreeBSD 12
Alan Somers [Mon, 27 Nov 2023 20:58:03 +0000 (13:58 -0700)]
FreeBSD: Fix the build on FreeBSD 12

It was broken for several reasons:
* VOP_UNLOCK lost an argument in 13.0.  So OpenZFS should be using
  VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in.
* The location of the zlib header moved in 13.0 and 12.1.  We can drop
  support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0.  OpenZFS change 9d0887402ba
  assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b037915
  assumed 13.0 or later.

Sponsored-by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #15551

6 months agopf tests: test recursive printing of labels
Kristof Provost [Mon, 27 Nov 2023 15:47:36 +0000 (16:47 +0100)]
pf tests: test recursive printing of labels

Sponsored by: Rubicon Communications, LLC ("Netgate")

6 months agopfctl: Fix recursive printing of anchor labels
Luiz Amaral [Mon, 27 Nov 2023 15:53:27 +0000 (16:53 +0100)]
pfctl: Fix recursive printing of anchor labels

We recently noticed that the recursive printing of labels wasn't working
like the recursive printing of rules.

When running pfctl -sr -a* we get a listing of all rules, including the
ones inside anchors. On the other hand, when running pfctl -sl -a*, it
would only print the labels in the root level, just like without the
-a* argument.

As in our use-case we are interested in labels only and our labels are
unique even between anchors, we didn't add indentation or hierarchy to
the printing.

Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D42728

6 months agopf: implement DIOCGETRULES via netlink
Kristof Provost [Fri, 24 Nov 2023 23:42:44 +0000 (00:42 +0100)]
pf: implement DIOCGETRULES via netlink

Sponsored by: Rubicon Communications, LLC ("Netgate")

6 months agosnmp_pf: use libpfctl's pfctl_get_rules_info() rather than DIOCGETRULES
Kristof Provost [Mon, 27 Nov 2023 16:48:33 +0000 (17:48 +0100)]
snmp_pf: use libpfctl's pfctl_get_rules_info() rather than DIOCGETRULES

Prefer libpfctl functions over direct access to the ioctl whenever
possible.

Sponsored by: Rubicon Communications, LLC ("Netgate")

6 months agopfctl: use libpfctl instead of DIOCGETRULES directly
Kristof Provost [Fri, 24 Nov 2023 23:43:48 +0000 (00:43 +0100)]
pfctl: use libpfctl instead of DIOCGETRULES directly

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")

6 months agoFix two latent bugs in hpts. One where a static is put on
Randall Stewart [Mon, 27 Nov 2023 19:38:06 +0000 (14:38 -0500)]
Fix two latent bugs in hpts. One where a static is put on
a local variable, the other an initialization bug where
we should be setting tv.tv_sec to 0.
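
For readers unfamiliar with the first class of bug, here is a generic sketch of what a stray "static" on a local does. It is not the hpts code; both functions are hypothetical.

```c
#include <sys/time.h>

/* Buggy pattern: "static" makes the local persist and be shared by callers. */
long
example_usecs_buggy(long usecs)
{
	static struct timeval tv;	/* zeroed once, then keeps old values */

	tv.tv_usec += usecs;		/* silently accumulates across calls */
	return ((long)tv.tv_sec * 1000000L + (long)tv.tv_usec);
}

/* Fixed pattern: an automatic variable with every field set explicitly. */
long
example_usecs_fixed(long usecs)
{
	struct timeval tv;

	tv.tv_sec = 0;
	tv.tv_usec = usecs;
	return ((long)tv.tv_sec * 1000000L + (long)tv.tv_usec);
}
```

Calling the buggy variant twice with the same argument gives two different results, while the fixed variant is repeatable.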

PR: 275482

6 months agoZIL: Refactor TX_WRITE encryption similar to TX_CLONE_RANGE
Alexander Motin [Wed, 22 Nov 2023 18:15:32 +0000 (13:15 -0500)]
ZIL: Refactor TX_WRITE encryption similar to TX_CLONE_RANGE

It should be a purely textual change to make the code more readable.
Should cause no functional difference.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513

6 months agoZIL: Do not encrypt block pointers in lr_clone_range_t
Alexander Motin [Sun, 19 Nov 2023 01:01:03 +0000 (20:01 -0500)]
ZIL: Do not encrypt block pointers in lr_clone_range_t

In case of a crash, cloned blocks need to be claimed on pool import.
It is only possible if they (lr_bps) and their count (lr_nbps) are
not encrypted but only authenticated, similar to the block pointer in
lr_write_t.  A few other fields can be and are still encrypted.

This should fix a panic on ZIL claim after a crash when block cloning
is actively used.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513

6 months agoDon't allow attach to a raidz child vdev
Don Brady [Mon, 27 Nov 2023 17:46:38 +0000 (10:46 -0700)]
Don't allow attach to a raidz child vdev

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15536
Closes #15564

6 months agopwait.1: add missing prompt and command in examples
Mike Karels [Mon, 27 Nov 2023 16:55:11 +0000 (10:55 -0600)]
pwait.1: add missing prompt and command in examples

Two examples showed '$?' alone on a line, which should be '$ echo $?'.
The third example got it right.  Fix the first two.