FreeBSD/FreeBSD.git

Martin Matuska [Sat, 30 Mar 2024 21:14:52 +0000 (22:14 +0100)]
zfs: merge openzfs/zfs@39be46f43

Notable upstream pull request merges:
 #15509 b1e46f869 Add ashift validation when adding devices to a pool
 #15927 45e23abed Update resume token at object receive
 #15941 bf8f72359 BRT: Skip duplicate BRT prefetches
 #15950 8cd8ccca5 BRT: Skip getting length in brt_entry_lookup()
 #15951 80cc51629 ZAP: Massively switch to _by_dnode() interfaces
 #15954 2c01cae8b BRT: Change brt_pending_tree sorting order
 #15955 4616b96a6 BRT: Relax brt_pending_apply() locking
 #15959 5c4a4f82c zio: update ZIO type x stage documentation
 #15962 493fcce9b Provide macros for setting and getting blkptr birth times
 #15963 90ff73235 freebsd: fix missing headers in distribution tarball
 #15967 f68bde723 BRT: Make BRT block sizes configurable
 #15976 c28f94f32 ZAP: Some cleanups/micro-optimizations
 #15995 cfb96c772 vdev_disk: clean up spa/bdev mode conversion
 #16006 c0aab8b8f zvols: prevent overflow of minor device numbers
 #16007 a89d209bb BRT: Fix holes cloning
 #16008 c9d8f6c59 Fix option string, adding -e and fixing order

Obtained from: OpenZFS
OpenZFS commit: 39be46f43f96fb7420386d03751b01f5cb376d6b

Alan Cox [Sat, 30 Mar 2024 20:35:32 +0000 (15:35 -0500)]
arm64: enable superpage mappings by pmap_mapdev{,_attr}()

In order for pmap_kenter{,_device}() to create superpage mappings,
either 64 KB or 2 MB, pmap_mapdev{,_attr}() must request appropriately
aligned virtual addresses.

Reviewed by: markj
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D42737

Eliot Solomon [Sun, 24 Mar 2024 19:01:47 +0000 (14:01 -0500)]
arm64 pmap: Add ATTR_CONTIGUOUS support [Part 1]

The ATTR_CONTIGUOUS bit within an L3 page table entry designates that
L3 page as being part of an aligned, physically contiguous collection
of L3 pages.  For example, 16 aligned, physically contiguous 4 KB pages
can form a 64 KB superpage, occupying a single TLB entry.  While this
change only creates ATTR_CONTIGUOUS mappings in a few places,
specifically, the direct map and pmap_kenter{,_device}(), it adds all
of the necessary code for handling them once they exist, including
demotion, protection, and removal.  Consequently, new ATTR_CONTIGUOUS
usage can be added (and tested) incrementally.

Modify the implementation of sysctl vm.pmap.kernel_maps so that it
correctly reports the number of ATTR_CONTIGUOUS mappings on machines
configured to use a 16 KB base page size, where an ATTR_CONTIGUOUS
mapping consists of 128 base pages.

Additionally, this change adds support for creating L2 superpage
mappings to pmap_kenter{,_device}().
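
As a rough illustration of these alignment rules, the sketch below picks the largest mapping a (VA, PA, length) triple permits, assuming a 4 KB base page: a 2 MB L2 block, a 64 KB run of 16 L3 entries marked ATTR_CONTIGUOUS, or a single 4 KB page. The helper best_mapping_size() and its macro names are hypothetical, and this is not the pmap code. It only shows why pmap_mapdev{,_attr}() must hand out suitably aligned virtual addresses: a 2 MB-aligned physical address gains nothing if the VA is only 4 KB aligned. (With a 16 KB base page, a contiguous run is 128 pages, i.e. 2 MB.)

```c
#include <stdint.h>

/* Assumes a 4 KB base page (arm64 4K translation granule). */
#define	BASE_PAGE	(UINT64_C(1) << 12)	/* 4 KB */
#define	L3C_RUN		(16 * BASE_PAGE)	/* 64 KB: 16 L3 entries with ATTR_CONTIGUOUS */
#define	L2_BLOCK	(UINT64_C(1) << 21)	/* 2 MB */

/*
 * Hypothetical helper: the largest mapping that can start at (va, pa)
 * and still fit in len bytes.  VA and PA must both be aligned to it.
 * Returns 0 if not even one base page fits.
 */
static uint64_t
best_mapping_size(uint64_t va, uint64_t pa, uint64_t len)
{
	static const uint64_t sizes[] = { L2_BLOCK, L3C_RUN, BASE_PAGE };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		if (((va | pa) & (sizes[i] - 1)) == 0 && len >= sizes[i])
			return (sizes[i]);
	}
	return (0);
}
```

For example, VA 0x210000 with PA 0x40010000 is 64 KB aligned but not 2 MB aligned, so at most a contiguous 64 KB run can be used there.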

Reviewed by: markj
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D42737

Konstantin Belousov [Wed, 27 Mar 2024 12:29:25 +0000 (14:29 +0200)]
thread_single(9): decline external requests for traced or debugger-stopped procs

A debugger has the power to cause an unbounded delay in single-threading,
which then blocks the threaded taskqueue.  The reproducer is
`truss -f timeout 2 sleep 10`.

Reported by: mjg
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44523

Emmanuel Vadot [Sat, 30 Mar 2024 08:36:35 +0000 (09:36 +0100)]
pkgbase: Remove FreeBSD-ipfilter package

Put the periodic script for ipfilter in the FreeBSD-ipf package with
all the utilities.

PR: 278042
Sponsored by: Beckhoff Automation GmbH & Co. KG

Bjoern A. Zeeb [Sat, 30 Mar 2024 02:31:32 +0000 (02:31 +0000)]
dts: Fix arm dts path for marvell too

Linux 6.5 moved the arm DTS files into vendor-based subdirectories;
change our Makefiles accordingly.

This also makes universe compile arm.armv7 ARMADA38X successfully.

Robert Evans [Sat, 30 Mar 2024 00:11:52 +0000 (20:11 -0400)]
Linux 5.18+ compat: Detect filemap_range_has_page

In v5.18, `filemap_range_has_page` moved to `pagemap.h`.

`pagemap.h` has been around since 3.10, so just include both.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16034

Pierre-Luc Drouin [Fri, 22 Mar 2024 22:13:04 +0000 (22:13 +0000)]
vf_i2c: update I2C controller logic

Update the I2C controller logic to be more consistent with the
newer version of the controller reference manual.
This makes it work better on modern LS/LX platforms and avoids
unnecessary delays.  Also fixes a lock leak.

MFC after: 7 days
Tested by: bz (LS1088a FDT), Pierre-Luc Drouin (Honeycomb, ACPI)
Differential Revision: https://reviews.freebsd.org/D44021

Pierre-Luc Drouin [Fri, 22 Mar 2024 22:12:07 +0000 (22:12 +0000)]
vf_i2c: split up and add ACPI attachments in addition to FDT

Move the code from the arm-specific directory to the iicbus controller
directory.  Split it into general logic and bus attachment code.
Add support for ACPI attachment in addition to FDT.

MFC after: 7 days
Tested by: bz (LS1088a FDT), Pierre-Luc Drouin (Honeycomb, ACPI)
Based on: D24917 by Val Packett (initial early version)
Differential Revision: https://reviews.freebsd.org/D44020

Robert Evans [Fri, 29 Mar 2024 21:59:23 +0000 (17:59 -0400)]
Fix buffer underflow if sysfs file is empty

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Jason Lee <jasonlee@lanl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16028
Closes #16035

Rob N [Fri, 29 Mar 2024 21:51:33 +0000 (08:51 +1100)]
vdev_disk: clean up spa/bdev mode conversion

43e8f6e37 introduced a subtle API misuse, in that it passed the output
from vdev_bdev_mode() back into itself. Fortunately, the
SPA_MODE_(READ|WRITE) bit values exactly map to the FMODE_(READ|WRITE) &
BLK_OPEN_(READ|WRITE) bit values, so it didn't result in a bug, but it
was hard to read and understand, so I cleaned it up.

In doing so, I noticed that the only call to vdev_bdev_mode() without
the "exclusive" flag set was in that misuse, and actually, we never do a
non-exclusive blkdev_get_by_path(). So I've just made exclusive be
always-on.
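
A minimal sketch of what the cleaned-up conversion amounts to, using stand-in constants: the HYP_* names and their bit values are hypothetical placeholders, deliberately different from each other, not the real SPA_MODE_*, FMODE_* or BLK_OPEN_* definitions. Each bit is translated explicitly, so correctness no longer depends on the two flag sets happening to share values, and the exclusive flag is always set, matching the observation that a non-exclusive open is never done.

```c
/* Hypothetical stand-ins; bit values chosen for illustration only. */
#define	HYP_SPA_MODE_READ	0x1
#define	HYP_SPA_MODE_WRITE	0x2
#define	HYP_BLK_OPEN_READ	0x10u
#define	HYP_BLK_OPEN_WRITE	0x20u
#define	HYP_BLK_OPEN_EXCL	0x40u

/*
 * Translate SPA mode bits to block-open bits one by one instead of
 * relying on the two sets sharing values, and always open exclusively.
 */
static unsigned int
spa_to_blk_mode(int spa_mode)
{
	unsigned int mode = HYP_BLK_OPEN_EXCL;

	if (spa_mode & HYP_SPA_MODE_READ)
		mode |= HYP_BLK_OPEN_READ;
	if (spa_mode & HYP_SPA_MODE_WRITE)
		mode |= HYP_BLK_OPEN_WRITE;
	return (mode);
}
```

Because the input and output are different types in spirit, feeding the result back into the function (the original misuse) is easier to spot.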

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15995

Fabian-Gruenbichler [Fri, 29 Mar 2024 21:37:40 +0000 (22:37 +0100)]
zvols: prevent overflow of minor device numbers

Currently, the Linux kernel allows 2^20 minor devices per major device
number.  ZFS reserves blocks of 2^4 minors per zvol: 1 for the zvol
itself, the other 15 for the first partitions of that zvol.  As a result,
only 2^16 such blocks are available for use.

There are no checks in place to avoid overflowing into the major device
number when more than 2^16 zvols are allocated (with volmode=dev or
default).  Instead of ignoring this limit, which comes with all sorts of
weird knock-on effects, detect this situation and simply fail to allocate
the zvol block device early on.

Without this safeguard, the kernel will reject the attempt to create an
already existing block device, but ZFS doesn't handle this error and
gets confused about which zvol occupies which minor slot, potentially
resulting in kernel NULL derefs and other issues later on.
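
The arithmetic above as a small C sketch: 2^20 minors divided into blocks of 2^4 leaves room for 2^16 zvols, so any zvol index at or beyond 2^16 must be refused before it spills into the major number. The function zvol_first_minor() and the macro names are hypothetical; only the bit counts come from the text.

```c
#include <stdint.h>

/* Linux: 2^20 minors per major number. */
#define	MINOR_BITS		20u
/* ZFS: 2^4 minors per zvol (the zvol itself plus 15 partitions). */
#define	ZVOL_MINOR_BITS		4u
/* Number of zvol blocks that fit: 2^(20 - 4) = 2^16. */
#define	ZVOL_MAX_BLOCKS		(1u << (MINOR_BITS - ZVOL_MINOR_BITS))

/*
 * Return the first minor for zvol block idx, or -1 if the block would
 * overflow into the major number.
 */
static long
zvol_first_minor(uint32_t idx)
{
	if (idx >= ZVOL_MAX_BLOCKS)
		return (-1);
	return ((long)idx << ZVOL_MINOR_BITS);
}
```

For example, index 65535 gets minors 1048560 through 1048575, the last valid block, while index 65536 is rejected.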

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #16006

Gleb Smirnoff [Fri, 29 Mar 2024 20:35:51 +0000 (13:35 -0700)]
linux: make linux_netlink_p->msg_from_linux be able to fail

The KPI for this function was misleading.  From the NetLink perspective it
looked like a function that: a) allocates a new hdr, b) can fail.  Neither
was true.  Let the function return an error code instead of returning the
same hdr it was passed.  In case future Linux NetLink compatibility
support calls for reallocating the header, pass hdr as a pointer to pointer.

With a KPI that returns an error, propagate domain conversion errors all the
way up to the NetLink module.  This fixes a panic when an unknown domain is
converted to 0xff and this invalid value is passed into NetLink
processing.
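
A hedged sketch of the KPI shape described above, with hypothetical names (struct nl_hdr, convert_family(), msg_from_linux_sketch()) and only two address families mapped; it is not the linux(4) code. The converter reports failure as an errno value, and the caller propagates it instead of forwarding an invalid family.

```c
#include <errno.h>

/* Hypothetical stand-in for a netlink message header. */
struct nl_hdr {
	unsigned int	family;
};

/*
 * Hypothetical Linux-to-FreeBSD address family conversion.
 * Assumed numbering: AF_INET is 2 on both systems; AF_INET6 is 10
 * on Linux and 28 on FreeBSD.
 */
static int
convert_family(unsigned int linux_af, unsigned int *bsd_af)
{
	switch (linux_af) {
	case 2:
		*bsd_af = 2;
		return (0);
	case 10:
		*bsd_af = 28;
		return (0);
	default:
		return (EINVAL);
	}
}

/*
 * Error-returning KPI: the header is passed by pointer to pointer so a
 * future version could replace it, and failures are reported instead of
 * letting an invalid family flow into netlink processing.
 */
static int
msg_from_linux_sketch(struct nl_hdr **hdrp)
{
	unsigned int af;
	int error;

	error = convert_family((*hdrp)->family, &af);
	if (error != 0)
		return (error);
	(*hdrp)->family = af;
	return (0);
}
```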

PR: 274536
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D44392

Gleb Smirnoff [Fri, 29 Mar 2024 20:35:37 +0000 (13:35 -0700)]
linux: use sa_family_t for address family conversions

Express "conversion failed" with the maximum possible value.  This reduces
the number of size/signedness conversions in the code that utilizes the
functions.
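
Since sa_family_t is unsigned (8 bits on FreeBSD, 16 on Linux), its maximum value can serve as an out-of-band failure marker, so callers stay in one type and never need a wider signed integer just to carry an error. A sketch under that reading, with a hypothetical function name, a hypothetical AF_CONV_FAILED macro, and only three families mapped:

```c
#include <sys/socket.h>

/* Sentinel for "conversion failed": the largest sa_family_t value. */
#define	AF_CONV_FAILED	((sa_family_t)-1)

/*
 * Hypothetical Linux-to-FreeBSD family mapping.  Assumed numbering:
 * AF_UNIX is 1 and AF_INET is 2 on both systems; AF_INET6 is 10 on
 * Linux and 28 on FreeBSD.
 */
static sa_family_t
linux_to_bsd_family_sketch(sa_family_t linux_af)
{
	switch (linux_af) {
	case 1:
		return (1);
	case 2:
		return (2);
	case 10:
		return (28);
	default:
		return (AF_CONV_FAILED);
	}
}
```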

PR: 274536
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D44375

Gleb Smirnoff [Fri, 29 Mar 2024 19:35:41 +0000 (12:35 -0700)]
if_tuntap: simplify storage of per-vnet cloners

There is no need for either a separate structure or a linked list.
Provide each VNET with an array of pointers to if_clone that has the same
size as the driver list.

Reviewed by: zlei, kevans, kp
Differential Revision: https://reviews.freebsd.org/D44307

Bojan Novković [Fri, 29 Mar 2024 19:17:19 +0000 (20:17 +0100)]
kern_ctf.c: Don't print out warning messages unconditionally

The kernel CTF loading routines print various warnings when attempting
to load CTF data from an ELF file. After the changes in c21bc6f3c242
those warnings are unnecessarily printed for each kernel module
that was compiled without CTF data.

The kernel linker already uses the bootverbose flag to conditionally
print CTF loading errors. This patch alters kern_ctf.c
routines to do the same.

Reported by: Alexander@leidinger.net
Approved by: markj (mentor)
Fixes: c21bc6f3c242 ("ddb: Add CTF-based pretty printing")

Mark Johnston [Sat, 23 Dec 2023 00:24:48 +0000 (19:24 -0500)]
build: Do not pass -fno-sanitize-memory-param-retval to subr_coverage.c

In the absence of -fsanitize=kernel-memory, the presence of this flag
results in a -Wunused-command-line-argument warning.

MFC after: 1 week

Gleb Smirnoff [Fri, 29 Mar 2024 19:16:59 +0000 (12:16 -0700)]
inpcb: fully retire inp_ppcb pointer

Before a protocol-specific control block started to embed the inpcb in
itself (see 0aa120d52f3ce68b3792440c483fe96511ec), this pointer used to
point at that control block.

Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>.  The
exp-run detected a minimal use of the field in ports:
  * sysutils/lsof - patched upstream
  * net-mgmt/netdata  - patch accepted upstream
  * emulators/qemu-user-static - upstream master branch seems to no longer
    use the field
We can keep the field around for some time, but eventually it may be
reused for something else.

PR: 277659 (exp-run)
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D44491

George Wilson [Fri, 29 Mar 2024 19:15:56 +0000 (15:15 -0400)]
Add ashift validation when adding devices to a pool

Currently, zpool add allows users to add top-level vdevs that have
different ashifts, but doing so prevents users from being able to
perform a top-level vdev removal. Often, consumers may not realize
that they have mismatched ashifts until the top-level removal fails.

This feature adds ashift validation to the zpool add command and will
fail the operation if the sector size of the specified vdev does not
match the existing pool. This behavior can be disabled by using the -f
flag. In addition, new flags have been added to provide fine-grained
control to disable specific checks. These flags are:

--allow-in-use
--allow-ashift-mismatch
--allow-replication-mismatch

The force flag will disable all of these checks.
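
The decision logic described above reduces to a few boolean checks. The sketch below is a hypothetical model with made-up structures, not the zpool or libzfs code: -f short-circuits everything, and each --allow-* flag waives exactly one check.

```c
#include <stdbool.h>

/* Hypothetical results of the pre-add checks (not libzfs structures). */
struct add_checks {
	bool	vdev_in_use;
	bool	ashift_mismatch;
	bool	replication_mismatch;
};

/* Hypothetical parsed command-line flags. */
struct add_flags {
	bool	force;				/* -f */
	bool	allow_in_use;			/* --allow-in-use */
	bool	allow_ashift_mismatch;		/* --allow-ashift-mismatch */
	bool	allow_replication_mismatch;	/* --allow-replication-mismatch */
};

/* Return true if the add may proceed. */
static bool
zpool_add_permitted(const struct add_checks *c, const struct add_flags *f)
{
	if (f->force)
		return (true);		/* -f disables every check */
	if (c->vdev_in_use && !f->allow_in_use)
		return (false);
	if (c->ashift_mismatch && !f->allow_ashift_mismatch)
		return (false);
	if (c->replication_mismatch && !f->allow_replication_mismatch)
		return (false);
	return (true);
}
```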

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15509

Bjoern A. Zeeb [Fri, 29 Mar 2024 19:07:58 +0000 (19:07 +0000)]
LinuxKPI: remove dummy header files with implementation

All three files now have an implementation so we no longer need the
"dummy" versions.

Sponsored by: The FreeBSD Foundation

Christos Margiolis [Fri, 29 Mar 2024 15:59:55 +0000 (23:59 +0800)]
pcm.4: Showcase default device change using mixer(8)

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D44556

Christos Margiolis [Fri, 29 Mar 2024 15:32:38 +0000 (23:32 +0800)]
sound: Remove unused SND_DEV_LAST and SND_DEV_MAX constants

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D44554

Christos Margiolis [Fri, 29 Mar 2024 15:32:13 +0000 (23:32 +0800)]
sound: Fix SND_DIAGNOSTIC ifdef comment

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D44555

Christos Margiolis [Fri, 29 Mar 2024 15:29:43 +0000 (23:29 +0800)]
sound: Get rid of pcm/sndstat.h and turn macros into regular code

There is no reason to have macros for this. Putting the code in
sndstat_prepare_pcm() directly makes it easier to work with it.

No functional change intended.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D44545

Christos Margiolis [Fri, 29 Mar 2024 15:29:23 +0000 (23:29 +0800)]
sound: Drain buffer selinfo in sndbuf_free()

Prevent a use-after-free in kern_poll() by making sure the buffer's
selinfo is drained. This is required for a subsequent patch that
implements asynchronous audio device detach.

Reported by: KASAN
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D44544

Colin Percival [Fri, 29 Mar 2024 07:10:50 +0000 (00:10 -0700)]
release.sh: Don't install git if already present

Prior to this commit, we installed git from ports if there was a ports
tree available and git was not installed, and we installed git from pkg
otherwise -- including the case where git was already installed.

Rework the logic to not (re)install git at all if it is already
installed.

MFC after: 3 days

Gleb Smirnoff [Thu, 28 Mar 2024 21:12:39 +0000 (14:12 -0700)]
vtnet: set VNET context in RX handler

The context is required for NIC-level pfil(9) filtering.

Gleb Smirnoff [Thu, 28 Mar 2024 21:10:15 +0000 (14:10 -0700)]
pfilctl: fix 'pfilctl hooks' when nothing is connected

The 'hooks' command actually worked accidentally until now.  It used
PFILIOC_LISTHEADS to determine the current number of hooks.  This only
worked when at least one head had a hook connected to it.

Bojan Novković [Thu, 28 Mar 2024 19:36:30 +0000 (20:36 +0100)]
Fix style nits in kern_linker.c

Reported by: jrtc27
Fixes: c21bc6f3c242 ("ddb: Add CTF-based pretty printing")
Approved by: markj (mentor)

Bojan Novković [Thu, 28 Mar 2024 19:32:52 +0000 (20:32 +0100)]
ddb: Drop obsolete -FreeBSD identifier from license

Reported by: jrtc27
Fixes: c21bc6f3c242 ("ddb: Add CTF-based pretty printing")
Approved by: markj (mentor)

Stephen J. Kiernan [Wed, 27 Mar 2024 22:55:21 +0000 (18:55 -0400)]
kerneldump: Add flag to indicate kernel core was successfully dumped

This allows for shutdown_final EVENTHANDLERs to know that a core dump
successfully occurred. Embedded systems may want to record this fact
or act on it.

Obtained from: Juniper Networks, Inc.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44542

Stephen J. Kiernan [Wed, 27 Mar 2024 22:37:48 +0000 (18:37 -0400)]
stand/efi: Changes to efichar to allow it to be used in the kernel

Replace malloc/free with EFICHAR_MALLOC and EFICHAR_FREE macros.
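
The pattern can be sketched as overridable allocation macros that fall back to libc when nothing else has defined them. The exact definitions in efichar.c are not shown here, so the defaults below, the efi_char typedef, and the ascii_to_efichar() helper are assumptions for illustration. A kernel build would presumably define the macros before including the code, for example in terms of malloc(9) and free(9), which also take a malloc type and flags.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocation hooks: default to libc unless the including environment
 * (for example a kernel build) has already defined them.
 */
#ifndef EFICHAR_MALLOC
#define	EFICHAR_MALLOC(sz)	malloc(sz)
#endif
#ifndef EFICHAR_FREE
#define	EFICHAR_FREE(ptr)	free(ptr)
#endif

typedef uint16_t efi_char;	/* UCS-2 code unit (assumption) */

/* Hypothetical helper: widen an ASCII string into a new UCS-2 buffer. */
static efi_char *
ascii_to_efichar(const char *s)
{
	size_t i, n;
	efi_char *out;

	n = strlen(s);
	out = EFICHAR_MALLOC((n + 1) * sizeof(*out));
	if (out == NULL)
		return (NULL);
	for (i = 0; i < n; i++)
		out[i] = (unsigned char)s[i];
	out[n] = 0;
	return (out);
}
```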

Obtained from: Juniper Networks, Inc.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44541

Kristof Provost [Wed, 27 Mar 2024 14:47:21 +0000 (15:47 +0100)]
pf: fix reply-to after rdr and dummynet

If we redirect a packet to localhost and it gets dummynet'd it may be
re-injected later (e.g. when delayed) which means it will be passed
through ip_input() again. ip_input() will then reject the packet because
it's directed to the loopback address, but did not arrive on a loopback
interface.

Fix this by having pf set the rcvif to V_iflo if we redirect to
loopback.

See also: https://redmine.pfsense.org/issues/15363
Sponsored by: Rubicon Communications, LLC ("Netgate")

Dag-Erling Smørgrav [Thu, 28 Mar 2024 14:06:37 +0000 (15:06 +0100)]
cp: Fix grammar in comment.

This reverts commit 416fdc2d71656e2f0b4a16828fb0c736ae32b74a.

Randall Stewart [Thu, 28 Mar 2024 12:12:37 +0000 (08:12 -0400)]
Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold

HPTS inserts a softclock for system call return that optimizes performance.  However, when
no HPTS threads need the help (i.e. when they have fewer than 100 or so connections),
there should be little work done: just check the counter and return instead of running
through all the threads, getting locks, etc.
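
A minimal sketch of the fast path, assuming a single shared counter of HPTS threads that are over the connection threshold. The counter and function names are hypothetical, and C11 atomics stand in for the kernel's atomic(9) primitives. When the counter is zero, the syscall-return hook costs one relaxed load.

```c
#include <stdatomic.h>

/* Hypothetical: number of HPTS threads over the connection threshold. */
static atomic_uint hpts_over_threshold;

/* Counts trips through the expensive path, for illustration. */
static unsigned int slow_path_runs;

/* Stand-in for walking every HPTS thread, taking locks, etc. */
static void
walk_all_hpts_threads(void)
{
	slow_path_runs++;
}

/* Hook run on system call return. */
static void
hpts_syscall_return_hook(void)
{
	/* Nobody needs help: one cheap load, then return. */
	if (atomic_load_explicit(&hpts_over_threshold,
	    memory_order_relaxed) == 0)
		return;
	walk_all_hpts_threads();
}
```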

Reported by:    eduardo
Reviewed by:    gallatin, glebius, tuexen
Tested by:      gallatin
Differential Revision: https://reviews.freebsd.org/D44420

Konstantin Belousov [Wed, 27 Mar 2024 11:01:44 +0000 (13:01 +0200)]
x86: handle MXCSR from XSAVEOPT when x87 state was optimized

PR: 275322
Reported by: Cheyenne Wills <cheyenne.wills@gmail.com>
Reviewed by: emaste, jhb, olce
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44522

Emmanuel Vadot [Thu, 28 Mar 2024 06:27:10 +0000 (07:27 +0100)]
arm: Remove TI from NOTES

TI support was removed so remove it from NOTES too.

Sponsored by: Beckhoff Automation GmbH & Co. KG

Stephen J. Kiernan [Wed, 27 Mar 2024 22:13:00 +0000 (18:13 -0400)]
include: Allow SDESTDIR to be overridden

Obtained from: Juniper Networks, Inc.
Reviewed by: sjg
Differential Revision: https://reviews.freebsd.org/D44540

Stephen J. Kiernan [Wed, 27 Mar 2024 22:02:32 +0000 (18:02 -0400)]
libmagic: Use HOST_CC when compiling hostprog used by build

The "mkmagic" program should be built with the host compiler.

Only use BTOOLSPATH if not building for host

Obtained from: Juniper Networks, Inc.
Reviewed by: sjg
Differential Revision: https://reviews.freebsd.org/D44539

Stephen J. Kiernan [Wed, 27 Mar 2024 21:31:40 +0000 (17:31 -0400)]
csh: Use HOST_CC when compiling hostprog used by csh build

The "gethost" program should be built with the host compiler.

Obtained from: Juniper Networks, Inc.
Reviewed by: sjg
Differential Revision: https://reviews.freebsd.org/D44537

Stephen J. Kiernan [Wed, 27 Mar 2024 21:25:28 +0000 (17:25 -0400)]
sys.mk: Define HOST_CC as CC by default.

This allows for setting a different compiler for building hostprogs
when cross compiling.

Obtained from: Juniper Networks, Inc.
Reviewed by: sjg
Differential Revision: https://reviews.freebsd.org/D44536

Robert Evans [Wed, 27 Mar 2024 21:59:16 +0000 (17:59 -0400)]
ZTS: fix flakiness in cp_files_002_pos

Fix RANDOM to not return zero.

Overwriting with `dd ... count=0` does not test anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16029

Alexander Motin [Tue, 19 Mar 2024 17:08:05 +0000 (13:08 -0400)]
BRT: Check pool clone stats in more tests

This should allow catching some leaks, if those happen.

While there, fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007

Alexander Motin [Tue, 19 Mar 2024 16:25:14 +0000 (12:25 -0400)]
BRT: Fix tests to work on non-empty pools

It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007

Alexander Motin [Mon, 18 Mar 2024 18:19:53 +0000 (14:19 -0400)]
BRT: Fix holes cloning.

 - When reading L0 block pointers, handle buffers without block pointers
and without dirty records as holes.  Those appear when the dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there are actual writes.
 - When cloning blocks, set the destination hole's logical birth time
to the current TXG.  Otherwise, if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to avoid duplicating its logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15994
Closes #16007

Mike Karels [Wed, 27 Mar 2024 20:10:43 +0000 (15:10 -0500)]
bsdinstall: draw attention to new network config options

The network configuration options have changed in bsdinstall, with
an Auto option to proceed directly to DHCP and IPv6 autoconfig (which
is the default) as well as Manual (the old mode).  For users like me
that were used to hitting return automatically to select an interface,
but want manual configuration, attempt to call out the difference:
Change the menu caption to say "Please select a network interface
and configuration mode:" and not just an interface.

Reviewed by: jrtc27

Jessica Clarke [Wed, 27 Mar 2024 19:43:38 +0000 (15:43 -0400)]
arm64: Delete stale comment

Fixes: 078a69abcbb8 ("Use a uint64_t to store the arm64 mpidr")

Gleb Smirnoff [Wed, 27 Mar 2024 19:19:44 +0000 (12:19 -0700)]
sockets: define shutdown(2) constants in cpp namespace

There is software that uses SHUT_RD, SHUT_WR as preprocessor defines and
its build was broken by enum declaration.  Keep the enum, but provide
defines to propagate the constants to cpp namespace.
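
One way to keep an enum while still satisfying code that probes the names with #ifdef is a self-referential define: the macro exists for the preprocessor and expands back to the identifier, which the compiler then resolves to the enum constant. The sketch uses hypothetical DEMO_* names so it can coexist with the system header; it is an illustration of the technique, not a copy of the FreeBSD change.

```c
/* Hypothetical DEMO_* names so this can coexist with <sys/socket.h>. */
enum demo_shutdown_how {
	DEMO_SHUT_RD = 0,	/* no further receives */
	DEMO_SHUT_WR,		/* no further sends */
	DEMO_SHUT_RDWR,		/* neither */
};

/* Make the names visible to #ifdef; each expands back to the enum constant. */
#define	DEMO_SHUT_RD	DEMO_SHUT_RD
#define	DEMO_SHUT_WR	DEMO_SHUT_WR
#define	DEMO_SHUT_RDWR	DEMO_SHUT_RDWR

/* The kind of probe that broke with a bare enum. */
#ifdef DEMO_SHUT_WR
static const int have_shut_wr = 1;
#else
static const int have_shut_wr = 0;
#endif
```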

PR: 277994
PR: 277995
Fixes: c3276e02beab825824e3147b31af33af66298430

Mikael Urankar [Wed, 27 Mar 2024 09:36:33 +0000 (10:36 +0100)]
dma.conf: Fix typo

Pull Request: https://github.com/freebsd/freebsd-src/pull/1150

Mina Galić [Wed, 27 Mar 2024 13:53:39 +0000 (09:53 -0400)]
tools/git: ensure git-arc is more platform independent

Summary:
Linux systems' tail doesn't have `-r`.
Instead, we can use git's own `--reverse` sorting for `rev-list`s.

Reviewed by: markj, imp, jhibbits
Differential Revision: https://reviews.freebsd.org/D39975

Michael Tuexen [Wed, 27 Mar 2024 13:31:48 +0000 (14:31 +0100)]
tcp bblog: use correct length

The length of tldl_reason is TCP_LOG_REASON_LEN, not TCP_LOG_ID_LEN.
No functional change intended.
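
The class of bug fixed here is bounding a copy with a sibling field's length constant. A small sketch with hypothetical field sizes and names (not the real TCP log structures): taking the bound from sizeof the destination keeps the limit and the field from drifting apart.

```c
#include <stdio.h>

/* Hypothetical sizes and names, standing in for the real TCP log ones. */
#define	LOG_ID_LEN	64
#define	LOG_REASON_LEN	32

struct log_dump_req {
	char	id[LOG_ID_LEN];
	char	reason[LOG_REASON_LEN];
};

/* Bound the copy by the field being written, not by a sibling's length. */
static void
set_reason(struct log_dump_req *req, const char *why)
{
	snprintf(req->reason, sizeof(req->reason), "%s", why);
}
```

A reason longer than 31 characters is truncated and still NUL-terminated inside reason[].
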
Reported by: Coverity Scan
CID: 1418074
CID: 1418276
Reviewed by: glebius, rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44510

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:59 +0000 (11:03 +0100)]
install: Prefer strsnvis() to strsvis().

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D44514

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:56 +0000 (11:03 +0100)]
ln: Add a test case for ln -sfF.

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: allanjude, asomers
Differential Revision: https://reviews.freebsd.org/D44513

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:52 +0000 (11:03 +0100)]
ln: Clean up and simplify tests.

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D44512

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:49 +0000 (11:03 +0100)]
ln: Use stdbool, style nits.

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: imp, allanjude
Differential Revision: https://reviews.freebsd.org/D44511

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:45 +0000 (11:03 +0100)]
touch: Add unit tests.

MFC after: 1 week
Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D44505

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:40 +0000 (11:03 +0100)]
touch: Allow setting the timestamp to -1.

Note that VFS internally interprets a timestamp of -1 as “do not set”,
so this has no effect, but at least touch won't incorrectly reject the
given date / time (1969-12-31 23:59:59 UTC) as invalid.

While here, fix some style issues.

MFC after: 1 week
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D44504

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:37 +0000 (11:03 +0100)]
libc: Improve description of mktime() / timegm().

* Mention that mktime() and timegm() set errno on failure.
* Correctly determining whether mktime() / timegm() succeeded with
  arbitrary input (where -1 can be a valid result) is non-trivial.
  Document the recommended procedure.
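
One widely used way to tell a real failure from a legitimate -1 is a sentinel in a field that mktime() / timegm() always fill in on success, such as tm_wday. This is a sketch of that approach, not necessarily the exact procedure the manual page documents; utc_to_time() is a hypothetical helper, and timegm() is a common extension (available on FreeBSD and glibc) rather than ISO C.

```c
#define	_DEFAULT_SOURCE	1	/* exposes timegm() on glibc; harmless elsewhere */
#include <stdbool.h>
#include <time.h>

/*
 * Convert a broken-down UTC time to time_t, reporting failure separately
 * from the result, because (time_t)-1 is also the valid answer for
 * 1969-12-31 23:59:59 UTC.
 */
static bool
utc_to_time(struct tm *tm, time_t *out)
{
	time_t t;

	tm->tm_wday = -1;	/* sentinel: a successful call sets 0..6 */
	t = timegm(tm);
	if (t == (time_t)-1 && tm->tm_wday == -1)
		return (false);	/* genuine failure */
	*out = t;
	return (true);
}
```

Converting 1969-12-31 23:59:59 this way succeeds, yields -1, and sets tm_wday to 3 (a Wednesday).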

PR: 277863
MFC after: 1 week
Reviewed by: pauamma_gundo.com, gbe
Differential Revision: https://reviews.freebsd.org/D44503

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:33 +0000 (11:03 +0100)]
diff: Integrate libdiff from OpenBSD GoT.

This adds support for two new diff algorithms, Myers diff and Patience
diff.

These algorithms perform a different form of search compared to the
classic Stone algorithm and support escapes when worst case scenarios
are encountered.

Add the -A flag to allow selection of the algorithm, but default to
using the new Myers diff implementation.

The libdiff implementation currently only supports a subset of input and
output options supported by diff.  When these options are used and no
algorithm has been explicitly selected, automatically fall back to the
classic Stone algorithm until support for these modes can be added.
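
The selection rule can be modeled in a few lines. This is a hypothetical sketch (the enum and choose_algo() are made up, not diff's actual code): an explicit -A wins, otherwise unsupported options force Stone, otherwise Myers is the default. What diff does when an explicit -A conflicts with an unsupported option is not modeled; the sketch simply honors the request, which is an assumption.

```c
#include <stdbool.h>

/* Hypothetical names for the algorithms selectable with -A. */
enum diff_algo {
	ALGO_UNSET,	/* -A not given */
	ALGO_STONE,
	ALGO_MYERS,
	ALGO_PATIENCE,
};

/*
 * needs_stone: some requested input/output option is not yet supported
 * by libdiff.
 */
static enum diff_algo
choose_algo(enum diff_algo requested, bool needs_stone)
{
	if (requested != ALGO_UNSET)
		return (requested);	/* assumption: explicit -A is honored as is */
	if (needs_stone)
		return (ALGO_STONE);	/* silent fallback to the classic algorithm */
	return (ALGO_MYERS);		/* new default */
}
```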

Based on work originally done by thj@ with contributions from kevans@.

Sponsored by: Klara, Inc.
Reviewed by: thj
Differential Revision: https://reviews.freebsd.org/D44302

Dag-Erling Smørgrav [Wed, 27 Mar 2024 10:03:29 +0000 (11:03 +0100)]
libdiff: Improve function prototype detection.

- Recognize ObjC methods.
- Start searching within the leading context.

Sponsored by: Klara, Inc.
Reviewed by: thj
Differential Revision: https://reviews.freebsd.org/D44301

Baptiste Daroussin [Wed, 27 Mar 2024 08:06:35 +0000 (09:06 +0100)]
pkgbase: remove post-install script for kernel

The hint file is now packaged directly within the package itself.

Reported by: jhb

Zhenlei Huang [Wed, 27 Mar 2024 04:02:32 +0000 (12:02 +0800)]
kern linker: Make linker_file_add_dependency() void

The only possible return value has been zero since cee9542d51f0.

No functional change intended.

Reviewed by: dfr
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44507

Ka Ho Ng [Tue, 26 Mar 2024 14:38:41 +0000 (10:38 -0400)]
epoch(9): Remove the under-development note

There have been no planned changes to the interface so far.  Remove the
section, as it may no longer be relevant.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D44501

Tom Jones [Tue, 26 Mar 2024 09:52:07 +0000 (09:52 +0000)]
netmap: Address errors on memory free in netmap_generic

netmap_generic keeps a pool of mbufs for handling transfers; these mbufs
have an external buffer attached to them.

In some cases, other parts of the network stack can chain these mbufs.
When this happens, the normal pool destructor function can end up
freeing the pool mbufs twice:

- A first time if a pool mbuf has been chained with another mbuf when
  its chain is freed
- A second time when its entry in the pool is freed

Additionally, if other parts of the stack demote a pool mbuf, its
interface reference will be cleared. In this case we dereference a NULL
pointer when trying to free the mbuf through the destructor. Store a
reference to the adapter in ext_arg1 with the destructor callback so we
can find the correct adapter when freeing a pool mbuf.
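
A simplified sketch of the lookup change, with hypothetical stand-in structures rather than the real struct mbuf or netmap adapter. The destructor reaches the adapter through the argument stored at setup time (ext_arg1 in the commit), so it still works after the stack has cleared the interface pointer. The double-free half of the fix is not modeled.

```c
/* Hypothetical stand-ins; not the real struct mbuf or netmap adapter. */
struct adapter {
	int	pool_frees;
};

struct ifnet_stub {
	struct adapter	*na;
};

struct pkt {
	struct ifnet_stub	*rcvif;		/* may be cleared if the stack demotes the mbuf */
	void			(*ext_free)(struct pkt *);
	void			*ext_arg1;	/* adapter, stored when the pool mbuf is built */
};

/* Destructor: find the adapter through ext_arg1, never through rcvif. */
static void
pool_ext_free(struct pkt *m)
{
	struct adapter *na = m->ext_arg1;

	na->pool_frees++;
}
```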

This change enables using netmap with epair interfaces.

Reviewed By: vmaffione
MFC after: 1 week
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D44371

Zhenlei Huang [Tue, 26 Mar 2024 08:47:02 +0000 (16:47 +0800)]
kern linker: Do not touch userrefs of the kernel file

A nonzero `userrefs` of a linker file indicates that the file, either
loaded from kldload(2) or preloaded, can be unloaded via kldunload(2).
As for the kernel file, it can be unloaded by the loader but should not
be after initialization.

This change fixes a regression from d9ce8a41eac9, which incidentally
increased `userrefs` of the kernel file.

Reviewed by: dfr, dab, jhb
Fixes: d9ce8a41eac9 kern_linker: Handle module-loading failures in preloaded .ko files
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D42530

2 months agokern linker: Do not unload a module if it has dependants
Zhenlei Huang [Tue, 26 Mar 2024 03:55:45 +0000 (11:55 +0800)]
kern linker: Do not unload a module if it has dependants

Despite the name, linker_file_unload() will drop a reference and return
success when the module file has dependants, i.e. when it has more than
one reference. When a user requests to unload such a module, the kernel
should reject the request unambiguously and immediately.
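A sketch of the unload policy described in these two commits, under stated assumptions: `struct toy_linker_file` and `toy_check_unload()` are hypothetical, and the errno values are illustrative rather than taken from kern_linker.c. A file whose `userrefs` is zero (such as the kernel itself) is not user-unloadable, and a file with dependants is refused up front instead of merely losing a reference.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified linker file record. */
struct toy_linker_file {
	int refs;		/* total references, including dependants */
	int userrefs;		/* references that allow kldunload(2) */
	bool is_kernel;
};

/*
 * Decide whether a kldunload(2)-style request may proceed.
 * Returns 0 if the file may be unloaded, or an (illustrative) errno value.
 */
static int
toy_check_unload(const struct toy_linker_file *lf)
{
	if (lf->is_kernel || lf->userrefs == 0)
		return (EBUSY);	/* not unloadable by the user */
	if (lf->refs > 1)
		return (EBUSY);	/* other files depend on it: refuse up front */
	return (0);
}
```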

PR: 274986
Reviewed by: dfr, dab, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D42527

2 months agoamd64: initialize td_frame stack area for init(8) main thread
Konstantin Belousov [Mon, 25 Mar 2024 12:10:43 +0000 (14:10 +0200)]
amd64: initialize td_frame stack area for init(8) main thread

Uninitialized td_frame mostly does not matter since all registers are
overwritten on exec to activate init(8).  The exception is the PSL_T bit
in %rflags, which might leak into the fresh init as garbage, causing
spurious SIGTRAPs to be delivered to init until the first syscall is
executed.
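A minimal sketch of why zeroing the frame helps, assuming a hypothetical three-field `struct toy_trapframe` and a local copy of the trace flag (TF, bit 8 of %rflags, which FreeBSD calls PSL_T). Stale memory with that bit set would make the thread single-step; a zeroed frame cannot.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PSL_T	0x00000100UL	/* %rflags trace/single-step bit (TF, bit 8) */

/* Hypothetical subset of a trap frame. */
struct toy_trapframe {
	uint64_t tf_rip;
	uint64_t tf_rsp;
	uint64_t tf_rflags;
};

/* Zero the frame before first use so no stale TF bit can survive. */
static void
toy_init_frame(struct toy_trapframe *tf)
{
	memset(tf, 0, sizeof(*tf));
}

/* Nonzero if returning to user mode with this frame would single-step. */
static int
toy_would_single_step(const struct toy_trapframe *tf)
{
	return ((tf->tf_rflags & TOY_PSL_T) != 0);
}
```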

Reviewed by: emaste, jhb, jhibbits
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44498

2 months agox86: test the right CPUID bit when checking for XSAVEOPT support
Konstantin Belousov [Mon, 25 Mar 2024 10:34:06 +0000 (12:34 +0200)]
x86: test the right CPUID bit when checking for XSAVEOPT support
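For reference, the Intel SDM reports XSAVEOPT in CPUID leaf 0xD, sub-leaf 1, EAX bit 0, next to the XSAVEC and XSAVES bits. The commit does not say which bit was tested before; this sketch only shows where the feature lives. The `TOY_*` names and `toy_has_xsaveopt()` are hypothetical, and the caller is assumed to have executed CPUID already.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * CPUID leaf 0xD, sub-leaf 1, EAX: bit 0 XSAVEOPT, bit 1 XSAVEC,
 * bit 2 XGETBV with ECX=1, bit 3 XSAVES/XRSTORS.
 */
#define TOY_XSAVEOPT	(1u << 0)
#define TOY_XSAVEC	(1u << 1)
#define TOY_XSAVES	(1u << 3)

/* eax_0d_1 is EAX as returned by CPUID with EAX=0xD, ECX=1. */
static bool
toy_has_xsaveopt(uint32_t eax_0d_1)
{
	return ((eax_0d_1 & TOY_XSAVEOPT) != 0);
}
```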

Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44497

2 months agoBRT: Skip getting length in brt_entry_lookup()
Alexander Motin [Tue, 26 Mar 2024 00:13:45 +0000 (20:13 -0400)]
BRT: Skip getting length in brt_entry_lookup()

Unlike DDT, where ZAP values may have different lengths due to
compression, all BRT entries are identical 8-byte counters.  It
does not make sense to first fetch the length only to assert it.
zap_lookup_uint64() is specifically designed to work with counters
of different sizes and should return an error if something odd is found.
Calling it directly saves some measurable CPU time.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15950

2 months agoabd_iter_page: don't use compound heads on Linux <4.5
Rob Norris [Wed, 13 Mar 2024 23:57:30 +0000 (10:57 +1100)]
abd_iter_page: don't use compound heads on Linux <4.5

Before 4.5 (specifically, torvalds/linux@ddc58f2), head and tail pages
in a compound page were refcounted separately. This means that using the
head page without taking a reference to it could see it cleaned up later
before we're finished with it. Specifically, bio_add_page() would take a
reference, and drop its reference after the bio completion callback
returns.

If the zio is executed immediately from the completion callback, this is
usually ok, as any data is referenced through the tail page referenced
by the ABD, and so becomes "live" that way. If there's a delay in zio
execution (high load, error injection), then the head page can be freed,
along with any dirty flags or other indicators that the underlying
memory is used. Later, when the zio completes and that memory is
accessed, it's either unmapped and an unhandled fault takes down the
entire system, or it is mapped and we end up messing around in someone
else's memory. Both of these are very bad.

The solution on these older kernels is to take a reference to the head
page when we use it, and release it when we're done. There's not really
a sensible way under our current structure to do this; the "best" would
be to keep a list of head page references in the ABD, and release them
when the ABD is freed.

Since this additional overhead is totally unnecessary on 4.5+, where
head and tail pages share refcounts, I've opted to simply not use the
compound head in ABD page iteration on older kernels. This is
theoretically less efficient (though cleaning up head page references
would add overhead), but it's safe, and we still get the other benefits
of not mapping pages before adding them to a bio and not mis-splitting
pages.

There doesn't appear to be an obvious symbol name or config option we
can match on to discover this behaviour in configure (and the mm/page
APIs have changed a lot since then anyway), so I've gone with a simple
version check.
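A sketch of such a version check, assuming a user-space copy of the Linux `KERNEL_VERSION()` encoding (`TOY_KERNEL_VERSION` and `toy_can_use_compound_head()` are hypothetical). In the module itself this would more naturally be a compile-time `#if` against `LINUX_VERSION_CODE`; a function is used here only so the comparison can be exercised.

```c
#include <stdbool.h>

/*
 * Same encoding as Linux's KERNEL_VERSION() macro (patch level clamped
 * to 255, as in recent kernels), reproduced so this compiles in user space.
 */
#define TOY_KERNEL_VERSION(a, b, c) \
	(((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/*
 * Before 4.5, head and tail pages of a compound page were refcounted
 * separately, so iteration must not hand out the unreferenced head page.
 */
static bool
toy_can_use_compound_head(unsigned int linux_version_code)
{
	return (linux_version_code >= TOY_KERNEL_VERSION(4, 5, 0));
}
```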

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agovdev_disk: use bio_chain() to submit multiple BIOs
Rob Norris [Wed, 21 Feb 2024 00:07:21 +0000 (11:07 +1100)]
vdev_disk: use bio_chain() to submit multiple BIOs

This simplifies our code a lot, as we no longer have to wait for each
BIO and reassemble them.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agovdev_disk: add module parameter to select BIO submission method
Rob Norris [Tue, 9 Jan 2024 02:28:57 +0000 (13:28 +1100)]
vdev_disk: add module parameter to select BIO submission method

This makes the submission method selectable at module load time via the
`zfs_vdev_disk_classic` parameter, allowing this change to be backported
to 2.2 safely, and disabled in favour of the "classic" submission method
if new problems come up.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agovdev_disk: rewrite BIO filling machinery to avoid split pages
Rob Norris [Tue, 18 Jul 2023 01:11:29 +0000 (11:11 +1000)]
vdev_disk: rewrite BIO filling machinery to avoid split pages

This commit tackles a number of issues in the way BIOs (`struct bio`)
are constructed for submission to the Linux block layer.

The kernel has a hard upper limit on the number of pages/segments that
can be added to a BIO, as well as a separate limit for each device
(related to its queue depth and other scheduling characteristics).

ZFS counts the number of memory pages in the request ABD
(`abd_nr_pages_off()`), and then uses that as the number of segments to
put into the BIO, up to the hard upper limit. If it requires more than
the limit, it will create multiple BIOs.

Leaving aside the fact that the page count method is wrong (see below),
not limiting to the device segment max means that the device driver will
need to split the BIO in half. This alone is not necessarily a
problem, but it interacts with another issue to cause a much larger
problem.

The kernel function to add a segment to a BIO (`bio_add_page()`) takes a
`struct page` pointer, and offset+len within it. `struct page` can
represent a run of contiguous memory pages (known as a "compound page").
It can be of arbitrary length.

The ZFS functions that count ABD pages and load them into the BIO
(`abd_nr_pages_off()`, `bio_map()` and `abd_bio_map_off()`) will never
consider a page to be more than `PAGE_SIZE` (4K), even if the `struct
page` is for multiple pages. In this case, it will load the same `struct
page` into the BIO multiple times, with the offset adjusted each time.

With a sufficiently large ABD, this can easily lead to the BIO being
entirely filled much earlier than it could have been. This also
further contributes to the problem caused by the incorrect segment limit
calculation, as it's much easier to go past the device limit, and so
require a split.

Again, this is not a problem on its own.

The logic for "never submit more than `PAGE_SIZE`" is actually a little
more subtle. It will actually never submit a buffer that crosses a 4K
page boundary.

In practice, this is fine, as most ABDs are scattered, that is, a list
of complete 4K pages, and so are loaded in as such.

Linear ABDs are typically allocated from slabs, and for small sizes they
are frequently not aligned to page boundaries. For example, a 12K
allocation can span four pages, eg:

     -- 4K -- -- 4K -- -- 4K -- -- 4K --
    |        |        |        |        |
          :## ######## ######## ######:    [1K, 4K, 4K, 3K]

Such an allocation would be loaded into a BIO as:

    [1K, 4K, 4K, 3K]
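The page-boundary rule can be reproduced with a short helper, assuming 4K pages and a buffer whose start offset is measured from a page-aligned base (`toy_split_at_pages()` is hypothetical, not the ZFS code). A 12K buffer starting 3K into a page yields exactly the four segments above.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE	4096u

/*
 * Cut a linear buffer that starts at byte offset `start` (relative to a
 * page-aligned base) into segments that never cross a page boundary, the
 * way the old BIO filling code treated linear ABDs. Returns the number of
 * segments written to seg[] (at most max).
 */
static size_t
toy_split_at_pages(size_t start, size_t len, size_t *seg, size_t max)
{
	size_t n = 0;

	while (len > 0 && n < max) {
		size_t room = TOY_PAGE_SIZE - (start % TOY_PAGE_SIZE);
		size_t take = len < room ? len : room;

		seg[n++] = take;
		start += take;
		len -= take;
	}
	return (n);
}
```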

This tends not to be a problem in practice, because even if the BIO were
filled and needed to be split, each half would still have either a start
or end aligned to the logical block size of the device (assuming 4K at
least).

---

In ideal circumstances, these shortcomings don't cause any particular
problems. It's when they start to interact with other ZFS features that
things get interesting.

Aggregation will create a "gang" ABD, which is simply a list of other
ABDs. Iterating over a gang ABD is just iterating over each ABD within
it in turn.

Because the segments are simply loaded in order, we can end up with
uneven segments either side of the "gap" between the two ABDs. For
example, two 12K ABDs might be aggregated and then loaded as:

    [1K, 4K, 4K, 3K, 2K, 4K, 4K, 2K]

Should a split occur, each individual BIO can end up having either a
start or end offset that is not aligned to the logical block size, which
some drivers (eg SCSI) will reject. However, this tends not to happen
because the default aggregation limit usually keeps the BIO small enough
to not require more than one split, and most pages are actually full 4K
pages, so hitting an uneven gap is very rare anyway.
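A small helper makes the alignment question concrete, assuming the BIO starts at an aligned disk offset and that a split after a segment is safe only if the bytes before it are a multiple of the logical block size (`toy_split_is_aligned()` is hypothetical). For the eight segments above with 4K logical blocks, only the split after the fourth segment (12K) is aligned.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Given the segment lengths loaded into a BIO, report whether splitting
 * the BIO after segment k (1-based) leaves both halves aligned to the
 * logical block size lbs, assuming the BIO itself starts aligned.
 */
static bool
toy_split_is_aligned(const size_t *seg, size_t nseg, size_t k, size_t lbs)
{
	size_t off = 0;

	if (k == 0 || k >= nseg)
		return (false);	/* not an internal split point */
	for (size_t i = 0; i < k; i++)
		off += seg[i];
	return (off % lbs == 0);
}
```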

If the pool is under particular memory pressure, then an IO can be
broken down into a "gang block", a 512-byte block composed of a header
and up to three block pointers. Each points to a fragment of the
original write, or in turn, another gang block, breaking the original
data up over and over until space can be found in the pool for each of
them.

Each gang header is a separate 512-byte memory allocation from a slab,
that needs to be written down to disk. When the gang header is added to
the BIO, it's a single 512-byte segment.

Pulling all this together, consider a large aggregated write of gang
blocks. This results in a BIO containing lots of 512-byte segments. Given
our tendency to overfill the BIO, a split is likely, and most possible
split points will yield a pair of BIOs that are misaligned. Drivers that
care, like the SCSI driver, will reject them.

---

This commit is a substantial refactor and rewrite of much of `vdev_disk`
to sort all this out.

`vdev_bio_max_segs()` now returns the ideal maximum size for the device,
if available. There's also a tuneable `zfs_vdev_disk_max_segs` to
override this, to assist with testing.
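A sketch of how such a limit might be combined, assuming that the tunable can only lower the device limit and that the kernel's hard per-BIO limit always applies; `toy_bio_max_segs()` and its parameters are hypothetical, not the actual `vdev_bio_max_segs()`.

```c
/*
 * dev_max_segs stands in for the device queue's max segment count,
 * hard_max for the kernel's per-BIO limit, and tunable for
 * zfs_vdev_disk_max_segs (0 = not set).
 */
static unsigned int
toy_bio_max_segs(unsigned int dev_max_segs, unsigned int hard_max,
    unsigned int tunable)
{
	unsigned int segs = dev_max_segs;

	/* Assumed: the tunable may only lower the device limit. */
	if (tunable > 0 && tunable < segs)
		segs = tunable;
	/* Never exceed the kernel's hard per-BIO limit. */
	return (segs < hard_max ? segs : hard_max);
}
```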

We scan the ABD up front to count the number of pages within it, and to
confirm that if we submitted all those pages to one or more BIOs, it
could be split at any point without creating a misaligned BIO.  If the
pages in the BIO are not usable (as in any of the above situations), the
ABD is linearised, and then checked again. This is the same technique
used in `vdev_geom` on FreeBSD, adjusted for Linux's variable page size
and allocator quirks.
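A simplified, hypothetical version of the up-front scan (`struct toy_seg` and `toy_abd_pages_usable()` are illustrative names): it checks that chunk boundaries fall on logical-block boundaries within their pages, meaning every chunk after the first starts aligned and every chunk before the last ends aligned. When it returns false, the caller would linearise the ABD and check again.

```c
#include <stdbool.h>
#include <stddef.h>

/* One page-backed chunk of an ABD: byte offset within its page, length. */
struct toy_seg {
	size_t off;
	size_t len;
};

/* Returns true if the chunks can be loaded into BIOs as-is. */
static bool
toy_abd_pages_usable(const struct toy_seg *s, size_t n, size_t lbs)
{
	for (size_t i = 0; i < n; i++) {
		/* Every chunk after the first must start on a block boundary. */
		if (i > 0 && s[i].off % lbs != 0)
			return (false);
		/* Every chunk before the last must end on a block boundary. */
		if (i + 1 < n && (s[i].off + s[i].len) % lbs != 0)
			return (false);
	}
	return (true);
}
```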

`vbio_t` is a cleanup and enhancement of the old `dio_request_t`. The
idea is simply that it can hold all the state needed to create, submit
and return multiple BIOs, including all the refcounts, the ABD copy if
it was needed, and so on. Apart from what I hope is a clearer interface,
the major difference is that because we know how many BIOs we'll need up
front, we don't need the old overflow logic that would grow the BIO
array, throw away all the old work and restart. We can get it right from
the start.
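Because the page count and segment limit are known before any BIO is built, the number of BIOs is a simple rounded-up division. The sketch below assumes a hypothetical `struct toy_vbio` that allocates its BIO slots once; it is not the real `vbio_t`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-request state, loosely modelled on the vbio_t idea. */
struct toy_vbio {
	size_t	nbios;		/* BIOs needed, known before any is built */
	void	**bios;		/* one slot per BIO, allocated once */
};

/* max_segs must be nonzero. */
static struct toy_vbio *
toy_vbio_alloc(size_t npages, size_t max_segs)
{
	struct toy_vbio *vbio = malloc(sizeof(*vbio));

	if (vbio == NULL)
		return (NULL);
	/* Round up: a partly filled final BIO still needs a slot. */
	vbio->nbios = (npages + max_segs - 1) / max_segs;
	vbio->bios = calloc(vbio->nbios, sizeof(void *));
	if (vbio->bios == NULL && vbio->nbios > 0) {
		free(vbio);
		return (NULL);
	}
	return (vbio);
}

static void
toy_vbio_free(struct toy_vbio *vbio)
{
	free(vbio->bios);
	free(vbio);
}
```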

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agovdev_disk: make read/write IO function configurable
Rob Norris [Tue, 9 Jan 2024 01:29:19 +0000 (12:29 +1100)]
vdev_disk: make read/write IO function configurable

This is just setting up for the next couple of commits, which will add a
new IO function and a parameter to select it.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agovdev_disk: reorganise vdev_disk_io_start
Rob Norris [Tue, 9 Jan 2024 01:23:30 +0000 (12:23 +1100)]
vdev_disk: reorganise vdev_disk_io_start

Light reshuffle to make it a bit more linear to read and get rid of a
bunch of args that aren't needed in all cases.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agovdev_disk: rename existing functions to vdev_classic_*
Rob Norris [Tue, 9 Jan 2024 01:12:56 +0000 (12:12 +1100)]
vdev_disk: rename existing functions to vdev_classic_*

This is just renaming the existing functions we're about to replace and
grouping them together to make the next commits easier to follow.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agoabd: add page iterator
Rob Norris [Mon, 11 Dec 2023 05:05:54 +0000 (16:05 +1100)]
abd: add page iterator

The regular ABD iterators yield data buffers, so they have to map and
unmap pages into kernel memory. If the caller only wants to count
chunks, or can use page pointers directly, then the map/unmap is just
unnecessary overhead.

This adds abd_iterate_page_func, which yields unmapped struct page
instead.
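A user-space sketch of the callback shape, assuming a hypothetical `struct toy_page` in place of `struct page` and a flat array in place of an ABD: the callback receives the page, offset and length without anything being mapped, which is all a chunk-counting caller needs.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page: just a base pointer and size. */
struct toy_page {
	char	*base;
	size_t	size;
};

/* Callback receives the page itself, not a mapped pointer into it. */
typedef int toy_iter_page_func_t(struct toy_page *pg, size_t off,
    size_t len, void *priv);

/* Walk an array of pages, stopping early if the callback returns nonzero. */
static int
toy_iterate_page_func(struct toy_page *pages, size_t npages,
    toy_iter_page_func_t *func, void *priv)
{
	for (size_t i = 0; i < npages; i++) {
		int ret = func(&pages[i], 0, pages[i].size, priv);

		if (ret != 0)
			return (ret);
	}
	return (0);
}

/* Example consumer: count chunks without touching their contents. */
static int
toy_count_cb(struct toy_page *pg, size_t off, size_t len, void *priv)
{
	(void)pg;
	(void)off;
	(void)len;
	(*(size_t *)priv)++;
	return (0);
}
```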

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agolinux 5.4 compat: page_size()
Rob Norris [Mon, 13 Nov 2023 06:55:29 +0000 (17:55 +1100)]
linux 5.4 compat: page_size()

Before 5.4 we have to do a little math.
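In the kernel, `page_size(page)` is `PAGE_SIZE << compound_order(page)`, so a fallback for older kernels can compute the same shift. The sketch assumes 4K base pages and takes the compound order directly; `toy_page_size()` is hypothetical.

```c
/* Assumes 4K base pages. */
#define TOY_PAGE_SIZE	4096UL

/* A compound page of order n spans 2^n base pages. */
static inline unsigned long
toy_page_size(unsigned int compound_order)
{
	return (TOY_PAGE_SIZE << compound_order);
}
```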

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588

2 months agobsdlabel: add BUGS section documenting 8 partition limit
Ed Maste [Mon, 25 Mar 2024 22:25:05 +0000 (18:25 -0400)]
bsdlabel: add BUGS section documenting 8 partition limit

PR: 276517

2 months agoBRT: Make BRT block sizes configurable
Alexander Motin [Mon, 25 Mar 2024 22:02:38 +0000 (18:02 -0400)]
BRT: Make BRT block sizes configurable

Similar to DDT, make BRT data and indirect block sizes configurable
via module parameters.  I am not sure yet what would be best, but,
as with DDT, 4KB blocks kill all chances of compression on vdevs
with ashift=12 or more, which in my tests reaches 3x.
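The effect of ashift on small blocks comes from allocation rounding: a block is stored in whole 2^ashift-byte sectors. A minimal sketch, with the hypothetical `toy_asize()`; the sizes in the usage note are illustrative, not measurements from the commit.

```c
#include <stdint.h>

/* Bytes a psize-byte block occupies on a vdev with the given ashift. */
static uint64_t
toy_asize(uint64_t psize, unsigned int ashift)
{
	uint64_t sector = UINT64_C(1) << ashift;

	/* Round up to a whole number of sectors. */
	return ((psize + sector - 1) / sector * sector);
}
```

With ashift=12, a 4KB block compressed to 1000 bytes still occupies 4096 bytes, so it gains nothing; a larger block compressed to 40000 bytes occupies 40960, keeping most of the savings.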

While here, fix documentation for respective DDT parameters.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15967

2 months agoProvide macros for setting and getting blkptr birth times
George Wilson [Mon, 25 Mar 2024 22:01:54 +0000 (18:01 -0400)]
Provide macros for setting and getting blkptr birth times

There exist a couple of macros that are used to update the blkptr birth
times but they can often be confusing. For example, the
BP_PHYSICAL_BIRTH() macro will provide either the physical birth time
if it is set or else return back the logical birth time. The
complement to this macro is BP_SET_BIRTH() which will set the logical
birth time and set the physical birth time if they are not the same.
Consumers may get confused when they are trying to get the physical
birth time and use the BP_PHYSICAL_BIRTH() macro only to find out that
the logical birth time is what is actually returned.

This change cleans up these macros and makes them symmetrical. The same
functionality is preserved but the name is changed. Instead of calling
BP_PHYSICAL_BIRTH(), consumers can now call BP_GET_BIRTH(). In
addition to cleaning up this naming convention, two new sets of
macros are introduced -- BP_[SET|GET]_LOGICAL_BIRTH() and
BP_[SET|GET]_PHYSICAL_BIRTH().  These new macros allow the consumer to
get and set the specific birth time.
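A simplified sketch of the symmetrical accessors, assuming a hypothetical `struct toy_blkptr` with two plain fields and ignoring embedded block pointers. As described above, the "get birth" accessor returns the physical birth if set and the logical birth otherwise, and the "set birth" accessor stores a physical birth only when it differs from the logical one. These `TOY_BP_*` macros are illustrations, not the OpenZFS definitions.

```c
#include <stdint.h>

/* Hypothetical blkptr subset; field names are illustrative. */
struct toy_blkptr {
	uint64_t logical_birth;
	uint64_t physical_birth;	/* 0 means "same as logical" */
};

#define TOY_BP_GET_LOGICAL_BIRTH(bp)		((bp)->logical_birth)
#define TOY_BP_SET_LOGICAL_BIRTH(bp, x)		((bp)->logical_birth = (x))
#define TOY_BP_GET_PHYSICAL_BIRTH(bp)		((bp)->physical_birth)
#define TOY_BP_SET_PHYSICAL_BIRTH(bp, x)	((bp)->physical_birth = (x))

/* Physical birth if set, else logical (the old BP_PHYSICAL_BIRTH()). */
#define TOY_BP_GET_BIRTH(bp)						\
	(TOY_BP_GET_PHYSICAL_BIRTH(bp) != 0 ?				\
	    TOY_BP_GET_PHYSICAL_BIRTH(bp) : TOY_BP_GET_LOGICAL_BIRTH(bp))

/* Store physical birth only when it differs from the logical one. */
#define TOY_BP_SET_BIRTH(bp, logical, physical) do {			\
	TOY_BP_SET_LOGICAL_BIRTH(bp, logical);				\
	TOY_BP_SET_PHYSICAL_BIRTH(bp,					\
	    (logical) == (physical) ? 0 : (physical));			\
} while (0)
```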

As part of the cleanup, the unused GRID macros have been removed and
that portion of the blkptr is currently unused.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15962

2 months agoBRT: Relax brt_pending_apply() locking
Alexander Motin [Mon, 25 Mar 2024 21:59:55 +0000 (17:59 -0400)]
BRT: Relax brt_pending_apply() locking

Since brt_pending_apply() is running in syncing context, no other
brt_pending_tree accesses are possible for the TXG.  We don't need
to acquire brt_pending_lock here.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15955

2 months agoZAP: Massively switch to _by_dnode() interfaces
Alexander Motin [Mon, 25 Mar 2024 21:58:50 +0000 (17:58 -0400)]
ZAP: Massively switch to _by_dnode() interfaces

Before this change ZAP called dnode_hold() for almost every block
access, which was clearly visible in the profiler under heavy load, such
as BRT.  This patch makes it always hold the dnode reference between
zap_lockdir() and zap_unlockdir().  This avoids most dnode operations
between those calls.  It also adds several new _by_dnode() APIs
to ZAP and uses them in BRT code, and adds a dmu_prefetch_by_dnode()
variant that is used in the ZAP code.

After this there remains only one call to dmu_buf_dnode_enter(),
which seems to be unneeded.  So remove the call and the functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15951

2 months agoBRT: Skip duplicate BRT prefetches
Alexander Motin [Mon, 25 Mar 2024 21:58:04 +0000 (17:58 -0400)]
BRT: Skip duplicate BRT prefetches

If there is a pending entry for this block, then we've already
issued BRT prefetch for it within this TXG, so don't do it again.
The BRT vdev lookup and the following zap_prefetch_uint64() call can be
pretty expensive and should be avoided when not necessary.
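A toy model of the skip, assuming a hypothetical fixed-size pending list in place of the real pending tree (`struct toy_pending` and `toy_pending_add()` are illustrative): only the first clone of a block within a TXG triggers the expensive lookup; repeats just bump the pending count.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PENDING	64

/* Hypothetical per-TXG list of blocks that already have a pending entry. */
struct toy_pending {
	uint64_t blk[TOY_MAX_PENDING];
	uint64_t count[TOY_MAX_PENDING];
	size_t	n;
	size_t	prefetches;	/* how many expensive lookups were issued */
};

/* Record a clone of blk; prefetch only the first time within this TXG. */
static void
toy_pending_add(struct toy_pending *p, uint64_t blk)
{
	for (size_t i = 0; i < p->n; i++) {
		if (p->blk[i] == blk) {
			p->count[i]++;	/* already prefetched this TXG */
			return;
		}
	}
	if (p->n == TOY_MAX_PENDING)
		return;		/* toy capacity limit */
	p->blk[p->n] = blk;
	p->count[p->n] = 1;
	p->n++;
	p->prefetches++;	/* stands in for the vdev lookup + ZAP prefetch */
}
```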

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15941

2 months agoFix corruption caused by mmap flushing problems
Robert Evans [Mon, 25 Mar 2024 21:56:49 +0000 (17:56 -0400)]
Fix corruption caused by mmap flushing problems

1) Make mmap flushes synchronous. Linux may skip flushing dirty pages
   already in writeback unless data-integrity sync is requested.

2) Change zfs_putpage to use TXG_WAIT. Otherwise dirty pages may be
   skipped due to DMU pushing back on TX assign.

3) Add missing mmap flush when doing block cloning.

4) While here, pass errors from putpage to writepage/writepages.

This change fixes corruption edge cases, but unfortunately adds
synchronous ZIL flushes for dirty mmap pages to llseek and bclone
operations. It may be possible to avoid these sync writes later
but would need more tricky refactoring of the writeback code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #15933
Closes #16019

2 months agolibfetch: parse scheme://domain:/ correctly
Ka Ho Ng [Mon, 25 Mar 2024 20:10:42 +0000 (16:10 -0400)]
libfetch: parse scheme://domain:/ correctly

This improves URL-parsing compatibility with cURL, and unbreaks parsing of
similar kinds of URLs after commit 8d9de5b10a24.
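A sketch of the port rule, assuming a hypothetical helper `toy_parse_port()` rather than libfetch's internal parser: an empty port (a colon followed directly by `/` or the end of the string) is accepted and reported as 0, meaning the scheme's default. For brevity it only accepts `/` or end of string after the port.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * p points just past the host part of a URL. Returns a pointer past the
 * optional ":port", or NULL if the port is malformed. *port is 0 when no
 * port (or an empty one) was given.
 */
static const char *
toy_parse_port(const char *p, int *port)
{
	int n = 0;

	*port = 0;
	if (*p != ':')
		return (p);	/* no port at all */
	p++;
	while (isdigit((unsigned char)*p)) {
		n = n * 10 + (*p - '0');
		if (n > 65535)
			return (NULL);
		p++;
	}
	if (*p != '\0' && *p != '/')
		return (NULL);	/* junk after the port */
	*port = n;
	return (p);
}
```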

Sponsored by: Juniper Networks, Inc.
Reviewed by: des
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44493

2 months agoboot0: remove reference to fdisk
Ed Maste [Wed, 24 Jan 2024 15:13:08 +0000 (10:13 -0500)]
boot0: remove reference to fdisk

fdisk is obsolete and there is no need to mention a specific tool used
to update the partition table.  Just refer to it as the MBR partition
table.

Sponsored by: The FreeBSD Foundation

2 months agobsdlabel: emit deprecation notice when run
Ed Maste [Wed, 24 Jan 2024 19:33:35 +0000 (14:33 -0500)]
bsdlabel: emit deprecation notice when run

Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43586

2 months agobsdlabel: add deprecation notice
Ed Maste [Tue, 23 Jan 2024 18:04:43 +0000 (13:04 -0500)]
bsdlabel: add deprecation notice

gpart is the preferred tool for managing partitions of all types,
including BSD disklabels.

Note that this is only about bsdlabel/disklabel, the tool -- there is no
current plan to remove support for MBR or BSD disk labels from the
kernel or from gpart.

Reviewed by: imp, olce
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43563

2 months agoRemove a reference to xrpu from timetc.h
Josef 'Jeff' Sipek [Mon, 25 Mar 2024 17:50:50 +0000 (11:50 -0600)]
Remove a reference to xrpu from timetc.h

It was removed in 2007, so it doesn't make a good example.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44462

2 months agocertctl: Revert to symlinks.
Mark Peek [Mon, 25 Mar 2024 15:58:46 +0000 (16:58 +0100)]
certctl: Revert to symlinks.

Unfortunately tar will not be able to extract base.txz to a system where
/etc and /usr are not on the same filesystem if the certificates are
hard links.

PR: 277828
Reviewed by: mp
Differential Revision: https://reviews.freebsd.org/D44496

2 months agosleep: Overhaul.
Dag-Erling Smørgrav [Mon, 25 Mar 2024 15:58:31 +0000 (16:58 +0100)]
sleep: Overhaul.

Program:

* Add a dummy getopt(3) loop to handle `--`.
* Move interval parsing out into a separate function.
* Print a diagnostic for every invalid interval.
* Check for NaN and infinity.
* Improve bounds checks.
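A minimal sketch of an interval parser with the checks listed above, assuming plain seconds only (unit suffixes are omitted); `toy_parse_interval()` is hypothetical and not the committed code.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/*
 * Parse one interval operand in seconds. Rejects empty input, trailing
 * junk, NaN, infinity, negative values, and out-of-range numbers.
 * Returns 0 on success with the value in *secs, or -1 on error.
 */
static int
toy_parse_interval(const char *arg, double *secs)
{
	char *end;
	double d;

	errno = 0;
	d = strtod(arg, &end);
	if (end == arg || *end != '\0')
		return (-1);	/* nothing parsed, or trailing characters */
	if (errno == ERANGE || isnan(d) || isinf(d) || d < 0)
		return (-1);
	*secs = d;
	return (0);
}
```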

Manual page:

* Miscellaneous markup fixes.
* Reword DESCRIPTION section.
* Move text about GNU compatibility to STANDARDS section.
* Convert examples from csh to sh.

Sponsored by: Klara, Inc.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D44471

2 months agopfsync: cope with multiple pending plus messages
Kristof Provost [Sun, 24 Mar 2024 15:08:52 +0000 (16:08 +0100)]
pfsync: cope with multiple pending plus messages

It's possible for pfsync to add a plus message when one is already queued.
Append both, rather than overwriting the already pending one.

MFC after: 1 week

2 months agopfsync: fix use of invalidated stack variable
Kristof Provost [Sun, 24 Mar 2024 08:46:31 +0000 (09:46 +0100)]
pfsync: fix use of invalidated stack variable

Calls to pfsync_send_plus() pass pointers to stack variables.
If pfsync_sendout() then fails, it retains the pointer to these stack
variables, accessing them later.

Allocate a buffer and copy the data instead, so that we can retain the
pointer safely.
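A sketch of the copy-and-append approach, assuming a hypothetical singly linked queue (`struct toy_plus_msg`, `toy_send_plus()` and the other helpers are illustrative, not pfsync's structures). Copying into a heap buffer means a retained pointer never refers to the caller's stack, and appending rather than replacing also matches the change listed above that copes with multiple pending plus messages.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical queued "plus" message: owns a private copy of the payload. */
struct toy_plus_msg {
	struct toy_plus_msg	*next;
	size_t			 len;
	unsigned char		 data[];	/* flexible array member */
};

struct toy_plus_queue {
	struct toy_plus_msg	*head;
	struct toy_plus_msg	**tailp;
};

static void
toy_plus_queue_init(struct toy_plus_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/* Copy the caller's (possibly stack) payload and append it to the queue. */
static int
toy_send_plus(struct toy_plus_queue *q, const void *buf, size_t len)
{
	struct toy_plus_msg *m = malloc(sizeof(*m) + len);

	if (m == NULL)
		return (-1);
	m->next = NULL;
	m->len = len;
	memcpy(m->data, buf, len);
	*q->tailp = m;
	q->tailp = &m->next;
	return (0);
}

/* Free every queued message, e.g. after a successful send. */
static void
toy_plus_queue_drain(struct toy_plus_queue *q)
{
	struct toy_plus_msg *m;

	while ((m = q->head) != NULL) {
		q->head = m->next;
		free(m);
	}
	q->tailp = &q->head;
}
```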

Reported by: CI KASAN, markj
MFC after: 1 week

2 months agopf: fix use-after-free
Kristof Provost [Sat, 23 Mar 2024 16:02:50 +0000 (17:02 +0100)]
pf: fix use-after-free

If we fragment the packet in pf_route() the first transmitted packet
will free the pf_mtag we have stored in pf_pdesc (pd). Ensure we
update that pointer for every packet to avoid using a freed pointer in
pf_dummynet_route().

Reported by: CI KASAN, markj
MFC after: 1 week

2 months agonetpfil tests: disable ICMPv6 rate limiting in the test jail
Gleb Smirnoff [Mon, 25 Mar 2024 02:54:34 +0000 (19:54 -0700)]
netpfil tests: disable ICMPv6 rate limiting in the test jail

The dummynet test uses flood ping as a source of traffic, so the rate
limiting of ICMP replies broke the test.

Fixes: 32aeee8ce7e72738fff236ccd5629d55035458f8

2 months agoicmp: allow zero value for ICMP limits
Gleb Smirnoff [Mon, 25 Mar 2024 02:52:03 +0000 (19:52 -0700)]
icmp: allow zero value for ICMP limits

Zero means the limit is disabled, so the value doesn't need to be
checked against the jitter value.
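A sketch of the validation order, assuming a hypothetical `toy_icmp_limit_validate()` and that a nonzero limit must exceed the jitter; the real sysctl handler may differ.

```c
#include <errno.h>

/* Validate a new ICMP rate limit against the configured jitter. */
static int
toy_icmp_limit_validate(int new_limit, int jitter)
{
	if (new_limit < 0)
		return (EINVAL);
	if (new_limit == 0)
		return (0);		/* limiting disabled: skip jitter check */
	if (jitter >= new_limit)
		return (EINVAL);	/* assumed rule: jitter must be below limit */
	return (0);
}
```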

Fixes: ac44739fd834f51cacb26485a4140fd482e20150
Fixes: a03aff88a14448c3084a0384082ec996d7213897

2 months agoefibootmgr: allow -u as a valid option
Mark Peek [Sun, 24 Mar 2024 19:37:12 +0000 (12:37 -0700)]
efibootmgr: allow -u as a valid option

PR: 277907
Reported by: vsasjason@gmail.com
MFC after: 1 week

2 months agousr.bin/calendar/calendars: Add myself as a committer
Rainer Hurling [Sun, 24 Mar 2024 18:57:27 +0000 (19:57 +0100)]
usr.bin/calendar/calendars: Add myself as a committer

2 months agoarm64: fix free queue and reservation configuration for 16KB pages
Eliot Solomon [Sat, 18 Nov 2023 21:13:21 +0000 (15:13 -0600)]
arm64: fix free queue and reservation configuration for 16KB pages

Correctly configure the free page queues and the reservation size when
the base page size is 16KB.  In particular, the reservation size was
less than the L2 Block size, making L2 promotions and mappings all but
impossible.
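The arithmetic behind the fix: with 8-byte descriptors, a table in a 2^n-byte granule holds 2^(n-3) entries, so an L2 block covers 2^(n-3) base pages. With 4KB pages that is 512 pages (2MB); with 16KB pages it is 2048 pages (32MB), so a reservation must be 2^11 pages to line up with an L2 block. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Bytes mapped by one L2 block for a granule of 2^page_shift bytes. */
static uint64_t
toy_l2_block_size(unsigned int page_shift)
{
	uint64_t granule = UINT64_C(1) << page_shift;

	/* granule / 8 descriptors per table, each L3 entry maps one page */
	return ((granule / 8) * granule);
}

/* Reservation order (log2 of pages) needed to cover one L2 block. */
static unsigned int
toy_reservation_order(unsigned int page_shift)
{
	return (page_shift - 3);	/* log2(granule / 8) */
}
```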

Reviewed by: markj
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D42737