Dapeng Gao [Mon, 3 Jun 2024 18:18:13 +0000 (12:18 -0600)]
Simplify signal handling code in libthr by removing use of SYS_sigreturn
The use of SYS_sigreturn is unnecessary here.
If handle_signal is called when a signal is delivered, it can just
return normally back to sigcode which will call sigreturn anyway.
In case handle_signal is called by check_deferred_signal, using
setcontext is better than SYS_sigreturn because that is the correct
system call to pair with the earlier getcontext.
John Baldwin [Thu, 6 Jun 2024 21:47:04 +0000 (14:47 -0700)]
cryptocheck: Don't treat OpenSSL errors as fatal
Abort the current test but keep running additional tests if OpenSSL
reports an error during a test. This matches the behavior for other
tests such as an error from OCF.
Austin Shafer [Thu, 6 Jun 2024 20:42:06 +0000 (23:42 +0300)]
linuxkpi: Allow ida_destroy and idr_destroy to be called multiple times
This fixes some weird behavior triggered by nvidia-drm.ko: some DRM
cleanup functions will be called multiple times, leading to a double
free. drm_mode_config_cleanup will be called twice, causing ida_destroy
to be called twice. Although calling the cleanup twice doesn't seem
very clean, on Linux this seems to be permissable as it handles it
just fine. Not doing these checks causes mutex panics and double frees.
In order to preserve this behavior this change checks if the objects
have already been destroyed and bails if so. This fixes the panic seen
when unloading the nvidia-drm driver.
Doug Moore [Thu, 6 Jun 2024 18:42:31 +0000 (13:42 -0500)]
rangeset: add next() iteration
Add a method rangeset_next to find the first range that starts at or
after a given value. Use it to rewrite pmap_pkru_same and
pmap_bti_same to avoid walking a page at a time over pages in no
range.
Ryan Libby [Thu, 6 Jun 2024 17:26:50 +0000 (10:26 -0700)]
vm_radix: define vm_radix_insert_lookup_lt and use in vm_page_rename
Use the new pctrie combined lookup/insert. This is an easy application
of the new facility. There are other places where we do this for pages
that may need more plumbing to use combined lookup/insert.
Cosimo Cecchi [Thu, 6 Jun 2024 17:24:07 +0000 (17:24 +0000)]
lam: fail on I/O errors
I/O errors should be reported; however lam currently does not
disambiguate between EOF because end-of-file was reached and EOF because
an I/O error occurred.
This commit changes lam to exit with EX_IOERR when an I/O error occurs.
Reviewed by: imp, allanjude
Sponsored by: Apple Inc.
Differential Revision: https://reviews.freebsd.org/D45437
Alan Somers [Wed, 5 Jun 2024 20:13:04 +0000 (14:13 -0600)]
ctladm.8: fix several errors in the "port" section
* Document the "-d" option.
* Add the "-c" and "-r" options to the summary.
* Correct the list of required options.
* Clarify that the "-t" option is only for use with "-o", "-w", and "-W"
* Replace references to the nonexistent "-n" with "-p".
Also, fix a few related error strings in the ctladm command.
Cosimo Cecchi [Thu, 6 Jun 2024 16:51:22 +0000 (16:51 +0000)]
comm: flush stdout for error checking prior to exiting
UNIX conformance wants utilities to catch any errors when doing I/O, as
opposed to relying on the implicit flush upon exit.
comm currently does not do that.
This commit adds handling of I/O errors on stdout prior to exit.
Reviewed by: imp, allanjude
Sponsored by: Apple Inc.
Differential Revision: https://reviews.freebsd.org/D45439
Kristof Provost [Wed, 5 Jun 2024 20:30:34 +0000 (16:30 -0400)]
pf: simplify pf_addrcpy() and pf_match_addr()
Use the v4/v6 union members rather than the uint32_t ones.
Export IN_ARE_MASKED_ADDR_EQUAL() in in_var.h and use it (and its IPv6
equivalent) for masked comparisons rather than hand-rolled code.
Kristof Provost [Wed, 5 Jun 2024 17:44:46 +0000 (19:44 +0200)]
netlink: pass the correct arguments for SIOCDIFADDR and SIOCDIFADDR_IN6
These take struct ifreq and struct in6_ifreq respectively. Passing struct
in_aliasreq or struct in6_aliasreq means we're supplying a shorter object than
expected. While this doesn't actively break things on most architectures other
than CHERI it is still wrong.
mpi3mr: Divert large WriteSame IOs to firmware if unmap and ndob bits are set
Firmware advertises the transfer lenght for writesame commands to driver during init.
So for any writesame IOs with ndob and unmap bit set and transfer lengh is greater
than the max write same length specified by the firmware, then direct those commands
to firmware instead of hardware otherwise hardware will break.
mpi3mr: Update consumer index of admin and operational reply queues after every 100 replies
Instead of updating the ConsumerIndex of the Admin and Operational ReplyQueues
after processing all replies in the queue, it will now be periodically updated
after processing every 100 replies.
mpi3mr: Decrement per controller and per target counter post reset
Post controller reset, If any device removal events arrive, and if
there are any outstanding IOs then the driver will unnecessarily wait
in the loop for 30 seconds before removing the device from the OS.
reset target outstanding IO counter and controller outstanding IO counter
and remove the redundant wait loop.
mpi3mr: poll reply queue and add MPI3MR_DEV_REMOVE_HS_COMPLETED flag
An outstanding IO counter per target check has been added before deleting
the target from the OS which will poll the reply queue if there are any
outstanding IOs are found.
A new flag, named "MPI3MR_DEV_REMOVE_HS_COMPLETED," is added. If a remove event
for a target occurs and before the deletion of the target resource if the add event
for another target arrives reusing the same target ID then this flag will prevent
the removal of the target reference. This flag ensures synchronization between the interrupt
top and bottom half during target removal and addition events.
Stefan Eßer [Thu, 6 Jun 2024 10:28:02 +0000 (12:28 +0200)]
newfs_msdos: align to multiple of cluster size by default
A previous commit aligned the start of the data area to a multiple of
the VM page size, in order to prevent extra buffers to be allocated
(which failed for 64 KB cluster size without this alignment).
Since a dependency on PAGE_SIZE caused compatibility issues, the
alignment was made conditional on this macro being defined, in the
previous commit. This lead to different behavior of this program
when built on FreeBSD vs. Linux (which does not define PAGE_SIZE).
This commit removes any use of PAGE_SIZE and instead always aligns
the start of the data area to a multiple of the cluster size.
The -A option is now implied, unless overridden by a specific number
of reserved sectors with the -r option.
Michael Tuexen [Thu, 6 Jun 2024 06:29:05 +0000 (08:29 +0200)]
tcp: simplify stack switching protocol
Before this patch, a stack (tfb) accepts a tcpcb (tp), if the
tp->t_state is TCPS_CLOSED or tfb->tfb_tcp_handoff_ok is not NULL
and tfb->tfb_tcp_handoff_ok(tp) returns 0.
After this patch, the only check is tfb->tfb_tcp_handoff_ok(tp)
returns 0. tfb->tfb_tcp_handoff_ok must always be provided.
For existing TCP stacks (FreeBSD, RACK and BBR) there is no
functional change. However, the logic is simpler.
Ryan Libby [Thu, 6 Jun 2024 03:21:22 +0000 (20:21 -0700)]
getblk: reduce time under bufobj lock
Use the new pctrie combined insert/lookup facility to reduce work and
time under the bufobj interlock when associating a buf with a vnode.
We now do one lookup in the dirty tree and one combined lookup/insert in
the clean tree instead of one lookup in dirty, two in clean, and then an
insert in clean. We also avoid touching the possibly unrelated buf at
the tail of the queue.
Also correct an issue where the actual order of the tail queue depended
on the insertion order due to sign issues.
Ryan Libby [Thu, 6 Jun 2024 02:17:20 +0000 (19:17 -0700)]
pctrie: add combined insert/lookup operations
In several places in code, we do a pctrie lookup followed by a pctrie
insert. Provide a few flavors of combined lookup/insert. This may save
a portion of the work from walking a large pctrie twice.
The general idea is that while we walk the trie during insert, we also
do the same kind of tracking work that we do during pctrie_lookup_ge or
pctrie_lookup_le, and we pass out a pctrie node from where such a lookup
may continue.
John Baldwin [Wed, 5 Jun 2024 19:59:28 +0000 (12:59 -0700)]
nvmf: Handle shutdowns more gracefully
If an association is disconnected during a clean shutdown, abort all
pending and future I/O requests with an error to avoid hangs either due
to filesystem unmounts or a stuck GEOM event.
If an association is connected during a clean shutdown, gracefully
disconnect from the remote controller and close the open queues.
John Baldwin [Wed, 5 Jun 2024 19:54:15 +0000 (12:54 -0700)]
nvmf: Permit failing I/O requests while disconnected
Add a kern.nvmf.fail_on_disconnection sysctl similar to the
kern.iscsi.fail_on_disconnection sysctl. This causes pending I/O
requests to fail with an error if an association is disconnected
instead of requeueing to be retried once the association is
reconnected. As with iSCSI, the default is to queue and retry
operations.
John Baldwin [Wed, 5 Jun 2024 19:53:08 +0000 (12:53 -0700)]
nvmf: Rescan namespaces after reconnecting
While a host was disconnected from a remote controller, namespaces
might have been added, removed, or altered properties. Rescan the
namespaces after reconnecting to detect any such changes.
John Baldwin [Wed, 5 Jun 2024 19:51:56 +0000 (12:51 -0700)]
nvmf: Refactor nvmf_add_namespaces to be more generic
Rename to nvmf_scan_active_namespaces and accept an additional
callback function and callback argument. The callback is invoked on
each active namespace enumerated by the active namespace list from the
IDENTIFY command.
Alan Cox [Wed, 5 Jun 2024 06:40:20 +0000 (01:40 -0500)]
vm: Eliminate a redundant call to vm_reserv_break_all()
When vm_object_collapse() was changed in commit 98087a0 to call
vm_object_terminate(), rather than destroying the object directly, its
call to vm_reserv_break_all() should have been removed, as
vm_object_terminate() calls vm_reserv_break_all().
John Baldwin [Wed, 5 Jun 2024 16:50:05 +0000 (09:50 -0700)]
pci: Only add special VF handling for direct children in bus methods
For activate/deactivate resource, use a more standard check at the
start of the function since the addition of the PCI_IOV code made this
more complex. For the three recently added methods, just add the
typical check at the beginning that I missed.
This wasn't always fatal as if your system only had PCI device_t's as
children of PCI bus devices it would happen to work ok, but if you
have a non-PCI child device (e.g. an ATA channel) then dereferencing
ivars for non-direct-children could fault.
Jessica Clarke [Wed, 5 Jun 2024 16:41:54 +0000 (17:41 +0100)]
rtld-elf: Use a proper struct type for tlsdesc entries
This clarifies the code and makes it less error-prone. It also makes it
easier to extend downstream in CheriBSD (where pointer and integer
members no longer have the same representation and an additional member
is present).
Enji Cooper [Wed, 5 Jun 2024 04:16:48 +0000 (21:16 -0700)]
pci(4): unbreak the build
`argsp` is not defined in `generic_pcie_unmap_resource(..)`. Remove the
parameter passed to `bus_generic_unmap_resource(..)` as this parameter
is never passed to `generic_pcie_unmap_resource(..)`.
Ruslan Bukin [Wed, 5 Jun 2024 13:08:35 +0000 (14:08 +0100)]
riscv: add stage 2 translation to pmap.
Add basic stage 2 translation support (guest-physical to host-physical).
RISC-V hypervisor spec[1] introduces new translation schemes: Sv32x4,
Sv39x4, Sv48x4 and Sv57x4.
In each case, the size of the incoming address is widened by 2 bits (e.g.
Sv39 becomes 41-bit system).
To accommodate the 2 extra bits, the root page table (only) is expanded
by a factor of four to be 16 KiB instead of the usual 4 KiB. The rest of
page table system (including PTE format) is similar.
This gives us 4x of memory space in each scheme, but it does not make sense
to support all that memory for now.
Allocate required amount of pages for the top directory in case of stage 2,
but leave it unused.
amd64: add a func pointer to tlb shootdown function
Make the tlb shootdown function as a pointer. By default, it still
points to the system function smp_targeted_tlb_shootdown(). It allows
other implemenations to overwrite in the future.
Andrew Turner [Tue, 4 Jun 2024 12:47:52 +0000 (13:47 +0100)]
linux64: Fix the build on arm64 with bti checking
When we enable checking for BTI on arm64 we need to include an ELF
note in all object files linked into a module.
As using objcopy from a binary to an ELF object file doesn't add the
note switch to using .incbin from an assembly file. This allows us to
add the needed note without affecting the included object.
Andrew Turner [Tue, 4 Jun 2024 12:46:46 +0000 (13:46 +0100)]
arm64: Support BTI checking in most of the kernel
LLD has the -zbti-report=error argument to check if the BTI note is
present when linking. To allow for this to be used when linking the
kernel and modules:
- Add the BTI note to the remaining assembly files
- Mark ptrauth.c as protected by BTI
- Disable -zbti-report for vmm hypervisor switching code as it's not
used there.
The linux64 module doesn't build with the flag as it includes vdso code
that doesn't include the note.
Andrew Turner [Tue, 4 Jun 2024 12:45:00 +0000 (13:45 +0100)]
arm64: Disable outling atomics
We don't have the symbols for this. The virtio randon number driver
uses a C11 atomic operation. With inline atomics this is translated to
an Armv8.0 atomic operation, with outling atomics this becomes a
function call to a handler. As we don't have the needed function the
kernel fails to link.
After miss reading the cloudinit spec I ended up writting a wrong
test for basic ssh key setup, nuageinit has been fixed, but not
the test, here is the actual fix.
Alan Cox [Sat, 1 Jun 2024 18:17:52 +0000 (13:17 -0500)]
arm64 pmap: Enable L3C promotions by pmap_enter_quick()
More precisely, implement L3C (64KB/2MB, depending on base page size)
promotion in pmap_enter_quick()'s helper function,
pmap_enter_quick_locked(). At the same time, use the recently
introduced flag VM_PROT_NO_PROMOTE from pmap_enter_object() to
pmap_enter_quick_locked() to avoid L3C promotion attempts that will
fail.
Rick Macklem [Wed, 5 Jun 2024 01:46:41 +0000 (18:46 -0700)]
nfsd: Fix delegation handled for atomic upgrade
For NFSv4.1/4.2, an atomic upgrade of a delegation from a
read delegation to a write delegation is allowed and can
result in signoficantly improved performance.
This patch adds support for this atomic upgrade, plus fixes
a couple of other delegation related bugs. Since there were
three cases where delegations were being issued, the patch
factors this out into a separate function called
nfsrv_issuedelegations().
This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.
John Baldwin [Tue, 4 Jun 2024 23:50:56 +0000 (16:50 -0700)]
acpi/pci/vmd: Fix a nit with nested resource mapping requests
Some bus drivers use rmans to suballocate resources to child devices.
When the driver for a child device requests a mapping for a
suballocated resource, the bus driver translates this into a mapping
request for a suitable subrange of the original resource the bus
driver allocated from its parent. This nested mapping request should
look like any other resource mapping request being made by the bus
device (i.e. as if the bus device had called bus_map_resource() or
bus_alloc_resource() with RF_ACTIVE).
I had slightly flubbed this last bit though since the direct use of
bus_generic_map/unmap_resource passed up the original child device
(second argument to the underlying kobj interface). While this is
currently harmless, it is not strictly correct as the resource being
mapped is owned by the bus device, not the child and can break for
other bus drivers in the future.
Instead, use bus_map/unmap_resource for the nested request where the
requesting device is now the bus device that owns the parent resource.
Reviewed by: imp
Fixes: 0e1246e33461 acpi: Cleanup handling of suballocated resources
Fixes: b377ff8110e3 pcib: Refine handling of resources allocated from bridge windows
Fixes: d79b6b8ec267 pci_host_generic: Don't rewrite resource start address for translation
Fixes: d714e73f7895 vmd: Use bus_generic_rman_* for PCI bus and memory resources
Differential Revision: https://reviews.freebsd.org/D45433
Mitchell Horne [Tue, 4 Jun 2024 23:17:47 +0000 (20:17 -0300)]
devmap: eliminate unused arguments
The optional 'table' pointer is a legacy part of the interface, which
has been replaced by devmap_register_table()/devmap_add_entry(). The few
in-tree callers have already adapted to this, so it can be removed.
The 'l1pt' argument is already entirely unused within the function.
Reviewed by: andrew, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45319
Kristof Provost [Tue, 4 Jun 2024 18:02:18 +0000 (20:02 +0200)]
vnet tests: check for if_bridge.ko
A number of tests create a bridge, but did not check if if_bridge.ko is loaded.
We usually get away with that, because `ifconfig bridge create` autoloads the
module, but if we run the tests in a jail (e.g. because of kyua's upcoming
execenv.jail.params feature) we can't load the module and these tests can fail.
Check if the module is loaded, skip the test if it is not.
Mark Johnston [Tue, 4 Jun 2024 19:03:17 +0000 (15:03 -0400)]
bhyve: Add arm64 support to the gdb stub
- Add -G to the arm64 getopt handler.
- Add static register definitions and extensible XML definitions.
- Provide definitions for MD bits such as breakpoint encoding and
length.
- Ensure that bhyve re-injects breakpoint exceptions that it is not
responsible for.
Reviewed by: andrew
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D44740
Mark Johnston [Tue, 4 Jun 2024 18:58:08 +0000 (14:58 -0400)]
arm64/vmm: Add breakpoint and single-stepping support
This will be used to implement parts of bhyve's gdb stub.
Three VM capabilities are added, similar to amd64 without monitor mode.
Two cause breakpoint and single-step exceptions to be raised to EL2 and
then down to bhyve. One lets the gdb stub mask hardware interrupts
while single-stepping, since otherwise the guest will handle a timer
interrupt before executing the target instruction and thus fail
to make progress.
Reviewed by: bnovkov, andrew
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D44739
Doug Moore [Tue, 4 Jun 2024 18:07:07 +0000 (13:07 -0500)]
vm_phys: use ilog2(x) instead of fls(x)-1
One of these changes saves two instructions on an amd64
GENERIC-NODEBUG build. The rest are entirely cosmetic, because the
compiler can deduce that x is nonzero, and avoid the needless test.
Doug Moore [Tue, 4 Jun 2024 18:00:25 +0000 (13:00 -0500)]
x86: simplify ceil(log2(x)) function
A function called mask_width in one place and log2 in the other
calculates its value in a more complex way than necessary. A simpler
implementation offered here saves a few bytes in the functions that
call it.
Kristof Provost [Tue, 4 Jun 2024 12:55:02 +0000 (14:55 +0200)]
pf: fix overly large copy in pf_rule_to_krule()
The timeout array in struct pf_rule has PFTM_OLD_MAX entries, the one in
struct pf_krule has PFTM_MAX entries (and PFTM_MAX > PFTM_OLD_MAX).
Use the smaller of the sizes when copying.