
Extend the VMM stats interface to support a dynamic count of statistics.

- Add a starting index to 'struct vmstats' and change the
  VM_STATS ioctl to fetch the 64 stats starting at that index.
  A compat shim for <= 13 continues to fetch only the first 64
  stats.

- Extend vm_get_stats() in libvmmapi to use a loop and a static
  thread local buffer which grows to hold the stats needed.
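A minimal userland sketch of the loop described above. It assumes a hypothetical fake_vm_stats_ioctl() in place of the real VM_STATS ioctl and a fixed 64-entry window, and it is not the libvmmapi code. Each request starts at the next unfetched index, the thread-local buffer grows with realloc() when needed, and a short (or empty) window ends the loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STATS_PER_CALL 64	/* window size fetched per request */

/* Hypothetical backend: total number of stats the "kernel" exposes. */
static size_t fake_total_stats = 150;

/*
 * Hypothetical stand-in for the VM_STATS ioctl: copies up to
 * STATS_PER_CALL counters starting at 'index' into 'buf' and
 * returns how many were copied.
 */
static size_t
fake_vm_stats_ioctl(size_t index, uint64_t *buf)
{
	size_t n = 0;

	while (n < STATS_PER_CALL && index + n < fake_total_stats) {
		buf[n] = index + n;	/* dummy counter value */
		n++;
	}
	return (n);
}

/* Fetch every stat into a per-thread buffer that grows on demand. */
static uint64_t *
get_all_stats(size_t *countp)
{
	static _Thread_local uint64_t *stats;
	static _Thread_local size_t capacity;
	uint64_t window[STATS_PER_CALL];
	size_t index = 0, n;

	assert(countp != NULL);
	do {
		n = fake_vm_stats_ioctl(index, window);
		if (n == 0)
			break;		/* total was a multiple of the window */
		if (index + n > capacity) {
			uint64_t *p = realloc(stats, (index + n) * sizeof(*p));

			if (p == NULL)
				return (NULL);
			stats = p;
			capacity = index + n;
		}
		memcpy(stats + index, window, n * sizeof(window[0]));
		index += n;
	} while (n == STATS_PER_CALL);	/* a short window means we are done */

	*countp = index;
	return (stats);
}
```

The buffer is kept per thread so repeated calls reuse the allocation without locking, at the cost of one buffer per calling thread.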

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27463

(cherry picked from commit 64269786170ffd8e3348edea0fc5f5b09b79391e)

cfiscsi_done: Free the dummy PDU earlier.

The dummy PDU needs to be freed before marking task abortion complete
as otherwise cfiscsi_session_terminate_tasks can return and destroy
the session in another thread before the PDU is freed.

Fixes: 2e8d1a55258d iscsi: Allocate a dummy PDU for the internal nexus reset task.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34176

(cherry picked from commit a3d71fffa78619fb394b8bb595d2ef680bd0e43a)

bsd.compat.mk: A few cosmetic fixes.

- Add a missing ')' to a warning.

- Consistently use {} when expanding variables.

- Remove a spurious blank line.

Reviewed by: imp, emaste
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D34172

(cherry picked from commit ffe74ab77f234fabe691bd79e254cd000e07dfe0)

cxgbei: Rework parsing of pre-offload PDUs.

sbcut() returns mbufs in reverse order, so it is not suitable for reading
data from the socket buffer. Instead, check for already-received data
in the receive worker thread before passing offload PDUs up to the
iSCSI layer. This uses soreceive() to read data from the socket and
is also able to use M_WAITOK since it now runs from a worker thread
instead of an interrupt thread.

Also, fix decoding of the data segment length for pre-offload PDUs.
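For reference, a sketch of how the data segment length can be decoded from a PDU's 48-byte Basic Header Segment, following the RFC 7143 layout (TotalAHSLength in byte 4, counted in 4-byte words, and a 24-bit big-endian DataSegmentLength in bytes 5-7). The helper names are hypothetical, digests are ignored, and this is not the cxgbei code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISCSI_BHS_SIZE 48

/* DataSegmentLength is a 24-bit big-endian field in bytes 5-7 of the BHS. */
static uint32_t
bhs_data_segment_length(const uint8_t *bhs)
{
	return ((uint32_t)bhs[5] << 16 | (uint32_t)bhs[6] << 8 | bhs[7]);
}

/* On-the-wire size of a PDU, ignoring header and data digests. */
static size_t
pdu_wire_length(const uint8_t *bhs)
{
	size_t ahs_len = (size_t)bhs[4] * 4;	/* TotalAHSLength is in words */
	size_t data_len = bhs_data_segment_length(bhs);

	/* The data segment is padded to a multiple of 4 bytes. */
	return (ISCSI_BHS_SIZE + ahs_len + ((data_len + 3) & ~(size_t)3));
}
```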

Reported by: Jithesh Arakkan @ Chelsio
Fixes: a8c4147edcdc cxgbei: Parse all PDUs received prior to enabling offload mode.
Sponsored by: Chelsio Communications

(cherry picked from commit 74fea8eb4fee65163fa745d0dbfcefc138ff7925)

iscsi: Allocate a dummy PDU for the internal nexus reset task.

When an iSCSI target session is terminated, an internal nexus reset
task is posted to abort existing tasks belonging to the session.
Previously, the ctl_io for this internal nexus reset stored a pointer
to the session in the slot that normally holds a pointer to the PDU
from the initiator that triggered the I/O request.  The completion
handler then assumed that any nexus reset I/O was due to an internal
request and fetched the session pointer (instead of the PDU pointer)
from the ctl_io.  However, it is possible to trigger a nexus reset via
an on-the-wire task management PDU.  If such a PDU were sent to the
target, then the completion handler would incorrectly treat this
request as an internal request and treat the pointer to the received
PDU as a pointer to the session instead.

To fix, allocate a dummy PDU for the internal reset task and use an
invalid opcode to differentiate internal nexus resets from resets
requested by the initiator.
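A hypothetical sketch of the idea: the ctl_io slot always holds a PDU pointer, and the internal reset is recognized by an opcode value assumed never to arrive from the initiator. The 0xff value and all names here are illustrative choices, not necessarily what the fix uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical marker opcode, assumed never to appear on the wire. */
#define PDU_OPCODE_INTERNAL_RESET 0xff

struct pdu {
	uint8_t	 opcode;
	void	*session;	/* owning session, valid for every PDU */
};

/* The dummy PDU gives the internal reset the same shape as a real one. */
static struct pdu *
alloc_internal_reset_pdu(void *session)
{
	struct pdu *p = calloc(1, sizeof(*p));

	if (p != NULL) {
		p->opcode = PDU_OPCODE_INTERNAL_RESET;
		p->session = session;
	}
	return (p);
}

/* The completion handler can now tell the two cases apart safely. */
static bool
is_internal_reset(const struct pdu *p)
{
	assert(p != NULL);
	return (p->opcode == PDU_OPCODE_INTERNAL_RESET);
}
```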

PR: 260449
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34055

(cherry picked from commit 2e8d1a55258d39f7315fa4f2164c0fce96e79802)

libthr: Use TLS_TCB_* in _tcb_[cd]tor.

This matches libc and rtld in using the alignment (TLS_TCB_ALIGN) from
machine/tls.h instead of hardcoding 16.

Reviewed by: kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D34023

(cherry picked from commit 8de1a8131e42f96f8dcfbca9073896d249ff7d2c)

cxgbei: Parse all PDUs received prior to enabling offload mode.

Previously this would only handle a single PDU that did not contain
any data. This should now handle an arbitrary number of PDUs.

While here check for these PDUs in the T6-specific CPL_RX_ISCSI_CMP
handler in addition to CPL_RX_ISCSI_DDP.

Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications

(cherry picked from commit a8c4147edcdce934f93dd848c6ed083500dff22c)

freebsd32: Fix layout of struct shmid_kernel32.

The kernel pointers in this structure need to be 32-bit pointers,
not native pointers to 32-bit integers.

Reviewed by: kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33905

(cherry picked from commit da7fc5c33f9a4c906068a9a43f43f8d295100418)

iscsi: Abort fewer data-out tasks on a terminating session.

Only abort tasks queued for datamove after
cfiscsi_session_terminate_tasks has posted its internal
CTL_TASK_I_T_NEXUS_RESET task.

Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
Fixes: 0cd6e85e242b iscsi: Abort data-out tasks queued on a terminating session.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33747

(cherry picked from commit a3af69fa81d2341a3864edc67419de3b3bad77d9)

cryptocheck: Add aliases for algs with multiple key sizes.

Previously algorithms such as AES-CBC would provide an algorithm
without a key size for the smallest key size and additional algorithms
with an explicit key size, e.g. "aes-cbc" (128 bits), "aes-cbc192",
and "aes-cbc256".

Instead, always make the key size name explicit and reuse the
"generic" name to request running tests against all of the key sizes.
For example, for AES-CBC this means "aes-cbc128" is now the name of
the variant with a 128-bit key and "aes-cbc" runs tests of AES-CBC
with all three key sizes.

This makes it easier to run tests on all combinations of ciphers like
AES-GCM or AES-CCM with -z in a single invocation.
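A sketch of the name matching, assuming a hypothetical table in which each entry carries an explicit name and a generic alias; the real cryptocheck tables are structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of an algorithm table with explicit key sizes. */
static const struct alg {
	const char *name;	/* explicit name, e.g. "aes-cbc128" */
	const char *generic;	/* alias covering all key sizes */
} algs[] = {
	{ "aes-cbc128", "aes-cbc" },
	{ "aes-cbc192", "aes-cbc" },
	{ "aes-cbc256", "aes-cbc" },
	{ "aes-gcm128", "aes-gcm" },
	{ "aes-gcm192", "aes-gcm" },
	{ "aes-gcm256", "aes-gcm" },
};

/* Invoke 'run' (if non-NULL) for every algorithm 'name' selects. */
static size_t
select_algs(const char *name, void (*run)(const struct alg *))
{
	size_t i, matched = 0;

	assert(name != NULL);
	for (i = 0; i < sizeof(algs) / sizeof(algs[0]); i++) {
		if (strcmp(name, algs[i].name) == 0 ||
		    strcmp(name, algs[i].generic) == 0) {
			if (run != NULL)
				run(&algs[i]);
			matched++;
		}
	}
	return (matched);
}
```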

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33759

(cherry picked from commit 78beb051a2661b873342162b1ec0ad55b4e27261)

Bump Dd for crypto_clonereq.

Fixes: 74d3f1b63dbe OCF: Add crypto_clonereq().
(cherry picked from commit 822fa6758b88285d7f7701a801670adf256aebae)

ccr: Use a software OCF session for requests which fallback to software.

Previously the driver duplicated code from cryptosoft.c to handle
certain edge case AES-CCM and AES-GCM requests.  However, this
approach has a few downsides:

1) It only uses "plain" software and not accelerated software since it
   uses enc_xform directly.

2) It performs the operation synchronously even though the caller
   believes it is invoking an async driver.  This was fine for the
   original use case of requests with only AAD and no payload that
   execute quickly, but is a bit more disingenuous for large requests
   which fall back due to exceeding the size of a firmware work
   request (e.g. due to large scatter/gather lists).

3) It has required several updates since ccr(4) was added to the tree.

Instead, allocate a software session for AES-CCM and AES-GCM sessions
and dispatch a cloned request asynchronously to the software session.

Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33608

(cherry picked from commit e43cf698d93c9f3007ade4884c6ace38eec43a52)

OCF: Add crypto_clonereq().

This function clones an existing crypto request, but associates the
new request with a specified session. The intended use case is for
drivers to be able to fall back to software by cloning a request and
dispatching it to an internally allocated software session.

Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33607

(cherry picked from commit 74d3f1b63dbea05038e966cf4bb69a01b0589500)

cryptodev.h: Drop 'extern' from function prototypes.

Sponsored by: Chelsio Communications

(cherry picked from commit d074adf18be2801d2c19bf50677ff10495348fc5)

sigev_findtd: Fix whitespace nit in argument list.

Obtained from: CheriBSD

(cherry picked from commit 753c8513879ba30d129e0b0301ffb8e640231266)

Simplify swi for bus_dma.

When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages.  For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm).  However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending), which is always true,
and calling busdma_swi if it is set.  I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux.  However, in the current scheme there's no need for the mux.

Instead, remove swi_vm and vm_ih.  Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly.  This swi invokes busdma_swi directly,
removing the need for busdma_swi_pending.

One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.

Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447

(cherry picked from commit 254e4e5b77d7788c46333ae35d5e9f347e22c746)

cxgbei: Don't fail task setup if the socket is disconnected.

When the initiator is reconnecting to the target, the connection may
temporarily be marked disconnected or not have an associated socket.
New I/O requests received by the initiator in this state should not
fail with ECONNRESET as that results in an I/O error back to userland.
Instead, they need to still succeed so that CAM can queue the requests
and send them once the connection is re-established.

Setting up DDP for zero-copy receive requires a socket, so just punt
on using DDP for these transfers.

Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications

(cherry picked from commit 752e211e64699ff14bb0a66d368cfaec836cfb95)

iscsi: Handle large Text responses.

Text requests and responses can span multiple PDUs.  In that case, the
sender sets the Continue bit in non-final PDUs and the Final bit in
the last PDU.  The receiver responds to non-final PDUs with an empty
text PDU.

To support this, add a more abstract API in libiscsi which sends and
receives key sets rather than PDUs.  These routines internally send or
receive one or more PDUs.  Use these new functions to replace the
handling of TextRequest and TextResponse PDUs in discovery sessions in
both ctld and iscsid.

Note that there is not currently a use case for large Text requests
and those are still always sent as a single PDU.  However, discovery
sessions can return a text response listing targets that spans
multiple PDUs, so the new API supports sending and receiving multi-PDU
responses.
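A sketch of how a response could be split across PDUs, assuming a hypothetical split_text() helper and the RFC 7143 flag bits for Text PDUs (F = 0x80, C = 0x40 in byte 1). It only computes chunk boundaries and flags; sending the PDUs and waiting for the empty acknowledgement PDUs is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEXT_FLAG_FINAL		0x80	/* F bit */
#define TEXT_FLAG_CONTINUE	0x40	/* C bit */

struct text_chunk {
	size_t	offset;
	size_t	len;
	uint8_t	flags;
};

/*
 * Split 'total' bytes of key data into chunks of at most 'max_dsl'
 * bytes.  Returns the number of chunks, or 0 if 'chunks' is too small.
 */
static size_t
split_text(size_t total, size_t max_dsl, struct text_chunk *chunks,
    size_t maxchunks)
{
	size_t n = 0, off = 0, len;

	assert(max_dsl > 0);
	do {
		if (n == maxchunks)
			return (0);
		len = total - off < max_dsl ? total - off : max_dsl;
		chunks[n].offset = off;
		chunks[n].len = len;
		off += len;
		/* Non-final PDUs carry C; the last one carries F. */
		chunks[n].flags = off < total ? TEXT_FLAG_CONTINUE :
		    TEXT_FLAG_FINAL;
		n++;
	} while (off < total);
	return (n);
}
```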

Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33548

(cherry picked from commit b406897911ea6c18c401907a076c4642dac46127)

iscsid: Always free the duplicated address in resolve_addr().

If a "raw" IPv6 address (denoted by a leading '[') is used as a target
address, then 'arg' is incremented by one to skip over the '['.
However, this meant that at the end of the function the wrong address
was passed to free(). With malloc junking enabled and given suitably
small strings, the allocator would happily overwrite the correct number
of bytes with junk, but starting one byte too late, so the byte just
after the allocation was also overwritten.

This manifested as the first byte of the 'HeaderDigest' key being
overwritten causing the key name on the wire to be sent as
'\x5eaderDigest' which the target rejected.
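The general fix pattern, sketched with a hypothetical parse_host() helper (the real resolve_addr() goes on to resolve the address): keep the pointer returned by strdup() for free() and advance a separate cursor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of the host part of
 * "[v6addr]:port" or "host:port".
 */
static char *
parse_host(const char *addr)
{
	char *copy, *arg, *end, *host;

	assert(addr != NULL);
	copy = strdup(addr);
	if (copy == NULL)
		return (NULL);
	arg = copy;
	if (arg[0] == '[') {
		arg++;			/* skip '[', but keep 'copy' for free() */
		end = strchr(arg, ']');
	} else
		end = strrchr(arg, ':');
	if (end != NULL)
		*end = '\0';
	host = strdup(arg);
	free(copy);			/* not free(arg): arg may be copy + 1 */
	return (host);
}
```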

Reported by: Jithesh Arakkan @ Chelsio
Found with: ASAN (via WITH_ASAN=yes)
Sponsored by: Chelsio Communications

(cherry picked from commit c74ab5ce6f259afe1720a326df7e77848cf4f00b)

ctld: Disable -Wcast-align warnings.

clang complains about the downcasts from struct connection to struct
ctld_connection as the alignment of struct ctld_connection is higher
on 32-bit platforms. However, the warning is in this case harmless as
the downcasts are on objects originally allocated as instances of
struct ctld_connection with suitable alignment.

Reported by: npn, gjb
Fixes: 6378393308bc Add an internal libiscsiutil library.
Sponsored by: Chelsio Communications

(cherry picked from commit fa255ab1b895b4a1641206e7906086aab32b1adb)

libiscsiutil: Change keys_load/save to operate on data buffers.

This will be used in future changes to support large text requests
spanning multiple PDUs.

Provide wrapper functions keys_load/save_pdu that operate on a PDU's
data buffer.

Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33547

(cherry picked from commit 25700db36640b1538ac91f893955a4f1a4167f63)

libiscsiutil: Fix a memory leak with negotiation keys.

When keys are loaded from a received PDU, a copy of the received keys
block is saved in the keys struct and the name and value pointers
point into that saved block. Freeing the keys frees this block.

However, when keys are added to a keys struct to build a set of keys
later sent in a PDU, the keys data block pointer is not used and
individual key names and values hold allocated strings. When the keys
structure was freed, all of these individual key name and value
strings were leaked.

Instead, allocate copies of strings for names and values when parsing
a set of keys from a received PDU and free all of the individual key
name and value strings when deleting a set of keys.
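A simplified sketch of the ownership rule, using a hypothetical fixed-size struct keys rather than the libiscsiutil definition: both parsed and added keys store separately allocated strings, so a single loop in keys_delete() can free everything.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEYS_MAX 32	/* hypothetical fixed capacity */

struct keys {
	char	*names[KEYS_MAX];	/* each entry is its own allocation */
	char	*values[KEYS_MAX];
	size_t	 count;
};

static int
keys_add(struct keys *k, const char *name, const char *value)
{
	char *n, *v;

	assert(k != NULL);
	if (k->count == KEYS_MAX)
		return (-1);
	n = strdup(name);
	v = strdup(value);
	if (n == NULL || v == NULL) {
		free(n);
		free(v);
		return (-1);
	}
	k->names[k->count] = n;
	k->values[k->count] = v;
	k->count++;
	return (0);
}

/* Parse a received block of NUL-terminated "name=value" strings. */
static int
keys_load(struct keys *k, const char *data, size_t len)
{
	size_t off = 0;

	while (off < len) {
		const char *entry = data + off;
		const char *nul = memchr(entry, '\0', len - off);
		const char *eq;
		char *name;
		size_t namelen;
		int error;

		if (nul == NULL)
			return (-1);	/* last entry is not NUL-terminated */
		eq = memchr(entry, '=', (size_t)(nul - entry));
		if (eq == NULL)
			return (-1);	/* entry has no '=' */
		namelen = (size_t)(eq - entry);
		name = malloc(namelen + 1);
		if (name == NULL)
			return (-1);
		memcpy(name, entry, namelen);
		name[namelen] = '\0';
		error = keys_add(k, name, eq + 1);	/* copies both strings */
		free(name);
		if (error != 0)
			return (error);
		off = (size_t)(nul - data) + 1;
	}
	return (0);
}

/* Free every individually allocated name and value. */
static void
keys_delete(struct keys *k)
{
	for (size_t i = 0; i < k->count; i++) {
		free(k->names[i]);
		free(k->values[i]);
	}
	k->count = 0;
}
```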

Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33545

(cherry picked from commit fd99905b4591a5e4df3dda32e4c67258aaf44517)

libiscsiutil: Use open_memstream to build the outgoing block of keys.
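A sketch of the approach, assuming a hypothetical keys_build() helper over parallel name/value arrays: open_memstream() grows the buffer as entries are written, and each "name=value" entry is followed by an explicit NUL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build a block of NUL-terminated "name=value" entries in memory. */
static char *
keys_build(char *const names[], char *const values[], size_t count,
    size_t *lenp)
{
	FILE *fp;
	char *buf = NULL;
	size_t len = 0;

	assert(lenp != NULL);
	fp = open_memstream(&buf, &len);
	if (fp == NULL)
		return (NULL);
	for (size_t i = 0; i < count; i++) {
		fprintf(fp, "%s=%s", names[i], values[i]);
		fputc('\0', fp);	/* each key is terminated by a NUL */
	}
	if (fclose(fp) != 0) {		/* fclose() finalizes buf and len */
		free(buf);
		return (NULL);
	}
	*lenp = len;
	return (buf);
}
```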

Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33546

(cherry picked from commit 2ccb8fde5eec8a0a25449374fd5a71654460947f)

Add an internal libiscsiutil library.

Move some of the code duplicated between ctld(8) and iscsid(8) into a
libiscsiutil library.

Sharing the low-level PDU code did require having a
'struct connection' base class with a method table to permit separate
initiator vs target behavior (e.g. in handling proxy PDUs).

Reviewed by: mav, emaste
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33544

(cherry picked from commit 6378393308bc6bd81fb871dacf6b03cf1a390d8b)

ccr: Replace 'blkcipher' with just 'cipher'.

ccr(4) can handle requests for AES-CTR (a stream cipher), not just
block ciphers, so make the function and structure names more generic.

Sponsored by: Chelsio Communications

(cherry picked from commit 762f1dcb1c7ad6848a2f35a0b7397ef8fb91e28d)

cryptocheck: Test Camellia-CBC cipher and RIPEMD-160 HMAC.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33518

(cherry picked from commit a62478aa05a10cc7881eebea005bafb1ec3ef406)

crypto: Define POLY1305_BLOCK_LEN constant.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33485

(cherry picked from commit 47fc04958562e3a1fca06f9321f89bea3d1dcab7)

Sort libsodium sources by path in sys/modules/crypto/Makefile.

This matches the order used in sys/conf/files to make it easier to
keep these two files in sync.

Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33484

(cherry picked from commit 8b4af206f67e93735a5b22921469a2bce63fb6ab)

Sort libsodium entries by path in sys/conf/files.

Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33483

(cherry picked from commit eb2d9adb246a7ff184ddd0d3fccfeccbf067985b)

RELNOTES: Note support for KTLS RX for TLS 1.3.

Sponsored by: Netflix

(cherry picked from commit 253ecb389e91cd2460ff0ed98b8541ecab7391a0)

ktls: Support for TLS 1.3 receive offload.

Note that support for TLS 1.3 receive offload in OpenSSL is still an
open pull request in active development. However, potential changes
to that pull request should not affect the kernel interface.

Reviewed by: hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33007

(cherry picked from commit 05a1d0f5d7ac8400975d1eaa30a718a1ff48b139)

TLS: Use <machine/tls.h> for libc and rtld.

- Include <machine/tls.h> in MD rtld_machdep.h headers.

- Remove local definitions of TLS_* constants from rtld_machdep.h
  headers and libc, using the values from <machine/tls.h> instead.

- Use _tcb_set() instead of inlined versions in MD
  allocate_initial_tls() routines in rtld.  The one exception is amd64
  whose _tcb_set() invokes the amd64_set_fsbase ifunc.  rtld cannot
  use ifuncs, so amd64 inlines the logic to optionally write to fsbase
  directly.

- Use _tcb_set() instead of _set_tp() in libc.

- Use '&_tcb_get()->tcb_dtv' instead of _get_tp() in both rtld and libc.
  This permits removing _get_tp.c from rtld.

- Use TLS_TCB_SIZE and TLS_TCB_ALIGN with allocate_tls() in MD
  allocate_initial_tls() routines in rtld.

Reviewed by: kib, jrtc27 (earlier version)
Differential Revision: https://reviews.freebsd.org/D33353

(cherry picked from commit 8bcdb144ebe391ce243c71caf06cf417d96ce335)

libthr: Use <machine/tls.h> for most MD TLS details.

Note that on amd64 this effectively removes the unused tcb_spare field
from the end of struct tcb since the definition of struct tcb in
<x86/tls.h> does not include that field.

Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33352

(cherry picked from commit 75395023ff1edaf4832389716338b1ba12121ffe)

Add <machine/tls.h> header to hold MD constants and helpers for TLS.

The header exports the following:

- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)

Note that TLS_TP_OFFSET does not account for whether the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).

Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).

A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.

Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351

For stable/13 only, sys/arm/include/tls.h includes support for
ARM_TP_ADDRESS which is not present in main.

(cherry picked from commit 1a62e9bc0046bfe20f4dd785561e469ff73fd508)

libc: Fix the alignment of the TCB to match rtld for several architectures.

- Use 16 byte alignment rather than 8 for aarch64, powerpc64, and RISC-V.

- Use 8 byte alignment rather than 4 for 32-bit arm, mips, and powerpc.

I suspect that mips64 should be using 16 byte alignment, but both libc
and rtld currently use 8 byte alignment.

Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33350

(cherry picked from commit 4c2f5bfbfa48b33b11912e7308ebd6f98fb6e647)

amd64: Allocate TCB with alignment of 16 rather than 8.

This matches the TLS_TCB_ALIGN definition in libc.

Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33349

(cherry picked from commit 299617496cc3c525a63833894fd8dbdc4e5de6a7)

mips _libc_get_static_tls_base: Narrow scope of #ifdef.

Reviewed by: kib, emaste, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33348

(cherry picked from commit 9952b82b398968a243909ee3854b1968c3cc8c12)

mips: Add TLS_DTV_OFFSET to the result of tls_get_addr_common.

Previously TLS_DTV_OFFSET was added to the offset passed to
tls_get_addr_common; however, adding it to the result instead matches
powerpc and RISC-V and better matches the intention.

Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33347

(cherry picked from commit 23e0c0e9a3e0c73169c2aa90e26bb5cb35f1aa2f)

mips: Rename TLS_DTP_OFFSET to TLS_DTV_OFFSET.

This is the more standard name for the bias of dtv pointers used on
other platforms. This also fixes a few other places that were using
the wrong bias previously on MIPS such as dlpi_tls_data in struct
dl_phdr_info and the recently added __libc_tls_get_addr().

Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33346

(cherry picked from commit 03f6b141068ee7f1004ebfc76242cf951494b7d2)

libthr: Remove the DTV_OFFSET macro.

This macro is confusing as it is not related to the similarly named
TLS_DTV_OFFSET. Instead, replace its one use with the desired
expression which is the same on all platforms.

Reviewed by: kib, emaste, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33345

(cherry picked from commit 5d8176337e691d3ca3fa7d519bc3eaacf6d9faee)

rtld-elf: Use _get_tp in __tls_get_addr for aarch64 and riscv64.

Reviewed by: kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33047

(cherry picked from commit b928e924f74b0b8f882a9b735611421a93113640)

GMAC: Reset initial hash value and counter in AES_GMAC_Reinit().

Previously, these values were only cleared in AES_GMAC_Init(), so a
second set of operations could reuse the final hash as the initial
hash. Currently this bug does not trigger in cryptosoft as existing
GMAC and GCM operations always use an on-stack auth context
initialized from a template context.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33315

(cherry picked from commit 356c922f74bfcece1f139026897a79c62adbdf50)

crypto: Don't assert for empty output buffers.

It is always valid for crp_payload_output_start to be 0. However, if
an output buffer is empty (e.g. a decryption request with a tag but an
empty payload), the existing assertion failed since 0 is not less than
0.
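A hypothetical sketch of the difference: a check of the form start < buflen rejects the valid case of an empty buffer, while validating the whole region accepts a zero-length region at offset 0. This is illustrative only, not the OCF assertion itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Overly strict: rejects start == 0 when the buffer is empty. */
static bool
start_ok_strict(size_t start, size_t buflen)
{
	return (start < buflen);
}

/* Accepts empty regions, including 0 bytes at offset 0 of an empty buffer. */
static bool
region_ok(size_t start, size_t len, size_t buflen)
{
	/* Written to avoid overflow in start + len. */
	return (start <= buflen && len <= buflen - start);
}
```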

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33193

(cherry picked from commit ec498562b71a5e2baee3556eed7e22947f7abc5d)

cryptosoft: Reject AES-CCM/GCM sessions with invalid key lengths.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33195

(cherry picked from commit c172a407fb0d2e6b4389625ebf604b5a2f831054)

crypto: Validate AES-GCM IV length in check_csp().

This centralizes the check for valid nonce lengths for AES-GCM.

While here, remove some duplicate checks for valid AES-GCM tag lengths
from ccp(4) and ccr(4).

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33194

(cherry picked from commit 6e17a2e00d62fd3041e0bb511fe925079ad1c0d7)

Only use OLD_LIBS with shared libraries.

Use OLD_FILES for a few symbolic links and static libraries previously
included in OLD_LIBS.

Add a missing shared library major number to an old libroken entry.

(cherry picked from commit 60a8277413eca3fa57f85caf331b080aee030fe5)

Add lib32 entries for WITHOUT_PROFILE.

Reported by: Mark Millard

(cherry picked from commit 610d908f8a6a2c152bf45dfaf1ada119c513c04b)

Trim a couple of duplicate entries from WITHOUT_PROFILE.

(cherry picked from commit 07c2b29b6ed805bade2751a53274bb3dd3d20edd)

Add various profiled libraries missing from the WITHOUT_PROFILE list.

Reported by: Mark Millard

(cherry picked from commit 99188582cc9bf37acf1653ad4a0c2b51666d7107)

Remove sparc64 lastcomm/sa tests.

Reported by: Mark Millard
Fixes: d6dffbae9662 lastcomm/sa: Remove sparc64 tests, they aren't needed.

(cherry picked from commit 22375931e46f469f77f5e94462073aa34076d654)

kern_utimensat: Update name of last arg in prototype.

The last argument is a mask of AT_* flags, not a namei cnp flag as
'int follow' implies in other kern_* functions.

Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.

(cherry picked from commit 3225fd22b2191c7a7a655cb5dacea9148f29c926)

geom_vfs: Pre-allocate event for g_vfs_destroy.

When an active g_vfs is orphaned due to an underlying disk going away,
the destroy is deferred until the filesystem is unmounted in
g_vfs_done(). However, g_vfs_done() is invoked from a non-sleepable
context and cannot use M_WAITOK to allocate the event. Instead,
allocate the event in g_vfs_orphan() and save it in the softc to be
retrieved by the last call to g_vfs_done().
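A userland sketch of the pre-allocation pattern with hypothetical names (vfs_orphan(), vfs_done(), struct vfs_softc). malloc() stands in for an M_WAITOK allocation, and "posting" the event is modeled by consuming it.

```c
#include <assert.h>
#include <stdlib.h>

struct event {
	int type;		/* hypothetical stand-in for a GEOM event */
};

struct vfs_softc {
	int		 active;	/* in-flight requests */
	int		 orphaned;
	int		 destroyed;
	struct event	*destroy_ev;	/* allocated in orphan, used in done */
};

/* Sleepable context: allocation may block here. */
static int
vfs_orphan(struct vfs_softc *sc)
{
	sc->destroy_ev = malloc(sizeof(*sc->destroy_ev));
	if (sc->destroy_ev == NULL)
		return (-1);
	sc->orphaned = 1;
	return (0);
}

/* Non-sleepable context: must not allocate, so use the saved event. */
static void
vfs_done(struct vfs_softc *sc)
{
	struct event *ev;

	assert(sc->active > 0);
	if (--sc->active == 0 && sc->orphaned && sc->destroy_ev != NULL) {
		ev = sc->destroy_ev;
		sc->destroy_ev = NULL;
		free(ev);		/* models posting the destroy event */
		sc->destroyed = 1;
	}
}
```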

Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31354

(cherry picked from commit 419d406e4ee068644218fb881bc80f79f8191970)

Use a more specific type for geom_disk.d_event.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D31353

(cherry picked from commit 5b5d78897c8b1ec6b6e1dd8dd9cdbf19fee32149)

fusefs: fix two bugs regarding VOP_RECLAIM of the root inode

* We never send FUSE_LOOKUP for the root inode, since its inode number
  is hard-coded to 1.  Therefore, we should not send FUSE_FORGET for it,
  lest the server see its lookup count fall below 0.

* During VOP_RECLAIM, if we are reclaiming the root inode, we must clear
  the file system's vroot pointer.  Otherwise it will be left pointing
  at a reclaimed vnode, which will cause future VOP_LOOKUP operations to
  fail.  Previously we only cleared that pointer during VFS_UMOUNT.  I
  don't know of any real-world way to trigger this bug.

Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D34753

(cherry picked from commit 32273253667b941c376cf08383006b3a0cbc5ca2)

amd64, i386: remove profile directive from NOTES

Support for this directive has been removed in config(8) on main,
which leaves us unable to build LINT with newer config(8). It's
believed that mcount-based profiling didn't really work on modern
systems anyway, so the value of testing this is low.

We avoid providing limited backwards compatibility here (i.e., continuing
and warning folks who may somehow be deploying real-world configs with
`profile` specified).

This is a direct commit to stable/13, but a partial MFC of aa3ea612be36.

ctfdump: Remove definitions of warn() and vwarn()

The presence of the latter causes a link error when building a
statically linked ctfdump(1) because libc defines the same symbol.
libc's warn() is defined as a weak symbol and so does not cause the same
problem, but let's just use libc's version.

Reported by: stephane rochoy <stephane.rochoy@stormshield.eu>
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 45dd2eaac379e5576f745380260470204c49beac)

ctf: Link CTF toolchain man pages to ctf.5

Also expand the CTF acronym to provide a bit of context.

PR: 259790
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 5727eceabc93e218f0dd369a66d06d619e7fa080)

mld6: Ensure that mld_domifattach() always succeeds

mld_domifattach() does a memory allocation under the global MLD mutex
and so can fail, but there is no error handling to prevent a null pointer
dereference in this case. The mutex is only needed when updating the
global softc list; the allocation and static initialization of the softc
does not require this mutex. So, reduce the scope of the mutex and use
M_WAITOK for the allocation.
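A userland sketch of the narrowed locking, with hypothetical names and a pthread mutex in place of the MLD mutex. abort() on allocation failure models M_WAITOK, which does not return NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct mld_ifsoftc {
	int			 ifindex;
	int			 version;
	struct mld_ifsoftc	*next;
};

static pthread_mutex_t mld_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct mld_ifsoftc *mld_head;	/* global softc list */

static struct mld_ifsoftc *
domifattach(int ifindex)
{
	struct mld_ifsoftc *mli;

	/* Allocate and initialize without holding the global lock. */
	mli = calloc(1, sizeof(*mli));
	if (mli == NULL)
		abort();	/* models M_WAITOK, which never returns NULL */
	mli->ifindex = ifindex;
	mli->version = 2;

	/* Only the list insertion needs the global lock. */
	pthread_mutex_lock(&mld_mtx);
	mli->next = mld_head;
	mld_head = mli;
	pthread_mutex_unlock(&mld_mtx);
	return (mli);
}
```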

PR: 261457
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 5d691ab4f03d436d38f46777c3c117cf5a27f1bc)

xhci(4): Ensure the so-called data toggle gets properly reset.

Use the drop and enable endpoint context commands to force a reset of
the data toggle for USB 2.0 and USB 3.0 after:
- clear endpoint halt command (when the driver wishes).
- set config command (when the kernel or user-space wants).
- set alternate setting command (only affected endpoints).

Some XHCI HW implementations may not allow the endpoint reset command when
the endpoint context is not in the halted state.

Reported by: Juniper and Gary Jennejohn
Sponsored by: NVIDIA Networking

(cherry picked from commit cda31e734925346328fd2369585ab3f6767ec225)

e1000: Try auto-negotiation for fixed 100 or 10 configuration

Currently if an e1000 interface is set to a fixed media configuration,
for gigabit, it will participate in auto-negotiation as required by
IEEE 802.3-2018 Clause 37. However, if set to fixed media configuration
for 100 or 10, it does NOT participate in auto-negotiation.

By my reading of Clauses 28 and 37, while auto-negotiation is optional
for 100 and 10, it is not prohibited and is, in fact, "highly
recommended".

This patch enables auto-negotiation for fixed 100 and 10 media
configuration, in a similar manner to that already performed for 1000.
I.e., the patch enables advertising of just the manually configured
settings with the goal of allowing the remote end to match the manually
configured settings if it has them available.
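A sketch of advertising only the manually configured setting, assuming hypothetical ADV_* bits defined locally. They are loosely modeled on e1000-style advertisement flags; treat the values as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical advertisement bits, defined locally for this sketch. */
#define ADV_10_HALF	0x0001
#define ADV_10_FULL	0x0002
#define ADV_100_HALF	0x0004
#define ADV_100_FULL	0x0008
#define ADV_1000_FULL	0x0020

/* Advertise exactly the fixed media setting, nothing more. */
static uint16_t
fixed_media_advertisement(int speed, int full_duplex)
{
	switch (speed) {
	case 10:
		return (full_duplex ? ADV_10_FULL : ADV_10_HALF);
	case 100:
		return (full_duplex ? ADV_100_FULL : ADV_100_HALF);
	case 1000:
		return (ADV_1000_FULL);	/* simplification: full duplex only */
	default:
		return (0);		/* unknown speed: advertise nothing */
	}
}
```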

To be clear, this patch does NOT allow an em(4) interface that has been
manually configured with specific media settings to respond to
auto-negotiation by then configuring different parameters to those that
were manually configured. The intent of this patch is to fully comply
with the requirements of Clause 37, but for 100 and 10.

The need for this has arisen on an em(4) link where the other end is
under a different administrative control and is set to full
auto-negotiation. Due to the cable length, GigE is not working well. It
is desired to set the em(4) end to "media 100baseTX mediaopt
full-duplex", which does work when both ends are configured that way.
Currently, because em(4) does not participate in autoneg for this
setting, the remote defaults to half-duplex, i.e., there's a duplex
mismatch and things don't work. With this patch, em(4) informs the
remote that it has only 100baseTX full, the remote matches that, and
the link works.

Approved by: erj
Differential Revision: https://reviews.freebsd.org/D34449

(cherry picked from commit 9ab4dfce8feda8cf3545be0c3c7569095b1fcd24)

e1000: Update mc filter before RCTL flags

Update mc filter array before changing RCTL flags as in 5a3eb6207a35

Approved by: grehan

(cherry picked from commit 07ede751612f8879675e2970b3875ea3831e2b9c)

ixgbe: Update mc filter before FCTRL flags

Update mc filter array before changing FCTRL flags, similar to 5a3eb6207a35

Approved by: grehan

(cherry picked from commit 395cc55d896654b8f75071e71e856b22aed87da5)

libcxxrt: Insert padding in __cxa_dependent_exception

Padding was added to __cxa_exception in 45ca8b19 and
__cxa_dependent_exception needs the same layout.
Add some static_asserts to detect this in the future.

Merge of libcxxrt commit b00c6c564357

(cherry picked from commit c40e4349889b32500e51e60f9529dbcc080f468b)

Approved by: re (gjb, accelerated MFC)

Bump __FreeBSD_version after linuxkpi MFC

linuxkpi: Add for_each_sgtable_dma_sg and for_each_sgtable_dma_page

Variants of for_each_sg/for_each_sg_dma_page that operate on sgtable
structs.
Needed by drm v5.10

MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG

(cherry picked from commit 1aca8a6ec61cce1a4673a8f7b5412b106e366aff)

linuxkpi: Implement dma_max_mapping_size

Simply returns SCATTERLIST_MAX_SEGMENT.
Needed by drm v5.10

MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG

(cherry picked from commit 1acf9b27704cebfcf14cd896264be29c1d27c4c3)

linuxkpi: Change irq_work_queue to return a bool

This was changed in Linux v5.10
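A sketch of the bool semantics using C11 atomics, assuming the Linux behaviour where queueing returns false if the work is already pending. The structure and function names are hypothetical, not linuxkpi's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct irq_work_s {
	atomic_flag	pending;
	int		runs;
};

static void
work_init(struct irq_work_s *w)
{
	atomic_flag_clear(&w->pending);
	w->runs = 0;
}

/* Returns true if newly queued, false if it was already pending. */
static bool
work_queue(struct irq_work_s *w)
{
	return (!atomic_flag_test_and_set(&w->pending));
}

/* Running the work clears the pending state so it can be queued again. */
static void
work_run(struct irq_work_s *w)
{
	w->runs++;
	atomic_flag_clear(&w->pending);
}
```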

MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG

(cherry picked from commit 2192bc32554ab94900d644e2af197c700c81692d)

linuxkpi: Add llnode member in struct irq_work

This was added in Linux v5.8 and started to be used in drm code in v5.9

MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG

(cherry picked from commit 17ee6aca65c8dfb5f041b3229655ce12b0f37372)

linuxkpi: Add down_write_nest_lock

Simply calls down_write, as Linux does when CONFIG_DEBUG_LOCK_ALLOC isn't
set.
Needed by drm v5.10

MFC after: 1 month
Reviewed by: hselasky
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D34642

(cherry picked from commit f9413897cb84499c76be140312e6ef2d24488040)

linuxkpi: Add kstrtouint_from_user

Like kstrtoint_from_user but for uint.
Needed by drm v5.10

MFC after: 1 month
Reviewed by: hselasky
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D34642

(cherry picked from commit 8e587a5f13ce676d7763ffa920148fe31bc3bfae)

linuxkpi: Add cond_resched_lock

If a reschedule is needed, it releases the lock, reschedules, reacquires
the lock and returns 1; otherwise it simply returns 0.

Needed by drm v5.9
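A user-space analogue of the cond_resched_lock() contract, for illustration only: a pthread mutex stands in for the spinlock, sched_yield() for the reschedule, and need_resched() is a hypothetical flag check rather than the kernel's scheduler state. The caller enters and leaves holding the lock either way.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for "the scheduler wants this CPU back". */
static bool resched_requested;

static bool
need_resched(void)
{
	return (resched_requested);
}

/*
 * Caller holds *lock.  If a reschedule is pending, drop the lock, yield,
 * reacquire the lock and return 1; otherwise return 0 without touching
 * the lock.
 */
static int
cond_resched_lock(pthread_mutex_t *lock)
{
	if (!need_resched())
		return (0);
	pthread_mutex_unlock(lock);
	sched_yield();
	pthread_mutex_lock(lock);
	return (1);
}
```

The return value lets callers know the lock was dropped, so any state protected by it must be revalidated.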

MFC after: 1 month
Reviewed by: hselasky
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D34620

(cherry picked from commit 9b8016548ef61e2c42271c7dce35d0322540d2f1)

hpet: Allow a MMIO window smaller than 1K

Some new AMD systems provide an HPET MMIO region smaller than the specified
1KB, and a correspondingly small number of timers. Handle this in
the HPET driver rather than requiring a 1KB window. This allows the
HPET driver to attach on such systems.
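A sketch of the kind of bound check involved, assuming the HPET register layout in which the general registers occupy the first 0x100 bytes and timer N's registers start at 0x100 + 0x20 * N. The helper hpet_usable_timers() and its clamping policy are hypothetical; this is not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Layout assumed from the HPET specification. */
#define HPET_GEN_REGS_SIZE	0x100	/* general registers */
#define HPET_TIMER_STRIDE	0x20	/* one block per timer */

/*
 * Hypothetical helper: number of timers the driver may safely access,
 * given the count advertised by the capabilities register and the MMIO
 * window size the firmware actually provides.  Returns 0 if even the
 * general registers do not fit.
 */
static unsigned int
hpet_usable_timers(unsigned int advertised, size_t window)
{
	size_t fit;

	if (window < HPET_GEN_REGS_SIZE)
		return (0);
	fit = (window - HPET_GEN_REGS_SIZE) / HPET_TIMER_STRIDE;
	return (advertised < fit ? advertised : (unsigned int)fit);
}
```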

PR: 262638
Reviewed by: markj

(cherry picked from commit 964bf2f902c5e05381018532e5d9d456979d4bf7)

if_bnxt: Allow bnxt interfaces to use vlans

When the VLAN HW filter is disabled, the NIC does not pass any VLAN-tagged
traffic. Setting these flags on the device allows VLAN-tagged traffic to
pass.

PR: 236983
Tested by: pi
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D34824

(cherry picked from commit 0c6a2fa33e36ac0b5d51cbae39a9c5564ad61788)

pf: counter argument to pfr_pool_get() may never be NULL

Coverity points out that if counter was NULL when passed to
pfr_pool_get() we could potentially end up dereferencing it.
Happily all users of the function pass a non-NULL pointer. Enforce this
by assertion and remove the pointless NULL check.
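The pattern is to turn a caller contract into an assertion instead of a silent fallback. In this sketch KASSERT() is mapped onto assert() so it builds in user space, and pool_get() is a hypothetical round-robin function, not pfr_pool_get() itself.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's KASSERT(exp, (fmt, ...)). */
#define KASSERT(exp, msg)	assert(exp)

/* Hypothetical round-robin pick; *counter must never be NULL. */
static int
pool_get(int *counter, int npool)
{
	KASSERT(counter != NULL, ("counter is NULL"));
	/* No "if (counter == NULL)" branch: every caller passes a pointer. */
	if (npool <= 0)
		return (-1);
	*counter = (*counter + 1) % npool;
	return (*counter);
}
```

Asserting documents the contract for readers and static analyzers, while the removed branch was dead code that suggested NULL was a valid input.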

Reported by: Coverity (CID 273309)
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")

(cherry picked from commit efc64d02a62f3254ecc0b22fcbcb8f73a079669f)

pfsync: NULL check before dereference

Move the use of 'sc' to after the NULL check.
It's very unlikely that we'd actually hit this, but Coverity is correct
that it's not a good idea to dereference the pointer and only then NULL
check it.
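Schematically, the fix is an ordering change: test the pointer before the first use, not after. The struct and get_flags() below are hypothetical and only illustrate the ordering; they are not pfsync code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical softc with a single field. */
struct softc {
	int flags;
};

/*
 * Problematic ordering:
 *     int flags = sc->flags;    (dereference first)
 *     if (sc == NULL) ...       (check comes too late)
 * Safe ordering: check first, then use.
 */
static int
get_flags(const struct softc *sc)
{
	if (sc == NULL)
		return (-1);
	return (sc->flags);
}
```

A compiler may also delete a NULL check that follows a dereference, since the dereference already implies the pointer is non-NULL, so the late check is not even a reliable safety net.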

Reported by: Coverity (CID 1398362)
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")

(cherry picked from commit 43020350635150eeb439c035f608ec9e78ddff8f)

pf: remove pointless NULL check

pfi_kkif_attach() always returns non-NULL, and we dereference the
pointer before we check it, so the check is pointless.

Reported by: Coverity (CID 1007345)
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")

(cherry picked from commit ed6287c14168de409c5f333bda59896c8109eb70)

callout: fix using shared rmlocks

15b1eb142c changed the callout code to store the CALLOUT_SHAREDLOCK flag
in c_iflags (it used to be stored in c_flags), but failed to update the
check in softclock_call_cc(). This resulted in the callout code always
taking the write lock, even if a read lock had been requested (with
the CALLOUT_SHAREDLOCK flag in callout_init_rm()).
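The bug class is a flag moved to a new field while one reader still checks the old one. The sketch uses a cut-down struct, an illustrative bit value, and a hypothetical callout_init_shared(); only the field and flag names come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit value, not the kernel's. */
#define CALLOUT_SHAREDLOCK	0x0020

struct callout {
	int c_flags;	/* where the flag used to be stored */
	int c_iflags;	/* where it is stored after the change */
};

/* Hypothetical init that requests a shared (read) lock. */
static void
callout_init_shared(struct callout *c)
{
	c->c_flags = 0;
	c->c_iflags = CALLOUT_SHAREDLOCK;
}

/* Stale check: never sees the flag, so the write lock is always taken. */
static bool
wants_shared_stale(const struct callout *c)
{
	return ((c->c_flags & CALLOUT_SHAREDLOCK) != 0);
}

/* Fixed check: reads the field the flag actually lives in. */
static bool
wants_shared_fixed(const struct callout *c)
{
	return ((c->c_iflags & CALLOUT_SHAREDLOCK) != 0);
}
```

Taking the write lock instead of the read lock is still correct, just slower and more contended, which is why a bug like this can go unnoticed.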

Reviewed by: markj
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34959

(cherry picked from commit a879e40ca2a9e95b3e3dc4810127d3cf105ec0d3)

syscons: silence 'set but unused' warning in some configurations

(cherry picked from commit d282bb42c341d1a58afd1c66f85321aed6d67a45)

release: fix on-disc pkg binary symbolic links

Approved by: re (kib)
PR: 263574
Reported by: loader
MFC after: immediately
Sponsored by: Rubicon Communications, LLC ("Netgate")

(cherry picked from commit 68b0a79b7c7ab75597e2511f880238fbf8cfad32)

ssh: remove duplicate setting of MAIL env var

We already set it earlier in do_setup_env().

Fixes: 19261079b743 ("openssh: update to OpenSSH v8.7p1")
MFC after: 1 week
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 19780592633e50efca39454d1ecf029bd7d87868)

capsicum: briefly describe capabilities in man page

Provide a very brief introduction to capabilities, using a couple of
sentences from David Chisnall's mailing list response[1] to a question
about Linux capabilities and Capsicum.

Mailing list subject (in case the archive URL changes) was
Re: Linux capabilities to Capsicum

[1] https://lists.freebsd.org/archives/freebsd-hackers/2022-April/001032.html

Reviewed by: oshogbo
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34945

(cherry picked from commit 1f568792c6156988d357ea31a36d77ed11cc9a2d)

libexec/rc.d/hostapd: Down/up interface when interface is specified

Not specifying an interface results in a syntax error in the rc
script. Only execute poststart when an interface has been specified.

PR: 263358
Submitted by: markj
Reported by: Joshua Kinard <freebsd@kumba.dev>
Fixes: 0da2c91e64528d896f69d36670e25b4b4a140579

(cherry picked from commit 1452bfcd9bbcb2f5bbb89fa38d01ce51dd9b6d44)

path_test: Verify that operations on unlinked files work

Sponsored by: The FreeBSD Foundation

(cherry picked from commit b13ac678420292f5994b0b6e0f27995b9399268b)

ssh: apply style(9) to version_addendum

Reported by: allanjude (in review D29953)
Fixes: 462c32cb8d7a ("Upgrade OpenSSH to 6.1p1.")
MFC after: 1 week
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 613b4b79713e294140757270f02a8aa6273be3d4)

psci: finish psci_present implementation

This was already declared in psci.h, but it was never defined/set. Do
this now, so we can use it to decide if enable-method in /cpus FDT nodes
should be inspected later on. While we're here, convert it to a
boolean.

Reviewed by: andrew (slightly earlier version)

(cherry picked from commit 2218070b2c3c32b1574f3f3b2d0579b2d4826554)

stand: zfs: handle holes at the tail end correctly

This mirrors dmu_read_impl(), zeroing out the tail end of the buffer and
clipping the read to what's contained by the block that exists.

This fixes an issue that arose during the 13.1 release process; in
13.1-RC1 and later, setting up GELI+ZFS will result in a failure to
boot. The culprit is this bug: it causes us to fail to load geom_eli.ko
because there is a residual portion after the single datablk that should
have been zeroed out.
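The read logic described above (clip the copy to the block that exists, zero the rest) can be sketched as follows. The function read_with_tail_hole() and its flat-buffer model of "one existing block followed by a hole" are hypothetical simplifications, not the loader's ZFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: blk holds the single existing block, covering
 * logical offsets [0, blksz).  Everything past it is a hole and must
 * read back as zeroes.
 */
static void
read_with_tail_hole(const unsigned char *blk, size_t blksz, size_t off,
    unsigned char *buf, size_t len)
{
	size_t n = 0;

	/* Clip the copy to what the existing block actually contains. */
	if (off < blksz) {
		n = blksz - off;
		if (n > len)
			n = len;
		memcpy(buf, blk + off, n);
	}
	/* Zero the residual tail instead of leaving stale buffer bytes. */
	memset(buf + n, 0, len - n);
}
```

Without the final memset(), whatever the caller's buffer held before the read leaks into the result, which matches the corrupted-module symptom described above.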

PR: 263407
Reviewed by: tsoome

(cherry picked from commit 914dc91d12198352b7878a88d30e2a6373a936e1)

cp: fix -R recursion detection

The correct logic is a lot simpler than the previous iteration.  We
record the base fts_name to avoid having to worry about whether we
needed the root symlink name or not (as applicable), then we can simply
shift all of that logic to after path translation to make it less
fragile.

If we're copying to a destination that does not exist (DNE), then we'll
have swapped out the NULL root_stat pointer and then attempted to recurse
on it.  The previously nonexistent directory shouldn't exist at all in the
new structure, so just back out from that tree entirely and move on.

The tests have been amended to indicate our expectations better with
subdirectory recursion.  If we copy A to A/B, then we expect to copy
everything from A/B/* into A/B/A/B, with the exception of the A that
we create in A/B.

Reviewed by: bapt
Sponsored by: Klara, Inc.

(cherry picked from commit f00f8b4fbd268a212687984e44daa3e0d0a16b87)

cp: fix some cases with infinite recursion

As noted in the PR, cp -R has some surprising behavior.  Typically, when
you `cp -R foo bar` where both foo and bar exist, foo is cleanly copied
to bar/foo.  When you `cp -R foo foo` (where foo clearly exists), cp(1)
goes a little off the rails as it creates foo/foo, then discovers that
and creates foo/foo/foo, so on and so forth, until it eventually fails.

POSIX doesn't seem to disallow this behavior, but it isn't very useful.
GNU cp(1) will detect the recursion and squash it, but emit a message in
the process that it has done so.

This change seemingly follows the GNU behavior, but it currently doesn't
warn about the situation -- the author feels that the final product is
about what one might expect from doing this and thus, doesn't need a
warning.  The author doesn't feel strongly about this.
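The condition behind the runaway copy is that the destination lies inside the tree being walked. cp(1) itself handles this during its fts traversal, so the following is only a sketch of the containment test. It assumes both paths are already absolute and normalized (no ".", "..", repeated or trailing slashes), and dst_inside_src() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper on normalized absolute paths: true if dst is src
 * itself or lies somewhere below it, i.e. copying src into dst would
 * make the copy show up in the tree being copied.
 */
static bool
dst_inside_src(const char *src, const char *dst)
{
	size_t n = strlen(src);

	if (strcmp(src, "/") == 0)
		return (dst[0] == '/');	/* everything is under the root */
	if (strncmp(src, dst, n) != 0)
		return (false);
	/* Exact match or a child ("/a/b"), not a sibling prefix ("/ab"). */
	return (dst[n] == '\0' || dst[n] == '/');
}
```

A string comparison cannot see through symlinks or mount points; comparing st_dev/st_ino of the directories, as a traversal-time check can, avoids that limitation.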

PR: 235438
Reviewed by: bapt
Sponsored by: Klara, Inc.

(cherry picked from commit 848263aad129c8f9de75b58a5ab9a010611b75ac)

cam: don't send scsi commands on shutdown when reboot method RB_NOSYNC

Don't send the SCSI command SYNCHRONIZE CACHE to devices that are still
open when RB_NOSYNC is the reboot method. This may avoid recursive panics
when doadump is called due to a SCSI/CAM/USB error/bug.
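The decision reduces to checking the reboot flags before touching the device. In this sketch RB_NOSYNC is defined locally so it builds outside FreeBSD (the real definition lives in <sys/reboot.h>), and struct disk_state and should_sync_cache() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in; on FreeBSD use the definition from <sys/reboot.h>. */
#define RB_NOSYNC	0x004

/* Hypothetical per-device state seen by a shutdown hook. */
struct disk_state {
	bool open;
};

/* Should the shutdown hook issue SYNCHRONIZE CACHE to this device? */
static bool
should_sync_cache(const struct disk_state *d, int howto)
{
	if ((howto & RB_NOSYNC) != 0)
		return (false);	/* e.g. dumping after an I/O-path panic */
	return (d->open);
}
```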

Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: imp
Differential revision: https://reviews.freebsd.org/D31549

(cherry picked from commit e0ceec676dc86ddca960a9858ae5e3a4e0c8390d)

mountd: Delay starting mountd until after mountlate

PR#254282 reports a problem where nullfs mounts cannot be
exported via mountd for FreeBSD 13.0.

The problem seems to be that the nullfs mounts in /etc/fstab
require the "late" mount option so that the underlying
filesystem (ZFS, in the PR) is mounted first.

Adding "mountlate" to the REQUIRE list in /etc/rc.d/mountd
fixes the problem, but that results in a dependency cycle
because /etc/rc.d/lockd specifies:

REQUIRE: nfsd
BEFORE: DAEMON
--> which forces mountd to precede DAEMON.

This patch removes "nfsd" from REQUIRE for lockd and statd,
then adds mountlate to REQUIRE for mountd, to fix this
problem. Having lockd REQUIRE nfsd was done in the NetBSD
code when it was pulled into FreeBSD and there does not
seem to be a need for this.
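The cycle can be reproduced on a small graph. The edges below are a simplified subset of the rc.d ordering: "lockd: REQUIRE nfsd", "lockd: BEFORE DAEMON" and the new "mountd: REQUIRE mountlate" come from the text above, while "nfsd: REQUIRE mountd" and "mountlate: REQUIRE DAEMON" are assumptions about the stock scripts. The depth-first search is a generic cycle check, not rcorder(8)'s algorithm.

```c
#include <assert.h>
#include <stdbool.h>

/* Edge a -> b means "a must run before b". */
enum { MOUNTD, NFSD, LOCKD, DAEMON, MOUNTLATE, NNODES };

static bool edge[NNODES][NNODES];
static int state[NNODES];	/* 0 unvisited, 1 on stack, 2 done */

static void
build_graph(bool lockd_requires_nfsd)
{
	for (int i = 0; i < NNODES; i++)
		for (int j = 0; j < NNODES; j++)
			edge[i][j] = false;
	edge[MOUNTD][NFSD] = true;	/* nfsd: REQUIRE mountd (assumed) */
	edge[LOCKD][DAEMON] = true;	/* lockd: BEFORE DAEMON */
	edge[DAEMON][MOUNTLATE] = true;	/* mountlate: REQUIRE DAEMON (assumed) */
	edge[MOUNTLATE][MOUNTD] = true;	/* mountd: REQUIRE mountlate (new) */
	if (lockd_requires_nfsd)
		edge[NFSD][LOCKD] = true;	/* lockd: REQUIRE nfsd (removed) */
}

static bool
dfs_finds_cycle(int n)
{
	state[n] = 1;
	for (int m = 0; m < NNODES; m++) {
		if (!edge[n][m])
			continue;
		if (state[m] == 1)
			return (true);	/* back edge: cycle */
		if (state[m] == 0 && dfs_finds_cycle(m))
			return (true);
	}
	state[n] = 2;
	return (false);
}

static bool
has_cycle(void)
{
	for (int i = 0; i < NNODES; i++)
		state[i] = 0;
	for (int i = 0; i < NNODES; i++)
		if (state[i] == 0 && dfs_finds_cycle(i))
			return (true);
	return (false);
}
```

With the old lockd edge the graph contains mountd -> nfsd -> lockd -> DAEMON -> mountlate -> mountd; dropping that one edge leaves a simple chain.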

In case this causes problems, a long MFC has been specified.

PR: 254282

(cherry picked from commit f72926eab00ccd956298e44831b519daa704a868)

libsa: Fix a typo in a panic message

- s/occured/occurred/

(cherry picked from commit 746cc38ec358f743d3be3fa0b6eeecbf520a38be)

oce(4): Fix a typo in a sysctl description

- s/interupt/interrupt/

(cherry picked from commit 88cdccff3f76cb3f5f2656bfe5676538e9e569ab)

iicbus(4): Fix two typos in kernel error messages

- s/occured/occurred/

(cherry picked from commit 7fad3ed8e9bdba1ad81a141a47544cd0481da8b9)

tslog(4): Fix a typo in the manual page

- s/schedulling/scheduling/

(cherry picked from commit cebd29c950dc68cc416b9bd55ad62b9e7e25c077)

sed(1): Fix a typo in the manual page

- s/occurances/occurrences/

(cherry picked from commit 583bb9c530b2316c83017fc51517d3acad1ed9dd)

libsysdecode: Add regression tests for sysdecode_cap_rights(3)

Reviewed by: jhb, emaste
Sponsored by: The FreeBSD Foundation

(cherry picked from commit d0f245d21f47c55ed40b34a17d5caf08aba1952f)

libsysdecode: Include required headers in sysdecode.h

Make sysdecode.h self-contained rather than forcing all consumers to
include dependencies. No functional change intended.

Reviewed by: pauamma_gundo.com, jhb, emaste
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 354efc4c94078c3c83ea45d467c6377f3f92926e)

libsysdecode: Fix decoding of Capsicum rights

Capsicum rights are a bit tricky since some of them are subsets of
others, and one can have rights R1 and R2 such that R1 is a subset of
R2, but there is no collection of named rights whose union is R2.  So,
they don't behave like most other flag sets.  sysdecode_cap_rights(3)
does not handle this properly and so can emit misleading decodings.

Try to fix all of these problems:
- Include composite rights in the caprights table.
- Use a constructor to sort the caprights table such that "larger"
  rights appear first and thus are matched first.
- Don't print rights that are a subset of rights already printed, so as
  to minimize the length of the output.
- Print a trailing message if some of the specific rights are not
  matched by the table.
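The steps above can be sketched with plain bitmasks. The right names are borrowed from Capsicum (CAP_PREAD really is CAP_READ plus CAP_SEEK), but the bit values are invented and the real cap_rights_t is not a single 64-bit mask, so treat this as a model of the algorithm only. The table is sorted by a GCC/Clang constructor, mirroring the "use a constructor" step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define R_READ	0x1ULL	/* illustrative bit values */
#define R_WRITE	0x2ULL
#define R_SEEK	0x4ULL

struct right {
	const char *name;
	uint64_t mask;
};

static struct right table[] = {
	{ "CAP_READ", R_READ },
	{ "CAP_WRITE", R_WRITE },
	{ "CAP_SEEK", R_SEEK },
	{ "CAP_PREAD", R_READ | R_SEEK },	/* composite right */
	{ "CAP_PWRITE", R_WRITE | R_SEEK },	/* composite right */
};
#define NTABLE	(sizeof(table) / sizeof(table[0]))

static int
nbits(uint64_t x)
{
	int n = 0;

	for (; x != 0; x &= x - 1)
		n++;
	return (n);
}

/* "Larger" rights first; ties broken by name for a stable output. */
static int
cmp_right(const void *a, const void *b)
{
	const struct right *ra = a, *rb = b;

	if (nbits(ra->mask) != nbits(rb->mask))
		return (nbits(rb->mask) - nbits(ra->mask));
	return (strcmp(ra->name, rb->name));
}

static void sort_table(void) __attribute__((constructor));
static void
sort_table(void)
{
	qsort(table, NTABLE, sizeof(table[0]), cmp_right);
}

/* Append s to out, comma-separated; returns the new length. */
static size_t
append(char *out, size_t outsz, size_t len, const char *s)
{
	int n;

	if (len >= outsz)
		return (len);
	n = snprintf(out + len, outsz - len, "%s%s", len > 0 ? "," : "", s);
	return (n < 0 ? len : len + (size_t)n);
}

static void
decode_rights(uint64_t rights, char *out, size_t outsz)
{
	uint64_t printed = 0;
	size_t len = 0;

	out[0] = '\0';
	for (size_t i = 0; i < NTABLE; i++) {
		uint64_t m = table[i].mask;

		if ((rights & m) != m)
			continue;	/* right not fully held */
		if ((printed & m) == m)
			continue;	/* covered by a right already printed */
		len = append(out, outsz, len, table[i].name);
		printed |= m;
	}
	if ((rights & ~printed) != 0)
		append(out, outsz, len, "<unmatched rights>");
}
```

Without the sort, CAP_READ and CAP_SEEK would be printed before CAP_PREAD could match, which is the kind of misleading decoding the commit describes.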

PR: 263165
Reviewed by: pauamma_gundo.com (doc), jhb, emaste
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 869199d9922c7dee92c1c24f95b90f1d1319433e)

callout: Remove the CS_EXECUTING flag

It is now unused.

Sponsored by: The FreeBSD Foundation

(cherry picked from commit 7524994da0bb7b4eb003bc5ac465a316925d40ed)

setitimer: Fix exit race

We use the p_itcallout callout, interlocked by the proc lock, to
schedule timeouts for the setitimer(2) system call.  When a process
exits, the callout must be stopped before the process struct is
recycled.

Currently we attempt to stop the callout in exit1() with the call
_callout_stop_safe(&p->p_itcallout, CS_EXECUTING).  If this call returns
0, then we sleep in order to drain the callout.  However, this happens
only if the callout is not scheduled at all.  If the callout thread is
blocked on the proc lock, then exit1() will not block and the callout
may execute after the process has fully exited, typically resulting in a
panic.

I cannot see a reason to use the CS_EXECUTING flag here.  Instead, use
the regular callout_stop()/callout_drain() dance to halt the callout.

Reported by: ler
Tested by: ler, pho
Sponsored by: The FreeBSD Foundation

(cherry picked from commit b319171861464f6c445905e7649cb43bf9bc78be)