busdma: Add KMSAN integration

Sanitizer instrumentation of course cannot automatically update shadow
state when devices write to host memory. KMSAN thus hooks into busdma,
both to update shadow state after a device write, and to verify that the
kernel does not publish uninitialized bytes to devices.

To implement this, when KMSAN is configured, each dmamap embeds a memory
descriptor describing the region currently loaded into the map.
bus_dmamap_sync() uses the operation flags to determine whether to
validate the loaded region or to mark it as initialized in the shadow
map.

Note that in cases where the amount of data written is less than the
buffer size, the entire buffer is marked initialized even when it is
not. For example, if a NIC writes a 128B packet into a 2KB buffer, the
entire buffer will be marked initialized, but subsequent accesses past
the first 128 bytes are likely caused by bugs.
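
The sync-time behavior described above can be modeled in user space. This is a minimal Python sketch, not the kernel implementation: the `DmaMap` class, its byte-granular `shadow` list, and the exact way the operation flags are tested are assumptions made for illustration. Only the flag names come from busdma.

```python
# Toy model of KMSAN's busdma hook. One shadow flag per byte:
# True means "uninitialized", False means "initialized".

# Illustrative flag values; treat them as assumptions.
BUS_DMASYNC_PREREAD = 1
BUS_DMASYNC_POSTREAD = 2
BUS_DMASYNC_PREWRITE = 4
BUS_DMASYNC_POSTWRITE = 8


class DmaMap:
    """Hypothetical stand-in for a dmamap with an embedded memory descriptor."""

    def __init__(self, size):
        # Freshly allocated buffer: nothing is initialized yet.
        self.shadow = [True] * size

    def cpu_write(self, offset, length):
        # Instrumented kernel stores clear the shadow automatically.
        for i in range(offset, offset + length):
            self.shadow[i] = False

    def sync(self, op):
        if op & BUS_DMASYNC_PREWRITE:
            # The device is about to read host memory, so every byte
            # of the loaded region must already be initialized.
            bad = [i for i, poisoned in enumerate(self.shadow) if poisoned]
            if bad:
                raise ValueError(f"uninitialized byte at offset {bad[0]}")
        if op & BUS_DMASYNC_POSTREAD:
            # The device wrote host memory: mark the whole loaded region
            # initialized, even if the device filled only part of it.
            self.shadow = [False] * len(self.shadow)
```

In this model a 2048-byte map that receives a 128-byte packet ends up fully initialized after `sync(BUS_DMASYNC_POSTREAD)`, which mirrors the imprecision noted above.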

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31338

busdma: Add an internal BUS_DMA_FORCE_MAP flag to x86 bounce_busdma

Use this flag to indicate that busdma should allocate a map structure
even if no bouncing is required to satisfy the tag's constraints. This
will be used for KMSAN.

Also fix a memory leak that can occur if the kernel fails to allocate
bounce pages in bounce_bus_dmamap_create().

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31338

Add a GENERIC-KMSAN kernel configuration for amd64

Sponsored by: The FreeBSD Foundation

kmsan: Add a manual page

Sponsored by: The FreeBSD Foundation

amd64: Add MD bits for KMSAN

Interrupt and exception handlers must call kmsan_intr_enter() prior to
calling any C code.  This is because the KMSAN runtime maintains some
TLS in order to track initialization state of function parameters and
return values across function calls.  Then, to ensure that this state is
kept consistent in the face of asynchronous kernel-mode exceptions, the
runtime uses a stack of TLS blocks, and kmsan_intr_enter() and
kmsan_intr_leave() push and pop that stack, respectively.
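
The push/pop discipline can be illustrated with a small Python model. The `ThreadKmsanState` class and the contents of each block are hypothetical; the sketch only shows why a handler needs its own TLS block so that it cannot disturb the state of the code it interrupted.

```python
class ThreadKmsanState:
    """Hypothetical per-thread KMSAN state: a stack of TLS blocks."""

    def __init__(self):
        self.blocks = [self._new_block()]

    @staticmethod
    def _new_block():
        # Shadow state for function parameters and return values.
        return {"param_shadow": [], "retval_shadow": 0}

    @property
    def current(self):
        return self.blocks[-1]

    def intr_enter(self):
        # Give the handler a fresh block so it cannot clobber the
        # parameter/return-value state of the interrupted code.
        self.blocks.append(self._new_block())

    def intr_leave(self):
        # Restore the interrupted code's block.
        assert len(self.blocks) > 1, "unbalanced intr_leave"
        self.blocks.pop()
```

Whatever the handler writes into its block is discarded on leave, so the interrupted function sees exactly the state it had before the exception.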

Use these functions in amd64 interrupt and exception handlers.  Note
that handlers for user->kernel transitions need not be annotated.

Also ensure that trap frames pushed by the CPU and by handlers are
marked as initialized before they are used.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31467

amd64: Populate the KMSAN shadow maps and integrate with the VM

- During boot, allocate PDP pages for the shadow maps.  The region above
  KERNBASE is currently not shadowed.
- Create a dummy shadow for the vm page array.  For now, this array is
  not protected by the shadow map to help reduce kernel memory usage.
- Grow shadows when growing the kernel map.
- Increase the default kernel stack size when KMSAN is enabled.  As with
  KASAN, sanitizer instrumentation appears to create stack frames large
  enough that the default value is not sufficient.
- Disable UMA's use of the direct map when KMSAN is configured.  KMSAN
  cannot validate the direct map.
- Disable unmapped I/O when KMSAN is configured.
- Lower the limit on paging buffers when KMSAN is configured.  Each
  buffer has a static MAXPHYS-sized allocation of KVA, which in turn
  eats 2*MAXPHYS of space in the shadow map.
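
The cost of the paging buffers can be worked out with simple arithmetic. In this sketch the 1 MiB value for `MAXPHYS` and the buffer count in the usage example are assumptions, and `shadow_cost` is a hypothetical helper.

```python
MAXPHYS = 1024 * 1024  # assumed 1 MiB; the real value depends on the configuration


def shadow_cost(nbufs, maxphys=MAXPHYS):
    """Return (kva_bytes, shadow_bytes) consumed by nbufs paging buffers."""
    kva = nbufs * maxphys
    # Two shadow maps, each one-to-one with the KVA they cover.
    return kva, 2 * kva
```

With these assumptions, 256 buffers reserve 256 MiB of KVA and a further 512 MiB of shadow map space, which is why lowering the limit helps.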

Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295

kern: Ensure that thread-local KMSAN state is available

Sponsored by: The FreeBSD Foundation

Add the KMSAN runtime

KMSAN enables the use of LLVM's MemorySanitizer in the kernel.  This
enables precise detection of uses of uninitialized memory.  As with
KASAN, this feature has substantial runtime overhead and is intended to
be used as part of some automated testing regime.

The runtime maintains a pair of shadow maps.  One is used to track the
state of memory in the kernel map at bit-granularity: a bit in the
kernel map is initialized when the corresponding shadow bit is clear,
and is uninitialized otherwise.  The second shadow map stores
information about the origin of uninitialized regions of the kernel map,
simplifying debugging.
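
A rough user-space model of the two maps follows. The `ShadowMaps` class, its method names, and the string used as an origin are all hypothetical; the sketch only demonstrates the convention that a clear shadow bit means "initialized" and that the origin map is consulted when a check fails.

```python
class ShadowMaps:
    """Toy model: one shadow byte and one origin slot per byte of memory."""

    def __init__(self, size):
        self.shadow = bytearray(b"\xff" * size)  # all bits set: uninitialized
        self.origin = [None] * size

    def poison(self, addr, length, origin):
        # Mark a region uninitialized and remember where it came from.
        for a in range(addr, addr + length):
            self.shadow[a] = 0xFF
            self.origin[a] = origin

    def store(self, addr, value_shadow):
        # An instrumented store copies the shadow of the stored value,
        # so partially initialized values stay partially poisoned.
        self.shadow[addr] = value_shadow

    def check(self, addr, length):
        """Return the origin of the first byte with a set shadow bit, else None."""
        for a in range(addr, addr + length):
            if self.shadow[a] != 0:
                return self.origin[a] or "unknown"
        return None
```

Tracking at bit granularity means that a byte whose low nibble alone is uninitialized still fails the check, and the report can name the allocation recorded in the origin map.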

KMSAN relies on being able to intercept certain functions which cannot
be instrumented by the compiler.  KMSAN thus implements interceptors
which manually update shadow state and in some cases explicitly check
for uninitialized bytes.  For instance, all calls to copyout() are
subject to such checks.

The runtime exports several functions which can be used to verify the
shadow map for a given buffer.  Helpers provide the same functionality
for a few structures commonly used for I/O, such as CAM CCBs, BIOs and
mbufs.  These are handy when debugging a KMSAN report whose
proximate and root causes are far away from each other.

Obtained from: NetBSD
Sponsored by: The FreeBSD Foundation

amd64: Define KVA regions for KMSAN shadow maps

KMSAN requires two shadow maps, each one-to-one with the kernel map.
Allocate regions of the kernel's PML4 page for them. Add functions to
create mappings in the shadow map regions; these will be used by the
KMSAN runtime.

Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295

conf: Add a KMSAN kernel option

Sponsored by: The FreeBSD Foundation

amd64 pmap: Pre-set PG_M on 2MB KASAN shadow map entries

Also remove a redundant assertion in pmap_kasan_enter().

Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295

build.7: Describe the default value for LOCAL_MODULES

Suggested by: jhb
MFC after: 1 week

Mark some sysctls as CTLFLAG_MPSAFE.

MFC after: 2 weeks

geom(4): Mark all sysctls as CTLFLAG_MPSAFE.

This code has not used the Giant lock for a very long time.

MFC after: 2 weeks

cam(4): Mark all sysctls as CTLFLAG_MPSAFE.

This code has not used the Giant lock for a very long time.

MFC after: 2 weeks

ntb_hw_intel(4): Add CTLFLAG_MPSAFE flags.

I should have added those in 50f16247a1.

MFC after: 2 weeks

rtsx: Fix wakeup race similar to sdhci one fixed in 35547df5c786

rtsx copied code from sdhci, and has the same wakeup race bug that was
fixed in 35547df5c786, so apply a similar fix here.

Sponsored by: Netflix

Address the reported mmc serialization issue.

Reset the mmc owner before calling the bridge release host callback.

Some people are hitting the "mmc: host bridge didn't serialize us." panic as
the bridge is released before the mmc owner is reset.

Submitted by: luiz
Sponsored by: Rubicon Communications, LLC ("Netgate")

Call wakeup() with the lock held to avoid missed wakeup races.

Submitted by: luiz
Sponsored by: Rubicon Communications, LLC ("Netgate")

devmatch: Ignore the pnp fields tagged as ignore ('#')

When matching entries, we should ignore those with a name of '#'. This is
the standard way to skip elements; such fields need to be present to keep
the proper offsets to the fields that are observed. No bus has a pnp
attribute of '#' and that is now disallowed for future buses that are
written.
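
The matching rule can be sketched as follows. This is an illustrative Python model rather than devmatch's code: `entry_matches`, the list-based table row, and the attribute dictionary are assumptions.

```python
def entry_matches(fields, entry, device):
    """Decide whether one pnp table row matches a device.

    fields: field names from the pnp descriptor, in table order.
    entry:  values for one table row, in the same order as fields.
    device: dict of pnp attributes reported for the device.
    """
    for name, want in zip(fields, entry):
        if name == "#":
            # Placeholder field: it occupies space in the row so that the
            # later fields sit at the right offsets, but is never compared.
            continue
        if device.get(name) != want:
            return False
    return True
```

Because '#' fields are skipped before any lookup, whatever value a row carries in that slot has no effect on the result.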

Sponsored by: Netflix
Reviewed by: kbowling
Differential Revision: https://reviews.freebsd.org/D31482

nfs tls: Update for SSL_OP_ENABLE_KTLS.

Upstream OpenSSL (and the KTLS backport) have switched to an opt-in
option (SSL_OP_ENABLE_KTLS) in place of opt-out modes
(SSL_MODE_NO_KTLS_TX and SSL_MODE_NO_KTLS_RX) for controlling kernel
TLS.

Reviewed by: rmacklem
Sponsored by: Netflix
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31445

ar: provide error exit status upon failure

Previously ar and ranlib returned with exit status 0 (success) in the
case of a missing file or other error. Update to use error handling
similar to that added by ELF Tool Chain after that project forked
FreeBSD's ar.

PR: 257599 [exp-run]
Reported by: Shawn Webb, gehmehgeh (on HardenedBSD IRC)
Reviewed by: markj
Obtained from: elftoolchain
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31402

ntb_transport(4): Mark callouts MP-safe.

The only thing around NTB using Giant lock is NewBus, and these callouts
have nothing to do with it.

MFC after: 2 weeks

ntb_hw_intel(4): Remove CTLFLAG_NEEDGIANT flags.

Most of the sysctls just read hardware registers. They don't need
any locking.

MFC after: 2 weeks

pmc(3): remove Pentium-related man pages and references

Support for Pentium events was removed completely in e92a1350b50e.

Don't bump .Dd where we are just removing xrefs.

Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31423

e1000: rctl/srrctl buffer size init, rfctl fix

Simplify the setup of srrctl.BSIZEPKT on igb class NICs.
Improve the setup of rctl.BSIZE on lem and em class NICs.
Don't try to touch rfctl on lem class NICs.
Manipulate rctl.BSEX correctly on lem and em class NICs.

Approved by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31457

build.7: Document LOCAL_MODULES and LOCAL_MODULES_DIR

Reviewed by: 0mp, imp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31461

share/man/man7/arch.7: bump date

share/man/man7/arch.7: powerpc64 support appeared in 9.0-RELEASE

Differential revision: https://reviews.freebsd.org/D31013
Approved by: eadler, 0mp

[skip ci] unix(4): LOCAL_PEERCRED works on SOCK_SEQPACKET, too.

MFC after: 2 weeks
Reviewed By: dchagin
Differential Revision: https://reviews.freebsd.org/D31456

ip(4): Mention IP_IPSEC_POLICY ip-level socket option

Text is literally taken from NetBSD ip(4).

Sponsored by: NVIDIA Networking
MFC after: 3 days

ipsec_set_policy(3): fix sentence

Sponsored by: NVIDIA Networking
MFC after: 3 days

netipsec/key.c: Use ANSI C definition for key_random()

Sponsored by: NVIDIA Networking
MFC after: 3 days

netipsec/keydb.h: fix typo

Sponsored by: NVIDIA Networking
MFC after: 3 days

e1000: Fix lem/em UDP rx csum offload

Rebase on igb code and unify lem/em implementations.

PR: 257642
Reported by: Nick Reilly <nreilly@blackberry.com>
Reviewed by: karels, emaste
Tested by: Nick Reilly <nreilly@blackberry.com>
Approved by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31449

riscv: Fix pmap_alloc_l2 when it should allocate a new L1 entry

The current code checks the RWX bits are 0 but does not check the V bit
is non-zero, meaning not-yet-allocated L1 entries that are still zero
are regarded as being allocated. This is likely due to copying the arm64
code that checks ATTR_DESC_MASK is L1_TABLE, which encompasses both the
type and the validity in a single field, and erroneously translating
that to a check of just PTE_RWX being 0 to indicate non-leaf, forgetting
about the V bit. This then results in the following panic:

    panic: Fatal page fault at 0xffffffc0005cf292: 0x00000000000050
    cpuid = 1
    time = 1628379581
    KDB: stack backtrace:
    db_trace_self() at db_trace_self
    db_trace_self_wrapper() at db_trace_self_wrapper+0x38
    kdb_backtrace() at kdb_backtrace+0x2c
    vpanic() at vpanic+0x148
    panic() at panic+0x2a
    page_fault_handler() at page_fault_handler+0x1ba
    do_trap_supervisor() at do_trap_supervisor+0x7a
    cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x70
    --- exception 13, tval = 0x50
    pmap_enter_l2() at pmap_enter_l2+0xb2
    pmap_enter_object() at pmap_enter_object+0x15e
    vm_map_pmap_enter() at vm_map_pmap_enter+0x228
    vm_map_insert() at vm_map_insert+0x4ec
    vm_map_find() at vm_map_find+0x474
    vm_map_find_min() at vm_map_find_min+0x52
    vm_mmap_object() at vm_mmap_object+0x1ba
    vn_mmap() at vn_mmap+0xf8
    kern_mmap() at kern_mmap+0x4c4
    sys_mmap() at sys_mmap+0x38
    do_trap_user() at do_trap_user+0x208
    cpu_exception_handler_user() at cpu_exception_handler_user+0x72
    --- exception 8, tval = 0x1dd

Instead, we should just check the V bit, as on amd64, and assert that
any valid L1 entries are not leaves, since an L1 leaf would render the
entire range allocated and thus we should not have attempted to map that
VA in the first place.
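
The difference between the two checks is easy to see on raw PTE values. The bit positions below follow the RISC-V privileged specification; the two function names are hypothetical and the code is a user-space illustration, not the pmap change itself.

```python
# Low PTE bits as defined by the RISC-V privileged specification.
PTE_V = 0x001
PTE_R = 0x002
PTE_W = 0x004
PTE_X = 0x008
PTE_RWX = PTE_R | PTE_W | PTE_X


def l1_is_table_buggy(l1e):
    # Old logic: "not a leaf" was taken to mean "points to an L2 table".
    # An all-zero (never allocated) entry also has RWX == 0.
    return (l1e & PTE_RWX) == 0


def l1_is_allocated_fixed(l1e):
    # New logic: only the V bit says whether the entry is allocated.
    if (l1e & PTE_V) == 0:
        return False
    # A valid L1 entry reached here must not be a leaf.
    assert (l1e & PTE_RWX) == 0, "unexpected L1 leaf"
    return True
```

For an entry that is still zero, the old predicate reports an allocated table while the new one does not, which matches the page fault on a near-null address in the trace above.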

Reported by: David Gilbert <dgilbert@daveg.ca>
MFC after: 1 week
Reviewed by: markj, mhorne
Differential Revision: https://reviews.freebsd.org/D31460

Remove duplicate entry for arm/mv/armada38x/armada38x_rtc.c

Sponsored by: Rubicon Communications, LLC ("Netgate")

vmm: Make iommu ops tables const

While here, use designated initializers and rename some AMD iommu method
implementations to match the corresponding op names. No functional
change intended.

Reviewed by: grehan
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31462

amd64: Fix output operand specs for the stmxcsr and vmread intrinsics

This does not appear to affect code generation, at least with the
default toolchain.

Noticed because incorrect output specifications lead to false positives
from KMSAN, as the instrumentation uses them to update shadow state for
output operands.

Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31466

kasan.9: Note the header required for kasan_mark()

Sponsored by: The FreeBSD Foundation

nd6: Mark several callouts as MPSAFE

The use of Giant here is vestigial and does not provide any useful
synchronization. Furthermore, non-MPSAFE callouts can cause the
softclock threads to block waiting for long-running newbus operations to
complete.

Reported by: mav
Reviewed by: bz
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31470

in6: Enter the net epoch in in6_tmpaddrtimer()

We need to do so to safely traverse the ifnet list.

Reviewed by: bz
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31470

vfs: Avoid a comparison with an uninitialized field in setutimes()

Some filesystems, e.g., devfs, do not populate va_birthtime in their
GETATTR implementations. To handle this, make sure that va_birthtime is
initialized to the quasi-standard value of { VNOVAL, 0 } before calling
VOP_GETATTR.

Reported by: KMSAN
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31468

release: allow VM_EXTRA_PACKAGES to be specified in the environment

This is useful for adding extra packages to the build of an AMI.
For example:
env VM_EXTRA_PACKAGES="zsh" make -C release ec2ami

Approved by: gjb
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")

release: fix copypasta

Approved by: gjb
MFC after: 1 week
X-MFC-With: fd17ea8c1849039c436f7192ca407db70561df03
Sponsored by: Rubicon Communications, LLC ("Netgate")

release: make pkg installs more robust

Currently pkg(8) will fail to install any package if one is missing, so
make this a loop to prevent one missing package from preventing the rest
from installing. Seen when building an AWS AMI for aarch64 on main, where
ebsnvme-id is not available in the repo at the moment.

Approved by: gjb
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")

sctp: remove some set, but unused variables

Thanks to pkasting for submitting the patch for the userland stack.

MFC after: 3 days

mkimg(1): Correct a typo in the usage output

- s/partion/partition/

MFC after: 5 days

nanobsd: Correct a typo in a comment

- s/partion/partition/

MFC after: 3 days

Remove an unused arm64 panic string

This was added early in the development of the arm64 port when
cpu_switch was just a stub. It should have been removed when cpu_switch
was implemented, however this didn't seem to be the case, and the '%p'
was added.

As this hasn't been needed in 7 years we can remove it.

Sponsored by: The FreeBSD Foundation

Add macros for the arm64 daifset/daifclr flags

Sponsored by: The FreeBSD Foundation

Clean up the arm64 fork_trampoline

When exiting to userspace the code is similar to the restore_registers
macro in exception.S. Rework it to remove most of the non-style
differences.

Sponsored by: The FreeBSD Foundation

ipsec: Handle ICMP NEEDFRAG message.

It will be needed for upcoming PMTU implementation in ipsec.
For now simply create/update an entry in tcp hostcache when needed.
The code is based on https://people.freebsd.org/~ae/ipsec_transport_mode_ctlinput.diff

Authored by: Kornel Duleba <mindal@semihalf.com>
Differential revision: https://reviews.freebsd.org/D30992
Reviewed by: tuxen
Sponsored by: Stormshield
Obtained from: Semihalf

ipv6: Fix getsockopt() for some IPPROTO_IPV6 level socket options

Fix getsockopt() for the IPPROTO_IPV6 level socket options with the
following names: IPV6_HOPOPTS, IPV6_RTHDR, IPV6_RTHDRDSTOPTS,
IPV6_DSTOPTS, and IPV6_NEXTHOP.

Reviewed by: markj
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D31458

Optimize res_find().

When the device name is provided, we can simply run strncmp() for each
line to quickly skip unrelated ones; that is much faster than sscanf()
and only then strcmp().
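
The idea carries over to any line-oriented lookup: do the cheap prefix comparison first and parse only the candidates. The sketch below is a Python analogy, with `str.startswith` standing in for strncmp() and a regular expression standing in for sscanf(). The `hint.<name>.<unit>.<resource>=<value>` line format and the `find_hints` helper are assumptions for illustration.

```python
import re

# Full parse of one hint line (the expensive step).
HINT_RE = re.compile(r"hint\.([^.]+)\.(\d+)\.([^=]+)=(.*)")


def find_hints(lines, name):
    """Return (unit, resource, value) tuples for the named device."""
    prefix = f"hint.{name}."
    found = []
    for line in lines:
        # Cheap prefix comparison first; most lines are rejected here.
        if not line.startswith(prefix):
            continue
        # Only candidate lines pay for the full parse.
        m = HINT_RE.fullmatch(line)
        if m is None:
            continue
        found.append((int(m.group(2)), m.group(3), m.group(4)))
    return found
```

Lines for other devices never reach the parser, so the cost of a lookup scales with the number of matching lines rather than the size of the whole hints environment.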

MFC after: 2 weeks

mmc_fdt_helper: correct typo in DT property name

'no-1-8-v' is a proper name according to the DT binding
documentation
(https://www.kernel.org/doc/Documentation/devicetree/bindings/mmc/mmc-controller.yaml).

Fixes: e63fbd7bb7a25
MFC after: 1 week
Sponsored by: Semihalf

kbdmux(4): Make callout handler mpsafe.

Both callout and taskqueue now have drain() routines not requiring
external locking. This allows removing the TASK flag and the manual drain,
so the only thing left for the lock to protect inside the callout
handler is the ks_inq_length zero comparison, which can be lockless.

MFC after: 2 weeks

dtrace: use %zu format specifier for data of size_t type

Sponsored by: The FreeBSD Foundation

enetc: Add autogenerated files to Makefile

A module makefile must list all the header files it uses which are
generated at build time from interface definitions (.m files) in its
SRCS list.

Fixes: 5ad6d28cbe6b ("enetc: Support building the driver as a loadable module.")

enetc: Force correct order with DRIVER_MODULE_ORDERED

The toolchain can reorder symbols, meaning that changing
the order of DRIVER_MODULE macros is not enough
to ensure that miibus gets registered first.
Use DRIVER_MODULE_ORDERED instead to fix the problem properly.

Fixes: 5ad6d28cbe6b ("enetc: Support building the driver as a loadable module.")
Reported by: jhb

felix: Add autogenerated files to Makefile

A module makefile must list all the header files it uses which are
generated at build time from interface definitions (.m files) in its
SRCS list.

Fixes: 451bcf1b3601
Reported by: ian

amd64 UEFI loader: stop copying staging area to 2M physical

On amd64, add the ability to activate the kernel with the staging area in place.
Add 'copy_staging' command to control this.  For now, by default the
old mode of copying kernel to 2M phys is retained.  It is going to be
changed in several weeks.

On amd64, add some slop to the staging area to satisfy both requirements
of the kernel startup allocator, and to have space for minor staging data
increase after the final size is calculated.  Add a new command
'staging_slop' to control its size.

Improve staging area resizing; in particular, reallocate it anew if
we can grow it neither down nor up.

Reviewed by: kevans, markj
Discussed with: emaste (the delivery plan)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121

_Exit(3): document implementation

Remove a useless note about unlinking temporary files; they are unlinked
in tmpfile(3) [1]. Add a note about __cxa_atexit().

Explain exactly how the FreeBSD implementations of exit() and _Exit()
differ.

Noted by: markj [1]
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D31425

fork(2): comment about doubtful use of stdio and exit(3) in example

Add fflush(stdout) as the common idiom. Explain the need to use exit()
but advise against it.

Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D31425

Enable compressed debug on little-endian targets

Compressed debug was enabled by default in commit c910570e7573, but
broke the build on big-endian targets, and so was disabled in
89ed2ecb14ce.

Older versions of LLD fail with big-endian compressed debug sections.
This was fixed in LLD upstream (commit c6ebc651b6fa) and merged to
FreeBSD main (commit d69d07569ee2) by dim.

External toolchains (e.g. the llvm12 package) will not yet have the fix.
These may be used to link against base system .a archives, so compressed
debug sections would cause trouble even though the base system is fixed.

Enable compressed debug sections again, for little-endian targets only.
As discussed on freebsd-hackers[1] I expect to undo this in the future
(using compressed debug everywhere), once fixed versions of lld are
widely available.

Note that to be pedantically correct we should check both the compiler
and the linker for compressed debug support, but given the external
toolchain constraint the extra complexity does not seem worthwhile.

[1] https://lists.freebsd.org/archives/freebsd-hackers/2021-August/000188.html

PR: 257638
Reported by: jrtc27 [impact of .a archives]
Discussed with: imp
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31454

Revert "Disable compressed debug by default"

This reverts commit 89ed2ecb14ceabc27883282cf96559a9e7d52717.

Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights

These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).

Sponsored by: The FreeBSD Foundation

cam: revert half of 75b5caa08ef

This turns debugging printf() into a KASSERT().
It's for ATA for now; SCSI will come later.

Reviewed By: imp
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31380

Simplify nhop operations in ip_output().

Consistently use `nh` instead of always dereferencing
ro->ro_nh inside the if block.
Always use the nexthop mtu, as it guarantees that the mtu is accurate.
Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses
of updating ro flags based on different nexthop.

Differential Revision: https://reviews.freebsd.org/D31451
Reviewed by: kp
MFC after: 2 weeks

Fix some common typos in comments

- s/configuraiton/configuration/
- s/specifed/specified/
- s/compatiblity/compatibility/

MFC after: 5 days

Makefile.inc1: Avoid hanging if pkg is not installed

For `pkg --version`, redirect stdin from /dev/null to avoid waiting on
/usr/sbin/pkg's bootstrap prompt if the pkg package is not installed.
Also redirect stderr to /dev/null to discard the warning message in
this case.

Reported by: mjg
Fixes: 4e224e4be7c3 ("pkgbase: accommodate pkg < 1.17")
Sponsored by: The FreeBSD Foundation

zfs: merge openzfs/zfs@f3678d70f (master) into main

Notable upstream pull request merges:
  #12339 Read past end of argv array in zpool_do_import()
  #12365 Fixes in persistent L2ARC
  #12383 Fixes for KMSAN reports
  #12425 Avoid small buffer copying on write
  #12428 Fix unfortunate NULL in spa_update_dspace
  #12446 Allow disabling of unmapped I/O on FreeBSD

Obtained from: OpenZFS
OpenZFS commit: f3678d70ff8f98d67caf377ec0326c9a6c7bcf29

src.conf.5: Regen after fe52b7f60ef4, PROFILE default off

bsdinstall: Remove unused sysctl.h header #include

Disable PROFILE option by default

Hardware based profiling (e.g. hwpmc) is much better and produces more
useful results. Today the profiling lib archives (_p.a) serve no real
purpose other than increasing the library build time.

Both upstream and base system (in commit b762974cf4b9) Clang have been
modified to remove the special case for linking against these libraries.

Clang's -pg support and mcount() remain, so building with -pg can still
be used on code that the user builds; we just no longer provide prebuilt
libraries compiled with -pg.

Discussed on freebsd-hackers[1] / freebsd-current [2] in 2020 and
freebsd-arch [3] in 2021. A deprecation notice was added in
commit 175841285e28.

[1] https://lists.freebsd.org/pipermail/freebsd-hackers/2020-January/055551.html
[2] https://lists.freebsd.org/pipermail/freebsd-current/2020-January/075105.html
[3] https://lists.freebsd.org/archives/freebsd-arch/2021-June/000016.html

PR: 256873 [exp-run]
Reviewed by: imp, jhb, kib
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30833

riscv: Sync NOTES with GENERIC changes

USB is already in sys/conf/NOTES, but NVMe is not, nor of course are the
new SiFive device drivers.

MFC after: 1 week

riscv: Add hwreset to NOTES to fix LINT build

Fixes: 8e7e0690ecd7 ("sifive_prci: Add reset support for the FU540 and FU740")
MFC after: 1 week

pci_dw: Drop unconditional explicit DEBUG define

This has been present since the first revision of the file. The debugf
macros have always been unused so it doesn't actually do anything
useful, and besides, debugging should not be unconditionally turned on
for a production driver. Moreover, this breaks the riscv LINT kernel
build as sys/conf/NOTES includes options DEBUG, resulting in a macro
redefinition error. This does not show up in the arm64 LINT kernel build
since that has an explicit nooptions DEBUG, which is dubious and should
be revisited. Rather than copy such a hack to riscv's NOTES, fix this
specific instance of DEBUG breaking.

Fixes: 896e217a0eae ("fu740_pci_dw: Add SiFive FU740 PCIe controller driver")
MFC after: 1 week

gpio.4: Mention new sifive_gpio driver

Suggested by: mhorne
MFC after: 1 week

riscv: Add NVMe, USB and HID support to GENERIC

The SiFive FU740 has both NVMe and USB so we need both to ensure we can
mount root, and HID is a dependency of USB.

Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31036

pci_pci: Support growing windows in bus_adjust_resource for NEW_PCIB

If we allocate a new window for a bridge rather than reusing an existing
one set up by firmware to cover all the devices then the new window only
includes the range needed for the first device to allocate the resource.
If a request comes in to adjust this resource in order to extend a
downstream window for another device then this will fail as the rman
doesn't have any space, so we must first grow the bridge's own window.

This is needed to support successfully attaching more than one PCI
device on SiFive's HiFive Unmatched, which has the following topology:

  Root Port <---> Bridge <---> Bridge <-+-> Bridge <---> (Unused)
   (pcib0)        (pcib1)      (pcib2)  |   (pcib3)
                                        +-> Bridge <---> xHCI
                                        |   (pcib4)
                                        +-> Bridge <---> M.2 E-key
                                        |   (pcib5)
                                        +-> Bridge <---> M.2 M-key
                                        |   (pcib6)
                                        +-> Bridge <---> x16 slot
                                            (pcib7)

Without this, the xHCI endpoint successfully attaches but the NVMe M.2 M-key
endpoint fails to attach as, when its adjacent bridge (pcib6) attempts
to allocate a window from its parent (pcib2) on the other side of the
switch, its parent attempts to grow its own window by calling
bus_adjust_resource on its own parent (pcib1) which fails to call the
root port device (pcib0) to request more memory to grow its own window.
Had the root port been directly connected to the switch without the
bridge in the middle then the existing code would have worked, but the
extra hop broke it.
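
The required behavior, where a grow request propagates through every intermediate bridge up to the root port, can be modeled in a few lines. This is a simplified Python sketch under stated assumptions: the `Bridge` class, its inclusive `(start, end)` windows, and the root's `limit` range are hypothetical and ignore alignment, resource types, and rman details.

```python
class Bridge:
    """Toy PCI bridge with a single address window."""

    def __init__(self, name, parent=None, limit=None):
        self.name = name
        self.parent = parent
        self.window = None   # (start, end) inclusive, or None if unset
        self.limit = limit   # root only: total range it can hand out

    def adjust(self, start, end):
        """Try to make this bridge's window cover [start, end]."""
        if self.parent is None:
            ok = self.limit[0] <= start and end <= self.limit[1]
        else:
            pw = self.parent.window
            if pw is not None and pw[0] <= start and end <= pw[1]:
                ok = True
            else:
                # The parent's window is too small: grow it first, which
                # may in turn require growing the grandparent, and so on.
                new_start = start if pw is None else min(pw[0], start)
                new_end = end if pw is None else max(pw[1], end)
                ok = self.parent.adjust(new_start, new_end)
        if ok:
            self.window = (start, end)
        return ok
```

With a chain of pcib0, pcib1 and pcib2, a second downstream bridge asking for a new range causes all three upstream windows to grow, while a request outside the root's range fails cleanly without changing any window.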

Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31035

fu740_pci_dw: Add SiFive FU740 PCIe controller driver

Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31033

sifive_gpio: Add SiFive GPIO controller driver

This is present on both the FU540 and FU740, but only needed for the
FU740 in order to assert reset and power enable signals for its PCIe
controller.

Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31031

fu540_spi: Rename to sifive_spi

The FU740 also uses the same SPI controller.

Reviewed by: kp, philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31026

sifive_prci: Add reset support for the FU540 and FU740

This is needed for FU740 PCIe support. Whilst we don't need the FU540's
resets they are also defined for completeness.

Reviewed by: manu
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31024

sifive_prci: Delay attachment until after clk_fixed

This avoids noisy output from early attempts to attach before clk_fixed
has attached to the parent clocks.

Reviewed by: kp, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31023

sifive_prci: Add support for the FU740 PRCI

Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31022

fu540_prci: Rename to sifive_prci and use ocd_data for FU540 specificity

The FU740 has a very similar controller and will reuse most of the
driver. This also drops the dependency on the device-tree include for
the binding indices; the header doesn't namespace its contents (and nor
does the FU740 one) so using both would require separate translation
units which would be unnecessarily complicated just to avoid defining
local copies of the small number of constants.

Whilst here, add the missing l to gemgxlclk's name and drop the prci_
prefix from tlclk's name as we don't prefix any of the others and it's
entirely unnecessary.

Reviewed by: kp, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31021

Parse named nodes from IORT ACPI on arm64

Add the ability to map named components from IORT to their
SMMU or ITS node in order to setup interrupts.
It is now possible to find a node by its name (substring) and
resource ID similar to PCI nodes.
This is needed by work on a driver for NXP's Second Generation
Data Path Acceleration Architecture (DPAA2).

Reviewed by: andrew
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31267

Follow-up to d69d07569ee2 by bumping lld local version

This makes it easier to detect that lld's support for compressed input
sections on BE targets is fixed.

MFC after: 3 days
X-MFC-With: d69d07569ee2

sctp: improve handling of IPv4 addresses on IPV6 sockets
Reported by: syzbot+08fe66e4bfc2777cba95@syzkaller.appspotmail.com
MFC after: 3 days

sctp: improve input validation of mapped addresses in sctp_connectx()
MFC after: 3 days

sctp: improve input validation of mapped addresses in send()
Reported by: syzbot+35528f275f2eea6317cc@syzkaller.appspotmail.com
Reported by: syzbot+ac29916d5f16d241553d@syzkaller.appspotmail.com
MFC after: 3 days

Apply upstream lld fix for compressed input sections on BE targets

Merge commit c6ebc651b6fa from llvm git (by Simon Atanasyan):

  [LLD] Support compressed input sections on big-endian targets

  This patch enables compressed input sections on big-endian targets by
  checking the target endianness and selecting an appropriate `Chdr`
  structure.

  Fixes PR51369

  Differential Revision: https://reviews.llvm.org/D107635

Reported by: emaste
MFC after: 3 days

cache: add OPENREAD and OPENWRITE to fast path lookup

[lltable] Restructure nd6 code.

Factor out lltable locking logic from lltable_try_set_entry_addr()
into a separate lltable_acquire_wlock(), so the latter can be used
in other parts of the code w/o duplication.

Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c
and nd6_nbr.c.

Move lle creation logic from nd6_resolve_slow() into a separate
nd6_get_llentry() to simplify the former.

These changes serve as a pre-requisite for implementing
RFC8950 (IPv4 prefixes with IPv6 nexthops).

Differential Revision: https://reviews.freebsd.org/D31432
MFC after: 2 weeks

bhyve: Use fspacectl(2) for BOP_DELETE on regular file images

bhyve can also make use of fspacectl(2) to implement BOP_DELETE with
hole-punching. Since it is not desirable to do zero-filling for large
DEALLOCATE/UNMAP range, candelete is not set if pathconf(2) indicates
that the underlying file system does not support native
VOP_DEALLOCATE(9).

Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D28880

namei: Add cn_flags bits for OPENREAD and OPENWRITE

VOP_LOOKUP() is called with cn_flags bits ISLASTCN and ISOPEN
to indicate that the lookup is for the last component of a pathname
when doing open.

If cn_flags also indicates whether the open is for reading, writing or both,
the NFSv4 client can do an NFSv4 Open operation in the same compound
RPC as Lookup, often avoiding the additional Open RPC now done when
VOP_OPEN() is called.

This patch defines two new cn_flags bits called OPENREAD and OPENWRITE
and sets these in open2nameif() based on FREAD, FWRITE flag bits.
This will allow a subsequent patch to the NFSv4 client to do the Open
operation in the same RPC as Lookup.
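
The flag translation amounts to a pair of bit tests. In the Python sketch below, `FREAD` and `FWRITE` use their usual values, while the `OPENREAD` and `OPENWRITE` values and the function name are placeholders chosen for illustration; the real bits live in the kernel's namei header.

```python
FREAD = 0x0001
FWRITE = 0x0002

# Placeholder bit values for illustration only.
OPENREAD = 0x0100
OPENWRITE = 0x0200


def open_mode_to_cn_flags(fmode):
    """Translate open mode bits into the new cn_flags bits."""
    cn_flags = 0
    if fmode & FREAD:
        cn_flags |= OPENREAD
    if fmode & FWRITE:
        cn_flags |= OPENWRITE
    return cn_flags
```

A filesystem's lookup routine can then test these bits on the last component and decide whether to fold the open into the same request.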

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31431

Fix pathconf.2 documentation error

_PC_MIN_HOLE_SIZE and _PC_DEALLOC_PRESENT were somehow mixed up before this
fix.

Sponsored by: The FreeBSD Foundation
Reviewed by: delphij
Differential Revision: https://reviews.freebsd.org/D31436

cxgbei: Support for ISO (iSCSI segmentation offload).

ISO can be disabled before establishing a connection by setting
dev.tNnex.N.toe.iso to 0.

Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31223

iSCSI: Add support for segmentation offload for hardware offloads.

Similar to TSO, iSCSI segmentation offload permits the upper layers to
submit a "large" virtual PDU which is split up into multiple segments
(PDUs) on the wire.  Similar to how the TCP/IP headers are used as
templates for TSO, the BHS at the start of a large PDU is used as a
template to construct the specific BHS at the start of each PDU.  In
particular, the DataSN is incremented for each subsequent PDU, and the
'F' flag is only set on the last PDU.

struct icl_conn has a new 'ic_hw_isomax' field which defaults to 0,
but can be set to the largest virtual PDU a backend supports.  If this
value is non-zero, the iSCSI target and initiator use this size
instead of 'ic_max_send_data_segment_length' to determine the maximum
size for SCSI Data-In and SCSI Data-Out PDUs.  Note that since PDUs
can be constructed from multiple buffers before being dispatched, the
target and initiator must wait for the PDU to be fully constructed
before determining how many DataSN values were consumed (and thus
updating the per-transfer DataSN value used for the start of the next
PDU).
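
What the offload does to one large PDU can be sketched in software. This Python model is an illustration under assumptions: `segment_pdu`, the dictionary layout, and the parameter names are hypothetical, and a real backend performs the split in hardware using the BHS as a template.

```python
def segment_pdu(data, max_segment, first_datasn):
    """Split one large virtual Data PDU into wire PDUs.

    Returns (pdus, next_datasn), where next_datasn is the value the
    caller should use for the start of the next large PDU.
    """
    pdus = []
    datasn = first_datasn
    for off in range(0, len(data), max_segment):
        chunk = data[off:off + max_segment]
        last = off + max_segment >= len(data)
        # Each wire PDU gets its own DataSN; only the last carries 'F'.
        pdus.append({"datasn": datasn, "final": last, "data": chunk})
        datasn += 1
    return pdus, datasn
```

The number of DataSN values consumed depends on the final length of the large PDU, which is why the caller can only advance its per-transfer counter once the PDU has been fully constructed.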

The target generates large PDUs for SCSI Data-In PDUs in
cfiscsi_datamove_in().  The initiator generates large PDUs for SCSI
Data-Out PDUs generated in response to an R2T.

Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31222