routing: simplify malloc flags in alloc_nhgrp().

(cherry picked from commit 639d7abec6cd31db9d240d6439fe6098b19eb3d8)

routing: Fix newly-added rt_get_inet[6]_parent() api.

Correctly handle the case when no default route is present.

Reported by: Konrad <konrad.kreciwilk at korbank.pl>

(cherry picked from commit f84c30106e8b725774b4e9a32c8dd11c90da8c25)

lltable: fix crash introduced in c541bd368f86.

Reported by: cy

(cherry picked from commit f8c1b1a9296696f70ac209612a00ae0722d07ed9)

lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries.

Currently we use pre-calculated headers inside LLE entries as prepend data
for `if_output` functions. Using these headers allows saving some
CPU cycles/memory accesses on the fast path.

However, this approach makes adding L2 header for IPv4 traffic with IPv6
nexthops more complex, as it is not possible to store multiple
pre-calculated headers inside lle. Additionally, the solution space is
limited by the fact that PCB caching saves LLEs in addition to the nexthop.

Thus, add support for creating special "child" LLEs for the purpose of holding
custom family encaps and store mbufs pending resolution. To simplify handling
of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
to the "parent" LLE - it is not possible to delete "child" when parent is alive.
Furthermore, "child" LLEs are static (RTF_STATIC), avoiding the complex state
machine used by the standard LLEs.

nd6_lookup() and nd6_resolve() now accept an additional argument, family,
allowing them to return such child LLEs. This change uses `LLE_SF()` macro which
packs family and flags in a single int field. This is done to simplify merging
back to stable/. Once this code lands, most of the cases will be converted to
use a dedicated `family` parameter.
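
As a rough illustration of packing a family and flags into one int, the
sketch below uses hypothetical SKETCH_SF* macros. The 16-bit split, the
8-bit family field, and all names are assumptions made for this example,
not the kernel's `LLE_SF()` definition.

```c
#include <assert.h>

/* Hypothetical layout: low 16 bits hold flags, the next 8 bits the family. */
#define SKETCH_SF(fam, flags)	((((fam) & 0xFF) << 16) | ((flags) & 0xFFFF))
#define SKETCH_SF_FAMILY(sf)	(((sf) >> 16) & 0xFF)
#define SKETCH_SF_FLAGS(sf)	((sf) & 0xFFFF)

/* Pack a family and a flags word into a single int argument. */
static int
sketch_pack(int family, int flags)
{
	/* Reject values that would not survive the round trip. */
	assert(family >= 0 && family <= 0xFF);
	assert(flags >= 0 && flags <= 0xFFFF);
	return (SKETCH_SF(family, flags));
}
```

A callee can then recover both halves with SKETCH_SF_FAMILY() and
SKETCH_SF_FLAGS() without any change to the function signature.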

Differential Revision: https://reviews.freebsd.org/D31379

(cherry picked from commit c541bd368f863bbf5c08dd5c1ecce0166ad47389)

routing: Fix crashes with dpdk_lpm[46] algo.

When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
the nexthop of the "parent" prefix to update its internal state.
The glue code, which utilises RIB as a backing route store, uses
fib[46]_lookup_rt() for the prefix destination after its deletion
to fetch the desired nexthop.
This approach does not work when deleting less-specific prefixes
while more-specific ones are still present. For example, if
10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
result instead of 10.0.0.0/22. This, in turn, results in the failed
datastructure update: part of the deleted /23 prefix will still
contain the reference to an old nexthop. This leads to the
use-after-free behaviour, ending with the eventual crashes.

Fix the logic flaw by properly fetching the prefix "parent" via
newly-created rt_get_inet[6]_parent() helpers.
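
The difference between the two lookups can be sketched over a flat table.
This is only an illustration: the struct, the linear scan, and the
sketch_* names are assumptions and do not reflect how the RIB or the
rt_get_inet[6]_parent() helpers are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_prefix {
	uint32_t addr;	/* network address, host byte order */
	int	 plen;	/* prefix length, 0..32 */
};

/* Netmask for a prefix length; plen 0 is special-cased to avoid a 32-bit shift. */
static uint32_t
sketch_mask(int plen)
{
	assert(plen >= 0 && plen <= 32);
	return (plen == 0 ? 0 : ~(uint32_t)0 << (32 - plen));
}

/*
 * Return the longest prefix in tbl that covers addr and is strictly
 * shorter than max_plen, or NULL. max_plen = 33 gives a plain
 * longest-prefix match; max_plen = N gives the "parent" of a /N.
 */
static const struct sketch_prefix *
sketch_lookup(const struct sketch_prefix *tbl, size_t n, uint32_t addr,
    int max_plen)
{
	const struct sketch_prefix *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].plen >= max_plen)
			continue;
		if ((addr & sketch_mask(tbl[i].plen)) != tbl[i].addr)
			continue;
		if (best == NULL || tbl[i].plen > best->plen)
			best = &tbl[i];
	}
	return (best);
}
```

With 10.0.0.0/24 and 10.0.0.0/22 left in the table after the /23 is
removed, a plain lookup of 10.0.0.0 finds the /24, while a lookup bounded
by the deleted prefix length (23) finds the /22.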

Differential Revision: https://reviews.freebsd.org/D31546
PR: 256882,256833

(cherry picked from commit 36e15b717eec80047fe7442898b5752101f2fbca)

routing: add IPv6 fib validation procedure.

Allow consistency validation of the inet6 fib based on rib data.
Validation can be kicked off by loading test_lookup module and
running sysctl net.route.test.run_inet6_scan=1

(cherry picked from commit cbfba56c45ab77303a3e25a82cf750043849760b)

routing: Use process fib instead of fib 0 when conducting tests.

* Allow validation/performance tests to use the process
fib instead of the default fib 0.
* Print all validation errors instead of just the first one.

(cherry picked from commit 4a77a9b6491093b9a8bb786a861ed74ddf156e8e)

Simplify nhop operations in ip_output().

Consistently use `nh` instead of always dereferencing
ro->ro_nh inside the if block.
Always use the nexthop mtu, as it guarantees that the mtu is accurate.
Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses
of updating ro flags based on different nexthop.

Differential Revision: https://reviews.freebsd.org/D31451
Reviewed by: kp

(cherry picked from commit 9748eb742791dcfbb6496dc5c7c72c9283759baf)

[lltable] Restructure nd6 code.

Factor out lltable locking logic from lltable_try_set_entry_addr()
into a separate lltable_acquire_wlock(), so the latter can be used
in other parts of the code w/o duplication.

Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c
and nd6_nbr.c.

Move lle creation logic from nd6_resolve_slow() into a separate
nd6_get_llentry() to simplify the former.

These changes serve as a pre-requisite for implementing
RFC8950 (IPv4 prefixes with IPv6 nexthops).

Differential Revision: https://reviews.freebsd.org/D31432

(cherry picked from commit 0b79b007ebfc250a8a7b928df268ada6f1c988c4)

Use lltable calculated header when sending lle holdchain after successful lle resolution.

Subscribers: imp, ae, bz

Differential Revision: https://reviews.freebsd.org/D31391

(cherry picked from commit 8482aa77481a1576df7a19dbeaccb91243fbb2a3)

[lltable] Unify datapath feedback mechanism.

Use newly-created llentry_request_feedback(),
llentry_mark_used() and llentry_get_hittime() to
request datapath usage check and fetch the results
in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
by eliminating 1 condition check.

Differential Revision: https://reviews.freebsd.org/D31390

(cherry picked from commit f3a3b061216936b6233d1624dfdba03240d7c045)

Fix typo in rib_unsibscribe<_locked>().

Submitted by: Zhenlei Huang<zlei.huang at gmail.com>
Differential Revision: https://reviews.freebsd.org/D31356

(cherry picked from commit 5b42b494d54365254176dd0ef688cd96edabe657)

[netflow] fix gateway reporting in ng_netflow

Reported by: Guy Yur <guyyur at gmail.com>

(cherry picked from commit 8e55a80e0cc53002979f04a2504d2167267db3c2)

Enforce check for using the return result for ifa?_try_ref().

Suggested by: hps
Differential Revision: https://reviews.freebsd.org/D29504

(cherry picked from commit 9e5243d7b65939c3d3dbf844616084e9580876dd)

Rename variables inside nexthop group consider_resize() code.

No functional changes.

(cherry picked from commit 0f30a36dedef43781f5003bdfcb4254d310f02e4)

Simplify ifa/ifp refcounting in the routing stack.

The routing stack control depends on quite a tree of functions to
determine the proper attributes of a route such as a source address (ifa)
or transmit ifp of a route.

When actually inserting a route, the stack needs to ensure that ifa and ifp
point to entities that are still valid.
Validity means slightly more than just pointer validity - the stack needs a
guarantee that the provided objects are not scheduled for deletion.

Currently, callers either ignore it (most ifp parts, historically) or try to
use refcounting (ifa parts). Even in case of ifa refcounting it's not always
implemented in a fully-safe manner. For example, some codepaths inside
rt_getifa_fib() are referencing ifa while not holding any locks, resulting in
the possibility of referencing a scheduled-for-deletion ifa.

Instead of trying to fix all of the callers by enforcing proper refcounting,
switch to a different model.
As the rib_action() already requires epoch, do not require any stability guarantees
other than the epoch-provided one.
Use newly-added conditional versions of the refcounting functions
(ifa_try_ref(), if_try_ref()) and fail if any of these fails.

Reviewed by: donner
Differential Revision: https://reviews.freebsd.org/D28837

(cherry picked from commit 596417283722ee62ed17aed1c875ad90c01cbb0e)

Add if_try_ref() to simplify refcount handling inside epoch.

When we have an ifp pointer and the code is running inside epoch,
epoch guarantees the pointer will not be freed.
However, the following case can still happen:

* in thread 1 we drop to refcount=0 for ifp and schedule its deletion.
* in thread 2 we use this ifp and reference it
* destroy callout kicks in
* unhappy user reports a bug

This can happen with the current implementation of ifnet_byindex_ref(),
as we're not holding any locks preventing ifnet deletion by a parallel thread.

To address it, add if_try_ref(), which returns failure when
referencing an ifp with refcount=0.
Additionally, guard the existing if_ref() with a KASSERT to provide a
cleaner error in such scenarios.

Finally, fix ifnet_byindex_ref() by using if_try_ref() and returning NULL
if the latter fails.
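
The "try" semantics can be sketched in userland with C11 atomics. This is
an assumption-laden illustration (hypothetical struct and sketch_* names),
not the kernel's refcount(9) code or the actual if_try_ref() body.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_obj {
	atomic_uint refcount;
};

/*
 * Take a reference only if the object is still live (refcount > 0).
 * Returns false once the count has dropped to zero, meaning destruction
 * may already be scheduled and the caller must not use the object.
 */
static bool
sketch_try_ref(struct sketch_obj *o)
{
	unsigned int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Drop a reference; returns true if this was the last one. */
static bool
sketch_rele(struct sketch_obj *o)
{
	unsigned int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	return (old == 1);
}
```

A lookup routine built on this would return NULL whenever
sketch_try_ref() fails, rather than handing out a pointer to an object
that is about to be destroyed.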

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28836

(cherry picked from commit 7563019bc69301a382abefbac3b0fea1d876410e)

sctp: Fix racy UNBOUND flag check in sctp_inpcb_bind()

SCTP needs to avoid binding a given socket twice. The check used to
avoid this is racy since neither the inpcb lock nor the global info lock
is held. Fix it by synchronizing using the global info lock. In
particular, sctp_inpcb_bind() may drop the inpcb lock in some cases, but
the info lock is sufficient to prevent double insertion into PCB hash
tables.

Reported by: syzbot+548a8560d959669d0e12@syzkaller.appspotmail.com
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 4a36122b1db1b255cf21d926b997d524e6782429)

itimer: Serialize access to the p_itimers array

Fix the following race between itimer_proc_continue() and process exit.

itimer_proc_continue() may be called via realitexpire(), the real
interval timer.  Note that exit1() drains this timer _after_ draining
and freeing itimers.  Moreover, itimers_exit() is called without the
process lock held; it only acquires the proc lock when deleting
individual itimers, so once they are drained we free p->p_itimers
without any synchronization.  Thus, itimer_proc_continue() may load a
non-NULL p->p_itimers array and iterate over it after it has been freed.

Fix the problem by using the process lock when clearing p->p_itimers, to
synchronize with itimer_proc_continue().  Formally, accesses to this
field should be protected by the process lock anyway, and since the
array is allocated lazily this will not incur any overhead in the common
case.

Reported by: syzbot+c40aa8bf54fe333fc50b@syzkaller.appspotmail.com
Reported by: syzbot+929be2f32503bbc3844f@syzkaller.appspotmail.com
Reviewed by: kib
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 3138392a46a4a8ecfb8e36e9970e88bbae9caed3)

md: Clamp to a multiple of the sector size when resizing

We do this when creating md(4) devices, in kern_mdattach_locked(), but
not when resizing the provider. Apply the same policy when resizing, as
many GEOM classes do not expect to deal with providers for which
pp->mediasize % pp->sectorsize != 0.
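
The clamping policy amounts to rounding the media size down to a whole
number of sectors. A minimal sketch, with a hypothetical helper name and
plain integer types standing in for the GEOM ones:

```c
#include <assert.h>
#include <stdint.h>

/* Round mediasize down to a multiple of sectorsize. */
static int64_t
sketch_clamp_mediasize(int64_t mediasize, uint32_t sectorsize)
{
	assert(mediasize >= 0 && sectorsize > 0);
	return (mediasize - mediasize % sectorsize);
}
```

For instance, a requested size of 1000 bytes with 512-byte sectors is
clamped to 512, so mediasize % sectorsize is 0 afterwards.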

Reported by: syzkaller
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 47619b604402c9672a0f9bf62666f3bcba1dfb7e)

sctp: Simplify the free port search in sctp_inpcb_bind()

Eliminate a flag variable and reduce indentation. No functional change
intended.

Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 2496d812a9c781f8e4be1bfd22375c6e686665da)

sctp: Avoid unnecessary refcount bumps in sctp_inpcb_bind()

We only drop the inp lock when binding to a specific port. So, only
acquire an extra reference when required. This simplifies error
handling a bit.

Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 93908fce7280b1146bbc5135b78829e8f8ff1b74)

graid: Avoid tasting devices with small sector sizes

The RAID metadata parsers effectively assume a sector size of 512 bytes
or larger, but md(4) devices can be created with a sector size that's
any power of 2. Add some seatbelts to graid tasting routines to ensure
that the requested sector(s) are large enough for the device to
plausibly contain RAID metadata.

Reported by: syzbot+f43583c9bf8357c8b56f@syzkaller.appspotmail.com
Reported by: syzbot+537dd9f22b91b698e161@syzkaller.appspotmail.com
Reported by: syzbot+51509dd48871c57c6e47@syzkaller.appspotmail.com
Reported by: syzbot+c882a31037ea2a54ff63@syzkaller.appspotmail.com
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 9e9ba9c73de9206d82b8390c47b07f71470d001a)

mdconfig: Add a regression test for mediasize rounding

Sponsored by: The FreeBSD Foundation

(cherry picked from commit ed59446b47095fc20c1f77e832286f5b953cd289)

sctp: Remove always-false checks in sctp_inpcb_bind()

No functional change intended.

Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 0d29e4bc011dd4557ff9bde373bd48c567c3a4bf)

RELNOTES: Add entry for just-MFC'ed HiFive Unmatched support

This is a direct commit.

mx25l: Add support for Integrated Silicon Solution is25wp256

This is used for the on-board flash on the HiFive Unmatched board.

Reviewed by: #riscv, jrtc27
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31562

(cherry picked from commit 416ac155bb750fa55917daf340abe4ef04e7d4e6)

sifive_spi: Add missing case for SPIBUS_MODE_NONE

Otherwise sckmode is left uninitialised, not zero. This mode is used for
the on-board flash on the HiFive Unmatched board. Whilst here, catch
unknown modes and return an error rather than silently continuing.

Reviewed by: #riscv, jrtc27
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31562

(cherry picked from commit f5d78bea1f699c05e1694505088e61d22b8fb1f5)

Revert "Mark LLDB/CLANG_BOOTSTRAP/LLD_BOOTSTRAP as broken on non-FreeBSD for now"

The fixes for this have now been committed so we can re-enable these.

This reverts commit d9f25575a29ff7c83f226349a10a37b9aaf75ad5.

MFC after: 1 week

(cherry picked from commit 83ec48b79275b5211b06675dba04dab1f58c3a70)

clang: Build with -fno-strict-aliasing when using GCC

Somewhat ironically, there are strict aliasing violations in Clang,
which can result in the following assertion failure:

Assertion `*(NamedDecl **)&Data == ND && "PointerUnion mangles the NamedDecl pointer!"' failed.

Upstream's clang/CMakeLists.txt specifically (not LLVM as a whole)
passes -fno-strict-aliasing if the compiler is not Clang, and this fixes
the above issue.

This was seen when cross-building from Linux using a bootstrap
compiler, but likely also affects worlds built with a new enough
external GCC toolchain.

MFC after: 1 week
Reviewed by: dim
Differential Revision: https://reviews.freebsd.org/D31533

(cherry picked from commit c1f7d8dd23db693106fcd66e0b1766a3f3194670)

clang: Support building with GCC and DEBUG_FILES disabled

If MK_DEBUG_FILES=no then the Clang link rule has clang as .TARGET,
rather than clang.full, causing the implicit ${CFLAGS.${.TARGET:T}} to
be CFLAGS.clang, and thus pull in flags intended for when your compiler
is Clang, not when linking Clang itself. This doesn't matter if your
compiler is in fact Clang, but it breaks using GCC as, for example,
bsd.sys.mk adds -Qunused-arguments to CFLAGS.clang. This is seen when
trying to build a bootstrap toolchain on Linux where GCC is the system
compiler.

Thus, introduce a new internal NO_TARGET_FLAGS variable that is set by
Clang to disable the addition of these implicit flags. This is a bigger
hammer than necessary, as flags for .o files would be safe, but that is
not needed for Clang.

Note that the same problem does not arise for LDFLAGS when building LLD
with BFD, since our build produces a program called ld.lld, not plain
lld (unlike upstream, where ld.lld is a symlink to lld so they can
support multiple different flavours in one binary).

Suggested by: sjg
Fixes: 31ba4ce8898f ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week
Reviewed by: dim, imp, emaste
Differential Revision: https://reviews.freebsd.org/D31532

(cherry picked from commit c8edd0542647f59ab07dd73e865edd34706397a5)

Fix bootstrapping to actually build lldb-tblgen for later use

Because MK_LLDB=no is in BSARGS, the bootstrap-tools recursive make does
not add lldb-tblgen to _clang_tblgen, causing it to not be built. This
means that the build currently always uses the host's lldb-tblgen
(which, whilst currently it appears to work, could in future break if
TableGen backends are added or altered) and, if it doesn't exist (either
because the current FreeBSD system was built with it disabled, or you're
building on macOS/Linux), fails. Linux and macOS cross-builds used to
work simply because LLDB was previously in BROKEN_OPTIONS when building
on non-FreeBSD.

Instead, move MK_LLDB=no from BSARGS to XMAKE. This ensures that the
lib/clang build in cross-tools continues to not build LLDB parts for the
bootstrap toolchain (both to save time/space on FreeBSD, and because our
vendored LLDB does not include the macOS and Linux host files so those
would fail to build).

The DIRDEPS target is updated to move MK_LLDB=no from the BSARGS block
that mirrors Makefile.inc1 to the line that disables additional
toolchain components. The DIRDEPS build likely suffers from the same
issue currently, but having never used it and not being familiar with
how it works I am leaving that as-is. If it does suffer from the same
issue it should be easily reproducible by renaming /usr/bin/lldb-tblgen
or moving it to a directory not in PATH.

Fixes: 31ba4ce8898f ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week
Reviewed by: dim, emaste, imp
Differential Revision: https://reviews.freebsd.org/D31531

(cherry picked from commit 1e4c802913af619ac15741bbd276e1141ca17dc9)

Makefile.inc1: Make sure sub-makes see MK_CLANG_BOOTSTRAP=no when XCC is a path

Currently we override MK_CLANG_BOOTSTRAP to no so we don't build a
bootstrap compiler, but subdirectories don't see that and so the hack in
bsd.sys.mk to prefer our includes over Clang's resource dir for external
toolchains is not enabled unless you use -DWITHOUT_CLANG_BOOTSTRAP
explicitly on top of XCC (which tools/build/make.py does not do),
causing duplicate definition errors when building rtld-elf due to the
use of -ffreestanding (Clang's stdint.h will use the system one when
hosted, but its own when freestanding, and only has glibc's preprocessor
guards, not FreeBSD's).

This broke when dropping CLANG_BOOTSTRAP from BROKEN_OPTIONS.

Fixes: 31ba4ce8898f ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week
Reviewed by: imp, arichardson
Differential Revision: https://reviews.freebsd.org/D31529

(cherry picked from commit ab3a18095faebe306989f25288c44968f4144063)

clang: Fix inverted condition in llvm.build.mk

Fixes: 31ba4ce8898f ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week

(cherry picked from commit 5ff5d1177bc66f1c2a0a6ee4d0ffa128d32e1dad)

tools/build/cross-build: Fix building libllvmminimal on Linux

There is a __used member in glibc's posix_spawn_file_actions_t in
spawn.h, so we must temporarily undefine __used when including it,
otherwise Support/Unix/Program.inc fails to build. This is based on
similar handling for __unused in other headers.

Fixes: 31ba4ce8898f ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week

(cherry picked from commit 8a1895a3fa6f634e9f459b6b62321a61c7941bdc)

riscv: Fix pmap_alloc_l2 when it should allocate a new L1 entry

The current code checks the RWX bits are 0 but does not check the V bit
is non-zero, meaning not-yet-allocated L1 entries that are still zero
are regarded as being allocated. This is likely due to copying the arm64
code that checks ATTR_DESC_MASK is L1_TABLE, which encompasses both the
type and the validity in a single field, and erroneously translating
that to a check of just PTE_RWX being 0 to indicate non-leaf, forgetting
about the V bit. This then results in the following panic:

    panic: Fatal page fault at 0xffffffc0005cf292: 0x00000000000050
    cpuid = 1
    time = 1628379581
    KDB: stack backtrace:
    db_trace_self() at db_trace_self
    db_trace_self_wrapper() at db_trace_self_wrapper+0x38
    kdb_backtrace() at kdb_backtrace+0x2c
    vpanic() at vpanic+0x148
    panic() at panic+0x2a
    page_fault_handler() at page_fault_handler+0x1ba
    do_trap_supervisor() at do_trap_supervisor+0x7a
    cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x70
    --- exception 13, tval = 0x50
    pmap_enter_l2() at pmap_enter_l2+0xb2
    pmap_enter_object() at pmap_enter_object+0x15e
    vm_map_pmap_enter() at vm_map_pmap_enter+0x228
    vm_map_insert() at vm_map_insert+0x4ec
    vm_map_find() at vm_map_find+0x474
    vm_map_find_min() at vm_map_find_min+0x52
    vm_mmap_object() at vm_mmap_object+0x1ba
    vn_mmap() at vn_mmap+0xf8
    kern_mmap() at kern_mmap+0x4c4
    sys_mmap() at sys_mmap+0x38
    do_trap_user() at do_trap_user+0x208
    cpu_exception_handler_user() at cpu_exception_handler_user+0x72
    --- exception 8, tval = 0x1dd

Instead, we should just check the V bit, as on amd64, and assert that
any valid L1 entries are not leaves, since an L1 leaf would render the
entire range allocated and thus we should not have attempted to map that
VA in the first place.
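
The two checks can be contrasted in a small sketch. The bit positions
follow the RISC-V privileged specification (V is bit 0, R/W/X are bits
1-3); the SK_* macros and function names are hypothetical and this is not
the pmap code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PTE_V	0x1u
#define SK_PTE_R	0x2u
#define SK_PTE_W	0x4u
#define SK_PTE_X	0x8u
#define SK_PTE_RWX	(SK_PTE_R | SK_PTE_W | SK_PTE_X)

/* Flawed test: a never-written (all-zero) entry also has RWX == 0. */
static bool
sketch_l1_allocated_flawed(uint64_t l1e)
{
	return ((l1e & SK_PTE_RWX) == 0);
}

/* Test the valid bit; a valid L1 entry is asserted to be a non-leaf. */
static bool
sketch_l1_allocated(uint64_t l1e)
{
	if ((l1e & SK_PTE_V) == 0)
		return (false);
	assert((l1e & SK_PTE_RWX) == 0);
	return (true);
}
```

An all-zero entry is reported as allocated by the first function and as
unallocated by the second, which is the behavioural difference described
above.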

Reported by: David Gilbert <dgilbert@daveg.ca>
MFC after: 1 week
Reviewed by: markj, mhorne
Differential Revision: https://reviews.freebsd.org/D31460

(cherry picked from commit 98138bbde032e2040af3d158658c497fd3f63f2a)

riscv: Sync NOTES with GENERIC changes

USB is already in sys/conf/NOTES, but NVMe is not, nor of course are the
new SiFive device drivers.

MFC after: 1 week

(cherry picked from commit c5e5202a3d5d6b7d47a6da7b678bc5c4320c91e9)

riscv: Add hwreset to NOTES to fix LINT build

Fixes: 8e7e0690ecd7 ("sifive_prci: Add reset support for the FU540 and FU740")
MFC after: 1 week

(cherry picked from commit 0a4cb54506e3e2c0d911ddd12416eb2fdc6a7bd7)

pci_dw: Drop unconditional explicit DEBUG define

This has been present since the first revision of the file. The debugf
macros have always been unused so it doesn't actually do anything
useful, and besides, debugging should not be unconditionally turned on
for a production driver. Moreover, this breaks the riscv LINT kernel
build as sys/conf/NOTES includes options DEBUG, resulting in a macro
redefinition error. This does not show up in the arm64 LINT kernel build
since that has an explicit nooptions DEBUG, which is dubious and should
be revisited. Rather than copy such a hack to riscv's NOTES, fix this
specific instance of DEBUG breaking.

Fixes: 896e217a0eae ("fu740_pci_dw: Add SiFive FU740 PCIe controller driver")
MFC after: 1 week

(cherry picked from commit 22997b755013bdde60119fdc781769192ab7e1e0)

gpio.4: Mention new sifive_gpio driver

Suggested by: mhorne
MFC after: 1 week

(cherry picked from commit 5668a155cbe6cef802bc95666477a440fdb6f606)

riscv: Add NVMe, USB and HID support to GENERIC

The SiFive FU740 has both NVMe and USB so we need both to ensure we can
mount root, and HID is a dependency of USB.

Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31036

(cherry picked from commit 6e162bd2f298b58a418a17d49f5671a9a113fc4e)

fu740_pci_dw: Add SiFive FU740 PCIe controller driver

Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31033

(cherry picked from commit 896e217a0eae692b1fe3de5d1354478541fb13ff)

sifive_gpio: Add SiFive GPIO controller driver

This is present on both the FU540 and FU740, but only needed for the
FU740 in order to assert reset and power enable signals for its PCIe
controller.

Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31031

(cherry picked from commit b47e5c5dbe2058b6a178230dc7796c37bfeaa926)

fu540_spi: Rename to sifive_spi

The FU740 also uses the same SPI controller.

Reviewed by: kp, philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31026

(cherry picked from commit 90a089cf2a7462e4101907e2a6161734b2487a78)

sifive_prci: Add reset support for the FU540 and FU740

This is needed for FU740 PCIe support. Whilst we don't need the FU540's
resets they are also defined for completeness.

Reviewed by: manu
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31024

(cherry picked from commit 8e7e0690ecd79e8adc9182d486c05748bd97d26d)

sifive_prci: Delay attachment until after clk_fixed

This avoids noisy output from early attempts to attach before clk_fixed
has attached to the parent clocks.

Reviewed by: kp, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31023

(cherry picked from commit dcbea9a2f465be1786db21523a7f55db3f7ab3dd)

sifive_prci: Add support for the FU740 PRCI

Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31022

(cherry picked from commit 589d8a78a57b3ca1327bec3311281a38e4e49952)

fu540_prci: Rename to sifive_prci and use ocd_data for FU540 specificity

The FU740 has a very similar controller and will reuse most of the
driver. This also drops the dependency on the device-tree include for
the binding indices; the header doesn't namespace its contents (and nor
does the FU740 one) so using both would require separate translation
units which would be unnecessarily complicated just to avoid defining
local copies of the small number of constants.

Whilst here, add the missing l to gemgxlclk's name and drop the prci_
prefix from tlclk's name as we don't prefix any of the others and it's
entirely unnecessary.

Reviewed by: kp, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31021

(cherry picked from commit 12b115ec57cbe5e18a6511d9c98225551263d4ac)

riscv: Fix pmap_kextract racing with concurrent superpage promotion/demotion

This repeats amd64's cfcbf8c6fd3b (r180498) and i386's cf3508519c5e
(r202894) but for riscv; pmap_kextract must be lock-free and so it can
race with superpage promotion and demotion, thus the L2 entry must only
be loaded once to avoid using inconsistent state.

PR: 250866
Reviewed by: markj, mhorne
Tested by: David Gilbert <dgilbert@daveg.ca>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31253

(cherry picked from commit 4a235049082ee1cb044873ad9aff12cf73d0fd3b)

riscv: Include spibus and spigen in GENERIC

We already attempt to enable the SiFive SPI controller, but since spibus
isn't enabled it isn't actually built.

Reviewed by: kp, philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31027

(cherry picked from commit 8c439847f0d33fdb79f2bbdced4c300a620d74f5)

pci_dw: Detect number of outbound regions automatically

Currently we use the num-viewports property to decide how many outbound
regions there are we can use, defaulting to 2. However, Linux has
stopped using that and so it no longer appears in new device trees, such
as for the SiFive FU740. Instead, it's possible to just probe the
hardware directly.

Reviewed by: mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31030

(cherry picked from commit 4707bb0430e6ca3935ef8196e30830b3cdaf3514)

pci_dw: Support modern "unroll" iATU mode

This supersedes the old legacy mode where a viewport register was used
to mux multiple regions behind a single set of registers, and is used on
the SiFive FU740.

Reviewed by: mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31029

(cherry picked from commit f240dfff229d1f1ff502f59901ef2b9364ca55d9)

pci_dw: Support multiple memory windows

Currently we assume there is only one memory and one prefetch memory
window, and ignore the latter. However, the SiFive FU740 has two normal
memory windows.

As part of this, the viewports are rearranged. Previously the viewports
were memory, config then optionally I/O. Both to simplify the config
index calculation and to ensure it can always be mapped even if we have
too many memory windows for the number of viewports, config is moved to
being the first viewport.

This generalisation now also naturally supports mapping prefetch memory
windows.

Reviewed by: mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31028

(cherry picked from commit f8c1701f23b5b99365ffab2a067e4676b905ab57)

pci_dw: Trim ATU windows bigger than 4GB

The size of the ATU MEM/IO windows is implicitly cast to uint32_t.
Because of that some window sizes were silently demoted to 0 and ignored.
Check the size; if it's too large, trim it to 4GB and print a warning message.
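
The truncation is easy to reproduce outside the driver. In the sketch
below the function names are hypothetical, and the exact limit chosen for
the clamp (2^32 bytes) is an assumption based on the "4GB" wording; the
driver may express the limit differently.

```c
#include <assert.h>
#include <stdint.h>

/* What an implicit narrowing does: only the low 32 bits survive. */
static uint32_t
sketch_truncate(uint64_t size)
{
	return ((uint32_t)size);
}

/* Clamp an oversized window instead of letting it wrap to a small value. */
static uint64_t
sketch_trim_window(uint64_t size)
{
	const uint64_t limit = UINT64_C(1) << 32;	/* 4 GiB, assumed */

	assert(size != 0);
	return (size > limit ? limit : size);
}
```

An 8 GiB window narrows to 0 when cast to 32 bits, whereas the clamp
keeps it at the assumed 4 GiB limit.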

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: mw
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29625

(cherry picked from commit 243000b19f8b4ab104b584b2d16bc6aa9131c9b5)

pci_dw: fix outbound I/O window configuration

Use viewport "2" instead of "0" and change window type from MEM to IO.
Without these changes the MEM ATU window can be overwritten with the IO one.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29516

(cherry picked from commit 57dbb3c25936f0d61fef152eb224ca86a73af0e9)

Fix native-xtools build

Fixes https://github.com/freebsd/poudriere/issues/894
Fixes: d0c737e18 ("Makefile: Fix MAKEOBJDIRPREFIX command-line")
X-MFC-With: d0c737e18

(cherry picked from commit b60770fceb2b94efe334221bd13a5e55229babb3)

Makefile: Fix MAKEOBJDIRPREFIX command-line variable check for bmake

Unlike the old fmake, running make FOO=bar when using bmake doesn't put
FOO=bar in .MAKEFLAGS at the top level, it instead just puts FOO in
.MAKEOVERRIDES and the full MAKEFLAGS will be formed for sub-makes.
Moreover, this only applies for sub-makes in rules, so this doesn't
apply to those in shell assignments. This means that the current check
does not catch make MAKEOBJDIRPREFIX=..., only those defined in config
files. Thus we must also check .MAKEOVERRIDES explicitly.

Reviewed by: sjg
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31015

(cherry picked from commit d0c737e18454868447f731fe2b10d04f50d9d53b)

riscv: Fix pindex level confusion

The pindex values are assigned from the L3 leaves upwards, meaning there
are NUL2E L3 tables and then NUL1E L2 tables (with a further NUL0E L1
tables in future when we implement Sv48 support). Therefore anything
below NUL2E is an L3 table's page and anything above or equal to NUL2E
is an L2 table's page (with the threshold of NUL2E + NUL1E marking the
start of the L1 tables' pages in Sv48). Thus all the comparisons and
arithmetic operations must use NUL2E to handle the L3/L2 allocation (and
thus L2/L1 entry) transition point, not NUL1E as all but pmap_alloc_l2
were doing.

To make matters confusing, the NUL1E and NUL2E definitions in the RISC-V
pmap are based on a 4-level page hierarchy but we currently use the
3-level Sv39 format (as that's the only required one, and hardware
support for the 4-level Sv48 is not widespread). This means that, in
effect, the above bug cancels out with the bloated NULxE definitions
such that things "work" (but are still technically wrong, and thus would
break when adding Sv48 support), with one exception. pmap_enter_l2 is
currently the only function to use the correct constant, but since
_pmap_alloc_l3 uses the incorrect constant, it will do complete nonsense
when it needs to allocate a new L2 table (which is rather rare). In this
instance, _pmap_alloc_l3, whilst it would correctly determine the pindex
was for an L2 table, would only subtract NUL1E when computing l1index
and thus go way out of bounds (by 511*512*512 bytes, or 127.75 MiB) of
its own L1 table and, thanks to pmap_distribute_l1, of every other
pmap's L1 table in the whole system. This has likely never been hit as
it would presumably instantly fault and panic.
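
The size of the index error follows directly from the constants. The
sketch below assumes 512 entries per table and the 4-level-style
definitions described above (NUL1E = 512 * 512, NUL2E = 512 * NUL1E); the
SK_* macros and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_LN_ENTRIES	UINT64_C(512)
#define SK_NUL1E	(SK_LN_ENTRIES * SK_LN_ENTRIES)
#define SK_NUL2E	(SK_LN_ENTRIES * SK_NUL1E)

/* L1 index for an L2 table page, using the correct base (NUL2E). */
static uint64_t
sketch_l1index(uint64_t pindex)
{
	assert(pindex >= SK_NUL2E);
	return (pindex - SK_NUL2E);
}

/* Flawed variant: subtracts NUL1E, so the result is far too large. */
static uint64_t
sketch_l1index_flawed(uint64_t pindex)
{
	return (pindex - SK_NUL1E);
}
```

For the first L2 table page (pindex equal to NUL2E) the correct index is
0, while the flawed one is NUL2E - NUL1E = 511*512*512.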

Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31087

(cherry picked from commit ade2ea3c459ac1c2a7f44ce56b8999e6ffef08bf)

sifive_uart: Fix input character dropping in ddb and at a mountroot prompt

These use the raw console interface and poll. Unfortunately, the SiFive
UART puts the FIFO empty bit inside the FIFO data register, which means
that the act of checking whether a character is available also dequeues
any character from the FIFO, requiring the user to press each key twice.
However, since we configure the watermark to be 0 and, when the UART has
been grabbed for the console, we have interrupts off, we can abuse the
interrupt pending register to act as a substitute for the FIFO empty
bit.
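
The problem and the workaround can be sketched with a small software model
of the UART. Everything here is an assumption made for illustration: the
struct, the helper names and the bit positions are hypothetical, and the
model only mimics a destructive data-register read and a watermark-0
pending bit. It is not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define RXDATA_EMPTY (1u << 31) /* assumed: empty flag inside the data register */
#define IP_RXWM      (1u << 1)  /* assumed: receive watermark pending bit */
#define FIFO_DEPTH   8

struct sim_uart {
    uint8_t  fifo[FIFO_DEPTH];
    unsigned head;
    unsigned count;
};

static void
sim_push(struct sim_uart *u, uint8_t c)
{
    if (u->count < FIFO_DEPTH) {
        u->fifo[(u->head + u->count) % FIFO_DEPTH] = c;
        u->count++;
    }
}

/* Reading the data register dequeues a character if one is present. */
static uint32_t
sim_read_rxdata(struct sim_uart *u)
{
    uint32_t v;

    if (u->count == 0)
        return (RXDATA_EMPTY);
    v = u->fifo[u->head];
    u->head = (u->head + 1) % FIFO_DEPTH;
    u->count--;
    return (v);
}

/* With a watermark of 0, the pending bit is set whenever the FIFO is non-empty. */
static uint32_t
sim_read_ip(const struct sim_uart *u)
{
    return (u->count > 0 ? IP_RXWM : 0);
}

/* Polling via the data register: the check itself consumes the character. */
static bool
rxready_destructive(struct sim_uart *u)
{
    return ((sim_read_rxdata(u) & RXDATA_EMPTY) == 0);
}

/* Polling via the pending register: the FIFO is left untouched. */
static bool
rxready_nondestructive(const struct sim_uart *u)
{
    return ((sim_read_ip(u) & IP_RXWM) != 0);
}
```

In this model, calling rxready_destructive() and then reading the data
register loses the character, whereas rxready_nondestructive() can be
called any number of times before the read.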

This perhaps suggests that the console interface should move from having
rxready and getc to having getc_nonblock and getc (or make getc take a
bool), as all the places that call rxready do so to avoid blocking on
getc when there is no character available.

Reviewed by: kp, philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31025

(cherry picked from commit a1f9cdb1abf792cb1e1adcaaba0fb84cd56e80f1)

cgem: Add support for the SiFive FU740

Note that currently Linux's device tree uses the FU540's compatible
string, as does upstream U-Boot, but the U-Boot shipped with the board
based on an older patch series has the correct FU740 name. Thankfully
they are the same, at least as far as software is concerned.

Whilst here, fix a style(9) nit.

Reviewed by: philip, kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31034

(cherry picked from commit 4c4a6884ad7fb18b0777597a4f6c2cdb235dccb6)

riscv: Implement missing nexus methods

This is required for the SiFive FU740's PCIe controller. Copied from
arm64 with the only difference being changing pmap_mapdev_attr to
pmap_mapdev as riscv only has the latter.

Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31032

(cherry picked from commit d9e85f2c6f77418864a7531ffaa0e42061c0c7da)

riscv: Implement non-stub __vdso_gettc and __vdso_gettimekeep

PR: 256905
Reviewed by: arichardson, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30963

(cherry picked from commit 348c41d1815dc2e872a1deba1f4bf760caaa1094)

makefs: Cast daddr_t to off_t before multiplication

Apparently some large-file systems out there, such as my powerpc64le
Linux box, define daddr_t as a 32-bit type, which is sad and stymies
cross-building disk images. Cast daddr_t to off_t before doing
arithmetic that overflows.
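
A minimal sketch of the overflow, assuming a 32-bit block address type. The
type and function names are hypothetical stand-ins, not the makefs code.
The second helper computes what a 32-bit product would retain, using
unsigned truncation so the illustration itself avoids undefined behaviour.

```c
#include <stdint.h>

typedef int32_t daddr32_t; /* hypothetical stand-in for a 32-bit daddr_t */

/* Widen before multiplying so the product is computed in 64 bits. */
static int64_t
blk_to_off(daddr32_t blkno, int32_t bsize)
{
    return ((int64_t)blkno * bsize);
}

/* The low 32 bits of the product: what would survive 32-bit arithmetic. */
static uint32_t
blk_to_off_truncated(daddr32_t blkno, int32_t bsize)
{
    return ((uint32_t)((uint64_t)blkno * (uint64_t)bsize));
}
```

With block 2097152 and a 4096-byte block size, the widened result is
8589934592 (8 GiB), while the low 32 bits alone are 0.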

Reviewed by: arichardson, jrtc27, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27458

(cherry picked from commit 7ef082733bf8989797b71025ba6d597a7d17d92b)

Fix cross-builds after 4e5d32a445f90d37966cd6de571978551654e3f3

Add alignment macros to cross-build's sys/cdefs.h

Pull Request: https://github.com/freebsd/freebsd-src/pull/531
MFC after: immediately (build fix)

(cherry picked from commit 94d9439b6be6bd5ef9febfaf38128e0cad91476d)

Fix a common typo in source code comments

- s/existant/existent/

(cherry picked from commit 631504fb346800f95fc581c15eb88b01c1b66fcf)

crypto(4): Fix a few typos in camellia.c

- s/valiables/variables/

Obtained from: NetBSD

(cherry picked from commit 88a3af4da1aad5cf319c4c465baebc24b4e98fd8)

nvme(4): Add MSI and single MSI-X support.

If we can't allocate more MSI-X vectors, accept using a single shared one.
If we can't allocate any MSI-X, try to allocate 2 MSI vectors, but
accept a single shared one. If still no luck, fall back to shared INTx.

This provides maximal flexibility in some limited scenarios. For
example, vmd(4) does not support INTx and can handle only a limited
number of MSI/MSI-X vectors without sharing.
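
One reading of the fallback order above, as a hedged sketch. The enum, the
function and its parameters are hypothetical; the real driver works with
the bus interrupt allocation interfaces rather than plain counts, and
msix_wanted is assumed to be greater than 1.

```c
enum intr_mode {
    INTR_MSIX_PER_QUEUE, /* enough MSI-X vectors for every queue */
    INTR_MSIX_SHARED,    /* a single shared MSI-X vector */
    INTR_MSI_PAIR,       /* two MSI vectors */
    INTR_MSI_SHARED,     /* a single shared MSI vector */
    INTR_INTX_SHARED     /* legacy shared INTx */
};

/* Pick the best mode given how many vectors of each kind can be allocated. */
static enum intr_mode
choose_intr_mode(int msix_avail, int msi_avail, int msix_wanted)
{
    if (msix_avail >= msix_wanted)
        return (INTR_MSIX_PER_QUEUE);
    if (msix_avail >= 1)
        return (INTR_MSIX_SHARED);
    if (msi_avail >= 2)
        return (INTR_MSI_PAIR);
    if (msi_avail >= 1)
        return (INTR_MSI_SHARED);
    return (INTR_INTX_SHARED);
}
```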

MFC after: 1 week

(cherry picked from commit e3bdf3da769a55f0944d9c337bb4d91b6435f02c)

nvme(4): Do not panic on admin queue construct error.

MFC after: 1 week

(cherry picked from commit 31111372e6bad7212dbee36dd312e3b53fdfd3f6)

ffs_update(): Do not assume that EBUSY can only come from LK_NOWAIT trylock

(cherry picked from commit bb536de6c0d73566e610881e12c55489a7c6ec44)

ffs_update(): recalculate flags after relocking the vnode

(cherry picked from commit f822d4feb87a7bd7747679aa779942d24fff08e0)

openssh: update default version addendum in man pages

Fixes: 2f513db72b03 ("Upgrade to OpenSSH 7.9p1.")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation

(cherry picked from commit b0025f9b7ff04ed623e9e5d8f9eaf172d5ff23f0)

aesni: Avoid a potential out-of-bounds load in aes_encrypt_icm()

Given a partial block at the end of a payload, aes_encrypt_icm() would
perform a 16-byte load of the residual into a temporary variable. This
is unsafe in principle since the full block may cross a page boundary.
Fix the problem by copying the residual into a stack buffer first.
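
The idea can be sketched in portable C, assuming a 16-byte block and a
precomputed keystream block. The function name and the plain XOR loop are
hypothetical; the real code uses SSE/AES-NI intrinsics. The point is only
that the input is read for exactly the residual length, never a full block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN 16

/*
 * XOR a trailing partial block (resid <= BLOCK_LEN bytes) with the keystream.
 * The residual is first copied into a zeroed stack buffer, so any full-width
 * operation reads the buffer rather than memory past the end of the input.
 */
static void
xor_partial_block(const uint8_t *in, uint8_t *out, size_t resid,
    const uint8_t keystream[BLOCK_LEN])
{
    uint8_t block[BLOCK_LEN];
    size_t i;

    memset(block, 0, sizeof(block));
    memcpy(block, in, resid);
    for (i = 0; i < BLOCK_LEN; i++)
        block[i] ^= keystream[i];
    memcpy(out, block, resid);
}
```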

Reported by: syzbot+b7e44cde9e2e89f0f6c9@syzkaller.appspotmail.com
Reported by: syzbot+4b5eaf123a99456b5160@syzkaller.appspotmail.com
Reported by: syzbot+70c74c1aa232633355ca@syzkaller.appspotmail.com
Reported by: syzbot+2c663776a52828373d41@syzkaller.appspotmail.com
Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 564b6aa7fccd98654207447f870b82659b895e7b)

Fix -Wformat errors in pfctl on 32-bit architectures

Use PRIu64 to printf(3) uint64_t quantities, otherwise this will result
in "error: format specifies type 'unsigned long' but the argument has
type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat]" on 32-bit
architectures.
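
For illustration, a minimal sketch of the portable form. The helper name is
hypothetical and this is not the pfctl code; it only shows the PRIu64 macro
from <inttypes.h> supplying the right conversion specifier on both 32-bit
and 64-bit targets.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit counter without assuming uint64_t is 'unsigned long'. */
static int
format_counter(char *buf, size_t len, uint64_t value)
{
    return (snprintf(buf, len, "%" PRIu64, value));
}
```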

Fixes: 80078d9d38fd
MFC after: 1 week

(cherry picked from commit 5b8f07b12f8477f1679013d6b3abdab8d33c7243)

pfctl: use libpfctl to retrieve pf status

Rather than call DIOCGETSTATUS ourselves use the new libpfctl functions.

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31697

(cherry picked from commit 80078d9d38fde6f146de28809640b2c7bff45a6c)

libpfctl: Implement DIOCGETSTATUS wrappers

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31696

(cherry picked from commit 46fb68b1de49c8d235024374b71c1249af9e62ef)

libpfctl: fix double free

Reviewed by: donner
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31695

(cherry picked from commit b0ccc2e277acddd33c65b444e7841b780b3094d7)

pf: Introduce nvlist variant of DIOCGETSTATUS

Make it possible to extend the GETSTATUS call (e.g. when we want to add
new counters, such as for syncookie support) by introducing an
nvlist-based alternative.

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31694

(cherry picked from commit 2b10cf85f8684f822511d7b9377e256ab623abbc)

tcp: document TCP Fast Open (RFC 7413) in tcp(4)

Adds documentation for the TCP_FASTOPEN socket option
and related MIB variables to the tcp(4) manual page.

PR: 257907
Reviewed by: gbe
Differential Revision: https://reviews.freebsd.org/D31764

(cherry picked from commit 71611b0c688568d513c665e1af3d95fcd50605fa)

Restore the vnet before returning an error.

Obtained from: Kandula, Dheeraj <Dheeraj.Kandula@netapp.com>

(cherry picked from commit c6b2d024d7eedbf32f52a17bc029c92f5a4d1a54)

connect: Use soconnectat() unconditionally in kern_connect()

soconnect(...) is equivalent to soconnectat(AT_FDCWD, ...), so rely on
this to save a branch. No functional change intended.

Sponsored by: The FreeBSD Foundation

(cherry picked from commit 091869def9eeb9796c3627ea95bf6cc46cf952a0)

wpa: Enclose FreeBSD specific defines

FreeBSD-only defines are specific to FreeBSD. Document them as such.
It is our intention to push this change to w1.fi.

(cherry picked from commit 213ceba977def36470df3abfe1fac47f689130c1)

wpa: Include all wpa include file search directories

Though not all include file search directories are presently needed,
add them to the search list. This is required for the next update to
wpa.

No functional change intended.

(cherry picked from commit 81b521d2c0edaab4581546af18298310e6318b5d)

wpa: Correctly build the hostapd BSD driver

driver.bsd.c initializes itself differently when built for
hostapd than it does when built for wpa_supplicant.

(cherry picked from commit a0f2aa9318a21f401a0aef2cde666edc56a92b46)

pf tests: altq:codel_bridge requires if_bridge

Check that the bridge module is loaded before running this test.
It likely will be (as a result of running the bridge tests), but if it's
not we'll get spurious failures.

MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")

(cherry picked from commit d491b42535db50693eac5946557f7527f9903b4b)

pxeboot: improve and simplify rx handling

This pushes the bulk of the rx servicing into a single loop that's only
slightly convoluted, and it addresses a problem with rx handling in the
process. If we hit a tx interrupt while we're processing, we'd
previously drop the frame on the floor completely and ultimately
time out, increasing boot time on particularly busy hosts as we keep
having to back off and resend.

After this patch, we don't seem to hit timeouts at all on zoo anymore
though loading a 27M kernel is still relatively slow (~1m20s).

Sponsored by: National Bureau of Economic Research
Sponsored by: Klara, Inc.

(cherry picked from commit 3daa8e165c661c1b45e759f4997f447384c15446)

caroot: cumulative cert update

This adds a note in all existing certs that they are explicitly trusted
for server auth, and also:

- Seven (7) added
- Nineteen (19) removed

(cherry picked from commit 446169e0b6f04b96960540784539c218f5a14c86)
(cherry picked from commit 3016c5c2bf68d8c6ebf303939f20092478e7a4ca)
(cherry picked from commit fac832b27105d926d9f8728d7147adb547b937d8)
(cherry picked from commit 76461921dac18b300489e326ba3df61d2809f364)

caroot: update CA bundle processor

Our current processor was identified as trusting certs not explicitly
marked for SERVER_AUTH, as well as certs that were tagged with
DISTRUST_AFTER.

Update the script to handle both scenarios. This patch was originally
authored by mandree@ for ports, and it was subsequently ported to base
caroot.

(cherry picked from commit c3510c941c0dddd09389915a9395e6f059088bab)

cam(4): Fix quick unplug/replug for SCSI.

If a device is plugged back in after an unplug, before the probe periph
is destroyed, it will just restart the probe process. But I've found
that the PROBE_INQUIRY_CKSUM flag not being cleared between iterations
may cause AC_FOUND_DEVICE not to be reported on the second iteration,
and because AC_LOST_DEVICE was reported during the first iteration, the
device ends up configured, but without any periphs attached.

We've found that an enabled serial console and a 102-disk JBOD cause
enough probe delays to easily trigger the issue for half of the disks.
This change fixes it reliably in my tests.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.

(cherry picked from commit 84d5b6bd68ce6496592adb8fdcd8cf0c246ed935)

libsecureboot: define SOPEN_MAX

With commit 97cbd5e722389a575e820c4e03f38053308f08ea, SOPEN_MAX
was removed from stand.h.

We need a better mechanism there.

(cherry picked from commit ee6dc333e1a1af08afa3d14b83e963e4cf90b77b)

PR: 258211

Correct "Fondation" typo (missing "u")

(cherry picked from commit 54399caa2f8470d9f7c404ce419362bc62d5a094)

vmm: Fix wrong assert in ivhd_dev_add_entry

The correct condition is to check that the number of ivhd entries fits
into the array.

Reported by: bz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31514

(cherry picked from commit 179bc5729dd72e0f4252c0dce72454c76782f935)

bhyve: Nuke double-semicolons

A distinct number of double-semicolons ended up in bhyve. Take a pass at
getting rid of many of these harmless typos.

(cherry picked from commit e76c0e4f4563029375dac90f1e1b3c6e82e157f9)

tmpfs: Move partial page invalidation to a separate helper

The partial page invalidation code is factored out to be a separate
helper from tmpfs_reg_resize().

Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31683

(cherry picked from commit 399be91098adb23aa27ca1228b81a3ad67e8bba2)

OpenSSL: Reduce diff with the upstream

(cherry picked from commit 649ccdd753790069623e192185d133fd26a03bf9)

OpenSSL: Regen manual pages for 1.1.1l

(cherry picked from commit d594d17b8569fb7bc22263e7da3fd626b99d9203)

Import OpenSSL 1.1.1l

(cherry picked from commit 9a3ae0cdef9ac9a4b8c5cc66305d9a516ce8d4a0)

ipfw_nat64: fix direct output mode

In nat64_find_route[46] handle NHF_GATEWAY flag and use destination
address from next hop to do link layer address lookup.

PR: 255928
Reviewed by: melifaro
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D31680

(cherry picked from commit da3a09d8941dc29f20447e263b3a6d60370c6203)

vfs_hash_rehash(): require the vnode to be exclusively locked

(cherry picked from commit f19063ab029b067e1763780aebca4bd620453110)

vfs_hash_insert: ensure that predicate is true

(cherry picked from commit 7c1e4aab7934933f0669c2b922976b30ed628a3f)

msdosfs: drop now unused DE_RENAME

(cherry picked from commit 85fb840ebf3c213e45939188303bd5fe0aca4422)