CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

Fix fallout from r366811.

PR: 250442
Reported by: lwhsu
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26855

Re-enable receive flow control for TOE TLS sockets.

Flow control was disabled during initial TOE TLS development to
workaround a hang (and to match the Linux TOE TLS support for T6).
The rest of the TOE TLS code maintained credits as if flow control was
enabled which was inherited from before the workaround was added with
the exception that the receive window was allowed to go negative.
This negative receive window handling (rcv_over) was because I hadn't
realized the full implications of disabling flow control.

To clean this up, re-enable flow control on TOE TLS sockets. The
existing TPF_FORCE_CREDITS workaround is sufficient for the original
hang. Now that flow control is enabled, remove the rcv_over
workaround and instead assert that the receive window never goes
negative matching plain TCP TOE sockets.

Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26799

cxgbe(4): Fix page fault in t4_get_lb_stats with 2 port T5 cards.

PR: 250449
Reported by: freqlabs@
MFC after: 1 week
Sponsored by: Chelsio Communications

Fix a couple of bugs for asym crypto introduced in r359374.

- Check for null pointers in the crypto_drivers[] array when checking
for empty slots in crypto_select_kdriver().

- Handle the case where crypto_kdone() is invoked on a request where
krq_cap is NULL due to not finding a matching driver.

Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26811

Enable SUBDIR_PARALLEL for lib/googletest

This saves a few seconds in a parallel build since we can build the
gtest_main and gmock subdirectories in parallel.

Reviewed By: ngie
Differential Revision: https://reviews.freebsd.org/D26760

Major improvement to build parallelism for googletest internal tests

Currently the googletest internal tests build after the matching library.
However, each of these is serialized at the top level makefile.
Additionally some of the tests (e.g. the gmock-matches-test) take up to
90 seconds to build with clang -O2. Having to wait for this test to
complete before continuing to the next directory seriously slows down the
parllelism of a -j32 build.
Before this change running `make -C lib/googletest -j32 -s` in buildenv
took 202 seconds, now it's 153 due to improved parallelism.

Reviewed By: emaste (no objection)
Differential Revision: https://reviews.freebsd.org/D26748

nullfs: ensure correct lock is taken after bypass.

If lower VOP relocked the lower vnode, it is possible that nullfs
vnode was reclaimed meantime. In this case nullfs vnode no longer
shares lock with lower vnode, which breaks locking protocol.

Check for the condition and acquire nullfs vnode lock if detected.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

vgonel(): avoid recursing into VOP_INACTIVE().

It is a common pattern for filesystems' VOP_INACTIVE() implementation
to forcibly reclaim the vnode when its state is final. For instance,
UFS vnode with zero link count is removed, and since it is
inactivated, the last open reference on it is dropped.

On the other hand, vnode might get spurious usecount reference for
many reasons. If the spurious reference exists while vgonel() checks
for active state of the vnode, it would recurse into VOP_INACTIVE().

Fix it by checking and not doing inactivation when vgone() was called
from inactive VOP.

Reported and tested by: pho
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

uma: fix KTR message after r366840

Reported by: bz
Sponsored by: The FreeBSD Foundation

cache: promote negative entries based on more than one hit

During tinderbox and similar workloads negative entries get at least one
hit before they get evicted. In the current scheme this avoidably promotes
them.

Be conservative and stick to 2 hits for now.

Check TF_TOE not the tod pointer to determine if TOE is active.

The TF_TOE flag is the check used in the rest of the network stack to
determine if TOE is active on a socket. There is at least one path in
the cxgbe(4) TOE driver that can leave the tod pointer non-NULL on a
socket not using TOE.

Reported by: Sony Arpita Das <sonyarpitad@chelsio.com>
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26803

Mark asymmetric cryptography via OCF deprecated for 14.0.

Only one MIPS-specific driver implements support for one of the
asymmetric operations.  There are no in-kernel users besides
/dev/crypto.  The only known user of the /dev/crypto interface was the
engine in OpenSSL releases before 1.1.0.  1.1.0 includes a rewritten
engine that does not use the asymmetric operations due to lack of
documentation.

Reviewed by: cem, markj
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26810

Properly clear PCB_KERNNPX in fpu_kern_leave().

PR: 250423
Reported by: CI
Tested by: lwhsu

icmp6: Count packets dropped due to an invalid hop limit

Pad the icmp6stat structure so that we can add more counters in the
future without breaking compatibility again, last done in r358620.
Annotate the rarely executed error paths with __predict_false while
here.

Reviewed by: bz, melifaro
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26578

link_elf_obj: Colour VM objects

This will cause the VM to back sufficiently large .text sections, such
as those in zfs.ko or amdgpu.ko on amd64, with superpage mappings when
possible.

Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26802

uma: Respect uk_reserve in keg_drain()

When a reserve of free items is configured for a zone, the reserve must
not be reclaimed under memory pressure. Modify keg_drain() to simply
respect the reserved pool.

While here remove an always-false uk_freef == NULL check (kegs that
shouldn't be drained should set _NOFREE instead), and make sure that the
keg_drain() KTR statement does not reference an uninitialized variable.

Reviewed by: alc, rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26772

uma: Avoid depleting keg reserves when filling a bucket

zone_import() fetches a free or partially free slab from the keg and
then uses its items to populate an array, typically filling a bucket.
If a single allocation causes the keg to drop below its minimum reserve,
the inner loop ends. However, if the bucket is still not full and
M_USE_RESERVE is specified, the outer loop will continue to fetch items
from the keg.

If M_USE_RESERVE is specified and the number of free items is below the
reserved limit, we should return only a single item. Otherwise, if the
bucket size is larger than the reserve, all of the reserved items may
end up in a single per-CPU bucket, invisible to other CPUs.

Reviewed by: rlibby
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26771

vmem: Allocate btags before looping in vmem_xalloc()

BT_MAXALLOC (4) is the number of boundary tags required to complete an
allocation in the worst case: two to clip a free segment, and two to
import from a parent arena.  vmem_xalloc() preallocates four boundary
tags before attempting a search to simplify the segment allocation code.
It implements a loop that:
1) ensures that BT_MAXALLOC boundary tags are available,
2) attempts to find and clip a free segment satisfying the allocation
   constraints, and failing that,
3) attempts to import a segment.

On !UMA_MD_SMALL_ALLOC platforms the btag zone has to handle recusion:
it needs boundary tags to allocate boundary tags.  Thus we reserve
2 * BT_MAXALLOC * mp_ncpus tags for use when recursing: the factor of 2
is because there are two layers of vmem arenas, the per-domain arena and
global arena.  For a single thread, 2 * BT_MAXALLOC tags should be
sufficient.

Because of the way the loop is structured, BT_MAXALLOC tags are not
sufficient.  The first bt_fill() call may allocate BT_MAXALLOC tags,
then import a segment (consuming two tags), then attempt to top up the
preallocation before carving into the imported free segment, thus
requiring up to six tags in the worst case.  Because we don't
preallocate that many, this bug can cause deadlocks in rare scenarios.

Fix the problem by moving the preallocation out the loop.  This assumes
that only a single import is ever required to satisfy an allocation
request.

Thanks to manu, emaste and lwhsu for helping test debug patches.

Reported by: Jenkins (hardware CI lab)
Reviewed by: alc, kib, rlibby
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26770

vmem: Simplify bt_fill() callers a bit

No functional change intended.

Reviewed by: alc, kib, rlibby
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26769

Remove unused labels from the arm64 casueword*

These are unused so can be removed. While here renumber the remaining label
to be 1.

Sponsored by: Innovate UK

Assign the reserved apic region (GAS entry) to the iommu domain msi_entry.

Requested by: kib
Reviewed by: kib
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D26859

vmx: Implement pmap (de)activation in C

Rewrite the code that maintains pm_active and invalidates EPTP-tagged
TLB entries in C.  Previously this work was done in vmx_enter_guest(),
in assembly, but there is no good reason for that and it makes the TLB
invalidation algorithm for nested page tables harder to review.

No functional change intended.  Now, an error from the invept
instruction results in a kernel panic rather than a vmexit.  Such errors
should occur only as a result of VMM bugs.

Reviewed by: grehan, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26830

Manage MSI iommu pages.

This allows the interrupt controller driver only need a small change to
create a map for the page the device will write to raise an interrupt.

Submitted by: andrew
Reviewed by: kib
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D26705

Split the common arm64 fu* and su* asm to a macro

As these are mostly identical split out the common code to a macro.

Sponsored by: Innovate UK

Move the arm64 userspace access checks to macros

In the functions that copy between userspace and kernel space we check the
user space address is valid before performing the copy. These are mostly
identical within each type of function so create two macros to perform the
check.

Obtained from: CheriBSD
Sponsored by: Innovate UK

efibootmgr: Use returned error code for error message, not errno

efivar_unix_path_to_device_path returns the error code, it does not set errno.

Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D26852

cache: refactor negative promotion/demotion handling

This will simplify policy changes.

Use asprintf instead of sbuf

Add more explicit instructions about updating motd

Not that you can regenerate the motd by editing motd.template and
running 'service motd restart' rather than rebooting.

Small wordsmithing by me, and updated the example from FreeBSD 2.1.6.1
release to 12.1 release.

Submitted by: Dan Mack

libbe(3): install MLINKS for all of the functions provided

MFC after: 1 week

libbe(3): document be_snapshot()

While toying around with lua bindings for libbe(3), I discovered that I
apparently never documented this, despite having documented
be_is_auto_snapshot_name that references it.

MFC after: 1 week

libbe(3): const'ify a couple arguments

libbe will never need to mutate these as we either process them into a local
buffer or we just don't touch them and write to a separate out argument.

MFC after: 1 week

[zfs] Remove a non-existent directory in the build infra

This directory doesn't exist and causes gcc-6.4 to complain about
a non-existent include directory

Approved by: kevans, imp
Differential Revision: https://reviews.freebsd.org/D26846

net80211: factor out the priv(9) checks into OS specifc code.

Factor out the priv(9) checks into OS specifc code so other OSes can equally
implement them. This sorts out those XXX in the net80211 code.
We provide 3 arguments (cmd, vap, ifp) where available to the functions, in
order to allow other OSes to use that data but also in case we'd add auditing
to these check to have the information available. For now the arguments are
marked __unused.

PR: 249403
Reported by: martin(NetBSD)
Reviewed by: adrian, martin(NetBSD)
MFC after: 10 days
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
Differential Revision: https://reviews.freebsd.org/D26541

Significantly speed up mkimg_test

It turns out that the majority of the test time for the mkimg tests isn't
mkimg itself but rather the use of jot and hexdump which can be quite slow
on emulated platforms such as QEMU.

On QEMU-RISC-V this reduces the time for `kyua test mkimg_test` from 655
seconds to 200. And for CheriBSD on QEMU-CHERI this saves 4-5 hours (25%
of the time for the entire testsuite!) since jot ends up triggering slow
functions inside the QEMU emulation a lot.

Reviewed By: lwhsu
Differential Revision: https://reviews.freebsd.org/D26796

[libcxx] Fix atomic type for mips32 on gcc to work w/out needing libatomic

When compiling this for mips32 on gcc-6.x, we'd hit issues where we
don't have 64 bit atomics on mips32.

gcc implements this using libatomic, which we don't currently include
in our freebsd-gcc compiler packages.

So for now add this work around so mips32 works. It's also fine for
mips64. We can fix this later once we get libatomic included.

Approved by: dim
Differential Revision: https://reviews.freebsd.org/D26774

Implement flowid calculation for outbound connections to balance
connections over multiple paths.

Multipath routing relies on mbuf flowid data for both transit
and outbound traffic. Current code fills mbuf flowid from inp_flowid
for connection-oriented sockets. However, inp_flowid is currently
not calculated for outbound connections.

This change creates simple hashing functions and starts calculating hashes
for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the
system.

Reviewed by: glebius (previous version),ae
Differential Revision: https://reviews.freebsd.org/D26523

If the SIM freezes the queue at exactly the wrong moment, after
another thread has started to send in a CCB and already checked
the queue wasn't frozen, we would end up with iscsi_action()
being called despite the queue is now frozen.

Add a check to make sure this doesn't happen . Perhaps this should
be fixed at the CAM level instead, but given how the send queue and
SIM are governed by two separate mutexes, it is somewhat hard to do.

Reviewed by: imp, mav
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26750

Make g_attach() return ENXIO for orphaned providers; update various
classes to add missing error checking.

Reviewed by: imp
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26658

Stop calling set_syscall_retval() from linux_set_syscall_retval().
The former clobbers some registers that shouldn't be touched.

Reviewed by: kib (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26406

Add compat.linux.dummy_rlimits, and disable by default.

Turns out the dummy rlimits fix prlimit(1), but break su(8)
(login-1:4.5-1ubuntu2) - although not sudo(8), for some reason.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26814

Slightly tweak linux ptrace(2) debug message; no functional changes.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26815

Simplify NET_EPOCH_EXIT in inp_join_group().

Suggested by: kib

Add new USB quirk.

PR: 250422
Submitted by: vidwer+fbsdbugs@gmail.com
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking

Correct location and date of the Boston Shoemakers' organization,
which significantly predated the USA.

Reference: http://www.famousdaily.com/history/boston-shoemakers-form-first-us-labor-organization.html

net80211: update for (more) VHT160 support

Implement two macros IEEE80211_VHTCAP_SUPP_CHAN_WIDTH_IS_160MHZ()
and its 80+80 counter part to check in vhtcaps for appropriate
levels of support and use the macros throughout the code.

Add vht160_chan_ranges/is_vht160_valid_freq and handle analogue
to vht80 in various parts of the code.

Add ieee80211_add_channel_cbw() which also takes the CBW flag
fields and make the former ieee80211_add_channel() a wrapper to it.
With the CBW flags we can add HT/VHT channels passing them to
getflags() for the 2/5ghz functions.

In ifconfig(8) add the regdomain_addchans() support for VHT160
and VHT80P80.

With this (+ regdoain.xml updates) VHT160 channels can be
configured, listed, and pass regdomain where appropriate.

Tested with: iwlwifi
Reviewed by: adrian
MFC after: 10 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26712

clk: fix indentation

Just fix indentation of an if() clause.
No functional changes intended.

MFC after: 3 days

ddb: add show sysinit command

Add a show sysinit command to ddb (similar to show vnet_sysinit) which
proved to be helpful to debug some ordering issues on early-mid kernel
start panics.

cache: shorten names of debug stats

cache: don't automatically evict negative entries if usage is low

The previous scheme only looked at negative entry count in relation to the
total count, leading to tons of spurious evictions if the cache is not
significantly populated.

Instead, only try the above if negative entry count goes beyond namecache
capacity.

Fix sleepq_add panic happening with too wide net epoch in mcast control.

PR: 250413
Reported by: Christopher Hall <hsw at bitmark.com>
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D26827

riscv: zero reserved PTE bits for L2 PTEs

As was done for L3 PTEs in r362853, mask out the reserved bits when
extracting the physical address from an L2 PTE. Future versions of the
spec or custom implementations may make use of these reserved bits, in
which case the resulting physical address could be incorrect.

Submitted by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: kp, mhorne
Differential Revision: https://reviews.freebsd.org/D26607

cache: erwork sysctl vfs.cache tree

Split everything into neg, debug, param and stat categories.

The legacy nchstats sysctl (queried e.g., by systat) remains untouched.

While here rename some vars to be easier on the eye.

cache: factor negative lookup out of cache_fplookup_next

cache: avoid smr in cache_neg_evict in favoro of the already held bucket lock

cache: rework parts of negative entry management

- declutter sysctl vfs.cache by moving relevant entries into
vfs.cache.neg
- add a little more parallelism to eviction by replacing the
global lock with an atomically modified counter
- track more statistics

The code needs further effort.

cache: remove entries before trying to add new ones, not after

Should allow positive entries to replace negative ones in case
the cache is full.

vfs: annotate mountlist_mtx with __exclusive_cache_line

Bump __FreeBSD_version after ptsname_r addition.

Implement ptsname_r.

MFC after: 2 weeks
PR: 250062
Reviewed by: jilles, 0mp, Ray <i maskray me>
Differential Revision: https://reviews.freebsd.org/D26647

Update OpenZFS to 2.0.0-rc3-gfc5966

- fix panic due to tqid overflow
- Improve libzfs_error_init messages
- Expose zfetch_max_idistance tunable
- Make dbufstat work on FreeBSD
- Fix EIO after resuming receive of new dataset over an existing one

Import tzdata 2020c

Changes: https://github.com/eggert/tz/blob/2020c/NEWS

MFC after: 1 day

Import tzdata 2020c

cache: add a probe reporting addition of duplicate entries

Update OpenZFS to 2.0.0-rc3-gbd565f

Try to enable multipath routing in flowid tests.

bhyve: Update TX descriptor base address and host mapping on change

bhyve sometimes segfaults when using an e1000 NIC with a Windows guest.

We are only updating our tdba and cached host mapping when the low address
register is written and when tx is set enabled, but not when the high address
or length registers are written. It is observed that Windows 10 is occasionally
enabling tx first then writing the registers in the order low, high, len. This
leaves us with a bogus base address and mapping, which causes a segfault later
when we try to copy from a descriptor that has unpredictable garbage in a
pointer.

Updating the address and mapping when any of those registers change seems to fix
that particular issue.

Reviewed by: mav, grehan (bhyve)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26798

libc: typo fix (s/involes/involves)

Reported by: Masahiko Sawada via twitter
MFC after: 3 days

MFC r366760: lua: update to 5.3.6

This release contains some minor bugfixes; notably:
- 2x minor Makefile fixes (not used in base)
- Long brackets with a huge number of '=' overflow some internal buffer
arithmetic.
- Joining an upvalue with itself can cause a use-after-free crash.

See here for examples: http://www.lua.org/bugs.html#5.3.5

MFC after: 2 weeks

amd64 pmap.h: explicitly provide constants values instead of relying
on some more advanced C features.

This fixes gcc-toolchain build of exception.S.

Reported and tested by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Update arcmsr(4) to 1.50.00.00:

Add support for ARC-1886, NVMe/SAS/SATA controller.

Many thanks to Areca for continuing to support FreeBSD.

Submitted by: 黃清隆 <ching2048 areca com tw>
MFC after: 2 weeks

Makefile: add a small blurb about building with gcc xtoolchain

The key details are to install the appropriate flavor of devel/freebsd-gcc9
and pass CROSS_TOOLCHAIN while building.

Suggested by: kib

This fixes some fun type size truncation that shows up giving errors like
" changes value from '287948901175001088' to '0' "

.. which turns out is a known issue with later gcc's.

eg - https://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00154.html

It was fixed up upstream corelib/gnulib in commit hash
252b52457da7887667c036d18cc5169777615bb0
(eg in https://github.com/coreutils/gnulib/commit/252b52457da7887667c036d18cc5169777615bb0)

TEST PLAN
- compiled/run for gcc-6.4 on amd64

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D26804

arm64: export a few more HWCAPs

These were missed in the previous pass. The extensions (partially)
supported by this change are:
- ARMv8.2-FHM, Floating-point multiplication variant
- ARMv8.4-LSE, Large System Extensions
- ARMv8.4-DIT, Data Independent Timing instructions

Reviewed by: andrew, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26707

Update the ID_AA64MMFR2_EL1 register definitions

This brings these definitions in sync with the ARMv8.6 version of the
architecture reference manual.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26706

lua: update to 5.3.6

This release contains some minor bugfixes; notably:
- 2x minor Makefile fixes (not used in base)
- Long brackets with a huge number of '=' overflow some internal buffer
arithmetic.
- Joining an upvalue with itself can cause a use-after-free crash.

See here for examples: http://www.lua.org/bugs.html#5.3.5

Trigger soft lifetime expiration on sequence number

This patch adds 80% of UINT32_MAX limit on sequence number.
When sequence number reaches limit kernel sends SADB_EXPIRE message to
IKE daemon which is responsible to perform rekeying.

Submitted by:           Patryk Duda <pdk@semihalf.com>
Reviewed by:            ae
Differential revision:  https://reviews.freebsd.org/D22370
Obtained from:          Semihalf
Sponsored by:           Stormshield

Add support for IPsec ESN and pass relevant information to crypto layer

Implement support for including IPsec ESN (Extended Sequence Number) to
both encrypt and authenticate mode (eg. AES-CBC and SHA256) and combined
mode (eg. AES-GCM). Both ESP and AH protocols are updated. Additionally
pass relevant information about ESN to crypto layer.

For the ETA mode the ESN is stored in separate crp_esn buffer because
the high-order 32 bits of the sequence number are appended after the
Next Header (RFC 4303).

For the AEAD modes the high-order 32 bits of the sequence number
[e.g.  RFC 4106, Chapter 5 AAD Construction] are included as part of
crp_aad (SPI + ESN (32 high order bits) + Seq nr (32 low order bits)).

Submitted by:           Grzegorz Jaszczyk <jaz@semihalf.com>
                        Patryk Duda <pdk@semihalf.com>
Reviewed by:            jhb, gnn
Differential revision:  https://reviews.freebsd.org/D22369
Obtained from:          Semihalf
Sponsored by:           Stormshield

Implement anti-replay algorithm with ESN support

As RFC 4304 describes there is anti-replay algorithm responsibility
to provide appropriate value of Extended Sequence Number.

This patch introduces anti-replay algorithm with ESN support based on
RFC 4304, however to avoid performance regressions window implementation
was based on RFC 6479, which was already implemented in FreeBSD.

To keep things clean and improve code readability, implementation of window
is kept in seperate functions.

Submitted by:           Grzegorz Jaszczyk <jaz@semihalf.com>
                        Patryk Duda <pdk@semihalf.com>
Reviewed by:            jhb
Differential revision:  https://reviews.freebsd.org/D22367
Obtained from:          Semihalf
Sponsored by:           Stormshield

Set default stack size for Linux apps to 8MB. This matches Linux'
defaults, makes core files smaller, and fixes applications which use
pthread_join(3) in a wrong way, namely Steam.

This is based on a patch submitted by Jason Yang, which I've reworked
to set the limit instead of only changing the value reported (which
is enough to fix the bug for Linux pthreads, but could be confusing).

PR: 248225
Submitted by: Jason_YH_Yang at wistron.com (earlier version)
Analyzed by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26778

Add SADB_SAFLAGS_ESN flag

This flag is going to be used by IKE daemon to signal if
Extended Sequence Number feature is going to be used.

Value for this flag was taken from OpenBSD source code
https://github.com/openbsd/src/commit/6b4cbaf181c6b60701d9fb888fd0e7a4333eecbd

Submitted by:           Patryk Duda <pdk@semihalf.com>
Reviewed by:            ae
Differential revision:  https://reviews.freebsd.org/D22366
Obtained from:          Semihalf
Sponsored by:           Stormshield

Add support for ESN in AES-NI crypto driver

This patch adds support for IPsec ESN (Extended Sequence Numbers) in
encrypt and authenticate mode (eg. AES-CBC and SHA256) and combined mode
(eg. AES-GCM).

For the encrypt and authenticate mode the ESN is stored in separate
crp_esn buffer because the high-order 32 bits of the sequence number are
appended after the Next Header (RFC 4303).

For the combined modes the high-order 32 bits of the sequence number
[e.g.  RFC 4106, Chapter 5 AAD Construction] are part of crp_aad
(prepared by netipsec layer in case of ESN support enabled), therefore
non visible diff around combined modes.

Submitted by:           Grzegorz Jaszczyk <jaz@semihalf.com>
                        Patryk Duda <pdk@semihalf.com>
Reviewed by:            jhb
Differential revision:  https://reviews.freebsd.org/D22365
Obtained from:          Semihalf
Sponsored by:           Stormshield

Add support for ESN in cryptosoft

This patch adds support for IPsec ESN (Extended Sequence Numbers) in
encrypt and authenticate mode (eg. AES-CBC and SHA256) and combined mode
(eg. AES-GCM).

For encrypt and authenticate mode the ESN is stored in separate crp_esn
buffer because the high-order 32 bits of the sequence number are
appended after the Next Header (RFC 4303).

For combined modes the high-order 32 bits of the sequence number [e.g.
RFC 4106, Chapter 5 AAD Construction] are part of crp_aad (prepared by
netipsec layer in case of ESN support enabled), therefore non visible
diff around combined modes.

Submitted by:           Grzegorz Jaszczyk <jaz@semihalf.com>
                        Patryk Duda <pdk@semihalf.com>
Reviewed by:            jhb
Differential revision:  https://reviews.freebsd.org/D22364
Obtained from:          Semihalf
Sponsored by:           Stormshield

Prepare crypto framework for IPsec ESN support

This permits requests (netipsec ESP and AH protocol) to provide the
IPsec ESN (Extended Sequence Numbers) in a separate buffer.

As with separate output buffer and separate AAD buffer not all drivers
support this feature. Consumer must request use of this feature via new
session flag.

Submitted by:           Grzegorz Jaszczyk <jaz@semihalf.com>
                        Patryk Duda <pdk@semihalf.com>
Reviewed by:            jhb
Differential revision:  https://reviews.freebsd.org/D24838
Obtained from:          Semihalf
Sponsored by:           Stormshield

Remove ifdefs around IS_ALIGNED() definition in the LinuxKPI.

Discussed with: manu@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking

Improve the handling of cookie life times.
The staleness reported in an error cause is in us, not ms.
Enforce limits on the life time via sysct; and socket options
consistently. Update the description of the sysctl variable to
use the right unit. Also do some minor cleanups.
This also fixes an interger overflow issue if the peer can
modify the cookie. This was reported by Felix Weinrank by fuzz testing
the userland stack and in
https://oss-fuzz.com/testcase-detail/4800394024452096

MFC after: 3 days

Make linux getrlimit(2) and prlimit(2) return something reasonable
for linux-specific limits. Fixes prlimit (util-linux-2.31.1-0.4ubuntu3.7).

Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26777

Bump pseudofs size limit from 128kB to 1MB. The old limit could result
in process' memory maps being truncated.

PR: 237883
Submitted by: dchagin
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20575

cache: flip inverted condition in previous

It happened to not affect correctness in that the fallback code would
simply neglect to promote the entry.

cache: support negative entry promotion in slowpath smr

cache: elide vhold/vdrop around promoting negative entry

cache: dedup code for negative promotion

cache: neglist -> nl; negstate -> ns

No functional changes.

Bump the ISO EFI partition size from 1024 to 2048, following r366732.

Suggested by: imp
Sponsored by: Rubicon Communications, LLC (netgate.com)

Simplify preload_dump() condition

Hiding this feature behind RB_VERBOSE is gratuitous. The tunable is enough
to limit its use to only those who explicitly request it.

Suggested by: kevans

kldxref: Avoid buffer overflows in parse_pnp_list

We convert a string like "W32:vendor/device" into "I:vendor;I:device",
where the output is longer than the input, but only allocate space equal
to the length of the input, leading to a buffer overflow.

Instead use open_memstream so we get a safe dynamically-grown buffer.

Found by: CHERI
Reviewed by: imp, jhb (mentor)
Approved by: imp, jhb (mentor)
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D26637

cache: split hotlist between existing negative lists

This simplifies the code while allowing for concurrent negative eviction
down the road.

Cache misses increased slightly due to higher rate of evictions allowed by
the change.

The current algorithm remains too aggressive.

cache: make neglist an array given the static size

Drop unsolicited responses to the still attaching CODECs.

It is reported to fix kernel panics when early unsolicited responses
delivered to the CODEC device not having driver attached yet.

PR: 250248
Reported by: Rajeev Pillai <rajeev_v_pillai@yahoo.com>
Reviewed by: avg
MFC after: 2 weeks

Increase the amd64 ISO ESP file size from 800KB to 1024KB.

At some poing over the last week, the bootx64.efi file has grown
past the 800KB threshold, resulting in being unable to copy it to
the EFI/BOOT directory.

# stat -f %z efiboot.znWo7m
819200
# stat -f %z stand-test.PIEugN/EFI/BOOT/bootx64.efi
842752

The comment in the script that creates the ISOs suggests that 800KB
is the maximum allowed for the boot code, however I was able to
boot an ISO with a 1024KB boot partition. Additionally, I verified
against an ISO from OtherOS, where the boot EFI partition is 2.4MB.

Sponsored by: Rubicon Communications, LLC (netgate.com)