CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

Add quirks for Linux ABI signals handling

(cherry picked from commit 870e197d52c1cb8c3ed6d04ddae34dcae57cb657)

Add a knob to disable dequeueing SIGCHLD on waiting for live process

(cherry picked from commit a12e901a5a65417849c1ccf1e37b8d092fa438da)

Add a knob to not drop signal with default ignored or ignored actions

(cherry picked from commit bc38762474caed2d41d2562e28f56aa211f47ceb)

sigwait: add comment explaining EINTR/ERESTART details

(cherry picked from commit acced8b043c5df0ebd51934bca6dcae3322cf890)

sigwait(2) and sigtimedwait(2) must not be restarted.

(cherry picked from commit afb36e289c1d96053b6063b0e548fc7d31dbd239)

i386: Add "options HYPERV" to NOTES

This unbreaks the LINT build.

Fixes: 97993d1ebf
Reported by: mjg

(cherry picked from commit 334335cb14d4754d45265d19cd3c6e7708cca0c1)

hyperv: Fix vmbus after the i386 4/4 split

The vmbus ISR needs to live in a trampoline.  Dynamically allocating a
trampoline at driver initialization time poses some difficulties due to
the fact that the KENTER macro assumes that the offset relative to
tramp_idleptd is fixed at static link time.  Another problem is that
native_lapic_ipi_alloc() uses setidt(), which assumes a fixed trampoline
offset.

Rather than fight this, move the Hyper-V ISR to i386/exception.s.  Add a
new HYPERV kernel option to make this optional, and configure it by
default on i386.  This is sufficient to make use of vmbus(4) after the
4/4 split.  Note that vmbus cannot be loaded dynamically and both the
HYPERV option and device must be configured together.  I think this is
not too onerous a requirement, since vmbus(4) was previously
non-functional.

Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: whu, kib
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 97993d1ebf592ac6689a498d5d0d2afb46758680)

Export various 128 bit long double functions from libgcc_s.so.1

These were already compiled for some time on aarch64 and riscv, by
including lib/libcompiler_rt/Makefile.inc, but never exported in the
shared library. Since gcc exports these under version GCC_4.6.0, we do
the same.

This review should replace D11482 for now. For e.g. amd64 more work is
still to be done, as compiler-rt does not seem to support 128 bit long
double math for that architecture.

Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D28690

(cherry picked from commit 790a6be5a1699291c6da87871426d0c56dedcc89)

Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros

This makes it easier to change the socket locking protocols. No
functional change intended.

Sponsored by: The FreeBSD Foundation

(cherry picked from commit a100217489405e5926230c50d97aa3f886df5385)

Consistently use the SOLISTENING() macro

Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly. As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING(). No functional change intended.

Sponsored by: The FreeBSD Foundation

(cherry picked from commit f4bb1869ddd2bca89b6b6bfaf4d866efdd9243cf)

amd64: Fix propagation of LDT updates

When a process has used sysarch(2) to specify descriptors for its
private LDT, upon rfork(RFMEM) descriptors are copied into the new child
process.  Any updates to the descriptors are thus reflected to all other
processes sharing the vmspace.  However, this is incorrect in the rather
obscure case where the child process was created before the LDT was
modified.  Fix this by only modifying other processes which already
share the LDT.

Reported by: syzkaller
Reviewed by: kib
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 70dd5eebc025badb7b835dfee3915d8b5f1e7468)

cpucontrol: fix extended signature matching code to avoid fallthough

PR: 256502

(cherry picked from commit 87799c5f85dc0aed7e53ca841504e3b2ffc88498)

stand: Fix __elfN(loadimage) return value

Caller functions expect __elfN(loadimage) to return a value of zero on
failure and the file size on success.

PR: 256390
Reviewed by: markj

(cherry picked from commit 1ea87e2a70c31454a8696ab2979d13d21c5575d2)

zfs: merge openzfs/zfs@3de7aeb68 (zfs-2.1-release) into stable/13

Notable upstream pull request merges:
#12054 Avoid deadlock when removing L2ARC devices under I/O
#12221 vdev_draid_min_asize() ignores reserved space

Obtained from: OpenZFS
OpenZFS commit: 3de7aeb68ac1cc1fedd99506671d9028ad1a3c20

tests: Revise FIB lookups per second benchmarking routines

Fix a bug in the LPM SEQ benchmark (missing break inside a switch block)
by restructuring the test loop, while introducing additional two
synthetic test options:

ANN: scan only the address space announced in current RIB
REP: repeat lookups over several keys in a sliding window scheme

The total of eight combinations of test options are now available
through dedicated sysctl hooks.

Differential Revision: <https://reviews.freebsd.org/D30311>
Reviewed by: melifaro
MFC after: 3 days

(cherry picked from commit b6f8436b094daf7b1c429ce74997a4daf6994fcb)

Revise FIB lookups per second benchmarking routines.

Add a LPS benchmark variant which introduces artificial dependencies
between successive lookups. While here, instead of writing the results
from the lookups to a huge array, add them to an accumulator, in a more
lightweight attempt at preventing the CPU's OOO machinery from
discarding the lookup results if they would be completely unused.

net.route.test.run_lps_rnd measures LPS throughput with independent
uniformly random keys

net.route.test.run_lps_seq measures LPS throughput with uniformly
random keys with artificial interdependencies
Reviewed by: melifaro
MFC after: 7 days
Differential Revision: https://reviews.freebsd.org/D30096

(cherry picked from commit a43104ebe7630111d7e7debc56aacf49787dcf43)

Add IPv4 fib lookup performance tests with uniform keys.

Submitted by: zec
MFC after: 1 week

(cherry picked from commit b8598e2ff65ab82da0cf6861ee12f078b40bc252)

bsdconfig: add a new mirror in Bulgaria

Provided by Telepoint Mirror Service.

Reported by: Valentin Nikolov <mirror@telepoint.bg>

(cherry picked from commit 1c9605fe1e190197b3846e01dce1e491bef0ec34)

bsdinstall: add a new mirror in Bulgaria

Provided by Telepoint Mirror Service.

Reported by: Valentin Nikolov <mirror@telepoint.bg>

(cherry picked from commit 7daa45becfd32cb38933bfdc87e8a10fc982d188)

ipfw: Update the pfil mbuf pointer in ipfw_check_frame()

ipfw_chk() might call m_pullup() and thus can change the mbuf chain
head. In this case, the new chain head has to be returned to the pfil
hook caller, otherwise the pfil hook caller is left with a dangling
pointer.

Note that this affects only the link-layer hooks installed when the
net.link.ether.ipfw sysctl is set to 1.

PR: 256439, 254015, 255069, 255104
Fixes: f355cb3e6
Reviewed by: ae
Sponsored by: The FreeBSD Foundation

(cherry picked from commit bc6a2267fffeafd3946637607a74cfd639398f9d)

ipfw.8: synopsis misses nat show form

Document the existing behavior, which is currently only available by
reading third party documentation or the source code itself.

PR: 254617
Submitted by: Oliver Kiddle
Differential Revision: https://reviews.freebsd.org/D30189

(cherry picked from commit c8250c5ada85fec8936e8eb8695d7cb80a3ce8ab)

Avoid deadlock when removing L2ARC devices under I/O

In case we have I/O and try to remove an L2ARC device a deadlock might
occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV
to be dropped while holding the hash_lock. However, spa_l2cache_load()
holds SCL_ALL and waits for the hash_lock in l2arc_evict().

Fix this by moving zfs_blkptr_verify() to the top top arc_read() before
the hash_lock is taken. Verify the block pointer and return a checksum
error if damaged rather than halting the system, by using
BLK_VERIFY_LOG instead of BLK_VERIFY_HALT.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12054

fusefs: reenable the WriteCluster.cluster_write_err test

The underlying panic was just fixed by
revision 27006229f7a40a18a61a0e8fd270bc583326b690

PR: 238565
MFC with: 27006229f7a40a18a61a0e8fd270bc583326b690

(cherry picked from commit 425bbe9e64f7af6bdb30a099bd90a32885de1ab8)

Delete obsolete Solaris compat files

These files have been unused ever since the OpenSolaris import

Sponsored by: Axcient
Reviewed By: freqlabs
Differential Revision: https://reviews.freebsd.org/D30371

(cherry picked from commit fc3ba3e9fac03897d17c318b79b52d91cfb87b9e)

gmultipath: make physpath distinct from the underlying providers'

zfsd uses a device's physical path attribute to automatically replace a
missing ZFS disk when a blank disk is inserted into the same physical
slot.  Currently gmultipath passes through its underlying providers'
physical path attribute.  That may cause zfsd to replace a missing
gmultipath provider with a newly arrived, single-path disk.  That would
be bad.

This commit fixes that problem by simply appending "/mp" to the
underlying providers' physical path, in a manner similar to what geli
already does.

Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D29941

(cherry picked from commit 420dbe763f15b076751443edfeeb4f676deb3c44)

pf: don't hold a lock during copyout()

copyout() can trigger page faults, so it may potentially sleep.

Reported by: avg
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")

(cherry picked from commit 8b5f4e692b1d1585ecfc6690552650114e3e704e)

pf: use M_WAITOK where possible

In the ioctl path use M_WAITOK allocations whereever possible. These are
less sensitive to memory pressure, and ioctl requests have no hard
deadlines.

Reviewed by: donner
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30702

(cherry picked from commit ea21980a3facfed4c2c6fd10d0f16276564fb540)

dummynet: free(NULL, M_DUMMYNET); is safe

There's no need to check pointers for NULL before free()ing them.

No functional change.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30382

(cherry picked from commit 1b2dbe37fa32d7255faf7d1feec7bb31414a8102)

dummynet: Fix schedlist and aqmlist locking

These are global (i.e. shared across vnets) structures, so we need
global lock to protect them. However, we look up entries in these lists
(find_aqm_type(), find_sched_type()) and return them. We must ensure
that the returned structures cannot go away while we are using them.

Resolve this by using NET_EPOCH(). The structures can be safely accessed
under it, and we postpone their cleanup until we're sure they're no
longer used.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30381

(cherry picked from commit 51d73df18e4d120f6f062062c18efae3ed5193a6)

VNETify dummynet

This moves dn_cfg and other parameters into per VNET variables.

The taskqueue and control state remains global.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D29274

(cherry picked from commit fe3bcfbda30e763a3ec56083b3a19cebbeaf8952)

Introduce DXR as an IPv4 longest prefix matching / FIB module

DXR maintains compressed lookup structures with a trivial search
procedure.  A two-stage trie is indexed by the more significant bits of
the search key (IPv4 address), while the remaining bits are used for
finding the next hop in a sorted array.  The tradeoff between memory
footprint and search speed depends on the split between the trie and
the remaining binary search.  The default of 20 bits of the key being
used for trie indexing yields good performance (see below) with
footprints of around 2.5 Bytes per prefix with current BGP snapshots.

Rebuilding lookup structures takes some time, which is compensated for by
batching several RIB change requests into a single FIB update, i.e. FIB
synchronization with the RIB may be delayed for a fraction of a second.
RIB to FIB synchronization, next-hop table housekeeping, and lockless
lookup capability is provided by the FIB_ALGO infrastructure.

DXR works well on modern CPUs with several MBytes of caches, especially
in VMs, where is outperforms other currently available IPv4 FIB
algorithms by a large margin.

Reviewed by: melifaro
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29821

(cherry picked from commit 2aca58e16f507bfcad127a0865a9d5c75c5eedc3)

Do not forward datagrams originated by link-local addresses

The current implement of ip_input() reject packets destined for
169.254.0.0/16, but not those original from 169.254.0.0/16 link-local
addresses.

Fix to fully respect RFC 3927 section 2.7.

PR: 255388
Reviewed by: donner, rgrimes, karels
Differential Revision: https://reviews.freebsd.org/D29968
Reviewed by: rgrimes, donner, karels, marcus, emaste
Differential Revision: https://reviews.freebsd.org/D30374

(cherry picked from commit 3d846e48227e2e78c1e7b35145f57353ffda56ba)
(cherry picked from commit 03b0505b8fe848f33f2f38fe89dd5538908c847e)

netgraph/bridge: malloc without flags

During tests an assert was triggered and pointed to missing flags in
the newlink function of ng_bridge(4).

Reported by: markj
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D30759

(cherry picked from commit 4c3280e58727e900d4c217054fe655e3512380f1)

LinuxKPI: add fault_flag_allow_retry_first

Used by drm 5.7.

Reviewed by: bz, hselasky, nc
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30673

(cherry picked from commit 597cc550e7b98294617cdd41800e9f132b6bcad9)

vmm: Let guests enable SMEP/SMAP if the host supports it

Reviewed by: kib, grehan, jhb
Tested by: grehan (AMD)
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 4c599db71af56c28bb0388e73bd0570a3873c0ec)

tests/netgraph: Tests for ng_vlan_rotate

Test functionality of ng_vlan_rotate(4):
- Rotate 1 to 9 stagged vlans in any possible direction and length
- Rotate random combinations of ethertypes (8100, 88a8, 9100)
- Automatic reverse rotating for backward data flow
- Test too many and too few vlans

Reviewed by: kp (earlier version)
Differential Revision: https://reviews.freebsd.org/D30670

(cherry picked from commit 6b08e68be111d50931b0d30145f8b7e3402decaf)

tests/netgraph: Tests for ng_hub

Test functionality of ng_hub(4):
- replicting traffic to anything but the sending hook
- persistence
- an unrestricted loop
- implementation limits with many hooks.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D30633

(cherry picked from commit 7863faa78ae271017c404c635b2a9d07379d4316)

rc.d: liberate powerd from ACPI dependency

For instance, many non-ACPI ARM systems have CPU power / frequency
levels.

(cherry picked from commit 20eb6bd8c598fdbf4e96ed4ca64a609be255ccba)

rk3328_cru: fix a typo in the SCLK_I2S2 gate definition

(cherry picked from commit ffc5dc788f05dec5fd11aff8f216c37cd56fcd7f)

zfs: unbreak stable/13 build on i386 after b0c251b0d

The build was broken because upstream merged e76373de7 (author: mav)
without fef8bd41f from openzfs/zfs/master into openzfs/zfs/zfs-2.1-release.

Temporary fix until upstream decides a way to solve this problem.

Patch by: mav
Differential Revision: https://reviews.freebsd.org/D30783

(direct commit)

zfs: unbreak stable/13 clang build on non-x86 archs after b0c251b0d

(direct commit)

systemd: import: expand $ZPOOL_IMPORT_OPTS correctly

Turns out $ZPOOL_IMPORT_OPTS expands in a shell-like fashion,
yielding 'import' '-aN' '-o' 'cachefile=none' for an unset variable,
and 'import' '-aN' '-o' 'cachefile=none' 'word1' 'word2' for a
white-spaced one, but ${ZPOOL_IMPORT_OPTS} expands like "${Z_I_O}"
would in a shell, yielding 'import' '-aN' '-o' 'cachefile=none' ''
(empty) and 'import' '-aN' '-o' 'cachefile=none' 'word1 word2' (spaced)

Fixes eec5ba113e1d285d445333079a3e8184872ad00a "dracut: 90zfs: respect
zfs_force=1 on systemd systems"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes: #12231

vdev_draid_min_asize() ignores reserved space

vdev_draid_min_asize() returns the minimum size of a child vdev.  This
is used when determining if a disk is big enough to replace a child.
It's also used by zdb to determine how big of a child to make to test
replacement.

vdev_draid_min_asize() says that the child’s asize has to be at least
1/Nth of the entire draid’s asize, which is the same logic as raidz.
However, this contradicts the code in vdev_draid_open(), which
calculates the draid’s asize based on a reduced child size:

  An additional 32MB of scratch space is reserved at the end of each
  child for use by the dRAID expansion feature

So the problem is that you can replace a draid disk with one that’s
vdev_draid_min_asize(), but it actually needs to be larger to accommodate
the additional 32MB.  The replacement is allowed and everything works at
first (since the reserved space is at the end, and we don’t try to use
it yet), but when you try to close and reopen the pool,
vdev_draid_open() calculates a smaller asize for the draid, because of
the smaller leaf, which is not allowed.

I think the confusion is that vdev_draid_min_asize() is correctly
returning the amount of required *allocatable* space in a leaf, but the
actual *size* of the leaf needs to be at least 32MB more than that.
ztest_vdev_attach_detach() assumes that it can attach that size of
device, and it actually can (the kernel/libzpool accepts it), but it
then later causes zdb to not be able to open the pool.

This commit changes vdev_draid_min_asize() to return the required size
of the leaf, not the size that draid will make available to the metaslab
allocator.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11459
Closes #12221

Do not hash unlinked inodes

In zfs_znode_alloc we always hash inodes. If the
znode is unlinked, we do not need to hash it. This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210

Added uncompress requirement

Having an old enough version of "file" and no "uncompress" program
installed can cause rpmbuild as root to crash and mangle rpmdb.

So let's add a build dependency for RPM-based systems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12071
Closes: #12168

ZTS: Add zfs_clone_livelist_dedup.ksh to Makefile.am

Commit 86b5f4c12 added a new zfs_clone_livelist_dedup.ksh test case
but didn't include it in the Makefile.am. This results in the test
not being included in the dist tarball so it's never run by the CI.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12224

fread: improve performance for unbuffered reads

We can use the buffer passed to fread(3) directly in the FILE *.
The buffer needs to be reset before each call to __srefill().
This preserves the expected behavior in all cases.

The change was found originally in OpenBSD and later adopted by NetBSD.

MFC after: 2 weeks
Obtained from: OpenBSD (CVS 1.18)

Differential Revision: https://reviews.freebsd.org/D30548

libcasper: fix descriptors numbers

Casper services expect that the first 3 descriptors (stdin/stdout/stderr)
will point to /dev/null. Which Casper will ensure later. The Casper
services are forked from the original process. If the initial process
closes one of those descriptors, Casper may reuse one of them for it on
purpose. If this is the case, then renumarate the descriptors used by
Casper to higher numbers. This is done already after the fork, so it
doesn't break the parent process.

PR: 255339
Reported by: Borja Marcos <borjam (at) sarenet.es>
Tested by: jkim@

(cherry picked from commit aa310ebfba3d49a0b6b03a103b969731a8136a73)

tests/netgraph: Tests for ng_bridge

Test functionality of ng_bridge(4):
- replicating traffic to anything but the sending hook
- persistence
- detect loops
- unicast to only one link of many
- stretch to implementation limits on broadcast

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D30647
Differential Revision: https://reviews.freebsd.org/D30699

(cherry picked from commit 61814702398ce29430b2bef75cbdd6fd2c07ad12)
(cherry picked from commit 5554abd9cc9702af30af90925b33c5efff4e7d88)

tests/netgraph: Inital framework for testing libnetgraph

Provide a framework of functions to test various netgraph modules.
Tests contain:
- creating, renaming, and destroying nodes
- connecting and removing hooks
- sending and receiving data
- sending ASCII messages and receiving binary responses
- errors can be passed for indiviual inspection or fail the test

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D30629
Differential Revision: https://reviews.freebsd.org/D30657
Differential Revision: https://reviews.freebsd.org/D30671
Differential Revision: https://reviews.freebsd.org/D30699

(cherry picked from commit 24ea1dbf257aa6757f469bcd859f90e9ad851e59)
(cherry picked from commit 09307dbfb888a98232096c751a96ecb3344aa77c)
(cherry picked from commit 9021c46603bf29b9700f24b8dce8796b434d7c8f)
(cherry picked from commit 5554abd9cc9702af30af90925b33c5efff4e7d88)

Also contains some fixups:
- indent all files correctly
- finish factoring out
- remove debugging code
- check for renaming issues reported in PR241954

PR: 241954
Differential Revision: https://reviews.freebsd.org/D30692
Differential Revision: https://reviews.freebsd.org/D30714
Differential Revision: https://reviews.freebsd.org/D30713

(cherry picked from commit a664ade93972ce617f0888ff79e715dff9cf0f87)
(cherry picked from commit 0afa9be03937d60cb5aeba64c81e3e2165bd3737)
(cherry picked from commit 43e4821315c31db067e23564b9bfafb519e77b2b)

tcp: Missing mfree in rack and bbr

Recently (Nov) we added logic that protects against a peer negotiating a timestamp, and
then not including a timestamp. This involved in the input path doing a goto done_with_input
label. Now I suspect the code was cribbed from one in Rack that has to do with the SYN.
This had a bug, i.e. it should have a m_freem(m) before going to the label (bbr had this
missing m_freem() but rack did not). This then caused the missing m_freem to show
up in both BBR and Rack. Also looking at the code referencing m->m_pkthdr.lro_nsegs
later (after processing) is not a good idea, even though its only for logging. Best to
copy that off before any frees can take place.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30727

(cherry picked from commit ba1b3e48f5be320f0590bc357ea53fdc3e4edc65)

tcp: Mbuf leak while holding a socket buffer lock.

When running at NF the current Rack and BBR changes with the recent
commits from Richard that cause the socket buffer lock to be held over
the ip_output() call and then finally culminating in a call to tcp_handle_wakeup()
we get a lot of leaked mbufs. I don't think that this leak is actually caused
by holding the lock or what Richard has done, but is exposing some other
bug that has probably been lying dormant for a long time. I will continue to
look (using his changes) at what is going on to try to root cause out the issue.

In the meantime I can't leave the leaks out for everyone else. So this commit
will revert all of Richards changes and move both Rack and BBR back to just
doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.

We may want to look at adding back in Richards changes after I have pinpointed
the root cause of the mbuf leak and fixed it.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30704

(cherry picked from commit 67e892819b26c198e4232c7586ead7f854f848c5)

tcp: LRO timestamps have lost their previous precision

Recently we had a rewrite to tcp_lro.c that was tested but one subtle change
was the move to a less precise timestamp. This causes all kinds of chaos
in tcp's that do pacing and needs to be fixed to use the more precise
time that was there before.

Reviewed by: mtuexen, gallatin, hselasky
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30695

(cherry picked from commit b45daaea95abd8bda52caaacf120f9197caab3e7)

arm64: Fix pmap_copy()'s handling of 2MB mappings

When copying mappings from parent to child, we clear the accessed and
dirty bits.  This is done for both 4KB and 2MB PTEs.  However,
pmap_demote_l2() asserts that writable superpages must be dirty.  This
is to avoid races with the MMU setting the dirty bit during promotion
and demotion.  pmap_copy() can create clean, writable superpage
mappings, so it violates this assertion.

Modify pmap_copy() to preserve the accessed and dirty bits when copying
2MB mappings, like we do on amd64.

Fixes: ca2cae0b4dd
Reported by: Jenkins via mhorne
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30643

(cherry picked from commit 4e4035ef1fb5e2f9da6b658ffae8a54862b4d018)

Fix handling of D_GIANTOK

It was meant to suppress only the printf(), not the subsequent injection
of Giant-protected thunks for various file operations.

Fixes: fbeb4ccac9
Reported by: pho
Tested by: pho
Pointy hat: markj

(cherry picked from commit 887c753c9f451322cae3efbf9b63f53f3d9011c8)

riscv: Rename pmap_fault_fixup() to pmap_fault()

This is consistent with other platforms, specifically arm and arm64. No
functional change intended.

Reviewed by: jrtc27
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 317113bb125166f6ba3035a29408339af38cca54)

arm: Remove last_fault_code

It is unused since the removal of pmap-v4.c in commit b88b275145.

Sponsored by: The FreeBSD Foundation

(cherry picked from commit 62ba0def5584bc1fc84fc7df6d7a1256e4a34fb8)

riscv: Handle hardware-managed dirty bit updates in pmap_promote_l2()

pmap_promote_l2() failed to handle implementations which set the
accessed and dirty flags. In particular, when comparing the attributes
of a run of 512 PTEs, we must handle the possibility that the hardware
will set PTE_D on a clean, writable mapping.

Following the example of amd64 and arm64, change riscv's
pmap_promote_l2() to downgrade clean, writable mappings to read-only, so
that updates are synchronized by the pmap lock.

Fixes: f6893f09d
Reported by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Tested by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: jrtc27, alc, Nathaniel Filardo
Sponsored by: The FreeBSD Foundation

(cherry picked from commit c05748e028b84c216d0161e70418f8cb09e074e4)

Suppress D_NEEDGIANT warnings for some drivers

During boot we warn that the kbd and openfirm drivers are Giant-locked
and may be deleted. Generally, the warning helps signal that certain
old drivers are not being maintained and are subject to removal, but
this doesn't really apply to certain drivers which are harder to
detangle from Giant.

Add a flag, D_GIANTOK, that devices can specify to suppress the
misleading warning. Use it in the kbd and openfirm drivers.

Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation

(cherry picked from commit fbeb4ccac990fdb3bc26ab925a3ca8e7d2f89721)

iwn: adjust EEPROM read timeout for Intel 4965AGN M2

Reading EEPROM from Intel 4965AGN M2 takes 60 us which was causing panic
on system startup.

PR: 255465
Reviewed by: markj

(cherry picked from commit 03d4b58feee396d392668f192ecdde08ecc8036c)

ngatm: Handle errors from uni_msg_extend()

uni_msg_extend() may fail due to a memory allocation failure. In this
case, though, the message is freed, so callers shouldn't touch it.

PR: 255861
Reviewed by: harti
Sponsored by: The FreeBSD Foundation

(cherry picked from commit e755e2776ddff729ae4102f3273473aa33b00077)

arm64: Use the right PTE when downgrading perms in pmap_promote_l2()

When promoting a run of small mappings to a superpage, we have to
downgrade clean, writable mappings to read-only, to handle the
possibility that the MMU will concurrently mark one of the mappings as
dirty.

The code which performed this operation for the first PTE in the run
used the wrong PTE pointer. As a result, the comparison would always
fail, aborting the promotion. This only occurs when promoting writable,
clean mappings.

Fixes: ca2cae0b4dd
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation

(cherry picked from commit a48f51b3d396664f9b0a91f016159f4e4324da85)

linuxkpi: Add list_for_each_entry_lockless() macro

This is needed by the drm-kmod 5.7 update.

Approved by: hselasky (src)
Differential Revision: https://reviews.freebsd.org/D30708

(cherry picked from commit b47f461c8e67253fdb394968428b760e880baa08)

pf: Convenience function for optional (numeric) arguments

Add _opt() variants for the uint* functions. These functions set the
provided default value if the nvlist doesn't contain the relevant value.
This is helpful for optional values (e.g. when the API is extended to
add new fields).

While here simplify the header by also using macros to create the
prototypes for the macro-generated function implementations.

Reviewed by: scottl
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30510

(cherry picked from commit 7c4342890bf17b72f0d79ada1326d9cbf34e736c)

LinuxKPI: add pr_err_once

Reviewed by: hselasky, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30672

(cherry picked from commit 05c2d94a081d5948560a01c26c7f432960cde606)

tcp: fix two bugs in new reno

* Completely initialise the CC module specific data
* Use beta_ecn in case of an ECN event whenever ABE is enabled
or it is requested by the stack.

Reviewed by: rscheff, rrs
Sponsored by: Netflix, Inc.

(cherry picked from commit fa3746be4203fc9a3414afb21d964eec8bad74f8)

tcp: remove debug output from RACK

Reported by: iron.udjin@gmail.com, Marek Zarychta
Reviewed by: rrs
PR: 256538
Differential Revision: https://reviews.freebsd.org/D30723
Sponsored by: Netflix, Inc.

(cherry picked from commit f1536bb53898b12e2d19938f8fe2d04b5e5d12a6)

tcp: fix compilation of IPv4-only builds

PR: 256538
Reported by: iron.udjin@gmail.com
Sponsored by: Netflix, Inc.

(cherry picked from commit 224cf7b35b9bbe8d075f6004249d850c620b7855)

iwmbtfw(8): Improve Intel 7260/7265 adaptors handling

- Allow firmware downloading for hw_variant #8;
- Enter manufacturer mode for setting of event mask;
- Handle multi-event response on HCI commands for 7260;
This allows to remove kludge with skipping of 0xfc2f opcode.
- Disable patch and exit manufacturer mode on downloading failure;
- Use default firmware if correct firmware file is not found;

Reviewed by: Philippe Michaud-Boudreault <pitwuu_AT_gmail_DOT_com>
Tested by: arrowd
Differential revision: https://reviews.freebsd.org/D30543

(cherry picked from commit da93a73f834612b659b37b513c8296e1178d249b)

ums(4): Do not stop USB xfers on FIFO close when evdev is still active

This fixes lose of evdev events after moused has been killed.

While here use bitwise operations for UMS_EVDEV_OPENED flag.

Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D30342

(cherry picked from commit 05ab03a31798d4cc96c22a8f30b1d9a0d7a3dd35)

ums(4): Start USB xfers on opening of evdev node unconditionally.

This fixes inability to start USB xfers in a case when FIFO has been
already open()-ed but no read() or poll() calls has been issued yet.

MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30343

(cherry picked from commit 47791339f0cfe3282a6f64b5607047f7bca968ad)

usbhid(4): Add second set of USB transfers to work in polled mode.

The second set of USB transfer is requested by hkbd(4) and
should improve HID keyboard handling in kdb and panic contexts.

Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D30486

(cherry picked from commit 9aa0e5af75d033aa2dff763dd2886daaa7213612)

usbhid(4): Fix NULL pointer dereference in usbd_xfer_max_len()

Which happens when USB transfer setup is failed.

PR: 254974
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D30485

(cherry picked from commit e889a462d878675551b227a382764c3879e6c2b3)

Create VM_MEMATTR_DEVICE on all architectures

This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.

Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29692

(cherry picked from commit 5d2d599d3f3494d813e51e1bcd1c9693eb9c098b)

pciconf: Use VM_MEMATTR_DEVICE on supported architectures

Some architectures - armv7, armv8 and riscv use VM_MEMATTR_DEVICE
when mapping device registers in kernel. Do the same in pciconf.
On armada8k SoC all reads from BARs mapped with hitherto attribute
(VM_MEMATTR_UNCACHEABLE) return 0xff's.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29603

(cherry picked from commit 1c1ead9b94a1a731646327ec3b09e8f3acd577b8)

zfs: merge openzfs/zfs@c3b60eded (zfs-2.1-release) into stable/13

Notable upstream pull request merges:
  #12015 Replace zstreamdump with zstream link
  #12046 Improve scrub maxinflight_bytes math.
  #12052 FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI
  #12072 Let zfs diff be more permissive
  #12091 libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
  #12104 Reminder to update boot code after zpool upgrade
  #12114 Introduce write-mostly sums
  #12125 Modernise all (most) remaining .TH manpages
  #12145 More aggsum optimizations
  #12149 Multiple man-pages: Move to appropriate section
  #12158 Re-embed multilist_t storage
  #12177 Livelist logic should handle dedup blkptrs
  #12196 Unify manpage makefiles, move pages to better sexions, revisit some
  #12212 Remove pool io kstats

Obtained from: OpenZFS
OpenZFS commit: c3b60ededa6e6ce36a457a54451ca153c4c630dc
OpenZFS tag: zfs-2.1.0-rc7

vfs: slightly rework vn_rlimit_fsize

(cherry picked from commit 478c52f1e3654213c7d79096a2bc7f908e0b501d)

ktrace: Remove vrele() at the end of ktr_writerequest()

(cherry picked from commit 6f6cd1e8e8aa3a48b35598992f1b6c21003d35cd)

ktrace: fix a race between writes and close

(cherry picked from commit fc369a353b5b5e0f8046687fcbd78a7cd9ad1810)

Fix limit testing after 1762f674ccb571e6 ktrace commit.

(cherry picked from commit e71d5c7331700504e58cf1a35dca529381723e02)

Fix a braino in previous.

(cherry picked from commit 48235c377f960050e9129aa847d7d73019561c82)

Fix tinderbox build after 1762f674ccb571e6 ktrace commit.

(cherry picked from commit 154f0ecc10abdd3c23d233bf85e292011a130583)

libkvm: Fix build after removal of p_tracevp

(cherry picked from commit e67ef6ce667d42a235a70914159048e10039145d)

ktrace: add a kern.ktrace.filesize_limit_signal knob

(cherry picked from commit ea2b64c2413355ac0d5fc6ff597342e9437a34d4)

ktrace: use the limit of the trace initiator for file size limit on writes

(cherry picked from commit 02645b886bc62dfd8a998fd51d2e6c1bbca03ecb)

ktrace: pack all ktrace parameters into allocated structure ktr_io_params

(cherry picked from commit 1762f674ccb571e6b03c009906dd1af3c6343f9b)

ktrace: do not stop tracing other processes if our cannot write to this vnode

(cherry picked from commit a6144f713cee8f522150b1398b225eedbf4cfef1)

accounting: explicitly mark the exiting thread as doing accounting

(cherry picked from commit 9bb84c23e762e7d1b6154ef4afdcc80662692e76)

Change the return type of sv__setid_allowed from bool to int

(cherry picked from commit 62b8258a7e43f3c774f13eab758b2cfdf353073e)

linuxolator: Add compat.linux.setid_allowed knob

PR: 21463

(cherry picked from commit 598f6fb49c9ca688029b79de0a44227ab79c608c)

sysent: allow ABI to disable setid on exec.

(cherry picked from commit 2d423f7671fe452486932c8e41e7d3547afe82aa)

kern_exec.c: Add execve_nosetid() helper

(cherry picked from commit 19e6043a443ea51207786b85c8d62d070ec36005)

hyperv: register intr handler as usermode-mapped if loaded as module

(cherry picked from commit fe7d7ac40881c9d01a54bf57fff71a3af199f237)

Clean up early arm64 pmap code

Early in the arm64 pmap code we need to translate between a virtual
address and a physical address. Rather than manually walking the page
table we can ask the hardware to do it for us.

Reviewed by: kib, markj
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D30357

(cherry picked from commit e779604f1d4e5fd0cdf3a9d1bb756b168f97b39c)

Update the EFI timer to be called once a second

There is no need to call it evert 10ms when we need 1s granularity.
Update to update the time every second.

Reviewed by: imp, manu, tsoome
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D30227

(cherry picked from commit 93f7be080f3ad0bd71190d87aa2043d714270206)

Use '.arch_extension crc' in the arm64 crc32 code

We don't care about the base architecture here, just that the crc
extension is enabled.

Sponsored by: Innovate UK

(cherry picked from commit 0ec3e991112d85a790ca3bbb4175652f37f4bd15)

Implement bus_map_resource on arm64

This will allow us to allocate an unmapped memory resource, then
later map it with a specific memory attribute.

This is also needed for virtio with the modern PCI attachment.

Reviewed by: kib (via D29723)
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29694

(cherry picked from commit fe3822497726ab84a1e3753be41e43e4d51aab0b)

Use if ... else when printing memory attributes

In vmstat there is a switch statement that converts these attributes to
a string. As some values can be duplicate we have to hide these from
userspace.

Replace this switch statement with an if ... else macro that lets us
repeat values without a compiler error.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D29703

(cherry picked from commit 15221c552b3cabcbf26613246e855010b176805a)

Clean up the style in the arm64 bus.h

MFC after: 2 weeks
Sponsored by: Innovate UK

(cherry picked from commit 5998328e55f8850718a6b48842823eb0a6524ae6)

nfscl: Use hash lists to improve expected search performance for opens

A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently. When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.

Commit 3f7e14ad9345 added a hash table of lists hashed on file handle
for the opens. This patch uses the hash lists for searching for
a matching open based of file handle instead of an exhaustive
linear search of all opens.
This change appears to be performance neutral for a small number
of opens, but should improve expected performance for a large
number of opens.

This commit should not affect the high level semantics of open
handling.

(cherry picked from commit 96b40b896772bbce5f93d62eaa640e1eb51b4a30)