CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

Convert OFED rtable interactions to the new routing KPI.

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24387

mmc_fdt_helpers: Do not schedule a card detection is there is no cd gpio

If the fdt node doesn't have a cd-gpios properties or if the node is set
as non-removable we do not init the card detection timeout task as it is
useless so don't schedule it too.

MFC after: 1 month
X-MFC-With: r359924

Convert pf rtable checks to the new routing KPI.

Switch uRPF to use specific fib(9)-provided uRPF.
Switch MSS calculation to the latest fib(9) kpi.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D24386

Convert ip6_forward() to the new routing KPI.

Update ip6_forward() internals to use deembedded IPv6 addresses
to simplify calls to the new KPI and prepare for the future
scope-embedding cleanup.

Add in6_get_unicast_scopeid() and in6_set_unicast_scopeid() scopeid
operation functions tailored for unicast processing.

Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D24334

Add my birthday

Approved by: ehaupt (mentor)

Do not attempt to remove backward compatibility timezones.

Since r359736, these timezones are unconditionally installed.

bhyve(8): Correct copyright boilerplate for r359950

Use the text from the canonical sys/copyright.h 2-clause FreeBSD License.

Reported by: grehan (thanks!)

sys/types.h: adjust #endif comment to match reality

Submitted by: sigsys gmail com

kern uuid: break format validation out into a separate KPI

This new KPI, validate_uuid, strictly validates the formatting of the input
UUID and, optionally, populates a given struct uuid.

As noted in the header, the key differences are that the new KPI won't
recognize an empty string as a nil UUID and it won't do any kind of semantic
validation on it. Also key is that populating a struct uuid is optional, so
the caller doesn't necessarily need to allocate a bogus one on the stack
just to validate the string.

This KPI has specifically been broken out in support of D24288, which will
preload /etc/hostid in loader so that early boot hostuuid users (e.g.
anything that calls ether_gen_addr) can have a valid hostuuid to work with
once it's been stashed in /etc/hostid.

cxgbe/iw_cxgbe: Do not start the EP timer if soaccept fails.

This fixes a panic that would occur when the timer tried to close a
stale socket.

Submitted by: Krishnamraju Eraparaju @ Chelsio
MFC after: 1 week
Sponsored by: Chelsio Communications

bhyve(8): Minor cosmetic niceties in instemul failure

Print the failed instruction stream as a contiguous stream of hex.  This
is closer to something you could throw at a disassembler than 0xHH 0xHH
0xHH.

Also, use the debug.h 'raw' stdio-aware printf helper to avoid the
cascading
         line
             effect.

bhyve(8): Add VM Generation Counter ACPI device

Add an implementatation of the 'Virtual Machine Generation ID' spec to
Bhyve. The spec provides a randomly generated GUID (at bhyve start) in
device memory, along with an ACPI device with _CID VM_Gen_Counter and ADDR
evaluating to a Package pointing at that GUID.

A GPE is defined which Notifies the ACPI Device when the generation changes
(such as when a snapshot is rolled back). At this time, Bhyve does not
support snapshotting, so the GPE is never actually raised.

Suggested by: rpokala
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D23165

bhyve(8): Add bootrom allocation abstraction

To allow more general use of the bootrom region, separate initialization from
allocation, and allocation from loading a file.

The bootrom segment is the high 16MB of the low 4GB region.

Each allocation in the segment creates a new mapping with specified protection.
By default, allocation begins at the low end of the range. However, the
BOOTROM_ALLOC_TOP flag is provided to locate a provided bootrom in the high
region it is expected to be in.

The existing ROM-file loading code is refactored to use the new interface.

Reviewed by: grehan (earlier version)
Differential Revision: https://reviews.freebsd.org/D24422

bus_dma.9: Remove erroneous usage recommendation

It is not valid to pass BUS_SPACE_UNRESTRICTED to bus_dma_tag_create()'s
nsegments parameter as it is interpreted as a very large segment count.
Subsequent allocation operations on the tag will preallocate some multiple of
that count. BUS_SPACE_UNRESTRICTED therefore indicates something like:
malloc(infinity).

Discussed with: bcr, jhb (earlier version)

Remove support for geli(4) algorithms deprecated in r348206.

This removes support for reading and writing volumes using the
following algorithms:

- Triple DES
- Blowfish
- MD5 HMAC integrity

In addition, this commit adds an explicit whitelist of supported
algorithms to give a better error message when an invalid or
unsupported algorithm is used by an existing volume.

Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24343

tests: audit: mark closefrom test an expected fail for now

closefrom has been converted to close_range internally; remediation is
underway for this, marking it as an expected fail for now while proper
course is determined.

PR: 245625

closefrom: clamp lowfd to >= 0; close_range's parameters are unsigned.

Pointy hat: kevans
Reported by: CI (lwhsu)

Convert IP/IPv6 forwarding, ICMP processing and IP PCB laddr selection to
the new routing KPI.

Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D24245

Fix the NFSv2 extended attribute support to handle 0 length attributes.

I did not realize that zero length attributes are allowed, but they are.
This patch fixes the NFSv4.2 client and server to handle zero length
extended attributes correctly.

Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version)
Reported by: Frank van der Linden <fllinder@amazon.com>

Reorganise nd6 notification code to avoid direct rtentry field access.

One of the goals of the new routing KPI defined in r359823 is to entirely hide
`struct rtentry` from the consumers. Doing so will allow to improve routing
subsystem internals and deliver features more easily. This change is one of
the ongoing changes to eliminate direct struct rtentry field accesses.

It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification
code to avoid accessing most of the rtentry fields.

Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D24404

modules: dtb: allwinner: Remove sun50i-a64-sid.dtso

File was removed in r359935

MFC after: 2 month
X-MFC-With: r359935

Remove bogus use of useracc() in (clock_)nanosleep.

There's no point in pre-checking that we can access the user's rmtp
pointer before we do it in copyout().

While here, improve style(9) compliance.

Reviewed by: imp
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24409

Centralize compatability translation macros.

Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h
and replace existing definitation with includes where required. This
eliminates duplicate code and allows Linux and FreeBSD compatability
headers to be included in the same files.

Input from: cem, jhb
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24275

modules: dtb: allwinner: Remove non existant files

Those files have been removed in r359935.

MFC after: 2 months
X-MFC-With: r359935

allwinner: aw_thermal: Cope with DTS changes

The upstream DTS now include the thermal device node and the SID
calibration entry.
Update our driver to cope with this change and remove the DTB
overlays that aren't needed anymore.

MFC after: 2 months
X-MFC-With: r359934

dts: Import DTS from Linux 5.6

files: Add mmc_fdt_helpers for mmccam enabled config

MFC after: 1 month
X-MFC-With: r359924

sysent: re-roll after r359930

Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range

Include a temporarily compatibility shim as well for kernels predating
close_range, since closefrom is used in some critical areas.

Reviewed by: markj (previous version), kib
Differential Revision: https://reviews.freebsd.org/D24399

Import DTS files from Linux 5.6

arm: dwmmc: Use mmc_fdt_helpers

Use the mmc_fdt_parse function instead of parsing everything in the
driver.

MFC after: 1 month

Improve the TCP blackhole detection. The principle is to reduce the
MSS in two steps and try each candidate two times. However, if two
candidates are the same (which is the case in TCP/IPv6), this candidate
was tested four times. This patch ensures that each candidate actually
reduced the MSS and is only tested 2 times. This reduces the time window
of missclassifying a temporary outage as an MTU issue.

Reviewed by: jtl
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D24308

arm: allwinner: aw_mmc: Use the mmc_fdt_helper

The fdt properties are now parsed via the help of mmc_fdt_helper functions.
This also adds card detection.
Note that on some boards (like the Pine64) card detection is broken due to
a missing resistor on the cd pin.

MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D23268

Those functions are here to help fdt mmc controller drivers to parse
the dts to find the supported speeds and the regulators.
Not all DTS have every settings properly defined so host controller
will still have to add some caps themselves.
It also add a mmc_fdt_gpio_setup function which will read the cd-gpios
property and register it as the CD pin.
If the pin support interrupts one will be registered and the cd_helper
function will be called.
If the pin doesn't support interrupts the internal taskqueue will poll
for change and call the same cd_helper function.
mmc_fdt_gpio_setup will also parse the wp-gpio property and MMC drivers
can know the write-protect pin value by calling the
mmc_fdt_gpio_get_readonly function.

MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D23267

Make sonewconn() overflow messages have per-socket rate-limits and values.

sonewconn() emits debug-level messages when a listen socket's queue
overflows. Currently, sonewconn() tracks overflows on a global basis. It
will only log one message every 60 seconds, regardless of how many sockets
experience overflows. And, when it next logs at the end of the 60 seconds,
it records a single message referencing a single PCB with the total number
of overflows across all sockets.

This commit changes to per-socket overflow tracking. The code will now
log one message every 60 seconds per socket. And, the code will provide
per-socket queue length and overflow counts. It also provides a way to
change the period between log messages using a sysctl.

Reviewed by: jhb (previous version), bcr (manpages)
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D24316

Print more detail as part of the sonewconn() overflow message.

When a socket's listen queue overflows, sonewconn() emits a debug-level
log message. These messages are sometimes useful to systems administrators
in highlighting a process which is not keeping up with its listen queue.

This commit attempts to enhance the usefulness of this message by printing
more details about the socket's address. If all else fails, it will at
least print the domain name of the socket.

Reviewed by: bz, jhb, kbowling
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D24272

Make the path length of UNIX domain sockets specified by a #define.
Also, add a comment describing the historical context for this length.

Reviewed by: bz, jhb, kbowling (previous version)
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D24272

Bump FreeBSD version after r359919 (KTLS / unmapped mbuf changes)

The above changes mbufs, and any module using unmapped mbufs
would need to be re-compiled.

Sponsored by: Netflix

KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.

While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to carry more than a single page for sendfile, they are rather
cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs meta directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.

In order to be able to carry 5 pages (which is the minimum
required for a 16K TLS record which is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_arg2 as mentioned above,
and the removal of the ext_pgs zone,

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
access ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Reviewed by: hselasky, jhb, rrs, glebius (early version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24213

posixshm: fix counting of writable mappings

Similar to mmap'ing vnodes, posixshm should count any mapping where maxprot
contains VM_PROT_WRITE (i.e. fd opened r/w with no write-seal applied) as
writable and thus blocking of any write-seal.

The memfd tests have been amended to reflect the fixes here, which notably
includes:

1. Fix for error return bug; EPERM is not a documented failure mode for mmap
2. Fix rejection of write-seal with active mappings that can be upgraded via
mprotect(2).

Reported by: markj
Discussed with: markj, kib

Plug netmask NULL check during route addition causing kernel panic.
This bug was introduced by the r359823.

Reported by: hselasky

Improve manual page formatting

- Use appropriate macros for command arguments.
- Increase option list indentation for better readability.

MFC after: 3 days

Postpone multipath seed init till SI_SUB_LAST, as it is needed only after
some useland program installs multiple paths to the same destination.

While here, make multipath init conditional.

Discussed with: cem,ian

Re-organize the NFS file handle affinity code for the NFS server.

The file handle affinity code was configured to be used by both the
old and new NFS servers. This no longer makes sense, since there is
only one NFS server.
This patch copies a majority of the code in sys/nfs/nfs_fha.c and
sys/nfs/nfs_fha.h into sys/fs/nfsserver/nfs_fha_new.c and
sys/fs/nfsserver/nfs_fha_new.h, so that the files in sys/nfs can be
deleted. The code is simplified by deleting the function callback pointers
used to call functions in either the old or new NFS server and they were
replaced by calls to the functions.

As well as a cleanup, this re-organization simplifies the changes
required for handling of external page mbufs, which is required for KERN_TLS.

This patch should not result in a semantic change to file handle affinity.

Remove FreeBSD/armv4 specific bits from CK.

Now that armv4/v5 is gone, remove the bits that implemented atomic operations
by disabling interrupts.
Those were specific to FreeBSD and never reached upstream.

lagg: stop double-counting output errors and counting drops as errors

Before this change, lagg double-counted errors from lagg members, and counted
every drop by a lagg member as an error. Eg, if lagg sent a packet, and the
underlying hardware driver dropped it, a counter would be incremented by both
lagg and the underlying driver.

This change attempts to fix that by incrementing lagg's counters only for
errors that do not come from underlying drivers.

Reviewed by: hselasky, jhb
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24331

[evdev] Use proper mutex reference in autorepeat callout initialization.

This fixes panic occuring when evdev key autorepeat is enabled by driver
which initializes evdev with external mutex.

Ensure `kyua list` working when there is no /dev/nvme*

Sponsored by: The FreeBSD Foundation

Checks here against useracc are not useful and are racy.

copyin/copyout are sufficient to guard against bad addresses. They will return
EFAULT if the user is up to no good (by choice or ignorance). There's no point
in checking, since it doesn't even improve the error messages.

Noticed by: jhb
Reviewed by: brooks, jhb

Remove stale comment

There's no useracc here, and even if there was it shouldn't be here. vmapbuf is
sufficient and as the comment says, useracc is racy.

Export a sysctl count of RX FIFO overrun events.

uart(4) backends currently detect RX FIFO overrun errors and report
them to the uart(4) core layer.  They are then reported to the generic
TTY layer which promptly ignores them.  As a result, there is
currently no good way to determine if a uart is experiencing RX FIFO
overruns.  One could add a generic per-tty counter, but there did not
appear to be a good way to export those.  Instead, add a sysctl under
the uart(4) sysctl tree to export the count of overruns.

Reviewed by: brooks
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24368

Correct baud rate error calculation.

Shifting right by 1 is not the same as dividing by 2 for signed
values.  In particular, dividing a signed value by 2 gives the integer
ceiling of the (e.g. -5 / 2 == -2) whereas shifting right by 1 always
gives the floor (-5 >> 1 == -3).

An embedded board with a 25 Mhz base clock results in an error of
-30.5% when used with a baud rate of 115200.  Using division, this
truncates to -30% and is permitted.  Using the shift, this fails and
is rejected causing TIOCSETA requests to fail with EINVAL and breaking
getty(8).

Using division gives the same error range for both over and under baud
rates and also makes the code match the behavior documented in the
existing comment about supporting boards with 25 Mhz clocks.

Reported by: imp
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24367

Disable QUEUE_MACRO_DEBUG_TRACE in LINT kernels

It changes the size of TAILQ_ENTRY, which obviously impacts ABI in a variety of
ways. Some of these things are _Static_asserted. For now, mask the option
from LINT.

Reported by: crees, np, jhb
X-MFC-With: r359829
Sponsored by: Dell EMC Isilon

cxgbe(4): Make sure 'flags' is at the same offset in structs toepcb and
synq_entry. TAILQ_ENTRY isn't always the same size as two pointers.

Reported by: rmacklem@
MFC after: 3 days
Sponsored by: Chelsio Communications

depend-cleanup: fix typo, ^/lib/libc/sys/Makefile.inc generates .S stubs

Pointy hat: kevans

Move shm_open dependency cleanup into a new home

r359461 introduced this nifty script to centralize these things, so add
shm_open.c there to remove a total of one (1) bad example from
Makefile.inc1.

Looked over by: emaste

snd_hda(4): Recognize the ALC257 codec.

PR: 245524
Submitted by: Jose Luis Duran <jlduran@gmail.com>
MFC after: 1 week

Fix sendto() on unconnected SOCK_STREAM/SEQPACKET unix sockets.

Previously the unpcb pointer of the newly connected remote socket was
not initialized correctly, so attempting to lock it would result in a
null pointer dereference.

Reported by: syzkaller
MFC after: 1 week
Sponsored by: The FreeBSD Foundation

Relax restrictions on private mappings of POSIX shm objects.

When creating a private mapping of a POSIX shared memory object,
VM_PROT_WRITE should always be included in maxprot regardless of
permissions on the underlying FD. Otherwise it is possible to open a
shm object read-only, map it with MAP_PRIVATE and PROT_WRITE, and
violate the invariant in vm_map_insert() that (prot & maxprot) == prot.

Reported by: syzkaller
Reviewed by: kevans, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24398

close_range/closefrom: fix regression from close_range introduction

close_range will clamp the range between [0, fdp->fd_lastfile], but failed
to take into account that fdp->fd_lastfile can become -1 if all fds are
closed. =-( In this scenario, just return because there's nothing further we
can do at the moment.

Add a test case for this, fork() and simply closefrom(0) twice in the child;
on the second invocation, fdp->fd_lastfile == -1 and will trigger a panic
before this change.

X-MFC-With: r359836

libc: remove shm_open(2)'s compat fallback

This had been introduced to ease any pain for using slightly older kernels
with a newer libc, e.g., for bisecting a kernel across the introduction of
shm_open2(2). 6 months has passed, retire the fallback and let shm_open()
unconditionally call shm_open2().

Stale includes are removed as well.

Install expected output file of test missed in r359585

Sponsored by: The FreeBSD Foundation

Remove unused 'struct rtentry' definition.

Sync with OpenBSD:

arc4random.c: In the incredibly unbelievable circumstance where
_rs_init() fails to allocate pages, don't call abort() because of
corefile data leakage concerns, but simply _exit().  The reasoning
is _rs_init() will only fail if someone finds a way to apply
specific pressure against this failure point, for the purpose of
leaking information into a core which they can read.  We don't
need a corefile in this instance to debug that.  So take this
"lever" away from whoever in the future wants to do that.

arc4random.3: reference random(4)

arc4random_uniform.c: include stdint.h over sys/types.h

Remove tcp_rtlookup6() function signature.
The function itself was removed in r122922 16 years ago.

Delete the mbuf macros that were used for the Mac OS/X port.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since r359757, r359780, r359785, r359810, r359811 have removed all uses
of these macros, this patch deleted the macros from the .h files.

My eventual goal is deleting nfskpiport.h, but that will take some more
editting to replace uses of the remaining macros.

Bump __FreeBSD_version after r359836, close_range(2)

Reported by: cy

sysent: re-roll after introduction of close_range in r359836

Implement a close_range(2) syscall

close_range(min, max, flags) allows for a range of descriptors to be
closed. The Python folk have indicated that they would much prefer this
interface to closefrom(2), as the case may be that they/someone have special
fds dup'd to higher in the range and they can't necessarily closefrom(min)
because they don't want to hit the upper range, but relocating them to lower
isn't necessarily feasible.

sys_closefrom has been rewritten to use kern_close_range() using ~0U to
indicate closing to the end of the range. This was chosen rather than
requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the
call to kern_close_range for simplicity.

The flags argument of close_range(2) is currently unused, so any flags set
is currently EINVAL. It was added to the interface in Linux so that future
flags could be added for, e.g., "halt on first error" and things of this
nature.

This patch is based on a syscall of the same design that is expected to be
merged into Linux.

Reviewed by: kib, markj, vangyzen (all slightly earlier revisions)
Differential Revision: https://reviews.freebsd.org/D21627

Add mention of wireless option in bsdconfig

Submitted by: debdrup
Approved by: dteske (maintainer)
Differential Revision: https://reviews.freebsd.org/D24378

Add queue(2) debug macros as build options

Add QUEUE_MACRO_DEBUG_TRACE and QUEUE_MACRO_DEBUG_TRASH as proper kernel
options.  While here, alpha-sort the debug section of sys/conf/options.

Enable QUEUE_MACRO_DEBUG_TRASH in amd64 GENERIC (but not GENERIC-NODEBUG)
kernels.  It is similar in nature and cost to other use-after-free pointer
trashing we do in GENERIC.  It is probably reasonable to enable in any arch
GENERIC kernel that defines INVARIANTS.

carp tests: Basic functionality test

Set up three vnet jails, bridged together. Run carp between two of them.
Attempt to provoke locking / epoch issues.

Reviewed by: mav (previous version), melifaro, asomers
Differential Revision: https://reviews.freebsd.org/D24303

carp: Widen epoch coverage

Fix panics related to calling code which expects to be running inside
the NET_EPOCH from outside that epoch.

This leads to panics (with INVARIANTS) such as this one:

    panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/if_ether.c:373
    cpuid = 7
    time = 1586095719
    KDB: stack backtrace:
    db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0090819700
    vpanic() at vpanic+0x182/frame 0xfffffe0090819750
    panic() at panic+0x43/frame 0xfffffe00908197b0
    arprequest_internal() at arprequest_internal+0x59e/frame 0xfffffe00908198c0
    arp_announce_ifaddr() at arp_announce_ifaddr+0x20/frame 0xfffffe00908198e0
    carp_master_down_locked() at carp_master_down_locked+0x10d/frame 0xfffffe0090819910
    carp_master_down() at carp_master_down+0x79/frame 0xfffffe0090819940
    softclock_call_cc() at softclock_call_cc+0x13f/frame 0xfffffe00908199f0
    softclock() at softclock+0x7c/frame 0xfffffe0090819a20
    ithread_loop() at ithread_loop+0x279/frame 0xfffffe0090819ab0
    fork_exit() at fork_exit+0x80/frame 0xfffffe0090819af0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0090819af0
    --- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Widen the NET_EPOCH to cover the relevant (callback / task) code.

Differential Revision: https://reviews.freebsd.org/D24302

Merge commit 30588a739 from llvm git (by Erich Keane):

  Make target features check work with ctor and dtor-

  The problem was reported in PR45468, applying target features to an
  always_inline constructor/destructor runs afoul of GlobalDecl
  construction assert when checking for target-feature compatibility.

  The core problem is fixed by using the version of the check that
  takes a FunctionDecl rather than the GlobalDecl. However, while
  writing the test, I discovered that source locations weren't properly
  set for this check on ctors/dtors. This patch also fixes constructors
  and CALLED destructors.

  Unfortunately, it doesn't seem too possible to get a meaningful
  source location for a 'cleanup' destructor, so those are still
  'frontend' level errors unfortunately. A fixme was added to the test
  to cover that situation.

This should fix 'Assertion failed: (!isa<CXXConstructorDecl>(D) && "Use
other ctor with ctor decls!"), function Init, file
/usr/src/contrib/llvm-project/clang/include/clang/AST/GlobalDecl.h, line
45' when compiling the security/botan2 port.

PR: 245550
MFC after: 6 weeks
X-MFC-With: 358851

Fix string format error missed in r359823.

Introduce nexthop objects and new routing KPI.

This is the foundational change for the routing subsytem rearchitecture.
More details and goals are available in https://reviews.freebsd.org/D24141 .

This patch introduces concept of nexthop objects and new nexthop-based
routing KPI.

Nexthops are objects, containing all necessary information for performing
the packet output decision. Output interface, mtu, flags, gw address goes
there. For most of the cases, these objects will serve the same role as
the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
multiple BGP full-views, as these objects will be shared between routing
entries. This allows to store more information in the nexthop.

New KPI:

struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);

These 2 function are intended to replace all all flavours of
<in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions  and the previous
fib[46]-generation functions.

Upon successful lookup, they return nexthop object which is guaranteed to
exist within current NET_EPOCH. If longer lifetime is desired, one can
specify NHR_REF as a flag and get a referenced version of the nexthop.
Reference semantic closely resembles rtentry one, allowing sed-style conversion.

Additionally, another 2 functions are introduced to support uRPF functionality
inside variety of our firewalls. Their primary goal is to hide the multipath
implementation details inside the routing subsystem, greatly simplifying
firewalls implementation:

int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);

All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
embedding and allowing to support IPv4 link-locals in the future.

Structure changes:
* rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
* rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.

Old KPI:
During the transition state old and new KPI will coexists. As there are another 4-5
decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) has to be
kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.

More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232

Reviewed by: ae,glebius(initial version)
Differential Revision: https://reviews.freebsd.org/D24232

Revert https://svnweb.freebsd.org/changeset/base/359809

The intended change was
sp->next.tqe_next = NULL;
sp->next.tqe_prev = NULL;
which doesn't fix the issue I'm seeing and the committed fix is
not the intended fix due to copy-and-paste.

Thanks a lot to Conrad Meyer for making me aware of the problem.

Reported by: cem

sendfile_iodone: correct calculation of the page index for relookup.

This is yet another bug in r359473.

Reported and tested by: delphij
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks

config(8): use sbuf to manage line buffers

PR: 245476
Reported by: kevans
Reviewed by: imp, kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24373

Replace mbuf macros with the code they would generate in the NFS code.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.

This is the final patch of this series and the macros should now be
able to be deleted from the .h files in a future commit.

Replace mbuf macros with the code they would generate in the NFS code.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.

Zero out pointers for consistency.

This was found by running syzkaller on an INVARIANTS kernel.

MFC after: 3 days

zfs: Add option for forcible unmounting dataset while receiving snapshot.

Currently when the dataset is in use we can't receive snapshots.

zfs send test/1@asd | zfs recv -FM test/2
cannot unmount '/test/2': Device busy

This commits add option 'M' which attempts to forcibly unmount the
dataset. Thanks to this we can enforce receiving snapshots in a
single step.

Note that this functionality is not supported on Linux because the
VFS will prevent active mounted filesystems from being unmounted,
even with the force option. This is the intended VFS behavior.

Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Differential Revision: https://reviews.freebsd.org/D22306

openzfs/zfs@a57d3d45d6efdff935421e2ef3f97e3dc089d93d

decryptcore: load the nls data

Load the nls data before the openssl will try to do it in the
capability mode.
On my machine the sa_ossl_private_decrypt is trying to do that.

MFC after: 2 weeks

arm: am335x: Honor pmic option ti,pmic-shutdown-controller

Honor ti,pmic-shutdown-controller option in DTS

Tested on stable r359316 @ Sleep mode on custom hw, Power off on BBB and PB

OFF bit [1] in status register control the pmic behaviour when PWR_EN pin
is pulled low.
On most AM335x hardware [beaglebone *] the desired behaviour are in fact
power off due to some hardware designs - read more in the comments around
pmic in sys/gnu/dts/arm/am335x-bone-common.dtsi

This patch let the device-tree decide with ti,pmic-shutdown-controller[2]
the state of off bit in status register.

[1] 8.6.12 table 12 http://www.ti.com/lit/ds/symlink/tps65217.pdf

[2] Documentation/devicetree/bindings/regulator/tps65217.txt

PR: 245159
Submitted by: Oskar Holmlund <oskar.holmlund@ohdata.se>
MFC after: 2 weeks

gpioctl: Print interrupts capabilities

GPIO drivers who supports interrupts report them in the caps
(obtain via the getcaps method) but gpioctl doesn't know
how to interpret this and print "UNKNOWN" for each one of them.
Even if we don't have userland gpio interrupts support for now
let gpioctl print the supported caps.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24133

Fix build by adding forgotten header to radix_mpath.c after r359797.

wc(1): document SIGINFO handling in the manual page.

MFC after: 3 days

Remove RADIX_MPATH headers, they were unused since r293159.

MFC after: 2 weeks

Remove per-AF radix_mpath initializtion functions.

Split their functionality by moving random seed allocation
to SYSINIT and calling (new) generic multipath function from
standard IPv4/IPv5 RIB init handlers.

Differential Revision: https://reviews.freebsd.org/D24356

Avoid using a variable solely for sizes that are never meant to be
modified runtime.

No functional change.

MFC after: 2 weeks

powerpc/booke: Use power-of-two mappings in 64-bit pmap_mapdev

Summary:
This reduces the precious TLB1 entry consumption (64 possible in
existing 64-bit cores), by adjusting the size and alignment of a device
mapping to a power of 2, to encompass the full mapping and its
surroundings.

One caveat with this: If a mapping really is smaller than a power of 2,
it's possible to get a machine check or hang if the 'missing' physical
space is accessed. In practice this should not be an issue for users,
as devices overwhelmingly have physical spaces on power-of-two sizes and
alignments, and any design that includes devices which don't follow this
can be addressed by undefining the POW2_MAPPINGS guard.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24248

powerpc/booke: Add pte_find_next() to find the next in-use PTE

Summary:
Iterating over VM_MIN_ADDRESS->VM_MAXUSER_ADDRESS can take a very long
time iterating one page at a time (2**(log_2(SIZE)-12) operations),
yielding possibly several days or even weeks on 64-bit Book-E, even for
a largely empty, which can happen when swapping out a process by
vmdaemon. Speed this up by instead finding the next PTE at or equal to
the given VA.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24238

powerpc/booke: Change Book-E 64-bit pmap to 4-level table

Summary:
The existing page table is fraught with errors, since it creates a hole
in the address space bits. Fix this by taking a cue from the POWER9
radix pmap, and make the page table 4 levels, 52 bits.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24220

Inode check-hash errors were being reported after system crashes.
Trace the cause down to journalled soft updates recovery code in
fsck failing to recompute the check-hash after updating an inode.

As inode check-hash was first introduced to UFS in FreeBSD 13,
there is no need to MFC this commit.

Reported by: Chuck Silvers
Sponsored by: Netflix

Add an inode check-hash verification when running the journalled
soft update recovery code with the debugging (-d) option.

As inode check-hash was first introduced to UFS in FreeBSD 13,
there is no need to MFC this commit.

Reported by: Chuck Silvers
Sponsored by: Netflix

Document removal of deprecated algorithms for in-kernel GSS.

Remove the -o option from gssd(8).

This uses DES and the kernel no longer supports DES for in-kernel GSS.

Reviewed by: kp
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24345

Remove support for Kernel GSS algorithms deprecated in r348875.

This removes support for using DES, Triple DES, and RC4.

Reviewed by: cem, kp
Tested by: kp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24344

Replace mbuf macros with the code they would generate in the NFS code.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.
This conversion will be committed one file at a time.