CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

amd64: stop doing special allocation for the AP startup trampoline

There is no longer any reason to allocate the trampoline page very
early in the boot process.  The only requirement for the page is that
it is below 1M, so that it is usable in real mode during init.  This can be
handled by vm_alloc_contig() when we do the startup.

Also assert that the startup trampoline fits into a single page.  In principle
we can do a multi-page allocation if needed, but it is not needed.

Move the alloc_ap_trampoline() function and the boot_address variable to
i386/mp_machdep.c.  Keep existing mechanism of early alloc on i386.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31343

awk: Note awk upgrades.

Note the high level differences with the latest one true awk
import. This list may grow as we learn more troublesome areas.

Updated description of the format of the file to match the file.

I'll likely merge this change (and any followups) by direct commit to
stable/13 and stable/12 in a couple of weeks.

Sponsored by: Netflix

LinuxKPI: bitfield.h cleanup

Add a missing tab and remove an unnecessary return.
No functional changes.

MFC after: 3 days

hwpmc: remove static POWER8 definitions

After b48a2770d48b, static POWER8 definitions became unnecessary,
as all of them (and much more) are already present in libpmc's
PMU events.

Submitted by: Leonardo Bianconi <leonardo.bianconi@eldorado.org.br> (initial version)
Reviewed by: kbowling, mhorne
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31334

x86 __vdso_gettc: add O_CLOEXEC flag to open

of the /dev/hpet and /dev/hv_tsc devices, so as not to leak internal libc
file descriptors on exec.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31344
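
As a minimal userland sketch of the idea (not the libc change itself), the
flag can be passed directly to open(2) so that the descriptor is marked
close-on-exec atomically.  The helper names open_cloexec() and
fd_is_cloexec() are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Open a file read-only with the close-on-exec flag set at open time,
 * so the descriptor is not inherited across exec.
 * Returns the descriptor, or -1 on error (errno is set by open(2)).
 */
int
open_cloexec(const char *path)
{
	assert(path != NULL);
	return (open(path, O_RDONLY | O_CLOEXEC));
}

/* Return 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
int
fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return (-1);
	return ((flags & FD_CLOEXEC) != 0);
}
```

Setting the flag in the open(2) call avoids the window that a separate
fcntl(F_SETFD) call would leave between open and flag update.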

amd64: Set GS.base before calling init_secondary() on APs

KMSAN instrumentation requires thread-local storage to track
initialization state for function parameters and return values.  This
buffer is accessed as part of each function prologue.  It is provided by
the KMSAN runtime, which looks up a pointer in the current thread's
structure.

When KMSAN is configured, init_secondary() is instrumented, but this
means that GS.base must be initialized first, otherwise the runtime
cannot safely access curthread.  Work around this by loading GS.base
before calling init_secondary(), so that the runtime can at least check
curthread == NULL and return a pointer to some dummy storage.  Note that
init_secondary() still must reload GS.base after calling lgdt(), which
loads a selector into %gs, which in turn clears the base register.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31336

amd64: Set MSR_KGSBASE to 0 during AP startup

There is no reason to initialize it to anything else, and this matches
initialization of the BSP. No functional change intended.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31336

link_elf_obj: Invoke fini callbacks

This is required for KASAN: when a module is unloaded, poisoned regions
(e.g., pad areas between global variables) are left as such, so if they
are reused as KLDs are loaded, false positives can arise.

Reported by: pho, Jenkins
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31339

libc/locale: Use O_CLOEXEC when opening locale tables

Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation

linux(4): Eliminate now-unused includes after futexes refactoring.

MFC after: 2 weeks

linux(4): Add a comment about wait/requeue pi operations.

MFC after: 2 weeks

linux(4): Handle incorrect FUTEX_CLOCK_REALTIME option bit.

Return ENOSYS if the FUTEX_CLOCK_REALTIME option bit is specified for an
inappropriate futex operation.

MFC after: 2 weeks

linux(4): Handle FUTEX_LOCK_PI2 operation.

FUTEX_LOCK_PI2 was added to support clock selection as FUTEX_LOCK_PI uses a
CLOCK_REALTIME based absolute value since it was implemented, but it does not
require that the FUTEX_CLOCK_REALTIME bit is set, because that was introduced
later.

MFC after: 2 weeks

linux(4): Use variable name not type for sizeof() to calculate storage size.

MFC after: 2 weeks

linux(4): Move len variable initialization to the appropriate place.

MFC after: 2 weeks

linux(4): Use linux_tdfind() in get_robust_list.

In the Linux emulation layer linux_tdfind() has a special purpose to
handle glibc specific TID mangling and we should use it instead of tdfind().

MFC after: 2 weeks

linux(4): Eliminate unnecessary error initialization.

MFC after: 2 weeks

linux(4): Eliminate unnecessary head initialization.

MFC after: 2 weeks

linux(4): style, wrap too long line.

MFC after: 2 weeks

linux(4): Eliminating remnants of futex sdt.

MFC after: 2 weeks

linux(4): Eliminating an accidental comment.

MFC after: 2 weeks

linux(4): Handle special case for regular futex in handle_futex_death().

Handle some races in handle_futex_death() which can prevent a wakeup of
potential waiters, causing these waiters to block forever.

Differential Revision: https://reviews.freebsd.org/D31280
MFC after: 2 weeks

linux(4): Futex address must be 32-bit aligned.

Linux futex documentation explicitly states that EINVAL is returned if
the futex is not 4-byte aligned. Check futex alignment as Linux does
and return EINVAL.

Differential Revision: https://reviews.freebsd.org/D31279
MFC after: 2 weeks
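
A minimal sketch of such an alignment check, assuming a hypothetical helper
name futex_check_align(); the real check lives in the Linux emulation layer
and may be written differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 0 if uaddr is suitably aligned for a
 * 32-bit futex word, or EINVAL otherwise.
 */
int
futex_check_align(const void *uaddr)
{
	if (((uintptr_t)uaddr % sizeof(uint32_t)) != 0)
		return (EINVAL);
	return (0);
}
```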

linux(4): Finish cf8d74e3fe63.

Add forgotten val3_compare initialization in case of time64 futex.

MFC after: 2 weeks

linux(4): Replace casuword32 by casueword32.

Following r349951 (30b3018d), add a check to react to stops and requests
to terminate between retries.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31254
MFC after: 2 weeks

linux(4): Implement pi futexes using umtx.

Differential Revision: https://reviews.freebsd.org/D31240
MFC after: 2 weeks

linux(4): Replace copyin() by fueword32() in handle_futex_death().

According to fetch(9), the fueword facility is designed to atomically fetch
a small amount of data from user space.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31239
MFC after: 2 weeks

umtx: Add new pi_futex type.

Differential Revision: https://reviews.freebsd.org/D31250
MFC after: 2 weeks

umtx: Split do_unlock_pi into two counterparts.

The umtx_pi_drop() will be used by the Linux emulation layer.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31238
MFC after: 2 weeks

umtx: Expose some of the pi umtx structures and API to the rest of the kernel.

Differential Revision: https://reviews.freebsd.org/D31237
MFC after: 2 weeks

linux(4): Eliminate unused includes.

MFC after: 2 weeks

linux(4): Reimplement futexes using umtx.

Differential Revision: https://reviews.freebsd.org/D31236
MFC after: 2 weeks

umtx: Add umtxq_requeue Linux emulation layer extension.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31235
MFC after: 2 weeks

umtx: Add bitset conditional wakeup functionality.

The bitset is a Linux emulation layer extension. This 32-bit mask, in which at
least one bit must be set, is used to select which threads should be woken up.

The bitset is stored in the umtx_q structure, which is used to enqueue the waiter
into the umtx waitqueue. Put the bitset into the hole that appears on LP64 due
to data alignment, to prevent struct umtx_q from growing.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31234
MFC after: 2 weeks
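
The selection rule described above can be sketched in userland as follows.
This is only an illustration under assumptions: struct waiter and
bitset_wake() are hypothetical names, and the real struct umtx_q and wakeup
path carry much more state and locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical waiter record; the real struct umtx_q has many more fields. */
struct waiter {
	uint32_t bitset;	/* bits this waiter is interested in */
	int	 woken;		/* set to 1 once selected for wakeup */
};

/*
 * Wake up to nwake waiters whose bitset intersects 'bitset'.
 * Stores the number woken in *nwoken and returns 0, or returns EINVAL
 * if 'bitset' is empty (at least one bit must be set).
 */
int
bitset_wake(struct waiter *q, size_t nq, uint32_t bitset, int nwake,
    int *nwoken)
{
	size_t i;
	int count = 0;

	assert(q != NULL && nwoken != NULL);
	if (bitset == 0)
		return (EINVAL);
	for (i = 0; i < nq && count < nwake; i++) {
		/* Skip waiters already woken or not matching the mask. */
		if (q[i].woken || (q[i].bitset & bitset) == 0)
			continue;
		q[i].woken = 1;
		count++;
	}
	*nwoken = count;
	return (0);
}
```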

umtx: Expose some of the umtx structures and API to the rest of the kernel.

Differential Revision: https://reviews.freebsd.org/D31233
MFC after: 2 weeks

umtx: Expose struct abs_timeout to the rest of the kernel.

Add the umtx_ prefix to the whole abs_timeout facility and add declarations for it.
For consistency with the other abs_timeout functions, mark abs_timeout_init2 inline.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31249
MFC after: 2 weeks

umtx: Split umtx.h into two counterparts.

To prevent umtx.h from being polluted by future changes, split it into two headers:
umtx.h - ABI header for userspace;
umtxvar.h - the kernel stuff.

While here fix umtx_key_match style.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31248
MFC after: 2 weeks

freebsd32: Remove the unnecessary spaces.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31247
MFC after: 2 weeks

freebsd32: Remove unused umtx.h include.

Differential Revision: https://reviews.freebsd.org/D31246
MFC after: 2 weeks

freebsd32: Eliminate spaces at end of line.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31245
MFC after: 2 weeks

Fix mac_veriexec version mismatch

mac_veriexec sets its version to 1, but the mac_veriexec_shaX modules which depend on it expect MAC_VERIEXEC_VERSION = 2.
Be consistent and use MAC_VERIEXEC_VERSION everywhere.
This unbreaks loading of mac_veriexec modules at boot time.

Authored by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D31268

Add missing arm64 ID registers

These may contain values we export to userspace.

Sponsored by: The FreeBSD Foundation

Sort the arm64 ID_AA64* user registers

Sponsored by: The FreeBSD Foundation

Minor language improvements. Note that they can't be changed
by sysctl (I think they can be changed as a tuneable.)

virtio: enable VTNET_LEGACY_TX when ALTQ is enabled.

ALTQ only works on network drivers which use if_start (rather than
if_transmit). vtnet uses if_start if built with VTNET_LEGACY_TX. Default
to that when the kernel is built with ALTQ enabled, to reduce user surprise.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")

UPDATING: document if_bridge MTU changes

Sponsored by: Rubicon Communications, LLC ("Netgate")

linux(4): Fix gcc build.

The gcc build failed because gcc did not inline the builtins and generated
calls to libgcc; ld cannot find libgcc because the cross-toolchain libgcc is
not installed.  To avoid this, add internal vDSO ffs functions without
optimized builtins.

Reported by: jhb
MFC after: 2 weeks

hexdump: Flush stdout after '*' (repeat) lines.

The canonical annoying example being: hexdump < /dev/zero | less

libc qsort(3): Eliminate ambiguous sign comparison

The left side of the MIN() expression is the (signed) result of pointer
subtraction (ptrdiff_t).  The right-hand side is also the (signed)
result of pointer subtraction, additionally subtracting the element size
('es'), which is unsigned size_t.  This coerces the right-hand
expression into an unsigned value.  MIN(signed, unsigned) triggers
-Wsign-compare.

Sorting elements of size greater than SSIZE_MAX is nonsensical, so we
can instead treat the element size as ssize_t, leaving the right-hand
result the same signedness as the left.

Reviewed by: arichardson, kib
Differential Revision: https://reviews.freebsd.org/D31292
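
A small sketch of the signed form of the computation, assuming a
hypothetical helper min_signed(); it is not the qsort(3) source.  Converting
the element size to ssize_t before subtracting keeps both operands of the
comparison signed, so no implicit conversion to unsigned takes place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Return the smaller of 'left' and 'span - es', doing the whole
 * computation in signed arithmetic.  The element size is assumed to
 * fit in a signed type, as argued in the commit message.
 */
ptrdiff_t
min_signed(ptrdiff_t left, ptrdiff_t span, size_t es)
{
	ptrdiff_t right;

	assert(es <= PTRDIFF_MAX);
	right = span - (ssize_t)es;
	return (left < right ? left : right);
}
```

Had 'es' been left as size_t, 'span - es' would be computed as an unsigned
value and a mathematically negative result would wrap to a very large one.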

kern: remove deprecated makesyscalls.sh

makesyscalls was rewritten in Lua and introduced in d3276301ab. In the
time since, no objections have been raised, and a warning was introduced long
ago on invocation of makesyscalls.sh that it would be removed before
FreeBSD 13. Belatedly follow through on that.

cli.lua.8: make the command match the code

It's disable-device, not device-disable

Spotted by: jrtc27
Sponsored by: Netflix

Refactor/optimize cpu_search_*().

Remove cpu_search_both(), unused for many years.  Without it there is
less sense for the trick of compiling common cpu_search() into separate
cpu_search_lowest() and cpu_search_highest(), so split them completely,
making code more readable.  While there, split iteration over children
groups and CPUs, complicating code for very small deduplication.

Stop passing cpuset_t arguments by value and avoid some manipulations.
Since MAXCPU bump from 64 to 256, what was a single register turned
into 32-byte memory array, requiring memory allocation and accesses.
Splitting struct cpu_search into parameter and result parts allows
stack usage to be reduced even further, since the first can be passed through
on recursion.

Remove CPU_FFS() from the hot paths, precalculating first and last CPU
for each CPU group in advance during initialization.  Again, it was
not a problem for 64 CPUs before, but for 256 FFS needs much more code.

With these changes on 80-thread system doing ~260K uncached ZFS reads
per second I observe ~30% reduction of time spent in cpu_search_*().

MFC after: 1 month

debugnet: Fix false-positive assertions for dp_state

debugnet_handle_arp:
  An assertion is present to ensure the pcb is only modified when the state is
  DN_STATE_INIT. Because debugnet_arp_gw() is asynchronous it is possible for
  ARP replies to come in after the gateway address is known and the state
  already changed.

debugnet_handle_ip:
  Similarly it is possible for packets to come in, from the expected
  server, during the gateway mac discovery phase.  This can happen from
  testing disconnects / reconnects in quick succession.  This later
  causes some acks to be sent back but hit an assertion because the
  state is wrong.

Reviewed by: cem, debugnet_handle_arp: markj, vangyzen
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D31327

lua loader: Add disable-device to disable a device.

disable-device fooX will set hint.foo.X.disabled=1 as a way to easily
disable a device attaching during boot.

Reviewed by: tsoome
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31297
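
The mapping from a device name to its hint can be sketched in C as below.
The loader command itself is implemented in Lua; disable_device_hint() is a
hypothetical helper written only to illustrate the "fooX" to
hint.foo.X.disabled=1 transformation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "hint.<name>.<unit>.disabled=1" from a device string such as
 * "uart1".  Returns 0 on success, or -1 if dev has no name, has no
 * trailing unit number, or the result does not fit in out.
 */
int
disable_device_hint(const char *dev, char *out, size_t outlen)
{
	size_t len, split;
	int n;

	assert(dev != NULL && out != NULL);
	len = strlen(dev);
	split = len;
	/* Walk back over the trailing digits that form the unit number. */
	while (split > 0 && isdigit((unsigned char)dev[split - 1]))
		split--;
	if (split == 0 || split == len)
		return (-1);
	n = snprintf(out, outlen, "hint.%.*s.%s.disabled=1",
	    (int)split, dev, dev + split);
	if (n < 0 || (size_t)n >= outlen)
		return (-1);
	return (0);
}
```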

nfscl: Cache an open stateid for the "oneopenown" mount option

For NFSv4.1/4.2, if the "oneopenown" mount option is used,
there is, at most, only one open stateid for each NFS vnode.
When an open stateid for a file is acquired, set a pointer to
the open structure in the NFS vnode.  This pointer can be used to
acquire the open stateid without searching the open linked list
when the following is true:
- No delegations have been issued for the file.  Since delegations
  can outlive an NFS vnode for a file, use the global
  NFSMNTP_DELEGISSUED flag on the mount to determine this.
- No lock stateid has been issued for the file.  To determine
  this, a new NFS vnode flag called NMIGHTBELOCKED is set when a lock
  stateid is issued, which can then be tested.

When this open structure pointer can be used, it avoids the need to
acquire the NFSCLSTATELOCK() and searching the open structure list for
an open.  The NFSCLSTATELOCK() can be highly contended when there are
a lot of opens issued for the NFSv4.1/4.2 mount.

This patch only affects NFSv4.1/4.2 mounts when the "oneopenown"
mount option is used.

MFC after: 2 weeks

nfscl: Set correct lockowner for "oneopenown" mount option

For NFSv4.1/4.2, the client may use either an open, lock or
delegation stateid as the stateid argument for an I/O operation.
RFC 5661 defines an order of preference of delegation, then lock
and finally open stateid for the argument, although NFSv4.1/4.2
servers are expected to handle any stateid type.

For the "oneopenown" mount option, the lock owner was not being
correctly generated and, as such, the I/O operation would use an
open stateid, even when a lock stateid existed. Although this
did not and should not affect an NFSv4.1/4.2 server's behaviour,
this patch makes the behaviour for "oneopenown" the same as when
the mount option is not specified.

Found during inspection of packet captures. No failure during
testing against NFSv4.1/4.2 servers of the unpatched code occurred.

MFC after: 2 weeks

pkgbase: improve pkg --version parsing

In some cases `pkg --version` might produce unexpected or additional
output. Use a regex /^[0-9.]+$/ to match only the line containing the
version number.

Reported by: Michael Butler on freebsd-current@
Fixes: 4e224e4be7c3 ("pkgbase: accommodate pkg < 1.17")
Sponsored by: The FreeBSD Foundation
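
The same line filter can be sketched in C with POSIX regcomp(3).  This is an
illustration only: the build scripts do this with a regex in the shell
tooling, and find_version_line() is a hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Copy the first line of 'output' that consists only of digits and
 * dots into 'ver'.  Returns 0 on success, or -1 if no such line
 * exists or the pattern fails to compile; lines that do not fit in
 * 'ver' are skipped.
 */
int
find_version_line(const char *output, char *ver, size_t verlen)
{
	regex_t re;
	const char *line, *eol;
	size_t n;
	int found = -1;

	assert(output != NULL && ver != NULL);
	if (regcomp(&re, "^[0-9.]+$", REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	line = output;
	while (*line != '\0') {
		eol = strchr(line, '\n');
		if (eol == NULL)
			eol = line + strlen(line);
		n = (size_t)(eol - line);
		if (n < verlen) {
			/* Match each line on its own, without the newline. */
			memcpy(ver, line, n);
			ver[n] = '\0';
			if (regexec(&re, ver, 0, NULL, 0) == 0) {
				found = 0;
				break;
			}
		}
		line = (*eol == '\n') ? eol + 1 : eol;
	}
	regfree(&re);
	if (found != 0 && verlen > 0)
		ver[0] = '\0';
	return (found);
}
```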

Do not expose to scheduler caches of single CPU.

Before this change my dual-Xeon(R) Gold 6242R always reported 3 levels
of topology (root, package/L3 and core/L2). But with SMT disabled
core/L2 matches thread, so the additional topology level only causes more
traversal work. With this change the SMT case is reported the same as before,
while the non-SMT case is reported with only 2 much simpler levels.

MFC after: 2 weeks

compiler-rt: build out-of-line LSE atomics helpers for aarch64

Both clang >= 12 and gcc >= 10.1 now default to -moutline-atomics for
aarch64. This requires a bunch of helper functions in libcompiler_rt.a,
to avoid link errors like "undefined symbol: __aarch64_ldadd8_acq_rel".

(Note: of course you can use -mno-outline-atomics as a workaround too,
but this would negate the potential performance benefit of the faster
LSE instructions.)

Bump __FreeBSD_version so ports maintainers can easily detect this.

PR: 257392
MFC after: 2 weeks
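
For illustration, here is the kind of code that can end up depending on these
helpers.  When built for aarch64 with -moutline-atomics, a compiler may lower
the 8-byte acquire-release fetch-and-add below to a call such as
__aarch64_ldadd8_acq_rel instead of inline instructions; on other targets it
compiles to ordinary atomics.  counter_add() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Atomically add 'delta' to '*counter' with acquire-release ordering
 * and return the previous value.
 */
uint64_t
counter_add(uint64_t *counter, uint64_t delta)
{
	assert(counter != NULL);
	return (__atomic_fetch_add(counter, delta, __ATOMIC_ACQ_REL));
}
```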

bridge tests: verify that we can't change MTU of bridge member interfaces

Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31305

net: disallow MTU changes on bridge member interfaces

if_bridge member interfaces should always have the same MTU as the
bridge itself, so disallow MTU changes on interfaces that are part of an
if_bridge.

Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31304

bridge tests: test changing the bridge MTU

Changing the bridge MTU will now also change all of the member interface
MTUs. Test this.

Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31289

if_bridge: allow MTU changes

if_bridge used to only allow MTU changes if the new MTU matched that of
all member interfaces. This doesn't really make much sense, in that we
really shouldn't be allowed to change the MTU of bridge member in the
first place.

Instead we now change the MTU of all member interfaces. If one fails we
revert all interfaces back to the original MTU.

We do not address the issue where bridge member interface MTUs can be
changed here.

Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31288
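
The "change all, revert on failure" behaviour can be sketched as follows.
The types and functions here (struct member, member_set_mtu(),
bridge_set_mtu()) are hypothetical stand-ins, not the if_bridge code, and
'max_mtu' merely models a driver rejecting an MTU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical member interface; 'max_mtu' stands in for driver limits. */
struct member {
	int mtu;
	int max_mtu;
};

/* Hypothetical per-interface setter: fails if the MTU is out of range. */
static int
member_set_mtu(struct member *m, int mtu)
{
	if (mtu <= 0 || mtu > m->max_mtu)
		return (-1);
	m->mtu = mtu;
	return (0);
}

/*
 * Set new_mtu on every member.  If any member rejects it, restore the
 * members already changed to old_mtu and return -1.
 */
int
bridge_set_mtu(struct member *members, size_t n, int old_mtu, int new_mtu)
{
	size_t i, j;

	assert(members != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (member_set_mtu(&members[i], new_mtu) != 0) {
			for (j = 0; j < i; j++)
				(void)member_set_mtu(&members[j], old_mtu);
			return (-1);
		}
	}
	return (0);
}
```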

loader: support.4th resets the read buffer incorrectly

Large nextboot.conf files (over 80 bytes) are not read correctly by the
Forth loader, causing file parsing to abort, and nextboot configuration
fails to apply.

Simple repro:

nextboot -e foo=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
shutdown -r now

That will cause the bug to cause a parse failure but shouldn't otherwise
affect the boot.  Depending on your loader configuration, you may also
have to set beastie_disable and/or reduce the number of modules loaded
to see the error on a small console screen.  12.0 or CURRENT users will
also have to explicitly use the Forth loader instead of the Lua loader.
The error will look something like:

Warning: syntax error on file /boot/loader.conf.local
foo="xxxxxxxxxxxxxxnextboot_enable="YES"
                                    ^

/boot/support.4th has crude file I/O buffering, which uses a buffer
'read_buffer', defined to be 80 bytes by the 'read_buffer_size'
constant.  The loader first tastes nextboot.conf, reading and parsing
the first line in it for nextboot_enable="YES".  If this is true, then
it reopens the file and parses it like other loader .conf files.

Unfortunately, the file I/O buffering code does not fully reset the
buffer state in the reset_line_reading word.  If the last file was read
to the end, that doesn't matter; the file buffer is treated as empty
anyway.  But in the nextboot.conf case, the loader will not read to the
end of file if it is over 80 bytes, and the file buffer may be reused
when reading the next file.  When the file is reread, the corrupt text
may cause file parsing to abort on bad syntax (if the corrupt line has
<>2 quotes in it), the wrong variable to be set, no variable to be set
at all, or (if the splice happens to land at a line ending) something
approximating normal operation.

The bug is very old, dating back to at least 2000 if not before, and is
still present in 12.0 and CURRENT r345863 (though it is now hidden by
the Lua loader by default).

Suggested one-line attached.  This does change the behavior of the
reset_line_reading word, which is exported in the line-reading
dictionary (though the export is not documented in loader man pages).
But repo history shows it was probably exported for the PNP support
code, which was never included in the loader build, and was removed 5
months ago.

One thing that puzzles me: how has this bug gone unnoticed/unfixed for
nearly 2 decades?  I find it hard to believe that nobody's tried to do
something interesting with nextboot, like load a kernel and filesystem,
which is what I'm doing.

Tested by: Gary Jennejohn
PR: 239315
MFC After: 3 weeks
Reviewed by: imp (and correctly applied this time)
Differential Revision: https://reviews.freebsd.org/D31328
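
The class of bug described above can be illustrated with a small C model.
This is not a translation of support.4th: the structure, its fields and the
two reset functions are hypothetical, and the sketch only assumes that an
incomplete reset leaves stale bytes looking valid to the next reader.

```c
#include <assert.h>
#include <stddef.h>

#define READ_BUFFER_SIZE 80

struct line_reader {
	char   buf[READ_BUFFER_SIZE];
	size_t len;	/* number of valid bytes in buf */
	size_t pos;	/* index of the next unread byte */
};

/* Incomplete reset: rewinds the cursor but keeps the old length. */
void
reset_partial(struct line_reader *r)
{
	assert(r != NULL);
	r->pos = 0;
}

/* Full reset: the buffer is treated as empty afterwards. */
void
reset_full(struct line_reader *r)
{
	assert(r != NULL);
	r->pos = 0;
	r->len = 0;
}

/* Bytes the reader still believes are pending from the current file. */
size_t
pending(const struct line_reader *r)
{
	assert(r != NULL && r->pos <= r->len);
	return (r->len - r->pos);
}
```

In this model, a file abandoned part way through leaves len nonzero, so after
reset_partial() the next file appears to start with leftover bytes from the
previous one.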

genoffset: simplify and rewrite in sh

genoffset used the fully generic ASSYM macro to generate the offsets
needed for the thread_lite structure. However, since these are offsets
into a structure, they will always be necessarily small and positive. As
such, just create a simple character array of the right size and use a
naming convention such that we can recover the field name, structure
name and type. Use nm -t d and sort -n to sort these into order, then
loop over the results to generate the thread_lite structure.

MFC After: 2 weeks
Reviewed by: kib, markj (earlier versions)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31203
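
The character-array trick can be sketched as follows.  struct demo_thread and
the symbol naming scheme are invented for this example; the real convention
used by genoffset may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure standing in for struct thread. */
struct demo_thread {
	long  td_lock;
	int   td_flags;
	void *td_pcb;
};

/* Zero-length arrays are not standard C, so offset 0 is not encoded here. */
static_assert(offsetof(struct demo_thread, td_flags) > 0,
    "offset must be positive to be usable as an array size");

/*
 * Encode each field offset as the size of a char array.  The symbol
 * name carries the structure and field names, so a tool such as nm can
 * recover both the name and the size (the offset) from the object file.
 */
char demo_thread__td_flags__offset[offsetof(struct demo_thread, td_flags)];
char demo_thread__td_pcb__offset[offsetof(struct demo_thread, td_pcb)];
```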

genassym.sh: Fix two minor issues found by shellcheck

o Remove redundant $ in $(( )) expression.
o Quote arg passed to work so paths with spaces, etc will work.

MFC After: 2 weeks
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31335

vnic: add TODO list item for multicast filter support

PR: 223573

powerpc: change mfpvr return type to uint32_t

As the Processor Version Register (PVR) is a 32-bit PowerPC
register, change mfpvr() return type to match it and avoid
type casts on its callers.

Suggested by: jhibbits
Reviewed by: jhibbits, imp
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31332

socket: Implement SO_RERROR

SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we lose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by: philip (network), kbowling (transport), gbe (manpages)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26652
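
A minimal sketch of how a program might opt in, assuming the option is set
at the SOL_SOCKET level with an int value like other boolean socket options.
enable_rerror() is a hypothetical wrapper, and the #ifdef keeps the example
buildable on systems whose headers lack SO_RERROR.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/*
 * Ask the kernel to report receive-buffer overflows on socket s.
 * Returns 0 on success, -1 on failure.  Where the headers do not
 * define SO_RERROR, fails with errno set to ENOPROTOOPT.
 */
int
enable_rerror(int s)
{
#ifdef SO_RERROR
	int on = 1;

	return (setsockopt(s, SOL_SOCKET, SO_RERROR, &on, sizeof(on)));
#else
	(void)s;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

A route(4) listener that enables the option would then treat a receive error
caused by an overflow as the signal to reload the full system state.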

Add zfskeys rc.d script for auto-loading encryption keys

ZFS in 13 supports encryption, but for the use case where keys are
available in plaintext on disk there is no mechanism for automatically
loading keys on startup.

This script will, by default, look for any dataset with encryption and
keylocation prefixed with file://. It will attempt to unlock, timing
out after 10 seconds for each dataset found.
User can optionally specify explicitly which datasets to attempt to
unlock.

Also supports (optionally by force) unmounting filesystems and unloading
associated keys.

Sponsored by: Modirum
Differential Revision: https://reviews.freebsd.org/D30015

LinuxKPI: add read_poll_timeout()

Add an implementation of read_poll_timeout() and the atomic variant
which I did at some point last year for rtw88 and now updated based
on feedback.

MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30980

xen: introduce xen_has_percpu_evtchn()

xen_vector_callback_enabled is x86 specific and availability of
per-cpu event channel delivery differs on other architectures.

Introduce a new helper to check if there's support for per-cpu event
channel injection.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29402

xen/control: print warning on call of xctrl_suspend()

Presently suspend/resume and migration aren't supported on Xen/ARM. As
such this shouldn't ever occur.

This likely applies to future Xen architectures (RISC-V) and
xctrl_suspend() needs dependency on intr_machdep.h fixed.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29599

xen/grant_table: cleanup max_nr_grant_frames()

This is no more or less than returning the smaller of two values. Since
this is what min() does, use that to shrink max_nr_grant_frames() down
to the single line.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29840

xen/control: introduce xen_pv_shutdown_handler()

x86 only registers the PV shutdown handler for PV guests, whereas ARM guests
always use HVM and still require the PV shutdown handler.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29406

xen: introduce xen_pv_disks_disabled()

ARM guests are considered HVM in FreeBSD, but they only support PV disks
(no emulation available).

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29403

xen/netfront: introduce xen_pv_nics_disabled()

ARM guests are considered HVM, but they only support PV NICs (no
emulation available).

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29405

xen/xen-os: move inclusion of machine/xen-os.h later

Several of x86 enable/disable functions depend upon the xen*domain()
functions. As such the xen*domain() functions need to be declared
before machine/xen-os.h.

Officially declare direct inclusion of machine/xen/xen-os.h verboten as
such will break these functions/macros. Remove one such soon to be
broken inclusion.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29811

xen/xenpv: remove low memory limit for non-x86

For embedded devices reserved addresses will be known in advance. More
recently added devices will also likely be correctly updated. As a
result using any available address is reasonable on non-x86.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29304

xen/intr: use __func__ instead of function names

Functions tend to get renamed, and unless the developer is careful
debugging messages are often missed. As such, using __func__ is far
superior. Replace several instances of hard-coded function names.

Reviewed by: royger
Differential revision: https://reviews.freebsd.org/D29499
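
A short illustration of the pattern, with a made-up macro DEBUG_MSG() and a
made-up function name: because __func__ names the enclosing function, the
message prefix follows any later rename automatically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a debug message prefixed with the name of the function that
 * uses the macro.  __func__ is evaluated where the macro is expanded.
 */
#define DEBUG_MSG(buf, len, msg) \
	snprintf((buf), (len), "%s: %s", __func__, (msg))

/* Hypothetical function; returns the length snprintf() reports. */
int
bind_event_channel(char *buf, size_t len)
{
	assert(buf != NULL);
	return (DEBUG_MSG(buf, len, "failed to bind"));
}
```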

xen/timer: make xen timer optional

The timer is not used on ARM.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29041

xen/intr: use struct xenisrc * as xen_intr_handle_t

Since xen_intr_handle_t is meant to be an opaque handle and the only
use is retrieving the associated struct xenisrc *, directly use it as
the opaque handler.

Also add a wrapper function for converting the other direction. If some
other value becomes appropriate in the future, these two functions will
be the only spots needing modification.

Reviewed by: mhorne, royger
Differential Revision: https://reviews.freebsd.org/D29500

xen/control: gate x86 specific code in the preprocessor

Commit 152265223048 was implemented strictly for x86. Unfortunately
one of the pieces was mixed into a common area breaking other
architectures. For now disable these bits on !x86, this should be
cleaned up later.

Fixes: 152265223048 ('xen: fix dropping bitmap IPIs during resume')
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29306

xen/xen-os: don't let anyone else defining __XEN_INTERFACE_VERSION__

FreeBSD should always use the same version across the source. Anything
else is asking for problems.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29407

xen: create VM_MEMATTR_XEN for Xen memory mappings

The requirements for pages shared with Xen/other VMs may vary from
architecture to architecture. As such create a macro which various
architectures can use.

Remove a use of PAT_WRITE_BACK in xenstore.c. This is a x86-ism which
shouldn't have been present in a common area.

Original idea: Julien Grall <julien@xen.org>, 2014-01-14 06:44:08
Approach suggested by: royger
Reviewed by: royger, mhorne
Differential Revision: https://reviews.freebsd.org/D29351

xen: move x86/xen/xenpv.c to dev/xen/bus/xenpv.c

Minor changes are necessary to make this processor-independent, but
moving the file out of x86 and into common is the first step (so
others don't add /more/ x86-isms).

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29042

Add macros for arm64 special reg op and CR values

Use these to simplify the definition of the user_regs array.

Reviewed by: imp, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31333

pf: Validate user string nul-termination before copying

Some pf ioctl handlers use strlcpy() to copy strings when converting
from user structures to their in-kernel representations.  strlcpy()
ensures that the destination will be nul-terminated, but it assumes that
the source is nul-terminated.  In particular, it returns the full length
of the source string, so if the source is not nul-terminated, strlcpy()
will keep scanning until it finds a nul byte, and it may encounter an
unmapped page first.  Add a helper to validate user strings before
copying.

There are also places where we look up a ruleset using a user-provided
anchor string.  In some ioctl handlers we were already nul-terminating
the string, avoiding the same problem, but in other places we were not.
Fix those by nul-terminating as well.  Aside from being consistent,
anchors have a maximum length of MAXPATHLEN - 1 so calling strnlen()
might not be so desirable.

Reported by: syzbot+35a1549b4663e9483dd1@syzkaller.appspotmail.com
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31169
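
A userland sketch of such a validating helper, under the assumption that the
source is a fixed-size field that may lack a terminator.  copy_user_str() is
a hypothetical name and is not the helper added to pf.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Copy a fixed-size, possibly unterminated source field into dst.
 * Returns EINVAL if src has no nul byte within srclen bytes or if the
 * string does not fit in dst; returns 0 on success.
 */
int
copy_user_str(char *dst, size_t dstlen, const char *src, size_t srclen)
{
	const char *end;
	size_t n;

	assert(dst != NULL && src != NULL);
	/* Never scan past the end of the source field. */
	end = memchr(src, '\0', srclen);
	if (end == NULL)
		return (EINVAL);
	n = (size_t)(end - src);
	if (n >= dstlen)
		return (EINVAL);
	memcpy(dst, src, n + 1);
	return (0);
}
```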

pf: Initialize arrays before copying out to userland

A number of pf ioctls populate an array of structures and copy it out.
These handlers share the following structure:
- caller specifies the size of its output buffer
- ioctl handler allocates a kernel buffer of the same size
- ioctl handler populates the buffer, possibly leaving some items
  uninitialized if the caller provided more space than needed
- ioctl handler copies the entire buffer out to userland

Thus, if more space was provided than is required, we end up copying out
uninitialized kernel memory. Simply zero the buffer at allocation time
to prevent this.

Reported by: KMSAN
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31313
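
The fix can be modelled in userland with calloc(3), which plays the role that
a zeroing allocation (for example M_ZERO with malloc(9)) plays in the kernel.
struct item and build_reply() are hypothetical and exist only for this
sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record copied out to the caller. */
struct item {
	int  id;
	char name[12];
};

/*
 * Allocate room for 'cap' items and fill in only the first 'used'.
 * calloc() zeroes the whole allocation, so the unused tail holds no
 * stale heap contents when all 'cap' items are later copied out.
 * Returns NULL on allocation failure; the caller frees the buffer.
 */
struct item *
build_reply(size_t cap, size_t used)
{
	struct item *buf;
	size_t i;

	assert(used <= cap);
	buf = calloc(cap, sizeof(*buf));
	if (buf == NULL)
		return (NULL);
	for (i = 0; i < used; i++) {
		buf[i].id = (int)i + 1;
		strcpy(buf[i].name, "entry");
	}
	return (buf);
}
```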

LinuxKPI: add fsleep()

Add the fsleep() function, now required by rtw88. On Linux it appears to
choose how to sleep based on the requested sleep time. Given that our
compat framework is already lenient about how long it sleeps, this is a
cut-down version.
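
To illustrate what choosing the sleep mechanism by duration can look like,
here is a small sketch. The enum, the function name and the thresholds are
assumptions made for illustration only; they are not taken from this commit
and should not be read as the exact Linux or LinuxKPI behavior.

```c
/* Hypothetical classification of sleep strategies. */
enum sleep_kind {
	SLEEP_BUSY_WAIT,	/* very short: spin instead of sleeping */
	SLEEP_RANGE,		/* medium: timer-based sleep with slack */
	SLEEP_MSEC		/* long: millisecond-granularity sleep */
};

/*
 * Pick a strategy from the requested duration in microseconds.
 * The 10 us and 20000 us cut-offs are illustrative assumptions.
 */
static enum sleep_kind
pick_sleep(unsigned long usecs)
{
	if (usecs <= 10)
		return (SLEEP_BUSY_WAIT);
	if (usecs <= 20000)
		return (SLEEP_RANGE);
	return (SLEEP_MSEC);
}
```

A compat layer that is already lenient about sleep lengths can collapse
these cases into a single sleep call.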

MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D31322

LinuxKPI: dmi.h do not rely on implicit includes

Add sys/types.h to dmi.h and do not rely on other files to include
all needed headers in Linux land. I ran into compile problems with
rtw88 otherwise.

MFC after: 3 days

pf: fix ABI breakage

The introduction of synproxy support changed the size of struct
pf_status, which in turn broke the userspace ABI.

Revert the relevant change. More work is needed on the synproxy code to
keep and expose the counters, but in the meantime this restores the
ABI.
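
As a sketch of why a size change breaks the ABI: binaries compiled against
the old layout pass a buffer of the old size and read fields at the old
offsets. The two structures below are hypothetical stand-ins, not the real
struct pf_status.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout that existing binaries were built against. */
struct status_v1 {
	uint64_t	counters[4];
	uint32_t	running;
	uint32_t	debug;
};

/* Hypothetical "after" layout with new counters added in the middle. */
struct status_v2 {
	uint64_t	counters[4];
	uint64_t	extra[2];
	uint32_t	running;
	uint32_t	debug;
};

/* Pinning the size at compile time catches accidental layout changes. */
_Static_assert(sizeof(struct status_v1) == 40, "status_v1 layout changed");

/* Returns nonzero only if old binaries would still see the same layout. */
static int
abi_compatible(void)
{
	return (sizeof(struct status_v1) == sizeof(struct status_v2) &&
	    offsetof(struct status_v1, running) ==
	    offsetof(struct status_v2, running));
}
```

ioctl command numbers also encode the size of their argument, so a resized
structure changes the command number that userland and kernel must agree on.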

PR:             257469
MFC after:      3 days
Sponsored by:   Modirum MDPay

mlx5/mlx4: Bump driver version to 3.7

While at it, only output the driver version to dmesg(8) when hardware is present.

Differential Revision: https://reviews.freebsd.org/D29100
MFC after: 1 week
Reviewed by: kib and markj
Sponsored by: NVIDIA Networking

booti: Enable loading the kernel image to any address aligned to 2 MB

We've supported this for a long time, plus most u-boot setups quietly expect
it. Otherwise they fail with different levels of memory overwrites.
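
For reference, a 2 MB alignment check and round-up can be sketched as below.
The macro and function names are illustrative and are not taken from the
booti code.

```c
#include <stdbool.h>
#include <stdint.h>

#define	ALIGN_2M	(UINT64_C(2) << 20)	/* 2 MB = 0x200000 */

/* True if addr is a multiple of 2 MB. */
static bool
is_2m_aligned(uint64_t addr)
{
	return ((addr & (ALIGN_2M - 1)) == 0);
}

/* Round addr up to the next 2 MB boundary (no change if already aligned). */
static uint64_t
round_up_2m(uint64_t addr)
{
	return ((addr + ALIGN_2M - 1) & ~(ALIGN_2M - 1));
}
```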

MFC after: 2 weeks

ibcore: Kernel space update based on Linux 5.7-rc1.

Overview:

This is the first stage of an RDMA stack upgrade, introducing kernel
changes only, based on Linux 5.7-rc1.

This patch covers four main areas of work:
- Update of the IB uobjects system:
  - The memory holding so-called AH, CQ, PD, SRQ and UCONTEXT objects
    is now managed by ibcore. This also requires some changes in the
    kernel verbs API. The verbs API changes are mostly about
    initializing and deinitializing objects, and removing the
    allocation and freeing of memory.

- Update of the uverbs IOCTL framework:
  - The parsing and handling of user-space commands has been
    completely refactored to integrate with the updated IB uobjects
    system.

- Various changes and updates to the generic uverbs interfaces in
  device drivers including the new uAPI surface.

- The mlx5_ib_devx.c in mlx5ib and related mlx5 core changes.

Dependencies:

- The mlx4ib driver code has been updated with the minimum changes
needed.

- The mlx5ib driver code has been updated with the minimum changes
needed including DV support.

Compatibility:

- All user-space facing APIs are backwards compatible after this
  change.

- All kernel-space facing RDMA APIs are backwards compatible after
  this change, with the exception of ib_create_ah() and ib_destroy_ah(),
  which take a new flag.

- The "ib_device_ops" structure exists, but only contains the driver ID
  and some structure sizes.

Differences from Linux:

- Infiniband drivers must use the INIT_IB_DEVICE_OPS() macro to set
  the sizes needed for allocating various IB objects, when adding
  IB device instances.
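
The idea of a driver declaring object sizes so that the core can own
allocation can be sketched as follows. Every name here (toy_ops,
core_create_pd() and so on) is hypothetical; this is not the ibcore API or
the INIT_IB_DEVICE_OPS() macro, only an illustration of the ownership
split described above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Core-visible part of an object; drivers embed it as the first member. */
struct core_pd {
	int	handle;
};

/* Hypothetical ops table: the driver states its object size up front. */
struct toy_ops {
	size_t	size_pd;
	int	(*init_pd)(struct core_pd *);
	void	(*deinit_pd)(struct core_pd *);
};

/* Driver-private object wrapping the core object. */
struct toy_drv_pd {
	struct core_pd	base;
	int		hw_id;
};

/* The driver only initializes; it no longer allocates. */
static int
toy_init_pd(struct core_pd *pd)
{
	((struct toy_drv_pd *)pd)->hw_id = 42;
	return (0);
}

/* The driver only deinitializes; it no longer frees. */
static void
toy_deinit_pd(struct core_pd *pd)
{
	((struct toy_drv_pd *)pd)->hw_id = 0;
}

/* The core allocates using the driver-declared size, then initializes. */
static struct core_pd *
core_create_pd(const struct toy_ops *ops)
{
	struct core_pd *pd;

	pd = calloc(1, ops->size_pd);
	if (pd == NULL)
		return (NULL);
	if (ops->init_pd(pd) != 0) {
		free(pd);
		return (NULL);
	}
	return (pd);
}

/* The core deinitializes through the driver, then frees the memory. */
static void
core_destroy_pd(const struct toy_ops *ops, struct core_pd *pd)
{
	ops->deinit_pd(pd);
	free(pd);
}
```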

Security:

- PRIV_NET_RAW is needed to use raw ethernet transmit features.
- PRIV_DRIVER is needed to use other privileged operations.

Based on upstream Linux, Torvalds (5.7-rc1):
8632e9b5645bbc2331d21d892b0d6961c1a08429

MFC after: 1 week
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31149
Sponsored by: NVIDIA Networking

Regen

Revert most of ce42e793100b460f597e4c85ec0da12e274f9394

to restore ABI compatibility for pre-10.x binaries.

It restores _umtx_lock() and _umtx_unlock() syscalls, and UMTX_OP_LOCK/
UMTX_OP_UNLOCK umtx_op(2) operations. UMUTEX_ERROR_CHECK flag is left
out for now, I do not think it makes a difference.

PR: 218571
Reviewed by: brooks (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31220

Fix the spelling of '*/' in the vnic driver

Sponsored by: The FreeBSD Foundation

Teach the arm64 kernel to identify the Arm AEM

The Arm Architecture Envelope Model is a simulator that models the
architecture rather than any specific implementation. Add its part ID
macro and add it to the list of Arm CPUs we can decode.

Sponsored by: The FreeBSD Foundation

vnic: add TODO list item for non-promisc mode

Also drop ARM64TODO comments; this is an issue with this specific
driver, not a general arm64 issue.

PR: 223575