Roger Pau Monné [Fri, 2 Feb 2024 10:56:32 +0000 (11:56 +0100)]
x86/xen: implement early init hook
Unify the HVM and PVH early setup, by making both rely on the hypervisor
initialization hook part of identify_hypervisor().
The current initialization takes care of the hypercall page, the shared info
page and does any fixup necessary to the video console metadata if
FreeBSD is booted as the initial domain (so the video console is handed from
Xen into FreeBSD).
Note this has the nice side effect of also allowing use of the Xen console on
HVM guests, which makes it possible to get rid of the QEMU emulated uart and
still get a nice text console.
Roger Pau Monné [Fri, 2 Feb 2024 10:36:52 +0000 (11:36 +0100)]
x86/cpu: introduce an optional hook for early hypervisor initialization
Hypervisor detection is done very early on x86, and so can be used to also do
some very early hypervisor related initialization. Such initialization is
required when running as a Xen PVH guest, as for example the PIT needs to be
replaced with a hypervisor based timecounter.
Introduce an optional hook that gets called as part of the early hypervisor
detection.
Roger Pau Monné [Fri, 2 Feb 2024 10:29:57 +0000 (11:29 +0100)]
x86/xen: do video console fixup as part of early initialization
When FreeBSD is running as dom0 the video console metadata provided by the
bootloader might not be accurate, as Xen has very likely taken over the console
and possibly changed the mode.
Adjust the video console information in the kernel metadata as part of early
Xen initialization.
Roger Pau Monné [Fri, 2 Feb 2024 10:20:33 +0000 (11:20 +0100)]
x86/xen: move shared page setup to early init handler
As done with the hypercall page, move the setup of the shared info page into
the newly introduced helper, with the aim of having a single helper and call
site used by both HVM and PV in order to set up the basic Xen environment.
Roger Pau Monné [Fri, 2 Feb 2024 10:00:31 +0000 (11:00 +0100)]
x86/xen: introduce a Xen early init function
Start by moving the hypercall setup to such function.
The aim is to have a function that does all the required Xen early
initialization for both HVM and PVH, instead of having it scattered across
different paths.
Roger Pau Monné [Fri, 2 Feb 2024 08:50:16 +0000 (09:50 +0100)]
x86/xen: fill hypercall page with int3
Filling the hypercall page with nops is not helpful from a debugging point of
view, as for example attempting to execute a hypercall before the page is
initialized will result in the execution flow falling through into
xen_start32, making the mistake less obvious to spot.
Instead fill the page with int3 (0xcc) which will result in a #BP trap.
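
A minimal user-space sketch of the poisoning idea, assuming a 4096-byte page. The buffer and function names are hypothetical; the kernel fills the real hypercall page at build time rather than with memset().

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE	4096	/* assumed page size */
#define X86_INT3_OPCODE		0xcc	/* one-byte breakpoint instruction */

/* Hypothetical stand-in for the not yet initialized hypercall page. */
static uint8_t sketch_hypercall_page[SKETCH_PAGE_SIZE];

/*
 * Fill every byte with int3, so that a stray call into the page traps
 * with #BP at once instead of sliding through nops into the next symbol.
 */
static void
sketch_poison_hypercall_page(void)
{
	memset(sketch_hypercall_page, X86_INT3_OPCODE,
	    sizeof(sketch_hypercall_page));
}
```

Because int3 is a single byte, execution traps no matter which offset inside the page is entered.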
Roger Pau Monné [Fri, 19 Jan 2024 09:15:17 +0000 (10:15 +0100)]
x86/cpu: improve hypervisor detection
Some hypervisors can expose multiple signatures, for example Xen will expose
both the Xen and the HyperV signatures if Viridian extensions are enabled for
the guest. Presence of multiple signatures is currently not handled by
FreeBSD, which stops scanning once a known signature is found in cpuid output.
Exposing the HyperV signature on hypervisors different than HyperV is not
uncommon; this is done so that such hypervisors can expose a subset of the
Viridian extensions to Windows guests for performance reasons. Likely for
compatibility purposes the HyperV signature is always exposed on the first
leaf, and the Xen signature is exposed in the secondary leaf.
Fix the specific case of HyperV by not exiting from the scan if the HyperV
signature is found, and prefer a second signature if one is found.
Note that long term we might wish to convert vm_guest into a bitmap, so that it
can signal detection of multiple hypervisor interfaces.
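
The scan policy described above can be sketched as follows. This is an illustration only: the signatures are passed in as strings instead of being read from cpuid leaves, and the enum, table and function names are hypothetical rather than the ones used in the kernel.

```c
#include <stddef.h>
#include <string.h>

enum sketch_guest { SK_NONE, SK_HYPERV, SK_XEN, SK_KVM };

struct sketch_sig {
	const char		*sig;
	enum sketch_guest	 kind;
};

static const struct sketch_sig sketch_known[] = {
	{ "Microsoft Hv", SK_HYPERV },
	{ "XenVMMXenVMM", SK_XEN },
	{ "KVMKVMKVM", SK_KVM },
};

/*
 * Walk the signatures in leaf order.  Any known signature other than
 * HyperV ends the scan; a HyperV match is remembered but the scan goes
 * on, so that a later signature (e.g. Xen with Viridian) is preferred.
 */
static enum sketch_guest
sketch_detect(const char *const *leaf_sigs, size_t nleaves)
{
	enum sketch_guest found = SK_NONE;
	size_t i, j;

	for (i = 0; i < nleaves; i++) {
		for (j = 0; j < sizeof(sketch_known) /
		    sizeof(sketch_known[0]); j++) {
			if (strcmp(leaf_sigs[i], sketch_known[j].sig) != 0)
				continue;
			found = sketch_known[j].kind;
			if (found != SK_HYPERV)
				return (found);
			break;
		}
	}
	return (found);
}
```

With the input { "Microsoft Hv", "XenVMMXenVMM" } the sketch reports Xen, while a lone "Microsoft Hv" still reports HyperV.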
Roger Pau Monné [Mon, 22 Jan 2024 13:20:25 +0000 (14:20 +0100)]
x86/xen: introduce non-hypercall based emergency print
The current xc_printf() function uses a hypercall in order to send character
buffers to the hypervisor for it to print on the hypervisor console (if the
hypervisor is configured to print such messages).
This requires the hypercall page to be initialized, which is extra work and can
go wrong.
On x86 instead of using the console IO hypercall use the debug console IO port,
also called "port E9 hack". This allows sending characters to Xen using an
outb instruction, without any initialization required.
Keep the previous hypervisor based implementation by using the weak attribute,
which allows each architecture to provide an alternate (arch-specific)
implementation.
Roger Pau Monné [Mon, 5 Feb 2024 10:47:25 +0000 (11:47 +0100)]
x86/xen: fix out of bounds access to the event channel masks on resume
When resuming from migration or suspension all regular event channel ports are
reset to the INVALID_EVTCHN value, and drivers should re-initialize them
according to the new value provided by the other end of the connection.
However, the driver would first attempt to unbind the event channel handler
before attempting to bind it using the newly provided port. This unbind uses
the stale event channel port that has been set to INVALID_EVTCHN for some
operations (notably as a result of the handler removal the interrupt subsystem
ends up calling disable intr and source PIC hooks).
This was fine when INVALID_EVTCHN was 0, as then the operation would just
result in pointless setting of the 0 bit in the different event channel related
control arrays (evtchn_{pending,mask} for example). However with the change to
define INVALID_EVTCHN as ~0 the write is no longer pointless, and we end up
triggering a page-fault, or corrupting random data that happens to be mapped at
the array position + ~0 bits.
In hindsight the change of INVALID_EVTCHN from 0 to ~0 was way more risky than
initially assessed, and I believe has ended up resulting in more fragile code for
no real benefit.
Fix the disable intr and source wrappers to check whether the event channel is
valid before attempting to use it.
Also introduce some extra KASSERTs in several array accesses in order to avoid
out of bounds accesses if INVALID_EVTCHN ever reaches those functions.
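
A hedged sketch of the kind of guard described above. The type, the channel limit and the array are stand-ins chosen for the example (all names carry a sketch suffix) and do not mirror the actual driver code.

```c
#include <limits.h>
#include <stdbool.h>

typedef unsigned int sketch_port_t;

#define SKETCH_INVALID_EVTCHN	(~(sketch_port_t)0)
#define SKETCH_NR_CHANNELS	4096u	/* assumed limit for the sketch */
#define SKETCH_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

static unsigned long sketch_evtchn_mask[SKETCH_NR_CHANNELS /
    (sizeof(unsigned long) * CHAR_BIT)];

static bool
sketch_port_valid(sketch_port_t port)
{
	/* ~0 is far above the limit, so the invalid marker fails here. */
	return (port < SKETCH_NR_CHANNELS);
}

/* Set the mask bit for a port, refusing ports that would index out of bounds. */
static bool
sketch_mask_port(sketch_port_t port)
{
	if (!sketch_port_valid(port))
		return (false);
	sketch_evtchn_mask[port / SKETCH_WORD_BITS] |=
	    1UL << (port % SKETCH_WORD_BITS);
	return (true);
}
```

With an invalid value of 0 the unguarded write only touched bit 0; with ~0 the same write would land far outside the array, which is why the check has to come first.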
Warner Losh [Thu, 22 Feb 2024 03:10:45 +0000 (20:10 -0700)]
nextboot: check unlink, but only warn on !ENOENT
Emulate rm -f from the nextboot.sh script: Report all errors, except
ENOENT. This way problems show through, except the expected one when
nextboot.conf isn't there.
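
The rm -f behaviour can be sketched like this. The helper name is hypothetical and the message format is an assumption, not the one nextboot prints.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Remove a file the way "rm -f" would: a missing file is not an error,
 * any other failure is reported.  Returns 0 on success or ENOENT,
 * -1 after warning about anything else.
 */
static int
sketch_unlink_force(const char *path)
{
	int error;

	if (unlink(path) == 0 || errno == ENOENT)
		return (0);
	error = errno;		/* save before stdio can clobber it */
	fprintf(stderr, "unlink %s: %s\n", path, strerror(error));
	return (-1);
}
```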
bcm5974(4): Properly assign MT-slot on Apple Magic Trackpad
Assign multi-touch slot number based on internal evdev MT state and
reported tracking ID of contact rather than on sequential number of
contact in report.
Andrew Turner [Tue, 9 Jan 2024 15:22:27 +0000 (15:22 +0000)]
Import the kernel parts of bhyve/arm64
To support virtual machines on arm64 add the vmm code. This is based on
earlier work by Mihai Carabas and Alexandru Elisei at University
Politehnica of Bucharest, with further work by myself and Mark Johnston.
All AArch64 CPUs should work, however only the GICv3 interrupt
controller is supported. There is initial support to allow the GICv2
to be supported in the future. Only pure Armv8.0 virtualisation is
supported, the Virtualization Host Extensions are not currently used.
With a separate userspace patch and U-Boot port FreeBSD guests are able
to boot to multiuser mode, and the hypervisor can be tested with the
kvm unit tests. Linux partially boots, but hangs before entering
userspace. Other operating systems are untested.
Sponsored by: Arm Ltd
Sponsored by: Innovate UK
Sponsored by: The FreeBSD Foundation
Sponsored by: University Politehnica of Bucharest
Differential Revision: https://reviews.freebsd.org/D37428
Bjoern A. Zeeb [Wed, 21 Feb 2024 09:10:55 +0000 (09:10 +0000)]
iicbus/mux/pca954x: add support for PCA9546 I2C Switch
Add support for the 4 channel I2C switch from NXP by adding a new
description struct and the list entries. Compared to x=[2345] which
require code to support the INT, for this one no further changes are
needed.
Tested on: WHLE-LS1088A using a SPF+
MFC after: 1 week
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44009
Warner Losh [Wed, 21 Feb 2024 15:50:31 +0000 (08:50 -0700)]
loader/efi: Make gcc friendlier by moving md_dev
Move the extern struct devsw md_dev out of the function. gcc is often happier
with this arrangement. However, we really should move it to a
header file, but that requires a bit of a rework of md support and
config.
Andrew Turner [Thu, 11 Jan 2024 17:01:52 +0000 (17:01 +0000)]
arm64: Add in_vhe() to find if the kernel is in VHE
Add a function to support devices that may need to know if the kernel
has enabled the Armv8.1 Virtualization Host Extensions (FEAT_VHE).
Some devices, e.g. the generic timer, will need to know, e.g. use a
different interrupt.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43973
Warner Losh [Wed, 21 Feb 2024 06:03:15 +0000 (23:03 -0700)]
reboot: Emulate nextboot -D better
It used to produce no output when the file couldn't be removed. Emulate
that better by unlinking and ignoring errors. It's used at the end of
reboot always, even when the file isn't going to be there.
Warner Losh [Wed, 21 Feb 2024 03:26:08 +0000 (20:26 -0700)]
kboot: Fix zfs bootonce protocol
This wasn't updated when the other copies were updated. Make it
identical to efi code. We should likely refactor this (with userboot),
but they are all not quite identical.
Warner Losh [Wed, 21 Feb 2024 03:31:50 +0000 (20:31 -0700)]
loader: For the mini-stdio we have for lua, #define them to something else
To make it easier to port lua and some of the lua modules, we have a
series of routines to implement the stdio routines, even though we don't
normally implement them in the boot loader. Add a comment to this effect.
Also, some tools, like sanitizers and static analysis tools, make
unwarranted assumptions about these, so #define them to a different name
so they stop.
Justin Hibbits [Tue, 20 Feb 2024 22:08:54 +0000 (17:08 -0500)]
loader/libofw: Fix disk size truncation
At present OF_ioctl first multiplies, then casts to 64-bit, meaning at
the asm level it truncates the result to 32-bit, then zero-extends it to
64-bit to return. Cast `n` to 64-bit before multiplying, so that the
correct result is returned.
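
The difference between the two orderings, as a small sketch. Only `n` comes from the description above; the block size parameter and the function names are assumptions.

```c
#include <stdint.h>

/* Multiply in 32 bits, then widen: the high bits are already lost. */
static uint64_t
sketch_size_truncated(uint32_t n, uint32_t blksz)
{
	return ((uint64_t)(n * blksz));
}

/* Widen first, then multiply: the product is computed in 64 bits. */
static uint64_t
sketch_size_widened(uint32_t n, uint32_t blksz)
{
	return ((uint64_t)n * blksz);
}
```

For example, 8388608 blocks of 512 bytes is exactly 4 GiB: the first form wraps to 0, the second returns 4294967296.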
Mark Johnston [Wed, 21 Feb 2024 00:21:29 +0000 (19:21 -0500)]
bhyve: Add support for XML register definitions
This is useful for exposing additional registers to debuggers. For
instance, control registers are now available on amd64 when using gdb to
debug a guest.
The stub indicates support by including the string
"qXfer:features:read+" in its feature list. The debugger queries for
target descriptions by sending the query "qXfer:features:read:" followed
by a file path.
The XML definitions are copied from QEMU and installed to
/usr/share/bhyve/gdb.
Note that we currently don't handle the SIMD registers at all, since
that's of somewhat limited utility (for me at least) and since that
requires new ioctls to fetch the register values.
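
For readers unfamiliar with the wire format, here is a sketch of how such a query is framed in the GDB remote serial protocol: '$', the payload, '#', then the modulo-256 sum of the payload bytes as two hex digits. The helper name is hypothetical and this is not code from the bhyve stub.

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame a payload as "$<payload>#<checksum>".  Returns the value of
 * snprintf(), i.e. the length the framed packet needs.
 */
static int
sketch_gdb_frame(const char *payload, char *out, size_t outlen)
{
	unsigned char sum = 0;
	const char *p;

	for (p = payload; *p != '\0'; p++)
		sum += (unsigned char)*p;
	return (snprintf(out, outlen, "$%s#%02x", payload,
	    (unsigned int)sum));
}
```

A debugger asking for the top-level description would frame a payload such as "qXfer:features:read:target.xml:0,fff", where target.xml is the conventional annex name and 0,fff is the offset and length of the requested chunk.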
libsys: remove usage of pthread_once and _once_stub
that existed in auxv.c, use simple bool gate instead. This leaves a
small window if two threads try to call _elf_aux_info(3) simultaneously.
The situation is safe because auxv parsing is really idempotent. The
parsed data is the same, and we store atomic types (int/long/ptr) so
double-init does not matter.
Reviewed by: brooks, imp
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D43985
Gleb Smirnoff [Tue, 20 Feb 2024 22:31:06 +0000 (14:31 -0800)]
tests/fdgrowtable: open more files in the threaded case
This should fix the test failing on some machines/conditions/runs. This
won't fix failures in standalone run, but should fix kyua(1) runs.
Currently with standalone run it will usually fail because the 40-sized
allocation is skipped (see details below).
This matches what forking test does: open 128 files in the parent and 128
in the child. There should actually be no difference where and when the
files are open, but let's mimic the forking test, and open more files in
the spawned thread. Also opening from two different contexts adds a bit
more entropy to the test.
What the test does is check that fdgrowtable() has been called at least
three times for the test process, and the old tables are still on the free
list as long as other execution contexts exist. Under kyua(1) control the
first call grows the table from 20 to 40, but the original table of 20 is
an embedded one, thus is not put on the free list. Passing 40 open files
the table grows to 128 and first old table lands on the free list. Passing
128 open files the table grows to 256 and a second old table lands on the
free list. After that the test would pass. The threaded test was one
open file off before this fix sometimes.
Gleb Smirnoff [Tue, 20 Feb 2024 18:31:05 +0000 (10:31 -0800)]
arp: fix arp -s/-S
When setting a permanent ARP entry, the route(4) would use
rtm->rtm_rmx.rmx_expire == 0 as a flag for installing a static entry, but
netlink(4) is looking for explicit NTF_STICKY flag in the request. The
arp(8) utility was adapted to use netlink(4) by default, but it has lots
of route-era guts internally. Specifically there is global variable 'opts'
that shares configuration for both protocols, and it is still initialized
with route(4) specific RTF_xxx flags. In set_nl() these flags are
translated to netlink(4) parameters. However, RTF_STATIC is a flag that is
never set by default, so attempt to use it as a proxy flag manifesting
-s/-S results in losing it. Use zero opts.expire_time as a manifest of
-s/-S operation. This is a minimal fix. A better one would be to fully
get rid of route(4) legacy.
The change also corrects the logic to set NUD_PERMANENT flag for
consistency. This flag is ignored by our kernel (now).
Ed Maste [Tue, 20 Feb 2024 16:43:00 +0000 (11:43 -0500)]
iwm.4: add iwlwifi cross-reference
iwlwifi(4) supports a superset of the devices supported by iwm(4). The
latter may be retired in the future (if there is no reason to prefer it
for the set of devices supported by both).
Stefan Eßer [Tue, 20 Feb 2024 12:02:24 +0000 (13:02 +0100)]
msdosfs: fix potential inode collision on FAT12 and FAT16
FAT file systems do not use inodes, instead all file meta-information
is stored in directory entries.
FAT12 and FAT16 use a fixed size area for root directories, with
typically 512 entries of 32 bytes each (for a total of 16 KB) on hard
disk formats. The file system data is stored in clusters of typically
512 to 4096 bytes, depending on the size of the file system.
The current code uses the offset of a DOS 8.3 style directory entry as
a pseudo-inode, which leads to inode values of 0 to 16368 for typical
root directories with 512 entries.
Sub-directories use 2 cluster length plus the byte offset of the
directory entry in the data area for the pseudo-inode, which may be
as low as 1024 in case of 512 byte clusters. A sub-directory in
cluster 2 and with 512 byte clusters will therefore lead to a
re-use of inode 1024 when there are at least 32 DOS 8.3 style
filenames in the root directory (or 11 14-character Windows
long file names, each of which takes up 3 directory entries).
FAT32 file systems are not affected by this issue and FAT12/FAT16
file systems with larger cluster sizes are unlikely to have as
many directory entries in the root directory as are required to
cause the collision.
This commit leads to inode numbers that are guaranteed to not collide
for all valid FAT12 and FAT16 file system parameters. It does also
provide a small speed-up due to more efficient use of the vnode cache.
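
The collision can be reproduced with a little arithmetic. The functions below follow the description in this message, not the msdosfs sources, and their names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_DIRENT_SIZE	32u	/* bytes per FAT directory entry */

/* Old scheme, root directory: byte offset of the entry in the root area. */
static uint32_t
sketch_old_root_ino(uint32_t entry_index)
{
	return (entry_index * SKETCH_DIRENT_SIZE);
}

/*
 * Old scheme, sub-directory: two cluster lengths plus the byte offset
 * of the entry in the data area (the data area starts at cluster 2).
 */
static uint32_t
sketch_old_subdir_ino(uint32_t cluster_size, uint32_t cluster,
    uint32_t offset_in_cluster)
{
	return (2 * cluster_size + (cluster - 2) * cluster_size +
	    offset_in_cluster);
}
```

With 512 byte clusters the first entry of a directory stored in cluster 2 gets 2 * 512 + 0 = 1024, the same value as the root directory entry at byte offset 32 * 32 = 1024. With 4096 byte clusters the lowest sub-directory value is 8192, which requires a much fuller root directory to collide.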
Approved by: mckusick
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D43978
Kirk McKusick [Tue, 20 Feb 2024 00:16:07 +0000 (16:16 -0800)]
Eliminate unnecessary UFS1 integrity checks.
The UFS1 integrity checks added in FreeBSD 14 were too aggressive
for UFS1 filesystems created in FreeBSD 4 and 9 systems. This patch
removes those tests which can be done safely since they are not
relevant to the current implementation of UFS1.
This is a follow-on report to bug report 264450 (comments 21-28).
Xin LI [Mon, 19 Feb 2024 23:01:04 +0000 (15:01 -0800)]
zlib: use more memory for a small deflate speedup.
The LIT_MEM option uses slightly more memory (for base gzip(1),
about 16kiB; according to the author, about 6% for default deflate
settings) for a small speedup.
The performance gain is more noticeable for input data with higher
entropy and less significant for data that is highly compressible,
such as source code and logs.
Brooks Davis [Mon, 19 Feb 2024 22:44:08 +0000 (22:44 +0000)]
lib{c,thr}: add DT_RUNPATH for gcc -m32
To allow gcc -m32 to work, link libc and libthr with --rpath-/usr/lib32.
When called with -m32, gcc is currently unable to communicate to
the bfd linker that it should look in /usr/lib32 to resolve needed (as
opposed to explicitly linked) libraries so we need to provide a hint.
See also: https://sourceware.org/bugzilla/show_bug.cgi?id=31395
Brooks Davis [Mon, 19 Feb 2024 22:44:08 +0000 (22:44 +0000)]
lib{c,sys}: move auxargs more firmly into libsys
Continue to filter the public interface (elf_aux_info()), but entirely
relocate the private interfaces (_elf_aux_info(),
__init_elf_aux_vector(), and __elf_aux_vector) to libsys.
This ensures that rtld updates the correct (only) copy of
__elf_aux_vector. After 968a18975adc9c2a619bb52aa2f009de99fc9e24
updates were confused and __getosreldate was failing, causing
the system to fall back to compat12 syscalls in some cases.
Return to explicitly linking libc to libsys and link libthr with libc
and libsys (in that order).
Andriy Gapon [Mon, 19 Feb 2024 10:16:47 +0000 (12:16 +0200)]
scsi_da: add 4K quirks for Samsung SSD 860 and 870
Although the actual flash page size is either 8K or 16K for those
devices (according to different sources of various reliability), they
seem to be optimized for the "industry-standard" emulated 4K block size.
To do: consolidate very similar Samsung SSD entries for 830 - 870
models.
Boris Lytochkin [Mon, 19 Feb 2024 07:44:52 +0000 (10:44 +0300)]
ndp(8): increase buffer size in rtsock mode
On a router with many connected devices (~10k+) `ndp -an` can fail
with ENOMEM because some additional NDP records were added
between sysctl() buffer size estimate and data fetch calls.
Allocate more space based on size estimate: 1/64 (~2%) of additional
space, but not less than 4 m_rtmsg structures.
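
The sizing rule, as a sketch. The structure below is only a placeholder with an assumed size; the real m_rtmsg layout in ndp(8) differs, and the function name is hypothetical.

```c
#include <stddef.h>

/* Placeholder for the real m_rtmsg structure; the size is an assumption. */
struct sketch_m_rtmsg {
	char	bytes[512];
};

/* Pad the sysctl() size estimate by 1/64, with a floor of four messages. */
static size_t
sketch_padded_size(size_t estimate)
{
	size_t extra = estimate / 64;
	size_t floor = 4 * sizeof(struct sketch_m_rtmsg);

	if (extra < floor)
		extra = floor;
	return (estimate + extra);
}
```

The floor matters for small tables, where 1/64 of the estimate would not leave room for even one extra record.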
Currently there is no support for generating armv7 vm images in the
release artifacts. In fact in terms of release artifacts and
architecture there is no good reason to have a vm release artifact for
armv7 as those are mostly used in SOCs or embedded boards. However
considering that developers actually do need an easy way to test armv7
with a vm running this is really important. As part of pre-commit ci for
developers this can be really helpful for the end developers.
Approved by: cperciva, imp, re
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43952
Bryan Drewery [Sun, 18 Feb 2024 18:55:11 +0000 (10:55 -0800)]
wc: Fix SIGINFO race with casper init.
If a file is specified then fileargs_init(3) may return [EINTR]. With
the SIGINFO handler not being SA_RESTART this causes an early exit
if a SIGINFO comes in. Rather than checking for [EINTR] or changing the
handler just move it later which resolves the problem.
Bjoern A. Zeeb [Sun, 18 Feb 2024 17:47:22 +0000 (17:47 +0000)]
net80211: increase number of spares in struct ieee80211_vap
Turns out MFCing 713db49d06deee90dd358b2e4b9ca05368a5eaf6 does not
leave us with enough spares. Given wireless will likely see more
changes in the near future add more spares.
This is especially necessary given 'struct ieee80211_vap' gets
allocated by drivers.
Bumps size of struct ieee80211_vap to (7 * 512) on 64bit.
unionfs: work around underlying FS failing to respect cn_namelen
unionfs_mkshadowdir() may be invoked on a non-leaf pathname component
during lookup, in which case the NUL terminator of the pathname buffer
will be well beyond the end of the current component. cn_namelen in
this case will still (correctly) indicate the length of only the
current component, but ZFS in particular does not currently respect
cn_namelen, leading to the creation of inaccessible files with slashes
in their names. Work around this behavior by temporarily NUL-
terminating the current pathname component for the call to VOP_MKDIR().
https://github.com/openzfs/zfs/issues/15705 has been filed to track
a proper upstream fix for the issue at hand.
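
The workaround pattern can be sketched in plain C. The callee here is a generic function pointer standing in for the VOP_MKDIR() call, and the helper name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Temporarily NUL-terminate the current component of a longer path so
 * that a callee which ignores the length still sees only that component,
 * then restore the original byte.
 */
static size_t
sketch_call_terminated(char *name, size_t namelen,
    size_t (*callee)(const char *))
{
	char saved = name[namelen];
	size_t ret;

	name[namelen] = '\0';
	ret = callee(name);
	name[namelen] = saved;
	return (ret);
}
```

Calling it on the buffer "foo/bar" with a length of 3 and strlen() as the callee yields 3, and the buffer reads "foo/bar" again afterwards.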
unionfs: upgrade the vnode lock during fsync() if necessary
If the underlying upper FS supports shared locking for write ops,
as is the case with ZFS, VOP_FSYNC() may only be called with the vnode
lock held shared. In this case, temporarily upgrade the lock for
those unionfs maintenance operations which require exclusive locking.
While here, make unionfs inherit the upper FS' support for shared
write locking. Since the upper FS is the target of VOP_GETWRITEMOUNT()
this is what will dictate the locking behavior of any unionfs caller
that uses vn_start_write() + vn_lktype_write(), so unionfs must be
prepared for the caller to only hold a shared vnode lock in these
cases.
Found in local testing of unionfs atop ZFS with DEBUG_VFS_LOCKS.
VFS: update VOP_FSYNC() debug check to reflect actual locking policy
Shared vs. exclusive locking is determined not by MNT_EXTENDED_SHARED
but by MNT_SHARED_WRITES (although there are several places that
ignore this and simply always use an exclusive lock). Also add a
comment on the possible difference between VOP_GETWRITEMOUNT(vp)
and vp->v_mount on this path.
Found by local testing of unionfs atop ZFS with DEBUG_VFS_LOCKS.
Store the upper/lower FS mount objects in unionfs per-mount data and
use these instead of the v_mount field of the upper/lower root
vnodes. As described in the referenced PR, it is unsafe to access this
field on the unionfs unmount path as ZFS rollback may have obliterated
the v_mount field of the upper or lower root vnode. Use these stored
objects to slightly simplify other code that needs access to the
upper/lower mount objects as well.
syscon_power: do reboot after shutdown_panic is executed
A syscon_power instance can handle either poweroff or reboot, but not
both. If the instance handles reboot then set its priority to be after
shutdown_panic.
This is to provide uniform experience with other platforms.
Andriy Gapon [Sun, 18 Feb 2024 13:57:34 +0000 (15:57 +0200)]
rk8xx_poweroff: enable power-cycling on supported hardware
Previously, the function would return early if RB_POWERCYCLE was
specified without RB_POWEROFF. Those flags are exclusive at the moment,
that is, they are never set together.
Søren Schmidt (sos) uses a similar but extended patch locally.