Brooks Davis [Mon, 22 Nov 2021 22:36:57 +0000 (22:36 +0000)]
makesyscalls: handle 64-bit args on 32-bit
On 32-bit architectures, 64-bit arguments are passed in pairs of
registers. On non-x86 architectures these arguments must be in evenly
aligned registers which necessiciates inserting a pad register into the
argument list. This has historically been supported by adding ifdefs
around padded and unpadded syscall defintions in syscalls.master.
In order to enable generation of 32-bit support files from the base
syscalls.master, pull this support in to makesyscalls.lua enabled by
adding pair_64bit to abi_flags.
The changes to sys_proto.h simply add #ifdef PAD64_REQUIRED
around pad arguments in struct <syscall>_args. In systrace_args(),
replace static syscall index values with post-incremented indexs
allowing a simple ifdef around the argument. Under -O1 or higher
code generation is identical. systrace_entry_setargdesc() is a bit
more complicated as we switch on argument indices. Solve this
with some use of define/undef pairs to compute the correct indices.
Brooks Davis [Mon, 22 Nov 2021 22:36:57 +0000 (22:36 +0000)]
makesyscalls: handle longs in ABI compat
Replace long-derived types with their abi equivalent where
required by the target ABI. There are two cases:
- All pointers to types that go from 64-bit to 32-bit between the
default ABI and the target ABI.
- Signed arguments that go from 64-bit to 32-bit (these require
sign-extension before passing to general kernel ABIs).
This adds four new config variables: abi_long, semid_t, abi_size_t,
and abi_u_long which default to long, size_t, and u_long respectively.
Brooks Davis [Mon, 22 Nov 2021 22:36:57 +0000 (22:36 +0000)]
makesyscalls.lua: Allow translation of intptr_t arguments
Translate instances of intptr_t to the config value abi_intptr_t
(defaults to "intptr_t"). Used in CheriABI to translate intptr_t
to intcap_t for hybrid kernels.
Brooks Davis [Mon, 22 Nov 2021 22:36:56 +0000 (22:36 +0000)]
makesyscalls: Add a way to include per-ABI headers
When the string %%ABI_HEADERS%% is found in syscalls.master, replace
it with the contents of the abi_headers config variable. This allows
an ABI-specific syscalls.conf to add lines like:
Brooks Davis [Mon, 22 Nov 2021 22:36:56 +0000 (22:36 +0000)]
syscalls: improve nstat, nfstat, nlstat
Optionally return errors when truncating dev_t, ino_t, and nlink_t.
In the interest of code reuse, use freebsd11_cvtstat() to perform the
truncation and error handling and then convert the resulting struct
freebsd11_stat to struct nstat.
Add missing freebsd32 compat syscalls. These syscalls require
translation because struct nstat contains four instances of struct
timespec which in turn contains a time_t and a long.
Allan Jude [Fri, 19 Nov 2021 15:14:30 +0000 (15:14 +0000)]
openssl: Fix detection of ARMv7 and ARM64 CPU features
OpenSSL assumes the same value for AT_HWCAP=16 (Linux)
So it ends up calling elf_auxv_info() with AT_CANARY which
returns ENOENT, and all acceleration features are disabled.
With this, my ARM64 test machine runs the benchmark
`openssl speed -evp aes-256-gcm` nearly 20x faster
going from 100 MB/sec to 2000 MB/sec
It also improves sha256 from 300 MB/sec to 1800 MB/sec
This fix has been accepted but not yet merged upstream:
https://github.com/openssl/openssl/pull/17082
Corvin Köhne [Mon, 22 Nov 2021 15:26:03 +0000 (16:26 +0100)]
bhyve: keep physical and virtual COMMAND reg in sync
On startup all virtual BARs are registered.
Additionally, the encoding bit in the virtual cmd register is set.
After that, the passthru emulation overwrites the virtual cmd register with
the physical one.
This could lead to a mismatch between registered BARs and the encoding
bits in the cmd register.
Instead of writing the physical to the virtual cmd register,
write the virtual to the physical cmd register to solve this issue.
Corvin Köhne [Mon, 22 Nov 2021 15:22:48 +0000 (16:22 +0100)]
bhyve: move 64 bit BAR location to match OVMF assumptions
OVMF will fail, if large 64 bit BARs are used. GCD-Map doesn't cover
64 bit addresses of BARs.
OVMF assumes that 64 bit addresses of BARS are located on next 32 GB
boundary behind Top of High RAM.
This patch moves 64 bit BARs on next 32 GB boundary behind Top of High
RAM to match OVMF assumptions.
Alexander Motin [Sun, 21 Nov 2021 23:50:59 +0000 (18:50 -0500)]
GEOM: Switch g_io_deliver() locking from cp to pp.
Single provider may have multiple consumers, and locking one of consumers
is not sufficient to protect the provider. Though the only part of the
provider this locking protects now is its statistics.
Reported by: Arka Sharma <arka.sw1988@gmail.com>
MFC after: 2 weeks
Ed Maste [Sun, 21 Nov 2021 17:17:20 +0000 (12:17 -0500)]
Fix coredump_phnum test with ASLR enabled by default
coredump_phnum intends to generate a core file with many PT_LOAD
segments. Previously it called mmap() in a loop with alternating
protections, relying on each mapping following the previous, to produce
a core file with many page-sized PT_LOAD segments. With ASLR on we no
longer have this property of each mmap() following the previous.
Instead, perform a single allocation, and then use mprotect() to set
alternating pages to PROT_READ.
Warner Losh [Sun, 21 Nov 2021 16:05:07 +0000 (09:05 -0700)]
loader: abstract boot services exiting to libefi function
Move direct call of ExitBootServices to efi_exit_boot_services. This
function sets boot_services_active to false so callers don't have to do
it everywhere (though currently only loader/bootinfo.c is affected).
Warner Losh [Sun, 21 Nov 2021 15:50:46 +0000 (08:50 -0700)]
mpr: fix freeze / release mismatch in timeout code
So, if we're processing a timeout, and we've sent an ABORT to the
firmware for that timeout, but not yet received the response from the
firmware, AND we get another timeout, we queue the timeout and freeze
the queue. However, when we've finally processed them all, we only
release the queue once. This causes all I/O to halt as the devq remains
frozen forever.
Instead, only freeze the queue when we start the process (eg set INRESET
on the target). This will allow the release when all the timed out I/Os
have finished ABORTing.
Mark Johnston [Sat, 20 Nov 2021 16:21:25 +0000 (11:21 -0500)]
zfs: Fix a deadlock between page busy and the teardown lock
When rolling back a dataset, ZFS has to purge file data resident in the
system page cache. To do this, it loops over all vnodes for the
mountpoint and calls vn_pages_remove() to purge pages associated with
the vnode's VM object. Each page is thus exclusively busied while the
dataset's teardown write lock is held.
When handling a page fault on a mapped ZFS file, FreeBSD's page fault
handler busies newly allocated pages and then uses VOP_GETPAGES to fill
them. The ZFS getpages VOP acquires the teardown read lock with vnode
pages already busied. This represents a lock order reversal which can
lead to deadlock.
To break the deadlock, observe that zfs_rezget() need only purge those
pages marked valid, and that pages busied by the page fault handler are,
by definition, invalid. Furthermore, ZFS pages always transition from
invalid to valid with the teardown lock held, and ZFS never creates
partially valid pages. Thus, zfs_rezget() can use the new
vn_pages_remove_valid() to skip over pages busied by the fault handler.
Mark Johnston [Fri, 19 Nov 2021 22:30:05 +0000 (17:30 -0500)]
hyperv: Register the MSR-based timecounter during SI_SUB_HYPERVISOR
This reverts commit 9ef7df022a46 ("hyperv: Register hyperv_timecounter
later during boot") and adds a comment explaining why the timecounter
needs to be registered as early as it is.
PR: 259878
Fixes: 9ef7df022a46 ("hyperv: Register hyperv_timecounter later during boot")
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Mark Johnston [Fri, 19 Nov 2021 22:29:28 +0000 (17:29 -0500)]
timecounter: Initialize tc_lock earlier
Hyper-V wants to register its MSR-based timecounter during
SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be
available for DELAY(). So we cannot use MTX_SYSINIT to initialize the
timecounter lock.
PR: 259878
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Mitchell Horne [Wed, 17 Nov 2021 15:35:59 +0000 (11:35 -0400)]
Allow minidumps to be performed on the live system
Add a boolean parameter to minidumpsys(), to indicate a live dump. When
requested, take a snapshot of important global state, and pass this to
the machine-dependent minidump function. For now this includes the
kernel message buffer, and the bitset of pages to be dumped. Beyond
this, we don't take much action to protect the integrity of the dump
from changes in the running system.
A new function msgbuf_duplicate() is added for snapshotting the message
buffer. msgbuf_copy() is insufficient for this purpose since it marks
any new characters it finds as read.
For now, nothing can actually trigger a live minidump. A future patch
will add the mechanism for this. For simplicity and safety, live dumps
are disallowed for mips.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31993
Mitchell Horne [Wed, 17 Nov 2021 15:35:18 +0000 (11:35 -0400)]
minidump: Use the provided dump bitset
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Mitchell Horne [Wed, 17 Nov 2021 15:30:43 +0000 (11:30 -0400)]
minidump: reduce the amount direct accesses to page tables
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us. For example,
after checking the valid bit, but before reading the physical address.
Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.
Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31990
Mitchell Horne [Wed, 17 Nov 2021 15:26:59 +0000 (11:26 -0400)]
minidump: Parameterize minidumpsys()
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Andrew Turner [Tue, 2 Nov 2021 11:31:17 +0000 (11:31 +0000)]
Use a builtin where possible in msun
Some of the functions in msun can be implemented using a compiler
builtin function to generate a small number of instructions. Implement
this support in fma, fmax, fmin, and sqrt on arm64.
Care must be taken as the builtin can be implemented as a function
call on some architectures that lack direct support. In these cases
we need to use the original code path.
As we don't set errno on failure build with -fno-math-errno so the
toolchain doesn't convert a builtin into a function call when it
detects a failure, e.g. gcc will add a call to sqrt when the input
is negative leading to an infinite loop.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32801
Andrew Turner [Mon, 1 Nov 2021 13:06:56 +0000 (13:06 +0000)]
Switch to Arm Optimized Routines for mem* & str*
These are the updated version of the older Cortex Strings Library we
previously used. The Arm Optimized Routines also support CPU features
that are currently in development on FreeBSD, e.g. Branch Target
Identification (BTI). Rather than add BTI support to the old code it's
easier to just use the maintained version.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32774
Andriy Gapon [Fri, 19 Nov 2021 07:56:30 +0000 (09:56 +0200)]
iflib_stop: drain rx tasks to prevent any data races
iflib_stop modifies iflib data structures that are used by _task_fn_rx,
most prominently the free lists. So, iflib_stop has to ensure that the
rx task threads are not active.
This should help to fix a crash seen when iflib_if_ioctl (e.g.,
SIOCSIFCAP) is called while there is already traffic flowing.
The crash has been seen on VMWare guests with vmxnet3 driver.
My guess is that on physical hardware the couple of 1ms delays that
iflib_stop has after disabling interrupts are enough for the queued work
to be completed before any iflib state is touched.
But on busy hypervisors the guests might not get enough CPU time to
complete the work, thus there can be a race between the taskqueue
threads and the work done to handle an ioctl, specifically in iflib_stop
and iflib_init_locked.
Warner Losh [Sun, 10 Oct 2021 17:28:54 +0000 (11:28 -0600)]
Bootstrap: Prune building from pre-FreeBSD 11 support
We don't need to bootstrap lex or md4 anymore.
Cat doesn't need to be bootstrapped (but is needed for buildkernel)
cruncgen doesn't need to be bootstrapped at all.
kbdcontrol isn't needed
Gleb Smirnoff [Fri, 19 Nov 2021 04:26:09 +0000 (20:26 -0800)]
Add tcp_freecb() - single place to free tcpcb.
Until this change there were two places where we would free tcpcb -
tcp_discardcb() in case if all timers are drained and tcp_timer_discard()
otherwise. They were pretty much copy-n-paste, except that in the
default case we would run tcp_hc_update(). Merge this into single
function tcp_freecb() and move new short version of tcp_timer_discard()
to tcp_timer.c and make it static.