Mark Johnston [Fri, 19 Nov 2021 22:30:05 +0000 (17:30 -0500)]
hyperv: Register the MSR-based timecounter during SI_SUB_HYPERVISOR
This reverts commit 9ef7df022a46 ("hyperv: Register hyperv_timecounter
later during boot") and adds a comment explaining why the timecounter
needs to be registered as early as it is.
PR: 259878
Fixes: 9ef7df022a46 ("hyperv: Register hyperv_timecounter later during boot")
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Mark Johnston [Fri, 19 Nov 2021 22:29:28 +0000 (17:29 -0500)]
timecounter: Initialize tc_lock earlier
Hyper-V wants to register its MSR-based timecounter during
SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be
available for DELAY(). So we cannot use MTX_SYSINIT to initialize the
timecounter lock.
PR: 259878
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Mitchell Horne [Wed, 17 Nov 2021 15:35:59 +0000 (11:35 -0400)]
Allow minidumps to be performed on the live system
Add a boolean parameter to minidumpsys(), to indicate a live dump. When
requested, take a snapshot of important global state, and pass this to
the machine-dependent minidump function. For now this includes the
kernel message buffer, and the bitset of pages to be dumped. Beyond
this, we don't take much action to protect the integrity of the dump
from changes in the running system.
A new function msgbuf_duplicate() is added for snapshotting the message
buffer. msgbuf_copy() is insufficient for this purpose since it marks
any new characters it finds as read.
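The difference between the two copy routines can be illustrated with a small user-space ring buffer. This is a minimal sketch under stated assumptions, not the kernel's struct msgbuf: the struct, its sequence counters, and the rb_* names are hypothetical. rb_copy() plays the role of msgbuf_copy() and advances the read index as it copies, while rb_duplicate() plays the role of msgbuf_duplicate() and copies the whole structure, so only the snapshot gets consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RB_SIZE 16 /* hypothetical buffer size */

/* Hypothetical ring buffer; not the kernel's struct msgbuf. */
struct ringbuf {
	char	 buf[RB_SIZE];
	size_t	 wseq;	/* total characters ever written */
	size_t	 rseq;	/* total characters ever consumed */
};

static void
rb_putc(struct ringbuf *rb, char c)
{
	rb->buf[rb->wseq % RB_SIZE] = c;
	rb->wseq++;
	/* If the writer laps the reader, drop the oldest unread data. */
	if (rb->wseq - rb->rseq > RB_SIZE)
		rb->rseq = rb->wseq - RB_SIZE;
}

/* Consuming copy (msgbuf_copy()-like): marks what it copies as read. */
static size_t
rb_copy(struct ringbuf *rb, char *dst, size_t len)
{
	size_t n = 0;

	while (rb->rseq < rb->wseq && n < len)
		dst[n++] = rb->buf[rb->rseq++ % RB_SIZE];
	return (n);
}

/* Non-consuming snapshot (msgbuf_duplicate()-like): read index included. */
static void
rb_duplicate(struct ringbuf *dst, const struct ringbuf *src)
{
	memcpy(dst, src, sizeof(*dst));
}
```

A dump path would then read from the duplicate, leaving the live buffer's unread characters in place for later consumers.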
For now, nothing can actually trigger a live minidump. A future patch
will add the mechanism for this. For simplicity and safety, live dumps
are disallowed for mips.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31993
Mitchell Horne [Wed, 17 Nov 2021 15:35:18 +0000 (11:35 -0400)]
minidump: Use the provided dump bitset
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
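As a hedged sketch of what "accepting the desired bitset as an argument" means, the helpers below take the bitset explicitly instead of reaching for a global. The names (dumpset_add, dumpset_isset, dumpset_snapshot), the fixed page count, and the 4 KiB page size are assumptions for illustration; the real vm_dumpset.h helpers use the kernel's bitset macros and their own physical-address-to-index translation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define SKETCH_NPAGES		256	/* assumed number of trackable pages */
#define WORD_BITS		(sizeof(unsigned long) * CHAR_BIT)
#define NWORDS			((SKETCH_NPAGES + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical fixed-size bitset; every instance has the same size. */
struct dumpset {
	unsigned long bits[NWORDS];
};

/* The "live" global, standing in for vm_page_dump. */
static struct dumpset global_dumpset;

static void
dumpset_add(struct dumpset *set, uint64_t pa)
{
	uint64_t idx = pa >> SKETCH_PAGE_SHIFT;

	assert(idx < SKETCH_NPAGES);
	set->bits[idx / WORD_BITS] |= 1UL << (idx % WORD_BITS);
}

static bool
dumpset_isset(const struct dumpset *set, uint64_t pa)
{
	uint64_t idx = pa >> SKETCH_PAGE_SHIFT;

	assert(idx < SKETCH_NPAGES);
	return (((set->bits[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1) != 0);
}

/* Snapshot copied from the global, so it necessarily has the same size. */
static void
dumpset_snapshot(struct dumpset *dst)
{
	memcpy(dst, &global_dumpset, sizeof(*dst));
}
```

Pages added to the global after the snapshot is taken do not appear in the snapshot, which is the property a live dump relies on.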
Mitchell Horne [Wed, 17 Nov 2021 15:30:43 +0000 (11:30 -0400)]
minidump: reduce the number of direct accesses to page tables
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible for the relevant PTE to change underneath us, for example,
after checking the valid bit but before reading the physical address.
Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.
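The "load each PTE only once" rule can be sketched with C11 atomics: read the entry into a local with a single atomic load, then derive both the valid bit and the physical address from that local copy. This is a minimal sketch; the PTE layout (bit 0 valid, bits 12 to 51 frame address) and the PHYS_MAX bound are assumptions modeled loosely on x86-64, not the kernel's pmap definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID	0x1ULL			/* assumed valid bit */
#define PTE_FRAME	0x000ffffffffff000ULL	/* assumed address bits 12..51 */
#define PHYS_MAX	(1ULL << 40)		/* assumed top of physical memory */

/*
 * Read a PTE exactly once and derive everything from the local copy.
 * Returns true and stores the physical address if the entry is valid
 * and points below the assumed end of physical memory.
 */
static bool
pte_to_pa(_Atomic uint64_t *ptep, uint64_t *pap)
{
	uint64_t pte, pa;

	pte = atomic_load_explicit(ptep, memory_order_relaxed);
	if ((pte & PTE_VALID) == 0)
		return (false);
	pa = pte & PTE_FRAME;	/* same snapshot as the valid check */
	if (pa >= PHYS_MAX)	/* reject addresses beyond physical memory */
		return (false);
	*pap = pa;
	return (true);
}
```

Because the function never rereads *ptep, a concurrent writer that clears the entry after the load cannot pair a "valid" result with an address from a different entry.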
Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31990
Mitchell Horne [Wed, 17 Nov 2021 15:26:59 +0000 (11:26 -0400)]
minidump: Parameterize minidumpsys()
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
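A minimal sketch of the thin-wrapper pattern described above: the MI entry point builds a state struct that points either at the live globals or at caller-provided snapshots, and the MD routine reads only from the struct. Every name here (struct dumpstate, md_minidump, minidump_sketch, the globals) is hypothetical, and the NULL-means-live convention is an assumption for illustration; this is not the kernel's struct minidumpstate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the message buffer and dump bitset. */
struct msgbuf_sketch { const char *text; };
struct bitset_sketch { unsigned long bits; };

static struct msgbuf_sketch live_msgbuf = { "live" };
static struct bitset_sketch live_dumpset = { 0x1 };

/* State handed to the machine-dependent routine. */
struct dumpstate {
	const struct msgbuf_sketch *msgbufp;
	const struct bitset_sketch *dump_bitset;
};

/* MD routine: consults only the state struct, never the globals. */
static unsigned long
md_minidump(const struct dumpstate *state)
{
	return (state->dump_bitset->bits);
}

/*
 * MI wrapper: without snapshots, point at the live globals (the
 * debugger case); otherwise use the caller's copies.
 */
static unsigned long
minidump_sketch(const struct msgbuf_sketch *mb, const struct bitset_sketch *bs)
{
	struct dumpstate state;

	state.msgbufp = (mb != NULL) ? mb : &live_msgbuf;
	state.dump_bitset = (bs != NULL) ? bs : &live_dumpset;
	return (md_minidump(&state));
}
```

Callers that have nothing to snapshot keep calling the wrapper with no extra work, which matches the goal of keeping minidumpsys() as simple to call as before.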
Andrew Turner [Tue, 2 Nov 2021 11:31:17 +0000 (11:31 +0000)]
Use a builtin where possible in msun
Some of the functions in msun can be implemented using a compiler
builtin function to generate a small number of instructions. Implement
this support in fma, fmax, fmin, and sqrt on arm64.
Care must be taken as the builtin can be implemented as a function
call on some architectures that lack direct support. In these cases
we need to use the original code path.
As we don't set errno on failure, build with -fno-math-errno so the
toolchain doesn't convert a builtin into a function call when it
detects a failure; e.g. gcc will add a call to sqrt when the input
is negative, leading to an infinite loop.
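A hedged sketch of the pattern: use the compiler builtin only where it is expected to lower to an instruction (here, an assumed check for __aarch64__), and keep a portable fallback elsewhere. The function name sketch_fmax is hypothetical and this is not the msun source. The fallback follows C99 fmax() semantics for NaN inputs but, for brevity, does not order +0 and -0. Unlike sqrt, fmax never reports errors through errno, so the -fno-math-errno concern above mainly affects functions like sqrt.

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical fmax replacement. On arm64 the builtin is expected to
 * become a single FMAXNM instruction. Elsewhere use portable C: on
 * targets without a native instruction the builtin may be emitted as
 * a call to fmax(), which inside libm's own fmax() would recurse.
 */
static double
sketch_fmax(double x, double y)
{
#if defined(__aarch64__)
	return (__builtin_fmax(x, y));
#else
	/* C99: if exactly one argument is a NaN, return the other one. */
	if (isnan(x))
		return (y);
	if (isnan(y))
		return (x);
	return (x > y ? x : y);
#endif
}
```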
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32801
Andrew Turner [Mon, 1 Nov 2021 13:06:56 +0000 (13:06 +0000)]
Switch to Arm Optimized Routines for mem* & str*
These are the updated version of the older Cortex Strings Library we
previously used. The Arm Optimized Routines also support CPU features
that are currently in development on FreeBSD, e.g. Branch Target
Identification (BTI). Rather than add BTI support to the old code it's
easier to just use the maintained version.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32774
Andriy Gapon [Fri, 19 Nov 2021 07:56:30 +0000 (09:56 +0200)]
iflib_stop: drain rx tasks to prevent any data races
iflib_stop modifies iflib data structures that are used by _task_fn_rx,
most prominently the free lists. So, iflib_stop has to ensure that the
rx task threads are not active.
This should help to fix a crash seen when iflib_if_ioctl (e.g.,
SIOCSIFCAP) is called while there is already traffic flowing.
The crash has been seen on VMware guests with the vmxnet3 driver.
My guess is that on physical hardware the couple of 1ms delays that
iflib_stop has after disabling interrupts are enough for the queued work
to be completed before any iflib state is touched.
But on busy hypervisors the guests might not get enough CPU time to
complete the work, thus there can be a race between the taskqueue
threads and the work done to handle an ioctl, specifically in iflib_stop
and iflib_init_locked.
Warner Losh [Sun, 10 Oct 2021 17:28:54 +0000 (11:28 -0600)]
Bootstrap: Prune building from pre-FreeBSD 11 support
We don't need to bootstrap lex or md4 anymore.
Cat doesn't need to be bootstrapped (but is needed for buildkernel)
crunchgen doesn't need to be bootstrapped at all.
kbdcontrol isn't needed
Gleb Smirnoff [Fri, 19 Nov 2021 04:26:09 +0000 (20:26 -0800)]
Add tcp_freecb() - single place to free tcpcb.
Until this change there were two places where we would free a tcpcb:
tcp_discardcb() when all timers are drained, and tcp_timer_discard()
otherwise. They were pretty much copy-n-paste, except that in the
default case we would run tcp_hc_update(). Merge them into a single
function, tcp_freecb(), and move the new, short version of
tcp_timer_discard() to tcp_timer.c and make it static.
Gleb Smirnoff [Fri, 19 Nov 2021 04:24:46 +0000 (20:24 -0800)]
tcp_timewait: use on stack struct tcptw as last resort
If uma_zalloc() fails and reuse via tcp_tw_2msl_scan() also fails,
just use an on-stack tcptw. This allows us to run through
tcp_twrespond() and the standard tcpcb discard routine.
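The fallback chain can be sketched as follows: try the allocator, then try to reclaim an existing entry, and finally fall back to a structure on the caller's stack so the normal response path still runs. This is a minimal sketch; struct tw_sketch, tw_reclaim(), and timewait_respond() are hypothetical stand-ins for struct tcptw, tcp_tw_2msl_scan(), and the kernel's TIME_WAIT code, and malloc() stands in for uma_zalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, much-reduced stand-in for struct tcptw. */
struct tw_sketch {
	int	 responded;
};

/* Hypothetical reclaim hook standing in for tcp_tw_2msl_scan(); always fails here. */
static struct tw_sketch *
tw_reclaim(void)
{
	return (NULL);
}

/*
 * Run the response path. simulate_alloc_failure forces the allocator
 * to fail so the stack fallback is exercised. Returns true if the
 * on-stack structure was used.
 */
static bool
timewait_respond(bool simulate_alloc_failure)
{
	struct tw_sketch local, *tw;
	bool from_heap, on_stack;

	tw = simulate_alloc_failure ? NULL : malloc(sizeof(*tw));
	from_heap = (tw != NULL);
	if (tw == NULL)
		tw = tw_reclaim();
	if (tw == NULL)
		tw = &local;	/* last resort: valid only until we return */
	tw->responded = 1;	/* stands in for tcp_twrespond() */
	on_stack = (tw == &local);
	if (from_heap)
		free(tw);	/* stack storage must never be freed */
	return (on_stack);
}
```

The key constraint of the on-stack option is lifetime: the structure must be fully used and torn down before the function that owns it returns.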
Bjoern A. Zeeb [Wed, 27 Oct 2021 17:07:38 +0000 (17:07 +0000)]
LinuxKPI: make bcd.h use libkern
Rather than having code to re-define bcd2bin() for the LinuxKPI,
make sure libkern.h is always included before the LinuxKPI version.
Then only re-define our local LinuxKPI implementation. [1]
From the argument-truncating wrapper, call the libkern version.
If we change our libkern implementation in the future, we can save
ourselves the remainder of the hassle. [2] Given that I need this for
an MFC, and I am not sure we can MFC that with libkern, commit this
intermediate step.
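A minimal sketch of the layering described above: a base conversion (standing in for libkern's bcd2bin()) and a LinuxKPI-style wrapper that truncates its argument to 8 bits, as Linux's bcd2bin() operates on a u8, before delegating. Both function names and the arithmetic body of the base conversion are assumptions for illustration; libkern's actual implementation differs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base version: converts one packed-BCD byte to binary. */
static inline int
base_bcd2bin(int bcd)
{
	return (((bcd >> 4) & 0xf) * 10 + (bcd & 0xf));
}

/*
 * LinuxKPI-style wrapper: truncate to 8 bits first, then call the
 * base implementation instead of carrying a second copy of it.
 */
static inline unsigned
linuxkpi_bcd2bin(unsigned int val)
{
	return ((unsigned)base_bcd2bin((uint8_t)val));
}
```

Only the truncation lives in the wrapper, so a later change to the base implementation is picked up without touching the LinuxKPI side.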
Suggested by: Johannes Berg (johannes sipsolutions.net) [1]
Suggested by: ian [2]
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
X-MFC with: 548ada00e54a9e7745d041b1ec7f68f3bd493365
Differential Revision: https://reviews.freebsd.org/D32695
Rick Macklem [Thu, 18 Nov 2021 21:35:25 +0000 (13:35 -0800)]
mountd: Fix handling of usernames that start with a digit
yocalebo_gmail.com submitted a patch for mountd.c that
fixes the case where a username starts with a digit.
Without this patch, the username that starts with a
digit is misinterpreted as a numeric uid.
With this patch, any string that does not entirely
convert to a decimal number via strtoul() is considered
a user/group name.
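The rule can be sketched with strtoul()'s end pointer: a string counts as a numeric ID only if it starts with a digit and strtoul() consumes every character. The helper name parse_numeric_id() is hypothetical and this is not the mountd.c code. The sketch does not check for ERANGE overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Return true and store the value if s is entirely a decimal number;
 * otherwise (e.g. "1user") it should be looked up as a user/group name.
 */
static bool
parse_numeric_id(const char *s, unsigned long *idp)
{
	char *end;
	unsigned long v;

	/* strtoul() skips whitespace and accepts a sign; require a digit. */
	if (!isdigit((unsigned char)s[0]))
		return (false);
	v = strtoul(s, &end, 10);
	if (*end != '\0')	/* trailing non-digits: it is a name */
		return (false);
	*idp = v;
	return (true);
}
```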
xen/privcmd: fix MMAP_RESOURCE ioctl to copy out results
The current definition for the MMAP_RESOURCE ioctl was wrong as it
didn't copy back the result to the caller. Fix the definition and also
remove the bogus attempt to copy the result in the implementation.
Note such copy back is only needed when querying the size of a
resource.
Brooks Davis [Thu, 18 Nov 2021 01:02:06 +0000 (01:02 +0000)]
fspacectl: remove unneeded freebsd32 wrapper
fspacectl(2) does not require special handling on freebsd32. The
presence of off_t in a struct does not cause its size to change
between the native ABI and the 32-bit ABI supported by freebsd32
because off_t is always int64_t on BSD systems. Further, byte
order only requires handling for paired argument or return registers.
(32-bit alignment of 64-bit objects on i386 can require special
handling, but that situation does not apply here.)
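The size argument can be checked at compile time. A struct made only of 64-bit fields has the same layout whether 64-bit integers are 8-byte aligned (amd64) or 4-byte aligned (i386), because no padding is ever needed between them. The struct below is a hypothetical model of an {offset, length} pair of off_t, not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modeled on an {offset, length} pair of off_t (always int64_t on BSD). */
struct range_sketch {
	int64_t	r_offset;
	int64_t	r_len;
};

/* Holds under both 4-byte and 8-byte alignment of int64_t. */
static_assert(sizeof(struct range_sketch) == 16, "no ABI-dependent padding");
static_assert(offsetof(struct range_sketch, r_len) == 8, "fixed field offset");
```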
Alex Richardson [Wed, 17 Nov 2021 23:51:40 +0000 (15:51 -0800)]
elf*_brand_inuse: Change return type to bool.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33052
Alex Richardson [Wed, 17 Nov 2021 23:51:29 +0000 (15:51 -0800)]
imgact_elf: Use bool instead of boolean_t.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33051
The fixup list was erroneously assumed to contain directories only.
Only in the case of critical file flag modification (e.g.
SF_IMMUTABLE on BSD systems) may other file types (e.g. regular files
or symbolic links) be added to the fixup list. We still need to
verify that we are writing to the correct file type, so compare the
archive entry file type with the file type of the file to be
modified.
Fixes vendor issue #1617:
Immutable flag no longer preserved during tar extraction on FreeBSD
Brooks Davis [Wed, 17 Nov 2021 20:12:26 +0000 (20:12 +0000)]
freebsd32: sync return types with default ABI
This consists of int -> ssize_t where required and one int -> mode_t.
As a rule, return types are informative rather than functional as the
actual return is in a register.
Brooks Davis [Wed, 17 Nov 2021 20:12:26 +0000 (20:12 +0000)]
freebsd32: rename 32-bit compat pads to _pad
Some 32-bit architectures pass 64-bit values in aligned
register pairs (a0,a1), (a2,a3) etc. In freebsd32 we add these pads
explicitly from compat code. We also sometimes add pads in the default
ABI. Differentiate the two by making the freebsd32 ones int _pad.
In a future commit the 32-bit ones will be automatically generated.
Brooks Davis [Wed, 17 Nov 2021 20:12:26 +0000 (20:12 +0000)]
freebsd32: fix getfsstat sign extension bugs
Add freebsd32 versions of getfsstat and freebsd11_getfsstat so that
bufsize is properly sign-extended if a negative value is passed.
Reject negative values before passing to kern_getfsstat as a size_t.
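The failure mode can be sketched in user-space C. A 32-bit caller passes bufsize as a 32-bit value; if the compat layer treats it as unsigned, -1 is zero-extended into a huge size instead of being rejected. The sketch below declares the incoming value as int32_t so widening sign-extends, and rejects negatives before converting to size_t. The function name compat32_bufsize() is hypothetical; only the EINVAL convention is modeled on the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validate a bufsize received from a 32-bit process. Declaring the
 * argument as int32_t makes the widening to long sign-extend, so -1
 * stays -1 and can be rejected.
 */
static int
compat32_bufsize(int32_t bufsize32, size_t *sizep)
{
	long bufsize = bufsize32;	/* sign-extends */

	if (bufsize < 0)
		return (EINVAL);
	*sizep = (size_t)bufsize;
	return (0);
}
```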
Brooks Davis [Wed, 17 Nov 2021 20:12:26 +0000 (20:12 +0000)]
freebsd32: signed long corrections
Syscalls that take signed longs need to treat the 32-bit versions as
signed int so that sign extension happens correctly. Improve
declaration quality and add a few minimal syscall implementations.
Brooks Davis [Wed, 17 Nov 2021 20:12:25 +0000 (20:12 +0000)]
freebsd32: add stubs for ofreebsd32_(send|recv)msg
The upcoming change to generate the freebsd32 system call files from
sys/kern/syscalls.master has no way to disable this one without also
disabling its non-COMPAT counterpart, so just add a stub for now.
Brooks Davis [Wed, 17 Nov 2021 20:12:25 +0000 (20:12 +0000)]
freebsd32: add feed-forward clock syscalls
These are required when supporting i386 because time_t is 32-bit, which
reduces struct bintime to 12 bytes when combined with the fact that 64-bit
integers only require 32-bit alignment on i386. Reusing the default
ABI version resulted in 4-byte overreads or overwrites to userspace.
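The size mismatch is easy to reproduce. With a 64-bit time_t, struct bintime is 16 bytes, while the i386 layout (32-bit sec, 64-bit frac aligned only to 4 bytes) is 12. A compat struct can reproduce the i386 layout on any host by splitting frac into two 32-bit words. The names below are hypothetical, and the conversion is an illustration rather than FreeBSD's compat code.

```c
#include <assert.h>
#include <stdint.h>

/* Native layout with a 64-bit time_t: 8-byte sec, 8-byte frac. */
struct bintime_native {
	int64_t		sec;
	uint64_t	frac;
};

/* i386 layout reproduced portably: 4 + 4 + 4 = 12 bytes, no padding. */
struct bintime32_sketch {
	int32_t		sec;
	uint32_t	frac[2];	/* frac[0] is the low word, as on little-endian i386 */
};

static void
bintime_to_32(const struct bintime_native *bt, struct bintime32_sketch *bt32)
{
	bt32->sec = (int32_t)bt->sec;	/* truncates, like a 32-bit time_t */
	bt32->frac[0] = (uint32_t)bt->frac;
	bt32->frac[1] = (uint32_t)(bt->frac >> 32);
}
```

Copying a 16-byte native struct into a 12-byte userspace buffer (or reading 16 bytes from one) is exactly the 4-byte overrun described above.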
Brooks Davis [Wed, 17 Nov 2021 20:12:25 +0000 (20:12 +0000)]
freebsd32: don't implement kldsym
Previously we fell back to sys_kldsym, but because we'd always
mismatch on the version field we'd return EINVAL. A freebsd32
implementation is impossible with the current ABI as there simply
isn't space to store a kernel virtual address in a uint32_t.
ofreebsd32_sigprocmask, ofreebsd32_sigblock, ofreebsd32_sigsetmask,
and ofreebsd32_sigsuspend were all duplicates of the default ABI
versions and there are no type concerns as all arguments are the
same.
Brooks Davis [Wed, 17 Nov 2021 20:12:24 +0000 (20:12 +0000)]
freebsd32: remove freebsd32_recvfrom
freebsd32_recvfrom() serves no purpose as no arguments require
translation. The prototype was mis-declared and the implementation
contained (relatively harmless) errors.
Brooks Davis [Wed, 17 Nov 2021 20:12:24 +0000 (20:12 +0000)]
freebsd32: remove redundant no-arg syscalls
pipe requires no special handling.
ofreebsd32_sigpending did differ from osigpending in that it acted
on the siglist rather than the sigqueue, but this appears to be an
oversight in 3fbdb3c21524d9d95278ada1d61b4d1e6bee654b.
ogetpagesize could theoretically have ABI-dependent results, but in
practice does not. If it did, it would be easy to handle in the central
implementation, and that would be the least of the problems in changing
the value of PAGE_SIZE.
Follow common convention and put the `32` on the end of the struct
name. This is a step toward generating freebsd32 syscall files
from sys/kern/syscalls.master.
Brooks Davis [Wed, 17 Nov 2021 20:12:23 +0000 (20:12 +0000)]
freebsd32: add a union semun_old32
Use this for COMPAT7 support. In practice it's the same as
union semun32 since the pointers become uint32_t's, but it's more
symmetric and is the logical thing to generate from semun_old.