Adam Fenn [Sat, 7 Aug 2021 20:01:46 +0000 (13:01 -0700)]
pvclock: Add 'struct pvclock' API
Consolidate more hypervisor-agnostic functionality behind a new 'struct
pvclock' API.
This should also make it easier to subsequently add hypervisor-agnostic
vDSO timekeeping support.
Also, perform some clean-up:
- Remove 'pvclock_get_last_cycles()'; do not allow external access
to 'pvclock_last_systime' since this is not necessary.
- Consolidate/simplify wall and system time reading codepaths.
- Ensure correct ordering within wall and system time reading
codepaths via 'atomic(9)' and 'rdtsc_ordered()' rather than via
'rmb()'.
- Remove some extra newlines.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
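
The ordering concern in the clean-up list above can be illustrated with a seqlock-style reader. This is a minimal user-space sketch using C11 atomics, not the committed code: `struct sketch_time_info` and `sketch_read()` are hypothetical names, and it assumes the usual pvclock convention that an odd version number means an update is in progress.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a hypervisor-shared time record. */
struct sketch_time_info {
	_Atomic uint32_t version;	/* odd while the writer is updating */
	uint64_t system_time;		/* payload protected by version */
};

/*
 * Read a consistent snapshot of system_time: retry while an update is in
 * progress or the version changed while the payload was being read.
 */
static uint64_t
sketch_read(struct sketch_time_info *ti)
{
	uint32_t v1, v2;
	uint64_t t;

	do {
		/* Acquire load: the payload read cannot move before it. */
		v1 = atomic_load_explicit(&ti->version, memory_order_acquire);
		t = ti->system_time;
		/* Acquire fence: the payload read cannot move past the re-check. */
		atomic_thread_fence(memory_order_acquire);
		v2 = atomic_load_explicit(&ti->version, memory_order_relaxed);
	} while ((v1 & 1) != 0 || v1 != v2);
	return (t);
}
```

The acquire load and fence take the place of explicit read barriers around the payload access; a real reader would also need an ordered TSC read, which is outside the scope of this sketch.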
fstatat(2): handle non-vnode file descriptors for AT_EMPTY_PATH
Set NIRES_EMPTYPATH earlier, to have use of EMPTYPATH recorded even if
we are going to return an error. When namei_setup() refuses to accept a
dirfd that is not of the vnode type, as indicated by an ENOTDIR error
return, fall back to kern_fstat(dirfd).
Reported by: dchagin
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31530
ufs rename: ensure that the result of ufs_checkpath() is stable
ufs_rename() calls ufs_checkpath() to ensure that the target directory
is not a child of the source. Otherwise, the rename would create a loop.
For instance:
source->X1->X2->target
and if source is moved under target, we get a corrupted filesystem.
Suppose that we initially have
source->X1 .... and X2->target
where X1 is not on the path from root to X2. Then ufs_checkpath() accepts
the inodes, but nothing prevents a parallel rename from moving X2
under X1 after checkpath has finished.
Ensure stability of the ufs_checkpath() result by taking a per-mount sx in
ufs_rename() right before ufs_checkpath() and holding it till the end.
Reviewed by: chs, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Mark Johnston [Fri, 13 Aug 2021 13:52:05 +0000 (09:52 -0400)]
arc4random: Avoid KMSAN false positives from pre-seeding results
If code calls arc4random(), and our RNG is not yet seeded and
random_bypass_before_seeding is true, we'll compute a key using the
SHA256 hash of some hopefully hard-to-predict data, including the
contents of an uninitialized stack buffer (which is also the output
buffer).
When KMSAN is enabled, this use of uninitialized state propagates through
to the arc4random() output, resulting in false positives. To address
this, lie to KMSAN and explicitly mark the buffer as initialized.
Reviewed by: cem (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31510
lld rounds up p_memsz(PT_GNU_RELRO) to satisfy common-page-size. If the
page size is smaller than common-page-size, rounding up relro_size may
incorrectly make some RW pages read-only.
GNU ld, gold, and ld.lld ensure p_vaddr+p_memsz is a multiple of
common-page-size. While max-page-size >= the system page size,
common-page-size can be smaller than the system page size.
This is a new major release with a number of changes and extensions:
- Limited the number of temporary numbers and made the space for them
static so that allocating more space for them cannot fail.
- Allowed integers with non-zero scale to be used with power, places,
and shift operators.
- Added greatest common divisor and least common multiple to lib2.bc.
- Made bc and dc UTF-8 capable.
- Added the ability for users to have bc and dc quit on SIGINT.
- Added the ability for users to disable prompt and TTY mode by
environment variables.
- Added the ability for users to redefine keywords.
- Added dc's modular exponentiation and divmod to bc.
- Added the ability to assign strings to variables and array elements
and pass them to functions in bc.
- Added dc's asciify command and stream printing to bc.
- Added bitwise and, or, xor, left shift, right shift, reverse,
left rotate, right rotate, and mod functions to lib2.bc.
- Added the functions s2u(x) and s2un(x,n), to lib2.bc.
Kornel Duleba [Fri, 13 Aug 2021 07:35:08 +0000 (09:35 +0200)]
ipsec: Return error code if no matching SA was found
If we matched an SP to a packet, but no associated SA was found,
ipsec4_allocsa will return NULL while setting error=0.
This resulted in a use after free and a potential kernel panic.
Return EINPROGRESS in the case described above instead.
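
The general pattern behind this fix can be sketched as follows. The names `sketch_sa`, `sketch_allocsa()` and `sketch_output()` are hypothetical, and the caller shown here is an assumption for illustration only, since the log does not show how the real callers consume the error. The point is that a lookup reporting failure through both a NULL return and an error out-parameter must never pair NULL with an error of 0.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_sa { int id; };	/* hypothetical stand-in for an SA */

/*
 * Hypothetical lookup: returns the SA, or NULL with *error set.
 * NULL is never returned together with *error == 0.
 */
static struct sketch_sa *
sketch_allocsa(struct sketch_sa *found, int *error)
{
	if (found == NULL) {
		*error = EINPROGRESS;	/* no matching SA: say so */
		return (NULL);
	}
	*error = 0;
	return (found);
}

/* Assumed caller that relies on the error code alone to decide to go on. */
static int
sketch_output(struct sketch_sa *found)
{
	struct sketch_sa *sav;
	int error;

	sav = sketch_allocsa(found, &error);
	if (error != 0)
		return (error);		/* stop before touching sav */
	return (sav->id >= 0 ? 0 : EINVAL);
}
```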
Rick Macklem [Thu, 12 Aug 2021 23:48:28 +0000 (16:48 -0700)]
nfsd: Fix sanity check for NFSv4.2 Allocate operations
The NFSv4.2 Allocate operation sanity checks the aa_offset
and aa_length arguments. Since they are assigned to variables
of type off_t (signed) it was possible for them to be negative.
It was also possible for aa_offset+aa_length to exceed OFF_MAX
when stored in lo_end, which is uint64_t.
This patch adds checks for these cases to the sanity check.
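
A check of this kind can be sketched as below. This is an illustration, not the committed code: `sketch_check_allocate()` is a hypothetical name, `SKETCH_OFF_MAX` assumes a 64-bit signed off_t, and EINVAL is a placeholder for whatever NFS status the server actually returns.

```c
#include <errno.h>
#include <stdint.h>

#define	SKETCH_OFF_MAX	INT64_MAX	/* assumes a 64-bit signed off_t */

/* Return 0 if the offset/length pair is sane, EINVAL otherwise. */
static int
sketch_check_allocate(int64_t off, int64_t len)
{
	/* Unsigned wire values stored in a signed type may turn negative. */
	if (off < 0 || len < 0)
		return (EINVAL);
	/* off + len > OFF_MAX, rearranged so the sum is never computed. */
	if (len > SKETCH_OFF_MAX - off)
		return (EINVAL);
	return (0);
}
```

Rearranging the comparison as `len > OFF_MAX - off` avoids signed overflow, which would be undefined behavior if the sum were computed directly.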
Jessica Clarke [Thu, 12 Aug 2021 22:50:48 +0000 (23:50 +0100)]
tools/build/cross-build: Fix building libllvmminimal on Linux
There is a __used member in glibc's posix_spawn_file_actions_t in
spawn.h, so we must temporarily undefine __used when including it,
otherwise Support/Unix/Program.inc fails to build. This is based on
similar handling for __unused in other headers.
Fixes: 31ba4ce8898f ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week
Jessica Clarke [Thu, 12 Aug 2021 22:45:09 +0000 (23:45 +0100)]
bsd.compiler.mk: Fix cross-building from non-FreeBSD
On non-FreeBSD, the various MACHINE variables for the host when
bootstrapping can be missing or not match FreeBSD's naming, causing
bsd.endian.mk to be unable to infer the endianness. Work around this by
assuming it's unsupported.
Note that we can't check BOOTSTRAPPING here as Makefile.inc1 includes
bsd.compiler.mk before that is set, and so we are unable to catch errors
during buildworld itself when cross-building and bsd.endian.mk failed,
but such errors should also show up when building on FreeBSD.
Fixes: 47363e99d3d3 ("Enable compressed debug on little-endian targets")
John Baldwin [Thu, 12 Aug 2021 15:48:14 +0000 (08:48 -0700)]
cxgbei: Wait for the final CPL to be received in icl_cxgbei_conn_close.
A socket in the FIN_WAIT_1 state is marked disconnected by
do_close_con_rpl() even though there might still be receive data pending.
This is because the socket at that point has set SBS_CANTRCVMORE which
causes the protocol layer to discard any data received before the FIN.
However, icl_cxgbei_conn_close needs to wait until all the data has
been discarded. Replace the wait for SS_ISDISCONNECTED with instead
waiting for final_cpl_received() to be called.
Ka Ho Ng [Thu, 12 Aug 2021 15:01:02 +0000 (23:01 +0800)]
uipc_shm: Implement fspacectl(2) support
This implements fspacectl(2) support on shared memory objects. The
semantics of SPACECTL_DEALLOC are equivalent to clearing the backing
store and freeing the pages within the affected range. If the call
succeeds, subsequent reads on the affected range return all zero.
tests/sys/posixshm/posixshm_tests.c is expanded to include a
fspacectl(2) functional test.
Sponsored by: The FreeBSD Foundation
Reviewed by: kevans, kib
Differential Revision: https://reviews.freebsd.org/D31490
Ka Ho Ng [Thu, 12 Aug 2021 14:58:52 +0000 (22:58 +0800)]
vfs: Add ioflag to VOP_DEALLOCATE(9)
The addition of ioflag allows callers to pass
IO_SYNC/IO_DATASYNC/IO_DIRECT down to the file system implementation.
The vop_stddeallocate fallback implementation is updated to pass the
ioflag to the file system implementation. vn_deallocate(9) internally is
also changed to pass ioflag to the VOP_DEALLOCATE call.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D31500
Cy Schubert [Thu, 12 Aug 2021 13:38:21 +0000 (06:38 -0700)]
wpa: Add wpa_cli action file event
Yang Zhong at the FreeBSD Foundation is working on a wireless network
configurator for an experimental FreeBSD installer. The new installer
requires an event to detect when connecting to a network fails due to a
bad password. When this happens a WPA-EVENT-TEMP-DISABLED event is
triggered. This patch passes the event to an action file provided by
the new experimental installer.
Submitted by: Yang Zhong <yzhong () freebsdfoundation.org>
Reviewed by: assumed to be reviewed by emaste (and cy)
MFC after: 1 week
Dmitry Chagin [Thu, 12 Aug 2021 08:49:01 +0000 (11:49 +0300)]
linux(4): Add struct clone_args for future clone3 system call.
In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.
Dmitry Chagin [Thu, 12 Aug 2021 08:45:25 +0000 (11:45 +0300)]
fork: Allow ABI to specify fork return values for child.
At least the Linux x86 ABIs do not use the carry bit and expect that the dx register
is preserved. For this, add a new sv_set_fork_retval hook and call it from cpu_fork().
Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.
Dmitry Chagin [Thu, 12 Aug 2021 08:36:24 +0000 (11:36 +0300)]
linux(4): Fix futex copyrights.
As no NetBSD code remains in futexes, replace the NetBSD copyrights with
the standard FreeBSD 2-clause license.
Add Roman Divacky's copyrights as an author of the robust futexes.
Roger Pau Monné [Wed, 11 Aug 2021 14:55:10 +0000 (16:55 +0200)]
loader: fix multiboot loading on UEFI
The Xen kernel has no symbol tables, so calling lookup_symbol against
it triggers the following Divide by Zero fault:
Loading Xen kernel...
/boot/xen data=0x2809c8+0x149638 |
!!!! X64 Exception Type - 00(#DE - Divide Error) CPU Apic ID - 00000000 !!!!
Fix lookup_symbol to prevent the #DE fault from happening if the
symbol table is not loaded and also fix loadfile_raw to mark multiboot
kernels as relocatable, since the only multiboot kernel supported is
Xen and was already unconditionally booted as relocatable.
Fixes: f75caed644a5 ('amd64 UEFI loader: stop copying staging area to 2M physical')
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D31507
xen: use correct cache attributes for Xen specific memory regions
bus_activate_resource maps memory regions as uncacheable on x86, which
is more strict than required for regions allocated using xenmem_alloc,
so don't rely on bus_activate_resource and instead map the region
using pmap_mapdev_attr and VM_MEMATTR_XEN as the cache attribute.
Rick Macklem [Thu, 12 Aug 2021 01:49:26 +0000 (18:49 -0700)]
nfscl: Add a Lookup+Open RPC for NFSv4.1/4.2
This patch adds a Lookup+Open compound RPC to the NFSv4.1/4.2
NFS client, which can be used by nfs_lookup() so that a
subsequent Open RPC is not required.
It uses the cn_flags OPENREAD, OPENWRITE added by commit c18c74a87c15.
This reduced the number of RPCs by about 15% for a kernel
build over NFS.
For now, use of Lookup+Open is only done when the "oneopenown"
mount option is used. It may be possible for Lookup+Open to
be used for non-oneopenown NFSv4.1/4.2 mounts, but that will
require extensive further testing to determine if it works.
While here, I've added the changes to the nfscommon module
that are needed to implement the Deallocate NFSv4.2 operation.
This avoids needing another cycle of changes to the internal
KAPI between the NFS modules.
This commit has changed the internal KAPI between the NFS
modules and, as such, all need to be rebuilt from sources.
I have not bumped __FreeBSD_version, since it was bumped a
few days ago.
Justin Hibbits [Thu, 12 Aug 2021 00:03:27 +0000 (19:03 -0500)]
powerpc/pseries: Allow radix pmap in pseries for ISA 3.0
ISA 3.0 allows for nested radix translations with minimal to no
involvement of the hypervisor. This should make pseries significantly
faster on POWER9 pseries instances, as fewer hypercalls are needed to
manage pmap now.
Toomas Soome [Sat, 31 Jul 2021 08:09:48 +0000 (11:09 +0300)]
loader: open file list should be dynamic
Summary:
The open file list is currently created as a statically allocated array (64 items).
Once this array is filled up, the loader will not be able to operate with files.
In most cases this mechanism is good enough, but a problem appears when
we have many disks with zfs pool(s). In the current loader implementation, all
discovered zfs pool configurations are kept in memory and the disk devices stay open,
consuming the open file array. Rewrite the open file mechanism to use a
dynamically allocated list.
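
The idea of replacing a fixed-size table with a dynamically allocated list can be sketched as follows. This is a minimal illustration with hypothetical names (`sketch_file`, `sketch_open()`, `sketch_close()`), not the loader's actual data structure: closed entries are reused, and a new entry is appended only when none is free.

```c
#include <stdlib.h>

struct sketch_file {
	int fd;				/* descriptor number */
	int in_use;			/* non-zero while open */
	struct sketch_file *next;
};

static struct sketch_file *sketch_files;	/* head of the list */

/* Return a descriptor, reusing a closed entry or appending a new one. */
static int
sketch_open(void)
{
	struct sketch_file *f, **tail = &sketch_files;
	int fd = 0;

	for (f = sketch_files; f != NULL; f = f->next, fd++) {
		if (!f->in_use) {
			f->in_use = 1;
			return (f->fd);
		}
		tail = &f->next;
	}
	f = calloc(1, sizeof(*f));
	if (f == NULL)
		return (-1);		/* out of memory, not out of slots */
	f->fd = fd;
	f->in_use = 1;
	*tail = f;
	return (fd);
}

/* Mark the entry for fd as free so a later open can reuse it. */
static void
sketch_close(int fd)
{
	struct sketch_file *f;

	for (f = sketch_files; f != NULL; f = f->next)
		if (f->fd == fd)
			f->in_use = 0;
}
```

With this shape the only hard limit is available memory, so keeping many disk devices open no longer exhausts a 64-entry table.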
Eric van Gyzen [Fri, 6 Aug 2021 15:38:51 +0000 (10:38 -0500)]
netdump: send key before dump, in case dump fails
Previously, if an encrypted netdump failed, such as due to a timeout or
network failure, the key was not saved, so a partial dump was
completely useless.
Send the key first, so the partial dump can be decrypted, because even a
partial dump can be useful.
Eric van Gyzen [Sat, 7 Aug 2021 08:59:02 +0000 (03:59 -0500)]
dumpon: fix encrypted dumps after commit 372557d8c3d
That commit moved key generation into a child process, including
a memory allocation referenced by a structure. The child wrote
the structure to the parent over a pipe, but did not write the
referenced allocation. The parent read the structure from the
child and used its pointer, which was bogus in the parent.
In the child, send both chunks of data to the parent. In the
parent, make a corresponding allocation and read both chunks.
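
The bug class and the fix can be sketched with a plain pipe. This is an illustration under stated assumptions, not the dumpon code: `struct sketch_key`, `sketch_send()` and `sketch_recv()` are hypothetical, and short reads and writes are treated as errors rather than retried, which a production version would need to handle.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

struct sketch_key {
	uint32_t keysize;	/* length of the buffer behind 'key' */
	uint8_t *key;		/* only meaningful in the owning process */
};

/* Sender: write the structure, then the buffer it points to. */
static int
sketch_send(int fd, const struct sketch_key *k)
{
	if (write(fd, k, sizeof(*k)) != (ssize_t)sizeof(*k))
		return (-1);
	if (write(fd, k->key, k->keysize) != (ssize_t)k->keysize)
		return (-1);
	return (0);
}

/* Receiver: read the structure, allocate locally, read the buffer. */
static int
sketch_recv(int fd, struct sketch_key *k)
{
	if (read(fd, k, sizeof(*k)) != (ssize_t)sizeof(*k))
		return (-1);
	/* The pointer that arrived belongs to the sender; replace it. */
	k->key = malloc(k->keysize);
	if (k->key == NULL)
		return (-1);
	if (read(fd, k->key, k->keysize) != (ssize_t)k->keysize) {
		free(k->key);
		k->key = NULL;
		return (-1);
	}
	return (0);
}
```

The pointer value copied over the pipe is deliberately overwritten before use, since an address from one process has no meaning in another.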
Mark Johnston [Wed, 11 Aug 2021 20:22:26 +0000 (16:22 -0400)]
geom_disk: Add KMSAN checks
- In g_disk_start(), verify that the data to be written is initialized
according to KMSAN shadow state.
- In g_disk_done(), verify that the block driver updated shadow state as
expected, so as to catch sources of false positives early.
Andrew Gallatin [Wed, 11 Aug 2021 18:06:43 +0000 (14:06 -0400)]
ktls: Init reset tag task for cloned sessions
When cloning a ktls session (which is needed when we need to
switch output NICs for a NIC TLS session), we need to also
init the reset task, like we do when creating a new tls session.
Mitchell Horne [Wed, 11 Aug 2021 17:40:01 +0000 (14:40 -0300)]
kdb: Handle process enumeration before procinit()
Make kdb_thr_first() and kdb_thr_next() return sane values if the
allproc list and pidhashtbl haven't been initialized yet. This can
happen if the debugger is entered very early on, for example with the
'-d' boot flag.
This allows remote gdb to attach at such a time, and fixes some ddb
commands like 'show threads'.
Be explicit about the static initialization of these variables. This
part has no functional change.
From the (substantially larger) upstream commit:
+ call delay_output_sp to handle BSD-style padding when tputs_sp is
called, whether directly or internally, to ensure that the SCREEN
pointer is passed correctly (reports by Henric Jungheim, Juraj
Lutter).
This fixes bison segfaults observed when colourized output is enabled.
Thanks to jrtc27@ for identifying the upstream fix.
Warner Losh [Wed, 11 Aug 2021 16:59:28 +0000 (10:59 -0600)]
stand: Add MK_PIE=no to defs.mk
There's no need to build both pie and non-pie .o's for stand. There's
some other build thing with MK_BEAR_SSL=yes and/or MK_LOADER_VERIEXEC=yes
that causes the pie build to fail at the 'ar' stage now. Since we don't
need both the PIE stuff and the non-PIE stuff, disable PIE for the boot loader.
Andrew Turner [Wed, 11 Aug 2021 15:01:25 +0000 (16:01 +0100)]
Read the arm64 midr register earlier
We use the midr_el1 register to decode which CPU type we are booting
from. Read it on the secondary CPUs before waiting for the boot CPU
to release us as it will need to use it before the release.
Andrew Turner [Fri, 23 Jul 2021 09:14:03 +0000 (10:14 +0100)]
Use arm64 sha256 intrinsics in libmd
Summary:
When running on a CPU that supports the arm64 sha256 intrinsics use them
to improve performance of sha256 calculations.
With this change the following improvement has been seen on an Apple M1
with FreeBSD running under Parallels, with similar results on a
Neoverse-N1 r3p1.
x sha256.orig
+ sha256.arm64
+--------------------------------------------------------------------+
|++ x x|
|+++ xxx|
||A |A||
+--------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5          3.41           3.5          3.46         3.458   0.042661458
+   5          0.47          0.54           0.5         0.504   0.027018512
Difference at 95.0% confidence
        -2.954 +/- 0.0520768
        -85.4251% +/- 0.826831%
        (Student's t, pooled s = 0.0357071)
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31284
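
The difference figures in the ministat output above can be re-derived from the two averages. A small sketch with hypothetical helper names, covering only the point estimates and not the confidence interval:

```c
/* Absolute difference between the candidate and the baseline average. */
static double
sketch_diff(double baseline, double candidate)
{
	return (candidate - baseline);
}

/* The same difference expressed as a percentage of the baseline. */
static double
sketch_diff_pct(double baseline, double candidate)
{
	return (100.0 * (candidate - baseline) / baseline);
}
```

With the averages 3.458 (sha256.orig) and 0.504 (sha256.arm64), these give about -2.954 and about -85.43%, in line with the reported values.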
Andrew Turner [Thu, 5 Aug 2021 14:36:07 +0000 (14:36 +0000)]
Only use byte register access in legacy virtio pci
Some simulators don't implement arbitrary sized memory access to the
virtio PCI registers. Follow Linux and use single byte accesses to read
and write to these registers.
Reviewed by: bryanv, emaste (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31424
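
Reading a wider register through single-byte accesses can be sketched as below. This is an illustration only: `sketch_read_1()` and `sketch_read_4()` are hypothetical helpers over a plain memory buffer, a real driver would go through its bus-space accessors, and a little-endian register layout is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one byte; stands in for a single-byte register access. */
static uint8_t
sketch_read_1(const volatile uint8_t *regs, size_t off)
{
	return (regs[off]);
}

/* Assemble a little-endian 32-bit value from four byte accesses. */
static uint32_t
sketch_read_4(const volatile uint8_t *regs, size_t off)
{
	uint32_t v = 0;

	for (size_t i = 0; i < 4; i++)
		v |= (uint32_t)sketch_read_1(regs, off + i) << (8 * i);
	return (v);
}
```

Because every access is one byte wide, the device model only ever needs to support the smallest access size, at the cost of more accesses per value.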
Mark Johnston [Tue, 10 Aug 2021 21:23:49 +0000 (17:23 -0400)]
ck: Correct asm output operand widths in amd64 pointer intrinsics
This does not appear to change generated code with the default
toolchain. However, KMSAN makes use of output operand specifications to
instrument inline asm, and with incorrect specifications we get false
positives in code that uses the CK_(S)LIST macros.
This was submitted upstream:
https://github.com/concurrencykit/ck/pull/175
The commit applies the same change locally to make KMSAN usable until
something equivalent is merged upstream.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 10 Aug 2021 21:15:03 +0000 (17:15 -0400)]
uma: Add KMSAN hooks
For now, just hook the allocation path: upon allocation, items are
marked as initialized (absent M_ZERO). Some zones are exempted from
this when it would otherwise raise false positives.
Use kmsan_orig() to update the origin map for UMA and malloc(9)
allocations. This allows KMSAN to print the return address when an
uninitialized UMA item is implicated in a report. For example:
panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe