Kyle Evans [Thu, 23 Nov 2023 16:21:33 +0000 (10:21 -0600)]
arm64: lop off another 24MB of KVA for early device mappings
This grows the block enough to fit a 4K 32-bit depth framebuffer; some
firmware would present smaller GOP modes to be able to boot with a
smaller framebuffer on these devices, but the Windows Devkit firmware
is simply not that nice. Instead, it offers exactly one GOP mode that
matches the current resolution of the attached display, so with limited
control over resolution on most of my displays it'd be nice if we could
Just Work(TM) at 4K.
andrew notes that he has some ideas for removing PMAP_MAPDEV_EARLY_SIZE
entirely, so this limitation could end up removed altogether in the
future.
Mitchell Horne [Thu, 23 Nov 2023 15:28:26 +0000 (11:28 -0400)]
kern_reboot(): don't clear kdb_active
It is possible to reach this function from ddb via the "reset" command.
When this happens, we don't actually exit kdb, meaning we never execute
the latter steps of kdb_break() to restore the system state (e.g.
re-enable scheduler).
Therefore, we should not clear the kdb_active flag in this function, as
the debugger is still active. Put differently, kern_reboot() is not an
authority on kdb state, and should not touch it. The original motivation
for this assignment is not clear; I have checked thoroughly and I am
convinced it is not required by any reset code.
This fixes an edge case where a panic can be triggered during reset from
ddb:
1. Enter ddb via keyboard break sequence (KERNEL_PANICKED() == false &&
td->td_critnest > 0)
2. Execute the "reset" command
3. kern_reboot() sets kdb_active = false
4. A witness_checkorder() call via shutdown handler sees !kdb_active
and panics
Reviewed by: imp, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42684
Mitchell Horne [Thu, 23 Nov 2023 15:27:20 +0000 (11:27 -0400)]
xen: improve shutdown hook
Make better use of the shutdown flags. In particular this now handles
standard reboot where RB_POWERCYCLE is not set, and indicates a crash
when the system has panicked.
While here, give the function a prefix.
Reviewed by: royger, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42343
Mitchell Horne [Thu, 23 Nov 2023 15:26:12 +0000 (11:26 -0400)]
iscsi: adjust shutdown_pre_sync handler
Don't attempt to service reconnections if RB_NOSYNC is set. More
crucially, don't do it if the scheduler is stopped, as the maintenance
thread will never run again.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42342
Mitchell Horne [Thu, 23 Nov 2023 15:59:05 +0000 (11:59 -0400)]
pst: improve shutdown_post_sync handler
It is desirable to shut down the raid controller even in the face of a
panic. In the SCHEDULER_STOPPED() case, set the interrupt mask bits so
that we request a polled wait, rather than sleep(), from
iop_queue_wait_msg().
Tweak the function name and signature.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42337
Most handlers for this event are for disk drivers/geom modules. There
are a mix of checks being used here (or not), so let's standardize on
checking the presence of the RB_NOSYNC flag.
This flag is set whenever:
1. The kernel has panicked and kern.sync_on_panic=0*
2. We reboot from within the kernel debugger (the "reset" command)
3. Userspace requested it, e.g. by 'reboot -n'
Name the functions consistently.
*This sysctl is tuned to zero by default, but its existence means that
these handlers can be executed after a panic, at the user's discretion.
IMO this use-case is implicitly understood to be risky, and we'd be
better off eliminating it altogether.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42337
Rick Macklem [Thu, 23 Nov 2023 15:23:33 +0000 (07:23 -0800)]
nfsd: Fix NFS access to .zfs/snapshot snapshots
When a process attempts to access a snapshot under
/<dataset>/.zfs/snapshot, the snapshot is automounted.
However, without this patch, the automount does not
set mnt_exjail, which results in the snapshot not being
accessible over NFS.
This patch defines a new function called vfs_exjail_clone()
which sets mnt_exjail from another mount point and
then uses that function to set mnt_exjail in the snapshot
automount. A separate patch that is currently a pull request
for OpenZFS, calls this function to fix the problem.
Replace int with either size_t or ssize_t (depending on context) in
order to support bit strings up to SSIZE_MAX bits in length. Since
some of the arguments that need to change type are pointers, we must
resort to light preprocessor trickery to avoid breaking existing code.
Mark Johnston [Wed, 22 Nov 2023 19:11:03 +0000 (14:11 -0500)]
bhyve: Add a slirp network backend
This enables a subset of the functionality provided by QEMU's user
networking implementation. In particular, it uses net/libslirp, the
same library as QEMU.
libslirp is permissively licensed but has some dependencies which make
it impractical to bring into the base system (glib in particular). I
thus opted to make bhyve dlopen the libslirp.so, which can be installed
via pkg. The library header is imported into bhyve.
The slirp backend takes a "hostfwd" which is identical to QEMU's
hostfwd. When configured, bhyve opens a host socket and listens for
connections, which get forwarded to the guest. For instance,
"hostfwd=tcp::1234-:22" allows one to ssh into the guest by ssh'ing to
port 1234 on the host, e.g., via 127.0.0.1. I didn't try to hook up
guestfwd support since I don't personally have a use-case for it yet,
and I think it won't interact nicely with the capsicum sandbox.
Mark Johnston [Wed, 22 Nov 2023 19:10:27 +0000 (14:10 -0500)]
bhyve: Split backends into separate files
Currently the net_backend structure definition is private to
net_backends.c, so all of the backend definitions are there. While
adding a new backend to use libslirp, it was noted that this file is
somewhat cluttered. Move the netmap and netgraph backends to their own
files and clean up includes a bit. No functional change intended.
Alexander Motin [Wed, 22 Nov 2023 20:10:57 +0000 (15:10 -0500)]
CAM: Remove return value from xpt_path_sbuf()
It is wrong to call sbuf_len() on third-party sbuf. If that sbuf
has a drain function, it ends up in assertion. But even would it
work, it would return not newly written length, but the full one.
Searching through the sources I don't see this value used.
Olivier Certner [Tue, 21 Nov 2023 17:33:08 +0000 (18:33 +0100)]
kern_racct.c: Don't compile if RACCT undefined
Just skip compiling this file if RACCT isn't defined. This allows to
skip including headers that no code uses at all, and also to remove the
whole file's #ifdef/#endif bracketing.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Olivier Certner [Thu, 19 Oct 2023 14:28:06 +0000 (16:28 +0200)]
kern_rctl.c: Minimal includes when RCTL not defined
If RCTL is not defined, only the system call stubs returning ENOSYS are
compiled in. In this case, don't waste time including most headers
since their code is not used.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Dimitry Andric [Wed, 22 Nov 2023 18:23:06 +0000 (19:23 +0100)]
compiler-rt: avoid segfaults when re-exec'ing with ASLR
After 930a7c2ac67e ("compiler-rt: re-exec with ASLR disabled when
necessary") and 96fe7c8ab0f6 ("compiler-rt: support ReExec() on
FreeBSD"), binaries linked against the sanitizer libraries may segfault
due to procctl(2) being intercepted. Instead, the non-intercepted
internal_procctl() should be called.
Similarly, the ReExec() function that re-executes the binary after
turning off ASLR should not call elf_aux_info(3) and realpath(3), since
these will also be intercepted. Instead, loop directly over the elf aux
info vector to find the executable path, and avoid calling realpath(3)
since it is actually unwanted for this use case.
Kristof Provost [Wed, 22 Nov 2023 13:44:03 +0000 (14:44 +0100)]
ip_mroute: handle V_mfchashtbl allocation failure
We allocate V_mfchashtbl with HASH_NOWAIT (which maps to M_NOWAIT), so
this allocation may fail. As we didn't handle that failure we could end
up dereferencing a NULL pointer later (e.g. during X_ip_mrouter_done()).
Do the obvious thing and fail out if we cannot allocate the table.
See also: https://redmine.pfsense.org/issues/14917
Sponsored by: Rubicon Communications, LLC ("Netgate")
Brooks Davis [Tue, 21 Nov 2023 22:46:43 +0000 (22:46 +0000)]
libc: remove some obsolete VCS data
These wide char support files were copied from the previous versions
with expanded $FreeBSD$ strings in #if 0 blocks. Remove them and the
scssid definitions in the same #if 0 blocks.
Warner Losh [Tue, 21 Nov 2023 18:36:18 +0000 (11:36 -0700)]
stand/efi: Define ACPI_USE_SYSTEM_INTTYPES to be 1 instead of blank
To avoid a redefinition warning... This needs to be redone correctly,
but this gets amd64 building again... My amd64 environment is polluted
with something that caues earlier failures which I ignored...
Olivier Certner [Fri, 20 Oct 2023 13:43:29 +0000 (15:43 +0200)]
Remove sysctl 'kern.smp.forward_signal_enabled'
It seems this was an "emergency" knob to revert a newly introduced
behavior. Overall, we want better system-wide signal receive latency,
and it doesn't seem that some contrary policy was ever needed (and if
that comes up, it should rather be implemented, e.g., per-process).
Suggested by: kib
Reviewed by: kib, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42315
Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Fixes: 3a338c5341 ("Add the BBR and RACK stacks to the LINT kernel.")
Pull Request: https://github.com/freebsd/freebsd-src/pull/907
Andrew Turner [Tue, 3 Oct 2023 14:03:51 +0000 (15:03 +0100)]
arm64: Set the Guarded Page flag in the kernel
Now the kernel and modules are built with branch protection we can
enablethe Guarded Page flag in the page tables. This causes indirect
branches to a location without a correct landing pad instruction to
raise an exception.
This should help mitigate some attacks where a function pointer is
changed to point somewhere other than the start of the function,
however it doesn't stop an attacker pointing it to an unintended
function.
Reviewed by: alc, scottph (both earlier version), markj
Sponsored by: Arm Ltd
Sponsored by: The FreeBSD Foundation (earlier version)
Differential Revision: https://reviews.freebsd.org/D42080
Andrew Turner [Thu, 12 Oct 2023 14:22:18 +0000 (15:22 +0100)]
libc: Teach libc about the BTI elf note
Add the Branch Target Identification (BTI) note to libc assembly
sources. As all obect files need the note for the library to have it
we need to insert it in all asm files.
Warner Losh [Tue, 21 Nov 2023 03:30:16 +0000 (20:30 -0700)]
stand: bandaide for acpi
Old binaries do not set acpi.rsdp early enough. So when we boot with an
older loader.efi from an ESP that's not been updated, we assume there's
no ACPI on this system. This is unwise. Put a band-aide on this until we
can implement a proper 'feature' variable that the binary reports so we
can do conditionals for things like this in the future.
Tony Hutter [Tue, 21 Nov 2023 00:07:32 +0000 (16:07 -0800)]
ZTS: Fix 'could not unmount datasets' on Alma 9 (#15542)
Many tests are failing on AlmaLinux 9 because ZTS could not destroy the
pool in cleanup. This was due to $PWD being set to '.' instead of the
expected full path. This patch sets $PWD to the full path.
Signed-off-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com>
Simon J. Gerraty [Mon, 20 Nov 2023 20:51:25 +0000 (12:51 -0800)]
Fix share/zoneinfo for DIRDEPS_BUILD
The tranditional build makes multiple passes through the tree.
The DIRDEPS_BUILD visits each directory only once per architecture,
thus makefiles should be able to everything they need in a single pass.
The use of TZS!= when doing make(*install*)
only works if the directory has previously been visited to do zoneinfo
since before the zoneinfo target is run TZS will be empty.
To fix this, have the zoneinfo target capture the list of files to
zoneinfo, and install-zoneinfo use that list.
Rename that target to zonefiles - since that is now what it does.
This is more efficient - we only gather the list of zones when it is
likely to have changed, and allows the makefile to do everything in a
single pass.
This is a follow-up patch to https://reviews.freebsd.org/D42459
that modifies the loader lua to use the correct loader variables
for determining ACPI availability.
This also fixes a bug where ACPI can be inadvertently disabled when
setting System Defaults at the loader menu.
Warner Losh [Mon, 20 Nov 2023 03:48:30 +0000 (20:48 -0700)]
math: Move to const instead of __const
There's no reason to use the __const construct here. This is a left-over
from supporting K&R and ANSI compilers in the original Sun msun. All
other K&R crutches have been removed. Remove these as well. There's no
semantic difference. And there's already several others in math.h.
John Baldwin [Sat, 18 Nov 2023 19:31:07 +0000 (11:31 -0800)]
bsdinstall.8: Clarify the description of ZFSBOOT_FORCE_4K_SECTORS
This variable does not set the exact sector size of the pool, but
controls the minimum sector size. The sector size of the underlying
disks can always be larger than the minium controlled by this knob.
John Baldwin [Sat, 18 Nov 2023 19:08:34 +0000 (11:08 -0800)]
vfs mount: Consistently use ENODEV internally for an invalid fstype
Change vfs_byname_kld to always return an error value of ENODEV to
indicate an unsupported fstype leaving ENOENT to indicate errors such
as a missing mount point or invalid path. This allows nmount(2) to
better distinguish these cases and avoid treating a missing device
node as an invalid fstype after commit 6e8272f317b8.
While here, change mount(2) to return EINVAL instead of ENODEV for an
invalid fstype to match nmount(2).
Gordon Bergling [Sat, 18 Nov 2023 09:09:40 +0000 (10:09 +0100)]
Add a HISTORY section for memcpy(3) and mempcpy(3)
The memcpy() function first appeared in AT&T System V UNIX and was
reimplemented for 4.3BSD-Tahoe. The mempcpy() function first appeared in
FreeBSD 13.1.
Warner Losh [Sat, 18 Nov 2023 04:24:00 +0000 (21:24 -0700)]
nvme: Don't use version to listen for events for ns and fw changes
Instead, use the attribtue bits from the identification data to
determine if we should listen to namespace changes and firmware
activation. Should have no functional change, though we may stop
listening for events that will never happen.
Warner Losh [Sat, 18 Nov 2023 03:46:20 +0000 (20:46 -0700)]
pnpinfo: Remove __P
We don't need to compile on a K&R compiler (and we've long ago lost the
ability to do so). It's not even clear if it ever worked with a pure K&R
compiler, but maybe it once did...
Brooks Davis [Sat, 18 Nov 2023 00:48:14 +0000 (00:48 +0000)]
makesyscalls: don't make syscall.mk by default
We only want to produce syscall.mk for the main syscall table so default
to not producing it (send it to /dev/null) and add a syscalls.conf to
sys/kern to trigger the creation of sys/sys/syscall.mk. This eliminates
the need for entries in other syscalls.conf files and is a cleaner
pattern going forward.
Kristof Provost [Fri, 17 Nov 2023 12:52:34 +0000 (13:52 +0100)]
pf: sctp heartbeats confirm a connection
When we create a new state for multihomed sctp connections (i.e.
based on INIT/INIT_ACK or ASCONF parameters) the new connection will
never see a COOKIE/COOKIE_ACK exchange. We should consider HEARTBEAT_ACK
to be a confirmation that the connection is established.
This ensures that such connections do not time out earlier than
expected.
MFC after: 1 week
Sponsored by: Orange Business Services
Kristof Provost [Thu, 16 Nov 2023 19:55:02 +0000 (20:55 +0100)]
pf: skip urpf check for sctp multihomed states
When we create a new state for multihomed sctp connections (i.e.
based on INIT/INIT_ACK or ASCONF parameters) we cannot know what
interfaces we'll be seeing that traffic on. These states are floating
states, i.e. on "all" interfaces. We cannot do reverse path filtering
for these states, so do not do so.
MFC after: 1 week
Sponsored by: Orange Business Services
Kristof Provost [Thu, 16 Nov 2023 16:06:29 +0000 (17:06 +0100)]
pf: always create multihomed states as floating
When we create a new state for multihomed sctp connections (i.e.
based on INIT/INIT_ACK or ASCONF parameters) we cannot know what
interfaces we'll be seeing that traffic on. Make those states floating,
irrespective of state policy.
MFC after: 1 week
Sponsored by: Orange Business Services
Kirk McKusick [Fri, 17 Nov 2023 22:10:29 +0000 (14:10 -0800)]
Ensure I/O buffers in libufs(3) are 128-byte aligned.
Various disk controllers require their buffers to be aligned to a
cache-line size (128 bytes). For buffers allocated in structures,
ensure that they are 128-byte aligned. Use aligned_malloc to allocate
memory to ensure that the returned memory is 128-byte aligned.
While we are here, we replace the dynamically allocated inode buffer
with a buffer allocated in the uufsd structure just as the superblock
and cylinder group buffers do.
This can be removed if/when the kernel is fixed. Because this problem
has existed on one I/O subsystem or another since the 1990's, we
are probably stuck with dealing with it forever.
The problem most recent showed up in Azure, see:
https://reviews.freebsd.org/D41728
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267654
Before these fixes were applied, it was confirmed that the changes
in this commit also fixed the issue in Azure.
Reviewed-by: Warner Losh, kib Tested-by: Souradeep Chakrabarti of Microsoft (earlier version)
PR: 267654
Differential Revision: https://reviews.freebsd.org/D41724
Brooks Davis [Fri, 17 Nov 2023 22:02:09 +0000 (14:02 -0800)]
freebsd: remove __FBSDID macro use
With FreeBSD's switch to git the $FreeBSD$ string is no longer expanded
and they have mostly been removed upstream. Stop using __FBSDID and
remove the no-longer needed sys/cdefs.h includes.
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #15527
Alexander Motin [Fri, 17 Nov 2023 22:00:59 +0000 (17:00 -0500)]
ZIO: Optimize zio_flush()
- Generalize vdev_nowritecache handling by traversing through the
VDEV tree and skipping children ZIOs where not supported.
- Remove intermediate zio_null() in case of several VDEV children.
- Remove children handling from zio_ioctl(). There are no other
use cases for this code beside DKIOCFLUSHWRITECACHED, and would there
be, I doubt they would so straightforward apply to all VDEV children.
Comparing to removed previous optimization this should improve cases
of redundant ZILs/SLOGs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15515