Robert Wing [Wed, 3 Mar 2021 06:05:47 +0000 (21:05 -0900)]
bhyve/snapshot: provide a way to send other messages/data to bhyve
This is a step towards sending messages (other than suspend/checkpoint)
from bhyvectl to bhyve.
Introduce a new struct, ipc_message - this struct stores the type of
message and a union containing message specific structures for the type
of message being sent.
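
A minimal sketch of the tagged-union layout described above. The opcode
names, the payload struct and the helper ipc_message_init() are
assumptions made for illustration; only the struct name ipc_message, the
enum name ipc_opcode and the PATH_MAX-sized filename buffer come from the
commit messages in this log.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#ifndef PATH_MAX
#define	PATH_MAX	1024	/* fallback for strict-ISO builds (assumption) */
#endif

/* Hypothetical opcodes; the log only mentions suspend and checkpoint. */
enum ipc_opcode {
	START_CHECKPOINT = 0,
	START_SUSPEND = 1,
};

/* Assumed payload for a snapshot-type request. */
struct checkpoint_op {
	char	snapshot_filename[PATH_MAX];
};

/* Tagged union: 'code' says which member of 'data' is valid. */
struct ipc_message {
	enum ipc_opcode	code;
	union {
		struct checkpoint_op	op;
		/* further message-specific structs would go here */
	} data;
};

/* Fill in a snapshot request; returns -1 if the filename does not fit. */
static int
ipc_message_init(struct ipc_message *msg, enum ipc_opcode code,
    const char *filename)
{
	size_t len;

	assert(msg != NULL && filename != NULL);
	len = strlen(filename);
	if (len >= sizeof(msg->data.op.snapshot_filename))
		return (-1);
	memset(msg, 0, sizeof(*msg));
	msg->code = code;
	memcpy(msg->data.op.snapshot_filename, filename, len + 1);
	return (0);
}
```

New message kinds would add an opcode and a union member without
changing the size-prefixed framing seen by the receiver.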
Robert Wing [Mon, 8 Mar 2021 00:23:29 +0000 (15:23 -0900)]
bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM
The save/restore feature uses a unix domain socket to send messages
from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice
for this.
An added benefit of using a datagram socket is simplified code. For
bhyve, the listen/accept calls are dropped; and for bhyvectl, the
connect() call is dropped.
EPRINTLN handles raw mode for bhyve(8); use it to print error messages.
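
The following sketch illustrates why a datagram socket needs less setup.
It is not the bhyve code: it uses socketpair() instead of the
bind()/sendto() pair a real server and client would use, and the function
name dgram_roundtrip() is hypothetical. It shows that no
listen()/accept()/connect() calls are involved and that one recv()
returns exactly one message.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one message over a pair of AF_UNIX datagram sockets and read it
 * back. Returns the number of bytes received, or -1 on error.
 */
static ssize_t
dgram_roundtrip(const void *msg, size_t len, void *out, size_t outlen)
{
	int sv[2];
	ssize_t n;

	assert(msg != NULL && out != NULL);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);
	n = send(sv[0], msg, len, 0);
	if (n == (ssize_t)len)
		n = recv(sv[1], out, outlen, 0);	/* one whole message */
	else
		n = -1;
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```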
Robert Wing [Sat, 27 Feb 2021 21:07:35 +0000 (12:07 -0900)]
bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME
MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character
buffer that stores a filename or the path to a file - this file is used
by the save/restore feature.
Since the file doesn't have anything to do with a vm name, rename
MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX
while here.
Robert Wing [Sat, 27 Feb 2021 21:05:52 +0000 (12:05 -0900)]
bhyvectl: reduce code duplication
Combine send_start_checkpoint() and send_start_suspend() into a
single function named snapshot_request().
snapshot_request() is equivalent to send_start_checkpoint() and
send_start_suspend() except that it takes an additional argument. The
additional argument, enum ipc_opcode, is used to determine the type of
snapshot request being performed. Also, switch to using strlcpy instead
of strncpy.
Roger Pau Monné [Tue, 19 Jan 2021 11:52:28 +0000 (12:52 +0100)]
bhyve/ioapic: improve the tracking of IRR bit
One common method of EOI'ing an interrupt at the IO-APIC level is to
switch the pin to edge triggering mode and then back into level mode.
That would cause the IRR bit to be cleared and thus further interrupts
to be injected. FreeBSD does indeed use that method if the IO-APIC EOI
register is not supported.
The bhyve IO-APIC emulation code didn't clear the IRR bit when doing
that switch, and was also missing acknowledging the IRR state when
trying to inject an interrupt in vioapic_send_intr.
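
The behaviour described above can be sketched with a toy model. This is
not bhyve's vioapic code; the struct and function names are hypothetical
and the model tracks a single pin only. It shows the two rules: a
level-triggered pin refuses a new interrupt while IRR is set, and
switching the pin to edge mode clears IRR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_trigger { TOY_EDGE, TOY_LEVEL };

struct toy_pin {
	enum toy_trigger mode;
	bool	irr;		/* set while a level interrupt awaits EOI */
	int	delivered;	/* number of interrupts injected */
};

/* Reprogram the trigger mode; moving to edge mode clears IRR. */
static void
toy_set_mode(struct toy_pin *pin, enum toy_trigger mode)
{
	assert(pin != NULL);
	if (mode == TOY_EDGE)
		pin->irr = false;
	pin->mode = mode;
}

/* Try to inject an interrupt; returns true if it was delivered. */
static bool
toy_send_intr(struct toy_pin *pin)
{
	assert(pin != NULL);
	if (pin->mode == TOY_LEVEL) {
		if (pin->irr)
			return (false);	/* previous one not yet EOI'd */
		pin->irr = true;
	}
	pin->delivered++;
	return (true);
}
```

A guest that acknowledges by toggling edge/level calls toy_set_mode()
twice; without the IRR clear in the first call, every later
toy_send_intr() on the pin would be refused.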
Roger Pau Monné [Tue, 19 Jan 2021 12:41:03 +0000 (13:41 +0100)]
bhyve/ioapic: only account for asserted line in level mode
After modifying a redirection entry, only try to inject an interrupt if
the pin is in level mode. Pins in edge mode shouldn't take into
account the line assert status, as they are triggered by edge changes,
not the line status itself.
Martin Matuska [Thu, 26 Jan 2023 16:50:13 +0000 (17:50 +0100)]
zfs: merge openzfs/zfs@92e0d9d18 (zfs-2.1-release) into stable/13
OpenZFS release 2.1.9
Notable upstream pull request merges:
#12829 zfs diff -h/ZFS_DIFF_NO_MANGLE, diff cleanups
#14181 zed: unclean disk attachment faults the vdev
#14252 Activate filesystem features only in syncing context
#14253 Allow receiver to override encryption property in case of replication
#14254 Restrict visibility of per-dataset kstats inside FreeBSD jails
#14255 Zero end of embedded block buffer in dump_write_embedded()
#14261 FreeBSD: zfs_register_callbacks() must implement error check correctly
#14264 Miscellaneous fixes
#14272 Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names
#14287 FreeBSD: Remove stray debug printf
#14288 Colorize zfs diff output
#14291 FreeBSD: Fix potential boot panic with bad label
#14328 FreeBSD: catch up to 1400077
Add the glue code to support netlink in Linuxolator.
linux_common(4) now depends on netlink(4).
All netlink protocol constants are consistent with the Linux version.
However, certain OS-specific constants such as AF_INET6, interface
flags or default routing table id, are different between FreeBSD and
Linux. Thus, it may be needed to rewrite some message parts or even
rewrite the whole message, adding or removing some TLVs. The core
netlink implementation code provides efficient rewriting callbacks
which Linuxolator now uses.
Rick Macklem [Wed, 11 Jan 2023 21:20:31 +0000 (13:20 -0800)]
kgssapi: Increase timeout for kernel to gssd(8) upcalls
It turns out that the underlying problem that caused
a Kerberized NFS mount with the "gssname" option to
fail was that the kernel upcall to the gssd(8) daemon
would time out prematurely after 25 seconds. The
gss_acquire_cred() GSSAPI library call
takes about 27 seconds for the case where a desired_name
argument is specified. A similarly long delay occurs
when the gss_init_sec_context() call is made and the
user principal's TGT has expired.
Once the upcall timed out, the kernel code assumed that
the gssd(8) daemon had died and closed the socket.
Ironically, closing the socket did cause the gssd(8)
daemon to terminate via a SIGPIPE signal.
This patch increases the timeout to 5 minutes. Since
a timeout should only occur when the gssd(8) daemon
has died, a long timeout should be ok and seems to fix this
problem.
I still think that commit c33509d49a should remain in the
system, since it allows the mount to complete quickly
and not take nearly 30 seconds.
Rick Macklem [Wed, 11 Jan 2023 21:28:44 +0000 (13:28 -0800)]
nfscl: Improve NFSv4 error message for NFSERR_WRONGSEC
The usual reason for an NFSv4 server replying NFSERR_WRONGSEC
to an operation is that a Kerberos credential is required.
This patch replaces a cryptic "err=10016" with a message
suggesting that a Kerberos TGT is probably needed.
Warner Losh [Wed, 25 Jan 2023 18:31:00 +0000 (11:31 -0700)]
stand/mips: retire BERI boot loader
The folks that created the BERI boot loader no longer need it. They've
long since moved on to other research platforms. Since BERI loader
contains yet another copy of code that I've consolidated; and since it's
impossible to test this platform; and since there are no users, retire
it completely to ease future MFCs.
Warner Losh [Wed, 25 Jan 2023 15:14:45 +0000 (08:14 -0700)]
stand/mips64: Make beri loader compile
The devdesc reorgs in main were done after mips was deleted. Make the
minimal changes to beri's devicename.c needed after that. I have no
ability to test this, however, so it builds with similar warnings to
before all my MFC changes.
Warner Losh [Wed, 25 Jan 2023 14:54:03 +0000 (07:54 -0700)]
stand: update mips uboot to uboot reorg
uboot reorg in main happened after the mips support was removed. Go
ahead and do the same reorg here. Move the mips files from mips/uboot to
uboot/arch/mips. Move mips/uboot/Makefile to
uboot/arch/mips/Makefile.inc and cut out the common lines. Split uboot
target into uboot and uboot.bin to better fit into reorg.
Some devices have CDC_CM descriptors that would point us to
the wrong interfaces. Add a quirk to ignore those (effectively
preferring the CDC_UNION descriptor).
Reviewed by: manu
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37942
Warner Losh [Wed, 25 Jan 2023 04:19:48 +0000 (21:19 -0700)]
stand: mfc notes
MFCing stand is getting complicated. This file contains notes about
merging from 14, in summary:
o UFS changes have already been merged.
o The ZFS changes, on the whole, cannot be merged because OpenZFS
versions are different.
o MIPS is still present in 13, so commits to tear down mips
support cannot be merged.
o Some of the 14 GELI changes depend on crypto changes not yet merged.
o One change bumped the version to 14.x, and that shouldn't be merged.
The hashes listed should not be merged, unless something changes.
Coleman Kane [Tue, 24 Jan 2023 19:20:50 +0000 (14:20 -0500)]
linux 6.2 compat: zpl_set_acl arg2 is now struct dentry
Linux 6.2 changes the second argument of the set_acl operation to be a
"struct dentry *" rather than a "struct inode *". The inode* parameter
is still available as dentry->d_inode, so adjust the call to the _impl
function call to dereference and pass that pointer to it.
Also document that the get_acl -> get_inode_acl member name change from
commit 884a693 was an API change also introduced in Linux 6.2.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14415
Warner Losh [Fri, 20 Jan 2023 23:33:37 +0000 (16:33 -0700)]
byteswap.h: Add a glibc/linux compatible byteswap.h
For endian.h to work instead of sys/endian.h, some software needs
byteswap.h available. It must define {__,}byteswap_{16,32,64}.
Include sys/_endian.h to get an appropriate __byteswap16, etc., and
define the new macros in terms of them. Enhance _endian.h
to allow it to be included from here too.
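
As a rough sketch of the layering described above, the public swap
macros can be thin wrappers over internal helpers. All names below
(sketch_bswap16 and friends, the sketch_bswap_NN macros) are hypothetical
stand-ins chosen to avoid clashing with a real byteswap.h; they are not
the identifiers the header actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Internal helpers, standing in for what sys/_endian.h would provide. */
static inline uint16_t
sketch_bswap16(uint16_t x)
{
	return ((uint16_t)((x << 8) | (x >> 8)));
}

static inline uint32_t
sketch_bswap32(uint32_t x)
{
	return (((x & 0x000000ffU) << 24) | ((x & 0x0000ff00U) << 8) |
	    ((x >> 8) & 0x0000ff00U) | (x >> 24));
}

static inline uint64_t
sketch_bswap64(uint64_t x)
{
	return (((uint64_t)sketch_bswap32((uint32_t)x) << 32) |
	    sketch_bswap32((uint32_t)(x >> 32)));
}

/* Public names defined purely in terms of the helpers. */
#define	sketch_bswap_16(x)	sketch_bswap16(x)
#define	sketch_bswap_32(x)	sketch_bswap32(x)
#define	sketch_bswap_64(x)	sketch_bswap64(x)

/* Swapping twice must give back the original value. */
static inline void
sketch_bswap_selfcheck(void)
{
	assert(sketch_bswap_16(sketch_bswap_16(0x1234)) == 0x1234);
	assert(sketch_bswap_32(sketch_bswap_32(0x11223344U)) == 0x11223344U);
}
```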
Warner Losh [Fri, 20 Jan 2023 23:32:45 +0000 (16:32 -0700)]
linux: For better compatibility, provide compatible endian.h
Add endian.h. This includes sys/endian.h and then adds extra defines
that glibc defines with double underscores for our
_{BIG,BYTE,LITTLE,PDP}_ENDIAN macros. We also define __FLOAT_WORD_ORDER
to be the same as _BYTE_ENDIAN since FreeBSD doesn't currently define
this, and the default with glibc is exactly this for our platforms.
Move common parts of endian.h and sys/endian.h into sys/_endian.h
to limit namespace pollution from endian.h
All this gives us good compatibility with Linux. There may be one or two
upstreams that haven't integrated the patches I tried to send up.
There are some minor differences:
o The extra glibc macros are not defined. These are all
controlled with either __ at the start, or only defined
when glibc is being built. We also don't define macros
that are used internally in glibc that would pollute
the namespace.
o For complete compatibility, this change must also be
paired with providing a glibc-compatible byteswap.h.
Warner Losh [Tue, 21 Sep 2021 04:02:35 +0000 (22:02 -0600)]
endian.h: Use the __bswap* versions
Make it possible to have all these macros work without bswap* being
defined. bswap* is part of the application namespace and applications
are free to redefine those functions.
Brooks Davis [Tue, 17 Jan 2023 16:36:15 +0000 (16:36 +0000)]
riscv: Fix thread0.td_kstack_pages init
Commit 0ef3ca7ae37c70e9dc83475dc2e68e98e1c2a418 initialized
thread0.td_kstack_pages to KSTACK_PAGES. Due to the lack of an
include of opt_kstack_pages.h it used the fallback value of 4 from
machine/param.h. This meant that increasing KSTACK_PAGES in the kernel
config resulted in a panic in _epoch_enter_preempt as the following
assertion was false during network stack setup:
Brooks Davis [Tue, 17 Jan 2023 16:35:08 +0000 (16:35 +0000)]
arm64: Fix thread0.td_kstack_pages init
Commit 86a994d6537d7b5e1efb1019e466d86a688fd570 initialized
thread0.td_kstack_pages to KSTACK_PAGES. Due to the lack of an
include of opt_kstack_pages.h it used the fallback value of 4 from
machine/param.h. This meant that increasing KSTACK_PAGES in the kernel
config resulted in a panic in _epoch_enter_preempt as the following
assertion was false during network stack setup:
Warner Losh [Fri, 13 Jan 2023 21:21:16 +0000 (14:21 -0700)]
kboot: Use standard set_currdev
Use the standard set_currdev instead of the (now very old) copy of
setting currdev and loaddev directly. We do this only when we don't go
find the ZFS pool to boot from.
Warner Losh [Fri, 13 Jan 2023 21:21:07 +0000 (14:21 -0700)]
kboot: Add hostdisk override
When hostdisk_override is set, all the /dev devices are hidden, and only
the files in that directory are used. This will allow filesystem testing
on FreeBSD without root, for example. Adjust the parse routine to not
require devices start with /dev (plus fix a leak for an error
condition). Add a match routine to allow the device name to be something
like "/home/user/testing/zfsfoo:" instead of strictly in /dev. Note:
since we need to look at all the devices in the system to probe for ZFS
zpools, you can't generally use a full path to get a 'virtual disk' at
this time.
Warner Losh [Fri, 13 Jan 2023 21:20:56 +0000 (14:20 -0700)]
kboot: Fetch hostfs_root and bootdev from the environment
Fetch bootdev from the environment variable (so it should be set on the
command line). Default to 'zfs:' which will in the future look for the
first zpool that we can boot from. Prior versions of kboot would set
this from the second argument on the command line.
Fetch hostfs_root from the environment (defaulting to '/'). Prior
versions of kboot would set this from the first arg on the command line.
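
The lookup-with-default pattern described above might look like the
sketch below. It uses the C library getenv() for illustration, whereas a
loader would consult its own environment; the helper name
env_or_default() is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the value of 'name', or 'fallback' if it is unset or empty. */
static const char *
env_or_default(const char *name, const char *fallback)
{
	const char *val;

	assert(name != NULL && fallback != NULL);
	val = getenv(name);
	return ((val != NULL && val[0] != '\0') ? val : fallback);
}
```

With this helper, the two settings would be read as
env_or_default("bootdev", "zfs:") and env_or_default("hostfs_root", "/").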
Warner Losh [Fri, 13 Jan 2023 21:20:40 +0000 (14:20 -0700)]
kboot: Add support for ZFS volumes
Add the zfs device and filesystem to config and write the hook we need
to probe zfs since there's not a generic mechanism in place to do that
when ZFS is configured.
Warner Losh [Fri, 13 Jan 2023 21:20:30 +0000 (14:20 -0700)]
kboot: Add ZFS support to hostdisk
Add helper function to walk through the disk drives we've found to look
for zpools. main.c will still need to call this because the loader
hasn't implemented a good way to 'taste' drives for zpools and/or GELI
partitions (mostly because there's no generic list of candidate
devices).
Warner Losh [Fri, 13 Jan 2023 21:20:09 +0000 (14:20 -0700)]
kboot: Rework hostdisk.c to allow easier ZFS support.
Keep a list of disks and partitions that we have. Keep track of the
sizes of the media and sector and use that to implement DIOCGMEDIASIZE
and DIOCGSECTORSIZE. Provide a way to look up disks by name.
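
A minimal sketch of such a disk list, assuming a singly linked list and
cached sizes. The struct, the function names and the two command codes
are hypothetical; the command codes merely stand in for DIOCGMEDIASIZE
and DIOCGSECTORSIZE, and the result types are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for DIOCGMEDIASIZE and DIOCGSECTORSIZE. */
#define	SKETCH_GMEDIASIZE	1UL
#define	SKETCH_GSECTORSIZE	2UL

struct sketch_disk {
	const char		*name;
	uint64_t		 mediasize;	/* bytes */
	unsigned int		 sectorsize;	/* bytes */
	struct sketch_disk	*next;
};

/* Find a disk by name; returns NULL if it is not in the list. */
static struct sketch_disk *
sketch_disk_lookup(struct sketch_disk *head, const char *name)
{
	struct sketch_disk *d;

	assert(name != NULL);
	for (d = head; d != NULL; d = d->next) {
		if (strcmp(d->name, name) == 0)
			return (d);
	}
	return (NULL);
}

/* Answer size queries from the cached values; -1 for unknown commands. */
static int
sketch_disk_ioctl(const struct sketch_disk *d, unsigned long cmd, void *data)
{
	assert(d != NULL && data != NULL);
	switch (cmd) {
	case SKETCH_GMEDIASIZE:
		*(uint64_t *)data = d->mediasize;
		return (0);
	case SKETCH_GSECTORSIZE:
		*(unsigned int *)data = d->sectorsize;
		return (0);
	default:
		return (-1);
	}
}
```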
Warner Losh [Fri, 13 Jan 2023 21:20:00 +0000 (14:20 -0700)]
stand/zfs: Fix memory leaking on error cases
Now that we return an allocated zfs_devdesc, we have to free it. These
frees were missing from the error cases. In addition, simplify the code
a bit for the out of memory case.
Warner Losh [Fri, 13 Jan 2023 21:19:47 +0000 (14:19 -0700)]
stand/uboot: Explain why we test for NULL here
Most parsedev routines assume that idev is non-null and can always be
set. Since we break from this pattern in uboot, explain why in a
comment. devparse was invented to put a lot of common code in one place
and to simplify the archsw.arch_getdev code and any dv_parsedev code
called. However, uboot couldn't use devparse at the time because its
device naming scheme required slightly different parsing. So, we still use
uboot_parsedev directly from uboot_getdev where dev could be NULL. Add a
comment to this effect.
The match functionality added for ofw likely could be used to clean up
the multiple kludges that are here for uboot's device naming differences
with the normal boot loader. This work will wait for the future.
Warner Losh [Fri, 13 Jan 2023 21:19:39 +0000 (14:19 -0700)]
stand/ofw: dev can't be NULL here
dev can't be NULL here. ofw_common_parsedev is always called via
devparse (indirectly through dv_parsedev() calls there which call it
with the args unchanged). In the past, ofw_getdev could call us with
NULL pointer for the parse-only case, but that's now all handled inside
of devparse for simplicity.
Warner Losh [Fri, 13 Jan 2023 21:19:30 +0000 (14:19 -0700)]
stand: Separate base and cli parts of nvstore
zfs lives in libsa. However, it depends on nvstore (and other things)
that are in common. Fix part of this layering violation by splitting
nvstore into a libsa piece (which is the base implementation) and
keeping a much smaller common piece (to implement the nvstore
command). This just leaves zfs' knowledge of device names that's
specific to common and its calling platform specific init code to
resolve. Add a nvstore.h file for these two parts to communicate private
things and move the public nvstore api from bootstrap.h to stand.h.
Warner Losh [Wed, 11 Jan 2023 22:14:28 +0000 (15:14 -0700)]
stand: create common set_currdev
Pull together the nearly identical copies of set_currdev in i386,
userboot and efi. Other boot loaders have variances that might be fine
to use the common routine, or not. Since they are harder to test for me,
and ofw and uboot do handle these settings differently, leave them be for
now.
Warner Losh [Wed, 11 Jan 2023 22:14:17 +0000 (15:14 -0700)]
stand: Move dev_cleanup into libsa
Since dev_cleanup() walks through all the devsw devices with dv_cleanup
routines, move it into libsa rather than having it in
'common'. Logically, it operates only on things that are in libsa, and
would never be different for different loaders: either people would call
it as is, or they'd do the loop themselves with 'special' things inline
between calls to cleanup (not that I think that will ever be needed
though).
Warner Losh [Wed, 11 Jan 2023 22:14:02 +0000 (15:14 -0700)]
stand: Create common gen_setcurrdev and replace code
Replace 4 identical copies of *_setcurrdev with gen_setcurrdev to avoid
having to create a 5th copy. uboot_setcurrdev is actually different and
needs to remain separate (even though it's quite similar).
Warner Losh [Sun, 8 Jan 2023 19:00:51 +0000 (12:00 -0700)]
stand/efi: Better variable name
sanity_check_currdev returns true if it found a kernel or a sane loader
config file. A better name for this would be 'bootable' rather than 'rv'
which connotes in other places an errno value or similar.
Warner Losh [Fri, 16 Dec 2022 23:19:51 +0000 (16:19 -0700)]
stand/zfs: Add a third argument to zfs_probe_dev: part_too
Pass in 'true' if you'd like to search this device's partitions or
'false' if you should just search the device. EFI and (in the future)
kboot have discrete partitions that aren't accessed via the full disk
device. Weird things happen if you try to search in these cases.
Warner Losh [Fri, 23 Dec 2022 18:26:32 +0000 (11:26 -0700)]
kboot: use 128MB for the heap area, ZFS needs a lot of memory
ZFS uses a lot of memory. The old minimal allocations won't work when
ZFS support is added. Most environments where this will be used (or will
likely be used) have >> 256MB, so 128MB should be safe everywhere and allow
examination of a fair number of ZFS pools to boot from.
Warner Losh [Sat, 7 Jan 2023 20:23:05 +0000 (13:23 -0700)]
stand: Add macros for file types from stat
Add the familiar macros for file types for stat's st_mode
member. Prepend HOST_ to the start of these. Make sure all the values
match the linux nolibc and uapi headers. These values are the same as
native values since they appear to be required by POSIX. Define anyway
to allow the reader of the code to know that they are in the 'host (eg
Linux)' namespace rather than the 'loader' namespace.
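
A sketch of what such HOST_-prefixed definitions could look like. The
octal values are the conventional file-type bits used by Linux and
FreeBSD; the exact macro set, the struct host_stat_sketch and the helper
host_type_char() are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* File-type bits of st_mode, in the 'host' namespace. */
#define	HOST_S_IFMT	0170000
#define	HOST_S_IFIFO	0010000
#define	HOST_S_IFCHR	0020000
#define	HOST_S_IFDIR	0040000
#define	HOST_S_IFBLK	0060000
#define	HOST_S_IFREG	0100000
#define	HOST_S_IFLNK	0120000
#define	HOST_S_IFSOCK	0140000

#define	HOST_S_ISDIR(m)	(((m) & HOST_S_IFMT) == HOST_S_IFDIR)
#define	HOST_S_ISREG(m)	(((m) & HOST_S_IFMT) == HOST_S_IFREG)

/* Hypothetical cut-down host stat structure with only st_mode. */
struct host_stat_sketch {
	unsigned int	st_mode;
};

/* Return an ls(1)-style type character for a host stat result. */
static char
host_type_char(const struct host_stat_sketch *sb)
{
	assert(sb != NULL);
	switch (sb->st_mode & HOST_S_IFMT) {
	case HOST_S_IFDIR:	return ('d');
	case HOST_S_IFREG:	return ('-');
	case HOST_S_IFLNK:	return ('l');
	case HOST_S_IFCHR:	return ('c');
	case HOST_S_IFBLK:	return ('b');
	case HOST_S_IFIFO:	return ('p');
	case HOST_S_IFSOCK:	return ('s');
	default:		return ('?');
	}
}
```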
Warner Losh [Tue, 13 Dec 2022 05:39:03 +0000 (22:39 -0700)]
kboot: Disks should be at least 16MB
Linux pre-boot environments will often have a number of pseudo disks
that are small, all smaller than a few MB. 16MB is a good cutoff since
it's big enough to filter these devices, yet small enough to allow a
super-minimal partition through (the smallest I've been able to make
that's useful lately is around 20MB).
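
The cutoff can be sketched as a simple size filter. The constant and
function names are hypothetical, and treating exactly 16MB as acceptable
is an assumption based on the "at least 16MB" wording.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Disks smaller than this are treated as pseudo disks and skipped. */
#define	SKETCH_MIN_DISK_BYTES	(16ULL * 1024 * 1024)

/* Count how many of the given media sizes pass the 16MB cutoff. */
static size_t
count_usable_disks(const uint64_t *sizes, size_t n)
{
	size_t i, usable;

	assert(sizes != NULL || n == 0);
	usable = 0;
	for (i = 0; i < n; i++) {
		if (sizes[i] >= SKETCH_MIN_DISK_BYTES)
			usable++;
	}
	return (usable);
}
```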
Warner Losh [Sat, 7 Jan 2023 01:39:09 +0000 (18:39 -0700)]
stand: Allow stand.h to be included in C++ programs
Allow stand.h to be included in C++ programs. This is little more than
using our stylized __BEGIN_DECLS / __END_DECLS around the entire
file. There's no run-time support for C++, so the C++ that can be used
is quite limited. It is enough for libunwind, though.
Warner Losh [Tue, 13 Dec 2022 04:46:05 +0000 (21:46 -0700)]
stand: Make ioctl declaration consistent
It typically had two args with an optional third from the userland
declaration in sys/ioccom.h. However, the function definition used a
non-optional char * argument. This mismatch is undefined behavior (but
worked due to the calling conventions of all our machines).
Instead, add a declaration for ioctl to stand.h, make the third arg
'void *' which is a better match to the ... declaration before. This
prevents the convert int * -> char * errors as well. Make the ioctl
user-space declaration truly user-space specific (omit it in the
stand-alone build).
Warner Losh [Fri, 9 Dec 2022 14:55:42 +0000 (07:55 -0700)]
kboot: Use (void) instead of () for functions with no args
`int foo();` means 'a function that takes any number of arguments',
not 'a function that takes no arguments'; that's spelled `int foo(void);`.
Adopt the latter.
Warner Losh [Fri, 9 Dec 2022 05:07:52 +0000 (22:07 -0700)]
kboot: Allow loading fdt from different sources
Linux has /sys/firmware/fdt and /proc/device-tree to publish the dtb for
the system. The former has it all in one file, while the latter breaks
it out. Prefer the former since it's the more modern interface, but
retain both since I don't have a PS3 to test to see if its kernel is new
enough for /sys/firmware or not.
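
The preference order can be sketched as "first readable candidate wins".
This is only an illustration of the fallback, not kboot's loading code:
the helper first_readable() is hypothetical, and because
/proc/device-tree is a directory tree rather than a single blob, a real
loader would still need separate code to read each kind of source.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Return the first path in 'paths' that is readable, or NULL if none. */
static const char *
first_readable(const char *const *paths, size_t n)
{
	size_t i;

	assert(paths != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (access(paths[i], R_OK) == 0)
			return (paths[i]);
	}
	return (NULL);
}
```

Called with the candidates { "/sys/firmware/fdt", "/proc/device-tree" },
the newer interface is chosen whenever it exists and the older one is
used otherwise.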
Warner Losh [Wed, 7 Dec 2022 17:58:44 +0000 (10:58 -0700)]
stand/kboot: Parse the command line args
Do the standard command line parsing... With a small twist to deal with
the quirks of booting via linuxboot to the initrd from the command line
in shell.efi and other observed oddities.
Warner Losh [Wed, 7 Dec 2022 17:50:35 +0000 (10:50 -0700)]
stand/kboot: Initialize all the devices
main() of the boot loader is expected to call devinit() early. We do
this at the same time we do it in the EFI loader (except we don't have a
buffer cache here, we don't need to initialize time and we don't have
special efi partition handles to enumerate). This is just after we probe
for the console.
Warner Losh [Tue, 6 Dec 2022 17:55:58 +0000 (10:55 -0700)]
kboot: Mark the EFI specific parts of bootinfo.c
bootinfo.c is about to be shared with kboot since they create
substantially similar environments / metadata tagging / etc. Tag this
with #ifdef EFI for the moment until the proper abstracting out can
happen.
Warner Losh [Mon, 5 Dec 2022 17:40:15 +0000 (10:40 -0700)]
stand: update prototypes for md_load and md_load64
These are declared as extern in a number of files (some with the wrong
return type). Centralize this in modinfo.h and remove a few extra stray
declarations as well that are no longer used. No functional change.
Note: I've not tried to cope with the bi_load() functions which are the
same logical thing. These will be handled separately.
Warner Losh [Sun, 4 Dec 2022 05:46:21 +0000 (22:46 -0700)]
stand: aarch64 has different nlinks than amd64
Some typedefs are system dependent, so move them into stat_arch.h where
they are used. On amd64, nlinks is an int64_t, while on aarch64 it's an
int (or int32_t).