Corvin Köhne [Fri, 4 Nov 2022 11:24:49 +0000 (12:24 +0100)]
bhyve: add basic basl implementation
Basl is the bhyve ASL compiler. At the moment, it's just a small wrapper
to call iasl, the Intel ASL compiler. As bhyve will gain support for
qemu's ACPI table loader in the future, it has to create ACPI tables on
its own. Therefore, it makes sense to create a new file which keeps the
code for basl.
This first implementation of basl supports creating an ACPI table by
appending raw bytes to it. It's also capable of loading all tables into
guest memory.
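As a rough illustration of "creating a table by appending raw bytes", the
sketch below grows a byte buffer with realloc(3). The struct and function
names are hypothetical and are not taken from the basl sources; the real
implementation may manage its storage quite differently.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory table: a growable run of raw bytes. */
struct table_sketch {
	uint8_t	*data;
	size_t	 len;
};

/*
 * Append len raw bytes to the table.
 * Returns 0 on success and -1 if the buffer could not be grown.
 */
static int
table_append_bytes(struct table_sketch *t, const void *bytes, size_t len)
{
	uint8_t *p;

	if (len == 0)
		return (0);
	p = realloc(t->data, t->len + len);
	if (p == NULL)
		return (-1);
	memcpy(p + t->len, bytes, len);
	t->data = p;
	t->len += len;
	return (0);
}
```

A caller would start from a zeroed struct, append the header and body bytes
in order, and free(3) the buffer once the table has been copied out.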
Wanpeng Qian [Mon, 14 Nov 2022 13:08:52 +0000 (14:08 +0100)]
bhyve: nvme controller obey async event setting when reporting critical temperature
Async event report is controlled by async event configuration feature
setting. When reporting a critical temperature warning, check the async
event configuration.
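The check described above might look roughly like the following sketch. It
assumes the NVMe layout in which the low bits of the Asynchronous Event
Configuration value mirror the SMART critical warning bits, with bit 1
standing for temperature. The macro and function names are hypothetical and
not taken from the bhyve sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position: critical warning bit 1 is "temperature". */
#define	CRIT_WARN_TEMPERATURE_SKETCH	(UINT32_C(1) << 1)

/*
 * Report a critical temperature event only if the guest enabled it in the
 * async event configuration feature value.
 */
static bool
should_report_temp_event(uint32_t async_event_config)
{
	return ((async_event_config & CRIT_WARN_TEMPERATURE_SKETCH) != 0);
}
```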
Wanpeng Qian [Mon, 14 Nov 2022 13:06:34 +0000 (14:06 +0100)]
bhyve: return FEATURE_NOT_CHANGEABLE for unimplemented feature of NVMe controller
Set Features is a feature-specific function. Currently only some
features have a set procedure. For features that are not handled by
the controller, we should return a FEATURE_NOT_CHANGEABLE error.
Wanpeng Qian [Mon, 14 Nov 2022 13:02:44 +0000 (14:02 +0100)]
bhyve: abort and return FEATURE_NOT_SAVEABLE while set feature with a save flag for NVMe controller.
Currently bhyve's NVMe controller cannot save feature values across
reboots. It should return a FEATURE_NOT_SAVEABLE error when the command
specifies a save flag.
If the Feature Identifier specified in the Set Features command is not
saveable by the controller and the controller receives a Set Features
command with the Save bit set to one, then the command shall be aborted
with a status of Feature Identifier Not Saveable.
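The two Set Features changes above can be sketched together as follows. The
sketch assumes the NVMe command layout in which bit 31 of CDW10 is the Save
bit and bits 7:0 hold the Feature Identifier, and it uses the command
specific status values 0Dh (Feature Identifier Not Saveable) and 0Eh
(Feature Not Changeable). All names, the set of handled features, and the
order of the two checks are illustrative assumptions, not the bhyve code.

```c
#include <stdbool.h>
#include <stdint.h>

#define	SF_CDW10_SAVE		(UINT32_C(1) << 31)	/* Save bit */
#define	SF_CDW10_FID(cdw10)	((uint8_t)((cdw10) & 0xff))

/* Status values used by this sketch. */
enum sf_status {
	SF_OK = 0x00,
	SF_NOT_SAVEABLE = 0x0d,
	SF_NOT_CHANGEABLE = 0x0e
};

/*
 * Hypothetical list of features with a set procedure: Number of Queues
 * (07h) and Asynchronous Event Configuration (0Bh).
 */
static bool
fid_has_set_handler(uint8_t fid)
{
	return (fid == 0x07 || fid == 0x0b);
}

static enum sf_status
set_features_sketch(uint32_t cdw10)
{
	/* Nothing can be saved across reboots, so reject the Save bit. */
	if ((cdw10 & SF_CDW10_SAVE) != 0)
		return (SF_NOT_SAVEABLE);
	/* Features without a set procedure are reported as fixed. */
	if (!fid_has_set_handler(SF_CDW10_FID(cdw10)))
		return (SF_NOT_CHANGEABLE);
	return (SF_OK);
}
```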
Wanpeng Qian [Mon, 14 Nov 2022 12:59:11 +0000 (13:59 +0100)]
nvmecontrol: Fix condition when print number of Firmware Slots and Firmware Slot1 Readonly.
The Number of Firmware Slots should never be zero. So, a Firmware Slot 1
should always exist. For that reason, always print the Number of
Firmware Slots and the Firmware Slot 1 Read-Only value.
Dapeng Gao [Tue, 15 Nov 2022 00:21:38 +0000 (00:21 +0000)]
Check alignment of fp in unwind_frame
A misaligned frame pointer is certainly not a valid frame pointer and
with strict alignment enabled (as on CHERI) can cause panics when it is
loaded from later in the code.
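A minimal sketch of such a check, assuming a frame record made of two
pointer-sized words (saved frame pointer, then return address) and a
required alignment of one pointer. The function names are hypothetical, and
a real unwinder would also validate that fp lies within the stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when fp is a multiple of align, which must be a power of two. */
static bool
fp_is_aligned(uintptr_t fp, uintptr_t align)
{
	return ((fp & (align - 1)) == 0);
}

/*
 * Step to the caller's frame. Refuse to load anything through a
 * misaligned frame pointer instead of faulting on it.
 */
static bool
unwind_one_sketch(uintptr_t *fp, uintptr_t *pc)
{
	const uintptr_t *rec;

	if (!fp_is_aligned(*fp, sizeof(uintptr_t)))
		return (false);
	rec = (const uintptr_t *)*fp;
	*pc = rec[1];	/* saved return address */
	*fp = rec[0];	/* caller's frame pointer */
	return (true);
}
```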
Mark Johnston [Mon, 14 Nov 2022 20:08:45 +0000 (15:08 -0500)]
bhyve: Simplify control flow in the xhci device model
We only need to call pci_xhci_xfer_complete() when handling a transfer
to the control endpoint, so move that code into the epid == 1 block and
eliminate a goto. Also remove an unneeded reinitialization of
setup_trb.
dhclient(8): Verify lease-, renewal- and rebinding-time option sizes.
Otherwise out-of-bounds reads and undefined behaviour may occur.
The current code only checked for the presence of the first of four bytes.
Make sure the fields in question have the minimum size required.
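The kind of size check described here can be sketched as below. The option
struct and helper are hypothetical stand-ins rather than dhclient's own
types; the point is only that all four bytes must be present before a
32-bit time value is decoded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a received option: payload pointer and length. */
struct opt_sketch {
	const uint8_t	*data;
	size_t		 len;
};

/*
 * Decode a 32-bit big-endian value (such as a lease time) from an option.
 * Returns false if the option is missing or shorter than four bytes.
 */
static bool
opt_get_u32(const struct opt_sketch *o, uint32_t *out)
{
	if (o->data == NULL || o->len < 4)
		return (false);
	*out = ((uint32_t)o->data[0] << 24) | ((uint32_t)o->data[1] << 16) |
	    ((uint32_t)o->data[2] << 8) | (uint32_t)o->data[3];
	return (true);
}
```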
Kristof Provost [Thu, 10 Nov 2022 12:54:09 +0000 (13:54 +0100)]
if_ovpn: ensure we're in vnet context when calling sorele()
We reference count to ensure we don't release the socket while we still
have data in flight. That means that we can end up releasing the socket
from ovpn_encrypt_tx_cb().
We must have a vnet context set when calling sorele() (which asserts
this from within sofree()), so move the CURVNET_SET()/CURVNET_RESTORE()
to ensure this is the case.
While here also add a couple of assertions to make this more obvious,
and to ease future debugging.
othermta (along with the mta_start_script configuration entry in rc.conf)
was a mechanism for running an MTA other than sendmail(8) before
the "rcng" era 20 years ago.
Rick Macklem [Sun, 13 Nov 2022 20:16:06 +0000 (12:16 -0800)]
rpcb_clnt.c: Do not force use of UDP
Without this patch, the code in the rpcbind client forces
the use of UDP. A comment notes that some rpcbind servers
only support UDP. This makes NFSv3 mounts to Azure servers
impossible, since they require use of TCP for rpcbind.
Since the comment is very old (imported from NetBSD in 2001)
and I do not believe any UDP-only rpcbind servers
still exist, this patch comments out the code that forces
use of UDP, so that NFSv3 mounts to Azure servers can work.
For an NFSv3 mount, the "udp" mount option will still
make mount_nfs use UDP for rpcbind so that can be used
as a workaround for any old NFSv3 server that only
supports rpcbind over UDP (if any such server still exists).
I asked if doing this change is appropriate on freebsd-fs@
and I only got one reply (off list) that supported doing
the change.
Kirk McKusick [Sun, 13 Nov 2022 06:56:03 +0000 (22:56 -0800)]
Enable taking snapshots on UFS/FFS filesystems using journaled soft updates.
All the needed infrastructure updates have been made to allow
snapshots to be taken on UFS/FFS filesystems that are using journaled
soft updates. The most immediate benefit is the ability to use a
snapshot to take a consistent filesystem dump on a live filesystem
using the -L option to dump(8).
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36491
Kirk McKusick [Sat, 12 Nov 2022 23:36:07 +0000 (15:36 -0800)]
Fix for tunefs(8) unable to add a UFS/FFS soft update journal.
The reported bug is UFS: bad file descriptor: soft update journaling
can not be enabled on some FreeBSD-provided disk images – failed
to write updated cg.
The UFS library (libufs(3)) failed to reopen its disk descriptor
when first attempting to update a cylinder group. The error only
occurred when trying to add journaling to a filesystem whose first
cylinder group was too full to hold the journal.
PR: 259090
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Franco Fichtner [Fri, 11 Nov 2022 21:20:13 +0000 (22:20 +0100)]
rc: add a manual entry for ${name}_setup
${name}_prepend is suboptimal as it is prepended to the actual
command being run. Therefore the term "commandS to prepend"
is misleading and no clear separation takes place for setup tasks
that may be required like changing a config file permission or
generating a configuration file prior to service start.
The other reason is that {argument}_precmd is a service-side
variable and cannot be repurposed from the user-side.
Eric van Gyzen [Thu, 3 Nov 2022 02:42:54 +0000 (21:42 -0500)]
zfs tests: stop writing to arbitrary devices
TL;DR: Three ZFS tests created ZFS pools on all unmounted devices listed
in /etc/fstab, corrupting their contents. Stop that.
Imagine my surprise when the ESP on my main dev/test VM would "randomly"
become corrupted, making it unbootable. Three tests collect various devices
from the system and try to add them to a test pool. The test expects this
to fail because it _assumes_ these devices are in use and ZFS will correctly
reject the request.
My /etc/fstab has two entries for devices in /dev:
Note the `noauto` on the ESP. In a remarkable example of irony, I chose
this because it should keep the ESP more protected from corruption;
in fact, mounting it would have protected it from this case.
The tests added all of these devices to a test pool in a _single command_,
expecting the command to fail. The swap device was in use, so the command
correctly failed, but the ESP was added and therefore corrupted. However,
since the command correctly failed, the test didn't notice the ESP problem.
If each device had been added with its own command, the test _might_ have
noticed that one of them incorrectly succeeded. However, two of these
tests would not have noticed:
hotspare_create_001_neg was incorrectly specified as needing the Solaris
dumpadm command, so it was skipped. _Some_ of the test needs that command,
but it checks for its presence and runs fine without it.
Due to bug 241070, zpool_add_005_pos was marked as an expected failure.
Due to the coarse level of integration with ATF, this test would still
"pass" even if it failed for the wrong reason. I wrote bug 267554 to
reconsider the use of atf_expect_fail in these tests.
Let's further consider the use of various devices found around the system.
In addition to devices in /etc/fstab, the tests also used mounted devices
listed by the `mount` command. If ZFS behaves correctly, it will refuse
to add mounted devices and swap devices to a pool. However, these are
unit tests used by developers to ensure that ZFS still works after they
modify it, so it's reasonable to expect ZFS to do the _wrong_ thing
sometimes. Using random host devices is unsafe.
Fix the root problem by using only the disks provided via the "disks"
variable in kyua.conf. Use one to create a UFS file system and mount it.
Use another as a swap device. Use a third as a dump device, but expect
it to fail due to bug 241070.
While I'm here:
Due to commit 6b6e2954dd65, we can simply add a second dump device and
remove it in cleanup. We no longer need to save, replace, and restore the
pre-existing dump device.
The cleanup_devices function used `camcontrol inquiry` to distinguish disks
from other devices, such as partitions. That works fine for SCSI, but not
for ATA or VirtIO block. Use `geom disk list` instead.
linuxkpi: Define `ZERO_OR_NULL_PTR()` in <linux/slab.h>
On Linux, the `kmalloc()` family of functions returns a special value if
the size of the allocation is zero. This macro verifies if the pointer
is NULL (the allocation failed) or the size is 0 (the allocation was not
performed AFAIU). This special value can be passed to `kfree()`.
On FreeBSD, our `malloc(9)` functions don't return a special value for
0-size allocations. Therefore we can simply compare the result against
NULL.
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D37367
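Based on the description above, the macro could be as small as the following
sketch. This is an illustration of the idea (no special zero-size pointer
exists, so only NULL needs checking), not necessarily the exact definition
that was committed; the helper function is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch: without a special zero-size value, a NULL test is sufficient. */
#define	ZERO_OR_NULL_PTR(x)	((x) == NULL)

/* Hypothetical caller: only touch the buffer if it is a real allocation. */
static bool
buffer_is_usable(const void *p)
{
	return (!ZERO_OR_NULL_PTR(p));
}
```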
linuxkpi: Include <linux/list.h> and <linux/kernel.h> from <linux/mutex.h>
They are not really used in this header. However they are included in
Linux and at least the DRM drivers unfortunately rely on this namespace
pollution.
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D37365
Kyle Evans [Fri, 11 Nov 2022 19:50:29 +0000 (13:50 -0600)]
arm64: add a spin-table implementation for Apple Silicon
The M1 has no EL3, so we're limited to a spin-table implementation if we
want to eventually use bhyve on it. Implement spin-table now, but note
that we still prefer PSCI where possible.
Wanpeng Qian [Fri, 11 Nov 2022 19:13:06 +0000 (12:13 -0700)]
nvmecontrol: fix wrong temperature unit for INTEL SSDs.
Although Intel's specification does not state the unit used for Temperature
Statistics (Log Identifier C5h), I believe it is Celsius rather than
Kelvin.
Here is the result from my P3700 SSD (before):
Intel Temperature Log
=====================
Current: 30 K, -243.15 C, -405.67 F
Overtemp Last Flags 0
Overtemp Lifetime Flags 0
Max Temperature 53 K, -220.15 C, -364.27 F
Min Temperature 17 K, -256.15 C, -429.07 F
Max Operating Temperature 63 K, -210.15 C, -346.27 F
Min Operating Temperature 0 K, -273.15 C, -459.67 F
Estimated Temperature Offset: 0 C/K
After applying the patch, the result is:
Intel Temperature Log
=====================
Current: 303.15 K, 30 C, 86.00 F
Overtemp Last Flags 0
Overtemp Lifetime Flags 0
Max Temperature 326.15 K, 53 C, 127.40 F
Min Temperature 290.15 K, 17 C, 62.60 F
Max Operating Temperature 336.15 K, 63 C, 145.40 F
Min Operating Temperature 273.15 K, 0 C, 32.00 F
Estimated Temperature Offset: 0 C/K
I also compared against smartctl's report; the values match very well.
Also tested on an Intel P3600, where the patch fixes the problem.
Signed-off-by: Wanpeng Qian <wanpengqian@gmail.com>
Reviewed by: imp (added tweak to samsung.c so it still compiles)
Differential Revision: https://reviews.freebsd.org/D32845
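The arithmetic behind the corrected output is the usual Celsius conversion.
The sketch below treats the raw log value as degrees Celsius and derives the
other two units; the function names and the exact format string are
illustrative and are not taken from nvmecontrol.

```c
#include <stdio.h>

/* Convert degrees Celsius to Kelvin. */
static double
c_to_k(double c)
{
	return (c + 273.15);
}

/* Convert degrees Celsius to Fahrenheit. */
static double
c_to_f(double c)
{
	return (c * 9.0 / 5.0 + 32.0);
}

/* Format one reading in the "K, C, F" layout shown in the log above. */
static int
format_temp(char *buf, size_t len, int celsius)
{
	return (snprintf(buf, len, "%.2f K, %d C, %.2f F",
	    c_to_k(celsius), celsius, c_to_f(celsius)));
}
```

For a raw value of 30 this produces "303.15 K, 30 C, 86.00 F", matching the
"Current" line in the patched output.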
Mitchell Horne [Fri, 11 Nov 2022 18:23:11 +0000 (14:23 -0400)]
ddb: don't limit pindex output in 'show vmopag'
This command already prints a tremendous amount of output, and properly
obeys the pager. It no longer makes sense to arbitrarily limit the pages
that are printed, as the reader will not be aware that this has
happened.
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D37361
Mitchell Horne [Fri, 11 Nov 2022 18:21:29 +0000 (14:21 -0400)]
ddb(4): misc updates
- Describe optional 'addr' argument to many show commands
- Remove obsolete commands (show cbstat)
- 'show jails' was renamed to 'show prison'
- Remove superfluous commentary about sleepqueues
- Fix an xref to gdb(4)
- Fix issues reported by mandoc -Tlint
- Plus a couple other inaccuracies/inconsistencies
Reviewed by: pauamma, markj, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation (in part)
Differential Revision: https://reviews.freebsd.org/D37332
Mitchell Horne [Fri, 11 Nov 2022 18:20:31 +0000 (14:20 -0400)]
netgdb(4): update list of required kernel options
The man page claims that netgdb will be enabled automatically with the
presence of the DDB, GDB, and INET options. Based on the logic in
conf/files, this is not the case. Update the manpage to list all
of the options required to include netgdb.
The previous `llnode` field is moved inside another field `node`.
This `node` field is a `struct __call_single_node` in Linux. Here, we
simply add an anonymous struct with the `llnode` field inside. That
field's new name is `llist` now.
V2: Use an anonymous union to keep the structure backward compatible
with drivers using the previous `llnode` field. This was suggested
by wufl@ and hselasky@. Thank you!
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D36955
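The V2 layout can be sketched as follows, using C11 anonymous unions. The
struct names are hypothetical stand-ins for the real LinuxKPI types; only
the `llnode`, `node`, and `llist` field names come from the description
above.

```c
#include <stddef.h>

/* Stand-in for a lock-less list node. */
struct llist_node_sketch {
	struct llist_node_sketch *next;
};

struct work_sketch {
	int flags;
	union {
		/* Old name, kept so existing drivers still compile. */
		struct llist_node_sketch llnode;
		/* New name: the same storage reached as node.llist. */
		struct {
			struct llist_node_sketch llist;
		} node;
	};
};

/* Both spellings must refer to the same bytes. */
_Static_assert(offsetof(struct work_sketch, llnode) ==
    offsetof(struct work_sketch, node.llist),
    "llnode and node.llist must alias");
```

Because the union is anonymous, old code keeps writing `w->llnode` while new
code writes `w->node.llist`, and neither needs to know about the other.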
arm64 is the only currently supported architecture that has
${MACHINE_CPUARCH} set to a different value (aarch64) than ${MACHINE}
(arm64), as described in arch(7). However, there is no source directory
associated with arm64 that has a name set to ${MACHINE_CPUARCH}.
Remove the dead code that adds a directory with a name set to
${MACHINE_CPUARCH} to a list of directories indexed with cscope.
This change allows the cscope target to be used on arm64.
Mark Johnston [Fri, 11 Nov 2022 15:02:42 +0000 (10:02 -0500)]
bhyve: Cast away const when fetching a config nvlist
Silence a warning from the compiler about "const" being discarded. The
warning is correct: nvlist values are supposed to be immutable.
However, fixing this properly will require some contortions on behalf of
consumers who look up a subtree of the config and modify it. Per a
discussion on freebsd-virtualization@, the solution will probably be to
outright replace the use of nvlists for VM configuration, but until that
happens let's document the problem and silence the warning.
Mark Johnston [Fri, 11 Nov 2022 15:02:10 +0000 (10:02 -0500)]
bhyve: Drop volatile qualifiers from virtio rings
The qualifiers are there presumably because these rings are mapped into
the guest, but they do not appear to be required for correctness, and
bhyve generally doesn't qualify accesses to guest memory this way.
Moreover, the qualifiers are discarded by snapshot code, causing clang
to emit warnings. Just stop using volatile here.
The use of volatile appears to be inherited from the kernel driver's
definitions of the same structures. It makes some sense, since USB TRBs
and related structures live in guest memory, but bhyve device models
generally don't volatile-qualify accesses to guest memory and I can't
see how they are required for correctness here. Moreover, XHCI_GADDR
does not return volatile pointers so we're already being inconsistent.
Just drop the qualifiers to address the warning.
Kristof Provost [Wed, 9 Nov 2022 16:11:26 +0000 (17:11 +0100)]
dummynet: fix codel
Serialize rcvif when enqueuing packets for codel. We already tried to
restore the serialized rcvif in fq_codel_extract_head(), but that
doesn't work when we fail to serialize it first, so we ended up dropping
all packets passed through codel.
Kristof Provost [Fri, 11 Nov 2022 09:40:21 +0000 (10:40 +0100)]
if_ovpn: fix AES-128-GCM support
We need to explicitly list AES-128-GCM as an allowed cipher for that
mode to work. While here also add AES-192-GCM. That brings our supported
cipher list in line with other openvpn/dco platforms.
Andrew Turner [Fri, 11 Nov 2022 08:25:57 +0000 (08:25 +0000)]
Fix a rk356x pinctrl register offset
The pull-up/pull-down register offset was wrong on the Rockchip rk356x.
It was set such that the driver would modify the IOMUX control registers.
This seems to work with the current device tree files, but fails with
upstream files. Fix the offset so the later calculation has the correct
offset for the pull-up/pull-down control register.
tcp: account sent/received IP ECN markings independently
Have tcpstats (netstat -s) differentiate between received and sent
ECN-marked packets. Also account for IP ECN bits (on TCP packets)
even when the tcp session has not negotiated ECN support.
ixgbe: workaround errata about UDP frames with zero checksum
Intel 82599 has errata related to IPv4 UDP frames with zero checksum.
It reports such datagrams with L4 integrity errors in IXGBE_XEC
register. After commit afb1aa4e6df2, such errors are reported
via IFCOUNTER_IERRORS. This confuses users, since actually all frames
are handled correctly by the system.
To work around the problem, let's ignore the XEC register value for
82599 cards for now.
Kyle Evans [Thu, 10 Nov 2022 04:20:34 +0000 (22:20 -0600)]
include: put includes into -dev packages
The includes build is kind of funky, as we support either copying or
symlinking files into /usr/include. For `copies`, we were supplying
the include/ ${TAG_ARGS}, which puts packages into `FreeBSD-runtime`,
without any consideration to the fact that we're installing headers.
Let's copy the approach that the `symlinks` target uses for now, and
add ",dev" to the TAG_ARGS so that headers at least end up in
FreeBSD-runtime-dev, which is more appropriate. Some of these includes
are actually technically supposed to be in *other* packages and their
INCSGROUP's PACKAGE setting is actually correct, but this is less
trivial to solve. This is a bandaid to fix the immediate problem of
some headers ending up in two different packages.
PR: 267526
Reviewed by: dfr, manu
Differential Revision: https://reviews.freebsd.org/D37256
Luiz Amaral [Wed, 9 Nov 2022 11:40:43 +0000 (12:40 +0100)]
pfsync: prepare code to accommodate AF_INET6 family
Work is ongoing to add support for pfsync over IPv6. This required some
changes to allow for differentiating between the two families in a more
generic way.
This patch converts the relevant ioctls to using nvlists, making future
extensions (such as supporting IPv6 addresses) easier.
Anton Rang [Wed, 9 Nov 2022 20:13:01 +0000 (14:13 -0600)]
vm_page_unswappable: remove wrong assertion
markj says:
...the assertion is incorrect and should simply be removed.
It has been racy since we removed the use of the page hash
lock to synchronize wiring of pages.
PR: 267621
Reviewed by: markj, Anton Rang <rang@acm.org>
MFC after: 1 week
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D37320
Kirk McKusick [Wed, 9 Nov 2022 18:44:03 +0000 (10:44 -0800)]
Add support for managing UFS/FFS snapshots to fsck_ffs(8).
The kernel handles the management of UFS/FFS snapshots. Since UFS/FFS
updates filesystem data (rather than always writing changes to new
locations like ZFS), the kernel must check every filesystem write
to see if the block being written is part of a snapshot. If it is
part of a snapshot, then the kernel must make a copy of the old
block value into a newly allocated block for the snapshot before
allowing the write to be done. Similarly, if a block is being freed,
the kernel must check to see if it is part of a snapshot and let
the snapshot claim the block rather than freeing it for future use.
When a snapshot is freed, its blocks need to be offered to older
snapshots and freed only if no older snapshots wish to claim them.
When snapshots were added to UFS/FFS they were integrated into soft
updates and just a small part of the management of snapshots needed
to be added to fsck_ffs(8) as soft updates minimized the set of
snapshot changes that might need correction. When journaling was
added to soft updates a much more complete knowledge of snapshots
needed to be added to fsck_ffs(8) for it to be able to properly
handle the filesystem changes that a journal rollback needs to do
(specifically the freeing and allocation of blocks). Since this
functionality was unavailable, the use of snapshots was disabled
when running with journaled soft updates.
This set of changes imports the kernel code for the management of
snapshots to fsck_ffs(8). With this code in place it will become
possible to enable snapshots when running with journaled soft
updates. The most immediate benefit will be the ability to use
snapshots to take consistent filesystem dumps on live filesystems.
Future work will be done to update fsck_ffs(8) to be able to use
snapshots to run in background on live filesystems running with
journaled soft updates.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36491
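The copy-on-write rule described in the first paragraph of this commit can
be illustrated with a toy model. Everything below (the tiny fixed-size
"disk", the struct, and the function names) is hypothetical and far simpler
than the kernel or fsck_ffs(8) code; it only shows the "copy the old block
before the first overwrite" step.

```c
#include <stdbool.h>
#include <string.h>

#define	NBLOCKS	4
#define	BLKSIZE	8

/* Toy snapshot: which blocks it covers and which it has already copied. */
struct snap_sketch {
	bool	tracked[NBLOCKS];
	bool	copied[NBLOCKS];
	char	saved[NBLOCKS][BLKSIZE];
};

/*
 * Write one block. If the snapshot covers it and has no private copy yet,
 * preserve the old contents before the write is allowed to proceed.
 */
static void
fs_write_sketch(char disk[NBLOCKS][BLKSIZE], struct snap_sketch *s, int blk,
    const char *data)
{
	if (s->tracked[blk] && !s->copied[blk]) {
		memcpy(s->saved[blk], disk[blk], BLKSIZE);
		s->copied[blk] = true;
	}
	memcpy(disk[blk], data, BLKSIZE);
}

/* Read a block as the snapshot sees it. */
static const char *
snap_read_sketch(char disk[NBLOCKS][BLKSIZE], const struct snap_sketch *s,
    int blk)
{
	return (s->copied[blk] ? s->saved[blk] : disk[blk]);
}
```

After a write to a tracked block, the live filesystem sees the new data
while the snapshot keeps returning the contents from when it was taken.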
ipfw: Have NAT steal the TH_RES1 bit, instead of the TH_AE bit
The NAT module's use of the tcphdr.th_x2 field now collides with the
use of this TCP header flag as AccECN (AE) bit. Use the topmost
bit instead to allow negotiation of AccECN across a NAT device.
Event: IETF 115 Hackathon
Reviewed By: #transport, tuexen
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D37300
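The bit juggling can be sketched as below, treating th_x2 as the four
reserved bits that follow the data offset in the TCP header. In that view
the lowest bit is the one AccECN uses as AE and the topmost bit is the one
NAT now borrows. The macro and function names are hypothetical and are not
the identifiers used in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

#define	X2_AE_SKETCH	0x1u	/* lowest reserved bit: AccECN AE */
#define	X2_NAT_SKETCH	0x8u	/* topmost reserved bit: NAT marker */

/* Set the NAT marker without disturbing the AE bit. */
static uint8_t
x2_nat_mark(uint8_t x2)
{
	return ((uint8_t)(x2 | X2_NAT_SKETCH));
}

/* Clear the NAT marker again, leaving the other bits as they were. */
static uint8_t
x2_nat_clear(uint8_t x2)
{
	return ((uint8_t)(x2 & ~X2_NAT_SKETCH));
}

static bool
x2_nat_is_marked(uint8_t x2)
{
	return ((x2 & X2_NAT_SKETCH) != 0);
}
```

With separate bits, a segment carrying AE passes through marking and
unmarking with its AE bit unchanged, which is what lets AccECN negotiation
survive the NAT device.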