Cy Schubert [Tue, 8 Nov 2022 08:53:29 +0000 (00:53 -0800)]
heimdal: Fix multiple security vulnerabilities
The following issues are patched:
- CVE-2022-42898 PAC parse integer overflows
- CVE-2022-3437 Overflows and non-constant time leaks in DES{,3} and arcfour
- CVE-2021-44758 NULL dereference DoS in SPNEGO acceptors
- CVE-2022-44640 Heimdal KDC: invalid free in ASN.1 codec
Note that CVE-2022-44640 is a severe vulnerability, possibly a 10.0
on the Common Vulnerability Scoring System (CVSS) v3, as we believe
it should be possible to get an RCE on a KDC, which means that
credentials can be compromised that can be used to impersonate
anyone in a realm or forest of realms.
Heimdal's ASN.1 compiler generates code that allows specially
crafted DER encodings of CHOICEs to invoke the wrong free function
on the decoded structure upon decode error. This is known to impact
the Heimdal KDC, leading to an invalid free() of an address partly
or wholly under the control of the attacker, in turn leading to a
potential remote code execution (RCE) vulnerability.
This error affects the DER codec for all extensible CHOICE types
used in Heimdal, though not all cases will be exploitable. We have
not completed a thorough analysis of all the Heimdal components
affected, thus the Kerberos client, the X.509 library, and other
parts, may be affected as well.
This bug has been in Heimdal's ASN.1 compiler since 2005, but it may
only affect Heimdal 1.6 and up. It was first reported by Douglas
Bagnall, though it had been found independently by the Heimdal
maintainers via fuzzing a few weeks earlier.
While no zero-day exploit is known, such an exploit will likely be
available soon after public disclosure.
- CVE-2019-14870: Validate client attributes in protocol-transition
- CVE-2019-14870: Apply forwardable policy in protocol-transition
- CVE-2019-14870: Always lookup impersonate client in DB
Sponsored by: so (philip)
Obtained from: so (philip)
Tested by: philip, cy
MFC after: immediately
John Baldwin [Tue, 15 Nov 2022 20:08:51 +0000 (12:08 -0800)]
cxgbe: Enable TOE TLS RX when an RX key is provided via setsockopt().
Rather than requiring a socket to be created as a TLS socket from the
get go, switch a TOE socket from "plain" TOE to TLS mode when a
receive key is added to the socket.
The firmware is only able to switch a "plain" TOE connection to TLS
mode if the head of the pending socket data is the start of a TLS
record, so the connection is migrated to TLS mode as a multi-step
process.
When TOE TLS RX is enabled, the associated connection's receive side
is frozen via a flag in the TCB. The state of the socket buffer is
then examined to determine if the pending data in the socket buffer
ends on a TLS record boundary. If so, the connection is migrated to
TLS mode and unfrozen. Otherwise, the connection is unfrozen
temporarily until more data arrives. Once more data arrives, the
receive queue is frozen again and rechecked. This continues until the
connection is paused at a record boundary. Any records received
before TLS mode is enabled are decrypted as software records.
Note that this removes the 'rx_tls_ports' sysctl. TOE TLS offload for
receive is now enabled automatically on existing TOE connections when
using a KTLS-aware SSL library just as it was previously enabled
automatically for TLS transmit. This also enables TLS offload for TOE
connections which enable TLS after passing initial data in the clear
(e.g. STARTTLS with SMTP).
John Baldwin [Tue, 15 Nov 2022 20:03:19 +0000 (12:03 -0800)]
ktls: Add tests for receiving corrupted or invalid records.
These should all trigger errors when reading from the socket.
Tests include truncated records (socket closed early on the other
side), corrupted records (bits flipped in explicit IVs, ciphertext, or
MAC), invalid header fields, and various invalid record lengths.
John Baldwin [Tue, 15 Nov 2022 20:02:57 +0000 (12:02 -0800)]
ktls_ocf: Reject encrypted TLS records using AEAD that are too small.
If a TLS record is too small to contain the required explicit IV,
record_type (TLS 1.3), and MAC, reject attempts to decrypt it with
EMSGSIZE without submitting it to OCF. OCF drivers may not properly
detect that regions in the crypto request are outside the bounds of
the mbuf chain. The caller isn't supposed to submit such requests.
John Baldwin [Tue, 15 Nov 2022 20:02:03 +0000 (12:02 -0800)]
ktls: Add software support for AES-CBC decryption for TLS 1.1+.
This is mainly intended to provide a fallback for TOE TLS which may
need to use software decryption for an initial record at the start
of a connection.
Andrew Turner [Mon, 31 Oct 2022 15:08:26 +0000 (15:08 +0000)]
Split out the arm64 EL2 exception vectors
These were originally in locore.S as they are only needed so we have
a valid value to put into the vbar_el2 register. As these will soon
be used by bhyve so move them to a new file as we already have with
the EL1 exception vectors in exception.S.
Obtained from: https://github.com/FreeBSD-UPB/freebsd-src (earlier version)
Sponsored by: Innovate UK
Sponsored by: The FreeBSD Foundation
Andrew Turner [Mon, 14 Nov 2022 15:48:43 +0000 (15:48 +0000)]
Add the arch field to the arm64 MIDR macros
For completeness add accessors for the MIDR field. As the field is
always 0xf on arm64 it is unneeded in the current MICR handling, but
will be used in the vmm module for bhyve.
Obtained from: https://github.com/FreeBSD-UPB/freebsd-src (earlier version)
Sponsored by: The FreeBSD Foundation
Andrew Turner [Mon, 7 Nov 2022 11:21:42 +0000 (11:21 +0000)]
Disable superpage use for stage 2 arm64 mappings
When modifying a stage 2 mapping we may need to call into the
hypervisor to invalidate the TLB. Until it is known if the cost of
this operation is less than the performance gains superpages offers
disable their use.
Reviewed by: kib. markj
Sponsored by: Innovate UK
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37299
Kristof Provost [Tue, 15 Nov 2022 11:11:32 +0000 (12:11 +0100)]
pfsync: fix memory leak
The recent refactoring to prepare for pfsync over IPv6 introduced a
memory leak.
If we don't have a sync peer configured we return early (without sending
out a packet), but failed to free the newly allocated packet.
Kristof Provost [Wed, 9 Nov 2022 13:48:05 +0000 (14:48 +0100)]
if_ovpn: pass control packets through the socket
Rather than passing control packets through the ioctl interface allow
them to pass through the normal UDP socket flow.
This simplifies both kernel and userspace, and matches the approach
taken (or the one that will be taken) on the Linux side of things.
Some ACPI tables like XSDT contain pointers to other ACPI tables. When
an ACPI table is loaded by qemu's loader, the address in the guest
memory is unknown. For that reason, the qemu loader supports patching
those pointers. Basl keeps track of all pointers and causes the qemu
loader to patch all pointers.
The qemu ACPI table loader is unsupport yet. However, in a future commit
bhyve will use dynamic ACPI table offsets based on the size and
alignment requirements of each ACPI table. Therefore, tracking ACPI
table pointer is required too.
The qemu ACPI table loader patches the ACPI tables. After patching them,
checksums aren't correct any more. It has to calculate a new checksum
for the ACPI table. For that reason, basl has to keep track of checksums
and has to cause the qemu loader to create new checksums for the tables.
The qemu ACPI table loader isn't supported yet. However, the address of
all tables is unknown as long as bhyve hasn't finished ACPI table
creation. So, the checksum of tables which include pointer to other
tables are unknown too. This requires tracking of checksums too.
ACPI tables have different layouts. So, there's no common position for
the length field. When tables are build by basl, the length is unknown
at the beginning. It has to be set after building the table.
Corvin Köhne [Fri, 4 Nov 2022 13:30:53 +0000 (14:30 +0100)]
bhyve: add basl support for generic addresses
In upcoming commits, bhyve will build some ACPI tables by it's own.
Therefore, it should be capable of appending GENERIC_ADDRESS structs to
ACPI tables.
Corvin Köhne [Fri, 4 Nov 2022 11:30:37 +0000 (12:30 +0100)]
bhyve: use basl to load ACPI tables
Load the blobs compiled by iasl into a basl_table. The basl_table is a
temporary buffer which copies the ACPI tables into guest memory for us.
This allows us in the future to pass the blobs over the qemu fwcfg
interface to the guest.
Corvin Köhne [Fri, 4 Nov 2022 11:24:49 +0000 (12:24 +0100)]
bhyve: add basic basl implementation
Basl is the bhyve ASL compiler. At the moment, it's just a small wrapper
to call iasl, the Intel ASL compiler. As bhyve will gain support for
qemu's ACPI table loader in the future, it has to create ACPI tables on
it's own. Therefore, it makes sense to create a new file which keeps the
code for basl.
This first implementation of basl supports creating an ACPI table by
appending raw bytes to it. It's also capable of loading all tables into
guest memory.
Wanpeng Qian [Mon, 14 Nov 2022 13:08:52 +0000 (14:08 +0100)]
bhyve: nvme controller obey async event setting when reporting critical temperature
Async event report is controlled by async event configuration feature
setting. When reporting a critical temperature warning, check the async
event configuration.
Wanpeng Qian [Mon, 14 Nov 2022 13:06:34 +0000 (14:06 +0100)]
bhyve: return FEATURE_NOT_CHANGEABLE for unimplemented feature of NVMe controller
Set Feature is a feature specified function. Currently only some
features have the set procedure. For features that are not handled by
the controller, we should return a FEATURE_NOT_CHANGEABLE error message.
Wanpeng Qian [Mon, 14 Nov 2022 13:02:44 +0000 (14:02 +0100)]
bhyve: abort and return FEATURE_NOT_SAVEABLE while set feature with a save flag for NVMe controller.
Currently bhyve's NVMe controller cannot save feature values cross
reboot. It should return a FEATURE_NOT_SAVEABLE error when the command
specifies a save flag.
If the Feature Identifier specified in the Set Features command is not
saveable by the controller and the controller receives a Set Features
command with the Save bit set to one, then the command shall be aborted
with a status of Feature Identifier Not Saveable.
Wanpeng Qian [Mon, 14 Nov 2022 12:59:11 +0000 (13:59 +0100)]
nvmecontrol: Fix condition when print number of Firmware Slots and Firmware Slot1 Readonly.
The Number of Firmware Slots should never be zero. So, a Firmware Slot 1
should always exist. For that reason, always print the Number of
Firmware Slots and the Firmware Slot 1 Read-Only value.
Dapeng Gao [Tue, 15 Nov 2022 00:21:38 +0000 (00:21 +0000)]
Check alignment of fp in unwind_frame
A misaligned frame pointer is certainly not a valid frame pointer and
with strict alignment enabled (as on CHERI) can cause panics when it is
loaded from later in the code.
Mark Johnston [Mon, 14 Nov 2022 20:08:45 +0000 (15:08 -0500)]
bhyve: Simplify control flow in the xhci device model
We only need to call pci_xhci_xfer_complete() when handling a transfer
to the control endpoint, so move that code into the epid == 1 block and
eliminate a goto. Also remove an unneeded reinitialization of
setup_trb.
dhclient(8): Verify lease-, renewal- and rebinding-time option sizes.
Else out-of-bound reads and undefined behaviour may happen.
The current code only checked for the presence of the first of four bytes.
Make sure the fields in question have the minium size required.
Kristof Provost [Thu, 10 Nov 2022 12:54:09 +0000 (13:54 +0100)]
if_ovpn: ensure we're in vnet context when calling sorele()
We reference count to ensure we don't release the socket while we still
have data in flight. That means that we can end up releasing the socket
from ovpn_encrypt_tx_cb().
We must have a vnet context set when calling sorele() (which asserts
this from within sofree()), so move the CURVNET_SET()/CURVNET_RESTORE()
to ensure this is the case.
While here also add a couple of assertions to make this more obvious,
and to ease future debugging.
othermta (along with mta_start_script configuration entry in rc.conf)
was a mechanism used to be able to run another mta than sendmail(8) before
"rcng" time 20 years ago.
Rick Macklem [Sun, 13 Nov 2022 20:16:06 +0000 (12:16 -0800)]
rpcb_clnt.c: Do not force use of UDP
Without this patch, the code in the rpcbind client forces
the use of UDP. A comment notes that some rpcbind servers
only support UDP. This makes NFSv3 mounts to Azure servers
impossible, since they require use of TCP for rpcbind.
Since the comment is very old (imported from NetBSD in 2001)
and I do not believe any UDP only rpcbind servers will
still exist, this patch comments out the code that forces
use of UDP, so that NFSv3 mounts to Azure servers can work.
For an NFSv3 mount, the "udp" mount option will still
make mount_nfs use UDP for rpcbind so that can be used
as a workaround for any old NFSv3 server that only
supports rpcbind over UDP (if any such server still exists).
I asked if doing this change is appropriate on freebsd-fs@
and I only got one reply (off list) that supported doing
the change.
Kirk McKusick [Sun, 13 Nov 2022 06:56:03 +0000 (22:56 -0800)]
Enable taking snapshots on UFS/FFS filesystems using journaled soft updates.
All the needed infrastructure updates have been made to allow
snapshots to be taken on UFS/FFS filesystems that are using journaled
soft updates. The most immediate benefit is the ability to use a
snapshot to take a consistent filesystem dump on a live filesystem
using the -L option to dump(8).
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36491
Kirk McKusick [Sat, 12 Nov 2022 23:36:07 +0000 (15:36 -0800)]
Fix for tunefs(8) unable to add a UFS/FFS soft update journal.
The reported bug is UFS: bad file descriptor: soft update journaling
can not be enabled on some FreeBSD-provided disk images – failed
to write updated cg.
The UFS library (libufs(3)) failed to reopen its disk descriptor
when first attempting to update a cylinder group. The error only
occurred when trying to add journaling to a filesystem whose first
cylinder group was too full to hold the journal.
PR: 259090
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Franco Fichtner [Fri, 11 Nov 2022 21:20:13 +0000 (22:20 +0100)]
rc: add a manual entry for ${name}_setup
${name}_prepend is suboptimal as it is prepended to the actual
command being run. Therefore the term "commandS to prepend"
is misleading and no clear separation takes place for setup tasks
that may be required like changing a config file permission or
generating a configuration file prior to service start.
The other reason is that {argument}_precmd is a service-side
variable and cannot be repurposed from the user-side.
Eric van Gyzen [Thu, 3 Nov 2022 02:42:54 +0000 (21:42 -0500)]
zfs tests: stop writing to arbitrary devices
TL;DR: Three ZFS tests created ZFS pools on all unmounted devices listed
in /etc/fstab, corrupting their contents. Stop that.
Imagine my surprise when the ESP on my main dev/test VM would "randomly"
become corrupted, making it unbootable. Three tests collect various devices
from the system and try to add them to a test pool. The test expects this
to fail because it _assumes_ these devices are in use and ZFS will correctly
reject the request.
My /etc/fstab has two entries for devices in /dev:
Note the `noauto` on the ESP. In a remarkable example of irony, I chose
this because it should keep the ESP more protected from corruption;
in fact, mounting it would have protected it from this case.
The tests added all of these devices to a test pool in a _single command_,
expecting the command to fail. The swap device was in use, so the command
correctly failed, but the ESP was added and therefore corrupted. However,
since the command correctly failed, the test didn't notice the ESP problem.
If each device had been added with its own command, the test _might_ have
noticed that one of them incorrectly succeeded. However, two of these
tests would not have noticed:
hotspare_create_001_neg was incorrectly specified as needing the Solaris
dumpadm command, so it was skipped. _Some_ of the test needs that command,
but it checks for its presence and runs fine without it.
Due to bug 241070, zpool_add_005_pos was marked as an expected failure.
Due to the coarse level of integration with ATF, this test would still
"pass" even if it failed for the wrong reason. I wrote bug 267554 to
reconsider the use of atf_expect_fail in these tests.
Let's further consider the use of various devices found around the system.
In addition to devices in /etc/fstab, the tests also used mounted devices
listed by the `mount` command. If ZFS behaves correctly, it will refuse
to added mounted devices and swap devices to a pool. However, these are
unit tests used by developers to ensure that ZFS still works after they
modify it, so it's reasonable to expect ZFS to do the _wrong_ thing
sometimes. Using random host devices is unsafe.
Fix the root problem by using only the disks provided via the "disks"
variable in kyua.conf. Use one to create a UFS file system and mount it.
Use another as a swap device. Use a third as a dump device, but expect
it to fail due to bug 241070.
While I'm here:
Due to commit 6b6e2954dd65, we can simply add a second dump device and
remove it in cleanup. We no longer need to save, replace, and restore the
pre-existing dump device.
The cleanup_devices function used `camcontrol inquiry` to distinguish disks
from other devices, such as partitions. That works fine for SCSI, but not
for ATA or VirtIO block. Use `geom disk list` instead.
linuxkpi: Define `ZERO_OR_NULL_PTR()` in <linux/slab.h>
On Linux, the `kmalloc()` family of functions returns a special value if
the size of the allocation is zero. This macro verifies if the pointer
is NULL (the allocation failed) or the size is 0 (the allocation was not
performed AFAIU). This special value can be passed to `kfree()`.
On FreeBSD, our `malloc(9)` functions don't return a special value for
0-size allocations. Therefore we can simply compare the result against
NULL.
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D37367
linuxkpi: Include <linux/list.h> and <linux/kernel.h> from <linux/mutex.h>
They are not really used in this header. However they are included in
Linux and at least the DRM drivers unfortunately rely on this namespace
pollution.
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D37365
Kyle Evans [Fri, 11 Nov 2022 19:50:29 +0000 (13:50 -0600)]
arm64: add a spin-table implementation for Apple Silicon
The M1 has no EL3, so we're limited to a spin-table implementation if we
want to eventually use bhyve on it. Implement spin-table now, but note
that we still prefer PSCI where possible.
Wanpeng Qian [Fri, 11 Nov 2022 19:13:06 +0000 (12:13 -0700)]
nvmecontrol: fix wrong temperature unit for INTEL SSDs.
Although intel's specification did not tell which unit for Temperature
Statistics (Log Identifier C5h), I believe it is based on Celsius
instead of Kelvin.
here is my P3700 SSDs result(before):
Intel Temperature Log
=====================
Current: 30 K, -243.15 C, -405.67 F
Overtemp Last Flags 0
Overtemp Lifetime Flags 0
Max Temperature 53 K, -220.15 C, -364.27 F
Min Temperature 17 K, -256.15 C, -429.07 F
Max Operating Temperature 63 K, -210.15 C, -346.27 F
Min Operating Temperature 0 K, -273.15 C, -459.67 F
Estimated Temperature Offset: 0 C/K
after apply the patch, result is
Intel Temperature Log
=====================
Current: 303.15 K, 30 C, 86.00 F
Overtemp Last Flags 0
Overtemp Lifetime Flags 0
Max Temperature 326.15 K, 53 C, 127.40 F
Min Temperature 290.15 K, 17 C, 62.60 F
Max Operating Temperature 336.15 K, 63 C, 145.40 F
Min Operating Temperature 273.15 K, 0 C, 32.00 F
Estimated Temperature Offset: 0 C/K
I also compare to smartctl's report. it match very well.
also tested on Intel P3600, it fixed the problem.
Signed-off-by: Wanpeng Qian <wanpengqian@gmail.com>
Reviewed by: imp (added tweak to samsung.c so it still compiles)
Differential Revision: https://reviews.freebsd.org/D32845
Mitchell Horne [Fri, 11 Nov 2022 18:23:11 +0000 (14:23 -0400)]
ddb: don't limit pindex output in 'show vmopag'
This command already prints a tremendous amount of output, and properly
obeys the pager. It no longer makes sense to arbitrarily limit the pages
that are printed, as the reader will not be aware that this has
happened.
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D37361
Mitchell Horne [Fri, 11 Nov 2022 18:21:29 +0000 (14:21 -0400)]
ddb(4): misc updates
- Describe optional 'addr' argument to many show commands
- Remove obsolete commands (show cbstat)
- 'show jails' was renamed to 'show prison'
- Remove superfluous commentary about sleepqueues
- Fix an xref to gdb(4)
- Fix issues reported by mandoc -Tlint
- Plus a couple other inaccuracies/inconsistencies
Reviewed by: pauamma, markj, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation (in part)
Differential Revision: https://reviews.freebsd.org/D37332
Mitchell Horne [Fri, 11 Nov 2022 18:20:31 +0000 (14:20 -0400)]
netgdb(4): update list of required kernel options
The man page claims that netgdb will be enabled automatically with the
presence of the DDB, GDB, and INET options. Based on the logic in
conf/files, this is not the case. Update the manpage to list all
of the options required to include netgdb.
The previous `llnode` field is moved inside another field `node`.
This `node` field is a `struct __call_single_node` in Linux. Here, we
simply add an anonymous struct with the `llnode` field inside. That
field's new name is `llist` now.
V2: Use an anonymous union to keep the structure backward compatible
with drivers using the previous `llnode` field. This was suggested
by wufl@ and hselasky@. Thank you!
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D36955
arm64 is the only currently supported architecture that has
${MACHINE_CPUARCH} set to a different value (aarch64) than ${MACHINE}
(arm64), as described in arch(7). However, there is no source directory
associated with arm64 that has a name set to ${MACHINE_CPUARCH}.
Remove the dead code that adds a directory with a name set to
${MACHINE_CPUARCH} to a list of directories indexed with cscope.
This change allows to use the cscope target on arm64.
Mark Johnston [Fri, 11 Nov 2022 15:02:42 +0000 (10:02 -0500)]
bhyve: Cast away const when fetching a config nvlist
Silence a warning from the compiler about "const" being discarded. The
warning is correct: nvlist values are supposed to be immutable.
However, fixing this properly will require some contortions on behalf of
consumers who look up a subtree of the config and modify it. Per a
discussion on freebsd-virtualization@, the solution will probably be to
outright replace the use of nvlists for VM configuration, but until that
happens let's document the problem and silence the warning.
Mark Johnston [Fri, 11 Nov 2022 15:02:10 +0000 (10:02 -0500)]
bhyve: Drop volatile qualifiers from virtio rings
The qualifiers are there presumably because these rings are mapped into
the guest, but they do not appear to be required for correctness, and
bhyve generally doesn't qualify accesses to guest memory this way.
Moreover, the qualifiers are discarded by snapshot code, causing clang
to emit warnings. Just stop using volatile here.
The use of volatile appears to be inherited from the kernel driver's
definitions of the same structures. It makes some sense, since USB TRBs
and related structures live in guest memory, but bhyve device models
generally don't volatile-qualify accesses to guest memory and I can't
see how they are required for correctness here. Moreover, XHCI_GADDR
does not return volatile pointers so we're already being inconsistent.
Just drop the qualifiers to address the warning.