Wei Hu [Sat, 27 Nov 2021 06:42:34 +0000 (06:42 +0000)]
Hyper-V: vPCI: Prepopulate device bars
In recent Hyper-V releases on Windows Server 2022, vPCI code does not
initialize the last 4 bit of device bar registers. This behavior change
could result weird problems cuasing PCI code failure when configuring
bars.
Just write all 1's to those bars whose probed values are not the same
as current read ones. This seems to make Hyper-V vPCI and
pci_write_bar() to cooperate correctly on these releases.
Mark Johnston [Tue, 16 Nov 2021 18:36:30 +0000 (13:36 -0500)]
sctp: Use m_apply() to calcuate a checksum for an mbuf chain
m_apply() works on unmapped mbufs, so this will let us elide
mb_unmapped_to_ext() calls preceding sctp_calculate_cksum() calls in
the network stack.
Modify sctp_calculate_cksum() to assume it's passed an mbuf header.
This assumption appears to be true in practice, and we need to know the
full length of the chain.
No functional change intended.
Reviewed by: tuexen, jhb
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 16 Nov 2021 18:31:04 +0000 (13:31 -0500)]
mbuf: Only allow extpg mbufs if the system has a direct map
Some upcoming changes will modify software checksum routines like
in_cksum() to operate using m_apply(), which uses the direct map to
access packet data for unmapped mbufs. This approach of course does not
work on platforms without a direct map, so we have to disallow the use
of unmapped mbufs on such platforms.
I believe this is the right tradeoff: we only configure KTLS on amd64
and arm64 today (and one KTLS consumer, NFS TLS, requires a direct map
already), and the use of unmapped mbufs with plain sendfile is a recent
optimization. If need be, m_apply() could be modified to create
CPU-private mappings of extpg mbuf pages as a fallback.
So, change mb_use_ext_pgs to be hard-wired to zero on systems without a
direct map. Note that PMAP_HAS_DMAP is not a compile-time constant on
some systems, so the default value of mb_use_ext_pgs has to be
determined during boot.
Reviewed by: jhb
Discussed with: gallatin
Sponsored by: The FreeBSD Foundation
Ed Maste [Fri, 29 Oct 2021 00:49:12 +0000 (20:49 -0400)]
Don't build sanitizer runtimes under WITHOUT_CXX
In the past we built the sanitizer runtimes when building Clang
(and using Clang as the compiler) but 7676b388adbc changed this to
be conditional only on using Clang, to make the runtimes available
for external Clang.
They fail to build when WITHOUT_CXX is set though, so add MK_CXX
as part of the condition.
Reported by: Michael Dexter, Build Option Survey
Reviewed by: imp, jrtc27
Fixes: 7676b388adbc ("Always build the sanitizer runtimes...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32731
Kristof Provost [Thu, 4 Nov 2021 17:05:58 +0000 (18:05 +0100)]
if_gif: fix vnet shutdown panic
If an if_gif exists and has an address assigned inside a vnet when the
vnet is shut down we failed to clean up the address, leading to a panic
when we ip_destroy() and the V_in_ifaddrhashtbl is not empty.
This happens because of the VNET_SYS(UN)INIT order, which means we
destroy the if_gif interface before the addresses can be purged (and
if_detach() does not remove addresses, it assumes this will be done by
the stack teardown code).
Set subsystem SI_SUB_PSEUDO just like if_bridge so the cleanup
operations happen in the correct order.
Mark Johnston [Fri, 26 Nov 2021 14:27:19 +0000 (09:27 -0500)]
Hoist cddl prebuild lib dependency definitions out of a MK_ZFS block
The compilation of several libraries under cddl/lib is not conditional
on MK_ZFS = "yes", so their dependency on libspl is not conditional
either. Unbreak buildworld when WITHOUT_ZFS is set.
Mark Johnston [Mon, 15 Nov 2021 16:35:44 +0000 (11:35 -0500)]
vm_page: Consolidate page busy sleep mechanisms
- Modify vm_page_busy_sleep() and vm_page_busy_sleep_unlocked() to take
a VM_ALLOC_* flag indicating whether to sleep on shared-busy, and fix
up callers.
- Modify vm_page_busy_sleep() to return a status indicating whether the
object lock was dropped, and fix up callers.
- Convert callers of vm_page_sleep_if_busy() to use vm_page_busy_sleep()
instead.
- Remove vm_page_sleep_if_(x)busy().
No functional change intended.
Obtained from: jeff (object_concurrency patches)
Reviewed by: kib
Mark Johnston [Mon, 15 Nov 2021 16:44:04 +0000 (11:44 -0500)]
vm: Add a mode to vm_object_page_remove() which skips invalid pages
This will be used to break a deadlock in ZFS between the per-mountpoint
teardown lock and page busy locks. In particular, when purging data
from the page cache during dataset rollback, we want to avoid blocking
on the busy state of invalid pages since the busying thread may be
blocked on the teardown lock in zfs_getpages().
Add a helper, vn_pages_remove_valid(), for use by filesystems. Bump
__FreeBSD_version so that the OpenZFS port can make use of the new
helper.
PR: 258208
Reviewed by: avg, kib, sef
Tested by: pho (part of a larger patch)
Sponsored by: The FreeBSD Foundation
Justin Hibbits [Fri, 1 Oct 2021 18:39:18 +0000 (13:39 -0500)]
Fix segment size in compressing core dumps
A core segment is bounded in size only by memory size. On 64-bit
architectures this means a segment can be much larger than 4GB.
However, compress_chunk() takes only a u_int, clamping segment size to
4GB-1, resulting in a truncated core. Everything else, including the
compressor internally, uses size_t, so use size_t at the boundary here.
This dates back to the original refactor back in 2015 (r279801 / aa14e9b7).
Rick Macklem [Sun, 14 Nov 2021 21:36:14 +0000 (13:36 -0800)]
nfsstat: Add output for counts of new RPCs to the "-E" option
Add output to the "-E" option for new RPCs related
to NFSv4.1/4.2. Also, add output of the counts for
allocated layouts and the title for the "Client"
section (which was lost during a previous commit).
Ed Maste [Sun, 21 Nov 2021 17:17:20 +0000 (12:17 -0500)]
Fix coredump_phnum test with ASLR enabled
coredump_phnum intends to generate a core file with many PT_LOAD
segments. Previously it called mmap() in a loop with alternating
protections, relying on each mapping following the previous, to produce
a core file with many page-sized PT_LOAD segments. With ASLR on we no
longer have this property of each mmap() following the previous.
Instead, perform a single allocation, and then use mprotect() to set
alternating pages to PROT_READ.
Andriy Gapon [Thu, 4 Nov 2021 11:56:22 +0000 (13:56 +0200)]
icee: allow configuration via hints on FDT-based systems
On-board devices should be configured via the FDT and overlays.
Hints are primarily useful for external and temporarily attached devices.
Adding hints is much easier and faster than writing and compiling
an overlay.
Andriy Gapon [Thu, 4 Nov 2021 11:55:35 +0000 (13:55 +0200)]
ds1307: allow configuration via hints on FDT-based systems
On-board devices should be configured via the FDT and overlays.
Hints are primarily useful for external and temporarily attached devices.
Adding hints is much easier and faster than writing and compiling
an overlay.
Rick Macklem [Sat, 13 Nov 2021 01:32:55 +0000 (17:32 -0800)]
pNFS: Add nfsstats counters for number of Layouts
For pNFS, Layouts are issued by the server to indicate
where a file's data resides on the DS(s). This patch
adds counters for how many layouts are allocated to
the nfsstatsv1 structure, using two reserved fields.
ofwfb: fix vga/hdmi console with ASMEDIA during boot on powerpc64(le)
On recent OpenBMC firmware, the onboard ASMEDIA video card framebuffer
address was removed from device tree for security purposes (value is set
to zero to avoid leaking the address).
This patch works around the problem by taking framebuffer base address
from the "ranges" property of a parent node.
Reviewed by: luporl, jhibbits (on IRC)
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D30626
Kristof Provost [Fri, 29 Oct 2021 15:40:53 +0000 (17:40 +0200)]
pf: Introduce ridentifier
Allow users to set a number on rules which will be exposed as part of
the pflog header.
The intent behind this is to allow users to correlate rules across
updates (remember that pf rules continue to exist and match existing
states, even if they're removed from the active ruleset) and pflog.
Ed Maste [Thu, 25 Nov 2021 16:47:03 +0000 (11:47 -0500)]
Update deprecation version for drivers removed in main
Removal of the amr, esp, iir, mly and twa drivers was planned before
FreeBSD 13, but did not happen before the branch. Update the
depreciation notices to indicate that they are gone in FreeBSD 14.
Rick Macklem [Thu, 11 Nov 2021 23:43:58 +0000 (15:43 -0800)]
nfscl: Add a LayoutError RPC for NFSv4.2 pNFS mounts
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error. This patch adds the same to the
FreeBSD NFSv4.2 pNFS client, to maintain Linux compatible
behaviour, particlularily for non-FreeBSD pNFS servers.
Rick Macklem [Mon, 8 Nov 2021 23:58:00 +0000 (15:58 -0800)]
nfsd: Fix the NFSv4.2 pNFS MDS server for NFSERR_NOSPC via LayoutError
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error and keeps retrying, doing repeated
LayoutGets to the MDS and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up unless
a subsequent LayoutGet request is sent a NFSERR_NOSPC
reply.
The looping problem still occurs for NFSv4.1 mounts, but no
fix for this is known at this time.
This patch changes the pNFS MDS server to reply to LayoutGet
operations with NFSERR_NOSPC once a LayoutError reports the
problem, until the DS has available space. This keeps the Linux
NFSv4.2 from looping.
Found during recent testing because of issues w.r.t. a DS
being out of space found during a recent IEFT NFSv4 working
group testing event.
Rick Macklem [Mon, 8 Nov 2021 20:59:31 +0000 (12:59 -0800)]
nfsd: Fix f_bavail and f_ffree for NFSv4 when negative
Since the NFS Space_available and Files_available are unsigned,
the NFSv3 server sets them to 0 when negative, so that they
do not appear to be large positive values for non-FreeBSD clients.
This patch fixes the NFSv4 server to do the same.
Found during a recent IEFT NFSv4 working group testing event.
Kristof Provost [Tue, 16 Nov 2021 19:46:26 +0000 (20:46 +0100)]
riscv: add COMPAT_FREEBSD12 option
Turn on compat option for older FreeBSD versions (i.e. 12). We do not
enable the compat options for 11 or older because riscv was never
supported in those versions.
Zhenlei Huang [Wed, 3 Nov 2021 11:46:48 +0000 (12:46 +0100)]
devfs.rules: Correctly unhide pf in vnet jails
Revision 9e9be081d8 introduced a new devfs rule devfsrules_jail_vnet. It
includes rule devfsrules_jail which include other rules. Unfortunately
devfs could not recursively parse the action include and thus
devfsrules_jail_vnet will expose all nodes.
John Baldwin [Wed, 21 Apr 2021 20:57:04 +0000 (13:57 -0700)]
riscv: Clear SUM in SSTATUS for supervisor mode exceptions.
Previously, a page fault taken during copyin/out and related functions
would run the entire fault handler while permitting direct access to
user addresses. This could also leak across context switches (e.g. if
the page fault handler was preempted by an interrupt or slept for disk
I/O).
To fix, clear SUM in assembly after saving the original version of
SSTATUS in the supervisor mode trapframe.
John Baldwin [Mon, 15 Nov 2021 19:31:16 +0000 (11:31 -0800)]
ktls: Add simple receive tests of kernel TLS.
Similar to the simple transmit tests added in a10482ea7476d68d1ab028145ae6d97cef747b49, these tests test the kernel
TLS functionality directly by manually encrypting TLS records using
randomly generated keys and writing them to a socket to be processed
by the kernel.
John Baldwin [Mon, 15 Nov 2021 19:26:45 +0000 (11:26 -0800)]
ktls: Add padding tests for AES-CBC MTE cipher suites.
For each AES-CBC MTE cipher suite, test sending records with 1 to 16
bytes of payload. This ensures that all of the potential padding
values are covered.
John Baldwin [Mon, 1 Nov 2021 18:28:10 +0000 (11:28 -0700)]
ktls: Add simple transmit tests of kernel TLS.
Note that these tests test the kernel TLS functionality directly.
Rather than using OpenSSL to perform negotiation and generate keys,
these tests generate random keys send data over a pair of TCP sockets
manually decrypting the TLS records generated by the kernel.
John Baldwin [Wed, 27 Oct 2021 23:35:56 +0000 (16:35 -0700)]
ktls: Fix assertion for TLS 1.0 CBC when using non-zero starting seqno.
The starting sequence number used to verify that TLS 1.0 CBC records
are encrypted in-order in the OCF layer was always set to 0 and not to
the initial sequence number from the struct tls_enable.
In practice, OpenSSL always starts TLS transmit offload with a
sequence number of zero, so this only matters for tests that use a
random starting sequence number.
John Baldwin [Wed, 13 Oct 2021 19:30:15 +0000 (12:30 -0700)]
ktls: Ensure FIFO encryption order for TLS 1.0.
TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record. As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.
If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order. For TLS 1.1
and later this is fine, but this can break for TLS 1.0.
To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.
- In ktls_enqueue(), check if a TLS record being queued is the next
record expected for a TLS 1.0 session. If not, it is placed in
sorted order in the pending_records queue in the TLS session.
If it is the next expected record, queue it for SW encryption like
normal. In addition, check if this new record (really a potential
batch of records) was holding up any previously queued records in
the pending_records queue. Any of those records that are now in
order are also placed on the queue for SW encryption.
- In ktls_destroy(), free any TLS records on the pending_records
queue. These mbufs are marked M_NOTREADY so were not freed when the
socket buffer was purged in sbdestroy(). Instead, they must be
freed explicitly.
John Baldwin [Tue, 26 Oct 2021 21:50:05 +0000 (14:50 -0700)]
Further refine the ExpDataSN checks for SCSI Response PDUs.
According to 11.4.8 in RFC 7143, ExpDataSN MUST be 0 if the response
code is not Command Completed, but we were requiring it to always be
the count of DataIn PDUs regardless of the response code.
In addition, at least one target (OCI Oracle iSCSI block device)
returns an ExpDataSN of 0 when returning a valid completion with an
error status (Check Condition) in response to a SCSI Inquiry. As a
workaround for this target, only warn without resetting the connection
for a 0 ExpDataSN for responses with a non-zero error status.
PR: 259152
Reported by: dch
Reviewed by: dch, mav, emaste
Fixes: 4f0f5bf99591 iscsi: Validate DataSN values in Data-In PDUs in the initiator.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32650
John Baldwin [Tue, 24 Aug 2021 21:58:34 +0000 (14:58 -0700)]
iscsi: Validate DataSN values in Data-In PDUs in the initiator.
As is done in the target, require that DataSN values are consecutive
and in-order. If an out of order Data-In PDU is received, force a
session reconnect. In addition, when a SCSI Response PDU is received,
verify that the ExpDataSN field matches the count of Data-In PDUs
received for this command. If not, force a session reconnect.
John Baldwin [Tue, 26 Oct 2021 21:52:40 +0000 (14:52 -0700)]
ctld: Always declare MaxRecvDataSegmentLength.
This key is Declarative and should always be sent even if the
initiator did not send it's own limit. This is similar to the fix in fc79cf4fea72 but for the target side. However, unlike that fix,
failure to send the key simply results in reduced performance.
John Baldwin [Thu, 18 Feb 2021 17:23:59 +0000 (09:23 -0800)]
Add Chacha20-Poly1305 as a KTLS cipher suite.
Chacha20-Poly1305 for TLS is an AEAD cipher suite for both TLS 1.2 and
TLS 1.3 (RFCs 7905 and 8446). For both versions, Chacha20 uses the
server and client IVs as implicit nonces xored with the record
sequence number to generate the per-record nonce matching the
construction used with AES-GCM for TLS 1.3.
John Baldwin [Thu, 14 Oct 2021 17:59:16 +0000 (10:59 -0700)]
cxgbe: Only run ktls_tick when NIC TLS is enabled.
Previously the body of ktls_tick was a nop when NIC TLS was disabled,
but the callout was still scheduled consuming power on otherwise-idle
systems with Chelsio T6 adapters. Now the callout only runs while NIC
TLS is enabled on at least one interface of an adapter.