John Baldwin [Wed, 6 Oct 2021 21:08:46 +0000 (14:08 -0700)]
crypto: Permit variable-sized IVs for ciphers with a reinit hook.
Add a 'len' argument to the reinit hook in 'struct enc_xform' to
permit support for AEAD ciphers such as AES-CCM and Chacha20-Poly1305
which support different nonce lengths.
Reviewed by: markj
Sponsored by: Chelsio Communications, The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32105
John Baldwin [Tue, 25 May 2021 23:59:19 +0000 (16:59 -0700)]
crypto: Add crypto_cursor_segment() to fetch both base and length.
This function combines crypto_cursor_segbase() and
crypto_cursor_seglen() into a single function. This is mostly
beneficial in the unmapped mbuf case where back to back calls of these
two functions have to iterate over the sub-components of unmapped
mbufs twice.
Bump __FreeBSD_version for crypto drivers in ports.
John Baldwin [Tue, 25 May 2021 23:59:18 +0000 (16:59 -0700)]
crypto: Add a new type of crypto buffer for a single mbuf.
This is intended for use in KTLS transmit where each TLS record is
described by a single mbuf that is itself queued in the socket buffer.
Using the existing CRYPTO_BUF_MBUF would result in
bus_dmamap_load_crp() walking additional mbufs in the socket buffer
that are not relevant, but generating a S/G list that potentially
exceeds the limit of the tag (while also wasting CPU cycles).
John Baldwin [Wed, 6 Oct 2021 21:08:46 +0000 (14:08 -0700)]
cryptodev: Use 'csp' in the handlers for requests.
- Retire cse->mode and use csp->csp_mode instead.
- Use csp->csp_cipher_algorithm instead of the ivsize when checking
for the fixup for the IV length for AES-XTS.
Reviewed by: markj
Sponsored by: Chelsio Communications, The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32103
John Baldwin [Thu, 1 Apr 2021 22:42:30 +0000 (15:42 -0700)]
cryptocheck: Expand the set of sizes tested by -z.
Test individual sizes up to the max encryption block length as well as
a few sizes that include 1 full block and a partial block before
doubling the size.
John Baldwin [Thu, 1 Apr 2021 22:42:18 +0000 (15:42 -0700)]
ossl: Don't encryt/decrypt too much data for chacha20.
The loops for Chacha20 and Chacha20+Poly1305 which encrypted/decrypted
full blocks of data used the minimum of the input and output segment
lengths to determine the size of the next chunk ('todo') to pass to
Chacha20_ctr32(). However, the input and output segments could extend
past the end of the ciphertext region into the tag (e.g. if a "plain"
single mbuf contained an entire TLS record). If the length of the tag
plus the length of the last partial block together were at least as
large as a full Chacha20 block (64 bytes), then an extra block was
encrypted/decrypted overlapping with the tag. Fix this by also
capping the amount of data to encrypt/decrypt by the amount of
remaining data in the ciphertext region ('resid').
John Baldwin [Fri, 5 Mar 2021 17:47:58 +0000 (09:47 -0800)]
poly1305: Don't export generic Poly1305_* symbols from xform_poly1305.c.
There currently isn't a need to provide a public interface to a
software Poly1305 implementation beyond what is already available via
libsodium's APIs and these symbols conflict with symbols shared within
the ossl.ko module between ossl_poly1305.c and ossl_chacha20.c.
Reported by: se, kp
Fixes: 78991a93eb9d
Sponsored by: Netflix
John Baldwin [Thu, 18 Feb 2021 17:22:18 +0000 (09:22 -0800)]
Add an implementation of CHACHA20_POLY1305 to cryptosoft.
This uses the chacha20 IETF and poly1305 implementations from
libsodium. A seperate auth_hash is created for the auth side whose
Setkey method derives the poly1305 key from the AEAD key and nonce as
described in RFC 8439.
Philip Paeps [Mon, 18 Oct 2021 06:19:42 +0000 (14:19 +0800)]
contrib/tzdata: correct DST in Fiji
Direct commit to stable/13.
Unfortunately, there is still no clear consensus on the tz mailing list
about some of the changes introduced by tzdata 2021b and later releases.
Pending consensus, only merge the recently announced DST transition date
for Fiji and corrections to commentary from tzdata 2021d. This corrects
future timestamps in Fiji.
Navdeep Parhar [Thu, 24 Jun 2021 20:05:57 +0000 (13:05 -0700)]
cxgbe(4): Do not configure traffic classes automatically on attach.
The driver used to configure all available classes with some default
parameters on attach and the rest of t4_sched.c was written with the
assumption that all traffic classes are always valid in the hardware.
But this resulted in a lot of informational messages being logged in the
firmware's circular log, crowding out other more useful messages.
This change leaves the tx scheduler alone during attach to reduce the
spam in the devlog. The state of every class is now tracked separately
from its flags and there is support for an 'uninitialized' state.
Navdeep Parhar [Tue, 22 Jun 2021 05:07:56 +0000 (22:07 -0700)]
cxgbe(4): Get the number of usable traffic classes from the firmware.
Recent firmwares are able to utilize the traffic classes of tx channels
that were previously unused. This effectively doubles the number of
traffic classes available per port for 2 port cards. Stop using the raw
per-channel value in the driver and ask the firmware for the number of
usable traffic classes instead.
Navdeep Parhar [Tue, 25 May 2021 20:47:06 +0000 (13:47 -0700)]
cxgbe(4): Update firmwares to 1.25.6.0.
Changes since 1.25.0.0 are listed here. This list comes from the
Release Notes for the "Chelsio Unified Wire v3.14.0.3 for Linux"
release dated 2021-05-21.
Fixes
-----
BASE:
- Fixed Back to back T6 100G-CR4 link coming up with NO FEC sometimes.
- [T5] Try to bring up link in 1G speed if link doesn't come up on 10G.
- Fixed a bug to not allow BaseR fec in 100G speed.
- Fixed linkup issues on BT adapter in 1G and 100M speed.
- Fixed an issue to allow driver to send VI_ENABLE multiple times (once
with rx disable and then later rx enable).
- Fixed rate limiting not working on class number 16 to 30.
- Fixed backward compatibility issue in port type interpretation with vpd
version 0x80.
ETH:
- Fixed a case when firmware failed to deliver NIC WR completion to host.
- No rate limit support for WR ETH_TX_PKTS2 due to performance reasons.
OFLD
- Fixed a connection hang in SO adapters when tp_plen_max (set by driver)
is more than the window size.
- Added fw_filter_vnic_mode to firmware API file (t4fw_interface.h)
- Use correct rx channel in coprocessor crypto completion (CPL_FW6_PLD). This
was causing out of order completion to host.
FOiSCSI
- Fixed a crash due to unaligned access of ipv6 address.
- Fixed a crash during lun reset.
Enhancements
------------
ETH:
- Rate limiting support added for encapsulated (vxlan, nvgre, geneve) NIC TCP
packets.
OFLD:
- More than 128 SGLs supported in FW_RI_FR_NSMR_WR. Now, more than 16GB
(upto 64GB) of PBLs can be written with single FW_RI_FR_NSMR_WR.
Navdeep Parhar [Thu, 27 May 2021 02:18:42 +0000 (19:18 -0700)]
cxgbe(4): Fix an incorrect assert.
CTRL and OFLD tx queues do not have automatic tx credit flush enabled so
it is okay for the cidx not to be the same as the pidx when the queue is
destroyed.
Navdeep Parhar [Sun, 23 May 2021 21:58:29 +0000 (14:58 -0700)]
cxgbe(4): Overhaul CLIP (Compressed Local IPv6) table management.
- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.
- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.
- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.
- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.
cxgbe(4): Add support for NIC suspend/resume and live reset.
Add suspend/resume callbacks to the driver and a live reset built around
them. This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip. Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports. It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC. A reset will
look like a link bounce to the networking stack.
Here are some ways to exercise this functionality:
/* Manual reset with driver sysctl. */
# sysctl dev.t6nex.0.reset=1
/* Automatic adapter reset on any fatal error. */
# hw.cxgbe.reset_on_fatal_err=1
Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver. All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO. All ifnets report
link-down while the adapter is suspended.
Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets. The ifnets are able to send
and receive traffic as soon as the link comes back up.
Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.
cxgbe(4): Separate the sw- and hw-specific parts of resource allocations
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
if (!SW_ALLOCATED)
init_iq/init_eq/init_fl no-fail sw init
alloc_iq_fl/alloc_eq/alloc_wrq may-fail sw alloc
add_foo_sysctls, etc. no-fail post-alloc items
if (!HW_ALLOCATED)
alloc_iq_fl_hwq/alloc_eq_hwq hw resource allocation
}
/* Idempotent */
free_foo
{
if (!HW_ALLOCATED)
free_iq_fl_hwq/free_eq_hwq release hw resources
if (!SW_ALLOCATED)
free_iq_fl/free_eq/free_wrq release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
Rick Macklem [Wed, 13 Oct 2021 00:21:01 +0000 (17:21 -0700)]
nfscl: Fix another deadlock related to the NFSv4 clientID lock
Without this patch, it is possible to hang the NFSv4 client,
when a rename/remove is being done on a file where the client
holds a delegation, if pNFS is being used. For a delegation
to be returned, dirty data blocks must be flushed to the NFSv4
server. When pNFS is in use, a shared lock on the clientID
must be acquired while doing a write to the DS(s).
However, if rename/remove is doing the delegation return
an exclusive lock will be acquired on the clientID, preventing
the write to the DS(s) from acquiring a shared lock on the clientID.
This patch stops rename/remove from doing a delegation return
if pNFS is enabled. Since doing delegation return in the same
compound as rename/remove is only an optimization, not doing
so should not cause problems.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
Rick Macklem [Tue, 12 Oct 2021 04:58:24 +0000 (21:58 -0700)]
nfscl: Fix a deadlock related to the NFSv4 clientID lock
Without this patch, it is possible for a process doing an NFSv4
Open/create of a file to block to allow another process
to acquire the exclusive lock on the clientID when holding
a shared lock on the clientID. As such, both processes
deadlock, with one wanting the exclusive lock, while the
other holds the shared lock. This deadlock is unlikely to occur
unless delegations are in use on the NFSv4 mount.
This patch fixes the problem by not deferring to the process
waiting for the exclusive lock when a shared lock (reference cnt)
is already held by the process.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
Mark Johnston [Wed, 13 Oct 2021 00:11:02 +0000 (20:11 -0400)]
mount: Check for !VDIR mount points before handling -o emptydir
To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty. This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.
Reported by: syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by: kib, imp
Sponsored by: The FreeBSD Foundation
Mark Johnston [Fri, 17 Sep 2021 14:44:23 +0000 (10:44 -0400)]
libc/locale: Fix races between localeconv(3) and setlocale(3)
Each locale embeds a lazily initialized lconv which is populated by
localeconv(3) and localeconv_l(3). When setlocale(3) updates the global
locale, the lconv needs to be (lazily) reinitialized. To signal this,
we set flag variables in the locale structure. There are two problems:
- The flags are set before the locale is fully updated, so a concurrent
localeconv() call can observe partially initialized locale data.
- No barriers ensure that localeconv() observes a fully initialized
locale if a flag is set.
So, move the flag update appropriately, and use acq/rel barriers to
provide some synchronization. Note that this is inadequate in the face
of multiple concurrent calls to setlocale(3), but this is not expected
to work regardless.
Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case
demonstrating the race.
John Baldwin [Fri, 11 Jun 2021 21:56:28 +0000 (14:56 -0700)]
Remove 'make update'.
In the CVS days this used be a wrapper around either CVS or CVSup and
used to support updating src, doc, and ports checkouts. With the move
to subversion this only supported updating src and was itself a
wrapper around 'svn update'. With Git, users are probably better off
using appropriate Git commands directly to update without needing an
explicit make target as a wrapper.
Rick Macklem [Mon, 27 Sep 2021 01:37:25 +0000 (18:37 -0700)]
nfscl: Add a check for "has acquired a delegation" to nfscl_removedeleg()
Commit 5e5ca4c8fc53 added a flag to a NFSv4 mount point that is set when
the first delegation is acquired from the NFSv4 server.
For a common case where delegations are not being issued by the
NFSv4 server, the nfscl_removedeleg() code acquires the mutex lock for
open/lock state, finds the delegation list empty, then just unlocks the
mutex and returns. This patch adds a check of the flag to avoid the
need to acquire the mutex for this common case.
This change appears to be performance neutral for a small number
of opens, but should reduce lock contention for a large number of opens
for the common case where server is not issuing delegations.
This commit should not affect the high level semantics of delegation
handling.
Alexander Motin [Tue, 5 Oct 2021 19:01:16 +0000 (15:01 -0400)]
cam(4): Limit search for disks in SES enclosure by single bus
At least for SAS that we only support now disks are typically
connected to the same bus as the enclosure. Limiting the search
scope makes it much faster on systems with multiple buses and
thousands of disks.
Alexander Motin [Tue, 5 Oct 2021 18:54:03 +0000 (14:54 -0400)]
cam(4): Improve XPT_DEV_MATCH
Remove *_MATCH_NONE enums, making no sense and so never used. Make
*_MATCH_ANY enums 0 (no any match flags set), previously used by
*_MATCH_NONE. Bump CAM_VERSION to 0x1a reflecting those changes and
add compat shims.
When traversing through buses and devices do not descend if we can
already see that requested pattern does not match the bus or device.
It allows to save significant amount of time on system with thousands
of disks when doing limited searches.
Mike Karels [Sun, 5 Sep 2021 18:14:04 +0000 (13:14 -0500)]
Change lowest address on subnet (host 0) not to broadcast by default.
The address with a host part of all zeros was used as a broadcast long
ago, but the default has been all ones since 4.3BSD and RFC1122. Until
now, we would broadcast the host zero address as well as the configured
address. Change to not broadcasting that address by default, but add a
sysctl (net.inet.ip.broadcast_lowest) to re-enable it. Note that the
correct way to use the zero address for broadcast would be to configure
it as the broadcast address for the network.
See https:/datatracker.ietf.org/doc/draft-schoen-intarea-lowest-address/
and the discussion in https://reviews.freebsd.org/D19316. Note, Linux
now implements this.
Mark Johnston [Mon, 4 Oct 2021 16:28:22 +0000 (12:28 -0400)]
libctf: Improve check for duplicate SOU definitions in ctf_add_type()
When copying a struct or union from one CTF container to another,
ctf_add_type() checks whether it matches an existing type in the
destination container. It does so by looking for a type with the same
name and kind as the new type, and if one exists, it iterates over all
members of the source type and checks whether a member with matching
name and offset exists in the matched destination type. This can
produce false positives, for example because member types are not
compared, but this is not expected to arise in practice. If the match
fails, ctf_add_type() returns an error.
The procedure used for member comparison breaks down in the face of
anonymous struct and union members. ctf_member_iter() visits each
member in the source definition and looks up the corresponding member in
the desination definition by name using ctf_member_info(), but this
function will descend into anonymous members and thus fail to match.
Fix the problem by introducing a custom comparison routine which does
not assume member names are unique. This should also be faster for
types with many members; in the previous scheme, membcmp() would perform
a linear scan of the desination type's members to perform a lookup by
name. The new routine steps through the members of both types in a
single loop.
Mark Johnston [Mon, 4 Oct 2021 21:48:44 +0000 (17:48 -0400)]
geom_label: Add more validation for NTFS volume tasting
- Ensure that the computed MFT record size isn't negative or larger than
maxphys before trying to read $Volume.
- Guard against truncated records in volume metadata.
- Ensure that the record length is large enough to contain the volume
name.
- Verify that the (UTF-16-encoded) volume name's length is a multiple of
two.
PR: 258833, 258914
Sponsored by: The FreeBSD Foundation
Mark Johnston [Fri, 17 Sep 2021 16:34:21 +0000 (12:34 -0400)]
vfs: Permit unix sockets to be opened with O_PATH
As with FIFOs, a path descriptor for a unix socket cannot be used with
kevent().
In principle connectat(2) and bindat(2) could be modified to support an
AT_EMPTY_PATH-like mode which operates on the socket referenced by an
O_PATH fd referencing a unix socket. That would eliminate the path
length limit imposed by sockaddr_un.
Update O_PATH tests.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Mark Peek [Sat, 9 Oct 2021 21:21:16 +0000 (14:21 -0700)]
vmci: fix panic due to freeing unallocated resources
Summary:
An error mapping PCI resources results in a panic due to unallocated
resources being freed up. This change puts the appropriate checks in
place to prevent the panic.
PR: 252445
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
Tested by: marcus
MFC after: 1 week
Sponsored by: VMware
Test Plan:
Along with user testing, also simulated error by inserting a ENXIO
return in vmci_map_bars().
Reviewed by: marcus
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D32016
Mitchell Horne [Thu, 7 Oct 2021 21:12:30 +0000 (18:12 -0300)]
riscv: fix VM_MAXUSER_ADDRESS checks in asm routines
There are two issues with the checks against VM_MAXUSER_ADDRESS. First,
the comparison should consider the values as unsigned, otherwise
addresses with the high bit set will fail to branch. Second, the value
of VM_MAXUSER_ADDRESS is, by convention, one larger than the maximum
mappable user address and invalid itself. Thus, use the bgeu instruction
for these comparisons.
Mitchell Horne [Thu, 7 Oct 2021 21:05:38 +0000 (18:05 -0300)]
riscv: handle page faults in the unmappable region
When handling a kernel page fault, check explicitly that stval resides
in either the user or kernel address spaces, and make the page fault
fatal if not. Otherwise, a properly crafted address may appear to
pmap_fault() as a valid and present page in the kernel map, causing the
page fault to be retried continuously. This is mainly due to the fact
that the upper bits of virtual addresses are not validated by most of
the pmap code.
Faults of this nature should only occur due to some kind of bug in the
kernel, but it is best to handle them gracefully when they do.
Handle user page faults in the same way, sending a SIGSEGV immediately
when a malformed address is encountered.
Add an assertion to pmap_l1(), which should help catch other bugs of
this kind that make it this far.
Looking for "tsc-tsc" in the pmu tables will fail every time. Instead,
make this an alias for the static TSC event defined in pmc_events.h.
This fixes 'pmcstat -s cycles' on Intel and AMD.
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32197
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Felix Johnson [Wed, 6 Oct 2021 20:47:02 +0000 (22:47 +0200)]
login.conf.5: Mark passwordtime as implemented
login.conf.5 listed passwordtime in RESERVED CAPABILITIES, which is a
section for capabilities not implemented in the base system. However,
passwordtime has been implemented in the base for several years now.