Doug Moore [Tue, 26 Apr 2022 07:56:23 +0000 (02:56 -0500)]
vm_phys: avoid waste in multipage allocation
In vm_phys_alloc_contig, for an allocation bigger than the size of any
buddy queue free block, avoid examining any maximum-size free block
more than twice, by only starting to consider a sequence of adjacent
max-blocks starting at a max-block that does not follow another
max-block. If that first max-block follows adjacent blocks of smaller
size, and if together they provide enough memory to reduce by one the
number of max-blocks required for this allocation, use them as part of
this allocation.
Warner Losh [Mon, 25 Apr 2022 18:54:37 +0000 (12:54 -0600)]
ada: Eliminate dead code
We never use the cgd that we get from the XPT_GDEV_TYPE call. Prior to 9a6844d55fe33 we used it to determine if READ AHEAD or WRITE CACHING was
supported. However, all that information was moved into adasetflags so
we no longer need to this since it's cached in the softc and updated
with the IDENTIFY data changes automatically.
When running binaries compiled with libasan in PIE and with ALSR on,
the previous code always print the WARNING message, the new code only
printed it if the run is made with the right verbosity
Before this change, many unit test will always fail because of the extra
message which is printed
Mark Johnston [Mon, 25 Apr 2022 13:13:03 +0000 (09:13 -0400)]
linuxkpi: Mitigate a seqlock livelock
Disable preemption in seqlock write sections when using the _irqsave
variant. This ensures that a writer can't be preempted and subsequently
starved by a reader running in a callout handler on the same CPU.
This fixes occasional display hangs seen when using the i915 driver.
Tested by: emaste, wulf
Reviewed by: wulf, hselasky
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35021
Our kern_sigtimedwait() calculates absolute sleep timo value as 'uptime+timeout'.
So, when the user specifies a big timeout value (LONG_MAX), the calculated
timo can be less the the current uptime value.
In that case kern_sigtimedwait() returns EAGAIN instead of EINTR, if
unblocked signal was caught.
While here switch to a high-precision sleep method.
Created a couple of helpers to send signals to the specific thread or to
the whole process. Use helpers in the corresponding syscalls.
This fixes the confusion where a signal destined for a whole process
was sent to a specific thread and vice versa.
There is an exclusion for the linux_kill() syscall that takes a pid
argument and should send a signal to the whole process, but I know
at least one example where kill() takes tid.
Kevin Lo [Mon, 25 Apr 2022 01:56:20 +0000 (09:56 +0800)]
Restore original MDC speed control register value after MAC reset, if it
wasn't default
Since vte_reset changes register value to MDCSC_DEFAULT value, which may not
be the original value, thus causing some phy registers read failures.
Restoring VTE_MDCSC value to original after reset solves the link state
flapping issue.
Thanks to jhb ("the code looks ok") for his review.
Reviewed by: jhb
Obtained from: NetBSD via Andrius V
Differential Revision: https://reviews.freebsd.org/D34956
Warner Losh [Sun, 24 Apr 2022 19:54:29 +0000 (13:54 -0600)]
g_vfs_done: Only report ENXIO once
The contract with the lower layers is that once ENXIO is reported, all
further I/O to the device is not possible. This is reported when the
device departs for good or changes in some material manner out from
underneath the system. Since the lower layers terminate all pending I/O
when this is detected with ENXIO, reporting more than one provides no
extra value. ENXIO suppression done with atomics due to race described
in e8827f4094cb. It's on the error path and a rare event, so this won't affect
performance.
Warner Losh [Sun, 24 Apr 2022 19:54:20 +0000 (13:54 -0600)]
g_vfs_done: Report when we switch on ENXIO conversion
On the 0 -> 1 transition of sc_enxio_active, report that we're doing
this. This is a rare, but interesting, event. Convert to using atomics
to set this field to prevent a rare race:
In CAM, when we invalidate a device, one thread (T1) will start the
process in error processing called from *dadone
(cam_periph_error). This routine will queue work to xpt_async_td
(T2) and indicate to *dadone to call biodone(ENXIO) for the bio. T2
wakes up and basically waits to acquire the periph lock. T2 will do
so when T1 drops the periph lock just before T1's call to
biodone. T2 acquires the lock and calls biodone(ENXIO) on all
pending bios. These two threads will race and we could lose the
printf or get two in rare cases. Since we only touch sc_enxio_active
in an error path that's infrequent, the extra atomic traffic will be
rare but will ensure robustness.
Warner Losh [Sat, 23 Apr 2022 23:08:39 +0000 (17:08 -0600)]
nda: Fix comment
Fix a comment that was left over from the orignial
implementation. Explain how pending transactions in hardware are
completed/aborted in the SIM prior to ndacleanup being called.
Ed Maste [Sat, 23 Apr 2022 19:52:03 +0000 (15:52 -0400)]
ssh: use upstream SSH_OPENSSL_VERSION macro
With the upgrade to OpenSSH 6.7p1 in commit a0ee8cc636cd we replaced
WITH_OPENSSL ifdefs with an OPENSSL_VERSION macro, later changing it
to OPENSSL_VERSION_STRING.
A few years later OpenSSH made an equivalent change (with a different
macro name), in commit 4d94b031ff88. Switch to the macro name they
chose.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
John Baldwin [Fri, 22 Apr 2022 22:52:50 +0000 (15:52 -0700)]
KTLS: Add a new recrypt operation to the software backend.
When using NIC TLS RX, packets that are dropped and retransmitted are
not decrypted by the NIC but are passed along as-is. As a result, a
received TLS record might contain a mix of encrypted and decrypted
data. If this occurs, the already-decrypted data needs to be
re-encrypted so that the resulting record can then be decrypted
normally.
Add support for this for sessions using AES-GCM with TLS 1.2 or TLS
1.3. For the recrypt operation, allocate a temporary buffer and
encrypt the the payload portion of the TLS record with AES-CTR with an
initial IV constructed from the AES-GCM nonce. Then fixup the
original mbuf chain by copying the results from the temporary buffer
back into the original mbufs for any mbufs containing decrypted data.
Once it has been recrypted, the mbuf chain can then be decrypted via
the normal software decryption path.
John Baldwin [Fri, 22 Apr 2022 22:52:27 +0000 (15:52 -0700)]
KTLS: Construct IV directly in crp.crp_iv for TLS 1.3 AEAD encryption.
Previously this used a temporary nonce[] buffer. The decrypt hook for
TLS 1.3 as well as the hooks for TLS 1.2 already constructed the IV
directly in crp.crp_iv.
John Baldwin [Fri, 22 Apr 2022 22:52:12 +0000 (15:52 -0700)]
KTLS: Move OCF function pointers out of ktls_session.
Instead, create a switch structure private to ktls_ocf.c and store a
pointer to the switch in the ocf_session. This will permit adding an
additional function pointer needed for NIC TLS RX without further
bloating ktls_session.
Similar to ipfw rule timestamps, these timestamps internally are
uint32_t snaps of the system time in seconds. The timestamp is CPU local
and updated each time a rule or a state associated with a rule or state
is matched.
Since these devices serve no purpose (no driver attaches) I have
enabled the reporting of suich devices only for verbose boots (a
diversion from the patch provided in the PR).
A verbose boot will now display such devices as:
pci4: <non-essential instrumentation> at device 0.0 (no driver attached)
Adrian Chadd [Tue, 12 Apr 2022 20:20:28 +0000 (13:20 -0700)]
net80211: Fix traffic hang on STA/AP VAPs on a multi-VAP interface
This took an embarrasingly long time to find.
The state changes for a radio with a STA /and/ AP VAP gets a bit messy.
The AP maps are marked as waiting, waiting for the STA AP to find a
channel to use before the AP VAPs become active.
However, the code path that clears the OACTIVE flag on a VAP only runs
during a successful run of ieee80211_newstate_cb().
So here is how it goes:
* the STA VAP goes down and needs to scan;
* the AP vap goes RUN->INIT; but it doesn't YET call ieee80211_newstate_cb();
* meanwhile - a send on the AP VAP causes the VAP to set the OACTIVE flag here;
* then the STA VAP finishes scan and goes to RUN;
* which will call wakeupwaiting() as part of the STA VAP transition to RUN;
* .. then the AP VAP goes INIT->RUN directly via a call to hostap_newstate
in wakeupwaiting rather than it being through the deferred path;
* /then/ the ieee80211_newstate_cb() is called, but it sees the state go
RUN->RUN;
* .. which results in the OACTIVE flag never being cleared.
This clears the OACTIVE flag when a VAP transitions RUN->RUN; the
driver layer or net80211 layer can set it if required in a subsequent
transmit.
John Baldwin [Thu, 21 Apr 2022 21:01:02 +0000 (14:01 -0700)]
qlxge: Inline the one use of a variable only used in a debug trace.
The other QL_DPRINT*() invocations in qls_init_hw_if() all used the
expanded form instead of the local variable. The module build always
defines QL_DBG in CFLAGS so doesn't trip over this, but adding qlxge
to a kernel config builds without QL_DBG.
John Baldwin [Thu, 21 Apr 2022 20:52:48 +0000 (13:52 -0700)]
psm: Swap the unit member in the softc for a device_t.
This entails various changes to make this driver more "modern"
(new-bus vs pre-new-bus) using device_log() and device_printf() rather
than psm%d. It also fixes the device_busy/unbusy calls to use sc->dev
directly rather than looking the device_t up via the devclass and
unit.
stand: zfs: handle holes at the tail end correctly
This mirrors dmu_read_impl(), zeroing out the tail end of the buffer and
clipping the read to what's contained by the block that exists.
This fixes an issue that arose during the 13.1 release process; in
13.1-RC1 and later, setting up GELI+ZFS will result in a failure to
boot. The culprit is this, which causes us to fail to load geom_eli.ko
as there's a residual portion after the single datablk that should be
zeroed out.
John Baldwin [Thu, 21 Apr 2022 17:41:09 +0000 (10:41 -0700)]
busdma_bounce: Make the map waiting list per-bounce-zone.
When pages are freed to a bounce zone, only maps waiting for pages for
that zone can make forward progress. If a map for a different bounce
zone is at the head of the global list, then requests that could
otherwise make forward progress will be stalled waiting on the other
bounce zone. If bounce zones shared bounce pages then a global list
would still make sense to prevent "later" requests from starving an
earlier request but that is not a concern with per-zone bounce page
pools.
John Baldwin [Thu, 21 Apr 2022 17:29:14 +0000 (10:29 -0700)]
FB_INSTALL_CDEV: Remove this option and related code.
This option was never enabled in GENERIC and does not appear to work
(the cdevsw is stored in a global array but never passed to make_dev
to be associated with a character device).
Mark Johnston [Thu, 21 Apr 2022 17:22:09 +0000 (13:22 -0400)]
mld6: Ensure that mld_domifattach() always succeeds
mld_domifattach() does a memory allocation under the global MLD mutex
and so can fail, but no error handling prevents a null pointer
dereference in this case. The mutex is only needed when updating the
global softc list; the allocation and static initialization of the softc
does not require this mutex. So, reduce the scope of the mutex and use
M_WAITOK for the allocation.
PR: 261457
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34943
pf: counter argument to pfr_pool_get() may never be NULL
Coverity points out that if counter was NULL when passed to
pfr_pool_get() we could potentially end up dereferencing it.
Happily all users of the function pass a non-NULL pointer. Enforce this
by assertion and remove the pointless NULL check.
Move the use of 'sc' to after the NULL check.
It's very unlikely that we'd actually hit this, but Coverity is correct
that it's not a good idea to dereference the pointer and only then NULL
check it.
Mark Johnston [Thu, 21 Apr 2022 14:49:22 +0000 (10:49 -0400)]
ctfdump: Remove definitions of warn() and vwarn()
The presence of the latter causes a link error when building a
statically linked ctfdump(1) because libc defines the same symbol.
libc's warn() is defined as a weak symbol and so does not cause the same
problem, but let's just use libc's version.
Reported by: stephane rochoy <stephane.rochoy@stormshield.eu>
MFC after: 1 week
Sponsored by: The FreeBSD Foundation