Alexander Motin [Thu, 28 Apr 2022 01:39:50 +0000 (21:39 -0400)]
CAM: Keep periph_links when restoring CCB in camperiphdone().
While recovery command executed, some other commands from the periph
may complete, that may affect periph_links of this CCB. So restoring
original CCB we must keep current periph_links as more up to date.
I've found this triggering assertions with debug kernel and suspect
some memory corruptions otherwise when spun down disk receives two
or sometimes more concurrent requests.
Fix another race between fork(2) and PROC_REAP_KILL subtree
where we might not yet see a new child when signalling a process.
Ensure that this cannot happen by stopping all reapping subtree,
which ensures that the child is not inside a syscall, in particular
fork(2).
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35014
lacp: short timeout erroneously declares link-flapping
Panasas was seeing a higher-than-expected number of link-flap events.
After joint debugging with the switch vendor, we determined there were
problems on both sides; either of which might cause the occasional
event, but together caused lots of them.
On the switch side, an internal queuing issue was causing LACP PDUs --
which should be sent every second, in short-timeout mode -- to sometimes
be sent slightly later than they should have been. In some cases, two
successive PDUs were late, but we never saw three late PDUs in a row.
On the FreeBSD side, we saw a link-flap event every time there were two
late PDUs, while the spec says that it takes *three* seconds of downtime
to trigger that event. It turns out that if a PDU was received shortly
before the timer code was run, it would decrement less than a full
second after the PDU arrived. Then two delayed PDUs would cause two
additional decrements, causing it to reach zero less than three seconds
after the most-recent on-time PDU.
The solution is to note the time a PDU arrives, and only decrement if at
least a full second has elapsed since then.
John Baldwin [Wed, 27 Apr 2022 19:26:05 +0000 (12:26 -0700)]
firewire: Initialize firewire_devclass in fw_modevent.
The use of devclass_get_softc() combined with cdev unit numbers is
probably not ideal (probably should be initializing si_drv1 when each
cdev is created instead), but it looks like a bit of a PITA to do, so
just initialize the devclass explicitly instead.
John Baldwin [Wed, 27 Apr 2022 19:18:52 +0000 (12:18 -0700)]
setkey(8): Clarify language around AEAD ciphers.
AEAD ciphers for IPsec combine both encryption and authentication. As
such, ESP configurations using an AEAD cipher should not use a
seperate authentication algorithm via -A. However, this was not
apparent from the setkey manpage and 12.x and earlier did not perform
sufficient argument validation permitting users to pair an explicit -A
such as SHA256-HMAC with AES-GCM. (The result was a non-standard
combination of AES-CTR with the specified MAC, but with the wrong
initial block counter (and thus different keystream) compared to using
AES-CTR as the cipher.)
Attempt to clarify this in the manpage by explicitly calling out AEAD
ciphers (currently only AES-GCM) and noting that AEAD ciphers should
not use -A.
While here, explicitly note which authentication algorithms can be
used with esp vs esp-old. Also add subsection headings for the
different algorithm lists and tidy some language.
I did not convert the tables to column lists (Bl -column) though that
would probably be more correct than using literal blocks (Bd
-literal).
The 'failed to write TX skb to HCI' error message is twice in the code.
Print the function name and along with the message and also the reported
error so it can possibly be helpful.
The 'failed to get tx report from firmware' was purposefully changed
away from debugging in the upstream Linux driver in 584dce175f0461d5d9d63952a1e7955678c91086 . Revert that decision and
extend the logging by the actual queue length so we get an idea how
sever the problem is (see PR for a report).
PR: 248235
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
X-MFC: only to get the reminder for later
Ed Maste [Wed, 27 Apr 2022 13:15:09 +0000 (09:15 -0400)]
libcxxrt: Insert padding in __cxa_dependent_exception
Padding was added to __cxa_exception in 45ca8b19 and
__cxa_dependent_exception needs the same layout.
Add some static_asserts to detect this in the future.
Export _Unwind_Complete and _Unwind_VRS_Pop from arm's libgcc_s
Apparently some ports on arm require these symbols, and while they were
available in llvm's libunwind, they were never exported via the arm
specific Symbol.map. Put them in the same version block as gcc does
(GCC_3.5).
Reported by: Robert Clausecker <fuz_at_fuz.su>
MFC after: 3 days
Warner Losh [Tue, 26 Apr 2022 17:01:06 +0000 (11:01 -0600)]
adaasync: Harmonize with daasync
We should call cam_periph_async() always, like SCSI does. This routine
is supposed to be more of a catch-all.
cam_periph_async() only does actions for AC_LOST_DEVICE. It ignores all
other events (today), but this may not always be true. So this is a nop
change.
Drop in a 'break' so we don't fall through unnecessarily.
Doug Moore [Tue, 26 Apr 2022 07:56:23 +0000 (02:56 -0500)]
vm_phys: avoid waste in multipage allocation
In vm_phys_alloc_contig, for an allocation bigger than the size of any
buddy queue free block, avoid examining any maximum-size free block
more than twice, by only starting to consider a sequence of adjacent
max-blocks starting at a max-block that does not follow another
max-block. If that first max-block follows adjacent blocks of smaller
size, and if together they provide enough memory to reduce by one the
number of max-blocks required for this allocation, use them as part of
this allocation.
Warner Losh [Mon, 25 Apr 2022 18:54:37 +0000 (12:54 -0600)]
ada: Eliminate dead code
We never use the cgd that we get from the XPT_GDEV_TYPE call. Prior to 9a6844d55fe33 we used it to determine if READ AHEAD or WRITE CACHING was
supported. However, all that information was moved into adasetflags so
we no longer need to this since it's cached in the softc and updated
with the IDENTIFY data changes automatically.
When running binaries compiled with libasan in PIE and with ALSR on,
the previous code always print the WARNING message, the new code only
printed it if the run is made with the right verbosity
Before this change, many unit test will always fail because of the extra
message which is printed
Mark Johnston [Mon, 25 Apr 2022 13:13:03 +0000 (09:13 -0400)]
linuxkpi: Mitigate a seqlock livelock
Disable preemption in seqlock write sections when using the _irqsave
variant. This ensures that a writer can't be preempted and subsequently
starved by a reader running in a callout handler on the same CPU.
This fixes occasional display hangs seen when using the i915 driver.
Tested by: emaste, wulf
Reviewed by: wulf, hselasky
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35021
Our kern_sigtimedwait() calculates absolute sleep timo value as 'uptime+timeout'.
So, when the user specifies a big timeout value (LONG_MAX), the calculated
timo can be less the the current uptime value.
In that case kern_sigtimedwait() returns EAGAIN instead of EINTR, if
unblocked signal was caught.
While here switch to a high-precision sleep method.
Created a couple of helpers to send signals to the specific thread or to
the whole process. Use helpers in the corresponding syscalls.
This fixes the confusion where a signal destined for a whole process
was sent to a specific thread and vice versa.
There is an exclusion for the linux_kill() syscall that takes a pid
argument and should send a signal to the whole process, but I know
at least one example where kill() takes tid.
Kevin Lo [Mon, 25 Apr 2022 01:56:20 +0000 (09:56 +0800)]
Restore original MDC speed control register value after MAC reset, if it
wasn't default
Since vte_reset changes register value to MDCSC_DEFAULT value, which may not
be the original value, thus causing some phy registers read failures.
Restoring VTE_MDCSC value to original after reset solves the link state
flapping issue.
Thanks to jhb ("the code looks ok") for his review.
Reviewed by: jhb
Obtained from: NetBSD via Andrius V
Differential Revision: https://reviews.freebsd.org/D34956
Warner Losh [Sun, 24 Apr 2022 19:54:29 +0000 (13:54 -0600)]
g_vfs_done: Only report ENXIO once
The contract with the lower layers is that once ENXIO is reported, all
further I/O to the device is not possible. This is reported when the
device departs for good or changes in some material manner out from
underneath the system. Since the lower layers terminate all pending I/O
when this is detected with ENXIO, reporting more than one provides no
extra value. ENXIO suppression done with atomics due to race described
in e8827f4094cb. It's on the error path and a rare event, so this won't affect
performance.
Warner Losh [Sun, 24 Apr 2022 19:54:20 +0000 (13:54 -0600)]
g_vfs_done: Report when we switch on ENXIO conversion
On the 0 -> 1 transition of sc_enxio_active, report that we're doing
this. This is a rare, but interesting, event. Convert to using atomics
to set this field to prevent a rare race:
In CAM, when we invalidate a device, one thread (T1) will start the
process in error processing called from *dadone
(cam_periph_error). This routine will queue work to xpt_async_td
(T2) and indicate to *dadone to call biodone(ENXIO) for the bio. T2
wakes up and basically waits to acquire the periph lock. T2 will do
so when T1 drops the periph lock just before T1's call to
biodone. T2 acquires the lock and calls biodone(ENXIO) on all
pending bios. These two threads will race and we could lose the
printf or get two in rare cases. Since we only touch sc_enxio_active
in an error path that's infrequent, the extra atomic traffic will be
rare but will ensure robustness.
Warner Losh [Sat, 23 Apr 2022 23:08:39 +0000 (17:08 -0600)]
nda: Fix comment
Fix a comment that was left over from the orignial
implementation. Explain how pending transactions in hardware are
completed/aborted in the SIM prior to ndacleanup being called.
Ed Maste [Sat, 23 Apr 2022 19:52:03 +0000 (15:52 -0400)]
ssh: use upstream SSH_OPENSSL_VERSION macro
With the upgrade to OpenSSH 6.7p1 in commit a0ee8cc636cd we replaced
WITH_OPENSSL ifdefs with an OPENSSL_VERSION macro, later changing it
to OPENSSL_VERSION_STRING.
A few years later OpenSSH made an equivalent change (with a different
macro name), in commit 4d94b031ff88. Switch to the macro name they
chose.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
John Baldwin [Fri, 22 Apr 2022 22:52:50 +0000 (15:52 -0700)]
KTLS: Add a new recrypt operation to the software backend.
When using NIC TLS RX, packets that are dropped and retransmitted are
not decrypted by the NIC but are passed along as-is. As a result, a
received TLS record might contain a mix of encrypted and decrypted
data. If this occurs, the already-decrypted data needs to be
re-encrypted so that the resulting record can then be decrypted
normally.
Add support for this for sessions using AES-GCM with TLS 1.2 or TLS
1.3. For the recrypt operation, allocate a temporary buffer and
encrypt the the payload portion of the TLS record with AES-CTR with an
initial IV constructed from the AES-GCM nonce. Then fixup the
original mbuf chain by copying the results from the temporary buffer
back into the original mbufs for any mbufs containing decrypted data.
Once it has been recrypted, the mbuf chain can then be decrypted via
the normal software decryption path.
John Baldwin [Fri, 22 Apr 2022 22:52:27 +0000 (15:52 -0700)]
KTLS: Construct IV directly in crp.crp_iv for TLS 1.3 AEAD encryption.
Previously this used a temporary nonce[] buffer. The decrypt hook for
TLS 1.3 as well as the hooks for TLS 1.2 already constructed the IV
directly in crp.crp_iv.
John Baldwin [Fri, 22 Apr 2022 22:52:12 +0000 (15:52 -0700)]
KTLS: Move OCF function pointers out of ktls_session.
Instead, create a switch structure private to ktls_ocf.c and store a
pointer to the switch in the ocf_session. This will permit adding an
additional function pointer needed for NIC TLS RX without further
bloating ktls_session.
Similar to ipfw rule timestamps, these timestamps internally are
uint32_t snaps of the system time in seconds. The timestamp is CPU local
and updated each time a rule or a state associated with a rule or state
is matched.
Since these devices serve no purpose (no driver attaches) I have
enabled the reporting of suich devices only for verbose boots (a
diversion from the patch provided in the PR).
A verbose boot will now display such devices as:
pci4: <non-essential instrumentation> at device 0.0 (no driver attached)
Adrian Chadd [Tue, 12 Apr 2022 20:20:28 +0000 (13:20 -0700)]
net80211: Fix traffic hang on STA/AP VAPs on a multi-VAP interface
This took an embarrasingly long time to find.
The state changes for a radio with a STA /and/ AP VAP gets a bit messy.
The AP maps are marked as waiting, waiting for the STA AP to find a
channel to use before the AP VAPs become active.
However, the code path that clears the OACTIVE flag on a VAP only runs
during a successful run of ieee80211_newstate_cb().
So here is how it goes:
* the STA VAP goes down and needs to scan;
* the AP vap goes RUN->INIT; but it doesn't YET call ieee80211_newstate_cb();
* meanwhile - a send on the AP VAP causes the VAP to set the OACTIVE flag here;
* then the STA VAP finishes scan and goes to RUN;
* which will call wakeupwaiting() as part of the STA VAP transition to RUN;
* .. then the AP VAP goes INIT->RUN directly via a call to hostap_newstate
in wakeupwaiting rather than it being through the deferred path;
* /then/ the ieee80211_newstate_cb() is called, but it sees the state go
RUN->RUN;
* .. which results in the OACTIVE flag never being cleared.
This clears the OACTIVE flag when a VAP transitions RUN->RUN; the
driver layer or net80211 layer can set it if required in a subsequent
transmit.
John Baldwin [Thu, 21 Apr 2022 21:01:02 +0000 (14:01 -0700)]
qlxge: Inline the one use of a variable only used in a debug trace.
The other QL_DPRINT*() invocations in qls_init_hw_if() all used the
expanded form instead of the local variable. The module build always
defines QL_DBG in CFLAGS so doesn't trip over this, but adding qlxge
to a kernel config builds without QL_DBG.
John Baldwin [Thu, 21 Apr 2022 20:52:48 +0000 (13:52 -0700)]
psm: Swap the unit member in the softc for a device_t.
This entails various changes to make this driver more "modern"
(new-bus vs pre-new-bus) using device_log() and device_printf() rather
than psm%d. It also fixes the device_busy/unbusy calls to use sc->dev
directly rather than looking the device_t up via the devclass and
unit.
stand: zfs: handle holes at the tail end correctly
This mirrors dmu_read_impl(), zeroing out the tail end of the buffer and
clipping the read to what's contained by the block that exists.
This fixes an issue that arose during the 13.1 release process; in
13.1-RC1 and later, setting up GELI+ZFS will result in a failure to
boot. The culprit is this, which causes us to fail to load geom_eli.ko
as there's a residual portion after the single datablk that should be
zeroed out.