Chuck Tuffli [Mon, 21 Feb 2022 18:34:14 +0000 (10:34 -0800)]
nvme: Add OAES bit-field definitions
Create definitions for the Optional Asynchronous Events Supported (OAES)
values. Also adds a helper macro for the common use case of "mask and
shift". E.g.
value = NVME_CTRLR_DATA_OAES_NS_ATTR_MASK << NVME_CTRLR_DATA_OAES_NS_ATTR_SHIFT;
becomes
value = NVMEB(NVME_CTRLR_DATA_OAES_NS_ATTR);
1. A delayed_work struct in the WORK_ST_TIMER state.
2. Thread A calls mod_delayed_work()
3. Thread B (a callout thread) simultaneously calls
linux_delayed_work_timer_fn()
The following sequence of events is possible:
A: Call linux_cancel_delayed_work()
A: Change state from TIMER TO CANCEL
B: Change state from CANCEL to TASK
B: taskqueue_enqueue() the task
A: taskqueue_cancel() the task
A: Call linux_queue_delayed_work_on(). This is a no-op because the
state is WORK_ST_TASK.
As a result, the delayed_work struct will never be invoked. This is
causing address resolution in ib_addr.c to stop permanently, as it
never tries to reschedule a task that it thinks is already scheduled.
Fix this by introducing locking into the cancel path (which
corresponds with the lock held while the callout runs). This will
prevent the callout from changing the state of the task until the
cancel is complete, preventing the race.
ip6_setpktopts() can look up ifnets via ifnet_by_index(), which
is only safe in the net epoch. Ensure that callers are in the net
epoch before calling this function.
If the UCL ctld parser encountered a port that used the CTL
ioctl device, it fell into a special case that had an erroneous
early return. This caused all configuration in the target
following the port attribute to be skipped. Fix this by replacing
the return with a continue so that the rest of the config is
parsed correctly.
Ed Maste [Mon, 21 Feb 2022 04:09:36 +0000 (23:09 -0500)]
vt: fix double-click word selection for first/last word on line
Previously when double-clicking on the first word on a line we would
select from the cursor position to the end of the word, not from the
beginning of the line. Similarly, when double-clicking on the last word
on a line we would select from the beginning of the word to the cursor
position rather than the end of the line.
This is because we searched backward or forward for a space character to
mark the beginning or end of a word. Now, use the beginning or end of
the line if we do not find a space.
- Do not set Os to FreeBSD explicitly. We don't do it in other manual
pages.
- Remove macros from the -width specifier.
- Use Xr instead of Cm to refer to the freebsd-update command.
- Address some mandoc lint warnings and use \(em instead of --.
- Wordsmith some paragraphs.
- Add a missing El macro.
Michal Krawczyk [Mon, 3 Jan 2022 13:51:59 +0000 (14:51 +0100)]
ena: update ENA version to v2.5.0
Some of the changes in this release:
- IPv6 L4 checksum offload fixes.
- Optimization of the Tx req_id validation.
- Timer service adjustments.
- NUMA awareness for the kernel RSS mode.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Dawid Gorecki [Mon, 3 Jan 2022 13:50:29 +0000 (14:50 +0100)]
ena: do not call reset if device is unresponsive
If the device becomes unresponsive, the driver will not be able to
finish the reset process correctly. Timeout during version validation
indicates that the device is currently not responding. In that case
do not perform the reset and instead reschedule timer service. Because
of that the driver will continue trying to reset the device until it
succeeds or is detached.
Dawid Gorecki [Mon, 3 Jan 2022 13:50:13 +0000 (14:50 +0100)]
ena: start timer service on attach
The timer service was started when the interface was brought up and it
was stopped when it was brought down. Since ena_up requires the device
to be responsive, triggering the reset would become impossible if the
device became unresponsive with the interface down.
Since most of the functions in timer service already perform the check
to see if the device is running, this only requires starting the callout
in attach and stopping it when bringing the interface up or down to
avoid race between different admin queue calls.
Since callout functions for timer service are always called with the
same arguments, replace callout_{init,reset,drain} calls with
ENA_TIMER_{INIT,RESET,DRAIN} macros.
Artur Rojek [Mon, 3 Jan 2022 13:50:06 +0000 (14:50 +0100)]
ena: rework tx req_id validation logic
Since `ena_com_tx_comp_req_id_get` already checks for `req_id` validity,
the logic was exiting early, never giving `validate_tx_req_id` a chance
to trigger device reset.
Rewrite the logic so that device reset is called based on return value
of `ena_com_tx_comp_req_id_get` instead.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Dawid Gorecki [Mon, 3 Jan 2022 13:49:58 +0000 (14:49 +0100)]
ena: properly handle IPv6 L4 checksum offload
ena_tx_csum function did not check if IPv6 checksum offload was
requested it only checked checksum offloading flags for IPv4 packets.
Because of that, when encountering CSUM_IP6_* flags, the function simply
returned without actually setting checksum offloading in ena_ctx.
Check CUSM_IP6_* flags to enable IPv6 checksum offload.
Additionally, only IPv4 header was being parsed regardless of EtherType
field, because of that, value of L4 protocol read when actually trying
to send IPv6 packets was wrong. Use ip6_lasthdr function to get length
of all IPv6 headers and payload protocol.
Set the DF flag to 1 in order to allow the device to offload the IPv6
checksum calculation and achieve optimal performance.
Add CSUM6_OFFLOAD and CSUM_OFFLOAD definitions into ena_datapath.h.
Adjust the driver to the upgraded ena-com part twofold:
First update is related to the driver's NUMA awareness.
Allocate I/O queue memory in NUMA domain local to the CPU bound to the
given queue, improving data access time. Since this can result in
performance hit for unaware users, this is done only when RSS
option is enabled, for other cases the driver relies on kernel to
allocate memory by itself.
Information about first CPU bound is saved in adapter structure, so
the binding persists after bringing the interface down and up again.
If there are more buckets than interface queues, the driver will try to
bind different interfaces to different CPUs using round-robin algorithm
(but it will not bind queues to CPUs which do not have any RSS buckets
associated with them). This is done to better utilize hardware
resources by spreading the load.
Add (read-only) per-queue sysctls in order to provide the following
information:
- queueN.domain: NUMA domain associated with the queue
- queueN.cpu: CPU affinity of the queue
The second change is for the CSUM_OFFLOAD constant, as ENA platform
file has removed its definition. To align to that change, it has been
added to the ena_datapath.h file.
These zones are cache zones used to allocate TLS offload contexts from
firmware. Releasing items from the cache is a sleepable operation due
to the need to await a response from the firmware command freeing the
tag, so items cannot be reclaimed from the zone in non-sleepable
contexts. Since the cache size is limited by firmware limits, avoid
this by setting UMA_ZONE_UNMANAGED to avoid reclamation by uma_timeout()
and the low memory handler.
Reviewed by: hselasky, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34142
Allow a zone to opt out of cache size management. In particular,
uma_reclaim() and uma_reclaim_domain() will not reclaim any memory from
the zone, nor will uma_timeout() purge cached items if the zone is idle.
This effectively means that the zone consumer has control over when
items are reclaimed from the cache. In particular, uma_zone_reclaim()
will still reclaim cached items from an unmanaged zone.
Reviewed by: hselasky, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34142
mlx5en: Use a UMA cache zone for managing TLS send tags
Instead of allocating directly from a normal zone. This way
import and release are guaranteed to process all allocated and then
deallocated items. Also, the release occurs in a sleepable context when
caller of uma_zfree() or uma_zdestroy() can sleep itself.
Use the send tag refcounting mechanism to refcount the RX- and TX- TLS
send tags. Then it is no longer needed to wait for refcounts to reach
zero when destroying RX- and TX- TLS send tags as a result of pending
data or WQE commands.
This also ensures that when TX-TLS and rate limiting is used at the same
time, the underlying SQ is not prematurely destroyed.
Robert Wing [Mon, 20 Dec 2021 20:30:24 +0000 (11:30 -0900)]
tcp_twrespond: send signed segment when connection is TCP-MD5
When a connection is established to use TCP-MD5, tcp_twrespond() doesn't
respond with a signed segment. This results in the host performing the
active close to remain in a TIME_WAIT state and the other host in the
LAST_ACK state. Fix this by sending a signed segment when the connection
is established to use TCP-MD5.
Robert Wing [Tue, 25 Jan 2022 16:44:13 +0000 (07:44 -0900)]
bhyve/block_if: allow DIOCGMEDIASIZE ioctl
This is needed to get mediasize of the device after a resize event.
I missed this earlier as I was building WITH_BHYVE_SNAPSHOT, which
disables capsicum.
Reviewed by: khng, markj Fixes: ae9ea22e14bf ("bhyve: get mediasize for character devices when ...")
Differential Revision: https://reviews.freebsd.org/D34013
Kristof Provost [Thu, 9 Dec 2021 13:24:13 +0000 (14:24 +0100)]
if_epair: implement fanout
Allow multiple cores to be used to process if_epair traffic. We do this
(if RSS is enabled) based on the RSS hash of the incoming packet. This
allows us to distribute the load over multiple cores, rather than
sending everything to the same one.
We also switch from swi_sched() to taskqueues, which also contributes to
better throughput.
Martin Matuska [Wed, 9 Feb 2022 23:35:42 +0000 (00:35 +0100)]
libarchive: import changes from upstream
Libarchive 3.6.0
New features:
PR #1614: tar: new option "--no-read-sparse"
PR #1503: RAR reader: filter support
PR #1585: RAR5 reader: self-extracting archive support
New features (not used in FreeBSD base):
PR #1567: tar: threads support for zstd (#1567)
PR #1518: ZIP reader: zstd decompression support
Security Fixes:
PR #1491, #1492, #1493, CVE-2021-36976:
fix invalid memory access and out of bounds read in RAR5 reader
PR #1566, #1618, CVE-2021-31566:
extended fix for following symlinks when processing the fixup list
Other notable bugfixes and improvements:
PR #1620: tar: respect "--ignore-zeros" in c, r and u modes
PR #1625: reduced size of application binaries
Michael Tuexen [Thu, 30 Dec 2021 14:16:05 +0000 (15:16 +0100)]
sctp: improve sctp_pathmtu_adjustment()
Allow the resending of DATA chunks to be controlled by the caller,
which allows retiring sctp_mtu_size_reset() in a separate commit.
Also improve the computaion of the overhead and use 32-bit integers
consistently.
Thanks to Timo Voelker for pointing me to the code.
Michael Tuexen [Sat, 18 Dec 2021 22:43:00 +0000 (23:43 +0100)]
if_oce: fix epoch handling
Thanks to gallatin@ for suggesting the patch.
PR: 260330
Reported by: Vincent Milum Jr.
Reviewed by: gallatin, glebius
Tested by: Vincent Milum Jr.
Differential Revision: https://reviews.freebsd.org/D33395
Michael Tuexen [Tue, 28 Sep 2021 03:25:58 +0000 (05:25 +0200)]
sctp: fix usage of stream scheduler functions
sctp_ss_scheduled() should only be called for streams that are
scheduled. So call sctp_ss_remove_from_stream() before it.
This bug was uncovered by the earlier cleanup.
Michael Tuexen [Tue, 21 Sep 2021 15:13:57 +0000 (17:13 +0200)]
sctp: Simplify stream scheduler usage
Callers are getting the stcb send lock, so just KASSERT that.
No need to signal this when calling stream scheduler functions.
No functional change intended.
Mark Johnston [Sat, 11 Sep 2021 14:15:21 +0000 (10:15 -0400)]
sctp: Tighten up locking around sctp_aloc_assoc()
All callers of sctp_aloc_assoc() mark the PCB as connected after a
successful call (for one-to-one-style sockets). In all cases this is
done without the PCB lock, so the PCB's flags can be corrupted. We also
do not atomically check whether a one-to-one-style socket is a listening
socket, which violates various assumptions in solisten_proto().
We need to hold the PCB lock across all of sctp_aloc_assoc() to fix
this. In order to do that without introducing lock order reversals, we
have to hold the global info lock as well.
So:
- Convert sctp_aloc_assoc() so that the inp and info locks are
consistently held. It returns with the association lock held, as
before.
- Fix an apparent bug where we failed to remove an association from a
global hash if sctp_add_remote_addr() fails.
- sctp_select_a_tag() is called when initializing an association, and it
acquires the global info lock. To avoid lock recursion, push locking
into its callers.
- Introduce sctp_aloc_assoc_connected(), which atomically checks for a
listening socket and sets SCTP_PCB_FLAGS_CONNECTED.
There is still one edge case in sctp_process_cookie_new() where we do
not update PCB/socket state correctly.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31908
Michael Tuexen [Mon, 9 Aug 2021 07:23:42 +0000 (09:23 +0200)]
ipv6: Fix getsockopt() for some IPPROTO_IPV6 level socket options
Fix getsockopt() for the IPPROTO_IPV6 level socket options with the
following names: IPV6_HOPOPTS, IPV6_RTHDR, IPV6_RTHDRDSTOPTS,
IPV6_DSTOPTS, and IPV6_NEXTHOP.
Reviewed by: markj
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D31458