Ed Maste [Mon, 20 Dec 2021 21:30:09 +0000 (16:30 -0500)]
csu: define STRIP_FBSDID
__FBSDID() places the provided string in the output object's .comment
section. However, with the transition to Git $FreeBSD$ is no longer
expanded and so we emitted a literal $FreeBSD$.
$FreeBSD$ will be addressed in a holistic manner in the future, but at
least avoid embedding it into everything linked on FreeBSD (via csu).
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33594
Currently, unicast/multicast loopback raw ethernet (non-RDMA) packets
are sent back to the vport. A unicast loopback packet is the packet
with destination MAC address the same as the source MAC address. For
multicast, the destination MAC address is in the vport's multicast
filter list.
Moreover, the local loopback is not needed if there is one or none
user space context.
After this patch, the raw ethernet unicast and multicast local
loopback are disabled by default. When there is more than one user
space context, the local loopback is enabled.
Note that when local loopback is disabled, raw ethernet packets are
not looped back to the vport and are forwarded to the next routing
level (eswitch, or multihost switch, or out to the wire depending on
the configuration).
mlx5: Implement flow steering helper functions for TCP sockets.
This change adds convenience functions to setup a flow steering rule based on
a TCP socket. The helper function gets all the address information from the
socket and returns a steering rule, to be used with HW TLS RX offload.
mlx5en: Create and destroy all flow tables and rules when the network interface attaches and detaches.
Previously flow steering tables and rules were only created and destroyed
at link up and down events, respectivly. Due to new requirements for adding
TLS RX flow tables and rules, the main flow steering table must always be
available as there are permanent redirections from the TLS RX flow table
to the vlan flow table.
mlx5en: Force all packets through the indirection table.
All packets must go through the indirection table, RQT,
because it is not possible to modify the RQN of the TIR
for direct dispatchment after it is created, typically
when the link goes up and down.
mlx5en: Implement support for internal queues, IQ.
Internal send queues are regular sendqueues which are reserved for WQE commands
towards the hardware and firmware. These queues typically carry resync
information for ongoing TLS RX connections and when changing schedule queues
for rate limited connections.
The internal queue, IQ, code is more or less a stripped down copy
of the existing SQ managing code with exception of:
1) An optional single segment memory buffer which can be read or
written as a whole by the hardware, may be provided.
2) An optional completion callback for all transmit operations, may
be provided.
3) Does not support mbufs.
mlx5en: Make the receive packet indirection table, RQT, static instead of dynamic.
Allocate the RQT once, pointing all initial entries to the drop RQN.
When opening the channels simplify modify the RQT, directing all traffic
to the new RQNs. Similarly when closing the channels point all RQT entries
back to the so-called drop RQN.
mlx5en: Implement dummy receive queue, RQ, for dropping packets.
What is a drop RQ and why is it needed?
The RSS indirection table, also called the RQT, selects the
destination RQ based on the receive queue number, RQN. The RQT is
frequently referred to by flow steering rules to distribute traffic
among multiple RQs. The problem is that the RQs cannot be destroyed
before the RQT referring them is destroyed too. Further, TLS RX
rules may still be referring to the RQT even if the link went
down. Because there is no magic RQN for dropping packets, we create
a dummy RQ, also called drop RQ, which sole purpose is to drop all
received packets. When the link goes down this RQN is filled in all
RQT entries, of the main RQT, so the real RQs which are about to be
destroyed can be released and the TLS RX rules can be sustained.
mlx5en: Patch to inhibit transmit doorbell writes during packet reception.
During packet reception the network stack frequently transmit data in
response to TCP window updates. To reduce the number of transmit doorbells
needed, inhibit all transmit doorbells designated for the same channel until
after the reception of packets for the given channel is completed.
While at it slightly refactor the mlx5e_tx_notify_hw() function:
1) The doorbell information is always stored into sq->doorbell.d64 .
No need to pass a separate pointer to this variable.
2) Move checks for skipping doorbell writes inside this function.
Ryan Stone [Tue, 8 Feb 2022 15:08:50 +0000 (16:08 +0100)]
LRO: Don't merge ACK and non-ACK packets together
LRO was willing to merge ACK and non-ACK packets together. This
can cause incorrect th_ack values to be reported up the stack.
While non-ACKs are quite unlikely to appear in practice, LRO's
behaviour is against the spec. Make LRO unwilling to merge
packets with different TH_ACK flag values in order to fix the
issue.
Found by: Sysunit test
Differential Revision: https://reviews.freebsd.org/D33775
Reviewed by: rrs
Ryan Stone [Tue, 8 Feb 2022 15:08:50 +0000 (16:08 +0100)]
LRO: Fix lost packets when merging 1 payload with an ACK
To check if it needed to regenerate a packet's header before
sending it up the stack, LRO was checking if more than one payload
had been merged into the packet. This failed in the case where
a single payload was merged with one or more pure ACKs. This
results in lost ACKs.
Fix this by precisely tracking whether header regeneration is
required instead of using an incorrect heuristic.
Found with: Sysunit test
Differential Revision: https://reviews.freebsd.org/D33774
Reviewed by: rrs
Use layer five checksum flags in the mbuf packet header to pass on crypto state.
The mbuf protocol flags get cleared between layers, and also it was discovered
that M_DECRYPTED conflicts with M_HASFCS when receiving ethernet patckets.
Add the proper CSUM_TLS_MASK and CSUM_TLS_DECRYPTED defines, and start using
these instead of M_DECRYPTED inside the TCP LRO code.
This change is needed by coming TLS RX hardware offload support patches.
Update the TCP LRO code to handle both encrypted and un-encrypted traffic.
Encrypted and un-encrypted traffic needs to be coalesced separately.
Split the 16-bit lro_type field in the address information into two
8-bit fields, and then use the last 8-bit field for flags, which among
other indicate if the received mbuf is encrypted or un-encrypted.
nd6: use CARP link level address in SLLAO for NS sent out
When sending an NS, check if we are using a IPv6 CARP address
and if we do, then put proper CARP link level address into
ND_OPT_SOURCE_LINKADDR option and also put PACKET_TAG_CARP tag
on the packet. The latter will enforce CARP link level address
at the data link layer too, which might be necessary for broken
implementations.
The code really follows what NA sending code has been doing since
introduction of carp(4). While here, bring to style(9) the whole
block of code.
Mark Johnston [Mon, 31 Jan 2022 21:14:00 +0000 (16:14 -0500)]
pf: Initialize pf_kpool mutexes earlier
There are some error paths in ioctl handlers that will call
pf_krule_free() before the rule's rpool.mtx field is initialized,
causing a panic with INVARIANTS enabled.
Fix the problem by introducing pf_krule_alloc() and initializing the
mutex there. This does mean that the rule->krule and pool->kpool
conversion functions need to stop zeroing the input structure, but I
don't see a nicer way to handle this except perhaps by guarding the
mtx_destroy() with a mtx_initialized() check.
Constify some related functions while here and add a regression test
based on a syzkaller reproducer.
Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
Reviewed by: kp
Sponsored by: The FreeBSD Foundation
Cy Schubert [Sat, 8 Jan 2022 04:12:50 +0000 (20:12 -0800)]
ipfilter: Restore ipfsync
ipfsync is a WIP sync daemon designed to be used in a failover scenario.
It was removed by 5ee61c7daa511927aae8652d6a3ea78866a50ef8. This commit
restores its three files. ipfsync is in my work queue.
Cy Schubert [Tue, 4 Jan 2022 03:04:04 +0000 (19:04 -0800)]
ipfilter userland: Fix branch mismerge
The work to ANSIfy and adjust returns to style(9) resulted in a mismerge
of a stash when ipfilter was moved from contrib to sbin. An older file
replaced WIP at the time, resulting in a regression.
The majority of this work was done in 2018 saved as git stashes within
a git-svn tree and migrated to the git tree. The regression occurred
when the various stashes were sequentially merged to create individual
commits, following the ipfilter move to netpfil and sbin.
Cy Schubert [Tue, 21 Dec 2021 17:22:10 +0000 (09:22 -0800)]
ipfilter: INLINE --> inline
Replace the INLINE macro with inline. Some ancient compilers supported
__inline__ instead of inline. The INLINE hack compensated for it.
Ancient compilers are history.
Cy Schubert [Mon, 20 Dec 2021 17:07:20 +0000 (09:07 -0800)]
ipflter: ANSIfy userland function declarations
Convert ipfilter userland function declarations from K&R to ANSI. This
syncs our function declarations with NetBSD hg commit 75edcd7552a0
(apply our changes). Though not copied from NetBSD, this change was
partially inspired by NetBSD's work and inspired by style(9).
Cy Schubert [Mon, 20 Dec 2021 16:43:49 +0000 (08:43 -0800)]
ipflter: ANSIfy kernel function declarations
Convert ipfilter kernel function declarations from K&R to ANSI. This
syncs our function declarations with NetBSD hg commit 75edcd7552a0
(apply our changes). Though not copied from NetBSD, this change was
partially inspired by NetBSD's work and inspired by style(9).
Cy Schubert [Wed, 15 Dec 2021 21:45:47 +0000 (13:45 -0800)]
ipfilter: Move userland bits to sbin
Through fixes and improvements our ipfilter sources have diverged
enough to warrant move from contrib into sbin/ipf. Now that I'm
planning on implementing MSS clamping as in iptables it makes more
sense to move ipfilter to sbin.
This is the second of three commits of the ipfilter move.
Suggested by glebius on two occaions.
Suggested by and discussed with: glebius
Reviewed by: glebius, kp (for #network)
Differential Revision: https://reviews.freebsd.org/D33510
Cy Schubert [Wed, 15 Dec 2021 16:28:18 +0000 (08:28 -0800)]
ipfilter: Move kernel bits to netpfil
Through fixes and improvements our ipfilter sources have diverged
enough to warrant move from contrib into sys/netpil. Now that I'm
planning on implementing MSS clamping as in iptables it makes more
sense to move ipfilter to netpfil.
This is the first of three commits the ipfilter move.
Suggested by glebius on two occaions.
Suggested by and discussed with: glebius
Reviewed by: glebius, kp (for #network)
Differential Revision: https://reviews.freebsd.org/D33510