Andrew Gallatin [Wed, 26 May 2021 13:54:26 +0000 (09:54 -0400)]
cxgbe: fix enabling lro & rxtimestamps
A recent change caused iq flags, like LRO, to be set before
init_iq(). However, init_iq() clears those flags, so they
became effectively impossible to set. This change moves
the initializion of these flags to after the call to init_iq().
This fixes LRO.
Kristof Provost [Tue, 18 May 2021 07:24:50 +0000 (09:24 +0200)]
pf: Move nvlist conversion functions to pf_nv
Separate the conversion functions (between kernel structs and nvlists)
to pf_nv. This reduces the size of pf_ioctl.c, which is already quite
large and complex, a good bit. It also keeps all the fairly
straightforward conversion code together.
Eugene Grosbein [Wed, 26 May 2021 11:30:24 +0000 (18:30 +0700)]
rc.d/random: add support for zero harvest_mask
Replace the check for zero harvest_mask with new check for empty string.
This allows one to specify harvest_mask="0" that disables harversting
entropy from all but "pure" sources. Exact bit values for "pure" sources
differ for stable/12 and later branches, so it is handy to use zero.
The check for zero pre-dates introduction of "pure" non-maskable sources
Use empty string to disable altering sysctl kern.random.harvest.mask.
Note that notion of "pure" random sources is not documented in user level
manual pages yet. Still, it helps to extend battery life for hardware
with embedded "Intel Secure Key RNG" by disabling all other sources.
Note that no defaults changed and default behaviour is not affected.
Randall Stewart [Wed, 26 May 2021 10:43:30 +0000 (06:43 -0400)]
tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back
a long time when delayed ack timers were longer and links were slower. A
200ms timer slop allows 1 MSS to be sent over a 60kbps link. Its possible that
lowering this value to something more in line with todays delayed ack values (40ms)
might improve TCP. This bit of code makes it so rack can, via a socket option,
adjust the timer slop.
Dmitry Chagin [Wed, 26 May 2021 05:34:32 +0000 (08:34 +0300)]
linux_common: retire extra module version.
The second 'linuxcommon' line was added by c66f5b079d2a259c3a65b1efe0f2143cd030dc52
but Linuxulator's modules dependend on 'linux_common'.
To avoid such mistakes in the future rename moduledata name and module
name to 'linux_common' and retire 'linuxcommon' line.
John Baldwin [Tue, 25 May 2021 23:59:19 +0000 (16:59 -0700)]
crypto: Add crypto_cursor_segment() to fetch both base and length.
This function combines crypto_cursor_segbase() and
crypto_cursor_seglen() into a single function. This is mostly
beneficial in the unmapped mbuf case where back to back calls of these
two functions have to iterate over the sub-components of unmapped
mbufs twice.
Bump __FreeBSD_version for crypto drivers in ports.
John Baldwin [Tue, 25 May 2021 23:59:19 +0000 (16:59 -0700)]
Assume OCF is the only KTLS software backend.
This removes support for loadable software backends. The KTLS OCF
support is now always included in kernels with KERN_TLS and the
ktls_ocf.ko module has been removed. The software encryption routines
now take an mbuf directly and use the TLS mbuf as the crypto buffer
when possible.
Bump __FreeBSD_version for software backends in ports.
John Baldwin [Tue, 25 May 2021 23:59:18 +0000 (16:59 -0700)]
crypto: Add a new type of crypto buffer for a single mbuf.
This is intended for use in KTLS transmit where each TLS record is
described by a single mbuf that is itself queued in the socket buffer.
Using the existing CRYPTO_BUF_MBUF would result in
bus_dmamap_load_crp() walking additional mbufs in the socket buffer
that are not relevant, but generating a S/G list that potentially
exceeds the limit of the tag (while also wasting CPU cycles).
Rick Macklem [Tue, 25 May 2021 21:19:29 +0000 (14:19 -0700)]
nfscl: Use hash lists to improve expected search performance for opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently. When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.
Commit 3f7e14ad9345 added a hash table of lists hashed on file handle
for the opens. This patch uses the hash lists for searching for
a matching open based of file handle instead of an exhaustive
linear search of all opens.
This change appears to be performance neutral for a small number
of opens, but should improve expected performance for a large
number of opens. This patch also moves any found match to the front
of the hash list, to try and maintain the hash lists in recently
used ordering (least recently used at the end of the list).
This commit should not affect the high level semantics of open
handling.
Bjoern A. Zeeb [Tue, 25 May 2021 17:37:15 +0000 (17:37 +0000)]
Bump __FreeBSD_version to 1400015 for LinuxKPI changes.
Commits 17accc08ae15 and de102f870501 add new files to LinuxKPI
which break drm-kmod. In addition various other additions where
comitted. Bump __FreeBSD_version to 1400015 to be able to detect this.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Bjoern A. Zeeb [Mon, 24 May 2021 17:54:16 +0000 (17:54 +0000)]
LinuxKPI: byteorder.h
Add a few more le<n>_{tp,add}_cpu*() #defines/functions found in
wireless drivers. While here fill most of the combinatorics gaps
and also add the remaining combinations [1].
Suggested by: emaste [1] (for one part)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30418
Bjoern A. Zeeb [Mon, 24 May 2021 18:09:37 +0000 (18:09 +0000)]
LinuxKPI: add ether_addr_equal_unaligned()
Replace the implementation for ether_addr_equal() with
ether_addr_equal_unaligned() and add a define for ether_addr_equal()
pointing to the now ether_addr_equal_unaligned() implementation.
This way ether_addr_equal_unaligned() cannot be broken by accident [1].
Suggested by: emaste [1]
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30425
Bjoern A. Zeeb [Mon, 24 May 2021 18:14:37 +0000 (18:14 +0000)]
LinuxKPI: add irq_set_affinity_hint()
Add an implementation for irq_set_affinity_hint() to linux/interrupt.h
and include linux/hardirq.h for synchronize_irq() as needed by
wireless drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30427
Bjoern A. Zeeb [Mon, 24 May 2021 18:17:30 +0000 (18:17 +0000)]
LinuxKPI: add linux/{ip,tcp,udp}.h
Add header files for struct and accessors for IPv4, UDP, and TCP.
Only parts of the fields of the structs have been seen while working
on wireless drivers. The remaining field names are filled up with
the FreeBSD field names for now. If you have insights into their
correct naming in Linux, feel free to adjust.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30428
Bjoern A. Zeeb [Mon, 24 May 2021 18:26:41 +0000 (18:26 +0000)]
LinuxKPI: change BUILD_BUG_ON()
BUILD_BUG_ON() can be used inside functions where the definition to
CTASSERT() (_Static_assert()) seems to not work.
Go back to an old-style CTASSERT() implementation but also add a
variable dclaration to avoid "unsued typedef" errors and dummy-use
the variable to avoid "unusued variable" errors. Given it is all
self-contained in a block and not used outside this should be
optimised away.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30431
Bjoern A. Zeeb [Mon, 24 May 2021 18:40:42 +0000 (18:40 +0000)]
LinuxKPI: add rcu_dereference_check()
Add a define for rcu_dereference_check() to rcu_dereference_protected()
which ignores the check argument. Our lockdep compat implementation
for use cases found in iwlwifi would return 1 anyway.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30436
Bjoern A. Zeeb [Mon, 24 May 2021 18:53:28 +0000 (18:53 +0000)]
LinuxKPI: extract stringify() in their own header file
Add linux/stringify.h as directly included by drivers. Remove the
definitions from compiler.h and include the new header in places
where the stringify macros are already used without linuxkpi.
Randall Stewart [Tue, 25 May 2021 17:23:31 +0000 (13:23 -0400)]
tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michaels testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is the left edge gets transmitted before the adjustments are done
to the send_map, this means that right edge bits must be considered to be added only if
the entire RSM is being retransmitted.
Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns
out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself.
After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FIN's and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.
Chuck Silvers [Tue, 25 May 2021 16:42:10 +0000 (09:42 -0700)]
fsdb: add missing bufinit() call
The bufinit() call in fsck_ffs was moved in commit f190f9193bc10
from a function that is shared with fsdb to one that is private to fsck_ffs,
so add a bufinit() call in fsdb to compensate for that.
If copyin family of routines fault, kernel does clear PSL.AC on the
fault entry, but the AC flag of the faulted frame is kept intact. Since
onfault handler is effectively jump, AC survives until syscall exit.
Reported by: m00nbsd, via Sony
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
admbugs: 975
Warner Losh [Tue, 25 May 2021 15:14:32 +0000 (09:14 -0600)]
cam: remove sim callout
Nothing is using the sim callout to unfreeze the queue. Remove it to
simplify the SIM. This was introduced in the original CAM commit in 1998
but setting the CAM_SIM_REL_TIMEOUT_PENDING flag was removed in 1999 in
commit 87cfaf0e1fbd which reworked how bus reset worked. That work was
merged just after 3.2R was released. Remove the unused residuals.
Randall Stewart [Mon, 24 May 2021 18:42:15 +0000 (14:42 -0400)]
tcp: Fix an issue with the PUSH bit as well as fill in the missing mtu change for fsb's
The push bit itself was also not actually being properly moved to
the right edge. The FIN bit was incorrectly on the left edge. We
fix these two issues as well as plumb in the mtu_change for
alternate stacks.
Kristof Provost [Mon, 24 May 2021 06:32:16 +0000 (08:32 +0200)]
pf: fix ioctl() memory leak
When we create an nvlist and insert it into another nvlist we must
remember to destroy it. The nvlist_add_nvlist() function makes a copy,
just like nvlist_add_string() makes a copy of the string. If we don't
we're leaking memory on every (nvlist-based) ioctl() call.
While here remove two redundant 'break' statements.
Andrew Turner [Thu, 20 May 2021 06:52:15 +0000 (06:52 +0000)]
Clean up early arm64 pmap code
Early in the arm64 pmap code we need to translate between a virtual
address and a physical address. Rather than manually walking the page
table we can ask the hardware to do it for us.
Navdeep Parhar [Sun, 23 May 2021 21:58:29 +0000 (14:58 -0700)]
cxgbe(4): Overhaul CLIP (Compressed Local IPv6) table management.
- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.
- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.
- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.
- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.
With this patch:
% dmesg | grep -i uart
uart2: <Intel Gemini Lake SIO/LPSS UART 0> mem 0xa1426000-0xa1426fff,0xa1425000-0xa1425fff irq 4 at device 24.0 on pci0
uart3: <Intel Gemini Lake SIO/LPSS UART 1> mem 0xa1424000-0xa1424fff,0xa1423000-0xa1423fff irq 5 at device 24.1 on pci0
uart4: <Intel Gemini Lake SIO/LPSS UART 2> mem 0xfea10000-0xfea10fff irq 6 at device 24.2 on pci0
uart5: <Intel Gemini Lake SIO/LPSS UART 3> mem 0xa1422000-0xa1422fff,0xa1421000-0xa1421fff irq 7 at device 24.3 on pci0
Adrian Chadd [Sun, 23 May 2021 04:23:00 +0000 (21:23 -0700)]
ath: bump the default node queue size to 128 frames, not 64
It turns out that, silly adrian, setting it to 64 means only two
AMPDU frames of 32 subframes each. Thus, whilst those are in-flight,
any subsequent queues frames to that node get dropped.
This ends up being pretty no bueno for performance if any receive
is also going on at that point.
Instead, set it to 128 for the time being to ensure that SOME
frames get queued in the meantime. This results in some frames
being immediately available in the software queue for transmit
when the two existing A-MPDU frames have been completely sent,
rather than the queue remaining empty until at least one is sent.
It's not the best solution - I still think I'm scheduling receive
far more often than giving time to schedule transmit work -
but at least now I'm not starving the transmit side.
Before this, a bidirectional iperf would show receive at ~ 150mbit/sec.
but the transmit side at like 10kbit/sec. With it set to 128 it's
now 150mbit/sec receive, and ~ 10mbit receive. It's better than 10kbit/sec,
but still not as far as I'd like it to be.
Tested:
* AR9380/QCA934x (TL-WDR4300 AP), Macbook pro test STA + AR9380 test STA