Rick Macklem [Tue, 25 May 2021 21:19:29 +0000 (14:19 -0700)]
nfscl: Use hash lists to improve expected search performance for opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently. When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.
Commit 3f7e14ad9345 added a hash table of lists hashed on file handle
for the opens. This patch uses the hash lists for searching for
a matching open based on file handle instead of an exhaustive
linear search of all opens.
This change appears to be performance neutral for a small number
of opens, but should improve expected performance for a large
number of opens. This patch also moves any found match to the front
of the hash list, to try and maintain the hash lists in recently
used ordering (least recently used at the end of the list).
This commit should not affect the high level semantics of open
handling.
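
A minimal userland sketch of the lookup described above, not the actual
nfscl code: opens are chained per hash bucket keyed on file handle, and a
hit is moved to the front of its chain. All names (open_entry, open_table,
OPEN_HASH_SIZE, FH_MAXLEN, fh_hash) and the FNV-1a hash are assumptions
made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPEN_HASH_SIZE	64	/* hypothetical bucket count */
#define FH_MAXLEN	128	/* hypothetical maximum file handle size */

struct open_entry {
	struct open_entry *next;
	size_t fhlen;
	uint8_t fh[FH_MAXLEN];
};

struct open_table {
	struct open_entry *bucket[OPEN_HASH_SIZE];
};

/* FNV-1a over the file handle bytes, reduced to a bucket index. */
static size_t
fh_hash(const uint8_t *fh, size_t fhlen)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < fhlen; i++) {
		h ^= fh[i];
		h *= 16777619u;
	}
	return (h % OPEN_HASH_SIZE);
}

/* Insert at the head of the bucket selected by the file handle. */
static void
open_insert(struct open_table *t, struct open_entry *e)
{
	size_t b;

	assert(e->fhlen <= FH_MAXLEN);
	b = fh_hash(e->fh, e->fhlen);
	e->next = t->bucket[b];
	t->bucket[b] = e;
}

/* Search one bucket only; on a hit, move the entry to the front. */
static struct open_entry *
open_lookup(struct open_table *t, const uint8_t *fh, size_t fhlen)
{
	struct open_entry **head = &t->bucket[fh_hash(fh, fhlen)];
	struct open_entry **pp, *e;

	for (pp = head; (e = *pp) != NULL; pp = &e->next) {
		if (e->fhlen == fhlen && memcmp(e->fh, fh, fhlen) == 0) {
			*pp = e->next;		/* unlink from current spot */
			e->next = *head;	/* relink at the head */
			*head = e;
			return (e);
		}
	}
	return (NULL);
}
```

With N opens spread over the buckets, a lookup in this sketch walks one
chain instead of all N entries, and recently matched opens stay near the
head of their chain.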
Bjoern A. Zeeb [Tue, 25 May 2021 17:37:15 +0000 (17:37 +0000)]
Bump __FreeBSD_version to 1400015 for LinuxKPI changes.
Commits 17accc08ae15 and de102f870501 add new files to LinuxKPI
which break drm-kmod. In addition, various other additions were
committed. Bump __FreeBSD_version to 1400015 to be able to detect this.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Bjoern A. Zeeb [Mon, 24 May 2021 17:54:16 +0000 (17:54 +0000)]
LinuxKPI: byteorder.h
Add a few more le<n>_{to,add}_cpu*() #defines/functions found in
wireless drivers. While here fill most of the combinatorics gaps
and also add the remaining combinations [1].
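
For readers unfamiliar with the "add" helpers, a rough sketch of what a
32-bit variant does: convert the little-endian value to host order, add,
and store it back as little-endian. The sketch_* names are hypothetical
stand-ins and not the LinuxKPI definitions; the byte-wise conversions are
only there to keep the example independent of host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: little-endian storage to host value. */
static uint32_t
sketch_le32_to_cpu(uint32_t le)
{
	const uint8_t *p = (const uint8_t *)&le;

	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Hypothetical stand-in: host value to little-endian storage. */
static uint32_t
sketch_cpu_to_le32(uint32_t v)
{
	uint32_t le;
	uint8_t *p = (uint8_t *)&le;

	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
	return (le);
}

/* Add a host-order value to a little-endian variable in place. */
static void
sketch_le32_add_cpu(uint32_t *var, uint32_t val)
{
	assert(var != NULL);
	*var = sketch_cpu_to_le32(sketch_le32_to_cpu(*var) + val);
}
```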
Suggested by: emaste [1] (for one part)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30418
Bjoern A. Zeeb [Mon, 24 May 2021 18:09:37 +0000 (18:09 +0000)]
LinuxKPI: add ether_addr_equal_unaligned()
Replace the implementation for ether_addr_equal() with
ether_addr_equal_unaligned() and add a define for ether_addr_equal()
pointing to the now ether_addr_equal_unaligned() implementation.
This way ether_addr_equal_unaligned() cannot be broken by accident [1].
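
The arrangement can be sketched roughly as follows. This is an
illustration under assumptions, not the LinuxKPI source: the sketch_*
names and SKETCH_ETH_ALEN are hypothetical, and memcmp() is used here
simply because it has no alignment requirements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN	6	/* length of an Ethernet address in bytes */

/* Byte-wise comparison; safe for addresses at any alignment. */
static inline bool
sketch_ether_addr_equal_unaligned(const void *a, const void *b)
{
	assert(a != NULL && b != NULL);
	return (memcmp(a, b, SKETCH_ETH_ALEN) == 0);
}

/* The aligned name is only an alias, so both share one implementation. */
#define	sketch_ether_addr_equal(a, b)					\
	sketch_ether_addr_equal_unaligned((a), (b))
```

Because only one implementation exists, a later change to the aligned
variant cannot silently diverge from the unaligned one.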
Suggested by: emaste [1]
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30425
Bjoern A. Zeeb [Mon, 24 May 2021 18:14:37 +0000 (18:14 +0000)]
LinuxKPI: add irq_set_affinity_hint()
Add an implementation for irq_set_affinity_hint() to linux/interrupt.h
and include linux/hardirq.h for synchronize_irq() as needed by
wireless drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30427
Bjoern A. Zeeb [Mon, 24 May 2021 18:17:30 +0000 (18:17 +0000)]
LinuxKPI: add linux/{ip,tcp,udp}.h
Add header files for struct and accessors for IPv4, UDP, and TCP.
Only parts of the fields of the structs have been seen while working
on wireless drivers. The remaining field names are filled up with
the FreeBSD field names for now. If you have insights into their
correct naming in Linux, feel free to adjust.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30428
Bjoern A. Zeeb [Mon, 24 May 2021 18:26:41 +0000 (18:26 +0000)]
LinuxKPI: change BUILD_BUG_ON()
BUILD_BUG_ON() can be used inside functions where the definition to
CTASSERT() (_Static_assert()) seems to not work.
Go back to an old-style CTASSERT() implementation but also add a
variable declaration to avoid "unused typedef" errors and dummy-use
the variable to avoid "unused variable" errors. Given it is all
self-contained in a block and not used outside this should be
optimised away.
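
A sketch of the general technique, assuming a negative-array-size typedef
as the "old-style" assertion; the macro and type names are hypothetical
and this is not the committed definition. The typedef fails to compile
when the condition is true, the variable uses the typedef, and the void
cast uses the variable.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time check usable inside a function body (illustrative only). */
#define	SKETCH_BUILD_BUG_ON(cond) do {					\
	typedef char sketch_bug_t[(cond) ? -1 : 1];			\
	sketch_bug_t sketch_bug_v = { 0 };				\
	(void)sketch_bug_v;						\
} while (0)

struct sketch_hdr {
	unsigned char type;
	unsigned char len;
};

static int
sketch_hdr_len(const struct sketch_hdr *h)
{
	/* Breaks the build if the header layout ever grows. */
	SKETCH_BUILD_BUG_ON(sizeof(struct sketch_hdr) != 2);

	assert(h != NULL);
	return (h->len);
}
```

Changing the condition to one that is true (for example
sizeof(struct sketch_hdr) != 3) makes the array size negative and the
compile fails at that line.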
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30431
Bjoern A. Zeeb [Mon, 24 May 2021 18:40:42 +0000 (18:40 +0000)]
LinuxKPI: add rcu_dereference_check()
Add a define for rcu_dereference_check() to rcu_dereference_protected()
which ignores the check argument. Our lockdep compat implementation
for use cases found in iwlwifi would return 1 anyway.
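
In outline, such a forwarding define could look like the sketch below.
The sketch_* macros are hypothetical simplifications (no READ_ONCE() or
type casting) and only show that the check expression is dropped before
it is ever evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: return the pointer, ignore the condition. */
#define	sketch_rcu_dereference_protected(p, c)	(p)

/* The check argument is discarded and a constant is passed on instead. */
#define	sketch_rcu_dereference_check(p, c)				\
	sketch_rcu_dereference_protected((p), 1)

struct sketch_cfg {
	int channel;
};

static struct sketch_cfg *sketch_cfg_ptr;

static int
sketch_read_channel(int lock_held)
{
	struct sketch_cfg *cfg;

	/* lock_held is dropped by the macro; keep the compiler quiet. */
	(void)lock_held;
	cfg = sketch_rcu_dereference_check(sketch_cfg_ptr, lock_held);
	assert(cfg != NULL);
	return (cfg->channel);
}
```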
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30436
Bjoern A. Zeeb [Mon, 24 May 2021 18:53:28 +0000 (18:53 +0000)]
LinuxKPI: extract stringify() into its own header file
Add linux/stringify.h as directly included by drivers. Remove the
definitions from compiler.h and include the new header in places
where the stringify macros are already used without linuxkpi.
Randall Stewart [Tue, 25 May 2021 17:23:31 +0000 (13:23 -0400)]
tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michael's testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is the left edge gets transmitted before the adjustments are done
to the send_map; this means that right edge bits must be considered to be added only if
the entire RSM is being retransmitted.
Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns
out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself.
After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FIN's and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.
Chuck Silvers [Tue, 25 May 2021 16:42:10 +0000 (09:42 -0700)]
fsdb: add missing bufinit() call
The bufinit() call in fsck_ffs was moved in commit f190f9193bc10
from a function that is shared with fsdb to one that is private to fsck_ffs,
so add a bufinit() call in fsdb to compensate for that.
If the copyin family of routines faults, the kernel does clear PSL.AC on
fault entry, but the AC flag of the faulted frame is kept intact. Since the
onfault handler is effectively a jump, AC survives until syscall exit.
Reported by: m00nbsd, via Sony
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
admbugs: 975
Warner Losh [Tue, 25 May 2021 15:14:32 +0000 (09:14 -0600)]
cam: remove sim callout
Nothing is using the sim callout to unfreeze the queue. Remove it to
simplify the SIM. This was introduced in the original CAM commit in 1998
but setting the CAM_SIM_REL_TIMEOUT_PENDING flag was removed in 1999 in
commit 87cfaf0e1fbd which reworked how bus reset worked. That work was
merged just after 3.2R was released. Remove the unused residuals.
Randall Stewart [Mon, 24 May 2021 18:42:15 +0000 (14:42 -0400)]
tcp: Fix an issue with the PUSH bit as well as fill in the missing mtu change for fsb's
The push bit itself was also not actually being properly moved to
the right edge. The FIN bit was incorrectly on the left edge. We
fix these two issues as well as plumb in the mtu_change for
alternate stacks.
Kristof Provost [Mon, 24 May 2021 06:32:16 +0000 (08:32 +0200)]
pf: fix ioctl() memory leak
When we create an nvlist and insert it into another nvlist we must
remember to destroy it. The nvlist_add_nvlist() function makes a copy,
just like nvlist_add_string() makes a copy of the string. If we don't
we're leaking memory on every (nvlist-based) ioctl() call.
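
The ownership rule can be illustrated with a small self-contained model.
This is not libnv and not the pf code: the sketch_nv type, its functions
and the sketch_live counter are all hypothetical, and exist only to show
that when "add" copies its argument, the caller still owns the original
and must destroy it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_nv {
	int value;
	struct sketch_nv *child;
};

static int sketch_live;		/* number of live allocations */

static struct sketch_nv *
sketch_nv_create(int value)
{
	struct sketch_nv *nv = calloc(1, sizeof(*nv));

	if (nv != NULL) {
		nv->value = value;
		sketch_live++;
	}
	return (nv);
}

static void
sketch_nv_destroy(struct sketch_nv *nv)
{
	if (nv == NULL)
		return;
	sketch_nv_destroy(nv->child);
	free(nv);
	sketch_live--;
}

static struct sketch_nv *
sketch_nv_clone(const struct sketch_nv *src)
{
	struct sketch_nv *copy;

	if (src == NULL)
		return (NULL);
	copy = sketch_nv_create(src->value);
	if (copy != NULL)
		copy->child = sketch_nv_clone(src->child);
	return (copy);
}

/* Like the add functions described above: stores a copy of child. */
static void
sketch_nv_add_child(struct sketch_nv *parent, const struct sketch_nv *child)
{
	assert(parent != NULL);
	sketch_nv_destroy(parent->child);
	parent->child = sketch_nv_clone(child);
}

static struct sketch_nv *
sketch_build_reply(void)
{
	struct sketch_nv *parent = sketch_nv_create(1);
	struct sketch_nv *child = sketch_nv_create(2);

	if (parent != NULL)
		sketch_nv_add_child(parent, child);
	sketch_nv_destroy(child);	/* omitting this leaks one object */
	return (parent);
}
```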
While here remove two redundant 'break' statements.
Andrew Turner [Thu, 20 May 2021 06:52:15 +0000 (06:52 +0000)]
Clean up early arm64 pmap code
Early in the arm64 pmap code we need to translate between a virtual
address and a physical address. Rather than manually walking the page
table we can ask the hardware to do it for us.
Navdeep Parhar [Sun, 23 May 2021 21:58:29 +0000 (14:58 -0700)]
cxgbe(4): Overhaul CLIP (Compressed Local IPv6) table management.
- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.
- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.
- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.
- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.
With this patch:
% dmesg | grep -i uart
uart2: <Intel Gemini Lake SIO/LPSS UART 0> mem 0xa1426000-0xa1426fff,0xa1425000-0xa1425fff irq 4 at device 24.0 on pci0
uart3: <Intel Gemini Lake SIO/LPSS UART 1> mem 0xa1424000-0xa1424fff,0xa1423000-0xa1423fff irq 5 at device 24.1 on pci0
uart4: <Intel Gemini Lake SIO/LPSS UART 2> mem 0xfea10000-0xfea10fff irq 6 at device 24.2 on pci0
uart5: <Intel Gemini Lake SIO/LPSS UART 3> mem 0xa1422000-0xa1422fff,0xa1421000-0xa1421fff irq 7 at device 24.3 on pci0
Adrian Chadd [Sun, 23 May 2021 04:23:00 +0000 (21:23 -0700)]
ath: bump the default node queue size to 128 frames, not 64
It turns out that, silly adrian, setting it to 64 means only two
AMPDU frames of 32 subframes each. Thus, whilst those are in-flight,
any subsequently queued frames to that node get dropped.
This ends up being pretty no bueno for performance if any receive
is also going on at that point.
Instead, set it to 128 for the time being to ensure that SOME
frames get queued in the meantime. This results in some frames
being immediately available in the software queue for transmit
when the two existing A-MPDU frames have been completely sent,
rather than the queue remaining empty until at least one is sent.
It's not the best solution - I still think I'm scheduling receive
far more often than giving time to schedule transmit work -
but at least now I'm not starving the transmit side.
Before this, a bidirectional iperf would show receive at ~ 150mbit/sec
but the transmit side at like 10kbit/sec. With it set to 128 it's
now 150mbit/sec receive, and ~ 10mbit/sec transmit. It's better than 10kbit/sec,
but still not as far as I'd like it to be.
Tested:
* AR9380/QCA934x (TL-WDR4300 AP), Macbook pro test STA + AR9380 test STA
Adrian Chadd [Sat, 22 May 2021 23:39:16 +0000 (16:39 -0700)]
[ath] Handle STA + AP beacon programming without stomping over HW AP beacon programming
I've been using STA+AP modes at home for a couple years now
and I've been finding and fixing a lot of weird corner cases.
This is the eventual patchset I've landed on.
* Don't force beacon resync in STA mode if we're using sw beacon tracking.
This stops a variety of stomping issues when the STA VAP is reconfigured;
the AP hardware beacons were being stomped on!
* Use the first AP VAP to configure beacons on, rather than the first VAP.
This prevents weird behaviour in ath_beacon_config() when the hardware
is being reconfigured and the STA VAP was the first one created.
* Ensure the beacon interval / timing programming is within the AR9300
HAL bounds by masking off any flags that may have been there before
shifting the value up to 1/8 TUs rather than the 1 TU resolution the
previous chips used.
Now I don't get weird beacon reprogramming during startup, STA state
changes and hardware recovery which showed up as HI-LARIOUS beacon
configurations and STAs that would just disconnect from the AP very
frequently.
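
The masking-then-shifting step from the last bullet can be sketched as
below. The constant names and values are assumptions for illustration
(a 16-bit period field with flag bits above it) and should not be read
as the HAL definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 16 bits hold the period in TU, flags sit above. */
#define	SKETCH_BEACON_PERIOD_MASK	0x0000ffffu
#define	SKETCH_BEACON_ENA		0x00800000u	/* example flag bit */

/* Convert a beacon interval in TU to 1/8 TU units, dropping flag bits. */
static uint32_t
sketch_beacon_period_tu8(uint32_t intval)
{
	uint32_t period = intval & SKETCH_BEACON_PERIOD_MASK;

	assert(period <= SKETCH_BEACON_PERIOD_MASK);
	return (period << 3);
}
```

Without the mask, a flag bit left in the value would be shifted up along
with the period and end up in the timing field.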
Adrian Chadd [Mon, 19 Apr 2021 05:48:13 +0000 (22:48 -0700)]
[ar71xx] During reset, don't spin, just keep trying
I've seen this fail from time to time and just hang during reset.
Instead of it just hanging, just poke it again. I've not seen it
fail in hundreds of test resets now.
Rick Macklem [Sat, 22 May 2021 21:51:38 +0000 (14:51 -0700)]
nfscl: Add hash lists for the NFSv4 opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently. When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.
This patch adds a table of hash lists for the opens, hashed on
file handle. This table will be used by future commits to
search for an open based on file handle more efficiently.
Lutz Donnerhacke [Fri, 21 May 2021 14:54:24 +0000 (16:54 +0200)]
tests/libalias: Add performance test utility
In order to compare upcoming changes for their effectiveness, measure
performance by counting operations and the runtime of each operation
over time. Accumulate all tests in a single instance, so that it becomes
more complicated over time. If you wait long enough, you will notice
the expiry of old flows.
It was possible that termination of a ktrace session occurred during some
record write, in which case the write occurred after the close of the vnode.
Use ktr_io_params refcounting to avoid this situation, by taking the
reference on the structure instead of vnode.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30400
Mateusz Guzik [Sat, 22 May 2021 19:48:31 +0000 (19:48 +0000)]
Fix a braino in previous.
Instead of trying to partially ifdef out ktrace handling, define the
missing identifier to 0. Without this fix lack of ktrace in the kernel
also means there is no SIGXFSZ signal delivery.
Mateusz Guzik [Tue, 18 May 2021 19:07:19 +0000 (21:07 +0200)]
lockprof: add contested-only profiling
This allows tracking all wait times with much smaller runtime impact.
For example when doing -j 104 buildkernel on tmpfs:
no profiling: 2921.70s user 282.72s system 6598% cpu 48.562 total
all acquires: 2926.87s user 350.53s system 6656% cpu 49.237 total
contested only: 2919.64s user 290.31s system 6583% cpu 48.756 total
Robert Wing [Thu, 20 May 2021 20:53:52 +0000 (12:53 -0800)]
fsck_ffs(8): fix divide by zero when debug messages are enabled
Only print buffer cache debug message when a cache lookup has been done.
When running `fsck_ffs -d` on a gjournal'ed filesystem, it's possible
that totalreads is greater than zero when no cache lookup has been
done - causing a divide by zero. This commit fixes the following error:
Mark Johnston [Sat, 22 May 2021 16:07:32 +0000 (12:07 -0400)]
ktrace: Avoid recursion in namei()
sys_ktrace() calls namei(), which may call ktrnamei(). But sys_ktrace()
also calls ktrace_enter() first, so if the caller is itself being
traced, the assertion in ktrace_enter() is triggered. And, ktrnamei()
does not check for recursion like most other ktrace ops do.
Fix the bug by simply deferring the ktrace_enter() call.
Also make the parameter to ktrnamei() const and convert to ANSI.
Reported by: syzbot+d0a4de45e58d3c08af4b@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30340
Michael Tuexen [Sat, 22 May 2021 12:35:09 +0000 (14:35 +0200)]
tcp: Handle stack switch while processing socket options
Handle the case where during socket option processing, the user
switches a stack such that processing the stack specific socket
option does not make sense anymore. Return an error in this case.
ktrace: add a kern.ktrace.filesize_limit_signal knob
When enabled, writes to ktrace.out that exceed the max file size limit
cause SIGXFSZ as it should be, but note that the limit is taken from
the process that initiated ktrace. When disabled, the write is blocked,
but the signal is not sent.
Note that in either case ktrace for the affected process is stopped.
Requested and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30257
accounting: explicitly mark the exiting thread as doing accounting
and use the mark to stop applying file size limits on the write of
the accounting record. This allows removal of the hack to clear process
limits in acct_process(), and avoids the bug with the clearing being
ineffective because limits are also cached in the thread structure.
Reported and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30257