Bojan Novković [Fri, 15 Sep 2023 10:41:10 +0000 (12:41 +0200)]
amd64: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping
This patch reverts the changes made in D19670 and fixes the original
issue by allocating and prepopulating a leaf page table page for wired
userspace 2M pages.
The original issue is an edge case that creates an unmapped, wired
region in userspace. Subsequent faults on this region can trigger wired
superpage creation, which leads to a panic in pmap_demote_pde_locked()
as the pmap does not create a leaf page table page for the wired
superpage. D19670 fixed this by disallowing preemptive creation of
wired superpage mappings, but that fix is currently interfering with an
ongoing effort of speeding up vm_map_wire for large, contiguous entries
(e.g. bhyve wiring guest memory).
lib/libc/tests/string/strcspn_test.c: extend tests to catch previous bug
This extends the strcspn() unit tests to catch mistakes in the
implementation that only appear when a mismatch occurs in a certain
position of the string against a certain position of the set.
See also: 52d4a4d4e0dedc72bc33082a3f84c2d0fd6f2cbb
Sponsored by: The FreeBSD Foundation
Approved by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41821
Pierre Pronchery [Fri, 15 Sep 2023 15:14:16 +0000 (17:14 +0200)]
libcrypto: link engines and the legacy provider to libcrypto
OpenSSL's legacy provider module and engines need to link to
libcrypto.so, as it provides some of the actual implementations of
legacy routines.
This is a little tricky due to build order issues. Introduce a small
hack (LIBCRYPTO_WITHOUT_SUBDIRS) that builds libcrypto.so in its usual
early phase without any OpenSSL provider modules or engines. This is
intended to restore the test suite; a future change should remove the
hack and replace it with a better approach.
PR: 254853, 273528
Discussed with: Folks at EuroBSDCon in Coimbra
Sponsored by: The FreeBSD Foundation
sockets: re-check socket state after call to pr_rcvd()
Socket state may have changed after dropping the receive
buffer lock in order to call pr_rcvd(). If the buffer is
empty, re-check the state after reaquiring the lock and
skip calling sbwait() if the socket is in error or the
peer has closed.
lib/libc/amd64/string/memcmp.S: harden against phony buffer lengths
When memcmp(a, b, len) (or equally, bcmp) is called with a phony length
such that a + len < a, the code would malfunction and not compare the
two buffers correctly. While such arguments are illegal (buffers do not
wrap around the end of the address space), it is neverthless conceivable
that people try things like memcmp(a, b, SIZE_MAX) to compare a and b
until the first mismatch, in the knowledge that such a mismatch exists,
expecting memcmp() to stop comparing somewhere around the mismatch.
While memcmp() is usually written to confirm to this assumption, no
version of ISO/IEC 9899 guarantees this behaviour (in contrast to
memchr() for which it is).
Neverthless it appears sensible to at least not grossly misbehave on
phony lengths. This change hardens memcmp() against this case by
comparing at least until the end of the address space if a + len
overflows a 64 bit integer.
An explicit tzset() call is usually not needed as it happens implicitly
the first time we call localtime() or mktime(), but in some cases
(sandboxing, chroot) this may be too late.
kinst currently only traces functions that start and end with the usual
function prologue and epilogue respectively. Ignoring functions that do
not have an epilogue however, makes the filtering too strict, as this
means that we can not trace functions that never return (e.g
vnlru_proc()). This patch relaxes the filtering and only checks whether
the function pushes the frame pointer.
LinuxKPI: 802.11: fix counting the number of supbands
While the main purpose was to assign an(y) early chandef with the
loop, later additions made use of it to also count supbands as well
as to initialise max_rates.
Due to the main goal we can exit the loop early and not properly
count and initialise supbands and max_rates.
Move the terminating condition into the loop and make it a continue
rather than ending the loop.
Fixes: d9945d7821b9b ("improve hw_scan")
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
LinuxKPI: 802.11: make sure bssid for scans (probe_req) is set
In b0f73768220e9 we added bssid[] to struct cfg80211_scan_request
likely while working on mt76 and did not need it (yet) back then.
iwlwifi started to use the field in Linux f1fec51cda70f (April 2023).
Without it set firmware crashes when trying to send probe requests
((empty) SSID also given to hw_scan).
For now always set the field to the wildcard BSSID.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
While debugging other problems I ran into the case where net80211
was thinking a scan was ongoing and new scans could not be started
but given other logging there was clearly no more scan running.
It was hard after the fact to quickly determine enough state to
reconstruct or validate assumptions.
Improve a MSG_SCAN debug logging and implement _db_show_scan() ddb
output which can be printed along with show com /S or /a.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
net80211: mark argument to scan_curchan_task() __unused
Mostly as documentation mark an unused argument to scan_curchan_task()
as __unused. We may possibly want to check all callers in the future
and see if the argument was supposed to be useful or should be entirely
removed.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
vfs: drop one vnode list lock trip during vnlru free recycle
vnlru_free_impl would take the lock prior to returning even though most
frequent caller does not need it.
Unsurprisingly vnode_list mtx is the primary bottleneck when recycling
and avoiding the useless lock trip helps.
Setting maxvnodes to 400000 and running 20 parallel finds each with a
dedicated directory tree of 1 million vnodes in total:
before: 4.50s user 1225.71s system 1979% cpu 1:02.14 total
after: 4.20s user 806.23s system 1973% cpu 41.059 total
That's 34% reduction in total real time.
With this the block *remains* the primary bottleneck when running on
ZFS.
arp(8): fix by-interface and by-host filtering when using netlink
arp(8) has traditionally supported filtering by interface via -i and
by hostname. However, this functionality was omitted from the initial
netlink-ification of arp. This patch re-introduces this filtering
functionality.
This patch also improves by-interface filtering by storing and using the
ifindex of the requested interface for filtering instead of comparing
interface name strings
pkgbase: put library links and symlinks in the -dev package
Some libraries (e.g. ncurses) install links to the main library for
backwards compatibilty. This change ensures that those links are in the
dev package since the files being linked to are in that package.
Avoid IPv6 source address selection on accepting TCP connections
When an application listens IPv6 TCP socket, due to ipfw
forwarding tag it may handle connections for addresses that do not
belongs to the jail or even current host (transparent proxy).
Syncache code can successfully handle TCP handshake for such connections.
When syncache finally accepts connection it uses in6_pcbconnect() to
properly initlize new connection info.
For IPv4 this scenario just works, but for IPv6 it fails when
local address doesn't belongs to the jail. This check occurs when
in6_pcbladdr() applies IPv6 SAS algorithm.
We need IPv6 SAS when we are connection initiator, but in the above
case connection is already established and both source and destination
addresses are known.
Use unused argument to notify in6_pcbconnect() when we don't need
source address selection. This will fix `ipfw fwd` to jailed IPv6
address.
When we are connection initiator, we stil use IPv6 SAS algorithm and
apply all related restrictions.
Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus
In ARM64 Hyper-V UFS filesystem is getting corruption and those
corruptions are consistently happening just after hitting a page
boundary. It is unable to correctly read disk blocks into buffers
that are not aligned to 512-byte boundaries.
It happens because storvsc needs physically contiguous memory which
may not be the case when bus_dma needs to create a bounce buffer.
This can happen when the destination is not cache-line aligned.
Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices
that are added dynamically via the VMbus protocol and are not
represented in the ACPI DSDT. Only the top level VMbus node exists
in the DSDT. As such, on ARM64 these devices don't pick up coherence
information and default to not hardware coherent.
PR: 267654, 272666
Reviewed by: andrew, whu
Tested by: lwhsu
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D41728
Eric Joyner [Thu, 24 Aug 2023 23:26:13 +0000 (16:26 -0700)]
ice(4): Update to 1.38.16-k
New features
- Add sysctl "link_active_on_if_down" (defaults to 1 to match previous
behavior): set this to 0 to have the driver bring the physical link down when
the interface is brought administratively down
- Add sysctl "temp" to read chip temperature on E810 devices; this requires a
4.30 or newer NVM (see package sysutils/intel-nvmupdate-100g)
Bug fixes and general changes
- (linked to irdma) properly propagate PF reset request from irdma driver
- (linked to irdma) properly notify irdma of an impending PF reset
- (linked to irdma) move Protocol Engine error handling to irdma
- Print log message when using a DDP that doesn't support the "TX balancing"
mode
- Block LLDP agent configuration when DSCP QoS mode is enabled
- Fix kernel panic when updating NVM when adapter is in the "TX balancing" mode
- Remove ice_sbq_cmd.h since it's unused
- Fix LLDP RX filter to still allow LLDP frames to be received by SW after a PF
reset in SW LLDP mode
- Add ice_if_needs_restart handler in order to fix a bad VLAN and link down
interaction
- Issue PF reset during unload
- nvmupdate process fixes
- Use pci_msix_table_bar() to get MSI-X bar index at runtime instead of hardcoding it
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: anzhu@netapp.com
MFC after: 3 days
Sponsored by: Intel Corporation, NetApp
Differential Revision: https://reviews.freebsd.org/D41655
Removing artificial completion generator as there had been no indication
of the code being required for E810 cards. Further more it was found
that the code may have unpleasant side effects on user experience when
using ucmatose tool.
It was found through testing that when ULP uses individual vnet, the
search for the correct vlan_id may failing because of no proper
interface with given address.
The solution is to use vnet associated to the connection whenever
possible.
Doug Moore [Wed, 13 Sep 2023 18:17:57 +0000 (13:17 -0500)]
powerpc pmap: initialize kernel pmap radix trie
Commit 2d2bcba7ba70141388729ed49674b36fd01146c5 changed radix trie
implementation and made it necessary that radix tries be initialized
with vm_radix_init. @dbaio reports that in some configurations, there
is a powerpc boot panic and that this commit introduced the
problem. In powerpc/aim/mmu_radix.c, the radix trie in kernel_pmap is
initialized by zeroing all its fields.
Add a call to vm_radix_init to properly initialize
kernel_pmap->pm_radix.
89f361f742ae added a mechanism to allow arbitrary overrides from the
command line. Unfortunately, it also had the (likely unintended)
effect of allowing RELEASE and VERSION to be passed in from the
environment, and Makefile.inc1 happens to define VERSION for the
benefit of pkgbase. To restore the status quo, unset RELEASE and
VERSION at the top of the script.
Newer versions of the AX88179 interweave dummies alongside valid
packet headers in bulk IN transfer data. This was probably done for
backward compatibility with existing drivers.
However current driver records these dummy headers as dropped frames,
leading to stats misreporting one Ierr per Ipkt.
This skips those dummy headers silently, thereby not generating Ierrs
for them.
Wei Hu [Wed, 13 Sep 2023 10:48:02 +0000 (10:48 +0000)]
mana: add ioctl to support toggling offloading features
With this support, users can enable or disable offloading features
such as txcsum, rxcsum, tso and software lro through ifconfig.
To enable or disable tx features, do it on mana interface first,
then hn/netvsc to sync it up with mana. For example:
ifconfig mana0 -txcsum
ifconfig hn0 -tscsum
To enable or disable rx features, just applying on mana interface
would be sufficient.
Disabling txcsum imples disabling tso. Enabling tso when txcsum
is disabled will result in an error message in dmesg requesting
to enable txcsum first.
Above applies to ipv6 offloading features as well.
Tested by: whu
MFC after: 3 days
Sponsored by: Microsoft
ipfilter: Avoid allocating a new ipf token when not needed
Only allocate a new ipftoken_t if one cannot be found. This eliminates
allocating unnecessary token structures that will never be used when
performing simple lookups for existing token structures.
Michael Tuexen [Tue, 12 Sep 2023 23:33:54 +0000 (01:33 +0200)]
sctp: improve shutting down the read side of a socket
When shutdown(..., SHUT_RD) or shutdown(..., SHUT_RDWR) is called,
really clean up the read queue and issue an ungraceful shutdown if
user messages are affected.
Reported by: syzbot+d4e1d30d578891245f59@syzkaller.appspotmail.com
MFC after: 3 days
ncurses: avoid hardcoded assumptions about the layout of .OBJDIR
Abstract out the details of the FreeBSD build into a $TINFO_OBJDIR that
external builds can override if they orchestrate the build a bit
differently and have a different objdir layout as a result. This makes
the ncurses build a little bit more flexible without requiring weird
backflips.
Reviewed by: bapt, sjg
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D41834
sh: introduce a function to iterate over all aliases
Currently the data structure holding alias information is opaque for
consumers outside alias.c and there is no way to iterate over all
aliases, which will become needed by a future commit.
The new function "iteralias" takes a null pointer to return the first
alias or an existing alias to return the next one, unless there is
no alias to return, in which case it returns a null pointer.
I slightly changed the static function hashalias so that it returns the
index into the array holding link heads, and not the link head directly.
In this form it's easier to use by iteralias and the slight adjustment
in the three existing callers doesn't look too bad.
Alan Somers [Tue, 12 Sep 2023 01:20:39 +0000 (19:20 -0600)]
Fix zfsd with the device_removal pool feature.
Previously zfsd would crash in the presence of a pool with a
top-level-vdev that had previously been removed. The crash happened
because the configuration nvlist of such a TLV contains an empty
ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty
list, which has undefined behavior.
The crash only happened in stable/14 and later, probably do to
differences in libcxx, but the change should be MFCed anyway.
Doug Moore [Tue, 12 Sep 2023 13:10:15 +0000 (08:10 -0500)]
powerpc_pmap: include proper vm_radix file
Like the pmap.h files for amd64, arm64, i386 and riscv, the one for powerpc
should #include vm/_vm_radix.h, not vm/vm_radix.h, to get a definition for
struct vm_radix.
pkgbase: Move headers and libs out of runtime and utilities
Headers from src/include were in the runtime-dev package but
subdirectories of src/include ended up in utilities-dev by default.
Neither package is a good choice - the headers in src/include are not
useful without the libraries contained in clibs-dev.
This moves the standard C headers to clibs-dev (C++ headers are already
in this package). While working on this, I found that various clang
libraries and headers were also bundled into utilities-dev by default
so these are also moved to clang-dev.
I also added a FreeBSD-build-essential meta package to make it simple to
install all the toolchain parts.
Doug Moore [Tue, 12 Sep 2023 07:42:38 +0000 (02:42 -0500)]
radix_trie: have vm_radix use pctrie code
Implement everything currently in vm_radix.c with calls to functions
in subr_pctrie.c, asccessed via the interface provided by the
DEFINE_PCTRIE_SMR macro.
Add back some #includes removed in the first attempt, and avoid the
use of a discontinued type in a bit of conditionally compiled code.
Xin LI [Tue, 12 Sep 2023 06:24:08 +0000 (23:24 -0700)]
Disable byteswap.h for now.
Ideally we should be testing __FreeBSD_version (1400079) and/or
BOOTSTRAPPING from an older version, but restore compatibility to
older FreeBSD versions and macOS while we find out a better way to
fix it.
lib/libc/amd64/string/strcspn.S: fix behaviour with sets of 17--32
When a string is matched against a set of 17--32 characters, each chunk
of the string is matched first against the first 16 characters of the
set and then against the remaining characters. We also check at the
same time if the string has a nul byte in the current chunk, terminating
the search if it does.
Due to misconceived logic, the order of checks was "first half of set,
nul byte, second half of set", meaning that a match with the second half
of the set was ignored when the string ended in the same 16 bytes.
Reverse the order of checks to fix this problem.
Sponsored by: The FreeBSD Foundation
Approved by: mjg (blanket, via IRC)
MFC after: 1 week
MFC to: stable/14
stand/loader.efi: fix regression with ignoring nvstore
To read/update the boot loader nvstore, we always need to call
zfs_attach_nvstore() regardless of whether we use bootonce key
in nvstore or the bootfs property of the pool. The call was
unintentionally left in the block of code that is processed
only when bootonce key is present.
Switch the repository to use https by default, base is providing a CA
root bundle suitable to validate the certificates used by the project.
This can now be activated without requiring another packages to be installed
Doug Moore [Mon, 11 Sep 2023 06:53:40 +0000 (01:53 -0500)]
radix_trie: have vm_radix use pctrie code
Implement everything currently in vm_radix.c with calls to functions
in subr_pctrie.c, asccessed via the interface provided by the
DEFINE_PCTRIE_SMR macro.
If we hit the csfailed case in pf_create_state() we may have allocated
a state, so we must also free it. While here reduce the amount of
duplicated cleanup code.
cxgbe(4): Fix tracing with netlink enabled kernels.
1. The tracing ifnet's name must match the nexus name. One way to do
this is to not use a unit number.
2. Do not hold the tracer lock around ether_ifattach or ether_ifdetach.
netlink calls back into the driver (see stack below) and that leads
to illegal lock recursion.
If we receive a state with a route-to interface name set and we can't
find the interface we do not insert the state. However, in that case we
must still clean up the state (and state keys).
Do so, so we do not leak states.
lib/libc/amd64/string/memchr.S: fix behaviour with overly long buffers
When memchr(buf, c, len) is called with a phony len (say, SIZE_MAX),
buf + len overflows and we have buf + len < buf. This confuses the
implementation and makes it return incorrect results. Neverthless we
must support this case as memchr() is guaranteed to work even with
phony buffer lengths, as long as a match is found before the buffer
actually ends.
Sponsored by: The FreeBSD Foundation
Reported by: yuri, des
Tested by: des
Approved by: mjg (blanket, via IRC)
MFC after: 1 week
MFC to: stable/14
PR: 273652
John Baldwin [Sat, 9 Sep 2023 19:13:57 +0000 (12:13 -0700)]
gic_acpi: Limit the number of CPUs to GIC_MAXCPU
madt_table_data contains an array of pointers for each CPU and was
allocated on the stack. If MAXCPU is raised to a sufficiently large
value this can overflow the kernel stack. Cap the stack growth by
using GIC_MAXCPU instead as for other parts of the gicv1/v2 driver in
commit a0e20c0ded1a.