Brooks Davis [Tue, 5 Dec 2023 19:03:35 +0000 (19:03 +0000)]
Cirrus-CI: fix git usage by build user
The git checkout is owned by root, but builds are run as "user". git
refuses to operate in such an environment unless the directory is
trusted, so make "user" trust it.
Mark Johnston [Tue, 5 Dec 2023 18:47:03 +0000 (13:47 -0500)]
bhnd: Correct the softc size in the siba_bhndb_driver definition
struct siba_bhndb_softc embeds struct siba_softc and adds an extra
field, "quirks". In practice, this bug was harmless since "quirks" is
unconditionally initialized during driver attach and would have lived in
the redzone of the softc allocation, but KASAN catches the out-of-bounds
access.
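The size mismatch can be illustrated with a small standalone sketch. Every member below except `quirks` is a hypothetical stand-in (the real `struct siba_softc` looks different). The point is only that a subclass softc embedding its parent needs `sizeof` of the subclass, not of the parent, wherever the driver declares its softc size (for example the size argument of a `DEFINE_CLASS_1()`-style declaration).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified parent softc; the real one has other members. */
struct siba_softc {
	void		*dev;
	uint32_t	 num_cores;
};

/* Subclass softc: embeds the parent first, then appends one field. */
struct siba_bhndb_softc {
	struct siba_softc	siba;
	uint32_t		quirks;
};

/* Byte offset just past the appended "quirks" field. */
static size_t
quirks_end(void)
{
	return (offsetof(struct siba_bhndb_softc, quirks) + sizeof(uint32_t));
}

/* Nonzero if a softc allocation of 'size' bytes can hold "quirks". */
static int
softc_size_ok(size_t size)
{
	return (size >= quirks_end());
}

/* Stand-in for attach, which initializes "quirks" unconditionally. */
static void
attach_sketch(struct siba_bhndb_softc *sc, size_t allocated)
{
	/* With only sizeof(struct siba_softc) bytes this store is out of bounds. */
	assert(softc_size_ok(allocated));
	sc->quirks = 0;
}
```

Allocating only `sizeof(struct siba_softc)` puts the trailing field past the end of the allocation. As the text notes, in practice that landed in the allocation's redzone, which is why the bug was harmless but still visible to KASAN.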
Kristof Provost [Wed, 29 Nov 2023 18:06:31 +0000 (19:06 +0100)]
pf: remove incorrect fragmentation check
We do not need to check PFDESC_IP_REAS while tracking TCP state.
Moreover, this check incorrectly considers no-data packets (e.g. RST) to
be in-window when this flag is not set.
Loss of the trailing space in the multi-line format string has
resulted in the column names being emitted as "FAILSLEEP" instead of
two columns "FAIL" and "SLEEP".
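The failure mode is ordinary C string-literal concatenation: adjacent literals are joined with nothing in between, so a header split across source lines must keep its separating space explicitly. The sketch below uses only the two column names from the text; the real format string (and the utility it belongs to) is not shown here, and `count_columns()` is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Buggy: the trailing space after "FAIL" was lost, so the pieces fuse. */
static const char header_buggy[] =
    "FAIL"
    "SLEEP";

/* Fixed: the separator lives at the end of the first piece. */
static const char header_fixed[] =
    "FAIL "
    "SLEEP";

/* Count whitespace-separated column names in a header line. */
static size_t
count_columns(const char *hdr)
{
	size_t n = 0;
	int in_word = 0;

	assert(hdr != NULL);
	for (; *hdr != '\0'; hdr++) {
		if (isspace((unsigned char)*hdr))
			in_word = 0;
		else if (!in_word) {
			in_word = 1;
			n++;
		}
	}
	return (n);
}
```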
Brooks Davis [Mon, 4 Dec 2023 20:36:08 +0000 (20:36 +0000)]
Remove never implemented sbrk and sstk syscalls
Both system calls were stubs returning EOPNOTSUPP and libc did not
provide _ or __sys_ prefixed symbols. The actual implementation of
sbrk(2) is on top of the undocumented break(2) system call.
Technically this is a change in ABI, but no non-contrived program ever
called these syscalls.
Gleb Smirnoff [Mon, 4 Dec 2023 18:19:46 +0000 (10:19 -0800)]
kern/subr_trap.c: repair the HPTS performance hack in userret()
It wasn't functional as subr_trap.c doesn't include opt_inet.h. Put a
better comment provided by gallatin@ in place of the old one. The idea
is to use userret() as a cheap place to call a soft clock. This approach
saves CPU on busy machines and saves power on idle machines.
An alternative would be to constantly schedule callouts. Running with
neither callouts nor the soft clock ruins HPTS precision.
Gleb Smirnoff [Mon, 4 Dec 2023 18:19:46 +0000 (10:19 -0800)]
tcp/hpts: make stacks responsible for clearing themselves out of HPTS
There already is the tfb_tcp_timer_stop_all method that is supposed to stop
all time events associated with a given tcpcb by a given stack. At one time
it actually called callout_stop(). Today bbr/rack just mark their
internal state as inactive in their tfb_tcp_timer_stop_all methods, but the
tcpcb stays in the HPTS wheel and can still be called from HPTS. Change the
methods to also call tcp_hpts_remove(). Note: I'm not sure if the internal
flag is still relevant once we are out of the HPTS wheel.
Call the method when the connection goes into the TCP_CLOSED state, instead
of calling it later when the tcpcb is freed. Also call it when we switch
between stacks.
Gleb Smirnoff [Mon, 4 Dec 2023 18:19:46 +0000 (10:19 -0800)]
lro: separate HPTS specific code into tcp_lro_hpts.c
Put the same copyright header as tcp_hpts.c has, since all this code
was developed by Randall Stewart <rrs@FreeBSD.org> as a part of
the HPTS work. Also copy the Mellanox copyright from tcp_lro.c, as
Hans Petter Selasky also participated in restructuring the code.
Gleb Smirnoff [Mon, 4 Dec 2023 18:18:56 +0000 (10:18 -0800)]
if_tuntap: fix NOIP build
Note: this removes one TUNDEBUG() for the sake of not having one more
#ifdef'd variable declaration and for overall code brevity. The call
from tuntap into LRO can be traced so easily with dtrace(1) that this
80-ish printf(9)-based debugging can be omitted.
Mark Johnston [Mon, 4 Dec 2023 17:29:11 +0000 (12:29 -0500)]
ossl: Move arm_arch.h to a common subdirectory
OpenSSL itself keeps only a single copy of this header. Do the same in
sys/crypto/openssl to avoid the extra maintenance burden. This requires
adjusting the include paths for generated asm files.
Emmanuel Vadot [Fri, 1 Dec 2023 09:27:59 +0000 (10:27 +0100)]
autofs: media: Always use sync option for fat*
Users of autofs for removable media expect to be able to copy files and
directly remove the media without needing to call sync(8) or umount(8).
Only do that for fat/ntfs filesystems.
Rick Macklem [Sun, 3 Dec 2023 23:31:01 +0000 (15:31 -0800)]
nfscl: Fix processing of a rare Rename reply case
When delegations are enabled (they are not by default in
the FreeBSD NFSv4 server), rename will check for and return
delegations. If the second of these DelegReturn operations
were to fail (they rarely do), then the code would not retry
the rename with returning delegations, as it is intended to do.
The patch fixes the problem, since the DelegReturn reply status to check
is in the second iteration of the loop, not the first.
As noted, this bug would have rarely manifested a problem, since
DelegReturn operations do not normally fail.
Colin Percival [Sun, 3 Dec 2023 21:39:30 +0000 (13:39 -0800)]
release/Makefile.vm: Rework emulator-portinstall
The emulator-portinstall target now unconditionally ensures that qemu
is installed, but it is only invoked when needed (i.e., when
cross-building VM images).
Kirk McKusick [Sun, 3 Dec 2023 20:36:42 +0000 (12:36 -0800)]
Increase UFS/FFS maximum link count from 32767 to 65530.
The link count for a UFS/FFS inode is stored in a signed 16-bit
integer. Thus the maximum link count has been 32767.
This limit has been recently hit by the poudriere build system when
doing a ports build as it needs one directory per port and the
number of ports recently passed 32767.
A long-term solution would be to use one of the spare 32-bit fields
in the inode to store the link count. However, the UFS1 format does
not have a spare field, and using a spare field in UFS2 would make it
hard to stay compatible with older kernels that use the
original link count field. So this patch uses the much simpler
approach of changing the existing link count field from a signed
16-bit value to an unsigned 16-bit value. It requires the fewest
lines of code change. The only thing that changes is the type in the
dinode and inode structures and the definition of UFS_LINK_MAX. It
has the added benefit that it works with both UFS1 and UFS2.
It allows easy backward compatibility. Indeed it is backward
compatibility that is the primary reason to go with this approach.
If a filesystem with the new organization is mounted on an older
kernel, it still needs to work. Thus if we move the new link count
to a new field, we still need to maintain the old link count as
best as possible even when running on a kernel that knows about the
larger link counts. And we would have to carry this overhead for
the indefinite future.
If we have a new link-count field, we will have to add a new
filesystem flag to indicate that we are running with larger link
counts. We will also need to add one of the new-feature flags
to say that we have larger link counts. Older kernels clear the
new-feature flags that they do not know about, so when a filesystem
is used on an older kernel and then moved back to a newer one, the
newer one will know that the new link counts have not been maintained
and that it will be necessary to run a full fsck on the filesystem
to correct the link counts before it can be mounted.
With this change, older kernels will generally work with the bigger
counts. While an older kernel will not itself allow the link count to
exceed 32767, it will have no problem working with inodes that have a
link count greater than 32767. Since it tests that i_nlink <= UFS_LINK_MAX,
counts that are bigger than 32767 will appear negative and so will
still pass the test. Of course, if they ever drop below 32767, they
will no longer be able to exceed 32767. The one issue is that if the
link count ever exceeds 65535, it will wrap to zero and the
older kernel will be none the wiser. But this corner case is likely
to be very rare since these kernels and the applications running
on them do not expect to be able to get link counts over 32767. And
over time, the use of new filesystems on older kernels will become
rarer and rarer.
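The compatibility argument rests on the same 16 bits being read as signed by older kernels and as unsigned by newer ones. The sketch below models only that; `OLD_LINK_MAX` and `NEW_LINK_MAX` are illustrative names for the old (32767) and new (65530) values of `UFS_LINK_MAX`, and the helper functions are hypothetical, not UFS code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OLD_LINK_MAX	32767	/* UFS_LINK_MAX before this change */
#define NEW_LINK_MAX	65530	/* UFS_LINK_MAX after this change */

/* Both views must describe the same 16-bit on-disk field. */
static_assert(sizeof(int16_t) == sizeof(uint16_t), "field width mismatch");

/* How an older kernel sees the on-disk field: as a signed value. */
static int16_t
old_kernel_view(uint16_t ondisk)
{
	int16_t v;

	/* Reinterpret the same bits; avoids an implementation-defined cast. */
	memcpy(&v, &ondisk, sizeof(v));
	return (v);
}

/* The older kernel's check, i_nlink <= UFS_LINK_MAX, on its signed view. */
static int
old_kernel_accepts(uint16_t ondisk)
{
	return (old_kernel_view(ondisk) <= OLD_LINK_MAX);
}

/* A newer kernel may add a link while the count is below its limit. */
static int
new_kernel_can_link(uint16_t nlink)
{
	return (nlink < NEW_LINK_MAX);
}

/* The 16-bit field cannot hold 65536: an increment past 65535 wraps to 0. */
static uint16_t
unchecked_increment(uint16_t nlink)
{
	return ((uint16_t)(nlink + 1));
}
```

A count of 40000 reads as a negative number on an older kernel, so its `<= 32767` test still passes, while a newer kernel treats it as an ordinary unsigned count below 65530.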
Reported-by: Mark Millard running poudriere on the ports tree
Reviewed-by: kib, olce.freebsd_certner.fr
Tested-by: Peter Holm, Mark Millard
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D42767
If `first_guess' is zero, main() assumes that locate_hunk() has failed
and aborts the patch operation. Instead, make sure to return 1 (the
line number) so that the patch operation can continue.
Issue originally found by Neels Hofmeyr in the regress suite of the diff
implementation for got, where the tests assume that applying a diff with
`patch' and then again with `patch -R' yields back the original file.
Previously we were trying to set hca_cap_2 without checking the
sw_vhca_id_valid max value, which is the only settable value inside
hca_cap_2. Since we don't have driver support for sw_vhca_id yet,
there is no need to set hca_cap_2 at all; it is enough to query it.
Fixes: 7b959396ca6fae5635260131eedb9bc19f2726a3 ("mlx5: Introduce new destination type TABLE_TYPE")
MFC after: 3 days
Bjoern A. Zeeb [Sun, 12 Nov 2023 20:33:41 +0000 (20:33 +0000)]
wpa: ctrl_iface set sendbuf size
To avoid running into the current default net.local.dgram.maxdgram
of 2K when calling sendto(2), try to set the sndbuf size to
the maximum ctrl message size.
While on 14 and 15 this no longer actually raises the limit (be7c095ac99ad29fd72b780c7d58949a38656c66 raised it for syslogd and for this case),
FreeBSD 13 still requires this change and it will work as expected there.
In addition, this way we always ensure a large enough send buffer
independent of kernel defaults.
The problem occurred, e.g., when the scan_list result had enough BSSIDs
so the text output would exceed 2048 bytes.
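The change amounts to a setsockopt(2) call on the control socket. A minimal sketch, assuming a hypothetical `CTRL_MSG_MAX` (name and value are illustrative, not taken from wpa_supplicant) stands in for the maximum ctrl message size; the helper names are also hypothetical. As noted above, on 14 and later the request no longer raises the datagram limit by itself.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the maximum control message size. */
#define CTRL_MSG_MAX	8192

/*
 * Ask for a send buffer at least as large as the largest control reply,
 * rather than relying on kernel defaults.  Returns 0 on success, -1 on
 * failure with errno set by setsockopt(2).
 */
static int
ctrl_set_sndbuf(int fd)
{
	int size = CTRL_MSG_MAX;

	return (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)));
}

/* Read back the effective send buffer size, or -1 on error. */
static int
ctrl_get_sndbuf(int fd)
{
	int size;
	socklen_t len = sizeof(size);

	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) == -1)
		return (-1);
	assert(len == sizeof(size));
	return (size);
}
```

Some kernels adjust the stored value (Linux, for example, doubles it), so a read-back check should use `>=` rather than `==`.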
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
PR: 274990
Reviewed by: cy, adrian (with previous comment)
Differential Revision: https://reviews.freebsd.org/D42558
Jessica Clarke [Fri, 1 Dec 2023 23:59:07 +0000 (23:59 +0000)]
armv8rng: Don't require toolchain to support FEAT_RNG
We have the mechanism in place to support encoding system registers
explicitly, so use that rather than requiring LLVM 13+, which breaks our
current set of GitHub CI builds.
Fixes: 9eecef052155 ("Add an Armv8 rndr random number provider")
Gleb Smirnoff [Fri, 1 Dec 2023 23:37:29 +0000 (15:37 -0800)]
unix/dgram: bump maximum datagram size limit to 8k
This is important for wpa_supplicant operation on a crowded network.
Note: we actually need an API to increase the maximum datagram size on a
socket. Previously SO_SNDBUF magically acted like that, but that was
an undocumented "feature".
Also move the comment to the proper line. Previously it was the receive
buffer that imposed the limit. Now the notions of buffer size and maximum
datagram size are separate.
Bjoern A. Zeeb [Thu, 26 Oct 2023 21:14:44 +0000 (21:14 +0000)]
LinuxKPI: 802.11: bring in some HT code
Fix defines and structures to use proper types.
Bring in basic ni->sta synchronization, some channel width handling,
and overload the net80211 functions so that we can talk to the
driver/firmware to set up parameters. We will likely not need one
or two of those, but they are good for tracing currently.
Cover HT and bits of VHT code in LinuxKPI behind appropriate #ifdefs,
which are currently not enabled (like LKPI_80211_HW_CRYPTO) until
confirmed to work.
Last, IEEE80211_AMPDU_RX_START made some firmware unhappy.
This will allow others to work on it and test as well.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Brooks Davis [Fri, 1 Dec 2023 20:48:29 +0000 (20:48 +0000)]
sysvipc: Fix 32-bit compat on !i386
The various time fields are time_t's which are only 32-bit on i386.
Fixing the old versions is probably of little use, but it's more correct
and in theory there could be powerpc binaries from 6.x.
PR: 240035
Fixes: fbb273bc05bef Properly support for FreeBSD 4 32bit System V shared memory.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D42870
Bjoern A. Zeeb [Fri, 1 Dec 2023 01:37:25 +0000 (01:37 +0000)]
tools/net80211: add mlme_assoc
mlme_assoc is a tool to trigger net80211::ieee80211_sta_join1() calls,
which under certain conditions cause problems for the LinuxKPI 802.11 compat
code (and are also believed to possibly cause problems with races in
other firmware-based drivers). This has proven to be a good reproducer
for the problem even on setups which otherwise could run for days without
hitting it.
Bjoern A. Zeeb [Fri, 3 Nov 2023 21:19:26 +0000 (21:19 +0000)]
Revert "Widen EPOCH(9) usage in PCI WLAN drivers."
This reverts commit b65f813c1ab99448278961c5ca80dc422b1eae29.
As a side effect this also seems to fix wtap which seems to have
lost the epoch over the input path in between.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Bjoern A. Zeeb [Sun, 29 Oct 2023 14:25:23 +0000 (14:25 +0000)]
net80211: move net_epoch into net80211
Move the net_epoch into net80211 around the if_input calls and out of
the driver (in this first case LinuxKPI). This reduces coverage but
also allows us to alloc in calls like (*ampdu_rx_start) which do not
actually pass data up the stack.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Tested by: few (rtwn, ath, iwlwifi, ...)
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D42427
Many languages use different case endings depending on whether the month
is referenced as a standalone word (nominative case), or in date context
(genitive, partitive, etc.). sort(1)'s -M option currently sorts months
by testing input against only the abbreviation format, which is
essentially a substring of the full format. While this works fine for
languages like English, where there are no cases, for languages where
there is a different case ending between the abbreviation/full and
standalone formats, it is not sufficient.
For example, in Greek, "May" can take the following forms:
RTLD_DEEPBIND: make lookup not just symbolic, but walk all refobjs' DAGs
before starting the walk over the global list. Effectively we visit
needed objects first as well, instead of just the object itself.
This seems to better match the semantic offered by the glibc flag.
Reported by: kevans
PR: 275393
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42841
Mark Johnston [Thu, 30 Nov 2023 17:46:08 +0000 (12:46 -0500)]
ossl: Add support for armv7
OpenSSL provides implementations of several AES modes which use
bitslicing and can be accelerated on CPUs which support the NEON
extension. This patch adds arm platform support to ossl(4) and provides
an AES-CBC implementation, though bsaes_cbc_encrypt() only implements
decryption. The real goal is to provide an accelerated AES-GCM
implementation; this will be added in a subsequent patch.
Initially derived from https://reviews.freebsd.org/D37420.
Mark Johnston [Wed, 29 Nov 2023 20:08:12 +0000 (15:08 -0500)]
ossl: Fix some bugs in the fallback AES-GCM implementation
gcm_*_aesni() are used when the AVX512 implementation is not available.
Fix two bugs which manifest when handling operations spanning multiple
segments:
- Avoid underflow when the length of the input is smaller than the
residual.
- In gcm_decrypt_aesni(), ensure that we begin the operation at the
right offset into the input and output buffers.
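The first bug is the classic unsigned-subtraction underflow. A minimal sketch, assuming "residual" means the number of bytes still needed to complete a partially filled block; `take_residual()` is a hypothetical helper, not the ossl(4) code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Consume up to 'residual' bytes of a 'len'-byte input to finish a
 * partially filled block.  Returns the bytes consumed and stores what
 * is left of the input in *remaining.
 */
static size_t
take_residual(size_t len, size_t residual, size_t *remaining)
{
	size_t n;

	assert(remaining != NULL);
	/* Buggy form: len - residual wraps to a huge value when len < residual. */
	n = len < residual ? len : residual;
	*remaining = len - n;	/* safe: n <= len */
	return (n);
}
```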
Reviewed by: jhb
Fixes: 9b1d87286c78 ("ossl: Add a fallback AES-GCM implementation using AES-NI")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42838
Gleb Smirnoff [Thu, 30 Nov 2023 16:30:55 +0000 (08:30 -0800)]
sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just like it was done for accept(2) in cfb1e92912b4, use the same approach
for two simpler syscalls that return socket addresses. Although
these two syscalls aren't performance critical, this change generalizes
some code among the three syscalls, trimming code size.
Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.
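From userland, the "caller supplies the storage" pattern looks like this: a `struct sockaddr_storage` on the stack is large enough for any address family, and `addrlen` goes in as the buffer size and comes back as the address length. This is a sketch of ordinary getsockname(2) usage, not of the kernel-side sopeeraddr()/sosockaddr() wrappers; `bound_loopback_port()` is a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind a UDP socket to loopback on an ephemeral port and ask the kernel
 * which port it chose.  Returns 0 on success, -1 on failure.
 */
static int
bound_loopback_port(in_port_t *port, socklen_t *addrlen)
{
	struct sockaddr_storage ss;	/* big enough for any address family */
	struct sockaddr_in sin;
	int error, fd;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(0);		/* let the kernel pick a port */
	error = bind(fd, (struct sockaddr *)&sin, sizeof(sin));
	if (error == 0) {
		*addrlen = sizeof(ss);		/* in: size of our buffer */
		error = getsockname(fd, (struct sockaddr *)&ss, addrlen);
	}
	close(fd);
	if (error != 0)
		return (-1);
	/* out: *addrlen now holds the length of the returned address */
	assert(ss.ss_family == AF_INET);
	*port = ntohs(((struct sockaddr_in *)&ss)->sin_port);
	return (0);
}
```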
Gleb Smirnoff [Thu, 30 Nov 2023 16:30:55 +0000 (08:30 -0800)]
sockets: don't malloc/free sockaddr memory on accept(2)
Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may provide
smaller structure.
While rewriting accept(2), make 'addrlen' a true in/out parameter, reporting
the required length if the provided length was insufficient. Neither our
accept(2) manual page nor POSIX explicitly requires that, but one can read
the text as requiring it. Linux also does that. Update tests accordingly.
Jamie Gritton [Thu, 30 Nov 2023 00:12:13 +0000 (16:12 -0800)]
jail: Don't allow jail_set(2) to resurrect dying jails.
Currently, a prison in "dying" state (removed but still holding
resources) can be brought back to alive state via "jail -d", or
the JAIL_DYING flag to jail_set(2). This seemed like a good idea
at the time.
Its main use was to improve support for specifying the jid when
creating a jail, which also seemed like a good idea at the time.
But resurrecting a jail that was partway through the process of
shutting down is trouble waiting to happen.
This patch deprecates that flag, leaving it as a no-op for creating
jails (but still useful for looking at dying jails). It still allows
creating a new jail with the same jid as a dying one, but will renumber
the old one in that case. That's imperfect, but allows for current
behavior.
The number of events we track can vary over time, but we only allocate
enough space for the exact number of events we are tracking when we
first begin, resulting in a trivially reproducible heap overflow. Fix
this by allocating enough space for the greatest possible number of
events (two per file) and clean up the code a bit.
Also add a test case which triggers the aforementioned heap overflow,
although we don't currently have a way to detect it.
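The sizing rule in the fix can be sketched as follows. The event type and helper are hypothetical (the program and its real event structure are not named here); the only detail taken from the text is the upper bound of two events per file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event record; the real program uses its own type. */
struct watch_event {
	int	fd;
	int	kind;	/* which of the two per-file event types */
};

#define EVENTS_PER_FILE	2	/* greatest possible number of events per file */

/*
 * Size the array for the worst case, not for the number of events
 * active right now, so later registrations cannot overflow it.
 */
static struct watch_event *
alloc_events(size_t nfiles, size_t *capacity)
{
	assert(capacity != NULL);
	if (nfiles > SIZE_MAX / EVENTS_PER_FILE)
		return (NULL);
	*capacity = nfiles * EVENTS_PER_FILE;
	return (calloc(*capacity, sizeof(struct watch_event)));
}
```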
Bjoern A. Zeeb [Wed, 29 Nov 2023 21:33:23 +0000 (21:33 +0000)]
iwlwififw: add firmware for the Bz/B200 chipset
The iwlwifi driver already supports the chipset as "Bz TBD"
(also in 14.0). Add the firmware for it. Successfully tested
for 0x8086/0x272b/0x8086/0x00f4 on arm64 thanks to donated
hardware [1].
vt(4): Call post-switch callback after replacing the backend
[Why]
For instance, it gives a chance to the new backend to refresh the
screen. This is needed by the vt_drmfb backend and `drm_fb_helper`.
This change was lost when I posted changes to reviews.freebsd.org and it
broke the amdgpu driver... Thanks to manu@ for reporting the problem
and to wulf@ for finding the missing change!
Tested by: manu
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D42834
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay. Later zil_free_clone_range() drops them after replay or on
dataset destroy. The total balance is neutral. This means that on actual
replay we must take additional references, which would stay in the BRT.
Without this blocks could be freed prematurely when either original
file or its clone are destroyed. I've observed BRT being emptied
and the feature being deactivated after ZIL replay completion, which
should not have happened. With the patch I see expected stats.
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15603
John Baldwin [Wed, 29 Nov 2023 18:31:47 +0000 (10:31 -0800)]
pci_cfgreg: Add a PCI domain argument to the low-level register API
This commit changes the API of pci_cfgreg(read|write) to add a domain
argument (referred to as a segment in ACPI parlance). Note that this
is not the same as a NUMA domain, but something PCI-specific. This
does not yet enable access to domains other than 0, but updates the
API to support domains.
Places that use hard-coded bus/slot/function addresses have been
updated to hardcode a domain of 0. A few places that have the PCI
domain (segment) available such as the acpi_pcib_acpi.c Host-PCI
bridge driver pass the PCI domain.
The hpt27xx(4) and hptnr(4) drivers fail to attach to a device not on
domain 0 since they provide APIs to their binary blobs that only
permit bus/slot/function addressing.
The x86 non-ACPI PCI bus drivers all hardcode a domain of 0 as they do
not support multiple domains.
Wraithh [Wed, 29 Nov 2023 17:55:17 +0000 (19:55 +0200)]
Fix zoneid when USER_NS is disabled
getzoneid() should return GLOBAL_ZONEID instead of 0 when USER_NS is disabled.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ilkka Sovanto <github@ilkka.kapsi.fi>
Closes #15560
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15606
Igor Ostapenko [Wed, 29 Nov 2023 12:35:41 +0000 (13:35 +0100)]
pf: fix mem leaks upon vnet destroy
Add missing cleanup actions:
- remove user defined anchor rulesets
- remove user defined ether anchor rulesets
- remove tables linked to user defined anchors
- deal with wildcard anchor peculiarities to get them removed correctly
Mark Johnston [Wed, 29 Nov 2023 17:51:55 +0000 (12:51 -0500)]
ossl: Keep mutable AES-GCM state on the stack
ossl(4)'s AES-GCM implementation keeps mutable state in the session
structure, together with the key schedule. This was done for
convenience, as both are initialized together. However, some OCF
consumers, particularly ZFS, assume that requests may be dispatched to
the same session in parallel. Without serialization, this results in
incorrect output.
Fix the problem by explicitly copying per-session state onto the stack
at the beginning of each operation.
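The pattern is to treat the session as read-only and do all mutation on a per-operation copy. A minimal sketch with hypothetical, simplified types (the real ossl(4) structures and GCM state differ); the counter bump merely stands in for the cipher work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mutable per-request state. */
struct gcm_state {
	uint8_t		counter[16];	/* mutated while processing a request */
	uint64_t	processed;	/* bytes handled so far */
};

/* Hypothetical session: key schedule plus an initial-state template. */
struct gcm_session {
	uint8_t		 key_schedule[240];	/* read-only after setup */
	struct gcm_state initial;		/* never written by requests */
};

/*
 * Process one request.  The session is only read; all mutation happens
 * on a private stack copy, so parallel requests on one session are safe.
 */
static uint64_t
gcm_process(const struct gcm_session *s, size_t len)
{
	struct gcm_state st;	/* private, per-operation copy */

	assert(s != NULL);
	memcpy(&st, &s->initial, sizeof(st));
	/* Stand-in for the real cipher work: only the stack copy changes. */
	st.counter[15]++;
	st.processed += len;
	return (st.processed);
}
```

Because `gcm_process()` takes a `const` session pointer, the compiler rejects accidental writes to shared state; the cost is one small copy per operation.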
PR: 275306
Reviewed by: jhb
Fixes: 9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42783
Warner Losh [Wed, 29 Nov 2023 15:26:29 +0000 (08:26 -0700)]
openzfs: unbreak 32-bit builds.
32-bit builds are broken. Fix that by using PRIu64 instead of a
bare '%lu'.
Feel free to revert when upstream has this fixed. I'm agnostic as to the
proper fix, but don't have the time to fight upstreaming this on top of
everything else.
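Why '%lu' breaks 32-bit builds: `unsigned long` is 32 bits on ILP32 platforms such as i386 and armv7, so it does not match a `uint64_t` argument, while the `<inttypes.h>` macro `PRIu64` always expands to the right conversion. `format_bytes()` is a hypothetical example, not the OpenZFS code in question.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit byte count portably on both ILP32 and LP64 targets. */
static int
format_bytes(char *buf, size_t len, uint64_t bytes)
{
	assert(buf != NULL && len > 0);
	/* Not "%lu": unsigned long may be only 32 bits wide. */
	return (snprintf(buf, len, "%" PRIu64 " bytes", bytes));
}
```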
Alan Somers [Wed, 12 Jul 2023 20:46:27 +0000 (14:46 -0600)]
zfsd: fault disks that generate too many I/O delay events
If ZFS reports that a disk had at least 8 I/O operations over 60s that
were each delayed by at least 30s (implying a queue depth > 4 or I/O
aggregation, obviously), fault that disk. Disks that respond this
slowly can degrade the entire system's performance.
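The fault rule (at least 8 qualifying delay events within 60 seconds) is a sliding-window count. zfsd itself is written in C++; this is only a C sketch of the rule, assuming the input is a sorted list of timestamps for events that already meet the 30-second delay threshold. The helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_COUNT_THRESHOLD	8	/* delayed I/Os needed to fault */
#define DELAY_WINDOW_SECS	60	/* sliding window length */

/*
 * ts[] holds sorted timestamps (seconds) of delay events for one disk,
 * each already delayed by at least 30 s.  Returns 1 if any 60 s window
 * contains at least DELAY_COUNT_THRESHOLD of them.
 */
static int
should_fault(const long *ts, size_t n)
{
	size_t lo = 0;

	for (size_t hi = 0; hi < n; hi++) {
		assert(hi == 0 || ts[hi] >= ts[hi - 1]);
		/* Drop events that fell out of the window ending at ts[hi]. */
		while (ts[hi] - ts[lo] >= DELAY_WINDOW_SECS)
			lo++;
		if (hi - lo + 1 >= DELAY_COUNT_THRESHOLD)
			return (1);
	}
	return (0);
}
```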
Alexander Motin [Wed, 29 Nov 2023 01:50:30 +0000 (18:50 -0700)]
mpi3mr: Make these bus_dmamap_load calls synchronous
These calls "should" all be synchronous. There's no bouncing that's
needed for them (at least in the typical case that we have a sane card
that has more bits of DMA addresses decoded than we have memory), so
no errors are possible. Ensure these calls are really synchronous
with the BUS_DMA_NOWAIT flag (which should never fail now that the
bus_dmamem_alloc() has succeeded).