cxgbe(4): Fix tracing with netlink enabled kernels.
1. The tracing ifnet's name must match the nexus name. One way to do
this is to not use a unit number.
2. Do not hold the tracer lock around ether_ifattach or ether_ifdetach.
netlink calls back into the driver (see stack below) and that leads
to illegal lock recursion.
If we receive a state with a route-to interface name set and we can't
find the interface we do not insert the state. However, in that case we
must still clean up the state (and state keys).
Do so, so we do not leak states.
lib/libc/amd64/string/memchr.S: fix behaviour with overly long buffers
When memchr(buf, c, len) is called with a phony len (say, SIZE_MAX),
buf + len overflows and we have buf + len < buf. This confuses the
implementation and makes it return incorrect results. Neverthless we
must support this case as memchr() is guaranteed to work even with
phony buffer lengths, as long as a match is found before the buffer
actually ends.
Sponsored by: The FreeBSD Foundation
Reported by: yuri, des
Tested by: des
Approved by: mjg (blanket, via IRC)
MFC after: 1 week
MFC to: stable/14
PR: 273652
John Baldwin [Sat, 9 Sep 2023 19:13:57 +0000 (12:13 -0700)]
gic_acpi: Limit the number of CPUs to GIC_MAXCPU
madt_table_data contains an array of pointers for each CPU and was
allocated on the stack. If MAXCPU is raised to a sufficiently large
value this can overflow the kernel stack. Cap the stack growth by
using GIC_MAXCPU instead as for other parts of the gicv1/v2 driver in
commit a0e20c0ded1a.
Warner Losh [Sat, 9 Sep 2023 02:18:33 +0000 (20:18 -0600)]
atkbdc: Add additional heurstic for Chromebook keyboards
It turns out that that heurstic used to determine if we have a Google
coreboot, and thus have the i8042 emulation bugs, is incorrect. At least
one Acer "Peppy" Chromebook has an issue because Acer space'd out the
smbios.bios.version string we're using as part of the heuristic. So, if
the version starts with a space, then enable the workarounds if the
smbios.bios.reldate is 2018 or earlier. While not perfect, it should be
a reasonable dividing line and still allow newer core boot-based
machines that aren't Chromebooks to not have the workaround.
carp: Explicitly mark tunnable net.inet.carp.allow with CTLFLAG_NOFETCH
With recent change 110113bc086f, a vnet tunable can be initialized when
there is a corresponding kernel environment variable unless it is marked
with the flag CTLFLAG_NOFETCH.
The initialization may happen during early boot(linker preload), at that
time vnet0 has not been created. The hander carp_allow_sysctl() for the
tunable net.inet.carp.allow requires vnet, thus invoking it during early
boot will cause kernel panic.
The tunnable is initialized by vnet sysinit routine ipcarp_sysinit() so
let's just mark it with flag CTLFLAG_NOFETCH.
No functional change intended.
Fixes: 110113bc086f sysctl(9): Enable vnet sysctl variables to be loader tunable
MFC after: 2 week
Differential Revision: https://reviews.freebsd.org/D41525
The module preload happens before vnet0 creation, at this moment the vnet
list is empty thus invoking vnet_data_copy() during preload is a noop.
With recent change 110113bc086f, for dynamic module load, aka via kldload,
linker will do vnet propagation right after registering sysctls which
happens after module load, then previous propagation (during module load)
is redundant.
In 3da1cf1e88f8, the meaning of the flag CTLFLAG_TUN is extended to
automatically check if there is a kernel environment variable which
shall initialize the SYSCTL during early boot. It works for all SYSCTL
types both statically and dynamically created ones, except for the
SYSCTLs which belong to VNETs.
This change extends the meaning further, to allow it also works for
the SYSCTLs which belong to VNETs. A typical usage is
```
VNET_DEFINE_STATIC(int, foo) = 0;
SYSCTL_INT(_net, OID_AUTO, foo, CTLFLAG_RWTUN | CTLFLAG_VNET,
&VNET_NAME(foo), 0, "Description of the foo loader tunable");
```
Note that the implementation has a limitation. It behaves the same way
as that of non-vnet loader tunables. That is, after the kernel or modules
being initialized, any changes (e.g. via kenv) to kernel environment
variable will not affect the corresponding vnet variable of subsequently
created VNETs. To overcome it, we can use TUNABLE_XXX_FETCH to fetch
the kernel environment variable into those vnet variables during vnet
constructing.
This change will fix the following SYSCTLs those belong to VNETs and
have CTLFLAG_TUN flag:
```
net.add_addr_allfibs
net.bpf.optimize_writers
net.inet.tcp.fastopen.ccache_buckets
net.link.bridge.inherit_mac
net.link.bridge.ipfw_arp
net.link.bridge.log_stp
net.link.bridge.pfil_bridge
net.link.bridge.pfil_local_phys
net.link.bridge.pfil_member
net.link.bridge.pfil_onlyip
net.link.lagg.default_use_flowid
net.link.lagg.default_use_numa
net.link.lagg.default_flowid_shift
net.link.lagg.lacp.debug
net.link.lagg.lacp.default_strict_mode
```
Although the following vnet SYSCTLs have CTLFLAG_TUN flag, theirs
values are re-fetched via TUNABLE_XXX_FETCH, thus are not affected
by this change.
```
net.inet.ip.reass_hashsize
net.inet.tcp.hostcache.cachelimit
net.inet.tcp.hostcache.hashsize
net.inet.tcp.hostcache.bucketlimit
net.inet.tcp.syncache.bucketlimit
net.inet.tcp.syncache.cachelimit
net.inet.tcp.syncache.hashsize
net.key.spdcache.maxentries
net.key.spdcache.threshold
```
In memoriam: hselasky
Discussed with: hselasky, glebius
Fixes: 3da1cf1e88f8 Extend the meaning of the CTLFLAG_TUN flag ...
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D39638
John Baldwin [Fri, 8 Sep 2023 23:30:35 +0000 (16:30 -0700)]
cxgbe tom: Call t4_rcvd_locked from do_rx_data to return RX credits
In particular, the kernel RPC layer used by the NFS client never
invokes pru_rcvd since it always reads data from the socket upcall
via MSG_SOCALLBCK which avoids calling pru_rcvd. As a result, on an
NFS client connection managed by t4_tom, RX credits were never
returned to the TOE connection to open the TCP window resulting in
connection hangs.
To fix, expand the set of conditions in do_rx_data where RX credits
are returned to match those in t4_rcvd_locked by calling the function
directly.
lib/libc/tests/string: add unit tests for strcspn(3)
We currently use the NetBSD test suite to cover strcspn(3). It only
contains a very rudimentary test of this function. This all new set
of unit tests for the FreeBSD test suite should cover many more edge
cases relating to alignment issues.
Sponsored by: The FreeBSD Foundation
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
Differential Revision: https://reviews.freebsd.org/D41557
This changeset adds both a scalar and an x86-64-v2 implementation
of the strcspn(3) function to libc. A baseline implementation does not
appear to be feasible given the requirements of the function.
The scalar implementation is similar to the generic libc implementation,
but expands the bit set into a byte set to reduce latency, improving
performance. This approach could probably be backported to the generic
C version to benefit other platforms.
The x86-64-v2 implementation is built around the infamous pcmpistri
instruction. An alternative implementation based on the Muła/Langdale
algorithm [1] was prototyped, but performed worse than the pcmpistri
approach except for sets of more than 16 characters with long input
strings.
All implementations provide special cases for the empty set (reduces to
strlen as well as single-character sets (reduces to strchr). The
x86-64-v2 kernel falls back to the scalar implementation for sets of
more than 32 characters. This limit could be raised by additional
multiples of 16 through the use of additional pcmpistri code paths, but
I consider this case to be too rare to be of importance.
Michael Tuexen [Fri, 8 Sep 2023 14:20:51 +0000 (16:20 +0200)]
sctp: cleanup locking for notifications
All notifications are now queued via sctp_ulp_notify(). Do
the locking of the inp read lock there and validate this in all
functions being used.
This is one step in avoiding race conditions when closing the
read end of an SCTP socket.
Mike Karels [Fri, 8 Sep 2023 14:06:42 +0000 (09:06 -0500)]
mountd: do not warn about using class mask with -mask
The previous code would warn that the mask was being defaulted to
an obsolete class mask even if -mask was present after -network.
Import a fix from Peter Much with a little tweaking, deferring the
warning until after all parameters are processed.
PR: 263011
Obtained from: pmc at citilink.dinoex.sub.org
MFC after: 3 days
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D41774
When building the frequencies table we convert the value in the DTS to
megahertz and loose precision. While it's not a problem for most of the
DTS it is when the expected frequency value is strict down to the hertz.
So it's either we don't truncate the value and have some ugly and long
values in the sysctls or we just find the closest frequency.
Do the latter.
The M1 uses FDT, and has bge to start with. Add a SOC_* option for
the first SoC we'll be supporting.
IOMMU is added commented out because it does have it, but IOMMU is not
well-tested on aarch64. An initial version of the DART driver will be
upstreamed that just puts the DARTs that support bypass mode into bypass
mode -- we'll be missing some functionality, but we at least still end
up with some USB ports.
Ed Maste [Thu, 7 Sep 2023 16:32:39 +0000 (12:32 -0400)]
ssh-keygen: Generate Ed25519 keys when invoked without arguments
Ed25519 keys are convenient because they're much smaller, and the next
OpenSSH release (9.5) will switch to them by default. Apply the change
to FreeBSD main now, to help identify issues as early as possible.
Reviewed by: kevans, karels, des
Relnotes: Yes
Obtained from: OpenBSD 9de458a24986
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41773
pf: mark removed connections within a multihome association as shutting down
Parse IP removal in ASCONF chunks, find the affected state(s) and mark
them as shutting down. This will cause them to time out according to
PFTM_TCP_CLOSING timeouts, rather than waiting for the established
session timeout.
MFC after: 3 weeks
Sponsored by: Orange Business Services
Kristof Provost [Wed, 2 Aug 2023 08:44:52 +0000 (10:44 +0200)]
pf tests: basic SCTP multihoming test
The SCTP server will announce multiple addresses. Block one of them with
pf, connect to the other have the client use the blocked address. pf
is expected to have created state for all of the addresses announced by
the server.
In a separate test case add the secondary (client) IP after the
connection has been established. The intent is to verify the
functionality of the ASCONF chunk parsing.
Kristof Provost [Wed, 2 Aug 2023 17:05:00 +0000 (19:05 +0200)]
pf: support SCTP multihoming
SCTP may announce additional IP addresses it'll use in the INIT/INIT_ACK
chunks, or in ASCONF chunks at any time during the connection. Parse these
parameters, evaluate the ruleset for the new connection and if allowed
create the corresponding states.
Ed Maste [Wed, 30 Aug 2023 20:49:44 +0000 (16:49 -0400)]
Update WITH_/WITHOUT_SSP descriptions
ProPolice refers to a specific implementation by Hiroaki Etoh and
Kunikazu Yoda. The implementation in contemporary Clang and GCC is
somewhat different and newer, so use a generic term in the src.conf
descriptions.
Martin Matuska [Thu, 7 Sep 2023 15:18:12 +0000 (17:18 +0200)]
libarchive: merge security fix from vendor branch
This commit fixes a couple of security vulnerabilities in the PAX writer:
1. Heap overflow in url_encode() in archive_write_set_format_pax.c
2. NULL dereference in archive_write_pax_header_xattrs()
3. Another NULL dereference in archive_write_pax_header_xattrs()
4. NULL dereference in archive_write_pax_header_xattr()
Security: No known reference yet
Obtained from: https://github.com/libarchive/libarchive/commit/1b4e0d0f9
MFC after: 3 days
Ed Maste [Thu, 7 Sep 2023 15:10:33 +0000 (11:10 -0400)]
syscons: refer to it as the legacy console
vt(4) is the default console, and although there is no firm deprecation
plan for syscons(4) yet it it is not actively maintained and is not
compatible with contemporary systems (i.e., those booting via UEFI).
Reviewed by: manu
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Each Rx descriptor points to a packet buffer of size 2K, which means
that MTUs greater than 2K see multi-descriptor packets. The TCP-hood of
such packets was being incorrectly determined by looking for a flag on
the last descriptor instead of the first descriptor.
This adds specific width length modifiers in the form of wN and wfN (where N is 8, 16, 32, or 64) which allow printing intN_t and int_fastN_t without resorting to casts or PRI macros.
groff 1.23.0 changed the semantics of the -man parameter, and many
manual pages are not rendered. The -mandoc parameter brings back
the old behavior, as in groff 1.22.4 and earlier.
PR: 273565, 273245
Reviewed by: emaste, bapt
MFC after: 1 week for all supported branches (stable/12, 13, 14)
Differential Revision: https://reviews.freebsd.org/D41737
__crt_aligned_alloc_offset(): fix ov_index for backing allocation address
Wrong value of ov_index resulted in magic check failure, and refuse to
free() the memory allocated with __crt_aligned_alloc_offset().
Then the TLS segments of exited threads leaked.
Colin Percival [Tue, 5 Sep 2023 23:46:38 +0000 (16:46 -0700)]
init_main: Switch from SLIST to STAILQ, fix order
Constructing an SLIST of SYSINITs by inserting them one by one at the
head of the list resulted in them being sorted in anti-stable order:
When two SYSINITs tied for (subsystem, order), they were executed in
the reverse order to the order in which they appeared in the linker
set.
Note that while this changes struct sysinit, it doesn't affect ABI
since SLIST_ENTRY and STAILQ_ENTRY are compatible (in both cases a
single pointer to the next element).
Andrew Turner [Wed, 6 Sep 2023 11:07:41 +0000 (12:07 +0100)]
arm64: Enable FEAT_E0PD when supported
FEAT_E0PD adds two fields to the tcr_el1 special register that, when
set, cause userspace access to either the top or bottom half of the
address spaces without a page walk.
This can be used to stop userspace probing the kernel address space
as the CPU will raise an exception in the same time if the probed
address is in the TLB or not.
Reviewed by: kevans
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D41760
diff3: make the diff3 -E -m and diff3 -m behaviour match gnu diff3
In gnu diff3 3 way merging files where the new file and the target are
already the same will die and show what has failed to be merged except
if -E is passed in argument, in this case it will finish the merge.
This difference in behaviour was breaking one of the etcupdate testcase
with bsd diff3
linuxkpi: fix iteration in __sg_alloc_table_from_pages
Commit 3f686532c9b4 tried to fix an issue with not properly starting
at the first page in the sg list to prevent a panic. This worked but
with the side effect of incrementing "s" during the final iteration
causing it to be NULL since the list had ended.
In cases non-DEBUG kernels this causes a panic with drm-5.15, since
"s" is NULL when we later pass it to sg_mark_end().
This change decouples the iteration sg from the return value so that
it is never incremented past the final page in the chain.
MFC after: 3 days
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D41574
Piotr Kasierski [Wed, 6 Sep 2023 13:51:41 +0000 (09:51 -0400)]
qat: Intel 4xxx Series driver API extension
This commit introduces:
- Quick Assist API update for partial decompress and zero padding.
- Refactor of UIO locking.
- VF driver hotplug fix.
- Minor code style fixes for firmware API.
Patch co-authored by: Krzysztof Zdziarski <krzysztofx.zdziarski@intel.com>
Patch co-authored by: Michal Gulbicki <michalx.gulbicki@intel.com>
Patch co-authored by: Piotr Kasierski <piotrx.kasierski@intel.com>
Patch co-authored by: Karol Grzadziel <karolx.grzadziel@intel.com>
When using printm(), one should always pass a scratch pointer to it.
This is achieved by calling printm with memref
BEGIN { printm(fixed_len, memref(ptr, var_len)); }
which will return a pointer to the DTrace scratch space of size
sizeof(uintptr_t) * 2. However, one can easily call printm() as follows
BEGIN { printm(10, (void *)NULL); }
and panic the kernel as a result. This commit does two things:
(1) adds a new macro DTRACE_INSCRATCHPTR(mstate, ptr, howmany) which
checks if a certain pointer is in the DTrace scratch space;
(2) uses DTRACE_INSCRATCHPTR() to implement a check on printm()'s DIFO
return value in order to avoid the panic and sets CPU_DTRACE_BADADDR
if the address is not in the scratch space.
Martin Matuska [Wed, 6 Sep 2023 11:58:10 +0000 (13:58 +0200)]
vfs: copy_file_range() between multiple mountpoints of the same fs type
VOP_COPY_FILE_RANGE(9) is now caled when source and target vnodes
reside on the same filesystem type (not just on the same mountpoint).
The check if vnodes are on the same mountpoint must be done in the
filesystem code. There are currently only three users - fusefs(5) already
has this check, ZFS can handle multiple mountpoints and a check has been
added to NFS client.
ZFS block cloning is now possible between all snapshots and datasets
of the same ZFS pool.
net: Check per-flow priority code point for untagged traffic
Commit 868aabb4708d introduced per-flow priority. There's a defect in the
logic for untagged traffic, it does not check M_VLANTAG set in the mbuf
packet header or MTAG_8021Q/MTAG_8021Q_PCP_OUT tag set by firewall, then
can result missing desired priority in the outbound packets.
For mbuf packet with M_VLANTAG in header, some interfaces happen to work
due to bug in the drivers mentioned in D39499. As modern interfaces have
VLAN hardware offloading, the defect is barely noticeable unless the
feature per-flow priority is widely tested.
As a side effect of this defect, the soft padding to work around buggy
bridges is bypassed. That may result in regression if soft padding is
requested.
sys/dev/arcmsr: Update Areca RAID driver to version 1.50.00.06.
- Suppressed a harmless warning message, "arcmsr_dr_handle: target=f,
lun=0, GONE!!!," which could appear a few seconds after UEFI system
boot due to the boot volume UEFI initialization.
- Corrected various typing errors.
- Refactored arcmsr_initialize() to improve code readability.
- Added support for device IDs 1883 and 1886 controllers.
- Introduced support for controllers requiring host memory for the
RAID 5 and 6 XOR engines.
Many thanks to Areca for continuing to support FreeBSD.
The “undocumented kludge” which unfortunately can't be dropped for backward compatibility reasons was prone to segfaulting and would improperly allow a new linecount when one was already set. Fix these issues and add regression tests.
32-bit compatibility code is conventionally stored in
sys/compat/freebsd32. Move freebsd32_timerfd_gettime() and
freebsd32_timerfd_settime() from sys/kern/sys_timerfd.c to
sys/compat/freebsd32/freebsd32_misc.c.
Do not pollute userspace with <sys/proc.h>, instead declare struct thread
when _KERNEL is defined.
Include <sys/time.h> instead of <sys/timespec.h>. This causes intentional
namespace pollution that mimics Linux. g/musl libcs include <time.h> in
their <sys/timerfd.h>, exposing clock gettime, settime functions and
CLOCK_ macro constants. Ports like Chromium expect this namespace
pollution and fail without it.
Define a locking regime for the members of struct timerfd and document
it so future code can follow the standard. The lock legend can be found
in a comment above struct timerfd.
Additionally,
* Add assertions based on locking regime.
* Fill kn_data with the expiration count when EVFILT_READ is triggered.
* Report st_ctim for stat(2).
* Check if file has f_type == DTYPE_TIMERFD before assigning timerfd
pointer to f_data.
cxgbe(4): Avoid hang on kldunload on netlink enabled kernels.
netlink(4) calls back into the driver during detach and it attempts to
start an internal synchronized op recursively, causing an interruptible
hang. Fix it by failing the ioctl if the VI has been marked as DOOMED
by cxgbe_detach.
Here's the stack for the hang for reference.
#6 begin_synchronized_op
#7 cxgbe_media_status
#8 ifmedia_ioctl
#9 cxgbe_ioctl
#10 if_ioctl
#11 get_operstate_ether
#12 get_operstate
#13 dump_iface
#14 rtnl_handle_ifevent
#15 rtnl_handle_ifnet_event
#16 rt_ifmsg
#17 if_unroute
#18 if_down
#19 if_detach_internal
#20 if_detach
#21 ether_ifdetach
#22 cxgbe_vi_detach
#23 cxgbe_detach
#24 DEVICE_DETACH
MFC after: 3 days
Sponsored by: Chelsio Communications