This should fix an assertion failure when building qemu, specifically those
parts using -fzero-call-used-regs.
Reported by: Daniel Berrangé <dan-freebsd@berrange.com>
PR: 277474
MFC after: 3 days
Approved by: so
Approved by: re (so, implicit, appease the commit-hook)
Security: FreeBSD-EN-24:07.clang
Kyle Evans [Fri, 15 Mar 2024 01:19:18 +0000 (20:19 -0500)]
if_wg: use proper barriers around pkt->p_state
Without appropriate load-synchronization to pair with store barriers in
wg_encrypt() and wg_decrypt(), the compiler and hardware are often
allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
such that we end up with a garbage or intermediate mbuf that we try to
pass on. The issue is particularly prevalent with the weaker
memory models of !x86 platforms.
Switch from the big-hammer wmb() to more explicit acq/rel atomics to
both make it obvious what we're syncing up with, and to avoid somewhat
hefty fences on platforms that don't necessarily need this.
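A minimal userland sketch of the release/acquire pairing described above. It uses C11 <stdatomic.h> as a portable stand-in for the kernel's atomic(9) release-store/acquire-load primitives, and the struct, field and function names are hypothetical simplifications, not the real if_wg code. The producer writes the payload and then publishes the state with a release store; the consumer reads the state with an acquire load and only then touches the payload, so it can never observe a half-written packet.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified packet; not the real if_wg struct wg_packet. */
enum pkt_state { PKT_UNCRYPTED = 0, PKT_CRYPTED = 1 };

struct pkt {
	int payload;            /* stands in for the mbuf pointer */
	atomic_int state;       /* stands in for p_state */
};

/* Producer (cf. wg_encrypt()): publish the payload, then the state. */
static void
pkt_publish(struct pkt *p, int payload)
{
	p->payload = payload;
	/* Release: the payload store cannot be moved after this store. */
	atomic_store_explicit(&p->state, PKT_CRYPTED, memory_order_release);
}

/* Consumer (cf. wg_deliver_out()): returns 1 and fills *out once ready. */
static int
pkt_consume(struct pkt *p, int *out)
{
	/* Acquire: pairs with the release store in pkt_publish(). */
	if (atomic_load_explicit(&p->state, memory_order_acquire) != PKT_CRYPTED)
		return (0);
	*out = p->payload;      /* guaranteed to see the published payload */
	return (1);
}

static void *
producer(void *arg)
{
	pkt_publish(arg, 42);
	return (NULL);
}

/* Runs one producer thread and spins in the consumer until it publishes. */
static int
run_demo(void)
{
	struct pkt p = { .payload = 0 };
	pthread_t t;
	int error, v;

	atomic_init(&p.state, PKT_UNCRYPTED);
	error = pthread_create(&t, NULL, producer, &p);
	assert(error == 0);
	while (!pkt_consume(&p, &v))
		;               /* spin until the state is published */
	pthread_join(t, NULL);
	return (v);
}
```

With a plain store/load pair and only a write barrier on the producer side, nothing would stop the consumer's payload load from being satisfied before its state load on a weakly ordered CPU; the acquire load is what closes that window.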
With this patch, my dual-iperf3 reproducer on aarch64 is dramatically
more stable than without it.
PR: 264115
Reviewed by: andrew, zlei
Approved by: so
Approved by: re (so, implicit, appease the commit-hook)
Security: FreeBSD-EN-24:06.wireguard
The kern.ttys sysctl handler lacks proper visibility checking, which
leads to an information leak about processes outside the current jail.
This can be demonstrated with pstat -t: when called from within a jail,
it will output all terminal devices including process groups and
session leader process IDs:
jail# pstat -t | grep pts/ | head
LINE INQ CAN LIN LOW OUTQ USE LOW COL SESS PGID STATE
pts/2 1920 0 0 192 1984 0 199 0 4132 27245 Oi
pts/3 1920 0 0 192 1984 0 199 16 24890 33627 Oi
pts/5 0 0 0 0 0 0 0 25 17758 0 G
pts/16 0 0 0 0 0 0 0 0 52495 0 G
pts/15 0 0 0 0 0 0 0 25 53446 0 G
pts/17 0 0 0 0 0 0 0 6702 33230 0 G
pts/19 0 0 0 0 0 0 0 14 1116 0 G
pts/0 0 0 0 0 0 0 0 0 2241 0 G
pts/23 0 0 0 0 0 0 0 20 15639 0 G
pts/6 0 0 0 0 0 0 0 0 44062 93792 G
jail# pstat -t | grep pts/ | wc -l
85
Devfs does the filtering correctly and we get only one entry:
Stefan Eßer [Tue, 20 Feb 2024 12:02:24 +0000 (13:02 +0100)]
msdosfs: fix potential inode collision on FAT12 and FAT16
FAT file systems do not use inodes, instead all file meta-information
is stored in directory entries.
FAT12 and FAT16 use a fixed size area for root directories, with
typically 512 entries of 32 bytes each (for a total of 16 KB) on hard
disk formats. The file system data is stored in clusters of typically
512 to 4096 bytes, depending on the size of the file system.
The current code uses the offset of a DOS 8.3 style directory entry as
a pseudo-inode, which leads to inode values of 0 to 16368 for typical
root directories with 512 entries.
Sub-directories use 2 cluster lengths plus the byte offset of the
directory entry in the data area for the pseudo-inode, which may be
as low as 1024 in case of 512 byte clusters. A sub-directory in
cluster 2 and with 512 byte clusters will therefore lead to a
re-use of inode 1024 when there are at least 32 DOS 8.3 style
filenames in the root directory (or 11 14-character Windows
long file names, each of which takes up 3 directory entries).
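The collision can be checked with a little arithmetic. The helpers below are a hypothetical model of the old pseudo-inode scheme as described in the text (root entries numbered by byte offset, sub-directory entries by data-area offset plus 2 cluster lengths, since data clusters are numbered from 2); they are not the actual msdosfs functions.

```c
#include <assert.h>
#include <stdint.h>

#define DIRENT_SIZE 32u  /* size of one FAT directory entry in bytes */

/* Root directory (FAT12/16): pseudo-inode is the entry's byte offset. */
static uint64_t
old_root_ino(uint32_t entry_index)
{
	return ((uint64_t)entry_index * DIRENT_SIZE);
}

/*
 * Sub-directory: byte offset of the entry in the data area, biased by
 * 2 cluster lengths because the first data cluster is number 2.
 */
static uint64_t
old_subdir_ino(uint32_t cluster, uint32_t cluster_size, uint32_t entry_index)
{
	uint64_t data_off = (uint64_t)(cluster - 2) * cluster_size +
	    (uint64_t)entry_index * DIRENT_SIZE;

	return (2ull * cluster_size + data_off);
}
```

With 512-byte clusters, root entry slot 32 (zero-based) and the first entry of a sub-directory in cluster 2 both map to 1024 in this model, which is the re-use described above.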
FAT32 file systems are not affected by this issue and FAT12/FAT16
file systems with larger cluster sizes are unlikely to have as
many directory entries in the root directory as are required to
cause the collision.
This commit leads to inode numbers that are guaranteed to not collide
for all valid FAT12 and FAT16 file system parameters. It also
provides a small speed-up due to more efficient use of the vnode cache.
PR: 277239
Reviewed by: mckusick
Approved by: re (cperciva)
John Baldwin [Thu, 22 Feb 2024 18:43:43 +0000 (10:43 -0800)]
acpi: Defer reserving resources for ACPI devices
The goal of reserving firmware-assigned resources is to ensure that
"wildcard" resource allocation requests will not claim an address
range that is actually in use even if no attached driver is actively
using that range. However, the current approach can break in some
cases.
In particular, ACPI can enumerate devices behind PCI bridges that
don't show up in a normal PCI scan, but those device_t objects can end
up as direct children of acpi0. Reserving resources for those devices
directly from acpi0 ends up conflicting with later attempts to reserve
the PCI bridge windows.
As a workaround, defer reserving unclaimed resources until after the
initial probe and attach scan. Eventually this pass of reserving
unclaimed resources can be moved earlier, but it requires changes to
other drivers in the tree to permit enumerating devices and reserving
firmware-assigned resources in a depth-first traversal before
attaching devices whose drivers request wildcard allocations.
Cy Schubert [Thu, 15 Feb 2024 15:41:07 +0000 (07:41 -0800)]
heimdal: Fix NULL deref
A flawed logical condition allows a malicious actor to remotely
trigger a NULL pointer dereference using a crafted negTokenInit
token.
Upstream notes:
Reported to Heimdal by Michał Kępień <michal@isc.org>.
From the report:
Acknowledgement
---------------
This flaw was found while working on addressing ZDI-CAN-12302: ISC BIND
TKEY Query Heap-based Buffer Overflow Remote Code Execution
Vulnerability, which was reported to ISC by Trend Micro's Zero Day Initiative.
Security: CVE-2022-3116
Obtained from: upstream 7a19658c1
MFS requested by: re (cperciva)
Approved by: re (cperciva)
RFC8062 Section 7 requires verification of the PA-PKINIT-KX key
exchange when anonymous PKINIT is used. Failure to do so can
permit an active attacker to become a man-in-the-middle.
Reported by: emaste
Obtained from: upstream 38c797e1a
Security: CVE-2019-12098
MFS requested by: re (cperciva)
Approved by: re (cperciva)
Cy Schubert [Wed, 14 Feb 2024 20:04:30 +0000 (12:04 -0800)]
Heimdal: CVE-2018-16860 Heimdal KDC: Reject PA-S4U2Self with unkeyed checksum
Upstream's explanation of the problem:
S4U2Self is an extension to Kerberos used in Active Directory to allow
a service to request a Kerberos ticket to itself from the Kerberos Key
Distribution Center (KDC) for a non-Kerberos authenticated user
(principal in Kerberos parlance). This is useful to allow internal
code paths to be standardized around Kerberos.
S4U2Proxy (constrained-delegation) is an extension of this mechanism
allowing this impersonation to a second service over the network. It
allows a privileged server that obtained a S4U2Self ticket to itself
to then assert the identity of that principal to a second service and
present itself as that principal to get services from the second
service.
There is a flaw in Samba's AD DC in the Heimdal KDC. When the Heimdal
KDC checks the checksum that is placed on the S4U2Self packet by the
server to protect the requested principal against modification, it
does not confirm that the checksum algorithm that protects the user
name (principal) in the request is keyed. This allows a
man-in-the-middle attacker who can intercept the request to the KDC to
modify the packet by replacing the user name (principal) in the
request with any desired user name (principal) that exists in the KDC
and replace the checksum protecting that name with a CRC32 checksum
(which requires no prior knowledge to compute).
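The essence of the fix is to refuse any checksum type that can be recomputed without the key before even looking at the checksum value. A minimal sketch under stated assumptions: the checksum type list and its keyed/unkeyed table are hypothetical and illustrative, and none of the names below are Heimdal's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of checksum types; not Heimdal's definitions. */
enum cksumtype {
	CK_CRC32,
	CK_RSA_MD5,
	CK_HMAC_SHA1_96_AES128,
	CK_HMAC_SHA1_96_AES256
};

struct cksum_info {
	enum cksumtype type;
	bool keyed;             /* does computing it require the key? */
};

static const struct cksum_info cksum_table[] = {
	{ CK_CRC32, false },            /* anyone can recompute it */
	{ CK_RSA_MD5, false },          /* plain hash, also unkeyed */
	{ CK_HMAC_SHA1_96_AES128, true },
	{ CK_HMAC_SHA1_96_AES256, true },
};

static bool
cksum_is_keyed(enum cksumtype t)
{
	for (size_t i = 0; i < sizeof(cksum_table) / sizeof(cksum_table[0]); i++)
		if (cksum_table[i].type == t)
			return (cksum_table[i].keyed);
	return (false);         /* unknown types are treated as unkeyed */
}

/* Returns 0 to continue with verification, -1 to reject the request. */
static int
check_s4u2self_cksum(enum cksumtype t)
{
	if (!cksum_is_keyed(t))
		return (-1);
	/* ... only now verify the checksum value using the key ... */
	return (0);
}
```

Treating unknown types as unkeyed is a deliberately conservative choice in this sketch: an attacker gains nothing by picking a type the KDC does not recognise.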
This would allow a S4U2Self ticket requested on behalf of user name
(principal) user@EXAMPLE.COM to any service to be changed to a
S4U2Self ticket with a user name (principal) of
Administrator@EXAMPLE.COM. This ticket would then contain the PAC of
the modified user name (principal).
Reported by: emaste
Security: CVE-2018-16860
Obtained from: Upstream c6257cc2c
MFS requested by: re (cperciva)
Approved by: re (cperciva)
Apply upstream b1e699103. This fixes a bug introduced by upstream f469fc6 which may in some cases enable bypass of capath policy.
Upstream writes in their commit log:
Note, this may break sites that rely on the bug. With the bug some
incomplete [capaths] worked, that should not have. These may now break
authentication in some cross-realm configurations.
Reported by: emaste
Security: CVE-2017-6594
Obtained from: upstream b1e699103
MFS requested by: re (cperciva)
Approved by: re (cperciva)
Bjoern A. Zeeb [Wed, 14 Feb 2024 21:56:48 +0000 (21:56 +0000)]
LinuxKPI: 802.11: lsta txq locking cleanup
Rename the LSTA lock to LSTA_TXQ lock as that is really what it is and
put down the full set of macros. Replace the init and destroy with the
macro invocation rather than direct code.
Put locking around the txq_ready unset and check. Move the taskq_enqueue
call under lock to be sure we do not call it anymore after txq_ready
got unset.
Leave a comment related to the node reference which is passed into the
TX path on the recvif mbuf pointer.
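The point of moving the enqueue under the lock is that the "is the queue ready?" test and the enqueue become one atomic step with respect to teardown. A userland sketch, assuming a pthread mutex in place of the kernel mutex; the struct, macro and function names are hypothetical and the enqueue is only counted, not performed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the LinuxKPI station; not struct lkpi_sta. */
struct sta {
	pthread_mutex_t txq_mtx;        /* cf. the LSTA_TXQ lock */
	bool txq_ready;
	int enqueued;                   /* counts simulated enqueue calls */
};

#define STA_TXQ_LOCK(s)   pthread_mutex_lock(&(s)->txq_mtx)
#define STA_TXQ_UNLOCK(s) pthread_mutex_unlock(&(s)->txq_mtx)

/* TX path: test the flag and enqueue under the same lock. */
static bool
sta_tx_kick(struct sta *s)
{
	bool queued = false;

	STA_TXQ_LOCK(s);
	if (s->txq_ready) {
		s->enqueued++;          /* stands in for the taskqueue enqueue */
		queued = true;
	}
	STA_TXQ_UNLOCK(s);
	return (queued);
}

/* Teardown: once this returns, no new enqueue can be started. */
static void
sta_txq_stop(struct sta *s)
{
	STA_TXQ_LOCK(s);
	s->txq_ready = false;
	STA_TXQ_UNLOCK(s);
}
```

If the flag were tested under the lock but the enqueue happened after unlocking, a concurrent sta_txq_stop() could slip in between and the task would still be queued for a station that is going away.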
Bjoern A. Zeeb [Mon, 5 Feb 2024 14:51:08 +0000 (14:51 +0000)]
LinuxKPI: 802.11: update the ni/lsta reference cycle
Update the ni/lsta reference cycle, add extra checks and assertions.
This is to accommodate problems we were seeing based on net80211
behaviour (join1() and (*iv_update_bss)() as well as state changes for
new iv_bss nodes during an active session).
This should hopefully help to stabilise behaviour until the underlying
problems get properly addressed (for this and all other device drivers).
Approved by: re (cperciva)
PR: 272607, 273985, 274003
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D43753
Bjoern A. Zeeb [Sat, 3 Feb 2024 16:33:56 +0000 (16:33 +0000)]
LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)
With firmware based solutions we cannot just jump from an active session
to a new iv_bss node without tearing down state for the old and bringing
up the new node. This likely used to work on softmac based cards/drivers
where one could essentially set the state and fire at will.
We track (*iv_update_bss) calls from net80211 and set a local flag that
we are out of sync, and do not allow any further operations up the state
machine until we hit INIT or SCAN. That means someone will take the state
down, clean up firmware state and then we can join again and build up
state.
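The gating logic can be pictured as a tiny state machine. This is a hypothetical model, not the LinuxKPI code: the state enum is only a subset of net80211's, and for simplicity the sketch rejects every transition other than INIT or SCAN while the out-of-sync flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of net80211 states (the real enum ieee80211_state has more). */
enum state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_RUN };

/* Hypothetical per-vap bookkeeping. */
struct vap_sync {
	enum state cur;
	bool out_of_sync;       /* set when (*iv_update_bss) swapped the node */
};

static void
on_update_bss(struct vap_sync *v)
{
	v->out_of_sync = true;
}

/* Returns true if the transition to nstate is permitted. */
static bool
try_transition(struct vap_sync *v, enum state nstate)
{
	if (v->out_of_sync) {
		/* Only going down to INIT or SCAN is allowed; it clears the flag. */
		if (nstate != S_INIT && nstate != S_SCAN)
			return (false);
		v->out_of_sync = false;
	}
	v->cur = nstate;
	return (true);
}
```

After on_update_bss(), an attempt to move to AUTH is refused until something takes the vap down to SCAN (or INIT); from there the normal join sequence can build state up again.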
Apparently this problem has been "known" for a while as native iwm(4) and
others have similar workarounds (though less strict) and can be equally
pestered into bad states. For LinuxKPI all the KASSERTs just massively
brought this problem out. The solution will be some rewrites in net80211.
Until then, try to keep us more stable at least and not die on second
join1() calls triggered by service netif start wlan0 and similar.
Approved by: re (cperciva)
PR: 271979, 271988, 275255, 263613, 274003
Sponsored by: The FreeBSD Foundation (2023, partial)
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D43725
Bjoern A. Zeeb [Mon, 19 Feb 2024 08:35:44 +0000 (08:35 +0000)]
Bump __FreeBSD_version for net80211 'struct ieee80211vap' changes
Change 9b998db87c28 changed 'struct ieee80211vap' internals in net80211.
Given we do not have enough spares and the struct is allocated by
drivers, all wireless drivers have to be recompiled.
__FreeBSD_version is updated to 1303001 to track this change.
Bjoern A. Zeeb [Wed, 10 Jan 2024 10:14:16 +0000 (10:14 +0000)]
net80211: deal with lost state transitions
Since 5efea30f039c4 we can possibly lose a state transition which can
cause trouble further down the road.
The reproducer from 643d6dce6c1e can trigger these for example.
Drivers for firmware based wireless cards have worked around some of
this (and other) problems in the past.
Add an array of tasks rather than a single one as we would simply
get npending > 1 and lose order with other tasks. Try to keep state
changes updated as queued in case we end up with more than one at a
time. While this is not ideal either (call it a hack) it will sort
the problem for now.
We will queue in ieee80211_new_state_locked() and do checks there
and dequeue in ieee80211_newstate_cb().
If we still overrun the (currently) 8 slots we will drop the state
change rather than overwrite the last one.
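The queueing policy (keep transitions in order, drop a new one when all slots are busy rather than overwriting) amounts to a small ring buffer. A sketch of just that bookkeeping, assuming hypothetical names and a simplified state enum; it models the pending state/argument pairs, not the task array or the real 'struct ieee80211vap' fields.

```c
#include <assert.h>
#include <stdbool.h>

#define NSTATE_SLOTS 8  /* "(currently) 8 slots" per the text above */

enum state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_RUN };

/* Hypothetical FIFO of pending transitions. */
struct nstate_q {
	enum state nstate[NSTATE_SLOTS];
	int arg[NSTATE_SLOTS];
	unsigned int head;      /* next slot to dequeue */
	unsigned int count;     /* number of queued transitions */
};

/* cf. ieee80211_new_state_locked(): queue, or drop if all slots are busy. */
static bool
nstate_enqueue(struct nstate_q *q, enum state s, int arg)
{
	unsigned int tail;

	if (q->count == NSTATE_SLOTS)
		return (false);         /* drop rather than overwrite */
	tail = (q->head + q->count) % NSTATE_SLOTS;
	q->nstate[tail] = s;
	q->arg[tail] = arg;
	q->count++;
	return (true);
}

/* cf. ieee80211_newstate_cb(): hand out transitions in queued order. */
static bool
nstate_dequeue(struct nstate_q *q, enum state *s, int *arg)
{
	if (q->count == 0)
		return (false);
	*s = q->nstate[q->head];
	*arg = q->arg[q->head];
	q->head = (q->head + 1) % NSTATE_SLOTS;
	q->count--;
	return (true);
}
```

Returning false on overflow gives the caller a chance to report an error, which fits the note that callers should eventually act on returned errors.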
When dequeuing we will update iv_nstate and keep it around for historic
reasons for the moment.
In the longer term we should make the callers of
ieee80211_new_state[_locked]() actually use the returned errors
and act appropriately, but that will touch a lot more places and
drivers (possibly incl. changed behaviour for ioctls).
rtwn(4) and rum(4) should probably be revisited and net80211 internals
removed (for rum(4) at least the current logic still seems prone to
races).
PR: 271979, 271988, 275255, 263613, 274003
Sponsored by: The FreeBSD Foundation (in 2023)
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D43389
Given this changes the internal structure of 'struct ieee80211vap',
which gets allocated by the drivers, and we do not have enough
spares, all wireless drivers need to be recompiled.
Given we are forced to do the update, we leave fields in the middle
of the struct and add more spares at the same time.
__FreeBSD_version will get updated to 1303001 to be able to detect
this change.
Bjoern A. Zeeb [Tue, 16 Jan 2024 18:53:06 +0000 (18:53 +0000)]
net80211: fix a NULL deref in ieee80211_sta_join1()
When ieee80211_sta_join1() gets an obss without a node table, trying to
lock that table will cause a NULL pointer deref. Check for the table to
be valid and deal with the obss node accordingly.
This can happen if sta_newstate() calls ieee80211_reset_bss() for
nstate == INIT and ostate != INIT. ieee80211_reset_bss() itself
calls ieee80211_node_table_reset() which calls node_reclaim()
which ends up in ieee80211_del_node_nt() which does remove the
node from the table and sets ni_table to NULL.
That node (former iv_bss) can then be returned as obss in the
(*iv_update_bss)() call in join1().
Approved by: re (cperciva)
Reviewed by: adrian, cc
Differential Revision: https://reviews.freebsd.org/D43469
Bjoern A. Zeeb [Sun, 28 Jan 2024 00:51:23 +0000 (00:51 +0000)]
LinuxKPI: 802.11: fix field order in ieee80211_key_conf
When adding the new field link_id to struct ieee80211_key_conf, it
was erroneously placed at the end of the struct; the zero-length
(variable sized) array for the key always needs to stay last.
Re-sort fields and add a hopefully helpful comment to avoid the problem
in the future.
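Why the key has to stay last: the key bytes live in the same allocation, directly past the fixed part of the struct. A minimal sketch with a heavily trimmed, hypothetical stand-in for struct ieee80211_key_conf. It uses a C99 flexible array member, after which the compiler rejects any further field; with a GNU zero-length array as in the original it may compile, but a field placed after the array would overlap the key bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Heavily trimmed, hypothetical stand-in for struct ieee80211_key_conf. */
struct key_conf {
	uint32_t cipher;
	uint8_t  keyidx;
	uint8_t  keylen;
	int      link_id;       /* new fields go here, before the key ... */
	uint8_t  key[];         /* ... because the variable-sized key must be last */
};

static struct key_conf *
key_conf_alloc(const uint8_t *key, uint8_t keylen)
{
	/* One allocation holds the fixed part plus keylen bytes of key. */
	struct key_conf *kc = calloc(1, sizeof(*kc) + keylen);

	if (kc == NULL)
		return (NULL);
	kc->keylen = keylen;
	kc->link_id = -1;
	memcpy(kc->key, key, keylen);
	return (kc);
}
```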
Approved by: re (cperciva)
Fixes: adff403fe7a87
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D43635
Bjoern A. Zeeb [Sun, 12 Nov 2023 23:51:14 +0000 (23:51 +0000)]
net80211: improve logging about state transitions lost
Since 5efea30f039c4 (and follow-ups) it is possible that we call
ieee80211_new_state_locked() again before a previous task has run to
completion (it has not run yet, or it dropped the lock in between).
In either case we would overwrite the new state and argument in the vap.
While most drivers somehow deal with that (or not), LinuxKPI 802.11 compat
code has KASSERTs to keep net80211, LinuxKPI and driver/firmware state in
sync and they may trigger due to a missing transition or more likely a
changed ni/lsta.
Enhance the wlandebug +state logging for these cases so they
are easier to debug.
While here remove the unconditional logging to the message buffer;
it has been here for a good decade but not helped to actually identify
and sort the problem.
Approved by: re (cperciva)
Sponsored by: The FreeBSD Foundation
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D42560
Bjoern A. Zeeb [Tue, 12 Dec 2023 01:59:17 +0000 (01:59 +0000)]
LinuxKPI: 802.11: more TXQ implementation and locking
Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
while looking at the code. They are needed by various wireless drivers.
Introduce an ltxq lock and protect the skbq by that.
This prevents panics due to a race between a driver upcall and
the net80211 tx downcall. While the former should be rcu protected we
cannot rely on that.
It remains questionable if we need to protect further fields there
(with a different lock?).
Also introduce a txq_mtx on the lhw which needs to be further deployed
but we need to come up with a good strategy to not end up with 7 different
locks.
Approved by: re (cperciva)
Sponsored by: The FreeBSD Foundation
PR: 274178, 275710
Tested by: cc
Bjoern A. Zeeb [Sun, 12 Nov 2023 20:33:41 +0000 (20:33 +0000)]
wpa: ctrl_iface set sendbuf size
In order to avoid running into the current default net.local.dgram.maxdgram
of 2K when calling sendto(2), try to set the sndbuf size to
the maximum ctrl message size.
On 14 and 15 this no longer actually raises the limit (be7c095ac99ad29fd72b780c7d58949a38656c66 raised it for syslogd and this use),
but FreeBSD 13 still requires this change and it will work as expected there.
In addition, this way we always ensure a large enough send buffer
independent of kernel defaults.
The problem occurred, e.g., when the scan_list result had enough BSSIDs
that the text output would exceed 2048 bytes.
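Raising the send buffer is a single setsockopt(2) call with SO_SNDBUF on the control socket. A minimal sketch, assuming a hypothetical CTRL_MAX_MSG value; the real maximum used by wpa_supplicant may differ, and the helper name is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical maximum ctrl reply size; the real value may differ. */
#define CTRL_MAX_MSG 4096

/* Ask for a send buffer large enough for the biggest reply. */
static int
ctrl_set_sndbuf(int fd)
{
	int sz = CTRL_MAX_MSG;

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz)) == -1) {
		perror("setsockopt(SO_SNDBUF)");
		return (-1);
	}
	return (0);
}
```

Treating a failure as non-fatal (log and carry on) seems reasonable here, since on systems where the kernel default is already large enough the call changes nothing.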
Approved by: re (cperciva)
Sponsored by: The FreeBSD Foundation
PR: 274990
Reviewed by: cy, adrian (with previous comment)
Differential Revision: https://reviews.freebsd.org/D42558
Bjoern A. Zeeb [Wed, 29 Nov 2023 21:33:23 +0000 (21:33 +0000)]
iwlwififw: add firmware for the Bz/B200 chipset
The iwlwifi driver already supports the chipset as "Bz TBD"
(also in 14.0). Add the firmware for it. Successfully tested
for 0x8086/0x272b/0x8086/0x00f4 on arm64 thanks to donated
hardware [1].
Frank Hilgendorf [Wed, 13 Dec 2023 23:48:08 +0000 (23:48 +0000)]
bwn: remove unused ic_headroom
Unlike bwi(4), bwn(4) does not rely on ic_headroom (despite having it
set) but splits the bwn_txhdr (first) segment into its own transaction.
Remove ic_headroom to avoid net80211 troubles with not enough space in
the mbuf around ieee80211_mbuf_adjust().
Andriy Gapon [Tue, 30 Jan 2024 06:45:01 +0000 (08:45 +0200)]
rdmsr_safe/wrmsr_safe: handle pcb_onfault nesting
rdmsr_safe and wrmsr_safe can be called while pcb_onfault is already
set, so the functions are modified to preserve the handler rather than
resetting it before returning.
One case where that happens is when the AMD microcode update routine
is executed on a stack where copyin / copyout was already active.
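The fix boils down to "save, install, restore" instead of "install, clear". A C model of just that logic, assuming a hypothetical global in place of curthread's pcb_onfault slot; the real rdmsr_safe/wrmsr_safe helpers are not written like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for curthread's pcb_onfault slot. */
typedef void (*fault_handler_t)(void);
static fault_handler_t pcb_onfault;

static void msr_fault(void) { }     /* handler used by the MSR helper */
static void copy_fault(void) { }    /* handler used by copyin/copyout */

/* Models rdmsr_safe(): install its own handler, then restore the caller's. */
static int
rdmsr_safe_model(void)
{
	fault_handler_t saved = pcb_onfault;    /* may already be set */

	pcb_onfault = msr_fault;
	/* ... the rdmsr instruction would execute here ... */
	pcb_onfault = saved;                    /* previously: set to NULL */
	return (0);
}
```

With the old behaviour, an outer copyout that faulted after the nested MSR access would find no handler installed, which matches the "page fault while in kernel mode" panics in copyout_nosmap_std() quoted below.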
Here is a sample panic message from a crash caused by resetting the
handler:
<118>Updating CPU Microcode...
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x11ed0de6000
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80c2df03
stack pointer = 0x28:0xfffffe01ce4a4c70
frame pointer = 0x28:0xfffffe01ce4a4c70
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 117 (logger)
trap number = 12
panic: page fault
cpuid = 3
time = 1681462027
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff80615deb = db_trace_self_wrapper+0x2b/frame 0xfffffe01ce4a4830
kdb_backtrace() at 0xffffffff80943c77 = kdb_backtrace+0x37/frame 0xfffffe01ce4a48e0
vpanic() at 0xffffffff808f5fe5 = vpanic+0x185/frame 0xfffffe01ce4a4940
panic() at 0xffffffff808f5da3 = panic+0x43/frame 0xfffffe01ce4a49a0
trap_fatal() at 0xffffffff80c31849 = trap_fatal+0x379/frame 0xfffffe01ce4a4a00
trap_pfault() at 0xffffffff80c318b5 = trap_pfault+0x65/frame 0xfffffe01ce4a4a60
trap() at 0xffffffff80c30f5f = trap+0x29f/frame 0xfffffe01ce4a4b80
trap_check() at 0xffffffff80c31c29 = trap_check+0x29/frame 0xfffffe01ce4a4ba0
calltrap() at 0xffffffff80c07fd8 = calltrap+0x8/frame 0xfffffe01ce4a4ba0
--- trap 0xc, rip = 0xffffffff80c2df03, rsp = 0xfffffe01ce4a4c70, rbp = 0xfffffe01ce4a4c70 ---
copyout_nosmap_std() at 0xffffffff80c2df03 = copyout_nosmap_std+0x63/frame 0xfffffe01ce4a4c70
uiomove_faultflag() at 0xffffffff8095f0d5 = uiomove_faultflag+0xe5/frame 0xfffffe01ce4a4cb0
uiomove() at 0xffffffff8095efeb = uiomove+0xb/frame 0xfffffe01ce4a4cc0
pipe_read() at 0xffffffff80968860 = pipe_read+0x230/frame 0xfffffe01ce4a4d30
dofileread() at 0xffffffff809653cb = dofileread+0x8b/frame 0xfffffe01ce4a4d80
sys_read() at 0xffffffff80964fa0 = sys_read+0xc0/frame 0xfffffe01ce4a4df0
amd64_syscall() at 0xffffffff80c3221a = amd64_syscall+0x18a/frame 0xfffffe01ce4a4f30
fast_syscall_common() at 0xffffffff80c088eb = fast_syscall_common+0xf8/frame 0xfffffe01ce4a4f30
--- syscall (3, FreeBSD ELF64, read), rip = 0x11ece41cfaa, rsp = 0x11ecbec4908, rbp = 0x11ecbec4920 ---
Uptime: 41s
And another one:
Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address = 0x800a22000
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80b2c7ca
stack pointer = 0x28:0xfffffe01c55b5480
frame pointer = 0x28:0xfffffe01c55b5480
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 68418 (pfctl)
trap number = 12
panic: page fault
cpuid = 4
time = 1625184463
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff805c1e8b = db_trace_self_wrapper+0x2b/frame 0xfffffe01c55b5040
kdb_backtrace() at 0xffffffff808874b7 = kdb_backtrace+0x37/frame 0xfffffe01c55b50f0
vpanic() at 0xffffffff808449d8 = vpanic+0x188/frame 0xfffffe01c55b5150
panic() at 0xffffffff808445f3 = panic+0x43/frame 0xfffffe01c55b51b0
trap_fatal() at 0xffffffff80b300a5 = trap_fatal+0x375/frame 0xfffffe01c55b5210
trap_pfault() at 0xffffffff80b30180 = trap_pfault+0x80/frame 0xfffffe01c55b5280
trap() at 0xffffffff80b2f729 = trap+0x289/frame 0xfffffe01c55b5390
trap_check() at 0xffffffff80b304d9 = trap_check+0x29/frame 0xfffffe01c55b53b0
calltrap() at 0xffffffff80b0bb28 = calltrap+0x8/frame 0xfffffe01c55b53b0
--- trap 0xc, rip = 0xffffffff80b2c7ca, rsp = 0xfffffe01c55b5480, rbp = 0xfffffe01c55b5480 ---
copyout_nosmap_std() at 0xffffffff80b2c7ca = copyout_nosmap_std+0x15a/frame 0xfffffe01c55b5480
pfioctl() at 0xffffffff85539358 = pfioctl+0x4d28/frame 0xfffffe01c55b5940
devfs_ioctl() at 0xffffffff807176cf = devfs_ioctl+0xcf/frame 0xfffffe01c55b59a0
VOP_IOCTL_APV() at 0xffffffff80bb26e2 = VOP_IOCTL_APV+0x92/frame 0xfffffe01c55b59c0
VOP_IOCTL() at 0xffffffff80928014 = VOP_IOCTL+0x34/frame 0xfffffe01c55b5a10
vn_ioctl() at 0xffffffff80923330 = vn_ioctl+0xc0/frame 0xfffffe01c55b5b00
devfs_ioctl_f() at 0xffffffff80717bbe = devfs_ioctl_f+0x1e/frame 0xfffffe01c55b5b20
fo_ioctl() at 0xffffffff808abc6b = fo_ioctl+0xb/frame 0xfffffe01c55b5b30
kern_ioctl() at 0xffffffff808abc01 = kern_ioctl+0x1d1/frame 0xfffffe01c55b5b80
sys_ioctl() at 0xffffffff808ab982 = sys_ioctl+0x132/frame 0xfffffe01c55b5c50
syscallenter() at 0xffffffff80b30cc9 = syscallenter+0x159/frame 0xfffffe01c55b5ca0
amd64_syscall() at 0xffffffff80b309a5 = amd64_syscall+0x15/frame 0xfffffe01c55b5d30
fast_syscall_common() at 0xffffffff80b0c44e = fast_syscall_common+0xf8/frame 0xfffffe01c55b5d30
PR: 276426
Reviewed by: kib, markj
Approved by: re (cperciva)
Warner Losh [Mon, 5 Feb 2024 05:43:49 +0000 (22:43 -0700)]
vtnet: Avoid ifdefs based on __NO_STRICT_ALIGNMENT
Some platforms require an adjustment of the ethernet headers. Rather
than making this depend on whether __NO_STRICT_ALIGNMENT is defined,
define VTNET_ETHER_ALIGN to be either 0 or ETHER_ALIGN (aka 2). Add a
test to the if statements to only do them when != 0. This eliminates
the #ifdefs sprinkled in the code, still communicates the intent and
gives the same compiled results.
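The pattern looks roughly like this: decide the constant once, then use an ordinary if that the compiler folds away when the constant is 0. A sketch under assumptions: the rx_frame_offset() helper is hypothetical, ETHER_ALIGN is redefined locally only so the example is self-contained, and the exact preprocessor condition in the driver may differ.

```c
#include <assert.h>
#include <stddef.h>

#define ETHER_ALIGN 2   /* as in FreeBSD's <net/ethernet.h> */

/* Pick the value once instead of sprinkling #ifdefs through the code. */
#ifdef __NO_STRICT_ALIGNMENT
#define VTNET_ETHER_ALIGN 0
#else
#define VTNET_ETHER_ALIGN ETHER_ALIGN
#endif

/* Hypothetical helper: where the received frame starts in a buffer. */
static size_t
rx_frame_offset(size_t hdr_size)
{
	size_t off = hdr_size;

	/* Dead code, and removed by the compiler, when the value is 0. */
	if (VTNET_ETHER_ALIGN != 0)
		off += VTNET_ETHER_ALIGN;
	return (off);
}
```

Unlike #ifdef'd blocks, both branches are always parsed and type-checked, so the strict-alignment path cannot silently rot on platforms that do not use it.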
Warner Losh [Mon, 5 Feb 2024 05:43:39 +0000 (22:43 -0700)]
vtnet: Account for the padding when selecting allocation size
While we account for the padding in the length of the mbuf we use, we do
not account for it when we 'guess' the size of the mbuf to allocate
based on the MTU of the device. This leads to a situation where we might
fail if the MTU is close to a bucket size (say 2018) such that the added
padding would push us over the edge for a full-sized packet. An MTU of 2018
is super rare (2016 and 2020 would both work), but fix it nonetheless.
It's a shame we can't just set VTNET_RX_HEADER_PAD to 2 in this case. The 4
seems hard-coded somewhere I've not found documented (I think it's in the
protocol given the comments about VIRTIO_F_ANY_LAYOUT).
Warner Losh [Mon, 29 Jan 2024 05:08:55 +0000 (22:08 -0700)]
vtnet: Adjust for ethernet alignment.
If the header that we add to the packet's size is 0 % 4 and we're
strictly aligning, then we need to adjust where we store the header so
the packet that follows will have its struct ip header properly
aligned. We do this on allocation (and when we check the length of the
mbufs in the lro_nomrg case). We can't just adjust the clustersz in the
softc, because it's also used to allocate the mbufs and it needs to be
the proper size for that. Since we otherwise use the size of the mbuf
(or sometimes the smaller size of the received packet) to compute how
much we can buffer, this ensures no overflows. The 2 byte adjustment
also does not affect how many packets we can receive in the lro_nomrg
case.
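The arithmetic behind the 2-byte shift: the IP header starts after the virtio header plus the 14-byte Ethernet header, so a 4-byte-aligned virtio header leaves the IP header 2 bytes off. The helper below is a hypothetical model, and the header sizes used in the usage note are only examples.

```c
#include <assert.h>
#include <stddef.h>

#define ETHER_HDR_LEN 14        /* dst + src MAC + ethertype */
#define ETHER_ALIGN   2

/*
 * Hypothetical helper: byte offset of the IP header in a receive buffer
 * when hdr_size bytes of virtio header precede the Ethernet frame.
 */
static size_t
ip_hdr_offset(size_t hdr_size, int strict_align)
{
	size_t pad = 0;

	/* Shift only when the header is 0 % 4, i.e. IP would land at 2 % 4. */
	if (strict_align && hdr_size % 4 == 0)
		pad = ETHER_ALIGN;
	return (pad + hdr_size + ETHER_HDR_LEN);
}
```

For example, with a 12-byte header the IP header would start at offset 26 (2 mod 4); storing the header 2 bytes later moves it to 28, which is 4-byte aligned. A 10-byte header already puts the IP header at 24, so no shift is applied.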
John F. Carr [Thu, 19 Oct 2023 03:02:42 +0000 (21:02 -0600)]
smartpqi: Drop spinlock before freeing memory
pqisrc_free_device frees the device softc with the os spinlock
held. This causes crashes when devices are removed because the memory
free might sleep (which is prohibited with spin locks held). Drop the
spinlock before releasing the memory.
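The usual shape of this fix is "unlink under the lock, free after dropping it". A userland sketch: a pthread mutex stands in for the driver's spinlock (it does not model the no-sleep rule, only the ordering), and the device table and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical device table; stands in for the driver's bookkeeping. */
struct dev { int id; };

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER; /* "spinlock" */
static struct dev *dev_table[4];

static void
dev_remove(int slot)
{
	struct dev *d;

	pthread_mutex_lock(&dev_lock);
	d = dev_table[slot];            /* unlink while holding the lock ... */
	dev_table[slot] = NULL;
	pthread_mutex_unlock(&dev_lock);

	free(d);                        /* ... and free only after dropping it */
}
```

Because the entry is already unreachable from the table when the lock is released, no other thread can find it in the window before it is freed.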
Eugene Grosbein [Mon, 12 Feb 2024 07:24:28 +0000 (14:24 +0700)]
graid: MFC: unbreak Promise RAID1 with 4+ providers
Fix a problem in graid implementation of Promise RAID1 created with 4+ disks.
Due to a bug in the metadata writing code, such an array generally works
fine only until reboot. Before the fix, the next taste erroneously created
RAID1E (kind of RAID10) instead of RAID1, hence graid used wrong offsets
for I/O operations.
The bug did not affect Promise RAID1 arrays with only 2 or 3 disks.
Dimitry Andric [Wed, 14 Feb 2024 19:41:09 +0000 (20:41 +0100)]
lld: work around elftoolchain bug which causes bloated RISCV binaries
The elftoolchain strip(1) command appears to have trouble with the new
.riscv.attributes sections being added by lld 17 to RISCV binaries. This
causes huge 'holes' in the files, making them larger than necessary.
Since nothing in the base system uses the new section yet, patch lld to
leave it out for now.
Direct commit to stable/13, since this is intended to go into the 13.3
release while the elftoolchain bug is being investigated.
Reported by: karels
Submitted by: jrtc27
Approved by: re (cperciva)
Warner Losh [Tue, 6 Feb 2024 23:11:38 +0000 (16:11 -0700)]
leapseconds: Update to the canonical place.
IERS is the source of truth for leap seconds. Their leapsecond file is
updated most quickly and is always right (unlike the IANA one which
often lags). IERS operates this public service for the express purpose
of random people downloading it. Their terms of service are compatible
with open source (we could include this in our release). Rather than
fighting with questions that arise whenever the IANA file changes
location or the auto-update script breaks, just use this.
This is in preference to the NIST ftp copy. NIST is in the process of
retiring their FTP services.
Check for privilege PRIV_SCHED_SETPOLICY instead of PRIV_SCHED_SET, to
at least make it coherent with what is done at thread creation when
a realtime policy is requested, and have users authorized by
mac_priority(4) pass it.
This change is good enough in practice since it only allows 'root' (as
before) and mac_priority(4)'s authorized users in (the point of this
change), without other side effects. More changes in this area, to
generally ensure that all privilege checks are consistent, are going to
come as olce's priority revamp project lands.
Alan Somers [Thu, 25 Jan 2024 15:19:37 +0000 (08:19 -0700)]
fusefs: fix invalid value for st_birthtime.tv_nsec
If a file system's on-disk format does not support st_birthtime, it
isn't clear what value it should return in stat(2). Neither our man
page nor the Open Group specification says. But our convention for UFS and
msdosfs is to return { .tv_sec = -1, .tv_nsec = 0 }. fusefs is
different. It returns { .tv_sec = -1, .tv_nsec = -1 }. It's done that
ever since the initial import in SVN r241519.
Most software apparently handles this just fine. It must, because we've
had no complaints. But the Rust standard library will panic when
reading such a timestamp during std::fs::metadata, even if the caller
doesn't care about that particular value. That's a separate bug, and
should be fixed.
Change our invalid value to match msdosfs and ufs, pacifying the Rust
standard library.
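The underlying problem is that a timespec with tv_nsec outside [0, 999999999] is not a valid time value, so strict consumers reject it outright, while { -1, 0 } is merely a time before the epoch. A minimal sketch; the validity helper and the constant name are hypothetical, and only the two sentinel values come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* "No birthtime" sentinel as used by UFS and msdosfs per the text above. */
static const struct timespec NO_BIRTHTIME = { .tv_sec = -1, .tv_nsec = 0 };

/* A well-formed timespec keeps tv_nsec within [0, 999999999]. */
static bool
timespec_is_valid(const struct timespec *ts)
{
	return (ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
}
```

The old fusefs value { -1, -1 } fails this check, while the UFS/msdosfs sentinel passes it.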
Mark Johnston [Wed, 7 Feb 2024 14:43:25 +0000 (09:43 -0500)]
inpcb: Restore some NULL checks of credential pointers
At least one out-of-tree port (net-mgmt/ng_ipacct) depends on being able
to call in_pcblookup_local() with cred == NULL, so the MFC of commit ac1750dd143e ("inpcb: Remove NULL checks of credential references")
broke compatibility.
Restore a subset of the NULL checks to avoid breaking the module in the
13.3 release. This is a direct commit to stable/13.
Kyle Evans [Tue, 16 Jan 2024 02:55:58 +0000 (20:55 -0600)]
kern: tty: fix ttyinq_read_uio assertion
It's clear from later context that `rlen` was always expected to include
`flen`, as we'll trim `flen` bytes from the end of the read. Relax our
initial assertion to only require the total size less trimmed bytes to
lie within the out buffer size.
While we're here, I note that if we have to read more than one block and
we're trimming from the end then we'll do the wrong thing and omit
`flen` bytes from every block, rather than just the end. Add an
assertion to make sure we're not doing that, but the only caller that
specifies a non-zero `flen` today will only really be doing so if rlen
is entirely within a single buffer.
Olivier Certner [Thu, 25 Jan 2024 22:25:10 +0000 (23:25 +0100)]
login_cap.h: Remove LOGIN_DEFPRI
This is an implementation detail which is likely to become irrelevant in
the future, as we move to not resetting the priority if the
corresponding capability is not present in the configuration file
('/etc/login.conf').
GitHub's code search and Google show no use of this public constant, and
it doesn't exist in OpenBSD and NetBSD.
So, remove this definition and its sole use in-tree.
PR: 276570 (exp-run)
Reviewed by: emaste
Approved by: emaste (mentor)
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43609
Olivier Certner [Thu, 25 Jan 2024 22:10:40 +0000 (23:10 +0100)]
login_cap.h: Remove LOGIN_DEFUMASK
This public constant has not been used in-tree since 1997 (this was
noticed while working on previous commit "setusercontext(): umask: Set
it only once (in the common case)").
Since it was an implementation detail and GitHub's code search and
Google show no use of this symbol today, simply remove it.
PR: 276570 (exp-run)
Reviewed by: emaste, kib (earlier version, then part of D40344)
Approved by: emaste (mentor)
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43608
Olivier Certner [Tue, 30 May 2023 15:14:50 +0000 (17:14 +0200)]
setusercontext(): Set priority from '~/.login_conf' as well
Setting the process priority is done only when the current process'
effective UID corresponds to that for which context is to be set.
Consequently, setting priority is done with appropriate credentials and
will fail if the target user tries to raise it unduly via his
'~/.login_conf'.
PR: 271751
Reviewed by: kib, Andrew Gierth <andrew_tao173.riddles.org.uk>
Approved by: emaste (mentor)
MFC after: 3 days
Relnotes: yes
Sponsored by: Kumacom SAS
Differential Revision: https://reviews.freebsd.org/D40352
Olivier Certner [Wed, 21 Jun 2023 08:53:37 +0000 (10:53 +0200)]
setclasspriority(): New possible value 'inherit'
It indicates to the login.conf machinery (setusercontext() /
setclasscontext()) to leave priority alone, effectively inheriting it
from the parent process.
Olivier Certner [Wed, 21 Jun 2023 08:39:15 +0000 (10:39 +0200)]
login.conf(5): Document priority's default and possible values
Priority is reset to 0 if not explicitly specified.
While here, be more explicit about what "Initial priority (nice) level"
means and document that it is possible to set real-time or idle class'
priorities with this capability.
Reviewed by: emaste
Approved by: emaste (mentor)
MFC after: 3 days
Sponsored by: Kumacom SAS
Differential Revision: https://reviews.freebsd.org/D40689
Olivier Certner [Mon, 29 May 2023 16:39:04 +0000 (18:39 +0200)]
setusercontext(): Better error messages when priority is not set correctly
Polish the syslog messages to contain readily useful information.
The behavior of the 'priority' capability is inconsistent with that of
all other contexts: 'umask', 'cpumask', resource limits, etc., where the
absence of a capability means to inherit the value. It is currently
preserved for compatibility, but is subject to change in a future major
release.
Olivier Certner [Thu, 25 May 2023 12:18:45 +0000 (14:18 +0200)]
setusercontext(): umask: Set it only once (in the common case)
Simplify the code and make it more coherent (umask was the only context
setting not modified by setlogincontext() directly).
Preserve the current behavior of not changing the umask if none is
specified in the login class capabilities database, but without the
superfluous umask() dance. (The only exception to this is that
a special value no user is likely to input in the database now stands
for no specification.)
If some user has a 'umask' override in its '~/.login_conf', the umask
will still be set twice as before (as is the case for all other context
settings overridden in '~/.login_conf').
Log a warning in case of an invalid umask specification.
This change makes it apparent that the value of LOGIN_DEFUMASK doesn't
matter. It will be removed in a subsequent commit.
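Because umask(2) both sets the mask and returns the previous one, reading the current value without changing it requires a set-and-restore "dance". The sketch below contrasts that with setting the umask only when a value was specified. `UMASK_UNSPECIFIED` is a hypothetical sentinel standing in for the "special value no user is likely to input"; the actual value used by libutil is not shown here, and both function names are illustrative.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical sentinel: no valid umask (0..0777) can equal it. */
#define UMASK_UNSPECIFIED	(-1L)

/*
 * Old style: set a placeholder (like LOGIN_DEFUMASK) just to learn the
 * current mask, then set the final value: two umask() calls.
 */
static void
apply_umask_old(long cap)
{
	mode_t cur = umask(022);

	umask(cap == UMASK_UNSPECIFIED ? cur : (mode_t)cap);
}

/* New style: touch the umask only when a value was actually specified. */
static int
apply_umask_once(long cap)
{
	if (cap == UMASK_UNSPECIFIED)
		return (0);		/* keep the inherited umask */
	if (cap < 0 || cap > 0777)
		return (-1);		/* invalid: caller logs a warning */
	umask((mode_t)cap);
	return (0);
}
```

With the sentinel, the placeholder value passed in the old style no longer matters, which is why LOGIN_DEFUMASK became removable.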
Mike Karels [Sat, 27 Jan 2024 15:40:07 +0000 (09:40 -0600)]
inet(3): clarify syntax accepted by inet_pton
The section INTERNET ADDRESSES describes the acceptance of dotted
values with varying number of parts in multiple bases. This applies
to inet_aton and inet_addr, but not to inet_pton. Clarify this
section by listing the functions to which this applies. Move the
description of what inet_pton accepts into this section from STANDARDS,
where it is easily missed. Rename the section to clarify that it
applies only to IPv4. (inet_pton also works with IPv6.)
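A small illustration of the difference: inet_aton(3) accepts shortened and non-decimal forms such as "127.1" or "0x7f.0.0.1", while inet_pton(3) with AF_INET accepts only four dotted-decimal parts. The wrapper names `parse_aton()` and `parse_pton4()` are illustrative; both return the address in host byte order for easy comparison.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 and stores the host-order address if inet_aton() accepts it. */
static int
parse_aton(const char *src, unsigned long *host)
{
	struct in_addr a;

	if (inet_aton(src, &a) == 0)
		return (0);
	*host = ntohl(a.s_addr);
	return (1);
}

/* Same for inet_pton(AF_INET, ...), which requires a.b.c.d in decimal. */
static int
parse_pton4(const char *src, unsigned long *host)
{
	struct in_addr a;

	if (inet_pton(AF_INET, src, &a) != 1)
		return (0);
	*host = ntohl(a.s_addr);
	return (1);
}
```

For example, "10.1.2" is accepted by inet_aton() as 10.1.0.2 (the last part fills the remaining 16 bits), but rejected by inet_pton().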
Mike Karels [Fri, 5 Jan 2024 19:41:24 +0000 (13:41 -0600)]
arm64/RPI: enable powerd by default on arm64-aarch64-RPI images
Most 64-bit Raspberry Pi models have a variable processor clock
speed that defaults to a slow speed (e.g. 600 MHz for a nominal
1.5 GHz clock). This results in everything running slowly unless
or until powerd is started, and FreeBSD is then thought to be slow.
Enable powerd by default in /etc/rc.conf on the arm64-aarch64-RPI
images. Tested on Raspberry Pi 3B+ and 4B so far.
Dimitry Andric [Sat, 27 Jan 2024 21:51:08 +0000 (22:51 +0100)]
Add libllvm and liblldb source files to enable WITH_ASAN build
This is another part of fixing the WITH_ASAN build. Some additional
source files had to be added to libllvm and liblldb, since the ASan
instrumentation causes symbols in those files to be referenced.
Dimitry Andric [Sat, 27 Jan 2024 21:24:38 +0000 (22:24 +0100)]
msun: remove fabs from Symbol.map, and adjust comment
We have s_fabs.c, but fabs(3) is already provided by libc for
historical reasons, so it is not compiled into libm. When the linker
does not use --undefined-version, this leads to a complaint about the
symbol being nonexistent, so remove it from Symbol.map.
While here, adjust the comment about some functions being supplied by
libc: while it is true that all these are indeed in libc, libm still
includes its own versions of frexp(3), isnan(3), isnanf(3), and
isnanl(3).
Reported by: Steve Kargl <sgk@troutmask.apl.washington.edu>
MFC after: 3 days
A FreeBSD developer reported a symbol conflict, and intercepting this
function had little value anyway.
This is one part of fixing the WITH_ASAN build. Some executables in the
base system define their own hexdump() symbol, which would otherwise
conflict with the ASan-intercepted one.
Kyle Evans [Tue, 16 Jan 2024 02:55:59 +0000 (20:55 -0600)]
kern: tty: recanonicalize the buffer on ICANON/VEOF/VEOL changes
Before this change, we would canonicalize any partial input if the new
local mode is not ICANON, but that's about it. If we were switching
from -ICANON -> ICANON, or if VEOF/VEOL changes, then our internal canon
accounting would be wrong.
The main consequence of this is that in ICANON mode, we would
potentially hang a read(2) longer if the new VEOF/VEOL appears later in
the buffer, and FIONREAD would be similarly wrong as a result.
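A simplified user-space model of what "canonicalizing" means here: in ICANON mode, only bytes up to and including the last break character (newline, VEOF, VEOL) count as complete input. Since the answer depends on ICANON, VEOF, and VEOL, it must be recomputed whenever any of them changes. The struct, the function name `recanonicalize()`, and the "0 means VEOL unset" convention are hypothetical; the kernel's actual ttyinq bookkeeping differs.

```c
#include <stddef.h>

/* Hypothetical snapshot of the settings that affect canonicalization. */
struct canon_chars {
	int		icanon;	/* nonzero when ICANON is set */
	unsigned char	veof;	/* EOF character, e.g. ^D */
	unsigned char	veol;	/* extra EOL character; 0 = unset here */
};

/*
 * Return the offset just past the last break character in 'buf', i.e.
 * how much input forms complete lines.  Without ICANON every byte is
 * available.
 */
static size_t
recanonicalize(const unsigned char *buf, size_t len,
    const struct canon_chars *cc)
{
	size_t i, end = 0;

	if (!cc->icanon)
		return (len);
	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];

		if (c == '\n' || c == cc->veof ||
		    (cc->veol != 0 && c == cc->veol))
			end = i + 1;
	}
	return (end);
}
```

Changing VEOF from ^D to another character, for instance, can shrink the canonicalized region to zero; keeping the old boundary would let read(2) and FIONREAD act on a break that no longer exists.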
Kyle Evans [Tue, 16 Jan 2024 02:55:59 +0000 (20:55 -0600)]
kern: pts: do not special case closed slave side
This would previously return 1 if the slave side of the pts was closed
to force an application to read() from it and observe the EOF, but it's
not clear why and this is inconsistent both with how we handle devices
with similar mechanics (like pipes) and also with other kernels, such as
OpenBSD/NetBSD and Linux.
Kyle Evans [Tue, 16 Jan 2024 02:55:58 +0000 (20:55 -0600)]
kern: tty: fix EOF handling for canonical reads
If the read(2) buffer is one byte short of an EOF, then we'll end up
reading the line into the buffer, then re-entering and seeing an EOF at
the beginning of the inq, assuming it's a zero-length line.
Fix this corner-case by searching one more byte than we have available
for an EOF. If we found it, then we'll trim it here; otherwise, we'll
limit our read to just the space we have in the out buffer and the next
read(2) will (potentially) read the remainder of the line.
While here, fix FIONREAD to match what an application can expect
read(2) to return: scan for the first break character in the part of
the input that has been canonicalized, since read(2) never returns more
than that.
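A user-space model of the corner case: with "abc^D" queued and a 3-byte read buffer, scanning only 3 bytes misses the EOF, so the next read would see a bare ^D and report a spurious zero-length read. Scanning one extra byte finds the EOF and trims it in the same read. The function names `canon_read()` and `canon_fionread()` are hypothetical; they model the described behavior, not the kernel's tty code.

```c
#include <stddef.h>
#include <string.h>

/*
 * One canonical-mode read: copy at most 'outsz' bytes from the
 * canonicalized queue 'inq' and report how many queue bytes were
 * consumed (a trailing EOF is consumed but not returned).
 */
static size_t
canon_read(const unsigned char *inq, size_t avail, unsigned char veof,
    unsigned char *out, size_t outsz, size_t *consumed)
{
	/* Look one byte past what fits, so an EOF right after it is seen. */
	size_t scan = avail < outsz + 1 ? avail : outsz + 1;
	size_t i, n;

	for (i = 0; i < scan; i++) {
		if (inq[i] == veof) {
			/* EOF ends the line: return what precedes it. */
			memcpy(out, inq, i);
			*consumed = i + 1;
			return (i);
		}
		if (inq[i] == '\n' && i < outsz) {
			memcpy(out, inq, i + 1);
			*consumed = i + 1;
			return (i + 1);
		}
	}
	/* No break that fits: fill the buffer; a later read gets the rest. */
	n = avail < outsz ? avail : outsz;
	memcpy(out, inq, n);
	*consumed = n;
	return (n);
}

/* FIONREAD model: bytes up to the first break in the canonicalized part. */
static size_t
canon_fionread(const unsigned char *inq, size_t canon_len,
    unsigned char veof)
{
	for (size_t i = 0; i < canon_len; i++) {
		if (inq[i] == veof)
			return (i);	/* the EOF byte itself is never returned */
		if (inq[i] == '\n')
			return (i + 1);
	}
	return (canon_len);
}
```

With the queue "abc^Ddef\n", a 3-byte read returns "abc" and consumes 4 bytes, so the following read returns "def\n" rather than a false EOF.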
Michael Osipov [Fri, 24 Nov 2023 09:26:41 +0000 (10:26 +0100)]
periodic: Make daily diff(1) output as small as possible
By default, make daily diff(1) ignore whitespace changes and produce unified
output with zero (0) lines of context. This reduces the number of unrelated
lines in e-mails delivered to root.
Michael Osipov [Fri, 24 Nov 2023 09:26:41 +0000 (10:26 +0100)]
periodic: Make security diff(1) output as small as possible
By default, make security diff(1) produce unified output with zero (0) lines
of context. This reduces the number of unrelated lines in e-mails delivered
to root.
Xin LI [Sat, 27 Jan 2024 03:09:39 +0000 (19:09 -0800)]
releng-gce: Advertise the availability of UEFI support in GCE images.
The amd64 and arm64 images support UEFI; mark them as such so users can
take advantage of UEFI boot on GCE. This was already done for FreeBSD
14.0-RELEASE but was never codified into the release tools (and should be).
Aaron LI [Mon, 22 Jan 2024 16:18:56 +0000 (10:18 -0600)]
wg: detach bpf upon destroy as well
bpfattach() is called in wg_clone_create(), but the matching bpfdetach()
is missing from wg_clone_destroy(). Add the missing bpfdetach() to avoid
leaking both the associated bpf bits and the ifnet that bpf will hold a
reference to.
Aaron LI [Wed, 17 Jan 2024 23:29:23 +0000 (23:29 +0000)]
if_wg: fix access to noise_local->l_has_identity and l_private
These members are protected by the identity lock, so rlock it in
noise_remote_alloc() and then assert that we have it held to some extent
in noise_precompute_ss().
Aaron LI [Wed, 17 Jan 2024 23:29:23 +0000 (23:29 +0000)]
if_wg: fix erroneous calculation in calculate_padding() for p_mtu == 0
In practice this is harmless; only keepalive packets may realistically have
p_mtu == 0, and they'll also have no payload so the math works out the same
either way. Still, let's prefer technical accuracy and calculate the amount
of padding needed rather than the padded length...
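A sketch of the corrected calculation, assuming WireGuard's rule of padding to a multiple of 16 bytes without exceeding the MTU, and treating an oversized packet by its last MTU-sized unit (as the Linux WireGuard implementation does). The macro `PKT_PADDING` and this standalone signature are illustrative; the in-tree function takes a packet structure. Both branches return the amount of padding, not the padded length.

```c
/* Assumption: pad to a multiple of 16 bytes, as WireGuard does. */
#define PKT_PADDING	16u

/* Return how many padding bytes to append to 'len' bytes for 'mtu'. */
static unsigned int
calculate_padding(unsigned int len, unsigned int mtu)
{
	unsigned int padded;

	if (mtu == 0) {
		/* Keepalives: no MTU recorded and len is 0, so this yields 0. */
		padded = (len + (PKT_PADDING - 1)) & ~(PKT_PADDING - 1);
		return (padded - len);
	}
	/* Oversized packet: only its last MTU-sized unit matters. */
	if (len > mtu)
		len %= mtu;
	padded = (len + (PKT_PADDING - 1)) & ~(PKT_PADDING - 1);
	if (padded > mtu)
		padded = mtu;		/* never pad past the MTU */
	return (padded - len);
}
```

For a keepalive (len 0) the buggy and fixed versions agree, since the padded length and the padding amount are both 0; they differ only for a nonzero length with mtu == 0.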
Mark Johnston [Mon, 15 Jan 2024 17:29:02 +0000 (12:29 -0500)]
condvar: Fix a use-after-free in _cv_wait() when ktrace is enabled
When a thread wakes up after sleeping on a CV, it must not dereference
the CV structure, as it may already have been freed. At least ZFS
relies on this invariant; see commit c636f94bd2ff15be5b904939872b4bce31456c18 for an example.
Thus, when logging context-switch events, copy the wmesg into a stack
buffer while it is still safe to do so, and log that after waking up.
While here, move the initial ktrcsw() call later, after assertions and
the SCHEDULER_STOPPED_TD() condition are checked.
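A user-space sketch of the pattern: copy the wait message into a stack buffer while the CV is known to be valid, then log only the copy after waking. `struct fake_cv`, `cv_wait_logged()`, and the sleeper callback are hypothetical stand-ins for the kernel's CV, _cv_wait(), and the actual sleep; the example sleeper frees the CV to simulate a waker that releases it once the sleeper has been woken.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a kernel condition variable. */
struct fake_cv {
	char	wmesg[16];	/* description recorded by ktrace */
};

/* Hypothetical sleep hook; 'cv' may be freed before it returns. */
typedef void (*sleep_fn)(struct fake_cv *cv);

/* Copy wmesg before sleeping; never dereference 'cv' after waking. */
static void
cv_wait_logged(struct fake_cv *cv, sleep_fn do_sleep, char *log,
    size_t logsz)
{
	char wmesg[sizeof(cv->wmesg)];

	snprintf(wmesg, sizeof(wmesg), "%s", cv->wmesg);
	do_sleep(cv);		/* from here on, 'cv' may be dangling */
	snprintf(log, logsz, "wakeup: %s", wmesg);
}

/* Example sleeper simulating a waker that frees the CV. */
static void
sleep_then_free(struct fake_cv *cv)
{
	free(cv);
}
```

Reading `cv->wmesg` after `do_sleep()` returns would be exactly the use-after-free the commit fixes.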
Mark Johnston [Mon, 15 Jan 2024 17:27:11 +0000 (12:27 -0500)]
condvar: Clean up condvar.h a bit
- Remove a typedef that has been unused for a long time.
- Remove a LOCORE guard. MI headers like condvar.h don't need such a
guard in general.
- Move a forward declaration into the _KERNEL block.
- Add a types.h include to make the file self-contained.