The commit that purported to fix CVE-2014-8611 (805288c2f062) only hid
it behind another bug. Two later commits, 86a16ada1ea6 and 44cf1e5eb470, attempted to address this new bug but mostly just confused
the issue. This commit rolls back the three previous changes and fixes
CVE-2014-8611 correctly.
The key to understanding the bug (and the fix) is that `_w` has
different meanings for different stream modes. If the stream is
unbuffered, it is always zero. If the stream is fully buffered, it is
the amount of space remaining in the buffer (equal to the buffer size
when the buffer is empty and zero when the buffer is full). If the
stream is line-buffered, it is a negative number reflecting the amount
of data in the buffer (zero when the buffer is empty and negative buffer
size when the buffer is full).
At the heart of `fflush()`, we call the stream's write function in a
loop, where `t` represents the return value from the last call and `n`
the amount of data that remains to be written. When the write function
fails, we need to move the unwritten data to the top of the buffer
(unless nothing was written) and adjust `_p` (which points to the next
free location in the buffer) and `_w` accordingly. These variables have
already been set to the values they should have after a successful
flush, so instead of adjusting them down to reflect what was written,
we're adjusting them up to reflect what remains.
The bug was that while `_p` was always adjusted, we only adjusted `_w`
if the stream was fully buffered. The fix is to also adjust `_w` for
line-buffered streams. Everything else is just noise.
Fixes: 805288c2f062 Fixes: 86a16ada1ea6 Fixes: 44cf1e5eb470
Sponsored by: Klara, Inc.
linuxkpi: races between linux_queue_delayed_work_on() and linux_cancel_delayed_work_sync()
1. Suppose that linux_queue_delayed_work_on() is called with
non-zero delay and found the work.state WORK_ST_IDLE. It
resets the state to WORK_ST_TIMER and locks timer.mtx. Now, if
linux_cancel_delayed_work_sync() was also called meantime, read
state as WORK_ST_TIMER and already taken the mutex, it is executing
callout_stop() on non-armed callout. Then linux_queue_delayed_work_on()
continues and schedules callout. But the return value from cancel() is
false, making it possible to the requeue from callback to slip in.
2. If linux_cancel_delayed_work_sync() returned true, we need to cancel
again. The requeue from callback could have revived the work.
The end result is that we schedule callout that might be freed, since
cancel_delayed_work_sync() claims that everything was stopped. This
contradicts the way the KPI is used in Linux, where consumers expect
that cancel_delayed_work_sync() is reliable on its own.
Zhenlei Huang [Tue, 7 Nov 2023 04:45:25 +0000 (12:45 +0800)]
kern linker: Do not retry loading modules on EEXIST
LINKER_LOAD_FILE() calls linker_load_dependencies() which will return
EEXIST in case the module to be loaded has already been compiled into
the kernel. Since the format of the module is now recognized then there
is no need to retry loading with a different linker, otherwise the
userland will get misleading error number ENOEXEC.
Rick Macklem [Mon, 6 Nov 2023 22:25:30 +0000 (14:25 -0800)]
nfscl: newnfs_copycred() cannot be called when a mutex is held
Since newnfs_copycred() calls crsetgroups() which in turn calls
crextend() which might do a malloc(M_WAITOK), newnfs_copycred()
cannot be called with a mutex held. Fortunately, the malloc()
call is rarely done, since XU_GROUPS is 16 and the NFS client
uses a maximum of 17 (only 17 groups will cause the malloc() to
be called). Further, it is only a problem if the malloc() tries
to sleep(). As such, this bug does not seem to have caused
problems in practice.
This patch fixes the one place in the NFS client where
newnfs_copycred() is called while a mutex is held by moving the
call to after where the mutex is released.
Found by inspection while working on an experimental patch.
Mark Johnston [Mon, 6 Nov 2023 19:57:56 +0000 (14:57 -0500)]
e6000sw: Fix locking in miibus_{read,write}reg implementations
Commit 469290648005e13b819a19353032ca53dda4378f made e6000sw's
implementation of miibus_(read|write)reg assume that the softc lock is
held. I presume that is to avoid lock recursion in e6000sw_attach() ->
e6000sw_attach_miibus() -> mii_attach() -> MIIBUS_READREG().
However, the lock assertion in e6000sw_readphy_locked() can fail if a
different driver uses the interface to probe registers. Work around the
problem by providing implementations which lock the softc if it is not
already locked.
Warner Losh [Mon, 6 Nov 2023 17:47:15 +0000 (10:47 -0700)]
cam: Minor opt_cam.h cleanup
sys/cam/cam.h includes opt_cam.h, so none of the clients need to do
this. cam.h does all the right dancing to conditionally include
opt_cam.h only when it makes sense. It generally only matters when
cam_debug.h is included (it must be included before that). Many of the
stray opt_cam.h includes were after cam_debug.h which would be a problem
were it not included in cam/cam.h. The other users of CAM options that
aren't debug all already include cam/cam.h.
Also trim unneeded sys/cdefs.h files from the files touched.
Alexander Motin [Mon, 6 Nov 2023 16:05:48 +0000 (11:05 -0500)]
nvme: Introduce longer timeouts for admin queue
KIOXIA CD8 SSDs routinely take ~25 seconds to delete non-empty
namespace. In some cases like hot-plug it takes longer, triggering
timeout and controller resets after just 30 seconds. Linux for many
years has separate 60 seconds timeout for admin queue. This patch
does the same. And it is good to be consistent.
Kristof Provost [Mon, 6 Nov 2023 10:57:35 +0000 (11:57 +0100)]
libpfctl: handle the 'pfctl' netlink family not being supported
If we fail to find the pfctl family we should not attempt to make the
call. That means that either pf is not loaded, or it's a very old (i.e.
pre-netlink) version.
Reported by: manu
Sponsored by: Rubicon Communications, LLC ("Netgate")
Roger Pau Monné [Fri, 3 Nov 2023 09:28:16 +0000 (10:28 +0100)]
xen-netfront: attempt to make cleanup idempotent
Current cleanup code assumes that all the fields are allocated and/or setup by
the time cleanup is called, but this is not always true: a failure in mid-setup
of the device will cause the functions to be called with possibly uninitialized
fields.
Fix the functions to cope with such sate, while also attempting to make the
cleanup idempotent.
Finally fix an error path during setup that would not mark the device as
closed, and hence prevents the kernel from finishing booting.
Is plain bogus, for once grant_ref_t is the type of the grant reference, but
not the entry used to store such references in the grant frames. But even if
the above calculation is switched to use grant_entry_v1_t, it would end up as:
Michael Tuexen [Sun, 5 Nov 2023 19:32:46 +0000 (20:32 +0100)]
if_tuntap: trigger the bpf hook on transmitting for the tap interface
The tun interface triggers the bpf hook when a packet is transmitted,
the tap interface triggers it when the packet is read from the
character device. This is inconsistent.
So fix the tap device such that it behaves like the tun device.
This is needed for adding support for the tap device to packetdrill.
Michael Tuexen [Sun, 5 Nov 2023 14:28:54 +0000 (15:28 +0100)]
udplite: make socketoption available on IPv6 sockets
This patch allows the IPPROTO_UDPLITE-level socket options
UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV to be used on
AF_INET6 sockets in addition to AF_INET sockets.
Kyle Evans [Sun, 5 Nov 2023 02:08:36 +0000 (21:08 -0500)]
grep: don't rely on implementation-defined malloc(0) behavior
The very few places that rely on malloc/calloc of a zero-size region
won't attempt to dereference it, so just return NULL rather than rolling
the dice with the underlying malloc implementation.
Dan Mcgregor [Sat, 4 Nov 2023 22:07:56 +0000 (15:07 -0700)]
mountd: Add support for spaces in exported directories
The previous code would correctly parse strings including quotation
marks (") or backslash (/), but the tests when creating the export
includes them in the final string. This prevents exporting paths
with embedded spaces, for example "/exports/with space". Trying
results in log lines resembling:
mountd[1337]: bad exports list line '/exports/with\ space':
/exports/with\ space: lstat() failed: No such file or directory.
Turns out that when creating its exports list, zfs escapes strings
in a format compatible with vis(3). Since I expect that zfs sharenfs
is the dominating use case for generating an exports list, use
strunvis(3) to parse the export path. The result is lines like the
following allowing spaces:
128f63cedc14 and 9e589b093857 added proper UTF-8 backspacing handling in
the tty(4) driver, which is enabled by setting the new IUTF8 flag
through stty(1). Since the default locale is UTF-8, and the feature
itself is important enough, enable IUTF8 by default.
Related discussion:
https://lists.freebsd.org/archives/freebsd-arch/2023-November/000534.html
bsd.progs.mk must pass META_XTRAS to gendirdeps.mk
The indirection used by bsd.progs.mk is setting META_XTRAS
means the value needs to be passed in the environment to
gendirdeps.mk, as any expansion before then will be empty.
Remove a now misleading comment from bsd.progs.mk
before it includes bsd.prog.mk
When the cross-mount walking logic in vfs_lookup() was factored into
a separate function, the main cross-mount traversal loop was changed
from a do...while loop conditional on the current vnode having
VIRF_MOUNTPOINT set to an unconditional for(;;) loop. For the
unionfs 'crosslock' case in which the vnode may be re-locked, this
meant that continuing the loop upon finding inconsistent
v_mountedhere state would no longer branch to a check that the vnode
is in fact still a mountpoint. This would in turn lead to over-
iteration and, for INVARIANTS builds, a failed assert on the next
iteration.
Add compat.aarch32 tunables for maxssiz, maxdsiz, and maxvmem.
Set the default values same as for amd64.
Fix freebsd32 sysentvec on arm64 to provide sv_maxssiz, and sv_fixlimit.
PR: 274705
Reviewed by: markj
Tested by: fuz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42451
Mark Johnston [Sat, 4 Nov 2023 14:28:24 +0000 (10:28 -0400)]
pfsync: Avoid transmitting uninitialized bytes in pfsync_sendout()
When IPv6 support was added to pfsync, PFSYNC_MINPKT increased such that
we always allocate enough space for either IPv4 or IPv6 headers. IPv6
headers are 20 bytes larger than IPv4 headers. When pfsync_sendout()
does its thing, it ends up allocating enough space for either; thus when
transmitting an IPv4 packet, the last 20 bytes of the buffer are left
uninitialized.
Fix the problem by stashing the length in a local variable and adjusting
it depending on the address family in use.
While here, just zero the entire buffer in one go rather than being
careful to initialize each subheader. This seems simpler and less error
prone.
Reported by: KMSAN
Reviewed by: kp
Fixes: 6fc7fc2dbb2b ("pfsync: transport over IPv6")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42461
net/frr[89] revealed an interesting edge-case on arm when dynamically
linking a shared library that declares more than one static TLS variable
with at least one using the "initial-exec" TLS model. In the case
of frr[89], this library was libfrr.so which essentially does the
following:
#include <stdio.h>
#include "lib.h"
static __thread int *a
__attribute__((tls_model("initial-exec")));
void lib_test()
{
static __thread int b = -1;
printf("&a = %p\n", &a);
printf(" a = %p\n", a);
printf("\n");
printf("&b = %p\n", &b);
printf(" b = %d\n", b);
}
Allocates a file scoped `static __thread` pointer with
tls_model("initial-exec") and later a block scoped TLS int. Notice in
the above minimal reproducer, `b == -1`. The relocation process does
the wrong thing and ends up pointing both `a` and `b` at the same place
in memory.
Bjoern A. Zeeb [Thu, 26 Oct 2023 20:55:59 +0000 (20:55 +0000)]
net80211: add ieee80211_add_vhtcap_ch()
Add an implementation of ieee80211_add_vhtcap() which works based on
information derived from the vap (and possibly channel/band but we do
not support that yet in net80211). This is needed for scans request
information in LinuxKPI at times before we have a BSS.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: adrian, cc
Differential Revision: https://reviews.freebsd.org/D42422
Like for the VAP rename ic_flags_vht to ic_vht_flags for consistency to
keep "VHT" fields together and merge ic_vhtcaps and ic_vht_mcsinfo
into struct ieee80211_vht_cap ic_vht_cap.
While the structure layout changes no other functional changes intended.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: adrian, cc
Differential Revision: https://reviews.freebsd.org/D42421
Bjoern A. Zeeb [Fri, 27 Oct 2023 18:33:22 +0000 (18:33 +0000)]
net80211: combine iv_vhtcaps and iv_vht_mcsinfo
The iv_vhtcaps and iv_vht_mcsinfo fields together form
struct ieee80211_vht_cap so combine them into one field in the VAP
and keep the information together.
While the structure layout changes no other functional changes intended.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: adrian, cc
Differential Revision: https://reviews.freebsd.org/D42420
Bjoern A. Zeeb [Fri, 27 Oct 2023 18:18:24 +0000 (18:18 +0000)]
net80211: rename iv_flags_vht to iv_vht_flags
While the flag field is internal start naming it as well as "iv_vht*"
so we keep all "VHT" fields together. This breaks with what was done
done for HT but with HE, EHT, .. coming one day seems the more logic
choice.
No functional changes intended.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: adrian, cc
Differential Revision: https://reviews.freebsd.org/D42419
Bjoern A. Zeeb [Fri, 27 Oct 2023 20:41:43 +0000 (20:41 +0000)]
LinuxKPI: 802.11: deal with scan_ie_len
We only need to reserve the extra space for DSSS if
NL80211_FEATURE_DS_PARAM_SET_IE_IN_PROBES is set, so add the conditional.
Also add checks in case scan_ie_len will grow beyond the maximum.
Given this is currently unlikely, leave the cleanup for later as
some other restructuring should be done first.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D42425
Fix the last argument passed to ieee80211_add_channel_cbw() in
lkpi_ic_getradiocaps() for both 2Ghz and 5Ghz bands.
We passed in the unmodified version rather than the adjusted version
based on the per-band channel information possibly leaving
ieee80211_channel_flags enabled which should not be.
So far this should not have made a difference given we did not enable
HT or VHT.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D42424
Bjoern A. Zeeb [Wed, 25 Oct 2023 22:29:35 +0000 (22:29 +0000)]
LinuxKPI: 802.11: error on state transition failure
The state transition failures we were seeing in the early days are
solved. If we now experience one stop processing before passing
over to net80211 (sta_newstate()) and before updating iv_state on
the vap.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D42423
Ed Maste [Tue, 11 Oct 2022 19:27:51 +0000 (15:27 -0400)]
Track upstream project rename in contrib/blocklistd
Upstream is now https://github.com/zoulasc/blocklist/. Rename the
contrib directory and update Makefiles to match, in advance of the next
vendor branch update.
Shawn Anastasio [Fri, 3 Nov 2023 17:40:18 +0000 (14:40 -0300)]
powerpc: Fix inconsistent Altivec handling in set_mcontext
When support for fpu_kern_enter/fpu_kern_leave was added to powerpc,
set_mcontext was updated to handle Altivec state restoration in the same
way that the FPU state by lazily restoring the context on the first
trap. However the function was not correctly updated to unconditionally
clear the PCB_VEC and PSL_VEC bits from the pcb's flags and srr1
respectively which can sometimes result in a mismatch between a
process's MSR[VEC] state and its pcb_flags.
Fix this by simply clearing the VEC flags unconditionally in
set_mcontext, which is already done for FPU/VSX.
Warner Losh [Thu, 2 Nov 2023 20:41:09 +0000 (14:41 -0600)]
cam: Make cam_debug macros atomic
The CAM_DEBUG* macros use multiple printfs to dump the data. This is
suboptimal when tracing things that produce even a moderate amount since
it gets intertwingled. I can't even turn on tracing with a 24-disk HBA
on boot without it getting messed up. Add helper routines to work around
clang's over-use of the stack: that way we only pay the stack penalty
when a trace hits.
This is used to the package annotation helping pkg to know about
backward compatibility is set to the version of the packages not
the version of the host building the packages
pkgbase: set a default set of kernel for when PACKAGE_BUILDING=1
PACKAGE_BUILDING is already known in the ports tree as a variable
use to defined when the packages is being actually built in an
automation process, reuse that variable to define the default set
of kernel we plan to build for the default pkgbase.
Rick Macklem [Thu, 2 Nov 2023 21:07:01 +0000 (14:07 -0700)]
krpc: Display stats of TLS usage
This patch adds some sysctls:
kern.rpc.unenc.tx_msgcnt
kern.rpc.unenc.tx_msgbytes
kern.rpc.unenc.rx_msgcnt
kern.rpc.unenc.rx_msgbytes
kern.rpc.tls.tx_msgcnt
kern.rpc.tls.tx_msgbytes
kern.rpc.tls.rx_msgcnt
kern.rpc.tls.rx_msgbytes
kern.rpc.tls.handshake_success
kern.rpc.tls.handshake_failed
kern.rpc.tls.alerts
which allow a NFS server sysadmin to determine how much
NFS-over-TLS is being used. A large number of failed
handshakes might also indicate an NFS confirguration
problem.
This patch moves the definition of "kern.rpc" from the
kgssapi module to the krpc module. As such, both modules
need to be rebuilt from sources. Since __FreeBSD_version
was bumped yesterday, I will not bump it again.
Suggested by: gwollman
Discussed on: freebsd-current
MFC after: 1 month
Mark Johnston [Thu, 2 Nov 2023 18:34:26 +0000 (14:34 -0400)]
riscv: Remove unnecessary invalidations in pmap_enter_quick_locked()
This function always overwrites an invalid PTE, so if
pmap_try_insert_pv_entry() fails it is certainly not necessary to
invalidate anything, because the PTE has not yet been written by that
point.
It should also not be necessary to invalidate TLBs after overwriting an
invalid entry. In principle the TLB could cache negative entries, but
then the worst case scenario is a spurious fault. Since pmap_enter()
does not bother issuing an sfence.vma, pmap_enter_quick_locked() should
behave similarly.
Mark Johnston [Thu, 2 Nov 2023 18:33:55 +0000 (14:33 -0400)]
riscv: Port improvements from arm64/amd64 pmaps, part 2
- Give pmap_promote_l2() a return value indicating whether or not
promotion succeeded.
- Check pmap_ps_enabled() in pmap_promote_l2() rather than making
callers do it.
- Annotate superpages_enabled with __read_frequently.
Mark Johnston [Thu, 2 Nov 2023 18:33:37 +0000 (14:33 -0400)]
riscv: Port improvements from arm64/amd64 pmaps, part 1
- When promoting, do not require that all PTEs all have PTE_A set.
Instead, record whether they did and store this information in the
PTP's valid bits.
- Synchronize some comments in pmap_promote_l2().
- Make pmap_promote_l2() scan starting from the end of the 2MB range
instead of the beginning. See the commit log for 9d1b7fa31f510 for
justification of this, which I believe applies here as well.
Mark Johnston [Thu, 2 Nov 2023 18:30:10 +0000 (14:30 -0400)]
amd64: Remove PMAP_INLINE
With clang it expands to "inline"; clang in practice may inline
externally visible functions even without the hint. So just remove the
hints and let the compiler decide.
No functional change intended. pmap.o is identical before and after
this patch.
Olivier Certner [Fri, 13 Oct 2023 08:52:31 +0000 (10:52 +0200)]
Ensure 'struct thread' is aligned to a cache line
Using the new UMA_ALIGN_CACHE_AND_MASK() facility, which allows to
simultaneously guarantee a minimum of 32 bytes of alignment (the 5 lower
bits are always 0).
For the record, to this day, here's a (possibly non-exhaustive) list of
synchronization primitives using lower bits to store flags in pointers
to thread structures:
- lockmgr, rwlock and sx all use the 5 bits directly.
- rmlock indirectly relies on sx, so can use the 5 bits.
- mtx (non-spin) relies on the 3 lower bits.
Reviewed by: markj, kib
MFC after: 2 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42266
Olivier Certner [Fri, 13 Oct 2023 15:05:34 +0000 (17:05 +0200)]
uma: Permit specifying max of cache line and some custom alignment
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).
First candidate consumer that comes to mind is 'struct thread', see next
commit.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42265
Olivier Certner [Fri, 13 Oct 2023 15:13:28 +0000 (17:13 +0200)]
linuxkpi: dma_get_cache_alignment(): Fix off-by-one result
Substituting 'uma_align_cache' by the appropriately named accessor
uma_get_cache_align_mask() made apparent that dma_get_cache_alignment()
was off by one, since it was defined to be the mask derived from the
alignment value.
Reviewed by: markj, bz
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42264
Olivier Certner [Fri, 13 Oct 2023 14:09:51 +0000 (16:09 +0200)]
uma: New check_align_mask(): Validate alignments (INVARIANTS)
New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.
Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (that was checking only for a power of
2).
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42263
Olivier Certner [Fri, 13 Oct 2023 12:49:11 +0000 (14:49 +0200)]
uma: Make the cache alignment mask unsigned
In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignment
is internally stored as signed integers. Such big values do not make
sense anyway and indicate some programming error. A future commit will
introduce checks for this case and other ones.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42262
Olivier Certner [Fri, 13 Oct 2023 12:13:30 +0000 (14:13 +0200)]
uma: UMA_ALIGN_CACHE: Resolve the proper value at use point
Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask. So suppress it and replace its use with a call to
uma_get_align_mask(). The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42259
Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.
Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one. Rename the stem to something more explicit. Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.
Hide 'uma_align_cache' to ensure that it cannot be set in any other way
then by a call to uma_set_cache_align_mask(), which will perform sanity
checks in a further commit. While here, rename it to
'uma_cache_align_mask'.
This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42258
Olivier Certner [Tue, 10 Oct 2023 17:36:20 +0000 (19:36 +0200)]
Ensure "init" (PID 1) also executes userret() initially
Calling userret() from fork_return() misses the first return to
userspace of the "init" (PID 1) process. The latter is indeed created
by fork1() followed by a call to cpu_fork_kthread_handler() call that
replaces fork_return() by start_init() as the function to execute after
fork.
A new process' initial return to userspace in the end always happens
through returning from fork_exit(), so move userret() there instead to
fix the omission.
This problem was discovered as part of a revamp of scheduling priorities
that lead to experimenting with asserting and sometimes resetting
priorities in sched_userret(), in the course of which the author
stumbled on panics being triggered only in init() or only in other
processes, depending on the modifications to sched_userret(). This
change currently has no practical effect but will have some in the near
future.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42257
This function is to be called only when initializing a new process (so,
'proc0' and at fork), and not in any other circumstances. Setting the
process' 'p_ucred' field to the result of crcowget() on the original
credentials is the only thing it does, hiding the fact that the process'
'p_ucred' field is crushed by the call. Moreover, most of the code it
executes is already encapsulated in crcowget().
To prevent misuse and improve code readability, just remove this
function and replace it with a direct assignment to 'p_ucred'.
Reviewed by: markj (earlier version), kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42255
Zhenlei Huang [Thu, 2 Nov 2023 09:07:11 +0000 (17:07 +0800)]
Hyper-V: vmbus: Add NULL check for vmbus_res
QEMU emulates Hyper-V [1] but lacks the emulation for vmbus_res, thus no
coherence information is available. Add NULL check for it and fallback
to no coherence. This will prevent FreeBSD guests from panic on QEMU
with the Hyper-V enlightenment hv-synic enabled.
For real Hyper-V, both gen1 and gen2 have vmbus_res then they are not
affected by this change.
hmt(4): Do not require input report HID usages to be a member of TLC
Some touchpads places button usages (in HID report descriptor) in to
the 2-nd level collection rather than in to the top level one. That
confuses current code. Remove collection level check in HID report
descriptor parser to fix device detection.
Reported by: Peter Much <pmc@citylink.dinoex.sub.org>
PR: 267094
MFC after: 1 week
Zhenlei Huang [Thu, 2 Nov 2023 05:14:40 +0000 (13:14 +0800)]
cam/ata: Postpone removal of two compat sysctls until 15
Prefer UNMAPPEDIO and ROTATING from flags sysctl. See
1. aeab0812e68c (Add flags sysctl to ada)
2. cf3ff63e55e4 (Convert unmappedio over to a flag)
3. 96eb32bf0f5a (Convert rotating to a flag bit)
Reviewed by: imp, ken, #cam
MFC after: immediately (we want this in 14.0)
Differential Revision: https://reviews.freebsd.org/D42402
Warner Losh [Wed, 1 Nov 2023 22:43:37 +0000 (16:43 -0600)]
libc: Purge unneeded cdefs.h
These sys/cdefs.h are not needed. Purge them. They are mostly left-over
from the $FreeBSD$ removal. A few in libc are still required for macros
that cdefs.h defines. Keep those.