When using copy_file_range(2) with an offset parameter,
the CAP_SEEK capability should be required.
This requirement is similar to the behavior observed with
pread(2)/pwrite(2).
Warner Losh [Wed, 27 Sep 2023 23:07:53 +0000 (17:07 -0600)]
newvers: Add comment about why we need sccs, but deprecate it
The SCCS ID is still the most reliable way to dig out the version
information from the kernel w/o false positives. Add a comment to that
effect. savecore(8) neglects to save the kerneldumpheader that would
have the version information at a fixed location. savecore(8) needs to
be augmented to have the right data in the right places, but until then
the old-school SCCS id needs to remain. Once that's fixed, we plan to
remove it.
The reason it needs to be in a fixed or easily findable location is
because if you have an arbitrary core and want to pull the source and
build artificts that went along with that core, you don't yet have the
symbols you need to read the version string. To solve the chicken / egg
problem, one needs an independent way to know what to use so that
automated analysis of cores can happen. The sccs id being in the kernel
ensures that it is in the core image written. The what(1) utility makes
extracting the version easy.
Mark Johnston [Wed, 27 Sep 2023 22:21:37 +0000 (18:21 -0400)]
makefs/zfs: Remove a nonsensical comment
When populating files, makefs needs to copy their contents into
userspace in order to compute a checksum, so copy_file_range(2) is out
of the question. Though, it could possibly be used when building other
types of filesystems.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Ed Maste [Wed, 27 Sep 2023 13:36:33 +0000 (09:36 -0400)]
freebsd-update: handle file -> directory on upgrade
Upgrading from FreeBSD 13.2 to 14.0 failed with
install: ///usr/include/c++/v1/__string exists but is not a directory
because __string changed from a file to a directory with an LLVM
upgrade.
Now, remove the existing file when the type conflicts. Note that this
is only an interim fix to facilitate upgrades from 13.2 for 14.0 BETA
testing. This change does not handle the directory -> file case and
further work is needed.
PR: 273661
Reviewed by: dim, gordon
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41893
Commit b6e28991bf3a modified the allocation path for system scope PMCs
so that the event was allocated early for CPU 0. The reason is so that
the PMC's capabilities could be checked, to determine if pmcstat should
allocate the event on every CPU, or just on one CPU in each NUMA domain.
In the current scheme, there is no way to determine this information
without performing the PMC allocation.
This broke the established use-case of log analysis, and so 0aa150775179a was committed to fix the assertion. The result was what
appeared to be functional, but in normal counter measurement pmcstat was
silently allocating two counters for CPU 0.
This cuts the total number of counters that can be allocated from a CPU
in half. Additionally, depending on the particular hardware/event, we
might not be able to allocate the same event twice on a single CPU.
The simplest solution is to release the early-allocated PMC once we have
obtained its capabilities, and reallocate it later on. This restores the
event list logic to behave as it has for many years, and partially
reverts commit b6e28991bf3a.
Reported by: alc, kevans
Reviewed by: jkoshy, ray
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41978
The loader tunable `net.inet.ip.mfchashsize` does not have corresponding
sysctl MIB entry. Just add it.
While here, the sysctl variable `net.inet.pim.squelch_wholepkt` is actually
a loader tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T`
will report it correctly.
Reviewed by: kp
Fixes: 443fc3176dee Introduce a number of changes to the MROUTING code
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41997
MAXCPU could be on the large side, while the hardware may not have
many processors at all. As such only allocate counters for processors
which actually exist, rather than always allocating for the maximum
potentially allowed number.
This module is bundled into flua, it only provides for now the exec
function. The point of the function is to be able to execute a program
without actually executing a shell.
to use it:
fbsd.exec({"id", "bapt"})
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D41840
Mark Johnston [Wed, 27 Sep 2023 12:24:11 +0000 (08:24 -0400)]
unix: Fix a lock order reveral
Running the test suite yields:
lock order reversal:
1st 0xfffff80004bc6700 unp (unp, sleep mutex) @ sys/kern/uipc_usrreq.c:390
2nd 0xffffffff81a94b30 unp_link_rwlock (unp_link_rwlock, rw) @ sys/kern/uipc_usrreq.c:2934
lock order unp -> unp_link_rwlock attempted at:
0xffffffff80bc216e at witness_checkorder+0xbbe
0xffffffff80b493a5 at _rw_wlock_cookie+0x65
0xffffffff80c0a8e2 at unp_discard+0x22
0xffffffff80c0a888 at unp_freerights+0x38
0xffffffff80c09fdd at unp_scan+0x9d
0xffffffff80c0f9a7 at uipc_sosend_dgram+0x727
0xffffffff80c00a79 at sousrsend+0x79
0xffffffff80c072d0 at kern_sendit+0x1c0
0xffffffff80c074d7 at sendit+0xb7
0xffffffff80c076f3 at sys_sendmsg+0x63
0xffffffff8104d957 at amd64_syscall+0x6b7
0xffffffff8101f9eb at fast_syscall_common+0xf8
This happens when uipc_sosend_dgram() discards a control message because
the receive socket buffer is full. The overflow handling frees
internalized file references in the socket buffer before freeing mbufs.
It does this with socket PCBs locked, leading to the LOR. Defer
handling of file references until the PCBs are unlocked.
Mark Johnston [Wed, 27 Sep 2023 12:23:58 +0000 (08:23 -0400)]
hdac: Defer interrupt allocation in hdac_attach()
hdac_attach() registers an interrupt handler before allocating various
driver resources which are accessed by the interrupt handler. On some
platforms we observe what appear to be spurious interrupts upon a cold
boot, resulting in panics.
Partially work around the problem by deferring irq allocation until
after other resources are allocated. I think this is not a complete
solution, but is correct and sufficient to work around the problems
reported in the PR.
PR: 268393
Tested by: Alexander Sherikov <asherikov@yandex.com>
Tested by: Oleh Hushchenkov <o.hushchenkov@gmail.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41883
libc: Rewrite quick_exit() and at_quick_exit() using C11 atomics.
Compiler memory barriers do not prevent the CPU from executing the code
out of order. Switch to C11 atomics. This also lets us get rid of the
mutex; instead, loop until the compare_exchange succeeds.
While here, change the return value of at_quick_exit() on failure to
the more traditional -1, matching atexit().
Standardize promotion conditions across amd64, arm64, and i386.
Stop requiring the accessed bit for superpage promotion.
Tidy up pmap_promote_pde() calls.
Retire PMAP_INLINE. It's no longer used.
Note: Some of these changes are a prerequisite to fixing a panic that
arises when attempting to create a wired superpage mapping by
pmap_enter(psind=1) (as opposed to promotion).
- 0206c0f ("Handle top-level /delete-node/ directives.")
- d612a9e ("Remove C++11 standard constrain")
- Remove extra white lines after the $FreeBSD$ tag removal
Warner Losh [Tue, 26 Sep 2023 04:08:52 +0000 (22:08 -0600)]
nvme: Supress noise messages
When we're suspending, we get messages about waiting for the controller
to reset. These are in error: we're not waiting for it to reset. We put
the recovery state as part of suspending, so we should suppress these as
a false positive.
Also remove a stray debug that's left over from earlier versions of
the recovery code that no longer makes sense.
Christos Zoulas [Wed, 30 Aug 2023 20:37:24 +0000 (20:37 +0000)]
regcomp: use unsigned char when testing for escapes
- cast GETNEXT to unsigned where it is being promoted to int to prevent
sign-extension (really it would have been better for PEEK*() and
GETNEXT() to return unsigned char; this would have removed a ton of
(uch) casts, but it is too intrusive for now).
- fix an isalpha that should have been iswalpha
Ed Maste [Sun, 24 Sep 2023 13:13:15 +0000 (09:13 -0400)]
mfc-candidates: search by committer only, not author
When both --author and --committer are specified, `git log` requires
both to match. Search only by committer, as it is typically the FreeBSD
committer who will perform the MFC.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41964
LinuxKPI: 802.11: fill regulatory_hint() with some life
Start implementing regulatory_hint() using a .c file based allocation
helper function so we could change structures in the future with
better chances to keep compatibility.
This sets wiphy->regd needed by various LinuxKPI based WiFi drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
LiunxKPI: 802.11: move ieee80211_chanctx_conf into lkpi private struct
Factor out ieee80211_chanctx_conf into struct lkpi_chanctx in order to
keep local state as well. In first instance that is added_to_drv
only. For now we stay single-chanctx only but this paves the path
to make it a list.
Use the new information to implement ieee80211_iter_chan_contexts_atomic().
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
LinuxKPI: 802.11: avoid symbol clash on UP to AC mapping
tid_to_mac80211_ac is an exported symbol in and likely based on iwlwifi,
which leads to a symbol clash in NetBSD. Rename our local LinuxKPI copy
to a better name and add a comment where to find a copy of the mapping
table.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reported by: Phil Nelson (phil netbsd org)
John Baldwin [Mon, 25 Sep 2023 14:46:21 +0000 (07:46 -0700)]
Retire old diskless setup scripts
These scripts predate /etc/rc.diskless* and use a different scheme. A
comment was added to them back in 2002 noting they were 3 years old at
that point.
rtsock: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable `net.route.netisr_maxqlen` is actually a loader
tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will
report it correctly.
No functional change intended.
Reviewed by: glebius
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41928
jail: avoid a clash with /etc/jail.conf.d between rc and jail(8)
Since 13.1, /etc/rc.d/jail has looked for a per-jail config file in
/etc/jail.conf.d. For RELENG 14, the ".include" directive was added to
jail(8), with a sample line in the jail.conf(5) man page that includes
"/etc/jail.conf.d/*.conf".
These two use cases don't work together. When the jail.conf.d files
are included from a master jail.conf, the files in jail.conf.d are
likely to hold only partial configurations, and shouldn't be directly
loaded by rc.d/jail. But there are existing configurations that depend
on the current rc.d behavior. While users could be advised not to
include from /etc/jail.conf.d, it's the natural choice even if not
mentioned in jail.conf.5.
The workaround is for rc.d/jail to continue to load the individual
files, but only when /etc/jail.conf doesn't include from that
directory (via a simple grep test), This allows the current use
while not breaking the previous use.
Reported by: antranigv at freebsd.am
Differential Revision: https://reviews.freebsd.org/D41962
Warner Losh [Sun, 24 Sep 2023 12:57:07 +0000 (06:57 -0600)]
nvme: Fix locking protocol violation to fix suspend / resume
Currently, when we suspend, we need to tear down all the qpairs. We call
nvme_admin_qpair_abort_aers with the admin qpair lock held, but the
tracker it will call for the pending AER also locks it (recursively)
hitting an assert. This routine is called without the qpair lock held
when we destroy the device entirely in a number of places. Add an assert
to this effect and drop the qpair lock before calling it.
nvme_admin_qpair_abort_aers then locks the qpair lock to traverse the
list, dropping it around calls to nvme_qpair_complete_tracker, and
restarting the list scan after picking it back up.
Note: If interrupts are still running, there's a tiny window for these
AERs: If one fires just an instant after we manually complete it, then
we'll be fine: we set the state of the queue to 'waiting' and we ignore
interrupts while 'waiting'. We know we'll destroy all the queue state
with these pending interrupts before looking at them again and we know
all the TRs will have been completed or rescheduled. So either way we're
covered.
Also, tidy up the failure case as well: failing a queue is a superset of
disabling it, so no need to call disable first. This solves solves some
locking issues with recursion since we don't need to recurse.. Set the
qpair state of failed queues to RECOVERY_FAILED and stop scheduling the
watchdog. Assert we're not failed when we're enabling a qpair, since
failure currently is one-way. Make failure a little less verbose.
Next, kill the pre/post reset stuff. It's completely bogus since we
disable the qparis, we don't need to also hold the lock through the
reset: disabling will cause the ISR to return early. This keeps us from
recursing on the recovery lock when resuming. We only need the recovery
lock to avoid a specific race between the timer and the ISR.
Finally, kill NVME_RESET_2X. It'S been a major release since we put it
in and nobody has used it as far as I can tell. And it was a motivator
for the pre/post uglification.
These are all interrelated, so need to be done at the same time.
Alexander Motin [Sat, 23 Sep 2023 16:13:46 +0000 (12:13 -0400)]
kern_sysctl: Make name2oid() non-destructive to the name
It is not the first time I see it panicking while trying to modify
const memory. Lets make it safer and easier to use. While there,
mark few functions using it also const.
Marko Zec [Sat, 23 Sep 2023 08:56:56 +0000 (10:56 +0200)]
ng_eiface: switch VNETs when injecting mbufs into netgraph
A ng_eiface instance may be on lease to a different vnet while
remaining tied to its parent vnet. In such circumstances, before
injecting mbufs into netgraph, curvnet must be set to that of the
ng_eiface's netgraph node. Mark the vnet transition as QUIET,
since otherwise it would be recorded as a curvnet recursion when
ng_eiface's ifnet resides in the same (parent) vnet as its
netgraph node.
Paul Dagnelie [Fri, 22 Sep 2023 23:08:51 +0000 (16:08 -0700)]
Invoke zdb by guid to avoid import errors
The problem that was occurring is basically that a device was removed
by ztest and replaced with another device. It was then reguided. The
import then failed because there were two possible imports with the
same name; one with the new guid, and one with the old. This can
happen because the label writes from the device removal/replacement
can be subject to ztest's error injection.
The other ways to fix this would be to change the error injection to
not trigger on removals (which may not be technically feasible), or
to change the import code to not report configurations that are so
short on devices (which would potentially have unpleasant end-user
effects when trying to recover from data losses/device configuration
issues).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15298
vfs: fix reference counting/locking on LK_UPGRADE error
Factoring out this code unfortunately introduced reference and lock leaks in
case of failure in the lock upgrade path under VV_CROSSLOCK. In terms of
practical use, this impacts unionfs (and nullfs in a corner case).
Fixes: 80bd5ef07025 ("vfs: factor out mount point traversal to a dedicated routine")
MFC after: 3 days
MFC to: stable/14 releng/14.0
Sponsored by: The FreeBSD Foundation
Reviewed by: mjg
[mjg: massaged the commit message a little bit]
Lin Ma [Mon, 19 Jun 2023 09:32:59 +0000 (17:32 +0800)]
netlink: add unregister call in cleanup
For protocols that use netlink (generic and route for now), the unint
handler seems to have forgotten to call unregister, which will cause
the assertion the next time the module is loaded.
This patch adds unregister call to netlink_unregister_proto() for those
handlers to avoid bad things happen.
Reviewed-by: melifaro Fixes: 7e5bf68495cc ("netlink: add netlink support")
Pull-request: https://github.com/freebsd/freebsd-src/pull/781 Signed-off-by: Lin Ma <linma@zju.edu.cn>
pf: Convert PF_DEFAULT_TO_DROP into a vnet loader tunable 'net.pf.default_to_drop'
7f7ef494f11d introduced a compile time option PF_DEFAULT_TO_DROP to make
the pf(4) default rule to drop. While this change exposes a vnet loader
tunable 'net.pf.default_to_drop' so that users can change the default
rule without re-compiling the pf(4) module.
This change is similiar to that for IPFW [1].
1. 5f17ebf94db5 Convert IPFW_DEFAULT_TO_ACCEPT into a loader tunable 'net.inet.ip.fw.default_to_accept'