Ed Maste [Tue, 19 Apr 2022 19:44:46 +0000 (15:44 -0400)]
capsicum: briefly describe capabilities in man page
Provide a very brief introduction to capabilities, using a couple of
sentences from David Chisnall's mailing list response[1] to a question
about Linux capabilities and Capsicum.
Mailing list subject (in case the archive URL changes) was
Re: Linux capabilities to Capsicum
Kyle Evans [Sat, 19 Mar 2022 03:03:44 +0000 (22:03 -0500)]
psci: finish psci_present implementation
This was already declared in psci.h, but it was never defined/set. Do
this now, so we can use it to decide if enable-method in /cpus FDT nodes
should be inspected later on. While we're here, convert it to a
boolean.
stand: zfs: handle holes at the tail end correctly
This mirrors dmu_read_impl(), zeroing out the tail end of the buffer and
clipping the read to what's contained by the block that exists.
This fixes an issue that arose during the 13.1 release process; in
13.1-RC1 and later, setting up GELI+ZFS will result in a failure to
boot. The culprit is this, which causes us to fail to load geom_eli.ko
as there's a residual portion after the single datablk that should be
zeroed out.
The correct logic is a lot simpler than the previous iteration. We
record the base fts_name to avoid having to worry about whether we
needed the root symlink name or not (as applicable), then we can simply
shift all of that logic to after path translation to make it less
fragile.
If we're copying to DNE, then we'll have swapped out the NULL root_stat
pointer and then attempted to recurse on it. The previously nonexistent
directory shouldn't exist at all in the new structure, so just back out
from that tree entirely and move on.
The tests have been amended to indicate our expectations better with
subdirectory recursion. If we copy A to A/B, then we expect to copy
everything from A/B/* into A/B/A/B, with exception to the A that we
create in A/B.
Kyle Evans [Thu, 27 Jan 2022 18:02:17 +0000 (12:02 -0600)]
cp: fix some cases with infinite recursion
As noted in the PR, cp -R has some surprising behavior. Typically, when
you `cp -R foo bar` where both foo and bar exist, foo is cleanly copied
to foo/bar. When you `cp -R foo foo` (where foo clearly exists), cp(1)
goes a little off the rails as it creates foo/foo, then discovers that
and creates foo/foo/foo, so on and so forth, until it eventually fails.
POSIX doesn't seem to disallow this behavior, but it isn't very useful.
GNU cp(1) will detect the recursion and squash it, but emit a message in
the process that it has done so.
This change seemingly follows the GNU behavior, but it currently doesn't
warn about the situation -- the author feels that the final product is
about what one might expect from doing this and thus, doesn't need a
warning. The author doesn't feel strongly about this.
PR: 235438
Reviewed by: bapt
Sponsored by: Klara, Inc.
Wojciech Macek [Mon, 20 Dec 2021 05:32:51 +0000 (06:32 +0100)]
cam: don't send scsi commands on shutdown when reboot method RB_NOSYNC
Don't send the SCSI comand SYNCHRONIZE CACHE on devices that are still
open when RB_NOSYNC is the reboot method. This may avoid recursive panics
when doadump is called due to a SCSI/CAM/USB error/bug.
Rick Macklem [Sun, 23 Jan 2022 22:17:40 +0000 (14:17 -0800)]
mountd: Delay starting mountd until after mountlate
PR#254282 reports a problem where nullfs mounts cannot be
exported via mountd for FreeBSD 13.0.
The problem seems to be that, to do the nullfs mounts in
/etc/fstab, they require the "late" mount option, so that the
underlying filesystem is mounted (ZFS for the PR).
Adding "mountlate" to the REQUIRE list in /etc/rc.d/mountd
fixes the problem, but that results in a dependency cycle
because /etc/rc.d/lockd specifies:
REQUIRE: nfsd
BEFORE: DAEMON
--> which forces mountd to preceed DAEMON.
This patch removes "nfsd" from REQUIRE for lockd and statd,
then adds mountlate to REQUIRE for mountd, to fix this
problem. Having lockd REQUIRE nfsd was done in the NetBSD
code when it was pulled into FreeBSD and there does not
seem to be a need for this.
In case this causes problems, a long MFC has been specified.
Mark Johnston [Wed, 13 Apr 2022 14:47:08 +0000 (10:47 -0400)]
libsysdecode: Fix decoding of Capsicum rights
Capsicum rights are a bit tricky since some of them are subsets of
others, and one can have rights R1 and R2 such that R1 is a subset of
R2, but there is no collection of named rights whose union is R2. So,
they don't behave like most other flag sets. sysdecode_cap_rights(3)
does not handle this properly and so can emit misleading decodings.
Try to fix all of these problems:
- Include composite rights in the caprights table.
- Use a constructor to sort the caprights table such that "larger"
rights appear first and thus are matched first.
- Don't print rights that are a subset of rights already printed, so as
to minimize the length of the output.
- Print a trailing message if some of the specific rights are not
matched by the table.
PR: 263165
Reviewed by: pauamma_gundo.com (doc), jhb, emaste
Sponsored by: The FreeBSD Foundation
Mark Johnston [Wed, 23 Mar 2022 16:36:12 +0000 (12:36 -0400)]
setitimer: Fix exit race
We use the p_itcallout callout, interlocked by the proc lock, to
schedule timeouts for the setitimer(2) system call. When a process
exits, the callout must be stopped before the process struct is
recycled.
Currently we attempt to stop the callout in exit1() with the call
_callout_stop_safe(&p->p_itcallout, CS_EXECUTING). If this call returns
0, then we sleep in order to drain the callout. However, this happens
only if the callout is not scheduled at all. If the callout thread is
blocked on the proc lock, then exit1() will not block and the callout
may execute after the process has fully exited, typically resulting in a
panic.
I cannot see a reason to use the CS_EXECUTING flag here. Instead, use
the regular callout_stop()/callout_drain() dance to halt the callout.
Reported by: ler
Tested by: ler, pho
Sponsored by: The FreeBSD Foundation
diff: tests: loosen up requirements for report_identical
This test cannot run without an unprivileged_user being specified
anyways, so just run as the unprivileged user. Revoking read permisions
works just as well if you're guaranteed non-root.
loader: userboot: provide a getsecs() implementation
We don't need it for userboot, but it avoids issues with BIND_NOW, so
just provide it. time(3) isn't defined but ends up being provided by
libc linked into the host process, which is generally fine.
Printing device followed by interface matches, e.g., edk2. Note that
this is only a fallback, many firmware implementations will provide the
protocol that we'll use to format device paths.
Mike Karels [Mon, 11 Apr 2022 19:44:49 +0000 (14:44 -0500)]
genet: fix problems with interface down/up
The genet interface did not resume operation correctly after doing
ifconfig down then up. The down/reset procedure did not clear the
RUNNING flag, and did not reset enough of the hardware state. This
patch is modeled on OpenBSD code, with a call to gen_reset added
to reset the controller completely. Regularize the parameter to
gen_dma_disable() while here.
Mark Johnston [Thu, 14 Apr 2022 19:45:54 +0000 (15:45 -0400)]
vm: Move the "vm_wait in early boot" assertion to the proper place
The assertion was added in commit 1771e987ca6a. After that, vm_wait()
and friends were refactored such that the actual sleep happens
elsewhere. Now the assertion condition is not checked when
vm_wait_doms() is called directly, and it is checked even if we are not
going to sleep (because vm_page_count_min_set(wdoms) is false).
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Gordon Bergling [Thu, 14 Apr 2022 08:04:14 +0000 (10:04 +0200)]
time(3): Refine history in the manual page
The time() system call first appeared in Version 1 AT&T UNIX. Through
the Version 3 AT&T UNIX, it returned 60 Hz ticks since an epoch that
changed occasionally, because it was a 32-bit value that overflowed in a
little over 2 years.
In Version 4 AT&T UNIX the granularity of the return value was reduced to
whole seconds, delaying the aforementioned overflow until 2038.
Version 7 AT&T UNIX introduced the ftime() system call, which returned
time at a millisecond level, though retained the gtime() system call
(exposed as time() in userland). time() could have been implemented as a
wrapper around ftime(), but that wasn't done.
4.1cBSD implemented a higher-precision time function gettimeofday() to
replace ftime() and reimplemented time() in terms of that.
Since FreeBSD 9 the implementation of time() uses
clock_gettime(CLOCK_SECOND) instead of gettimeofday() for performance
reasons.
Mark Johnston [Mon, 18 Apr 2022 15:45:45 +0000 (11:45 -0400)]
path_test: Correct the kevent test
Perhaps surprisingly, and contrary to the expectations of
path_test:path_event, NOTE_LINK events are not raised when a file is
unlinked. Prior to commit bf13db086b84, the test happened to work
because unlinking the file would cause the vnode to be recycled, and
EVFILT_VNODE knotes deliver an event with EV_EOF set when the vnode is
doomed. Since the test did not verify the note type, the test
succeeded. After commit bf13db086b84, the vnode is not recycled after
being unlinked and so the test hangs.
Fix the test by waiting for NOTE_DELETE instead, and check that we got
the note that we expected.
Reported by: Jenkins
Sponsored by: The FreeBSD Foundation
Insert padding in __cxa_exception struct for compatibility
Similar to https://github.com/llvm/llvm-project/commit/f2a436058fcb, the
addition of __attribute__((__aligned__)) to _Unwind_Exception (in commit b9616964) causes implicit padding to be inserted before the unwindHeader
field in __cxa_exception.
Applications attempt to get at the earlier fields in __cxa_exception, so
preserve the same negative offsets in __cxa_exception, by moving the
padding to the beginning of the struct.
The assumption here is that if the ABI is not aware of the padding
before unwindHeader and put the referenceCount/primaryException in
there, no padding should exist before unwindHeader.
This should make libreoffice's custom exception handling mechanisms work
correctly, even if it was built against an older cxxabi.h/unwind.h pair.
PR: 263370
Approved by: re (gjb, early MFC)
MFC after: immediately
Andrew Turner [Tue, 22 Mar 2022 15:46:15 +0000 (15:46 +0000)]
Add an implementation of .mcount on arm64
To support cc -pg on arm64 we need to implement .mcount. As clang and
gcc think it is function like it just needs to load the arguments
to _mcount and call it.
On gcc the first argument is passed in x0, however this is missing on
clang so we need to load it from the stack. As it's the caller return
address this will be at a known location.
PR: 262709
Reviewed by: emaste (earlier version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34634
Andrew Turner [Fri, 5 Feb 2021 10:50:29 +0000 (10:50 +0000)]
Add support for arm64 nGnRE device memory
On arm64 we can select how strongly we order device memory. Currently
we use the strongest type of non-Gathering, non-Reordering, no Early
write acknowledgement. This is equivalent to VM_MEMATTR_SO in the 32-bit
arm code.
Create a new memory type to remove the no Early write acknowledgement
option to create a memory attribute that is equivalent to the arm
VM_MEMATTR_DEVICE.
Keep the the old nGnRnE memory as what we provide for VM_MEMATTR_DEVICE
until we can test nGnRE on more hardware. A method for dynamically
switching back may be needed as at least one vendor is known to have
broken nGnRE memory.
This code was marked gone_in(13), so its time has passed.
The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.
LinuxKPI: 802.11: start adding rate control to ieee80211_tx_status()
Start adding rate control feedback in ieee80211_tx_status() in order
for net80211 to be able to report something back (which may not
yet be the view of the firmware). iwlwifi is reporting back an MSC 0
even with HT disabled (to be investigated) so we cannot (yet) use
the firmware/driver rate feedback directly.
Implement skb_copy() with omissions of fragments and possibly other fields
for now. Should we hit frags at any point a log message will let us know.
For the few cases we need this currently this is enough.
Initially we were using the IEs from ieee80211_probereq_ie() of net80211
and put them into the common_ies field. Start by manually building the
per-band and common IE parts as drivers put them back together.
This also involves allocating the req.ie as one buffer for all IEs over
all bands and setting req.ie_len correctly based on how many bytes we
put in.
Manually building per-band scan IEs we still use the net80211 routines
to add IEs to the buffer (mostly).
This is needed by Realtek drivers but will equally used by others.
Realtek would simply panic due to skbs being allocated with the wrong
length.
Longer-term this will help us, e.g., when not supporting VHT on 2Ghz
and we would have to do this anyway.
LinuxKPI: skbuff: dev_kfree_skb_irq() and improvements
While it is currently unclear if we will have to defer work in
dev_kfree_skb_irq() to call dev_kfree_skb().
We only have one caller which seems to be fine on FreeBSD by calling
it directly for now.
While here shortcut skb_put()/skb_put_data() saving us work if there
are no adjustments to do.
Also adjust the logging in skb_is_gso() to avoid getting spammed by it.
LinuxKPI: 802.11: use an sx lock to protect the list of vifs
Use an sx lock to protect the list of vifs. We could use the
linux mutex compat for this but our current implementation may
re-acquire the lock recursively so allow this. The change is
mainly motivated by the fact that some callers may sleep in the
interator function called. Recursiveness is needed because we
see find_sta_by_ifaddr() being called from an iterator function
from iterate_interfaces().
ipfw: fix matching and setting DSCP value for IPv6
Matching for DSCP codes has used incorrect bits. Use IPV6_DSCP()
macro for matching opcodes to fix this. Also this leads to always
use value from a mbuf instead of cached value.
Previously different opcodes have used both cached in f_id value
and stored in the mbuf, and it did not always work after setdscp
action, since cached value was not updated.
Update IPv6 flowid value cached in the f_id.flow_id6 when we do
modification of DSCP value in O_SETDSCP opcode, it may be used by
external modules.
For IPv4 use dst pointer as destination address in fib4_lookup().
It keeps destination address from IPv4 header and can be changed
when PACKET_TAG_IPFORWARD tag was set by packet filter.
For IPv6 override destination address with address from dst_sa.sin6_addr,
that was set from PACKET_TAG_IPFORWARD tag.
Reviewed by: eugen
PR: 256828, 261697, 255705
Differential Revision: https://reviews.freebsd.org/D34732
Milan Obuch [Sat, 2 Apr 2022 18:28:33 +0000 (15:28 -0300)]
MFC: mii: include missing sources in loadable module
As of today, using 'kldload miibus' is not equivalent to using 'device
miibus' in a kernel config. Newly introduced PHY drivers (DP83822,
DP83867, VSCPHY) and source files/PHY driver for FDT-enabled kernels
are missing. Without including them, kernel modules using any function
from dev/mii/mii_fdt.c refuse to load. Additionally, miivar.h directly
includes opt_platform.h.
Add the missing sources to the module build, with the FDT-only files
gated behind an OPT_FDT check. Maintain the alphabetical listing of
SRCS, but move the required header files to a separate line to improve
readability.
Ed Maste [Tue, 1 Mar 2022 21:42:13 +0000 (16:42 -0500)]
ssh: use standalone config file for security key support
An upcoming OpenSSH update has multiple config.h settings that change
depending on whether builtin security key support is enabled. Prepare
for this by moving ENABLE_SK_INTERNAL to a new sk_config.h header
(similar to the approach used for optional krb5 support) and optionally
including that, instead of defining the macro directly from CFLAGS.
Reviewed by: kevans
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34407
Ed Maste [Tue, 22 Mar 2022 17:48:43 +0000 (13:48 -0400)]
kbd: replace vestigial spl calls with Giant assertions
The keyboard driver was initially protected via spl* interrupt priority
calls but (as part of a comprehensive effort) migrated to use the Giant
lock (mutex).
The spl calls left behind became NOPs but they can be confusing as they
have no bearing on the actual mutual exclusion that is now present.
Remove them from kbd and add assertions that Giant is held. markj notes
that there is conflation between the "bus topo" lock (which is Giant
under the hood) and Giant. The assertions could either be addressed as
a small item along with bus topology locking work or they'll be removed
if kbd is decoupled from Giant.
PR: 206680
Reviewed by: markj
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34645
Mark Johnston [Fri, 8 Apr 2022 15:47:25 +0000 (11:47 -0400)]
net: Fix memory leaks in lltable_calc_llheader() error paths
Also convert raw epoch_call() calls to lltable_free_entry() calls, no
functional change intended. There's no need to asynchronously free the
LLEs in that case to begin with, but we might as well use the lltable
interfaces consistently.
Noticed by code inspection; I believe lltable_calc_llheader() failures
do not generally happen in practice.
Reviewed by: bz
Sponsored by: The FreeBSD Foundation
60970a328e280b25b05f1d9a9de1ef91af573c4a did one half of the job
of making rssi relative to nf and numbers for radiotap were fine.
net80211 internally works with .5 dBm units thus we need to apply a
* 2 to the value we pass in to c_rssi; leave a comment explaining.
Note: it is only ifconfig in user space which re-adjust it for printing
or contrib/wpa for calculations. Other applications getting values
from kernel also have to apply the maths.
In collaboration with: J.R. Oldroyd (fbsd opal.com)
Sponsored by: The FreeBSD Foundation
dev_alloc_skb() comapred to alloc_skb() reserves some headroom
at the beginning of the skb which is used by drivers.
Split the code for the two cases and reserve NET_SKB_PAD space,
which should at least be 32 octets.
Bjoern A. Zeeb [Wed, 30 Mar 2022 17:38:23 +0000 (17:38 +0000)]
LinuxKPI: PCI: add counter for linux_dma_map_phys_common() errors
LinuxKPI is asking for single-segment mappings. Some (wireless) drivers
are using this to map multi-pages and our busdma framework is not very
friendly to that as single-segments [D31823]. Add a counter so we can
track when this happens to gather more information.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D34715