John Baldwin [Mon, 7 Feb 2022 22:11:10 +0000 (14:11 -0800)]
Extend the VMM stats interface to support a dynamic count of statistics.
- Add a starting index to 'struct vmstats' and change the
VM_STATS ioctl to fetch the 64 stats starting at that index.
A compat shim for <= 13 continues to fetch only the first 64
stats.
- Extend vm_get_stats() in libvmmapi to use a loop and a static
thread local buffer which grows to hold the stats needed.
John Baldwin [Mon, 7 Feb 2022 20:55:08 +0000 (12:55 -0800)]
cfiscsi_done: Free the dummy PDU earlier.
The dummy PDU needs to be freed before marking task abortion complete
as otherwise cfiscsi_session_terminate_tasks can return and destroy
the session in another thread before the PDU is freed.
Fixes: 2e8d1a55258d iscsi: Allocate a dummy PDU for the internal nexus reset task.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34176
John Baldwin [Fri, 4 Feb 2022 23:38:49 +0000 (15:38 -0800)]
cxgbei: Rework parsing of pre-offload PDUs.
sbcut() returns mbufs in reverse order so is not suitable for reading
data from the socket buffer. Instead, check for already-received data
in the receive worker thread before passing offload PDUs up to the
iSCSI layer. This uses soreceive() to read data from the socket and
is also to use M_WAITOK since it now runs from a worker thread instead
of an interrupt thread.
Also, fix decoding of the data segment length for pre-offload PDUs.
Reported by: Jithesh Arakkan @ Chelsio
Fixes: a8c4147edcdc cxgbei: Parse all PDUs received prior to enabling offload mode.
Sponsored by: Chelsio Communications
John Baldwin [Fri, 28 Jan 2022 21:07:04 +0000 (13:07 -0800)]
iscsi: Allocate a dummy PDU for the internal nexus reset task.
When an iSCSI target session is terminated, an internal nexus reset
task is posted to abort existing tasks belonging to the session.
Previously, the ctl_io for this internal nexus reset stored a pointer
to the session in the slot that normally holds a pointer to the PDU
from the initiator that triggered the I/O request. The completion
handler then assumed that any nexus reset I/O was due to an internal
request and fetched the session pointer (instead of the PDU pointer)
from the ctl_io. However, it is possible to trigger a nexus reset via
an on-the-wire task management PDU. If such a PDU were sent to the
target, then the completion handler would incorrectly treat this
request as an internal request and treat the pointer to the received
PDU as a pointer to the session instead.
To fix, allocate a dummy PDU for the internal reset task and use an
invalid opcode to differentiate internal nexus resets from resets
requested by the initiator.
John Baldwin [Thu, 6 Jan 2022 22:46:50 +0000 (14:46 -0800)]
cryptocheck: Add aliases for algs with multiple key sizes.
Previously algorithms such as AES-CBC would provide an algorithm
without a key size for the smallest key size and additional algorithms
with an explicit key size, e.g. "aes-cbc" (128 bits), "aes-cbc192",
and "aes-cbc256".
Instead, always make the key size name explicit and reuse the
"generic" name to request running tests against all of the key sizes.
For example, for AES-CBC this means "aes-cbc128" is now the name of
the variant with a 128-bit key and "aes-cbc" runs tests of AES-CBC
with all three key sizes.
This makes it easier to run tests on all combinations of ciphers like
AES-GCM or AES-CCM with -z in a single invocation.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33759
John Baldwin [Tue, 4 Jan 2022 22:22:32 +0000 (14:22 -0800)]
ccr: Use a software OCF session for requests which fallback to software.
Previously the driver duplicated code from cryptosoft.c to handle
certain edge case AES-CCM and AES-GCM requests. However, this
approach has a few downsides:
1) It only uses "plain" software and not accelerated software since it
uses enc_xform directly.
2) It performs the operation synchronously even though the caller
believes it is invoking an async driver. This was fine for the
original use case of requests with only AAD and no payload that
execute quickly, but is a bit more disingenuous for large requests
which fall back due to exceeding the size of a firmware work
request (e.g. due to large scatter/gather lists).
3) It has required several updates since ccr(4) was added to the tree.
Instead, allocate a software session for AES-CCM and AES-GCM sessions
and dispatch a cloned request asynchronusly to the software session.
John Baldwin [Tue, 4 Jan 2022 22:22:12 +0000 (14:22 -0800)]
OCF: Add crypto_clonereq().
This function clones an existing crypto request, but associates the
new request with a specified session. The intended use case is for
drivers to be able to fall back to software by cloning a request and
dispatch it to an internally allocated software session.
John Baldwin [Tue, 28 Dec 2021 21:51:25 +0000 (13:51 -0800)]
Simplify swi for bus_dma.
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
John Baldwin [Wed, 22 Dec 2021 20:53:58 +0000 (12:53 -0800)]
cxgbei: Don't fail task setup if the socket is disconnected.
When the initiator is reconnecting to the target, the connection may
temporarily be marked disconnected or not have an associated socket.
New I/O requests received by the initiator in this state should not
fail with ECONNRESET as that results in an I/O error back to userland.
Instead, they need to still succeed so that CAM can queue the requests
and send them once the connection is re-established.
Setting up DDP for zero-copy receive requires a socket, so just punt
on using DDP for these transfers.
John Baldwin [Wed, 29 Dec 2021 22:36:04 +0000 (14:36 -0800)]
iscsi: Handle large Text responses.
Text requests and responses can span multiple PDUs. In that case, the
sender sets the Continue bit in non-final PDUs and the Final bit in
the last PDU. The receiver responds to non-final PDUs with an empty
text PDU.
To support this, add a more abstract API in libiscsi which accepts and
receives key sets rather than PDUs. These routines internally send or
receive one or more PDUs. Use these new functions to replace the
handling of TextRequest and TextResponse PDUs in discovery sessions in
both ctld and iscsid.
Note that there is not currently a use case for large Text requests
and those are still always sent as a single PDU. However, discovery
sessions can return a text response listing targets that spans
multiple PDUs, so the new API supports sending and receiving multi-PDU
responses.
John Baldwin [Wed, 29 Dec 2021 00:40:04 +0000 (16:40 -0800)]
iscsid: Always free the duplicated address in resolve_addr().
If a "raw" IPv6 address (denoted by a leading '[') is used as a target
address, then 'arg' is incremented by one to skip over the '['.
However, this meant that at the end of the function the wrong address
was passed to free(). With malloc junking enabled and given suitably
small strings, malloc() would happily overwrite the correct number of
bytes with junk, but off by one byte overwriting the byte after the
allocation.
This manifested as the first byte of the 'HeaderDigest' key being
overwritten causing the key name on the wire to be sent as
'\x5eaderDigest' which the target rejected.
Reported by: Jithesh Arakkan @ Chelsio
Found with: ASAN (via WITH_ASAN=yes)
Sponsored by: Chelsio Communications
John Baldwin [Wed, 22 Dec 2021 23:23:45 +0000 (15:23 -0800)]
ctld: Disable -Wcast-align warnings.
clang complains about the downcasts from struct connection to struct
ctld_connection as the alignment of struct ctld_connection is higher
on 32-bit platforms. However, the warning is in this case harmless as
the downcasts are on objects originally allocated as instances of
struct ctld_connection with suitable alignment.
John Baldwin [Wed, 22 Dec 2021 18:41:01 +0000 (10:41 -0800)]
libiscsiutil: Fix a memory leak with negotiation keys.
When keys are loaded from a received PDU, a copy of the received keys
block is saved in the keys struct and the name and value pointers
point into that saved block. Freeing the keys frees this block.
However, when keys are added to a keys struct to build a set of keys
later sent in a PDU, the keys data block pointer is not used and
individual key names and values hold allocated strings. When the keys
structure was freed, all of these individual key name and value
strings were leaked.
Instead, allocate copies of strings for names and values when parsing
a set of keys from a received PDU and free all of the individual key
name and value strings when deleting a set of keys.
John Baldwin [Wed, 22 Dec 2021 18:35:46 +0000 (10:35 -0800)]
Add an internal libiscsiutil library.
Move some of the code duplicated between ctld(8) and iscsid(8) into a
libiscsiutil library.
Sharing the low-level PDU code did require having a
'struct connection' base class with a method table to permit separate
initiator vs target behavior (e.g. in handling proxy PDUs).
John Baldwin [Tue, 14 Dec 2021 19:01:05 +0000 (11:01 -0800)]
ktls: Support for TLS 1.3 receive offload.
Note that support for TLS 1.3 receive offload in OpenSSL is still an
open pull request in active development. However, potential changes
to that pull request should not affect the kernel interface.
John Baldwin [Thu, 9 Dec 2021 21:17:54 +0000 (13:17 -0800)]
TLS: Use <machine/tls.h> for libc and rtld.
- Include <machine/tls.h> in MD rtld_machdep.h headers.
- Remove local definitions of TLS_* constants from rtld_machdep.h
headers and libc using the values from <machine/tls.h> instead.
- Use _tcb_set() instead of inlined versions in MD
allocate_initial_tls() routines in rtld. The one exception is amd64
whose _tcb_set() invokes the amd64_set_fsbase ifunc. rtld cannot
use ifuncs, so amd64 inlines the logic to optionally write to fsbase
directly.
- Use _tcb_set() instead of _set_tp() in libc.
- Use '&_tcb_get()->tcb_dtv' instead of _get_tp() in both rtld and libc.
This permits removing _get_tp.c from rtld.
- Use TLS_TCB_SIZE and TLS_TCB_ALIGN with allocate_tls() in MD
allocate_initial_tls() routines in rtld.
John Baldwin [Thu, 9 Dec 2021 21:17:41 +0000 (13:17 -0800)]
libthr: Use <machine/tls.h> for most MD TLS details.
Note that on amd64 this effectively removes the unused tcb_spare field
from the end of struct tcb since the definition of struct tcb in
<x86/tls.h> does not include that field.
Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33352
John Baldwin [Thu, 9 Dec 2021 21:17:13 +0000 (13:17 -0800)]
Add <machine/tls.h> header to hold MD constants and helpers for TLS.
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
For stable/13 only, sys/arm/include/tls.h includes support for
ARM_TP_ADDRESS which is not present in main.
John Baldwin [Thu, 9 Dec 2021 21:16:19 +0000 (13:16 -0800)]
mips: Add TLS_DTV_OFFSET to the result of tls_get_addr_common.
Previously TLS_DTV_OFFSET was added to the offset passed to
tls_get_addr_common; however, this approach matches powerpc and RISC-V
and better matches the intention.
Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33347
John Baldwin [Thu, 9 Dec 2021 21:16:00 +0000 (13:16 -0800)]
mips: Rename TLS_DTP_OFFSET to TLS_DTV_OFFSET.
This is the more standard name for the bias of dtv pointers used on
other platforms. This also fixes a few other places that were using
the wrong bias previously on MIPS such as dlpi_tls_data in struct
dl_phdr_info and the recently added __libc_tls_get_addr().
Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33346
John Baldwin [Thu, 9 Dec 2021 21:15:38 +0000 (13:15 -0800)]
libthr: Remove the DTV_OFFSET macro.
This macro is confusing as it is not related to the similarly named
TLS_DTV_OFFSET. Instead, replace its one use with the desired
expression which is the same on all platforms.
Reviewed by: kib, emaste, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33345
John Baldwin [Thu, 9 Dec 2021 19:52:42 +0000 (11:52 -0800)]
GMAC: Reset initial hash value and counter in AES_GMAC_Reinit().
Previously, these values were only cleared in AES_GMAC_Init(), so a
second set of operations could reuse the final hash as the initial
hash. Currently this bug does not trigger in cryptosoft as existing
GMAC and GCM operations always use an on-stack auth context
initialized from a template context.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33315
John Baldwin [Thu, 9 Dec 2021 19:52:41 +0000 (11:52 -0800)]
crypto: Don't assert for empty output buffers.
It is always valid for crp_payload_output_start to be 0. However, if
an output buffer is empty (e.g. a decryption request with a tag but an
empty payload), the existing assertion failed since 0 is not less than
0.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33193
John Baldwin [Fri, 30 Jul 2021 00:09:23 +0000 (17:09 -0700)]
geom_vfs: Pre-allocate event for g_vfs_destroy.
When an active g_vfs is orphaned due to an underlying disk going away
the destroy is deferred until the filesystem is unmounted in
g_vfs_done(). However, g_vfs_done() is invoked from a non-sleepable
context and cannot use M_WAITOK to allocate the event. Instead,
allocate the event in g_vfs_orphan() and save it in the softc to be
retrieved by the last call to g_vfs_done().
Alan Somers [Sat, 2 Apr 2022 19:31:24 +0000 (13:31 -0600)]
fusefs: fix two bugs regarding VOP_RECLAIM of the root inode
* We never send FUSE_LOOKUP for the root inode, since its inode number
is hard-coded to 1. Therefore, we should not send FUSE_FORGET for it,
lest the server see its lookup count fall below 0.
* During VOP_RECLAIM, if we are reclaiming the root inode, we must clear
the file system's vroot pointer. Otherwise it will be left pointing
at a reclaimed vnode, which will cause future VOP_LOOKUP operations to
fail. Previously we only cleared that pointer during VFS_UMOUNT. I
don't know of any real-world way to trigger this bug.
Support for this directive has been removed in config(8) on main,
which leaves us unable to build LINT with newer config(8). It's
believed that mcount-based profiling didn't really work on modern
systems anyways, so the value of testing this is low.
We avoid providing limited backwards compatibility here to continue and
warn folks that may somehow be deploying real-world configs with `profile`
specified.
This is a direct commit to stable/13, but a partial MFC of aa3ea612be36.
Mark Johnston [Thu, 21 Apr 2022 14:49:22 +0000 (10:49 -0400)]
ctfdump: Remove definitions of warn() and vwarn()
The presence of the latter causes a link error when building a
statically linked ctfdump(1) because libc defines the same symbol.
libc's warn() is defined as a weak symbol and so does not cause the same
problem, but let's just use libc's version.
Reported by: stephane rochoy <stephane.rochoy@stormshield.eu>
Sponsored by: The FreeBSD Foundation
Mark Johnston [Thu, 21 Apr 2022 17:22:09 +0000 (13:22 -0400)]
mld6: Ensure that mld_domifattach() always succeeds
mld_domifattach() does a memory allocation under the global MLD mutex
and so can fail, but no error handling prevents a null pointer
dereference in this case. The mutex is only needed when updating the
global softc list; the allocation and static initialization of the softc
does not require this mutex. So, reduce the scope of the mutex and use
M_WAITOK for the allocation.
xhci(4): Ensure the so-called data toggle gets properly reset.
Use the drop and enable endpoint context commands to force a reset of
the data toggle for USB 2.0 and USB 3.0 after:
- clear endpoint halt command (when the driver wishes).
- set config command (when the kernel or user-space wants).
- set alternate setting command (only affected endpoints).
Some XHCI HW implementations may not allow the endpoint reset command when
the endpoint context is not in the halted state.
Reported by: Juniper and Gary Jennejohn
Sponsored by: NVIDIA Networking
e1000: Try auto-negotiation for fixed 100 or 10 configuration
Currently if an e1000 interface is set to a fixed media configuration,
for gigabit, it will participate in auto-negotiation as required by
IEEE 802.3-2018 Clause 37. However, if set to fixed media configuration
for 100 or 10, it does NOT participate in auto-negotiation.
By my reading of Clauses 28 and 37, while auto-negotiation is optional
for 100 and 10, it is not prohibited and is, in fact, "highly
recommended".
This patch enables auto-negotiation for fixed 100 and 10 media
configuration, in a similar manner to that already performed for 1000.
I.e., the patch enables advertising of just the manually configured
settings with the goal of allowing the remote end to match the manually
configured settings if it has them available.
To be clear, this patch does NOT allow an em(4) interface that has been
manually configured with specific media settings to respond to
auto-negotiation by then configuring different parameters to those that
were manually configured. The intent of this patch is to fully comply
with the requirements of Clause 37, but for 100 and 10.
The need for this has arisen on an em(4) link where the other end is
under a different administrative control and is set to full
auto-negotiation. Due to the cable length GigE is not working well. It
is desired to set the em(4) end to "media 100baseTX mediatype
full-duplex" which does work when both ends are configured that way.
Currently, because em(4) does not participate in autoneg for this
setting, the remote defaults to half-duplex - i.e., there's a duplex
mismatch and things don't work. With this patch, em(4) would inform the
remote that it has only 100baseTX full, the remote would match that and
it will work.
Ed Maste [Wed, 27 Apr 2022 13:15:09 +0000 (09:15 -0400)]
libcxxrt: Insert padding in __cxa_dependent_exception
Padding was added to __cxa_exception in 45ca8b19 and
__cxa_dependent_exception needs the same layout.
Add some static_asserts to detect this in the future.
John F. Carr [Sat, 19 Mar 2022 22:51:43 +0000 (18:51 -0400)]
hpet: Allow a MMIO window smaller than 1K
Some new AMD systems provide a HPET MMIO region smaller than the 1KB
specified, and a correspondingly small number of timers. Handle this in
the HPET driver rather than requiring a 1KB window. This allows the
HPET driver to attach on such systems.
pf: counter argument to pfr_pool_get() may never be NULL
Coverity points out that if counter was NULL when passed to
pfr_pool_get() we could potentially end up dereferencing it.
Happily all users of the function pass a non-NULL pointer. Enforce this
by assertion and remove the pointless NULL check.
Move the use of 'sc' to after the NULL check.
It's very unlikely that we'd actually hit this, but Coverity is correct
that it's not a good idea to dereference the pointer and only then NULL
check it.
15b1eb142c changed the callout code to store the CALLOUT_SHAREDLOCK flag
in c_iflags (where it used to be c_flags), but failed to update the
check in softclock_call_cc(). This resulted in the callout code always
taking the write lock, even if a read lock had been requested (with
the CALLOUT_SHAREDLOCK flag in callout_init_rm()).
Ed Maste [Tue, 19 Apr 2022 19:44:46 +0000 (15:44 -0400)]
capsicum: briefly describe capabilities in man page
Provide a very brief introduction to capabilities, using a couple of
sentences from David Chisnall's mailing list response[1] to a question
about Linux capabilities and Capsicum.
Mailing list subject (in case the archive URL changes) was
Re: Linux capabilities to Capsicum
Kyle Evans [Sat, 19 Mar 2022 03:03:44 +0000 (22:03 -0500)]
psci: finish psci_present implementation
This was already declared in psci.h, but it was never defined/set. Do
this now, so we can use it to decide if enable-method in /cpus FDT nodes
should be inspected later on. While we're here, convert it to a
boolean.
stand: zfs: handle holes at the tail end correctly
This mirrors dmu_read_impl(), zeroing out the tail end of the buffer and
clipping the read to what's contained by the block that exists.
This fixes an issue that arose during the 13.1 release process; in
13.1-RC1 and later, setting up GELI+ZFS will result in a failure to
boot. The culprit is this, which causes us to fail to load geom_eli.ko
as there's a residual portion after the single datablk that should be
zeroed out.
The correct logic is a lot simpler than the previous iteration. We
record the base fts_name to avoid having to worry about whether we
needed the root symlink name or not (as applicable), then we can simply
shift all of that logic to after path translation to make it less
fragile.
If we're copying to DNE, then we'll have swapped out the NULL root_stat
pointer and then attempted to recurse on it. The previously nonexistent
directory shouldn't exist at all in the new structure, so just back out
from that tree entirely and move on.
The tests have been amended to indicate our expectations better with
subdirectory recursion. If we copy A to A/B, then we expect to copy
everything from A/B/* into A/B/A/B, with exception to the A that we
create in A/B.
Kyle Evans [Thu, 27 Jan 2022 18:02:17 +0000 (12:02 -0600)]
cp: fix some cases with infinite recursion
As noted in the PR, cp -R has some surprising behavior. Typically, when
you `cp -R foo bar` where both foo and bar exist, foo is cleanly copied
to foo/bar. When you `cp -R foo foo` (where foo clearly exists), cp(1)
goes a little off the rails as it creates foo/foo, then discovers that
and creates foo/foo/foo, so on and so forth, until it eventually fails.
POSIX doesn't seem to disallow this behavior, but it isn't very useful.
GNU cp(1) will detect the recursion and squash it, but emit a message in
the process that it has done so.
This change seemingly follows the GNU behavior, but it currently doesn't
warn about the situation -- the author feels that the final product is
about what one might expect from doing this and thus, doesn't need a
warning. The author doesn't feel strongly about this.
PR: 235438
Reviewed by: bapt
Sponsored by: Klara, Inc.
Wojciech Macek [Mon, 20 Dec 2021 05:32:51 +0000 (06:32 +0100)]
cam: don't send scsi commands on shutdown when reboot method RB_NOSYNC
Don't send the SCSI comand SYNCHRONIZE CACHE on devices that are still
open when RB_NOSYNC is the reboot method. This may avoid recursive panics
when doadump is called due to a SCSI/CAM/USB error/bug.
Rick Macklem [Sun, 23 Jan 2022 22:17:40 +0000 (14:17 -0800)]
mountd: Delay starting mountd until after mountlate
PR#254282 reports a problem where nullfs mounts cannot be
exported via mountd for FreeBSD 13.0.
The problem seems to be that, to do the nullfs mounts in
/etc/fstab, they require the "late" mount option, so that the
underlying filesystem is mounted (ZFS for the PR).
Adding "mountlate" to the REQUIRE list in /etc/rc.d/mountd
fixes the problem, but that results in a dependency cycle
because /etc/rc.d/lockd specifies:
REQUIRE: nfsd
BEFORE: DAEMON
--> which forces mountd to preceed DAEMON.
This patch removes "nfsd" from REQUIRE for lockd and statd,
then adds mountlate to REQUIRE for mountd, to fix this
problem. Having lockd REQUIRE nfsd was done in the NetBSD
code when it was pulled into FreeBSD and there does not
seem to be a need for this.
In case this causes problems, a long MFC has been specified.
Mark Johnston [Wed, 13 Apr 2022 14:47:08 +0000 (10:47 -0400)]
libsysdecode: Fix decoding of Capsicum rights
Capsicum rights are a bit tricky since some of them are subsets of
others, and one can have rights R1 and R2 such that R1 is a subset of
R2, but there is no collection of named rights whose union is R2. So,
they don't behave like most other flag sets. sysdecode_cap_rights(3)
does not handle this properly and so can emit misleading decodings.
Try to fix all of these problems:
- Include composite rights in the caprights table.
- Use a constructor to sort the caprights table such that "larger"
rights appear first and thus are matched first.
- Don't print rights that are a subset of rights already printed, so as
to minimize the length of the output.
- Print a trailing message if some of the specific rights are not
matched by the table.
PR: 263165
Reviewed by: pauamma_gundo.com (doc), jhb, emaste
Sponsored by: The FreeBSD Foundation
Mark Johnston [Wed, 23 Mar 2022 16:36:12 +0000 (12:36 -0400)]
setitimer: Fix exit race
We use the p_itcallout callout, interlocked by the proc lock, to
schedule timeouts for the setitimer(2) system call. When a process
exits, the callout must be stopped before the process struct is
recycled.
Currently we attempt to stop the callout in exit1() with the call
_callout_stop_safe(&p->p_itcallout, CS_EXECUTING). If this call returns
0, then we sleep in order to drain the callout. However, this happens
only if the callout is not scheduled at all. If the callout thread is
blocked on the proc lock, then exit1() will not block and the callout
may execute after the process has fully exited, typically resulting in a
panic.
I cannot see a reason to use the CS_EXECUTING flag here. Instead, use
the regular callout_stop()/callout_drain() dance to halt the callout.
Reported by: ler
Tested by: ler, pho
Sponsored by: The FreeBSD Foundation