stand: zfs: handle holes at the tail end correctly
This mirrors dmu_read_impl(), zeroing out the tail end of the buffer and
clipping the read to what's contained by the block that exists.
This fixes an issue that arose during the 13.1 release process; in
13.1-RC1 and later, setting up GELI+ZFS will result in a failure to
boot. The culprit is the unhandled hole at the tail, which causes us to
fail to load geom_eli.ko, as there's a residual portion after the single
datablk that should be zeroed out.
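A minimal sketch of the clip-and-zero idea described above, not the loader's actual code. The helper name read_clipped() and its parameters are hypothetical; it assumes a single block holding blksz valid bytes, with everything past it treated as a hole.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy up to `len` bytes starting at `offset` from a
 * block that holds `blksz` valid bytes.  Whatever part of the request lies
 * past the end of the block is a hole and is zero-filled in `buf`.
 * Returns the number of bytes that actually came from the block.
 */
static size_t
read_clipped(const char *blk, size_t blksz, size_t offset, char *buf,
    size_t len)
{
    size_t avail, ncopy;

    assert(blk != NULL && buf != NULL);

    /* Clip the read to what the existing block contains. */
    avail = offset < blksz ? blksz - offset : 0;
    ncopy = len < avail ? len : avail;
    if (ncopy > 0)
        memcpy(buf, blk + offset, ncopy);

    /* Zero the tail end of the buffer instead of leaving it stale. */
    memset(buf + ncopy, 0, len - ncopy);
    return (ncopy);
}
```

A caller that asks for 8 bytes at offset 2 of a 4-byte block gets 2 bytes of data followed by 6 zero bytes.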
John Baldwin [Thu, 21 Apr 2022 17:41:09 +0000 (10:41 -0700)]
busdma_bounce: Make the map waiting list per-bounce-zone.
When pages are freed to a bounce zone, only maps waiting for pages for
that zone can make forward progress. If a map for a different bounce
zone is at the head of the global list, then requests that could
otherwise make forward progress will be stalled waiting on the other
bounce zone. If bounce zones shared bounce pages then a global list
would still make sense to prevent "later" requests from starving an
earlier request but that is not a concern with per-zone bounce page
pools.
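A rough userland sketch of why a per-zone waiting list avoids the stall. All names here (struct zone, struct map, zone_wait(), zone_free_pages()) are hypothetical and the real busdma code differs; the point is only that freeing pages to a zone looks at that zone's own waiters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a DMA map and a bounce zone. */
struct map {
    int pages_needed;
    int ready;
    struct map *next;
};

struct zone {
    int free_pages;
    struct map *wait_head;      /* waiters for this zone only */
    struct map **wait_tail;
};

static void
zone_init(struct zone *z, int free_pages)
{
    z->free_pages = free_pages;
    z->wait_head = NULL;
    z->wait_tail = &z->wait_head;
}

/* Queue a map at the tail of the zone's own waiting list. */
static void
zone_wait(struct zone *z, struct map *m)
{
    m->ready = 0;
    m->next = NULL;
    *z->wait_tail = m;
    z->wait_tail = &m->next;
}

/*
 * Return pages to a zone and satisfy its waiters in FIFO order.  Waiters
 * on other zones are never consulted, so they cannot block progress here.
 */
static void
zone_free_pages(struct zone *z, int n)
{
    struct map *m;

    assert(n >= 0);
    z->free_pages += n;
    while ((m = z->wait_head) != NULL &&
        m->pages_needed <= z->free_pages) {
        z->free_pages -= m->pages_needed;
        z->wait_head = m->next;
        if (z->wait_head == NULL)
            z->wait_tail = &z->wait_head;
        m->ready = 1;
    }
}
```

With a single global list, a map waiting on one zone at the head would have held up a map waiting on another zone even after that zone had pages available.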
John Baldwin [Thu, 21 Apr 2022 17:29:14 +0000 (10:29 -0700)]
FB_INSTALL_CDEV: Remove this option and related code.
This option was never enabled in GENERIC and does not appear to work
(the cdevsw is stored in a global array but never passed to make_dev
to be associated with a character device).
Mark Johnston [Thu, 21 Apr 2022 17:22:09 +0000 (13:22 -0400)]
mld6: Ensure that mld_domifattach() always succeeds
mld_domifattach() does a memory allocation under the global MLD mutex
and so can fail, but no error handling prevents a null pointer
dereference in this case. The mutex is only needed when updating the
global softc list; the allocation and static initialization of the softc
does not require this mutex. So, reduce the scope of the mutex and use
M_WAITOK for the allocation.
PR: 261457
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34943
pf: counter argument to pfr_pool_get() may never be NULL
Coverity points out that if counter was NULL when passed to
pfr_pool_get() we could potentially end up dereferencing it.
Happily all users of the function pass a non-NULL pointer. Enforce this
by assertion and remove the pointless NULL check.
Move the use of 'sc' to after the NULL check.
It's very unlikely that we'd actually hit this, but Coverity is correct
that it's not a good idea to dereference the pointer and only then NULL
check it.
Mark Johnston [Thu, 21 Apr 2022 14:49:22 +0000 (10:49 -0400)]
ctfdump: Remove definitions of warn() and vwarn()
The presence of the latter causes a link error when building a
statically linked ctfdump(1) because libc defines the same symbol.
libc's warn() is defined as a weak symbol and so does not cause the same
problem, but let's just use libc's version.
Reported by: stephane rochoy <stephane.rochoy@stormshield.eu>
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
xhci(4): Ensure the so-called data toggle gets properly reset.
Use the drop and enable endpoint context commands to force a reset of
the data toggle for USB 2.0 and USB 3.0 after:
- clear endpoint halt command (when the driver wishes).
- set config command (when the kernel or user-space wants).
- set alternate setting command (only affected endpoints).
Some XHCI HW implementations may not allow the endpoint reset command when
the endpoint context is not in the halted state.
Reported by: Juniper and Gary Jennejohn
MFC after: 1 week
Sponsored by: NVIDIA Networking
Doug Moore [Wed, 20 Apr 2022 22:24:11 +0000 (17:24 -0500)]
dev/iommu: Include offset in maxaddr check.
If iommu_gas_match_one has to adjust for a boundary crossing, its
check against maxaddr includes 'offset' in its calculation, to ensure
that the allocated memory does not exceed the max address. However, if
there's no boundary crossing adjustment, then the maxaddr check
disregards 'offset'. Fix that.
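As an illustration only, the check can be pictured as a single helper that always folds 'offset' into the comparison. The function range_fits() is hypothetical, and treating maxaddr as an exclusive upper bound is an assumption of this sketch, not something taken from iommu_gas_match_one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does an allocation of `size` bytes placed at
 * `start + offset` stay at or below `maxaddr`?  The offset is part of the
 * calculation on every path, not only after a boundary adjustment.
 */
static bool
range_fits(uint64_t start, uint64_t offset, uint64_t size, uint64_t maxaddr)
{
    uint64_t first;

    /* Reject wraparound so the final comparison stays meaningful. */
    if (start > UINT64_MAX - offset)
        return (false);
    first = start + offset;
    if (size > UINT64_MAX - first)
        return (false);
    return (first + size <= maxaddr);
}
```

A check that ignored the offset would accept start 0x1000, size 0x1000 under maxaddr 0x2000 even with offset 0x800, although the memory would then end at 0x2800.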
Alan Somers [Wed, 21 Apr 2021 22:56:48 +0000 (16:56 -0600)]
ctlstat: add prometheus output
When invoked by inetd, ctlstat -P will now produce output suitable for
ingestion into Prometheus.
It's a drop-in replacement for https://github.com/Gandi/ctld_exporter,
except that it doesn't report the number of initiators per target, and
it does report time and dma_time.
Andrew Turner [Tue, 19 Apr 2022 16:22:37 +0000 (17:22 +0100)]
Fill the page size array in one posix shm test
The largepage_config posix shared memory test was failing on arm64 as
the page size array is never filled out. Fix this by calling
getpagesizes(3), via pagesizes.
Reviewed by: markj, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34960
15b1eb142c changed the callout code to store the CALLOUT_SHAREDLOCK flag
in c_iflags (where it used to be c_flags), but failed to update the
check in softclock_call_cc(). This resulted in the callout code always
taking the write lock, even if a read lock had been requested (with
the CALLOUT_SHAREDLOCK flag in callout_init_rm()).
When we issue a request to pf and expect a serialised nvlist as a reply
we have to supply a suitable buffer to the kernel.
The required size for this buffer is difficult to predict, and may be
(slightly) different from request to request.
If it's insufficient the kernel will return ENOSPC. Teach libpfctl to
catch this and send the request again with a larger buffer.
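A sketch of the retry pattern, under the assumption that the request can be modelled as a callback returning an errno value. The names request_with_retry() and fake_reply(), the doubling strategy, and the size limit are all illustrative and are not libpfctl's actual interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request callback: fills buf, returns 0 or an errno value. */
typedef int (*request_fn)(void *buf, size_t buflen, size_t *used);

/* Fake stand-in for the kernel side: the reply always needs 100 bytes. */
static int
fake_reply(void *buf, size_t buflen, size_t *used)
{
    if (buflen < 100)
        return (ENOSPC);
    memset(buf, 0xab, 100);
    *used = 100;
    return (0);
}

/*
 * Issue the request, doubling the buffer each time ENOSPC comes back,
 * and give up once the buffer cannot double without exceeding `limit`.
 * On success *bufp owns the buffer; on failure it is set to NULL.
 */
static int
request_with_retry(request_fn fn, size_t initial, size_t limit, void **bufp,
    size_t *usedp)
{
    size_t len = initial;
    void *buf = NULL, *nbuf;
    int error;

    assert(fn != NULL && initial > 0);
    for (;;) {
        nbuf = realloc(buf, len);
        if (nbuf == NULL) {
            error = ENOMEM;
            break;
        }
        buf = nbuf;
        error = fn(buf, len, usedp);
        if (error != ENOSPC)
            break;
        if (len > limit / 2)
            break;          /* still ENOSPC, stop growing */
        len *= 2;
    }
    if (error != 0) {
        free(buf);
        buf = NULL;
    }
    *bufp = buf;
    return (error);
}
```

Capping the growth keeps a persistently failing request from looping forever; whether a real caller wants a cap, and how large, is a design choice this sketch does not settle.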
Previously it was disabled right before translation was enabled.
This way the disable logic is still executed even when translation
is not activated, e.g. with the hw.iommu.dma=0 tunable set.
On some platforms we need to disable PMR in order for core dump to work.
At the same time it was observed that enabling translation has
a significant impact on network performance.
With this patch PMR can be disabled, with IOMMU translation not being
turned on by appending the following to the loader.conf:
Ed Maste [Tue, 19 Apr 2022 19:44:46 +0000 (15:44 -0400)]
capsicum: briefly describe capabilities in man page
Provide a very brief introduction to capabilities, using a couple of
sentences from David Chisnall's mailing list response[1] to a question
about Linux capabilities and Capsicum.
Mailing list subject (in case the archive URL changes) was
Re: Linux capabilities to Capsicum
John Baldwin [Tue, 19 Apr 2022 17:43:06 +0000 (10:43 -0700)]
Deprecate the 'devclass' argument from *DRIVER_MODULE() macros.
This argument is useless for the vast majority of drivers. For now,
use __VA_ARGS__ wrapper macros so that the *DRIVER_MODULE()
macros accept both the old version (with a devclass) and the new
version (which omits the argument and stores NULL in the
driver_module_data structure). This provides an API compatibility
shim that can be merged to older stable branches.
Once all drivers relevant to 14.0 (both in and out of tree) have been
updated, the API compat shims can be dropped.
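One common way to let a macro accept either argument count is to dispatch on the number of arguments with __VA_ARGS__. The sketch below uses hypothetical DEMO_* names and a simplified data structure; it shows the general technique and is not the actual *DRIVER_MODULE() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the per-driver module data. */
struct demo_module_data {
    const char *name;
    const char *bus;
    void *devclass;     /* NULL when the caller omits the argument */
};

/* Pick the fifth argument; which macro lands there depends on arg count. */
#define DEMO_PICK(_1, _2, _3, _4, NAME, ...)    NAME

/* New form: no devclass argument, NULL is stored instead. */
#define DEMO_MODULE_NEW(var, name, bus) \
    struct demo_module_data var = { name, bus, NULL }

/* Old form: the caller still passes a devclass pointer. */
#define DEMO_MODULE_OLD(var, name, bus, dc) \
    struct demo_module_data var = { name, bus, dc }

/* Front end accepting either three or four arguments. */
#define DEMO_MODULE(...) \
    DEMO_PICK(__VA_ARGS__, DEMO_MODULE_OLD, DEMO_MODULE_NEW, unused) \
        (__VA_ARGS__)

static int demo_devclass;

DEMO_MODULE(old_style, "foo", "pci", &demo_devclass);
DEMO_MODULE(new_style, "bar", "pci");
```

Both declarations compile against the same front-end macro, so callers can be converted one at a time.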
Insert padding in __cxa_exception struct for compatibility
Similar to https://github.com/llvm/llvm-project/commit/f2a436058fcb, the
addition of __attribute__((__aligned__)) to _Unwind_Exception (in commit
b9616964) causes implicit padding to be inserted before the unwindHeader
field in __cxa_exception.
Applications attempt to get at the earlier fields in __cxa_exception, so
preserve the same negative offsets in __cxa_exception, by moving the
padding to the beginning of the struct.
The assumption here is that if the ABI is not aware of the padding
before unwindHeader and put the referenceCount/primaryException in
there, no padding should exist before unwindHeader.
This should make libreoffice's custom exception handling mechanisms work
correctly, even if it was built against an older cxxabi.h/unwind.h pair.
Tom Jones [Tue, 19 Apr 2022 15:20:24 +0000 (16:20 +0100)]
diff3: Add support for -m
diff3 in -m mode generates a complete file with changes bracketed with
conflict markers. This adds support for diff3 to generate version
control style three way merge output.
The output format was inferred from looking at the gnu diff3 output on a
selection of test files, as a specification of what diff3 -m should
output is not available. It is likely there are cases where the -m
output differs from other tools, and I am happy to update diff3 to
address these.
Discussed with: pstef, kevans
Sponsored by: Klara, Inc.
Tom Jones [Tue, 19 Apr 2022 14:43:35 +0000 (15:43 +0100)]
diff3: Add support for -A
Diff3 in -A mode generates an ed script that shows how the 3 files differ
and brackets changes that conflict. When applied, the generated ed script
should leave familiar merge conflict markers in a patched file.
Diff3 output is not documented; this feature has been arrived at by
comparing bsd diff3 output to gnu diff3 output until they were made to
agree. There are likely to still be differences between these formats.
The gnu diff3 guide is actually quite good at explaining how diff3
output should appear, but it doesn't cover every form of output from
diff3.
Tom Jones [Tue, 19 Apr 2022 13:47:07 +0000 (14:47 +0100)]
diff3: Clean up printing of ranges for edscript output
Replace the edscript code that tracked and printed lines using byte
offsets with code that can work from line offsets.
This tidies up and reduces duplication in the edscript output code. It
also fixes the usage of the de struct so that it only tracks diffs as
line offsets rather than the usage changing from line offsets to byte
offsets during the lifetime of diff3.
Large files with large numbers of ranges will probably suffer in
performance here, but as we don't use diff3 yet this isn't a regression.
Include a warning for future hackers so they have a place to start
hacking from.
Reviewed by: pstef
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D34941
Alan Somers [Mon, 18 Apr 2022 21:29:37 +0000 (15:29 -0600)]
prometheus_sysctl_exporter: fix metric aliasing
When exporting sysctls to Prometheus, the exporter replaces "." with
"_". This caused several metrics to alias, confusing the Prometheus
server. Fix it by:
* Renaming the "tcp_log_bucket" UMA zone to "tcp_log_id_bucket". Also,
rename "tcp_log_node" to "tcp_log_id_node" for consistency.
* Not exporting sysctls with "(LEGACY)" in the description. That is
used by ZFS sysctls that have been replaced by others, many of which
alias to the same Prometheus metric name (like "vfs.zfs.arc_max" and
"vfs.zfs.arc.max").
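The aliasing is easy to reproduce with a simplified version of the name mapping. The function metric_name() below is a hypothetical reduction that only performs the "." to "_" replacement; the real exporter's naming does more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified mapping from a sysctl OID name to a metric
 * name: every '.' becomes '_'.  Output is truncated to fit `outlen`.
 */
static void
metric_name(const char *oid, char *out, size_t outlen)
{
    size_t i;

    assert(oid != NULL && out != NULL && outlen > 0);
    for (i = 0; oid[i] != '\0' && i + 1 < outlen; i++)
        out[i] = oid[i] == '.' ? '_' : oid[i];
    out[i] = '\0';
}
```

Feeding it "vfs.zfs.arc_max" and "vfs.zfs.arc.max" yields "vfs_zfs_arc_max" both times, which is the collision the commit avoids by skipping the legacy sysctl.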
diff: tests: loosen up requirements for report_identical
This test cannot run without an unprivileged_user being specified
anyways, so just run as the unprivileged user. Revoking read permissions
works just as well if you're guaranteed non-root.
Reviewed by: pstef
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D34950
Alan Somers [Mon, 18 Apr 2022 23:03:53 +0000 (17:03 -0600)]
fusefs: correctly handle servers that report too much data written
During a FUSE_WRITE, the kernel requests the server to write a certain
amount of data, and the server responds with the amount that it actually
did write. It is obviously an error for the server to write more than
it was provided, and we always treated it as such, but there were two
problems:
* If the server responded with a huge amount, greater than INT_MAX, it
would trigger an integer overflow which would cause a panic.
* When extending the file, we wrongly set the file's size before
validating the amount written.
Michael Tuexen [Mon, 18 Apr 2022 22:40:31 +0000 (00:40 +0200)]
if_vtnet: improve dumping a kernel
Disable software LRO during kernel dumping, because having it enabled
requires being in a network epoch, which might or might not be the
case depending on the code path resulting in the panic.
Reviewed by: markj
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D34787
Mark Johnston [Mon, 18 Apr 2022 21:16:10 +0000 (17:16 -0400)]
geli: Add a chicken switch for unmapped I/O
We have a report of a panic in GELI that appears to go away when
unmapped I/O is disabled. Add a tunable to make such investigations
easier in the future. No functional change intended.
PR: 262894
Reviewed by: asomers
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34944
John Baldwin [Mon, 18 Apr 2022 21:09:20 +0000 (14:09 -0700)]
arm ti_mbox_attach: Write sysconfig to TI_MBOX_SYSCONFIG to request reset.
This variable was flagged by a set-but-unused warning as its value was
read from a register and then modified to set a bit
(TI_MBOX_SYSCONFIG_SOFTRST). After the variable is modified, the code
then loops waiting for the SOFTRST bit to go clear in the
TI_MBOX_SYSCONFIG register. Presumably merely reading from the
register does not request a reset as other places in the driver read
this register, so most likely the updated value of sysconfig setting
the reset bit is supposed to be written to the register to request a
reset before the polling loop that waits for the reset to finish.