Alexander Motin [Tue, 26 Dec 2023 02:19:28 +0000 (21:19 -0500)]
acpi_cpu: Reduce BUS_MASTER_RLD manipulations
Instead of setting and clearing BUS_MASTER_RLD register on every C3
state enter/exit, set it only once if the system supports C3 state
and we are going to "disable" bus master arbitration while in it.
This is what Linux does for the past 14 years, and for even more time
this register is not implemented in a relevant hardware. Same time
since this is only a single bit in a bigger register, ACPI has to
do take a global lock and do read-modify-write for it, that is too
expensive, saved only by C3 not entered frequently, but enough to be
seen in idle system CPU profiles.
Mark Johnston [Tue, 26 Dec 2023 01:43:21 +0000 (20:43 -0500)]
nmount: Ignore errors when copying out an error string
In general we copy error strings as part of reporting an error from
lower layers, so if the copyout() fails there's nothing to do since we'd
prefer to preserve the original error.
This is in preparation for annotating copyin() and related functions
with __result_use_check.
Mark Johnston [Tue, 26 Dec 2023 01:43:06 +0000 (20:43 -0500)]
geom: Report copyout() errors in g_ctl_ioctl_ctl()
Despite the name, req->serror is used in some cases to copy non-error
messages to userspace. So, report errors when copying out so long as
they don't clobber an earlier error.
Mark Johnston [Tue, 26 Dec 2023 01:42:58 +0000 (20:42 -0500)]
gntdev: Handle errors from suword32() in gntdev_alloc_gref()
Try to copy out output values before handling errors, and check that we
did so successfully. In particular, it doesn't seem sensible to ignore
errors here, otherwise userspace won't have any way to refer to the
allocations.
This is in preparation for annotating copyin() and related functions
with __result_use_check.
Mark Johnston [Tue, 26 Dec 2023 01:42:49 +0000 (20:42 -0500)]
mpr: Handle errors from copyout() in ioctl handlers
In preparation for adding a __result_use_check annotation to copyin()
and related functions, start checking for errors from copyout() in
the mpr(4) user command handler. This should make it easier to catch
bugs.
Mark Johnston [Tue, 26 Dec 2023 01:42:33 +0000 (20:42 -0500)]
mps: Handle errors from copyout() in ioctl handlers
In preparation for adding a __result_use_check annotation to copyin()
and related functions, start checking for errors from copyout() in
the mps(4) user command handler. This should make it easier to catch
bugs.
Mark Johnston [Tue, 26 Dec 2023 01:41:32 +0000 (20:41 -0500)]
sendfile: Explicitly ignore errors from copyout()
There is a documented bug in sendfile.2 which notes that sendfile(2)
does not raise an error if it fails to copy out the number of bytes
written. Explicitly ignore the error from copyout() calls in
preparation for annotating copyout() with __result_use_check.
Mark Johnston [Tue, 26 Dec 2023 01:39:39 +0000 (20:39 -0500)]
thread: Add a return value to cpu_set_upcall()
Some implementations copy data to userspace, an operation which can in
principle fail. In preparation for adding a __result_use_check
annotation to copyin() and related functions, let implementations of
cpu_set_upcall() return an error, and check for errors when copying data
to user memory.
Mark Johnston [Tue, 26 Dec 2023 01:38:57 +0000 (20:38 -0500)]
ocs: Check for copyin errors in the ioctl handler
If copyin() fails, the driver will blindly proceed with whatever had
been in the uninitialized DMA buffer. This is not what we want. Check
for copyin failures.
This is in preparation for annotating copyin() and related functions
with __result_use_check.
Mark Johnston [Tue, 26 Dec 2023 01:38:12 +0000 (20:38 -0500)]
mpi3mr: Check for copyin errors in mpi3mr_map_data_buffer_dma()
A failed copyin will cause the driver to use the contents of
uninitialized buffers instead, which is unlikely to be the behaviour
that we want. Check for errors.
This is in preparation for annotating copyin() and related functions
with __result_use_check.
Mark Johnston [Tue, 26 Dec 2023 01:37:49 +0000 (20:37 -0500)]
hid: Handle errors from copyin() in ioctl handlers
If copyin() fails, the driver will proceed blindly with a zeroed buffer,
which is not what we want. In preparation for annotating copyin() with
__result_use_check, start checking for errors.
Dimitry Andric [Mon, 25 Dec 2023 17:18:31 +0000 (18:18 +0100)]
Minimize libc++ errno-related header diffs with upstream
In commit 88640c0e8b6f5 the new EINTEGRITY errno value was added, and
this caused us to carry a patch for upstream libc++ since that time.
Because it can cause merge conflicts when importing libc++ code from
upstream, I have submitted an upstream pull request to get most of that
patch integrated.
It turns out that we do not need the errno.h part of it at all, since
all supported FreeBSD versions define EOWNERDEAD and ENOTRECOVERABLE,
and therefore the block that juggles with ELAST values is never used in
FreeBSD. At the moment it only applies to older versions of Linux, or
possibly other platforms.
Therefore the only part that needs to stay is the definition of a enum
errc value for EINTEGRITY, and this is made optional upon EINTEGRITY
being defined, to make it suitable for upstreaming.
The scalar implementation is fairly simplistic and only performs
slightly better than the generic C implementation. It could be
improved by using the same algorithm as for memchr, but it would
have been a lot more complicated.
The baseline implementation is similar to timingsafe_memcmp. It's
slightly slower than memchr() due to the more complicated main
loop, but I don't think that can be significantly improved.
lib/libc/string: document restrict qualification of memccpy() arguments
POSIX.1-2004 and the upcoming C23 agree that memccpy()'s arguments
are restrict qualified and must not overlap. In 2002, restrict
qualifiers were added to <string.h>'s declaration of the function.
Make things official and document that the arguments must not
overlap.
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced
implementation of memccpy for amd64. A scalar implementation calling
into memchr and memcpy to do the job is provided, too.
Please note that this code does not behave exactly the same as the C
implementation of memccpy for overlapping inputs. However, overlapping
inputs are not allowed for this function by ISO/IEC 9899:1999 and neither
has the C implementation any code to deal with the possibility. It just
proceeds byte-by-byte, which may or may not do the expected thing for
some overlaps. We do not document whether overlapping inputs are
supported in memccpy(3).
Somewhat similar to stpncpy, but different in that we need to compute
the full source length even if the buffer is shorter than the source.
strlcat is implemented as a simple wrapper around strlcpy. The scalar
implementation of strlcpy just calls into strlen() and memcpy() to do
the job.
Perf-wise we're very close to stpncpy. The code is slightly slower as
it needs to carry on with finding the source string length even if the
buffer ends before the string.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42863
lib/libc/amd64/string/strcat.S: enable use of SIMD
strcat has a bespoke scalar assembly implementation we
inherited from NetBSD. While it performs well, it is
better to call into our SIMD implementations if any SIMD
features are available at all. So do that and implement
strcat() by calling into strlen() and strcpy() if these
are available.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Reviison: https://reviews.freebsd.org/D42600
This was surprisingly annoying to get right, despite being such a simple
function. A scalar implementation is also provided, it just calls into
our optimised memchr(), memcpy(), and memset() routines to carry out its
job.
I'm quite happy with the performance. glibc only beats us for very long
strings, likely due to the use of AVX-512. The scalar implementation
just calls into our optimised memchr(), memcpy(), and memset() routines,
so it has a high overhead to begin with but then performs ok for the
amount of effort that went into it. Still beats the old C code, except
for very short strings.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42519
lib/libc/amd64/string: implement strsep() through strcspn()
The strsep() function is basically strcspn() with extra steps.
On amd64, we now have an optimised implementation of strcspn(),
so instead of implementing the inner loop manually, just call
into the optimised routine.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42346
The baseline implementation is very straightforward, while the scalar
implementation suffers from register pressure and the need to use SWAR
techniques similar to those used for strchr().
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42217
The scalar implementation is fairly straightforward and merely unrolled
four times. The baseline implementation closely follows D41971 with
appropriate extensions and extra code paths to pay attention to string
length.
Performance is quite good. We beat both glibc (except for very long
strings, but they likely use AVX which we don't) and Bionic (except for
medium-sized aligned strings, where we are still in the same ballpark).
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42122
This is the most complicated one so far. The basic idea is to process
the bulk of the string in aligned blocks of 16 bytes such that one
string runs ahead and the other runs behind. The string that runs ahead
is checked for NUL bytes, the one that runs behind is compared with the
corresponding chunk of the string that runs ahead. This trades an extra
load per iteration for the very complicated block-reassembly needed in
the other implementations (bionic, glibc). On the flip side, we need
two code paths depending on the relative alignment of the two buffers.
The initial part of the string is compared directly if it is known not
to cross a page boundary. Otherwise, a complex slow path to avoid
crossing into unmapped memory commences.
Performance-wise we beat bionic for misaligned strings (i.e. the strings
do not share an alignment offset) and reach comparable performance for
aligned strings. glibc is a bit better as it has a special kernel for
AVX-512, where this stuff is a bit easier to do.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D41971
Alexander Motin [Sun, 24 Dec 2023 23:18:11 +0000 (18:18 -0500)]
ig4: Actively use FIFO thresholds
Before every wait for FIFO interrupt set how much data/space do we
want to see there. Previous code was not using it for receive, as
result aggregating interrupts only within processing latency. The
new code needs only one interrupt per transfer per FIFO length.
On my Dell XPS 13 9310 with iichid(4) touchscreen and touchpad this
reduces the interrupt rate per device down to 2 per sample or 16-20
per second when idle and 120-160 per second when actively touched.
LinuxKPI: Add explicit software context to FPU sections
Amdgpu driver does a lot of memory allocations in FPU-protected sections
of code for certain display cores, e.g. for DCN30. This does not work
currently on FreeBSD as its malloc function can not be run within a
critical section. Allocate memory for FPU context to overcome such
restriction.
LinuxKPI: Add pcie_capability_clear_and_set_word() function
It does a Read-Modify-Write operation using clear and set bitmasks on
PCI Express Capability Register at pos. As certain PCI Express
Capability Registers are accessed concurrently in RMW fashion, hence
require locking which is handled transparently to the caller.
and bitmap_shift_right() functions to linux/bitmap.h
They perform calculation of two bitmaps intersection,
copying the contents of u32 array of bits to bitmap and
logical right shifting of the bits in a bitmap.