Dimitry Andric [Mon, 25 Dec 2023 17:18:31 +0000 (18:18 +0100)]
Minimize libc++ errno-related header diffs with upstream
In commit 88640c0e8b6f5 the new EINTEGRITY errno value was added, and
this caused us to carry a patch for upstream libc++ since that time.
Because it can cause merge conflicts when importing libc++ code from
upstream, I have submitted an upstream pull request to get most of that
patch integrated.
It turns out that we do not need the errno.h part of it at all, since
all supported FreeBSD versions define EOWNERDEAD and ENOTRECOVERABLE,
and therefore the block that juggles with ELAST values is never used in
FreeBSD. At the moment it only applies to older versions of Linux, or
possibly other platforms.
Therefore the only part that needs to stay is the definition of a enum
errc value for EINTEGRITY, and this is made optional upon EINTEGRITY
being defined, to make it suitable for upstreaming.
The scalar implementation is fairly simplistic and only performs
slightly better than the generic C implementation. It could be
improved by using the same algorithm as for memchr, but it would
have been a lot more complicated.
The baseline implementation is similar to timingsafe_memcmp. It's
slightly slower than memchr() due to the more complicated main
loop, but I don't think that can be significantly improved.
lib/libc/string: document restrict qualification of memccpy() arguments
POSIX.1-2004 and the upcoming C23 agree that memccpy()'s arguments
are restrict qualified and must not overlap. In 2002, restrict
qualifiers were added to <string.h>'s declaration of the function.
Make things official and document that the arguments must not
overlap.
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced
implementation of memccpy for amd64. A scalar implementation calling
into memchr and memcpy to do the job is provided, too.
Please note that this code does not behave exactly the same as the C
implementation of memccpy for overlapping inputs. However, overlapping
inputs are not allowed for this function by ISO/IEC 9899:1999 and neither
has the C implementation any code to deal with the possibility. It just
proceeds byte-by-byte, which may or may not do the expected thing for
some overlaps. We do not document whether overlapping inputs are
supported in memccpy(3).
Somewhat similar to stpncpy, but different in that we need to compute
the full source length even if the buffer is shorter than the source.
strlcat is implemented as a simple wrapper around strlcpy. The scalar
implementation of strlcpy just calls into strlen() and memcpy() to do
the job.
Perf-wise we're very close to stpncpy. The code is slightly slower as
it needs to carry on with finding the source string length even if the
buffer ends before the string.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42863
lib/libc/amd64/string/strcat.S: enable use of SIMD
strcat has a bespoke scalar assembly implementation we
inherited from NetBSD. While it performs well, it is
better to call into our SIMD implementations if any SIMD
features are available at all. So do that and implement
strcat() by calling into strlen() and strcpy() if these
are available.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Reviison: https://reviews.freebsd.org/D42600
This was surprisingly annoying to get right, despite being such a simple
function. A scalar implementation is also provided, it just calls into
our optimised memchr(), memcpy(), and memset() routines to carry out its
job.
I'm quite happy with the performance. glibc only beats us for very long
strings, likely due to the use of AVX-512. The scalar implementation
just calls into our optimised memchr(), memcpy(), and memset() routines,
so it has a high overhead to begin with but then performs ok for the
amount of effort that went into it. Still beats the old C code, except
for very short strings.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42519
lib/libc/amd64/string: implement strsep() through strcspn()
The strsep() function is basically strcspn() with extra steps.
On amd64, we now have an optimised implementation of strcspn(),
so instead of implementing the inner loop manually, just call
into the optimised routine.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42346
The baseline implementation is very straightforward, while the scalar
implementation suffers from register pressure and the need to use SWAR
techniques similar to those used for strchr().
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42217
The scalar implementation is fairly straightforward and merely unrolled
four times. The baseline implementation closely follows D41971 with
appropriate extensions and extra code paths to pay attention to string
length.
Performance is quite good. We beat both glibc (except for very long
strings, but they likely use AVX which we don't) and Bionic (except for
medium-sized aligned strings, where we are still in the same ballpark).
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42122
This is the most complicated one so far. The basic idea is to process
the bulk of the string in aligned blocks of 16 bytes such that one
string runs ahead and the other runs behind. The string that runs ahead
is checked for NUL bytes, the one that runs behind is compared with the
corresponding chunk of the string that runs ahead. This trades an extra
load per iteration for the very complicated block-reassembly needed in
the other implementations (bionic, glibc). On the flip side, we need
two code paths depending on the relative alignment of the two buffers.
The initial part of the string is compared directly if it is known not
to cross a page boundary. Otherwise, a complex slow path to avoid
crossing into unmapped memory commences.
Performance-wise we beat bionic for misaligned strings (i.e. the strings
do not share an alignment offset) and reach comparable performance for
aligned strings. glibc is a bit better as it has a special kernel for
AVX-512, where this stuff is a bit easier to do.
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D41971
Alexander Motin [Sun, 24 Dec 2023 23:18:11 +0000 (18:18 -0500)]
ig4: Actively use FIFO thresholds
Before every wait for FIFO interrupt set how much data/space do we
want to see there. Previous code was not using it for receive, as
result aggregating interrupts only within processing latency. The
new code needs only one interrupt per transfer per FIFO length.
On my Dell XPS 13 9310 with iichid(4) touchscreen and touchpad this
reduces the interrupt rate per device down to 2 per sample or 16-20
per second when idle and 120-160 per second when actively touched.
LinuxKPI: Add explicit software context to FPU sections
Amdgpu driver does a lot of memory allocations in FPU-protected sections
of code for certain display cores, e.g. for DCN30. This does not work
currently on FreeBSD as its malloc function can not be run within a
critical section. Allocate memory for FPU context to overcome such
restriction.
LinuxKPI: Add pcie_capability_clear_and_set_word() function
It does a Read-Modify-Write operation using clear and set bitmasks on
PCI Express Capability Register at pos. As certain PCI Express
Capability Registers are accessed concurrently in RMW fashion, hence
require locking which is handled transparently to the caller.
and bitmap_shift_right() functions to linux/bitmap.h
They perform calculation of two bitmaps intersection,
copying the contents of u32 array of bits to bitmap and
logical right shifting of the bits in a bitmap.
LinuxKPI: Remove sys/rman.h include from LKPI headers.
sys/rman.h defines `resource` structure which conflicts with the Linux
structure of the same name. To fix that remove reference to sys/rman.h
from linux/pci.h and move resource management code to linux_pci.c.
Update consumers which were depending on linux/pci.h pollution.
Avoid waiting on physical allocations that can't possibly be satisfied
- Change vm_page_reclaim_contig[_domain] to return an errno instead
of a boolean. 0 indicates a successful reclaim, ENOMEM indicates
lack of available memory to reclaim, with any other error (currently
only ERANGE) indicating that reclamation is impossible for the
specified address range. Change all callers to only follow
up with vm_page_wait* in the ENOMEM case.
- Introduce vm_domainset_iter_ignore(), which marks the specified
domain as unavailable for further use by the iterator. Use this
function to ignore domains that can't possibly satisfy a physical
allocation request. Since WAITOK allocations run the iterators
repeatedly, this avoids the possibility of infinitely spinning
in domain iteration if no available domain can satisfy the
allocation request.
Alexander Motin [Sun, 24 Dec 2023 00:10:49 +0000 (19:10 -0500)]
iichid(4): Restore/increase sampling rate
My previous commit by reducing precision reduced the sampling rate
from 60Hz to 40Hz on idle system. Return it back to 60-80Hz range,
that should be good for mouse smoothness on 60Hz displays.
Alexander Motin [Sun, 24 Dec 2023 00:02:49 +0000 (19:02 -0500)]
ig4: Fix FIFO depths detection
At least on my Tiger Lake-LP queue depth detection failed before the
ig4iic_set_config() call, resulting in no FIFO use. Moving it after
solves the problem, getting proper 64 bytes size.
On my Dell XPS 13 9310 with iichid(4) touchscreen and touchpad this
by few times reduces context switch rate in the driver, and probably
also improves the I2C bus utilization.
Xin LI [Sat, 23 Dec 2023 06:46:33 +0000 (22:46 -0800)]
newsyslog(8): Add option to globally override compression method.
Historically, newsyslog compressed rotated log files to save disk space.
This was useful in the early days. However, with modern file systems like
ZFS offering native compression, and with the availability of larger hard
drives, the benefits of additional compression have become less significant.
This is particularly true considering the inconvenience of decompressing
log files when searching for specific patterns.
Additionally, the original implementation of compression methods was not
future-proof. As a result, we have redefined the J, X, Y, Z flags to
signify "treat the file as compressible" rather than "compress the file
with that specific method."
A new command-line option, -c, has been introduced to allow overriding
these settings in a more future-proof way. The available choices are:
* none - do not compress, regardless of flag.
* legacy - historical behavior: J=bzip2, X=xz, Y=zstd, Z=gzip.
* bzip2, xz, zstd, gzip - apply the specified compression method.
Currently, the default is set to 'legacy' to preserve historical behavior.
However, our intention is to change this default to 'none' in FreeBSD 15.0.
Additionally, this update changes the default settings for zstd to use
multithreading and long-range options, better aligning with its intended
use.
Alexander Motin [Sat, 23 Dec 2023 03:50:52 +0000 (22:50 -0500)]
iichid(4): Improve idle sampling hysteresis
In sampling mode some devices return same data indefinitely even if
there is nothing to report. Previous idle hysteresis implementation
activated only when device returned no data, so some devices ended up
polled at fast rate all the time. This new implementation compares
each new report with the previous, and, if they are identical, after
reaching threshold also drop sampling rate to slow.
On my Dell XPS 13 9310 with iichid(4) touchscreen and touchpad this
reduces idle power consumption by ~0.5W by reducing number of context
switches in the driver from ~4000 to ~700 per second when not touched.
Commit 7c5146da1286 modified mountd so that it uses
strunvis(3) to decode directory names in exports lines.
This allows special characters, such as blanks, to be
encoded in the directory names.
This patch updates the exports.5 man page for this change.
Rick Macklem [Fri, 22 Dec 2023 20:11:22 +0000 (12:11 -0800)]
nfscl: Fix handling of a copyout() error reply
If vfs.nfs.nfs_directio_enable is set non-zero (the default is
zero) and a file on an NFS mount is read after being opened
with O_DIRECT | O_ RDONLY, a call to nfsm_mbufuio() calls
copyout() without checking for an error return.
If copyout() returns EFAULT, this would not work correctly.
Only the call path
VOP_READ()->ncl_readrpc()->nfsrpc_read()->nfsrpc_readrpc()
will do this and the error return for EFAULT will
be returned back to VOP_READ().
This patch adds the error check to nfsm_mbufuio().
Namely, switch from vm_fault_quick_hold() to pmap_extract() KPI to
translate gpa to hpa. Assert that the looked up hpa belongs to the wired
page, as it should be for the VM which is configured for pass-throu
(this is theoretically a restriction that could be removed on newer
DMARs).
Noted by: alc
Reviewed by: alc, jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43140
John Baldwin [Fri, 22 Dec 2023 15:49:40 +0000 (07:49 -0800)]
kldxref: Appease a Coverity warning
While parsing .dynamic, nsym is set when parsing the symbol table from
.dynsym. That parsing also sets ef->ef_symtab to a non-NULL value.
The value of nsym isn't validated until after a check for
ef->ef_symtab being NULL, so nsym always has a valid value when it is
read. However, that chain of events is a bit much for static analysis
to follow, so initialize nsym to 0 before parsing sections to quiet
the warning.
John Baldwin [Fri, 22 Dec 2023 15:49:18 +0000 (07:49 -0800)]
kldxref: Simplify handling of ELF object files
Unlike the backend for ELF DSOs, the object file backend allocated an
aligned chunk of memory and read all of the in-memory sections from
the file into this memory even though most of the file contents were
never used. Instead, just track a set of virtual addresses (based at
0) that each loaded section would be loaded at and only read the
necessary bits from the backing file when needed.
John Baldwin [Fri, 22 Dec 2023 15:49:03 +0000 (07:49 -0800)]
kldxref: Simplify elf_read_raw_data
Use pread as a valid offset is always passed now. Originally the DSO
code read the .hash section in two separate requests and relied on the
implicit offset for the second read, but now the hash table is fetched
in a single call.
Bill Sommerfeld [Thu, 21 Dec 2023 03:46:14 +0000 (10:46 +0700)]
regex: mixed sets are misidentified as singletons
Fix "singleton" function used by regcomp() to turn character set matches
into exact character matches if a character set has exactly one
element.
The underlying cset representation is complex; most critically it
records"small" characters (codepoint less than either 128
or 256 depending on locale) in a bit vector, and "wide" characters in
a secondary array.
Unfortunately the "singleton" function uses to identify singleton sets
treated a cset as a singleton if either the "small" or the "wide" sets
had exactly one element (it would then ignore the other set).
The easiest way to demonstrate this bug:
$ export LANG=C.UTF-8
$ echo 'a' | grep '[abà]'
It should match (and print "a") but instead it doesn't match because the
single accented character in the set is misinterpreted as a singleton.
Bjoern A. Zeeb [Mon, 23 Oct 2023 23:14:35 +0000 (23:14 +0000)]
LinuxKPI: reduce impact of large MAXCPU
Start scaling arrays dynamically instead of using MAXCPU, resulting in
extra allocations on startup but reducing the overall memory footprint.
For the static single CPU mask we provide two versions to further save
memory depending on a low or high CPU count system. The threshold to
switch is currently at 128 CPUs on 64bit platforms.
More detailed comments on the implementations can be found in the code.
If I am not wrong on a MAXCPU=65536 system the memory footprint should
roughly go down from 512M to 1.5M for the static single CPU mask.
Submitted by: olce (most of this final version)
Sponsored by: The FreeBSD Foundation
PR: 274316
Differential Revision: https://reviews.freebsd.org/D42345
Bjoern A. Zeeb [Sat, 2 Dec 2023 20:40:51 +0000 (20:40 +0000)]
net80211: adjust more VHT structures/fields
Replace ieee80211_ie_vhtcap with ieee80211_vht_cap and
ieee80211_ie_vht_operation with ieee80211_vht_operation.
The "ie" version has the two bytes type/length at the beginning which
we did not actually use as such (the one place doing did just as unused
extra work).
Using the non-"ie" versions allows us to re-use them on shared code.
Using an enum helps us to not accidentally get unsuppored or unhandled
values tough we cannot use it in the struct as we need to ensure the
field width.
ieee80211_vht_operation is guarded by _KERNEL/WANT_NET80211. While the
header is supposed to be exported to user land historically, software
such as wpa bring their own structure definitions. For in-tree usage
it is only ifconfig which really cares (at least for now).
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: adrian (earlier), cc
Differential Revision: https://reviews.freebsd.org/D42901
Bjoern A. Zeeb [Sun, 12 Nov 2023 23:51:14 +0000 (23:51 +0000)]
net80211: improve logging about state transitions lost
It is possible that we call ieee80211_new_state_locked() again before
a previous task finished to completion (not run yet or unlocked in
between) since 5efea30f039c4 (and follow-up).
In either case we would overwrite the new state and argument in the vap.
While most drivers somehow deal with that (or not), LinuxKPI 802.11 compat
code has KASSERTs to keep net80211, LinuxKPI and driver/firmware state in
sync and they may trigger due to a missing transition or more likely a
changed ni/lsta.
Enhance the wlandebug +state logging for these cases so they
are easier to debug.
While here remove the unconditional logging to the message buffer;
it has been here for a good decade but not helped to actually identify
and sort the problem.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: cc
Differential Revision: https://reviews.freebsd.org/D42560
Dimitry Andric [Thu, 21 Dec 2023 22:41:57 +0000 (23:41 +0100)]
Silence VLA extension warnings in fusefs tests
Building tests/sys/fs/fusefs with clang 18 results the following
warning:
tests/sys/fs/fusefs/cache.cc:145:14: error: variable length arrays in C++ are a Clang extension [-Werror,-Wvla-cxx-extension]
145 | uint8_t buf[bufsize];
| ^~~~~~~
Because we do not particularly care that this is a clang extension,
suppress the warning.
Dimitry Andric [Thu, 21 Dec 2023 22:39:15 +0000 (23:39 +0100)]
Silence snprintf truncation warnings in printf_test examples
Building share/examples/tests with clang 18 results in a few warnings
like:
share/examples/tests/tests/plain/printf_test.c:67:6: error: 'snprintf' will always be truncated; specified size is 10, but format string expands to at least 17 [-Werror,-Wformat-truncation]
67 | if (snprintf(buffer, sizeof(buffer), "0123456789abcdef") != 16)
| ^
Since these tests are meant as an example of testing snprintf overflow,
suppress the warnings.
Dimitry Andric [Thu, 21 Dec 2023 22:35:17 +0000 (23:35 +0100)]
Fix snprintf truncation in telnet
Building telnet with clang 18 results in the following warning:
contrib/telnet/telnet/telnet.c:231:5: error: 'snprintf' will always be truncated; specified size is 10, but format string expands to at least 11 [-Werror,-Wformat-truncation]
231 | snprintf(temp2, sizeof(temp2), "%c%c%c%c....%c%c", IAC, SB, TELOPT_COMPORT,
| ^
The temp2 buffer is 10 chars, while the format string also consists of
10 chars. Therefore, snprintf(3) will truncate the last character, 'SE'
(end sub negotation) in this case.