Mitchell Horne [Wed, 4 Aug 2021 17:31:36 +0000 (14:31 -0300)]
hwpmc: disable uncore class on Sandy Bridge and newer
It was written for Nehalem and Westmere, with minor but incomplete
updates for Sandy Bridge in 78d763a29b15. The uncore architecture
changed significantly with this generation, bringing new layouts and
locations for some MSRs.
Misprogramming these MSRs in ucp_start_pmc() may panic the system, and
this is trivially reproducible via pmcstat(8) on at least Broadwell and
Haswell. Disable the class on these CPUs until it can be updated more
completely and leave a TODO comment detailing some of the work required.
Note that the nclasses value for Broadwell was already incorrect and
doesn't need changing.
The result is that any uncore events listed by pmcstat -L will no longer
be allocatable, but this is already the case for newer generations of
Intel CPUs.
Numerous counters got migrated from straight uint64_t to the counter(9)
API. Unfortunately the implementation comes with a significiant
performance hit on some platforms and cannot be easily fixed.
Work around the problem by implementing a pf-specific variant.
Roy Marples [Wed, 28 Jul 2021 15:46:59 +0000 (08:46 -0700)]
socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.
This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.
Mark Johnston [Wed, 28 Jul 2021 14:16:42 +0000 (10:16 -0400)]
pf: Validate user string nul-termination before copying
Some pf ioctl handlers use strlcpy() to copy strings when converting
from user structures to their in-kernel representations. strlcpy()
ensures that the destination will be nul-terminated, but it assumes that
the source is nul-terminated. In particular, it returns the full length
of the source string, so if the source is not nul-terminated, strlcpy()
will keep scanning until it finds a nul byte, and it may encounter an
unmapped page first. Add a helper to validate user strings before
copying.
There are also places where we look up a ruleset using a user-provided
anchor string. In some ioctl handlers we were already nul-terminating
the string, avoiding the same problem, but in other places we were not.
Fix those by nul-terminating as well. Aside from being consistent,
anchors have a maximum length of MAXPATHLEN - 1 so calling strnlen()
might not be so desirable.
Reported by: syzbot+35a1549b4663e9483dd1@syzkaller.appspotmail.com
Reviewed by: kp
Sponsored by: The FreeBSD Foundation
Mark Johnston [Wed, 28 Jul 2021 14:16:25 +0000 (10:16 -0400)]
pf: Initialize arrays before copying out to userland
A number of pf ioctls populate an array of structures and copy it out.
They have the following structures:
- caller specifies the size of its output buffer
- ioctl handler allocates a kernel buffer of the same size
- ioctl handler populates the buffer, possibly leaving some items
initialized if the caller provided more space than needed
- ioctl handler copies the entire buffer out to userland
Thus, if more space was provided than is required, we end up copying out
uninitialized kernel memory. Simply zero the buffer at allocation time
to prevent this.
Reported by: KMSAN
Reviewed by: kp
Sponsored by: The FreeBSD Foundation
Alexander Motin [Wed, 28 Jul 2021 20:15:43 +0000 (16:15 -0400)]
Do not expose to scheduler caches of single CPU.
Before this change my dual-Xeon(R) Gold 6242R always reported 3 levels
or topology (root, package/L3 and core/L2). But with SMT disabled
core/L2 matches thread, so additional topology level only causes more
traversal work. With this change SMT case is reported same as before,
while non-SMT is reported with only 2 much more simple levels.
Kristof Provost [Mon, 2 Aug 2021 07:46:33 +0000 (09:46 +0200)]
pf: bound DIOCGETSTATES memory use
Similar to what we did earlier for DIOCGETSTATESV2 we only allocate
enough memory for a handful of states and copy those out, bit by bit,
rather than allocating memory for all states in one go.
pf: locally originating connections with 'route-to' fail
Similar to the REPLY_TO shortcut (6d786845cf) we also can't shortcut
ROUTE_TO. If we do we will fail to apply transformations or update the
state, which can lead to premature termination of the connections.
While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.
The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.
Andrew Turner [Thu, 8 Jul 2021 12:14:56 +0000 (12:14 +0000)]
Update the SCTLR_EL1 register definitions
They are valid as of the ARMv8.7 XML.
While here remove SCTLR_RES0 as it's unused and depends on which CPU
the kernel is running on and switch to shifted values as they are
easier to compare with the documentation.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31120
Andrew Turner [Mon, 14 Jun 2021 11:01:46 +0000 (11:01 +0000)]
Use the correct length when copying arm64 vfp registers
We passed the wrong length into memcpy in the arm64 get_fpcontext and
set_fpcontext. This caused us to copy two status registers we didn't
expect to copy.
These are safe as they exist in both the source and destination, although
in a different order, and we copy the correct values after the memcpy.
virtio: enable VTNET_LEGACY_TX when ALTQ is enabled.
ALTQ only works on network drivers which use if_start (rather than
if_transmit). vtnet uses if_start if built with VTNET_LEGACY_TX. Default
to that the kernel is built with ALTQ enabled, to reduce user surprise.
Alex Richardson [Mon, 2 Aug 2021 13:36:03 +0000 (14:36 +0100)]
Allow bootstrapping llvm-tblgen on macOS and Linux
This is needed in order to build various LLVM binutils (e.g. addr2line)
as well as clang/lld/lldb.
Co-authored-by: Jessica Clarke <jrtc27@FreeBSD.org>
Test Plan: Compiles on ubuntu 18.04 and macOS 11.4
Reviewed By: dim
Differential Revision: https://reviews.freebsd.org/D31057
Alex Richardson [Mon, 2 Aug 2021 08:45:05 +0000 (09:45 +0100)]
tools/build: Don't redefine open() for the linux bootstrap
This is needed to bootstrap llvm-tblgen on Linux since LLVM calls
`::open(...)` which does not work if open is a statement macro.
Also stop defining O_SHLOCK/O_EXLOCK and update the only bootstrap tools
user of those flags to deal with missing definitions.
Alex Richardson [Tue, 20 Jul 2021 15:04:56 +0000 (16:04 +0100)]
bsd.linker.mk: Detect LLD when built with PACKAGE_VENDOR
Recent versions of homebrew's LLD are built with PACKAGE_VENDOR (since
https://github.com/Homebrew/homebrew-core/commit/e7c972b6062af753e564104e58d1fa20c0d1ad7a).
This means that the -v output is now
`Homebrew LLD 12.0.1 (compatible with GNU linkers)` and bsd.linker.mk no
longer detects it as LLD since it only checks whether the first word is
LLD. This change allow me to build on macOS again and should unbreak the
GitHub actions CI.
Alex Richardson [Mon, 19 Jul 2021 14:04:19 +0000 (15:04 +0100)]
Allow building usr.bin/vi with MK_ASAN
We have to namespace the regex functions to avoid duplicate symbol errors.
This also ensures that vi doesn't define the libc reg* functions with
mismatched signatures.
ld: error: duplicate symbol: regcomp
>>> defined at sanitizer_common_interceptors.inc:7519 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:7519)
>>> asan_interceptors.o:(__interceptor_regcomp) in archive /usr/lib/clang/10.0.1/lib/freebsd/libclang_rt.asan-x86_64.a
>>> defined at regcomp.c
>>> .../regex/regcomp.c.o:(.text+0x0)
ld: error: duplicate symbol: regerror
>>> defined at sanitizer_common_interceptors.inc:7543 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:7543)
>>> asan_interceptors.o:(__interceptor_regerror) in archive /usr/lib/clang/10.0.1/lib/freebsd/libclang_rt.asan-x86_64.a
>>> defined at regerror.c
>>> .../regex/regerror.c.o:(.text+0x0)
ld: error: duplicate symbol: regexec
>>> defined at sanitizer_common_interceptors.inc:7530 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:7530)
>>> asan_interceptors.o:(__interceptor_regexec) in archive /usr/lib/clang/10.0.1/lib/freebsd/libclang_rt.asan-x86_64.a
>>> defined at regexec.c
>>> .../regex/regexec.c.o:(.text+0x0)
ld: error: duplicate symbol: regfree
>>> defined at sanitizer_common_interceptors.inc:7553 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:7553)
>>> asan_interceptors.o:(__interceptor_regfree) in archive /usr/lib/clang/10.0.1/lib/freebsd/libclang_rt.asan-x86_64.a
>>> defined at regfree.c
>>> .../regex/regfree.c.o:(.text+0x0)
Committed upstream as https://github.com/lichray/nvi2/pull/92
Alex Richardson [Tue, 6 Jul 2021 11:18:29 +0000 (12:18 +0100)]
Fix building rescue/rescue when sanitizers are enabled
We have to ensure that we don't link any instrumented object files
into rescue as it is a static executable and static binaries can't
use the sanitizer runtime.
Alex Richardson [Tue, 6 Jul 2021 11:16:40 +0000 (12:16 +0100)]
usr.bin/diff: fix UBSan error in readhash
UBSan complains about the `sum = sum * 127 + chrtran(t);` line below since
that can overflow an `int`. Use `unsigned int` instead to ensure that
overflow is well-defined.
Alex Richardson [Tue, 6 Jul 2021 09:50:05 +0000 (10:50 +0100)]
usr.bin/login: send errors to console if syslog isn't running
I was debugging why login(1) wasn't working as expected on a minimal
MFS_ROOT disk image. This image doesn't have syslogd running so the
warnings were lost and I had to use GDB to find out why login(1) was
failing (missing PAM libraries) instead of being able to see it in
the console output.
Alex Richardson [Mon, 5 Jul 2021 13:32:48 +0000 (14:32 +0100)]
usr.bin/sort: Avoid UBSan errors
UBSan complains about out-of-bounds accesses for zero-length arrays. To
avoid this we can use flexible array members. However, the C standard does
not allow for structures that only contain flexible array members, so we
move the length parameters into that structure too.
Alex Richardson [Fri, 2 Jul 2021 08:21:04 +0000 (09:21 +0100)]
Simplify and speed up the kyua build
Instead of having multiple kyua libraries, just include the files as part
of usr.bin/kyua. Previously, we would build each kyua source up to four
times: once as a .o file and once as a .pieo. Additionally, the kyua
libraries might be built again for compat32. As all the kyua libraries
amount to 102 C++ sources the build time is significant (especially when
using an assertions enabled compiler). This change ensures that we build
306 fewer .cpp source files as part of buildworld.
Rick Macklem [Tue, 20 Jul 2021 00:35:39 +0000 (17:35 -0700)]
nfscl: Send stateid.seqid of 0 for NFSv4.1/4.2 mounts
For NFSv4.1/4.2, the client may set the "seqid" field of the
stateid to 0 in RPC requests. This indicates to the server that
it should not check the "seqid" or return NFSERR_OLDSTATEID if the
"seqid" value is not up to date w.r.t. Open/Lock operations
on the stateid. This "seqid" is incremented by the NFSv4 server
for each Open/OpenDowngrade/Lock/Locku operation done on the stateid.
Since a failure return of NFSERR_OLDSTATEID is of no use to
the client for I/O operations, it makes sense to set "seqid"
to 0 for the stateid argument for I/O operations.
This avoids server failure replies of NFSERR_OLDSTATEID,
although I am not aware of any case where this failure occurs.
This makes the FreeBSD NFSv4.1/4.2 client compatible with the
Linux NFSv4.1/4.2 client.
Ed Maste [Tue, 27 Jul 2021 21:18:41 +0000 (17:18 -0400)]
Update WITHOUT_KERNEL_SYMBOLS description
We have installed kernel debug data under /usr/lib/debug/ for some time
now, so the suggestion to set WITHOUT_KERNEL_SYMBOLS for small root
partitions is no longer valid.
Also call them "debug symbol files" rather than just "symbol files",
since they contain much more than just symbols. The kernel also
includes (some) symbols, regardless of the setting of this knob.
Commits 9fb6e613373c and 9ec7dbf46b0a both changed the internal
KAPI between the NFS modules. Bump __FreeBSD_version to 1300514
so that all modules are rebuilt from sources.
Rick Macklem [Fri, 16 Jul 2021 22:01:03 +0000 (15:01 -0700)]
nfsd: Add sysctl to set maximum I/O size up to 1Mbyte
Since MAXPHYS now allows the FreeBSD NFS client
to do 1Mbyte I/O operations, add a sysctl called vfs.nfsd.srvmaxio
so that the maximum NFS server I/O size can be set up to 1Mbyte.
The Linux NFS client can also do 1Mbyte I/O operations.
The default of 128Kbytes for the maximum I/O size has
not been changed for two reasons:
- kern.ipc.maxsockbuf must be increased to support 1Mbyte I/O
- The limited benchmarking I can do actually shows a drop in I/O rate
when the I/O size is above 256Kbytes.
However, daveb@spectralogic.com reports seeing an increase
in I/O rate for the 1Mbyte I/O size vs 128Kbytes using a Linux client.
Rick Macklem [Sun, 11 Jul 2021 20:34:16 +0000 (13:34 -0700)]
mount_nfs.8: Add information for "nconnect" to man page
Commit 1e0a518d6548 added a new NFS mount option "nconnect".
This patch adds information on this option to the man page.
It also adds an IMPLEMENTATION section that explains how
the default I/O size is determined and that "nfsstat -m" can
be used to find out what option settings are actually in use.
Rick Macklem [Fri, 9 Jul 2021 00:39:04 +0000 (17:39 -0700)]
nfscl: Add a Linux compatible "nconnect" mount option
Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to created for a mount,
instead of just one TCP connection.
A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated to-gether with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.
One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write). The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.
to restore ABI compatibility for pre-10.x binaries.
It restores _umtx_lock() and _umtx_unlock() syscalls, and UMTX_OP_LOCK/
UMTX_OP_UNLOCK umtx_op(2) operations. UMUTEX_ERROR_CHECK flag is left
out for now, I do not think it makes a difference.
PR: 218571
Reviewed by: brooks (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31220
Alexander Motin [Thu, 1 Jul 2021 19:28:55 +0000 (15:28 -0400)]
mrsas(4): Report more correct maximum I/O size.
Subtract one SGE for the case of misaligned address. Also take into
account maximum number of sectors reported by firmware, that gives
nicer 256KB limit instead of 276KB calculated from the SGE limit.
While there, remove number of I/O size checks, duplicating what is
already checked by CAM and busdma(9).
This completes PRR cwnd reduction in all circumstances
for the base TCP stack (SACK loss recovery, ECN window reduction,
non-SACK loss recovery), preventing the arriving ACKs to
clock out new data at the old, too high rate. This
reduces the chance to induce additional losses while
recovering from loss (during congested network conditions).
For non-SACK loss recovery, each ACK is assumed to have
one MSS delivered. In order to prevent ACK-split attacks,
only one window worth of ACKs is considered to actually
have delivered new data.
This updates llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and
openmp to llvmorg-12-init-17869-g8e464dd76bef, the last commit before the
upstream release/12.x branch was created.
Apply upstream libc++ fix to allow building with devel/xxx-xtoolchain-gcc
Merge commit 52e9d80d5db2 from llvm git (by Jason Liu):
[libc++] add `inline` for __open's definition in ifstream and ofstream
Summary:
When building with gcc on AIX, it seems that gcc does not like the
`always_inline` without the `inline` keyword.
So adding the inline keywords in for __open in ifstream and ofstream.
That will also make it consistent with __open in basic_filebuf
(it seems we added `inline` there before for gcc build as well).
Undefine HAVE_(DE)REGISTER_FRAME in llvm's config.h on arm
Otherwise, the lli tool (enable by WITH_CLANG_EXTRAS) won't link on arm,
stating that __register_frame is undefined. This function is normally
provided by libunwind, but explicitly not for the ARM Exception ABI.
Revert libunwind change to fix backtrace segfault on aarch64
Revert commit 22b615a96593 from llvm git (by Daniel Kiss):
[libunwind] Support for leaf function unwinding.
Unwinding leaf function is useful in cases when the backtrace finds a
leaf function for example when it caused a signal.
This patch also add the support for the DW_CFA_undefined because it marks
the end of the frames.
Reland with limit the test to the x86_64-linux target.
Bisection has shown that this particular upstream commit causes programs
using backtrace(3) on aarch64 to segfault. This affects the lang/rust
port, for instance. Until we can upstream to fix this problem, revert
the commit for now.
compilert-rt: build out-of-line LSE atomics helpers for aarch64
Both clang >= 12 and gcc >= 10.1 now default to -moutline-atomics for
aarch64. This requires a bunch of helper functions in libcompiler_rt.a,
to avoid link errors like "undefined symbol: __aarch64_ldadd8_acq_rel".
(Note: of course you can use -mno-outline-atomics as a workaround too,
but this would negate the potential performance benefit of the faster
LSE instructions.)
Bump __FreeBSD_version so ports maintainers can easily detect this.
Warner Losh [Mon, 12 Jul 2021 03:26:08 +0000 (21:26 -0600)]
awk: remove proctab.c
proctab.c is a generated file and never should have been committed to
the tree. This file has been added and removed a couple of times, most
recently added by me in my 2019 updates.
Kristof Provost [Tue, 2 Mar 2021 15:57:27 +0000 (16:57 +0100)]
pf tests: Test the match keyword
The new match keyword can currently only assign queues, so we can only
test it with ALTQ.
Set up a basic scenario where we use 'match' to assign ICMP traffic to a
slow queue, and confirm that it's really getting slowed down.