John Baldwin [Mon, 7 Feb 2022 22:11:10 +0000 (14:11 -0800)]
Extend the VMM stats interface to support a dynamic count of statistics.
- Add a starting index to 'struct vmstats' and change the
VM_STATS ioctl to fetch the 64 stats starting at that index.
A compat shim for <= 13 continues to fetch only the first 64
stats.
- Extend vm_get_stats() in libvmmapi to use a loop and a static
thread local buffer which grows to hold the stats needed.
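
A minimal sketch of the chunked fetch loop described above, not the actual libvmmapi code. The backend function fake_fetch_chunk(), the TOTAL_STATS value, and the name get_all_stats() are hypothetical stand-ins; only the 64-entry chunk size and the growing thread-local buffer come from the commit text.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64          /* stats returned per request, per the commit text */
#define TOTAL_STATS 150   /* hypothetical total exposed by the fake backend */

/*
 * Hypothetical stand-in for the VM_STATS ioctl: copies up to CHUNK values
 * starting at 'index' and returns how many were copied.
 */
static size_t
fake_fetch_chunk(size_t index, uint64_t out[CHUNK])
{
	size_t n = 0;

	while (n < CHUNK && index + n < TOTAL_STATS) {
		out[n] = (uint64_t)(index + n) * 10;
		n++;
	}
	return (n);
}

/* Returns a thread-local buffer holding every stat, or NULL on failure. */
static uint64_t *
get_all_stats(size_t *countp)
{
	static _Thread_local uint64_t *buf;
	static _Thread_local size_t cap;
	uint64_t chunk[CHUNK];
	size_t have = 0, n;

	do {
		n = fake_fetch_chunk(have, chunk);
		if (have + n > cap) {
			/* Grow the buffer only when the stats no longer fit. */
			uint64_t *nbuf = realloc(buf, (have + n) * sizeof(*buf));

			if (nbuf == NULL)
				return (NULL);
			buf = nbuf;
			cap = have + n;
		}
		if (n > 0)
			memcpy(buf + have, chunk, n * sizeof(*buf));
		have += n;
	} while (n == CHUNK);   /* a short chunk means the end was reached */

	*countp = have;
	return (buf);
}
```

The buffer is kept between calls, so repeated callers on the same thread reuse the allocation once it is large enough.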
Kristof Provost [Thu, 27 Jan 2022 10:41:21 +0000 (11:41 +0100)]
dummynet: don't use per-vnet locks to protect global data.
The ref_count counter is global (i.e. not per-vnet) so we can't use a
per-vnet lock to protect it. Moreover, in callouts curvnet is not set,
so we'd end up panicking when trying to use DN_BH_WLOCK().
Instead we use the global sched_lock, which is already used when
evaluating ref_count (in unload_dn_aqm()).
Sebastian Huber [Mon, 7 Feb 2022 21:16:16 +0000 (14:16 -0700)]
kern_ntptime.c: Remove ntp_init()
The ntp_init() function set a couple of global objects to zero. These
objects are in the .bss section and already initialized to zero during kernel
or module loading.
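
As a small illustration of why the explicit zeroing was redundant: C guarantees that objects with static storage duration and no initializer start out as zero. The object names below are hypothetical and are not the ones in kern_ntptime.c.

```c
#include <stdint.h>

/* Hypothetical globals with static storage duration and no initializer. */
static int64_t example_offset;
static struct {
	int32_t state;
	int64_t freq;
} example_clock;

/* Returns 1 if every object above still holds its implicit zero value. */
static int
globals_start_zeroed(void)
{
	return (example_offset == 0 && example_clock.state == 0 &&
	    example_clock.freq == 0);
}
```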
John Baldwin [Mon, 7 Feb 2022 20:55:08 +0000 (12:55 -0800)]
cfiscsi_done: Free the dummy PDU earlier.
The dummy PDU needs to be freed before marking task abortion complete
as otherwise cfiscsi_session_terminate_tasks can return and destroy
the session in another thread before the PDU is freed.
Fixes: 2e8d1a55258d iscsi: Allocate a dummy PDU for the internal nexus reset task.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34176
John Baldwin [Mon, 7 Feb 2022 20:47:51 +0000 (12:47 -0800)]
Stop adding -Wredundant-decls to CWARNFLAGS.
clang doesn't implement it, and Linux doesn't enforce it. As a
result, new instances keep cropping up both in FreeBSD's code and in
upstream sources from vendors.
Warner Losh [Mon, 7 Feb 2022 20:16:15 +0000 (13:16 -0700)]
release: Don't install ubldr.bin
ubldr.bin was obsoleted by our uboot ports last year, so this is
completely unused in the default config (some customers still use
it, but that's not relevant to this script). Don't copy it at all
since it won't be used for re@ produced images.
Warner Losh [Mon, 7 Feb 2022 20:15:03 +0000 (13:15 -0700)]
nanobsd: Stop copying ubldr
manu@ removed support for loading ubldr* from uboot last year. No need
to copy them to the image. This may be needed for some 32-bit platforms
in theory, but those platforms weren't ever the target for nanobsd that
I'm aware of. Should there be platforms where this is used, we can add
it to the builds for those platforms.
Robert Wing [Mon, 7 Feb 2022 19:05:20 +0000 (10:05 -0900)]
pbuf_ctor(): lock the buffer with LK_NOWAIT
This LOR happens when reading from a file backed MD device:
lock order reversal:
1st 0xfffffe00431eaac0 pbufwait (pbufwait, lockmgr) @ /cobra/src/sys/vm/vm_pager.c:471
2nd 0xfffff80003f17930 ufs (ufs, lockmgr) @ /cobra/src/sys/dev/md/md.c:977
lock order pbufwait -> ufs attempted at:
#0 0xffffffff80c78ead at witness_checkorder+0xbdd
#1 0xffffffff80bd6a52 at lockmgr_lock_flags+0x182
#2 0xffffffff80f52d5c at ffs_lock+0x6c
#3 0xffffffff80d0f3f4 at _vn_lock+0x54
#4 0xffffffff80708629 at mdstart_vnode+0x499
#5 0xffffffff807060ec at md_kthread+0x20c
#6 0xffffffff80bbfcd0 at fork_exit+0x80
#7 0xffffffff810b809e at fork_trampoline+0xe
This LOR was previously blessed by witness before commit 531f8cfea06b
("Use dedicated lock name for pbufs").
Instead of blessing ufs and pbufwait, use LK_NOWAIT to prevent recording
the lock order. LK_NOWAIT will be a nop here as the lock is dropped in
pbuf_dtor(). This takes the same approach as 5875b94c7493 ("buf_alloc():
lock the buffer with LK_NOWAIT").
In FreeBSD's libc, a number of internal aliases of the pthread functions
are invoked, typically with an additional prefixed underscore, e.g.
_pthread_cond_init() and so on.
ThreadSanitizer needs to intercept these aliases too, otherwise some
false positive reports about data races might be produced.
Andrew Turner [Mon, 7 Feb 2022 11:47:04 +0000 (11:47 +0000)]
Fix the signal code on 32-bit breakpoints on arm64
When debugging 32-bit programs a debugger may insert an instruction that
will raise the undefined instruction trap. The kernel handles these
by raising a SIGTRAP, however the code was incorrect.
Fix this by using the expected TRAP_BRKPT signal code.
Randall Stewart [Mon, 7 Feb 2022 11:37:46 +0000 (06:37 -0500)]
tcp: Add hystart++ to our cubic implementation.
As promised to the transport call on 11/4/22 here is an implementation
of hystart++ for cubic. It also cleans up the tcp_congestion function
to have a better name. Common variables are moved into the general
cc.h structure so that both cubic and newreno can use them for
hystart++.
Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33035
Elliott Mitchell [Wed, 13 Oct 2021 02:00:26 +0000 (19:00 -0700)]
xen: switch to use headers in contrib
These headers originate with the Xen project and shouldn't be mixed with
the main portion of the FreeBSD kernel. Notably they shouldn't be the
target of clean-up commits.
Roger Pau Monné [Fri, 4 Feb 2022 15:16:49 +0000 (16:16 +0100)]
xen: import Xen 4.16 public headers in sys/contrib/
The current path of the Xen headers at /sys/xen/interface/ is not
correct, as those headers are imported verbatim from the Xen sources
and shouldn't be modified, as any modifications would be lost when a
new version is imported.
Changes to the public headers must be first done in Xen upstream so
that they can be backported and new imports will already carry them.
Import Xen 4.16 headers in sys/contrib/xen/. It's unlikely that we
will import different Xen code, so don't place them inside of any
subdirectory. If in the future other pieces of Xen code need to be
imported the headers will need to move into an include/ subdirectory.
Note that this commit does not yet modify the include path to use the
newly imported headers.
xen/grant-table: remove explicit linear mapping additions
There's no need to explicitly add linear mappings for the grant table
area, as the memory is allocated using xenmem_alloc and it should
already have a linear mapping that can be obtained using
rman_get_virtual.
While there also remove the return value of gnttab_map, since there's
no return value anymore.
Dimitry Andric [Sun, 6 Feb 2022 16:07:16 +0000 (17:07 +0100)]
Explicitly include semaphore.h for struct _sem in fusefs setattr test
In libc++'s __threading_support header the semaphore.h header was
implicitly included, but from version 14 onwards, this is no longer the
case, resulting in compile errors:
tests/sys/fs/fusefs/setattr.cc:740:8: error: variable has incomplete type 'sem_t' (aka '_sem')
sem_t sem;
^
tests/sys/fs/fusefs/utils.hh:33:8: note: forward declaration of '_sem'
struct _sem;
^
Dimitry Andric [Sun, 6 Feb 2022 15:25:11 +0000 (16:25 +0100)]
Fix too small sscanf output buffers in kbdmap
This fixes the following warnings from clang 14:
usr.sbin/kbdmap/kbdmap.c:241:16: error: 'sscanf' may overflow; destination buffer in argument 5 has size 20, but the corresponding specifier may require size 21 [-Werror,-Wfortify-source]
&a, &b, buf);
^
usr.sbin/kbdmap/kbdmap.c:615:8: error: 'sscanf' may overflow; destination buffer in argument 3 has size 64, but the corresponding specifier may require size 65 [-Werror,-Wfortify-source]
keym, lng, desc);
^
usr.sbin/kbdmap/kbdmap.c:615:14: error: 'sscanf' may overflow; destination buffer in argument 4 has size 64, but the corresponding specifier may require size 65 [-Werror,-Wfortify-source]
keym, lng, desc);
^
usr.sbin/kbdmap/kbdmap.c:615:19: error: 'sscanf' may overflow; destination buffer in argument 5 has size 256, but the corresponding specifier may require size 257 [-Werror,-Wfortify-source]
keym, lng, desc);
^
In each case, the buffer being sscanf'd into is one byte too small.
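
A short sketch of the sizing rule behind these warnings: a "%Ns" conversion stores up to N characters plus a terminating NUL, so the destination needs N + 1 bytes. MAX_NAME and parse_name() are hypothetical and do not appear in kbdmap.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAME 8   /* hypothetical maximum number of characters in a name */

/*
 * Parses the first whitespace-delimited word of 'line' into 'out', which
 * must hold MAX_NAME + 1 bytes. Returns 1 on success and 0 otherwise.
 */
static int
parse_name(const char *line, char out[MAX_NAME + 1])
{
	/* "%8s" stores up to 8 characters plus a terminating NUL: 9 bytes. */
	return (sscanf(line, "%8s", out) == 1);
}
```

Declaring the buffer as char name[MAX_NAME + 1] keeps the field width and the array size visibly tied to the same constant.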
Dimitry Andric [Sun, 6 Feb 2022 14:25:22 +0000 (15:25 +0100)]
Fix too small hostname buffer in bootparamd
This fixes the following warning from clang 14:
usr.sbin/bootparamd/bootparamd/bootparamd.c:204:32: error: 'fscanf' may
overflow; destination buffer in argument 3 has size 255, but the
corresponding specifier may require size 256 [-Werror,-Wfortify-source]
The MAX_MACHINE_NAME macro indicates the maximum number of bytes in a
machine name, but it does not include the NUL terminator required for
scanf.
Gleb Smirnoff [Sat, 5 Feb 2022 21:25:38 +0000 (13:25 -0800)]
dmesg: detect wrapped msgbuf on the kernel side and if so, skip first line
Since 59f256ec35d3, dmesg(8) always skips the first line of the message
buffer because it might be incomplete. The problem is that in most cases
it is complete, valid and contains the "---<<BOOT>>---" marker. This
skip can be disabled with '-a', but that would also unhide all non-kernel
messages. Move this functionality from dmesg(8) to kernel, since kernel
actually knows if wrap has happened or not.
The main motivation for the change is not actually the value of the
"---<<BOOT>>---" marker. The skip breaks unit tests that clear the
message buffer, perform a test, and then check the message buffer for
a result. An example of such a test is sys/kern/sonewconn_overflow.
Stefan Eßer [Sat, 5 Feb 2022 12:33:53 +0000 (13:33 +0100)]
libc: add helper function to set sysctl() user.* variables
Testing had revealed that trying to retrieve the user.localbase
variable into too small a buffer would return the correct error code,
but would not fill the available buffer space with a partial result.
A partial result is of no use, but this is still a violation of the
documented behavior, which has been fixed in the previous commit to
this function.
I just checked the code for "user.cs_path" and found that it had the
same issue.
Instead of fixing the logic for each user.* sysctl string variable
individually, this commit adds a helper function set_user_str() that
implements the semantics specified in the sysctl() man page.
It is currently only used for "user.cs_path" and "user.localbase",
but it will offer a significant simplification when further such
variables are added (as I intend to do).
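
The semantics described above can be sketched roughly as follows. This is an illustrative approximation, not the libc helper itself: copy_user_str() and its signature are hypothetical, and the behavior modeled is the one documented for sysctl() (copy as much as fits, then fail with ENOMEM).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies the string 'src' (including its NUL) into
 * 'dst', whose capacity is passed in *dstlenp. If the buffer is too small,
 * the part that fits is still copied and -1 is returned with errno set to
 * ENOMEM. With dst == NULL only the required size is reported.
 */
static int
copy_user_str(void *dst, size_t *dstlenp, const char *src)
{
	size_t len = strlen(src) + 1;
	int ret = 0;

	if (dst == NULL) {
		*dstlenp = len;
		return (0);
	}
	if (len > *dstlenp) {
		len = *dstlenp;      /* truncate to the available space */
		errno = ENOMEM;
		ret = -1;
	}
	memcpy(dst, src, len);
	*dstlenp = len;
	return (ret);
}
```

Note that a truncated result is not NUL-terminated, which is one reason a partial result is of little practical use to callers.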
Kristof Provost [Tue, 1 Feb 2022 17:33:42 +0000 (18:33 +0100)]
pf tests: Only do post-test logging when specifically enabled
The pf tests have the ability to log state information (pf rules, pf
states, interfaces, ...) on exit (i.e. on success or on error).
This is useful, but only in specific cases. When it's not needed it may
get in the way of clear output.
Test scripts can add 'debug' to the pft_init call to enable this for the
specified test.
Kristof Provost [Tue, 1 Feb 2022 17:25:57 +0000 (18:25 +0100)]
pf: deal with tables gaining or losing counters
When we create a table without counters, add an entry and later
re-define the table to have counters we wound up trying to read
non-existent counters.
We now cope with this by attempting to add them if needed, removing them
when they're no longer needed and not trying to read from counters that
are not present.
Ed Maste [Sat, 5 Feb 2022 02:02:44 +0000 (21:02 -0500)]
elfctl: update man page example for 'no' prefix
Reported by: Mark Millard on freebsd-current@
Fixes: c763f99d11fd ("elfctl: prefix disable flags with "no"")
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
John Baldwin [Fri, 4 Feb 2022 23:38:49 +0000 (15:38 -0800)]
cxgbei: Rework parsing of pre-offload PDUs.
sbcut() returns mbufs in reverse order so is not suitable for reading
data from the socket buffer. Instead, check for already-received data
in the receive worker thread before passing offload PDUs up to the
iSCSI layer. This uses soreceive() to read data from the socket and
is also able to use M_WAITOK since it now runs from a worker thread instead
of an interrupt thread.
Also, fix decoding of the data segment length for pre-offload PDUs.
Reported by: Jithesh Arakkan @ Chelsio
Fixes: a8c4147edcdc cxgbei: Parse all PDUs received prior to enabling offload mode.
Sponsored by: Chelsio Communications
Alan Somers [Mon, 3 Jan 2022 00:16:09 +0000 (17:16 -0700)]
fusefs: require FUSE_NO_OPENDIR_SUPPORT for NFS exporting
FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory. That conflicts with NFS's statelessness, which results in
unresolvable bugs when NFS reads large directories, if:
* The file system _does_ change the d_off field for the last directory
entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
NFS.
Rather than doing a poor job of exporting such file systems, it's better
just to refuse.
Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.
Alan Somers [Sun, 2 Jan 2022 22:29:50 +0000 (15:29 -0700)]
fusefs: optimize NFS readdir for FUSE_NO_OPENDIR_SUPPORT
In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle. But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR. That means reading the
directory all the way from the first entry. Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:
* The file system _does_ change the d_off field for the last directory
entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
NFS.
Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT, d_off is
guaranteed to be valid for the lifetime of the directory entry, so there
is no need to read the directory from the start.
Alan Somers [Sun, 2 Jan 2022 17:18:47 +0000 (10:18 -0700)]
Fix NFS exports of FUSE file systems for big directories
The FUSE protocol does not require that a directory entry's d_off field
outlive the lifetime of its directory's file handle. Since the NFS
server must reopen the directory on every VOP_READDIR call, that means
it can't pass uio->uio_offset down to the FUSE server. Instead, it must
read the directory from 0 each time. It may need to issue multiple
FUSE_READDIR operations until it finds the d_off field that it's looking
for. That was the intention behind SVN r348209 and r297887, but a logic
bug prevented subsequent FUSE_READDIR operations from ever being issued,
rendering large directories incompletely browseable.
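
The rescan described above can be sketched as a loop that starts at offset 0 and keeps issuing read requests until the wanted d_off shows up. Everything here is a simplified, hypothetical model: fake_readdir() stands in for FUSE_READDIR, the directory is a fixed array, and BATCH is an arbitrary batch size.

```c
#include <stddef.h>

struct entry {
	long d_off;            /* cookie used to resume after this entry */
	const char *name;
};

static const struct entry dir[] = {
	{ 10, "a" }, { 20, "b" }, { 30, "c" }, { 40, "d" }, { 50, "e" }
};
#define NENTRIES (sizeof(dir) / sizeof(dir[0]))
#define BATCH 2            /* hypothetical entries returned per request */

/* Hypothetical FUSE_READDIR stand-in: up to BATCH entries after 'off'. */
static size_t
fake_readdir(long off, const struct entry **out)
{
	size_t start = 0, n;

	if (off != 0) {
		while (start < NENTRIES && dir[start].d_off != off)
			start++;
		if (start == NENTRIES)
			return (0);
		start++;           /* resume after the entry with this cookie */
	}
	n = NENTRIES - start;
	if (n > BATCH)
		n = BATCH;
	*out = &dir[start];
	return (n);
}

/* Returns the name of the entry following cookie 'want', or NULL. */
static const char *
resume_after(long want, int *requests)
{
	const struct entry *batch;
	long off = 0;
	size_t n, i;

	*requests = 0;
	for (;;) {
		n = fake_readdir(off, &batch);
		(*requests)++;
		if (n == 0)
			return (NULL);
		for (i = 0; i < n; i++) {
			if (batch[i].d_off != want)
				continue;
			if (i + 1 < n)
				return (batch[i + 1].name);
			/* The next entry lives in the following batch. */
			n = fake_readdir(want, &batch);
			(*requests)++;
			return (n > 0 ? batch[0].name : NULL);
		}
		off = batch[n - 1].d_off;   /* keep going: issue another request */
	}
}
```

Stopping after the first request, as the logic bug effectively did, would only ever find cookies that appear in the first batch.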
Stefan Eßer [Fri, 4 Feb 2022 22:37:12 +0000 (23:37 +0100)]
whereis: fix fetching of user.cs_path sysctl variable
The current implementation of sysctlbyname() does not support the user
sub-tree. This function exits with a return value of 0, but sets the
passed string buffer to an empty string.
As a result, the whereis program did not use the value of the sysctl
variable "user.cs_path", but only the value of the environment
variable "PATH".
This update makes whereis use the sysctl function with a fixed OID,
which already supports the user sub-tree.
Warner Losh [Fri, 4 Feb 2022 22:38:15 +0000 (15:38 -0700)]
style(9): Default to omitting $FreeBSD$
Advise people to omit $FreeBSD$ (in both comments and macros) unless the
code is definitely going to be merged to stable/12. This strengthens
previous statements and is appropriate now that stable/11 is no longer
supported. If people are wrong and things are unexpectedly merged to 12,
tags can be added before that merge. No sense adding a tag that will
never be expanded and removed later on the off chance it might wind up
in stable/12.
The next step is likely to weaken this to apply just to mergemaster
managed files, but not today.
Kirk McKusick [Fri, 4 Feb 2022 19:46:36 +0000 (11:46 -0800)]
Have fsck_ffs(8) properly correct superblock check-hash failures.
Part of the problem was that fsck_ffs would read the superblock
multiple times complaining and repairing the superblock check hash
each time and then at the end failing to write out the superblock
with the corrected check hash. This fix reads the superblock just
once and if the check hash is corrected ensures that the fixed
superblock gets written.
Ed Maste [Sun, 16 Jan 2022 19:22:05 +0000 (14:22 -0500)]
compiler-rt: re-exec with ASLR disabled when necessary
Some sanitizers (at least msan) currently require ASLR to be disabled.
When we detect that ASLR is enabled, re-exec with it disabled rather
than exiting with an error. See LLVM GitHub issue 53256 for more
detail: https://github.com/llvm/llvm-project/issues/53256
No objection: dim
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33934
Ed Maste [Wed, 19 Jan 2022 18:08:18 +0000 (13:08 -0500)]
compiler-rt: support ReExec() on FreeBSD
Based on getMainExecutable() in llvm/lib/Support/Unix/Path.inc.
This will need a little more work for an upstream change as it must
support older FreeBSD releases that lack elf_aux_info() / AT_EXEC_PATH.
No objection: dim
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33934
Stefan Eßer [Fri, 4 Feb 2022 12:44:20 +0000 (13:44 +0100)]
libc: return partial sysctl() result if buffer is too small
Testing of a new feature revealed that calling sysctl() to retrieve
the value of the user.localbase variable passing too low a buffer size
could leave the result buffer unchanged.
The behavior in the normal case of a sufficiently large buffer was
correct.
All known callers pass a sufficiently large buffer and have thus not
been affected by this issue. If a non-default value had been assigned
to this variable, the result was as documented, too.
Fix the function to fill the buffer with a partial result, if the
passed in buffer size is too low to hold the full result.
Atomics have significant uses besides providing in-system
primitives for safe memory updates. They are used for implementing
communication with out-of-system software or hardware following some
protocols.
For instance, even a UP kernel might require a protocol using atomics to
communicate with a software-emulated device on an SMP hypervisor. Or
real hardware might need atomic accesses as part of the proper
management protocol.
Another point is that UP configurations on x86 are extinct, so the slight
performance hit from unconditionally using proper atomics is not important.
It is compensated by less code clutter, which in fact improves the
UP/i386 lifetime expectations.
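
As a rough illustration of such a protocol, here is a producer index shared with an outside consumer (for example an emulated device), written with C11 atomics rather than the kernel's atomic(9) primitives. The structure layout and the names shared_ring and ring_publish() are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 8

/* Hypothetical ring shared with a consumer outside this system. */
struct shared_ring {
	_Atomic uint32_t prod_idx;        /* read by the other side */
	uint32_t slots[RING_SLOTS];
};

/*
 * Writes one value and then publishes it. The release store orders the
 * slot write before the index update, so a consumer that observes the new
 * index with an acquire load also observes the slot contents, even when
 * the producer itself runs on a single CPU.
 */
static void
ring_publish(struct shared_ring *r, uint32_t value)
{
	uint32_t idx = atomic_load_explicit(&r->prod_idx,
	    memory_order_relaxed);

	r->slots[idx % RING_SLOTS] = value;
	atomic_store_explicit(&r->prod_idx, idx + 1, memory_order_release);
}
```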