Dmitry Chagin [Sun, 26 Feb 2023 13:42:22 +0000 (16:42 +0300)]
linprocfs(4): Fixup process size in the /proc/pid/stat file
According to the Linux sources, the kernel exposes a process's virtual
memory size via the proc filesystem in three files: stat, status
and statm. This is the struct mm->total_vm value adjusted to the
corresponding units: bytes, kilobytes and pages.
The fix is based on an analysis by fernape@.
PR: 265937
Reported by: Ray Bellis
MFC after: 3 days
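The unit conversion described above can be sketched as follows. This is a minimal illustration, not the linprocfs code: the function names are hypothetical, and the page size is passed in as a parameter rather than taken from the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helpers: total_vm_pages stands in for mm->total_vm,
 * page_size is the platform page size in bytes (an assumed input).
 */

/* Bytes, the unit used in /proc/pid/stat. */
static uint64_t
vm_size_bytes(uint64_t total_vm_pages, uint64_t page_size)
{
    return (total_vm_pages * page_size);
}

/* Kilobytes, the unit used in /proc/pid/status. */
static uint64_t
vm_size_kilobytes(uint64_t total_vm_pages, uint64_t page_size)
{
    return (total_vm_pages * page_size / 1024);
}

/* Pages, the unit used in /proc/pid/statm. */
static uint64_t
vm_size_pages(uint64_t total_vm_pages)
{
    return (total_vm_pages);
}
```

With 256 pages of 4096 bytes, the three helpers yield 1048576, 1024 and 256 respectively.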
Gordon Bergling [Sun, 26 Feb 2023 13:33:58 +0000 (14:33 +0100)]
route.8: Fix mandoc warnings
- skipping end of block that is not open: Oc
- no blank before trailing delimiter
- remove useless TN macros
- remove commented out reference for esis(4)
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D38783
Gordon Bergling [Sun, 26 Feb 2023 13:15:34 +0000 (14:15 +0100)]
route.8: Add information about ROUTE_MPATH and FIB_ALGO
Since the kernel options ROUTE_MPATH and FIB_ALGO have been enabled
by default for a while, it's good to have some user-facing
documentation about the general functionality of multipath
routing and FIB lookup algorithms.
Reviewed by: pauamma, Jose Luis Duran <jlduran at gmail dot com>
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D38783
Piotr Kubaj [Sat, 25 Feb 2023 21:09:41 +0000 (22:09 +0100)]
powerpc64*: port mlx5, OFED, KTLS and krping
Summary:
This review ports mlx5 driver, kernel's OFED stack (userland is already enabled), KTLS and krping to powerpc64 and powerpc64le.
krping requires a small change since it uses assembly for amd64 / i386.
NOTE: On powerpc64le RDMA works fine in the userspace with libmlx5, but on powerpc64 it does not. The problem is that contrib/ofed/libmlx5/doorbell.h checks for SIZEOF_LONG but this macro exists on neither powerpc64* nor amd64. Thus, the file silently goes to the fallback function written for 32-bit architectures. It works fine on little-endian architectures, but causes a hard fail on big-endian. It's possible it may also cause some runtime issues on little-endian.
Thus, on powerpc64 I verified that RDMA works with krping.
Warner Losh [Sat, 25 Feb 2023 18:33:22 +0000 (11:33 -0700)]
kern: Remove gcc2_compiled stripping
Bruce added stripping of gcc2_compiled and other symbols when he made
the boot loader load the symbols for the kernel in 1995 (b5d89ca8ade3)
before the FreeBSD 2.1 release. This was copied around a bit and
tweaked over the years, but these symbols aren't produced by clang, nor
gcc12. They were to support dbx for a.out stabs format. gcc removed them
with stabs support last year. gcc 2.95.4 in FreeBSD 4.x continued to
emit these symbols unconditionally (it was missing a test for a.out vs
ELF, it would appear). They disappeared entirely with gcc 3.2.4 in 5.x
for all non a.out builds, and entirely in FreeBSD 6.x which had gcc
3.2.6.
Daniel Tameling [Sat, 25 Feb 2023 17:25:51 +0000 (10:25 -0700)]
uniq(1): use strtonum to parse options
Previously strtol was used and the result was directly cast to an int
without checking for an overflow. Use strtonum instead since it is
safer and tells us what went wrong.
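The hazard described above, and the kind of checking strtonum(3) performs, can be sketched with plain strtol(3) and explicit range checks. This is a portable stand-in rather than the uniq(1) change itself; `parse_count` and its reason strings are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a non-negative decimal option value into *out.
 * Returns NULL on success, or a short reason string on failure, loosely
 * modelled on the way strtonum(3) reports what went wrong.
 */
static const char *
parse_count(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return ("invalid");         /* no digits or trailing junk */
    if (v < 0)
        return ("too small");       /* also covers LONG_MIN on ERANGE */
    if (errno == ERANGE || v > INT_MAX)
        return ("too large");       /* would not fit in an int */
    *out = (int)v;
    return (NULL);
}
```

Casting the strtol() result straight to int would silently truncate a value such as 99999999999999999999; the checks above reject it instead.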
Mina Galić [Fri, 24 Feb 2023 11:07:42 +0000 (11:07 +0000)]
apic: prevent divide by zero in CPU frequency init
If a CPU for some reason reports 0 as its frequency, we currently panic
on the resulting divide by zero when trying to initialize the CPU(s) via
APIC. When this happens, we now fall back to measuring the frequency
instead.
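The guard can be illustrated with a small sketch. The function name, the picosecond unit and the return convention are assumptions made for illustration; the real APIC code is structured differently.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: compute a clock period from a CPU-reported
 * frequency. Returns -1 when the frequency is 0 so that the caller can
 * fall back to measuring it, and 0 on success.
 */
static int
timer_period_ps(uint64_t reported_hz, uint64_t *period_ps)
{
    if (reported_hz == 0)
        return (-1);    /* never divide by a zero frequency */
    *period_ps = UINT64_C(1000000000000) / reported_hz;
    return (0);
}
```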
Mark Johnston [Sat, 25 Feb 2023 15:21:19 +0000 (10:21 -0500)]
ck_queue: add CK_*_FOREACH_FROM
This is a variant of CK_*_FOREACH from FreeBSD queue.h which starts
iteration at the specified item. If the item pointer is NULL, iteration
starts from the beginning of the list.
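The iteration rule can be sketched on a plain singly-linked list. The macro below is a hypothetical, non-concurrent imitation of the queue.h `*_FOREACH_FROM` idiom; the real CK macros also deal with concurrent readers, which this sketch leaves out.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/*
 * Hypothetical macro: start at var if it is non-NULL, otherwise at the
 * list head, then follow the next pointers to the end of the list.
 */
#define LIST_FOREACH_FROM_SKETCH(var, head)                     \
    for ((var) = ((var) != NULL ? (var) : (head));              \
        (var) != NULL;                                          \
        (var) = (var)->next)

/* Sum the values from start, or from head when start is NULL. */
static int
sum_from(struct node *head, struct node *start)
{
    struct node *n = start;
    int sum = 0;

    LIST_FOREACH_FROM_SKETCH(n, head)
        sum += n->value;
    return (sum);
}
```

For a list holding 1, 2 and 3, starting from NULL sums all three items, while starting from the second item sums only 2 and 3.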
Mike Karels [Sat, 25 Feb 2023 14:04:00 +0000 (08:04 -0600)]
sys/conf/NOTES: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces. Make them consistent.
Piotr Kubaj [Fri, 24 Feb 2023 13:26:31 +0000 (14:26 +0100)]
gh-bc: don't disable LTO on powerpc64
Summary:
The LTO issue has been fixed. While -flto for some reason is commented out,
since it wasn't completely removed, it may be expected to be reenabled.
Reviewers: se
Approved by: se
MFC after: 3 days
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D38755
Paul Floyd [Fri, 24 Feb 2023 16:29:01 +0000 (11:29 -0500)]
libc: handle zero alignment in memalign()
For compatibility with glibc. The previous code would trigger a division
by zero in roundup() and terminate. Instead, just pass through to
malloc() for align == 0.
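A minimal sketch of the pass-through, assuming the alignment is either zero or a power of two. `memalign_sketch` is a hypothetical wrapper built on C11 aligned_alloc(), not the libc implementation.

```c
#include <stdlib.h>

/*
 * Hypothetical wrapper: rounding size up to a multiple of align divides
 * by align, so the zero case has to be handled before the rounding.
 */
static void *
memalign_sketch(size_t align, size_t size)
{
    if (align == 0)
        return (malloc(size));      /* plain malloc() for align == 0 */
    /* aligned_alloc() expects size to be a multiple of align. */
    size = (size + align - 1) / align * align;
    return (aligned_alloc(align, size));
}
```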
Mitchell Horne [Fri, 24 Feb 2023 17:19:54 +0000 (13:19 -0400)]
bcm_dma: don't dereference NULL softc
This file defines a small API to be used by other drivers. If any of
these functions are called before the bcm_dma device has attached we
should handle the error gracefully. Fix a formatting quirk while here.
Reviewed by: manu
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D38756
Mark Millard [Fri, 17 Feb 2023 20:30:35 +0000 (16:30 -0400)]
bcm_dma: attach at an earlier bus pass
The sdhci_bcm driver attach routine relies on bcm_dma already being
attached, in order to allocate a DMA channel. However, both drivers
attached at the default pass so this is not guaranteed. Newer RPI
firmware exposes this assumption, and the result is a NULL-dereference
in bcm_dma_allocate().
Rick Macklem [Fri, 24 Feb 2023 15:36:28 +0000 (07:36 -0800)]
nfsd: Fix a use after free when vnet prisons are deleted
The KASAN tests show that nfsrvd_cleancache() results
in a modify after free. I think this occurs because the
nfsrv_cleanup() function gets executed after nfs_cleanup()
which frees nfsstatsv1_p.
This patch makes them use the same subsystem and sets
SI_ORDER_FIRST for nfs_cleanup(), so that it will be called
after nfsrv_cleanup() via VNET_SYSUNINIT().
The patch also sets nfsstatsv1_p to NULL after freeing it,
so that a crash will result if it is used after being freed.
The reason for this is while looping through loose source routing (LSRR)
and strict source routing (SSRR), hlen will become smaller than the IP
header. It may even become negative. This should terminate the loop.
However, when hlen is unsigned, an integer underflow occurs and hlen
becomes a large number, causing the loop to continue virtually forever
until hlen is either by chance smaller than the length of an IP header
or it segfaults.
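The difference between a signed and an unsigned hlen can be sketched as follows. The function names, the fixed option length and the 20-byte header constant are assumptions for illustration only, not the affected code.

```c
#define IP_HDR_LEN 20   /* assumed minimum IPv4 header length in bytes */

/*
 * Hypothetical walk with a signed hlen: the subtraction may drive hlen
 * below IP_HDR_LEN or below zero, and either case ends the loop.
 * Assumes optlen > 0.
 */
static int
walk_options_signed(int hlen, int optlen)
{
    int steps = 0;

    while (hlen > IP_HDR_LEN) {
        hlen -= optlen;
        steps++;
    }
    return (steps);
}

/* The same subtraction on unsigned values wraps around to a huge number. */
static unsigned int
subtract_unsigned(unsigned int hlen, unsigned int optlen)
{
    return (hlen - optlen);
}
```

With hlen 16 and optlen 24, the signed result is -8 and the loop stops, whereas the unsigned result is far larger than any header length, so a loop testing it would keep going.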
Mike Karels [Thu, 23 Feb 2023 17:45:08 +0000 (11:45 -0600)]
riscv kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces. Make them consistent.
Mike Karels [Thu, 23 Feb 2023 17:44:18 +0000 (11:44 -0600)]
powerpc kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces. Make them consistent.
Mike Karels [Thu, 23 Feb 2023 17:42:59 +0000 (11:42 -0600)]
i386 kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces. Make them consistent.
Mike Karels [Thu, 23 Feb 2023 17:41:31 +0000 (11:41 -0600)]
arm64 kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces. Make them consistent.
Mike Karels [Thu, 23 Feb 2023 17:38:26 +0000 (11:38 -0600)]
arm kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces. Make them consistent.
Mike Karels [Thu, 23 Feb 2023 17:09:39 +0000 (11:09 -0600)]
amd64 kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces. Make them consistent.
Mark Johnston [Fri, 24 Feb 2023 02:45:01 +0000 (21:45 -0500)]
config: Include errno.h in mkmakefile.cc
Commit da8884202940 ("config: error out on malformed env/hint lines")
added a reference to EINVAL. In some configurations the bootstrap tools
build fails for lack of errno definitions.
Fixes: da8884202940 ("config: error out on malformed env/hint lines")
Reported by: syzbot+b1a5d112a737d9a2be9b@syzkaller.appspotmail.com
Bjoern A. Zeeb [Sat, 18 Feb 2023 01:15:21 +0000 (01:15 +0000)]
net80211: ieee80211_swscan_bg_scan() track return variable under lock
As the comment says it probably does not matter but use a local
variable to track state under lock so we can return the last known
good state of what we thought we were operating under after unlocking.
Likely no functional changes.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: enweiwu, adrian
Differential Revision: https://reviews.freebsd.org/D38660
Zhenlei Huang [Thu, 23 Feb 2023 18:00:09 +0000 (02:00 +0800)]
Delete obsolete Solaris compat header file stdlib.h
This drops function `getexecname()` redirection.
Historically `getexecname()` is a compatibility definition. Since
openzfs has its own implementation of function `getexecname()` in libspl
and has been merged into base, the compat header file stdlib.h is
no longer needed and should not be used.
Also, without this fix libspl will end up with an incompatible version of
`getprogname()` with libc. In particular, if zfs is enabled, programs
such as pgrep in /rescue can be wrongly statically linked with libspl
and will not function properly.
PR: 269738
Reviewed by: markj
Fixes: 9e5787d2284e Merge OpenZFS support in to HEAD
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D38733
* Make nhop_set_blackhole() set all necessary properties for the
nexthop
* Make nexthops blackhole/reject based on the rtm_type netlink
property instead of using rtflags.
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 3 days
Kornel Dulęba [Mon, 20 Feb 2023 14:48:40 +0000 (14:48 +0000)]
arm: Fix initialization of VFP context
Make sure that pcb_vfpsaved is always initialized.
Create a vfp_new_thread helper that is heavily based on the arm64 logic.
While here remove an unnecessary assignment and add an assertion
to make sure that it's been properly initialized before we return
from a VFP exception.
Reported by: Mark Millard <marklmi@yahoo.com>
Tested by: Mark Millard <marklmi@yahoo.com>
Differential Revision: https://reviews.freebsd.org/D38698
Kornel Dulęba [Mon, 20 Feb 2023 14:44:36 +0000 (14:44 +0000)]
arm: Unbreak debugging programs that use FP instructions
Contrary to arm64, on armv7 get_vfpcontext/set_vfpcontext can be called
from cpu_ptrace. This can be triggered when gdb hits a breakpoint
in a userspace program.
Relax td == currthread assertion to account for that situation.
While here update an outdated comment in vfp_discard.
Reported by: Mark Millard <marklmi@yahoo.com>
Tested by: Mark Millard <marklmi@yahoo.com>
Differential Revision: https://reviews.freebsd.org/D38696
Gleb Smirnoff [Thu, 23 Feb 2023 04:44:46 +0000 (20:44 -0800)]
unix/dgram tests: match the kernel behavior
In CURRENT for some time an overflowed unix/dgram socket would
return EAGAIN if it has O_NONBLOCK set. This proved to be
undesired. See 71e70c25c00 for details. Update tests to match
the "new" behavior, which actually is the historical behavior.
Rick Macklem [Wed, 22 Feb 2023 22:09:15 +0000 (14:09 -0800)]
nfsd.c: Log a more meaningful failure message
For the cases where the nfsd(8) daemon is already running or
has failed to start within a prison due to an incorrect prison
configuration, the failure message logged is:
Can't read stable storage file: operation not permitted
This patch replaces the above with more meaningful messages.
It depends on commit 10dff9da9748 to differentiate between the
above two cases, however even without this commit, the messages
should be an improvement.
Rick Macklem [Wed, 22 Feb 2023 21:19:07 +0000 (13:19 -0800)]
nfsd: Return ENXIO instead of EPERM when nfsd(8) already running
The nfsd(8) daemon generates an error message that does not
indicate that the nfsd daemon is already running when the nfssvc(2)
syscall fails for the NFSSVC_STABLERESTART. Also, the check for
running nfsd(8) in a vnet prison will return EPERM when it fails.
This patch replaces EPERM with ENXIO so that the nfsd(8) daemon
can generate more reasonable failure messages. The nfsd(8) daemon
will be patched in a future commit.
Mitchell Horne [Wed, 22 Feb 2023 15:11:15 +0000 (11:11 -0400)]
lockmgr: upgrade panic return checks
We short-circuit lockmgr functions in the face of a kernel panic. Other
lock implementations do this with a SCHEDULER_STOPPED() check, which
covers the additional case where the debugger is active but the system
has not panicked. Update this code to match that behaviour.
Michael Tuexen [Tue, 21 Feb 2023 21:38:18 +0000 (22:38 +0100)]
bblog: improve timeout event handling
Extend the BBLog RTO event to deal with all timers of the base
stack. Also provide information about starting, stopping, and
running off. The expiration of the retransmission timer is
reported as it was done before.
Reviewed by: rscheff@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D38710
Rick Macklem [Tue, 21 Feb 2023 21:00:42 +0000 (13:00 -0800)]
vfs_export: Add mnt_exjail to control exports done in prisons
If there are multiple instances of mountd(8) (in different
prisons), there will be confusion if they manipulate the
exports of the same file system. This patch adds mnt_exjail
to "struct mount" so that the credentials (and, therefore,
the prison) that did the exports for that file system can
be recorded. If another prison has already exported the
file system, vfs_export() will fail with an error.
If mnt_exjail == NULL, the file system has not been exported.
mnt_exjail is checked by the NFS server, so that exports done
from within a different prison will not be used.
The patch also implements vfs_exjail_destroy(), which is
called from prison_cleanup() to release all the mnt_exjail
credential references, so that the prison can be removed.
Mainly to avoid doing a scan of the mountlist for the case
where there were no exports done from within the prison,
a count of how many file systems have been exported from
within the prison is kept in pr_exportcnt.
Michael Tuexen [Tue, 21 Feb 2023 17:26:49 +0000 (18:26 +0100)]
tcp: rearrange enum and remove unused variable
Rearrange the enum tt_which such that TT_REXMIT is 0. This allows
an extension of the BBLog event RTO in a backwards compatible way.
Remove tcptimers, which was only used in trpt, a utility removed
from the source tree recently.
Gleb Smirnoff [Tue, 21 Feb 2023 16:50:07 +0000 (08:50 -0800)]
Revert "unix/dgram: return EAGAIN instead of ENOBUFS when O_NONBLOCK set"
This API change led to unexpected consequences with the Go runtime. The
Go runtime emulates blocking sockets over non-blocking sockets and
for that uses the available event dispatcher on the target OS, which is
kevent(2) if available, with an OS-independent layer on top. It expects
that if any O_NONBLOCK socket ever returned EAGAIN, then it is
supposed to be reported as writable by the event dispatcher. kevent(2)
would never report a unix/dgram socket, since they never change their
state, they always are writeable. The expectations of Go are not
literally specified by SUS, however they are in its spirit. The SUS
specifies EAGAIN for send(2) as "The socket's file descriptor is marked
O_NONBLOCK and the requested operation would block" [1]. This doesn't
apply to FreeBSD unix/dgram socket, it never blocks on send(2).
Thus, changing API trying to mimic Linux was a mistake. But what about
the problem we tried to fix? Discussed that with Max Dounin of nginx,
and we agreed that the log bomb described shall be fixed on nginx side,
and it actually isn't specific to FreeBSD, may happen with nginx on any
non-Linux system with a certain configuration.
Zhenlei Huang [Tue, 21 Feb 2023 15:43:25 +0000 (23:43 +0800)]
jail: Fix redoing ip restricting
`prison_ip_restrict()` is called in the loop FOREACH_PRISON_DESCENDANT_LOCKED.
While under low memory, it is still possible that in subsequent rounds
`prison_ip_restrict()` succeeds and `redo_ip[46]` flips over from true to
false, thus leaving some prisons' IPv[46] addresses unrestricted.
Reviewed by: jamie
Fixes: 8bce8d28abe6 jail: Avoid multipurpose return value of function prison_ip_restrict()
Differential Revision: https://reviews.freebsd.org/D38697
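The flag-flipping problem can be sketched in isolation. Everything below is hypothetical: `needs_redo` stands in for an operation that can fail under memory pressure, and the two scan functions contrast an overwritten flag with a sticky one. It is not the actual jail patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: a negative item means "this one must be redone". */
static bool
needs_redo(int item)
{
    return (item < 0);
}

/* Overwriting pattern: a later success hides an earlier redo request. */
static bool
scan_overwrite(const int *items, size_t n)
{
    bool redo = false;

    for (size_t i = 0; i < n; i++)
        redo = needs_redo(items[i]);
    return (redo);
}

/* Sticky pattern: once set, the flag stays set for the whole scan. */
static bool
scan_sticky(const int *items, size_t n)
{
    bool redo = false;

    for (size_t i = 0; i < n; i++)
        if (needs_redo(items[i]))
            redo = true;
    return (redo);
}
```

For the items 1, -1, 2 the overwriting scan reports false because the last item succeeded, while the sticky scan still reports true.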