John Baldwin [Sat, 12 Feb 2022 00:04:52 +0000 (16:04 -0800)]
Cast pointer to uintptr_t to avoid alignment warnings.
Both struct ip and struct udphdr both have an aligment of 2, but the
cast from struct ip to a uint32_t pointer confused GCC 9 into raising
the required alignment to 4 and then raising a
-Waddress-of-packed-member error when casting to struct udphdr.
John Baldwin [Fri, 11 Feb 2022 23:16:25 +0000 (15:16 -0800)]
ktls: Write-lock the INP when changing a transmit TLS session.
The TCP rate pacing code relies on being able to read this pointer
safely while holding an INP lock. The initial TLS session pointer is
set while holding the write lock already.
Warner Losh [Fri, 11 Feb 2022 19:50:51 +0000 (12:50 -0700)]
cleankernel: A target to delete the kernel compile file
With the meta-build, it's always a NO_CLEAN build. Provide a way to
remove so one can rebuild from scratch. 'cleankernel' will delete the
kernel and modules object directories. Document this in build(7).
it seems that glibc supports them, and such spelling is mentioned in the
ld.bfd manual. Idea seems to auto-correct some quoting/makefile sytnax
errors on linker command line.
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34247
Ed Maste [Thu, 10 Feb 2022 20:42:05 +0000 (15:42 -0500)]
Makefile.inc1: synthesize PKG_ABI from newvers.sh variables
Previously we inspected ${WSTAGEDIR}/usr/bin/uname to determine PKG_ABI,
but the file will not exist in some cases - for example, if building
only kernel packages. We can instead synthesize the PKG_ABI from
information already provided by newvers.sh.
Reviewed by: kevans, manu (both earlier rev)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34249
Mateusz Guzik [Fri, 11 Feb 2022 12:00:25 +0000 (12:00 +0000)]
tty: switch ttyhook_register to use fget_cap_locked
It is still wrong-ish as fget* funcs don't expect to operate on abitrary
file descriptor tables, but this at least moves it out of the way of an
upcoming change while being bug-compatible.
Warner Losh [Thu, 10 Feb 2022 21:27:08 +0000 (14:27 -0700)]
powerpc: Add static asssert for context size
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever. For
powerpc64, also add asserts for {u,m}mcontext32_t and siginfo32.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D34213
John Baldwin [Thu, 10 Feb 2022 20:47:08 +0000 (12:47 -0800)]
libthr: Disable stack unwinding on ARM.
When a thread exits, _Unwind_ForcedUnwind() is used to walk up stack
frames executing pending cleanups pushed by pthread_cleanup_push().
The cleanups are popped by thread_unwind_stop() which is passed as a
callback function to _Unwind_ForcedUnwind().
LLVM's libunwind uses a different function type for the callback on
32-bit ARM relative to all other platforms. The previous unwind.h
header (as well as the unwind.h from libcxxrt) use the non-ARM type on
all platforms, so this has likely been broken on 32-bit arm since it
switched to using LLVM's libunwind.
For now, just disable stack unwinding on 32-bit arm to unbreak the
build until a proper fix is tested.
Mark Johnston [Thu, 10 Feb 2022 20:32:23 +0000 (15:32 -0500)]
libctf: Rip out CTFv1 support
CTFv1 was obsolete before libctf was imported into FreeBSD, and
ctfconvert/ctfmerge can emit only CTFv2. Make ctf.h a bit easier to
maintain by ripping v1 support out. No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Mark Johnston [Thu, 10 Feb 2022 20:31:26 +0000 (15:31 -0500)]
tcp: Avoid conditionally defined fields in union lro_address
The layout of the structure ends up depending on whether the including
file includes opt_inet.h and opt_inet6.h, so different compilation units
can end up seeing different versions of the structure. Fix this by
unconditionally defining the address fields.
As a side effect, this eliminates some duplication in the kernel's CTF
type graph.
Reviewed by: rscheff, tuexen
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34242
Stefan Eßer [Thu, 10 Feb 2022 20:09:34 +0000 (21:09 +0100)]
bin/df: allow -t option to be used together with -l
The df command provides a -l option to exclude all non-local file
systems and a -t option with a (positive or negative) list of file
system types to display.
This commit adds support for a combination of -l and -t. If both are
specified, the parameter list of the -t option is applied on top of
the selection of öocal file systems (independently of the order of
the -l and -t options).
E.g., "df -t noprocfs,sysfs -l" will select all local file systems
except those of type procfs and sysfs.
Jose Luis Duran [Thu, 10 Feb 2022 19:42:36 +0000 (12:42 -0700)]
rc: Allow the removal of firstboot_sentinel on read-only file systems
NanoBSD or, more generally, systems with root_rw_mount="NO" are not able
to remove the firstboot_sentinel file, typically /firstboot, because the
logic in /etc/rc is currently inverted.
When checkyesno root_rw_mount tests on a read-only file system, the
return is 1, hence avoiding the option to mount the system read-write.
Restore the ability to remove the firstboot_sentinel file on read-only
mounted file systems.
Kyle Evans [Thu, 10 Feb 2022 17:20:24 +0000 (11:20 -0600)]
lsvfs: restyle, no functional change
Namely:
- main was using two-space indentation
- re-sort local variables
- explicit braces for loop scope
- make flag bit comparison explicit
The first line of this commit message is unfortunately a lie, as it
introduces a minor functional change on non-FreeBSD systems. Namely,
the first branch is now explicitly compared against `0` and the choice
was made to compare it as greater than 0 to avoid issues on other
systems where `argc != 0` on entry isn't guaranteed (negative when
checked there).
tcp: Add/update AccECN related statistics and numbers
Reserve couters in the tcps struct in preparation
for AccECN, extend the debugging output for TF2
flags, optimize the syncache flags from individual
bits to a codepoint for the specifc ECN handshake.
This is in preparation of AccECN.
No functional chance except for extended debug
output capabilities.
Problem is that it is possible to reach the state with ref_count ==
1 for the mapped non-anonymous object. For instance, anonymous posix
shmfd or linux shmfs object could be mapped, and then corresponding
file descriptor closed, dropping the object reference owned by the
shmfd/shmfs file. Then the check in inactive scan assumes that the
object and page are not mapped and frees the page, while they are not.
PR: 261707
Discussed with: markj
Sponsored by: The FreeBSD Foundation
MFC after: now
Kyle Evans [Thu, 10 Feb 2022 06:15:29 +0000 (00:15 -0600)]
Annotate geom_md with MODULE_VERSION
This was missed in 74d6c131cbe2 where other geom modules were annotated
with MODULE_VERSION. Again, the problem is the same: we can't detect
that geom_md is loaded into the kernel without it.
This was noticed in release builds on the cluster; mdconfig attempts to
load geom_md because it can't detect it in the kernel, but the cluster
config includes md(4) and does not build the kmod. This problem would
have been masked on hosts with the kmod built, as the kmod attempts to
register the g_md module and fails. With this commit, mdconfig would
not even try to load it again.
Chuck Silvers [Thu, 10 Feb 2022 01:09:26 +0000 (17:09 -0800)]
dtrace: remove unnecessary fflush()
This call was added back in the early days of dtrace porting and
no one knows why anymore. The extra flushing causes lots of
unnecessary CPU overhead when a script produces lots of output,
as well as easily losing output because the command can't keep up.
Ed Maste [Wed, 9 Feb 2022 00:45:25 +0000 (19:45 -0500)]
Invert CPU arch test for LLDB default
LLDB currently defaults to enabled on all architectures except arm and
riscv64 (and can probably be enabled for 32-bit arm). Switch to an
opt-out list.
Reviewed by: pkubaj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34218
Martin Matuska [Wed, 9 Feb 2022 23:35:42 +0000 (00:35 +0100)]
libarchive: import changes from upstream
Libarchive 3.6.0
New features:
PR #1614: tar: new option "--no-read-sparse"
PR #1503: RAR reader: filter support
PR #1585: RAR5 reader: self-extracting archive support
New features (not used in FreeBSD base):
PR #1567: tar: threads support for zstd (#1567)
PR #1518: ZIP reader: zstd decompression support
Security Fixes:
PR #1491, #1492, #1493, CVE-2021-36976:
fix invalid memory access and out of bounds read in RAR5 reader
PR #1566, #1618, CVE-2021-31566:
extended fix for following symlinks when processing the fixup list
Other notable bugfixes and improvements:
PR #1620: tar: respect "--ignore-zeros" in c, r and u modes
PR #1625: reduced size of application binaries
Rick Macklem [Wed, 9 Feb 2022 23:17:50 +0000 (15:17 -0800)]
nfsd: Reply NFSERR_SEQMISORDERED for bogus seqid argument
The ESXi NFSv4.1 client bogusly sends the wrong value
for the csa_sequence argument for a Create_session operation.
RFC8881 requires this value to be the same as the sequence
reply from the ExchangeID operation most recently done for
the client ID.
Without this patch, the server replies NFSERR_STALECLIENTID,
which is the correct response for an NFSv4.0 SetClientIDConfirm
but is not the correct error for NFSv4.1/4.2, which is
specified as NFSERR_SEQMISORDERED in RFC8881.
This patch fixes this.
This change does not fix the issue reported in the PR, where
the ESXi client loops, attempting ExchangeID/Create_session
repeatedly.
Kenneth D. Merry [Mon, 24 Jan 2022 21:19:25 +0000 (16:19 -0500)]
Fix non-printable characters in NVMe model and serial numbers.
The NVMe 1.4 spec simply says that Model and Serial numbers are
ASCII strings. Unlike SCSI, it doesn't prohibit non-printable
characters or say that the strings should be padded with spaces.
Since 2014, we have had cam_strvis_sbuf(), which gives additional
options for handling non-ASCII characters. That behavior hasn't
been available for non-sbuf consumers, so users of cam_strvis()
were left with having octal ASCII codes inserted.
So, to avoid having garbage or octal chracters in the strings, use
cam_strvis_sbuf() to create a new function, cam_strvis_flag(), and
re-implement cam_strvis() using cam_strvis_flag().
Now, for the NVMe drives, we can use cam_strvis_flag with the
CAM_STRVIS_FLAG_NONASCII_SPC flag. This transforms non-printable
characters into spaces.
sys/cam/cam.c:
Add a new function, cam_strvis_flag(), that creates an sbuf
on the stack with the user's destination buffer, and calls
cam_strvis_sbuf() with the given flag argument.
Re-implement cam_strvis() to call cam_strvis_flag with the
CAM_STRVIS_FLAG_NONASCII_ESC argument. This should be the
equivalent of the old cam_strvis() function, except for the
overhead of creating the sbuf and calling sbuf_putc/printf.
sys/cam/cam.h:
Declaration for cam_strvis_flag.
sys/cam/nvme/nvme_all.c:
In nvme_print_ident, use the NONASCII_SPC flag with
cam_strvis_flag().
sys/cam/nvme/nvme_da.c:
In ndaregister(), use cam_strvis_flag() with the
NONASCII_SPC flag for the disk description and serial
number we report to GEOM.
sys/cam/nvme/nvme_xpt.c:
In nvme_probe_done(), use cam_strvis_flag with the
NONASCII_SPC flag when storing the drive serial number
in the CAM EDT.
Stefan Eßer [Wed, 9 Feb 2022 21:56:00 +0000 (22:56 +0100)]
sysctlbyname(): restore access to user variables
The optimization of sysctlbyname() in commit d05b53e0baee7 had the
side-effect of not going through the fix-up for the user.* variables
in the previously called sysctl() function.
This lead to 0 or an empty strings being returned by sysctlbyname()
for all user.* variables.
An alternate implementation would store the user variables in the
kernel during system start-up. That would allow to remove the fix-up
code in the C library that is currently required to provide the actual
values.
This update restores the previous code path for the user.* variables
and keeps the performance optimization intact for all other variables.
Randall Stewart [Wed, 9 Feb 2022 21:08:32 +0000 (16:08 -0500)]
cleanup of rack variables.
During a recent deep dive into all the variables so I could
discover why stack switching caused larger retransmits I examined
every variable in rack. In the process I found quite a few bits
that were not used and needed cleanup. This update pulls
out all the unused pieces from rack. Note there are *no* functional
changes here, just the removal of unused variables and a bit of
spacing clean up.
Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D34205
Fangrui Song [Wed, 9 Feb 2022 00:59:53 +0000 (19:59 -0500)]
crunchgen: remove -dc from linker invocation
In GNU ld and ld.lld, -dc is used with -r to allocate space to COMMON
symbols. It is presumably to work around legacy code which cannot
handle COMMON symbols in relocatable output. ld.lld may remove -dc or
make it a no-op for the 15.0.0 release.
As of 7420b323a014 crunch/crunchide does not require -dc, as the symbol
hiding technique no longer relied on making symbols local.
In addition -fno-common is now the default in Clang and GCC, so -dc
serves no purpose as the compiler does not generate COMMON symbols
anyway.
See https://maskray.me/blog/2022-02-06-all-about-common-symbols for more
detail on common symbols.
Michael Tuexen [Wed, 9 Feb 2022 18:14:25 +0000 (19:14 +0100)]
usr.sbin: add tcpsso
tcpsso is a command line tool to apply a socket option to an
existing TCP endpoint, which is identified by the inp_gencnt.
tcpsso can be used, for example, to switch the congestion control
module or the TCP stack.
Reviewed by: rrs, rscheff, debdrup, pau amma
Relnotes: yes
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D34139
New features:
PR #1614: tar: new option "--no-read-sparse"
PR #1503: RAR reader: filter support
PR #1585: RAR5 reader: self-extracting archive support
New features (not used in FreeBSD base):
PR #1567: tar: threads support for zstd (#1567)
PR #1518: ZIP reader: zstd decompression support
Security Fixes:
PR #1491, #1492, #1493, CVE-2021-36976:
fix invalid memory access and out of bounds read in RAR5 reader
PR #1566, #1618, CVE-2021-31566:
extended fix for following symlinks when processing the fixup list
Other notable bugfixes and improvements:
PR #1620: tar: respect "--ignore-zeros" in c, r and u modes
PR #1625: reduced size of application binaries
Michael Tuexen [Wed, 9 Feb 2022 11:24:41 +0000 (12:24 +0100)]
tcp: add sysctl interface for setting socket options
This interface allows to set a socket option on a TCP endpoint,
which is specified by its inp_gencnt. This interface will be
used in an upcoming command line tool tcpsso.
Warner Losh [Wed, 9 Feb 2022 00:23:31 +0000 (17:23 -0700)]
test-includes: Simplify $OBJDIR requirements
s=/=_=g in tested names so that all the objects live in $OBJDIR. This is
more robust than depending on side effects of auto OBJDIR features and
should fix buildworld issues some people have seen.
Dimitry Andric [Sun, 6 Feb 2022 17:41:20 +0000 (18:41 +0100)]
tty_info: Avoid warning by using logical instead of bitwise operators
Since TD_IS_RUNNING() and TS_ON_RUNQ() are defined as logical
expressions involving '==', clang 14 warns about them being checked with
a bitwise operator instead of a logical one:
```
sys/kern/tty_info.c:124:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
runa = TD_IS_RUNNING(td) | TD_ON_RUNQ(td);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
^
sys/kern/tty_info.c:124:9: note: cast one or both operands to int to silence this warning
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
^
sys/kern/tty_info.c:129:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
runb = TD_IS_RUNNING(td2) | TD_ON_RUNQ(td2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
^
sys/kern/tty_info.c:129:9: note: cast one or both operands to int to silence this warning
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
^
```
Fix this by using logical operators instead. No functional change
intended.
Mark Johnston [Tue, 8 Feb 2022 18:15:54 +0000 (13:15 -0500)]
riscv: Fix a race in pmap_pinit()
All pmaps share the top half of the address space. With 3-level page
tables, the top-level kernel map entries are not static: they might
change if the kernel map is extended (via pmap_growkernel()) or a 1GB
mapping in the direct map is demoted (not implemented yet). Thus the
riscv pmap maintains the allpmaps list to synchronize updates to
top-level entries.
When a pmap is created, it is inserted into this list after copying
top-level entries from the kernel pmap. The copying is done without
holding the allpmaps lock, and it is possible for pmap_pinit() to race
with kernel map updates. In particular, if a thread is modifying L1
entries, and a concurrent pmap_pinit() copies the old version of the
entries, it might not receive the update.
Fix the problem by copying the kernel map entries after inserting the
pmap into the list. This ensures that the nascent pmap always receives
updates, though pmap_distribute_l1() may race with the page copy.
Reviewed by: mhorne, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34158
Mark Johnston [Tue, 8 Feb 2022 17:36:41 +0000 (12:36 -0500)]
ktls: Disallow transmitting empty frames outside of TLS 1.0/CBC mode
There was nothing preventing one from sending an empty fragment on an
arbitrary KTLS TX-enabled socket, but ktls_frame() asserts that this
could not happen. Though the transmit path handles this case for TLS
1.0 with AES-CBC, we should be strict and allow empty fragments only in
modes where it is explicitly allowed.
Modify sosend_generic() to reject writes to a KTLS-enabled socket if the
number of data bytes is zero, so that userspace cannot trigger the
aforementioned assertion.
Add regression tests to exercise this case.
Reported by: syzkaller
Reviewed by: gallatin, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34195
Mark Johnston [Tue, 8 Feb 2022 17:34:20 +0000 (12:34 -0500)]
file: Make fget*() and getvnode*() consistent about initializing *fpp
Most fget*() functions initialize the output parameter to NULL. Make
the externally visible interface behave consistently, and make
fget_unlocked_seq() private to kern_descrip.c.
This fixes at least one bug in a consumer, _filemon_wrapper_openat(),
which assumes that getvnode() sets the output file pointer to NULL upon
an error.
Reported by: syzbot+01c0459408f896a5933a@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34190
John Baldwin [Tue, 8 Feb 2022 00:20:06 +0000 (16:20 -0800)]
cxgbei: Dispatch sent PDUs to the NIC asynchronously.
Previously the driver was called to send PDUs to the NIC synchronously
from the icl_conn_pdu_queue_cb callback. However, this performed a
fair bit of work while holding the icl connection lock. Instead,
change the callback to add sent PDUs to a STAILQ and defer dispatching
of PDUs to the NIC to a helper thread similar to the scheme used in
the TCP iSCSI backend.
- Replace rx_flags int and the sole RXF_ACTIVE flag with a simple
rx_active bool.
- Add a pool of transmit worker threads for cxgbei.
- Fix worker thread exit to depend on the wakeup in kthread_exit()
to fix a race with module unload.
John Baldwin [Mon, 7 Feb 2022 22:11:10 +0000 (14:11 -0800)]
Extend the VMM stats interface to support a dynamic count of statistics.
- Add a starting index to 'struct vmstats' and change the
VM_STATS ioctl to fetch the 64 stats starting at that index.
A compat shim for <= 13 continues to fetch only the first 64
stats.
- Extend vm_get_stats() in libvmmapi to use a loop and a static
thread local buffer which grows to hold the stats needed.
Kristof Provost [Thu, 27 Jan 2022 10:41:21 +0000 (11:41 +0100)]
dummynet: don't use per-vnet locks to protect global data.
The ref_count counter is global (i.e. not per-vnet) so we can't use a
per-vnet lock to protect it. Moreover, in callouts curvnet is not set,
so we'd end up panicing when trying to use DN_BH_WLOCK().
Instead we use the global sched_lock, which is already used when
evaluating ref_count (in unload_dn_aqm()).