Kyle Evans [Wed, 2 Nov 2022 20:29:16 +0000 (15:29 -0500)]
mktemp: don't double up on trailing slashes for -t paths
This is a minor cosmetic change; re-organize slightly to set tmpdir to
_PATH_TMP if we didn't otherwise have a tmpdir candidate, then check the
trailing char before appending another slash.
While we're here, remove some bogus whitespace and add a test case for
this change.
Obtained from: https://github.com/apple-oss-distributions/shell_cmds
Sponsored by: Klara, Inc.
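The trailing-slash check described in the mktemp change above can be
sketched as follows. This is a minimal Python illustration, not the C
code from the commit: PATH_TMP stands in for _PATH_TMP (assumed here to
be "/tmp/"), and build_template() and the ".XXXXXXXX" suffix are
hypothetical names chosen for the example.

```python
PATH_TMP = "/tmp/"  # assumption: stand-in for _PATH_TMP


def build_template(tmpdir, prefix):
    """Join tmpdir and prefix without doubling the trailing slash."""
    # Fall back to the default directory when no candidate was found.
    if not tmpdir:
        tmpdir = PATH_TMP
    # Only append a separator if the directory does not already end in one.
    sep = "" if tmpdir.endswith("/") else "/"
    return f"{tmpdir}{sep}{prefix}.XXXXXXXX"
```

With this shape, "/var/tmp" and "/var/tmp/" produce the same template.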
Mark Johnston [Wed, 2 Nov 2022 17:27:27 +0000 (13:27 -0400)]
arm64: Handle translation faults for thread structures
The break-before-make requirement poses a problem when promoting or
demoting mappings containing thread structures: a CPU may raise a
translation fault while accessing curthread, and data_abort() accesses
the thread again before pmap_fault() can translate the address and
return.
Normally this isn't a problem because we have a hack to ensure that
slabs used by the thread zone are always accessed via the direct map,
where promotions and demotions are rare. However, this hack doesn't
work properly with UMA_MD_SMALL_ALLOC disabled, as is the case with
KASAN configured (since our KASAN implementation does not shadow the
direct map and so tries to force the use of the kernel map wherever
possible).
Fix the problem by modifying data_abort() to handle translation faults
in the kernel map without dereferencing "td", i.e., curthread, and
without enabling interrupts. pmap_klookup() has special handling for
translation faults which makes it safe to call in this context. Then,
revert the aforementioned hack.
Reviewed by: kevans, alc, kib, andrew
MFC after: 1 month
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37231
Mark Johnston [Wed, 2 Nov 2022 17:08:07 +0000 (13:08 -0400)]
inpcb: Allow SO_REUSEPORT_LB to be used in jails
Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process. It is trivial to support this option in VNET jails, but it's
also useful in traditional jails.
This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups
This is a straightforward extension of the semantics used for individual
listening sockets. One pre-existing quirk of the lbgroup implementation
is that non-jailed lbgroups are searched before jailed listening
sockets; that is preserved with this change.
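The two rules listed above can be modelled with a toy sketch. This is
not the kernel implementation: LBGroup, join_group() and lookup_group()
are hypothetical names, a jail is reduced to a string (None meaning
"not jailed"), and matching is done on the port alone, which is far
simpler than the real PCB lookup.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LBGroup:
    port: int
    jail: Optional[str]  # None models a non-jailed group
    members: list = field(default_factory=list)


def join_group(groups, port, jail, sock):
    """Add sock to the group for (port, jail), creating it if needed."""
    for group in groups:
        # Rule 1: all members of a group belong to the same jail.
        if group.port == port and group.jail == jail:
            group.members.append(sock)
            return group
    group = LBGroup(port, jail, [sock])
    groups.append(group)
    return group


def lookup_group(groups, port):
    """Rule 2: prefer a jailed group over a non-jailed one."""
    candidates = [g for g in groups if g.port == port]
    jailed = [g for g in candidates if g.jail is not None]
    if jailed:
        return jailed[0]
    return candidates[0] if candidates else None
```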
Mark Johnston [Wed, 2 Nov 2022 17:03:41 +0000 (13:03 -0400)]
inpcb: Remove NULL checks of credential references
Some auditing of the code shows that "cred" is never NULL in these
functions, either because all callers pass a non-NULL reference or
because they unconditionally dereference "cred". So, let's simplify the
code a bit and remove NULL checks. No functional change intended.
Mike Karels [Wed, 2 Nov 2022 15:59:09 +0000 (10:59 -0500)]
fetch: support EAI_ADDRFAMILY error, correct two error messages
With the change to return EAI_ADDRFAMILY from getaddrinfo(), fetch
would print "Unknown resolver error" for that error. Add that error
and its string to libfetch's table, using an #ifdef just in case.
Correct error strings for EAI_NODATA (although it is currently unused)
and EAI_NONAME. Should maybe rework the code to use gai_strerror(3),
but that doesn't map directly, and the current strings are shortened.
Reviewed in https://reviews.freebsd.org/D37139 with related changes.
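The table-with-fallback approach described for fetch can be sketched
in Python. This is an illustration only, not libfetch's C table: the
message strings and the resolver_error() name are made up for the
example, and the hasattr() check plays the role of the "#ifdef just in
case", since not every platform defines every EAI_* constant.

```python
import socket

# (constant name, message) pairs; the messages are illustrative only.
_CANDIDATES = [
    ("EAI_ADDRFAMILY", "Address family for host not supported"),
    ("EAI_NODATA", "No address for host"),
    ("EAI_NONAME", "Host does not resolve"),
]

# Only add entries whose constant exists on this platform.
RESOLVER_ERRORS = {
    getattr(socket, name): message
    for name, message in _CANDIDATES
    if hasattr(socket, name)
}


def resolver_error(code):
    """Map a getaddrinfo() error code to a short message."""
    return RESOLVER_ERRORS.get(code, "Unknown resolver error")
```

Any code missing from the table falls through to the generic message,
which is the behaviour the commit fixes for EAI_ADDRFAMILY.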
Mike Karels [Wed, 2 Nov 2022 15:57:59 +0000 (10:57 -0500)]
getaddrinfo: distinguish missing addrs from unresolvable names
Rework getaddrinfo(3) to return different error values for unresolvable
names (same as before, EAI_NONAME) and those without a requested addr
(EAI_ADDRFAMILY) when using DNS. This is implemented via an added
error in the nsswitch layer, NS_ADDRFAMILY, which is used only by
getaddrinfo(). The error is passed through nsdispatch(3), but that
routine has no changes to handle this error. The error originates in
the getaddrinfo DNS layer called via nsdispatch(), and is processed
by the search layer that calls nsdispatch().
While here, add a little style to returns near those that were
modified.
Reviewed in https://reviews.freebsd.org/D37139 with related changes.
gai_strerror.c still has messages for EAI_ADDRFAMILY and EAI_NODATA,
but not the man page. Re-add to the man page, and update comments
in the source. Document the errors that are not in RFC 3493 or
POSIX.
Reviewed in https://reviews.freebsd.org/D37139 with related changes.
Mike Karels [Wed, 2 Nov 2022 15:43:04 +0000 (10:43 -0500)]
netdb.h: re-enable EAI_ADDRFAMILY, EAI_NODATA
EAI_ADDRFAMILY and EAI_NODATA are not in RFC 3493, but are available
and used in many other systems. It is desirable to have at least one
of them in order to distinguish between names that do not resolve and
those that do not have the requested address type. A change to
getaddrinfo() will use EAI_ADDRFAMILY. Both were "#if 0"; re-enable,
conditioned on __BSD_VISIBLE, and update comments. Also add comments
and __BSD_VISIBLE conditional for the last three EAI errors, which
are not in the RFC or POSIX. Note, all of these are available in
NetBSD and OpenBSD, and EAI_ADDRFAMILY and EAI_NODATA are available
in Linux (glibc).
Reviewed in https://reviews.freebsd.org/D37139 with related changes.
Ed Maste [Tue, 19 Jul 2022 20:42:27 +0000 (16:42 -0400)]
linux64: improve linux_support.s make rules
Previously we relied on the .s.o rule in share/mk/bsd.suffixes.mk to
tell make that linux_support.o is built from linux_support.s, even
though we do not use the .s.o rule to assemble it.
Reviewed by: sjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35864
Allow pf (l2) to be used to redirect ethernet packets to a different
interface.
The intended use case is to send 802.1x challenges out to a side
interface, to enable AT&T links to function with pfSense as a gateway,
rather than the AT&T provided hardware.
Kristof Provost [Wed, 2 Nov 2022 10:58:04 +0000 (11:58 +0100)]
bridge tests: re-enable span test
The root cause of the intermittent span test failures has been
identified as a race between sending the packet and starting the bpf
capture.
This is now resolved, so the test can be re-enabled.
Kristof Provost [Wed, 2 Nov 2022 10:55:39 +0000 (11:55 +0100)]
tests: make sniffer more robust
The Sniffer class is often used by test tools such as pft_ping to verify
that packets actually get sent where they're expected.
It starts a background thread to capture packets, but this thread needs
some time to start, leading to intermittent test failures when the
capture doesn't start before the relevant packet is sent.
Add a semaphore to ensure the Sniffer constructor doesn't return until
the capture is actually running.
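The synchronisation described above can be sketched as follows. This
is a simplified stand-in for the test suite's Sniffer class: the
capture argument is a hypothetical callable that invokes the callback
it is given once capturing has begun, and the 30-second timeout is an
arbitrary choice for the example.

```python
import threading


class Sniffer(threading.Thread):
    def __init__(self, capture):
        super().__init__(daemon=True)
        self._capture = capture
        self._ready = threading.Semaphore(0)
        self.capturing = False
        self.start()
        # Block the constructor until the background thread reports
        # that the capture is running.
        if not self._ready.acquire(timeout=30):
            raise RuntimeError("capture did not start")

    def _on_started(self):
        self.capturing = True
        self._ready.release()

    def run(self):
        # The capture routine calls _on_started() once it is listening.
        self._capture(self._on_started)
```

A caller that sends a packet right after constructing the Sniffer can
then rely on the capture already being active.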
Chuck Silvers [Tue, 1 Nov 2022 17:55:14 +0000 (10:55 -0700)]
ipmi: use a queue for kcs driver requests when possible
The ipmi watchdog pretimeout action can trigger unintentionally in
certain rare, complicated situations. What we have seen at Netflix
is that the BMC can sometimes be sent a continuous stream of
writes to port 0x80, and due to what is a bug or misconfiguration
in the BMC software, this results in the BMC running out of memory,
becoming very slow to respond to KCS requests, and eventually being
rebooted by its own internal watchdog. While that is going on in
the BMC, back in the host OS, a number of requests are pending in
the ipmi request queue, and the kcs_loop thread is working on
processing these requests. All of the KCS accesses to process
those requests are timing out and eventually failing because the
BMC is responding very slowly or not at all, and the kcs_loop thread
is holding the IPMI_IO_LOCK the whole time that is going on.
Meanwhile the watchdogd process in the host is trying to pat the
BMC watchdog, and this process is sleeping waiting to get the
IPMI_IO_LOCK. It's not entirely clear why the watchdogd process
is sleeping for this lock, because the intention is that a thread
holding the IPMI_IO_LOCK should not sleep and thus any thread
that wants the lock should just spin to wait for it. My best guess
is that the kcs_loop thread is spinning waiting for the BMC to
respond for so long that it is eventually preempted, and during
the brief interval when the kcs_loop thread is not running,
the watchdogd thread notices that the lock holder is not running
and sleeps. When the kcs_loop thread eventually finishes processing
one request, it drops the IPMI_IO_LOCK and then immediately takes the
lock again so it can process the next request in the queue.
Because the watchdogd thread is sleeping at this point, the kcs_loop
always wins the race to acquire the IPMI_IO_LOCK, thus starving
the watchdogd thread. The callout for the watchdog pretimeout
would be reset by the watchdogd thread after its request to the BMC
watchdog completes, but since that request is never processed, the
pretimeout callout eventually fires, even though there is nothing
actually wrong with the host.
To prevent this saga from unfolding:
- when kcs_driver_request() is called in a context where it can sleep,
queue the request and let the worker thread process it rather than
trying to process in the original thread.
- add a new high-priority queue for driver requests, so that the
watchdog patting requests will be processed as quickly as possible
even if lots of application requests have already been queued.
With these two changes, the watchdog pretimeout action does not trigger
even if the BMC is completely out to lunch for long periods of time
(as long as the watchdogd check command does not also get stuck).
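The second change, a separate high-priority queue that the worker
drains first, can be sketched like this. It is a toy model rather than
the driver code: RequestQueues and its method names are hypothetical,
and locking and the worker thread itself are left out.

```python
from collections import deque


class RequestQueues:
    def __init__(self):
        self.high = deque()    # driver requests, e.g. watchdog pats
        self.normal = deque()  # application requests

    def enqueue(self, request, driver=False):
        (self.high if driver else self.normal).append(request)

    def dequeue(self):
        """Return the next request, serving driver requests first."""
        if self.high:
            return self.high.popleft()
        if self.normal:
            return self.normal.popleft()
        return None
```

Even with many application requests already queued, a driver request
enqueued later is the next one the worker picks up.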
Kristof Provost [Tue, 1 Nov 2022 17:03:50 +0000 (18:03 +0100)]
pf tests: make killstate tests more robust
Rather than using a Scapy-based Python script, only check if the state
still exists. Scapy tends to be slow to start, apparently because it
lists all interfaces and gets their (IPv6) addresses a couple of times
at startup. This can be sufficient for the ICMP state to time out and
the test to fail.
We now only check if the state exists or is removed as expected, which
makes things faster, and should mean the test is more robust on slower
machines (such as CI VMs).
First, an sbuf_new() in device_get_path() shadows the sb
passed in by dev_wired_cache_add(), leaving its sb in an
unfinished state, leading to a failed KASSERT(). Fixing this
is as simple as removing the sbuf_new() from device_get_path()
Second, we cannot simply take a pointer to the sbuf memory and
store it in the device location cache, because that sbuf
is freed immediately after we add data to the cache, leading
to a use-after-free and eventually a double-free. Fixing this
requires allocating memory for the path.
After a discussion with jhb, we decided that one malloc was
better than two in dev_wired_cache_add, which is why it changed
so much.
Reviewed by: jhb
Sponsored by: Netflix
MFC after: 14 days
Mitchell Horne [Tue, 1 Nov 2022 15:15:18 +0000 (12:15 -0300)]
hier(7): remove text describing /usr/src layout
It poses a maintenance burden, since much of the information is
duplicated in the src tree's README.md file. Readers who are interested
enough in learning about the structure of the src tree can download it,
or browse the README online. Have hier(7) just point them there instead.
arm64: Hyper-V: fixing hung issue during Hyper-V initialization
On non-Hyper-V systems, system initialization hung during Hyper-V
initialization because hyperv_identify() returned success regardless
of the platform type.
Bjoern A. Zeeb [Mon, 31 Oct 2022 23:53:26 +0000 (23:53 +0000)]
LinuxKPI: 802.11: pass internal variable to lkpi_80211_mo_sta_state()
With mac80211 operations (MO) tracing on we have seen some ill-ordered
executions of MO functions. In order to limit visibility of the mac80211
sta, pass the internal version into lkpi_80211_mo_sta_state() and only
there convert to the argument needed. This mostly eases tracing and
debugging.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Mark Johnston [Mon, 31 Oct 2022 23:11:36 +0000 (19:11 -0400)]
dtrace: Fix up %rip for invop probes on x86
When a breakpoint exception is raised, the saved value of %rip points to
the instruction following the breakpoint. However, when fetching the
value of %rip using regs[], it's more natural to provide the address of
the breakpoint itself, so modify the kinst and fbt providers accordingly.
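The adjustment amounts to a small piece of address arithmetic. The
sketch below assumes the probe site is patched with the one-byte int3
instruction (opcode 0xCC); probe_rip() and INT3_LEN are names invented
for the example and do not come from the commit.

```python
INT3_LEN = 1  # assumption: breakpoint is the one-byte int3 (0xCC)


def probe_rip(saved_rip):
    """Return the breakpoint's address given the saved %rip."""
    # The saved %rip points just past the breakpoint instruction, so
    # step back by its length.
    return saved_rip - INT3_LEN
```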
Ed Maste [Mon, 31 Oct 2022 20:38:46 +0000 (16:38 -0400)]
mount_unionfs: remove jokey cautions from man page
There are known issues with unionfs, and the mount_unionfs man page has
a cautionary statement about its use. The caution had additional
"humourous" statements like "BEWARE OF DOG" but they served only to
confuse the situation. Remove them.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Gleb Smirnoff [Mon, 31 Oct 2022 15:57:11 +0000 (08:57 -0700)]
inpcb: retire suppression of randomization of ephemeral ports
The suppression was added in 5f311da2ccb with no explanation in the
commit message of the exact problem that was fixed. In the BSDCan
2006 talk [1], slides 12 to 14, we can find that it seems that there
was some problem with the TIME_WAIT state not properly being handled
on the remote side (also FreeBSD!), and this switching off the
suppression had hidden the problem. The rationale of the change was
that other stacks may also be buggy wrt the TIME_WAIT.
I did not find the actual problem in TIME_WAIT that the suppression
has hidden, nor a commit that would fix it. However, since that
time we started to handle SYNs with RFC5961 instead of RFC793, see
3220a2121cc. We also now have the tcp-testsuite [2], which has full
coverage of all possible scenarios of receiving SYN in TIME_WAIT.
Gleb Smirnoff [Mon, 31 Oct 2022 15:30:59 +0000 (08:30 -0700)]
rack/bbr: put back assertion that connection is not in TIME-WAIT
The assertion was incorrectly removed in 0d7445193ab. The leak of
a TIME-WAIT state into tfb_do_segment_nounlock method was fixed in
31bc602ff81. The TIME-WAIT connections are always processed by the
main tcp_input().
Kyle Evans [Mon, 31 Oct 2022 03:55:46 +0000 (22:55 -0500)]
mktemp: add -p/--tmpdir argument
This matches other mktemp implementations, including OpenBSD and GNU.
The -p option can be used to provide a tmpdir prefix for specified
templates. Precedence works out like so:
-t flag:
- $TMPDIR
- -p directory
- /tmp
Implied -t flag (no arguments or only -d flag):
- -p directory
- $TMPDIR
- /tmp
Some tests have been added for mktemp(1) in the process.
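The precedence rules listed above can be written down directly. This
is a Python sketch of the ordering only, not the mktemp(1) source;
choose_tmpdir() and its parameter names are hypothetical.

```python
def choose_tmpdir(explicit_t, p_dir, env_tmpdir):
    """Pick the directory according to the documented precedence.

    explicit_t -- True if -t was given, False if -t is only implied
    p_dir      -- argument of -p, or None
    env_tmpdir -- value of $TMPDIR, or None
    """
    if explicit_t:
        order = (env_tmpdir, p_dir)
    else:
        order = (p_dir, env_tmpdir)
    for candidate in order:
        if candidate:
            return candidate
    return "/tmp"
```

For instance, with both $TMPDIR and -p set, an explicit -t selects
$TMPDIR while the implied form selects the -p directory.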
Aymeric Wibo [Sun, 23 Oct 2022 07:46:27 +0000 (09:46 +0200)]
ls(1): add a -v flag to sort naturally
Add a -v flag for ls which sorts entries following a natural ordering
using strverscmp(3) (e.g. "bloem1 bloem9 bloem10" as opposed to
"bloem1 bloem10 bloem9").
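The difference between the two orderings can be reproduced with a
short Python sketch. It only approximates strverscmp(3): digit runs
are compared as integers, and the special handling of leading zeros
that strverscmp(3) performs is not modelled. natural_key() is a name
made up for the example.

```python
import re


def natural_key(name):
    """Split name into text and number parts for natural sorting."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group alternates text and digit runs,
    # so odd indices are always the numeric parts.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["bloem10", "bloem9", "bloem1"]
print(sorted(names))                   # plain byte-wise ordering
print(sorted(names, key=natural_key))  # natural ordering
```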
Justin Hibbits [Sun, 30 Oct 2022 19:40:05 +0000 (15:40 -0400)]
dtrace: Add pid provider to the build for powerpc
The fasttrap pid provider has been in place for a long time, but stopped
getting built by efe88d92da in preparation for 64-bit atomics. 32-bit
emulation of 64-bit atomics was added in 9aafc7c05.
Piotr Kubaj [Fri, 28 Oct 2022 09:59:05 +0000 (11:59 +0200)]
ofed: allow using IPv6 address in rc_pingpong server
Summary:
The current OFED code allows binding the server to an IPv6 address. It was added back in https://github.com/linux-rdma/rdma-core/commit/91fc39561d04903cd5b1665d9215a184baa66ba9
Gordon Bergling [Sun, 30 Oct 2022 12:59:37 +0000 (13:59 +0100)]
wg.4: Add some enhancements
- add a SPDX-License-Identifier
- rename the title of the man page
- use better grammar in some places
- reword 'IPs' to 'IP addresses'
- add a missing word in the AUTHORS section
- use '.An -nosplit' in the AUTHORS section
- Xr ipsec and ovpn
Warner Losh [Sat, 29 Oct 2022 14:34:16 +0000 (08:34 -0600)]
sys/modules: MODULES_OVERRIDE takes precedence over EXTRA_MODULES and WITHOUT_MODULES
MODULES_OVERRIDE has traditionally taken precedence over EXTRA_MODULES
and WITHOUT_MODULES as the exact list of modules to build. Over time,
things have been added that have broken this. Move the .endif that makes
this the case to the right place. The so called 'ALL_MODULES' option is
the only thing with higher precedence, but it's not quite all the
options anymore (though it is much more of them, and doesn't quite
work on !x86).
Warner Losh [Fri, 28 Oct 2022 21:42:49 +0000 (15:42 -0600)]
make: Don't print as many ==> and -- xxx -- lines in meta mode
Since metamode just announces what it's doing, the extra -- xxx -- lines
aren't needed for recursive descent, nor are the ==> lines needed. This
speeds up rebuilding kernels a lot...
Mitchell Horne [Sat, 29 Oct 2022 15:30:32 +0000 (12:30 -0300)]
linux, linux64: fix module load
The previous commit added references to the syscallnames arrays, but
failed to add the relevant source files to the module build. Thus, the
modules failed to load due to missing symbols.
Reported by: cy
Fixes: 1da65dcb1c57 ("linux: populate sv_syscallnames in each sysentvec")
Sponsored by: The FreeBSD Foundation
Mitchell Horne [Sat, 29 Oct 2022 15:28:16 +0000 (12:28 -0300)]
linux, linux64: improve SRCS formatting
Sort the entries alphabetically, and list them with one entry per line.
This makes the diffs much cleaner when adding or removing a new entry,
as I will do in the next commit.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation