Mike Karels [Wed, 2 Nov 2022 15:57:59 +0000 (10:57 -0500)]
getaddrinfo: distinguish missing addrs from unresolvable names
Rework getaddrinfo(3) to return different error values for unresolvable
names (same as before, EAI_NONAME) and those without a requested addr
(EAI_ADDRFAMILY) when using DNS. This is implemented via an added
error in the nsswitch layer, NS_ADDRFAMILY, which is used only by
getaddrinfo(). The error is passed through nsdispatch(3), but that
routine has no changes to handle this error. The error originates in
the getaddrinfo DNS layer called via nsdispatch(), and is processed
by the search layer that calls nsdispatch().
While here, add a little style to returns near those that were
modified.
Reviewed in https://reviews.freebsd.org/D37139 with related changes.
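For callers, the practical effect is that the two failure modes can be told apart by error code. The following is a minimal sketch of that from an application's point of view, not code from this change. It assumes the platform exposes EAI_ADDRFAMILY through Python's socket module (it is looked up defensively because not every platform defines it); classify_gai_error() and lookup() are hypothetical helper names.

```python
import socket

# EAI_ADDRFAMILY is not defined on every platform, so look it up defensively.
EAI_ADDRFAMILY = getattr(socket, "EAI_ADDRFAMILY", None)


def classify_gai_error(err):
    """Map a socket.gaierror to a coarse, human-readable category."""
    if err.errno == socket.EAI_NONAME:
        return "name does not resolve"
    if EAI_ADDRFAMILY is not None and err.errno == EAI_ADDRFAMILY:
        return "name has no address of the requested family"
    return "other resolver error"


def lookup(host, family):
    """Return the addresses for host, or a category string on failure."""
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror as err:
        return classify_gai_error(err)
    return sorted({info[4][0] for info in infos})
```

A caller asking for AF_INET6 addresses of an IPv4-only name could then report the second category instead of a generic "unknown host" message.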
gai_strerror.c still has messages for EAI_ADDRFAMILY and EAI_NODATA,
but not the man page. Re-add to the man page, and update comments
in the source. Document the errors that are not in RFC 3493 or
POSIX.
Reviewed in https://reviews.freebsd.org/D37139 with related changes.
Mike Karels [Wed, 2 Nov 2022 15:43:04 +0000 (10:43 -0500)]
netdb.h: re-enable EAI_ADDRFAMILY, EAI_NODATA
EAI_ADDRFAMILY and EAI_NODATA are not in RFC 3493, but are available
and used in many other systems. It is desirable to have at least one
of them in order to distinguish between names that do not resolve and
those that do not have the requested address type. A change to
getaddrinfo() will use EAI_ADDRFAMILY. Both were "#if 0"; re-enable,
conditioned on __BSD_VISIBLE, and update comments. Also add comments
and __BSD_VISIBLE conditional for the last three EAI errors, which
are not in the RFC or POSIX. Note, all of these are available in
NetBSD and OpenBSD, and EAI_ADDRFAMILY and EAI_NODATA are available
in Linux (glibc).
Reviewed in https://reviews.freebsd.org/D37139 with related changes.
Ed Maste [Tue, 19 Jul 2022 20:42:27 +0000 (16:42 -0400)]
linux64: improve linux_support.s make rules
Previously we relied on the .s.o rule in share/mk/bsd.suffixes.mk to
tell make that linux_support.o is built from linux_support.s, even
though we do not use the .s.o rule to assemble it.
Reviewed by: sjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35864
Allow pf (l2) to be used to redirect ethernet packets to a different
interface.
The intended use case is to send 802.1x challenges out to a side
interface, to enable AT&T links to function with pfSense as a gateway,
rather than the AT&T provided hardware.
Kristof Provost [Wed, 2 Nov 2022 10:58:04 +0000 (11:58 +0100)]
bridge tests: re-enable span test
The root cause of the intermittent span test failures has been
identified as a race between sending the packet and starting the bpf
capture.
This is now resolved, so the test can be re-enabled.
Kristof Provost [Wed, 2 Nov 2022 10:55:39 +0000 (11:55 +0100)]
tests: make sniffer more robust
The Sniffer class is often used by test tools such as pft_ping to verify
that packets actually get sent where they're expected.
It starts a background thread to capture packets, but this thread needs
some time to start, leading to intermittent test failures when the
capture doesn't start before the relevant packet is sent.
Add a semaphore to ensure the Sniffer constructor doesn't return until
the capture is actually running.
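The pattern described above can be sketched as follows. This is an illustrative stand-in rather than the actual test helper: no packets are captured, the startup delay is simulated with a sleep, and the attribute names (_ready_sem, _stop_event, running) are assumptions made for the example.

```python
import threading
import time


class Sniffer(threading.Thread):
    """Toy stand-in for a packet sniffer; no real capture takes place."""

    def __init__(self, startup_delay=0.05):
        super().__init__(daemon=True)
        self._ready_sem = threading.Semaphore(0)  # released once capture runs
        self._stop_event = threading.Event()
        self._startup_delay = startup_delay
        self.running = False
        self.start()
        # Block until the capture thread reports that it is running.
        if not self._ready_sem.acquire(timeout=5):
            raise RuntimeError("sniffer failed to start")

    def run(self):
        time.sleep(self._startup_delay)  # simulate slow capture setup
        self.running = True
        self._ready_sem.release()        # capture is now live
        self._stop_event.wait()          # pretend to capture until stopped

    def stop(self):
        self._stop_event.set()
        self.join()
```

Because the constructor only returns after the semaphore is released, a test that sends its packet right after creating the Sniffer can no longer race with capture startup.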
Chuck Silvers [Tue, 1 Nov 2022 17:55:14 +0000 (10:55 -0700)]
ipmi: use a queue for kcs driver requests when possible
The ipmi watchdog pretimeout action can trigger unintentionally in
certain rare, complicated situations. What we have seen at Netflix
is that the BMC can sometimes be sent a continuous stream of
writes to port 0x80, and due to what is a bug or misconfiguration
in the BMC software, this results in the BMC running out of memory,
becoming very slow to respond to KCS requests, and eventually being
rebooted by its own internal watchdog. While that is going on in
the BMC, back in the host OS, a number of requests are pending in
the ipmi request queue, and the kcs_loop thread is working on
processing these requests. All of the KCS accesses to process
those requests are timing out and eventually failing because the
BMC is responding very slowly or not at all, and the kcs_loop thread
is holding the IPMI_IO_LOCK the whole time that is going on.
Meanwhile the watchdogd process in the host is trying to pat the
BMC watchdog, and this process is sleeping waiting to get the
IPMI_IO_LOCK. It's not entirely clear why the watchdogd process
is sleeping for this lock, because the intention is that a thread
holding the IPMI_IO_LOCK should not sleep and thus any thread
that wants the lock should just spin to wait for it. My best guess
is that the kcs_loop thread is spinning waiting for the BMC to
respond for so long that it is eventually preempted, and during
the brief interval when the kcs_loop thread is not running,
the watchdogd thread notices that the lock holder is not running
and sleeps. When the kcs_loop thread eventually finishes processing
one request, it drops the IPMI_IO_LOCK and then immediately takes the
lock again so it can process the next request in the queue.
Because the watchdogd thread is sleeping at this point, the kcs_loop
always wins the race to acquire the IPMI_IO_LOCK, thus starving
the watchdogd thread. The callout for the watchdog pretimeout
would be reset by the watchdogd thread after its request to the BMC
watchdog completes, but since that request is never processed, the
pretimeout callout eventually fires, even though there is nothing
actually wrong with the host.
To prevent this saga from unfolding:
- when kcs_driver_request() is called in a context where it can sleep,
queue the request and let the worker thread process it rather than
trying to process in the original thread.
- add a new high-priority queue for driver requests, so that the
watchdog patting requests will be processed as quickly as possible
even if lots of application requests have already been queued.
With these two changes, the watchdog pretimeout action does not trigger
even if the BMC is completely out to lunch for long periods of time
(as long as the watchdogd check command does not also get stuck).
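The queueing policy in the second item can be modelled in a few lines. This is only a sketch of the ordering, assuming two FIFO queues drained by a single worker; the class and method names are hypothetical, and the real driver's locking and sleeping behaviour is not represented.

```python
from collections import deque


class RequestQueues:
    """Two FIFO queues; driver requests are always served first."""

    def __init__(self):
        self.driver = deque()       # high priority, e.g. watchdog pats
        self.application = deque()  # everything else

    def enqueue(self, request, from_driver=False):
        (self.driver if from_driver else self.application).append(request)

    def dequeue(self):
        """Return the next request to process, or None if both are empty."""
        if self.driver:
            return self.driver.popleft()
        if self.application:
            return self.application.popleft()
        return None
```

With this ordering, a watchdog pat queued behind many application requests is still the next request the worker picks up.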
Kristof Provost [Tue, 1 Nov 2022 17:03:50 +0000 (18:03 +0100)]
pf tests: make killstate tests more robust
Rather than using a Scapy-based Python script, only check whether the
state still exists. Scapy tends to be slow to start, apparently because
it lists all interfaces and gets their (IPv6) addresses a couple of times
at startup. This can be sufficient for the ICMP state to time out and
the test to fail.
We now only check if the state exists or is removed as expected, which
makes things faster, and should mean the test is more robust on slower
machines (such as CI VMs).
First, an sbuf_new() in device_get_path() shadows the sb
passed in by dev_wired_cache_add(), leaving its sb in an
unfinished state, leading to a failed KASSERT(). Fixing this
is as simple as removing the sbuf_new() from device_get_path().
Second, we cannot simply take a pointer to the sbuf memory and
store it in the device location cache, because that sbuf
is freed immediately after we add data to the cache, leading
to a use-after-free and eventually a double-free. Fixing this
requires allocating memory for the path.
After a discussion with jhb, we decided that one malloc was
better than two in dev_wired_cache_add, which is why it changed
so much.
Reviewed by: jhb
Sponsored by: Netflix
MFC after: 14 days
Mitchell Horne [Tue, 1 Nov 2022 15:15:18 +0000 (12:15 -0300)]
hier(7): remove text describing /usr/src layout
It poses a maintenance burden, since much of the information is
duplicated in the src tree's README.md file. Readers who are interested
enough in learning about the structure of the src tree can download it,
or browse the README online. Have hier(7) just point them there instead.
arm64: Hyper-V: fixing hung issue during Hyper-V initialization
On non-Hyper-V systems, system initialization hung during Hyper-V
initialization because hyperv_identify() returned success
irrespective of the type of the platform.
Bjoern A. Zeeb [Mon, 31 Oct 2022 23:53:26 +0000 (23:53 +0000)]
LinuxKPI: 802.11: pass internal variable to lkpi_80211_mo_sta_state()
With mac80211 operations (MO) tracing on we have seen some ill-ordered
executions of MO functions. In order to limit visibility of the mac80211
sta, pass the internal version into lkpi_80211_mo_sta_state() and only
there convert to the argument needed. This mostly eases tracing and
debugging.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Mark Johnston [Mon, 31 Oct 2022 23:11:36 +0000 (19:11 -0400)]
dtrace: Fix up %rip for invop probes on x86
When a breakpoint exception is raised, the saved value of %rip points to
the instruction following the breakpoint. However, when fetching the
value of %rip using regs[], it's more natural to provide the address of
the breakpoint itself, so modify the kinst and fbt providers accordingly.
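The adjustment amounts to a one-byte subtraction. The sketch below assumes the probe site is patched with the one-byte x86 int3 instruction (opcode 0xCC); the function name and the sample address are made up for illustration.

```python
INT3_LEN = 1  # the x86 int3 breakpoint instruction (opcode 0xCC) is one byte


def breakpoint_address(saved_rip):
    """Return the address of the breakpoint given the saved %rip."""
    # The saved %rip points just past the breakpoint, so step back over it.
    return saved_rip - INT3_LEN
```

For a saved %rip of 0xffffffff80001001, this yields 0xffffffff80001000, the address of the breakpoint itself.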
Ed Maste [Mon, 31 Oct 2022 20:38:46 +0000 (16:38 -0400)]
mount_unionfs: remove jokey cautions from man page
There are known issues with unionfs, and the mount_unionfs man page has
a cautionary statement about its use. The caution had additional
"humorous" statements like "BEWARE OF DOG" but they served only to
confuse the situation. Remove them.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Gleb Smirnoff [Mon, 31 Oct 2022 15:57:11 +0000 (08:57 -0700)]
inpcb: retire suppression of randomization of ephemeral ports
The suppression was added in 5f311da2ccb with no explanation in the
commit message of the exact problem that was fixed. In the BSDCan
2006 talk [1], slides 12 to 14, we can find that it seems that there
was some problem with the TIME_WAIT state not properly being handled
on the remote side (also FreeBSD!), and this switching off the
suppression had hidden the problem. The rationale of the change was
that other stacks may also be buggy wrt the TIME_WAIT.
I did not find the actual problem in TIME_WAIT that the suppression
has hidden, nor a commit that would fix it. However, since that
time we started to handle SYNs with RFC5961 instead of RFC793, see
3220a2121cc. We also now have the tcp-testsuite [2], that has full
coverage of all possible scenarios of receiving SYN in TIME_WAIT.
Gleb Smirnoff [Mon, 31 Oct 2022 15:30:59 +0000 (08:30 -0700)]
rack/bbr: put back assertion that connection is not in TIME-WAIT
The assertion was incorrectly removed in 0d7445193ab. The leak of
a TIME-WAIT state into tfb_do_segment_nounlock method was fixed in
31bc602ff81. The TIME-WAIT connections are processed by the main
tcp_input() always.
Kyle Evans [Mon, 31 Oct 2022 03:55:46 +0000 (22:55 -0500)]
mktemp: add -p/--tmpdir argument
This matches other mktemp implementations, including OpenBSD and GNU.
The -p option can be used to provide a tmpdir prefix for specified
templates. Precedence works out like so:
-t flag:
- $TMPDIR
- -p directory
- /tmp
Implied -t flag (no arguments or only -d flag):
- -p directory
- $TMPDIR
- /tmp
Some tests have been added for mktemp(1) in the process.
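The precedence rules listed above can be modelled as a small function. This is a sketch of the ordering only, not of mktemp(1) itself; choose_tmpdir() and its parameters are hypothetical names, and an empty or unset value is treated as "not provided".

```python
def choose_tmpdir(explicit_t, p_dir=None, env_tmpdir=None):
    """Pick the directory prefix according to the documented precedence.

    explicit_t: True when -t was given, False when -t is only implied.
    p_dir:      argument of -p, or None if -p was not given.
    env_tmpdir: value of $TMPDIR, or None if it is unset.
    """
    if explicit_t:
        candidates = (env_tmpdir, p_dir)  # -t flag: $TMPDIR wins over -p
    else:
        candidates = (p_dir, env_tmpdir)  # implied -t: -p wins over $TMPDIR
    for candidate in candidates:
        if candidate:
            return candidate
    return "/tmp"  # final fallback in both cases
```

For example, with both -p /var/tmp and TMPDIR=/scratch set, an explicit -t selects /scratch while the implied form selects /var/tmp.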
Aymeric Wibo [Sun, 23 Oct 2022 07:46:27 +0000 (09:46 +0200)]
ls(1): add a -v flag to sort naturally
Add a -v flag for ls which sorts entries following a natural ordering
using strverscmp(3) (e.g. "bloem1 bloem9 bloem10" as opposed to
"bloem1 bloem10 bloem9").
Justin Hibbits [Sun, 30 Oct 2022 19:40:05 +0000 (15:40 -0400)]
dtrace: Add pid provider to the build for powerpc
The fasttrap pid provider has been in place for a long time, but stopped
getting built by efe88d92da in preparation for 64-bit atomics. 32-bit
emulation of 64-bit atomics was added in 9aafc7c05.
Piotr Kubaj [Fri, 28 Oct 2022 09:59:05 +0000 (11:59 +0200)]
ofed: allow using IPv6 address in rc_pingpong server
Summary:
The current OFED code allows binding the server to an IPv6 address. It was
added back in https://github.com/linux-rdma/rdma-core/commit/91fc39561d04903cd5b1665d9215a184baa66ba9
Gordon Bergling [Sun, 30 Oct 2022 12:59:37 +0000 (13:59 +0100)]
wg.4: Add some enhancements
- add a SPDX-License-Identifier
- rename the title of the man page
- use better grammar in some places
- reword 'IPs' to 'IP addresses'
- add a missing word in the AUTHORS section
- use '.An -nosplit' in the AUTHORS section
- Xr ipsec and ovpn
Warner Losh [Sat, 29 Oct 2022 14:34:16 +0000 (08:34 -0600)]
sys/modules: MODULES_OVERRIDE takes precedence over EXTRA_MODULES and WITHOUT_MODULES
MODULES_OVERRIDE has traditionally taken precedence over EXTRA_MODULES
and WITHOUT_MODULES as the exact list of modules to build. Over time,
things have been added that have broken this. Move the .endif that makes
this the case to the right place. The so called 'ALL_MODULES' option is
the only thing with higher precedence, but it's not quite all the
options anymore (though it is much more of them, and doesn't quite
work on !x86).
Warner Losh [Fri, 28 Oct 2022 21:42:49 +0000 (15:42 -0600)]
make: Don't print as many ==> and -- xxx -- lines in meta mode
Since metamode just announces what it's doing, the extra -- xxx -- lines
aren't needed for recursive descent, nor are the ==> lines needed. This
speeds up rebuilding kernels a lot...
Mitchell Horne [Sat, 29 Oct 2022 15:30:32 +0000 (12:30 -0300)]
linux, linux64: fix module load
The previous commit added references to the syscallnames arrays, but
failed to add the relevant source files to the module build. Thus, the
modules failed to load due to missing symbols.
Reported by: cy
Fixes: 1da65dcb1c57 ("linux: populate sv_syscallnames in each sysentvec")
Sponsored by: The FreeBSD Foundation
Mitchell Horne [Sat, 29 Oct 2022 15:28:16 +0000 (12:28 -0300)]
linux, linux64: improve SRCS formatting
Sort the entries alphabetically, and list them with one entry per line.
This makes the diffs much cleaner when adding or removing a new entry,
as I will do in the next commit.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Mike Karels [Tue, 25 Oct 2022 19:23:18 +0000 (14:23 -0500)]
genet: add another case where headers need pullup
Wake On LAN packets sent by wake(8) via BPF are lost if txcsum is
enabled. These fall into the "other protocol" case where gen_parse_tx
did nothing. Add code to shift up to gen_tx_hdr_min bytes of the
packet along with the Ethernet header in this case.
Doug Moore [Sat, 29 Oct 2022 05:50:44 +0000 (00:50 -0500)]
iommu_gas: start space search from 1st free space
Maintain a pointer to an element in the domain map that is left of any
sufficiently large free gap in the tree and start the search for free
space there, rather than at the root of the tree. On find_space, move
that pointer to the leftmost leaf in the subtree of nodes with
free_down greater than or equal to the minimum allocation size before
starting the search for space from that pointer. On removal of a node
with address less than that pointer, update that pointer to point to
the predecessor or successor of the removed node.
In experiments with netperf streaming, this reduces by about 40% the
number of map entries examined in first-fit allocation.
Kyle Evans [Sat, 29 Oct 2022 02:41:58 +0000 (21:41 -0500)]
Import wireguard-tools for wg(8)
744bfb213144 ("Import the WireGuard driver from zx2c4.com") re-imported
the WireGuard driver with the intention that wg(8) will be used to
manage WireGuard interfaces, as on other platforms, now that wg(8) has
been dual-licensed MIT specifically to allow our use in base (thanks!).
This is a copy of wireguard-tools/src, with files that we don't need
.gitignore'd out to make it more clear that we're only building files
that are either MIT or dual-licensed MIT. We may go with a different
structure later (e.g., if we end up needing to include outside of src/),
but an upstream restructure seems unlikely in the foreseeable future.
Mitchell Horne [Fri, 28 Oct 2022 21:20:05 +0000 (18:20 -0300)]
ddb: print the actual syscall name
Some architectures will pretty-print a system call trap in the
backtrace. Rather than printing the symbol, use the syscallname()
function to pull the string from the sv_syscallnames array corresponding
to the process. This simplifies the function somewhat.
Mostly, this will result in dropping the "sys" prefix, e.g. "sys_exit"
will now be printed simply as "exit".
Make two minor tweaks to the function signature: use a u_int for the
syscall number since this is a more correct type (see the 'code' member
of struct syscall_args), and make the thread pointer the first argument.
The latter is more natural and conventional.
Suggested by: jrtc27
Reviewed by: jrtc27, markj, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37200
Mark Johnston [Fri, 28 Oct 2022 20:53:36 +0000 (16:53 -0400)]
release: Add support for creating ZFS-based VM images
The change extends vmimage.subr to handle a new parameter, VMFS, which
should be equal to either "ufs" or "zfs". When it is set to "zfs", we use
makefs to create a bootable pool populated using the same dataset layout
as bsdinstall and "poudriere image" use. The pool can be grown using
the growfs rc.d script, just as in UFS images.
This will make it easy to provide VM and cloud images with ZFS as the
root filesystem. So far I did not do extensive testing of cloud images;
I merely verified that creation of ZFS-based AWS AMIs works and allows
me to create amd64 and arm64 EC2 instances with ZFS as the root
filesystem.
Reviewed by: emaste, gjb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34426