vn_start_write(): consistently set *mpp to NULL on error or after failed sleep
This ensures that *mpp != NULL iff vn_finished_write() should be
called, regardless of the returned error, except for V_NOWAIT.
The only exception that must be maintained is the case where
vn_start_write(V_NOWAIT) is called with the intent of later dropping
other locks and then doing vn_start_write(V_XSLEEP), which needs the mp
value calculated from the non-waitable call above it.
Also note that V_XSLEEP is not supported by vn_start_secondary_write().
Corvin Köhne [Wed, 29 Mar 2023 08:07:10 +0000 (10:07 +0200)]
bhyve: do not exit if LPC finds no host selector
The host selector is only required when the user wants to use the same
LPC device IDs as the physical LPC device. This is an uncommon use case.
For that reason, it makes no sense to exit when we don't find the host
selector.
Rob Norris [Tue, 14 Mar 2023 22:07:18 +0000 (09:07 +1100)]
dhclient: add ability to ignore options in offers
A machine might exist on multiple networks, all of which offer, say, default
routes or name servers. There's no easy way to indicate in the config
that those options are only valid for a single interface.
Randall Stewart [Mon, 10 Apr 2023 20:33:56 +0000 (16:33 -0400)]
tcp: Rack - in the absence of LRO fixed rate pacing (loopback or interfaces with no LRO) does not work correctly.
Rack is capable of fixed rate or dynamic rate pacing. Both of these can get mixed up when
LRO is not available. This is because LRO will hold off waking up the tcp connection to
process the inbound packets until the pacing timer is up. Without LRO the pacing only
sort-of works. Sometimes we pace correctly, other times not so much.
This set of changes will make it so pacing works properly in the absence of LRO.
Alan Somers [Fri, 7 Apr 2023 16:07:50 +0000 (10:07 -0600)]
Implement GEOM::rotation_rate for gmirror
If all of the mirror's children have the same rotation rate, report
that. But if they have mixed rotation rates, or if any child has an
unknown rotation rate, report "Unknown".
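The aggregation rule above can be sketched in a few lines of C. This is only an illustration of the rule as described, not the gmirror code: the function name, the array-of-rates interface, and the ROTATION_RATE_UNKNOWN sentinel are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel for "rate not known"; not taken from the commit. */
#define ROTATION_RATE_UNKNOWN 0u

/*
 * Return the rotation rate shared by every child, or
 * ROTATION_RATE_UNKNOWN if the children disagree, if any child's rate is
 * unknown, or if there are no children at all.
 */
static unsigned
mirror_rotation_rate(const unsigned *child_rates, size_t nchildren)
{
	unsigned rate;
	size_t i;

	assert(child_rates != NULL || nchildren == 0);
	if (nchildren == 0)
		return (ROTATION_RATE_UNKNOWN);

	rate = child_rates[0];
	for (i = 0; i < nchildren; i++) {
		if (child_rates[i] == ROTATION_RATE_UNKNOWN ||
		    child_rates[i] != rate)
			return (ROTATION_RATE_UNKNOWN);
	}
	return (rate);
}
```

A mirror of three 7200 rpm children would report 7200 under this rule, while mixing in a 5400 rpm or unknown child would report the unknown value.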
Mark Johnston [Mon, 10 Apr 2023 15:18:25 +0000 (11:18 -0400)]
bridge: Add support for emulated netmap mode
if_bridge receives packets via a special interface, if_bridge_input,
rather than by if_input. Thus, netmap's usual hooking of ifnet routines
does not work as expected. Instead, modify bridge_input() to pass
packets directly to netmap when it is enabled. This applies to both
locally delivered packets and forwarded packets.
When a netmap application transmits a packet by writing it to the host
TX ring, the mbuf chain is passed to if_input, which ordinarily points
to ether_input(). However, when transmitting via if_bridge,
bridge_input() needs to see the packet again in order to decide whether
to deliver or forward. Thus, introduce a new protocol flag,
M_BRIDGE_INJECT, which 1) causes the packet to be passed to
bridge_input() again after Ethernet processing, and 2) avoids passing
the packet back to netmap. The source MAC address of the packet is used
to determine the original "receiving" interface.
We already remove mbuf tags from packets transiting an if_epair, but we
didn't remove vlan metadata.
In certain configurations this could lead to unexpected vlan tags
turning up on the rx side.
The assert_vop_locked messages are ignored, and file/line information
is not too useful. Fixing this without changing both witness and VFS
asserts KPIs is not possible.
Reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39464
netlink: improve source ifa selection algorithm when adding routes.
Use the route destination sockaddr when the gateway is either AF_LINK or
has a different family (IPv4 over IPv6). This change ensures
the nexthop IFA has the same family as the destination.
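The selection rule can be sketched as follows. This is a simplified model, not the netlink code: the enum, the struct, and the function name are hypothetical stand-ins for the kernel's sockaddr handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for AF_INET, AF_INET6 and AF_LINK. */
enum addr_family { FAM_INET, FAM_INET6, FAM_LINK };

struct addr {
	enum addr_family family;
	const char *text;	/* printable form, for illustration only */
};

/*
 * Pick the address used to look up the source IFA for a new route.
 * If the gateway is a link-layer address or belongs to a different
 * family than the destination, fall back to the destination so that
 * the chosen IFA matches the destination's family.
 */
static const struct addr *
ifa_lookup_addr(const struct addr *dst, const struct addr *gw)
{
	assert(dst != NULL && gw != NULL);
	if (gw->family == FAM_LINK || gw->family != dst->family)
		return (dst);
	return (gw);
}
```

With an IPv4 destination and an IPv6 gateway, the sketch returns the destination, so the lookup stays in the IPv4 family.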
usb(4): Separate the fast path and the slow path to avoid races and use-after-free for the USB FS interface.
Badly behaved user-space USB applications may crash the kernel by issuing
USB FS related ioctl(2)'s out of their expected order. By default
the USB FS ioctl(2) interface is only available to the
administrator, root, and driver applications like webcamd(8) need
to be hijacked in order for this to happen.
The issue is the fast-path code does not always see updates made
by the slow-path code, and may then work on freed memory.
This is easily fixed by using an EPOCH(9) type of synchronization
mechanism. An SX(9) lock will be used as a substitute for EPOCH(9),
due to the need for sleepability. In addition most calls going into
the fast-path originate from a single user-space process and the
need for multi-thread performance is not present.
usb(4): Code refactoring as a pre-step for adding missing synchronization mechanism.
Move code in switch cases into own functions to make later changes easier to track.
No functional change, except for removing a superfluous break statement when
range checking USB_FS_MAX_FRAMES, in the USB_FS_OPEN case.
It should not have been there at all.
Rick Macklem [Fri, 7 Apr 2023 19:57:26 +0000 (12:57 -0700)]
nfscl: Fix support for doing Null RPCs
Although the NFS client does not currently perform Null RPCs,
this fix is needed if/when it might do so.
Found during testing of experimental code that uses Null RPCs
to maintain/monitor TCP connections for "nconnect" mounts.
Rick Macklem [Fri, 7 Apr 2023 19:49:23 +0000 (12:49 -0700)]
nfsd: Add support for the SP4_MACH_CRED case in ExchangeID
Commit f4179ad46fa4 added support for operation bitmaps for
NFSv4.1/4.2. This commit uses those to implement the SP4_MACH_CRED
case for the NFSv4.1/4.2 ExchangeID operation since the Linux
NFSv4.1/4.2 client is now using this for Kerberized mounts.
The Linux Kerberized NFSv4.1/4.2 mounts currently work without
support for this because Linux will fall back to SP4_NONE,
but there is no guarantee this fallback will work forever.
This commit only affects Kerberized NFSv4.1/4.2 mounts from
Linux at this time.
API contract requires VOPs to handle EXDEV internally, worst case by
falling back to the generic copy routine. This broke with the recent
changes.
While here whack custom loop to lock 2 vnodes with vn_lock_pair, which
provides the same functionality internally. write start/finish around
it plays no role so got eliminated.
One difference is that vn_lock_pair always takes an exclusive lock on
both vnodes. I did not patch around it because current code takes an
exclusive lock on the target vnode. zfs supports shared-locking for
writes, so this serializes different calls to the routine as is, despite
range locking inside. At the same time you may notice the source vnode
can get some traffic if only shared-locked, thus once more this goes
the safer route of exclusive-locking. Note this should be patched to
use shared-locking for both once the feature is considered stable.
Technically the switch to vn_lock_pair should be a separate change, but
it would only introduce churn immediately whacked by the rest of the
patch.
[note: technically the review is still in progress, but so is the
active breakage]
MAC flapping occurs when a bridge receives packets with the same source MAC
address on different member interfaces. The common reasons are:
- user roams from one bridge port to another
- user has a wrong network setup, e.g. bridge loops
- someone set a duplicate ethernet address on his/her nic
- some bad guy / virus / trojan sends spoofed packets
if_bridge currently updates the bridge routing entry silently hence it is hard
to diagnose.
Emit logs when MAC address port flapping occurs to make it easier to diagnose.
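A minimal sketch of flap detection in a learning table, assuming a tiny fixed-size array in place of if_bridge's real routing table. The struct names, the fwd_learn() helper, and the log text are hypothetical and only illustrate the idea of logging when a known MAC address shows up on a different port.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAC_LEN		6
#define TABLE_SIZE	16

struct fwd_entry {
	unsigned char mac[MAC_LEN];
	int port;
	int in_use;
};

struct fwd_table {
	struct fwd_entry entries[TABLE_SIZE];
};

/*
 * Learn that `mac` was seen on `port`. Returns 1 if the address moved
 * from another port (a flap, which is logged), 0 if nothing changed or
 * the address is new, and -1 if the table is full.
 */
static int
fwd_learn(struct fwd_table *t, const unsigned char *mac, int port)
{
	struct fwd_entry *free_slot = NULL;
	int i;

	assert(t != NULL && mac != NULL);
	for (i = 0; i < TABLE_SIZE; i++) {
		struct fwd_entry *e = &t->entries[i];

		if (!e->in_use) {
			if (free_slot == NULL)
				free_slot = e;
			continue;
		}
		if (memcmp(e->mac, mac, MAC_LEN) != 0)
			continue;
		if (e->port == port)
			return (0);
		/* Same source MAC on a different port: report it. */
		fprintf(stderr,
		    "mac %02x:%02x:%02x:%02x:%02x:%02x moved from port %d to port %d\n",
		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5],
		    e->port, port);
		e->port = port;
		return (1);
	}
	if (free_slot == NULL)
		return (-1);
	memcpy(free_slot->mac, mac, MAC_LEN);
	free_slot->port = port;
	free_slot->in_use = 1;
	return (0);
}
```

A real implementation would likely rate-limit such messages, since a loop or a spoofing host can make an address flap many times per second.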
ifconfig: Fix configuring if_bridge with additional operating parameters
For clone create and rename operations, the interface name we get back can be
different from the one passed to ioctl(). Use the interface name we get back
so that ifconfig will not return unexpected ENXIO.
Randall Stewart [Fri, 7 Apr 2023 14:15:29 +0000 (10:15 -0400)]
tcp: misc cleanup of options for rack as well as socket option logging.
Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack
has an experimental SaD (Sack Attack Detection) algorithm that should be made available. Also
there is a t_maxpeak_rate that needs to be removed (it's unused).
wpa_supplicant/hostapd: Fix uninitialized packet pointer on error
The packet pointer (called packet) will remain uninitialized when
pcap_next_ex() returns an error. This occurs when the wlan
interface is shut down using ifconfig destroy. Adding a NULL
assignment to packet duplicates what pcap_next() does.
The reason we use pcap_next_ex() in this instance is that with
pcap_next() we receive a null pointer both when there was an error
and when no packets were read. With pcap_next_ex() we can differentiate
between an error and legitimately receiving no packets.
PR: 270649
Reported by: Robert Morris <rtm@lcs.mit.edu>
Fixes: 6e5d01124fd4
MFC after: 3 days
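The pattern described in this commit can be sketched without libpcap by modelling the return convention of pcap_next_ex(): 1 for a packet, 0 for no packet, negative for an error. The next_ex_fn callback type, the next_packet() wrapper, and the stub readers below are hypothetical and exist only to make the example self-contained; they are not the wpa_supplicant code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reader with the same return convention as
 * pcap_next_ex(): 1 = packet read, 0 = no packet, negative = error.
 * On any return other than 1 it may leave *data untouched.
 */
typedef int (*next_ex_fn)(void *handle, const unsigned char **data);

/*
 * Fetch the next packet. Returns NULL when there is none, as
 * pcap_next() would, but also reports through *errored whether the
 * NULL was caused by an error rather than by an empty read.
 */
static const unsigned char *
next_packet(next_ex_fn next_ex, void *handle, int *errored)
{
	const unsigned char *packet;
	int rc;

	assert(next_ex != NULL && errored != NULL);
	rc = next_ex(handle, &packet);
	*errored = (rc < 0);
	if (rc != 1)
		packet = NULL;	/* never hand back an uninitialized pointer */
	return (packet);
}

/* Stub readers standing in for a capture handle in each state. */
static const unsigned char stub_byte = 0xaa;

static int
stub_error(void *h, const unsigned char **d)
{
	(void)h; (void)d;
	return (-1);
}

static int
stub_empty(void *h, const unsigned char **d)
{
	(void)h; (void)d;
	return (0);
}

static int
stub_ok(void *h, const unsigned char **d)
{
	(void)h;
	*d = &stub_byte;
	return (1);
}
```

The explicit NULL assignment is the point of the sketch: without it, the caller would see whatever happened to be on the stack when the reader fails.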
Mark Johnston [Wed, 5 Apr 2023 20:52:41 +0000 (16:52 -0400)]
netmap: Handle packet batches in generic mode
ifnets are allowed to pass batches of multiple packets to if_input,
linked by the m_nextpkt pointer. iflib_rxeof() sometimes does this, for
example. Netmap's generic mode did not handle this and would only
deliver the first packet in the batch, leaking the rest.
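Handling a batch amounts to walking the chain and unlinking each packet before passing it on. The sketch below uses a hypothetical struct pkt with a nextpkt field in place of struct mbuf and m_nextpkt; the deliver_batch() helper and its callback are likewise assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf packet header. */
struct pkt {
	struct pkt *nextpkt;	/* next packet in the batch, or NULL */
	int id;
};

/*
 * Deliver every packet in a batch, not just the head. Each packet is
 * detached from the chain first so the consumer sees a single packet.
 * Returns the number of packets delivered.
 */
static size_t
deliver_batch(struct pkt *head, void (*deliver)(struct pkt *, void *),
    void *arg)
{
	size_t n = 0;

	assert(deliver != NULL);
	while (head != NULL) {
		struct pkt *next = head->nextpkt;

		head->nextpkt = NULL;
		deliver(head, arg);
		head = next;
		n++;
	}
	return (n);
}

/* Example consumer: accumulate packet ids into an int. */
static void
sum_ids(struct pkt *p, void *arg)
{
	*(int *)arg += p->id;
}
```

Saving the next pointer before calling the consumer matters because the consumer may free or reuse the packet it is given.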
Ed Maste [Fri, 31 Mar 2023 16:57:15 +0000 (12:57 -0400)]
makefs: remove CD9660MAXPATH #define
It was used only in constructing the host path that contains file
content, which is not related to anything CD9660-specific. PATH_MAX is
the appropriate limit. See OpenBSD commit 299d8950f319.
Obtained from: OpenBSD
Sponsored by: The FreeBSD Foundation
Ed Maste [Wed, 5 Apr 2023 15:21:26 +0000 (11:21 -0400)]
src.conf.5: Expand WITH_LLVM_BINUTILS description
List the specific tools that are controlled by WITH_LLVM_BINUTILS, and
mention the tools that are always or never taken from LLVM. Tools come
from one of three sources (LLVM, ELF Tool Chain, bespoke base system)
and it is useful to be explicit.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39428
Mark Johnston [Wed, 5 Apr 2023 16:12:30 +0000 (12:12 -0400)]
netmap: Fix queue stalls with generic interfaces
In emulated mode, the FreeBSD netmap port attempts to perform zero-copy
transmission. This works as follows: the kernel ring is populated with
mbuf headers to which netmap buffers are attached. When transmitting,
the mbuf refcount is initialized to 2, and when the counter value has
been decremented to 1 netmap infers that the driver has freed the mbuf
and thus transmission is complete.
This scheme does not generalize to the situation where netmap is
attaching to a software interface which may transmit packets among
multiple "queues", as is the case with bridge or lagg interfaces. In
that case, we would be relying on backing hardware drivers to free
transmitted mbufs promptly, but this isn't guaranteed; a driver may
reasonably defer freeing a small number of transmitted buffers
indefinitely. If such a buffer ends up at the tail of a netmap transmit
ring, further transmits can end up blocked indefinitely.
Fix the problem by removing the zero-copy scheme (which is also not
implemented in the Linux port of netmap). Instead, the kernel ring is
populated with regular mbuf clusters into which netmap buffers are
copied by nm_os_generic_xmit_frame(). The refcounting scheme is
preserved, and this lets us avoid allocating a fresh cluster per
transmitted packet in the common case. If the transmit ring is full, a
callout is used to free the "stuck" mbuf, avoiding the queue deadlock
described above.
Furthermore, when recycling mbuf clusters, be sure to fully reinitialize
the mbuf header instead of simply re-setting M_PKTHDR. Some software
interfaces, like if_vlan, may set fields in the header which should be
reset before the mbuf is reused.
It'll be easier to add new properties to the ACPI device emulation if we
have a struct which holds all device specific properties. In some future
commits the acpi_device_emul struct will be expanded to include some
device specific functions to build ACPI tables.
by making it accept some open(2) flags. More precisely, only
O_CLOEXEC is supported, the flag is translated into the KQUEUE_CLOEXEC flag
for kqueuex(2), and O_NONBLOCK is silently ignored.
Reported and tested by: vishwin
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39377
Randall Stewart [Tue, 4 Apr 2023 20:05:46 +0000 (16:05 -0400)]
Update rack to the latest code used at NF.
There have been many changes to rack over the last couple of years, including:
a) Ability when switching stacks to have one stack query another.
b) Internal use of micro-second timers instead of ticks.
c) Many changes to pacing in forms of
1) Improvements to Dynamic Goodput Pacing (DGP)
2) Improvements to fixed rate pacing
3) A new feature called hybrid pacing where the requestor can
get a combination of DGP and fixed rate pacing with deadlines
for delivery that can dynamically speed things up.
d) All kinds of bugs found during extensive testing and use of the
rack stack for streaming video and in fact all data transferred
by NF