John Baldwin [Fri, 6 May 2022 22:36:10 +0000 (15:36 -0700)]
ata: Remove ata_devclass from DRIVER_MODULE invocations.
Keep the global variable for its uses in ata-pci.c and
chipsets/ata-fsl.c but initialize it in the existing
ata_module_event_handler. Move the module event handler a bit earlier
to ensure the variable is set before any devices are attached.
Rick Macklem [Fri, 6 May 2022 21:03:43 +0000 (14:03 -0700)]
rpc.tlsservd: Add logging of TLS version and cipher used
This patch adds logging of the version of TLS and cipher
negotiated successfully by the TLS handshake for each client,
if the "-v" command line option has been specified.
This information may be useful for monitoring and debugging
NFS-over-TLS mounts.
virtio-console is currently missing .pe_legacy_config, which prevents any
portN configuration from being parsed, and therefore no sockets will be
created.
It simplifies the declaration of the driver structures a little. There
are no current consumers of this macro, in fact it looks like it was
added for exactly this purpose.
This decreases the scope of some variables, so rework the initialization
in vt_init_logos() such that it doesn't require them.
Dmitry Chagin [Fri, 6 May 2022 16:58:53 +0000 (19:58 +0300)]
linux(4): Call semop directly.
As the Linux semop syscall is not defined in i386, and as it is equal
to the native semop syscall, call it directly.
Fix the semop definition to match the actual Linux one: nsops is of type size_t.
Dmitry Chagin [Fri, 6 May 2022 16:51:48 +0000 (19:51 +0300)]
sysvsem: Add a timeout argument to the semop.
For future use by the semtimedop syscall in the Linux emulation layer,
split the sys_semop syscall into two counterparts and add a
struct timespec *timeout argument to the latter.
Kristof Provost [Fri, 6 May 2022 14:41:34 +0000 (16:41 +0200)]
pf: don't reject dummynet-ed packets
If we pass a packet to dummynet we should indicate we've passed it (but
keep m0 == NULL). Otherwise we'll indicate to the calling layers that
the packet has been rejected.
Kristof Provost [Fri, 6 May 2022 14:37:47 +0000 (16:37 +0200)]
pf: dummynet fix
If we don't have a pipe set we shouldn't feed packets into dummynet.
This could occur if we have a 'dnpipe (0, 100)' configuration, for
example. We do want to feed the packet to dummynet in the return
direction, but not in the forward direction. In that case
pf_pdesc_to_dnflow() should return false, rather than pass a pipe number
of 0 to dummynet.
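A minimal sketch of the check described above, written as a standalone C model rather than the pf code itself. The struct, its fields, and the function name are hypothetical; only the rule (return false instead of handing pipe number 0 to dummynet) comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a rule's 'dnpipe (forward, return)' setting. */
struct sketch_rule {
	uint16_t dnpipe;	/* pipe for the forward direction, 0 = unset */
	uint16_t dnrpipe;	/* pipe for the return direction, 0 = unset */
};

/*
 * Pick the pipe for the given direction.  Returns false when no pipe is
 * configured for that direction, so the caller never passes pipe 0 on.
 */
static bool
sketch_select_pipe(const struct sketch_rule *r, bool reverse, uint16_t *pipe)
{
	uint16_t p = reverse ? r->dnrpipe : r->dnpipe;

	if (p == 0)
		return (false);
	*pipe = p;
	return (true);
}
```

With a 'dnpipe (0, 100)' style configuration, the forward direction is skipped and only the return direction selects pipe 100.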
Kristof Provost [Sun, 27 Mar 2022 18:23:25 +0000 (20:23 +0200)]
if: avoid interface destroy race
When we destroy an interface while the jail containing it is being
destroyed we risk seeing a race between if_vmove() and the destruction
code, which results in us trying to move a destroyed interface.
Protect against this by using the ifnet_detach_sxlock to also cover
if_vmove() (and not just detach).
Bjoern A. Zeeb [Thu, 5 May 2022 22:21:03 +0000 (22:21 +0000)]
net80211: simplify code after STA/AP VAPs traffic hang fix
Combine the comment and double-unsetting of OACTIVE into a single case
after e8de31caceaa36caf5d7b4355072f148e2433b82.
This saves the question of why we do it twice--once right before and
one more time right after the state change check.
Also move the XXX comment about kicking the queue up to where it seems
better suited now.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D35135
Dan Carpenter [Thu, 4 Apr 2019 15:12:17 +0000 (18:12 +0300)]
xen: Prevent buffer overflow in privcmd ioctl
The "call" variable comes from the user in privcmd_ioctl_hypercall().
It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
elements. We need to put an upper bound on it to prevent an out of
bounds access.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Obtained from: Linux
Linux commit: 42d8644bd77dd2d747e004e367cb0c895a606f39
Fixes: bf7313e3b79 ("xen: implement the privcmd user-space device")
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
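The upper bound described in this commit can be sketched as follows. This is an illustrative userland model, not the privcmd driver code: the function name is hypothetical, a 4096-byte page is assumed, and EINVAL is an assumed error value.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */
#define SKETCH_NCALLS		(SKETCH_PAGE_SIZE / 32u)

/*
 * Validate a user-supplied hypercall number before it is used as an
 * index into a table with SKETCH_NCALLS entries.
 */
static int
sketch_check_hypercall(unsigned long call)
{
	if (call >= SKETCH_NCALLS)
		return (EINVAL);
	return (0);
}
```

With these assumptions the table has 128 entries, so indices 0 through 127 are accepted and anything larger is rejected before the access.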
John Baldwin [Thu, 5 May 2022 23:40:04 +0000 (16:40 -0700)]
pbio: Add locking and remove D_NEEDGIANT.
Use an sx lock to permit uiomove directly into/out of the per-port
buffers. In addition, the sx lock provides a stronger guarantee that
I think this driver wants, which is to single-thread read and write
calls even while paused. Finally, replace tsleep calls on dummy wait
channels with calls to pause_sig to more clearly communicate the
intent.
John Baldwin [Thu, 5 May 2022 23:38:25 +0000 (16:38 -0700)]
pbio: Store softc in si_drv1 for character devices.
The port number is still stored in the unit (si_drv0) but is the
entire unit value now.
While here, remove checks for NULL softc since those can never happen
from cdevsw routines. This also resulted in the close method becoming
a no-op, so it has been removed.
John Baldwin [Wed, 4 May 2022 22:59:44 +0000 (15:59 -0700)]
cxgbe tom: Force unsigned modulus for queue indices.
The final transmit and receive queue indices need to be positive
values. However, since txq_idx and rxq_idx are signed (to permit
using -1 as a marker for uninitialized values), using %= with
another integer type (vi->nofld[tr]xq) yielded a sign-extended modulus
value. This resulted in negative queue indices and a buffer underrun
when arc4random() returned a value with the sign bit set. Use a
temporary unsigned variable to hold the "raw" queue index to force
unsigned modulus.
This worked previously because the modulus was applied directly to
the return value of arc4random(), which is unsigned, before the
result was assigned to txq_idx and rxq_idx.
Discussed with: np
Fixes: db28d4a0cd1c cxgbe/t4_tom: Support for round-robin selection of offload queues.
Sponsored by: Chelsio Communications
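A small standalone sketch of the signed versus unsigned modulus issue described in this commit. The function names are hypothetical and the random value is passed in as a parameter instead of calling arc4random(); the conversion of a large unsigned value to int is implementation-defined, and the comments assume the usual two's-complement behavior.

```c
#include <stdint.h>

/* Problem pattern: the raw value lands in a signed int before the modulus. */
static int
sketch_queue_idx_signed(uint32_t raw, int nqueues)
{
	int idx = (int)raw;	/* typically negative when the top bit is set */

	idx %= nqueues;		/* signed modulus keeps the sign of idx */
	return (idx);
}

/* Fixed pattern: take the modulus in an unsigned temporary first. */
static int
sketch_queue_idx_unsigned(uint32_t raw, int nqueues)
{
	uint32_t tmp = raw;

	tmp %= (uint32_t)nqueues;	/* result is always in [0, nqueues) */
	return ((int)tmp);
}
```

For a raw value with the sign bit set, such as 0x80000001 with 8 queues, the unsigned variant yields 1, while the signed variant typically yields a negative index.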
Rick Macklem [Thu, 5 May 2022 22:54:14 +0000 (15:54 -0700)]
rpc.tlsservd: Add a -C command line option for preferred_ciphers
rpc.tlsclntd has a -C command line option for setting
preferred_ciphers. Testing at a recent IETF NFSv4 testing
event showed that setting preferred_ciphers is not normally
needed for the rpc.tlsservd.
This patch modifies rpc.tlsservd to not specify preferred_ciphers
by default, but provides the same -C option as rpc.tlsclntd to
set preferred_ciphers, in case it is needed.
The man page update will be done as a separate commit.
Bjoern A. Zeeb [Thu, 5 May 2022 20:43:34 +0000 (20:43 +0000)]
LinuxKPI: skbuff: add memlimit tunable for 64bit systems
Some drivers, such as Realtek's rtw88, require 32bit DMA in
a single segment. busdma(9) has a hard time providing this
currently for 3-ish pages at large quantities
(see lkpi_pci_nseg1_fail in linux_pci.c e86707418c8e8).
Work around this for now by allowing a tunable to enforce
physical address allocation limits on 64bit platforms (ignoring PAE)
using "old-school" contigmalloc(9) to avoid bouncing.
A patch requiring a custom-compiled kernel, with only the 32bit limit
hardcoded, was tested by rtw88 users over the last few weeks. The 36bit
limit can be found in iwlwifi, so it is added alongside as a testing option.
This is put in as a band-aid for now, so people no longer need to patch
and compile their own kernels to use rtw88, and to allow us to MFC the
driver as well before the number of commits to track grows much
further.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
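To illustrate what the 32bit and 36bit limits mean in terms of physical addresses, here is a hedged sketch that turns an address width into the highest usable physical address. The function name and the "0 means no limit" convention are assumptions made for illustration; the actual tunable and its handling live in the LinuxKPI skbuff code and are not reproduced here.

```c
#include <stdint.h>

/*
 * Highest physical address reachable with 'bits' address bits.
 * Assumed convention: 0 (or 64 and above) means "no limit requested".
 */
static uint64_t
sketch_dma_high_addr(unsigned int bits)
{
	if (bits == 0 || bits >= 64)
		return (UINT64_MAX);
	return ((UINT64_C(1) << bits) - 1);
}
```

A value like this is the kind of upper bound a contiguous allocator would be given so that buffers never need bouncing for a device limited to that address width.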
Kristof Provost [Thu, 27 Jan 2022 21:01:09 +0000 (22:01 +0100)]
mbuf: do not restore dying interfaces
When we remove an interface it is first removed from the interface list
V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for
any possible references to stop being used (i.e.
epoch_wait/epoch_drain_callbacks) before we tear it fully down.
However, the index in ifindex_table is not removed, so m_rcvif_restore()
can still find the (now dying) interface.
This results in panics, for example when dummynet restores the rcvif
pointer and passes a packet to ip6_input() we can panic because the
AF_INET6 domain has already been removed (so we end up dereferencing a
NULL pointer there).
Check that the interface is not dying before we restore it, which is
equivalent to checking its presence in V_ifnet, and thus ensures that
future accesses (while in NET_EPOCH) are safe.
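The check can be modeled in userland roughly as follows. This is a sketch only: the table, struct, flag value, and function name are stand-ins for ifindex_table, struct ifnet, IFF_DYING, and m_rcvif_restore(), and epoch protection is not modeled at all.

```c
#include <stddef.h>

#define SKETCH_IFF_DYING	0x1	/* stand-in for the kernel's IFF_DYING */
#define SKETCH_NIFS		4

struct sketch_ifnet {
	int if_flags;
};

/* Stand-in for ifindex_table: entries stay present while dying. */
static struct sketch_ifnet *sketch_ifindex_table[SKETCH_NIFS];

/* Look up an interface by index, refusing entries marked as dying. */
static struct sketch_ifnet *
sketch_rcvif_restore(unsigned int idx)
{
	struct sketch_ifnet *ifp;

	if (idx >= SKETCH_NIFS)
		return (NULL);
	ifp = sketch_ifindex_table[idx];
	if (ifp == NULL || (ifp->if_flags & SKETCH_IFF_DYING) != 0)
		return (NULL);
	return (ifp);
}
```

A caller that gets NULL back treats the packet as having lost its receive interface instead of dereferencing a half-torn-down one.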
Gleb Smirnoff [Thu, 27 Jan 2022 05:58:50 +0000 (21:58 -0800)]
dummynet: use m_rcvif_serialize/restore when queueing packets
This fixes a panic when an interface is removed while a packet
is sitting on a queue. It allows all dummynet tests to pass,
including the forthcoming dummynet:ipfw_interface_removal
and dummynet:pf_interface_removal, and demonstrates the use of
m_rcvif_serialize() and m_rcvif_restore().
Gleb Smirnoff [Thu, 27 Jan 2022 05:58:44 +0000 (21:58 -0800)]
ifnet: make if_index global
Now that ifindex is static to if.c we can unvirtualize it. For the lifetime
of an ifnet its index never changes. To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet. Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.
API-wise, the only change is that the minimum interface index can now be
greater than 1. Holes in interface indexes were always allowed.
Jessica Clarke [Thu, 5 May 2022 18:07:54 +0000 (19:07 +0100)]
release: Use full window size for installer over serial lines
When running over a serial line we end up defaulting to 80x24, which is
rather cramped for many dialog boxes and occupies very little screen
space for most modern terminals. Thus, run resizewin -z to set the
terminal size if not already known before starting the installer, just
as we do for csh and sh login shells already in their default dotfiles.
Kristof Provost [Thu, 5 May 2022 07:21:32 +0000 (09:21 +0200)]
pf: clear PF_TAG_DUMMYNET for dummynet fast path
ip_dn_io_ptr() (i.e. dummynet_io()) can return the mbuf immediately (as
opposed to owning it and later passing it through dummynet_send(), which
returns it to pf_test()). In that case we must clear the PF_TAG_DUMMYNET
flag to ensure we don't skip any subsequent firewall passes.
This can happen if we process a packet in PFIL_IN, set PF_TAG_DUMMYNET
on it, pass it to ip_dn_io_ptr() but have it returned immediately. The
packet continues its normal path, eventually hitting
pf_test(dir=PFIL_OUT), where we'd skip when we're not supposed to.
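The flag handling described here can be sketched as a small standalone model. Everything below is hypothetical scaffolding (the packet struct, the flag value, and the stub standing in for ip_dn_io_ptr()); only the rule that the tag is cleared when dummynet returns the packet immediately is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TAG_DUMMYNET	0x01	/* stand-in for PF_TAG_DUMMYNET */

struct sketch_pkt {
	uint8_t flags;
};

/*
 * Stub for the dummynet entry point: returns true if dummynet kept
 * (queued) the packet, false if it handed the packet straight back.
 */
static bool
sketch_dn_io(struct sketch_pkt *p, bool queue_it)
{
	(void)p;
	return (queue_it);
}

static void
sketch_send_to_dummynet(struct sketch_pkt *p, bool queue_it)
{
	p->flags |= SKETCH_TAG_DUMMYNET;
	if (!sketch_dn_io(p, queue_it)) {
		/* Fast path: clear the tag so later passes are not skipped. */
		p->flags &= (uint8_t)~SKETCH_TAG_DUMMYNET;
	}
}
```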
Warner Losh [Thu, 5 May 2022 02:28:00 +0000 (20:28 -0600)]
iosched: remove stray debug
This printf was designed to catch misqueued bio requests. Prior to
supporting read_bias == 0, we couldn't get anything but reads and writes
in this queue. However, for read_bias == 0 we queue everything except
BIO_DELETE to this queue, so remove the printf. We don't need to update
any statistics.
Rick Macklem [Wed, 4 May 2022 20:58:22 +0000 (13:58 -0700)]
nfsd: Add a sanity check for Owner/OwnerGroup string length
Robert Morris reported that, if a client sends an absurdly
large Owner/OwnerGroup string, the kernel malloc() for the
large size string can block forever.
This patch adds a sanity limit for Owner/OwnerGroup string
length. Since the RFCs do not specify any limit and FreeBSD
can handle a group name greater than 1Kbyte, the limit is
set at a generous 10Kbytes.
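A minimal sketch of such a length check, done before any allocation is attempted. The macro and function names are hypothetical, "10Kbytes" is assumed to mean 10 * 1024 bytes, and EINVAL stands in for whatever NFS error the server actually returns.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_OWNER_MAXLEN	(10u * 1024u)	/* assumed meaning of 10Kbytes */

/* Reject an Owner/OwnerGroup string length before allocating for it. */
static int
sketch_check_owner_len(uint32_t len)
{
	if (len > SKETCH_OWNER_MAXLEN)
		return (EINVAL);
	return (0);
}
```

Checking the client-supplied length first means a bogus multi-gigabyte value never reaches the allocator.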
Rick Macklem [Wed, 4 May 2022 20:52:33 +0000 (13:52 -0700)]
nfsd: Fix handling of Open/Create for the pNFS server
When the MDS of a pNFS service receives an Open/Create
and the file already exists, it must do a Setattr of
size == 0. Without this patch, this was erroneously
done via a VOP_SETATTR() call, which would set the
length of the MDS file to 0 (which it already is,
since all data lives on the DSs).
This patch fixes the problem by doing a nfsvno_setattr()
instead of VOP_SETATTR(), which knows to do a proxied
Setattr on the DSs.
For a non-pNFS server, the change has no effect, since
nfsvno_setattr() only does a VOP_SETATTR() for that case.
This was found during a recent IETF NFSv4 testing event.
John Baldwin [Wed, 4 May 2022 20:08:36 +0000 (13:08 -0700)]
OpenSSL: KTLS: Enable KTLS for receiving as well in TLS 1.3
This removes a guard condition that prevents KTLS being enabled for
receiving in TLS 1.3. Use the correct sequence number and BIO for
receive vs transmit offload.
John Baldwin [Wed, 4 May 2022 20:08:27 +0000 (13:08 -0700)]
OpenSSL: KTLS: Handle TLS 1.3 in ssl3_get_record.
- Don't unpad records, check the outer record type, or extract the
inner record type from TLS 1.3 records handled by the kernel. KTLS
performs all of these steps and returns the inner record type in the
TLS header.
- When checking the length of a received TLS 1.3 record don't allow
for the extra byte for the nested record type when KTLS is used.
- Pass a pointer to the record type in the TLS header to the
SSL3_RT_INNER_CONTENT_TYPE message callback. For KTLS, the old
pointer pointed to the last byte of payload rather than the record
type. For the non-KTLS case, the TLS header has been updated with
the inner type before this callback is invoked.
John Baldwin [Wed, 4 May 2022 20:08:17 +0000 (13:08 -0700)]
OpenSSL: KTLS: Add using_ktls helper variable in ssl3_get_record().
When KTLS receive is enabled, pending data may still be present due to
read ahead. This data must still be processed the same as records
received without KTLS. To ease readability (especially in
consideration of additional checks which will be added for TLS 1.3),
add a helper variable 'using_ktls' that is true when the KTLS receive
path is being used to receive a record.
John Baldwin [Wed, 4 May 2022 20:08:03 +0000 (13:08 -0700)]
OpenSSL: KTLS: Check for unprocessed receive records in ktls_configure_crypto.
KTLS implementations currently assume that the start of the in-kernel
socket buffer is aligned with the start of a TLS record for the
receive side. The socket option to enable KTLS specifies the TLS
sequence number of this initial record.
When read ahead is enabled, data can be pending in the SSL read buffer
after negotiating session keys. This pending data must be examined to
ensure that the kernel's socket buffer does not contain a partial TLS
record as well as to determine the correct sequence number of the
first TLS record to be processed by the kernel.
In preparation for enabling receive kernel offload for TLS 1.3, move
the existing logic to handle read ahead from t1_enc.c into ktls.c and
invoke it from ktls_configure_crypto().
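The examination of pending read-ahead data can be illustrated with a standalone sketch. It is not the OpenSSL code: the function name is hypothetical, and it relies only on the standard 5-byte TLS record header (type, two version bytes, two length bytes). It counts complete records and reports when the data ends mid-record; the assumption here is that the count is what would be used to advance the sequence number handed to the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TLS_HDR_LEN	5	/* type(1) + version(2) + length(2) */

/*
 * Walk 'len' bytes of already-read data.  Returns the number of
 * complete TLS records, or -1 if the data ends inside a record.
 */
static int
sketch_count_pending_records(const uint8_t *buf, size_t len)
{
	int count = 0;

	while (len > 0) {
		size_t reclen;

		if (len < SKETCH_TLS_HDR_LEN)
			return (-1);	/* partial header */
		reclen = ((size_t)buf[3] << 8) | buf[4];
		if (len - SKETCH_TLS_HDR_LEN < reclen)
			return (-1);	/* partial payload */
		buf += SKETCH_TLS_HDR_LEN + reclen;
		len -= SKETCH_TLS_HDR_LEN + reclen;
		count++;
	}
	return (count);
}
```

A return of -1 corresponds to the case where the rest of a record is still in the kernel's socket buffer, so the buffer would not start on a record boundary.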
John Baldwin [Wed, 4 May 2022 20:07:36 +0000 (13:07 -0700)]
OpenSSL: Cleanup record length checks for KTLS
In some corner cases the check for packets
which exceed the allowed record length was missing
when KTLS is initially enabled while some
unprocessed packets are still pending.
Marko Zec [Wed, 4 May 2022 04:19:46 +0000 (06:19 +0200)]
tests: vnet tests started failing in CI, disable temporarily
As a fallout of backing out 91f44749c6fe, vnet tests started
failing in CI. Temporarily broadly disable vnet tests until
specific cases can be resolved, and file a bug.
Devirtualization of V_if_index and V_ifindex_table was rushed into
the tree lacking proper context, discussion, and declaration of intent,
so I'm backing it out as harmful to VNET on the following grounds:
1) The change repurposed the decades-old and stable if_index KBI for
new, unclear goals which were omitted from the commit note.
2) The change opened up a new resource exhaustion vector where any vnet
could starve the system of ifnet indices, including vnet0.
3) To circumvent the newly introduced problem of separating ifnets
belonging to different vnets from the globalized ifindex_table, the
author introduced sysctl_ifcount() which does a linear traversal over
the (potentially huge) global ifnet list just to return a simple upper
bound on existing ifnet indices.
4) The change effectively led to nonuniform ifnet index allocation
among vnets.
5) The commit note clearly stated that the patch changed the implicit
if_index ABI contract where ifnet indices were assumed to be starting
from one. The commit note also included a correct observation that
holes in interface indices were always allowed, but failed to declare
that the userland-observable ifindex tables could now include huge
empty spans even under modest operating conditions.
6) The author had an earlier proposal in the works which did not
affect per-vnet ifnet lists (D33265) but which he abandoned without
providing the rationale behind his decision to do so, at the expense
of sacrificing the vnet isolation contract and if_index ABI / KBI.
Furthermore, the author agreed to back out his changes himself and
to follow up with a proposal for a less intrusive alternative, but
later silently declined to act. Therefore, I decided to resolve the
status-quo by backing this out myself. This in no way precludes a
future proposal aiming to mitigate ifnet-removal related system
crashes or panics to be accepted, provided it would not unnecessarily
compromise the goal of as strict as possible isolation between vnets.