Stefan Eßer [Thu, 30 Dec 2021 11:20:32 +0000 (12:20 +0100)]
Make CPU_SET macros compliant with other implementations
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.
Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.
The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).
The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.
This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.
One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.
Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.
The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.
Dimitry Andric [Thu, 30 Dec 2021 09:53:25 +0000 (10:53 +0100)]
Avoid emitting popcnt in libclang_rt.fuzzer*.a if unsupported
Since popcnt is only supported by CPUTYPE=nehalem and later, ensure that
this instruction is only emitted when appropriate. Otherwise, programs
using the library can abort with SIGILL.
See also: https://github.com/llvm/llvm-project/issues/52893
PR: 258156
Reported by: Eric Rucker <bhtooefr@bhtooefr.org>
MFC after: 3 days
Fedor Uporov [Fri, 24 Dec 2021 14:18:15 +0000 (17:18 +0300)]
Improve extents verification logic
Add functionality for extents validation inside the filesystem
extents block. The main logic is implemented under
ext4_validate_extent_entries() function, which verifies extents
or extents indexes depending of extent depth value.
Fedor Uporov [Fri, 29 Oct 2021 12:45:50 +0000 (15:45 +0300)]
Add more accurate directory entries check
Rename ext2_dirbadentry() to ext2_check_direntry(). Add directory
entry inode value check, and call ext2_check_direntry() in all cases.
The dirchk sysctl is removed.
Alexander Motin [Thu, 30 Dec 2021 03:58:52 +0000 (22:58 -0500)]
CTL: Allow I/Os up to 8MB, depending on maxphys value.
For years CTL block backend limited I/O size to 1MB, splitting larger
requests into sequentially processed chunks. It is sufficient for
most of use cases, since typical initiators rarely use bigger I/Os.
One of known exceptions is VMWare VAAI offload, by default sending up
to 8 4MB EXTENDED COPY requests same time. CTL internally converted
those into 32 1MB READ/WRITE requests, that could overwhelm the block
backend, having finite number of processing threads and making more
important interactive I/Os to wait in its queue. Previously it was
partially covered by CTL core serializing sequential reads to help
ZFS speculative prefetcher, but that serialization was significantly
relaxed after recent ZFS improvements.
With the new settings block backend receives 8 4MB requests, that
should be easier for both CTL itself and the underlying storage.
John Baldwin [Thu, 30 Dec 2021 01:50:03 +0000 (17:50 -0800)]
/dev/crypto: Minimize cipher-specific logic.
Rather than duplicating the switches in crypto_auth_hash() and
crypto_cipher(), copy the algorithm constants from the new session
ioctl into a csp directly which permits using the functions in
crypto.c.
John Baldwin [Wed, 29 Dec 2021 22:36:04 +0000 (14:36 -0800)]
iscsi: Handle large Text responses.
Text requests and responses can span multiple PDUs. In that case, the
sender sets the Continue bit in non-final PDUs and the Final bit in
the last PDU. The receiver responds to non-final PDUs with an empty
text PDU.
To support this, add a more abstract API in libiscsi which accepts and
receives key sets rather than PDUs. These routines internally send or
receive one or more PDUs. Use these new functions to replace the
handling of TextRequest and TextResponse PDUs in discovery sessions in
both ctld and iscsid.
Note that there is not currently a use case for large Text requests
and those are still always sent as a single PDU. However, discovery
sessions can return a text response listing targets that spans
multiple PDUs, so the new API supports sending and receiving multi-PDU
responses.
Toomas Soome [Sun, 26 Dec 2021 09:01:16 +0000 (11:01 +0200)]
bhyve smbios type 3 structure is incorrect
If you look at the SMBIOS specification, we'll find something is
missing. In particular at offset 0Dh is supposed to be the OEM-defined
field. This should go between security and height. It is not legal to
actually skip this and will lead to other folks not properly
interpreting later parts of the table.
nhops: split nh_family into nh_upper_family and nh_neigh_family.
With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need
to distingush "upper" e.g. traffic family and "neighbor" e.g. LLE/gateway
address family. Store them explicitly in the private part of the nexthop data.
While here, store nhop fibnum in nhop_prip datastructure to make it self-contained.
Introduce a new function, lltable_get(), to retrieve lltable pointer
for the specified interface and family.
Use it to avoid all-iftable list traversal when adding or deleting
ARP/ND records.
Colin Percival [Mon, 20 Dec 2021 17:55:36 +0000 (09:55 -0800)]
vfs_mountroot: Check for root dev before waiting
If GEOM is idle but the root device is not yet present when we enter
vfs_mountroot_wait_if_necessary, we call vfs_mountroot_wait to wait
for root holds (e.g. CAM or USB initialization). Upon returning from
vfs_mountroot_wait, we wait 100 ms at a time until the root device
shows up.
Since the root device most likely appeared during vfs_mountroot_wait
-- waiting for subsystems which may be responsible for the root
device is the whole purpose of that function -- it makes sense to
check if the device is now present rather than printing a warning
and pausing for 100 ms before checking.
Reviewed by: trasz Fixes: a3ba3d09c248 Make root mount wait mechanism smarter
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D33593
Colin Percival [Mon, 20 Dec 2021 17:51:34 +0000 (09:51 -0800)]
vfs_mountroot: Wait for GEOM idle post root holds
In the case of a root hold related to the initialization of a disk
device, a flurry of GEOM tasting is likely to take place as soon as
the device is initialized and the root hold is released. If we
don't wait for GEOM idle it's easy for vfs_mountroot to "win" the
race and proceed before the root filesystem GEOM is ready.
Colin Percival [Mon, 20 Dec 2021 15:17:25 +0000 (07:17 -0800)]
vfs_mountroot: Skip 'Root mount waiting' < 1 s
While the message is technically correct, it's not particularly
helpful in the case where we're only waiting a few ms; this case
occurs frequently on EC2 arm64 instances with CAM initialization
racing to release its root hold before vfs_mountroot reaches this
point. Only print the message if we end up waiting for more than
one second.
Ed Maste [Wed, 29 Dec 2021 19:59:06 +0000 (14:59 -0500)]
ar: deprecate -T option
Other ar implementations (GNU, LLVM) use -T to mean thin archive
rather than use only the first fifteen characters of the archive member
name. We support both -T and -f for this, with -f documented as an
alias of -T.
An exp-run showed that the ports invoking `ar -T` expect thin archives,
not truncated names. Switch -f to be the documented flag for this
behaviour, and emit a warning when -T is used.
The warning will be changed to an error in the future (in main), once
ports no longer use -T.
PR: 260523 [exp-run]
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Bjoern A. Zeeb [Thu, 23 Dec 2021 14:59:49 +0000 (14:59 +0000)]
bhyve: passthru: enable BARs before possibly mmap(2)ing them
The first time we start bhyve with a passthru device everything is fine
as on boot we do enable BARs. If a driver (unload) inside bhyve disables
the BAR(s) as some Linux drivers do, we need to make sure we re-enable
them on next bhyve start.
If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP)
the kernel will give us an EBUSY.
While we were re-enabling the BAR(s) in the current code loop
cfginit() was writing the changes out too late to the real hardware.
Move the call to init_msix_table() after the register on the real
hardware was updated. That way the kernel will be happy and the
mmap will succeed and bhyve will start.
Also simplify the code given the last argument to init_msix_table()
is unused we do not need to do checks for each bar. [1]
MFC after: 3 days
PR: 260148
Pointed out by: markj [1]
Sponsored by: The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D33628
Kevin Bowling [Wed, 29 Dec 2021 16:37:34 +0000 (09:37 -0700)]
igc: Remove redundant IFCAP_VLAN_HWTAGGING check
Match igb(4) as in f7926a6d0c10. From Vincenzo, this check is redundant
to setup providing us an IGC_RXD_STAT_VP bit and would make for an
unexpected condition if IFCAP_VLAN_HWTAGGING were not set but the tag
was stripped, which would be passed up the stack breaking isolation.
Roger Pau Monné [Tue, 28 Dec 2021 08:13:57 +0000 (09:13 +0100)]
mbuf_tags: use explicitly sized type for 'type' parameter
Functions manipulating mbuf tags are using an int type for passing the
'type' parameter, but the internal tag storage is using a 16bit
integer to store it. This leads to the following code:
t = m_tag_alloc(...,0xffffffff,...,...);
m_tag_prepend(m, t);
r = m_tag_locate(m ,...,0xffffffff, NULL);
Returning r == NULL because m_tag_locate doesn't truncate the type
parameter when doing the match. This is unexpected because the type of
the 'type' parameter is int, and the caller doesn't need to know about
the internal truncations.
Fix this by making the 'type' parameter of type uint16_t in order to
match the size of its internal storage and make it obvious to the
caller the actual size of the parameter.
While there also use uint uniformly replacing the existing u_int
instances.
John Baldwin [Wed, 29 Dec 2021 00:40:04 +0000 (16:40 -0800)]
iscsid: Always free the duplicated address in resolve_addr().
If a "raw" IPv6 address (denoted by a leading '[') is used as a target
address, then 'arg' is incremented by one to skip over the '['.
However, this meant that at the end of the function the wrong address
was passed to free(). With malloc junking enabled and given suitably
small strings, malloc() would happily overwrite the correct number of
bytes with junk, but off by one byte overwriting the byte after the
allocation.
This manifested as the first byte of the 'HeaderDigest' key being
overwritten causing the key name on the wire to be sent as
'\x5eaderDigest' which the target rejected.
Reported by: Jithesh Arakkan @ Chelsio
Found with: ASAN (via WITH_ASAN=yes)
Sponsored by: Chelsio Communications
Mark Johnston [Tue, 28 Dec 2021 22:44:57 +0000 (17:44 -0500)]
x86: Do not attempt to calibrate the LAPIC timer if no APIC is present
Reported and tested by: Michael Butler <imb@protected-networks.net>
Reviewed by: jhb, kib
Fixes: 62d09b46ad75 ("x86: Defer LAPIC calibration until after timecounters are available")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33669
John Baldwin [Tue, 28 Dec 2021 21:51:25 +0000 (13:51 -0800)]
Simplify swi for bus_dma.
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Gleb Smirnoff [Tue, 28 Dec 2021 16:50:02 +0000 (08:50 -0800)]
tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags
While here move out one more erroneous condition out of the epoch and
common return. The only functional change is that if we send control
on a shut down socket we would get EINVAL instead of ECONNRESET.
The logic that sets iri_vtag and M_VLANTAG does not handle the
case where the 802.11q VLAN tag is 0. Fix this issue across
the iflib drivers. While there, also improve and align the
VLAN tag check extraction, by moving it outside the RX descriptor
loop, eliminating a local variable and additional checks.
Since isc_capenable (private copy of ifp->if_capenable) is
now synchronized to if_capenable, use it in the drivers
when checking the IFCAP_* bits.
This results in better cache usage and avoids indirection
through the ifp pointer.
On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled.
When this happens, apply the same change to isc_capenable, which
is the iflib private copy of if_capenable (for a subset of the
IFCAP_* bits). In this way the iflib drivers can check the bits
using isc_capenable rather than if_capenable. This is convenient
because the latter access requires an additional indirection
through the ifp, and it is also less likely to be in cache.
This adds some very simple DWC3 glue for the IPQ4018/IPQ4019.
Other chipsets introduce reset line iteration, some further
clock line iteration and some customisations; I'll look at adding
those later.
This is enough to finally bring up USB 3.0 on my IPQ4018 ASUS
RT-58U router.
Alexander Motin [Tue, 28 Dec 2021 01:52:59 +0000 (20:52 -0500)]
GEOM: Minor polishing in geom_event.
- Remove timeouts from msleep()'s. Those should always be woken up.
- Move wakeup() under the lock to not call on possibly freed pointer.
- Remove some dead code.
Alan Cox [Sat, 25 Dec 2021 03:54:01 +0000 (21:54 -0600)]
Fix pmap_is_prefaultable() on arm64 and riscv
The current implementations never correctly return TRUE. In all cases,
when they currently return TRUE, they should have returned FALSE. And,
in some cases, when they currently return FALSE, they should have
returned TRUE. Except for its effects on performance, specifically,
additional page faults and pointless calls to pmap_enter_quick() that
abort, this error is harmless. That is why it has gone unnoticed.
Add a comment to the amd64, arm64, and riscv implementations
describing how their return values are computed.
Adrian Chadd [Mon, 27 Dec 2021 23:45:40 +0000 (15:45 -0800)]
qcom_tcsr: add initial top control and status register (TCSR) support
The Qualcomm TCSR is some top level glue between multiple IP blocks,
both for doing configuration of said IP blocks, some IPC between
them (mostly between multiple execution environments - eg trustzone
and non-TZ), and interrupt status bits for them.
However, for the IPQ4018/IPQ4019, it only is used as a small subset
of IP block configuration. As for what it actually gets used as
for other Qualcomm chipsets? Well, that'll have to wait.
It's a bit of a mess in linux and openwrt. See, every different
SoC support branch ends up with some different TCSR code for it.
So instead, I'm going to land a single TCSR driver that I'm going
to use for the IPQ4018/IPQ4019. When I add the next chipset, I'll
figure out how to organise things so there's a single TCSR driver
that works for multiple platforms.
Adrian Chadd [Mon, 27 Dec 2021 23:27:29 +0000 (15:27 -0800)]
qcom_qup: add initial v1/v2 QUP SPI driver
The Qualcomm Universal Peripherals Engine (QUP) is a unified SPI and I2C
peripheral that ships with a variety of Qualcomm SoCs.
It supports three transfer modes - single PIO, block PIO and DMA.
This driver only supports the single PIO mode, which is enough to
bootstrap the rest of the SPI NAND/NOR support and means I can do
things like read the Wifi calibration data from NOR. It has some
hardware support code for the other transfer modes as well as
some support for split transfers (ie, transfers with no read or
write phase), but I haven't yet implemented those.
This driver is based on four sources - the linux driver, the u-boot
driver, some initial work done for APQ8064 by mmel@, and the APQ8064
Technical Reference Manual which is surprisingly free and open to
read. The linux and u-boot drivers approach a variety of things
completely differently, from how PIO is done, the hardware support
for re-ordering bytes in a transfer word and how the CS lines
are used.
Adrian Chadd [Sun, 26 Dec 2021 17:07:21 +0000 (09:07 -0800)]
qcom_clk: add the qualcomm clock nodes for the IPQ4018
These clock nodes are used by the IPQ4018/IPQ4019 and derivatives.
They're also used by other 32 and 64 bit qualcomm parts; so it's
best to put these nodes here in a single qcom_clk driver and add
to it as we grow new Qualcomm SoC support.
Bjoern A. Zeeb [Mon, 27 Dec 2021 17:42:51 +0000 (17:42 +0000)]
iwlwifi: plug memory modified after free
In certain situations we saw a memory modified after free. This was
tracked down to a pointer not NULLed after free and used in a different
code path. It is unclear how the race happens pending further
investigation but setting the pointer to NULL after free and adding a
check in the 2nd code path handling the case gracefully helps for now.
While here improve another debug messge in sta handling.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Bjoern A. Zeeb [Mon, 27 Dec 2021 17:40:02 +0000 (17:40 +0000)]
iwlwifi: add man pages
Add and hook up man pages for iwlwifi and iwlwififw and install a copy
of the firmware license to /usr/share/docs/legal so it will always be
shipped with the installed system.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Bjoern A. Zeeb [Mon, 27 Dec 2021 17:36:04 +0000 (17:36 +0000)]
iwlwifi: also depend on linuxkpi_wlan
The 802.11 compat code is split off linuxkpi.ko into linuxkpi_wlan.ko
in case it is built as a module. Depend on that.
While here adjust our module to a longer version to avoid conflicts.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Bjoern A. Zeeb [Sun, 26 Dec 2021 18:52:51 +0000 (18:52 +0000)]
LinuxKPI: add 802.11 compat code
Add 802.11 compat code for mac80211 and to a minimal degree cfg80211.
This allows us to compile and use basic functionality of wireless
drivers such as iwlwifi.
This is a constant work in progress but having it in the tree will
allow others to test and more easy to track changes and avoid having
snapshots no longer applying to branches.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Bjoern A. Zeeb [Sun, 26 Dec 2021 18:29:29 +0000 (18:29 +0000)]
LinuxKPI: import beginning of a new version of netdevice.h
Import a netdevice update complementing the last remaining bits of
the old ifnet derived implementation. Along add a (for now) task
based NAPI implementation.
This is the minimal set of chnages which are needed for the initial
support of wireless drivers. The NAPI implementation has an option to
still switch to "direct dispatch" as it had been used by these drivers
before not relying on a deferred context along with some printf tracing.
This has been helpful in the last weeks for debugging and will be
cleaned once we have had broader testing and are sure this is fine as-is.
Should we need a more time-sensitive or load-sensitive response
in the future we can always switch to something more sophisticated.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
X-Differential Revision: D33075 (abandoned without feedback a while ago)
Bjoern A. Zeeb [Sun, 26 Dec 2021 18:26:26 +0000 (18:26 +0000)]
LinuxKPI: add a work-in-progress skbuff implementation
This is a work-in-progress implementation of sk_buff compat code
used for wireless drivers only currently.
Bring in this version of the code as it has proven to be good enough
to have packets going for a few months.
The current implementation has several drawbacks including the need
for us to copy data between sk_buffs and mbufs.
Do not rely on the internals of this implementation. They are highly
likely to change as we will improve the integration to FreeBSD mbufs.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Rick Macklem [Mon, 27 Dec 2021 16:03:41 +0000 (08:03 -0800)]
nfscommon: Return NFSERR_ATTRNOTSUPP for AUDIT/ALARM ACEs
FreeBSD only supports Allow/Deny ACEs in NFSv4 ACLs.
As such, it does not make sense to parse Audit/Alarm
ACEs. Modify nfsrv_dissectace() so that it returns
NFSERR_ATTRNOTSUPP if an Audit/Alarm ACE is found in
the ACL being parsed. The code has been #ifdef notnow'd,
since Audit/Alarm ACEs might be supported someday.
This should not have significant impact, since FreeBSD
reports to clients that only Allow/Deny ACEs are
supported and an attempt to set one would have failed
anyhow.
Rick Macklem [Mon, 27 Dec 2021 00:37:02 +0000 (16:37 -0800)]
nfscommon: Add arguments for support of the dacl attribute
NFSv4.1/4.2 has an alternative to the acl attribute, called
dacl, that includes support for the ACL_ENTRY_INHERITED flag,
called NFSV4ACE_INHERITED in NFSv4.
This patch adds a dacl argument to nfsrv_buildacl(),
nfsrv_dissectacl() and nfsrv_dissectace(), so that they
will handle NFSV4ACE_INHERITED when dacl == true.
Since these functions are always called with dacl == false
for this patch, semantics should not have changed.
A future patch will add support for dacl.