Cy Schubert [Tue, 9 Nov 2021 22:17:01 +0000 (14:17 -0800)]
Revert "wpa: Fix WITHOUT_CRYPT build"
This reverts commit a30e8044aa4753858c189f3384dae2b2f25a150b.
WITHOUT_OPENSSL build is a subset of WITHOUT_CRYPT build. It was
incorrect to label this patch as fixing WITHOUT_CRYPT when in fact
it fixes WITHOUT_OPENSSL. The build failure will be addressed in a
fix for WITHOUT_OPENSSL build.
Rick Macklem [Thu, 11 Nov 2021 23:43:58 +0000 (15:43 -0800)]
nfscl: Add a LayoutError RPC for NFSv4.2 pNFS mounts
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error. This patch adds the same to the
FreeBSD NFSv4.2 pNFS client, to maintain Linux compatible
behaviour, particularly for non-FreeBSD pNFS servers.
Mark Johnston [Thu, 11 Nov 2021 19:26:41 +0000 (14:26 -0500)]
vm_page: Handle VM_ALLOC_NORECLAIM in the contiguous page allocator
We added _NORECLAIM to request that kmem_alloc_contig_pages() not spend
time scanning physical memory for candidates to reclaim. In some
situations the scanning can induce large amounts of undesirable latency,
and it's less important that the request be satisfied than it is that we
not spend many milliseconds scanning.
The problem extends to vm_reserv_reclaim_contig(), which unlike
vm_reserv_reclaim() may have to scan the entire list of partially
populated reservations. Use VM_ALLOC_NORECLAIM to request that this
scan not be executed.[1]
As a side effect, this fixes a regression in 02fb0585e7b3 ("vm_page:
Drop handling of VM_ALLOC_NOOBJ in vm_page_alloc_contig_domain()")
where VM_ALLOC_CONTIG was not included in VPAC_FLAGS or VPANC_FLAGS even
though it is not masked by kmem_alloc_contig_pages().[2]
Randall Stewart [Thu, 11 Nov 2021 11:35:51 +0000 (06:35 -0500)]
tcp: Rack may still calculate long RTT on persists probes.
When a persists probe is lost, we will end up calculating a long
RTT based on the initial probe and when the response comes from the
second probe (or third etc). This means we have a minimum of a
confidence level of 3 on an incorrect probe. This commit will change it
so that we have one of two options
a) Just not count RTT of probes where we had a loss
<or>
b) Count them still but degrade the confidence to 0.
I have set the default to just not measure them, but I am open
to having the default be otherwise.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32897
Randall Stewart [Thu, 11 Nov 2021 11:28:18 +0000 (06:28 -0500)]
tcp: Congestion control cleanup.
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.
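The switching contract described above can be sketched in userland C. This is only an illustration under stated assumptions: the struct and function names (cc_algo_sketch, conn_sketch, cc_switch_sketch and the two example algorithms) are hypothetical and are not the kernel's interfaces, and plain malloc() stands in for a kernel allocation with M_WAITOK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CC module descriptor. */
struct cc_algo_sketch {
	const char *name;
	size_t	(*data_sz)(void);	/* how much private state is needed */
	int	(*init)(void *mem);	/* gets preallocated memory; should not fail */
};

/* Hypothetical stand-in for the per-connection CC state. */
struct conn_sketch {
	const struct cc_algo_sketch *algo;
	void	*cc_data;
};

/*
 * Switch to newalgo.  Memory is allocated up front, so init has no
 * reason to fail; if it fails anyway, the current CC stays in place.
 */
static int
cc_switch_sketch(struct conn_sketch *c, const struct cc_algo_sketch *newalgo)
{
	void *mem;

	assert(c != NULL && newalgo != NULL);
	mem = malloc(newalgo->data_sz());	/* kernel: M_WAITOK, cannot fail */
	if (mem == NULL)
		return (-1);
	if (newalgo->init(mem) != 0) {
		free(mem);		/* contract broken: keep the old CC */
		return (-1);
	}
	free(c->cc_data);
	c->algo = newalgo;
	c->cc_data = mem;
	return (0);
}

/* Two made-up algorithms: one well behaved, one that breaks the contract. */
static size_t	sketch_sz(void) { return (16); }
static int	sketch_ok_init(void *mem) { (void)mem; return (0); }
static int	sketch_bad_init(void *mem) { (void)mem; return (-1); }

static const struct cc_algo_sketch algo_a =
    { "sketch_a", sketch_sz, sketch_ok_init };
static const struct cc_algo_sketch algo_broken =
    { "sketch_broken", sketch_sz, sketch_bad_init };
```

The point of the ordering is that the old state is only released after the new module has initialized, so a failure never leaves the connection on an unexpected third algorithm.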
This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Let's put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll their own if that makes more sense.
Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"'
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.
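As an illustration, a custom (non-GENERIC) kernel configuration that follows the note above might carry lines like these. The choice of modules is only an example; the option names and the CC_DEFAULT line are the ones quoted in this message.

```
options 	CC_NEWRENO
options 	CC_CUBIC
options 	CC_DEFAULT=\"newreno\"
```

Any subset of the listed CC_* options should do, as long as at least one module is present and CC_DEFAULT names one of them.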
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
Navdeep Parhar [Wed, 10 Nov 2021 19:38:54 +0000 (11:38 -0800)]
cxgbe(4): internal knob for flexible control over FEC selection.
Recent firmwares have support for autonomous FEC selection and a "force"
knob to let the driver control this behavior (or not) in a fine grained
manner. This change adds a driver knob so that all the different ways of
configuring the link FEC can be exercised. Note that this controls the
internal driver/firmware interaction for link configuration and is not
meant for general use.
Navdeep Parhar [Wed, 10 Nov 2021 18:54:53 +0000 (10:54 -0800)]
cxgbe(4): separate sysctls for user-requested and in-use FEC.
Recent firmwares have more leeway in FEC selection and there is a need
to track the FECs requested by the driver separately from the FEC in use
on the link. The existing dev.<port>.<inst>.fec sysctl can read both but
its behavior depends on the link state and it is sometimes hard to find
out what was requested when the link is up.
Split the fec sysctl into two (requested_fec and link_fec) to get access
to both pieces of information regardless of the link state.
Mark Johnston [Wed, 10 Nov 2021 21:57:12 +0000 (16:57 -0500)]
mbuf: Fix an offset calculation in m_apply_extpg_one()
We were not including the requested starting offset in the page offset.
Reviewed by: jhb
Fixes: 3c7a01d773ac ("Extend m_apply() to support unmapped mbufs.")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32922
Put some Linux compatibility stuff under BSD_VISIBLE namespace, in
particular, sys/cpuset.h definitions. Also, if the user really wants
Linux compatibility, she can request the cpu_set_t typedef with
_WITH_CPU_SET_T define.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32901
ifconfig(8): Don't set network interface capabilities when there is no change.
A quick grep through the kernel code shows network drivers compute the
changed bits of network capabilities after a SIOCSIFCAP IOCTL(2) by
using the bitwise exclusive or operation. When the set capabilities
are equal to the already read capabilities, no action will be taken.
Let ifconfig(8) predict this case and skip the SIOCSIFCAP IOCTL(2)
system call.
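The check can be sketched as follows. The helper names are hypothetical and this is not the ifconfig(8) source; it only shows the exclusive-or comparison that makes the ioctl redundant when nothing would change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Drivers derive the changed bits as (wanted ^ current), so an all-zero
 * result means the SIOCSIFCAP request would be a no-op.
 */
static bool
caps_would_change(uint32_t current, uint32_t wanted)
{
	return ((current ^ wanted) != 0);
}

/* Hypothetical: apply set/clear masks the way a command line would. */
static uint32_t
caps_apply(uint32_t current, uint32_t set_mask, uint32_t clear_mask)
{
	assert((set_mask & clear_mask) == 0);
	return ((current | set_mask) & ~clear_mask);
}
```

A caller would compute the wanted value with caps_apply() and issue the ioctl only when caps_would_change() returns true.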
Discussed with: kib@ (revert change in case of issues)
MFC after: 1 week
Sponsored by: NVIDIA Networking
Martin Matuska [Wed, 10 Nov 2021 12:41:17 +0000 (13:41 +0100)]
zfs: merge openzfs/zfs@6c8f03232 (master) into main
Notable upstream pull request merges:
#12333: Creating gang ABDs for Raidz optional IOs
#12668: FreeBSD: Catch up with recent VFS changes
#12687: Skip spacemaps reading in case of pool readonly import
#12704: Fix some FreeBSD VOPs to synchronize properly with teardown
#12724: Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
Bjoern A. Zeeb [Tue, 9 Nov 2021 22:11:15 +0000 (22:11 +0000)]
arm64/gicv3: improve a panic message
Print the device/unit in the panic message for which we cannot get
the MSI device ID to have a clue where to start looking.
While here use __func__ instead of hardcoding the function name.
Fedor Uporov [Tue, 9 Nov 2021 20:50:39 +0000 (12:50 -0800)]
Skip spacemaps reading in case of pool readonly import
Only the zdb utility requires reading metaslab-related data during
read-only pool import, because of spacemaps validation. Add a global
variable which will allow zdb to read spacemaps in readonly
import mode.
Brian Atkinson [Tue, 9 Nov 2021 19:51:33 +0000 (12:51 -0700)]
Single IO issue for raidz writes with skip sector
In order to reduce contention on the vq_lock, optional skip sectors
for Raidz writes can be placed into a single IO request. This is done by
padding out the linear ABD for a parity column to contain the skip
sector and by creating gang ABD to contain the data and skip sector for
data columns.
The vdev_raidz_map_alloc() function now contains specific functions for
both reads and writes to allocate the ABDs that will be issued down to
the VDEV children.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #12333
The submit_bio() prototype has changed again. The version in 5.16
still only expects a single argument but the return type has changed
to void. Since we never used the returned value before, update the
configure check to detect both single arg versions.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
Commit https://github.com/torvalds/linux/commit/2e9bc346 moved
the elevator.h header under the block/ directory as part of some
refactoring. This turns out not to be a problem since there's
no longer anything we need from the header. This has been the
case for some time; this change removes the elevator.h include
and replaces it with a major.h include.
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
Dries Michiels [Sun, 7 Feb 2021 19:25:31 +0000 (20:25 +0100)]
UPDATING: Change update procedure to use etcupdate(8) over mergemaster(8)
This commit aligns the steps in UPDATING with the steps from the
handbook which already prefers etcupdate(8). While here also remove a
dubious comment.
Kyle Evans [Wed, 10 Nov 2021 06:42:42 +0000 (00:42 -0600)]
grep: fix/remove references to -P
-P in gnugrepland means PCRE, which we do not support. We may eventually
support it if onigmo ends up getting imported as a more performant regex
implementation, and we can re-add it properly in these places (and more)
when that time comes.
The optstr change is a functional nop; the case was not explicitly handled,
thus ending in usage() anyways.
Rick Macklem [Tue, 9 Nov 2021 23:13:15 +0000 (15:13 -0800)]
VOP_ALLOCATE: Update man page for Commit f0c9847a6c47
Commit f0c9847a6c47 added the ioflag and cred arguments to
VOP_ALLOCATE() for NFSv4.2 server support. This patch updates
the man page for these arguments.
John Baldwin [Tue, 9 Nov 2021 18:52:30 +0000 (10:52 -0800)]
crypto: Don't assert on valid IV length for Chacha20-Poly1305.
The assertion checking for valid IV lengths added in 1833d6042c9a
was not properly updated to permit an IV length of 8 in commit 42dcd39528c6.
Reported by: syzbot+f0c0559b8be1d6eb28c7@syzkaller.appspotmail.com
Reviewed by: markj
Fixes: 42dcd39528c6 crypto: Support Chacha20-Poly1305 with a nonce size of 8 bytes.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32860
John Baldwin [Tue, 9 Nov 2021 18:50:12 +0000 (10:50 -0800)]
Don't require the socket lock for sorele().
Previously, sorele() always required the socket lock and dropped the
lock if the released reference was not the last reference. Many
callers locked the socket lock just before calling sorele() resulting
in a wasted lock/unlock when not dropping the last reference.
Move the previous implementation of sorele() into a new
sorele_locked() function and use it instead of sorele() for various
places in uipc_socket.c that called sorele() while already holding the
socket lock.
The sorele() macro now uses refcount_release_if_not_last() to try to drop
the socket reference without locking the socket. If that shortcut
fails, it locks the socket and calls sorele_locked().
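The fast path and slow path can be sketched in userland C. This is an illustration only: sock_sketch, release_if_not_last(), sorele_locked_sketch() and sorele_sketch() are hypothetical stand-ins built on C11 atomics and a pthread mutex, not the kernel's struct socket, refcount(9) or socket lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct socket; only what the sketch needs. */
struct sock_sketch {
	atomic_uint	refs;
	pthread_mutex_t	lock;
	bool		freed;
};

/*
 * Drop one reference unless it is the last one.  Returns true if a
 * reference was dropped without reaching zero.
 */
static bool
release_if_not_last(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	while (old > 1) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (true);
	}
	return (false);
}

/* Slow path: the caller holds the lock; it is always dropped here. */
static void
sorele_locked_sketch(struct sock_sketch *so)
{
	if (atomic_fetch_sub(&so->refs, 1) == 1)
		so->freed = true;	/* last reference: tear down */
	pthread_mutex_unlock(&so->lock);
}

static void
sorele_sketch(struct sock_sketch *so)
{
	assert(so != NULL);
	if (release_if_not_last(&so->refs))
		return;			/* fast path: no lock taken */
	pthread_mutex_lock(&so->lock);
	sorele_locked_sketch(so);
}
```

Callers that already hold the lock would call the locked variant directly, which is the split the commit describes.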
Mark Johnston [Tue, 9 Nov 2021 18:07:57 +0000 (13:07 -0500)]
pci: Implement pci_bar_enabled() for SR-IOV VFs
In a VF's configuration space, "memory space enable" is hard-wired to 0,
so the existing implementation always returns false. We need to read
the SR-IOV control register from the PF device to get the value of the
MSE bit.
Fix pci_bar_enabled() to read this register instead for VFs. I don't
see any way to access the PF's config space without a backpointer in the
pci device ivars, so I added one.
This fixes a regression where bhyve(8) fails to map the MSI-X table
after commit 7fa233534736 ("bhyve: Map the MSI-X table unconditionally
for passthrough") when a VF is passed through, since with that commit we
use PCIOCBARMMAP to map the table and that ioctl always fails for VFs
without this change. As a bonus, pciconf(8) now correctly reports the
enablement of BARs for VFs.
Reported and tested by: Raúl Muñoz <raul.munoz@custos.es>
Reviewed by: rstone, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32839
John Baldwin [Tue, 9 Nov 2021 17:42:12 +0000 (09:42 -0800)]
vfs: Consistently validate AT_* flags in kern_* functions.
Some syscalls checked for invalid AT_* flags in sys_* and others in
kern_*.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D32864
Mike Karels [Thu, 28 Oct 2021 23:39:43 +0000 (18:39 -0500)]
systat: clean up code assuming network classes
Similar to netstat, clean up code that uses inet_lnaof() to check for
binding to "host 0" (lowest host on network) as a "network" bind.
Such things don't happen, and current networks are seldom if ever
found in /etc/networks.
Mike Karels [Wed, 27 Oct 2021 03:12:24 +0000 (22:12 -0500)]
sockstat: change check for wildcard sockets to avoid historical classes
sockstat was checking whether a bound address was "host 0", the lowest
host on a network, using inet_lnaof(). This only works for class A/B/C.
However, it isn't useful to bind such an address unless it is really
the unspecified address INADDR_ANY. Change the check to use that.
Mike Karels [Wed, 27 Oct 2021 03:39:10 +0000 (22:39 -0500)]
netstat: reduce use of historical Internet classes
When attempting to characterize bound addresses, netstat was checking
for host 0 on a (historical) net using inet_lnaof(). Such addresses
are not normally bound, as they would not work, with the exception
of the unspecified address, INADDR_ANY. Check for that explicitly.
Similarly, don't check bound addresses for a match to a network name.
Mike Karels [Wed, 27 Oct 2021 03:48:23 +0000 (22:48 -0500)]
mountd: deprecate exports to a network without mask
The exports file format allows export to a network using an explicit
mask or prefix length (CIDR). It also allows a network with just
a dotted address, in which case the historical mask was used.
Deprecate this usage, and warn when it is used. Document that this
is deprecated.
Mike Karels [Wed, 27 Oct 2021 03:25:09 +0000 (22:25 -0500)]
man pages: deprecate Internet Class A/B/C
Mark functions inet_netof(), inet_lnaof(), and inet_makeaddr() as
deprecated, as they assume Class A/B/C. inet_makeaddr() mostly works
when networks are a multiple of 8 bits, but warn for anything other
than historical classes. Reduce other mentions of network classes.
Mike Karels [Thu, 28 Oct 2021 14:32:31 +0000 (09:32 -0500)]
ifconfig: warn if setting an Internet address without mask
Add a postproc function for af_inet, and add interface flags as a
parameter. Check there if setting an address without a mask unless
the interface is loopback or point-to-point, where mask is not really
meaningful; warn if so. This will hopefully be an error in the future.
Mike Karels [Wed, 27 Oct 2021 03:01:09 +0000 (22:01 -0500)]
kernel: deprecate Internet Class A/B/C
Hide historical Class A/B/C macros unless IN_HISTORICAL_NETS is defined;
define it for user level. Define IN_MULTICAST separately from IN_CLASSD,
and use it in pf instead of IN_CLASSD. Stop using class for setting
default masks when not specified; instead, define new default mask
(24 bits). Warn when an Internet address is set without a mask.
Rick Macklem [Mon, 8 Nov 2021 23:58:00 +0000 (15:58 -0800)]
nfsd: Fix the NFSv4.2 pNFS MDS server for NFSERR_NOSPC via LayoutError
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error and keeps retrying, doing repeated
LayoutGets to the MDS and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up unless
a subsequent LayoutGet request is sent an NFSERR_NOSPC
reply.
The looping problem still occurs for NFSv4.1 mounts, but no
fix for this is known at this time.
This patch changes the pNFS MDS server to reply to LayoutGet
operations with NFSERR_NOSPC once a LayoutError reports the
problem, until the DS has available space. This keeps the Linux
NFSv4.2 client from looping.
Found during recent testing because of issues w.r.t. a DS
being out of space found during a recent IETF NFSv4 working
group testing event.
Rick Macklem [Mon, 8 Nov 2021 20:59:31 +0000 (12:59 -0800)]
nfsd: Fix f_bavail and f_ffree for NFSv4 when negative
Since the NFS Space_available and Files_available are unsigned,
the NFSv3 server sets them to 0 when negative, so that they
do not appear to be large positive values for non-FreeBSD clients.
This patch fixes the NFSv4 server to do the same.
Found during a recent IETF NFSv4 working group testing event.
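The clamping can be sketched like this. The helper names are hypothetical and the byte conversion is an assumption made for illustration; the real server code is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a possibly negative signed count to 0 before it goes on the wire. */
static uint64_t
clamp_nonneg(int64_t v)
{
	return (v < 0 ? 0 : (uint64_t)v);
}

/*
 * Hypothetical helper: bytes available to an unprivileged user, computed
 * from a signed block count and a block size.
 */
static uint64_t
space_available_sketch(int64_t bavail, uint64_t bsize)
{
	assert(bsize > 0);
	return (clamp_nonneg(bavail) * bsize);
}
```

Without the clamp, converting a small negative count to an unsigned 64-bit value yields a number close to 2^64, which is what non-FreeBSD clients would otherwise display.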
Mitchell Horne [Mon, 8 Nov 2021 19:33:25 +0000 (15:33 -0400)]
hwpmc: initialize arm64 counter/interrupt state
Performance counters and overflow interrupts are assumed to be disabled
by default, but this is not guaranteed. Ensure we disable both during
per-cpu initialization, before enabling the PMU. Otherwise, some systems
(such as the Ampere eMAG) would experience an interrupt storm upon
loading the hwpmc module.
Reviewed by: br
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32854
Randall Stewart [Mon, 8 Nov 2021 16:49:34 +0000 (11:49 -0500)]
tcp: Printf should be removed.
There is a printf when a socket option down to the CC module fails, this really
should not be a printf. In fact this whole option needs to be re-thought in coordination
with some other changes in the CC modules (it's just not right but it's ok what it
does here if it fails since it will just use the ECN beta).
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32894
Peter Grehan [Mon, 1 Nov 2021 13:35:43 +0000 (23:35 +1000)]
igc: Use hardware routine for PHY reset
Summary:
The previously used software reset routine wasn't sufficient
to reset the PHY if the bootloader hadn't left the device in
an initialized state. This was seen with the onboard igc port
on an 11th-gen Intel NUC.
The software reset isn't used in the Linux driver so all related
code has been removed.
Tested on: Netgate 6100 onboard ports, a discrete PCIe I225-LM card,
and an 11th-gen Intel NUC.
Kristof Provost [Thu, 4 Nov 2021 17:05:58 +0000 (18:05 +0100)]
if_gif: fix vnet shutdown panic
If an if_gif exists and has an address assigned inside a vnet when the
vnet is shut down we failed to clean up the address, leading to a panic
when we ip_destroy() and the V_in_ifaddrhashtbl is not empty.
This happens because of the VNET_SYS(UN)INIT order, which means we
destroy the if_gif interface before the addresses can be purged (and
if_detach() does not remove addresses, it assumes this will be done by
the stack teardown code).
Set subsystem SI_SUB_PSEUDO just like if_bridge so the cleanup
operations happen in the correct order.
Kornel Duleba [Tue, 2 Nov 2021 11:53:22 +0000 (12:53 +0100)]
ossl: Add support for AES-CBC cipher
AES-CBC OpenSSL assembly is used underneath.
The glue layer(ossl_aes.c) is based on CHACHA20 implementation.
Contrary to the SHA and CHACHA20, AES OpenSSL assembly logic
does not have a fallback implementation in case CPU doesn't
support required instructions.
Because of that CPU caps are checked during initialization and AES
support is advertised only if available.
The feature is available on all architectures that ossl supports:
i386, amd64, arm64.
The biggest advantage of this patch over existing solutions
(aesni(4) and armv8crypto(4)) is that it supports SHA,
allowing for ETA operations.
Felix Johnson [Mon, 8 Nov 2021 06:14:58 +0000 (01:14 -0500)]
find(1): Update date format reference and remove cvs(1) references
cvs(1) is not installed by default. Change the date format reference to
note that find(1) understands ISO8601 and RFC822 date formats. Also
remove references to cvs(1).
PR: 254894
MFC after: 3 days
Reported by: danielsh@apache.org
When using lseek(2) to report data/holes, memory mapped regions of
the file were ignored. This could result in incorrect results.
To handle this, zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.
Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing. This ensures that a clean dnode can't be dirtied before
the data/hole is located. The range lock is now also taken to
ensure the call cannot race with zfs_write().
Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12724
Rick Macklem [Sun, 7 Nov 2021 19:43:03 +0000 (11:43 -0800)]
nfsd: Fix the NFSv4 pNFS MDS server for DS NFSERR_NOSPC
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the server to
tell it about the error and keeps retrying, doing repeated
LayoutGet and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up.
For a mirrored server configuration, the first mirror that
ran out of space was taken offline. This does not make
much sense, since the other mirror(s) will run out of space
soon and the fix is a manual cleanup of disk space.
This patch changes the pNFS server to not disable a mirror
for the mirrored case when this occurs.
Further work is needed, since the Linux client expects the
MDS to reply NFSERR_NOSPC to LayoutGets once the DS is out
of space. Without this further change, the above mentioned
looping occurs.
Found during a recent IETF NFSv4 working group testing event.
The clearenv(3) function allows us to clear all environment
variables in one shot. This may be useful for security programs that
want to control the environment or what variables are passed to new
spawned programs.
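A possible use, sketched under assumptions: scrub_environment() is a hypothetical helper, and the feature-test macro is only there so the prototype is also visible when building against glibc.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* exposes clearenv() on glibc */
#endif
#include <assert.h>
#include <stdlib.h>

/*
 * Wipe the inherited environment and install a minimal, known one
 * before spawning another program.  Returns 0 on success, -1 on failure.
 */
static int
scrub_environment(const char *safe_path)
{
	assert(safe_path != NULL);
	if (clearenv() != 0)
		return (-1);
	return (setenv("PATH", safe_path, 1));
}
```

After a successful call only PATH is set, so anything else the child needs has to be added back explicitly with setenv(3).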
This moves linux_ptrace.c from sys/amd64/linux/ to sys/compat/linux/,
making it possible to use it on architectures other than amd64.
It also enables Linux ptrace(2) on arm64.
Rick Macklem [Sat, 6 Nov 2021 20:26:43 +0000 (13:26 -0700)]
vfs: Add "ioflag" and "cred" arguments to VOP_ALLOCATE
When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.
This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.
The VOP_ALLOCATE.9 man page will be patched separately.
Andriy Gapon [Wed, 9 Jun 2021 08:50:55 +0000 (11:50 +0300)]
rk3328_codec: set output gain to the value found in linux
According to Linux code the new value should correspond to 0dB gain
while the original value corresponded to 6dB gain which may be
uncomfortable for some output types.