Rick Macklem [Mon, 31 May 2021 00:52:43 +0000 (17:52 -0700)]
nfsd: Add support for the NFSv4.1/4.2 Secinfo_no_name operation
The Linux client is now attempting to use the Secinfo_no_name
operation for NFSv4.1/4.2 mounts. Although it does not seem to
mind the NFSERR_NOTSUPP reply, adding support for it seems
reasonable.
I also noticed that "savflag" needed to be 64bits in
nfsrvd_secinfo() since nd_flag in now 64bits, so I changed
the declaration of it there. I also added code to set "vp" NULL
after performing Secinfo/Secinfo_no_name, since these
operations consume the current FH, which is represented
by "vp" in nfsrvd_compound().
Fixing when the server replies NFSERR_WRONGSEC so that
it conforms to RFC5661 Sec. 2.6 still needs to be done
in a future commit.
Alan Somers [Sun, 30 May 2021 21:54:46 +0000 (15:54 -0600)]
[skip ci] add a CODEOWNERS file
Summary:
Convert MAINTAINERS into a Github CODEOWNERS file. This will
automatically assign reviewers to some GH pull requests. The conversion
is not 1:1; some committers don't have Github accounts (e.g. adrian),
some functional areas don't neatly correspond to a set of files (e.g.
kqueue), and mailing lists can't be assigned as a reviewer (e.g.
secteam@). But it's a start.
vinvalbuf: do not panic if we were unable to flush dirty buffers
Return EBUSY instead and let caller to handle the issue.
For vgone()/vnode reclamation, caller first does vinvalbuf(V_SAVE),
which return EBUSY in case dirty buffers where not flushed. Then caller
calls vinvalbuf(0) due to non-zero return, which gets rid of all dirty
buffers without dependencies.
PR: 238565
Reviewed by: asomers, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30555
VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9). The implementation may then indicate to the caller
whether it needed to unbusy the mount.
Also, add stbool.h to libprocstat modules which #define _KERNEL
before including sys/mount.h. Otherwise they'll pull in sys/types.h
before defining _KERNEL and therefore won't have the bool definition
they need for mp_busy.
Lutz Donnerhacke [Sun, 30 May 2021 13:41:24 +0000 (15:41 +0200)]
libalias: Fix nameing and initialization of a constant
The commit 189f8eea contains a refactorisation of a constant. During
later review D30283 the naming of the constant was improved and the
initialization became explicit. Put this into the tree, in order to
MFC the correct naming.
Parts of libprocstat like to pretend they're kernel components for the
sake of including mount.h, and including sys/types.h in the _KERNEL
case doesn't fix the build for some reason. Revert both the
VFS_QUOTACTL() change and the follow-up "fix" for now.
VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9). The implementation may then indicate to the caller
whether it needed to unbusy the mount.
Justin Hibbits [Mon, 10 May 2021 00:19:07 +0000 (19:19 -0500)]
Apply r350463(43ded0a321a) to powerpc64 radix pmap
Invalidate the last page of a demoted superpage mapping, instead of the
first page, as it results in slightly more promotions and fewer
failures. While here, replace 'boolean_t's with 'bool's in
mmu_radix_advise().
Justin Hibbits [Sun, 9 May 2021 16:07:28 +0000 (11:07 -0500)]
Apply r350335(5d18382b728) to powerpc64 radix pmap
Simplify pmap_clear_modify() a bit, by assuming that since the superpage
demotion succeeded, all 4k mappings from it are valid. Deindent the
surrounding code, as there are no 'else' branches in the code anyway.
Navdeep Parhar [Tue, 25 May 2021 20:47:06 +0000 (13:47 -0700)]
cxgbe(4): Update firmwares to 1.25.6.0.
Changes since 1.25.0.0 are listed here. This list comes from the
Release Notes for the "Chelsio Unified Wire v3.14.0.3 for Linux"
release dated 2021-05-21.
Fixes
-----
BASE:
- Fixed Back to back T6 100G-CR4 link coming up with NO FEC sometimes.
- [T5] Try to bring up link in 1G speed if link doesn't come up on 10G.
- Fixed a bug to not allow BaseR fec in 100G speed.
- Fixed linkup issues on BT adapter in 1G and 100M speed.
- Fixed an issue to allow driver to send VI_ENABLE multiple times (once
with rx disable and then later rx enable).
- Fixed rate limiting not working on class number 16 to 30.
- Fixed backward compatibility issue in port type interpretation with vpd
version 0x80.
ETH:
- Fixed a case when firmware failed to deliver NIC WR completion to host.
- No rate limit support for WR ETH_TX_PKTS2 due to performance reasons.
OFLD
- Fixed a connection hang in SO adapters when tp_plen_max (set by driver)
is more than the window size.
- Added fw_filter_vnic_mode to firmware API file (t4fw_interface.h)
- Use correct rx channel in coprocessor crypto completion (CPL_FW6_PLD). This
was causing out of order completion to host.
FOiSCSI
- Fixed a crash due to unaligned access of ipv6 address.
- Fixed a crash during lun reset.
Enhancements
------------
ETH:
- Rate limiting support added for encapsulated (vxlan, nvgre, geneve) NIC TCP
packets.
OFLD:
- More than 128 SGLs supported in FW_RI_FR_NSMR_WR. Now, more than 16GB
(upto 64GB) of PBLs can be written with single FW_RI_FR_NSMR_WR.
Warner Losh [Sat, 29 May 2021 05:01:52 +0000 (23:01 -0600)]
nvme: fix a race between failing the controller and failing requests
Part of the nvme recovery process for errors is to reset the
card. Sometimes, this results in failing the entire controller. When nda
is in use, we free the sim, which will sleep until all the I/O has
completed. However, with only one thread, the request fail task never
runs once the reset thread sleeps here. Create two threads to allow I/O
to fail until it's all processed and the reset task can proceed.
This is a temporary kludge until I can work out questions that arose
during the review, not least is what was the race that queueing to a
failure task solved. The original commit is vague and other error paths
in the same context do a direct failure. I'll investigate that more
completely before committing changing that to a direct failure. mav@
raised this issue during the review, but didn't otherwise object.
Multiple threads, though, solve the problem in the mean time until other
such means can be perfected.
Jessica Clarke [Sat, 29 May 2021 03:36:36 +0000 (04:36 +0100)]
.github: Attempt to un-break Clang 9 action
GitHub removed Clang 9 from the 20.04 image[1], breaking this build.
Thus, manually add the specific versioned packages we need for the
Ubuntu jobs to ensure they're installed. Note that we don't do the same
for macOS, as Homebrew does not allow multiple llvm@N to co-exist,
giving an error if you attempt to install a second one. In practice we
don't actually use the compiler field here for anything other than the
build name, it's only the cross-bindir that matters, so when it
eventually moves to 12 the name will get confusing but the job will
still work.
Kirk McKusick [Sat, 29 May 2021 02:41:05 +0000 (19:41 -0700)]
Fix fsck_ufs segfault when it needs to rerun.
The segfault was being hit in the rerun of Pass 1 in ginode() when
trying to get an inode that needs to be repaired. When the first run
of fsck_ffs finishes it clears the inode cache, but ginode() was
failing to check properly and tried to access the deallocated cache entry.
Reported by: Peter Holm
Reviewed by: Chuck Silvers
Tested by: Peter Holm and Chuck Silvers
MFC after: 3 days
Sponsored by: Netflix
John Baldwin [Fri, 21 May 2021 00:16:23 +0000 (17:16 -0700)]
cxgbe tom: Free pending iSCSI mbufs on connection shutdown.
If an iSCSI connection is shutdown abruptly (e.g. by a RST from the
peer), pending iSCSI PDUs and page pod work requests can be in the
ulp_pduq when the final CPL is received indicating the death of the
connection.
John Baldwin [Thu, 20 May 2021 23:03:19 +0000 (16:03 -0700)]
cxgbei: Fix a race between transfer setup and a peer reset.
In 4427ac3675f9, the TOM driver stopped sending work requests to
program iSCSI page pods directly and instead queued them to be written
asynchronously with iSCSI PDUs. The queue of mbufs to send is
protected by the inp lock. However, the inp cannot be safely obtained
from the toep since a RST from the remote peer might have cleared
toep->inp asynchronously in an ithread. To fix, obtain the inp from
the socket as is already done in icl_cxgbei_conn_pdu_queue_cb() and
fail the new transfer setup with ECONNRESET if the connection has been
reset.
To avoid passing sockets or inps into the page pod routines, pull the
mbufq out of the two relevant page pod routines such that the routines
queue new work request mbufs to a caller-supplied mbufq.
John Baldwin [Fri, 28 May 2021 23:45:29 +0000 (16:45 -0700)]
cxgbei: Support iSCSI offload on T6.
T6 makes several changes relative to T5 for receive of iSCSI PDUs.
First, earlier adapters issue either 2 or 3 messages to the host for
each PDU received: CPL_ISCSI_HDR contains the BHS of the PDU,
CPL_ISCSI_DATA (when DDP is not used for zero-copy receive) contains
the PDU data as buffers on the freelist, and CPL_RX_ISCSI_DDP with
status of the PDU such as result of CRC checks. In T6, a new
CPL_RX_ISCSI_CMP combines CPL_ISCSI_HDR and CPL_RX_ISCSI_DDP. Data
PDUs which are directly placed via DDP only report a single
CPL_RX_ISCSI_CMP message. Data PDUs received on the free lists are
reported as CPL_ISCSI_DATA followed by CPL_RX_ISCSI_CMP. Control PDUs
such as R2T are still reported via CPL_ISCSI_HDR and CPL_RX_ISCSI_DDP.
Supporting this requires changing the CPL_ISCSI_DATA handler to
allocate a PDU structure if it is not preceded by a CPL_ISCSI_HDR as
well as support for the new CPL_RX_ISCSI_CMP.
Second, when using DDP for zero-copy receive, T6 will only issue a
CPL_RX_ISCSI_CMP after a burst of PDUs have been received (indicated
by the F flag in the BHS). In this case, the CPL_RX_ISCSI_CMP can
reflect the completion of multiple PDUs and the BHS and TCP sequence
number included in the message are from the last PDU received in the
burst. Notably, the message does not include any information about
earlier PDUs received as part of the burst. Instead, the driver must
track the amount of data already received for a given transfer and use
this to compute the amount of data received in a burst. In addition,
the iSCSI layer currently has no way to permit receiving a logical PDU
which spans multiple PDUs. Instead, the driver presents each burst as
a single, "large" PDU to the iSCSI target and initiators. This is
done by rewriting the buffer offset and data length fields in the BHS
of the final PDU as well as rewriting the DataSN so that the received
PDUs appear to be in order.
To track all this, cxgbei maintains a hash table of 'cxgbei_cmp'
structures indexed by transfer tags for each offloaded iSCSI
connection. When a SCSI_DATA_IN message is received, the ITT from the
received BHS is used to find the necessary state in the hash table,
whereas SCSI_DATA_OUT replies use the TTT as the key. The structure
tracks the expected starting offset and DataSN of the next burst as
well as the rewritten DataSN value used for the previously received
PDU.
PAPANI SRIKANTH [Fri, 28 May 2021 06:17:56 +0000 (00:17 -0600)]
Newly added features and bug fixes in latest Microchip SmartPQI driver
It includes:
1)Newly added TMF feature.
2)Added newly Huawei & Inspur PCI ID's
3)Fixed smartpqi driver hangs in Z-Pool while running on FreeBSD12.1
4)Fixed flooding dmesg in kernel while the controller is offline during in ioctls.
5)Avoided unnecessary host memory allocation for rcb sg buffers.
6)Fixed race conditions while accessing internal rcb structure.
7)Fixed where Logical volumes exposing two different names to the OS it's due to the system memory is overwritten with DMA stale data.
8)Fixed dynamically unloading a smartpqi driver.
9)Added device_shutdown callback instead of deprecated shutdown_final kernel event in smartpqi driver.
10)Fixed where Os is crashed during physical drive hot removal during heavy IO.
11)Fixed OS crash during controller lockup/offline during heavy IO.
12)Fixed coverity issues in smartpqi driver
13)Fixed system crash while creating and deleting logical volume in a continuous loop.
14)Fixed where the volume size is not exposing to OS when it expands.
15)Added HC3 pci id's.
Make it under SI_SUB_CPU sysinit, instead of much later SI_SUB_DRIVERS.
The SI_SUB_DRIVERS survived from times when FPU used real ISA attachment,
now it is only pnp stub claiming id.
PR: 255997
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30512
Jessica Clarke [Fri, 28 May 2021 18:07:17 +0000 (19:07 +0100)]
aic7xxx: Fix re-building firmware with -fno-common
The generated C output for aicasm_scan.l defines yylineno already, so
references to it from other files should use an extern declaration.
The STAILQ_HEAD use in aicasm_symbol.h also provided an identifier,
causing it to both define the struct type and define a variable of that
struct type, causing any C file including the header to define the same
variable. This variable is not used (and confusingly clashes with a
field name just below) and was likely caused by confusion when switching
between defining fields using similar type macros and defining the type
itself.
Mark Johnston [Fri, 28 May 2021 14:41:43 +0000 (10:41 -0400)]
libradius: Fix attribute length validation in rad_get_attr(3)
The length of the attribute header needs to be excluded when comparing
the attribute length against the length of the packet. Otherwise,
validation may incorrectly fail when fetching the final attribute in a
message.
Fixes: 8d5c78130 ("libradius: Fix input validation bugs")
Reported by: Peter Eriksson
Tested by: Peter Eriksson
MFC after: now
Sponsored by: The FreeBSD Foundation
Nathan Whitehorn [Fri, 28 May 2021 13:53:42 +0000 (09:53 -0400)]
Fix scripted installation from media without local distfiles.
The bsdinstall script target did not have the infrastructure to fetch
distfiles from a remote server the way the interactive installer does
on e.g. bootonly media. Solve this by factoring out the parts of the
installer that deal with fetching missing distributions into a new
install stage called 'fetchmissingdists', which is called by both the
interactive and scripted installer frontends.
In the course of these changes, cleaned up a few other issues with
the fetching of missing distribution files and added a warning if
fetching the MANIFEST file, which is used to verify the integrity of
the distribution files. We should at some point add cryptographic
signatures to MANIFEST so that it can be fetched safely if not present
on the install media (which it is for bootonly media).
Rick Macklem [Fri, 28 May 2021 02:08:36 +0000 (19:08 -0700)]
nfscl: Use hash lists to improve expected search performance for opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently. When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.
Commit 3f7e14ad9345 added a hash table of lists hashed on file handle
for the opens. This patch uses the hash lists for searching for
a matching open based of file handle instead of an exhaustive
linear search of all opens.
This change appears to be performance neutral for a small number
of opens, but should improve expected performance for a large
number of opens.
This commit should not affect the high level semantics of open
handling.
Mark Johnston [Thu, 27 May 2021 19:49:59 +0000 (15:49 -0400)]
ktrace: Fix a race with fork()
ktrace(2) may toggle trace points in any of
1. a single process
2. all members of a process group
3. all descendents of the processes in 1 or 2
In the first two cases, we do not permit the operation if the process is
being forked or not visible. However, in case 3 we did not enforce this
restriction for descendents. As a result, the assertions about the child
in ktrprocfork() may be violated.
Move these checks into ktrops() so that they are applied consistently.
Allow KTROP_CLEAR for nascent processes. Otherwise, there is a window
where we cannot clear trace points for a nascent child if they are
inherited from the parent.
Mark Johnston [Thu, 27 May 2021 19:49:32 +0000 (15:49 -0400)]
kevent: Prohibit negative change and event list lengths
Previously, a negative change list length would be treated the same as
an empty change list. A negative event list length would result in
bogus copyouts. Make kevent(2) return EINVAL for both cases so that
application bugs are more easily found, and to be more robust against
future changes to kevent internals.
Reviewed by: imp, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30480
Mark Johnston [Thu, 27 May 2021 19:49:12 +0000 (15:49 -0400)]
ktrace: Handle negative array sizes in ktrstructarray
ktrstructarray() may be used to create copies of kevent(2) change and
event arrays. It is called before parameter validation is done and so
should check for bogus array lengths before allocating a copy.
Reported by: syzkaller
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30479
Eric van Gyzen [Thu, 27 May 2021 16:33:22 +0000 (11:33 -0500)]
libprocstat kstack: fix race with thread creation
When collecting kernel stacks for a target process, if the process
adds a thread between the two calls to sysctl, ignore the additional
threads. Previously, procstat would print only a useless error
message. Now, it prints a consistent snapshot of the stacks.
We know that snapshot is already stale, but it could still be stale
even with a more complex fix to reallocate and retry, so such a fix
is hardly worth the effort.
Randall Stewart [Thu, 27 May 2021 14:50:32 +0000 (10:50 -0400)]
tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr).
However there was a missing case, i.e. where we get an out-of-order FIN by itself.
In such a case we don't want to leave the FIN bit set, otherwise we will do the
wrong thing and ack the FIN incorrectly. Instead we need to go through the
tcp_reasm() code and that way the FIN will be stripped and all will be well.
Bjoern A. Zeeb [Wed, 26 May 2021 17:55:21 +0000 (17:55 +0000)]
LinuxKPI: netdevice.h remove more ifnet operating macros
Now that mlx4 and ofed either are operating on ifnet functions
directly or have a private copy of these macros, we can remove them
from linux/netdevice.h.
With this only the #define for net_device to ifnet is left.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30478
Bjoern A. Zeeb [Wed, 26 May 2021 17:51:24 +0000 (17:51 +0000)]
OFED: migrate LinuxKPI net_device/ifnet macros into ofed
The LinuxKPI net_device actually is an ifnet; in order to further
clean that up so we can extend "net_device" migrate the few macros
left into ofed and make sure the header is included in all files
which need access to the macros.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30477
Navdeep Parhar [Thu, 27 May 2021 02:18:42 +0000 (19:18 -0700)]
cxgbe(4): Fix an incorrect assert.
CTRL and OFLD tx queues do not have automatic tx credit flush enabled so
it is okay for the cidx not to be the same as the pidx when the queue is
destroyed.
hwpmc: Move 4 bits of mode to extend class size to 8
Since r289025 we have had at least 5 bits class size.
Before that it was even 16 bits, but macro handling conversion between
pmcid and set of CPU, MODE, CLASS, ROWINDEX still use 4 bits class size
and 8 bits mode size.
This breaks some libpmc API methods, like pmc_capabilities.
Since we only have 4 modes and MODE field is a number (not a bitfield)
this patch moves 4 bits of mode to extend the CLASS field.
tcp: Use local CC data only in the correct context
Most CC algos do use local data, and when calling
newreno_cong_signal from there, the latter misinterprets
the data as its own struct, leading to incorrect behavior.
Reported by: chengc_netapp.com
Reviewed By: chengc_netapp.com, tuexen, #transport
MFC after: 3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30470
Lutz Donnerhacke [Sun, 23 May 2021 17:48:13 +0000 (19:48 +0200)]
tests/libalias: Improve testing
gettimeofday(3) is almost as expensive as the calls to libalias.
So the call frequency for this call is reduced by a factor of 1000 in
order to neglect it's influence.
Using NAT entries became more realistic: A communication of a random
length of up to 150 packets (10% outgoing, 90% incoming) is applied
for each entry.
Precision of the execution time is raised to see the trends better.
Andrew Gallatin [Wed, 26 May 2021 13:54:26 +0000 (09:54 -0400)]
cxgbe: fix enabling lro & rxtimestamps
A recent change caused iq flags, like LRO, to be set before
init_iq(). However, init_iq() clears those flags, so they
became effectively impossible to set. This change moves
the initializion of these flags to after the call to init_iq().
This fixes LRO.
Kristof Provost [Tue, 18 May 2021 07:24:50 +0000 (09:24 +0200)]
pf: Move nvlist conversion functions to pf_nv
Separate the conversion functions (between kernel structs and nvlists)
to pf_nv. This reduces the size of pf_ioctl.c, which is already quite
large and complex, a good bit. It also keeps all the fairly
straightforward conversion code together.
Eugene Grosbein [Wed, 26 May 2021 11:30:24 +0000 (18:30 +0700)]
rc.d/random: add support for zero harvest_mask
Replace the check for zero harvest_mask with new check for empty string.
This allows one to specify harvest_mask="0" that disables harversting
entropy from all but "pure" sources. Exact bit values for "pure" sources
differ for stable/12 and later branches, so it is handy to use zero.
The check for zero pre-dates introduction of "pure" non-maskable sources
Use empty string to disable altering sysctl kern.random.harvest.mask.
Note that notion of "pure" random sources is not documented in user level
manual pages yet. Still, it helps to extend battery life for hardware
with embedded "Intel Secure Key RNG" by disabling all other sources.
Note that no defaults changed and default behaviour is not affected.
Randall Stewart [Wed, 26 May 2021 10:43:30 +0000 (06:43 -0400)]
tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back
a long time when delayed ack timers were longer and links were slower. A
200ms timer slop allows 1 MSS to be sent over a 60kbps link. Its possible that
lowering this value to something more in line with todays delayed ack values (40ms)
might improve TCP. This bit of code makes it so rack can, via a socket option,
adjust the timer slop.