Kevin Bowling [Mon, 19 Apr 2021 02:11:27 +0000 (19:11 -0700)]
e1000: Add support for [Tiger, Alder, Meteor] Lake
Add support for current and future client platform PCI IDs. These are
all I219 variants and have no known driver changes versus previous
generation client platform I219 variants.
ng_ubt: Do not clear stall before receiving of HCI command response.
Unconditional execution of "clear feature" request at SETUP stage was
workaround for probe failures on ng_ubt.ko re-kldloading which is
unnecessary now.
Rick Macklem [Fri, 30 Apr 2021 01:29:25 +0000 (18:29 -0700)]
param.h: bump __FreeBSD_version for commit 5a45802b3c8c
Commit 5a45802b3c8c changed the internal KAPI between the krpc
and NFS. As such, the krpc, nfscommon and nfscl modules must
all be rebuilt from sources.
Rick Macklem [Sun, 11 Apr 2021 21:34:57 +0000 (14:34 -0700)]
nfsv4 client: do the BindConnectionToSession as required
During a recent testing event, it was reported that the NFSv4.1/4.2
server erroneously bound the back channel to a new TCP connection.
RFC5661 specifies that the fore channel is implicitly bound to a
new TCP connection when an RPC with Sequence (almost any of them)
is done on it. For the back channel to be bound to the new TCP
connection, an explicit BindConnectionToSession must be done as
the first RPC on the new connection.
Since new TCP connections are created by the "reconnect" layer
(sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional
upcall done by the krpc whenever a new connection is created.
The patch also adds the specific upcall function that does a
BindConnectionToSession and configures the krpc to call it
when required.
This is necessary for correct interoperability with NFSv4.1/NFSv4.2
servers when the nfscbd daemon is running.
If doing NFSv4.1/NFSv4.2 mounts without this patch, it is
recommended that the nfscbd daemon not be running and that
the "pnfs" mount option not be specified.
truncate(1) is not case-sensitive with regard to setting the size
of a file. makefs(8), however, does not honor upper-case values.
Update release-specific files and the release(7) manual page to
reflect this.
Modular fib lookup framework features logic that allows
route update batching for the algorithms that cannot easily
apply the routing change without rebuilding. As a result,
dataplane lookups may return old data until the the sync
takes place. With the default sync timeout of 50ms, it is
possible that new binary like ping(8) executed exactly after
route(8) will still use the old fib data.
To address some aspects of the problem, framework executes
all rtable changes without RTF_GATEWAY synchronously.
To fix the aforementioned problem, this diff extends sync
execution for all RTF_STATIC routes (e.g. ones maintained by
route(8).
This fixes a bunch of tests in the networking space.
Currently, PCB caching mechanism relies on the rib generation
counter (rnh_gen) to invalidate cached nhops/LLE entries.
With certain fib algorithms, it is now possible that the
datapath lookup state applies RIB changes with some delay.
In that scenario, PCB cache will invalidate on the RIB change,
but the new lookup may result in the same nexthop being returned.
When fib algo finally gets in sync with the RIB changes, PCB cache
will not receive any notification and will end up caching the stale data.
To fix this, introduce additional counter, rnh_gen_rib, which is used
only when FIB_ALGO is enabled.
This counter is incremented by the control plane. Each time when fib algo
synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value.
Initial fib algo implementation was build on a very simple set of
principles w.r.t updates:
1) algorithm is ether able to apply the change synchronously (DIR24-8)
or requires full rebuild (bsearch, lradix).
2) framework falls back to rebuild on every error (memory allocation,
nhg limit, other internal algo errors, etc).
This changes brings the new "intermediate" concept - batched updates.
Algotirhm can indicate that the particular update has to be handled in
batched fashion (FLM_BATCH).
The framework will write this update and other updates to the temporary
buffer instead of pushing them to the algo callback.
Depending on the update rate, the framework will batch 50..1024 ms of updates
and submit them to a different algo callback.
This functionality is handy for the slow-to-rebuild algorithms like DXR.
The intent is to better handle time intervals with large amount of RIB
updates (e.g. BGP peer going up or down), while still keeping low sync
delay for the rest scenarios.
The implementation is the following: updates are bucketed into the
buckets of size 50ms. If the number of updates within a current bucket
exceeds the threshold of 500 routes/sec (e.g. 10 updates per bucket
interval), the update is delayed for another 50ms. This can be repeated
until the maximum update delay (1 sec) is reached.
inp_lookup_mcast_ifp() is static and is only used in the inp_join_group().
The latter function is also static, and is only used in the inp_setmoptions(),
which relies on inp being non-NULL.
As a result, in the current code, inp_lookup_mcast_ifp() is always called
with non-NULL inp. Eliminate unused RT_DEFAULT_FIB condition and always
use inp fib instead.
Mark Johnston [Wed, 14 Apr 2021 16:57:24 +0000 (12:57 -0400)]
uma: Introduce per-domain reclamation functions
Make it possible to reclaim items from a specific NUMA domain.
- Add uma_zone_reclaim_domain() and uma_reclaim_domain().
- Permit parallel reclamations. Use a counter instead of a flag to
synchronize with zone_dtor().
- Use the zone lock to protect cache_shrink() now that parallel reclaims
can happen.
- Add a sysctl that can be used to trigger reclamation from a specific
domain.
Currently the new KPIs are unused, so there should be no functional
change.
Reviewed by: mav
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29685
Mark Johnston [Wed, 14 Apr 2021 16:56:39 +0000 (12:56 -0400)]
domainset: Define additional global policies
Add global definitions for first-touch and interleave policies. The
former may be useful for UMA, which implements a similar policy without
using domainset iterators.
No functional change intended.
Reviewed by: mav
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29104
Mark Johnston [Wed, 21 Apr 2021 19:38:01 +0000 (15:38 -0400)]
Add required checks for unmapped mbufs in ipdivert and ipfw
Also add an M_ASSERTMAPPED() macro to verify that all mbufs in the chain
are mapped. Use it in ipfw_nat, which operates on a chain returned by
m_megapullup().
PR: 255164
Reviewed by: ae, gallatin
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29838
iwnstats: fix build with clang and allow install under /usr/local/sbin
iwnstats was not compiling because of some issues raised by the clang
compiler due to -Werror. As a tool it is not connected to world build.
Add missing field "barker_mrc" initialization in struct
iwn_sensitivity_limits for -Wmissing-field-initializers, remove unused
pointer *is on iwn_stats_*_print functions and unused variables for
-Wunused-parameter and -Wunused-variable.
The value for field "barker_mrc" of struct iwn2030_sensitivity_limits
was obtained from linux 3.2 wireless/iwlwifi driver code (iwl-2000.c:115
.barker_corr_th_min_mrc = 390).
Also set BINDIR in Makefile to make it possible to install under
/usr/local/sbin/iwnstats as it require super user.
Reviewed by: adrian
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29800
Alexander Motin [Tue, 13 Apr 2021 15:19:10 +0000 (11:19 -0400)]
Fix race in case of device destruction.
During device destruction it is possible that open() succeed, but
fdevname() return NULL, that can't be assigned to string variable.
Fix that by adding explicit NULL check.
Also while there switch from fdevname() to fdevname_r().
Ka Ho Ng [Sun, 11 Apr 2021 06:45:37 +0000 (14:45 +0800)]
vnode_pager_setsize.9: Some clarifications on the manpage
A number of changes:
- Clarifies the locking rules when calling the routine.
- Correct the description regarding the content range to be purged.
- Document the effects on page fault handler.
MFC with: 86a52e262a6f
Sponsored by: The FreeBSD Foundation
Reviewed by: bcr, kib
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29637
Nathan Whitehorn [Thu, 25 Feb 2021 02:16:56 +0000 (21:16 -0500)]
Use makefs(8) in release VM-image generation instead of md(4) and newfs.
Using makefs instead reduces the privileges needed to build VM images,
simplifies the script (no need to copy files to a fresh image at the end),
and improves portability by allowing generation of cross-endian images.
As a result of the last, this patch also adds support for generation of
powerpc64 and powerpc64le VM images.
No other changes to the output. Tested and working for both amd64 and
powerpc64 targets.
Allocate extra inodes in makefs when leaving free space in UFS images.
By default, makefs(8) has very few spare inodes in its output images,
which is fine for static filesystems, but not so great for VM images
where many more files will be added. Make makefs(8) use the same
default settings as newfs(8) when creating images with free space --
there isn't much point to leaving free space on the image if you
can't put files there. If no free space is requested, use current
behavior of a minimal number of available inodes.
Reviewed by: manu
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29492
John Baldwin [Fri, 26 Mar 2021 22:05:31 +0000 (15:05 -0700)]
cxgbe: Add a struct sge_ofld_txq type.
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues. Currently it only holds a work queue but will
include additional state in future changes.
John Baldwin [Fri, 12 Mar 2021 18:35:56 +0000 (10:35 -0800)]
ccr: Disable requests on port 1 when needed to workaround a firmware bug.
Completions for crypto requests on port 1 can sometimes return a stale
cookie value due to a firmware bug. Disable requests on port 1 by
default on affected firmware.
John Baldwin [Fri, 12 Mar 2021 18:35:05 +0000 (10:35 -0800)]
ccr: Set the RX channel ID correctly in work requests.
These fixes are only relevant for requests on the second port. In
some cases, the crypto completion data, completion message, and
receive descriptor could be written in the wrong order.
- Add a separate rx_channel_id that is a copy of the port's rx_c_chan
and use it when an RX channel ID is required in crypto requests
instead of using the tx_channel_id.
- Set the correct rx_channel_id in the CPL_RX_PHYS_ADDR used to write
the crypto result.
- Set the FID to the first rx queue ID on the adapter rather than the
queue ID of the first rx queue for the port.
- While here, use tx_chan to set the tx_channel_id though this is
identical to the previous value.
John Baldwin [Wed, 17 Feb 2021 21:28:04 +0000 (13:28 -0800)]
Handle negative return values from syncache_expand().
These errors do not clear so to NULL, so the existing check was
treating these failures as success. The rest of do_pass_establish()
then tried to use the listen socket as if it was a connection socket
newly created by syncache_expand().
In addition, for negative return values, do not send a RST to the
peer.
Reported by: Sony Arpita Das @ Chelsio
Sponsored by: Chelsio Communications
"zgrep --version" is expected to print the version information in the
same way as "zgrep -V". However, the case handling the --version flag
is never reached, so "zgrep --version" prints:
zgrep: missing pattern
instead of:
grep (BSD grep, GNU compatible) 2.6.0-FreeBSD
Improve debugging output on routing tests failure.
Most of the routing tests create per-test VNET, making
it harder to repeat the failure with CLI tools.
Provide an additional route/nexthop data on failure.
Address multiple issues with strict rtsock message validation.
D28668 "normalisation" approach was based on the assumption that
we always have at least "standard" sockaddr len.
It turned out to be false - certain older applications like quagga
or routed abuse sin[6]_len field and set it to the offset to the
first fully-zero bit in the mask. It is impossible to normalise
such sockaddrs without reallocation.
With that in mind, change the approach to use a distinct memory
buffer for the altered sockaddrs. This allows supporting the older
software while maintaining the guarantee on the "standard" sockaddrs.
PR: 255273,255089
Differential Revision: https://reviews.freebsd.org/D29826
MFC after: 3 days
Fib algo: extend KPI by allowing algo to set datapath pointers.
Some algorithms may require updating datapath and control plane
algo pointers after the (batched) updates.
Export fib_set_datapath_ptr() to allow setting the new datapath
function or data pointer from the algo.
Add fib_set_algo_ptr() to allow updating algo control plane
pointer from the algo.
Add fib_epoch_call() epoch(9) wrapper to simplify freeing old
datapath state.
Rick Macklem [Sat, 10 Apr 2021 22:50:25 +0000 (15:50 -0700)]
nfsd: fix replies from session cache for multiple retries
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.
Commit 05a39c2c1c18 fixed replying with the cached reply in
in the session slot if same session slot sequence#.
However, the code uses the reply and, as such,
will fail for a subsequent retry of the RPC.
A subsequent retry would be an extremely rare event,
but this patch fixes this, so long as m_copym(..M_NOWAIT)
does not fail, which should also be a rare event.
This fix affects the exceedingly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation, multiple times. Note that retries only occur
after the client has needed to create a new TCP connection,
with a new TCP connection for each retry.
Alexander Motin [Sat, 17 Apr 2021 14:41:35 +0000 (10:41 -0400)]
mpt(4): Remove incorrect S/G segments limits.
First, two of those four checks are unreachable.
Second, I don't believe there should be ">=" instead of ">".
Third, bus_dma(9) already returns the same EFBIG if ">".
This fixes false I/O errors in worst S/G cases with maxphys >= 2MB.