Kristof Provost [Tue, 1 Jun 2021 14:05:47 +0000 (16:05 +0200)]
pf: Fix more ioctl memory leaks
We must also remember to free nvlists added to a parent nvlist with
nvlist_append_nvlist_array().
More importantly, when nvlist_pack() allocates memory for us it does so
in the M_NVLIST zone, so we must free it with free(.., M_NVLIST). Using
free(.., M_TEMP) as we did silently failed to free the memory.
Rick Macklem [Fri, 21 May 2021 01:37:40 +0000 (18:37 -0700)]
nfsd: Add support for CLAIM_DELEG_PREV_FH to the NFSv4.1/4.2 Open
Commit b3d4c70dc60f added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way. Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.
This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.
Rick Macklem [Wed, 19 May 2021 21:52:56 +0000 (14:52 -0700)]
nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease
The most difficult NFSv4 client recovery case happens when the
lease has expired on the server. For NFSv4.0, the client will
receive a NFSERR_EXPIRED reply from the server to indicate this
has happened.
For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such,
the client will receive a NFSERR_BADSESSION reply when the lease
has expired for these RPCs. The client will then call nfscl_recover()
to handle the NFSERR_BADSESSION reply. However, for the expired lease
case, the first reclaim Open will fail with NFSERR_NOGRACE.
This patch recognizes this case and calls nfscl_expireclient()
to handle the recovery from an expired lease.
This patch only affects NFSv4.1/4.2 mounts when the lease
expires on the server, due to a network partitioning that
exceeds the lease duration or similar.
Jessica Clarke [Fri, 28 May 2021 18:07:17 +0000 (19:07 +0100)]
aic7xxx: Fix re-building firmware with -fno-common
The generated C output for aicasm_scan.l defines yylineno already, so
references to it from other files should use an extern declaration.
The STAILQ_HEAD use in aicasm_symbol.h also provided an identifier,
causing it to both define the struct type and define a variable of that
struct type, causing any C file including the header to define the same
variable. This variable is not used (and confusingly clashes with a
field name just below) and was likely caused by confusion when switching
between defining fields using similar type macros and defining the type
itself.
Kristof Provost [Thu, 27 May 2021 09:28:36 +0000 (11:28 +0200)]
libpfctl: fix memory leak
When we create an nvlist and insert it into another nvlist we must
remember to destroy it. The nvlist_add_nvlist() function makes a copy,
just like nvlist_add_string() makes a copy of the string.
Rick Macklem [Wed, 19 May 2021 21:52:56 +0000 (14:52 -0700)]
nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease
The most difficult NFSv4 client recovery case happens when the
lease has expired on the server. For NFSv4.0, the client will
receive a NFSERR_EXPIRED reply from the server to indicate this
has happened.
For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such,
the client will receive a NFSERR_BADSESSION reply when the lease
has expired for these RPCs. The client will then call nfscl_recover()
to handle the NFSERR_BADSESSION reply. However, for the expired lease
case, the first reclaim Open will fail with NFSERR_NOGRACE.
This patch recognizes this case and calls nfscl_expireclient()
to handle the recovery from an expired lease.
This patch only affects NFSv4.1/4.2 mounts when the lease
expires on the server, due to a network partitioning that
exceeds the lease duration or similar.
Lutz Donnerhacke [Sun, 23 May 2021 12:43:00 +0000 (14:43 +0200)]
tests/libalias: Test LibAliasIn and redirection
Rework the tests to check the correct layer in a single test. Factor
out tests for reuse in other modules. Extend the test suite for
libalias(3) to incoming connections. Test the various types of
redirections.
gettimeofday(3) is almost as expensive as the calls to libalias.
So the call frequency for this call is reduced by a factor of 1000 in
order to neglect it's influence.
Using NAT entries became more realistic: A communication of a random
length of up to 150 packets (10% outgoing, 90% incoming) is applied
for each entry.
Add port forwardings to the performance tests. This will cause random
incoming packets to match the random port forwardings opends beforehand.
After a long test run, a lot of ressouces have been allocated.
Measure the time tot free them.
Dmitry Chagin [Wed, 19 May 2021 21:08:25 +0000 (00:08 +0300)]
tcsh: cleanup source tree to reduce diff size.
Remove makefiles, configure files and unused at build time files
to reduce the diff size. Otherwise the diff contains a lot of
unnecessary lines what makes reviewing and merging proccess so hard,
especially for re@.
Kristof Provost [Tue, 18 May 2021 07:24:50 +0000 (09:24 +0200)]
pf: Move nvlist conversion functions to pf_nv
Separate the conversion functions (between kernel structs and nvlists)
to pf_nv. This reduces the size of pf_ioctl.c, which is already quite
large and complex, a good bit. It also keeps all the fairly
straightforward conversion code together.
Rick Macklem [Tue, 18 May 2021 22:53:54 +0000 (15:53 -0700)]
nfsd: Add support for CLAIM_DELEG_CUR_FH to the NFSv4.1/4.2 Open
The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file. This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.
This patch only affects mounts from Linux clients when delegations
are enabled on the server.
Rick Macklem [Tue, 18 May 2021 23:17:58 +0000 (16:17 -0700)]
nfsd: Reduce the callback timeout to 800msec
Recent discussion on the nfsv4@ietf.org mailing list confirmed
that an NFSv4 server should reply to an RPC in less than 1second.
If an NFSv4 RPC requires a delegation be recalled,
the server will attempt a CB_RECALL callback.
If the client is not responsive, the RPC reply will be delayed
until the callback times out.
Without this patch, the timeout is set to 4 seconds (set in
ticks, but used as seconds), resulting in the RPC reply taking over 4sec.
This patch redefines the constant as being in milliseconds and it
implements that for a value of 800msec, to ensure the RPC
reply is sent in less than 1second.
This patch only affects mounts from clients when delegations
are enabled on the server and the client is unresponsive to callbacks.
tcp: Use local CC data only in the correct context
Most CC algos do use local data, and when calling
newreno_cong_signal from there, the latter misinterprets
the data as its own struct, leading to incorrect behavior.
Reported by: chengc_netapp.com
Reviewed By: chengc_netapp.com, tuexen, #transport
MFC after: 3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30470
Rick Macklem [Sun, 16 May 2021 23:40:01 +0000 (16:40 -0700)]
NFSv4 server: Re-establish the delegation recall timeout
Commit 7a606f280a3e allowed the server to do retries of CB_RECALL
callbacks every couple of seconds. This was needed to allow the
Linux client to re-establish the back channel.
However this patch broke the delegation timeout check, such that
it would just keep retrying CB_RECALLS.
If the client has crashed or been network patitioned from the
server, this continues until the client TCP reconnects to
the server and re-establishes the back channel.
This patch modifies the code such that it still times out the
delegation recall after some minutes, so that the server will
allow the conflicting client request once the delegation times out.
This patch only affects the NFSv4 server when delegations are
enabled and a NFSv4 client that holds a delegation has crashed
or been network partitioned from the server for at least several
minutes when a delegation needs to be recalled.
Lutz Donnerhacke [Fri, 14 May 2021 13:08:08 +0000 (15:08 +0200)]
libalias: Style cleanup
libalias is a convolut of various coding styles modified by a series
of different editors enforcing interesting convetions on spacing and
comments.
This patch is a baseline to start with a perfomance rework of
libalias. Upcoming patches should be focus on the code, not on the
style. That's why most annoying style errors should be fixed
beforehand.
Alex Richardson [Tue, 19 Jan 2021 11:32:32 +0000 (11:32 +0000)]
libalias: Fix -Wcast-align compiler warnings
This fixes -Wcast-align warnings caused by the underaligned `struct ip`.
This also silences them in the public functions by changing the function
signature from char * to void *. This is source and binary compatible and
avoids the -Wcast-align warning.
Lutz Donnerhacke [Sun, 16 May 2021 21:37:37 +0000 (23:37 +0200)]
test/libalias: Tests for instantiation and outgoing NAT
In order to modify libalias for performance, the existing
functionality must not change. Enforce this.
Testing LibAliasOut functionality. This concentrates the typical use
case of initiating data transfers from the inside. Provide a
exhaustive test for the data structure in order to check for
performance improvements.
In order to compare upcoming changes for their effectivness, measure
performance by counting opertions and the runtime of each operation
over the time. Accumulate all tests in a single instance, so make it
complicated over the time. If you wait long enough, you will notice
the expiry of old flows.
Mark Johnston [Fri, 28 May 2021 14:41:43 +0000 (10:41 -0400)]
libradius: Fix attribute length validation in rad_get_attr(3)
The length of the attribute header needs to be excluded when comparing
the attribute length against the length of the packet. Otherwise,
validation may incorrectly fail when fetching the final attribute in a
message.
Fixes: 8d5c78130 ("libradius: Fix input validation bugs")
Reported by: Peter Eriksson
Tested by: Peter Eriksson
Sponsored by: The FreeBSD Foundation
Kevin Bowling [Fri, 16 Apr 2021 23:15:50 +0000 (16:15 -0700)]
e1000: Correct promisc multicast filter handling
There are a number of issues in the e1000 multicast filter handling
that have been present for a long time. Take the updated approach from
ixgbe(4) which does not have the issues.
The issues are outlined in the PR, in particular this solves crossing
over and under the hardware's filter limit, not programming the
hardware filter when we are above its limit, disabling SBP (show bad
packets) when the tunable is enabled and exiting promiscuous mode, and
an off-by-one error in the em_copy_maddr function.
Kevin Bowling [Wed, 21 Apr 2021 02:35:14 +0000 (19:35 -0700)]
ixgbe: Improve device name strings
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts.
Kevin Bowling [Sun, 25 Apr 2021 08:22:23 +0000 (01:22 -0700)]
e1000: Rework em_msi_link interrupt filter
* Fix 82574 Link Status Changes, carrying the OTHER mask bit around as
needed.
* Move igb-class LSC re-arming out of FAST back into the handler.
* Clarify spurious/other interrupt re-arms in FAST.
In MSI-X mode, 82574 and igb-class devices use an interrupt filter to
handle Link Status Changes. We want to do LSC re-arms in the handler
to take advantage of autoclear (EIAC) single shot behavior.
82574 uses 'Other' in ICR and IMS for LSC interrupt types when in MSI-X
mode, so we need to set and re-arm the 'Other' bit during attach and
after ICR reads in the FAST handler if not an LSC or after handling on
LSC due to autoclearing.
This work was primarily done to address the referenced PR, but inspired
some clarification and improvement for igb-class devices once the
intentions of previous bug fix attempts became clearer.
Kevin Bowling [Wed, 21 Apr 2021 05:27:48 +0000 (22:27 -0700)]
e1000: Improve device name strings
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts (this driver covers over 20 years
of silicon)
Reviewed by: erj
Approved by: markj
Sponsored by: Pink Floyd - Any Colour You Like (in kind)
Differential Revision: https://reviews.freebsd.org/D29872
Major improvement to build parallelism for googletest internal tests
Currently the googletest internal tests build after the matching library.
However, each of these is serialized at the top level makefile.
Additionally some of the tests (e.g. the gmock-matches-test) take up to
90 seconds to build with clang -O2. Having to wait for this test to
complete before continuing to the next directory seriously slows down the
parllelism of a -j32 build.
Before this change running `make -C lib/googletest -j32 -s` in buildenv
took 202 seconds, now it's 153 due to improved parallelism.
Reviewed By: emaste (no objection)
Differential Revision: https://reviews.freebsd.org/D26748
Significantly reduce compile time for googletest internal tests
Clang's optimizer spends a really long time on these tests at -O2, so we now
use -O0 instead. This reduces the -j32 time for lib/googletest/test from 131s
to 29s. Using -O0 also reduces the disk usage from 144MB (at -O2) / 92MB (at
-O1) to 82MB.
Reviewed By: ngie, dim
Differential Revision: https://reviews.freebsd.org/D26751
Don Morris [Thu, 20 May 2021 14:54:38 +0000 (10:54 -0400)]
ufs: Avoid M_WAITOK allocations when building a dirhash
At this point the directory's vnode lock is held, so blocking while
waiting for free pages makes the system more susceptible to deadlock in
low memory conditions. This is particularly problematic on NUMA systems
as UMA currently implements a strict first-touch policy.
ufsdirhash_build() already uses M_NOWAIT for other allocations and
already handled failures for the block array allocation, so just convert
to M_NOWAIT.
Lutz Donnerhacke [Thu, 11 Feb 2021 22:59:11 +0000 (23:59 +0100)]
netgraph/ng_bridge: Avoid cache thrashing
Hint the compiler, that this update is needed at most once per second.
Only in this case the memory line needs to be written. This will
reduce the amount of cache trashing during forward of most frames.
Lutz Donnerhacke [Tue, 12 Jan 2021 21:28:29 +0000 (22:28 +0100)]
netgraph/ng_bridge: become SMP aware
The node ng_bridge underwent a lot of changes in the last few months.
All those steps were necessary to distinguish between structure
modifying and read-only data transport paths. Now it's done, the node
can perform frame forwarding on multiple cores in parallel.
Use the new control message to move ethernet addresses from a link to
a new link in ng_bridge(4). Send this message instead of doing the
work directly requires to move the loop detection into the control
message processing. This will delay the loop detection by a few
frames.
This decouples the read-only activity from the modification under a
more strict writer lock.
netgraph/ng_bridge: learn MACs via control message
Add a new control message to move ethernet addresses to a given link
in ng_bridge(4). Send this message instead of doing the work directly.
This decouples the read-only activity from the modification under a
more strict writer lock.
Decoupling the work is a prerequisite for multithreaded operation.
Kristof Provost [Mon, 24 May 2021 06:32:16 +0000 (08:32 +0200)]
pf: fix ioctl() memory leak
When we create an nvlist and insert it into another nvlist we must
remember to destroy it. The nvlist_add_nvlist() function makes a copy,
just like nvlist_add_string() makes a copy of the string. If we don't
we're leaking memory on every (nvlist-based) ioctl() call.
While here remove two redundant 'break' statements.
Kristof Provost [Tue, 18 May 2021 13:03:01 +0000 (15:03 +0200)]
pfctl: Fix crash on ALTQ configuration
The following config could crash pfctl:
altq on igb0 fairq bandwidth 1Gb queue { qLink }
queue qLink fairq(default)
That happens because when we're parsing the parent queue (on igb0) it
doesn't have a parent, and the check in eval_pfqueue_fairq() checks
pa->parent rather than parent.
Kristof Provost [Thu, 13 May 2021 07:51:28 +0000 (09:51 +0200)]
pf: Support killing floating states by interface
Floating states get assigned to interface 'all' (V_pfi_all), so when we
try to flush all states for an interface states originally created
through this interface are not flushed. Only if-bound states can be
flushed in this way.
Given that we track the original interface we can check if the state's
interface is 'all', and if so compare to the orig_if instead.
If copyin family of routines fault, kernel does clear PSL.AC on the
fault entry, but the AC flag of the faulted frame is kept intact. Since
onfault handler is effectively jump, AC survives until syscall exit.
Reported by: m00nbsd, via Sony
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
admbugs: 975
Kristof Provost [Sun, 16 May 2021 06:51:54 +0000 (08:51 +0200)]
pf tests: More set skip on <ifgroup> tests
Test the specific case reported in PR 255852. Clearing the skip flag
on groups was broken because pfctl couldn't work out if a kif was a
group or not, because the kernel no longer set the pfik_group pointer.
Kristof Provost [Sun, 16 May 2021 06:50:17 +0000 (08:50 +0200)]
pf: Set the pfik_group for userspace
Userspace relies on this pointer to work out if the kif is a group or
not. It can't use it for anything else, because it's a pointer to a
kernel address. Substitute 0xfeedc0de for 'true', so that we don't leak
kernel memory addresses to userspace.
Matt Joras [Fri, 30 Aug 2019 20:19:43 +0000 (20:19 +0000)]
MFC r351629: sys/net/if_vlan.c: Wrap a vlan's parent's if_output
in a separate function.
The merge is done in preparation of another merge
to support 802.1ad (qinq). Original commit log follows.
When a vlan interface is created, its if_output is set directly to the
parent interface's if_output. This is fine in the normal case but has an
unfortunate consequence if you end up with a certain combination of vlan
and lagg interfaces.
Consider you have a lagg interface with a single laggport member. When
an interface is added to a lagg its if_output is set to
lagg_port_output, which blackholes traffic from the normal networking
stack but not certain frames from BPF (pseudo_AF_HDRCMPLT). If you now
create a vlan with the laggport member (not the lagg interface) as its
parent, its if_output is set to lagg_port_output as well. While this is
confusing conceptually and likely represents a misconfigured system, it
is not itself a problem. The problem arises when you then remove the
lagg interface. Doing this resets the if_output of the laggport member
back to its original state, but the vlan's if_output is left pointing to
lagg_port_output. This gives rise to the possibility that the system
will panic when e.g. bpf is used to send any frames on the vlan
interface.
Fix this by creating a new function, vlan_output, which simply wraps the
parent's current if_output. That way when the parent's if_output is
restored there is no stale usage of lagg_port_output.
Andriy Gapon [Fri, 7 May 2021 07:17:57 +0000 (10:17 +0300)]
storvsc: fix auto-sense reporting
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.
Andriy Gapon [Thu, 6 May 2021 18:49:37 +0000 (21:49 +0300)]
PCI hot-plug: use dedicated taskqueue for device attach / detach
Attaching and detaching devices can be heavy-weight and detaching can
sleep waiting for events. For that reason using the system-wide
single-threaded taskqueue_thread is not really appropriate.
There is even a possibility for a deadlock if taskqueue_thread is used
for detaching.
In fact, there is an easy to reproduce deadlock involving nvme, pass
and a sudden removal of an NVMe device.
A pass peripheral would not release a reference on an nvme sim until
pass_shutdown_kqueue() is executed via taskqueue_thread. But the
taskqueue's thread is blocked in nvme_detach() -> ... -> cam_sim_free()
because of the outstanding reference.
netgraph/ng_bridge: Handle send errors during loop handling
If sending out a packet fails during the loop over all links, the
allocated memory is leaked and not all links receive a copy. This
patch fixes those problems, clarifies a premature abort of the loop,
and fixes a minory style(9) bug.
Kristof Provost [Wed, 12 May 2021 17:13:40 +0000 (19:13 +0200)]
tests: Only log critical errors from scapy
Since 2.4.5 scapy started issuing warnings about a few different
configurations during our tests. These are harmless, but they generate
stderr output, which upsets atf_check.
Configure scapy to only log critical errors (and thus not warnings) to
fix these tests.
IEEE Std 802.1D-2004 Section 17.14 defines permitted ranges for timers.
Incoming BPDU messages should be checked against the permitted ranges.
The rest of 17.14 appears to be enforced already.