If copyin family of routines fault, kernel does clear PSL.AC on the
fault entry, but the AC flag of the faulted frame is kept intact. Since
onfault handler is effectively jump, AC survives until syscall exit.
Reported by: m00nbsd, via Sony
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
admbugs: 975
Kristof Provost [Sun, 16 May 2021 06:51:54 +0000 (08:51 +0200)]
pf tests: More set skip on <ifgroup> tests
Test the specific case reported in PR 255852. Clearing the skip flag
on groups was broken because pfctl couldn't work out if a kif was a
group or not, because the kernel no longer set the pfik_group pointer.
Kristof Provost [Sun, 16 May 2021 06:50:17 +0000 (08:50 +0200)]
pf: Set the pfik_group for userspace
Userspace relies on this pointer to work out if the kif is a group or
not. It can't use it for anything else, because it's a pointer to a
kernel address. Substitute 0xfeedc0de for 'true', so that we don't leak
kernel memory addresses to userspace.
Matt Joras [Fri, 30 Aug 2019 20:19:43 +0000 (20:19 +0000)]
MFC r351629: sys/net/if_vlan.c: Wrap a vlan's parent's if_output
in a separate function.
The merge is done in preparation of another merge
to support 802.1ad (qinq). Original commit log follows.
When a vlan interface is created, its if_output is set directly to the
parent interface's if_output. This is fine in the normal case but has an
unfortunate consequence if you end up with a certain combination of vlan
and lagg interfaces.
Consider you have a lagg interface with a single laggport member. When
an interface is added to a lagg its if_output is set to
lagg_port_output, which blackholes traffic from the normal networking
stack but not certain frames from BPF (pseudo_AF_HDRCMPLT). If you now
create a vlan with the laggport member (not the lagg interface) as its
parent, its if_output is set to lagg_port_output as well. While this is
confusing conceptually and likely represents a misconfigured system, it
is not itself a problem. The problem arises when you then remove the
lagg interface. Doing this resets the if_output of the laggport member
back to its original state, but the vlan's if_output is left pointing to
lagg_port_output. This gives rise to the possibility that the system
will panic when e.g. bpf is used to send any frames on the vlan
interface.
Fix this by creating a new function, vlan_output, which simply wraps the
parent's current if_output. That way when the parent's if_output is
restored there is no stale usage of lagg_port_output.
Andriy Gapon [Fri, 7 May 2021 07:17:57 +0000 (10:17 +0300)]
storvsc: fix auto-sense reporting
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.
Andriy Gapon [Thu, 6 May 2021 18:49:37 +0000 (21:49 +0300)]
PCI hot-plug: use dedicated taskqueue for device attach / detach
Attaching and detaching devices can be heavy-weight and detaching can
sleep waiting for events. For that reason using the system-wide
single-threaded taskqueue_thread is not really appropriate.
There is even a possibility for a deadlock if taskqueue_thread is used
for detaching.
In fact, there is an easy to reproduce deadlock involving nvme, pass
and a sudden removal of an NVMe device.
A pass peripheral would not release a reference on an nvme sim until
pass_shutdown_kqueue() is executed via taskqueue_thread. But the
taskqueue's thread is blocked in nvme_detach() -> ... -> cam_sim_free()
because of the outstanding reference.
netgraph/ng_bridge: Handle send errors during loop handling
If sending out a packet fails during the loop over all links, the
allocated memory is leaked and not all links receive a copy. This
patch fixes those problems, clarifies a premature abort of the loop,
and fixes a minory style(9) bug.
Kristof Provost [Wed, 12 May 2021 17:13:40 +0000 (19:13 +0200)]
tests: Only log critical errors from scapy
Since 2.4.5 scapy started issuing warnings about a few different
configurations during our tests. These are harmless, but they generate
stderr output, which upsets atf_check.
Configure scapy to only log critical errors (and thus not warnings) to
fix these tests.
IEEE Std 802.1D-2004 Section 17.14 defines permitted ranges for timers.
Incoming BPDU messages should be checked against the permitted ranges.
The rest of 17.14 appears to be enforced already.
sbin/ipfw: Fix parsing error in table based forward
The argument parser does not recognise the optional port for an
"tablearg" argument. Fix simplifies the code by make the internal
representation expicit for the parser. Includes the fix from D30208.
Kristof Provost [Tue, 4 May 2021 17:23:15 +0000 (19:23 +0200)]
in6_mcast: Return EADDRINUSE when we've already joined the group
Distinguish between truly invalid requests and those that fail because
we've already joined the group. Both cases fail, but differentiating
them allows userspace to make more informed decisions about what the
error means.
For example. radvd tries to join the all-routers group on every SIGHUP.
This fails, because it's already joined it, but this failure should be
ignored (rather than treated as a sign that the interface's multicast is
broken).
This puts us in line with OpenBSD, NetBSD and Linux.
This allows us to kill states created from a rule with route-to/reply-to
set. This is particularly useful in multi-wan setups, where one of the
WAN links goes down.
Submitted by: Steven Brown
Obtained from: https://github.com/pfsense/FreeBSD-src/pull/11/
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30058
Glen Barber [Thu, 18 Feb 2021 04:00:03 +0000 (23:00 -0500)]
release: permanently remove the 'reldoc' target and associates
Following 7b1d1a1658ffb69eff93afc713f9e88ed8b20eac, the structure
for the reldoc target has significantly changed as result of the
ASCIIDoctor/Hugo migration. As the release notes related files
on the installation medium are inherently out of date, purge them
entirely.
Discussed within: re, doceng
No objection: re (silence), doceng (silence)
Timeout: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Glen Barber [Fri, 5 Feb 2021 16:46:49 +0000 (11:46 -0500)]
release: disable the 'reldoc' target after the ASCIIDoctor switch
The 'reldoc' target includes release-related documentation on
installation medium. Since the switch from XML to ASCIIDoctor,
the file locations have moved, and it will take some time to sort
out how this target should work now.
Rick Macklem [Wed, 28 Apr 2021 00:30:16 +0000 (17:30 -0700)]
nfscl: add check for NULL clp and forced dismounts to nfscl_delegreturnvp()
Commit aad780464fad added a function called nfscl_delegreturnvp()
to return delegations during the NFS VOP_RECLAIM().
The function erroneously assumed that nm_clp would
be non-NULL. It will be NULL for NFSV4.0 mounts until
a regular file is opened. It will also be NULL during
vflush() in nfs_unmount() for a forced dismount.
This patch adds a check for clp == NULL to fix this.
Also, since it makes no sense to call nfscl_delegreturnvp()
during a forced dismount, the patch adds a check for that
case and does not do the call during forced dismounts.
pf: Optionally attempt to preserve rule counter values across ruleset updates
Usually rule counters are reset to zero on every update of the ruleset.
With keepcounters set pf will attempt to find matching rules between old
and new rulesets and preserve the rule counters.
pf: Implement the NAT source port selection of MAP-E Customer Edge
MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.
Rick Macklem [Tue, 27 Apr 2021 00:48:21 +0000 (17:48 -0700)]
nfscl: fix the handling of NFSERR_DELAY for Open/LayoutGet RPCs
For a pNFS mount, the NFSv4.1/4.2 client uses compound RPCs that
have both Open and LayoutGet operations in them.
If the pNFS server were tp reply NFSERR_DELAY for one of these
compounds, the retry after a delay cannot be handled by
newnfs_request(), since there is a reference held on the open
state for the Open operation in them.
Fix this by adding these RPCs to the "don't do delay here"
list in newnfs_request().
This patch is only needed if the mount is using pNFS (the "pnfs"
mount option) and probably only matters if the MDS server
is issuing delegations as well as pNFS layouts.
Rick Macklem [Tue, 27 Apr 2021 22:32:35 +0000 (15:32 -0700)]
nfsd: fix a NFSv4.1 Linux client mount stuck in CLOSE_WAIT
It was reported that a NFSv4.1 Linux client mount against
a FreeBSD12 server was hung, with the TCP connection in
CLOSE_WAIT state on the server.
When a NFSv4.1/4.2 mount is done and the back channel is
bound to the TCP connection, the soclose() is delayed until
a new TCP connection is bound to the back channel, due to
a reference count being held on the SVCXPRT structure in
the krpc for the socket. Without the soclose() call, the socket
will remain in CLOSE_WAIT and this somehow caused the Linux
client to hang.
This patch adds calls to soshutdown(.., SHUT_WR) that
are performed when the server side krpc sees that the
socket is no longer usable. Since this can be done
before the back channel is bound to a new TCP connection,
it allows the TCP connection to proceed to CLOSED state.
Mark Johnston [Tue, 4 May 2021 12:53:57 +0000 (08:53 -0400)]
nfsclient: Copy only initialized fields in nfs_getattr()
When loading attributes from the cache, the NFS client is careful to
copy only the fields that it initialized. After fetching attributes
from the server, however, it would copy the entire vattr structure
initialized from the RPC response, so uninitialized stack bytes would
end up being copied to userspace. In particular, va_birthtime (v2 and
v3) and va_gen (v3) had this problem.
Use a common subroutine to copy fields provided by the NFS client, and
ensure that we provide a dummy va_gen for the v3 case.
Reviewed by: rmacklem
Reported by: KMSAN
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30090
Allow up to 5 labels to be set on each rule.
This offers more flexibility in using labels. For example, it replaces
the customer 'schedule' keyword used by pfSense to terminate states
according to a schedule.
Add a test case where the pfctl optimizer will generate a table
automatically. These tables have long names, which we accidentally broke
in the nvlist ADDRULE ioctl.
When parsing the nvlist for a struct pf_addr_wrap we unconditionally
tried to parse "ifname". This broke for PF_ADDR_TABLE when the table
name was longer than IFNAMSIZ. PF_TABLE_NAME_SIZE is longer than
IFNAMSIZ, so this is a valid configuration.
Only parse (or return) ifname or tblname for the corresponding
pf_addr_wrap type.
This manifested as a failure to set rules such as these, where the pfctl
optimiser generated an automatic table:
pass in proto tcp to 192.168.0.1 port ssh
pass in proto tcp to 192.168.0.2 port ssh
pass in proto tcp to 192.168.0.3 port ssh
pass in proto tcp to 192.168.0.4 port ssh
pass in proto tcp to 192.168.0.5 port ssh
pass in proto tcp to 192.168.0.6 port ssh
pass in proto tcp to 192.168.0.7 port ssh
Rick Macklem [Mon, 26 Apr 2021 23:24:10 +0000 (16:24 -0700)]
nfsd: fix the slot sequence# when a callback fails
Commit 4281bfec3628 patched the server so that the
callback session slot would be free'd for reuse when
a callback attempt fails.
However, this can often result in the sequence# for
the session slot to be advanced such that the client
end will reply NFSERR_SEQMISORDERED.
To avoid the NFSERR_SEQMISORDERED client reply,
this patch negates the sequence# advance for the
case where the callback has failed.
The common case is a failed back channel, where
the callback cannot be sent to the client, and
not advancing the sequence# is correct for this
case. For the uncommon case where the client's
reply to the callback is lost, not advancing the
sequence# will indicate to the client that the
next callback is a retry and not a new callback.
But, since the FreeBSD server always sets "csa_cachethis"
false in the callback sequence operation, a retry
and a new callback should be handled the same way
by the client, so this should not matter.
Until you have this patch in your NFSv4.1/4.2 server,
you should consider avoiding the use of delegations.
Even with this patch, interoperation with the
Linux NFSv4.1/4.2 client in kernel versions prior
to 5.3 can result in frequent 15second delays if
delegations are enabled. This occurs because, for
kernels prior to 5.3, the Linux client does a TCP
reconnect every time it sees multiple concurrent
callbacks and then it takes 15seconds to recover
the back channel after doing so.
Rick Macklem [Fri, 23 Apr 2021 22:24:47 +0000 (15:24 -0700)]
nfsd: fix session slot handling for failed callbacks
When the NFSv4.1/4.2 server does a callback to a client
on the back channel, it will use a session slot in the
back channel session. If the back channel has failed,
the callback will fail and, without this patch, the
session slot will not be released.
As more callbacks are attempted, all session slots
can become busy and then the nfsd thread gets stuck
waiting for a back channel session slot.
This patch frees the session slot upon callback
failure to avoid this problem.
Without this patch, the problem can be avoided by leaving
delegations disabled in the NFS server.
sbin/ipfw: Fix null pointer deference when printing counters
ipfw -[tT] prints statistics of the last access. If the rule was never
used, the counter might be not exist. This happens unconditionally on
inserting a new rule. Avoid printing statistics in this case.
Rick Macklem [Sun, 25 Apr 2021 19:52:48 +0000 (12:52 -0700)]
nfscl: fix delegation recall when the file is not open
Without this patch, if a NFSv4 server recalled a
delegation when the file is not open, the renew
thread would block in the NFS VOP_INACTIVE()
trying to acquire the client state lock that it
already holds.
This patch fixes the problem by delaying the
vrele() call until after the client state
lock is released.
This bug has been in the NFSv4 client for
a long time, but since it only affects
delegation when recalled due to another
client opening the file, it got missed
during previous testing.
Until you have this patch in your client,
you should avoid the use of delegations.
Ed Maste [Tue, 1 Sep 2020 15:30:40 +0000 (15:30 +0000)]
release.7: update for current context
It's no longer unusual to be able to build a release with a single
command, so drop "actually" that hints at a surprise. Also just use
"network install directory" instead of referencing FTP; it's more
likely to be HTTP now.
Reviewed by: gjb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26260
If we reassemble a packet we modify the IP header (to set the length and
remove the fragment offset information), but we failed to update the
checksum. On certain setups (mostly where we did not re-fragment again
afterwards) this could lead to us sending out packets with incorrect
checksums.
struct pf_rule had a few counter_u64_t counters. Those couldn't be
usefully comminicated with userspace, so the fields were doubled up in
uint64_t u_* versions.
Now that we use struct pfctl_rule (i.e. a fully userspace version) we
can safely change the structure and remove this wart.
Stop using the kernel's struct pf_rule, switch to libpfctl's pfctl_rule.
Now that we use nvlists to communicate with the kernel these structures
can be fully decoupled.
pf: Move prototypes for userspace functions to userspace header
These functions no longer exist in the kernel, so there's no reason to
keep the prototypes in a kernel header. Move them to pfctl where they're
actually implemented.