Mark Johnston [Mon, 12 Apr 2021 13:32:30 +0000 (09:32 -0400)]
Rename struct device to struct _device
types.h defines device_t as a typedef of struct device *. struct device
is defined in subr_bus.c and almost all of the kernel uses device_t.
The LinuxKPI also defines a struct device, so type confusion can occur.
This causes bugs and ambiguity for debugging tools. Rename the FreeBSD
struct device to struct _device.
Mark Johnston [Mon, 12 Apr 2021 13:32:08 +0000 (09:32 -0400)]
qlnxr: Properly initialize the Linux device structure
The driver needs to provide a LinuxKPI device structure to register
itself with the IB subsystem. It was erroneously using a copy of its
FreeBSD device structure for this purpose.
Use linux_pci_attach_device() instead, following the example of the
Chelsio iwarp driver. Also ensure that we don't leak the faked device
during detach.
Reviewed by: hselasky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29595
Rick Macklem [Sun, 11 Apr 2021 23:51:25 +0000 (16:51 -0700)]
nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.
Sometimes, after some Linux NFSv4.1/4.2 clients establish
a new TCP connection, they will advance the sequence number
for a session slot by 2 instead of 1.
RFC5661 specifies that a server should reply
NFS4ERR_SEQ_MISORDERED for this case.
This might result in a system call error in the client and
seems to disable future use of the slot by the client.
Since advancing the sequence number by 2 seems harmless,
allow this case if vfs.nfs.linuxseqsesshack is non-zero.
Note that, if the order of RPCs is actually reversed,
a subsequent RPC with a smaller sequence number value
for the slot will be received. This will result in
a NFS4ERR_SEQ_MISORDERED reply.
This has not been observed during testing.
Setting vfs.nfs.linuxseqsesshack to 0 will provide
RFC5661 compliant behaviour.
This fix affects the fairly rare case where a NFSv4
Linux client does a TCP reconnect and then apparently
erroneously increments the sequence number for the
session slot twice during the reconnect cycle.
Rick Macklem [Sun, 11 Apr 2021 21:47:36 +0000 (14:47 -0700)]
param.h: bump __FreeBSD_version for commit 7763814fc9c2
Commit 7763814fc9c2 changed the internal KAPI between the krpc
and NFS. As such, the krpc, nfscommon and nfscl modules must
all be rebuilt from sources.
Rick Macklem [Sun, 11 Apr 2021 21:34:57 +0000 (14:34 -0700)]
nfsv4 client: do the BindConnectionToSession as required
During a recent testing event, it was reported that the NFSv4.1/4.2
server erroneously bound the back channel to a new TCP connection.
RFC5661 specifies that the fore channel is implicitly bound to a
new TCP connection when an RPC with Sequence (almost any of them)
is done on it. For the back channel to be bound to the new TCP
connection, an explicit BindConnectionToSession must be done as
the first RPC on the new connection.
Since new TCP connections are created by the "reconnect" layer
(sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional
upcall done by the krpc whenever a new connection is created.
The patch also adds the specific upcall function that does a
BindConnectionToSession and configures the krpc to call it
when required.
This is necessary for correct interoperability with NFSv4.1/NFSv4.2
servers when the nfscbd daemon is running.
If doing NFSv4.1/NFSv4.2 mounts without this patch, it is
recommended that the nfscbd daemon not be running and that
the "pnfs" mount option not be specified.
Andrew Turner [Sun, 11 Apr 2021 09:00:00 +0000 (09:00 +0000)]
Use if ... else when printing memory attributes
In vmstat there is a switch statement that converts these attributes to
a string. As some values can be duplicate we have to hide these from
userspace.
Replace this switch statement with an if ... else macro that lets us
repeat values without a compiler error.
Ka Ho Ng [Sun, 11 Apr 2021 06:45:37 +0000 (14:45 +0800)]
vnode_pager_setsize.9: Some clarifications on the manpage
A number of changes:
- Clarifies the locking rules when calling the routine.
- Correct the description regarding the content range to be purged.
- Document the effects on page fault handler.
MFC after: 3 days
MFC with: 86a52e262a6f
Sponsored by: The FreeBSD Foundation
Reviewed by: bcr, kib
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29637
Rick Macklem [Sat, 10 Apr 2021 22:50:25 +0000 (15:50 -0700)]
nfsd: fix replies from session cache for multiple retries
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.
Commit 05a39c2c1c18 fixed replying with the cached reply in
in the session slot if same session slot sequence#.
However, the code uses the reply and, as such,
will fail for a subsequent retry of the RPC.
A subsequent retry would be an extremely rare event,
but this patch fixes this, so long as m_copym(..M_NOWAIT)
does not fail, which should also be a rare event.
This fix affects the exceedingly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation, multiple times. Note that retries only occur
after the client has needed to create a new TCP connection,
with a new TCP connection for each retry.
Slighly relax the gateway validation rules imposed by the 2fe5a79425c7, by requiring only first 8 bytes (everyhing
before sdl_data to be present in the AF_LINK gateway.
Right now, libthr does not initialize RtldLockInfo.rtli_version when calling
_rtld_thread_init(), which makes versioning the interface troublesome.
Add a workaround: if the calling object of _rtld_thread_init() exports
the "_pli_rtli_version" symbol, then consider rtli_version initialized.
Otherwise, forcibly set it to RTLI_VERSION_ONE, currently defined as
RTLI_VERSION.
Export "_pli_rtli_version" from libthr and properly initialize rtli_version.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29633
Only use -fp-exception-behavior=maytrap on x86, for now
After 3b00222f156d, it turns out that clang only supports strict
floating point semantics for SystemZ and x86 at the moment, while for
other architectures it is still experimental.
Therefore, only use -fp-exception-behavior=maytrap on x86 for now,
otherwise this option results in "error: overriding currently
unsupported use of floating point exceptions on this target
[-Werror,-Wunsupported-floating-point-opt]" on other architectures.
inp_lookup_mcast_ifp() is static and is only used in the inp_join_group().
The latter function is also static, and is only used in the inp_setmoptions(),
which relies on inp being non-NULL.
As a result, in the current code, inp_lookup_mcast_ifp() is always called
with non-NULL inp. Eliminate unused RT_DEFAULT_FIB condition and always
use inp fib instead.
Avoid raising unexpected floating point exceptions in libm
When using clang with x86_64 CPUs that support AVX, some floating point
transformations may raise exceptions that would not have been raised by
the original code. To avoid this, use the -fp-exception-behavior=maytrap
flag, introduced in clang 10.0.0.
In particular, this fixes a number of test failures with ctanhf(3) and
ctanf(3), when libm is compiled with -mavx. An unexpected FE_INVALID
exception is then raised, because clang emits vdivps instructions to
perform certain divides. (The vdivps instruction operates on multiple
single-precision float operands simultaneously, but the exceptions may
be influenced by unused parts of the XMM registers. In this particular
case, it was calculating 0 / 0, which results in FE_INVALID.)
If -fp-exception-behavior=maytrap is specified however, clang uses
vdivss instructions instead, which work on one operand, and should not
raise unexpected exceptions.
struct pf_rule had a few counter_u64_t counters. Those couldn't be
usefully comminicated with userspace, so the fields were doubled up in
uint64_t u_* versions.
Now that we use struct pfctl_rule (i.e. a fully userspace version) we
can safely change the structure and remove this wart.
Stop using the kernel's struct pf_rule, switch to libpfctl's pfctl_rule.
Now that we use nvlists to communicate with the kernel these structures
can be fully decoupled.
pf: Move prototypes for userspace functions to userspace header
These functions no longer exist in the kernel, so there's no reason to
keep the prototypes in a kernel header. Move them to pfctl where they're
actually implemented.
Kristof Provost [Fri, 26 Mar 2021 10:22:15 +0000 (11:22 +0100)]
pfctl: Use the new DIOCGETRULENV ioctl
Create wrapper functions to handle the parsing of the nvlist and move
that code into pfctl_ioctl.c.
At some point this should be moved into a libpfctl.
instead of manual zeroing of the debug registers file in pcb.
This centralizes the cleaning code, but the practical difference is
that PCB_DBREGS flag is cleared, saving some operations on context
switching.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29687
Warner Losh [Fri, 9 Apr 2021 22:35:17 +0000 (16:35 -0600)]
efivar: Attempt to fix setting/printing/deleting EFI vars with '-' in their name
Due to how we're parsing UUIDs, we were disallowing setting, printing or
deleting any UEFI variable with a '-' in it when you attempted to do that
operation with the exact name (wildcard reporting was unaffected). Fix the
parser to loop over all the dashes in the name and only give up when all
possible matches are exhausted.
tcp_hostcache: implement tcp_hc_updatemtu() via tcp_hc_update.
Locking changes are planned here, and without this change too
much copy-and-paste would be between these two functions.
This fixes a regression in d36d6816151705907393889, where the call to
__tls_get_address() was performed under rtld_bind_lock write-locked.
Instead use tls_get_addr_slow() directly, with locked = true.
Reported by: jkim, many others
Tested by: jkim, bdragon (powerpc), mhorne (riscv)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29623
The intent is to better handle time intervals with large amount of RIB
updates (e.g. BGP peer going up or down), while still keeping low sync
delay for the rest scenarios.
The implementation is the following: updates are bucketed into the
buckets of size 50ms. If the number of updates within a current bucket
exceeds the threshold of 500 routes/sec (e.g. 10 updates per bucket
interval), the update is delayed for another 50ms. This can be repeated
until the maximum update delay (1 sec) is reached.
Gordon Bergling [Fri, 9 Apr 2021 15:28:18 +0000 (17:28 +0200)]
sysctl.conf(5): Mention sysctl.conf.local in the sysctl.conf(5) manual page
The possibility of using a sysctl.conf.local on a machine that has a shared
sysctl.conf(5) isn't documented. So mention the sysctl.conf.local in the
manual page.
PR: 254901
Submitted by: Jose Luis Duran <jlduran at gmail dot com>
Reported by: Jose Luis Duran <jlduran at gmail dot com>
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29673