Michael Tuexen [Sat, 7 Sep 2019 11:52:35 +0000 (11:52 +0000)]
MFC r350520:
Fix the reporting of multiple unknown parameters in an received INIT
chunk. This also plugs an potential mbuf leak.
Thanks to Felix Weinrank for reporting this issue found by fuzz-testing
the userland stack.
Michael Tuexen [Sat, 7 Sep 2019 11:51:07 +0000 (11:51 +0000)]
MFC r350508:
When responding with an ABORT to an INIT chunk containing a
HOSTNAME parameter or a parameter with an illegal length, only
include an error cause indicating why the ABORT was sent.
This also fixes an mbuf leak which could occur.
Michael Tuexen [Sat, 7 Sep 2019 11:46:49 +0000 (11:46 +0000)]
MFC r350404:
When performing after_idle() or post_recovery(), don't disable the
DCTCP specific methods. Also fallthrough NewReno for non ECN capable
TCP connections and improve the integer arithmetic.
Obtained from: Richard Scheffenegger
Differential Revision: https://reviews.freebsd.org/D20550
Michael Tuexen [Sat, 7 Sep 2019 11:33:27 +0000 (11:33 +0000)]
MFC r350403:
* Improve input validation of sysctl parameters for DCTPC.
* Initialize the alpha parameter to a conservative value (like Linux)
* Improve handling of arithmetic.
* Improve man-page
Obtained from: Richard Scheffenegger
Differential Revision: https://reviews.freebsd.org/D20549
Michael Tuexen [Sat, 7 Sep 2019 11:31:05 +0000 (11:31 +0000)]
MFC r350265:
Add a sysctl variable ts_offset_per_conn to change the computation
of the TCP TS offset from taking the IP addresses and the TCP port
numbers into account to a version just taking only the IP addresses
into account. This works around broken middleboxes or endpoints.
The default is to keep the behaviour, which is also the behaviour
recommended in RFC 7323.
Michael Tuexen [Sat, 7 Sep 2019 10:49:37 +0000 (10:49 +0000)]
MFC r349989:
Improve the input validation for l_linger.
When using the SOL_SOCKET level socket option SO_LINGER, the structure
struct linger is used as the option value. The component l_linger is of
type int, but internally copied to the field so_linger of the structure
struct socket. The type of so_linger is short, but it is assumed to be
non-negative and the value is used to compute ticks to be stored in a
variable of type int.
Therefore, perform input validation on l_linger similar to the one
performed by NetBSD and OpenBSD.
Thanks to syzkaller for making me aware of this issue.
Thanks to markj@ for pointing out that a similar check should be added
to so_linger_set().
Michael Tuexen [Sat, 7 Sep 2019 10:47:43 +0000 (10:47 +0000)]
MFC r349986:
When calling sctp_initialize_auth_params(), the inp must have at
least a read lock. To avoid more complex locking dances, just
call it in sctp_aloc_assoc() when the write lock is still held.
Michael Tuexen [Sat, 7 Sep 2019 10:39:49 +0000 (10:39 +0000)]
MFC r349228:
The variable names in the description of the port number usage is
inconsistent. This patch fixes that and improves the precision of
the description.
Thanks to Tom Marcoen for reporting the issue and providing an
initial patch, on which this change is based.
Mike Karels [Fri, 6 Sep 2019 21:53:04 +0000 (21:53 +0000)]
MFC r351379 r351385 r351592:
Change w(1) to compute FROM (host) field size dynamically
It's nice to be able to display a full IPv6 host address if
needed, but it's also nice to display more than 3 characters of a command
line. Compute the needed size for the FROM column in an earlier pass,
and determine the maximum, then print what fits for the command.
Fix address annotation in xml output from w
The libxo xml feature of adding an annotation with the "original"
address from the utmpx file if it is different than the final "from"
field was broken by r351379. This was pointed out by the gcc error
that save_p might be used uninitialized. Save the original address
as needed in each entry, don't just use the last one from the previous
loop.
Alan Somers [Fri, 6 Sep 2019 20:16:08 +0000 (20:16 +0000)]
MFC r350858-r350859, r350987, r351170
r350858:
ping6: Add missing static keyword for a global variable
This fixes -Wmissing-variable-declarations error when compiled with
WARNS=6.
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21214
r350859:
ping6: Remove unnecessary level of indirection from dnsdecode() parameter
The `sp' pointer doesn't need to be modified in the caller of
dnsdecode().
This fixes -Wcast-qual error (`must have all intermediate pointers
const qualified to be safe') when compiled with WARNS=6.
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21215
r350987:
ping6: Fix data type of a variable for a packet sequence number
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21218
r351170:
ping6: Fix dnsdecode() bug introduced by r350859
Revision 350859 removed level of indirection that was needed for setting the
caller's `cp' pointer. dnsdecode() uses return value to indicate error or
success. It returns pointer to a buffer holding a decompressed DNS name or
NULL. The caller uses that value only to find out the result, not for accessing
the buffer.
We use the return value to propagate the new value of `cp' pointer to
the caller instead of using an output argument.
Submitted by: Ján Sučan <sucanjan@gmail.com>
MFC-With: 350859
Sponsored by: Google, Inc (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21266
Alan Somers [Fri, 6 Sep 2019 19:36:41 +0000 (19:36 +0000)]
MFC r350453:
Add a CXXWARNFLAGS variable
Some warning flags are valid for C++ but not C. GCC 8 complains if you pass
such flags when building a C file. Using a separate variable for these
flags allows building both C and C++ files in the same directory (such as
the fusefs tests) under GCC.
Reviewed by: cem, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21116
Alan Somers [Fri, 6 Sep 2019 19:22:33 +0000 (19:22 +0000)]
MFC r350386, r350390
r350386:
Add v_inval_buf_range, like vtruncbuf but for a range of a file
v_inval_buf_range invalidates all buffers within a certain LBA range of a
file. It will be used by fusefs(5). This commit is a partial merge of
r346162, r346606, and r346756 from projects/fuse2.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21032
Alan Somers [Fri, 6 Sep 2019 19:07:34 +0000 (19:07 +0000)]
MFC r350314:
special-case getvfsbyname(3) for fusefs(5)
fusefs file systems may have a fsname subtype (set by mount_fusefs's "-o
subtype" option) that gets appended to the fsname as returned by statfs(2).
The subtype is set on a per-mount basis so it isn't part of the struct
vfsconf. Special-case getvfsbyname to match either the full "fusefs.foobar"
or short "fusefs" fsname.
This is a merge of r348007, r348054, and r350093 from projects/fuse2
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21043
Alan Somers [Fri, 6 Sep 2019 18:02:58 +0000 (18:02 +0000)]
MFC r345675, r345689
r345675:
fusefs: convert debug printfs into dtrace probes
fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags. They fell into three basic groups:
1. Totally redundant with dtrace FBT probes. These I deleted.
2. Print textual information, usually error messages. These I converted to
SDT probes of the form fuse:fuse:FILE:trace. They work just like the old
printf statements except they can be enabled at runtime with dtrace. They
can be filtered by FILE and/or by priority.
3. More complicated probes that print detailed information. These I
converted into ad-hoc SDT probes.
Also, de-inline fuse_internal_cache_attrs. It's big enough to be a regular
function, and this way it gets a dtrace FBT probe.
This commit is a merge of r345304, r344914, r344703, and r344664 from
projects/fuse2.
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19667
r345689:
fix the GENERIC-NODEBUG build after r345675
r344183:
FUSE: Respect userspace FS "do-not-cache" of file attributes
The FUSE protocol demands that kernel implementations cache user filesystem
file attributes (vattr data) for a maximum period of time in the range of
[0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10
seconds; or "a long time" to represent indefinite caching.
Historically, FreeBSD FUSE has ignored this client directive entirely. This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.
For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.
In the future, as an optimization, we should implement notify_inval_entry,
etc, which provide userspace filesystems a way of evicting the kernel cache.
One potentially bogus access to invalid cached attribute data was left in
fuse_io_strategy. It is restricted behind the undocumented and non-default
"vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are
deadcode and can be eliminated?
Some minor APIs changed to facilitate this:
1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data").
2. cache_attrs() respects the provided TTL and only caches in the FUSE
inode if TTL > 0. It also grows an "out" argument, which, if non-NULL,
stores the translated fuse_attr (even if not suitable for caching).
3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help
avoid programming mistakes.
4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE
link op was weakened (only performed when we have a valid attr cache). The
check is racy in a multi-writer network filesystem anyway -- classic TOCTOU.
We have to trust any userspace filesystem that rejects local caching to
account for it correctly.
PR: 230258 (inspired by; does not fix)
r344184:
FUSE: Respect userspace FS "do-not-cache" of path components
The FUSE protocol demands that kernel implementations cache user filesystem
path components (lookup/cnp data) for a maximum period of time in the range
of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or
10 seconds; or "a long time" to represent indefinite caching.
Historically, FreeBSD FUSE has ignored this client directive entirely. This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.
For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.
Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup
if the user filesystem did not set a zero second TTL.
PR: 230258 (inspired by; does not fix)
r344185:
FUSE: Only "dirty" cached file size when data is dirty
Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update
the buf cache bounds as a result of either a read from the underlying FUSE
filesystem, or as part of a write-through type operation (like truncate =>
VOP_SETATTR). In these cases, do not set the FN_SIZECHANGE flag, which
indicates that an inode's data is dirty (in particular, that the local buf
cache and fvdat->filesize have dirty extended data).
PR: 230258 (related)
r344186:
FUSE: The FUSE design expects writethrough caching
At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol
specifies only clean data to be cached.
Prior to this change, we implement and default to writeback caching. This
is ok enough for local only filesystems without hardlinks, but violates the
general design contract with FUSE and breaks distributed filesystems or
concurrent access to hardlinks of the same inode.
In this change, add cache mode as an extension of cache enable/disable. The
new modes are UC (was: cache disabled), WT (default), and WB (was: cache
enabled).
For now, WT caching is implemented as write-around, which meets the goal of
only caching clean data. WT can be better than WA for workloads that
frequently read data that was recently written, but WA is trivial to
implement. Note that this has no effect on O_WRONLY-opened files, which
were already coerced to write-around.
r344187:
FUSE: Refresh cached file size when it changes (lookup)
The cached fvdat->filesize is indepedent of the (mostly unused)
cached_attrs, and we failed to update it when a cached (but perhaps
inactive) vnode was found during VOP_LOOKUP to have a different size than
cached.
As noted in the code comment, this can occur in distributed filesystems or
with other kinds of irregular file behavior (anything is possible in FUSE).
We do something similar in fuse_vnop_getattr already.
PR: 230258 (as reported in description; other issues explored in
comments are not all resolved)
Reported by: MooseFS FreeBSD Team <freebsd AT moosefs.com>
Submitted by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)
r344333:
fuse: add descriptions for remaining sysctls
(Except reclaim revoked; I don't know what that goal of that one is.)
r344334:
Fuse: whitespace and style(9) cleanup
Take a pass through fixing some of the most egregious whitespace issues in
fs/fuse. Also fix some style(9) warts while here. Not 100% cleaned up, but
somewhat less painful to look at and edit.
No functional change.
r344407:
fuse: Fix a regression introduced in r337165
On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to
use a buf cache block size in excess of permitted size. This did not affect
most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB.
The issue was discovered and reported using a custom kernel with a DFLTPHYS
of 512kB.
PR: 230260 (comment #9)
Reported by: ken@
r344857:
FUSE: Prevent trivial panic
When open(2) was invoked against a FUSE filesystem with an unexpected flags
value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic.
For now, prevent the panic by rejecting such VOP_OPENs with EINVAL.
This is not considered the correct long term fix, but does prevent an
unprivileged denial-of-service.
r344865:
fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf
On GENERIC kernels with empty loader.conf, there is no functional change.
DFLTPHYS and MAXBSIZE are both 64kB at the moment. This change allows
larger bufcache block sizes to be used when either MAXBSIZE (custom kernel)
or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher
than the default.
Add cdceem(4) driver, for virtual ethernet devices compliant
with Communication Device Class Ethernet Emulation Model (CDC EEM).
The driver supports both the device, and host side operation; there
is a new USB template (#11) for the former.
This enables communication with virtual USB NIC provided by iLO 5,
as found in new HPE Proliant servers.
Alan Somers [Fri, 6 Sep 2019 17:43:00 +0000 (17:43 +0000)]
MFC r350225:
Remove the USE_RFC2292BIS option and reap dead code
This option was imported as part of the KAME project in r62627 (in 2000).
It was turned on unconditionally in r121472 (in 2003) and has been on ever
since. The old alternative code has bitrotted. Reap the dead code.
Reported by: Ján Sučan <jansucan@gmail.com>
Differential Revision: https://reviews.freebsd.org/D20938
Alan Somers [Fri, 6 Sep 2019 17:21:21 +0000 (17:21 +0000)]
MFC r345350, r346441, r346765
r345350:
Rename fuse(4) to fusefs(4)
This makes it more consistent with other filesystems, which all end in "fs",
and more consistent with its mount helper, which is already named
"mount_fusefs".
Reviewed by: cem, rgrimes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19649
r346441:
Use symlinks for kernel modules rather than hardlinks
When aliasing a kernel module to a different name (ie if_igb for if_em),
it's better to use symlinks than hard links. kldxref will omit entries for
the links, ensuring that the loaded module has the correct name.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19979
r346765:
Don't symlink fusefs.ko to fuse.ko on PPC
Some PPC systems (PowerNV) use msdosfs for /boot, which can't handle either
symlinks or hardlinks. So on PPC, copy the module instead. This change fixes
installkernel on such systems after r345350.
Reported by: Brandon Bergren <git_bdragon.rtk0.net>
Reviewed by: jhibbits, rgrimes
MFC-With: 345350, 346441
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19993
MFC r351413,351459,351467: unbreak last(1) for 8-bit locales
Ouput format of last's broken for non UTF-8 locales
since it got libxo(3) support. It uses strftime(3) that produces
non UTF-8 strings passed to xo_emit(3) with wrong %s format -
it should be %hs in this case, so xo_emit(3) produces empty output.
This change is basically no-op when locale is of UTF-8 type,
f.e. en_GB.UTF-8 or ru_RU.UTF-8 or sr_RS.UTF-8@latin.
Also it is no-op for C/POSIX locale that's a subset of UTF-8.
Warner Losh [Thu, 5 Sep 2019 23:40:38 +0000 (23:40 +0000)]
MFC r351747:
Implement nvme suspend / resume for pci attachment
Note: this is merged ~9 hours early due to a desire to have it in before feature
freeze in 20 minutes. Several reports of it working in current (and one that it
worked in -stable after copying all of -current's nvme driver) gives me
confidence that bending the rules a little here is the right trade-off.
Warner Losh [Thu, 5 Sep 2019 23:13:44 +0000 (23:13 +0000)]
MFC r351706:
In nvme_completion_poll, add a sanity check to make sure that we complete the
polling within a second. Panic if we don't. All the commands that use this
interface should typically complete within a few tens to hundreds of
microseconds. Panic rather than return ETIMEDOUT because if the command
somehow does later complete, it will randomly corrupt memory. Also, it helps
to get a traceback from where the unexpected failure happens, rather than an
infinite loop.
Warner Losh [Thu, 5 Sep 2019 23:12:56 +0000 (23:12 +0000)]
MFC r351705:
In all the places that we use the polled for completion interface, except
crash dump support code, move the while loop into an inline function. These
aren't done in the fast path, so if the compiler choses to not inline, any
performance hit is tiny.
Warner Losh [Thu, 5 Sep 2019 23:12:06 +0000 (23:12 +0000)]
MFC r351704:
Add a brief comment explaining why we can return ETIMEDOUT from the call to
the polled interface. Normally this would have the potential to corrupt stack
memory because the completion routines would run after we return. In this
case, however, we're doing a dump so it's safe for reasons explained in the
comment.
Warner Losh [Thu, 5 Sep 2019 23:09:50 +0000 (23:09 +0000)]
MFC r351406,r351447:
r351406:
We need to define version 1 of nvme, not nvme_foo. Otherwise nvd won't load
and people who pull in nvme/nvd from modules can't load nvd.ko since it
depends on nvme, not nvme_foo. The duplicate doesn't matter since kldxref
properly handles that case.
r351447:
It turns out the duplication is only mostly harmless.
Warner Losh [Thu, 5 Sep 2019 23:07:57 +0000 (23:07 +0000)]
MFC r351411:
When we have errors resetting the device before we allocate the queues, don't
try to tear them down in the ctrlr_destroy path. Otherwise, we dereference
queue structures that are NULL and we trap.
r339635:
Fix regex for extracting SHM_* values for libsysdecode
r350301:
libsysdecode: add explicit dependencies on recently changed headers
r350327:
libsysdecode: use the proper include directory
r351151:
Rework r339635 to fix .depend.tables.h handling.
Ian Lepore [Thu, 5 Sep 2019 17:20:48 +0000 (17:20 +0000)]
Fix LINT kernel builds on powerpc64 and sparc64. This is a direct commit
to 12-stable because the nandfs code no longer exists in 13-current.
The build was failing with
nandfs_dat.c:301:
warning: comparison is always false due to limited range of data type
I tried to fix it with an inline (size_t) cast of nargv->nv_nmembs in the
if() expression, but that didn't help (which seems a bit buggy), but using
an intermediate variable fixed it. Elegance doesn't matter as much as
suppressing the warning; this code is long-dead even on this branch.
Ian Lepore [Thu, 5 Sep 2019 16:46:16 +0000 (16:46 +0000)]
MFC r350838, r350840-r350841, r350849, r350879
r350838:
Switch the am335x_pmic driver to using iicdev_readfrom/writeto.
PR: 239697
Submitted by: Chuhong Yuan
r350840:
Garbage collect the no-longer-necessary MAX_IIC_DATA_SIZE (there is not a
buffer allocated at that fixed size anymore).
r350841:
When responding to an interrupt in the am335x_pmic driver, use a taskqueue
thread to do the work that involves i2c IO, which sleeps while the IO is
in progress.
r350849:
Remove use of intr_config_hook from the am335x_pmic and tda19988 drivers.
Long ago this was needed, but now low-level i2c controller drivers cleverly
defer attachment of the bus until interrupts are enabled (if they require
interrupts to function), so that every i2c slave device doesn't have to.
r350879:
Revert r350841. I didn't realize that on this chip, reading the interrupt
status register clears pending interrupts. By moving that code out of the
interrupt handler into a taskqueue task, I effectively created an interrupt
storm by returning from the handler with the interrupt source still active.
We'll have to find a different solution for this driver's need to sleep
in an ithread context.
Ian Lepore [Thu, 5 Sep 2019 16:37:10 +0000 (16:37 +0000)]
MFC r350591, r350971, r351724
r350591:
Add a driver for Texas Instruments ADS101x/ADS111x i2c ADC chips.
Instances of the device can be configured using hints or FDT data.
Interfaces to reconfigure the chip and extract voltage measurements from
it are available via sysctl(8).
r350971:
Fix the driver name in ads111x.4, and hook the manpage up to the build.
The driver was originally written with the name ads1115, but at the last
minute it got renamed to ads111x to reflect its support for many related
chips, but I forgot to update the manpage to match the renaming before
committing it all.
r351724:
Fix the name of the devicetree bindings document file cited in the manpage.
Rick Macklem [Wed, 4 Sep 2019 20:14:21 +0000 (20:14 +0000)]
MFC: r350395
Fix printing of Server Re-Failed and Server Faults.
nfsstat -s prints bogus large numbers for the Server Re-Failed and Server
Faults fields. This was introduced by r328588.
Although I know nothing about libxo, these lines aren't titles and this
patch seems to fix the problem, so I am committing it for rea@ who emailed
it to me.
It also deleted the trailing ':' from the title lines, since those were not
in the pre-r328588 output.
If there is a more correct fix, someone conversant with libxo will need
to do so.
MFC r351408-r351410: reduce pollution from mips machine/regnum.h
r351408:
libsa: mips: use _JB_* from machine/asm.h, remove regnum dep
This brings the libsa/mips _setjmp implementation closer to parity with the
libc version.
r351409:
mips: hide regnum definitions behind _KERNEL/_WANT_MIPS_REGNUM
machine/regnum.h ends up being included by sys/procfs.h and sys/ptrace.h via
machine/reg.h. Many of the regnum definitions are too short and too generic
to be exposing to any userland application including one of these two
headers. Moreover, these actively cause build failures in googletest
(template <typename T1 ...> expanding to template <typename 9 ...>).
Hide the definitions behind _KERNEL or _WANT_MIPS_REGNUM, and patch all of
the userland consumers to define as needed.
r351410:
libsa: mips: fix typo that had slipped into the diff on local machine
MFC r349951, r349994, r349995, r350005, r350023 (by jhibbits), r350478:
Provide protection against starvation of the ll/sc loops when accessing
userpace.
Initialize the frentry (the control block that defines a rule) checksum
to zero. Matching checksums save time and effort by mitigating the need
for full rule compare.
Pull in r368867 from upstream libc++ trunk (by Marshall Clow):
Rework recursive_timed_mutex so that it uses __thread_id instead of
using the lower-level __libcpp_thread_id. This is prep for fixing
PR42918. Reviewed as https://reviews.llvm.org/D65895
Pull in r368916 from upstream libc++ trunk (by Marshall Clow):
Fix thread comparison by making sure we never pass our special 'not a
thread' value to the underlying implementation. Fixes PR#42918.
This should fix std::thread::id::operator==() attempting to call
pthread_equal(3) with zero values.