markj [Mon, 17 Feb 2020 15:10:41 +0000 (15:10 +0000)]
Fix a swap block allocation race.
putpages' allocation of swap blocks is done under the global sw_dev
lock. Previously it would drop that lock before inserting the allocated
blocks into the object's trie, creating a window in which swap blocks
are allocated but are not visible to swapoff. This can cause
swp_pager_strategy() to fail and panic the system.
Fix the problem bluntly, by allocating swap blocks under the object
lock.
Reviewed by: jeff, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23665
markj [Mon, 17 Feb 2020 15:09:40 +0000 (15:09 +0000)]
Fix object locking races in swapoff(2).
swap_pager_swapoff_object()'s goal is to allocate pages for all valid
swap blocks belonging to the object, for which there is no resident
page. If the page corresponding to a block is already resident and
valid, the block can simply be discarded.
The existing implementation tries to minimize the number of I/Os used.
For each cluster of swap blocks, it finds maximal runs of valid swap
blocks not resident in memory, and valid resident pages. During this
processing, the object lock may be dropped in several places: when
calling getpages, or when blocking on a busy page in
vm_page_grab_pages(). While the lock is dropped, another thread may
free swap blocks, causing getpages to page in stale data.
Fix the problem following a suggestion from Jeff: use getpages'
readahead capability to perform clustering rather than doing it
ourselves. The simplies the code a bit without reintroducing the old
behaviour of performing one I/O per page.
Reviewed by: jeff
Reported by: dhw, gallatin
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23664
tuexen [Mon, 17 Feb 2020 14:54:21 +0000 (14:54 +0000)]
Don't use uninitialised stack memory if the sysctl variable
net.inet.tcp.hostcache.enable is set to 0.
The bug resulted in using possibly a too small MSS value or wrong
initial retransmission timer settings. Possibly the value used
for ssthresh was also wrong.
kib [Mon, 17 Feb 2020 13:31:30 +0000 (13:31 +0000)]
pciconf: List names of all known extended PCIe capabilities.
Some ids are redundand because the list_ecaps() function decodes them
by explicit switch case. But listing them all makes it easier to not
miss ecaps, while not changing the functionality.
Initial submission by: Dmitry Luhtionov <dmitryluhtionov@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
bz [Mon, 17 Feb 2020 11:08:50 +0000 (11:08 +0000)]
Partially revert VNET change and expand VNET structure.
Revert parts of r353274 replacing vnet_state with a shutdown flag.
Not having the state flag for the current SI_SUB_* makes it harder to debug
kernel or module panics related to VNET bringup or teardown.
Not having the state also does not allow us to check for other dependency
levels between components, e.g. for moving interfaces.
Expand the VNET structure with the new boolean flag indicating that we are
doing a shutdown of a given vnet and update the vnet magic cookie for the
change.
Update libkvm to compile with a bool in the kernel struct.
Bump __FreeBSD_version for (external) module builds to more easily detect
the change.
hselasky [Mon, 17 Feb 2020 09:46:32 +0000 (09:46 +0000)]
Fix kernel panic while trying to read multicast stream.
When VIMAGE is enabled make sure the "m_pkthdr.rcvif" pointer is set
for all mbufs being input by the IGMP/MLD6 code. Else there will be a
NULL-pointer dereference in the netisr code when trying to set the
VNET based on the incoming mbuf. Add an assert to catch this when
queueing mbufs on a netisr to make debugging of similar cases easier.
Found by: Vladislav V. Prodan
PR: 244002
Reviewed by: bz@
MFC after: 1 week
Sponsored by: Mellanox Technologies
scottl [Sun, 16 Feb 2020 23:10:59 +0000 (23:10 +0000)]
Add rudamentary support for UFS to probe whether a block device supports the
BIO_SPEEDUP command. Add complimentary support to the CAM periphs that
support it. This is a redo of r357710.
kaktus [Sun, 16 Feb 2020 17:11:54 +0000 (17:11 +0000)]
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (5 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
kib [Sun, 16 Feb 2020 15:43:28 +0000 (15:43 +0000)]
Fix build of some modules for some kernel configs.
Namely, vmm.ko cannot be compiled without 'option SMP', the code uses
IPIs and LAPIC.
Recently systrace was forced over any configs, check for KDTRACE_HOOK
before compiling the dtrace/ modules.
Reviewed by: markj
Discussed with: mjg
Tested by: se (previous version)
Sponsored by: The FreeBSD Foundation (kib)
Differential revision: https://reviews.freebsd.org/D23699
mjg [Sun, 16 Feb 2020 03:33:34 +0000 (03:33 +0000)]
vfs: fix vlrureclaim ->v_object access
The routine was checking for ->v_type == VBAD. Since vgone drops the interlock
early sets this type at the end of the process of dooming a vnode, this opens
a time window where it can clear the pointer while the inerlock-holders is
accessing it.
Another note is that the code was:
(vp->v_object != NULL &&
vp->v_object->resident_page_count > trigger)
With the compiler being fully allowed to emit another read to get the pointer,
and in fact it did on the kernel used by pho.
Use atomic_load_ptr and remember the result.
Note that this depends on type-safety of vm_object.
mjg [Sun, 16 Feb 2020 03:14:55 +0000 (03:14 +0000)]
refcount: add missing release fence to refcount_release_if_gt
The CPU succeeding in releasing the not last reference can still have pending
stores to the object protected by the affected counter. This opens a time
window where another CPU can release the last reference and free the object,
resulting in use-after-free. On top of that this prevents the compiler from
generating more accesses to the object regardless of how atomic_fcmpset_rel_int
is implemented (of course as long as it provides the release semantic).
mmacy [Sun, 16 Feb 2020 00:12:53 +0000 (00:12 +0000)]
Add zfree to zero allocation before free
Key and cookie management typically wants to
avoid information leaks by explicitly zeroing
before free. This routine simplifies that by
permitting consumers to do so without carrying
the size around.
kevans [Sat, 15 Feb 2020 19:47:49 +0000 (19:47 +0000)]
fetch(3): don't leak sockshost on failure
fetch_socks5_getenv will allocate memory for the host (or set it to NULL) in
all cases through the function; the caller is responsible for freeing it if
we end up allocating.
While I'm here, I've eliminated a label that just jumps to the next line...
kevans [Sat, 15 Feb 2020 19:31:40 +0000 (19:31 +0000)]
fetch(3): move bits of fetch_socks5_getenv around
This commit separates out port parsing and validation from grabbing the host
from the env var. The only related bit really is that we need to be more
specific with the delimiter in the IPv6 case.
dim [Sat, 15 Feb 2020 19:15:24 +0000 (19:15 +0000)]
Merge r357970 from the clang1000-import branch:
Fix the following -Werror warning from clang 10.0.0 in hptmv(4):
sys/dev/hptmv/ioctl.c:240:4: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation]
_vbus_p=pArray->pVBus;
^
sys/dev/hptmv/ioctl.c:237:10: note: previous statement is here
if(!mIsArray(pArray))
^
This is because the return statement after the if statement was not
indented. (Note that this file has been idented assuming 4-space tabs.)
kaktus [Sat, 15 Feb 2020 18:57:49 +0000 (18:57 +0000)]
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (4 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
kaktus [Sat, 15 Feb 2020 18:54:59 +0000 (18:54 +0000)]
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (2 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
kaktus [Sat, 15 Feb 2020 18:52:12 +0000 (18:52 +0000)]
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (2 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
kaktus [Sat, 15 Feb 2020 18:48:38 +0000 (18:48 +0000)]
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (1 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
imp [Sat, 15 Feb 2020 18:14:23 +0000 (18:14 +0000)]
The KASSERT is too strict: revert r357897
It's valid for a periph to be removed with outstanding transactions on the
device. In CAM, multiple periphs attach to a single device. There's no interlock
to prevent one of these going away while other periphs have outstanding CCBs and
it's not an error either. Remove this overly agressive KASSERT to prevent
false-positive panics when devices depart.
kevans [Sat, 15 Feb 2020 18:03:16 +0000 (18:03 +0000)]
fetch(3): Add SOCKS5 support
This change adds SOCKS5 support to the library fetch(3) and updates the man
page.
Details: Within the fetch_connect() function, fetch(3) checks if the
SOCKS5_PROXY environment variable is set. If so, it connects to this host
rather than the end-host. It then initializes the SOCKS5 connection in
accordance with RFC 1928 and returns the resulting conn_t (file descriptor)
for usage by the regular FTP/HTTP handlers.
Design Decision: This change defaults all DNS resolutions through the proxy
by sending all IPs as hostnames. Going forward, another feature might be to
create another environmental variable to toggle resolutions through the
proxy or not..
One may set the SOCKS5_PROXY environment variable in any of the formats:
(note by kevans)
I've since been informed that Void Linux/xbps has a fork of libfetch that
also implements SOCKS5. I may compare/contrast the two in the mid-to-near
future.
melifaro [Sat, 15 Feb 2020 15:39:53 +0000 (15:39 +0000)]
Make ping6(1) return code consistent with the man page.
When every sendto() call originated by ping6(1) fails, current code always
returns 2 ("transmission was successful but no responses were received")
which is incorrect. Return EX_OSERR instead as in many cases it indicates
some kernel-level problems.
mjg [Sat, 15 Feb 2020 13:00:39 +0000 (13:00 +0000)]
vfs: make write suspension mandatory
At the time opt-in was introduced adding yourself as a writer was esrializing
across the mount point. Nowadays it is fully per-cpu, the only impact being
a small single-threaded hit on top of what's there right now.
Vast majority of the overhead stems from the call to VOP_GETWRITEMOUNT which
has is done regardless.
Should someone want to microoptimize this single-threaded they can coalesce
looking the mount up with adding a write to it.
kib [Fri, 14 Feb 2020 23:27:45 +0000 (23:27 +0000)]
Consolidate read code for timecounters and fix possible overflow in
bintime()/binuptime().
The algorithm to read the consistent snapshot of current timehand is
repeated in each accessor, including the details proper rollup
detection and synchronization with the writer. In fact there are only
two different kind of readers: one for bintime()/binuptime() which has
to do the in-place calculation, and another kind which fetches some
member from struct timehand.
Extract the logic into type-checked macros, GETTHBINTIME() for bintime
calculation, and GETTHMEMBER() for safe read of a structure' member.
This way, the synchronization is only written in bintime_off() and
getthmember().
In bintime_off(), use overflow-safe calculation of th_scale *
delta(timecounter). In tc_windup, pre-calculate the min delta value
which overflows and require slow algorithm, into the new timehands
th_large_delta member.
This part with overflow fix was written by Bruce Evans.
Reported by: Mark Millard <marklmi@yahoo.com> (the overflow issue)
Tested by: pho
Discussed with: emaste
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 3 weeks
emaste [Fri, 14 Feb 2020 22:32:33 +0000 (22:32 +0000)]
Update version in openssh FREEBSD-vendor metadata
It appears that FREEBSD-vendor is an idea that never really took off
and we should probably just remove it, but until then we might as well
record the correct version.
This includes a small battery of /memreserve/ fixes to make sure dtc is
properly writing these regions into the output file and reading them back
out.
As of this update, dtc will now also assume common defaults for -I/-O if
only one is specified; namely, dts for one implies dtb for the other and
vice versa (Requested by: jhibbits, preserves GPL dtc behavior too).
mjg [Fri, 14 Feb 2020 13:14:19 +0000 (13:14 +0000)]
amd64: only check for error != 0 in the inlined part of l1d flush check
this replaces the following near the syscall exit:
cmp $0x39,%rax
ja 0xffffffff8108f82c
movabs $0x200001800060005,%rcx
bt %rax,%rcx
jae 0xffffffff8108f82c
kevans [Fri, 14 Feb 2020 04:16:22 +0000 (04:16 +0000)]
ncurses: correct check for gcc >= 5.0
The hack in question is intended to workaround seemingly bogus #line markers
in cpp output. As far as I can tell, llvm cpp doesn't do this by default, so
there's no reason to add -P.
In our /bin/sh, the main incantation should be placed in a sub-shell in
order to properly pipe the output to fgrep.
The main motivation for this change is admittedly to stop emitting the noise
about clang not being gcc in make -s buildworld
imp [Fri, 14 Feb 2020 00:13:23 +0000 (00:13 +0000)]
Add a KASSERT that there's no outstanding CCBs when we call camperiphfree. We
know that if there are any outstanding CCBs, then when they dereference the path
that's freed at the bottom of camperiphfree there will be some flavor of
panic. This moves that eventual panic to a traceback of when we free the last
reference on the device, which is earlier but may not be early enough.
kib [Thu, 13 Feb 2020 23:22:12 +0000 (23:22 +0000)]
Return success, instead of ESRCH, from pthread_cancel(3) applied to the
exited but not yet joined thread.
Before, if the thread exited but was not yet joined, we returned
ESRCH.
According to IEEE Std 1003.1™-2017 recommendation in the
description of pthread_cancel(3):
If an implementation detects use of a thread ID after the end of its
lifetime, it is recommended that the function should fail and report
an [ESRCH] error.
So it seems desirable to not return ESRCH until the lifetime of the
thread ID ends. According to the section 2.9.2 Thread IDs,
The lifetime of a thread ID ends after the thread terminates if it
was created with the detachstate attribute set to
PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join()
has been called for that thread.
In other words, lifetime for thread ID of exited but not yet joined thread
did not ended yet.
Prompted by: cperciva
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
jhb [Thu, 13 Feb 2020 23:04:11 +0000 (23:04 +0000)]
Don't check the auth algorithm for GCM.
The upstream OpenSSL changes only set the cipher for GCM since the
authentication is redundant, and changes to OCF will soon remove the
GCM authentication algorithm constants entirely for the same reason.
In addition, ktls_create_session() already validates these fields and
wouldn't pass down an invalid auth_algorithm value to any drivers or
ktls backends.