Dimitry Andric [Mon, 18 Nov 2019 07:04:59 +0000 (07:04 +0000)]
MFC r354724:
Merge commit 5bbb604bb from llvm git (by Craig Topper):
[InstCombine] Disable some portions of foldGEPICmp for GEPs that
return a vector of pointers. Fix other portions.
llvm-svn: 370114
This should fix instances of 'Assertion failed: (isa<X>(Val) &&
"cast<Ty>() argument of incompatible type!"), function cast, file
/usr/src/contrib/llvm/include/llvm/Support/Casting.h, line 255', when
building openjdk8 for aarch64 and armv7.
Alan Cox [Sun, 17 Nov 2019 22:44:38 +0000 (22:44 +0000)]
MFC r352847,352930,354585
Eliminate redundant calls to critical_enter() and critical_exit() from
pmap_update_entry(). It suffices that interrupts are blocked.
In short, pmap_enter_quick_locked("user space", ..., VM_PROT_READ) doesn't
work. More precisely, it doesn't set ATTR_AP(ATTR_AP_USER) in the page
table entry, so any attempt to read from the mapped page by user space
generates a page fault. This problem has gone unnoticed because the page
fault handler, vm_fault(), will ultimately call pmap_enter(), which
replaces the non-working page table entry with one that has
ATTR_AP(ATTR_AP_USER) set. This change reduces the number of page faults
during a "buildworld" by about 19.4%.
Eliminate a redundant pmap_load() from pmap_remove_pages().
There is no reason why the pmap_invalidate_all() in pmap_remove_pages()
must be performed before the final PV list lock release. Move it past
the lock release.
Eliminate a stale comment from pmap_page_test_mappings(). We implemented
a modified bit in r350004.
Scott Long [Sat, 16 Nov 2019 00:36:42 +0000 (00:36 +0000)]
MFC r354759:
TSX Asynchronous Abort mitigation for Intel CVE-2019-11135.
This CVE has already been announced in FreeBSD SA-19:26.mcu.
Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.
Control knobs are:
machdep.mitigations.taa.enable:
0 - no software mitigation is enabled
1 - attempt to disable TSX
2 - use the VERW mitigation
3 - automatically select the mitigation based on processor
features.
machdep.mitigations.taa.state:
inactive - no mitigation is active/enabled
TSX disable - TSX is disabled in the bare metal CPU as well as
- any virtualized CPUs
VERW - VERW instruction clears CPU buffers
not vulnerable - The CPU has identified itself as not being
vulnerable
Nothing in the base FreeBSD system uses TSX. However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.
Andriy Gapon [Fri, 15 Nov 2019 07:01:04 +0000 (07:01 +0000)]
MFC r353039: add ability to set watchdog timeout for a shutdown
This change allows to specify a watchdog(9) timeout for a system
shutdown. The timeout is activated when the watchdogd daemon is
stopped. The idea is to a prevent any indefinite hang during late
stages of the shutdown. The feature is implemented in rc.d/watchdogd,
it builds upon watchdogd -x option.
Note that the shutdown timeout is not actiavted when the watchdogd
service is individually stopped by an operator. It is also not
activated for the 'shutdown' to the single-user mode. In those cases it
is assumed that the operator knows what they are doing and they have
means to recover the system should it hang.
Significant subchanges and implementation details:
- the argument to rc.shutdown, completely unused before, is assigned to
rc_shutdown variable that can be inspected by rc scripts
- init(8) passes "single" or "reboot" as the argument, this is not
changed
- the argument is not mandatory and if it is not set then rc_shutdown is
set to "unspecified"
- however, the default jail management scripts and jail configuration
examples have been updated to pass "jail" to rc.shutdown, just in case
- the new timeout can be set via watchdogd_shutdown_timeout rc option
- for consistency, the regular timeout can now be set via
watchdogd_timeout rc option
- watchdogd_shutdown_timeout and watchdogd_timeout override timeout
specifications in watchdogd_flags
- existing configurations, where the new rc options are not set, should
keep working as before
MFC r354443:
Enqueue lladdr_task to update link level address of vlan, when its parent
interface has changed.
During vlan reconfiguration without destroying interface, it is possible,
that parent interface will be changed. This usually means, that link
layer address of vlan will be different. Therefore we need to update all
associated with vlan's addresses permanent llentries - NDP for IPv6
addresses, and ARP for IPv4 addresses. This is done via lladdr_task
execution. To avoid extra work, before execution do the check, that L2
address is different.
UEFI 1.10 on macs does not seem to provide devpath to name translation,
provide our own (limited) version, so we can get information about commmon
devices.
Brooks Davis [Wed, 13 Nov 2019 23:26:12 +0000 (23:26 +0000)]
MFC r353871
Record prior MFC of r353408
r353408:
Fix -DNO_CLEAN build across r353340 and r353381
opensolaris_atomic.S is now only used on i386 with opensolaris_atomic.c
used on other platforms. After r353381 it doesn't exist on those
platforms so the stale dependency would result in a build error.
r353871:
Additional fix for -DNO_CLEAN build across r353340 and r353381
opensolaris_atomic.S is now only used on i386 with opensolaris_atomic.c
used on other platforms. After r353381 it doesn't exist on those
platforms so the stale dependency would result in a build error.
r353408 addressed this issue for cddl/lib/libzpool, but it persisted
with the opensolaris and zfs modules.
Andriy Gapon [Wed, 13 Nov 2019 07:41:19 +0000 (07:41 +0000)]
MFC r353747: vmm: remove a wmb() call
After removing wmb(), vm_set_rendezvous_func() became super trivial, so
there was no point in keeping it.
The wmb (sfence on amd64, lock nop on i386) was not needed. This can be
explained from several points of view.
First, wmb() is used for store-store ordering (although, the primitive
is undocumented). There was no obvious subsequent store that needed the
barrier.
Second, x86 has a memory model with strong ordering including total
store order. An explicit store barrier may be needed only when working
with special memory (device, special caching mode) or using special
instructions (non-temporal stores). That was not the case for this
code.
Third, I believe that there is a misconception that sfence "flushes" the
store buffer in a sense that it speeds up the propagation of stores from
the store buffer to the global visibility. I think that such
propagation always happens as fast as possible. sfence only makes
subsequent stores wait for that propagation to complete. So, sfence is
only useful for ordering of stores and only in the situations described
above.
Toomas Soome [Wed, 13 Nov 2019 07:04:11 +0000 (07:04 +0000)]
MFC: r354415
loader.efi: HARDDRIVE_DEVICE_PATH may have subpaths
The macos does create Vendor Media devices on top of APFS container
(like partition table inside the partition), so we need to collect such
devices into respective device tree.
MFC r353275:
Compile time assert a valid subsystem for all VNET init and uninit functions.
Using VNET init and uninit functions outside the given range has undefined
behaviour.
Merge commit b92deded8 from llvm git (by Louis Dionne):
[libc++] Move __clamp_to_integral to <cmath>, and harden against
min()/max() macros
llvm-svn: 370900
Merge commit 0ec6a4882 from llvm git (by Louis Dionne):
[libc++] Fix potential OOB in poisson_distribution
See details in the original Chromium bug report:
https://bugs.chromium.org/p/chromium/issues/detail?id=994957
Together, these fix a security issue in libc++'s implementation of
std::poisson_distribution, which can be exploited to read data which is
out of bounds.
Note there are no programs in the FreeBSD base system that use
std::poisson_distribution, so this is only a possible issue for ports
and external programs which have been built against libc++. Therefore,
I am bumping __FreeBSD_version for the benefit of our port maintainers.
Dimitry Andric [Sun, 10 Nov 2019 17:33:10 +0000 (17:33 +0000)]
MFC r354255:
Add __isnan()/__isnanf() aliases for compatibility with glibc and CUDA
Even though clang comes with a number of internal CUDA wrapper headers,
compiling sample CUDA programs will result in errors similar to:
In file included from <built-in>:1:
In file included from /usr/lib/clang/9.0.0/include/__clang_cuda_runtime_wrapper.h:204:
/usr/home/arr/cuda/var/cuda-repo-10-0-local-10.0.130-410.48/usr/local/cuda-10.0//include/crt/math_functions.hpp:2910:7: error: no matching function for call to '__isnan'
if (__isnan(a)) {
^~~~~~~
/usr/lib/clang/9.0.0/include/__clang_cuda_device_functions.h:460:16: note: candidate function not viable: call to __device__ function from __host__ function
__DEVICE__ int __isnan(double __a) { return __nv_isnand(__a); }
^
CUDA expects __isnan() and __isnanf() declarations to be available,
which are glibc specific extensions, equivalent to the regular isnan()
and isnanf().
To provide these, define __isnan() and __isnanf() as aliases of the
already existing static inline functions __inline_isnan() and
__inline_isnanf() from math.h.
Toomas Soome [Sun, 10 Nov 2019 09:32:17 +0000 (09:32 +0000)]
MFC r354279:
loader: calculate physical vdev psize from asize
Since physical device asize is calculated from psize and the asize is stored
in pool label, we can use asize to set the value of psize, which is used to
calculate the location of the pool labels.
This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it
to take input parameters that alter the way looping through the list of
snapshots is performed. The idea here is to restrict functions that
throw away some of the snapshots returned by the ioctl to a range of
snapshots that these functions actually use. This improves efficiency
and execution speed for some rollback and send operations.
Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #8077
zfsonlinux/zfs@4c0883fb4af0d5565459099b98fcf90ecbfa1ca1
r354120:
Commit missing file from r354116
Pointy-hat-to: Me
Reported by: Dan Mack
MFC-With: 354116
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap
Check CSUM_SND_TAG flag before classifying packet has having a send
tag in mlx5en(4). This fixes an issue with packets being dropped when
doing packet forwarding, because the rcvif field is still set. The
send tag pointer and rcvif field share the same memory location.
Alexander Motin [Wed, 6 Nov 2019 17:49:38 +0000 (17:49 +0000)]
MFC r354241: Some more taskqueue optimizations.
- Optimize enqueue for two task priority values by adding new tq_hint
field, pointing to the last task inserted into the middle of the list.
In case of more then two priority values it should halve average search.
- Move tq_active insert/remove out of the taskqueue_run_locked loop.
Instead of dirtying few shared cache lines per task introduce different
mechanism to drain active tasks, based on task sequence number counter,
that uses only cache lines already present in cache. Since the new
mechanism does not need ordering, switch tq_active from TAILQ to LIST.
- Move static and dynamic struct taskqueue fields into different cache
lines. Move lock into its own cache line, so that heavy lock spinning
by multiple waiting threads would not affect the running thread.
- While there, correct some TQ_SLEEP() wait messages.
This change fixes certain ZFS write workloads, causing huge congestion
on taskqueue lock. Those workloads combine some large block writes to
saturate the pool and trigger allocation throttling, which uses higher
priority tasks to requeue the delayed I/Os, with many small blocks to
generate deep queue of small tasks for taskqueue to sort.
The previous code came from OpenSolaris, which in my understanding require
allocation size to be known to free memory. To store that size previous
code allocated additional 8 byte header. But I have noticed that zlib
with present settings allocates 64KB context buffers for each call, that
could be efficiently cached by UMA, but addition of those 8 bytes makes
them fall back to physical RAM allocations, that cause huge overhead and
lock congestion on small blocks. Since FreeBSD's free() does not have
the size argument, switching to it solves the problem, increasing write
speed to ZVOLs with 4KB block size and GZIP compression on my 40-threads
test system from ~60MB/s to ~600MB/s.
The goal of this driver is consolidate information about SuperIO chips
and to provide for peaceful coexistence of drivers that need to access
SuperIO configuration registers.
While SuperIO chips can host various functions most of them are
discoverable and accessible without any knowledge of the SuperIO.
Examples are: keyboard and mouse controllers, UARTs, floppy disk
controllers. SuperIO-s also provide non-standard functions such as
GPIO, watchdog timers and hardware monitoring. Such functions do
require drivers with a knowledge of a specific SuperIO.
At this time the driver supports a number of ITE and Nuvoton (fka
Winbond) SuperIO chips.
There is a single driver for all devices. So, I have not done the usual
split between the hardware driver and the bus functionality. Although,
superio does act as a bus for devices that represent known non-standard
functions of a SuperIO chip. The bus provides enumeration of child
devices based on the hardcoded knowledge of such functions. The
knowledge as extracted from datasheets and other drivers.
As there is a single driver, I have not defined a kobj interface for it.
So, its interface is currently made of simple functions.
I think that we can the flexibility (and complications) when we actually
need it.
Mitchell Horne [Sat, 2 Nov 2019 19:52:22 +0000 (19:52 +0000)]
MFC r353334:
RISC-V: Fix an alignment warning in libthr
Compiling with clang gives a loss-of-alignment error due the cast to
uint8_t *. Since the TLS is always tcb aligned and TP_OFFSET is defined
as sizeof(struct tcb) we can guarantee there is no misalignment. Silence
the error by moving the offset into the inline assembly.
Mitchell Horne [Sat, 2 Nov 2019 19:50:36 +0000 (19:50 +0000)]
MFC r352730:
Fix some broken relocation handling
In a few cases, the symbol lookup is missing before attempting to
perform the relocation. While the relocation types affected are
currently unused, this results in an uninitialized variable warning,
that is escalated to an error when building with clang.
Mitchell Horne [Sat, 2 Nov 2019 19:48:42 +0000 (19:48 +0000)]
MFC r352729:
Cleanup of elf_machdep.c
Fix some style(9) violations.
This also changes the name of the machine-dependent sysctl kern.debug_kld to
debug.kld_reloc, and changes its type from int to bool. This is acceptable
since we are not currently concerned with preserving the RISC-V ABI.
Mitchell Horne [Sat, 2 Nov 2019 19:46:39 +0000 (19:46 +0000)]
MFC r340228-r340229, r340231
r340228 by jhb:
Enable use of a global shared page for RISC-V.
machine/vmparam.h already defines the SHAREDPAGE constant. This
change just enables it for ELF executables. The only use of the
shared page currently is to hold the signal trampoline.
Andriy Gapon [Thu, 31 Oct 2019 09:14:50 +0000 (09:14 +0000)]
MFC r353176,r353304,r353556,r353559: large_dnode improvements and fixes
r353176: MFV r350898, r351075: 8423 8199 7432 Implement large_dnode pool feature
This updates FreeBSD large_dnode code (that was imported from ZoL) to a
version that was committed to illumos. It has some cleanups,
improvements and fixes comparing to what we have in FreeBSD now.
I think that the most significant update is 8199 multi-threaded
dmu_object_alloc().
r353304: zfs: use atomic_load_64 to read atomic variable in dmu_object_alloc_impl
r353556: MFV r353551: 10452 ZoL: merge in large dnode feature fixes
r353559: MFV r353558: 10572 10579 Fix race in dnode_check_slots_free()