MFC r353275:
Compile time assert a valid subsystem for all VNET init and uninit functions.
Using VNET init and uninit functions outside the given range has undefined
behaviour.
Merge commit b92deded8 from llvm git (by Louis Dionne):
[libc++] Move __clamp_to_integral to <cmath>, and harden against
min()/max() macros
llvm-svn: 370900
Merge commit 0ec6a4882 from llvm git (by Louis Dionne):
[libc++] Fix potential OOB in poisson_distribution
See details in the original Chromium bug report:
https://bugs.chromium.org/p/chromium/issues/detail?id=994957
Together, these fix a security issue in libc++'s implementation of
std::poisson_distribution, which can be exploited to read data which is
out of bounds.
Note there are no programs in the FreeBSD base system that use
std::poisson_distribution, so this is only a possible issue for ports
and external programs which have been built against libc++. Therefore,
I am bumping __FreeBSD_version for the benefit of our port maintainers.
Dimitry Andric [Sun, 10 Nov 2019 17:33:10 +0000 (17:33 +0000)]
MFC r354255:
Add __isnan()/__isnanf() aliases for compatibility with glibc and CUDA
Even though clang comes with a number of internal CUDA wrapper headers,
compiling sample CUDA programs will result in errors similar to:
In file included from <built-in>:1:
In file included from /usr/lib/clang/9.0.0/include/__clang_cuda_runtime_wrapper.h:204:
/usr/home/arr/cuda/var/cuda-repo-10-0-local-10.0.130-410.48/usr/local/cuda-10.0//include/crt/math_functions.hpp:2910:7: error: no matching function for call to '__isnan'
if (__isnan(a)) {
^~~~~~~
/usr/lib/clang/9.0.0/include/__clang_cuda_device_functions.h:460:16: note: candidate function not viable: call to __device__ function from __host__ function
__DEVICE__ int __isnan(double __a) { return __nv_isnand(__a); }
^
CUDA expects __isnan() and __isnanf() declarations to be available,
which are glibc specific extensions, equivalent to the regular isnan()
and isnanf().
To provide these, define __isnan() and __isnanf() as aliases of the
already existing static inline functions __inline_isnan() and
__inline_isnanf() from math.h.
Toomas Soome [Sun, 10 Nov 2019 09:32:17 +0000 (09:32 +0000)]
MFC r354279:
loader: calculate physical vdev psize from asize
Since physical device asize is calculated from psize and the asize is stored
in pool label, we can use asize to set the value of psize, which is used to
calculate the location of the pool labels.
This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it
to take input parameters that alter the way looping through the list of
snapshots is performed. The idea here is to restrict functions that
throw away some of the snapshots returned by the ioctl to a range of
snapshots that these functions actually use. This improves efficiency
and execution speed for some rollback and send operations.
Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #8077
zfsonlinux/zfs@4c0883fb4af0d5565459099b98fcf90ecbfa1ca1
r354120:
Commit missing file from r354116
Pointy-hat-to: Me
Reported by: Dan Mack
MFC-With: 354116
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap
Check CSUM_SND_TAG flag before classifying packet has having a send
tag in mlx5en(4). This fixes an issue with packets being dropped when
doing packet forwarding, because the rcvif field is still set. The
send tag pointer and rcvif field share the same memory location.
Alexander Motin [Wed, 6 Nov 2019 17:49:38 +0000 (17:49 +0000)]
MFC r354241: Some more taskqueue optimizations.
- Optimize enqueue for two task priority values by adding new tq_hint
field, pointing to the last task inserted into the middle of the list.
In case of more then two priority values it should halve average search.
- Move tq_active insert/remove out of the taskqueue_run_locked loop.
Instead of dirtying few shared cache lines per task introduce different
mechanism to drain active tasks, based on task sequence number counter,
that uses only cache lines already present in cache. Since the new
mechanism does not need ordering, switch tq_active from TAILQ to LIST.
- Move static and dynamic struct taskqueue fields into different cache
lines. Move lock into its own cache line, so that heavy lock spinning
by multiple waiting threads would not affect the running thread.
- While there, correct some TQ_SLEEP() wait messages.
This change fixes certain ZFS write workloads, causing huge congestion
on taskqueue lock. Those workloads combine some large block writes to
saturate the pool and trigger allocation throttling, which uses higher
priority tasks to requeue the delayed I/Os, with many small blocks to
generate deep queue of small tasks for taskqueue to sort.
The previous code came from OpenSolaris, which in my understanding require
allocation size to be known to free memory. To store that size previous
code allocated additional 8 byte header. But I have noticed that zlib
with present settings allocates 64KB context buffers for each call, that
could be efficiently cached by UMA, but addition of those 8 bytes makes
them fall back to physical RAM allocations, that cause huge overhead and
lock congestion on small blocks. Since FreeBSD's free() does not have
the size argument, switching to it solves the problem, increasing write
speed to ZVOLs with 4KB block size and GZIP compression on my 40-threads
test system from ~60MB/s to ~600MB/s.
The goal of this driver is consolidate information about SuperIO chips
and to provide for peaceful coexistence of drivers that need to access
SuperIO configuration registers.
While SuperIO chips can host various functions most of them are
discoverable and accessible without any knowledge of the SuperIO.
Examples are: keyboard and mouse controllers, UARTs, floppy disk
controllers. SuperIO-s also provide non-standard functions such as
GPIO, watchdog timers and hardware monitoring. Such functions do
require drivers with a knowledge of a specific SuperIO.
At this time the driver supports a number of ITE and Nuvoton (fka
Winbond) SuperIO chips.
There is a single driver for all devices. So, I have not done the usual
split between the hardware driver and the bus functionality. Although,
superio does act as a bus for devices that represent known non-standard
functions of a SuperIO chip. The bus provides enumeration of child
devices based on the hardcoded knowledge of such functions. The
knowledge as extracted from datasheets and other drivers.
As there is a single driver, I have not defined a kobj interface for it.
So, its interface is currently made of simple functions.
I think that we can the flexibility (and complications) when we actually
need it.
Mitchell Horne [Sat, 2 Nov 2019 19:52:22 +0000 (19:52 +0000)]
MFC r353334:
RISC-V: Fix an alignment warning in libthr
Compiling with clang gives a loss-of-alignment error due the cast to
uint8_t *. Since the TLS is always tcb aligned and TP_OFFSET is defined
as sizeof(struct tcb) we can guarantee there is no misalignment. Silence
the error by moving the offset into the inline assembly.
Mitchell Horne [Sat, 2 Nov 2019 19:50:36 +0000 (19:50 +0000)]
MFC r352730:
Fix some broken relocation handling
In a few cases, the symbol lookup is missing before attempting to
perform the relocation. While the relocation types affected are
currently unused, this results in an uninitialized variable warning,
that is escalated to an error when building with clang.
Mitchell Horne [Sat, 2 Nov 2019 19:48:42 +0000 (19:48 +0000)]
MFC r352729:
Cleanup of elf_machdep.c
Fix some style(9) violations.
This also changes the name of the machine-dependent sysctl kern.debug_kld to
debug.kld_reloc, and changes its type from int to bool. This is acceptable
since we are not currently concerned with preserving the RISC-V ABI.
Mitchell Horne [Sat, 2 Nov 2019 19:46:39 +0000 (19:46 +0000)]
MFC r340228-r340229, r340231
r340228 by jhb:
Enable use of a global shared page for RISC-V.
machine/vmparam.h already defines the SHAREDPAGE constant. This
change just enables it for ELF executables. The only use of the
shared page currently is to hold the signal trampoline.
Andriy Gapon [Thu, 31 Oct 2019 09:14:50 +0000 (09:14 +0000)]
MFC r353176,r353304,r353556,r353559: large_dnode improvements and fixes
r353176: MFV r350898, r351075: 8423 8199 7432 Implement large_dnode pool feature
This updates FreeBSD large_dnode code (that was imported from ZoL) to a
version that was committed to illumos. It has some cleanups,
improvements and fixes comparing to what we have in FreeBSD now.
I think that the most significant update is 8199 multi-threaded
dmu_object_alloc().
r353304: zfs: use atomic_load_64 to read atomic variable in dmu_object_alloc_impl
r353556: MFV r353551: 10452 ZoL: merge in large dnode feature fixes
r353559: MFV r353558: 10572 10579 Fix race in dnode_check_slots_free()
Alan Somers [Wed, 30 Oct 2019 02:33:43 +0000 (02:33 +0000)]
MFC r353439:
MFZol: Fix performance of "zfs recv" with many deletions
This patch fixes 2 issues with the DMU free throttle implemented
in dmu_free_long_range(). The first issue is that get_next_chunk()
was calculating the number of L1 blocks the free would dirty
incorrectly. In some cases involving extremely large files, this
code would greatly overestimate the number of affected L1 blocks,
causing excessive calls to txg_wait_open(). This patch corrects
the calculation.
The second issue is that the free throttle uses the total number
of free'd blocks in all (open, quiescing, and syncing) txgs to
determine whether to throttle. This causes large frees (such as
those created by the first issue) to cause 4 txg syncs before
any further frees were allowed to proceed. This patch ensures
that the accounting is done entirely in a per-txg fashion, so
that frees from a given txg don't affect those that immediately
follow it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com>
zfsonlinux/zfs@f4c594da94d856c422512a54e48070f890b2685b
Freeing throttle should account for holes
Deletion throttle currently does not account for holes in a file.
This means that it can activate when it shouldn't.
To fix it we switch the throttle to be based on the number of
L1 blocks we will have to dirty when freeing
Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
zfsonlinux/zfs@65282ee9e06b130f1f0169baf5d9bf0dd8fc1ef9
r353117:
ZFS: the hotspare_add_004_neg test needs at least two disks
Sponsored by: Axcient
r353118:
ZFS: fix several of the "zpool create" tests
* Remove zpool_create_013_neg. FreeBSD doesn't have an equivalent of
Solaris's metadevices. GEOM would be the equivalent, but since all geoms
are the same from ZFS's perspective, this test would be redundant with
zpool_create_012_neg
* Remove zpool_create_014_neg. FreeBSD does not support swapping to regular
files.
* Remove zpool_create_016_pos. This test is redundant with literally every
other test that creates a disk-backed pool.
* s:/etc/vfstab:/etc/fstab in zpool_create_011_neg
* Delete the VTOC-related portion of zpool_create_008_pos. FreeBSD doesn't
use VTOC.
* Replace dumpadm with dumpon and swap with swapon in multiple tests.
* In zpool_create_015_neg, don't require "zpool create -n" to fail. It's
reasonable for that variant to succeed, because it doesn't actually open
the zvol.
* Greatly simplify zpool_create_012_neg. Make it safer, too, but not
interfering with the system's regular swap devices.
* Expect zpool_create_011_neg to fail (PR 241070)
* Delete some redundant cleanup steps in various tests
* Remove some unneeeded ATF timeout specifications. The default is fine.
PR: 241070
Sponsored by: Axcient
r353281:
ZFS: fix several zvol_misc tests
* Adapt zvol_misc_001_neg to use dumpon instead of Solaris's dumpadm
* Disable zvol_misc_003_neg, zvol_misc_005_neg, and zvol_misc_006_pos,
because they involve using a zvol as a dump device, which FreeBSD does not
yet support.
Sponsored by: Axcient
r353282:
zfs: fix the slog_012_neg test
This test attempts to corrupt a file-backed vdev by deleting it and then
recreating it with truncate. But that doesn't work, because the pool
already has the vdev open, and it happily hangs on to the open-but-deleted
file. Fix by truncating the file without deleting it.
Sponsored by: Axcient
r353284:
ZFS: fix the zpool_get_002_pos test
ZFS has grown some additional properties that hadn't been added to the
config file yet. While I'm here, improve the error message, and remove a
superfluous command.
Sponsored by: Axcient
r353285:
zfs: fix the zdb_001_neg test
The test needed to be updated for r331701 (MFV illumos 8671400), which added
a "-k" option.
Sponsored by: Axcient
r353286:
zfs: skip the zfsd tests if zfsd is not running
* Replace runwattr with sudo
* Fix a scoping bug with the "dtst" variable
* Cleanup user properties created during tests
* Eliminate the checks for refreservation and send support. They will always
be supported.
* Fix verify_fs_snapshot. It seemed to assume that permissions would not yet
be delegated, but that's not how it's actually used.
* Combine verify_fs_promote with verify_vol_promote
* Remove some useless sleeps
* Fix backwards condition in verify_vol_volsize
* Remove some redundant cleanup steps in the tests. cleanup.ksh will handle
everything.
* Disable some parts of the tests that FreeBSD doesn't support:
* Creating snapshots with mkdir
* devices
* shareisci
* sharenfs
* xattr
* zoned
The sharenfs parts could probably be reenabled with more work to remove the
Solarisms.
r353288:
ZFS: mark hotspare_scrub_002_pos as an expected failure
"zpool scrub" doesn't detect all errors on active spares in raidz arrays
PR: 241069
Sponsored by: Axcient
r353289:
ZFS: fix the redundancy tests
* Fix force_sync_path, which ensures that a file is fully flushed to disk.
Apparently "zpool history"'s performance has improved, but exporting and
importing the pool still works.
* Fix file_dva by using undocumented zdb syntax to clarify that we're
interested in the pool's root file system, not the pool itself. This
should also fix the zpool_clear_001_pos test.
* Remove a redundant cleanup step
r353309:
zfs: fix the zfsd_autoreplace_003_pos test
The test declared that it only needed 5 disks, but actually tried to use 6.
Fix it to use just 5, which is all it really needs.
Sponsored by: Axcient
r353310:
zfs: fix the zfsd_hotspare_007_pos test
It was trying to destroy the pool while zfsd was detaching the spare, and
"zpool destroy" failed. Fix by waiting until the spare has fully detached.
Sponsored by: Axcient
r353360:
ZFS: multiple fixes to the zpool_import tests
* Don't create a UFS mountpoint just to store some temporary files. The
tests should always be executed with a sufficiently large TMPDIR.
Creating the UFS mountpoint is not only unneccessary, but it slowed
zpool_import_missing_002_pos greatly, because that test moves large files
between TMPDIR and the UFS mountpoint. This change also allows many of
the tests to be executed with just a single test disk, instead of two.
* Move zpool_import_missing_002_pos's backup device dir from / to $PWD to
prevent cross-device moves. On my system, these two changes improved that
test's speed by 39x. It should also prevent ENOSPC errors seen in CI.
* If insufficient disks are available, don't try to partition one of them.
Just rely on Kyua to skip the test. Users who care will configure Kyua
with sufficient disks.
Sponsored by: Axcient
r353361:
ZFS: in the tests, don't override PWD
The ZFS test suite was overriding the common $PWD variable with the path to
the pwd command, even though no test wanted to use it that way. Most tests
didn't notice, because ksh93 eventually restored it to its proper meaning.
Sponsored by: Axcient
r353366:
ZFS: fix the zpool_add_010_pos test
The test is necessarily racy, because it depends on being able to complete a
"zpool add" before a previous resilver finishes. But it was racier than it
needed to be. Move the first "zpool add" to before the resilver starts.
Sponsored by: Axcient
r353379:
zfs: multiple improvements to the zpool_add tests
* Don't partition a disk if too few are available. Just rely on Kyua to
ensure that the tests aren't run with insufficient disks.
* Remove redundant cleanup steps
* In zpool_add_003_pos, store the temporary file in $PWD so Kyua will
automatically clean it up.
* Update zpool_add_005_pos to use dumpon instead of dumpadm. This test had
never been ported to FreeBSD.
* In zpool_add_005_pos, don't format the dump disk with UFS. That was
pointless.
Rick Macklem [Wed, 30 Oct 2019 01:57:40 +0000 (01:57 +0000)]
MFC: r352736
Replace all mtx_assert() calls for n_mtx and ncl_iod_mutex with macros.
To be consistent with replacing the mtx_lock()/mtx_unlock() calls on
the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces
all mtx_assert() calls on these mutexes with macros as well.
This will simplify changing these locks to sx locks in a future commit.
However, this change may be delayed indefinitely, since it appears there
is a deadlock when vnode_pager_setsize() is called to shrink the size
and the NFS node lock is held.
There is no semantic change as a result of this commit.
Alan Somers [Wed, 30 Oct 2019 01:35:00 +0000 (01:35 +0000)]
MFC r352404, r352413-r352414
r352404:
fusefs: fix some minor issues with fuse_vnode_setparent
* When unparenting a vnode, actually clear the flag. AFAIK this is basically
a no-op because we only unparent a vnode when reclaiming it or when
unlinking.
* There's no need to call fuse_vnode_setparent during reclaim, because we're
about to free the vnode data anyway.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21630
r352413:
fusefs: fix some minor Coverity CIDs in the tests
Where open(2) is expected to fail, the tests should assert or expect that
its return value is -1. These tests all accepted too much but happened to
pass anyway.
Rick Macklem [Mon, 28 Oct 2019 22:54:36 +0000 (22:54 +0000)]
MFC: r352664
Replace all mtx_lock()/mtx_unlock() on the iod lock with macros.
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called and the iod lock is held when the NFS node lock
is acquired, the iod mutex will need to be changed to an sx lock as well.
To simply the future commit that changes both the NFS node lock and iod lock
to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the
iod lock with macros.
There is no semantic change as a result of this commit.
Rick Macklem [Mon, 28 Oct 2019 01:44:31 +0000 (01:44 +0000)]
MFC: r352636
Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros.
For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simply making the change to an sx lock in future commit.
There is no semantic change as a result of this commit.
John Baldwin [Fri, 25 Oct 2019 22:17:24 +0000 (22:17 +0000)]
MFC 353023: Fix the EMBEDFS_FORMAT helper variable for riscv64.
It was defined with the wrong MACHINE_ARCH previously. This permits
using an MFS image without defining MD_ROOT_SIZE which has various
benefits (one being that the build is able to treat the MFS image as
a dependency and properly re-link the kernel with the new image when
building with NO_CLEAN).
John Baldwin [Fri, 25 Oct 2019 22:15:20 +0000 (22:15 +0000)]
MFC 353059: Restore description of packets dropped due to full reassembly queue.
r265408 renamed tcps_rcvmemdrop to tcps_rcvreassfull and gave it a more
specific description. r279122 (libxo-ification) reverted that change.
This commit brings it back, but with a small tweak to the description.
John Baldwin [Fri, 25 Oct 2019 22:04:05 +0000 (22:04 +0000)]
MFC 353585,353586: Support hot insertion and removal of PCI devices on EC2.
353585:
Export pci_attach() and pci_detach().
353586:
Support hot insertion and removal of PCI devices on EC2.
Install ACPI notify handlers on PCI devices with an _EJ0 method. This
handler is invoked when devices are added or removed.
- When an ACPI_NOTIFY_DEVICE_CHECK event posts, rescan the parent bus
device. Note that strictly speaking we only need to rescan the
specified device, but BUS_RESCAN is what is available, so we rescan
the entire bus.
- When an ACPI_NOTIFY_EJECT_REQUEST event posts, detach the device
associated with the ACPI handle, invoke the _EJ0 method, and then
delete the device.
Eventually this might be changed to vector notify events to devd in
userspace where devctl can be used instead to permit more complex
actions such as graceful unmounting of filesystems.
John Baldwin [Fri, 25 Oct 2019 21:23:44 +0000 (21:23 +0000)]
MFC 353371: Don't free the cursor boundary tag during vmem_destroy().
The cursor boundary tag is statically allocated in the vmem instead of
from the vmem_bt_zone. Explicitly remove it from the vmem's segment
list in vmem_destroy before freeing all the segments from the vmem.
John Baldwin [Fri, 25 Oct 2019 21:14:43 +0000 (21:14 +0000)]
MFC 353323: Set the FID field in lookaside crypto requests to the rx queue ID.
The PCI block in the adapter requires this field to be set to a valid
queue ID. It is not clear why it did not fail on all machines, but
the effect was that crypto operations reading input data via DMA
failed with an internal PCI read error on machines with 128G or more
of RAM.
Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources. The issue is that the definition is MD, but
it cannot be provided by machine/pcpu.h due to actual struct pcpu
defined in sys/pcpu.h later than the inclusion of machine/pcpu.h.
This forced the copying when other code needed direct access to
__pcpu. There is no way around it, due to machine/pcpu.h supplying
part of struct pcpu fields.
To work around the problem, add a new machine/pcpu_aux.h header, which
should fill any needed MD definitions after struct pcpu definition is
completed. This allows to remove copies of __pcpu spread around the
source. Also on x86 it makes it possible to remove work arounds like
OFFSETOF_CURTHREAD or clang specific warnings supressions.
Michael Tuexen [Fri, 25 Oct 2019 18:17:56 +0000 (18:17 +0000)]
MFC r354044:
Ensure that the flags indicating IPv4/IPv6 are not changed by failing
bind() calls. This would lead to inconsistent state resulting in a panic.
A fix for stable/11 was committed in
https://svnweb.freebsd.org/base?view=revision&revision=338986
Reported by: syzbot+2609a378d89264ff5a42@syzkaller.appspotmail.com
Obtained from: jtl@
Sponsored by: Netflix, Inc.