Conrad Meyer [Fri, 11 Oct 2019 06:02:03 +0000 (06:02 +0000)]
Fix braino in r353429
cy@ points out that I got parameter order backwards between definition and
invocation of the helper function. He is totally correct. The earlier
version of this patch predated the XFree column so this is one I introduced,
rather than the original author.
Conrad Meyer [Fri, 11 Oct 2019 01:31:31 +0000 (01:31 +0000)]
ddb: Add CSV option, sorting to 'show (malloc|uma)'
Add /i option for machine-parseable CSV output. This allows ready copy/
pasting into more sophisticated tooling outside of DDB.
Add total zone size ("Memory Use") as a new column for UMA.
For both, sort the displayed list on size (print the largest zones/types
first). This is handy for quickly diagnosing "where has my memory gone?" at
a high level.
Gleb Smirnoff [Thu, 10 Oct 2019 23:44:56 +0000 (23:44 +0000)]
Add two extra functions that basically give count of addresses
on interface. Such function could been implemented on top of
the if_foreach_llm?addr(), but several drivers need counting,
so avoid copy-n-paste inside the drivers.
Gleb Smirnoff [Thu, 10 Oct 2019 23:42:55 +0000 (23:42 +0000)]
Provide new KPI for network drivers to access lists of interface
addresses. The KPI doesn't reveal neither how addresses are stored,
how the access to them is synchronized, neither reveal struct ifaddr
and struct ifmaddr.
Conrad Meyer [Thu, 10 Oct 2019 22:49:45 +0000 (22:49 +0000)]
nvdimm(4): Calculate and save memattr once; it never changes
Refactor nvdimm_spa_memattr() routine and callers to just save the value at
initialization and use the value directly. The reference value from NFIT,
MemoryMapping, is read only once, so the associated memattr could never
change.
Dimitry Andric [Thu, 10 Oct 2019 20:33:55 +0000 (20:33 +0000)]
Pull in r374444 from upstream lldb trunk (by me):
Fix process launch failure on FreeBSD after r365761
Summary:
After rLLDB365761, and with `LLVM_ENABLE_ABI_BREAKING_CHECKS`
enabled, launching any process on FreeBSD crashes lldb with:
```
Expected<T> must be checked before access or destruction.
Expected<T> value was in success state. (Note: Expected<T> values in
success mode must still be checked prior to being destroyed).
```
This is because `m_operation_thread` and `m_monitor_thread` were
wrapped in `llvm::Expected<>`, but this requires the objects to be
correctly initialized before accessing them.
To fix the crashes, use `llvm::Optional<>` for the members (as
indicated by labath), and use local variables to store the return
values of `LaunchThread` and `StartMonitoringChildProcess`. Then,
only assign to the member variables after checking if the return
values indicated success.
Dimitry Andric [Thu, 10 Oct 2019 20:30:54 +0000 (20:30 +0000)]
Revert r353363 in preparation for applying upstream fix:
Put in a band-aid fix for lldb 9 exiting with "Expected<T> must be
checked before access or destruction" when launching executables, while
we sort this out with upstream.
Brooks Davis [Thu, 10 Oct 2019 16:29:13 +0000 (16:29 +0000)]
Fix -DNO_CLEAN build across r353340 and r353381
opensolaris_atomic.S is now only used on i386 with opensolaris_atomic.c
used on other platforms. After r353381 it doesn't exist on those
platforms so the stale dependency would result in a build error.
Andriy Gapon [Thu, 10 Oct 2019 07:39:41 +0000 (07:39 +0000)]
emulate illumos membar_producer with atomic_thread_fence_rel
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was
inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this
is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
Doug Ambrisko [Thu, 10 Oct 2019 03:12:17 +0000 (03:12 +0000)]
This driver attaches to the Intel VMD drive and connects a new PCI domain
starting at the max. domain, and then work down. Then existing FreeBSD
drivers will attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC used Intel meta data so graid(8) works with it. However, graid(8)
supports RAID 0,1,10 for read and write. I have some early code to
support writes with RAID 5. Note that RAID 5 can have life issues
with SSDs since it can cause write amplification from updating the parity
data.
Hot plug support needs a change to skip the following check to work:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
John Baldwin [Wed, 9 Oct 2019 21:20:39 +0000 (21:20 +0000)]
Don't free the cursor boundary tag during vmem_destroy().
The cursor boundary tag is statically allocated in the vmem instead of
from the vmem_bt_zone. Explicitly remove it from the vmem's segment
list in vmem_destroy before freeing all the segments from the vmem.
Warner Losh [Wed, 9 Oct 2019 21:18:46 +0000 (21:18 +0000)]
Wordsmith and simplify
Simplify expressions as suggested by jhb. The extra indirection made
sense in earlier versions of this patch, but not the final one.
While here, apply suggestion from emaste for wording of universe.
Also wordsmith awkwardly worded comment about when we effectively
neuter the universe build for an architecture.
Once llvm 9.0 has been vetted for mips and powerpc, I'll take them out
of these lists.
Warner Losh [Wed, 9 Oct 2019 21:02:06 +0000 (21:02 +0000)]
Fix casting error from newer gcc
Cast the pointers to (uintptr_t) before assigning to type
uint64_t. This eliminates an error from gcc when we cast the pointer
to a larger integer type.
Warner Losh [Wed, 9 Oct 2019 20:59:10 +0000 (20:59 +0000)]
Don't compile old gcc 4.2.1 archs by default in universe/tinderbox.
Only compile clang supporting architectures of amd64, arm, arm64,
i386, and riscv as part of universe. Compile the other architectures
if MAKE_OBSOLETE_GCC is defined. In all cases, explicit lists of
architectures in TARGETS= on the command line override.
For mips, powerpc and sparc64, do the same thing we do for risvc when
MAKE_OBSOLETE_GCC isn't defined and short-circuit their universe build
with an echo saying to install the xtoolchain port or pkg.
PR: 241134
Discussed on: arch@ (https://lists.freebsd.org/pipermail/freebsd-arch/2019-August/019674.html)
Differential Revision: https://reviews.freebsd.org/D21942
Alan Somers [Wed, 9 Oct 2019 20:16:40 +0000 (20:16 +0000)]
ZFS: fix the zpool_add_010_pos test
The test is necessarily racy, because it depends on being able to complete a
"zpool add" before a previous resilver finishes. But it was racier than it
needed to be. Move the first "zpool add" to before the resilver starts.
Dimitry Andric [Wed, 9 Oct 2019 19:51:41 +0000 (19:51 +0000)]
Put in a band-aid fix for lldb 9 exiting with "Expected<T> must be
checked before access or destruction" when launching executables, while
we sort this out with upstream.
Alan Somers [Wed, 9 Oct 2019 17:36:57 +0000 (17:36 +0000)]
ZFS: in the tests, don't override PWD
The ZFS test suite was overriding the common $PWD variable with the path to
the pwd command, even though no test wanted to use it that way. Most tests
didn't notice, because ksh93 eventually restored it to its proper meaning.
Alan Somers [Wed, 9 Oct 2019 17:24:09 +0000 (17:24 +0000)]
ZFS: multiple fixes to the zpool_import tests
* Don't create a UFS mountpoint just to store some temporary files. The
tests should always be executed with a sufficiently large TMPDIR.
Creating the UFS mountpoint is not only unneccessary, but it slowed
zpool_import_missing_002_pos greatly, because that test moves large files
between TMPDIR and the UFS mountpoint. This change also allows many of
the tests to be executed with just a single test disk, instead of two.
* Move zpool_import_missing_002_pos's backup device dir from / to $PWD to
prevent cross-device moves. On my system, these two changes improved that
test's speed by 39x. It should also prevent ENOSPC errors seen in CI.
* If insufficient disks are available, don't try to partition one of them.
Just rely on Kyua to skip the test. Users who care will configure Kyua
with sufficient disks.
Ensure the epoch_call() function is not called more than one time
before the callback has been executed, by always checking the
RS_FUNERAL_SCHD flag before invoking epoch_call().
The "rs_number_dead" is balanced again after r353353.
Gleb Smirnoff [Wed, 9 Oct 2019 17:02:28 +0000 (17:02 +0000)]
ip6_output() has a complex set of gotos, and some can jump out of
the epoch section towards return statement. Since entering epoch
is cheap, it is easier to cover the whole function with epoch,
rather than try to properly maintain its state.
Andriy Gapon [Wed, 9 Oct 2019 11:26:36 +0000 (11:26 +0000)]
cleanup of illumos compatibility atomics
atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms.
Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have it.
The only exception is sparc64 that provides MD atomic_cas_32 and
atomic_cas_64.
This is slightly inefficient as fcmpset reports whether the operation
updated the target and that information is not needed for cas.
Nevertheless, there is less code to maintain and to add for new platforms.
Also, the operations are done inline now as opposed to function calls before.
atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms
that provide it.
casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as they
have no users.
atomic_mtx that is used to emulate 64-bit atomics on platforms that lack
them is defined only on those platforms.
As a result, platform specific opensolaris_atomic.S files have lost most of
their code. The only exception is i386 where the compat+contrib code
provides 64-bit atomics for userland use. That code assumes availability of
cmpxchg8b instruction. FreeBSD does not have that assumption for i386
userland and does not provide 64-bit atomics. Hopefully, this can and will
be fixed.
Gleb Smirnoff [Wed, 9 Oct 2019 05:52:07 +0000 (05:52 +0000)]
Revert changes to rip6_bind() from r353292. This function is always
called in syscall context, so it must enter epoch itself. This
changeset originates from early version of the patch, and somehow
slipped to the final version.
Yuri Pankov [Wed, 9 Oct 2019 05:28:10 +0000 (05:28 +0000)]
bsdinstall: fix ESP detection for auto ZFS layout
Pass the list of user selected disks from zfsboot to bootconfig so that
the latter doesn't rely on ESP autodetection that apparently fails for
some cases, e.g. memstick installation with nvme (boot) and sata drives.
While here, fix printing of debug messages in bootconfig.
Mitchell Horne [Wed, 9 Oct 2019 02:02:22 +0000 (02:02 +0000)]
RISC-V: Fix an alignment warning in libthr
Compiling with clang gives a loss-of-alignment error due the cast to
uint8_t *. Since the TLS is always tcb aligned and TP_OFFSET is defined
as sizeof(struct tcb) we can guarantee there is no misalignment. Silence
the error by moving the offset into the inline assembly.
Mark Johnston [Tue, 8 Oct 2019 23:34:48 +0000 (23:34 +0000)]
Fix handling of empty SCM_RIGHTS messages.
As unp_internalize() processes the input control messages, it builds
an output mbuf chain containing the internalized representations of
those messages. In one special case, that of an empty SCM_RIGHTS
message, the message is simply discarded. However, the loop which
appends mbufs to the output chain assumed that each iteration would
produce an mbuf, resulting in a null pointer dereference if an empty
SCM_RIGHTS message was followed by a non-empty message.
Fix this by advancing the output mbuf chain tail pointer only if an
internalized control message was produced.
Reported by: syzbot+1b5cced0f7fad26ae382@syzkaller.appspotmail.com
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
John Baldwin [Tue, 8 Oct 2019 21:40:42 +0000 (21:40 +0000)]
Add support for KTLS in the Chelsio TOE module.
This adds a TOE hook to allocate a KTLS session. It also recognizes
TLS mbufs in the socket buffer and sends those to the NIC using a TLS
work request to encrypt the record before segmenting it.
TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in
addition to enabling KTLS.
Brooks Davis [Tue, 8 Oct 2019 21:39:51 +0000 (21:39 +0000)]
msun: Silence new harmless -Wimplicit-int-float-conversion warnings
Clang from trunk recently added a warning for when implicit int-to-float
conversions cause a loss of precision. The code in question is designed
to be able to handle that, so add explicit casts to silence this.
John Baldwin [Tue, 8 Oct 2019 21:34:06 +0000 (21:34 +0000)]
Add a TOE KTLS mode and a TOE hook for allocating TLS sessions.
This adds the glue to allocate TLS sessions and invokes it from
the TLS enable socket option handler. This also adds some counters
for active TOE sessions.
The TOE KTLS mode is returned by getsockopt(TLSTX_TLS_MODE) when
TOE KTLS is in use on a socket, but cannot be set via setsockopt().
To simplify various checks, a TLS session now includes an explicit
'mode' member set to the value returned by TLSTX_TLS_MODE. Various
places that used to check 'sw_encrypt' against NULL to determine
software vs ifnet (NIC) TLS now check 'mode' instead.
Brooks Davis [Tue, 8 Oct 2019 21:14:09 +0000 (21:14 +0000)]
Fix various -Wpointer-compare warnings
This warning (comparing a pointer against a zero character literal
rather than NULL) has existed since GCC 7.1.0, and was recently added to
Clang trunk.
Almost all of these are harmless, except for fwcontrol's str2node, which
needs to both guard against dereferencing a NULL pointer (though in
practice it appears none of the callers will ever pass one in), as well
as ensure it doesn't parse the empty string as node 0 due to strtol's
awkward interface.
John Baldwin [Tue, 8 Oct 2019 20:22:05 +0000 (20:22 +0000)]
Set the FID field in lookaside crypto requests to the rx queue ID.
The PCI block in the adapter requires this field to be set to a valid
queue ID. It is not clear why it did not fail on all machines, but
the effect was that crypto operations reading input data via DMA
failed with an internal PCI read error on machines with 128G or more
of RAM.
As noted by the commit message, callouts are now persistant
and should not be in the auto-zero section of the RQ's and SQ's.
This fixes an assert when using the TX completion event
factor feature with mlx5en(4).
Found by: gallatin@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Glen Barber [Tue, 8 Oct 2019 18:58:23 +0000 (18:58 +0000)]
Rework the logic for installing the pkg(8) configuration.
'quarterly' package sets do not exist for head, so explicitly
install the 'latest' configuration file there. Otherwise,
fall back to the original conditional evaluation to determine
if the 'latest' or 'quarterly' configuration file should be
installed.
Reported by: manu
Reviewed by: manu
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
Dimitry Andric [Tue, 8 Oct 2019 18:21:33 +0000 (18:21 +0000)]
Prepare for merging back to head:
* Set tentative merge date
* Add UPDATING entry
* Bump __FreeBSD_version
* Bump FREEBSD_CC_VERSION
* Bump LLD_REVISION
Gleb Smirnoff [Tue, 8 Oct 2019 17:55:45 +0000 (17:55 +0000)]
Remove epoch assertion from if_setlladdr(). Originally this function was
protected by IF_ADDR_LOCK(), which was a mutex, so that two simultaneous
if_setlladdr() can't execute. Later it was switched to IF_ADDR_RLOCK(),
likely by a mistake. Later it was switched to NET_EPOCH_ENTER(). Then I
incorrectly added NET_EPOCH_ASSERT() here.
In reality ifp->if_addr never goes away and never changes its length. So,
doing bcopy() in it is always "safe", meaning it won't dereference a wrong
pointer or write into someone's else memory. Of course doing two bcopy() in
parallel would result in a mess of two addresses, but net epoch doesn't
protect against that, neither IF_ADDR_RLOCK() did.
So for now, just remove the assertion and leave for later a proper fix.
Gleb Smirnoff [Tue, 8 Oct 2019 16:45:56 +0000 (16:45 +0000)]
In DIAGNOSTIC block of if_delmulti_ifma_flags() enter the network epoch.
This quickly plugs the regression from r353292. The locking of multicast
definitely needs a broader review today...
Mark Johnston [Tue, 8 Oct 2019 15:03:48 +0000 (15:03 +0000)]
Avoid erroneously clearing PGA_WRITEABLE in riscv's pmap_enter().
During a CoW fault, we must check for both 4KB and 2MB mappings before
clearing PGA_WRITEABLE on the old mapping's page. Previously we were
only checking for 4KB mappings. This was missed in r344106.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Eric van Gyzen [Tue, 8 Oct 2019 13:43:05 +0000 (13:43 +0000)]
Fix problems in the kern_maxfiles__increase test
ATF functions such as ATF_REQUIRE do not work correctly in child processes.
Use plain C functions to report errors instead.
In the parent, check for the untimely demise of children. Without this,
the test hung until the framework's timeout.
Raise the resource limit on the number of open files. If this was too low,
the test hit the two problems above.
Restore the kern.maxfiles sysctl OID in the cleanup function.
The body prematurely removed the symlink in which the old value was saved.
Make the test more robust by opening more files. In fact, due to the
integer division by 4, this was necessary to make the test valid with
some initial values of maxfiles. Thanks, asomers@.
wait() for children instead of sleeping.
Clean up a temporary file created by the test ("afile").
Andriy Gapon [Tue, 8 Oct 2019 11:27:48 +0000 (11:27 +0000)]
zfs: use atomic_load_64 to read atomic variable in dmu_object_alloc_impl
As long as we support ZFS on 32-bit platforms we should do this for all
64-bit variables that are modified in a lockless fashion using atomic
operations. Otherwise, there is a risk of a reading a torn value.
Here is a rationale for why I am doing this in dmu_object_alloc_impl:
- it's very recent code
- the code deals with object IDs and a number of objects in a file
system can overflow 32 bits
- incorrect allocation of an object ID may result in hard to debug
problems
- fixing all plain reads of 64-bit atomic variables is not a trivial
undertaking to do in one shot, so I chose to do it incrementally
Michael Tuexen [Tue, 8 Oct 2019 11:07:16 +0000 (11:07 +0000)]
Validate length before use it, not vice versa.
r353060 should have contained this...
This fixes
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18070
MFC after: 3 days
Make sure the vnet_shutdown field is not set until after all
VNET_SYSUNINIT()'s in the SI_SUB_VNET_DONE subsystem have been
executed. Especially the vnet_if_return() functions requires that
if_move() is still operational.
Doug Moore [Tue, 8 Oct 2019 07:14:21 +0000 (07:14 +0000)]
Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Justin Hibbits [Tue, 8 Oct 2019 01:36:34 +0000 (01:36 +0000)]
powerpc: Implement atomic_(f)cmpset_ for short and char
|
This adds two implementations for each atomic_fcmpset_ and atomic_cmpset_
short and char functions, selectable at compile time for the target
architecture. By default, it uses a generic shift-and-mask to perform atomic
updates to sub-components of 32-bit words from <sys/_atomic_subword.h>.
However, if ISA_206_ATOMICS is defined it uses the ll/sc instructions for
halfword and bytes, introduced in PowerISA 2.06. These instructions are
supported by all IBM processors from POWER7 on, as well as the Freescale/NXP
e6500 core. Although the e5500 and e500mc both implement PowerISA 2.06 they
do not implement these instructions.
As part of this, clean up the atomic_(f)cmpset_acq and _rel wrappers, by
using macros to reduce code duplication.
ISA_206_ATOMICS requires clang or newer binutils (2.20 or later).
Mateusz Guzik [Mon, 7 Oct 2019 23:19:09 +0000 (23:19 +0000)]
vm: stop trylocking page queues in vm_page_pqbatch_submit
About 11 minutes of poudriere -s -j 104 and probing on return value of
trylocks reveals that over 10% of attempts fail, which in turn means
there are more atomics performed than necessary.
Trylocking was there to try preventing migration, but it's not very likely
to happen if the lock is uncontested.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21925
Gleb Smirnoff [Mon, 7 Oct 2019 22:40:05 +0000 (22:40 +0000)]
Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Michael Tuexen [Mon, 7 Oct 2019 20:35:04 +0000 (20:35 +0000)]
In r343587 a simple port filter as sysctl tunable was added to siftr.
The new sysctl was not added to the siftr.4 man page at the time.
This updates the man page, and removes one left over trailing whitespace.
Submitted by: Richard Scheffenegger
Reviewed by: bcr@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21619
Alan Somers [Mon, 7 Oct 2019 20:21:23 +0000 (20:21 +0000)]
ZFS: fix the redundancy tests
* Fix force_sync_path, which ensures that a file is fully flushed to disk.
Apparently "zpool history"'s performance has improved, but exporting and
importing the pool still works.
* Fix file_dva by using undocumented zdb syntax to clarify that we're
interested in the pool's root file system, not the pool itself. This
should also fix the zpool_clear_001_pos test.
* Remove a redundant cleanup step
Alan Somers [Mon, 7 Oct 2019 20:13:49 +0000 (20:13 +0000)]
ZFS: fix the delegate tests
These tests have never worked correctly
* Replace runwattr with sudo
* Fix a scoping bug with the "dtst" variable
* Cleanup user properties created during tests
* Eliminate the checks for refreservation and send support. They will always
be supported.
* Fix verify_fs_snapshot. It seemed to assume that permissions would not yet
be delegated, but that's not how it's actually used.
* Combine verify_fs_promote with verify_vol_promote
* Remove some useless sleeps
* Fix backwards condition in verify_vol_volsize
* Remove some redundant cleanup steps in the tests. cleanup.ksh will handle
everything.
* Disable some parts of the tests that FreeBSD doesn't support:
* Creating snapshots with mkdir
* devices
* shareisci
* sharenfs
* xattr
* zoned
The sharenfs parts could probably be reenabled with more work to remove the
Solarisms.