dougm [Mon, 14 Oct 2019 17:15:42 +0000 (17:15 +0000)]
Move the definition of _vm_map_assert_consistent so that it can use
vm_map_free_{left,right} rather than re-implementing them. Use the
VM_MAP_FOREACH macro where applicable. Fix some indentation.
Suggested by: kib (in a comment on D21964)
Tested by: pho (as part of D21964)
Differential Revision: https://reviews.freebsd.org/D22011
luporl [Mon, 14 Oct 2019 13:04:04 +0000 (13:04 +0000)]
[PPC64] Initial kernel minidump implementation
Based on the POWER9BSD implementation, with all POWER9-specific code removed
and new methods added to the PPC64 MMU interface to isolate platform-specific
code. Currently, the new methods are implemented on pseries and PowerNV
(D21643).
jhibbits [Sun, 13 Oct 2019 19:33:00 +0000 (19:33 +0000)]
powerpc/pmap: Tighten condition for removing tracked pages in Book-E pmap
There are cases where there's no vm_page_t structure for a given physical
address, such as the CCSR. In this case, trying to obtain the
md.page_tracked struct member would lead to a NULL dereference, and panic.
Tighten this up by checking for kernel_pmap AND that the page structure
actually exists before dereferencing. The flag can only be set when it's
tracked in the kernel pmap anyway.
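The tightened condition can be sketched as below. This is a minimal userland illustration, not the Book-E pmap code: the `pmap_sketch` and `page_sketch` types, the `kernel_pmap_sketch` object, and `page_is_tracked()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pmap and vm_page types. */
struct pmap_sketch {
	int	id;
};

struct page_sketch {
	bool	tracked;
};

static struct pmap_sketch kernel_pmap_sketch;

/*
 * Return true only when the mapping belongs to the kernel pmap, a page
 * structure exists for the address, and that page is marked as tracked.
 * The NULL test precedes the member access, so an address without a
 * page structure is never dereferenced.
 */
static bool
page_is_tracked(const struct pmap_sketch *pmap, const struct page_sketch *m)
{
	assert(pmap != NULL);
	return (pmap == &kernel_pmap_sketch && m != NULL && m->tracked);
}
```

Because `&&` short-circuits, the order of the three tests is what prevents the NULL dereference.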
tuexen [Sun, 13 Oct 2019 18:17:08 +0000 (18:17 +0000)]
Use an event handler to notify the SCTP about IP address changes
instead of calling an SCTP specific function from the IP code.
This is a requirement of supporting SCTP as a kernel loadable module.
This patch was developed by markj@; I tweaked the SCTP-related code
a bit.
markj [Sun, 13 Oct 2019 16:14:04 +0000 (16:14 +0000)]
Move SCTP DTrace probe definitions into a .c file.
Previously they were defined in a header which was included exactly
once. Change this to follow the usual practice of putting definitions
in C files. No functional change intended.
Discussed with: tuexen
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
kib [Sun, 13 Oct 2019 06:56:45 +0000 (06:56 +0000)]
Restore nofaulting operations after r352807
The TDP_NOFAULTING flag should be checked in vm_fault(), not in
vm_fault_trap(). Otherwise, kernel accesses to userspace, such as
vn_io_fault(), enter vm locking when they should not.
Reported and tested by: pho
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D21992
scottl [Sun, 13 Oct 2019 05:11:53 +0000 (05:11 +0000)]
Fix the botched field ordering in the last commit. While here, fix
whitespace, and also reorder the fields so they are easier to read on
an 80 column display (the lines wrapped even before these changes).
Also fix non-standard nomenclature in the Caps code, and update the
man page.
bdragon [Sat, 12 Oct 2019 23:16:17 +0000 (23:16 +0000)]
Fix read past end of struct in ncsw glue code.
The logic in XX_IsPortalIntr() was reading past the end of XX_PInfo.
This was causing it to erroneously return 1 instead of 0 in some
circumstances, causing a panic on the AmigaOne X5000 due to mixing
exclusive and nonexclusive interrupts on the same interrupt line.
Since this code is only called a couple of times during startup, use
a simple double loop instead of the complex read-ahead single loop.
This also fixes a bug where it would never check cpu=0 on type=1.
scottl [Sat, 12 Oct 2019 22:27:57 +0000 (22:27 +0000)]
Change from the non-standard nomenclature of "chip" and "card" to the
standard nomenclature of "device" and "vendor" with the "sub" variants.
This changes the printed format, so anything that scrapes and parses
this will need to be adapted. No compatibility shims are provided,
but this will not be MFC'd.
mav [Sat, 12 Oct 2019 19:03:07 +0000 (19:03 +0000)]
Allocate device softc from the device domain.
Since we are trying to bind device interrupt threads to the device domain,
it makes sense to make the memory they access often local. If the domain
is not known, fall back to round-robin.
tuexen [Sat, 12 Oct 2019 17:57:03 +0000 (17:57 +0000)]
Ensure that local variables are reset to their initial value when
dealing with error cases in a loop over all remote addresses.
This issue was found and reported by OSS-Fuzz in:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18080
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18086
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18121
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18163
kib [Fri, 11 Oct 2019 18:41:24 +0000 (18:41 +0000)]
devfs_vptocnp(): correct the component name when node is not at top.
The node's cdp.si_name is the full path as provided by make_dev(9); it
should not be returned by VOP_VPTOCNP() when only the last component
is requested. Use the dirent entry instead.
With this note, handling of VDIR and VCHR nodes only differs in
handling of root vnode, which simplifies and unifies the logic.
Reported by: Li, Zhichao1 <Zhichao_Li1@Dell.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kp [Fri, 11 Oct 2019 17:04:38 +0000 (17:04 +0000)]
mountroot: run statfs after mounting devfs
The usual flow for mounting a file system is to VFS_MOUNT() and then
immediately VFS_STATFS().
That's not done in vfs_mountroot_devfs(), which means the
mp->mnt_stat.f_iosize field is not correctly populated, which in turn
causes us to mark valid aio operations as unsafe (because the io size is
set to 0), ultimately causing the aio_test:md_waitcomplete test to fail.
avg [Fri, 11 Oct 2019 17:01:02 +0000 (17:01 +0000)]
fix up r353340, don't assume that fcmpset has strong semantics
fcmpset can have two kinds of semantics, weak and strong.
For practical purposes, strong semantics means that if fcmpset fails
then the reported current value is always different from the expected
value. Weak semantics means that the reported current value may be the
same as the expected value even though fcmpset failed. That's a
so-called "sporadic" failure.
I originally implemented atomic_cas expecting strong semantics, but many
platforms actually have weak semantics.
Reported by: pkubaj (not confirmed if same issue)
Discussed with: kib, mjg
MFC after: 19 days
X-MFC with: r353340
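The distinction can be illustrated with C11 atomics, whose weak compare-exchange behaves like a weak fcmpset: on failure it stores the observed value into the "expected" argument, and it may fail sporadically. This is a sketch under that assumption, not the code from the commit; `cas_u64()` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare-and-swap with strong semantics built on a weak primitive.
 * Returns the value observed in *target: equal to `expected` when the
 * swap happened, different from `expected` on a genuine failure.
 */
static uint64_t
cas_u64(_Atomic uint64_t *target, uint64_t expected, uint64_t newval)
{
	uint64_t observed = expected;

	assert(target != NULL);
	/*
	 * A weak compare-exchange may fail while leaving `observed` equal
	 * to `expected` (a sporadic failure).  Retry in that case and stop
	 * only when the observed value really differs.
	 */
	while (!atomic_compare_exchange_weak(target, &observed, newval)) {
		if (observed != expected)
			break;
	}
	return (observed);
}
```

Without the `observed != expected` test, a sporadic failure would be reported to the caller as if the value had changed.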
asomers [Fri, 11 Oct 2019 14:59:28 +0000 (14:59 +0000)]
MFZol: Fix performance of "zfs recv" with many deletions
This patch fixes 2 issues with the DMU free throttle implemented
in dmu_free_long_range(). The first issue is that get_next_chunk()
was calculating the number of L1 blocks the free would dirty
incorrectly. In some cases involving extremely large files, this
code would greatly overestimate the number of affected L1 blocks,
causing excessive calls to txg_wait_open(). This patch corrects
the calculation.
The second issue is that the free throttle uses the total number
of free'd blocks in all (open, quiescing, and syncing) txgs to
determine whether to throttle. This causes large frees (such as
those created by the first issue) to cause 4 txg syncs before
any further frees were allowed to proceed. This patch ensures
that the accounting is done entirely in a per-txg fashion, so
that frees from a given txg don't affect those that immediately
follow it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
zfsonlinux/zfs@f4c594da94d856c422512a54e48070f890b2685b
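Per-txg accounting of this kind can be sketched as follows. All names, the slot count, and the limit parameter are hypothetical and chosen for illustration; this is not the dmu_free_long_range() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_TXG_SIZE	4	/* hypothetical number of txg slots */
#define	SKETCH_TXG_MASK	(SKETCH_TXG_SIZE - 1)

struct free_acct_sketch {
	/* L1 blocks dirtied by frees, tracked separately for each txg. */
	uint64_t	dirty_l1[SKETCH_TXG_SIZE];
};

/* Charge a free to the txg it was assigned to. */
static void
note_free(struct free_acct_sketch *acct, uint64_t txg, uint64_t l1_blocks)
{
	assert(acct != NULL);
	acct->dirty_l1[txg & SKETCH_TXG_MASK] += l1_blocks;
}

/* Decide using only the given txg's counter, not a global total. */
static bool
should_throttle(const struct free_acct_sketch *acct, uint64_t txg,
    uint64_t limit)
{
	assert(acct != NULL);
	return (acct->dirty_l1[txg & SKETCH_TXG_MASK] >= limit);
}

/* Reset the slot once the txg has synced so it can be reused. */
static void
txg_synced(struct free_acct_sketch *acct, uint64_t txg)
{
	assert(acct != NULL);
	acct->dirty_l1[txg & SKETCH_TXG_MASK] = 0;
}
```

With a single global counter, a large free in one txg would keep the throttle active for the txgs that follow; with per-txg slots, the next txg starts from its own count.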
Freeing throttle should account for holes
Deletion throttle currently does not account for holes in a file.
This means that it can activate when it shouldn't.
To fix it we switch the throttle to be based on the number of
L1 blocks we will have to dirty when freeing
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
zfsonlinux/zfs@65282ee9e06b130f1f0169baf5d9bf0dd8fc1ef9
mjg [Fri, 11 Oct 2019 14:57:47 +0000 (14:57 +0000)]
amd64 pmap: handle fictitious mappings with addresses beyond pv_table
There are provisions to do it already with pv_dummy, but the new locking code
did not account for it. The previous code did not have the problem because
it hashed the address into the lock array.
While here annotate common vars with __read_mostly and __exclusive_cache_line.
Reported by: Thomas Laus
Tested by: jkim, Thomas Laus
Fixes: r353149 ("amd64 pmap: implement per-superpage locks")
Sponsored by: The FreeBSD Foundation
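The fallback idea can be sketched as below: a lookup indexed by physical address returns a shared dummy slot when the address lies beyond the table, so the locking path never indexes out of bounds. The table size, types, and function names are hypothetical, not the amd64 pmap code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_SUPERPAGE_SHIFT	21	/* 2 MB superpages */
#define	SKETCH_TABLE_ENTRIES	4	/* hypothetical, tiny for illustration */

struct pv_slot_sketch {
	int	lock_count;	/* placeholder for a real lock */
};

static struct pv_slot_sketch pv_table_sketch[SKETCH_TABLE_ENTRIES];
static struct pv_slot_sketch pv_dummy_sketch;

/* Map a physical address to its slot, or to the dummy if out of range. */
static struct pv_slot_sketch *
pa_to_slot(uint64_t pa)
{
	uint64_t idx = pa >> SKETCH_SUPERPAGE_SHIFT;

	if (idx >= SKETCH_TABLE_ENTRIES)
		return (&pv_dummy_sketch);
	return (&pv_table_sketch[idx]);
}

/* "Lock" the slot covering pa; always operates on a valid slot. */
static void
slot_lock(uint64_t pa)
{
	struct pv_slot_sketch *slot = pa_to_slot(pa);

	assert(slot != NULL);
	slot->lock_count++;
}
```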
avg [Fri, 11 Oct 2019 11:13:47 +0000 (11:13 +0000)]
add superio.4 and superio.9 manual pages
This adds basic documentation on what the superio driver is and how
other drivers can interact with it. I decided to also document
superio's ivar accessors.
cem [Fri, 11 Oct 2019 06:02:03 +0000 (06:02 +0000)]
Fix braino in r353429
cy@ points out that I got parameter order backwards between definition and
invocation of the helper function. He is totally correct. The earlier
version of this patch predated the XFree column so this is one I introduced,
rather than the original author.
cem [Fri, 11 Oct 2019 01:31:31 +0000 (01:31 +0000)]
ddb: Add CSV option, sorting to 'show (malloc|uma)'
Add /i option for machine-parseable CSV output. This allows ready copy/
pasting into more sophisticated tooling outside of DDB.
Add total zone size ("Memory Use") as a new column for UMA.
For both, sort the displayed list on size (print the largest zones/types
first). This is handy for quickly diagnosing "where has my memory gone?" at
a high level.
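Sorting by size and emitting one CSV row per entry can be sketched in userland C as follows. The struct, the two-column layout, and the function names are assumptions for illustration; the actual DDB output columns and in-kernel sorting are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-zone (or per-malloc-type) summary. */
struct zone_stat_sketch {
	const char	*name;
	unsigned long	bytes;
};

/* qsort comparator: larger sizes sort first. */
static int
cmp_bytes_desc(const void *a, const void *b)
{
	const struct zone_stat_sketch *x = a, *y = b;

	return ((x->bytes < y->bytes) - (x->bytes > y->bytes));
}

static void
sort_by_size(struct zone_stat_sketch *z, size_t n)
{
	assert(z != NULL);
	qsort(z, n, sizeof(*z), cmp_bytes_desc);
}

/* Format one entry as "name,bytes\n"; returns snprintf()'s result. */
static int
format_csv_row(char *buf, size_t len, const struct zone_stat_sketch *z)
{
	assert(buf != NULL && z != NULL);
	return (snprintf(buf, len, "%s,%lu\n", z->name, z->bytes));
}
```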
glebius [Thu, 10 Oct 2019 23:44:56 +0000 (23:44 +0000)]
Add two extra functions that return the count of addresses
on an interface. Such functions could have been implemented on top of
if_foreach_llm?addr(), but several drivers need counting,
so avoid copy-and-paste inside the drivers.
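How a counting helper can sit on top of a foreach-style iterator is sketched below. The interface type, callback signature, and function names are hypothetical and do not reproduce the actual KPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface that hides its address storage from drivers. */
struct ifnet_sketch {
	const unsigned	*addrs;
	size_t		naddrs;
};

/* Callback returns how many addresses it accepted (0 or 1). */
typedef unsigned (*addr_cb_t)(void *arg, unsigned addr, unsigned count);

/* Visit every address; the return value is the sum of callback results. */
static unsigned
foreach_addr(const struct ifnet_sketch *ifp, addr_cb_t cb, void *arg)
{
	unsigned count = 0;

	assert(ifp != NULL && cb != NULL);
	for (size_t i = 0; i < ifp->naddrs; i++)
		count += cb(arg, ifp->addrs[i], count);
	return (count);
}

/* Trivial callback that accepts every address. */
static unsigned
count_cb(void *arg, unsigned addr, unsigned count)
{
	(void)arg;
	(void)addr;
	(void)count;
	return (1);
}

/* Shared counting helper, so each driver need not write count_cb itself. */
static unsigned
addr_count(const struct ifnet_sketch *ifp)
{
	return (foreach_addr(ifp, count_cb, NULL));
}
```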
glebius [Thu, 10 Oct 2019 23:42:55 +0000 (23:42 +0000)]
Provide a new KPI for network drivers to access lists of interface
addresses. The KPI reveals neither how addresses are stored nor
how access to them is synchronized, and it does not expose struct ifaddr
or struct ifmaddr.
cem [Thu, 10 Oct 2019 22:49:45 +0000 (22:49 +0000)]
nvdimm(4): Calculate and save memattr once; it never changes
Refactor nvdimm_spa_memattr() routine and callers to just save the value at
initialization and use the value directly. The reference value from NFIT,
MemoryMapping, is read only once, so the associated memattr could never
change.
dim [Thu, 10 Oct 2019 20:33:55 +0000 (20:33 +0000)]
Pull in r374444 from upstream lldb trunk (by me):
Fix process launch failure on FreeBSD after r365761
Summary:
After rLLDB365761, and with `LLVM_ENABLE_ABI_BREAKING_CHECKS`
enabled, launching any process on FreeBSD crashes lldb with:
```
Expected<T> must be checked before access or destruction.
Expected<T> value was in success state. (Note: Expected<T> values in
success mode must still be checked prior to being destroyed).
```
This is because `m_operation_thread` and `m_monitor_thread` were
wrapped in `llvm::Expected<>`, but this requires the objects to be
correctly initialized before accessing them.
To fix the crashes, use `llvm::Optional<>` for the members (as
indicated by labath), and use local variables to store the return
values of `LaunchThread` and `StartMonitoringChildProcess`. Then,
only assign to the member variables after checking if the return
values indicated success.
dim [Thu, 10 Oct 2019 20:30:54 +0000 (20:30 +0000)]
Revert r353363 in preparation for applying upstream fix:
Put in a band-aid fix for lldb 9 exiting with "Expected<T> must be
checked before access or destruction" when launching executables, while
we sort this out with upstream.
brooks [Thu, 10 Oct 2019 16:29:13 +0000 (16:29 +0000)]
Fix -DNO_CLEAN build across r353340 and r353381
opensolaris_atomic.S is now only used on i386 with opensolaris_atomic.c
used on other platforms. After r353381 it doesn't exist on those
platforms so the stale dependency would result in a build error.
avg [Thu, 10 Oct 2019 07:39:41 +0000 (07:39 +0000)]
emulate illumos membar_producer with atomic_thread_fence_rel
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was
inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop on i386 (despite TSO).
On powerpc64 we now use the recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this
is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
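The publish pattern that a producer barrier supports can be sketched with C11 atomics, where `atomic_thread_fence(memory_order_release)` plays the role described for atomic_thread_fence_rel. This is an illustration under that assumption; `membar_producer_sketch()`, `publish()`, and `try_consume()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;		/* regular memory written by the producer */
static atomic_int ready;	/* flag that publishes the payload */

/* Release fence: earlier loads and stores are ordered before later stores. */
static inline void
membar_producer_sketch(void)
{
	atomic_thread_fence(memory_order_release);
}

static void
publish(int value)
{
	payload = value;		/* 1. write the data */
	membar_producer_sketch();	/* 2. order it before the flag store */
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out if the payload has been published. */
static int
try_consume(int *out)
{
	assert(out != NULL);
	if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
		return (0);
	/* Pair the producer's release fence with an acquire fence. */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return (1);
}
```

A release fence is stronger than a pure store-store barrier because it also orders earlier loads, which matches the "(load | store) -> store" description.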
ambrisko [Thu, 10 Oct 2019 03:12:17 +0000 (03:12 +0000)]
This driver attaches to the Intel VMD drive and connects a new PCI domain
starting at the max. domain, and then works down. Existing FreeBSD
drivers will then attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC uses Intel metadata, so graid(8) works with it. However, graid(8)
supports RAID 0, 1, and 10 for read and write. I have some early code to
support writes with RAID 5. Note that RAID 5 can have lifetime issues
with SSDs since it can cause write amplification from updating the parity
data.
Hot plug support needs a change to skip the following check to work:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
jhb [Wed, 9 Oct 2019 21:20:39 +0000 (21:20 +0000)]
Don't free the cursor boundary tag during vmem_destroy().
The cursor boundary tag is statically allocated in the vmem instead of
from the vmem_bt_zone. Explicitly remove it from the vmem's segment
list in vmem_destroy before freeing all the segments from the vmem.
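The ordering matters because an embedded object must never reach the allocator's free routine. A minimal userland sketch, with a hypothetical `arena_sketch` type standing in for the vmem and malloc/free standing in for the zone allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct tag_sketch {
	struct tag_sketch	*next;
};

struct arena_sketch {
	struct tag_sketch	*head;		/* segment list */
	struct tag_sketch	cursor;		/* embedded, never heap-allocated */
};

static void
arena_init(struct arena_sketch *a)
{
	assert(a != NULL);
	a->cursor.next = NULL;
	a->head = &a->cursor;	/* the cursor lives on the segment list */
}

/* Add one heap-allocated tag at the head of the list. */
static int
arena_add(struct arena_sketch *a)
{
	struct tag_sketch *t = malloc(sizeof(*t));

	if (t == NULL)
		return (-1);
	t->next = a->head;
	a->head = t;
	return (0);
}

/* Returns the number of tags passed to free(). */
static size_t
arena_destroy(struct arena_sketch *a)
{
	struct tag_sketch **tp, *t;
	size_t freed = 0;

	/* Unlink the embedded cursor first so it is never freed. */
	for (tp = &a->head; *tp != NULL; tp = &(*tp)->next) {
		if (*tp == &a->cursor) {
			*tp = a->cursor.next;
			break;
		}
	}
	/* Everything left on the list came from malloc(). */
	while ((t = a->head) != NULL) {
		a->head = t->next;
		free(t);
		freed++;
	}
	return (freed);
}
```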
imp [Wed, 9 Oct 2019 21:18:46 +0000 (21:18 +0000)]
Wordsmith and simplify
Simplify expressions as suggested by jhb. The extra indirection made
sense in earlier versions of this patch, but not the final one.
While here, apply suggestion from emaste for wording of universe.
Also wordsmith awkwardly worded comment about when we effectively
neuter the universe build for an architecture.
Once llvm 9.0 has been vetted for mips and powerpc, I'll take them out
of these lists.
imp [Wed, 9 Oct 2019 21:02:06 +0000 (21:02 +0000)]
Fix casting error from newer gcc
Cast the pointers to (uintptr_t) before assigning to type
uint64_t. This eliminates an error from gcc when we cast the pointer
to a larger integer type.
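The cast pattern looks like this in isolation. `ptr_to_u64()` is a hypothetical helper for illustration, not a function from the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a pointer to a 64-bit integer in two steps: first to uintptr_t,
 * the integer type sized for pointers, and then widen to uint64_t.
 * Casting a 32-bit pointer straight to uint64_t is what newer gcc
 * diagnoses as a cast to an integer of different size.
 */
static uint64_t
ptr_to_u64(const void *p)
{
	/* The widening step assumes pointers are at most 64 bits wide. */
	assert(sizeof(uintptr_t) <= sizeof(uint64_t));
	return ((uint64_t)(uintptr_t)p);
}
```

On platforms with 64-bit pointers both casts are no-ops; on 32-bit platforms the second cast zero-extends the value.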