The `zfs userspace` squashes all entries with unresolved numeric
values into a single output entry due to the comparsion always
made by the string name which is empty in case of unresolved IDs.
Fix this by falling to a numerical comparison when either one
of string values is not found. This then compares any numerical
values after all with a name resolved.
Signed-off-by: Pavel Boldin <boldin.pavel@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported by: clusteradm
Obtained from: ZFS-on-Linux
MFC after: 3 days
jhibbits [Sat, 18 May 2019 11:14:43 +0000 (11:14 +0000)]
powerpc64/pmap: NUMA-ize the page pv lock pool to reduce contention
It was found during building llvm that the page pv lock pool was seeing very
high contention. Since the pmap is already NUMA aware, it was surmised that
the domains were referencing similar pages in the different domains. This
reduces contention to the point of noise in a lockstat(8) run (~51% down to
under 5%), reducing build times by up to 20%.
This doesn't do a perfect domain alignment, just a best-guess based on
hardware available, that the domain is roughly specified in the upper bits
of the PA. Trying to be more clever would more than likely result in
reduced performance just on the work needed.
markj [Sat, 18 May 2019 02:02:14 +0000 (02:02 +0000)]
Use M_NEXTFIT in memguard(9).
memguard(9) wants to avoid reuse of freed addresses for as long as
possible. Previously it maintained a racily updated cursor which was
passed to vmem_xalloc(9) as the minimum address. However, vmem will
not in general return the lowest free address in the arena, so this
trick only really works until the cursor has wrapped around the first
time.
markj [Sat, 18 May 2019 01:46:38 +0000 (01:46 +0000)]
Implement the M_NEXTFIT allocation strategy for vmem(9).
This is described in the vmem paper: "directs vmem to use the next free
segment after the one previously allocated." The implementation adds a
new boundary tag type, M_CURSOR, which is linked into the segment list
and precedes the segment following the previous M_NEXTFIT allocation.
The cursor is used to locate the next free segment satisfying the
allocation constraints.
This implementation isn't O(1) since busy tags aren't coalesced, and we
may potentially scan the entire segment list during an M_NEXTFIT
allocation.
cem [Sat, 18 May 2019 00:22:28 +0000 (00:22 +0000)]
Add DragonFly's partition number to fdisk(8) and diskmbr.h
This change doesn't make any attempt to add support for these slices to the
relevent GEOM classes. Just register the number in fdisk and the canonical
list of kernel macros (diskmbr.h).
stevek [Fri, 17 May 2019 18:13:43 +0000 (18:13 +0000)]
Obtain a shared lock instead of exclusive in the MAC/veriexec
MAC_VERIEXEC_CHECK_PATH_SYSCALL per-MAC policy system call.
When we are checking the status of the fingerprint on a vnode using the
per-MAC-policy syscall, we do not need an exclusive lock on the vnode.
Even if there is more than one thread requesting the status at the same time,
the worst we can end up doing is processing the file more than once.
This can potentially be improved in the future with offloading the fingerprint
evaluation to a separate thread and blocking until the update completes. But
for now the race is acceptable.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
stevek [Fri, 17 May 2019 18:06:24 +0000 (18:06 +0000)]
Fix format strings for some debug messages that could have arguments that
are different types across architectures by using %ju and typecasting to
uintmax_t, where appropriate.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
stevek [Fri, 17 May 2019 18:02:26 +0000 (18:02 +0000)]
Protect commands that are considered dangerous with checks for kmem write
priv. This allows for MAC/veriexec to prevent apps that are not "trusted"
from using these commands.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
stevek [Fri, 17 May 2019 17:50:01 +0000 (17:50 +0000)]
Ensure we have obtained a lock on the process before calling
mac_veriexec_get_executable_flags(). Only try locking/unlocking if the caller
has not already acquired the process lock.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
stevek [Fri, 17 May 2019 17:21:32 +0000 (17:21 +0000)]
Instead of individual conditional statements to look for each hypervisor
type, use a table to make it easier to add more in the future, if needed.
Add VirtualBox detection to the table ("VBoxVBoxVBox" is the hypervisor
vendor string to look for.) Also add VM_GUEST_VBOX to the VM_GUEST
enumeration to indicate VirtualBox.
Save the CPUID base for the hypervisor entry that we detected. Driver code
may need to know about it in order to obtain additional CPUID features.
kib [Fri, 17 May 2019 17:11:01 +0000 (17:11 +0000)]
Free microcode memory later.
With lockless DI, pmap_remove() requires operational thread lock,
which is initialized at SI_SUB_RUN_QUEUE for thread0. Move it even
later where APs are started, the moment after which other boot memory
like trampoline stacks is already being freed.
Reported by: gtetlow
Sponsored by: The FreeBSD Foundation
MFC after: 30 days
manu [Fri, 17 May 2019 17:04:01 +0000 (17:04 +0000)]
pci: ecam: Do not warn on mismatch of bus_end
We cannot know the bus end number before parsing the MCFG table
so don't set the bus_end before that. If the MCFG table doesn't
exist we will set the configuration base address based on the _CBA
value and set the bus_end to the maximal number allowed by PCI.
dougm [Fri, 17 May 2019 15:52:17 +0000 (15:52 +0000)]
Implement the ffs and fls functions, and their longer counterparts, in
cpufunc, in terms of __builtin_ffs and the like, for arm64
architectures, and use those, rather than the simple libkern
implementations, in building arm64 kernels.
johalun [Thu, 16 May 2019 21:17:18 +0000 (21:17 +0000)]
LinuxKPI: Finalize import of seq_file.
seq_file.h and linux_seq_file.c was imported form ports earlier but
linux_seq_file.c was never compiled in with the module. With this
commit base seq_file will replace ports seq_file and it required a
few modifications to not break functionality and build.
hrs [Thu, 16 May 2019 19:09:41 +0000 (19:09 +0000)]
Fix hostname to be returned in an ICMPv6 NI Reply message defined
in RFC 4620, ICMPv6 Node Information Queries. A vnet jail with an
IPv6 address sent a hostname of the host environment, not the
jail, even if another hostname was set to the jail.
This change can be tested by the following commands:
johalun [Thu, 16 May 2019 17:44:17 +0000 (17:44 +0000)]
LinuxKPI: Update access_ok macro for v5.0.
Check LINUXKPI_VERSION macro for backwards compatibility.
It's recommended to update any drivers that depend on the older KPI
so we can deprecate < 5.0 code as we update to newer Linux version.
This patch is part of D19565
cem [Thu, 16 May 2019 17:34:36 +0000 (17:34 +0000)]
xdma(4): Fix invalid pointer use (breaks arm.SOCFPGA build)
In xdma_handle_mem_node(), vmem_size_t and vmem_addr_t pointers were passed to
an FDT API that emits u_long values to the output parameter pointer. This
broke on systems with both xdma and 32-bit vmem size/addr types (SOCFPGA).
kib [Thu, 16 May 2019 13:28:48 +0000 (13:28 +0000)]
amd64 pmap: rework delayed invalidation, removing global mutex.
For machines having cmpxcgh16b instruction, i.e. everything but very
early Athlons, provide lockless implementation of delayed
invalidation.
The implementation maintains lock-less single-linked list with the
trick from the T.L. Harris article about volatile mark of the elements
being removed. Double-CAS is used to atomically update both link and
generation. New thread starting DI appends itself to the end of the
queue, setting the generation to the generation of the last element
+1. On DI finish, thread donates its generation to the previous
element. The generation of the fake head of the list is the last
passed DI generation. Basically, the implementation is a queued
spinlock but without spinlock.
Many thanks both to Peter Holm and Mark Johnson for keeping with me
while I produced intermediate versions of the patch.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
MFC note: td_md.md_invl_gen should go to the end of struct thread
Differential revision: https://reviews.freebsd.org/D19630
kib [Thu, 16 May 2019 13:13:33 +0000 (13:13 +0000)]
rtld_malloc.c: cleanup morepages().
Use roundup2() and rounddown2() instead of inlining them.
Get rid of the fd local variable, use literal -1 for the mmap argument.
Use MAP_FAILED as mmap(2) failure indicator.
After that, apply some style.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
rlibby [Thu, 16 May 2019 05:29:54 +0000 (05:29 +0000)]
db show thread: avoid overflow in tick conversion
The previous calculations for displaying the time since last switch
easily overflowed, after less than 36 min for hz=1000. Now overflow
takes 2000 times longer (as long as ticks takes to wrap).
rlibby [Thu, 16 May 2019 04:24:08 +0000 (04:24 +0000)]
iommu static analysis cleanup
A static analyzer complained about a couple instances of checking a
variable against NULL after already having dereferenced it.
- dmar_gas_alloc_region: remove the tautological NULL checks
- dmar_release_resources / dmar_fini_fault_log: don't deref unit->regs
unless initialized.
And while here, fix an inverted initialization check in dmar_fini_qi.
cy [Thu, 16 May 2019 02:41:25 +0000 (02:41 +0000)]
The driver list prints "(null)" for the NDIS driver when -h (help) or
an unknown switch is passed outputting the command usage. This is
because the NDIS driver is uninitialized when usage help is printed.
To resolve this we initialize the driver prior to the possibility of
printing the usage help message.
Obtained from: The wpa_supplicant port
MFC after: 1 week
kevans [Thu, 16 May 2019 02:11:33 +0000 (02:11 +0000)]
libbe(3): Descend into children of datasets w/ mountpoint=none
These datasets will generally be canmount=noauto,mountpoint=none (e.g.
zroot/var) but have children that may need to be mounted. Instead of
skipping that segment for no good reason, descend.
rmacklem [Tue, 14 May 2019 22:00:47 +0000 (22:00 +0000)]
Replace global list for grouplist with list(s) for each exportlist element.
In mountd.c, the grouplist structures are linked into a single global
linked list headed by "grphead". The only use of this linked list is
to free all list elements when the exportlist elements are also all being
free'd at the time the exports are being reloaded.
This patch replaces this one global linked list head with a list head in
each exportlist structure, where the grouplist elements for that exported
file system are linked.
The only change is that now the grouplist elements are free'd with the
associated exportlist element as they are free'd instead of all grouplist
elements being free'd after the exportlist elements are free'd. This
change should have no effect in practice.
This is being done, since a future patch that will add a "-I" option for
incrementally updating the exports in the kernel needs to know which
grouplist elements are associated with each exported file system and
having them linked into a list headed by the exportlist element does that.
markj [Tue, 14 May 2019 21:30:55 +0000 (21:30 +0000)]
Close some races in multicast socket option handling.
r333175 converted the global multicast lock to a sleepable sx lock,
so the lock order with respect to the (non-sleepable) inp lock changed.
To handle this, r333175 and r333505 added code to drop the inp lock,
but this opened races that could leave multicast group description
structures in an inconsistent state. This change fixes the problem by
simply acquiring the global lock sooner. Along the way, this fixes
some LORs and bogus error handling introduced in r333175, and commits
some related cleanup.
kevans [Tue, 14 May 2019 20:32:29 +0000 (20:32 +0000)]
tuntap: Defer clearing if_softc until after if_detach
r346670 added an sx to close a race between the ifioctl handler and
interface destruction. Unfortunately, it clears if_softc immediately after
the interface is closed, but before if_detach has been invoked.
Any time before detachment, an interface that's part of a bridge may still
receive traffic that's pushed through tunstart/tunstart_l2 and promptly
lead to a panic because if_softc is now NULL.
Fix it by deferring the clearing of if_softc until after the interface has
detached and thus been removed from the bridge. if_softc still gets cleared
in case another thread has already entered the ioctl handler before it's
replaced with ifdead_ioctl.