jhibbits [Mon, 20 May 2019 02:41:09 +0000 (02:41 +0000)]
ksyms: Fixup symbols for powerpc in the kernel, not just modules
Summary:
PowerPC kernels are fully position independent, just like kernel modules.
The same fixups that are done for modules therefore need to be done for the
kernel. Otherwise consumers such as DTrace cannot resolve kernel symbols:
only raw addresses are printed for the kernel, while kernel module
symbols are resolved.
Test Plan:
Run lockstat on powerpc64. Note symbols are resolved for kernel and
modules.
cem [Mon, 20 May 2019 00:38:23 +0000 (00:38 +0000)]
Extract eventhandler declarations to sys/_eventhandler.h
This allows replacing "sys/eventhandler.h" includes with "sys/_eventhandler.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
melifaro [Sun, 19 May 2019 21:49:56 +0000 (21:49 +0000)]
Fix rt_ifa selection during loopback route insertion process.
Currently such routes are added with a link-level IFA, which is
plain wrong. Only after the insertion they get fixed by the special
link_rtrequest() ifa handler. This behaviour complicates routing code
and makes ifa selection more complex.
Streamline this process by explicitly moving link_rtrequest() logic
to the pre-insertion rt_getifa_fib() ifa selector. Avoid calling all
this logic in the loopback route case by explicitly specifying
proper rt_ifa inside the ifa_maintain_loopback_route().
dim [Sun, 19 May 2019 20:13:55 +0000 (20:13 +0000)]
To avoid unnecessarily modifying ports, add a -lgomp symlink, since GCC
does not ship a -lomp symlink. Also update OptionalObsoleteFiles for
this, and add 32-bit variants while here.
dim [Sun, 19 May 2019 19:42:35 +0000 (19:42 +0000)]
Fix OptionalObsoleteFiles copy/paste mistake from r345236, which
connected libomp to the build. The comparison should not have been
against ${MK_OPENSSH}, but against ${MK_OPENMP}, obviously.
ian [Sun, 19 May 2019 16:56:59 +0000 (16:56 +0000)]
Add common support functions for USB devices configured via FDT data.
FDT data is sometimes used to configure usb devices which are hardwired into
an embedded system. Because the devices are instantiated by the usb
enumeration process rather than by ofwbus iterating through the fdt data, it
is somewhat difficult for a usb driver to locate fdt data that belongs to
it. In the past, various ad-hoc methods have been used, which can lead to
errors such as applying configuration that should apply only to a hardwired
device onto a similar device attached by the user at runtime. For example,
if the user adds an ethernet device that uses the same driver as the builtin
ethernet, both devices might end up with the same MAC address.
These changes add a new usb_fdt_get_node() helper function that a driver can
use to locate FDT data that belongs to a single unique instance of the
device. This function locates the proper FDT data using the mechanism
detailed in the standard "usb-device.txt" binding document [1].
There is also a new usb_fdt_get_mac_addr() function, used to retrieve the
mac address for a given device instance from the fdt data. It uses
usb_fdt_get_node() to locate the right node in the FDT data, and attempts to
obtain the mac-address or local-mac-address property (in that order, the
same as Linux does it).
The existing if_smsc driver is modified to use the new functions, both as an
example and for testing the new functions. Rpi and rpi2 boards use this
driver and provide the mac address via the fdt data.
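The property lookup order described in the entry above can be sketched as follows. This is a minimal illustration, not the committed code: `struct sketch_prop`, `SKETCH_ETHER_ADDR_LEN`, and `sketch_get_mac_addr()` are hypothetical names, and the property list is modeled as a plain array rather than real FDT data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one FDT property: a name and its raw bytes. */
struct sketch_prop {
	const char *name;
	const unsigned char *data;
	size_t len;
};

#define SKETCH_ETHER_ADDR_LEN 6

/*
 * Search a node's properties for a MAC address, trying "mac-address"
 * first and "local-mac-address" second.  Returns 0 and fills mac[] on
 * success, or -1 if neither property holds a 6-byte value.
 */
static int
sketch_get_mac_addr(const struct sketch_prop *props, size_t nprops,
    unsigned char mac[SKETCH_ETHER_ADDR_LEN])
{
	static const char *const order[] = {
		"mac-address", "local-mac-address"
	};
	size_t i, j;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		for (j = 0; j < nprops; j++) {
			if (strcmp(props[j].name, order[i]) == 0 &&
			    props[j].len == SKETCH_ETHER_ADDR_LEN) {
				memcpy(mac, props[j].data,
				    SKETCH_ETHER_ADDR_LEN);
				return (0);
			}
		}
	}
	return (-1);
}
```

The outer loop walks the preference list, so the position of a property inside the node does not affect which one wins.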
johalun [Sun, 19 May 2019 15:44:21 +0000 (15:44 +0000)]
LinuxKPI: Finalize move of lindebugfs from ports to base.
The source file was moved to base earlier and also improved upon,
but never compiled in. This patch will:
- Make a module in sys/modules
- Make lindebugfs depend on linuxkpi (for seq_file)
- Check if read/write functions are set before calling, DRM drivers
don't always set both of them.
jhb [Sat, 18 May 2019 21:20:38 +0000 (21:20 +0000)]
Expose the MD_CLEAR capability used by Intel MDS mitigations to guests.
Submitted by: Patrick Mooney <pmooney@pfmooney.com>
Reviewed by: kib
Tested by: Patrick on SmartOS with Linux and Windows guests
Obtained from: Joyent
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20296
rgrimes [Sat, 18 May 2019 19:32:38 +0000 (19:32 +0000)]
bhyve virtio needs barriers
Under certain tight race conditions, we found that the lack of a memory
barrier in bhyve's virtio handling causes it to miss a NO_NOTIFY state
transition on block devices, resulting in guest stall. The investigation
is recorded in OS-7613. As part of the examination into bhyve's use of
barriers, one other section was found to be problematic, but only on
non-x86 ISAs with less strict memory ordering. That was addressed in
this patch as well, although it was not at all a problem on x86.
kib [Sat, 18 May 2019 16:19:31 +0000 (16:19 +0000)]
Make lock-less delayed invalidation operational very early.
Apparently, there is more code trying to call pmap_remove() early,
mostly to free preloaded memory. Instead of moving all deallocations
to the point where a scheduler is initialized, add missed setup of
thread0 di init at hammer_time().
The code in pmap_delayed_invl_start_u() is modified to not ever take
the thread lock if the thread priority is less or equal to PVM. Since
thread0 starts at priority 0, and then is reset to PVM at
proc0_init(), this eliminates taking the thread lock during early
boot.
While there, fix an off-by-one in the comparison of the base priority.
Reported and tested by: bcran (previous version)
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 29 days
`zfs userspace` squashes all entries with unresolved numeric
values into a single output entry, because the comparison is always
made by the string name, which is empty for unresolved IDs.
Fix this by falling back to a numerical comparison when either one
of the string values is not found. This then compares any numerical
values after all with a name resolved.
Signed-off-by: Pavel Boldin <boldin.pavel@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported by: clusteradm
Obtained from: ZFS-on-Linux
MFC after: 3 days
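The comparison fallback described in the zfs userspace entry above can be sketched with a qsort(3) comparator. This is only an illustration: `struct sketch_entry` and `sketch_entry_cmp()` are hypothetical, an empty name stands for an unresolved ID, and placing named entries ahead of unresolved ones is an assumption made here so that the ordering stays consistent.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: name is "" when the numeric ID was not resolved. */
struct sketch_entry {
	const char *name;
	uint64_t id;
};

/* qsort(3) comparator: by name when both resolve, else by numeric ID. */
static int
sketch_entry_cmp(const void *lp, const void *rp)
{
	const struct sketch_entry *l = lp;
	const struct sketch_entry *r = rp;
	int lnamed = l->name[0] != '\0';
	int rnamed = r->name[0] != '\0';

	if (lnamed && rnamed)
		return (strcmp(l->name, r->name));
	if (lnamed != rnamed)
		return (lnamed ? -1 : 1);	/* assumption: named first */
	/* Both unresolved: distinct IDs must not compare equal. */
	return ((l->id > r->id) - (l->id < r->id));
}
```

With a name-only comparison, two unresolved entries would both compare as "" and be treated as the same key, which matches the squashing symptom the entry describes.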
jhibbits [Sat, 18 May 2019 11:14:43 +0000 (11:14 +0000)]
powerpc64/pmap: NUMA-ize the page pv lock pool to reduce contention
It was found during building llvm that the page pv lock pool was seeing very
high contention. Since the pmap is already NUMA aware, it was surmised that
the domains were referencing similar pages in the different domains. This
reduces contention to the point of noise in a lockstat(8) run (~51% down to
under 5%), reducing build times by up to 20%.
This doesn't do a perfect domain alignment, just a best-guess based on
hardware available, that the domain is roughly specified in the upper bits
of the PA. Trying to be more clever would more than likely result in
reduced performance just on the work needed.
markj [Sat, 18 May 2019 02:02:14 +0000 (02:02 +0000)]
Use M_NEXTFIT in memguard(9).
memguard(9) wants to avoid reuse of freed addresses for as long as
possible. Previously it maintained a racily updated cursor which was
passed to vmem_xalloc(9) as the minimum address. However, vmem will
not in general return the lowest free address in the arena, so this
trick only really works until the cursor has wrapped around the first
time.
markj [Sat, 18 May 2019 01:46:38 +0000 (01:46 +0000)]
Implement the M_NEXTFIT allocation strategy for vmem(9).
This is described in the vmem paper: "directs vmem to use the next free
segment after the one previously allocated." The implementation adds a
new boundary tag type, M_CURSOR, which is linked into the segment list
and precedes the segment following the previous M_NEXTFIT allocation.
The cursor is used to locate the next free segment satisfying the
allocation constraints.
This implementation isn't O(1) since busy tags aren't coalesced, and we
may potentially scan the entire segment list during an M_NEXTFIT
allocation.
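The next-fit idea in the entry above can be illustrated with a toy arena. This is a simplified sketch, not the vmem(9) implementation: the arena is a fixed array of equal-sized segments, the cursor is an array index rather than a boundary tag, and all `sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NSEG 8

/* Hypothetical arena: fixed-size segments, each either free or busy. */
struct sketch_arena {
	bool busy[SKETCH_NSEG];
	size_t cursor;	/* index just past the previous allocation */
};

/*
 * Next-fit: start scanning at the cursor instead of at segment 0.
 * Returns the allocated segment index, or -1 if the arena is full.
 * Like the commit notes, this scan is O(n) in the worst case.
 */
static int
sketch_alloc_nextfit(struct sketch_arena *a)
{
	size_t i, idx;

	for (i = 0; i < SKETCH_NSEG; i++) {
		idx = (a->cursor + i) % SKETCH_NSEG;
		if (!a->busy[idx]) {
			a->busy[idx] = true;
			a->cursor = (idx + 1) % SKETCH_NSEG;
			return ((int)idx);
		}
	}
	return (-1);
}

/* Freeing leaves the cursor alone, so the slot is not reused at once. */
static void
sketch_free(struct sketch_arena *a, int idx)
{
	a->busy[idx] = false;
}
```

Because a freed segment sits behind the cursor, it is handed out again only after the scan wraps around, which is the reuse-delaying behavior memguard(9) wants.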
cem [Sat, 18 May 2019 00:22:28 +0000 (00:22 +0000)]
Add DragonFly's partition number to fdisk(8) and diskmbr.h
This change doesn't make any attempt to add support for these slices to the
relevant GEOM classes. Just register the number in fdisk and the canonical
list of kernel macros (diskmbr.h).
stevek [Fri, 17 May 2019 18:13:43 +0000 (18:13 +0000)]
Obtain a shared lock instead of exclusive in the MAC/veriexec
MAC_VERIEXEC_CHECK_PATH_SYSCALL per-MAC policy system call.
When we are checking the status of the fingerprint on a vnode using the
per-MAC-policy syscall, we do not need an exclusive lock on the vnode.
Even if there is more than one thread requesting the status at the same time,
the worst we can end up doing is processing the file more than once.
This can potentially be improved in the future with offloading the fingerprint
evaluation to a separate thread and blocking until the update completes. But
for now the race is acceptable.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
stevek [Fri, 17 May 2019 18:06:24 +0000 (18:06 +0000)]
Fix format strings for some debug messages that could have arguments that
are different types across architectures by using %ju and typecasting to
uintmax_t, where appropriate.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
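The format-string fix in the entry above follows a common pattern, sketched here in userland C. The function name `sketch_format_ids()` and the choice of `dev_t` and `ino_t` as the example types are assumptions for illustration; the commit does not say which arguments were affected.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * dev_t and ino_t have different widths on different platforms, so no
 * single fixed conversion such as %lu is right everywhere.  Casting to
 * uintmax_t and printing with %ju is portable for unsigned values.
 */
static int
sketch_format_ids(char *buf, size_t buflen, dev_t dev, ino_t ino)
{
	return (snprintf(buf, buflen, "dev=%ju ino=%ju",
	    (uintmax_t)dev, (uintmax_t)ino));
}
```

For signed quantities the matching pair would be `%jd` with a cast to `intmax_t`.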
stevek [Fri, 17 May 2019 18:02:26 +0000 (18:02 +0000)]
Protect commands that are considered dangerous with checks for kmem write
priv. This allows for MAC/veriexec to prevent apps that are not "trusted"
from using these commands.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
stevek [Fri, 17 May 2019 17:50:01 +0000 (17:50 +0000)]
Ensure we have obtained a lock on the process before calling
mac_veriexec_get_executable_flags(). Only try locking/unlocking if the caller
has not already acquired the process lock.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
stevek [Fri, 17 May 2019 17:21:32 +0000 (17:21 +0000)]
Instead of individual conditional statements to look for each hypervisor
type, use a table to make it easier to add more in the future, if needed.
Add VirtualBox detection to the table ("VBoxVBoxVBox" is the hypervisor
vendor string to look for.) Also add VM_GUEST_VBOX to the VM_GUEST
enumeration to indicate VirtualBox.
Save the CPUID base for the hypervisor entry that we detected. Driver code
may need to know about it in order to obtain additional CPUID features.
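A table-driven lookup of the kind the entry above describes might look like the following sketch. The enum, table, and function names are hypothetical; only the "VBoxVBoxVBox" string comes from the commit message, the other vendor strings are the commonly documented ones, and mapping an unknown string to a generic "VM" value is an assumption.

```c
#include <stddef.h>
#include <string.h>

enum sketch_vm_guest {
	SKETCH_VM_GUEST_NO,
	SKETCH_VM_GUEST_VM,	/* some hypervisor, vendor not recognized */
	SKETCH_VM_GUEST_KVM,
	SKETCH_VM_GUEST_VMWARE,
	SKETCH_VM_GUEST_VBOX
};

/* One row per known hypervisor; adding a vendor is a one-line change. */
static const struct {
	const char *vendor;
	enum sketch_vm_guest guest;
} sketch_vm_table[] = {
	{ "KVMKVMKVM",		SKETCH_VM_GUEST_KVM },
	{ "VMwareVMware",	SKETCH_VM_GUEST_VMWARE },
	{ "VBoxVBoxVBox",	SKETCH_VM_GUEST_VBOX },
};

/* Map a hypervisor vendor string (as read from CPUID) to a guest type. */
static enum sketch_vm_guest
sketch_identify_hypervisor(const char *vendor)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_vm_table) /
	    sizeof(sketch_vm_table[0]); i++) {
		if (strcmp(vendor, sketch_vm_table[i].vendor) == 0)
			return (sketch_vm_table[i].guest);
	}
	return (SKETCH_VM_GUEST_VM);
}
```

The CPUID probing itself is left out; the sketch takes the vendor string as input so it stays portable.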
kib [Fri, 17 May 2019 17:11:01 +0000 (17:11 +0000)]
Free microcode memory later.
With lockless DI, pmap_remove() requires an operational thread lock,
which is initialized at SI_SUB_RUN_QUEUE for thread0. Move the free even
later, to where APs are started, the moment after which other boot memory
like trampoline stacks is already being freed.
Reported by: gtetlow
Sponsored by: The FreeBSD Foundation
MFC after: 30 days
manu [Fri, 17 May 2019 17:04:01 +0000 (17:04 +0000)]
pci: ecam: Do not warn on mismatch of bus_end
We cannot know the bus end number before parsing the MCFG table
so don't set the bus_end before that. If the MCFG table doesn't
exist we will set the configuration base address based on the _CBA
value and set the bus_end to the maximal number allowed by PCI.
dougm [Fri, 17 May 2019 15:52:17 +0000 (15:52 +0000)]
Implement the ffs and fls functions, and their longer counterparts, in
cpufunc, in terms of __builtin_ffs and the like, for arm64
architectures, and use those, rather than the simple libkern
implementations, in building arm64 kernels.
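The entry above can be illustrated with compiler builtins available in GCC and Clang. This is a sketch, not the committed cpufunc code: the `sketch_` names are hypothetical, and expressing fls through `__builtin_clz` is one common approach that may differ from what the commit does.

```c
#include <limits.h>

/* ffs: 1-based index of the least significant set bit, 0 if none. */
static inline int
sketch_ffs(int mask)
{
	return (__builtin_ffs(mask));
}

static inline int
sketch_ffsl(long mask)
{
	return (__builtin_ffsl(mask));
}

/*
 * fls: 1-based index of the most significant set bit, 0 if none.
 * __builtin_clz() is undefined for 0, so that case is handled first.
 */
static inline int
sketch_fls(int mask)
{
	if (mask == 0)
		return (0);
	return ((int)(sizeof(mask) * CHAR_BIT) -
	    __builtin_clz((unsigned int)mask));
}

static inline int
sketch_flsl(long mask)
{
	if (mask == 0)
		return (0);
	return ((int)(sizeof(mask) * CHAR_BIT) -
	    __builtin_clzl((unsigned long)mask));
}
```

On targets with a count-leading-zeros instruction the builtins typically compile to a few instructions, whereas a generic C loop tests one bit per iteration.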
johalun [Thu, 16 May 2019 21:17:18 +0000 (21:17 +0000)]
LinuxKPI: Finalize import of seq_file.
seq_file.h and linux_seq_file.c were imported from ports earlier, but
linux_seq_file.c was never compiled in with the module. With this
commit, base seq_file replaces ports seq_file; a few modifications
were required to not break functionality and the build.
hrs [Thu, 16 May 2019 19:09:41 +0000 (19:09 +0000)]
Fix hostname to be returned in an ICMPv6 NI Reply message defined
in RFC 4620, ICMPv6 Node Information Queries. A vnet jail with an
IPv6 address sent a hostname of the host environment, not the
jail, even if another hostname was set to the jail.
This change can be tested by the following commands:
johalun [Thu, 16 May 2019 17:44:17 +0000 (17:44 +0000)]
LinuxKPI: Update access_ok macro for v5.0.
Check LINUXKPI_VERSION macro for backwards compatibility.
It's recommended to update any drivers that depend on the older KPI
so we can deprecate < 5.0 code as we update to newer Linux version.
This patch is part of D19565
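Linux 5.0 dropped the leading `type` argument from `access_ok()`, so a version check like the one the entry above mentions can select between the two macro shapes. The sketch below is an illustration only: `sketch_access_ok()` is a hypothetical stand-in for the real range check, and encoding version 5.0 as 50000 is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: a driver built against the 5.0 KPI defines this. */
#define LINUXKPI_VERSION 50000

/* Hypothetical range check: reject ranges that wrap the address space. */
static inline int
sketch_access_ok(const void *uaddr, size_t len)
{
	uintptr_t start = (uintptr_t)uaddr;

	return (len == 0 || start + (len - 1) >= start);
}

#if defined(LINUXKPI_VERSION) && LINUXKPI_VERSION >= 50000
/* 5.0 and later: two arguments. */
#define access_ok(addr, len)		sketch_access_ok((addr), (len))
#else
/* Older KPI: the first argument (VERIFY_READ/VERIFY_WRITE) is ignored. */
#define access_ok(type, addr, len)	sketch_access_ok((addr), (len))
#endif
```

Drivers that do not define the version macro keep the three-argument form, which is what keeps older code building.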
cem [Thu, 16 May 2019 17:34:36 +0000 (17:34 +0000)]
xdma(4): Fix invalid pointer use (breaks arm.SOCFPGA build)
In xdma_handle_mem_node(), vmem_size_t and vmem_addr_t pointers were passed to
an FDT API that emits u_long values to the output parameter pointer. This
broke on systems with both xdma and 32-bit vmem size/addr types (SOCFPGA).
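The class of bug in the entry above, and the usual fix, can be sketched as follows. All names are hypothetical: `sketch_get_reg()` stands in for an API that reports values through `unsigned long` pointers, and the 32-bit typedefs model the narrower vmem types on the affected platform.

```c
#include <stdint.h>

/* Assumption: address and size types that are not unsigned long. */
typedef uint32_t sketch_vmem_addr_t;
typedef uint32_t sketch_vmem_size_t;

/* Hypothetical API that writes its results as unsigned long. */
static int
sketch_get_reg(unsigned long *basep, unsigned long *sizep)
{
	*basep = 0x1000UL;
	*sizep = 0x200UL;
	return (0);
}

static int
sketch_handle_mem_node(sketch_vmem_addr_t *addr, sketch_vmem_size_t *size)
{
	unsigned long base, len;
	int error;

	/*
	 * Passing addr and size directly would hand the callee pointers
	 * of the wrong type: a compile error, or with a cast a store of
	 * the wrong width.  Use correctly typed temporaries instead.
	 */
	error = sketch_get_reg(&base, &len);
	if (error != 0)
		return (error);
	*addr = (sketch_vmem_addr_t)base;
	*size = (sketch_vmem_size_t)len;
	return (0);
}
```

The explicit conversions on the last two assignments make any narrowing visible at the call site.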
kib [Thu, 16 May 2019 13:28:48 +0000 (13:28 +0000)]
amd64 pmap: rework delayed invalidation, removing global mutex.
For machines having the cmpxchg16b instruction, i.e. everything but very
early Athlons, provide lockless implementation of delayed
invalidation.
The implementation maintains lock-less single-linked list with the
trick from the T.L. Harris article about volatile mark of the elements
being removed. Double-CAS is used to atomically update both link and
generation. New thread starting DI appends itself to the end of the
queue, setting the generation to the generation of the last element
+1. On DI finish, thread donates its generation to the previous
element. The generation of the fake head of the list is the last
passed DI generation. Basically, the implementation is a queued
spinlock but without spinlock.
Many thanks both to Peter Holm and Mark Johnston for keeping with me
while I produced intermediate versions of the patch.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
MFC note: td_md.md_invl_gen should go to the end of struct thread
Differential revision: https://reviews.freebsd.org/D19630
kib [Thu, 16 May 2019 13:13:33 +0000 (13:13 +0000)]
rtld_malloc.c: cleanup morepages().
Use roundup2() and rounddown2() instead of inlining them.
Get rid of the fd local variable, use literal -1 for the mmap argument.
Use MAP_FAILED as mmap(2) failure indicator.
After that, apply some style.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
rlibby [Thu, 16 May 2019 05:29:54 +0000 (05:29 +0000)]
db show thread: avoid overflow in tick conversion
The previous calculations for displaying the time since last switch
easily overflowed, after less than 36 min for hz=1000. Now overflow
takes 2000 times longer (as long as ticks takes to wrap).
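The overflow in the entry above comes from multiplying before dividing. The sketch below contrasts that with a split into whole seconds and a remainder. Both functions are hypothetical, the committed fix may be written differently, and unsigned 32-bit arithmetic is used here so that the wraparound in the naive version is well defined.

```c
#include <stdint.h>

/* Naive: delta * 1000 wraps once the product exceeds 32 bits. */
static uint32_t
sketch_ticks_to_ms_naive(uint32_t delta, uint32_t hz)
{
	return (delta * 1000 / hz);
}

/*
 * Safer: convert whole seconds and the sub-second remainder separately.
 * The remainder is below hz, and the math is done in 64 bits.
 */
static uint64_t
sketch_ticks_to_ms(uint32_t delta, uint32_t hz)
{
	return ((uint64_t)(delta / hz) * 1000 +
	    (uint64_t)(delta % hz) * 1000 / hz);
}
```

For example, with hz=1000 and delta=4300000 ticks the naive version yields 5032 ms, while the split version yields the expected 4300000 ms.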
rlibby [Thu, 16 May 2019 04:24:08 +0000 (04:24 +0000)]
iommu static analysis cleanup
A static analyzer complained about a couple instances of checking a
variable against NULL after already having dereferenced it.
- dmar_gas_alloc_region: remove the tautological NULL checks
- dmar_release_resources / dmar_fini_fault_log: don't deref unit->regs
unless initialized.
And while here, fix an inverted initialization check in dmar_fini_qi.
cy [Thu, 16 May 2019 02:41:25 +0000 (02:41 +0000)]
The driver list prints "(null)" for the NDIS driver when -h (help) or
an unknown switch is passed and the command usage is printed. This is
because the NDIS driver is uninitialized when the usage help is printed.
To resolve this, initialize the driver before the usage help message
can be printed.
Obtained from: The wpa_supplicant port
MFC after: 1 week