oshogbo [Wed, 5 Jun 2019 22:29:05 +0000 (22:29 +0000)]
dtrace: 64-bits registers support
The registers in ilumos and FreeBSD have a different number.
In the illumos, last 32-bits register defined is SS an in FreeBSD is GS.
This off-by-one caused the uregs array to returns the wrong 64-bits register
on amd64.
kib [Wed, 5 Jun 2019 20:21:17 +0000 (20:21 +0000)]
In vm_map_entry_set_vnode_text(), tolerate tmpfs mappings for which
vnode is no longer resident.
Mapping of tmpfs file does not bump use count on the vnode, because
backing object has swap type. As result, even during normal
operations, and of course on forced unmount, we might end up with text
mapping from tmpfs node which has no vnode in memory. In this case,
there is no v_writecount to clear (this was done during reclaim), and
no reason to assert that the vnode is present.
Restructure the code to silently ignore OBJ_SWAP objects with
OBJ_TMPFS_NODE flag set, but OBJ_TMPFS flag clear.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Wed, 5 Jun 2019 20:16:25 +0000 (20:16 +0000)]
Manually clear text references on reclaim for nullfs and tmpfs.
Both filesystems do no use vnode_pager_dealloc() which would handle
this case otherwise. Nullfs because vnode vm_object handle never
points to nullfs vnode. Tmpfs because its vm_object is never vnode
object at all.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
jhb [Wed, 5 Jun 2019 19:30:32 +0000 (19:30 +0000)]
Support MSI-X for passthrough devices with a separate PBA BAR.
pci_alloc_msix() requires both the table and PBA BARs to be allocated
by the driver. ppt was only allocating the table BAR so would fail
for devices with the PBA in a separate BAR. Fix this by allocating
the PBA BAR before pci_alloc_msix() if it is stored in a separate BAR.
While here, release BARs after calling pci_release_msi() instead of
before. Also, don't call bus_teardown_intr() in error handling code
if bus_setup_intr() has just failed.
jhb [Wed, 5 Jun 2019 19:29:02 +0000 (19:29 +0000)]
Don't simulate PBA access if the PBA is in a separate BAR.
bhyve has to virtualize the MSI-X table to trap reads and writes to
that table and map those to virtual interrupts that it maps real host
interrupts on to. For the pending-bit-array (PBA), bhyve passes
accesses from the guest directly to the hardware.
bhyve's virtualization of the MSI-X table is done by intercepting all
reads and writes to the BAR holding the MSI-X table. However, if the
PBA is stored in the same BAR as the MSI-X table, accesses to the PBA
portion of this BAR have to be forwarded to the real BAR.
However, in the case that the PBA was stored in a separate BAR and
it's offset in that separate BAR overlapped with the portion of the
MSI-X table BAR that the table used, the handlers for the table BAR
would incorrectly think that some accesses were PBA reads and writes.
This caused a crash in bhyve when it indirected a NULL pointer. Fix
this case by never trying to handle PBA access if the PBA lives in a
separate BAR.
emaste [Wed, 5 Jun 2019 14:08:39 +0000 (14:08 +0000)]
Use CLANG knob to remove llvm-symbolizer man page
r348504 moved llvm-symbolizer from the CLANG_EXTRAS knob to CLANG, but
the man page was still in the CLANG_EXTRAS section in
OptionalObsoleteFiles.inc.
Reported by: jhb
MFC after: 3 days
MFC with: r348504
avg [Wed, 5 Jun 2019 13:18:00 +0000 (13:18 +0000)]
first step towards enforcing must-succeed semantics for bus accessors
Unlike BUS_READ_IVAR / BUS_WRITE_IVAR, bus accessors do not have a
return code. It is assumed that there is a tight coupling between a bus
driver and a driver for a device on the bus with respect to instance
variables that the bus defines for its children. So, the driver is
supposed to have only valid accesses to the variables and, thus, the
accessors must always succeed.
Of course, programming errors sometimes happen. At present, such errors
go completely unnoticed. The idea of this change is to start catching
them. As a first step, there will be a warning about a failed accessor
call. This is to give developers a heads-up. I plan to replace the
printf with a KASSERT a week later, so that the warning is harder to
ignore.
cperciva [Wed, 5 Jun 2019 04:58:42 +0000 (04:58 +0000)]
Only respond to the PCIe Attention Button if a device is already plugged in.
Prior to this commit, if PCIEM_SLOT_STA_ABP and PCIEM_SLOT_STA_PDC are
asserted simultaneously, FreeBSD sets a 5 second "hardware going away" timer
and then processes the "presence detect" change. In the (physically
challenging) case that someone presses the "attention button" and inserts
a new PCIe device at exactly the same moment, this results in FreeBSD
recognizing that the device is present, attaching it, and then detaching it
5 seconds later.
On EC2 "bare metal" hardware this is the precise sequence of events which
takes place when a new EBS volume is attached; virtual machines have no
difficulty effecting physically implausible simultaneity.
This patch changes the handling of PCIEM_SLOT_STA_ABP to only detach a
device if the presence of a device was detected *before* the interrupt
which reports the Attention Button push.
Reported by: Matt Wilson
Reviewed by: jhb
MFC after: 1 week
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D20499
imp [Wed, 5 Jun 2019 00:08:30 +0000 (00:08 +0000)]
ufs_module.c can't currently be compiled with -Wcast-align, but the
code is safe enough. Turn off the warning for now until I can find the
right construct to silence it in the code.
Rather than using the legacy IP struct fields in the union for the
port number, properly access them by their IPv6 names.
This will make it easier to slice up and compile out address families
in the future.
markj [Tue, 4 Jun 2019 18:34:05 +0000 (18:34 +0000)]
elfcopy: Use libelftc's string table routines to build .shstrtab.
This replaces some hand-rolled routines and is substantially faster
since libelftc uses a hash table for lookups and insertions, whereas
elfcopy would perform a linear scan of the table.
PR: 234949
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20473
markj [Tue, 4 Jun 2019 18:29:08 +0000 (18:29 +0000)]
elfcopy: Use elf_getscn() instead of iterating over all sections.
When removing a section, we would loop over all sections looking for
a corresponding relocation section. With r348652 it is much faster
to just use elf_getscn().
PR: 234949
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20471
markj [Tue, 4 Jun 2019 18:26:29 +0000 (18:26 +0000)]
libelf: Use a red-black tree to manage the section list.
The tree is indexed by section number. This speeds up elf_getscn()
and its callers, which previously had to traverse a linked list. In
particular, since .shstrtab is often the last section in a file,
elf_strptr() would have to traverse the entire list.
PR: 234949
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20443
hselasky [Tue, 4 Jun 2019 16:40:18 +0000 (16:40 +0000)]
In usb(4) fix a lost completion event issue towards libusb(3). It may happen
if a USB transfer is cancelled that we need to fake a completion event.
Implement missing support in ugen_fs_copy_out() to handle this.
This fixes issues with webcamd(8) and firefox.
MFC after: 3 days
Sponsored by: Mellanox Technologies
alc [Tue, 4 Jun 2019 16:21:14 +0000 (16:21 +0000)]
The changes to pmap_demote_pde_locked()'s control flow in r348476 resulted
in the loss of a KASSERT that guarded against the invalidation a wired
mapping. Restore this KASSERT.
Remove an unnecessary KASSERT from pmap_demote_pde_locked(). It guards
against a state that was already handled at the start of the function.
cem [Tue, 4 Jun 2019 16:07:01 +0000 (16:07 +0000)]
daemon(8): Don't block SIGTERM during restart delay
I believe this was introduced in the original '-r' commit, r231911 (2012).
At the time, the scope was limited to a 1 second sleep. r332518 (2018)
added '-R', which increased the potential duration of the affected interval
(from 1 to N seconds) by permitting arbitrary restart intervals.
Instead, handle SIGTERM normally during restart-sleep, when the monitored
process is not running, and shut down promptly.
(I noticed this behavior when debugging a child process that exited quickly
under the 'daemon -r -R 30' environment. 'kill <daemonpid>' had no
immediate effect and the monitor process slept until the next restart
attempt. This was annoying.)
emaste [Tue, 4 Jun 2019 13:07:10 +0000 (13:07 +0000)]
Expose the kernel's build-ID through sysctl
After our migration (of certain architectures) to lld the kernel is built
with a unique build-ID. Make it available via a sysctl and uname(1) to
allow the user to identify their running kernel.
emaste [Tue, 4 Jun 2019 13:00:49 +0000 (13:00 +0000)]
build llvm-ar and llvm-nm with Clang (promote out of CLANG_EXTRAS)
To facilitate experimentation with LTO we require an ar that supports
LLVM IR, and to a lesser degree also an nm. As a first step always
install llvm-ar and llvm-nm.
slavash [Tue, 4 Jun 2019 06:21:31 +0000 (06:21 +0000)]
Fix prio vs. nonprio tagged traffic in RDMACM
In current RDMACM implementation RDMACM server will not find a GID
index when the request was prio-tagged and the sever is non
prio-tagged and vise-versa.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should
be considered as untagged. Treat RDMACM request the same.
cem [Tue, 4 Jun 2019 02:37:11 +0000 (02:37 +0000)]
virtio(4): Add PNP match metadata for virtio devices
Register MODULE_PNP_INFO for virtio devices using the newbus PNP information
provided by the previous commit. Matching can be quite simple; existing
probe routines only matched on bus (implicit) and device_type. The same
matching criteria are retained exactly, but is now also available to
devmatch(8).
cem [Tue, 4 Jun 2019 02:34:59 +0000 (02:34 +0000)]
virtio(4): Expose PNP metadata through newbus
Expose the same fields and widths from both vtio buses, even though they
don't quite line up; several virtio drivers can attach to both buses,
and sharing a PNP info table for both seems more convenient.
In practice, I doubt any virtio driver really needs to match on anything
other than bus and device_type (eliminating the unused entries for
vtmmio), and also in practice device_type is << 2^16 (so far, values
range from 1 to 20). So it might be fine to only expose a 16-bit
device_type for PNP purposes. On the other hand, I don't see much harm
in overkill here.
cem [Tue, 4 Jun 2019 00:01:37 +0000 (00:01 +0000)]
virtio_random(4): Fix random(4) integration
random(4) masks unregistered entropy sources. Prior to this revision,
virtio_random(4) did not correctly register a random_source and did not
function as a source of entropy.
Random source registration for loadable pure sources requires registering a
poll callback, which is invoked periodically by random(4)'s harvestq
kthread. The periodic poll makes virtio_random(4)'s periodic entropy
collection redundant, so this revision removes the callout.
The current random source API is somewhat limiting, so simply fail to attach
any virtio_random devices if one is already registered as a source. This
scenario is expected to be uncommon.
While here, handle the possibility of short reads from the hypervisor random
device gracefully / correctly. It is not clear why a hypervisor would
return a short read or if it is allowed by spec, but we may as well handle
it.
Reviewed by: bryanv (earlier version), markm
Security: yes (note: many other "pure" random sources remain broken)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20419
rmacklem [Mon, 3 Jun 2019 22:58:51 +0000 (22:58 +0000)]
Modify mountd so that it incrementally updates the kernel exports upon a reload.
Without this patch, mountd would delete/load all exports from the exports
file(s) when it receives a SIGHUP. This works fine for small exports file(s),
but can take several seconds to do when there are large numbers (10000+) of
exported file systems. Most of this time is spent doing the system calls
that delete/export each of these file systems. When the "-S" option
has been specified (the default these days), the nfsd threads are suspended
for several seconds while the reload is done.
This patch changes mountd so that it only does system calls for file systems
where the exports have been changed/added/deleted as compared to the exports
done for the previous load/reload of the exports file(s).
Basically, when SIGHUP is posted to mountd, it saves the exportlist structures
from the previous load and creates a new set of structures from the current
exports file(s). Then it compares the current with the previous and only does
system calls for cases that have been changed/added/deleted.
The nfsd threads do not need to be suspended until the comparison step is
being done. This results in a suspension period of milliseconds for a server
with 10000+ exported file systems.
There is some code using a LOGDEBUG() macro that allow runtime debugging
output via syslog(LOG_DEBUG,...) that can be enabled by creating a file
called /var/log/mountd.debug. This code is expected to be replaced with
code that uses dtrace by cy@ in the near future, once issues w.r.t. dtrace
in stable/12 have been resolved.
The patch should not change the usage of the exports file(s), but improves
the performance of reloading large exports file(s) where there are only a
small number of changes done to the file(s).
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
This is irrelevant to FreeBSD, just to reduce divergence.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: George Wilson <george.wilson@delphix.com>
imp [Mon, 3 Jun 2019 19:10:46 +0000 (19:10 +0000)]
[zfsboot] Fix boot env back compat (#190)
* Fix boot env back compat
zfsboot must try zfsloader before loader in order to remain compatible
with boot environments created prior to zfs functionality being rolled
into loader proper.
* Improve comments in zfsboot
Explain the significance of the load path order, and put the comment
about looping through the paths in the appropriate scope.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Allan Jude <allanjude@freebsd.org>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
jhb [Mon, 3 Jun 2019 15:41:54 +0000 (15:41 +0000)]
Add 'device cxgbe' explicitly in the synopsis.
ccr depends on symbols exported by the cxgbe driver as well as having
a runtime dependency. While the runtime depenency was noted in the
manpage already, the compile-time dependency wasn't as clear.
PR: 238265
MFC after: 3 days
Sponsored by: Chelsio Communications
kib [Mon, 3 Jun 2019 15:32:42 +0000 (15:32 +0000)]
amd64 ef_rt_arch_call: Preserve %rflags around call into EFI RT service.
If service code faulted, we might end up unwinding with interrupts
disabled. Top-level kernel code should have interrupts enabled, which
is enforced by checks.
Save %rflags before entering EFI, and restore to the known good value
on return. This handles situation with disabled interrupts on fault
and perhaps other potential bugs, e.g. invalid value for PSL_D.
Reported and tested by: Jan Martin Mikkelsen <janm@transactionware.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
sobomax [Mon, 3 Jun 2019 15:12:44 +0000 (15:12 +0000)]
Leave mtree hardcoded for now. Reverting partially 348521 and also
the followup stopgap change, because I don't think it's a correct. I still
need to figure out where to stick it in. In cannot be in Makefile.inc1
and it cannot be in etc/Makefile from the looks of it to avoid
chicken-and-egg problem.
wulf [Mon, 3 Jun 2019 10:04:34 +0000 (10:04 +0000)]
psm(4): Add natural scrolling support to sysmouse protocol
This change enables natural scrolling with two finger scroll enabled
and when user is using a trackpad (mouse and trackpoint are not affected).
Depending on trackpad model it can be activated with setting of
hw.psm.synaptics.natural_scroll or hw.psm.elantech.natural_scroll sysctl
values to 1.
Evdev protocol is not affected by this change too. Tune userland client
e.g. libinput to enable natural scrolling in that case.
imp [Mon, 3 Jun 2019 05:25:22 +0000 (05:25 +0000)]
Another partial revert of r301289.
In this case, a change was made in one-true-awk from *FS to
getsval(fsloc) in a line just after one of the lines that had the 0 ->
NULL change. It works both ways as far as I can tell. It looks like a
bug fix, but I've not tried to track down which ancient version of
one-true-awk it was in (github starts too late for tracking this
down). Before and after the changes the regression suite is passes
100% relative to the un-modified one-true-awk.
imp [Mon, 3 Jun 2019 05:25:16 +0000 (05:25 +0000)]
Fix mismerge that crept into r301289.
The conversion of 0 -> NULL required a rebase at some point, as noted
in r301289 when pfg commited it. In that rebase, three lines remained
that had been removed in a prior version of awk, and one of them had a
0 -> NULL change causing a conflict. The conflict should have been
resolved by removing the three lines, but wasn't. This introduces a
regression into f.split3 test which prior to this commit we were
failing, but a pure onetrueawk wasn't. Remove the offending 3 lines.
alc [Mon, 3 Jun 2019 05:15:36 +0000 (05:15 +0000)]
Retire vm_reserv_extend_{contig,page}(). These functions were introduced
as part of a false start toward fine-grained reservation locking. In the
end, they were not needed, so eliminate them.
Order the parameters to vm_reserv_alloc_{contig,page}() consistently with
the vm_page functions that call them.
Update the comments about the locking requirements for
vm_reserv_alloc_{contig,page}(). They no longer require a free page
queues lock.
Wrap several lines that became too long after the "req" and "domain"
parameters were added to vm_reserv_alloc_{contig,page}().