Kyle Evans [Wed, 25 Nov 2020 03:14:25 +0000 (03:14 +0000)]
MFC kern: cpuset: properly rebase when attaching to a jail
The current logic is a fine choice for a system administrator modifying
process cpusets or a process creating a new cpuset(2), but not ideal for
processes attaching to a jail.
Currently, when a process attaches to a jail, it does exactly what any other
process does and loses any mask it might have applied in the process of
doing so because cpuset_setproc() is entirely based around the assumption
that non-anonymous cpusets in the process can be replaced with the new
parent set.
This approach slightly improves the jail attach integration by modifying
cpuset_setproc() callers to indicate if they should rebase their cpuset to
the indicated set or not (i.e. cpuset_setproc_update_set).
If we're rebasing and the process currently has a cpuset assigned that is
not the containing jail's root set, then we will now create a new base set
for it hanging off the jail's root with the existing mask applied instead of
using the jail's root set as the new base set.
Note that the common case will be that the process doesn't have a cpuset
within the jail root, but the system root can freely assign a cpuset from
a jail to a process outside of the jail with no restriction. We assume that
that may have happened or that it could happen due to a race when we drop
the proc lock, so we must recheck both within the loop to gather up
sufficient freed cpusets and after the loop.
To recap, here's how it worked before in all cases:
0 4 <-- jail 0 4 <-- jail / process
| |
1 -> 1
|
3 <-- process
0 4 <-- jail 0 4 <-- jail / process
| |
1 <-- process -> 1
More importantly, in both cases, the attaching process still retains the
mask it had prior to attaching or the attach fails with EDEADLK if it's
left with no CPUs to run on or the domain policy is incompatible. The
author of this patch considers this almost a security feature, because a MAC
policy could grant PRIV_JAIL_ATTACH to an unprivileged user that's
restricted to some subset of available CPUs the ability to attach to a jail,
which might lift the user's restrictions if they attach to a jail with a
wider mask.
In most cases, it's anticipated that admins will use this to be able to,
for example, `cpuset -c -l 1 jail -c path=/ command=/long/running/cmd`,
and avoid the need for contortions to spawn a command inside a jail with a
more limited cpuset than the jail.
Kyle Evans [Sat, 12 Dec 2020 05:57:42 +0000 (05:57 +0000)]
MFC lualoader: module-manipulation commands
4634bb1f: lualoader: provide module-manipulation commands
Specifically, we have:
- enable-module
- disable-module
- toggle-module
These can be used to add/remove modules to be loaded or force modules to be
loaded in spite of modules_blacklist. In the typical case, a user is
expected to use them to recover an issue happening due to a module directive
they've added to their loader.conf or because they discover that they've
under-specified what to load.
A last minute rewrite left this logically wrong; if it's present in
modules_blacklist, then we do not load it.
7ed84fa1: lualoader: cli: provide a show-module-options loader command
This effectively dumps everything lualoader knows about to the console using
the libsa pager; that particular lua interface was added in r368591.
A pager stub implementation has been added that just dumps the output as-is
as a compat shim for older loader binaries that do not have lpager. This
stub should be moved into a more appropriate .lua file if we add anything
else that needs the pager.
Kyle Evans [Sat, 12 Dec 2020 21:25:38 +0000 (21:25 +0000)]
MFC: stand: liblua: add a pager module
This is nearly a 1:1 mapping of the pager API from libsa. The only real
difference is that pager.output() will accept any number of arguments and
coerce all of them to strings for output using luaL_tolstring (i.e. the
__tostring metamethod will be used).
The only consumer planned at this time is the upcoming "show-module-options"
implementation.
Kyle Evans [Sat, 19 Dec 2020 03:30:06 +0000 (03:30 +0000)]
MFC kern: cpuset: allow jails to modify child jails' roots
This partially lifts a restriction imposed by r191639 ("Prevent a superuser
inside a jail from modifying the dedicated root cpuset of that jail") that's
perhaps beneficial after r192895 ("Add hierarchical jails."). Jails still
cannot modify their own cpuset, but they can modify child jails' roots to
further restrict them or widen them back to the modifying jails' own mask.
As a side effect of this, the system root may once again widen the mask of
jails as long as they're still using a subset of the parent jails' mask.
This was previously prevented by the fact that cpuset_getroot of a root set
will return that root, rather than the root's parent -- cpuset_modify uses
cpuset_getroot since it was introduced in r327895, previously it was just
validating against set->cs_parent which allowed the system root to widen
jail masks.
The existing logic is nice in theory, but in practice freebsd-update will
not preserve the timestamps on these files. When doing a major upgrade, e.g.
from 12.1-RELEASE -> 12.2-RELEASE, pwd.mkdb et al. appear in the INDEX and
we clobber the timestamp several times in the process of packaging up the
existing system into /var/db/freebsd-update/files and extracting for
comparisons. This leads to these files not getting regenerated when they're
most likely to be needed.
Measures could be taken to preserve timestamps, but it's unclear whether
the complexity and overhead of doing so is really outweighed by the marginal
benefit.
I observed this issue when pkg subsequently failed to install a package that
wanted to add a user, claiming that the user was removed in the process.
bapt@ pointed to this pre-existing bug with freebsd-update as the cause.
Gordon Bergling [Sat, 19 Dec 2020 13:51:46 +0000 (13:51 +0000)]
MFC r368815: zonectl(8): Fix a few issues reported by mandoc
- Add missing quotation mark for a comment above the .Dd
- inserting missing end of block: Sh breaks Bd
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp before Bd
- empty block: Bd
Gordon Bergling [Sat, 19 Dec 2020 13:36:59 +0000 (13:36 +0000)]
MFC r368813: bluetooth: Fix a mandoc related issues
- new sentence, new line
- sections out of conventional order: Sh FILES
- unusual Xr order: bthost(1) after bthidd(8)
- no blank before trailing delimiter
- whitespace at end of input line
- sections out of conventional order: Sh EXIT STATUS
Gordon Bergling [Sat, 19 Dec 2020 10:15:58 +0000 (10:15 +0000)]
MFC r368793: bhnd_erom(9): Fix a few mandoc related issues
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp after Ss
- skipping paragraph macro: Pp at the end of Ss
- unusual Xr punctuation: none before bhnd_driver_get_erom_class(9)
- unusual Xr punctuation: none before bus_space(9)
Gordon Bergling [Sat, 19 Dec 2020 10:11:37 +0000 (10:11 +0000)]
MFC r368792: bhnd(9): Fix a few mandoc related issues
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp at the end of Ss
- missing section argument: Xr device_set_desc
- unusual Xr punctuation: none before bhnd_erom(9)
Mark Johnston [Mon, 7 Dec 2020 14:52:57 +0000 (14:52 +0000)]
iflib: Detach tasks upon device registration failure
In some error paths we would fail to detach from the iflib taskqueue
groups. Also move the detach code into its own subroutine instead of
duplicating it.
Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com>
Sponsored by: NetApp, Inc.
Kyle Evans [Sun, 22 Nov 2020 05:47:45 +0000 (05:47 +0000)]
MFC kern: _umtx_op: introduce 32-bit/i386 flags for operations
This patch takes advantage of the consolidation that happened to provide two
flags that can be used with the native _umtx_op(2): UMTX_OP___32BIT and
UMTX_OP__I386.
UMTX_OP__32BIT iindicates that we are being provided with 32-bit structures.
Note that this flag alone indicates a 64bit time_t, since this is the
majority case.
UMTX_OP__I386 has been provided so that we can emulate i386 as well,
regardless of whether the host is amd64 or not.
Both imply a different set of copyops in sysumtx_op. freebsd32__umtx_op
simply ignores the flags, since it's already doing a 32-bit operation and
it's unlikely we'll be running an emulator under compat32. Future work
could consider it, but the author sees little benefit.
This will be used by qemu-bsd-user to pass on all _umtx_op calls to the
native interface as long as the host/target endianness matches, effectively
eliminating most if not all of the remaining unresolved deadlocks for most.
This version changed a fair amount from what was under review, mostly in
response to refactoring of the prereq reorganization and battle-testing
it with qemu-bsd-user. The main changes are as follows:
1.) The i386 flag got renamed to omit '32BIT' since this is redundant.
2.) The flags are now properly handled on 32-bit platforms to emulate other
32-bit platforms.
3.) Robust list handling was fixed, and the 32-bit functionality that was
previously gated by COMPAT_FREEBSD32 is now unconditional.
4.) Robust list handling was also improved, including the error reported
when a process has already registered 32-bit ABI lists and also
detecting if native robust lists have already been registered. Both
scenarios now return EBUSY rather than EINVAL, because the input is
technically valid but we're too busy with another ABI's lists.
libsysdecode/kdump/truss support will go into review soon-ish, along with
the associated manpage update.
Kyle Evans [Tue, 17 Nov 2020 03:36:58 +0000 (03:36 +0000)]
MFC kern: _umtx_op: compat32 refactoring
63ecb272: umtx_op: reduce redundancy required for compat32
All of the compat32 variants are substantially the same, save for
copyin/copyout (mostly). Apply the same kind of technique used with kevent
here by having the syscall routines supply a umtx_copyops describing the
operations needed.
umtx_copyops carries the bare minimum needed- size of timespec and
_umtx_time are used for determining if copyout is needed in the sem2_wait
case.
One of the last shifts inadvertently moved these static assertions out of a
COMPAT_FREEBSD32 block, which the relevant definitions are limited to.
Fix it.
27a9392d: _umtx_op: fix robust lists after r367744
A copy-pasto left us copying in 24-bytes at the address of the rb pointer
instead of the intended target.
15eaec6a: _umtx_op: move compat32 definitions back in
These are reasonably compact, and a future commit will blur the compat32
lines by supporting 32-bit operations with the native _umtx_op.
60e60e73: freebsd32: take the _umtx_op struct definitions back
Providing these in freebsd32.h facilitates local testing/measuring of the
structs rather than forcing one to locally recreate them. Sanity checking
offsets/sizes remains in kern_umtx.c where these are typically used.
Brandon Bergren [Fri, 21 Aug 2020 03:31:01 +0000 (03:31 +0000)]
MFC r364447:
[PowerPC] Fix translation-related crashes during startup
After spending a lot of time trying to track down what was going on, I have
isolated the "black screen" failures when using boot1 to boot a G4 machine.
It turns out we were replacing the traps before installing the temporary
BAT entry for the bottom of physical memory. That meant that until the MMU
was bootstrapped, the cached translations were the only thing keeping us
from losing.
Throwing boot1 into the mix was affecting execution flow enough to cause us
to hit an uncached page and crash.
Fix this by properly setting up the initial BAT entry at the same time we
are replacing the OpenFirmware traps, so we can continue executing in
segment 0 until the rest of the DMAP has been set up.
A second thing discovered while researching this is that we were entering a
BAT region for segment 16. It turns out this range was a) considered part
of KVA, and b) has firmware mappings with varying attributes.
If we ever accessed an unmapped page in segment 16, it would cause a BAT
entry to be installed for the whole segment, which would bypass the
existing mappings until it was flushed out again.
Instead, translate the OFW memory attributes into VM memory attributes and
install the ranges into the kernel address space properly.
Brandon Bergren [Mon, 1 Jun 2020 19:40:59 +0000 (19:40 +0000)]
MFC r361703:
[PowerPC] Fix build-id note on powerpc64 kernel
Due to the ordering of the powerpc64 linker script, we were discarding
all notes before emitting .note.gnu.build-id. This had the effect of
generating an empty build id section and breaking the kern.build_id
sysctl added in r348611.
Conrad Meyer [Wed, 18 Dec 2019 06:22:28 +0000 (06:22 +0000)]
MFC r355876 (by cem):
acpi(4): Add _CID to PNP info string
While a given ACPI device may have 0-N compatibility IDs, in practice most
seem to have 0 or 1. If one is present, emit it as part of the PNP info
string associated with a device. This could enable MODULE_PNP_INFO-based
automatic kldload for ACPI drivers associated with a given _CID (but without
a good _HID or _UID identifier).
Michael Tuexen [Fri, 18 Dec 2020 10:13:28 +0000 (10:13 +0000)]
MFC r368593:
Clean up more resouces of an existing SCTP association in case of
a restart.
This fixes a use-after-free scenario, which was reported by Felix
Wilhelm from Google in case a peer is able to modify the cookie.
However, this can also be triggered by an assciation restart under
some specific conditions.
MFC r368622:
Harden the handling of outgoing streams in case of an restart or INIT
collision. This avouds an out-of-bounce access in case the peer can
break the cookie signature. Thanks to Felix Wilhelm from Google for
reporting the issue.
Michael Tuexen [Fri, 18 Dec 2020 10:08:11 +0000 (10:08 +0000)]
MFC r368394:
When dropping packets (RRQ or WRQ) for debugging, report the send
operation as successful. Reporting a failure stops the transfer
instead of using timeouts.
MFC r368521:
Fix the TFTP client when performing a RRQ for files smaller than 512 bytes
and the server not sending an OACK:
* Close the file.
* Report the correct the number of received blocks.
MFC r368647:
Improve the counting of blocks used to transfer a file from the
server to the client in case of not using an OACK: Don't miss
the first block in case of it is not also the last one.
MFC r368657:
When receiving a file having a length, which is a mulitple of the blocksize,
close the file once it is received.
Ryan Libby [Fri, 18 Dec 2020 08:40:33 +0000 (08:40 +0000)]
MFC r350739-r350740 (by cem)
r350739:
Disable useless -Wformat-zero-length
It is part of -Wformat, which is enabled by -Wall. Empty format strings are
well defined and it is perfectly reasonable to expect them in a formatting
interface.
r350740:
r350739 try #2
For some inexplicable reason, C++ compilers reject the -Wno- flag, and also
(ab)use CWARNFLAGS.
Ryan Libby [Fri, 18 Dec 2020 08:29:38 +0000 (08:29 +0000)]
MFC r357019:
uma: fix zone domain overlaying pcpu cache with disabled cpus
UMA zone structures have two arrays at the end which are sized according
to the machine: an array of CPU count length, and an array of NUMA
domain count length. The CPU counting was wrong in the case where some
CPUs are disabled (when mp_ncpus != mp_maxid + 1), and this caused the
second array to be overlaid with the first.
Michal Meloun [Thu, 17 Dec 2020 13:17:26 +0000 (13:17 +0000)]
MFC r368167,r368187,r368203:
r368167:
NVME: Don't try to swap data on little endian machines. These swapping
functions violate BUSDMA contract - we cannot write to armed (by
bus_dmamap_sync(PRE_..)) buffers. Remove them at least from little endian
machines until a better solution will be developed.
r368187:
Unbreak r368167 in userland. Decorate unused arguments.
r368203:
Always use the __unused attribute even for potentially unused parameters.
Michal Meloun [Thu, 17 Dec 2020 12:58:05 +0000 (12:58 +0000)]
MFC r368364:
DesignWare PCIe driver: Don't call bus_generic_attach() twice.
bus_generic_attach() should be called from the attach function of the real
implementation, not from the common init function.
Martin Matuska [Wed, 16 Dec 2020 22:24:20 +0000 (22:24 +0000)]
MFC r368207,368607:
MFC r368207:
Update libarchive to 3.5.0
Relevant vendor changes:
Issue #1258: add archive_read_support_filter_by_code()
PR #1347: mtree digest reader support
Issue #1381: skip hardlinks pointing to itself on extraction
PR #1387: fix writing of cpio archives with hardlinks without file type
PR #1388: fix rdev field in cpio format for device nodes
PR #1389: completed support for UTF-8 encoding conversion
PR #1405: more formats in archive_read_support_format_by_code()
PR #1408: fix uninitialized size in rar5_read_data
PR #1409: system extended attribute support
PR #1435: support for decompression of symbolic links in zipx archives
Issue #1456: memory leak after unsuccessful archive_write_open_filename
MFC r368607:
Sync libarchive with vendor.
Vendor changes:
Issue #1461: Unbreak build without lzma
Issue #1462: warc reader: Fix build with gcc11
Issue #1463: Fix code compatibility in test_archive_read_support.c
Issue #1464: Use built-in strnlen on platforms where not available
Issue #1465: warc reader: fix undefined behaviour in deconst() function
Ian Lepore [Wed, 16 Dec 2020 17:09:38 +0000 (17:09 +0000)]
MFC 368585:
Provide userland notification of gpio pin changes ("userland gpio interrupts").
This is an import of the Google Summer of Code 2018 project completed by
Christian Kramer (and, sadly, ignored by us for two years now). The goals
stated for that project were:
FreeBSD already has support for interrupts implemented in the GPIO
controller drivers of several SoCs, but there are no interfaces to take
advantage of them out of user space yet. The goal of this work is to
implement such an interface by providing descriptors which integrate
with the common I/O system calls and multiplexing mechanisms.
The initial imported code supports the following functionality:
- A kernel driver that provides an interface to the user space; the
existing gpioc(4) driver was enhanced with this functionality.
- Implement support for the most common I/O system calls / multiplexing
mechanisms:
- read() Places the pin number on which the interrupt occurred in the
buffer. Blocking and non-blocking behaviour supported.
- poll()/select()
- kqueue()
- signal driven I/O. Posting SIGIO when the O_ASYNC was set.
- Many-to-many relationship between pins and file descriptors.
- A file descriptor can monitor several GPIO pins.
- A GPIO pin can be monitored by multiple file descriptors.
- Integration with gpioctl and libgpio.
I added some fixes (mostly to locking) and feature enhancements on top of
the original gsoc code. The feature ehancements allow the user to choose
between detailed and summary event reporting. Detailed reporting provides
a record describing each pin change event. Summary reporting provides the
time of the first and last change of each pin, and a count of how many times
it changed state since the last read(2) call. Another enhancement allows
the recording of multiple state change events on multiple pins between each
call to read(2) (the original code would track only a single event at a time).
The phabricator review for these changes timed out without approval, but I
cite it below anyway, because the review contains a series of diffs that
show how I evolved the code from its original state in Christian's github
repo for the gsoc project to what is being commited here. (In effect,
the phab review extends the VC history back to the original code.)
Submitted by: Christian Kramer
Obtained from: https://github.com/ckraemer/freebsd/tree/gsoc2018
Differential Revision: https://reviews.freebsd.org/D27398
atkbdc(4): Add quirk for "System76 lemur Pro" laptops.
Currently atkbdc(4) assumes all coreboot BIOSes belonging to Chromebooks
and unconditionally sets a number of quirks to workaround known issues.
Exclude "System76" laptops from this set as they appeared to be a
traditional hardware ("lemur Pro" is a rebranded Clevo chassis) with
coreboot firmware on board. KBDC_QUIRK_KEEP_ACTIVATED quirk activated for
Chromebook platform makes keyboard on this devices inoperable.
"Purism Librem" laptops may require the same exclusion too.
PR: 250711
Reported by: nick.lott@gmail.com
r367854:
psm(4): Disable AUX multiplexer probing on all Lenovo laptops.
Rudimentary AUX multiplexing support was added to kernel to make possible
touchpad initialization on some HP EliteBook laptops with trackpoint.
Disable multiplexer probing on all Lenovo laptops now as they use touchpad
pass-through port rather than AUX multiplexer to connect trackpoint and
at least two model (X120e and X121e) is known for getting PS/2 AUX port
dysfunctional after switching back to hidden multiplexing mode.
AUX MUX probing can be reenabled with setting of hw.psm.mux_disabled loader
tunable to 0.
PR: 249987
Reported by: jwb
r368365:
atkbd(4): Change quirk table end-of-list marker to NULL vendor/maker/product
This fixes regression introduced in r367349 which effectively resulted in
truncation of quirk table.
Kyle Evans [Tue, 15 Dec 2020 21:54:31 +0000 (21:54 +0000)]
MFC r368326: kern: soclose: don't sleep on SO_LINGER w/ timeout=0
This is a valid scenario that's handled in the various protocol layers where
it makes sense (e.g., tcp_disconnect and sctp_disconnect). Given that it
indicates we should immediately drop the connection, it makes little sense
to sleep on it.
This could lead to panics with INVARIANTS. On non-INVARIANTS kernels, this
could result in the thread hanging until a signal interrupts it if the
protocol does not mark the socket as disconnected for whatever reason.
Kyle Evans [Tue, 15 Dec 2020 21:53:54 +0000 (21:53 +0000)]
MFC r368388: bectl: simplify the tail end of the jail cmd
This has already confused me once (and I'm pretty sure I wrote it), so let's
clarify: unjailing after the command has completed will only happen if we're
interactive and -U has not been specified.
This just folds two conditionals together to make it obvious how -b/-U
interact with each other.
Kyle Evans [Tue, 15 Dec 2020 21:53:15 +0000 (21:53 +0000)]
MFC r368462: cpuset_set{affinity,domain}: do not allow empty masks
cpuset_modify() would not currently catch this, because it only checks that
the new mask is a subset of the root set and circumvents the EDEADLK check
in cpuset_testupdate().
This change both directly validates the mask coming in since we can
trivially detect an empty mask, and it updates cpuset_testupdate to catch
stuff like this going forward by always ensuring we don't end up with an
empty mask.
The check_mask argument has been renamed because the 'check' verbiage does
not imply to me that it's actually doing a different operation. We're either
augmenting the existing mask, or we are replacing it entirely.
Kyle Evans [Tue, 15 Dec 2020 21:52:31 +0000 (21:52 +0000)]
MFC r368461: kern: cpuset: resolve race between cpuset_lookup/cpuset_rel
The race plays out like so between threads A and B:
1. A ref's cpuset 10
2. B does a lookup of cpuset 10, grabs the cpuset lock and searches
cpuset_ids
3. A rel's cpuset 10 and observes the last ref, waits on the cpuset lock
while B is still searching and not yet ref'd
4. B ref's cpuset 10 and drops the cpuset lock
5. A proceeds to free the cpuset out from underneath B
Resolve the race by only releasing the last reference under the cpuset lock.
Thread A now picks up the spinlock and observes that the cpuset has been
revived, returning immediately for B to deal with later.
Kyle Evans [Tue, 15 Dec 2020 21:51:45 +0000 (21:51 +0000)]
MFC r368460: kern: cpuset: plug a unr leak
cpuset_rel_defer() is supposed to be functionally equivalent to
cpuset_rel() but with anything that might sleep deferred until
cpuset_rel_complete -- this setup is used specifically for cpuset_setproc.
Add in the missing unr free to match cpuset_rel. This fixes a leak that
was observed when I wrote a small userland application to try and debug
another issue, which effectively did:
cpuset(&newid);
cpuset(&scratch);
newid gets leaked when scratch is created; it's off the list, so there's
no mechanism for anything else to relinquish it. A more realistic reproducer
would likely be a process that inherits some cpuset that it's the only ref
for, but it creates a new one to modify. Alternatively, administratively
reassigning a process' cpuset that it's the last ref for will have the same
effect.
Kristof Provost [Tue, 15 Dec 2020 15:33:28 +0000 (15:33 +0000)]
MFC r368237:
if: Fix panic when destroying vnet and epair simultaneously
When destroying a vnet and an epair (with one end in the vnet) we often
panicked. This was the result of the destruction of the epair, which destroys
both ends simultaneously, happening while vnet_if_return() was moving the
struct ifnet to its home vnet. This can result in a freed ifnet being re-added
to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
used.
Prevent this race by ensuring that vnet_if_return() cannot run at the same time
as if_detach() or epair_clone_destroy().
Kristof Provost [Tue, 15 Dec 2020 08:29:45 +0000 (08:29 +0000)]
MFC r368588:
pf: Allow net.pf.request_maxcount to be set from loader.conf
Mark request_maxcount as RWTUN so we can set it both at runtime and from
loader.conf. This avoids users getting caught out by the change from tunable to
run time configuration.
Brooks Davis [Mon, 14 Dec 2020 22:07:07 +0000 (22:07 +0000)]
MFC r368561:
ndis(4): expand deprecation to the whole driver
nids(4) was a clever idea in the early 2000's when the market was
flooded with 10/100 NICs with Windows-only drivers, but that hasn't been
the case for ages and the driver has had no meaningful maintenance in
ages. It only supports Windows-XP era drivers.
Brooks Davis [Mon, 14 Dec 2020 21:56:15 +0000 (21:56 +0000)]
MFC r368543:
style(9): Correct whitespace in struct definitions
struct ifconf and struct ifreq use the odd style "struct<tab>foo".
struct ifdrv seems to have tried to follow this but was committed with
spaces in place of most tabs resulting in "struct<space><space>ifdrv".
John Baldwin [Mon, 14 Dec 2020 20:48:59 +0000 (20:48 +0000)]
MFC 368004: Pull the check for VM ownership into ppt_find().
This reduces some code duplication. One behavior change is that
ppt_assign_device() will now only succeed if the device is unowned.
Previously, a device could be assigned to the same VM multiple times,
but each time it was assigned, the device's state was reset.
John Baldwin [Mon, 14 Dec 2020 20:40:21 +0000 (20:40 +0000)]
MFC 368003:
Honor the disabled setting for MSI-X interrupts for passthrough devices.
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X. This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).
This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.
While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.