John Baldwin [Wed, 30 Nov 2022 21:06:46 +0000 (13:06 -0800)]
vmm: Remove stale comment for vm_rendezvous.
Support for rendezvous outside of a vcpu context (vcpuid of -1) was
removed in commit 949f0f47a4e7, and the vm, vcpuid argument pair was
replaced by a single struct vcpu pointer in commit d8be3d523dd5.
John Baldwin [Fri, 18 Nov 2022 18:05:35 +0000 (10:05 -0800)]
vmm: Allocate vCPUs on first use of a vCPU.
Convert the vcpu[] array in struct vm to an array of pointers and
allocate vCPUs on first use. This avoids always allocating VM_MAXCPU
vCPUs for each VM, but instead only allocates the vCPUs in use. A new
per-VM sx lock is added to serialize attempts to allocate vCPUs on
first use. However, a given vCPU is never freed while the VM is
active, so the pointer is read via an unlocked read first to avoid the
need for the lock in the common case once the vCPU has been created.
Some ioctls need to lock all vCPUs. To prevent races with ioctls that
want to allocate a new vCPU, these ioctls also lock the sx lock that
protects vCPU creation.
John Baldwin [Fri, 18 Nov 2022 18:05:10 +0000 (10:05 -0800)]
vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
Retire the boot_state member of struct vlapic and instead use a cpuset
in the VM to track vCPUs waiting for STARTUP IPIs. INIT IPIs add
vCPUs to this set, and STARTUP IPIs remove vCPUs from the set.
STARTUP IPIs are only reported to userland for vCPUs that were removed
from the set.
In particular, this permits a subsequent change to allocate vCPUs on
demand when the vCPU may not be allocated until after a STARTUP IPI is
reported to userland.
Robert Wing [Fri, 20 Jan 2023 11:10:53 +0000 (11:10 +0000)]
vmm: take exclusive mem_segs_lock in vm_cleanup()
The consumers of vm_cleanup() are vm_reinit() and vm_destroy().
The vm_reinit() call path is, here vmmdev_ioctl() takes mem_segs_lock:
vmmdev_ioctl()
vm_reinit()
vm_cleanup(destroy=false)
The call path for vm_destroy() is (mem_segs_lock not taken):
sysctl_vmm_destroy()
vmmdev_destroy()
vm_destroy()
vm_cleanup(destroy=true)
Fix this by taking mem_segs_lock in vm_cleanup() when destroy == true.
Reviewed by: corvink, markj, jhb Fixes: 67b69e76e8ee ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D38071
John Baldwin [Fri, 18 Nov 2022 18:04:37 +0000 (10:04 -0800)]
vmm: Use an sx lock to protect the memory map.
Previously bhyve obtained a "read lock" on the memory map for ioctls
needing to read the map by locking the last vCPU. This is now
replaced by a new per-VM sx lock. Modifying the map requires
exclusively locking the sx lock as well as locking all existing vCPUs.
Reading the map requires either locking one vCPU or the sx lock.
This permits safely modifying or querying the memory map while some
vCPUs do not exist which will be true in a future commit.
John Baldwin [Fri, 18 Nov 2022 18:04:11 +0000 (10:04 -0800)]
vmm vmx: Allocate vpids on demand as each vCPU is initialized.
Compared to the previous version this does mean that if the system as
a whole runs out of dedicated vPIDs you might end up with some vCPUs
within a single VM using dedicated vPIDs and others using shared
vPIDs, but this should not break anything.
John Baldwin [Fri, 18 Nov 2022 18:03:52 +0000 (10:03 -0800)]
vmm: Lookup vcpu pointers in vmmdev_ioctl.
Centralize mapping vCPU IDs to struct vcpu objects in vmmdev_ioctl and
pass vcpu pointers to the routines in vmm.c. For operations that want
to perform an action on all vCPUs or on a single vCPU, pass pointers
to both the VM and the vCPU using a NULL vCPU pointer to request
global actions.
John Baldwin [Fri, 18 Nov 2022 18:02:09 +0000 (10:02 -0800)]
vmm: Use struct vcpu in the instruction emulation code.
This passes struct vcpu down in place of struct vm and and integer
vcpu index through the in-kernel instruction emulation code. To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.
A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.
John Baldwin [Fri, 18 Nov 2022 18:01:57 +0000 (10:01 -0800)]
vmm: Add vm_gpa_hold_global wrapper function.
This handles the case that guest pages are being held not on behalf of
a virtual CPU but globally. Previously this was handled by passing a
vcpuid of -1 to vm_gpa_hold, but that will not work in the future when
vm_gpa_hold is changed to accept a struct vcpu pointer.
John Baldwin [Fri, 18 Nov 2022 18:00:00 +0000 (10:00 -0800)]
vmm: Remove the per-vm cookie argument from vmmops taking a vcpu.
This requires storing a reference to the per-vm cookie in the
CPU-specific vCPU structure. Take advantage of this new field to
remove no-longer-needed function arguments in the CPU-specific
backends. In particular, stop passing the per-vm cookie to functions
that either don't use it or only use it for KTR traces.
John Baldwin [Fri, 18 Nov 2022 17:59:21 +0000 (09:59 -0800)]
vmm: Refactor storage of CPU-dependent per-vCPU data.
Rather than storing static arrays of per-vCPU data in the CPU-specific
per-VM structure, adopt a more dynamic model similar to that used to
manage CPU-specific per-VM data.
That is, add new vmmops methods to init and cleanup a single vCPU.
The init method returns a pointer that is stored in 'struct vcpu' as a
cookie pointer. This cookie pointer is now passed to other vmmops
callbacks in place of the integer index. The index is now only used
in KTR traces and when calling back into the CPU-independent layer.
John Baldwin [Fri, 18 Nov 2022 17:58:56 +0000 (09:58 -0800)]
vmm vmx: Add a global bool to indicate if the host has the TSC_AUX MSR.
A future commit will remove direct access to vCPU structures from
struct vmx, so add a dedicated boolean for this rather than checking
the capabilities for vCPU 0.
John Baldwin [Fri, 18 Nov 2022 17:58:41 +0000 (09:58 -0800)]
vmm: Rework snapshotting of CPU-specific per-vCPU data.
Previously some per-vCPU state was saved in vmmops_snapshot and other
state was saved in vmmops_vcmx_snapshot. Consolidate all per-vCPU
state into the latter routine and rename the hook to the more generic
'vcpu_snapshot'. Note that the CPU-independent per-vCPU data is still
stored in a separate blob as well as the per-vCPU local APIC data.
John Baldwin [Fri, 18 Nov 2022 17:57:29 +0000 (09:57 -0800)]
vmm: Simplify saving of absolute TSC values in snapshots.
Read the current "now" TSC value and use it to compute absolute time
saved value in vm_snapshot_vcpus rather than iterating over vCPUs
multiple times in vm_snapshot_vm.
John Baldwin [Tue, 29 Nov 2022 01:10:30 +0000 (17:10 -0800)]
bhyve: Avoid passing a possible garbage pointer to free().
All of the error paths in pci_vtcon_sock_add free the sock pointer.
However, sock is not initialized until part way through the function.
An early error would pass stack garbage to free().
John Baldwin [Tue, 29 Nov 2022 01:10:07 +0000 (17:10 -0800)]
bhyve: Appease warning about a potentially unaligned pointer.
When initializing the device model for a PCI pass through device that
uses MSI-X, bhyve reads the MSI-X capability from the real device to
save a copy in the emulated PCI config space. It also saves a copy in
a local struct msixcap on the stack. Since struct msixcap is packed,
GCC complains that casting a pointer to the struct to a uint32_t
pointer may result in an unaligned pointer.
This path is not performance critical, so to appease the compiler,
simply change the pointer to a char * and use memcpy to copy the 4
bytes read in each iteration of the loop.
John Baldwin [Tue, 29 Nov 2022 01:09:15 +0000 (17:09 -0800)]
bhyve: Avoid unlikely truncation of the blockif ident strings.
The ident string for NVMe and VirtIO block deivces do not contain the
bus, and the various fields can potentially use up to three characters
when printed as unsigned values (full range of uint8_t) even if not
likely in practice.
John Baldwin [Tue, 29 Nov 2022 01:08:09 +0000 (17:08 -0800)]
bhyve: Fix sign compare warnings in the e1000 device model.
Adding a bare constant to a uint16_t promotes to a signed int which
triggers these warnings. Changing the constant to be explicitly
unsigned instead promotes the expression to unsigned int.
Mark Johnston [Fri, 18 Nov 2022 19:10:33 +0000 (14:10 -0500)]
bhyve: Enable the default compiler warnings
Disable -Wcast-align for now since we have many instances of that
warning (I fixed some but not most of them) and platforms on which bhyve
runs don't particularly care about unaligned accesses.
John Baldwin [Thu, 26 Jan 2023 20:19:12 +0000 (12:19 -0800)]
bhyve: Mark pci_de_vinput as static.
Originally in main this was fixed in commit 37045dfa891a. However,
when that commit was merged to stable/13 in commit 976ed044fbbb,
pci_virtio_input was not yet merged to stable/13. This is a direct
commit to complete the earlier MFC.
Mark Johnston [Fri, 18 Nov 2022 19:07:20 +0000 (14:07 -0500)]
bhyve: Let BASL compile with raised warnings
- Make basl_dump() as unused.
- Avoid arithmetic on a void pointer.
- Avoid a signed/unsigned comparison with
BASL_TABLE_CHECKSUM_LEN_FULL_TABLE.
- Ignore warnings about unused parameters from stuff pulled in by
acpi.h. In particular, any prototype wrapped by
ACPI_DBG_DEPENDENT_RETURN_VOID() will raise such parameters unless
ACPI_DEBUG_OUTPUT is defined.
Mark Johnston [Fri, 18 Nov 2022 19:07:38 +0000 (14:07 -0500)]
bhyve: Avoid using a packed struct for xhci port registers
I believe the __packed annotation is there only because
pci_xhci_portregs_read() is treating the register set as an array of
uint32_t. clang warns about taking the address of portregs->portsc
because it is a packed member and thus might not have expected
alignment.
Fix the problem by simply selecting the field to read with a switch
statement. This mimics pci_xhci_portregs_write(). While here, switch
to using some symbolic constants.
There is a small semantic change here in that pci_xhci_portregs_read()
would silently truncate unaligned offsets. For consistency with
pci_xhci_portregs_write(), which does not do that, return all ones for
unaligned reads instead.
Warner Losh [Thu, 27 Oct 2022 17:32:18 +0000 (11:32 -0600)]
bhyve: Implement MSR_MISC_FEATURES_ENABLES
Linux reads MISC_FEATURES_ENABLES to manage the CPUID faulting feature
(undocumented in the Intel SDM, but documented in 323850-004 (Intel
Virtualization Technology FlexMigration Application Note). Since bhyve
doesn't emulate this feature, we always return 0. Neither does bhyve
support the MONITOR/MWAIT fault bit also in this MSR (which is
documented in the sdm), so always return 0.
Mark Johnston [Tue, 25 Oct 2022 13:16:23 +0000 (09:16 -0400)]
bhyve: Address warnings in blockif_proc()
- Use unsigned types for all arithmetic. Use a new signed variable for
holding the return value of pread() and pwrite().
- Handle short I/O from pwrite().
Mark Johnston [Sat, 22 Oct 2022 17:41:33 +0000 (13:41 -0400)]
bhyve: Fix some warnings in the snapshot code
- Qualify unexported symbols with "static".
- Drop some unnecessary and incorrect casts.
- Avoid arithmetic on void pointers.
- Avoid signed/unsigned comparisons in loops which use nitems() as a
bound.
Mark Johnston [Thu, 8 Sep 2022 23:08:10 +0000 (19:08 -0400)]
bhyve: Address some warnings in bhyverun.c
- Add const and __unused qualifiers where appropriate.
- Localize some global variables.
- Consistently spell vmexit state as "vme" in vmexit handlers, to avoid
shadowing the global vm_exit state array.
- Similarly, avoid shadowing "optarg".
Chuck Tuffli [Sun, 14 Aug 2022 14:47:34 +0000 (07:47 -0700)]
bhyve nvme: Fix Controller init error cases
Fuzzing of bhyve uncovered an assertion failure in the NVMe emulation.
Investigation uncovered several corner cases the code did not handle.
This change handles several Controller initialization errors, including
- bad AQ sizes
- bad AQ vm_map_gpa
- doorbell writes prior to RDY
- doorbell writes to uninitialized queue
- CSTS.RDY if CFS set
Vitaliy Gusev [Thu, 30 Jun 2022 21:21:57 +0000 (14:21 -0700)]
libvmmapi: Add vm_close()
Currently there is no way to safely free a vm structure without
leaking the fd. vm_destroy() closes the fd but also destroys the VM
whereas in some cases a VM needs to be opened (vm_open) and then
closed (vm_close).
Vitaliy Gusev [Thu, 23 Jun 2022 18:46:06 +0000 (11:46 -0700)]
bhyve: Snapshot impovements for 'blockif' backend
When pausing a block I/O device model as part of suspending a VM, wait
for all active block I/O requests to finish before saving snapshot
data. This avoids having to save information about in-flight requests
both in the block_if layer and in storage device models.
For the AHCI device model, the queues are now guaranteed to be idle
when taking a snapshot, so remove the code to save queue state and
rely on the initial state in a resumed VM having all queues already
idle.
This will also simplify adding NVMe snapshot support in the future.
Robert Wing [Sun, 10 Apr 2022 21:37:24 +0000 (13:37 -0800)]
vmm_instruction_emul.c: fix bhyve build
The __diagused macro was used to cure a "set but not used" warning. This
broke the build for bhyve since __diagused is only defined in the
kernel. Define __diagused when not building the kernel.
Fixes: 5241577a223d ("vmm: fix set but not used warning")
Reported by: Jenkins
virtio-console is currently missing .pe_legacy_config, which prevents any
portN configuration from being parsed, and therefore no sockets will be
created.