John Baldwin [Fri, 18 Nov 2022 18:02:09 +0000 (10:02 -0800)]
vmm: Use struct vcpu in the instruction emulation code.
This passes struct vcpu down in place of struct vm and and integer
vcpu index through the in-kernel instruction emulation code. To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.
A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.
John Baldwin [Fri, 18 Nov 2022 18:01:57 +0000 (10:01 -0800)]
vmm: Add vm_gpa_hold_global wrapper function.
This handles the case that guest pages are being held not on behalf of
a virtual CPU but globally. Previously this was handled by passing a
vcpuid of -1 to vm_gpa_hold, but that will not work in the future when
vm_gpa_hold is changed to accept a struct vcpu pointer.
John Baldwin [Fri, 18 Nov 2022 18:00:00 +0000 (10:00 -0800)]
vmm: Remove the per-vm cookie argument from vmmops taking a vcpu.
This requires storing a reference to the per-vm cookie in the
CPU-specific vCPU structure. Take advantage of this new field to
remove no-longer-needed function arguments in the CPU-specific
backends. In particular, stop passing the per-vm cookie to functions
that either don't use it or only use it for KTR traces.
John Baldwin [Fri, 18 Nov 2022 17:59:21 +0000 (09:59 -0800)]
vmm: Refactor storage of CPU-dependent per-vCPU data.
Rather than storing static arrays of per-vCPU data in the CPU-specific
per-VM structure, adopt a more dynamic model similar to that used to
manage CPU-specific per-VM data.
That is, add new vmmops methods to init and cleanup a single vCPU.
The init method returns a pointer that is stored in 'struct vcpu' as a
cookie pointer. This cookie pointer is now passed to other vmmops
callbacks in place of the integer index. The index is now only used
in KTR traces and when calling back into the CPU-independent layer.
John Baldwin [Fri, 18 Nov 2022 17:58:56 +0000 (09:58 -0800)]
vmm vmx: Add a global bool to indicate if the host has the TSC_AUX MSR.
A future commit will remove direct access to vCPU structures from
struct vmx, so add a dedicated boolean for this rather than checking
the capabilities for vCPU 0.
John Baldwin [Fri, 18 Nov 2022 17:58:41 +0000 (09:58 -0800)]
vmm: Rework snapshotting of CPU-specific per-vCPU data.
Previously some per-vCPU state was saved in vmmops_snapshot and other
state was saved in vmmops_vcmx_snapshot. Consolidate all per-vCPU
state into the latter routine and rename the hook to the more generic
'vcpu_snapshot'. Note that the CPU-independent per-vCPU data is still
stored in a separate blob as well as the per-vCPU local APIC data.
John Baldwin [Fri, 18 Nov 2022 17:57:29 +0000 (09:57 -0800)]
vmm: Simplify saving of absolute TSC values in snapshots.
Read the current "now" TSC value and use it to compute absolute time
saved value in vm_snapshot_vcpus rather than iterating over vCPUs
multiple times in vm_snapshot_vm.
John Baldwin [Tue, 29 Nov 2022 01:10:30 +0000 (17:10 -0800)]
bhyve: Avoid passing a possible garbage pointer to free().
All of the error paths in pci_vtcon_sock_add free the sock pointer.
However, sock is not initialized until part way through the function.
An early error would pass stack garbage to free().
John Baldwin [Tue, 29 Nov 2022 01:10:07 +0000 (17:10 -0800)]
bhyve: Appease warning about a potentially unaligned pointer.
When initializing the device model for a PCI pass through device that
uses MSI-X, bhyve reads the MSI-X capability from the real device to
save a copy in the emulated PCI config space. It also saves a copy in
a local struct msixcap on the stack. Since struct msixcap is packed,
GCC complains that casting a pointer to the struct to a uint32_t
pointer may result in an unaligned pointer.
This path is not performance critical, so to appease the compiler,
simply change the pointer to a char * and use memcpy to copy the 4
bytes read in each iteration of the loop.
John Baldwin [Tue, 29 Nov 2022 01:09:15 +0000 (17:09 -0800)]
bhyve: Avoid unlikely truncation of the blockif ident strings.
The ident string for NVMe and VirtIO block deivces do not contain the
bus, and the various fields can potentially use up to three characters
when printed as unsigned values (full range of uint8_t) even if not
likely in practice.
John Baldwin [Tue, 29 Nov 2022 01:08:09 +0000 (17:08 -0800)]
bhyve: Fix sign compare warnings in the e1000 device model.
Adding a bare constant to a uint16_t promotes to a signed int which
triggers these warnings. Changing the constant to be explicitly
unsigned instead promotes the expression to unsigned int.
Mark Johnston [Fri, 18 Nov 2022 19:10:33 +0000 (14:10 -0500)]
bhyve: Enable the default compiler warnings
Disable -Wcast-align for now since we have many instances of that
warning (I fixed some but not most of them) and platforms on which bhyve
runs don't particularly care about unaligned accesses.
John Baldwin [Thu, 26 Jan 2023 20:19:12 +0000 (12:19 -0800)]
bhyve: Mark pci_de_vinput as static.
Originally in main this was fixed in commit 37045dfa891a. However,
when that commit was merged to stable/13 in commit 976ed044fbbb,
pci_virtio_input was not yet merged to stable/13. This is a direct
commit to complete the earlier MFC.
Mark Johnston [Fri, 18 Nov 2022 19:07:20 +0000 (14:07 -0500)]
bhyve: Let BASL compile with raised warnings
- Make basl_dump() as unused.
- Avoid arithmetic on a void pointer.
- Avoid a signed/unsigned comparison with
BASL_TABLE_CHECKSUM_LEN_FULL_TABLE.
- Ignore warnings about unused parameters from stuff pulled in by
acpi.h. In particular, any prototype wrapped by
ACPI_DBG_DEPENDENT_RETURN_VOID() will raise such parameters unless
ACPI_DEBUG_OUTPUT is defined.
Mark Johnston [Fri, 18 Nov 2022 19:07:38 +0000 (14:07 -0500)]
bhyve: Avoid using a packed struct for xhci port registers
I believe the __packed annotation is there only because
pci_xhci_portregs_read() is treating the register set as an array of
uint32_t. clang warns about taking the address of portregs->portsc
because it is a packed member and thus might not have expected
alignment.
Fix the problem by simply selecting the field to read with a switch
statement. This mimics pci_xhci_portregs_write(). While here, switch
to using some symbolic constants.
There is a small semantic change here in that pci_xhci_portregs_read()
would silently truncate unaligned offsets. For consistency with
pci_xhci_portregs_write(), which does not do that, return all ones for
unaligned reads instead.
Warner Losh [Thu, 27 Oct 2022 17:32:18 +0000 (11:32 -0600)]
bhyve: Implement MSR_MISC_FEATURES_ENABLES
Linux reads MISC_FEATURES_ENABLES to manage the CPUID faulting feature
(undocumented in the Intel SDM, but documented in 323850-004 (Intel
Virtualization Technology FlexMigration Application Note). Since bhyve
doesn't emulate this feature, we always return 0. Neither does bhyve
support the MONITOR/MWAIT fault bit also in this MSR (which is
documented in the sdm), so always return 0.
Mark Johnston [Tue, 25 Oct 2022 13:16:23 +0000 (09:16 -0400)]
bhyve: Address warnings in blockif_proc()
- Use unsigned types for all arithmetic. Use a new signed variable for
holding the return value of pread() and pwrite().
- Handle short I/O from pwrite().
Mark Johnston [Sat, 22 Oct 2022 17:41:33 +0000 (13:41 -0400)]
bhyve: Fix some warnings in the snapshot code
- Qualify unexported symbols with "static".
- Drop some unnecessary and incorrect casts.
- Avoid arithmetic on void pointers.
- Avoid signed/unsigned comparisons in loops which use nitems() as a
bound.
Mark Johnston [Thu, 8 Sep 2022 23:08:10 +0000 (19:08 -0400)]
bhyve: Address some warnings in bhyverun.c
- Add const and __unused qualifiers where appropriate.
- Localize some global variables.
- Consistently spell vmexit state as "vme" in vmexit handlers, to avoid
shadowing the global vm_exit state array.
- Similarly, avoid shadowing "optarg".
Chuck Tuffli [Sun, 14 Aug 2022 14:47:34 +0000 (07:47 -0700)]
bhyve nvme: Fix Controller init error cases
Fuzzing of bhyve uncovered an assertion failure in the NVMe emulation.
Investigation uncovered several corner cases the code did not handle.
This change handles several Controller initialization errors, including
- bad AQ sizes
- bad AQ vm_map_gpa
- doorbell writes prior to RDY
- doorbell writes to uninitialized queue
- CSTS.RDY if CFS set
Vitaliy Gusev [Thu, 30 Jun 2022 21:21:57 +0000 (14:21 -0700)]
libvmmapi: Add vm_close()
Currently there is no way to safely free a vm structure without
leaking the fd. vm_destroy() closes the fd but also destroys the VM
whereas in some cases a VM needs to be opened (vm_open) and then
closed (vm_close).
Vitaliy Gusev [Thu, 23 Jun 2022 18:46:06 +0000 (11:46 -0700)]
bhyve: Snapshot impovements for 'blockif' backend
When pausing a block I/O device model as part of suspending a VM, wait
for all active block I/O requests to finish before saving snapshot
data. This avoids having to save information about in-flight requests
both in the block_if layer and in storage device models.
For the AHCI device model, the queues are now guaranteed to be idle
when taking a snapshot, so remove the code to save queue state and
rely on the initial state in a resumed VM having all queues already
idle.
This will also simplify adding NVMe snapshot support in the future.
Robert Wing [Sun, 10 Apr 2022 21:37:24 +0000 (13:37 -0800)]
vmm_instruction_emul.c: fix bhyve build
The __diagused macro was used to cure a "set but not used" warning. This
broke the build for bhyve since __diagused is only defined in the
kernel. Define __diagused when not building the kernel.
Fixes: 5241577a223d ("vmm: fix set but not used warning")
Reported by: Jenkins
virtio-console is currently missing .pe_legacy_config, which prevents any
portN configuration from being parsed, and therefore no sockets will be
created.
Robert Wing [Wed, 3 Mar 2021 06:05:47 +0000 (21:05 -0900)]
bhyve/snapshot: provide a way to send other messages/data to bhyve
This is a step towards sending messages (other than suspend/checkpoint)
from bhyvectl to bhyve.
Introduce a new struct, ipc_message - this struct stores the type of
message and a union containing message specific structures for the type
of message being sent.
Robert Wing [Mon, 8 Mar 2021 00:23:29 +0000 (15:23 -0900)]
bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM
The save/restore feature uses a unix domain socket to send messages
from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice
for this.
An added benefit of using a datagram socket is simplified code. For
bhyve, the listen/accept calls are dropped; and for bhyvectl, the
connect() call is dropped.
EPRINTLN handles raw mode for bhyve(8), use it to print error messages.
Robert Wing [Sat, 27 Feb 2021 21:07:35 +0000 (12:07 -0900)]
bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME
MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character
buffer that stores a filename or the path to a file - this file is used
by the save/restore feature.
Since the file doesn't have anything to do with a vm name, rename
MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX
while here.
Robert Wing [Sat, 27 Feb 2021 21:05:52 +0000 (12:05 -0900)]
bhyvectl: reduce code duplication
Combine send_start_checkpoint() and send_start_suspend() into a
single function named snapshot_request().
snapshot_request() is equivalent to send_start_checkpoint() and
send_start_suspend() except that it takes an additional argument. The
additional argument, enum ipc_opcode, is used to determine the type of
snapshot request being performed. Also, switch to using strlcpy instead
of strncpy.