pf tests: test fast port re-use with syncookies

When a src/dst ip/port tuple is re-used before the pf state fully
expires we clean up the state and create a new one, unless syncookies
are enabled.

Test this by running two back-to-back nc sessions, with a fixed source
port. Move the interface and IP to a different (vnet) jail, to trick the
network stack into letting us do this.

MFC after:      2 weeks
Event:          Aberdeen hackathon 2022
Differential Revision:  https://reviews.freebsd.org/D36886

(cherry picked from commit dc698b2cd59ebc08b05a261dbba8ee5707450d28)

pf: fix syncookies in conjunction with tcp fast port reuse

Basic scenario: we have a closed connection (in TCPS_FIN_WAIT_2), and
get a new connection (i.e. SYN) re-using the tuple.

Without syncookies we look at the SYN, and completely unlink the old,
closed state on the SYN.
With syncookies we send a generated SYN|ACK back, and drop the SYN,
never looking at the state table.

So when the ACK (i.e. the third step in the three way handshake for
connection setup) turns up, we’ve not actually removed the old state, so
we find it, and don’t do the syncookie dance, or allow the new
connection to get set up.

Explicitly check for this in pf_test_state_tcp(). If we find a state in
TCPS_FIN_WAIT_2 and the syncookie is valid we delete the existing state
so we can set up the new state.
Note that when we verify the syncookie in pf_test_state_tcp() we don't
decrement the number of half-open connections to avoid an incorrect
double decrement.
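
The check described above can be sketched roughly as follows. This is a
minimal userland model, not the pf code: struct conn_state, enum
conn_phase and classify_ack() are hypothetical names, and syncookie
validation is reduced to a boolean supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for pf's state tracking. */
enum conn_phase { PHASE_ESTABLISHED, PHASE_FIN_WAIT_2 };

struct conn_state {
	enum conn_phase phase;
	bool in_use;
};

enum verdict { USE_EXISTING_STATE, REPLACE_STATE };

/*
 * Decide what to do when the final ACK of a handshake matches an
 * existing state entry. cookie_ok is the result of validating the
 * syncookie carried in the ACK (validation itself is not modelled).
 */
enum verdict
classify_ack(struct conn_state *st, bool cookie_ok)
{
	if (st != NULL && st->in_use &&
	    st->phase == PHASE_FIN_WAIT_2 && cookie_ok) {
		/* Old, closed connection: drop it so a new state can be set up. */
		st->in_use = false;
		return (REPLACE_STATE);
	}
	return (USE_EXISTING_STATE);
}
```

Only the combination of a closed old state and a valid cookie replaces
the entry; every other ACK keeps using the state that was found.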

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37919

(cherry picked from commit 9c041b450d5e604c3e35b5799b60a2c53795feef)

pf: fix panic on deferred packets

The pfsync_defer_tmo() callout needs to set the correct vnet before it
can transmit packets. It used the rcvif in the mbuf to get this vnet,
but that doesn't work for locally originated traffic. In that case the
rcvif pointer is NULL, and the dereference leads to a panic.

Instead use the sc_sync_if, which is always set (if pfsync is enabled,
at least).
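
A reduced illustration of the idea, assuming simplified stand-in
structures (struct packet, struct sync_softc and defer_vnet() are
hypothetical and only loosely mirror the kernel types): the vnet is taken
from the sync interface, so a NULL rcvif on the packet is never
dereferenced.

```c
#include <stddef.h>

struct vnet { int id; };
struct ifnet { struct vnet *if_vnet; };

/* rcvif is NULL for locally originated traffic. */
struct packet { struct ifnet *rcvif; };

/* sc_sync_if is assumed to be set whenever syncing is enabled. */
struct sync_softc { struct ifnet *sc_sync_if; };

/*
 * Pick the vnet used to transmit a deferred packet. The packet's rcvif
 * is deliberately not consulted because it may be NULL.
 */
struct vnet *
defer_vnet(const struct sync_softc *sc, const struct packet *pkt)
{
	(void)pkt;
	return (sc->sc_sync_if->if_vnet);
}
```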

PR: 268246
MFC after: 2 weeks

(cherry picked from commit fd02192c3acaefeb62db11e0c10ab36240b79ba2)

13.2: update stable/13 to -PRERELEASE to start the release cycle

Approved by: re (implicit)
Sponsored by: https://www.patreon.com/cperciva

cal: don't print terminal control characters unless stdout is a TTY

A similar change was made in svn r223931, but it was incomplete, working
only when the utility was invoked as "ncal". Fix the same issue when
invoking as "cal".
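
The general pattern can be sketched as below. This is not the cal source:
print_day() and want_highlight() are hypothetical helpers, and the ANSI
reverse-video escape sequences are only an illustrative stand-in for
whatever control characters the utility really emits.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write text to out, wrapping it in reverse-video escape sequences only
 * when highlighting is requested. Returns what fprintf() returns.
 */
int
print_day(FILE *out, const char *text, bool highlight)
{
	if (highlight)
		return (fprintf(out, "\033[7m%s\033[27m", text));
	return (fprintf(out, "%s", text));
}

/* Highlight only when standard output is a terminal. */
bool
want_highlight(void)
{
	return (isatty(STDOUT_FILENO) != 0);
}
```

A caller would pass want_highlight() as the third argument, so output
redirected to a file or pipe contains plain text only.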

PR: 268936
Reported by: Ray Bellis <ray@bellis.me.uk>
Sponsored by: Axcient
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D38045

(cherry picked from commit 92e978439f0c3139775ad96d412959f5a74b17b6)

Switch wg(4) to the new if_clone KPI

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D37740

(cherry picked from commit eb3f9a7aece9473d678adddcf6aefe6c1eec0ac4)

fsx: more consistent debug output with -[RWU]

(cherry picked from commit da303f5fd4ee8582cf76712212dd0b711d7d9dac)

fsx: bounds check the inputs

In particular, don't allow the user to specify a file size that can't be
expressed as an int, since fsx's random-number generator only has a
32-bit range.
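
One way to perform such a check is sketched below. parse_int_size() is a
hypothetical helper, not the function fsx uses, and it accepts plain
decimal input only.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a non-negative decimal size that must fit in an int.
 * Returns true and stores the value in *out on success.
 */
bool
parse_int_size(const char *arg, int *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (end == arg || *end != '\0')
		return (false);		/* empty input or trailing junk */
	if (errno == ERANGE || val < 0 || val > INT_MAX)
		return (false);		/* not representable as an int */
	*out = (int)val;
	return (true);
}
```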

(cherry picked from commit 3f8ca7a22ed917a3e3a4ad78538d9f468d6d3bd8)

ping(8): man page cleanup

* Appease mandoc -T lint and igor

* Use example.com for documentation

* Update the IPv4 TTL section specifically for FreeBSD.
  FreeBSD changed the default TTL to 64 in
  5639e86bdd7ea151958776264bf5a67e60a54d68.  NetBSD and OpenBSD still
  use 255.  Remove some references to extinct operating systems.

Reviewed by: gbe (manpages), asomers
Pull Request: https://github.com/freebsd/freebsd-src/pull/630

(cherry picked from commit 8eb4df948711166a438f4111f7069a412d1456bd)

Add test cases for ping with IP options in the response

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D37210

(cherry picked from commit e35cfc606a299ef40767e708362529c370f767f5)

improvements to cap_sysctl.3

* Correct some function prototypes which were documented with the wrong
pointer type.
* Clarify return values and requirements for freeing the limit handle.

[skip ci]

Sponsored by: Axcient
Reviewed by: oshogbo
Differential Revision: https://reviews.freebsd.org/D37586

(cherry picked from commit 6c93a2d0bc37f0c912e402f3f94c3c01350dca26)

loader: md: Use default func for fmtdev and parsedev

The default functions are enough for md, so use them instead of the
disk ones, which don't work for it anymore.

Reviewed by: imp
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: now
Differential Revision: https://reviews.freebsd.org/D38218

(cherry picked from commit 04afa8cc370e6bb7302b6fe09c8d27a606fe414e)

Allow any user to read the NFS stats, for example with nfsstat(1).

This was originally allowed by 3cea29603d3 (2011). But it got broken by
693957f8861 (2016) and apparently nobody noticed.

Sponsored by: Axcient
Reviewed by: rmacklem, ken
Differential Revision: https://reviews.freebsd.org/D37589

(cherry picked from commit d2ce00e9a6176014bbeb792dd9959ef1e60d787e)

document first appearance of fhlink et al

[skip ci]

Sponsored by: Axcient
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D37575

(cherry picked from commit 34120c0c5234a56945f9a732b05a8d8b97492916)

ena: Update driver version to v2.6.2

Bug Fixes:
* Remove timer service re-arm on ena_restore_device failure.
* Re-Enable per-packet missing tx completion print

Minor Changes:
* Switch driver owners from Semihalf to Amazon in man file.

MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Pull Request: https://github.com/freebsd/freebsd-src/pull/637

(cherry picked from commit e5de1d8dad25308f3a332f9386e05b851a36c507)

ena: Switch driver owners from semihalf to amazon in man file

1. Update ena.4 manual file to include Amazon owner emails.
2. State that the driver is developed by Amazon, but keep the note
that it was originally written by Semihalf, similarly to other
drivers in the /share/man/ directory of the FreeBSD source code.
3. Advance year in copyright notice to 2022.

MFC after: 2 weeks
Sponsored by: Amazon, Inc.

(cherry picked from commit fb47286c38a5315bdd93389b0547b8ccccdc7327)

ena: Remove timer service re-arm on ena_restore_device failure

If ena_restore_device() fails during the reset sequence
(ena_destroy_device() followed by ena_restore_device()), the driver
resources are freed. After the clean-up, the timer service was
re-armed in order to retry initializing the driver state.
Such an attempt is bound to fail because the resources have been freed,
and it can bring the system down or cause a panic.
Once ena_restore_device() fails, the only way to recover is to unload
and reload the driver or to reboot the instance.

This change removes the timer service re-arm in case of failure
in ena_restore_device().

MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Fixes: 78554d0c707c ("ena: start timer service on attach")
(cherry picked from commit c4a85b8d684d3db9dc4d3d01d966130e21390529)

ena: Re-Enable per-packet missing tx completion print

Commit [1] first added the ena_tx_buffer.print_once member,
so that a message about a missing tx completion is printed only
once per packet (and not every second when the watchdog runs).
In this commit print_once is initialized to true, and is set back
to false after detecting a missing tx completion and printing
a warning about it to dmesg.

Commit [2] incorrectly reversed the values assigned to print_once.
The variable is initialized to true but is checked for being false
when a missing tx completion is detected. That check never passes, so
the per-packet warning for a missing tx completion has not been
printed since that commit.

Commit [3] added time passed since last TX cleanup to the missing
tx completions per-packet print. However, due to the issue in commit
[2], this time is never printed.

This commit restores the values assigned to ena_tx_buffer.print_once,
which commit [2] reversed by mistake, bringing the missing tx
completion per-packet print back to life.

Also add a space after "." in the missing tx completion print.
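
The intended print_once behaviour described above can be modelled like
this. The structure and function names are hypothetical and heavily
reduced; only the flag handling follows the description.

```c
#include <stdbool.h>

/* Hypothetical, reduced version of a Tx buffer descriptor. */
struct tx_buffer {
	bool print_once;		/* true until the warning was printed */
	bool completion_missing;	/* set when a completion is overdue */
};

void
tx_buffer_init(struct tx_buffer *b)
{
	b->print_once = true;
	b->completion_missing = false;
}

/*
 * Called on every watchdog pass. Returns true when a warning should be
 * printed for this buffer, which happens at most once per packet.
 */
bool
should_warn(struct tx_buffer *b)
{
	if (!b->completion_missing || !b->print_once)
		return (false);
	b->print_once = false;
	return (true);
}
```

Swapping the initial value and the value stored after printing, as
commit [2] did, makes should_warn() return false on every pass.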

[1] - 9b8d05b8ac78 ("Add support for Amazon Elastic Network Adapter (ENA) NIC")
[2] - 74dba3ad7851 ("Split function checking for missing TX completion in ENA driver")
[3] - d8aba82b5ca7 ("ena: Store ticks of last Tx cleanup")

Fixes: 74dba3ad7851 ("Split function checking for missing TX completion in ENA driver")
Fixes: d8aba82b5ca7 ("ena: Store ticks of last Tx cleanup")
MFC after: 2 weeks
Sponsored by: Amazon, Inc.

(cherry picked from commit f01b2cd98e93ef5ac3698b97bbaa45ac9c50fe5d)

ena: Remove write only variables

Sponsored by: Netflix

(cherry picked from commit 094b2a2379c1c92f134fac22be78517c71470ca6)

ena: Remove unused variable.

(cherry picked from commit 4dab99b936920cf0c8f3fcf63754985906714a40)

ena: Remove unused devclass argument to DRIVER_MODULE.

(cherry picked from commit 1dc1476cac52a65c6a25e786f90ccc7529b6a7de)

ipfw: Add missing 'va' code point name

Per RFC 5865, add the 'va' (VOICE-ADMIT, 101100) symbolic name.
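
For reference, binary 101100 is decimal 44 (0x2c). A small name-to-value
table in the spirit of this change might look like the sketch below;
dscp_names and dscp_lookup() are hypothetical and are not ipfw's own
table or API.

```c
#include <stddef.h>
#include <string.h>

struct dscp_name {
	const char *name;
	unsigned int value;	/* 6-bit DSCP code point */
};

/* A few DSCP names with their code points. */
static const struct dscp_name dscp_names[] = {
	{ "af41", 0x22 },	/* 100010 */
	{ "va",   0x2c },	/* 101100, VOICE-ADMIT (RFC 5865) */
	{ "ef",   0x2e },	/* 101110 */
};

/* Return the code point for name, or -1 if the name is unknown. */
int
dscp_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(dscp_names) / sizeof(dscp_names[0]); i++) {
		if (strcmp(dscp_names[i].name, name) == 0)
			return ((int)dscp_names[i].value);
	}
	return (-1);
}
```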

Reviewed By: melifaro, pauamma
Differential Revision: https://reviews.freebsd.org/D37508
MFC after: 2 weeks

(cherry picked from commit bdd60b224fa461a5849f60575afdb458613f4ccd)

netlink: add netlink to GENERIC@amd64

Netlink is a communication protocol defined in RFC 3549. It is an async,
TLV-based protocol, providing 1-1 and 1-many communications between kernel
and userland. Netlink is currently used in the Linux kernel to modify, read and
subscribe to nearly all networking state. Interface state, addresses, routes,
firewall, rules, fibs, etc, are controlled via Netlink.

Netlink support was added in D36002. It has gained a number of improvements and
its first users since then:
* net/bird2 got netlink support, enabling route multipath in FreeBSD
* netlink-based devd notifications are being worked on (D37574).
* linux(4) fully supports and depends on Netlink

Enabling Netlink in GENERIC targets two goals.
The first one is to provide stability for third-party userland applications,
so they can rely on the fact that netlink always exists since 14.0 and potentially 13.2.
A loadable module makes life harder for app developers. For example, `net/bird2` can be
built with either netlink or rtsock support, but not both.

The second goal is to enable gradual conversion of the base userland tools
to use netlink(4) interfaces. Converting tools like netstat (D36529), route,
ifconfig one-by-one simplifies testing and addressing the feedback.
Otherwise, switching all of base to netlink at once may be too big of a leap.

This change targets amd64; the other architectures will follow soon.

Differential Revision: https://reviews.freebsd.org/D37783

(cherry picked from commit 692e19cf519578176d51d4c1001b01b1f355c1de)

bhyve: Mark pci_de_vinput as const.

I missed this in the prior fixup commit that added static.

Fixes: 03851ae8cd4d bhyve: Mark pci_de_vinput as static.

bhyveload: open guest boot disk image O_RDWR

When a boot environment has been booted via the bootonce feature,
userboot clears the bootonce value from an nvlist but fails to write the
updated nvlist back to disk.

The failure occurs because bhyveload opens the guest boot disk image
O_RDONLY. Fix this by opening it O_RDWR.
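
The underlying behaviour is easy to reproduce in isolation. In the
sketch below, try_update() is a hypothetical helper (not bhyveload code)
that opens a file with the given flags and attempts a one-byte write.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open path with the given flags and try to write one byte at offset 0.
 * Returns 1 if the write succeeded, 0 if it failed, and -1 if the file
 * could not be opened.
 */
int
try_update(const char *path, int flags)
{
	int fd, ok;

	fd = open(path, flags);
	if (fd < 0)
		return (-1);
	ok = (pwrite(fd, "x", 1, 0) == 1);
	close(fd);
	return (ok);
}
```

With O_RDONLY the open succeeds but the write is rejected, so an update
is silently lost unless the caller checks the result; with O_RDWR the
same write goes through.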

Reviewed by: imp, markj, jhb
Differential Revision: https://reviews.freebsd.org/D37274

(cherry picked from commit 5a023bd2a53a7279b126ae6bf949560c6342b57a)

bhyve: Fix a buffer overread in the PCI hda device model.

The sc->codecs array contains HDA_CODEC_MAX (15) entries. The
guest-supplied cad field in the verb provided to hda_send_command is a
4-bit field that was used as an index into sc->codecs without any
bounds checking. The highest value (15) would overflow the array.

Other uses of sc->codecs in the device model used sc->codecs_no to
determine which array indices have been initialized, so use a similar
check to reject requests for uninitialized or invalid cad indices in
hda_send_command.
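
A minimal sketch of this kind of check follows. It assumes the codec
address occupies the top four bits of the 32-bit verb; the type and
function names (struct codec, struct softc, codec_lookup()) are
simplified placeholders rather than the bhyve definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define CODEC_MAX 15	/* mirrors HDA_CODEC_MAX from the text above */

struct codec { int id; };

struct softc {
	struct codec *codecs[CODEC_MAX];
	uint8_t codecs_no;	/* number of initialized entries */
};

/*
 * Look up a codec by the 4-bit address taken from a guest-supplied verb.
 * Returns NULL for addresses that are out of range or not initialized.
 */
struct codec *
codec_lookup(struct softc *sc, uint32_t verb)
{
	uint8_t cad = (verb >> 28) & 0x0f;	/* assumed position; 0..15 */

	if (cad >= sc->codecs_no)
		return (NULL);
	return (sc->codecs[cad]);
}
```

Because codecs_no never exceeds the array size, comparing against it
rejects both the out-of-range address 15 and any slot that was never
initialized.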

PR: 264582
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: corvink, markj, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38128

(cherry picked from commit cf57f20edcf9c75f0f9f1ac1c44729184970b9d9)

bhyve: Fix a global buffer overread in the PCI hda device model.

hda_write did not validate the relative register offset before using
it as an index into the hda_set_reg_table array to lookup a function
pointer to execute after updating the register's value.

PR: 264435
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: corvink, markj, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38127

(cherry picked from commit bfe8e339eb77910c2eb739b45aaa936148b33897)

bhyve: Remove vmctx argument from PCI device model methods.

Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38096

(cherry picked from commit 6a284cacb1e56a81dff7f5c34bbbd7ed1a0b6cb1)

bhyve: Fix a mismerge in the PCI passthrough device model.

This block of code was removed in stable/13 commit 7ea16192a01e. It
was accidentally re-added in commit aa5eea98b99c (probably due to
resolving a conflict during the merge).

This is a direct commit to stable/13.

bhyve: Avoid triggering false -Wfree-nonheap-object warnings.

XHCI port and slot numbers are 1-based rather than 0-based. To handle
this, bhyve was subtracting one item from the pointers saved in the
softc so that index 1 accessed index 0 of the allocated array.

However, this is UB and confused GCC 12. The compiler noticed that
the calls to free() were using an offset and emitted a warning.
Rather than storing UB pointers in the softc, push the decrement
operation into the existing macros that wrap accesses to the relevant
arrays.
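
The approach can be illustrated with a small stand-alone sketch.
MAX_PORTS, struct port, struct softc and PORT_PTR() are hypothetical
names chosen for the example, not the identifiers used in bhyve.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_PORTS 4

struct port { int status; };

struct softc {
	struct port *ports;	/* points at the real start of the allocation */
};

/* Ports are numbered 1..MAX_PORTS; translate to a 0-based index here. */
#define PORT_PTR(sc, n)	(&(sc)->ports[(n) - 1])

int
softc_init(struct softc *sc)
{
	sc->ports = calloc(MAX_PORTS, sizeof(*sc->ports));
	return (sc->ports == NULL ? -1 : 0);
}

void
softc_fini(struct softc *sc)
{
	/* free() receives exactly the pointer calloc() returned. */
	free(sc->ports);
	sc->ports = NULL;
}
```

Keeping the unadjusted pointer in the structure means no out-of-bounds
pointer is ever formed, and the 1-based numbering stays confined to the
accessor macro.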

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D36829

(cherry picked from commit b36b14beda4ff7ecbb906ada756141f76fcb81aa)

bhyve: Simplify spinup_ap_realmode slightly.

There is no reason to modify the passed in rip variable.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37647

(cherry picked from commit e53fcff1848bb5acafc3dc38cfeb34724d9b0b96)

bhyve: Tidy vCPU pthread startup.

Set the thread affinity in fbsdrun_start_thread next to where the
thread name is set. This keeps all the pthread initialization
operations at the start of a thread in one place.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37646

(cherry picked from commit 7224a96a55d512e00f390d4477e0fb0a163d7528)

bhyve: Don't access vcpumap[vcpu] directly in parse_cpuset().

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37645

(cherry picked from commit 8487443792ce67fce21ae32470d6d8c217e93368)

bhyve: Allocate struct vm_exit on the stack in vm_loop.

The global vmexit[] array is no longer needed to smuggle the rip
value from fbsdrun_addcpu() to vm_loop().

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37644

(cherry picked from commit a20c00c60e1c022f77a7bc638f29d3d6d9456953)

bhyve: Remove some no-op code for setting RIP.

fbsdrun_addcpu() read the current vCPU's RIP register from the kernel
via vm_get_register() to pass along through some layers to vm_loop()
which then set the register via vm_set_register(). However, this is
just always setting the value back to itself.

Reviewed by: corvink
Differential Revision: https://reviews.freebsd.org/D37643

(cherry picked from commit ceb0d0b0f184d72f31ebdaa4edc752aed78a5807)

bhyve: Simplify setting vCPU capabilities.

- Enable VM_CAP_IPI_EXIT in fbsdrun_set_capabilities along with other
  capabilities enabled on all vCPUs.

- Don't call fbsdrun_set_capabilities a second time on the BSP in
  spinup_vcpu.

- To preserve previous behavior, don't unconditionally enable
  unrestricted guest mode on the BSP (this unbreaks single-vCPU guests
  on Nehalem systems, though supporting such setups is of dubious
  value).  Other places that enable UG on the BSP are careful to check
  the result of the operation and fail if it is not available.

- Don't set any capabilities in spinup_ap().  These are now all
  redundant with earlier settings from spinup_vcpu().

- While here, axe a stale comment from fbsdrun_addcpu().  This
  function is now always called from the main thread for all vCPUs.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37642

(cherry picked from commit 461663ddbad02a4a5135673d545695b1a9f25ed0)

bhyve: Remove unused return value from spinup_ap.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37641

(cherry picked from commit e7d5d2d1876afb429c6e1b4453bdd6d165bc5a68)

bhyve: Remove handler for VM_EXITCODE_SPINUP_AP.

Since commit 0bda8d3e9f7a, bhyve always enables VM_EXITCODE_IPI exits
instead, so this handler is no longer used.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37640

(cherry picked from commit 007d9ca5dd953c009cf976b49e2568444a16a473)

bhyve: Remove the unused vcpu argument from all of the I/O port handlers.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37653

(cherry picked from commit 08b05de1e21a7f3720eb618613276e3f3ab665f3)

bhyve: Remove unused vcpu argument from PCI read/write methods.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37652

(cherry picked from commit 78c2cd83eccac0ebd23ebd15df77ee33a5781a6e)

bhyve: Pass a vCPU ID of 0 to vm_setup_pptdev_msi*.

These ioctls are not vCPU-specific and the ioctl now ignores the vCPU
ID. 0 is used instead of -1 to provide limited forwards
compatibility.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37651

(cherry picked from commit 0857e5555d77357e34ea6d70b28ead6335e41d33)

bhyve: Remove unused argument from pci_nvme_handle_doorbell.

Reviewed by: corvink, chuck, markj
Differential Revision: https://reviews.freebsd.org/D37650

(cherry picked from commit 34781da505fd06fc7cdf18e6e0c0eb187391d2c7)

RELNOTES: Document lifting the hard limit on guest vCPUs in bhyve.

vmm: Free vCPUs when destroying them.

Reported by: andrew
Reviewed by: corvink, andrew, markj
Differential Revision: https://reviews.freebsd.org/D37649

(cherry picked from commit af3b48e101986fb0840739f8c4bb3195e78008b1)

vmm: Avoid infinite loop in vcpu_lock_all error case.

Reported by: Coverity (CIDs 1501060,1501071)
Reviewed by: corvink, markj, emaste
Differential Revision: https://reviews.freebsd.org/D37648

(cherry picked from commit d212d6ebb4ea3b3e9c3964c1a6d3f41817e437e1)

vmm: Don't lock a vCPU for VM_PPTDEV_MSI[X].

These are manipulating state in a ppt(4) device none of which is
vCPU-specific. Mark the vcpu fields in the relevant ioctl structures
as unused, but don't remove them for now.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37639

(cherry picked from commit 91980db1beecd52e34a1550a247e374cfcc746a2)

vmm: VM_GET/SET_KERNEMU_DEV should run with the vCPU locked.

Reviewed by: corvink, kib, markj
Differential Revision: https://reviews.freebsd.org/D37638

(cherry picked from commit 62be9ffd82fb1a03db5e04d32ab75550f1f4f2c7)

vmm: Remove stale comment for vm_rendezvous.

Support for rendezvous outside of a vcpu context (vcpuid of -1) was
removed in commit 949f0f47a4e7, and the vm, vcpuid argument pair was
replaced by a single struct vcpu pointer in commit d8be3d523dd5.

Reported by: andrew

(cherry picked from commit 1f6db5d6b5de5e0cafcdb141a988120b0faea049)

vmm: Fix build w/o KDTRACE_HOOKS.

Reviewed by: imp
Differential revision: https://reviews.freebsd.org/D37446

(cherry picked from commit 2ee1a18d51ed68ee34df7dcfd05f6cfc16110202)

vmm: Fix non-INVARIANTS build

Reported by: O. Hartmann <freebsd@walstatt-de.de>
Reviewed by: jhb
Fixes: 58eefc67a1cf
Differential Revision: https://reviews.freebsd.org/D37444

(cherry picked from commit d487cba33d777efb9f6f7d7967ad2eaa629bcb90)

vmm: Trim some pointless #ifdef KTR.

Reported by: markj
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37272

(cherry picked from commit 49fd5115a9b244c599e068977324e8f6a9993066)

vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.

The default is now the number of physical CPUs in the system rather
than 16.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37175

(cherry picked from commit ee98f99d7a68b284a669fefb969cbfc31df2d0ab)

vmm: Allocate vCPUs on first use of a vCPU.

Convert the vcpu[] array in struct vm to an array of pointers and
allocate vCPUs on first use.  This avoids always allocating VM_MAXCPU
vCPUs for each VM, but instead only allocates the vCPUs in use.  A new
per-VM sx lock is added to serialize attempts to allocate vCPUs on
first use.  However, a given vCPU is never freed while the VM is
active, so the pointer is read via an unlocked read first to avoid the
need for the lock in the common case once the vCPU has been created.
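
The allocation scheme described above resembles double-checked
initialization. The userland sketch below shows the pattern under stated
assumptions: a pthread mutex stands in for the kernel sx lock, C11
atomics stand in for the kernel's pointer accesses, and all names
(struct vm, vm_init(), vm_alloc_vcpu(), MAXCPU) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MAXCPU 8

struct vcpu { int id; };

struct vm {
	_Atomic(struct vcpu *) vcpu[MAXCPU];	/* NULL until first use */
	pthread_mutex_t alloc_lock;		/* serializes allocation */
};

void
vm_init(struct vm *vm)
{
	for (int i = 0; i < MAXCPU; i++)
		atomic_init(&vm->vcpu[i], NULL);
	pthread_mutex_init(&vm->alloc_lock, NULL);
}

/* Return the vCPU with the given id, allocating it on first use. */
struct vcpu *
vm_alloc_vcpu(struct vm *vm, int id)
{
	struct vcpu *v;

	if (id < 0 || id >= MAXCPU)
		return (NULL);

	/* Fast path: entries are never freed while the VM is active. */
	v = atomic_load_explicit(&vm->vcpu[id], memory_order_acquire);
	if (v != NULL)
		return (v);

	/* Slow path: re-check under the lock, then allocate and publish. */
	pthread_mutex_lock(&vm->alloc_lock);
	v = atomic_load_explicit(&vm->vcpu[id], memory_order_relaxed);
	if (v == NULL) {
		v = calloc(1, sizeof(*v));
		if (v != NULL) {
			v->id = id;
			atomic_store_explicit(&vm->vcpu[id], v,
			    memory_order_release);
		}
	}
	pthread_mutex_unlock(&vm->alloc_lock);
	return (v);
}
```

The second load under the lock is what keeps two racing callers from
allocating the same vCPU twice.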

Some ioctls need to lock all vCPUs.  To prevent races with ioctls that
want to allocate a new vCPU, these ioctls also lock the sx lock that
protects vCPU creation.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37174

(cherry picked from commit 98568a005a193ce2c37702a8377ddd10c570e452)

vmm: don't lock a mtx in the icr_low write handler

x2apic accesses are handled by a wrmsr exit. This handler is called in a
critical section. So, we can't lock a mtx in the icr_low handler.

Reported by: kp, pho
Tested by: kp, pho
Approved by: manu (mentor)
Fixes: c0f35dbf19c3c8825bd2b321d8efd582807d1940 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
MFC after: 1 week
MFC with: c0f35dbf19c3c8825bd2b321d8efd582807d1940
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37452

(cherry picked from commit 7c326ab5bb9aced8dcbc2465ac1c9ff8df2ba46b)

vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.

Retire the boot_state member of struct vlapic and instead use a cpuset
in the VM to track vCPUs waiting for STARTUP IPIs. INIT IPIs add
vCPUs to this set, and STARTUP IPIs remove vCPUs from the set.
STARTUP IPIs are only reported to userland for vCPUs that were removed
from the set.
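
The set semantics can be shown with a tiny model. It uses a plain 64-bit
mask instead of cpuset_t, assumes vCPU IDs in the range 0..63, ignores
atomicity, and uses hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

struct vm {
	uint64_t startup_cpus;	/* bit n set: vCPU n awaits a STARTUP IPI */
};

/* An INIT IPI marks the target vCPU as waiting for STARTUP. */
void
handle_init_ipi(struct vm *vm, int vcpu)
{
	vm->startup_cpus |= UINT64_C(1) << vcpu;
}

/*
 * A STARTUP IPI removes the vCPU from the set. Returns true only if the
 * vCPU was in the set, i.e. the IPI should be reported to userland.
 */
bool
handle_startup_ipi(struct vm *vm, int vcpu)
{
	uint64_t bit = UINT64_C(1) << vcpu;

	if ((vm->startup_cpus & bit) == 0)
		return (false);
	vm->startup_cpus &= ~bit;
	return (true);
}
```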

In particular, this permits a subsequent change to allocate vCPUs on
demand when the vCPU may not be allocated until after a STARTUP IPI is
reported to userland.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37173

(cherry picked from commit c0f35dbf19c3c8825bd2b321d8efd582807d1940)

vmm devmem_mmap_single: Bump object reference under memsegs lock.

Reported by: markj
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37273

(cherry picked from commit 223de44c93659457e05036dec25b0af610a773a6)

vmm: take exclusive mem_segs_lock in vm_cleanup()

The consumers of vm_cleanup() are vm_reinit() and vm_destroy().

The vm_reinit() call path is as follows (here vmmdev_ioctl() takes mem_segs_lock):
    vmmdev_ioctl()
    vm_reinit()
    vm_cleanup(destroy=false)

The call path for vm_destroy() is (mem_segs_lock not taken):
    sysctl_vmm_destroy()
    vmmdev_destroy()
    vm_destroy()
    vm_cleanup(destroy=true)

Fix this by taking mem_segs_lock in vm_cleanup() when destroy == true.

Reviewed by: corvink, markj, jhb
Fixes: 67b69e76e8ee ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D38071

(cherry picked from commit c668e8173a8fc047b54a5c51b0fe4637e87836b6)

vmm: take exclusive mem_segs_lock when (un)assigning ppt dev

PR: 268744
Reported by: mmatalka@gmail.com
Reviewed by: corvink, markj, jhb
Fixes: 67b69e76e8ee ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D37962

(cherry picked from commit ccf32a68f821c5c724fb9a5b4b9576925122292f)

vmm: Use an sx lock to protect the memory map.

Previously bhyve obtained a "read lock" on the memory map for ioctls
needing to read the map by locking the last vCPU. This is now
replaced by a new per-VM sx lock. Modifying the map requires
exclusively locking the sx lock as well as locking all existing vCPUs.
Reading the map requires either locking one vCPU or the sx lock.

This permits safely modifying or querying the memory map while some
vCPUs do not exist which will be true in a future commit.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37172

(cherry picked from commit 67b69e76e8eecfd204f6de636d622a1d681c8d7e)

vmm: Destroy mutexes.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37171

(cherry picked from commit 08ebb360764729632e1f6bc4e3f434abdd708204)

vmm stat: Add a special nelems constant for arrays sized by vCPU count.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37170

(cherry picked from commit d5118d0fc4599f69116ec8de59052606e36e6306)

vmm vmx: Allocate vpids on demand as each vCPU is initialized.

Compared to the previous version this does mean that if the system as
a whole runs out of dedicated vPIDs you might end up with some vCPUs
within a single VM using dedicated vPIDs and others using shared
vPIDs, but this should not break anything.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37169

(cherry picked from commit 58eefc67a1cf16623c23354efd089f65401c0455)

vmm: Lookup vcpu pointers in vmmdev_ioctl.

Centralize mapping vCPU IDs to struct vcpu objects in vmmdev_ioctl and
pass vcpu pointers to the routines in vmm.c. For operations that want
to perform an action on all vCPUs or on a single vCPU, pass pointers
to both the VM and the vCPU using a NULL vCPU pointer to request
global actions.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37168

(cherry picked from commit 3f0f4b1598e0e7005bebed7ea3458e96d0fb8e2f)

vmm ppt: Remove unused vcpu arg from MSI setup handlers.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37167

(cherry picked from commit 0cbc39d53d2270fa77255c663a0cfa5ed502ab0a)

vmm: Remove unused vcpuid argument from vioapic_process_eoi.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37166

(cherry picked from commit e42c24d56b3d949aafd0c916e30ab91a4fe1e24d)

vmm: Use struct vcpu in the rendezvous code.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37165

(cherry picked from commit d8be3d523dd50a17f48957c1bb2e0cd7bbf02cab)

vmm: Remove support for vm_rendezvous with a cpuid of -1.

This is not currently used.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37164

(cherry picked from commit 949f0f47a4e774fea7222923440851c612a3f6fa)

vmm: Remove vcpuid from I/O port handlers.

No I/O ports are vCPU-specific (unlike memory which does have
vCPU-specific ranges such as the local APIC).

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37163

(cherry picked from commit 9388bc1e3a58ac17c06e690fb9f46c9f36098f2d)

vmm: Restore the correct vm_inject_*() prototypes

Fixes: 80cb5d845b8f ("vmm: Pass vcpu instead of vm and vcpuid...")
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D37443

(cherry picked from commit ca6b48f08034114edf1fa19cdc088021af2eddf3)

vmm: Pass vcpu instead of vm and vcpuid to APIs used from CPU backends.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37162

(cherry picked from commit 80cb5d845b8f4b7dc25b5dc7f4a9a653b98b0cc6)

vmm: Use struct vcpu in the instruction emulation code.

This passes struct vcpu down in place of struct vm and an integer
vcpu index through the in-kernel instruction emulation code. To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.

A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37161

(cherry picked from commit d3956e46736ffaee5060c9baf0a40f428bc34ec3)

vmm: Add vm_gpa_hold_global wrapper function.

This handles the case that guest pages are being held not on behalf of
a virtual CPU but globally. Previously this was handled by passing a
vcpuid of -1 to vm_gpa_hold, but that will not work in the future when
vm_gpa_hold is changed to accept a struct vcpu pointer.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37160

(cherry picked from commit 28b561ad9d03617418aed33b9b8c1311e940f0c8)

vmm: Add _KERNEL guards for io headers shared with userspace.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37159

(cherry picked from commit 0f435e647645afc68076ff5b005f9366c44413eb)

bhyve: Remove unused vm and vcpu arguments from vm_copy routines.

The arguments identifying the VM and vCPU are only needed for
vm_copy_setup.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37158

(cherry picked from commit 2b4fe856f44ded02f3450bac1782bb49b60b7dd5)

vmm: Use struct vcpu with the vmm_stat API.

The function callbacks still use struct vm and a vCPU index.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37157

(cherry picked from commit 3dc3d32ad67b38ab44ed4a7cf3020a0741b47ec1)

vmm: Expose struct vcpu as an opaque type.

Pass a pointer to the current struct vcpu to the vcpu_init callback
and save this pointer in the CPU-specific vcpu structures.

Add routines to fetch a struct vcpu by index from a VM and to query
the VM and vcpuid from a struct vcpu.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37156

(cherry picked from commit 950af9ffc616ee573a1ce6ef0c841e897b13dfc4)

vmm: Use VLAPIC_CTR* in more places.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37155

(cherry picked from commit d030f941e63f5b20efa14833912aae29ff737fcf)

vmm vmx: Add VMX_CTR* wrapper macros.

These macros are similar to VCPU_CTR* but accept a single vmx_vcpu
pointer as the first argument instead of separate vm and vcpuid.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37154

(cherry picked from commit 57e0119ef3a95d7faa11c44b1acbb8193aadfb35)

vmm svm: Add SVM_CTR* wrapper macros.

These macros are similar to VCPU_CTR* but accept a single svm_vcpu
pointer as the first argument instead of separate vm and vcpuid.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37153

(cherry picked from commit fca494dad06242aa45d3e722f0c16b405dc8039c)

vmm: Remove the per-vm cookie argument from vmmops taking a vcpu.

This requires storing a reference to the per-vm cookie in the
CPU-specific vCPU structure. Take advantage of this new field to
remove no-longer-needed function arguments in the CPU-specific
backends. In particular, stop passing the per-vm cookie to functions
that either don't use it or only use it for KTR traces.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37152

(cherry picked from commit 869c8d1946eb4feb8ad651abdf87af0e5c0111b4)

vmm: Refactor storage of CPU-dependent per-vCPU data.

Rather than storing static arrays of per-vCPU data in the CPU-specific
per-VM structure, adopt a more dynamic model similar to that used to
manage CPU-specific per-VM data.

That is, add new vmmops methods to init and cleanup a single vCPU.
The init method returns a pointer that is stored in 'struct vcpu' as a
cookie pointer. This cookie pointer is now passed to other vmmops
callbacks in place of the integer index. The index is now only used
in KTR traces and when calling back into the CPU-independent layer.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37151

(cherry picked from commit 1aa5150479bf35c90c6770e6ea90e8462cfb6bf9)
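
The cookie-pointer model described in the commit above can be sketched in plain C. This is only an illustration under stated assumptions: struct demo_ops, struct demo_vcpu, struct demo_backend_vcpu and every demo_* function are hypothetical names invented here, not the vmm(4) definitions, and the toy register file stands in for real CPU state.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical backend ops table, loosely following the description. */
struct demo_ops {
	void	*(*vcpu_init)(void *vm_cookie, int vcpuid);
	void	(*vcpu_cleanup)(void *vcpu_cookie);
	int	(*getreg)(void *vcpu_cookie, int reg, uint64_t *val);
};

/* CPU-independent per-vCPU state: it only stores the opaque cookie. */
struct demo_vcpu {
	int	vcpuid;
	void	*cookie;
};

/* Private per-vCPU data of a toy backend. */
struct demo_backend_vcpu {
	int		vcpuid;		/* kept for tracing and callbacks only */
	uint64_t	regs[4];
};

static void *
demo_backend_vcpu_init(void *vm_cookie, int vcpuid)
{
	struct demo_backend_vcpu *bv;

	(void)vm_cookie;		/* unused by this toy backend */
	bv = calloc(1, sizeof(*bv));
	if (bv != NULL) {
		bv->vcpuid = vcpuid;
		bv->regs[0] = 0x1000 + (uint64_t)vcpuid;
	}
	return (bv);
}

static void
demo_backend_vcpu_cleanup(void *vcpu_cookie)
{
	free(vcpu_cookie);
}

/* Callbacks receive the cookie, not an integer index into an array. */
static int
demo_backend_getreg(void *vcpu_cookie, int reg, uint64_t *val)
{
	struct demo_backend_vcpu *bv = vcpu_cookie;

	if (reg < 0 || reg >= 4)
		return (-1);
	*val = bv->regs[reg];
	return (0);
}

const struct demo_ops demo_backend_ops = {
	.vcpu_init = demo_backend_vcpu_init,
	.vcpu_cleanup = demo_backend_vcpu_cleanup,
	.getreg = demo_backend_getreg,
};
```

In this sketch the generic layer never looks inside the cookie; it stores the pointer returned by vcpu_init and hands it back on each later call.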

vmm vmx: Add a global bool to indicate if the host has the TSC_AUX MSR.

A future commit will remove direct access to vCPU structures from
struct vmx, so add a dedicated boolean for this rather than checking
the capabilities for vCPU 0.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37269

(cherry picked from commit 73abae4493782e44a3382b15f5563c3f400bf51f)

vmm: Rework snapshotting of CPU-specific per-vCPU data.

Previously some per-vCPU state was saved in vmmops_snapshot and other
state was saved in vmmops_vmcx_snapshot. Consolidate all per-vCPU
state into the latter routine and rename the hook to the more generic
'vcpu_snapshot'. Note that the CPU-independent per-vCPU data is still
stored in a separate blob as well as the per-vCPU local APIC data.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37150

(cherry picked from commit 39ec056e6dbd89e26ee21d2928dbd37335de0ebc)

vmm svm: Mark all VMCB state caches dirty on vCPU restore.

Mark Johnston noticed that this was missing VMCB_CACHE_LBR. Just set
all the bits as is done in svm_run() rather than trying to clear
individual bits.

Reported by: markj
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37259

(cherry picked from commit 19b9dd2e08eda491ab1c181ca5a3659f7e7628fc)

vmm vmx: Refactor per-vCPU data.

Add a struct vmx_vcpu to hold per-vCPU data specific to VT-x and
move parallel arrays out of struct vmx into a single array of
this structure.

While here, dynamically allocate the VMCS, APIC page and PIR
descriptors for each vCPU rather than embedding them.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37149

(cherry picked from commit 0f00260c679f8d192f9d673fe4fb94b47a2ac6c5)

vmm svm: Refactor per-vCPU data.

- Allocate VMCBs separately to avoid excessive padding in struct
  svm_vcpu.

- Allocate APIC pages dynamically directly in struct vlapic.

- Move vm_mtrr into struct svm_vcpu rather than using a separate
  parallel array.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37148

(cherry picked from commit 215d2fd53f6c254cb900e1775abae86d3fdada65)

vmm: Use vm_get_maxcpus() instead of VM_MAXCPU in various places.

Mostly these are loops that iterate over all possible vCPU IDs for a
specific virtual machine.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37147

(cherry picked from commit 35abc6c238e98e313c5b1cb5ed18b8ed68615abc)

vmm: Simplify saving of absolute TSC values in snapshots.

Read the current "now" TSC value and use it to compute the absolute
time value saved in vm_snapshot_vcpus, rather than iterating over
vCPUs multiple times in vm_snapshot_vm.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37146

(cherry picked from commit a7db532e3a6f83067b342f569b56076d011f8a1e)

bhyve: Avoid passing a possible garbage pointer to free().

All of the error paths in pci_vtcon_sock_add free the sock pointer.
However, sock is not initialized until part way through the function.
An early error would pass stack garbage to free().

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37491

(cherry picked from commit bc928800723b65daa9b005bec4ffd8ad8c781a09)
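
The class of bug described in the commit above can be illustrated with a small sketch. It is not the bhyve code: struct demo_sock and demo_sock_add() are hypothetical names, and the only point shown is that a pointer released on a shared error path should be NULL before the first jump to that path.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-port socket state; not bhyve's struct. */
struct demo_sock {
	char	*path;
};

/* Returns a new demo_sock, or NULL on any error. */
struct demo_sock *
demo_sock_add(const char *path)
{
	struct demo_sock *sock = NULL;	/* set before the first possible goto */
	size_t len;

	if (path == NULL || path[0] == '\0')
		goto error;		/* early error: sock is still NULL */

	sock = calloc(1, sizeof(*sock));
	if (sock == NULL)
		goto error;

	len = strlen(path) + 1;
	sock->path = malloc(len);
	if (sock->path == NULL)
		goto error;
	memcpy(sock->path, path, len);

	return (sock);

error:
	free(sock);			/* free(NULL) is a no-op */
	return (NULL);
}
```

Without the initializer, the early goto would hand an indeterminate pointer value to free().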

bhyve: Appease warning about a potentially unaligned pointer.

When initializing the device model for a PCI pass through device that
uses MSI-X, bhyve reads the MSI-X capability from the real device to
save a copy in the emulated PCI config space. It also saves a copy in
a local struct msixcap on the stack. Since struct msixcap is packed,
GCC complains that casting a pointer to the struct to a uint32_t
pointer may result in an unaligned pointer.

This path is not performance critical, so to appease the compiler,
simply change the pointer to a char * and use memcpy to copy the 4
bytes read in each iteration of the loop.

Reviewed by: corvink, bz, markj
Differential Revision: https://reviews.freebsd.org/D37490

(cherry picked from commit 32b21dd2719f87811b66deeeb513acf7067f8fac)
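
A minimal sketch of the memcpy approach described in the commit above follows. The struct layout, demo_read_cap() and demo_fake_read() are assumptions made up for illustration; the real code reads from the passthrough device's config space rather than from a fake reader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed capability layout (12 bytes), for illustration only. */
struct demo_cap {
	uint8_t		id;
	uint8_t		next;
	uint16_t	ctrl;
	uint32_t	table;
	uint32_t	pba;
} __attribute__((packed));

/* Hypothetical config-space reader: returns the 4 bytes at 'off'. */
typedef uint32_t (*demo_read32_fn)(size_t off);

/* Fake reader used for demonstration: every byte of word N is N + 1. */
uint32_t
demo_fake_read(size_t off)
{
	return (0x01010101u * (uint32_t)(off / 4 + 1));
}

/* Fill 'cap' four bytes at a time without forming a uint32_t pointer. */
void
demo_read_cap(struct demo_cap *cap, size_t base, demo_read32_fn rd)
{
	char *dst = (char *)cap;	/* char * makes no alignment claim */
	size_t i;

	for (i = 0; i < sizeof(*cap); i += sizeof(uint32_t)) {
		uint32_t v = rd(base + i);

		memcpy(dst + i, &v, sizeof(v));
	}
}
```

Compilers typically lower a fixed 4-byte memcpy to a plain load and store where the target allows it, so the cost is small on a path that is not performance critical anyway.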

bhyve: Fix sign compare warnings in the NVMe device model.

Reviewed by: corvink
Differential Revision: https://reviews.freebsd.org/D37489

(cherry picked from commit 15cebe3d637f70abd1ee95e2745d6676d9b1e7dd)

bhyve: Avoid unlikely truncation of the blockif ident strings.

The ident strings for NVMe and VirtIO block devices do not contain the
bus, and the various fields can potentially use up to three characters
when printed as unsigned values (full range of uint8_t), even if this
is not likely in practice.

Reviewed by: corvink, chuck
Differential Revision: https://reviews.freebsd.org/D37488

(cherry picked from commit 5d805962ca9347bbf62750452c4c980decb94793)
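
The sizing argument in the commit above can be sketched as follows. The "%u:%u" format, DEMO_IDENT_LEN and demo_format_ident() are assumptions chosen for illustration and are not taken from the bhyve sources; the sketch only shows sizing a buffer for the widest uint8_t values and detecting truncation through the snprintf() return value.

```c
#include <stdint.h>
#include <stdio.h>

/* Room for two uint8_t fields at their widest ("255:255") plus the NUL. */
#define	DEMO_IDENT_LEN	sizeof("XXX:XXX")

/* Returns 0 on success, -1 if the formatted string would not fit. */
int
demo_format_ident(char *buf, size_t len, uint8_t slot, uint8_t func)
{
	int n;

	n = snprintf(buf, len, "%u:%u", (unsigned int)slot,
	    (unsigned int)func);
	/* snprintf returns the length it wanted; >= len means truncation. */
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}
```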

bhyve: Clear lid to 0 for internal device errors for NVMe AENs.

Reported by: GCC
Reviewed by: corvink, chuck, imp, markj
Differential Revision: https://reviews.freebsd.org/D37487

(cherry picked from commit 47d61162396bac8a7320a6768f218b192dd19ee1)

bhyve: Don't leak uninitialized bits in NVMe completion statuses.

In some cases, some bits in the 16-bit status word were never
initialized.

Reported by: GCC
Reviewed by: corvink, chuck, markj
Differential Revision: https://reviews.freebsd.org/D37486

(cherry picked from commit 1d9e8a9e60953b148a036b39d1fe7037fdbb40a3)

bhyve: Fix sign compare warnings in the e1000 device model.

Adding a bare constant to a uint16_t promotes to a signed int which
triggers these warnings. Changing the constant to be explicitly
unsigned instead promotes the expression to unsigned int.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37485

(cherry picked from commit e7cd5ffff88f1f4dfba2693041cc78fcf3613fba)
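
The promotion rule described in the commit above can be checked with a short C11 sketch. DEMO_TYPE_NAME and demo_has_room() are hypothetical names used only for this illustration, and the sketch assumes a platform where int is wider than 16 bits.

```c
#include <stdint.h>

/* Name the type of an expression using C11 _Generic. */
#define	DEMO_TYPE_NAME(e) _Generic((e),		\
	int: "int",				\
	unsigned int: "unsigned int",		\
	default: "other")

/*
 * Hypothetical ring helper: true if advancing 'head' by one stays below
 * 'size'.  With a bare 1 the left side would be a signed int compared
 * against an unsigned int; with 1u both sides are unsigned int.
 */
int
demo_has_room(uint16_t head, unsigned int size)
{
	return (head + 1u < size);
}
```

Here DEMO_TYPE_NAME((uint16_t)1 + 1) selects the int branch, while DEMO_TYPE_NAME((uint16_t)1 + 1u) selects the unsigned int branch.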

bhyve basl: Use GCC pragmas.

These work with both clang and GCC.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37484

(cherry picked from commit 0acf696151e3c43967988c8271aa27683566a755)

bhyve: Enable the default compiler warnings

Disable -Wcast-align for now since we have many instances of that
warning (I fixed some but not most of them) and platforms on which bhyve
runs don't particularly care about unaligned accesses.

Reviewed by: corvink
Differential Revision: https://reviews.freebsd.org/D37296

(cherry picked from commit 71ebd117386cda6410ca65eb487b63e5dedf3245)

bhyve: Mark pci_de_vinput as static.

Originally in main this was fixed in commit 37045dfa891a. However,
when that commit was merged to stable/13 in commit 976ed044fbbb,
pci_virtio_input was not yet merged to stable/13. This is a direct
commit to complete the earlier MFC.

bhyve: Let BASL compile with raised warnings

- Mark basl_dump() as unused.
- Avoid arithmetic on a void pointer.
- Avoid a signed/unsigned comparison with
  BASL_TABLE_CHECKSUM_LEN_FULL_TABLE.
- Ignore warnings about unused parameters from stuff pulled in by
  acpi.h.  In particular, any prototype wrapped by
  ACPI_DBG_DEPENDENT_RETURN_VOID() will raise such warnings unless
  ACPI_DEBUG_OUTPUT is defined.

Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D37397

(cherry picked from commit c127c61efa4d8414be9a7373b50c7f348b6e461e)

bhyve: Avoid using a packed struct for xhci port registers

I believe the __packed annotation is there only because
pci_xhci_portregs_read() is treating the register set as an array of
uint32_t.  clang warns about taking the address of portregs->portsc
because it is a packed member and thus might not have expected
alignment.

Fix the problem by simply selecting the field to read with a switch
statement.  This mimics pci_xhci_portregs_write().  While here, switch
to using some symbolic constants.

There is a small semantic change here in that pci_xhci_portregs_read()
would silently truncate unaligned offsets.  For consistency with
pci_xhci_portregs_write(), which does not do that, return all ones for
unaligned reads instead.

MFC after: 2 weeks
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D37408

(cherry picked from commit 0705b7f4e64fdbad49a3a6d9131029a9734deb2c)
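
The switch-based read described in the commit above might look roughly like the sketch below. The struct, the DEMO_* offsets and demo_portregs_read() are illustrative assumptions rather than the bhyve definitions; the offsets follow the usual four 32-bit xHCI port registers.

```c
#include <stdint.h>

/* Hypothetical, unpacked register block; every field is naturally aligned. */
struct demo_portregs {
	uint32_t	portsc;
	uint32_t	portpmsc;
	uint32_t	portli;
	uint32_t	porthlpmc;
};

/* Byte offsets of each register within one port's register set. */
enum {
	DEMO_PORTSC	= 0x0,
	DEMO_PORTPMSC	= 0x4,
	DEMO_PORTLI	= 0x8,
	DEMO_PORTHLPMC	= 0xc
};

/* Select the field by offset instead of indexing the struct as an array. */
uint32_t
demo_portregs_read(const struct demo_portregs *p, uint64_t offset)
{
	switch (offset) {
	case DEMO_PORTSC:
		return (p->portsc);
	case DEMO_PORTPMSC:
		return (p->portpmsc);
	case DEMO_PORTLI:
		return (p->portli);
	case DEMO_PORTHLPMC:
		return (p->porthlpmc);
	default:
		/* Unaligned or unknown offset: read as all ones. */
		return (UINT32_MAX);
	}
}
```

Because no field address is taken and no cast to uint32_t * is needed, the struct does not have to be packed in this sketch.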