CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

mips: Fix sendsig for stack layout randomisation

PS_STRINGS doesn't account for the stack gap, we need to use the new
PROC_PS_STRINGS macro to correctly point at the trampoline.

This is a direct commit to stable/13 as mips no longer exists in main.

Fixes: d247611467e0 ("exec: Introduce the PROC_PS_STRINGS() macro")

libc: Fix longjmp/_longjmp(buf, 0) for MIPS

Like AArch64 and RISC-V in the past, MIPS fails to handle this special
case, and will cause the corresponding setjmp/_setjmp to return 0 rather
than 1. Fix this so the newly-added regression tests pass.

This is a direct commit to stable/13 as mips no longer exists in main.

Reviewed by: arichardson, jhb
Differential Revision: https://reviews.freebsd.org/D29363

libc: Fix longjmp/_longjmp(buf, 0) for AArch64 and RISC-V

These architectures fail to handle this special case, and will cause the
corresponding setjmp/_setjmp to return 0 rather than 1. Fix this and add
regression tests (also committed upstream).

PR: 268684
Reviewed by: arichardson, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29363

(cherry picked from commit 9fb118bebced1452a46756a13be0161021b10905)

Makefile: Avoid sanitizing PATH on non-FreeBSD systems

Allow the build process to find host binaries during the host-symlinks target when
cross-building on non-FreeBSD systems. Whilst most non-FreeBSD systems have all
the needed tools in /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin (the final
path added by host-symlinks itself), Homebrew for macOS on Arm defaults to
/opt/homebrew/bin, other more niche systems may also deviate and users may
expect tools in a customised PATH to be picked up, unlike on FreeBSD where we
want to ensure everything comes from base. In particular, (un)xz are needed
from Homebrew on macOS, and thus cannot be found on Arm without this.

Note that non-FreeBSD builds enforce BUILD_WITH_STRICT_TMPPATH, and so the
actual main build steps will still use a sanitised PATH.

Reviewed by: jrtc27, arichardson
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D37991

(cherry picked from commit 16fbf0191243e7c9dff6615b1424b5d39186b36c)

strfmon(3): Match the return type

(cherry picked from commit f5924ad8fde4d5e9c233b821cb6097c6a46740b5)

strfmon(3): Wording improvements

(cherry picked from commit 59cc636d94a6c9b28147304fa59351224f801e92)

strfmon(3): Add an EXAMPLES section

(cherry picked from commit cdd9d92dade61a6b5c37b758e9533a076bb5a2de)

libcxx: add comment explaining why umtx is only used for 64bits

(cherry picked from commit 4c4a29267cbdd05471322e03bfd5eff8eb68e750)

libcxx: use __SIZEOF_LONG__ == 8 instead of __LP64__

For MFC, 64bit mips is excluded from the new mode of wait/wake, because
it uses 32bit __cxx_contention_t, as noted by jrtc27.

(cherry picked from commit 25b18d8935a5ee933d5465f5db41ad58c26590f9)

libcxx: Implement atomic::wait/notify using _umtx_op(2) for 64bit arches

(cherry picked from commit 9c996882b0ed279ca314d995f2eb1d538b94e3f8)

freebsd32: Make sendmsg match native ABI for unpadded final control message

The API says that CMSG_SPACE should be used for msg_controllen, but in
practice the native ABI allows you to only use CMSG_LEN for the final
(typically only) control message, and real-world software does this,
including Wayland. For freebsd32, this is in practice mostly harmless,
since control messages are generally used to carry file descriptors,
which are already 4 bytes in size and thus no padding is needed, but
they can carry other quantities that may not result in an aligned
length. This was discovered after CheriBSD's freebsd64 equivalent was
updated to match the freebsd32 implementation, as that uses 8 byte
alignment which does break the file descriptor use case, and thus
Wayland.

This used to be addressed by aligning buflen before the first iteration,
but that allowed unwanted invalid inputs and was lost in 1b1428dcc82b,
with no safer equivalent put in its place.

Reviewed by: brooks, kib, markj
Obtained from: CheriBSD
Fixes: 1b1428dcc82b ("Fix a TOCTOU vulnerability in freebsd32_copyin_control().")
Differential Revision: https://reviews.freebsd.org/D36554

(cherry picked from commit 7b673a2c73d0577e2c006aeb110295a522b98135)

freebsd32_sendmsg: fix control message ABI

When a freebsd32 caller uses all or most allowed space for control
messages (MCLBYTES == 2K) then the message may no longer fit when
the messages are padded for 64-bit alignment. Historically we've just
shrugged and said there is no ABI guarantee. We ran into this on
CheriBSD where a capsicumized 64-bit nm would fail when called with more
than 64 files.

Fix this by not gratutiously capping size of mbuf data we'll allocate
to MCLBYTES and let m_get2 allocate up to MJUMPAGESIZE (4K or larger).
Instead of hard-coding a length check, let m_get2 do it and check for a
NULL return.

Reviewed by: markj, jhb, emaste
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D36322

(cherry picked from commit c46697b9cb97a14f61ac0a58758aab081b9e48c5)

kvmclock: Fix initialization when EARLY_AP_STARTUP is not defined

To attach to the hypervisor, kvmclock needs to write a per-CPU MSR.
When EARLY_AP_STARTUP is not defined, device attach happens too early:
APs are not yet spun up, so smp_rendezvous only runs the callback on the
local CPU. As a result, the timecounter only gets initialized on the
BSP, and then timekeeping is broken on SMP systems.

Implement handling for !EARLY_AP_STARTUP kernels: keep track of the CPU
on which device attach ran, and then use a SI_SUB_SMP SYSINIT to
register the rest of the CPUs with the hypervisor.

Reported by: Shrikanth R Kamath <kshrikanth@juniper.net>
Reviewed by: kib, jhb (earlier versions)
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37705

(cherry picked from commit 568f552b0410f4f72db9ce710f7803acca23f79f)

Add kf_file_nlink field to kf_file and populate it

This will allow user-space programs (e.g. lsof) to locate deleted files
whose nlink equals zero. Prior to this commit, programs has to use
stat(kf_path) to get nlink, but that will fail if the file is deleted.

[mjg: s/fail/file in the commit message]

Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D38169

(cherry picked from commit dec7db49602df75119f6b57d32f1eb58e0040abc)

netpfil tests: improve pfsync_defer.py

Return different exit code depending on which failure was encountered.
The pfsync test expect a very particular failure, not just any.

MFC after: 1 week
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D38123

(cherry picked from commit 06012728beff45e94d58410eae7cda2ea980ef77)

netpfil tests: improve pft_ping.py

Multiple improvements to pft_ping.py:

* Automatically use IPv6 when IPv6 addresses are used, --ip6 is not needed.
* Building of ping requests and parsing of ping replies is done layer by
  layer. This way most arguments are available both for IPv6 and IPv4,
  for ICMP and TCP.
* Use argument groups for improved readability.
* Change ToS and TTL argument name to TC and HL to reflect the modern
  IPv6 nomenclature. The argument still set related IPv4 header fields
  properly.
* Instead of sniffing for the very specific case of duplicated packets,
  allow for sniffing on multiple interfaces.
* Report which sniffer has failed by setting bits of error code.
* Raise meaningful exceptions when irrecoverable errors happen.
* Make IPv4 fragmentation flags configurable.
* Make IPv6 HL / IPv4 TTL configurable.
* Make TCP MSS configurable.
* Make TCP sequence number configurable.
* Make ICMP payload size configurable.
* Add debug output.
* Move command line argument parsing out of network functions.
* Make the code somehow PEP-8 compliant.

MFC after: 1 week
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D38122

(cherry picked from commit f57218e469a7b1ab40521ea75ebfd45b493851ca)

netpfil tests: improve sniffer.py

Multiple improvements to sniffer.py:

* Remove ambiguity of configuring recvif, it must be now explicitly specified.
* Don't catch exceptions around creating the sniffer, let it properly
fail and display the whole stack trace.
* Count correct packets so that duplicates can be found.

MFC after: 1 week
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D38120

(cherry picked from commit a39dedeb31052ec74b0cd394d56f8d7cc8534645)

nfsserver: Fix handling of SP4_NONE

For NFSv4.1/4.2, when the client specifies SP4_NONE for
state protection in the ExchangeID operation arguments,
the server MUST allow the state management operations for
any user credentials. (I misread the RFC and thought that
SP4_NONE meant "at the server's discression" and not MUST
be allowed.)

This means that the "sec=XXX" field of the "V4:" exports(5)
line only applies to NFSv4.0.

This patch fixes the server to always allow state management
operations for SP4_NONE, which is the only state management
option currently supported. (I have patches that add support
for SP4_MACH_CRED to the server. These will be in a future commit.)

In practice, this bug does not seem to have caused
interoperability problems.

(cherry picked from commit 5a0050e68a54353af53ac28df90854797ebbef16)

grep: properly switch EOL indicator with -z

-z is supposed to use only the NUL byte as EOL, but we were
inadvertently using both newline and NUL due to REG_NEWLINE in cflags.

The odds of anyone relying on this bsdgrep-specific bug are quite low,
so let's just fix it. At least one port in the wild has been reported
to expect the intended behavior.

Reported by: Hill Ma <maahiuzeon@gmail.com>
Triaged by: the self-proclaimed peanut gallery on Discord

(cherry picked from commit e898a3af97f97f74c0fb22032bbd163f7cc92a05)

pf tests: test fast port re-use with syncookies

When a src/dst ip/port tuple is re-used before the pf state fully
expires we clean up the state and create a new one, unless syncookies
are enabled.

Test this, by running two back-to-back nc sessions, with a fixed source
port. Move the interface and IP to a different (vnet) jail, to trick the
network stack into letting us do this.

MFC after:      2 weeks
Event:          Aberdeen hackathon 2022
Differential Revision:  https://reviews.freebsd.org/D36886

(cherry picked from commit dc698b2cd59ebc08b05a261dbba8ee5707450d28)

pf: fix syncookies in conjunction with tcp fast port reuse

Basic scenario: we have a closed connection (In TCPS_FIN_WAIT_2), and
get a new connection (i.e. SYN) re-using the tuple.

Without syncookies we look at the SYN, and completely unlink the old,
closed state on the SYN.
With syncookies we send a generated SYN|ACK back, and drop the SYN,
never looking at the state table.

So when the ACK (i.e. the third step in the three way handshake for
connection setup) turns up, we’ve not actually removed the old state, so
we find it, and don’t do the syncookie dance, or allow the new
connection to get set up.

Explicitly check for this in pf_test_state_tcp(). If we find a state in
TCPS_FIN_WAIT_2 and the syncookie is valid we delete the existing state
so we can set up the new state.
Note that when we verify the syncookie in pf_test_state_tcp() we don't
decrement the number of half-open connections to avoid an incorrect
double decrement.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37919

(cherry picked from commit 9c041b450d5e604c3e35b5799b60a2c53795feef)

pf: fix panic on deferred packets

The pfsync_defer_tmo() callout needs to set the correct vnet before it
can transmit packets. It used the rcvif in the mbuf to get this vnet,
but that doesn't work for locally originated traffic. In that case the
rcvif pointer is NULL, and the dereference leads to a panic.

Instead use the sc_sync_if, which is always set (if pfsync is enabled,
at least).

PR: 268246
MFC after: 2 weeks

(cherry picked from commit fd02192c3acaefeb62db11e0c10ab36240b79ba2)

13.2: update stable/13 to -PRERELEASE to start the release cycle

Approved by: re (implicit)
Sponsored by: https://www.patreon.com/cperciva

cal: don't print terminal control characters unless stdout is a TTY

A similar change was made in svn r223931, but it was incomplete, working
only when the utility was invoked as "ncal". Fix the same issue when
invoking as "cal".

PR: 268936
Reported by: Ray Bellis <ray@bellis.me.uk>
Sponsored by: Axcient
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D38045

(cherry picked from commit 92e978439f0c3139775ad96d412959f5a74b17b6)

Switch wg(4) to the new if_clone KPI

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D37740

(cherry picked from commit eb3f9a7aece9473d678adddcf6aefe6c1eec0ac4)

fsx: more consistent debug output with -[RWU]

(cherry picked from commit da303f5fd4ee8582cf76712212dd0b711d7d9dac)

fsx: bounds check the inputs

In particular, don't allow the user to specify a file size that can't be
expressed as an int, since fsx's random-number generator only has a 32
bit range.

(cherry picked from commit 3f8ca7a22ed917a3e3a4ad78538d9f468d6d3bd8)

ping(8): man page cleanup

* Appease mandoc -T lint and igor

* Use example.com for documentation

* Update the IPv4 TTL section.
  Update the IPv4 TTL section specifically for FreeBSD.
  FreeBSD changed the default TTL to 64 in
  5639e86bdd7ea151958776264bf5a67e60a54d68.  NetBSD and OpenBSD still
  use 255.  Remove some references of extinct operating systems.

Reviewed by: gbe (manpages), asomers
Pull Request: https://github.com/freebsd/freebsd-src/pull/630

(cherry picked from commit 8eb4df948711166a438f4111f7069a412d1456bd)

Add test cases for ping with IP options in the response

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D37210

(cherry picked from commit e35cfc606a299ef40767e708362529c370f767f5)

improvements to cap_sysctl.3

* Correct some function prototypes which were documented with the wrong
pointer type.
* Clarify return values and requirements for freeing the limit handle.

[skip ci]

Sponsored by: Axcient
Reviewed by: oshogbo
Differential Revision: https://reviews.freebsd.org/D37586

(cherry picked from commit 6c93a2d0bc37f0c912e402f3f94c3c01350dca26)

loader: md: Use default func for fmtdev and parsedev

The default function are enough for md so use them instead of the
disks ones that doesn't work for it anymore.

Reviewed by: imp
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: now
Differential Revision: https://reviews.freebsd.org/D38218

(cherry picked from commit 04afa8cc370e6bb7302b6fe09c8d27a606fe414e)

Allow any user to read the NFS stats, for example with nfsstat(1).

This was originally allowed by 3cea29603d3 (2011). But it got broken by
693957f8861 (2016) and apparently nobody noticed.

Sponsored by: Axcient
Reviewed by: rmacklem, ken
Differential Revision: https://reviews.freebsd.org/D37589

(cherry picked from commit d2ce00e9a6176014bbeb792dd9959ef1e60d787e)

document first appearance of fhlink et al

[skip ci]

Sponsored by: Axcient
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D37575

(cherry picked from commit 34120c0c5234a56945f9a732b05a8d8b97492916)

ena: Update driver version to v2.6.2

Bug Fixes:
* Remove timer service re-arm on ena_restore_device failure.
* Re-Enable per-packet missing tx completion print

Minor Changes:
* Switch driver owners from Semihalf to Amazon in man file.

MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Pull Request: https://github.com/freebsd/freebsd-src/pull/637

(cherry picked from commit e5de1d8dad25308f3a332f9386e05b851a36c507)

ena: Switch driver owners from semihalf to amazon in man file

1. Update ena.4 manual file to include amazon owner emails.
2. State that the driver is developed by amazon but leave
that it was originally written by Semihalf, similarly to other
drivers in the /share/man/ directory of the FreeBSD source code.
3. Advance year in copyright notice to 2022.

MFC after: 2 weeks
Sponsored by: Amazon, Inc.

(cherry picked from commit fb47286c38a5315bdd93389b0547b8ccccdc7327)

ena: Remove timer service re-arm on ena_restore_device failure

In case the reset sequence fails (ena_destroy_device() followed by
ena_restore_device() calls) during ena_restore_device(), the driver
resources are being freed. After the clean-up, the timer service is
re-armed in order to try and re-initialize the driver state.
But, such an attempt would fail given that the resources are freed.
Moreover, this would actually cause either the system to fail or a
panic.
When the driver fails in ena_restore_device() procedure, the only
recovery is either unloading and loading the driver or instance
reboot.

This change removes the timer service re-arm in case of failure
in ena_restore_device().

MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Fixes: 78554d0c707c ("ena: start timer service on attach")
(cherry picked from commit c4a85b8d684d3db9dc4d3d01d966130e21390529)

ena: Re-Enable per-packet missing tx completion print

Commit [1] first added the ena_tx_buffer.print_once member,
so that a message about a missing tx completion is printed only
once per packet (and not every second when the watchdog runs).
In this commit print_once is initialized to true, and is set back
to false after detecting a missing tx completion and printing
a warning about it to dmesg.

Commit [2] incorrectly reverses the values assigned to print_once.
The variable is initialized to be true but is checked to be false
when a missing tx completion is detected. This is never true, and
therefore the warning print for each missing tx completion is never
printed since this commit.

Commit [3] added time passed since last TX cleanup to the missing
tx completions per-packet print. However, due to the issue in commit
[2], this time is never printed.

This commit reverses back the values assigned to ena_tx_buffer.print_once
erroneously by commit [2], bringing back to life the missing tx
completion per-packet print.

Also add a space after "." in the missing tx completion print.

[1] - 9b8d05b8ac78 ("Add support for Amazon Elastic Network Adapter (ENA) NIC")
[2] - 74dba3ad7851 ("Split function checking for missing TX completion in ENA driver")
[3] - d8aba82b5ca7 ("ena: Store ticks of last Tx cleanup")

Fixes: 74dba3ad7851 ("Split function checking for missing TX completion in ENA driver")
Fixes: d8aba82b5ca7 ("ena: Store ticks of last Tx cleanup")
MFC after: 2 weeks
Sponsored by: Amazon, Inc.

(cherry picked from commit f01b2cd98e93ef5ac3698b97bbaa45ac9c50fe5d)

ena: Remove write only variables

Sponsored by: Netflix

(cherry picked from commit 094b2a2379c1c92f134fac22be78517c71470ca6)

ena: Remove unused variable.

(cherry picked from commit 4dab99b936920cf0c8f3fcf63754985906714a40)

ena: Remove unused devclass argument to DRIVER_MODULE.

(cherry picked from commit 1dc1476cac52a65c6a25e786f90ccc7529b6a7de)

ipfw: Add missing 'va' code point name

Per RFC 5865, add the 'va' (VOICE-ADMIT, 101100) symbolic name.

Reviewed By: melifaro, pauamma
Differential Revision: https://reviews.freebsd.org/D37508
MFC after: 2 weeks

(cherry picked from commit bdd60b224fa461a5849f60575afdb458613f4ccd)

netlink: add netlink to GENERIC@amd64

Netlink is a communication protocol defined in RFC 3549. It is async,
TLV-based protocol, providing 1-1 and 1-many communications between kernel
and userland. Netlink is currently used in Linux kernel to modify, read and
subscribe for nearly all networking states. Interface state, addresses, routes,
firewall, rules, fibs, etc, are controlled via Netlink.

Netlink support was added in D36002. It has got a number of improvements and
first customers since then:
* net/bird2 got netlink support, enabling route multipath in FreeBSD
* netlink-based devd notifications are being worked on ( D37574 ).
* linux(4) fully supports and depends on Netlink

Enabling Netlink in GENERIC targets two goals.
The first one is to provide stability for the third-party userland applications,
so they can rely on the fact that netlink always exists since 14.0 and potentially 13.2.
Loadable module makes life of the app delepers harder. For example, `net/bird2` can be
either build with netlink or rtsock support, but not both.

The second goal is to enable gradual conversion of the base userland tools
to use netlink(4) interfaces. Converting tools like netstat (D36529), route,
ifconfig one-by-one simplifies testing and addressing the feedback.
Othewise, switching all base to use netlink at once may be too big of a leap.

This change targets amd64, the other architectures will follow soon.

Differential Revision: https://reviews.freebsd.org/D37783

(cherry picked from commit 692e19cf519578176d51d4c1001b01b1f355c1de)

bhyve: Mark pci_de_vinput as const.

I missed this in the prior fixup commit that added static.

Fixes: 03851ae8cd4d bhyve: Mark pci_de_vinput as static.

bhyveload: open guest boot disk image O_RDWR

When a boot environment has been booted via the bootonce feature,
userboot clears the bootonce value from an nvlist but fails to write the
updated nvlist back to disk.

The failure occurs because bhyveload opens the guest boot disk image
O_RDONLY, fix this by opening it O_RDWR.

Reviewed by: imp, markj, jhb
Differential Revision: https://reviews.freebsd.org/D37274

(cherry picked from commit 5a023bd2a53a7279b126ae6bf949560c6342b57a)

bhyve: Fix a buffer overread in the PCI hda device model.

The sc->codecs array contains HDA_CODEC_MAX (15) entries. The
guest-supplied cad field in the verb provided to hda_send_command is a
4-bit field that was used as an index into sc->codecs without any
bounds checking. The highest value (15) would overflow the array.

Other uses of sc->codecs in the device model used sc->codecs_no to
determine which array indices have been initialized, so use a similar
check to reject requests for uninitialized or invalid cad indices in
hda_send_command.

PR: 264582
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: corvink, markj, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38128

(cherry picked from commit cf57f20edcf9c75f0f9f1ac1c44729184970b9d9)

bhyve: Fix a global buffer overread in the PCI hda device model.

hda_write did not validate the relative register offset before using
it as an index into the hda_set_reg_table array to lookup a function
pointer to execute after updating the register's value.

PR: 264435
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: corvink, markj, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38127

(cherry picked from commit bfe8e339eb77910c2eb739b45aaa936148b33897)

bhyve: Remove vmctx argument from PCI device model methods.

Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38096

(cherry picked from commit 6a284cacb1e56a81dff7f5c34bbbd7ed1a0b6cb1)

bhyve: Fix a mismerge in the PCI passthrough device model.

This block of code was removed in stable/13 commit 7ea16192a01e. It
was accidentally re-added in commit aa5eea98b99c (probably due to
resolving a conflict during the merge).

This is a direct commit to stable/13.

bhyve: Avoid triggering false -Wfree-nonheap-object warnings.

XHCI port and slot numbers are 1-based rather than 0-based. To handle
this, bhyve was subtracting one item from the pointers saved in the
softc so that index 1 accessed index 0 of the allocated array.

However, this is UB and confused GCC 12. The compiler noticed that
the calls to free() were using an offset and emitted a warning.
Rather than storing UB pointers in the softc, push the decrement
operation into the existing macros that wrap accesses to the relevant
arrays.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D36829

(cherry picked from commit b36b14beda4ff7ecbb906ada756141f76fcb81aa)

bhyve: Simplify spinup_ap_realmode slightly.

There is no reason to modify the passed in rip variable.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37647

(cherry picked from commit e53fcff1848bb5acafc3dc38cfeb34724d9b0b96)

bhyve: Tidy vCPU pthread startup.

Set the thread affinity in fbsdrun_start_thread next to where the
thread name is set. This keeps all the pthread initialization
operations at the start of a thread in one place.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37646

(cherry picked from commit 7224a96a55d512e00f390d4477e0fb0a163d7528)

bhyve: Don't access vcpumap[vcpu] directly in parse_cpuset().

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37645

(cherry picked from commit 8487443792ce67fce21ae32470d6d8c217e93368)

bhyve: Allocate struct vm_exit on the stack in vm_loop.

The global vmexit[] array is no longer needed to smuggle the rip
value from fbsdrun_addcpu() to vm_loop().

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37644

(cherry picked from commit a20c00c60e1c022f77a7bc638f29d3d6d9456953)

bhyve: Remove some no-op code for setting RIP.

fbsdrun_addcpu() read the current vCPU's RIP register from the kernel
via vm_get_register() to pass along through some layers to vm_loop()
which then set the register via vm_set_register(). However, this is
just always setting the value back to itself.

Reviewed by: corvink
Differential Revision: https://reviews.freebsd.org/D37643

(cherry picked from commit ceb0d0b0f184d72f31ebdaa4edc752aed78a5807)

bhyve: Simplify setting vCPU capabilities.

- Enable VM_CAP_IPI_EXIT in fbsdrun_set_capabilities along with other
  capabilities enabled on all vCPUs.

- Don't call fbsdrun_set_capabilities a second time on the BSP in
  spinup_vcpu.

- To preserve previous behavior, don't unconditionally enable
  unrestricted guest mode on the BSP (this unbreaks single-vCPU guests
  on Nehalem systems, though supporting such setups is of dubious
  value).  Other places that enbale UG on the BSP are careful to check
  the result of the operation and fail if it is not available.

- Don't set any capabilities in spinup_ap().  These are now all
  redundant with earlier settings from spinup_vcpu().

- While here, axe a stale comment from fbsdrun_addcpu().  This
  function is now always called from the main thread for all vCPUs.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37642

(cherry picked from commit 461663ddbad02a4a5135673d545695b1a9f25ed0)

bhyve: Remove unused return value from spinup_ap.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37641

(cherry picked from commit e7d5d2d1876afb429c6e1b4453bdd6d165bc5a68)

bhyve: Remove handler for VM_EXITCODE_SPINUP_AP.

Since commit 0bda8d3e9f7a, bhyve always enables VM_EXITCODE_IPI exits
instead, so this handler is no longer used.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37640

(cherry picked from commit 007d9ca5dd953c009cf976b49e2568444a16a473)

bhyve: Remove the unused vcpu argument from all of the I/O port handlers.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37653

(cherry picked from commit 08b05de1e21a7f3720eb618613276e3f3ab665f3)

bhyve: Remove unused vcpu argument from PCI read/write methods.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37652

(cherry picked from commit 78c2cd83eccac0ebd23ebd15df77ee33a5781a6e)

bhyve: Pass a vCPU ID of 0 to vm_setup_pptdev_msi*.

These ioctls are not vCPU-specific and the ioctl now ignores the vCPU
ID. 0 is used instead of -1 to provide limited forwards
compatibility.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37651

(cherry picked from commit 0857e5555d77357e34ea6d70b28ead6335e41d33)

bhyve: Remove unused argument from pci_nvme_handle_doorbell.

Reviewed by: corvink, chuck, markj
Differential Revision: https://reviews.freebsd.org/D37650

(cherry picked from commit 34781da505fd06fc7cdf18e6e0c0eb187391d2c7)

RELNOTES: Document lifting the hard limit on guest vCPUs in bhyve.

vmm: Free vCPUs when destroying them.

Reported by: andrew
Reviewed by: corvink, andrew, markj
Differential Revision: https://reviews.freebsd.org/D37649

(cherry picked from commit af3b48e101986fb0840739f8c4bb3195e78008b1)

vmm: Avoid infinite loop in vcpu_lock_all error case.

Reported by: Coverity (CIDs 1501060,1501071)
Reviewed by: corvink, markj, emaste
Differential Revision: https://reviews.freebsd.org/D37648

(cherry picked from commit d212d6ebb4ea3b3e9c3964c1a6d3f41817e437e1)

vmm: Don't lock a vCPU for VM_PPTDEV_MSI[X].

These are manipulating state in a ppt(4) device none of which is
vCPU-specific. Mark the vcpu fields in the relevant ioctl structures
as unused, but don't remove them for now.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37639

(cherry picked from commit 91980db1beecd52e34a1550a247e374cfcc746a2)

vmm: VM_GET/SET_KERNEMU_DEV should run with the vCPU locked.

Reviewed by: corvink, kib, markj
Differential Revision: https://reviews.freebsd.org/D37638

(cherry picked from commit 62be9ffd82fb1a03db5e04d32ab75550f1f4f2c7)

vmm: Remove stale comment for vm_rendezvous.

Support for rendezvous outside of a vcpu context (vcpuid of -1) was
removed in commit 949f0f47a4e7, and the vm, vcpuid argument pair was
replaced by a single struct vcpu pointer in commit d8be3d523dd5.

Reported by: andrew

(cherry picked from commit 1f6db5d6b5de5e0cafcdb141a988120b0faea049)

vmm: Fix build w/o KDTRACE_HOOKS.

Reviewed by: imp
Differential revision: https://reviews.freebsd.org/D37446

(cherry picked from commit 2ee1a18d51ed68ee34df7dcfd05f6cfc16110202)

vmm: Fix non-INVARIANTS build

Reported by: O. Hartmann <freebsd@walstatt-de.de>
Reviewed by: jhb
Fixes: 58eefc67a1cf
Differential Revision: https://reviews.freebsd.org/D37444

(cherry picked from commit d487cba33d777efb9f6f7d7967ad2eaa629bcb90)

vmm: Trim some pointless #ifdef KTR.

Reported by: markj
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37272

(cherry picked from commit 49fd5115a9b244c599e068977324e8f6a9993066)

vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.

The default is now the number of physical CPUs in the system rather
than 16.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37175

(cherry picked from commit ee98f99d7a68b284a669fefb969cbfc31df2d0ab)

vmm: Allocate vCPUs on first use of a vCPU.

Convert the vcpu[] array in struct vm to an array of pointers and
allocate vCPUs on first use.  This avoids always allocating VM_MAXCPU
vCPUs for each VM, but instead only allocates the vCPUs in use.  A new
per-VM sx lock is added to serialize attempts to allocate vCPUs on
first use.  However, a given vCPU is never freed while the VM is
active, so the pointer is read via an unlocked read first to avoid the
need for the lock in the common case once the vCPU has been created.

Some ioctls need to lock all vCPUs.  To prevent races with ioctls that
want to allocate a new vCPU, these ioctls also lock the sx lock that
protects vCPU creation.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37174

(cherry picked from commit 98568a005a193ce2c37702a8377ddd10c570e452)

vmm: don't lock a mtx in the icr_low write handler

x2apic accesses are handled by a wrmsr exit. This handler is called in a
critical section. So, we can't lock a mtx in the icr_low handler.

Reported by: kp, pho
Tested by: kp, pho
Approved by: manu (mentor)
Fixes: c0f35dbf19c3c8825bd2b321d8efd582807d1940 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
MFC after: 1 week
MFC with: c0f35dbf19c3c8825bd2b321d8efd582807d1940
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37452

(cherry picked from commit 7c326ab5bb9aced8dcbc2465ac1c9ff8df2ba46b)

vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.

Retire the boot_state member of struct vlapic and instead use a cpuset
in the VM to track vCPUs waiting for STARTUP IPIs. INIT IPIs add
vCPUs to this set, and STARTUP IPIs remove vCPUs from the set.
STARTUP IPIs are only reported to userland for vCPUs that were removed
from the set.

In particular, this permits a subsequent change to allocate vCPUs on
demand when the vCPU may not be allocated until after a STARTUP IPI is
reported to userland.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37173

(cherry picked from commit c0f35dbf19c3c8825bd2b321d8efd582807d1940)

vmm devmem_mmap_single: Bump object reference under memsegs lock.

Reported by: markj
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37273

(cherry picked from commit 223de44c93659457e05036dec25b0af610a773a6)

vmm: take exclusive mem_segs_lock in vm_cleanup()

The consumers of vm_cleanup() are vm_reinit() and vm_destroy().

The vm_reinit() call path is, here vmmdev_ioctl() takes mem_segs_lock:
    vmmdev_ioctl()
    vm_reinit()
    vm_cleanup(destroy=false)

The call path for vm_destroy() is (mem_segs_lock not taken):
    sysctl_vmm_destroy()
    vmmdev_destroy()
    vm_destroy()
    vm_cleanup(destroy=true)

Fix this by taking mem_segs_lock in vm_cleanup() when destroy == true.

Reviewed by: corvink, markj, jhb
Fixes: 67b69e76e8ee ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D38071

(cherry picked from commit c668e8173a8fc047b54a5c51b0fe4637e87836b6)

vmm: take exclusive mem_segs_lock when (un)assigning ppt dev

PR: 268744
Reported by: mmatalka@gmail.com
Reviewed by: corvink, markj, jhb
Fixes: 67b69e76e8ee ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D37962

(cherry picked from commit ccf32a68f821c5c724fb9a5b4b9576925122292f)

vmm: Use an sx lock to protect the memory map.

Previously bhyve obtained a "read lock" on the memory map for ioctls
needing to read the map by locking the last vCPU. This is now
replaced by a new per-VM sx lock. Modifying the map requires
exclusively locking the sx lock as well as locking all existing vCPUs.
Reading the map requires either locking one vCPU or the sx lock.

This permits safely modifying or querying the memory map while some
vCPUs do not exist which will be true in a future commit.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37172

(cherry picked from commit 67b69e76e8eecfd204f6de636d622a1d681c8d7e)

vmm: Destroy mutexes.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37171

(cherry picked from commit 08ebb360764729632e1f6bc4e3f434abdd708204)

vmm stat: Add a special nelems constant for arrays sized by vCPU count.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37170

(cherry picked from commit d5118d0fc4599f69116ec8de59052606e36e6306)

vmm vmx: Allocate vpids on demand as each vCPU is initialized.

Compared to the previous version this does mean that if the system as
a whole runs out of dedicated vPIDs you might end up with some vCPUs
within a single VM using dedicated vPIDs and others using shared
vPIDs, but this should not break anything.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37169

(cherry picked from commit 58eefc67a1cf16623c23354efd089f65401c0455)

vmm: Lookup vcpu pointers in vmmdev_ioctl.

Centralize mapping vCPU IDs to struct vcpu objects in vmmdev_ioctl and
pass vcpu pointers to the routines in vmm.c. For operations that want
to perform an action on all vCPUs or on a single vCPU, pass pointers
to both the VM and the vCPU using a NULL vCPU pointer to request
global actions.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37168

(cherry picked from commit 3f0f4b1598e0e7005bebed7ea3458e96d0fb8e2f)

vmm ppt: Remove unused vcpu arg from MSI setup handlers.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37167

(cherry picked from commit 0cbc39d53d2270fa77255c663a0cfa5ed502ab0a)

vmm: Remove unused vcpuid argument from vioapic_process_eoi.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37166

(cherry picked from commit e42c24d56b3d949aafd0c916e30ab91a4fe1e24d)

vmm: Use struct vcpu in the rendezvous code.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37165

(cherry picked from commit d8be3d523dd50a17f48957c1bb2e0cd7bbf02cab)

vmm: Remove support for vm_rendezvous with a cpuid of -1.

This is not currently used.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37164

(cherry picked from commit 949f0f47a4e774fea7222923440851c612a3f6fa)

vmm: Remove vcpuid from I/O port handlers.

No I/O ports are vCPU-specific (unlike memory which does have
vCPU-specific ranges such as the local APIC).

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37163

(cherry picked from commit 9388bc1e3a58ac17c06e690fb9f46c9f36098f2d)

vmm: Restore the correct vm_inject_*() prototypes

Fixes: 80cb5d845b8f ("vmm: Pass vcpu instead of vm and vcpuid...")
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D37443

(cherry picked from commit ca6b48f08034114edf1fa19cdc088021af2eddf3)

vmm: Pass vcpu instead of vm and vcpuid to APIs used from CPU backends.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37162

(cherry picked from commit 80cb5d845b8f4b7dc25b5dc7f4a9a653b98b0cc6)

vmm: Use struct vcpu in the instruction emulation code.

This passes struct vcpu down in place of struct vm and and integer
vcpu index through the in-kernel instruction emulation code. To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.

A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37161

(cherry picked from commit d3956e46736ffaee5060c9baf0a40f428bc34ec3)

vmm: Add vm_gpa_hold_global wrapper function.

This handles the case that guest pages are being held not on behalf of
a virtual CPU but globally. Previously this was handled by passing a
vcpuid of -1 to vm_gpa_hold, but that will not work in the future when
vm_gpa_hold is changed to accept a struct vcpu pointer.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37160

(cherry picked from commit 28b561ad9d03617418aed33b9b8c1311e940f0c8)

vmm: Add _KERNEL guards for io headers shared with userspace.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37159

(cherry picked from commit 0f435e647645afc68076ff5b005f9366c44413eb)

bhyve: Remove unused vm and vcpu arguments from vm_copy routines.

The arguments identifying the VM and vCPU are only needed for
vm_copy_setup.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37158

(cherry picked from commit 2b4fe856f44ded02f3450bac1782bb49b60b7dd5)

vmm: Use struct vcpu with the vmm_stat API.

The function callbacks still use struct vm and and vCPU index.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37157

(cherry picked from commit 3dc3d32ad67b38ab44ed4a7cf3020a0741b47ec1)

vmm: Expose struct vcpu as an opaque type.

Pass a pointer to the current struct vcpu to the vcpu_init callback
and save this pointer in the CPU-specific vcpu structures.

Add routines to fetch a struct vcpu by index from a VM and to query
the VM and vcpuid from a struct vcpu.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37156

(cherry picked from commit 950af9ffc616ee573a1ce6ef0c841e897b13dfc4)

vmm: Use VLAPIC_CTR* in more places.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37155

(cherry picked from commit d030f941e63f5b20efa14833912aae29ff737fcf)

vmm vmx: Add VMX_CTR* wrapper macros.

These macros are similar to VCPU_CTR* but accept a single vmx_vcpu
pointer as the first argument instead of separate vm and vcpuid.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37154

(cherry picked from commit 57e0119ef3a95d7faa11c44b1acbb8193aadfb35)

vmm svm: Add SVM_CTR* wrapper macros.

These macros are similar to VCPU_CTR* but accept a single svm_vcpu
pointer as the first argument instead of separate vm and vcpuid.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37153

(cherry picked from commit fca494dad06242aa45d3e722f0c16b405dc8039c)

vmm: Remove the per-vm cookie argument from vmmops taking a vcpu.

This requires storing a reference to the per-vm cookie in the
CPU-specific vCPU structure. Take advantage of this new field to
remove no-longer-needed function arguments in the CPU-specific
backends. In particular, stop passing the per-vm cookie to functions
that either don't use it or only use it for KTR traces.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37152

(cherry picked from commit 869c8d1946eb4feb8ad651abdf87af0e5c0111b4)

vmm: Refactor storage of CPU-dependent per-vCPU data.

Rather than storing static arrays of per-vCPU data in the CPU-specific
per-VM structure, adopt a more dynamic model similar to that used to
manage CPU-specific per-VM data.

That is, add new vmmops methods to init and cleanup a single vCPU.
The init method returns a pointer that is stored in 'struct vcpu' as a
cookie pointer. This cookie pointer is now passed to other vmmops
callbacks in place of the integer index. The index is now only used
in KTR traces and when calling back into the CPU-independent layer.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37151

(cherry picked from commit 1aa5150479bf35c90c6770e6ea90e8462cfb6bf9)