Corvin Köhne [Mon, 22 Nov 2021 15:26:03 +0000 (16:26 +0100)]
bhyve: keep physical and virtual COMMAND reg in sync
On startup all virtual BARs are registered.
Additionally, the encoding bit in the virtual cmd register is set.
After that, the passthru emulation overwrites the virtual cmd register with
the physical one.
This could lead to a mismatch between registered BARs and the encoding
bits in the cmd register.
Instead of writing the physical to the virtual cmd register,
write the virtual to the physical cmd register to solve this issue.
Corvin Köhne [Mon, 22 Nov 2021 15:22:48 +0000 (16:22 +0100)]
bhyve: move 64 bit BAR location to match OVMF assumptions
OVMF will fail, if large 64 bit BARs are used. GCD-Map doesn't cover
64 bit addresses of BARs.
OVMF assumes that 64 bit addresses of BARS are located on next 32 GB
boundary behind Top of High RAM.
This patch moves 64 bit BARs on next 32 GB boundary behind Top of High
RAM to match OVMF assumptions.
Corvin Köhne [Fri, 15 Oct 2021 07:25:54 +0000 (09:25 +0200)]
bhyve: ignore low bits of CFGADR
Bhyve could emulate wrong PCI registers.
In the best case, the guest reads wrong registers and the device driver would
report some errors.
In the worst case, the guest writes to wrong PCI registers and could brick
hardware when using PCI passthrough.
According to Intels specification, low bits of CFGADR should be
ignored. Some OS like linux may rely on it. Otherwise, bhyve could
emulate a wrong PCI register.
E.g.
If linux would like to read 2 bytes from offset 0x02, following would
happen.
linux:
outl 0x80000002 at CFGADR
inw at CFGDAT + 2
bhyve:
cfgoff = 0x80000002 & 0xFF = 0x02
coff = cfgoff + (port - CFGDAT) = 0x02 + 0x02 = 0x04
Bhyve would emulate the register at offset 0x04 not 0x02.
When calling file_findfile with only a type it returns
the first file matching the type. But in fdt_apply_overlays we
then iterate on the next files and try loading them as dtb overlays.
Fix this by checking the type one more time.
Sponsored by: Diablotin Systems
Reported by: Mark Millard <marklmi@yahoo.com>
Emmanuel Vadot [Mon, 15 Nov 2021 11:12:50 +0000 (12:12 +0100)]
loader: lsefi: Print more information
Printing the EFI_HANDLE pointer isn't very useful.
If the handle have a IMAGE_DEVICE_PATH or a DEVICE_PATH protocol print it.
This makes it easier to see which devices are present and what protocol they
expose.
Emmanuel Vadot [Sun, 14 Nov 2021 14:32:00 +0000 (15:32 +0100)]
loader: Add more bus name to pnpautoload
Add ofwbus, iicbus and spibus to pnpautoload so modules under those
buses will be loaded.
On my rockpro64 now :
OK pnpautoload -v
Autoloading modules for simplebus
Using DTB provided by EFI at 0x8100000.
Autoloading modules for ofwbus
/boot/kernel/rk_spi.ko text=0x14b2 text=0xd4c data=0x4d0+0x8 syms=[0x8+0xa98+0x8+0x807]
/boot/kernel/dwwdt.ko text=0x12e2 text=0x78c data=0x4c8+0x10 syms=[0x8+0x990+0x8+0x6e1]
Autoloading modules for iicbus
Autoloading modules for spibus
/boot/kernel/mx25l.ko text=0x1613 text=0x114c data=0x6e8+0x8 syms=[0x8+0xa08+0x8+0x665]
loading required module 'fdt_slicer'
/boot/kernel/fdt_slicer.ko text=0x95e text=0x340 data=0x290 syms=[0x8+0x6c0+0x8+0x4a0]
Ian Lepore [Tue, 9 Nov 2021 14:34:06 +0000 (15:34 +0100)]
Add ETHER_ALIGN support to ng_device(4).
This adds a new ng_device command, NGM_DEVICE_ETHERALIGN, which has no
associated args. After the command arrives, the device begins adjusting all
packets sent out its hook to have ETHER_ALIGN bytes of padding at the
beginning of the packet. The ETHER_ALIGN padding is added only when
running on an architecture that requires strict alignment of IP headers
(based on the __NO_STRICT_ALIGNMENT macro, which is only #define'd on
x86 as of this writing).
This also adds ascii <-> binary command translation to ng_device, both for
the existing NGM_DEVICE_GET_DEVNAME and the new ETHERALIGN command.
This also gives a name to every ng_device node when it is constructed, using
the cdev device name (ngd0, ngd1, etc). This makes it easier to address
command msgs to the device using ngctl(8).
Emmanuel Vadot [Thu, 30 Dec 2021 08:47:06 +0000 (09:47 +0100)]
loader: tftp: Copy the first block into the cache
tftp_open reads the first block so copy it in the cached data.
If we have more than one block (i.e. we called tftp_read before
tftp_preload) simply just reset the transfer.
Emmanuel Vadot [Fri, 10 Dec 2021 09:33:43 +0000 (10:33 +0100)]
loader: Add preload operation to fs_ops
When we load an ELF file (kernel or module) we do seek(2) a lot to
parse/load the different sections of the ELF file.
Protocol like TFTP suffers a lot from this as there is no resume or
a way to start the tranfer from a specified offset in the file.
fs_preload is added to help those protocol.
Call preload just after opening the ELF file that we need to load so
the underlying method can cache the hole file and then read/lseek operations
are faster.
Emmanuel Vadot [Mon, 13 Dec 2021 09:35:23 +0000 (10:35 +0100)]
loader: tftp: Don't let tftp timeout
When we load a kernel or module we open/close it a few times.
Since we're using the same port number each time and that we requested
the same file the ACK that we send are valid on the server side and the
server send us the file multiple times.
This makes tftp loading time very inconsistant due to the UDP "flood" that
we have to process.
Alexander Motin [Thu, 30 Dec 2021 03:58:52 +0000 (22:58 -0500)]
CTL: Allow I/Os up to 8MB, depending on maxphys value.
For years CTL block backend limited I/O size to 1MB, splitting larger
requests into sequentially processed chunks. It is sufficient for
most of use cases, since typical initiators rarely use bigger I/Os.
One of known exceptions is VMWare VAAI offload, by default sending up
to 8 4MB EXTENDED COPY requests same time. CTL internally converted
those into 32 1MB READ/WRITE requests, that could overwhelm the block
backend, having finite number of processing threads and making more
important interactive I/Os to wait in its queue. Previously it was
partially covered by CTL core serializing sequential reads to help
ZFS speculative prefetcher, but that serialization was significantly
relaxed after recent ZFS improvements.
With the new settings block backend receives 8 4MB requests, that
should be easier for both CTL itself and the underlying storage.
Colin Percival [Mon, 20 Dec 2021 17:55:36 +0000 (09:55 -0800)]
vfs_mountroot: Check for root dev before waiting
If GEOM is idle but the root device is not yet present when we enter
vfs_mountroot_wait_if_necessary, we call vfs_mountroot_wait to wait
for root holds (e.g. CAM or USB initialization). Upon returning from
vfs_mountroot_wait, we wait 100 ms at a time until the root device
shows up.
Since the root device most likely appeared during vfs_mountroot_wait
-- waiting for subsystems which may be responsible for the root
device is the whole purpose of that function -- it makes sense to
check if the device is now present rather than printing a warning
and pausing for 100 ms before checking.
Reviewed by: trasz Fixes: a3ba3d09c248 Make root mount wait mechanism smarter
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D33593
Colin Percival [Mon, 20 Dec 2021 17:51:34 +0000 (09:51 -0800)]
vfs_mountroot: Wait for GEOM idle post root holds
In the case of a root hold related to the initialization of a disk
device, a flurry of GEOM tasting is likely to take place as soon as
the device is initialized and the root hold is released. If we
don't wait for GEOM idle it's easy for vfs_mountroot to "win" the
race and proceed before the root filesystem GEOM is ready.
Colin Percival [Mon, 20 Dec 2021 15:17:25 +0000 (07:17 -0800)]
vfs_mountroot: Skip 'Root mount waiting' < 1 s
While the message is technically correct, it's not particularly
helpful in the case where we're only waiting a few ms; this case
occurs frequently on EC2 arm64 instances with CAM initialization
racing to release its root hold before vfs_mountroot reaches this
point. Only print the message if we end up waiting for more than
one second.
Alexander Motin [Tue, 28 Dec 2021 01:52:59 +0000 (20:52 -0500)]
GEOM: Minor polishing in geom_event.
- Remove timeouts from msleep()'s. Those should always be woken up.
- Move wakeup() under the lock to not call on possibly freed pointer.
- Remove some dead code.
SDT probe frb_natv4in is only available when an error is encountered.
Make it also available when no error is encountered, i.e. NATed and
not translated.
Leandro Lupori [Wed, 15 Dec 2021 11:49:47 +0000 (08:49 -0300)]
powerpc64: fix the calculation of Maxmem
The calculation of Maxmem was skipping the last phys_avail segment,
because of a wrong stop condition.
This was detected when using QEMU/PowerNV with Radix MMU and low
memory (2G). In this case opal_pci would allocate a DMA window that
was too small to cover all physical memory, resulting in reading all
zeroes from disk when using memory that was not inside the allocated
window.
Reviewed by: jhibbits
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D33449
largepage_mprotect maps a superpage and later extends the mapping. This
occasionally fails with ASLR disabled. To fix this, first try to
reserve a sufficiently large virtual address region.
Reported by: Jenkins
Sponsored by: The FreeBSD Foundation
Olivier Houchard [Wed, 10 Mar 2021 18:01:41 +0000 (19:01 +0100)]
arm64: Fix COMPAT_FREEBSD32.
The ENTRY() macro was modified by commit 28d945204ea1014d7de6906af8470ed8b3311335 to add an optional NOP instruction
at the beginning of the function. It is of course an arm64 instruction, so
unsuitable for the 32bits sigcode. So just use EENTRY() instead for
aarch32_sigcode. This should fix receiving signals when running 32bits
binaries on FreeBSD/arm64.
Ed Maste [Fri, 7 Jan 2022 15:34:08 +0000 (10:34 -0500)]
Build libclang also if LLDB is enabled
LLDB depends on libclang as it uses Clang as the expression parser.
Previously setting WITHOUT_CLANG but leaving LLDB enabled (as default)
resulted in a build failure.
Users who set WITHOUT_CLANG in order to reduce build time or size
might want to set WITHOUT_LLDB in addition to WITHOUT_CLANG, or use
WITHOUT_TOOLCHAIN instead.
PR: 260993
Reported by: eugen
Reviewed by: dim
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Bjoern A. Zeeb [Sat, 1 Jan 2022 18:08:31 +0000 (18:08 +0000)]
iwlwifi: clarifying man page update
Based on some feedback clarify the man page for
- how to load the driver currently
- status of the driver with respect to iwm(4)
and leave a comment to (automatically) add a full list of chipsets
to the man page.
Sponsored by: The FreeBSD Foundation
Reviewed by: debdrup
Differential Revision: https://reviews.freebsd.org/D33713
Bjoern A. Zeeb [Thu, 23 Dec 2021 14:59:49 +0000 (14:59 +0000)]
bhyve: passthru: enable BARs before possibly mmap(2)ing them
The first time we start bhyve with a passthru device everything is fine
as on boot we do enable BARs. If a driver (unload) inside bhyve disables
the BAR(s) as some Linux drivers do, we need to make sure we re-enable
them on next bhyve start.
If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP)
the kernel will give us an EBUSY.
While we were re-enabling the BAR(s) in the current code loop
cfginit() was writing the changes out too late to the real hardware.
Move the call to init_msix_table() after the register on the real
hardware was updated. That way the kernel will be happy and the
mmap will succeed and bhyve will start.
Also simplify the code given the last argument to init_msix_table()
is unused we do not need to do checks for each bar. [1]
Some passthru devices only support MSI instead of MSI-X. For those
devices the initialization of MSI-X table will fail. Re-add (or in
the MFC case keep) the check erroneously removed in the initial commit. [2]
PR: 260148
Pointed out by: markj [1]
Sponsored by: The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D33628
Submitted by: (C.Koehne beckhoff.com) [2]
Reviewed by: manu, bz
Differential Revision: https://reviews.freebsd.org/D33728
Bjoern A. Zeeb [Fri, 31 Dec 2021 11:47:14 +0000 (11:47 +0000)]
LinuxKPI: 802.11 fix queue wait
We are using a bandaid to wait for queues after station creation
looping and pausing.
The abort condition was looping in the wrong direction so we were
potentially waiting forever if queues never became ready.
From initial user test data we also found that the wait time was
too low in some cases so increase the length.
Bjoern A. Zeeb [Fri, 31 Dec 2021 11:51:18 +0000 (11:51 +0000)]
iwlwifi: import correct firmware versions for select Intel iwlwifi/mvm
The firmware files for 3160, 7260, and 7265 imported contain old versions
no longer supported by the driver.
Replace with latest versions from linux-firmware to possibly also
support these chip revisions.
Reported by: FreeBSD User (freebsd walstatt-de.de) on wireless (2021-12-30)
Sponsored by: The FreeBSD Foundation
Cy Schubert [Tue, 16 Feb 2021 15:44:07 +0000 (07:44 -0800)]
ipfilter: Make LARGE_NAT a tunable.
LARGE_NAT is a C macro that increases
NAT_SIZE from 127 to 2047,
RDR_SIZE from 127 to 2047,
HOSTMAP_SIZE from 2047 to 8191,
NAT_TABLE_MAX from 30000 to 180000, and
NAT_TABLE_SZ from 2047 to 16383.
These values can be altered at runtime using the ipf -T command however
some adminstrators of large firewalls rebuild the kernel to enable
LARGE_NAT at boot. This revision adds the tunable net.inet.ipf.large_nat
which allows an administrator to set this option at boot instead of build
time. Setting the LARGE_NAT macro to 1 is unaffected allowing build-time
users to continue using the old way.
Mark Johnston [Tue, 4 Jan 2022 19:02:55 +0000 (14:02 -0500)]
bhyve: Map the right BAR in init_msix_table()
The PBA and MSI-X table can reside in different BARs.
Reported by: Andy Fiddaman <andy@omniosce.org>
Reviewed by: jhb
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough")
Sponsored by: The FreeBSD Foundation
Mark Johnston [Wed, 5 Jan 2022 15:08:13 +0000 (10:08 -0500)]
bhyve: Correct unmapping of the MSI-X table BAR
The starting address passed to mprotect was wrong, so in the case where
the last page containing the table is not the last page of the BAR, the
wrong region would be unmapped.
Reported by: Andy Fiddaman <andy@omniosce.org>
Reviewed by: jhb
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough")
Sponsored by: The FreeBSD Foundation
ng_ubt(4): Introduce net.bluetooth.usb_isoc_enable loader tunable to disable
isochronous transfers.
If users want to disable isochronous transfers, which cause high
frequency periodic interrupts from the USB host controller, then
net.bluetooth.usb_isoc_enable can be set to zero, either as a
sysctl(8) or as a loader.conf(5) tunable.
ifconfig(8): Don't set network interface capabilities when there is no change.
A quick grep through the kernel code shows network drivers compute the
changed bits of network capabilities after a SIOCSIFCAP IOCTL(2) by
using the bitwise exclusive or operation. When the set capabilities
are equal to the already read capabilities, no action will be taken.
Let ifconfig(8) predict this case and skip the SIOCSIFCAP IOCTL(2)
system call.
Discussed with: kib@ (revert change in case of issues)
Sponsored by: NVIDIA Networking
Warner Losh [Tue, 16 Nov 2021 03:35:58 +0000 (20:35 -0700)]
mpr: Minor formatting changes to match mps.
Minor reformatting nits to make mprsas_scsiio_timeout match
mpssas_scsiio_timeout more closely. The differences aren't necessary and
are distracting when comparing the routines. No functional changes.
Warner Losh [Thu, 2 Dec 2021 20:53:44 +0000 (13:53 -0700)]
mps(4): Fix unmatched devq release.
Port 9781c28c6d63 and a8837c77efd0 to the mps driver. Before this
change devq was frozen only if some command was sent to the target after
reset started, but release was called always. This change freezes the
devq immediately, leaving mprsas_action_scsiio() check only to cover
race condition due to different lock devq use.
This should also avoid unnecessary requeue of the commands, creating
additional log noise and confusing some broken apps. It also avoids a
'busy' requeue of I/Os failing when we're doing recovery that takes
longer than the normal busy timeout. These I/Os failing can lead to
filesystems being unmounted in the force unmount case for I/O errors.