Ed Maste [Sat, 30 May 2020 19:16:33 +0000 (19:16 +0000)]
binutils: build as with BINUTILS || BINUTILS_BOOTSTRAP
Previously we descended into as only if MK_BINUTILS was true, including
during the bootstrap tool phase. BINUTILS is now disabled by default on
all archs, and we failed to build it during amd64 bootstrap.
Descend into as if either BINUTILS or BINUTILS_BOOTSTRAP is enabled.
This is not quite correct: we should either have the test also depend on
BOOTSTRAPPING, or set BINUTILS to the value of BINUTILS_BOOTSTRAP during
the bootstrap phase. However, this simple change fixes the build and
has been tested, and binutils will be removed completely in the near
future.
Ed Maste [Sat, 30 May 2020 16:12:00 +0000 (16:12 +0000)]
Disable BINUTILS by default on amd64
The retirement of obsolete binutils 2.17.50 has been in progress for
quite some time. All tools other than GNU as were removed prior to this
commit, and it was built only on amd64 - installed as /usr/bin/as, and
used as a bootstrap tool.
The amd64 exp-run has completed and failures have now been addressed in
the individual ports, so disable it by default.
PR: 233611, 205250 [exp-run]
Sponsored by: The FreeBSD Foundation
Jilles Tjoelker [Sat, 30 May 2020 16:00:49 +0000 (16:00 +0000)]
sh: Allow more scripts without #!
Austin Group bugs #1226 and #1250 changed the requirements for shell scripts
without #! (POSIX does not specify #!; this is about the shell execution
when execve(2) returns an [ENOEXEC] error).
POSIX says we shall allow execution if the initial part intended to be
parsed by the shell consists of characters and does not contain the NUL
character. This allows concatenating a shell script (ending with exec or
exit) and a binary payload.
In order to reject common binary files such as PNG images, check that there
is a lowercase letter or expansion before the last newline before the NUL
character, in addition to the check for the newline character suggested by
POSIX.
Mike Karels [Sat, 30 May 2020 02:09:36 +0000 (02:09 +0000)]
genet: workaround for problem with ICMPv6 echo replies
The ICMPv6 echo reply is constructed with the IPv6 header too close to
the beginning of a packet for an Ethernet header to be prepended, so we
end up with an mbuf containing just the Ethernet header. The GENET
controller doesn't seem to handle this, with or without transmit checksum
offload. At least until we have chip documentation, do a pullup to
satisfy the chip. Hopefully this can be fixed properly in the future.
Mike Karels [Sat, 30 May 2020 02:02:34 +0000 (02:02 +0000)]
genet: fix issues with transmit checksum offload
Fix problem with ICMP echo replies: check only deferred data checksum
flags, and not the received checksum status bits, when checking whether
a packet has a deferred checksum; otherwise echo replies are corrupted
because the received checksum status bits are still present.
Fix some unhandled cases in packet shuffling for checksum offload.
Doug Moore [Sat, 30 May 2020 01:48:12 +0000 (01:48 +0000)]
RB_REMOVE invokes RB_REMOVE_COLOR either when child is red or child is
null. In the first case, RB_REMOVE_COLOR just changes the child to
black and returns. With this change, RB_REMOVE handles that case, and
drops the child argument to RB_REMOVE_COLOR, since that value is
always null.
RB_REMOVE_COLOR is changed to remove a couple of unneeded tests, and
to eliminate some deep indentation.
RB_ISRED is defined to combine a null check with a test for redness,
to replace that combination in several places.
John Baldwin [Sat, 30 May 2020 00:47:03 +0000 (00:47 +0000)]
Only build ipsec modules if the kernel includes IPSEC_SUPPORT.
Honoring the kernel-supplied opt_ipsec.h in r361632 causes builds of
ipsec modules to fail if the kernel doesn't include IPSEC_SUPPORT.
However, the module can never be loaded into such a kernel, so only
build the modules if the kernel includes IPSEC_SUPPORT.
Alexander Motin [Fri, 29 May 2020 17:52:20 +0000 (17:52 +0000)]
Remove session locking from cfiscsi_pdu_update_cmdsn().
cs_cmdsn can be incremented with single atomic. expcmdsn/maxcmdsn set in
cfiscsi_pdu_prepare() based on cs_cmdsn are not required to be updated
synchronously, only monotonically, that is achieved with lock there.
Ed Maste [Fri, 29 May 2020 17:36:54 +0000 (17:36 +0000)]
Disable BINUTILS by default on i386
The retirement of obsolete binutils 2.17.50 has been in progress for
quite some time. All tools other than GNU as were removed prior to this
commit, and it was built only on two archs:
i386, installed as /usr/bin/as
amd64, installed as /usr/bin/as and as a bootstrap tool
The i386 exp-run has completed and failures have been addressed in the
individual ports, so disable it there.
PR: 233611, 205250 [exp-run]
Sponsored by: The FreeBSD Foundation
Adrian Chadd [Fri, 29 May 2020 15:56:44 +0000 (15:56 +0000)]
[run] Add initial 802.11n support.
* Enable self-generated 11n frames
* add MCS rates for 1-stream and 2-stream rates; will do 3-stream
once the rest of this tests out OK with other people.
* Hard-code 1 stream for now
* Add A-MPDU RX mbuf tagging
* RTS/CTS if doing RTSCTS in HT protmode as well as legacy; they're
separate configuration flags
* Update the amrr rate index stuff - walk the rates array like others
to find the right one - this now works for MCS and CCK/OFDM rates
* Add support for atheros fast frames/AMSDU support as we can generate
those in net80211.
TODO:
* HT40 isn't enabled yet
* No A-MPDU support just yet; that requires some more firmware research
and maybe porting some ath(4) A-MPDU support/tracking into net80211
* Short preamble flags aren't set yet for MCS; need to check the linux
driver and see what's going on there
* Add 3x3 rates and set tx/rx stream configuration appropriately
* More 5GHz testing; I have a 3x3 dual band USB NIC coming soon that'll
let me test this.
* Figure out why the RX path isn't performing as fast as it could -
there's only a single buffer loaded at a time for the receive path
in the USB bulk handler and this may not be super useful.
Andriy Gapon [Fri, 29 May 2020 07:50:55 +0000 (07:50 +0000)]
do not enable pci bridge decoding on resume until I/O windows are restored
PCI bus driver restores most but not all of a child PCI-PCI bridge
configuration. The bridge's I/O windows are restored by pcib driver and
that happens later in time. This can be problematic because the Command
register is restored before the windows are restored. If the firmware
programs the windows incorrectly or even does not program them at all,
then the bridge can start claiming I/O cycles that are not intended for
it. This will continue until the correct windows are restored.
I have observed this problem with a buggy BIOS where after resuming from
S3 an I/O port window of a PCI-PCI bridge was configured with zero base
and limit causing the bridge to claim 0x0 - 0xFFF port range. That
interfered with ACPI port access including ACPI PM Timer at port 0x808,
thus wreaking havoc in the time keeping.
The solution is to restore the Command register of PCI-PCI bridges after
the windows are restored in pcib driver. While here, I decided that for
other PCI device types (normal and cardbus) it's better to restore the
Command register after their BARs are restored.
To do: per jhb's suggestion, move the window handling to pci driver.
Andriy Gapon [Fri, 29 May 2020 07:44:02 +0000 (07:44 +0000)]
corefile_open_last: don't keep a locked vnode while locking other ones
Consider this scenario:
- kern.corefile=/var/coredumps/%N.%U.%I.core
- multiple processes with the same name crash at the same time
It's possible that one process selects existing file N as oldvp while it
keeps looking for an unused file number. Another process scans through
files and stumbles upon N. That process would be blocked on the vnode
lock while holding the directory vnode exclusively locked. The first
process would, thus, get blocked on the directory's vnode lock.
More generally, holding a file's vnode lock (oldvp) while trying to lock
its directory (for the next lookup) is a violation of the vnode locking
order.
I have observed this deadlock in the wild.
So, the change to keep oldvp "opened" but unlocked and to lock it again
only if it's to be returned as the result.
As kib noted, an alternative would be to keep the directory locked and
to use VOP_LOOKUP directly for scanning through existing core files.
John Baldwin [Fri, 29 May 2020 05:41:21 +0000 (05:41 +0000)]
Increment the correct pointer when a crypto buffer spans an mbuf or iovec.
When a crypto_cursor_copyback() request spanned multiple mbufs or
iovecs, the pointer into the mbuf/iovec was incremented instead of the
pointer into the source buffer being copied from.
PR: 246737
Reported by: Jenkins, ZFS test suite
Sponsored by: Netflix
Alexander Motin [Fri, 29 May 2020 02:32:48 +0000 (02:32 +0000)]
Move EXPDATASN/R2TSN from PDU to CTL_PRIV_FRONTEND.
We any way have per-I/O space in CTL_PRIV_FRONTEND, while for PDU private
fields I have better use ideas. Plus to me such use of PDU fields looked
a layering violation.
Justin Hibbits [Fri, 29 May 2020 00:46:31 +0000 (00:46 +0000)]
powerpc: Stop advertising that POWER8 and POWER9 support HTM
HTM is on the chopping block, doesn't work on FreeBSD, and has only token
support in PowerISA 3.1 and POWER10. Don't advertise something we'll never
support.
Iterating through uint32_t space 1 byte at a time should
look like this:
freebsd-carambola2:/mnt# ./test
Hello, world!
offset 0 pointer 0x410b00 value 0x12345678 0x12345678
offset 1 pointer 0x410b01 value 0x3456789a 0x3456789a
offset 2 pointer 0x410b02 value 0x56789abc 0x56789abc
offset 3 pointer 0x410b03 value 0x789abcde 0x789abcde
offset 4 pointer 0x410b04 value 0x9abcdef1 0x9abcdef1
offset 5 pointer 0x410b05 value 0xbcdef123 0xbcdef123
offset 6 pointer 0x410b06 value 0xdef12345 0xdef12345
offset 7 pointer 0x410b07 value 0xf1234567 0xf1234567
.. but to begin with it looked like this:
offset 0 value 0x12345678
offset 1 value 0x00410a9a
offset 2 value 0x00419abc
offset 3 value 0x009abcde
offset 4 value 0x9abcdef1
offset 5 value 0x00410a23
offset 6 value 0x00412345
offset 7 value 0x00234567
The amusing reason? The compiler is generating the lwr/lwl incorrectly.
Here's an example after I tried to replace the two macros with a single
invocation and offset, rather than having the compiler compile in addiu
to s3 - but the bug is the same:
Alexander Motin [Thu, 28 May 2020 23:55:46 +0000 (23:55 +0000)]
Remove PDU_TOTAL_TRANSFER_LEN() macro.
I don't see a point to copy io->scsiio.kern_total_len into the request
PDU private field. The io is going to stay with us till the end, and
kern_total_len field is not changed after being first initialized.
Ed Maste [Thu, 28 May 2020 22:05:50 +0000 (22:05 +0000)]
rename in-tree libevent v1 to libevent1
r316063 installed pf's embedded libevent as a private lib, with headers
in /usr/include/private/event. Unfortunately we also have a copy of
libevent v2 included in ntp, which needed to be updated for compatibility
with OpenSSL 1.1.
As unadorned 'libevent' generally refers to libevent v2, be explicit that
this one is libevent v1.
Eric van Gyzen [Thu, 28 May 2020 21:56:31 +0000 (21:56 +0000)]
Revert part of r360964
ports/devel/linux_libusb builds FreeBSD libusb with GCC 4.8.5
from devel/linux-c7-devtools. Restore the tests for older GCC
in bsd.sys.mk to accomodate such ports.
Rick Macklem [Thu, 28 May 2020 21:06:10 +0000 (21:06 +0000)]
Add a syscall for the nfs-over-tls daemons to use.
The nfs-over-tls daemons need a system call to perform operations such as
associate a file descriptor with a krpc socket.
The daemons will not be in head for some time, but it will make it
easier for testers of nfs-over-tls to do testing if the system call
is in head (basically the stub for libc which will be commited soon).
Mark Johnston [Thu, 28 May 2020 19:41:00 +0000 (19:41 +0000)]
Fix boot on systems where NUMA domain 0 is unpopulated.
- Add vm_phys_early_add_seg(), complementing vm_phys_early_alloc(), to
ensure that segments registered during hammer_time() are placed in the
right domain. Otherwise, since the SRAT is not parsed at that point,
we just add them to domain 0, which may be incorrect and results in a
domain with only several MB worth of memory.
- Fix uma_startup1() to try allocating memory for zones from any domain.
If domain 0 is unpopulated, the allocation will simply fail, resulting
in a page fault slightly later during boot.
- Change _vm_phys_domain() to return -1 for addresses not covered by the
affinity table, and change vm_phys_early_alloc() to handle wildcard
domains. This is necessary on amd64, where the page array is dense
and pmap_page_array_startup() may allocate page table pages for
non-existent page frames.
Reported and tested by: Rafael Kitover <rkitover@gmail.com>
Reviewed by: cem (earlier version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25001
Mitchell Horne [Thu, 28 May 2020 14:56:11 +0000 (14:56 +0000)]
Add macros simplifying the fake preload setup
This is in preparation for booting via loader(8). Lift these macros from arm64
so we don't need to worry about the size when inserting new elements. This
could have been done in r359673, but I didn't think I would be returning to
this function so soon.
Marcin Wojtas [Thu, 28 May 2020 09:13:20 +0000 (09:13 +0000)]
Change return types of hash update functions in SHA-NI
r359374 introduced crypto_apply function which takes as argument a function pointer
that is expected to return an int, however aesni hash update functions
return void.
Because of that the function pointer passed was simply cast with
its return value changed.
This resulted in undefined behavior, in particular when mbuf is used, (ipsec)
m_apply checks return value of function pointer passed to it
and in our case bogusly fails after calculating hash of the first mbuf
in chain.
Fix it by changing signatures of sha update routines in aesni and
dropping the casts.
xen/control: short circuit xctrl_on_watch_event on spurious event
If there's no data to read from xenstore short-circuit
xctrl_on_watch_event to return early, there's no reason to continue
since the lack of data would prevent matching against any known event
type.
xen/blkfront: use the correct type for disk sectors
The correct type to use to represent disk sectors is blkif_sector_t
(which is an uint64_t underneath). This avoid truncation of the disk
size calculation when resizing on i386, as otherwise the calculation
of d_mediasize in xbd_connect is truncated to the size of unsigned
long, which is 32bits on i386.
Note this issue didn't affect amd64, because the size of unsigned long
is 64bits there.
Sponsored by: Citrix Systems R&D
MFC after: 1 week
xenpv: do not use low 1MB for Xen mappings on i386
On amd64 we already avoid using memory below 4GB in order to prevent
clashes with MMIO regions, but i386 was allowed to use any hole in
the physical memory map in order to map Xen pages.
Limit this to memory above the 1MB boundary on i386 in order to avoid
clashes with the MMIO holes in that area.
Sponsored by: Citrix Systems R&D
MFC after: 1 week
fib[46]_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Conversion is straight-forwarded, as the only 2 differences are
requirement of running in network epoch and the need to handle
RTF_GATEWAY case in the caller code.
fib4_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Switch call to use new fib4_lookup(), allowing to eventually
deprecate old api.
Switch ip_output/icmp_reflect rt lookup calls with fib4_lookup.
fib4_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Conversion is straight-forwarded, as the only 2 differences are
requirement of running in network epoch and the need to handle
RTF_GATEWAY case in the caller code.
Replace ip6_ouput fib6_lookup_nh_<ext|basic> calls with fib6_lookup().
fib6_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Conversion is straight-forwarded, as the only 2 differences are
requirement of running in network epoch and the need to handle
RTF_GATEWAY case in the caller code.
Switch gif(4) path verification to fib[46]_check_urfp().
fibX_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Use specialized fib[46]_check_urpf() from newer KPI instead,
to allow removal of older KPI.
Rick Macklem [Wed, 27 May 2020 23:20:35 +0000 (23:20 +0000)]
Fix sosend() for the case where mbufs are passed in while doing ktls.
For kernel tls, sosend() needs to call ktls_frame() on the mbuf list
to be sent. Without this patch, this was only done when sosend()'s
arguments used a uio_iov and not when an mbuf list is passed in.
At this time, sosend() is never called with an mbuf list argument when
kernel tls is in use, but will be once nfs-over-tls has been incorporated
into head.
Simplify the condition to enable superpage mappings in vm_fault_soft_fast().
The list of arches list there matches the list of arches where
default VM_NRESERVLEVEL > 0. Before sparc64 removal, that was the
only arch that defined VM_NRESERVLEVEL > 0 to help with cache coloring,
but did not implemented superpages. Now it can be simplified.
Alan Somers [Wed, 27 May 2020 19:13:26 +0000 (19:13 +0000)]
geli: fix a livelock during panic
During any kind of shutdown, kern_reboot calls geli's pre_sync event hook,
which tries to destroy all unused geli devices. But during a panic, geli
can't destroy any devices, because the scheduler is stopped, so it can't
switch threads. A livelock results, and the system never dumps core.
This commit fixes the problem by refusing to destroy any devices during
panic, used or otherwise.
Adrian Chadd [Wed, 27 May 2020 18:32:12 +0000 (18:32 +0000)]
[net80211] Fix interrupted scan logic and ticks comparison
The scan task refactoring stuff circa 2014-2016 broke the blocking task
into a taskqueue with some async bits, but it apparently broke scans
being interrupted by traffic.
Notably - the new "field" SCAN_PAUSE sets both SCAN_INTERRUPT and SCAN_CANCEL,
and a bunch of existing code was checking for SCAN_CANCEL only and breaking
the scan. Unfortunately it was then (a) cancelling the scan entirely and
(b) not notifying userland that scan was done.
So:
* Update the calls to scan_end() to only pass in 1 (saying the scan is complete)
if SCAN_CANCEL is set WITHOUT SCAN_INTERRUPT. If both are set then yes,
the scan is interrupted, but it isn't canceled - it's just paused.
* Update the "did the scan flags change whilst the driver was called" logic
to check for canceled scans, not interrupted scans.
* The "scan done" logic now explicitly checks for either interrupted or
completed scans. This accounts for the situation where a scan is being
aborted via traffic but it ALSO happens to have finished (ie the last
channel was checked.)
This doesn't ENTIRELY fix scanning as the resume function is broken
due to incorrect ticks math. Thus, the second half of this patch
changes the ieee80211_ticks_*() macros to use int instead of long,
matching the logic that the TCP code does with ticks and handles
wrapping / negative ticks values. If cast to long then the wrapping
math wouldn't work right (ie, if ticks was actually negative,
ie, after the system has been up for a while.)
This allows contbgscan() to correctly calculate if a scan should
continue based on ticks and ic->ic_lastdata .
Emmanuel Vadot [Wed, 27 May 2020 09:31:50 +0000 (09:31 +0000)]
linuxkpi: Add overflow.h
Only add check_add_overflow and check_mul_overflow as those are the only
two needed function by DRM v5.3.
Both gcc and clang have builtin to do this check so use them directly
but throw an error if the compiler/code checker doesn't support this builtin.
Sponsored-by: The FreeBSD Foundation
Reviewed by: hselsasky
Differential Revision: https://reviews.freebsd.org/D25015
Andrew Turner [Wed, 27 May 2020 08:00:38 +0000 (08:00 +0000)]
Support creating and using arm64 pmap at stage 2
Add minimal support for creating stage 2 IPA -> PA mappings. For this we
need to:
- Create a new vmid set to allocate a vmid for each Virtual Machine
- Add the missing stage 2 attributes
- Use these in pmap_enter to create a new mapping
- Handle stage 2 faults
The vmid set is based on the current asid set that was generalised in
r358328. It adds a function pointer for bhyve to use when the kernel needs
to reset the vmid set. This will need to call into EL2 and invalidate the
TLB.
The stage 2 attributes have been added. To simplify setting these fields
two new functions are added to get the memory type and protection fields.
These are slightly different on stage 1 and stage 2 tables. We then use
them in pmap_enter to set the new level 3 entry to be stored.
The D-cache on all entries is cleaned to the point of coherency. This is
to allow the data to be visible to the VM. To allow for userspace to load
code when creating a new executable entry an invalid entry is created. When
the VM tried to use it the I-cache is invalidated. As the D-cache has
already been cleaned this will ensure the I-cache is synchronised with the
D-cache.
When the hardware implements a VPIPT I-cache we need to either have the
correct VMID set or invalidate it from EL2. As the host kernel will have
the wrong VMID set we need to call into EL2 to clean it. For this a second
function pointer is added that is called when this invalidation is needed.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D23875
Justin Hibbits [Wed, 27 May 2020 01:24:12 +0000 (01:24 +0000)]
powerpc/mmu: Convert PowerPC pmap drivers to ifunc from kobj
With IFUNC support in the kernel, we can finally get rid of our poor-man's
ifunc for pmap, utilizing kobj. Since moea64 uses a second tier kobj as
well, for its own private methods, this adds a second pmap install function
(pmap_mmu_init()) to perform pmap 'post-install pre-bootstrap'
initialization, before the IFUNCs get initialized.
Eric Joyner [Tue, 26 May 2020 23:35:10 +0000 (23:35 +0000)]
ice(4): Introduce new driver for Intel E800 Ethernet controllers
The ice(4) driver is the driver for the Intel E8xx series Ethernet
controllers; currently with codenames Columbiaville and
Columbia Park.
These new controllers support 100G speeds, as well as introducing
more queues, better virtualization support, and more offload
capabilities. Future work will enable virtual functions (like
in ixl(4)) and the other functionality outlined above.
For full functionality, the kernel should be compiled with
"device ice_ddp" like in the amd64 NOTES file, and/or
ice_ddp_load="YES" should be added to /boot/loader.conf so that
the DDP package file included in this commit can be downloaded
to the adapter. Otherwise, the adapter will fall back to a single
queue mode with limited functionality.
It is wrong to relate on __FreeBSD_version, either from
include/param.h, kernel, or libc, to check for rtld features.
Rtld might be from newer world than the running userspace.
Add special private symbols exported by rtld itself, to indicate the
changes in runtime behavior, and features that cannot be otherwise
detected or deduced at runtime.
Note that the symbols are not exported from libc, so they intentionally
cannot be linked against, and exported from the private namespace from rtld.
Consumers are required to use dlsym(3). For instance, for
_rtld_version_laddr_offset, user should do
ptr = dlsym(RTLD_DEFAULT, "_rtld_version_laddr_offset")
or even
ptr = dlvsym(RTLD_DEFAULT, "_rtld_version_laddr_offset",
"FBSDprivate_1.0");
Non-null ptr means that the change is present.
Also add _rtld_version__FreeBSD_version indicator to report the
headers version used at time of the rtld build.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24982
Marcin Wojtas [Tue, 26 May 2020 16:11:46 +0000 (16:11 +0000)]
Update ENA driver version to v2.2.0
Driver version upgrade is connected with support for the new device
fetures, like Tx drops reporting or disabling meta caching.
Moreover, the driver configuration from the sysctl was reworked to
provide safer and better flow for configuring:
* number of IO queues (new feature),
* drbr size on Tx,
* Rx queue size.
Moreover, a lot of minor bug fixes and improvements were added.
Copyright date in the license of the modified files in this release was
updated to 2020.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Marcin Wojtas [Tue, 26 May 2020 15:57:02 +0000 (15:57 +0000)]
Add sysctl node for ENA IO queues number adjustment
By default, in ena_attach() the driver attempts to acquire
ena_adapter::max_num_io_queues MSI-X vectors for the purpose of IO
queues, however this is not guaranteed. The number of vectors acquired
depends also on system resources availability.
Regardless of that, enable the number of effectively used IO queues to
be further limited through the sysctl node.
Example: Assumming that there are 8 IO queues configured by default, the
command
$ sysctl dev.ena.0.io_queues_nb=4
will reduce the number of available IO queues to 4. Similarly, the value
can be also increased up to maximum supported value. A value higher than
maximum supported number of IO queues is ignored. Zero is ignored too.
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Marcin Wojtas [Tue, 26 May 2020 15:54:32 +0000 (15:54 +0000)]
Fix assumptions about number of IO queues in the ENA
Make the ena_adapter::num_io_queues a number of effectively used IO
queues. While the ena_adapter::max_num_io_queues is an upper-bound
specified by the HW, the ena_adapter::num_io_queues may be lower than
that, depending on runtime system resources availability.
On reset, there are called ena_destroy_device() and then
ena_restore_device(). The latter calls, in turn, ena_enable_msix(),
which will attempt to re-acquire ena_adapter::max_num_io_queues of
MSIX vectors again.
Thus, the value of ena_adapter::num_io_queues may be different before
and after reset. For this reason, free the IO rings structures (drbr,
counters) in ena_destroy_device() and allocate again in
ena_restore_device().
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.