Glen Barber [Fri, 19 Oct 2018 00:24:23 +0000 (00:24 +0000)]
- Prune svn:mergeinfo from the new branch, as nothing has been merged
here.
- Remove debugging from GENERIC* kernel configurations
- Enable MALLOC_PRODUCTION
- Default dumpdev=NO
- Remove UPDATING entry regarding debugging features
- Switch 12.0 from -ALPHA10 to -BETA1 to prepare for builds.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
Apparently AMD machines cannot tolerate this. This was uncovered by
r339386, where cache flush started really flushing the requested range.
Introduce pmap_mapdev_pciecfg(), which simply does not flush cache
comparing with pmap_mapdev(). It assumes that the MCFG region was
never accessed through the cacheable mapping, which is most likely
true for machine to boot at all.
Note that i386 does not need the change, since the architecture
handles access per-page due to the KVA shortage, and page remapping
already does not flush the cache.
Reported and tested by: mjg, Mike Tancsa <mike@sentex.net>
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D17612
Bjoern A. Zeeb [Thu, 18 Oct 2018 20:20:41 +0000 (20:20 +0000)]
In r78161 the lookup_set linker method was introduced which optionally
returns the section start and stop locations as well as a count if the
caller asks for them.
There was only one out-of-file consumer of count which did not actually
use it and hence was eliminated in r339407.
In r194784 parse_dpcpu(), and in r195699 parse_vnet() (a copy of the
former) started to use the link_elf_lookup_set() interface internally
also asking for the count.
count is computed as the difference of the void **stop - void **start
locations and as such, if the absoulte numbers
(stop - start) % sizeof(void *) != 0
a round-down happens, e.g., **stop 0x1003 - **start 0x1000 => count 0.
To get the section size instead of "count is the number of pointer
elements in the section", the parse_*() functions do a
count *= sizeof(void *).
They use the result to allocate memory and copy the section data
into the "master" and per-instance memory regions with a size of
count.
As a result of count possibly round-down this can miss the last
bytes of the section. The good news is that we do not touch
out of bounds memory during these operations (we may at a later stage
if the last bytes would overflow the master sections).
Given relocation in elf_relocaddr() works based on the absolute
numbers of start and stop, this means that we can possibly try to
access relocated data which was never copied and hence we get
random garbage or at best zeroed memory.
Stop the two (last) consumers of count (the parse_*() functions)
from using count as well, and calculate the section size based on
the absolute numbers of stop and start and use the proper size for
the memory allocation and data copies. This will make the symbols
in the last bytes of the pcpu or vnet sections be presented as
expected.
PR: 232289
Approved by: re (gjb)
MFC after: 2 weeks
Michael Tuexen [Thu, 18 Oct 2018 19:21:18 +0000 (19:21 +0000)]
The handling of RST segments in the SYN-RCVD state exists in the
code paths. Both are not consistent and the one on the syn cache code
does not conform to the relevant specifications (Page 69 of RFC 793
and Section 4.2 of RFC 5961).
This patch fixes this:
* The sequence numbers checks are fixed as specified on
page Page 69 RFC 793.
* The sysctl variable net.inet.tcp.insecure_rst is now honoured
and the behaviour as specified in Section 4.2 of RFC 5961.
Approved by: re (gjb@)
Reviewed by: bz@, glebius@, rrs@,
Differential Revision: https://reviews.freebsd.org/D17595
Sponsored by: Netflix, Inc.
The local_unbound service will configure and bootstrap itself, but only
if a network connection is available. This is not an issue when running
'service local_unbound setup' interactively, but can be on a diskless
system where local_unbound self-configures on every boot. To address
this, add explicit dependencies on netwait and defaultroute.
Ruslan Bukin [Thu, 18 Oct 2018 15:25:07 +0000 (15:25 +0000)]
Support RISC-V implementations that do not manage the A and D bits
(e.g. RocketChip, lowRISC and derivatives).
RISC-V page table entries support A (accessed) and D (dirty) bits. The
spec makes hardware support for these bits optional. Implementations that
do not manage these bits in hardware raise page faults for accesses to a
valid page without A set and writes to a writable page without D set.
Check for these types of faults when handling a page fault and fixup the
PTE without calling vm_fault if they occur.
Ruslan Bukin [Thu, 18 Oct 2018 15:08:14 +0000 (15:08 +0000)]
Support RISC-V implementations that do not manage the A and D bits
(e.g. RocketChip, lowRISC and derivatives).
RISC-V page table entries support A (accessed) and D (dirty) bits. The
spec makes hardware support for these bits optional. Implementations that
do not manage these bits in hardware raise page faults for accesses to a
valid page without A set and writes to a writable page without D set.
Check for these types of faults when handling a page fault and fixup the
PTE without calling vm_fault if they occur.
r334853 added a "socket destructor" callback. However, as implemented, it
was really a "socket close" callback.
Update the socket destructor functionality to run when a socket is
destroyed (rather than when it is closed). The original submitter has
confirmed that this change satisfies the intended use case.
Suggested by: rwatson
Submitted by: Michio Honda <micchie at sfc.wide.ad.jp>
Tested by: Michio Honda <micchie at sfc.wide.ad.jp>
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17590
Bjoern A. Zeeb [Thu, 18 Oct 2018 02:07:30 +0000 (02:07 +0000)]
While preparing to move init(8) to its own package as indicated
in r339413, a current pkgbase update problem came up. For users
testing pkgbase at the moment there is no (automatic) way to pick
up new base packages (yet).
As a result rather than also moving init(8) to its own package,
back out the part of the change in r339413 that moved rc* to its
own package and defer creating new packages until the
infrastructure is in place to handle these cases.
Both init and rc* are considered too problematic to be lost by
early adaptors at this stage.
Discussed with: brd
Reviewed by: brd
Approved by: re (gjb)
Bjoern A. Zeeb [Wed, 17 Oct 2018 16:49:11 +0000 (16:49 +0000)]
Move the rc framework out of sbin/init into libexec/rc.
The reasons for this are forward looking to pkgbase:
* /sbin/init is a special binary; try not to replace it with
every package update because an rc script was touched.
(a follow-up commit will make init its own package)
* having rc in its own place will allow more easy replacement
of the rc framework with alternatives, such as openrc.
Jamie Gritton [Wed, 17 Oct 2018 16:11:43 +0000 (16:11 +0000)]
Add a new jail permission, allow.read_msgbuf. When true, jailed processes
can see the dmesg buffer (this is the current behavior). When false (the
new default), dmesg will be unavailable to jailed users, whether root or
not.
The security.bsd.unprivileged_read_msgbuf sysctl still works as before,
controlling system-wide whether non-root users can see the buffer.
Yuri Pankov [Wed, 17 Oct 2018 14:51:43 +0000 (14:51 +0000)]
strptime: fix parsing of tm_year when both %C and %y appear in the
format string in arbitrary order. This makes the related test cases in
lib/libc/tests/time (not yet connected to the build) pass.
While here, don't error on negative tm_year value based on the
APPLICATION USAGE in
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/time.h.html
(glibc does the same):
tm_year is a signed value; therefore, years before 1900 may be represented.
Approved by: re (gjb), kib (mentor)
Differential Revision: https://reviews.freebsd.org/D17550
Bjoern A. Zeeb [Wed, 17 Oct 2018 10:31:08 +0000 (10:31 +0000)]
The countp argument passed to linker_file_lookup_set() in
linker_load_dependencies() is unused, so no need to ask for the
value in first place. Remove the unused "count" variable.
Add initial driver for ACPI NFIT-enumerated NVDIMMs.
Driver enumerates NVDIMMs. Besides, for each found System Physical
Address (SPA) range, spaN geom provider is created, which allows
formatting and mounting the region as the normal volume. Also,
/dev/nvdimm_spaN node is created, which can be read/written/mapped by
userspace, the mapping is zero-copy.
No support for block access methods implemented, labels are not
parsed. No management interfaces are provided.
Tested by: Intel, NetApp
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 2 weeks
Fix for reception of large full speed isochronous frames via the transaction
translator, when using the DWC OTG USB controller driver. Make sure to re-try
getting the complete split packets until a DATA0 packet is received. Larger
isochronous frames may be split into multiple MDATA packets terminated
by a single DATA0 packet.
PR: 230434
MFC after: 3 days
Approved by: re (gjb)
Sponsored by: Mellanox Technologies
The KPI allows to map very large contigous physical memory regions
into KVA, which are not covered by DMAP.
I see both with QEMU and with some real hardware started shipping, the
regions for NVDIMMs might be very far apart from the normal RAM, and
we expect that at least initial users of NVDIMM could install very
large amount of such memory. IMO it is not reasonable to extend DMAP
to cover that far-away regions both because it could overflow existing
4T window for DMAP in KVA, and because it costs in page table pages
allocations, for gap and for possibly unused NV RAM.
Also, KPI provides some special functionality for fast cache flushing
based on the knowledge of the NVRAM mapping use.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D17070
Yuri Pankov [Tue, 16 Oct 2018 17:17:11 +0000 (17:17 +0000)]
apropos/whatis: use output of manpath(1) to set defpaths if -M is not
specified. This fixes searching the paths specified in
/usr/local/etc/man.d/*.conf, as currently apropos/whatis from mandoc
suite aren't aware about them.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D17070
In r338102, the TCP reassembly code was substantially restructured. Prior
to this change, the code sometimes used a temporary stack variable to hold
details of a TCP segment. r338102 stopped using the variable to hold
segments, but did not actually remove the variable.
Because the variable is no longer used, we can safely remove it.
Import CK as of commit 5221ae2f3722a78c7fc41e47069ad94983d3bccb.
This fixes two problems, one where epoch calls could occur before all
the readers had exited the epoch section, and one where the epoch calls
could be unnecessarily delayed.
Import CK as of commit 5221ae2f3722a78c7fc41e47069ad94983d3bccb.
This fixes two problems, one where epoch calls could occur before all
the readers had exited the epoch section, and one where the epoch calls
could be unnecessarily delayed.
Alexander Motin [Mon, 15 Oct 2018 21:59:24 +0000 (21:59 +0000)]
Skip VDEV_IO_DONE stage only for ZIO_TYPE_FREE.
Device removal code uses zio_vdev_child_io() with ZIO_TYPE_NULL parent,
that never happened before. It confused FreeBSD-specific TRIM code,
which does not use VDEV_IO_DONE for logical ZIO_TYPE_FREE ZIOs. As
result of that stage being skipped device removal ZIOs leaked references
and memory that supposed to be freed by VDEV_IO_DONE, making it stuck.
It is a quick patch rather then a nice fix, but hopefully we'll be able
to drop it all together when alternative TRIM implementation finally get
landed.
PR: 228750, 229007
Discussed with: allanjude, avg, smh
Approved by: re (delphij)
MFC after: 5 days
Sponsored by: iXsystems, Inc.
Yuri Pankov [Mon, 15 Oct 2018 20:11:53 +0000 (20:11 +0000)]
pw: respect path specified using -V when writing pw.conf, and -C is not
explicitly specified. -V path is already used to determine which file
to read default values from, so it's only logical to write them to the
same file.
John Baldwin [Mon, 15 Oct 2018 18:56:54 +0000 (18:56 +0000)]
Various fixes for TLB management on RISC-V.
- Remove the arm64-specific cpu_*cache* and cpu_tlb_flush* functions.
Instead, add RISC-V specific inline functions in cpufunc.h for the
fence.i and sfence.vma instructions.
- Catch up to changes in the arm64 pmap and remove all the cpu_dcache_*
calls, pmap_is_current, pmap_l3_valid_cacheable, and PTE_NEXT bits from
pmap.
- Remove references to the unimplemented riscv_setttb().
- Remove unused cpu_nullop.
- Add a link to the SBI doc to sbi.h.
- Add support for a 4th argument in SBI calls. It's not documented but
it seems implied for the asid argument to SBI_REMOVE_SFENCE_VMA_ASID.
- Pass the arguments from sbi_remote_sfence*() to the SEE. BBL ignores
them so this is just cosmetic.
- Flush icaches on other CPUs when they resume from kdb in case the
debugger wrote any breakpoints while the CPUs were paused in the IPI_STOP
handler.
- Add SMP vs UP versions of pmap_invalidate_* similar to amd64. The
UP versions just use simple fences. The SMP versions use the
sbi_remove_sfence*() functions to perform TLB shootdowns. Since we
don't have a valid pm_active field in the riscv pmap, just IPI all
CPUs for all invalidations for now.
- Remove an extraneous TLB flush from the end of pmap_bootstrap().
- Don't do a TLB flush when writing new mappings in pmap_enter(), only if
modifying an existing mapping. Note that for COW faults a TLB flush is
only performed after explicitly clearing the old mapping as is done in
other pmaps.
- Sync the i-cache on all harts before updating the PTE for executable
mappings in pmap_enter and pmap_enter_quick. Previously the i-cache was
only sync'd after updating the PTE in pmap_enter.
- Use sbi_remote_fence() instead of smp_rendezvous in pmap_sync_icache().
John Baldwin [Mon, 15 Oct 2018 18:12:25 +0000 (18:12 +0000)]
Reload the LDT selector after an AMD-v #VMEXIT.
cpu_switch() always reloads the LDT, so this can only affect the
hypervisor process itself. Fix this by explicitly reloading the host
LDT selector after each #VMEXIT. The stock bhyve process on FreeBSD
never uses a custom LDT, so this change is cosmetic.
Reviewed by: kib
Tested by: Mike Tancsa <mike@sentex.net>
Approved by: re (gjb)
MFC after: 2 weeks
Eric Joyner [Mon, 15 Oct 2018 17:23:41 +0000 (17:23 +0000)]
iavf(4): Finish rename/rebrand internally
Rename functions and variables from ixlv to iavf to match the
user-facing name change. There shouldn't be any functional changes
with this change, but this may help with browsing the source code
and reducing diffs in the future.
Leandro Lupori [Mon, 15 Oct 2018 16:43:07 +0000 (16:43 +0000)]
Initialize SPRG0 before its first possible use
At early boot, PCPU_GET(), that obtains a pointer from SPRG0, was being
used with SPRG0 not yet initialized. If it pointed to an invalid
address, the machine would hang.
Synchronizing the epoch before freeing the multicast addresses while holding
the VLAN_XLOCK() might lead to a deadlock. Use deferred freeing of the VLAN
multicast addresses to resolve deadlock. Backtrace:
Eric Joyner [Sun, 14 Oct 2018 05:09:43 +0000 (05:09 +0000)]
em/igb/ix(4): Port two Tx/Rx fixes made to ixl in r339338
- Fix assert/panic on receive when Jumbo Frames are enabled.
From the commit I made to ixl:
"It turns out that *_isc_rxd_available is supposed to return how many
packets are available to be cleaned on the rx ring. This patch removes
a section of code where if the budget argument is 1, the function would return
one if there was a descriptor available, not necessarily a packet.
This is okay in regular mtu 1500 traffic since the max frame size is less
than the configured receive buffer size (2048), but this doesn't work when
received packets can span more than one descriptor, as is the case when the
mtu is 9000 and the receive buffer size is 4096."
- Fix possible Tx hang because *_isc_txd_credits_update returns incorrect result
From the commit by Krzysztof Galazka to ixl: "Function isc_txd_update_credits
called with clear set to false should return 1 if there are TX descriptors
already handled by HW. It was always returning 0 causing troubles with UDP TX
traffic."
Ed Maste [Sat, 13 Oct 2018 21:26:07 +0000 (21:26 +0000)]
elfcopy: delete filter_reloc, it is broken and unnecessary
elfcopy contained logic to filter individual relocations in STRIP_ALL
mode. However, this is not valid; relocations emitted by the linker are
required, unless they apply to an entire section being removed (which is
handled by other logic in elfcopy).
Note that filter_reloc was also buggy: for RELA relocation sections it
operated on uninitialized rel.r_info resulting in invalid operation.
The logic most likely needs to be inverted: instead of removing
relocations because their associated symbols are being removed, we must
keep symbols referenced by relocations. That said, in practice we do
not encounter this code path today: objects being stripped are either
dynamically linked binaries which retain .dynsym, or static binaries
with no relocations.
Just remove filter_reloc. This fixes certain cases including statically
linked binaries containing ifuncs. Stripping binaries with relocations
referencing removed symbols was already broken, and after this change
may still be broken in a different way.
PR: 232176
Reviewed by: kaiw, kib, markj
Approved by: re (rgrimes)
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17519
Mateusz Guzik [Sat, 13 Oct 2018 21:18:31 +0000 (21:18 +0000)]
amd64: partially depessimize cpu_fetch_syscall_args and cpu_set_syscall_retval
Vast majority of syscalls take 6 or less arguments. Move handling of other
cases to a fallback function. Similarly, special casing for _syscall
and __syscall
magic syscalls is moved away.
Return is almost always 0. The change replaces 3 branches with 1 in the common
case. Also the 'frame' variable convinces clang not to reload it on each access.
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17542
Mateusz Guzik [Sat, 13 Oct 2018 21:17:28 +0000 (21:17 +0000)]
amd64: convert libc bcopy to a C func to avoid future bloat
The function is of limited use and is an almost a direct clone of
memmove/memcpy (with arguments swapped). Introduction of ERMS variants
of string routines would mean avoidable growth of libc.
bcopy will get redefined to a __builtin_memmove later on with this
symbol only left for compatibility.
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17539
Bjoern A. Zeeb [Fri, 12 Oct 2018 22:51:45 +0000 (22:51 +0000)]
In udp_input() when walking the pcblist we can come across
an inp marked FREED after the epoch(9) changes.
Check once we hold the lock and skip the inp if it is the case.
Contrary to IPv6 the locking of the inp is outside the multicast
section and hence a single check seems to suffice.
Eric Joyner [Fri, 12 Oct 2018 22:40:54 +0000 (22:40 +0000)]
ixl/iavf(4): Change ixlv to iavf and update it to use iflib(9)
Finishes the conversion of the 40Gb Intel Ethernet drivers to iflib(9) for
FreeBSD 12.0, and fixes numerous bugs in both ixl(4) and iavf(4).
This commit also re-adds the VF driver to GENERIC since it now compiles and
functions.
The VF driver name was changed from ixlv(4) to iavf(4) because the VF driver is
now intended to be used with future products, not just with Fortville/Fort Park
VFs.
A man page update that documents these drivers is forthcoming in a separate
commit.
Alexander Motin [Fri, 12 Oct 2018 16:55:28 +0000 (16:55 +0000)]
Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
This is a part of ZoL port of device removal patch:
commit a1d477c24c7badc89c60955995fd84d311938486
Author: Matthew Ahrens <mahrens@delphix.com> Ported-by: Tim Chase <tim@chase2k.com>
Approved by: re (kib)
MFC after: 1 week
Call initializecpucache() before ifuncs are resolved.
The function tweaks CPU capabilities based on the VM platform and
tunables, which affected selection of the cache flush method before
ifuncs were used, and should affect the cache flush in the same way
after ifunc.
PR: 232081
Reported by: phk
Analyzed by: avg
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Apparently CLFLUSH on mmio can cause VM exit, as reported in the PR.
I do not see that anything useful can be done except emulating page
faults on invalid addresses.
Due to the instruction encoding pecularity, also emulate SFENCE.
PR: 232081
Reported by: phk
Reviewed by: araujo, avg, jhb (all: previous version)
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D17482
Alexander Motin [Fri, 12 Oct 2018 15:14:22 +0000 (15:14 +0000)]
Add ZIO_TYPE_FREE support for indirect vdevs.
Upstream code expects only ZIO_TYPE_READ and some ZIO_TYPE_WRITE
requests to removed (indirect) vdevs, while on FreeBSD there is also
ZIO_TYPE_FREE (TRIM). ZIO_TYPE_FREE requests do not have the data
buffers, so don't need the pointer adjustment.
PR: 228750, 229007
Reviewed by: allanjude, sef
Approved by: re (kib)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D17523
Bjoern A. Zeeb [Fri, 12 Oct 2018 11:30:46 +0000 (11:30 +0000)]
r217592 moved the check for imo in udp_input() into the conditional block
but leaving the variable assignment outside the block, where it is no longer
used. Move both the variable and the assignment one block further in.
This should result in no functional changes. It will however make upcoming
changes slightly easier to apply.
Yuri Pankov [Thu, 11 Oct 2018 18:30:12 +0000 (18:30 +0000)]
Restore some of the ctype definitions reported in the PR from pre-CLDR
data, namely 0xE000-0xF8FF private use area, and 0xFF00-0xFFF half- and
fullwidth punctuation.
While here, update tools/tools/locale/README based on my experience
rebuilding the locale data.
PR: 225692
Reviewed by: bapt, cem (previous version)
Approved by: re (gjb), kib (mentor)
Differential Revision: https://reviews.freebsd.org/D17471
John Baldwin [Thu, 11 Oct 2018 18:27:19 +0000 (18:27 +0000)]
Fully restore the GDTR, IDTR, and LDTR after VT-x VM exits.
The VT-x VMCS only stores the base address of the GDTR and IDTR. As a
result, VM exits use a fixed limit of 0xffff for the host GDTR and
IDTR losing the smaller limits set in when the initial GDT is loaded
on each CPU during boot. Explicitly save and restore the full GDTR
and IDTR contents around VM entries and exits to restore the correct
limit.
Similarly, explicitly save and restore the LDT selector. VM exits
always clear the host LDTR as if the LDT was loaded with a NULL
selector and a userspace hypervisor is probably using a NULL selector
anyway, but save and restore the LDT explicitly just to be safe.
Kyle Evans [Thu, 11 Oct 2018 17:18:49 +0000 (17:18 +0000)]
Disable kernels_autodetect on installation media
This feature is disabled on install media as these generally won't have any
interesting kernels to be listed other than the default kernel, so the
potential performance penalty in these situations likely isn't worth it.
Kyle Evans [Thu, 11 Oct 2018 17:17:54 +0000 (17:17 +0000)]
Enable lualoader's kernel autodetection, disabled on install media
As documented in loader.conf(5), kernels_autodetect="YES" will cause the
Lua scripts to effectively scan /boot for directories with a "kernel" file
inside, to be listed in the loader menu.
Ed Maste [Thu, 11 Oct 2018 13:19:17 +0000 (13:19 +0000)]
lld: set sh_link and sh_info for .rela.plt sections
ELF spec says that for SHT_REL and SHT_RELA sh_link should reference the
associated string table and sh_info should reference the "section to
which the relocation applies." ELF Tool Chain's elfcopy / strip use
this (in part) to control whether or not the relocation entry is copied
to the output.
Nathan Whitehorn [Thu, 11 Oct 2018 00:54:39 +0000 (00:54 +0000)]
Loader GELI support, like lua loader, seems to be broken on PowerPC as
well as on SPARC64 and can cause boot failures even when no encrypted
disks are present. Presumably, the reasons, while unknown, are the same
and most-likely are the result of some endian-unsafe code. Pending
finding the actual problem, extend the blacklist entry for these parts
of loader on SPARC to also cover all PowerPC platforms.
In vdev_queue_aggregate() the zio_execute() bypass should not be
called under the vdev queue lock. This can result in a deadlock
as shown in the stack traces below.
Drop the vdev queue lock then walk the parents of the aggregate IO
to determine the list of component IOs to be bypassed. This can
be done safely without holding the io_lock since the new aggregate
IO has not yet been returned and its parents cannot change.