Allocate separate DMA area for synchronous IOCB execution.
Usually IOCBs should be put on queue for asynchronous processing and should
not require additional DMA memory. But there are some cases like aborts and
resets that for external reasons have to be synchronous. Give those cases
a separate 2*64 byte DMA area to decouple them from the other DMA scratch
area users, which use it for asynchronous requests.
re-enable AMD Topology extension on certain models if disabled by BIOS
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors. We re-enable the extension, so that we can properly discover
core and cache topology. Linux seems to do the same.
When processing an ICMP packet containing an SCTP packet, it
is required to check the verification tag. However, this
requires the verification tag to be not 0. Enforce this.
For packets with a verification tag of 0, we need to
check if it contains an INIT chunk and use the initiate
tag for the validation. This will be a separate commit,
since it also touches other code.
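A minimal sketch of the rule described above, not the actual kernel code.
The function name and the idea of passing both tags as plain integers are
assumptions made for illustration only:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether an ICMP message quoting an SCTP
 * packet may be accepted.  quoted_vtag is the verification tag taken from
 * the quoted SCTP common header; expected_vtag is the tag the association
 * expects to find there.
 */
static bool
icmp_sctp_vtag_ok(uint32_t quoted_vtag, uint32_t expected_vtag)
{
	/* A tag of 0 cannot be validated this way; reject it here. */
	if (quoted_vtag == 0)
		return (false);
	return (quoted_vtag == expected_vtag);
}
```

The tag-of-0 case (a quoted INIT chunk) is deliberately left out, matching
the note that it is handled separately.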
l2arc: make sure that all writes honor ashift of a cache device
Previously uncompressed buffers did not obey that rule.
Type of b_asize is changed to uint64_t for consistency,
given that this is a zettabyte filesystem.
l2arc_compress_buf is renamed to l2arc_transform_buf to better reflect
its new utility. Now we not only ensure that a compressed buffer has
a size aligned to ashift, but we also allocate a properly sized
temporary buffer if the original buffer is not compressed and it has
an odd size. This ensures that all I/O to the cache device is always
ashift-aligned, in terms of both a request offset and a request size.
If the aligned data is larger than the original data, then we have to use
a temporary buffer when reading it as well.
Also, enhance physical zio alignment checks using vdev_logical_ashift.
On FreeBSD we have this information, so we can make stricter assertions.
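The alignment rule can be sketched as follows. This is an illustration
under the assumption that the alignment is 1 << ashift bytes; the helper
names are hypothetical and are not the functions used in the ZFS code:

```c
#include <stdint.h>

/* Round size up to the next multiple of (1 << ashift). */
static uint64_t
ashift_roundup(uint64_t size, unsigned ashift)
{
	uint64_t align = (uint64_t)1 << ashift;

	return ((size + align - 1) & ~(align - 1));
}

/* Return non-zero if both the offset and the size are ashift-aligned. */
static int
io_is_ashift_aligned(uint64_t offset, uint64_t size, unsigned ashift)
{
	uint64_t mask = ((uint64_t)1 << ashift) - 1;

	return (((offset | size) & mask) == 0);
}
```

For example, with ashift = 12 a 5000-byte buffer is rounded up to 8192
bytes, so the aligned size exceeds the original size and a temporary
buffer of the aligned size is needed for both the write and the read.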
[amd64] dtrace_invop handler is to be called only for kernel exceptions
DTrace-related exceptions in userland code are handled elsewhere.
One practical problem was a crash in dtrace_invop_start() when saved
%rsp pointed to a virtual address that was not backed.
META_MODE: Avoid a changed build command on every build.
Because the file is generated with -f using another Makefile, 2
different Makefiles are trying to handle the .meta file for the
target. The obvious .NOMETA_CMP or .NOMETA on the ${MAKE} targets
don't work as they are very limited in scope in bmake. Using
.PHONY fixes the problem and ensures that the ${MAKE} command
is always run to check if it is outdated in the sub-make.
An example of the problem in gnu/lib/libgcc (with make -dM):
/usr/obj/root/git/freebsd/gnu/lib/libgcc/tm.h.meta: 2: a build command has changed
TARGET_CPU_DEFAULT="" HEADERS="options.h i386/biarch64.h i386/i386.h i386/unix.h i386/att.h dbxelf.h elfos-undef.h elfos.h freebsd-native.h freebsd-spec.h freebsd.h i386/x86-64.h i386/freebsd.h i386/freebsd64.h defaults.h" DEFINES="" /bin/sh /root/git/freebsd/gnu/lib/libgcc/../../../contrib/gcc/mkconfig.sh tm.h
vs
(cd /root/git/freebsd/gnu/lib/libgcc; make -f /root/git/freebsd/gnu/lib/libgcc/../../usr.bin/cc/cc_tools/Makefile MFILE=/root/git/freebsd/gnu/lib/libgcc/../../usr.bin/cc/cc_tools/Makefile GCCDIR=/root/git/freebsd/gnu/lib/libgcc/../../../contrib/gcc tm.h)
Skipping meta for tm.h: .NOMETA
(cd /root/git/freebsd/gnu/lib/libgcc; make -f /root/git/freebsd/gnu/lib/libgcc/../../usr.bin/cc/cc_tools/Makefile MFILE=/root/git/freebsd/gnu/lib/libgcc/../../usr.bin/cc/cc_tools/Makefile GCCDIR=/root/git/freebsd/gnu/lib/libgcc/../../../contrib/gcc tm.h)
`tm.h' is up to date.
Bruce Evans reported that there was a performance regression between
the old and new NFS clients. He did a good job of isolating the problem
which was caused by the new NFS client not setting the post write mtime
correctly. The new NFS client code was cloned from the old client, but
was incorrect, because the mtime in the nfs vnode's cache wasn't yet
updated. This patch fixes this problem. The patch also adds missing mutex
locking.
The previous method would completely nerf CFLAGS once bsd.progs.mk had
recursed into the per-PROG logic and set the CFLAGS for tap testcases
to -O0, instead of appending to CFLAGS for all of the tap testcases.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Author: Alexander Motin <mav@FreeBSD.org>
Improve speculative prefetch of indirect blocks.
Scalability of many operations on wide ZFS pools can be limited by the
requirement to prefetch indirect blocks first. The recently added
asynchronous indirect block read partially helped, but did not
solve the problem completely. This patch extends existing prefetcher
functionality to explicitly work with indirect blocks.
Before this change the prefetcher issued reads for up to 8MB of data in
advance. With this change it also issues indirect block reads
for up to 64MB of data in advance, so that when it is time to
actually read that data, it can be done immediately. A similar effect
can be achieved by just increasing maximal data prefetch distance,
but at higher memory cost.
This change also introduces indirect block prefetch for rewrite
operations, which was never done before. Previously ARC misses for
indirect blocks regularly blocked rewrites, converting perfectly
aligned asynchronous operations into synchronous read-write pairs,
significantly reducing maximal rewrite speed.
While here, this issue was also fixed:
- prefetch was done always, even if caching for the dataset was
completely disabled.
Testing on FreeBSD with a zvol on top of a 6x striped 2x mirrored pool
of 12 assorted HDDs showed these performance numbers:
------- BEFORE --------
Write 491363677 bytes/sec
Read 312430631 bytes/sec
Rewrite 97680464 bytes/sec
-------- AFTER --------
Write 493524146 bytes/sec
Read 438598079 bytes/sec
Rewrite 277506044 bytes/sec
Fix the problem where gpart(8) can't write both bootcode and partcode
in one command due to a wrong file size limit. Do not use the bootcode
size to calculate the partsize limit.
Also add a message reporting successful partcode writing.
Update 25xx chips firmware from 7.03.00 to 8.03.00.
While the same update is also available for 24xx chips, it seems to have
a problem with disabling virtual ports -- firmware handles the request,
but does not respond to it, causing a timeout in the driver.
During if_vmove() we call if_detach_internal() which in turn calls the event
handler notifying about interface departure and one of the consumers will
detach if_bpf.
There is no way for us to re-attach this easily as the DLT and hdrlen are
only given on interface creation.
Add a function to allow us to query the DLT and hdrlen from a current
BPF attachment and after if_attach_internal() manually re-add the if_bpf
attachment using these values.
Found by panics triggered by nd6 packets running past BPF_MTAP() with no
proper if_bpf pointer on the interface.
Also add a basic DDB show function to investigate the if_bpf attachment
of an interface.
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5896
zio: align use of "no dump" flag between use_uma and !use_uma cases
At the moment no ZFS buffers are included in a crash dump unless the
ZFS_DEBUG (or INVARIANTS) kernel option is enabled. That's not very
helpful for debugging of ZFS problems, because important information
often resides in metadata buffers.
This change switches the dumping behavior when UMA is used from the
illumos behavior to a more useful behavior that we have on FreeBSD
when ZFS buffers are allocated via malloc.
Allow guest writes to the AMD microcode update MSR [0xc0010020] without
updating the actual hardware MSR. This allows a guest microcode update to
go through, which otherwise failed because wrmsr() was returning EINVAL.
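A minimal sketch of the behavior, not the hypervisor's actual code. The
macro and function names are hypothetical; only the MSR number comes from
the text above:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical name; the MSR number is the one quoted above. */
#define	MSR_AMD_UCODE_UPDATE	0xc0010020u

/*
 * Hypothetical guest wrmsr hook.  Returns 0 if the write is accepted and
 * EINVAL if the MSR is not handled.
 */
static int
guest_wrmsr(uint32_t msr, uint64_t val)
{
	(void)val;	/* the value is intentionally discarded */

	switch (msr) {
	case MSR_AMD_UCODE_UPDATE:
		/* Report success to the guest; the hardware MSR is untouched. */
		return (0);
	default:
		return (EINVAL);
	}
}
```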
hyperv: Identify Hyper-V features and recommendations properly
Feature bits will be used to detect devices, e.g. timers, which
do not have corresponding event channels.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
Rearranged by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Fix IIC "how" argument dereferencing on big-endian platforms
The "how" argument is passed to the callback function as the value of an
int * pointer but dereferenced as char *, so only one byte is taken into
account. On little-endian systems it happens to work because the first
byte is the LSB, which contains the actual value; on big-endian it is the
MSB, which in this case is always zero.
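The effect can be reproduced in userland with a short sketch. The function
names are hypothetical and stand in for the callback's two ways of reading
its argument:

```c
/* Buggy read: looks only at the first byte of the int in memory. */
static int
how_via_char(void *arg)
{
	return (*(char *)arg);
}

/* Correct read: dereferences the pointer with its real type. */
static int
how_via_int(void *arg)
{
	return (*(int *)arg);
}

/* Return non-zero if the lowest-addressed byte of an int is its LSB. */
static int
host_is_little_endian(void)
{
	const unsigned one = 1;

	return (*(const unsigned char *)&one == 1);
}
```

With an int holding 1, how_via_int() returns 1 on any host, while
how_via_char() returns 1 only on a little-endian host and 0 on a
big-endian one.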
marius [Sun, 10 Apr 2016 22:43:36 +0000 (22:43 +0000)]
Since r296250 it is no longer possible for devices to use bus space
addresses exceeding 32 bit, so bump BUS_SPACE_MAXADDR to 64 bit.
The whole situation is sub par, though; prior to r296250 and despite
what their names imply, BUS_SPACE_MAX* were primarily, even almost
exclusively used for bus_dma(9). Now these macros also have a vital
role for bus_space(9). However, it does not necessarily hold that
both bus DMA and space addresses universally have the same limits
per platform.
As for sparc64, 64 bit clearly is beyond what can be addressed via
the various IOMMUs. With this change in place, we now rely on the
parent bus DMA tags of the host-to-foo drivers causing the child
tags to be capped as necessary.
Summary:
There is currently a 1GB hole between user and kernel address spaces
into which direct (1:1 PA:VA) device mappings go. This appears to go largely
unused, leaving all devices to contend with the 128MB block at the end of the
32-bit space (0xf8000000-0xffffffff). This easily fills up, and needs to be
densely packed. However, dense packing wastes precious TLB1 space, of which
there are only 16 (e500v2) or 64 (e5500) entries available.
Change this by using the 1GB space for all device mappings, and allow the kernel
to use the entire upper 1GB for KVA. This also allows us to use sparse device
mappings, freeing up TLB entries.
Add a function to look up a device_t object by name.
This just walks the global list of devices looking for one with the
requested name. The one use case outside of devctl2's implementation
is for DDB commands that wish to look up devices by name.
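The lookup amounts to a linear walk with a string comparison. A sketch
under assumed types: struct dev, its fields, and the function name are
hypothetical stand-ins, not the kernel's device_t or its API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device on a global singly linked list. */
struct dev {
	const char	*nameunit;	/* e.g. "em0" */
	struct dev	*next;
};

/* Return the first device whose name matches, or NULL if none does. */
static struct dev *
dev_lookup_by_name(struct dev *head, const char *name)
{
	struct dev *d;

	for (d = head; d != NULL; d = d->next) {
		if (d->nameunit != NULL && strcmp(d->nameunit, name) == 0)
			return (d);
	}
	return (NULL);
}
```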
adrian [Sun, 10 Apr 2016 04:16:34 +0000 (04:16 +0000)]
[net80211] correctly (i hope, wow) do a ticks comparison to limit A-MPDU attempts
I was seeing the stack constantly attempt to renegotiate A-MPDU TX
even after 3 failures. My hunch is that the direct ticks comparison
is failing around the ticks wrap-around point.
This failure shouldn't /really/ happen normally, but it turns out being
the IBSS master node on FreeBSD doesn't quite set up 11n right, so
negotiating A-MPDU TX fails.
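One common way to make such a comparison survive counter wrap-around is to
compare the signed difference instead of the raw values. The sketch below
illustrates that general technique with hypothetical function names and a
32-bit counter; it is not necessarily what this commit does:

```c
#include <stdint.h>

/* Naive comparison: gives the wrong answer once the counter has wrapped. */
static int
deadline_passed_naive(int32_t now, int32_t deadline)
{
	return (now >= deadline);
}

/*
 * Wrap-safe comparison: subtract as unsigned, then interpret the result
 * as a signed distance.  Valid while the two values are less than 2^31
 * ticks apart.
 */
static int
deadline_passed(int32_t now, int32_t deadline)
{
	return ((int32_t)((uint32_t)now - (uint32_t)deadline) >= 0);
}
```

If the deadline is INT32_MAX - 5 and the counter has since wrapped to
INT32_MIN + 10, the naive check claims the deadline has not passed, while
the signed-difference check correctly reports that it has.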
adrian [Sun, 10 Apr 2016 03:35:17 +0000 (03:35 +0000)]
[net80211] unconditionally do A-MPDU RX aging.
It's 2016 and vendors (including us!) still have 802.11n TX/RX sequence
handling bugs. It's suboptimal, but I'd rather see us default to handling
things in a sensible way.
So, just delete the #ifdef'ed code for now. I'll leave the option in
so it doesn't break existing configurations.
This all started because I've started getting reports about urtwn not
working after I enabled 802.11n support, and it's because the ARM kernel
configs don't include A-MPDU RX aging.
This allows one to enable DTrace probes relatively early during boot,
during SI_SUB_DTRACE_ANON, before dtrace(1) can be invoked. The desired
enabling is created using dtrace -A, which writes a /boot/dtrace.dof
file and uses nextboot(8) to ensure that DTrace kernel modules are loaded
and that the DOF file describing the enabling is loaded by loader(8)
during the subsequent boot. The trace output can then be fetched with
dtrace -a.
With this commit, boot-time DTrace is only functional on i386 and amd64: on
other architectures, the high-resolution timer frequency is initialized
during SI_SUB_CLOCKS and is thus not available when the anonymous
tracing state is initialized. On x86, the TSC is used and is thus available
earlier.
Initialize DTrace hrtimer frequency during SI_SUB_CPU on i386 and amd64.
This allows the hrtimer to be used earlier during boot. This is required
for boot-time DTrace: anonymous enablings are created during
SI_SUB_DTRACE_ANON, which runs before APs are started. In particular,
the DTrace deadman timer requires that the hrtimer be functional.
MFV r297760: 6418 zpool should have a label clearing command
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Author: Will Andrews <will@firepipe.net>
ian [Sat, 9 Apr 2016 19:09:06 +0000 (19:09 +0000)]
Align the start of the text segment to an 8-byte boundary. This fixes
alignment aborts in ubldr.bin for RPi that started happening with clang 3.8
(earlier clang apparently didn't generate strd instructions that trigger
the alignment fault). The abort happened in ubldr.bin and not ubldr (elf
version) because the elf headers are 0xf4 bytes long, and stripping them
off left everything 4-byte aligned.
While here, also stop aligning the data segment to a page boundary, align
it to 8 bytes instead (aligning to a page just needlessly makes the file
bigger); pointed out by andrew@.
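The arithmetic behind the 4-byte versus 8-byte alignment can be checked
with a small sketch. The helper names are hypothetical; only the 0xf4
header size comes from the text above:

```c
#include <stdint.h>

/* Return non-zero if addr is a multiple of align (a power of two). */
static int
is_aligned(uint32_t addr, uint32_t align)
{
	return ((addr & (align - 1)) == 0);
}

/* Round addr up to the next multiple of align (a power of two). */
static uint32_t
align_up(uint32_t addr, uint32_t align)
{
	return ((addr + align - 1) & ~(align - 1));
}
```

0xf4 is a multiple of 4 but not of 8, which is consistent with the
4-byte alignment described above; rounding it up to 8 gives 0xf8.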
Add more fine-grained kernel options for NUMA support.
VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in
the virtual memory system. DEVICE_NUMA is used to enable affinity
reporting for devices such as bus_get_domain().
MAXMEMDOM must still be set to a value greater than 1 for any NUMA support
to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is
enabled and the system supports NUMA.
It looks like with the safety belt of DELAY() fastened (*) we can
completely tear down and free all memory for TCP (after r281599).
(*) in theory a few ticks should be good enough to make sure the timers
are all really gone. Could we use a better metric here and check a
tcpcb count as an optimization?
PR: 164763
Reviewed by: gnn, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5734
We attach the "counter" to the tcpcbs. Thus don't free the
TCP Fastopen zone before the tcpcbs are gone, as otherwise
the zone won't be empty.
With that it should be safe to destroy the "tfo" zone without
leaking the memory.
PR: 164763
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5731
While there is no dependency interaction, stopping the timer before
freeing the rest of the resources seems more natural and avoids it
being scheduled an extra time when it is no longer needed.
Reviewed by: gnn, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5733