llvm-symbolizer: Move out of CLANG_EXTRAS, into CLANG
ASAN reports become a lot more useful with llvm-symbolizer in $PATH, and the
build is not much more time-consuming. The added benefit is that the
resulting reports will actually include symbol information; without, thread
trace information includes a bunch of addresses that immediately resolve to
an inline function in
^/contrib/compiler-rt/lib/sanitizer_common/sanitizer_common.h and take a
little more effort to examine.
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt, libc++,
libunwind and openmp to the upstream release_80 branch r363030
(effectively, 8.0.1 rc2). The 8.0.1 release should follow this within a
week or so.
MFC r349351 (by jhibbits, partially):
powerpc: Transition to Secure-PLT, like most other OSs (Toolchain part)
Summary:
Toolchain follow-up to r349350. LLVM patches will be submitted upstream for
9.0 as well.
The bsd.cpu.mk change is required because GNU ld assumes BSS-PLT if it
cannot determine for certain that it needs Secure-PLT, and some binaries do
not compile in such a way to make it know to use Secure-PLT.
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt, libc++,
libunwind and openmp to the upstream release_80 branch r364487
(effectively, 8.0.1 rc3). The 8.0.1 release will most likely
have no further changes.
MFC r350177:
Merge llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and openmp
8.0.1 final release r366581. The only functional change is a fix for a
mismerge of upstream r360816, which properly restores the r2 register
when unwinding on PowerPC64 (See https://reviews.freebsd.org/D20337).
Ed Maste [Tue, 23 Jul 2019 18:11:42 +0000 (18:11 +0000)]
bhyve: Fix resource leak when using strdup
MFC r340044 (araujo):
Fix resource leak when using strdup(3).
MFC r344160 (rgrimes):
In r340044 an attempt to quiet coverity warning cid 1357336
was incorrectly implemented leading to a possible double free.
It is possible for both the conditional free,
and the unconditional free added in r340044 to be done,
fix that by initializing uopt to NULL,
removing the conditional free,
and only using the unconditional free at the end.
This changes the return code however the caller only tests for 0 and != 0.
One might ask then, why multiple return codes when the caller only tests
for 0 and != 0? From what I can tell, Darren probably passed various
return codes for sake of debugging. The debugging code is long gone
however we can still use the different return codes using DTrace FBT
traces. We can still determine why the compare failed by examining the
differences between the fr1 and fr2 frentry structs, which is a simple
test in DTrace. This allows reducing the number of tests, improving the
code while not affecting our ability to capture information for
diagnostic purposes.
Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000,
and connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and
backend functionalities. As a result, the backend code is duplicated between
the two virtual NIC implementations and also within the same virtual NIC.
Also, e1000 cannot currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.
Calculate the offset of the interface name using FR_NAME rather than
calclulating it "by hand". This improves consistency with the rest of
the code and is in line with planned fixes and other work.
Recycle the unused FR_CMPSIZ macro which became orphaned in ipfilter 5
prior to its import into FreeBSD. This macro calculates the size to be
compared within the frentry structure. The ipfilter 4 version of the
macro calculated the compare size based upon the static size of the
frentry struct. Today it uses the ipfilter 5 method of calculating the
size based upon the new to ipfilter 5 fr_size value found in the
frentry struct itself.
Fix VOP_PUTPAGES(9) in regards to the use of VM_PAGER_CLUSTER_OK
Submitted by: Ka Ho Ng <khng300 at gmail.com>
Reviewed by: mckusick
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20695
MFC r349940:
Correctly truncate the rule in case when it has several action opcodes.
It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.
Alan Somers [Fri, 19 Jul 2019 14:13:50 +0000 (14:13 +0000)]
MFC r349041:
open(2): fix the description of O_FSYNC
The man page claims that with O_FSYNC (aka O_SYNC) the kernel will not cache
written data. However, that's not true. Nor does POSIX require it.
Perhaps it was true when that section of the man page was written in r69336
(I haven't checked). But it's not true now. Now the effect is simply that
writes are sent to disk immediately and synchronously, but they're still
cached.
See also: https://pubs.opengroup.org/onlinepubs/9699919799/
See also: ffs_write in sys/ufs/ffs/ffs_vnops.c
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20641
ipfilter commands, in this case ipf(8), passes its operations and rules
via an ioctl interface. Rules can be added or removed and stats and
counters can be zeroed out. As the ipfilter interprets these
instructions or operations they are stored in an integer called
addrem (add/remove). 0 is add, 1 is remove, and 2 is clear stats and
counters. Much of this is not documented. This commit documents these
operations by replacing simple integers with a self documenting
enum along with a few basic comments.
- Add rcu list functions.
- Make rcu hlist's foreach macro use rcu calls instead of the non-rcu macro.
- Bump FreeBSD version so we have a checkpoint for the vboxvideo drm driver.
Pull in r365760 from upstream lld trunk (by Fangrui Song):
[ELF] Handle non-glob patterns before glob patterns in version
scripts & fix a corner case of --dynamic-list
This fixes PR38549, which is silently accepted by ld.bfd.
This seems correct because it makes sense to let non-glob patterns
take precedence over glob patterns.
lld issues an error because
`assignWildcardVersion(ver, VER_NDX_LOCAL);` is processed before
`assignExactVersion(ver, v.id, v.name);`.
Move all assignWildcardVersion() calls after assignExactVersion()
calls to fix this.
Also, move handleDynamicList() to the bottom. computeBinding() called
by includeInDynsym() has this cryptic rule:
if (versionId == VER_NDX_LOCAL && isDefined() && !isPreemptible)
return STB_LOCAL;
Before the change:
* foo's version is set to VER_NDX_LOCAL due to `local: *`
* handleDynamicList() is called
- foo.computeBinding() is STB_LOCAL
- foo.includeInDynsym() is false
- foo.isPreemptible is not set (wrong)
* foo's version is set to V1
After the change:
* foo's version is set to VER_NDX_LOCAL due to `local: *`
* foo's version is set to V1
* handleDynamicList() is called
- foo.computeBinding() is STB_GLOBAL
- foo.includeInDynsym() is true
- foo.isPreemptible is set (correct)
This makes it longer necessary to patch the version scripts for the
samba ports, to avoid "duplicate symbol 'pdb_search_init' in version
script" errors.
Eric van Gyzen [Tue, 16 Jul 2019 16:05:42 +0000 (16:05 +0000)]
MFC r349834
Ignore kern.vt.splash_cpu without graphics
When the system has no graphical console, such as bhyve in common
configurations, ignore kern.vt.splash_cpu, instead of panicking
on INVARIANTS kernels.
MFC r349579: nctgpio: change default pin names to those used by the datasheet(s)
That is, instead of the current GPIO00 - GPIO15 the names will be GPIO00
- GPIO07, GPIO10 - GPIO17. The first digit is a GPIO "bank" / group
number and the second one is a pin number within the bank. Alternative
view is that the pin names are changed from decimal numbering scheme to
octal one (as there are 8 pins per bank).
MFC r349460: gpiobus: provide a new hint, pin_list
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask. One problem with that is that a controller can have more than 32
pins. One example is amdgpio. Also, a list of numbers is a little bit
more human friendly than a matching bit mask. As a side note, it seems
that in FDT pins are typically specified by their numbers as well.
This commit also adds accessors for instance variables (IVARs) that
define the child pins. My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform). Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.
While there, I removed "flags" instance variable. It was unused.
MFC r349645:
Remove dead code added after r348743 in the LinuxKPI. The
LINUXKPI_VERSION macro is not defined for any compiled LinuxKPI code
which basically means __GFP_NOTWIRED is never checked when allocating
pages. This should work fine with the existing external DRM code as
long as the page wiring and unwiring is balanced.
This patch fixes 2 panics. The first one is due to the current VNET not
being set in the emulated adapter transmission path. The second one
is caused by the M_PKTHDR flag not being set when preallocated mbufs
are recycled in the transmit path.
Apply a workaround to be able to build clang 8.0.0 headers with clang
3.4.1, which is still in the stable/10 branch.
It looks like clang 3.4.1 implements static_asserts by instantiating a
temporary static object, and if those are in an anonymous union, it
results in "error: anonymous union can only contain non-static data
members".
To work around this implementation limitation, move the static_asserts
in question out of the anonymous unions.
This should make building the latest stable/11 from stable/10 possible
again.
John Baldwin [Sat, 13 Jul 2019 00:23:20 +0000 (00:23 +0000)]
MFC 343068:
Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.
It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.
John Baldwin [Fri, 12 Jul 2019 22:31:12 +0000 (22:31 +0000)]
MFC 339911,339936,343075,343166,348592: Various AMD CPU-specific fixes.
339911:
Emulate machine check related MSR_EXTFEATURES to allow guest OSes to
boot on AMD FX Series.
339936:
Merge cases with upper block.
This is a cosmetic change only to simplify code.
343075:
vmm(4): Take steps towards multicore bhyve AMD support
vmm's CPUID emulation presented Intel topology information to the guest, but
disabled AMD topology information and in some cases passed through garbage.
I.e., CPUID leaves 0x8000_001[de] were passed through to the guest, but
guest CPUs can migrate between host threads, so the information presented
was not consistent. This could easily be observed with 'cpucontrol -i 0xfoo
/dev/cpuctl0'.
Slightly improve this situation by enabling the AMD topology feature flag
and presenting at least the CPUID fields used by FreeBSD itself to probe
topology on more modern AMD64 hardware (Family 15h+). Older stuff is
probably less interesting. I have not been able to empirically confirm it
is sufficient, but it should not regress anything either.
343166:
vmm(4): Mask Spectre feature bits on AMD hosts
For parity with Intel hosts, which already mask out the CPUID feature
bits that indicate the presence of the SPEC_CTRL MSR, do the same on
AMD.
Eventually we may want to have a better support story for guests, but
for now, limit the damage of incorrectly indicating an MSR we do not yet
support.
Eventually, we may want a generic CPUID override system for
administrators, or for minimum supported feature set in heterogenous
environments with failover. That is a much larger scope effort than
this bug fix.
348592:
Emulate the AMD MSR_LS_CFG MSR used for various Ryzen errata.
Using dominance vs a set membership check is indistinguishable from a
compile time perspective, and the two queries return equivelent
results. Simplify code by using the existing function.
Pull in r360978 from upstream llvm trunk (by Philip Reames):
[LFTR] Strengthen assertions in genLoopLimit [NFCI]
Pull in r362292 from upstream llvm trunk (by Nikita Popov):
[IndVarSimplify] Fixup nowrap flags during LFTR (PR31181)
Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix
for LFTR poison handling issues in general.
When LFTR moves a condition from pre-inc to post-inc, it may now
depend on value that is poison due to nowrap flags. To avoid this, we
clear any nowrap flag that SCEV cannot prove for the post-inc addrec.
Additionally, LFTR may switch to a different IV that is dynamically
dead and as such may be arbitrarily poison. This patch will correct
nowrap flags in some but not all cases where this happens. This is
related to the adoption of IR nowrap flags for the pre-inc addrec.
(See some of the switch_to_different_iv tests, where flags are not
dropped or insufficiently dropped.)
Finally, there are likely similar issues with the handling of GEP
inbounds, but we don't have a test case for this yet.
Pull in r362971 from upstream llvm trunk (by Philip Reames):
Prepare for multi-exit LFTR [NFC]
This change does the plumbing to wire an ExitingBB parameter through
the LFTR implementation, and reorganizes the code to work in terms of
a set of individual loop exits. Most of it is fairly obvious, but
there's one key complexity which makes it worthy of consideration.
The actual multi-exit LFTR patch is in D62625 for context.
Specifically, it turns out the existing code uses the backedge taken
count from before a IV is widened. Oddly, we can end up with a
different (more expensive, but semantically equivelent) BE count for
the loop when requerying after widening. For the nestedIV example
from elim-extend, we end up with the following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)
This is the only test in tree which seems sensitive to this
difference. The actual result of using the wider BETC on this example
is that we actually produce slightly better code. :)
In review, we decided to accept that test change. This patch is
structured to preserve the old behavior, but a separate change will
immediate follow with the behavior change. (I wanted it separate for
problem attribution purposes.)
Pull in r362975 from upstream llvm trunk (by Philip Reames):
[LFTR] Use recomputed BE count
This was discussed as part of D62880. The basic thought is that
computing BE taken count after widening should produce (on average)
an equally good backedge taken count as the one before widening.
Since there's only one test in the suite which is impacted by this
change, and it's essentially equivelent codegen, that seems to be a
reasonable assertion. This change was separated from r362971 so that
if this turns out to be problematic, the triggering piece is obvious
and easily revertable.
For the nestedIV example from elim-extend.ll, we end up with the
following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)
Note that before is an i32 type, and the after is an i64. Truncating
the i64 produces the i32.
Pull in r362980 from upstream llvm trunk (by Philip Reames):
Factor out a helper function for readability and reuse in a future
patch [NFC]
Pull in r363613 from upstream llvm trunk (by Philip Reames):
Fix a bug w/inbounds invalidation in LFTR (recommit)
Recommit r363289 with a bug fix for crash identified in pr42279.
Issue was that a loop exit test does not have to be an icmp, leading
to a null dereference crash when new logic was exercised for that
case. Test case previously committed in r363601.
Original commit comment follows:
This contains fixes for two cases where we might invalidate inbounds
and leave it stale in the IR (a miscompile). Case 1 is when switching
to an IV with no dynamically live uses, and case 2 is when doing
pre-to-post conversion on the same pointer type IV.
The basic scheme used is to prove that using the given IV (pre or
post increment forms) would have to already trigger UB on the path to
the test we're modifying. As such, our potential UB triggering use
does not change the semantics of the original program.
As was pointed out in the review thread by Nikita, this is defending
against a separate issue from the hasConcreteDef case. This is about
poison, that's about undef. Unfortunately, the two are different, see
Nikita's comment for a fuller explanation, he explains it well.
(Note: I'm going to address Nikita's last style comment in a separate
commit just to minimize chance of subtle bugs being introduced due to
typos.)
Pull in r363875 from upstream llvm trunk (by Philip Reames):
[LFTR] Rename variable to minimize confusion [NFC]
(Recommit of r363293 which was reverted when a dependent patch was.)
As pointed out by Nikita in D62625, BackedgeTakenCount is generally
used to refer to the backedge taken count of the loop. A conditional
backedge taken count - one which only applies if a particular exit is
taken - is called a ExitCount in SCEV code, so be consistent here.
Pull in r363877 from upstream llvm trunk (by Philip Reames):
[LFTR] Stylistic cleanup as suggested in last review comment of
D62939 [NFC]
(Resumbit of r363292 which was reverted along w/an earlier patch)
Pull in r364346 from upstream llvm trunk (by Philip Reames):
[LFTR] Adjust debug output to include extensions (if any)
Pull in r364693 from upstream llvm trunk (by Philip Reames):
[IndVars] Remove a bit of manual constant folding [NFC]
SCEV is more than capable of folding (add x, trunc(0)) to x.
Pull in r364709 from upstream llvm trunk (by Nikita Popov):
[LFTR] Fix post-inc pointer IV with truncated exit count (PR41998)
Fixes https://bugs.llvm.org/show_bug.cgi?id=41998. Usually when we
have a truncated exit count we'll truncate the IV when comparing
against the limit, in which case exit count overflow in post-inc form
doesn't matter. However, for pointer IVs we don't do that, so we have
to be careful about incrementing the IV in the wide type.
I'm fixing this by removing the IVCount variable (which was ExitCount
or ExitCount+1) and replacing it with a UsePostInc flag, and then
moving the actual limit adjustment to the individual cases (which
are: pointer IV where we add to the wide type, integer IV where we
add to the narrow type, and constant integer IV where we add to the
wide type).
Move the new ipf_pcksum6() function from ip_fil_freebsd.c to fil.c.
The reason for this is that ipftest(8), which still works on FreeBSD-11,
fails to link to it, breaking stable/11 builds.
ipftest(8) was broken (segfault) sometime during the FreeBSD-12 cycle.
glebius@ suggested we disable building it until I can get around to
fixing it. Hence this was not caught in -current.
The intention is to fix ipftest(8) as it is used by the netbsd-tests
(imported by ngie@ many moons ago) for regression testing.
Doug Moore [Fri, 12 Jul 2019 02:03:43 +0000 (02:03 +0000)]
MFC r349286, r349293
Modify swapon(8) to invoke BIO_DELETE to trim swap devices, either if
'-E' appears on the swapon command line, or if "trimonce" appears as
an fstab option.
Remove #include <sys/types.h> to fix a style(9) violation.
Resolve IPv6 checksum errors with stateful inspection. According to
PR/203585 this appears to have been broken by r235959, which predates
the ipfilter 5.1.2 import into FreeBSD.
The IPv6 checksum calculation is incorrect. To resolve this we call
in6_cksum() to do the the heavy lifting for us, through a new function
ipf_pcksum6(). Should we need to revisit this area again, a DTrace probe
is added to aid with future debugging.
Register pfil hooks when VNET != vnet0. r302298, which virtualized ipf,
assumed the pfil hook registration performed in ipf_modload() would take
are of this. However ipf_modload() is only called when the ipl kld is
loaded or when ipfilter is first called when it is statically linked
into the kernel at build time.
Prior to this, even though r302298 has been in the tree for a while, it
has never been used. So, r302298 in reality begins now.
Alexander Motin [Mon, 8 Jul 2019 10:21:38 +0000 (10:21 +0000)]
MFC r349321: Improve AHCI Enclosure Management and SES interoperation.
Since SES specs do not define mechanism to map enclosure slots to SATA
disks, AHCI EM code I written many years ago appeared quite useless,
that always bugged me. I was thinking whether it was a good idea, but
if LSI HBAs do that, why I shouldn't?
This change introduces simple non-standard mechanism for the mapping
into both AHCI EM and SES code, that makes AHCI EM on capable controllers
(most of Intel's) a first-class SES citizen, allowing it to report disk
physical path to GEOM, show devices inserted into each enclosure slot in
`sesutil map` and `getencstat`, control locate and fault LEDs for specific
devices with `sesutil locate adaX on` and `sesutil fault adaX on`, etc.
I've successfully tested this on Supermicro X10DRH-i motherboard connected
with sideband cable of its S-SATA Mini-SAS connector to SAS815TQ backplane.
It can indicate with LEDs Locate, Fault and Rebuild/Remap SES statuses for
each disk identical to real SES of Supermicro SAS2 backplanes.
Alexander Motin [Mon, 8 Jul 2019 10:19:49 +0000 (10:19 +0000)]
MFC r341811 (by delphij):
Remove questionable initialization for ICH8M, rely on BIOS to properly
initialize the controller.
According to the datasheet, the old code checks if port 2 (P2E, 0x4) was
the only enabled port (except port 0, which was ignored by mask 0xfe),
and issue a write to the PCS register to disable all but port 0, right
before ahci_ctlr_reset.
Some other operating systems would issue a port enable to all ports, but
since the current code only does the special initialization for ICH8M,
it entirely and rely on BIOS to do the right thing (the alternative
would be https://reviews.freebsd.org/D18300?id=50922 , should we see
reports that we really need to do it).
Alexander Motin [Mon, 8 Jul 2019 10:18:11 +0000 (10:18 +0000)]
MFC r340092 (by imp): Implement ability to turn on/off PHYs for AHCI devices.
As part of Chuck's work on fixing kernel crashes caused by disk I/O
errors, it is useful to be able to trigger various kinds of
errors. This patch allows causing an AHCI-attached disk to disappear,
by having the driver keep the PHY disabled when the driver would
otherwise enable the PHY. It also allows making the disk reappear by
having the driver go back to setting the PHY enable/disable state as
it normal would and simulating the hardware event that causes a bus
rescan.
Alexander Motin [Sun, 7 Jul 2019 18:50:23 +0000 (18:50 +0000)]
MFC r349243: Optimize xpt_getattr().
Do not allocate temporary buffer for attributes we are going to return
as-is, just make sure to NUL-terminate them. Do not zero temporary 64KB
buffer for CDAI_TYPE_SCSI_DEVID, XPT tells us how much data it filled
and there are also length fields inside the returned data also.
Alexander Motin [Sun, 7 Jul 2019 18:49:39 +0000 (18:49 +0000)]
MFC r349220: Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless. It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.
This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock). On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.
As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.
Alexander Motin [Sun, 7 Jul 2019 18:44:51 +0000 (18:44 +0000)]
MFC r349178: Optimize kern.geom.conf* sysctls.
On large systems those sysctls may generate megabytes of output. Before
this change sbuf(9) code was resizing buffer by 4KB each time many times,
generating tons of TLB shootdowns. Unfortunately in this case existing
sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not
applicable, since all the sbuf writes are done in different kernel thread.
This change improves situation in two ways:
- on first sysctl call, not providing any output buffer, it sets special
sbuf drain function, just counting the data and so not needing big buffer;
- on second sysctl call it uses as initial buffer size value saved on
previous call, so that in most cases there will be no reallocation, unless
GEOM topology changed significantly.
For busy ARC situation when arc_size close to arc_c is desired. But
then it is quite likely that aggsum_compare(&arc_size, arc_c) will need
to flush per-CPU buckets to find exact comparison result. Doing that
often in a hot path penalizes whole idea of aggsum usage there, since it
replaces few simple atomic additions with dozens of lock acquisitions.
Replacing aggsum_compare() with aggsum_upper_bound() in code increasing
arc_p when ARC is growing (arc_size < arc_c) according to PMC profiles
allows to save ~5% of CPU time in aggsum code during sequential write
to 12 ZVOLs with 16KB block size on large dual-socket system.
I suppose there some minor arc_p behavior change due to lower precision
of the new code, but I don't think it is a big deal, since it should
affect only very small window in time (aggsum buckets are flushed every
second) and in ARC size (buckets are limited to 10 average ARC blocks
per CPU).
Alexander Motin [Sun, 7 Jul 2019 18:40:35 +0000 (18:40 +0000)]
MFC r349039: Alike to ZoL disable metaslab allocation tracing code.
It is too generous to collect in production debug traces that can only
be read with kernel debugger. Illumos includes special code in their
mdb debugger to read it, we don't.
Alexander Motin [Sun, 7 Jul 2019 18:38:40 +0000 (18:38 +0000)]
MFC r349029: Update td_runtime of running thread on each statclock().
Normally td_runtime is updated on context switch, but there are some kernel
threads that due to high absolute priority may run for many seconds without
context switches (yes, that is bad, but that is true), which means their
td_runtime was not updated all that time, that made them invisible for top
other then as some general CPU usage.
Alexander Motin [Sun, 7 Jul 2019 18:37:46 +0000 (18:37 +0000)]
MFC r349006: Move write aggregation memory copy out of vq_lock.
Memory copy is too heavy operation to do under the congested lock.
Moving it out reduces congestion by many times to almost invisible.
Since the original zio removed from the queue, and the child zio is
not executed yet, I don't see why would the copy need protection.
My guess it just remained like this from the time when lock was not
dropped here, which was added later to fix lock ordering issue.
Multi-threaded sequential write tests with both HDD and SSD pools
with ZVOL block sizes of 4KB, 16KB, 64KB and 128KB all show major
reduction of lock congestion, saving from 15% to 35% of CPU time
and increasing throughput from 10% to 40%.
Alexander Motin [Sun, 7 Jul 2019 18:31:53 +0000 (18:31 +0000)]
MFC r349284: Make ELEMENT INDEX validation more strict.
SES specifications tell: "The Additional Element Status descriptors shall
be in the same order as the status elements in the Enclosure Status
diagnostic page". It allows us to question ELEMENT INDEX that is lower
then values we already processed. There are many SAS2 enclosures with
this kind of problem.
While there, add more specific error messages for cases when ELEMENT INDEX
is obviously wrong. Also skip elements with INVALID bit set.
Alexander Motin [Sun, 7 Jul 2019 18:29:10 +0000 (18:29 +0000)]
MFC r349281: Fix individual_element_index when some type has 0 elements.
When some type has 0 elements, saved_individual_element_index was set
to -1 on second type bump, since individual_element_index was not
restored after the first. To me it looks easier just to increment
saved_individual_element_index separately than think when to save it.