FreeBSD notes:
- this MFV reverts FreeBSD commit r314549 to make the merge easier
- at present our emulation of cv_timedwait_hires is rather poor,
so I elected to use cv_timedwait_sbt directly
Please see the differential revision for details.
Unfortunately, I did not get any positive reviews, so there could be
bugs in the FreeBSD-specific piece of the merge.
Hence, the long MFC timeout.
https://www.illumos.org/issues/8585
The current implementation of zil_commit() can introduce significant
latency, beyond what is inherent due to the latency of the underlying
storage. The additional latency comes from two main problems:
1. When there's outstanding ZIL blocks being written (i.e. there's
already a "writer thread" in progress), then any new calls to
zil_commit() will block waiting for the currently oustanding ZIL
blocks to complete. The blocks written for each "writer thread" is
coined a "batch", and there can only ever be a single "batch" being
written at a time. When a batch is being written, any new ZIL
transactions will have to wait for the next batch to be written,
which won't occur until the current batch finishes.
As a result, the underlying storage may not be used as efficiently
as possible. While "new" threads enter zil_commit() and are blocked
waiting for the next batch, it's possible that the underlying
storage isn't fully utilized by the current batch of ZIL blocks. In
that case, it'd be better to allow these new threads to generate
(and issue) a new ZIL block, such that it could be serviced by the
underlying storage concurrently with the other ZIL blocks that are
being serviced.
2. Any call to zil_commit() must wait for all ZIL blocks in its "batch"
to complete, prior to zil_commit() returning. The size of any given
batch is proportional to the number of ZIL transaction in the queue
at the time that the batch starts processing the queue; which
doesn't occur until the previous batch completes. Thus, if there's a
lot of transactions in the queue, the batch could be composed of
many ZIL blocks, and each call to zil_commit() will have to wait for
all of these writes to complete (even if the thread calling
zil_commit() only cared about one of the transactions in the batch).
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
cy [Sun, 29 Oct 2017 04:33:50 +0000 (04:33 +0000)]
Sync (make same) the offsetof macro definition in include/ with the
definition of the same in sys/sys/. The problem was discovered while
working on implementing a new C11 gets_s() for libc. (The new gets_s()
requires rsize_t found in include/stddef.h.) The solution to sync the two
definitions was suggested by ed@ while discussing D12667.
rmacklem [Wed, 25 Oct 2017 19:27:12 +0000 (19:27 +0000)]
MFC: r324506
Fix forced dismount when a pNFS mount is hung on a DS.
When a "pnfs" NFSv4.1 mount is hung because of an unresponsive DS,
a forced dismount wouldn't work, because the RPC socket for the DS
was not being closed. This patch fixes this.
This will only affect "pnfs" mounts where the pNFS server's DS
is unresponsive (crashed or network partitioned or...).
Found during testing of the pNFS server.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
https://www.illumos.org/issues/3871
GCC 4.5.3 on Gentoo Linux did not like a few of the changes made in the issue
3604 patch. It printed an error and a couple of warnings:
../../cmd/zdb/zdb.c: In function 'dump_bpobj':
../../cmd/zdb/zdb.c:1257:3: error: 'for' loop initial declarations are only
allowed in C99 mode
../../cmd/zdb/zdb.c:1257:3: note: use option -std=c99 or -std=gnu99 to compile
your code
../../cmd/zdb/zdb.c: In function 'dump_deadlist':
../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>
https://www.illumos.org/issues/6866
Need this for #6865.
To be generally more scripting-friendly, overload this issue with adding '-q'
option which should skip printing any label information.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
https://www.illumos.org/issues/7280
zdb is very handy for diagnosing problems with a pool in a safe and
quick way. When a pool is in a bad shape, we often want to disable some
fail-safes, or adjust some tunables in order to open them. In the
kernel, this is done by changing public variables in mdb. The goal of
this feature is to add the same capability to zdb and ztest, so that
they can change libzpool tuneables from the command line.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
jch [Tue, 24 Oct 2017 08:56:11 +0000 (08:56 +0000)]
MFC r324179, r324193:
r324179:
Fix an infinite loop in tcp_tw_2msl_scan() when an INP_TIMEWAIT inp has
been destroyed before its tcptw with INVARIANTS undefined.
This is a symmetric change of r307551:
A INP_TIMEWAIT inp should not be destroyed before its tcptw, and INVARIANTS
will catch this case. If INVARIANTS is undefined it will emit a log(LOG_ERR)
and avoid a hard to debug infinite loop in tcp_tw_2msl_scan().
Reported by: Ben Rubson, hselasky
Submitted by: hselasky
Tested by: Ben Rubson, jch
Sponsored by: Verisign, inc
Differential Revision: https://reviews.freebsd.org/D12267
r324193:
Forgotten bits in r324179: Include sys/syslog.h if INVARIANTS is not defined
dim [Tue, 24 Oct 2017 06:49:06 +0000 (06:49 +0000)]
MFC r324826:
Pull in r316035 from upstream llvm trunk (by Tim Northover):
AArch64: account for possible frame index operand in compares.
If the address of a local is used in a comparison, AArch64 can fold
the address-calculation into the comparison via "adds".
Unfortunately, a couple of places (both hit in this one test) are not
ready to deal with that yet and just assume the first source operand
is a register.
This should fix an assertion failure while building the test suite of
www/firefox for AArch64.
bdrewery [Tue, 24 Oct 2017 00:51:11 +0000 (00:51 +0000)]
MFC r318246,r324566,r324668,r324701:
r318246:
Add a regression test for r318191.
r324566:
Fix shadowed variable hidden by WARNS changing to 3 in r313006.
r324668:
This child is expected to exit on SIGTRAP, don't leave a core behind.
r324701:
Add a test for r324671 along with some other masked tests.
emaste [Tue, 24 Oct 2017 00:32:20 +0000 (00:32 +0000)]
MFC r324683: write.2: correct maximum nbytes size for EINVAL error
In FreeBSD 11 and later debug.iosize_max_clamp defaults to 0, and the
maximum nbytes count for write(2) is SSIZE_MAX. Update the man page to
document this, and mention the sysctl that can be set to obtain the
previous behaviour.
bdrewery [Mon, 23 Oct 2017 18:25:21 +0000 (18:25 +0000)]
MFC r316286:
Add support for capturing 'struct ptrace_lwpinfo' for signals resulting in a
process dumping core in the corefile.
Direct stable changed: Padding added to struct thread and td_si added to end
with explicit bzeroing when forking/initializing a thread to preserve KBI.
kib [Sun, 22 Oct 2017 08:47:13 +0000 (08:47 +0000)]
MFC r323772, r324302-r324308, r324310, r324313, r324315, r324326, r324330,
r324334, r324354-r324355, r324366, r324432-r324433, r324437-r324439:
Fixes and improvements for x86 LDT handling.
oshogbo [Sat, 21 Oct 2017 19:37:25 +0000 (19:37 +0000)]
MFC r323858:
IMHO it is possible that failure will be treated as success because we don't
initialize nvp on every loop iteration and the code under 'fail'(!) label
detects success by checking of nvp != NULL.
oshogbo [Sat, 21 Oct 2017 19:34:54 +0000 (19:34 +0000)]
MFC r323851:
Remove redundant initialization. Don't use variable - just return the value.
Make scan-build happy by casting to 'void *' instead of 'void **'.
Submitted by: pjd@
Found by: scan-build and cppcheck
Sponsored by: Wheel Systems
oshogbo [Sat, 21 Oct 2017 19:33:31 +0000 (19:33 +0000)]
MFC r323854:
Because nvp wasn't initialized on every loop iteration once we jumped
to 'fail' on error it was treated as success, because nvp!=NULL. Fix this
by not handling success under 'fail' label and by using separate variable
for parent nvpair.
If we succeeded to allocate nvlist, but failed to allocated nvpair we
would leak nvls[ii] on return. Destroy it when we cannot allocate nvpair,
before we goto fail.
Submitted by: pjd@ and oshogbo@ (minor changes)
Found by: scan-build
Sponsored by: Wheel Systems
oshogbo [Sat, 21 Oct 2017 19:30:33 +0000 (19:30 +0000)]
MFC r323852:
The 'while (array != NULL) { }' suggests scan-build that array may be
initially NULL, which is not possible. Change the loop to
'do {} while (array != NULL)' to satisfy scan-build and assert that
array really cannot be NULL just in case.
Submitted by: pjd@
Found by: scan-build
MFC after: 1 month
Sponsored by: Wheel Systems
hselasky [Fri, 20 Oct 2017 10:04:43 +0000 (10:04 +0000)]
MFC r324445:
When showing the sleepqueues from the in-kernel debugger,
properly dump all the sleepqueues and not just the first one
History:
It appears that in the commit which introduced the code,
r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
were interchanged. In r180927 "td_name[i]" was corrected to
"td_name[0]", but "sq_blocked[0]" was left unchanged.
davidcs [Thu, 19 Oct 2017 17:28:09 +0000 (17:28 +0000)]
MFC r324535
Add sanity checks in ql_hw_send() qla_send() to ensure that empty slots
in Tx Ring map to empty slot in Tx_buf array before Transmits. If the
checks fail further Transmission on that Tx Ring is prevented.
ian [Thu, 19 Oct 2017 17:26:26 +0000 (17:26 +0000)]
MFC r323997-r323998
r323997:
Fix handling of uncaught exceptions in a std::terminate() handler on arm.
When raising an exception, the unwinder searches for a catch handler and if
none is found it should invoke std::terminate() with the uncaught exception
as the "current" exception. Before this change, the terminate handler was
invoked with no exception as current (abi::__cxa_current_exception_type()
returned NULL), because the return value from the unwinder indicated an
internal failure in unwinding. It turns out that was because all errors
from get_eit_entry() were translated to _URC_FAILURE. Now the error is
returned untranslated, which allows _URC_END_OF_STACK to percolate upwards
to throw_exception() in libcxxrt. When it sees that return status it
properly calls std::terminate() with the uncaught exception installed
as the current exception, allowing custom terminate handlers to work
with it.
r323998:
Fix the return value from _Unwind_Backtrace() on arm.
If unwinding stops due to hitting the end of the call chain, the return
value is supposed to be _URC_END_OF_STACK; other values indicate internal
errors. The return value from get_eit_entry() is now returned without
translating it to _URC_FAILURE, so that callers can see _URC_END_OF_STACK
when it happens.
ian [Thu, 19 Oct 2017 16:07:57 +0000 (16:07 +0000)]
MFC r323392:
Add gpio methods to read/write/configure up to 32 pins simultaneously.
Sometimes it is necessary to combine several gpio pins into an ad-hoc bus
and manipulate the pins as a group. In such cases manipulating the pins
individualy is not an option, because the value on the "bus" assumes
potentially-invalid intermediate values as each pin is changed in turn. Note
that the "bus" may be something as simple as a bi-color LED where changing
colors requires changing both gpio pins at once, or something as complex as
a bitbanged multiplexed address/data bus connected to a microcontroller.
In addition to the absolute requirement of simultaneously changing the
output values of driven pins, a desirable feature of these new methods is to
provide a higher-performance mechanism for reading and writing multiple
pins, especially from userland where pin-at-a-time access incurs a noticible
syscall time penalty.
These new interfaces are NOT intended to abstract away all the ugly details
of how gpio is implemented on any given platform. In fact, to use these
properly you absolutely must know something about how the gpio hardware is
organized. Typically there are "banks" of gpio pins controlled by registers
which group several pins together. A bank may be as small as 2 pins or as
big as "all the pins on the device, hundreds of them." In the latter case, a
driver might support this interface by allowing access to any 32 adjacent
pins within the overall collection. Or, more likely, any 32 adjacent pins
starting at any multiple of 32. Whatever the hardware restrictions may be,
you would need to understand them to use this interface.
In additional to defining the interfaces, two example implementations are
included here, for imx5/6, and allwinner. These represent the two primary
types of gpio hardware drivers. imx6 has multiple gpio devices, each
implementing a single bank of 32 pins. Allwinner implements a single large
gpio number space from 1-n pins, and the driver internally translates that
linear number space to a bank+pin scheme based on how the pins are grouped
into control registers. The allwinner implementation imposes the restriction
that the first_pin argument to the new functions must always be pin 0 of a
bank.
gordon [Tue, 17 Oct 2017 17:30:18 +0000 (17:30 +0000)]
MFC r324696: Update wpa_supplicant/hostapd for 2017-01 vulnerability release.
hostapd: Avoid key reinstallation in FT handshake
Prevent reinstallation of an already in-use group key
Extend protection of GTK/IGTK reinstallation of WNM-Sleep Mode cases
Fix TK configuration to the driver in EAPOL-Key 3/4 retry case
Prevent installation of an all-zero TK
Fix PTK rekeying to generate a new ANonce
TDLS: Reject TPK-TK reconfiguration
WNM: Ignore Key Data in WNM Sleep Mode Response frame if no PMF in use
WNM: Ignore WNM-Sleep Mode Response if WNM-Sleep Mode has not been used
WNM: Ignore WNM-Sleep Mode Response without pending request
FT: Do not allow multiple Reassociation Response frames
TDLS: Ignore incoming TDLS Setup Response retries
ngie [Tue, 17 Oct 2017 15:52:02 +0000 (15:52 +0000)]
MFC r324478:
Check the exit code from fsck_ffs instead of relying on MODIFIED being in the output
^/head@r323923 changed when MODIFIED is printed at exit. It's better to follow the
documented way of determining whether or not a filesystem is clean per fsck_ffs, i.e.,
ensure that the exit code is either 0 or 7.
The pass/fail determination is brittle prior to this commit, and ^/head@r323923 made
the issue apparent -- thus this needs to be fixed independent of ^/head@r323923.
jhb [Tue, 17 Oct 2017 12:45:51 +0000 (12:45 +0000)]
MFC 323579,323585: Add AT_HWCAP and AT_EHDRFLAGS on all platforms.
To preserve KBI on stable/11, a new SV_HWCAP flag is added which
indicates if the sv_hwcap field is present and valid to avoid examining
the field in old modules. Only sysentvec's which wish to use sv_hwcap
need to set the flag in stable/11.
323579:
Add AT_HWCAP and AT_EHDRFLAGS on all platforms.
A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'. A
process ABI can set this field to point to a value holding a mask of
architecture-specific CPU feature flags. If an ABI does not wish to
supply AT_HWCAP to processes the field can be left as NULL.
The support code for AT_EHDRFLAGS was already present on all systems,
just the #define was not present. This is a step towards unifying the
AT_* constants across platforms.
323585:
Add AT_EHDRFLAGS and AT_HWCAP on amd64.
x86 has two separate (but identical) list of AT_* constants and the
earlier commit to add AT_HWCAP only updated the i386 list.
tuexen [Tue, 17 Oct 2017 12:42:17 +0000 (12:42 +0000)]
MFC r322648:
Ensure inp_vflag is consistently set for TCP endpoints.
Make sure that the flags INP_IPV4 and INP_IPV6 are consistently set
for inpcbs used for TCP sockets, no matter if the setting is derived
from the net.inet6.ip6.v6only sysctl or the IPV6_V6ONLY socket option.
For UDP this was already done right.
brooks [Sat, 14 Oct 2017 16:23:25 +0000 (16:23 +0000)]
MFC r324243:
Remove an unneeded and incorrect memset().
On Variant I TLS architectures (aarch64, arm, mips, powerpc, and riscv)
the __libc_allocate_tls function allocates thread local storage memory
with calloc(). It then copies initialization data over the portions with
non-zero initial values. Before this change it would then pointlessly
zero the already zeroed remainder of the storage. Unfortunately the
calculation was wrong and it would zero TLS_TCB_SIZE (2*sizeof(void *))
additional bytes.
In practice, this overflow only matters if the TLS segment is sized such
that calloc() allocates less than TLS_TCB_SIZE extra memory. Even
then, the likely result will be zeroing part of the next bucket. This
coupled with the impact being confined to Tier II platforms means there
will be no security advisory for this issue.
jhb [Fri, 13 Oct 2017 22:40:57 +0000 (22:40 +0000)]
MFC 324039: Don't defer wakeup()s for completed journal workitems.
Normally wakeups() are performed for completed softupdates work items
in workitem_free() before the underlying memory is free()'d.
complete_jseg() was clearing the "wakeup needed" flag in work items to
defer the wakeup until the end of each loop iteration. However, this
resulted in the item being free'd before it's address was used with
wakeup(). As a result, another part of the kernel could allocate this
memory from malloc() and use it as a wait channel for a different
"event" with a different lock. This triggered an assertion failure
when the lock passed to sleepq_add() did not match the existing lock
associated with the sleep queue. Fix this by removing the code to
defer the wakeup in complete_jseg() allowing the wakeup to occur
slightly earlier in workitem_free() before free() is called.
jhb [Fri, 13 Oct 2017 17:11:08 +0000 (17:11 +0000)]
MFC 324072: Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type. This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.
sephe [Fri, 13 Oct 2017 05:09:56 +0000 (05:09 +0000)]
MFC 324489,324516
324489
hyperv/hn: Workaround erroneous hash type observed on WS2016.
Background:
- UDP 4-tuple hash type is unconditionally enabled in Hyper-V on WS2016,
which is _not_ affected by NDIS_OBJTYPE_RSS_PARAMS.
- Non-fragment UDP/IPv4 datagrams' hash type is delivered to VM as
TCP_IPV4.
Currently this erroneous behavior only applies to WS2016/Windows10.
Force l3/l4 protocol check, if the RXed packet's hash type is TCP_IPV4,
and the Hyper-V is running on WS2016/Windows10. If the RXed packet is
UDP datagram, adjust mbuf hash type to UDP_IPV4.
Sponsored by: Microsoft
324516
hyperv/hn: Workaround erroneous hash type observed on WS2016 for VF.
hselasky [Thu, 12 Oct 2017 08:27:57 +0000 (08:27 +0000)]
MFC r324320:
Add support for new cuse(3) error code, CUSE_ERR_NO_DEVICE.
This error code is useful when emulating Linux input event
devices from userspace.
rmacklem [Wed, 11 Oct 2017 13:20:24 +0000 (13:20 +0000)]
MFC: r324074
Fix a memory leak that occurred in the pNFS client.
When a "pnfs" NFSv4.1 mount was unmounted, it didn't free up the layouts
and deviceinfo structures. This leak only affects "pnfs" mounts and only
when the mount is umounted.
Found while testing the pNFS Flexible File layout client code.