markj [Fri, 8 Mar 2019 21:07:08 +0000 (21:07 +0000)]
Have pthread_cond_destroy() return EBUSY if the condvar has waiters.
This is not required of a compliant implementation, but it's easy to
check for and helps improve compatibility with other common
implementations. Moreover, it's consistent with our
pthread_mutex_destroy().
PR: 234805
Reviewed by: jhb, kib, ngie
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19496
mav [Fri, 8 Mar 2019 19:38:52 +0000 (19:38 +0000)]
Add separate aggregation limit for non-rotating media.
Before sequential scrub patches ZFS never aggregated I/Os above 128KB.
Sequential scrub bumped that to 1MB, which motivation I understand for
spinning disks, since it should reduce number of head seeks. But for
SSDs it makes much less sense to me, especially on FreeBSD, where due
to MAXPHYS limitation device will likely still see bunch of 128KB I/Os
instead of one large. Having more strict aggregation limit allows to
avoid allocation of large memory buffer and memcpy to/from it, that is
a serious problem when bandwidth reaches few GB/s.
bcr [Fri, 8 Mar 2019 18:58:41 +0000 (18:58 +0000)]
Update members of doceng in the chart.
Update the members of the doceng team to include ryusuke@ and me.
Sort and reduce number of line breaks so that our bubble in the
chart is a bit more compact (similar to portmgr next to it).
Update the bounds checking for zfs_vdev_aggregation_limit so that
it has a floor of zero and a maximum value of the supported block
size for the pool.
Additionally add an early return when zfs_vdev_aggregation_limit
equals zero to disable aggregation. For very fast solid state or
memory devices it may be more expensive to perform the aggregation
than to issue the IO immediately.
Commit 8542ef8 allowed optional IOs to be aggregated beyond
the specified aggregation limit. Since the aggregation limit
was also used to enforce the maximum block size, setting
`zfs_vdev_aggregation_limit=16777216` could result in an
attempt to allocate an ABD larger than 16M.
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6259
Closes #6270
zfsonlinux/zfs@2d678f779aba26a93314c8ee1142c3985fa25cb6
r343295 broke DIOCGETSRCNODES by failing to reset 'nr' after counting the
number of source tracking nodes.
This meant that we never copied the information to userspace, leading to '? ->
?' output from pfctl.
hselasky [Fri, 8 Mar 2019 09:18:29 +0000 (09:18 +0000)]
Teardown ifnet after stopping port in the mlx4en(4) driver.
mlx4_en_stop_port() calls mlx4_en_put_qp() which can refer the link level
address of the network interface, which in turn will be freed by the
network interface detach function. Make sure the port is stopped
before detaching the network interface.
jhibbits [Fri, 8 Mar 2019 04:20:33 +0000 (04:20 +0000)]
powerpc64: Fix early exit with invalid kernel SLB entries
The check for early exit should be checking the SLB entry itself. As
currently written it was checking the address of the SLB, which is always
non-zero, so would go through the kernel SR restore loop regardless.
jhibbits [Fri, 8 Mar 2019 03:59:53 +0000 (03:59 +0000)]
powerpc: Fix cpufreq statement scoping
The second statements on the lines are not guarded by the `if' condition.
This triggers a warning with newer gcc. It's relatively harmless given the
usage, but incorrect. Instead, wrap the statements so they're properly
guarded.
cem [Fri, 8 Mar 2019 01:17:20 +0000 (01:17 +0000)]
Fortuna: Add Chacha20 as an alternative stream cipher
Chacha20 with a 256 bit key and 128 bit counter size is a good match for an
AES256-ICM replacement.
In userspace, Chacha20 is typically marginally slower than AES-ICM on
machines with AESNI intrinsics, but typically much faster than AES on
machines without special intrinsics. ChaCha20 does well on typical modern
architectures with SIMD instructions, which includes most types of machines
FreeBSD runs on.
In the kernel, we can't (or don't) make use of AESNI intrinsics for
random(4) anyway. So even on amd64, using Chacha provides a modest
performance improvement in random device throughput today.
This change makes the stream cipher used by random(4) configurable at boot
time with the 'kern.random.use_chacha20_cipher' tunable.
Very rough, non-scientific measurements at the /dev/random device, on a
GENERIC-NODEBUG amd64 VM with 'pv', show a factor of 2.2x higher throughput
for Chacha20 over the existing AES-ICM mode.
When we roam between networks and our link-state goes down, automatically remove
the IPv6-Only flag from the interface. Otherwise we might switch from an
IPv6-only to and IPv4-only network and the flag would stay and we would prevent
IPv4 from working.
While the actual function call to clear the flag is under EXPERIMENTAL,
the eventhandler is not as we might want to re-use it for other
functionality on link-down event (such was re-calculate default routers
for example if there is more than one).
mav [Thu, 7 Mar 2019 22:56:39 +0000 (22:56 +0000)]
Improve entropy for ZFS taskqueue selection.
I just found that at least on Skylake CPUs cpu_ticks() never returns odd
values, only even, and possibly has even bigger step (176/2?), that makes
its lower bits very bad entropy source, leaving half of taskqueues unused.
Switch to sbinuptime(), closer to upstreams, mitigates the problem by the
rate conversion working as kind of hash function. In case that is somehow
not enough (timer rate is too low or too divisible) mix in curcpu.
dim [Thu, 7 Mar 2019 19:33:39 +0000 (19:33 +0000)]
Pull in r354937 from upstream clang trunk (by Jörg Sonnenberger):
Fix inline assembler constraint validation
The current constraint logic is both too lax and too strict. It fails
for input outside the [INT_MIN..INT_MAX] range, but it also
implicitly accepts 0 as value when it should not. Adjust logic to
handle both correctly.
Pull in r355491 from upstream clang trunk (by Hans Wennborg):
Inline asm constraints: allow ICE-like pointers for the "n"
constraint (PR40890)
Apparently GCC allows this, and there's code relying on it (see bug).
The idea is to allow expression that would have been allowed if they
were cast to int. So I based the code on how such a cast would be
done (the CK_PointerToIntegral case in
IntExprEvaluator::VisitCastExpr()).
These should fix assertions and errors when using the inline assembly
"n" constraint in certain ways.
In case of devel/valgrind, a pointer was used as the input for the
constraint, which lead to "Assertion failed: (isInt() && "Invalid
accessor"), function getInt".
In case of math/secp256k1, a very large integer value was used as input
for the constraint, which lead to "error: value '4624529908474429119'
out of range for constraint 'n'".
Make the tests run slightly faster by having pft_ping.py end the capture
of packets as soon as it sees the expected packet, rather than
continuing to sniff.
0mp [Thu, 7 Mar 2019 11:09:25 +0000 (11:09 +0000)]
Do not reference deskutils/cal from cal.1.
The ports version of cal is an abandonware so in order to minimize the
potential bit rot of our documentation let's not mention it at all.
Interested users are going to find suitable alternatives anyway on their
own.
tuexen [Thu, 7 Mar 2019 08:43:20 +0000 (08:43 +0000)]
After removing an entry from the stream scheduler list, set the pointers
to NULL, since we are checking for it in case the element gets inserted
again.
Fix the problem with O_LIMIT states introduced in r344018.
dyn_install_state() uses `rule` pointer when it creates state.
For O_LIMIT states this pointer actually is not struct ip_fw,
it is pointer to O_LIMIT_PARENT state, that keeps actual pointer
to ip_fw parent rule. Thus we need to cache rule id and number
before calling dyn_get_parent_state(), so we can use them later
when the `rule` pointer is overrided.
cem [Thu, 7 Mar 2019 00:55:49 +0000 (00:55 +0000)]
fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf
On GENERIC kernels with empty loader.conf, there is no functional change.
DFLTPHYS and MAXBSIZE are both 64kB at the moment. This change allows
larger bufcache block sizes to be used when either MAXBSIZE (custom kernel)
or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher
than the default.
All changes are hidden behind the EXPERIMENTAL option and are not compiled
in by default.
Add ND6_IFF_IPV6_ONLY_MANUAL to be able to set the interface into no-IPv4-mode
manually without router advertisement options. This will allow developers to
test software for the appropriate behaviour even on dual-stack networks or
IPv6-Only networks without the option being set in RA messages.
Update ifconfig to allow setting and displaying the flag.
Update the checks for the filters to check for either the automatic or the manual
flag to be set. Add REVARP to the list of filtered IPv4-related protocols and add
an input filter similar to the output filter.
Add a check, when receiving the IPv6-Only RA flag to see if the receiving
interface has any IPv4 configured. If it does, ignore the IPv6-Only flag.
Add a per-VNET global sysctl, which is on by default, to not process the automatic
RA IPv6-Only flag. This way an administrator (if this is compiled in) has control
over the behaviour in case the node still relies on IPv4.
cem [Wed, 6 Mar 2019 22:56:49 +0000 (22:56 +0000)]
FUSE: Prevent trivial panic
When open(2) was invoked against a FUSE filesystem with an unexpected flags
value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic.
For now, prevent the panic by rejecting such VOP_OPENs with EINVAL.
This is not considered the correct long term fix, but does prevent an
unprivileged denial-of-service.
dim [Wed, 6 Mar 2019 18:19:27 +0000 (18:19 +0000)]
Put in a temporary workaround for what is likely a gcc 6 bug (it does
not occur with gcc 7 or later). This should prevent the following error
from breaking the head-amd64-gcc CI builds:
In file included from /workspace/src/contrib/llvm/tools/lldb/source/API/SBMemoryRegionInfo.cpp:14:0:
/workspace/src/contrib/llvm/tools/lldb/include/lldb/Target/MemoryRegionInfo.h:128:54: error: 'template<class _InputIterator> lldb_private::MemoryRegionInfos::MemoryRegionInfos(_InputIterator, _InputIterator, const allocator_type&)' inherited from 'std::__1::vector<lldb_private::MemoryRegionInfo>'
using std::vector<lldb_private::MemoryRegionInfo>::vector;
^~~~~~
/workspace/src/contrib/llvm/tools/lldb/include/lldb/Target/MemoryRegionInfo.h:128:54: error: conflicts with version inherited from 'std::__1::vector<lldb_private::MemoryRegionInfo>'
adrian [Wed, 6 Mar 2019 07:58:19 +0000 (07:58 +0000)]
[athani] Add a simple tool to list and control ANI parameters.
This is a WIP tool I'm using to figure out why ANI is weirdly busted in my
home FreeBSD AP/STA setup. Although athstats (mostly) gets the ANI statistics
correct, ANI is making the radio deaf it doesn't recover without being disabled.
adrian [Wed, 6 Mar 2019 07:54:29 +0000 (07:54 +0000)]
[ath_hal] [ath_hal_ar9300] ANI fixes and preparation for userland control.
* The ani function bitmap was being badly used when determining if a command
could be used. In hostap modes only a couple of the ANI control parameters
are enabled.
* The ani function bitmap was not being reset to HAL_ANI_ALL if transitioning
from AP -> STA.
* Change mrcCckOff to mrcCck - 1 == on, rather than 1 == off. This matches
the API used to set the value from userland via the diagnostic API.
* Handle OFDM/CCK noise immunity level commands in ar9300_ani_control().
These will only come from userland and it will go and program the rest of
the ANI control parameters with the values in the ANI table.
* Ensure all of the ANI parameters can be tweaked at runtime, even if they're
disabled.
Extend libsecureboot(old libve) to obtain trusted certificates from UEFI and implement revocation
UEFI related headers were copied from edk2.
A new build option "MK_LOADER_EFI_SECUREBOOT" was added to allow
loading of trusted anchors from UEFI.
Certificate revocation support is also introduced.
The forbidden certificates are loaded from dbx variable.
Verification fails in two cases:
There is a direct match between cert in dbx and the one in the chain.
The CA used to sign the chain is found in dbx.
One can also insert a hash of TBS section of a certificate into dbx.
In this case verifications fails only if a direct match with a
certificate in chain is found.
bcran [Wed, 6 Mar 2019 05:39:40 +0000 (05:39 +0000)]
Add retry loop around GetMemoryMap call to fix fragmentation bug
The call to BS->AllocatePages can cause the memory map to become framented,
causing BS->GetMemoryMap to return EFI_BUFFER_TOO_SMALL more than once. For
example this can happen on the MinnowBoard Turbot, causing the boot to stop
with an error. Avoid this by calling GetMemoryMap in a loop.
markj [Tue, 5 Mar 2019 19:45:37 +0000 (19:45 +0000)]
Show wiring state of map entries in procstat -v.
Note that only entries wired by userspace are shown as such. In
particular, entries transiently wired by sysctl_wire_old_buffer() are
not flagged as wired in procstat -v output.
erj [Tue, 5 Mar 2019 19:12:51 +0000 (19:12 +0000)]
Remove references to CONTIGMALLOC_WORKS in iflib and em
From Jake:
"The iflib_fl_setup() function tries to pick various buffer sizes based
on the max_frame_size value defined by the parent driver. However, this
code was wrapped under CONTIGMALLOC_WORKS, which was never actually
defined anywhere.
This same code pattern was used in if_em.c, likely trying to match
what iflib uses.
Since CONTIGMALLOC_WORKS is not defined, remove this dead code from
iflib_fl_setup and if_em.c
Given that various iflib drivers appear to be using a similar
calculation, it might be worth making this buffer size a value that the
driver can peek at in the future."
The if_tun cloner is not virtualised, but if_clone_attach() does use a
virtualised list of cloners.
The result is that we can't find the if_tun cloner when we try to remove
a renamed tun interface. Virtualise the cloner, and move the final
cleanup into a sysuninit so that we're sure this happens after all of
the vnet_sysuninits
Note that we need unit numbers to be system-unique (rather than unique
per vnet, as is done by if_clone_simple()). The unit number is used to
create the corresponding /dev/tunX device node, and this node must match
with the interface.
Switch to if_clone_advanced() so that we have control over the unit
numbers.
Reproduction scenario:
jail -c -n foo persist vnet
jexec test ifconfig tun create
jexec test ifconfig tun0 name wg0
jexec test ifconfig wg0 destroy
jhibbits [Tue, 5 Mar 2019 04:16:50 +0000 (04:16 +0000)]
Fix binutils compilation error with Clang 8
Summary:
This change fixes the following compilation error when using clang 8 to cross
compile base to powerpc64:
```
/usr/src/gnu/usr.bin/binutils/libopcodes/../../../../contrib/binutils/opcodes/ppc-dis.c:100:35:
error: arithmetic on a null pointer treated as a cast from integer to pointer is
a GNU extension [-Werror,-Wnull-pointer-arithmetic]
info->private_data = (char *) 0 + dialect;
~~~~~~~~~~ ^
1 error generated.
*** [ppc-dis.o] Error code 1
make[6]: stopped in /usr/src/gnu/usr.bin/binutils/libopcodes
1 error
```
Test Plan:
- buildworld for x86_64 (native)
- buildworld for powerpc64 (cross)
- buildworld for powerpc64 (native)
marcel [Tue, 5 Mar 2019 04:15:34 +0000 (04:15 +0000)]
Revert revision 254095
In revision 254095, gpt_entries is not set to match the on-disk
hdr_entries, but rather is computed based on available space.
There are 2 problems with this:
1. The GPT backend respects hdr_entries and only reads and writes
that number of partition entries. On top of that, CRC32 is
computed over the table that has hdr_entries elements. When
the common code works on what is possibly a larger number, the
behaviour becomes inconsistent and problematic. In particular,
it would be possible to add a new partition that on a reboot
isn't there anymore.
2. The calculation of gpt_entries is based on flawed assumptions.
The GPT specification does not dictate that sectors are layed
out in a particular way that the available space can be
determined by looking at LBAs. In practice, implementations
do the same thing, because there's no reason to do it any
other way. Still, GPT allows certain freedoms that can be
exploited in some form or shape if the need arises.
dim [Mon, 4 Mar 2019 19:39:59 +0000 (19:39 +0000)]
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
the upstream release_80 branch r355313 (effectively, 8.0.0 rc3). The
release will follow very soon, but no more functional changes are
expected.
Release notes for llvm, clang and lld 8.0.0 will soon be available here:
<https://releases.llvm.org/8.0.0/docs/ReleaseNotes.html>
<https://releases.llvm.org/8.0.0/tools/clang/docs/ReleaseNotes.html>
<https://releases.llvm.org/8.0.0/tools/lld/docs/ReleaseNotes.html>
tests: Move common (vnet) test functions into a common file
The netipsec and pf tests have a number of common test functions. These
used to be duplicated, but it makes more sense for them to re-use the
common functions.
fsu [Mon, 4 Mar 2019 10:42:25 +0000 (10:42 +0000)]
Make superblock reading logic more strict.
Add more on-disk superblock consistency checks to ext2_compute_sb_data() function.
It should decrease the probability of mounting filesystems with corrupted superblock data.
avos [Mon, 4 Mar 2019 03:02:14 +0000 (03:02 +0000)]
rtwn_usb(4): fix Tx instability with RTL8192CU chipsets
- Fix data frames transmission via POWER_STATUS register setup -
it seems to be set by MACID_CONFIG firmware command, which was broken*
in r290439 and later disabled in r307529.
We can re-enable it later if / when firmware rate adaptation will be
ready; however, this step will be required anyway - for firmware-less
builds.
- Force RTS / CTS protection frame rate to CCK1 (this rate works fine
without any additional setup; no better workaround is known yet).
The problem was not observed on the channel 1 or with CCK1 rate enforced
('ifconfig wlan0 ucastrate 1' for 11 b/g; not possible for 11n networks
due to ifconfig(8) bug).
* I'm not sure if it works before r290439 because - AFAIR - I never seen
firmware rate adaptation working for 10-STABLE urtwn(4)
(It needs EN_BCN bit set and RSSI updates at least).
Tested with RTL8188CUS in STA mode
(in regular mode and with disabled MRR - DARFRC*8 is set to 0)
mav [Mon, 4 Mar 2019 00:49:07 +0000 (00:49 +0000)]
Reduce CTL threads priority to about PUSER.
Since in most configurations CTL serves as network service, we found
that this change improves local system interactivity under heavy load.
Priority of main threads is set slightly higher then worker taskqueues
to make them quickly sort incoming requests not creating bottlenecks,
while plenty of worker taskqueues should be less sensitive to latency.
tuexen [Sun, 3 Mar 2019 19:55:06 +0000 (19:55 +0000)]
Allocate an assocition id and register the stcb with holding the lock.
This avoids a race where stcbs can be found, which are not completely
initialized.
glebius [Sun, 3 Mar 2019 18:57:48 +0000 (18:57 +0000)]
Remove bogus assert that I added in r319722. It is a legitimate case
to call soabort() on a newborn socket created by sonewconn() in case
if further setup of PCB failed. Code in sofree() handles such socket
correctly.