https://www.illumos.org/issues/7280
zdb is very handy for diagnosing problems with a pool in a safe and
quick way. When a pool is in a bad shape, we often want to disable some
fail-safes, or adjust some tunables in order to open them. In the
kernel, this is done by changing public variables in mdb. The goal of
this feature is to add the same capability to zdb and ztest, so that
they can change libzpool tuneables from the command line.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
https://www.illumos.org/issues/8473
Scrubbing is supposed to detect and repair all errors in the pool. However,
it wrongly ignores active spare devices. The problem can easily be
reproduced in OpenZFS at git rev 0ef125d with these commands:
truncate -s 64m /tmp/a /tmp/b /tmp/c
sudo zpool create testpool mirror /tmp/a /tmp/b spare /tmp/c
sudo zpool replace testpool /tmp/a /tmp/c
/bin/dd if=/dev/zero bs=1024k count=63 oseek=1 conv=notrunc of=/tmp/c
sync
sudo zpool scrub testpool
zpool status testpool # Will show 0 errors, which is wrong
sudo zpool offline testpool /tmp/a
sudo zpool scrub testpool
zpool status testpool # Will show errors on /tmp/c,
# which should've already been fixed
FreeBSD head is partially affected: the first scrub will detect some errors, but the second scrub will detect more.
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
asomers [Tue, 28 Nov 2017 18:18:39 +0000 (18:18 +0000)]
MFC r323275, r324112
r323275:
Add basic tests for chflags, mkdir, rcp, and rmdir
Add basic command line parsing test coverage for these utilities. The tests
were automatically generated based on their man pages. These tests can be
expanded by hand for more thorough coverage. The aim is to generate very
basic amount of test coverage for all the utilities in the base system.
asomers [Tue, 28 Nov 2017 17:33:10 +0000 (17:33 +0000)]
MFC r323194:
Fix remounting ZFS filesystem with "zfs mount"
"zfs mount -o" passes a list of mount options directly to nmount(2) after
sanity checking them. In particular, zfs(8) will refuse to mount an already
existing file system unless "remount" is specified in the option list.
However, the "remount" option only exists in Illumos. FreeBSD's equivalent is
"update".
asomers [Tue, 28 Nov 2017 17:30:25 +0000 (17:30 +0000)]
MFC r323193:
Honor all options of "zfs mount -o"
The existing code in zmount incorrectly parses the comma-delimited option
string. The result is that nmount only honors the last option. AFAICT the
parsing has been broken ever since ZFS's initial import in change 168404.
brooks [Tue, 28 Nov 2017 17:20:53 +0000 (17:20 +0000)]
MFC r301679 (partial), r309626, r326307
r301679:
Update to a June 8th snapshot of (un)vis form NetBSD.
This adds stravis() and some new encoding flags VIS_SHELL, VIS_META,
and VIS_NOLOCALE.
Assorted cleanups and fixes includeing a manpage typo[0].
NOTE: The symbol for stravis() is not exported in this merge.
r309626:
strvis(3): Avoid internal state of multibyte functions being tainted.
The mbtoc(3) and wctomb(3) functions use internal state which may be
tainted before the call to strvis(3). In this context we can just use
the thread-safe versions mbrtoc(3) and wcrtomb(3) which allow passing
our own state from our stack.
r326307:
Update vis(3) the latest from NetBSD.
This adds VIS_DQ for compatiblity with OpenBSD.
Correct by an off-by-one error and a read buffer overflow detected using
asan.
gjb [Mon, 27 Nov 2017 15:12:14 +0000 (15:12 +0000)]
MFC r326068:
Remove /etc/resolv.conf from virtual machine images, which is
copied from the build host. It is renamed to /etc/resolv.conf.bak
on boot, so never used anyway.
delphij [Wed, 22 Nov 2017 06:36:55 +0000 (06:36 +0000)]
MFC r325532: Update arcmsr(4) to 1.40.00.01:
- Fix clear doorbell queue buffer for ADAPTER_TYPE_B
- Fix release memory resource when detach device
- Add support for ARC-1216, 1226 SAS 12Gb controllers
- Declare some functions as static
- Change checking dword read/write for IOP rqbuffer.
Many thanks to Areca for continuing to support FreeBSD.
gjb [Mon, 20 Nov 2017 15:46:23 +0000 (15:46 +0000)]
MFC r325863:
Only copy /etc/resolv.conf to ${CHROOTDIR} if /etc/resolv.conf does
not already exist within ${CHROOTDIR}. This allows re-using a build
chroot with CHROOTBUILD_SKIP set to a non-empty value and CHROOTDIR
set to '/' in release.conf.
eugen [Mon, 20 Nov 2017 09:24:01 +0000 (09:24 +0000)]
MFC r325436: RTF_PINNED for an interface
Allow a process to assign an IP address to local ppp interface
even if kernel routing table already has a route to the address in question
installed by some routing daemon (PR 223129).
Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647).
delphij [Mon, 20 Nov 2017 06:49:05 +0000 (06:49 +0000)]
MFC r325383:
Avoid calling get_controller_count() until attaching, this would avoid
costly PCI config space operations that slows down systems without the
hardware.
Many thanks to HighPoint for continued support of FreeBSD!
hselasky [Fri, 17 Nov 2017 15:46:45 +0000 (15:46 +0000)]
MFC r325615:
Make sure the IPv6 scope ID gets zeroed when exchanging CMA messages in ibcore.
Else the IPv6 address matching might fail. This change adds support for both
embedded and non-embedded IPv6 scope IDs when passing a IPv6 link-local socket
address to RDMA. Prior to this change only global IPv6 addresses would work
with RDMA.
hselasky [Fri, 17 Nov 2017 15:43:29 +0000 (15:43 +0000)]
MFC r325614:
Multiple fixes for using IPv6 link-local addresses with RDMA.
1) Fail to resolve RDMA address if rtalloc1() returns the loopback
device, lo0, as the gateway interface.
2) Use ip_dev_find() and ip6_dev_find() to lookup network interfaces
with matching IPv4 and IPv6 addresses, respectivly.
3) In addr_resolve() make sure the "ifa" pointer is always set, also when
the "ifp" is NULL. Else a NULL pointer access might happen trying to
read from the "ifa" pointer later on.
avg [Thu, 16 Nov 2017 23:36:19 +0000 (23:36 +0000)]
MFC r325035: MFV r325013,r325034: 640 number_to_scaled_string is duplicated in several commands
FreeBSD note: of all libcmdutils functionality ZFS (and other illumos
contrib code) currently uses only nicenum() function (which is similar
to humanize_number but has some formatting differences). For this
reason I decided to not port the whole library. As a result, nicenum.c
from libcmdutils is compiled into libzfs and libzpool. This is a bit
ugly, but works. If one day we are forced to create libillumos, then
the file should be moved to that library.
jhb [Thu, 16 Nov 2017 18:22:03 +0000 (18:22 +0000)]
MFC 325039: Rework pass through changes in r305485 to be safer.
Specifically, devices that do not support PCI-e FLR and were not
gracefully shutdown by the guest OS could continue to issue DMA
requests after the VM was terminated. The changes in r305485 meant
that those DMA requests were completed against the host's memory which
could result in random memory corruption. Instead, leave ppt devices
that are not attached to a VM disabled in the IOMMU and only restore
the devices to the host domain if the ppt(4) driver is detached from a
device.
As an added safety belt, disable busmastering for a pass-through device
when before adding it to the host domain during ppt(4) detach.
gjb [Thu, 16 Nov 2017 16:00:01 +0000 (16:00 +0000)]
MFC r320252, r320686, r325769:
r320252:
In release/release.sh:
- Rename chroot_arm_armv6_build_release() to chroot_arm_build_release()
and make it hardware agnostic (such as armv6 -vs- armv7 -vs- arm64).
- Evaluate EMBEDDED_TARGET differently so release/tools/arm.subr can
be used for arm/armv6 and arm64/aarch64.
- Update comments and copyright.
In release/tools/arm.subr:
- In arm_create_disk(), change the default alignment from 63 to 512k,
fixing a boot issue on arm64 and EFI. [1]
- Update comments and copyright.
r320686:
Fix the ftp-stage target by loosening the constraints on the TARGET
and TARGET_ARCH variables.
r325769:
Update the GUMSTIX image build to use arm/arm TARGET/TARGET_ARCH.
Update the TARGET/TARGET_ARCH matching in release/release.sh and
release/Makefile.mirrors for simplification.
Note: The RPI3.conf addition from r320252 is not included, as it is
not supported on 10-STABLE. Additionally, arm64/aarch64 changes are
also excluded from this commit.
jamie [Mon, 13 Nov 2017 23:21:17 +0000 (23:21 +0000)]
MFC r297935:
Separate POSIX sem/shm objects in jails, by prepending the jail's path
name to the object's "path". While the objects don't have real path
names, it's a filesystem-like namespace, which allows jails to be
kept to their own space, but still allows the system / jail parent to
access a jail's IPC.
MFC r297936:
Separate POSIX mqueue objects in jails; actually, separate them by the
jail's root, so jails that don't have their own filesystem directory
also won't have their own mqueue namespace.
MFC r297976:
Clean up some style(9) violations.
MFC r298567:
Use the new PR_METHOD_REMOVE to clean up jail handling in POSIX
message queues.
truckman [Sun, 12 Nov 2017 01:28:20 +0000 (01:28 +0000)]
MFC r325008
Fix Dummynet AQM packet marking function ecn_mark() and fq_codel /
fq_pie schedulers packet classification functions in layer2 (bridge mode).
Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie
schedulers packet classification functions (fq_codel_classify_flow()
and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP)
packet. However, this assumption is incorrect if ipfw/dummynet is
used to manage layer2 traffic (bridge mode) since mbuf will point
at L2 frame. This patch solves this problem by identifying the
source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN
offset when converting an mbuf pointer to ip pointer if the traffic
is from layer2. More specifically, in dummynet packet tagging
function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the
traffic is from layer2 and set to zero otherwise. Whenever an access
to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is
used instead of mtod(m, struct ip *) to correctly convert mbuf
pointer to ip pointer in both L2 and L3 traffic.
hselasky [Thu, 9 Nov 2017 19:15:28 +0000 (19:15 +0000)]
MFC r325278:
Unconditionally include "opt_inet6.h" in the LinuxKPI.
This makes sure the INET6 macro gets properly defined,
also for kernel module builds.
hselasky [Thu, 9 Nov 2017 19:00:11 +0000 (19:00 +0000)]
MFC r324792:
The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore
hselasky [Thu, 9 Nov 2017 17:02:20 +0000 (17:02 +0000)]
Use MAC-based GID format for the GID table entries in mlx5ib(4) to be
compatible with the ibcore module in FreeBSD 10-stable.
This fixes RDMA over VLAN, because the MAC-based GID format embeds
the VLAN ID into the GID itself and this is what ibcore expects when
requesting GID indexes from the GID cache.
RoCE V1.5 and V2 is not supported in 10-stable by mlx5ib(4).
This is a direct commit.
ken [Mon, 6 Nov 2017 20:08:02 +0000 (20:08 +0000)]
MFC r325371
------------------------------------------------------------------------
r325371 | ken | 2017-11-03 15:04:22 -0600 (Fri, 03 Nov 2017) | 19 lines
Add the LTO-8 Type M density code (0x5d, LTO-8M) to libmt and the
mt(1) man page.
LTO-8 Type M (also known as M8) is a pristine LTO-7 cartridge
formatted in a LTO-8 drive in a new, higher density format. It
has a separate density code, and is only readable in an LTO-8
drive.
lib/libmt/mtlib.c:
Add the LTO-8 Type M density code to the density table
in libmt.
usr.bin/mt/mt.1:
Add the LTO-8 Type M density code to the density
table in the mt(1) man page.
eugen [Mon, 6 Nov 2017 11:11:44 +0000 (11:11 +0000)]
MFC r324364: ftpd(8): fix user context handling
Apply authenticated user context after update of wtmp(5) at start of session,
so that ftpd process is not killed by kernel with SIGXFSZ when user has
"filesize" limit lower than size of system wtmp file. Same applies
to session finalization: revert to super-user context before update of wtmp.
If ftpd hits limit while writing a file at user request,
do not get killed with SIGXFSZ instantly but apparently ignore the signal,
process error and report it to the user, and continue with the session.
rmacklem [Sun, 5 Nov 2017 20:28:28 +0000 (20:28 +0000)]
MFC: r324639
Fix the client IP address reported by nfsdumpstate for 64bit arch and NFSv4.1.
The client IP address was not being reported for some NFSv4 mounts by
nfsdumpstate. Upon investigation, two problems were found for mounts
using IPv4. One was that the code (originally written and tested on i386)
assumed that a "u_long" was a "uint32_t" and would exactly store an
IPv4 host address. Not correct for 64bit arches.
Also, for NFSv4.1 mounts, the field was not being filled in. This was
basically correct, because NFSv4.1 does not use a callback address.
However, it meant that nfsdumpstate could not report the client IP addr.
This patch should fix both of these issues.
For IPv6, the address will still not be reported. The original NFSv4 RFC
only specified IPv4 callback addresses. I think this has changed and, if so,
a future commit to fix reporting of IPv6 addresses will be needed.
rmacklem [Sat, 4 Nov 2017 21:30:27 +0000 (21:30 +0000)]
MFC: r324506
Fix forced dismount when a pNFS mount is hung on a DS.
When a "pnfs" NFSv4.1 mount is hung because of an unresponsive DS,
a forced dismount wouldn't work, because the RPC socket for the DS
was not being closed. This patch fixes this.
This will only affect "pnfs" mounts where the pNFS server's DS
is unresponsive (crashed or network partitioned or...).
Found during testing of the pNFS server.
pfg [Sat, 4 Nov 2017 14:45:36 +0000 (14:45 +0000)]
MFC r325066:
Fix out-of-bounds read in libc/regex.
The bug is an out-of-bounds read detected with address sanitizer that
happens when 'sp' in p_b_coll_elems() includes NUL byte[s], e.g. if it's
equal to "GS\x00". In that case len will be equal to 4, and the
strncmp(cp->name, sp, len) call will succeed when cp->name is "GS" but the
cp->name[len] == '\0' comparison will cause the read to go out-of-bounds.
Checking the length using strlen() instead eliminates the issue.
The bug was found in LLVM with oss-fuzz:
https://reviews.llvm.org/D39380
Obtained from: Vlad Tsyrklevich through posting on openbsd-tech
cy [Sun, 29 Oct 2017 04:33:50 +0000 (04:33 +0000)]
Sync (make same) the offsetof macro definition in include/ with the
definition of the same in sys/sys/. The problem was discovered while
working on implementing a new C11 gets_s() for libc. (The new gets_s()
requires rsize_t found in include/stddef.h.) The solution to sync the two
definitions was suggested by ed@ while discussing D12667.
hselasky [Fri, 20 Oct 2017 10:06:02 +0000 (10:06 +0000)]
MFC r324445:
When showing the sleepqueues from the in-kernel debugger,
properly dump all the sleepqueues and not just the first one
History:
It appears that in the commit which introduced the code,
r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
were interchanged. In r180927 "td_name[i]" was corrected to
"td_name[0]", but "sq_blocked[0]" was left unchanged.
davidcs [Thu, 19 Oct 2017 17:35:37 +0000 (17:35 +0000)]
MFC r324535
Add sanity checks in ql_hw_send() qla_send() to ensure that empty slots
in Tx Ring map to empty slot in Tx_buf array before Transmits. If the
checks fail further Transmission on that Tx Ring is prevented.
gordon [Thu, 19 Oct 2017 03:18:22 +0000 (03:18 +0000)]
Update wpa_supplicant/hostapd for 2017-01 vulnerability release.
Note this is a different patchset than what was applied to head and
stable/11 due to the much older version of wpa_supplicant/hostapd in
stable/10.
hostapd: Avoid key reinstallation in FT handshake
Prevent reinstallation of an already in-use group key
Extend protection of GTK/IGTK reinstallation of WNM-Sleep Mode cases
Fix TK configuration to the driver in EAPOL-Key 3/4 retry case
Prevent installation of an all-zero TK
Fix PTK rekeying to generate a new ANonce
TDLS: Reject TPK-TK reconfiguration
WNM: Ignore Key Data in WNM Sleep Mode Response frame if no PMF in use
WNM: Ignore WNM-Sleep Mode Response if WNM-Sleep Mode has not been used
WNM: Ignore WNM-Sleep Mode Response without pending request
FT: Do not allow multiple Reassociation Response frames
TDLS: Ignore incoming TDLS Setup Response retries
ngie [Tue, 17 Oct 2017 15:49:36 +0000 (15:49 +0000)]
MFC r324478:
Check the exit code from fsck_ffs instead of relying on MODIFIED being in the output
^/head@r323923 changed when MODIFIED is printed at exit. It's better to follow the
documented way of determining whether or not a filesystem is clean per fsck_ffs, i.e.,
ensure that the exit code is either 0 or 7.
The pass/fail determination is brittle prior to this commit, and ^/head@r323923 made
the issue apparent -- thus this needs to be fixed independent of ^/head@r323923.
hselasky [Tue, 17 Oct 2017 11:20:32 +0000 (11:20 +0000)]
MFC r289568, r300676, r300677, r300719, r300720 and r300721:
Implement LinuxKPI module parameters as SYSCTLs.
The bool module parameter is no longer supported, because there is no
equivalent in FreeBSD 10-stable. These are converted into "int" type.
There are two macros available which control the behaviour of the
LinuxKPI module parameters:
- LINUXKPI_PARAM_PARENT allows the consumer to set the SYSCTL parent
where the modules parameters will be created.
- LINUXKPI_PARAM_PREFIX defines a parameter name prefix, which is
added to all created module parameters.
The LinuxKPI module parameters also have a permissions value.
If any write bits are set we are allowed to modify the module
parameter runtime. Reflect this when creating the static SYSCTL
nodes.
The module_param_call() function is no longer supported.
brooks [Sat, 14 Oct 2017 16:49:39 +0000 (16:49 +0000)]
MFC r324243:
Remove an unneeded and incorrect memset().
On Variant I TLS architectures (aarch64, arm, mips, powerpc, and riscv)
the __libc_allocate_tls function allocates thread local storage memory
with calloc(). It then copies initialization data over the portions with
non-zero initial values. Before this change it would then pointlessly
zero the already zeroed remainder of the storage. Unfortunately the
calculation was wrong and it would zero TLS_TCB_SIZE (2*sizeof(void *))
additional bytes.
In practice, this overflow only matters if the TLS segment is sized such
that calloc() allocates less than TLS_TCB_SIZE extra memory. Even
then, the likely result will be zeroing part of the next bucket. This
coupled with the impact being confined to Tier II platforms means there
will be no security advisory for this issue.
jhb [Fri, 13 Oct 2017 22:40:57 +0000 (22:40 +0000)]
MFC 324039: Don't defer wakeup()s for completed journal workitems.
Normally wakeups() are performed for completed softupdates work items
in workitem_free() before the underlying memory is free()'d.
complete_jseg() was clearing the "wakeup needed" flag in work items to
defer the wakeup until the end of each loop iteration. However, this
resulted in the item being free'd before it's address was used with
wakeup(). As a result, another part of the kernel could allocate this
memory from malloc() and use it as a wait channel for a different
"event" with a different lock. This triggered an assertion failure
when the lock passed to sleepq_add() did not match the existing lock
associated with the sleep queue. Fix this by removing the code to
defer the wakeup in complete_jseg() allowing the wakeup to occur
slightly earlier in workitem_free() before free() is called.
jhb [Fri, 13 Oct 2017 17:11:08 +0000 (17:11 +0000)]
MFC 324072: Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type. This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.