mav [Thu, 2 Aug 2018 18:55:55 +0000 (18:55 +0000)]
Do not blindly include illumos kernel headers instead of user-space.
It is not needed now, and I doubt it much helped at all, creating more
confusions then good.
gjb [Thu, 2 Aug 2018 18:51:44 +0000 (18:51 +0000)]
Fix the ftp-stage target for arm embedded builds.
The images were renamed from KERNCONF to BOARDNAME when
specified, which would result in an image name of:
12.0-CURRENT-arm-armv7-GENERIC.img
which would then be renamed to use the BOARDNAME for the
SoC the image is targeted to use. BOARDNAME was specified
for all images as of r336994, which now causes the ftp-stage
target to fail, as the rename is no longer necessary.
trasz [Thu, 2 Aug 2018 07:43:28 +0000 (07:43 +0000)]
Make sure the rtld(1) error messages go to stderr, not stdout.
While here fix capitalization of a few nearby strings, add the
rtld's file name prefix so it's obvious where the message come
from, and return zero when "-h" is used.
https://www.illumos.org/issues/7955
Libshare currently initializes all available filesystems when doing any
libshare operation. This requires iterating through all the filesystem
multiple times, which is a huge performance problem for sharing and
unsharing operations.
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Daniel Hoffman <dj.hoffman@delphix.com>
For FreeBSD this is practically a NOP, just a diff reduction.
kevans [Wed, 1 Aug 2018 20:08:20 +0000 (20:08 +0000)]
ubldr: Bump heap size, 1MB -> 2MB
1MB was leaving very little margin in some of the worse-case scenarios with
lualoader. 2MB is still low enough that we shouldn't have any problems with
UBoot-supported boards.
markj [Wed, 1 Aug 2018 19:45:04 +0000 (19:45 +0000)]
Fix some nits in the unix_passfd tests.
- Remove return statements in functions with a void return type.
- Allocate enough space for the SCM_CREDS and SCM_RIGHTS messages
received in the rights_creds_payload test.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
kib [Wed, 1 Aug 2018 18:58:24 +0000 (18:58 +0000)]
Add ioctl to conveniently mmap a PCI device BAR into userspace.
Add the ioctl PCIOCBARMMAP on /dev/pci to conveniently create
userspace mapping of a PCI device BAR. This is enormously superior to
read the BAR value with PCIOCREAD and then try to mmap /dev/mem, and
should allow to automatically activate the mapped BARs when needed in
future.
Current implementation creates new sg pager for each user mmap
request. If the pointer (and reference) to a managed device pager is
stored in pci_map, we would be able to revoke all mappings on the BAR
deactivation or relocation. This is related to the unimplemented BAR
activation on mmap, and is postponed for the future.
markj [Wed, 1 Aug 2018 18:45:33 +0000 (18:45 +0000)]
Don't hard-code the number of compression utility arguments.
The zstd invocation constructed by newsyslog contains one more parameter
than invocations for the other supported compression utilities. However,
the maximum number of arguments was hard-coded, leading to an
out-of-bounds array access when using zstd compression.
jhibbits [Wed, 1 Aug 2018 14:50:41 +0000 (14:50 +0000)]
snd_hda: Synchronize DMA buffers for the control path
Make sure both sides of the DMA buffer memory accesses for the CORB and RIRB
(control buffers) in snd_hda (device and CPU) can see coherent memory. This
is needed on weakly ordered architectures including PowerPC and ARM. Patch
originally by mmel, with small changes.
This does not cover the data path of snd_hda. We don't have sync operations
for in-progress DMA buffers, to sync ranges of a map.
rpokala [Wed, 1 Aug 2018 08:24:34 +0000 (08:24 +0000)]
Remove jedec_ts(4)
The jedec_ts(4) driver has been marked as deprecated in stable/11, and is
now being removed from -HEAD. Add a notice in UPDATING, and update the few
remaining references (regarding jedec_dimm(4)'s compatibility and history)
to reflect the fact that jedec_ts(4) is now deleted.
markj [Wed, 1 Aug 2018 03:46:07 +0000 (03:46 +0000)]
Require that MAC label buffers be able to store a non-empty string.
The buffer size may be used to initialize an sbuf in
MAC_POLICY_EXTERNALIZE, and without this constraint it's possible to
trigger an assertion failure in the sbuf code. With INVARIANTS
disabled, the first attempt to write to the sbuf will fail.
mav [Wed, 1 Aug 2018 03:21:17 +0000 (03:21 +0000)]
MFV r337029:
9426 metaslab size can exceed offset addressable by spacemap
metaslab size can exceed offset addressable by spacemap. The vdev can
address up to 2^63 * SPA_MAXBLOCKSIZE (512). A metaslab can address up to
2^47 * 2^vdev_ashift. Therefore we may need to increase the number of
metaslabs so that the maximum metaslab size is capped at the amount that
can be addressed by the spacemap. This should happen in
vdev_metaslab_set_size().
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Don Brady <don.brady@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Wed, 1 Aug 2018 02:39:44 +0000 (02:39 +0000)]
MFV r337022:
9403 assertion failed in arc_buf_destroy() when concurrently reading block with checksum error
This assertion (VERIFY) failure was reported when reading a block. Turns out
the problem is that if we get an i/o error (ECKSUM in this case), and there
are multiple concurrent ARC reads of the same block (from different clones),
then the ARC will put multiple buf's on the same ANON hdr, which isn't
supposed to happen, and then causes a panic when we try to arc_buf_destroy()
the buf.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Matthew Ahrens <mahrens@delphix.com>
The dtrace provider for UDP-Lite is modeled after the UDP provider.
This fixes the bug that UDP-Lite packets were triggering the UDP
provider.
Thanks to dteske@ for providing the dwatch module.
MFV r337014:
9421 zdb should detect and print out the number of "leaked" objects
9422 zfs diff and zdb should explicitly mark objects that are on the deleted queue
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>
MFV r336991, r337001:
9102 zfs should be able to initialize storage devices
The first access to a disk block can incur a performance penalty on some
platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is recommended that
volumes be "thick provisioned", where supported by the platform (VMware).
Thick provisioning is time consuming and often is ignored. If the thick
provision step is omitted, customers will see suboptimal performance until
we have written to all parts of the LUN. ZFS should be able to initialize
any unused storage to remove any first-write penalty that exists.
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>
manu [Tue, 31 Jul 2018 20:50:50 +0000 (20:50 +0000)]
ofw_cpu: Add support for getting cpu clock via clock property
Nominal Mhz is either expressed via the clock-frequency property
or can be get via the clock property that holds the cpu clock.
Add support for the later.
manu [Tue, 31 Jul 2018 19:13:50 +0000 (19:13 +0000)]
release: arm: Enable multicons for arm64
Since we have now EFI framebuffer enabled for ARM64 if we boot on a board
with an screen, u-boot will set up a EFI GOP framebuffer and we won't boot
using the serial console.
Also on RPI3 the firmware always setup the framebuffer area resulting in u-boot
always setup the EFI GOP and FreeBSD never using the serial console.
manu [Tue, 31 Jul 2018 19:10:50 +0000 (19:10 +0000)]
release: Restore copy of boot.scr for some board
This is not a problem for 12-CURRENT as EFI boot works but it doesn't
for 11.
While here some board arm_install_uboot also copy ubldr.bin et create
firstboot files but it's already done in arm_install_boot
Reviewed by: gjb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D16481
manu [Tue, 31 Jul 2018 19:08:24 +0000 (19:08 +0000)]
nvmem: Add nvmem interface and helpers
The nvmem interface helps provider of nvmem data to expose themselves to consumer.
NVMEM is generally present on some embedded board in a form of eeprom or fuses.
The nvmem api are helpers for consumer to read/write the cell data from a provider.
manu [Tue, 31 Jul 2018 18:57:11 +0000 (18:57 +0000)]
release: Deinstall u-boot ports before installing
FORCE_PKG_REGISTER is broken so multiple invocation of release.sh for the
same board will fails if /scratch isn't cleaned.
Leave it but deinstall the package first.
9102 zfs should be able to initialize storage devices
The first access to a disk block can incur a performance penalty on some
platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is recommended that
volumes be "thick provisioned", where supported by the platform (VMware).
Thick provisioning is time consuming and often is ignored. If the thick
provision step is omitted, customers will see suboptimal performance until
we have written to all parts of the LUN. ZFS should be able to initialize
any unused storage to remove any first-write penalty that exists.
For compat32, emulate the same wraparound check as occurs on the real
ILP32 system.
Reported by and discussed with: asomers
PR: 230162
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16525
Allow vm object coalescing to occur in the midst of a vm object when the
OBJ_ONEMAPPING flag is set. In other words, allow recycling of existing
but unused subranges of a vm object when the OBJ_ONEMAPPING flag is set.
Such situations are increasingly common with jemalloc >= 5.0. This
change has the expected effect of reducing the number of vm map entry and
object allocations and increasing the number of superpage promotions.
snd_hda: Byteswap the buffer descriptor entries as needed
The buffer descriptor list entries should be in little endian format. Byte swap
them on BE. This is the last piece of the puzzle for snd_hda(4) to work on
PowerPC.
lld: [ELF][ARM] Implement support for Tag_ABI_VFP_args
The Tag_ABI_VFP_args build attribute controls the procedure call
standard used for floating point parameters on ARM. The values are:
0 - Base AAPCS (FP Parameters passed in Core (Integer) registers
1 - VFP AAPCS (FP Parameters passed in FP registers)
2 - Toolchain specific (Neither Base or VFP)
3 - Compatible with all (No use of floating point parameters)
If the Tag_ABI_VFP_args build attribute is missing it has an implicit
value of 0.
We use the attribute in two ways:
* Detect a clash in calling convention between Base, VFP and Toolchain.
we follow ld.bfd's lead and do not error if there is a clash between an
implicit Base AAPCS caused by a missing attribute. Many projects
including the hard-float (VFP AAPCS) version of glibc contain assembler
files that do not use floating point but do not have Tag_ABI_VFP_args.
* Set the EF_ARM_ABI_FLOAT_SOFT or EF_ARM_ABI_FLOAT_HARD ELF header flag
for Base or VFP AAPCS respectively. This flag is used by some ELF
loaders.
References:
* Addenda to, and Errata in, the ABI for the ARM Architecture for
Tag_ABI_VFP_args
* Elf for the ARM Architecture for ELF header flags
Fixes LLVM PR36009
PR: 229050
Obtained from: llvm r338377 by Peter Smith
llvm: [ARM] Complete enumeration values for Tag_ABI_VFP_args
The LLD implementation of Tag_ABI_VFP_args needs to check the rarely
seen values of 3 (toolchain specific) and 4 compatible with both Base
and VFP. Add the missing enumeration values so that LLD can refer to
them without having to use the raw numbers.
llvm: [ELF][ARM] Add Arm ABI names for float ABI ELF Header flags
The ELF for the Arm architecture document defines, for EF_ARM_EABI_VER5
and above, the flags EF_ARM_ABI_FLOAT_HARD and EF_ARM_ABI_FLOAT_SOFT.
These have been defined to be compatible with the existing
EF_ARM_VFP_FLOAT and EF_ARM_SOFT_FLOAT used by gcc for
EF_ARM_EABI_UNKNOWN.
This patch adds the flags in addition to the existing ones so that any
code depending on the old names will still work.
andrew [Tue, 31 Jul 2018 12:53:27 +0000 (12:53 +0000)]
Implement the SSBD (CVE-2018-3639) workaround on arm64
This calls into the Arm Trusted Firmware to enable and disable the
workaround for the Speculative Store Bypass Disable (SSBD) issue, also
known as Spectre Variant 4.
As this may have a large performance overhead, and how exploitable SSBD is
is unknown we follow the Linux lead of allowing the administrator to select
between always on, always off, or only enabled in the kernel, with the
latter being the default.
Only NULL check the VNET pointer when VIMAGE is enabled in ibcore.
Else a NULL VNET pointer should be ignored. This fixes address resolving
when VIMAGE is disabled.
MFC after: 3 days
Sponsored by: Mellanox Technologies
r336940 introduced an "unused variable" warning on platforms which
support INET, but not INET6, like MALTA and MALTA64 as reported
by Mark Millard. Improve the #ifdefs to address this issue.
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>
MFV r336958: 9337 zfs get all is slow due to uncached metadata
This project's goal is to make read-heavy channel programs and zfs(1m)
administrative commands faster by caching all the metadata that they will
need in the dbuf layer. This will prevent the data from being evicted, so
that any future call to i.e. zfs get all won't have to go to disk (very
much).
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
This project's goal is to make read-heavy channel programs and zfs(1m)
administrative commands faster by caching all the metadata that they will
need in the dbuf layer. This will prevent the data from being evicted, so
that any future call to i.e. zfs get all won't have to go to disk (very
much).
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
We should use zfs_dbgmsg instead of spa_dbgmsg. Or at least,
metaslab_condense() should call zfs_dbgmsg because it's important and rare
enough to always log. It's possible that the message in zio_dva_allocate()
would be too high-frequency for zfs_dbgmsg.
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
We should use zfs_dbgmsg instead of spa_dbgmsg. Or at least,
metaslab_condense() should call zfs_dbgmsg because it's important and rare
enough to always log. It's possible that the message in zio_dva_allocate()
would be too high-frequency for zfs_dbgmsg.
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
MFV r336952: 9192 explicitly pass good_writes to vdev_uberblock/label_sync
Currently vdev_label_sync and vdev_uberblock_sync take a zio_t and assume
that its io_private is a pointer to the good_writes count. They should
instead accept this argument explicitly.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
9192 explicitly pass good_writes to vdev_uberblock/label_sync
Currently vdev_label_sync and vdev_uberblock_sync take a zio_t and assume
that its io_private is a pointer to the good_writes count. They should
instead accept this argument explicitly.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
MFV r336950: 9290 device removal reduces redundancy of mirrors
Mirrors are supposed to provide redundancy in the face of whole-disk failure
and silent damage (e.g. some data on disk is not right, but ZFS hasn't
detected the whole device as being broken). However, the current device
removal implementation bypasses some of the mirror's redundancy.
Mirrors are supposed to provide redundancy in the face of whole-disk failure
and silent damage (e.g. some data on disk is not right, but ZFS hasn't
detected the whole device as being broken). However, the current device
removal implementation bypasses some of the mirror's redundancy.
MFV r336948: 9112 Improve allocation performance on high-end systems
On high-end systems running async sequential write workloads, especially
NUMA systems with flash or NVMe storage, one significant performance
bottleneck is selecting a metaslab to do allocations from. This process
can be parallelized, providing significant performance increases for
these workloads.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Paul Dagnelie <pcd@delphix.com>
9112 Improve allocation performance on high-end systems
On high-end systems running async sequential write workloads, especially
NUMA systems with flash or NVMe storage, one significant performance
bottleneck is selecting a metaslab to do allocations from. This process
can be parallelized, providing significant performance increases for
these workloads.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Paul Dagnelie <pcd@delphix.com>
The current space map encoding has the following disadvantages:
[1] Assuming 512 sector size each entry can represent at most 16MB for a segment.
This makes the encoding very inefficient for large regions of space.
[2] As vdev-wide space maps have started to be used by new features (i.e.
device removal, zpool checkpoint) we've started imposing limits in the
vdevs that can be used with them based on the maximum addressable offset
(currently 64PB for a top-level vdev).
The new remains backwards compatible with the old one. The introduced
two-word entry format, besides extending the limits imposed by the single-entry
layout, also includes a vdev field and some extra padding after its prefix.
The extra padding after the prefix should is reserved for future usage (e.g.
new prefixes for future encodings or new fields for flags). The new vdev field
not only makes the space maps more self-descriptive, but also opens the doors
for pool-wide space maps.
One final important note is that the number of bits used for vdevs is reduced
to 24 bits for blkptrs. That was decided as we don't know of any setups that
use more than 16M vdevs for the time being and
we wanted to fit the vdev field in the space map. In addition that gives us
some extra bits in dva_t.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
The current space map encoding has the following disadvantages:
[1] Assuming 512 sector size each entry can represent at most 16MB for a segment.
This makes the encoding very inefficient for large regions of space.
[2] As vdev-wide space maps have started to be used by new features (i.e.
device removal, zpool checkpoint) we've started imposing limits in the
vdevs that can be used with them based on the maximum addressable offset
(currently 64PB for a top-level vdev).
The new remains backwards compatible with the old one. The introduced
two-word entry format, besides extending the limits imposed by the single-entry
layout, also includes a vdev field and some extra padding after its prefix.
The extra padding after the prefix should is reserved for future usage (e.g.
new prefixes for future encodings or new fields for flags). The new vdev field
not only makes the space maps more self-descriptive, but also opens the doors
for pool-wide space maps.
One final important note is that the number of bits used for vdevs is reduced
to 24 bits for blkptrs. That was decided as we don't know of any setups that
use more than 16M vdevs for the time being and
we wanted to fit the vdev field in the space map. In addition that gives us
some extra bits in dva_t.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
When a ZFS volume is created with zfs create -V (but without -s), the
refreservation property is set to a value that is volsize plus the maximum
size of metadata. If refreservation is ever set to another value, it is
impossible to set it back to the automatically determined value. There are
other cases where refreservation may be wrong. These include receiving a
volume that was sent without properties and zfs clone.
We need:
zfs set refreservation=auto <volume>
zfs clone -o refreservation=auto <volume>
Each one would use the same function used by zfs create -V to determine the
proper value for refreservation.
Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Mike Gerdts <mike.gerdts@joyent.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
TCP/IPv4 allows an implicit connection setup using sendto(), which
is used for TTCP and TCP fast open. This patch adds support for
TCP/IPv6.
While there, improve some tests for detecting multicast addresses,
which are mapped.