CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

MFH r321674:
Sync libarchive with vendor.

Relevant vendor changes:
PR #926: ensure ar strtab is null terminated

PR: 220462

MFC r321838:
sys/net8021: Add missing braces in setcurchan().

Also fix some indentation.

Obtained from: DragonFlyBSD (git c69e37d6)

MFC r321349:
Improve publication of the newly allocated snapdata.

MFC r321348:
Unlock correct lock in ffs_snapblkfree().

MFC r321347:
Account for lock recursion when transfering snaplock to the vnode lock
in ffs_snapremove().

MFC r321652:
Simplify flow control.

MFC r321605:

As in r315225, discard 3072 bytes of RC4 bytestream instead of 1024.
(This implementation of arc4rand(9) is used by the userland ipftest
utility as it approximates ipfilter kernelspace in userspace.)

PR: 217920
Submitted by: codarren@hackers.mu
Reviewed by: emaste, cem
Approved by: so (implicit, in r315225)
Differential Revision: D11747
Patterned after: r315225

MFC r312983:
Make "desc" pointer non-constant inside the mlx5_core_diagnostics_entry
structure. This fixes compilation with amd64-xtoolchain-gcc.

PR: 216588
Sponsored by: Mellanox Technologies

MFC r312876:
Use ffs() to scan for first bit instead of using a for() loop.
Minor code refactor while at it.

Sponsored by: Mellanox Technologies

MFC r312872:
Add support for reading advanced diagnostic counters.

By default reading the diagnostic counters is disabled. The firmware
decides which counters are supported and only those supported show up
in the dev.mce.X.diagnostics sysctl tree.

To enable reading of diagnostic counters set one or more of the
following sysctls to one:

dev.mce.X.conf.diag_general_enable=1
dev.mce.X.conf.diag_pci_enable=1

Sponsored by: Mellanox Technologies

MFC r312865:
Enforce reading the consumer and producer counters once to ensure
consistent return values from the mlx5e_sq_has_room_for()
function. The two counters are incremented by different threads under
different locks.

Sponsored by: Mellanox Technologies

MFC r312537:
Remove superfluous return statement.

Sponsored by: Mellanox Technologies

MFC r312536:
Allow transmit packet bufring in software to be disabled.

- Add new sysctl node to control the transmit packet bufring.

- Add optimised version of the transmit routine which output packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of taskswitches which in
turn influence the overall system CPU usage, depending on the
workload.

- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about aquiring duplicate locks having same name.

Sponsored by: Mellanox Technologies
Suggested by: gallatin @

MFC r312528:
Make draining a sendqueue more robust.

Add own state variable to track if a sendqueue is stopped or not.
This will prevent traffic from entering the sendqueue while it is
being destroyed.

Update drain function to wait for traffic to be transmitted before
returning when the link state is active.

Add extra checks in transmit path for stopped SQ's.

While at it:
- Use likely() for a mbuf pointer check.
- Remove redundant IFF_DRV_RUNNING check.

Sponsored by: Mellanox Technologies

MFC r312527:
Add runtime support for modifying the SQ and RQ completion event
moderation mode. The presence of this feature is indicated through the
firmware capabilities.

Sponsored by: Mellanox Technologies

MFC r312526:
Update firmware interface structures and definitions adding support
for new features and commands.

Sponsored by: Mellanox Technologies

MFC r320773:
Implement fix for BULK IN-token retry mechanism. When the hardware is
programmed for infinite IN token retry after NAK, the SAF1761
hardware, however, does not retry the IN-token. This problem is
described in the SAF1761 errata, section 18.1.1.

While at it:
- Add some minor chip specific initialization for RTEMS.
- Add debug print for status registers in the interrupt filter.

Submitted by: Christian Mauderer <christian.mauderer@embedded-brains.de>

MFC r321722:
Properly range check length of parsed information elements in RSU driver.

Found by: Ilja van Sprundel <ivansprundel@ioactive.com>
Sponsored by: Mellanox Technologies

MFC r321627:
Make it possible to request nosys logging to console.

MFC r321625:
Make the number of children for pctrie node available outside subr_pctrie.c.

MFC r321606: adaasync(): Set ADA_STATE_WCACHE based on ADA_FLAG_CAN_WCACHE

The attached patch lets adaasync() set ADA_STATE_WCACHE based on
ADA_FLAG_CAN_WCACHE instead of ADA_FLAG_CAN_RAHEAD.

This fixes a regression introduced in r300207 which changed
the flag names.

PR: 220948
Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: ElectroBSD

MFC r321620: Fix singular/plural "users" output.

It was broken during libxo'fication.

PR: 221039
Submitted by: timur@

MFC r321622, r321623:

  ------------------------------------------------------------------------
  r321622 | ken | 2017-07-27 09:33:57 -0600 (Thu, 27 Jul 2017) | 44 lines

  Fix probing FC targets with hard addressing turned on.

  This largely reverts FreeBSD SVN change 289937 from October 25th, 2015.

  The intent of that change was to keep loop IDs persistent across
  chip reinits.

  The problem is that the change turned on the PREVLOOP /
  PREV_ADDRESS bit (bit 7 in Firmware Options 2), which tells the
  Qlogic chip to not participate in the loop if it can't get the
  requested loop address.  It also turned off soft addressing on 2400
  (4Gb) and newer controllers.

  The isp(4) driver defaults to loop address 0, and the tape drives
  I have tested default to loop address 0 if hard addressing is turned
  on.  So when hard loop addressing is turned on on the drive, the isp(4)
  driver just refuses to participate in the loop.

  The solution is to largely revert that change.  I left some elements
  in place that are related to virtual ports, since they were new.

  This does work with IBM tape drives with hard and soft addressing
  turned on.  I have tested it with 4Gb, 8Gb, and 16Gb controllers.

  sys/dev/isp.c:
   Largely revert FreeBSD SVN change 289937.  I left the
   ispmbox.h changes in place.

   Don't use the PREV_ADDRESS bit on initialization.  It tells
   the chip to not participate if it can't get the requested
   loop ID.

   Do use soft addressing on 2400 and newer chips.

   Use hard addressing when the user has requested a specific
   initiator ID.  (hint.isp.X.iid=N in /boot/loader.conf)

   Leave some of the virtual port options from that change in
   place, but don't turn on the PREV_ADDRESS bit.

  Reviewed by: mav
  Sponsored by: Spectra Logic

  ------------------------------------------------------------------------
  r321623 | ken | 2017-07-27 09:51:56 -0600 (Thu, 27 Jul 2017) | 6 lines

  Remove duplicate assignments from r321622.

  Submitted by: mav
  Sponsored by: Spectra Logic

  ------------------------------------------------------------------------

Merge r316113,316184,316413 from head:
- Remove #define PCIS_SERIALBUS_SMBUS_PROGIF, unused since r200091
- Switch device_probe() from large case statement to a lookup table
- Add several missing SMBus controllers

MFC r320730: Report device descr in addition to ident.

Serial number without device model is somewhat less useful.

MFC r320683: Add naive benchmark for SSDs in ZFS SLOG role.

ZFS SLOGs have very specific access pattern with many cache flushes,
which none of benchmarks I know can simulate.  Since SSD vendors rarely
specify cache flush time, this measurement can be useful to explain why
some ZFS pools are slower then expected.  This test writes data chunks
of different size followed by cache flush, alike to what ZFS SLOG does,
and measures average time.

To illustrate, here is result for 6 years old SATA Intel 710 Series SSD:

Synchronous random writes:
         0.5 kbytes:    138.3 usec/IO =      3.5 Mbytes/s
           1 kbytes:    137.7 usec/IO =      7.1 Mbytes/s
           2 kbytes:    151.1 usec/IO =     12.9 Mbytes/s
           4 kbytes:    158.2 usec/IO =     24.7 Mbytes/s
           8 kbytes:    175.6 usec/IO =     44.5 Mbytes/s
          16 kbytes:    210.1 usec/IO =     74.4 Mbytes/s
          32 kbytes:    274.2 usec/IO =    114.0 Mbytes/s
          64 kbytes:    416.5 usec/IO =    150.1 Mbytes/s
         128 kbytes:    776.6 usec/IO =    161.0 Mbytes/s
         256 kbytes:   1503.1 usec/IO =    166.3 Mbytes/s
         512 kbytes:   2968.7 usec/IO =    168.4 Mbytes/s
        1024 kbytes:   5866.8 usec/IO =    170.5 Mbytes/s
        2048 kbytes:  11696.6 usec/IO =    171.0 Mbytes/s
        4096 kbytes:  23329.6 usec/IO =    171.5 Mbytes/s
        8192 kbytes:  46779.5 usec/IO =    171.0 Mbytes/s

, and much newer and supposedly much faster NVMe Samsung 950 PRO SSD:

Synchronous random writes:
         0.5 kbytes:   2092.9 usec/IO =      0.2 Mbytes/s
           1 kbytes:   2013.1 usec/IO =      0.5 Mbytes/s
           2 kbytes:   2014.8 usec/IO =      1.0 Mbytes/s
           4 kbytes:   2090.7 usec/IO =      1.9 Mbytes/s
           8 kbytes:   2044.5 usec/IO =      3.8 Mbytes/s
          16 kbytes:   2084.8 usec/IO =      7.5 Mbytes/s
          32 kbytes:   2137.1 usec/IO =     14.6 Mbytes/s
          64 kbytes:   2173.4 usec/IO =     28.8 Mbytes/s
         128 kbytes:   2923.9 usec/IO =     42.8 Mbytes/s
         256 kbytes:   3085.3 usec/IO =     81.0 Mbytes/s
         512 kbytes:   3112.2 usec/IO =    160.7 Mbytes/s
        1024 kbytes:   2430.6 usec/IO =    411.4 Mbytes/s
        2048 kbytes:   3788.9 usec/IO =    527.9 Mbytes/s
        4096 kbytes:   6198.0 usec/IO =    645.4 Mbytes/s
        8192 kbytes:  10764.9 usec/IO =    743.2 Mbytes/s

While the first one obviously has maximal throughput limitations, the
second one has so high cache flush latency (about 2 millisecond), that
it makes one almost useless in SLOG role, despite of its good throughput
numbers.  Power loss protection is out of scope of this test, but I
suspect it can be related.

MFC r320555, r320576 (by allanjude):
Add -s (serial) and -p (physpath) to diskinfo

Return the bare requested information, intended for scripting.

The serial number of a SAS/SCSI device can be returned with
'camcontrol inquiry disk -S', but there is no similar switch for SATA.

This provides a way to get this information from both SAS and SATA disks

the -s and -p flags are mutually exclusive, and cannot be used with any
other flags.

MFC r321581:
Mark pages after EOF as clean after pageout.

MFC r321580:
Move rtvals initialization out of the region protected by NFS node lock.

MFC r321512:
Mark name_PCTRIE_LOOKUP_LE() generated function unused.

MFC r320761:

- Use strlcat() instead of strncat().
- Use asprintf() and handle allocation errors.

MFC r321713:

Bump copyright year.

MFC r320359:

Add vfs.nfsd.nfsd_enable_uidtostring, which works just like
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, eg when rerooting.

The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.

MFC r320409:

Revert part of r320359, as suggested by rmacklem@. That case is only used
for nfsuserd -manage-gids and shouldn't depend on sysctl.

MFC r321196:

Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring.
It applies to both NFS client and NFS server, and is useful for both.
This is different from vfs.nfsd.enable_stringtouid, which is specific
to server side.

Sponsored by: DARPA, AFRL

MFC r320495: Allow status aggregation for ramdisk reads.

MFC r320493: Unify INOT/ATIO setup.

MFC r320604, r320865:
Switch fabric scans from GID_FT to GID_PT+GFF_ID/GFT_ID.

Instead of using GID_FT SNS request to get list of registered FCP ports,
use GID_PT to get list of all Nx_Ports, and then use GFF_ID and/or GFT_ID
requests to find whether they are FCP and target capable.

The problem with old approach is that GID_FT does not report ports without
FC-4 type registered. In particular it was impossible to boot OS from
FreeBSD FC target using QLogic FC BIOS, since one does not register FC-4
type even on new cards and so ignored by old code as incompatible.

As a side bonus this allows initiator to skip pointless logins to other
initiators by fetching that information from SNS instead.

In case some switches do not implement GFF_ID/GFT_ID correctly, add sysctls
to disable that functionality. I handled broken GFF_ID of my Brocade 200E,
but there may be other switches with different bugs.

Linux also uses GID_PT, but GFF_ID is disabled by default there, and GFT_ID
is not supported.

Sponsored by: iXsystems, Inc.

MFC r320575: Move comment respecting previous commit.

MFC r320574: Slightly unify SNS requests for post- and pre-24xx.

MFC r320492: Polish target_id/target_lun setting for ATIOs/INOTs.

For ATIOs it is pointless to report isp_loopid to CAM, since in other
places it operates with port database record IDs, not with loop IDs.

For INOTs target_id/target_lun seems were never set, so wildcard INOTs
probably were not working correctly when LUN IDs were important.

MFC r320990, r321011:

libthr: Avoid checking for negative values in usigned count.

Check for overflow instead.

MFC r320941: Fix GRE over IPv6 tunnels with IPFW

Previously, GRE packets in IPv6 tunnels would be dropped by IPFW (unless
net.inet6.ip6.fw.deny_unknown_exthdrs was unset).

PR: 220640
Submitted by: Kun Xie <kxie@xiplink.com>

MFC r321436: ar: handle partial writes from archive_write_data

libarchive may limit a single archive_write_data call to handling
0x7fffffff bytes. Add a loop to handle partial writes.

Sponsored by: The FreeBSD Foundation

MFC r320491:

atf-sh(3): document atf_init_test_cases(3) fully

The function was missing from the NAME/SYNOPSIS sections. Add a manpage link
to complete the documentation reference.

MFC 321409

    hyperv/hn: Ignore LINK_SPEED_CHANGE status.

    This status will be reported if the backend NIC is wireless; it's not
    useful.  Due to the high frequency of the reporting, this could be
    pretty annoying; ignore it.

    Sponsored by:   Microsoft
    Differential Revision:  https://reviews.freebsd.org/D11651

MFC 321408

    rndis: Add LINK_SPEED_CHANGE status

    Reviewed by:    hselasky
    Sponsored by:   Microsoft
    Differential Revision:  https://reviews.freebsd.org/D11650

MFC 321407

    hyperv/hn: Export VF list and VF-HN mapping

    The VF-HN map will be used later on to implement "transparent VF".

    Sponsored by:   Microsoft
    Differential Revision:  https://reviews.freebsd.org/D11618

MFC 321406

    ethernet: Add ethernet interface attached event and devctl notification.

    ifnet_arrival_event may not be adequate under certain situation; e.g.
    when the LLADDR is needed.  So the ethernet ifattach event is announced
    after all necessary bits are setup.

    Sponsored by:   Microsoft
    Differential Revision:  https://reviews.freebsd.org/D11617

MFC r321640:
Fix style bugs in ksyms.c.

MFC r321639:
Restrict permissions on /dev/ksyms to 0400.

MFC r321437:
Fix style and wrap lines to 80 columns in savecore.c.

MFC r321401:
net80211: do not allow to unload rate control module if it is still in use.

Keep 'nrefs' counter up-to-date, so 'kldunload wlan_amrr' with 1+ active
wlan(4) interface will not lead to kernel panic.

MFC r320837:

Style(9). Whitespace.

MFC r320836:

Eliminate bogus casts.

MFC r321370

Handle WITH/WITHOUT_PF in libsysdecode

Only filter out the PF ioctls if we're building without pf support.
Until now those were always filtered out, so truss did not show symbolic
names for pf ioctls.

Differential Revision: https://reviews.freebsd.org/D11629

MFC r314504 (by tsoome):
loader: r314112 did introduce dereference freed pointer entry

CID: 1371675
Reported by: Coverity
Differential Revision: https://reviews.freebsd.org/D9846

MFC r321366:

Style(9) whitespace fix.

MFC r320814:

Style(9). Add blank line aftr {.

MFC r320815:

Remove init from declaration.

MFC r320816:

Remove init from declaration, collapse two int vars declarations into single.

MFC r320817:

Don't take a lock around atomic operation.

MFC r320818:

Eliminate the bogus cast.

MFC r320819:

Eliminate the bogus cast.

MFC r320820:

Don't initialize error in declaration.

MFC r307865 (by tsoome): loader should boot pre-feature flags pools.

The feature flags chek is missing the corner case where we have valid pool
version, but feature flags are not enabled - as for example plain v28 pool.

This update does fix the boot support for such pools.

Differential Revision: https://reviews.freebsd.org/D8331

PR: 221084

MFC r321371:
Do not allocate struct kinfo_vmobject on stack.

MFC r321342:

Pull in r295886 from upstream clang trunk (by Richard Smith):

  PR32034: Evaluate _Atomic(T) in-place when T is a class or array type.

  This is necessary in order for the evaluation of an _Atomic
  initializer for those types to have an associated object, which an
  initializer for class or array type needs.

This fixes an assertion when building recent versions of LinuxCNC.

Reported by: trasz
PR: 220883

MFC r321306:

Fix printf format warning in iflib.c

Clang 5.0.0 got better warnings about printf format strings using %zd,
and this leads to the following -Werror warning on e.g. arm:

    sys/net/iflib.c:1517:8: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
                                              sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
                                              ^~~~~~~~~~~~~~~~~~~~
    sys/net/iflib.c:1517:41: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
                                              sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
                                                                               ^~~~~~~~~~~~~~~~~~~~~~~

Fix this by casting bus_size_t arguments to uintmax_t, and using %ju
instead.

Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D11679

MFC r321305:

Fix printf format warning in zfs_module.c

Clang 5.0.0 got better warnings about print format strings using %zd,
and this leads to the following -Werror warning on e.g. arm:

    sys/boot/efi/boot1/zfs_module.c:186:18: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'off_t' (aka 'long long') [-Werror,-Wformat]
                        "(%lu)\n", st.st_size, spa->spa_name, filepath, EFI_ERROR_CODE(status));
                                   ^~~~~~~~~~

Fix this by casting off_t arguments to intmax_t, and using %jd instead.

Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D11678

MFC 321075: Set the current vnet pointer in the socket buffer AIO handler.

This fixes panics when using AIO under VIMAGE.

Sponsored by: Chelsio Communications

MFC bmake-20170720

PR: 221023

MFC r314319 (by oshogbo):
Don't try to open devices in the gettc() function which will always
fail in the Capability mode. Instead silently fallback to the syscall
method, which is done for example in the gettimeofday(2) function.

MFC r314320 (by oshogbo):
Remove unneeded variable initialization from r314319.

MFC r321461:
Fix indent.

MFC r320441,r320442:

r320441:

share/examples/tests/Makefile: clean up example snippets/documentation

- TESTSDIR doesn't need to be specified after r289158.
- Including bsd.own.mk isn't required since no MK_<foo> knobs are being
  manipulated.
- TESTS_SUBDIRS should be written out in an append format, one entry
  per line, to provide a better, more conflict resistant example.

r320442:

share/examples/tests/{atf,plain}/Makefile: tweak example Makefile snippets

- Including bsd.own.mk isn't required since no MK_<foo> knobs are being
  manipulated.
- Update documentation to note that ${FILES} is installed via bsd.progs.mk,
  not bsd.prog.mk.

MFC r320443,r320444:

r320443:

Add kyua TAP test integration examples

The examples are patterned loosely after the ATF examples, similar
to the plain test examples.

r320444:

Commit the corresponding mtree file change for the TAP test examples

MFC with: r320443

MFC r320446:

trailing_slash is a TAP-compliant testcase; mark it as such, instead
of calling is a plain testcase.

MFC r320445:

Don't hardcode path to file in /tmp; this violates the kyua sandbox

MFC: r321314
r320062 introduced a bug when doing NFSv4.1 mounts against some non-FreeBSD servers.

r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.

Add an errata entry to reflect an incorrect attribution for r315330.

Reported by: Jim Thompson
Sponsored by: The FreeBSD Foundation

revert r321601, it depends on an ACPICA update not yet merged

MFC r320152 (by avg): fstyp: move sys/ include path after zfs include paths

The reason is that FreeBSD refcount.h shadows ZFS refcount.h and that
will lead to a build error after a planned import of the ARC buf data
scatter-ization.
It's possible that some day we will have an opposite problem where
a ZFS header would shadow an essential FreeBSD header.
So, we need to think about a better long term solution.

MFC r320352: zfs: port vdev_file part of illumos change 3306

3306 zdb should be able to issue reads in parallel
illumos/illumos-gate/31d7e8fa33fae995f558673adb22641b5aa8b6e1
https://www.illumos.org/issues/3306

The upstream change was made before we started to import upstream commits
individually.  It was imported into the illumos vendor area as r242733.
That commit was MFV-ed in r260138, but as the commit message says
vdev_file.c was left intact.

This commit actually implements the parallel I/O for vdev_file using a
taskqueue with multiple thread.  This implementation does not depend on
the illumos or FreeBSD bio interface at all, but uses zio_t to pass
around all the relevent data.  So, the code looks a bit different from
the upstream.

This commit also incorporates ZoL commit
zfsonlinux/zfs/bc25c9325b0e5ced897b9820dad239539d561ec9 that fixed
https://github.com/zfsonlinux/zfs/issues/2270
We need to use a dedicated taskqueue for exactly the same reason as ZoL
as we do not implement TASKQ_DYNAMIC.

Obtained from:  illumos, ZFS on Linux

MFC r320239: MFV r319950:
5220 L2ARC does not support devices that do not provide 512B access

FreeBSD note: the actual change has been in FreeBSD since r297848.  This
commit accounts for integration of that change with subsequent changes,
especially r320156 (MFV of r318946) and r314274.

illumos/illumos-gate@403a8da73c64ff9dfb6230ba045c765a242213fb
https://github.com/illumos/illumos-gate/commit/403a8da73c64ff9dfb6230ba045c765a242213fb

https://www.illumos.org/issues/5220
  There are disk devices that have logical sector size larger than 512B, for
  example 4KB. That is, their physical sector size is larger than 512B and they
  do not provide emulation for 512B sector sizes. For such devices both a data
  offset and a data size must be properly aligned. L2ARC should arrange that
  because it uses physical I/O.
  zio_vdev_io_start() performs a necessary transformation if io_size is not
  aligned to vdev_ashift, but that is done only for logical I/O. Something
  similar should be done in L2ARC code.
      * a temporary write buffer should be allocated if the original buffer is
        not going to be compressed and its size is not aligned
      * size of a temporary compression buffer should be ashift aligned
      * for the reads, if a size of a target buffer is not sufficiently large and
        it is not aligned then a temporary read buffer should be allocated

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC r320238: MFV r319742: 8056 zfs send size estimate is inaccurate for some zvols

illumos/illumos-gate@0255edcc85fc0cd1dda0e49bcd52eb66c06a1b16
https://github.com/illumos/illumos-gate/commit/0255edcc85fc0cd1dda0e49bcd52eb66c06a1b16

https://www.illumos.org/issues/8056
  The send size estimate for a zvol can be too low, if the size of the record
  headers (dmu_replay_record_t's) is a significant portion of the size.
  This is typically the case when the data is highly compressible, especially
  with embedded blocks.
  The problem is that dmu_adjust_send_estimate_for_indirects() assumes that
  blocks are the size of the "recordsize" property (128KB).
  However, for zvols, the blocks are the size of the "volblocksize" property
  (8KB). Therefore, we estimate that there will be 16x less record headers than
  there really will be.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>

MFC r320237: MFV r318947: 7578 Fix/improve some aspects of ZIL writing.

FreeBSD note: this commit removes small differences between what mav
committed to FreeBSD in r308782 and what ended up committed to illumos
after addressing all review comments.

illumos/illumos-gate@c5ee46810f82e8a53d2cc5a487568a573f449039
https://github.com/illumos/illumos-gate/commit/c5ee46810f82e8a53d2cc5a487568a573f449039

https://www.illumos.org/issues/7578
  After some ZIL changes 6 years ago zil_slog_limit got partially broken
  due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
  Actually because of other changes about that time zl_itx_list_sz is not
  really required to implement the functionality, so this patch removes
  some unneeded broken code and variables.
  Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
  single heavy logger, that increased latency for other (more latency critical)
  loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
  huge latency increase for heavy writers, this implementation caused double
  write of all data, since the log records were explicitly prepared for SLOG.
  Since we now have I/O scheduler, I've found it can be much more efficient
  to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
  to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
  Existing ZIL implementation had problem with space efficiency when it
  has to write large chunks of data into log blocks of limited size. In some
  cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
  spinning rust, that also reduced log write speed in half, since head had to
  uselessly fly over allocated but not written areas. This change improves
  the situation by offloading problematic operations from z*_log_write() to
  zil_lwb_commit(), which knows real situation of log blocks allocation and
  can split large requests into pieces much more efficiently. Also as side
  effect it removes one of two data copy operations done by ZIL code WR_COPIED
  case.
  While there, untangle and unify code of z*_log_write() functions.
  Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
  block boundary, that may also improve efficiency if ZPL is made to do that.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alexander Motin <mav@FreeBSD.org>

MFC r320156, r320185, r320186, r320262, r320452, r321111:
MFV r318946: 8021 ARC buf data scatter-ization

illumos/illumos-gate@770499e185d15678ccb0be57ebc626ad18d93383
https://github.com/illumos/illumos-gate/commit/770499e185d15678ccb0be57ebc626ad1
8d93383

https://www.illumos.org/issues/8021
  The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
  community) changes the way the ARC allocates `b_pdata` memory from using linea
r
  `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
  improves ZFS's performance by helping to defragment the address space occupied
  by the ARC, in particular for cases where compressed ARC is enabled. It could
  also ease future work to allocate pages directly from `segkpm` for minimal-
  overhead memory allocations, bypassing the `kmem` subsystem.
  This is essentially the same change as the one which recently landed in ZFS on
  Linux, although they made some platform-specific changes while adapting this
  work to their codebase:
  1. Implemented the equivalent of the `segkpm` suggestion for future work
  mentioned above to bypass issues that they've had with the Linux kernel memory
  allocator.
  2. Changed the internal representation of the ABD's scatter/gather list so it
  could be used to pass I/O directly into Linux block device drivers. (This
  feature is not available in the illumos block device interface yet.)

FreeBSD notes:
- the actual (default) chunk size is 4KB (despite the text above saying 1KB)
- we can try to reimplement ABDs, so that they are not permanently
  mapped into the KVA unless explicitly requested, especially on
  platforms with scarce KVA
- we can try to use unmapped I/O and avoid intermediate allocation of a
  linear, virtual memory mapped buffer
- we can try to avoid extra data copying by referring to chunks / pages
  in the original ABD

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>

MFC r320153: revert r315852 which introduced zio_buf_alloc_nowait for use
in vdev_queue_aggregate

I think that the change is still good, but reconciling it with a planned
merge of the ARC buf data scatter-ization is a bit more tedious
than I can handle.

MFC r321299: acpidump: add GIC ITS srat type

From ACPI 6.2, 5.2.16.5

Sponsored by: The FreeBSD Foundation

MFC r321294: acpidump: use C99 designated initializers

Submitted by: Guangyuan Yang <yzgyyang@outlook.com>
Sponsored by: The FreeBSD Foundation

MFC r321302: add arm64 objcopy output target for embedfs

PR: 220877
Submitted by: David NewHamlet

MFC: r321248
Update the nfsv4 man page to reflect recent changes to support
the newer RFCs (5661 and 7530). The main man changes are for the
case of "numbers in strings" for user/groups that RFC7530 allows
and avoids use of nfsuserd(8).

This is a content change.

MFC r319513: linux vdso: pass -fPIC to the assembler, not linker

-fPIC has no effect on linking although it seems to be ignored by
GNU ld.bfd. However, it causes ld.lld to terminate with an invalid
argument error.

This is equivalent to r296057 but for the kernel (not modules) case.

Sponsored by: The FreeBSD Foundation

MFC r316055: makefs: sort roundup with the other off_t members in fsinfo_t

MFC r312857: Use cross-NM (XNM) in compat32 build

An attempt to build mips64 using external toolchain failed as it tried
to use the host amd64 nm.

MFC r319718: arm64: add ".arch armv8-a+crc" to allow use of crc instructions

With Clang 5.0 the .arch directive is required, otherwise Clang
complains "error: instruction requires: crc".

This was reported in D10499 but not added initially, because clang 3.8
available on a ref machine reported unknown directive. Clang 4.0 allows
but does not require the directive.

Sponsored by: The FreeBSD Foundation

MFC r321218: zfs: Fix a typo in the delay_min_dirty_percent sysctl description

The description is FreeBSD-specific and was added in r266497
to fix PR189865.

PR: 220825
Submitted by: Fabian Keil
Obtained from: ElectroBSD

MFC r319953: MFV r319951: 8311 ZFS_READONLY is a little too strict

illumos/illumos-gate@2889ec41c05e9ffe1890b529b3111354da325aeb
https://github.com/illumos/illumos-gate/commit/2889ec41c05e9ffe1890b529b3111354d
a325aeb

https://www.illumos.org/issues/8311
  Description:
  There was a misunderstanding about the enforcement details of the "Read-only"
  flag introduced for SMB/CIFS compatibility, way back in 2007 in the Sun PSARC
  2007/315 case.
  The original authors thought enforcement of the READONLY flag should work
  similarly as the IMMUTABLE flag. Unfortunately, that enforcement is
  incompatible with the expectations of Windows applications using this feature
  through the SMB service. Applications assume (and the MS File System Algorithm
s
  MS-FSA confirms they should) that an SMB client can:
  (a) Open an SMB handle on a file with read/write access,
  (b) Set the DOS attributes to include the READONLY flag,
  (c) continue to have write access via that handle.
  This access model is essentially the same as a Unix/POSIX application that
  creates a file (with read/write access), uses fchmod() to change the file mode
  to something not granting write access (i.e. 0444), and then continues to writ
e
  that file using the open handle it got before the mode change.
  Currently, the SMB server works-around this problem in a way that will become
  difficult to maintain as we implement support for SMB3 persistent handles, so
  SMB depends on this fix.
  I've written a test program that can be used to demonstrate this problem, and
  added it to zfs-tests (tests/functional/acl/cifs/cifs_attr_004_pos).
  It currently fails, but will pass when this problem fixed.
  Steps to Reproduce:
    Run the test program on a ZFS file system.
  Expected Results:
    Pass
  Actual Results:
    Fail.

Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Prakash Surya <prakash.surya@delphix.com>
Author: Gordon Ross <gwr@nexenta.com>

MFC r319949: MFV r319948: 5428 provide fts(), reallocarray(), and strtonum()

illumos/illumos-gate@4585130b259133a26efae68275dbe56b08366deb
https://github.com/illumos/illumos-gate/commit/4585130b259133a26efae68275dbe56b08366deb

https://www.illumos.org/issues/5428

Most of the upstream change is not applicable to FreeBSD.
Only the renaming of strtonum to zfs_strtonum is relevant to us.
And we already had it partially done.

Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC r319947: MFV r319945,r319946: 8264 want support for promoting datasets in libzfs_core

illumos/illumos-gate@a4b8c9aa65a0a735aba318024a424a90d7b06c37
https://github.com/illumos/illumos-gate/commit/a4b8c9aa65a0a735aba318024a424a90d7b06c37

https://www.illumos.org/issues/8264
Oddly there is a lzc_clone function, but no lzc_promote function.

Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Andrew Stormont <astormont@racktopsystems.com>

MFC r319751: MFV r319740: 8168 NULL pointer dereference in zfs_create()

illumos/illumos-gate@690031d326342fa4ea28b5e80f1ad6a16281519d
https://github.com/illumos/illumos-gate/commit/690031d326342fa4ea28b5e80f1ad6a16281519d

https://www.illumos.org/issues/8168
  If we manage to export the pool on which we are creating a dataset (filesystem
  or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for
  which we never check the return value) we end up dereferencing a NULL pointer
  in libzfs`zpool_close().
  This was discovered on ZFS on Linux. The same issue can be reproduced on
  Illumos running in parallel:
    while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
    while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
  Eventually this will result in several core dumps like this one:
  [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
  Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ld.so.1 ]
  > ::stack
  libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
  libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
  zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
  main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
  _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
  >
  Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>

MFC r319750: MFV r319741: 8156 dbuf_evict_notify() does not need dbuf_evict_lock

illumos/illumos-gate@dbfd9f930004c390a2ce2cf850c71b4f880eef9c
https://github.com/illumos/illumos-gate/commit/dbfd9f930004c390a2ce2cf850c71b4f880eef9c

https://www.illumos.org/issues/8156
  dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do
  the eviction itself (because the evict thread is not able to keep up).
  This can result in massive lock contention.
  It isn't necessary to hold the lock, because if we make the wrong choice
  occasionally, nothing bad will happen.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r319749: MFV r319739: 8005 poor performance of 1MB writes on certain RAID-Z configuration
s

illumos/illumos-gate@5b062782532a1d5961c4a4b655906e1238c7c908
https://github.com/illumos/illumos-gate/commit/5b062782532a1d5961c4a4b655906e123
8c7c908

https://www.illumos.org/issues/8005
  RAID-Z requires that space be allocated in multiples of P+1 sectors,
  because this is the minimum size block that can have the required amount
  of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2
  sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector
  is a unit of 2^ashift bytes, typically 512B or 4KB.
  To satisfy this constraint, the allocation size is rounded up to the
  proper multiple, resulting in up to 3 "pad sectors" at the end of some
  blocks. The contents of these pad sectors are not used, so we do not
  need to read or write these sectors. However, some storage hardware
  performs much worse (around 1/2 as fast) on mostly-contiguous writes
  when there are small gaps of non-overwritten data between the writes.
  Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
  include pad sectors. If writing a pad sector will fill the gap between
  two (required) writes, we will issue the optional zio, thus doubling
  performance. The gap-filling performance improvement was introduced in
  July 2009.
  Writing the optional zio is done by the io aggregation code in
  vdev_queue.c. The problem is that it is also subject to the limit on
  the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
  default 128KB. For a given block, if the amount of data plus padding
  written to a leaf device exceeds zfs_vdev_aggregation_limit, the
  optional zio will not be written, resulting in a ~2x performance
  degradation.
  The problem occurs only for certain values of ashift, compressed block
  size, and RAID-Z configuration (number of parity and data disks). It
  cannot occur with the default recordsize=128KB. If compression is
  enabled, all configurations with recordsize=1MB or larger will be
  impacted to some degree.
  The problem notably occurs with recordsize=1MB, compression=off, with 10
  disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r319748: MFV r319738: 8155 simplify dmu_write_policy handling of pre-compressed buffers

illumos/illumos-gate@adaec86ad212d9fd756bee322934fa54d1258605
https://github.com/illumos/illumos-gate/commit/adaec86ad212d9fd756bee322934fa54d1258605

https://www.illumos.org/issues/8155
  When writing pre-compressed buffers, arc_write() requires that the compression
  algorithm used to compress the buffer matches the compression algorithm
  requested by the zio_prop_t, which is set by dmu_write_policy().
  This makes dmu_write_policy() and its callers a bit more complicated.
  We can simplify this by making arc_write() trust the caller to supply the type
  of pre-compressed buffer that it wants to write, and override the compression
  setting in the zio_prop_t.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r319672 (by allanjude):
New sentences start on new lines, fix two violations

MFC r318945: MFV r318944: 8265 Reserve send stream flag for large dnode feature

illumos/illumos-gate@bc83969fdbd1cb0d97ba00218c0a3de5c89fba92
https://github.com/illumos/illumos-gate/commit/bc83969fdbd1cb0d97ba00218c0a3de5c
89fba92

https://www.illumos.org/issues/8265
Reserve bit 23 in the zfs send stream flags for the large
dnode feature which has been implemented for Linux.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brian Behlendorf <behlendorf1@llnl.gov>

MFC r321228:
Allow matches of truncated version strings.

MFC r318935: MFV r318934: 8070 Add some ZFS comments

illumos/illumos-gate@40713f2b249d289022c715107b3951055a63aef0
https://github.com/illumos/illumos-gate/commit/40713f2b249d289022c715107b3951055a63aef0

https://www.illumos.org/issues/8070
Add some ZFS comments left by various developers at different times

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>

MFC r318932: MFV r318931: 8063 verify that we do not attempt to access inactive txg

illumos/illumos-gate@b7b2590dd9f11b12a0b4878db3886068cce176af
https://github.com/illumos/illumos-gate/commit/b7b2590dd9f11b12a0b4878db3886068cce176af

https://www.illumos.org/issues/8063
  A standard practice in ZFS is to keep track of "per-txg" state. Any of
  the 3 active TXG's (open, quiescing, syncing) can have different values
  for this state. We should assert that we do not attempt to modify other
  (inactive) TXG's.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>