CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

MFC r338646: dd(1): Correct padding in status=progress

Output padding is specified via outlen, which is set using the return value
of fprintf. Because it's printing that padding plus a trailing byte, it
grows by one each iteration rather than reflecting actual length.

Additionally, iec was sized improperly for scaling up similarly to si.
Fixing this revealed that the humanize_number(3) call to populate persec
was using the wrong width.
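
The padding bug described above can be reproduced in isolation. The sketch
below is not dd(1)'s code: format_buggy() and format_fixed() are hypothetical
helpers that write into a buffer with snprintf(3) instead of calling fprintf
on stderr, and the fix shown (remembering only the message length) is one
plausible approach rather than the committed one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers: format one progress line, padded to the width
 * remembered from the previous line so that stale characters are erased.
 * Each returns the width to remember for the next call.
 */
static int
format_buggy(char *dst, size_t dstlen, const char *msg, int prevlen)
{
	assert(prevlen >= 0);
	/* Bug: the return value counts the padding plus the trailing byte. */
	return (snprintf(dst, dstlen, "%-*s ", prevlen, msg));
}

static int
format_fixed(char *dst, size_t dstlen, const char *msg, int prevlen)
{
	assert(prevlen >= 0);
	(void)snprintf(dst, dstlen, "%-*s ", prevlen, msg);
	/* Remember only the length of the text that was actually shown. */
	return ((int)strlen(msg));
}
```

Feeding the same 8-character message through format_buggy() five times in a
row, starting from a width of 0, yields remembered widths of 9, 10, 11, 12
and 13, while format_fixed() stays at 8.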

MFC r338223, r338263: Missing bits from OptionalObsoleteFiles

r338223:
Remove ZFS leftovers when WITHOUT_ZFS is set

Submitted by: Oliver Pinter
Differential Revision: https://reviews.freebsd.org/D16810

r338263:
Remove hyper-v leftovers when WITHOUT_HYPERV is set

hv_vss_daemon was missed.

MFC r338219, r338250: FDT in Loader fixes

r338219:
fdt_fixups: relocate the /chosen node after applying fixups

As indicated by the comment, any fixups applied (which might include
overlays) can invalidate the previously located node by adding nodes or
setting/adding properties. The later fdt_setprop of fixup-applied property
would then fail because of the bad/wrong node offset.

This would have generally been harmless, but potentially caused multiple
applications of fixups and caused a little bit of bloat.

r338250:
efiloader: Setup FDT in autoload to fix overlays clobbering kenv

manu found in the noted PR that overlays seemed to be clobbering the kenv
and killing the boot. Further inspection revealed that one can `fdt ls` at
the loader prompt for a successful boot, but autoboot breaks it.

In the autoboot case, first setup of FDT is happening in the middle of
bi_load, which triggers loading of the DTBO from /boot.

This is bad, bad, bad. Files in the loader are loaded somewhere in the
middle of the address space one after another. bi_load starts building the
needed kernel bootinfo immediately after the highest-addr loaded file. File
loads in the middle of bi_load suddenly clobber bootinfo and everything goes
off the rails.

The solution to this is to take advantage of arch_autoload to set up FDT
in efiloader compiled with LOADER_FDT_SUPPORT. This matches how it works in
ubldr land, and is how it should have worked when overlay support was added
to efiloader since fdt_setup_fdtp now has the potential to load files
(courtesy of fdt_platform_load_dtb).

MFC r338039: diff(1): Implement -B/--ignore-blank-lines

As noted by cem in r338035, coccinelle invokes diff(1) with the -B flag.
This was not previously implemented here, so one was forced to create a link
for GNU diff to /usr/local/bin/diff

Implement the -B flag and add some primitive tests for it. It is implemented
in the same fashion that -I is implemented; each chunk's lines are scanned,
and if a non-blank line is encountered then the chunk will be output.
Otherwise, it's skipped.
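
The chunk scan described above might look roughly like the following. This
is a sketch, not the committed diff(1) code: line_is_blank() and
chunk_should_print() are hypothetical names, and treating only empty lines
(or a lone newline) as blank is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a line is blank if it is empty or only a line terminator. */
static bool
line_is_blank(const char *line)
{
	return (line[0] == '\0' || (line[0] == '\n' && line[1] == '\0'));
}

/* True if the chunk holds at least one non-blank line and must be output. */
static bool
chunk_should_print(const char *const *lines, size_t nlines)
{
	assert(lines != NULL || nlines == 0);
	for (size_t i = 0; i < nlines; i++)
		if (!line_is_blank(lines[i]))
			return (true);
	return (false);
}
```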

MFC r337964, r338232: dtc(1) updates

r337964:
dtc(1): Update to 97d2d5715eeb45108cc60367fdf6bd5b2046b050

Notable fixes:
- Overlays may now be generated properly without -@
- /__local_fixups__ were not including unit address in their structure
- The error reporting a magic token was misleading, reporting
  "Bad magic token in header.  Got d00dfeed expected 0xd00dfeed"
  if the token was missing. This has been split out into a separate message.

r338232:
dtc(1): Update to 0892ec7; HACKING and implicit header fixes

Fixes courtesy of arichardson and jmg:
- HACKING was pointing to the wrong place
- Added headers were being relied on implicitly, but libstdc++ did not
  comply with the unspoken wishes of dtc.

MFC r337567 (by mmacy):
Performance optimization of AVL tree comparator functions

MFV:
commit ee36c709c3d5f7040e1bd11f5c75318aa03e789f
Author: Gvozden Neskovic <neskovic@gmail.com>
Date:   Sat Aug 27 20:12:53 2016 +0200

    perf: 2.75x faster ddt_entry_compare()
        The first 256 bits of ddt_key_t are a block checksum, which is expected
    to be close to random data. Hence, on average, the comparison only needs to
    look at the first few bytes of the keys. To reduce the number of conditional
    jump instructions, the result is computed as: sign(memcmp(k1, k2)).

    Sign of an integer 'a' can be obtained as: `(0 < a) - (a < 0)` := {-1, 0, 1} ,
    which is computed efficiently.  Synthetic performance evaluation of
    original and new algorithm over 1G random keys on 2.6GHz Intel(R) Xeon(R)
    CPU E5-2660 v3:

    old     6.85789 s
    new     2.49089 s

    perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare()
        Compute the result directly instead of using conditionals

    perf: zfs_range_compare()
        Speedup between 1.1x - 2.5x, depending on compiler version and
    optimization level.

    perf: spa_error_entry_compare()
        `bcmp()` is not suitable for comparator use. Use `memcmp()` instead.

    perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare()
    perf: 2.8x faster zil_bp_compare()
    perf: 2.8x faster mze_compare()
    perf: faster dbuf_compare()
    perf: faster compares in spa_misc
    perf: 2.8x faster layout_hash_compare()
    perf: 2.8x faster space_reftree_compare()
    perf: libzfs: faster avl tree comparators
    perf: guid_compare()
    perf: dsl_deadlist_compare()
    perf: perm_set_compare()
    perf: 2x faster range_tree_seg_compare()
    perf: faster unique_compare()
    perf: faster vdev_cache _compare()
    perf: faster vdev_uberblock_compare()
    perf: faster fuid _compare()
    perf: faster zfs_znode_hold_compare()

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Richard Elling <richard.elling@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #5033
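
The sign trick quoted in the message above can be tried on its own. In this
sketch, struct key and key_compare() are hypothetical stand-ins for a
checksum-like key and its comparator; they are not the ZFS definitions.

```c
#include <assert.h>
#include <string.h>

/* Normalize any integer to -1, 0, or 1 without branching on its value. */
static int
sign_of(int a)
{
	return ((0 < a) - (a < 0));
}

/* Hypothetical fixed-size key; 32 bytes matches a 256-bit checksum. */
struct key {
	unsigned char bytes[32];
};

/* Comparator: byte-wise order, with the result clamped to {-1, 0, 1}. */
static int
key_compare(const void *a, const void *b)
{
	const struct key *k1 = a;
	const struct key *k2 = b;

	assert(k1 != NULL && k2 != NULL);
	return (sign_of(memcmp(k1->bytes, k2->bytes, sizeof(k1->bytes))));
}
```

memcmp(3) only promises a negative, zero, or positive result, so the clamp
is what turns it into the {-1, 0, 1} value described in the message.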

MFC r338869: MFV r338866: 9700 ZFS resilvered mirror does not balance reads

illumos/illumos-gate@82f63c3c2bf5e4378706e8dcfccf717d67371be9

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>

MFC r337972: 9751 Allocation throttling misplacing ditto blocks

Relax allocation throttling for ditto blocks.  Due to random imbalances
in allocation, throttling tends to push block copies to the one vdev that
looks slightly better at the moment.  A slightly less strict policy improves
both data security and, surprisingly, write performance, since we don't
need to touch extra metaslabs on each vdev to respect the min distance.

Sponsored by:   iXsystems, Inc.

MFC r337970: 9738 Fix third block copy allocations, broken at 9112.

Use METASLAB_WEIGHT_CLAIM weight to allocate tertiary blocks.
The previous use of METASLAB_WEIGHT_SECONDARY for that caused errors
in later metaslab_activate_allocator() calls, leading to massive
loading of unneeded metaslabs and write freezes.

Reviewed by: Paul Dagnelie <pcd@delphix.com>

MFC r337923: Make vfs.zfs.zio.dva_throttle_enabled sysctl writable.

Not sure what I thought originally, but as I see now, runtime changes
work fine, and the code even seems designed for this.

MFC r337883: Add a couple of tunables/sysctls missed in r336949.

MFC r337870: Fix mismerge in r337196.

ZoL made the same mistake and fixed it with a separate commit, 863522b1f9:

dsl_scan_scrub_cb: don't double-account non-embedded blocks

We were doing count_block() twice inside this function, once
unconditionally at the beginning (intended to catch the embedded block
case) and once near the end after processing the block.

The double-accounting caused the "zpool scrub" progress statistics in
"zpool status" to climb from 0% to 200% instead of 0% to 100%, and
showed double the I/O rate it was actually seeing.
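
A toy model of the double-accounting, for illustration only. The struct and
function names below are hypothetical and the bodies are reduced to the
counting logic; the "fixed" shape is an assumption about how to count each
block exactly once, not a copy of the actual change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct scan_stats {
	uint64_t examined;	/* bytes counted so far */
	uint64_t total;		/* bytes expected in total */
};

static void
count_block(struct scan_stats *s, uint64_t bytes)
{
	s->examined += bytes;
}

/* Mismerged shape: counts up front, then again after processing. */
static void
scrub_cb_buggy(struct scan_stats *s, uint64_t bytes, bool embedded)
{
	count_block(s, bytes);
	if (embedded)
		return;
	/* ... the block would be processed here ... */
	count_block(s, bytes);
}

/* Assumed fix: count early only in the embedded case. */
static void
scrub_cb_fixed(struct scan_stats *s, uint64_t bytes, bool embedded)
{
	if (embedded) {
		count_block(s, bytes);
		return;
	}
	/* ... the block would be processed here ... */
	count_block(s, bytes);
}

static unsigned
percent_done(const struct scan_stats *s)
{
	assert(s->total != 0);
	return ((unsigned)(s->examined * 100 / s->total));
}
```

With four non-embedded blocks of 25 bytes against a total of 100, the buggy
callback reports 200% and the fixed one 100%.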

This was apparently a regression introduced in commit 00c405b4b5e8,
which was an incorrect port of this OpenZFS commit:

https://github.com/openzfs/openzfs/commit/d8a447a7

Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Closes #7720
Closes #7738

MFC r337229: Reduce taskq and context-switch cost of zio pipe

When doing a read from disk, ZFS creates 3 ZIO's: a zio_null(), the
logical zio_read(), and then a physical zio. Currently, each of these
results in a separate taskq_dispatch(zio_execute).

On high-read-iops workloads, this causes a significant performance
impact. By processing all 3 ZIO's in a single taskq entry, we reduce the
overhead on taskq locking and context switching. We accomplish this by
allowing zio_done() to return a "next zio to execute" to zio_execute().

This results in a ~12% performance increase for random reads, from
96,000 iops to 108,000 iops (with recordsize=8k, on SSD's).
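
The "next zio to execute" idea can be sketched with a toy structure.
Everything here is hypothetical (struct mini_zio, mini_done(),
mini_execute(), and the completion order physical, logical, null); it only
shows one dispatch draining a chain that would otherwise need three.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an I/O object; not the real zio_t. */
struct mini_zio {
	struct mini_zio *parent;	/* next object to finish, or NULL */
	int done;
};

/* Counts how many times work was handed to a (pretend) task queue. */
static int dispatch_count;

/* Complete one object and hand back the next one that is ready to run. */
static struct mini_zio *
mini_done(struct mini_zio *z)
{
	z->done = 1;
	return (z->parent);
}

/* One task queue entry: keep going while mini_done() returns more work. */
static void
mini_execute(struct mini_zio *z)
{
	assert(z != NULL);
	dispatch_count++;
	while (z != NULL)
		z = mini_done(z);
}
```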

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-59292
Closes #7736

zfsonlinux/zfs@62840030a7dceaee013ddbcc1eebcfc7922edf7c

MFC r337227: MFV r337223:
9580 Add a hash-table on top of nvlist to speed-up operations

illumos/illumos-gate@2ec7644aab2a726a64681fa66c6db8731b160de1

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>

MFC r337221: MFV r337220: 8375 Kernel memory leak in nvpair code

illumos/illumos-gate@843c2111b160463f014d325560ad4b051711928e

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337219: MFV r337218: 7261 nvlist code should enforce name length limit

illumos/illumos-gate@48dd5e630c9b1773b7b10d08a3b90b6c9062d713

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337217: MFV r337216: 7263 deeply nested nvlist can overflow stack

illumos/illumos-gate@9ca527c3d3dfa7c8f304b34a9e03b5eddace838f

Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337215: MFV r337214:
9621 Make createtxg and guid properties public

illumos/illumos-gate@e8d4a73c868afb740396041be80ed2b141065e76

Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Josh Paetzel <josh@tcbug.org>

MFC r337213: MFV r337212:
9465 ARC check for 'anon_size > arc_c/2' can stall the system

illumos/illumos-gate@abe1fd01ce5a83718c5a840daeab4abdaec1c104

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Don Brady <don.brady@delphix.com>

MFC r337211: MFV r337210: 9577 remove zfs_dbuf_evict_key tsd

The zfs_dbuf_evict_key TSD (thread-specific data) is not necessary - we can
instead pass a flag down in a few places to prevent recursive dbuf eviction.
Making this change has 3 benefits:

1. The code semantics are easier to understand.
2. On Linux, performance is improved, because creating/removing TSD values
(by setting to NULL vs non-NULL) is expensive, and we do it very often.
3. According to Nexenta, the current semantics can cause a deadlock when
concurrently calling dmu_objset_evict_dbufs() (which is rare today, but they
are working on a "parallel unmount" change that triggers this more easily)

illumos/illumos-gate@c2919acbea007fa95c709b60d073db9a24526e01

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337209:
MFV r337208: 9591 ms_shift can be incorrectly changed in MOS config for
indirect vdevs that have been historically expanded

illumos/illumos-gate@11f6a9680e013a7c9c57dc0b64d3e91e2eee1a6b

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>

MFC r337207: MFV r337206: 9338 moved dnode has incorrect dn_next_type

illumos/illumos-gate@c7fbe46df966ea665df63b6e6071808987e839d1

Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337205:
MFV r337204: 9439 ZFS double-free due to failure to dirty indirect block

illumos/illumos-gate@99a19144e82244f3426f055cc73af8a937c0135c

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337202: MFV r337200:
9438 Holes can lose birth time info if a block has a mix of birth times

Ultimately, the problem here is that when you truncate and write a file in
the same transaction group, the dbuf for the indirect block will be zeroed
out to deal with the truncation, and then written for the write. During
this process, we will lose hole birth time information for any holes in the
range. In the case where a dnode is being freed, we need to determine
whether the block should be converted to a higher-level hole in the zio
pipeline, and if so do it when the dnode is being synced out.

illumos/illumos-gate@738e2a3ce3b2579222d6855e7fe75b5bcfcddf8d

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>

MFC r337201: Fix build after r337196 mismerge.

MFC r337198: MFV r337197: 9456 ztest failure in zil_commit_waiter_timeout

illumos/illumos-gate@b6031810da58df96413bf76e068638fcab1f228a

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Prakash Surya <prakash.surya@delphix.com>

MFC r337196: MFV r337195: 9454 ::zfs_blkstats should count embedded blocks

illumos/illumos-gate@dec267e7ea9828898b1c64462daa6636c4ef5e29

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337194: MFV r337193:
9424 ztest failure: "unprotected error in call to Lua API (Invalid value type 'function' for key 'error')"

illumos/illumos-gate@fe3ba4d1227d8746116ece7240682b13595c3142

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337191:
MFV r337190: 9486 reduce memory used by device removal on fragmented pools

In the most fragmented real-world cases, this reduces memory used by the
mapping from ~1GB to ~50MB of RAM per 1TB of storage removed. Less
fragmented cases will typically also see around 50-100MB of RAM per 1TB
of storage.

illumos/illumos-gate@cfd63e1b1bcf7ba4bf72f55ddbd87ce008d2986d

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337185:
MFV r337184: 9457 libzfs_import.c:add_config() has a memory leak

A memory leak occurs on lines 209 and 213 because the config is not freed
in the error case. The interface to add_config() seems less than ideal -
it would be better if it copied any data necessary from the config and the
caller freed it.

illumos/illumos-gate@ddfe901b12348d31c500fb57f9174e88860a4061

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: sara hartse <sara.hartse@delphix.com>

MFC r337183:
MFV r337182: 9330 stack overflow when creating a deeply nested dataset

Datasets that are deeply nested (~100 levels) are impractical. We just put
a limit of 50 levels on newly created datasets. Existing datasets should
work without a problem.
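
A depth check of this kind might be sketched as below. The helper names, the
choice to count '/' separators, and the exact boundary (fewer than 50
separators allowed) are all assumptions made for illustration; only the
limit of 50 comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MAX_NESTING	50	/* limit quoted in the commit message */

/* Assumption: nesting depth is the number of '/' separators in the name. */
static int
dataset_depth(const char *name)
{
	int depth = 0;

	assert(name != NULL);
	for (; *name != '\0'; name++)
		if (*name == '/')
			depth++;
	return (depth);
}

/* Assumption: a new dataset is allowed while its depth is below the limit. */
static bool
dataset_nesting_ok(const char *name)
{
	return (dataset_depth(name) < MAX_NESTING);
}
```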

illumos/illumos-gate@5ac95da7d61660aa299c287a39277cb0372be959

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>

MFC r337181: 9539 Make zvol operations use _by_dnode routines

Continues what was started in 7801 add more by-dnode routines by fully
converting zvols to avoid unnecessary dnode_hold() calls. This saves a
small amount of CPU time and slightly improves latencies of operations
on zvols.

illumos/illumos-gate@8dfe5547fbf0979fc1065a8b6fddc1e940a7cf4f

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Richard Yao <richard.yao@prophetstor.com>

MFC r337179: 9523 Large alloc in zdb can cause trouble

16MB alloc in zdb_embedded_block() can cause cores in certain situations
(clang, gcc55).

OsX commit: https://github.com/openzfsonosx/zfs/commit/ced236a5da6e72ea7bf6d2919fe14e17cffe10f1
FreeBSD commit: https://svnweb.freebsd.org/base?view=revision&revision=326150
illumos/illumos-gate@03a4c2f4bfaca30115963b76445279b36468a614

Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jorgen Lundman <lundman@lundman.net>

This is an update for r326150 (by avg), where this change comes from.

MFC r337177:
MFV r337175: 9487 Free objects when receiving full stream as clone

All objects after the last written or freed object are not supposed to
exist after receiving the stream. We should free them accordingly, as if
a freeobjects record for them had been included in the stream.

zfsonlinux/zfs@48fbb9ddbf2281911560dfbc2821aa8b74127315
illumos/illumos-gate@7864b8192b8d30471fa2240466d516292e5765b8

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>

MFC r337172, MFV r337171:
9464 txg_kick() fails to see that we are quiescing, forcing transactions
to their next stages without letting them accumulate changes

Ideally we would like txg_kick() to get triggered only when we are sure
that we are not syncing AND not quiescing any txg. This way we can kick
an open TXG to the quiescing state when we are sure that there is nothing
going on and we would benefit from the different states running
concurrently.

illumos/illumos-gate@fa41d87de9ec9000964c605eb01d6dc19e4a1abe

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>

MFC r338947:
  Add "src-ip" or "dst-ip" keyword to the output, when we are printing the
  rest of rule options.

  Reported by: lev

MFC r338955:
When doing lm_add(), check for duplicates.

MFC r337169: MFV r337167: 9442 decrease indirect block size of spacemaps

Updates to indirect blocks of spacemaps can contribute significantly to
write inflation. Therefore we want to reduce the indirect block size of
spacemaps from 128K to 16K.

illumos/illumos-gate@221813c13b43ef48330b03725e00edee85108cf1

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Albert Lee <trisk@forkgnu.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337163: MFV r337161: 9512 zfs remap poolname@snapname coredumps

Only filesystems and volumes are valid "zfs remap" parameters: when passed
a snapshot name zfs_remap_indirects() does not handle the EINVAL returned
from libzfs_core, which results in failing an assertion and consequently
crashing.

illumos/illumos-gate@0b2e8253986c5c761129b58cfdac46d204903de1

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: loli10K <ezomori.nozomu@gmail.com>

MFC r337160:
Do not blindly include illumos kernel headers instead of user-space ones.
It is not needed now, and I doubt it ever helped much, creating more
confusion than good.

MFC r337063: MFV r316926:
7955 libshare needs to initialize only those datasets being modified by the consumer

illumos/illumos-gate@8a981c3356b194b3b5c0ae9276a9cc31cd2f93a3
https://github.com/illumos/illumos-gate/commit/8a981c3356b194b3b5c0ae9276a9cc31cd2f93a3

https://www.illumos.org/issues/7955
  Libshare currently initializes all available filesystems when doing any
  libshare operation. This requires iterating through all the filesystem
  multiple times, which is a huge performance problem for sharing and
  unsharing operations.

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Daniel Hoffman <dj.hoffman@delphix.com>

For FreeBSD this is practically a NOP, just a diff reduction.

MFC r337030: MFV r337029:
9426 metaslab size can exceed offset addressable by spacemap

metaslab size can exceed offset addressable by spacemap. The vdev can
address up to 2^63 * SPA_MINBLOCKSIZE (512). A metaslab can address up to
2^47 * 2^vdev_ashift. Therefore we may need to increase the number of
metaslabs so that the maximum metaslab size is capped at the amount that
can be addressed by the spacemap. This should happen in
vdev_metaslab_set_size().
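
The arithmetic behind the cap can be worked through in a few lines. This is
only a sketch of the numbers in the paragraph above; the function names are
hypothetical and it does not reproduce what vdev_metaslab_set_size()
actually does.

```c
#include <assert.h>
#include <stdint.h>

#define	SM_OFFSET_BITS	47	/* "2^47" from the commit message */

/* Largest metaslab, in bytes, that 2^47 units of 2^ashift can address. */
static uint64_t
max_metaslab_bytes(unsigned ashift)
{
	assert(SM_OFFSET_BITS + ashift < 64);
	return (UINT64_C(1) << (SM_OFFSET_BITS + ashift));
}

/* Smallest metaslab count that keeps every metaslab within that cap. */
static uint64_t
min_metaslab_count(uint64_t vdev_bytes, unsigned ashift)
{
	uint64_t cap = max_metaslab_bytes(ashift);

	return (vdev_bytes / cap + (vdev_bytes % cap != 0));
}
```

For ashift 9 the cap is 2^56 bytes, so a hypothetical 2^57-byte vdev would
need at least two metaslabs.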

illumos/illumos-gate@b4bf0cf0458759c67920a031021a9d96cd683cfe

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Don Brady <don.brady@delphix.com>

MFC r337028: MFV r337027:
9328 zap code can take advantage of c99
9329 panic in zap_leaf_lookup() due to concurrent zapification

illumos/illumos-gate@bf26014c5541b6119f34e0d95294b7f2eb105ac2

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337025: MFV r337022:
9403 assertion failed in arc_buf_destroy() when concurrently reading block with checksum error

This assertion (VERIFY) failure was reported when reading a block. Turns out
the problem is that if we get an i/o error (ECKSUM in this case), and there
are multiple concurrent ARC reads of the same block (from different clones),
then the ARC will put multiple buf's on the same ANON hdr, which isn't
supposed to happen, and then causes a panic when we try to arc_buf_destroy()
the buf.

illumos/illumos-gate@fa98e487a9619b7902f218663be219e787a57dad

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337021: MFV r337020: 9443 panic when scrub a v10 pool

illumos/illumos-gate@bb1f424574ac8e08069d0ba993c2a41ffe796794

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r337017: MFV r337014:
9421 zdb should detect and print out the number of "leaked" objects
9422 zfs diff and zdb should explicitly mark objects that are on the deleted queue

illumos/illumos-gate@20b5dafb425396adaebd0267d29e1026fc4dc413

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>

MFC r337007: MFV r336991, r337001:
9102 zfs should be able to initialize storage devices

The first access to a disk block can incur a performance penalty on some
platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is recommended that
volumes be "thick provisioned", where supported by the platform (VMware).
Thick provisioning is time consuming and often is ignored. If the thick
provision step is omitted, customers will see suboptimal performance until
we have written to all parts of the LUN. ZFS should be able to initialize
any unused storage to remove any first-write penalty that exists.

illumos/illumos-gate@094e47e980b0796b94b1b8f51f462a64d246e516

Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>

MFC r336961:
MFV r336960: 9256 zfs send space estimation off by > 10% on some datasets

illumos/illumos-gate@df477c0afa111b5205c872dab36dbfde391656de

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>

MFC r336959: MFV r336958: 9337 zfs get all is slow due to uncached metadata

This project's goal is to make read-heavy channel programs and zfs(1m)
administrative commands faster by caching all the metadata that they will
need in the dbuf layer. This will prevent the data from being evicted, so
that any future call to, e.g., zfs get all won't have to go to disk (very
much).

illumos/illumos-gate@adb52d9262f45a04318fc6e188fe2b7f59d989a5

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r336956: MFV r336955: 9236 nuke spa_dbgmsg

We should use zfs_dbgmsg instead of spa_dbgmsg. Or at least,
metaslab_condense() should call zfs_dbgmsg because it's important and rare
enough to always log. It's possible that the message in zio_dva_allocate()
would be too high-frequency for zfs_dbgmsg.

illumos/illumos-gate@21f7c81cc1156e9202ce3412d3ecaa697c3b2222

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r336954:
MFV r336952: 9192 explicitly pass good_writes to vdev_uberblock/label_sync

Currently vdev_label_sync and vdev_uberblock_sync take a zio_t and assume
that its io_private is a pointer to the good_writes count. They should
instead accept this argument explicitly.
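
The interface change can be illustrated with a toy pair of functions. The
names and the struct below are hypothetical, not the real vdev_label_sync
or vdev_uberblock_sync signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of an I/O object with an opaque private pointer. */
struct mini_io {
	void *io_private;
};

/* Before: the callee silently assumes io_private points at the counter. */
static void
sync_implicit(struct mini_io *io)
{
	uint64_t *good_writes = io->io_private;

	assert(good_writes != NULL);
	(*good_writes)++;
}

/* After: the counter is an explicit, typed argument. */
static void
sync_explicit(struct mini_io *io, uint64_t *good_writes)
{
	(void)io;
	assert(good_writes != NULL);
	(*good_writes)++;
}
```

With the explicit form the compiler can check the argument type, and
io_private remains free for other uses.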

illumos/illumos-gate@a3b5583021b7b45676bf1f0cc68adf7a97900b56

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r336951: MFV r336950: 9290 device removal reduces redundancy of mirrors

Mirrors are supposed to provide redundancy in the face of whole-disk failure
and silent damage (e.g. some data on disk is not right, but ZFS hasn't
detected the whole device as being broken). However, the current device
removal implementation bypasses some of the mirror's redundancy.

illumos/illumos-gate@3a4b1be953ee5601bab748afa07c26ed4996cde6

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC r336949:
MFV r336948: 9112 Improve allocation performance on high-end systems

On high-end systems running async sequential write workloads, especially
NUMA systems with flash or NVMe storage, one significant performance
bottleneck is selecting a metaslab to do allocations from. This process
can be parallelized, providing significant performance increases for
these workloads.

illumos/illumos-gate@f78cdc34af236a6199dd9e21376f4a46348c0d56

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Paul Dagnelie <pcd@delphix.com>

MFC r336947: MFV r336946: 9238 ZFS Spacemap Encoding V2

The current space map encoding has the following disadvantages:
[1] Assuming a 512-byte sector size, each entry can represent at most 16MB for a segment.
This makes the encoding very inefficient for large regions of space.
[2] As vdev-wide space maps have started to be used by new features (i.e.
device removal, zpool checkpoint) we've started imposing limits in the
vdevs that can be used with them based on the maximum addressable offset
(currently 64PB for a top-level vdev).

The new encoding remains backwards compatible with the old one. The introduced
two-word entry format, besides extending the limits imposed by the single-entry
layout, also includes a vdev field and some extra padding after its prefix.

The extra padding after the prefix is reserved for future usage (e.g.
new prefixes for future encodings or new fields for flags). The new vdev field
not only makes the space maps more self-descriptive, but also opens the doors
for pool-wide space maps.

One final important note is that the number of bits used for vdevs is reduced
to 24 bits for blkptrs. That was decided as we don't know of any setups that
use more than 16M vdevs for the time being and we wanted to fit the vdev field
in the space map. In addition, that gives us
some extra bits in dva_t.
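
The limits quoted in this message imply specific field widths, which can be
checked with a little arithmetic. The sketch below derives them from the
stated figures (16MB per entry, 64PB per vdev, 16M vdevs) and assumes the
PB figure is binary; bits_for() is a hypothetical helper and nothing here is
taken from the on-disk format definition.

```c
#include <assert.h>
#include <stdint.h>

#define	SECTOR_BYTES	UINT64_C(512)
#define	MIB		(UINT64_C(1) << 20)
#define	PIB		(UINT64_C(1) << 50)

/* Smallest number of bits b such that 2^b >= units. */
static unsigned
bits_for(uint64_t units)
{
	unsigned bits = 0;

	assert(units <= (UINT64_C(1) << 63));
	while ((UINT64_C(1) << bits) < units)
		bits++;
	return (bits);
}
```

Under these assumptions, 16MB of 512-byte sectors needs 15 bits of run
length, 64PB of 512-byte sectors needs 47 bits of offset, and 24 bits of
vdev id cover 16M vdevs.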

illumos/illumos-gate@17f11284b49b98353b5119463254074fd9bc0a28

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>

MFC r336945: MFV r336944: 9286 want refreservation=auto

When a ZFS volume is created with zfs create -V (but without -s), the
refreservation property is set to a value that is volsize plus the maximum
size of metadata. If refreservation is ever set to another value, it is
impossible to set it back to the automatically determined value. There are
other cases where refreservation may be wrong. These include receiving a
volume that was sent without properties and zfs clone.

We need:

zfs set refreservation=auto <volume>
zfs clone -o refreservation=auto <volume>

Each one would use the same function used by zfs create -V to determine the
proper value for refreservation.

illumos/illumos-gate@1c10ae76c0cb31326c320e7cef1d3f24a1f47125

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Mike Gerdts <mike.gerdts@joyent.com>

MFC r336943:
MFV r336942: 9189 Add debug to vdev_label_read_config when txg check fails

illumos/illumos-gate@b6bf6e1540f30bd97b8d6e2c21d95e17841e0f23

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>

MFC r338682: lld: add -z interpose support

-z interpose sets the DF_1_INTERPOSE flag, marking the object as an
interposer.

PR: 230604
Relnotes: Yes
Sponsored by: The FreeBSD Foundation

MFC r338251:
Add an lld option to emit PC-relative relocations for ifunc calls.

MFC r328810 (by emaste):
ld.lld.1: miscellaneous style improvements

MFC r329002 (by emaste):
Update ld.lld.1 based on the version committed upstream

MFC r329003 (by emaste):
ld.lld.1: explain long options may use one or two dashes

MFC r336875:

audit(4): add tests for sysctl(3) and sysarch(2)

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D16116

MFC r336728:

Introduce test program for auditpipe(4)

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D16395

MFC r336613:

auditd(8): Log a better error when no hostname is set in audit_control

Cherry-pick from https://github.com/openbsm/openbsm/commit/01ba03b

Reviewed by: cem
Obtained from: OpenBSM
Pull Request: https://github.com/openbsm/openbsm/pull/38

MFC r335792, r336564, r336579

r335792:
audit(4): add tests for several more administrative syscalls

Includes ntp_adjtime, auditctl, acct, auditon, and clock_settime.  Includes
quotactl, mount, nmount, swapon, and swapoff in failure mode only.  Success
tests for those syscalls will follow.  Also includes reboot(2) in failure
mode only.  That one can't be tested in success mode.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15898

r336564:
Separate the audit(4) tests for auditon(2)'s individual commands

auditon(2) is an ioctl-like syscall with several different variants, each of
which has a distinct audit event.  Write separate audit(4) tests for each
variant.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D16255

r336579:
audit(4): add more test cases for auditon(2)

auditon(2) is an ioctl-like syscall with several different variants, each of
which has a distinct audit event.  This commit tests the remaining variants
that weren't tested in r336564.

Submitted by: aniketp
X-MFC-With: 336564
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D16381

MFC r335319, r335354, r335374

r335319:
audit(4): add tests for send, recv, sendto, and recvfrom

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15869

r335354:
audit(4): add tests for ioctl(2)

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15872

r335374:
audit(4): add tests for utimes(2) and friends, mprotect, and undelete

Includes utimes(2), futimes(2), lutimes(2), futimesat(2), mprotect(2), and
undelete(2). undelete, for now, is tested only in failure mode.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15893

MFC r335261, r335275, r335284-r335285, r335294, r335318, r335320, r335703

r335261:
audit(4): add tests for pathconf(2) and friends

pathconf, lpathconf, and fpathconf are included

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15842

r335275:
audit(4): add tests for chflags and friends

chflags, fchflags, and lchflags (but not chflagsat) are included.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15854

r335284:
audit(4): add tests for extattr_get_file(2) and friends

This commit includes extattr_{get_file, get_fd, get_link, list_file,
list_fd, list_link}. It does not include any syscalls that modify, set, or
delete extended attributes, as those are in a different audit class.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15859

r335285:
audit(4): Add tests for a few syscalls in the ad class

The ad audit class is for administrative commands. This commit adds tests
for settimeofday, adjtime, and getfh.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15861

r335294:
audit(4): add tests for connect, connectat, and accept

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15853

r335318:
audit(4): add tests for extattr_set_file and friends

Includes extattr_{set_file, set_fd, set_link, delete_file, delete_fd,
delete_link}

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15867

r335320:
audit(4): Add tests for {get/set}auid, {get/set}audit, {get/set}audit_addr

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15871

r335703:
audit(4): fix Coverity issues

Fix several incorrect buffer size arguments and a file descriptor leak.

Submitted by: aniketp
Reported by: Coverity
CID: 1393489 1393501 1393509 1393510 1393514 1393515 1393516
CID: 1393517 1393518 1393519
X-MFC-With: 335284
X-MFC-With: 335318
X-MFC-With: 335320
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D16000

MFC many audit(4) tests.

MFC r334471, r334487, r334496, r334592, r334668, r334933, r335067, r335105,
r335136, r335140, r335145, r335207-r335208, r335215, and r335255-r335256.

r334471:
audit(4): Add tests for the fr class of syscalls

readlink and readlinkat are the only syscalls in this class.  open and
openat are as well, but they'll be handled in a different file.  Also, tidy
up the copyright headers of recently added files in this area.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15636

r334487:
audit(4): Add tests for the fw class of syscalls.

truncate and ftruncate are the only syscalls in this class, apart from
certain variations of open and openat, which will be handled in a different
file.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15640

r334496:
audit(4): add tests for the fd audit class

The only syscalls in this class are rmdir, unlink, unlinkat, rename, and
renameat.  Also, set is_exclusive for all audit(4) tests, because they can
start and stop auditd.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15647

r334592:
audit(4): add tests for the cl audit class

The only syscalls in this class are close, closefrom, munmap, and revoke.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15650

r334668:
audit(4): add tests for open(2) and openat(2)

These syscalls are atypical, because each one corresponds to several
different audit events, and they each pass several different audit class
filters.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15657

r334933:
audit(4): add tests for stat(2) and friends

This revision adds auditability tests for stat, lstat, fstat, and fstatat,
all from the fa audit class.  More tests from that audit class will follow.

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15709

r335067:
audit(4): Fix file descriptor leaks in ATF tests

Submitted by: aniketp
Reported by: Coverity
CID: 1393343 1393346 1392695 1392781 1391709 1392078 1392413
CID: 1392014 1392521 1393344 1393345 1393347 1393348 1393349
CID: 1393354 1393355 1393356 1393357 1393358 1393360 1393362
CID: 1393368 1393369 1393370 1393371 1393372 1393373 1393376
CID: 1393380 1393384 1393387 1393388 1393389
Sponsored by: Google, Inc (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15782

r335105:
audit(4): add tests for statfs(2), fstatfs(2), and getfsstat(2)

Submitted by: aniketp
Sponsored by: Google, Inc (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15750

r335136:
audit(4): add tests for flock, fcntl, and fsync

Submitted by: aniketp
Sponsored by: Google, Inc (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15795

r335140:
audit(4): fix typo from r335136

Typo in Makefile accidentally disabled some older tests

X-MFC-With: 335136

r335145:
audit(4): add tests for fhopen, fhstat, and fhstatfs

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15798

r335207:
audit(4): add tests for access(2), chmod(2), and friends

access(2), eaccess(2), faccessat(2), chmod(2), fchmod(2), lchmod(2), and
fchmodat(2).

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15805
Differential Revision: https://reviews.freebsd.org/D15808

r335208:
audit(4): improve formatting in tests/sys/audit/open.c

[skip ci]

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15797

r335215:
audit(4): Add a few tests for network-related syscalls

Add tests for socket(2), socketpair(2), and setsockopt(2)

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15803

r335255:
audit(4): add tests for bind(2), bindat(2), and listen(2)

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15843

r335256:
audit(4): add tests for chown(2) and friends

Includes chown, fchown, lchown, and fchownat

Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15825

MFC r334360, r334362, r334388, r334395

r334360:
Add initial set of tests for audit(4)

This change includes the framework for testing the auditability of various
syscalls, and includes changes for the first 12. The tests will start
auditd(8) if needed, though they'll be much faster if it's already running.
The syscalls tested in this commit include mkdir(2), mkdirat(2), mknod(2),
mknodat(2), mkfifo(2), mkfifoat(2), link(2), linkat(2), symlink(2),
symlinkat(2), rename(2), and renameat(2).

Submitted by: aniketp
Sponsored by: Google, Inc (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15286

r334362 by emaste:
Temporarily disconnect audit tests

Audit tests added in r334360 broke the build on a number of archs.
Remove the subdir from the top level tests/sys/Makefile until they're
fixed.

r334388:
Fix OpenBSM with GCC with -Wredundant-decls

Upstream change ed47534 consciously added some redundant functional
declarations, and I'm not sure why. AFAICT they were never required. On
FreeBSD, they break the build with GCC (but not Clang) for any program
including libbsm.h with WARNS=6.

Fix by cherry-picking upstream change
https://github.com/openbsm/openbsm/commit/0553c27

Reported by: emaste
Reviewed by: cem
Obtained from: OpenBSM
Pull Request: https://github.com/openbsm/openbsm/pull/31

r334395:
Revert r334362

Reconnect tests/sys/audit now that the GCC issue is fixed by 334388

X-MFC-With: 334362, 334360, 334388

MFC r338932:
Fix some uses of dmaplimit.

Document EN-18:09 through EN-18:12.

Sponsored by: The FreeBSD Foundation

MFC r337222:

Fix LOCAL_PEERCRED with socketpair(2)

Enable the LOCAL_PEERCRED socket option for unix domain stream sockets
created with socketpair(2). Previously, it only worked with unix domain
stream sockets created with socket(2)/listen(2)/connect(2)/accept(2).

PR: 176419
Reported by: Nicholas Wilson <nicholas@nicholaswilson.me.uk>
Differential Revision: https://reviews.freebsd.org/D16350

MFC r309554 and r309631, which break down an overly long monolithic
source file and reduce duplication by auto-generating functions
that only differ in the value of the SCM_XXX constant used.

This also fixes unintentional breakage introduced by the earlier
MFC in r338617, which happens to rely on some of those changes.

Reported by: asomers
Pointy-hat goes to: sobomax

MFC r338216:

tftpd: Fix data corruption bug with netascii

Transferring files in netascii format requires, among other things,
translating all CR characters to a CR,NUL pair. tftpd does this correctly
except when the CR occurs as the last octet of a packet. In that case, it
erroneously drops the NUL which should be part of the following packet. The
bug was caused by using 0 as a sentinel value in a variable that could
legitimately hold 0. Fix it by switching the sentinel value to -1.
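
The sentinel problem can be sketched as follows. This is not the tftpd
source; it is a minimal encoder with hypothetical names (netascii_state,
netascii_init, netascii_fill) showing why the "nothing pending" marker must
be a value such as -1 that can never be a real octet, since the pending
octet after a CR is NUL (0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoder state carried from one packet to the next. */
struct netascii_state {
	int pending;	/* octet still owed to the output, or -1 for none */
};

static void
netascii_init(struct netascii_state *st)
{
	assert(st != NULL);
	st->pending = -1;	/* 0 would be ambiguous: NUL is a valid pending octet */
}

/*
 * Fill one packet of at most `cap` octets from `src`.  LF becomes CR,LF
 * and CR becomes CR,NUL.  Stores the number of source bytes used in
 * *consumed and returns the number of octets written to `out`.
 */
static size_t
netascii_fill(struct netascii_state *st, const unsigned char *src,
    size_t len, size_t *consumed, unsigned char *out, size_t cap)
{
	size_t in = 0, n = 0;

	while (n < cap) {
		if (st->pending != -1) {
			/* Emit the second half of a pair, even across packets. */
			out[n++] = (unsigned char)st->pending;
			st->pending = -1;
			continue;
		}
		if (in == len)
			break;
		unsigned char c = src[in++];
		if (c == '\n') {
			out[n++] = '\r';
			st->pending = '\n';
		} else if (c == '\r') {
			out[n++] = '\r';
			st->pending = '\0';
		} else {
			out[n++] = c;
		}
	}
	*consumed = in;
	return (n);
}
```

When a CR lands in the last slot of a packet, pending holds 0 and the next
call emits the NUL first. With 0 as the sentinel, that state would look the
same as "nothing pending" and the NUL would be lost.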

PR: 178055
Reported by: Richard <rsitze@gmail.com>
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D16853

MFC r337973:

Add Modbus Application Protocol to /etc/services

IANA reassigned ports 502 and 802 on 2014-06-10

PR: 213276
Submitted by: Mark.Martinec@ijs.si

MFC r337911:

Fix the sys/opencrypto/runtests test when aesni(4) is already loaded

Apparently kldstat requires the full module name, including busname

Reported by: Jenkins

MFC r337779:

tftp: Close a resource leak when putting files

Reported by: Coverity
CID: 1394842

MFC r337482:

Bring VOP_LOOKUP(9) up to date

* Remove the cn_hash field (removed by r51906)
* Add the cn_lkflags field (added by r144285)
* Remove duplicate definition of cnp.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D16629

MFC r336871, r336874

r336871:
getrusage(2): fix return value under 32-bit emulation

According to the man page, getrusage(2) should return EFAULT if the rusage
argument lies outside of the process's address space. But due to an
oversight in r100384, that's never been the case during 32-bit emulation.
Fix it.

PR: 230153
Reported by: tests(7)
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D16500

r336874:
freebsd32_getrusage(2): skip freebsd32_rusage_out on error

PR: 230153
Reported by: kib
X-MFC-With: 336871
Differential Revision: https://reviews.freebsd.org/D16500

MFC r336605:

Fix multiple Coverity warnings in tftpd(8)

* Initialize uninitialized variable (CID 1006502)
* strcpy => strlcpy (CID 1006792, 1006791, 1006790)
* Check function return values (CID 1009442, 1009441, 1009440)
* Delete dead code in receive_packet (not reported by Coverity)
* Remove redundant alarm(3) in receive_packet (not reported by Coverity)

Reported by: Coverity
CID: 1006502, 1006792, 1006791, 1006790, 1009442, 1009441, 1009440
Differential Revision: https://reviews.freebsd.org/D11287

MFC r336594:

Fix tmpfs detection in the sys/fs/tmpfs tests

This code was originally written for NetBSD. r306031 tried to adapt it to
FreeBSD, but didn't correctly handle the case that tmpfs was available, but
not already loaded. Fix the logic to load the module if necessary. The
tmpfs tests shouldn't be skipped anymore.

Also, fix a comment that was dislocated by r306031.

Reported by: Jenkins

MFC r336587:

tftpd(8): when completing a WRQ, flush the file before acknowledging receipt

tftpd(8) should flush a newly written file to disk before ACKing the final DATA
packet.  Otherwise there is a narrow race window when a subsequent read may not
see the file.  This is somewhat related to r330710, but the race window is much
smaller.  Hopefully this will fix the intermittent tests in Jenkins.

Reported by: Jenkins

MFC r336582:

makefs(8): add test case for PR 229929

Fix two failing makefs test cases by adding "-M 1m", which was already used
for every other FFS test case. Add a new test case for the underlying
issue: with no -M, -m, or -s options, makefs can underestimate image size.

PR: 229929
Reported by: Jenkins

MFC r315411 (mmel):
Unbreak traceroute on system built without CAPSICUM

MFC r313168 (by pkelsey):
  Fix VIMAGE-related bugs in TFO.  The autokey callout vnet context was
  not being initialized, and the per-vnet fastopen context was only
  being initialized for the default vnet.

  PR: 216613

MFC r338890:
  Update ifr_name before invoking the IPSECSREQID ioctl. This fixes the
  case when the `ifconfig ipsec create reqid N` command is invoked without
  an interface unit number. The "name" global variable is updated after
  interface cloning in ifclonecreate() and contains the actual interface name.

  Reported by: lev

MFC r336165:

Removed pointless NULL check in rip_pcblist.

Sponsored by: Multiplay

MFC r334844, r336180, r336458

r334844

This originated from ZFS On Linux, as
https://github.com/zfsonlinux/zfs/commit/d4a72f23863382bdf6d0ae33196f5b5decbc48fd

During scans (scrubs or resilvers), it sorts the blocks in each transaction
group by block offset; the result can be a significant improvement. (On my
test system, where I have put some effort into fragmenting the pool since
setting it up yesterday, a scrub went from 1h2m to 33.5m with the changes.)
I've seen similar ratios on production systems.
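
The core idea, issuing I/O in offset order rather than discovery order, can
be illustrated with a plain sort. This is only a sketch: the struct and
function names are hypothetical, and the actual scan code is considerably
more involved than a qsort(3) call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block queued for scrubbing. */
struct scan_block {
	uint64_t offset;	/* on-disk offset of the block */
	uint64_t size;		/* length in bytes */
};

/* Order blocks by ascending offset; comparisons avoid overflow. */
static int
scan_block_cmp(const void *a, const void *b)
{
	const struct scan_block *x = a, *y = b;

	return ((x->offset > y->offset) - (x->offset < y->offset));
}

/* Sort the queued blocks so they can be read sequentially. */
static void
scan_sort_by_offset(struct scan_block *blocks, size_t n)
{
	assert(blocks != NULL || n == 0);
	if (n > 1)
		qsort(blocks, n, sizeof(blocks[0]), scan_block_cmp);
}
```

Reading the sorted array front to back turns scattered reads into a mostly
sequential pass over the device.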

r336180

Fix up some missed and mis-merges from the sequential scan code
(r334844). Most of the changes involve moving some code around to
reduce conflicts with future merges. One of the missing changes
included a notification on scrub cancellation.

r336458

Fix a couple of typos in r334844 noticed by Richard Kojedzinszky

Approved by: mav
Sponsored by: iXsystems, Inc

MFC r338913: Fix use-after-free in RAID0 error reporting of GEOM_RAID.

MFC r338655:

[ig4] Update list of supported hardware

Reflect the fact that ig4(4) is not an Intel-specific device but
a driver for Synopsys DesignWare I2C controller that now ships in
AMD systems too.

Approved by: re (kib), rpokala

MFC r338654, r338701

r338654:
[ig4] Add PCI IDs for I2C controller on Intel Kaby Lake systems

PR: 221777
Approved by: re (kib)
Submitted by: marc.priggemeyer@gmail.com

r338701:
[ig4] Fix device description for Kaby Lake systems

Kaby Lake I2C controller is Intel Sunrise Point-H not Intel Sunrise Point-LP.

Submitted by: Dmitry Luhtionov
Approved by: re (kib)

MFC r338111, r338215

r338111:
[ig4] add ACPI Device HID for AMD platforms

Added ACPI Device HID AMDI0010 for the designware I2C controllers in
future AMD platforms. Also, when verifying the component version, check
for a minimal value instead of an exact match.

PR: 230641
Submitted by: Rajesh <rajfbsd@gmail.com>
Reviewed by: cem, gonzo
Differential Revision: https://reviews.freebsd.org/D16670

r338215:
[ig4] Fix I/O timeout issue with Designware I2C controller on AMD platforms

Due to a hardware limitation, the AMD I2C controller can't trigger a
pending interrupt if the interrupt status has changed after clearing the
interrupt status bits. So, I2C will lose the interrupt and I/O will time
out. Implement a workaround to disable the I2C controller interrupt and
re-enable it before exiting the interrupt handler.

Submitted by: rajfbsd@gmail.com
Differential Revision: https://reviews.freebsd.org/D16720

MFC r336050-r336051, r336142, r336326, r337719

r336050:
ig4(4): add support for Apollo Lake I2C controllers

Add PCI ids for I2C controllers on Apollo Lake platform. Also convert
switch/case probe logic into a table.

Reviewed by: avg
Differential Revision: https://reviews.freebsd.org/D16120

r336051:
ig4(4): Fix Apollo lake entries platform identifier

Identify Apollo Lake controllers as IG4_APL and not as IG4_SKYLAKE

Reported by: rpokala@

r336142:
ig4(4): add devmatch(8) PNP info

Now that we have all device IDs in a table, add the MODULE_PNP_INFO macro
to let devmatch autoload the module

r336326:
Remove MODULE_PNP_INFO for ig4(4) driver

ig4(4) does not support suspend/resume but is present on hardware where
such functionality is critical, like laptops. Remove PNP info to avoid
breaking suspend/resume on systems where loading ig4(4) is not explicitly
requested by the user.

PR: 229791
Reported by: Ali Abdallah

r337719:
[ig4] Fix initialization sequence for newer ig4 chips

Newer chips may require assert/deassert after power down for proper
startup. Check the respective flag in DEVIDLE_CTRL and perform the
operation if necessary.

PR: 221777
Submitted by: marc.priggemeyer@gmail.com
Obtained from: DragonFly BSD
Tested on: Thinkpad T470

MFC r336310:
  Let geli deal with lost devices without crashing.

  PR:           162036
  Submitted by: Fabian Keil <fk@fabiankeil.de>
  Obtained from:        ElectroBSD
  Discussed with: pjd@

MFC r338942:
Add PCIV_INVALID definition

From PCI Spec rev 2.2, 6.2.1. Device Identification:
Vendor ID: This field identifies the manufacturer of the device. Valid
vendor identifiers are allocated by the PCI SIG to ensure uniqueness.
0FFFFh is an invalid value for Vendor ID.
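
A minimal sketch of how such a definition might be used. The value 0xFFFF
comes from the spec text quoted here; the helper pci_vendor_is_valid() is
hypothetical and is not claimed to be part of this change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	PCIV_INVALID	0xFFFF	/* invalid Vendor ID per the quoted spec text */

/* The Vendor ID register is 16 bits wide, so the marker must fit in it. */
static_assert(PCIV_INVALID <= UINT16_MAX, "Vendor ID is a 16-bit field");

/* Hypothetical helper: true unless the ID is the reserved invalid value. */
static bool
pci_vendor_is_valid(uint16_t vendor)
{
	return (vendor != PCIV_INVALID);
}
```

A named constant lets callers write vendor != PCIV_INVALID instead of
repeating the literal 0xffff.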

Approved by: kib (mentor)
Sponsored by: Mellanox Technologies

MFC r338892:
Correct panic messages.

MFC r336017,r338799

r336017
This exposes ZFS user and group quotas via the normal
quotactl(2) mechanism. (Read-only at this point, however.)
In particular, this is to allow rpc.rquotad to query quotas
for NFS mounts, allowing users to see their quotas on the
hosts using the datasets.

The changes specifically:

* Add new RPC entry points for querying quotas.
* Change the library routines to allow non-UFS quotas.
* Change rquotad to check for quotas on mounted filesystems,
rather than being limited to entries in /etc/fstab.
* Lastly, add a VFS entry-point for ZFS to query quotas.

Note that this makes one unavoidable behavioural change: if quotas
are enabled, then they can be queried, as opposed to the current
method of checking for quotas being specified in fstab. (With
ZFS, if there are user or group quotas, they're used, always.)

r338799
Author: kib

Fix ZFS VFS op quotactl to follow busy protocol.

Approved by: mav
Sponsored by: iXsystems, inc

MFC r338827:
Sync libarchive with vendor.

Relevant vendor changes:
PR #1019: Add allocation check for the zip_entry struct
Oss-Fuzz #10192: Handle whitespace-only ACL fields correctly

MFC r337673: Add an overview section to bus_dma.9.

Describe the role of tags and mapping objects as abstractions.
Describe static vs dynamic transaction types and give a brief overview
of the set of functions and object life cycles used for static vs
dynamic.

While here, fix a few other typos and expand a bit on parent tags.

MFC r338473: sh: Fix formal overflow in pointer arithmetic

The intention is to lower the value of the pointer, which according to ubsan
cannot be done by adding an unsigned quantity.
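
The distinction can be sketched as below. The message does not show the
expression that was changed in sh, so ptr_back() is a hypothetical helper:
it lowers a pointer by subtracting the count, rather than adding a negated
value converted to an unsigned type, which is the form ubsan reports as
overflow.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Move `p` back by `n` elements.  The caller must ensure that the result
 * still points into the same object as `p`.
 */
static char *
ptr_back(char *p, size_t n)
{
	assert(p != NULL);
	/* Subtract directly; do not write p + (size_t)-n. */
	return (p - n);
}
```

Both forms usually compile to the same machine code, but only the
subtraction is well defined as pointer arithmetic.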

Reported by: kevans

MFC r338857:
Fix possible NULL pointer dereference in ffec_alloc_mbufcl().

PR: 231514

Check to ensure the buffer returned is not NULL.

Direct commit to the branch as this behavior is only seen in stable/11.

Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: wes@
Approved by: so
Security: FreeBSD-EN-18:10.syscall
Security: CVE-2018-17154