
e1000: correctly set isc_pause_frames only when XOFF increases

From Jake:
The e1000 driver sets the iflib shared context isc_pause_frames value to
the number of received xoff frames. This is done so that the iflib
watchdog timer won't trigger a Tx Hang due to pause frames.

Unfortunately, the function simply sets it to the value of the xoffrxc
counter. Once the device has received a single XOFF packet, the driver
always reports that we received pause frames. This will prevent the Tx
hang detection entirely from that point on.

Fix this by assigning isc_pause_frames to a non-zero value if we
received any XOFF packets in the last timer interval.

We could attempt to calculate the total number of received packets by
doing a subtraction, but the iflib stack only seems to check if
isc_pause_frames is non-zero.
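
The per-interval check described above can be sketched as follows. This is a minimal illustration, not the driver code: `struct pause_state`, `update_pause_frames()`, and the assumption that the hardware counter is clear-on-read (so each call receives a delta) are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for driver state; field names are illustrative. */
struct pause_state {
	uint64_t xoffrxc_total;   /* accumulated XOFF frames received */
	uint32_t pause_frames;    /* value the watchdog would inspect */
};

/*
 * Called once per timer interval with the number of XOFF frames counted
 * since the previous call (assumed to be a clear-on-read delta).
 */
static void
update_pause_frames(struct pause_state *st, uint32_t xoff_delta)
{
	uint64_t prev = st->xoffrxc_total;

	st->xoffrxc_total += xoff_delta;
	/* Non-zero only when the counter grew during this interval. */
	st->pause_frames = (st->xoffrxc_total != prev) ? 1 : 0;
}
```

With this shape, an interval without new XOFF frames reports zero again, so hang detection is only suppressed while pause frames are actually arriving.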

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21868

Ensure lld respects the WITH/WITHOUT_SHARED_TOOLCHAIN option

Traditionally, toolchain components such as cc, as, and ld have been
built as static executables. The WITH_SHARED_TOOLCHAIN option from
src.conf(5) is meant to link these as regular executables, e.g. using
shared libraries.

The build of ld.lld did not yet check this option. Fix the Makefile so
it will do so now.

Reported by: Mike Cui <cuicui@gmail.com>
PR: 241257
MFC after: 3 days

do_link_state_change() is executed in taskqueue context and in
general is allowed to sleep. Don't enter the epoch for the
whole duration. If some event handlers need the epoch, they
should handle that themselves.

Discussed with: hselasky

Update some comments; no functional changes. Some historical old comments
in this driver indicate that the SD_CAPA register is write-once and after
being set one time the values in it cannot be changed. That turns out not
to be the case -- the values written to it survive a reset, but they can
be rewritten/changed at any time.

Revert r351218 (by manu).  While the changes in r351218 appear to be (and
should be) correct, they lead to the eMMC on a Beaglebone failing to work
in some situations.

The TI sdhci hardware is kind of strange.  The first device inherently
supports 1.8v and 3.3v and the ability to switch between them, and the
other two devices must be set to 1.8v in the sdhci power control register to
operate correctly, but doing so actually makes them run at 3.3v (unless an
external level-shifter is present in the signal path).  Even the 1.8v on the
first device may actually be 3.3v (or any other value), depending on what
voltage is fed to the VDDS1-VDDS7 power supply pins on the am335x chip.

Another strange quirk is that the convention for am335x sdhci drivers in
linux and uboot and the am335x boot ROM seems to be to set the voltage in
the sdhci capabilities register to 3.0v even though the actual voltage is
3.3v.  Why this is done is a complete mystery to me, but it seems to be
required for correct operation.

If we had complete modern support for the am335x chip we could get the
actual voltages from the FDT data and the regulator framework.  But our
am335x code currently doesn't have any regulator framework support.
Reverting to the prior code will get the popular Beaglebone boards working
again.

This is part of the fix for PR 241301, but also requires r353651 for a
complete fix.

PR: 241301
Discussed with: manu

Relax the sdhci(4) check that filters out the 1.8v voltage option unless
the slot is flagged as 'embedded'.

The features related to embedded and shared slots were added in v3.0 of
the sdhci spec.  Hardware prior to v3 sometimes supported 1.8v on non-
removable devices in embedded systems, but had no way to indicate that
via the standard sdhci registers (instead they use out of band metadata
such as FDT data).

This change adds the controller specification version to the check for
whether to filter out the 1.8v selection.  On older hardware, the 1.8v
option is allowed to remain.  On 3.0 or later it still requires the
embedded-slot flag to remain.
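
A rough sketch of the relaxed check, under stated assumptions: the capability bits, `struct slot_info`, and `filter_voltages()` are hypothetical names for illustration and are not the sdhci(4) identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; not the sdhci(4) register definitions. */
#define CAP_VOLT_1_8   (1u << 0)
#define CAP_VOLT_3_0   (1u << 1)
#define CAP_VOLT_3_3   (1u << 2)

struct slot_info {
	unsigned spec_major;   /* controller specification version, e.g. 2 or 3 */
	bool     embedded;     /* slot is flagged as embedded */
	uint32_t volt_caps;    /* advertised voltage options */
};

static uint32_t
filter_voltages(const struct slot_info *slot)
{
	uint32_t caps = slot->volt_caps;

	/* v3.0 and later can express "embedded", so 1.8v requires that flag. */
	if (slot->spec_major >= 3 && !slot->embedded)
		caps &= ~CAP_VOLT_1_8;
	/* Older hardware cannot express it; the 1.8v option is left alone. */
	return (caps);
}
```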

This is part of the fix for PR 241301 (eMMC not detected on Beaglebone).
Changes to the sdhci_ti driver are also needed for a full fix.

PR: 241301

Clear PGA_WRITEABLE in moea_pvo_remove().

moea_pvo_remove() might remove the last mapping of a page, in which case
it is clearly no longer writeable. This can happen via pmap_remove(),
or when a CoW fault removes the last mapping of the old page.

Reported and tested by: bdragon
Reviewed by: alc, bdragon, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22044

fix section number in zfs-program.8

MFC after: 3 days

attach itwd to the module build on x86

MFC after: 19 days
X-MFC with: r353647

itwd(4): driver for watchdog function in ITE Super I/O chips

The chips are commonly named with "IT" prefix.

MFC after: 19 days

bectl(8): destroy: use BE_DESTROY_AUTOORIGIN if -o is not specified

-o will force the origin to be destroyed unconditionally.
BE_DESTROY_AUTOORIGIN, on the other hand, will only destroy the origin if it
matches the format used by be_snapshot. This lets us clean up the snapshots
that are clearly not user-managed (because we're creating them) while
leaving user-created snapshots in place and warning that they're still
around when the BE created goes away.

wbwd: move to superio(4) bus

This allows removing a bunch of low-level code.
Also, superio(4) provides safer interaction with other drivers
that work with Super I/O configuration registers.

Tested only on PCengines APU2:
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
wbwd0: <Nuvoton NCT6102 (0xc4/0x53) Watchdog Timer> at WDT ldn 0x08 on superio0

The watchdog output is incorrectly wired on that system and the watchdog
does not really do its job, but the pulse can be seen with a signal
analyzer.

Reviewed by: delphij, bcr (man page)
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21979

libbe(3): add needed bits for be_destroy to auto-destroy some origins

New BEs can be created from either an existing snapshot or an existing BE.
If an existing BE is chosen (either implicitly via 'bectl create' or
explicitly via 'bectl create -e foo bar', for instance), then bectl will
create a snapshot of the current BE or "foo" with be_snapshot, with a name
formatted like: strftime("%F-%T") and a serial added to it.

This commit adds the needed bits for libbe or consumers to determine if a
snapshot name matches one of these auto-created snapshots (with some light
validation of the date/time/serial), and also a be_destroy flag to specify
that the origin should be automatically destroyed if possible.
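
As an illustration of the "light validation" idea, the sketch below checks a name against the layout produced by strftime("%F-%T") followed by a serial, i.e. YYYY-MM-DD-HH:MM:SS-N. The function name `looks_like_auto_snapshot()` and the exact range checks are assumptions for this example; this is not the libbe implementation.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Value of two characters already known to be decimal digits. */
static int
two_digits(const char *s)
{
	return ((s[0] - '0') * 10 + (s[1] - '0'));
}

static bool
looks_like_auto_snapshot(const char *name)
{
	/* 'd' means "any digit"; every other character must match exactly. */
	static const char layout[] = "dddd-dd-dd-dd:dd:dd-";
	size_t len = sizeof(layout) - 1;

	/* Need the full timestamp plus at least one serial digit. */
	if (strlen(name) <= len)
		return (false);
	for (size_t i = 0; i < len; i++) {
		if (layout[i] == 'd') {
			if (!isdigit((unsigned char)name[i]))
				return (false);
		} else if (name[i] != layout[i]) {
			return (false);
		}
	}

	/* Light range checks on the date and time fields. */
	int mon = two_digits(name + 5), day = two_digits(name + 8);
	int hour = two_digits(name + 11), min = two_digits(name + 14);
	int sec = two_digits(name + 17);
	if (mon < 1 || mon > 12 || day < 1 || day > 31)
		return (false);
	if (hour > 23 || min > 59 || sec > 60)
		return (false);

	/* The serial must be digits only, with nothing trailing. */
	for (const char *p = name + len; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p))
			return (false);
	}
	return (true);
}
```

A name such as "2019-10-16-12:34:56-1" passes, while a user-chosen name such as "before-upgrade" does not, which is the distinction the auto-destroy flag relies on.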

A future commit to bectl will specify BE_DESTROY_AUTOORIGIN by default so we
clean up the origin in the most common case, non-user-managed snapshots.

move nctgpio to superio(4) bus

This is where it logically belongs.
The change allows dropping a bunch of low-level code.

Reviewed by: gonzo
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21980

dwc3: Use a pair of ()'s around arguments for some macros

Reported by: hselasky
MFC after: 1 week
X-MFC-With: r353533

Use tables to store the information to decode the arm64 ID registers.

Arm updates these with each new architecture revision. To help keep them
updated use a collection of tables to hold the needed information to
decode these registers.
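
A table-driven decoder could look roughly like the sketch below. The struct layout and function names are hypothetical and much simpler than what the kernel would need; the field positions listed are those of a few ID_AA64ISAR0_EL1 fields and serve only as sample data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One bit field of an ID register: its name and position. */
struct id_field {
	const char *name;
	unsigned    shift;
	unsigned    width;
};

/* Sample entries for ID_AA64ISAR0_EL1; extend the table per revision. */
static const struct id_field isar0_fields[] = {
	{ "AES",     4, 4 },
	{ "SHA1",    8, 4 },
	{ "SHA2",   12, 4 },
	{ "CRC32",  16, 4 },
	{ "Atomic", 20, 4 },
};

/* Extract one field from a raw register value. */
static uint64_t
id_field_value(uint64_t reg, const struct id_field *f)
{
	return ((reg >> f->shift) & ((UINT64_C(1) << f->width) - 1));
}

/* Look a field up by name; returns -1 when the table has no such entry. */
static int
id_lookup(const struct id_field *tbl, size_t n, const char *name,
    uint64_t reg)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) == 0)
			return ((int)id_field_value(reg, &tbl[i]));
	}
	return (-1);
}
```

Supporting a new architecture revision then means adding rows to a table rather than writing new decoding code.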

Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22020

Stop leaking information from the kernel through timespec

The timespec struct holds a seconds value in a time_t and a nanoseconds
value in a long. On most architectures these are the same size, however
on 32-bit architectures other than i386 time_t is 8 bytes and long is
4 bytes.

Most ABIs will then pad a struct holding an 8 byte and 4 byte value to
16 bytes with 4 bytes of padding. When copying one of these structs the
compiler is free to copy the padding if it wishes.

In this case the padding may contain kernel data that is then leaked to
userspace. Fix this by copying the timespec elements rather than the
entire struct.
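
The idea of copying members instead of the whole object can be illustrated in userland as below. This is a sketch under assumptions: `struct ts_sketch` imitates a timespec with an 8-byte seconds field and a 4-byte nanoseconds field, and `ts_export()` writes into a byte buffer rather than assigning struct members as the kernel change does.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a timespec on an ABI with 64-bit time_t and 32-bit long. */
struct ts_sketch {
	int64_t tv_sec;
	int32_t tv_nsec;
	/* most ABIs add 4 bytes of tail padding here */
};

#define TS_WIRE_SIZE 16

/*
 * Fill a caller-visible buffer member by member. Bytes that do not belong
 * to a member are zeroed, so nothing from the source's padding is copied.
 */
static void
ts_export(unsigned char out[TS_WIRE_SIZE], const struct ts_sketch *src)
{
	memset(out, 0, TS_WIRE_SIZE);
	memcpy(out, &src->tv_sec, sizeof(src->tv_sec));
	memcpy(out + sizeof(src->tv_sec), &src->tv_nsec, sizeof(src->tv_nsec));
}
```

A plain memcpy() of the whole struct would instead carry whatever happened to be in the padding bytes along with the two values.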

This doesn't affect Tier-1 architectures so no SA is expected.

admbugs: 651
MFC after: 1 week
Sponsored by: DARPA, AFRL

bsd.compat.mk isn't set up to be included outside of Makefile.inc, so comment it
out here until that's sorted out. Otherwise the build is broken when
TARGET_ARCH isn't defined.

MFV r353637: 10844 Serialize ZTHR operations to eliminate races

illumos/illumos-gate@6a316e1f6d32750bb8fcf2558dcb17b90ca580fd
https://github.com/illumos/illumos-gate/commit/6a316e1f6d32750bb8fcf2558dcb17b90ca580fd

https://www.illumos.org/issues/10844
ZoL 61c3391acc9 Serialize ZTHR operations to eliminate races

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
Obtained from: illumos, ZoL
MFC after: 3 weeks

10844 Serialize ZTHR operations to eliminate races

illumos/illumos-gate@6a316e1f6d32750bb8fcf2558dcb17b90ca580fd
https://github.com/illumos/illumos-gate/commit/6a316e1f6d32750bb8fcf2558dcb17b90ca580fd

https://www.illumos.org/issues/10844
ZoL 61c3391acc9 Serialize ZTHR operations to eliminate races

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>

MFV r353630: 10809 Performance optimization of AVL tree comparator functions

illumos/illumos-gate@c4ab0d3f46036e85ad0700125c5a83cc139f55a3
https://github.com/illumos/illumos-gate/commit/c4ab0d3f46036e85ad0700125c5a83cc139f55a3

https://www.illumos.org/issues/10809
  Port ZoL ee36c709c3d Performance optimization of AVL tree comparator functions

This is a followup to r337567 that imported the ZoL commit directly into
FreeBSD.  It seems that at the time we did not have some of the earlier
changes, so some pieces of the ZoL change were not applicable.  Also,
the illumos version got a few style cleanups.  Some changes were missed
or incorrectly merged (e.g., vdev_cache_lastused_compare and
metaslab_rangesize_compare).

Obtained from: ZoL, illumos
MFC after: 25 days
X-MFC after: r353634

Fix panic in network stack due to use after free when receiving
partial fragmented packets before a network interface is detached.

When sending IPv4 or IPv6 fragmented packets and a fragment is lost
before the network device is freed, the mbuf making up the fragment
will remain in the temporary hashed fragment list and cause a panic
when it times out due to accessing a freed network interface
structure.

1) Make sure the m_pkthdr.rcvif always points to a valid network
interface. Else the rcvif field should be set to NULL.

2) Use the rcvif of the last received fragment as m_pkthdr.rcvif for
the fully defragged packet, instead of the first received fragment.

Panic backtrace for IPv6:

panic()
icmp6_reflect() # tries to access rcvif->if_afdata[AF_INET6]->xxx
icmp6_error()
frag6_freef()
frag6_slowtimo()
pfslowtimo()
softclock_call_cc()
softclock()
ithread_loop()

Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D19622
MFC after: 1 week
Sponsored by: Mellanox Technologies

MFV r348596: 9689 zfs range lock code should not be zpl-specific

illumos/illumos-gate@7931524763ef94dc16989451dddd206563d03bb4

FreeBSD note: some tweaking was needed to avoid a conflict with
sys/rangelock.h.

Author: Matthew Ahrens <mahrens@delphix.com>
Obtained from: illumos
MFC after: 3 weeks

VLAN_TRUNKDEV() requires epochification in ibcore after r353292.

Sponsored by: Mellanox Technologies

Replace rdma_is_upper_dev_rcu() with rdma_vlan_dev_real_dev() in ibcore.
This reduces the number of references to VLAN_TRUNKDEV() in ibcore.
Currently only VLAN is supported as a child interface in FreeBSD.
Remove superfluous RCU locking.

Sponsored by: Mellanox Technologies

VLAN_DEVAT() requires epochification in ipoib after r353292.

Sponsored by: Mellanox Technologies

10809 Performance optimization of AVL tree comparator functions

illumos/illumos-gate@c4ab0d3f46036e85ad0700125c5a83cc139f55a3
https://github.com/illumos/illumos-gate/commit/c4ab0d3f46036e85ad0700125c5a83cc139f55a3

https://www.illumos.org/issues/10809
  Port ZoL ee36c709c3d Performance optimization of AVL tree comparator functions
  From the ZoL commit msg:
      perf: 2.75x faster ddt_entry_compare()
      First 256bits of ddt_key_t is a block checksum, which are expected
      to be close to random data. Hence, on average, comparison only needs to
      look at first few bytes of the keys. To reduce number of conditional
      jump instructions, the result is computed as: sign(memcmp(k1, k2)).

      Sign of an integer 'a' can be obtained as: `(0 < a) - (a < 0)` := {-1, 0, 1},
      which is computed efficiently.  Synthetic performance evaluation
      of original and new algorithm over 1G random keys on 2.6GHz
      Intel(R) Xeon(R) CPU E5-2660 v3:
      old     6.85789 s
      new     2.49089 s

      perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare()
          Compute the result directly instead of using conditionals

      perf: zfs_range_compare()
          Speedup between 1.1x - 2.5x, depending on compiler version and
          optimization level.

      perf: spa_error_entry_compare()
          `bcmp()` is not suitable for comparator use. Use `memcmp()` instead.

      perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare()
      perf: 2.8x faster zil_bp_compare()
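
The sign trick quoted above can be sketched in a few lines. The macro name `ISIGN` and the function `key_compare()` are hypothetical names chosen for this example; the 32-byte key length mirrors the 256-bit checksum case.

```c
#include <string.h>

/* Maps any int to -1, 0, or 1: (0 < a) - (a < 0). */
#define ISIGN(a)   (((a) > 0) - ((a) < 0))

#define KEY_BYTES  32   /* 256-bit key, as in the checksum case */

/* Comparator in the form tree code usually expects: returns -1, 0, or 1. */
static int
key_compare(const void *k1, const void *k2)
{
	int c = memcmp(k1, k2, KEY_BYTES);

	return (ISIGN(c));
}
```

memcmp() only promises a negative, zero, or positive result, so normalizing it is what makes the value safe for callers that test against exactly -1 or 1.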

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Gvozden Neskovic <neskovic@gmail.com>

MFV r353628:

10842 Mutex leak in dsl_dataset_hold_obj()

illumos/illumos-gate@ad027c0ff9612bff8f47b43d8561da627f80cd29
https://github.com/illumos/illumos-gate/commit/ad027c0ff9612bff8f47b43d8561da627f80cd29

https://www.illumos.org/issues/10842
ZoL d10b2f1d35b Mutex leak in dsl_dataset_hold_obj()

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Jorgen Lundman <lundman@lundman.net>
Obtained from: illumos, ZoL
MFC after: 15 days

10842 Mutex leak in dsl_dataset_hold_obj()

illumos/illumos-gate@ad027c0ff9612bff8f47b43d8561da627f80cd29
https://github.com/illumos/illumos-gate/commit/ad027c0ff9612bff8f47b43d8561da627f80cd29

https://www.illumos.org/issues/10842
ZoL d10b2f1d35b Mutex leak in dsl_dataset_hold_obj()

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Jorgen Lundman <lundman@lundman.net>

fix wording / typos in r353625

Reported by: kib
MFC after: 4 weeks
X-MFC with: r353625, r353618

10841 predictive prefetch disabled on new pools until export/reboot

illumos/illumos-gate@0ce4bbcb47d8f86307fb8d2c84fd0f4e070f576e
https://github.com/illumos/illumos-gate/commit/0ce4bbcb47d8f86307fb8d2c84fd0f4e070f576e

https://www.illumos.org/issues/10841
  ZoL 944a37248a0 predictive prefetch disabled on new pools until export/reboot

  When a pool is initially created (by `zpool create`), predictive
  prefetch is inadvertently disabled, until the pool is
  export/import-ed, or the machine is rebooted.

  When device removal was introduced, we added some code to disable
  predictive prefetching until indirect vdevs have been loaded.  This
  resulted in the "default state" of prefetch being disabled, until we
  proactively enable it after indirect vdevs are loaded.  Unfortunately
  this resulted in a few bugs where in some code paths we neglect to
  enable predictive prefetch.  The first of these was fixed by
  https://github.com/zfsonlinux/zfs/commit/20507534d4ede14d4dd82c99fc8d461704ce7419

  This commit fixes another case where we also need to explicitly enable
  predictive prefetch, when the pool is initially created.

Author: Matthew Ahrens <mahrens@delphix.com>

zfs: add a lame emulation of cv_wait_sig(9) in userland to fix r353618

Not sure if we need anything better.
Maybe we should try to port illumos libfakekernel or provide something
similar natively.

MFC after: 4 weeks
X-MFC with: r353618

MFV r353623: 10473 zfs(1M) missing cross-reference to zfs-program(1M)

illumos/illumos-gate@736e6700391d17ab1494985a80076fc185722699
https://github.com/illumos/illumos-gate/commit/736e6700391d17ab1494985a80076fc185722699

https://www.illumos.org/issues/10473

Author: Jason King <jason.king@joyent.com>
Obtained from: illumos
MFC after: 6 days

10473 zfs(1M) missing cross-reference to zfs-program(1M)

illumos/illumos-gate@736e6700391d17ab1494985a80076fc185722699
https://github.com/illumos/illumos-gate/commit/736e6700391d17ab1494985a80076fc185722699

https://www.illumos.org/issues/10473

Author: Jason King <jason.king@joyent.com>

Fix assert in PowerPC pmaps after introduction of object busy.

The VM_PAGE_OBJECT_BUSY_ASSERT() in pmap_enter() implementation should
be only asserted when the code is executed as result of pmap_enter(),
not when the same code is entered from e.g. pmap_enter_quick(). This
is relevant for all PowerPC pmap variants, because mmu_*_enter() is
used as the backend, and assert is located there.

Add a PowerPC private pmap_enter() PMAP_ENTER_QUICK_LOCKED flag to
indicate that the call is not from pmap_enter(). For non-quick-locked
calls, assert that the object is locked.

Reported and tested by: bdragon
Reviewed by: alc, bdragon, markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22041

MFV r353619: 9691 fat zap should prefetch when iterating

illumos/illumos-gate@52abb70e073c2a88808c0d66fd810ba8c5080572
https://github.com/illumos/illumos-gate/commit/52abb70e073c2a88808c0d66fd810ba8c5080572

https://www.illumos.org/issues/9691
  When iterating over a ZAP object, we're almost always certain to
  iterate over the entire object. If there are multiple leaf blocks, we
  can realize a performance win by issuing reads for all the leaf blocks
  in parallel when the iteration begins.
  For example, if we have 10,000 snapshots, "zfs destroy -nv
  pool/fs@1%9999" can take 30 minutes when the cache is cold. This
  change provides a >3x performance improvement, by issuing the reads
  for all ~64 blocks of each ZAP object in parallel.

Author: Matthew Ahrens <mahrens@delphix.com>
Obtained from: illumos
MFC after: 2 weeks

9691 fat zap should prefetch when iterating

illumos/illumos-gate@52abb70e073c2a88808c0d66fd810ba8c5080572
https://github.com/illumos/illumos-gate/commit/52abb70e073c2a88808c0d66fd810ba8c5080572

https://www.illumos.org/issues/9691
  When iterating over a ZAP object, we're almost always certain to
  iterate over the entire object. If there are multiple leaf blocks, we
  can realize a performance win by issuing reads for all the leaf blocks
  in parallel when the iteration begins.
  For example, if we have 10,000 snapshots, "zfs destroy -nv
  pool/fs@1%9999" can take 30 minutes when the cache is cold. This
  change provides a >3x performance improvement, by issuing the reads
  for all ~64 blocks of each ZAP object in parallel.

Author: Matthew Ahrens <mahrens@delphix.com>

MFV r353617: 9425 allow channel programs to be stopped via signals

illumos/illumos-gate@d0cb1fb92629bc0283c88d4719df7285c1612700
https://github.com/illumos/illumos-gate/commit/d0cb1fb92629bc0283c88d4719df7285c1612700

https://www.illumos.org/issues/9425
  Problem Statement
  ZFS Channel program scripts currently require a timeout, so that hung
  or long-running scripts return a timeout error instead of causing ZFS
  to get wedged.  This limit can currently be set up to 100 million Lua
  instructions. Even with a limit in place, it would be desirable to
  have a sys admin (support engineer) be able to cancel a script that is
  taking a long time.

  Proposed Solution
  Make it possible to abort a channel program by sending an interrupt
  signal. In the underlying txg_wait_sync function, switch the cv_wait to
  a cv_wait_sig to catch the signal. Once a signal is encountered, the
  dsl_sync_task function can install a Lua hook that will get called
  before the Lua interpreter executes a new line of code. The
  dsl_sync_task can resume with a standard txg_wait_sync call and wait
  for the txg to complete. Meanwhile, the hook will abort the script and
  indicate that the channel program was canceled. The kernel returns an
  EINTR to indicate that the channel program run was canceled.

FreeBSD note: the return value of cv_wait_sig() has inverted meaning
between us and illumos.
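
To make the inversion concrete, here is a small sketch. It assumes the conventions commonly documented for the two systems (FreeBSD: 0 on a normal wakeup and an errno value such as EINTR when a signal arrived; illumos: 0 when a signal arrived and non-zero otherwise); verify against condvar(9) and the illumos manual before relying on it. All function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* FreeBSD-style result: non-zero (an errno value) means a signal arrived. */
static bool
freebsd_wait_was_signaled(int rc)
{
	return (rc != 0);
}

/* illumos-style result: zero means a signal arrived. */
static bool
illumos_wait_was_signaled(int rc)
{
	return (rc == 0);
}

/* Translate a FreeBSD-style result into the value ported code expects. */
static int
freebsd_rc_to_illumos(int rc)
{
	return (rc == 0 ? 1 : 0);
}
```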

Author: Don Brady <don.brady@delphix.com>
Obtained from: illumos
MFC after: 4 weeks

9425 allow channel programs to be stopped via signals

illumos/illumos-gate@d0cb1fb92629bc0283c88d4719df7285c1612700
https://github.com/illumos/illumos-gate/commit/d0cb1fb92629bc0283c88d4719df7285c1612700

https://www.illumos.org/issues/9425
  Problem Statement

  ZFS Channel program scripts currently require a timeout, so that hung
  or long-running scripts return a timeout error instead of causing ZFS
  to get wedged.  This limit can currently be set up to 100 million Lua
  instructions. Even with a limit in place, it would be desirable to
  have a sys admin (support engineer) be able to cancel a script that is
  taking a long time.

  Proposed Solution

  Make it possible to abort a channel program by sending an interrupt
  signal. In the underlying txg_wait_sync function, switch the cv_wait to
  a cv_wait_sig to catch the signal. Once a signal is encountered, the
  dsl_sync_task function can install a Lua hook that will get called
  before the Lua interpreter executes a new line of code. The
  dsl_sync_task can resume with a standard txg_wait_sync call and wait
  for the txg to complete. Meanwhile, the hook will abort the script and
  indicate that the channel program was canceled. The kernel returns an
  EINTR to indicate that the channel program run was canceled.

Author: Don Brady <don.brady@delphix.com>

MFV r353615: 9485 Optimize possible split block search space

illumos/illumos-gate@a21fe349793c3805ec504bbe5e9acf06c2d63d7a
https://github.com/illumos/illumos-gate/commit/a21fe349793c3805ec504bbe5e9acf06c2d63d7a

https://www.illumos.org/issues/9485
Port this commit from ZoL:
https://github.com/zfsonlinux/zfs/commit/4589f3ae4c1bb435777da8640eb915f3c713b14d

Author: Brian Behlendorf <behlendorf1@llnl.gov>
Obtained from: illumos, ZoL
MFC after: 3 weeks

9485 Optimize possible split block search space

illumos/illumos-gate@a21fe349793c3805ec504bbe5e9acf06c2d63d7a
https://github.com/illumos/illumos-gate/commit/a21fe349793c3805ec504bbe5e9acf06c2d63d7a

https://www.illumos.org/issues/9485
Port this commit from ZoL:
https://github.com/zfsonlinux/zfs/commit/4589f3ae4c1bb435777da8640eb915f3c713b14d

Author: Brian Behlendorf <behlendorf1@llnl.gov>

MFV r353613: 10731 zfs: NULL pointer errors

FreeBSD already had these changes locally.
This commit removes a small formatting difference.

MFC after: 1 week

10731 zfs: NULL pointer errors

illumos/illumos-gate@dd328bf6d39366b8d7bde6a36114538fc14332dd
https://github.com/illumos/illumos-gate/commit/dd328bf6d39366b8d7bde6a36114538fc14332dd

https://www.illumos.org/issues/10731

Author: Toomas Soome <tsoome@me.com>

MFV r353611: 10330 merge recent ZoL vdev and metaslab changes

illumos/illumos-gate@a0b03b161c4df3cfc54fbc741db09b3bdc23ffba
https://github.com/illumos/illumos-gate/commit/a0b03b161c4df3cfc54fbc741db09b3bdc23ffba

https://www.illumos.org/issues/10330
  3 recent ZoL changes in the vdev and metaslab code which we can pull over:
  PR 8324 c853f382db 8324 Change target size of metaslabs from 256GB to 16GB
  PR 8290 b194fab0fb 8290 Factor metaslab_load_wait() in metaslab_load()
  PR 8286 419ba59145 8286 Update vdev_is_spacemap_addressable() for new spacemap
  encoding

Author: Serapheim Dimitropoulos <serapheimd@gmail.com>
Obtained from: illumos, ZoL
MFC after: 2 weeks

10330 merge recent ZoL vdev and metaslab changes

illumos/illumos-gate@a0b03b161c4df3cfc54fbc741db09b3bdc23ffba
https://github.com/illumos/illumos-gate/commit/a0b03b161c4df3cfc54fbc741db09b3bdc23ffba

https://www.illumos.org/issues/10330
  3 recent ZoL changes in the vdev and metaslab code which we can pull over:
  PR 8324 c853f382db 8324 Change target size of metaslabs from 256GB to 16GB
  PR 8290 b194fab0fb 8290 Factor metaslab_load_wait() in metaslab_load()
  PR 8286 419ba59145 8286 Update vdev_is_spacemap_addressable() for new spacemap
  encoding

Author: Serapheim Dimitropoulos <serapheimd@gmail.com>

10230 zfs mishandles partial writes

illumos/illumos-gate@b0ef425652e5cfce27df9fa5826a9cd64cee110a
https://github.com/illumos/illumos-gate/commit/b0ef425652e5cfce27df9fa5826a9cd64cee110a

https://www.illumos.org/issues/10230
  The trinity fuzzer calls pwritev with an iovec that has one or more entries
  which point to some initial valid data and then the rest point to addresses
  which are not mapped. This yields EFAULT once the write hits the invalid
  address, but we do successfully complete some amount of writing. The zfs_write
  code does not handle this properly. It loses track of the error return from
  dmu_write_uio_dbuf and it has an invalid ASSERT which does not account for the
  partial write case.
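
The partial-write bookkeeping can be sketched as below. This is not zfs_write(): `write_chunks()`, `chunk_fn`, and `fail_at()` are hypothetical, and the sketch assumes the usual POSIX-style convention that a write which made progress reports the bytes written, while the error is still kept rather than dropped.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-chunk copy callback: returns 0 or an errno value. */
typedef int (*chunk_fn)(size_t chunk_index, void *arg);

/*
 * Write nchunks chunks of chunk_len bytes. On failure, report the bytes
 * already written if there are any; return -1 only when nothing was
 * written. The error is always passed back through *errp so it is not lost.
 */
static ssize_t
write_chunks(size_t nchunks, size_t chunk_len, chunk_fn copy, void *arg,
    int *errp)
{
	size_t done = 0;
	int error = 0;

	for (size_t i = 0; i < nchunks; i++) {
		error = copy(i, arg);
		if (error != 0)
			break;
		done += chunk_len;
	}
	*errp = error;
	if (error != 0 && done == 0)
		return (-1);
	return ((ssize_t)done);
}

/* Demo callback: fail with EFAULT once the chunk index reaches *arg. */
static int
fail_at(size_t chunk_index, void *arg)
{
	return (chunk_index >= *(size_t *)arg ? EFAULT : 0);
}
```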

Author: Jerry Jelinek <jerry.jelinek@joyent.com>

MFV r353608: 10165 libzpool: passing argument 1 to restrict-qualified parameter

illumos/illumos-gate@f91fcf59ac2fd04f1816f3dcbc69a46d44276a65
https://github.com/illumos/illumos-gate/commit/f91fcf59ac2fd04f1816f3dcbc69a46d44276a65

https://www.illumos.org/issues/10165

Author: Toomas Soome <tsoome@me.com>
MFC after: 10 days

10165 libzpool: passing argument 1 to restrict-qualified parameter aliases with argument 4

illumos/illumos-gate@f91fcf59ac2fd04f1816f3dcbc69a46d44276a65
https://github.com/illumos/illumos-gate/commit/f91fcf59ac2fd04f1816f3dcbc69a46d44276a65

https://www.illumos.org/issues/10165

Author: Toomas Soome <tsoome@me.com>

MFV r353606: 10067 Miscellaneous man page typos

https://www.illumos.org/issues/10067
fileystem - man1m/zfs.1m man1m/boot.1m

Author: Peter Tribble <peter.tribble@gmail.com>
Obtained from: illumos
MFC after: 1 week

10067 Miscellaneous man page typos

illumos/illumos-gate@e61d7e85ebb4a7361eeb10639b742a92e0bf5e55
https://github.com/illumos/illumos-gate/commit/e61d7e85ebb4a7361eeb10639b742a92e0bf5e55

https://www.illumos.org/issues/10067
fileystem - man1m/zfs.1m man1m/boot.1m

Author: Peter Tribble <peter.tribble@gmail.com>

10154 zfs: cast between incompatible function types

illumos/illumos-gate@c62757b2b8b6c26589d7704d0ff20beb107fcd9a
https://github.com/illumos/illumos-gate/commit/c62757b2b8b6c26589d7704d0ff20beb107fcd9a

https://www.illumos.org/issues/10154

Author: Toomas Soome <tsoome@me.com>

powerpc/mpc85xx: Fix function type for fsl_pcib_error_intr()

Since it's only called as an interrupt handler, fsl_pcib_error_intr() should just
match the driver_intr_t type.

Reported by: bdragon

powerpc: Add AmigaOne platform, a subclass of MPC85xx

Summary:
The AmigaOne platform, encompassing the X5000 and A1222 at this time, is
based on the mpc85xx platform, but includes some things not listed in
the device tree. Some custom devices, like CPLD, could be added to the
device tree with an overlay, or other means. However, some cannot
easily be done, such as the power button interrupt.

The directory will also become a location to add AmigaOne platform drivers,
such as the aforementioned CPLD, and its children.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D21829

Fix including bsd.compat.mk outside Makefile.libcompat on mips64.

Reported by: jhb, jenkins

Generalize ARM specific comments in devmap

The comments in devmap are very ARM specific, this generalizes them for other
architectures.

Submitted by: Nicholas O'Brien <nickisobrien_gmail.com>
Reviewed by: manu, philip
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D22035

ixgbe: Disable EEE for backplane X550EM_X

From Zach:
Intel documentation indicates that backplane X550EM_X KR devices do not
support Energy Efficient Ethernet. Prior to this patch, X552 devices
(device ID 0x15AB) will crash the system when transitioning EEE state
via sysctl.

Signed-off-by: Zach Vargas <zvargas@xes-inc.com>
PR: 240320
Submitted by: Zach Vargas <zvargas@xes-inc.com>
Reviewed by: erj@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21673

Missing from r353596.

Add the ability to link programs against a compat ABI.

Linkage is controlled by two make knobs:
WANT_COMPAT - Prefer to link against the compat ABI.
NEED_COMPAT - Link against the compat ABI or fail to build.

Supported values are "32", "soft", and "any". The latter meaning pick
the first[0] supported compat ABI.

This can be used to provide test binaries for compat ABIs or to link
ABI-specific programs.

[0] We currently support only one compat ABI at a time, but this may
change in the future and some code in this commit is structured to ease
that change.

Reviewed by: bdrewery, jhb
Obtained from: CheriBSD (in concept)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22023

When the assertion that a thread is not in an epoch fails, also print all
entered epochs. Works with EPOCH_TRACE only.

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D22017

Build compat libraries before "everything".

This is required for us to link programs against compat versions of
libraries.

Reviewed by: bdrewery, jhb
Sponsored by: DARPA, AFRL

Allow OBJDIR to be overridden for LIB*DIR variables.

This will allow us to link against internal libraries when building
programs for the system's LIBCOMPAT ABI.

Reviewed by: bdrewery
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL

Rename top-level LIBCOMPAT to _LIBCOMPAT.

This avoids a conflict with LIBCOMPAT defined in bsd.libnames.mk.

Reviewed by: bdrewery
Sponsored by: DARPA, AFRL

Move the per-ARCH definitions to bsd.compat.mk.

This is the first step in refactoring the definitions to allow programs
to be selectively linked against libcompat libraries.

Reviewed by: bdrewery
Sponsored by: DARPA, AFRL

Add copyrights that I forgot to add when splitting arb.h off from tree.h.
While here clean up the RCS tags.

Suggested by: lstewart
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix

Install an ACPI PCI bus notify handler.

Rescan a PCI bus when the ACPI_NOTIFY_BUS_CHECK event is posted to a
PCI bus.

Reviewed by: scottl
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21948

Support hot insertion and removal of PCI devices on EC2.

Install ACPI notify handlers on PCI devices with an _EJ0 method.  This
handler is invoked when devices are added or removed.

- When an ACPI_NOTIFY_DEVICE_CHECK event posts, rescan the parent bus
  device.  Note that strictly speaking we only need to rescan the
  specified device, but BUS_RESCAN is what is available, so we rescan
  the entire bus.
- When an ACPI_NOTIFY_EJECT_REQUEST event posts, detach the device
  associated with the ACPI handle, invoke the _EJ0 method, and then
  delete the device.
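
The two cases above amount to a small dispatch on the notification value. The sketch below uses the device-check (0x01) and eject-request (0x03) values from the ACPI specification; the `SKETCH_` macro names, the enum, and `classify_notify()` are hypothetical and stand in for the real handler, which performs the actions instead of naming them.

```c
#include <stdint.h>

/* Notification values from the ACPI specification, renamed for the sketch. */
#define SKETCH_NOTIFY_DEVICE_CHECK   0x01
#define SKETCH_NOTIFY_EJECT_REQUEST  0x03

enum hotplug_action {
	ACT_IGNORE,               /* event not handled here */
	ACT_RESCAN_PARENT_BUS,    /* device check: rescan the whole parent bus */
	ACT_DETACH_EJECT_DELETE   /* eject: detach, invoke _EJ0, delete device */
};

static enum hotplug_action
classify_notify(uint32_t notify)
{
	switch (notify) {
	case SKETCH_NOTIFY_DEVICE_CHECK:
		return (ACT_RESCAN_PARENT_BUS);
	case SKETCH_NOTIFY_EJECT_REQUEST:
		return (ACT_DETACH_EJECT_DELETE);
	default:
		return (ACT_IGNORE);
	}
}
```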

Eventually this might be changed to vector notify events to devd in
userspace where devctl can be used instead to permit more complex
actions such as graceful unmounting of filesystems.

Tested by: cperciva
Reviewed by: cperciva, imp, scottl
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21948

Export pci_attach() and pci_detach().

Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21948

Use __FreeBSD_version to determine if gets() has been removed.

GCC compilers set __FreeBSD__ statically to a build-time determined
targeted version (which in ports always matches the build host's
version). This means that when building any version (12 or 13, etc.)
of riscv or some other architecture via GCC on a 12.x host,
__FreeBSD__ will always be set to 12. As a result, __FreeBSD__ cannot
be used to reliably detect the target FreeBSD version being built.
Instead, __FreeBSD_version from either <sys/param.h> (in the kernel)
or <osreldate.h> (in userland) should be used.

This changes the gets() test in libc++ to use __FreeBSD_version from
<osreldate.h>.

Reported by: jenkins (riscv64 and amd64-gcc)
Reviewed by: dim, imp
Differential Revision: https://reviews.freebsd.org/D22034
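
As an illustration of the gets() commit above, a version check keyed on __FreeBSD_version rather than __FreeBSD__ might look like the sketch below. The cutoff value and both function names are hypothetical placeholders; the commit message does not state the value libc++ actually tests against.

```c
#include <stdbool.h>

#if defined(__FreeBSD__)
#include <osreldate.h>		/* userland source of __FreeBSD_version */
#endif

/* Placeholder cutoff for illustration only (assumption, not from the commit). */
#define EXAMPLE_GETS_REMOVED_VERSION	1300000L

/* Pure helper so the comparison can be exercised with any version number. */
bool
version_lacks_gets_sketch(long version)
{
	return (version >= EXAMPLE_GETS_REMOVED_VERSION);
}

/* Decide for the current target; non-FreeBSD targets report false. */
bool
target_lacks_gets_sketch(void)
{
#if defined(__FreeBSD_version)
	return (version_lacks_gets_sketch(__FreeBSD_version));
#else
	return (false);
#endif
}
```

The point of the split is that the decision depends on the headers of the target being built, not on a value the compiler baked in when it was itself built.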

cxgbe(4): An EQ update can be requested in a TX_PKTS2 work request.

MFC after: 1 week
Sponsored by: Chelsio Communications

Use -march=octeon+ for OCTEON1.

External binutils requires octeon+ for saa.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D22033

Fix dwmmc(4) driver attachment when ext_resources are not present.

Ignore only ENOENT (no DTS properties found) and ENODEV (driver not
present) non-zero return values from ext_resources.

Reviewed by: manu
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22043
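
The error filtering described in the dwmmc(4) commit above can be sketched as follows. This is a standalone illustration with a hypothetical function name, not the driver code: only "not found" style errors are tolerated, and anything else still aborts attachment.

```c
#include <errno.h>

/*
 * Map the result of an optional-resource lookup to an attach decision.
 * Returns 0 when attach may continue, or the original error otherwise.
 */
int
filter_optional_resource_error_sketch(int error)
{
	/* ENOENT: no DTS properties found; ENODEV: driver not present. */
	if (error == 0 || error == ENOENT || error == ENODEV)
		return (0);
	return (error);
}
```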

Fix a write-only variable warning from external GCC.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D22032

Don't set the OUTPUT_FORMAT explicitly but let ld derive it.

This fixes an error with modern ld.bfd and is in line with the changes in
r215251 and r217612.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D22031

Update MIPS kernel builds to work with mips-gcc.

- Use a default -march of mips64 on N64 and N32 kernels.
- Set the endianness (via MIPS_ENDIAN) and ABI (via MIPS_ABI) in
  CFLAGS from MACHINE_ARCH.  ARCH_FLAGS now only sets a different
  -march value if needed.
- TRAMP_ARCH_FLAGS inherits MIPS_ENDIAN from MACHINE_ARCH but does
  not set the ABI since XLPN32 needs an N64 ABI for the trampoline
  loader.  When TRAMP_ARCH_FLAGS is used it must set both -march
  and -mabi.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D22030

rpcgen: make compiler arglist allocation dynamic

Limit argmax to an absurdly large value to prevent overflow (no overflow
is possible on FreeBSD due to ARG_MAX).

In CheriBSD we exceed the 19 non-NULL arguments in the static array. Add
a simple size doubling allocator and increase the default to 32.

GC remnants of support for fixed arguments.

Reviewed by: arichardson (prior version), James Clarke (prior version)
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D21971
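
A size-doubling argument list of the kind the rpcgen commit above describes could look roughly like this. It is a minimal sketch, not the rpcgen source: the struct, the function name, and the upper limit are assumptions, and only the initial size of 32 comes from the commit message.

```c
#include <stdlib.h>

#define ARGLIST_INITIAL	32		/* default size named in the commit */
#define ARGLIST_LIMIT	(1u << 20)	/* hypothetical cap (assumption) */

struct arglist_sketch {
	const char **argv;	/* NULL-terminated argument vector */
	size_t argc;		/* number of non-NULL entries */
	size_t argmax;		/* allocated slots, including the terminator */
};

/* Append one argument, doubling the array when full. Returns 0 or -1. */
int
arglist_add_sketch(struct arglist_sketch *al, const char *arg)
{
	if (al->argc + 1 >= al->argmax) {
		size_t newmax;
		const char **p;

		newmax = (al->argmax == 0) ? ARGLIST_INITIAL : al->argmax * 2;
		if (newmax > ARGLIST_LIMIT)
			return (-1);	/* refuse to grow without bound */
		p = realloc(al->argv, newmax * sizeof(*p));
		if (p == NULL)
			return (-1);
		al->argv = p;
		al->argmax = newmax;
	}
	al->argv[al->argc++] = arg;
	al->argv[al->argc] = NULL;	/* keep the vector exec-ready */
	return (0);
}
```

Capping the size before multiplying by sizeof(*p) is what keeps the allocation size from overflowing.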

Fix up r353565; somehow a few files did not get committed.

MFC after: 3 weeks
X-MFC with: r353565

Remove pfctlinput2(). It came from KAME and had never ever been in use.

MFV r353561: 10343 ZoL: Prefix all refcount functions with zfs_

illumos/illumos-gate@e914ace2e9d9bf2dbf9a1f1ce81cb776022096f5
https://github.com/illumos/illumos-gate/commit/e914ace2e9d9bf2dbf9a1f1ce81cb776022096f5

https://www.illumos.org/issues/10343
  On the openzfs feature/porting matrix, this is listed as:
  prefix to refcount funcs/types
  Having these changes will make it easier to share other work across the
  different ZFS operating systems.
  PR 7963 424fd7c3e Prefix all refcount functions with zfs_
  PR 7885 & 7932 c13060e47 Linux 4.19-rc3+ compat: Remove refcount_t compat
  PR 5823 & 5842 4859fe796 Linux 4.11 compat: avoid refcount_t name conflict

Author: Tim Schumacher <timschumi@gmx.de>
Obtained from: illumos, ZoL
MFC after: 3 weeks

10343 ZoL: Prefix all refcount functions with zfs_

illumos/illumos-gate@e914ace2e9d9bf2dbf9a1f1ce81cb776022096f5
https://github.com/illumos/illumos-gate/commit/e914ace2e9d9bf2dbf9a1f1ce81cb776022096f5

https://www.illumos.org/issues/10343
  On the openzfs feature/porting matrix, this is listed as:
  prefix to refcount funcs/types
  Having these changes will make it easier to share other work across the
  different ZFS operating systems.
  PR 7963 424fd7c3e Prefix all refcount functions with zfs_
  PR 7885 & 7932 c13060e47 Linux 4.19-rc3+ compat: Remove refcount_t compat
  PR 5823 & 5842 4859fe796 Linux 4.11 compat: avoid refcount_t name conflict

Author: Tim Schumacher <timschumi@gmx.de>

MFV r353558: 10572 10579 Fix race in dnode_check_slots_free()

illumos/illumos-gate@aa02ea01948372a32cbf08bfc31c72c32e3fc81e
https://github.com/illumos/illumos-gate/commit/aa02ea01948372a32cbf08bfc31c72c32e3fc81e

10572 Fix race in dnode_check_slots_free()
https://www.illumos.org/issues/10572
  The Fix from ZoL:
  Currently, dnode_check_slots_free() works by checking dn->dn_type
  in the dnode to determine if the dnode is reclaimable. However,
  there is a small window of time between dnode_free_sync() in the
  first call to dsl_dataset_sync() and when the useraccounting code
  is run when the type is set to DMU_OT_NONE, but the dnode is not yet
  evictable, leading to crashes. This patch adds the ability for
  dnodes to track which txg they were last dirtied in and adds a
  check for this before performing the reclaim.

  This patch also corrects several instances when dn_dirty_link was
  treated as a list_node_t when it is technically a multilist_node_t.

10579 Don't allow dnode allocation if dn_holds != 0
https://www.illumos.org/issues/10579
  The fix from ZoL:
  This patch simply fixes a small bug where dnode_hold_impl() could
  attempt to allocate a dnode that was in the process of being freed,
  but which still had active references. This patch simply adds the
  required check.

Author: Tom Caputi <tcaputi@datto.com>
Reported by: delphij
MFC after: 2 weeks
X-MFC with: r353176
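
The two checks described in the dnode commit above can be sketched together as a single predicate. Everything here is a simplified stand-in: the struct, its field names, and the exact txg comparison are assumptions made for illustration and are not taken from the ZFS sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define OT_NONE_SKETCH	0	/* stand-in for DMU_OT_NONE */

struct dnode_sketch {
	int type;		/* OT_NONE_SKETCH once freed */
	uint64_t dirty_txg;	/* txg in which the dnode was last dirtied */
	uint64_t holds;		/* active references */
};

/* A slot is reusable only if it is free, fully synced, and unreferenced. */
bool
dnode_slot_reclaimable_sketch(const struct dnode_sketch *dn,
    uint64_t syncing_txg)
{
	if (dn->type != OT_NONE_SKETCH)
		return (false);
	if (dn->dirty_txg >= syncing_txg)
		return (false);	/* freed, but still dirty in this txg */
	if (dn->holds != 0)
		return (false);	/* being freed, yet still referenced */
	return (true);
}
```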

10572 10579 Fix race in dnode_check_slots_free()

illumos/illumos-gate@aa02ea01948372a32cbf08bfc31c72c32e3fc81e
https://github.com/illumos/illumos-gate/commit/aa02ea01948372a32cbf08bfc31c72c32e3fc81e

10572 Fix race in dnode_check_slots_free()
https://www.illumos.org/issues/10572
  The Fix from ZoL:
  Currently, dnode_check_slots_free() works by checking dn->dn_type
  in the dnode to determine if the dnode is reclaimable. However,
  there is a small window of time between dnode_free_sync() in the
  first call to dsl_dataset_sync() and when the useraccounting code
  is run when the type is set to DMU_OT_NONE, but the dnode is not yet
  evictable, leading to crashes. This patch adds the ability for
  dnodes to track which txg they were last dirtied in and adds a
  check for this before performing the reclaim.

  This patch also corrects several instances when dn_dirty_link was
  treated as a list_node_t when it is technically a multilist_node_t.

10579 Don't allow dnode allocation if dn_holds != 0
https://www.illumos.org/issues/10579
  The fix from ZoL:
  This patch simply fixes a small bug where dnode_hold_impl() could
  attempt to allocate a dnode that was in the process of being freed,
  but which still had active references. This patch simply adds the
  required check.

Author: Tom Caputi <tcaputi@datto.com>

MFV r353551: 10452 ZoL: merge in large dnode feature fixes

illumos/illumos-gate@946342a260bbae359b48bf142ec1fe40792ee862
https://github.com/illumos/illumos-gate/commit/946342a260bbae359b48bf142ec1fe40792ee862

https://www.illumos.org/issues/10452
  illumos is missing a few small follow up ZoL bug fixes for the large dnode
  feature. We should pull those in.
  Those commits are in the ZoL tree as (newest to oldest):
  PR 8435 - 75d6b7ddca269542279975f716a343bb40a79baf - Add missing copyright
  notice to large_dnode tests
  PR 7433 - e14a32b1c844d924b9f093375c0badcf10f61741 - Fix object reclaim when
  using large dnodes
  PR 6616 - 48fbb9ddbf2281911560dfbc2821aa8b74127315 - Free objects when
  receiving full stream as clone
  PR 6695 - 39f56627ae988d09b4e3803c01c22b2026b2310e - receive_freeobjects()
  skips freeing some object

Portions contributed by: Ned Bass <bass6@llnl.gov>
Portions contributed by: Tom Caputi <tcaputi@datto.com>
Author: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Obtained from: illumos, ZoL
MFC after: 2 weeks
X-MFC with: r353176

10452 ZoL: merge in large dnode feature fixes

illumos/illumos-gate@946342a260bbae359b48bf142ec1fe40792ee862
https://github.com/illumos/illumos-gate/commit/946342a260bbae359b48bf142ec1fe40792ee862

https://www.illumos.org/issues/10452
  illumos is missing a few small follow up ZoL bug fixes for the large dnode
  feature. We should pull those in.
  Those commits are in the ZoL tree as (newest to oldest):
  PR 8435 - 75d6b7ddca269542279975f716a343bb40a79baf - Add missing copyright
  notice to large_dnode tests
  PR 7433 - e14a32b1c844d924b9f093375c0badcf10f61741 - Fix object reclaim when
  using large dnodes
  PR 6616 - 48fbb9ddbf2281911560dfbc2821aa8b74127315 - Free objects when
  receiving full stream as clone
  PR 6695 - 39f56627ae988d09b4e3803c01c22b2026b2310e - receive_freeobjects()
  skips freeing some object

Portions contributed by: Ned Bass <bass6@llnl.gov>
Portions contributed by: Tom Caputi <tcaputi@datto.com>
Author: Fabian Grünbichler <f.gruenbichler@proxmox.com>

The two functions ifnet_byindex() and ifnet_byindex_locked() are exactly the
same after the network stack was epochified. Merge the two into one function
and clean up all uses of ifnet_byindex_locked().

While at it:
- Add branch prediction macros.
- Make sure the ifnet pointer is only dereferenced once,
  even when code optimisation is disabled.

Sponsored by: Mellanox Technologies
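
The two "while at it" items in the ifnet_byindex() commit above can be illustrated with a userland sketch. The table, the types, and the function name are hypothetical, and __builtin_expect is a GCC/Clang builtin standing in for the kernel's branch prediction macros; the volatile-qualified access is one common way to force a single load of the table slot at any optimisation level.

```c
#include <stddef.h>

#define unlikely_sketch(x)	__builtin_expect(!!(x), 0)

struct ifnet_sketch {
	int if_index;
};

#define IF_TABLE_SIZE_SKETCH	8
struct ifnet_sketch *if_table_sketch[IF_TABLE_SIZE_SKETCH];

/* Look up an interface by index; returns NULL if out of range or unset. */
struct ifnet_sketch *
ifnet_byindex_sketch(unsigned int idx)
{
	struct ifnet_sketch *ifp;

	if (unlikely_sketch(idx >= IF_TABLE_SIZE_SKETCH))
		return (NULL);
	/* Volatile access: the slot is loaded exactly once. */
	ifp = *(struct ifnet_sketch * volatile *)&if_table_sketch[idx];
	return (ifp);
}
```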

Exclude the network link eventhandler from epochification after r353292.

This fixes the following assert when "options RATELIMIT" is used:
panic()
malloc()
sysctl_add_oid()
tcp_rl_ifnet_link()
do_link_state_change()
taskqueue_run_locked()

Sponsored by: Mellanox Technologies

Fix missing epochification of the LinuxKPI after r353292.

Sponsored by: Mellanox Technologies

Fix missing epochification of the ibcore code after r353292.

Sponsored by: Mellanox Technologies

Fix missing epochification of the ipoib code after r353292.

Sponsored by: Mellanox Technologies

Explicitly initialize the memory buffer to store O_ICMP6TYPE opcode.

By default next_cmd() initializes only the first u32 of an opcode. The
O_ICMP6TYPE opcode has an array of bit masks to store the corresponding
ICMPv6 types. An opcode that precedes O_ICMP6TYPE, e.g. O_IP6_DST, can have
variable length, and while it is being filled it can modify memory that will
later be used by the O_ICMP6TYPE opcode. Without explicit initialization this
leads to the creation of a wrong opcode.

Reported by: Boris N. Lytochkin
Obtained from: Yandex LLC
MFC after: 3 days
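
The O_ICMP6TYPE problem described above comes down to OR-ing bits into a buffer that was never cleared. A minimal sketch of the fix follows; the struct layout and function name are hypothetical stand-ins rather than the real ipfw definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the opcode layout: a header word plus a type bitmap. */
struct icmp6_opcode_sketch {
	uint32_t header;
	uint32_t d[8];		/* one bit per ICMPv6 type, 0..255 */
};

/* Build the type bitmap, clearing the whole opcode before setting bits. */
void
fill_icmp6_types_sketch(struct icmp6_opcode_sketch *cmd,
    const uint8_t *types, size_t ntypes)
{
	size_t i;

	/* The buffer may hold bytes left behind by a preceding opcode. */
	memset(cmd, 0, sizeof(*cmd));
	for (i = 0; i < ntypes; i++)
		cmd->d[types[i] / 32] |= UINT32_C(1) << (types[i] % 32);
}
```

Without the memset(), any stale bit already set in d[] would survive the OR and silently match ICMPv6 types the rule never named.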

boot1.efi: provide generic exit() and stub getchar()

panic() expects exit() and getchar() to exist, so let's provide them.

tests: basic VLAN test

Set up two jails connected by an epair. Create VLAN interfaces in both
jails and check connectivity.

This is a very basic test, but it exposed panics during the network stack
epoch work, so it is worth having.

(6/6) Convert pmap to expect busy in write related operations now that all
callers hold it.

This simplifies pmap code and removes a dependency on the object lock.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596

(5/6) Move the VPO_NOSYNC to PGA_NOSYNC to eliminate the dependency on the
object lock in vm_page_set_validclean().

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21595

(4/6) Protect page valid with the busy lock.

Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594

(3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held.  This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592

(2/6) Don't release xbusy in vm_page_remove(), defer to vm_page_free_prep().

This persists busy state across operations like rename and replace.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21549

powerpc/atomic: Fix atomic_cmpset_rel()

Need a release barrier, not an acquire barrier, else bad things happen.
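
For readers unfamiliar with the distinction, a compare-and-set with release semantics can be expressed in portable C11 as below. This is a sketch using <stdatomic.h>, not the powerpc machine-dependent implementation, and the function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Compare-and-set with release semantics: loads and stores issued before
 * the call cannot be reordered after a successful store to *p.
 */
bool
cmpset_rel_sketch(atomic_uint *p, unsigned int expected, unsigned int desired)
{
	return (atomic_compare_exchange_strong_explicit(p, &expected, desired,
	    memory_order_release, memory_order_relaxed));
}
```

An acquire barrier orders later accesses after the operation instead, which is the wrong direction for code that publishes data, such as an unlock path.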

(1/6) Replace busy checks with acquires where it is trivial to do so.

This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548

arm: allwinner: Add np and nmm clock files to the build

MFC after: 1 month

arm64: Add Synopsys DWC3 driver

This adds a driver for the Synopsys DWC3 controller found on multiple SoCs.
It only supports host mode for now.

MFC after: 1 month