FreeBSD/FreeBSD.git
Krzysztof Galazka [Mon, 13 Sep 2021 20:39:59 +0000 (13:39 -0700)]
ixl(4): Fix 2.5 and 5G speeds reporting and update shared code

Fix 2.5 and 5G speeds reporting and update shared code with recent
changes:
- Update expected FW API versions for X710 and X722 adapters
- Define pointers related to Preservation Rules Module
- Add definitions for Shadow RAM pointers to new modules: 5th and 6th
  FPA, and Preservation Rules Module.
- Add I40E_RX_PTYPE_PARSER_ABORTED definition, so the driver will know
  opcode for parser aborted packets.
- Add the new filter types needed for custom cloud filters.
- Add support for Minimum Rollback Revision
- Fix RX_ONLY mode for unicast promiscuous on VLAN
- Add EEE LPI status check for X722 adapters
- Fix PHY type identifiers for 2.5G and 5G adapters
- Fix update link data for X722
- Increase the timeout value for PF reset to give PF more time to finish
  reset if it is loaded with filters.
- Added support for Min Rollback Revision for 4 more X722 modules
- Fix reporting of Active Optical Cable media type
- Add flags and fields for double VLAN processing
- Fix potentially uninitialized variables in NVM code

Reviewed by: kbowling@, mike.jakubik@gmail.com
Tested by: gowtham.kumar.ks@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D31565

(cherry picked from commit abf774528d7e497460510b0026db85e30f054142)

Krzysztof Galazka [Fri, 20 Aug 2021 21:12:28 +0000 (14:12 -0700)]
ixl(4): Fix reporting of unqualified transceivers

When the link_active_on_if_down flag is disabled and the link is brought
down with ifconfig, FW reports a false positive link event about an
unqualified transceiver. The condition used in the driver to filter out
those false positive events was incorrect and also suppressed the
unqualified module event when the event was valid.

Change the condition to rely on IFF_UP flag instead of
link_active_on_if_down and bump driver version to 2.3.1-k.

Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: stallamr@netapp.com, erj@
Tested by: gowtham.kumar.ks@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D30733

(cherry picked from commit c4622b01d2f12b889b57ff7d0b03a38dfcb00fd8)

Krzysztof Galazka [Mon, 5 Apr 2021 18:08:33 +0000 (11:08 -0700)]
ixl(4): Add tunable to override Flow Control settings

Add flow_control to the hw.ixl tunables tree to allow overriding the
initial flow control configuration for all interfaces.
Keep using the configuration set by NVM by default.

Reviewed by: erj@, gallatin@
Tested by: gowtham.kumar.ks_intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D29338

(cherry picked from commit 20a52706c814ccfd91c65586404abd2a1563a330)

Ed Maste [Sun, 12 Sep 2021 23:04:31 +0000 (19:04 -0400)]
libprocstat: extend zfs_defs hack for .pieo

By default _pie.a archives are built only for INTERNALLIBs, so there is
usually no need for zfs_defs.pieo to exist.  However, some experimental
work builds _pie.a archives for everything.  Extend the existing set of
zfs_defs hacks to build zfs_defs.pieo as well.

Reviewed by: arichardson
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31924

(cherry picked from commit b9df18d6e8917a9bfb62babb7cf9efeca23aa2fc)

Alexander Motin [Sat, 21 Aug 2021 15:20:54 +0000 (11:20 -0400)]
targ(4): Remove D_NEEDGIANT.

I don't believe this code needs Giant, if it ever did.

MFC after: 1 month

(cherry picked from commit f3dcedd3de27b1a8f493c8256103e8a7fb93f5a4)

Konstantin Belousov [Sun, 12 Sep 2021 19:41:51 +0000 (22:41 +0300)]
amd64 wakeup: rework trampoline page allocation

(cherry picked from commit 1c56781cc915d1d2957e5b53717513193476d777)

Konstantin Belousov [Sun, 12 Sep 2021 19:24:33 +0000 (22:24 +0300)]
x86: duplicate acpi_wakeup.c per i386 and amd64

(cherry picked from commit 2b6eec531a1b52621223316f7c2940ed1e293886)

Konstantin Belousov [Sat, 11 Sep 2021 18:36:38 +0000 (21:36 +0300)]
amd64 acpi_wakeup: map 1:1 whole low 4G for the trampoline page table

PR: 258432

(cherry picked from commit db2ba218d9fe6a541a4f537a641cce95f952fd98)

Konstantin Belousov [Sat, 11 Sep 2021 18:26:51 +0000 (21:26 +0300)]
x86 acpi_install_wakeup_handler(): style

(cherry picked from commit ceca8ac1ce47e1f87ba09463aa84eb1c879c37d9)

Konstantin Belousov [Sat, 11 Sep 2021 18:19:27 +0000 (21:19 +0300)]
amd64: do not touch low memory in acpi_wakeup_ap() if booted by UEFI

(cherry picked from commit e99255c8a6cae324aeede7f5013d080a2d361e3f)

Ed Maste [Sun, 12 Sep 2021 16:45:50 +0000 (12:45 -0400)]
bsd.lib.mk: add conditions for building _pie.a archives

As with other .a targets, build _pie.a archives only if LIB is set.

At present we build _pie.a only for INTERNALLIBs, and none of them
include bsd.lib.mk without setting LIB.  However, we might want to build
_pie.a for non-INTERNALLIBs in the future.

Reviewed by: arichardson
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31920

(cherry picked from commit 7c0226cad3f36a05832f9c5216dfa3dadb91c92d)

Konstantin Belousov [Thu, 16 Sep 2021 17:23:11 +0000 (20:23 +0300)]
test/ptrace/scescx.c: fix printing of braces for syscalls without args

(cherry picked from commit 9a8eb5db55964c2fc7aca0db5939d8300badc9ab)

Dimitry Andric [Thu, 26 Aug 2021 15:36:03 +0000 (17:36 +0200)]
Add -Wno-error=unused-but-set-variable when building with Clang 13+

This warning triggers many times while building world. Downgrade it to a
warning until all occurrences have been fixed. Once the Clang warnings
have been fixed we should be able to turn it on for GCC as well. See
also f4fed768bba45a406f73ed1491d7e52fd1a8711d which did the same for the
kernel builds.

Reviewed by: arichardson, imp
Differential Revision: https://reviews.freebsd.org/D31927

(cherry picked from commit 45feade38ec3e8e30086dedc6ee81cbf816293e3)

Martin Matuska [Sat, 18 Sep 2021 18:30:40 +0000 (20:30 +0200)]
zfs: merge openzfs/zfs@71c609852 (zfs-2.1-release) into stable/13

OpenZFS release 2.1.1

Notable upstream pull request merges:
  #11997 FreeBSD: Don't force xattr mount option
  #11997 FreeBSD: Implement xattr=sa
  #11997 FreeBSD: Use SET_ERROR to trace xattr name errors
  #12022 Fix endianness issues with zstd
  #12161 Restore FreeBSD sysctl processing for arc.min and arc.max
  #12183 Optimize small random numbers generation
  #12246 arc: Drop an incorrect assert
  #12271 Tinker with slop space accounting with dedup
  #12279 Fix ARC ghost states eviction accounting
  #12281 Move gethrtime() calls out of vdev queue lock
  #12289 Compact dbuf/buf hashes and lock arrays
  #12294 Upstream: dmu_zfetch_stream_fini leaks refcount
  #12295 Fix abd leak, kmem_free correct size of abd_t
  #12297 Avoid vq_lock drop in vdev_queue_aggregate()
  #12299 file reference counts can get corrupted
  #12300 Introduce dsl_dir_diduse_transfer_space()
  #12314 Optimize allocation throttling
  #12320 FreeBSD: Use unmapped I/O for scattered/gang ABD buffers
  #12328 FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE
  #12339 Read past end of argv array in zpool_do_import()
  #12348 Minor ARC optimizations
  #12365 Fixes in persistent L2ARC
  #12375 FreeBSD: Ignore make_dev_s() errors
  #12378 FreeBSD: Switch from MAXPHYS to maxphys on FreeBSD 13+
  #12383 Fixes for KMSAN reports
  #12397 Run arc_evict thread at higher priority
  #12398 Remove b_pabd/b_rabd allocation from arc_hdr_alloc()
  #12422 Fix/improve dbuf hits accounting
  #12428 Fix unfortunate NULL in spa_update_dspace
  #12443 Fixed data integrity issue when underlying disk returns error
  #12446 Allow disabling of unmapped I/O on FreeBSD
  #12473 Initialize parity blocks before RAID-Z reconstruction benchmarking
  #12511 Make 'zpool labelclear -f' work on offlined disks
  #12514 FreeBSD: Don't remove SA xattr if not SA znode
  #12522 Compressed receive with different ashift can result in incorrect
         PSIZE on disk
  #12535 Verify embedded blkptr's in arc_read()
  #12541 Allow sending corrupt snapshots even if metadata is corrupted

Manually included upstream 2.1 backport pull request #12573:
  #12282 FreeBSD: fix compilation of FreeBSD world after 29274c9

Obtained from: OpenZFS
OpenZFS commit: 71c6098526c6d5fbfa84a58cefe6cdc403488d8c
OpenZFS tag: zfs-2.1.1
Relnotes: yes

Marko Zec [Thu, 16 Sep 2021 14:34:05 +0000 (16:34 +0200)]
[fib algo][dxr] Fix division by zero.

A division by zero would occur if DXR was activated on a vnet
with no IP addresses configured on any interfaces.

PR: 257965
MFC after: 3 days
Reported by: Raul Munoz

(cherry picked from commit eb3148cc4d256c20b5c7c9052539139b6f57f58b)

Marko Zec [Wed, 15 Sep 2021 20:36:59 +0000 (22:36 +0200)]
[fib algo][dxr] Optimize trie updating.

Don't needlessly rebuild trie parts unaffected by accumulated
incremental RIB updates.

PR: 257965
Tested by: Konrad Kreciwilk
MFC after: 3 days

(cherry picked from commit b51f8bae570b4e908191a1dae9da38aacf8c0fab)

Marko Zec [Wed, 15 Sep 2021 20:23:17 +0000 (22:23 +0200)]
[fib algo][dxr] Fix undefined behavior.

The result of shifting uint32_t by 32 (or more) is undefined: fix it.

(cherry picked from commit 442c8a245ee3c6640fc9321e18e8316edf469805)
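
As an illustration of the shift problem described in this commit, here is a minimal sketch, not taken from the DXR sources. The helper name prefix_mask() is hypothetical; it builds an IPv4 netmask while special-casing the one prefix length that would otherwise require shifting a 32-bit value by 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption, not the actual fib_dxr code):
 * return the IPv4 netmask for a prefix length in [0, 32] without ever
 * shifting a 32-bit value by 32, which C leaves undefined.
 */
static uint32_t
prefix_mask(unsigned int plen)
{
	assert(plen <= 32);
	if (plen == 0)
		return (0);	/* avoids UINT32_MAX << 32 */
	/* The shift count is now in [0, 31], which is well defined. */
	return ((uint32_t)(UINT32_MAX << (32 - plen)));
}
```

Handling the zero-length prefix separately keeps every shift count strictly below the width of the operand.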

Kevin Bowling [Wed, 8 Sep 2021 22:43:13 +0000 (15:43 -0700)]
e1000: Revert Update intel shared code

This reverts commit fc7682b17f3738573099b8b03f5628dcc8148adb.

This will be done incrementally to help with bisecting an issue in
later I21x devices (ich8lan).

PR: 258153
Approved by: imp
MFC after: 1 day

(cherry picked from commit a4378873e9ce1b35b55378c21f8eae69e58c2525)

Kevin Bowling [Fri, 17 Sep 2021 23:05:27 +0000 (16:05 -0700)]
calendar.freebsd: Fix off-by-one error

(cherry picked from commit 007c2463d6d017ad5321d5cd2bc500e577d22196)

Kristof Provost [Fri, 10 Sep 2021 15:20:39 +0000 (17:20 +0200)]
pf: fix NOINET6 builds

MFC after: 1 week
Sponsored by: Modirum MDPay

(cherry picked from commit 9bdff593ead9434e01cfb6084f21c3e93a22963d)

Kristof Provost [Tue, 7 Sep 2021 12:41:37 +0000 (14:41 +0200)]
pf: qid and pqid can be uint16_t

tag2name() returns a uint16_t, so we don't need to use uint32_t for the
qid (or pqid). This reduces the size of struct pf_kstate slightly. That
in turn buys us space to add extra fields for dummynet later.

Happily these fields are not exposed to user space (there are user space
versions of them, but they can just stay uint32_t), so there's no ABI
breakage in modifying this.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31873

(cherry picked from commit b64f7ce98f5286721a38b31fa2180313f800fb1d)
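
The size reduction described in this commit can be sketched with two stand-in structures. These are hypothetical and contain only the two fields under discussion; the real struct pf_kstate has many more members, so the actual saving depends on its layout and padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the old layout: 32-bit queue IDs. */
struct state_wide {
	uint32_t	qid;
	uint32_t	pqid;
};

/* Hypothetical stand-in for the new layout: 16-bit queue IDs. */
struct state_narrow {
	uint16_t	qid;
	uint16_t	pqid;
};

/* Bytes freed by narrowing both fields in these stand-in structures. */
static size_t
bytes_saved(void)
{
	return (sizeof(struct state_wide) - sizeof(struct state_narrow));
}
```

On common ABIs with 8-bit bytes this frees four bytes per structure, which is the kind of room the commit message mentions for later fields.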

Kristof Provost [Wed, 30 Jun 2021 12:22:27 +0000 (14:22 +0200)]
pf tests: synproxy to localhost test

Test syn-proxying a connection to the local host.

Sponsored by: Modirum MDPay
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31854

(cherry picked from commit 6598cababf6425181a755ec97c3fa66d7ee31393)

Kristof Provost [Wed, 1 Sep 2021 07:54:55 +0000 (09:54 +0200)]
pf: fix synproxy to local

When we're synproxy-ing a connection that's going to us (as opposed to a
forwarded one) we wound up trying to send out the pf-generated tcp
packets through pf_intr(), which called ip(6)_output(). That doesn't
work all that well for packets that are destined for us, so in that case
we must call ip(6)_input() instead.

MFC after: 1 week
Sponsored by:   Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31853

(cherry picked from commit 0a51d74c3ab8e7ee8771cc3ee78ffba831c953ef)

orange30 [Wed, 1 Sep 2021 15:37:36 +0000 (23:37 +0800)]
net: Fix memory leaks upon arp_fillheader() failures

Free memory before return from arprequest_internal().  In in_arpinput(),
if arp_fillheader() fails, it should use goto drop.

Reviewed by: melifaro, imp, markj
Pull Request: https://github.com/freebsd/freebsd-src/pull/534

(cherry picked from commit f5777c123a6382f5fdc9732a87c8fa1ff672f148)
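
The cleanup pattern this commit relies on, a single exit label that frees the buffer on every failure path, can be sketched as follows. All names here (buf_alloc(), fill_header(), send_request(), live_buffers) are hypothetical illustrations and do not come from the ARP code.

```c
#include <stdlib.h>

/* Hypothetical counter of buffers that are allocated but not yet freed. */
static int live_buffers;

static void *
buf_alloc(void)
{
	void *p = malloc(64);

	if (p != NULL)
		live_buffers++;
	return (p);
}

static void
buf_free(void *p)
{
	free(p);
	live_buffers--;
}

/* Hypothetical header-fill step that can be made to fail. */
static int
fill_header(void *buf, int should_fail)
{
	(void)buf;
	return (should_fail ? -1 : 0);
}

static int
send_request(int should_fail)
{
	void *buf;
	int error;

	buf = buf_alloc();
	if (buf == NULL)
		return (-1);
	error = fill_header(buf, should_fail);
	if (error != 0)
		goto drop;	/* never return directly while holding buf */
	/* A real sender would hand the buffer to the output path here. */
drop:
	buf_free(buf);
	return (error);
}
```

Routing every failure through the same label makes it hard to forget the free when a new error check is added later.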

Mark Johnston [Fri, 10 Sep 2021 14:03:51 +0000 (10:03 -0400)]
wpi: Fix a lock leak in an error path in wpi_run()

PR: 258243
Reported by: dinghao.liu@zju.edu.cn

(cherry picked from commit 6d042d7c861a8fffd1784c720720c3b89c7c0883)

Mark Johnston [Fri, 10 Sep 2021 13:07:40 +0000 (09:07 -0400)]
net: Enter a net epoch around protocol if_up/down notifications

When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.

Reported by: syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by: kp, melifaro
Sponsored by: The FreeBSD Foundation

(cherry picked from commit b1e6a792d68e9c59740d5e925405d8d4343d099b)

Alexander Motin [Fri, 3 Sep 2021 01:16:46 +0000 (21:16 -0400)]
callout(9): Allow spin locks use with callout_init_mtx().

Implement lock_spin()/unlock_spin() lock class methods, moving the
assertion to _sleep() instead.  Change assertions in callout(9) to
allow spin locks for both regular and C_DIRECT_EXEC cases.  For
C_DIRECT_EXEC callouts, spin locks are in fact the only locks allowed.

As the first use case, allow taskqueue_enqueue_timeout() to be used on
fast task queues.  It actually becomes more efficient because
C_DIRECT_EXEC avoids extra context switches in callout(9).

MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D31778

(cherry picked from commit 4730a8972b1f4b67bf9ffde8e63ca906ef4c9563)

Alexander Motin [Wed, 18 Aug 2021 21:11:03 +0000 (17:11 -0400)]
geli(8): Do not report error on resize to the same size.

Just validate the old metadata and exit.  Originally the check was
added to not trash the only copy of metadata, but we can achieve the
same just by skipping the writing/trashing.  The metadata validation
should protect the user from wrongly specifying the new size instead of
the old one.

MFC after: 1 month
Sponsored by: iXsystems, Inc.

(cherry picked from commit c7cf100aafb4cb881e05a5153de152907f6c07f3)

Ed Maste [Thu, 2 Sep 2021 21:13:29 +0000 (17:13 -0400)]
openssh: remove login class restrictions leftovers

MFC after: 2 weeks
Fixes: 27ceebbc2402 ("openssh: simplify login class...")
Sponsored by: The FreeBSD Foundation

(cherry picked from commit ba91e31f478aaade96bbdbf01560e8b7cbe41b56)

Henri Hennebert [Thu, 9 Sep 2021 17:33:51 +0000 (13:33 -0400)]
rtsx: Call taskqueue sooner, adjust DELAY(9) calls, add an inversion heuristic

- Some configurations, e.g. HP EliteBook 840 G3, come with a dummy card
in the card slot which is detected as a valid SD card.  This added a long
timeout at boot time.  To alleviate the problem, the default timeout is
reduced to one second during the setup phase. [1]

- Some configurations crash at boot if rtsx(4) is defined in the kernel
config.  At boot time, without a card inserted, the driver finds that
a card is present, and just after that a "spontaneous" interrupt is
generated showing that no card is present.  To solve this problem,
DELAY(9) is set to one quarter of a second before checking card presence
during driver attach.

- As advised by adrian, taskqueue and DMA are set up sooner during
the driver attach.  A heuristic to try to detect configurations needing
inversion was added.

PR: 255130 [1]
Differential Revision: https://reviews.freebsd.org/D30499

(cherry picked from commit 9d3bc163825415f900d06d62efdf02caaad2d51d)

Mark Johnston [Thu, 9 Sep 2021 12:33:26 +0000 (08:33 -0400)]
sctp: Clear assoc socket references when freeing a PCB

This restores behaviour present in the first import of SCTP.  Commit
ceaad40ae729dea2c5d8ffcfdd45bb96fb8969d2 commented this out and commit
62fb761ff28bb184a2543e539dd689fefd5d3246 removed it.  However, once
sctp_inpcb_free() returns, the socket reference is gone no matter what,
so we need to clear it.

Reported by: syzbot+30dd69297fcbc5f0e10a@syzkaller.appspotmail.com
Reported by: syzbot+7b2f9d4bcac1c9569291@syzkaller.appspotmail.com
Reported by: syzbot+ed3e651f7d040af480a6@syzkaller.appspotmail.com
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 4250aa1188b5622a6cef871003abd4a50067bdae)

Mark Johnston [Thu, 9 Sep 2021 13:50:27 +0000 (09:50 -0400)]
osd: Fix racy assertions

osd_register(9) may reallocate and expand the destructor array for a
given object type if no space is available for a new key.  This happens
with the object lock held.  Thus, when verifying that a given slot in
the array is occupied, we need to hold the object lock to avoid racing
with a reallocation.

Reported by: syzbot+69ce54c7d7d813315dd3@syzkaller.appspotmail.com
Sponsored by: The FreeBSD Foundation

(cherry picked from commit 187afc58791cd877c8ba0573b7826c31db8c6f73)

Alexander Motin [Thu, 2 Sep 2021 22:11:58 +0000 (18:11 -0400)]
bnxt(4): Fix bugs in WOL support.

Before this change the driver reported IFCAP_WOL_MAGIC as enabled, but
not supported.  This caused errors on some SIOCSIFCAP calls.  Instead,
report support if the hardware supports WOL, and enabled status if
it has such a filter installed on boot.

Also bnxt_wol_config() should check WOL status in if_getcapenable(),
not in if_getcapabilities(), to get the current one.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.

(cherry picked from commit 8c14d7da5b9be78f71b1aa803e93ae7de973dd42)

Rick Macklem [Sun, 29 Aug 2021 23:46:27 +0000 (16:46 -0700)]
nfsd: Make loop calling VOP_ALLOCATE() iterate until done

The NFSv4.2 Deallocate operation loops on VOP_DEALLOCATE()
while progress is being made (remaining length decreasing).
This patch changes the loop on VOP_ALLOCATE() for the NFSv4.2
Allocate operation to do the same, instead of stopping after
an arbitrary 20 iterations.

(cherry picked from commit 13914e51eb8de6fe9f627c9c1d48c09880b2607e)
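
The "iterate while progress is being made" loop described in this commit can be sketched as below. This is not the nfsd code: fake_allocate() is a hypothetical stand-in for VOP_ALLOCATE() that handles at most CHUNK bytes per call, and the -1 return for a stalled loop is an assumed placeholder error value.

```c
#include <stdint.h>

#define	CHUNK	4096	/* hypothetical per-call limit */

/*
 * Hypothetical stand-in for VOP_ALLOCATE(): processes at most CHUNK
 * bytes, advancing *offp and shrinking *lenp by the amount handled.
 */
static int
fake_allocate(int64_t *offp, int64_t *lenp)
{
	int64_t done = (*lenp < CHUNK) ? *lenp : CHUNK;

	*offp += done;
	*lenp -= done;
	return (0);
}

/*
 * Keep calling while the remaining length decreases, rather than
 * giving up after a fixed number of iterations.
 */
static int
allocate_all(int64_t off, int64_t len, int *callsp)
{
	int64_t olen;
	int error;

	*callsp = 0;
	do {
		olen = len;
		error = fake_allocate(&off, &len);
		(*callsp)++;
	} while (error == 0 && len > 0 && len < olen);
	if (error == 0 && len > 0)
		error = -1;	/* stalled: no progress on the last call */
	return (error);
}
```

With a 100-chunk request this loop runs 100 times, which a fixed cap of 20 iterations would have cut short.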

Brian Behlendorf [Wed, 15 Sep 2021 20:37:18 +0000 (13:37 -0700)]
Tag 2.1.1

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Brian Behlendorf [Wed, 15 Sep 2021 20:19:12 +0000 (13:19 -0700)]
Linux 5.14 compat: META

Increase the Linux-Maximum version in the META file to 5.14.
All of the required compatibility patches have been merged
and the 5.14 kernel has been officially released.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12565

Wojciech Macek [Wed, 18 Aug 2021 06:21:14 +0000 (08:21 +0200)]
ipmi: fix negative logic in watchdog control flag

Use wd_enable instead of wd_disable

(cherry picked from commit e3500c602b13f8252eb8bb779849c41d47306cee)

Wojciech Macek [Tue, 17 Aug 2021 06:28:21 +0000 (08:28 +0200)]
ipmi: New tunable to deactivate IPMI watchdog

In case we want to use a watchdog other than the IPMI-provided one, add
a sysctl to disable initialization.

Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D31548

(cherry picked from commit e8ad0a0059afe1cd0af39bab49018ae7bc9be937)

Mark Johnston [Wed, 8 Sep 2021 14:59:42 +0000 (10:59 -0400)]
path_test: Fix test sorting

Sponsored by: The FreeBSD Foundation

(cherry picked from commit c4c66153243896f9de49474c817fe450584a3bf3)

Mark Johnston [Wed, 8 Sep 2021 14:57:04 +0000 (10:57 -0400)]
path_test: Fix the unix socket test

The intent was to specify O_PATH to open(2).

Sponsored by: The FreeBSD Foundation

(cherry picked from commit 8b83b656a507ee767fcb6921985720d1df61101b)

Dimitry Andric [Wed, 8 Sep 2021 12:04:13 +0000 (14:04 +0200)]
i386 loaders: avoid lld 13 garbage collecting linker sets

Because lld 13 and higher default to garbage collecting start/stop
symbols when using --gc-sections, the linker sets used in the i386 boot
loaders will disappear. This leads to the loaders not recognizing any
commands, and failure to boot.

Until we have a good set of linker scripts for the loaders, work around
it by disabling the start-stop-gc feature.

(cherry picked from commit c90cab0d668af5d947054e47184d4f8dcb874ec8)

Colin Percival [Wed, 15 Sep 2021 02:42:14 +0000 (19:42 -0700)]
Turn off acpi_timer_test on !i386 by default

The ACPI timer test was introduced in 2002 to detect an erratum in
chipsets used with Pentium II and Pentium III processors.  No other
hardware is known to be affected, so on non-i386 systems it should
be safe to skip the test.

Turning off this test speeds up the FreeBSD boot process by roughly
140 ms on an EC2 c5.xlarge instance.

The previous behaviour can be restored by setting
hw.acpi.timer_test_enabled=1
in /boot/loader.conf.

Requested by: jhb, imp
Sponsored by: https://www.patreon.com/cperciva

Colin Percival [Tue, 7 Sep 2021 23:58:18 +0000 (16:58 -0700)]
Hide acpi_timer_test behind a tunable

When hw.acpi.timer_test_enabled is set to 0, this makes acpi_timer_test
return 1 without actually testing the ACPI timer; this results in the
ACPI-fast timecounter always being used rather than potentially using
ACPI-safe.

The ACPI timer testing was introduced in 2002 as a workaround for
errata in Pentium II and Pentium III chipsets, and is unlikely to be
needed in 2021.

While I'm here, add TSENTER/TSEXIT to make it easier to see the time
spent on the test (if it is enabled).

Reviewed by: allanjude, imp
MFC After: 1 week

(cherry picked from commit 3c253d03d94e89cf1a26716b58fc27653df2a4f3)

Kyle Evans [Thu, 13 May 2021 18:46:17 +0000 (13:46 -0500)]
kern: drop remaining references to removed makesyscalls.sh

This was accidentally omitted from the recent removal of makesyscalls.sh.

(cherry picked from commit 35aa1d6e4542ce7c8be127b85da2a5c9e8ade3f7)

Ed Maste [Tue, 31 Aug 2021 19:30:50 +0000 (15:30 -0400)]
openssh: simplify login class restrictions

Login class-based restrictions were introduced in 5b400a39b8ad.  The
code was adapted for sshd's Capsicum sandbox and received many changes
over time, including at least fc3c19a9fcee, bd393de91cc3, and
e8c56fba2926.

During an attempt to upstream the work a much simpler approach was
suggested.  Adopt it now in the in-tree OpenSSH to reduce conflicts with
future updates.

Submitted by: Yuchiro Naito (against OpenSSH-portable on GitHub)
Obtained from: https://github.com/openssh/openssh-portable/pull/262
Reviewed by: allanjude, kevans
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31760

(cherry picked from commit 27ceebbc2402e4c98203c7eef9696f4bd3d326f8)

Alexander Motin [Wed, 1 Sep 2021 03:47:51 +0000 (23:47 -0400)]
Align taskqueue_enqueue_timeout() to hardclock.

It is done for all other KPIs using HZ, but was missed here.

MFC after: 2 weeks

(cherry picked from commit 706b1a5724d668a8752ac89cd67113e4c6917d54)

Arun KV [Mon, 13 Sep 2021 20:02:39 +0000 (01:32 +0530)]
Fixed data integrity issue when underlying disk returns error

Errors in zil_lwb_write_done() are not propagated to
zil_lwb_flush_vdevs_done() which can result in zil_commit_impl()
not returning an error to applications even when zfs was not able
to write data to the disk.

Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to
allow errors to propagate and consolidate the error handling for
flush and write errors to a single location (rather than having
error handling split between the "write done" and "flush done"
handlers).

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Arun KV <arun.kv@datacore.com>
Closes #12391
Closes #12443

Brian Behlendorf [Mon, 13 Sep 2021 19:18:01 +0000 (12:18 -0700)]
ZTS: Waiting for zvols to be available

This is a follow-up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable and should explicitly
wait for the required zvols to be available.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12553

Brian Behlendorf [Fri, 10 Sep 2021 01:02:07 +0000 (18:02 -0700)]
Verify embedded blkptr's in arc_read()

The block pointer verification check in arc_read() should also
cover embedded block pointers.  While highly unlikely, accessing
a damaged block pointer can result in panic.  To further harden
the code extend the existing check to include embedded block
pointers and add a comment explaining the rationale for this
sanity check.  Lastly, correct a flaw in zfs_blkptr_verify()
so the error count is checked even when checking an untrusted
config to verify the non-pool-specific portions of a block
pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12535

Brian Behlendorf [Thu, 9 Sep 2021 16:38:35 +0000 (09:38 -0700)]
Linux 5.15 compat: get_acl()

Kernel commits

332f606b32b6 ovl: enable RCU'd ->get_acl()
0cad6246621b vfs: add rcu argument to ->get_acl() callback

Added compatibility code to detect the new ->get_acl() interface
and correctly handle the case where the new rcu argument is set.

Reviewed-by: Coleman Kane <ckane@colemankane.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12548

Allan Jude [Thu, 9 Sep 2021 14:17:31 +0000 (10:17 -0400)]
Allow sending corrupt snapshots even if metadata is corrupted

When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a blockpointer
cannot be read, but rather will continue and send the objects it can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes #12541

Rich Ercolani [Wed, 8 Sep 2021 21:00:03 +0000 (17:00 -0400)]
arc: Drop an incorrect assert

Unfortunately, there was an overzealous assertion that was (in pretty
specific circumstances) false, causing failure.  This assertion was
added in error, so we're removing it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #9897
Closes #12020
Closes #12246

Paul Dagnelie [Wed, 8 Sep 2021 20:52:28 +0000 (13:52 -0700)]
Compressed receive with different ashift can result in incorrect PSIZE on disk

We round up the psize to the nearest multiple of the asize or to the
lsize, whichever is smaller. Once that's done, we allocate a new
buffer of the appropriate size, zero the tail, and copy the data
into it. This adds a small performance cost to these kinds of writes,
but fixes the bookkeeping problems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12522
Closes #8462
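
The rounding rule described in this commit can be illustrated with a small sketch. It is not the OpenZFS implementation: padded_psize() is a hypothetical helper, and it assumes that the "asize" in the message means the vdev allocation unit, 1 << ashift.

```c
#include <stdint.h>

/*
 * Hypothetical helper (an assumption, not OpenZFS code): round psize up
 * to the next multiple of the allocation unit (1 << ashift), but never
 * beyond the logical size lsize.
 */
static uint64_t
padded_psize(uint64_t psize, uint64_t lsize, unsigned int ashift)
{
	uint64_t unit = UINT64_C(1) << ashift;
	/* unit is a power of two, so masking rounds up to a multiple. */
	uint64_t rounded = (psize + unit - 1) & ~(unit - 1);

	return (rounded < lsize ? rounded : lsize);
}
```

For example, a 5000-byte compressed payload received into a pool with ashift=12 would be padded to 8192 bytes, unless the logical size is smaller than that.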

Alexander [Wed, 8 Sep 2021 19:59:43 +0000 (21:59 +0200)]
Linux 5.15 compat: standalone <linux/stdarg.h>

Kernel commits

39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers")
c0891ac15f04 ("isystem: ship and use stdarg.h")
564f963eabd1 ("isystem: delete global -isystem compile option")

(for now can be found in linux-next.git tree, will land into the
 Linus' tree during the ongoing 5.15 cycle with one of akpm merges)

removed the -isystem flag and disallowed the inclusion of any
compiler header files. They also introduced a minimal
<linux/stdarg.h> as a replacement for <stdarg.h>.
include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes
<stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h>
and include it instead of the compiler's one to prevent module
build breakage.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12531

Brian Behlendorf [Wed, 8 Sep 2021 15:03:13 +0000 (08:03 -0700)]
Linux 5.15 compat: block device readahead

The 5.15 kernel moved the backing_dev_info structure out of
the request queue structure which causes a build failure.

Rather than look in the new location for the BDI we instead
detect this upstream refactoring by the existence of either
the blk_queue_update_readahead() or disk_update_readahead()
functions.  In either case, there's no longer any reason to
manually set the ra_pages value since it will be overridden
with a reasonable default (2x the block size) when
blk_queue_io_opt() is called.

Therefore, we update the compatibility wrapper to do nothing
for 5.9 and newer kernels.  While it's tempting to do the
same for older kernels we want to keep the compatibility
code to preserve the existing behavior.  Removing it would
effectively increase the default readahead to 128k.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12532

2 years agoDetect iSCSI in the zpool cmd vdev media script
Don Brady [Thu, 2 Sep 2021 23:11:53 +0000 (17:11 -0600)]
Detect iSCSI in the zpool cmd vdev media script

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #12206

2 years agoCI: don't install abigail-tools
George Melikov [Tue, 31 Aug 2021 20:56:45 +0000 (23:56 +0300)]
CI: don't install abigail-tools

We use a docker image instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoUpdate ABI files via new libabigail version
George Melikov [Tue, 31 Aug 2021 19:26:30 +0000 (22:26 +0300)]
Update ABI files via new libabigail version

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoLibabigail: make .abi files more consistent
George Melikov [Tue, 31 Aug 2021 18:52:05 +0000 (21:52 +0300)]
Libabigail: make .abi files more consistent

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoCI: use fresh libabigail via docker image
George Melikov [Tue, 31 Aug 2021 17:53:12 +0000 (20:53 +0300)]
CI: use fresh libabigail via docker image

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoCheck for libabigail version
George Melikov [Tue, 31 Aug 2021 17:49:29 +0000 (20:49 +0300)]
Check for libabigail version

We need to use version 1.8.0 or newer; older versions
may segfault and give inconsistent results.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoZTS: Remove exceptions for flaky zhack on FreeBSD
Ryan Moeller [Wed, 1 Sep 2021 20:20:00 +0000 (16:20 -0400)]
ZTS: Remove exceptions for flaky zhack on FreeBSD

Issue #11854 has been resolved, so we can remove the exceptions for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12527

2 years agoFreeBSD: Don't remove SA xattr if not SA znode
Ryan Moeller [Mon, 30 Aug 2021 23:01:09 +0000 (19:01 -0400)]
FreeBSD: Don't remove SA xattr if not SA znode

We attempt to remove an existing SA xattr when setting a dir xattr, but
this only makes sense if the znode has been upgraded to the SA format.
Otherwise, we will hit an assert in zfs_sa_get_xattr.

Make sure this is an SA znode before attempting to remove the SA xattr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12514

2 years agoFix cross-endian interoperability of zstd
Rich Ercolani [Mon, 30 Aug 2021 21:13:46 +0000 (17:13 -0400)]
Fix cross-endian interoperability of zstd

It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.

Switch everyone over to the LE layout, and add compatibility code
to read both.
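
To illustrate why explicit shifts avoid the problem, here is a sketch
that packs a header word by hand and stores it as little-endian bytes.
The field widths (24-bit version, 8-bit level) and all function names
are assumptions for the example, not the actual on-disk zstd header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header word: low 24 bits = version, high 8 bits = level. */
uint32_t
hdr_pack(uint32_t version, uint8_t level)
{
	assert(version <= 0xFFFFFFu);
	return ((version & 0xFFFFFFu) | ((uint32_t)level << 24));
}

/* Store the word as little-endian bytes regardless of host byte order. */
void
hdr_store_le(uint8_t out[4], uint32_t word)
{
	out[0] = (uint8_t)(word & 0xFF);
	out[1] = (uint8_t)((word >> 8) & 0xFF);
	out[2] = (uint8_t)((word >> 16) & 0xFF);
	out[3] = (uint8_t)((word >> 24) & 0xFF);
}

/* Load a little-endian word regardless of host byte order. */
uint32_t
hdr_load_le(const uint8_t in[4])
{
	return ((uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	    ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24));
}

uint32_t
hdr_version(uint32_t word)
{
	return (word & 0xFFFFFFu);
}

uint8_t
hdr_level(uint32_t word)
{
	return ((uint8_t)(word >> 24));
}
```

Because no bitfield is involved, the byte sequence is the same on BE
and LE hosts; the compiler's bitfield ordering never enters into it.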

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12008
Closes #12022

2 years agoZTS: Waiting for zvols to be available
Brian Behlendorf [Sun, 29 Aug 2021 15:56:58 +0000 (08:56 -0700)]
ZTS: Waiting for zvols to be available

The ZTS block_device_wait helper function should use -e when waiting
for a file to appear since it will be either a block special device
or a symlink.  This didn't cause any failures but when a device path
was specified the function would wait longer than needed.

Additionally update the most flaky test cases to pass the file path
to block_device_wait to try to improve the test reliability.  The
udev behavior on Fedora in particular can result in frequent false
positives.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12515

2 years agoCorrect checking bdev_check_media_change message
Ryan Moeller [Fri, 27 Aug 2021 17:02:54 +0000 (13:02 -0400)]
Correct checking bdev_check_media_change message

We're not looking for bdev_disk_changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12492

2 years agoMake 'zpool labelclear -f' work on offlined disks
Tony Hutter [Fri, 27 Aug 2021 16:26:49 +0000 (09:26 -0700)]
Make 'zpool labelclear -f' work on offlined disks

This patch allows you to clear the label on offlined disks in an active
pool with `-f`.  Previously, labelclear wouldn't let you do that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12511

2 years agovdev_id: Return an error if config file is not found
Anton Gubarkov [Wed, 25 Aug 2021 20:01:26 +0000 (23:01 +0300)]
vdev_id: Return an error if config file is not found

Signed-off-by: Anton Gubarkov <anton.gubarkov@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

2 years agozpool-remove.8: describe top-level vdev sector size limitation
Sam Hathaway [Mon, 23 Aug 2021 21:59:18 +0000 (17:59 -0400)]
zpool-remove.8: describe top-level vdev sector size limitation

Document that top-level vdevs cannot be removed unless all top-level
vdevs have the same sector size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Sam Hathaway <sam@sam-hathaway.com>
Closes #11339
Closes #12472

2 years agoInitialize parity blocks before RAID-Z reconstruction benchmarking
Mark Johnston [Mon, 23 Aug 2021 18:10:17 +0000 (14:10 -0400)]
Initialize parity blocks before RAID-Z reconstruction benchmarking

benchmark_raidz() allocates a row to benchmark parity calculation and
reconstruction.  In the latter case, the parity blocks are left
uninitialized, leading to reports from KMSAN.

Initialize parity blocks to 0xAA as we do for the data earlier in the
function.  This does not affect the selected RAID-Z implementation on
any of several systems tested.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12473

2 years agoZTS: Add tests for creation time
Ryan Moeller [Mon, 26 Jul 2021 20:08:52 +0000 (16:08 -0400)]
ZTS: Add tests for creation time

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12432

2 years agoLinux 4.11 compat: statx support
Richard Yao [Sun, 17 Mar 2019 00:43:13 +0000 (20:43 -0400)]
Linux 4.11 compat: statx support

Linux 4.11 added a new statx system call that allows us to expose crtime
as btime. We do this by caching crtime in the znode to match how atime,
ctime and mtime are cached in the inode.

statx also introduced a new way of reporting whether the immutable,
append and nodump bits have been set. It adds support for reporting
compression and encryption, but the semantics on other filesystems is
not just to report compression/encryption, but to allow it to be turned
on/off at the file level. We do not support that.

We could implement semantics where we refuse to allow user modification
of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc()
to find out encryption/compression information. That would introduce
locking that will have a minor (although unmeasured) performance cost.
It also would be inferior to zdb, which reports far more detailed
information. We therefore omit reporting of encryption/compression
through statx in favor of recommending that users interested in such
information use zdb.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #8507

2 years agozfs.4: Fix typo s/compatiblity/compatibility/
Gordon Bergling [Tue, 17 Aug 2021 17:01:07 +0000 (19:01 +0200)]
zfs.4: Fix typo s/compatiblity/compatibility/

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Gordon Bergling <gbergling@googlemail.com>
Closes #12464

2 years agoRemove b_pabd/b_rabd allocation from arc_hdr_alloc()
Alexander Motin [Tue, 17 Aug 2021 16:15:54 +0000 (12:15 -0400)]
Remove b_pabd/b_rabd allocation from arc_hdr_alloc()

When a header is allocated for a full overwrite it is a waste of time
to allocate b_pabd/b_rabd for it, since arc_write() will free them
without their ever being touched.  If it is a read or a partial
overwrite then arc_read() and arc_hdr_decrypt() allocate them explicitly.

Reducing memory allocation in user threads also reduces ARC eviction
throttling there and proportionally increases it in ZIO threads, which
is not good.  To minimize or even avoid this, introduce an ARC allocation
reserve that allows certain arc_get_data_abd() callers to allocate a
bit longer in situations where user threads would already throttle.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12398

2 years agoOptimize arc_l2c_only lists assertions
Alexander Motin [Tue, 17 Aug 2021 15:55:34 +0000 (11:55 -0400)]
Optimize arc_l2c_only lists assertions

It is very expensive and not informative to call multilist_is_empty()
on every arc_change_state() in debug builds to check for the impossible.
Instead, implement a special index function for the
arc_l2c_only->arcs_list multilists that panics on any attempt to use it.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12421

2 years agoFix/improve dbuf hits accounting
Alexander Motin [Tue, 17 Aug 2021 15:50:31 +0000 (11:50 -0400)]
Fix/improve dbuf hits accounting

Instead of clearing stats inside arc_buf_alloc_impl() do it inside
arc_hdr_alloc() and arc_release().  It fixes statistics being wiped
every time a new dbuf is filled from the ARC.

Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits.
Since the hits are accounted under hash lock, replace atomics with
simple increments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12422

2 years agoAvoid vq_lock drop in vdev_queue_aggregate()
Alexander Motin [Tue, 17 Aug 2021 15:47:00 +0000 (11:47 -0400)]
Avoid vq_lock drop in vdev_queue_aggregate()

vq_lock is already too congested for two more operations per I/O.
Instead of dropping and reacquiring it inside vdev_queue_aggregate()
delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
I/Os to callers, which drop the lock anyway to execute the new I/O.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12297

2 years agoUse more atomics in refcounts
Alexander Motin [Tue, 17 Aug 2021 15:44:34 +0000 (11:44 -0400)]
Use more atomics in refcounts

Use atomic_load_64() for zfs_refcount_count() to prevent torn reads
on 32-bit platforms.  On 64-bit ones it should not change anything.
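
As a rough user-space analogy, the sketch below uses C11 <stdatomic.h>
rather than the kernel's atomic_load_64(); the demo_refcount_t type and
its functions are hypothetical stand-ins, not the ZFS refcount code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a reference count. */
typedef struct demo_refcount {
	_Atomic uint64_t rc_count;
} demo_refcount_t;

void
demo_refcount_init(demo_refcount_t *rc)
{
	atomic_init(&rc->rc_count, 0);
}

/* Add one reference and return the new count. */
uint64_t
demo_refcount_add(demo_refcount_t *rc)
{
	return (atomic_fetch_add(&rc->rc_count, 1) + 1);
}

/* Drop one reference and return the new count. */
uint64_t
demo_refcount_remove(demo_refcount_t *rc)
{
	uint64_t old = atomic_fetch_sub(&rc->rc_count, 1);

	assert(old > 0);
	return (old - 1);
}

/*
 * Read the count with a single atomic load.  A plain 64-bit read may be
 * split into two 32-bit loads on a 32-bit CPU, so a concurrent update
 * could be observed half-applied; the atomic load rules that out.
 */
uint64_t
demo_refcount_count(demo_refcount_t *rc)
{
	return (atomic_load(&rc->rc_count));
}
```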

When built with ZFS_DEBUG but running without tracking enabled, use
atomics instead of mutexes, the same as for builds without ZFS_DEBUG.
Since rc_tracked can't change live, we can check it without the lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12420

2 years agoZTS: Avoid unset $tmpdir in redacted_panic
Ryan Moeller [Mon, 16 Aug 2021 23:38:34 +0000 (19:38 -0400)]
ZTS: Avoid unset $tmpdir in redacted_panic

The redacted_send tests make use of a $tmpdir variable, but in
redacted_send/redacted_panic the variable is never defined.

Use $TEST_BASE_DIR instead.

Clean up the stream file after the test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12455

2 years agoRestore FreeBSD sysctl processing for arc.min and arc.max
Allan Jude [Mon, 16 Aug 2021 15:35:19 +0000 (11:35 -0400)]
Restore FreeBSD sysctl processing for arc.min and arc.max

Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max
to a disallowed value would return an error.
Since the switch, it instead only generates WARN_IF_TUNING_IGNORED.

Keep the ability to set the sysctls specifically to 0, even though
that is less than the minimum, because some tests depend on this.

Also lost was the ability to set vfs.zfs.arc_max to a value less
than the default vfs.zfs.arc_min at boot time. Restore this as well.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12161

2 years agozfs: add missed dependency of zfs module on zlib
Ryan Moeller [Fri, 13 Aug 2021 20:42:45 +0000 (16:42 -0400)]
zfs: add missed dependency of zfs module on zlib

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Co-authored-by: Konstantin Belousov <kib@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
External-issue: https://reviews.freebsd.org/D31207
Closes #12442

2 years agoAdd zfs.sh -r flag to reload modules
Ryan Moeller [Fri, 13 Aug 2021 20:37:46 +0000 (16:37 -0400)]
Add zfs.sh -r flag to reload modules

zfs.sh already can load and unload, so why not both?

This is convenient when developing changes to the module and you want
to rapidly make some changes, rebuild the module, reload the module,
and test the changes.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12450

2 years agoFix usage of find in tests/Makefile.am
Ryan Moeller [Fri, 13 Aug 2021 20:13:57 +0000 (16:13 -0400)]
Fix usage of find in tests/Makefile.am

The path is not optional on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12453

2 years agoRun arc_evict thread at higher priority
Tony Nguyen [Tue, 10 Aug 2021 17:36:26 +0000 (11:36 -0600)]
Run arc_evict thread at higher priority

Run the arc_evict thread at a higher priority, nice=0, to give it more
CPU time, which can improve performance for workloads with high ARC
eviction activity.

On mixed read/write and sequential read workloads, I've seen
10-40% better performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #12397

2 years agoMake get_key_material_file fail more verbosely
Rich Ercolani [Thu, 5 Aug 2021 23:48:33 +0000 (19:48 -0400)]
Make get_key_material_file fail more verbosely

It turns out, there are a lot of possible reasons for fopen to fail.
Let's share which reason we failed for today.
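
A sketch of the idea, assuming a POSIX system where fopen() sets errno
on failure. The open_key_file() helper and its message text are
illustrative only, not the code from this change.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: open a key file for reading and, on failure,
 * say why.  Returns 0 on success or the errno value left by fopen().
 */
int
open_key_file(const char *path, FILE **fdp)
{
	assert(path != NULL && fdp != NULL);

	*fdp = fopen(path, "r");
	if (*fdp == NULL) {
		int err = errno;	/* save before fprintf can change it */

		fprintf(stderr, "failed to open key file '%s': %s\n",
		    path, strerror(err));
		return (err);
	}
	return (0);
}
```

With the strerror() text included, a missing file and a permission
problem no longer produce the same message.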

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12410

2 years agoEnable /proc/diskstats for zvols
Brian Behlendorf [Thu, 5 Aug 2021 21:35:34 +0000 (14:35 -0700)]
Enable /proc/diskstats for zvols

The /proc/diskstats accounting needs to be explicitly enabled
for block devices which do not use multi-queue.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12440
Closes #12066

2 years agoMan zpool-scrub.8: describe sequential scrub
George Melikov [Thu, 5 Aug 2021 21:30:28 +0000 (00:30 +0300)]
Man zpool-scrub.8: describe sequential scrub

Describe sequential scrub and add examples of scrub status.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12429

2 years agoModify checksum obtain method of QAT
hedongzhang [Tue, 3 Aug 2021 17:46:33 +0000 (01:46 +0800)]
Modify checksum obtain method of QAT

The CpaDcGeneratefooter function, which obtains the checksum, does
not support CPA_DC_STATELESS mode, so we take the adler32 checksum
for the end of the zlib stream from dc_results instead.
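
For context, a zlib stream (RFC 1950) ends with the Adler-32 of the
uncompressed data, stored most-significant byte first. The sketch
below computes that value in software and writes the 4-byte trailer;
it does not touch the QAT API, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	ADLER_MOD	65521u	/* largest prime below 2^16 */

/* Straightforward (unoptimized) Adler-32 over a buffer. */
uint32_t
adler32_simple(const uint8_t *buf, size_t len)
{
	uint32_t a = 1, b = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % ADLER_MOD;
		b = (b + a) % ADLER_MOD;
	}
	return ((b << 16) | a);
}

/* Write the checksum as the big-endian trailer that ends a zlib stream. */
void
zlib_put_trailer(uint8_t out[4], uint32_t adler)
{
	out[0] = (uint8_t)(adler >> 24);
	out[1] = (uint8_t)(adler >> 16);
	out[2] = (uint8_t)(adler >> 8);
	out[3] = (uint8_t)adler;
}
```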

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Signed-off-by: hedong.zhang <h_d_zhang@163.com>
Closes #12343

2 years agoAllow disabling of unmapped I/O on FreeBSD
Mark Johnston [Mon, 2 Aug 2021 19:18:24 +0000 (15:18 -0400)]
Allow disabling of unmapped I/O on FreeBSD

We have a tunable which permits one to disable the use of unmapped I/O
for the buffer cache.  Respect it in ZFS as well.  This is useful for
KMSAN, which cannot easily maintain shadow state for unmapped pages.

No functional change intended, as unmapped I/O is permitted by default
and there's no real reason to disable it in practice except for
debugging.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12446

2 years agoAdd comment on metaslab_class_throttle_reserve() locking
Alexander Motin [Mon, 26 Jul 2021 23:30:20 +0000 (19:30 -0400)]
Add comment on metaslab_class_throttle_reserve() locking

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Issue #12314
Closes #12419

2 years agoAssorted fixes for the performance tests
John Wren Kennedy [Mon, 26 Jul 2021 21:47:08 +0000 (15:47 -0600)]
Assorted fixes for the performance tests

- Bail out early if we're running the perf tests and forget to
  specify disks.
- Allow perf tests to run with any number of disks.
- Remove weekly vs. nightly settings
- Move variables with common values to perf.shlib
- Use zinject to clear the ARC over export/import
- Fix dbuf cache size calculation

When the meaning of `dbuf_cache_max_bytes` changed, the performance
test that covers the dbuf cache started to fail. The test would try to
write files for the test using the max possible size of the cache,
inevitably filling the pool and failing. This change uses
`dbuf_cache_shift` to correctly calculate the dbuf cache size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12408

2 years agoRead past end of argv array in zpool_do_import()
Matthew Ahrens [Mon, 26 Jul 2021 19:51:39 +0000 (12:51 -0700)]
Read past end of argv array in zpool_do_import()

`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and
`pool_specified` to `import_pools()`.  If `pool_specified==FALSE`, the
`argv[]` arguments are not used.  However, these values may be off the
end of the `argv[]` array, so loading them could dereference unmapped
memory.  This error is reported by the asan build:

```
=================================================================
==6003==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 8 at 0x6030000004a8 thread T0
    #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796
    #1 0x562a078858c5 in main zpool_main.c:10709
    #2 0x7f5115231bf6 in __libc_start_main
    #3 0x562a07885eb9 in _start

0x6030000004a8 is located 0 bytes to the right of 24-byte region
allocated by thread T0 here:
    #0 0x7f5116ac6b40 in __interceptor_malloc
    #1 0x562a07885770 in main zpool_main.c:10699
    #2 0x7f5115231bf6 in __libc_start_main
```

This commit passes NULL for these arguments if they are off the end
of the `argv[]` array.
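
The guard amounts to a bounds check before indexing. A minimal sketch,
with a hypothetical `arg_or_null()` helper that is not part of the
actual change:

```c
#include <assert.h>
#include <stddef.h>

/* Return argv[idx] if it exists, otherwise NULL; never reads past argc. */
const char *
arg_or_null(int argc, char **argv, int idx)
{
	assert(argc >= 0 && idx >= 0);
	return (idx < argc ? argv[idx] : NULL);
}
```

A caller would then pass `arg_or_null(argc, argv, 0)` and
`arg_or_null(argc, argv, 1)` instead of indexing `argv[]` directly.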

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #12339

2 years agoAdd missing properties to zfs allow manpage
Václav Skála [Tue, 20 Jul 2021 20:21:18 +0000 (22:21 +0200)]
Add missing properties to zfs allow manpage

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #12402

2 years agoFixes in persistent L2ARC
George Amanakis [Mon, 26 Jul 2021 19:30:24 +0000 (21:30 +0200)]
Fixes in persistent L2ARC

In l2arc_add_vdev() first decide whether the device is eligible for
L2ARC rebuild or whole device trim and then add it to the list of cache
devices. Otherwise l2arc_feed_thread() might already start writing on
the device invalidating previous content as l2ad_hand = l2ad_start.
However l2arc_rebuild_vdev() needs the device present in the cache
device list to figure out its l2arc_dev_t. Fix this by moving most of
l2arc_rebuild_vdev() into a new function l2arc_rebuild_dev() which does
not need to search in the cache device list.

In contrast to l2arc_add_vdev() we do not have to worry about
l2arc_feed_thread() invalidating previous content when onlining a
cache device. The device parameters (l2ad*) are not cleared when
offlining the device and writing new buffers will not invalidate
all previous content. In the worst case only buffers that have not had
their log block written to the device will be lost.

Retire persist_l2arc_00{4,5,8} tests since they cover code already
covered by the remaining ones. Test persist_l2arc_006 is renamed to
persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005.

Fix a typo in persist_l2arc_004, and remove an assertion that is not
always true from l2arc_arcstats_pos. Also update an assertion in
persist_l2arc_005 and explain why in a comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12365

2 years agoInitialize dn_next_type[] in the dnode constructor
Mark Johnston [Fri, 16 Jul 2021 14:12:47 +0000 (10:12 -0400)]
Initialize dn_next_type[] in the dnode constructor

It seems nothing ensures that this array is zeroed when a dnode is
freshly allocated, so in principle it retains the values from the
previous allocation.  In practice it seems to be the case that the
fields should end up zeroed, but we can zero the field anyway for
consistency.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383

2 years agodummynet: remove unused definitions
Kristof Provost [Wed, 16 Jun 2021 14:52:25 +0000 (16:52 +0200)]
dummynet: remove unused definitions

No functional change.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31806

(cherry picked from commit 415e81d5d9ed7a73825d371c0b538765fa57a801)

2 years agonetpfil tests: IPv6 dummynet queue test
Kristof Provost [Thu, 2 Sep 2021 13:40:51 +0000 (15:40 +0200)]
netpfil tests: IPv6 dummynet queue test

Same as the v4 test, but with IPv6.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31789

(cherry picked from commit 12184311c16160464a36ae05b1cd8c5e3c24fbaa)

2 years agonetpfil tests: dummynet queue test
Kristof Provost [Thu, 2 Sep 2021 13:38:04 +0000 (15:38 +0200)]
netpfil tests: dummynet queue test

Test prioritisation and dummynet queues.
We need to give the pipe sufficient bandwidth for dummynet to work.
We can't rely on the TCP connection failing altogether, but
we can measure the effect of dummynet by imposing a time limit on a
larger data transfer.

If TCP is prioritised it'll get most of the pipe bandwidth and easily
manage to transfer the data in 3 seconds or less. When not prioritised
this will not succeed.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31788

(cherry picked from commit cb6bfef9ca78623e33d2aef347dcee112a639103)

2 years agodummynet tests: pipe test for IPv6
Kristof Provost [Mon, 14 Jun 2021 19:24:59 +0000 (21:24 +0200)]
dummynet tests: pipe test for IPv6

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31787

(cherry picked from commit 5fda5913e16afac72f3f420e227803e33d4c1542)

2 years agoipfw: Introduce dnctl
Kristof Provost [Tue, 25 May 2021 14:54:32 +0000 (16:54 +0200)]
ipfw: Introduce dnctl

Introduce a link to the ipfw command, dnctl, for dummynet configuration.
dnctl only handles dummynet configuration, and is part of the effort to
support dummynet in pf.

/sbin/ipfw continues to accept pipe, queue and sched commands, but these can
now also be issued via the new dnctl command.

Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30465

(cherry picked from commit 0b95680e077b7ef5bc6930c7c0f1a41106251d5d)