Zhenlei Huang [Mon, 9 Oct 2023 10:30:22 +0000 (18:30 +0800)]
proc: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.kstack_pages' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
No functional change intended.
Note that on arm64 the thread0 stack size can not be controlled with it,
kib@ suggested that arm64 maintainers can fix it eventually so let's
enable it also on arm64 right now.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:22 +0000 (18:30 +0800)]
buf: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'vfs.unmapped_buf_allowed' is actually a loader
tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will
report it correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:22 +0000 (18:30 +0800)]
sockets: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.ipc.maxsockets' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:21 +0000 (18:30 +0800)]
ddb: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'debug.ddb.capture.bufsize' is actually a loader
tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will
report it correctly.
Zhenlei Huang [Mon, 9 Oct 2023 10:30:21 +0000 (18:30 +0800)]
cam/scsi: Add sysctl flag CTLFLAG_TUN to loader tunable
The sysctl variable 'kern.cam.scsi_delay' is actually a loader tunable.
Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
The loader tunable 'net.inet.sctp.tcbhashsize' and 'net.inet.sctp.chunkscale'
are only used during vnet initializing, thus it make no senses to make them
writable tunable.
Validate the values of loader tunables on vnet initialize, reset them to
theirs defaults if invalid to prevent potential kernel panics.
Bojan Novković [Mon, 9 Oct 2023 00:38:08 +0000 (20:38 -0400)]
arm64: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping
Let pmap_enter_l2() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is a step towards reverting
commit 64087fd7f372.
Bojan Novković [Mon, 9 Oct 2023 00:32:35 +0000 (20:32 -0400)]
i386: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping
Let pmap_enter_pde() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is a step towards reverting
commit 64087fd7f372.
mount_nfs(8): Indicate that the -t option is deprecated
In mount_nfs.c the -t option is deprecated and advises to use
timeout=<N> instead. However, since that refers to NFS over UDP, which
is not used nowadays, mark this option as deprecated in the man page.
Notable upstream pull request merges:
#15290 54b1b1d89 import: require force when cachefile hostid doesn't
match on-disk
#15319 342357cd9 Reduce number of metaslab preload taskq threads
#15340 2a6c62109 ARC: Remove b_cv from struct l1arc_buf_hdr
#15347 75a2eb7fa ARC: Drop different size headers for crypto
#15350 96b9cf42e ARC: Remove b_bufcnt/b_ebufcnt from ARC headers
#15353 66b81b349 ZIL: Reduce maximum size of WR_COPIED to 7.5K
#15362 5b8688e62 zfsconcepts: add description of block cloning
The use of bitcount() triggered a build error because it couldn't be
located. __bitcount() on the other hand is defined in sys/types.h, which
is included in teken/teken.h.
Bojan Novković [Sat, 7 Oct 2023 18:00:11 +0000 (21:00 +0300)]
tty: fix improper backspace behaviour for UTF8 characters when in canonical mode
This patch adds additional logic in ttydisc_rubchar() to properly handle
backspace behaviour for UTF-8 characters.
Currently, typing in a backspace after a UTF8 character will delete only
one byte from the byte sequence, leaving garbled output in the tty's
output queue. With this change all of the character's bytes are deleted.
This change is only active when the IUTF8 flag is set (see 19054eb6053189144aa962b2ecc1bf5087758a3e "(s)tty: add support for IUTF8
input flag")
The code uses the teken_wcwidth() function to properly handle character
column widths for different code points, and adds the
teken_utf8_bytes_to_codepoint() function that converts a UTF-8 byte
sequence to a codepoint, as specified in RFC3629.
Bojan Novković [Sat, 7 Oct 2023 17:59:57 +0000 (20:59 +0300)]
(s)tty: add support for IUTF8 input flag
This patch adds the necessary kernel and stty code to support setting
the IUTF8 flag for ttys. It is the first of two patches that fix
backspace behaviour for UTF-8 encoded characters when in canonical mode.
Alan Somers [Wed, 4 Oct 2023 18:48:01 +0000 (12:48 -0600)]
fusefs: sanitize FUSE_READLINK results for embedded NULs
If VOP_READLINK returns a path that contains a NUL, it will trigger an
assertion in vfs_lookup. Sanitize such paths in fusefs, rejecting any
and warning the user about the misbehaving server.
Michael Tuexen [Sat, 7 Oct 2023 13:56:00 +0000 (15:56 +0200)]
udp: fix sending of IPv4-mapped addresses
The inp_vflags field must be adjusted during the call of
in_pcbbind_setup(). This is consistent with the other places in the
code, but not elegant at all.
Alan Somers [Fri, 6 Oct 2023 21:05:41 +0000 (15:05 -0600)]
Fix intermittency in the sys.fs.fusefs.symlink.main test
This change is identical to 86885b18689 but for symlink instead of
mknod. The kernel sends a FUSE_FORGET asynchronously with the final
syscall. The lack of an expectation caused this test to occasionally
fail.
Also, remove a sleep that accidentally snuck into a different test.
Alan Somers [Fri, 6 Oct 2023 19:46:42 +0000 (13:46 -0600)]
Fix intermittency in the sys.fs.fusefs.mknod.main test
In the Mknod.parent_inode test case, the kernel sends an extra
FUSE_FORGET message. But because it gets sent asynchronously with the
failing syscall, it doesn't always get received before the test ends.
So we never setup an expectation for it. And 90+% of the time the test
would exit successfully.
Fix the intermittency by always waiting to receive the FUSE_FORGET
message.
Alexander Motin [Fri, 6 Oct 2023 17:09:27 +0000 (13:09 -0400)]
ZIL: Reduce maximum size of WR_COPIED to 7.5K
Benchmarks show that at certain write sizes range lock/unlock take
not so much time as extra memory copy. The exact threshold is not
obvious due to other overheads, but it is definitely lower than
~63KB used before. Make it configurable, defaulting at 7.5KB,
that is 8KB of nearest malloc() size minus itx and lr structs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15353
Emmanuel Vadot [Thu, 5 Oct 2023 17:10:00 +0000 (19:10 +0200)]
dwc: Rewrite part of the descriptors setup functions
- Add a txdesc_clear which clears the tx desc instead of doing that in
dwc_setup_txdesc based on arguments.
- Remove dwc_set_owner, in the end we always set the owner of the desc
as we do it for id > 0 and then for the first one.
- Remove dwc_ prefix
siv0 [Fri, 6 Oct 2023 16:53:23 +0000 (18:53 +0200)]
rpm: Fix `make rpm` on Debian/Ubuntu
The recent patch to change the bash completion install location based
on the Distribution, ignored that it should still be possible to
create RPMs on Debian derived systems. Additionally `make deb` itself
creates RPMs and converts them via `alien`.
This patch adds the bashcompletiondir variable to the rpm defines and
uses this for the location, where to get the bash completion file.
It still changes the location on Debian/Ubuntu systems in the final
packages from /etc/bash_completion.d to
/usr/share/bash-completion/completions
The daemon utility already does its own buffering and retransmits its
child's output line by line. There's no need for stdio to add its own
buffering on top of this.
Rob Norris [Sat, 16 Sep 2023 07:02:02 +0000 (17:02 +1000)]
import: require force when cachefile hostid doesn't match on-disk
Previously, if a cachefile is passed to zpool import, the cached config
is mostly offered as-is to ZFS_IOC_POOL_TRYIMPORT->spa_tryimport(), and
the results are taken as the canonical pool config and handed back to
ZFS_IOC_POOL_IMPORT.
In the course of its operation, spa_load() will inspect the pool and
build a new config from what it finds on disk. However, it then
regenerates a new config ready to import, and so rightly sets the hostid
and hostname for the local host in the config it returns.
Because of this, the "require force" checks always decide the pool is
exported and last touched by the local host, even if this is not true,
which is possible in a HA environment when MMP is not enabled. The pool
may be imported on another head, but the import checks still pass here,
so the pool ends up imported on both.
(This doesn't happen when a cachefile isn't used, because the pool
config is discovered in userspace in zpool_find_import(), and that does
find the on-disk hostid and hostname correctly).
Since the systemd zfs-import-cache.service unit uses cachefile imports,
this can lead to a system returning after a crash with a "valid"
cachefile on disk and automatically, quietly, importing a pool that has
already been taken up by a secondary head.
This commit causes the on-disk hostid and hostname to be included in the
ZPOOL_CONFIG_LOAD_INFO item in the returned config, and then changes the
"force" checks for zpool import to use them if present.
This method should give no change in behaviour for old userspace on new
kernels (they won't know to look for the new config items) and for new
userspace on old kernels (the won't find the new config items).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc.
Closes #15290
Rob Norris [Mon, 18 Sep 2023 01:07:32 +0000 (11:07 +1000)]
tests: add tests for zpool import behaviour when hostid changes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc.
Closes #15290
Kristof Provost [Fri, 6 Oct 2023 12:20:17 +0000 (14:20 +0200)]
pfctl: fix incorrect mask on dynamic address
A PF rule using an IPv4 address followed by an IPv6 address and then a
dynamic address, e.g. "pass from {192.0.2.1 2001:db8::1} to (pppoe0)",
will have an incorrect /32 mask applied to the dynamic address.
MFC after: 3 weeks
Obtained from: OpenBSD
See also: https://ftp.openbsd.org/pub/OpenBSD/patches/5.6/common/007_pfctl.patch.sig
Sponsored by: Rubicon Communications, LLC ("Netgate")
Event: Oslo Hackathon at Modirum
Rob N [Fri, 6 Oct 2023 16:06:29 +0000 (03:06 +1100)]
zfsconcepts: add description of block cloning
Here I'm trying to succinctly introduce the concept, the basics of its
construction, how its different to dedup, how to use it, and where its
limitations lie, in four paragraphs and with enough searchable terms to
help the reader find more information both within OpenZFS and elsewhere.
Phew.
Sponsored-By: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15362
Alexander Motin [Fri, 6 Oct 2023 16:04:00 +0000 (12:04 -0400)]
Reduce number of metaslab preload taskq threads.
Before this change ZFS created threads for 50% of CPUs for each top-
level vdev. Plus it created the same number of threads for embedded
log groups (that have only one metaslab and don't need any preload).
As result, on system with 80 CPUs and pool of 60 vdevs this resulted
in 4800 metaslab preload threads, that is absolutely insane.
This patch changes the preload threads to 50% of CPUs in one taskq
per pool, so on the mentioned system it will be only 40 threads.
Among other things this fixes zdb on the mentioned system and pool
on FreeBSD, that failed to create so many threads in one process.
Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15319
Alexander Motin [Tue, 3 Oct 2023 15:57:48 +0000 (11:57 -0400)]
ARC: Drop different size headers for crypto
To reduce memory usage ZFS crypto allocated bigger by 56 bytes ARC
headers only when specific block was encrypted on disk. It was a
nice optimization, except in some cases the code reallocated them
on fly, that invalidated header pointers from the buffers. Since
the buffers use different locking, it created number of races, that
were originally covered (at least partially) by b_evict_lock, used
also to protection evictions. But it has gone as part of #14340.
As result, as was found in #15293, arc_hdr_realloc_crypt() ended
up unprotected and causing use-after-free.
Instead of introducing some even more elaborate locking, this patch
just drops the difference between normal and protected headers. It
cost us additional 56 bytes per header, but with couple patches
saving 24 bytes, the net growth is only 32 bytes with total header
size of 232 bytes on FreeBSD, that IMHO is acceptable price for
simplicity. Additional locking would also end up consuming space,
time or both.
Reviewe-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15293
Closes #15347
Alexander Motin [Fri, 6 Oct 2023 15:56:17 +0000 (11:56 -0400)]
ARC: Remove b_bufcnt/b_ebufcnt from ARC headers
In most cases we do not care about exact number of buffers linked
to the header, we just need to know if it is zero, non-zero or one.
That can easily be checked just looking on b_buf pointer or in some
cases derefencing it.
b_ebufcnt is read only once, and in that case we already traverse
the list as part of arc_buf_remove(), so second traverse should not
be expensive.
This reduces L1 ARC header size by 8 bytes and full crypto header by
16 bytes, down to 176 and 232 bytes on FreeBSD respectively.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15350
Rob N [Fri, 6 Oct 2023 15:39:20 +0000 (02:39 +1100)]
tests/block_cloning: sync before write in fallback test
We're still seeing this test fail intermittently (that is, the clone
happens), which must mean the write and the clone can still be happening
on different txgs.
It might be that there's still activity after the pool is created. So
here we force a sync before starting the write.
Sponsored-By: Klara Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15359
Bjoern A. Zeeb [Fri, 6 Oct 2023 10:53:07 +0000 (10:53 +0000)]
rtw88: re-connect to the build
This adds the (updated) rtw88 driver back to the build.
Functionality has not been tested (much) so might not currently
work but people offered to test.
Firmware is provided by the wifi-firmware-rtw88-kmod port/package.
Bjoern A. Zeeb [Fri, 6 Oct 2023 10:38:22 +0000 (10:38 +0000)]
net80211: pass __func__, __LINE__ also to ieee80211_alloc_node()
Pass caller information to ieee80211_alloc_node() so that in case
IEEE80211_DEBUG_REFCNT is compiled in we can (better) track references,
in this case the initial ieee80211_node_initref().
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Kristof Provost [Thu, 5 Oct 2023 14:57:50 +0000 (16:57 +0200)]
pf: fix SCTP SDT probe
We want the return value of pf_test_rule(), i.e. the result of the
evaluation of the new state, not the result of the evaluation of the
original packet/state.
MFC after: 1 week
Sponsored by: Orange Business Services
Emmanuel Vadot [Wed, 4 Oct 2023 19:10:47 +0000 (21:10 +0200)]
dwc: Remove if_dwc_mac_type
This doesn't represent the mac_type but if the DMA engine support
extended descriptors.
Read the HW_FEATURE register to learn if the DMA engine supports it.
Emmanuel Vadot [Wed, 4 Oct 2023 18:21:24 +0000 (20:21 +0200)]
dwc: Move BUS_MODE_DEFAULT_PBL; in if_dwcvar.h
And rename it to DMA_DEFAULT_PBL, this is the default for all (most ?)
dma engine that dwc should support.
While here stop including dwc1000_reg.h in if_dwc.c, we don't need it anymore.
Synopsis/Designware controller have multiple version. The version currently
supported by dwc(4) is the version 3 and it's usually called 1000 for gigabit.
In the goal to support all of those in the same base driver start splitting the
core function to a new file.
Synopsis/Designware controller have multiple dma version, the one included
in the driver is the base one. if_awg is one example of a dwc variant that
have another DMA controller. eqos is a newer variant of dwc that have a newer
dma controller.
In the goal to support all of those in the same base driver start splitting the
dma function to a new file.
Synopsis/Designware controller have multiple version. The version currently
supported by dwc(4) is the version 3 and it's usually called 1000 for gigabit.
This file only have definition for the registers of this version so rename it.
snps,dwmac have one required clock named stmmaceth and one optional pclk,
correctly handle both in if_dwc, no need to get/enable stmmacseth again
in if_dwc_rk.
It also have one required reset also named stmmaceth and one optional ahb,
correctly handle both.
Rockchip have another optional clock named clk_mac_speed, get it and enable it
if present. Also fix the optional RMII clocks, they were previously wrongly
enabled in RGMII case.
It makes it easier to find all the sub drivers and change them if needed.
While here do not gate dwc_rk with soc options, dwc_rk is made for all rockchip
SoCs. Same thing for dwc_socfpga
certctl: Split certificate bundles before processing.
This allows 'certctl rehash' to do the right thing when ca_root_nss is
installed, instead of linking the entire bundle to the hash of the
first certificate it contains.
MFC after: 3 days
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D42087
Bjoern A. Zeeb [Mon, 2 Oct 2023 20:20:14 +0000 (20:20 +0000)]
net80211: de-inline ieee80211_ref_node()
Make ieee80211_ref_node() a macro so we can pass __func__, __LINE__
in for IEEE80211_DEBUG_REFCNT as we do for other refcount related
functions. Add the appropriate IEEE80211_DPRINTF() call to the
_ieee80211_ref_node() implementation to support wlandebug(8) +node
printf style tracing.
As a plus we can now also use Dtrace fbt on the
_ieee80211_{ref,free}_node() implementations with futher logic,
gathering backtraces, etc. more flexibly.
Bjoern A. Zeeb [Thu, 5 Oct 2023 14:01:48 +0000 (14:01 +0000)]
rtw88: Use RF_CFGCH instead of hard coded 0x18
While debugging some funky register reads of 0xaeaea from RF_CFGCH
resulting in "rtw880: [TXGAPK] unknown channel 234!!" more of these
reads came to my attention hidden by using the register index rather
than the defined value. Make this more grep-able.
Bjoern A. Zeeb [Mon, 2 Oct 2023 14:30:46 +0000 (14:30 +0000)]
net80211: remove ieee80211_unref_node()
ieee80211_unref_node() was only used in two error cases in
ieee80211_send_nulldata(). There we do not need to guard against
ni pointer reuse after decrementing the refcount of the ni as we
only update the stats and return.
Update the man page and remove the link for the now gone function.
Sponsored by: The FreeBSD Foundation
X-MFC: never
Reviewed by: adrian, emaste
Differential Revision: https://reviews.freebsd.org/D42035