Fully revert f83f5d58394db57576bbed6dc7531997cabeb102 for the sys/dev/usb/serial
folder, only keeping the zero length packet API introduced in sys/dev/usb
after more reports of USB serial devices not supporting ZLPs.
Alexander Motin [Fri, 20 Aug 2021 13:46:51 +0000 (09:46 -0400)]
mpr(4): Handle mprsas_alloc_tm() errors on device removal.
SAS9305-16e with firmware 16.00.01.00 report HighPriorityCredit of
only 8, while for comparison some other combinations I have report
100 or even 128. In case of large JBOD detach requirement to send
target reset command to each target same time overflows the limit,
and without adequate handling makes devices stuck in half-detached
state, preventing later re-attach.
To handle that in case of allocation error mark the target with new
MPRSAS_TARGET_TOREMOVE flag, and retry the removal attempt next time
something else free high priority command. With this patch I can
successfully detach/attach 102 disk JBOD from/to the SAS9305-16e.
Wei Hu [Fri, 20 Aug 2021 08:43:10 +0000 (08:43 +0000)]
Microsoft Azure Network Adapter(MANA) VF support
MANA is the new network adapter from Microsoft which will be available
in Azure public cloud. It provides SRIOV NIC as virtual function to
guest OS running on Hyper-V.
The code can be divided into two major parts. Gdma_main.c is the one to
bring up the hardware board and drives all underlying hardware queue
infrastructure. Mana_en.c contains all main ethernet driver code.
It has only tested and supported on amd64 architecture.
ls: prevent no-color build from complaining when COLORTERM is non-empty
As 257886 reports, if ls(1) is built with WITHOUT_LS_COLORS="YES", it
issues a warning whenever COLORTERM is non-empty. The warning is not
useful, so I thought to remove it, but as Ed pointed out, we may want
to have a way to determine whether a particular copy of ls has been
compiled with color support or not.
Therefore move the warnx() call to the getopt loop in
a WITHOUT_LS_COLORS build to fire when the user asks for colored output.
PR: 257886
Reported by: Marko Turk
Reviewed by: kevans
Bjoern A. Zeeb [Thu, 19 Aug 2021 17:27:04 +0000 (12:27 -0500)]
localedef: unbreak WITHOUT_LOCALES
After 0fa5403d493b ("pkgbase: move locales into their own package") we
need usr.bin/localedef as a bootstrap tool independent on where
WITHOUT_LOCALE was specified as we ALWAYS process C.UTF-8.
At the same time LOCALES= in the local Makefile is empty but
C.UTF-8 with WITHOUT_LOCALES. C.UTF-8 is excluded from FILES, and thus
after the replacement FILES= is set to only .LC_CTYPE which results in
a build failure not knowing how to build that. Tweak the substitution to
replace only non-empty words so that FILES remains harmlessly empty.
Ka Ho Ng [Thu, 19 Aug 2021 10:30:41 +0000 (18:30 +0800)]
truncate(1): Add hole-punching support
This commit adds hole-punching support to the truncate(1) utility. If
the option -d is specified, truncate(1) performs zeroing, and if
possible hole-punching in case the operation is supported by the
underlying file system of the specified files.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31556
Alexander Motin [Wed, 18 Aug 2021 21:11:03 +0000 (17:11 -0400)]
geli(8): Do not report error on resize to the same size.
Just validate the old metadata and exit. Originally the check was
added to not thash the only copy of metadata, but we can achieve the
same just by skipping the writing/trashing. The metadata validation
should protect user from wrongly specifying new size instead of old.
John Baldwin [Wed, 18 Aug 2021 17:56:28 +0000 (10:56 -0700)]
iscsi: Teach the iSCSI stack about "large" received PDUs.
When using iSCSI PDU offload (cxgbei) on T6 adapters, a burst of
received PDUs can be reported via a single message to the driver.
Previously the driver passed these multi-PDU bursts up to the iSCSI
stack up as a single "large" PDU by rewriting the buffer offset, data
segment length, and DataSN fields in the iSCSI header. The DataSN
field in particular was rewritten so that each of the "large" PDUs
used consecutively increasing values. While this worked, the forged
DataSN values did not match the ExpDataSN value in the subsequent SCSI
Response PDU. The initiator does not currently verify this value, but
the forged DataSN values prevent adding a check.
To avoid this, allow a logical iSCSI PDU (struct icl_pdu) to describe
a burst of PDUs via a new 'ip_additional_pdus' field. Normally this
field is set to zero when 'struct icl_pdu' represents a single PDU.
If logical PDU represents a burst of on-the-wire PDUs, then 'ip_npdus'
contains the count of additional on-the-wire PDUs. The header of this
"large" PDU is still modified, but the DataSN field now contains the
DataSN value of the first on-the-wire PDU in the burst.
Kyle Evans [Wed, 18 Aug 2021 17:31:45 +0000 (12:31 -0500)]
uipc: avoid circular pr_{slow,fast}timos
domain_init() gets reinvoked for each vnet on a system, so we must not
alter global state. Practically speaking, we were creating circular
lists and tying up a softclock thread into an infinite loop.
The breakage here was most easily observed by simply creating a jail
in a new vnet and watching the system suddenly become erratic.
Reported by: markj
Fixes: e0a17c3f063f ("uipc: create dedicated lists for fast ...")
Pointy hat: kevans
Cyril Zhang [Wed, 18 Aug 2021 17:41:33 +0000 (13:41 -0400)]
vmm: Add credential to cdev object
Add a credential to the cdev object in sysctl_vmm_create(), then check
that we have the correct credentials in sysctl_vmm_destroy(). This
prevents a process in one jail from opening or destroying the /dev/vmm
file corresponding to a VM in a sibling jail.
Add regression tests.
Reviewed by: jhb, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31156
Scott Long [Tue, 17 Aug 2021 21:50:18 +0000 (21:50 +0000)]
- Fix the growfs rc script to cope with diskid labels.
- Fix a warning in growfs. gpart commit is supposed to be called on disk
device.
- Silence a gpart commit warning in growfs.
John Baldwin [Tue, 17 Aug 2021 21:39:58 +0000 (14:39 -0700)]
OpenSSL: Refactor KTLS tests to better support TLS 1.3.
Most of this upstream commit touched tests not included in the
vendor import. The one change merged in is to remove a constant
only present in an internal header to appease the older tests.
John Baldwin [Tue, 17 Aug 2021 21:39:32 +0000 (14:39 -0700)]
OpenSSL: Update KTLS documentation
KTLS support has been changed to be off by default, and configuration is
via a single "option" rather two "modes". Documentation is updated
accordingly.
John Baldwin [Tue, 17 Aug 2021 21:39:03 +0000 (14:39 -0700)]
OpenSSL: Only enable KTLS if it is explicitly configured
It has always been the case that KTLS is not compiled by default. However
if it is compiled then it was automatically used unless specifically
configured not to. This is problematic because it avoids any crypto
implementations from providers. A user who configures all crypto to use
the FIPS provider may unexpectedly find that TLS related crypto is actually
being performed outside of the FIPS boundary.
Instead we change KTLS so that it is disabled by default.
We also swap to using a single "option" (i.e. SSL_OP_ENABLE_KTLS) rather
than two separate "modes", (i.e. SSL_MODE_NO_KTLS_RX and
SSL_MODE_NO_KTLS_TX).
John Baldwin [Tue, 17 Aug 2021 21:37:47 +0000 (14:37 -0700)]
OpenSSL: Correct the return value of BIO_get_ktls_*().
BIO_get_ktls_send() and BIO_get_ktls_recv() are documented as
returning either 0 or 1. However, they were actually returning the
internal value of the associated BIO flag for the true case instead of
1.
When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
the nexthop of the "parent" prefix to update its internal state.
The glue code, which utilises RIB as a backing route store, uses
fib[46]_lookup_rt() for the prefix destination after its deletion
to fetch the desired nexthop.
This approach does not work when deleting less-specific prefixes
with most-specific ones are still present. For example, if
10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
result instead of 10.0.0.0/22. This, in turn, results in the failed
datastructure update: part of the deleted /23 prefix will still
contain the reference to an old nexthop. This leads to the
use-after-free behaviour, ending with the eventual crashes.
Fix the logic flaw by properly fetching the prefix "parent" via
newly-created rt_get_inet[6]_parent() helpers.
Randall Stewart [Tue, 17 Aug 2021 20:29:22 +0000 (16:29 -0400)]
tcp: Add support for DSACK based reordering window to rack.
The rack stack, with respect to the rack bits in it, was originally built based
on an early I-D of rack. In fact at that time the TLP bits were in a separate
I-D. The dynamic reordering window based on DSACK events was not present
in rack at that time. It is now part of the RFC and we need to update our stack
to include these features. However we want to have a way to control the feature
so that we can, if the admin decides, make it stay the same way system wide as
well as via socket option. The new sysctl and socket option has the following
meaning for setting:
00 (0) - Keep the old way, i.e. reordering window is 1 and do not use DSACK bytes to add to reorder window
01 (1) - Change the Reordering window to 1/4 of an RTT but do not use DSACK bytes to add to reorder window
10 (2) - Keep the reordering window as 1, but do use SACK bytes to add additional 1/4 RTT delay to the reorder window
11 (3) - reordering window is 1/4 of an RTT and add additional DSACK bytes to increase the reordering window (RFC behavior)
The default currently in the sysctl is 3 so we get standards based behavior.
Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31506
Ryan Moeller [Mon, 26 Jul 2021 20:08:52 +0000 (16:08 -0400)]
ZTS: Add tests for creation time
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12432
Richard Yao [Sun, 17 Mar 2019 00:43:13 +0000 (20:43 -0400)]
Linux 4.11 compat: statx support
Linux 4.11 added a new statx system call that allows us to expose crtime
as btime. We do this by caching crtime in the znode to match how atime,
ctime and mtime are cached in the inode.
statx also introduced a new way of reporting whether the immutable,
append and nodump bits have been set. It adds support for reporting
compression and encryption, but the semantics on other filesystems is
not just to report compression/encryption, but to allow it to be turned
on/off at the file level. We do not support that.
We could implement semantics where we refuse to allow user modification
of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc()
to find out encryption/compression information. That would introduce
locking that will have a minor (although unmeasured) performance cost.
It also would be inferior to zdb, which reports far more detailed
information. We therefore omit reporting of encryption/compression
through statx in favor of recommending that users interested in such
information use zdb.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #8507
The comment only specifies MNT_ROOTFS - which is set by the kernel when
mounting its root file system. So it's not clear if any other flags
are not quite right and for what reason.
Gordon Bergling [Tue, 17 Aug 2021 17:01:07 +0000 (19:01 +0200)]
zfs.4: Fix typo s/compatiblity/compatibility/
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Gordon Bergling <gbergling@googlemail.com>
Closes #12464
Alex Richardson [Tue, 17 Aug 2021 16:44:40 +0000 (17:44 +0100)]
Mark LLDB/CLANG_BOOTSTRAP/LLD_BOOTSTRAP as broken on non-FreeBSD for now
I enabled these options again in 31ba4ce8898f9dfa5e7f054fdbc26e50a599a6e3,
but unfortunately only my specific build configuration worked whereas the
build with default options is still broken.
Alexander Motin [Tue, 17 Aug 2021 16:15:54 +0000 (12:15 -0400)]
Remove b_pabd/b_rabd allocation from arc_hdr_alloc()
When a header is allocated for full overwrite it is a waste of time
to allocate b_pabd/b_rabd for it, since arc_write() will free them
without ever being touched. If it is a read or a partial overwrite
then arc_read() and arc_hdr_decrypt() allocate them explicitly.
Reduced memory allocation in user threads also reduces ARC eviction
throttling there, proportionally increasing it in ZIO threads, that
is not good. To minimize or even avoid it introduce ARC allocation
reserve, allowing certain arc_get_data_abd() callers to allocate a
bit longer in situations where user threads will already throttle.
Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12398
Ed Maste [Tue, 17 Aug 2021 15:58:03 +0000 (11:58 -0400)]
sysctl.9: put negative sense sysctl note in own paragraph
The sysctl man page cautions against negative-sense boolean sysctls
(foobar_disable), but it gets lost at the end of a large paragraph.
Move it to a separate paragraph in an attempt to make it more clear.
This man page could use a more holistic review and edit pass. This
change is simple and straightforward and I hope provides a small but
immediate benefit.
Alexander Motin [Tue, 17 Aug 2021 15:59:46 +0000 (11:59 -0400)]
Increase default volblocksize from 8KB to 16KB
Many things has changed since previous default was set many years ago.
Nowadays 8KB does not allow adequate compression or even decent space
efficiency on many of pools due to 4KB disk physical block rounding,
especially on RAIDZ and DRAID. It effectively limits write throughput
to only 2-3GB/s (250-350K blocks/s) due to sync thread, allocation,
vdev queue and other block rate bottlenecks. It keeps L2ARC expensive
despite many optimizations and dedup just unrealistic.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12406
Alexander Motin [Tue, 17 Aug 2021 15:55:34 +0000 (11:55 -0400)]
Optimize arc_l2c_only lists assertions
It is very expensive and not informative to call multilist_is_empty()
for each arc_change_state() on debug builds to check for impossible.
Instead implement special index function for arc_l2c_only->arcs_list,
multilists, panicking on any attempt to use it.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12421
Alexander Motin [Tue, 17 Aug 2021 15:50:31 +0000 (11:50 -0400)]
Fix/improve dbuf hits accounting
Instead of clearing stats inside arc_buf_alloc_impl() do it inside
arc_hdr_alloc() and arc_release(). It fixes statistics being wiped
every time a new dbuf is filled from the ARC.
Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits.
Since the hits are accounted under hash lock, replace atomics with
simple increments.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12422
Alexander Motin [Tue, 17 Aug 2021 15:47:00 +0000 (11:47 -0400)]
Avoid vq_lock drop in vdev_queue_aggregate()
vq_lock is already too congested for two more operations per I/O.
Instead of dropping and reacquiring it inside vdev_queue_aggregate()
delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
I/Os to callers, that drop the lock any way to execute the new I/O.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12297
Alexander Motin [Tue, 17 Aug 2021 15:44:34 +0000 (11:44 -0400)]
Use more atomics in refcounts
Use atomic_load_64() for zfs_refcount_count() to prevent torn reads
on 32-bit platforms. On 64-bit ones it should not change anything.
When built with ZFS_DEBUG but running without tracking enabled use
atomics instead of mutexes same as for builds without ZFS_DEBUG.
Since rc_tracked can't change live we can check it without lock.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12420
rtld: avoid use of of getenv(3) for evaluating rtld env vars (LD_XXX)
Scan through the set of environment variables during initialization and
store values in the corresponding ld_env_var_desc structure, in the
single pass at init time. This does not eliminate use of getenv(3) and
unsetenv(3) completely, but provides a foundation to do that as the next
step.
Also organize the scan in a way that makes it easier to support aliases
like LD_DEBUG vs. LD_64_DEBUG.
Suggested by: arichardson
Reviewed by: arichardson, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31545
Instead of specifying the main name part of the environment variable as the
string literal, create array of the var names and access them by symbolic
index. Convert main name parts into complete names by prefixing with
ABI-specific ld_env_vars.
This way the name is not repeated, and also it can carry additional
proporties explicitly. For instance, cleanup of the environment for
the setuid image does not require retyping all names.
Reviewed by: arichardson, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31545
Dan Langille [Sun, 15 Aug 2021 16:53:16 +0000 (12:53 -0400)]
Enable rc.d/jail within jails
Jails with jails is a supported. This change allows the script to run
upon startup with a jail. Without this, jails are not automatically
started within jails.
ipfw: fix possible data race between jump cache reading and updating.
Jump cache is used to reduce the cost of rule lookup for O_SKIPTO and
O_CALLRETURN actions. It uses rules chain id to check correctness of
cached value. But due to the possible race, there is the chance that
one thread can read invalid value. In some cases this can lead to out
of bounds access and panic.
Use thread fence operations to constrain the reordering of accesses.
Also rename jump_fast and jump_linear functions to jump_cached and
jump_lookup_pos respectively.
Ryan Moeller [Mon, 16 Aug 2021 23:38:34 +0000 (19:38 -0400)]
ZTS: Avoid unset $tmpdir in redacted_panic
The redacted_send tests make use of a $tmpdir variable, except in
redacted_send/redacted_panic the variable is never defined.
Use $TEST_BASE_DIR instead.
Clean up the stream file after the test.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12455
Allow consistency validation of the inet6 fib based on rib data.
Validation can be kicked off by loading test_lookup module and
running sysctl net.route.test.run_inet6_scan=1
The macro bit_foreach() traverses all set bits in the bitstring in the
forward direction, assigning each location in turn to variable.
The macro bit_foreach_at() traverses all set bits in the bitstring in
the forward direction at or after the zero-based bit index, assigning
each location in turn to variable.
The bit_foreach_unset() and bit_foreach_unset_at() macros which
traverses unset bits are implemented for completeness.