Alexander Motin [Sat, 6 Mar 2021 03:39:52 +0000 (22:39 -0500)]
Do not exit ctl_be_block_worker() prematurely.
Return while there are any I/Os in a queue may result in them stuck
indefinitely, since there is only one taskqueue task for all of them.
I think I've reproduced this by switching ha_role to secondary under
heavy load.
Brandon Bergren [Mon, 1 Mar 2021 02:35:53 +0000 (20:35 -0600)]
[PowerPC64] Fix multiple issues in fpsetmask().
Building R exposed a problem in fpsetmask() whereby we were not properly
clamping the provided mask to the valid range.
R initilizes the mask by calling fpsetmask(~0) on FreeBSD. Since we
recently enabled precise exceptions, this was causing an immediate
SIGFPE because we were attempting to set invalid bits in the fpscr.
Properly limit the range of bits that can be set via fpsetmask().
While here, use the correct fp_except_t type instead of fp_rnd_t.
Reported by: pkubaj (in IRC)
Sponsored by: Tag1 Consulting, Inc.
Brandon Bergren [Thu, 25 Feb 2021 18:55:58 +0000 (12:55 -0600)]
[PowerPC64LE] pseries: Fix input buffering logic.
In uart_phyp_get(), when the internal buffer is empty, we make a
hypercall to retrieve up to 16 bytes of input data from the
hypervisor. As this is specified to be returned in BE format, we need
to do a 64-bit byte swap on the first and second half of the data.
If the buffer being passed in was insufficient to return the fetched
data, we store the remainder in the internal buffer and use it to
satisfy the following calls to uart_phyp_get() until it is drained.
However, in this case, we were accidentally byteswapping the internal
buffer again.
Move the byteswapping code to just after the hypercall so it only gets
swapped when we're filling the buffer.
Fixes arrow keys in qemu on pseries, among other console oddities.
Mitchell Horne [Thu, 4 Mar 2021 17:52:45 +0000 (13:52 -0400)]
riscv: fix errors in some atomic type aliases
This appears to be a copy-and-paste error that has simply been
overlooked. The tree contains only two calls to any of the affected
variants, but recent additions to the test suite started exercising the
call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
panics.
Apparently, the issue was inherited from the arm64 atomic header. That
instance was addressed in c90baf6817a0, but the fix did not make its way
to RISC-V.
Note that the particular test case ng_macfilter_test:main still appears
to fail on this platform, but this change reduces the panic to a
timeout.
Mitchell Horne [Mon, 1 Mar 2021 14:01:25 +0000 (10:01 -0400)]
arm64: add definition for IS_SSTEP_TRAP()
arm64 has a distinct exception code for single-step, so we can use this
to detect when an unexpected SS trap is encountered, or when an expected
one is not. See db_stop_at_pc().
Reviewed by: markj, jhb
Sponsored by: The FreeBSD Foundation
Mitchell Horne [Mon, 1 Mar 2021 13:59:25 +0000 (09:59 -0400)]
arm64: fix hardware single-stepping from EL1
The main issue is that debug exceptions must to be disabled for the
entire duration that SS bit in MDSCR_EL1 is set. Otherwise, a
single-step exception will be generated immediately. This can occur
before returning from the debugger (when MDSCR is written to) or before
re-entering it after the single-step (when debug exceptions are unmasked
in the exception handler).
Solve this by delaying the unmask to C code for EL1, and avoid unmasking
at all while handling debug exceptions, thus avoiding any recursive
debug traps.
Reviewed by: markj, jhb
Sponsored by: The FreeBSD Foundation
There were two definitions for the SCSI VPD Block Device Characteristics (page
0xb1): struct scsi_vpd_block_characteristics and struct
scsi_vpd_block_device_characteristics. The latter is more complete and more
widely used. Convert uses of the former to the latter by tweaking the da driver
and removing sturct scsi_vpd_block_characteristics.
Warner Losh [Fri, 26 Feb 2021 18:43:35 +0000 (11:43 -0700)]
cam: add new ASC and ASCQ values related to drive depopulation
Add 04/25 Depopulation restoration in progress, 31/04 Depopulation failed, and
31/05 Depopulation restoration failed.
These are defined in SPC-6r2 (though 31/4 was added in an earlier draft). They
relate to different aspects of in-progress or failed depopulation removal and
restoration commands.
Warner Losh [Tue, 23 Feb 2021 19:33:26 +0000 (12:33 -0700)]
camcontrol: change hueristic for I/O-less devtype
Some SATA drives have 'config' set to 0 in the identify block. Rather than rely
on it, use the strings windows uses to display the drive since they are supposed
to be space padded and will always be non-zero.
Andrew Gierth [Wed, 3 Mar 2021 18:25:11 +0000 (12:25 -0600)]
service(8): use an environment more consistent with init(8)
init(8) sets the "daemon" login class without specifying a pw
entry (so no substitutions are done on the variables). service(8)'s
use of env -L had the effect of specifying root's pw entry, with two
effects: getpwnam and getpwuid are being called, which may not be
entirely safe depending on what nsswitch is up to and what stage of
boot we are at, and substitutions would have been done.
Fix by teaching env(8) to allow -L -/classname to set the class
environment with no pw entry at all specified, and use it in
service(8).
Kyle Evans [Thu, 4 Mar 2021 19:28:53 +0000 (13:28 -0600)]
jail(8): reset to root cpuset before attaching to run commands
Recent changes have made it such that attaching to a jail will augment
the attaching process' cpu mask with the jail's cpuset. While this is
convenient for allowing the administrator to cpuset arbitrary programs
that will attach to a jail, this is decidedly not convenient for
executing long-running daemons during jail creation.
This change inserts a reset of the process cpuset to the root cpuset
between the fork and attach to execute a command. This allows commands
executed to have the widest mask possible, and the administrator can
cpuset(1) it back down inside the jail as needed.
With this applied, one should be able to change a jail's cpuset at
exec.poststart in addition to exec.created. The former was made
difficult if jail(8) itself was running with a constrained set, as then
some processes may have been spawned inside the jail with a non-root
set. The latter is the preferred option so that processes starting in
the jail are constrained appropriately up front.
Note that all system commands are still run with the process' initial
cpuset applied.
armv8crypto: fix AES-XTS regression introduced by ed9b7f44
Initialization of the XTS key schedule was accidentally dropped
when adding AES-GCM support so all-zero schedule was used instead.
This rendered previously created GELI partitions unusable.
This change restores proper XTS key schedule initialization.
Reported by: Peter Jeremy <peter@rulingia.com>
MFC after: immediately
Jung-uk Kim [Wed, 3 Mar 2021 23:10:00 +0000 (18:10 -0500)]
libkvm: Plug couple of memory leaks and check possible calloc(3) failure
First, r204494 introduced dpcpu_off in struct __kvm and it was allocated
from _kvm_dpcpu_init() but it was not free(3)'ed from kvm_close(3).
Second, r291406 introduced kvm_nlist2(3) and converted kvm_nlist(3) to
use the new function but it did not free the temporary buffer.
Also, check possible calloc(3) failure while I am in the neighborhood.
Ed Maste [Tue, 2 Mar 2021 22:35:48 +0000 (17:35 -0500)]
growfs: allow operation on RW-mounted filesystems
growfs supports growing mounted filesystems (writes are temporarily
suspended while the grow happens). Drop the check for fs_clean == 0
to restore this case. Leave fs_flags check for FS_UNCLEAN or
FS_NEEDSFSCK which represent the state of the filesystem when it was
mounted, and fsck should be run first if they are set.
PR: 253754
Reviewed by: mckusick
Fixes: 6eb925f8450f ("Filesystem utilities that modify the...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29021
Rick Macklem [Thu, 18 Feb 2021 22:08:19 +0000 (14:08 -0800)]
nfs-over-tls: add user space daemons rpc.tlsclntd and rpc.tlsservd
The kernel changes needed for nfs-over-tls have been committed to main.
However, nfs-over-tls requires user space daemons to handle the
TLS handshake and other non-application data TLS records.
There is one daemon (rpc.tlsclntd) for the client side and one daemon
(rpc.tlsservd) for the server side, although they share a fair amount
of code found in rpc.tlscommon.c and rpc.tlscommon.h.
They use a KTLS enabled OpenSSL to perform the actual work and, as such,
are only built when MK_OPENSSL_KTLS is set.
Communication with the kernel is done via upcall RPCs done on AF_LOCAL
sockets and the custom system call rpctls_syscall.
Ed Maste [Fri, 5 Mar 2021 17:49:23 +0000 (12:49 -0500)]
Cirrus-CI: temporarily avoid qemu smoke test boot
Cirrus-CI has been red for some time because we're running out of disk
space on the ephemeral GCP VMs. For now remove the package + qemu boot,
and just check for build regressions.
This change to be reverted once we have identified and addressed the
underlying issue.
Ed Maste [Fri, 29 Jan 2021 14:34:27 +0000 (09:34 -0500)]
Cirrus-CI: remove svn2git remnant
Previously Cirrus was skipped on svn_head to avoid running CI on two
different branches with identical content. With the transition to git
this serves no purpose.
Reported by: kevans
Sponsored by: The FreeBSD Foundation
Ed Maste [Fri, 19 Feb 2021 01:41:33 +0000 (20:41 -0500)]
tests/sys/audit: force PIE off
df093aa9463b linked against libprivateauditd.a, but that is currently
(and incorrectly) built as position-dependent. For now just force PIE
off for this test to fix the WITH_PIE build.
Peter Grehan [Sat, 27 Feb 2021 04:15:04 +0000 (14:15 +1000)]
Import wireguard fixes from pfSense 2.5
Merge the following fixes from https://github.com/pfsense/FreeBSD-src 1940e7d3 Save address of ingress packets to allow wg to work on HA 8f5531f1 Fix connection to IPv6 endpoint 825ed9ee Fix tcpdump for wg IPv6 rx tunnel traffic 2ec232d3 Fix issue with replying to INITIATION messages in server mode ec77593a Return immediately in wg_init if in DETACH'd state 0f0dde6f Remove unnecessary wg debug printf on transmit 2766dc94 Detect and fix case in wg_init() where sockets weren't cleaned up b62cc7ac Close the UDP tunnel sockets when the interface has been stopped
ixl(4): Add ability to control link state on ifconfig down
Add sysctl link_active_on_if_down, which allows user to control
if interface is kept in active state when it is brought
down with ifconfig. Set it to enabled by default to preserve
backwards compatibility.
ixl(4): Report RX errors as sum of all RX error counters
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.
There is no HW counter for frames with invalid L3/L4 checksums
so add a SW one.
Also add a "rx_errors" sysctl with a copy of netstat IERRORS
counter value to make it easier accessible from scripts.
ix(4): Report RX errors as sum of all RX error counters
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.
Also, add new "rx_errs" sysctl in the dev.ix.N.mac_stats tree. This is
to provide an another way to display the sum of RX errors.
X700 family of controllers has limited number of available VLAN
HW filters. Driver did not handle properly a case when user
assigned more VLANs to the interface which had all filters
already in use. Fix that by disabling HW filtering when
it is impossible to create filters for all requested VLANs.
Keep track of registered VLANs using bitstring to be able
to re-enable HW filtering when number of requested VLANs
drops below the limit.
Also switch all allocations to use M_IXL malloc type
to ease detecting memory leaks in the driver.
Alex Richardson [Mon, 25 Jan 2021 14:11:45 +0000 (14:11 +0000)]
qeueue.h: Add {SLIST,STAILQ,LIST,TAILQ}_END()
We provide these for compat with other queue.h headers since some software
assumes it exists (e.g. the libevent contrib code), but we are not
encouraging their use (NULL should be used instead).
This fixes the following warning (which should arguable be an error since
it results in a function call to an undefined function):
.../contrib/libevent/buffer.c:495:16: warning: implicit declaration of function 'LIST_END' is invalid in C99 [-Wimplicit-function-declaration]
cbent != LIST_END(&buffer->callbacks);
^
.../contrib/libevent/buffer.c:495:13: warning: comparison between pointer and integer ('struct evbuffer_cb_entry *' and 'int') [-Wpointer-integer-compare]
cbent != LIST_END(&buffer->callbacks);
~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alexander Motin [Tue, 2 Feb 2021 18:37:13 +0000 (13:37 -0500)]
Make DataSN counter of solicited Data-Out local.
DataSN for solicited Data-Out is per-R2T. Since we handle whole R2T
in one go, we don't need to store it anywhere, especially in global
per-command structure. This may allow us to handle multiple R2T per
command at once, if we decide, or may be relax locking.
Rename the second use of that field to io_referenced_task_tag.
Mark Johnston [Thu, 25 Feb 2021 15:04:44 +0000 (10:04 -0500)]
buf: Fix the dirtybufthresh check
dirtybufthresh is a watermark, slightly below the high watermark for
dirty buffers. When a delayed write is issued, the dirtying thread will
start flushing buffers if the dirtybufthresh watermark is reached. This
helps ensure that the high watermark is not reached, otherwise
performance will degrade as clustering and other optimizations are
disabled (see buf_dirty_count_severe()).
When the buffer cache was partitioned into "domains", the dirtybufthresh
threshold checks were not updated. Fix this.
Mark Johnston [Thu, 25 Feb 2021 15:04:44 +0000 (10:04 -0500)]
sendfile: Use the pager size to determine the file extent when possible
Previously sendfile would issue a VOP_GETATTR and use the returned size,
i.e., the file size. When paging in file data, sendfile_swapin() will
use the pager to determine whether it needs to zero-fill, most often
because of a hole in a sparse file. An attempt to page in beyond the
end of a file is treated this way, and occurs when the requested page is
past the end of the pager. In other words, both the file size and pager
size were used interchangeably.
With ZFS, updates to the pager and file sizes are not synchronized by
the exclusive vnode lock, at least partially due to its use of
MNTK_SHARED_WRITES. In particular, the pager size is updated after the
file size, so in the presence of a writer concurrently extending the
file, sendfile could incorrectly instantiate "holes" in the page cache
pages backing the file, which manifests as data corruption when reading
the file back from the page cache. The on-disk copy is unaffected.
Fix this by consistently using the pager size when available.
Reported by: dumbbell
Reviewed by: chs, kib
Tested by: dumbbell, pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28811
Kristof Provost [Wed, 24 Feb 2021 15:40:37 +0000 (16:40 +0100)]
bridge tests: Test that we also forward on some interfaces
Ensure that we not only block on some interfaces, but also forward on
some. Without the previous commit we wound up discarding on all ports,
rather than only on the ports needed to break the loop.
ipfw: make algo name argument optional for some table types
Most of table types currently supported by ipfw have only one
algorithm implementation. When user creates such tables, allow
to omit algo name in arguments. E.g. now it is possible:
ipfw table T1 create type number
ipfw table T2 create type iface
ipfw table T3 create type flow
Kyle Evans [Fri, 26 Feb 2021 21:46:47 +0000 (15:46 -0600)]
jail: allow root to implicitly widen its cpuset to attach
The default behavior for attaching processes to jails is that the jail's
cpuset augments the attaching processes, so that it cannot be used to
escalate a user's ability to take advantage of more CPUs than the
administrator wanted them to.
This is problematic when root needs to manage jails that have disjoint
sets with whatever process is attaching, as this would otherwise result
in a deadlock. Therefore, if we did not have an appropriate common
subset of cpus/domains for our new policy, we now allow the process to
simply take on the jail set *if* it has the privilege to widen its mask
anyways.
With the new logic, root can still usefully cpuset a process that
attaches to a jail with the desire of maintaining the set it was given
pre-attachment while still retaining the ability to manage child jails
without jumping through hoops.
A test has been added to demonstrate the issue; cpuset of a process
down to just the first CPU and attempting to attach to a jail without
access to any of the same CPUs previously resulted in EDEADLK and now
results in taking on the jail's mask for privileged users.
Rick Macklem [Sun, 28 Feb 2021 01:54:05 +0000 (17:54 -0800)]
nfsclient: fix panic in cache_enter_time()
Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
function.
This is specific to an NFSv3 mount with the "rdirplus" mount
option. Unlike NFSv4, NFSv3 replies to ReaddirPlus
includes entries for the current directory.
This trivial patch avoids doing a cache_enter_time()
call for the current directory to avoid the panic.
MFC 9febbc454190:
Fix for natd(8) sending wrong sequence number after TCP retransmission,
terminating a TCP connection.
If a TCP packet must be retransmitted and the data length has changed in the
retransmitted packet, due to the internal workings of TCP, typically when ACK
packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut()
will find the correct length, which is the last length received.
This can be explained as follows:
If a "227 Entering Passive Mode" packet must be retransmittet and the length
changes from 51 to 50 bytes, for example, then we have three cases for the
list scan in GetDeltaSeqOut(), depending on how many prior packets were
received modulus N_LINK_TCP_DATA=3:
case 1: index 0: original packet 51
index 1: retransmitted packet 50
index 2: not relevant
case 2: index 0: not relevant
index 1: original packet 51
index 2: retransmitted packet 50
case 3: index 0: retransmitted packet 50
index 1: not relevant
index 2: original packet 51
This patch simply changes the searching order for TCP packets, always starting
at the last received packet instead of any received packet, in
GetDeltaAckIn() and GetDeltaSeqOut().
Martin Matuska [Wed, 3 Mar 2021 01:38:09 +0000 (02:38 +0100)]
zfs: cancel TRIM or initialize on FAULTED non-writeable vdevs
From the openzfs commit message:
When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization. When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.
Martin Matuska [Wed, 3 Mar 2021 01:32:59 +0000 (02:32 +0100)]
zfs: fix assert in FreeBSD-specific dmu_read_pages
From the openzfs 2e160dee9 commit message:
The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages. All three pieces had an
assert to ensure that the page is not mapped. Later the assert was
relaxed to require that the page is not mapped for writing. But that
was done in two places out of three. This change fixes the third piece,
read-ahead.
Martin Matuska [Wed, 3 Mar 2021 01:28:56 +0000 (02:28 +0100)]
zfs: fix vdev_rebuild_thread deadlock
From the openzfs 8e43fa12c commit message:
The metaslab_disable() call may block waiting for a txg sync.
Therefore it's important that vdev_rebuild_thread release the
SCL_CONFIG read lock it is holding before this call. Failure
to do so can result in the txg_sync thread getting blocked
waiting for this lock which results in a deadlock.
Martin Matuska [Wed, 3 Mar 2021 01:25:03 +0000 (02:25 +0100)]
zfs: fix overly broad locking in spa_vdev_config_exit()
Resolves a deadlock which can occur when the ZED or zpool
command attaches a new device.
From the openzfs 75a089ed3 commit message:
Calling vdev_free() only requires the we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks. In particular, we need
need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock. The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reading when it detects there's a pending writer.
Mark Johnston [Wed, 24 Feb 2021 02:15:50 +0000 (21:15 -0500)]
rmlock: Add a required compiler membar to the rlock slow path
The tracker flags need to be loaded only after the tracker is removed
from its per-CPU queue. Otherwise, readers may fail to synchronize with
pending writers attempting to propagate priority to active readers, and
readers and writers deadlock on each other. This was observed in a
stable/12-based armv7 kernel where the compiler had reordered the load
of rmp_flags to before the stores updating the queue.
linux: fix handling of flags for 32 bit send(2) syscall
Previously the flags were passed as-is, which could resulted
in spurious EAGAIN returned for non-blocking sockets, which
broke some Steam games.
PR: 248065
Reported By: Alex S <iwtcex@gmail.com>
Tested By: Alex S <iwtcex@gmail.com>
Reviewed By: emaste
MFC After: 3 days
Sponsored By: The FreeBSD Foundation
Use compat.linux.emul_path instead of hardcoded path in /etc/rc.d/linux
In /etc/rc.d/linux the mounting paths of procfs, sysfs and devfs
are hardcoded to "/compat/linux". Switching to the content of
compat.linux.emul_path sysctl would allow to switch linuxulator
to different place.
Submitted by: freebsdnewbie_freenet.de
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27807
update the SACK loss recovery to RFC6675, with the following new features:
- improved pipe calculation which does not degrade under heavy loss
- engaging in Loss Recovery earlier under adverse conditions
- Rescue Retransmission in case some of the trailing packets of a request got lost
All above changes are toggled with the sysctl "rfc6675_pipe" (disabled by default).
Kristof Provost [Mon, 22 Feb 2021 07:19:43 +0000 (08:19 +0100)]
arp/nd: Cope with late calls to iflladdr_event
When tearing down vnet jails we can move an if_bridge out (as
part of the normal vnet_if_return()). This can, when it's clearing out
its list of member interfaces, change its link layer address.
That sends an iflladdr_event, but at that point we've already freed the
AF_INET/AF_INET6 if_afdata pointers.
In other words: when the iflladdr_event callbacks fire we can't assume
that ifp->if_afdata[AF_INET] will be set.
Kristof Provost [Sun, 21 Feb 2021 20:20:32 +0000 (21:20 +0100)]
bridge: Remove members when assigned to a new vnet
When the bridge is moved to a different vnet we must remove all of its
member interfaces (and span interfaces), because we don't know if those
will be moved along with it. We don't want to hold references to
interfaces not in our vnet.
Kristof Provost [Sat, 20 Feb 2021 09:11:30 +0000 (10:11 +0100)]
bridge: Support STP on VLAN devices
VLAN devices have type IFT_L2VLAN, so the STP code mistakenly believed
they couldn't be used for STP. That's not the case, so add the
ITF_L2VLAN to the check.
Emmanuel Vadot [Thu, 25 Feb 2021 15:34:28 +0000 (16:34 +0100)]
mkimg: We always want the last block of the last inserted partition
Even with an absolute offset we want to know the last block the partition
otherwise we endup with an image the size of the metadata.
This allow to create image with the ESP placed at a specific position which
is useful on arm/arm64 where u-boot have always a hard time to read the ESP
if it's not aligned on 512k.
mkimg -v -o sdcard -s gpt -p efi::54M:1M -p freebsd-ufs::1G
now works.