Stefan Eßer [Sun, 27 Dec 2020 20:53:09 +0000 (21:53 +0100)]
bc: Upgrade to version 3.2.4
This update changes the behavior of "-e" or "-f" in BC_ENV_ARGS:
Use of these options on the command line makes bc exit after executing
the given commands. These options will not cause bc to exit when
passed via the environment (but EOF in STDIN or -e or -f on the
command line will make bc exit as before).
The same applies to DC_ENV_ARGS with regard to the dc program.
Make length(0) and length(0.0) return 1 for compatibility with GNU bc
and the traditional FreeBSD bc.
Fix a potential division by zero error in a non-standard (extended)
math library function.
shu [Wed, 3 Feb 2021 16:51:45 +0000 (16:51 +0000)]
linux: make timerfd_settime(2) set expirations count to zero
On Linux, read(2) from a timerfd file descriptor returns an unsigned
8-byte integer (uint64_t) containing the number of expirations
that have occurred, if the timer has already expired one or more
times since its settings were last modified using timerfd_settime(),
or since the last successful read(2). In other words, after a read or
a call to timerfd_settime(), the timerfd's expiration count should be
zero. Some Linux applications create a timerfd and add it to epoll in
level-triggered (LT) mode; when an event arrives, they call
timerfd_settime() instead of read() to stop the event source from
triggering. On FreeBSD, timerfd_settime(2) did not reset the count to
zero, so the descriptor stayed readable, which caused high CPU
utilization.
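To make the expected semantics concrete, here is a hypothetical userland model in C. The struct and function names are invented for illustration and do not come from the FreeBSD or Linux sources. Expirations accumulate in a counter, both a successful read and a re-arm via settime reset it, and in level-triggered epoll the descriptor counts as readable exactly while the counter is non-zero.

```c
#include <stdint.h>

/* Hypothetical model of a timerfd's expiration counter. */
struct timer_model {
	uint64_t	expirations;	/* since last read or settime */
};

/* Called each time the timer fires. */
static void
timer_expire(struct timer_model *t)
{
	t->expirations++;
}

/* Re-arming the timer discards pending expirations (the fixed behavior). */
static void
timer_settime(struct timer_model *t)
{
	t->expirations = 0;
}

/* A successful read returns the pending count and resets it. */
static uint64_t
timer_read(struct timer_model *t)
{
	uint64_t n = t->expirations;

	t->expirations = 0;
	return (n);
}

/* Under level-triggered epoll, the fd is reported while the count is non-zero. */
static int
timer_readable(const struct timer_model *t)
{
	return (t->expirations != 0);
}
```

With the old behavior, timer_settime() would leave the counter untouched, so timer_readable() would keep returning true and an LT epoll loop would keep waking up.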
Gordon Bergling [Sat, 13 Mar 2021 18:28:26 +0000 (19:28 +0100)]
find(1): Refine the HISTORY within the manual page.
A simple find command appeared in Version 1 AT&T UNIX and was removed in
Version 3 AT&T UNIX. It was rewritten for Version 5 AT&T UNIX and later
enhanced for the Programmer's Workbench (PWB). These changes were
later incorporated in Version 7 AT&T UNIX.
Kristof Provost [Wed, 10 Mar 2021 21:56:11 +0000 (22:56 +0100)]
pf: Fully remove interrupt events on vnet cleanup
swi_remove() removes the software interrupt handler but does not remove
the associated interrupt event.
This is visible in `procstat -t 12` when creating and removing a vnet
jail.
We can remove it manually with intr_event_destroy().
Kristof Provost [Wed, 3 Mar 2021 10:06:49 +0000 (11:06 +0100)]
altq: Increase maximum number of CBQ and HFSC classes
In some configurations we need more classes than ALTQ supports by
default. Increase the maximum number of classes we allow.
This will only cost us a comparatively trivial amount of memory, so
there's little reason not to do so.
If we ever find that we want even more, we may want to consider turning
these defines into a tunable, but for now do the easy thing.
netmap: fix memory leak in NETMAP_REQ_PORT_INFO_GET
The netmap_ioctl() function has a reference counting bug in the
NETMAP_REQ_PORT_INFO_GET command. When `hdr->nr_name[0] == '\0'`,
the function does not decrease the refcount of "nmd", which was
increased by netmap_mem_find(), causing a refcount leak.
Reported by: Xiyu Yang <sherllyyang00@gmail.com>
Submitted by: Carl Smith <carl.smith@alliedtelesis.co.nz>
MFC after: 3 days
PR: 254311
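A minimal sketch of the invariant behind this fix, assuming a simplified refcounted object: every path through the handler must drop the reference taken by the lookup. mem_find(), mem_put(), and port_info_get() are hypothetical stand-ins, not the netmap functions, and the named-port branch is deliberately left unimplemented.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a refcounted netmap memory allocator ("nmd"). */
struct mem_desc {
	int		refcount;
	uint64_t	size;
};

/* Lookup that returns the object with a reference held. */
static struct mem_desc *
mem_find(struct mem_desc *d)
{
	d->refcount++;
	return (d);
}

static void
mem_put(struct mem_desc *d)
{
	d->refcount--;
}

static int
port_info_get(struct mem_desc *pool, const char *name, uint64_t *memsize)
{
	struct mem_desc *nmd = mem_find(pool);	/* takes a reference */
	int error = 0;

	if (name[0] == '\0') {
		/* Empty name: report on the allocator itself. */
		*memsize = nmd->size;
	} else {
		/* A named port would be looked up here; not modeled. */
		error = ENOENT;
	}
	mem_put(nmd);	/* release on every path, including the empty-name one */
	return (error);
}
```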
Mitchell Horne [Wed, 10 Mar 2021 14:57:12 +0000 (10:57 -0400)]
ns8250: don't drop IER_TXRDY on bus_grab/ungrab
It has been observed that some systems are often unable to resume from
ddb after entering with debug.kdb.enter=1. Further checking shows that
the terminal is blocked in tty_drain() and never makes progress in
clearing the output queue, because sc->sc_txbusy is high.
I noticed that when entering polling mode for the debugger, IER_TXRDY is
set in the failure case. Since this bit is never tracked by the softc,
it will not be restored by ns8250_bus_ungrab(). This creates a race in
which a TX interrupt can be lost, creating the hang described above.
Ensuring that this bit is restored is enough to prevent this and lets
the system resume from ddb as expected.
The solution is to track this bit in the sc->ier field, for the same
lifetime that TX interrupts are enabled.
PR: 223917, 240122
Sponsored by: The FreeBSD Foundation
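The race can be illustrated with a small model in C, assuming a simplified softc that keeps a shadow copy of the interrupt enable register. The names and bit values are hypothetical, not the ns8250 driver's definitions; the point is only that a bit set in hardware but not recorded in sc->ier is silently dropped when ungrab restores the register from the shadow copy. The track parameter exists only to contrast the old and new behavior.

```c
#include <stdint.h>

/* Hypothetical interrupt-enable bits, for illustration only. */
#define MODEL_IER_RXRDY	0x01
#define MODEL_IER_TXRDY	0x02

struct uart_model {
	uint8_t	hw_ier;	/* value currently in the (simulated) IER */
	uint8_t	ier;	/* softc shadow copy used to restore the IER */
};

/*
 * Enable TX interrupts.  With track == 0 this mimics the old behavior,
 * where the bit reached the hardware but not the shadow copy.
 */
static void
tx_intr_enable(struct uart_model *sc, int track)
{
	sc->hw_ier |= MODEL_IER_TXRDY;
	if (track)
		sc->ier |= MODEL_IER_TXRDY;
}

/* Disable TX interrupts, keeping hardware and shadow copy in sync. */
static void
tx_intr_disable(struct uart_model *sc)
{
	sc->hw_ier &= (uint8_t)~MODEL_IER_TXRDY;
	sc->ier &= (uint8_t)~MODEL_IER_TXRDY;
}

/* Debugger entry: mask interrupts while the console is polled. */
static void
bus_grab(struct uart_model *sc)
{
	sc->hw_ier = 0;
}

/* Debugger exit: restore the IER from the shadow copy. */
static void
bus_ungrab(struct uart_model *sc)
{
	sc->hw_ier = sc->ier;
}
```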
Juraj Lutter [Sun, 28 Feb 2021 22:07:14 +0000 (23:07 +0100)]
newsyslog(8): Implement a new 'E' flag to not rotate empty log files
Based on an idea from dvl's coworker, László DANIELISZ, implement
a new flag, 'E', that prevents newsyslog(8) from rotating empty
log files. The 'E' flag is mostly useful in conjunction with the 'B'
flag, which instructs newsyslog(8) not to insert an informational
message into the log file after rotation, so the file stays empty.
Rick Macklem [Tue, 2 Mar 2021 22:18:23 +0000 (14:18 -0800)]
nfsclient: Fix ReadDS/WriteDS/CommitDS nfsstats RPC counts for a NFSv3 DS
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing I/O DS operations on a Flexible File Layout pNFS
server. For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented
instead of the ReadDS/WriteDS/CommitDS counts.
This patch fixes this.
Only the RPC counts reported by nfsstat(1) were affected by this bug;
the I/O operations were performed correctly.
Rick Macklem [Mon, 1 Mar 2021 20:49:32 +0000 (12:49 -0800)]
nfsclient: Fix the stripe unit size for a File Layout pNFS layout
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing a File Layout pNFS DS I/O operation.
The size of the I/O operation was smaller than expected.
The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util
in the layout. I had misinterpreted RFC5661 and had shifted the value
right by 6 bits. The correct interpretation is to use the value as
presented (it is always an exact multiple of 64), clearing bits 0->5.
This patch fixes this.
Without the patch, I/O through the DSs works, but the I/O size is 1/64th
of what is optimal.
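The difference between the two interpretations comes down to a shift versus a mask. In this sketch the helper names are hypothetical, and the low-order flag bit used in the example is arbitrary; only the split between bits 0-5 and the stripe unit in bits 6-31 comes from the text above.

```c
#include <stdint.h>

/* Bits 0-5 of nflh_util are not part of the stripe unit size. */
#define UTIL_LOW_BITS	0x3fU

/* Old (wrong) reading: treats bits 6-31 as a count and shifts them down. */
static uint32_t
stripe_unit_shifted(uint32_t util)
{
	return (util >> 6);
}

/* Corrected reading: the value is already a byte count; clear bits 0-5. */
static uint32_t
stripe_unit(uint32_t util)
{
	return (util & ~UTIL_LOW_BITS);
}
```

For a 64KB stripe unit with a low flag bit set (util = 65537), the masked value is 65536 while the shifted value is 1024, exactly the 1/64th shortfall described above.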
Rick Macklem [Sun, 28 Feb 2021 22:53:54 +0000 (14:53 -0800)]
nfsclient: add nfs node locking around uses of n_direofoffset
During code inspection I noticed that the n_direofoffset field
of the NFS node was being manipulated without any lock being
held to make it SMP safe.
This patch adds locking of the NFS node's mutex around
handling of n_direofoffset to make it SMP safe.
I have not seen any failure that could be attributed to n_direofoffset
being manipulated concurrently by multiple processors, but I think this
is possible, since directories are read with shared vnode
locking, plus locks only on individual buffer cache blocks.
However, there have been as yet unexplained issues with respect to
reading large directories over NFS that could conceivably have been
caused by concurrent manipulation of n_direofoffset.
Rick Macklem [Sun, 28 Feb 2021 22:15:32 +0000 (14:15 -0800)]
nfsclient: add checks for a server returning the current directory
Commit 3fe2c68ba20f dealt with a panic in cache_enter_time() where
the vnode referred to the directory argument.
It would also be possible to get these panics if a broken
NFS server were to return the directory as a new object being
created within the directory or in a Lookup reply.
This patch adds checks to avoid the panics and logs
messages to indicate that the server is broken for the
file object creation cases.
Rick Macklem [Sun, 28 Feb 2021 01:54:05 +0000 (17:54 -0800)]
nfsclient: fix panic in cache_enter_time()
Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
function.
This is specific to an NFSv3 mount with the "rdirplus" mount
option. Unlike NFSv4, NFSv3 ReaddirPlus replies include an entry
for the current directory.
This trivial patch avoids doing a cache_enter_time()
call for the current directory to avoid the panic.
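A hypothetical sketch of the kind of check described in these two commits, assuming each decoded entry carries an identity (here a plain fileid) that can be compared with the directory's own. Entries that resolve to the directory itself are skipped instead of being handed to the name cache. The struct and function names are illustrative, not the NFS client's.

```c
#include <stddef.h>

/* Hypothetical directory entry as decoded from a ReaddirPlus reply. */
struct rdp_entry {
	const char	*name;
	unsigned long	fileid;	/* stand-in for the entry's vnode identity */
};

/*
 * Count the entries that would be added to the name cache.  An entry
 * that resolves to the directory itself is skipped, since entering it
 * would violate the "dvp != vp" requirement.
 */
static size_t
cacheable_entries(unsigned long dir_fileid, const struct rdp_entry *ents,
    size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (ents[i].fileid == dir_fileid)
			continue;	/* e.g. "." in an NFSv3 reply */
		count++;	/* the real client would enter it in the cache */
	}
	return (count);
}
```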
Wei Hu [Thu, 15 Oct 2020 11:44:28 +0000 (11:44 +0000)]
Hyper-V: hn: Relinquish cpu in HN_LOCK to avoid deadlock
The try-lock loop in HN_LOCK leaves the thread spinning on the CPU while
the lock is unavailable. This can deadlock if the thread holding the lock
is sleeping. Relinquish the CPU to work around this problem, even though
it does not completely solve the issue: priority inversion could still
cause a livelock, however unlikely. A more complete solution may be
needed in the future.
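A userland analogue of the workaround, as a sketch: a try-lock loop that yields the CPU after each failed attempt. Here atomic_flag and sched_yield() stand in for the driver's lock and the kernel's way of relinquishing the CPU; both substitutions are assumptions, since the commit does not name the primitives involved.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical userland analogue of a try-lock protected by a flag. */
struct spin_model {
	atomic_flag	busy;
};

static int
model_trylock(struct spin_model *l)
{
	return (!atomic_flag_test_and_set_explicit(&l->busy,
	    memory_order_acquire));
}

static void
model_lock(struct spin_model *l)
{
	/*
	 * Retry until the flag is acquired, giving up the CPU after each
	 * failed attempt so that a lock holder that is not running gets a
	 * chance to run and release the lock.
	 */
	while (!model_trylock(l))
		sched_yield();
}

static void
model_unlock(struct spin_model *l)
{
	atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

As the commit notes, yielding only makes it more likely that the lock holder gets to run; it does not guarantee progress under priority inversion.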
Wei Hu [Thu, 15 Oct 2020 05:57:20 +0000 (05:57 +0000)]
Hyper-V: pcib: Check revoke status during device attach
It is possible for the vmbus pcib channel to be revoked during the attach
path. The attach path could be waiting for a response from the host, and
this response will never arrive, since the channel has already been
revoked from the host's point of view. Check for this situation while
waiting for completion and return failure if it happens.
Wei Hu [Mon, 31 Aug 2020 09:05:45 +0000 (09:05 +0000)]
Hyper-V: storvsc: Enhance srb_status code handling.
In hv_storvsc_io_request(), when dumping core, prevent the send channel
from being changed from the base channel to another one, since
storvsc_poll always probes the base channel.
Based upon conversations with Microsoft, changed the handling of srb_status
codes. Most of them we should never receive; others we may. All are
treated as retryable except for two. We should not get these two
statuses, but if we ever do, the I/O state is not known.
Wei Hu [Thu, 30 Jul 2020 07:26:11 +0000 (07:26 +0000)]
Prevent framebuffer mmio space from being allocated to other devices on HyperV.
On Gen2 VMs, Hyper-V provides mmio space for framebuffer.
This mmio address range is not usable for other PCI devices.
Currently only the efifb driver uses this range, without reserving
it from the system.
Therefore, the vmbus driver reserves it before any other PCI device
drivers start to request mmio addresses.
Alexander Motin [Thu, 28 Jan 2021 20:53:49 +0000 (15:53 -0500)]
Make software iSCSI more configurable.
Move software iSCSI tunables/sysctls into kern.icl.soft subtree.
Replace several hardcoded length constants there with variables.
While there, stretch the limits to better match Linux's open-iscsi
and our own initiator with the new MAXPHYS of 1MB. Our CTL target is
also optimized for up to 1MB I/Os, so there is also a match now.
For Windows 10 and VMware 6.7 initiators at default settings it
should make no change, since previous limits were sufficient there.
Tests of QD1 1MB writes from FreeBSD over 10GigE link show throughput
increase by 29% on idle connection and 132% with concurrent QD8 reads.
Alexander Motin [Wed, 3 Mar 2021 20:21:26 +0000 (15:21 -0500)]
Move ic_check_send_space clear to the actual check.
This closes a tiny race in which the flag could be set between being
cleared and the space check, which would create some extra work for us.
The flag is set with both locks held, so we can clear it in either
place, but both locks are dropped in between.
I think it allowed us to avoid some TX thread wakeups while the socket
buffer is full. But add another option there: if ic_check_send_space
is set, which means the socket just reported that new space appeared,
it may make sense to pull more data from ic_to_send for better TX
coalescing.
Alexander Motin [Sat, 27 Feb 2021 15:14:05 +0000 (10:14 -0500)]
Micro-optimize OOA queue processing.
- Move ctl_get_cmd_entry() calls from every OOA traversal to when
the requests are first inserted, storing seridx in struct ctl_scsiio.
- Move some checks out of the loop in ctl_check_ooa().
- Replace checks for errors that can not happen with asserts.
- Transpose ctl_serialize_table, so that any OOA traversal accesses
only one row (cache line). Compact it from enum to uint8_t.
- Optimize static branch predictions in hottest places.
Due to its O(n) nature on deep LUN queues, this can be the hottest code
path in CTL, and the additional 20% IOPS I see in some 4KB I/O tests
is good to have in reserve. According to the profiles, about 50% of
the CPU time here is now spent in two memory accesses per request
traversed in the OOA queue.
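The effect of the transposition can be sketched with a tiny serialization table. The indices, actions, and the table[new][pending] orientation below are hypothetical simplifications of what the commit describes: the new request's row is fetched once, and each queued request's cached seridx then indexes into that one row.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical serialization indices and actions; not CTL's real values. */
enum { SER_READ, SER_WRITE, SER_NIDX };
enum { ACT_PASS, ACT_BLOCK };

/*
 * Indexed as table[new][pending]: all lookups made while one new request
 * walks the queue hit the same row.  uint8_t keeps the table compact.
 */
static const uint8_t serialize_table[SER_NIDX][SER_NIDX] = {
	/* pending:       READ       WRITE */
	[SER_READ]  = { ACT_PASS,  ACT_BLOCK },
	[SER_WRITE] = { ACT_BLOCK, ACT_BLOCK },
};

/*
 * Walk the seridx values of already-queued requests (cached when they
 * were inserted, so no per-traversal lookup is needed) and report
 * whether the new request must wait.
 */
static int
must_block(uint8_t new_seridx, const uint8_t *pending, size_t n)
{
	const uint8_t *row = serialize_table[new_seridx];
	size_t i;

	for (i = 0; i < n; i++) {
		if (row[pending[i]] == ACT_BLOCK)
			return (1);
	}
	return (0);
}
```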
Alexander Motin [Sun, 21 Feb 2021 21:45:14 +0000 (16:45 -0500)]
Refactor CTL datamove KPI.
- Make frontends call the unified CTL core method ctl_datamove_done()
to report move completion. This reduces code duplication
in different backends by accounting DMA time in common code.
- Add a samethr argument to ctl_datamove_done() and the be_move_done()
callback, reporting whether the callback is called in the same
context as ctl_datamove(). This lets some cases, like iSCSI writes
with immediate data or camsim frontend writes, save one context
switch, since we know that the context is sleepable.
- Remove data_move_done() methods from struct ctl_backend_driver,
unused since forever.
Alexander Motin [Fri, 19 Feb 2021 20:42:57 +0000 (15:42 -0500)]
Microoptimize CTL I/O queues.
Switch OOA queue from TAILQ to LIST and change its direction, so that
we traverse it forward, not backward. There is only one place where
we really need other direction, and it is not critical.
Use STAILQ_REMOVE_HEAD() instead of STAILQ_REMOVE() in backends.
Replace a few impossible conditions with assertions.
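The queue changes rely on standard <sys/queue.h> properties, sketched here with a hypothetical I/O record. Inserting at the head of a LIST and walking it forward visits requests newest first, the same order a backward walk of a tail-inserted TAILQ gives; and STAILQ_REMOVE_HEAD() removes the front of a singly linked tail queue in constant time, while STAILQ_REMOVE() may have to scan for the predecessor.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical I/O record kept on both kinds of queue. */
struct io_model {
	int			id;
	LIST_ENTRY(io_model)	ooa_links;	/* OOA queue, newest first */
	STAILQ_ENTRY(io_model)	be_links;	/* backend FIFO */
};

LIST_HEAD(ooa_head, io_model);
STAILQ_HEAD(be_head, io_model);

/* Record ids in forward (newest-to-oldest) order; returns the count. */
static size_t
ooa_order(struct ooa_head *h, int *out, size_t max)
{
	struct io_model *io;
	size_t n = 0;

	LIST_FOREACH(io, h, ooa_links) {
		if (n < max)
			out[n++] = io->id;
	}
	return (n);
}

/* Consume the backend FIFO from the front in O(1). */
static struct io_model *
be_dequeue(struct be_head *h)
{
	struct io_model *io = STAILQ_FIRST(h);

	if (io != NULL)
		STAILQ_REMOVE_HEAD(h, be_links);
	return (io);
}
```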
Alexander Motin [Fri, 19 Feb 2021 03:07:32 +0000 (22:07 -0500)]
Save context switch per I/O for iSCSI and IOCTL frontends.
Introduce new CTL core KPI ctl_run(), preprocessing I/Os in the caller
context instead of scheduling another thread just for that. This call
may sleep, which is not acceptable for some frontends like the original
CAM/FC one, but iSCSI already has separate sleepable per-connection RX
threads, and scheduling another thread is mostly just a waste of time.
The IOCTL frontend actually waits for the I/O completion in the caller
thread, so using another thread for this makes even less sense.
With this change I can measure ~5% IOPS improvement on 4KB iSCSI I/Os
to ZFS.
Dimitry Andric [Wed, 10 Mar 2021 21:31:40 +0000 (22:31 +0100)]
Partially revert libcxxrt changes to avoid _Unwind_Exception change
After the recent cherry-picking of libcxxrt commits 0ee0dbfb0d26 and
d2b3fadf2db5, users reported that editors/libreoffice packages from the
official package builders no longer started. It turns out that the
combination of these commits subtly changes the ABI, requiring all
applications that depend on internal details of struct _Unwind_Exception
(available via unwind-arm.h and unwind-itanium.h) to be recompiled.
However, the FreeBSD package builders always use -RELEASE jails, so
these still use the old declaration of struct _Unwind_Exception, which
is not entirely compatible. In particular, LibreOffice uses this struct
in its internal "uno bridge" component, where it attempts to set up its
own exception handling mechanism.
To fix this incompatibility, go back to the old declarations of struct
_Unwind_Exception, and restore the __LP64__-specific workaround we had
in place before (which was to cope with yet another, older ABI bug).
Effectively, this reverts upstream libcxxrt commits 88bdf6b290da
("Specify double-word alignment for ARM unwind") and b96169641f79
("Updated Itanium unwind"), and reapplies our commit 3c4fd2463bb2
("libcxxrt: add padding in __cxa_allocate_* to fix alignment").
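Why an alignment change counts as an ABI change can be shown with simplified, hypothetical structs; these are not the real _Unwind_Exception or __cxa_exception layouts. Adding the GCC/Clang attribute aligned(16) to the embedded header leaves its members unchanged but can move it within an enclosing structure: on a typical LP64 target the embedded header below moves from offset 24 to offset 32. Code compiled against the old layout that locates one structure from a pointer to the other would then use the wrong offset.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unwinder header: the old declaration. */
struct unwind_hdr_old {
	uint64_t	exception_class;
	void		(*cleanup)(void *);
	uintptr_t	private_1;
	uintptr_t	private_2;
};

/* Same members, with a stricter (double-word) alignment requirement. */
struct unwind_hdr_new {
	uint64_t	exception_class;
	void		(*cleanup)(void *);
	uintptr_t	private_1;
	uintptr_t	private_2;
} __attribute__((aligned(16)));

_Static_assert(alignof(struct unwind_hdr_new) == 16,
    "the attribute raises the alignment of the header");

/* Hypothetical runtime-private header embedding the unwinder header. */
struct cxa_hdr_old {
	void			*type;
	void			*next;
	int			handler_count;
	struct unwind_hdr_old	unwind;	/* offset 24 on LP64 */
};

struct cxa_hdr_new {
	void			*type;
	void			*next;
	int			handler_count;
	struct unwind_hdr_new	unwind;	/* padded to offset 32 on LP64 */
};
```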
Gleb Smirnoff [Tue, 8 Dec 2020 16:46:00 +0000 (16:46 +0000)]
The list of ports in the configuration path should be protected by locks;
epoch should be used only for the fast path. Thus use LAGG_XLOCK() in
lagg_[un]register_vlan. This fixes a "sleeping in epoch" panic.
Dimitry Andric [Tue, 23 Feb 2021 20:03:32 +0000 (21:03 +0100)]
Build lib/msun tests with compiler builtins disabled
This forces the compiler to emit calls to libm functions, instead of
possibly substituting pre-calculated results at compile time, which
should help to actually test those functions.
We could just use a C implementation using __builtin_fabs(), but using
this assembly version guarantees that there is no additional prolog/epilog
code. Additionally, clang generates worse code for masking off the top bit
than GCC: https://bugs.llvm.org/show_bug.cgi?id=49377.
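For comparison, the bit-masking idea behind fabs() can be written portably in C. This is a sketch, assuming double is IEEE 754 binary64 with the sign in bit 63; it is not the assembly version the commit adds, and how well a given compiler reduces it is exactly the kind of codegen difference the bug report above is about.

```c
#include <stdint.h>
#include <string.h>

/*
 * Clear the IEEE 754 sign bit (bit 63 of a binary64 double).  memcpy
 * avoids strict-aliasing problems; compilers typically lower it to a
 * register move.
 */
static double
fabs_by_mask(double x)
{
	uint64_t bits;

	memcpy(&bits, &x, sizeof(bits));
	bits &= ~(UINT64_C(1) << 63);	/* drop the sign, keep exponent and mantissa */
	memcpy(&x, &bits, sizeof(x));
	return (x);
}
```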
This fixes the RISCV64 softfloat world build after cf97d2a1dab8. That commit
added -fno-builtin to the msun tests, which resulted in the first references
to fabs (previously the compiler inlined all calls).
Reviewed By: dim
Reported by: mjg
Differential Revision: https://reviews.freebsd.org/D28994
Alexander Motin [Sat, 6 Mar 2021 03:39:52 +0000 (22:39 -0500)]
Do not exit ctl_be_block_worker() prematurely.
Returning while there are any I/Os in a queue may leave them stuck
indefinitely, since there is only one taskqueue task for all of them.
I think I've reproduced this by switching ha_role to secondary under
heavy load.
Brandon Bergren [Fri, 13 Nov 2020 16:49:41 +0000 (16:49 +0000)]
[PowerPC] Allow traversal of oversize OF properties.
In standards such as LoPAPR, property names in excess of the usual 31
characters exist.
This breaks property traversal.
While IEEE 1275-1994 explicitly defines nextprop to work with a
32-byte region of memory, using a larger buffer should be fine. There is
actually no way to pass a buffer length to the nextprop call in the OF
client interface, so SLOF simply overflows the buffer blindly.
So we have to defensively make the buffer larger, to avoid memory
corruption when reading out long properties on live OF systems.
Note also that on real-mode OF, things are pretty tight because we are
allocating against a static bounce buffer in low memory, so we can't just
use a huge buffer to work around this without it being wasteful of our
limited amount of 32-bit physical memory.
This allows a patched ofwdump to operate properly on SLOF (i.e. pseries)
systems, as well as any other PowerPC systems with overlength properties.
Brandon Bergren [Mon, 1 Mar 2021 02:35:53 +0000 (20:35 -0600)]
[PowerPC64] Fix multiple issues in fpsetmask().
Building R exposed a problem in fpsetmask() whereby we were not properly
clamping the provided mask to the valid range.
R initializes the mask by calling fpsetmask(~0) on FreeBSD. Since we
recently enabled precise exceptions, this was causing an immediate
SIGFPE because we were attempting to set invalid bits in the fpscr.
Properly limit the range of bits that can be set via fpsetmask().
While here, use the correct fp_except_t type instead of fp_rnd_t.
Reported by: pkubaj (in IRC)
Sponsored by: Tag1 Consulting, Inc.
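Clamping a caller-supplied mask to the valid exception bits is a single AND, sketched below against a simulated register. The bit values, the typedef, and the register variable are hypothetical placeholders rather than the PowerPC FPSCR layout or the <ieeefp.h> definitions; the sketch only shows that fpsetmask(~0) should enable every defined exception without setting unrelated bits.

```c
#include <stdint.h>

/* Hypothetical exception-enable bits, for illustration only. */
#define X_INV	0x10
#define X_OFL	0x08
#define X_UFL	0x04
#define X_DZ	0x02
#define X_IMP	0x01
#define X_ALL	(X_INV | X_OFL | X_UFL | X_DZ | X_IMP)

typedef int fp_except_model_t;

/* Simulated control register; other bits must be left untouched. */
static uint32_t model_fpscr;

/* Set the enabled exceptions, returning the previous mask. */
static fp_except_model_t
model_fpsetmask(fp_except_model_t mask)
{
	fp_except_model_t old = (fp_except_model_t)(model_fpscr & X_ALL);

	/* Clamp the request to the defined bits before writing it. */
	model_fpscr = (model_fpscr & ~(uint32_t)X_ALL) |
	    ((uint32_t)mask & X_ALL);
	return (old);
}
```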
Jung-uk Kim [Wed, 3 Mar 2021 23:10:00 +0000 (18:10 -0500)]
libkvm: Plug a couple of memory leaks and check for possible calloc(3) failure
First, r204494 introduced dpcpu_off in struct __kvm; it was allocated
in _kvm_dpcpu_init() but never free(3)'d in kvm_close(3).
Second, r291406 introduced kvm_nlist2(3) and converted kvm_nlist(3) to
use the new function, but it did not free the temporary buffer.
Also, check for possible calloc(3) failure while I am in the neighborhood.
Ed Maste [Tue, 2 Mar 2021 22:35:48 +0000 (17:35 -0500)]
growfs: allow operation on RW-mounted filesystems
growfs supports growing mounted filesystems (writes are temporarily
suspended while the grow happens). Drop the check for fs_clean == 0
to restore this case. Keep the fs_flags check for FS_UNCLEAN or
FS_NEEDSFSCK, which represent the state of the filesystem when it was
mounted; if either is set, fsck should be run first.
PR: 253754
Reviewed by: mckusick
Fixes: 6eb925f8450f ("Filesystem utilities that modify the...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29021
Kyle Evans [Thu, 4 Mar 2021 20:13:55 +0000 (14:13 -0600)]
Revert "MFC kern: cpuset: properly rebase when attaching to a jail"
This behavior change is too invasive to be made between minor versions;
back it out in stable/12. It will first be introduced in 13.0.
The cpuset test has been adjusted to account for the legacy behavior,
with a note added as to why it's different and doesn't work if run
as-is on 13.0.
Conrad Meyer [Tue, 28 Jan 2020 01:39:50 +0000 (01:39 +0000)]
amdtemp(4): Add support for Family 17h CCD sensors
Probe Family 17h CPUs for up to 4 (Zen, Zen+) or 8 (Zen2) CCD temperature
sensors. These were discovered by Ondrej Čerman
(https://github.com/ocerman) and collaborators experimentally, and are not
currently documented in any datasheet I have access to.
Alexander Motin [Tue, 2 Feb 2021 18:37:13 +0000 (13:37 -0500)]
Make DataSN counter of solicited Data-Out local.
DataSN for solicited Data-Out is per-R2T. Since we handle whole R2T
in one go, we don't need to store it anywhere, especially in global
per-command structure. This may allow us to handle multiple R2Ts per
command at once, if we decide to, or maybe relax locking.
Rename the second use of that field to io_referenced_task_tag.
Mark Johnston [Thu, 25 Feb 2021 15:04:44 +0000 (10:04 -0500)]
buf: Fix the dirtybufthresh check
dirtybufthresh is a watermark, slightly below the high watermark for
dirty buffers. When a delayed write is issued, the dirtying thread will
start flushing buffers if the dirtybufthresh watermark is reached. This
helps ensure that the high watermark is not reached; otherwise
performance will degrade as clustering and other optimizations are
disabled (see buf_dirty_count_severe()).
When the buffer cache was partitioned into "domains", the dirtybufthresh
threshold checks were not updated. Fix this.
Kristof Provost [Wed, 24 Feb 2021 15:40:37 +0000 (16:40 +0100)]
bridge tests: Test that we also forward on some interfaces
Ensure that we not only block on some interfaces, but also forward on
some. Without the previous commit we wound up discarding on all ports,
rather than only on the ports needed to break the loop.
ipfw: make algo name argument optional for some table types
Most of table types currently supported by ipfw have only one
algorithm implementation. When a user creates such tables, allow
the algo name to be omitted from the arguments. For example, the
following is now possible:
ipfw table T1 create type number
ipfw table T2 create type iface
ipfw table T3 create type flow
Rick Macklem [Mon, 15 Feb 2021 02:16:58 +0000 (18:16 -0800)]
getdirentries.2: fix for NFS mounts
It was reported that getdirentries(2) was
returning dirents with d_off set to 0 for an NFS
mount.
This is believed to be correct behaviour at
this time (it may change for some NFS mounts
in the future), but is inconsistent with what the
getdirentries(2) man page says.
MFC 9febbc454190:
Fix for natd(8) sending wrong sequence number after TCP retransmission,
terminating a TCP connection.
If a TCP packet must be retransmitted and, due to the internal workings of
TCP (typically when ACK packets are lost), the data length of the
retransmitted packet has changed, then there is only a one-in-three chance
that the logic in GetDeltaSeqOut() will find the correct length, which is
the last length received.
This can be explained as follows:
If a "227 Entering Passive Mode" packet must be retransmitted and the length
changes from 51 to 50 bytes, for example, then we have three cases for the
list scan in GetDeltaSeqOut(), depending on how many prior packets were
received modulo N_LINK_TCP_DATA=3:
case 1: index 0: original packet       51
        index 1: retransmitted packet  50
        index 2: not relevant
case 2: index 0: not relevant
        index 1: original packet       51
        index 2: retransmitted packet  50
case 3: index 0: retransmitted packet  50
        index 1: not relevant
        index 2: original packet       51
This patch simply changes the search order for TCP packets in
GetDeltaAckIn() and GetDeltaSeqOut(), always starting at the last
received packet instead of at whichever packet occupies the first slot.
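The effect of the new search order can be reproduced with a small ring-buffer model. The structures and function names below are hypothetical, and sequence-number wraparound is ignored. The relevant detail is that ties are won by the first record visited, so scanning backward from the newest record lets a retransmission take precedence over the original packet in all three cases listed above.

```c
#include <stddef.h>

#define N_SLOTS	3	/* mirrors N_LINK_TCP_DATA = 3 from the text */

/* Hypothetical record: the length delta that applies from 'seq' onward. */
struct delta_rec {
	int		active;
	unsigned long	seq;
	int		delta;
};

struct delta_ring {
	struct delta_rec	rec[N_SLOTS];
	size_t			last;	/* index of the newest record */
};

static void
ring_add(struct delta_ring *r, unsigned long seq, int delta)
{
	r->last = (r->last + 1) % N_SLOTS;
	r->rec[r->last] = (struct delta_rec){ 1, seq, delta };
}

/*
 * Pick the record closest to (at or before) 'seq'.  Only a strictly
 * smaller distance replaces the current choice, so on a tie the first
 * record visited wins; scanning from the newest record therefore
 * prefers a retransmission over an older packet with the same seq.
 */
static int
lookup_delta(const struct delta_ring *r, unsigned long seq)
{
	long best = -1, diff;
	int delta = 0;
	size_t k;

	for (k = 0; k < N_SLOTS; k++) {
		const struct delta_rec *x =
		    &r->rec[(r->last + N_SLOTS - k) % N_SLOTS];

		if (!x->active || seq < x->seq)
			continue;
		diff = (long)(seq - x->seq);
		if (best < 0 || diff < best) {
			best = diff;
			delta = x->delta;
		}
	}
	return (delta);
}
```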
Mark Johnston [Wed, 24 Feb 2021 02:15:50 +0000 (21:15 -0500)]
rmlock: Add a required compiler membar to the rlock slow path
The tracker flags need to be loaded only after the tracker is removed
from its per-CPU queue. Otherwise, readers may fail to synchronize with
pending writers attempting to propagate priority to active readers, and
readers and writers deadlock on each other. This was observed in a
stable/12-based armv7 kernel where the compiler had reordered the load
of rmp_flags to before the stores updating the queue.
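A sketch of the ordering requirement, assuming a simplified tracker structure that is hypothetical and not the rmlock code. The C11 atomic_signal_fence() call acts as a compiler-only barrier (FreeBSD kernel code typically spells this __compiler_membar()): it keeps the compiler from hoisting the flags load above the unlinking store without emitting a hardware fence. That a compiler barrier is sufficient here is an assumption of the sketch, based on other CPUs reaching per-CPU data only through an interrupt on the owning CPU.

```c
#include <stdatomic.h>

/* Hypothetical reader tracker on a per-CPU singly linked queue. */
struct tracker {
	struct tracker	*next;	/* per-CPU queue linkage */
	int		flags;	/* may be set by a writer propagating priority */
};

/* Unlink 't', assumed to be the queue head, then read its flags. */
static int
tracker_remove_and_check(struct tracker **queue, struct tracker *t)
{
	*queue = t->next;	/* remove ourselves from the queue */

	/*
	 * Compiler-only barrier: the load of t->flags below may not be
	 * moved before the store above.  No CPU fence is emitted.
	 */
	atomic_signal_fence(memory_order_seq_cst);

	return (t->flags);
}
```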
Kristof Provost [Mon, 1 Jun 2020 19:26:16 +0000 (19:26 +0000)]
bridge tests: Avoid building a switching loop
Enable STP before bringing the bridges up. This avoids a switching loop,
which has a tendency to drown out progress in userspace processes,
especially on single-core systems.
Only check that we have indeed shut down one of the looped interfaces.
We used to have an issue with recursive locking with
net.link.bridge.inherit_mac. This causes us to send an ARP request while
we hold the BRIDGE_LOCK, which used to cause us to acquire the
BRIDGE_LOCK again. We can't re-acquire it, so this caused a panic.
Now that we no longer need to acquire the BRIDGE_LOCK for
bridge_transmit() this should no longer panic. Test this.
PR: 216510
Reviewed by: emaste, philip
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24251
bridge tests: Ensure that bridges in different jails get different MAC addresses
We used to have a problem where bridges created in different vnet jails
would end up having the same mac address. This is now fixed by
including the jail name as a seed for the mac address generation, but we
should verify that it doesn't regress.