Mark Johnston [Tue, 7 Sep 2021 18:51:54 +0000 (14:51 -0400)]
socket: Avoid clearing SS_ISCONNECTING if soconnect() fails
This behaviour appears to date from the 4.4 BSD import. It has two
problems:
1. The update to so_state is not protected by the socket lock, so
concurrent updates to so_state may be lost.
2. Suppose two threads race to call connect(2) on a socket, and one
succeeds while the other fails. Then the failing thread may
incorrectly clear SS_ISCONNECTING, confusing the state machine.
Simply remove the update. It does not appear to be necessary:
pru_connect implementations which call soisconnecting() only do so after
all failure modes have been handled. For instance, tcp_connect() and
tcp6_connect() will never return an error after calling soisconnected().
However, we cannot correctly assert that SS_ISCONNECTED is not set after
an error from soconnect() since the socket lock is not held across the
pru_connect call, so a concurrent connect(2) may have set the flag.
Mark Johnston [Tue, 7 Sep 2021 18:49:31 +0000 (14:49 -0400)]
socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a
socket rather than a socket buffer. Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.
When locking for I/O, return an error if the socket is a listening
socket. Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.
Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In
particular, add checks to sendfile() and sorflush().
Reviewed by: tuexen, gallatin
Sponsored by: The FreeBSD Foundation
Ian Lepore [Tue, 28 Sep 2021 19:29:10 +0000 (13:29 -0600)]
Fix busdma resource leak on usb device detach.
When a usb device is detached, usb_pc_dmamap_destroy() called
bus_dmamap_destroy() while the map was still loaded. That's harmless on x86
architectures, but on all other platforms it causes bus_dmamap_destroy() to
return EBUSY and leak away any memory resources (including bounce buffers)
associated with the mapping, as well as any allocated map structure itself.
This change introduces a new is_loaded flag to the usb_page_cache struct to
track whether a map is loaded or not. If the map is loaded,
bus_dmamap_unload() is called before bus_dmamap_destroy() to avoid leaking
away resources.
Use atomic counters to ensure that we correctly track the number of half
open states and syncookie responses in-flight.
This determines if we activate or deactivate syncookies in adaptive
mode.
We'd likely be better served by converting these to the equivalent mem*
calls, but just kill the knob for now. The b* macros being defined get
in the way of _FORTIFY_SOURCE.
These aren't a part of or use libjail(3), but rather are direct
syscalls. Still, they seem like good additions, allowing us to attach
to already-running jails.
kqueue: don't arbitrarily restrict long-past values for NOTE_ABSTIME
NOTE_ABSTIME values are converted to values relative to boottime in
filt_timervalidate(), and negative values are currently rejected. We
don't reject times in the past in general, so clamp this up to 0 as
needed such that the timer fires immediately rather than imposing what
looks like an arbitrary restriction.
Another possible scenario is that the system clock had to be adjusted
by ~minutes or ~hours and we have less than that in terms of uptime,
making a reasonable short-timeout suddenly invalid. Firing it is still
a valid choice in this scenario so that applications can at least
expect a consistent behavior.
kern: random: collect ~16x less from fast-entropy sources
Previously, we were collecting at a base rate of:
64 bits x 32 pools x 10 Hz = 2.5 kB/s
This change drops it to closer to 64-ish bits per pool per second, to
work a little better with entropy providers in virtualized environments
without compromising the security goals of Fortuna.
kern: random: drop read_rate and associated functionality
Refer to discussion in PR 230808 for a less incomplete discussion, but
the gist of this change is that we currently collect orders of magnitude
more entropy than we need.
The excess comes from bytes being read out of /dev/*random. The default
rate at which we collect entropy without the read_rate increase is
already more than we need to recover from a compromise of an internal
state.
For stable/13, the read_rate_increment symbol remains as a stub to avoid
breaking loadable random modules.
Receive Filtering Engine (RFE) configuration is not yet implemented,
and mgb intended to enable all broadcast, multicast, and unicast.
However, MGB_RFE_ALLOW_MULTICAST was missed (MGB_RFE_ALLOW_UNICAST was
included twice).
MFC after: 1 week
Fixes: 8890ab7758b8 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
Previously mgb_admin_intr printed a diagnostic message if no interrupt
status bits were set, but it's not valid to call device_printf() from a
filter. Just drop the message as it has no user-facing value.
Also return FILTER_STRAY in this case - there is nothing further for
the driver to do.
Reviewed by: kbowling
MFC after: 1 week
Fixes: 8890ab7758b8 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32231
This function was renamed to kern_reboot() in 2010, but the man page has
failed to keep in sync. Bring it up to date on the rename, add the
shutdown hooks to the synopsis, and document the (obvious) fact that
kern_reboot() does not return.
Fix an outdated reference to the old name in kern_reboot(), and leave a
reference to the man page so future readers might find it before any
large changes.
Reviewed by: imp, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32085
rman: fix overflow in rman_reserve_resource_bound()
If the default range of [0, ~0] is given, then (~0 - 0) + 1 == 0. This
in turn will cause any allocation of non-zero size to fail. Zero-sized
allocations are prohibited, so add a KASSERT to this effect.
History indicates it is part of the original rman code. This bug may in
fact be older than some contributors.
David Bright [Mon, 27 Sep 2021 13:18:46 +0000 (06:18 -0700)]
ntb_hw_intel: fix xeon NTB gen3 bar disable logic
In NTB gen3 driver, it was supposed to disable NTB bar access by
default, but due to incorrect register access method, the bar disable
logic does not work as expected. Those registers should be modified
through NTB bar0 rather than PCI configuration space.
Besides, we'd better to protect ourselves from a bad buddy node so
ingress disable logic should be implemented together.
Marcin Wojtas [Fri, 21 May 2021 09:29:22 +0000 (11:29 +0200)]
Disable stack gap for ntpd during build.
When starting, ntpd calls setrlimit(2) to limit maximum size of its
stack. The stack limit chosen by ntpd is 200K, so when stack gap
is enabled, the stack gap is larger than this limit, which results
in ntpd crashing.
When WITHOUT_INET6 is selected we generate a null if-then-else blocks
due to incorrect placment of #if statments. Move the #if statements
reducing unnecessary runtime comparisons WITHOUT_INET6.
Greg V [Sat, 24 Apr 2021 11:53:34 +0000 (14:53 +0300)]
vt: call driver's postswitch when panicking on ttyv0
In vt_kms, the postswitch callback restores fbdev mode when
panicking or entering the debugger. This ensures that even when
a graphical applicatino was running on the first tty, simple framebuffer
mode would be restored and the panic would be visible instead
of the frozen GUI. But vt wouldn't call the postswitch callback
when we're already on the first tty, so running a GUI on it
would prevent you from reading any panics.
Alexander Motin [Sat, 4 Sep 2021 02:18:51 +0000 (22:18 -0400)]
Re-implement virtual console (constty).
Protect conscallout with tty lock instead of Giant. In addition to
Giant removal it also closes race on console unset.
Introduce additional lock to protect against concurrent console sets.
Remove consbuf free on console unset as unsafe, making impossible to
change buffer size after first allocation. Instead increase default
buffer size from 8KB to 64KB and processing rate from 5Hz to 10-15Hz
to make the output more smooth.
Rick Macklem [Thu, 16 Sep 2021 00:29:45 +0000 (17:29 -0700)]
nfscl: Add vfs.nfs.maxalloclen to limit Allocate RPC RTT
Unlike Copy, the NFSv4.2 Allocate operation does not
allow a reply with partial completion. As such, the only way to
limit the time the operation takes to provide a reasonable RPC RTT
is to limit the size of the allocation in the NFSv4.2
client.
This patch adds a sysctl called vfs.nfs.maxalloclen to set
the limit on the size of the Allocate operation.
There is no way to know how long a server will take to do an
allocate operation, but 64Mbytes results in a reasonable
RPC RTT for the slow hardware I test on, so that is what
the default value for vfs.nfs.maxalloclen is set to.
For an 8Gbyte allocation, the elapsed time for doing it in 64Mbyte
chunks was the same as the elapsed time taken for a single large
allocation operation for a FreeBSD server with a UFS file system.
Rick Macklem [Sun, 3 Oct 2021 22:50:46 +0000 (15:50 -0700)]
param.h: Bump __FreeBSD_version for commit a599f9f7620b
Commit a599f9f7620b deleted the variable called nfs_maxcopyrange
from nfscommon.ko, since it no longer needs to be global. As such,
the other nfs modules must be rebuilt from up to date sources.
Bump __FreeBSD_version to 1300516.
Rick Macklem [Sat, 11 Sep 2021 22:36:32 +0000 (15:36 -0700)]
nfscl: Make vfs.nfs.maxcopyrange larger by default
As of commit 103b207536f9, the NFSv4.2 server will limit the size
of a Copy operation based upon a 1 second timeout. The Linux 5.2
kernel server also limits Copy operation size to 4Mbytes.
As such, the NFSv4.2 client can attempt a large Copy without
resulting in a long RPC RTT for these servers.
This patch changes vfs.nfs.maxcopyrange to 64bits and sets
the default to the maximum possible size of SSIZE_MAX, since
a larger size makes the Copy operation more efficient and
allows for copying to complete with fewer RPCs.
The sysctl may be need to be made smaller for other non-FreeBSD
NFSv4.2 servers.
Kyle Evans [Wed, 27 Jan 2021 18:12:33 +0000 (12:12 -0600)]
makesyscalls: sprinkle some assert() on standard function calls
Improves our error reporting, ensuring that we aren't just ignoring
errors in the common case.
Note specifically the boundary where we have to change up our error
handling approach. It's fine to error() out up until we create the
tempdir, then the rest should try to handle it gracefully and abort().
A future change will clean this up further by pcall'ing all of the bits
that cannot currently error() without cleaning up.
From jilles: POSIX requires that a script set `OPTIND=1` before using
different sets of parameters with `getopts`, or the results will be
unspecified.
The specific problem observed here is that we would execute `man -f` or
`man -k` without cleaning up state from man_parse_args()' `getopts`
loop. FreeBSD's /bin/sh seems to reset OPTIND to 1 after we hit the
second getopts loop, rendering the following shift harmless; other
/bin/sh implementations will leave it at what we came into the loop at
(e.g., bash as /bin/sh), shifting off any keywords that we had.
Mark Johnston [Sat, 25 Sep 2021 14:13:56 +0000 (10:13 -0400)]
cam: Avoiding waking up doneq threads if we're dumping
Depending on the state of the target doneq thread at the time of the
panic, the wakeup can hang indefinitely in thread_lock_block_wait().
That function should likely be modified to return immediately if the
scheduler is stopped, but it is also preferable to avoid wakeups in
general after a panic.
Reported by: pho
Reviewed by: mav, imp
Sponsored by: The FreeBSD Foundation
unzip: sync with NetBSD upstream to add passphrase support
- Add support for password protected zip archives.
We use memset_s() rather than explicit_bzero() for more portable
(See PR).
- Use success/failure macro in exit()
- Mention ZIPX format in unzip(1)
Submitted by: Mingye Wang and Alex Kozlov (ak@)
PR: 244181
Reviewed by: mizhka
Obtained from: NetBSD
Differential Revision: https://reviews.freebsd.org/D28892
ng_ether: Create netgraph nodes for bridge interfaces.
Create netgraph nodes for bridge interfaces when the ng_ether module
is loaded. If a bridge interface is created after loading the ng_ether
module, a netgraph node is created via ether_ifattach().
We can't copyout() while holding a lock, in case it triggers a page
fault.
Release the lock before copyout, which is safe because we've already
copied all the data into the nvlist.
Use radix_mmu environment variable to select between Hash or Radix
MMU, when performing the CAS method call. This matches kernel's
behavior, by selecting Hash MMU by default and Radix if radix_mmu
is not zero, to make sure that both loader and kernel always select
the same MMU.
The device tree is queried to detect Radix/GTSE support and to
find out if CAS is supported, making the old CPU version and HV
bit checks unnecessary now.
Reviewed by: jhibbits
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31951
Wenzhuo Lu [Fri, 16 Oct 2015 02:51:09 +0000 (10:51 +0800)]
e1000: fix K1 configuration
This patch is for the following updates to the K1 configurations:
Tx idle period for entering K1 should be 128 ns.
Minimum Tx idle period in K1 should be 256 ns.
Marko Zec [Sat, 25 Sep 2021 04:29:48 +0000 (06:29 +0200)]
[fib_algo][dxr] Split unused range chunk list in multiple buckets
Traversing a single list of unused range chunks in search for a block
of optimal size was suboptimal.
The experience with real-world BGP workloads has shown that on average
unused range chunks are tiny, mostly in length from 1 to 4 or 5, when
DXR is configured with K = 20 which is the current default (D16X4R).
Therefore, introduce a limited amount of buckets to accomodate descriptors
of empty blocks of fixed (small) size, so that those can be found in O(1)
time. If no empty chunks of the requested size can be found in fixed-size
buckets, the search continues in an unsorted list of empty chunks of
variable lengths, which should only happen infrequently.
This change should permit us to manage significantly more empty range
chunks without sacrifying the speed of incremental range table updating.
Philip Paeps [Wed, 29 Sep 2021 04:43:58 +0000 (12:43 +0800)]
contrib/tzdata: correct DST in Jordan and Samoa
Direct commit to stable/13.
The recent tzdata 2021b release includes several controversial changes
under active debate on the tz mailing list. Pending consensus, and
hopefully a 2021c release reflecting it, only merge the DST changes for
Jordan and Samoa. This corrects present and future timestamps in those
regions.
Alexander Motin [Wed, 15 Sep 2021 01:06:39 +0000 (21:06 -0400)]
ipmi(4): Limit maximum watchdog pre-timeout interval.
Previous code by default setting pre-timeout interval to 120 seconds
made impossible to set timeout interval below that, resulting in error
0xcc (Invalid data field in Request) at least on Supermicro boards.
To fix that limit maximum pre-timeout interval to ~1/4 of the timeout
interval, that sounds like a reasonable default: not too short to fire
too late, but also not too long to give many false reports.
Kevin Bowling [Wed, 15 Sep 2021 14:47:19 +0000 (07:47 -0700)]
e1000: Fix up HW vlan ops
2796f7ca:
* Don't reset the entire adapter for vlan changes, fix up the problems
* Add some functions for vlan filter (vfta) manipulation
* Don't muck with the vfta if we aren't doing HW vlan filtering
* Disable interrupts when manipulating vfta on lem(4)-class NICs
* On the I350 there is a specification update (2.4.20) in which the
suggested workaround is to write to the vfta 10 times (if at first you
don't succeed, try, try again). Our shared code has the goods, use it
* Increase a VF's frame receive size in the case of vlans
From the referenced PR, this reduced vlan configuration from minutes
to seconds with hundreds or thousands of vlans and prevents wedging the
adapter with needless adapter reinitialization for each vlan ID.
When handling a data irq, the sdhci driver calls the
sdhci_platform_will_handle() method, to determine if it should allow the
platform driver to handle the transfer or fall back to programmed I/O.
While dumping, the data irq path may be invoked directly (not from an
interrupt context), which the bcm2835_sdhci DMA code is not prepared to
handle. Return early in this case, to force the fallback to PIO.
Otherwise, the KASSERT that follows will be triggered, and the dump will
fail. On non-INVARIANTS kernels, the system will hang, waiting for a DMA
interrupt that will never arrive.
Reviewed by: kevans
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31893
Kornel Duleba [Tue, 31 Aug 2021 06:22:29 +0000 (08:22 +0200)]
mvneta: Fix MTU update sequence
After MTU is updated we might start using allocating RX buffers from different pool. (MJUM9BYTES vs MCLBYTES)
Because of that we need to update the RX buffer size in hardware.
Previously it was done only when the interface was up, which is incorrect since MTU can be changed at any time.
Kornel Duleba [Tue, 31 Aug 2021 12:22:30 +0000 (14:22 +0200)]
if_cdce: Add support for setting RX filtering
We can now set promisc and allmulti modes.
Filtering of given multicast addresses is not supported.
Changing the mode is done by sending a command described in:
"USB CDC Subclass Specification for Ethernet Devices v1.2, section 6.2.4".
This means that at least in theory this feature should work with all
modems that are using this driver.
This fixes Huawei E3372h-320 running new firmware in "HiLink" mode.
Previously it would reset a few seconds after its mode was changed
with "usb_modeswitch".
Setting RX filter to default value at the end of attach function
fixed that.