amd64: stop using top of the thread' kernel stack for FPU user save area
MFC note: this commit changes layout of td_md for amd64, resulting in
static checks for struct thread ABI in kern_thread.c to fail. Next
two commits restore the layout, I decided to not overcomplicate the
merge and not do the work that is going to be overwritten immediately.
Kyle Evans [Sat, 2 Oct 2021 05:17:57 +0000 (00:17 -0500)]
vfs: remove dead fifoop VOP_KQFILTER implementations
These began to become obsolete in d6d64f0f2c26 (r137739) and the deal
was later sealed in 003e18aef4c4 (r137801) when vfs.fifofs.fops was
dropped and vop-bypass for pipes became mandatory.
Ed Maste [Tue, 3 Aug 2021 18:30:06 +0000 (14:30 -0400)]
ar: provide error exit status upon failure
Previously ar and ranlib returned with exit status 0 (success) in the
case of a missing file or other error. Update to use error handling
similar to that added by ELF Tool Chain after that project forked
FreeBSD's ar.
PR: PR257599 [exp-run]
Reported by: Shawn Webb, gehmehgeh (on HardenedBSD IRC)
Reviewed by: markj
Obtained from: elftoolchain
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31402
John Baldwin [Fri, 8 Oct 2021 18:28:23 +0000 (11:28 -0700)]
Revert "Disable MK_OPENSSL_KTLS for stable/13."
No further issues with the KTLS support in OpenSSL have been seen
since the serf/apache bugs which motivated disabling KTLS. In
addition, the KTLS API in OpenSSL is now finalized since it has been
released in OpenSSL 3.0.
Faraz Vahedi [Fri, 1 Oct 2021 18:48:57 +0000 (13:48 -0500)]
freebsd-update(8): Add -j flag to support jails
Make freebsd-update(8) support jails by adding the -j flag which takes
a jail jid or name as an argument. This takes advantage of the recently
added -j support to freebsd-version(8) in order to get the version of
the installed userland.
Faraz Vahedi [Fri, 1 Oct 2021 18:46:23 +0000 (13:46 -0500)]
freebsd-version(1): Add -j flag to support jails
Make freebsd-version(1) support jails by adding the -j flag which takes
a jail jid or name as an argument. As with other options, -j
flags stack and display in the order requested.
Alan Somers [Thu, 16 Sep 2021 19:19:21 +0000 (13:19 -0600)]
fusefs: don't panic if FUSE_GETATTR fails durint VOP_GETPAGES
During VOP_GETPAGES, fusefs needs to determine the file's length, which
could require a FUSE_GETATTR operation. If that fails, it's better to
SIGBUS than panic.
MFC uipc_shm: Fix kern.ipc.posix_shm_list for jails
Fix error return of kern.ipc.posix_shm_list, which caused it (and thus
"posixshmcontrol ls") to fail for all jails that didn't happen to own
the last shm object in the list.
ena: Avoid unnecessary mbuf collapses for LLQ condition
In case of Low-latency Queue, one small enough descriptor can be pushed
directly to the ENA hw, thus saving one fragment. Check for this
condition before performing collapse.
Check for ENA_FLAG_TRIGGER_RESET inside a locked context in order to
avoid potential race conditions with ena_destroy_device. This aligns the
reset task logic with the Linux driver.
Stay aligned with the Linux driver by adding the following logs:
* inform the user about retrying queue creation
* warn on non-empty ena_tx_buffer.mbuf prior to ena_tx_map_mbuf
ENA silently assumed that ena_up, ena_down and ena_start_xmit routines
should be called within locked context. Driver's logic heavily assumes
on concurrent access to those routines, so for safety and better
documentation about this assumption, the locking assertions were added
to the above functions.
The assertion was added only for the main steps (skipping the helper
functions) which can be called from multiple places including the kernel
and the driver itself.
If LLQ is being used, `ena_tx_ctx.meta_valid` must stay enabled. This
fixes netmap support on latest generation ENA HW and aligns it with the
core driver behavior.
As netmap doesn't support any csum offloads, the
`adapter->disable_meta_caching` value can be simply passed to the HW.
ena: Share ena_global_lock between driver instances
In order to use `ena_global_lock` in sysctl context, it must be kept
outside the driver instance's software context, as sysctls can be called
before attach and after detach, leading to lock use before sx_init and
after sx_destroy otherwise.
Solve this issue by turning `ena_global_lock` into a file scope
variable, shared between all instances of the driver and associated
sysctl context, and in turn initialized/destroyed in dedicated
SYSINIT/SYSUNINIT functions.
As a side effect, this change also fixes existing race in the reset
routine, when simultaneously accessing sysctl exposed properties.
Bind RX/TX queues and MSI-X vectors to matching CPUs based on the RSS
bucket entries.
Introduce sysctls for the following RSS functionality:
- rss.indir_table: indirection table mapping
- rss.indir_table_size: indirection table size
- rss.key: RSS hash key (if Toeplitz used)
Said sysctls are only available when compiled without `option RSS`, as
kernel-side RSS support currently doesn't offer RSS reconfiguration.
Migrate the hash algorithm from CRC32 to Toeplitz and change the initial
hash value to 0x0 in order to match the standard Toeplitz implementation.
Provide helpers for hash key inversion required for HW operations.
When building ENA as compiled into the kernel, the driver would fail to
build. Resolve the problem by introducing the following changes:
1. Add missing `ena_rss.c` entry in `sys/conf/files`.
2. Prevent SYSCTL_ADD_INT from throwing an assert due to an extra
CTLTYPE_INT flag.
Fixes: 986e7b92276 ("ena: Move RSS logic into its own source files") Fixes: 6d1ef2abd33 ("ena: Implement full RSS reconfiguration")
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
Some of the changes in this release:
* Hardware RSS hash key reconfiguration and indirection table
reconfiguration support.
* Full kernel RSS support.
* Extra statistic counters.
* Netmap support for ENAv3.
* Locking assertions.
* Extra log messages.
* Reset handling fixes.
Default LLQ (Low-latency queue) maximum header size is 96 bytes and can
be too small for some types of packets - like IPv6 packets with multiple
extension. This can be fixed, by using large LLQ headers.
If the device supports larger LLQ headers, the user can activate this
feature by setting sysctl tunable 'hw.ena.force_large_llq_header' to '1'
in the /boot/loader.conf file.
In case the device isn't supporting this feature, the default value (96B)
will be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
IO queue related attributes are registered statically at driver attach
with the rest of the ENA specific sysctl nodes. However, the number of
queues can be changed at runtime via the `ena_sysctl_io_queues_nb`
request, leading to a potential exposure of attributes for non-existing
queues.
Introduce a new `ena_sysctl_update_queue_node_nb` function, which
updates the sysctl nodes after the number of queues is altered.
This happens by either registering or unregistering node specific oids,
based on a delta between the previous and current queue count.
NOTE: All unregistered oids must be registered again before the driver
detach, e.g. by another call to this function.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Bring the obsolete man page up to date:
* update diagnostic error messages
* add documentation of loader tunables
* document netmap support
* add a driver history section
* update the contact information
Submitted by: Artur Rojek <ar@semihalf.com>
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Marcin Wojtas [Thu, 15 Jul 2021 15:08:16 +0000 (17:08 +0200)]
mmc_fdt_helper: correct typo in DT property name
'no-1-8-v' is a proper name according to the DT binding
documentation
(https://www.kernel.org/doc/Documentation/devicetree/bindings/mmc/mmc-controller.yaml).
Mark Johnston [Tue, 7 Sep 2021 18:51:54 +0000 (14:51 -0400)]
socket: Avoid clearing SS_ISCONNECTING if soconnect() fails
This behaviour appears to date from the 4.4 BSD import. It has two
problems:
1. The update to so_state is not protected by the socket lock, so
concurrent updates to so_state may be lost.
2. Suppose two threads race to call connect(2) on a socket, and one
succeeds while the other fails. Then the failing thread may
incorrectly clear SS_ISCONNECTING, confusing the state machine.
Simply remove the update. It does not appear to be necessary:
pru_connect implementations which call soisconnecting() only do so after
all failure modes have been handled. For instance, tcp_connect() and
tcp6_connect() will never return an error after calling soisconnected().
However, we cannot correctly assert that SS_ISCONNECTED is not set after
an error from soconnect() since the socket lock is not held across the
pru_connect call, so a concurrent connect(2) may have set the flag.
Mark Johnston [Tue, 7 Sep 2021 18:49:31 +0000 (14:49 -0400)]
socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a
socket rather than a socket buffer. Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.
When locking for I/O, return an error if the socket is a listening
socket. Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.
Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In
particular, add checks to sendfile() and sorflush().
Reviewed by: tuexen, gallatin
Sponsored by: The FreeBSD Foundation
Ian Lepore [Tue, 28 Sep 2021 19:29:10 +0000 (13:29 -0600)]
Fix busdma resource leak on usb device detach.
When a usb device is detached, usb_pc_dmamap_destroy() called
bus_dmamap_destroy() while the map was still loaded. That's harmless on x86
architectures, but on all other platforms it causes bus_dmamap_destroy() to
return EBUSY and leak away any memory resources (including bounce buffers)
associated with the mapping, as well as any allocated map structure itself.
This change introduces a new is_loaded flag to the usb_page_cache struct to
track whether a map is loaded or not. If the map is loaded,
bus_dmamap_unload() is called before bus_dmamap_destroy() to avoid leaking
away resources.
Use atomic counters to ensure that we correctly track the number of half
open states and syncookie responses in-flight.
This determines if we activate or deactivate syncookies in adaptive
mode.
We'd likely be better served by converting these to the equivalent mem*
calls, but just kill the knob for now. The b* macros being defined get
in the way of _FORTIFY_SOURCE.
These aren't a part of or use libjail(3), but rather are direct
syscalls. Still, they seem like good additions, allowing us to attach
to already-running jails.
kqueue: don't arbitrarily restrict long-past values for NOTE_ABSTIME
NOTE_ABSTIME values are converted to values relative to boottime in
filt_timervalidate(), and negative values are currently rejected. We
don't reject times in the past in general, so clamp this up to 0 as
needed such that the timer fires immediately rather than imposing what
looks like an arbitrary restriction.
Another possible scenario is that the system clock had to be adjusted
by ~minutes or ~hours and we have less than that in terms of uptime,
making a reasonable short-timeout suddenly invalid. Firing it is still
a valid choice in this scenario so that applications can at least
expect a consistent behavior.
kern: random: collect ~16x less from fast-entropy sources
Previously, we were collecting at a base rate of:
64 bits x 32 pools x 10 Hz = 2.5 kB/s
This change drops it to closer to 64-ish bits per pool per second, to
work a little better with entropy providers in virtualized environments
without compromising the security goals of Fortuna.
kern: random: drop read_rate and associated functionality
Refer to discussion in PR 230808 for a less incomplete discussion, but
the gist of this change is that we currently collect orders of magnitude
more entropy than we need.
The excess comes from bytes being read out of /dev/*random. The default
rate at which we collect entropy without the read_rate increase is
already more than we need to recover from a compromise of an internal
state.
For stable/13, the read_rate_increment symbol remains as a stub to avoid
breaking loadable random modules.
Receive Filtering Engine (RFE) configuration is not yet implemented,
and mgb intended to enable all broadcast, multicast, and unicast.
However, MGB_RFE_ALLOW_MULTICAST was missed (MGB_RFE_ALLOW_UNICAST was
included twice).
MFC after: 1 week
Fixes: 8890ab7758b8 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
Previously mgb_admin_intr printed a diagnostic message if no interrupt
status bits were set, but it's not valid to call device_printf() from a
filter. Just drop the message as it has no user-facing value.
Also return FILTER_STRAY in this case - there is nothing further for
the driver to do.
Reviewed by: kbowling
MFC after: 1 week
Fixes: 8890ab7758b8 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32231
This function was renamed to kern_reboot() in 2010, but the man page has
failed to keep in sync. Bring it up to date on the rename, add the
shutdown hooks to the synopsis, and document the (obvious) fact that
kern_reboot() does not return.
Fix an outdated reference to the old name in kern_reboot(), and leave a
reference to the man page so future readers might find it before any
large changes.
Reviewed by: imp, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32085
rman: fix overflow in rman_reserve_resource_bound()
If the default range of [0, ~0] is given, then (~0 - 0) + 1 == 0. This
in turn will cause any allocation of non-zero size to fail. Zero-sized
allocations are prohibited, so add a KASSERT to this effect.
History indicates it is part of the original rman code. This bug may in
fact be older than some contributors.
David Bright [Mon, 27 Sep 2021 13:18:46 +0000 (06:18 -0700)]
ntb_hw_intel: fix xeon NTB gen3 bar disable logic
In NTB gen3 driver, it was supposed to disable NTB bar access by
default, but due to incorrect register access method, the bar disable
logic does not work as expected. Those registers should be modified
through NTB bar0 rather than PCI configuration space.
Besides, we'd better to protect ourselves from a bad buddy node so
ingress disable logic should be implemented together.
Marcin Wojtas [Fri, 21 May 2021 09:29:22 +0000 (11:29 +0200)]
Disable stack gap for ntpd during build.
When starting, ntpd calls setrlimit(2) to limit maximum size of its
stack. The stack limit chosen by ntpd is 200K, so when stack gap
is enabled, the stack gap is larger than this limit, which results
in ntpd crashing.
When WITHOUT_INET6 is selected we generate a null if-then-else blocks
due to incorrect placment of #if statments. Move the #if statements
reducing unnecessary runtime comparisons WITHOUT_INET6.
Greg V [Sat, 24 Apr 2021 11:53:34 +0000 (14:53 +0300)]
vt: call driver's postswitch when panicking on ttyv0
In vt_kms, the postswitch callback restores fbdev mode when
panicking or entering the debugger. This ensures that even when
a graphical applicatino was running on the first tty, simple framebuffer
mode would be restored and the panic would be visible instead
of the frozen GUI. But vt wouldn't call the postswitch callback
when we're already on the first tty, so running a GUI on it
would prevent you from reading any panics.