Alan Somers [Sun, 3 Jan 2021 04:25:05 +0000 (21:25 -0700)]
aio: micro-optimize the lio_opcode assignments
This allows slightly more efficient opcode testing in-kernel. It is
transparent to userland, except to applications that sneakily submit
aio fsync or aio mlock operations via lio_listio, which has never been
documented, requires the use of deliberately undefined constants
(LIO_SYNC and LIO_MLOCK), and is arguably a bug.
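A minimal sketch of why opcode numbering can affect test cost, assuming each opcode occupies its own bit. The EX_OP_* names and values are hypothetical and are not the constants assigned by this commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode values: each non-NOP opcode is a distinct bit. */
enum {
	EX_OP_NOP   = 0x0,
	EX_OP_WRITE = 0x1,
	EX_OP_READ  = 0x2,
	EX_OP_SYNC  = 0x4,
	EX_OP_MLOCK = 0x8
};

/* A single mask test replaces a chain of equality comparisons. */
static bool
ex_op_is_rw(int opcode)
{
	assert(opcode >= 0);
	return ((opcode & (EX_OP_WRITE | EX_OP_READ)) != 0);
}
```

With sequential values (0, 1, 2, 3, ...) the same question would need one comparison per opcode of interest.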
Cy Schubert [Wed, 20 Jan 2021 15:20:22 +0000 (07:20 -0800)]
wpa_supplicant uses PF_ROUTE to return the routing table in order to
determine the length of the routing table buffer. As of 81728a538d24
wpa_supplicant is started before the routing table has been populated
resulting in a length of zero being returned. This causes
wpa_supplicant to loop endlessly. (The workaround is to kill and restart
wpa_supplicant as by the time it is restarted the routing table is
populated.)
(Personally, I was not able to reproduce this unless wlan0 was a member of
lagg0. However, others experienced this problem on standalone wlan0.)
Address panic with PRR due to missed initialization of recover_fs
Summary:
When using the base stack in conjunction with RACK, it appears that
infrequently, ++tp->t_dupacks is instantly larger than tcprexmtthresh.
This leaves the recover flightsize (sackhint.recover_fs) uninitialized,
leading to a div/0 panic.
Address this by properly initializing the variable just prior to first
use, if it is not properly initialized.
In order to prevent stale information from a prior recovery to
negatively impact the PRR calculations in this event, also clear
recover_fs once loss recovery is finished.
Finally, improve the readability of the initialization of recover_fs
when t_dupacks == tcprexmtthresh by adjusting the indentation and
using the max(1, snd_nxt - snd_una) macro.
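The guard described above can be sketched as follows. This is an illustration only: the struct, the helper names, and the scaling formula are assumptions made for the example and are not the stack's actual PRR code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sackhint field named in the text. */
struct ex_sackhint {
	uint32_t recover_fs;	/* flight size at start of recovery; 0 = unset */
};

static uint32_t
ex_max_u32(uint32_t a, uint32_t b)
{
	return (a > b ? a : b);
}

/*
 * Scale 'delivered' bytes by ssthresh / recover_fs. If recover_fs is
 * still zero, initialize it first so the division cannot trap.
 */
static uint32_t
ex_prr_scale(struct ex_sackhint *sh, uint32_t snd_nxt, uint32_t snd_una,
    uint32_t ssthresh, uint32_t delivered)
{
	assert(sh != NULL);
	if (sh->recover_fs == 0)
		sh->recover_fs = ex_max_u32(1, snd_nxt - snd_una);
	return ((uint32_t)(((uint64_t)delivered * ssthresh) / sh->recover_fs));
}

/* Clear the value when loss recovery ends so it cannot go stale. */
static void
ex_recovery_done(struct ex_sackhint *sh)
{
	assert(sh != NULL);
	sh->recover_fs = 0;
}
```

Clamping to a minimum of 1 covers the case where snd_nxt equals snd_una, which would otherwise store zero and reintroduce the division by zero.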
Alex Richardson [Wed, 20 Jan 2021 09:56:01 +0000 (09:56 +0000)]
libc: Fix null pointer arithmetic warning in mergesort
This file has other questionable code and "optimizations" (such as copying
one int at a time) that are probably no longer useful, so it might make
sense to replace it with a different implementation at some point.
Mark Johnston [Wed, 20 Jan 2021 02:32:33 +0000 (21:32 -0500)]
ktls: Improve handling of the bind_threads tunable a bit
- Only check for empty domains if we actually tried to configure domain
affinity in the first place. Otherwise setting bind_threads=1 will
always cause the sysctl value to be reported as zero. This is
harmless since the threads end up being bound, but it's confusing.
- Try to improve the sysctl description a bit.
Mark Johnston [Wed, 20 Jan 2021 01:34:36 +0000 (20:34 -0500)]
arm64, riscv: Set VM_KMEM_SIZE_SCALE to 1
This setting limits the amount of memory that can be allocated to UMA.
On systems with a direct map and ample KVA, however, there is no reason
for VM_KMEM_SIZE_SCALE to be larger than 1. This appears to have been
inherited from the 32-bit ARM platform definitions.
Also remove VM_KMEM_SIZE_MIN, which is not needed when
VM_KMEM_SIZE_SCALE is defined to be 1.[*]
Mark Johnston [Wed, 20 Jan 2021 01:34:35 +0000 (20:34 -0500)]
arm64: Stop setting VM_BCACHE_SIZE_MAX
This setting places a (small) limit on the size of the buffer cache,
constraining UFS performance on large servers. The setting comes from
the initial arm64 implementation and appears to be vestigial. Remove it.
Mark Johnston [Wed, 20 Jan 2021 01:34:35 +0000 (20:34 -0500)]
opencrypto: Fix assignment of crypto completions to worker threads
Since r336439 we simply take the session pointer value mod the number of
worker threads (ncpu by default). On small systems this ends up
funneling all completion work through a single thread, which becomes a
bottleneck when processing IPSec traffic using hardware crypto drivers.
(Software drivers such as aesni(4) are unaffected since they invoke
completion handlers synchronously.)
Instead, maintain an incrementing counter with a unique value per
session, and use that to distribute work to completion threads.
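A rough sketch of the two distribution schemes, assuming a simplified session type. All ex_* names are hypothetical, and the counter here is not thread-safe; a real implementation would need atomic or locked updates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical session type; only the field needed for this sketch. */
struct ex_session {
	uint32_t id;	/* unique value taken from a counter at creation */
};

static uint32_t ex_next_id;	/* incrementing counter, starts at zero */

static void
ex_session_init(struct ex_session *s)
{
	s->id = ex_next_id++;
}

/*
 * Old scheme: pointer value mod worker count. Allocator alignment makes
 * the low pointer bits repeat, so small worker counts can collide.
 */
static unsigned
ex_worker_by_pointer(const struct ex_session *s, unsigned nworkers)
{
	assert(nworkers > 0);
	return ((unsigned)((uintptr_t)s % nworkers));
}

/* New scheme: consecutive sessions land on consecutive workers. */
static unsigned
ex_worker_by_id(const struct ex_session *s, unsigned nworkers)
{
	assert(nworkers > 0);
	return (s->id % nworkers);
}
```

With two workers, for example, sessions allocated at 16-byte-aligned addresses all have an even pointer value and map to worker 0 under the old scheme, while the counter alternates between the two.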
Mark Johnston [Wed, 20 Jan 2021 01:34:35 +0000 (20:34 -0500)]
opencrypto: Embed the driver softc in the session structure
Store the driver softc below the fields owned by opencrypto. This is
a bit simpler and saves a pointer dereference when fetching the driver
softc when processing a request.
Get rid of the crypto session UMA zone. Session allocations are not
frequent or performance-critical enough to warrant a dedicated zone.
Alex Richardson [Tue, 19 Jan 2021 15:05:43 +0000 (15:05 +0000)]
Remove remaining uses of ${COMPILER_FEATURES:Mc++11}
All supported compilers have C++11 support so these checks can be replaced
with MK_CXX guards.
See also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252759
Alex Richardson [Tue, 19 Jan 2021 11:35:04 +0000 (11:35 +0000)]
getopt: Fix conversion from string-literal to non-const char *
Define a non-const static char EMSG[] = "" to avoid having to add
__DECONST() to all uses of EMSG. Also make current_dash a const char *
to fix this warning.
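The idea can be sketched as below. EMSG follows the text; the place variable and the helper are assumptions made for illustration and are not copied from getopt.

```c
#include <assert.h>
#include <string.h>

/* A writable array: it converts to char * without a cast or __DECONST(). */
static char EMSG[] = "";

/* Hypothetical scanner state: points at EMSG when no option text remains. */
static char *place = EMSG;

/* True when the scanner has consumed the current argument. */
static int
ex_at_end(void)
{
	assert(place != NULL);
	return (place == EMSG || *place == '\0');
}
```

Had EMSG been a macro expanding to the literal "", every assignment to a char * would have needed a cast to silence the string-literal conversion warning.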
Alex Richardson [Tue, 19 Jan 2021 11:32:33 +0000 (11:32 +0000)]
Require uint32_t alignment for ipfw_insn
There are many casts of this struct to uint32_t, so we also need to ensure
that it is sufficiently aligned to safely perform this cast on architectures
that don't allow unaligned accesses. This fixes lots of -Wcast-align warnings.
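A small sketch of raising a struct's alignment so that casts to uint32_t * are safe. The struct layout is hypothetical, and the example uses C11 alignas rather than whichever attribute the commit itself applies.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical instruction header made of narrow fields only. Without
 * the alignas, its natural alignment would be that of uint16_t.
 */
struct ex_insn {
	alignas(uint32_t) uint8_t opcode;
	uint8_t len;
	uint16_t arg1;
};

/* True when the object can be accessed as a 32-bit word. */
static int
ex_insn_is_word_aligned(const struct ex_insn *insn)
{
	assert(insn != NULL);
	return ((uintptr_t)insn % alignof(uint32_t) == 0);
}
```

Because the struct's size is also a multiple of four, every element of an array of these structs keeps the guarantee.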
Alex Richardson [Tue, 19 Jan 2021 11:32:32 +0000 (11:32 +0000)]
libalias: Fix -Wcast-align compiler warnings
This fixes -Wcast-align warnings caused by the underaligned `struct ip`.
This also silences them in the public functions by changing the function
signature from char * to void *. This is source and binary compatible and
avoids the -Wcast-align warning.
John Baldwin [Tue, 19 Jan 2021 19:51:27 +0000 (11:51 -0800)]
Convert unmapped mbufs before computing checksums in IPsec.
This is similar to the logic used in ip_output() to convert mbufs
prior to computing checksums. Unmapped mbufs can be sent when using
sendfile() over IPsec or using KTLS over IPsec.
Reported by: Sony Arpita Das @ Chelsio QA
Reviewed by: np
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D28187
John Baldwin [Fri, 8 Jan 2021 22:56:22 +0000 (14:56 -0800)]
arm64: Trim duplicate code from cpu_fork_kthread_handler().
cpu_fork_kthread_handler() is always called after either cpu_fork() or
cpu_copy_thread(). The arm64 version was duplicating some of the work
already done by both of those functions.
Glen Barber [Tue, 19 Jan 2021 18:38:33 +0000 (13:38 -0500)]
release: Add workaround to use SVN for ports
The ports tree is scheduled to be converted from Subversion to Git
after the currently-scheduled 13.0-RELEASE, so the source of truth
will be Subversion for the ports tree.
Lutz Donnerhacke [Tue, 19 Jan 2021 14:56:16 +0000 (15:56 +0100)]
ixl: Permit 802.1ad frames to pass through the chip
This patch is a quick hack to change the internal Ethertype used
within the chip. All frames with this type are dropped silently.
This patch allows you to overwrite the factory default 0x88a8, which
is used by IEEE 802.1ad VLAN stacking.
Michal Meloun [Wed, 13 Jan 2021 12:50:54 +0000 (13:50 +0100)]
arm64 busdma: Fix loading of small bounced buffers.
- Don't oversize the buffer fragment. PAGE_SIZE - (curaddr & PAGE_MASK)
may be greater than the total length of the buffer.
- Don't use roundup2(len, alignment) to calculate the buffer fragment
size. The length of current bounced fragment is not subject to alignment
restriction, and next fragment should start at the page boundary.
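The first point can be sketched as a clamp. The page size and the ex_* names are assumptions made for illustration; this is not the busdma code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE	4096u	/* assumed page size for this sketch */
#define EX_PAGE_MASK	(EX_PAGE_SIZE - 1)

/*
 * Size of the next fragment: the bytes left in the current page,
 * clamped to the bytes left in the buffer.
 */
static size_t
ex_fragment_len(uintptr_t curaddr, size_t buflen)
{
	size_t sgsize;

	assert(buflen > 0);
	sgsize = EX_PAGE_SIZE - (size_t)(curaddr & EX_PAGE_MASK);
	if (sgsize > buflen)
		sgsize = buflen;
	return (sgsize);
}
```

For a 64-byte buffer at a page-aligned address the unclamped value would be a full page, which is the oversizing the commit describes.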
Stefan Eßer [Tue, 19 Jan 2021 11:46:52 +0000 (12:46 +0100)]
Remove dependency on files in /usr/bin
In order to reduce the pre-requisites of this file, implement the
pattern matching and creation of a temporary test directory without
use of grep and mktemp, respectively.
The new version makes it possible to provide a writable /tmp in any
case and independently of other local or remote file systems (except /
and /dev) being mounted.
The use of "dd if=/dev/random" has the same dependency on /dev/random
being operational as the previous version that used "mktemp". If this
is found to be an issue on platforms that have not gathered
sufficient entropy at the time when this script is run, I suggest
replacing the "dd" command with "ps lauxww" to get a somewhat random
test directory name.
Mateusz Guzik [Tue, 19 Jan 2021 09:08:24 +0000 (10:08 +0100)]
cache: save a branch in cache_fplookup_next
Previously the code would branch at the top to find out whether it
should fire the SDT probe and bump the numposhits counter, depending
on cache_fplookup_cross_mount.
Arguably it should be done regardless of what said function returns.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:25 +0000 (04:55 +0000)]
if_vtnet: Schedule Rx task if pending items when enabling interrupt
Prior to V1, the driver would enable interrupts and then notify the
host that DRIVER_OK. Since for V1, DRIVER_OK needs to be set before
notifying the virtqueues, there may be items in the queues waiting
to be processed by the time interrupts are enabled.
This fixes a bug where the Rx queue would appear stuck, only being
usable after an interface down/up cycle.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:25 +0000 (04:55 +0000)]
if_vtnet: Limit allocations of unused virtqueues
For multiqueue, we may use fewer than the provided maximum number of
queues. Try to limit allocations of the unused queues: no interrupts,
no indirect descriptors, and no taskqueues.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:25 +0000 (04:55 +0000)]
if_vtnet: Add support for software LRO
This is useful when running on hosts that support checksum offloading
but not the GUEST_TSO (LRO) feature. Or potentially, some GRO-like
support when doing forwarding.
Only enable SW LRO when the host LRO is not available, since enabling
both tends to be harmful and they are difficult to enable/disable
selectively with only a single IFCAP_LRO flag.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:24 +0000 (04:55 +0000)]
if_vtnet: Defer updating generated MAC address until attached
This improves spec compliance because the driver is not supposed
to notify the device prior to setting the DRIVER_OK status, which
could happen with the VIRTIO_NET_F_CTRL_MAC_ADDR.
The VIRTIO_NET_F_MAC feature should always be negotiated, so this
would be a rare situation.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:24 +0000 (04:55 +0000)]
if_vtnet: Remove at attach PROMISC handling
This may have been required in an early, early, early version of the
specification but I cannot find any reference to it, and a promiscuous
default seems very odd so remove this code.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:24 +0000 (04:55 +0000)]
if_vtnet: Support VIRTIO_NET_F_SPEED_DUPLEX
This feature lets the guest driver know the speed and duplex of
the "link". Instead of trying to support many media types based
on the possible/likely speeds/duplexes, only use the speed to
set the interface baudrate.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:24 +0000 (04:55 +0000)]
if_vtnet: Support VIRTIO_NET_F_MTU
This feature lets the guest driver know the maximum MTU size
supported by the host device. If set, use this to limit the
acceptable MTUs, and improve how the receive mbuf cluster size
then is selected.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:24 +0000 (04:55 +0000)]
if_vtnet: Rx path cleanup
- Fix the NEEDS_CSUM and DATA_VALID checksum flags. The NEEDS_CSUM
checksum is incomplete (partial) so offer a fallback for the driver
to calculate the checksum. Simplify DATA_VALID because we know
the host has validated the checksum.
- Default 4K mbuf clusters for mergeable buffers. May need to
scale this down to 2K clusters in certain configurations such as
many queue pairs, big queues (like 4096 in GCP), and low memory.
- Use the MTU when calculating the receive mbuf cluster size
when not doing TSO/LRO. This will need more adjustment once
the MTU feature is supported.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:23 +0000 (04:55 +0000)]
virtio: Add modern (v1) virtqueue support
This only supports the legacy virtqueue format that is now called
"Split Virtqueues". Support for the new "Packed Virtqueues" described
in v1.1 is left for a later date.
Bryan Venteicher [Tue, 19 Jan 2021 04:55:23 +0000 (04:55 +0000)]
virtio: Add VirtIO PCI modern (V1) support
Use the existing legacy PCI driver as the basis for shared code
between the legacy and modern PCI drivers. The existing virtio_pci
kernel module will contain both the legacy and modern drivers.
Changes to the virtqueue and each device driver (network, block, etc)
for V1 support come in later commits.
Update the MMIO driver to reflect the VirtIO bus method changes, but
the modern compliance can be improved on later.
Note that the modern PCI driver requires bus_map_resource() to be
implemented, which is not the case on all archs.
The hw.virtio.pci.transitional tunable default value is zero so
transitional devices will continue to be driven via the legacy
driver.
By adding the mergeable header to the vtnet_rx_header structure, the size
was increased by 2 bytes, breaking the alignment of this structure as
described in the preceding comments.
Furthermore, the mergeable header does not belong in the structure. With the
mergeable feature, the header is placed in line with the data, so there is
no need for a separate segment, and it is misleading to follow the mergeable
header with any padding.
The V1 header is effectively identical to the mergeable header, and the driver
has long supported the mergeable feature. Revert this so the later changes
that add V1 support can show how V1 is derived from the existing mergeable
buffers support, and to facilitate a later MFC.
Emmanuel Vadot [Thu, 14 Jan 2021 12:56:38 +0000 (13:56 +0100)]
pkgbase: differentiate package versions for ALPHA/BETA/PRERELEASE/RC phases
The current postfix conversions are:
CURRENT / STABLE / PRERELEASE, 12.x-CURRENT becomes 12.snapYYYYMMDDhhmmss
ALPHAx -> .ax, so 11.3-ALPHA1 becomes 11.3.a1.YYYYMMDDhhmmss
BETAx -> .bx, so 12.1-BETA2 becomes 12.1.b2.YYYYMMDDhhmmss
RCx -> .rcx, so 13.0-RC3 becomes 13.0.rc3.YYYYMMDDhhmmss
RELEASE -> (nothing), so 12.1-RELEASE becomes 12.1
RELEASE-pX -> pX, so 12.1-RELEASE-p1 becomes 12.1p1
Note that for development branches we will start to drop the minor version
component entirely, which more closely matches how these branches are
physically named (stable/NN).
snap is a new prefix that was added to pkg in [0], which is simply a more
verbose version of the current ".s" used.
As noted, build timestamps are also added to ALPHA/BETA/RC versions. This
is largely irrelevant for re@ snapshots because they will only produce one
set of snapshots for each alpha/beta/rc, but external folks may produce
multiple in that timeframe -- at least for alpha. For them, it is
imperative that the builds have a differentiating characteristic like this
rather than multiple builds across multiple revisions being versioned
identically.
[0] https://github.com/freebsd/pkg/pull/1929
Reviewed by: gjb, manu
Submitted by: rene (original, original version)
Differential Revision: https://reviews.freebsd.org/D28167
Jamie Gritton [Tue, 19 Jan 2021 01:23:51 +0000 (17:23 -0800)]
jail: Clean up some function placement and improve comments.
Move prison_hold, prison_hold_locked, prison_proc_hold, and
prison_proc_free to a more intuitive part of the file (together with
prison_free and prison_free_locked), and add or improve comments
to these and others, to better describe what's going on in the prison
reference cycle.
make maximum interrupt number tunable on ARM, ARM64, MIPS, and RISC-V
Use a machdep.nirq tunable instead of the compile-time constant NIRQ
as the value for the maximum number of interrupts. This keeps the system
footprint small by default, with an option to increase the limit
for large systems like server-grade ARM64.
Mark Johnston [Mon, 18 Jan 2021 22:07:56 +0000 (17:07 -0500)]
aesni: Ensure that key schedules are aligned
Rather than depending on malloc() returning 16-byte aligned chunks,
allocate some extra pad bytes and ensure that key schedules are
appropriately aligned.
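The padding approach can be sketched in userland C as follows. The function name and the out-parameter convention are assumptions made for the example; the driver's own code differs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_KEYSCHED_ALIGN	16u

/*
 * Allocate 'size' usable bytes plus padding and return a pointer inside
 * the allocation that is 16-byte aligned. '*basep' receives the pointer
 * that must later be passed to free().
 */
static void *
ex_alloc_aligned(size_t size, void **basep)
{
	unsigned char *base;
	size_t pad;

	assert(basep != NULL);
	base = malloc(size + EX_KEYSCHED_ALIGN - 1);
	if (base == NULL)
		return (NULL);
	*basep = base;
	/* Bytes to skip to reach the next 16-byte boundary (0..15). */
	pad = (EX_KEYSCHED_ALIGN - ((uintptr_t)base & (EX_KEYSCHED_ALIGN - 1))) &
	    (EX_KEYSCHED_ALIGN - 1);
	return (base + pad);
}
```

The extra 15 bytes guarantee that an aligned address with 'size' bytes after it always exists within the allocation, whatever alignment malloc() happens to return.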
Mark Johnston [Mon, 18 Jan 2021 22:07:56 +0000 (17:07 -0500)]
safexcel: Maintain per-session context records
The context record contains key material precomputed by the driver at
session creation time. Rather than storing various components of the
context record in each session, go a bit further and store the full
context record image so that safexcel_process() can simply copy the
image into each request submitted to the hardware. This simplifies the
data path and eliminates a bunch of unnecessary conditional logic that
was getting executed for each request.
Mark Johnston [Mon, 18 Jan 2021 22:07:56 +0000 (17:07 -0500)]
safexcel: Simplify request allocation
Rather than preallocating a set of requests and moving them between
queues during state transitions, maintain a shadow of the command
descriptor ring to track the driver context of each request. This is
simpler and requires less synchronization between safexcel_process() and
the ring interrupt handler.
Rather than returning a hard error in this case, return ERESTART so that
upper layers get a chance to retry the request (or drop it, depending on
the desired policy).
This case is hard to hit due to the somewhat low bound on queued
requests, but that will no longer be true after an upcoming change.
Mark Johnston [Mon, 18 Jan 2021 22:07:55 +0000 (17:07 -0500)]
linuxkpi: Fix the shrinker scan target
Use the number of items scanned to control the duration of the shrink
loop. Otherwise, if a consumer like TTM is not able to free the number
of items requested for some reason, the shrinker keeps looping forever.
Reviewed by: manu
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28224
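A sketch of the loop-termination idea, assuming a callback-based consumer. The types and names below are hypothetical and do not reproduce the linuxkpi shrinker interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: tries to free up to 'want' items, returns count freed. */
typedef unsigned long (*ex_scan_fn)(unsigned long want, void *arg);

/*
 * Call 'scan' in batches until 'target' items have been scanned.
 * Progress is measured in items scanned, not items freed, so the loop
 * ends even when the callback frees nothing.
 */
static unsigned long
ex_shrink(ex_scan_fn scan, void *arg, unsigned long target,
    unsigned long batch)
{
	unsigned long scanned = 0, freed = 0, want;

	assert(scan != NULL && batch > 0);
	while (scanned < target) {
		want = target - scanned;
		if (want > batch)
			want = batch;
		freed += scan(want, arg);
		scanned += want;
	}
	return (freed);
}

/* Example consumer that can never free anything; it only counts calls. */
static unsigned long
ex_scan_stuck(unsigned long want, void *arg)
{
	(void)want;
	++*(unsigned *)arg;
	return (0);
}
```

If the loop condition compared 'freed' against the target instead, a consumer like ex_scan_stuck would keep it spinning indefinitely.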
Kyle Evans [Mon, 18 Jan 2021 19:34:54 +0000 (13:34 -0600)]
pkgbase: limit PKG_VERSION_FROM calculation to real-update-packages
PKG_ABI is defined in some other targets that do not need to shell out and
calculate PKG_VERSION_FROM. Moreover, it produces extra errors when
bootstrapping an initial pkgbase repo, as the /latest link doesn't exist
yet.
Jamie Gritton [Mon, 18 Jan 2021 18:56:20 +0000 (10:56 -0800)]
jail: Add prison_isvalid() and prison_isalive()
prison_isvalid() checks if a prison record can be used at all, i.e.
pr_ref > 0. This filters out prisons that aren't fully created, and
those that are either in the process of being dismantled, or will be
at the next opportunity. While the check for pr_ref > 0 is simple
enough to make without a convenience function, this prepares the way
for other measures of prison validity.
prison_isalive() checks not only validity as far as the usability of
the prison structure, but also whether the prison is visible to user
space. It replaces a test for pr_uref > 0, which is currently only
used within kern_jail.c, and not often there.
Both of these functions also assert that either the prison mutex or
allprison_lock is held, since it's generally the case that unlocked
prisons aren't guaranteed to remain useable for any length of time.
This isn't entirely true, for example a thread can assume its own
prison is good, but most exceptions will exist inside of kern_jail.c.
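The two checks can be sketched on a reduced record. The struct below is hypothetical, the lock assertions described above are omitted, and combining both counters in the liveness test is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced prison record with the two counters from the text. */
struct ex_prison {
	int pr_ref;	/* all references to the record */
	int pr_uref;	/* references that keep it visible to user space */
};

/* Usable at all: fully created and not being dismantled. */
static bool
ex_prison_isvalid(const struct ex_prison *pr)
{
	assert(pr != NULL);
	return (pr->pr_ref > 0);
}

/* Usable and still visible to user space. */
static bool
ex_prison_isalive(const struct ex_prison *pr)
{
	return (ex_prison_isvalid(pr) && pr->pr_uref > 0);
}
```

A record with pr_ref > 0 but pr_uref == 0 is therefore valid but not alive, which corresponds to a prison that is still referenced in-kernel but no longer visible to user space.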
Andrew Gallatin [Thu, 14 Jan 2021 17:44:06 +0000 (12:44 -0500)]
KTLS: Enable KERN_TLS in GENERIC on amd64
Based on discussions on freebsd-arch@, enable KERN_TLS in
GENERIC on amd64, but leave it disabled via the
sysctl kern.ipc.tls.enable. Users wishing to enable
ktls must set kern.ipc.tls.enable=1
While here, fix wording in NOTES to mention that KERN_TLS
also does receive now.