Alan Cox [Sat, 26 Jun 2021 03:29:38 +0000 (22:29 -0500)]
arm64: fix a potential KVA leak in pmap_demote_l1()
In the unlikely event that the 1 GB page mapping being demoted is used
to access the L1 page table page containing the 1 GB page mapping and
the vm_page_alloc() to allocate a new L2 page table page fails, we
would leak a page of kernel virtual address space. Fix this leak.
Alan Cox [Wed, 23 Jun 2021 05:10:20 +0000 (00:10 -0500)]
arm64: remove an unneeded test from pmap_clear_modify()
The page table entry for a 4KB page mapping must be valid if a PV entry
for the mapping exists, so there is no point in testing each page table
entry's validity when iterating over a PV list.
Alan Cox [Mon, 21 Jun 2021 07:45:21 +0000 (02:45 -0500)]
arm64: Use page_to_pvh() when the vm_page_t is known
When support for a sparse pv_table was added, the implementation of
pa_to_pvh() changed from a simple constant-time calculation to iterating
over the array vm_phys_segs[]. To mitigate this issue, an alternative
function, page_to_pvh(), was introduced that still runs in constant time
but requires the vm_page_t to be known. However, three cases where the
vm_page_t is known were not converted to page_to_pvh(). This change
converts those three cases.
Dimitry Andric [Fri, 27 Aug 2021 17:45:43 +0000 (19:45 +0200)]
Fix null pointer subtraction in mergesort()
Clang 13 produces the following warning for this function:
lib/libc/stdlib/merge.c:137:41: error: performing pointer subtraction with a null pointer has undefined behavior [-Werror,-Wnull-pointer-subtraction]
if (!(size % ISIZE) && !(((char *)base - (char *)0) % ISIZE))
^ ~~~~~~~~~
This is meant to check whether the size and base parameters are aligned
to the size of an int, so use our __is_aligned() macro instead.
Also remove the comment that indicated this "stupid subtraction" was
done to pacify some ancient and unknown Cray compiler, and which has
been there since the BSD 4.4 Lite Lib Sources were imported.
Kristof Provost [Tue, 24 Aug 2021 10:24:28 +0000 (12:24 +0200)]
pfctl: fix killing states by ID
Since the conversion to the new DIOCKILLSTATESNV the kernel no longer
exists the id and creatorid to be big-endian.
As a result killing states by id (i.e. `pfctl -k id -k 12345`) no longer
worked.
John Baldwin [Mon, 16 Aug 2021 17:42:46 +0000 (10:42 -0700)]
ktls: Fix accounting for TLS 1.0 empty fragments.
TLS 1.0 empty fragment mbufs have no payload and thus m_epg_npgs is
zero. However, these mbufs need to occupy a "unit" of space for the
purposes of M_NOTREADY tracking similar to regular mbufs. Previously
this was done for the page count returned from ktls_frame() and passed
to ktls_enqueue() as well as the page count passed to pru_ready().
However, sbready() and mb_free_notready() only use m_epg_nrdy to
determine the number of "units" of space in an M_EXT mbuf, so when a
TLS 1.0 fragment was marked ready it would mark one unit of the next
mbuf in the socket buffer as ready as well. To fix, set m_epg_nrdy to
1 for empty fragments. This actually simplifies the code as now only
ktls_frame() has to handle TLS 1.0 fragments explicitly and the rest
of the KTLS functions can just use m_epg_nrdy.
Andrew Gallatin [Wed, 11 Aug 2021 18:06:43 +0000 (14:06 -0400)]
ktls: Init reset tag task for cloned sessions
When cloning a ktls session (which is needed when we need to
switch output NICs for a NIC TLS session), we need to also
init the reset task, like we do when creating a new tls session.
John Baldwin [Tue, 15 Jun 2021 17:36:57 +0000 (10:36 -0700)]
ktls: Don't mark existing received mbufs notready for TOE TLS.
The TOE driver might receive decrypted TLS records that are enqueued
to the socket buffer after ktls_try_toe() returns and before
ktls_enable_rx() locks the receive buffer to call sb_mark_notready().
In that case, sb_mark_notready() would incorrectly treat the decrypted
TLS record as an encrypted record and schedule it for decryption.
This always resulted in the connection being dropped as the data in
the control message did not look like a valid TLS header.
To fix, don't try to handle software decryption of existing buffers in
the socket buffer for TOE TLS in ktls_enable_rx(). If a TOE TLS
driver needs to decrypt existing data in the socket buffer, the driver
will need to manage that in its tod_alloc_tls_session method.
Mitchell Horne [Wed, 11 Aug 2021 17:40:01 +0000 (14:40 -0300)]
kdb: Handle process enumeration before procinit()
Make kdb_thr_first() and kdb_thr_next() return sane values if the
allproc list and pidhashtbl haven't been initialized yet. This can
happen if the debugger is entered very early on, for example with the
'-d' boot flag.
This allows remote gdb to attach at such a time, and fixes some ddb
commands like 'show threads'.
Be explicit about the static initialization of these variables. This
part has no functional change.
Mitchell Horne [Wed, 4 Aug 2021 18:18:18 +0000 (15:18 -0300)]
arm: enable stack-smashing protection
With current generation clang/llvm it can pass all of our tests in
libc/ssp.
While here, remove the extra MACHINE_CPUARCH check for mips. SSP is
included in BROKEN_OPTIONS for this architecture in src.opts.mk, which
is enough to ensure normal builds won't set SSP_CFLAGS.
This affects powerpc64*, making binaries crash with segmentation fault
due to bad code generation around "__stack_chk_guard"
Root cause and/or proper fix is under investigation by:
https://bugs.llvm.org/show_bug.cgi?id=51590
Reviewed by: dim
MFC after: 2 days
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31698
Andrew Turner [Fri, 16 Jul 2021 13:49:33 +0000 (13:49 +0000)]
Move setting arm64 HWCAP values to the ID tables
The HWCAPS values are based on the ID registers. Move setting these
to the existing ID register parsing code.
Previously we would need to handle all possible ID field values where
a HWCAP is set, however as most ID fields follow a scheme where when
the field increments it will only add new features meaning we only
need to check if the field is greater than when the HWCAP feature
was added.
While here stop setting HWCAP value that need kernel support, but this
support is missing.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31201
Andrew Turner [Tue, 27 Jul 2021 19:28:33 +0000 (19:28 +0000)]
Teach the arm64 kernel to identify the Arm AEM
The Arm Architecture Envelope Model is a simulator that models the
architecture rather than any specific implementation. Add its part ID
macro and add it to the list of Arm CPUs we can decode.
Andrew Turner [Wed, 14 Jul 2021 15:19:06 +0000 (15:19 +0000)]
Start to clean up arm64 address space selection
On arm64 we should use bit 55 of the address to decide if aan address
is a user or kernel address. Add a new macro with this check and a
second to ensure the address is in teh canonical form, i.e.
the top bits are all zero or all one.
This will help with supporting future cpu features, including Top
Byte Ignore, Pointer Authentication, and Memory Tagging.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31179
Kristof Provost [Sat, 21 Aug 2021 11:42:27 +0000 (13:42 +0200)]
altq: Fix panics on rmc_restart()
rmc_restart() is called from a timer, but can trigger traffic. This
means the curvnet context will not be set.
Use the vnet associated with the interface we're currently processing to
set it. We also have to enter net_epoch here, for the same reason.
Dave Fullard [Fri, 16 Jul 2021 04:02:48 +0000 (23:02 -0500)]
freebsd-update: create a ZFS boot environment on install
Updated freebsd-update to allow it to create boot environments using
bectl should the system support it. The bectl utility was updated in
r352211 (490e13c1403f) to support a 'check' to determine if the system
supports boot environments. If UFS is used, the bectl check will fail
then no attempt will be made to create the boot environment.
If freebsd-update is run inside a jail, no attempt will be made to
create a boot environment.
The boot environment function will create a new environment using the
format: current FreeBSD kernel version and date/timestamp, example:
12.0-RELEASE-p10_2019-10-03_185233
This functionality can be disabled by setting 'CreateBootEnv' in
freebsd-update.conf to 'no'.
Dimitry Andric [Thu, 26 Aug 2021 18:53:18 +0000 (20:53 +0200)]
Cleanup compiler warning flags in lib/libefivar/Makefile
There is no need to set -Wno-unused-parameter twice, and instead of
appending to CFLAGS, append to CWARNFLAGS instead. While here, add
-Wno-unused-but-set-variable for the sake of clang 13.0.0.
Dimitry Andric [Thu, 26 Aug 2021 20:06:49 +0000 (22:06 +0200)]
googletest: Silence warnings about deprecated implicit copy constructors
Our copy of googletest is rather stale, and causes a number of -Werror
warnings about implicit copy constructor definitions being deprecated,
because several classes have user-declared copy assignment operators.
Silence the warnings until we either upgrade or remove googletest.
Dan Langille [Sun, 15 Aug 2021 16:53:16 +0000 (12:53 -0400)]
Enable rc.d/jail within jails
Jails with jails is a supported. This change allows the script to run
upon startup with a jail. Without this, jails are not automatically
started within jails.
Toomas Soome [Sat, 21 Aug 2021 11:25:36 +0000 (14:25 +0300)]
loader: FB console does leave garbage on screen while scrolling
Scrolling screen will leave "trail" of chars from first column.
Apparently caused by cursor location mismanagement.
Make sure we do not [attempt to] set cursor out of the screen.
Alexander Motin [Mon, 2 Aug 2021 14:50:34 +0000 (10:50 -0400)]
sched_ule(4): Pre-seed sched_random().
I don't think it changes anything, but why not.
While there, make cpu_search_highest() use all 8 lower load bits for
noise, since it does not use cs_prefer and the code is not shared
with cpu_search_lowest() any more.
Alexander Motin [Mon, 2 Aug 2021 02:42:01 +0000 (22:42 -0400)]
sched_ule(4): Use trylock when stealing load.
On some load patterns it is possible for several CPUs to try steal
thread from the same CPU despite randomization introduced. It may
cause significant lock contention when holding one queue lock idle
thread tries to acquire another one. Use of trylock on the remote
queue allows both reduce the contention and handle lock ordering
easier. If we can't get lock inside tdq_trysteal() we just return,
allowing tdq_idled() handle it. If it happens in tdq_idled(), then
we repeat search for load skipping this CPU.
On 2-socket 80-thread Xeon system I am observing dramatic reduction
of the lock spinning time when doing random uncached 4KB reads from
12 ZVOLs, while IOPS increase from 327K to 403K.
Alexander Motin [Mon, 2 Aug 2021 02:07:51 +0000 (22:07 -0400)]
sched_ule(4): Reduce duplicate search for load.
When sched_highest() called for some CPU group returns nothing, idle
thread calls it for the parent CPU group. But the parent CPU group
also includes the CPU group we've just searched, and unless there is
a race going on, it is unlikely we find anything new this time.
Avoid the double search in case of parent group having only two sub-
groups (the most prominent case). Instead of escalating to the parent
group run the next search over the sibling subgroup and escalate two
levels up after if that fail too. In case of more than two siblings
the difference is less significant, while searching the parent group
can result in better decision if we find several candidate CPUs.
On 2-socket 40-core Xeon system I am measuring ~25% reduction of CPU
time spent inside cpu_search_highest() in both SMT (2x20x2) and non-
SMT (2x20) cases.
Alexander Motin [Thu, 29 Jul 2021 01:18:50 +0000 (21:18 -0400)]
Refactor/optimize cpu_search_*().
Remove cpu_search_both(), unused for many years. Without it there is
less sense for the trick of compiling common cpu_search() into separate
cpu_search_lowest() and cpu_search_highest(), so split them completely,
making code more readable. While there, split iteration over children
groups and CPUs, complicating code for very small deduplication.
Stop passing cpuset_t arguments by value and avoid some manipulations.
Since MAXCPU bump from 64 to 256, what was a single register turned
into 32-byte memory array, requiring memory allocation and accesses.
Splitting struct cpu_search into parameter and result parts allows to
even more reduce stack usage, since the first can be passed through
on recursion.
Remove CPU_FFS() from the hot paths, precalculating first and last CPU
for each CPU group in advance during initialization. Again, it was
not a problem for 64 CPUs before, but for 256 FFS needs much more code.
With these changes on 80-thread system doing ~260K uncached ZFS reads
per second I observe ~30% reduction of time spent in cpu_search_*().
Ka Ho Ng [Mon, 19 Apr 2021 08:07:03 +0000 (16:07 +0800)]
AMD-vi: Fortify IVHD device_identify process
- Use malloc(9) to allocate ivhd_hdrs list. The previous assumption
that there are at most 10 IVHDs in a system is not true. A counter
example would be a system with 4 IOMMUs, and each IOMMU is related
to IVHDs type 10h, 11h and 40h in the ACPI IVRS table.
- Always scan through the whole ivhd_hdrs list to find IVHDs that has
the same DeviceId but less prioritized IVHD type.
Sponsored by: The FreeBSD Foundation
MFC with: 74ada297e897
Reviewed by: grehan
Approved by: lwhsu (mentor)
Differential Revision: https://reviews.freebsd.org/D29525
Ka Ho Ng [Mon, 2 Aug 2021 09:54:40 +0000 (17:54 +0800)]
vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1
In hw.vmm.create sysctl handler the maximum length of vm name is
VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is
only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to
allow the length of VM_MAX_NAMELEN for vm name.
Reviewed by: grehan
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31372
Goran Mekić [Wed, 4 Aug 2021 10:04:54 +0000 (18:04 +0800)]
sound: Add an example of basic sound application
This is an example demonstrating the usage of the OSS-compatible APIs
provided by the sound(4) subsystem. It reads frames from a dsp node and
writes them to the same dsp node.
Rick Macklem [Thu, 12 Aug 2021 23:48:28 +0000 (16:48 -0700)]
nfsd: Fix sanity check for NFSv4.2 Allocate operations
The NFSv4.2 Allocate operation sanity checks the aa_offset
and aa_length arguments. Since they are assigned to variables
of type off_t (signed) it was possible for them to be negative.
It was also possible for aa_offset+aa_length to exceed OFF_MAX
when stored in lo_end, which is uint64_t.
This patch adds checks for these cases to the sanity check.
Dimitry Andric [Sat, 21 Aug 2021 21:03:37 +0000 (23:03 +0200)]
Apply clang fix for assertion failure compiling multimedia/minitube
Merge commit 79f9cfbc21e0 from llvm git (by Yaxun (Sam) Liu):
Do not merge LocalInstantiationScope for template specialization
A lambda in a function template may be recursively instantiated. The recursive
lambda will cause a lambda function instantiated multiple times, one inside another.
The inner LocalInstantiationScope should not be marked as MergeWithParentScope
since it already has references to locals properly substituted, otherwise it causes
assertion due to the check for duplicate locals in merged LocalInstantiationScope.
Kyle Evans [Thu, 18 Feb 2021 04:10:46 +0000 (22:10 -0600)]
pkg: use specific CONFSNAME_${file} for FreeBSD.conf
Setting CONFSNAME directly is a little more complicated for downstream
consumers, as any additional CONFS that are added here will inherit the
group name by default. This is perhaps arguably a design flaw in CONFS
because inheriting NAME will never give a good result when additional
files are added, but this is a low-effort change.
While we're here, pull FreeBSD.conf.${branch} out into a PKGCONF
variable so one can just drop a new repo config in entirely with a new
naming scheme. CONFSNAME gets set based on chopping anything off after
".conf", so that, e.g.:
Kyle Evans [Thu, 18 Feb 2021 03:41:53 +0000 (21:41 -0600)]
pkg: allow multiple add arguments again
While pkg(7) add only handles a single 'add' argument, pkg-add(8) fully
handles multiple arguments.
Stop rejecting it, just turn off local-bootstrap mode and proceed to
remote bootstrap if we need it.
While we're here, check if the first argument to pkg add is even a pkg
package. If it's not, also do remote bootstrap instead. Future work
could improve this altogether by picking out a pkg package out of many
and local bootstrap then pass the rest through to the newly installed
pkg.
Adam Fenn [Mon, 2 Aug 2021 16:27:17 +0000 (11:27 -0500)]
devclass_alloc_unit: move "at" hint test to after device-in-use test
Only perform this expensive operation when the unit number is a
potential candidate (i.e. not already in use), thereby reducing device
scan time on systems with many devices, unit numbers, and drivers.
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
X-NetApp-PR: #61
Toomas Soome [Sat, 31 Jul 2021 08:09:48 +0000 (11:09 +0300)]
loader: open file list should be dynamic
Summary:
Open file list is currently created as statically allocated array
(64 items). Once this array is filled up, loader will not be able
to operate with files. In most cases, this mechanism is good enough,
but the problem appears, when we have many disks with zfs pool(s).
In current loader implementation, all discovered zfs pool
configurations are kept in memory and disk devices open - consuming
the open file array. Rewrite the open file mechanism to use
dynamically allocated list.
makesyscalls was rewritten in Lua and introduced in d3276301ab. In the
time since, no objections have risen and a warning was introduced long
ago on invocation of makesyscalls.sh that it would be removed before
FreeBSD 13. Belatedly follow through on that.
Kyle Evans [Sun, 20 Jun 2021 19:36:10 +0000 (14:36 -0500)]
kenv: allow listing of static kernel environments
The early environment is typically cleared, so these new options
need the PRESERVE_EARLY_KENV kernel config(8) option. These environments
are reported as missing by kenv(1) if the option is not present in the
running kernel.
Kyle Evans [Sun, 20 Jun 2021 19:29:31 +0000 (14:29 -0500)]
kern: add an option for preserving the early kenv
Some downstream configurations do not store secrets in the
early (loader/static) environments and desire a way to preserve these
for diagnostic reasons. Provide an option to do so.
Patrick Kelsey [Mon, 26 Apr 2021 04:25:59 +0000 (00:25 -0400)]
iflib: Improve mapping of TX/RX queues to CPUs
iflib now supports mapping each (TX,RX) queue pair to the same CPU
(default), to separate CPUs, or to a pair of physical and logical CPUs
that share the same L2 cache. The mapping mechanism supports unequal
numbers of TX and RX queues, with the excess queues always being
mapped to consecutive physical CPUs. When the platform cannot
distinguish between physical and logical CPUs, all are treated as
physical CPUs. See the comment on get_cpuid_for_queue() for the
entire matrix.
The following device-specific tunables influence the mapping process:
dev.<device>.<unit>.iflib.core_offset (existing)
dev.<device>.<unit>.iflib.separate_txrx (existing)
dev.<device>.<unit>.iflib.use_logical_cores (new)
The following new, read-only sysctls provide visibility of the mapping
results:
dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu
When an iflib driver allocates TX softirqs without providing reference
RX IRQs, iflib now binds those TX softirqs to CPUs using the above
mapping mechanism (that is, treats them as if they were TX IRQs).
Previously, such bindings were left up to the grouptaskqueue code and
thus fell outside of the iflib CPU mapping strategy.
ipfw: fix possible data race between jump cache reading and updating.
Jump cache is used to reduce the cost of rule lookup for O_SKIPTO and
O_CALLRETURN actions. It uses rules chain id to check correctness of
cached value. But due to the possible race, there is the chance that
one thread can read invalid value. In some cases this can lead to out
of bounds access and panic.
Use thread fence operations to constrain the reordering of accesses.
Also rename jump_fast and jump_linear functions to jump_cached and
jump_lookup_pos respectively.