Alexander Motin [Fri, 20 Aug 2021 18:37:55 +0000 (14:37 -0400)]
mpr(4): Fix unmatched devq release.
Before this change devq was frozen only if some command was sent to
the target after reset started, but release was called always. This
change freezes the devq immediately, leaving mprsas_action_scsiio()
check only to cover race condition due to different lock devq use.
This should also avoid unnecessary requeue of the commands, creating
additional log noise and confusing some broken apps.
Alexander Motin [Fri, 20 Aug 2021 13:46:51 +0000 (09:46 -0400)]
mpr(4): Handle mprsas_alloc_tm() errors on device removal.
SAS9305-16e with firmware 16.00.01.00 report HighPriorityCredit of
only 8, while for comparison some other combinations I have report
100 or even 128. In case of large JBOD detach requirement to send
target reset command to each target same time overflows the limit,
and without adequate handling makes devices stuck in half-detached
state, preventing later re-attach.
To handle that in case of allocation error mark the target with new
MPRSAS_TARGET_TOREMOVE flag, and retry the removal attempt next time
something else free high priority command. With this patch I can
successfully detach/attach 102 disk JBOD from/to the SAS9305-16e.
Fix mpr(4) and mps(4) state transitions and a use-after-free panic.
When the mpr(4) and mps(4) drivers probe a SATA device, they issue an
ATA Identify command (via mp{s,r}sas_get_sata_identify()) before the
target is fully setup in the driver. The drivers wait for completion of
the identify command, and have a 5 second timeout. If the timeout
fires, the command is marked with the SATA_ID_TIMEOUT flag so it can be
freed later.
That is where the use-after-free problem comes in. Once the ATA
Identify times out, the driver sends a target reset, and then frees any
identify commands that have timed out. But, once the target reset
completes, commands that were queued to the drive are returned to the
driver by the controller.
At that point, the driver (in mp{s,r}_intr_locked()) looks up the
command descriptor for that particular SMID, marks it CM_STATE_BUSY and
sends it on for completion handling.
The problem at this stage is that the command has already been freed,
and put on the free queue, so its state is CM_STATE_FREE. If INVARIANTS
are turned on, we get a panic as soon as this command is allocated,
because its state is no longer CM_STATE_FREE, but rather CM_STATE_BUSY.
So, the solution is to not free ATA Identify commands that get stuck
until they actually return from the controller. Hopefully this works
correctly on older firmware versions. If not, it could result in
commands hanging around indefinitely. But, the alternative is a
use-after-free panic or assertion (in the INVARIANTS case).
This also tightens up the state transitions between CM_STATE_FREE,
CM_STATE_BUSY and CM_STATE_INQUEUE, so that the state transitions happen
once, and we have assertions to make sure that commands are in the
correct state before transitioning to the next state. Also, for each
state assertion, we print out the current state of the command if it is
incorrect.
mp{s,r}.c: Add a new sysctl variable, dump_reqs_alltypes,
that controls the behavior of the dump_reqs sysctl.
If dump_reqs_alltypes is non-zero, it will dump
all commands, not just the commands that are in the
CM_STATE_INQUEUE state. (You can see the commands
that are in the queue by using mp{s,r}util debug
dumpreqs.)
Make sure that the INQUEUE -> BUSY state transition
happens in one place, the mp{s,r}_complete_command
routine.
mp{s,r}_sas.c: Make sure we print the current command type in
command state assertions.
mp{s,r}_sas_lsi.c:
Add a new completion handler,
mp{s,r}sas_ata_id_complete. This completion
handler will free data allocated for an ATA
Identify command and free the command structure.
In mp{s,r}_ata_id_timeout, do not set the command
state to CM_STATE_BUSY. The command is still in
queue in the controller. Since we were blocking
waiting for this command to complete, there was
no completion handler previously. Set the
completion handler, so that whenever the command
does come back, it will get freed properly.
Do not free ATA Identify commands that have timed
out in mp{s,r}sas_add_device(). Wait for them
to actually come back from the controller.
mp{s,r}var.h: Add a dump_reqs_alltypes variable for the new
dump_reqs_alltypes sysctl.
Make sure we print the current state for state
transition asserts.
This was tested in the Spectra Logic test bed (as described in the
review), as well Netflix's Open Connect fleet (where panics dropped from
a dozen or two a month to zero).
Reviewed by: imp@ (who is handling the commit with ken's OK)
Sponsored by: Spectra Logic
Differential Revision: https://reviews.freebsd.org/D25476
Dimitry Andric [Sun, 29 Aug 2021 11:15:23 +0000 (13:15 +0200)]
Fix acpica macros that subtract null pointers
Clang 13.0.0 produces a new -Werror warning about the ACPI_TO_INTEGER(p)
and ACPI_OFFSET(d, f) macros in acpica's actypes.h:
sys/contrib/dev/acpica/components/dispatcher/dsopcode.c:708:31: error: performing pointer subtraction with a null pointer has undefined behavior [-Werror,-Wnull-pointer-subtraction]
ObjDesc->Region.Address = ACPI_PTR_TO_PHYSADDR (Table);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
sys/contrib/dev/acpica/include/actypes.h:664:41: note: expanded from macro 'ACPI_PTR_TO_PHYSADDR'
#define ACPI_PTR_TO_PHYSADDR(i) ACPI_TO_INTEGER(i)
^~~~~~~~~~~~~~~~~~
sys/contrib/dev/acpica/include/actypes.h:661:41: note: expanded from macro 'ACPI_TO_INTEGER'
#define ACPI_TO_INTEGER(p) ACPI_PTR_DIFF (p, (void *) 0)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sys/contrib/dev/acpica/include/actypes.h:656:82: note: expanded from macro 'ACPI_PTR_DIFF'
#define ACPI_PTR_DIFF(a, b) ((ACPI_SIZE) (ACPI_CAST_PTR (UINT8, (a)) - ACPI_CAST_PTR (UINT8, (b))))
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
This problem of undefined behavior was also reported to acpica by @cem
in 2018: https://github.com/acpica/acpica/issues/407, but it seems there
was never any fix committed for it upstream.
Instead fix these locally, for ACPI_TO_INTEGER by simply casting the
incoming pointer to ACPI_SIZE (which corresponds roughly to uintptr_t
and size_t), and for ACPI_OFFSET by reusing our __offsetof definition
from sys/cdefs.h.
Dimitry Andric [Sun, 29 Aug 2021 13:39:16 +0000 (15:39 +0200)]
Remove -simplifycfg-dup-ret from CLANG_OPT_SMALL flags for clang 13
After llvm/clang 13.0.0, the -simplifycfg-dup-ret backend flag is no
longer supported. This was part of CLANG_OPT_SMALL, which is only still
used for stand/i386/boot2 and stand/i386/isoboot, to achieve the very
small binary size required. Luckily clang 13.0.0 does not need any
additional flags for this (I get 240 bytes available when building
boot2).
Dimitry Andric [Sun, 29 Aug 2021 14:02:31 +0000 (16:02 +0200)]
xen: Fix warning by adding KERNBASE to modlist_paddr before casting
Clang 13 produces the following warning for hammer_time_xen():
sys/x86/xen/pv.c:183:19: error: the pointer incremented by -2147483648 refers past the last possible element for an array in 64-bit address space containing 256-bit (32-byte) elements (max possible 576460752303423488 elements) [-Werror,-Warray-bounds]
(vm_paddr_t)start_info->modlist_paddr + KERNBASE;
^ ~~~~~~~~
sys/xen/interface/arch-x86/hvm/start_info.h:131:5: note: array 'modlist_paddr' declared here
uint64_t modlist_paddr; /* Physical address of an array of */
^
This is because the expression first casts start_info->modlist_paddr to
struct hvm_modlist_entry * (via vmpaddr_t), and *then* adds KERNBASE,
which is then interpreted as KERNBASE * sizeof(struct
hvm_modlist_entry).
Instead, parenthesize the addition to get the intended result, and cast
it to struct hvm_modlist_entry * afterwards. Also remove the cast to
vmpaddr_t since it is not necessary.
Dimitry Andric [Sun, 29 Aug 2021 13:53:40 +0000 (15:53 +0200)]
Don't error out on unused but set variables with clang 13
Clang 13.0.0 now has a -Wunused-but-set-variable warning similar to the
one gcc has had for quite a while. Since this triggers *very* often for
our kernel builds, don't make it a hard error, but leave the warning
visible so is some incentive to fix the instances.
Alexander Motin [Fri, 27 Aug 2021 00:21:56 +0000 (20:21 -0400)]
pcib(4): Write window registers after resource adjustment
When adjusting resources we should write updated window base/limit into
the registers. Without this newly added address range won't be routed
through the bridge properly.
Use MIN()/MAX() against current window base/limit to not shrink it on
the other side if the window is shared by several resources.
Align passed resource start/end to the set window granularity to keep
it properly aligned. Currently this is mostly called by other bridges
having the same window alignment, but it may be change one day.
Jessica Clarke [Sat, 7 Aug 2021 18:27:32 +0000 (19:27 +0100)]
pci_pci: Support growing windows in bus_adjust_resource for NEW_PCIB
If we allocate a new window for a bridge rather than reusing an existing
one set up by firmware to cover all the devices then the new window only
includes the range needed for the first device to allocate the resource.
If a request comes in to adjust this resource in order to extend a
downstream window for another device then this will fail as the rman
doesn't have any space, so we must first grow the bridge's own window.
This is needed to support successfully attaching more than one PCI
device on SiFive's HiFive Unmatched, which has the following topology:
Without this, the xHCI endpoint successfully attaches but NVMe M.2 M-key
endpoint fails to attach as, when its adjacent bridge (pcib6) attempts
to allocate a window from its parent (pcib2) on the other side of the
switch, its parent attempts to grow its own window by calling
bus_adjust_resource on its own parent (pcib1) which fails to call the
root port device (pcib0) to request more memory to grow its own window.
Had the root port been directly connected to the switch without the
bridge in the middle then the existing code would have worked, but the
extra hop broke it.
Alan Cox [Mon, 12 Jul 2021 23:25:37 +0000 (18:25 -0500)]
pmap: Micro-optimize pmap_remove_pages() on amd64 and arm64
Reduce the live ranges for three variables so that they do not span the
call to PHYS_TO_VM_PAGE(). This enables the compiler to generate
slightly smaller machine code.
As follow-on work to e4b8deb222278b2a, move page table page
allocation and freeing into their own functions. Use these
functions to provide separate kernel vs. user page table page
accounting, and to wrap common tasks such as management of
zero-filled page state.
amd64 pmap: convert to counter(9), add PV and pagetable page counts
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters. Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.
The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios. However, this change does add two new counters that
are available without PV_STATS. pt_page_count and pv_page_count
track the number of active physical-to-virtual list pages and page
table pages, respectively. These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.
Kristof Provost [Thu, 26 Aug 2021 08:25:57 +0000 (10:25 +0200)]
pf tests: ALTQ priority test
Test that ALTQ can prioritise one type of traffic over another. Do this
by establishing a slow link and saturating it with ICMP echos.
When prioritised TCP connections reliably go through. When not
prioritised TCP connections reliably fail.
Kristof Provost [Mon, 23 Aug 2021 14:58:50 +0000 (16:58 +0200)]
pf tests: test ALTQ CBQ on top of if_vlan
The main purpose of this test is to verify that we can use ALTQ on top
of if_vlan, but while we're here we also exercise the CBQ code. There's
already a basis test for HFSC, so it makes sense to test another
algorithm while we test if_vlan.
Cyril Zhang [Wed, 18 Aug 2021 17:41:33 +0000 (13:41 -0400)]
vmm: Add credential to cdev object
Add a credential to the cdev object in sysctl_vmm_create(), then check
that we have the correct credentials in sysctl_vmm_destroy(). This
prevents a process in one jail from opening or destroying the /dev/vmm
file corresponding to a VM in a sibling jail.
Add regression tests.
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
Mark Johnston [Sat, 28 Aug 2021 19:50:44 +0000 (15:50 -0400)]
fsetown: Avoid process group lock recursion
Restore the pre-1d874ba4f8ba behaviour of disassociating the current
SIGIO recipient before looking up the specified process or process
group. This avoids a lock recursion in the scenario where a process
group is configured to receive SIGIO for an fd when it has already been
so configured.
Mark Johnston [Wed, 25 Aug 2021 20:18:10 +0000 (16:18 -0400)]
fsetown: Fix process lookup bugs
- pget()/pfind() will acquire the PID hash bucket locks, which are
sleepable sx locks, but this means that the sigio mutex cannot be held
while calling these functions. Instead, use pget() to hold the
process, after which we lock the sigio and proc locks, respectively.
- funsetownlst() assumes that processes cannot be registered for SIGIO
once they have P_WEXIT set. However, pfind() will happily return
exiting processes, breaking the invariant. Add an explicit check for
P_WEXIT in fsetown() to fix this. [1]
Fixes: f52979098d3c ("Fix a pair of races in SIGIO registration")
Reported by: syzkaller [1]
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Alan Cox [Sat, 24 Jul 2021 08:50:27 +0000 (03:50 -0500)]
amd64: Don't repeat unnecessary tests when cmpset fails
When a cmpset for removing the PG_RW bit in pmap_promote_pde() fails,
there is no need to repeat the alignment, PG_A, and PG_V tests just to
reload the PTE's value. The only bit that we need be concerned with at
this point is PG_M. Use fcmpset instead.
Alan Cox [Sat, 24 Jul 2021 03:50:10 +0000 (22:50 -0500)]
amd64: Eliminate a redundant test from pmap_enter_object()
The call to pmap_allow_2m_x_page() in pmap_enter_object() is redundant.
Specifically, even without the call to pmap_allow_2m_x_page() in
pmap_enter_object(), pmap_allow_2m_x_page() is eventually called by
pmap_enter_pde(), so the outcome will be the same. Essentially,
calling pmap_allow_2m_x_page() in pmap_enter_object() amounts to
"optimizing" for the unexpected case.
Alan Cox [Wed, 7 Jul 2021 18:16:03 +0000 (13:16 -0500)]
arm64: Simplify fcmpset failure in pmap_promote_l2()
When the initial fcmpset in pmap_promote_l2() fails, there is no need
to repeat the check for the physical address being 2MB aligned or for
the accessed bit being set. While the pmap is locked the hardware can
only transition the accessed bit from 0 to 1, and we have already
determined that it is 1 when the fcmpset fails.
Alan Cox [Sun, 4 Jul 2021 05:20:42 +0000 (00:20 -0500)]
On a failed fcmpset don't pointlessly repeat tests
In a few places, on a failed compare-and-set, both the amd64 pmap and
the arm64 pmap repeat tests on bits that won't change state while the
pmap is locked. Eliminate some of these unnecessary tests.
Alan Cox [Wed, 30 Jun 2021 05:59:21 +0000 (00:59 -0500)]
amd64: a simplication to pmap_remove_{all,write}
Eliminate some unnecessary unlocking and relocking when we have to retry
the operation to avoid deadlock. (All of the other pmap functions that
iterate over a PV list already implemented retries without these same
unlocking and relocking operations.)
Alan Cox [Tue, 29 Jun 2021 02:57:04 +0000 (21:57 -0500)]
arm64: a few simplications to pmap_remove_{all,write}
Eliminate some unnecessary unlocking and relocking when we have to retry
the operation to avoid deadlock. (All of the other pmap functions that
iterate over a PV list already implemented retries without these same
unlocking and relocking operations.)
Avoid a pointer dereference by using an existing local variable that
already holds the desired value.
Eliminate some unnecessary repetition of code on a failed fcmpset.
Specifically, there is no point in retesting the DBM bit because it
cannot change state while the pmap lock is held.
Alan Cox [Sat, 26 Jun 2021 03:29:38 +0000 (22:29 -0500)]
arm64: fix a potential KVA leak in pmap_demote_l1()
In the unlikely event that the 1 GB page mapping being demoted is used
to access the L1 page table page containing the 1 GB page mapping and
the vm_page_alloc() to allocate a new L2 page table page fails, we
would leak a page of kernel virtual address space. Fix this leak.
Alan Cox [Wed, 23 Jun 2021 05:10:20 +0000 (00:10 -0500)]
arm64: remove an unneeded test from pmap_clear_modify()
The page table entry for a 4KB page mapping must be valid if a PV entry
for the mapping exists, so there is no point in testing each page table
entry's validity when iterating over a PV list.
Alan Cox [Mon, 21 Jun 2021 07:45:21 +0000 (02:45 -0500)]
arm64: Use page_to_pvh() when the vm_page_t is known
When support for a sparse pv_table was added, the implementation of
pa_to_pvh() changed from a simple constant-time calculation to iterating
over the array vm_phys_segs[]. To mitigate this issue, an alternative
function, page_to_pvh(), was introduced that still runs in constant time
but requires the vm_page_t to be known. However, three cases where the
vm_page_t is known were not converted to page_to_pvh(). This change
converts those three cases.
Dimitry Andric [Fri, 27 Aug 2021 17:45:43 +0000 (19:45 +0200)]
Fix null pointer subtraction in mergesort()
Clang 13 produces the following warning for this function:
lib/libc/stdlib/merge.c:137:41: error: performing pointer subtraction with a null pointer has undefined behavior [-Werror,-Wnull-pointer-subtraction]
if (!(size % ISIZE) && !(((char *)base - (char *)0) % ISIZE))
^ ~~~~~~~~~
This is meant to check whether the size and base parameters are aligned
to the size of an int, so use our __is_aligned() macro instead.
Also remove the comment that indicated this "stupid subtraction" was
done to pacify some ancient and unknown Cray compiler, and which has
been there since the BSD 4.4 Lite Lib Sources were imported.
Kristof Provost [Tue, 24 Aug 2021 10:24:28 +0000 (12:24 +0200)]
pfctl: fix killing states by ID
Since the conversion to the new DIOCKILLSTATESNV the kernel no longer
exists the id and creatorid to be big-endian.
As a result killing states by id (i.e. `pfctl -k id -k 12345`) no longer
worked.
John Baldwin [Mon, 16 Aug 2021 17:42:46 +0000 (10:42 -0700)]
ktls: Fix accounting for TLS 1.0 empty fragments.
TLS 1.0 empty fragment mbufs have no payload and thus m_epg_npgs is
zero. However, these mbufs need to occupy a "unit" of space for the
purposes of M_NOTREADY tracking similar to regular mbufs. Previously
this was done for the page count returned from ktls_frame() and passed
to ktls_enqueue() as well as the page count passed to pru_ready().
However, sbready() and mb_free_notready() only use m_epg_nrdy to
determine the number of "units" of space in an M_EXT mbuf, so when a
TLS 1.0 fragment was marked ready it would mark one unit of the next
mbuf in the socket buffer as ready as well. To fix, set m_epg_nrdy to
1 for empty fragments. This actually simplifies the code as now only
ktls_frame() has to handle TLS 1.0 fragments explicitly and the rest
of the KTLS functions can just use m_epg_nrdy.
Andrew Gallatin [Wed, 11 Aug 2021 18:06:43 +0000 (14:06 -0400)]
ktls: Init reset tag task for cloned sessions
When cloning a ktls session (which is needed when we need to
switch output NICs for a NIC TLS session), we need to also
init the reset task, like we do when creating a new tls session.
John Baldwin [Tue, 15 Jun 2021 17:36:57 +0000 (10:36 -0700)]
ktls: Don't mark existing received mbufs notready for TOE TLS.
The TOE driver might receive decrypted TLS records that are enqueued
to the socket buffer after ktls_try_toe() returns and before
ktls_enable_rx() locks the receive buffer to call sb_mark_notready().
In that case, sb_mark_notready() would incorrectly treat the decrypted
TLS record as an encrypted record and schedule it for decryption.
This always resulted in the connection being dropped as the data in
the control message did not look like a valid TLS header.
To fix, don't try to handle software decryption of existing buffers in
the socket buffer for TOE TLS in ktls_enable_rx(). If a TOE TLS
driver needs to decrypt existing data in the socket buffer, the driver
will need to manage that in its tod_alloc_tls_session method.
Mitchell Horne [Wed, 11 Aug 2021 17:40:01 +0000 (14:40 -0300)]
kdb: Handle process enumeration before procinit()
Make kdb_thr_first() and kdb_thr_next() return sane values if the
allproc list and pidhashtbl haven't been initialized yet. This can
happen if the debugger is entered very early on, for example with the
'-d' boot flag.
This allows remote gdb to attach at such a time, and fixes some ddb
commands like 'show threads'.
Be explicit about the static initialization of these variables. This
part has no functional change.
Mitchell Horne [Wed, 4 Aug 2021 18:18:18 +0000 (15:18 -0300)]
arm: enable stack-smashing protection
With current generation clang/llvm it can pass all of our tests in
libc/ssp.
While here, remove the extra MACHINE_CPUARCH check for mips. SSP is
included in BROKEN_OPTIONS for this architecture in src.opts.mk, which
is enough to ensure normal builds won't set SSP_CFLAGS.
This affects powerpc64*, making binaries crash with segmentation fault
due to bad code generation around "__stack_chk_guard"
Root cause and/or proper fix is under investigation by:
https://bugs.llvm.org/show_bug.cgi?id=51590
Reviewed by: dim
MFC after: 2 days
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31698
Andrew Turner [Fri, 16 Jul 2021 13:49:33 +0000 (13:49 +0000)]
Move setting arm64 HWCAP values to the ID tables
The HWCAPS values are based on the ID registers. Move setting these
to the existing ID register parsing code.
Previously we would need to handle all possible ID field values where
a HWCAP is set, however as most ID fields follow a scheme where when
the field increments it will only add new features meaning we only
need to check if the field is greater than when the HWCAP feature
was added.
While here stop setting HWCAP value that need kernel support, but this
support is missing.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31201
Andrew Turner [Tue, 27 Jul 2021 19:28:33 +0000 (19:28 +0000)]
Teach the arm64 kernel to identify the Arm AEM
The Arm Architecture Envelope Model is a simulator that models the
architecture rather than any specific implementation. Add its part ID
macro and add it to the list of Arm CPUs we can decode.
Andrew Turner [Wed, 14 Jul 2021 15:19:06 +0000 (15:19 +0000)]
Start to clean up arm64 address space selection
On arm64 we should use bit 55 of the address to decide if aan address
is a user or kernel address. Add a new macro with this check and a
second to ensure the address is in teh canonical form, i.e.
the top bits are all zero or all one.
This will help with supporting future cpu features, including Top
Byte Ignore, Pointer Authentication, and Memory Tagging.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31179
Kristof Provost [Sat, 21 Aug 2021 11:42:27 +0000 (13:42 +0200)]
altq: Fix panics on rmc_restart()
rmc_restart() is called from a timer, but can trigger traffic. This
means the curvnet context will not be set.
Use the vnet associated with the interface we're currently processing to
set it. We also have to enter net_epoch here, for the same reason.
Dave Fullard [Fri, 16 Jul 2021 04:02:48 +0000 (23:02 -0500)]
freebsd-update: create a ZFS boot environment on install
Updated freebsd-update to allow it to create boot environments using
bectl should the system support it. The bectl utility was updated in
r352211 (490e13c1403f) to support a 'check' to determine if the system
supports boot environments. If UFS is used, the bectl check will fail
then no attempt will be made to create the boot environment.
If freebsd-update is run inside a jail, no attempt will be made to
create a boot environment.
The boot environment function will create a new environment using the
format: current FreeBSD kernel version and date/timestamp, example:
12.0-RELEASE-p10_2019-10-03_185233
This functionality can be disabled by setting 'CreateBootEnv' in
freebsd-update.conf to 'no'.
Dimitry Andric [Thu, 26 Aug 2021 18:53:18 +0000 (20:53 +0200)]
Cleanup compiler warning flags in lib/libefivar/Makefile
There is no need to set -Wno-unused-parameter twice, and instead of
appending to CFLAGS, append to CWARNFLAGS instead. While here, add
-Wno-unused-but-set-variable for the sake of clang 13.0.0.
Dimitry Andric [Thu, 26 Aug 2021 20:06:49 +0000 (22:06 +0200)]
googletest: Silence warnings about deprecated implicit copy constructors
Our copy of googletest is rather stale, and causes a number of -Werror
warnings about implicit copy constructor definitions being deprecated,
because several classes have user-declared copy assignment operators.
Silence the warnings until we either upgrade or remove googletest.
Dan Langille [Sun, 15 Aug 2021 16:53:16 +0000 (12:53 -0400)]
Enable rc.d/jail within jails
Jails with jails is a supported. This change allows the script to run
upon startup with a jail. Without this, jails are not automatically
started within jails.
Toomas Soome [Sat, 21 Aug 2021 11:25:36 +0000 (14:25 +0300)]
loader: FB console does leave garbage on screen while scrolling
Scrolling screen will leave "trail" of chars from first column.
Apparently caused by cursor location mismanagement.
Make sure we do not [attempt to] set cursor out of the screen.
Alexander Motin [Mon, 2 Aug 2021 14:50:34 +0000 (10:50 -0400)]
sched_ule(4): Pre-seed sched_random().
I don't think it changes anything, but why not.
While there, make cpu_search_highest() use all 8 lower load bits for
noise, since it does not use cs_prefer and the code is not shared
with cpu_search_lowest() any more.
Alexander Motin [Mon, 2 Aug 2021 02:42:01 +0000 (22:42 -0400)]
sched_ule(4): Use trylock when stealing load.
On some load patterns it is possible for several CPUs to try steal
thread from the same CPU despite randomization introduced. It may
cause significant lock contention when holding one queue lock idle
thread tries to acquire another one. Use of trylock on the remote
queue allows both reduce the contention and handle lock ordering
easier. If we can't get lock inside tdq_trysteal() we just return,
allowing tdq_idled() handle it. If it happens in tdq_idled(), then
we repeat search for load skipping this CPU.
On 2-socket 80-thread Xeon system I am observing dramatic reduction
of the lock spinning time when doing random uncached 4KB reads from
12 ZVOLs, while IOPS increase from 327K to 403K.
Alexander Motin [Mon, 2 Aug 2021 02:07:51 +0000 (22:07 -0400)]
sched_ule(4): Reduce duplicate search for load.
When sched_highest() called for some CPU group returns nothing, idle
thread calls it for the parent CPU group. But the parent CPU group
also includes the CPU group we've just searched, and unless there is
a race going on, it is unlikely we find anything new this time.
Avoid the double search in case of parent group having only two sub-
groups (the most prominent case). Instead of escalating to the parent
group run the next search over the sibling subgroup and escalate two
levels up after if that fail too. In case of more than two siblings
the difference is less significant, while searching the parent group
can result in better decision if we find several candidate CPUs.
On 2-socket 40-core Xeon system I am measuring ~25% reduction of CPU
time spent inside cpu_search_highest() in both SMT (2x20x2) and non-
SMT (2x20) cases.
Alexander Motin [Thu, 29 Jul 2021 01:18:50 +0000 (21:18 -0400)]
Refactor/optimize cpu_search_*().
Remove cpu_search_both(), unused for many years. Without it there is
less sense for the trick of compiling common cpu_search() into separate
cpu_search_lowest() and cpu_search_highest(), so split them completely,
making code more readable. While there, split iteration over children
groups and CPUs, complicating code for very small deduplication.
Stop passing cpuset_t arguments by value and avoid some manipulations.
Since MAXCPU bump from 64 to 256, what was a single register turned
into 32-byte memory array, requiring memory allocation and accesses.
Splitting struct cpu_search into parameter and result parts allows to
even more reduce stack usage, since the first can be passed through
on recursion.
Remove CPU_FFS() from the hot paths, precalculating first and last CPU
for each CPU group in advance during initialization. Again, it was
not a problem for 64 CPUs before, but for 256 FFS needs much more code.
With these changes on 80-thread system doing ~260K uncached ZFS reads
per second I observe ~30% reduction of time spent in cpu_search_*().
Ka Ho Ng [Mon, 19 Apr 2021 08:07:03 +0000 (16:07 +0800)]
AMD-vi: Fortify IVHD device_identify process
- Use malloc(9) to allocate ivhd_hdrs list. The previous assumption
that there are at most 10 IVHDs in a system is not true. A counter
example would be a system with 4 IOMMUs, and each IOMMU is related
to IVHDs type 10h, 11h and 40h in the ACPI IVRS table.
- Always scan through the whole ivhd_hdrs list to find IVHDs that has
the same DeviceId but less prioritized IVHD type.
Sponsored by: The FreeBSD Foundation
MFC with: 74ada297e897
Reviewed by: grehan
Approved by: lwhsu (mentor)
Differential Revision: https://reviews.freebsd.org/D29525
Ka Ho Ng [Mon, 2 Aug 2021 09:54:40 +0000 (17:54 +0800)]
vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1
In hw.vmm.create sysctl handler the maximum length of vm name is
VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is
only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to
allow the length of VM_MAX_NAMELEN for vm name.
Reviewed by: grehan
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31372
Goran Mekić [Wed, 4 Aug 2021 10:04:54 +0000 (18:04 +0800)]
sound: Add an example of basic sound application
This is an example demonstrating the usage of the OSS-compatible APIs
provided by the sound(4) subsystem. It reads frames from a dsp node and
writes them to the same dsp node.