gonzo [Thu, 3 Dec 2020 05:39:27 +0000 (05:39 +0000)]
Add support for hw.physmem tunable for ARM/ARM64/RISC-V platforms
The hw.physmem tunable limits the amount of physical memory available to the
system. It's handled in machdep files for x86 and PowerPC. This patch adds the
required logic to the consolidated physmem management interface that is used by
ARM, ARM64, and RISC-V.
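As a rough illustration of what such a limit means for a region-based
interface, the sketch below truncates a list of memory regions so that their
total size does not exceed a limit. struct mem_region and clamp_regions() are
hypothetical names chosen for this example; they are not taken from the
physmem code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; not the kernel's own structure. */
struct mem_region {
	uint64_t start;
	uint64_t size;
};

/*
 * Truncate the region list so that the total size is at most 'limit' bytes.
 * A limit of 0 means "no limit". Returns the number of regions kept.
 */
static size_t
clamp_regions(struct mem_region *r, size_t n, uint64_t limit)
{
	uint64_t total = 0;
	size_t i;

	assert(r != NULL || n == 0);
	if (limit == 0)
		return (n);
	for (i = 0; i < n; i++) {
		if (total + r[i].size >= limit) {
			/* Shrink the region that crosses the limit. */
			r[i].size = limit - total;
			return (r[i].size != 0 ? i + 1 : i);
		}
		total += r[i].size;
	}
	return (n);
}
```

With two 100-byte regions and a limit of 150, both regions are kept and the
second one is shrunk to 50 bytes.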
bdragon [Thu, 3 Dec 2020 01:39:59 +0000 (01:39 +0000)]
[PowerPC64LE] Fix LE VSX/fpr interop
In the PCB struct, we need to match the VSX register file layout
correctly, as the VSRs shadow the FPRs.
In LE, we need to have a dword of padding before the fprs so they end up
on the correct side, as the struct may be manipulated by either the FP
routines or the VSX routines.
Additionally, when saving and restoring fprs, we need to explicitly target
the fpr union member so it gets offset correctly on LE.
Fixes weirdness with FP registers in VSX-using programs (an FPR that was
saved by the FP routines but restored by the VSX routines was becoming 0
due to being loaded to the wrong side of the VSR).
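The layout issue can be sketched with a small union. This is an assumption
modelled on the description above, not a copy of the PCB definition:
union reg_slot, its members and the helper functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compilers that do not define __BYTE_ORDER__ take the BE layout here. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SLOT_LITTLE_ENDIAN 1
#else
#define SLOT_LITTLE_ENDIAN 0
#endif

/* One 128-bit register slot, viewed either as a VSR or as an FPR. */
union reg_slot {
	uint32_t vsr[4];		/* full VSR image */
	struct {
#if SLOT_LITTLE_ENDIAN
		uint64_t pad;		/* dword of padding before the FPR */
#endif
		double	 fpr;		/* FPR view of the same slot */
	} f;
};

/* Byte offset of the FPR inside the slot. */
static size_t
fpr_offset(void)
{
	return (offsetof(union reg_slot, f.fpr));
}

/* Store through the fpr member so the offset is applied automatically. */
static void
save_fpr(union reg_slot *s, double v)
{
	assert(s != NULL);
	s->f.fpr = v;
}
```

Accessing the value through the fpr member, rather than through the start of
the slot, is what keeps the FP and VSX views of the slot in agreement on both
byte orders.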
mhorne [Wed, 2 Dec 2020 21:01:52 +0000 (21:01 +0000)]
uart: allow UART_DEV_DBGPORT for fdt consoles
Allow fdt devices to be used as debug ports for gdb(4).
A debug console can be specified with the "freebsd,debug-path" property
in the device tree's /chosen node, or using the environment variable
hw.fdt.dbgport.
The device should be specified by its name in the device tree, for
example hw.fdt.dbgport="serial2".
r367917 fixed the backpressure on the netmap rxq being stopped but that
doesn't help if some other netmap rxq is starved (because it is stopping
too although the driver doesn't know this yet) and blocks the pipeline.
An alternate fix that works in all cases will be checked in instead.
mhorne [Wed, 2 Dec 2020 17:37:32 +0000 (17:37 +0000)]
em: fix a null de-reference in em_free_pci_resources
A failure in iflib_device_register() can result in
em_free_pci_resources() being called after receive queues have already
been freed. In particular, a failure to allocate IRQ resources will goto
fail_queues, where IFDI_QUEUES_FREE() will be called via
iflib_tx_structures_free(), preceding the call to IFDI_DETACH().
Cope with this by checking adapter->rx_queues before dereferencing it.
A similar check is present in ixgbe(4) and ixl(4).
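A minimal sketch of the guard, using simplified stand-in structures. struct
adapter and struct rx_queue here are hypothetical reductions, and
free_queue_irqs() is not the driver's function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct rx_queue {
	int irq_allocated;
};

struct adapter {
	struct rx_queue *rx_queues;
	int rx_num_queues;
};

/* Release per-queue IRQs; returns the number released. */
static int
free_queue_irqs(struct adapter *sc)
{
	int i, released = 0;

	assert(sc != NULL);
	/* The queues may already have been freed on an error path. */
	if (sc->rx_queues == NULL)
		return (0);
	for (i = 0; i < sc->rx_num_queues; i++) {
		if (sc->rx_queues[i].irq_allocated) {
			sc->rx_queues[i].irq_allocated = 0;
			released++;
		}
	}
	return (released);
}
```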
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27260
mmel [Wed, 2 Dec 2020 16:54:24 +0000 (16:54 +0000)]
NVME: Multiple busdma related fixes.
- in nvme_qpair_process_completions() do dma sync before completion buffer
is used.
- in nvme_qpair_submit_tracker(), also skip the explicit wmb() on arm
and arm64. Bus_dmamap_sync() on these architectures is sufficient to ensure
that all CPU stores are visible to external (including DMA) observers.
- Allocate completion buffer as BUS_DMA_COHERENT. On non-DMA-coherent systems,
buffers continuously owned (and accessed) by DMA must be allocated with this
flag. Note that BUS_DMA_COHERENT flag is no-op on DMA coherent systems
(or coherent buses in mixed systems).
markj [Wed, 2 Dec 2020 16:46:45 +0000 (16:46 +0000)]
rtsold: Fix bugs reported by Coverity
- Avoid leaking a socket if llflags_get() fails.
- Avoid leaking a file handle if rtsold_init_dumpfile() fails.
- Tighten the check in if_nametosdl() which determines whether we failed
to find the specified interface.
- Fix errno handling in an error path in rtsock_open().
markj [Wed, 2 Dec 2020 16:01:43 +0000 (16:01 +0000)]
pf: Fix table entry counter toggling
When updating a table, pf will keep existing table entry structures
corresponding to addresses that are in both of the old and new tables.
However, the update may also enable or disable per-entry counters which
are allocated separately. Thus when toggling PFR_TFLAG_COUNTERS, the
entries may be missing counters or may have unused counters allocated.
Fix the problem by modifying pfr_ina_commit() to transfer counters
from or to entries in the shadow table.
PR: 251414
Reported by: sigsys@gmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27440
rmacklem [Tue, 1 Dec 2020 23:33:10 +0000 (23:33 +0000)]
Improve man page for AmazonEFS mounts.
PR#250770 was actually just a misunderstanding of what
NFS mount options are needed for AmazonEFS mounts.
This patch attempts to clarify this in the man page.
kib [Tue, 1 Dec 2020 22:53:33 +0000 (22:53 +0000)]
lio_listio(2): send signal even if number of jobs is zero.
Right now, if lio registered zero jobs, the syscall frees the lio job
structure, cleaning up the queued ksi. As a result, the realtime signal is
dequeued and never delivered.
Fix it by allowing sendsig() to copy ksi when job count is zero.
PR: 220398
Reported and reviewed by: asomers
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27421
tsoome [Tue, 1 Dec 2020 22:28:02 +0000 (22:28 +0000)]
ficl: instead of pad, emit can use local variable
Pad in forth is used as "scratchpad" and internal implementations
should not use it. Ficl does not really follow this rule and this can backfire.
emit has no need to use pad, we can use a local variable instead.
gonzo [Tue, 1 Dec 2020 20:10:55 +0000 (20:10 +0000)]
[arm64] Bump MAXMEMDOM value to 8 to match amd64
On some of the server-grade ARM64 machines the number of NUMA domains is higher
than 2. When booting the GENERIC kernel on such machines the SRAT parser fails,
leaving the system with a single domain. To make the GENERIC kernel usable on
those servers, match the parameter value with the one for the amd64 arch.
Some USB WLAN devices have "on-board" storage showing up as umass
and making the root mount wait for a very long time.
The WLAN drivers know how to deal with that and issue an eject
command later when attaching themselves.
Introduce a quirk to not probe these devices as umass and avoid
hangs and confusion altogether.
jhb [Tue, 1 Dec 2020 17:17:22 +0000 (17:17 +0000)]
Make stack_save*() more robust on MIPS.
- Validate any stack addresses read from against td_kstack before
reading. If an unwind operation would attempt to read outside the
bounds of td_kstack, abort the unwind instead.
- For stack_save_td(), don't use the PC and SP from the current
thread, instead read the PC and SP from pcb_context[].
- For stack_save(), use the current PC and SP of the current thread,
not the values from pcb_regs (the horribly named td_frame of the
outermost trapframe). The result was that stack_trace() never
logged _any_ kernel frames but only the frame from the saved
userspace registers on entry from the kernel.
- Inline the one use of stack_register_fetch().
- Add a VALID_PC() helper macro and simplify types to remove
excessive casts in stack_capture().
- Fix stack_capture() to work on compilers written in this century.
Don't treat function epilogues as function prologues by skipping
additions to SP when searching for a function start.
- Add some comments to stack_capture() and fix some style bugs.
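The bounds validation described in the first item could look roughly like
the following. stack_addr_ok() is a hypothetical helper written for this
example and is not the code that was committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the 'len'-byte read starting at 'addr' lies entirely
 * inside the stack [base, base + size).
 */
static bool
stack_addr_ok(uintptr_t addr, uintptr_t len, uintptr_t base, uintptr_t size)
{
	assert(len != 0);
	if (addr < base || len > size)
		return (false);
	/* Compare offsets so that addr + len cannot overflow. */
	return (addr - base <= size - len);
}
```

An unwinder would call such a check before every load from the stack and
abort the unwind as soon as it fails.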
if: Fix panic when destroying vnet and epair simultaneously
When destroying a vnet and an epair (with one end in the vnet) we often
panicked. This was the result of the destruction of the epair, which destroys
both ends simultaneously, happening while vnet_if_return() was moving the
struct ifnet to its home vnet. This can result in a freed ifnet being re-added
to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
used.
Prevent this race by ensuring that vnet_if_return() cannot run at the same time
as if_detach() or epair_clone_destroy().
markj [Tue, 1 Dec 2020 16:06:31 +0000 (16:06 +0000)]
vmem: Revert r364744
A pair of bugs are believed to have caused the hangs described in the
commit log message for r364744:
1. uma_reclaim() could trigger reclamation of the reserve of boundary
tags used to avoid deadlock. This was fixed by r366840.
2. The loop in vmem_xalloc() would in some cases try to allocate more
boundary tags than the expected upper bound of BT_MAXALLOC. The
   reserve is sized based on the value BT_MAXALLOC, so this behaviour
could deplete the reserve without guaranteeing a successful
allocation, resulting in a hang. This was fixed by r366838.
Relevant vendor changes:
Issue #1258: add archive_read_support_filter_by_code()
PR #1347: mtree digest reader support
Issue #1381: skip hardlinks pointing to itself on extraction
PR #1387: fix writing of cpio archives with hardlinks without file type
PR #1388: fix rdev field in cpio format for device nodes
PR #1389: completed support for UTF-8 encoding conversion
PR #1405: more formats in archive_read_support_format_by_code()
PR #1408: fix uninitialized size in rar5_read_data
PR #1409: system extended attribute support
PR #1435: support for decompression of symbolic links in zipx archives
Issue #1456: memory leak after unsuccessful archive_write_open_filename
mhorne [Mon, 30 Nov 2020 22:16:11 +0000 (22:16 +0000)]
efibootmgr: fix an incorrect error handling check
efivar_device_path_to_unix_path() returns standard error codes on
failure and zero on success. Checking for a return value less than zero
means that the actual failure cases won't be handled. This could
manifest as a segfault during the subsequent call to printf().
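The difference between the two checks can be shown with a stand-in function
that follows the same convention (zero on success, a positive error code on
failure). lookup_path(), path_buggy() and path_fixed() are hypothetical names
used only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for a function that returns 0 or a positive errno value. */
static int
lookup_path(int should_fail, const char **out)
{
	assert(out != NULL);
	if (should_fail)
		return (ENOENT);	/* *out is left untouched on failure */
	*out = "/dev/example";
	return (0);
}

/* Buggy check: positive error codes slip through. */
static const char *
path_buggy(int should_fail)
{
	const char *p = NULL;

	if (lookup_path(should_fail, &p) < 0)
		return ("(error)");
	return (p);			/* NULL after a failure */
}

/* Corrected check: any non-zero return is treated as a failure. */
static const char *
path_fixed(int should_fail)
{
	const char *p = NULL;

	if (lookup_path(should_fail, &p) != 0)
		return ("(error)");
	return (p);
}
```

In the buggy variant a failed lookup hands a NULL pointer to the caller,
which is the kind of value that can crash a later printf("%s").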
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D27424
melifaro [Mon, 30 Nov 2020 21:59:52 +0000 (21:59 +0000)]
Move inner loop logic out of sysctl_sysctl_next_ls().
Refactor sysctl_sysctl_next_ls():
* Move huge inner loop out of sysctl_sysctl_next_ls() into a separate
non-recursive function, returning the next step to be taken.
* Update resulting node oid parts only on successful lookup
* Make sysctl_sysctl_next_ls() return boolean success/failure instead of errno,
slightly simplifying logic
markj [Mon, 30 Nov 2020 20:53:45 +0000 (20:53 +0000)]
qat: Initialize the crypto device ID to -1 instead of 0
Otherwise qat_detach() may attempt to deregister an unrelated crypto
driver if an error occurs in qat_attach() before crypto_get_driverid()
is called, since 0 is a valid driver ID.
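A sketch of the sentinel pattern, with a hypothetical struct softc and
helper names. The real driver calls into the crypto framework where the
comment indicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DRIVER_ID_NONE	(-1)	/* hypothetical sentinel; 0 is a valid ID */

struct softc {
	int driver_id;
};

/* Mark the softc as "not registered" before attach does anything else. */
static void
softc_init(struct softc *sc)
{
	assert(sc != NULL);
	sc->driver_id = DRIVER_ID_NONE;
}

/* Returns true only if there was a registration to undo. */
static bool
softc_detach(struct softc *sc)
{
	assert(sc != NULL);
	if (sc->driver_id < 0)
		return (false);
	/* A real driver would unregister sc->driver_id here. */
	sc->driver_id = DRIVER_ID_NONE;
	return (true);
}
```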
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
markj [Mon, 30 Nov 2020 20:53:25 +0000 (20:53 +0000)]
qat: Fix firmware module autoloading
If firmware_get() fails to find a loaded firmware image, it searches for
candidate KLDs to load. It will search for a KLD containing a module
with the same name as the requested image, and failing that, will load a
KLD with the same basename as the requested image.
The module name given by fw_stub.awk is simply "<mangled KLD name>_fw".
QAT firmware modules contain two images, neither of which match either
of the names used during lookup, so automatic loading of firmware images
after mountroot does not work. Work around this by using the same
string for the first image name and for the KLD basename.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
kib [Mon, 30 Nov 2020 17:03:26 +0000 (17:03 +0000)]
ffs: do not read full direct blocks if they are going to be overwritten.
BA_CLRBUF specifies that existing contents of the block will be
completely overwritten by caller, so there is no reason to spend io
fetching existing data. We do the same for indirect blocks.
markj [Mon, 30 Nov 2020 16:18:33 +0000 (16:18 +0000)]
uma: Avoid allocating buckets with the cross-domain lock held
Allocation of a bucket can trigger a cross-domain free in the bucket
zone, e.g., if the per-CPU alloc bucket is empty, we free it and get
migrated to a remote domain. This can lead to deadlocks since a bucket
zone may allocate buckets from itself or a pair of bucket zones could be
allocating from each other.
Fix the problem by dropping the cross-domain lock before allocating a
new bucket and handling refill races. Use a list of empty buckets to
ensure that we can make forward progress.
Reported by: imp, mjg (witness(9) warnings)
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27341
olivier [Mon, 30 Nov 2020 15:04:35 +0000 (15:04 +0000)]
Fix compilation on head and while here:
- remove unwanted whitespaces
- remove useless function ifphys()
- fix the Makefile to install it into /usr/bin
manu [Mon, 30 Nov 2020 14:48:50 +0000 (14:48 +0000)]
arm: allwinner: aw_mmc: Add a sysctl for debugging
Add a new hw.aw_mmc.debug sysctl to help debugging the driver.
Bit 0 will debug card changes (removal, insertion, power up/down)
Bit 1 will debug ios changes
Bit 2 will debug interrupts received
Bit 3 will debug commands sent
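The bit assignments above map onto a mask like the following. The macro
names and debug_enabled() are hypothetical; only the bit positions come from
the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names are hypothetical; bit positions follow the list above. */
#define DBG_CARD	(1u << 0)	/* card removal, insertion, power */
#define DBG_IOS		(1u << 1)	/* ios changes */
#define DBG_INT		(1u << 2)	/* interrupts received */
#define DBG_CMD		(1u << 3)	/* commands sent */

/* Test whether 'flag' is set in the value read from the sysctl. */
static bool
debug_enabled(unsigned int sysctl_value, unsigned int flag)
{
	assert(flag != 0);
	return ((sysctl_value & flag) != 0);
}
```

For example, a value of 9 (bits 0 and 3) would enable card-change and
command debugging only.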
tsoome [Mon, 30 Nov 2020 08:22:40 +0000 (08:22 +0000)]
Add VT driver for VBE framebuffer device
Implement vt_vbefb to support VESA BIOS Extensions (VBE) framebuffer with VT.
vt_vbefb is based on vt_efifb and assumes similar data for
initialization; it uses MODINFOMD_VBE_FB to identify the structure vbe_fb
in kernel metadata.
struct vbe_fb is populated by the boot loader and is passed to the kernel via
the metadata payload.
mmel [Mon, 30 Nov 2020 07:01:12 +0000 (07:01 +0000)]
NVME: Don't try to swap data on little endian machines.
These swapping functions violate BUSDMA contract - we cannot write
to armed (by bus_dmamap_sync(PRE_..)) buffers. Remove them at least
from little endian machines until a better solution is developed.
melifaro [Sun, 29 Nov 2020 19:43:33 +0000 (19:43 +0000)]
Remove RADIX_MPATH config option.
ROUTE_MPATH is the new config option controlling new multipath routing
implementation. Remove the last pieces of RADIX_MPATH-related code and
the config option.
kib [Sun, 29 Nov 2020 19:06:32 +0000 (19:06 +0000)]
Reduce MAXPHYS back to 128KB on 32bit architectures.
Some of them have limited KVA, like arm, which prevents startup from
allocating the needed number of large pbufs. Others, for instance i386,
are unbalanced enough after 4/4 that a blind bump is probably harmful
because it allows for much more in-flight io than other tunables are
ready for.
Requested by: mmel
Reviewed by: emaste, mmel
Sponsored by: The FreeBSD Foundation
mmel [Sun, 29 Nov 2020 18:59:01 +0000 (18:59 +0000)]
Store MPIDR register in pcpu.
MPIDR represents the physical locality of a given core and should be used as
the only viable/robust connection between cpuid (which has no relation to
core topology) and an external description (for example in FDT). It can be
used to determine which interrupt is associated with a given per-CPU PMU,
or by the scheduler to determine big/little core or cluster topology.
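For reference, the affinity fields can be pulled out of the register value
as follows. The sketch assumes the AArch64 MPIDR_EL1 layout (Aff0 in bits
7:0, Aff1 in 15:8, Aff2 in 23:16, Aff3 in 39:32); struct mpidr_aff and
mpidr_decode() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Affinity fields of MPIDR_EL1 (AArch64 layout assumed). */
struct mpidr_aff {
	uint8_t aff0;	/* bits  7:0  */
	uint8_t aff1;	/* bits 15:8  */
	uint8_t aff2;	/* bits 23:16 */
	uint8_t aff3;	/* bits 39:32 */
};

/* Split a raw MPIDR value into its four affinity levels. */
static struct mpidr_aff
mpidr_decode(uint64_t mpidr)
{
	struct mpidr_aff a;

	a.aff0 = (uint8_t)(mpidr & 0xff);
	a.aff1 = (uint8_t)((mpidr >> 8) & 0xff);
	a.aff2 = (uint8_t)((mpidr >> 16) & 0xff);
	a.aff3 = (uint8_t)((mpidr >> 32) & 0xff);
	return (a);
}
```

How the affinity levels correspond to cores and clusters is
implementation-defined, so any topology derived from them should be
cross-checked against the platform description.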
andrew [Sun, 29 Nov 2020 16:22:33 +0000 (16:22 +0000)]
Only set the PCI bus end when we are reducing it
We read the bus end value from the _CRS method. On some systems we need
to further limit it based on the MCFG table.
Support this by setting a default value, then updating it if needed from
_CRS, and finally reducing it if it is past the end of the MCFG table.
This supports systems that use either method to encode this
value.
This partially reverts r347929, removing the error printf.
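The selection order can be sketched as below. pick_bus_end() and the default
of 255 (the highest PCI bus number) are assumptions made for this example
and may differ from what the driver uses.

```c
#include <assert.h>
#include <stdbool.h>

#define BUS_END_DEFAULT	255	/* assumed default: highest PCI bus number */

/*
 * 'have_crs' says whether _CRS supplied a bus range, 'crs_end' is its last
 * bus, and 'mcfg_end' is the last bus covered by the MCFG entry.
 */
static int
pick_bus_end(bool have_crs, int crs_end, int mcfg_end)
{
	int bus_end = BUS_END_DEFAULT;		/* 1. start from the default */

	assert(mcfg_end >= 0 && mcfg_end <= BUS_END_DEFAULT);
	if (have_crs)
		bus_end = crs_end;		/* 2. update from _CRS */
	if (bus_end > mcfg_end)
		bus_end = mcfg_end;		/* 3. only ever reduce */
	return (bus_end);
}
```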
Reviewed by: philip
Tested by: philip, Andrey Fesenko <f0andrey_gmail.com>
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D27274
melifaro [Sun, 29 Nov 2020 13:52:06 +0000 (13:52 +0000)]
Add nhop_ref_any() to unify referencing nhop or nexthop group.
It allows code within routing subsystem to transparently reference nexthops
and nexthop groups, similar to nhop_free_any(), abstracting ROUTE_MPATH
details.
melifaro [Sun, 29 Nov 2020 13:41:49 +0000 (13:41 +0000)]
Refactor fib4/fib6 functions.
No functional changes.
* Make lookup part of fib<4|6>_lookup_debugnet() separate functions
(fib<4|6>_lookup_rt()). These will be used in the control plane code
requiring unlocked radix operations and actual prefix pointer.
* Make lookup part of fib<4|6>_check_urpf() separate functions.
This change simplifies the switch to alternative lookup implementations,
which helps algorithmic lookups introduction.
* While here, use static initializers for IPv4/IPv6 keys
melifaro [Sun, 29 Nov 2020 13:27:24 +0000 (13:27 +0000)]
Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors.
The resulting KPI can be used by routing table consumers to estimate the required
scale for route table export.
* Add tracking for rib routes
* Add accessors for number of nexthops/nexthop objects
* Simplify rib_unsubscribe: store rnh we're attached to instead of looking it up
again on destruction. This helps in the cases when rnh is not linked yet/already unlinked.
kib [Sun, 29 Nov 2020 10:30:56 +0000 (10:30 +0000)]
bio aio: Destroy ephemeral mapping before unwiring page.
Apparently some architectures, like ppc in its hashed page tables
variants, account mappings by pmap_qenter() in the response from
pmap_is_page_mapped().
While there, eliminate useless userp variable.
Noted and reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27409
mmel [Sun, 29 Nov 2020 08:40:12 +0000 (08:40 +0000)]
Remove the pre-ARMv6 and pre-INTRNG code.
ARM has required ARMv6+ and INTRNG for some time now, so remove
always false #ifdefs and unconditionally do always true #ifdefs.
mav [Sun, 29 Nov 2020 00:20:31 +0000 (00:20 +0000)]
Increase nvme(4) maximum transfer size from 1MB to 2MB.
With 4KB page size, 2MB is the maximum we can address with one page of PRP
entries. Going further would require chaining, which would add more complexity.
On the other hand, to reduce memory consumption, allocate the PRP memory
respecting the maximum transfer size reported in the controller identify data.
Many NVMe devices support much smaller values, starting from 128KB.
To do that we have to change the initialization sequence to pull the data
earlier, before setting up the I/O queue pairs. The admin queue pair is
still allocated for full MIN(maxphys, 2MB) size, but it is not a big deal,
since there is only one such queue with only 16 trackers.
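The 2MB figure follows from the size of a PRP entry: one 4KB page holds 512
eight-byte entries, and each entry addresses one 4KB page. A small sketch of
that arithmetic, with a hypothetical max_xfer() helper showing how a smaller
controller-reported limit would cap it:

```c
#include <assert.h>
#include <stdint.h>

#define PRP_ENTRY_SIZE	8u	/* each PRP entry is a 64-bit address */

/* Largest transfer addressable through a single page of PRP entries. */
static uint64_t
prp_list_max_xfer(uint32_t page_size)
{
	assert(page_size >= PRP_ENTRY_SIZE);
	return ((uint64_t)(page_size / PRP_ENTRY_SIZE) * page_size);
}

/* Hypothetical helper: cap by a controller-reported limit (0 = none). */
static uint64_t
max_xfer(uint32_t page_size, uint64_t ctrlr_limit)
{
	uint64_t max = prp_list_max_xfer(page_size);

	if (ctrlr_limit != 0 && ctrlr_limit < max)
		max = ctrlr_limit;
	return (max);
}
```

With a 4096-byte page this gives 512 * 4096 = 2097152 bytes; a device
reporting a 128KB limit would need only 32 entries per tracker.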
kib [Sat, 28 Nov 2020 12:12:51 +0000 (12:12 +0000)]
Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allows several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
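The need for the extra page can be checked with a little arithmetic: a
buffer of maxphys bytes that does not start on a page boundary touches one
page more than an aligned one. The sketch assumes a 4096-byte page;
pages_spanned() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED	4096u	/* assumption for the example */

/* Number of pages touched by 'len' bytes starting at address 'addr'. */
static uint64_t
pages_spanned(uint64_t addr, uint64_t len)
{
	uint64_t first, last;

	assert(len != 0);
	first = addr / PAGE_SIZE_ASSUMED;
	last = (addr + len - 1) / PAGE_SIZE_ASSUMED;
	return (last - first + 1);
}
```

For a 1MB buffer this yields 256 pages when the start is page-aligned and
257 pages when it starts 512 bytes into a page.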
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except the place which initializes maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embedded structures,
get a private MAXPHYS-like constant; their conversion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, were either submitted by, or based on changes by mav.