mhorne [Mon, 30 Nov 2020 22:16:11 +0000 (22:16 +0000)]
efibootmgr: fix an incorrect error handling check
efivar_device_path_to_unix_path() returns standard error codes on
failure and zero on success. Checking for a return value less than zero
means that the actual failure cases won't be handled. This could
manifest as a segfault during the subsequent call to printf().
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D27424
melifaro [Mon, 30 Nov 2020 21:59:52 +0000 (21:59 +0000)]
Move inner loop logic out of sysctl_sysctl_next_ls().
Refactor sysctl_sysctl_next_ls():
* Move huge inner loop out of sysctl_sysctl_next_ls() into a separate
non-recursive function, returning the next step to be taken.
* Update resulting node oid parts only on successful lookup
* Make sysctl_sysctl_next_ls() return boolean success/failure instead of errno,
slightly simplifying logic
markj [Mon, 30 Nov 2020 20:53:45 +0000 (20:53 +0000)]
qat: Initialize the crypto device ID to -1 instead of 0
Otherwise qat_detach() may attempt to deregister an unrelated crypto
driver if an error occurs in qat_attach() before crypto_get_driverid()
is called, since 0 is a valid driver ID.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
markj [Mon, 30 Nov 2020 20:53:25 +0000 (20:53 +0000)]
qat: Fix firmware module autoloading
If firmware_get() fails to find a loaded firmware image, it searches for
candidate KLDs to load. It will search for a KLD containing a module
with the same name as the requested image, and failing that, will load a
KLD with the same basename as the requested image.
The module name given by fw_stub.awk is simply "<mangled KLD name>_fw".
QAT firmware modules contain two images, neither of which match either
of the names used during lookup, so automatic loading of firmware images
after mountroot does not work. Work around this by using the same
string for the first image name and for the KLD basename.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
kib [Mon, 30 Nov 2020 17:03:26 +0000 (17:03 +0000)]
ffs: do not read full direct blocks if they are going to be overwritten.
BA_CLRBUF specifies that existing context of the block will be
completely overwritten by caller, so there is no reason to spend io
fetching existing data. We do the same for indirect blocks.
markj [Mon, 30 Nov 2020 16:18:33 +0000 (16:18 +0000)]
uma: Avoid allocating buckets with the cross-domain lock held
Allocation of a bucket can trigger a cross-domain free in the bucket
zone, e.g., if the per-CPU alloc bucket is empty, we free it and get
migrated to a remote domain. This can lead to deadlocks since a bucket
zone may allocate buckets from itself or a pair of bucket zones could be
allocating from each other.
Fix the problem by dropping the cross-domain lock before allocating a
new bucket and handling refill races. Use a list of empty buckets to
ensure that we can make forward progress.
Reported by: imp, mjg (witness(9) warnings)
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27341
olivier [Mon, 30 Nov 2020 15:04:35 +0000 (15:04 +0000)]
Fix compilation on head and while here:
- remove unwanted whitespaces
- remove useless function ifphys()
- fix the Makefile to install it into /usr/bin
manu [Mon, 30 Nov 2020 14:48:50 +0000 (14:48 +0000)]
arm: allwinner: aw_mmc: Add a sysctl for debuging
Add a new hw.aw_mmc.debug sysctl to help debugging the driver.
Bit 0 will debug card changes (removal, insertion, power up/down)
Bit 1 will debug ios changes
Bit 2 will debug interrupts received
Bit 3 will debug commands sent
tsoome [Mon, 30 Nov 2020 08:22:40 +0000 (08:22 +0000)]
Add VT driver for VBE framebuffer device
Implement vt_vbefb to support Vesa Bios Extensions (VBE) framebuffer with VT.
vt_vbefb is built based on vt_efifb and is assuming similar data for
initialization, use MODINFOMD_VBE_FB to identify the structure vbe_fb
in kernel metadata.
struct vbe_fb, is populated by boot loader, and is passed to kernel via
metadata payload.
mmel [Mon, 30 Nov 2020 07:01:12 +0000 (07:01 +0000)]
NVME: Don't try to swap data on little endian machines.
These swapping functions violate BUSDMA contract - we cannot write
to armed (by bus_dmamap_sync(PRE_..)) buffers. Remove them at least
from little endian machines until a better solution will be developed.
melifaro [Sun, 29 Nov 2020 19:43:33 +0000 (19:43 +0000)]
Remove RADIX_MPATH config option.
ROUTE_MPATH is the new config option controlling new multipath routing
implementation. Remove the last pieces of RADIX_MPATH-related code and
the config option.
kib [Sun, 29 Nov 2020 19:06:32 +0000 (19:06 +0000)]
Reduce MAXPHYS back to 128KB on 32bit architectures.
Some of them have limited KVA, like arm, which prevents startup from
allocating needed number of large pbufs. Other, for instance i386,
are dis-balanced enough after 4/4 that blind bump is probably harmful
because it allows for much more in-flight io than other tunables are
ready for.
Requested by: mmel
Reviewed by: emaste, mmel
Sponsored by: The FreeBSD Foundation
mmel [Sun, 29 Nov 2020 18:59:01 +0000 (18:59 +0000)]
Store MPIDR register in pcpu.
MPIDR represents physical locality of given core and it should be used as
the only viable/robust connection between cpuid (which have zero relation to
cores topology) and external description (for example in FDT). It can be
used for determining which interrupt is associated to given per-CPU PMU
or by scheduler for determining big/little core or cluster topology.
andrew [Sun, 29 Nov 2020 16:22:33 +0000 (16:22 +0000)]
Only set the PCI bus end when we are reducing it
We read the bus end value from the _CRS method. On some systems we need
to further limit it based on the MCFG table.
Support this by setting a default value, then update it if needed in the
_CRS table, and finally reduce it if it is past the end of the MCFG tabel.
This will allow for both systems that use either method to encode this
value.
This partially reverts r347929, removing the error printf.
Reviewed by: philip
Tested by: philip, Andrey Fesenko <f0andrey_gmail.com>
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D27274
melifaro [Sun, 29 Nov 2020 13:52:06 +0000 (13:52 +0000)]
Add nhop_ref_any() to unify referencing nhop or nexthop group.
It allows code within routing subsystem to transparently reference nexthops
and nexthop groups, similar to nhop_free_any(), abstracting ROUTE_MPATH
details.
melifaro [Sun, 29 Nov 2020 13:41:49 +0000 (13:41 +0000)]
Refactor fib4/fib6 functions.
No functional changes.
* Make lookup path of fib<4|6>_lookup_debugnet() separate functions
(fib<46>_lookup_rt()). These will be used in the control plane code
requiring unlocked radix operations and actual prefix pointer.
* Make lookup part of fib<4|6>_check_urpf() separate functions.
This change simplifies the switch to alternative lookup implementations,
which helps algorithmic lookups introduction.
* While here, use static initializers for IPv4/IPv6 keys
melifaro [Sun, 29 Nov 2020 13:27:24 +0000 (13:27 +0000)]
Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors.
The resulting KPI can be used by routing table consumers to estimate the required
scale for route table export.
* Add tracking for rib routes
* Add accessors for number of nexthops/nexthop objects
* Simplify rib_unsubscribe: store rnh we're attached to instead of requiring it up
again on destruction. This helps in the cases when rnh is not linked yet/already unlinked.
kib [Sun, 29 Nov 2020 10:30:56 +0000 (10:30 +0000)]
bio aio: Destroy ephemeral mapping before unwiring page.
Apparently some architectures, like ppc in its hashed page tables
variants, account mappings by pmap_qenter() in the response from
pmap_is_page_mapped().
While there, eliminate useless userp variable.
Noted and reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27409
mmel [Sun, 29 Nov 2020 08:40:12 +0000 (08:40 +0000)]
Remove the pre-ARMv6 and pre-INTRNG code.
ARM has required ARMV6+ and INTRNg for some time now, so remove
always false #ifdefs and unconditionally do always true #ifdefs.
mav [Sun, 29 Nov 2020 00:20:31 +0000 (00:20 +0000)]
Increase nvme(4) maximum transfer size from 1MB to 2MB.
With 4KB page size the 2MB is the maximum we can address with one page PRP.
Going further would require chaining, that would add some more complexity.
On the other side, to reduce memory consumption, allocate the PRP memory
respecting maximum transfer size reported in the controller identify data.
Many of NVMe devices support much smaller values, starting from 128KB.
To do that we have to change the initialization sequence to pull the data
earlier, before setting up the I/O queue pairs. The admin queue pair is
still allocated for full MIN(maxphys, 2MB) size, but it is not a big deal,
since there is only one such queue with only 16 trackers.
kib [Sat, 28 Nov 2020 12:12:51 +0000 (12:12 +0000)]
Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
kevans [Sat, 28 Nov 2020 01:21:11 +0000 (01:21 +0000)]
kern: cpuset: drop the lock to allocate domainsets
Restructure the loop a little bit to make it a little more clear how it
really operates: we never allocate any domains at the beginning of the first
iteration, and it will run until we've satisfied the amount we need or we
encounter an error.
The lock is now taken outside of the loop to make stuff inside the loop
easier to evaluate w.r.t. locking.
This fixes it to not try and allocate any domains for the freelist under the
spinlock, which would have happened before if we needed any new domains.
emaste [Fri, 27 Nov 2020 21:38:03 +0000 (21:38 +0000)]
addr2line: add label checks when DW_AT_range and DW_AT_low_pc cannot be used
Check label's ranges for address we want to translate if a CU doesn't
have usable DW_AT_range or DW_AT_low_pc.
Use more appropriate names: "struct CU" -> "struct range"
Developed as part of upstream ELF Tool Chain bug report
https://sourceforge.net/p/elftoolchain/tickets/552/ although this does
not address the specific case reported there.
Submitted by: Tiger Gao <tig@freebsdfoundation.org>
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23782
br [Fri, 27 Nov 2020 21:37:48 +0000 (21:37 +0000)]
o Move options IOMMU from Debugging section back to the Bus section
where it originally was. The bug introduced in r366267.
o Remove options IOMMU from i386/MINIMAL as we don't have it in
i386/GENERIC.
mav [Fri, 27 Nov 2020 15:50:20 +0000 (15:50 +0000)]
Some code reorganization.
- Remove code duplication by adding two new functions to execute prepared
queue entry via either mbox or request queue and wait for result.
- Since the new function executing via request queue sleeps any way, make
it sleep also in case of overflows or handle shortages. It should make it
more reliable and less affecting other less flexible request queue users.
- Turn isp_target_put_entry() into not target-specific isp_send_entry().
- Make handling of responses with control handles more universal.
- Move RQSTYPE_RPT_ID_ACQ handling into new function.
- Inline isp_handle_other_response(), becoming trivial after above.
- Clean the list of IOCBs from pre-24xx ones.
se [Fri, 27 Nov 2020 09:00:21 +0000 (09:00 +0000)]
Make generated C files depend on this Makefile
The contents of lib.c, lib2.c, bc_help.c, and dc_help.c depends on the
parameters passed to strgen.sh in this Makefile. A change to the number
of parameters of strgen.sh has been applied to the invocation of this
command, but this did not cause a rebuild of the generated files.
bcran [Fri, 27 Nov 2020 08:00:32 +0000 (08:00 +0000)]
Fix bhyve SMBIOS type 19 handling to avoid misreporting total RAM amount
This fixes the amount of memory displayed in the EDK2 UiApp to be the same
as passed on the bhyve command line. Otherwise, 8GB is displayed as 4GB,
32GB as 28GB etc.
bcran [Fri, 27 Nov 2020 07:53:15 +0000 (07:53 +0000)]
bhyve: fix smbiostbl.c style issues and add comment about date format
Fix a couple of style issues introduced in my previous commit.
Add a comment explaining that the SMBIOS specification defines the date
format to be mm/dd/yyyy, which is why we don't use ISO 8601.
manu [Thu, 26 Nov 2020 20:22:34 +0000 (20:22 +0000)]
arm64: Do not rely on SPCR table to detect acpi
Since EDK2 commit d8e36289cef7bde628b023219cd65fa8e8d4562a, the Graphical console may
completely hide SPCR, causing panics later when locating timers.
As such simply rely on the ACPI Root pointer presence.
kib [Thu, 26 Nov 2020 18:16:32 +0000 (18:16 +0000)]
nullfs: provide custom bypass for VOP_READ_PGCACHE().
Normal bypass expects locked vnode, which is not true for
VOP_READ_PGCACHE(). Ensure liveness of the lower vnode by taking the
upper vnode interlock, which is also taked by null_reclaim() when
setting v_data to NULL.
Reported and tested by: pho
Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27327
kib [Thu, 26 Nov 2020 18:08:42 +0000 (18:08 +0000)]
More careful handling of the mount failure.
- VFS_UNMOUNT() requires vn_start_write() around it [*].
- call VFS_PURGE() before unmount.
- do not destroy mp if cleanup unmount did not succeed.
- set MNTK_UNMOUNT, and indicate forced unmount with MNTK_UNMOUNTF
for VFS_UNMOUNT() in cleanup.
arichardson [Thu, 26 Nov 2020 17:37:27 +0000 (17:37 +0000)]
bsd.lib.mk: Work around build system raciness
We are seeing regular build failures due to libc.so being installed again and
another parallel make job tries to read the partially written libc.so at the
same time. When building with -j32 or higher this almost always happens on
the first clean build (subsequent incremental builds always work fine).
Using -S should "fix" the "section header table goes past the end of the
file: e_shoff = 0x..." errors that have started to plague our builds.
We originally thought this only affected CheriBSD, but I just got the same
error while building the latest upstream FreeBSD.
The real fix should be to not install libraries twice, but until then this
workaround is needed.
Original patch by jrtc27@, I only made some minor changes to the comment.
uqs [Thu, 26 Nov 2020 14:42:16 +0000 (14:42 +0000)]
GH Actions: Use pre-installed clang packages
Also fix the run by setting up the environment in non-deprecated way.
Always run with --debug to understand better what sort of stuff is happening in
the background. Also split out the bmake bootstrap stage (takes about 31s on
ubuntu, but 1m14 on macOS?)
Drops the dependency on coreutils (realpath, nproc) and thus (?) fixes macOS to
be just as fast (4 logical cores vs 2 physical cores before, go figure.)
arichardson [Thu, 26 Nov 2020 13:31:57 +0000 (13:31 +0000)]
Significantly speed up libthr/mutex_test and make more reliable
Instead of using a simple global++ as the data race, with this change we
perform the increment by loading the global, delaying for a bit and then
storing back the incremented value. If I move the increment outside of the
mutex protected range, I can now see the data race with only 100 iterations
on amd64 in almost all cases. Before this change such a racy test almost
always passed with < 100,000 iterations and only reliably failed with the
current limit of 10 million.
I noticed this poorly written test because the mutex:mutex{2,3} and
timedmutex:mutex{2,3} tests were always timing out on our CheriBSD Jenkins.
Writing good concurrency tests is hard so I won't attempt to do so, but this
change should make the test more likely to fail if pthread_mutex_lock is not
implemented correctly while also significantly reducing the time it takes to
run these four tests. It will also reduce the time it takes for QEMU RISC-V
testsuite runs by almost 40 minutes (out of currently 7 hours).