It was realized just a little too late that this was a hack that belonged in
individual regex(3)-using applications. It was surrounded in NOTYET and not
implemented in the engine, so remove it.
kevans [Sat, 5 Dec 2020 03:16:05 +0000 (03:16 +0000)]
libregex: implement \b and \B (word boundary, not word boundary)
This is the last of the needed GNU expressions before we can unleash bsdgrep
by default. \b is effectively an agnostic equivalent of \< and \>, while
\B will match every space that isn't making a transition from
nonchar -> char or char -> nonchar.
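As a rough illustration of the semantics described above, the sketch below
uses Python's re module, whose \b and \B behave analogously. It is only a
stand-in: it is not libregex, and details such as what counts as a word
character may differ.

```python
import re

subject = "cat concat cats"

# \bcat\b: "cat" with a word boundary on both sides (comparable to \<cat\>).
whole_words = [m.start() for m in re.finditer(r"\bcat\b", subject)]

# \Bcat: "cat" that does NOT start at a boundary, i.e. inside a longer word.
inside_words = [m.start() for m in re.finditer(r"\Bcat", subject)]

print(whole_words)   # [0]
print(inside_words)  # [7]
```

Only the standalone "cat" satisfies \bcat\b, while \Bcat finds the "cat"
embedded in "concat".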
kevans [Sat, 5 Dec 2020 03:13:47 +0000 (03:13 +0000)]
libregex: implement \` and \' (begin-of-subj, end-of-subj)
These are GNU extensions, generally equivalent to ^ and $ except that the
new syntax will not match beginning of line after the first in a multi-line
expression or the end of line before absolute last in a multi-line
expression.
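The difference from ^ and $ can be sketched with Python's re module, where
\A and \Z play roughly the role of \` and \'. This is an analogy under that
assumption: Python's re does not accept the GNU spellings, and libregex may
differ in corner cases.

```python
import re

subject = "first\nsecond\nthird"

# With MULTILINE, ^ matches at the start of every line.
line_starts = [m.start() for m in re.finditer(r"^\w+", subject, re.MULTILINE)]

# \A matches only at the very start of the subject, even with MULTILINE.
subject_start = [m.start() for m in re.finditer(r"\A\w+", subject, re.MULTILINE)]

# \Z matches only at the very end of the subject, even with MULTILINE.
subject_end = [m.group() for m in re.finditer(r"\w+\Z", subject, re.MULTILINE)]

print(line_starts)    # [0, 6, 13]
print(subject_start)  # [0]
print(subject_end)    # ['third']
```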
kevans [Sat, 5 Dec 2020 02:21:58 +0000 (02:21 +0000)]
Retire GNU_GREP_COMPAT knob
This was introduced and then disabled by default primarily to avoid dealing
with bugs in libgnuregex. rS363823 switched to using libregex for it, so
let's just rip the option out now so we can make sure we're getting tested
with libregex via bsdgrep.
cem [Sat, 5 Dec 2020 00:33:28 +0000 (00:33 +0000)]
Add CFI start/end proc directives to arm64, i386, and ppc
Follow-up to r353959 and r368070: do the same for other architectures.
arm32 already seems to use its own .fnstart/.fnend directives, which
appear to be ARM-specific variants of the same thing. Likewise, MIPS
uses .frame directives.
imp [Fri, 4 Dec 2020 21:34:48 +0000 (21:34 +0000)]
nvme: Remove a wmb() that's not necessary.
bus_dmamap_sync() ensures that memory that's prepared for PREWRITE can
be DMA'd immediately after it returns. The details differ, but this
mirrors atomic thread release semantics, at least for the buffers
synced.
For non-x86 platforms, bus_dmamap_sync() has the right syncing and
fences. So in the past, wmb() had been omitted for them.
For x86 platforms, the memory ordering is already strong enough to
ensure DMA to the device sees the current contents. As such, we don't
need the wmb() here. It translates to an sfence which is only needed
for writes to regions that have the write combining attribute set or
when some exotic opcodes are used. The nvme driver does neither of
these. Since bus_dmamap_sync() includes atomic_thread_fence_rel, we
can be assured any optimizer won't reorder the bus_dmamap_sync and the
bus_space_write operations. The wmb() was a vestige of the pre-busdma
version initially committed to the tree.
imp [Fri, 4 Dec 2020 21:34:04 +0000 (21:34 +0000)]
busdma: Annotate bus_dmamap_sync() with fence
Add an explicit thread fence release before returning from
bus_dmamap_sync. This should be a no-op in practice, but makes
explicit that all ordinary stores will be completed before subsequent
reads/writes to ordinary device memory. On x86, normal memory ordering
is strong enough to generally guarantee this. The fence keeps the
optimizer (likely LTO) from reordering other calls around this.
The other architectures already have calls, as appropriate, that
are equivalent.
Note: On x86, there is one exception to this rule. If you've mapped
memory as write combining, then you will need to add an sfence or
similar. Normally, though, busdma doesn't operate on such memory, and
drivers that do already cope appropriately.
mhorne [Fri, 4 Dec 2020 21:12:17 +0000 (21:12 +0000)]
ossl: port to arm64
Enable in-kernel acceleration of SHA1 and SHA2 operations on arm64 by adding
support for the ossl(4) crypto driver. This uses OpenSSL's assembly routines
under the hood, which will detect and use SHA intrinsics if they are
supported by the CPU.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27390
The kernel stack unwinder assumes that any jr $ra indicates the end
of the current function. However, modern compilers generate code
that contains jr $ra at various places inside the function.
- Handle LLD inter-function padding when looking for the start of a
function.
- Use call site for symbol name/offset when unwinding
Currently we use the return address, which will normally just give
an output that's off by 8 from the actual call site. However, for
tail calls, this is particularly bad, as we end up printing the
symbol name for the function that comes after the one that made the
call. Instead we should go back two instructions from the return
address for the unwound program counter.
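The call-site adjustment can be sketched as follows. This is a simplified
model, not the kernel code: the symbol table, its addresses, and the helper
names are hypothetical. It relies only on the fact that MIPS instructions are
4 bytes wide and that a call is followed by a delay slot, so the return
address is two instructions past the call.

```python
import bisect

MIPS_INSN_SIZE = 4  # bytes per MIPS instruction (fixed-width encoding)

# Hypothetical symbol table, sorted by start address.
SYMBOLS = [(0x1000, "foo"), (0x1040, "bar")]


def call_site_pc(return_address):
    """Step back over the delay slot and the call instruction itself."""
    return return_address - 2 * MIPS_INSN_SIZE


def symbolize(pc):
    """Return (name, offset) of the symbol containing pc."""
    starts = [start for start, _ in SYMBOLS]
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0:
        raise ValueError("pc is below the first known symbol")
    start, name = SYMBOLS[i]
    return name, pc - start


# foo ends with a call at 0x1038 plus its delay slot at 0x103c, so the
# return address is 0x1040, which is already the first byte of bar.
ra = 0x1040
print(symbolize(ra))                # ('bar', 0)
print(symbolize(call_site_pc(ra)))  # ('foo', 56)
```

Symbolizing the raw return address blames the following function, while the
adjusted program counter resolves to the function that made the call.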
arichardson [Fri, 4 Dec 2020 15:53:37 +0000 (15:53 +0000)]
crunchgen: fix NULL-deref bug introduced in r364647
While porting over the local changes from CheriBSD for upstreaming, I
accidentally committed a broken version of find_entry_point(): we have to
return NULL if the value is not found instead of a value with
ep->name == NULL, since the checks in main were changed to check ep instead
of ep->name for NULL.
This only matters if the crunched tool cannot be found using normal lookup
and one of the fallback paths is used, so it's unlikely to be triggered
in rescue. However, I noticed that one of our CheriBSD test scripts was
failing to run commands under `su` on minimal disk images where all
binaries are hardlinks to a `cheribsdbox` tool generated with crunchgen.
This also updates the bootstrapping check in Makefile.inc1 to bootstrap
crunchgen up to the next version bump.
kevans [Fri, 4 Dec 2020 15:21:12 +0000 (15:21 +0000)]
gnu: don't build libgnuregex for WITH_GNU_GREP_COMPAT
bsdgrep switched over to libregex back in r363823 to fill
WITH_GNU_GREP_COMPAT, since libgnuregex in base is quite buggy and libregex
is somewhat functional. Don't build libgnuregex on our account, please.
hselasky [Fri, 4 Dec 2020 14:50:55 +0000 (14:50 +0000)]
Fix definition of int64_t and uint64_t when long is 64-bit. This gets the kernel
shim code in line with the rest of the kernel, sys/x86/include/_types.h.
PRR improves loss recovery and avoids RTOs in a wide range
of scenarios (ACK thinning) over regular SACK loss recovery.
PRR is disabled by default, enable by net.inet.tcp.do_prr = 1.
Performance may be impeded by token bucket rate policers at
the bottleneck, where net.inet.tcp.do_prr_conservate = 1
should be enabled in addition.
kevans [Fri, 4 Dec 2020 04:39:48 +0000 (04:39 +0000)]
kern: soclose: don't sleep on SO_LINGER w/ timeout=0
This is a valid scenario that's handled in the various protocol layers where
it makes sense (e.g., tcp_disconnect and sctp_disconnect). Given that it
indicates we should immediately drop the connection, it makes little sense
to sleep on it.
This could lead to panics with INVARIANTS. On non-INVARIANTS kernels, this
could result in the thread hanging until a signal interrupts it if the
protocol does not mark the socket as disconnected for whatever reason.
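For reference, a userland program requests this behaviour roughly as
sketched below. The helper names are hypothetical, and the packing assumes
that struct linger is two C ints (l_onoff, l_linger), as it is on FreeBSD and
Linux.

```python
import socket
import struct


def linger_option(seconds):
    """Pack struct linger { int l_onoff; int l_linger; } with lingering on."""
    return struct.pack("ii", 1, seconds)


def request_immediate_drop(sock):
    """Enable SO_LINGER with a zero timeout on an existing socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_option(0))
```

With the option set this way, close() on the socket is expected to drop the
connection right away rather than wait for queued data to drain.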
melifaro [Thu, 3 Dec 2020 22:23:57 +0000 (22:23 +0000)]
Add IPv4/IPv6 rtentry prefix accessors.
Multiple consumers like ipfw, netflow or new route lookup algorithms
need to get the prefix data out of struct rtentry.
Instead of providing direct access to the rtentry, create IPv4/IPv6
accessors to abstract struct rtentry internals and avoid including
internal routing headers for external consumers.
While here, move struct route_nhop_data to the public header, so external
consumers can actually use lookup functions returning rt&nhop data.
jhb [Thu, 3 Dec 2020 22:06:08 +0000 (22:06 +0000)]
Clear TLS offload mode if a TLS socket hangs without receiving data.
By default, if a TOE TLS socket stops receiving data for more than 5
seconds, revert the connection back to plain TOE mode. This provides
a fallback if the userland SSL library does not support KTLS. In
addition, for client TLS 1.3 sockets using connect(), the TOE socket
blocks before the handshake has completed since the socket option is
only invoked for the final handshake.
The timeout defaults to 5 seconds, but can be changed at boot via the
hw.cxgbe.toe.tls_rx_timeout tunable or for an individual interface via
the dev.<nexus>.toe.tls_rx_timeout sysctl.
jhb [Thu, 3 Dec 2020 21:59:47 +0000 (21:59 +0000)]
Clear TLS offload mode for unsupported cipher suites and versions.
If TOE TLS is requested for an unsupported cipher suite or TLS
version, disable TLS processing and fall back to plain TOE. In
addition, if an error occurs when saving the decryption keys in the
card's memory, disable TLS processing and fall back to plain TOE.
jhb [Thu, 3 Dec 2020 21:49:20 +0000 (21:49 +0000)]
Fix downgrading of TOE TLS sockets to plain TOE.
If a TOE TLS socket ends up using an unsupported TLS version or
ciphersuite, it must be downgraded to a "plain" TOE socket with TLS
encryption/decryption performed on the host. The previous
implementation of this fallback was incomplete and resulted in hung
connections.
dim [Thu, 3 Dec 2020 19:29:18 +0000 (19:29 +0000)]
Merge commit d989ffd10 from llvm git (by Dimitry Andric):
Implement computeHostNumHardwareThreads() for FreeBSD
This retrieves CPU affinity via FreeBSD's cpuset(2) API, and makes
LLVM respect affinity settings configured by the user via the
cpuset(1) command.
In particular, this makes it possible to reduce the number of threads used on
machines with high core counts, which can interact badly with
parallelized build systems. This is particularly noticeable with lld,
which spawns lots of threads even for linking e.g. hello_world!
This fix is related to PR48193, but does not address the more
fundamental problem, which is that LLVM by default grabs as many CPUs
and/or threads as possible.
dim [Thu, 3 Dec 2020 19:26:21 +0000 (19:26 +0000)]
Revert r367815, so we can apply the slightly different version that
landed upstream:
For llvm's internal function which retrieves the number of available
"hardware threads", use cpuset_getaffinity(2) on FreeBSD, so it will
honor processor sets configured by the cpuset(1) command.
This should make it possible to avoid e.g. lld creating a huge number of
threads on a machine with many cores, even for linking simple programs.
markj [Thu, 3 Dec 2020 17:12:31 +0000 (17:12 +0000)]
Always use 64-bit physical addresses for dump_avail[] in minidumps
As of r365978, minidumps include a copy of dump_avail[]. This is an
array of vm_paddr_t ranges. libkvm walks the array assuming that
sizeof(vm_paddr_t) is equal to the platform "word size", but that's not
correct on some platforms. For instance, i386 uses a 64-bit vm_paddr_t.
Fix the problem by always dumping 64-bit addresses. On platforms where
vm_paddr_t is 32 bits wide, namely arm and mips (sometimes), translate
dump_avail[] to an array of uint64_t ranges. With this change, libkvm
no longer needs to maintain a notion of the target word size, so get rid
of it.
This is a no-op on platforms where sizeof(vm_paddr_t) == 8.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27082
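The translation for 32-bit platforms amounts to widening each array element,
which can be modelled as below. This is an illustrative sketch only: the real
change is in the kernel's C minidump code, and the function name and sample
values are hypothetical.

```python
import struct


def widen_dump_avail(raw, byteorder="<"):
    """Re-encode an array of 32-bit physical addresses as 64-bit values."""
    if len(raw) % 4 != 0:
        raise ValueError("input is not a whole number of 32-bit entries")
    count = len(raw) // 4
    values = struct.unpack(f"{byteorder}{count}I", raw)
    return struct.pack(f"{byteorder}{count}Q", *values)


# One sample (start, end) range followed by a pair of zeros.
narrow = struct.pack("<4I", 0x1000, 0x80000000, 0, 0)
wide = widen_dump_avail(narrow)
print(len(narrow), len(wide))  # 16 32
```

A reader of the dump can then always decode 8-byte entries, whatever the
width of vm_paddr_t was on the machine that produced it.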
markj [Thu, 3 Dec 2020 17:10:00 +0000 (17:10 +0000)]
sdt: Create providers and probes in separate passes when loading sdt.ko
The sdt module's load handler iterates over SDT linker sets for the
kernel and all loaded modules to create probes and providers defined by
SDT(9). Probes in one module may belong to a provider in a different
module, but when a probe is created we assume that the provider is
already defined. To maintain this invariant, modify the load handler to
perform two separate passes over loaded modules: one to define providers
and the other to define probes.
The problem manifests when loading linux.ko, which depends on
linux_common.ko, which defines providers used by probes defined in
linux.ko.
Reported by: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
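The ordering problem and the two-pass fix can be modelled with a small
sketch. The module list, its iteration order, and the provider and probe
names are assumptions made for illustration; the real handler walks SDT
linker sets in C.

```python
def load_single_pass(modules):
    """Define providers and probes module by module (the old behaviour)."""
    providers, probes = set(), []
    for mod in modules:
        providers.update(mod["providers"])
        for provider, probe in mod["probes"]:
            if provider not in providers:
                raise LookupError(f"provider {provider!r} not yet defined")
            probes.append((provider, probe))
    return probes


def load_two_pass(modules):
    """Define every provider first, then create the probes."""
    providers = set()
    for mod in modules:
        providers.update(mod["providers"])
    probes = []
    for mod in modules:
        for provider, probe in mod["probes"]:
            if provider not in providers:
                raise LookupError(f"provider {provider!r} is not defined")
            probes.append((provider, probe))
    return probes


# Hypothetical layout: the probe's module is visited before the module
# that defines its provider.
modules = [
    {"name": "linux", "providers": [], "probes": [("linuxulator", "entry")]},
    {"name": "linux_common", "providers": ["linuxulator"], "probes": []},
]
print(load_two_pass(modules))  # [('linuxulator', 'entry')]
```

With this input load_single_pass() raises LookupError, whereas the two-pass
variant succeeds regardless of the order in which modules are visited.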
avg [Thu, 3 Dec 2020 11:59:40 +0000 (11:59 +0000)]
dtrace: honor LC_NUMERIC for %'d and alike, and LC_TIME for %T
Note that the public documentation on dtrace.org fails to mention %T and
incorrectly documents %Y. The latter actually uses format "%Y %b %e %T"
where %b is always in C locale.
manu [Thu, 3 Dec 2020 11:15:49 +0000 (11:15 +0000)]
if_dwc: Honor snps,pbl property
A DTS node can have this property, which configures the burst length
for both TX and RX if it's the same.
This unbreaks if_dwc on the Allwinner A20 and possibly other boards that
use this property.
cxgbe(4): Stop but don't free netmap queues when netmap is switched off.
It is common for freelists to be starving when a netmap application
stops. Mailbox commands to free queues can hang in such a situation.
Avoid that by not freeing the queues when netmap is switched off.
Instead, use an alternate method to stop the queues without releasing
the context ids. If netmap is enabled again later then the same queue
is reinitialized for use. Move alloc_nm_rxq and txq to t4_netmap.c
while here.
gonzo [Thu, 3 Dec 2020 05:39:27 +0000 (05:39 +0000)]
Add support for hw.physmem tunable for ARM/ARM64/RISC-V platforms
The hw.physmem tunable limits the amount of physical memory available to the
system. It's handled in machdep files for x86 and PowerPC. This patch adds
required logic to the consolidated physmem management interface that is used by
ARM, ARM64, and RISC-V.
bdragon [Thu, 3 Dec 2020 01:39:59 +0000 (01:39 +0000)]
[PowerPC64LE] Fix LE VSX/fpr interop
In the PCB struct, we need to match the VSX register file layout
correctly, as the VSRs shadow the FPRs.
In LE, we need to have a dword of padding before the fprs so they end up
on the correct side, as the struct may be manipulated by either the FP
routines or the VSX routines.
Additionally, when saving and restoring fprs, we need to explicitly target
the fpr union member so it gets offset correctly on LE.
Fixes weirdness with FP registers in VSX-using programs (A FPR that was
saved by the FP routines but restored by the VSX routines was becoming 0
due to being loaded to the wrong side of the VSR.)
mhorne [Wed, 2 Dec 2020 21:01:52 +0000 (21:01 +0000)]
uart: allow UART_DEV_DBGPORT for fdt consoles
Allow fdt devices to be used as debug ports for gdb(4).
A debug console can be specified with the "freebsd,debug-path" property
in the device tree's /chosen node, or using the environment variable
hw.fdt.dbgport.
The device should be specified by its name in the device tree, for
example hw.fdt.dbgport="serial2".
r367917 fixed the backpressure on the netmap rxq being stopped but that
doesn't help if some other netmap rxq is starved (because it is stopping
too although the driver doesn't know this yet) and blocks the pipeline.
An alternate fix that works in all cases will be checked in instead.
mhorne [Wed, 2 Dec 2020 17:37:32 +0000 (17:37 +0000)]
em: fix a null de-reference in em_free_pci_resources
A failure in iflib_device_register() can result in
em_free_pci_resources() being called after receive queues have already
been freed. In particular, a failure to allocate IRQ resources will goto
fail_queues, where IFDI_QUEUES_FREE() will be called via
iflib_tx_structures_free(), preceding the call to IFDI_DETACH().
Cope with this by checking adapter->rx_queues before dereferencing it.
A similar check is present in ixgbe(4) and ixl(4).
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27260
mmel [Wed, 2 Dec 2020 16:54:24 +0000 (16:54 +0000)]
NVME: Multiple busdma related fixes.
- in nvme_qpair_process_completions() do dma sync before completion buffer
is used.
- in nvme_qpair_submit_tracker(), don't do explicit wmb() also for arm
and arm64. Bus_dmamap_sync() on these architectures is sufficient to ensure
that all CPU stores are visible to external (including DMA) observers.
- Allocate completion buffer as BUS_DMA_COHERENT. On not-DMA coherent systems,
buffers continuously owned (and accessed) by DMA must be allocated with this
flag. Note that BUS_DMA_COHERENT flag is no-op on DMA coherent systems
(or coherent buses in mixed systems).
markj [Wed, 2 Dec 2020 16:46:45 +0000 (16:46 +0000)]
rtsold: Fix bugs reported by Coverity
- Avoid leaking a socket if llflags_get() fails.
- Avoid leaking a file handle if rtsold_init_dumpfile() fails.
- Tighten the check in if_nametosdl() which determines whether we failed
to find the specified interface.
- Fix errno handling in an error path in rtsock_open().
markj [Wed, 2 Dec 2020 16:01:43 +0000 (16:01 +0000)]
pf: Fix table entry counter toggling
When updating a table, pf will keep existing table entry structures
corresponding to addresses that are in both the old and new tables.
However, the update may also enable or disable per-entry counters which
are allocated separately. Thus when toggling PFR_TFLAG_COUNTERS, the
entries may be missing counters or may have unused counters allocated.
Fix the problem by modifying pfr_ina_commit() to transfer counters
from or to entries in the shadow table.
PR: 251414
Reported by: sigsys@gmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27440
rmacklem [Tue, 1 Dec 2020 23:33:10 +0000 (23:33 +0000)]
Improve man page for AmazonEFS mounts.
PR#250770 was actually just a misunderstanding of what
NFS mount options are needed for AmazonEFS mounts.
This patch attempts to clarify this in the man page.
kib [Tue, 1 Dec 2020 22:53:33 +0000 (22:53 +0000)]
lio_listio(2): send signal even if number of jobs is zero.
Right now, if lio registered zero jobs, the syscall frees the lio job
structure, cleaning up the queued ksi. As a result, the realtime signal is
dequeued and never delivered.
Fix it by allowing sendsig() to copy ksi when job count is zero.
PR: 220398
Reported and reviewed by: asomers
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27421
tsoome [Tue, 1 Dec 2020 22:28:02 +0000 (22:28 +0000)]
ficl: instead of pad, emit can use local variable
Pad in forth is used as "scratchpad" and internal implementations
should not use it. Ficl does not really follow this rule and this can backfire.
emit has no need to use pad, we can use local variable instead.
gonzo [Tue, 1 Dec 2020 20:10:55 +0000 (20:10 +0000)]
[arm64] Bump MAXMEMDOM value to 8 to match amd64
On some of the server-grade ARM64 machines the number of NUMA domains is higher
than 2. When booting GENERIC kernel on such machines the SRAT parser fails
leaving the system with a single domain. To make the GENERIC kernel usable on
those servers, match the parameter value with the one for the amd64 arch.
Some USB WLAN devices have "on-board" storage showing up as umass
and making the root mount wait for a very long time.
The WLAN drivers know how to deal with that and issue an eject
command later when attaching themselves.
Introduce a quirk to not probe these devices as umass and avoid
hangs and confusion altogether.
jhb [Tue, 1 Dec 2020 17:17:22 +0000 (17:17 +0000)]
Make stack_save*() more robust on MIPS.
- Validate any stack addresses read from against td_kstack before
reading. If an unwind operation would attempt to read outside the
bounds of td_kstack, abort the unwind instead.
- For stack_save_td(), don't use the PC and SP from the current
thread, instead read the PC and SP from pcb_context[].
- For stack_save(), use the current PC and SP of the current thread,
not the values from pcb_regs (the horribly named td_frame of the
outermost trapframe). The result was that stack_trace() never
logged _any_ kernel frames but only the frame from the saved
userspace registers on entry from the kernel.
- Inline the one use of stack_register_fetch().
- Add a VALID_PC() helper macro and simplify types to remove
excessive casts in stack_capture().
- Fix stack_capture() to work on compilers written in this century.
Don't treat function epilogues as function prologues by skipping
additions to SP when searching for a function start.
- Add some comments to stack_capture() and fix some style bugs.
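The bounds validation in the first item can be sketched as below. This is a
simplified model with hypothetical names and sizes; the actual code checks
addresses against td_kstack in C.

```python
def within_kstack(addr, size, kstack_base, kstack_size):
    """True if [addr, addr + size) lies entirely inside the kernel stack."""
    return kstack_base <= addr and addr + size <= kstack_base + kstack_size


def reads_before_abort(addrs, kstack_base, kstack_size, word=8):
    """Return the reads performed before the unwind has to be aborted."""
    accepted = []
    for addr in addrs:
        if not within_kstack(addr, word, kstack_base, kstack_size):
            break  # abort the unwind instead of reading out of bounds
        accepted.append(addr)
    return accepted


base, size = 0x10000, 0x4000  # hypothetical stack placement
print(reads_before_abort([0x13000, 0x13FF8, 0x14000, 0x13100], base, size))
```

The third address falls just past the end of the stack, so the walk stops
there and the fourth address is never read.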
if: Fix panic when destroying vnet and epair simultaneously
When destroying a vnet and an epair (with one end in the vnet) we often
panicked. This was the result of the destruction of the epair, which destroys
both ends simultaneously, happening while vnet_if_return() was moving the
struct ifnet to its home vnet. This can result in a freed ifnet being re-added
to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
used.
Prevent this race by ensuring that vnet_if_return() cannot run at the same time
as if_detach() or epair_clone_destroy().
markj [Tue, 1 Dec 2020 16:06:31 +0000 (16:06 +0000)]
vmem: Revert r364744
A pair of bugs are believed to have caused the hangs described in the
commit log message for r364744:
1. uma_reclaim() could trigger reclamation of the reserve of boundary
tags used to avoid deadlock. This was fixed by r366840.
2. The loop in vmem_xalloc() would in some cases try to allocate more
boundary tags than the expected upper bound of BT_MAXALLOC. The
reserve is sized based on the value BT_MAXALLOC, so this behaviour
could deplete the reserve without guaranteeing a successful
allocation, resulting in a hang. This was fixed by r366838.
Relevant vendor changes:
Issue #1258: add archive_read_support_filter_by_code()
PR #1347: mtree digest reader support
Issue #1381: skip hardlinks pointing to itself on extraction
PR #1387: fix writing of cpio archives with hardlinks without file type
PR #1388: fix rdev field in cpio format for device nodes
PR #1389: completed support for UTF-8 encoding conversion
PR #1405: more formats in archive_read_support_format_by_code()
PR #1408: fix uninitialized size in rar5_read_data
PR #1409: system extended attribute support
PR #1435: support for decompression of symbolic links in zipx archives
Issue #1456: memory leak after unsuccessful archive_write_open_filename
mhorne [Mon, 30 Nov 2020 22:16:11 +0000 (22:16 +0000)]
efibootmgr: fix an incorrect error handling check
efivar_device_path_to_unix_path() returns standard error codes on
failure and zero on success. Checking for a return value less than zero
means that the actual failure cases won't be handled. This could
manifest as a segfault during the subsequent call to printf().
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D27424
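The effect of the wrong comparison can be shown with a small model. The stub
below is a hypothetical stand-in for the library call, assuming that
"standard error codes" means positive errno values; it is not the efivar API.

```python
import errno


def device_path_to_unix_path_stub(ok):
    """Hypothetical stand-in: 0 on success, a positive errno on failure."""
    return 0 if ok else errno.ENOENT


def failed_old_check(rc):
    """The previous test: never true for positive error codes."""
    return rc < 0


def failed_new_check(rc):
    """Treat any non-zero return value as a failure."""
    return rc != 0


rc = device_path_to_unix_path_stub(ok=False)
print(failed_old_check(rc), failed_new_check(rc))  # False True
```

Because the failure goes unnoticed by the old check, the caller would carry
on and use output that was never filled in.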
melifaro [Mon, 30 Nov 2020 21:59:52 +0000 (21:59 +0000)]
Move inner loop logic out of sysctl_sysctl_next_ls().
Refactor sysctl_sysctl_next_ls():
* Move huge inner loop out of sysctl_sysctl_next_ls() into a separate
non-recursive function, returning the next step to be taken.
* Update resulting node oid parts only on successful lookup
* Make sysctl_sysctl_next_ls() return boolean success/failure instead of errno,
slightly simplifying logic