kevans [Tue, 8 Dec 2020 14:05:25 +0000 (14:05 +0000)]
src.opts.mk: switch to bsdgrep as /usr/bin/grep
This has been years in the making, and we all knew it was bound to happen
some day. Switch to the BSDL grep implementation now that it's been a
little more thoroughly tested and theoretically supports all of the
extensions that gnugrep in base had with our libregex(3).
Folks shouldn't really notice much from this update; bsdgrep is slower than
gnugrep, but this is currently the price to pay for fewer bugs. Those
dissatisfied with the speed of grep and in need of a faster implementation
should check out what textproc/ripgrep and textproc/the_silver_searcher
can do for them.
I have some WIP to make bsdgrep faster, but do not consider it a blocker
when compared to the pros of switching now (aforementioned bugs, licensing).
ngie [Tue, 8 Dec 2020 04:18:16 +0000 (04:18 +0000)]
extattr_get_file(20: bump .Dd
This is being done for the formatting and context changes. While the net content
hasn't been changed, the content/context changes were sufficient to warrant the
date bump.
ngie [Tue, 8 Dec 2020 04:16:05 +0000 (04:16 +0000)]
extattr_get_file(2): clarify RETURN VALUES
While some of the syscalls' behavior were documented and implied in the
RETURN VALUES section by earlier, e.g., the DESCRIPTION sections, as having
behavior of the other calls (`*_fd` vs `*_file` vs `*_link`), there was a lot
of implied return value behavior in the section prior to this change.
Explicitly document the syscall behavior per the current implementation in
sys/kern/vfs_extattr.c so others can better develop based on its explicit
documented behavior instead of having to digest the context of the manpage to
understand the appropriate behavior.
ngie [Tue, 8 Dec 2020 04:01:03 +0000 (04:01 +0000)]
extattr_get_file(2): sort syscalls alphabetically
Although some sections of the manpage sort the syscalls alphabetically, many
core areas of the manpage do not. Sort the syscalls so it is easier to pick out
functional changes and to improve manpage readability.
This formatting change is also being done to make future functional changes
easier to spot.
ngie [Tue, 8 Dec 2020 03:43:00 +0000 (03:43 +0000)]
extattr_get_fd(2): fix manlint errors
- The CAVEATS section was misspelled as "CAVEAT".
- The CAVEATS section should come before the "BUGS" section and after
other existing sections by convention.
mhorne [Tue, 8 Dec 2020 00:48:50 +0000 (00:48 +0000)]
release: don't checksum images if there are none
For platforms that don't have any of the memstick, cdrom, or dvdrom
release images (i.e. riscv64), the release-install target will trip up
when invoking md5(1) on the non-existent image files. Skipping this
allows the install to complete successfully.
mhorne [Tue, 8 Dec 2020 00:42:03 +0000 (00:42 +0000)]
RISC-V release confs
Add two release flavors for RISC-V. First, the traditional "big-iron"
images, capable of generating distribution sets and VM images. Installer
images won't be built yet, but can be trivially enabled in the future
with the addition of riscv/make-memstick.sh.
Second, a GENERICSD embedded image. I've opted for this instead of
board-specific SD card images as it allows users to just dd the u-boot
they want. The RISC-V hardware ecosystem is still young, so a
configuration for e.g. the new PolarFire SoC Icicle Kit would likely see
very few users.
mhorne [Tue, 8 Dec 2020 00:37:11 +0000 (00:37 +0000)]
riscv: allow building virtual machine images
RISC-V has the same booting requirements as arm64 (loader.efi, no legacy
boot options), so generated images for both architectures have the same
partition layout.
mhorne [Tue, 8 Dec 2020 00:35:13 +0000 (00:35 +0000)]
release.sh: add support for RISC-V embedded builds
Since the few existing RISC-V hardware platforms are single board
computers, we can piggyback off of arm/arm64's embedded build support
for generating SD card images.
I don't see a pressing need to change the naming in this file at this
time.
Reviewed by: gjb, manu
Differential Revision: https://reviews.freebsd.org/D27043
andrew [Mon, 7 Dec 2020 17:54:49 +0000 (17:54 +0000)]
Ensure the boot CPU is CPU 0 on arm64
We assume the boot CPU is always CPU 0 on arm64. To allow for this reserve
cpuid 0 for the boot CPU in the ACPI and FDT cases but otherwise start the
CPU as normal. We then check for the boot CPU in start_cpu and return as if
it was started.
While here extract the FDT CPU init code into a new function to simplify
cpu_mp_start and return FALSE from start_cpu when the CPU fails to start.
Reviewed by: mmel
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D27497
markj [Mon, 7 Dec 2020 14:52:57 +0000 (14:52 +0000)]
iflib: Detach tasks upon device registration failure
In some error paths we would fail to detach from the iflib taskqueue
groups. Also move the detach code into its own subroutine instead of
duplicating it.
Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com>
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D27342
tsoome [Mon, 7 Dec 2020 11:25:18 +0000 (11:25 +0000)]
loader: xdr_array is missing count
The integer arrays are encoded in nvlist as counted array <count, i0, i1...>,
loader xdr_array() is missing the count. This will affect the pool import when
there are hole devices in pool.
hselasky [Mon, 7 Dec 2020 09:48:06 +0000 (09:48 +0000)]
Prefer using the MIN() function macro over the min() inline function
in the LinuxKPI. Linux defines min() to be a macro, while in FreeBSD
min() is a static inline function clamping its arguments to
"unsigned int".
markj [Sun, 6 Dec 2020 22:45:50 +0000 (22:45 +0000)]
uma: Make uma_zone_set_maxcache() work better with small limits
The old implementation chose the largest bucket zone such that if the
per-CPU caches are fully populated, the total number of items cached is
no larger than the specified limit. If no such zone existed, UMA would
not do any caching.
We can now use uz_bucket_size_max to set a precise limit on the number
of items in a zone's bucket, so the total size of per-CPU caches can be
bounded more easily. Implement a new policy in uma_zone_set_maxcache():
choose a bucket size such that up to half of the limit can be cached in
per-CPU caches, with the rest going to the full bucket cache. This
fixes a problem with the kstack_cache zone: the limit of 4 * mp_ncpus
items meant that the zone would not do any caching, defeating the whole
purpose of the zone. That's because the smallest bucket size holds up
to 2 items and we may cache up to 3 full buckets per CPU, and
2 * 3 * mp_ncpus > 4 * mp_ncpus.
Reported by: mjg
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27168
markj [Sun, 6 Dec 2020 22:45:39 +0000 (22:45 +0000)]
uma: Enforce the use of uz_bucket_size_max in the free path
uz_bucket_size_max is the maximum permitted bucket size. When filling a
new bucket to satisfy uma_zalloc(), the bucket is populated with at most
uz_bucket_size_max items. The maximum number of entries in the bucket
may be larger. When freeing items, however, we will fill per-CPPU
buckets up to their maximum number of entries, potentially exceeding
uz_bucket_size_max. This makes it difficult to precisely limit the
number of items that may be cached in a zone. For example, if one wants
to limit buckets to 1 entry for a particular zone, that's not possible
since the smallest bucket holds up to 2 entries.
Try to solve the problem by using uz_bucket_size_max to limit the number
of entries in a bucket. Note that the ub_entries field is initialized
upon every bucket allocation. Most zones are not affected since they do
not impose any specific limit on the maximum bucket size.
While here, remove the UMA_ZONE_MINBUCKET flag. It was unused and we
now have uma_zone_set_maxcache() to control the zone's cache size more
precisely.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27167
emaste [Sun, 6 Dec 2020 21:34:04 +0000 (21:34 +0000)]
Add deprecation notice to mn(4)
Sync serial (T1/E1) interfaces are largely irrelevant today and phk
confirms this driver is unnecessary in review D23928.
This leaves ce(4) and cp(4) in the tree. They're likely not relevant
either, but glebius contacted the manufacturer and those devices are
still available for purchase. At glebius' suggestion leave them in
the tree as long as they do not impose a maintenace burden.
Approved by: phk
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
tuexen [Sun, 6 Dec 2020 18:43:12 +0000 (18:43 +0000)]
When dropping packets (RRQ or WRQ) for debugging, report the send
operation as successful. Reporting a failure stops the transfer
instead of using timeouts.
kevans [Sun, 6 Dec 2020 17:45:42 +0000 (17:45 +0000)]
bsdgrep: don't link against libregex for bootstrap
r368355 removed the GNU_GREP_COMPAT knob (off by default) and forgot that
bsdgrep may be built/used for bootstrap on some systems.
All base uses should strive to use only POSIX-compliant expressions anyways
and we haven't had libregex by default here up to this point, so just don't
do that if we're bootstrapping.
Note that the resulting binary has the wrong `grep -V` information as it
falsely claims to be GNU compatible, but it is only for bootstrap.
yuripv [Sun, 6 Dec 2020 16:44:41 +0000 (16:44 +0000)]
update wcwidth data from utf8proc
Character width data being out of date is a constant source
of weird rendering issues and wasted time trying to diagnose
those, e.g. as reported by Jeremy Chadwick:
https://gitlab.com/muttmua/mutt/-/issues/67
Sadly, there is no real ("standard") wcwidth data source, so
this tries to rectify the problem using the utf8proc one (through
its C API) which would hopefully benefeat both FreeBSD and
utf8proc through bug reports (if any).
kevans [Sun, 6 Dec 2020 15:58:50 +0000 (15:58 +0000)]
bectl: simplify the tail end of the jail cmd
This has already confused me once (and I'm pretty sure I wrote it), so let's
clarify: unjailing after the command has completed will only happen if we're
interactive and -U has not been specified.
This just folds two conditionals together to make it obvious how -b/-U
interact with each other.
tijl [Sun, 6 Dec 2020 10:58:55 +0000 (10:58 +0000)]
Move V4L feature declarations and DTrace provider definitions from
linux_common.c to linux_util.c so they become available on i386.
linux_common.c defines the linux_common kernel module but this module does
not exist on i386 and linux_common.c is not included in the linux module.
linux_util.c is included in the linux_common module on amd64 and the linux
module on i386.
Remove linux_common.c from files.i386 again. It was added recently in
r367433 when the DTrace provider definitions were moved.
The V4L feature declarations were moved to linux_common in r283423.
tijl [Sat, 5 Dec 2020 14:53:24 +0000 (14:53 +0000)]
Fix i386 linux module after r367395.
In r367395 parts of machine dependent linux_dummy.c were moved to a new
machine independent file sys/compat/linux/linux_dummy.c and the existing
linux_dummy.c was renamed to linux_dummy_machdep.c.
Add linux_dummy_machdep.c to the linux module for i386.
Rename sys/amd64/linux32/linux_dummy.c for consistency.
Add the new linux_dummy.c to the linux module for i386.
kevans [Sat, 5 Dec 2020 14:38:46 +0000 (14:38 +0000)]
libc: regex: partial revert of r368358
Part of the libregex functionality leaked into the tests it shares with
the standard regex(3). Introduce a P flag to set the REG_POSIX cflag to
indicate that libc regex should effectively do nothing while libregex should
specifically run it in non-extended mode.
mmel [Sat, 5 Dec 2020 14:06:01 +0000 (14:06 +0000)]
Simplify startup of secondary cores and store MPIDR register to pcpu.
- record MPIDR for all started cores in pcpu, they will be used as link
between physical locality of given core, ID in external description
(FDT or ACPI) and cupid.
- because of above, cpuid can (and should) be freely assigned, only boot
CPU must have cpuid 0. Simplify startup code according this.
Please note that pure cpuid is not sufficient instrument to hold any
information about core or cluster topology, nor to determistically iterate
over subpart of cores in CPU (iterate over all cores in single cluster for
example). Situation is more complicated by fact that PSCI can reject start
of core without reporting error (because power budget for example), or by
fact that is possible that we booted on non-first core in cluster (thus with
cpuid 0 assigned to random core).
Given cores topology should be exhibited to other parts of system
(for example to scheduler for big.little or multicluster systems) by using
smp_topo interface.
mmel [Sat, 5 Dec 2020 10:55:09 +0000 (10:55 +0000)]
DesignWare PCIe driver: Don't call bus_generic_attach() twice.
bus_generic_attach() should be called from the attach function of the real
implementation, not from the common init function.
It was realized just a little too late that this was a hack that belonged in
individual regex(3)-using applications. It was surrounded in NOTYET and not
implemented in the engine, so remove it.
kevans [Sat, 5 Dec 2020 03:16:05 +0000 (03:16 +0000)]
libregex: implement \b and \B (word boundary, not word boundary)
This is the last of the needed GNU expressions before we can unleash bsdgrep
by default. \b is effectively an agnostic equivalent of \< and \>, while
\B will match every space that isn't making a transition from
nonchar -> char or char -> nonchar.
kevans [Sat, 5 Dec 2020 03:13:47 +0000 (03:13 +0000)]
libregex: implement \` and \' (begin-of-subj, end-of-subj)
These are GNU extensions, generally equivalent to ^ and $ except that the
new syntax will not match beginning of line after the first in a multi-line
expression or the end of line before absolute last in a multi-line
expression.
kevans [Sat, 5 Dec 2020 02:21:58 +0000 (02:21 +0000)]
Retire GNU_GREP_COMPAT knob
This was introduced and then disabled by default primarily to avoid dealing
with bugs in libgnuregex. rS363823 switched to using libregex for it, so
let's just rip the option out now so we can make sure we're getting tested
with libregex via bsdgrep.
cem [Sat, 5 Dec 2020 00:33:28 +0000 (00:33 +0000)]
Add CFI start/end proc directives to arm64, i386, and ppc
Follow-up to r353959 and r368070: do the same for other architectures.
arm32 already seems to use its own .fnstart/.fnend directives, which
appear to be ARM-specific variants of the same thing. Likewise, MIPS
uses .frame directives.
imp [Fri, 4 Dec 2020 21:34:48 +0000 (21:34 +0000)]
nvme: Remove a wmb() that's not necessary.
bus_dmamap_sync() ensures that memory that's prepared for PREWRITE can
be DMA'd immediately after it returns. The details differ, but this
mirrors atomic thread release semantics, at least for the buffers
synced.
For non-x86 platforms, bus_dmamap_sync() has the right syncing and
fences. So in the past, wmb() had been omitted for them.
For x86 platforms, the memory ordering is already strong enough to
ensure DMA to the device sees the current contents. As such, we don't
need the wmb() here. It translates to an sfence which is only needed
for writes to regions that have the write combining attribute set or
when some exotic opcodes are used. The nvme driver does neither of
these. Since bus_dmamap_sync() includes atomic_thread_fence_rel, we
can be assured any optimizer won't reorder the bus_dmamap_sync and the
bus_space_write operations. The wmb() was a vestiage of the pre-busdma
version initially committed to the tree.
imp [Fri, 4 Dec 2020 21:34:04 +0000 (21:34 +0000)]
busdma: Annotate bus_dmamap_sync() with fence
Add an explicit thread fence release before returning from
bus_dmamap_sync. This should be a no-op in practice, but makes
explicit that all ordinary stores will be completed before subsequent
reads/writes to ordinary device memory. On x86, normal memory ordering
is strong enough to generally guarantee this. The fence keeps the
optimizer (likely LTO) from reordering other calls around this.
The other architectures already have calls, as appropriate, that
are equivalent.
Note: On x86, there is one exception to this rule. If you've mapped
memory as write combining, then you will need to add a sfence or
similar. Normally, though, busdma doesn't operate on such memory, and
drivers that do already cope appropriately.
mhorne [Fri, 4 Dec 2020 21:12:17 +0000 (21:12 +0000)]
ossl: port to arm64
Enable in-kernel acceleration of SHA1 and SHA2 operations on arm64 by adding
support for the ossl(4) crypto driver. This uses OpenSSL's assembly routines
under the hood, which will detect and use SHA intrinsics if they are
supported by the CPU.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27390
The kernel stack unwinder assumes that any jr $ra indicates the end
of the current function. However, modern compilers generate code
that contains jr $ra at various places inside the function.
- Handle LLD inter-function padding when looking for the start of a
function.
- Use call site for symbol name/offset when unwinding
Currently we use the return address, which will normally just give
an output that's off by 8 from the actual call site. However, for
tail calls, this is particularly bad, as we end up printing the
symbol name for the function that comes after the one that made the
call. Instead we should go back two instructions from the return
address for the unwound program counter.
arichardson [Fri, 4 Dec 2020 15:53:37 +0000 (15:53 +0000)]
crunchgen: fix NULL-deref bug introduced in r364647
While porting over the local changes from CheriBSD for upstreaming, I
accidentally committed a broken version of find_entry_point(): we have to
return NULL if the value is not found instead of a value with
ep->name == NULL, since the checks in main were changed to check ep instead
of ep->name for NULL.
This only matters if the crunched tool cannot be found using normal lookup
and one of the fallback paths is used, so it's unlikely to be triggered
in rescue. However, I noticed that one of our CheriBSD test scripts was
failing to run commands under `su` on minimal disk images where all
binaries are hardlinks to a `cheribsdbox` tool generated with crunchgen.
This also updates the bootstrapping check in Makefile.inc1 to bootstrap
crunchgen up to the next version bump.
kevans [Fri, 4 Dec 2020 15:21:12 +0000 (15:21 +0000)]
gnu: don't build libgnuregex for WITH_GNU_GREP_COMPAT
bsdgrep switched over to libregex back in r363823 to fill
WITH_GNU_GREP_COMPAT, since libgnuregex in base is quite buggy and libregex
is somewhat functional. Don't build libgnuregex on our account, please.
hselasky [Fri, 4 Dec 2020 14:50:55 +0000 (14:50 +0000)]
Fix definition of int64_t and uint64_t when long is 64-bit. This gets the kernel
shim code in line with the rest of the kernel, sys/x86/include/_types.h.
PRR improves loss recovery and avoids RTOs in a wide range
of scenarios (ACK thinning) over regular SACK loss recovery.
PRR is disabled by default, enable by net.inet.tcp.do_prr = 1.
Performance may be impeded by token bucket rate policers at
the bottleneck, where net.inet.tcp.do_prr_conservate = 1
should be enabled in addition.
kevans [Fri, 4 Dec 2020 04:39:48 +0000 (04:39 +0000)]
kern: soclose: don't sleep on SO_LINGER w/ timeout=0
This is a valid scenario that's handled in the various protocol layers where
it makes sense (e.g., tcp_disconnect and sctp_disconnect). Given that it
indicates we should immediately drop the connection, it makes little sense
to sleep on it.
This could lead to panics with INVARIANTS. On non-INVARIANTS kernels, this
could result in the thread hanging until a signal interrupts it if the
protocol does not mark the socket as disconnected for whatever reason.
melifaro [Thu, 3 Dec 2020 22:23:57 +0000 (22:23 +0000)]
Add IPv4/IPv6 rtentry prefix accessors.
Multiple consumers like ipfw, netflow or new route lookup algorithms
need to get the prefix data out of struct rtentry.
Instead of providing direct access to the rtentry, create IPv4/IPv6
accessors to abstract struct rtentry internals and avoid including
internal routing headers for external consumers.
While here, move struct route_nhop_data to the public header, so external
customers can actually use lookup functions returning rt&nhop data.
jhb [Thu, 3 Dec 2020 22:06:08 +0000 (22:06 +0000)]
Clear TLS offload mode if a TLS socket hangs without receiving data.
By default, if a TOE TLS socket stops receiving data for more than 5
seconds, revert the connection back to plain TOE mode. This provides
a fallback if the userland SSL library does not support KTLS. In
addition, for client TLS 1.3 sockets using connect(), the TOE socket
blocks before the handshake has completed since the socket option is
only invoked for the final handshake.
The timeout defaults to 5 seconds, but can be changed at boot via the
hw.cxgbe.toe.tls_rx_timeout tunable or for an individual interface via
the dev.<nexus>.toe.tls_rx_timeout sysctl.
jhb [Thu, 3 Dec 2020 21:59:47 +0000 (21:59 +0000)]
Clear TLS offload mode for unsupported cipher suites and versions.
If TOE TLS is requested for an unsupported cipher suite or TLS
version, disable TLS processing and fall back to plain TOE. In
addition, if an error occurs when saving the decryption keys in the
card's memory, disable TLS processing and fall back to plain TOE.
jhb [Thu, 3 Dec 2020 21:49:20 +0000 (21:49 +0000)]
Fix downgrading of TOE TLS sockets to plain TOE.
If a TOE TLS socket ends up using an unsupported TLS version or
ciphersuite, it must be downgraded to a "plain" TOE socket with TLS
encryption/decryption performed on the host. The previous
implementation of this fallback was incomplete and resulted in hung
connections.
dim [Thu, 3 Dec 2020 19:29:18 +0000 (19:29 +0000)]
Merge commit d989ffd10 from llvm git (by Dimitry Andric):
Implement computeHostNumHardwareThreads() for FreeBSD
This retrieves CPU affinity via FreeBSD's cpuset(2) API, and makes
LLVM respect affinity settings configured by the user via the
cpuset(1) command.
In particular, this allows to reduce the number of threads used on
machines with high core counts, which can interact badly with
parallelized build systems. This is particularly noticable with lld,
which spawns lots of threads even for linking e.g. hello_world!
This fix is related to PR48193, but does not adress the more
fundamental problem, which is that LLVM by default grabs as many CPUs
and/or threads as possible.
dim [Thu, 3 Dec 2020 19:26:21 +0000 (19:26 +0000)]
Revert r367815, so we can apply the slightly different version that
landed upstream:
For llvm's internal function which retrieves the number of available
"hardware threads", use cpuset_getaffinity(2) on FreeBSD, so it will
honor processor sets configured by the cpuset(1) command.
This should make it possible to avoid e.g. lld creating a huge number of
threads on a machine with many cores, even for linking simple programs.
markj [Thu, 3 Dec 2020 17:12:31 +0000 (17:12 +0000)]
Always use 64-bit physical addresses for dump_avail[] in minidumps
As of r365978, minidumps include a copy of dump_avail[]. This is an
array of vm_paddr_t ranges. libkvm walks the array assuming that
sizeof(vm_paddr_t) is equal to the platform "word size", but that's not
correct on some platforms. For instance, i386 uses a 64-bit vm_paddr_t.
Fix the problem by always dumping 64-bit addresses. On platforms where
vm_paddr_t is 32 bits wide, namely arm and mips (sometimes), translate
dump_avail[] to an array of uint64_t ranges. With this change, libkvm
no longer needs to maintain a notion of the target word size, so get rid
of it.
This is a no-op on platforms where sizeof(vm_paddr_t) == 8.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27082
markj [Thu, 3 Dec 2020 17:10:00 +0000 (17:10 +0000)]
sdt: Create providers and probes in separate passes when loading sdt.ko
The sdt module's load handler iterates over SDT linker sets for the
kernel and all loaded modules to create probes and providers defined by
SDT(9). Probes in one module may belong to a provider in a different
module, but when a probe is created we assume that the provider is
already defined. To maintain this invariant, modify the load handler to
perform two separate passes over loaded modules: one to define providers
and the other to define probes.
The problem manifests when loading linux.ko, which depends on
linux_common.ko, which defines providers used by probes defined in
linux.ko.
Reported by: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation