Ryan Stone [Fri, 29 Jan 2021 21:13:57 +0000 (16:13 -0500)]
Add a VM flag to prevent reclaim on a failed contig allocation
If a M_WAITOK contig alloc fails, the VM subsystem will try to
reclaim contiguous memory twice before actually failing the
request. On a system with 64GB of RAM I've observed this take
400-500ms before it finally gives up, and I believe that this
will only be worse on systems with even more memory.
In certain contexts this delay is extremely harmful, so add a flag
that will skip reclaim for allocation requests to allow those
paths to opt-out of doing an expensive reclaim.
Michal Meloun [Thu, 21 Jan 2021 14:06:19 +0000 (15:06 +0100)]
dwmmc: Multiple busdma fixes.
- limit maximum segment size to 2048 bytes. Although dwmmc supports a buffer
fragment with a maximum length of 4095 bytes, use the nearest lower power
of two as the maximum fragment size. Otherwise, busdma create excessive
buffer fragments.
- fix off by one error in computation of the maximum data transfer length.
- in addition, reserve two DMA descriptors that can be used by busdma
bouncing. The beginning or end of the buffer can be misaligned.
- Don’t ignore errors passed to bus_dmamap_load() callback function.
- In theory, a DMA engine may be running at time when next dma descriptor is
constructed. Create a full DMA descriptor before OWN bit is set.
shu [Wed, 3 Feb 2021 16:51:45 +0000 (16:51 +0000)]
linux: make timerfd_settime(2) set expirations count to zero
On Linux, read(2) from a timerfd file descriptor returns an unsigned
8-byte integer (uint64_t) containing the number of expirations
that have occurred, if the timer has already expired one or more
times since its settings were last modified using timerfd_settime(),
or since the last successful read(2). That's to say, once we do
a read or call timerfd_settime(), timer fd's expiration count should
be zero. Some Linux applications create timerfd and add it to epoll
with LT mode, when event comes, they do timerfd_settime instead
of read to stop event source from trigger. On FreeBSD,
timerfd_settime(2) didn't set the count to zero, which caused high
CPU utilization.
Alex Richardson [Wed, 3 Feb 2021 15:27:17 +0000 (15:27 +0000)]
Expose clang's alignment builtins and use them for roundup2/rounddown2
This makes roundup2/rounddown2 type- and const-preserving and allows
using it on pointer types without casting to uintptr_t first. Not
performing pointer-to-integer conversions also helps the compiler's
optimization passes and can therefore result in better code generation.
When using it with integer values there should be no change other than
the compiler checking that the alignment value is a valid power-of-two.
I originally implemented these builtins for CHERI a few years ago and
they have been very useful for CheriBSD. However, they are also useful
for non-CHERI code so I was able to upstream them for Clang 10.0.
Rationale from the clang documentation:
Clang provides builtins to support checking and adjusting alignment
of pointers and integers. These builtins can be used to avoid relying
on implementation-defined behavior of arithmetic on integers derived
from pointers. Additionally, these builtins retain type information
and, unlike bitwise arithmetic, they can perform semantic checking on
the alignment value.
There is also a feature request for GCC, so GCC may also support it in
the future: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98641
Alex Richardson [Wed, 3 Feb 2021 15:24:28 +0000 (15:24 +0000)]
readelf: Fix printing NT_FREEBSD_ARCH_TAG
Looking at lib/csu/arm/crt1_s.S, this should be a string and therefore the
restriction to 4 characters seems wrong.
Found whle updating https://reviews.llvm.org/D74393.
Michal Meloun [Sat, 23 Jan 2021 20:19:07 +0000 (21:19 +0100)]
arm64: Initialize VFP control register.
The RW fields in this register reset to architecturally unknown values,
so initialize these to the proper rounding and denormal mode.
MFC after: 1 week
Peter Grehan [Wed, 3 Feb 2021 09:05:09 +0000 (19:05 +1000)]
Always clamp curve25519 keys prior to use.
This fixes an issue where a private key contained bits that should
have been cleared by the clamping process, but were passed through
to the scalar multiplication routine and resulted in an invalid
public key.
Issue diagnosed (and an initial fix proposed) by shamaz.mazum in
PR 252894.
Alex Richardson [Wed, 3 Feb 2021 09:32:16 +0000 (09:32 +0000)]
atf: Fix ATF_BUILD_* values when not using the bootstrap compiler
Currently, we encode the full path and compile flags for the build
compiler in libatf. However, these values are not correct when
cross-compiling: For example, when I build on macOS, CC is set to the
host path /usr/local/Cellar/llvm/11.0.0_1/bin/clang-11. This path will
not exist on the target system.
Simplify this logic and use cc/cpp/c++ since those binaries will exist
on the target system unless the compiler was explicitly disabled.
I'm not convinced ATF needs to encode these values, but this is a
minimal fix for these tests when using a non-bootstrapped compiler.
Alex Richardson [Wed, 3 Feb 2021 09:31:32 +0000 (09:31 +0000)]
sbin/bectl: Skip tests if sparse files are not supported
The tests create a 1GB test file and this causes the tests to fail in the
CheriBSD CI setup where we run tests with a tmpfs mount on /tmp. Tmpfs
does not support sparse files and it appears that tmpfs default to creating
a 1GB mount, so there is not enough space to run these tests.
Instead of checking for at least 1GB of free space, this commit skips the
tests on file systems that do not support sparse files.
Alex Richardson [Wed, 3 Feb 2021 09:29:08 +0000 (09:29 +0000)]
Fix build with read-only source dir after 83c20b8a2da0
I changed the Makefile to use SRCS instead of LDADD, but since there is
still and absolute path to the source the .o file was created inside the
source directory instead of the build directory.
It would be nice if this was an error/warning by default, but for now just
fix this issue by using .PATH and the base name of the file.
ROUTE_MPATH was added to the GENERIC kernel in r368648.
According to the plan in D27428, it was enabled with `net.route.multipath` sysctl set to 0.
Given enough time has passed, this change enables route multipath by default.
The goal is to ship FreeBSD 13 with multipath turned on.
Ed Maste [Tue, 2 Feb 2021 21:55:51 +0000 (16:55 -0500)]
Correct description for kern.proc.proc_td
kern.proc.proc_td returns the process table with an entry for each
thread. Previously the description included "no threads", presumably
a cut-and-pasteo in 2648efa621748.
Description suggested by PauAmma.
PR: 253146
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Neel Chauhan [Tue, 2 Feb 2021 21:24:17 +0000 (13:24 -0800)]
Allow setting alias port ranges in libalias and ipfw. This will allow a system
to be a true RFC 6598 NAT444 setup, where each network segment (e.g. user,
subnet) can have their own dedicated port aliasing ranges.
Alexander Motin [Tue, 2 Feb 2021 18:37:13 +0000 (13:37 -0500)]
Make DataSN counter of solicited Data-Out local.
DataSN for solicited Data-Out is per-R2T. Since we handle whole R2T
in one go, we don't need to store it anywhere, especially in global
per-command structure. This may allow us to handle multiple R2T per
command at once, if we decide, or may be relax locking.
Rename the second use of that field to io_referenced_task_tag.
Userspace has OFED build enabled for quite some time, but kernel modules
were not. This is useless config because any userspace IB code requires
kernel support. So enable modules build by default.
Move WITH_OFED to WITHOUT_OFED since defaults are now enabled.
Use compat.linux.emul_path instead of hardcoded path in /etc/rc.d/linux
In /etc/rc.d/linux the mounting paths of procfs, sysfs and devfs
are hardcoded to "/compat/linux". Switching to the content of
compat.linux.emul_path sysctl would allow to switch linuxulator
to different place.
Submitted by: freebsdnewbie_freenet.de
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27807
David Chisnall [Tue, 2 Feb 2021 14:06:33 +0000 (16:06 +0200)]
rtld: Fix null-pointer dereference
When a library is opened via fdlopen, it has a null pointer for its path
and so _rtld_bind can crash as a result of passing the null pointer to
basename() (which passes it to strrchr(), which doesn't do a null check).
Alex Richardson [Tue, 2 Feb 2021 09:55:18 +0000 (09:55 +0000)]
tests/sys/audit: Skip extattr tests if extattrs are not supported
In the CheriBSD CI, we run the testsuite with /tmp as tmpfs. This causes
the extattr audit tests to fail since tmpfs does not (yet) support
extattrs. Skip those tests if the target path is on a file system that
does not support extended file attributes.
While touching these two files also convert the ATF_REQUIRE_EQ(-1, ...)
checks to use ATF_REQURIE_ERRNO().
Roger Pau Monné [Tue, 19 Jan 2021 11:52:28 +0000 (12:52 +0100)]
bhyve/ioapic: improve the tracking of IRR bit
One common method of EOI'ing an interrupt at the IO-APIC level is to
switch the pin to edge triggering mode and then back into level mode.
That would cause the IRR bit to be cleared and thus further interrupts
to be injected. FreeBSD does indeed use that method if the IO-APIC EOI
register is not supported.
The bhyve IO-APIC emulation code didn't clear the IRR bit when doing
that switch, and was also missing acknowledging the IRR state when
trying to inject an interrupt in vioapic_send_intr.
Roger Pau Monné [Tue, 19 Jan 2021 12:41:03 +0000 (13:41 +0100)]
bhyve/ioapic: only account for asserted line in level mode
After modifying a redirection entry only try to inject an interrupt if
the pin is in level mode, pins in edge mode shouldn't take into
account the line assert status as they are triggered by edge changes,
not the line status itself.
Cy Schubert [Wed, 27 Jan 2021 15:25:00 +0000 (07:25 -0800)]
Indentation cleanup resulting from the cleanup of #ifdefs.
The conscious decision was made not to perform any indentation or
whitespace cleanup while cleaning out old redunant #ifdefs. The
reason for this was to avoid confusing future readers of history and
diffs with cosmetic changes, making bisection of any possible bugs
introduced more difficult. This commit cleans up the whitespace
detritus left behind from the previous #ifdef cleanup commits.
Cy Schubert [Tue, 26 Jan 2021 06:24:28 +0000 (22:24 -0800)]
Retire the K&R/STD C __P prototype declarations.
In the old days when K&R C and STD C were each in use a workaround
(read hack) was required to allow the same code to work on each
without modification. All C compilers support STD C. We can finally
put the __P prototype to rest.
John Baldwin [Tue, 2 Feb 2021 01:09:33 +0000 (17:09 -0800)]
Bump shared library versions after ncurses bump in 13.
A few shared libraries in the base system link against ncurses. An
upgrade from a 12.x host to 13 results in ABI breakage for existing
binaries since the newer versions of these libraries link against the
newer ncurses while the binary itself links against the older ncurses.
For example, dialog4ports built on 12.x sometimes crashes on 13 since
it depends on libdialog which links against ncurses internally.
Toomas Soome [Sun, 31 Jan 2021 21:04:59 +0000 (23:04 +0200)]
vt: parse_font_info_static should set refcount, not parse_font_info
As we get started with no memory allocator, we set up static font data
for font passed by loader (if there is any). At this time, we also must
set refcount 1, and refcount will get incremented in cnprobe() callback.
At some point the memory allocator will be available, and we will set up
properly allocated font data, but we should not disturb the refcount.
Martin Matuska [Mon, 1 Feb 2021 21:08:19 +0000 (22:08 +0100)]
zfs: update zfs_config.h to match OpenZFS gf11b09dec
Update zfs_config.h to match latest merge in FreeBSD
The version string is declared as 2.0.0-FreeBSD_gf11b09dec to provide
more information about the loaded module:
- the OpenZFS version in base is 2.0
- we are using the in tree-module ("FreeBSD")
- the last merged OpenZFS git revision ("gf11b09dec")
With future merges the git revision tag should be updated.
As we are merging from OpenZFS master branch and already include features
like dRAID, referencing patchlevel releases (2.0.1, 2.0.2) is pointless.
Reviewed by: freqlabs
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28447
iflib: Free resources in a consistent order during detach
Memory and PCI resources are freed with no particular order. This could
cause use-after-frees when detaching following a failed attach. For
instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but
iflib_tqg_detach() attempts to access this array. Similarly, adapter
queues gets freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to
access adapter queues to free PCI resources.
Jessica Clarke [Mon, 1 Feb 2021 14:15:57 +0000 (14:15 +0000)]
arm64: Improve DDB backtrace support
The existing implementation relies on each trap handler saving a normal
stack frame record, which is a waste of time and space when we're
already saving a trapframe to the stack. It's also wrong as it currently
saves LR not ELR.
Instead of patching it up, rewrite it based on the RISC-V implementation
with inspiration from the amd64 implementation for how to handle
vectored traps to provide an improved implementation. This includes
compressing the information down to one line like other architectures
rather than the highly-verbose old form that repeats itself by printing
LR and FP in one frame only to print them as PC and SP in the next. It
also includes printing out actually useful information about the traps
that occurred, though FAR is not saved in the trapframe so we cannot
print it (in general it can be clobbered between when the trap happened
and now), only ESR.
The AAPCS also allows the stack frame record to be located anywhere in
the frame, not just the top, so the caller's SP is not at a fixed offset
from the callee's FP like on almost all other architectures in
existence. This means there is no way to derive the caller's SP in the
unwinder, and so we have to drop that bit of (unused) state everywhere.
Navdeep Parhar [Mon, 1 Feb 2021 11:00:09 +0000 (03:00 -0800)]
cxgbe(4): Fixes to tx coalescing.
- The behavior implemented in r362905 resulted in delayed transmission
of packets in some cases, causing performance issues. Use a different
heuristic to predict tx requests.
- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing
entirely. It can be changed at any time. There is no change in
default behavior.
Michael Tuexen [Sun, 31 Jan 2021 22:43:15 +0000 (23:43 +0100)]
sctp: improve input validation
Improve the handling of INIT chunks in specific szenarios and
report and appropriate error cause.
Thanks to Anatoly Korniltsev for reporting the issue for the
userland stack.
mips: fix early kernel panic when setting up interrupt counters
Commit 248f0ca converted intrcnt and intrnames from u_long[]
and char[] to u_long* and char* respectively, but for non-INTRNG mips
these symbols were defined in .S file as a pre-allocated static arrays,
so the problem wasn't cought at compile time. Conversion from an array
to a pointer requires pointer initialization and it wasn't done
for MIPS, so whatever happenned to be in the begginning of intcnt[]
array was used as a pointer value.
Move intrcnt/intrnames to C code and allocate them dynamically
although with a fixed size at the moment.
Alexander Motin [Sun, 31 Jan 2021 17:46:57 +0000 (12:46 -0500)]
cxgb(4): Remove assumption of physically contiguous mbufs.
Investigation of iSCSI target data corruption reports brought me to
discovery that cxgb(4) expects mbufs to be physically contiguous, that
is not true after I've started using m_extaddref() in software iSCSI
for large zero-copy transmissions. In case of fragmented memory the
driver transmitted garbage from pages following the first one due to
simple use of pmap_kextract() for the first pointer instead of proper
bus_dmamap_load_mbuf_sg(). Seems like it was done as some optimization
many years ago, and at very least it is wrong in a world of IOMMUs.
This patch just removes that optimization, plus limits packet coalescing
for mbufs crossing page boundary, also depending on assumption of one
segment per packet.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Reviewed by: mmacy, np
Differential revision: https://reviews.freebsd.org/D28428
Kyle Evans [Sun, 31 Jan 2021 16:07:31 +0000 (10:07 -0600)]
tools: boot: use four jobs for building stand
Parallel builds of stand should be assumed both possible and safe as of 7012461c9bf6, so let's start using some jobs to speed up lualoader test
harness builds.
Kyle Evans [Sun, 31 Jan 2021 15:51:39 +0000 (09:51 -0600)]
lualoader: position hyphens at the beginning of character classes
According to the Lua 5.4 manual section 6.4.1 ("Patterns"), the interaction
between ranges and classes is not defined and hyphens must be specified at
either the beginning or the end of a set if they are not escaped.
Fix the design problem with delayed algorithm sync.
Currently, if the immutable algorithm like bsearch or radix_lockless
receives rtable update notification, it schedules algorithm rebuild.
This rebuild is executed by the callout after ~50 milliseconds.
It is possible that a script adding an interface address and than route
with the gateway bound to that address will fail. It can happen due
to the fact that fib is not updated by the time the route addition
request arrives.
Fix this by allowing synchronous algorithm rebuilds based on certain
conditions. By default, these conditions assume:
1) less than net.route.algo.fib_sync_limit=100 routes
2) routes without gateway.
* Move algo instance build entirely under rib WLOCK.
Rib lock is only used for control plane (except radix algo, but there
are no rebuilds).
* Add rib_walk_ext_locked() function to allow RIB iteration with
rib lock already held.
* Fix rare potential callout use-after-free for fds by binding fd
callout to the relevant rib rmlock. In that case, callout_stop()
under rib WLOCK guarantees no callout will be executed afterwards.
Add rib_subscribe_locked() and rib_unsubsribe_locked() to support
subscriptions during RIB modifications.
Add new subscriptions to the beginning of the lists instead of
the end. This fixes the situation when new subscription is created
int the callback for the existing subscription, leading to the
subscription notification handler pick it.
* Move per-prefix debug lines under LOG_DEBUG2
* Create fib instance counter to distingush log messages between
instances
* Add more messages on rebuild reason.
Bjoern A. Zeeb [Sat, 30 Jan 2021 17:50:26 +0000 (17:50 +0000)]
LinuxKPI: add module dependency on firmware(9)
In a6c2507d1baedb183268e31bc6b6f659a9529904 support for LinuxKPI
firmware loading was added. Record the dependency on firmware(9)
as otherwise (if built as module) linuxkpi will no longer load.
Kyle Evans [Sat, 30 Jan 2021 06:09:10 +0000 (00:09 -0600)]
build: options: mention ports in the WITH_OPENLDAP description
There's a third party dependency on this option; currently,
net/openldap24-{,sasl-}client. At least mention that an openldap from ports
is needed for this option.
PR: 252866
Reported-by: Build Option Survey via Michael Dexter
MFC-after: 3 days
Kirk McKusick [Sat, 30 Jan 2021 08:03:37 +0000 (00:03 -0800)]
Revert 2d4422e7991a, Eliminate lock order reversal in UFS ffs_unmount().
After discussion with Chuck Silvers (chs@) we have decided that
there is a better way to resolve this lock order reversal which
will be committed separately.
Kyle Evans [Sat, 30 Jan 2021 05:48:28 +0000 (23:48 -0600)]
ofed: fix the WITH_OFED_EXTRA build
This option was not tested when WARNS was globally lifted in the src tree up
to 6. Drop WARNS back down to unbreak the build; note that this is still
enabling more warnings than it had before the WARNS change, so the gcc build
may need to be independently evaluated at this level.
PR: 252865
Reported-by: Build Option Servey via Michael Dexter
MFC-after: 3 days
Mateusz Guzik [Fri, 29 Jan 2021 21:48:11 +0000 (22:48 +0100)]
Reimplement strlen
The previous code neglected to use primitives which can find the end
of the string without having to branch on every character.
While here augment the somewhat misleading commentary -- strlen as
implemented here leaves performance on the table, especially so for
userspace. Every arch should get a dedicated variant instead.
In the meantime this commit lessens the problem.
Tested with glibc test suite.
Naive test just calling strlen in a loop on Haswell (ops/s):
The initial plan was to remove rib_lookup_info() before
FreeBSD 13. As several customers are still remaining,
fix rib_lookup_info() for the multipath use case.
D26436 introduced support for stacked vlans that changed the way vlans
are configured. In particular, this change broke setups that have
same-number vlans as subinterfaces.
Vlan support was initially created assuming "vlanX" semantics. In this paradigm,
automatic number assignment supported by cloning (ifconfig vlan create) was a
natural fit.
When "ifaceX.Y" support was added, allowing to have the same vlan number on
multiple devices, cloning code became more complex, as the is no
unified "vlan" namespace anymore. Such interfaces got the first spare
index from "vlan" cloner. This, in turn, led to the following problem:
ifconfig ix0.333 create -> index 1
ifconfig ix0.444 create -> index 2
ifconfig vlan2 create -> allocation failure
This change fixes such allocations by using cloning indexes only for
"vlanX" interfaces.
Reviewed by: hselasky
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D27505