kern: random: collect ~16x less from fast-entropy sources
Previously, we were collecting at a base rate of:
64 bits x 32 pools x 10 Hz = 2.5 kB/s
This change drops it to closer to 64-ish bits per pool per second, to
work a little better with entropy providers in virtualized environments
without compromising the security goals of Fortuna.
kern: random: drop read_rate and associated functionality
Refer to discussion in PR 230808 for a less incomplete discussion, but
the gist of this change is that we currently collect orders of magnitude
more entropy than we need.
The excess comes from bytes being read out of /dev/*random. The default
rate at which we collect entropy without the read_rate increase is
already more than we need to recover from a compromise of an internal
state.
Avoid using atomics as it_wait is guarded by td_lock.
Report threshold calculation is done only if at least one PMC hook
is installed
Fixes:
* avoid unnecessary branching (if frame != null ...)
by having PMC_HOOK_INSTALLED_ANY
condition on the top of them, which should hint
the core not to execute speculatively anything
which us underneath;
* access intr_hwpmc_waiting_report_threshold cacheline
only if at least one hook is loaded;
truss: Decode correctly 64bits arguments on 32bits arm.
Mostly revert ebbc3140ca0d7eee154f7a67ccdae7d3d88d13fd.
We don't need to special-case anything for arm64, the check for the pointer
size is already done for us, just keep the bits about having arm and arm64
having to add padding for 32bits binaries.
Eliminate an unnecessary rerun request in fsck_ffs.
When fsck_ffs is running in preen mode and finds a zero-length directory,
it deletes that directory. In doing this operation, it unnecessary set
its internal flag saying that fsck_ffs needed to be rerun. This patch
deletes the rerun request for this case.
Reported by: Mark Johnson
PR: 246962
MFC after: 1 week
Sponsored by: Netflix
truss: Decode correctly 64bits arguments on 32bits arm.
When decoding 32bits arm syscall, make sure we account for the padding when
decoding 64bits args. Do it too when using a 64bits truss on a 32bits binary.
Add aarch64 to the list of architectures that can run 32bits FreeBSD binaries,
so that truss works correctly with an arm32 binary.
The same should probably be done with mips.
Revert "linux32: add a hack to avoid redefining the type of the savefpu tag"
This reverts commit 0f6829488ef32142b9ea1c0806fb5ecfe0872c02.
Also it changes the type of md_usr_fpu_save struct mdthread member
to void *, which is what uncovered this trouble. Now the save area
is untyped, but since it is hidden behind accessors, it is not too
significant. Since apparently there are consumers affected outside
the tree, this hack is better than one from the reverted revision.
Stefan Eßer [Wed, 22 Sep 2021 11:59:01 +0000 (13:59 +0200)]
ObsoleteFiles.inc: Add sponge(1) command and man-page
The sponge command has been imported on 2017-12-05 but the import has
been reverted the next day.
A script failed and I found that it was due to the left-over broken
sponge binary in base being prefered over the port version. To prevent
a known non-working binary to persist in /usr/bin, I'm adding sponge
to the obsolete files list even though it could only be installed on
a single day in 2017.
I do not plan to MFC this change since the issue will only exist on
systems installed from -CURRENT sources in 2017, and I do assume that
such systems are not running -STABLE today
Until this change, any bindings set in histedit() were lost on calls to
bindcmd().
Only bind -e and bind -v call libedit's keymacro_reset(). Currently you
cannot fool libedit/map.c:map_bind() by trying something like bind -le
as when p[0] == '-', it does a switch statement on p[1].
Alexander Motin [Tue, 21 Sep 2021 22:14:22 +0000 (18:14 -0400)]
sched_ule(4): Improve long-term load balancer.
Before this change long-term load balancer was unable to migrate
running threads, only ones waiting on run queues. But with growing
number of CPU cores it is quite typical now for system to not have
many waiting threads. But same time if due to some coincidence two
long-running CPU-bound threads ended up sharing same physical CPU
core, they could suffer from the SMT penalty indefinitely, and the
load balancer couldn't help.
Improve that by teaching the load balancer to hint running threads
to migrate by marking them with TDF_NEEDRESCHED and new TDF_PICKCPU
flag, making sched_pickcpu() to search for better CPU later, when
it is convenient.
Fix CPU search logic when balancing to limit round-robin migrations
in case of almost equal load to the group of physical cores. The
previous code bounced threads across all the system, that should be
pretty bad for caches and NUMA affinity, while additional fairness
was almost invisible, diminishing with number of cores in the group.
According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.
There is no reason to keep it in FreeBSD.
Approved by: ed (private mail)
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31923
After b4cb3fe0e39a, loader started crashing on PowerPC64, with a
Program Exception (700) error. The problem was that archsw was
used before being initialized, with the new mount feature. This
change fixes the issue by initializing archsw earlier, before
setting currdev, that triggers the mount.
Reviewed by: tsoome
MFC after: 1 month
X-MFC-With: b4cb3fe0e39a
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D32027
Alan Somers [Thu, 16 Sep 2021 19:19:21 +0000 (13:19 -0600)]
fusefs: don't panic if FUSE_GETATTR fails durint VOP_GETPAGES
During VOP_GETPAGES, fusefs needs to determine the file's length, which
could require a FUSE_GETATTR operation. If that fails, it's better to
SIGBUS than panic.
For signal send, copyout from the user FPU save area directly.
For sigreturn, we are in sleepable context and can do temporal
allocation of the transient save area. We cannot copying from userspace
directly to user save area because XSAVE state needs to be validated,
also partial copyins can corrupt it.
amd64: stop using top of the thread' kernel stack for FPU user save area
Instead do one more allocation at the thread creation time. This frees
a lot of space on the stack.
Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn(). This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.
A useful experiment now would be to reduce KSTACK_PAGES.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
amd64: move signal handling and register structures manipulations into exec_machdep.c
from machdep.c which is too large pile of unrelated things.
Some ptrace functions are moved from machdep.c to ptrace_machdep.c.
Now machdep.c contains code mostly related to the low level initialization
and regular low level operation of the architecture, while signal MD code
and registers handling is placed in exec_machdep.c.
Write to the PWREN register should be done in update_ios based
on the power_mode value in the ios struct.
Also none of the manual (RockChip and Altera) and Linux talks about
the needed for an inverted PWREN value so just remove this.
This fixes eMMC (and possibly SD) when u-boot didn't setup the controller.
Mark Johnston [Tue, 21 Sep 2021 15:32:23 +0000 (11:32 -0400)]
bitset(9): Introduce BIT_FOREACH_ISSET and BIT_FOREACH_ISCLR
These allow one to non-destructively iterate over the set or clear bits
in a bitset. The motivation is that we have several code fragments
which iterate over a CPU set like this:
This is slow since CPU_FFS begins the search at the beginning of the
bitset each time. On amd64 and arm64, CPU sets have size 256, so there
are four limbs in the bitset and we do a lot of unnecessary scanning.
A second problem is that this is destructive, so code which needs to
preserve the original set has to make a copy. In particular, we have
quite a few functions which take a cpuset_t parameter by value, meaning
that each call has to copy the 32 byte cpuset_t.
The new macros address both problems.
Reviewed by: cem, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32028
Michael Tuexen [Tue, 21 Sep 2021 15:13:57 +0000 (17:13 +0200)]
sctp: Simplify stream scheduler usage
Callers are getting the stcb send lock, so just KASSERT that.
No need to signal this when calling stream scheduler functions.
No functional change intended.
A different exception is raised when we hit a 32bits breakpoint, rather than
a 64bits one, so handle those as well when COMPAT_FREEBSD32 is defined.
This should fix SIGBUS at least when using breakpoints with thumb2 code.
When handling a data irq, the sdhci driver calls the
sdhci_platform_will_handle() method, to determine if it should allow the
platform driver to handle the transfer or fall back to programmed I/O.
While dumping, the data irq path may be invoked directly (not from an
interrupt context), which the bcm2835_sdhci DMA code is not prepared to
handle. Return early in this case, to force the fallback to PIO.
Otherwise, the KASSERT that follows will be triggered, and the dump will
fail. On non-INVARIANTS kernels, the system will hang, waiting for a DMA
interrupt that will never arrive.
Reviewed by: kevans
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31893
Warner Losh [Tue, 21 Sep 2021 04:02:35 +0000 (22:02 -0600)]
endian.h: Use the __bswap* versions
Make it possible to have all these macros work without bswap* being
defined. bswap* is part of the application namespace and applications
are free to redefine those functions.
Warner Losh [Fri, 17 Sep 2021 22:30:06 +0000 (16:30 -0600)]
camcontrol: depop command
Implement and document the new depop command. This command manages drive elements
for drives that support it. Storage elements are typically heads. Element status
can be discovered. Elements may be removed or restored. And the status of any
current depop operation can be assessed.
depop -d elm will remove element elm and truncate available capacity.
depop -l will list the current drive elements and their current status.
depop -r elm will try to restore all retired elements and rebuild capacity.
Changing storage elements may reinitialize the drive. This operation will lose
data and may take hours to complete. Use the drive provided timeout for
operations by default.
Warner Losh [Fri, 17 Sep 2021 22:29:22 +0000 (16:29 -0600)]
libcam: Define depop structures and introduce scsi_wrap
Define structures related to the depop set of commands (GET PHYSICAL ELEMENT
STATUS, REMOVE ELEMENT AND TRUNCATE, and RESTORE ELEMENT AND REBUILD) as
well as the CDB construction routines.
Also create scsi_wrap.c. This will have convenience routines that will do all
the elements of allocating the ccb, generating the CDB, sending the command
(looping as necessary for cases where data is returned, but it's size isn't
known up front), etc. As this functionality is fleshed out, calling many
camcontrol commands programatically gets much easier.
Greg V [Sat, 24 Apr 2021 11:53:34 +0000 (14:53 +0300)]
vt: call driver's postswitch when panicking on ttyv0
In vt_kms, the postswitch callback restores fbdev mode when
panicking or entering the debugger. This ensures that even when
a graphical applicatino was running on the first tty, simple framebuffer
mode would be restored and the panic would be visible instead
of the frozen GUI. But vt wouldn't call the postswitch callback
when we're already on the first tty, so running a GUI on it
would prevent you from reading any panics.
Add generic mmc_helper which uses newly introduced device_*_property
api. Thanks to this change the sd/mmc drivers will be capable
of parsing both DT and ACPI description.
Ensure backward compatibility for all mmc_fdt_helper users.
Andrew Turner [Mon, 20 Sep 2021 08:55:44 +0000 (08:55 +0000)]
Add ELF macros found in the aaelf64 spec
The arm64 aaelf64 spec [0] has DT_AARCH64_ that could be used with
dynamic linking. It also adds GNU_PROPERTY_AARCH64_FEATURE_1_AND used
to tell the kernel which CPU features the binary is compatible with,
but does not require to execute correctly.
Add these values so the kernel and elf tools can make use of them.
Xin LI [Mon, 20 Sep 2021 05:25:23 +0000 (22:25 -0700)]
The linux rc.d script mounts several filesystems related to Linux ABI
compatibility layer. When /compat is located on a ZFS other than /,
mount would fail because they were not mounted.
Solve this by moving `linux` to depend on `zfs` which mounts all ZFS
filesystems.
Mark Johnston [Sun, 19 Sep 2021 17:45:09 +0000 (13:45 -0400)]
freebsd32: Fix a double copyin in sendmsg() and recvmsg()
freebsd32_sendmsg() and freebsd32_recvmsg() both copyin the message
header twice, once directly and once in freebsd32_copyinmsghdr(). The
iovec length from the former is used when copying in msg_iov, but the
rest of the kernel uses the iovec length from the latter. When
kern_sendit() and kern_recvit() iterate over the iovec to compute the
residual for I/O, they can therefore end up walking past the end of the
copied in iovec, either resulting in a system call error, userspace
memory corruption from uiomove() with invalid iovecs, or a kernel page
fault if the copied-in iovec is followed by an unmapped KVA region.
Reported by: syzbot+7cc64cd0c49605acd421@syzkaller.appspotmail.com
Reviewed by: kib, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32010
When multiple matches are found, we keep the provided string on the
input line and print unique matches as suggestions.
But the multiple matches might be the same command found in different
directories, so we should deduplicate the matches first and then decide
whether to autocomplete the command or not, based on the number of
unique matches.
Eliminate snaplk / bufwait LOR when creating UFS snapshots
Each vnode has an embedded lock that controls access to its contents.
However vnodes describing a UFS snapshot all share a single snapshot
lock to coordinate their access and update. As part of creating a
new UFS snapshot, it has to have its individual vnode lock replaced
with the filesystem's snapshot lock.
The lock order for regular vnodes with respect to buffer locks is that
they must first acquire the vnode lock, then a buffer lock. The order
for the snapshot lock is reversed: a buffer lock must be acquired before
the snapshot lock.
When creating a new snapshot, the snapshot file must retain its vnode
lock until it has allocated all the blocks that it needs before
switching to the snapshot lock. This update moves one final piece of
the initial snapshot block allocation so that it is done before the
newly created snapshot is switched to use the snapshot lock.
Rick Macklem [Sat, 18 Sep 2021 21:38:43 +0000 (14:38 -0700)]
nfscl: Use vfs.nfs.maxalloclen to limit Deallocate RPC RTT
Unlike Copy, the NFSv4.2 Allocate and Deallocate operations do not
allow a reply with partial completion. As such, the only way to
limit the time the operation takes to provide a reasonable RPC RTT
is to limit the size of the allocation/deallocation in the NFSv4.2
client.
This patch uses the sysctl vfs.nfs.maxalloclen to set
the limit on the size of the Deallocate operation.
There is no way to know how long a server will take to do an
deallocate operation, but 64Mbytes results in a reasonable
RPC RTT for the slow hardware I test on.
For an 8Gbyte deallocation, the elapsed time for doing it in 64Mbyte
chunks was the same (within margin of variability) as the
elapsed time taken for a single large deallocation
operation for a FreeBSD server with a UFS file system.
Mark Johnston [Sat, 18 Sep 2021 14:38:39 +0000 (10:38 -0400)]
unix: Fix a use-after-free in unp_drop()
We need to load the socket pointer after locking the PCB, otherwise
the socket may have been detached and freed by the time that unp_drop()
sets so_error.
This previously went unnoticed as the socket zone was _NOFREE.