Martin Matuska [Wed, 12 Feb 2020 00:16:56 +0000 (00:16 +0000)]
MFV r357783:
Update libarchive to 3.4.2
Relevant vendor changes:
PR #1289: atomic extraction support (bsdtar -x --safe-writes)
PR #1308: big endian fix for UTF16 support in LHA reader
PR #1326: reject RAR5 files that declare invalid header flags
Issue #987: fix support 7z archive entries with Delta filter
Issue #1317: fix compression output buffer handling in XAR writer
Issue #1319: fix uname or gname longer than 32 characters in pax writer
Issue #1325: fix use after free when archiving hardlinks in ISO9660 or XAR
Use localtime_r() and gmtime_r() instead of localtime() and gmtime()
Relevant vendor changes:
PR #1289: atomic extraction support (bsdtar -x --safe-writes)
PR #1308: big endian fix for UTF16 support in LHA reader
PR #1326: reject RAR5 files that declare invalid header flags
Issue #987: fix support 7z archive entries with Delta filter
Issue #1317: fix compression output buffer handling in XAR writer
Issue #1319: fix uname or gname longer than 32 characters in pax writer
Issue #1325: fix use after free when archiving hardlinks in ISO9660 or XAR
Use localtime_r() and gmtime_r() instead of localtime() and gmtime()
Gleb Smirnoff [Tue, 11 Feb 2020 20:59:41 +0000 (20:59 +0000)]
Remove assertion from TASK_INIT() macro, since some users of
sys/taskqueue.h may not have includes that define MPASS. It
was useful during testing of r357771, but can be omitted now.
An invalid value of priority will yield only in potential
priority inversion, not a crash.
This fixes compilation of ports/x11/nvidia-driver.
Mark Johnston [Tue, 11 Feb 2020 20:06:33 +0000 (20:06 +0000)]
Reduce lock hold time in keg_drain().
Maintain a count of free slabs in the per-domain keg structure and use
that to clear the free slab list in constant time for most cases. This
helps minimize lock contention induced by reclamation, in preparation
for proactive trimming of excesses of free memory.
Gleb Smirnoff [Tue, 11 Feb 2020 18:48:07 +0000 (18:48 +0000)]
Add flag to struct task to mark the task as requiring network epoch.
When processing a taskqueue and a task has associated epoch, then
enter for duration of the task. If consecutive tasks belong to the
same epoch, batch them. Now we are talking about the network epoch
only.
Shrink the ta_priority size to 8-bits. No current consumers use
a priority that won't fit into 8 bits. Also complexity of
taskqueue_enqueue() is a square of maximum value of priority, so
we unlikely ever want to go over UCHAR_MAX here.
Mateusz Guzik [Tue, 11 Feb 2020 18:19:56 +0000 (18:19 +0000)]
vfs: fix vhold race in mnt_vnode_next_lazy_relock
vdrop can set the hold count to 0 and wait for the ->mnt_listmtx held by
mnt_vnode_next_lazy_relock caller. The routine incorrectly asserted the
count has to be > 0.
if_media.c: staticize and constify ifmedia description structures used under IFMEDIA_DEBUG.
The reason for this change is to make it clear the scope of the in-kernel usage
of IFM_TYPE_DESCRIPTIONS and IFM_SUBTYPE_ETHERNET_DESCRIPTIONS macros. Also it
is somewhat better C.
Ruslan Bukin [Tue, 11 Feb 2020 15:12:09 +0000 (15:12 +0000)]
Add PCI Express driver for the ARM Neoverse N1 System Development
Platform (N1SDP).
Neoverse N1 is a high-performance ARM microarchitecture designed
by the ARM Holdings for the server market.
The PCI part on N1SDP was shipped untested and suffers from some
integration issues.
For instance accessing to not existing BDFs causes System Error
(SError) exception. To mitigate this, the firmware scans the bus,
catches SErrors and creates a table with valid BDFs. That allows
us to filter-out accesses to invalid BDFs in this driver.
Also the root complex config space (BDF == 0) has an unusual
location in memory map, so remapping accesses to it is required.
Finally, the config space is restricted to 32-bit accesses only.
This was tested on the ARM boxes kindly provided by the ARM Ltd
to the DARPA CHERI Project.
In collaboration with: andrew
Reviewed by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D23349
Kyle Evans [Tue, 11 Feb 2020 06:12:02 +0000 (06:12 +0000)]
backup-passwd: mask out all passwords in the diff
The previous expression borked if a username had a plus or hyphen in it.
This is needlessly restrictive- at leSt a hyphen in the middle is valid.
Instead of playing this game, let's just assume the username can't contain a
colon and mask out the second field.
Submitted by: sigsys gmail com
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D23548
Colin Percival [Tue, 11 Feb 2020 04:03:22 +0000 (04:03 +0000)]
Remove /qemu from EC2 ARM AMIs
I forgot to do this as part of r345858 -- I added it to the
vm_extra_pre_umount in vmimage.subr but forgot that function
was overridden in the EC2 build.
Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt
handler.
Interrupt handlers are removed via intr_event_execute_handlers() when
IH_DEAD is set. The thread removing the interrupt is woken up, and
calls intr_event_update(). When this happens, the ie_hflags are
cleared and re-built from all the remaining handlers sharing the
event. When the last IH_NET handler is removed, the IH_NET flag will
be cleared from ih_hflags (or ie_hflags may still be being rebuilt in
a different context), and the ithread_execute_handlers() may return
with ie_hflags missing IH_NET. This can lead to a scenario where
IH_NET was present before calling ithread_execute_handlers, and is not
present at its return, meaning the need for epoch must be cached
locally.
This can happen when loading and unloading network drivers. Also make
sure the ie_hflags is not cleared before being updated.
This is a regression issue after r357004.
Backtrace:
panic()
# trying to access epoch tracker on stack of dead thread
_epoch_enter_preempt()
ifunit_ref()
ifioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()
syscallenter()
amd64_syscall()
Modify the vm.panic_on_oom sysctl to take a count of events.
Currently, the vm.panic_on_oom sysctl is a boolean which controls the
behavior of the VM system when it encounters an out-of-memory situation.
If set to 0, the VM system kills the largest process. If set to any other
value, the VM system will initiate a panic.
This change makes the sysctl a count of events. If set to 0, the VM system
kills the largest process. If set to any other value, the VM system will
kill the largest process until it has seen the specified number of
out-of-memory events. Once it reaches the specified number of events, it
will initiate a panic.
This change is helpful in capturing cores when the system is in a perpetual
cycle of out-of-memory events (as opposed to just hitting one or two
sporadic out-of-memory events).
Warner Losh [Mon, 10 Feb 2020 17:16:59 +0000 (17:16 +0000)]
Remove sparc64 specific eeprom command
This command was only ever for sparc64, so remove it. Remove
usr.sbin/Makeiile.sparc64 as well since it only references ofwdump
(cross platform) and eeprom.
Mateusz Guzik [Mon, 10 Feb 2020 13:54:34 +0000 (13:54 +0000)]
vfs: fix lock recursion in vrele
vrele is supposed to be called with an unlocked vnode, but this was never
asserted for if v_usecount was > 0. For such counts the lock is never touched
by the routine. As a result the kernel has several consumers which expect
vunref semantics and get away with calling vrele since they happen to never do
it when this is the last reference (and for some of them this may happen to be
a guarantee).
Work around the problem by changing vrele semantics to tolerate being called
with a lock. This eliminates a possible bug where the lock is already held and
vputx takes it anyway.
Scott Long [Mon, 10 Feb 2020 00:23:20 +0000 (00:23 +0000)]
Add rudamentary support for UFS to probe whether a block device supports the
BIO_SPEEDUP command. Add complimentary support to the CAM periphs that
support it.
Ian Lepore [Mon, 10 Feb 2020 00:05:04 +0000 (00:05 +0000)]
Implement atomic_testandclear_{32,int,long} for 32-bit arm. Also, replace
the existing implementation of atomic_testandset with the same new algorithm,
which uses fewer instructions and fewer registers.
Kyle Evans [Sun, 9 Feb 2020 18:53:53 +0000 (18:53 +0000)]
mips: mark GOOGLETEST broken, due to no fault of its own
As explained in the comment; GOOGLETEST cannot currently be compiled on any
mips variant at the moment due to the cross toolchain seemingly using the
wrong spec and not pulling in libgcc. We'll be fine when llvm 10 lands, at
which point this should be reverted most expeditiously.
Use sigfastblock(2) for masking signals in libthr.
Ensure proper handshake to transfer sigfastblock(2) blocking word
ownership from rtld to libthr.
Unfortunately sigfastblock(2) is not enough to stop intercepting
signals in libthr, because critical sections must ensure more than
just signal blocking.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
This allows for rtld to not issue two sigprocmask(2) syscalls for each
symbol binding operation in single-threaded processes. Rtld needs to
block signals as part of locking to ensure signal safety of the bind
process, because signal handlers might need to lazily resolve symbol
references.
As result, number of syscalls issued on startup by simple programs not
using libthr, is typically reduced 2x. For instance, for hello world,
I see:
non-sigfastblock
# (truss ./hello > /dev/null) |& wc -l
63
sigfastblock
# (truss ./hello > /dev/null) |& wc -l
37
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
The intent is to provide bsd-specific flags relevant to interpreter
and C runtime. I did not want to reuse AT_FLAGS which is common ELF
auxv entry.
Use bsdflags to report kernel support for sigfastblock(2). This
allows rtld and libthr to safely infer the syscall presence without
SIGSYS. The tunable kern.elf{32,64}.sigfastblock blocks reporting.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
Kyle Evans [Sun, 9 Feb 2020 04:05:30 +0000 (04:05 +0000)]
MFV r357687: Import NFS fix for O_SEARCH tests
The version that ended upstream was ultimately slightly different than the
version committed here; notably, statvfs() is used but it's redefined
appropriately to statfs() on FreeBSD since we don't provide the fstypename
for the former interface.
Marcin Wojtas [Sat, 8 Feb 2020 13:33:47 +0000 (13:33 +0000)]
Implement jumbo frame support in mvneta driver
This patch introduces processing of the frames
up to 9kB by the mvneta driver. Some versions of
this NIC limit TX checksum offloading, depending
on the frame size, so add appropriate handling
of this feature.
Kyle Evans [Fri, 7 Feb 2020 22:36:37 +0000 (22:36 +0000)]
O_SEARCH test: mark revokex an expected fail on NFS
The revokex test does not work when the scratch directory is created on NFS.
Given the nature of NFS, it likely can never work without looking like a
security hole since O_SEARCH would rely on the server knowing that the
directory did have +x at the time of open and that it's OK for it to have
been revoked based on POSIX specification for O_SEARCH.
This does mean that O_SEARCH is only partially functional on NFS in general,
but I suspect the execute bit getting revoked in the process is likely not
common.
To make the PMC tool pmcstat working properly on Hygon platform, add
support for Hygon Dhyana family 18h by using the PMC initialization
code path of AMD family 17h.
Kyle Evans [Fri, 7 Feb 2020 21:36:14 +0000 (21:36 +0000)]
geli taste: allow GELIBOOT tagged providers as well
Currently the installer will tag geliboot partitions with both BOOT and
GELIBOOT; the former allows the kernel to taste it at boot, while the latter
is what loaders keys off of.
However, it seems reasonable to assume that if a provider's been tagged with
GELIBOOT that the kernel should also take that as a hint to taste/attach at
boot. This would allow us to stop tagging GELIBOOT partitions with BOOT in
bsdinstall, but I'm not sure that there's a compelling reason to do so any
time soon.
Warner Losh [Fri, 7 Feb 2020 17:47:08 +0000 (17:47 +0000)]
Supress not supported message
For the moment, supress the operation not supported messages at this level. In
the fullness of time, we will have better error tracking so we can diagnose
issues in the future.
Alexander Motin [Fri, 7 Feb 2020 15:50:47 +0000 (15:50 +0000)]
Remove duplicate dbufs accounting.
Since AVL already has embedded element counter, use dn_dbufs_count
only for dbufs not counted there (bonus buffers) and just add them.
This removes two atomics per dbuf life cycle.
According to profiler it reduces time spent by dbuf_destroy() inside
bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core.
This counter is used only on illumos, so for FreeBSD it was just a
waste of time.
Ruslan Bukin [Fri, 7 Feb 2020 14:36:28 +0000 (14:36 +0000)]
Fix xae(4) driver attachement on the Government Furnished Equipment (GFE)
riscv cores.
GFE cores come with standard DTS file that lacks standard 'dmas ='
property, which means xae(4) could not find a DMA controller to use.
The 'dmas' property could not be added to the DTS file because the
ethernet controller and DMA engine parts in Linux are implemented
in a single driver.
Instead of 'dmas' property the standard Xilinx 'axistream-connected'
property is provided, so fallback to use it instead.
Suggested by: James Clarke <jrtc27@jrtc27.com>
Reviewed by: James Clarke <jrtc27@jrtc27.com>
Sponsored by: DARPA, AFRL
Scott Long [Fri, 7 Feb 2020 12:15:39 +0000 (12:15 +0000)]
Advertise the MPI Message Version that's contained in the IOCFacts message
in the sysctl block for the driver. mpsutil/mprutil needs this so it can
know how big of a buffer to allocate when requesting the IOCFacts from the
controller. This eliminates the kernel console messages about wrong
allocation sizes.