kib [Tue, 11 Feb 2020 17:45:01 +0000 (17:45 +0000)]
if_media.c: staticize and constify ifmedia description structures used under IFMEDIA_DEBUG.
The reason for this change is to make it clear the scope of the in-kernel usage
of IFM_TYPE_DESCRIPTIONS and IFM_SUBTYPE_ETHERNET_DESCRIPTIONS macros. Also it
is somewhat better C.
br [Tue, 11 Feb 2020 15:12:09 +0000 (15:12 +0000)]
Add PCI Express driver for the ARM Neoverse N1 System Development
Platform (N1SDP).
Neoverse N1 is a high-performance ARM microarchitecture designed
by the ARM Holdings for the server market.
The PCI part on N1SDP was shipped untested and suffers from some
integration issues.
For instance accessing to not existing BDFs causes System Error
(SError) exception. To mitigate this, the firmware scans the bus,
catches SErrors and creates a table with valid BDFs. That allows
us to filter-out accesses to invalid BDFs in this driver.
Also the root complex config space (BDF == 0) has an unusual
location in memory map, so remapping accesses to it is required.
Finally, the config space is restricted to 32-bit accesses only.
This was tested on the ARM boxes kindly provided by the ARM Ltd
to the DARPA CHERI Project.
In collaboration with: andrew
Reviewed by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D23349
kevans [Tue, 11 Feb 2020 06:12:02 +0000 (06:12 +0000)]
backup-passwd: mask out all passwords in the diff
The previous expression borked if a username had a plus or hyphen in it.
This is needlessly restrictive- at leSt a hyphen in the middle is valid.
Instead of playing this game, let's just assume the username can't contain a
colon and mask out the second field.
Submitted by: sigsys gmail com
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D23548
cperciva [Tue, 11 Feb 2020 04:03:22 +0000 (04:03 +0000)]
Remove /qemu from EC2 ARM AMIs
I forgot to do this as part of r345858 -- I added it to the
vm_extra_pre_umount in vmimage.subr but forgot that function
was overridden in the EC2 build.
hselasky [Mon, 10 Feb 2020 20:23:08 +0000 (20:23 +0000)]
Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt
handler.
Interrupt handlers are removed via intr_event_execute_handlers() when
IH_DEAD is set. The thread removing the interrupt is woken up, and
calls intr_event_update(). When this happens, the ie_hflags are
cleared and re-built from all the remaining handlers sharing the
event. When the last IH_NET handler is removed, the IH_NET flag will
be cleared from ih_hflags (or ie_hflags may still be being rebuilt in
a different context), and the ithread_execute_handlers() may return
with ie_hflags missing IH_NET. This can lead to a scenario where
IH_NET was present before calling ithread_execute_handlers, and is not
present at its return, meaning the need for epoch must be cached
locally.
This can happen when loading and unloading network drivers. Also make
sure the ie_hflags is not cleared before being updated.
This is a regression issue after r357004.
Backtrace:
panic()
# trying to access epoch tracker on stack of dead thread
_epoch_enter_preempt()
ifunit_ref()
ifioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()
syscallenter()
amd64_syscall()
jtl [Mon, 10 Feb 2020 18:06:38 +0000 (18:06 +0000)]
Modify the vm.panic_on_oom sysctl to take a count of events.
Currently, the vm.panic_on_oom sysctl is a boolean which controls the
behavior of the VM system when it encounters an out-of-memory situation.
If set to 0, the VM system kills the largest process. If set to any other
value, the VM system will initiate a panic.
This change makes the sysctl a count of events. If set to 0, the VM system
kills the largest process. If set to any other value, the VM system will
kill the largest process until it has seen the specified number of
out-of-memory events. Once it reaches the specified number of events, it
will initiate a panic.
This change is helpful in capturing cores when the system is in a perpetual
cycle of out-of-memory events (as opposed to just hitting one or two
sporadic out-of-memory events).
imp [Mon, 10 Feb 2020 17:16:59 +0000 (17:16 +0000)]
Remove sparc64 specific eeprom command
This command was only ever for sparc64, so remove it. Remove
usr.sbin/Makeiile.sparc64 as well since it only references ofwdump
(cross platform) and eeprom.
mjg [Mon, 10 Feb 2020 13:54:34 +0000 (13:54 +0000)]
vfs: fix lock recursion in vrele
vrele is supposed to be called with an unlocked vnode, but this was never
asserted for if v_usecount was > 0. For such counts the lock is never touched
by the routine. As a result the kernel has several consumers which expect
vunref semantics and get away with calling vrele since they happen to never do
it when this is the last reference (and for some of them this may happen to be
a guarantee).
Work around the problem by changing vrele semantics to tolerate being called
with a lock. This eliminates a possible bug where the lock is already held and
vputx takes it anyway.
scottl [Mon, 10 Feb 2020 00:23:20 +0000 (00:23 +0000)]
Add rudamentary support for UFS to probe whether a block device supports the
BIO_SPEEDUP command. Add complimentary support to the CAM periphs that
support it.
ian [Mon, 10 Feb 2020 00:05:04 +0000 (00:05 +0000)]
Implement atomic_testandclear_{32,int,long} for 32-bit arm. Also, replace
the existing implementation of atomic_testandset with the same new algorithm,
which uses fewer instructions and fewer registers.
kevans [Sun, 9 Feb 2020 18:53:53 +0000 (18:53 +0000)]
mips: mark GOOGLETEST broken, due to no fault of its own
As explained in the comment; GOOGLETEST cannot currently be compiled on any
mips variant at the moment due to the cross toolchain seemingly using the
wrong spec and not pulling in libgcc. We'll be fine when llvm 10 lands, at
which point this should be reverted most expeditiously.
kib [Sun, 9 Feb 2020 12:27:22 +0000 (12:27 +0000)]
Use sigfastblock(2) for masking signals in libthr.
Ensure proper handshake to transfer sigfastblock(2) blocking word
ownership from rtld to libthr.
Unfortunately sigfastblock(2) is not enough to stop intercepting
signals in libthr, because critical sections must ensure more than
just signal blocking.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
kib [Sun, 9 Feb 2020 12:22:43 +0000 (12:22 +0000)]
Use sigfastblock(2) in rtld.
This allows for rtld to not issue two sigprocmask(2) syscalls for each
symbol binding operation in single-threaded processes. Rtld needs to
block signals as part of locking to ensure signal safety of the bind
process, because signal handlers might need to lazily resolve symbol
references.
As result, number of syscalls issued on startup by simple programs not
using libthr, is typically reduced 2x. For instance, for hello world,
I see:
non-sigfastblock
# (truss ./hello > /dev/null) |& wc -l
63
sigfastblock
# (truss ./hello > /dev/null) |& wc -l
37
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
kib [Sun, 9 Feb 2020 12:10:37 +0000 (12:10 +0000)]
Add AT_BSDFLAGS auxv entry.
The intent is to provide bsd-specific flags relevant to interpreter
and C runtime. I did not want to reuse AT_FLAGS which is common ELF
auxv entry.
Use bsdflags to report kernel support for sigfastblock(2). This
allows rtld and libthr to safely infer the syscall presence without
SIGSYS. The tunable kern.elf{32,64}.sigfastblock blocks reporting.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
kib [Sun, 9 Feb 2020 11:53:12 +0000 (11:53 +0000)]
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
kevans [Sun, 9 Feb 2020 04:05:30 +0000 (04:05 +0000)]
MFV r357687: Import NFS fix for O_SEARCH tests
The version that ended upstream was ultimately slightly different than the
version committed here; notably, statvfs() is used but it's redefined
appropriately to statfs() on FreeBSD since we don't provide the fstypename
for the former interface.
This patch introduces processing of the frames
up to 9kB by the mvneta driver. Some versions of
this NIC limit TX checksum offloading, depending
on the frame size, so add appropriate handling
of this feature.
kevans [Fri, 7 Feb 2020 22:36:37 +0000 (22:36 +0000)]
O_SEARCH test: mark revokex an expected fail on NFS
The revokex test does not work when the scratch directory is created on NFS.
Given the nature of NFS, it likely can never work without looking like a
security hole since O_SEARCH would rely on the server knowing that the
directory did have +x at the time of open and that it's OK for it to have
been revoked based on POSIX specification for O_SEARCH.
This does mean that O_SEARCH is only partially functional on NFS in general,
but I suspect the execute bit getting revoked in the process is likely not
common.
kib [Fri, 7 Feb 2020 22:28:04 +0000 (22:28 +0000)]
pmc: Add Hygon Dhyana support.
To make the PMC tool pmcstat working properly on Hygon platform, add
support for Hygon Dhyana family 18h by using the PMC initialization
code path of AMD family 17h.
kevans [Fri, 7 Feb 2020 21:36:14 +0000 (21:36 +0000)]
geli taste: allow GELIBOOT tagged providers as well
Currently the installer will tag geliboot partitions with both BOOT and
GELIBOOT; the former allows the kernel to taste it at boot, while the latter
is what loaders keys off of.
However, it seems reasonable to assume that if a provider's been tagged with
GELIBOOT that the kernel should also take that as a hint to taste/attach at
boot. This would allow us to stop tagging GELIBOOT partitions with BOOT in
bsdinstall, but I'm not sure that there's a compelling reason to do so any
time soon.
imp [Fri, 7 Feb 2020 17:47:08 +0000 (17:47 +0000)]
Supress not supported message
For the moment, supress the operation not supported messages at this level. In
the fullness of time, we will have better error tracking so we can diagnose
issues in the future.
mav [Fri, 7 Feb 2020 15:50:47 +0000 (15:50 +0000)]
Remove duplicate dbufs accounting.
Since AVL already has embedded element counter, use dn_dbufs_count
only for dbufs not counted there (bonus buffers) and just add them.
This removes two atomics per dbuf life cycle.
According to profiler it reduces time spent by dbuf_destroy() inside
bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core.
This counter is used only on illumos, so for FreeBSD it was just a
waste of time.
Fix xae(4) driver attachement on the Government Furnished Equipment (GFE)
riscv cores.
GFE cores come with standard DTS file that lacks standard 'dmas ='
property, which means xae(4) could not find a DMA controller to use.
The 'dmas' property could not be added to the DTS file because the
ethernet controller and DMA engine parts in Linux are implemented
in a single driver.
Instead of 'dmas' property the standard Xilinx 'axistream-connected'
property is provided, so fallback to use it instead.
Suggested by: James Clarke <jrtc27@jrtc27.com>
Reviewed by: James Clarke <jrtc27@jrtc27.com>
Sponsored by: DARPA, AFRL
scottl [Fri, 7 Feb 2020 12:15:39 +0000 (12:15 +0000)]
Advertise the MPI Message Version that's contained in the IOCFacts message
in the sysctl block for the driver. mpsutil/mprutil needs this so it can
know how big of a buffer to allocate when requesting the IOCFacts from the
controller. This eliminates the kernel console messages about wrong
allocation sizes.
scottl [Fri, 7 Feb 2020 09:22:08 +0000 (09:22 +0000)]
Ever since the block layer expanded its command syntax beyond just
BIO_READ and BIO_WRITE, we've handled this expanded syntax poorly in
drivers when the driver doesn't support a particular command. Do a
sweep and fix that.
jhb [Thu, 6 Feb 2020 21:46:15 +0000 (21:46 +0000)]
Tidy the _set_tp function for RISC-V.
- Use a constant for the offset instead of a magic number.
- Use an addi instruction that writes to tp directly instead of a mv
that writes the result of a compiler-generated addi.
jeff [Thu, 6 Feb 2020 20:51:46 +0000 (20:51 +0000)]
Fix a race in smr_advance() that could result in unnecessary poll calls.
This was relatively harmless but surprising to see in counters. The
race occurred when rd_seq was read after the goal was updated and we
incorrectly calculated the delta between them.
jeff [Thu, 6 Feb 2020 20:47:50 +0000 (20:47 +0000)]
Temporarily force IFF_NEEDSEPOCH until drivers have been resolved.
Recent network epoch changes have left some drivers unexpectedly broken
and there is not yet a consensus on the correct fix. This is patch is
a minor performance impact until we can agree on the correct path
forward.
mav [Thu, 6 Feb 2020 20:32:53 +0000 (20:32 +0000)]
Reduce number of atomic_add() calls in aggsum.
Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to
re-borrow after the flush. But since asc_borrowed and asc_delta are
accessed only while holding asc_lock, it makes no any sense to modify
as_lower_bound and as_upper_bound in multiple steps. Instead of that
the new code uses only 2 atomics in all the cases, one per as_*_bound
variable. I think even that is overkill, simple atomic store and
load could be used here, since all modifications are done under the
as_lock, but there are no such primitives in ZFS code now.
While there, make borrow code consider previous borrow value, so that
on mixed request patterns reduce chance of needing to borrow again if
much larger request follows tiny one that needed borrow.
Also reduce as_numbuckets from uint64_t to u_int. It makes no sense
to use so large division operation on every aggsum_add().
Reviewed by: Brian Behlendorf, Paul Dagnelie
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
0mp [Thu, 6 Feb 2020 20:18:45 +0000 (20:18 +0000)]
Improve documentation of bootconfig and PARTITIONS
- Mention bootconfig target in TARGETS section.
- Document PARTITIONS variable, which is only mentioned in the examples,
but doesn't have its own point.
jeff [Thu, 6 Feb 2020 20:10:21 +0000 (20:10 +0000)]
Add some global counters for SMR. These may eventually become per-smr
counters. In my stress test there is only one poll for every 15,000
frees. This means we are effectively amortizing the cache coherency
overhead even with very high write rates (3M/s/core).
kevans [Thu, 6 Feb 2020 18:51:36 +0000 (18:51 +0000)]
MFV r357635: imnport v1.9 of the O_SEARCH tests
The RCSID data was wrong, so this is effectively a record-only merge
with correction of said data. No further changes should be needed in this
area, as we've now upstreamed our local changes to this specific test.
jhb [Thu, 6 Feb 2020 18:04:45 +0000 (18:04 +0000)]
Use the context created in makectx() for stack traces.
Always use the kdb_thr_ctx() for db_trace_thread() as on other
architectures. Initialize pcb_ra to be the sepc from the saved
trapframe rather than the saved ra to avoid skipping a frame.
jhb [Thu, 6 Feb 2020 18:02:38 +0000 (18:02 +0000)]
Fix DDB to unwind across exception frames.
While here, add an extra line of information for exceptions and
interrupts and compress the per-frame line down to one line to match
other architectures.
imp [Thu, 6 Feb 2020 18:00:50 +0000 (18:00 +0000)]
Add elf2aout removal
After committing, I noticed elf2aout commit missed the relnotes
yes. It's unlikely to be interesting, since it's just part of sparc64
removal and quite niche, but it was installed on all FreeBSD machines.