kevans [Sun, 22 Nov 2020 05:47:45 +0000 (05:47 +0000)]
[2/2] _umtx_op: introduce 32-bit/i386 flags for operations
This patch takes advantage of the consolidation that happened to provide two
flags that can be used with the native _umtx_op(2): UMTX_OP___32BIT and
UMTX_OP__I386.
UMTX_OP__32BIT iindicates that we are being provided with 32-bit structures.
Note that this flag alone indicates a 64bit time_t, since this is the
majority case.
UMTX_OP__I386 has been provided so that we can emulate i386 as well,
regardless of whether the host is amd64 or not.
Both imply a different set of copyops in sysumtx_op. freebsd32__umtx_op
simply ignores the flags, since it's already doing a 32-bit operation and
it's unlikely we'll be running an emulator under compat32. Future work
could consider it, but the author sees little benefit.
This will be used by qemu-bsd-user to pass on all _umtx_op calls to the
native interface as long as the host/target endianness matches, effectively
eliminating most if not all of the remaining unresolved deadlocks for most.
This version changed a fair amount from what was under review, mostly in
response to refactoring of the prereq reorganization and battle-testing
it with qemu-bsd-user. The main changes are as follows:
1.) The i386 flag got renamed to omit '32BIT' since this is redundant.
2.) The flags are now properly handled on 32-bit platforms to emulate other
32-bit platforms.
3.) Robust list handling was fixed, and the 32-bit functionality that was
previously gated by COMPAT_FREEBSD32 is now unconditional.
4.) Robust list handling was also improved, including the error reported
when a process has already registered 32-bit ABI lists and also
detecting if native robust lists have already been registered. Both
scenarios now return EBUSY rather than EINVAL, because the input is
technically valid but we're too busy with another ABI's lists.
libsysdecode/kdump/truss support will go into review soon-ish, along with
the associated manpage update.
rew [Sun, 22 Nov 2020 05:00:28 +0000 (05:00 +0000)]
fd: free old file descriptor tables when not shared
During the life of a process, new file descriptor tables may be allocated. When
a new table is allocated, the old table is placed in a free list and held onto
until all processes referencing them exit.
When a new file descriptor table is allocated, the old file descriptor table
can be freed when the current process has a single-thread and the file
descriptor table is not being shared with any other processes.
mav [Sun, 22 Nov 2020 04:29:55 +0000 (04:29 +0000)]
Make handlers and atpds overflows unlikely.
- Allocate 256 handlers more than payload commands for management purposes.
- Increase maximum number of handlers from 8K to 16K by tuning the format.
- Just to be safe limit the number of payload commands to 16K - 256.
- Limit number of target exchanges in mixed mode to the number of atpds.
- If we still somehow get out of atpds -- return BUSY, since we really are.
kib [Sat, 21 Nov 2020 21:43:36 +0000 (21:43 +0000)]
Stop using eventhandlers for itimers subsystem exec and exit hooks.
While there, do some minor cleanup for kclocks. They are only
registered from kern_time.c, make registration function static.
Remove event hooks, they are not used by both registered kclocks.
Add some consts.
Perhaps we can stop registering kclocks at all and statically
initialize them.
Reviewed by: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27305
np [Sat, 21 Nov 2020 03:27:32 +0000 (03:27 +0000)]
cxgbe(4): Catch up with in-flight netmap rx before destroying queues.
The netmap application using the driver is responsible for replenishing
the receive freelists and they may be totally depleted when the
application exits. Packets in flight, if any, might block the pipeline
in case there aren't enough buffers left in the freelist. Avoid this by
filling up the freelists with a driver allocated buffer.
rmacklem [Fri, 20 Nov 2020 22:29:38 +0000 (22:29 +0000)]
Document the new "tls" NFS mount option.
Recent commits to head have added support for NFS over TLS
to the FreeBSD kernel.
To enable use of this for an NFS mount, the "tls" mount_nfs
option has been added.
Once the IETF has assigned an RFC number, I will replace "NNNN"
with the number.
rmacklem [Fri, 20 Nov 2020 22:14:51 +0000 (22:14 +0000)]
Update man page for new TLS export options.
NFS over TLS uses three new export options, added by r364979.
This patch updates the exports.5 man page for these new options.
Once assigned by IETF, "NNNN" will be replaced with the RFC number.
mav [Fri, 20 Nov 2020 19:36:34 +0000 (19:36 +0000)]
Increase queue depths from 1024/256 to 8192/1024 IOCBs.
Qlogic chips store S/G lists in the same queue as requests themselves. In
the worst case 1MB I/O may require up to 52 IOCBs, that means queue of 1024
IOCBs can store only 19 of such requests. The increase reduces chances of
overflow, while we should be able to afford additional 512KB of RAM per HBA.
The Linux driver uses comparable numbers.
While there, decouple ATIO queue size from response queue size. There is
no reason for them to be equal.
mav [Fri, 20 Nov 2020 18:02:04 +0000 (18:02 +0000)]
Cleanup DMA handling.
- Make isp_start() to set all the IOCB fields aside of S/G list, removing
extra information from isp_send_cmd(), now only doing S/G lists and sending.
- Turn DMA setup/free from being card and PCI-specific into OS-specific,
instead add new card-specific method for isp_send_cmd(). Previously this
function was a monster handling all the cards.
- Remove double error code translation.
jtl [Fri, 20 Nov 2020 17:26:02 +0000 (17:26 +0000)]
When copying types from one CTF container to another, ensure that we
encode 0-length (i.e. "") structure and union member names as offset 0.
This ensures that we don't confuse other parts of the CTF code which
expect this encoding.
This resolves a Dtrace error resolving members of anonymous structs/unions
within the (struct mbuf) type which some users were seeing after r366908.
While here, update the code in ctf_add_generic() to encode 0-length type
names as offset 0.
mhorne [Fri, 20 Nov 2020 15:21:10 +0000 (15:21 +0000)]
riscv: always initialize the static kernel environment
Ensure we initialize the static environment when not booting via
loader(8), and provide a static buffer if this is the case. This fixes
two issues.
First, performing the initialization ensures that kenv variables set in
the kernel's config file are honored. Previously, any new or overridden
values were ignored.
Second, providing the static buffer allows variables to be set in the
device tree's bootargs property of the chosen node. This can be set by
u-boot or by QEMU's '-append' flag. Attempting to this prior to this
change resulted in an early panic, since the static environment had no
buffer backing it.
mhorne [Fri, 20 Nov 2020 14:45:45 +0000 (14:45 +0000)]
Make net/ifq.h C++ friendly
Don't use "new" as an identifier, and add explicit casts from void *.
As a general policy, FreeBSD doesn't make any C++ compatibility
guarantees for kernel headers like it does for userland, but it is a
small effort to do so in this case, to the benefit of a downstream
consumer (NetApp).
Reviewed by: rscheff
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27286
0mp [Fri, 20 Nov 2020 14:37:07 +0000 (14:37 +0000)]
Revert r367291 (KEYWORD: shutdown for rc.d/zfs)
The problem is that zfs is asked to stop too early in the shutdown
sequence. Other services, such as syslog may still be running and have some
files open (e.g., under /var/log). This of course causes the messages like:
For now, let's remove the shutdown KEYWORD from the zfs service, as people are
reporting problems in their setups:
https://lists.freebsd.org/pipermail/freebsd-current/2020-November/077559.html
In the future, we may think of stopping zfs on shutdown after all the other
services and just before init(8) exits. Another interesting option might be to
a new rcorder(8) KEYWORD like "shutdownjail", but this idea would need to be
discussed a bit.
tuexen [Fri, 20 Nov 2020 13:00:28 +0000 (13:00 +0000)]
Fix an issue I introuced in r367530: tcp_twcheck() can be called
with to == NULL for SYN segments. So don't assume tp != NULL.
Thanks to jhb@ for reporting and suggesting a fix.
mmel [Fri, 20 Nov 2020 09:05:36 +0000 (09:05 +0000)]
Also pass interrupt binding request to non-root interrupt controllers.
There are message based controllers that can bind interrupts even if they are
not implemented as root controllers (such as the ITS subblock of GIC).
mav [Fri, 20 Nov 2020 01:15:48 +0000 (01:15 +0000)]
Remove parallel SCSI and 1/2Gb FC support from isp(4).
This removes 288KB (36%) of the driver code and zillions of hacks and
workarounds, making single driver uniformly support several different
generations of hardware interfaces, not counting minor card variations.
After years of the hopeless fight, I don't think it worth to continue
support for hardware obsolete for 15-20 years. Instead much cleaner
now code should allow to move forward toward better locking, multiple
queues and other cool features.
All the remaining Qlogic cards starting from 4Gb 24xx to 32Gb 27xx use
the same hardware/firmware interface with minor incremental improvements,
so it seems to be a good new starting point. Except one PCI-X model all
all of them are PCIe and so still usable in modern systems.
wulf [Fri, 20 Nov 2020 00:13:30 +0000 (00:13 +0000)]
psm(4): Disable AUX multiplexer probing on all Lenovo laptops.
Rudimentary AUX multiplexing support was added to kernel to make possible
touchpad initialization on some HP EliteBook laptops with trackpoint.
Disable multiplexer probing on all Lenovo laptops now as they use touchpad
pass-through port rather than AUX multiplexer to connect trackpoint and
at least two model (X120e and X121e) is known for getting PS/2 AUX port
dysfunctional after switching back to hidden multiplexing mode.
AUX MUX probing can be reenabled with setting of hw.psm.mux_disabled loader
tunable to 0.
mjg [Thu, 19 Nov 2020 19:25:47 +0000 (19:25 +0000)]
pipe: thundering herd problem in pipelock
All reads and writes are serialized with a hand-rolled lock, but unlocking it
always wakes up all waiters. Existing flag fields get resized to make room for
introduction of waiter counter without growing the struct.
fernape [Thu, 19 Nov 2020 19:05:16 +0000 (19:05 +0000)]
fstat(1): Add EXAMPLES section
* Add examples covering -f, -m and -p flags.
While here, extend the initial description paragraph to note that fstat(1)
will report on all opened files, belonging to processes the user has access to.
The current paragraph may lead to understand that you can get information on
opened files from processes belonging to other users.
fernape [Thu, 19 Nov 2020 18:58:15 +0000 (18:58 +0000)]
grep(1): Add more EXAMPLES
* Add more EXAMPLES covering flags: -A, -B, -c, -f, -i, -H, -l, -q, -R, -w
* While here, change existing wording to use the imperative (remove "To
find")
* Reword first example to be consistent with how grep(1) understand
words (-w)
markj [Thu, 19 Nov 2020 18:37:28 +0000 (18:37 +0000)]
callout(9): Fix a race between CPU migration and callout_drain()
Suppose a running callout re-arms itself, and before the callout
finishes running another CPU calls callout_drain() and goes to sleep.
softclock_call_cc() will wake up the draining thread, which may not run
immediately if there is a lot of CPU load. Furthermore, the callout is
still in the callout wheel so it can continue to run and re-arm itself.
Then, suppose that the callout migrates to another CPU before the
draining thread gets a chance to run. The draining thread is in this
loop in _callout_stop_safe():
while (cc_exec_curr(cc) == c) {
CC_UNLOCK(cc);
sleep();
CC_LOCK(cc);
}
but after the migration, cc points to the wrong CPU's callout state.
Then the draining thread goes off and removes the callout from the
wheel, but does so using the wrong lock and per-CPU callout state.
Fix the problem by doing a re-lookup of the callout CPU after sleeping.
mhorne [Thu, 19 Nov 2020 18:03:40 +0000 (18:03 +0000)]
Add an option for entering KDB on recursive panics
There are many cases where one would choose avoid entering the debugger
on a normal panic, opting instead to reboot and possibly save a kernel
dump. However, recursive kernel panics are an unusual case that might
warrant attention from a human, so provide a secondary tunable,
debug.debugger_on_recursive_panic, to allow entering the debugger only
when this occurs.
For for simplicity in maintaining existing behaviour, the tunable
defaults to zero.
Reviewed by: cem, markj
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27271
debdrup [Thu, 19 Nov 2020 16:57:45 +0000 (16:57 +0000)]
intro.7: Add missing manual page
Section 7 of the manual pages contain lots of very useful information, but
finding the pages is not always obvious - to assist people in finding the
information, add missing cross-references.
manu [Thu, 19 Nov 2020 14:27:01 +0000 (14:27 +0000)]
release: Switch the Allwinner board to GPT
Allwinner bootrom have an alternate location for u-boot at 128k.
Work was made recently in u-boot to relocate correctly if loaded from
there.
The advantage of this offset is that we can now use a GPT scheme.
mjg [Thu, 19 Nov 2020 10:00:48 +0000 (10:00 +0000)]
thread: numa-aware zombie reaping
The current global list is a significant problem, in particular induces a lot
of cross-domain thread frees. When running poudriere on a 2 domain box about
half of all frees were of that nature.
Patch below introduces per-domain thread data containing zombie lists and
domain-aware reaping. By default it only reaps from the current domain, only
reaping from others if there is free TID shortage.
A dedicated callout is introduced to reap lingering threads if there happens
to be no activity.
mjg [Thu, 19 Nov 2020 06:30:25 +0000 (06:30 +0000)]
pipe: allow for lockless pipe_stat
pipes get stated all thet time and this avoidably contributed to contention.
The pipe lock is only held to accomodate MAC and to check the type.
Since normally there is no probe for pipe stat depessimize this by having the
flag.
The pipe_state field gets modified with locks held all the time and it's not
feasible to convert them to use atomic store. Move the type flag away to a
separate variable as a simple cleanup and to provide stable field to read.
Use short for both fields to avoid growing the struct.
While here short-circuit MAC for pipe_poll as well.
markj [Thu, 19 Nov 2020 03:59:21 +0000 (03:59 +0000)]
vm_phys: Try to clean up NUMA KPIs
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures. We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.
Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain(). Make the latter an inline function.
Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers. Include it from vm_page.h so that
vm_page_domain() can be defined there.
Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.
markj [Thu, 19 Nov 2020 02:50:48 +0000 (02:50 +0000)]
Remove NO_EVENTTIMERS support
The arm configs that required it have been removed from the tree.
Removing this option makes the callout code easier to read and
discourages developers from adding new configs without eventtimer
drivers.
Reviewed by: ian, imp, mav
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27270
glebius [Thu, 19 Nov 2020 02:20:38 +0000 (02:20 +0000)]
Add '-u' switch that would uncompress cores that were compressed by
kernel during dump time.
A real life scenario is that cores are compressed to reduce
size of dumpon partition, but we either don't care about space
in the /var/crash or we have a filesystem level compression of
/var/crash. And we want cores to be uncompressed in /var/crash
because we'd like to instantily read them with kgdb. In this
case we want kernel to write cores compressed, but savecore(1)
write them uncompressed.
emaste [Thu, 19 Nov 2020 00:03:15 +0000 (00:03 +0000)]
libc: fix undefined behavior from signed overflow in strstr and memmem
unsigned char promotes to int, which can overflow when shifted left by
24 bits or more. this has been reported multiple times but then
forgotten. it's expected to be benign UB, but can trap when built with
explicit overflow catching (ubsan or similar). fix it now.
note that promotion to uint32_t is safe and portable even outside of
the assumptions usually made in musl, since either uint32_t has rank
at least unsigned int, so that no further default promotions happen,
or int is wide enough that the shift can't overflow. this is a
desirable property to have in case someone wants to reuse the code
elsewhere.
emaste [Thu, 19 Nov 2020 00:02:12 +0000 (00:02 +0000)]
libc: optimize memmem two-way bad character shift
first, the condition (mem && k < p) is redundant, because mem being
nonzero implies the needle is periodic with period exactly p, in which
case any byte that appears in the needle must appear in the last p
bytes of the needle, bounding the shift (k) by p.
second, the whole point of replacing the shift k by mem (=l-p) is to
prevent shifting by less than mem when discarding the memory on shift,
in which case linear time could not be guaranteed. but as written, the
check also replaced shifts greater than mem by mem, reducing the
benefit of the shift. there is no possible benefit to this reduction of
the shift; since mem is being cleared, the full shift is valid and
more optimal. so only replace the shift by mem when it would be less
than mem.
oshogbo [Wed, 18 Nov 2020 21:07:08 +0000 (21:07 +0000)]
jail: introduce per jail suser_enabled setting
The suser_enable sysctl allows to remove a privileged rights from uid 0.
This change introduce per jail setting which allow to make root a
normal user.