Warner Losh [Wed, 7 Oct 2020 06:16:37 +0000 (06:16 +0000)]
Move kernel env global variables, etc to sys/kenv.h
The kernel globals for kenv are confined to 2 files that need them and
a few that likely shouldn't (but as written the code does). Move them
from sys/systm.h to sys/kenv.h. This removed a XXX from systm.h and
cleans it up a little bit...
Warner Losh [Wed, 7 Oct 2020 05:44:35 +0000 (05:44 +0000)]
cam: Add quirk for Samsung MZ7* behind a SATA-to-SAS interposer
Sometimes, this drive will be present in the system such that the the
firmware identification string doesn't start with ATA, such as when
it's behind a SATA-to-SAS interposer. Add another quirk for that.
Kristof Provost [Tue, 6 Oct 2020 19:19:56 +0000 (19:19 +0000)]
bridge: call member interface ioctl() without NET_EPOCH
We're not allowed to hold NET_EPOCH while sleeping, so when we call ioctl()
handlers for member interfaces we cannot be in NET_EPOCH. We still need some
protection of our CK_LISTs, so hold BRIDGE_LOCK instead.
That requires changing BRIDGE_LOCK into a sleepable lock, and separating the
BRIDGE_RT_LOCK, to protect bridge_rtnode lists. That lock is taken in the data
path (while in NET_EPOCH), so it cannot be a sleepable lock.
John Baldwin [Tue, 6 Oct 2020 17:58:56 +0000 (17:58 +0000)]
Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway. This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).
In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.
Jessica Clarke [Tue, 6 Oct 2020 13:02:20 +0000 (13:02 +0000)]
riscv: Handle supervisor instruction page faults
We should never take instruction page faults when in the kernel, but by
using the standard page fault code we should get a more-informative
message about faulting on a NOFAULT page rather than branching to the
default case here and printing an "Unknown kernel exception ..."
message.
Li-Wen Hsu [Tue, 6 Oct 2020 04:18:42 +0000 (04:18 +0000)]
Clear the dmesg buffer to prevent rotating causes issues
This is a workaround for the current continuously failing test case
sys.kern.sonewconn_overflow.sonewconn_overflow_01
The side effect is the dmesg buffer got cleared and may effect other tests
depends on dmesg output running in parallel. The better solution would be
tailing the log file like /var/log/debug.log
Kyle Evans [Mon, 5 Oct 2020 20:57:44 +0000 (20:57 +0000)]
crunchgen: fix MK_AUTO_OBJ logic after r364166
r364166 converted echo -n `/bin/pwd` to a raw pwd invocation, leaving a
trailing newline at the end of path. This caused a later stat() of it to
erroneously fail and the fallback to MK_AUTO_OBJ=no logic proceeded as
unexpected.
Harry Schmalzbauer bissected the resulting build failure he experienced
(stable/12 host, -HEAD build) down to r365887. This change is mostly
unrelated, except it switches the build to bootstrapped crunchgen - clue!
I then bissected recent crunchgen changes going back a bit since we wouldn't
observe the failure immediately with -CURRENT in most configurations, which
landed me on r364166. After many intense head-scratching minutes and printf
debugging, I realized that the newline was the difference. This is where our
tale ends.
Reported by: Harry Schmalzbauer, O. Hartmann, Mike Tancsa, kevans
MFC after: 3 days
Ryan Moeller [Mon, 5 Oct 2020 20:13:22 +0000 (20:13 +0000)]
Enable iterating all sysctls, even ones with CTLFLAG_SKIP
Add an "nextnoskip" sysctl that allows for listing of sysctls intended to be
normally skipped for cost reasons.
This makes it so the names/descriptions of those sysctls can be discovered with
sysctl -aN/sysctl -ad/sysctl -at.
It also makes it so children are visited when a node flagged with CTLFLAG_SKIP
is explicitly requested.
The intended use case is to mark the root "kstat" node with CTLFLAG_SKIP so that
the extensive and expensive stats are skipped by default but may still be easily
obtained without having to know them all (which may not even be possible) and
request each one-by-one.
Mateusz Guzik [Mon, 5 Oct 2020 19:38:51 +0000 (19:38 +0000)]
cache: fix pwd use-after-free in setting up fallback
Since the code exits smr section prior to calling pwd_hold, the used
pwd can be freed and a new one allocated with the same address, making
the comparison erroneously true.
* Add some examples showing binary, arguments and file info from living
processes.
* Show information from core dumps including an attempt using an old core file.
* While here, fix warning 'no blank before trailing delimiter' reported by igor.
* Add EXAMPLES section with four simple examples.
* Simplify -H flag description. This makes easy to see the difference between
this flag and -h
* While here, fix .Tn deprecated macro.
Kyle Evans [Sun, 4 Oct 2020 22:41:43 +0000 (22:41 +0000)]
lualoader: improve the design of the brand-/logo- mechanism
In the previous world order, any brand/logo was forced to pull in the
drawer and call drawer.add{Brand,Logo} with the name their brand/logo is
taking and a table describing it.
In the new world order, these files just need to return a table that maps
out graphics types to a table of the exact same format as what was
previously being passed back into the drawer. The appeal here is not needing
to grab a reference back to the drawer module and having a cleaner
data-driven looking format for these. The format has been renamed to 'gfx-*'
prefixes and each one can provide a logo and a brand.
drawer.addBrand/drawer.addLogo will remain in place until FreeBSD 13, as
there's no overhead to them and it's not yet worth the break in
compatibility with any pre-existing brands and logos.
Kyle Evans [Sun, 4 Oct 2020 17:07:13 +0000 (17:07 +0000)]
ngctl: add -c (compact output) for the dot command
The output of "ngctl dot" is suitable for small netgraph networks. Even
moderate complex netgraph setups (about a dozen nodes) are hard to
understand from the .dot output, because each node and each hook are shown
as a full blown structure.
This patch allows to generate much more compact output and graphs by
omitting the extra structures for the individual hooks. Instead the names of
the hooks are labels to the edges.
It gives the answer would the thread sleep according to current state
of signals and suspensions. Of course the answer is racy and allows
for false-negatives (no sleep when signal is delivered after process
lock is dropped). Also the answer might change due to signal
rescheduling among threads in multi-threaded process.
Still it is the best approximation I can provide, to answering the
question was the thread interrupted.
Reviewed by: markj
Tested by: pho, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26628
- Extract suspension check into sig_ast_checksusp() helper.
- Extract signal check and calculation of the interruption errno into
sig_ast_needsigchk() helper.
The helpers are moved to kern_sig.c which is the proper place for
signal-related code.
Improve control flow in sleepq_catch_signals(), to handle ret == 0
(can sleep) and ret != 0 (interrupted) only once, by separating
checking code into sleepq_check_ast_sq_locked(), which return value is
interpreted at single location.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26628
Fix route flags update during RTM_CHANGE.
Nexthop lookup was not consireding rt_flags when doing
structure comparison, which lead to an original nexthop
selection when changing flags. Fix the case by adding
rt_flags field into comparison and rearranging nhop_priv
fields to allow for efficient matching.
Fix `route change X/Y flags` case - recent changes
disallowed specifying RTF_GATEWAY flag without actual gateway.
It turns out, route(8) fills in RTF_GATEWAY by default, unless
-interface flag is specified. Fix regression by clearing
RTF_GATEWAY flag instead of failing.
Fix route flag reporting in RTM_CHANGE messages by explicitly
updating rtm_flags after operation competion.
Add IPv4/IPv6 tests for flag-only route changes.
amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE.
If current process is 64bit, use rex-prefixed version of XSAVE
(XSAVE64). If current process is 32bit and CPU supports saving
segment registers cs/ds in the FPU save area, use non-prefixed variant
of XSAVE.
Reported and tested by: Michał Górny <mgorny@mgorny@moritz.systems>
PR: 250043
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26643
Fix pmap_pti_add_kva() call for doublefault stack page.
After r354889 stack got struct nmi_pcpu at top, which makes IST top
not page-aligned. Since pmap_pti_add_kva() truncates/rounds up
addresses, it erronously entered a page mapped before double fault
stack into the pti page table.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Add virtio-9p (aka VirtFS) filesystem sharing to bhyve.
VirtFS allows sharing an arbitrary directory tree between bhyve virtual
machine and the host. Current implementation has a fairly complete support
for 9P2000.L protocol, except for the extended attribute support. It has
been verified to work with the qemu-kvm hypervisor.
Move KTRUSERRET() from userret() to ast(). It's a really long
detour - it writes ktrace entries to the filesystem - so the overhead
of ast() won't make any difference.
This change is based on the nexthop objects landed in D24232.
The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.
Simular to the nexthops, nexthop groups are immutable. Dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.
With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presense of
NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.
User-visible changes:
The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.
All routes now comes with weight, default weight is 1, maximum is 2^24-1.
Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.
Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1
Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default
Mark Johnston [Fri, 2 Oct 2020 19:16:06 +0000 (19:16 +0000)]
vm_pageout: Avoid rounding down the inactive scan target
With helper page daemon threads, enabled by default in r364786, we
divide the inactive target by the number of threads, rounding down, and
sum the total number of pages freed by the threads. This sum is
compared with the original target, but by rounding down we might lose
pages, causing the page daemon control loop to conclude that inactive
queue scanning isn't keeping up with demand for free pages. Typically
this results in excessive swapping.
Fix the problem by accounting for the error in the main pagedaemon
thread's target. Note that by default the problem will manifest only in
systems with >16 CPUs in a NUMA domain.
Reviewed by: cem
Discussed with: dougm
Reported and tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26610
Mark Johnston [Fri, 2 Oct 2020 19:04:29 +0000 (19:04 +0000)]
uma: Use the bucket cache for cross-domain allocations
uma_zalloc_domain() allocates from the requested domain instead of
following a first-touch policy (the default for most zones). Currently
it is only used by malloc_domainset(), and consumers free returned items
with free(9) since r363834.
Previously uma_zalloc_domain() worked by always going to the keg for an
item. As a result, the use of UMA zone caches was unbalanced: we free
items to the caches, but always allocate from the keg, skipping the
caches.
Make some effort to allocate from the UMA caches when performing a
cross-domain allocation. This avoids blowing up the caches when
something is performing many transient allocations with
malloc_domainset().
Reported and tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26427
Mark Johnston [Fri, 2 Oct 2020 19:04:09 +0000 (19:04 +0000)]
uma: Use LIFO for non-SMR bucket caches
When SMR was introduced, zone_put_bucket() was changed to always place
full buckets at the end of the queue. However, it is generally
preferable to use recently used buckets since their items are more
likely to be resident in cache. So, for buckets that have no constraint
on item reuse, use a last-in-first-out ordering as we did before.
Reviewed by: rlibby
Tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26426