mjg [Mon, 5 Oct 2020 19:38:51 +0000 (19:38 +0000)]
cache: fix pwd use-after-free in setting up fallback
Since the code exits smr section prior to calling pwd_hold, the used
pwd can be freed and a new one allocated with the same address, making
the comparison erroneously true.
fernape [Mon, 5 Oct 2020 14:07:32 +0000 (14:07 +0000)]
procstat(1): Add EXAMPLES section
* Add some examples showing binary, arguments and file info from living
processes.
* Show information from core dumps including an attempt using an old core file.
* While here, fix warning 'no blank before trailing delimiter' reported by igor.
fernape [Mon, 5 Oct 2020 13:39:37 +0000 (13:39 +0000)]
df(1): Add EXAMPLES section to man page
* Add EXAMPLES section with four simple examples.
* Simplify -H flag description. This makes easy to see the difference between
this flag and -h
* While here, fix .Tn deprecated macro.
kevans [Sun, 4 Oct 2020 22:41:43 +0000 (22:41 +0000)]
lualoader: improve the design of the brand-/logo- mechanism
In the previous world order, any brand/logo was forced to pull in the
drawer and call drawer.add{Brand,Logo} with the name their brand/logo is
taking and a table describing it.
In the new world order, these files just need to return a table that maps
out graphics types to a table of the exact same format as what was
previously being passed back into the drawer. The appeal here is not needing
to grab a reference back to the drawer module and having a cleaner
data-driven looking format for these. The format has been renamed to 'gfx-*'
prefixes and each one can provide a logo and a brand.
drawer.addBrand/drawer.addLogo will remain in place until FreeBSD 13, as
there's no overhead to them and it's not yet worth the break in
compatibility with any pre-existing brands and logos.
kevans [Sun, 4 Oct 2020 17:07:13 +0000 (17:07 +0000)]
ngctl: add -c (compact output) for the dot command
The output of "ngctl dot" is suitable for small netgraph networks. Even
moderate complex netgraph setups (about a dozen nodes) are hard to
understand from the .dot output, because each node and each hook are shown
as a full blown structure.
This patch allows to generate much more compact output and graphs by
omitting the extra structures for the individual hooks. Instead the names of
the hooks are labels to the edges.
kib [Sun, 4 Oct 2020 16:33:42 +0000 (16:33 +0000)]
Add sig_intr(9).
It gives the answer would the thread sleep according to current state
of signals and suspensions. Of course the answer is racy and allows
for false-negatives (no sleep when signal is delivered after process
lock is dropped). Also the answer might change due to signal
rescheduling among threads in multi-threaded process.
Still it is the best approximation I can provide, to answering the
question was the thread interrupted.
Reviewed by: markj
Tested by: pho, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26628
kib [Sun, 4 Oct 2020 16:30:05 +0000 (16:30 +0000)]
Refactor sleepq_catch_signals().
- Extract suspension check into sig_ast_checksusp() helper.
- Extract signal check and calculation of the interruption errno into
sig_ast_needsigchk() helper.
The helpers are moved to kern_sig.c which is the proper place for
signal-related code.
Improve control flow in sleepq_catch_signals(), to handle ret == 0
(can sleep) and ret != 0 (interrupted) only once, by separating
checking code into sleepq_check_ast_sq_locked(), which return value is
interpreted at single location.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26628
melifaro [Sun, 4 Oct 2020 13:24:58 +0000 (13:24 +0000)]
Fix route flags update during RTM_CHANGE.
Nexthop lookup was not consireding rt_flags when doing
structure comparison, which lead to an original nexthop
selection when changing flags. Fix the case by adding
rt_flags field into comparison and rearranging nhop_priv
fields to allow for efficient matching.
Fix `route change X/Y flags` case - recent changes
disallowed specifying RTF_GATEWAY flag without actual gateway.
It turns out, route(8) fills in RTF_GATEWAY by default, unless
-interface flag is specified. Fix regression by clearing
RTF_GATEWAY flag instead of failing.
Fix route flag reporting in RTM_CHANGE messages by explicitly
updating rtm_flags after operation competion.
Add IPv4/IPv6 tests for flag-only route changes.
kib [Sat, 3 Oct 2020 23:17:29 +0000 (23:17 +0000)]
amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE.
If current process is 64bit, use rex-prefixed version of XSAVE
(XSAVE64). If current process is 32bit and CPU supports saving
segment registers cs/ds in the FPU save area, use non-prefixed variant
of XSAVE.
Reported and tested by: Michał Górny <mgorny@mgorny@moritz.systems>
PR: 250043
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26643
kib [Sat, 3 Oct 2020 23:11:20 +0000 (23:11 +0000)]
Fix pmap_pti_add_kva() call for doublefault stack page.
After r354889 stack got struct nmi_pcpu at top, which makes IST top
not page-aligned. Since pmap_pti_add_kva() truncates/rounds up
addresses, it erronously entered a page mapped before double fault
stack into the pti page table.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
jceel [Sat, 3 Oct 2020 19:05:13 +0000 (19:05 +0000)]
Add virtio-9p (aka VirtFS) filesystem sharing to bhyve.
VirtFS allows sharing an arbitrary directory tree between bhyve virtual
machine and the host. Current implementation has a fairly complete support
for 9P2000.L protocol, except for the extended attribute support. It has
been verified to work with the qemu-kvm hypervisor.
trasz [Sat, 3 Oct 2020 12:03:08 +0000 (12:03 +0000)]
Move KTRUSERRET() from userret() to ast(). It's a really long
detour - it writes ktrace entries to the filesystem - so the overhead
of ast() won't make any difference.
melifaro [Sat, 3 Oct 2020 10:47:17 +0000 (10:47 +0000)]
Introduce scalable route multipath.
This change is based on the nexthop objects landed in D24232.
The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.
Simular to the nexthops, nexthop groups are immutable. Dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.
With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presense of
NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.
User-visible changes:
The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.
All routes now comes with weight, default weight is 1, maximum is 2^24-1.
Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.
Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1
Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default
markj [Fri, 2 Oct 2020 19:16:06 +0000 (19:16 +0000)]
vm_pageout: Avoid rounding down the inactive scan target
With helper page daemon threads, enabled by default in r364786, we
divide the inactive target by the number of threads, rounding down, and
sum the total number of pages freed by the threads. This sum is
compared with the original target, but by rounding down we might lose
pages, causing the page daemon control loop to conclude that inactive
queue scanning isn't keeping up with demand for free pages. Typically
this results in excessive swapping.
Fix the problem by accounting for the error in the main pagedaemon
thread's target. Note that by default the problem will manifest only in
systems with >16 CPUs in a NUMA domain.
Reviewed by: cem
Discussed with: dougm
Reported and tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26610
markj [Fri, 2 Oct 2020 19:04:29 +0000 (19:04 +0000)]
uma: Use the bucket cache for cross-domain allocations
uma_zalloc_domain() allocates from the requested domain instead of
following a first-touch policy (the default for most zones). Currently
it is only used by malloc_domainset(), and consumers free returned items
with free(9) since r363834.
Previously uma_zalloc_domain() worked by always going to the keg for an
item. As a result, the use of UMA zone caches was unbalanced: we free
items to the caches, but always allocate from the keg, skipping the
caches.
Make some effort to allocate from the UMA caches when performing a
cross-domain allocation. This avoids blowing up the caches when
something is performing many transient allocations with
malloc_domainset().
Reported and tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26427
markj [Fri, 2 Oct 2020 19:04:09 +0000 (19:04 +0000)]
uma: Use LIFO for non-SMR bucket caches
When SMR was introduced, zone_put_bucket() was changed to always place
full buckets at the end of the queue. However, it is generally
preferable to use recently used buckets since their items are more
likely to be resident in cache. So, for buckets that have no constraint
on item reuse, use a last-in-first-out ordering as we did before.
Reviewed by: rlibby
Tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26426
markj [Fri, 2 Oct 2020 18:35:55 +0000 (18:35 +0000)]
newlocale(3): Fix a memory leak.
newlocale() optionally takes a "base" locale, from which components not
specified in the mask are inherited. POSIX says that newlocale() may
modify "base" and return it, or free "base" and return a newly allocated
locale. We were not doing either, so applications which use newlocale()
to modify an existing base locale end up leaking memory on FreeBSD.
This diff fixes the leak by releasing a reference to the base locale
before returning. This is less efficient than modifying "base"
directly, but is simpler for an initial bug fix. Also, update the man
page to clarify behaviour with respect to "base".
PR: 249416
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26522
manu [Fri, 2 Oct 2020 18:26:41 +0000 (18:26 +0000)]
linuxkpi: Add backlight support
Add backlight function to linuxkpi.
Graphics drivers expose the backlight of the panel directly so allow them to use the backlight subsystem so
user can use backlight(8) to configure them.
Reviewed by: hselasky
Relnotes: yes
Differential Revision: The FreeBSD Foundation
manu [Fri, 2 Oct 2020 18:18:01 +0000 (18:18 +0000)]
Add backlight subsystem
This is a simple subsystem that allow drivers to register as a backlight.
Each backlight creates a device node under /dev/backlight/backlightX and
an alias based on the name provided.
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26250
markj [Fri, 2 Oct 2020 17:50:22 +0000 (17:50 +0000)]
Implement sparse core dumps
Currently we allocate and map zero-filled anonymous pages when dumping
core. This can result in lots of needless disk I/O and page
allocations. This change tries to make the core dumper more clever and
represent unbacked ranges of virtual memory by holes in the core dump
file.
Add a new page fault type, VM_FAULT_NOFILL, which causes vm_fault() to
clean up and return an error when it would otherwise map a zero-filled
page. Then, in the core dumper code, prefault all user pages and handle
errors by simply extending the size of the core file. This also fixes a
bug related to the fact that vn_io_fault1() does not attempt partial I/O
in the face of errors from vm_fault_quick_hold_pages(): if a truncated
file is mapped into a user process, an attempt to dump beyond the end of
the file results in an error, but this means that valid pages
immediately preceding the end of the file might not have been dumped
either.
The change reduces the core dump size of trivial programs by a factor of
ten simply by excluding unaccessed libc.so pages.
PR: 249067
Reviewed by: kib
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26590
markj [Fri, 2 Oct 2020 17:49:13 +0000 (17:49 +0000)]
Simplify the check for non-dumpable VM object types
OBJT_DEFAULT, _SWAP, _VNODE and _PHYS is exactly the set of
non-fictitious object types, so just check for OBJ_FICTITIOUS. The
check no longer excludes dead objects, but such objects have to be
handled regardless.
No functional change intended.
Reviewed by: alc, dougm, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26589
emaste [Fri, 2 Oct 2020 14:00:52 +0000 (14:00 +0000)]
libmd: add dependency workaround for r366344
r366344 fixed and reenabled the assembly optimized skein implementation,
but skein_block objects were not being rebuilt in no-clean builds. This
resulted in failing no-clean builds. SKEIN_USE_ASM controls which
routines come from C vs assembly, and with no explicit dependency
r366344's change to SKEIN_USE_ASM did not cause skein_block.{o,pico}
to be rebuilt.
Add a dependency on this Makefile for the skein_block objects. This
dependency is broader in scope than absolutely required (that is, the
skein_block objects will now be rebuilt on any change to this Makefile).
There are ways this could be addressed, but it is probably not worth the
additional effort or testing time to pursue them.
PR: 248221
Reported by: kevans, Jeremy Faulkner
Discussed with: kevans
Sponsored by: The FreeBSD Foundation
Access faults in user mode are treated like TLB misses, which leads to an
endless loop of faults. It's less serious than the same fault in kernel mode,
because we can just terminate the process, but that's not ideal.
cxgbe(4): validate largest_rx_cluster and safest_rx_cluster.
These tunables can only be set to a valid cluster size (2K, 4K, 9K, or
16K) as documented in the man page. Anything else could lead to a
panic on interface up.
mmacy [Thu, 1 Oct 2020 23:28:21 +0000 (23:28 +0000)]
OpenZFS: MFV 2.0-rc3-gfc5966
- Annotate FreeBSD sysctls with CTLFLAG_MPSAFE
- Reduce stack usage of Lua
- Don't save user FPU context in kernel threads
- Add support for procfs_list
- Code cleanup in zio_crypt
- Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c
- Drop references when skipping dmu_send due to EXDEV
- Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data
- Fix legacy compat for platform IOCs
vangyzen [Thu, 1 Oct 2020 21:48:22 +0000 (21:48 +0000)]
zgrep: fix exit status with multiple files
zgrep should exit with success when given multiple files and the
pattern is found in at least one file. Prior to this change,
it would exit with success only if the pattern was found in _every_ file.
The assembly implementation incorrectly used logical AND instead of
bitwise AND. Fix, and re-enable in libmd.
Submitted by: Yang Zhong <yzhong@freebsdfoundation.org>
Reviewed by: cem (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26614
kevans [Thu, 1 Oct 2020 19:56:38 +0000 (19:56 +0000)]
auxv: partially revert r366207, cast buflen to unsigned int as needed
The warning generated pre-r366207 is actually a sign comparison warning:
error: comparison of integers of different signs: 'unsigned long' and 'int'
if (strlcpy(buf, execpath, buflen) >= buflen)
Revert parts that affected other lines and just cast this to unsigned int.
The buflen < 0 -> EINVAL has been kept despite no longer serving any
purposes w.r.t. sign-extension because I do believe it's the right thing to
do: "The provided buffer was not the right size for the requested item."
The original warning is confirmed to still be gone with an:
env WARNS=6 make WITHOUT_TESTS=yes.
jhb [Thu, 1 Oct 2020 16:45:11 +0000 (16:45 +0000)]
Clear the upper 32-bits of registers in x86_emulate_cpuid().
Per the Intel manuals, CPUID is supposed to unconditionally zero the
upper 32 bits of the involved (rax/rbx/rcx/rdx) registers.
Previously, the emulation would cast pointers to the 64-bit register
values down to `uint32_t`, which while properly manipulating the lower
bits, would leave any garbage in the upper bits uncleared. While no
existing guest OSes seem to stumble over this in practice, the bhyve
emulation should match x86 expectations.
This was discovered through alignment warnings emitted by gcc9, while
testing it against SmartOS/bhyve.
ngie [Thu, 1 Oct 2020 16:37:49 +0000 (16:37 +0000)]
Eliminate duplicate `afterinstallconfigs` target
Define separate dependent targets which `afterinstallconfigs` relies on, in
order to modify `${DESTDIR}/etc/master.passwd` and
`${DESTDIR}/etc/nsswitch.conf`.
Mark these targets .PHONY, since they manipulate configurations on the fly and
the generation logic isn't 100% defined in terms of the source files/logic,
and is variable, based on MK_foo flags.
netchild [Thu, 1 Oct 2020 08:57:36 +0000 (08:57 +0000)]
Remove nfsstat. Running nfsstat in crashinfo will give the stats of the
running kernel instead of the stats of the crashed kernel. The current
version uses sysctls to query the stats and does not work at all (anymore)
on crash dumps.