scrmap is only used in the one compilation unit in all cases, make it static
rather than extern'ing it. There's little benefit, but it's easy to do.
It's unclear how this hasn't failed many builds before now, since it should
have cropped up sometime around deeper hierarchies getting a default WARNS.
Each entry actually stores a native pointer, not a uint64_t quantity. While
we're here, go ahead and export the pointer as-is rather than converting it
to KVA. This may be more useful as consumers can map /dev/mem and observe
the entry.
For reference, see: sys/contrib/edk2/Include/Uefi/UefiSpec.h
Dimitry Andric [Fri, 8 Jan 2021 16:35:18 +0000 (17:35 +0100)]
MFC llvm fixes for building ceph on powerpc
Merge commit 8f5e3c74b from llvm git (by Teresa Johnson):
[PowerPC] Fix compile time issue in recursive CTR analysis code
Summary:
Avoid re-examining operands on recursive walk looking for CTR.
This was causing huge compile time after some earlier optimization
created a large expression.
The start of the expression (created by IndVarSimplify) looked like:
%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...
with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.
Merge commit 4f568fbd2 from llvm git (by Nemanja Ivanovic):
[PowerPC] Do not emit HW loop when TLS var accessed in PHI of loop exit
If any PHI nodes in loop exit blocks have incoming values from the
loop that are accesses of TLS variables with local dynamic or general
dynamic TLS model, the address will be computed inside the loop. Since
this includes a call to __tls_get_addr, this will in turn cause the
CTR loops verifier to complain.
Disable CTR loops in such cases.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48527
This should fix building ceph 12.2.12 on powerpc64, powerpc, powerpcspe
and powerpc64le.
Marius Strobl [Wed, 26 Jun 2019 15:28:21 +0000 (15:28 +0000)]
o In iflib_txq_drain():
- Remove desc_used, which is only ever written to.
- Remove a dead store to reclaimed.
- Don't recycle avail.
- Sort variables according to style(9).
These changes will make a subsequent commit easier to read.
o In iflib_tx_credits_update(), don't bother checking whether the
ift_txd_credits_update method pointer is NULL; _iflib_pre_assert()
asserts upfront that this method has been assigned and functions
like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}()
and _task_fn_tx() were already unconditionally relying on the
method being callable.
Marko Zec [Wed, 19 Jun 2019 08:39:19 +0000 (08:39 +0000)]
Evaluating htons() at compile time is more efficient than doing ntohs()
at runtime. This change removes a dependency on a barrel shifter pass
before branch resolution, while reducing the instruction stream size
by 9 bytes on amd64.
Mark Johnston [Sun, 3 Jan 2021 16:32:30 +0000 (11:32 -0500)]
Ensure that dirent's d_off field is initialized
We have the d_off field in struct dirent for providing the seek offset
of the next directory entry. Several filesystems were not initializing
the field, which ends up being copied out to userland.
Mark Johnston [Wed, 23 Dec 2020 16:31:47 +0000 (11:31 -0500)]
qatfw: Fix firmware autoloading for qat_c2xxx devices
r368193 was suppsed to rename the MOF firmware image, but the
qat_c2xxxfw makefile defined the two images in the wrong order so the
MMP image was renamed instead.
Mark Johnston [Wed, 23 Dec 2020 16:13:00 +0000 (11:13 -0500)]
ffs: Avoid out-of-bounds accesses in the fs_active bitmap
We use a bitmap to track which cylinder groups have changed between
snapshot creation and filesystem suspension. The "legs" of the bitmap
are four bytes wide (see ACTIVESET()) so we must round up the allocation
size to a multiple of four bytes.
I believe this bug is harmless since UMA/kmem_* will both pad the
allocation and zero the full allocation. Note that malloc() does inline
zeroing when the allocation size is known at compile-time.
Mark Johnston [Wed, 23 Dec 2020 05:11:16 +0000 (00:11 -0500)]
netgraph: Fix ng_ether's shutdown handing
When tearing down a VNET, netgraph sends shutdown messages to all of the
nodes before detaching interfaces (SI_SUB_NETGRAPH comes before
SI_SUB_INIT_IF in teardown order). ng_ether nodes handle this by
destroying themselves without detaching from the parent ifnet. Then,
when ifnets go away they detach their ng_ether nodes again, triggering a
use-after-free.
Handle this by modifying ng_ether_shutdown() to detach from the ifnet.
If the shutdown was triggered by an ifnet being destroyed, we will clear
priv->ifp in the ng_ether detach callback, so priv->ifp may be NULL.
Also get rid of the printf in vnet_netgraph_uninit(). It can be
triggered trivially by ng_ether since ng_ether_shutdown() persists the
node unless NG_REALLY_DIE is set.
Mark Johnston [Fri, 18 Dec 2020 16:04:48 +0000 (16:04 +0000)]
acpi: Ensure that adjacent memory affinity table entries are coalesced
The SRAT may contain multiple distinct entries that together describe a
contiguous region of physical memory. In this case we were not
coalescing the corresponding entries in the memory affinity table, which
led to fragmented phys_avail[] entries. Since r338431 the vm_phys_segs[]
entries derived from phys_avail[] will be coalesced, resulting in a
situation where vm_phys_segs[] entries do not have a covering
phys_avail[] entry. vm_page_startup() will not add such segments to the
physical memory allocator, leaving them unused.
Reported by: Don Morris <dgmorris@earthlink.net>
Reviewed by: kib, vangyzen
Ed Maste [Sun, 27 Dec 2020 00:34:24 +0000 (19:34 -0500)]
Perform kernel linker ifunc test only for builds
dvl reported that "make installkernel" failed with "amd64/arm64/i386
kernel requires linker ifunc support." This test should apply to builds
only; the linker is not used at install time.
I think the same (ifunc-supporting) linker used to build the kernel
should be detected at install time in usual cases (and so not trigger
this error). However, there is no reason to disallow the install, if
for some reason the expected linker isn't the one tested at install
time.
PR: 251580
Reported by: dvl
Sponsored by: The FreeBSD Foundation
Conrad Meyer [Fri, 30 Oct 2020 19:00:42 +0000 (19:00 +0000)]
UFS2: Fix DoS due to corrupted extattrfile
Prior versions of FreeBSD (11.x) may have produced a corrupt extattr file.
(Specifically, r312416 accidentally fixed this defect by removing a strcpy.)
CURRENT FreeBSD supports disk images from those prior versions of FreeBSD.
Validate the internal structure as soon as we read it in from disk, to
prevent these extattr files from causing invariants violations and DoS.
Attempting to access the extattr portion of these files results in
EINTEGRITY. At this time, the only way to repair files damaged in this way
is to copy the contents to another file and move it over the original.
PR: 244089
Reported by: Andrea Venturoli <ml AT netfence.it>
Reviewed by: kib
Discussed with: mckusick (earlier draft)
Mitchell Horne [Wed, 23 Dec 2020 19:36:17 +0000 (15:36 -0400)]
gdb(4) fix x86 signal reporting
The existing values correspond to x86 exception vector numbers, but the
trap numbers used in the kernel do not match these 1-to-1. Prefer the
definitions from x86/trap.h, as they are what actually get passed to
kdb_trap(). This is of little consequence, as gdb_cpu_signal() only
reports the trap reason (signal number) to the gdb client.
This is limited to the subset of trap values for which kdb_trap() is
reachable.
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Mitchell Horne [Wed, 23 Dec 2020 18:37:05 +0000 (14:37 -0400)]
gdb(4): allow bulk write of registers
Add support for the remote 'G' packet. This is not widely used by gdb
when 'P' is supported, but is technically required by any remote gdb
stub implementation [1].
Mitchell Horne [Wed, 23 Dec 2020 18:36:08 +0000 (14:36 -0400)]
gdb(4): handle single register read packets
We support bulk reads of the register set, but not reading specific
registers via the 'p' packet. This is useful at least for the 'call'
command in gdb.
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Kristof Provost [Sun, 20 Dec 2020 20:06:32 +0000 (21:06 +0100)]
pf: Fix unaligned checksum updates
The algorithm we use to update checksums only works correctly if the
updated data is aligned on 16-bit boundaries (relative to the start of
the packet).
Li-Wen Hsu [Sun, 27 Oct 2019 21:07:50 +0000 (21:07 +0000)]
Follow r354121 to fix some python3 errors in sys.netpfil.*
stderr:
Traceback (most recent call last):
File "/usr/tests/sys/netpfil/common/pft_ping.py", line 135, in <module>
main()
File "/usr/tests/sys/netpfil/common/pft_ping.py", line 124, in main
ping(args.sendif[0], args.to[0], args)
File "/usr/tests/sys/netpfil/common/pft_ping.py", line 74, in ping
raw = sp.raw(str(PAYLOAD_MAGIC))
File "/usr/local/lib/python3.6/site-packages/scapy/compat.py", line 52, in raw
return bytes(x)
TypeError: string argument without an encoding
MFC with: r354121
Sponsored by: The FreeBSD Foundation
Bjoern A. Zeeb [Sat, 26 Oct 2019 21:19:55 +0000 (21:19 +0000)]
Upgrade (scapy) py2 tests to work on py3.
In order to move python2 out of the test framework to avoid py2 vs. py3
confusions upgrade the remaining test cases using scapy to work with py3.
That means only one version of scapy needs to be installed in the CI system.
It also gives a path forward for testing i386 issues observed in the CI
system with some of these tests.
Fixes are:
- Use default python from environment (which is 3.x these days).
- properly ident some lines as common for the rest of the file to avoid
errors.
- cast the calculated offset to an int as the division result is considered
a float which is not accepted input.
- when comparing payload to a magic number make sure we always add the
payload properly to the packet and do not try to compare string in
the result but convert the data payload back into an integer.
- fix print formating.
Discussed with: lwhsu, kp (taking it off his todo :)
MFC after: 2 weeks
Kristof Provost [Mon, 19 Aug 2019 10:48:27 +0000 (10:48 +0000)]
netpfil tests: Move pft_ping.py and sniffer.py to the common test directory
The pft_ping.py and sniffer.py tool is moved from tests/sys/netpfil/pf to
tests/sys/netpfil/common directory because these tools are to be used in
common for all the firewalls.
Some files got their contented duplicated in r345409. Some mistakes where
fixed in r345430. The only file that was left with a duplicated content was
CVE-2019-5598.py.
Gordon Bergling [Sat, 19 Dec 2020 12:47:40 +0000 (12:47 +0000)]
MFC r368804: ipfw(8): Fix a few mandoc related issues
- no blank before trailing delimiter
- missing section argument: Xr inet_pton
- skipping paragraph macro: Pp before Ss
- unusual Xr order: syslogd after sysrc
- tab in filled text
There were a few multiline NAT examples which used the .Dl macro with
tabs. I converted them to .Bd, which is a more suitable macro for that case.
Gordon Bergling [Sat, 19 Dec 2020 11:47:38 +0000 (11:47 +0000)]
MFC r368802: nvmecontrol(8): Fix a few mandoc related issues and add a SEE ALSO section
- inserting missing end of block: Ss breaks Bl
- skipping paragraph macro: Pp before Ss
- referenced manual not found: Xr nvme 4 (2 times)
- unknown standard specifier: St The
The macro .St can only be used for standards known by mdoc(7). So add a
SEE ALSO section and add a reference to the NVM Express Base Specification.
Alex Richardson [Wed, 4 Nov 2020 14:31:52 +0000 (14:31 +0000)]
Fix bad libbxo format strings in jls
The existing format string for the empty case was trying to read varargs
values that weren't passed to xo_emit. This appears to work on x86 (since
the next argument is probably a pointer an empty string), but for CHERI
we can bound variadic arguments and detect a read past the end.
While touching these lines also use the libxo 'a' modifier to avoid having to
construct the libxo format string using asprintf.
Found by: CHERI
Reviewed By: allanjude
Differential Revision: https://reviews.freebsd.org/D26885
Mateusz Guzik [Sun, 23 Aug 2020 11:06:59 +0000 (11:06 +0000)]
libc: hide alphasort_thunk behind I_AM_SCANDIR_B
Should unbreak gcc build as reported by tinderbox:
lib/libc/gen/scandir.c:59:12: warning: 'alphasort_thunk' declared 'static' but never defined [-Wunused-function]
Stefan Eßer [Thu, 29 Oct 2020 08:26:38 +0000 (08:26 +0000)]
MFC: Fix calendar -a processing of files included in the user's home directory
The existing code performed a chdir() into the home directory, but the
parser fell back to using the invoking user's home directory as the base
directory for the search for an include file.
Since use of the -a option is limited to UID==0, the directory searched
was typically ~root/.calendar, not the .calendar directory of the user
whose file is being processed.
The code in -CURRENT is quite different (forks sub-processes tp
process the files for each user) but this change should provide the
same functionality as the referenced commit to -CURRENT.
addr2line: add label checks when DW_AT_range and DW_AT_low_pc cannot be used
Check label's ranges for address we want to translate if a CU doesn't
have usable DW_AT_range or DW_AT_low_pc.
Use more appropriate names: "struct CU" -> "struct range"
Developed as part of upstream ELF Tool Chain bug report
https://sourceforge.net/p/elftoolchain/tickets/552/ although this does
not address the specific case reported there.
Submitted by: Tiger Gao <tig@freebsdfoundation.org>
Sponsored by: The FreeBSD Foundation
Differential Revision: D23782
Brooks Davis [Fri, 9 Aug 2019 23:50:57 +0000 (23:50 +0000)]
Don't add -Wno-class-memaccess with older gcc.
This is a gcc 8.0+ warning which needed to be silenced on for the riscv
build. amd64-xtoolchain-gcc still uses gcc 6.4.0 and does not understand
this flag.
Adrian Chadd [Thu, 16 Apr 2020 23:29:49 +0000 (23:29 +0000)]
[libsa] Fix typecast of pointer for st_dev
This code was trying to use a pointer value for st_dev, which is definitely
not a pointer. Instead, cast to uintptr_t so it becomes a non-pointer value
before casting it.
Stefan Eßer [Thu, 10 Dec 2020 09:31:05 +0000 (09:31 +0000)]
Lift scope of buf[] to make it extend to a potential access via *basename
It can be assumed that the contents of the buffer was still allocated and
valid at the point of the out-of-scope access, so there was no security
issue in practice.
Alex Richardson [Fri, 4 Dec 2020 15:53:37 +0000 (15:53 +0000)]
crunchgen: fix NULL-deref bug introduced in r364647
While porting over the local changes from CheriBSD for upstreaming, I
accidentally committed a broken version of find_entry_point(): we have to
return NULL if the value is not found instead of a value with
ep->name == NULL, since the checks in main were changed to check ep instead
of ep->name for NULL.
This only matters if the crunched tool cannot be found using normal lookup
and one of the fallback paths is used, so it's unlikely to be triggered
in rescue. However, I noticed that one of our CheriBSD test scripts was
failing to run commands under `su` on minimal disk images where all
binaries are hardlinks to a `cheribsdbox` tool generated with crunchgen.
This also updates the bootstrapping check in Makefile.inc1 to bootstrap
crunchgen up to the next version bump.
Kyle Evans [Tue, 4 Aug 2020 02:47:24 +0000 (02:47 +0000)]
bsdgrep: switch to libregex for GNU_GREP_COMPAT
libregex is incomplete, but it's a bit less buggy than the in-base
libgnuregex and mostly OK.
While here, rename -DIWTH_GNU -> -DWITH_GNU_COMPAT; the option implies
that we're compatible with the GNU counterpart, not that we're including GNU
anything.
Ed Maste [Sat, 20 Oct 2018 22:35:06 +0000 (22:35 +0000)]
libi386: remove CLANG_NO_IAS workaround
Clang's Integrated Assembler was previously disabled for i386 with the
note that it "doesn't grok .codeNN directives yet." This is no longer
the case (and hasn't been for some time), and the assembled output .text
is identical between gas and IAS.
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Gordon Bergling [Sat, 19 Dec 2020 09:55:02 +0000 (09:55 +0000)]
MFC r368791: disk(9): Fix a few mandoc related errors
- function name without markup: g_io_deliver()
- function name without markup: disk_gone()
- sections out of conventional order: Sh SEE ALSO
- referenced manual not found: Xr MAKE_DEV 9
Actually the man page of MAKE_DEV has never existed.
MFC r368406:
Prefer using the MIN() function macro over the min() inline function
in the LinuxKPI. Linux defines min() to be a macro, while in FreeBSD
min() is a static inline function clamping its arguments to
"unsigned int".
MFC r368329 and r368341:
Fix definition of int64_t and uint64_t when long is 64-bit. This gets the kernel
shim code in line with the rest of the kernel, sys/x86/include/_types.h.
MFC r368632:
Be bug compatible with other operating systems by allowing non-sequential
interface numbering for USB descriptors in userspace. Else certain USB
control requests using the interface number, won't be recognized by the
USB firmware.
Refer to section 9.2.3 in the USB 2.0 specification:
Interfaces are numbered from zero to one less than the number of concurrent interfaces
supported by the configuration.
Kyle Evans [Tue, 4 Aug 2020 02:14:51 +0000 (02:14 +0000)]
libregex: implement GNU extensions
18a1e2e9: libregex: Implement a subset of the GNU extensions
The entire patch-set is not yet mature enough for commit, but this usable
subset is generally enough for googletest to be happy with and mostly map to
some existing concepts, so they're not as invasive.
The specific changes included here are:
- Branching in BREs with \|
- \w and \W for [[:alnum:]] and [^[:alnum:]] respectively
- \s and \S for [[:space:]] and [^[:space:]] respectively
- Additional quantifiers in BREs, \? and \+ (self-explanatory)
There's some #ifdef'd out work for allowing empty branches as a match-all.
This is a feature that's under assessment... future work will determine
how standard this behavior is and act accordingly.
61898cde: libregex: disable some of the unimplemented test cases for now
This should allow the tests to actually pass. Future work will uncomment the
unimplemented tests as they're implemented.
7518fb34: libc: regex: factor out ISBOW/ISEOW macros
These will be reused for \b (word boundary, which matches both sides).
No functional change.
ca53e5ae: libregex: implement \` and \' (begin-of-subj, end-of-subj)
These are GNU extensions, generally equivalent to ^ and $ except that the
new syntax will not match beginning of line after the first in a multi-line
expression or the end of line before absolute last in a multi-line
expression.
6b986646: libregex: implement \b and \B (word boundary, not word boundary)
This is the last of the needed GNU expressions before we can unleash bsdgrep
by default. \b is effectively an agnostic equivalent of \< and \>, while
\B will match every space that isn't making a transition from
nonchar -> char or char -> nonchar.
It was realized just a little too late that this was a hack that belonged in
individual regex(3)-using applications. It was surrounded in NOTYET and not
implemented in the engine, so remove it.
MFC NOTE: Altered to match the legacy behavior of a\bc => abc.
Part of the libregex functionality leaked into the tests it shares with
the standard regex(3). Introduce a P flag to set the REG_POSIX cflag to
indicate that libc regex should effectively do nothing while libregex should
specifically run it in non-extended mode.
regex(3): Interpret many escaped ordinary characters as EESCAPE
MFC NOTE: This only merged the infrastructure back, the new regcomp symbol
that actually interprets these as EESCAPE was *dropped*. This is purely to
make future commits for libregex easier to merge back so that we can choose
to use it.
In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows for
any character to be escaped, but "ORD_CHAR preceded by an unescaped
<backslash> character [gives undefined results]".
Historically, we've interpreted an escaped ordinary character as the
ordinary character itself. This becomes problematic when some extensions
give special meanings to an otherwise ordinary character
(e.g. GNU's \b, \s, \w), meaning we may have two different valid
interpretations of the same sequence.
To make this easier to deal with and given that the standard calls this
undefined, we should throw an error (EESCAPE) if we run into this scenario
to ease transition into a state where some escaped ordinaries are blessed
with a special meaning -- it will either error out or have extended
behavior, rather than have two entirely different versions of undefined
behavior that leave the consumer of regex(3) guessing as to what behavior
will be used or leaving them with false impressions.
This change bumps the symbol version of regcomp to FBSD_1.6 and provides the
old escape semantics for legacy applications, just in case one has an older
application that would immediately turn into a pumpkin because of an
extraneous escape that's embedded or otherwise critical to its operation.
This is the final piece needed before enhancing libregex with GNU extensions
and flipping the switch on bsdgrep.
For libc regcomp, this will be a nop. libregex will take this to mean that
it needs to turn off GNU extensions, effectively switching it back to the
POSIX-compliant libc implementation at runtime.
Kyle Evans [Wed, 25 Nov 2020 03:14:25 +0000 (03:14 +0000)]
MFC kern: cpuset: properly rebase when attaching to a jail
The current logic is a fine choice for a system administrator modifying
process cpusets or a process creating a new cpuset(2), but not ideal for
processes attaching to a jail.
Currently, when a process attaches to a jail, it does exactly what any other
process does and loses any mask it might have applied in the process of
doing so because cpuset_setproc() is entirely based around the assumption
that non-anonymous cpusets in the process can be replaced with the new
parent set.
This approach slightly improves the jail attach integration by modifying
cpuset_setproc() callers to indicate if they should rebase their cpuset to
the indicated set or not (i.e. cpuset_setproc_update_set).
If we're rebasing and the process currently has a cpuset assigned that is
not the containing jail's root set, then we will now create a new base set
for it hanging off the jail's root with the existing mask applied instead of
using the jail's root set as the new base set.
Note that the common case will be that the process doesn't have a cpuset
within the jail root, but the system root can freely assign a cpuset from
a jail to a process outside of the jail with no restriction. We assume that
that may have happened or that it could happen due to a race when we drop
the proc lock, so we must recheck both within the loop to gather up
sufficient freed cpusets and after the loop.
To recap, here's how it worked before in all cases:
0 4 <-- jail 0 4 <-- jail / process
| |
1 -> 1
|
3 <-- process
0 4 <-- jail 0 4 <-- jail / process
| |
1 <-- process -> 1
More importantly, in both cases, the attaching process still retains the
mask it had prior to attaching or the attach fails with EDEADLK if it's
left with no CPUs to run on or the domain policy is incompatible. The
author of this patch considers this almost a security feature, because a MAC
policy could grant PRIV_JAIL_ATTACH to an unprivileged user that's
restricted to some subset of available CPUs the ability to attach to a jail,
which might lift the user's restrictions if they attach to a jail with a
wider mask.
In most cases, it's anticipated that admins will use this to be able to,
for example, `cpuset -c -l 1 jail -c path=/ command=/long/running/cmd`,
and avoid the need for contortions to spawn a command inside a jail with a
more limited cpuset than the jail.
Kyle Evans [Sat, 12 Dec 2020 05:57:42 +0000 (05:57 +0000)]
MFC lualoader: module-manipulation commands
4634bb1f: lualoader: provide module-manipulation commands
Specifically, we have:
- enable-module
- disable-module
- toggle-module
These can be used to add/remove modules to be loaded or force modules to be
loaded in spite of modules_blacklist. In the typical case, a user is
expected to use them to recover an issue happening due to a module directive
they've added to their loader.conf or because they discover that they've
under-specified what to load.
A last minute rewrite left this logically wrong; if it's present in
modules_blacklist, then we do not load it.
7ed84fa1: lualoader: cli: provide a show-module-options loader command
This effectively dumps everything lualoader knows about to the console using
the libsa pager; that particular lua interface was added in r368591.
A pager stub implementation has been added that just dumps the output as-is
as a compat shim for older loader binaries that do not have lpager. This
stub should be moved into a more appropriate .lua file if we add anything
else that needs the pager.
Kyle Evans [Sat, 12 Dec 2020 21:25:38 +0000 (21:25 +0000)]
MFC: stand: liblua: add a pager module
This is nearly a 1:1 mapping of the pager API from libsa. The only real
difference is that pager.output() will accept any number of arguments and
coerce all of them to strings for output using luaL_tolstring (i.e. the
__tostring metamethod will be used).
The only consumer planned at this time is the upcoming "show-module-options"
implementation.
Kyle Evans [Sat, 19 Dec 2020 03:30:06 +0000 (03:30 +0000)]
MFC kern: cpuset: allow jails to modify child jails' roots
This partially lifts a restriction imposed by r191639 ("Prevent a superuser
inside a jail from modifying the dedicated root cpuset of that jail") that's
perhaps beneficial after r192895 ("Add hierarchical jails."). Jails still
cannot modify their own cpuset, but they can modify child jails' roots to
further restrict them or widen them back to the modifying jails' own mask.
As a side effect of this, the system root may once again widen the mask of
jails as long as they're still using a subset of the parent jails' mask.
This was previously prevented by the fact that cpuset_getroot of a root set
will return that root, rather than the root's parent -- cpuset_modify uses
cpuset_getroot since it was introduced in r327895, previously it was just
validating against set->cs_parent which allowed the system root to widen
jail masks.