Ryan Moeller [Sun, 28 Feb 2021 10:30:09 +0000 (10:30 +0000)]
sbin/ifconfig: Use a global libifconfig handle
This should eventually replace the socket passed to the various
handlers. In the meantime, making it global avoids repeatedly opening
and closing handles.
Alex Richardson [Fri, 12 Mar 2021 17:12:10 +0000 (17:12 +0000)]
capsicum-test: Update for O_BENEATH removal
Update the tests to check O_RESOLVE_BENEATH instead.
If this looks reasonable, I'll try to upstream this change.
This keeps a compat fallback for O_BENEATH since the Linux port still
has/had O_BENEATH with "no .., no absolute paths" semantics.
Test Plan: `/usr/tests/sys/capsicum/capsicum-test -u 977` passes and
runs the O_RESOLVE_BENEATH tests.
Reviewed By: markj
Differential Revision: https://reviews.freebsd.org/D29016
The argument has to be a single whitespace-separate value. While touching
all these lines also add ksh93, since `atf_set "require.progs"` overrides
the default value specified in the Kyuafile. This then results in tests
being executed despite ksh93 not being installed.
Alex Richardson [Wed, 3 Mar 2021 13:53:30 +0000 (13:53 +0000)]
Fix capsicum-test build with GCC
Apparently GCC defines NULL to 0 in C++11 mode (instead of nullptr), so
this causes the following error:
```
In file included from capsicum-test.h:15,
from capsicum-test.cc:1:
gtest-1.10.0/include/gtest/gtest.h: In instantiation of 'testing::AssertionResult testing::internal::CmpHelperNE(const char*, const char*, const T1&, const T2&) [with T1 = long int; T2 = procstat*]':
capsicum-test.cc:75:3: required from here
gtest-1.10.0/include/gtest/gtest.h:1621:28: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
1609 | if (val1 op val2) {\
| ~~~~~~~~~~~~
......
1621 | GTEST_IMPL_CMP_HELPER_(NE, !=);
gtest-1.10.0/include/gtest/gtest.h:1609:12: note: in definition of macro 'GTEST_IMPL_CMP_HELPER_'
1609 | if (val1 op val2) {\
| ^~
```
Fix this by using nullptr directly.
Submitted upstream as https://github.com/google/capsicum-test/pull/56
Alex Richardson [Tue, 2 Mar 2021 18:27:34 +0000 (18:27 +0000)]
Simplify the capsicum-test wrapper script
Instead of running tests one-by-one with the shell wrapper we now run
the full gtest testsuite twice (once as root, once as non root). This
significantly speeds up running tests despite running them twice.
This change also passes the missing -u flag to capsicum-test that caused
test failures (https://bugs.freebsd.org/250178)
Previously, running the testsuite with the wrapper script took ~3s per
test on aarch64 QEMU, i.e. a total of almost 5 minutes.
Now it takes 6 seconds to run all tests twice.
Before:
root@freebsd-aarch64:/usr/tests/sys/capsicum # /usr/bin/time kyua test functional
94/96 passed (2 failed)
309.97 real 58.46 user 244.31 sys
After:
root@freebsd-aarch64:/usr/tests/sys/capsicum # /usr/bin/time kyua test functional
functional:test_root -> passed [2.659s]
functional:test_unprivileged -> passed [2.391s]
2/2 passed (0 failed)
5.48 real 1.06 user 2.52 sys
This overhead is caused by kyua + atf-sh spawning lots of additional
processes and can be avoided by just running the googletest test binary.
syscall seconds calls errors
fork 39.810229456 1275 0
sigprocmask 13.546928736 572 0
i.e. 1275 processes spawned to run a single test.
Test Plan: All tests pass with D28907.
PR: 250178
Reviewed By: lwhsu
Differential Revision: https://reviews.freebsd.org/D29014
This includes various fixes that I submitted recently such as updating the
pdkill() tests for the actual implemented behaviour
(https://github.com/google/capsicum-test/pull/53) and lots of changes to
avoid calling sleep() and replacing it with reliable synchronization
(pull requests 49,51,52,53,54). This should make the testsuite more reliable
when running on Jenkins. Additionally, process status is now retrieved using
libprocstat instead of running `ps` and parsing the output
(https://github.com/google/capsicum-test/pull/50). This fixes one previously
failing test and speeds up execution.
Overall, this update reduces the total runtime from ~60s to about 4-5 seconds.
This also fixes a typo in the dup test that caused the head function to
not be called. On my test system without python3 the tests are now
skipped instead of failing.
Alex Richardson [Tue, 2 Mar 2021 18:34:42 +0000 (18:34 +0000)]
tests/sys/audit: add missing comma delimiter between fields
This makes the `kyua report --verbose` output a lot easier to parse when
looking at failed tests. It also fixes the closefrom() test since I
tested my changes with this commit but forgot to push it together with fa32350347b4e351a144b5423f0fb2ca9d67f4ca.
Alex Richardson [Thu, 18 Feb 2021 10:14:27 +0000 (10:14 +0000)]
tests/sys/audit: Avoid race caused by starting auditd(8) for testing
In the CheriBSD CI we reproducibly see the first test in sys/audit
(administrative:acct_failure) fail due to a missing startup message.
It appears this is caused by a race condition when starting auditd:
`service auditd onestart` returns as soon as the initial auditd() parent
exits (after the daemon(3) call).
We can avoid this problem by setting up the auditd infrastructure
in-process: libauditd contains audit_quick_{start,stop}() functions that
look like they are ideally suited to this task.
This patch also avoids forking lots of shell processes for each of the 418
tests by using `auditon(A_SENDTRIGGER, &trigger, sizeof(trigger))` to check
for a running auditd(8) instead of using `service auditd onestatus`.
With these two changes (and D28388 to fix the XFAIL'd test) I can now
boot and run `cd /usr/tests/sys/audit && kyua test` without any failures
in a single-core QEMU instance. Before there would always be at least one
failed test.
Besides making the tests more reliable in CI, a nice side-effect of this
change is that it also significantly speeds up running them by avoiding
lots of fork()/execve() caused by shell scripts:
Running kyua test on an AArch64 QEMU took 315s before and now takes 68s,
so it's roughly 3.5 times faster. This effect is even larger when running
on a CHERI-RISC-V QEMU since emulating CHERI instructions on an x86 host
is noticeably slower than emulating AArch64.
Alex Richardson [Tue, 23 Feb 2021 12:51:35 +0000 (12:51 +0000)]
Fix ptrace_test:ptrace__syscall_args after ATF upgrade
ATF now opens the results file (without CLOEXEC), so the child actually
has a valid file descriptor 3. To fix this simply use a large number that
will definitely not be a valid file descriptor.
Alex Richardson [Mon, 1 Mar 2021 18:51:02 +0000 (18:51 +0000)]
Remove atf_tc_skip calls from ptrace_test
I've run these tests many times in a loop on multiple architectures and
it works reliably for me, maybe it's time to retire these skips?
This also adds an additional waitpid to one of the tests to avoid
a potential race condition (suggested by markj@).
Alex Richardson [Mon, 1 Mar 2021 18:49:31 +0000 (18:49 +0000)]
ptrace_test: Add more debug output on test failures
Mostly automatic, using
`CHILD_REQUIRE\(([^|&\n]*) ==` -> `CHILD_REQUIRE_EQ_INT($1,`
`ATF_REQUIRE\(([^|&\n]*) ==` -> `REQUIRE_EQ_INT($1,` followed by
git-clang-format -f and then manually checking ones that contain ||/&&.
Test Plan:
Still getting the same failure but now it prints
`psr.sr_error (0) == EBADF (9) not met` instead of just failing
without printing the values.
Alex Richardson [Wed, 3 Feb 2021 09:32:16 +0000 (09:32 +0000)]
atf: Fix ATF_BUILD_* values when not using the bootstrap compiler
Currently, we encode the full path and compile flags for the build
compiler in libatf. However, these values are not correct when
cross-compiling: For example, when I build on macOS, CC is set to the
host path /usr/local/Cellar/llvm/11.0.0_1/bin/clang-11. This path will
not exist on the target system.
Simplify this logic and use cc/cpp/c++ since those binaries will exist
on the target system unless the compiler was explicitly disabled.
I'm not convinced ATF needs to encode these values, but this is a
minimal fix for these tests when using a non-bootstrapped compiler.
Alex Richardson [Thu, 18 Feb 2021 10:07:51 +0000 (10:07 +0000)]
libc: Fix t_spawn_fileactions test after ATF update
Since https://github.com/freebsd/atf/commit/4581cefc1e3811dd3c926b5dd4b15fd63d2e19da
ATF opens the results file on startup. This fixes problems like
capsicumized tests not being able to open the file on exit.
However, this test closes all file descriptors above 3 to get a
deterministic fd table allocation for the child. Instead of using closefrom
(which will close the ATF output file FD) I've changed this test use
the lowest available fd and pass that to the helper program as a string.
We could also try to re-open the results file in ATF if we get a EBADF
error, but that will fail when running under Capsicum.
Reviewed By: cem
Differential Revision: https://reviews.freebsd.org/D28684
Alex Richardson [Mon, 15 Feb 2021 22:11:30 +0000 (22:11 +0000)]
Fix two failing tests after ATF update
Since https://github.com/freebsd/atf/commit/4581cefc1e3811dd3c926b5dd4b15fd63d2e19da
ATF opens the results file on startup. This fixes problems like
capsicumized tests not being able to open the file on exit.
However, this test closes all file descriptors just to check that
socketpair returns fd 3+4 and thereby also closes the ATF results file.
This then results in an EBADF when writing the result so the test is
reported as broken.
While system calls that create new file descriptors (must?) use the lowest
available file descriptor number, it does not seem useful to test this
property here. Drop the check for FD==3/4 to unbreak the testsuite.
We could also try to re-open the results file in ATF if we get a EBADF
error, but that will fail when running under Capsicum.
Reviewed By: cem
Differential Revision: https://reviews.freebsd.org/D28683
This includes improvements to the atf-sh helper functions that
significantly reduce the number of spawned processes for each test
and therefore speeds up running the testsuite noticeably.
After changing the namespace.h header we need to provide _err on macOS, too.
Previously we used the system libc err*/warn*, but that does not provide
_err/_warn (which is used by other bootstrapped files from libc).
To fix this problem bootstrap err.c on macOS as well.
Alex Richardson [Wed, 10 Feb 2021 11:05:02 +0000 (11:05 +0000)]
Fix crossbuild bootstrap tools build with Clang 12
Clang 12 no longer allows re-defining a weak symbol as non-weak. This
happed here because we compile err.c with _err defined to err. To fix
this, use the same approach as the libc namespace.h
Alex Richardson [Sat, 13 Feb 2021 13:53:50 +0000 (13:53 +0000)]
bin/pkill: Fix {pgrep,pkill}-j_test.sh
The POSIX sh case statement does not allow for pattern matching using the
regex + qualifier so this case statement never matches. Instead just check
for a string starting with a digit followed by any character.
While touching these files also fix various shellcheck warnings.
Alex Richardson [Sat, 13 Feb 2021 13:52:59 +0000 (13:52 +0000)]
lib/libc/tests/rpc: Correctly set timeout
The rpc_control() API does not accept the CLCR_SET_RPCB_TIMEOUT command,
it only accepts RPC_SVC_CONNMAXREC_GET/RPC_SVC_CONNMAXREC_SET, so it was
not doing anything.
Instead of incorrectly calling this API, use clnt_create_timed() instead.
I noticed this because the test was timing out after 120s in the CheriBSD CI.
Alex Richardson [Tue, 2 Feb 2021 09:55:18 +0000 (09:55 +0000)]
tests/sys/audit: Skip extattr tests if extattrs are not supported
In the CheriBSD CI, we run the testsuite with /tmp as tmpfs. This causes
the extattr audit tests to fail since tmpfs does not (yet) support
extattrs. Skip those tests if the target path is on a file system that
does not support extended file attributes.
While touching these two files also convert the ATF_REQUIRE_EQ(-1, ...)
checks to use ATF_REQURIE_ERRNO().
Alex Richardson [Thu, 28 Jan 2021 17:23:27 +0000 (17:23 +0000)]
tests/sys/audit: fix timeout calculation
This changes the behaviour to a 30s total timeout (needed when running
on slow emulated uniprocessor systems) and timing out after 10s without
any input. This also uses timespecsub() instead of ignoring the
nanoseconds field.
After this change the tests runs more reliably on QEMU and time out less
frequently.
Alex Richardson [Mon, 25 Jan 2021 14:03:17 +0000 (14:03 +0000)]
Don't include libarchive fuzz tests by default
These tests are basic fuzz tests that permute input to trigger crashes
rather than regression or unit tests. Additionally, some of them take a
rather long time to run and should probably be run on a dedicated fuzzing
job instead. Moreover, these simple tests use rand() instead of a real
fuzzing tool that generates interesting inputs (e.g. LLVM libFuzzer) so are
unlikely to find anything interesting when run in CI.
This allows removing one BROKEN_TESTS case due to timeouts and speeds up
running tests on emulated platforms such as QEMU.
Reviewed By: lwhsu, mm
Differential Revision: https://reviews.freebsd.org/D27153
Alex Richardson [Wed, 27 Jan 2021 10:41:57 +0000 (10:41 +0000)]
kerberos5: Silence compiler warnings
Building the kerberos5 subdirectory currently produces lots of warnings.
Since there are many instances of these warnings and it's contrib code,
this change silences the warnings instead of fixing them.
Alex Richardson [Mon, 8 Mar 2021 09:37:39 +0000 (09:37 +0000)]
kern.mk: Fix wrong variable being used for linker path after 172a624f0
When I synchronized kern.mk with bsd.sys.mk, I accidentally changed
CCLDFLAGS to LDFLAGS which is not used by the kernel builds. This commit
should unbreak the GitHub actions cross-build CI. I didn't notice it
locally because cheribuild already passes -fuse-ld in the linker flags as
it predates this being done in the makefiles.
Reported By: Jose Luis Duran
Fixes: 172a624f0 ("Silence annoying and incorrect non-default linker warning with GCC")
Alex Richardson [Thu, 4 Mar 2021 18:27:37 +0000 (18:27 +0000)]
Silence annoying and incorrect non-default linker warning with GCC
The CROSS_TOOLCHAIN GCC .mk files include -B${CROSS_BINUTILS_PREFIX}, so
GCC will select the right linker and we don't need to warn.
While here also apply 17b8b8fb5fc4acc832dabfe7ef11e3e1d399ad0f to kern.mk.
Test Plan: no more warning printed with CROSS_TOOLCHAIN=mips-gcc6
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D29015
Alex Richardson [Wed, 3 Feb 2021 15:24:28 +0000 (15:24 +0000)]
readelf: Fix printing NT_FREEBSD_ARCH_TAG
Looking at lib/csu/arm/crt1_s.S, this should be a string and therefore the
restriction to 4 characters seems wrong.
Found whle updating https://reviews.llvm.org/D74393.
Alex Richardson [Wed, 17 Mar 2021 09:48:28 +0000 (09:48 +0000)]
Create symlinks to host tools on non-FreeBSD hosts
This is unnecessary when cross-building from Linux/macOS.
Additionally, cp -p appears to be broken on macOS Big Sur
(https://openradar.appspot.com/8957219).
For some unknown reason this commit appears to fix
freezes when building on macOS Big Sur.
This also fixes building in docker with volume mounts
with ACLs, since setting the ACL with cp -p fails otherwise.
This causes problems when using ASAN with a runtime older than 12.0 since
the intercept does not expect qsort() to call itself using an interposable
function call. This results in infinite recursion and stack exhaustion
when a binary compiled with -fsanitize=address calls qsort.
See also https://bugs.llvm.org/show_bug.cgi?id=46832 and
https://reviews.llvm.org/D84509 (ASAN runtime patch).
To prevent this problem, this patch uses a static helper function
for the actual qsort() implementation. This prevents interposition and
allows for direct calls. As a nice side-effect, we can also move the
qsort_s checks to the top-level function and out of the recursive calls.
Alex Richardson [Mon, 1 Mar 2021 14:27:30 +0000 (14:27 +0000)]
AArch64: Don't set flush-subnormals-to-zero flag on startup
This flag has been set on startup since 65618fdda0f272a823e6701966421bdca0efa301.
However, This causes some of the math-related tests to fail as they report
zero instead of a tiny number. This fixes at least
/usr/tests/lib/msun/ldexp_test and possibly others.
Additionally, setting this flag prevents printf() from printing subnormal
numbers in decimal form.
See also https://www.openwall.com/lists/musl/2021/02/26/1
Alex Richardson [Tue, 9 Mar 2021 19:11:40 +0000 (19:11 +0000)]
Arch64: Clear VFP state on execve()
I noticed that many of the math-related tests were failing on AArch64.
After a lot of debugging, I noticed that the floating point exception flags
were not being reset when starting a new process. This change resets the
VFP inside exec_setregs() to ensure no VFP register state is leaked from
parent processes to children.
This commit also moves the clearing of fpcr that was added in 65618fdda0f27
from fork() to execve() since that makes more sense: fork() can retain
current register values, but execve() should result in a well-defined
clean state.
Reviewed By: andrew
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29060
Alex Richardson [Mon, 15 Feb 2021 22:09:33 +0000 (22:09 +0000)]
Fix fget_only_user() to return ENOTCAPABLE on a failed capsicum check
After eaad8d1303da500ed691bd774742a4555a05e729 four additional
capsicum-test tests started failing. It turns out this is because
fget_only_user() was returning EBADF on a failed capsicum check instead
of forwarding the return value of cap_check_inline() like
fget_unlocked_seq().
Alex Richardson [Mon, 15 Feb 2021 22:06:41 +0000 (22:06 +0000)]
msun: ctanh/ctanhf: Import fix from musl libc
This applies musl commit b02eed9c4841913d690a2d0029737d72615384fe by
Szabolcs Nagy and updates the tests accordingly. This also allows
removing an XFAIL from the test.
musl commit message:
complex: fix ctanh(+-0+i*nan) and ctanh(+-0+-i*inf)
These cases were incorrect in C11 as described by
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1886.htm
PR: 217528
Reviewed By: dim
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28578
Mark Johnston [Tue, 2 Mar 2021 15:19:53 +0000 (10:19 -0500)]
vm: Round up npages and alignment for contig reclamation
When searching for runs to reclaim, we need to ensure that the entire
run will be added to the buddy allocator as a single unit. Otherwise,
it will not be visible to vm_phys_alloc_contig() as it is currently
implemented. This is a problem for allocation requests that are not a
power of 2 in size, as with 9KB jumbo mbuf clusters.
Reported by: alc
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28924
Rick Macklem [Tue, 2 Mar 2021 22:18:23 +0000 (14:18 -0800)]
nfsclient: Fix ReadDS/WriteDS/CommitDS nfsstats RPC counts for a NFSv3 DS
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing I/O DS operations on a Flexible File Layout pNFS
server. For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented
instead of the ReadDS/WriteDS/CommitDS counts.
This patch fixes this.
Only the RPC counts reported by nfsstat(1) were affected by this bug,
the I/O operations were performed correctly.
Rick Macklem [Mon, 1 Mar 2021 20:49:32 +0000 (12:49 -0800)]
nfsclient: Fix the stripe unit size for a File Layout pNFS layout
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing a File Layout pNFS DS I/O operation.
The size of the I/O operation was smaller than expected.
The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util
in the layout. I had misinterpreted RFC5661 and had shifted the value
right by 6 bits. The correct interpretation is to use the value as
presented (it is always an exact multiple of 64), clearing bits 0->5.
This patch fixes this.
Without the patch, I/O through the DSs work, but the I/O size is 1/64th
of what is optimal.
Rick Macklem [Sun, 28 Feb 2021 22:53:54 +0000 (14:53 -0800)]
nfsclient: add nfs node locking around uses of n_direofoffset
During code inspection I noticed that the n_direofoffset field
of the NFS node was being manipulated without any lock being
held to make it SMP safe.
This patch adds locking of the NFS node's mutex around
handling of n_direofoffset to make it SMP safe.
I have not seen any failure that could be attributed to n_direofoffset
being manipulated concurrently by multiple processors, but I think this
is possible, since directories are read with shared vnode
locking, plus locks only on individual buffer cache blocks.
However, there have been as yet unexplained issues w.r.t reading
large directories over NFS that could have conceivably been caused
by concurrent manipulation of n_direofoffset.
Rick Macklem [Sun, 28 Feb 2021 22:15:32 +0000 (14:15 -0800)]
nfsclient: add checks for a server returning the current directory
Commit 3fe2c68ba20f dealt with a panic in cache_enter_time() where
the vnode referred to the directory argument.
It would also be possible to get these panics if a broken
NFS server were to return the directory as an new object being
created within the directory or in a Lookup reply.
This patch adds checks to avoid the panics and logs
messages to indicate that the server is broken for the
file object creation cases.
Mark Johnston [Mon, 8 Mar 2021 17:39:06 +0000 (12:39 -0500)]
Rename _cscan_atomic.h and _cscan_bus.h to atomic_san.h and bus_san.h
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.
No functional change intended.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29103
Mark Johnston [Mon, 8 Mar 2021 17:39:06 +0000 (12:39 -0500)]
posix timers: Improve the overrun calculation
timer_settime(2) may be used to configure a timeout in the past. If
the timer is also periodic, we also try to compute the number of timer
overruns that occurred between the initial timeout and the time at which
the timer fired. This is done in a loop which iterates once per period
between the initial timeout and now. If the period is small and the
initial timeout was a long time ago, this loop can take forever to run,
so the system is effectively DOSed.
Replace the loop with a more direct calculation of
(now - initial timeout) / period to compute the number of overruns.
Reported by: syzkaller
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29093
Mitchell Horne [Wed, 10 Mar 2021 14:57:12 +0000 (10:57 -0400)]
ns8250: don't drop IER_TXRDY on bus_grab/ungrab
It has been observed that some systems are often unable to resume from
ddb after entering with debug.kdb.enter=1. Checking the status further
shows the terminal is blocked waiting in tty_drain(), but it never makes
progress in clearing the output queue, because sc->sc_txbusy is high.
I noticed that when entering polling mode for the debugger, IER_TXRDY is
set in the failure case. Since this bit is never tracked by the softc,
it will not be restored by ns8250_bus_ungrab(). This creates a race in
which a TX interrupt can be lost, creating the hang described above.
Ensuring that this bit is restored is enough to prevent this, and resume
from ddb as expected.
The solution is to track this bit in the sc->ier field, for the same
lifetime that TX interrupts are enabled.
PR: 223917, 240122
Sponsored by: The FreeBSD Foundation
linux(4): make getcwd(2) return ERANGE instead of ENOMEM
For native FreeBSD binaries, the return value from __getcwd(2)
doesn't really matter, as the libc wrapper takes over and returns
the proper errno.
PR: kern/254120
Reported By: Alex S <iwtcex@gmail.com>
Reviewed By: kib
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29217
Alexander Motin [Wed, 3 Mar 2021 20:21:26 +0000 (15:21 -0500)]
Move ic_check_send_space clear to the actual check.
It closes tiny race when the flag could be set between being cleared
and the space is checked, that would create us some more work. The
flag setting is protected by both locks, so we can clear it in either
place, but in between both locks are dropped.
I think it allowed to avoid some TX thread wakeups while the socket
buffer is full. But add there another options if ic_check_send_space
is set, which means socket just reported that new space appeared, so
it may have sense to pull more data from ic_to_send for better TX
coalescing.
Alexander Motin [Sat, 27 Feb 2021 15:14:05 +0000 (10:14 -0500)]
Micro-optimize OOA queue processing.
- Move ctl_get_cmd_entry() calls from every OOA traversal to when
the requests first inserted, storing seridx in struct ctl_scsiio.
- Move some checks out of the loop in ctl_check_ooa().
- Replace checks for errors that can not happen with asserts.
- Transpose ctl_serialize_table, so that any OOA traversal accessed
only one row (cache line). Compact it from enum to uint8_t.
- Optimize static branch predictions in hottest places.
Due to O(n) nature on deep LUN queues this can be the hottest code
path in CTL, and additional 20% of IOPS I see in some 4KB I/O tests
are good to have in reserve. About 50% of CPU time here according
to the profiles is now spent in two memory accesses per traversed
request in OOA.
Alexander Motin [Sun, 21 Feb 2021 21:45:14 +0000 (16:45 -0500)]
Refactor CTL datamove KPI.
- Make frontends call unified CTL core method ctl_datamove_done()
to report move completion. It allows to reduce code duplication
in differerent backends by accounting DMA time in common code.
- Add to ctl_datamove_done() and be_move_done() callback samethr
argument, reporting whether the callback is called in the same
context as ctl_datamove(). It allows for some cases like iSCSI
write with immediate data or camsim frontend write save one context
switch, since we know that the context is sleepable.
- Remove data_move_done() methods from struct ctl_backend_driver,
unused since forever.
Alexander Motin [Fri, 19 Feb 2021 20:42:57 +0000 (15:42 -0500)]
Microoptimize CTL I/O queues.
Switch OOA queue from TAILQ to LIST and change its direction, so that
we traverse it forward, not backward. There is only one place where
we really need other direction, and it is not critical.
Use STAILQ_REMOVE_HEAD() instead of STAILQ_REMOVE() in backends.
Replace few impossible conditions with assertions.
Alexander Motin [Fri, 19 Feb 2021 03:07:32 +0000 (22:07 -0500)]
Save context switch per I/O for iSCSI and IOCTL frontends.
Introduce new CTL core KPI ctl_run(), preprocessing I/Os in the caller
context instead of scheduling another thread just for that. This call
may sleep, that is not acceptable for some frontends like the original
CAM/FC one, but iSCSI already has separate sleepable per-connection RX
threads, and another thread scheduling is mostly just a waste of time.
IOCTL frontend actually waits for the I/O completion in the caller
thread, so the use of another thread for this has even less sense.
With this change I can measure ~5% IOPS improvement on 4KB iSCSI I/Os
to ZFS.
Juraj Lutter [Sun, 28 Feb 2021 22:07:14 +0000 (23:07 +0100)]
newsyslog(8): Implement a new 'E' flag to not rotate empty log files
Based on an idea from dvl's coworker, László DANIELISZ, implement
a new flag, 'E', that prevents newsyslog(8) from rotating the empty
log files. This 'E' flag ist mostly usable in conjunction with 'B'
flag that instructs newsyslog(8) to not insert an informational
message into the log file after rotation, keeping it still empty.
Ryan Moeller [Sat, 27 Feb 2021 03:05:31 +0000 (03:05 +0000)]
sbin/ifconfig: Get lagg status with libifconfig
Also trimmed an unused block of code that never prints out LAGG_PROTOS.
Reviewed by: kp (earlier version)
Differential Revision: https://reviews.freebsd.org/D28961
Ryan Moeller [Fri, 26 Feb 2021 23:40:58 +0000 (23:40 +0000)]
sbin/ifconfig: Get carp status with libifconfig
A trivial change now that ifconfig is already using libifconfig.
Reviewed by: kp (earlier version)
Differential Revision: https://reviews.freebsd.org/D28955
Mark Johnston [Thu, 11 Mar 2021 15:34:28 +0000 (10:34 -0500)]
vm_reserv: Fix list locking in vm_reserv_reclaim_contig()
The per-domain partpop queue is locked by the combination of the
per-domain lock and individual reservation mutexes.
vm_reserv_reclaim_contig() scans the queue looking for partially
populated reservations that can be reclaimed in order to satisfy the
caller's allocation.
During the scan, we drop the per-domain lock. At this point, the rvn
pointer may be invalidated. Take care to load rvn after re-acquiring
the per-domain lock.
While here, simplify the condition used to check whether a reservation
was dequeued while the per-domain lock was dropped.
Reviewed by: alc, kib
Reported by: gallatin
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29203
Flush remaining routes from the routing table during VNET shutdown.
Summary:
This fixes rtentry leak for the cloned interfaces created inside the
VNET.
Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
Traditionally *BSD routing stack required to supply some
interface data for blackhole/reject routes. This lead to
varieties of hacks in routing daemons when inserting such routes.
With the recent routeing stack changes, gateway sockaddr without
RTF_GATEWAY started to be treated differently, purely as link
identifier.
This change broke net/bird, which installs blackhole routes with
127.0.0.1 gateway without RTF_GATEWAY flags.
Fix this by automatically constructing necessary gateway data at
rtsock level if RTF_REJECT/RTF_BLACKHOLE is set.
Reported by: Marek Zarychta <zarychtam at plan-b.pwste.edu.pl>
Reviewed by: donner
Dimitry Andric [Sat, 13 Mar 2021 13:54:05 +0000 (14:54 +0100)]
Partially revert libcxxrt changes to avoid _Unwind_Exception change
(Note I am also applying this to main and stable/13, to restore the old
libcxxrt ABI and to avoid having to maintain a compat library.)
After the recent cherry-picking of libcxxrt commits 0ee0dbfb0d26 and d2b3fadf2db5, users reported that editors/libreoffice packages from the
official package builders did not start anymore. It turns out that the
combination of these commits subtly changes the ABI, requiring all
applications that depend on internal details of struct _Unwind_Exception
(available via unwind-arm.h and unwind-itanium.h) to be recompiled.
However, the FreeBSD package builders always use -RELEASE jails, so
these still use the old declaration of struct _Unwind_Exception, which
is not entirely compatible. In particular, LibreOffice uses this struct
in its internal "uno bridge" component, where it attempts to setup its
own exception handling mechanism.
To fix this incompatibility, go back to the old declarations of struct
_Unwind_Exception, and restore the __LP64__ specific workaround we had
in place before (which was to cope with yet another, older ABI bug).
Effectively, this reverts upstream libcxxrt commits 88bdf6b290da
("Specify double-word alignment for ARM unwind") and b96169641f79
("Updated Itanium unwind"), and reapplies our commit 3c4fd2463bb2
("libcxxrt: add padding in __cxa_allocate_* to fix alignment").