Mark Johnston [Mon, 18 Jan 2021 22:07:56 +0000 (17:07 -0500)]
safexcel: Simplify request allocation
Rather than preallocating a set of requests and moving them between
queues during state transitions, maintain a shadow of the command
descriptor ring to track the driver context of each request. This is
simpler and requires less synchronization between safexcel_process() and
the ring interrupt handler.
Rather than returning a hard error in this case, return ERESTART so that
upper layers get a chance to retry the request (or drop it, depending on
the desired policy).
This case is hard to hit due to the somewhat low bound on queued
requests, but that will no longer be true after an upcoming change.
Michael Tuexen [Wed, 13 Jan 2021 21:48:17 +0000 (22:48 +0100)]
tcp: add sysctl to tolerate TCP segments missing timestamps
When timestamp support has been negotiated, TCP segements received
without a timestamp should be discarded. However, there are broken
TCP implementations (for example, stacks used by Omniswitch 63xx and
64xx models), which send TCP segments without timestamps although
they negotiated timestamp support.
This patch adds a sysctl variable which tolerates such TCP segments
and allows to interoperate with broken stacks.
iflib: add assert to prevent out-of-bounds array access
The iflib_queues_alloc() allocates isc_nrxqs iflib_dma_info structs
for each rxqset, and links each struct to a different free list.
As a result, it must be isc_nrxqs >= isc_nfl (plus the completion
queue, if present).
Add an assertion to make this constraint explicit.
Kyle Evans [Thu, 31 Dec 2020 17:15:45 +0000 (11:15 -0600)]
stand: properly declare subdir deps or .WAIT, do parallel build
buildworld already runs the stand build in parallel[1], so make it easier to
identify ordering issues by properly establishing dependencies or adding
.WAIT where needed.
Everything in stand/ relies on libsa, either directly or indirectly, because
libsa build is where the stand headers get installed and it gets linked in
most places.
Interpreters depend on their libs, machine dirs usually depend on top-level
libs that are getting built and at least one of the interpreter flavors.
For i386, order btx/libi386/libfirewire before everything else using a
big-ol-.WAIT hammer. btx is the most common dependency, but the others are
used sporadically. This seems to be where the race reporting on the mailing
list is- AFAICT, the following sequence is happening:
1.) One of the loaders gets built based on stale btx/btxldr
2.) btx/btxldr gets rebuilt
3.) installworld triggers loader rebuild because btx was rebuilt after
This seems like the most plausible explanation, as they've verified system
time and timestamps.
While we're here, let's switch stand/ over to a completely parallel build so
we can work out these kinds of issues in isolation rather than in the middle
of a larger build.
Brandon Bergren [Thu, 2 Jan 2020 04:34:22 +0000 (04:34 +0000)]
Move stand/ofw/libofw to stand/libofw.
Since rS330365, there has been no particular reason for libofw to be in a
subdirectory of ofw. Move libofw up a level to make it fit in better with
the other top level libraries.
Also add a LIBOFWSRC to stand/defs.mk to match what all the other
libraries are doing.
refcount_load() does not yet exist on this branch, and the path to MFC'ing
it is slightly non-trivial. Back out the part that uses it -- it's a ddb
command anyways, so the cost of getting it wrong is ~low.
Kyle Evans [Tue, 5 Jan 2021 21:49:46 +0000 (15:49 -0600)]
du: tests: make H_flag tests more strict about output requirements
The current version of this test will effectively pass as long as one of the
specified paths is in the output, and it could even be a subset of one of
the paths.
Strengthen up the test a little bit:
* Specify beginning/end anchors for each path
* Add egrep -v checks to make sure we don't have any *additional* paths
* Ratchet down paths2 to exactly the two paths we expect to appear
Kyle Evans [Tue, 5 Jan 2021 21:33:06 +0000 (15:33 -0600)]
du: tests: fix the H_flag test (primarily grep usage)
This test attempts to use \t (tab intended) in a grep expression. With the
former /usr/bin/grep (i.e. gnugrep), this was interpreted as a literal 't'.
The expression would work anyways because the tr(1) usage would ultimately
replace all of the spaces with a single newline, and they would match the
paths whether they were correctly fromatted or not.
Current /usr/bin/grep (i.e. bsdgrep) is less-tolerant of ordinary-escapes, a
property of the underlying regex(3) engine, to make it easier to identify
when stuff like this happens. In-fact, this expression broke after the
switch happened.
This revision does the bare basics to fix the usage by using a printf to get
a literal tab character to insert into the expression. It also swaps out the
manual insertion of the line prefix into the grep expression by pulling
that part out of $sep and reusing it for the leading path.
The secondary issue was the tr(1) usage, since tr would only replace the
first character of string1 with the first character of string2. This has
instead been replaced by a sed expression, which similary understands \n to
be a newline on all supported versions of FreeBSD. Each path now gets
prefixed with the appropriate context that should be there (i.e. numeric
sequence followed by a tab).
Kyle Evans [Thu, 31 Dec 2020 18:30:43 +0000 (12:30 -0600)]
libc: tests: add some tests for cpuset(2)
The cpuset(2) tests should be run as root (require.user properly set) with
>= 3 cpus for maximum coverage. All tests that want to modify the cpuset
don't assume any particular cpu layout (i.e. the first cpu may not be 0, the
last may not be first + count) and the following scenarios are tested:
1.) newset: basic execute cpuset() to grab a new cpuset, make sure the
assigned cpuset then has a different ID.
2.) transient: create a new cpuset then assign the process its original
cpuset, ensuring that the one we created is now gone.
3.) deadlk: test assigning an anonymous mask, then resetting the process
base affinity with 1-cpu overlap w.r.t. the anonymous mask and with
0-cpu overlap w.r.t. the anonymous mask.
4.) jail_attach_newbase: process attaches to a jail with its own
cpuset+mask (e.g. cpuset -c -l 1,2 jail -c path=/ command=/bin/sh)
5.) jail_attach_newbase_plain: process attaches to a jail with its own
cpuset (e.g. cpuset -c jail -c path=/ command=/bin/sh)
6.) jail_attach_prevbase: process attaches to a jail with the containing
jail's root cpuset (e.g. jail -c path=/ command=/bin/sh)
7.) jail_attach_plain: process attaches to a jail with the containing jail's
root cpuset+mask.
8.) badparent: creates a new cpuset and modifies the anonymous thread mask,
then setid's back to the original and checks that cpuset_getid() returns
the expected set.
Kyle Evans [Thu, 31 Dec 2020 18:26:01 +0000 (12:26 -0600)]
libc: tests: hook CPUSET(9) test up to the build
Add shims to map NetBSD's API to CPUSET(9). Obviously the invalid input
parts of these tests are relatively useless since we're just testing the
shims that aren't used elsewhere, there's still some amount of value in
the parts testing valid inputs.
Kyle Evans [Fri, 8 Jan 2021 19:57:32 +0000 (13:57 -0600)]
libregex: re-enable `make check`
The tests are generally expected to pass, uncomment the annotation that
lets `make check` work. Note that `make check` currently requires kyua
from ports or an appropriate symlink into /usr/local/bin.
Alex Richardson [Tue, 25 Aug 2020 13:30:34 +0000 (13:30 +0000)]
Fix -Wundef warnings when building liblua
We need to define the LUA_FLOAT_INT64 macro even if we don't use it (copied
from stand/luaconf.h). While touching luaconf.h.dist also sync it with the
the 5.3.5 release version (matches the one in lib/liblua).
Kyle Evans [Fri, 14 Aug 2020 02:22:19 +0000 (02:22 +0000)]
flua: don't allow dlopen, et al., for bootstrap flua
There are some logistics issues that need to be sorted out here before we
can actually allow this to work.
It's not really safe to allow LUA_USE_DLOPEN with host lib paths being used.
The host system could have an entirely different lua version and this could
cause us to crash and burn.
If we want to revive this later, we need to make sure to define c module
paths inside OBJDIR that are compiled against whatever version we've
bootstrapped.
Ed Maste [Thu, 13 Aug 2020 00:19:05 +0000 (00:19 +0000)]
flua: initial support for "require" in the base system
Use /usr not /usr/local for base system components.
Use /usr/lib/flua and /usr/share/flua (not lua) for consistency and to
avoid the possibility that other software accidentally finds our base
system modules.
Also drop the version from the path, as flua represents an unspecified
lua version that corresponds to the FreeBSD version it comes with.
LUA_USE_DLOPEN is not yet enabled because some additional changes are
needed wrt symbol visibility.
Kyle Evans [Mon, 23 Nov 2020 00:33:06 +0000 (00:33 +0000)]
kern: dup: do not assume oldfde is valid
oldfde may be invalidated if the table has grown due to the operation that
we're performing, either via fdalloc() or a direct fdgrowtable_exp().
This was technically OK before rS367927 because the old table remained valid
until the filedesc became unused, but now it may be freed immediately if
it's an unshared table in a single-threaded process, so it is no longer a
good assumption to make.
This fixes dup/dup2 invocations that grow the file table; in the initial
report, it manifested as a kernel panic in devel/gmake's configure script.
lualoader: add loader_conf_dirs support (loader.conf.d)
loader_conf_dirs is the supporting mechanism for the included
/boot/loader.conf.d directory. When lualoader finishes processing all of
the loader_conf_files it finds after walking /boot/defaults/loader.conf,
it will now check any and all loader_conf_dirs and process files ending
in ".conf" as if they were a loader.conf.
Note that loader_conf_files may be specified in a loader.conf.d config
file, but loader_conf_dirs may *not*. It will only be processed as specified
in /boot/defaults/loader.conf and any loader_conf_files that were loaded
from there.
Miod Vallat [Fri, 8 Jan 2021 18:59:00 +0000 (12:59 -0600)]
libc: regex: rework unsafe pointer arithmetic
regcomp.c uses the "start + count < end" idiom to check that there are
"count" bytes available in an array of char "start" and "end" both point to.
This is fine, unless "start + count" goes beyond the last element of the
array. In this case, pedantic interpretation of the C standard makes the
comparison of such a pointer against "end" undefined, and optimizers from
hell will happily remove as much code as possible because of this.
An example of this occurs in regcomp.c's bothcases(), which defines
bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it
invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...
Because bothcases() and p_bracket() are static functions in regcomp.c, there
is a real risk of miscompilation if aggressive inlining happens.
The following diff rewrites the "start + count < end" constructs into "end -
start > count". Assuming "end" and "start" are always pointing in the array
(such as "bracket[3]" above), "end - start" is well-defined and can be
compared without trouble.
As a bonus, MORE2() implies MORE() therefore SEETWO() can be simplified a
bit.
Kyle Evans [Fri, 15 Jan 2021 14:15:40 +0000 (08:15 -0600)]
lualoader: use floor division to get correct type
This fixes the positioning of the "Welcome to FreeBSD" heading, which was
misplaced after the recent update to Lua 5.4. The issue was previously
masked by a compatibility knob in Lua 5.3 that would cause float-tagged
numbers to render faithfully without the decimal component. Lua 5.4 dropped
that and ensures that it always prints a decimal component, even if it has
to append a ".0" to the value.
Standard division produces a "float", floor division (//) can be used to
guarantee an integer. Floating point operations have been completely ripped
out of the liblua compiled for the bootloader, so this is a nop. This is
decidedly better than trying to hack out the float tag entirely.
Kyle Evans [Sat, 16 Jan 2021 05:58:12 +0000 (23:58 -0600)]
bectl: tests: use -R <mount> instead of specifying altroot
-R is currently shorthand for cachefile=none, altroot=<mount>. This is
functionally the same, but perhaps more resilient to future changes that
could be necessary that may be added when -R is specified.
Conrad Meyer [Mon, 27 May 2019 17:33:20 +0000 (17:33 +0000)]
kldxref(8): Sort MDT_MODULE info first in linker.hints output
MDT_MODULE info is required to be ordered before any other MDT metadata for
a given kld because it serves as an implicit record boundary between
distinct klds for linker.hints consumers. kldxref(8) has previously relied
on the assumption that MDT_MODULE was ordered relative to other module
metadata in kld objects by source code ordering.
However, C does not require implementations to emit file scope objects in
any particular order, and it seems that GCC 6.4.0 and/or binutils 2.32 ld
may reorder emitted objects with respect to source code ordering.
So: just take two passes over a given .ko's module metadata, scanning for
the MDT_MODULE on the first pass and the other metadata on subsequent
passes. It's not super expensive and not exactly a performance-critical
piece of code. This ensures MDT_MODULE is always ordered before
MDT_PNP_INFO and other MDTs, regardless of compiler/linker movement. As a
fringe benefit, it removes the requirement that care be taken to always
order MODULE_PNP_INFO after DRIVER_MODULE in source code.
MFC r353632:
Replace rdma_is_upper_dev_rcu() with rdma_vlan_dev_real_dev() in ibcore.
This reduces the number of references to VLAN_TRUNKDEV() in ibcore.
Currently only VLAN is supported as a child interface in FreeBSD.
Remove superfluous RCU locking.
Kristof Provost [Wed, 13 Jan 2021 18:30:01 +0000 (19:30 +0100)]
pf: Don't hold PF_RULES_WLOCK during copyin() on DIOCRCLRTSTATS
We cannot hold a non-sleepable lock during copyin(). This means we can't
safely count the table, so instead we fall back to the pf_ioctl_maxcount
used in other ioctls to protect against overly large requests.
Kristof Provost [Tue, 19 Jan 2021 12:48:31 +0000 (13:48 +0100)]
pfctl: Fix NOCLEAN build
We've created a new pf_ruleset.c file for pfctl and no longer use the
kernel vrsion, but the build system doesn't handle this dependency
change correctly. Delete the dependency file if it contains the kernel
version of the file.
Kristof Provost [Thu, 24 Dec 2020 15:02:04 +0000 (16:02 +0100)]
pfctl: Stop sharing pf_ruleset.c with the kernel
Now that we've split up the datastructures used by the kernel and
userspace there's essentually no more overlap between the pf_ruleset.c
code used by userspace and kernelspace.
Copy the userspace bits to the pfctl directory and stop using the kernel
file.
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27764
Kristof Provost [Sun, 13 Dec 2020 16:20:02 +0000 (17:20 +0100)]
pf: Convert pfi_kkif to use counter_u64
Improve caching behaviour by using counter_u64 rather than variables
shared between cores.
The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see:
x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
+--------------------------------------------------------------------------+
| + |
| xx + |
|xxx +++|
||A| |
| |A||
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 9216962952635693439029371057.6 116720.36
+ 5 19427190196984001950292219546509 109084.92
Difference at 95.0% confidence
1.01755e+07 +/- 164756
108.584% +/- 2.9359%
(Student's t, pooled s = 112967)
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27763
Kristof Provost [Sun, 13 Dec 2020 10:36:54 +0000 (11:36 +0100)]
pf: Allocate and free pfi_kkif in separate functions
Factor out allocating and freeing pfi_kkif structures. This will be
useful when we change the counters to be counter_u64, so we don't have
to deal with that complexity in the multiple locations where we allocate
pfi_kkif structures.
Kristof Provost [Sat, 5 Dec 2020 13:38:12 +0000 (14:38 +0100)]
pf: Remove unused fields from pf_krule
The u_* counters are used only to communicate with userspace, as
userspace cannot use counter_u64. As pf_krule is not passed to userspace
these fields are now obsolete.
- Split synopsis into two parts. The first explains how to record
sessions, while the second one explains how to replay (some of)
the recorded sessions.
- Fix the -width argument of the environment variables list.
bootparamd: Fix several warnings and increase warn level to 6.
- Increase WARNS to 6.
- Except -Wcast-align and -Wincompatible-pointer-types-discards-qualifiers
checks.
- Use ANSI C prototype.
- Statically variables and functions.
- Add extern declaration for global variables.
- Rename local variables to resolve shadow warnings.
- Ignore malformed directory entries as created by Dropbox ("/").
(rev 1.24)
- Use libarchive 3.x interface: check result for archive_read_free()
and don't call archive_read_close manually. (rev 1.23)
- Always overwrite symlinks on extraction, ever if they're newer than
entries in archive.
- Use getline() rather than getdelim().
PR: 231827
Submitted by: ak
Reviewed by: mm
Obtained from: NetBSD
Ed Maste [Wed, 13 Jan 2021 03:24:52 +0000 (22:24 -0500)]
elftcl: add -i flag to ignore unknown flags
This may allow an identical elfctl invocation to be used on multiple
FreeBSD versions, with features not implemented on older releases being
silently ignored.
PR: 252629 (related)
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28130
Mark Johnston [Sun, 10 Jan 2021 22:46:32 +0000 (17:46 -0500)]
libdtrace: Format USDT symbols correctly based on symbol binding
Before we did not handle weak symbols correctly, sometimes resulting in
link errors from dtrace -G when processing object files where functions
with weak aliases contain USDT probes.
Reported by: rlibby
Sponsored by: The FreeBSD Foundation
Mark Johnston [Mon, 4 Jan 2021 13:22:21 +0000 (08:22 -0500)]
mvneta: Fix 64-bit MIB reads
It appears we must read MIB values as 2 4-byte words, lower address
first. A single 8-byte MIB read returns the value with the lower 4
bytes copied into the upper 4 bytes, resulting in bogus byte counter
values.
Roger Pau Monné [Wed, 25 Nov 2020 11:34:38 +0000 (12:34 +0100)]
xen: allow limiting the amount of duplicated pending xenstore watches
Xenstore watches received are queued in a list and processed in a
deferred thread. Such queuing was done without any checking, so a
guest could potentially trigger a resource starvation against the
FreeBSD kernel if such kernel is watching any user-controlled xenstore
path.
Allowing limiting the amount of pending events a watch can accumulate
to prevent a remote guest from triggering this resource starvation
issue.
For the PV device backends and frontends this limitation is only
applied to the other end /state node, which is limited to 1 pending
event, the rest of the watched paths can still have unlimited pending
watches because they are either local or controlled by a privileged
domain.
The xenstore user-space device gets special treatment as it's not
possible for the kernel to know whether the paths being watched by
user-space processes are controlled by a guest domain. For this reason
watches set by the xenstore user-space device are limited to 1000
pending events. Note this can be modified using the
max_pending_watch_events sysctl of the device.
This is XSA-349.
Sponsored by: Citrix Systems R&D
MFC after: 3 days
Similarly to what done for iflib in 1d238b07d5d4d9660ae0e,
this patch prevents access to the krings during the interface
reset triggered by netmap_register().
When different processes open separate subsets of the
available rings of a same netmap interface, a device
reset may be performed while one of the processes
is actively using some rings (e.g., caused by another
process executing a nmport_open()).
With this patch, such situation will cause the
active process to get a POLLERR, so that it can
have a chance to detect the situation.
We also guarantee that no process is running a txsync
or rxsync (ioctl or poll) while an iflib device reset
is in progress.
When netmap_fl_refill() is called at initialization time (e.g.,
during netmap_iflib_register()), nic_i must be 0, since the
free list is reinitialized. At the end of the refill cycle, nic_i
must still be zero, because exactly N descriptors (N is the ring size)
are refilled.
This patch therefore fixes the assertions to check on nic_i rather
than on nm_i. The current netmap_reset() may in fact cause nm_i
to be != 0 while the device is resetting: this may happen when
multiple non-cooperating processes open different subsets of the
available netmap rings.
The netmap_reset() function is meant to be called by the driver
when they initialize (or re-initialize) a hardware ring.
However, since the introduction of support for opening (in
netmap mode) a subset of the available rings, netmap_reset()
may be called multiple times on actively used rings, causing
both kring and netmap ring to transition to an inconsistent
state.
This changes improves the situation by resetting all the
indices fields of the kring to 0, as expected after the
reinitialization of a hardware ring.
netmap: iflib: enable/disable krings on any interface reinit
Since 1d238b07d5d4d9660ae0e0, krings are disabled before
a reinit cycle triggered by iflib_netmap_register.
However, this operation is actually necessary also for
any interface reinit triggered by other causes (i.e.,
ifconfig commands).
We achieve this goal by moving the krings enable/disable
operation inside iflib_stop() and iflib_init_locked().
Once here, this change also removes some redundant operations
from iflib_netmap_register(), that are already performed by
iflib_stop().
Restore the hwofs functionality temporarily disabled by 7ba6ecf216fb15e8b147db2 to prevent issues with iflib.
This patch brings the necessary changes to iflib to
enable howfs to allow interface restarts without
disrupting netmap applications actively using its
rings.
After this change, it becomes possible for multiple
non-cooperating netmap applications to use non-overlapping
subsets of the available netmap rings without clashing
with each other.