Kristof Provost [Wed, 13 Jan 2021 18:30:01 +0000 (19:30 +0100)]
pf: Don't hold PF_RULES_WLOCK during copyin() on DIOCRCLRTSTATS
We cannot hold a non-sleepable lock during copyin(). This means we can't
safely count the table, so instead we fall back to the pf_ioctl_maxcount
used in other ioctls to protect against overly large requests.
Kristof Provost [Tue, 19 Jan 2021 12:48:31 +0000 (13:48 +0100)]
pfctl: Fix NOCLEAN build
We've created a new pf_ruleset.c file for pfctl and no longer use the
kernel vrsion, but the build system doesn't handle this dependency
change correctly. Delete the dependency file if it contains the kernel
version of the file.
Kristof Provost [Thu, 24 Dec 2020 15:02:04 +0000 (16:02 +0100)]
pfctl: Stop sharing pf_ruleset.c with the kernel
Now that we've split up the datastructures used by the kernel and
userspace there's essentually no more overlap between the pf_ruleset.c
code used by userspace and kernelspace.
Copy the userspace bits to the pfctl directory and stop using the kernel
file.
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27764
Kristof Provost [Sun, 13 Dec 2020 16:20:02 +0000 (17:20 +0100)]
pf: Convert pfi_kkif to use counter_u64
Improve caching behaviour by using counter_u64 rather than variables
shared between cores.
The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see:
x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
+--------------------------------------------------------------------------+
| + |
| xx + |
|xxx +++|
||A| |
| |A||
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 9216962952635693439029371057.6 116720.36
+ 5 19427190196984001950292219546509 109084.92
Difference at 95.0% confidence
1.01755e+07 +/- 164756
108.584% +/- 2.9359%
(Student's t, pooled s = 112967)
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27763
Kristof Provost [Sun, 13 Dec 2020 10:36:54 +0000 (11:36 +0100)]
pf: Allocate and free pfi_kkif in separate functions
Factor out allocating and freeing pfi_kkif structures. This will be
useful when we change the counters to be counter_u64, so we don't have
to deal with that complexity in the multiple locations where we allocate
pfi_kkif structures.
Kristof Provost [Sat, 5 Dec 2020 13:38:12 +0000 (14:38 +0100)]
pf: Remove unused fields from pf_krule
The u_* counters are used only to communicate with userspace, as
userspace cannot use counter_u64. As pf_krule is not passed to userspace
these fields are now obsolete.
- Split synopsis into two parts. The first explains how to record
sessions, while the second one explains how to replay (some of)
the recorded sessions.
- Fix the -width argument of the environment variables list.
bootparamd: Fix several warnings and increase warn level to 6.
- Increase WARNS to 6.
- Except -Wcast-align and -Wincompatible-pointer-types-discards-qualifiers
checks.
- Use ANSI C prototype.
- Statically variables and functions.
- Add extern declaration for global variables.
- Rename local variables to resolve shadow warnings.
- Ignore malformed directory entries as created by Dropbox ("/").
(rev 1.24)
- Use libarchive 3.x interface: check result for archive_read_free()
and don't call archive_read_close manually. (rev 1.23)
- Always overwrite symlinks on extraction, ever if they're newer than
entries in archive.
- Use getline() rather than getdelim().
PR: 231827
Submitted by: ak
Reviewed by: mm
Obtained from: NetBSD
Ed Maste [Wed, 13 Jan 2021 03:24:52 +0000 (22:24 -0500)]
elftcl: add -i flag to ignore unknown flags
This may allow an identical elfctl invocation to be used on multiple
FreeBSD versions, with features not implemented on older releases being
silently ignored.
PR: 252629 (related)
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28130
Mark Johnston [Sun, 10 Jan 2021 22:46:32 +0000 (17:46 -0500)]
libdtrace: Format USDT symbols correctly based on symbol binding
Before we did not handle weak symbols correctly, sometimes resulting in
link errors from dtrace -G when processing object files where functions
with weak aliases contain USDT probes.
Reported by: rlibby
Sponsored by: The FreeBSD Foundation
Mark Johnston [Mon, 4 Jan 2021 13:22:21 +0000 (08:22 -0500)]
mvneta: Fix 64-bit MIB reads
It appears we must read MIB values as 2 4-byte words, lower address
first. A single 8-byte MIB read returns the value with the lower 4
bytes copied into the upper 4 bytes, resulting in bogus byte counter
values.
Roger Pau Monné [Wed, 25 Nov 2020 11:34:38 +0000 (12:34 +0100)]
xen: allow limiting the amount of duplicated pending xenstore watches
Xenstore watches received are queued in a list and processed in a
deferred thread. Such queuing was done without any checking, so a
guest could potentially trigger a resource starvation against the
FreeBSD kernel if such kernel is watching any user-controlled xenstore
path.
Allowing limiting the amount of pending events a watch can accumulate
to prevent a remote guest from triggering this resource starvation
issue.
For the PV device backends and frontends this limitation is only
applied to the other end /state node, which is limited to 1 pending
event, the rest of the watched paths can still have unlimited pending
watches because they are either local or controlled by a privileged
domain.
The xenstore user-space device gets special treatment as it's not
possible for the kernel to know whether the paths being watched by
user-space processes are controlled by a guest domain. For this reason
watches set by the xenstore user-space device are limited to 1000
pending events. Note this can be modified using the
max_pending_watch_events sysctl of the device.
This is XSA-349.
Sponsored by: Citrix Systems R&D
MFC after: 3 days
Similarly to what done for iflib in 1d238b07d5d4d9660ae0e,
this patch prevents access to the krings during the interface
reset triggered by netmap_register().
When different processes open separate subsets of the
available rings of a same netmap interface, a device
reset may be performed while one of the processes
is actively using some rings (e.g., caused by another
process executing a nmport_open()).
With this patch, such situation will cause the
active process to get a POLLERR, so that it can
have a chance to detect the situation.
We also guarantee that no process is running a txsync
or rxsync (ioctl or poll) while an iflib device reset
is in progress.
When netmap_fl_refill() is called at initialization time (e.g.,
during netmap_iflib_register()), nic_i must be 0, since the
free list is reinitialized. At the end of the refill cycle, nic_i
must still be zero, because exactly N descriptors (N is the ring size)
are refilled.
This patch therefore fixes the assertions to check on nic_i rather
than on nm_i. The current netmap_reset() may in fact cause nm_i
to be != 0 while the device is resetting: this may happen when
multiple non-cooperating processes open different subsets of the
available netmap rings.
The netmap_reset() function is meant to be called by the driver
when they initialize (or re-initialize) a hardware ring.
However, since the introduction of support for opening (in
netmap mode) a subset of the available rings, netmap_reset()
may be called multiple times on actively used rings, causing
both kring and netmap ring to transition to an inconsistent
state.
This changes improves the situation by resetting all the
indices fields of the kring to 0, as expected after the
reinitialization of a hardware ring.
netmap: iflib: enable/disable krings on any interface reinit
Since 1d238b07d5d4d9660ae0e0, krings are disabled before
a reinit cycle triggered by iflib_netmap_register.
However, this operation is actually necessary also for
any interface reinit triggered by other causes (i.e.,
ifconfig commands).
We achieve this goal by moving the krings enable/disable
operation inside iflib_stop() and iflib_init_locked().
Once here, this change also removes some redundant operations
from iflib_netmap_register(), that are already performed by
iflib_stop().
Restore the hwofs functionality temporarily disabled by 7ba6ecf216fb15e8b147db2 to prevent issues with iflib.
This patch brings the necessary changes to iflib to
enable howfs to allow interface restarts without
disrupting netmap applications actively using its
rings.
After this change, it becomes possible for multiple
non-cooperating netmap applications to use non-overlapping
subsets of the available netmap rings without clashing
with each other.
Ed Maste [Mon, 11 Jan 2021 00:02:56 +0000 (19:02 -0500)]
cmp: fix -s (silent) when used with skip offsets
-s causes cmp to print nothing for differing files, for use when only
the exit status is of interest.
-z compares the file size first, for regular files, and fails the
comparison early if they do not match.
Prior to this change -s implied -z as an optimization, but this is not
valid when file offsets are specified. Now, enable the -z optimization
for -s only if both skip arguments are not provided / 0.
Note that using -z with differing skip values will currently always
fail. We may want to compare size1 - skip1 with size2 - skip2 instaead,
and in any case the man page should be clarified.
PR: 252542
Fixes: 3e6902efc802ab57fc4e9bf798f2d271b152e7f9
Reported by: William Ahern
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28071
Ed Maste [Sat, 9 Jan 2021 16:22:28 +0000 (11:22 -0500)]
diff: honour flags with -q
Previously -q (just print a line when files differ) ignored flags like
-w (ignore whitespace). Avoid the D_BRIEF short-circuit when flags are
in effect.
PR: 252515
Reported by: Scott Aitken
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28064
Ed Maste [Thu, 19 Nov 2020 00:03:15 +0000 (00:03 +0000)]
libc: fix undefined behavior from signed overflow in strstr and memmem
unsigned char promotes to int, which can overflow when shifted left by
24 bits or more. this has been reported multiple times but then
forgotten. it's expected to be benign UB, but can trap when built with
explicit overflow catching (ubsan or similar). fix it now.
note that promotion to uint32_t is safe and portable even outside of
the assumptions usually made in musl, since either uint32_t has rank
at least unsigned int, so that no further default promotions happen,
or int is wide enough that the shift can't overflow. this is a
desirable property to have in case someone wants to reuse the code
elsewhere.
Ed Maste [Thu, 19 Nov 2020 00:02:12 +0000 (00:02 +0000)]
libc: optimize memmem two-way bad character shift
first, the condition (mem && k < p) is redundant, because mem being
nonzero implies the needle is periodic with period exactly p, in which
case any byte that appears in the needle must appear in the last p
bytes of the needle, bounding the shift (k) by p.
second, the whole point of replacing the shift k by mem (=l-p) is to
prevent shifting by less than mem when discarding the memory on shift,
in which case linear time could not be guaranteed. but as written, the
check also replaced shifts greater than mem by mem, reducing the
benefit of the shift. there is no possible benefit to this reduction of
the shift; since mem is being cleared, the full shift is valid and
more optimal. so only replace the shift by mem when it would be less
than mem.
ifa_grouplookup() uses the data loaded in ifa_load() (through is_a_group()), so
we must call ifa_load() before we can rely on any of the data it populates.
Kristof Provost [Mon, 11 Jan 2021 13:09:08 +0000 (14:09 +0100)]
pfctl: Another set skip <group> fix
When retrieving the list of group members we cannot simply use
ifa_lookup(), because it expects the interface to have an IP (v4 or v6)
address. This means that interfaces with no address are not found.
This presents as interfacing being alternately marked as skip and not
whenever the rules are re-loaded.
Happily we only need to fix ifa_grouplookup(). Teach it to also accept
AF_LINK (i.e. interface) node_hosts.
MFC ea0efc370416:
Add support for PL2303HXN to uplcom(4).
Code changes in this commit were obtained from straight from OpenBSD's
uplcom.c with almost no modification, the list of chip names and USB
IDs was obtained from Linux.
Need to update both link layer address and broadcast address when active link changes for IP over infiniband.
This is because the broadcast address contains the so-called P-key, which is interface dependent.
MFC ec52ff6d1411 and 747feea146d8:
Streamline the infiniband code according to the ethernet code.
Specifically implement the if_requestencap callback function for infiniband.
Most of the changes are simply a cut and paste of the equivalent ethernet part.
Ulrich Spörlein [Wed, 23 Dec 2020 21:29:34 +0000 (22:29 +0100)]
Fix newvers.sh to no longer print an outdated SVN rev
We have stopped using SVN, so the notes containing the old SVN revisions
are no longer populated, so fall back to purely counting the number of
commits (currently at about 255337).
Also turn the format more into what git-describe produces, with a name
first, then the number of commits and the hash last. Note that as we
don't tag anything on `main`, git describe will never produce something
useful there and finds the newest vendor tag that was merged in instead.
Ed Maste [Thu, 1 Aug 2019 14:13:04 +0000 (14:13 +0000)]
newvers: append commit count to uname version string
In a git world this provides a facsimile of a monotonically increasing
version number. This might be refined further, but this provides a
starting point for investigation.
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20462
The "Relnotes" flag in the commit message is removed in cherry-pick as
git repo doesn't affect the stable/11 and stable/12 release.
Mark Johnston [Fri, 8 Jan 2021 18:32:04 +0000 (13:32 -0500)]
mpr, mps: Fix a stack buffer overflow in the user passthru ioctl
Previously we copied in the request into a stack-allocated structure
that could be smaller than the request size. Furthermore, we checked
the request size only after doing the copyin.
Fix this by allocating a buffer to hold the request, then copying the
buffer's contents into a command descriptor. This is a bit heavy-handed
but I expect the overhead will not be noticeable. The approach of
coping the header in first is susceptible to TOCTOU problems.
Mark Johnston [Fri, 8 Jan 2021 18:32:04 +0000 (13:32 -0500)]
safexcel: Fix a race around unblocking of crypto ops
safexcel_ring_intr() could fail to observed that sc_blocked is set after
completing all outstanding ops for a ring, in which case blocked ops
would be deferred forever.
Request structures are managed by individual rings, so move the
"blocked" flag into the per-ring state block and use the ring lock to
synchronize with safexcel_process(). Remove sc_mtx since it is now
unused.
Mark Johnston [Fri, 8 Jan 2021 18:32:04 +0000 (13:32 -0500)]
safexcel: Stop using a stack buffer for the ring lock name
mtx_init() does not make a copy of the name so the buffer must be valid
for the lifetime of the driver instance. Store each ring's lock's name
in the ring structure.
Support for NS_MOREFRAG is broken, as NS_MOREFRAG is copied from
the TX slot to the RX slot rather than the other way around.
Also, the NS_MOREFRAG must be copied also in case of packet
copy (no zerocopy).
scrmap is only used in the one compilation unit in all cases, make it static
rather than extern'ing it. There's little benefit, but it's easy to do.
It's unclear how this hasn't failed many builds before now, since it should
have cropped up sometime around deeper hierarchies getting a default WARNS.
Each entry actually stores a native pointer, not a uint64_t quantity. While
we're here, go ahead and export the pointer as-is rather than converting it
to KVA. This may be more useful as consumers can map /dev/mem and observe
the entry.
For reference, see: sys/contrib/edk2/Include/Uefi/UefiSpec.h
Dimitry Andric [Fri, 8 Jan 2021 16:35:18 +0000 (17:35 +0100)]
MFC llvm fixes for building ceph on powerpc
Merge commit 8f5e3c74b from llvm git (by Teresa Johnson):
[PowerPC] Fix compile time issue in recursive CTR analysis code
Summary:
Avoid re-examining operands on recursive walk looking for CTR.
This was causing huge compile time after some earlier optimization
created a large expression.
The start of the expression (created by IndVarSimplify) looked like:
%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...
with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.
Merge commit 4f568fbd2 from llvm git (by Nemanja Ivanovic):
[PowerPC] Do not emit HW loop when TLS var accessed in PHI of loop exit
If any PHI nodes in loop exit blocks have incoming values from the
loop that are accesses of TLS variables with local dynamic or general
dynamic TLS model, the address will be computed inside the loop. Since
this includes a call to __tls_get_addr, this will in turn cause the
CTR loops verifier to complain.
Disable CTR loops in such cases.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48527
This should fix building ceph 12.2.12 on powerpc64, powerpc, powerpcspe
and powerpc64le.
Marius Strobl [Wed, 26 Jun 2019 15:28:21 +0000 (15:28 +0000)]
o In iflib_txq_drain():
- Remove desc_used, which is only ever written to.
- Remove a dead store to reclaimed.
- Don't recycle avail.
- Sort variables according to style(9).
These changes will make a subsequent commit easier to read.
o In iflib_tx_credits_update(), don't bother checking whether the
ift_txd_credits_update method pointer is NULL; _iflib_pre_assert()
asserts upfront that this method has been assigned and functions
like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}()
and _task_fn_tx() were already unconditionally relying on the
method being callable.
Marko Zec [Wed, 19 Jun 2019 08:39:19 +0000 (08:39 +0000)]
Evaluating htons() at compile time is more efficient than doing ntohs()
at runtime. This change removes a dependency on a barrel shifter pass
before branch resolution, while reducing the instruction stream size
by 9 bytes on amd64.
Mark Johnston [Sun, 3 Jan 2021 16:32:30 +0000 (11:32 -0500)]
Ensure that dirent's d_off field is initialized
We have the d_off field in struct dirent for providing the seek offset
of the next directory entry. Several filesystems were not initializing
the field, which ends up being copied out to userland.
Mark Johnston [Wed, 23 Dec 2020 16:31:47 +0000 (11:31 -0500)]
qatfw: Fix firmware autoloading for qat_c2xxx devices
r368193 was suppsed to rename the MOF firmware image, but the
qat_c2xxxfw makefile defined the two images in the wrong order so the
MMP image was renamed instead.