Mark Johnston [Fri, 2 Jun 2023 15:58:29 +0000 (11:58 -0400)]
ossl: Add a VAES-based AES-GCM implementation for amd64
aes-gcm-avx512.S is generated from OpenSSL 3.1 and implements AES-GCM.
ossl_x86.c detects whether the CPU implements the required AVX512
instructions; if not, the ossl(4) module does not provide an AES-GCM
implementation. The VAES implementation increases throughput for all
buffer sizes in both directions, up to 2x for sufficiently large
buffers.
The "process" implementation is in two parts: a generic OCF layer in
ossl_aes.c that calls a set of MD functions to do the heavy lifting.
The intent there is to make it possible to add other implementations for
other platforms, e.g., to reduce the diff required for D37421.
A follow-up commit will add a fallback path to legacy AES-NI, so that
ossl(4) can be used in preference to aesni(4) on all amd64 platforms.
In the long term we would like to replace aesni(4) and armv8crypto(4)
with ossl(4).
Note, currently this implementation will not be selected by default
since aesni(4) and ossl(4) return the same probe priority for crypto
sessions, and the opencrypto framework selects the first registered
implementation to break a tie. Since aesni(4) is compiled into the
kernel, aesni(4) wins. A separate change may modify ossl(4) to have
priority.
Mark Johnston [Fri, 2 Jun 2023 15:57:38 +0000 (11:57 -0400)]
ossl: Expose more CPUID bits in OPENSSL_ia32cap_P
This is needed to let OpenSSL 3.1 routines detect VAES and VPCLMULQDQ
extensions. The intent is to import ASM routines which implement
AES-GCM using VEX-prefixed AES-NI instructions.
Kristof Provost [Tue, 30 May 2023 19:17:54 +0000 (21:17 +0200)]
pf: carry over rule actions from route-to rules
If we route-to (or dup-to/reply-to) we re-run pf_test(), which will also
create states for the connection.
This means that we may end up matching a different (i.e. not the state
that was created by the route-to rule) state, without the attributes
(such as dummynet pipes/queues) set by the route-to rule.
Address this by inheriting the pf_rule_actions from the route-to rule
while evaluating the connection again in pf_test(). That is, we set
default pf_rule_actions based on the route-to rule for the new
evaluation. The new rule may still overrule these, but if it does not
have such actions the route-to actions are applied.
Do the same for IPv6 rules in pf_test6()/pf_route6().
Pierre Pronchery [Thu, 25 May 2023 17:09:27 +0000 (19:09 +0200)]
dumpon: Request the OpenSSL 1.1 API
OPENSSL_API_COMPAT can be used to specify the OpenSSL API version in
use for the purpose of hiding deprecated interfaces and enabling
the appropriate deprecation notices.
This change is a NFC while we're still using OpenSSL 1.1.1 but will
avoid deprecation warnings upon the switch to OpenSSL 3.0.
A future update may migrate to use the OpenSSL 3.0 APIs.
PR: 271615
Pull request: https://github.com/freebsd/freebsd-src/pull/757
Sponsored by: The FreeBSD Foundation
Ed Maste [Wed, 31 May 2023 16:25:58 +0000 (12:25 -0400)]
dumpon: update OpenSSL initialization call
ERR_load_crypto_strings() was deprecated in OpenSSL 1.1.0, and explicit
initialization is generally not reqiured. In the case of dumpon however
we initialize prior to entering capability mode, so replace with an
OPENSSL_init_crypto call.
Reviewed by: def, Pierre Pronchery
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40353
cpuset(3): Move cpuset's parselist function into libutil
In order to allow to add cpuset(2) functionality to more utilities than just
cpuset(1) move the parselist code into libutil
While here, make the code a little more "library" friendly, by returning a range
of various errors so that the consumer can check for them and report appropriate
error message to the users
devctl: allow to register a hook to receive the events
In preparation for netlink sysvent add a function that allow
registering a function to hook the events and also send it via
another kernel module (nlsysvent will be that module).
Prepare a static list of known existing events in the kernel that
will be used to prepopulate nlsysvent multicast group (one per event)
Rick Macklem [Thu, 1 Jun 2023 20:43:00 +0000 (13:43 -0700)]
rpc.tls[serv|clnt]d.c: Clean up code for OpenSSL3
There were several function calls that are deprecated for
OpenSSL1.1.1. These have been removed.
There was also a function call deprecated for OpenSSL3 and
that one has been #ifdef'd on OPENSSL_VERSION_NUMBER.
Eric van Gyzen [Thu, 1 Jun 2023 19:14:07 +0000 (14:14 -0500)]
Import vixie cron 4.0
Specifically, import the diff from commit e745bd4c10ab to
commit 83563783cc2 in https://github.com/vixie/cron.git
My sole motivation is changing to the common MIT license.
The old license, especially the "buildable source" clause,
is unfriendly for commercial users of this code. Simply
changing the license without importing [most of] the code
accompanying that license seemed legally dubious.
The most regrettable change is losing Paul's uucp path.
I partially atone for this loss by restoring the upstream
$Id$ tags, since $FreeBSD$ is no longer useful.
This is [intended to be] a complete list of the functional
changes in this commit. Some changes were made so that we
could consider vixie cron to be our upstream and reduce our
diffs against it, while others were simply a good idea.
- main() - use putenv instead of setenv for PATH
- open_pidfile no longer needs snprintf to build pidfile
- crontab main() - abort() on impossible errors
- check for truncation when building strings with snprintf
- getdtablesize() -> sysconf(_SC_OPEN_MAX)
These changes were not taken from upstream's 4.0 diff because
they [could] actually change behavior. Some of them might be
beneficial, but should be taken separately.
- config.h - sendmail args: remove -oi and add -or0s
- call setlocale(LC_ALL, "") at the top of main()
- acquire_daemonlock - we already use pidfile
- cast getpid(), uid_t, and gid_t to long for printf
- remove unnecessary braces - I consider them beneficial
- BSDi support
- glue_strings() - use snprintf(), as we often already did
Cheng Cui [Thu, 1 Jun 2023 11:48:07 +0000 (07:48 -0400)]
cc_cubic: Use units of micro seconds (usecs) instead of ticks in rtt.
This improves TCP friendly cwnd in cases of low latency high drop rate
networks. Tests show +42% and +37% better performance in 1Gpbs and 10Gbps
cases.
netinet6: make IPv6 fragment TTL per-VNET configurable.
Having it configurable adds more flexibility, especially
for the systems with low amount of memory.
Additionally, it allows to speedup frag6/ tests execution.
ifnet: consistently call hooks when the interface gets up.
Some context on the current IPv6 interface setup & address management:
There are two data path for IPv6 initialisation in context of assigning
LL addresses:
1) Userland explicitly requests IFF_UP for the interface w/o any addresses.
if_up() then calls in6_if_up(), which calls in6_ifattach().
The latter sets up some initial ND/IN6 state and disables IPv6 for the
interface if it’s not loopback. If the interface is loopback, then it
adds ::1/128 and LL addresses via in6_ifattach_loopback().
Then, devd notification is generated (if the VNET is the default one),
which triggers rc.network ifconfig_up(), causing ifdisabled to be removed
via SIOCSIFINFO_IN6 from ifconfig. The kernel SIOCSIFINFO_IN6 handler
calls in6_if_up() once again and it assigns the interface link-local address.
2) Userland adds IPv4 or IPv6 address to the interface. SIOCAIFADDR[_IN6]
kernel handler calls IPv4/IPv6 protocol handler to add the address.
Both then call if_ioctl() with SIOCSIFADDR. Ethernet/loopback ioctl handlers
silently sets IFF_UP for the interface. Finally, if.c:ifioctl() wrapper code
compares old and new interface flags and, if IFF_UP is added, it explicitly
calls in6_if_up(), which adds link-local address if either the original
address is IPv6 or the interface is loopback.
In the latter case, “formal” interface-up notifications are missing.
The kernel does not trigger event handler event, does not call carp hook
and does not provide any userland notification.
This diff unifies the event handling in both scenarios, providing the
necessary notifications to the kernel and userland.
Ben Wilber [Thu, 1 Jun 2023 09:29:36 +0000 (11:29 +0200)]
bridge: fix lookup for untagged packets in bridge_transmit()
b0e38a1373 improved if_bridge's ability to cope with different VLANs,
but it failed to update bridge_transmit() to cope with the new rule that
untagged packets are treated as having VLAN ID 0 (rather than 1, as used
to be the case).
netlink: use custom uma zone for the mbuf storage.
Netlink communicates with userland via sockets, utilising
MCLBYTES-sized mbufs to append data to the socket buffers.
These mbufs are never transmitted via logical or physical network.
It may be possible that the 2k mbuf zone is temporary exhausted
due to the DDoS-style traffic, leading to Netlink failure to
respond to the requests.
To address it, this change introduces a custom Netlink-specific
zone for the mbuf storage. It has the following benefits:
* no precious memory from UMA_ZONE_CONTIG zones is utilized for Netlink
* Netlink becomes (more) independent from the traffic spikes and
other related network "corner" conditions.
* Netlink allocations are now isolated within a specific zone, making it
easier to track Netlink mbuf usage and attribute mbufs.
Ed Maste [Wed, 31 May 2023 14:17:52 +0000 (10:17 -0400)]
decryptcore: update for OpenSSL 1.1 API
ERR_load_crypto_strings is deprecated in OpenSSL 1.1, and OpenSSL 1.1
generally does not require explicit initialization. However, we do need
to ensure that initialization is done before entering capability mode so
call OPENSSL_init_crypto instead. Also include header needed for
ERR_error_string.
Reviewed by: vangyzen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40343
tap(4) devices advertise themselves as just 'ethernet autoselect',
without duplex or speed capabilities.
This advertisement makes them unable to be aggregated into lacp-based
lagg(4):
- lacp code requires underlying interfaces to be full-duplex, else
interface will not participate in lacp at all
- lacp code requires underlying interface to have non-zero speed, else
this interface can not be selected as active aggregator
Andrew Turner [Thu, 25 May 2023 08:22:03 +0000 (09:22 +0100)]
gicv3: Use an offset to find the redist registers
To find the redistributor registers use the resource we have already
found and add an offset. This removed the need to create a
per-redistributor resource as it can now be a pointer to the resource
found in attach.
While here check the offset is within the bounds of the resource. Some
ACPI tables list each redistributor as a separate memory range, even
if they are physically contiguous. In this case we may not have each
resource virtually contiguous with neighbouring resources. This can
lead to a data abort when reading past the resource range.
Reviewed by: kevans
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40263
Pierre Pronchery [Thu, 25 May 2023 05:34:44 +0000 (07:34 +0200)]
libunbound: Request the OpenSSL 1.1 API
OPENSSL_API_COMPAT can be used to specify the OpenSSL API version in
use for the purpose of hiding deprecated interfaces and enabling
the appropriate deprecation notices.
This change is a NFC while we're still using OpenSSL 1.1.1 but will
avoid deprecation warnings upon the switch to OpenSSL 3.0.
A future update may migrate to use the OpenSSL 3.0 APIs.
PR: 271615
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Adding P2P addresses is complex in both ioctl and Netlink.
In the ioctl interface, "broadcast" field is the same field as the
"peer". In is possible to specify non-p2p address for the p2p
interface in IPv6, but not in IPv4.
In the Netlink interface, "address" field means "peer" address.
As a result, a common notion for the Netlink users is to submit
same address/peer for non-P2P interfaces.
This change customises mapping the attribute on per-family basis.
Specifically,
for IPv4 - if the interface is P2P, assume "address" is p2p and
"local" is the address. If the interfase is non-p2p, use "local"
attribute as the address. If it's not set, use "address" attribute.
for IPv6 - start with "local" attribute as the address. If it's not set,
use use "address" attribute. If both are set and both are the same,
assume non p2p, otherwise add as p2p.
Doug Rabson [Wed, 24 May 2023 13:11:37 +0000 (14:11 +0100)]
netinet*: Fix redirects for connections from localhost
Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
rules to change the destination address and port for a connection.
Typically, the rule triggers on an input event when a packet is received
by a router and the destination address and/or port is changed to
implement the redirect. When a reply packet on this connection is output
to the network, the rule triggers again, reversing the modification.
When the connection is initiated on the same host as the packet filter,
it is initially output via lo0 which queues it for input processing.
This causes an input event on the lo0 interface, allowing redirect
processing to rewrite the destination and create state for the
connection. However, when the reply is received, no corresponding output
event is generated; instead, the packet is delivered to the higher level
protocol (e.g. tcp or udp) without reversing the redirect, the reply is
not matched to the connection and the packet is dropped (for tcp, a
connection reset is also sent).
This commit fixes the problem by adding a second packet filter call in
the input path. The second call happens right before the handoff to
higher level processing and provides the missing output event to allow
the redirect's reply processing to perform its rewrite. This extra
processing is disabled by default and can be enabled using pfilctl:
pfilctl link -o pf:default-out inet-local
pfilctl link -o pf:default-out6 inet6-local
Jessica Clarke [Tue, 30 May 2023 23:20:36 +0000 (00:20 +0100)]
pmc: Rework PROCEXEC event to support PIEs
Currently the PROCEXEC event only reports a single address, entryaddr,
which is the entry point of the interpreter in the typical dynamic case,
and used solely to calculate the base address of the interpreter. For
PDEs this is fine, since the base address is known from the program
headers, but for PIEs the base address varies at run time based on where
the kernel chooses to load it, and so pmcstat has no way of knowing the
real address ranges for the executable. This was less of an issue in the
past since PIEs were rare, but now they're on by default on 64-bit
architectures it's more of a problem.
To solve this, pass through what was picked for et_dyn_addr by the
kernel, and use that as the offset for the executable's start address
just as is done for everything in the kernel. Since we're changing this
interface, sanitise the way we determine the interpreter's base address
by passing it through directly rather than indirectly via the entry
point and having to subtract off whatever the ELF header's e_entry is
(and anything that wants the entry point in future can still add that
back on as needed; this merely changes the interface to directly provide
the underlying variables involved).
This will be followed up by a bump to the pmc major version.
Jessica Clarke [Tue, 30 May 2023 23:15:43 +0000 (00:15 +0100)]
imgact: Make et_dyn_addr part of image_params
This already gets passed around between various imgact_elf functions, so
moving it removes an argument from all those places. A future commit
will make use of this for hwpmc, though, to provide the load base for
PIEs, which currently isn't available to tools like pmcstat.
Jessica Clarke [Tue, 30 May 2023 23:15:34 +0000 (00:15 +0100)]
pmc: Provide full path to modules from kernel linker
This unifies the user object and kernel module paths in libpmcstat,
allows modules loaded from non-standard locations (e.g. from a user's
home directory when testing) to be found and, since buffer is what all
the warnings here use (they were never updated when buffer_modules were
added to pick based on where the file was found) has the side-effect of
ensuring the messages are correct.
This includes obsoleting the now-superfluous -k option in pmcstat.
This change breaks the hwpmc ABI and will be followed by a bump to the
pmc major version.
Jessica Clarke [Tue, 30 May 2023 23:15:24 +0000 (00:15 +0100)]
pmc: Initialise and check the pm_flags field for CONFIGURELOG
Whilst the former is not breaking, the latter is, and so this will be
followed by a bump to the pmc major version. This will allow the flags
to actually be usable in future, as otherwise we cannot distinguish
uninitialised stack junk from a deliberately-initialised value.
Mark Johnston [Tue, 30 May 2023 19:15:48 +0000 (15:15 -0400)]
inpcb: Restore missing validation of local addresses for jailed sockets
When looking up a listening socket, the SMR-protected lookup routine may
return a jailed socket with no local address. This happens when using
classic jails with more than one IP address; in a single-IP classic
jail, a bound socket's local address is always rewritten to be that of
the jail.
After commit 7b92493ab1d4, the lookup path failed to check whether the
jail corresponding to a matched wildcard socket actually owns the
address, and would return the match regardless. Restore the omitted
checks.
Fixes: 7b92493ab1d4 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
Reported by: peter
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D40268
Mark Johnston [Tue, 30 May 2023 19:11:32 +0000 (15:11 -0400)]
buf: Make the number of pbufs slightly more dynamic
Various subsystems pre-allocate a set of pbufs, allocated to implement
I/O operations. pbuf allocations are transient, unlike most buf
allocations.
Most subsystems preallocate nswbuf or nswbuf/2 pbufs each. The
preallocation ensures that pbuf allocation will succeed in low memory
conditions, which might help avoid deadlocks. Currently we initialize
nswbuf = min(nbuf / 4, 256).
nbuf/4 > 256 on anything but the smallest systems. For example,
nswbuf is 256 in a VM with 128MB of memory. In this configuration, a
firecracker VM with one CPU preallocates over 900 pbufs. This consumes
2MB of RAM and adds several milliseconds to the kernel's (very small)
boot time.
Scale nswbuf by ncpu in the common case. I think this makes more sense
than scaling by the amount of RAM, since pbuf allocations are transient
and aren't used for caching. With the change, we get nswbuf=256 with 8
CPUs. With fewer than 8 CPUs we'll preallocate fewer pbufs than before,
and with more we'll preallocate more.