Ed Maste [Wed, 31 May 2023 16:25:58 +0000 (12:25 -0400)]
dumpon: update OpenSSL initialization call
ERR_load_crypto_strings() was deprecated in OpenSSL 1.1.0, and explicit
initialization is generally not reqiured. In the case of dumpon however
we initialize prior to entering capability mode, so replace with an
OPENSSL_init_crypto call.
Reviewed by: def, Pierre Pronchery
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40353
cpuset(3): Move cpuset's parselist function into libutil
In order to allow to add cpuset(2) functionality to more utilities than just
cpuset(1) move the parselist code into libutil
While here, make the code a little more "library" friendly, by returning a range
of various errors so that the consumer can check for them and report appropriate
error message to the users
devctl: allow to register a hook to receive the events
In preparation for netlink sysvent add a function that allow
registering a function to hook the events and also send it via
another kernel module (nlsysvent will be that module).
Prepare a static list of known existing events in the kernel that
will be used to prepopulate nlsysvent multicast group (one per event)
Rick Macklem [Thu, 1 Jun 2023 20:43:00 +0000 (13:43 -0700)]
rpc.tls[serv|clnt]d.c: Clean up code for OpenSSL3
There were several function calls that are deprecated for
OpenSSL1.1.1. These have been removed.
There was also a function call deprecated for OpenSSL3 and
that one has been #ifdef'd on OPENSSL_VERSION_NUMBER.
Eric van Gyzen [Thu, 1 Jun 2023 19:14:07 +0000 (14:14 -0500)]
Import vixie cron 4.0
Specifically, import the diff from commit e745bd4c10ab to
commit 83563783cc2 in https://github.com/vixie/cron.git
My sole motivation is changing to the common MIT license.
The old license, especially the "buildable source" clause,
is unfriendly for commercial users of this code. Simply
changing the license without importing [most of] the code
accompanying that license seemed legally dubious.
The most regrettable change is losing Paul's uucp path.
I partially atone for this loss by restoring the upstream
$Id$ tags, since $FreeBSD$ is no longer useful.
This is [intended to be] a complete list of the functional
changes in this commit. Some changes were made so that we
could consider vixie cron to be our upstream and reduce our
diffs against it, while others were simply a good idea.
- main() - use putenv instead of setenv for PATH
- open_pidfile no longer needs snprintf to build pidfile
- crontab main() - abort() on impossible errors
- check for truncation when building strings with snprintf
- getdtablesize() -> sysconf(_SC_OPEN_MAX)
These changes were not taken from upstream's 4.0 diff because
they [could] actually change behavior. Some of them might be
beneficial, but should be taken separately.
- config.h - sendmail args: remove -oi and add -or0s
- call setlocale(LC_ALL, "") at the top of main()
- acquire_daemonlock - we already use pidfile
- cast getpid(), uid_t, and gid_t to long for printf
- remove unnecessary braces - I consider them beneficial
- BSDi support
- glue_strings() - use snprintf(), as we often already did
Cheng Cui [Thu, 1 Jun 2023 11:48:07 +0000 (07:48 -0400)]
cc_cubic: Use units of micro seconds (usecs) instead of ticks in rtt.
This improves TCP friendly cwnd in cases of low latency high drop rate
networks. Tests show +42% and +37% better performance in 1Gpbs and 10Gbps
cases.
netinet6: make IPv6 fragment TTL per-VNET configurable.
Having it configurable adds more flexibility, especially
for the systems with low amount of memory.
Additionally, it allows to speedup frag6/ tests execution.
ifnet: consistently call hooks when the interface gets up.
Some context on the current IPv6 interface setup & address management:
There are two data path for IPv6 initialisation in context of assigning
LL addresses:
1) Userland explicitly requests IFF_UP for the interface w/o any addresses.
if_up() then calls in6_if_up(), which calls in6_ifattach().
The latter sets up some initial ND/IN6 state and disables IPv6 for the
interface if it’s not loopback. If the interface is loopback, then it
adds ::1/128 and LL addresses via in6_ifattach_loopback().
Then, devd notification is generated (if the VNET is the default one),
which triggers rc.network ifconfig_up(), causing ifdisabled to be removed
via SIOCSIFINFO_IN6 from ifconfig. The kernel SIOCSIFINFO_IN6 handler
calls in6_if_up() once again and it assigns the interface link-local address.
2) Userland adds IPv4 or IPv6 address to the interface. SIOCAIFADDR[_IN6]
kernel handler calls IPv4/IPv6 protocol handler to add the address.
Both then call if_ioctl() with SIOCSIFADDR. Ethernet/loopback ioctl handlers
silently sets IFF_UP for the interface. Finally, if.c:ifioctl() wrapper code
compares old and new interface flags and, if IFF_UP is added, it explicitly
calls in6_if_up(), which adds link-local address if either the original
address is IPv6 or the interface is loopback.
In the latter case, “formal” interface-up notifications are missing.
The kernel does not trigger event handler event, does not call carp hook
and does not provide any userland notification.
This diff unifies the event handling in both scenarios, providing the
necessary notifications to the kernel and userland.
Ben Wilber [Thu, 1 Jun 2023 09:29:36 +0000 (11:29 +0200)]
bridge: fix lookup for untagged packets in bridge_transmit()
b0e38a1373 improved if_bridge's ability to cope with different VLANs,
but it failed to update bridge_transmit() to cope with the new rule that
untagged packets are treated as having VLAN ID 0 (rather than 1, as used
to be the case).
netlink: use custom uma zone for the mbuf storage.
Netlink communicates with userland via sockets, utilising
MCLBYTES-sized mbufs to append data to the socket buffers.
These mbufs are never transmitted via logical or physical network.
It may be possible that the 2k mbuf zone is temporary exhausted
due to the DDoS-style traffic, leading to Netlink failure to
respond to the requests.
To address it, this change introduces a custom Netlink-specific
zone for the mbuf storage. It has the following benefits:
* no precious memory from UMA_ZONE_CONTIG zones is utilized for Netlink
* Netlink becomes (more) independent from the traffic spikes and
other related network "corner" conditions.
* Netlink allocations are now isolated within a specific zone, making it
easier to track Netlink mbuf usage and attribute mbufs.
Ed Maste [Wed, 31 May 2023 14:17:52 +0000 (10:17 -0400)]
decryptcore: update for OpenSSL 1.1 API
ERR_load_crypto_strings is deprecated in OpenSSL 1.1, and OpenSSL 1.1
generally does not require explicit initialization. However, we do need
to ensure that initialization is done before entering capability mode so
call OPENSSL_init_crypto instead. Also include header needed for
ERR_error_string.
Reviewed by: vangyzen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40343
tap(4) devices advertise themselves as just 'ethernet autoselect',
without duplex or speed capabilities.
This advertisement makes them unable to be aggregated into lacp-based
lagg(4):
- lacp code requires underlying interfaces to be full-duplex, else
interface will not participate in lacp at all
- lacp code requires underlying interface to have non-zero speed, else
this interface can not be selected as active aggregator
Andrew Turner [Thu, 25 May 2023 08:22:03 +0000 (09:22 +0100)]
gicv3: Use an offset to find the redist registers
To find the redistributor registers use the resource we have already
found and add an offset. This removed the need to create a
per-redistributor resource as it can now be a pointer to the resource
found in attach.
While here check the offset is within the bounds of the resource. Some
ACPI tables list each redistributor as a separate memory range, even
if they are physically contiguous. In this case we may not have each
resource virtually contiguous with neighbouring resources. This can
lead to a data abort when reading past the resource range.
Reviewed by: kevans
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40263
Pierre Pronchery [Thu, 25 May 2023 05:34:44 +0000 (07:34 +0200)]
libunbound: Request the OpenSSL 1.1 API
OPENSSL_API_COMPAT can be used to specify the OpenSSL API version in
use for the purpose of hiding deprecated interfaces and enabling
the appropriate deprecation notices.
This change is a NFC while we're still using OpenSSL 1.1.1 but will
avoid deprecation warnings upon the switch to OpenSSL 3.0.
A future update may migrate to use the OpenSSL 3.0 APIs.
PR: 271615
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Adding P2P addresses is complex in both ioctl and Netlink.
In the ioctl interface, "broadcast" field is the same field as the
"peer". In is possible to specify non-p2p address for the p2p
interface in IPv6, but not in IPv4.
In the Netlink interface, "address" field means "peer" address.
As a result, a common notion for the Netlink users is to submit
same address/peer for non-P2P interfaces.
This change customises mapping the attribute on per-family basis.
Specifically,
for IPv4 - if the interface is P2P, assume "address" is p2p and
"local" is the address. If the interfase is non-p2p, use "local"
attribute as the address. If it's not set, use "address" attribute.
for IPv6 - start with "local" attribute as the address. If it's not set,
use use "address" attribute. If both are set and both are the same,
assume non p2p, otherwise add as p2p.
Doug Rabson [Wed, 24 May 2023 13:11:37 +0000 (14:11 +0100)]
netinet*: Fix redirects for connections from localhost
Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
rules to change the destination address and port for a connection.
Typically, the rule triggers on an input event when a packet is received
by a router and the destination address and/or port is changed to
implement the redirect. When a reply packet on this connection is output
to the network, the rule triggers again, reversing the modification.
When the connection is initiated on the same host as the packet filter,
it is initially output via lo0 which queues it for input processing.
This causes an input event on the lo0 interface, allowing redirect
processing to rewrite the destination and create state for the
connection. However, when the reply is received, no corresponding output
event is generated; instead, the packet is delivered to the higher level
protocol (e.g. tcp or udp) without reversing the redirect, the reply is
not matched to the connection and the packet is dropped (for tcp, a
connection reset is also sent).
This commit fixes the problem by adding a second packet filter call in
the input path. The second call happens right before the handoff to
higher level processing and provides the missing output event to allow
the redirect's reply processing to perform its rewrite. This extra
processing is disabled by default and can be enabled using pfilctl:
pfilctl link -o pf:default-out inet-local
pfilctl link -o pf:default-out6 inet6-local
Jessica Clarke [Tue, 30 May 2023 23:20:36 +0000 (00:20 +0100)]
pmc: Rework PROCEXEC event to support PIEs
Currently the PROCEXEC event only reports a single address, entryaddr,
which is the entry point of the interpreter in the typical dynamic case,
and used solely to calculate the base address of the interpreter. For
PDEs this is fine, since the base address is known from the program
headers, but for PIEs the base address varies at run time based on where
the kernel chooses to load it, and so pmcstat has no way of knowing the
real address ranges for the executable. This was less of an issue in the
past since PIEs were rare, but now they're on by default on 64-bit
architectures it's more of a problem.
To solve this, pass through what was picked for et_dyn_addr by the
kernel, and use that as the offset for the executable's start address
just as is done for everything in the kernel. Since we're changing this
interface, sanitise the way we determine the interpreter's base address
by passing it through directly rather than indirectly via the entry
point and having to subtract off whatever the ELF header's e_entry is
(and anything that wants the entry point in future can still add that
back on as needed; this merely changes the interface to directly provide
the underlying variables involved).
This will be followed up by a bump to the pmc major version.
Jessica Clarke [Tue, 30 May 2023 23:15:43 +0000 (00:15 +0100)]
imgact: Make et_dyn_addr part of image_params
This already gets passed around between various imgact_elf functions, so
moving it removes an argument from all those places. A future commit
will make use of this for hwpmc, though, to provide the load base for
PIEs, which currently isn't available to tools like pmcstat.
Jessica Clarke [Tue, 30 May 2023 23:15:34 +0000 (00:15 +0100)]
pmc: Provide full path to modules from kernel linker
This unifies the user object and kernel module paths in libpmcstat,
allows modules loaded from non-standard locations (e.g. from a user's
home directory when testing) to be found and, since buffer is what all
the warnings here use (they were never updated when buffer_modules were
added to pick based on where the file was found) has the side-effect of
ensuring the messages are correct.
This includes obsoleting the now-superfluous -k option in pmcstat.
This change breaks the hwpmc ABI and will be followed by a bump to the
pmc major version.
Jessica Clarke [Tue, 30 May 2023 23:15:24 +0000 (00:15 +0100)]
pmc: Initialise and check the pm_flags field for CONFIGURELOG
Whilst the former is not breaking, the latter is, and so this will be
followed by a bump to the pmc major version. This will allow the flags
to actually be usable in future, as otherwise we cannot distinguish
uninitialised stack junk from a deliberately-initialised value.
Mark Johnston [Tue, 30 May 2023 19:15:48 +0000 (15:15 -0400)]
inpcb: Restore missing validation of local addresses for jailed sockets
When looking up a listening socket, the SMR-protected lookup routine may
return a jailed socket with no local address. This happens when using
classic jails with more than one IP address; in a single-IP classic
jail, a bound socket's local address is always rewritten to be that of
the jail.
After commit 7b92493ab1d4, the lookup path failed to check whether the
jail corresponding to a matched wildcard socket actually owns the
address, and would return the match regardless. Restore the omitted
checks.
Fixes: 7b92493ab1d4 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
Reported by: peter
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D40268
Mark Johnston [Tue, 30 May 2023 19:11:32 +0000 (15:11 -0400)]
buf: Make the number of pbufs slightly more dynamic
Various subsystems pre-allocate a set of pbufs, allocated to implement
I/O operations. pbuf allocations are transient, unlike most buf
allocations.
Most subsystems preallocate nswbuf or nswbuf/2 pbufs each. The
preallocation ensures that pbuf allocation will succeed in low memory
conditions, which might help avoid deadlocks. Currently we initialize
nswbuf = min(nbuf / 4, 256).
nbuf/4 > 256 on anything but the smallest systems. For example,
nswbuf is 256 in a VM with 128MB of memory. In this configuration, a
firecracker VM with one CPU preallocates over 900 pbufs. This consumes
2MB of RAM and adds several milliseconds to the kernel's (very small)
boot time.
Scale nswbuf by ncpu in the common case. I think this makes more sense
than scaling by the amount of RAM, since pbuf allocations are transient
and aren't used for caching. With the change, we get nswbuf=256 with 8
CPUs. With fewer than 8 CPUs we'll preallocate fewer pbufs than before,
and with more we'll preallocate more.
Jan Schaumann [Tue, 30 May 2023 12:55:38 +0000 (15:55 +0300)]
split(1): auto-extend suffix length if required
If the input cannot be split into the number of files resulting from the
default suffix length, automatically extend the suffix length rather
than bailing out with 'too many files'.
Suffixes are extended such that the resulting files continue to sort
lexically and "cat *" would reproduce the input. For example, splitting
a 1M lines file into (default) 1000 lines per file would yield files
named 'xaa', 'xab', ..., 'xyy', 'xyz', 'xzaaa', 'xzaab', ..., 'xzanl'.
If '-a' is specified, the suffix length is not auto-extended.
This behavior matches GNU sort(1) since around version 8.16.
Reviewed by: christos
Approved by: kevans
Different Revision: https://reviews.freebsd.org/D38279
pf: make contents of struct pfsync_state configurable
Make struct pfsync_state contents configurable by sending out new
versions of the structure in separate subheader actions. Both old and
new version of struct pfsync_state can be understood, so replication of
states from a system running an older kernel is possible. The version
being sent out is configured using ifconfig pfsync0 … version XXXX. The
version is an user-friendly string - 1301 stands for FreeBSD 13.1 (I
have checked synchronization against a host running 13.1), 1400 stands
for 14.0.
A host running an older kernel will just ignore the messages and count
them as "packets discarded for bad action".
Chuck Silvers [Tue, 30 May 2023 02:26:28 +0000 (19:26 -0700)]
ffs: restore backward compatibility of newfs and makefs with older binaries
The previous change to CGSIZE had the unintended side-effect of allowing
newfs and makefs to create file systems that would fail validation when
examined by older commands and kernels, by allowing newfs/makefs to pack
slightly more blocks into a CG than those older binaries think is valid.
Fix this by having newfs/makefs artificially restrict the number of blocks
in a CG to the slightly smaller value that those older binaries will accept.
The validation code will continue to accept the slightly larger value
that the current newfs/makefs (before this change) could create.
reapkill: handle possible pid reuse after the pid was recorded as signalled
Nothing prevents the signalled process from exiting, and then other
process among eligible for signalling to reuse the exited process pid.
In this case, presence of the pid in the 'pids' unr set prevents it from
getting the deserved signal.
Handle it by marking each process with the new flag P2_REAPKILLED when
we are about to send the signal. If the process pid is present in the
pids unr, but the struct proc is not marked with P2_REAPKILLED, we must
send signal to the pid again.
The use of the flag relies on the global sapblk preventing parallel
reapkills.
The pids unr must be used to clear the flags to all signalled processes.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40089
To use, compile userspace code e.g. into the subr_unit binary, then do
$ while ./subr_unit -iv >|/tmp/subr_unit.log ; do :; done
The loop should be left run for as long as possible.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40089
Kirk McKusick [Mon, 29 May 2023 21:58:20 +0000 (14:58 -0700)]
Fix a bug in fsck_ffs(8) triggered by corrupted filesystems.
When loading the root directory ensure that it is a directory
and has a size greater than the minimum directory size. If an
invalid root directory is found, fall back to full fsck.
Reported-by: Robert Morris
PR: 271414
MFC-after: 1 week Sponsored-by: The FreeBSD Foundation
Kirk McKusick [Mon, 29 May 2023 21:54:52 +0000 (14:54 -0700)]
Cleanups to fsck_ffs(8).
When checking an inode ensure that it does not have a negative size.
Stop scaning a directory when an unallocated block is found.
Fully clear an inode when it is first allocated.
Ensure that an inode is marked dirty whenever it is updated and that
it has a correct check hash when it is released.
MFC-after: 1 week Sponsored-by: The FreeBSD Foundation
Tetsuya Uemura [Sun, 28 May 2023 12:56:21 +0000 (09:56 -0300)]
bcm2835_gpio: Handle BCM2711 pin configuration
Add support for GPIO internal pull up/down configuration on RPi4 family.
BCM2711 SoC on 4th generation Raspberry Pi changed the way to configure
its GPIO pins' internal pull up/down resistors. NetBSD already have
support for this change, and port it to FreeBSD is trivial.
This patch, based on the NetBSD commit adds the appropriate method for
BCM2711 and now we can properly configure the GPIO pins' pull status.